DirectEntry encoding [SOLVED]

ninth · August 29, 2011, 1:14pm

Under Linux I receive text from DirectEntry in UTF-8 and everything is fine, but under Windows it seems to be in cp1251 (Cyrillic). However, if I try text.decode('cp1251'), it causes a codec error like this:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xf4' in position 0: ordinal not in range(128)

I used this hack:

fixed = u''.join(chr(ord(c)).decode('cp1251') for c in txt)  # Python 2: turn each code point back into a byte, then decode it as cp1251

and got the characters I needed. But the DirectEntry input field still shows “hieroglyphs”.
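The hack works because each wrongly received character has a code point equal to its cp1251 byte value (below 256), and `chr(ord(c))` turns it back into that byte. A minimal sketch of the same repair as a single round trip, assuming every character in the entry is a cp1251 byte that was misread as a code point; `repair_cp1251` is a hypothetical helper name, and the code is written for Python 3:

```python
def repair_cp1251(text):
    """Undo cp1251 bytes that were misread as Unicode code points.

    Assumes every character in `text` has a code point below 256 that is
    really a cp1251 byte value. latin-1 maps code points 0-255 one-to-one
    onto bytes, so encoding with it recovers the original bytes.
    """
    return text.encode("latin-1").decode("cp1251")


garbled = u"\xc0\xc1\xc2\xc3\xc4"  # what the entry returned: code points 192-196
print(repair_cp1251(garbled))      # the Cyrillic letters АБВГД
```

This is only a workaround: once the entry receives real Unicode, genuine Cyrillic input (code points above 255) cannot be encoded as latin-1, so the repair would raise an error and must be removed.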

drwr · August 29, 2011, 2:17pm

DirectEntry.get() will normally return a Unicode object, not a string object, if the user’s text contains non-ASCII characters. (You can change this default behavior with config variables if you require something different.)

This means you shouldn’t try to decode it, it is already decoded. Depending on what you do with it (for instance if you write it directly to a file or something), Python may automatically re-encode it according to your default Python encoding, whatever that is.
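To illustrate: since the entry already holds decoded text, the only conversion needed is an explicit encode when the text leaves the program. The sketch below (Python 3) encodes with a named codec instead of relying on the default one; the names `save_entry_text` and `ascii_encode_error` and the choice of UTF-8 are assumptions for illustration. It also reproduces the error from the first post: in Python 2, calling `.decode()` on a unicode object first encodes it with the default ASCII codec, and that implicit encode is what raises `UnicodeEncodeError`.

```python
import io


def save_entry_text(text, stream):
    """Write already-decoded entry text to a binary stream as UTF-8.

    `text` is a Unicode string, so it is encoded here, never decoded.
    """
    stream.write(text.encode("utf-8"))


def ascii_encode_error(text):
    """Return the message Python 2 gave for text.decode(...), or None.

    Python 2 first encoded `text` with the default ASCII codec; doing that
    explicitly reproduces the same UnicodeEncodeError.
    """
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


buf = io.BytesIO()
save_entry_text(u"\u042b\u042b\u042b", buf)  # ЫЫЫ
print(buf.getvalue())
print(ascii_encode_error(u"\xf4"))
```

With a real file, the same function would be used with a stream opened in binary mode, e.g. `open("entry.txt", "wb")`.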

I don’t know what you mean when you say the DirectEntry field contains “hieroglyphs”; do you mean you want to type in Cyrillic and you see nonsense instead? That suggests you have loaded a font that does not contain Cyrillic characters.

David

ninth · August 30, 2011, 4:31am

I mean this

My test code

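# -*- coding: utf-8 -*-
from panda3d.core import *
from direct.gui.DirectGui import *
import direct.directbase.DirectStart  # opens the window and provides loader and run()

def test(text):
    # DirectEntry passes the entered text to `command` when Enter is pressed;
    # print the code point of every character in it
    print([ord(c) for c in text])

font = loader.loadFont('arial.ttf')  # a font that contains Cyrillic glyphs
e = DirectEntry(entryFont=font, scale=0.07, command=test)
run()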

ninth · August 30, 2011, 4:39am

In addition, if I set initialText = u'ЫЫЫ', it is displayed correctly on XP.

drwr · August 30, 2011, 5:11am

It seems you don’t have an encoding problem, you have a typing problem. The Unicode characters 192, 193, 194, 195, and 196 are correctly rendered as À, Á, Â, Ã, and Ä, respectively. So the DirectEntry appears to actually contain these five characters, and not the five Cyrillic letters you think it should contain.
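A quick way to check what an entry actually contains, independent of the font, is to look up the Unicode name of each code point. A small sketch using Python's standard `unicodedata` module (Python 3; the helper name `describe` is hypothetical), applied to the codes printed by the test program:

```python
import unicodedata


def describe(codes):
    """Map each code point to its official Unicode character name."""
    return [unicodedata.name(chr(c)) for c in codes]


for name in describe([192, 193, 194, 195, 196]):
    print(name)
```

The first name printed is LATIN CAPITAL LETTER A WITH GRAVE; the Cyrillic letter А would instead be U+0410, CYRILLIC CAPITAL LETTER A.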

The question then is why the DirectEntry contains these characters instead of the ones you think it should. How did you type these characters?

David

ninth · August 30, 2011, 6:04am

АБВГД have the codes 192, 193, 194, 195, 196 in cp1251.
I enter the characters from the keyboard into the DirectEntry input field as usual.
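This is easy to confirm with Python's built-in codecs: encoding the five Cyrillic letters with cp1251 gives exactly the values that ended up in the entry, while their Unicode code points are quite different (sketch, Python 3):

```python
letters = u"\u0410\u0411\u0412\u0413\u0414"  # АБВГД
cp1251_codes = list(letters.encode("cp1251"))
unicode_codes = [ord(c) for c in letters]
print(cp1251_codes)   # [192, 193, 194, 195, 196]
print(unicode_codes)  # [1040, 1041, 1042, 1043, 1044]
```

So a correctly behaving entry should report code points 1040 to 1044 for these letters, not 192 to 196.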

drwr · August 30, 2011, 6:12am

Interesting, so it appears that Windows has sent the codes 192 through 196 to Panda (presumably via WM_CHAR) when you typed, and Panda interpreted them as Unicode characters. This is not altogether surprising, since Panda knows nothing about cp1251. What’s surprising is that Windows sent the character codes using this codec in the first place. I’ve never seen anything other than Unicode sequences coming in through WM_CHAR, but I guess I’ve been naive.

I’m also a bit surprised that no one else has reported this odd behavior before.

Edit: checking the MSDN docs right now, it clearly specifies that WM_CHAR is supposed to send the character codes in UTF-16, which means they should be Unicode, not cp1251. So either something’s very wrong in the way Windows is generating these keycodes in your case, or somehow it’s not going through WM_CHAR. But looking at the code, it appears that WM_CHAR is the only way to populate the DirectEntry (except for control-V pasting or the IME), so I’m completely baffled.

Edit: Never mind, after further research via Google, I now understand that the MSDN docs are misleading here. WM_CHAR will send Unicode to a Unicode window, but ANSI to an ANSI window. And Panda is naively creating an ANSI window, since that’s what you get by default when you call RegisterClass(). So the short answer is, there’s an easy fix.
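The mechanism can be modeled in a few lines. This is a hypothetical simulation in Python 3, not Panda's or Windows' code: `wm_char_wparam` stands in for the character code Windows delivers with WM_CHAR, assuming cp1251 as the ANSI code page (as on the reporter's XP machine) and a character from the Basic Multilingual Plane. A Unicode window receives the UTF-16 code unit, an ANSI window receives the code-page byte, and a receiver that treats the value as a code point gets the wrong letter in the ANSI case:

```python
def wm_char_wparam(ch, unicode_window, ansi_codepage="cp1251"):
    """Hypothetical model of the character code delivered with WM_CHAR.

    Assumes `ch` is a single BMP character that exists in `ansi_codepage`.
    """
    if unicode_window:
        return ord(ch)  # UTF-16 code unit equals the code point in the BMP
    return ch.encode(ansi_codepage)[0]  # single-byte ANSI code


def naive_receiver(wparam):
    # Treats the value as a Unicode code point, as the bug report describes.
    return chr(wparam)


typed = u"\u0410"  # Cyrillic А
print(naive_receiver(wm_char_wparam(typed, unicode_window=False)))  # À (U+00C0)
print(naive_receiver(wm_char_wparam(typed, unicode_window=True)))   # А (U+0410)
```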

David

ninth · August 30, 2011, 6:52am

I think the same problem was discussed here: “Is DirectEntry support Asian languages ?”

Hm… I just tried it on Win 7 x64, and it seems that on Win 7 DirectEntry works correctly for me.

drwr · August 30, 2011, 1:49pm

Hmm, you’re right, perhaps the OP on that other thread was indeed experiencing this same problem, and I failed to understand it correctly.

Yes, it seems that the behavior of WM_CHAR is quite inconsistent across Windows platforms, which is no doubt a big part of why this has escaped my notice before. Even the MSDN docs don’t describe the behavior accurately.

David

drwr · August 30, 2011, 3:58pm

OK, I’ve just checked in what should be a suitable fix. After the buildbot server has had a chance to pick this recent change up, please try the buildbot release and let me know if it solves the problem.

Thanks for helping discover this bug!

David

ninth · September 1, 2011, 7:37am

Everything works fine, thanks. The problem is solved.