Under Linux I receive text from DirectEntry in UTF-8 and all is good, but under Windows it seems to be in cp1251 (Cyrillic). However, if I try text.decode('cp1251') it causes a codec error like this:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf4' in position 0: ordinal not in range(128)
I used this hack:
for c in txt: chr(ord(c)).decode('cp1251')
and got the characters I needed. But the DirectEntry input field still shows “hieroglyphs”.
DirectEntry[‘get’] will normally return a Unicode object, not a string object, if the user’s text contains non-ASCII characters. (You can change this default behavior with config variables if you require something different.)
This means you shouldn’t try to decode it; it is already decoded. Depending on what you do with it (for instance, if you write it directly to a file), Python may automatically re-encode it according to your default Python encoding, whatever that is.
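For anyone wondering why a decode call raised an encode error: in Python 2, calling .decode() on a Unicode object first encodes it implicitly with the default ASCII codec, and that hidden step is what fails on non-ASCII text. The sketch below mimics that sequence in Python 3 syntax (where str no longer has .decode() at all); decode_like_python2 is a hypothetical helper name used only for illustration, not part of Panda3D or Python.

```python
def decode_like_python2(text, encoding):
    """Mimic Python 2's behavior for unicode_obj.decode(encoding).

    Hypothetical helper: Python 2 silently encodes the Unicode object
    to ASCII before decoding, so non-ASCII input fails at the first step.
    """
    raw = text.encode('ascii')   # the implicit step that raises
    return raw.decode(encoding)


try:
    decode_like_python2('\xf4', 'cp1251')
    error = None
except UnicodeEncodeError as exc:
    # Same error class as in the report above: the 'ascii' codec
    # cannot encode a character outside range(128).
    error = exc

print(type(error).__name__)
```

Pure-ASCII input passes through both steps unchanged, which is why the problem only appears once non-ASCII characters are typed.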
I don’t know what you mean when you say the DirectEntry field contains “hieroglyphs”; do you mean you want to type in Cyrillic and you see nonsense instead? That suggests you have loaded a font that does not contain Cyrillic characters.
It seems you don’t have an encoding problem, you have a typing problem. The Unicode characters 192, 193, 194, 195, and 196 are correctly rendered as À, Á, Â, Ã, and Ä, respectively. So the DirectEntry appears to actually contain these five characters, and not the five Cyrillic letters you think it should contain.
The question then is why the DirectEntry contains these characters instead of the ones you think it should. How did you type these characters?
Interesting, so it appears that Windows has sent the codes 192 through 196 to Panda (presumably via WM_CHAR) when you typed, and Panda interpreted them as Unicode characters. This is not altogether surprising, since Panda knows nothing about cp1251. What’s surprising is that Windows sent the character codes using this codec in the first place. I’ve never seen anything other than Unicode sequences coming in through WM_CHAR, but I guess I’ve been naive.
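To make the mismatch concrete, here is a small sketch (Python 3 syntax) of what happens when cp1251 byte values are taken as Unicode code points, and how the intended letters can be recovered afterwards. It assumes the incoming codes really are cp1251 bytes; reinterpret_as_cp1251 is a hypothetical helper, and this is only a workaround on the application side, not a fix for the windowing code.

```python
def reinterpret_as_cp1251(text):
    """Recover text whose cp1251 byte values were taken as code points.

    Hypothetical helper. Assumes every character is below U+0100, so it
    maps back to a single byte via Latin-1 before decoding as cp1251.
    """
    raw = text.encode('latin-1')   # code points 0..255 -> identical bytes
    return raw.decode('cp1251')    # bytes -> the intended Cyrillic text


# Codes 192 through 196 interpreted directly as Unicode code points.
garbled = ''.join(chr(code) for code in range(192, 197))

# The same five values read as cp1251 bytes instead.
repaired = reinterpret_as_cp1251(garbled)

print(garbled)    # accented Latin capital A variants
print(repaired)   # the first five capital Cyrillic letters
```

This is essentially the chr(ord(c)).decode('cp1251') hack from the first post applied to the whole string at once. It fails with UnicodeEncodeError if the text already contains characters above U+00FF.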
I’m also a bit surprised that no one else has reported this odd behavior before.
Edit: checking the MSDN docs right now, they clearly specify that WM_CHAR is supposed to send the character codes in UTF-16, which means they should be Unicode, not cp1251. So either something’s very wrong in the way Windows is generating these keycodes in your case, or somehow it’s not going through WM_CHAR. But looking at the code, it appears that WM_CHAR is the only way to populate the DirectEntry (except for Ctrl-V pasting or the IME), so I’m completely baffled.
Edit: Never mind. After further research via Google, I now understand that the MSDN docs are misleading here. WM_CHAR will send Unicode to a Unicode window, but ANSI to an ANSI window. And Panda is naively creating an ANSI window, since that’s what you get by default when you call RegisterClass(). So the short answer is, there’s an easy fix.
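As a rough model of the difference (not actual Win32 code), the sketch below computes the character code each kind of window would receive for the same keystroke, assuming the system ANSI code page is cp1251. Both function names are hypothetical, and the Unicode variant only handles characters that fit in a single UTF-16 code unit.

```python
import struct


def wparam_for_ansi_window(char, code_page='cp1251'):
    """Hypothetical model: an ANSI window gets one code-page byte."""
    return char.encode(code_page)[0]


def wparam_for_unicode_window(char):
    """Hypothetical model: a Unicode window gets one UTF-16 code unit.

    Only valid for characters in the Basic Multilingual Plane.
    """
    (unit,) = struct.unpack('<H', char.encode('utf-16-le'))
    return unit


cyrillic_a = '\u0410'  # CYRILLIC CAPITAL LETTER A

ansi_code = wparam_for_ansi_window(cyrillic_a)
unicode_code = wparam_for_unicode_window(cyrillic_a)

print(ansi_code, unicode_code)
# Treating the ANSI byte as a code point yields a different character.
print(chr(ansi_code) == cyrillic_a)
```

Under this model the ANSI window receives 192 for the Cyrillic letter, which is exactly the value that ends up displayed as a Latin character when it is taken as a Unicode code point.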
Hmm, you’re right, perhaps the OP on that other thread was indeed experiencing this same problem, and I failed to understand it correctly.
Yes, it seems that the behavior of WM_CHAR is quite inconsistent across Windows platforms, which is no doubt a big part of why this has escaped my notice before. Even the MSDN docs don’t describe the behavior accurately.
OK, I’ve just checked in what should be a suitable fix. After the buildbot server has had a chance to pick this recent change up, please try the buildbot release and let me know if it solves the problem.