PNMImage byte-string encoding

Thaumaturge · October 27, 2019, 1:18am

I’m currently trying to save and load data from a PNMImage into a file that contains other data. As per usual, I’m doing this via my “GameSaver” module–but I’m hitting a problem.

From what I’ve gathered, the problem seems to be something like this:

As part of handling strings, GameSaver converts them via a call to “encode('unicode_escape')”. It then later decodes them via “decode('utf-8')” to get a string-object again before writing that out.

It’s been a while since I implemented this, but if I recall correctly, it was done so that strings could be saved without having to worry about special characters. And indeed, for normal strings, it seems to work as intended.

The problem comes when I try to save something that is already a “bytes” object–in particular, the “bytes” object acquired from the “getData()” method of a StringStream that has just been written to by PNMImage. This goes through the system, hits the call to “decode”, and trips over it with the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte

After a bit of searching around, I tried just catching and ignoring the error. The “write” method of a file-object won’t accept a “bytes” object, however, and simply casting it to a “str” results in something that isn’t successfully loaded by a new PNMImage (reading it via a StringStream).

I’m really not sure of what to do here. Any suggestions?

(I could probably just save the PNMImage to a separate file, but I’d rather not do that in this case.)
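
For reference, the failure described above can be reproduced without Panda3D. This is a minimal sketch, assuming the stream holds PNG data (an assumption, though a byte 0x89 at position 0 matches the first byte of the PNG signature). The variable names are illustrative only:

```python
# Ordinary text survives the encode/decode pair described above.
text = "caf\u00e9\n"
escaped = text.encode("unicode_escape")      # pure-ASCII bytes
round_tripped = escaped.decode("utf-8")      # a str again, escapes intact

# Assumption: the image was written as PNG, whose signature starts with 0x89.
png_signature = b"\x89PNG\r\n\x1a\n"

# A bytes object has no encode() step to go through, so it reaches decode()
# as raw binary, and 0x89 is not a valid first byte of a UTF-8 sequence.
try:
    png_signature.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as err:
    decode_error = err

print(decode_error)
```

The point is that arbitrary binary data is generally not valid UTF-8, so it needs its own text-safe encoding before it can pass through a string-based pipeline.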

rdb · October 27, 2019, 7:22am

If you’re trying to write a mixture of text and binary data, I would just open the file in binary mode to begin with and do the decoding/encoding of any text manually.
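
A minimal sketch of the binary-mode approach, assuming a simple length-prefixed layout. `write_chunk` and `read_chunk` are hypothetical helpers, not part of GameSaver or Panda3D, and `io.BytesIO` stands in for a file opened in binary mode:

```python
import io
import struct

def write_chunk(f, payload: bytes) -> None:
    # 4-byte little-endian length, followed by the raw bytes.
    f.write(struct.pack("<I", len(payload)))
    f.write(payload)

def read_chunk(f) -> bytes:
    # Read the length prefix, then exactly that many bytes.
    (length,) = struct.unpack("<I", f.read(4))
    return f.read(length)

# io.BytesIO stands in for open(path, "wb") / open(path, "rb").
buffer = io.BytesIO()
write_chunk(buffer, "player name: Zo\u00eb".encode("utf-8"))  # text, encoded manually
write_chunk(buffer, b"\x89PNG\r\n\x1a\n")                     # raw binary data

buffer.seek(0)
name = read_chunk(buffer).decode("utf-8")  # text, decoded manually
image_bytes = read_chunk(buffer)           # binary data, left untouched
```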

If you want to keep it in a text format, then I would suggest encoding the binary data using base64 or similar.
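
A minimal sketch of the base64 route, using only the standard library. `bytes_to_text` and `text_to_bytes` are hypothetical helper names, and the short byte string is a stand-in for whatever “getData()” returns:

```python
import base64

def bytes_to_text(data: bytes) -> str:
    # base64 output is pure ASCII, so it can be stored as ordinary text.
    return base64.b64encode(data).decode("ascii")

def text_to_bytes(text: str) -> bytes:
    # Inverse of bytes_to_text.
    return base64.b64decode(text)

image_bytes = b"\x89PNG\r\n\x1a\n"   # stand-in for the image data
as_text = bytes_to_text(image_bytes)
restored = text_to_bytes(as_text)
```

The resulting string contains only letters, digits, "+", "/" and "=", so it should pass through a text-based save format without needing any further escaping.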

Thaumaturge · October 27, 2019, 2:04pm

I really don’t want to mess too much with the way that I handle files in general–aside from this issue, it’s proven to be a stable and reliable system thus far, and very useful as a result. So, I suppose that I’m looking into base64 encoding, then!

Thank you for the suggestions.

Thaumaturge · October 28, 2019, 10:51pm

Ah, I think that it might be simpler than that! I don’t know whether the following will prove stable in the long run, but it seems to work for now:

When saving, I pretty much just convert the bytes-object to a string. This results in a thing that starts with the identifying “b” of a bytes-object within the string itself. When I load the object, I can just “eval” that string, and it seems, for now at least, to produce the original bytes-string!
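
A minimal sketch of that round trip. Note that it swaps in `ast.literal_eval` for “eval”, which is a substitution and not what is described above: `literal_eval` only accepts Python literals, so it can parse the textual form of a bytes object without executing arbitrary code. The byte string is an illustrative stand-in:

```python
import ast

image_bytes = b"\x89PNG\r\n\x1a\n"    # stand-in for the image data

# Saving: the textual form starts with the identifying "b" prefix.
saved = repr(image_bytes)

# Loading: parse the literal back into a bytes object.
restored = ast.literal_eval(saved)

# Anything that is not a plain literal is rejected rather than run.
try:
    ast.literal_eval("__import__('os')")
    rejected = False
except ValueError:
    rejected = True
```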

rdb · October 29, 2019, 12:11pm

I would really not use eval(); it allows people to put arbitrary code in your save file, which creates an attack vector that could be a security risk if these files are shared or synced with some cloud service or something of the sort. Even if that were no concern, it feels to me like using a sledgehammer to hammer in a nail.

If you want to do the equivalent of that, you can use codecs.escape_encode, which encodes things similar to how repr(bytes) does it, and you can use codecs.escape_decode to do the inverse.

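import codecs

# Saving: escape the bytes, then write them out as ASCII text.
escaped, _ = codecs.escape_encode(bytedata)
file.write(escaped.decode('ascii'))

# Loading: read the escaped text back and undo the escaping.
escaped = file.read(...)
bytedata, _ = codecs.escape_decode(escaped)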

That said, I am not sure that this is more compact or elegant than using base64.
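
To make the compactness question concrete, here is a rough sketch that round-trips one sample through both encodings and prints the sizes. The sample (every byte value once) is an arbitrary choice, and as far as I know the two `codecs.escape_*` functions are not covered in the `codecs` module documentation, so treat their behaviour as something to verify on your Python version:

```python
import base64
import codecs

data = bytes(range(256))  # arbitrary sample: every byte value once

# Escape-style encoding and its inverse.
escaped, _ = codecs.escape_encode(data)
unescaped, _ = codecs.escape_decode(escaped)

# base64 encoding of the same data.
b64 = base64.b64encode(data)

# Original size, escaped size, base64 size.
print(len(data), len(escaped), len(b64))
```

For data dominated by non-printable bytes, the escaped form tends to be the longer of the two, since each such byte becomes a four-character \xNN sequence, whereas base64 expands everything by a fixed factor of about 4/3.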

Thaumaturge · October 29, 2019, 1:50pm

That is a concern–but it’s a concern that the saving-and-loading module as a whole contains: I use similar means to restore objects in general.

Thus far I don’t think that I’ve found a better way to restore arbitrary objects from strings, and having a saving-and-loading system that just works (this “bytes” issue notwithstanding), with next to no worries about file-format or initial setup, has proven incredibly valuable.

I do have one idea that might allow me to circumvent the requirement for “eval” and “exec” in general usage (“bytes” objects might still be an issue…), however, so I may give that some further thought…

[edit] It does look like that would make the module rather more unwieldy to use, however, which is a serious pity. :/