Modules to Dicts: Safe, but efficient...?

Thaumaturge · December 11, 2020, 9:08am

A concern of mine for a while now has been the closing of some security issues in my game, primarily by virtue of removing usages of “exec” and “eval” that seemed problematic.

Now only one system remains as a significant security issue, I think: The localisation/text-string system.

As things are currently implemented, each group of text-strings within my game is a Python module, with each individual text-string being a variable within that module.

So, for example, the text-strings specific to the first level might be stored in a module named “level1”, and a text-string with id “someObject” might then be accessed as “level1.someObject”.

In order to gain access to a given group of text-strings, the relevant Python module is dynamically imported.

Now, this incurs a security risk: since the system dynamically imports these text-string modules, it would likely be possible to include malicious code in a module, causing it to be run on importation.

Furthermore, when a text-string is accessed, the code to this end (e.g. “level1.someObject”, as above) is dynamically generated–after all, the underlying system doesn’t know what modules or objects might exist. This process then uses “eval” in order to run that dynamically-generated code, once again incurring a security risk.

Now, I have an idea as to how I might rework this to remove the security issues: I currently have it in mind to essentially replace the “module-and-variable” approach with a “dictionary-of-dictionaries” approach. What was the module-name would become the first dictionary-key, and what was the variable-name would become the second dictionary key.

As far as I see, this should function, and should be safe.
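For illustration, the dictionary-of-dictionaries idea might look something like the sketch below. The names `string_dict` and `get_string` and the sample strings are made up for this example; they are not taken from the game's code.

```python
# Hypothetical data: the outer key replaces the module name,
# the inner key replaces the variable name.
string_dict = {
    "level1": {
        "someObject": "A curious object.",
        "door": "A heavy wooden door.",
    },
    "menu": {
        "start": "Start Game",
    },
}


def get_string(group, string_id, default=""):
    """Return the text-string for group/string_id, or default if either key is missing."""
    return string_dict.get(group, {}).get(string_id, default)


print(get_string("level1", "someObject"))
print(get_string("level1", "noSuchString", default="<missing>"))
```

Because the lookup is plain data access, no "eval" or dynamic import is involved, and a missing id falls back to a default rather than raising an error.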

However, what I don’t know is whether it will perform well, or have other issues that I’m not aware of.

So, before I go and rework my localisation/text-string system in this way, I’d like to ask here: is this “dictionary-of-dictionaries” approach a good idea? Are there likely to be performance issues associated with it? Are there any other issues that it might be a good idea for me to be aware of?

For reference:

My game accesses these text-strings fairly often, and does so mid-gameplay.
- As a result it would be problematic to have the program hitch noticeably while accessing a text-string, I fear.
Based on a quick parsing, I think that I have about 1400 text-strings thus far, and the game is still quite unfinished.
- The final count could add another five-hundred to a thousand strings, at a guess.
- Furthermore, I’d ideally like this system to be somewhat future-proof, with the potential to be used in subsequent games of unknown size.

rdb · December 11, 2020, 11:37am

Why not just use JSON (or, if you prefer, TOML) files to store your text strings, instead of Python modules? There’s no risk of code execution that way.

You could also use gettext with .po files, which is a fairly standard format for localisation.

I don’t think you’ll notice a meaningful performance difference with dictionary-of-dictionary accesses, for what it’s worth, unless you were accessing these strings many thousands of times within a single frame.
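One rough way to check this on a given machine is to time a nested lookup with the standard timeit module. The sketch below assumes made-up data of about the size mentioned above (2500 strings in 25 groups); the resulting figure depends on the machine, so none is quoted here.

```python
import timeit

# Hypothetical data roughly matching the sizes mentioned in the thread:
# 25 groups of 100 strings each, 2500 strings in total.
string_dict = {
    f"level{g}": {f"string{i}": f"text {g}-{i}" for i in range(100)}
    for g in range(25)
}


def lookup():
    """One dictionary-of-dictionaries access, as the game would perform it."""
    return string_dict["level12"]["string57"]


runs = 100_000
total_seconds = timeit.timeit(lookup, number=runs)
per_lookup = total_seconds / runs
print(f"{per_lookup * 1e9:.0f} ns per lookup (machine-dependent)")
```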

Thaumaturge · December 11, 2020, 11:44am

Honestly, I’m just not familiar with those tools.

(Although I have the impression that JSON comes with some XML-style tagging, which I don’t really want. I’d prefer a less-cluttered approach.)

That said, I’m open to being convinced!

My thought in moving to the dictionary-of-dictionaries approach is to convert my current Python-files into simple text-files, and to then read and parse them instead of importing them.

Something like:

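newDict = {}

# The file contents arrive as bytes; decode them first (UTF-8 is assumed here)
text = vfs.readFile(fileName).decode("utf-8")

for line in text.splitlines():
    if len(line) > 0:
        # Split on the first "=" only, so a value may itself contain "="
        key, sep, val = line.partition("=")
        newDict[key] = val

self.stringDict[fileName] = newDict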

As for performance: okay, that’s reassuring, thank you!

(I might access the dictionary once per frame in certain cases, but it’s unlikely to be much more than that.)

rdb · December 11, 2020, 11:59am

JSON is a very commonly understood and used format. Its syntax looks like this (quite similar to Python dictionaries):

{
    "key": "value",
    "mydict": {
        "key1": "value1",
        "key2": "value2"
    }
}

A JSON parser is built into Python. You just need to feed the file’s text into json.loads(...) and you have the strings safely in a Python dictionary.
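A minimal sketch of that, assuming the JSON text has already been read from a file (the contents of `json_text` here are invented for the example):

```python
import json

# Hypothetical file contents; in the game this text would be read from disk.
json_text = """
{
    "level1": {
        "someObject": "A curious object.",
        "door": "A heavy wooden door."
    }
}
"""

strings = json.loads(json_text)
print(strings["level1"]["someObject"])

# Input that is not valid JSON raises an exception rather than being executed.
try:
    json.loads("__import__('os').system('echo unsafe')")
except json.JSONDecodeError as error:
    print("rejected:", error.msg)
```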

TOML syntax looks like this, and may be a little easier to edit:

key = "value"

[mydict]
key1 = "value1"
key2 = "value2"

I don’t think you should reinvent the wheel. For the format you have in mind, you might as well use TOML or plain old INI style rather than roll your own data format and parser. INI is similar to TOML but untyped, so it doesn’t require quotation marks, and Python’s built-in configparser module (named ConfigParser in Python 2) can handle it.
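As a sketch of the INI route, the standard configparser module can turn sections into a dictionary of dictionaries. The section names and strings below are invented, and the two settings (no interpolation, case-preserving keys) are choices made for this example, not requirements.

```python
import configparser

# Hypothetical file contents; in the game this text would be read from disk.
ini_text = """
[level1]
someObject = A curious object.
door = A heavy wooden door.

[menu]
start = Start Game
"""

# interpolation=None keeps any "%" characters in the text literal.
parser = configparser.ConfigParser(interpolation=None)
# Option names are lower-cased by default; keep them as written instead.
parser.optionxform = str
parser.read_string(ini_text)

# One inner dictionary per section, keyed by section name.
string_dict = {section: dict(parser[section]) for section in parser.sections()}
print(string_dict["level1"]["someObject"])
```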

Thaumaturge · December 11, 2020, 12:05pm

Hmm… Okay, both of those look less cumbersome than I had expected. I’ll look into TOML in particular, with JSON as a fallback, I think.

(Using inverted commas is actually not entirely a bad thing in this case: since my current data is all in Python strings, and thus uses inverted commas, having the target format also use them means that there’s less translation to do. Indeed, the format is very similar to what I have already, save for the square-bracketed dictionary-tags.)

Thank you for so arguing!

rdb · December 11, 2020, 12:19pm

Both the toml and json modules support writing, so you could probably without much effort automatically convert your existing strings to those formats.
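A one-off conversion along those lines might look like the sketch below. It uses json.dumps so that it runs without extra packages, and it builds a stand-in module in memory; `module_to_dict` and the sample strings are hypothetical. With the third-party toml module installed, toml.dumps(all_strings) should produce the TOML equivalent.

```python
import json
import types

# Stand-in for one of the existing text-string modules (hypothetical contents).
level1 = types.ModuleType("level1")
level1.someObject = "A curious object."
level1.door = "A heavy wooden door."


def module_to_dict(module):
    """Collect the public string variables of a module into a plain dict."""
    return {
        name: value
        for name, value in vars(module).items()
        if not name.startswith("_") and isinstance(value, str)
    }


# The module name becomes the outer key, the variable names the inner keys.
all_strings = {level1.__name__: module_to_dict(level1)}
json_text = json.dumps(all_strings, indent=4, ensure_ascii=False)
print(json_text)
```

Note that the conversion itself still imports the existing modules, so it should only be run once, on files that are already trusted.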

Thaumaturge · December 11, 2020, 12:20pm

That is encouraging! I’ll likely look into doing that then, thank you.

Thaumaturge · December 16, 2020, 10:57am

Can I just check, please:

When you spoke of “the toml module”, was the following the one that you meant?

As installed under Ubuntu like so:

$ sudo apt install python3-toml
$ pip3 install toml

Is that correct?

And finally, when building a distributable version of a program, what, if anything, do I add to my requirements.txt and/or setup.py? Just “toml” in the requirements.txt file?

To clarify, it looks like there isn’t a built-in Python module for this purpose, and I want to check that I’m using the correct one, and that it will be included in my distributables. ^^;

rdb · December 16, 2020, 1:43pm

Yes, that’s the module. You just need to add toml to your requirements.txt when building, and deploy-ng will know what to do.
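For reference, loading a TOML string into a dictionary might look like the sketch below. It falls back to the third-party toml module discussed above, but first tries tomllib, a read-only TOML parser that was added to the standard library in Python 3.11 (after this thread was written). The alias `toml_reader` and the sample text are made up for the example.

```python
try:
    import tomllib as toml_reader  # standard library from Python 3.11, read-only
except ImportError:
    import toml as toml_reader  # the third-party module discussed in the thread

# Hypothetical file contents; in the game this text would be read from disk.
toml_text = """
[level1]
someObject = "A curious object."
door = "A heavy wooden door."
"""

# Each [table] becomes an inner dictionary keyed by the table name.
string_dict = toml_reader.loads(toml_text)
print(string_dict["level1"]["someObject"])
```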

Thaumaturge · December 16, 2020, 2:50pm

Got it, excellent, and thanks!