Recommending a TOML Parser

Thaumaturge · September 23, 2024, 3:25pm

So, for my current project, I’m using TOML to manage strings. To this end, I have TOML installed via pip, which seems to specifically be this parser.

And for the most part, it seems to work well!

However, it also has an issue in which it seems to duplicate lone or escaped backslashes, which is interfering with my attempts to incorporate TextProperties-tags in my strings.

And what’s more, looking around the above-linked GitHub page, it doesn’t look like it’s very actively maintained. (For example, relevant to this issue, there’s already an issue there covering more or less the same problem. This was posted back in June, as I recall, and doesn’t seem to have received a response from the devs.)

I was about to post an issue on our own GitHub page requesting an alternative set of TextProperties-tags, when I discovered that there are in fact other TOML-parsers out there.

Hence this thread: Can someone recommend a good TOML-parser, please? One that has solid support for TOML, and that works on both Ubuntu Linux (and ideally other flavours of Linux) and Windows.

rdb · September 23, 2024, 4:06pm

Python 3.11 and onward have tomllib as part of the standard library.

If you can’t use Python 3.11, try tomli. The pip tool itself switched from toml to tomli at some point.

If you need round-trippability, you could check out tomlkit.

Thaumaturge · September 23, 2024, 4:24pm

Ah, wonderful, thank you!

I saw mention of tomllib in Python 3.11, but I wasn’t confident of how available that is just yet: whether it’s available for my version of Ubuntu, and how easy it would be to set things up such that it’s what the project uses, and whether it would be the version incorporated into a build, etc. I’m inclined to prefer waiting for my system Python version to be updated with time, and Panda’s version likewise…

Hmm… I don’t think that I require round-trippability. (If I’m understanding correctly what that is: that what comes out of TOML’s interpretation can be re-parsed to produce the original input-text, complete with comments, etc.)

Thus I think that tomli might be the best bet for me right now.

rdb · September 23, 2024, 6:01pm

For what it’s worth, tomli and tomllib seem to have an identical API, so it should be trivial to switch from tomli to tomllib down the line, once Python 3.11 becomes ubiquitous.
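
Because the two modules share the same load()/loads() functions, one common way to prepare for that switch is an import fallback. The following is a minimal sketch, not code from anyone's project in this thread: the EXAMPLE_TOML table and the load_strings helper are hypothetical names used only for illustration.

```python
# Prefer the standard-library parser (Python 3.11+); otherwise fall back to
# the third-party tomli package, which exposes the same load()/loads() API.
try:
    import tomllib
except ModuleNotFoundError:
    import tomli as tomllib

# Hypothetical string table, purely for illustration.
EXAMPLE_TOML = """
[menu]
title = "Main Menu"
quit = "Quit"
"""


def load_strings(text):
    """Parse TOML text and return the result as a plain dict."""
    return tomllib.loads(text)


strings = load_strings(EXAMPLE_TOML)
print(strings["menu"]["title"])
```

When reading from a file rather than a string, both modules expect the file to be opened in binary mode, e.g. open(path, "rb"), before it is passed to load().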

Thaumaturge · September 23, 2024, 6:22pm

Ah, that’s excellent news! All the more reason to go with tomli for now!

[edit]
*sigh* It looks like tomli also duplicates backslashes. :/

I’ve opened an issue on their GitHub page, and it looks like they’re more responsive than the devs of toml, so we’ll see what happens…

[edit 2]
([edit 4] You can ignore edit 2 for now, as the discussion is perhaps better placed in the more-relevant thread that I’ve created.[/edit 4])

I wonder whether it wouldn’t be worth considering changing the tags used by TextProperties–or at least adding an alternate set.

Thinking about it, backslashes tend to be treated as special cases by text-parsers, escaping various characters. As such, even if the conflict here, with TOML, were resolved, another may yet crop up in the future.

And it seems to me that TextProperties likely doesn’t require escaping, as such–just a set of clear tags.

So maybe something like markup–something like [textProp]<property name>[textProp]<text>[/textProp] in place of the current \1<property name>\1<text>\2 system?

[edit 3]
Okay, looks like I was wrong! As pointed out in the tomli GitHub issue, the “double backslashes” seem to have been an artefact of the way that backslashes in strings are printed–even within the debugger, to my annoyance.
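
For anyone who runs into the same confusion, the effect can be reproduced in plain Python with no TOML involved. This is a small sketch using a made-up string: repr() (which is what the interactive prompt and many debuggers display) shows a Python literal with the backslash escaped, while print() and len() reflect the characters the string actually holds.

```python
# In Python source, "\\" is the escape for ONE backslash character.
text = "a\\b"

print(text)        # the string's characters as they really are
print(repr(text))  # a Python literal form, with the backslash escaped
print(len(text))   # the number of actual characters

# Counting backslashes directly avoids being misled by the display form.
backslashes = text.count("\\")
print(backslashes)
```

Checking len() or count() on the parsed value is a quick way to tell a display artefact apart from a genuinely doubled character.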

… And yet, somehow, the TextProperties markup is still not working. Or rather, it works just as expected if I don’t derive the string in question from TOML–just using a plain hard-coded string–but breaks when the string is derived from TOML.

(Which doesn’t mean that the issue lies with TOML, of course–there are a few moving parts in the system that uses it.)
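
One thing that may be worth checking, offered as a guess and not as a diagnosis confirmed anywhere in this thread: Python and TOML read \1 differently. In an ordinary Python string literal, \1 and \2 are octal escapes for the control characters U+0001 and U+0002. TOML has no such escape. In a TOML literal (single-quoted) string the backslash and digit are kept as two separate characters, and in a TOML basic (double-quoted) string the control characters have to be spelled \u0001 and \u0002. The sketch below assumes the tags are meant to be those control characters, and the "red" property name and "Hello" text are placeholders.

```python
try:
    import tomllib
except ModuleNotFoundError:
    import tomli as tomllib  # same API on Python versions before 3.11

# Python literal: \1 and \2 are octal escapes, i.e. U+0001 and U+0002.
hard_coded = "\1red\1Hello\2"

# TOML literal string (single quotes): backslashes are kept verbatim.
literal_doc = r"""text = '\1red\1Hello\2'"""
# TOML basic string (double quotes): control characters written as \uXXXX.
escaped_doc = r'''text = "\u0001red\u0001Hello\u0002"'''
# TOML basic string using \1 directly: not an escape that TOML defines.
invalid_doc = r'''text = "\1red\1Hello\2"'''

from_literal = tomllib.loads(literal_doc)["text"]
from_escaped = tomllib.loads(escaped_doc)["text"]

try:
    tomllib.loads(invalid_doc)
    invalid_rejected = False
except tomllib.TOMLDecodeError:
    invalid_rejected = True

print(from_escaped == hard_coded)   # same control characters?
print(from_literal == hard_coded)   # backslash + digit instead?
print(invalid_rejected)
```

If the comparison against the hard-coded string fails for the value coming out of TOML, the difference is in the string's contents and not in the parser's handling of backslashes.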

More investigation on my end is called for then, it would seem!