Activity Spikes and Performance Degradation?

Thaumaturge · October 4, 2019, 5:38pm

I’ve had a report of performance issues after a few consecutive reloads of a save-game in A Door to the Mists.

Investigating on my end, I found no such effect. (I’ll confess that I haven’t yet tried consecutive reloads under Windows, thinking that the issue is platform-specific. [edit] I’ve now checked, and I don’t see it there either–on my machine, at least. [/edit]) However, I did notice via PStats that each reload (somewhat understandably) comes with a significant spike in activity times.

Could that activity be resulting in slowdown on some systems–perhaps by computer resources not being properly freed up?

(I’ve at least somewhat checked for objects not being cleaned up on my end. Aside from a handful of nodes that had been missed, I haven’t found anything–but the game is complex enough that it’s not impossible that I’ve missed something…)

serega-kkz · October 5, 2019, 2:47am

I can confirm that there are slowdowns when loading saves multiple times on Windows. Levels take longer and longer to load each time.

Thaumaturge · October 5, 2019, 3:03am

That’s weird–I’d love to know what’s causing it. :/

Thank you for the corroborating report, however!

serega-kkz · October 5, 2019, 3:20am

The problem may not be with memory at all, but with processor time. Perhaps the cause lies in scripts and tasks that are duplicated on each reload.

Thaumaturge · October 5, 2019, 3:23am

It’s possible–but then I would expect to see the same effect on my own machine. And yet on my machine, whether in Linux or Windows (Windows 8, at least; I haven’t tried another version), reloading multiple times seems to have no effect on the frame-rate. (Except for a brief dip immediately after the reload, of course, after which the frame-rate returns to pretty much the previous value.)

serega-kkz · October 5, 2019, 4:25am

Perhaps you have platform-based branching code.
This may occur when using:

sys.platform

rdb · October 5, 2019, 1:00pm

You could confirm with PStats whether the number of states, textures, vertex buffers, used video memory, or anything of the sort keeps increasing after load/save.

It could be that on some more resource-constrained hardware, such issues are noticed sooner, due to the card running out of memory, or something of the sort.

Thaumaturge · October 5, 2019, 2:36pm

I believe that I’m not doing anything like that–in my own code, at least; I don’t know whether Panda itself has any such.

That’s a good idea–it hadn’t occurred to me to check those numbers. (I’d forgotten that PStats offered them. ^^; )

I’ll do that and then edit this post with the results, I think.

(I did look at the node-hierarchy in DirectTools, as I recall–indeed, that’s how I found two very minor cases of nodes not being cleaned up. I didn’t see anything else accumulating–but I may well have missed something.)

It’s quite possible–although I would have thought that I’d at least notice it in the more-intensive parts of the game, or when performance is constrained by things like DirectTools.

[edit]

Okay, I’ve looked at the PStats graphs, and it does look like something is accumulating. :/

Here is what I saw in one experiment, over the course of a few reloads:

“Nodes” went from 282 to 288
- Not a huge increase, but an increase nevertheless
“Geoms”, however, held steady at 213
“RenderStates” started at 1002, then climbed quickly to 1282, then 1331, then dropped slightly to 1327.
- Odd that it held so steady at the end, but a significant increase nevertheless
“TransformStates” climbed, but only a little, from 2742 to 2777
“System memory” was similar, going from 1031 MB to 1135 MB
“Vertices”, oddly enough, held rock-steady at 3108 K
“State changes” climbed a little, from 228 to 242
“PipelineCyclers” rose dramatically, starting at 44001 and rising to 62832! 0_0
“Vertex Data (MB)” rose initially from 162 MB to 211 MB, then held steady there
“Graphics memory (MB)” was similar, rising initially from 702 MB to 777 MB, then holding there.
“Geom cache size” rose from 502 to 573.
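
To make the relative sizes of those jumps easier to compare, here is a small sketch in plain Python (no Panda3D required) that ranks the counters by percentage growth, using the first and last values listed above. The names `readings`, `relative_growth` and `rank_by_growth` are my own inventions for illustration; none of this is part of PStats.

```python
# First and last values of each PStats counter from the list above.
# The dictionary layout and function names are illustrative only.
readings = {
    "Nodes": (282, 288),
    "Geoms": (213, 213),
    "RenderStates": (1002, 1327),
    "TransformStates": (2742, 2777),
    "System memory (MB)": (1031, 1135),
    "Vertices (K)": (3108, 3108),
    "State changes": (228, 242),
    "PipelineCyclers": (44001, 62832),
    "Vertex Data (MB)": (162, 211),
    "Graphics memory (MB)": (702, 777),
    "Geom cache size": (502, 573),
}


def relative_growth(before, after):
    """Return the growth from before to after as a percentage of before."""
    return (after - before) / before * 100.0


def rank_by_growth(data):
    """Return (name, percent) pairs sorted from largest growth to smallest."""
    growth = [(name, relative_growth(b, a)) for name, (b, a) in data.items()]
    return sorted(growth, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, percent in rank_by_growth(readings):
        print(f"{name:22s} {percent:6.1f}%")
```

Ranked this way, "PipelineCyclers" comes out on top, with "RenderStates" and "Vertex Data" not far behind, which matches the impression given by the raw numbers.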

So, it seems that something is leaking, somewhere. :/

The question now is: what?

Looking at the data, there might be some models going un-cleaned-up, but it doesn’t look like a lot. If anything, I’m tempted to think that it’s more likely “logical” objects than graphical ones (that is, things like “GameObject” classes, etc.).

On the other hand, what is “PipelineCyclers”, and why is it rising so much more than anything else?

Regarding the graphical/scene-graph side, is there some reasonably-easy way to find nodes that aren’t being cleaned up? Or is it just a matter of combing through my code for anything that might have been missed…?
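
For the “logical” side, at least, one generic Python approach is to hold weak references to objects as they are created, and then see which ones are still alive after a level has been unloaded. What follows is only a minimal sketch of that idea: `LeakTracker` and `GameObject` are hypothetical names, and it assumes plain Python classes. Whether Panda’s own C++-backed classes can be weakly referenced in this way is something that I haven’t verified.

```python
import gc
import weakref


class LeakTracker:
    """Remembers objects weakly, so that tracking alone never keeps them alive."""

    def __init__(self):
        self._tracked = weakref.WeakSet()

    def track(self, obj):
        """Start watching obj and hand it back, so calls can be chained."""
        self._tracked.add(obj)
        return obj

    def survivors(self):
        """Run the garbage collector, then return the tracked objects that remain."""
        gc.collect()
        return list(self._tracked)


class GameObject:
    """Hypothetical stand-in for a game-side class."""

    def __init__(self, name):
        self.name = name


if __name__ == "__main__":
    tracker = LeakTracker()
    forgotten = []  # Simulates a stray reference that the cleanup code misses

    door = tracker.track(GameObject("door"))
    key = tracker.track(GameObject("key"))
    forgotten.append(key)

    # "Unload" the level: drop the references that we know about.
    del door, key

    for obj in tracker.survivors():
        print("Still alive:", obj.name)
```

Once a survivor has been found, the standard-library function `gc.get_referrers` can help to show what is still holding on to it.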

[edit 2]
Some of the answers in this thread seem to be proving useful in finding leaked objects!

(Spoiler: it looks like, amongst other things, I have some DirectGUI objects going un-cleaned-up… Those might be a pain to track down and fix! >_<)

However, a call to “MemoryUsage.getPointers(mup)” is crashing due to an AssertionError, despite my having “track-memory-usage 1” in my PRC file… :/

serega-kkz · October 5, 2019, 6:24pm

You can look up what it is in the source code, but you will need to find out why it is growing.

Thaumaturge · October 5, 2019, 6:27pm

Thank you for that.

Hmm… Looking at the main comment there, it may just be that it’s increasing because other things–vertex data, transform states, etc.–are increasing. But I’d like to hear from one of the engine devs on this, I think…

Thaumaturge · October 8, 2019, 7:08pm

Right, I think that I have this dealt with–or at least largely so! O_O

That proved quite difficult–but I’m glad that I did attend to it.

PStats values after about five reloads now remain pretty much steady–in my test some increased a little bit, a few actually decreased, and many remained surprisingly steady.

To elaborate:

The following increased:

Geom cache size: 480 -> 483
TransformStates: 2679 -> 2686
System memory: 863 MB -> 883 MB
State changes: 230 -> 242
PipelineCyclers: 44513 -> 44640

The following decreased:

RenderStates: 980 -> 972
Graphics memory: 704 MB -> 695 MB

The following remained the same:

Geoms: 211
Nodes: 277
Vertices: 3108 K
Vertex Data: 162 MB

At least some of the variation might come from shifts in camera perspective; if not that, then I’m mystified at the decreases in certain values.

I’m also rather glad to note that some of those values seem lower now than in the earlier test–presuming that I’m correct in thinking that both were tested in the same location, with the same save-file. I note in particular that system memory usage seems to have dropped by ~150 MB, and is now less than 1 GB.

eldee · October 8, 2019, 9:20pm

Out of curiosity, what was the root cause of the problem? I mean, are there some pitfalls (that could be common) that lead to the endless increase of PipelineCyclers, or was it entirely related to your game engine?

Thaumaturge · October 8, 2019, 10:19pm

I think that it was largely a matter of oversights on my part, of one sort or another.

If I recall correctly, I found a number of potential leaks over the course of the investigation. Some of them were simply objects that I’d missed cleaning up. Others were our old friends, the Python-tags.

The last issues that I cleared up were a little different:

As part of my menu-navigation code I set up some events, with their “extraArgs” holding references to the related navigation-objects.

Now, I did have code that was intended to clean these up. However, that code assumed that the navigation “map” from which the events in question were determined was unchanged when it came time to clean up. It thus simply iterated over the contents of the navigation “map” as it had during event-setup, clearing events based on that data.

However, in some cases the “map” could in fact have changed, leaving events and their “extraArgs” uncleared. As a result there were at the least still references to the navigation-objects.

I fixed it by simply keeping a separate list of event-names, stored as the events were specified. When it came time to clean up, the code then just iterated over that list instead.
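
As a rough illustration of that fix (not my actual code): the sketch below uses a tiny stand-in `EventHub` class that mimics the “accept”/“ignore” style of event handling, so that it can run without Panda3D. `EventHub`, `MenuNavigator`, and the event names are all hypothetical.

```python
class EventHub:
    """Minimal stand-in for an event system with accept/ignore.
    The stored extraArgs keep their objects alive until the event is ignored."""

    def __init__(self):
        self.handlers = {}

    def accept(self, event, method, extraArgs=()):
        self.handlers[event] = (method, list(extraArgs))

    def ignore(self, event):
        self.handlers.pop(event, None)


class MenuNavigator:
    """Hypothetical menu-navigation class illustrating the fix."""

    def __init__(self, hub, nav_map):
        self.hub = hub
        self.nav_map = nav_map    # event name -> navigation object; may change later
        self.event_names = []     # recorded at setup, used only for cleanup

    def setup_events(self):
        for event, nav_object in self.nav_map.items():
            self.hub.accept(event, self.on_navigate, extraArgs=[nav_object])
            self.event_names.append(event)

    def on_navigate(self, nav_object):
        print("Navigating to", nav_object)

    def cleanup(self):
        # Iterate over what was actually registered, not over the current map.
        for event in self.event_names:
            self.hub.ignore(event)
        self.event_names.clear()


if __name__ == "__main__":
    hub = EventHub()
    nav = MenuNavigator(hub, {"arrow_up": "button A", "arrow_down": "button B"})
    nav.setup_events()
    del nav.nav_map["arrow_up"]  # The map changes before cleanup...
    nav.cleanup()                # ...but every registered event is still cleared
    print("Handlers left:", hub.handlers)
```

Had `cleanup` iterated over `nav_map` instead, the “arrow_up” handler, and the object held in its “extraArgs”, would have been left behind.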

(I may be forgetting a few issues.)

… I still don’t know exactly what caused the PipelineCycler issue.

eldee · October 8, 2019, 10:30pm

Thank you for your detailed reply; it’s hands-on experience like that which helps to improve one’s code and knowledge of the Panda engine (and it reminds me that I should double-check my usage of the Python-tags).

Not so long ago, I discovered a seemingly benign construction that cost my app a few ms per frame! To fade labels with distance, I was calling setTextColor() with the label color multiplied by the fading coefficient. That works, but it causes the text to be regenerated every frame. I’ve switched to calling set_color_scale on the node path instead, using the fading coefficient as the scale, and the text is no longer regenerated.

Thaumaturge · October 8, 2019, 10:33pm

It’s my pleasure! And I’m glad if it’s helpful at all.

And yeah, there’s always something new to learn, it does seem!

Ooh, ouch! I can see that being a tricky one to spot! How did you come to realise that it was a problem come to that, if I may ask?

(I’m glad to say that I generally use “setColorScale”/“setAlphaScale” as a matter of habit.)

eldee · October 8, 2019, 10:38pm

PStats FTW! I noticed that the ‘*’ task was suddenly taking a lot of time when the labels were displayed and, diving into it, I saw that the ‘Generate text time’ activity was the culprit. Then it was just a matter of time before I found that the root cause was the setTextColor() call. Without PStats I would still be wondering why drawing text takes so much time, and looking at how to reimplement it myself…

rdb · October 8, 2019, 10:56pm

FWIW, pipeline cyclers are things that just about every pipelined object (which includes most objects that are part of the rendering pipeline) has, so reducing the number of objects (e.g. nodes, textures, etc.) is the only way of reducing the number of cyclers.

Thaumaturge · October 8, 2019, 11:30pm

Ahh, I see! That makes a lot of sense!

PStats is the source of the numbers that I posted above, as I recall, which were rather important in discovering that there was a problem. So I do agree with “PStats FTW”.

Ah, thank you for that–I did wonder.

My best guess, then, is that leaked GUI-related objects were one of the main causes of that original huge increase in the number of cyclers, along with a few otherwise-un-cleared nodes and the like.