Finding My State Changes

Thaumaturge · May 2, 2024, 2:46pm

I’m attempting to improve the performance of my current project, and looking at PStats, my current best guess is that I have too many state-changes.

(In a recent test, I had about 309 state-changes indicated in PStats, most of them Transforms.)

Which leaves me with a question: How can I determine where these state-changes are, and what sort of states they might be?

Without some idea in that direction, I’m somewhat guessing at what to change–thus far to no avail.

For example, I tried merging various walls together into a smaller number of single models. While I didn’t check PStats on that occasion, I found that there was little if any effect on my frame-rate at the time.

Likewise, inspired by this post, I tried having my menus detach themselves when “hidden”, rather than calling “hide”–again with little apparent effect.

So I feel like I could use some way of finding the actual cause of these state-changes…

Does anyone have any suggestions…?

serega-kkz · May 2, 2024, 3:57pm

It is possible that this is the result of automatic culling, both of rendered objects and of physics or collision. As far as I know, Panda uses a bounding volume hierarchy (BVH) under the hood for this kind of optimization.

Thaumaturge · May 2, 2024, 9:53pm

Which, if so, would imply that most of my state-changes are on nodes that are being culled away, and thus aren’t greatly impacting the final render-time–which would mean that the number of state-changes is a red herring…

But are we confident of that…?

serega-kkz · May 2, 2024, 9:56pm

I am sure that this is the case, but I am not sure whether PStats takes it into account; again, this is just a theory to begin with.

Max12345 · May 6, 2024, 11:30am

At that point, I would write test cases in which you vary the factors involved in what’s happening in your project, and then compare the performance of those test cases.

E.g. lots of static objects, lots of objects that move, static objects where only a fraction are shown on camera, etc…

Thaumaturge · May 8, 2024, 9:21am

(Sorry for the delay in response! ^^; )

This is fair!

I mean, I’ve devised various test-scenarios thus far:

I’ve cored out my primary shaders such that they render only basic colour; I’ve tried removing skeletal animation from the relevant shader; I’ve tried changing my custom culling-bins to use the “state-sorted” mode instead of the current “fixed” mode; I’ve tried killing all enemies in a given area, leaving me with just level-geometry and the player in view; I’ve tried removing (at least some) per-object shader-inputs; and more besides!

Thus far to little avail. :/

(And I want to test object-pooling for projectiles today.)

The problem, I feel, is that I’m stabbing in the dark. I could really use a direction in which to investigate. :/

rdb · May 14, 2024, 2:21pm

The “state changes” count is only incremented for objects that are in view. It basically increases whenever the renderer has to update the graphics state, which happens when it moves on to the next object to render and that object has a different texture, material, or other state.

It’s fairly trivial to write a script that iterates through your scene and identifies objects with different states. You just have to compare the .state property. Objects are sorted by state by default, so objects with the same state are already grouped together for minimum state changes, unless they have some other special sorting requirements (e.g. if they have transparency enabled).
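As a rough sketch of such a script, assuming `root` is a Panda3D NodePath such as `render`: it groups every GeomNode by its net render state and prints the most common states. States are keyed by their printed form rather than by the RenderState object itself, an assumption made to avoid depending on how RenderState hashes in Python. It also ignores the per-Geom states inside each GeomNode, so treat the result as an approximation.

```python
from collections import Counter


def count_render_states(root):
    """Count how many GeomNodes below `root` share each net render state."""
    counts = Counter()
    for np in root.findAllMatches("**/+GeomNode"):
        # getNetState() composes the states from the top of the graph down to
        # this node; str() gives a readable, comparable key.
        counts[str(np.getNetState())] += 1
    return counts


def report_render_states(root, top=10):
    """Print the number of unique states and the most common ones."""
    counts = count_render_states(root)
    total = sum(counts.values())
    print(f"{len(counts)} unique net states across {total} GeomNodes")
    for state, n in counts.most_common(top):
        print(f"{n:5d}  {state}")
    return counts
```

In a running game, `report_render_states(render)` would print the summary; many GeomNodes with one-off states are the candidates for sharing textures or materials.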

To reduce your state changes, reduce the number of unique textures, materials, non-vertex colours, etc. If textures are the problem, for example, you can merge together textures for different parts of the same object into a larger texture atlas, with the texture coordinates referencing different parts of the same atlas. (I believe Panda’s flattenMultitex can do this automatically.)

If you can record a session with the latest (master/dev build) version of PStats and send it to me, I’d be happy to give you some insights as well, if something jumps out.

rdb · May 14, 2024, 2:26pm

Ah, sorry, you mentioned it’s transforms, not render states (like textures). So the problem is too many objects with a unique transformation. For static objects, you can flatten their transformation onto the object with flattenLight (or you can bunch static objects together into a group with a common parent node and flatten their transforms relative to the group node; this makes culling a lot more efficient).
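A minimal sketch of the grouping approach, assuming the objects are Panda3D NodePaths that never move. `group_and_flatten` and `group_name` are illustrative names, not Panda3D API; the calls it makes (attachNewNode, wrtReparentTo, flattenLight) are standard NodePath methods.

```python
def group_and_flatten(parent, static_nodes, group_name="staticGroup"):
    """Move static objects under one group node and bake their transforms
    into their vertices, relative to that group."""
    group = parent.attachNewNode(group_name)
    for np in static_nodes:
        # wrtReparentTo() keeps each object where it currently is in the world.
        np.wrtReparentTo(group)
    # flattenLight() applies the child transforms to the vertices but does not
    # merge the nodes, so each object can still be culled on its own.
    group.flattenLight()
    return group
```

Grouping by room or area, rather than putting the whole level under one node, keeps the bounding volumes small enough for culling to discard whole groups at once.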

You can use ls() to diagnose this, which will list the transform along with the object. If you don’t mind sharing the output of ls() on your scene, maybe I can offer some suggestions.

Note that having many UI items also counts. Just hide() your render2d to see whether that changes the performance or the state-change count.
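A tiny helper for flipping the 2-D layer on and off while watching PStats, assuming `render2d` is ShowBase’s 2-D root NodePath. `toggle_2d` is a hypothetical name; isHidden(), hide() and show() are standard NodePath methods.

```python
def toggle_2d(render2d):
    """Hide the 2-D scene graph if it is shown, or show it if it is hidden.

    Returns True when the 2-D layer is visible after the call.
    """
    if render2d.isHidden():
        render2d.show()
    else:
        render2d.hide()
    return not render2d.isHidden()
```

Binding it with something like `base.accept("f3", toggle_2d, [base.render2d])` (the key is arbitrary) lets you compare frame time and state-change count with and without the UI in the same session.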

For dynamic objects… do you have a lot of them? The RigidBodyCombiner can help, but it has a hefty CPU cost. Do you despawn dynamically created objects? There are solutions, but they are specific to the kind of situation. A particle system, or hardware instancing, can sometimes help. If dynamic objects consist of sub-objects that don’t need to move independently, those sub-objects should have their transforms flattened relative to the parent node.

Thaumaturge · May 14, 2024, 4:20pm

Based on my observations thus far, I now suspect that I have two main problems:

First, that I have too many individual nodes composing my scenes–as you said.

And second, that spawning entities (even just projectiles) appears to have a noticeable cost somewhere along the way.

Regarding having many nodes, I did once attempt to combine together objects within a room that shared textures. (e.g. All columns in that room, and all wall-sides, and all wall-tops.) This didn’t seem to help noticeably, I’m afraid.

Still, I did only do it for the one room that was in view…

Now, as to combining objects into single meshes, things are complicated a bit by the fact that I use a fair bit of transparency, with explicit sorting, due to my chosen art-style.

For example, a wall might have skirting at its feet that uses a texture with transparency, and have a top that overlaps it a bit and that likewise uses transparency.

Still, I have been thinking that I might be able to combine some of this into a single texture: the wall would still have skirting, but it would be part of the wall-texture, and the wall-geometry would be adjusted at its feet to have the appropriate shape. This, and similar endeavours, might help to reduce the number of objects present…

Conversely, it might be awkward in cases in which the multiple objects are being used to construct composites from parts. We’d have to see just how big a texture I ended up with!

All that said, actually implementing this isn’t something that I want to do just yet–I’m trying to reach a milestone, and fear that it would set me back too much.

Regarding the spawning of entities, I still want to investigate further…

(As I recall, I tried the PStats Python profiler–only for it to slow things down so much that both my IDE and Firefox crashed. ^^; I’m still thinking of using it–but with nothing else running but a terminal each for PStats and the game, and of course those two programs themselves…)

Hmm… I wouldn’t think so–but it depends on what counts as “a lot”.

(Save for one rather specific scenario in which quite a few projectiles can end up being generated.)

I do.

That said, I recently tried creating an internal, recycled pool of projectiles for a particular enemy-type that happens to spawn several projectiles at once, and that therefore seems to particularly show the issue–and it didn’t seem to help.

I don’t mind–but calling it on the root-node of my “world”-object produces quite a lot of output. (Specifically, more than 2000 lines.) So I’m hesitant to ask that you attempt to examine so great an amount of text! ^^;

(Seeing earlier just how many individual nodes I have in my scene was, I think, one of the things that prompted me to consider merging objects into single textures.)

Looking at said output does prompt a thought, however: Do tags count against performance at all…?

I use basic string-tags to transfer information from my modelling package into the game–things like construction parameters specific to an individual enemy, for example. And looking at the output of “ls()”, I see that they’re (naturally) still on the node.

It seems unlikely, but could those be an issue…? Should I perhaps be removing them…?

rdb · May 14, 2024, 8:42pm

This is not what I said, actually—I said many nodes with unique transforms. It won’t be a transform state change otherwise. It may still be a problem that you have too many nodes, but let’s be careful not to jump to conclusions lest you optimize the wrong thing.

If your non-transform state change count is low, this may not be a bottleneck, and therefore that might be a waste of time.

You don’t need to use the PStats Python profiler. Since Python code isn’t your bottleneck, it won’t tell you anything interesting, and just consume a lot of memory.

What I suggested was that you record a PStats session file so that I can look at the charts myself. I’m only going on your guesses right now for what the problem might be.

Well, it’s easy to put this to the test. Make a key that instantly spawns as many objects as might normally be spawned at one time, and observe the effect (in milliseconds) on your frame time.
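One way to set up that test, as a sketch: `spawn_fn` stands in for whatever (hypothetical) function the game uses to create one object. The helper only times the Python-side spawning itself; the lasting cost in later frames is what the PStats frame-time graph will show.

```python
import time


def spawn_burst(spawn_fn, count):
    """Call spawn_fn(index) `count` times in a row.

    Returns the spawned objects and how long the calls took, in milliseconds.
    """
    start = time.perf_counter()
    spawned = [spawn_fn(i) for i in range(count)]
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return spawned, elapsed_ms
```

For example, `base.accept("f5", spawn_burst, [spawnProjectile, 12])` would spawn a fan of 12 on each key press, where `spawnProjectile` is a placeholder for the game’s own spawning function.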

No, they don’t. Easy to put to the test, by doing a findAllMatches of nodes with tags and calling clearTags, but I don’t think you should look here.
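If you want to run that test anyway, a sketch like the following would do it, assuming `root` is a Panda3D NodePath and `keys` lists the tag names your exporter writes. It uses findAllMatches with the `=key` pattern, which matches nodes that have that tag, and NodePath.clearTag() for each key; `clear_tags` itself is a hypothetical helper.

```python
def clear_tags(root, keys):
    """Remove each tag in `keys` from every node below `root` carrying it.

    Returns how many (node, tag) pairs were cleared.
    """
    cleared = 0
    for key in keys:
        # "**/=key" matches any node below root that has the tag `key` set.
        for np in root.findAllMatches(f"**/={key}"):
            np.clearTag(key)
            cleared += 1
    return cleared
```

Calling something like `clear_tags(render, ["enemyType", "spawnGroup"])` (the tag names here are made up) after loading, and comparing frame times before and after, would settle the question.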

I’m not looking to read every individual line, but I am looking to get a sense of the structure. It can have a big impact on performance if you have a very flat or poorly structured scene graph. That said, based on your description of the problem, I’m assuming most of the time is spent in Draw and not very much in Cull…?

Thaumaturge · May 15, 2024, 10:49am

Ah, you’re right, sorry–I was tired when I replied last night! ^^;

That is a good point, and honestly a bit of a relief. ^^;

True. I think that I was previously thinking that the problem lay with the spawning code–maybe I was being too free with vector-constructions or something–since there seemed to be a marked dip in frame-rate whenever a certain enemy (that shoots multiple projectiles) fired.

I still think that I have a problem there, somewhere. Just that it’s not the only problem.

(And maybe solving the other problem will free up enough frame-time that the above problem won’t be an issue.)

That would be appreciated, thank you!

All right, I installed the current dev build via pip, and… it seems to result in my scene being rendered entirely in black. The menus and UI seem unaffected, but perhaps there’s a problem with my off-screen buffer, or some such thing. :/

As such, I’m not sure of how useful the session-file would be, and it’s tricky to navigate to a location that has one of the enemies that shoots multiple projectiles.

That said, woah, the new PStats looks good!.. 0_0

Good thinking! I’ve just been observing a certain enemy, but the test that you propose was easy enough!

So, the test was performed in a simple testing-scenario–as a result, the frame-times are lower than in the main game.

In short, I created a test-function that runs a simplified version of my weapon-code, spawning a fan of 12 projectiles around the player.

The baseline frame-time (i.e. with no such spawning) was ~6.4ms.

After a single run of the test-function, no change was observed in the frame-time.

While rapidly pressing the key that controls the test-function, the frame-time increased to ~11.2ms.

Not as huge a difference as I’d expected, I’ll confess!

Hmm… So perhaps the dips are coming from some other aspect… Maybe there’s an inefficiency in my weapon class, or the effect that accompanies firing is more expensive than I realise, or something in the enemy AI…

[edit]
I decided to repeat the experiment with the actual weapon used by the enemy mentioned above–once again, it sprays out a fan of 12 projectiles. However, it uses the full weapon-code, and spawns effects for firing and for the projectile hitting a wall.

In this case, a single press seems to consistently add 0.1ms, while spamming the associated key incurred an initial jump to a frame-time of ~15ms (adding ~8.6ms), before settling down to a frame-time of ~14ms.
[/edit]

Okay, that’s a minor relief!

I thought that they likely weren’t a problem–but I’d rather ask and be a fool today than stay silent and be a fool indefinitely.

Ah, I see!

Well, in that case you should find the file attached below, should you still want to look at it:
lsOutput.zip (26.2 KB)

Indeed!

(The following screenshots, let me note, were taken with the 1.10 version of PStats.)

Although looking again, I do seem to have a fair bit of time still in “App”…

Examining that, said time appears to be split between my central “update” task, the Audio3D manager, and “garbageCollectStates”–the latter of which may support the idea that there’s a state problem.

For what it’s worth, the following is the result of commenting-out the call to the update-methods of my game’s enemies. It saves about 4ms out of 18ms.

rdb · May 15, 2024, 5:43pm

Ahh, data. There’s no need to guess when you can measure. It always takes some time and effort to learn and set up the tools, but it saves so much time in the long run.

It seems your biggest problem, by far, is App. Your Draw is taking at most 5 ms, which is more than adequate for a 60 fps game. That suggests to me your geometry, textures, etc. are not a problem. (I see occasional spikes in Draw, which may be worth analysing if you’re noticing lag spikes—PStats will tell you what it’s doing there, like uploading textures, or whatnot. The Timeline view in the new PStats is great for identifying single events during particularly slow frames.)

App is split into three evils. The first is garbageCollectStates. The ls() output is most enlightening. You have lots of static-looking nodes with a transform state applied, like T:(pos 9.72115 51.078 -29.9999 hpr -90 0 0). You should flatten those transforms, as I said earlier, using at least a flattenLight. That should help tremendously with your transform state count and therefore also garbageCollectStates.
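To find such nodes programmatically rather than by reading ls() output, a sketch along these lines could help, assuming `root` is the NodePath holding the static scenery. It uses NodePath.getTransform() and TransformState.isIdentity(); the nodes it reports should roughly correspond to the ones ls() prints with a T:(...) entry. `nodes_with_transforms` is a hypothetical helper.

```python
def nodes_with_transforms(root):
    """Return the NodePaths under `root` that have a non-identity transform."""
    found = []
    for np in root.findAllMatches("**"):
        # getTransform() returns this node's own (local) TransformState.
        if not np.getTransform().isIdentity():
            found.append(np)
    return found
```

Printing `len(nodes_with_transforms(sceneryRoot))` before and after a flatten (with `sceneryRoot` as a placeholder for the node your walls hang under) gives a quick before/after count.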

The second evil is your update task. It’s unfortunate that the PStats Python profiler isn’t working for you, because it would tell you exactly which method is slow. In absence of it, I would suggest that you either use Python’s own profiler, or that you manually create PStats collectors around methods of yours (this is explained here), especially within your update task. Python’s own profiler can also narrow it down to a by-line basis, which is great for finding out which parts of your enemy updates are so slow. 4ms is a lot.
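A sketch of the manual-collector approach: a decorator that wraps a function in a PStatCollector’s start() and stop(). Colons in a collector name nest it under a parent collector in the PStats view; the name used below is only an example. The `collector_factory` parameter is an addition of this sketch so that it can be exercised without Panda3D; by default it uses panda3d.core.PStatCollector.

```python
import functools


def pstats_timed(name, collector_factory=None):
    """Decorator: time every call of the wrapped function under a PStats collector."""
    if collector_factory is None:
        from panda3d.core import PStatCollector
        collector_factory = PStatCollector
    collector = collector_factory(name)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            collector.start()
            try:
                return func(*args, **kwargs)
            finally:
                # stop() runs even if func raises, so the collector never stays open.
                collector.stop()
        return wrapper

    return decorator
```

Decorating, say, an enemy class’s update method with `@pstats_timed("App:Show code:Enemy update")` would then show its share of the frame as its own bar in PStats.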

The third evil is updating the 3D sounds. How many 3D sounds do you have attached to a node at once? How many of them are playing at once? Do you remove the sounds from the Audio3DManager once the nodes they are attached to are despawned?

Hmm, this is a bit alarming. That said, you only really need to use the pstats server binary from a dev build—you can still run the game itself on Panda3D 1.10.14.

This sounds pretty big to me. Your frame time doubled once you spawned the projectiles. And if your budget is 16.7 ms (for 60 fps), 5 ms is a big difference. Or are you saying that this is an amount of projectiles that wouldn’t normally happen in-game?
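For reference, the arithmetic behind that comparison, using the numbers reported earlier in the thread (a 6.4 ms baseline, ~11.2 ms while spamming the simplified test, ~15 ms with the full weapon code):

```latex
\begin{aligned}
t_{\mathrm{budget}} &= \frac{1000\ \mathrm{ms}}{60} \approx 16.7\ \mathrm{ms} \\
\Delta t_{\mathrm{simple}} &= 11.2\ \mathrm{ms} - 6.4\ \mathrm{ms} = 4.8\ \mathrm{ms} \approx 0.29\, t_{\mathrm{budget}} \\
\Delta t_{\mathrm{full}} &= 15\ \mathrm{ms} - 6.4\ \mathrm{ms} = 8.6\ \mathrm{ms} \approx 0.52\, t_{\mathrm{budget}}
\end{aligned}
```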

Thaumaturge · May 15, 2024, 7:27pm

Hahah, the refrain of the scientist! XD

Indeed, that’s not surprising, and thus a good place to start, I feel!

I did perform a quick test with “flattenLight”, but at the time it didn’t seem to help all that much.

However, I think I should experiment with it more–it’s possible that I called it on a bad choice of node. (I recall, for example, that it doesn’t go past "ModelNode"s, and it may well be that there was one just below the node on which I was calling it…)
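If ModelNodes are what stopped the flatten, NodePath.clearModelNodes() is the usual tool: it flags every ModelNode below the node so that a subsequent flattenStrong() may remove them (flattenLight() on its own does not remove nodes). A sketch, assuming `root` holds only static scenery; `flatten_scenery` is a hypothetical helper. Note that anything your code later looks up by name may no longer be found if its node is removed by a strong flatten.

```python
def flatten_scenery(root, strong=False):
    """Flatten a subtree of static scenery.

    Returns the number of ModelNodes that clearModelNodes() flagged.
    """
    # Mark ModelNodes below root as removable by a later flatten.
    flagged = root.clearModelNodes()
    if strong:
        # May remove and merge nodes, including the ModelNodes flagged above.
        root.flattenStrong()
    else:
        # Only applies transforms (and some attributes) to vertices; keeps all nodes.
        root.flattenLight()
    return flagged
```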

Ever a difficult one, I fear!

I have tried custom collectors in the past, but I have thus far seemed to find that the frame-time ends up dissolving into myriad smaller parts, each insignificant on its own–or sometimes seems to be spent in some aspect that I’m not seeing.

I will, I intend, have another shot however!

As you say, the Python Profiler would be potentially very helpful. I may still try it with nothing else running…

You mention “Python’s own profiler”–I wasn’t aware that it had one! I should look that up, I think!

(I distinctly recall getting some sort of Python profiler working in the hazy past, and finding it useful–but I don’t recall what profiler that was, alas. Perhaps it was the above!)

Good questions all! I intend to investigate them–especially that last!

(I wouldn’t think that I had that many–but they can add up when multiple enemies are around, and they’re all shooting, and the player is shooting too, and so on…)

A little bit. ^^;

Ah, good to know! I may try that tomorrow, then!

Oh no, that was spamming the key. I was actually a little surprised at how little effect there was for so many projectiles!

(It may be worth noting here that projectiles actually bypass the standard game-object update-method, instead using their own, simplified version.)

~

Okay, I think that you’ve given me a good few things to try! I intend to give them a shot and report back!

Thaumaturge · May 16, 2024, 4:54pm

Okay, update!

Regarding “flattenLight”:

Since flattening seems likely to be primarily useful with scenery, not enemies, I performed this test with enemies disabled.

Specifically, I performed flattening on the root-node to which my walls are attached. (I originally had it also applied to another scenery-related node–but that seemed to result in at least one shader-input going missing, and thus in a crash.) I tried all three of “flattenLight”, “flattenMedium”, and “flattenStrong”.

And in short… it didn’t seem to have much effect. (And what tiny effect there was seemed to be more often an increase in frame-time, rather than a decrease.)

Regarding profiling:

I found the Python profiler. (For any who, like me, were initially unfamiliar, see this page in the Python documentation.)

I have thus managed to profile the game.

Alas, I haven’t found a way to profile a single, limited portion of the game, which muddies the waters a bit, I fear. (I am aware that one can run the profiler on any given function-call–but it seems to then record data for only the most recent such call, with each call’s profile-data being overwritten by the next’s.)
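For what it’s worth, cProfile can accumulate data across many calls: the same `cProfile.Profile` object can be enabled and disabled repeatedly, and its totals keep adding up until it is cleared. A sketch, where `SectionProfiler` is a hypothetical helper name and the wrapped function would be something like an enemy’s update method:

```python
import cProfile
import io
import pstats


class SectionProfiler:
    """Accumulate cProfile data over many calls to one section of code."""

    def __init__(self):
        self.profile = cProfile.Profile()

    def run(self, func, *args, **kwargs):
        # enable()/disable() on the same Profile object add to its running totals.
        self.profile.enable()
        try:
            return func(*args, **kwargs)
        finally:
            self.profile.disable()

    def report(self, sort="tottime", limit=15):
        """Return the top `limit` entries as text, sorted like pstats does."""
        stream = io.StringIO()
        stats = pstats.Stats(self.profile, stream=stream)
        stats.sort_stats(sort).print_stats(limit)
        return stream.getvalue()
```

In the update task, `enemyProfiler.run(enemy.update, dt)` would replace a direct `enemy.update(dt)` call (both names hypothetical), and `print(enemyProfiler.report())` on exit shows only that section. This assumes the whole game is not also running under another profiler at the same time.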

As of yet, I’m not quite sure of what to make of the data. There’s plenty of time being spent in the Audio3DManager, as we knew, and quite a bit in collision-traversing. I’m apparently calling “getPos” an awful lot, and “setPos” quite a bit. And apparently I’m calling “set3dAttributes” even more often than I’m calling “getPos”!

Regarding custom PStats collectors:

Experimenting with these uncovered a single-frame spike when a certain enemy fires its weapon.

Drilling down, it looks like this is to a large degree caused by my copying the relevant sounds and special-effects when spawning new projectiles.

I’m not yet sure of what to do about this…

Regarding questions asked about my use of 3D sound:

Looking at the enemy that I’ve been referencing, it seems to have 6 sounds. All of these are attached, but most are not playing at a given time.

I’m not sure of how many might overlap at any given time.

As to removing sounds from the Audio3DManager when their source is despawned, I’ve added some code to call the “detachSound” method when a game-object is destroyed. I’ve seen no clear improvement from this, but it seems like a good idea, nevertheless.

That said, is there more to removing a sound from the audio-manager…? I don’t see any likely-looking methods in the Audio3DManager class…
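As far as the Audio3DManager class goes, detachSound() appears to be the removal method, and getSoundsOnObject() lists the sounds attached to a given node. A sketch of a despawn helper combining the two, assuming `audio3d` is the game’s Audio3DManager and `obj` is the NodePath the sounds were attached to; `detach_all_sounds` is a hypothetical name.

```python
def detach_all_sounds(audio3d, obj):
    """Detach every 3-D sound attached to `obj`, e.g. when its game-object is destroyed.

    Returns how many sounds were detached.
    """
    # Copy the list first so that detaching does not modify it mid-iteration.
    sounds = list(audio3d.getSoundsOnObject(obj))
    for sound in sounds:
        audio3d.detachSound(sound)
    return len(sounds)
```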

Regarding the dev-build of PStats:

I haven’t yet tried this, but still have it in mind to perhaps do so!

And by the way:

During my investigations, I discovered a loading issue that was multiplying certain objects. This is fixed, I believe.

That said, it seems to have had minimal effect–I don’t think that the particular group of objects in question was doing much at most times–but it was still a worthwhile fix, I daresay!

serega-kkz · May 16, 2024, 9:40pm

Hmm, if you are developing on a laptop, then perhaps its capacity is not enough, and the transformations you have discovered are actually not related to performance. I once had a situation in which I wrote a simple program in C++ and it started to slow down. Then, in disbelief, I asked rdb to test this program on his PC, and it turned out that he did not notice any slowdown.

Walking back and forth across the room, I realized that the problem was my processor overheating; I also realized that the performance of mobile systems depends heavily on dust build-up. When I cleaned my laptop, I found that the program no longer slowed down. You may need to test your application on a desktop computer, or clean your laptop.

Thaumaturge · May 17, 2024, 8:02am

Hmm… I can’t say that that’s impossible!

I am developing on a laptop, and… ah, let’s say that there’s a good chance that there’s dust somewhere in there. ^^;

That said, I don’t really have good access to a gaming desktop right now, and am anxious about breaking something if I try to clean the laptop’s interior myself, so testing in this manner might have to wait… ^^;

[edit]
I do want to look at my usage–perhaps overusage–of “getPos” and “traverse”, I will say.

I suspect that the latter, at least, might be reduced: I have a traversal that’s called by each active object, and that might be combined into a single sweep, and perhaps some of my custom traversers might be replaced by usage of “cTrav”…

serega-kkz · May 17, 2024, 3:04pm

You don’t need to remove the dust to check whether there is such a problem. You can use software tools to monitor hardware parameters: temperature, clock frequency, read speed, and so on. This will also give you an idea of how many resources the application itself consumes.

Thaumaturge · May 19, 2024, 10:41am

Oh, I have a temperature monitor active on my system. But I’ll confess that I’m not much of a hardware person, so I’m not all that familiar with what sort of temperature would be “bad”… ^^;

~

On another note, where in PStats would time spent by non-cTrav traversers be recorded…?

You see, one thing noted in my Python profiler results is that a fair bit of time was apparently spent in “traverse”. (Specifically, when sorting by “tottime”, “traverse” appears as the sixth entry.)

Now, I have multiple traversers in my game. For most collisions, I have a traverser assigned to the standard automatically-updated “cTrav” variable. But for various other tasks (one-off ray-casts, detection of the current room, etc.) I sometimes use separate traversers.

And I note in PStats that I seem to only see “cTrav” under the “Collision” section. I’m thus wondering whether the other traversers might be appearing under “Show Code”, accounting for some of the time spent there…

Of course, I could place a custom PStats collector around the call to “traverse” for one such traverser–but if they have internal PStats collectors then I’m not sure that the results would be valid…

~

And finally, I mentioned previously, I believe, that I found a PStats-spike when copying sounds and special-effects for projectiles. I have it in mind then to change my handling of these copies, such that they’re only copied when actually called for (e.g. when the projectile hits something), and that only the relevant objects are copied (e.g. copying either the “successfully hit” effect or the “failed to hit anything” effect, as called for–but not both).
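A sketch of that idea, assuming the two effects are loaded once as template NodePaths (`hit_template` and `miss_template` are hypothetical names) and copied with NodePath.copyTo() only at the moment of impact:

```python
class ProjectileEffects:
    """Copy only the impact effect that is actually needed, when it is needed."""

    def __init__(self, hit_template, miss_template):
        # Templates are loaded once and never shown themselves.
        self.hit_template = hit_template
        self.miss_template = miss_template

    def spawn_impact(self, parent, pos, hit_something):
        """Copy the matching effect under `parent` and place it at `pos`."""
        template = self.hit_template if hit_something else self.miss_template
        effect = template.copyTo(parent)
        effect.setPos(pos)
        return effect
```

Spawning a projectile then costs nothing for its effects; the single copy happens on impact, and only for the branch that occurred.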

rdb · May 20, 2024, 8:47pm

You seem to be struggling to interpret the Python profiling results. Look at the Flame Graph or the Timeline view, these views make it a lot more obvious what’s going on. Or save the session file and share it with me and I can take a look myself, and explain how to read the charts.

Did the flattenLight do anything? What is the output of ls() after flattenLight?

You clearly have a lot of sounds simultaneously added to the Audio3DManager. This also explains the many calls to getPos and setPos. Reducing the number of sounds simultaneously attached would help tremendously.

Thaumaturge · May 20, 2024, 9:06pm

Well, remember, this was done with the Python-native profiler, so I’ve been working with text-only output. (As far as I’m aware.)

But you reminded me that I had yet to get the updated version of PStats!

I’ve done that now… but am running into the issue that if I grab just PStats from the whl-file and run it from the download location (more or less), it fails to find “libpanda.so.1.11”. Extracting that, too, (to pretty much the same location) doesn’t seem to be working. Is PStats looking for that file in a specific place…?

If I recall correctly, I did indeed see fewer transform-states on the nodes.

If you want to take a look, I believe that the file below is the correct one:
lsOutput2.txt.zip (9.2 KB)

Ah, interesting! I hadn’t thought of that!

Well, I’m glad that I didn’t go hunting through my own code to find uses of such, then! ^^;

So, I’ve been thinking about this, and… how do I know when to detach a sound?

Now, in some cases a sound is associated with a state-change, in which case it can be detached when the state is exited. For example, the “walking” sound doesn’t need to be attached when the related character isn’t in the “walking” state.

However, in some cases a sound may, I suspect, overlap state-changes. For example, if a character fires a weapon, and then immediately exits their “firing” state, the sound may still be playing for a short while after the state-change. I don’t want such sounds to be clipped, but rather to end naturally.

Of course, there are events for sounds–but do I really want to set events for so many sounds…?

One thought that occurred to me: Might this be a thing that the audio-manager could handle? Specifically, is there a reason for the audio-manager to update a sound’s position while the sound isn’t playing? Could it not just limit itself to setting and getting sound-positions for only sounds that are playing…?
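Whether the manager could skip non-playing sounds itself is a question for the Panda3D side. In the meantime, a game-side sketch of the “attach only while playing” idea might look like this. `TransientSoundTracker` is a hypothetical helper; it assumes sound.status() reports AudioSound.PLAYING from the moment play() is called, and that prune() is run from a task every frame or every few frames.

```python
class TransientSoundTracker:
    """Keep fire-and-forget 3-D sounds attached only while they are playing."""

    def __init__(self, audio3d, playing_status=None):
        if playing_status is None:
            from panda3d.core import AudioSound
            playing_status = AudioSound.PLAYING
        self.audio3d = audio3d
        self.playing_status = playing_status
        self.active = []

    def play(self, sound, obj):
        """Attach `sound` to `obj`, start it, and remember it for pruning."""
        self.audio3d.attachSoundToObject(sound, obj)
        sound.play()
        if sound not in self.active:
            self.active.append(sound)

    def prune(self):
        """Detach finished sounds; returns how many are still playing."""
        still_playing = []
        for sound in self.active:
            if sound.status() == self.playing_status:
                still_playing.append(sound)
            else:
                self.audio3d.detachSound(sound)
        self.active = still_playing
        return len(still_playing)
```

Because the sound is only detached after it has stopped on its own, a firing sound that outlasts the “firing” state is not clipped.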