Finding My State Changes

rdb · May 20, 2024, 11:04pm

That’s rubbish–I really suggest you get the GUI version working. (If you can’t use the GUI version, you can use the -jo output.json flags to text-stats to get JSON output that you can load into https://ui.perfetto.dev/, but PStats is more convenient.)

libpanda.so also depends on other libraries. You need to extract the whole thing somewhere, preserving the directory structure. Alternatively, just install the .whl into a venv.

Okay, does PStats confirm this? And did the time in garbageCollectStates go down?

The flame graph view shows you which function a particular function is called by, so there’s no need to hunt once you get the newer PStats up and running.

Surely, you are calling play() and stop() somewhere? That’s where you can attach/detach the sound.

The Audio3DManager is a very small convenience class, written in Python, that simply adds a task that calls getPos on each NodePath relative to the listener and updates that on the AudioSound. It doesn’t have the ability to hook into the C++ audio classes to get an event when a sound starts or stops playing.

Thaumaturge · May 21, 2024, 9:35am

Ah, I wasn’t aware that there was a GUI version! (Other than PStats, of course!)

Hmm… Trying that, I seem to still be getting the same result.

See this paste from my terminal:

```
~/Downloads/NewPStats/panda3d-1.11.0.dev3444-cp312-cp312-manylinux2014_x86_64(1)/panda3d_tools$ ./pstats 
./pstats: error while loading shared libraries: libpanda.so.1.11: cannot open shared object file: No such file or directory
```

(I also tried running it from the directories above that one, to similar effect.)

I don’t have much experience with venv, but might consider it if we can’t get the new PStats working from a directory… (Which latter would be my preference.)

Oddly, when flattening is applied PStats seems to report a number of transform- (and render-) states that is not much changed–and if anything, slightly higher!

Before and after:

There does seem to be a drop in the time used for “garbageCollectStates”, from about 0.4 ms to 0.2 ms.

The graph also looks a little less-intensely spiky.

Before and after:

It should be noted, however, that this was done with flattening applied only to the walls of my scene. There are other decorative elements in the scene that might be flattened–but an earlier attempt at this resulted in a shader-input going missing and a resultant crash.

That really does sound useful!

For continuous sounds, like walking-loops, yes.

But for non-continuous sounds, like explosions or “hurt”-noises, I’m pretty sure that I just call “play” and then let them stop on their own…

(And I think that the latter are more common.)

Can it not check the state of the sound in the code that calls “getPos”?

That is, I’m imagining that it currently does something like this:

```python
for sound, np in self.attachedSounds:
    pos = np.getPos()
    sound.set3dAttributes(pos.x, pos.y, pos.z, 0, 0, 0)
```

I’m suggesting instead that it do something like this:

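```python
from panda3d.core import AudioSound

for sound, np in self.currentSounds:
    pos = np.getPos()
    sound.set3dAttributes(pos.x, pos.y, pos.z, 0, 0, 0)

# Rebuild the list of currently playing sounds for use on the next frame
self.currentSounds = [(snd, np) for snd, np in self.allAttachedSounds if snd.status() == AudioSound.PLAYING]
```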

Or perhaps more simply, something like this:

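```python
from panda3d.core import AudioSound

for sound, np in self.attachedSounds:
    # Only update the position of sounds that are currently playing
    if sound.status() == AudioSound.PLAYING:
        pos = np.getPos()
        sound.set3dAttributes(pos.x, pos.y, pos.z, 0, 0, 0)
```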

rdb · May 21, 2024, 10:03am

The libraries are in another directory, you can do:

```shell
LD_LIBRARY_PATH=/home/blah/blah ./pstats
```

where the path is the directory containing libpanda.so in the extracted whl (not sure off the top of my head which one that is, probably panda3d).
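If it isn't obvious which extracted directory holds `libpanda.so`, a small Python helper can search for it and build the command line. This is a sketch that assumes the wheel was simply unzipped somewhere; `find_libpanda_dir` and `pstats_command` are hypothetical names, not part of Panda3D. The path is shell-quoted because the extracted directory name above contains parentheses.

```python
import shlex
from pathlib import Path


def find_libpanda_dir(wheel_dir):
    """Return the first directory under wheel_dir that holds libpanda.so*, or None."""
    for candidate in sorted(Path(wheel_dir).rglob("libpanda.so*")):
        if candidate.is_file():
            return candidate.parent
    return None


def pstats_command(wheel_dir):
    """Build an 'LD_LIBRARY_PATH=... ./pstats' command line for an extracted wheel."""
    lib_dir = find_libpanda_dir(wheel_dir)
    if lib_dir is None:
        raise FileNotFoundError(f"no libpanda.so* found under {wheel_dir}")
    # Quote the path: wheel directories like "...x86_64(1)" contain shell metacharacters.
    return f"LD_LIBRARY_PATH={shlex.quote(str(lib_dir))} ./pstats"
```

For example, `print(pstats_command(Path.home() / "Downloads" / "NewPStats"))`, then run the printed line from inside the `panda3d_tools` directory.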

That’s very weird. Okay, so the flattening didn’t really do anything. What are you calling the flattenLight on, exactly?

I think you need to call it on your whole (static) scene or transforms applied to a node above it won’t get applied.

Flattening light caused a shader input error? That’s very strange.

Create a “play” wrapper method that creates a SoundInterval in a Sequence with Func, so you can detach the sound after it’s done.
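A minimal sketch of such a wrapper, assuming `audio3d` is a `direct.showbase.Audio3DManager.Audio3DManager` and the sound was loaded through it. `play_attached` is a hypothetical name; `attachSoundToObject`, `detachSound`, `Sequence`, `SoundInterval` and `Func` are existing Panda3D APIs.

```python
def play_attached(audio3d, sound, node_path):
    """Attach a 3D sound, play it once, and detach it when it finishes.

    Assumes `audio3d` is an Audio3DManager and `sound` was loaded through it
    (e.g. with audio3d.loadSfx).
    """
    # Imported inside the function only so this sketch can be exercised
    # without Panda3D installed; normally this would sit at module level.
    from direct.interval.IntervalGlobal import Func, Sequence, SoundInterval

    audio3d.attachSoundToObject(sound, node_path)
    # SoundInterval plays the sound once; the Func runs after it has ended.
    sequence = Sequence(SoundInterval(sound), Func(audio3d.detachSound, sound))
    sequence.start()
    # Returned in case the caller wants to finish or pause it early.
    return sequence
```

Since the sound is only attached while it plays, Audio3DManager's per-frame task no longer has to iterate over idle one-shot sounds.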

I agree we need a better way to handle this in the engine, hmm. It seems SoundInterval also has a way to pass in a node and listener node, as an alternative to using the 3D audio manager, but it just does some volume falloff calculation and doesn’t really use the 3D positioning mechanism of the audio engine. Perhaps this should be updated, so that the 3D audio manager isn’t necessary for playing brief sound effects.

Yes, possibly—with the caveat that when you play the sound, the initial position might be wrong for a frame until the task gets a chance to run. And it’s still less efficient because the non-playing sounds are still being iterated over.
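Both caveats can be illustrated with a pure-Python sketch (not how Audio3DManager is implemented): it pushes the position once at play time, so the first frame isn't stale, and it only iterates over sounds that were started through it, dropping them once they stop. `ActiveSoundTracker` is hypothetical; the sound objects are assumed to provide `play()`, `status()` and `set3dAttributes(...)` like Panda3D's `AudioSound`, and `playing_status` stands in for `AudioSound.PLAYING`.

```python
class ActiveSoundTracker:
    """Hypothetical helper that only updates the sounds it has started."""

    def __init__(self, playing_status):
        self.playing_status = playing_status  # e.g. AudioSound.PLAYING
        self.active = []  # (sound, node_path) pairs started via play()

    def _push_position(self, sound, node_path):
        pos = node_path.getPos()
        sound.set3dAttributes(pos[0], pos[1], pos[2], 0, 0, 0)

    def play(self, sound, node_path):
        # Position the sound before it starts, so its first frame isn't stale.
        self._push_position(sound, node_path)
        sound.play()
        self.active.append((sound, node_path))

    def update(self):
        # Meant to run once per frame. Finished sounds are dropped here,
        # so they cost nothing on later frames.
        still_playing = []
        for sound, node_path in self.active:
            if sound.status() == self.playing_status:
                self._push_position(sound, node_path)
                still_playing.append((sound, node_path))
        self.active = still_playing
```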

I think we ought to just handle NodePath-based audio updates in the C++ end of the audio manager, which probably isn’t even a lot of work to do.

Thaumaturge · May 21, 2024, 10:29am

Ah, that did it, indeed! Thank you!

Give me a bit of time with this and I’ll report back, I intend; I’ve gotten involved in another aspect of the project in the meantime.

The really weird thing is that “ls()” indicates that it does do something–doing a diff of the output of “ls()” as produced with and without the flatten, I can see TransformStates vanishing.
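A small sketch of that comparison, assuming the two `ls()` dumps have been captured as text and that nodes carrying a transform show a ` T:` marker in the dump. `transform_lines` and `diff_dumps` are hypothetical helpers built on Python's standard `difflib`.

```python
import difflib


def transform_lines(ls_dump):
    """Lines of an ls() dump that carry a transform (assumes a ' T:' marker)."""
    return [line for line in ls_dump.splitlines() if " T:" in line]


def diff_dumps(before, after):
    """Unified diff of two ls() dumps, one list item per line."""
    return list(difflib.unified_diff(
        before.splitlines(), after.splitlines(),
        fromfile="before_flatten", tofile="after_flatten", lineterm="",
    ))
```

Comparing `len(transform_lines(before))` with `len(transform_lines(after))` gives a quick count of node transforms removed by the flatten, which can then be set against the "Transform states" count that PStats reports.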

For context, a given “world” in my project contains multiple “rooms”, which are associated with various levels. (They’re stored in the “world”-object, but attached and detached according to the current level.)

Each “room”-object then has its own root-node, to which are attached a number of nodes–one of which is the “wallRoot”, to which wall-geometry nodes (and wall-colliders) are attached.

It’s on this “wallRoot” that I’m calling “flattenLight”.

There’s also a sibling-node to “wallRoot” called “content”, to which various elements of decor are attached. (As well as spawners, although those are I believe detached as part of the level-loading process.)

It really is. Even odder, it didn’t happen immediately: it only happened when I moved into a room other than the one I had loaded into. (Any other room, as far as I saw.)

[edit]
Okay, I got the game to flatten the “content”-node without causing a crash. (I suspect that the problem was something to do with the previous timing of my flattening the node.)

In short… The result is pretty much the same: no apparent improvement in “garbageCollectStates”, and if anything a slight increase in the number of transform- and render- states reported. :/
[/edit]

Hmm… This could work.

And I do already have a class that manages sounds for objects that are expected to have them. Right now it attaches whatever sounds are loaded as soon as they’re loaded–but it wouldn’t be hard to change that, I daresay!

It will have a particular case in which a sound is expected to outlive a game-object, I fear. But I think that intervals clean themselves up–is that right?

Hmm… I don’t know–that sounds a bit counter-intuitive, to my mind. I would expect the 3D audio manager to handle, well, all 3D audio, and sound-intervals to just handle timing for sounds. I could see such a change tripping people up.

Hm, fair enough.

That sounds like it could be a good solution!

[edit 2]
Okay, I’ve tried out the new PStats–and wow, that flame-chart really is pretty cool, and much easier to read than the strip-chart!

So, you asked for a session-file–you should find it linked-to below, if I’ve done the job correctly.
https://www.dropbox.com/scl/fi/nu6c95dlnbjnmr3wqfhj9/session.pstats.zip?rlkey=skn7qu1qcf6zpr0hhtc8xipe5&st=mt6w5k3n&dl=1

Note that this starts just before the game is run, after which there should be a delay, followed by a huge performance dip as the game is loaded–followed at last by a period of gameplay.

Note also that, for testing purposes, I had a number of enemies on-screen–I felt that this would help to magnify any issues that they might be introducing.

rdb · May 21, 2024, 1:47pm

Yes, once they’re done playing.

Cool! This is without the Python profiler enabled though, right? This limits how much you can see inside “Show code”. Otherwise you’d be able to see the call graph of all the individual methods in the flame graph / timeline.

So, this is how your frame is divided up, according to the Timeline view:

I can’t see much of your Python code since you didn’t have the Python profiler enabled, but all those little blocks under “update” do grab my attention:

Are you making lots of individual calls to the collision traverser within a single frame? There might be some gains to be had there as well by trying to have those done as part of the main collision traversal.
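One way to batch this, sketched under assumptions: give each object its own `CollisionRay` collider, add them all with `addCollider` to one `CollisionTraverser` sharing a single `CollisionHandlerQueue`, call `traverse()` once per frame, and then split the queue's entries up by the collider that produced them. Only that last step is shown, in plain Python. The `(from_key, into_key, distance)` tuples are an assumed intermediate format that you might build from `entry.getFromNodePath()`, `entry.getIntoNodePath()` and a distance computed from `entry.getSurfacePoint(...)`; `nearest_hit_per_source` is a hypothetical name.

```python
def nearest_hit_per_source(hits):
    """Keep the closest hit for each collider from a single traversal.

    `hits` is an iterable of (from_key, into_key, distance) tuples.
    Returns {from_key: into_key}.
    """
    best = {}  # from_key -> (into_key, distance) of the closest hit so far
    for from_key, into_key, distance in hits:
        current = best.get(from_key)
        if current is None or distance < current[1]:
            best[from_key] = (into_key, distance)
    return {key: into for key, (into, _) in best.items()}
```

For downward room-detection rays, the nearest hit for each object would be the floor (and thus the room) directly beneath it.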

There are other ways to improve collision performance, such as with something like this.

That said, it’s only a small part of your frame time in total. I think the main priorities should be (1) figuring out the transform states, (2) implementing the sound attach/detach suggestion, and (3) optimizing your Python code; these will give the most gains. After that, you can start worrying about optimizing the render loop on the right side, which will be harder.

Please note that transform states stack. If you, say, give a high level parent node a transform, and have lots of child nodes with individual transforms, then this creates unique transforms every time they are composed to form the final transform. This is why flattening can be used strategically to reduce the number of transforms. It’s interesting that this isn’t working for you, and probably worth investigating.
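A toy model of this effect: transforms are reduced to translation tuples and composition to vector addition (Panda3D's real `TransformState` also covers rotation, scale and more), and a Python set stands in for the state cache. `compose`, `unique_states` and `flatten` are illustrative names; `flatten` mimics `flattenLight` baking each child's local transform into its vertices, while the parent's transform stays in place.

```python
IDENTITY = (0.0, 0.0, 0.0)


def compose(a, b):
    """Compose two translation-only transforms (toy stand-in for TransformState)."""
    return tuple(x + y for x, y in zip(a, b))


def unique_states(parent, child_locals):
    """Return the set of distinct transforms a toy state cache would hold."""
    cache = {parent}
    for local in child_locals:
        cache.add(local)
        cache.add(compose(parent, local))  # net transform computed for rendering
    return cache


def flatten(child_locals):
    """Toy flattenLight: bake each local transform into vertices, leaving identity."""
    return [IDENTITY for _ in child_locals]
```

With a transformed room root at `(0.0, 0.0, 5.0)` and 100 walls offset by `(i, 0.0, 0.0)`, the model holds 201 distinct states (the root, 100 locals and 100 composed results); after flattening the walls it holds 2 (the root's transform and identity).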

Are you using the Bullet physics engine with thousands of rigid bodies, by any chance? I know that Bullet also creates transform states for synchronizing these transforms.

Thaumaturge · May 21, 2024, 4:05pm

Okay, that’s good.

I’ll probably start with implementing this, then.

Correct. I might still give it another shot, with everything else closed, however, as I do agree that it could be very useful indeed. (Especially with these flame-charts!)

I am–albeit not to “cTrav”, despite the naming there.

This is why I was asking where I’d see non-cTrav traversers in the output of PStats.

You see, I have a basic cTrav traverser for most purposes–it keeps the player and enemies out of walls, detects triggers, etc.

But I have a few cases in which I want to perform a more-specific bit of collision. An enemy might want to check for a valid location before summoning something, for example.

Perhaps the most ubiquitous case of this, however–and likely the main one that you’re seeing there–is that I use a separate traverser, operating on a separate mini-scene-graph, to detect what room each (applicable) game-object is in.

For context, the mini-scene-graph is pretty sparse, both geometrically and in terms of number of nodes, and if I recall correctly is only used with ray-casts.

That said, I’ve tried removing this last for testing purposes, and it only gained me about 1ms.

The other case–occasional traversals for specific purposes, like an enemy checking for a summoning location–is a little trickier to deal with.

I’ve given it some thought, and I could likely roll these into cTrav–but it means complicating the logic involved, and likely waiting a frame in order to be confident that traversal has happened.
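If the one-off probes were folded into the main traversal, the "wait a frame" part might look something like this pure-Python sketch: callers queue a request with a callback, and once the shared traversal has run, the whole batch is answered at once. `DeferredQueries` and `run_batch` are hypothetical; `run_batch` stands in for whatever code turns that frame's traversal results into one answer per probe.

```python
class DeferredQueries:
    """Hypothetical queue of one-off collision probes answered in one batch."""

    def __init__(self):
        self._pending = []  # (probe, callback) pairs waiting for the next traversal

    def request(self, probe, callback):
        """Queue a probe; callback(result) fires on the next resolve()."""
        self._pending.append((probe, callback))

    def resolve(self, run_batch):
        """Answer every queued probe with one call to run_batch(probes).

        run_batch must return one result per probe, in the same order.
        """
        batch, self._pending = self._pending, []
        if not batch:
            return
        results = run_batch([probe for probe, _ in batch])
        # Callbacks may queue new probes; those land in the fresh list
        # and are answered on the following resolve().
        for (_, callback), result in zip(batch, results):
            callback(result)
```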

Hmm… I’ll have to think about this, then, I feel.

I would still have thought that the flattening would have helped–after all, that would be fewer additional states to compound with the upper-level transform(s).

Oh wow no! I’m not using Bullet at all for this project, let alone in so intensive a way! ^^;

rdb · May 21, 2024, 4:58pm

Does the transform state count actually go down when you hide (most of) your scene with .hide()?

For what it’s worth, the master build of Panda improves performance of PStats considerably, also on the client side, so it may be worth figuring out at some point why your scene renders black. You may not have as many issues with applications crashing then.

Thaumaturge · May 21, 2024, 6:12pm

It does! By about 1400.

(From ~13700 to ~12300.)

I mean, at the very least I may want to upgrade to that version one day, so it’s likely enough something to sort out sooner or later! XD;

On another note, fun fact: the PStats Python profiler doesn’t work if you, say, leave the default Python profiler active through all of this testing, having completely forgotten that you hadn’t yet switched over to default execution… ^^;

I was quite confused for a while there as to where to look for the Python profiling results in PStats! XD;

That said, I got it working–and this time the impact wasn’t too bad! I wonder whether it wasn’t my IDE causing problems when I tried the PStats Python profiler in the past…

In any case, this has provided some very useful information, I do feel!

- It looks like I’m detaching some sounds when destroying projectiles–and that’s costing me some time.
  - This might be a case in which it would be better to copy sounds late, rather than attach them and then detach them…
- I may well want to combine my room-detection traversals into a single traversal
  - With a number of enemies on-screen, it looks like doing one traversal per enemy can add up to about a millisecond.
- A certain bit of enemy-logic is using NodePath.find–over-using it, perhaps. I might want to look for a better way.
- I’m using a custom Actor-class that automatically blends frames. This is looking a bit more costly than I’d realised.
  - For this game, given that it uses a semi-distant third-person perspective, I’ll likely just drop the custom Actor-class.

If I may, I’d like to make a feature suggestion: it looks to me like it could be very helpful to be able to scrub back and forth in a session’s history while within a flame-chart.

Now, I see the timeline view–but I’m having a little trouble orienting myself there, especially as I end up zooming out in order to move at a reasonable pace, which makes it difficult to see what the flame charts are doing.

What I really find myself wanting is to watch the flame-chart “animate” as it does when live, but rather than in response to the passage of time, doing so in response to my adjusting a slider at the bottom, or some such.

rdb · May 21, 2024, 8:16pm

Ah, yes—only one profiler can be active at the same time.

Yes, fair. The timeline view helps with this, but the flame graph has a nice consolidated view. It would be nice to be able to click a particular frame to view the flame graph for that frame, or a slider as you suggest (though that’s probably a fair bit more work to implement).

I personally find that using WASD to move around the timeline view is the most convenient. But your feature request is reasonable, feel free to file it on GitHub!

rdb · May 21, 2024, 8:17pm

Oh, that’s not very much! So the transform states are coming from elsewhere. The GUI, perhaps? What if you do something like render2d.hide()?

Thaumaturge · May 21, 2024, 8:40pm

That, at least, I think that you can do? In the timeline, if you right-click on a strip it provides the option to open a flame-graph, I believe.

That’s fair!

Good good–I’ll add it there then, I intend!

[edit]
It is there added!

github.com/panda3d/panda3d

PStats: Scrubbing Through Flame-Charts

Opened 08:52PM - 21 May 24 UTC by ArsThaumaturgis · enhancement

## Description

I've been using the new version of PStats, and I really really like it! ^_^

Of the things that I so like, two are relevant to this feature-request: First is the flame-chart view, which I find really illustrates well how the components of a frame relate to each other, and how they change over time. And second, is the fact that one can examine a session after the program has exited, allowing for analysis after-the-fact.

To that latter end, it looks to me like it could be very helpful to be able to scrub back and forth in a session’s history while within a flame-chart.

Now, I see the timeline view–but I’m having a little trouble orienting myself there, especially as I end up zooming out in order to move at a reasonable pace, which makes it difficult to see what the flame charts are doing.

What I really find myself wanting is to watch the flame-chart “animate” as it does when live, but rather than in response to the passage of time, doing so in response to my adjusting a slider at the bottom, or some such.

## Use Case

The one that prompted this thought in me, as I recall, was a situation in which the major components of a frame changed over time. One set of elements would predominate, then be replaced by another. While this was happening live, it was hard to catch what was going on. I suppose that I could have paused the output--but I didn't think of it. Further, that doesn't help much with comparing and contrasting different parts of the run.

But going to the timeline, I found it hard to tell which parts were which: When zoomed out enough for (mouse-)navigation, the flame-charts all looked rather similar; when zoomed in enough to distinguish the charts, (mouse-)navigation was too slow. On top of that, there were gaps between charts (from dropped frames, I gather), and seeing the charts side-by-side made the changes over time less clear to me than seeing them animate one frame to the next.
What _would_, I think, work would be to show the flame-chart as usual, and to then allow the user to scrub back and forth--I'm imagining a timeline-slider at the bottom--loading frames to the chart in-place. This should allow quick navigation regardless of zoom-level (although perhaps the slider could be zoomed independently), and the animation thus produced should, I think, be easier (for me) to parse.

[/edit]

Hmm, good point!

Trying that, I get a count of 13 575 transform-states. So a small reduction, but nothing significant, I feel…

Hmmm… That did prompt a thought, however:

I tried removing all of my environmental collision-nodes. This results in ~10 500 transform-states–a reduction of about 3 200.

Perhaps the issue isn’t just one thing, but a handful of sets of nodes that carry transforms…

Thaumaturge · May 22, 2024, 8:36pm

Okay, update:

First of all, I think that I’ve made some improvements!

- The repeated traversals previously seen have been to a large degree combined into a single, larger traversal
- I’ve removed most usages of my custom Actor-class
  - I’ve left in the potential for its use by a limited number of objects, especially larger ones that might benefit more saliently from animation-blending
- I’ve replaced the logic that was using “NodePath.find”
  - It now involves a class-level “registry” of relevant objects that can be checked instead
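A minimal version of such a registry, assuming objects register themselves on creation and unregister in their cleanup code. `Totem` is a hypothetical class; a `weakref.WeakSet` is used so that an object that is never explicitly destroyed doesn't linger in the registry once nothing else references it.

```python
import weakref


class Totem:
    """Hypothetical game object that enemy logic needs to look up."""

    # Class-level registry shared by all instances; weak references mean a
    # forgotten object drops out once nothing else holds it.
    _live = weakref.WeakSet()

    def __init__(self, name):
        self.name = name
        Totem._live.add(self)

    def destroy(self):
        # Call from the object's normal cleanup path.
        Totem._live.discard(self)

    @classmethod
    def live_instances(cls):
        # Snapshot as a list, so callers can destroy objects while iterating.
        return list(cls._live)
```

Code that previously searched the scene graph with something like `render.find("**/totem*")` can loop over `Totem.live_instances()` instead, which avoids walking the scene graph each time.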

The result is that the frame-time remains noticeably lower when several enemies are active at once!

Conversely, I’m afraid, the frame-time is actually a bit higher when no enemies are near. I do think that I know the cause of this, however, and have a potential fix in mind…

I have yet to implement any changes related to sounds or effect-copying…

Next, I think that I’ve identified more transform-states. The list that I have at the moment looks as follows:

- Environmental collision objects
  - Lots of these
  - Generally take the form of CollisionCapsules
- Visible geometry for the environment
  - Again, lots!
- Visible “content” geometry (decor, etc.)
  - Quite a few of these, I think
- Breakable objects
  - Not as many of these, but…
  - Each is associated with a collision-object and two visible-geometry objects, one of the latter of which is generally hidden; all of these seem to have transform states
- “Floor” geometry
  - Often just one per room, I think.
  - Essentially defines both the geometry underfoot and the collider that’s used for room detection
  - The latter of those is held in a separate mini-scene-graph for purposes of room-detection
- “Door” geometry
  - Not all that many of these
  - But each door may have more than one part, at least one of which will have a transform-state
- “Blockage” geometry
  - Essentially the same as “door” geometry
    - (It’s even handled similarly in code)

Now, I’m guessing that I can ignore doors and blockages–I doubt that they’re common-enough right now that they’re causing much trouble.

“Floors” I might want to flatten, perhaps.

Breakables are a bit awkward, as the colliders aren’t actually attached to the visible geometry; they’re just at the same location. We’ll see…

As for the first three, together they seem to account for nearly 5000 states, if I’m interpreting the data correctly.

[edit]

A short update: I’ve now addressed both of the above–the higher frame-time without enemies and the time used in copying (visual) effects; I decided to eschew addressing the copying of sound-effects.

And both of these have been a big help, dropping my frame-times quite pleasingly, I feel!

Now, neither of these helps with the transform-state issue, so there is still more work to be done–but still, these changes have been worthwhile, I feel.