Hi. I came across some information that Bullet physics can be multithreaded, and I’m wondering how this can be done within panda3d. I noticed that physics simulation in panda3d is really slow: I can only simulate about 3600 box rigid bodies (each with a colored cube model attached) before dipping below 30fps. (Note that I have an i7 14700k cpu, so this seems like a low count.)
I’m just trying to look for ways to optimize dynamic physics objects. If I’m doing something wrong or am missing something please let me know.
Also I think I saw something about bullet now being able to offload some calculations onto the gpu, like how PhysX can.
Any help would be greatly appreciated.
Edit: Just tested on Godot and got more than 7400 cubes before dipping below 30fps.
Are you sure that the problem is coming from Bullet?
After all, 3600 nodes in the scene-graph is… a lot. o_o I could easily see the bottleneck in fact being in rendering or culling, rather than in physics.
(It’s true that modern graphics cards can easily handle tons of vertices… but, as I understand it, they prefer those vertices in a smaller number of batches.
So a single, very-high-poly model will tend to result in better performance than a great many lower-poly models, even if the total number of polygons is the same in both scenarios.)
Hi. I did some more tests without rendering the cubes, using just the box rigid body shapes, and only got about 200 more than the 3600 nodes before hitting 30fps, so I don’t think it’s a culling bottleneck.
As for the single, very-high-poly mesh, that’s not what I’m testing for. I’m doing a classic spilling-boxes-into-one-pile test to stress test how many bodies the engine can handle. Compared to Godot, which is considered very weak on the physics side, panda seems to be much weaker at handling this. Bullet physics is known to be better than Godot physics, so I’m confused as to why this isn’t the case in my tests. Perhaps I’m setting something up wrong?
Hmm… I’m honestly not sure of what Panda does when there are nodes in the scene-graph that aren’t visible, and that aren’t parents of things that are visible. It’s possible that the culling traverser is still visiting those 3800 nodes behind the scenes.
For a clearer view, I’d suggest looking at PStats–and I recommend the new PStats, if you’re willing–to get a more direct idea of where the bottleneck lies.
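For anyone following along, here is a minimal sketch of the configuration side of connecting to PStats. It assumes a Python application built on ShowBase, and that the PStats server program is already running on the same machine before the application starts. The host and port values are assumptions (5185 is the default port as far as I know); adjust them to your setup.

```text
# Config.prc fragment (a sketch, not a definitive setup)
# Ask ShowBase to connect to a running PStats server at startup.
want-pstats true
# Assumed values: the server runs locally on the default port.
pstats-host 127.0.0.1
pstats-port 5185
```

In a C++ application you would instead call PStatClient::connect() yourself once the server is up.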
That wasn’t quite what I meant.
What I was saying was that, even though modern graphics cards are powerful, they’re powerful at pushing lots and lots of polygons, not at pushing lots and lots of individual objects. Those are two different things, and you’re doing the latter.
Still, as you say, it looks like it’s not a rendering issue–although as I said above, it could still be a culling (or, I suppose, other node-handling) issue.
Well, even if we presume that the issue is a physics issue, define “better”. (I really don’t know what people are saying about the comparison between Godot and Bullet; I’m genuinely asking.)
Does that mean that Bullet is faster than Godot physics? That it’s more accurate? That it’s more precise? That it handles tricky edge-cases better? That it handles common-but-sometimes-difficult circumstances like stacking better?
All of these (and perhaps more) could count as it being “better”. And some of them could potentially result in it being slower–at least in certain circumstances.
even if we presume that the issue is a physics issue, define “better”. (I really don’t know what people are saying about the comparison between Godot and Bullet; I’m genuinely asking.)
By “better”, I mean performance. I do expect Bullet’s speed to vary depending on the engine, but I don’t expect it to be half as fast in panda3d as in Godot.
Does that mean that Bullet is faster than Godot physics? That it’s more accurate? That it’s more precise? That it handles tricky edge-cases better? That it handles common-but-sometimes-difficult circumstances like stacking better?
Bullet is about the middleground for physics engines and should perform close to Godot’s.
According to this GitHub repo, the author ran a test and found that Bullet was only 10% slower than Godot’s built-in physics, but in my tests, Bullet only manages about half of what Godot is doing. https://github.com/omggomb/GodotPhysXPrototype
(if you scroll down you can read the data collected)
I do know that Bullet can be compiled to work with multithreading, and I might experiment with recompiling panda3d to work with Bullet multithreading. I’m not really experienced in this matter, so it might take me a while to figure out (as I’ve only tinkered with building projects with the panda3d SDK in C++).
edit: If anyone else has some optimization ideas, let me know please.
Update: When removing alpha (Transparency Attrib) I pushed the count close to 4300 colored cubes. A bit better but still not close enough to Godot’s attempt.
For context, I would like to get to about 6500 cubes or more. That’s about 10% off Godot’s count.
In which case I stand by my recommendation that we first test to see whether the problem actually is with the physics system, and not some other element.
(Again, I suggest looking at PStats to determine this.)
(And this may well tell us what optimisations to potentially look at.)
Your test-program gives about half the performance that the Godot test-program gives–but I don’t think that we’ve established that it’s Bullet that’s the main factor in that difference.
Which suggests that the bottleneck may well be in something other than Bullet itself, since removing a bit of rendering overhead improved performance.
Put very simply, the frame rate here plays the role of a tick rate, by analogy with a game server. The higher the frequency, the more calculations per second, and accordingly the greater the load.
Hmm, rendering may be the bottleneck in this scenario. I’m not really sure how panda3d handles its multithreaded rendering, but it looks like it doesn’t handle nodes in separate threads and keeps to one thread (coming back to the many-objects-in-the-scene issue). As I understand it, panda3d renders in stages that can be separated into threads (“Draw/Cull”) and doesn’t go any deeper, such as managing nodes in multiple threads. I might be entirely wrong on this matter and am happy to be corrected. But the issue still stands that the engine can’t handle physics and nodes as well as others.
The issue might also lie in the graphics API: as I understand it, OpenGL is effectively bound to a single core. That is slower on newer hardware, where graphics APIs like Vulkan (which Godot now uses) can spread work across multiple cores, which in turn increases performance (and could be the reason why Godot can handle more cubes).
As for Bullet, unlike Godot, panda3d only seems to run the physics engine in a single thread and doesn’t spread the calculations over multiple cpu cores. Theoretically, if Bullet were built with multithreading enabled, would the physics benchmark improve?
As for pstats, it seems to fail to run, as I am getting this error:
:net(error): Unable to open TCP connection to server [::1]:5185
:pstats(error): Couldn't connect to PStatServer at localhost:5185
Thanks for the solution to the pstats problem. This would be the first time I’ve had to use the function.
When running your example, I indeed get an incredible performance boost and had 14,000 frames when the cubes landed and 45 frames when the cubes were falling. When putting your example into a live panda3d application, I also saw an increase in performance and reached the goal of 6500 rigid body cubes computed. I guess it came down to how many nodepaths were added to the scene.
However, this still hasn’t achieved the goal of rendering cubes attached to the rigid bodies. With this example, that won’t be possible, because the code does not add the rigid body nodepaths to the scene, where geometry could be attached to them.
Both look identical, and it looks like the cpu is the bottleneck in this situation. I think that the number of nodepaths in the scene is causing the slowdown. This would explain why, even when I remove the colored cubes from the rigid bodies but keep the nodepaths, there isn’t much of a difference. When I make the nodepaths but don’t parent them to the scene, the performance hits the 6500 body mark. When I do add the nodepaths to the scene, the performance hit is seen.
Also, coming back to this question: are you suggesting that the physics shouldn’t be synced to the rendering? If it shouldn’t, does that actually do anything to increase performance? When I ran the DoPhysics() function in another thread, I actually got worse performance.
Now that we definitely know that the number of nodes is the bottleneck, what can I do to increase performance while achieving the goal of rendering 6500 colored cubes? If Godot has found a way to allow this many nodepaths in the scene and hit 7400 cubes, how can panda3d do this?
I think Godot uses multithreading to achieve the calculations and some other robust optimization tricks. The question is how do I get this into panda3d?
I think the Godot documentation says that the developers chose to reduce the quality of the physics simulation. I suggest not synchronizing physics with rendering, just like Godot.
It says here that the physics is updated a fixed number of times per second.
I’m just guessing that on the Godot side you’re using _physics_process()
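To make the idea of a fixed physics rate concrete, here is a minimal sketch in plain Python of the usual accumulator pattern: frame time is collected, and the simulation is advanced in fixed-size steps, with a cap on steps per frame. The function name `plan_fixed_steps` and its parameters are hypothetical, made up for illustration; this is not Godot’s or Panda3D’s actual code. (Panda3D’s BulletWorld.doPhysics also accepts a maximum substep count and a step size after the dt argument, which plays a similar role.)

```python
def plan_fixed_steps(accumulator, frame_dt, step=1.0 / 60.0, max_substeps=4):
    """Decide how many fixed-size physics steps to run this frame.

    Returns (steps, new_accumulator). Hypothetical helper for illustration.
    """
    # Collect the time that has passed since the last frame.
    accumulator += frame_dt

    # Consume that time in fixed-size slices, up to the per-frame cap.
    steps = 0
    while accumulator >= step and steps < max_substeps:
        accumulator -= step
        steps += 1

    # If the cap was hit and a backlog remains, drop it so that a slow
    # frame cannot snowball into ever more physics work.
    if accumulator >= step:
        accumulator = 0.0

    return steps, accumulator


if __name__ == "__main__":
    acc = 0.0
    # Simulated frame times in seconds (a slow frame in the middle).
    for frame_dt in (0.016, 0.016, 0.1, 0.016):
        steps, acc = plan_fixed_steps(acc, frame_dt)
        print(f"frame_dt={frame_dt:.3f}s -> {steps} physics step(s)")
```

With this pattern the physics cost per second is bounded by the step rate and the cap rather than by the render frame rate, at the price of the simulation slowing down when the cap is hit.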
To increase the number of rendered objects, see “The Rigid Body Combiner” in the Panda3D Manual.
This reduces the amount of node data sent to the graphics card per frame.
As far as I know, Panda also has a multithreaded rendering mode. To enable it, add this line to the configuration file:
threading-model Cull/Draw
I have attempted to use the rigid body combiner, and it provides no performance benefit. In fact, it causes most of the cubes to disappear. A weird issue.
I have tried doing this in the past, and there is a forum page out there with some info where I asked about using the combiner on Bullet objects. The answers there specifically said that the rigid body combiner is not designed for physics nodepaths. A lot of weird bugs happened when I used it, and this is one of them that I still haven’t quite solved to this day.