I have a scene with around 8000 cubes and it only gives me 5 FPS. I expected no drop, that is, 60 FPS. Using PStats, I find that it is App that takes too much time, but my app draws the scene only once and does nothing after that; PStats shows a lot of waiting time. I thought it might be because all those cubes come from the same source file: does Panda3D re-read the file every frame, making it laggy as the same file is accessed 8000 times per frame? Also, using sys.getsizeof(), I find that each cube only takes 40 bytes of space, so less than 5 MB in total. I am using a GT 1030, but I expect that that isn’t the bottleneck.
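A side note on the 40-byte figure: sys.getsizeof() reports only the shallow size of the Python object it is handed, not the size of anything that object refers to. The sketch below uses a hypothetical CubeHandle class as a stand-in for a lightweight wrapper object; it is not Panda3D's NodePath, and the vertex payload size is an illustrative assumption.

```python
import sys


class CubeHandle:
    """Hypothetical lightweight wrapper that only points at mesh data."""

    __slots__ = ("vertex_data",)

    def __init__(self, vertex_data):
        self.vertex_data = vertex_data


# Assumed payload: 24 vertices at 32 bytes each (illustrative numbers only).
vertex_data = bytes(24 * 32)
handle = CubeHandle(vertex_data)

# getsizeof() measures the wrapper itself, not the data it references.
shallow_size = sys.getsizeof(handle)
payload_size = sys.getsizeof(vertex_data)

print("wrapper:", shallow_size, "bytes; payload:", payload_size, "bytes")
```

So a small per-object number here probably says little about how much work each cube costs the renderer.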
Is each of those cubes a separate node in the scene-graph? If so, then I believe that eight thousand nodes is an awful lot for the system, and for a graphics card, to process.
To explain, my understanding is that modern graphics cards are capable of handling a ridiculous number of polygons, as I daresay that you’re aware. However, they’re not good at handling a large number of individual objects–regardless of the polygon count of those objects. So a single mesh with a huge number of polygons is fine, but a huge number of meshes each of only a few polygons is not.
(I’m not sure offhand of whether there’s also a bottleneck on the engine side, processing all of those objects.)
Thus you might find that your scene performs better if you merge your various cubes together. If they’re static geometry, and not expected to change at all, then perhaps careful use of the various “flatten” methods might help. If they’re expected to move about, then perhaps the Rigid Body Combiner might work for you. See this page for more information.
If something more complex is called for, then more-complex methods might be in order–but with those I myself am not familiar, I’m afraid.
Regarding PStats showing that time is being spent in “app”, did you activate GPU profiling? If not, then the graph may not have been entirely accurate to your bottleneck. See this section of the manual-page for PStats for more information.
Each cube is a separate node. Using flattenStrong() increases FPS to 7, and RigidBodyCombiner to 11ish. I even used LODNode, but got worse results (maybe because the cubes have little detail anyway?). Here is what I understand: each object has a mesh defining the object, so multiple objects means multiple meshes, overloading the GPU. Suppose I had those 8000 cubes as one object from the start: would it render better (I assume there are fewer meshes)? Is there a way in Panda3D itself to combine these cubes and reduce the number of meshes, as if it were originally one object? Also, in many games (like Minecraft), each chunk has 65536 blocks, and loading 10 chunks still gives you a whopping 60 FPS. How can I get this performance? How do I reduce the number of meshes?
Hmm… In that case, further investigation might be called for into the exact nature of the bottleneck. Have you tried PStats with GPU-timing, as I mentioned above?
That said, perhaps try “flattenMedium”, instead–it may be that “flattenStrong” is going a little too far.
There is also one other consideration: You say that your cubes are loaded from a model. The “flattenStrong” method (and I presume “flattenMedium” and “flattenLight” likewise) doesn’t go past “ModelRoot” nodes, which may be generated as part of loading a model from a file.
More information and a means of dealing with the issue can be had on this page, towards the bottom of the page, I believe.
In this case that doesn’t surprise me overmuch: after all, LODNode results in there being even more nodes in the scene-graph, it seems to me.
That is correct to my understanding, indeed.
Possibly. These things get a little complicated: Reducing the number of objects also reduces the amount of data that the engine can cull away. Thus it’s sometimes less a matter of minimising the number of objects than of finding the correct balance between the number of objects and the amount that can be culled.
That’s basically what “flattenStrong” and the Rigid Body Combiner are intended to do, I believe.
I don’t know the internals of Minecraft and other such games, I’m afraid.
However, I strongly suspect that they don’t store each cube as a separate node, but instead generate sections of world-geometry based on the underlying cube-data, resulting in a relatively-small number of nodes each representing multiple cubes. When the player destroys or adds a cube, the relevant model would then be regenerated. This, however, is speculation on my part.
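To make that guess a little more concrete, here is a minimal pure-Python sketch of the idea: keep the world as block data, and emit only the faces that are not hidden by a neighbouring block. The names (FACE_OFFSETS, visible_faces) are made up for illustration, and this is not taken from Minecraft or from Panda3D.

```python
# The six axis-aligned neighbour offsets, one per cube face.
FACE_OFFSETS = [
    (1, 0, 0), (-1, 0, 0),
    (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1),
]


def visible_faces(blocks):
    """Return (position, normal) pairs for faces not covered by a neighbour.

    `blocks` is a set of integer (x, y, z) positions of solid cubes.
    """
    faces = []
    for (x, y, z) in blocks:
        for (dx, dy, dz) in FACE_OFFSETS:
            # A face is only worth drawing if the adjacent cell is empty.
            if (x + dx, y + dy, z + dz) not in blocks:
                faces.append(((x, y, z), (dx, dy, dz)))
    return faces


# A solid 4x4x4 chunk of 64 cubes.
chunk = {(x, y, z) for x in range(4) for y in range(4) for z in range(4)}
faces = visible_faces(chunk)
naive_face_count = len(chunk) * 6

print(len(faces), "exposed faces instead of", naive_face_count)
```

The resulting face list would then be written into a single mesh per chunk, and only that chunk's mesh would be rebuilt when a block is added or removed.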
Perhaps there are articles out there describing the methods used by such games? It might be worth looking up!
You should call clearModelNodes() on your cube model before flattenStrong will have any meaningful effect.
However, time spent in App is puzzling. (GPU timing won’t really help drill this down, since it’s in App). Which part of App? You can just double-click the App label in PStats to zoom further in. It might be something silly, like some initialization code that is accidentally run in a task, or something like that.
Yep, it shows less than 1ms spent on frame.
None of the flatten methods help; all of them reduce the frame rate.
What does that do? I tried it, only got 2 to 4 FPS.
These are the results from PStats:
App jumps to its maximum when you do things in the scene (looking around, etc.), as shown at the beginning. Double-clicking Wait shows that all the wait time is for thread-sync; I am using threading-model Cull/Draw. Double-clicking App shows that all the time is spent in cycle.
By the way, I realized my mistake: I was using RigidBodyCombiner the wrong way, reparenting the RBC to the cube. Fixing that is better, at 45ish FPS. Can I still improve it?
RigidBodyCombiner does the same thing as flattenStrong(), except it does so continuously (as your objects move around). If you don’t need to move your objects around individually, you don’t need to use RigidBodyCombiner. It’s a very specific tool that usually makes things slower, so I don’t recommend it unless there is no other alternative.
If you’re getting better performance with RigidBodyCombiner than with flattenStrong, then flattenStrong is not being used correctly. You are calling clearModelNodes() on the cube model after loading it from file, and you are later calling flattenStrong() on a node that is a parent of all the cube instances?
I did not realise you were using the threaded pipeline. Is it actually offering you improvements in FPS? It may be easier to profile your application when it’s still running in single-threaded mode, and then later switch to the threaded pipeline.
I think what PStats is telling you is that the main thread is just waiting for other threads to complete. That just means the actual time is spent in other threads. You need to use the menu at the top of PStats to switch to the thread that is actually taking up all your time.
Ah, my mistake–I think that I remembered that without GPU timing PStats values could indicate GPU-time elsewhere, but mistakenly thought that it might appear under “App”. My apologies for that!
In short, I believe that it addresses the issue that I mentioned in the quote above, regarding “ModelNodes”: It makes a change to ModelNodes that allows the “flatten” methods to take effect beyond them, without being stopped. This should allow the “flatten” methods to flatten together such models.
I was calling flattenStrong() on each node individually, rather than parenting them all to one NodePath and then calling flattenStrong() on that. That was the problem.
So I would think I can add more features to the app and maintain the same FPS. I agree on using 1 thread for development, but sometimes I get horrible frame rate, that is why I had set it to multithreaded.
Thanks a lot, much better frame rate now.
Now I am getting 60 FPS even with 100000 blocks!