Performance problems with many nodes/models

wesleystjohn · September 2, 2008, 9:54pm

I am trying to simulate a school of fish in panda. I have a fairly simple fish model with a simple texture and no animation. I would like to have up to 1000 models in the school (and at least 200-300). Doing this the naive way results in an extremely slow framerate. I have read about instancing, flattening, using copyTo(), and using point sprites or a particle system, but I don’t think these techniques will work because I need each model to be able to translate and rotate independently.

To eliminate possible causes of the slow-down, I took out the physics equations governing the schooling and replaced the behavior with very basic setPos and setHpr calls. I also replaced all the fish models with a simple untextured sphere. These changes sped things up only slightly.

I used pstats and found that the majority of the time was spent not in cull or draw, but in App:ShowCode:General. I added some collectors in the updates and found that things like simply rotating the fish models took at least 15ms (this was with 200 models). The game runs at about 6-10 fps with 200 fish.

Here is the result of my analyze() call on the parent node of the fish:

1211 total nodes (including 0 instances); 0 LODNodes.
411 transforms; 16% of nodes have some render attribute.
200 Geoms, with 200 GeomVertexDatas and 1 GeomVertexFormats, appear on 200 GeomNodes.
39200 vertices, 39200 normals, 0 colors, 39200 texture coordinates.
GeomVertexData arrays occupy 7K memory.
GeomPrimitive arrays occupy 1K memory.
20800 triangles:
  20800 of these are on 9200 tristrips (2.26087 average tris per strip).
  0 of these are independent triangles.
1 textures, estimated minimum 64K texture memory required.
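Dividing the totals in this output by the 200 fish shows how light each individual model is. The numbers below are copied straight from the analyze() output. As the replies explain, the triangle count is small; the cost comes from having about 200 separate Geoms to send to the card every frame.

```python
# Totals copied from the analyze() output above.
fish_count = 200
total_vertices = 39200
total_triangles = 20800
total_strips = 9200

# Per-fish figures: each fish is one Geom on its own GeomNode.
vertices_per_fish = total_vertices // fish_count
triangles_per_fish = total_triangles // fish_count
strips_per_fish = total_strips // fish_count

# Matches the "2.26087 average tris per strip" line.
tris_per_strip = total_triangles / total_strips

print(vertices_per_fish, triangles_per_fish, strips_per_fish, round(tris_per_strip, 5))
# -> 196 104 46 2.26087
```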

I have an Nvidia GeForce 7800 GTX.

So, first: is it reasonable to expect to render, move, and rotate several hundred models independently on screen at the same time without dropping below 30 or 40 fps?
If so, what advice do you have to speed things up?

Thanks!

ThomasEgi · September 2, 2008, 10:01pm

You should know about some hardware-dependent limitations.
In the last several years, graphics card performance has improved a hell of a lot,
but cards still suffer from the same problem they had 8 or even 10 years ago.

While graphics cards can handle literally millions of triangles with ease, their performance totally breaks down if you separate those triangles into many small pieces.
200 GeomNodes would be way too many pieces for a graphics card to process efficiently.
Your only real chance is to reduce the number of nodes that are visible on-screen at the same time; usually about 40 to 60 nodes is a good limit.

Now for the good news: as it happens, you’re not on your own. Panda provides useful stuff for you.
Have a look at this -> https://www.panda3d.org/apiref.php?page=RigidBodyCombiner
I never really used it, but from the description it sounds like the perfect thing for you.

Hope this helps make your fps jump a “little”.

drwr · September 2, 2008, 10:23pm

What Thomas says is absolutely right.

Also, to help your ShowCode problem, try putting:

transform-cache 0

in your Config.prc file. For most applications, Panda’s transform cache is an overall performance win, because it minimizes the number of times a particular transform needs to be recomputed. But if you have many hundreds or thousands of transforms all updating every frame, you might spend more time updating the cache than you gain from the cache benefits. So turning off the cache in unusual cases like this can be a big help.

David

Thaumaturge · September 2, 2008, 10:28pm

Are you sure that instancing won’t help? Each instance has its own parent nodepath, which, I think, allows it an independent position and rotation. If that weren’t the case, surely instancing wouldn’t be very useful in general, since all of your instances would be in the same place, facing the same direction.

The trick, I think, is to not move your “main object” - the one from which the others are instanced. Hide that somewhere, and then move and rotate only instances.

drwr · September 2, 2008, 10:41pm

Instancing won’t help.

First, let me correct a misunderstanding about instances and transforms: all instances share the same transform, because they all share the same node. It is true they each have their own, different NodePath, but the transform is stored on the underlying node, not the NodePath. (Nothing is actually stored on a NodePath, since that’s just a handle–basically, a fancy pointer–to a node, which is where all the real data is stored.) However, that doesn’t mean that instances must all share the same net transform. Normally, to use instancing effectively, you would instance your object to many different, unique nodes, and then set a different transform on each different parent node.

Now, let me correct the other common misunderstanding about instances: the graphics card still has to render all of them individually. The graphics card doesn’t care that all of your fish are really copies of the same object. If you have 1,000 fish, they still need to be rendered with 1,000 separate calls to the graphics card. It doesn’t matter whether they all happen to share the same vertex buffer; it’s those 1,000 separate calls that kill you.

So scene graph instancing is hardly ever a performance benefit. There is another use of the word instancing, which isn’t related to scene graph instancing. This is a new idea involving writing a vertex shader to replicate 1,000 copies of your fish directly on the card. Then you can theoretically render your 1,000 fish with just one or two calls to the graphics card, which is of course much faster. But you’ll have to write a pretty clever shader to do this, and at the moment at least, Panda won’t provide you any assistance in this task.

David

drwr · September 2, 2008, 10:48pm

Let me also be sure that I don’t paint too grim a picture. The RigidBodyCombiner is designed to solve exactly this problem, and I think it should solve it quite well, especially in conjunction with disabling the transform-cache. I’d be very surprised if the two of these together don’t solve your frame rate issues completely.

David

Thaumaturge · September 2, 2008, 10:52pm

Aah, my mistake, then. (With regard to the transform, I believe what you said is more or less what I meant. In retrospect I should have said “node” instead of “nodepath”, but that comes from my own misunderstanding of the relationship between nodes and nodepaths.) Thank you for the explanation. ^^;

wesleystjohn · September 3, 2008, 2:28pm

All right, thanks for all the replies.
I’m running into some conflicts with the RigidBodyCombiner. Here is the class structure I currently have:

BaitBall(NodePath, FSM) - has a list of BaitFish, controls the state (offscreen, schooling, scared, etc)

BaitFish(NodePath) - has a model and an update function to set position, hpr, etc.

So I tried adding a RigidBodyCombiner to the BaitBall and parenting the BaitFish to that. The RBC can add only PandaNodes through the addChild() method, right? I guess I am unclear on the different kinds of nodes (NodePath, PandaNode, ModelNode, etc.). I tried to add the BaitFish’s model to the RBC, but evidently that is a NodePath, which is not allowed.

Are there any examples using a RigidBodyCombiner?

drwr · September 3, 2008, 2:41pm

RigidBodyCombiner is just another kind of PandaNode. All of the standard node interfaces apply. Thus, the easiest way to add nodes to a RigidBodyCombiner is to wrap a NodePath around it, and then use the standard reparentTo() interfaces like anything else.

e.g.:

rbc = RigidBodyCombiner('rbc')
rbcnp = NodePath(rbc)
rbcnp.reparentTo(render)

for fish in fishList:
  fish.reparentTo(rbcnp)

rbc.collect()

David

drwr · September 3, 2008, 2:45pm

Watch out for the NodePath vs. node confusion. It’s a common confusion for Panda newcomers, and is probably the single most confusing thing about Panda’s interface.

In a nutshell: there are lots of different node types. PandaNode is the most general type, the base class. Then there are subclasses of PandaNode, like ModelNode, GeomNode, TextNode, Camera, RigidBodyCombiner, Character, and the list goes on.

Then there is NodePath, which is a handle, or a kind of pointer, to a node. It can be a handle to any of the above node types. It also contains a sense of the full path from the root of the scene graph (e.g. render) to this node, which is where it gets its name.

All of the fundamental scene graph operations are defined at a high level on NodePath. These are things like setPos(), getHpr(), reparentTo() and so on. There are also lower-level variations on these same operations which can be performed directly on the PandaNode class, like addChild(), but these are rarely used, because the NodePath versions are so much better.

David

wesleystjohn · September 3, 2008, 3:39pm

Thanks for the reply. I tried that with unexpected results…

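    def setupFishes(self):
        rbc = RigidBodyCombiner('rbc')
        rbcnp = NodePath(rbc)
        rbcnp.reparentTo(self)

        for i in range(200):
            # random position (assumes "import random" at module level)
            pos = Vec3(random.uniform(-100, 100),
                       random.uniform(-100, 100),
                       random.uniform(-100, 100))

            f = loader.loadModel('fish')
            f.setPos(pos)
            f.reparentTo(rbcnp)

        rbc.collect()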

What I get is one huge fish…?

drwr · September 3, 2008, 4:48pm

This works fine for me:

from direct.directbase.DirectStart import *
from pandac.PandaModules import *
import random

rbc = RigidBodyCombiner('rbc')
rbcnp = NodePath(rbc)
rbcnp.reparentTo(render)

for i in range(200):                                 
    pos = Vec3(random.uniform(-100, 100),
               random.uniform(-100, 100),
               random.uniform(-100, 100))

    f = loader.loadModel('smiley.egg')
    f.setPos(pos)
    f.reparentTo(rbcnp) 

rbc.collect()

I get a cloud of smileys. Something wrong with your random Vec3 computation, maybe?

wesleystjohn · September 3, 2008, 6:27pm

No, the problem is with the model for some reason. Your code worked for me with 200 smileys. I replaced ‘smiley.egg’ with some of my models and I just get one big model in the middle. About half of my models work like smiley, and the other half do not.
I added an update task, and all the nodes are still there, printing out their random positions (which I change every frame), but the model shown on the screen doesn’t move!
If I attach the models to render, they all show up and scatter around… weird.

Additionally, in this small example (using smiley), using the RigidBodyCombiner actually slows things down significantly. After adding the update task (and running with a lot of other apps open) it runs at about 13.5 fps with the RBC, but about 19.5 when the models are parented to render.

Here is the code:

from direct.directbase.DirectStart import *
from pandac.PandaModules import *
import random
from direct.task import Task

rbc = RigidBodyCombiner('rbc')
rbcnp = NodePath(rbc)
rbcnp.reparentTo(render)
fishes = []

for i in range(200):                                 
    pos = Vec3(random.uniform(-100, 100),
               random.uniform(-100, 100),
               random.uniform(-100, 100))

    f = loader.loadModel('BaitFish')
    f.setPos(pos)
    f.reparentTo(rbcnp)  # shows one model, if reparented to render, shows all of them
    fishes.append(f)

rbc.collect() 

def update(task):
    # taskMgr passes the Task object as the only argument
    for f in fishes:
        pos = Vec3(random.uniform(-100, 100),
                   random.uniform(-100, 100),
                   random.uniform(-100, 100))

        f.setPos(pos)
        print("fish at %s" % f.getPos())
    return Task.cont

taskMgr.add(update,"update")
run()

I can send you the egg file if you want to try it…

drwr · September 3, 2008, 6:52pm

Hmm, I see it too. There appears to be a serious bug in RigidBodyCombiner. I’ll look into it. Why don’t you go ahead and email me the egg file (my email address is available through the forum software) so I can make sure I fix it for that model too?

David

drwr · September 3, 2008, 7:37pm

OK, thanks for the model. I found my bug, but it turns out this wasn’t your problem. What was burning you is that your model was converted as an animated model, not a rigid model, and thus isn’t suitable for RigidBodyCombiner. (Arguably, this is also a bug in RigidBodyCombiner–that it behaves so badly in this case. I’ll see about fixing that bug, too.)

Anyway, there’s an easy workaround. Either re-convert your fish model without the “-a model” option (or with “-a none” instead, which is the default), or simply hand-edit the egg file and comment out the <Dart> flag and the hierarchy of <Joint> entries.

Note that there is an enormous difference between “-a model” and “-a none”. The former converts a model with a hierarchy of joints for soft-skinned animation via the Actor class; the latter converts a model as a collection of rigid geometry for just about anything else. Converting a model with the incorrect option for your purposes is a very common mistake.

David
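Since the converter option leaves its mark in the egg file, a quick text check can catch this mistake before a model reaches the RigidBodyCombiner. The helper below is a hypothetical heuristic, not a Panda3D API: it flags any `<Dart>` entry whose value is something other than `0`/`none`, or any `<Joint>` entry, which is what an animated (`-a model`) conversion typically produces. It only scans text, so it needs no Panda3D install.

```python
import re

# Hypothetical heuristic patterns for egg syntax; not part of Panda3D.
DART_RE = re.compile(r"<Dart>\s*\{\s*([^}\s]+)\s*\}", re.IGNORECASE)
JOINT_RE = re.compile(r"<Joint>", re.IGNORECASE)


def looks_animated(egg_text):
    """Return True if the egg text looks like an animated-model conversion."""
    darts = [value.lower() for value in DART_RE.findall(egg_text)]
    has_dart = any(value not in ("0", "none") for value in darts)
    has_joints = JOINT_RE.search(egg_text) is not None
    return has_dart or has_joints


rigid_egg = """
<Group> fish {
  <Polygon> { }
}
"""

animated_egg = """
<Group> fish {
  <Dart> { 1 }
  <Joint> root { }
}
"""

print(looks_animated(rigid_egg), looks_animated(animated_egg))
```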

wesleystjohn · September 3, 2008, 9:14pm

David,
Originally we thought we would need normals and animation, and that is what the modelers gave us (we outsourced most of the art assets for this project). Later, it became clear that the number of fish and their proximity and small size would not necessitate animation. I am ignorant about models, so I figured if I just didn’t reference or run the animation, everything else would be fine. I will get him to re-export it without animation or normals and try it again.

I did a test with box.egg, 5000 of them! First I added them all to a node underneath render and rotated that node. Not surprisingly it ran very slowly (8 fps). Then I used the RBC and it ran at almost 60 fps!

In another test with 5000 boxes, setting the positions randomly each update, it ran at 5 fps and increased to 12 fps with RBC.

I will post an update after I test with my new fish model.

Thanks again!

wesleystjohn · September 9, 2008, 3:17pm

So, I got my fish model with no animation, normals or anything special.
I ran my test with the simple script using 500 models, and the RigidBodyCombiner was an exceptional performance improvement - about a 2/3 speedup. Great.

However, when implementing it in my actual game, the performance gain seems minimal. With 200 models, it was running at about 16 fps under normal parenting, but only 17 fps when using the RigidBodyCombiner.

In tests with 500 models the results were about 6 fps, and 7 fps with RBC.

So, my question is, what am I doing in my game that is causing the slowdown, or more precisely, what kinds of things would decrease the performance benefit that might be had from the RigidBodyCombiner? I know that having the wrong kind of model (Normals, etc.) can slow it down, but that isn’t the case here.

Basically what I am doing is on every update, setting the position, hpr (using lookAt and setHpr), and scale of each fish model…

Thanks for the help so far.

drwr · September 9, 2008, 6:05pm

Well, now I don’t know. Maybe the transform-cache thing?

Are you sure you set up the RigidBodyCombiner properly in your game?

You’ll have to look at PStats to see where the performance bottlenecks are, with and without the RigidBodyCombiner. That will surely provide some insight.

David

wesleystjohn · September 9, 2008, 6:14pm

David,
Am I right in thinking that the RigidBodyCombiner will only help with rendering? I found that the more operations I did on each fish model (setting position, rotation, scale), the smaller the percentage speed increase I got from the RBC. So what might be happening is that the RBC provides a certain amount of performance gain per model, but the more operations I do, the less noticeable this gain is, because the rest of my code is just slow.

What I am going to try now is to have a number of fish groups in the bait ball, each of which has a number of fish models. I will perform the setPos and setHpr operations only on the groups. Additionally, we will move the setScale() to a shader. (We have to do this every frame to provide a feeling of depth, because the camera has a very small angle of view.)

Using groups of fish will look a little less realistic, but should improve performance enough (along with the RBC and transform-cache 0).

drwr · September 9, 2008, 6:22pm

It’s true that the RBC is only a draw-time improvement, and other costs can certainly dwarf your performance gain there.

Have you tried replacing:

fish.setPos(pos)
fish.setHpr(hpr)
fish.setScale(scale)

with:

fish.setPosHprScale(pos, hpr, scale)

? Also, are you sure your time is not spent more in calculating pos and hpr, rather than simply in setting them?

David