Performance issue

Is there a limit on the number of models that can be in the scene graph at any one time? I have a program that loads 100 × 100 = 10,000 low-poly models (mostly boxes and squares). The “boxes” should have 5 × 2 = 10 polygons each and the squares only two, so the scene should have a total of 100,000 polygons at most. That’s more than a few, but my GeForce 7800 GS should be able to handle lots more than that. Still, the whole application becomes almost unresponsive when I run it. It takes a long time to load, which isn’t really that big a problem, but once loading is over I can’t even move the camera properly because the program is so sluggish. It does show the scene once loading has finished, but updating it seems to be slow. I tried switching to bam files, but that didn’t help the update speed. I’ve also tried running the program with fewer models and found that the framerate drops noticeably with only 100 models in the scene. Could this problem have something to do with my graphics card drivers?

AFAIK it’s only limited by your hardware. The question is: do you really want to push the limits of your hardware? Remember that other people might have weaker machines, so it might be a good idea to keep the load as low as possible.

There isn’t a simple answer to that, though. But of course, you could write a small benchmark tool and ask people here to run it on their machines and post their frame rates, to get a picture of the overall performance (or just have the program print an overall rating that people can post here together with their specs).
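Something along these lines would do as a starting point (a minimal sketch, assuming the standard ShowBase setup and its built-in globalClock; the task name and the five-second delay are arbitrary choices):

from direct.showbase.ShowBase import ShowBase
from direct.task import Task

class Benchmark(ShowBase):
    def __init__(self):
        ShowBase.__init__(self)
        # ...load the test scene here...
        self.taskMgr.doMethodLater(5.0, self.report, 'report-fps')

    def report(self, task):
        # globalClock is the built-in ClockObject; print the running average.
        print('Average FPS: %.1f' % globalClock.getAverageFrameRate())
        return Task.again

Benchmark().run()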

Regards, Bigfoot29

Another question is: did you load the models as copies of a single loaded model, or did you load each model again and again? Maybe your memory is simply full and the system starts swapping…
Anyway, the CPU will still have to handle many of the calculations. The models have to be sent to the graphics card, and their positions, collisions and so on need to be processed, which happens on the CPU, not on the GPU.

While a GeForce 7800 GS can easily handle 100,000 polygons, it needs to have all of those polygons in one batch. That means they all have to be in the same node for maximum performance.

When you have multiple nodes, Panda has to send each node to the graphics hardware as a separate batch of polygons (because the nodes might move independently, or have different state changes on them). Modern graphics hardware hasn’t made any improvements recently in handling large numbers of batches, just in handling large numbers of polygons per batch. So if you have a lot of nodes with only a handful of polygons per node, your frame rate will suffer. This problem is not specific to Panda; any graphics engine will have the same problem–it’s due to the nature of the PC and the AGP bus.

But there’s a workaround. If you intend all of these boxes to just sit around and be part of the background, or to move as a single unit, you can flatten them all together into a handful of nodes (or even one node). To do this, parent them all to the same node, and use:


node.flattenStrong()

There are also flattenLight() and flattenMedium(), which have different effects, and each is appropriate at certain times. I don’t want to get into a whole lecture on scene graph optimizing right now, but in general, flattenStrong() will do the best it can to collapse nodes together.

One thing that flattenStrong() won’t touch is geometry under a ModelRoot or ModelNode node. Since each egg or bam file loads itself up under a ModelRoot node, you will have to get rid of that node first if you want the geometry from multiple different egg files to be flattened together. You can do that with something like this:

modelRoot = loader.loadModel('myModel.egg')    # the root node here is a ModelRoot
newModel = NodePath('model')                   # a plain node, which flatten can combine
modelRoot.getChildren().reparentTo(newModel)   # move the geometry out from under the ModelRoot
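Putting those two steps together, a rough sketch for the boxes-and-squares case might look like this (the file name 'box.egg' and the 100 × 100 grid are placeholders, and the usual ShowBase globals loader and render are assumed; copyTo() duplicates the geometry so the model only has to be loaded once):

from panda3d.core import NodePath

# Load the box once and strip off its ModelRoot so flattening can combine copies.
modelRoot = loader.loadModel('box.egg')
box = NodePath('box')
modelRoot.getChildren().reparentTo(box)

static = NodePath('static-cells')
for i in range(100):
    for j in range(100):
        cell = box.copyTo(static)   # one copy per grid cell
        cell.setPos(i, j, 0)

static.flattenStrong()              # collapse the 10,000 copies into a few Geoms
static.reparentTo(render)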

David

One extremely useful tool for figuring out scenarios like this one is the analyze() method recognized by NodePath objects. Once you get your scene loaded, try the following statement:


render.analyze()

You should get a response in the command window that looks something like this:


371 total nodes (including 43 instances).
21 transforms; 16% of nodes have some render attribute.
205 Geoms, with 94 GeomVertexDatas, appear on 133 GeomNodes.
21665 vertices, 21573 normals, 21557 texture coordinates.
35183 triangles:
  3316 of these are on 662 tristrips (5.00906 average tris per strip).
  0 of these are on 0 trifans.
  31867 of these are independent triangles.
0 lines, 0 points.
99 textures, estimated minimum 326929K texture memory required.

This information may prove helpful in determining where your graphics hardware is getting dogged by your scene. In the example above, I’ve managed to build a scene that demands about 326MB of texture memory if every object is visible; that’s quite a bit, and as I scan around the scene with the camera, my performance (on an NVIDIA GeForce 4) will pop and stutter as textures get swapped in and out. Decreasing the dimensions of my textures (they’re all 1024 x 1024 right now) could greatly improve the speed with which my scene renders.
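If I recall correctly, Panda can also do that downscaling for you at load time via the texture-scale config variable; treat the exact variable name as an assumption and check the Config.prc reference for your version, but the idea is a one-liner:

from panda3d.core import loadPrcFileData

# Assumption: 'texture-scale' globally scales every texture as it is loaded.
# Set it before any models or textures are loaded.
loadPrcFileData('', 'texture-scale 0.5')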

Your hardware may be better or worse at rendering textures or multiple triangles than mine. Once you get the analyze() data, cross-referencing it against your hardware specifications should help you figure out where you can improve performance.

Best of luck!
-Mark

Thank you all for your replies. Now I understand why the program was so sluggish. I have one question, though. The kind of functionality I have in mind is a world represented by cells that can be manipulated when needed (hidden, their models changed, and so on). Is it possible to achieve this? I imagine it would be harder to do if the cells were flattened into fewer nodes. Right now every cell is represented by its own node.

You can have several hundred batches (Geoms) in your scene and still achieve 60Hz. You just can’t have several thousand batches.

Achieving this balance point is what makes designing any interactive world tricky. Ideally, for the programmer’s convenience, the world should be subdivided into many different easily manipulated pieces.

Unfortunately, for the graphics card’s convenience, you need the opposite to be true, to a point. (You don’t want to go too far the other way, and put too much geometry into a single node, which can impede culling–you’ll end up always rendering your entire scene, even if most of it is behind you.)

What makes it even worse is that the optimal balance point is different for different hardware. A GeForce2 performs best with (say) 1,000 triangles per batch. A GeForce 7800 GS might perform best with (say) 10,000 triangles per batch. A TNT2 might prefer only 100 triangles per batch. These are made-up numbers, but you get the idea.

But, to answer your question, achieving this optimal balance point, or something close to it, is certainly possible. However, performance tuning at this level is something that Panda can’t really do for you, unfortunately; you’ll have to do a lot of the balancing yourself.

You’ll have to decide on a particular target hardware (for instance, the card that you happen to have), and optimize to that. Then choose an appropriate model granularity for your scene. If you need more fine-grained control–more individual, smaller pieces–than your graphics card can handle easily, then you’ll have to get clever with your coding, for instance by flattening together the pieces that are static, and then swapping in unflattened versions of those pieces when you need to change them.
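As a rough sketch of that last idea (the CellGroup class and its method names are made up for illustration and are not part of Panda’s API): keep the individual cell models around, show a flattened copy while nothing is changing, and swap the originals back in when a cell needs editing.

from panda3d.core import NodePath

class CellGroup:
    def __init__(self, parent, cellModels):
        self.cellModels = cellModels            # unflattened originals, kept off-screen
        self.flat = NodePath('flat-cells')
        for cell in cellModels:
            cell.copyTo(self.flat)              # copy, so the originals stay untouched
        self.flat.flattenStrong()               # cheap to render while static
        self.flat.reparentTo(parent)

    def makeEditable(self, parent):
        # Swap the flattened version out for the individual, editable cells.
        self.flat.detachNode()
        for cell in self.cellModels:
            cell.reparentTo(parent)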

David

Ok. I will think about it and see if I can come up with a solution. Thanks for your help.

If there is a practical limit of hundreds of nodes, is it then impossible to have thousands of billboards in the scene? After all, each one requires a node to rotate around, and in my case it’s possible that there will be thousands of billboards in view that need updating.

I had a go at a workaround that puts several billboards together into a single geometry to reduce the number of batches. That would mean the billboard effect is no longer applied automatically, and I would need to code a way to rotate parts of the mesh myself.

I tried that by using GeomVertexWriter and GeomVertexReader to modify the vertices of the mesh directly, but I noticed that this approach is fairly slow. It already starts to choke when I move a single quad 10,000 times (which would simulate moving 10,000 quads).

Does this mean that Panda3D is incapable of showing and moving thousands of sprites? Or is there a different method of handling this?

Thanks for any answer,
Rob

Panda can render thousands of sprites, but not as individual billboards (which, as you’ve observed, are far too expensive for this sort of thing). Instead, you should create a GeomPoints primitive with the “thick points” render mode and the TexGenAttrib.MPointSprite texgen mode. This is, incidentally, exactly what the SpriteParticleRenderer does if you are using the particle system to generate sprites.

If you are generating sprites by hand, you have to first create a GeomPoints either using the GeomVertexWriter interface (see the Panda3D manual) or the LineSegs interface (see the generated docs). Put just one vertex at the center of each sprite (don’t try to draw a square). Then you apply the following attributes to the NodePath that contains your GeomPoints:

points.setRenderModePerspective(True)
points.setRenderModeThickness(1.0) # or however big you want them to be
points.setTexture(myTexture)
points.setTexGen(TextureStage.getDefault(), TexGenAttrib.MPointSprite)
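Putting it all together, a minimal sketch might look like the following (the file name 'sprite.png', the 32 × 32 layout and the sprite count are placeholders; the rest is the standard Geom-construction API described in the manual):

from panda3d.core import (Geom, GeomNode, GeomPoints, GeomVertexData,
                          GeomVertexFormat, GeomVertexWriter, NodePath,
                          TexGenAttrib, TextureStage)

numSprites = 1024
vdata = GeomVertexData('sprites', GeomVertexFormat.getV3(), Geom.UHDynamic)
writer = GeomVertexWriter(vdata, 'vertex')
for i in range(numSprites):
    writer.addData3f(i % 32, 0, i // 32)   # one vertex per sprite, at its center

prim = GeomPoints(Geom.UHStatic)
prim.addNextVertices(numSprites)

geom = Geom(vdata)
geom.addPrimitive(prim)
geomNode = GeomNode('sprite-cloud')
geomNode.addGeom(geom)

points = NodePath(geomNode)
points.reparentTo(render)
points.setRenderModePerspective(True)
points.setRenderModeThickness(1.0)
points.setTexture(loader.loadTexture('sprite.png'))
points.setTexGen(TextureStage.getDefault(), TexGenAttrib.MPointSprite)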

Of course, if you are animating thousands of sprites in Python code, you may find that Python becomes a performance bottleneck–in this case, you may need to drop your GeomVertexWriter code down into C++.

David

Thanks! I will play around with these GeomPoints then.

But from your last comment I understand that animating them still means using the GeomVertexWriter, which will remain a bottleneck, albeit with fewer modifications to make, since I only need to change one vertex per sprite instead of four.

I wouldn’t have thought that Python would be that slow. I did a similar experiment with Macromedia’s Director (Shockwave) and was able to edit 60,000 vertex positions of a mesh at a reasonable framerate. And with easier code, but I won’t go on about that :wink:

[…]

I did a further test to see whether it is GeomVertexReader and GeomVertexWriter that are causing the bottleneck, and that does indeed seem to be the case: as soon as I removed the code that modified the vertices with getData3f and setData3f, the frame rate was no longer as slow as before. So it seems that if I put that many vertices into one mesh, the framerate will increase considerably. Now I just need to find out whether that is also possible with GeomPoints.

Cheers,
Rob

Panda does all of its vertex animation using the GeomVertexWriter interface, down in C++. This includes vertices animated by the particle system, as well as all vertices animated by joints in an Actor. Using this interface, Panda can effortlessly animate many thousands of vertices in a millisecond or two–no problem.

The reason it is slow when you do the same thing from Python is that each call to setData3f() carries a lot of overhead in Python. In fact, any function call carries a lot of overhead in Python; function call overhead is Python’s biggest performance bottleneck. Since animating thousands of vertices requires calling setData3f() thousands of times, this overhead adds up quickly.
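For reference, the Python-level loop in question follows the standard GeomVertexRewriter pattern from the manual; in this sketch, vdata is assumed to be the GeomVertexData holding the sprite vertices (like the one built earlier), and the wobble motion is arbitrary:

from panda3d.core import GeomVertexRewriter
from direct.task import Task
import math

def wobble(task):
    rewriter = GeomVertexRewriter(vdata, 'vertex')
    while not rewriter.isAtEnd():
        v = rewriter.getData3f()        # one Python call per vertex to read...
        rewriter.setData3f(v[0], v[1], math.sin(task.time + v[0]))  # ...and one to write
    return Task.cont

taskMgr.add(wobble, 'wobble-sprites')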

I guess we could add a special Python-friendly interface, something like setVertices3f(), that would allow you to pass a list of tuples in one call, and thereby cut down on this overhead. Of course, you’d still have to construct that list of tuples one-at-a-time in Python code, so I’m not sure how much you’d really gain at the end of the day.

Python is excellent for object-level scene management. It’s perfectly fast enough, and it’s such a joy to program in I wouldn’t dream of using anything else anymore. But it’s not really well-suited to per-vertex operations.

David

If you ever find that Python becomes a bottleneck, you could always try Psyco.

Psyco is a Python extension module which can massively speed up the execution of many Python programs. It works somewhat like a JIT compiler that trades memory for speed. Python code running under Psyco can sometimes even approach the speed of native C!
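For completeness, the usual Psyco incantation is only a couple of lines (updateVertices is a hypothetical hot function, named here just for illustration):

import psyco

psyco.full()                   # JIT-compile every function on its first call
# or, more selectively:
# psyco.bind(updateVertices)   # compile just one hot function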