My prim rendering library now runs with a raw OpenGL renderer, and with a Panda3D renderer.

I was hoping that Panda3D would automatically optimize the drawing so it would render faster, but in fact it runs roughly 10 times slower. :confused:

What sort of things should I be doing to optimize this?

Let's assume two scenarios:

  • first scenario is that the prims are relatively static (the vast majority will be)
  • second scenario is that the prims are changing on a per-frame basis

Note that the first scenario doesn't necessarily imply that the prims should always have the same mesh, because LOD will likely change the number of triangles in the mesh with respect to distance.

Optimizations so far:

  • added level of detail
  • only redo mesh if level of detail changes
  • level of detail is quantized to 4,8,16 and 32
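
As an illustration of the last two points, here is a minimal sketch in plain Python (not Panda3D code). The `near` distance, the doubling thresholds, and the `Prim` class with its `rebuild_count` counter are assumptions made up for the example; only the quantization levels 4, 8, 16 and 32 come from the list above.

```python
LOD_LEVELS = (4, 8, 16, 32)  # quantized detail levels from the list above


def quantize_lod(distance, near=10.0):
    """Map a camera distance to one of the quantized detail levels.

    Assumption: full detail within `near`, dropping one level each time
    the distance doubles. The real mapping may well be different.
    """
    limit = near
    for level in reversed(LOD_LEVELS[1:]):  # 32, then 16, then 8
        if distance <= limit:
            return level
        limit *= 2.0
    return LOD_LEVELS[0]


class Prim:
    """Hypothetical prim that regenerates its mesh only when the LOD changes."""

    def __init__(self):
        self.lod = None
        self.rebuild_count = 0

    def update(self, distance):
        lod = quantize_lod(distance)
        if lod == self.lod:
            return False  # same quantized level: keep the existing mesh
        self.lod = lod
        self.rebuild_count += 1  # stand-in for the real mesh regeneration
        return True
```

With these thresholds, moving the camera from distance 5 to 6 leaves the mesh untouched, while moving to 15 triggers exactly one rebuild.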

Frame rate with 300 prims on screen is now up to 7.5 fps (from about 1 before).

What else should I do to improve this?


Welcome to the world of frame-rate optimization. This is where I’ve been spending a lot of my time lately. :slight_smile:

There are several points.

(1) The optimizations that are appropriate for a high-end card are completely different from those for a low-end card. What kind of card do you have at the moment?

(2) I am not yet completely satisfied with the performance of Panda with the new qpGeom structures, and have been working on improving this. There is some more work to be done here.

(3) It seems likely that the CPU is dominating the frame rate. How much work is being done in your cull_callback(), even when the object is not changing? It’s quite possible that the LOD calculation itself is in fact dominating the frame rate; what happens to the frame rate if your objects are totally static and fixed at a particular mesh?

You might find it useful to use the PStats tool to investigate where all of your time is being spent. This tool is documented in the manual, although the doc still needs some formatting and pictures.


More thoughts: are you measuring frame rate in Panda with the bounding spheres showing? There’s some cost to those as well.

Also, if you are using the ‘f’ key in pview to determine frame rate, note that the first press of ‘f’ is inaccurate, since it also factors in the startup time; you have to press ‘f’ again to get the subsequent average. But it’s probably better to use the frame rate meter, enabled via “show-frame-rate-meter 1” in your Config.prc file.
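
For reference, the setting mentioned above would appear in Config.prc as a single line. The comment line is only there for illustration:

```
# Show the on-screen frame rate meter
show-frame-rate-meter 1
```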

Does the frame rate improve as you change your point of view to include fewer objects? That’s a good sign that culling is working properly. What effect does reducing the window size have on frame rate? If you are CPU-limited, as I strongly suspect, it should have no effect.


Some more thoughts:

  • Did you build Panda with OPTIMIZE 3? Although OPTIMIZE 1 or 2 is appropriate for developing new C++ code, for run-time performance analysis you really need to use OPTIMIZE 3 or 4 (and 3 is usually the best choice, because it enables useful tools like PStats).

  • If you have 300 objects, all at the same level (e.g. all parented to render), and each with its own transform, you have created a worst-case scenario for cull. Try putting “view-frustum-cull 0” in your Config.prc and see what happens to the frame rate; if it gets remarkably better, especially with the whole world in your view, you are cull-limited.

  • If it is a problem with cull, you might consider implementing PandaNode::xform(), which should apply the indicated transform to your primitive’s vertices (as I described in the other thread about LODNode::_center); then you can call NodePath::flatten_light() after you have loaded your scene, which should push the transforms down into the vertices. Obviously this is only appropriate for a static scene. (Well, you should implement xform() eventually anyway, but this might be a reason to do it sooner rather than later.)

  • Also, you can get a lot of bang for the buck simply by restructuring your scene graph so that your primitives are not all at the same level. Group nearby primitives into a common group, so that the scene graph hierarchy reflects the spatial distribution of your primitives. For instance, there might be four top nodes, one for each of the four quadrants; within each of those, there might be four more nodes, and so on.
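
The quadrant grouping in the last point can be sketched roughly as follows. This is plain Python operating on 2D positions, not Panda3D code; the function names, the dict layout, and the `max_per_leaf` and `max_depth` limits are all assumptions for the example. In a real scene each dict would presumably become a grouping node, with the prims in its "prims" list reparented under it.

```python
def build_quadtree(points, bounds, max_per_leaf=8, depth=0, max_depth=8):
    """Recursively group prims by quadrant.

    points: list of (name, x, y) tuples.
    bounds: (min_x, min_y, max_x, max_y) enclosing all points.
    Returns nested dicts; only leaf dicts hold prim names.
    """
    if len(points) <= max_per_leaf or depth >= max_depth:
        return {"bounds": bounds,
                "prims": [name for name, _, _ in points],
                "children": []}

    min_x, min_y, max_x, max_y = bounds
    mid_x = (min_x + max_x) / 2.0
    mid_y = (min_y + max_y) / 2.0

    # Bucket each prim by which side of the midlines it falls on.
    quadrants = {}
    for point in points:
        key = (point[1] >= mid_x, point[2] >= mid_y)
        quadrants.setdefault(key, []).append(point)

    children = []
    for (right, top), members in quadrants.items():
        child_bounds = (mid_x if right else min_x,
                        mid_y if top else min_y,
                        max_x if right else mid_x,
                        max_y if top else mid_y)
        children.append(build_quadtree(members, child_bounds,
                                       max_per_leaf, depth + 1, max_depth))
    return {"bounds": bounds, "prims": [], "children": children}


def leaves(node):
    """Yield every leaf group in the tree."""
    if not node["children"]:
        yield node
    else:
        for child in node["children"]:
            yield from leaves(child)
```

The point of the hierarchy is that a whole group falling outside the view frustum can be rejected with one bounding-volume test, instead of testing each of the 300 prims individually.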