Integrating hardware instancing into the render pipeline

Right now Panda3D supports hardware instancing, but it is not integrated with the scene graph, and therefore not with bounding volumes, culling, collisions and possibly other systems.

I am exploring how to add a new “hardware instanced” render attribute to a node path and make the graphics pipeline respect it. If I understand correctly, it would work something like this:

  1. If the CullTraverser visits the same GeomNode with the “hardware instanced” attribute two or more times (whether through a reference set by the user or through recursive traversal), it shouldn’t add the Geom to the cull result again, but should instead store the instance ID and transform data in some kind of container.
  2. At the same time, the GraphicsStateGuardian should somehow update the shader attribute with each node’s transform.
  3. GeomPipelineReader should read this data and make the GraphicsStateGuardian issue the correct draw call (glDrawArraysInstanced or glDrawElementsInstanced, since OpenGL is the primary backend).
  4. Preferably, the ShaderGenerator should add an input with an array of node transforms, so that the user doesn’t have to write their own shaders.
  5. I don’t know whether this would work with the fixed-function pipeline, but I don’t think that matters.

That said, I have the following questions for the people who know how the render pipeline works:

  1. Is my understanding correct?
  2. How much work would this need?
  3. Is it worth implementing at all, or will the performance bottleneck just move somewhere else? For example, if we still traverse 1000 nodes and then draw them with one call, could that take longer than drawing the instances in 1000 calls without traversing the scene graph?

Ahhh. You’re on the right track.

The render state is comprised of a combination of things that require an OpenGL call (such as the cull face attribute, or a texture), and things that could possibly be looked up in an array in the shader (such as the material, transform, color). The cull traverser will need to be clever enough to realise which geoms it encounters could be an instance of an existing geom, and which ones can’t (because they require a GL state change).

(In the past, I’ve thought that perhaps the ShaderAttrib could have a bitmask of which attributes the shader is able to instance (e.g. color, material, position), so that the CullTraverser knows which attributes it may treat as per-instance data when deciding whether a Geom can be instanced.)
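To make the bitmask idea a bit more concrete, here is a minimal sketch in plain Python. It is not Panda3D code: the `Instanceable` flags, the `FLAG_FOR_ATTRIB` table and `split_state` are all hypothetical names, and the render state is modelled as a simple dict. It only shows how a mask could split a state into the part that still needs a GL state change and the part the shader could look up per instance.

```python
from enum import IntFlag


class Instanceable(IntFlag):
    """Hypothetical flags for attributes a shader can read per instance."""
    TRANSFORM = 1
    COLOR = 2
    MATERIAL = 4


# Hypothetical mapping from attribute name to its flag. Attributes that
# are missing here (e.g. textures) can never be instanced in this sketch.
FLAG_FOR_ATTRIB = {
    "transform": Instanceable.TRANSFORM,
    "color": Instanceable.COLOR,
    "material": Instanceable.MATERIAL,
}


def split_state(state, shader_mask):
    """Split a state dict into (shared, per_instance) parts.

    shared       -- attributes that still require a state change
    per_instance -- attributes the shader says it can fetch per instance
    """
    shared, per_instance = {}, {}
    for name, value in state.items():
        flag = FLAG_FOR_ATTRIB.get(name)
        if flag is not None and (flag & shader_mask):
            per_instance[name] = value
        else:
            shared[name] = value
    return shared, per_instance


# This shader can instance transform and color, but not material.
mask = Instanceable.TRANSFORM | Instanceable.COLOR
state = {
    "transform": (1, 0, 0),
    "color": "red",
    "texture": "brick.png",
    "material": "shiny",
}
shared, per_instance = split_state(state, mask)
```

Here the texture and the material end up in `shared`, so two Geoms that differ only in transform or color would be candidates for the same instanced draw call.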

Without fundamentally rewriting the cull system, this means that the CullBin/CullResult needs to have some sort of a map, with the key being the Geom and the non-instanced part of the RenderState, and the value being a buffer containing a list of instances with all the instance-specific state. When the cull traverser finds a new Geom, it would then have to look up the geom+state in this map, and if found, it would update the corresponding instance list, or otherwise start a new one.
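A rough sketch of that map, again in plain Python rather than against the real CullBin/CullResult classes. `InstanceTable`, `add` and `draw_calls` are made-up names, the Geom is represented by a string ID, and the states are dicts with hashable values.

```python
from collections import defaultdict


class InstanceTable:
    """Hypothetical stand-in for the map described above.

    Key:   (geom, non-instanced part of the state)
    Value: list of per-instance state, one entry per instance
    """

    def __init__(self):
        self._lists = defaultdict(list)

    def add(self, geom_id, shared_state, per_instance):
        # A frozenset of items makes the shared state usable as a dict key;
        # this assumes all values in shared_state are hashable.
        key = (geom_id, frozenset(shared_state.items()))
        # Either extends an existing instance list or starts a new one.
        self._lists[key].append(per_instance)

    def draw_calls(self):
        """Yield one (geom, shared_state, instances) tuple per draw call."""
        for (geom_id, shared), instances in self._lists.items():
            yield geom_id, dict(shared), instances


table = InstanceTable()
# Three trees share the same texture, so they collapse into one call.
for x in (0, 5, 10):
    table.add("tree", {"texture": "bark.png"}, {"transform": (x, 0, 0)})
# Same geom, different texture: needs a state change, so a separate call.
table.add("tree", {"texture": "snow.png"}, {"transform": (15, 0, 0)})

calls = list(table.draw_calls())
```

In this example four encountered Geoms turn into two draw calls, one with three instances and one with a single instance. Note that the lookup and append still happen once per instance, which is the per-frame cost discussed below.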

The big question is how good the performance will be. We won’t really know until we try. You’d be very welcome to experiment with it. The main bottleneck will likely be that the cull traverser is still looking at every individual Geom for as many times as it is instanced, and will have to do a map lookup and write to the instance table. The fact that it has to do this every frame, for every instance, on the fly, probably means there will still be overhead.

@CFSworks and I have been talking about a way we can perhaps rewrite the cull system so that it is persistent; the culled list of objects would be made once and then updated only when that specific part of the scene graph is updated. This would probably go a long way to mitigate the above bottleneck, by not having to create and fill an instance list every frame.
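The persistent idea can be illustrated with a simple dirty flag. This is only a toy model of the concept, not the design being discussed: `CachedSubtree` and its methods are hypothetical, and "culling" is reduced to copying the transforms.

```python
class CachedSubtree:
    """Toy model: keep the instance list between frames and rebuild it
    only after something in this part of the scene graph has changed."""

    def __init__(self, transforms):
        self._transforms = list(transforms)
        self._instance_list = []
        self._dirty = True   # nothing built yet
        self.rebuilds = 0    # counts how often we actually did the work

    def set_transform(self, index, transform):
        self._transforms[index] = transform
        self._dirty = True   # invalidate the cached list

    def instance_list(self):
        if self._dirty:
            # Stand-in for the expensive traverse-and-fill step.
            self._instance_list = list(self._transforms)
            self.rebuilds += 1
            self._dirty = False
        return self._instance_list


subtree = CachedSubtree([(0, 0, 0), (1, 0, 0)])
for _frame in range(3):
    subtree.instance_list()            # three frames, one rebuild
rebuilds_before_change = subtree.rebuilds

subtree.set_transform(1, (2, 0, 0))    # scene graph change
updated = subtree.instance_list()      # triggers a second rebuild
```

A real implementation would also have to react to camera movement, since visibility changes even when the scene graph does not; this sketch leaves that out.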


In the meantime, I have been working on an alternative way to do HW instancing. On the instancing branch on GitHub, I have a working InstancedNode, which is a node that stores an InstanceList (really an array of transforms) entered by the user. It culls the entries of the InstanceList and then passes it down the scene graph, and every geom encountered below it gets this instance list passed to its shader.
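Conceptually, the approach looks something like the sketch below. This is not the code from the branch and does not use its API: `cull_instances` and `draw_subtree` are hypothetical helpers, instances are reduced to positions with a shared bounding-sphere radius, and the view volume is a list of planes with inward-facing unit normals.

```python
def cull_instances(positions, radius, planes):
    """Return the positions whose bounding sphere touches the view volume.

    planes -- list of (normal, d); a point p is inside a plane when
              dot(normal, p) + d >= 0 (normals are unit length).
    """
    visible = []
    for pos in positions:
        inside = all(
            sum(n * p for n, p in zip(normal, pos)) + d >= -radius
            for normal, d in planes
        )
        if inside:
            visible.append(pos)
    return visible


def draw_subtree(geoms, visible):
    """Every geom below the node gets the same culled instance list.
    Returns (geom, instance_count) pairs, i.e. one draw call per geom."""
    return [(geom, len(visible)) for geom in geoms]


# View volume: the slab 0 <= x <= 10.
planes = [((1, 0, 0), 0), ((-1, 0, 0), 10)]
positions = [(5, 0, 0), (-3, 0, 0), (-0.5, 0, 0), (12, 0, 0)]

# The instance list is culled once...
visible = cull_instances(positions, radius=1.0, planes=planes)
# ...and then reused for every geom below the node.
calls = draw_subtree(["trunk", "leaves"], visible)
```

The point of the design is that the subtree is traversed once regardless of how many instances there are; only the flat list of transforms is tested per instance.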

It is obviously a much less automagical and less flexible way of doing instancing than what you are proposing, but until we implement that, I wanted to have a simple way of having instancing that worked with culling. And, an advantage is that it doesn’t require the cull traverser to re-traverse the nodes for every single instance.

If you want to try it, an example script is available here.

Thank you for such a detailed answer. I will explore both options: try your solution, and continue studying the source to understand how to fit HW instancing in without rewriting the whole pipeline. I am far from writing code that will actually work, but I think it would be cool to let the user draw a lot of meshes just by adding a single render attribute.

Good to know I am on the right track. I was worried I had written something stupid, so I am glad I got at least some of it right.