Geometry instancing; how to further increase performance

Hi! Long text ahead…

I'm adding trees to my scene using hardware geometry instancing. After adding about 200 trees (2 different models, so 100 of each model), my performance dropped to 50 fps (medium-to-high-end GPU and CPU). To stress-test the system I use relatively high-vertex models (one tree model has 43K vertices and the other 6.3K). I want to be ready for high vertex counts later on, so I'd like to increase this frame rate considerably.

My general question is: how can I increase this performance? Basically I want to keep all these (43K+6.3K)*100 vertices surrounding the player at a higher fps (LOD will be reserved for trees further away).

I already had one idea, about which I have a more specific question. One of the things I noticed was that every tree is drawn whether it is on screen or not. This is due to the geometry instancing. I tried countering this by manually determining whether a vertex is on screen and, if not, not drawing it. (The vertex position is transformed to camera space, where it is checked against the view cone; if it falls outside, the fragment is discarded in the fragment shader.)

This code works: when I make the view cone smaller, the edges of the screen indeed no longer show trees. There is, however, no performance increase. Apparently the main computational cost lies in transforming each vertex to the position of its instance (including rotation and scaling). Only after that can it be determined whether that point is on screen. So the transformation code always has to be executed first, for all (43K+6.3K)*100 vertices.

Is there a better way of doing this? Might it be possible to check whether a whole instance is anywhere near the screen and, if not, automatically discard all the vertices belonging to that instance?

Thanks for reading all that. I realize I don't have a clear question, but I hope someone can help me anyway :)

My guess is you could probably set up some bounding volumes where the instances are placed, then do some checking to see if the volumes are in the camera view. You could then build a shortened list of instances to send to the GPU.
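To make that suggestion a bit more concrete, here is a minimal sketch in plain Python of a bounding-sphere test against a set of view planes. It is only an illustration under assumptions: the `Plane` class, the function names, and the box-shaped "frustum" in the example are all made up for this post, and in a real scene the planes would come from the camera's lens.

```python
from dataclasses import dataclass


@dataclass
class Plane:
    # Plane n.p + d = 0 with a unit-length normal (nx, ny, nz) that points
    # towards the inside of the visible volume (hypothetical helper type).
    nx: float
    ny: float
    nz: float
    d: float

    def signed_distance(self, x, y, z):
        # Positive when the point lies on the visible side of the plane.
        return self.nx * x + self.ny * y + self.nz * z + self.d


def sphere_in_view(planes, center, radius):
    # A sphere is rejected only if it lies completely behind some plane.
    for plane in planes:
        if plane.signed_distance(*center) < -radius:
            return False
    return True


def visible_instances(planes, instances):
    # instances: list of (center, radius) bounding spheres, one per tree.
    # Returns the indices of the instances that should be sent to the GPU.
    return [
        index
        for index, (center, radius) in enumerate(instances)
        if sphere_in_view(planes, center, radius)
    ]


# Example volume (an assumption, not a real perspective frustum):
# x between -10 and 10, y (forward) between 0 and 100.
example_planes = [
    Plane(1.0, 0.0, 0.0, 10.0),    # left:  x >= -10
    Plane(-1.0, 0.0, 0.0, 10.0),   # right: x <= 10
    Plane(0.0, 1.0, 0.0, 0.0),     # near:  y >= 0
    Plane(0.0, -1.0, 0.0, 100.0),  # far:   y <= 100
]
example_trees = [
    ((0.0, 50.0, 0.0), 2.0),    # well inside
    ((-30.0, 50.0, 0.0), 2.0),  # far off to the left
    ((-11.0, 50.0, 0.0), 2.0),  # straddles the left plane, so kept
    ((0.0, -5.0, 0.0), 2.0),    # behind the camera
]
shortlist = visible_instances(example_planes, example_trees)
```

This is one cheap test per tree instead of one per vertex, so for 200 trees the CPU cost should be small compared with transforming millions of vertices.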

I could indeed do this outside of the shader, on the CPU. The problems are that the CPU isn't optimised for these calculations while the GPU is, and that I cannot change the number of instances at runtime. I could supply an array that is partly empty, or another array of booleans telling which instances should be drawn. But the shader would still have to 'cancel' each of the millions of vertices; the optimal solution would make it skip the entire instance.

Odds are it is still faster than doing these calculations for all the vertices, so I'll definitely give it a chance.

The best solution would be to do your suggestion in the shader. Is there a way to make the geometry instancing skip an entire instance?
gl_InstanceID += 1; is probably not going to work…

Well, the idea would be to disqualify them per object on the CPU instead of per vertex on the GPU. It all depends on where your bottleneck is and on load balancing. You can also save a lot of calculations by using knowledge about your game. For example, if you use a cell structure, you can quickly disqualify any trees that are in cells that are not visible. You could use a distance check too, which is also cheap.
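A rough sketch of the cell idea combined with a distance check, in plain Python. The cell size, the function names, and the 2D positions are assumptions made for the example, not anything from an actual game.

```python
import math
from collections import defaultdict

CELL_SIZE = 50.0  # hypothetical cell edge length in world units


def cell_of(x, y, cell_size=CELL_SIZE):
    # Integer grid coordinates of the cell containing (x, y).
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def build_grid(positions, cell_size=CELL_SIZE):
    # Done once at load time: map each cell to the tree indices inside it.
    grid = defaultdict(list)
    for index, (x, y) in enumerate(positions):
        grid[cell_of(x, y, cell_size)].append(index)
    return grid


def nearby_instances(grid, positions, camera, max_distance, cell_size=CELL_SIZE):
    # Only visit cells that can contain a tree within max_distance,
    # then apply the exact (squared) distance check to those trees.
    cam_x, cam_y = camera
    cam_cx, cam_cy = cell_of(cam_x, cam_y, cell_size)
    reach = math.ceil(max_distance / cell_size)
    max_sq = max_distance * max_distance
    result = []
    for gx in range(cam_cx - reach, cam_cx + reach + 1):
        for gy in range(cam_cy - reach, cam_cy + reach + 1):
            # .get() avoids creating empty cells in the defaultdict.
            for index in grid.get((gx, gy), ()):
                x, y = positions[index]
                if (x - cam_x) ** 2 + (y - cam_y) ** 2 <= max_sq:
                    result.append(index)
    return sorted(result)


tree_positions = [(10.0, 10.0), (60.0, 10.0), (500.0, 500.0), (-40.0, -40.0)]
tree_grid = build_grid(tree_positions)
close_trees = nearby_instances(tree_grid, tree_positions, (0.0, 0.0), 70.0)
```

Trees in cells outside the visited range are never touched at all, which is where the saving comes from once the forest gets large.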

I've decided to drop the witch hunt: having to render 100 models of 43K vertices is completely unrealistic. When I render 200 instances of the 6.3K trees I regain my 350 fps (incl. animated grass), and even 6.3K vertices is high for an average tree.

While writing that I had a major breakthrough. For the past few days I thought the number of instances was fixed (since arrays must have a predefined, fixed size in the shader) and that my only freedom was 'discard;'ing the unwanted vertices and fragments every frame. But apparently setInstanceCount() can be changed at runtime.

Perhaps I will come back to this issue later, now that I know it is possible, but for now I'll turn my attention to something else. Thanks for thinking along with me!
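For anyone finding this later, a minimal sketch of how a fixed-size shader array and a changing instance count could fit together. It is plain Python with made-up names (`MAX_INSTANCES`, `pack_visible`); the actual upload of the array as a shader input and the call to setInstanceCount() are left out and would have to be done on the Panda side.

```python
MAX_INSTANCES = 8  # hypothetical fixed array size declared in the shader


def pack_visible(offsets, visible, max_instances=MAX_INSTANCES):
    # Move the per-instance data of the visible trees to the front of a
    # fixed-size array. The shader then only needs indices 0..count-1.
    packed = [offsets[index] for index in visible[:max_instances]]
    count = len(packed)
    # Pad with dummy entries so the array keeps its declared size.
    padding = [(0.0, 0.0, 0.0)] * (max_instances - count)
    return packed + padding, count


all_offsets = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
array_for_shader, instance_count = pack_visible(all_offsets, [1, 3], max_instances=4)
# instance_count is the value that would be passed to setInstanceCount(),
# so the GPU never starts on the padded entries at all.
```

Because the visible entries are packed at the front, whole instances are skipped rather than each of their vertices being cancelled one by one.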

For the record, back when I used instancing, I used the technique to render thousands of trees without significant drops, so I'm a bit surprised that you're hitting such an apparently low limit. Perhaps the bottleneck lies in the particular way you render your models, i.e. the particular shader, frequent state changes, or something like that.
Keep in mind that "discard" can in rare cases decrease performance because the driver can no longer apply certain optimisations; this is something to experiment with.

And what vertex count did those trees have? The amount at which I'm getting problems is a total of 5M vertices.
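For reference, the figures quoted in this thread work out roughly as below. This is only back-of-the-envelope arithmetic on nominal vertex counts; it ignores vertex reuse, fill rate, and the animated grass in the lighter scene, so take it as an order-of-magnitude comparison and nothing more.

```python
def vertices_per_frame(models):
    # models: list of (vertices_per_model, instance_count) pairs.
    return sum(vertices * count for vertices, count in models)


def vertices_per_second(per_frame, fps):
    return per_frame * fps


# Heavy scene: 100 trees of 43K vertices plus 100 trees of 6.3K, at 50 fps.
heavy_frame = vertices_per_frame([(43_000, 100), (6_300, 100)])
heavy_rate = vertices_per_second(heavy_frame, 50)

# Light scene: 200 trees of 6.3K vertices, at 350 fps.
light_frame = vertices_per_frame([(6_300, 200)])
light_rate = vertices_per_second(light_frame, 350)

print(heavy_frame, heavy_rate)  # 4930000 246500000
print(light_frame, light_rate)  # 1260000 441000000
```

Both scenes end up in the hundreds of millions of vertices per second, which at least fits the picture of raw vertex count being the limit, although it does not prove it.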

I'm not doing anything strange in the shader: a texture2D() call in the fragment shader, and calculating the vertex position in the vertex shader. For that I do use rotation matrices, which have to be generated every time. I could try generating them beforehand in Panda. Off to do that, then.

That didn't make a difference. Even if I only keep

    gl_Position = p3d_ModelViewProjectionMatrix * p3d_Vertex;
    gl_TexCoord[0] = p3d_MultiTexCoord0; 

in the vertex shader and

    gl_FragColor = texture2D(p3d_Texture0, gl_TexCoord[0].st);

in the fragment shader, I still get 53 fps.

By the way, I'm getting the message that mat4 arrays are not yet supported. Any idea if/when they will be?