Texture Buffers for hardware instanced geometry

Hi,

I have been trying out hardware-instanced geometry via setInstanceCount. It works fine and seems to give me the speed-up I am looking for (over geometry shaders). At the moment I am using an iimage1D in the shader to get the x, y, z and type for each instance, which I access via imageLoad(texture, gl_InstanceID). It all works fine.

My problem is that the maximum 1D texture size on my hardware (and most others) is no larger than 16k. According to general OpenGL documentation, buffer textures were created for just that purpose. Their length is limited by GL_MAX_TEXTURE_BUFFER_SIZE instead, which is usually far larger, as long as the data fits in GPU memory.

I cannot find anything about whether buffer textures are part of Panda3D (I tried Google, the forums and the documentation).
Or should I be using some other technique to pass a large amount of data to an instanced call?

I am looking for sizes possibly 10-20 times larger than 16k.

It all works fine, with good performance, if I just calculate the data on the fly in the vertex shader. But not everything can be calculated; I do need to pass a large table to it.

I know I could simply issue several draw calls with different sets of instances, or use a 2D or array texture. Still, all of this seems unnecessary if a large buffer texture can be used. It would also save a few shifts in the shader compared to using a 2D texture.

I think that UBOs (uniform buffer objects) might be more suitable for such a task. They are planned, but not implemented right now.

But can’t you simply use a 2D texture, and use (gl_InstanceID % width) to determine the X value and (gl_InstanceID / width) to determine the Y value? An 8192x8192 image would allow you to store 67 million data points.

Agreed - a 2D texture is the best solution. A flat buffer would be nice, but a shift (>>) and a mask (&) are not that bad, as I can keep the width at a power of 2. Thank you. That saved me a lot of thinking and trying to find or make a texture-buffer-like thing.
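The index arithmetic discussed above can be sketched in plain Python, mirroring what the shader would do with gl_InstanceID. This is only an illustration: the names WIDTH_LOG2, instance_to_texel and instance_to_texel_generic are made up for this example, and the 8192-texel row width is just the figure mentioned in the reply.

```python
# Hypothetical row width: 8192 texels, i.e. 2**13, so shift/mask can be used.
WIDTH_LOG2 = 13
WIDTH = 1 << WIDTH_LOG2


def instance_to_texel(instance_id):
    """Map a flat instance index to (x, y) using a shift and a mask.

    Only valid because WIDTH is a power of two.
    """
    x = instance_id & (WIDTH - 1)   # same as instance_id % WIDTH
    y = instance_id >> WIDTH_LOG2   # same as instance_id // WIDTH
    return x, y


def instance_to_texel_generic(instance_id, width):
    """Same mapping for an arbitrary width, using modulo and integer division."""
    return instance_id % width, instance_id // width


if __name__ == "__main__":
    print(instance_to_texel(200000))
    print(instance_to_texel_generic(200000, WIDTH))
```

In GLSL the same two expressions would feed the ivec2 coordinate of an imageLoad or texelFetch call on the 2D texture.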

Still … looking forward to UBOs.

After making that post, I thought to myself: Gee, adding support for buffer textures wouldn’t really be difficult at all, and it would certainly make it easier (and possibly faster) to pass huge amounts of data to a shader.

So, I did, and it’s checked into Git. Windows development builds containing these changes are now available here.

To use it, there’s a new way to set up a texture, using setupBufferTexture. You need to make sure that your component type and your format match up or you’ll end up reinterpreting data. It also takes an additional usage hint parameter, which indicates how often you’ll be updating the buffer data. This has the same meaning as it does for GeomPrimitive and GeomVertexData.

I’ve attached a sample program that shows how to use it.
EDIT: mistake in the sample, replace “NUM_INST * 16” with “NUM_INST”
buffer-textures.zip (1.59 KB)
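For readers without the attachment, here is a minimal sketch of how the call described above might be used. It is not the attached sample. The helper names (pack_instances, make_instance_texture), the texture name, the uniform name instance_data and the choice of four 32-bit floats per instance are all assumptions made for illustration. The sketch also assumes the size argument is a texel count, as per the later fix mentioned in this thread, and the channel order of the uploaded data should be verified against your build.

```python
import struct

# Hypothetical instance count, well above the 16k limit of a 1D texture.
NUM_INST = 200000

# Assumed shader-side access: a samplerBuffer indexed directly by gl_InstanceID.
VERTEX_SNIPPET = """
uniform samplerBuffer instance_data;
// ...
vec4 inst = texelFetch(instance_data, gl_InstanceID);
"""


def pack_instances(instances):
    """Pack (x, y, z, kind) tuples as four 32-bit floats (16 bytes) per texel."""
    return b"".join(struct.pack("<4f", x, y, z, kind)
                    for x, y, z, kind in instances)


def make_instance_texture(instances):
    """Create a buffer texture holding one RGBA32F texel per instance."""
    # Imported here so the packing helper can be used without Panda3D installed.
    from panda3d.core import Texture, GeomEnums

    tex = Texture("instance-data")
    # Component type and format must agree: float components, 32 bits each.
    # The first argument is assumed to be a texel count, not a byte count.
    tex.setup_buffer_texture(len(instances), Texture.T_float,
                             Texture.F_rgba32, GeomEnums.UH_static)
    tex.set_ram_image(pack_instances(instances))
    return tex
```

The resulting texture would then be handed to the shader with something like node.setShaderInput('instance_data', tex). UH_static suits data that is uploaded once; a different usage hint would be the better choice if the table is rewritten often.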

I’d forgotten about this earlier, but there’s also a way to pass a lot of per-instance data using the GL_ARB_instanced_arrays extension, part of OpenGL 3.3.
This works by creating another GeomVertexArrayData where each row represents per-instance data (rather than per-vertex data). You do this by calling setDivisor(1) to indicate that it’s per-instance data.

That’s somewhat more difficult to set up, and it’s still a bit experimental, but it also allows for passing large amounts of data, and is far more widely supported than buffer textures. You can even pass matrices, and there’s no need to do any indexing using gl_InstanceID in the shader, since the GPU will automatically give you the data for the active instance.
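What the divisor does can be modeled in a few lines of plain Python. This is only a conceptual model of how OpenGL selects rows for instanced arrays, not Panda3D code; fetch_attributes and its parameters are hypothetical names.

```python
def fetch_attributes(vertex_rows, instance_rows, vertex_id, instance_id, divisor=1):
    """Model which rows a vertex shader invocation receives.

    Per-vertex arrays are indexed by the vertex number, while an array with a
    non-zero divisor advances once every `divisor` instances.
    """
    per_vertex = vertex_rows[vertex_id]
    per_instance = instance_rows[instance_id // divisor]
    return per_vertex, per_instance


if __name__ == "__main__":
    vertices = ["v0", "v1", "v2"]
    offsets = ["offset0", "offset1"]
    # Every vertex of instance 1 sees the same per-instance row.
    for vid in range(len(vertices)):
        print(fetch_attributes(vertices, offsets, vid, 1))
```

With a divisor of 1, as in setDivisor(1), each instance gets its own row, which is why the shader needs no explicit gl_InstanceID lookup.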

I could create a sample program for this if you wish.

Addendum: TobiasSpringer made me aware of a bug in setupBufferTexture, which should be taking a pixel count rather than a number of bytes. I’ve just pushed a fix for that. The buffer size is now correctly multiplied by the number of bytes per pixel, and setupBufferTexture takes the number of pixels instead of bytes.

In C one would bind the per-instance attribute buffer with glBindBuffer(GL_ARRAY_BUFFER, …)

But in Panda3D, how do you do that?

Do you use something like:
instancedThingy = loader.loadModel(…)
thingyPositions = GeomVertexData('asdf', GeomVertexFormat.getV3(), Geom.UHStatic)

(here we would fill thingyPositions with positions)

instancedThingy.setShaderInput('per_instance_position', thingyPositions)

Or how does this work? I would like to know.

It would be cool if you could link a sample program. I would be especially interested in an example where you pass a per-instance model-to-world 4x4 matrix for vertex transformation.

Here’s some quick and dirty code showing how to do that in Panda. I may clean up the code a bit when I have time.

from random import random
from panda3d.core import *
loadPrcFileData("init-ms-meter", """
show-frame-rate-meter true
frame-rate-meter-milliseconds true
frame-rate-meter-ms-text-pattern %0.2f ms
""")

from direct.showbase.ShowBase import ShowBase
from direct.actor.Actor import Actor

NUM_INSTANCES = 3600

vshad = """#version 330

uniform mat4 p3d_ModelViewProjectionMatrix;

in vec4 vertex;
in vec2 texcoord;

//in mat4 transform;
in vec4 offset;

out vec2 tcset;

void main() {
  gl_Position = p3d_ModelViewProjectionMatrix * (vertex + offset);
  tcset = texcoord;
}"""

fshad = """#version 330

uniform sampler2D p3d_Texture0;

out vec4 p3d_FragColor;

in vec2 tcset;

void main() {
  p3d_FragColor = texture(p3d_Texture0, tcset);
}
"""

base = ShowBase()

node = Actor('panda-model', {'walk' : 'panda-walk4'})
node.loop('walk')
node.setScale(0.01)

shader = Shader.make(Shader.SL_GLSL, vshad, fshad)
node.setShader(shader)
node.reparentTo(render)

gnode = node.find("**/+GeomNode").node()

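# Describe an extra vertex array whose rows advance once per instance
# (divisor 1) rather than once per vertex.
iformat = GeomVertexArrayFormat()
iformat.setDivisor(1)
iformat.addColumn("offset", 4, Geom.NT_stdfloat, Geom.C_other)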

format = GeomVertexFormat(gnode.getGeom(0).getVertexData().getFormat())
format.addArray(iformat)
format = GeomVertexFormat.registerFormat(format)

vdata = gnode.modifyGeom(0).modifyVertexData()
vdata.setFormat(format)

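# The per-instance array was appended last, so address it by its index
# instead of hard-coding how many arrays the model's format already has.
poswriter = GeomVertexWriter(vdata.modifyArray(vdata.getNumArrays() - 1), 0)
for i in range(NUM_INSTANCES):
    # Lay the instances out on a 60-wide grid with some random jitter.
    poswriter.add_data3((i % 60) * 700 + random() * 100, (i // 60) * 1200 + random() * 200, 0)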

# Drop the writer and the vertex data reference now that writing is done.
poswriter = None
vdata = None

node.setInstanceCount(NUM_INSTANCES)

node.node().setBounds(OmniBoundingVolume())
node.node().setFinal(True)

base.trackball.node().set_pos(4.7, 172.7, -49.7)
base.trackball.node().set_hpr(61.5281, 12.0915, -18.2124)

base.run()

It’s also possible to pass a matrix using this method, by using C_matrix instead of C_other and add_matrix4 instead of add_data3. However, I can’t get that to work right now. I’ll try to come back to it later.