Using SSBO for light data

Old thread but still very interesting.

@rdb: thanks for that! Since the data is formatted and sent as floats in the SSBO, is there any convenient way to bind/send a texture (e.g. a shadow map) to it? (I imagine we could read the texture from a compute shader and fill the SSBO from it, but I don’t know if there is a more convenient way to do it directly from Python.)

Thanks!

Sure, you can bind the shadow texture directly as a shader input to the compute shader. Or, you can assign it to the vertex shader rendering the embers and cull them that way.
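For example, the Python side can be as simple as this (just a sketch; light_np and compute_np stand in for your own shadow-casting light NodePath and the node carrying the compute shader):

from panda3d.core import GraphicsStateGuardianBase

# Grab the depth texture Panda renders the light's shadow map into
# (assuming the light has been flagged as a shadow caster).
gsg = GraphicsStateGuardianBase.getGsg(0)
shadow_buffer = light_np.node().getShadowBuffer(gsg)
shadow_tex = shadow_buffer.getTexture()

# Bind it like any other texture input on the compute (or vertex) shader.
compute_np.setShaderInput("shadow_map", shadow_tex)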

Thanks!

To continue the conversation (I’ll create another thread so as not to derail this one too much), do you think it’s possible to have something such as:

struct Ember {
  vec3 pos;
  float speed;
  vec3 vel;
  float size;
  texture2D shadowmap; //Change compared to the initial code
};

layout(std430) buffer EmberBuffer {
  Ember embers[];
};

as it could mimic the p3d_LightSourceParameters uniforms

I think I recall someone saying (probably you, but I can’t remember where :slight_smile: ) that mixing light data and shadow maps in the same structure is not a good design idea.

So would it be better to store shadowmaps in separate arrays that are indexed into by the p3d_LightSource (as per p3d_LightSourceParameters shadowMap not universal. · Issue #707 · panda3d/panda3d) ?

Thanks!

Resources are opaque and you can’t put them into a buffer (short of ARB_bindless_texture, but it’s not supported that well and we don’t have the right infrastructure to handle it in the engine).

What you can do is create an array texture (which is different from an array of textures; indexing into the latter is a little trickier and a bit more limited) and index into it using an integer that you read from the SSBO. The caveat is that you’ll suffer pretty awful latency with such a long memory-fetch dependency chain; it’s a lot better when the index is a uniform parameter. Another caveat is that in an array texture, all slices must be the same size.
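On the Python side, that combination might look roughly like this (a sketch only: the names, struct layout and the render root are placeholders, and the packing has to match your shader’s std430 layout exactly):

import struct
from panda3d.core import ShaderBuffer, GeomEnums, Texture

MAX_LIGHTS = 8

# Pack one light: vec4 position, vec4 color, then the shadow-map layer index
# padded to a vec4 boundary (std430 rounds the struct size up accordingly).
light_data = struct.pack("<4f4f4i",
                         0.0, 0.0, 5.0, 1.0,   # position
                         1.0, 1.0, 1.0, 1.0,   # color
                         0, 0, 0, 0)           # shadow_map_index + padding
light_buffer = ShaderBuffer("LightBuffer", light_data, GeomEnums.UH_static)

# One array texture holding every shadow map, one layer per light.
shadow_array = Texture("shadow-maps")
shadow_array.setup2dTextureArray(MAX_LIGHTS)

render.setShaderInput("LightBuffer", light_buffer)
render.setShaderInput("shadow_maps", shadow_array)

The shader then reads shadow_map_index from the buffer and uses it as the third texture coordinate when sampling the array.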

What are you trying to do here? Perhaps I can offer more specific suggestions. I can split off the thread if necessary.

Thanks! Probably better to have a separate thread.

But in (not so) short:

I already built a light-management system that enables 1/ specific data for each light (e.g. 2D or 3D textures for light cookies, or PSSM matrices/shadow maps), 2/ mixing all types of shadows (DL/SL/PL), and 3/ leveraging, to some extent, the P3D lighting system (for DL & SL shadows).

I ended up having my own light data structure (including shadow maps for PL and PSSM) and also using p3d_LightSourceParameters (for shadow maps and shadow view matrices), and I use both of them with indexes.

Everything works well, but at some point I would like to remove one light without recompiling the shader. This almost works, but I still have an issue: the removed light still partly appears, so I suspect it is still present in the array.

At that point, I was wondering if a different design (e.g. using an SSBO for light data and arrays of textures for shadow maps and cookie textures) would be a better one, hence my question: is there a better way to manage P3D lights & shadows with specific additional data? (My initial preference would have been to use the existing p3d_LightSourceParameters and extend it with my own additional data, but it does not work like that.)

I’m not sure how the SSBO is supposed to help in this scenario over regular uniforms. We can use a UBO for more efficient uniform passing than individual uniforms (though we still need to work on support for that in Panda; it’s a lot easier to add with the new shaderpipeline work being done), but the SSBO will have more latency than either one.

Yes, SSBOs support runtime-length arrays, but you can get the same effect with a fixed-size array and a light counter input, right? I’m happy to add such a counter to Panda if you need it, it’s a fairly trivial change to do on the shaderpipeline branch.
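For illustration, the Python side of the counter approach is no more than this (hypothetical names; the shader declares a matching uniform int num_lights and loops up to it over a fixed-size array):

# Resend the count whenever lights are added or removed; entries past
# num_lights in the fixed-size array are simply ignored by the shader.
render.setShaderInput("num_lights", len(active_lights))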

Panda will currently only bind the lights that are active on a node to the p3d_LightSource input, with remaining lights in the array being zero-valued so they don’t contribute.

In the long term I think I want to use a global uniform buffer containing a big array with data for all lights, and indices into this array being part of the per-instance data.

What I would do for shadow maps is put them all in a global atlas or texture array and index into that with per-instance or per-light indices.

Ok, got it, then let’s forget the SSBO for that use case. Indeed, if UBOs could be set up with the new shaderpipeline, that would be a good enhancement. By the way, many thanks for your hard work on the pipeline (a lot of recent commits)!

Indeed, yes, I already use a uniform counter input (I suspected it was the cause of my not-completely-removed light, so I switched from the array-of-structs length() to a uniform counter, even if, as a matter of fact, that did not completely solve the issue; I need to investigate more).

Thanks for the proposal! I am already managing it myself, but this feature could be interesting for other P3D devs…

When you mention “texture array”, do you mean an array of uniform textures such as:

uniform samplerCubeShadow ShadowMap[MAX_LIGHTS]; 

I suppose they would need to be set up from P3D with set_shader_input("ShadowMap[Index]", shadowmap_tex)?


:blush:

Well, it’s trivial to add, so… added.

You could do it that way, and it would be similar to having them in p3d_LightSource, but I was actually suggesting having all shadow maps in the scene in a global samplerCubeArrayShadow (together with a sampler2DArrayShadow), and having the p3d_LightSource struct contain an index into this global array.

The caveat is that each array layer has to be the same size, but if you need shadow maps with different sizes, there’s also a way to deal with that: smaller shadow maps can be atlased, meaning we render the shadow map into a smaller section of the texture, with multiple shadow maps fitting into the same array layer. This is what tobspr’s RenderPipeline does, in fact. You can just bake the UV offset/scale into the shadowMatrix.
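The baking itself is just a scale/offset appended to the per-light shadow matrix, something like this (made-up atlas coordinates; shadow_matrix stands in for whatever per-light matrix you already compute, and Panda composes matrices in row-vector order, so the left-hand matrix applies first):

from panda3d.core import Mat4

# Say this light's map occupies the top-right quarter of its atlas layer.
atlas_mat = Mat4.scaleMat(0.5, 0.5, 1.0) * Mat4.translateMat(0.5, 0.5, 0.0)

# The shader then uses this matrix exactly like the normal shadow matrix.
final_shadow_matrix = shadow_matrix * atlas_mat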

I should add that cube map arrays are an OpenGL 4.0 feature, but I suppose so is dynamic indexing.

Thanks! It would probably be good to add it to the manual’s list of GLSL inputs.

I suppose using a sliced texture is much more efficient compared to an array of uniform textures?

I followed your suggestion and tried to set it up (I think I missed a good old-fashioned example :-)), but nevertheless I almost managed to make it work with 2 lights. In short (and please tell me if I am correct and what I am doing wrong):

  1. Create a texture array and 2 lights:
        self._tex_array = Texture("volume")
        self._tex_array.setup2dTextureArray(2)
        ....
  2. Capture the P3D shadow buffers through:
        sBuffer = self._light_np.node().getShadowBuffer(GraphicsStateGuardianBase.getGsg(0))
        sBuffer2 = self._light2_np.node().getShadowBuffer(GraphicsStateGuardianBase.getGsg(0))
  3. Create 2 cardmakers and assign them a fragment shader that indicates, through a uniform, which layer should be used to display the given “slice” of the texture:
        void main() {
          vec4 sampled = texture(texture_array, vec3(TexCoords, layer));
          p3d_FragColor = sampled;
        }
  4. Create “bind_layered” RenderToTextures for each light’s shadow buffer to fill the array texture, and assign a specific geometry shader (see below) to the light (camera) state, so that every object seen by the light is rendered by the geometry shader:
        # Light 1
        sBuffer.addRenderTexture(self._tex_array, GraphicsOutput.RTM_bind_layered, GraphicsOutput.RTP_depth)
        attr = self._generate_shader(True, 0)
        state = self._light_np.node().get_initial_state()
        state = state.add_attrib(attr, 1)
        self._light_np.node().set_initial_state(state)

The aforementioned shader includes a geometry shader with a uniform “Mylayer” for each light, indicating into which layer of the texture the geometry should be rendered:

void main() {
    for (int i = 0; i < 3; ++i) {
        gl_Position = gl_in[i].gl_Position;
        gl_Layer = Mylayer;
        TexCoords = TexCoords1[i];
        EmitVertex();
    }
    EndPrimitive();
}

This works well when I only activate 1 RenderToTexture but not when I activate both: I just see the first light shadow map in my cardmaker.

I am certainly not understanding something 100%, so if you could help, that would be great!

Thanks again!

Yes, mostly because we can bind the texture only once per frame; we never need to rebind it.

There are two ways to bind to the layered texture: one is using RTM_layered and a geometry shader (or vertex shader with an extension like AMD_vertex_shader_layer) to pick the layer to render into, and one is just choosing the target layer with dr.setTargetTexPage(n) on the display region of the shadow buffer. The latter way is easier for this use case. The geometry shader version gets efficient if you render multiple views in the same pass and assign them to a slice in the shader, which is especially useful for cube map rendering.
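For the second route, the setup is roughly this (a sketch reusing the buffer/texture names from your snippets; note that, as comes up later in the thread, page targeting wants RTM_bind_or_copy rather than RTM_bind_layered):

# Each shadow buffer renders its depth output into one page (layer) of the array.
sBuffer.addRenderTexture(self._tex_array, GraphicsOutput.RTM_bind_or_copy, GraphicsOutput.RTP_depth)
sBuffer.get_display_region(0).setTargetTexPage(0)

sBuffer2.addRenderTexture(self._tex_array, GraphicsOutput.RTM_bind_or_copy, GraphicsOutput.RTP_depth)
sBuffer2.get_display_region(0).setTargetTexPage(1)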

I’m not really sure why you’re not seeing both, though. What happens if you hardcode gl_Layer to 1? Does it actually affect which layer is rendered into?

And what happens if you hardcode layer to 0 or 1 in the texture on the quad? Note that the layer coordinate is not normalized, so it’s not 0-1, but an integer index cast to a float.

I’m assuming you’re not using shadow sampling/filtering for this, since you would be missing a fourth coordinate in that situation.

Thanks for your answer.

Actually, buffer 1 is displayed on cardmaker #2 (so inverted compared to the “normal” situation). Nothing is displayed in cardmaker #1.

Buffer 1 appears on both cardmakers with layer = 0.

When layer = 1, no shadow buffer is displayed (which seems normal, since no shadowmap #2 seems to be generated).

Thanks, indeed that seems much simpler. I tried it, but no change (shadowmap #2 does not appear):

sBuffer.addRenderTexture(self._tex_array, GraphicsOutput.RTM_bind_layered, GraphicsOutput.RTP_depth)
sBuffer.get_display_region(0).setTargetTexPage(0)

and

sBuffer2.addRenderTexture(self._tex_array, GraphicsOutput.RTM_bind_layered, GraphicsOutput.RTP_depth)
sBuffer2.get_display_region(0).disable_clears()
sBuffer2.get_display_region(0).setTargetTexPage(1)

The only way to make shadowmap #2 display (and probably be generated??) is to deactivate sBuffer1:

#sBuffer.addRenderTexture(self._tex_array, GraphicsOutput.RTM_bind_layered, GraphicsOutput.RTP_depth)
#sBuffer.get_display_region(0).disable_clears()

The sliced texture array is declared as a normal texture in the shader:

uniform sampler2DArray texture_array;

Even though I did not declare it as a shadow texture:

self._state = SamplerState()
self._state.setMinfilter(SamplerState.FT_nearest)
self._state.setMagfilter(SamplerState.FT_linear)
self._tex_array.setDefaultSampler(self._state)

It is, however, associated with sBuffer1 and sBuffer2 through the render-to-texture process with the RTP_depth field. So is tex_array automatically converted to a shadow texture? (P3D is not complaining about this, though.)

sBuffer.addRenderTexture(self._tex_array, GraphicsOutput.RTM_bind_layered, GraphicsOutput.RTP_depth)

One last thing to mention: to get things going quickly, I use the P3D shader generator to generate the shadows and shade the 2 cubes. I am not sure how that could interfere with all of this, but:
When only sBuffer #1 is activated, a shadow is displayed on the cube.
When only sBuffer #2 is activated, no shadow is displayed on the cube.

Update: I uploaded a minimal sample of code that demonstrates the issue.

By using two sliced textures in this code (which is not the goal, but I did it as a test), the code actually works. Using a single texture array causes the same type of issue as described in the previous post (see line 113).

@rdb: in your spare(!) time, should you have a few minutes to take a look, that would be great, as I am struggling to make it work correctly!

Code.zip (4.6 KB)

I took a look. I had to fix another bug first, as Panda was crashing for me on this code. Turns out there was a use-after-free issue, which I checked in a fix for. You can maybe work around it by setting gl-force-fbo-color to false (which you may want to set anyway).
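For reference, that variable can go in Config.prc, or be set from Python before the window and buffers are opened:

from panda3d.core import loadPrcFileData

loadPrcFileData("", "gl-force-fbo-color false")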

The issue is that a clear of a layered FBO will clear all layers of the attachment, so all layers get cleared again when the second pass renders. The easy fix is to control the order in which the shadow buffers are rendered (with a sort value in set_shadow_caster) and to disable the clears on all but the first buffer (and on their display regions).
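Concretely, something along these lines (a sketch using the names from your earlier snippets; set_shadow_caster takes the sort as its fourth argument, and buffers with lower sorts render first):

# Render light 1's shadow pass before light 2's...
self._light_np.node().setShadowCaster(True, 1024, 1024, -20)
self._light2_np.node().setShadowCaster(True, 1024, 1024, -10)

# ...and let only the first buffer clear the shared layered depth attachment.
sBuffer2.disable_clears()
for dr in sBuffer2.get_display_regions():
    dr.disable_clears()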

This isn’t quite how layered FBOs are intended to be used, but a fine way to get it working under the current infrastructure. I intend to make some changes to Panda to let you use shadow texture arrays (or atlases) natively, probably ideally with only a single FBO shared between the shadow passes.

Many thanks rdb and apologies for my late answer.
It works well, so thanks!


Edit
It works well with the geometry shader, but not with the set_target_tex_page(n) trick you mentioned. Indeed, when I do:

sBuffer.addRenderTexture(self._tex_array, GraphicsOutput.RTM_bind_layered, GraphicsOutput.RTP_depth)
sBuffer.get_display_region(0).setTargetTexPage(0)
sBuffer2.addRenderTexture(self._tex_array, GraphicsOutput.RTM_bind_layered, GraphicsOutput.RTP_depth)
for dr in sBuffer2.get_display_regions():
    dr.setTargetTexPage(1)
    dr.disable_clears()
sBuffer2.disable_clears()

I get a

AssertionError: page >= 0 && (size_t)page < _fbo.size() at line 1513 of c:\buildslave\sdk-windows-amd64\build\panda\src\glstuff\glGraphicsBuffer_src.cxx

If I only target DR 0 (which seems to be inactive for sBuffer2), P3D is not complaining, but the result is that the second buffer is sent to page 0, making both buffers display on the same page.

Any idea?


I tried to do the same exercise with PointLights and a samplerCubeArray.
I managed to make it partially work (I suppose), since I am not sure all faces are correctly displayed.

I use this specific sampling call, shadow = texture(texture_array3d, vec4(dir.xyz, 0.0)).r;, to retrieve the first cubemap texture, where 0.0 is the layer and dir.xyz is p3d_Vertex (as usually done to display a cubemap texture on a cube).

  • Is there any adjustment needed to the geometry shader (e.g. rotating gl_Position 6 times to cover all the cameras)? So far, for 2D:
void main() {
    for (int i = 0; i < gl_in.length(); ++i) {
        gl_Position = gl_in[i].gl_Position;
        gl_Layer = Mylayer;  // Set the layer for the current primitive
        TexCoords = TexCoords1[i];
        EmitVertex();
    }
    EndPrimitive();
}
  • Are there any other adjustments needed compared to 2D textures?

Nice! That would definitely make life easier. If you don’t mind, shadowViewMatrix should also be made easy to retrieve from Python as well (unless there is currently a way to do it; I would be interested to know how).

When everything works, I’ll post a complete example here: I think a good example of how to use sliced 2D/3D textures would have been a great accelerator for me!

Thanks again!

I made multiple further investigations and tests…

For 2D textures, it works well with a shader (no geometry shader used), except that I can’t really understand how to use set_target_tex_page.

For 3D textures, it is still not working. I gave up on the use of the geometry shader: it is difficult to make it work, and its potential performance issues may offset the benefit of using a sliced texture.
I first decided to use the extension “GL_AMD_vertex_shader_layer” to use gl_Layer in the vertex shader.

So I followed two routes - I included a minimal test code showing both:

  1. Using set_target_tex_page(n)
    It took me a while to understand that using it requires binding with RTM_bind_or_copy.
    It works well to display all the faces of the first shadow map, but…
  • not with two textures, where I need to indicate a tex_page > 5 for the second texture (which causes an assertion error)
  • even with only one texture, I get a (non-blocking) error message from P3D stating it does not know how to fill the framebuffer
  2. Using a shader (without a geometry shader)
    P3D is not complaining, but I can only display one face per texture on each of the cubes - see below

@rdb: should you have any suggestion…thanks!

Attached code.

Code_updated.zip (8.5 KB)

After performing other tests, I gave up on using the 2 aforementioned routes for a PointLight/3D textures. So I reverted back to my previous design, where I create my own buffers and PL shadow map generation, and adapted it to make it work with a sliced 3D texture.

And it works almost perfectly… so there is something in the configuration of the buffer generated by P3D for the PL shadowmap that prevents it from transferring to a sliced texture (and I don’t see what, actually…).

Well, not so perfectly, as P3D is still complaining about the 3D sliced texture with the message:

:display:gsg:glgsg(error): Don’t know how to copy framebuffer to texture volume2

Any suggestion?

Attached an updated version of an almost working code…

Code_updated2.zip (9.2 KB)

I just checked in some changes to handle cube map arrays in the render-to-texture code. Hopefully they work. Buildbot builds will appear here sooner or later.

Thanks so much for your time and effort rdb!

  1. First good news: I thought that the 1.11 version would require a lot of changes in the code (with a new shader class and pipeline), but it worked without any change… that is great!
  2. Second good news: sliced 3D textures are working well and P3D is not complaining any more.
  3. But unfortunately there are no longer any shadows for the first two lights (DL and SL, using the sliced 2D texture). The code still works with 1.10.15 (I tested it in parallel) but not with 1.11.

Thanks again!

Would you be willing to give me a very minimal test case to just show what’s no longer working? There’s a lot going on in your program, and it’s running at 2-3 fps for me, so it’s hard for me to debug.

As a sidenote, 1.11 does not yet have the shaderpipeline changes merged.