Using SSBO for light data

Old thread but still very interesting.

@rdb: thanks for that! Since data is formatted and sent as floats in the SSBO, is there any convenient way to bind/send a texture (e.g. a shadow map) along with it? (I imagine we could read the texture from a compute shader and fill the SSBO from it, but I don’t know whether there is a more convenient way to do it directly from Python.)

Thanks!

Sure, you can bind the shadow texture directly as a shader input to the compute shader. Or, you can assign it to the vertex shader rendering the embers and cull them that way.

Thanks!

To continue the conversation (I’ll create another thread so as not to derail this one too much): do you think it would be possible to have something such as:

struct Ember {
  vec3 pos;
  float speed;
  vec3 vel;
  float size;
  texture2D shadowmap; //Change compared to the initial code
};

layout(std430) buffer EmberBuffer {
  Ember embers[];
};

as it could mimic the p3d_LightSourceParameters uniforms.

I think I recall someone saying (probably you, but I can’t remember where :slight_smile: ) that mixing light data and shadow maps in the same structure is not a good design idea.

So would it be better to store shadow maps in separate arrays that are indexed into from p3d_LightSource (as per p3d_LightSourceParameters shadowMap not universal. · Issue #707 · panda3d/panda3d)?

Thanks!

Resources are opaque and you can’t put them into a buffer (short of ARB_bindless_texture, but it’s not supported that well and we don’t have the right infrastructure to handle it in the engine).

What you can do is create an array texture (which is different from an array of textures; indexing into the latter is a little trickier and a bit more limited) and index into it using an integer that you read from the SSBO. The caveat is that you’ll suffer pretty awful latency with such a long memory fetch dependency chain; it’s a lot better when the index is a uniform parameter. Another caveat is that in an array texture, all slices must be the same size.
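As a rough GLSL sketch of that approach (all names here are illustrative, not built-in Panda inputs; the struct mirrors the Ember example above, with the texture member replaced by an index):

```glsl
struct Ember {
  vec3 pos;
  float speed;
  vec3 vel;
  float size;
  int shadowMapIndex;  // plain integer index, not an opaque resource
};

layout(std430) buffer EmberBuffer {
  Ember embers[];
};

// All shadow maps live in one array texture, bound once.
uniform sampler2DArray shadowMaps;

vec4 sampleEmberShadow(int i, vec2 uv) {
  // The layer coordinate is the integer index cast to float.
  return texture(shadowMaps, vec3(uv, float(embers[i].shadowMapIndex)));
}
```

Note this is exactly the long fetch-dependency chain mentioned above: the sampler read must wait for the SSBO read.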

What are you trying to do here? Perhaps I can offer more specific suggestions. I can split off the thread if necessary.

Thanks! Probably better to have a separate thread.

But in (not so) short:

I have already built a light management system enabling:

1. specific data for each light (e.g. 2D or 3D textures for light cookies, or PSSM matrices/shadow maps);
2. mixing all types of shadows (DL/SL/PL);
3. leveraging the Panda3D lighting system to some extent (for DL & SL shadows).

I ended up having my own light data structure (including shadow maps for PL and PSSM) and using p3d_LightSourceParameters as well (for shadow maps and shadow view matrices), accessing both via indices.

Everything works well, but at some point I would like to remove one light without recompiling the shader. This almost works, but I still have an issue: the removed light still partly appears, so I suspect it is still present in the array.

At that point, I was wondering whether a different design (e.g. an SSBO for light data and arrays of textures for shadow maps and cookie textures) would be better, hence my question: is there a better way to manage P3D lights & shadows with specific additional data? (My initial preference would have been to use the existing p3d_LightSourceParameters and extend it with my own additional data, but it does not work like that.)

I’m not sure how an SSBO is supposed to help in this scenario over regular uniforms. We could use a UBO for more efficient uniform passing than individual uniforms (though we need to work on support for that in Panda; it’s a lot easier to add with the new shaderpipeline work being done), but an SSBO will have more latency than either one.

Yes, SSBOs support runtime-length arrays, but you can get the same effect with a fixed-size array and a light counter input, right? I’m happy to add such a counter to Panda if you need it, it’s a fairly trivial change to do on the shaderpipeline branch.
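Such a counter would be used like this in GLSL (a sketch; `lightCount` and `computeLight` are hypothetical names, not Panda-provided inputs):

```glsl
const int MAX_LIGHTS = 8;
uniform int lightCount;  // set from the application each frame

vec3 shade(vec3 albedo) {
  vec3 result = vec3(0.0);
  // Only the first lightCount entries of the fixed-size array are live.
  for (int i = 0; i < lightCount; ++i) {
    result += computeLight(i, albedo);  // hypothetical per-light function
  }
  return result;
}
```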

Panda will currently only bind the lights that are active on a node to the p3d_LightSource input, with remaining lights in the array being zero-valued so they don’t contribute.

In the long term I think I want to use a global uniform buffer containing a big array with data for all lights, and indices into this array being part of the per-instance data.

What I would do for shadow maps is put them all in a global atlas or texture array and index into that with per-instance or per-light indices.

Ok, got it, then let’s forget SSBOs for that use case. Indeed, if UBOs could be set up with the new shaderpipeline, that would be a good enhancement. By the way, many thanks for your hard work on the pipeline (a lot of recent commits)!

Indeed, yes, I already use a uniform counter input (I suspected it was the cause of my not-completely-removed light, so I switched from the array-of-structs length() to a uniform counter, even if, as a matter of fact, that did not completely solve the issue; I need to investigate more).

Thanks for the proposal! For me, I am already managing it by myself but this feature could be interesting for other P3D devs…

When you mention “texture array”, do you mean an array of uniform textures such as:

uniform samplerCubeShadow ShadowMap[MAX_LIGHTS]; 

I suppose they would need to be set up from P3D with a set_shader_input("ShadowMap[Index]", shadowmap_tex)?

Well, it’s trivial to add, so… added.

You could do it that way, and it would be similar to having them in p3d_LightSource, but I was actually suggesting having all shadow maps in the scene in a global samplerCubeArrayShadow (together with a sampler2DArrayShadow), and the p3d_LightSource struct contain an index into this global array.

The caveat is that each array layer has to be the same size, but if you need shadow maps with different sizes, there’s also a way to deal with that: smaller shadow maps could be atlassed, meaning we render the shadow map into a smaller section of the texture, with multiple shadow maps fitting into the same array layer. This is what tobspr’s RenderPipeline does, in fact. You can just bake the UV offset/scale into the shadowMatrix.
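The transform being baked in is just an affine remap of the shadow-map UVs into the atlas sub-rectangle; here is a small Python sketch of the math (the helper name is hypothetical):

```python
def atlas_remap(u, v, offset, scale):
    """Map shadow-map UVs in [0, 1] into an atlas sub-rectangle.

    offset is the sub-rectangle's origin and scale its size, both in
    atlas UV space.  Folding this transform into the shadowMatrix gives
    the same result with no extra per-fragment work.
    """
    return (offset[0] + u * scale[0], offset[1] + v * scale[1])

# A shadow map occupying the top-right quadrant of an atlas layer:
print(atlas_remap(0.5, 0.5, (0.5, 0.5), (0.5, 0.5)))  # (0.75, 0.75)
```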

I should add that cube map arrays are an OpenGL 4.0 feature, but I suppose so is dynamic indexing.

Thanks! It would probably be good to add it to the manual’s list of GLSL inputs.

I suppose using a sliced (array) texture is much more efficient than an array of uniform textures?

I followed your suggestion and tried to set it up (I think I missed a good old-fashioned example :-) ), but I nevertheless almost managed to make it work with 2 lights. In short (please tell me whether I am on the right track and what I am doing wrong):

  1. Create a texture array and 2 lights:

         self._tex_array = Texture("volume")
         self._tex_array.setup2dTextureArray(2)
         ....

  2. Capture the P3D shadow buffers through:

         sBuffer = self._light_np.node().getShadowBuffer(GraphicsStateGuardianBase.getGsg(0))
         sBuffer2 = self._light2_np.node().getShadowBuffer(GraphicsStateGuardianBase.getGsg(0))

  3. Create 2 cardmakers and assign to them a fragment shader that indicates, through a uniform, which layer should be sampled to display the given “slice” of the texture:

         void main() {
           vec4 sampled = texture(texture_array, vec3(TexCoords, layer));
           p3d_FragColor = sampled;
         }

  4. Create “bind_layered” RenderToTextures for each light’s shadow buffer to fill the array texture, and assign a specific geometry shader (see below) to the light (camera) state, so that every object seen by the light is rendered by the geometry shader:

         # Light 1
         sBuffer.addRenderTexture(self._tex_array, GraphicsOutput.RTM_bind_layered, GraphicsOutput.RTP_depth)
         attr = self._generate_shader(True, 0)
         state = self._light_np.node().get_initial_state()
         state = state.add_attrib(attr, 1)
         self._light_np.node().set_initial_state(state)

The aforementioned shader includes a geometry shader with a uniform “Mylayer” for each light, indicating into which layer of the texture the geometry should be rendered:

void main() {
    for (int i = 0; i < 3; ++i) {
        gl_Position = gl_in[i].gl_Position;
        gl_Layer = Mylayer;
        TexCoords = TexCoords1[i];
        EmitVertex();
    }
    EndPrimitive();
}

This works well when I only activate one RenderToTexture, but not when I activate both: I just see the first light’s shadow map in my cardmaker.

I am certainly not understanding something 100%, so if you could help, that would be great!

Thanks again!

Yes, mostly because we can bind the texture only once in the frame; we never need to rebind it.

There are two ways to bind to the layered texture: one is using RTM_bind_layered and a geometry shader (or a vertex shader with an extension like AMD_vertex_shader_layer) to pick the layer to render into, and one is just choosing the target layer with dr.setTargetTexPage(n) on the display region of the shadow buffer. The latter way is easier for this use case. The geometry shader version becomes efficient when you render multiple views in the same pass and assign each to a slice in the shader, which is especially useful for cube map rendering.

I’m not really sure why you’re not seeing both, though. What happens if you hardcode gl_Layer to 1, does it actually affect which layer is rendered into?

And what happens if you hardcode layer to 0 or 1 in the texture on the quad? Note that the layer coordinate is not normalized, so it’s not 0-1, but an integer index cast to a float.
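To illustrate the non-normalized layer coordinate (using the texture_array from your fragment shader), a hardcoded check could look like:

```glsl
// The third component selects the layer as an integer index cast to
// float, not a normalized 0-1 coordinate.
vec4 slice0 = texture(texture_array, vec3(TexCoords, 0.0));  // layer 0
vec4 slice1 = texture(texture_array, vec3(TexCoords, 1.0));  // layer 1
```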

I’m assuming you’re not using shadow sampling/filtering for this, since you would be missing a fourth coordinate in that situation.

Thanks for your answer.

Actually, buffer 1 is displayed on cardmaker #2 (so, inverted compared to the “normal” situation). Nothing is displayed in cardmaker #1.

Buffer 1 appears on both cardmakers with layer = 0.

When layer = 1, no shadow buffer is displayed (which seems normal, since no shadow map #2 seems to be generated).

Thanks, that indeed seems much simpler. I tried it, but there was no change (shadow map #2 does not appear):

    sBuffer.addRenderTexture(self._tex_array, GraphicsOutput.RTM_bind_layered, GraphicsOutput.RTP_depth)
    sBuffer.get_display_region(0).setTargetTexPage(0)

and

    sBuffer2.addRenderTexture(self._tex_array, GraphicsOutput.RTM_bind_layered, GraphicsOutput.RTP_depth)
    sBuffer2.get_display_region(0).disable_clears()
    sBuffer2.get_display_region(0).setTargetTexPage(1)

The only way to make shadow map #2 display (and probably be generated??) is to deactivate sBuffer1:

    #sBuffer.addRenderTexture(self._tex_array, GraphicsOutput.RTM_bind_layered, GraphicsOutput.RTP_depth)
    #sBuffer.get_display_region(0).disable_clears()

The sliced texture array is declared as a normal texture in the shader:

uniform sampler2DArray texture_array;

and I explicitly did not declare it as a shadow texture:

        self._state = SamplerState()
        self._state.setMinfilter( SamplerState.FT_nearest)
        self._state.setMagfilter( SamplerState.FT_linear )
        self._tex_array.setDefaultSampler( self._state )

It is, however, associated with sBuffer1 and sBuffer2 through the RenderTexture process with the RTP_depth plane. So is tex_array automatically converted to a shadow texture? (P3D is not complaining about this.)

sBuffer.addRenderTexture(self._tex_array, GraphicsOutput.RTM_bind_layered, GraphicsOutput.RTP_depth)

One last thing in mind: to keep things fast, I use the P3D shader generator to generate shadows and to shade the 2 cubes. I am not sure how it could interfere with all of this, but:
When sBuffer #1 only is activated, a shadow is displayed on the cube.
When sBuffer #2 only is activated, no shadow is displayed on the cube.

Update: I uploaded a minimal sample of code that demonstrates the issue.

By using two sliced textures in this code (which is not the goal, but I did it as a test), the code actually works. Using a single texture array produces the same type of issue as described in the previous post (see line 113).

@rdb: in your spare(!) time, should you have a few minutes to have a look, that would be great, as I am struggling to make it work correctly!

Code.zip (4.6 KB)

I took a look. I had to fix another bug first, as Panda was crashing for me on this code. It turns out there was a use-after-free issue, which I checked in a fix for. You can maybe work around it by setting gl-force-fbo-color false (which you may want to set anyway).

The issue is that a clear of a layered FBO clears all layers of the attachment, so all layers get cleared again when the second pass renders. The easy fix is to control the order in which the shadow buffers are rendered (with a sort value in set_shadow_caster) and to disable the clears on all but the first buffer (and their display regions).
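In code, the suggested workaround could look roughly like this (a sketch; the light node names, buffer sizes, and sort values are assumptions, not from the original post):

```python
# Render light 1's shadow buffer first (a lower sort value runs
# earlier), so only its clear of the layered FBO happens before
# anything is drawn into it.
light1_np.node().setShadowCaster(True, 512, 512, -11)
light2_np.node().setShadowCaster(True, 512, 512, -10)

# Disable clears on the second buffer and its display regions, so it
# does not wipe the layer the first pass just rendered into.
sBuffer2.disableClears()
for i in range(sBuffer2.getNumDisplayRegions()):
    sBuffer2.getDisplayRegion(i).disableClears()
```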

This isn’t quite how layered FBOs are intended to be used, but it’s a fine way to get it working under the current infrastructure. I intend to make some changes to Panda to let you use shadow texture arrays (or atlases) natively, ideally with only a single FBO shared between the shadow passes.