Optimising Lighting

Thaumaturge · April 26, 2024, 10:02am

I’ve discovered a performance issue in my current project, and I’m hoping for some aid in determining what to do about it:

In my project, I have a few shaders that, in most cases, account for a significant proportion of the pixels rendered to the screen, I believe. Now, I want my project to include lighting, so I’ve implemented that in those shaders. Specifically, any node rendered via those shaders can be affected by up to 32 lights.

Aaaand… it looks like the code for this may well be more performance-intensive than I’d prefer. :/

I’ve already tried a few small optimisations within the code itself, but to little effect thus far, I fear.

So, a few thoughts occur to me:

- I could reduce the maximum number of lights
  - In fact, the previous maximum was 16. However, a recent change elsewhere prompted me to increase the average size of my rooms, meaning that more lights were called for in order to cover them…
- I could attempt to have a given node be affected only by those lights that are actually near it
  - After all, while a room may have many lights, in general only a few will actually affect a given point
  - However, this would mean finding a way to detect which lights are affecting which objects, potentially calling for distance-calculations or collision shenanigans
  - Further, it would presumably mean cutting up my room geometry more than I currently am, increasing the number of nodes in the scene and complicating level-building
  - It would also presumably incur more shader-states, as different nodes within a room would have different lights in their shader-inputs, and more changes in shader-states, as moving objects travelled into the range of different lights
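For reference, the "only nearby lights" idea can be sketched on the CPU side with a plain distance test. This is a minimal sketch rather than the poster's implementation: the function name `lights_near`, the bounding-circle representation of a node, and the `(x, y, light_range)` tuple layout are all assumptions made for illustration.

```python
import math

MAX_LIGHTS = 32  # per-node limit mentioned in the post


def lights_near(node_pos, node_radius, lights, max_lights=MAX_LIGHTS):
    """Return the lights whose range reaches a node's bounding circle.

    node_pos:    (x, y) centre of the node (hypothetical 2D bounds)
    node_radius: radius of a circle enclosing the node
    lights:      list of (x, y, light_range) tuples (hypothetical layout)
    """
    reaching = []
    for lx, ly, light_range in lights:
        dist = math.hypot(lx - node_pos[0], ly - node_pos[1])
        # The light can touch the node if its range overlaps the circle.
        if dist <= light_range + node_radius:
            reaching.append((dist, (lx, ly, light_range)))
    # Nearest lights first, so truncating drops the farthest ones.
    reaching.sort(key=lambda pair: pair[0])
    return [light for _, light in reaching[:max_lights]]
```

A test like this only needs to be re-run for a node when it or a light moves, which may keep the cost of the "distance-calculations" modest for mostly static rooms.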

And that, then, is where I stand at the moment. Thoughts…?

jbandhauer · April 26, 2024, 10:19am

Deferred Shading could help https://learnopengl.com/Advanced-Lighting/Deferred-Shading . Are you already using it?

Thaumaturge · April 26, 2024, 10:47am

Ah, right, I forgot to mention deferred shading!

I’m not currently using it, mainly due to inexperience with it.

That said… is the Firefly demo a representative implementation of it…? If so, then that has never really run all that impressively well for me, I fear–and I can’t quite bring myself to believe that rendering the scene so many times, into so many buffers, is really going to improve things that much… ^^;

Still, it may be something that I should look into at some stage…

(And thank you for mentioning it!)

serega-kkz · April 26, 2024, 11:01am

Actually, the answer is simple: use occluders to cover the light sources, since you won’t see all the rooms at once anyway.

Thaumaturge · April 26, 2024, 11:17am

I mean, this problem occurs with only one room in view, I’m finding.

Further, the problem seems to be on the objects that are receiving light, rather than the sources themselves.

Actually, I’m reminded of a question that I had in mind: In a GLSL shader, is it possible to early-out of a for-loop, or of an individual iteration of a for-loop–and without killing performance…?

My thought is twofold:

There are often rather fewer than 32 lights in a given room, so if I can just… stop iterating once I hit the first “empty” light then I could potentially save a fair few iterations.

and

More often than not, lights are quite far away from their targets. If I could do a broad-phase cull via an early-out of the current iteration when the light is far away on one or both horizontal axes, then again I could save some computation. (If not full iterations.)

Thoughts…?

serega-kkz · April 26, 2024, 11:28am

I’m not sure, but having only one room in view does not prevent all lights from being rendered. By the way, the occluder actually hides the geometry from the rendering pass.

Thaumaturge · April 26, 2024, 11:39am

I’m aware of the effect of an occluder–I just don’t think that it would help here.

I should clarify: I’m not using Panda’s built-in light system. (I have some particular requirements for my lights.)

Instead, each node has two arrays (well, lists on the Python-side) of length 32 applied as shader-inputs: one that holds Vec2s specifying position, and another that holds Vec3s specifying the light’s range, intensity, and softness.

Within the shaders that render various objects these arrays are accessed in a fixed-length loop, and the effects of each light applied according to the relevant maths.
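As an illustration of the data layout described here, the following is a minimal Python sketch of building the two fixed-length lists. The helper name `pack_lights`, the tuple representation (standing in for Vec2 and Vec3), and the exact sentinel value `UNUSED_RANGE` are assumptions; the thread only says later that unused lights have a very small range and come after the in-use ones.

```python
MAX_LIGHTS = 32
UNUSED_RANGE = 0.0001  # hypothetical "very, very small" range for unused slots


def pack_lights(lights, max_lights=MAX_LIGHTS):
    """Build the two fixed-length lists that would be sent as shader inputs.

    lights: list of ((x, y), (light_range, intensity, softness)) pairs
    Returns (positions, params), each exactly max_lights long, with the
    in-use lights first and sentinel entries filling the remainder.
    """
    if len(lights) > max_lights:
        raise ValueError("too many lights for one node")
    positions = [pos for pos, _ in lights]
    params = [par for _, par in lights]
    padding = max_lights - len(lights)
    positions += [(0.0, 0.0)] * padding
    params += [(UNUSED_RANGE, 0.0, 0.0)] * padding
    return positions, params
```

Keeping the in-use lights contiguous at the front is what makes a simple "stop at the first unused light" test possible in the shader loop.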

serega-kkz · April 26, 2024, 2:56pm

I think I should clarify that the problem is the number of shader passes for each object in the scene. If you have a shader applied to the root node, then when the lighting shader is calculated, each object will be passed down the hierarchy to the shader, even if it is not in the camera’s field of view. This statement is true for objects located outside the camera’s view frustum, but it is not true for objects that are directly behind the wall of the room. As far as the camera is concerned there is no wall: it renders everything that is within the frustum.

eldee · April 26, 2024, 3:34pm

Deferred rendering does not help when you have many lights; that’s the purpose of light culling. You can also use light culling with a forward rendering pipeline.

You can sort your lights by luminosity and, in the loop, stop iterating once you are below a given threshold. (You can estimate that luminosity coarsely either on the CPU or in the vertex shader.) It should give you some speed improvement, as the neighbouring pixels will probably trigger the same condition, and so the whole execution block will not have to wait for the full iteration.
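The sort-then-stop idea can be sketched in Python as follows. This is a rough model only: the linear falloff in `estimated_luminosity`, the function names, and the threshold value are assumptions, not the lighting maths used in the project. Note also that the sort is done for one target position, so for fragments far from that position the ordering is only approximate.

```python
def estimated_luminosity(light, target):
    """Coarse estimate: intensity fading linearly to zero at the light's range.

    light:  ((x, y), (light_range, intensity, softness)), with light_range > 0
    target: (x, y) position to estimate for (e.g. the object's centre)
    The linear falloff is a placeholder, not the project's real formula.
    """
    (lx, ly), (light_range, intensity, _softness) = light
    dist = ((lx - target[0]) ** 2 + (ly - target[1]) ** 2) ** 0.5
    return intensity * max(0.0, 1.0 - dist / light_range)


def sort_lights_for(target, lights):
    """Brightest-first ordering, computed once on the CPU."""
    return sorted(lights, key=lambda l: estimated_luminosity(l, target), reverse=True)


def shade(target, sorted_lights, threshold=0.01):
    """Accumulate light, stopping at the first light below the threshold."""
    total = 0.0
    iterations = 0
    for light in sorted_lights:
        contribution = estimated_luminosity(light, target)
        if contribution < threshold:
            break  # the list is sorted, so every later light is dimmer still
        total += contribution
        iterations += 1
    return total, iterations
```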

Thaumaturge · April 26, 2024, 8:39pm

That… seems strange to me: I would expect it to be the other way around, if anything. That is, that objects outside of the camera-frustum would be culled, including from shader application, but that objects behind an occluder might still be considered due to some quirk of the culling applied by them…

Ah, and I keep forgetting to mention: this program has a top-down perspective, so there aren’t really walls that prevent one seeing into neighbouring rooms.

(There is a line-of-sight feature in play–but I’m not sure of how useful it would be for culling…)

Ah, I see! That’s interesting!

I’d always thought that deferred rendering was used to improve performance with many lights…

If not, then what is its purpose…?

Ah, so one can early-out in GLSL!

And, spurred by your comment, I just tried it…

And wow, what a difference! 0_0

So, my lights are already arranged, if I recall correctly, such that “unused” lights–which have a very, very small range-value–appear later in the array than “in-use” lights.

Thus I’ve put a simple, naive condition near the start of my loop. In pseudocode:

if (lightRange < valueSlightlyLargerThanTheUnusedSize)
{
    break;
}

And my test-scene, which was running at just below 60fps, now runs at around 120fps! 0_0

(Better still: When my laptop is running on battery power it slows down, thus slowing any application running on it. In this state, the same scene was only running at around 20-odd fps–and it’s now running at about 50-odd!)

This does mean that I’ll want to be careful about rooms with lots of lights–but still, for many cases, it seems like a really effective change!

Furthermore, I might still consider trying out a “continue”-statement for lights that are too far from the object in question…
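For illustration, here is a Python model of a loop that combines the two early-outs discussed in this post: a `break` at the first unused light and a `continue` for lights that are too far away on either horizontal axis. GLSL supports both `break` and `continue` inside loops. The constants, the function name `accumulate_light`, and the linear falloff are assumptions made for the sketch, not the actual shader code.

```python
UNUSED_RANGE = 0.0001  # hypothetical sentinel range for unused slots
RANGE_CUTOFF = 0.001   # "slightly larger than the unused size"


def accumulate_light(frag, positions, params):
    """Model of the per-fragment light loop with two early-outs.

    frag:      (x, y) position of the fragment
    positions: list of (x, y) light positions
    params:    list of (light_range, intensity, softness) tuples
    Returns (total_light, number_of_full_evaluations).
    """
    total = 0.0
    full_evaluations = 0
    for (lx, ly), (light_range, intensity, _softness) in zip(positions, params):
        if light_range < RANGE_CUTOFF:
            break  # unused lights sit at the end, so nothing useful remains
        dx = abs(frag[0] - lx)
        dy = abs(frag[1] - ly)
        if dx > light_range or dy > light_range:
            continue  # broad-phase cull: too far away on at least one axis
        dist = (dx * dx + dy * dy) ** 0.5
        # Placeholder falloff; the real shader would apply its own maths here.
        total += intensity * max(0.0, 1.0 - dist / light_range)
        full_evaluations += 1
    return total, full_evaluations
```

The `break` saves whole iterations, while the `continue` only skips the more expensive part of an iteration, which matches the "if not full iterations" caveat raised earlier in the thread.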

eldee · April 26, 2024, 9:39pm

Yes, but within very strict limits: the fragment shader is executed on a block of pixels, and the GPU waits until the shader has finished executing on all of those pixels before moving on to the next block. The GPU also keeps the execution of the shader synchronised across all the pixels in the block. But if all the pixels take the same branch of a condition and all bail out early, the GPU doesn’t mind.
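A toy model may make the "block of pixels" point more concrete. In the sketch below, each block costs as much as its slowest pixel, which is a deliberate simplification: real block sizes and scheduling vary between GPUs, and the function names are hypothetical.

```python
def block_cost(iterations_per_pixel):
    """In lockstep execution, a block takes as long as its slowest pixel."""
    return max(iterations_per_pixel)


def frame_cost(per_pixel_iterations, block_size):
    """Sum the cost of consecutive blocks of block_size pixels."""
    cost = 0
    for start in range(0, len(per_pixel_iterations), block_size):
        cost += block_cost(per_pixel_iterations[start:start + block_size])
    return cost
```

With a block size of 4, eight pixels that all stop after 4 iterations cost 8 in this model, whereas `[4, 4, 4, 32, 4, 4, 4, 4]` costs 36: a single pixel that runs the full loop holds back its whole block. An early-out therefore pays off most when neighbouring pixels take it together.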

It helps, but only in complex scenes with heavyweight lighting shaders.
When using deferred rendering you store material and geometry information into buffers, then in a second stage you calculate the shading for each pixel. But if you have 1000 lights you will still calculate your shading 1000 times.

It is also useful when you add other techniques, like screen space reflections, global occlusion and so on.
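As a back-of-the-envelope illustration of this point, the sketch below counts light evaluations under a very simple cost model. The `overdraw` factor (how many times, on average, each screen pixel is shaded in a forward pass) and both function names are assumptions, and the model ignores the cost of writing and reading the extra buffers.

```python
def forward_light_evaluations(pixels, overdraw, lights):
    """Forward: every shaded fragment, including overdrawn ones, loops over the lights."""
    return int(pixels * overdraw * lights)


def deferred_light_evaluations(pixels, lights):
    """Deferred: lighting runs once per visible pixel, but still once per light."""
    return pixels * lights
```

Under this model the saving from deferred rendering comes from the overdraw factor alone, and the per-pixel cost still scales with the number of lights. When overdraw is low, as it may well be in a top-down scene, the two counts are nearly the same.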

serega-kkz · April 26, 2024, 9:55pm

Perhaps the discard operator can be used for this purpose somehow, since you can always skip a few pixels during processing.

I did not say that walls prevent you from seeing other objects, I said that objects outside the walls are equally taken into account when rendering. However, you have now explained that the camera is static and theoretically does not see all objects behind the wall.

Thaumaturge · April 27, 2024, 8:47am

Ah, I see! Thank you for the explanation!

In this case, the lights affecting an object aren’t likely to change often, so my current early-out should be applicable to all pixels of an object, I believe.

The other early-out that I have in mind–based on distance to the light–would be more useful in some cases (e.g. characters, which are fairly small on the screen) than others (e.g. room-floors, which can take up much or all of the screen).

I’ve also been thinking of maybe moving a vector-subtraction (used to find the vector from the light to the fragment, and thus the distance between the two) into the vertex shader. I’m not sure that the result would remain accurate, but I think that it might…
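On the accuracy question, a small numeric check may help. The vector from a fixed light to a position is a linear function of that position, so interpolating it between vertices gives the same result as computing it per fragment (with the default perspective-correct interpolation). The length of that vector is not linear, however, so the distance itself would still need to be computed in the fragment shader. The helper names and the sample coordinates below are made up for the illustration.

```python
def lerp(a, b, t):
    """Linear interpolation between two tuples, as done for vertex outputs."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def length(v):
    return sum(x * x for x in v) ** 0.5


light = (0.0, 0.0)
v0, v1 = (-3.0, 4.0), (3.0, 4.0)  # two vertices of an edge (sample values)
t = 0.5                           # fragment halfway along the edge

# Computed per fragment: subtract at the interpolated position.
frag_pos = lerp(v0, v1, t)
per_fragment_vector = sub(frag_pos, light)

# Computed per vertex, then interpolated: the same vector.
interpolated_vector = lerp(sub(v0, light), sub(v1, light), t)

# Interpolating the per-vertex *lengths* instead gives a different distance.
len0, len1 = length(sub(v0, light)), length(sub(v1, light))
interpolated_length = len0 + (len1 - len0) * t
true_length = length(per_fragment_vector)
```

Here `interpolated_vector` and `per_fragment_vector` are both `(0.0, 4.0)`, while `interpolated_length` is 5.0 against a true distance of 4.0.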

Aaah, I see–again, thank you!

Indeed, that is a good point.

That said, I’m not using any such for this project at the moment…

Hmmm…

I mean, I am using a discard earlier in the shader, for a different purpose. But I’m not sure that it would help here: I want the fragments in question to be unaffected by lights, not to be omitted entirely…

Ah, okay, that makes more sense! Thank you for clarifying!

Well, not static–it pans around, potentially seeing new rooms as it moves around the scene. But it’s the edges of the screen that affect which objects are seen, not the walls of the rooms.

serega-kkz · April 27, 2024, 3:19pm

It will be difficult to explain, but some objects, such as the floor, may only partially be in the camera view area. However, the lighting will be calculated for the entire surface, not just for the one that is visible.

However, I do not know how to implement this technically, and I am not sure whether there would be a performance gain if we discarded the invisible pixels of the object. In fact, this is just a theory.

In addition, I must explain that I take into account the fact that the light sources are independent cameras and, accordingly, calculations take place relative to their lenses. The very fact that an object is visible triggers the calculation of all the light sources that illuminate that object, and this happens for each lens if you use a shadow map.

I’m not sure if you’re using a shadow map, since you don’t use the default lighting.

Thaumaturge · April 27, 2024, 3:54pm

Aaah, I see! That makes a lot of sense–good point!

I think that it should be possible to use the result of the model-view-projection matrix calculation to determine whether a fragment is “on-screen” or not, if I’m not much mistaken…

I mean, it seems worth trying! I might give it a shot a little later and report back…

Well, as I said (or think that I said), I’m not actually using Panda’s lights. I’m using my own, which don’t use scene-graph nodes at all, let alone cameras.

I am not.

[edit]
Okay, discarding fragments didn’t seem to help.

(Specifically, I tried calculating the result of “trans_model_to_clip * p3d_Vertex” in my vertex-shader, then in the fragment shader using the condition “if (abs(result.x) > 1 || abs(result.y) > 1)” to discard fragments.)

Perhaps fragments outside of the camera-view are already being culled somewhere along the line?
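On that last question: OpenGL clips primitives against the view volume before rasterisation, so fragments outside the viewport are never generated in the first place, which would be consistent with the discard making no difference. For completeness, here is a minimal Python sketch of the on-screen test itself. Comparing raw clip-space x and y against 1 only matches the screen bounds when w is 1 (as with an orthographic lens); in general the coordinates need dividing by w first. The function name and the tuple layout are assumptions.

```python
def is_on_screen(clip):
    """Test a clip-space position (x, y, z, w) against the screen's x/y bounds.

    Normalised device coordinates are clip coordinates divided by w, and
    the visible range on each of x and y is [-1, 1].
    """
    x, y, _z, w = clip
    if w <= 0.0:
        return False  # at or behind the camera plane
    return abs(x / w) <= 1.0 and abs(y / w) <= 1.0
```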

That said, based on what I’m seeing, it may be that I’ve made most of the gains that I’m likely to make with these shaders at this point. It may be that I would be better off starting to look elsewhere for performance issues…

Anyway, let me say “thank you” to all of you who have offered help in this thread!