Performance questions on setShaderInput and setShaderAuto

clcheung · March 25, 2009, 7:08am

I have a basic question on using these api. I found that if I call setShaderInput regularly and if one of the child node under this node has the auto shader turned on, the performance of the system will be downgraded significantly.

Can you explain why it happens ? Looks like the autoshader keep recreating the shader program ?

Here is a test program. The FPS is about 454 after the program started. If now press “1”, the FPS goes down to 380. Now press “t”, the FPS goes down to about 8.

from pandac.PandaModules import loadPrcFileData
#loadPrcFileData("", "want-directtools #t")
#loadPrcFileData("", "want-tk #t")
loadPrcFileData("", "sync-video #f")
import direct.directbase.DirectStart
from pandac.PandaModules import *

from direct.task import Task
from direct.interval.IntervalGlobal import *
import math
from direct.actor.Actor import Actor

class World():
    def __init__(self):
        self.LoadModel()

        base.accept("1", self.setshader1)
        base.accept("2", self.setshader2)
        base.accept("t", self.startTimeTask)

    def startTimeTask(self):
        taskMgr.add(self.setTime, "settime")

    def LoadModel(self):
        base.setBackgroundColor(255, 0, 0)
        base.disableMouse()
        base.cam.setPos(0, -25, 1.5)
        self.actor= Actor('panda.egg', {'walk' : 'panda-walk.egg'})
        self.actor.setScale(0.25,0.25,0.25)
        self.actor.reparentTo(render)
        self.actor.loop("walk")
        base.camera.lookAt(self.actor.getPos())

        maker = CardMaker( 'leaf' )
        maker.setFrame( 0, 1, 0, 1)
        self.ground = render.attachNewNode(maker.generate())
        self.ground.setPos(Vec3(-50,0,0))
        self.ground.setHpr(0,-90,0)
        self.ground.setScale(100,100,1)

        dlight = NodePath(DirectionalLight('dlight'))
        dlight.reparentTo(base.cam)
        render.setLight(dlight)

##
##        plight = PointLight('plight')
##        plight.setColor(VBase4(0.4, 0.4, 0.4, 1))
##        #plight.setAttenuation(Point3(0, 0, 0.01))
##        plnp = render.attachNewNode(plight)
##        plnp.setPos(0, 5, 5)
##        render.setLight(plnp)

    def setshader1(self):
        render.setShaderAuto()

    def setshader2(self):
        self.actor.setShaderAuto()

    def setTime(self, task):
        render.setShaderInput('time', task.time)
        return task.cont

world = World()
run()

rdb · March 25, 2009, 7:19am

I can confirm. The performance drop when pressing 1 is just the normal drop when enabling per-pixel lighting, when I press t it suddenly regenerates the shader every frame (and also has to compile it every frame), so this becomes kinda slow.
I’ll investigate.

rdb · March 25, 2009, 7:39am

Actually, it’s because the RenderState of the object changes. The code checks if the RenderState has changed, if so, gets a new ShaderAttrib from the ShaderGenerator. I suppose we could make an exception when the ShaderAttrib gets changed since it gets overridden anyways. I’ll look further into this.
For now, just don’t change the shader attrib while auto-shader is enabled.

drwr, is there some way I can check whether only the ShaderAttrib has changed, and not the rest? Would I need to loop through the attribs?

clcheung · March 25, 2009, 10:43am

It happens that I have two shaders that accept time as input, so it is convenient to set the shader input at the top. In my case I can avoid it.

But how about a scene that has a node that use autoshader but is also the child of a node that use a specific shader that need a time input ?

drwr · March 25, 2009, 6:23pm

Hmm, right, any change to the state requires generating a new shader when the auto-shader is in effect. This is clearly necessary because a shader represents a compiled version of the render state, and any changes to the state that should have a visible effect on the scene will clearly require a new shader.

The ShaderAttrib is a special case, because this is not used by the auto-generated shader, and therefore results in no visible effect on the scene (at least, not on the part of the scene controlled by the auto-shader). So, it might be a useful feature for the shader generator to somehow recognize that the changed attrib isn’t signficant, and somehow locate the shader that was generated for the equivalent RenderState without this new changed attrib.

Like I said, it would be a useful feature, but it would be kind of hard to do with the current mechanism; and I don’t think it’s a necessary feature. Really, it’s a mistake to call setShaderInput() on a node that is controlled by the auto-shader. You really shouldn’t be setting it on render, for instance; it’s better to set it on the node that actually has the shader attrib that wants this input.

If you have two nodes that want to share a common parent, and one of them wants to have a custom shader that requires a shader input, while the other one wants to have the auto-shader, then don’t parent the auto-shader node to the custom shader node. Instead, parent them both to a common node. That way you can still call setShaderInput() on the custom shader node only, without polluting the auto-shader node.

Still, I recognize the unfortunate counterintuitive aspect of this: you wouldn’t expect setShaderInput() to have such a devastating effect on performance when you inadvertently apply it to an auto-shader node. But really, any state change will have this same effect on performance when you apply it to an auto-shader node. So when you’re using the auto-shader, you have to get used to being careful with your state changes.

David

rdb · March 25, 2009, 6:38pm

I agree, but I do think that it should be possible to call setShaderInput on a root node - if your scene gets really complex, the user would either end up with a performance drop, or having to traverse through the scene graph to find non-auto-shader nodes.
A few ideas:

Can’t we copy both the old and new states, use remove_attrib to remove the ShaderAttrib, and check if they are equal?
Or, in ShaderAttrib::set_shader_input, we check if the autoshader flag is set, if so, ignore the input call. (However, if autoshader is then disabled, people might wonder where their input is.)
Maybe even better, in the ShaderAttrib == operator (or however it is compared), also return true if both happen to have autoshader flag set.

Maybe the perfect solution would be to deny ShaderAttrib::set_shader_input calls when autoshader is set, but that doesn’t solve the original problem, that would only make the error clearer (instead of a slowdown).

drwr · March 25, 2009, 7:07pm

Sure, if you have an old and a new state to compare. You don’t even need to do that much work; it’s certainly possible (even easy) to iterate through all of the attribs, and see if they are equal except for the ShaderAttrib.

However, we don’t have an old and a new state in this case. We just have the current state. The current state happens to be a different object from the state that was in effect last frame we rendered this object, but we don’t remember what the last state was. All we can do is ask the current state for the appropriate shader.

If you really wanted to solve this problem, we would have to get away from caching the current shader in the RenderState object, since there would now be multiple RenderStates that share the same shader. Instead, we would have to cache the shader elsewhere (maybe in the GSG, to be consistent with your shadow efforts), and we would look them up in the GSG based on a hash code computed from the RenderState, which didn’t involve the ShaderAttrib.

This doesn’t work, because we are talking about the case when the setShaderInput is applied to a root node (which does not have the auto-shader flag) and then automatically propagated to a child node (which does have the auto-shader flag).

Or maybe you mean we could change the rule of state composition, so that when a setShaderInput state composes with an auto-shader state, the shader inputs all go away from the result. This would solve the problem nicely, until you come across a lower-level node that disables the auto-shader again. Come to think of it, this probably is what you mean, isn’t it?

I think this is my favorite solution.

You can’t change the == operator without also changing the composition rule, as above, because if a == b then it must also be true that (a * c) == (b * c). The state cache relies on this principle. If you change the == operator only, you have violated this rule, and the state cache will become corrupt.

David

rdb · March 25, 2009, 7:26pm

In the set_state_and_transform step, isn’t that what _state_rs and _target_rs is, or did I misunderstand the code?

Oh, I thought NodePath.setShaderInput automatically propagated down and applied it to the ShaderAttrib on every node it found.

Sounds good, but I don’t entirely understand it yet. Do you mean that we altered glGSG::set_state_and_transform to remove the inputs from _target_shader when autoshader is enabled, and then compared _target_shader to the attrib in _state_rs? (Then we’d only need to regenerate the shader if they matched)

Or, do you mean that we made the ShaderAttrib::set_shader_auto call to remove all the inputs? Hmm, that would work, but if people add an input, quickly enable&disable autoshader, their inputs would be gone?

drwr · March 25, 2009, 10:59pm

No, that’s not the same thing. That’s storing the previous state within the same frame that we were just rendering a different object. It has nothing to do with the state that we used to render this object last frame.

See, the basic cull algorithm is: traverse the scene graph, and build up a list of Geom/RenderState pairs for rendering all of the objects in the scene. The basic draw algorithm is: walk though the list of Geom/RenderState pairs, and for each one, switch the graphics state to the new RenderState, and draw the Geom.

Not directly. The changes are propagated implicitly later, when the states are composed during the cull traversal. What actually happens is, during the cull traverser, Panda visits each node in the scene graph and composes all of the RenderStates together from the root to each node. When it reaches a GeomNode, it takes the net RenderState composition for that Geom, and adds it to the list.

It all comes down to the way RenderStates compose together. Think of them in the same sense as TransformStates. Applying a new transform to a node in the scene graph doesn’t actually change the nodes below; but it means if you recompute the net transform to a node below, you’ll get a new answer. It’s the same way with RenderStates.

During the cull traversal, states and transforms are accumulated for each node the same way.

No, I’m not talking about set_state_and_transform() now, since that’s dealing with the net state and transform once they have already been computed. I’m talking about changing the definition of the state composition.

Image that node A is a parent of node B, node A has a shader input, and node B has the auto-shader flag. Presently, if you ask for b.getNetState()–which means the same thing as a.getState().compose(b.getState())–you will get a state that includes the shader input as well as the auto-shader flag. This is the same RenderState that is computed for B during the cull traversal. I’m proposing changing the state-composition rules, so that if you ask for b.getNetState(), you will get a state that includes the auto-shader flag, but not the shader input.

David

clcheung · March 26, 2009, 1:57am

Thank you for the replies. Looks like it is a simple issue to be fixed easily. May be a general note can be put in the manual so that others can avoid it ? I met this issue several time and until yesterday I discover the real cause.

rdb · March 28, 2009, 9:01am

drwr: Ah, thanks for the explanation. So, we should just modify new_state in the RenderState::do_compose function not to include the shaderInputs in the returned render state when autoshader is enabled?

drwr · March 28, 2009, 5:21pm

Yes, I think that will solve the problem nicely.

David

rdb · April 28, 2009, 12:11pm

Revisiting this with a different, related issue.
When using a ColorScaleInterval (or similar), every frame the shader has to be regenerated, even though the shader itself has not changed a bit. That gives a big performance loss.
Is there a way to make this smarter? If not, I can at least speed it up by adding some kind of compiled-shader-caching mechanism.

drwr · April 28, 2009, 3:31pm

Josh’s comments seem to imply that there already is a shader-caching mechanism, in the sense that it is supposed to avoid the recompilation step if it happens to generate a shader that’s textually equivalent to a previous shader. No idea whether this works as claimed.

The other way to solve this would be to build some kind of a hash code for each RenderState, which is used to uniquify the state objects (instead of uniquifying them by pointer). This hash code could consider only the relevant RenderAttribs, and would ignore things such as ColorScaleAttrib (or at least, it would ignore the particular value of the ColorScaleAttrib, and hash only on the presence or absence of the attrib). This design would mean the cache of generated shaders would have to be stored outside of the RenderState objects themselves.

David

rdb · June 11, 2009, 6:48pm

It should indeed. But still it is reanalyzing the state and regenerating the shader every frame and this is a performance loss.

I still want to get this fixed though. I ran into it again when trying to get reflective surfaces and projected textures to work in the shader generator - since Panda is setting a new TexMatrix every frame, the ShaderGenerator has to do all the work again.

I think we need indeed to do some render state matching ignoring several attribs (or even - only a couple of settings of some attribs.)
Any pointers or specific suggestions how something like that would be implemented?

drwr · June 12, 2009, 5:48pm

This may require some significant design work. My first instinct is to add a RenderState->compare_auto_shader() or some such method, that compares two RenderStates similar to RenderState->compare_to(), but it only considers those attribs that are meaningful to the shader generator.

Then use this method to implement a pmap of <RenderState *, CPT(Shader) > or some such table, which caches all of the generated shaders for each RenderState. This cache would have to be stored somewhere, perhaps in the GSG.

One sticky detail is removing entries from the cache when the RenderStates are destructed. Perhaps a gsgbase->destruct_render_state() method can assist with that.

This is just my first instinct. Experience shows that in program design, though, first instincts are often horribly inappropriate. Especially since I have only an imperfect understanding of all of the issues involved in the shader generator.

David