Motion Blur Shader Problem

Just getting started in Panda3D, and I’m trying to understand shaders now. I have a lot of programming experience, but I am very new to this particular engine and to the shader language.

I’ve got a problem. The Motion Trails sample that comes with Panda3D just doesn't do what I want. I really wish someone would change that darn “Motion Blur” name for the effect; it isn't motion blur in any way. So to implement fullscreen “true” motion blur, I decided to write a shader for it. This is my first shader, so I’m very new to this (and I understand how difficult motion blur tends to be), but maybe it's just a simple mistake. This is the shader code:

//Cg

void vshader(float4 vtx_position : POSITION, 
             float2 vtx_texcoord0 : TEXCOORD0,
             out float4 l_position : POSITION,
             out float2 l_texcoord0 : TEXCOORD0,
             uniform float4 texpad_src,
             uniform float4x4 mat_modelproj)
{
  l_position=mul(mat_modelproj, vtx_position);
  //l_texcoord0=vtx_texcoord0;
  l_texcoord0 = vtx_position.xz * texpad_src.xy + texpad_src.xy;
}


void fshader(float2 l_texcoord0 : TEXCOORD0,
             out float4 o_color : COLOR,
             uniform sampler2D k_src : TEXUNIT0,
             uniform sampler2D k_depth : TEXUNIT1,
             uniform float4x4 inv_projection,
             uniform float4x4 k_prevMatProjection,
             uniform float4 k_param1)
{
  float motionSamples = k_param1.x;

  // Get the depth value at this pixel
  float zOverW = tex2D(k_depth, l_texcoord0).x;

  // H is the viewport position at this pixel in the range -1 to 1
  float4 H = float4(l_texcoord0.x * 2 - 1, (1 - l_texcoord0.y) * 2 - 1, zOverW, 1);

  // Transform by the view-projection inverse
  float4 D = mul(H, inv_projection);

  // Divide by w to get the world position
  float4 worldPos = D / D.w;

  // Current viewport position
  float4 currentPos = H;

  // Use the world position, and transform by the previous view-projection matrix
  float4 previousPos = mul(worldPos, k_prevMatProjection);
  previousPos /= previousPos.w;

  // Compute pixel velocity (only x and y are needed)
  float2 velocity = (currentPos.xy - previousPos.xy) / 2.f;

  // Get initial color at pixel
  float4 color = tex2D(k_src, l_texcoord0);

  float2 texCoord = l_texcoord0;
  texCoord += velocity;

  for (int i = 1; i < motionSamples; ++i, texCoord += velocity)
  {
    float4 currentColor = tex2D(k_src, texCoord);
    color += currentColor;
  }

  o_color = color / motionSamples;
}
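For anyone checking the math, here is a CPU-side sketch in Python of the velocity part of the fragment shader above (from the depth value to velocity), with no texture reads. It assumes both matrices are the scene camera's view-projection (the inverse for the current frame, the plain one for the previous frame) and uses the same row-vector mul(vector, matrix) order as the posted code; the function names are hypothetical. If the camera did not move between frames, every pixel's velocity should come out as zero.

```python
# CPU-side mirror of the velocity math in the posted fshader (a sketch).
# Matrices are 4x4 lists of rows; vectors are 4-element lists.

def mul_row(vec, mat):
    # Row-vector convention, like Cg's mul(float4 v, float4x4 M):
    # result[j] = sum over i of vec[i] * mat[i][j]
    return [sum(vec[i] * mat[i][j] for i in range(4)) for j in range(4)]


def identity():
    return [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]


def translation_row(tx, ty, tz):
    # With row vectors, the translation lives in the last row.
    m = identity()
    m[3][0], m[3][1], m[3][2] = tx, ty, tz
    return m


def pixel_velocity(u, v, z_over_w, inv_view_proj, prev_view_proj):
    # Texture coordinates (0..1) to viewport position (-1..1),
    # including the same (1 - v) flip as the posted shader.
    h = [u * 2 - 1, (1 - v) * 2 - 1, z_over_w, 1.0]
    # Undo the current frame's transform, then divide by w.
    d = mul_row(h, inv_view_proj)
    world = [c / d[3] for c in d]
    # Re-project with the previous frame's transform.
    prev = mul_row(world, prev_view_proj)
    prev = [c / prev[3] for c in prev]
    # The viewport spans 2 units but texture space spans 1, hence / 2.
    return ((h[0] - prev[0]) / 2, (h[1] - prev[1]) / 2)


# Camera unchanged between frames: no blur at this pixel.
print(pixel_velocity(0.5, 0.5, 0.5, identity(), identity()))  # (0.0, 0.0)
# In the previous frame this point sat 0.2 viewport units further right.
print(pixel_velocity(0.5, 0.5, 0.5, identity(), translation_row(0.2, 0.0, 0.0)))  # (-0.1, 0.0)
```

The velocity is in texture-coordinate units, which is why the shader can add it straight to texCoord at each sample.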

I think it should work now, but when I try to run the program, I get the following error:

:gobj(error): Cg program too complex for driver: motionblur.sha. Try choosing a different profile.

I do have “advanced shaders” enabled in the configuration, so I know that isn't the problem.

Just some information about my hardware:

  • Windows 7
  • Panda 1.7 (stable) and the latest Panda 1.8.0; it fails on both
  • AMD Mobility Radeon HD 5400 Series (not the best graphics card on the market, I know, but I have a multi-gfx arrangement and I run plenty of games just fine at advanced settings at a relatively stable framerate: Mass Effect 2, Skyrim, Portal 2… and so on)

Does anybody know which part of the shader program is “too complex” for the driver to handle? How can I change it to be compatible? Thank you in advance :)

And just as a note, I will accept “it’s not possible” as an answer as long as it is clearly explained why.

Maybe the loop? I vaguely recall that being one of the things some profiles can’t handle. …

That’s exactly right. I went in and made a quick-and-dirty modification to the shader code to remove the “for” loop, and the error was gone.

So now I know the loop is what the profile can’t handle. Are there any workarounds? As far as I can tell, the algorithm needs some sort of looping mechanism there. Do I have to do something to the profile, or do I have to cheat and just copy the code a bunch of times? ;)

I’m so close; it would be a shame if my first shader could never work :(

Try hardcoding the motionSamples value so the loop has a constant bound.
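The direct fix is to write the number into the .sha file itself, for example "for (int i = 1; i < 8; ++i, texCoord += velocity)", and to divide by 8.0 instead of motionSamples. If you still want the application to choose the count, one option is to bake it into the shader text before loading it. Below is a minimal Python sketch of that templating step; BLUR_LOOP_TEMPLATE and blur_loop_source are hypothetical names, and the sketch only builds the string (loading the result works the same way as loading your existing motionblur.sha, which is not shown here).

```python
# Hypothetical template for the sampling part of the fshader. {n} is
# replaced by a literal, so the Cg compiler sees a constant loop bound.
# Literal braces in the Cg code are doubled for str.format().
BLUR_LOOP_TEMPLATE = """\
  float4 color = tex2D(k_src, l_texcoord0);
  float2 texCoord = l_texcoord0 + velocity;
  for (int i = 1; i < {n}; ++i, texCoord += velocity)
  {{
    color += tex2D(k_src, texCoord);
  }}
  o_color = color / {n}.0;
"""


def blur_loop_source(num_samples):
    # A count below 1 would make the loop and the final division meaningless.
    n = int(num_samples)
    if n < 1:
        raise ValueError("num_samples must be at least 1")
    return BLUR_LOOP_TEMPLATE.format(n=n)


print(blur_loop_source(8))
```

Keep the count modest: once unrolled, every iteration is still a separate texture read, and lower-end profiles also cap the total number of instructions a program may use, so a large count can bring back the same "too complex" error.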

Yeah, with the Cg profiles available on AMD cards, the compiler has to unroll every loop, and it can't do that when the loop count isn't known at compile time (here it comes from the k_param1 uniform).
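Unrolling just means replacing the loop with one copy of its body per iteration, which is also the "copy the code a bunch of times" workaround asked about above. If a compiler or profile still refuses the loop, the copies can be generated instead of pasted by hand. Here is another small Python sketch (unrolled_blur_source is a hypothetical name); it produces the same samples as the posted loop, at l_texcoord0 + k * velocity for k = 0 to N-1.

```python
def unrolled_blur_source(num_samples):
    """Emit the sampling loop as straight-line Cg: one tex2D per sample."""
    n = int(num_samples)
    if n < 1:
        raise ValueError("num_samples must be at least 1")
    lines = [
        "float4 color = tex2D(k_src, l_texcoord0);",  # sample k = 0
        "float2 texCoord = l_texcoord0;",
    ]
    # Samples k = 1 .. n-1, each one velocity step further along.
    for _ in range(1, n):
        lines.append("texCoord += velocity;")
        lines.append("color += tex2D(k_src, texCoord);")
    lines.append("o_color = color / %d.0;" % n)
    return "\n".join(lines)


print(unrolled_blur_source(3))
```

The generated lines replace everything from "Get initial color at pixel" down to the o_color assignment; velocity still has to be computed above them as before.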