I’m currently considering using normals as well (thanks for the hint). One big advantage is that you only have to sample a hemisphere instead of the full sphere; in my implementation, 50% of all sphere samples were wasted because they fell on the wrong side of the surface. Still, I’m not sure whether your implementation is faster with or without a normal buffer, because more samples mean more cache misses. According to some NVIDIA papers, random sampling is torture for the GPU’s internal texture cache, so reducing the buffer width and height by a factor of 2 may raise the speed by more than a factor of 4 (2*2).
The most advanced implementation is maybe the one at developer.download.nvidia.com/SD … mples.html, but I have to admit that I don’t fully understand it.
There is nothing special about the blur shader (and I bet that most other implementations are more sophisticated than this one). The horizontal and vertical shaders differ only in the line “float2 st = …”. Exactly one of RESPECT_DEPTH or RESPECT_NORMAL should be enabled; RESPECT_NORMAL is IMO inferior in this implementation.

const float FILTER[7] = { 0.05, 0.1, 0.2, 0.3, 0.2, 0.1, 0.05 };

#define RESPECT_DEPTH
//#define RESPECT_NORMAL

struct BlurVertexIn {
    float4 position : POSITION;
    float2 texcoord : TEXCOORD0;
};

struct BlurVertexOutFragmentIn {
    float4 position : POSITION;
    float2 texcoord : TEXCOORD0;
};

struct BlurFragmentOut {
    float4 color : COLOR;
};

void BlurVertexProgram(in BlurVertexIn i, out BlurVertexOutFragmentIn o) {
    o.position = i.position;
    o.texcoord = i.texcoord;
}

void BlurSSAOHorizontalFragmentProgram(in BlurVertexOutFragmentIn i, out BlurFragmentOut o, uniform sampler2D samplerColor, uniform sampler2D samplerNormalDepth) {
    const float2 scale = float2(1.0 / ViewPortPixelSize.x, 1.0 / ViewPortPixelSize.y);
    float currentColor = tex2D(samplerColor, i.texcoord).r;
#ifdef RESPECT_NORMAL
    float3 currentNormal = tex2D(samplerNormalDepth, i.texcoord).xyz;
#endif
#ifdef RESPECT_DEPTH
    float currentDepth = tex2D(samplerNormalDepth, i.texcoord).w;
#endif
    float color = 0;
    for (int n = 0; n < 7; n++) {
        float2 st = i.texcoord + float2(scale.x * (n - 3), 0.0);
        float sampleColor = tex2D(samplerColor, st).r;
#ifdef RESPECT_NORMAL
        // If the sample's normal deviates too much, keep the center
        // value instead of blurring across the discontinuity.
        float3 sampleNormal = tex2D(samplerNormalDepth, st).xyz;
        if (dot(sampleNormal, currentNormal) < SSAOBlurThreshold) {
            color += currentColor * FILTER[n];
        } else {
            color += sampleColor * FILTER[n];
        }
#endif
#ifdef RESPECT_DEPTH
        // If the sample lies at a significantly different depth, keep the
        // center value instead of blurring across the edge.
        float sampleDepth = tex2D(samplerNormalDepth, st).w;
        if (abs(sampleDepth - currentDepth) > SSAOBlurThreshold) {
            color += currentColor * FILTER[n];
        } else {
            color += sampleColor * FILTER[n];
        }
#endif
    }
    o.color = color;
}

void BlurSSAOVerticalFragmentProgram(in BlurVertexOutFragmentIn i, out BlurFragmentOut o, uniform sampler2D samplerColor, uniform sampler2D samplerNormalDepth) {
    const float2 scale = float2(1.0 / ViewPortPixelSize.x, 1.0 / ViewPortPixelSize.y);
    float currentColor = tex2D(samplerColor, i.texcoord).r;
#ifdef RESPECT_NORMAL
    float3 currentNormal = tex2D(samplerNormalDepth, i.texcoord).xyz;
#endif
#ifdef RESPECT_DEPTH
    float currentDepth = tex2D(samplerNormalDepth, i.texcoord).w;
#endif
    float color = 0;
    for (int n = 0; n < 7; n++) {
        float2 st = i.texcoord + float2(0.0, scale.y * (n - 3));
        float sampleColor = tex2D(samplerColor, st).r;
#ifdef RESPECT_NORMAL
        // Same normal test as in the horizontal pass.
        float3 sampleNormal = tex2D(samplerNormalDepth, st).xyz;
        if (dot(sampleNormal, currentNormal) < SSAOBlurThreshold) {
            color += currentColor * FILTER[n];
        } else {
            color += sampleColor * FILTER[n];
        }
#endif
#ifdef RESPECT_DEPTH
        // Same depth test as in the horizontal pass.
        float sampleDepth = tex2D(samplerNormalDepth, st).w;
        if (abs(sampleDepth - currentDepth) > SSAOBlurThreshold) {
            color += currentColor * FILTER[n];
        } else {
            color += sampleColor * FILTER[n];
        }
#endif
    }
    o.color = color;
}
Playing around with shaders I normally do in FX Composer (IMO it is somewhat faster for testing ideas than Panda3D). The code can’t be copied directly into the FilterManager. I’ll clean up the whole test suite in the next few days and upload everything somewhere. One more note: the screen space normal (xyz) and depth (w) are stored together in one floating point buffer, and the normals are already normalized. If the offscreen buffers contain bytes, a “tex2D(…) * 2.0 - 1.0” is perhaps needed.
More details (with a link to the source): discourse.panda3d.org/viewtopic … 3&start=30.