Hi all,
As you may know, I’ve been working on a new postprocessing filter framework for Panda 1.9. (Forum thread: “CommonFilters - some new filters, and the future”.)
I’m thinking that maybe some time after 1.9, Panda could use a depth-of-field filter. It seems that the best technique currently available for semi-fast, realistic depth of field is the one by Pixar, originally meant for film editing previews: http://graphics.pixar.com/library/DepthOfField/paper.pdf
The paper does not give a publication date, but the most recent works it cites are from 2005, so it was likely published some time in 2006 or 2007. The computational power of GPUs has increased significantly since then, so I think it is worth investigating whether the algorithm is now fast enough for realtime use on current hardware.
The algorithm is based on the diffusion equation. This makes sense: blur can be thought of as a diffusion process (usually an isotropic one), and using a diffusion equation with variable coefficients makes variable-width blur easy, provided one knows how to numerically solve partial differential equations (PDEs).
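For reference, the underlying equation is (as I understand the paper) the standard heat equation with a spatially varying diffusion coefficient, with the sharp image as the initial condition and the coefficient beta derived from the circle of confusion:

[code]
du/dt = d/dx( beta(x,y) * du/dx ) + d/dy( beta(x,y) * du/dy )
u(x,y,0) = sharp source image
[/code]

Running the diffusion for a fixed “time” then gives the blurred image, with the local amount of blur controlled by beta.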
Of course, to be fast, the PDE solver must run on the GPU and exploit the massive parallelism available there. Keeping the data in GPU memory for the whole calculation is also essential, to avoid unnecessarily consuming PCI-E bandwidth.
GPU parallelization of diffusion solvers, as of early 2015, mainly relies on a technique known in the numerics community as ADI (Alternating Direction Implicit, a classic operator splitting method from the 1950s; see e.g. Wikipedia: http://en.wikipedia.org/wiki/Alternating_direction_implicit_method ), which reduces the problem to tridiagonal linear systems. This is usually combined with some variant of cyclic reduction ( http://people.mpi-inf.mpg.de/~strzodka/papers/public/GoSt11CR.pdf ), a parallel algorithm for solving tridiagonal systems. This is indeed the approach proposed in the Pixar paper.
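Schematically, the splitting looks like this (this is the classic Peaceman-Rachford form; I have not checked which exact variant the paper uses). Dx and Dy are the discretized 1D diffusion operators along each axis, so each half-step is implicit along only one axis, and every row (or column) of pixels yields an independent tridiagonal system:

[code]
(I - dt/2 * Dx) u^(n+1/2) = (I + dt/2 * Dy) u^n          (implicit in x, explicit in y)
(I - dt/2 * Dy) u^(n+1)   = (I + dt/2 * Dx) u^(n+1/2)    (implicit in y, explicit in x)
[/code]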
So that’s the background - now to the point. I would like to try to implement this, but there are some practical issues I could use some help with.
First, partial differential equations tend to require a lot of precision in the intermediate computations. Is it possible to use float textures in Panda? (Even if it turns out they are not needed for this particular application, I’m also planning something else for later that will definitely need them.)
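For concreteness, what I’d like to be able to do is something like this (the format/component-type combination is a guess on my part; that is exactly the part I’d like to confirm):

[code]
from panda3d.core import Texture

tex = Texture("pde-work")
# Guess: 32-bit float RGBA components. Is this combination supported,
# and will buffers created by FilterManager honour it?
tex.setup2dTexture(512, 512, Texture.TFloat, Texture.FRgba32)
[/code]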
Also related to texture formats, are 3D textures supported? (Not needed in this particular filter, but this would be useful for volumetric applications, especially 3D PDEs.)
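Again just to illustrate what I mean (the exact call is a guess):

[code]
from panda3d.core import Texture

voltex = Texture("volume")
# A 64^3 float volume; presumably a single-channel format could be used
# instead of RGBA if one is available.
voltex.setup3dTexture(64, 64, 64, Texture.TFloat, Texture.FRgba32)
[/code]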
Secondly, is it possible to tell Panda to render only a part of a viewport using a particular shader? GPU PDE solvers have commonly been implemented by rendering the fragments on the domain boundary with one shader (which implements the boundary conditions) and the interior of the domain with another (which can typically assume that every point has neighbors in all directions). See e.g. section 38.3.2 in http://http.developer.nvidia.com/GPUGems/gpugems_ch38.html
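In Panda terms, I imagine the simplest way to do this would be to split the fullscreen quad into an interior card plus thin boundary strips, and give each its own shader. A rough sketch only: coordinates assume the render2d-style -1..1 quad space, and px stands for the width of one pixel in those units.

[code]
from panda3d.core import CardMaker

def make_domain_quads(parent, px):
    # Interior card: every fragment here can assume neighbours on all sides.
    cm = CardMaker("interior")
    cm.setFrame(-1 + px, 1 - px, -1 + px, 1 - px)
    interior = parent.attachNewNode(cm.generate())

    # One-pixel-wide strips along the edges get the boundary-condition shader.
    strips = []
    for name, (l, r, b, t) in (
        ("left",   (-1,      -1 + px, -1,      1)),
        ("right",  ( 1 - px,  1,      -1,      1)),
        ("bottom", (-1 + px,  1 - px, -1,     -1 + px)),
        ("top",    (-1 + px,  1 - px,  1 - px, 1)),
    ):
        cm = CardMaker(name)
        cm.setFrame(l, r, b, t)
        strips.append(parent.attachNewNode(cm.generate()))
    return interior, strips

# interior.setShader(interior_shader); each strip gets the boundary shader.
[/code]

But if Panda already has a more direct mechanism for restricting a shader to a sub-rectangle of a viewport (a scissor region, a custom DisplayRegion, or similar), I would rather use that - hence the question.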
And finally, compute shaders seem the ideal choice for implementing the cyclic reduction for the tridiagonal solver. Fortunately, I happened to upgrade my GPU over the holidays (now running a Radeon R9 290), and it supports compute shaders, so now I should be able to play around with them.
At each step, the algorithm needs to write into a render target that is half the size of the step’s input along one axis, while retaining the original resolution along the other axis. The number of steps required depends on the dimensions of the viewport being rendered.
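For illustration, here is roughly how I imagine one forward step would look on the Panda side, based on the compute shader support in 1.9. The shader file name and its inputs are made up for the sketch; the GLSL side would bind fromTex/toTex as images and use a 16x16 local work group size, and a ShowBase instance (base) is assumed to be running.

[code]
from panda3d.core import Texture, Shader, ShaderAttrib, NodePath

def dispatch_reduction_step(src_tex, src_w, src_h):
    # Target is half the width of the input, but keeps the full height.
    dst_w = max(1, src_w // 2)
    dst_tex = Texture("reduced")
    dst_tex.setup2dTexture(dst_w, src_h, Texture.TFloat, Texture.FRgba32)

    # "reduce_step.glsl" is a hypothetical compute shader for one step.
    shader = Shader.loadCompute(Shader.SL_GLSL, "reduce_step.glsl")
    helper = NodePath("cr-step")
    helper.setShader(shader)
    helper.setShaderInput("fromTex", src_tex)
    helper.setShaderInput("toTex", dst_tex)

    # Immediate-mode dispatch; the work group count must cover the target.
    base.graphicsEngine.dispatchCompute(
        ((dst_w + 15) // 16, (src_h + 15) // 16, 1),
        helper.getAttrib(ShaderAttrib), base.win.getGsg())
    return dst_tex, dst_w
[/code]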
Cyclic reduction works by eliminating the odd-numbered unknowns; each step halves the number of remaining unknowns. (This is explained in more detail in the papers linked above.) The reduction continues until only one or two unknowns are left: if one, its value can be read off directly, and if two, the remaining 2x2 system can be solved explicitly. A similar backward pass then computes the final answer, doubling the number of known values at each step until the original size is reached. (Some practical complications arise for non-power-of-two sizes, but I think those are solvable - if nothing else, these particular textures could be padded when they are allocated.)
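To make the forward/backward structure concrete, here is a small CPU reference version of cyclic reduction in plain Python/NumPy, just to illustrate the halving and doubling; on the GPU each pass would instead be one parallel step over a texture:

[code]
import numpy as np

def cyclic_reduction_solve(a, b, c, d):
    """Solve the tridiagonal system a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].
    a[0] and c[-1] are ignored (treated as zero)."""
    a = np.asarray(a, float).copy(); b = np.asarray(b, float).copy()
    c = np.asarray(c, float).copy(); d = np.asarray(d, float).copy()
    n = len(b)
    x = np.zeros(n)

    # Forward reduction: each pass eliminates every second remaining unknown,
    # halving the number of active equations.
    stride = 1
    while stride < n:
        for i in range(2 * stride - 1, n, 2 * stride):
            lo, hi = i - stride, i + stride
            alpha = a[i] / b[lo]
            gamma = c[i] / b[hi] if hi < n else 0.0
            b[i] -= alpha * c[lo] + (gamma * a[hi] if hi < n else 0.0)
            d[i] -= alpha * d[lo] + (gamma * d[hi] if hi < n else 0.0)
            a[i] = -alpha * a[lo]
            c[i] = -gamma * c[hi] if hi < n else 0.0
        stride *= 2

    # Backward substitution: each pass doubles the number of known values.
    while stride >= 1:
        for i in range(stride - 1, n, 2 * stride):
            lo, hi = i - stride, i + stride
            xlo = x[lo] if lo >= 0 else 0.0
            xhi = x[hi] if hi < n else 0.0
            x[i] = (d[i] - a[i] * xlo - c[i] * xhi) / b[i]
        stride //= 2
    return x
[/code]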
So, the question is: any suggestions on what documentation, code examples or similar I should read to figure out how to insert the rendering of this “shrinking” texture sequence at an arbitrary point in the overall render sequence?
I’m asking because at the moment I can’t wrap my head around how compute shaders would interact with Panda’s FilterManager, which already controls the render order of any created postprocessing buffers. I already know how to render postprocessing buffers in the desired order using regular shaders - the new postprocessing framework does exactly that - so the question is, how to mix in some compute shaders.
I.e. I would like a setup where some filters first render using the regular kind of shaders, then this filter runs (using a mix of compute shaders and regular shaders), and then other filters (again regular shaders) continue from there. The important thing is to be able to invoke the compute shaders at an arbitrary step of the overall render sequence.
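To make the desired ordering concrete, a rough sketch. The FilterManager calls are the pattern the new framework already uses; run_cyclic_reduction is a hypothetical wrapper around the dispatchCompute() chain sketched above, and the open question is how to hook it in so that it actually runs at that exact point in the frame:

[code]
from direct.filter.FilterManager import FilterManager
from panda3d.core import Texture

manager = FilterManager(base.win, base.cam)
scene_tex = Texture("scene")
final_quad = manager.renderSceneInto(colortex=scene_tex)

# ...some regular filter passes, created with manager.renderQuadInto()...

# <-- here: the compute passes (e.g. run_cyclic_reduction(...)) should run,
#     reading the output of the passes above and producing a texture that
#     the passes below - and ultimately final_quad - can sample.

# ...more regular filter passes, then final_quad.setShader(...) as usual...
[/code]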
If anyone knows, input would be appreciated!