Cartoon shader improvements

Screenshots - effect of supersampling

This post concludes the status update.

All screenshots in this post taken using the new version. Depth buffer is used as an auxiliary data source.

The depth buffer is also used to slightly modulare the separation parameter (changing the thickness of the line depending on distance from camera), but I might still tweak this. Due to the nature of the algorithm, separation does not really control line thickness: instead, it controls the radius of the edge detection stencil.

Up to one pixel of separation, these are effectively the same thing, but above one pixel, it also displaces the outline from the pixel containing the edge, because the edge is detected from further away. An object that is thinner than the separation value may get several non-overlapping “ghost” outlines. (This is a consequence of the inking algorithm, and was there already in 1.8.1.)

The only difference between the pictures in this post is the number of subpixel samples. Look at the top of the head to see the effect clearly.

And now for the reply:

Np :slight_smile:

Sometimes limitations like this can be worked around. For example, many older GPUs (mine included) do not support variable length for loops (the Cg compiler wants to unroll loops, and cannot if the end condition depends on a variable).

If you have a shader generator, and the loop’s end condition uses a variable just because it depends on a configuration parameter (which remains constant while the shader is running), you can make the shader generator hardcode it from its configuration when it writes the shader. If you’re coding your application in Python, Shader.make() (from pandac.PandaModules) comes in useful for compiling shaders generated at runtime. Look at for usage examples. But of course doing this adds another layer of logic.

Also, keep in mind that error messages from Cg can sometimes be misleading. I encountered the variable length for loop problem when I was trying to figure out why Panda’s SSAO wouldn’t run on my GPU. I was debugging entirely the wrong thing until rdb stepped in, said he’d seen a similar situation before, and that it is likely the problem is in the variable-length loop, not the array it is indexing into (although the error message indicated that the problem should have been in array indexing).

(SSAO is fixed in 1.9.0, using the approach mentioned above.)

Just after I said that, I did some testing this evening and found that the following run at the same speed on my GPU:

if(samples > CUTOFF)
  o_color = lerp(o_color, k_targetcolor, (samples - CUTOFF) / (NUMSAMPLES - CUTOFF));

vs. the branch-free alternative

float f = step(CUTOFF, samples);
o_color = (1.0 - f)*o_color
        + f*lerp(o_color, k_targetcolor, (samples - CUTOFF) / (NUMSAMPLES - CUTOFF));

but on the other hand, this was the only if statement in the shader. When I later complicated this to

if(samples1 > CUTOFF)
  o_color = lerp(o_color, k_targetcolor, (samples1 - CUTOFF) / (NUMSAMPLES - CUTOFF));
else if(samples2 > CUTOFF)
  o_color = lerp(o_color, k_targetcolor, (samples2 - CUTOFF) / (NUMSAMPLES - CUTOFF));

vs. the branch-free equivalent

float f1 = step(CUTOFF, samples1);
float f2 = step(CUTOFF, samples2);
o_color = (1.0 - max(f1,f2))*o_color
        + f1*lerp(o_color, k_targetcolor, (samples1 - CUTOFF) / (NUMSAMPLES - CUTOFF))
        + (1.0 - f1)*f2*lerp(o_color, k_targetcolor, (samples2 - CUTOFF) / (NUMSAMPLES - CUTOFF));

the alternatives still ran at the same speed. Of course, this test is hardly conclusive; the texture lookups in the supersampler are probably taking so much time that a single if statement (or two) has a negligible effect on the total time taken by this particular shader. But that’s also a useful piece of information: branching is not always a total performance killer.

I also observed that the Cg compiler, at least as invoked by Panda, seems to optimize the code (which is of course the sensible thing to do - what is not clear a priori is whether there is an optimizer in any given compiler, and if so, what kinds of optimizations it applies).

The optimizer seems pretty advanced - it seems to do some kind of dependency analysis and omit code that does not affect the output. (I was trying to do a rudimentary kind of manual profiling of the shader, disabling parts of it to see what is taking the most time.)

Namely, even if the shader code analyzes both the normal and depth textures, there is absolutely no speed impact if it does not use the result (filling o_color with a constant value). The expected performance hit from the texture lookups appears immediately when the result of the calculation is used in computation of o_color. I disabled the if statement and the lerp, too, using just “samples/NUMSAMPLES” to set the red component of o_color, setting the other components to constant values. The result was the same.

In conclusion, it might be good to test your particular case using both the nested-if and branch-free approaches, if the branch-free version is not too complicated to write.

Regarding the new version: I do think that it’s an improvement over the 1.8.1 version–the reduction in missed edges alone is enough to make it a worthwhile inclusion, methinks.

One thing that I notice: the edges in the new version seem to be a lighter colour than in the 1.8.1 screenshots–is that intentional?

Hmm… Looking at the lines, the jaggedness of the edges does seem reduced. Looking closely, the antialiasing pixels seem a bit light–could they be deepened a bit, to make the antialiasing a little stronger?

I don’t think that this will likely help in my case–as you mention a little further on in your post, there’s a variable-length loop involved. In short, I was experimenting with using a count of sample colours (which are effectively object ids in my implementation, recall) when detecting edges in order to antialias my lines somewhat–the idea being that a point the samples of which are heavily biased towards one colour or another is presumably further from the edge than one that has a nearly even distribution, and can thus be rendered as “partially-inked”, hopefully shading the line a little.

Hmm… Perhaps… I might go back and have another shot at a non-if version (the nested-if version replaced a non-if version that wasn’t working); it will likely be cleaner, at any rate.


Hmm. Sort of yes and no :slight_smile:

The old version inks at 100% strength whenever it inks a pixel, leading to a dark (and jaggy) line.

[i]EDIT: Oops, this observation is mistaken. Looking again at the original 1.8.1 sources, here is the complete code of the inker:

float4 cartoondelta = k_cartoonseparation * texpix_txaux.xwyw;
float4 cartoon_p0 = l_texcoordN + cartoondelta.xyzw;
float4 cartoon_c0 = tex2D(k_txaux, cartoon_p0.xy);
float4 cartoon_p1 = l_texcoordN - cartoondelta.xyzw;
float4 cartoon_c1 = tex2D(k_txaux, cartoon_p1.xy);
float4 cartoon_p2 = l_texcoordN + cartoondelta.wzyx;
float4 cartoon_c2 = tex2D(k_txaux, cartoon_p2.xy);
float4 cartoon_p3 = l_texcoordN - cartoondelta.wzyx;
float4 cartoon_c3 = tex2D(k_txaux, cartoon_p3.xy);
float4 cartoon_mx = max(cartoon_c0,max(cartoon_c1,max(cartoon_c2,cartoon_c3)));
float4 cartoon_mn = min(cartoon_c0,min(cartoon_c1,min(cartoon_c2,cartoon_c3)));
float cartoon_thresh = saturate(dot(cartoon_mx - cartoon_mn, float4(3,3,0,0)) - 0.5);
o_color = lerp(o_color, k_cartooncolor, cartoon_thresh);

This code is pasted by the generator into the fshader if inking is enabled.

Some notes. Here l_texcoordN is initialized based on vtx_position.xzxz (in the vshader), so it contains the same coordinates twice. The parameter k_cartoonseparation is a float4 with the first and third components nonzero. Note the .xwyw applied to texpix_txaux. Finally, observe that in the computation of each cartoon_c*, the last two components of the corresponding cartoon_p* are discarded (accessing it by .xy). Combining these observations, the .wzyx is a trick that, given this setup, allows using the same delta variable to offset in the y direction (used as .xyzw it offsets in the x direction).

As for the colouring, observe the last two lines of code. The saturate() clamps between 0 and 1, but does not touch values already in that range, leading to a sort of shading as the value of the expression varies. However, as can be evidenced by changing the relevant line to

float cartoon_thresh = step(0.5, dot(cartoon_mx - cartoon_mn, float4(3,3,0,0)));

this is actually the culprit behind the “reversed smoothing” in the side bangs in the original vanilla 1.8.1 screenshot that I posted earlier. The normals don’t always behave in the way this code assumes!

Also, the float4(3,3,0,0) is the reason for some of the missed edges; float4(2,2,2,0) would be better, as the normal is a 3D vector. But I have to admit that in my modified version, I went for float4(1,1,1,0) and changed the cutoff (0.5) to 0.3. This seems to produce the best results out of the combinations I have tested.

End EDIT.[/i]

The supersampling (subpixel) version uses the number of supersamples that would like to have the pixel inked to control the alpha value. The control is scaled so that the voting threshold corresponds to zero alpha, and when all supersamples agree, the alpha becomes 1.0.

Hence, unless all supersamples agree, the ink pixel will be partially translucent. Thus the line will in general be lighter.

At the same time, it is of course this exact same property of the supersampling version that produces the smoothing.

Yes. It’s just a matter of inserting a suitable mapping function to the alpha control. The difficult part is figuring out what is a good shape for the function :slight_smile:

(Maybe a fractional power such as sqrt, as they are commonly used to boost the low end when values are in the range 0…1.)

But before experimenting with that, how about this version (finished over this morning’s coffee)?

Now the supersampler is off, and instead postprocessing is used.

This is my modified pixel-local version based on your “detect runs” suggestion to line smoothing. It is designed to detect certain patterns of 2 and 3 inked neighbours (so at most it applies a two-pixel “run” of smoothing).

It runs a lot faster than the supersampling version, requiring 12 additional texture lookups in the ink texture, for a total of 4*2 + 12 = 20 texture lookups per pixel. Here the first term comes from the first pass; 2 = number of textures to process (normals, depth); 4 = number of detection points.

Compare to the 9 x supersampling version, which requires 9 * 4 * 2 = 72 lookups.

Also, the smoother requires no parameters.

If both are supersampling and postprocessing are active at the same time, the result looks fuzzy:

The fuzziness comes from smoothing pixels that are only partially inked. The algorithm already factors in the alpha values of the original inked pixels, so it may be that there is simply no need to postproc-smooth in places where there are already translucent pixels. Some additional logic could probably fix this (e.g. by switching off smoothing locally below a critical alpha), but I’m not sure if there is a point in doing that.

So it seems we have two options to smooth the lines - either supersampling (costly, but can produce thin lines; works down to separation = 0.5) or postprocessing (relatively cheap, but lines always look thicker; starts breaking down if separation < 0.7).

I’m rather tempted to support both, making the cheap postproc the default.

I see. Well, too bad :slight_smile:

By the way, one more link - if you haven’t already read the Cg documentation, it contains some useful information. There’s of course API documentation on the stdlib functions, but also the listing of profiles is useful.

For example, many shaders in Panda use the arb_fp1 profile. The documentation for this profile says that variable-length loops are not supported, because the profile requires all loops to be unrolled by the compiler. Maybe I should have read that first a year ago :stuck_out_tongue:

One more observation.

If the object being rendered is far away from the camera (which may often be the case in e.g. 3rd person games), then supersampling gives visibly better quality than the cheap postprocessor. Here’s a screenshot at 1:1 resolution, with the camera placed at (0,-150,0) instead of its original position at (0,-50,0) (used in the previous screenshots):

Left: vanilla 1.8.1
Center: new inker, using postprocessing
Right: new inker, using 9 x supersampling

Ooh, that does look better, I do believe! :slight_smile:

One thing that I’ll say against it is that it doesn’t seem to handle the eyes–the pupils in particular–quite as well as did the super-sampled version, I feel. I’m not quite sure of what’s going on there, but the post-processed version leaves that little white section somewhat square, and there seems to be a line being generated at the top of the pupil that the supersampler is perhaps handling a little better–even if it’s just by virtue of making it harder to spot.

I… Actually prefer the centre image–the post-processed version–I believe. While I can see that one might want the thinner lines of the right-hand image, the thick lines of the centre version given the result a nicely cartoony feel to my eye.

Thanks :slight_smile:

I think it’s because the postprocessing version is basically taking a slightly improved version of the vanilla render (i.e. the version generated by the new inker without supersampling), and then inking additional pixels on top of that.

Without supersampling, the edge detection is not as accurate, so the input to the postprocessor is not very good. The postprocessing still removes the “jags”, but due to the inaccurate detection of the more difficult edges, it will cause some areas to fill where they shouldn’t.

The eyes of the dragon model are especially problematic. I think these edges can be detected reliably only by looking for material discontinuities; there is not enough variation in the normals or in the depth. (The same applies to the side bangs in my own test model.)

To fix this, it might be possible to use the supersampler also in the postprocessing-based version, in order to get more accurate edge detection. As my test shows, this obviously needs some changes to the logic that decides the colour of the original inked pixels; maybe they all need to be rather dark for the postprocessor to work properly without causing a fuzzy look.

There is also a related issue: this type of postprocessing cannot vary the thickness of the line based on the distance of the object from the camera. It will always cause the line to look approximately two pixels thick. Thus, as the dragon gets further away from the camera, a larger relative proportion of the white area in the eyes will be filled with ink.

This implies also that when the camera (or the character) moves in the depth direction, the change in relative line thickness (w.r.t. the size of the character on screen) becomes very noticeable. If this is the intent, then that’s fine - but I find I personally prefer a version that tries to keep the relative line thickness approximately constant for moderate to far zoom levels.

One further idea: it would be possible to extend the line detection by one more pixel in a rather simple manner, by introducing another 12 texture lookups. The key observation is that if a line steps onto this row/column N pixels away (along either coordinate axis), the other N-1 pixels belonging to the line must be on a neighbouring row/column. For N >= 2, three new pixels are needed per cardinal direction, leading to a total of 12. But I’m not sure if the extra cost is justified - two-pixel fades to smooth out the “jags” already seem to work pretty well.

Ah! Thanks for the input! Yes, it’s the post-processed version.

I’m aiming for an anime look, and I think the supersampling version approximates that better, especially when a character covers only a small part of the screen (as is common in platformers, strategy games, …).

But it is good to have the option for a different kind of cartoon look, too - especially in a general-purpose library such as Panda.

All the more reason to support both :slight_smile:

Aah, that is a pity. :confused:

Aah, fair enough–that does make sense: while I’m not sufficiently familiar with anime to comment on that, I feel, I do see that thick lines would likely be a bit of a problem in cases in which objects typically cover only a small part of the screen.

Indeed. :slight_smile:

The reason is of course that the line is already at least one pixel thick and is completely black. The postprocessing spreads more ink (into previously non-inked pixels) to smooth out “jags”. Hence, a pixel-aligned line in a 45 degree angle will look approximately two pixels thick. Lines nearer to horizontal or vertical directions will look between one and two pixels thick.

The supersampling is able to represent lines of varying thicknesses, because it can compute the fraction of the pixel covered by the detected edge.

Though I have to admit that when I paused to think about it in more detail, I don’t fully understand why the supersampling works so well.

After all, this is a postprocess filter working on fullscreen textures, which gets a 1x resolution normal map as its input, so the input contains no actual subpixel data. When asked for normals at some fractional location that is not a pixel center, the GPU just bilinearly interpolates between the normals captured at the nearest pixel centers (i.e. the values in the aux texture).

It is clear that in areas where a quantity is continuous, its (bi)linear interpolant is often a pretty good approximation, when the set of actual source data points is dense enough compared to the space rate at which the quantity changes.

[i](Arguably, though, in the case of normals, the most accurate interpolation is spherical linear interpolation (a.k.a. slerp, quaternion rotation) instead of the regular kind, because the normal represents a direction. In this case, regular linear interpolation can be seen as an approximation, which works reasonably well only if the change in the normal over a pixel is small enough. Using regular linear interpolation between two unit direction vectors, the interpolated vector won’t even be of unit length, because (as the convex combination parameter varies from 0…1) the interpolated vector’s tip moves in a straight line, instead of following a great circle on the unit sphere.

It is, however, a non-trivial question how the kind of interpolation used affects the edge detector. It may happen that the current edge detection algorithm works better with the regular kind of interpolation, although it is the “wrong” kind.

So this is primarily a theoretical aside one should be aware of; since the inker already works, and regular lerp is supported in GPU hardware, there is really no reason to switch the interpolation of normals to slerp.)[/i]

The next thing to observe is that linear interpolation without any special handling for jumps always produces continuous interpolants. Discontinuities are automatically eliminated.

Furthermore, to be mathematically accurate, it should be pointed out that because the input is a texture - i.e. discrete sampled data that is only defined at the grid points - the input itself contains no information whatsoever about what happens between the sampled points (strictly speaking, it doesn’t even claim anything exists there). Linear interpolation is just a particularly convenient choice for the operator that is used to “promote” (in the computer programming sense of the word) the data from discrete sampled points on a grid into a function of x and y.

Then there is the design of the detector. The edge detector, as it is currently implemented, basically has an arbitrary threshold for the maximum allowed jump of a “continuous” quantity over one pixel, and if the detected change exceeds that, then it declares a discontinuity (i.e. an edge).

Some of the supersamples clearly read data from both sides of the edge (now understanding “edge” as where a human would declare it to be), while some may read only on one side of the edge (maybe this causes the lighter shade of ink?). The most peculiar category are those that read the linear interpolant from the halfway point between the pixels.

This is the point where intuition stops helping - I suppose if I wanted to look more closely into this, I should write out the equations for a simplified 1D case to figure out what is going on.

I’ll support both - it is good to have flexibility in a general-purpose library.


In this variant, 9 x supersampling is used to determine edge locations. In the first pass, all pixels passing the voting test are inked fully black, and in the second pass the postprocessor is applied to do the smoothing.

Here are the results. Again 1:1 size to show artifacts:

Comparison with the other variants (object far from camera):

EDIT: This test didn’t really look any better than postprocessing without supersampling, so this hybrid mode is NOT included in the updated patch.

Patch updated and posted to the bug tracker.

This version is hereby submitted for code review and consideration for 1.9.0, replacing the previous versions of the patch.

Ah, good, and well done overall. :slight_smile:

Thanks :slight_smile:

And thanks for the input! I feel that this discussion enabled me to improve the inking quality somewhat.

Now the only thing to do at this point is to wait for rdb’s comments :slight_smile:

I figured out a better antialiased inking algorithm, which is way faster and I think produces better results:

Updated patch coming soon.

Ooh, that does look good!

If I may ask, how does this one work?

I do think that there are a few areas in which the previous post-process version worked better (the side of the nose, for example), but overall that looks like a very good version, and an improvement over the old supersample version at the least.

Thanks :slight_smile:

Magic! :stuck_out_tongue:

Seriously, though, an idea suddenly struck me, leading to a new detection algorithm. Instead of finding min and max values of the quantity of interest in a cardinal-directions stencil (as in 1.8.1 and all the previous improvement attempts), in this one we compare each pixel in the stencil individually to the pixel in the center. Thus, we use more information from the tested pixels.

For each pixel in the stencil, the comparison result is thresholded. The comparison and thresholding for normals is

float3 diff = (cartoon_caux - cartoon_caux0).xyz;
float vote_aux = step(k_cutoff.x, dot(diff,diff));

where cartoon_caux contains the normals data from the pixel being tested, and cartoon_caux0 the corresponding data at the center pixel. Since all the normals in the aux texture are normalized to the same magnitude, the test is a nonlinear measure of the difference in direction.

For depth, similarly,

float diff2 = cartoon_cdepth - cartoon_cdepth0;
float vote_depth = step(0.0001, diff2*diff2);

As the variable naming suggests, both the normal map and depth detectors vote whether to ink, and these two votes are OR’d together using max() (following the observation that false negatives in the edge detector are more common than false positives).

The resulting yes/no vote is then weighted by the pixel’s euclidean distance from the stencil’s center and summed to a counter. I observed that some form of distance-based weighting of the votes is critical to avoid a blurry look.

Once votes from all the pixels in the stencil are in, the final result is thresholded (to avoid noise, from when only one pixel in the stencil triggers). The range above the threshold is mapped to [0,1], and then processed through a nonlinear remapping function that emphasizes the low end (i.e. makes ink darker than it would be by linear mapping, if only a few pixels trigger).

The upside is that not only does this render reasonably well, but it’s also fast. If the depth detector is off, it’s almost as fast on my machine as the old 1.8.1 inker was. The depth detector does slow it down, but it’s still reasonably fast - unlike the subpixel version.

(With the default stencil size and the depth detector enabled, this needs 24 texture lookups per pixel, whereas the subpixel version needed 72. The number can be reduced to 16 with a slight effect on quality.)

It’s also mathematically correct - there is no need to interpolate normals, as this algorithm samples only at pixel centers.


I’m thinking of making the new algorithm the default, but also providing the previous two as optional.

(The supersample version is perhaps not very useful, but it is used internally to render non-antialiased lines (using num_samples = 1) for input to the postprocessor, and with the same settings, also works as a mostly-backward-compatible version for people who want only the bugfixes and some new options, but no antialiasing. Dropping the for loops from that code would not simplify the overall result much :slight_smile: )

Updated patch posted to bug tracker.

This is now the final version of this patch for 1.9.0, except for possible changes required after code review.

Sorry that it’s taken me so long to respond–I started an entry for a one-week game development competition on Monday, and have been kept rather busy! However, thank you very much for the explanation of your antialiasing algorithm–it seems rather good. :slight_smile:

One week for developing a game sounds awfully short :slight_smile:

Yeah, it seems this algorithm does the job. Looking forward to getting it integrated into 1.9.0.

I have some stuff to take care of in CommonFilters first, so that we can also get ninth’s lens flare in (and some simple filters of my own: CommonFilters - some new filters, and the future).

It is, believe me, it is! XD;