Draw Performance Tweaking

I’m doing some work on some ATI cards, and it seems that many things that run well on nVidia cards don’t run as well on ATI. Running PStats reveals that “clear” takes as much as half of the time spent on the main buffer’s drawing instructions.

I’m not really sure what clear is, but I’m assuming it’s the depth buffer and color buffer clear. Why is that taking so long? And can I turn off clearing? The scene is set up such that no background is ever shown, so turning off the clear should make no visible difference. Any thoughts?

You can turn off clear. By default, Panda enables a depth and color clear on the overall GraphicsWindow, and a depth clear on render2d’s DisplayRegion.

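You can turn all three of these off with:

# Stop clearing the main window's depth and color buffers each frame.
base.win.setClearDepthActive(0)
base.win.setClearColorActive(0)
# Stop the depth clear on render2d's DisplayRegion as well.
base.cam2d.node().getDisplayRegion(0).setClearDepthActive(0)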

And, if you happen to be using a very recent version of Panda (newer than 1.2.3), you may also need to do:

base.win.setClearStencilActive(0)

However, it may well be that clear is not your actual culprit. Graphics drivers are funny things, and it may be that your driver is doing some initial setup at the beginning of the frame (or maybe finishing up the drawing from last frame) that it for some reason decides to do when it receives a clear command. If you turn off the clear, it’s possible that it may simply shift this burden to some other part of the frame.

Then again, it might actually be the honest time to clear the frame. :) If it is, then this time should be linearly proportional to the number of pixels in the window. Try reducing the window by 50% in each dimension–does the clear time go down by a factor of 4? If so, then yes, you’ve got a driver that’s slow to clear pixels for some reason. (It might be that playing with the color depth of your framebuffer will help this. If you are using multiple monitors, it may be faster on one of the monitors than on the other.)
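The factor-of-4 check above is plain arithmetic, and a small helper can make the comparison explicit. This is only a sketch: the function names, the 25% tolerance, and the sample timings are made up for illustration and are not part of Panda or PStats. You would read the real clear times off PStats yourself.

```python
def expected_clear_time(baseline_ms, baseline_size, new_size):
    """Predict the clear time, assuming it scales linearly with pixel count."""
    base_w, base_h = baseline_size
    new_w, new_h = new_size
    return baseline_ms * (new_w * new_h) / (base_w * base_h)


def looks_pixel_bound(baseline_ms, measured_ms, baseline_size, new_size,
                      tolerance=0.25):
    """True if the measured time is within a relative tolerance of the
    linear prediction. The tolerance is an arbitrary assumption."""
    predicted = expected_clear_time(baseline_ms, baseline_size, new_size)
    return abs(measured_ms - predicted) <= tolerance * predicted


# Hypothetical readings: 4.0 ms at 800x600, then the window is halved
# in each dimension to 400x300.
print(expected_clear_time(4.0, (800, 600), (400, 300)))      # prints 1.0
print(looks_pixel_bound(4.0, 1.1, (800, 600), (400, 300)))   # prints True
print(looks_pixel_bound(4.0, 3.8, (800, 600), (400, 300)))   # prints False
```

If the measured time barely moves when the window shrinks, as in the last line, the cost is probably not the per-pixel clear itself.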

David

Note that when you turn off depth buffer clear, you have to be clever with your geometry to get it to render properly. If your scene doesn’t require a depth buffer at all, you can do something like:

render.setDepthTest(0)

If you do require a depth buffer, but your scene is completely enclosed inside a convex shell, you can put that shell in a special bin that is drawn first, and turn off depth test on that shell only:

shell.setBin('background', 0)
shell.setDepthTest(0)

This way, when it draws the shell, it will fill in the depth values that will be useful for the rest of the frame.

David

So, now I’m trying to do some performance tweaking for nVidia cards. It seems that flip is taking the majority of the time now. What exactly does flip do? I assume this is the SwapBuffers call for double-buffered windows, but why does it take so long?

I would assume that drawing buffers would be more time consuming, but that does not appear to be the case.

Same story as above. Flip is indeed the SwapBuffers call. There are two reasons it might be slow. (a) First, the graphics card has to wait until it has finished drawing the scene before it can SwapBuffers. Since, in an ideal situation, graphics commands can queue up before they are fully processed, Panda might have run a bit ahead of the graphics card, and “flip” is when you have to wait for the card to catch up. (b) Second, even if the graphics card is fully caught up, you might have video sync enabled (this is the default), which means it has to wait for the next vertical retrace. This will be the next interval of 60Hz, or 72Hz, or whatever your monitor refresh rate is. Normally this wait time will only be significant when your frame rate is very high, of course.
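To get a feel for case (b), here is a rough model of the wait for the next vertical retrace. It is a simplification and not how Panda or any driver measures it: it assumes the frame starts exactly on a retrace and that nothing else delays the swap, and the function name is invented for this example.

```python
import math


def vsync_wait_ms(render_ms, refresh_hz):
    """Time left until the next vertical retrace after rendering for
    render_ms, assuming the frame began exactly on a retrace."""
    interval = 1000.0 / refresh_hz
    # Whole refresh intervals needed to cover the render time (at least one).
    intervals = max(1, math.ceil(render_ms / interval))
    return intervals * interval - render_ms


# A frame that renders in 2 ms on a 60Hz monitor spends most of the
# refresh interval waiting in flip.
wait = vsync_wait_ms(2.0, 60)
print("waiting %.2f ms of a %.2f ms interval" % (wait, 1000.0 / 60))
```

Under this model, a very fast frame spends nearly the whole refresh interval in flip, which is why flip can dominate PStats when the frame rate is high.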

In neither case is there a cause for worry. The graphics card is running as fast as it can. If you’re spending most of your time in flip, it generally means that Panda is sending graphics commands to the card as fast as or faster than it can handle them, which is an enviable position to be in.

To make it faster in case (a), you probably have to reduce the depth complexity of your scene. In case (b), you could turn off video sync, but usually you wouldn’t want to do this (doing so would introduce artifacts and waste CPU, since there’s not much point in rendering faster than the frames can be presented anyway).

David