PStats: Flip begin

Hello, I’m analyzing the stats of my game, and about 10 ms per frame go to the beginning of the flip. But I don’t know exactly what this is, or what might be causing the delay. Any ideas? Can I remove this performance problem? Thank you!

“flip” is time spent waiting for the graphics card to be ready to draw the next frame. In most simple scenes, a high “flip” time means only that you have sync-video enabled, and so your graphics card is waiting for the next 60hz video sync before it starts to draw.

This is a setting in your Config.prc file. For best quality rendering, you want to keep sync-video enabled. For performance analysis, you should disable it.
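
For reference, the relevant line in Config.prc is just the sync-video variable (0 and #f are equivalent ways of saying false):

# in Config.prc: disable video sync while profiling
sync-video #f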

David

Thank you David; the problem is that (I believe) I’ve already disabled vsync. In the main file I have:

loadPrcFileData( '', 'sync-video 0' )

And, before this game section (where I spend 25 ms per frame), the framerate is about 120-150 fps (during the menu section), so sync-video should be disabled (am I wrong?).
And another doubt: even if the problem is sync-video, is 10 ms a “reasonable” time for a game where I already spend 15 ms on other stuff?
Thank you very much!

10 ms is certainly a reasonable time for video sync; your frame time rounds up to the next multiple of the video sync period, so your frame rate becomes the next integer fraction of the sync rate (usually 60hz): 60fps, 30fps, 20fps, 15fps, and so on. For example, a frame that needs 20 ms of rendering gets held until the second sync at about 33 ms, giving 30fps.

But if you’ve disabled video sync, then it means you’re waiting for the card to finish drawing. This is usually due to too much depth complexity or too-complex shaders. You can try reducing the complexity of your shaders, or if you have a lot of overdraw you can try sorting the scene from front to back with the CullBinManager.
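
If you want to experiment with the front-to-back idea, a minimal sketch looks something like this (the bin name, the sort value 25, and the myGeometry NodePath are just placeholders for this example):

from panda3d.core import CullBinManager

# Register a bin whose contents are sorted front to back, so near geometry
# fills the depth buffer before far geometry is rasterized behind it.
CullBinManager.getGlobalPtr().addBin( 'frontToBack', CullBinManager.BTFrontToBack, 25 )

# Assign the heavy opaque geometry to that bin.
myGeometry.setBin( 'frontToBack', 0 )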

David

Thank you again; I have a small example which produces a similar phenomenon. Here I can’t apply your hints, because I have only one mesh and I’m using the built-in shader generator.

from panda3d.core import loadPrcFileData
loadPrcFileData( '', 'want-pstats 1' )
loadPrcFileData( '', 'show-frame-rate-meter 1' )
loadPrcFileData( '', 'sync-video 0' )
loadPrcFileData( '', 'win-size 1920 1080' )
from pandac.PandaModules import CardMaker, Spotlight
import direct.directbase.DirectStart
base.disableMouse()
camera.setPosHpr( ( 0, -2, 2 ), ( 0, -48.5, 0 ) )
render.setShaderAuto()  # enable the shader generator (per-pixel lighting)
def setupFloor():
  cm = CardMaker( '' )
  cm.setFrame( -1.2, 1.2, -1, 1 )
  render.attachNewNode( cm.generate() ).setP( -90 )
def setupLight():
  light = render.attachNewNode( Spotlight( 'Spot' ) )
  light.setPosHpr( ( 0, -20, 20 ), ( 0, -48.5, 0 ) )
  light.node().setShadowCaster( True, 1024, 1024 )  # 1024x1024 shadow map
  render.setLight( light )
setupFloor()
setupLight()
run()

As you can see, almost all the time (6.7 ms of 8.8 ms) is consumed in the Flip::Begin section (the sections ‘Draw’, ‘Cull’ and ‘App’ are almost zero).

Is this situation normal (a single quad with a shader consuming almost 9 ms)? Can I improve this situation (i.e. am I doing something wrong, and if so, how)? Thank you for your patience, this is my first attempt at this kind of analysis… :blush:

Just for the record, I think that only applies to CRT monitors.

It’s not impossible. What kind of graphics card do you have? I’ve seen some cards with very poor shader performance. Many Intel cards are like this; also the NVidia GeForce 7600 gave miserable performance.

Does the performance improve if you make the window smaller, or stop using shadows?
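
If it helps, you can test both at runtime with something like this (a rough sketch; “light” here is the spotlight NodePath from your setupLight function):

from pandac.PandaModules import WindowProperties

# Shrink the window to see whether the cost scales with pixel count.
props = WindowProperties()
props.setSize( 1024, 768 )
base.win.requestProperties( props )

# Turn off shadow-map rendering on the spotlight.
light.node().setShadowCaster( False )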

CRT monitors typically have a refresh rate of 72hz, though 60hz is seen occasionally. For LCD monitors, 60hz seems far more common. Aside from the preferred refresh rate, video sync works the same way for either monitor type; it’s a property of the video card and its output signal, not really a property of the display technology (though it’s true the analog video signal was originally designed with CRT displays in mind).

David

It’s an NVidia GeForce 8600.

With a resolution of 1024x768 I get 4.7 ms (2.5 ms of which are Flip::Begin).

Commenting out the line:

#light.node().setShadowCaster( True, 1024, 1024 )

I get 5.7 ms (4.0 ms of which are Flip::Begin).

Thanks!

I’m sorry, my implicit questions were a bit too implicit… :blush:

Is 4.7 ms a “normal” value for rendering a quad at 1024x768?

Is 5.7 ms a “normal” value for rendering a quad without shadows?

Thank you!

I don’t know; it does sound like a lot, but it is also a full-screen quad, and it does have a spotlight applied, which can be expensive too. (This cost is per-pixel, not per-vertex, so the fact that it is just one quad doesn’t mean much.)

I bet if you applied only a directional light, or no light at all, it would be much faster. I also bet it would be faster if you disabled the auto-shader. The auto-shader by default enables per-pixel lighting, which can be expensive, particularly with large pixel areas and complex lighting like a spotlight with shadows.
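
As a quick sketch of what I mean (using only standard classes, plus DirectionalLight from the same module your snippet imports from), you could try either of these variants and compare the Flip::Begin time:

from pandac.PandaModules import DirectionalLight

# Variant 1: simply omit render.setShaderAuto() to fall back to
# fixed-function, per-vertex lighting.

# Variant 2: keep the auto-shader, but use a cheaper light with no shadows.
dlight = render.attachNewNode( DirectionalLight( 'dir' ) )
dlight.setHpr( 0, -48.5, 0 )
render.setLight( dlight )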

David

OK, but I would like to keep the spotlight because of the shadows, and I would like the per-pixel lighting, so I don’t want to remove these features altogether.

But I could disable some of these on old hardware. Is there any value that I can query in order to decide whether I should disable shadows and/or the auto-shader? Should I monitor the framerate, or are there other ways (like querying specific hardware features whose presence implies poor shader performance)? Thank you!

Unfortunately, there is no reliable way to determine a priori whether your hardware will perform well, short of a large database of graphics cards with known performance characteristics. You could render a few frames and try to measure the actual performance, though this is also risky (you might be misled by momentary system hiccups or something).
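
A rough sketch of that measure-a-few-frames idea (the two-second warm-up, the 30 fps threshold, and the particular calls that reduce quality are all arbitrary choices for this example; “light” is the spotlight NodePath):

def autoDetectQuality( task ):
  # Let the first couple of seconds go by so startup hiccups don't skew the measurement.
  if task.time < 2.0:
    return task.cont
  # If the averaged frame rate is still low, drop the expensive features.
  if globalClock.getAverageFrameRate() < 30.0:
    light.node().setShadowCaster( False )
    render.setShaderOff()
  return task.done

taskMgr.add( autoDetectQuality, 'autoDetectQuality' )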

On the other hand, 4.7 ms, or even 10 ms, is not really that bad to render a whole frame, and if your scene is well-designed, it may not require much more time to render even as it grows more complex than a single quad. Remember, that cost is per-pixel, and ideally you will only draw each pixel onscreen exactly once, which means you will pay exactly the same cost no matter how many vertices you have in your scene.

Of course, it’s a bit unrealistic to say that you will only draw each pixel once, but if you do a good job of designing your scene it may not be much more than once, and perhaps this level of performance is then acceptable.

David

Ok, now I see the problem better.

One problem is that with LCD monitors I can’t use a “low” resolution (while the game is in fullscreen mode) without getting bad results (I get perfect quality at the monitor’s native resolution, but very poor quality at any other resolution). So I have to use the largest resolution, and this leads to the performance problems you pointed out (is this right, or are there ways to get good quality at non-native monitor resolutions?).

I got good results applying your hints about cull bins for the meshes (now I draw foreground meshes before background meshes, and this improves performance), but I can’t get results sorting the last elements of my scenes: the GUI elements.

As I said, my scenes have background and foreground meshes, and these are arranged correctly now. But I also have GUI elements, and I use OnscreenImages for them. This is a small example:

from panda3d.core import loadPrcFileData 
loadPrcFileData( '', 'want-pstats 1' ) 
loadPrcFileData( '', 'show-frame-rate-meter 1' ) 
loadPrcFileData( '', 'sync-video 0' ) 
loadPrcFileData( '', 'win-size 1920 1080' ) 
from pandac.PandaModules import CardMaker, Spotlight 
from direct.gui.OnscreenImage import OnscreenImage
import direct.directbase.DirectStart 
base.disableMouse() 
camera.setPosHpr( ( 0, -2, 2 ), ( 0, -48.5, 0 ) ) 
render.setShaderAuto()

def setupFloor():
  cm = CardMaker( '' ) 
  cm.setFrame( -1.2, 1.2, -1, 1 ) 
  m = render.attachNewNode( cm.generate() )
  m.setP( -90 )

def setupGUI():
  m = OnscreenImage( image = 'img.jpg', scale = ( 2, 0, 1 ) )
  #m.setBin( 'background', 10 )

def setupLight():
  light = render.attachNewNode( Spotlight( 'Spot' ) ) 
  light.setPosHpr( ( 0, -20, 20 ), ( 0, -48.5, 0 ) ) 
  light.node().setShadowCaster( True, 1024, 1024 )
  render.setLight( light )

setupFloor()
setupGUI()
setupLight()
run()

In this example I have a fullscreen image (i.e. it covers the mesh entirely), so I should pay only the cost of the image, and not also the cost of the quad. This example requires 9.7 ms.

If I run with the image only:

#setupFloor()

I get 4.3 ms. So I should tell Panda to draw the image before the quad, but if I uncomment this line in setupGUI:

m.setBin( 'background', 10 )

I still get 9.7 ms (my intent is to draw the image before the quad; since the quad is in the opaque bin, I tried assigning the image to the background bin).

Where is my error? Thank you!

This sounds right, depending on what you mean by “good quality”. I suppose you could render into a (small) offscreen buffer and then draw it onto a large quad that overlaid the entire window; that would use filtering to scale it onto all of your pixels evenly, providing a nice antialiased (but blurry) rendering of any resolution. But then, you would also be paying for the full-screen quad.
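
A minimal sketch of that offscreen-buffer approach (the 960x540 size and the sort value are arbitrary):

# Render the 3-d scene into a smaller offscreen buffer...
buf = base.win.makeTextureBuffer( 'lowres', 960, 540 )
buf.setSort( -100 )              # draw the buffer before the main window
bufCam = base.makeCamera( buf )  # a second camera viewing the same scene

# ...and display that buffer, filtered, on a card that fills the window.
card = buf.getTextureCard()
card.reparentTo( render2d )

# Keep the main camera from drawing the 3-d scene a second time.
base.cam.node().setActive( False )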

Does your card really render a single full-screen quad so slowly, even when the shader is not enabled? I am surprised that your GUI has a measurable impact.

Using the GUI to obscure the 3-d rendering is a good idea, but challenging. The problem is that the depth buffer is not in the same space, so it is possible that if you do this, some objects in the 3-d space may appear closer than the 2-d objects. I suppose you could use the stencil buffer instead of the depth buffer for more reliable results.

But anyway, the reason your attempt to reorder the 2-d scene with binning isn’t working is that the 2-d scene is drawn in its own DisplayRegion, which is a separate pass that is currently set to draw after the main DisplayRegion. You can change this order with:

base.cam2d.node().getDisplayRegion(0).setSort(-10)

Then you can play with the depth buffer and/or stencil buffer to get the results you want. Note that the default is for the main camera to clear the depth buffer before it starts, so you may want to disable this with:

base.cam.node().getDisplayRegion(0).setClearDepthActive(False)

David

Yes, the fullscreen image consumes 4.3 ms even without the shader.

Luckily, in my application I have no fullscreen GUI; I’m using a fullscreen quad here only to highlight the performance differences between the different versions of the code. There are some smaller impacts, though.

Thank you David! The stencil buffer advice works; now I have better performance!

There’s one last issue with the stencil buffer. I can get it working with opaque images, but when I use transparent images I have no idea how to make it work (I’m talking about images with fully opaque/transparent pixels, not images with semi-transparent pixels). I have this code:

from panda3d.core import loadPrcFileData 
loadPrcFileData( '', 'framebuffer-stencil #t' )
from pandac.PandaModules import *
from direct.gui.OnscreenImage import OnscreenImage
import direct.directbase.DirectStart 

base.disableMouse() 
camera.setPosHpr( ( 0, -2, 2 ), ( 0, -48.5, 0 ) ) 

base.cam2d.node().getDisplayRegion( 0 ).setSort( -10 )
base.cam.node().getDisplayRegion( 0 ).setClearDepthActive( False )

stencilReader = StencilAttrib.make( 1,
  StencilAttrib.SCFNotEqual, StencilAttrib.SOKeep,
  StencilAttrib.SOKeep, StencilAttrib.SOKeep, 1, 1, 0 )
render.setAttrib( stencilReader )

def setupFloor():
  cm = CardMaker( '' ) 
  cm.setFrame( -.8, .8, -1, 1 ) 
  m = render.attachNewNode( cm.generate() )
  m.setP( -90 )  

def setupGUI():
  i = OnscreenImage( image='img.png', pos=(0,0,-.6), scale=(1.2,0,.4) )
  i.setTransparency( TransparencyAttrib.MAlpha )

  cm = CardMaker( '' )
  cm.setFrame( -1, 1, -1, 1 )
  viewMask = render.attachNewNode( cm.generate() )
  viewMask.setPosHprScale( i.getPos(), (0,0,0), i.getScale() )
  viewMask.reparentTo( aspect2d )
  constantOneStencil = StencilAttrib.make( 1, StencilAttrib.SCFAlways,
    StencilAttrib.SOZero, StencilAttrib.SOReplace,
    StencilAttrib.SOReplace, 1, 0, 1 )
  viewMask.node().setAttrib( constantOneStencil )
  viewMask.node().setAttrib( ColorWriteAttrib.make( 0 ) )
  viewMask.setBin( 'background', 0 )
  viewMask.setDepthWrite( 0 )

setupFloor()
setupGUI()
run()

If the image img.png has no transparent pixels this works, but if I use an image with transparent pixels, those pixels still contribute to the stencil buffer. I would like to use transparent images for rounded-corner GUI elements.

For example, I obtain this:

I see that this is the expected behaviour, because this code writes 1 into the stencil buffer for every pixel of the image, but I can’t figure out how to write the right value depending on the transparency of the corresponding pixel of the image.

Is it possible to obtain that, or am I going about it the wrong way? Thanks a lot!

Hmm, if you apply TransparencyAttrib.MBinary instead of MAlpha, it uses the AlphaTestAttrib to only draw pixels where alpha is > 0.5. Maybe this will also avoid drawing to the stencil buffer?

David

Yes, following your suggestion I tried using another copy of the image (is it correct to use one copy for the drawn image and another for the stencil buffer, as in the code below?), and I obtained the expected result! So now I have:

def setupGUI():
  i = OnscreenImage( image='img.png', pos=(0,0,-.6), scale=(1.2,0,.4) )
  i.setTransparency( TransparencyAttrib.MAlpha )
  viewMask = OnscreenImage( image='img.png', pos=(0,0,-.6), scale=(1.2,0,.4) )
  viewMask.setTransparency( TransparencyAttrib.MBinary )
  constantOneStencil = StencilAttrib.make( 1, StencilAttrib.SCFAlways,
    StencilAttrib.SOZero, StencilAttrib.SOReplace,
    StencilAttrib.SOReplace, 1, 0, 1 )
  viewMask.node().setAttrib( constantOneStencil )
  viewMask.node().setAttrib( ColorWriteAttrib.make( 0 ) )
  viewMask.setBin( 'background', 0 )
  viewMask.setDepthWrite( 0 )

and it works! Thank you!!!