Time spent in "clear"

(This post originally included a second question; as suggested below, this should now be found here.)

How might I reduce the time taken by what PStats calls “clear”?
I’ve been working on an element for my current project that involves multiple texture stages; at the moment my particular problem is improving the frame-rate achieved on my development computer. Having run it with PStats monitoring, it seems that the majority of the time used is in Draw->Clear. Is there a means by which I might reduce this time?

These are two separate problems. I’ll address your “clear” problem here; I advise you to put your MeshDrawer question in a separate thread and delete it from your post.

“clear” literally means clearing the background of the framebuffer to a certain background colour. Of course, that typically isn’t an expensive operation. The catch is that OpenGL is asynchronous: when a command is sent to the driver, it is not guaranteed to execute right away, so the time PStats attributes to “clear” may really be time spent waiting for earlier work to finish.

If you disable clears (setClearActive(n, 0) for each clear type n, on your window and its display regions), you’ll probably see the same slowdown reappear in other categories.

You can put “gl-finish 1” in your Config.prc file to force the GPU to finish executing all commands after every major graphics operation. This will slow down your performance significantly, but it will make PStats report more accurately which category is consuming the most time.
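If you’d rather not edit Config.prc for every profiling run, the same settings can also be applied from code before ShowBase opens the window. A minimal sketch, assuming you pass in Panda3D’s `loadPrcFileData`; the variable names `gl-finish` and `want-pstats` are real config variables, but the helpers `make_prc` and `apply_prc` are hypothetical names for illustration:

```python
# Hypothetical helpers for switching on profiling-friendly settings.
# The keys are Panda3D config variables; the helper names are not part of Panda3D.
PROFILING_SETTINGS = {
    "want-pstats": "1",  # connect to a running PStats server at startup
    "gl-finish": "1",    # wait for the GPU after each major operation (slow, but honest timings)
}


def make_prc(settings):
    """Turn a {variable: value} dict into PRC text, one 'name value' pair per line."""
    return "\n".join(f"{name} {value}" for name, value in settings.items())


def apply_prc(settings, load_prc_file_data):
    """Load the settings; call this before ShowBase() creates the window."""
    load_prc_file_data("", make_prc(settings))
```

Usage: with `from panda3d.core import loadPrcFileData`, call `apply_prc(PROFILING_SETTINGS, loadPrcFileData)` before constructing `ShowBase()`, and drop `gl-finish` again once you’re done profiling, since it costs frame-rate.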

Ah, fair enough, and thank you - I’ll hopefully try your suggestions soon. :slight_smile:

As to the MeshDrawer question, I included it here simply because it seemed better to me to create one thread than two at the same time; since that seems to not be preferred, I’ll do as you suggest and start a new thread for it, I believe. Thank you for pointing that out. :slight_smile:

All right, I have some further information, I believe.

First of all, please check for me that I’m using “setClearActive” correctly. Here is what I’m doing at the moment, I believe:

```python
base.win.setClearActive(0, 0)

# ... Elsewhere
self.surfaceBuffer = base.win.makeTextureBuffer("surface buffer", 256, 256)
self.surfaceCamera = base.makeCamera2d(self.surfaceBuffer)
# ...
self.surfaceCamera.node().getDisplayRegion(0).getWindow().setClearActive(0, 0)
```

If so, it seems to have no apparent effect, either on my frame-rate or on the distribution of time in PStats - “clear” still seems to be taking up the largest portion of time.

As to “gl-finish”, including the indicated line did seem to reduce my frame-rate, but it only slightly decreased the amount of time spent in “clear” (from about 15ms to about 12ms, I think). Other values did shoot up, however.

Which brings me to something that seems odd to me (and which I now realise was visible in PStats, but which I did not recognise): it seems that base.win has a full four display regions. Is that as it should be? Could it simply be that clearing five display regions (my single offscreen region included) is my problem, or is this the expected behaviour?

Either way, with “gl-finish 1” in my .prc file I now see a significant amount of time in “dr_0” and “dr_1”, which I presume to be the first and second display regions, respectively.

I’ve tried removing the element that I was working on, and thus the off-screen buffer, leaving me with a scene empty save for a single two-quad, two-texture model. With the “gl-finish 1” line in place, there seems to be little effect. Without it, my frame-rate does improve, albeit not to stellar levels. The time spent in “clear” remains at around 11-12ms, it seems.

You’re not actually disabling the clears on the display regions: getDisplayRegion(0).getWindow() returns the GraphicsOutput that owns that display region (here, your surface buffer), not the region itself. Instead, iterate over getDisplayRegions() and disable the clears on each region.

If you’re seeing a lot of time in dr_0 and dr_1 under Draw, then that indicates that the drawing is just very processor intensive. Are dr_0 and dr_1 under Cull equally intensive? What does PStats report as the number of batches being sent to the GPU? What is the output of render.analyze()? Do you have expensive pixel shaders?

Hmm… On reflection, I wonder whether my problem isn’t simply that I have a pretty weak machine, especially in terms of graphics. Perhaps things normally passed on to the GPU are being executed on the CPU in my case? :confused:

I’ll confess that I’m rather hoping that my game will at least run playably on this machine; for one thing, this is my development machine, and I’m not likely to get something faster any time soon.

I changed my clear-code to the following:

```python
for region in self.surfaceCamera.node().getDisplayRegions():
    region.setClearActive(0, 0)
    region.setClearActive(1, 0)
    region.setClearActive(2, 0)
for region in base.win.getDisplayRegions():
    region.setClearActive(0, 0)
    region.setClearActive(1, 0)
    region.setClearActive(2, 0)
```

(I wasn’t sure what to use for the first parameter, so I guessed that disabling the first three of each would be reasonably safe.)
There seems to be little apparent effect (perhaps a change of one or two milliseconds, although that might be attributable to ordinary variation).
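Rather than guessing indices for setClearActive, note that DrawableRegion (the base class of both GraphicsOutput and DisplayRegion in Panda3D) also has named setters for the colour, depth and stencil clears. A sketch, assuming the objects passed in expose `getDisplayRegions()` and the `setClearColorActive` / `setClearDepthActive` / `setClearStencilActive` setters; the helper name `disable_region_clears` is hypothetical:

```python
def disable_region_clears(*owners):
    """Switch off the colour, depth and stencil clears on every display region
    belonging to the given owners (e.g. a Camera node or a GraphicsOutput).
    Hypothetical helper; the setters it calls are Panda3D DrawableRegion methods."""
    count = 0
    for owner in owners:
        for region in owner.getDisplayRegions():
            region.setClearColorActive(False)
            region.setClearDepthActive(False)
            region.setClearStencilActive(False)
            count += 1
    return count  # number of regions touched, handy as a sanity check
```

Usage: `disable_region_clears(self.surfaceCamera.node(), base.win)`. Clears configured on the window or buffer itself are separate from its regions’ clears and would need the same setters called on `base.win` directly.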

(The following results were all obtained with “gl-finish 1” still present in my .prc file.)

“Cull” seems to be associated with very little time overall, although for what it’s worth dr_0 and dr_1 do seem to take up a reasonable portion of what time that is.

Is that the value reported in the graph labelled “Primitive batches”? If so, it seems to hold steady at 7.

```
5 total nodes (including 0 instances); 0 LODNodes.
1 transforms; 40% of nodes have some render attribute.
2 Geoms, with 1 GeomVertexDatas and 1 GeomVertexFormats, appear on 1 GeomNodes.
8 vertices, 8 normals, 0 colors, 8 texture coordinates.
GeomVertexData arrays occupy 1K memory.
GeomPrimitive arrays occupy 1K memory.
4 triangles:
  4 of these are on 2 tristrips (2 average tris per strip).
  0 of these are independent triangles.
2 textures, estimated minimum 768K texture memory required.
```

I have no shaders active at all, I believe - “setShaderAuto” should not be being called, and I’m not doing any shader-work of my own, I believe.

You can disable all clears on a display region or window by simply calling disableClears().
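A sketch of that one-call approach applied to both the window and the offscreen buffer, assuming each object is a Panda3D GraphicsOutput exposing `disableClears()` and `getDisplayRegions()`; the helper name `disable_all_clears` is hypothetical. This is only a diagnostic: with no clears, stale pixels from earlier frames will remain visible, and the texture buffer will likely need its colour clear back afterwards.

```python
def disable_all_clears(*outputs):
    """Call disableClears() on each GraphicsOutput (window or buffer) and on
    every display region it owns. Hypothetical helper around Panda3D's
    DrawableRegion.disableClears()."""
    for output in outputs:
        output.disableClears()            # clears performed by the window/buffer itself
        for region in output.getDisplayRegions():
            region.disableClears()        # clears performed per display region
```

Usage: `disable_all_clears(base.win, self.surfaceBuffer)`, then compare the PStats graphs with and without the call.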

Sorry, by “processor intensive” I didn’t mean “CPU intensive”. The time under “Draw” has to do with Panda passing the primitives to the GPU.

However, 7 batches isn’t much, and 4 triangles certainly shouldn’t take an extraordinary amount of time. Are the triangles covering the entire screen? Does running your game in a much smaller window also make it significantly faster?
Maybe it is just your GPU being slow after all - can you run other OpenGL apps with decent performance?
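To put a rough number on “are the triangles covering the entire screen?”: every full-screen layer costs about one write per window pixel per frame, so pixel throughput scales with window area. A back-of-the-envelope sketch; the layer count (a clear plus two overlapping full-screen quads) and the window sizes in the test are assumptions for illustration, not measurements from this thread, and texture sampling cost is ignored:

```python
def pixels_per_second(width, height, fullscreen_layers, fps):
    """Rough fill-rate demand: pixels written per second if each of
    `fullscreen_layers` (clears, full-screen quads, ...) touches every pixel once."""
    return width * height * fullscreen_layers * fps


def max_fps(width, height, fullscreen_layers, fill_rate):
    """Upper bound on frame-rate if fill-rate (pixels/second) were the only limit."""
    return fill_rate / (width * height * fullscreen_layers)
```

For example, an 800x600 window with three full-screen layers at 30fps needs about 43 million pixel writes per second, and halving both dimensions cuts that to a quarter. Setting `win-size` in Config.prc is one way to try a smaller window: if frame-rate barely moves when the pixel count drops that much, fill-rate is probably not the main bottleneck.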

Also, by the way, you did disable sync to vblank when performing your performance tests, right? “sync-video 0”
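Why vsync matters for these measurements: with sync-video on and ordinary double buffering, each buffer swap waits for the next vertical blank, so frame times snap to whole multiples of the refresh interval. A small sketch of that arithmetic, assuming a fixed refresh rate and no triple buffering (the 60 Hz default is an assumption, not a measured rate):

```python
import math


def frame_time_ms(fps):
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


def vsynced_fps(render_ms, refresh_hz=60.0):
    """Frame-rate observed with vsync and double buffering: the frame is held
    until the next vblank, so its duration rounds up to whole refresh intervals."""
    interval_ms = 1000.0 / refresh_hz
    intervals = max(1, math.ceil(render_ms / interval_ms))
    return refresh_hz / intervals
```

Under these assumptions a 17 ms frame shows up as 30fps rather than about 59fps, which is why vsync can hide real differences between tests. The conversion also helps read the PStats numbers: at about 27fps each frame takes roughly 37 ms, so 12 ms in “clear” is around a third of the frame.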

Ah, fair enough and thanks, on both counts.

They do (one quad covers slightly more than the whole screen), and running in a smaller window does seem to help by a few frames per second (up from around 26-27fps to around 30fps); I take it, then, that fill-rate is part of my problem.

Well, I’m running on Ubuntu Linux, so I presume that OpenGL is my primary renderer (perhaps my only one; I’m not sure).

That said, I don’t have many graphically intensive games installed at the moment, I don’t think, and at least some of those are Windows or DOS games run through WINE or DosBox, making them less than ideal test cases.

I do have a game called “Hedgewars” which runs at about 30-40fps, although I haven’t managed to confirm that it uses OpenGL. While that is in the same region as my Panda projects so far, Hedgewars seems to manage it with what appear to be multiple backdrop layers and what I imagine is a fairly large foreground layer.

I also managed to run a few OpenGL benchmarks using a tool called “GLOBS”; at least one test (a shadow test) ran at around 60fps, but others only managed around 30 (that said, I didn’t see a vsync option, so they may have been limited by that).

It does look likely that it’s just my machine, doesn’t it? :confused:

Ah well, if there’s little more to do in order to squeeze performance from this element I might try adding it to the main project and seeing whether it bogs things down overmuch there.

Actually, I’d forgotten about that. ^^;

Nevertheless, setting it didn’t seem to change anything.

Have you tried installing the latest drivers for your video card? What is your video card, and which drivers are you using?

My video card is the “Intel GMA 3150”, I believe; as to drivers, I’m not sure, but I seem to recall that driver support was a bit of an issue, and that I ended up getting a community-created driver that seemed to be recommended - this one, I believe.

I believe that I have Ubuntu set up to check for updates to the driver linked-to above.