Performance optimisation: Where am I going wrong?


I’m having some trouble finding the source of a performance issue that I’m encountering, and I’m hoping that someone here will have some insight.

In short, some areas seem to have conspicuously low frame-rates–lower than I’d expect, given the scenes. Other areas, however, seem to have a far lesser impact on performance.

I’ve spent some time hunting for the source of this problem, and indeed have made some performance gains via optimisations discovered along the way. However, the core issue remains–indeed, I have a few views in the game that don’t always reach sixty frames per second, I believe.

Looking at PStats, the GPU-specific graph seems to show a large chunk of time in ->“Draw”->“window1”->“dr_0”.

No one category takes up the same amount of time in the general graph, I believe. The largest here seem to be “Wait”->“Flip”->“End”, and a similar amount split roughly evenly between “App”->“Bullet” and “App”->“Show code”, the latter being dominated by “garbageCollectStates” and my main update task. However, by that stage we’re talking about fairly small durations–“garbageCollectStates”, for example, only uses about 1.7ms in a “problem room”.

I do seem to have more nodes than I’d expect: around 190-210, I think, depending on where I stand, with a little over half being Geoms.

One value that I slightly suspect of being relevant is the number of render states: I’m seeing somewhere around 450, according to PStats.

Another that I’m uncertain of is the vertex-count: in one problem room I’m seeing around 450 000-600 000 being reported in PStats–but I honestly don’t know whether that’s a high number or not. It does seem that “non-problem rooms” tend to have lower vertex counts–but I’m not sure that I have enough examples to be confident of that.

Attempts to compare stats produced when looking at views that give various frame-rates haven’t yet produced anything terribly compelling–to my eye, at least. :confused:

There are two things that might muddy the waters:

First, I have antialiasing active. Disabling it does, naturally, improve my frame-rate–but “problem rooms” seem to remain significantly slower than others, if I recall correctly.

Second, I’m working in Ubuntu Linux, and I’ve come to suspect that my drivers there might be a minor issue–although it’s hard to tell with confidence.

My thanks for any help! :slight_smile:

Firstly, since time is spent in flip, make sure you rule out vsync as a factor. Make sure it’s disabled on the driver level and the application level. Also, switch to measuring milliseconds, not fps, since fps is a non-linear indicator of performance. You can set “frame-rate-meter-milliseconds true” in Config.prc to display this.
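To illustrate why fps is a non-linear indicator, here is a minimal sketch in plain Python (no Panda3D required). The function names are my own, purely for illustration; the only fact it relies on is that frame time in milliseconds is 1000 divided by fps.

```python
def fps_to_ms(fps: float) -> float:
    """Convert a frame rate (frames per second) to a frame time in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def frame_time_cost_ms(fps_before: float, fps_after: float) -> float:
    """Extra milliseconds per frame when the rate drops from fps_before to fps_after."""
    return fps_to_ms(fps_after) - fps_to_ms(fps_before)


if __name__ == "__main__":
    # The same 10 fps drop costs very different amounts of frame time
    # depending on where on the fps scale it happens.
    for before, after in [(100, 90), (60, 50)]:
        cost = frame_time_cost_ms(before, after)
        print(f"{before} -> {after} fps: +{cost:.2f} ms per frame")

    # A 60 fps target leaves a budget of roughly 16.67 ms per frame.
    print(f"60 fps budget: {fps_to_ms(60):.2f} ms")
```

A drop from 100 to 90 fps costs about 1.1 ms per frame, while a drop from 60 to 50 fps costs about 3.3 ms, which is why comparing milliseconds is less misleading than comparing fps.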

If a lot of time is being spent in draw and flip, then that suggests that you might be driver-bound or GPU-bound, since the CPU is waiting for the GPU or driver to keep up. A lot of time spent in draw can also, however, indicate general draw overhead due to the driver not being able to process the batches quickly enough, or too many state changes being made.

It could be that you’re fill rate bound. This means that the application is asking the GPU to render too many fragments, or it is too slow to render each fragment. This can happen if you have lots of transparent planes, such as blades of grass.

An easy way to find out whether this is the case is to check whether shrinking the window proportionally improves the frame time. If you cut the window area in half, you are also halving the number of fragments to render. If you see a significant improvement in your frame time, you know that this is your bottleneck.
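As a rough sketch of that test in plain Python: the helper below predicts what the frame time would be after a resize if the cost scaled purely with the number of pixels in the window. The function names and the window sizes are hypothetical, and the model is an idealisation (it ignores overdraw and any fixed per-frame cost), so treat the prediction as an upper bound on the improvement rather than an exact figure.

```python
def fragment_count(width: int, height: int) -> int:
    """Pixels in the window; a lower bound on fragments, since it ignores overdraw."""
    return width * height


def expected_frame_time_ms(measured_ms: float,
                           old_size: tuple[int, int],
                           new_size: tuple[int, int]) -> float:
    """Frame time predicted at new_size if cost scaled purely with pixel count."""
    scale = fragment_count(*new_size) / fragment_count(*old_size)
    return measured_ms * scale


if __name__ == "__main__":
    # Hypothetical example: 20 ms measured at 1280x720.
    # Halving BOTH width and height leaves a quarter of the pixels, not half.
    predicted = expected_frame_time_ms(20.0, (1280, 720), (640, 360))
    print(f"predicted frame time if purely fill-rate bound: {predicted:.2f} ms")
```

If the frame time you actually measure after resizing lands near the prediction, fill rate is a likely bottleneck; if it barely moves, the cost is probably elsewhere.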

If this is the case, then it could be that your pixel shaders are too complex, using too many interpolants or instructions, or that you should simply reduce the amount of geometry that’s in view at one time. Front-to-back rendering really helps if it concerns opaque geometry.

If this is not so much the case, then we must look at other things. State changes and too-many-geoms are common issues, but you indicate that you don’t have so many; it might still be good to look at what is causing the state changes in these areas to reduce them, however. Too many vertices is unlikely, but combined with a particularly heavy vertex shader this can still be an issue.

Finally, I’m currently working on a big performance patch for Panda, so if you send your program to me there may be additional bottlenecks in Panda that I can identify using a C++ profiler.

Thank you for the response! :slight_smile:

(By the way, posting on another forum that lacks it has resulted in me being somewhat appreciative of the “selective quote” functionality on this forum!)

I’m pretty confident that v-synch is off: I believe that, across various views in the game, I’ve seen frame-rates in the fifties, sixties, seventies, eighties, nineties, and hundreds.

Ah, I didn’t know about the “milliseconds” option–I’ve switched over to that; thank you for mentioning it. :slight_smile:

Hmm… I do think that I’ve seen better performance under Windows than Ubuntu Linux (the latter being my development environment); still, I think that I recall seeing variations in performance–if less worrisome ones–even under Windows.

Hmm… I’m pretty confident that I have very little transparency in the room in question. In fact, I think that I have only one transparent thing, off to one side.

This does seem to have a significant effect–but only when antialiasing is enabled.

Before I discovered the latter part of the above, I did try coring out the majority of my main vertex- and pixel- shaders. This had some effect (as one might expect), but not a huge one, I think.

How do I find the causes of these state changes?

(I’ve tried combining elements of the environment that share materials, and having materials share textures, but I’ve seen little improvement from what I’ve managed thus far, I believe.)

A few more things that I noticed while experimenting in response to your post:

Even just a few “steps” forwards or backwards can have a noticeable effect, with frame-times ranging between ~20ms and ~15-16ms.

However, perhaps most salient is that disabling antialiasing (by overriding it in the NVidia control panel) produces a significant improvement in performance. It even seems to make the frame-times a little bit more stable, reducing that “few steps back or forward” effect.

The improvement in performance is, of course, somewhat expected. However, combined with window-resizing no longer having an effect when antialiasing is disabled and the seeming increase in stability of frame-times, I now wonder whether the antialiasing is a part, at least, of the problem.

I tried replacing my walls with simple quads, hastily UV-mapped and given the material assigned to the walls–and this resulted in a significant improvement in performance.

Between this and the effect of disabling antialiasing, might the antialiasing algorithm be affected by the geometry of the scene being processed? I wouldn’t expect so–but then, I’m not terribly familiar with modern antialiasing algorithms.

Thank you–I appreciate that. :slight_smile:

Hmm… My program is, however, rather big and unwieldy at the moment–not to mention the various assets involved. On top of that, I’m a little anxious about sending it out–silly, I know, but nevertheless. ^^;;;


Any further thoughts? Especially regarding seeking out state-change issues (I’ve checked the manual, but that portion remains incomplete, it seems), and the possibility that the problem may be related to antialiasing?

I recently learned that setting shader inputs via a lerp function interval can pump up render states really fast, and eventually kill your framerate and/or interpreter process. Maybe you’re also doing something like that?
Just because you didn’t set transparency on any objects doesn’t mean you don’t have transparency enabled for some objects; all it takes is a texture with an alpha channel. Make sure you check this.
If AA gives you trouble, maybe consider using FXAA? It’s a one-step post-process filter with just the original frame as an input.

Hmm… I don’t believe that I’m setting shader inputs via an interval of any sort. I do have several shaders, but I think that only two (perhaps three) should be applied to the objects that I expect to be rendered in the views in question.

I do have some objects that use “setShaderInput” to update shader-input values with each frame; this is done via an update task, I believe. Is that similarly a bad idea? If so, is there another means of providing to a shader data that changes on a frame-by-frame basis?


I believe that I did check (in a previous round of performance-issue hunting) by setting “show-transparency 1” in my PRC file. The expected transparent objects flashed, and the opaque ones didn’t, as I recall, so I’m reasonably confident that the opaque textures don’t have superfluous alpha channels.

Hmm… I think that I’ve heard that FXAA results in the image looking blurry overall, which doesn’t sound appealing. :confused: (Unless there’s a version that doesn’t have this issue?)

All right, after a bit more testing I think that I am indeed going to chalk this up to the antialiasing.

As I noted above, I’ve seen some dislike for FXAA expressed a few times, so I’m hesitant to include it; the recourse for players who–like me–find performance dropping as a result of antialiasing would presumably be to simply turn it off. (But then see my thread regarding in-game toggling of antialiasing no longer working.)

Rdb, Wezu, thank you both for your responses! :slight_smile: