quadro/optimus performance issue with offscreen buffers

I am having some serious performance issues with my Panda3D game on my new laptop. It uses the new NVIDIA Optimus feature to auto-switch GPUs; for now I’ve set the laptop to discrete graphics mode in the BIOS rather than rely on the auto-switching. The discrete GPU is a Quadro 2000M, which is almost as powerful as my desktop PC’s card. I have the latest NVIDIA drivers for the card installed.
The problem seems to be specific to my game, which uses a lot of off-screen buffers. It is running at about 1 FPS, with the VAST majority of time spent in “draw”. I tried reducing all the buffers to 64x64, but this did not help at all. Using the PRC setting “force-parasite-buffer 1” got me up into the 20-40 FPS range, which is workable but not the performance level I was expecting compared to my similarly specced desktop.
Why would the performance be so terrible when not using parasite buffers, which are supposed to be the slower option?
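For reference, the workaround is just a one-line PRC setting; the fragment I added to my Config.prc looks like:

```
# Config.prc fragment: force offscreen buffers to be ParasiteBuffers
# instead of full GraphicsBuffers.
force-parasite-buffer 1
```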

Hmm, that’s a new one. I don’t know, but this is the sort of problem that comes up from time to time in the world of 3-D graphics. I’m guessing that some operation is particularly slow in this particular driver. Presumably it’s an operation that NVIDIA didn’t expect people to use often, but one that Panda does in fact use when rendering to offscreen buffers.

More specifically than that, I can’t say, of course, without being able to experiment hands-on with your hardware. It could be related to framebuffer properties (antialiasing? multisampling? double-buffered?) or texture filtering (compressed textures? mipmaps? non-power-2?) or context switching or any number of other possibilities.

It’s also possible that it’s simply a driver bug that will (hopefully) be fixed in the next release. This sort of thing happens a lot, too, especially with brand-new hardware, which is commonly released before the drivers are ready.

If you could isolate one particular feature that causes this problem, it would help.
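If it helps narrow things down, most of those candidates can be toggled one at a time through PRC variables, something along these lines (double-check the variable names against your Panda version’s config listing):

```
# PRC settings to toggle one at a time when bisecting the slowdown
framebuffer-multisample 0   # don't request multisample framebuffers
multisamples 0
compressed-textures 0       # don't ask the driver to compress textures
textures-power-2 down       # rescale non-power-of-two textures
```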


So far I have tried playing around with various Panda PRC settings, without any benefit. I tried the previous certified driver from NVIDIA and also the manufacturer’s driver; both had the same issue. I also tried changing all the settings in the NVIDIA control panel (disabling power saving, double buffering, and all that), still with no effect.
I am using base.win.makeTextureBuffer to create my buffers. Is it possible I might see a difference using the “low level” interface alluded to in the manual?

What kind of buffers are they exactly? I believe that you can print out the type of the buffers with type(buffer). Then we’ll be able to look more specifically.

Without the PRC setting “force-parasite-buffer 1” (when the frame rate is terrible), they are <type 'libpanda.GraphicsBuffer'>.

With the PRC setting they are <type 'libpanda.ParasiteBuffer'>, and the frame rate is lowish but playable.

I have an Asus N61J with Optimus set to use the NVIDIA chip for 3D apps. My performance dropped noticeably when I updated the driver yesterday.

In addition, a new bug cropped up. I periodically build a sequence of distances to objects and then use min() to find the closest object. This code broke: it could no longer find the object indexed by the minimum distance. Apparently, the VBase3.length() between two objects is now being calculated as a double-precision float, while min() downcasts the result to a single-precision float, so the two no longer compare equal. This wasn’t the case with the previous driver.
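A stripped-down, pure-Python illustration of the pattern that breaks, and the keyed min() that sidesteps the precision mismatch entirely (the object names and positions are placeholders, not from my game):

```python
# Finding the closest object without re-matching the min distance by
# floating-point equality.  Keying min() on the objects themselves
# avoids building a distance list and then searching it again for an
# equal value, which fails if the stored distance and min()'s result
# come back at different precisions.
import math

def closest(objects, camera_pos):
    """Return the object with the smallest Euclidean distance."""
    def dist(obj):
        return math.dist(camera_pos, obj["pos"])
    return min(objects, key=dist)

objects = [
    {"name": "crate", "pos": (3.0, 4.0, 0.0)},   # distance 5.0
    {"name": "barrel", "pos": (1.0, 1.0, 1.0)},  # distance ~1.73
]
print(closest(objects, (0.0, 0.0, 0.0))["name"])  # -> barrel
```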

A small update on this situation.
Apparently, if I delete the BAM cache before running the game, it will usually run at a much better frame rate (about 45 FPS instead of 20) using the parasite buffers. The next time I run the game, it is back down to 20. I am a little perplexed as to why this is.
I suppose I will have to build a stripped-down program to properly diagnose this, but I won’t be able to get to that for a week or so.

That’s charming. I did once encounter a driver that would initially (for the first 15 to 30 seconds) run at a lower frame rate, then suddenly kick into higher gear, even though nothing in the scene had changed. Perhaps that’s the phenomenon you’re seeing here, and deleting the bam cache simply forces the startup to take longer, thereby giving the driver more of a chance to do whatever it does?


That is what I suspected as well.
I did try running in DirectX and the game runs fine using both parasite and regular buffers, but of course almost all of my shaders do not work in DirectX.

Is it possible that the buffer is being copied to system RAM? How would I check whether it is or not? I ask because I tried modifying some of my makeTextureBuffer calls to set the to_ram parameter to True, and it did not seem to run any slower than it already was.

Hmm, well, I guess you can print the texture object and see if it says it has a RAM image.


Well, this certainly does seem to be a driver issue.
It looks like someone on the NVIDIA forum is having the same issue.
My guess is that the GL buffers are bypassing the dedicated GPU memory and going straight to system/shared memory. I’m not sure if there is anything that could be changed in Panda to fix it.
Hopefully it will be resolved in a future driver.