GPU thrashing

I’m getting long (300 ms) delays, and PStats tells me I have “graphics memory” thrashing.

My situation:

I have 8 (off-screen) GraphicsOutputs, bound to textures (using RTM_bind_or_copy). Let’s just call these TexCs.
The final shader uses all 8 TexCs to make the final (on-screen) image.

However, I do not update the TexCs every frame. Sometimes none or only one is updated, but sometimes all 8 are.
I use set_active(false) on a GraphicsOutput when I don’t want to render that texture in a frame.

To complicate things further, to generate each TexC there are 2 custom shaders that are run. I.e.
I have 8 TexAs which feed into 8 TexBs which feed into 8 TexCs. When I want to update one TexC, I set_active(true) on all the corresponding ones.
The reason I have 8 TexAs and TexBs is so that it’s possible to update all of them in a given frame if needed.

Note each one is 1024 × 1024 × 3 bytes (about 3 MB).

The problem:
These textures (around 75 MB in total) should all fit in graphics memory, right? [I have 256 MB of on-board graphics memory + more shared]
However, PStats tells me that only 14 MB is resident (active or inactive) and 12 MB is nonresident.
When I try to enable more than 2 of my 8 update paths, I get “thrashing”: the shader for TexC, which usually takes around 2 ms, suddenly takes around 300 ms.

Note: It’s using OpenGL on Windows in C++.

Any suggestions?
Is there some way to suggest to the graphics driver not to evict my memory?
Or is having 24 bound GraphicsOutputs too many? How else could I do this?

An offscreen render buffer takes up considerably more graphics memory than an ordinary texture. In addition to the 3 bytes for each of RGB, you also have alpha, stencil, multisample, and depth buffers, as well as whatever auxiliary buffers. Plus who knows what else. Multiply all that together times 8 times 3, and you could easily exceed 256 MB.

One solution is to use ParasiteBuffers instead, which all render into the same buffer, one at a time, and copy the results into a texture. Another solution is to reduce the size of your buffer to 512x512, which will use only 1/4 of the memory. Another possible solution is to try to reduce the memory requirements of all your offscreen buffers by not requesting all of the auxiliary buffers and whatnot, but this isn’t really reliable.

Note that PStats doesn’t know about the size of your offscreen buffers, and the size of the textures is only a guess (which is probably wildly inaccurate in the case of rendered textures, since it doesn’t know about the excess baggage). But the reported texture thrashing is real: this is accurate because Panda knows when OpenGL evicts a texture from graphics memory, even if it doesn’t know how big that texture is (OpenGL doesn’t provide a way to ask that question).


Hmm. So a GraphicsOutput has all those things even if you don’t add_render_texture on them, or the shader doesn’t output them?

Previously I had:

PT(GraphicsOutput) newBuffer = graphicsEngine->make_output(
    framework.get_default_pipe(), "name", -100, FrameBufferProperties(),
    WindowProperties::size(1024, 1024),
    GraphicsPipe::BF_refuse_window | GraphicsPipe::BF_fb_props_optional,
    graphicsWindow->get_gsg(), graphicsWindow);

newBuffer->add_render_texture(theTexture, GraphicsOutput::RTM_bind_or_copy, GraphicsOutput::RTP_color);

However, if I change BF_refuse_window to BF_require_parasite, and RTM_bind_or_copy to RTM_copy_texture I get a different output. I’m wondering if this is because the texture is slightly bigger than the window. Changing the size of this texture isn’t really an option.

Is there some way to have the parasite buffers share an off-screen buffer instead of the on-screen window, so I don’t have this size restriction?
Or some way to make a single GraphicsOutput render multiple things in a frame?

Yes. Create one off-screen buffer, then create all of the parasite buffers by passing that buffer in to make_output() instead of graphicsWindow.
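A sketch of that arrangement, reusing the make_output() call pattern from the earlier post (texC and the sort values are placeholders, and this depends on Panda3D, so it is not a standalone program):

```cpp
// One real off-screen buffer acts as the host...
PT(GraphicsOutput) hostBuffer = graphicsEngine->make_output(
    framework.get_default_pipe(), "host buffer", -200, FrameBufferProperties(),
    WindowProperties::size(1024, 1024),
    GraphicsPipe::BF_refuse_window | GraphicsPipe::BF_fb_props_optional,
    graphicsWindow->get_gsg(), graphicsWindow);

// ...and each parasite passes hostBuffer, not graphicsWindow, as the host.
for (int i = 0; i < 8; ++i) {
  PT(GraphicsOutput) parasite = graphicsEngine->make_output(
      framework.get_default_pipe(), "parasite", -100 + i, FrameBufferProperties(),
      WindowProperties::size(1024, 1024),
      GraphicsPipe::BF_require_parasite, graphicsWindow->get_gsg(),
      hostBuffer);
  parasite->add_render_texture(texC[i], GraphicsOutput::RTM_copy_texture,
                               GraphicsOutput::RTP_color);
}
```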


Thanks David. I did as you said, but the parasite-buffers in my host-buffer still seem to get rendered to the main window, creating lots of flickering.
(Also, if I don’t attach any render_texture to the host-buffer, it gets rendered to the main window too.)

The host-buffer was created with GraphicsPipe::BF_refuse_window | GraphicsPipe::BF_refuse_parasite
(I tried all combinations)
So I don’t see why any of it should be seen on screen.
It’s definitely not just stuff that happens to be in one of the textures which is rendered to the final output.

When I disable the host-buffer, it stops rendering all the parasite-buffers, which makes sense. So it’s doing something right.

In fact, if I instead create the host-buffer with BF_require_window it seems to work, except I have another window on screen (which I don’t want)

Hmm, that sounds rather like a bug in Panda. Perhaps it’s not setting the context correctly when switching to a ParasiteBuffer that’s attached to an offscreen buffer. It’s true I’ve never actually tried doing that before.

But even if it did make that mistake, it still shouldn’t cause flickering if you set the sort on all of these buffers so that they all rendered before the main window. As long as the main window renders last, you shouldn’t ever see anything, because the main window clears its buffers before it starts, and all of the drawing is made to the invisible back buffer and only revealed when the main window calls flip().

So, are you sure you have set the sort values sensibly? And you can prove which kind of buffer you got (parasite or otherwise) for the toplevel offscreen buffer by examining buffer->get_type(). What does that return?


It’s almost like the rendering is happening on the front buffer, as I see all the buffers flicker through very fast with a tearing-like look, not just the last one.

The (offscreen) host-buffer I create returns a get_type() == GLGraphicsBuffer.
My 8 “child” buffers I create using the host-buffer all return get_type() == ParasiteBuffer

My host-buffer has a sort of -150, and the child buffers in the range -100 to -80 or so.
I checked the main window’s sort. It is 0.
(I experimented with other sort values, but none stopped the flickering)

Also, I tried updating from 1.7.0 to 1.7.1. No difference.

Hmm, it does sound like Panda is rendering to the wrong buffer. That’s unfortunate.

I don’t suppose you can create a main window that’s 1024x1024 or larger as a workaround?


It seems in GraphicsEngine.cxx, make_output function, I see the lines

if (host != 0) {
  host = host->get_host();
}

Doesn’t this mean it is using the host of the host?
Which in my case would be the main window, which might explain what’s happening.

GraphicsBuffer::get_host() is defined to return the GraphicsBuffer itself (this), not anything else. It’s only ParasiteBuffer::get_host() that returns a different object from itself. So, host->get_host() is defined to be either host itself, or if the host is another ParasiteBuffer, then the ParasiteBuffer’s host.


I found a hack, which allows me to share render buffers with a size larger than the window.

I make another window as large as I need, and hide it:

GraphicsWindow *graphicsWindow = framework.get_window(0)->get_graphics_window();

PT(GraphicsOutput) hostBuffer = graphicsEngine->make_output(framework.get_default_pipe(), "host buffer", -200, FrameBufferProperties(),
			WindowProperties::size(width, height),
			GraphicsPipe::BF_require_window | GraphicsPipe::BF_refuse_parasite | GraphicsPipe::BF_fb_props_optional,
			graphicsWindow->get_gsg(), graphicsWindow);

(which was working fine as before)

But now I just hide the window.
I’m on Windows, and use:

GraphicsWindow *windowBuffer = dynamic_cast<GraphicsWindow*>(&*hostBuffer);
if (windowBuffer) {
  size_t winHandle = windowBuffer->get_window_handle()->get_int_handle();
  ShowWindow((HWND)winHandle, SW_HIDE);
}
Also, I found it necessary to attach a render_texture to the hostBuffer, otherwise any parasites of the hostBuffer don’t seem to work.

One problem is that I still get a single flicker as the window is created and hidden.

I’m sure there is probably a tidier solution?

I think I solved it! I managed to make an offscreen host buffer.

There were two important things I had to do to make it work.

Firstly, the default FrameBufferProperties constructor is not sufficient. I have to use

FrameBufferProperties props = FrameBufferProperties::get_default();

Secondly, I found you have to pass NULL as the host, even though I’m refusing a parasite.

PT(GraphicsOutput) hostBuffer = graphicsEngine->make_output(
    framework.get_default_pipe(), "host buffer", -200, props,
    WindowProperties::size(1024, 1024),
    GraphicsPipe::BF_refuse_window | GraphicsPipe::BF_refuse_parasite,
    graphicsWindow->get_gsg(), NULL);

Now I am able to make multiple parasite buffers using this off-screen hostBuffer, which can be larger than the main window.

Ah, interesting! These sound like they might be bugs in the offscreen buffer creation code.


A small note:
I also had to add


To get the alpha channels working in the parasite buffers.