Copy pre-rendered textures to back buffer?

otraupe · January 27, 2014, 5:49pm

I am new to Panda3D but not totally new to creating visual stimulation code. I have worked with Matlab and the PsychToolbox for conducting psychophysical experiments for a while. Now I need to get my Panda code running accurately as we are planning to include 3D virtual reality into our lab environment. But for starters I try to get comfortable with 2D.

To my understanding, the fastest way to present a 2D image is to pre-render it into an “offscreen window” (such as a GraphicsBuffer), retrieve the result (e.g. with the getTexture method), and copy it directly onto the back buffer (before igLoop initiates the flip).

This should not be a problem if the GraphicsBuffer and the back buffer share the relevant properties. However, after half a workday of searching the web I haven’t found a single solution for that in Panda3D. Is there any?

I have read about the ‘auto-flip’ option and the possibility to set the render priority below that of ‘renderFrame()’ (i.e. <50, if I remember correctly). But I would like to do it manually, to be sure my image data is prepared in time and presentation deadlines are reliably met.

Also I would be interested in a way of rendering into a GraphicsBuffer without the necessity of invoking a camera (something like ‘render2d’).

Any help is greatly appreciated, as in parallel I have to familiarize myself with object-oriented programming, which doesn’t make the job easier for me.

otraupe · January 29, 2014, 3:33pm

No ideas? Or have I understood much less about Panda than I thought? Anyway, I will put it more simply:

I have seen huge delays (multiple frames, up to 90 ms) even when presenting simple pictures with Panda, e.g. with ‘direct.gui.OnscreenImage()’. But when I use a Matlab toolbox (also OpenGL-based) on the same hardware, the timing is fine, i.e. I can present images on the very next frame (with a delay of ~16 ms). I measure delays with a photosensitive diode (“photo transistor”) powered via the parallel port.

Here is my current Panda code (what is to me the relevant part of the run() method; if you are interested in Matlab as well, let me know):

        #create texture from image
        tex = loader.loadTexture("checker_1024x1024.png")
        
        #prepare card with image data
        cm = CardMaker("Checker")
        card = render.attachNewNode(cm.generate())
        card.setTransparency(self._engine.pandac.TransparencyAttrib.MAlpha)
        card.setTexture(tex)
        
        #prepare looping vars
        iter = 120
        frameDuration = zeros(iter-1, int)
        
        for i in range(iter):
            
            #image onset every 12th frame + measurement turned on
            if i%12==0:
                card.reparentTo(render2d)
                base.graphicsEngine.renderFrame()
                #power measurement circuit ON - steep flank upwards
                self.p.Out32(self.parAddress, 255)
            #image offset after 3 frames
            elif i%12==3:
                card.detachNode()
                base.graphicsEngine.renderFrame()
            #after 7 more frames measurement is turned off
            elif i%12==10:
                base.graphicsEngine.renderFrame()
                #power measurement circuit OFF - steep flank downwards
                self.p.Out32(self.parAddress, 0)
            #standard procedure for other frames
            else:
                base.graphicsEngine.renderFrame()
                
            #flip should display image immediately after power ON, right?
            base.graphicsEngine.flipFrame()
            
            flipTimeStamp = datetime.now()
            if i>0:
                #reveals loop durations of roughly 1 frame (17 ms)
                #flipFrame obviously waits for the actual flip to return
                #(full timedelta in microseconds; .microsecond alone wraps every second)
                frameDuration[i-1] = (flipTimeStamp-lastFlip).total_seconds()*1e6
            lastFlip = flipTimeStamp

This Python/Panda code still produces an incredible delay of ~74 ms. I attach the result figures; the dip in the middle of each “bump” reflects the response of the photo transistor to the image presentation (decrease of resistance).
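
A note on the loop timing: for per-frame durations the clock should be monotonic and must not wrap (the `microsecond` attribute of a `datetime` is only the sub-second field and restarts at 0 every second). A minimal sketch of that bookkeeping, assuming Python 3's `time.perf_counter()`; `wait_for_flip` is a hypothetical stand-in for the renderFrame()/flipFrame() pair and is not a Panda3D function:

```python
import time


def measure_frame_durations(wait_for_flip, n_frames):
    """Call wait_for_flip() n_frames times; return the gaps between calls in ms."""
    durations_ms = []
    last = None
    for _ in range(n_frames):
        wait_for_flip()              # assumed to block until the buffer swap
        now = time.perf_counter()    # monotonic, high-resolution, never wraps
        if last is not None:
            durations_ms.append((now - last) * 1000.0)
        last = now
    return durations_ms
```

Called as `measure_frame_durations(lambda: time.sleep(1 / 60.0), 10)`, it should return nine values of roughly one frame each; in the real loop the lambda would be replaced by the render and flip calls.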

However, from the (left) Matlab data it is evident that the hardware is able to reliably present images right away. The input lag of the display is ~9 ms (according to the web and my own observations) and the initial parallel port flank of each measurement kicks in pretty much halfway between two flips.

I would be totally happy if you could help me answer 2 questions:

What could be the reason for this delay?
How do I avoid it?

Thanks for reading all this!

otraupe · January 30, 2014, 9:43am

Still no ideas for me where to look or improve? Or did I do something wrong here? I don’t post in forums very often. So in case I stepped on someone’s toes with something, I apologize!

I also know that Panda is not intended for my purpose. It is only that somebody created a very nice open source toolbox based on Panda for easy programming of scientific experiments. It is very convenient, but it comes with a substantial lag (see above). And now I realize that this probably is not due to the particular implementation but to a more general display logic of Panda. Or is it?

rdb · January 30, 2014, 10:32am

You may have already figured this out, but renderFrame flips the front and back buffer at the beginning of the call, and therefore anything you draw before that will not be displayed until the next call to renderFrame.
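
As a rough sketch of that ordering: the helper below renders the current scene graph and then asks for the flip explicitly instead of leaving it to the next renderFrame call. It only relies on the `renderFrame()` and `flipFrame()` methods of `base.graphicsEngine`; the function name and the `send_marker` callable (e.g. a parallel-port write) are hypothetical, and the marker position simply mirrors the loop posted earlier in this thread.

```python
def present_immediately(engine, send_marker):
    """Render the scene and swap buffers without waiting for the next renderFrame().

    engine      -- an object with renderFrame() and flipFrame(),
                   such as Panda3D's base.graphicsEngine
    send_marker -- hypothetical callable that raises the measurement line
    """
    engine.renderFrame()  # flips the previous frame, then draws the current scene
    send_marker()         # marker is sent once the new frame has been issued
    engine.flipFrame()    # explicit flip, so the new frame does not wait a cycle
```

With a running ShowBase this would be called as `present_immediately(base.graphicsEngine, my_marker)` right after changing the scene graph.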

OnscreenImage is a high-level wrapper that creates a card and then puts it in the scene graph. The delay when using that probably comes from generating the card. Using CardMaker to generate your own card that you just reuse whenever you need to put an image in the scene graph is probably a better way to go.

I don’t really understand your approach, so I’m not really sure what kind of delay you’re talking about. Are you talking about the delay of renderFrame in general? If so, there are many things that go on that you could perhaps disable or limit (such as v-sync), and you could use PStats to find out what’s taking up so much time.

I don’t know if it will be possible to copy the contents of one buffer straight into the other. By which OpenGL call would this be possible?

otraupe · April 14, 2014, 3:16pm

rdb, thanks for your effort in trying to get through my lengthy post! Also, I apologize for not getting back to you for that long. I just forgot about this topic because of my schedule being filled with other frustrating stuff.

Yes, I figured out the particular nature of renderFrame, which causes a delay of 1 frame if it is not taken into account. And I tried to achieve good timing by using my own card maker (see my code in one of my posts), just as you suggested, without any improvement. Or is something wrong with my approach?

I am talking about each and every frame being delayed about 90 ms - which is 5 (!) frames and a bit - when I use Panda3D for the presentation.
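
For reference, the conversion from milliseconds to frames, assuming a 60 Hz display (one frame is about 16.7 ms); the function name is only for illustration:

```python
REFRESH_HZ = 60.0  # assumed refresh rate of the display


def latency_in_frames(latency_ms, refresh_hz=REFRESH_HZ):
    """Express a latency given in milliseconds as a number of display frames."""
    return latency_ms * refresh_hz / 1000.0


for ms in (16.0, 74.0, 90.0):
    print("%5.1f ms = %.1f frames" % (ms, latency_in_frames(ms)))
```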

Actually, I can’t believe that…
a) Panda3D as a gaming engine is really THAT laggy and there isn’t anything you can do about it
b) I am the first one noticing this (at least I found no further discussions on this issue)

Could someone chime in who has knowledge on this from the developers side?

Thaumaturge · April 14, 2014, 5:00pm

I may be misunderstanding your purpose here–are you talking about double-buffering? If so, are you sure that you’re not attempting to do something that Panda is already doing behind the scenes?

If I may ask, where does your 2D image come from, and how often does it change? Could you perhaps just create an OnscreenImage, then call “setImage” when the image changes? Something like this:

#In your initialisation code
self.view = OnscreenImage(image = initialImage) # Presuming that there's an initial--or at least a stand-in--image at startup

#When the image changes:
self.view.setImage(newImage)

I’m imagining the above using either an event to signal the change in image, or an update task polling for changes.

I may very well be missing a point in your situation, however, and my apologies if I am. ^^;

rdb · April 15, 2014, 7:29am

I’m not sure why your frames would be delayed so much, to be honest. I can’t say that I’ve got too much experience with this particular kind of stuff, but I’ll try to help wherever I can.

Just to make sure I understand the issue properly - you’re observing a latency between updating the content in the scene graph and seeing the pixels update on the screen, is that correct? I suppose this could either be due to Panda not sending your changes to the GPU quickly enough or the rendered frame not being sent to the monitor quickly enough, or a combination thereof.

In general, Panda is probably designed to prefer high throughput over low latency; which is probably common in game development where latency is not all that important compared to frame rate. An exception to this is VR, and since I’ve been working on Oculus Rift support, I suppose I now have extra reason to investigate potential latency issues in Panda.

Things you might try to see where the latency comes from are disabling or enabling sync-video, disabling back-buffers, enabling gl-finish in Config.prc, and enabling auto-flip, which makes render_frame flip the buffers before it returns (although I guess that would be moot if you disable double buffering). Having some indication of where the frame gets held up would probably help here.

You could also try setting allow-incomplete-render to 0. This makes Panda wait if not all of the textures or geometry is available on the GPU before actually rendering it; normally, with the default of 1, Panda prefers to keep on rendering without the data being resident to avoid lag spikes, but I can imagine that it might serve to increase latency in this case.

Also make sure that you don’t enable the multi-threaded rendering pipeline, since it may serve to increase latency as well.
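
These variables can also be set from Python instead of editing Config.prc. A minimal sketch, assuming it runs before ShowBase opens the window; the values are just one combination to start experimenting with, and `build_prc` / `apply_settings` are helper names made up for this example:

```python
# One combination to try; toggle the values one at a time while measuring.
LOW_LATENCY_SETTINGS = [
    ("sync-video", "1"),               # wait for the vertical retrace
    ("back-buffers", "1"),             # keep double buffering
    ("gl-finish", "1"),                # call glFinish() after each frame
    ("auto-flip", "1"),                # flip at the end of render_frame
    ("allow-incomplete-render", "0"),  # wait until textures/geometry are resident
    # threading-model is deliberately left unset (single-threaded pipeline).
]


def build_prc(settings):
    """Turn (name, value) pairs into Config.prc text, one variable per line."""
    return "\n".join("%s %s" % (name, value) for name, value in settings) + "\n"


def apply_settings(settings=LOW_LATENCY_SETTINGS):
    """Hand the settings to Panda3D; call this before creating ShowBase."""
    # Imported lazily so build_prc() can be used without Panda3D installed.
    from panda3d.core import loadPrcFileData
    loadPrcFileData("", build_prc(settings))
```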

As for your original question, here’s another angle: is using PyOpenGL out of the question for you? There’s nowadays the possibility of adding a callback to a DisplayRegion (or a scene graph node) that would allow you to set up a simple renderbuffer and blit it into the main framebuffer whenever Panda renders that PandaNode or DisplayRegion. It might not be as elegant as doing it natively in Panda, but it might work more reliably.
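
A rough, untested sketch of what that could look like. It assumes PyOpenGL is installed, the driver supports `glBlitFramebuffer` (OpenGL 3.0 or the framebuffer-blit extension), and `src_fbo` is the id of a framebuffer object that you created and filled yourself beforehand; `make_blit_callback` and `install` are made-up helper names.

```python
def make_blit_callback(src_fbo, width, height):
    """Build a draw callback that copies a pre-rendered FBO into the window.

    src_fbo -- hypothetical OpenGL framebuffer object id, prepared by the caller
    width, height -- size in pixels of the region to copy
    """
    def blit(cbdata):
        # Imported lazily: needs PyOpenGL and a current GL context.
        from OpenGL.GL import (glBindFramebuffer, glBlitFramebuffer,
                               GL_READ_FRAMEBUFFER, GL_DRAW_FRAMEBUFFER,
                               GL_COLOR_BUFFER_BIT, GL_NEAREST)
        glBindFramebuffer(GL_READ_FRAMEBUFFER, src_fbo)
        glBindFramebuffer(GL_DRAW_FRAMEBUFFER, 0)  # 0 = the window's own framebuffer
        glBlitFramebuffer(0, 0, width, height,     # source rectangle
                          0, 0, width, height,     # destination rectangle
                          GL_COLOR_BUFFER_BIT, GL_NEAREST)
        glBindFramebuffer(GL_READ_FRAMEBUFFER, 0)  # restore the default binding
    return blit


def install(display_region, src_fbo, width, height):
    """Attach the blit as the draw callback of a Panda3D DisplayRegion."""
    from panda3d.core import PythonCallbackObject
    callback = make_blit_callback(src_fbo, width, height)
    display_region.setDrawCallback(PythonCallbackObject(callback))
```

Since the callback replaces the region's normal drawing, this is probably best done on a dedicated DisplayRegion; whether Panda's own GL state interferes with the blit is something you would have to check.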

otraupe · April 22, 2014, 2:49pm

Yes. See earlier posts.

I would rule out the latter, as the combination of Matlab/PsychToolbox (including OpenGL-based image presentation) does not exhibit such a lag - on the same machine without any changes to hardware, drivers, Windows settings, or other software.

I understand that. If the final answer is “faster is not possible” I will have to accept that.

This is interesting, as we are planning on using the Rift as well. Would you mind keeping me posted on your own findings?

Thanks for all the ideas! I will take my time to go through them. But - before I do this - could anyone have a quick look at my code which I already posted above? Just to confirm that there isn’t anything seriously wrong?

I use a cardmaker and basically the only two things I do in a frame-wise (looped; one cycle takes appr. 16.7 ms) manner are attaching/detaching a card to a render node and powering on/off my measurement device (aside from issuing an additional flip to achieve “immediate” buffer swap). The code is commented and should be easy to understand.

Sounds interesting! If this is reliably faster, it is the method of choice. Particularly, as I assume the smaller renderbuffer could even be of fullscreen size.

otraupe · April 29, 2014, 11:57am

Update: I just found out that the lag depends on the graphics card AND the driver revision. I will try to look into the relevant differences.

Also auto-flip and gl-finish (enabled) each speed up the render/display process appr. 1 frame, whereas allow-incomplete-render doesn’t do anything. sync-video and back-buffers are essential to my attempt (and purpose) and disabling them practically made my code fail (even if the screen indeed was updated very fast).

But more important, there was a flaw in my code: enabling auto-flip doesn’t do the same thing as calling render_frame twice (which is the reason for auto-flip having an effect in this case).

Altogether (GPU, driver, auto-flip, gl-finish) I am down to 24 ms between my stimulation marker and the first visual changes on the screen, which is an acceptable value, after all.

Thanks for the very valuable tips!