How to get GPU memory pointer from `Texture` Object

Recently, I found a performance issue when using

self.win.addRenderTexture(self.screen_texture, GraphicsOutput.RTMCopyRam)

with 1080p images. I think it is caused by the bandwidth required to move the data from the GPU to system RAM.
In our project, we need to encode the rendered frame with H.264 and then stream it elsewhere. Since NVIDIA GPUs also support hardware encoding, I am wondering: if I use

self.win.addRenderTexture(self.screen_texture, GraphicsOutput.RTMBindOrCopy)

then the rendered frame should stay on the GPU and the GPU encoder should be able to use it directly. The question is: can I get the memory pointer of the GPU buffer from the Panda3D Texture object?

I don’t believe OpenGL provides a (standard) way to get the raw GPU memory pointer for a texture.

However, you can get the OpenGL texture handle from the TextureContext object that is asynchronously returned from a tex.prepare() call, via tc.getNativeId().

Thanks @rdb. I found some useful information here: OpenGL Interoperability with CUDA | 3D Game Engine Programming (3dgep.com) on how to get the device memory pointer of an OpenGL texture. I am trying to test it end to end, but I am stuck on how to call tex.prepare(): what parameter should I pass in for prepared_objects? Do you have sample code for this? Also, can I use prepareNow for my case?

I am testing the code below and get an invalid resource handle error when calling cudaGraphicsGLRegisterImage(&cgr, handleId, GL_TEXTURE_2D, cudaGraphicsMapFlagsReadOnly) in C++.

        self.screen_texture = Texture()
        self.screen_texture.setMinfilter(Texture.FTLinear)
        self.screen_texture.setFormat(Texture.FRgba32)
        logging.info(f"Format is {self.screen_texture.getFormat()}")
        # self.win.addRenderTexture(self.screen_texture, GraphicsOutput.RTMCopyRam)
        self.win.addRenderTexture(self.screen_texture, GraphicsOutput.RTMBindOrCopy)

        # Prepare the texture immediately so its OpenGL name can be handed
        # to CUDA for registration.
        self.gsg = GraphicsStateGuardianBase.getDefaultGsg()
        texture_context = self.screen_texture.prepareNow(0, self.gsg.prepared_objects, self.gsg)
        self.cugl_op.register_graphics_resource(texture_context.getNativeId())

If I create the texture myself with the code below and use it, all the APIs work fine.

    // Manually create a 1920x1080 RGBA texture; registering this one
    // with cudaGraphicsGLRegisterImage succeeds.
    glEnable(GL_TEXTURE_2D);
    glGenTextures(1, &mTestViewGLTexture);
    glBindTexture(GL_TEXTURE_2D, mTestViewGLTexture);

    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    // Passing NULL allocates storage without uploading data.
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 1920, 1080, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
    glBindTexture(GL_TEXTURE_2D, 0);

So is the return value from tc.getNativeId() an OpenGL texture handle, or is it something else?

Ah, I need to use self.win.addRenderTexture(self.screen_texture, GraphicsOutput.RTMCopyTexture) instead of self.win.addRenderTexture(self.screen_texture, GraphicsOutput.RTMBindOrCopy); now it is working.

From the docs, GraphicsOutput.RTMCopyTexture means it will copy from the buffer every frame. Even though the copy happens on the GPU, I still want to avoid it. I am trying to register the buffer directly; can I get the offscreen render buffer handle? @rdb

Sorry for the late response.

I am not sure what you mean by “the offscreen render buffer handle”. The OpenGL renderbuffer ID? The FBO container object ID? I don’t think we currently provide a way to get at either.

If you call tex.prepare(), Panda will prepare it at the next frame. It returns an AsyncFuture object that you can await, register a callback with using add_done_callback(), or poll using done() and result(). Please note, however, that Panda may recreate the texture the first time you render into it, so in any case you need to render a frame or so first to get meaningful results.
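
For example, a minimal sketch of the callback route (assuming a regular ShowBase app, so base.win is your window):

    # Sketch: render a frame first so the texture exists on the GPU, then
    # request preparation and react once the future resolves.
    base.graphicsEngine.renderFrame()

    future = tex.prepare(base.win.gsg.prepared_objects)

    def on_prepared(fut):
        tc = fut.result()  # the TextureContext
        print("GL texture name:", tc.getNativeId())

    future.add_done_callback(on_prepared)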

The difference between RTMCopyTexture and RTMBindOrCopy is that the former will create a renderbuffer and copy to a texture at the end of the frame, whereas the latter will have Panda3D render directly into the texture. In this case I suspect the former works because the latter may cause the texture to be recreated upon first being rendered into, and you are operating with an old handle, but this is just a guess.
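
One way to test that guess is to compare the native ID before and after the first rendered frame, along these lines:

    # Sketch: if the GL texture name changes across the first frame, a
    # handle registered with CUDA beforehand would be stale.
    tc = tex.prepareNow(0, base.win.gsg.prepared_objects, base.win.gsg)
    id_before = tc.getNativeId()
    base.graphicsEngine.renderFrame()
    print("before:", id_before, "after:", tc.getNativeId())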

The parameter to pass is self.win.gsg.prepared_objects (or win->get_gsg()->get_prepared_objects() in C++), as you have discovered.

I would generally recommend against prepare_now() since it assumes that the graphics context is still bound to the current thread. In a simple, single-context, single-threaded application, this may work fine. Otherwise you need to ensure this is done in a draw callback or manually bind the context yourself using win->begin_frame() and win->end_frame() (only available from C++, I think).
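
A rough sketch of the draw-callback route (draw_cb is a hypothetical name; this assumes a single window whose first display region draws your scene):

    from panda3d.core import PythonCallbackObject

    def draw_cb(cbdata):
        # The GL context is bound while this callback runs, so calling
        # prepare_now here avoids the thread/context caveat above.
        cbdata.upcall()  # let Panda draw the display region as usual
        tc = tex.prepareNow(0, base.win.gsg.prepared_objects, base.win.gsg)

    base.win.getDisplayRegion(0).setDrawCallback(PythonCallbackObject(draw_cb))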

Thanks a lot for your response, @rdb!

Here I mean the OpenGL renderbuffer ID, because I see that the cudaGraphicsGLRegisterImage API also supports renderbuffer objects.


Hmm, that is very weird; I tried printing the NativeId every frame and it does not change.

Now I am trying to achieve another task: I have data already on the GPU and want to copy it directly into a Panda3D Texture and then apply that Texture to some object. In this case I can’t call self.win.addRenderTexture(tex, GraphicsOutput.RTMCopyTexture), and cudaGraphicsGLRegisterImage fails. So what does addRenderTexture do under the hood that makes tex’s NativeId valid for cudaGraphicsGLRegisterImage? Can I do just that part for a Texture without adding it to the render texture list?

How exactly does it fail when using cudaGraphicsGLRegisterImage?

It is possible that addRenderTexture ends up changing some texture format settings or something like that, but then it should be possible to match those on the CUDA side.

The error message it returns is CUDA Error: invalid resource handle.

Is this the source code: panda3d/panda/src/display/graphicsOutput.cxx at 2ae6e5c6cec3e9cd5b46c3151429fc48c7a5c336 · panda3d/panda3d · GitHub? I don’t see anything special being done there :frowning:

I am reading the source code of TextureContext here: panda3d/panda/src/glstuff/glTextureContext_src.cxx at v1.10.9 · panda3d/panda3d · GitHub. Is this the right place? There is a get_handle method I want to try, but it looks like it is not exposed to Python?

get_handle is not what you want; it is for the bindless texturing extension.

What is probably happening is that Panda generated a texture ID but didn’t actually create the texture yet (which happens when binding it for the first time).
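
To illustrate the distinction (with PyOpenGL, assuming a current GL context; this is not Panda code):

    from OpenGL.GL import *

    name = glGenTextures(1)             # only reserves a name
    glBindTexture(GL_TEXTURE_2D, name)  # the texture object now exists
    # Storage is only allocated here; before this, CUDA has nothing to map.
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, 1920, 1080, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, None)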

Assuming you have set up the texture correctly (using setup_2d_texture) and you have then called prepare and waited for that to complete (or prepare_now), Panda should have bound the texture for the first time, and it should be usable in CUDA.
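
Put together, the sequence might look like this (PyCUDA with GL interop and PyOpenGL are my assumptions for the CUDA side; error handling omitted):

    from panda3d.core import Texture

    tex = Texture("cuda_shared")
    # Allocate a 1920x1080 RGBA8 texture up front.
    tex.setup_2d_texture(1920, 1080, Texture.T_unsigned_byte, Texture.F_rgba8)

    future = tex.prepare(base.win.gsg.prepared_objects)

    def register_with_cuda(fut):
        import pycuda.gl
        from OpenGL.GL import GL_TEXTURE_2D
        tc = fut.result()
        # pycuda.gl.RegisteredImage wraps cudaGraphicsGLRegisterImage.
        reg = pycuda.gl.RegisteredImage(
            tc.getNativeId(), GL_TEXTURE_2D,
            pycuda.gl.graphics_map_flags.WRITE_DISCARD)

    future.add_done_callback(register_with_cuda)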

I suggest you first investigate whether that perhaps hasn’t fully occurred yet. You can also use a tool like apitrace to dig into the OpenGL calls that Panda3D is making.

OK, previously I used prepare_now; after switching to prepare, it works now. Thanks a lot for your help!
As for prepare_now: even though it returned an ID, apitrace shows that it does not actually create the texture through the OpenGL API unless you upload some data to it and apply the texture to some object. That can work for a fixed-size texture, but in my case the texture size changes every frame, so I have to use prepare.

Ah, yes. I forgot that the regular prepare also calls GSG::update_texture, which is necessary for the bind call to be made and the image to be initialized.

@rdb, another question: the same code works well if I use simplepbr or RenderPipeline, but if I use the default rendering setup of Panda3D, I get the result below when I try to read the texture buffer.

I used the code below to set up the render-to-texture, and I assume the texture size is render_height * render_width * 4. Is this assumption wrong when using the default rendering setup of Panda3D?

        # Copy the window's contents into screen_texture at the end of each frame.
        self.screen_texture = Texture()
        self.win.addRenderTexture(self.screen_texture, GraphicsOutput.RTMCopyTexture)
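
I can verify what the texture actually reports after a frame with something like:

    # Sketch: check the real texture dimensions and component layout
    # before sizing any GPU-side buffers.
    base.graphicsEngine.renderFrame()
    t = self.screen_texture
    print(t.getXSize(), t.getYSize(), t.getNumComponents(), t.getComponentWidth())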

I’m not sure why you would get different results without more information, but it might have something to do with the fact that both those solutions render the scene to an FBO instead of to the main window directly.

Using FilterManager/CommonFilters in stock Panda3D would also switch to rendering to an FBO.
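
For example, a minimal sketch using FilterManager to push the main scene into an FBO-backed texture:

    from direct.filter.FilterManager import FilterManager
    from panda3d.core import Texture

    manager = FilterManager(base.win, base.cam)
    scene_tex = Texture()
    # The scene now renders into an FBO whose color attachment is
    # scene_tex; the returned fullscreen quad shows it in the window.
    quad = manager.renderSceneInto(colortex=scene_tex)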