How to specify GPU for OpenGL backend on headless machine

Hi,

I am running offscreen rendering on a headless machine with 8× A100 GPUs. The rendering pipeline is glxGraphicsPipe. Can I specify which GPU to use? I want to parallelize my rendering task across different graphics devices.

Besides this, I have a minor question. The images rendered on the headless machine are almost the same as those rendered locally, but without shadows for any object. I create the shadows through setShadowCaster and use simplePBR. I reproduced this problem on the A100 server and on a 1080 Ti server, so I believe it is a common issue.

Thank you.

It depends.

With EGL, you can specify a “device index” using the egl-device-index Config.prc variable. Whether this mechanism is properly exposed depends on your driver. Set notify-level-egldisplay debug to get debug output listing the available devices.
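
As a rough sketch of how that could be wired up from Python: the helper names build_prc and render_on_device are made up for illustration, and window-type offscreen is an assumption about your setup. The two EGL variables are the ones mentioned above. Whether the index is honored still depends on the driver.

```python
import sys


def build_prc(device_index, debug=True):
    """Return Config.prc text that requests a given EGL device.

    Hypothetical helper: it only builds the text, it does not talk to Panda3D.
    """
    lines = [
        "window-type offscreen",              # assumption: no on-screen window
        f"egl-device-index {device_index}",   # device to request from EGL
    ]
    if debug:
        # Prints the EGL extensions and devices that Panda can see.
        lines.append("notify-level-egldisplay debug")
    return "\n".join(lines) + "\n"


def render_on_device(device_index):
    # Imported lazily so build_prc() stays usable without Panda3D installed.
    from panda3d.core import loadPrcFileData

    # The settings have to be loaded before ShowBase opens the graphics pipe.
    loadPrcFileData("", build_prc(device_index))

    from direct.showbase.ShowBase import ShowBase
    base = ShowBase()
    base.graphicsEngine.renderFrame()
    return base


if __name__ == "__main__":
    # One process per GPU, e.g. "python render.py 0", "python render.py 1", ...
    render_on_device(int(sys.argv[1]) if len(sys.argv) > 1 else 0)
```

Starting one process per device index keeps each renderer's configuration separate, which is probably the simplest way to spread a rendering job over several GPUs.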

If you’re using GLX (though that’s not headless, is it?) then you can look into setting the DRI_PRIME=1 variable in your environment to select the discrete GPU.

For NVIDIA + GLX, there are more specific settings available:
https://download.nvidia.com/XFree86/Linux-x86_64/435.17/README/primerenderoffload.html
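
A minimal sketch of launching a renderer with those environment variables set, in case it helps. The script name render.py and both function names are placeholders. The two NVIDIA variable names are taken from the README linked above, so please check them against the README for your driver version.

```python
import os
import subprocess
import sys


def offload_env(nvidia=False, base_env=None):
    """Return a copy of the environment with GPU offload variables added."""
    env = dict(os.environ if base_env is None else base_env)
    if nvidia:
        # Variable names as documented in the NVIDIA PRIME render offload README.
        env["__NV_PRIME_RENDER_OFFLOAD"] = "1"
        env["__GLX_VENDOR_LIBRARY_NAME"] = "nvidia"
    else:
        # Generic variable mentioned above for selecting the discrete GPU.
        env["DRI_PRIME"] = "1"
    return env


def run_renderer(script, nvidia=False):
    """Run a rendering script (placeholder name) in a child process."""
    result = subprocess.run(
        [sys.executable, script],
        env=offload_env(nvidia=nvidia),
        check=False,
    )
    return result.returncode


if __name__ == "__main__":
    sys.exit(run_renderer("render.py", nvidia=True))
```

Setting the variables only for the child process leaves the rest of the session untouched.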

Thank you so much @rdb. Yes, I am using EGL, but setting egl-device-index doesn’t work. The debug output is:

Known pipe types:
  glxGraphicsPipe
(1 aux display modules not yet loaded.)
:display:x11display(error): Could not open display ":0.0".
:display:egldisplay(debug): Supported EGL client extensions:
  EGL_EXT_platform_base
  EGL_EXT_device_base
  EGL_KHR_client_get_all_proc_addresses
  EGL_EXT_client_extensions
  EGL_KHR_debug
  EGL_KHR_platform_x11
  EGL_EXT_platform_x11
  EGL_EXT_platform_device
  EGL_MESA_platform_surfaceless
  EGL_EXT_explicit_device
:display:egldisplay(debug): Successfully initialized EGL display, got version 1.5
...
:display:egldisplay(debug): Chosen config 87: depth_bits=24 color_bits=24 red_bits=8 green_bits=8 blue_bits=8 alpha_bits=8 multisamples=8 back_buffers=1 force_hardware
:task(warning): Creating implicit AsyncTaskChain default for AsyncTaskManager TaskManager

Is this because my driver doesn’t expose the related interface?

It looks like you don’t have EGL_EXT_device_enumeration, so there is no way for Panda to enumerate the available devices.

If you set notify-level-glgsg debug in Config.prc, do you see the wrong device being chosen?
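
If you want to check a saved log for that extension instead of reading it by eye, something like the following would do. It is only a sketch: the function names are invented, and it assumes the log has the same layout as the egldisplay debug output pasted above.

```python
def client_extensions(log_text):
    """Collect the names listed under 'Supported EGL client extensions:'."""
    found = []
    in_list = False
    for line in log_text.splitlines():
        if "Supported EGL client extensions:" in line:
            in_list = True
            continue
        if in_list:
            name = line.strip()
            if name.startswith("EGL_"):
                found.append(name)
            else:
                # The first line that is not an extension name ends the list.
                break
    return found


def can_enumerate_devices(log_text):
    """True if the log shows the extension needed to list EGL devices."""
    return "EGL_EXT_device_enumeration" in client_extensions(log_text)


if __name__ == "__main__":
    # Shortened copy of the debug output from the post above.
    sample = """\
:display:egldisplay(debug): Supported EGL client extensions:
  EGL_EXT_platform_base
  EGL_EXT_device_base
  EGL_EXT_platform_device
  EGL_EXT_explicit_device
:display:egldisplay(debug): Successfully initialized EGL display, got version 1.5
"""
    print(client_extensions(sample))
    print(can_enumerate_devices(sample))
```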

Hmmm, I cannot see any errors after setting the debug flag for glgsg.

I mean, is the problem that GL_RENDERER is reporting the wrong graphics card?

Hi @rdb,

I turned on debug for glgsg. GL_RENDERER doesn’t disclose the device ID, but from nvidia-smi I am sure that the program doesn’t run on the device specified by egl-device-index. No matter what the value of egl-device-index is, my program always runs on device 3.

The glgsg debug output is as follows:

:display:x11display(error): Could not open display ":0.0".
:display:gsg:glgsg(debug): GL_VENDOR = NVIDIA Corporation
:display:gsg:glgsg(debug): GL_RENDERER = NVIDIA RTX A5000/PCIe/SSE2
:display:gsg:glgsg(debug): GL_VERSION = 4.6.0 NVIDIA 515.43.04
:display:gsg:glgsg(debug): GL_VERSION decoded to: 4.6
:display:gsg:glgsg(debug): EGL_VERSION = 1.5
:display:gsg:glgsg(debug): GL_SHADING_LANGUAGE_VERSION = 4.60 NVIDIA
:display:gsg:glgsg(debug): Detected GLSL version: 4.60
:display:gsg:glgsg(debug): Using compatibility profile
:display:gsg:glgsg(debug): GL Extensions:
...

Maybe there’s some way to update or reconfigure your drivers? I don’t see how we could select a device on the Panda end without the necessary EGL extensions.

There’s no EGL_EXT_device_drm extension either, which I would have expected for headless rendering.

Thank you, I will look into this.