cull/draw, multiple render buffers, and threading

I came across this old thread discussing the rendering pipeline and how cull and draw could potentially run in parallel. That could be a big performance gain on now-standard multicore systems.

That got me thinking about my project and how I might be able to more effectively make use of extra cores that may be available. I need to render multiple cameras simultaneously as part of the gameplay, so I figured that even if cull and draw need to be run in sequence, at least I should be able to have a thread for each render buffer.

It may even be possible to spawn a separate Python/Panda process for each camera view, and have a “main” process that sends data to the minions, then somehow magically retrieves the renders to combine in a main view, as texture cards for example.
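The process plumbing for this idea can be sketched with nothing but the standard library. This is only a rough sketch: the worker does not touch Panda at all, it just fakes a "render" as a flat list of RGB values. The names render_views and WORKER_SOURCE, and the layout of the camera dictionary, are made up for illustration. Getting real frames out of an offscreen buffer and onto texture cards is the hard part and is not shown.

```python
import json
import subprocess
import sys

# Stand-in worker script. A real worker would open an offscreen Panda buffer
# and render the view; this one only fabricates a solid-colour image.
WORKER_SOURCE = r"""
import json, sys
camera = json.loads(sys.stdin.read())
width, height = camera["size"]
pixel = [camera["id"] % 256, 0, 0]  # placeholder colour derived from the id
json.dump({"id": camera["id"], "pixels": pixel * (width * height)}, sys.stdout)
"""


def render_views(cameras):
    """Start one worker process per camera, then collect the results in order."""
    workers = []
    for camera in cameras:
        proc = subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
        # Send the camera description and close stdin so the worker can start
        # right away, while the remaining workers are still being launched.
        proc.stdin.write(json.dumps(camera))
        proc.stdin.close()
        workers.append(proc)

    results = []
    for proc in workers:
        results.append(json.loads(proc.stdout.read()))
        proc.stdout.close()
        proc.wait()
    return results
```

Calling render_views([{"id": 1, "size": [2, 2]}]) returns one dictionary per camera, each with the camera id and the fake pixel data.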

Does any of this sound possible? Any other suggestions? I’ve put this in discussion as I suppose I am not looking for any particular answer, but hoping to generate some discussion on Panda and the possible uses of multiple cores for the purpose of increased rendering performance.

Also how does GraphicsThreadingModel play into this? It seems to imply that cull and draw could be run in separate threads.

Yes, this is all possible in theory. None of it is yet possible in practice; the threading support within Panda is not yet mature enough to support multiple operations in parallel operating on the same Panda data structures. Even if you are rendering two unrelated scenes, they still share many of the same data structures (the transform cache, RenderState objects, and so on), so you can’t run them in parallel unless they will play nice with each other.

Actually, some of it is possible in practice, but it isn’t practical. Enabling threading support in Panda, which turns on the features that enable parallel threads to play nice with each other, does work today, and you can use the GraphicsThreadingModel to actually render things in parallel. The problem is that turning on this support necessarily adds some overhead, and it turns out that this overhead almost exactly cancels out the performance gain you get from running in parallel.
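For reference, the threading model is normally selected with the threading-model variable in Config.prc. A minimal fragment, assuming a Panda build compiled with true threading support:

```
threading-model Cull/Draw
```

The part before the slash names the thread that runs cull, and the part after it names the thread that runs draw. Leaving a name empty keeps that stage on the main thread.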

Some more research is clearly needed. It may be that we are right on the cusp, and only a very simple problem stands between us and true multicore rendering working the way it was originally designed. Or it may be that we are still far away from that goal.

I haven’t had any time lately to research this. With my previous employer, there was little interest in improving Panda’s support of multicore systems. My current employer, on the other hand, is interested in improving this support, but there are many other things to get to first. But it will certainly happen eventually.

David

Interesting!

I don’t know much about threading but it looks like something that might be fun for me to poke around in the code and experiment with.

I was actually trying to get my build with real threads to not crash so I could monkey around with this. I’ve spent some time pinpointing the problem in the Panda source. It happens in executionEnvironment.cxx when trying to resolve the “MAIN_DIR” special variable. If I edit Config.prc and change “model-path $MAIN_DIR” to something else, it no longer crashes. This is good enough for me, but I figured it would be good to document the problem.

I inserted some printouts into the problem area like so:

#ifdef HAVE_PYTHON
    cout << "START PYTHON PART\n";
    // If we're running from Python code, read out sys.argv.
    if (!ns_has_environment_variable("PANDA_INCOMPATIBLE_PYTHON") && Py_IsInitialized()) {
      cout << "  CONDITION TRUE\n";
      PyObject* obj = PySys_GetObject((char*) "argv");
      cout << "  STILL OK\n";
      if (obj) {
        cout << "    OBJ EXISTS\n";
        Filename main_dir = Filename::from_os_specific(PyString_AsString(PyList_GetItem(obj, 0)));
        if (main_dir.empty()) {
          cout << "      MAIN DIR EMPTY\n";
          // We must be running in the Python interpreter directly, so return the CWD.
          return get_cwd().to_os_specific();
        }
        cout << "    MAIN DIR NOT EMPTY\n";
        main_dir.make_absolute();
        cout << Filename(main_dir.get_dirname()).to_os_specific() << "\n\n";
        return Filename(main_dir.get_dirname()).to_os_specific();
      }
    }
#endif

Just opening up a panda window works:
test.py

from panda3d.core import *
import direct.directbase.DirectStart
from direct.showbase.DirectObject import DirectObject

class Game(DirectObject):
    def __init__(self):
        pass

Game()
run()

Output:

DirectStart: Starting the game.
START PYTHON PART
  CONDITION TRUE
  STILL OK
    OBJ EXISTS
    MAIN DIR NOT EMPTY
C:\work\sp1

Trying to load a model causes a crash:
test.py

from panda3d.core import *
import direct.directbase.DirectStart
from direct.showbase.DirectObject import DirectObject

class Game(DirectObject):
    def __init__(self):
        loader.loadModel('smiley')

Game()
run()

Output:

DirectStart: Starting the game.
START PYTHON PART
  CONDITION TRUE
  STILL OK
    OBJ EXISTS
    MAIN DIR NOT EMPTY
C:\work\sp1

START PYTHON PART
  CONDITION TRUE

At this point Python crashes on the PySys_GetObject call.

Hmm, it seems this method is being called from a sub-thread. That’s not surprising when Panda is built with true threads (actually it’s a little bit surprising in your simple program), but it’s a problem because it makes a Python call, which isn’t legal to do in a sub-thread unless you first acquire the Python GIL.

Noted. The right fix is probably to add some lines in here to acquire the GIL. There are just a few subtleties, though, which bear some thinking about. It’s unfortunately difficult to operate in the very low-level code like this because it doesn’t have access to the higher-level constructs like Panda’s Thread interface.

David

Interesting, so that was the problem all along. Beh. Maybe we should just move the MAIN_DIR stuff to a Python module, say, panda3d.py.
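Something along these lines, as a rough sketch that mirrors the C++ logic above. The name get_main_dir is made up for illustration, and how the result would be handed back to Panda's environment variable lookup is not shown:

```python
import os
import sys


def get_main_dir(argv=None):
    """Return the directory of the main script, or the CWD if there is none."""
    if argv is None:
        # sys.argv can be missing when Python is embedded, so look it up
        # defensively instead of assuming it exists.
        argv = getattr(sys, "argv", [])
    if not argv or not argv[0]:
        # Running in the interactive interpreter: fall back to the CWD,
        # as the C++ version does when main_dir is empty.
        return os.getcwd()
    # Same as make_absolute() followed by get_dirname() in the C++ code.
    return os.path.dirname(os.path.abspath(argv[0]))
```

Since this would run from ordinary Python code on the main thread, the GIL is already held and the sub-thread problem does not come up.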