Multi-core: gain or pain?

LBarret · January 9, 2008, 2:19pm

Hi all,
our team needs to demo the speed gain of multicore CPUs (or at least their near-optimum use).

Our app has three obvious parallelisable parts:
1- Panda rendering,
2- GTK main loop,
3- AI.

But we need to set up the way the 3 parts communicate with each other: for example, the Panda part is the interface for selection, and it affects the AI part (the abstract non-visual world).

I see 2 obvious strategies:

threads (recompiling Panda with havethreads, and then?)
processes and IPC (the Python lib “processing” may help)

Any ideas? I would be really glad to hear what you think; it’s quite a thorny subject.

rdb · January 9, 2008, 4:37pm

You don’t have to use threads. Pick one main loop of your choice (probably Panda’s), then for each of the other libraries find the function that runs a single iteration of its main loop and add it to the task manager.

panda3d.org/manual/index.php/Main_Loop
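
A minimal sketch of that idea. `TinyTaskManager` is a hypothetical stand-in for Panda's task manager, so the snippet runs without Panda3D or GTK installed; in a real app you would register the other library's single-iteration function with Panda's own task manager instead, and the exact calls should be checked against the manual page above.

```python
class TinyTaskManager:
    """Hypothetical stand-in for a task manager: calls every task once per frame."""

    def __init__(self):
        self.tasks = []

    def add(self, func, name):
        self.tasks.append((name, func))

    def step(self):
        # One pass of the main loop: a task stays registered while it returns True.
        self.tasks = [(name, func) for name, func in self.tasks if func()]


def make_counter_task(log, label, limit):
    """Build a task that records `label` each frame and stops after `limit` frames."""
    state = {"frames": 0}

    def task():
        state["frames"] += 1
        log.append(label)
        return state["frames"] < limit

    return task


def run_frames(n):
    log = []
    mgr = TinyTaskManager()
    mgr.add(make_counter_task(log, "render", 3), "render")
    mgr.add(make_counter_task(log, "gui", 2), "gui")  # e.g. one GUI-toolkit iteration
    for _ in range(n):
        mgr.step()
    return log


if __name__ == "__main__":
    print(run_frames(3))
```

Note that everything here still runs in one thread on one core: this solves the "several main loops" problem, not the multicore one.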

treeform · January 9, 2008, 5:55pm

pro-rsoft, he wants to go multi-core.

I would try both options, but try the python process first.

bigfoot29 · January 10, 2008, 12:29pm

In older revisions of P3D, the threading stuff in Python was NOT multicore capable… You will need to run completely separate Python processes (individually started Python instances) to be able to use more than one core.

I tried this in the past with highly threaded stuff run by a single Python instance (Python threads): it never used more than a single core…

But that was with Python 2.3 - dunno if that changed with 2.5 or latest P3D releases…

Regards, Bigfoot29

enn0x · January 11, 2008, 7:10pm

The problem is Python itself. Python is not designed for running on multiple cores. For detailed information, google for global interpreter lock, aka the infamous Python GIL.

I think Guido van Rossum is right when he said that threading is not the only way to do concurrency, and that you should undo the brainwashing you got from Windows and Java. IPC (inter process communication) is a better way when developing with Python.

Some IPC projects you should look up:

Pyro (Python Remote Objects, easy to learn for our newbies)
Twisted (not so easy, but lots of features)
omniorb (Python CORBA module, the big thing)
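
To make the process-based route concrete without installing any of the projects above, here is a small sketch that uses the standard library's `multiprocessing` module (the stdlib descendant of the `processing` library) rather than Pyro, Twisted or omniORB. The prime-counting workload and the function names are made up for illustration; the point is only that each worker is a separate Python process with its own interpreter and its own GIL, so the OS is free to schedule them on different cores.

```python
from multiprocessing import Pool


def count_primes(limit):
    """CPU-bound work: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_parallel(limits, workers=2):
    # Each worker is a separate Python process, so the GIL of one
    # does not block the others.
    with Pool(processes=workers) as pool:
        return pool.map(count_primes, limits)


if __name__ == "__main__":
    print(count_primes_parallel([10000, 20000, 30000]))
```

Whether this is actually faster than a single process depends on how much work each call does compared with the cost of starting processes and shipping arguments and results between them.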

enn0x

rdb · January 12, 2008, 11:17am

Stackless Python might be of interest as well:
stackless.com/

Josh_Yelon · January 14, 2008, 10:51pm

Actually, what you might consider is panda’s built-in support for a separate cull and render thread. These threads are created from deep inside the C++ code, so they don’t need to be python-aware. Currently, the performance is bad, but I think it’s going to be improving a lot over time.

LBarret · January 15, 2008, 5:28pm

@pro-rsoft: Stackless is exactly what we would need. But AFAIK, Stackless uses only one core.

@Josh Yelon: mmm… that could be interesting. I need to demo a performance gain, so it may be a bit early for that. A former colleague (game engine writer) told me that Boost+thread was a very good combination.

@enn0x: does Twisted use all the cores available? I don’t know it very well, but I could look into that.

Thanks all.
So far I use the processing library with a message-based system. It’s not perfect (I simulated the Stackless approach), but at least it’s going forward.
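
For readers who want to picture such a setup, here is a rough sketch of a message-based worker. It is not the actual code from our app: it uses the standard library's `multiprocessing` (the stdlib descendant of `processing`), and the message kinds (`"select"`, `"query"`, `"quit"`) and the `world` dictionary are invented for the example.

```python
from multiprocessing import Process, Queue


def handle_message(world, message):
    """Apply one (kind, payload) message to the AI world state and return a reply."""
    kind, payload = message
    if kind == "select":
        world["selected"] = payload
        return ("selected", payload)
    if kind == "query":
        return ("state", dict(world))
    return ("error", "unknown message: %r" % (kind,))


def ai_worker(inbox, outbox):
    """Runs in a child process: handle messages until ("quit", None) arrives."""
    world = {"selected": None}
    while True:
        message = inbox.get()
        if message[0] == "quit":
            break
        outbox.put(handle_message(world, message))


if __name__ == "__main__":
    inbox, outbox = Queue(), Queue()
    worker = Process(target=ai_worker, args=(inbox, outbox))
    worker.start()
    inbox.put(("select", "unit_7"))  # e.g. sent by the rendering side on a click
    inbox.put(("query", None))
    print(outbox.get())
    print(outbox.get())
    inbox.put(("quit", None))
    worker.join()
```

Keeping the message handling in a plain function (`handle_message`) means the logic can be tested without starting any process at all.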

enn0x · January 15, 2008, 9:04pm

Hmm… wrong question. The point with INTER process communication is that you create more than one process, and the processes talk to each other. Each process can run on another core of course. The cores don’t even have to be on the same computer! Call it servers and clients if you like.

With threading you have only one process, and inside this process you create threads which exchange information between each other and share some resources.

Neither way is an automatic speed gain. In both cases YOU have to make up your mind how the various concurrent activities exchange information, wait for each other and share resources (memory, for example).

About Stackless: I heard that most C/C++ extension modules work well with Stackless Python. Panda3D is basically a Python extension. So maybe it’s just a matter of seconds: copy the right Stackless .dlls over the existing Python .dlls and see if it works (of course, back up first, or reinstall Panda3D afterwards).
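
As a small illustration of the "share resources" part on the threading side, here is a sketch of several threads updating one value in shared memory. The `SharedCounter` class is made up for the example; the lock is what keeps the threads from stepping on each other.

```python
import threading


class SharedCounter:
    """A value shared by several threads, guarded by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def add(self, amount):
        # The read-modify-write must not interleave with another thread's.
        with self._lock:
            self.value += amount


def run_threads(n_threads, increments):
    counter = SharedCounter()

    def work():
        for _ in range(increments):
            counter.add(1)

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for each other before reading the result
    return counter.value


if __name__ == "__main__":
    print(run_threads(4, 1000))
```

With separate processes there is no shared `counter` object by default; the equivalent would be sending the increments as messages to whichever process owns the value.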

enn0x

LBarret · January 17, 2008, 8:46pm

@ennox :

I know the theory; my question wasn’t very good. I agree that when you talk about many processes in one app, you implicitly talk about messaging between processes.
I don’t know much about Twisted, and frankly I don’t see how it can be used efficiently in an interactive Panda app.

Stackless is one process, one core only.

enn0x · January 17, 2008, 9:59pm

I have to admit that I have never done anything serious (big) in Panda3D, since I lack the spare time for this hobby, but this could be a way to utilize IPC (assuming a traditional RPG or FPS game):

Process 1: 3d rendering, GUI and accepting user input. Panda3D obviously.
Process 2: collision detection, physics and kinematics (movement controlled by direct input, which can be either user input or AI input). Could be Python and ODE or PhysX, and no Panda3D.
Process 3: AI and game logic (maybe more than one process for this). No Panda3D needed for this process. Also a highly parallel process.

What has to be exchanged between the processes each frame:

P3 (AI) gives a desired velocity for each active NPC to P2.
P1 (user input) gives one desired velocity to P2, for the player character.
P2 gives actual position & orientation to P1 for each object that has moved.
This listing is not complete of course, but the amount of information passed is not very much (a few bytes per game “object”), and it can be reduced by clever filtering (“active” objects). I even think that P2 and P3 can run at other “frame rates” than P1.
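
A rough sketch of that data flow, again with the standard library's `multiprocessing`. Everything concrete in it is an assumption made for illustration: the object names, the 2D `(vx, vy)` velocities, the trivial "physics" and the queue layout. It also runs P2 and P3 in lockstep with P1 for simplicity, which is not required.

```python
from multiprocessing import Process, Queue


def ai_step(npc_ids):
    """P3: choose a desired velocity (vx, vy) for every active NPC."""
    return {npc: (1.0, 0.0) for npc in npc_ids}


def physics_step(positions, velocities, dt):
    """P2: integrate desired velocities; return only the objects that moved."""
    moved = {}
    for obj, (vx, vy) in velocities.items():
        if vx == 0.0 and vy == 0.0:
            continue  # filtering: nothing to report for objects at rest
        x, y = positions.get(obj, (0.0, 0.0))
        positions[obj] = (x + vx * dt, y + vy * dt)
        moved[obj] = positions[obj]
    return moved


def ai_process(to_physics, frames):
    for _ in range(frames):
        to_physics.put(ai_step(["npc_1", "npc_2"]))


def physics_process(from_ai, from_input, to_render, frames, dt):
    positions = {}
    for _ in range(frames):
        velocities = from_ai.get()
        velocities.update(from_input.get())
        to_render.put(physics_step(positions, velocities, dt))


if __name__ == "__main__":
    frames, dt = 3, 0.5
    ai_q, input_q, render_q = Queue(), Queue(), Queue()
    p3 = Process(target=ai_process, args=(ai_q, frames))
    p2 = Process(target=physics_process, args=(ai_q, input_q, render_q, frames, dt))
    p3.start()
    p2.start()
    for _ in range(frames):                  # P1: the rendering / input side
        input_q.put({"player": (0.0, 2.0)})  # desired velocity for the player
        print(render_q.get())                # updated positions to draw
    p2.join()
    p3.join()
```

Per frame, only a small dictionary of tuples crosses each process boundary, which is the "few bytes per object" point made above.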

Stackless: As far as I understand Stackless Python it is not bound to one core, since it has no GIL. Could be interesting.

Hmm… finally, it could be interesting to set the Panda3D config variable “lock-to-one-cpu” to false. I think Panda3D (the C++ core, not the Python layer) can create threads which can be moved to other cores by the OS. But I am just guessing here. Someone who tests it or reads through the code is required.

enn0x