Multithreaded render slower than single-threaded?

Hey, I saw that testers are wanted for this, so…
I get very bad results with it.

OS: Linux Mint 11 x64
Panda: 25 Oct Natty buildbot build (Mint is based on Natty)
CPU: Phenom II quad-core
GPU: Nvidia GTS 250

dl.dropbox.com/u/14000546/DUMP/screenshot6.png

dl.dropbox.com/u/14000546/DUMP/screenshot7.png

I ran it several times, since the red ships are randomly positioned; the values are roughly the same each time.

With 3 threads, it is even worse.

There are certainly many ways to design your application such that the multithreaded pipeline does not help, and just as many ways in which the multithreaded pipeline actually hurts. This isn’t necessarily a bug in the multithreaded pipeline.

David

Sorry, I am still a newbie… Just trying to help :)

I tried running several official samples; ALL that I tried have a better framerate with multithreaded rendering disabled.

Also, judging from roughly everything I have tried, when using MT the Python code in my app(s) needs roughly double the amount of time with two threads (threading-model Cull).
For example, inside my bullet class I move the bullet with setPos, and this takes the biggest amount of time.
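
For anyone reproducing this, here is a minimal sketch of how the settings discussed in this thread could be selected from Python. `THREADING_MODELS` and `threading_prc` are hypothetical helpers made up for illustration; `Cull` is the value quoted above, and `Cull/Draw` is only an assumption about which setting "3 threads" refers to.

```python
# Hypothetical helper table: thread count -> threading-model value.
THREADING_MODELS = {
    1: "",           # empty value: everything runs on the main thread
    2: "Cull",       # the two-thread setting quoted in this post
    3: "Cull/Draw",  # assumption: the setting behind "3 threads"
}


def threading_prc(threads):
    """Return the Config.prc line for the requested number of threads."""
    return ("threading-model " + THREADING_MODELS[threads]).strip()


print(threading_prc(2))  # prints "threading-model Cull"

# With Panda3D installed, the line would be applied before ShowBase starts:
#   from panda3d.core import loadPrcFileData
#   loadPrcFileData("", threading_prc(2))
```

Keeping the setting in one place like this makes it easy to rerun the same scene with each model and compare the numbers.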

I just tried to use 3 threads; Panda actually crashes without any useful message.

In case you want to test it, the code is here (90 KB):

dl.dropbox.com/u/14000546/DUMP/src.7z

WASD to move, space to shoot; when a bullet reaches its “end”, the app crashes.
Config settings are at the top of “main.py”.

By chance, do you have a single-core CPU? If you do, then multithreaded solutions will almost always be slower than single-threaded solutions.

The fact that you say that Python code slows down by a factor of two when you run with two threads strongly suggests that you have a single-core CPU.

I don’t know about the crash, though, that sounds strange.

David

No, I already stated in the first post that it is a quad core.

So you did, sorry. But when you run the test applications, do you see them using more than one core, or do you see them using 100% of one core only? I’m not sure what performance tools are available on Ubuntu to determine this, but there’s always “top”.
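
Besides “top”, per-core load on Linux can be estimated by reading /proc/stat twice and comparing the counters. This is a rough sketch, assuming the usual `cpuN user nice system idle iowait irq softirq steal` field layout; the function names are made up for illustration.

```python
import time


def parse_proc_stat(text):
    """Return {core_name: (busy_ticks, total_ticks)} for each per-core line."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep "cpu0", "cpu1", ...; skip the aggregate "cpu" line and the rest.
        if not fields or fields[0] == "cpu" or not fields[0].startswith("cpu"):
            continue
        # Only the first eight counters; later guest fields are already in user.
        values = [int(v) for v in fields[1:9]]
        idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
        total = sum(values)
        cores[fields[0]] = (total - idle, total)
    return cores


def core_usage(before, after):
    """Return {core_name: percent busy} between two parsed snapshots."""
    usage = {}
    for name, (busy1, total1) in after.items():
        busy0, total0 = before[name]
        elapsed = total1 - total0
        usage[name] = 100.0 * (busy1 - busy0) / elapsed if elapsed else 0.0
    return usage


def sample_usage(interval=1.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart and return per-core usage."""
    with open(path) as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_stat(f.read())
    return core_usage(before, after)
```

Calling `sample_usage()` while the test application is running gives one percentage per core, which is enough to tell "one core pegged" apart from "load spread over several cores".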

David

When running three threads, CPU usage is around 55-80% on each core. One core dominates, and over the life of the app the dominant core is not always the same one.
For example: 55, 55, 60, 80 or 50, 60, 60, 78.

When running 2 threads, CPU usage is around 30-70% on each core; one core dominates, and one is lower. Activity across the cores changes over time.
For example: 30, 50, 55, 70.

With a single thread, one core is nearly maxed out, as expected.

Hmm, interesting.

Still, most of the demo programs aren’t the sort of program that would benefit from multithreading. They’re generally one-sided, usually either trivial app and heavy draw, or vice-versa.

It’s generally only a real program, that involves heavy app, draw, and cull, that benefits most from this sort of thing; demo programs tend to emphasize only one aspect.
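
As a back-of-the-envelope illustration of that point: in an idealized pipeline, the frame time is roughly the sum of the three stages when everything runs on one thread, and roughly the slowest stage plus some synchronization overhead when each stage has its own thread. The stage times and the `overhead_ms` parameter below are invented for illustration, not measurements from Panda.

```python
def single_threaded_ms(app, cull, draw):
    """Frame time when App, Cull and Draw run one after another."""
    return app + cull + draw


def pipelined_ms(app, cull, draw, overhead_ms=1.0):
    """Idealized frame time when each stage runs on its own thread."""
    return max(app, cull, draw) + overhead_ms


def speedup(app, cull, draw, overhead_ms=1.0):
    """How many times faster the pipelined frame is than the serial one."""
    return single_threaded_ms(app, cull, draw) / pipelined_ms(
        app, cull, draw, overhead_ms
    )


# One-sided demo: trivial App and Cull, heavy Draw. Little to gain.
print(round(speedup(1, 1, 10), 2))  # 1.09

# Balanced program: all three stages are heavy. A clear gain.
print(round(speedup(4, 4, 4), 2))   # 2.4
```

In this model the one-sided case barely moves because the heavy stage still sets the frame time, while the overhead is paid either way.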

David

Yeah, I know. For example, the app that I posted, when running in single-threaded mode, spends half its time in App and the other half in Cull and Draw, so I thought it would be cool to use MT…

Hmm, yes. I’m investigating a bit closer. The crash you’re reporting is indeed a legitimate bug in Panda, triggered when a node that has a Python tag (set with setPythonTag()) is deleted, and I’ll check in a fix shortly.

As to the performance, my first impression is that the app is too simple to benefit much from the multithreading, and the performance benefit is getting dwarfed by the additional overhead. I’ll continue to investigate as I get a chance, to make sure I understand what’s going on.

Thanks!

David

Yes, in your app, most of the time in App is spent in the collision traversal, which must visit all of the nodes one at a time. These same nodes must also be visited one at a time in the Cull traversal, and since one node cannot be visited simultaneously by two different threads, you have App and Cull waiting politely for each other to visit each of the nodes in turn. Thus, App and Cull both slow down by a factor of about two.

If you were doing more than collisions in the App traversal, or if your collision traversal didn’t have to visit the exact same set of nodes, you would probably see a better gain from going multithreaded.
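
To put rough numbers on this, here is a toy model. It assumes that for every node both traversals touch, a stage spends one extra visit's worth of time waiting for the other stage. That is a simplification for illustration only, not a description of Panda's actual locking, and `stage_time` is a made-up name.

```python
def stage_time(own_nodes, other_nodes, visit_cost=1.0):
    """Time a stage needs when it waits out the other stage on shared nodes."""
    own = set(own_nodes)
    shared = own & set(other_nodes)
    # One visit per own node, plus one wait per node the other stage also visits.
    return (len(own) + len(shared)) * visit_cost


scene = range(1000)

# Collisions and Cull walk the exact same nodes: each stage takes twice as long.
print(stage_time(scene, scene))                # 2000.0

# Collisions only touch a separate 100-node subtree: no waiting at all.
print(stage_time(range(1000, 1100), scene))    # 100.0
print(stage_time(scene, range(1000, 1100)))    # 1000.0
```

Under these assumptions, the fully shared case takes 2000 units per stage, which is no better than running the two 1000-unit traversals back to back on one thread.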

But nothing appears to be broken, at least (other than the crash which I’ve just fixed).

Thanks again!
David

Thank you very much for the explanation; I misunderstood something earlier. I thought that the data in the Cull/Draw pass is a full copy from the previous frame, so it can be safely read by the Cull/Draw thread.

It can be safely read, but first a lock must be briefly acquired to ensure it is fresh. It’s that lock that’s causing the synchronization issues.

David