HAVE_THREADS and performance

Hi,

I have recompiled panda3d with the HAVE_THREADS option, so I can load models in the background.

I just recently did some performance tests and found out that a build with threading enabled runs at about half the speed of a build with threading turned off.

First I thought that I somehow screwed up on my end (wrong opt-level, strange other options or so), or that VS2005 does something weird on my machine, so I benchmarked my non-thread version against the pre-built version from the panda3d site, and it was equally fast (or maybe even slightly faster).

So my question: is this performance drop I see normal? Did others experience the same?

I’m still hoping that I did something else wrong and that I can get the threaded version to run at a decent framerate, but if not I’ll need to find a different way to load my models in the background :wink:

Cheers,

Erik

It’s really hard to measure the speed of panda3d. I proposed a blueprint for this here: blueprints.launchpad.net/panda3 … anda3dmark.

I have compiled the new 1.5.3 on VC++ Express 2005, and from what I read it includes an optimizing compiler (the early free versions did not) but does not include the profile-guided optimizations of the professional VC++.

It would also be cool to see how panda3d compiles under LLVM (llvm.org/), but without the panda3d mark tool I discuss in the blueprint we will never know.

I think to get a big performance gain out of threading you really have to use the threading, for example to load textures asynchronously, or update your terrain in a different thread, etc. That’s why, in the default build, it’s disabled.
I believe 1.6.0 will contain a better system for threaded texture loading, but I could be wrong.

@treeform: unfortunately, your blueprint link doesn’t work… :frowning:
However, in my case it wasn’t a problem to measure the performance. I see a drop from ~170 Hz to ~70 Hz.
That’s kind of noticeable :wink:

@pro-rsoft:
I understand that I need to do things in a separate thread to get the real bang-for-the-buck of threading. (I wanted to do the loading in a separate thread.)
But what I initially wanted to ask is:
Are others also experiencing that the threaded version of panda is MUCH slower than the unthreaded version? I was expecting a ~15% slowdown, due to the threading overhead, but I’m seeing a 100% to 125% slowdown.
I’m interested whether others are having the same slowdown or if it maybe was caused by the way I compiled panda.

Because with a 125% slowdown in the core panda rendering, I don’t see how I can get that time back from tasks in other threads.

Erik

>>> 1./170
0.0058823529411764705
>>> 1./70
0.014285714285714285

So threading gives you a slowdown of roughly 8 ms per frame? That might be reasonable.
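
For reference, here is the same arithmetic spelled out as a small Python sketch. It only uses the two frame rates quoted in this thread (~170 Hz and ~70 Hz); the function and variable names are made up for illustration.

```python
def frame_time_ms(fps):
    """Convert a frame rate in Hz to a per-frame time in milliseconds."""
    return 1000.0 / fps


# Frame rates reported earlier in the thread (approximate values).
unthreaded_fps = 170.0
threaded_fps = 70.0

before_ms = frame_time_ms(unthreaded_fps)
after_ms = frame_time_ms(threaded_fps)

# Extra time spent per frame in the threaded build.
added_ms = after_ms - before_ms

# Two different ways to express the same slowdown as a percentage.
frame_rate_drop_pct = 100.0 * (1.0 - threaded_fps / unthreaded_fps)
frame_time_increase_pct = 100.0 * (after_ms / before_ms - 1.0)

print(f"frame time: {before_ms:.2f} ms -> {after_ms:.2f} ms")
print(f"added per frame: {added_ms:.2f} ms")
print(f"frame rate drop: {frame_rate_drop_pct:.1f}%")
print(f"frame time increase: {frame_time_increase_pct:.1f}%")
```

Seen this way, the threaded build costs about 8.4 ms more per frame. That is a frame rate drop of about 59%, or a frame time increase of about 143%, depending on which quantity you take the percentage of.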

I haven’t measured recently, but I get more like a 10% drop in performance for enabling threads in the build. 50% sounds awfully high. There are lots of different factors, though.

One factor is whether you are running with a single-core or multi-core CPU. On a multi-core CPU, the cost is higher, because mutex operations (even a trivial uncontested lock) need to be synchronized between the two cores.

Another factor is CPU architecture. The cost is higher on a PPC than on an i386, because we can take advantage of a special i386 instruction to perform an atomic test-and-increment operation, which doesn’t exist (to my knowledge) on a PPC. There may be a better way to do the same thing on PPC beyond my knowledge, of course.

One easy way to get asynchronous loads without any performance penalty to speak of is to use SIMPLE_THREADS. This enables a special user-space phony-threading mechanism that won’t be able to use true parallelism (it will always be limited to a single core), but it can still simulate multiple threads by playing games with the CPU registers. And there is very little runtime overhead because we don’t have to worry about actually synchronizing different CPUs.

David
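
To illustrate the general idea of cooperative, single-core "threads", here is a minimal sketch using Python generators. This is not how SIMPLE_THREADS is implemented (the post above describes it as switching CPU registers in user space); it only shows how a loader and a render loop can take turns on one core without any locks. All names here (`load_model`, `render_frames`, `run_round_robin`) are hypothetical and not part of the Panda3D API.

```python
from collections import deque


def load_model(name, chunks, log):
    """Hypothetical loader that does its work in small slices."""
    for i in range(chunks):
        log.append(f"load {name} chunk {i}")
        yield  # hand control back to the scheduler


def render_frames(count, log):
    """Hypothetical render loop that yields after every frame."""
    for frame in range(count):
        log.append(f"render frame {frame}")
        yield


def run_round_robin(tasks):
    """Run each task until its next yield, in turn, until all are finished."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task is done, do not reschedule it
        queue.append(task)


log = []
run_round_robin([render_frames(3, log), load_model("terrain", 2, log)])
print("\n".join(log))
```

Only one task ever runs at a time, so nothing needs to be synchronized between cores. The price is that a task which never yields would stall everything else.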

Say, you didn’t by chance enable DO_PIPELINING in addition to HAVE_THREADS, did you? That setting is still experimental, and is known to reduce performance dramatically.

I’ve started compiling with HAVE_THREADS recently, true threading, and I do find it is about a 10% performance hit. It does depend on the nature of the scene, though: a mostly-empty scene will probably suffer a disproportionately high performance hit.

David
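
To see why a mostly-empty scene could be hit disproportionately, here is a back-of-the-envelope sketch. It assumes the threaded build adds a roughly fixed cost per frame, which is a simplification and not something measured in this thread. The 0.5 ms overhead and both scene frame times are hypothetical numbers chosen only for illustration.

```python
def relative_slowdown_percent(base_frame_ms, overhead_ms):
    """Extra per-frame cost as a percentage of the original frame time."""
    return 100.0 * overhead_ms / base_frame_ms


# Hypothetical fixed per-frame cost of the threaded build (an assumption).
OVERHEAD_MS = 0.5

# Hypothetical unthreaded frame times for two kinds of scene.
scenes = {
    "mostly-empty scene": 2.0,   # very fast frames
    "heavy scene": 20.0,         # slow frames dominated by rendering work
}

results = {
    label: relative_slowdown_percent(base_ms, OVERHEAD_MS)
    for label, base_ms in scenes.items()
}

for label, percent in results.items():
    print(f"{label}: {percent:.1f}% slower")
```

Under these assumptions the same overhead is a 25% hit on the fast scene but only 2.5% on the heavy one, so a benchmark on a nearly empty scene may overstate the cost seen in a real application.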