Test with ODE and Shadowmapping: is this too much for Panda?

I have a really simple scene with a trimesh (~4000 triangles) and as many spheres as we want (we can add a new sphere by left-clicking on the terrain). ODE handles collisions between the spheres and the ground, and there’s also some shadowmapping. You can try all this by downloading this file: megaupload.com/?d=9Y3EAFPD (sorry, I do not have a server, so I had to use megaupload for just 45 KB)

The code is quite ugly, I just mixed three examples in that file (the chessboard demo, an ODE example and shadowmapping). Spheres are not really spheres but boxes, so they don’t roll forever. That’s not a problem.

I seriously need to know if this is a natural limit for Panda3D or if I’m doing something wrong (which is quite probable!). When I reach 60 spheres, I get 26fps. That’s not much. If that’s a ‘natural limit’, I should forget about any kind of post-processing or complex AI.

Is it because of Panda3D? Is it because of Python? Is it ODE? Is it my code? I just don’t know, but I wish it were my code, because I really would like to use Panda for this project.

I haven’t checked out your code, but you do have ways to see what’s slowing down your scene.

panda3d.org/manual/index.php/M … ith_PStats

It is a built-in way to see what’s really slowing your scene down. It could be a number of things, from collisions to too many pixels.

From there, if it’s too many pixels you can use mipmapping. Or if it’s too many collisions, maybe only do so many at one time or something.

Ok, I launched the scene with PStats, and I get the following results. I’ll explain them in percentages. Well, 67ms is the vertical top of my PStats graph, so you can get an idea.

The graph is something like:

*: 10% (generate text, but negligible)
Wait: 25%-50% (thread block)
App: 25% (most of it, it’s render_frame)
Cull: 20% (15% window1, 5% offscreen buffer)
Draw: 25% (15% window1, 5% offscreen buffer)

So… I still have no idea what’s going on. Rendering seems to take most of the time, but that’s normal: it is the only thing I’m doing, apart from ODE (but ODE is almost invisible in the App graph) and Wait is required, right?

This machine is a Dell Studio 15, Intel Centrino 2, 2.27 GHz, 4 GB RAM, ATI Mobility Radeon HD 3400 with Vista, DX10 (and Ubuntu). On the other hand, we get 50fps with 150 spheres on an AMD 64 Dual X2 2.6 GHz with 2 GB RAM and a graphics card under 60€ (with XP).

But I can play Half Life 2 with no problems at all :m There must be something I’m doing wrong…

Any idea? Is there something wrong in the code? Is it my machine? Vista, perhaps? I have to try it on Ubuntu…

I haven’t looked into whether the performance of ODE in Panda is reasonable or not. But you can download Demomaster to compare its performance on your machine. Demomaster comes with several ODE demos.

I ran the ODE demos on my Windows XP machine with an NVidia 9500:
ODE demo 1, around 105 geoms, (click drop box/spheres 5 times), I get 40 fps.

ODE with shadow manager 1, around 85 geoms (click drop box/sphere 4 times), I get 38 fps

ODE with shadow manager 2, around 85 geoms (click drop box/sphere 4 times), I get 31 fps

ODE Car demo, when no particle is emitted (no smoke) and the car is at rest, I get around 42 fps.

Ok, I also tried Demomaster, and my framerate on a 3 GHz Xeon is similar to yours, clcheung. I have to try it on my laptop this afternoon.

So… that’s it? Those are Panda’s limits, right? My only hope is that Panda runs in some kind of “debug mode” by default, so changing to “release” mode speeds things up. I read something about optimization levels on the PStats page, I’ll look for that… or I’ll switch to Ogre in C++ :’(

It’s very unlikely that it’s Panda’s limit. Panda just provides thin wrappers around ODE, so instead of switching to a different engine, it’s a better idea to find out where the bottleneck is.

I am not sure if it is Panda’s problem. So did you get much better performance using ODE with other engines?

Maybe you can use another physics engine, like PhysX?

Please note: having hundreds of individual objects sent to the GPU is a slow process. Even if the physics calculations were fast, this can still bring you down. If you go into the several-hundred-objects range you might need to optimize them; in the case of spheres the RigidBodyCombiner might be of use.

If Panda is slow for you, you are most likely doing something wrong or you forgot to use some essential ways to optimize your application.
The hardware imposes many limits on your scene, but most of them are solvable with a few tricks, depending on what the exact cause of the slowdown is.

Have you tried disabling the rendering output to check the ODE performance alone? Or have you tried halting the ODE simulation to isolate the rendering details?

I recommend running pstats and seeing where the bottleneck is. Just checking the pview graphs usually points you right to the bottleneck(s).

EDIT: Oh, I see you did above.
The fact that “Wait” is taking up so much time most likely indicates you hit vsync - try putting this in Config.prc:

sync-video 0

Also, the fact that rendering and culling take so long could indicate that you just have too many Geoms in your scene. That is not related to ODE.
This is usually the first bottleneck someone runs into. This poorly written manual page could help you further:
panda3d.org/manual/index.php/Perfo … any_Meshes

clcheung, ThomasEgi: ODE does not seem to be the problem. It is almost invisible in the PStats graph.

ThomasEgi: I’ll check the rigidBodyCombiner thing. I thought that having all my spheres in a node in the sceneGraph was similar to ‘sending them all together’ to the GPU.

pro-rsoft: I already used PStats. Its results are detailed in the third message of this post.

The problem seems to be, as I stated before, an error of mine. I’m just trying to do things without really knowing how to do them properly. Yes, I have ODE; yes, I have shadowmapping… but I’m missing something. That’s why I uploaded my code…

But then, the Demomaster ODE demo with shadowmapping is not a good example of Panda3D’s capabilities, right? Its framerate drops quickly when approaching 100 objects…

I noticed that after I posted - sorry about that. I edited my post above.

Thanks pro-rsoft, that was useful.

I didn’t really care about V-Sync. Turning it off simply pushes framerate up… until there are many objects in the scene : )

I used the RigidBodyCombiner. I have a sphere model loaded in self.sphere. When I create a new one, I use self.sphere.copyTo(self.rbcnp) (the RigidBodyCombiner node path). I only collect when a new sphere is created… but things don’t get better (they even seem a little bit worse!)

Following the “Too many meshes” page, I also tried NodePath.analyze(render), which returns the following information for 50 smiley spheres:

131 total nodes (including 0 instances); 0 LODNodes.
53 transforms; 39% of nodes have some render attribute.
66 Geoms, with 17 GeomVertexDatas and 3 GeomVertexFormats, appear on 66 GeomNodes.
2886 vertices, 2598 normals, 288 colors, 673 texture coordinates.
GeomVertexData arrays occupy 71K memory.
GeomPrimitive arrays occupy 26K memory.
20 GeomVertexArrayDatas are redundant, wasting 1K.
10 GeomPrimitive arrays are redundant, wasting 1K.
67112 triangles:
144 of these are on 72 tristrips (2 average tris per strip).
66968 of these are independent triangles.
1 textures, estimated minimum 384K texture memory required.

Which looks exactly the same if I don’t use the RBC.

Instead of using the smiley model, which is a sphere with a lot of triangles, I used the box model. Things don’t get much better either.

So:

it’s not ODE: framerate does not improve if spheres do not collide with spheres.

it’s not what I send to the GPU: if, instead of the smiley object (that has lots of triangles) I use the box object, I get more or less the same framerate.

I forgot to mention that, when I exit, Panda writes the following error in the console many times. It may be important:

:display:gsg:glgsg(error): at 2838 of c:\p\p3d\panda3d-1.6.2\panda\src\glstuff\glGraphicsStateGuardian_src.cxx : GL error 1282

but it’s not very informative: GL_INVALID_OPERATION

render.analyze() doesn’t report what’s sent to the graphics card, it reports what’s within your scene graph. So it doesn’t take into account the effect of the RBC. You should check the “Geom” count within PStats (pull down the “Geom” graph from the menu) to get a real-time report of the number of individual Geoms being sent to your graphics card.

It’s true, though, with only 66 Geoms in the scene, you can’t be seeing hundreds of them sent to the graphics card. You really should be getting a decent frame rate out of that.

This only means it’s not a limit of the number of vertices. But it would be really surprising if it were: most modern GPUs can easily handle tens of thousands of vertices without breaking a sweat. They usually bottleneck on other things like Geom counts.

Not particularly important. Panda doesn’t always exit in the cleanest fashion, so these messages can be considered normal. It’s been a low-priority bug on my list.

David

Thank you, drwr.

I checked PStats again. With 101 spheres I get almost 250 Geoms. Does it make sense?

But then, there is a method consuming a lot (most of the App graph). It’s the one that updates physics. I call it with:

taskMgr.doMethodLater(1.0, self.updatePhysics, "Physics Simulation")

when I create the world. Then it’s called continually, as it returns Task.cont. Here’s the code:

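def updatePhysics(self, task):
    # Detect collisions, then advance the simulation by a fixed 1/60 s step.
    self.space.autoCollide()
    self.world.quickStep(1.0 / 60.0)
    # Iterate over a copy so balls can be removed from self.balls in the loop.
    for ball in self.balls[:]:
        np = ball[0]
        geom = ball[1]

        if not np.isEmpty():
            pos = geom.getBody().getPosition()
            if pos[2] < -5:
                # The ball fell below the terrain: drop it from the scene.
                np.removeNode()
                self.balls.remove(ball)
            else:
                if self.push:
                    geom.getBody().addForce(self.wind[0], self.wind[1], self.wind[2])
                    np.setPosQuat(self.ground, geom.getBody().getPosition(), Quat(geom.getBody().getQuaternion()))
    # Clear the contact joints created during this step.
    self.contactgroup.empty()
    return Task.cont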

I iterate through a copy, so I may remove elements if certain conditions are met (they fall through the terrain, for example)

I feel we are getting closer. That function does not look clean to me… Maybe I should remove elements outside or something…
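One way to tidy up the removal step is sketched below in plain Python. It builds the list of survivors in a single pass instead of calling remove() on the list being copied. The (name, z) tuples and the split_fallen helper are hypothetical stand-ins for the (NodePath, geom) pairs in self.balls, and with only a few dozen balls this is unlikely to be the main cost:

```python
def split_fallen(balls, min_z=-5.0):
    """Return (kept, fallen) without mutating the list being iterated."""
    kept, fallen = [], []
    for ball in balls:
        _name, z = ball  # hypothetical layout: (label, height)
        if z < min_z:
            fallen.append(ball)
        else:
            kept.append(ball)
    return kept, fallen


balls = [("a", 1.0), ("b", -7.5), ("c", 0.0), ("d", -5.0)]
balls, fallen = split_fallen(balls)
print(balls)
print(fallen)
```

In the real task, each entry in fallen would then get its removeNode() call, and self.balls would be replaced by the kept list.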

It sounds like each sphere has two Geoms, which implies there are two different render states in each sphere or something. Not inherently a problem, but something to keep in mind if you want to have thousands of these things. (You should aim for no more than a few hundred Geoms in your complete scene in order to achieve a consistently good frame rate.)

Ah, yes, this function does look like trouble. There are some techniques you can use to narrow down which parts of the function are more expensive: you can use Python’s own profiler (see the python docs), or you can insert PStats calls around key lines within this function (see the Panda3D manual).
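The Python-profiler route can be sketched with the standard library alone. In this minimal example, update_physics is a hypothetical stand-in that only does some arithmetic per ball; it is not the function from this thread, and profile_frames is an illustrative helper name:

```python
import cProfile
import io
import pstats


def update_physics(balls):
    """Hypothetical stand-in for a per-frame update: busy work per ball."""
    total = 0.0
    for x, y, z in balls:
        total += (x * x + y * y + z * z) ** 0.5
    return total


def profile_frames(func, arg, frames=100):
    """Run func(arg) `frames` times under cProfile and return the text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(frames):
        func(arg)
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time and show the ten most expensive entries.
    stats.sort_stats("cumulative").print_stats(10)
    return out.getvalue()


balls = [(float(i), 0.0, 1.0) for i in range(60)]
report = profile_frames(update_physics, balls)
print(report)
```

Wrapping the real task function the same way would list the calls it makes, sorted by cumulative time.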

But, off the top of my head, I can see some room for improvement. The relative setPosQuat() here:

np.setPosQuat(self.ground, geom.getBody().getPosition(), Quat(geom.getBody().getQuaternion())) 

is many times more expensive than the non-relative version, which would look more like this:

np.setPosQuat(geom.getBody().getPosition(), Quat(geom.getBody().getQuaternion())) 

So, if you can guarantee that your balls are already parented to self.ground, or are at least in the same transform space as self.ground, you can save quite a bit of time by using the non-relative operation.

Also, you have multiple calls to geom.getBody(), which requires re-creating a new Python wrapper object around the underlying OdeBody object (or whatever it is). You can avoid this by saving the result of geom.getBody(), perhaps in your self.balls list, and just using it repeatedly. This is probably a relatively minor cost, though.
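The caching idea can be illustrated without Panda3D. In this sketch, FakeGeom and FakeBody are hypothetical stand-ins, not the real ODE classes; they only count how many wrapper objects getBody() hands out, so the numbers show the pattern rather than any real cost:

```python
class FakeBody:
    """Hypothetical stand-in for an ODE body wrapper."""

    def getPosition(self):
        return (0.0, 0.0, 0.0)


class FakeGeom:
    """Hypothetical stand-in for an ODE geom; counts getBody() wrappers."""

    def __init__(self):
        self.wrappers_created = 0

    def getBody(self):
        self.wrappers_created += 1
        return FakeBody()


def update_uncached(geoms):
    """Two getBody() calls per ball per frame, as in the original loop."""
    for geom in geoms:
        geom.getBody().getPosition()
        geom.getBody().getPosition()


def update_cached(pairs):
    """Reuse the body that was stored next to the geom."""
    for _geom, body in pairs:
        body.getPosition()
        body.getPosition()


geoms = [FakeGeom() for _ in range(3)]
update_uncached(geoms)
uncached_calls = sum(g.wrappers_created for g in geoms)

cached_geoms = [FakeGeom() for _ in range(3)]
pairs = [(g, g.getBody()) for g in cached_geoms]  # one getBody() per ball
for _frame in range(10):
    update_cached(pairs)
cached_calls = sum(g.wrappers_created for g in cached_geoms)

print(uncached_calls, cached_calls)
```

In the thread's code, the equivalent change would be to store the body alongside the NodePath and geom when each ball is created.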

It’s probably worth measuring the cost of the ODE calls themselves, such as autoCollide(), quickStep(), and addForce() and even getPosition() and getQuaternion().
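A rough way to measure individual calls, shown here as a plain-Python substitute for PStats collectors rather than a Panda3D API, is to accumulate wall-clock time per labelled section. The timed helper, the labels, and fake_step are all hypothetical names for illustration:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

totals = defaultdict(float)  # seconds accumulated per label


@contextmanager
def timed(label):
    """Add the wall-clock time spent inside the block to totals[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[label] += time.perf_counter() - start


def fake_step(n):
    """Hypothetical stand-in for an expensive call such as a physics step."""
    return sum(i * i for i in range(n))


for _frame in range(50):
    with timed("collide"):
        fake_step(2000)
    with timed("step"):
        fake_step(500)

for label, seconds in sorted(totals.items()):
    print(f"{label}: {seconds * 1000.0:.2f} ms total")
```

In the real task, the with blocks would wrap autoCollide(), quickStep(), and the per-ball loop, so their totals can be compared after a few hundred frames.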

David

Thank you, drwr. Your tips were useful to gain a few frames per second, but nothing serious.

The problem is worse when we compare Panda with, for example, Quest3D (even though we know that Q3D is proprietary and expensive). An ODE application in Q3D handles more than 400 boxes in a landscape with 40000+ polygons at 80fps. Our last test with Panda reaches 50 boxes in a landscape with 4000 polygons at 50fps.

We feel that there must be a compromise between effort and results. If a lot of effort is needed, we expect great results. Of course, there’s still the benefit of doubt: we must be doing something wrong.

But, on the other hand, there’s no single example where Panda shows its power. Demos in Demomaster are too simple… or too disappointing, with framerates that are simply not enough for what is on screen.

Anyway, thank you for your time!

If you get better results with a different engine, it’s you who implemented it differently there. I don’t see a reason why a different engine would be faster than Panda if you put the same issues in your game (unintentionally of course).

That’s being worked on.

So is it an apples-to-apples comparison? E.g. using the same settings for collision categories, spaces, etc.?

It seems PyODE’s performance is similar to Panda’s ODE. I doubt you can run 400 boxes in Quest3D in one single space with all boxes in the same category.

Just download and build ODE, and it can run several hundred objects seamlessly. It would be interesting to investigate the reasons behind this further!