Performance: is CollisionSolid::get_bounds really that greedy?!

jean-claude · July 31, 2011, 12:25am

Hi,

JFYI, while trying to optimize my P3D code, I ended up with the following figures from a fine-grained performance sampler:

Total CPU Time:                                32.936s   100%

Top Hotspots
------------
CollisionSolid::get_bounds                      4.380s   13.3%
wglGraphicsWindow::begin_flip                   3.925s   11.9%
CopyOnWritePointer::get_read_pointer            1.398s    4.2%
CollisionTraverser::compare_collider_to_node    1.332s    4.0%
CopyOnWriteObject::unref                        1.117s    3.4%
others                                                   63.2%

What strikes me is that close to 40% of the CPU time goes to just these five big CPU eaters!
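For anyone who wants to re-check the shares, here is a minimal sketch that recomputes the percentages from the times listed above. The timings are copied from the sampler output; the helper name `share` is just an illustrative choice.

```python
# Hotspot times in seconds, copied from the sampler output above.
TOTAL_CPU_S = 32.936
hotspots = {
    "CollisionSolid::get_bounds": 4.380,
    "wglGraphicsWindow::begin_flip": 3.925,
    "CopyOnWritePointer::get_read_pointer": 1.398,
    "CollisionTraverser::compare_collider_to_node": 1.332,
    "CopyOnWriteObject::unref": 1.117,
}

def share(seconds, total=TOTAL_CPU_S):
    """Return `seconds` as a percentage of the total CPU time."""
    return 100.0 * seconds / total

for name, seconds in hotspots.items():
    print(f"{name:<46} {seconds:6.3f}s {share(seconds):5.1f}%")

# Combined weight of the five listed hotspots.
listed_total = sum(hotspots.values())
print(f"listed hotspots together: {listed_total:.3f}s {share(listed_total):.1f}%")
```

The five listed entries add up to roughly 37% of the total, which matches the 63.2% left for "others".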

Btw, if you consider only this line of code in CollisionSolid.cxx

  return _internal_bounds;

it eats 4.09s, i.e. 12.5% of the total execution time!!!
Considering that _internal_bounds is usually empty, this seems a high price to pay…

drwr · July 31, 2011, 12:50am

It also includes the time to call compute_internal_bounds(), which is a fairly complex function for CollisionPolygon.

I’m guessing you have a high number of CollisionPolygons in your scene. But this time should only be spent at startup; it’s not recomputed every frame. So that’s not so bad, right? Be careful that your performance analysis code is measuring your frame-to-frame time and not your startup time (unless it is startup time that you are presently concerned with).

David

jean-claude · July 31, 2011, 1:21am

Well, I don’t know what startup time exactly means in this case.

The way I’m running the sampler is the following:

I start my P3D program and, once everything seems to have stabilized and it has been running for, say, 5 minutes, I launch the performance sampler. So the values I’m getting are from after the app has already been running for some time.

edit: Have a look at this run:
https://rapidshare.com/files/1041877195/CaptureRun.JPG
The app had been running for about 3 minutes before the sampler was started, and the percentages are about the same as those mentioned in my previous post!

drwr · July 31, 2011, 2:08pm

Ah, you’re not, by chance, computing collisions with visible geometry, are you? Instead of using precomputed CollisionSolids?

Because the way collisions-with-visible-geometry is implemented, it actually creates a dynamic CollisionPolygon on-the-fly for each triangle in your mesh. Then it throws it away and creates it again next frame. It’s insanely inefficient, but the whole mechanism is provided only for quick-and-dirty uses. It’s not intended to be used for production code. If you need to use collisions in production code, you should always use pre-created collision geometry.

David

jean-claude · July 31, 2011, 2:58pm

Well at this point I must admit I’m somewhat confused.
Bear with me & let me explain how I got there:

(1) I initially have a full egg (geometry) with no collisions; call it ‘file.egg’

(2) I copy it to a new file: ‘file_collision.egg’, edit this file and add

<Collide> { Polyset keep descend }

in order to get collisions. At this point I assume that file_collision.egg contains both geometry & collisions, right?

(3) next: I generate a new file optimized for collisions. In order to do so I load ‘file_collision.egg’ in a specific P3D session, during which I strip off geometry, octreefy the structure and save it to ‘file_collision_octreefyed.bam’

(4) ok, now in my P3D app. I load ‘file.egg’ for visible geometries and ‘file_collision_octreefyed.bam’ for collisions.

A quick look at each egg file gives :

from file.egg

...
<Group> arkadi {
/* <Dart> { 1 } */
  <Group> groundPlane_transform {}
/* <Collide> { Polyset keep descend } */
  <Group> Model {
    <Group> Terrain {
      <Group> Mesh1 {
        <VertexPool> Mesh1Shape.verts {
          <Vertex> 0 {   
             -100.794 101.479 -17.363
             <UV> UVSet0 { 0.0415738 0.999999 }
             <Normal> { -0.193109 0.0260929 0.98083 }  
             <RGBA> { 1 1 1 1 } 
           }
...

from file_collision_octreefyed.bam (converted to file_collision_octreefyed.egg)

...
<Group> arkadi_col.egg {
  <Group> {
    <Group> arkadi {
      <Collide> { Polyset descend }
      <VertexPool> vpool-collision {
        <Vertex> 0 {
          -100.794 17.363 101.479
          <Normal> { -0.187965 -0.981709 0.0302807 }
        }
...

So really I’m confused on what could go wrong… Any suggestion?

drwr · July 31, 2011, 9:34pm

Well, the key thing is whether you’re inadvertently setting your collide masks to allow collision with visible geometry, which could happen without your realizing it even if you also have created collision geometry.

There are several ways this could happen: you might be explicitly adding GeomNode.getDefaultCollideMask() to one of your from objects, or you might be setting one of your from objects to all bits on, or some other bitmask which accidentally includes the bit named by GeomNode.getDefaultCollideMask() (which is bit 20), or you might have called something like model.setCollideMask(BitMask32.bit(1)) on one of your nodes in the scene, which would add bit 1 to all nodes at model and below, including visible geometry nodes.
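To make the mask logic concrete, here is a minimal sketch that uses plain Python integers as a stand-in for Panda3D's 32-bit collide masks (it does not call the real API). It assumes the usual rule that a from object is tested against an into node only when the two masks share at least one bit; the choice of bit 1 for the collision geometry is hypothetical.

```python
# Plain-integer stand-in for 32-bit collide masks; NOT the Panda3D API.
GEOM_DEFAULT_BIT = 20                       # bit named in the post above
GEOM_DEFAULT_MASK = 1 << GEOM_DEFAULT_BIT   # into mask of visible geometry by default
ALL_ON = 0xFFFFFFFF

def bit(n):
    """Mask with only bit n set."""
    return 1 << n

def can_collide(from_mask, into_mask):
    """A from object is tested against an into node only if the masks overlap."""
    return (from_mask & into_mask) != 0

# Hypothetical setup: dedicated collision geometry lives on bit 1.
collision_into = bit(1)

# Intended case: the from object sees collision geometry but not visible geometry.
safe_from = bit(1)
assert can_collide(safe_from, collision_into)
assert not can_collide(safe_from, GEOM_DEFAULT_MASK)

# Pitfall 1: the default GeomNode mask explicitly added to the from mask.
assert can_collide(bit(1) | GEOM_DEFAULT_MASK, GEOM_DEFAULT_MASK)

# Pitfall 2: a from mask with all bits on also includes bit 20.
assert can_collide(ALL_ON, GEOM_DEFAULT_MASK)

# Pitfall 3: a collide mask set on a whole model puts bit 1 on visible nodes too.
visible_into_after_set = GEOM_DEFAULT_MASK | bit(1)
assert can_collide(safe_from, visible_into_after_set)
```

In each pitfall the from object ends up being tested against visible geometry even though separate collision geometry exists.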

The first thing to ask is whether you are, in fact, colliding with visible geometry at all. PStats can answer that question: if you run PStats, you can open the “Collision Volumes” graph and see the number of CollisionNodes, GeomNodes, and Geoms you are testing for collisions. Hopefully the first number will be nonzero, and the second two numbers will be zero.

David

jean-claude · August 1, 2011, 4:55pm

OK, looking at the PStats “Collision Volumes” graph:

with both ‘file.egg’ and ‘file_collision_octreefyed.bam’ loaded

CollisionNode     roughly 30
CollisionPolygon  roughly 300
PandaNode         roughly 150
runrate 11Hz
In the upper right corner of the graph there is a value oscillating between 25000 & 50000. Btw, is that the actual number of collision polygons??

with only ‘file.egg’ loaded

CollisionNode    roughly 15
PandaNode        roughly 100
runrate 20Hz

with only ‘file_collision_octreefyed.bam’ loaded

CollisionNode     roughly 20
CollisionPolygon  roughly 300
PandaNode         roughly 150
runrate 13Hz
In the upper right corner of the graph there is a value oscillating between 25000 & 50000.

Does that mean the “culprit” is somewhere inside or around the ‘file_collision_octreefyed.bam’ file?

drwr · August 1, 2011, 6:13pm

Well, maybe. But I think the problem is more likely to be in the code than in the model. Still, the model is a good place to start.

David

jean-claude · August 2, 2011, 1:00am

Well, I must seriously be missing some basic understanding here.

I checked the code as suggested to try and locate a possible GeomNode.getDefaultCollideMask(), a from mask with all bits on (or otherwise including bit 20), or a model.setCollideMask(BitMask32.bit(1))… nothing, NADA!

I inspected again with Pstats and got:

Collision Volumes components:
CollisionTube            0
Geom                     0
GeomNode                 0
CollisionNode           15
CollisionSphere          0
CollisionSolid           0
CollisionPolygon     38000
CollisionPlane           0
PandaNode              128
CollisionInvSphere       0
CollisionGeom            0

Actually, at this point I don’t know what I should expect to get.

In other words, could the way ‘file_collision_octreefyed.bam’ was generated be what causes compute_internal_bounds() to be recomputed every frame???

Just in case, here is the egg corresponding to file_collision_octreefyed.bam
https://rapidshare.com/files/2095963831/file_collision_octreefyed…egg

drwr · August 2, 2011, 5:23pm

Ah, I think I understand now. It’s not that it’s using collision-with-geometry, but the problem is that you have 38,000 CollisionPolygons all in one big mesh. That’s far too many polygons.

When Panda traverses the scene and discovers that mesh, it has to walk through each one of the 38,000 polygons and test it individually to see if it is close to your collision object. That’s why you’re seeing so much time in CollisionSolid::get_bounds(). Each individual call is very fast, but when you call it 38,000 times per frame, that very fast time adds up quickly.

You should seriously consider refactoring your geometry so that you have far fewer collision polygons. A few hundred is more reasonable. There’s usually no reason to have so much detail in your collision geometry–38,000 is a fine number for visible geometry, but too detailed for collisions.
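As a back-of-the-envelope sketch of why the polygon count matters, the snippet below multiplies polygons tested per frame by the frame rate. It assumes a single from object, that the 11Hz run rate reported earlier in the thread still applies, and it picks 300 as a stand-in for "a few hundred"; none of these are measured values for this exact case.

```python
def bounds_checks_per_second(polygons_tested_per_frame, frames_per_second):
    """Per-polygon bounds checks per second for one from object."""
    return polygons_tested_per_frame * frames_per_second

FRAME_RATE_HZ = 11   # run rate reported earlier in the thread (assumed to still apply)

current = bounds_checks_per_second(38_000, FRAME_RATE_HZ)   # all polygons tested
reduced = bounds_checks_per_second(300, FRAME_RATE_HZ)      # "a few hundred", taken as 300

print(f"38,000 polygons: {current} checks/s")
print(f"   300 polygons: {reduced} checks/s")
print(f"reduction factor: about {current / reduced:.0f}x")
```

Even a very cheap call adds up at that rate, which is consistent with get_bounds() dominating the profile.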

On the other hand, you report octreefying the file, and that should help by splitting the mesh up into many smaller meshes so that no single mesh has that many polygons. Something may have gone wrong in this step.

You could try running with show_collisions in effect (traverser->show_collisions(render)), to highlight the polygons that it is testing for bounding volumes. You should see no more than a handful of polygons light up each frame. If you see thousands of polygons, your scene is not well suited for collisions.

David

drwr · August 2, 2011, 5:35pm

Looking at your egg file, it does appear to have been octreefied correctly. But if the object that you are testing for collisions happens to be large enough to come within the bounding volumes of many of these polygons at the same time, then the octree doesn’t help you at all–it will still have to test all of them.
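Here is a toy sketch of that effect, not Panda's actual traversal code. It assumes a hypothetical 100-unit cube split into 4x4x4 equal leaf boxes with the 38,000 polygons spread evenly, and a sphere as the from object. Only polygons in leaves whose box touches the sphere would need individual testing.

```python
import itertools

def make_leaves(cells_per_axis, world_size):
    """Split the cube [0, world_size]^3 into equal axis-aligned leaf boxes."""
    step = world_size / cells_per_axis
    leaves = []
    for ix, iy, iz in itertools.product(range(cells_per_axis), repeat=3):
        lo = (ix * step, iy * step, iz * step)
        hi = (lo[0] + step, lo[1] + step, lo[2] + step)
        leaves.append((lo, hi))
    return leaves

def sphere_touches_box(center, radius, box):
    """Sphere/box overlap: clamp the center to the box and compare distances."""
    lo, hi = box
    dist_sq = 0.0
    for c, l, h in zip(center, lo, hi):
        nearest = min(max(c, l), h)
        dist_sq += (c - nearest) ** 2
    return dist_sq <= radius * radius

def polygons_to_test(center, radius, leaves, polys_per_leaf):
    """Polygons left to test individually after leaf-level culling."""
    touched = sum(1 for box in leaves if sphere_touches_box(center, radius, box))
    return touched * polys_per_leaf

leaves = make_leaves(cells_per_axis=4, world_size=100.0)   # 64 hypothetical leaves
polys_per_leaf = 38_000 // len(leaves)                     # even spread (assumption)

small = polygons_to_test((12.5, 12.5, 12.5), 1.0, leaves, polys_per_leaf)
huge = polygons_to_test((50.0, 50.0, 50.0), 100.0, leaves, polys_per_leaf)
print(f"small collider: {small} polygons, huge collider: {huge} polygons")
```

With the small sphere only one leaf survives the culling; with the huge one every leaf does, so the octree saves nothing.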

Perhaps this is what’s happening. The bottom line is, you’ve got way too many collision polygons.

David

jean-claude · August 3, 2011, 12:57pm

Hi David, thanks again for your advice.

I already checked with traverser->show_collisions(render), and even though some polygons light up quite far away from the collider, it doesn’t seem too bad.

I then suspected a scaling issue, since I bring together a bunch of stuff, rescale it and frequently use ‘object.set_effect(CompassEffect::make(render, CompassEffect::P_scale))’… I did some exploration on this with no success…

So, to simplify things, I simply loaded the environment in the Ralph demo, and this is the outcome:

using embedded collisions with geometry ('file_collision.egg') or the (hopefully) optimized version ('file_collision_octreefyed.bam') gives exactly the same performance!?!
In both cases most of the time (30ms) is spent in collision traversal, more precisely in 'pass1'.

I’m quite surprised that using collisions with geometry or a dedicated octreefied file makes no difference at all!

Here is a quick self-contained test to illustrate this: https://rapidshare.com/files/2062408404/test_collisions_ralph.7z

What I don’t really understand is that, apparently:

(1) there is no difference between loading a file with embedded collisions with geometry and loading a collision-only file
(2) in either case the traverser seems to explore (?) all the nodes

Bottom line: I’m still puzzled!

jean-claude · August 4, 2011, 4:32pm

Hi, I think I’ve finally found the (nasty) issue!!!

It comes from duplicate collision node names being produced when the octree structure is generated, e.g.

    <Group> octree-root {
      <Group> leaf-0 {
        <Group> leaf-0 {
          <Collide> { Polyset descend }
          <VertexPool> vpool-collision.vpool1 {
            <Vertex> 0 {
              -7.23668 -7.97214 -0.674276
              <Normal> { 0 -0.868243 -0.496139 }
            }
...
          }
          <Polygon> {
            <VertexRef> { 0 1 2 <Ref> { vpool-collision.vpool1 } }
          }
...
          <Polygon> {
            <VertexRef> { 41 40 42 <Ref> { vpool-collision.vpool1 } }
          }
        }
      }
      <Group> leaf-0 {
        <Group> leaf-0 {
          <Collide> { Polyset descend }
          <VertexPool> vpool-collision.vpool2 {
...

Since the names are the same, I suppose the traverser is grouping the whole lot together…

Nevertheless, to fix the issue I changed the source code of the treeform/mindstormss/fenrirwolf octreefyer utility to cope with it (merely adding the quadrant number to the collision node name), and everything works OK now.
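As an illustration of the renaming idea only (this is not the code from the octreefyer itself), here is a minimal sketch that names every node after its path of child indices, so no two nodes share a name. The nested-dict tree layout and the names `name_nodes` and `collect_names` are hypothetical.

```python
def name_nodes(node, path=()):
    """Name each node after its child-index path so every name is unique."""
    if path:
        node["name"] = "leaf-" + "-".join(str(i) for i in path)
    else:
        node["name"] = "octree-root"
    for index, child in enumerate(node.get("children", [])):
        name_nodes(child, path + (index,))
    return node

def collect_names(node):
    """Return the names of a node and all of its descendants."""
    names = [node["name"]]
    for child in node.get("children", []):
        names.extend(collect_names(child))
    return names

# Hypothetical tree mirroring the excerpt above: two branches, one leaf each.
tree = {"children": [
    {"children": [{"polygons": 14}]},
    {"children": [{"polygons": 9}]},
]}

name_nodes(tree)
names = collect_names(tree)
print(names)
assert len(names) == len(set(names)), "duplicate node names remain"
```

Applied to the excerpt above, the two branches would no longer both be called leaf-0 with a leaf-0 child each.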

For those of you interested, here is the new version of the octreefyer.
https://rapidshare.com/files/3904131854/ocquadtreefy.py