Magical collision scene optimizer

Do you occasionally struggle with slow performance of collision detection routines?

Do you sometimes wish there were a magical function you could call that would just… suddenly make everything fast, with no extra work needed?

Well, no longer! I have written a function that will optimize any node graph containing “into” collision nodes in such a way that collision detection will become orders of magnitude more efficient.

As an extra feature, it can also convert visible geometry into collision geometry for you, in case you were too lazy to figure out the pipeline for getting optimized collision geometry out of your modelling program.

Where to get it

How to use it

Import it, and then:

optimize_collisions(render, track_progress=True)

That’s it! It will automatically take all CollisionNodes with an “into” node, extract the solids, and reforge them into a massive new tree that you think has far too many nodes, but contains a secret kind of magic that actually makes it really really fast, for typical scenes.

The track_progress option shows a nice progress bar on the console in case it takes a while, but you can take it out if you don’t want it (or pass a custom function to integrate with your own loading bar). If it really takes long, you can of course just write_bam_file() the result as a kind of offline cache.

By default, it does not convert visible geometry into CollisionPolygons, but if you really want to do this, then you can use convert_geometry=True.

If you don’t care about preserving the names of CollisionNodes or their tags (like, if you don’t use this information in your collision handlers), then you can pass preserve_name=False or preserve_tags=False to strip this information for extra efficiency.

Also, if your game is strictly top-down with no verticality, use ignore_z=True for a possibly slightly more improved layout (quadtree instead of octree).

Limitations

This method works well if you have a large, static collision scene with many relatively small colliders. If you have huge colliders that span the entire scene, or a lot of overlapping colliders, that will make this method less efficient. If you have a ground plane or something like that, exclude it from the computation, or split it up into smaller polygons.

The method also assumes your “from” nodes are reasonably-sized and don’t encompass the entire scene. Don’t use this method if you have a massive “from” node that can collide with all solids at once.

If you wish to move the “into” CollisionNodes individually, this doesn’t work. Instead, organize all your static collision geometry under the same node, then call optimize_collisions on that node.

Currently, it uses 60-bit Morton codes, which limits the depth of the graph to 20 levels, but that’s enough for a quintillion “into” nodes to all have their own cells, so that should be quite enough for now!

How it works

If you just want to use it, you can stop reading now. But if you want to understand how it works, read on…

The general idea is to build an octree, so that instead of going through all the collision solids one by one, Panda’s hierarchical collision detection system can progressively narrow down on the parts of the scene that may intersect with a model, and only perform collision detection on the “into” solids that are actually close to the “from” collider in question. For a scene with 50k spread-out collision solids, this can reduce the number of actual tests done from (roughly) 50k to about 16. That’s a 3000x speed-up!

We build the tree by gathering all the collision solids, taking their center points, turning them into a Morton code, and sorting all the solids by this code, which ensures that the solids in any branch of the octree are in a contiguous range. Then we can just walk through the tree with some clever bit-shifting with the Morton codes and build up the collision node hierarchy, with some extra cleverness to not construct empty parts of the tree.

Technically, the result is not an octree, but an AABB tree. This is because the solids aren’t actually points, so in on octree they would have to be put in more than one cell. But since Panda’s node hierarchy already automatically calculates the AABB for everything, I still just assign the CollisionNode to one cell and have Panda make the AABB large enough to account for any overlap. (This is why it’s bad if you include huge solids that encompass the entire scene; the AABB will grow to very large proportions, negating the advantage of hierarchical testing.)

But my node count is massive now, how could this possibly be faster?

The resulting node count can be huge, such that calling render.ls() actually takes a while! But the renderer will skip them, because the root nodes are marked hidden. And the collision traverser only visits the nodes that have a chance of actually colliding with the “from” node based on the AABB test. Unless your “from” node is so huge that it will collide with a majority of the solids at once, only a few nodes will ever be visited at a time.

How is this different from what subdivideCollisions(8) does?

Well, it’s far more aggressive. This rips apart all your “into” CollisionNodes and rebuilds them in the most optimal manner, whereas subdivideCollisions only subdivides already existing CollisionNodes. That’s not useful if you have many individual CollisionNodes, only if you have single CollisionNodes containing many individual.

Can we add this to Panda?

Maybe! We could do something like this as part of the egg/gltf loader. I am also considering adding a similar feature directly to CollisionNode, so that CollisionNodes with many solids will automatically be optimized into an internal octree.

7 Likes

This sounds quite impressive indeed! Thank you for it! :slight_smile:

(I’m not yet sure of whether it’ll work with my current project, but I do want to consider whether it might. And even if it doesn’t, it may be very useful for other projects, mine or otherwise.)

Hello. sorry to bother you, but it seems that I have ran into garbage collection issues with this, you see, when you first suggested this to me, I used it on a one scene demo, and it worked well, however, when I decided to use this on a more mature prototype, upon switching scenes (rooms), everything went haywire.
here is the original scene
(image removed to save bandwidth)
and here it is with the optimizer running once every scene rebuild.
(image removed to save bandwidth)
so what is happening is each picture is the character coming out of a castle, the theory is that the castle room collisions somehow have survived the space rebuild, the castle is of a different height of then the field, and you can see the NPCs are floating in the air.

also the character is boxed in a small space, the castle room was small, so I decided to test this farther, with render…analyze() and here was the log

original (these are consistant)
room 1 - - -
72 total nodes (including 0 instances); 0 LODNodes.
30 transforms; 22% of nodes have some render attribute.
16 Geoms, with 16 GeomVertexDatas and 4 GeomVertexFormats, appear on 16 GeomNodes.
16954 vertices, 16954 normals, 0 colors, 16904 texture coordinates.
GeomVertexData arrays occupy 286K memory.
GeomPrimitive arrays occupy 75K memory.
2 GeomVertexArrayDatas are redundant, wasting 4K.
1 GeomPrimitive arrays are redundant, wasting 2K.
28066 triangles:
  288 of these are on 47 tristrips (6.12766 average tris per strip).
  27778 of these are independent triangles.
10 textures, estimated minimum 37120K texture memory required.

room 0 - - -
112 total nodes (including 0 instances); 0 LODNodes.
43 transforms; 28% of nodes have some render attribute.
32 Geoms, with 32 GeomVertexDatas and 4 GeomVertexFormats, appear on 32 GeomNodes.
22366 vertices, 22366 normals, 0 colors, 22316 texture coordinates.
GeomVertexData arrays occupy 1022K memory.
GeomPrimitive arrays occupy 121K memory.
17 GeomVertexArrayDatas are redundant, wasting 21K.
7 GeomPrimitive arrays are redundant, wasting 4K.
24061 triangles:
  2046 of these are on 349 tristrips (5.86246 average tris per strip).
  22015 of these are independent triangles.
40 textures, estimated minimum 160000K texture memory required.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
with optimizer activated once every room switch.
room 1 - - -
8146 total nodes (including 0 instances); 0 LODNodes.
4736 transforms; 0% of nodes have some render attribute.
16 Geoms, with 16 GeomVertexDatas and 4 GeomVertexFormats, appear on 16 GeomNodes.
16954 vertices, 16954 normals, 0 colors, 16904 texture coordinates.
GeomVertexData arrays occupy 286K memory.
GeomPrimitive arrays occupy 75K memory.
2 GeomVertexArrayDatas are redundant, wasting 4K.
1 GeomPrimitive arrays are redundant, wasting 2K.
28066 triangles:
  288 of these are on 47 tristrips (6.12766 average tris per strip).
  27778 of these are independent triangles.
10 textures, estimated minimum 37120K texture memory required.

room 0 - - -
17466 total nodes (including 0 instances); 0 LODNodes.
10227 transforms; 0% of nodes have some render attribute.
32 Geoms, with 32 GeomVertexDatas and 4 GeomVertexFormats, appear on 32 GeomNodes.
22366 vertices, 22366 normals, 0 colors, 22316 texture coordinates.
GeomVertexData arrays occupy 1022K memory.
GeomPrimitive arrays occupy 121K memory.
17 GeomVertexArrayDatas are redundant, wasting 21K.
7 GeomPrimitive arrays are redundant, wasting 4K.
24061 triangles:
  2046 of these are on 349 tristrips (5.86246 average tris per strip).
  22015 of these are independent triangles.
40 textures, estimated minimum 160000K texture memory required.

room 1 - - -
30779 total nodes (including 0 instances); 0 LODNodes.
16768 transforms; 0% of nodes have some render attribute.
16 Geoms, with 16 GeomVertexDatas and 4 GeomVertexFormats, appear on 16 GeomNodes.
16954 vertices, 16954 normals, 0 colors, 16904 texture coordinates.
GeomVertexData arrays occupy 286K memory.
GeomPrimitive arrays occupy 75K memory.
2 GeomVertexArrayDatas are redundant, wasting 4K.
1 GeomPrimitive arrays are redundant, wasting 2K.
28066 triangles:
  288 of these are on 47 tristrips (6.12766 average tris per strip).
  27778 of these are independent triangles.
10 textures, estimated minimum 37120K texture memory required.

room 0 - - -
44933 total nodes (including 0 instances); 0 LODNodes.
23595 transforms; 0% of nodes have some render attribute.
32 Geoms, with 32 GeomVertexDatas and 4 GeomVertexFormats, appear on 32 GeomNodes.
22366 vertices, 22366 normals, 0 colors, 22316 texture coordinates.
GeomVertexData arrays occupy 1022K memory.
GeomPrimitive arrays occupy 121K memory.
17 GeomVertexArrayDatas are redundant, wasting 21K.
7 GeomPrimitive arrays are redundant, wasting 4K.
24061 triangles:
  2046 of these are on 349 tristrips (5.86246 average tris per strip).
  22015 of these are independent triangles.
40 textures, estimated minimum 160000K texture memory required.

and so forth.....

it appears to be the case, I wanted to ask where is your program writing all those nodes to, I know there is supposed to be a lot, but I dont think they are supposed to keep growing in number like this, I want remove these nodes correctly, perhaps your program has a function to do this, I would love to know if that were the case.

anyway, thank you for listening.

I recommend looking at the structure of the scene graph with render.ls(), and seeing which nodes remain before and after your scene transition.

It may be helpful to have your scene objects parented to some dummy node that you can easily remove with .removeNode() and recreate upon scene changes.

I was thinking of adding a “Plane” node where I dump all the nodes onto it then kill it for my clean up like you said, but I tried to test this with my scene node first, which has all the into nodes at the start, but I got a zero division error.

but maybe I did it, wrong, I will try like you said and report back here soon, thank you for the suggestion.


Edit: it turns your suggestion solved the issue, once again thank you rdb, I now have full optimized versions of my games, which I will be releasing in the coming days.

1 Like