Squeezing performance out of Bullet

Hi all,
I am trying to learn how to use Bullet with Panda3D. I am using the Continuous Collision Detection example as my basic setup, and one of the things I am trying to do is increase the number of boxes in the scene while keeping a good frame rate (30+ fps).

What can I do to increase performance when there are many boxes present? With 500 boxes the simulation runs at 8 fps. From what I gather, the many boxes simply result in too many separate meshes, which overwhelm the GPU.

I looked into the manual and tried calling

render.flattenMedium()

and the like on the render node, but that has no effect. I believe this is because the visual node paths from the example are never reparented to the render node directly: they are reparented to the body nodes, and thus appear to be outside the scene graph (how do they get rendered then?). I tried putting everything under render and then flattening, but the result was the same.

I also tried the rigid body combiner (RigidBodyCombiner), but I must be using it wrong, since I didn’t gain any performance from it.

Thanks in advance; any tips are appreciated. I’m sorry if the question is too noobish, I am just starting out with Panda.