Memory Leak in Bullet Code

timo · March 13, 2013, 10:19pm

Hi,

I am on Ubuntu 12.10 64-bit with a self-compiled panda3d and a self-compiled Bullet. The following piece of Python code seems to have a memory leak. After 5 minutes it occupies 70% of my 8 GB of RAM… Still, to my knowledge it should be perfectly valid code:

from panda3d.bullet import BulletRigidBodyNode, BulletWorld
from panda3d.core import Vec3

world = BulletWorld()
world.set_gravity((0, 0, -1))
body = BulletRigidBodyNode()
body.set_deactivation_enabled(False)
body.set_mass(1)
world.attach(body)

while True:
	world.do_physics(1e-4)

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND       
 4935 timo      20   0 6444m 6.2g 1856 R  96.9 80.1   7:31.65 python

Is this a bug in panda? Or might this be due to wrong compile options?

Any help would be appreciated!
Thanks
Cheers
Timo

enn0x · March 13, 2013, 11:28pm

Hmm… yes, there seems to be a memory leak somewhere. Luckily it is tiny: for an application running at 100 frames per second it seems to be around 10M per hour, so only a very long-running application would get into serious trouble. I will try to find out what is causing the memory growth. It seems not to depend on how many bodies are added to the world, just on how often do_physics is called.
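
For a rough sense of scale, the figure above can be turned into a per-call estimate. This is only back-of-the-envelope arithmetic; it assumes "10M" means 10 MiB and one do_physics call per frame, neither of which is stated explicitly.

```python
# Rough estimate of memory growth per do_physics() call.
# Assumption: "10M per hour" means 10 MiB per hour.
# Assumption: one do_physics() call per frame at 100 frames per second.
LEAK_PER_HOUR_BYTES = 10 * 1024 * 1024
CALLS_PER_SECOND = 100

calls_per_hour = CALLS_PER_SECOND * 3600
bytes_per_call = LEAK_PER_HOUR_BYTES / calls_per_hour

print(f"{calls_per_hour} calls per hour")
print(f"about {bytes_per_call:.1f} bytes per call")
```

So the growth per call is on the order of a few dozen bytes under these assumptions; it only becomes noticeable when do_physics is called very often.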

timo · March 13, 2013, 11:41pm

Thanks for the answer!

As you pointed out, it seems to depend on how often do_physics is called. In my case, I want to have a precise simulation and I am using the angular motor of BulletHingeConstraint, so I have to call do_physics considerably more often than just 100 times per second and I am really suffering from this bug… In addition, I rely on unit tests and unfortunately the memory leak accumulates over the different tests :-(

I will also try to find the bug. Do you have an idea whether it’s a panda or a bullet thing?

timo · March 14, 2013, 12:34am

One additional observation: the leak occurs only if the body is moving. Removing the line that sets the gravity also removes the memory leak.

enn0x · March 14, 2013, 1:07am

I tracked the memory leak down to TransformState::make_pos_quat_scale, which is used to keep Bullet & Panda3D in sync.

Can you confirm that the following script shows a fast increase in memory consumption:

from panda3d.core import Vec3
from panda3d.core import Point3
from panda3d.core import Quat
from panda3d.core import TransformState

q = Quat.identQuat()
s = Vec3(1,1,1)

i = 0
while True:
   i += 1
   p = Point3(0,0,i*0.001)
   ts = TransformState.makePosQuatScale(p,q,s)

timo · March 14, 2013, 7:05am

enn0x,

I can confirm this! The increase is at least three or four times faster than the one observed with do_physics. There is an even more reduced script showing the same effect:

from panda3d.core import TransformState

i = 0
while True:
	i += 1
	TransformState.make_pos((0, 0, i))

The same is possible with TransformState.makeQuat or TransformState.makeScale as long as the argument is different every time.

timo · March 14, 2013, 8:13am

I have no idea whether this is helpful, but when I set

uniquify-transforms 0

in the config.prc the leak in the simple test script vanishes. At the same time, every more complex simulation ends up either with an assertion error or a segmentation fault…

enn0x · March 14, 2013, 8:24am

Ok. I will try to dig a bit deeper, but at the same time ask rdb or drwr for help. I’m not sure if there is perhaps some caching mechanism involved. A bug in PointerTo/ConstPointerTo is unlikely, since this code has been used way too often for a bug to go unnoticed until now.

TransformState has a check if the caller provided “zeros”. If this is the case it will return a pointer to a static object (identity transform).

Can you post the assertion? Would be helpful to know where an assert fires. I don’t check the return value of TransformState::make_xyz for null for example, since I assume this method will always return a valid pointer.

timo · March 14, 2013, 8:42am

Here is the assertion:

 File "xxx.py", line 73, in step
    self.world.do_physics(self.dt, 0)
AssertionError: _composition_cache.is_empty() && _invert_composition_cache.is_empty() at line 112 of panda/src/pgraph/transformState.cxx

I have not yet broken it down to a minimal script that reproduces the error…

rdb · March 14, 2013, 8:49am

The whole point behind the weird TransformState interface is so that they can be cached, and then compared by pointer to see if two transforms are equal. This cache allows Panda to cache the result of expensive matrix computations (through the ‘composition cache’) so that they are not done more than once.

Not all applications benefit from the transform cache. You can disable it by setting “transform-cache 0” in Config.prc.

uniquify-transforms toggles whether or not transforms should be made unique so that they can be compared by pointer for equality. This makes sure that you get the same TransformState object every time you call makePos with the same arguments, which avoids unnecessary state changes. I’m not sure why disabling it would make the problem go away, although I guess that David would know that.
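
To illustrate the idea of uniquified transforms, here is a toy model in plain Python. It is not Panda3D's implementation; the Transform and TransformCache classes are made up for this sketch. It only shows why equal arguments can be compared by identity, and why a loop that passes a different argument every time makes the cache grow.

```python
class Transform:
    """Stand-in for an immutable transform; NOT Panda3D's TransformState."""

    def __init__(self, pos):
        self.pos = pos


class TransformCache:
    """Toy uniquifying cache: equal arguments yield the same object."""

    def __init__(self):
        self._states = {}

    def make_pos(self, pos):
        key = tuple(pos)
        state = self._states.get(key)
        if state is None:
            # First time this position is requested: create and remember it.
            state = Transform(key)
            self._states[key] = state
        return state

    def __len__(self):
        return len(self._states)


cache = TransformCache()
a = cache.make_pos((0, 0, 1))
b = cache.make_pos((0, 0, 1))
same_object = a is b  # equality check by identity, no value comparison

# A different argument on every call adds a new entry every time.
for i in range(1000):
    cache.make_pos((0, 0, i))
entries = len(cache)
```

In this model nothing is ever removed from the cache, which is the situation the scripts above end up in when no cleanup step runs.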

If you would still like to take advantage of the transform cache, you can try TransformState.garbage_collect() to clean up unused transforms or TransformState.clear_cache() to clear the cache entirely. (For debugging, you can call TransformState.list_states() to get a list of transforms in the cache.)
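
For a loop that never reaches the task manager, the cleanup call could be made by hand every so often. The helper below is a minimal sketch; step_with_cleanup and collect_every are names invented for this example, and the cleanup function is passed in as a parameter so the helper itself does not depend on Panda3D.

```python
def step_with_cleanup(world, dt, steps, collect, collect_every=1000):
    """Advance `world` by `steps` steps of size `dt`.

    `collect` is called after every `collect_every` steps, and once more at
    the end if some steps remain since the last call. Returns the number of
    times `collect` was called.
    """
    calls = 0
    for n in range(1, steps + 1):
        world.do_physics(dt)
        if n % collect_every == 0:
            collect()
            calls += 1
    if steps % collect_every:
        collect()
        calls += 1
    return calls


# Intended use with Panda3D (not executed here):
#   from panda3d.core import TransformState
#   step_with_cleanup(world, 1e-4, 10**6, TransformState.garbage_collect)
```

How often to collect is a trade-off between memory growth and the cost of the cleanup call; the default of 1000 is an arbitrary choice for this sketch.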

One option is for our Bullet code to allow passing a Mat4 instead of a TransformState, so that one can pass Mat4.posMat(x, y, z) without having to use the TransformState interface. Since the Bullet code doesn’t compare transforms by pointer or perform matrix computations on them through the TransformState interface, this arguably makes more sense.

That said, Panda is known for its easy scene graph manipulations without having to resort to matrices or transform states. Perhaps there is a way to design this interface to use scene graph nodes or something like that?

(This problem isn’t limited to Bullet, it occurs when you use nodePath.setPos as well: bugs.launchpad.net/panda3d/+bug/952815 )

timo · March 14, 2013, 11:20am

“transform-cache 0” resolves the described problem.
Unfortunately, it also introduces a new one. With “transform-cache 0” in my Config.prc, the following two lines are enough to produce a segmentation fault:

from direct.showbase.ShowBase import ShowBase
ShowBase().run()

Here is the complete output:

$ python test.py 
Known pipe types:
  glxGraphicsPipe
(all display modules loaded.)
:audio(error):   load_dso(libp3openal_audio.so) failed, will use NullAudioManager
:audio(error):     No error.
Segmentation fault (core dumped)

(The audio message has been there before and should not cause any harm apart from silence.)

A segmentation fault seems even harder to debug than a memory leak… I am afraid I will need further help. Thanks in advance!

timo · March 14, 2013, 11:45am

Here is the backtrace of the Segfault:

#0  0x0000000000000000 in ?? ()
#1  0x00007ffff40d80ca in void unref_delete<TransformState>(TransformState*) () from /PROJECTPATH/venv/lib/panda3d/libpanda.so
#2  0x00007ffff40d81c5 in PointerToBase<TransformState>::~PointerToBase()
    () from /PROJECTPATH/venv/lib/panda3d/libpanda.so
#3  0x00007ffff455ddba in GraphicsEngine::setup_scene(GraphicsStateGuardian*, DisplayRegionPipelineReader*) ()
   from /PROJECTPATH/venv/lib/panda3d/libpanda.so
#4  0x00007ffff456cf10 in GraphicsEngine::cull_to_bins(GraphicsOutput*, DisplayRegion*, Thread*) ()
   from /PROJECTPATH/venv/lib/panda3d/libpanda.so
#5  0x00007ffff456da22 in GraphicsEngine::cull_to_bins(ov_set<PointerTo<GraphicsOutput>, IndirectLess<GraphicsOutput> > const&, Thread*) ()
   from /PROJECTPATH/venv/lib/panda3d/libpanda.so
#6  0x00007ffff456dda2 in GraphicsEngine::WindowRenderer::do_frame(GraphicsEngine*, Thread*) ()
   from /PROJECTPATH/venv/lib/panda3d/libpanda.so
#7  0x00007ffff456ed4a in GraphicsEngine::render_frame() ()
   from /PROJECTPATH/venv/lib/panda3d/libpanda.so
#8  0x00007ffff45b1615 in Dtool_GraphicsEngine_render_frame_615(_object*, _object*, _object*) ()
   from /PROJECTPATH/venv/lib/panda3d/libpanda.so
#9  0x000000000045f912 in PyEval_EvalFrameEx ()
#10 0x0000000000467209 in PyEval_EvalCodeEx ()
#11 0x00000000004a9fea in ?? ()
#12 0x000000000048249d in ?? ()
#13 0x000000000049e116 in PyObject_Call ()
#14 0x00007ffff45f707d in Thread::call_python_func(_object*, _object*) ()
   from /PROJECTPATH/venv/lib/panda3d/libpanda.so
#15 0x00007ffff4610223 in PythonTask::do_python_task() ()
   from /PROJECTPATH/venv/lib/panda3d/libpanda.so
#16 0x00007ffff4610490 in PythonTask::do_task() ()
   from /PROJECTPATH/venv/lib/panda3d/libpanda.so
#17 0x00007ffff460fe5a in AsyncTask::unlock_and_do_task() ()
   from /PROJECTPATH/venv/lib/panda3d/libpanda.so
#18 0x00007ffff4619e5f in AsyncTaskChain::service_one_task(AsyncTaskChain::AsyncTaskChainThread*) ()
   from /PROJECTPATH/venv/lib/panda3d/libpanda.so
#19 0x00007ffff461a870 in AsyncTaskChain::do_poll() ()
   from /PROJECTPATH/venv/lib/panda3d/libpanda.so
#20 0x00007ffff461aa01 in AsyncTaskManager::poll() ()
   from /PROJECTPATH/venv/lib/panda3d/libpanda.so
#21 0x00007ffff462dced in Dtool_AsyncTaskManager_poll_121(_object*, _object*, _object*) () from /PROJECTPATH/venv/lib/panda3d/libpanda.so
#22 0x000000000045f912 in PyEval_EvalFrameEx ()
#23 0x00000000004602b7 in PyEval_EvalFrameEx ()
#24 0x0000000000467209 in PyEval_EvalCodeEx ()
#25 0x000000000045ff77 in PyEval_EvalFrameEx ()
#26 0x00000000004602b7 in PyEval_EvalFrameEx ()
#27 0x0000000000467209 in PyEval_EvalCodeEx ()
#28 0x00000000004d0242 in PyEval_EvalCode ()
#29 0x00000000005102bb in ?? ()
#30 0x000000000044a466 in PyRun_FileExFlags ()
#31 0x000000000044a97a in PyRun_SimpleFileExFlags ()
#32 0x000000000044b6bc in Py_Main ()
#33 0x00007ffff6f0576d in __libc_start_main ()
   from /lib/x86_64-linux-gnu/libc.so.6
#34 0x00000000004ce0ad in _start ()

(I replaced the original project path)

drwr · March 15, 2013, 4:50am

The fundamental problem with the original code:

while True:
   world.do_physics(1e-4)

is that you are running infinite calculations in a tight loop, without running the task loop. Normally, the transform cache is configured to purge itself at the end of each task step, which you are never reaching. It would be better to simply put your do_physics() call by itself in a task function, and then call run(), which will achieve the results you want without causing a leak. Or, you could turn off this behavior of the transform cache by putting “garbage-collect-states 0” in your Config.prc file, but this is the clumsier solution.
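
A minimal sketch of the task-based variant might look like this. The make_physics_task helper and its substeps parameter are made up for the example; substeps is there only because the original loop wants many small steps per frame.

```python
def make_physics_task(world, dt, substeps=1):
    """Build a task function that advances `world` on every task step."""

    def physics_task(task):
        for _ in range(substeps):
            world.do_physics(dt)
        # Returning task.cont asks the task manager to call us again.
        return task.cont

    return physics_task


# Intended use with Panda3D (not executed here):
#   from direct.showbase.ShowBase import ShowBase
#   base = ShowBase()
#   base.taskMgr.add(make_physics_task(world, 1e-4, substeps=100), "physics")
#   base.run()
```

Because the task returns after each batch of steps, the task loop gets a chance to run its end-of-step housekeeping between batches.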

This is, incidentally, the same problem in the second code sample:

i = 0
while True:
   i += 1
   p = Point3(0,0,i*0.001)
   ts = TransformState.makePosQuatScale(p,q,s)

Now, as to why your program is crashing with “transform-cache 0”, that appears to be a legitimate problem. Not sure why it’s happening, but I’ve just reproduced it, so I should be able to track it down and fix it soon enough.

David

enn0x · March 17, 2013, 9:10am

Thank you drwr for explaining. If I got everything right there is no action point for me (Bullet module).

For the record: it is not Mat4.posMat but Mat4.translateMat (I would find posMat a better name, since at first glance I would assume that a method called translateMat “translates” a mat in place, because translate is a verb. At second glance I would see that translateMat is a static method, and thus my first assumption is nonsense).

Still an interesting idea. The most common situation when placing a physics object within the world will be that the object has both a translation and a rotation. So far we don’t have a convenience function in Mat4 which creates such a matrix in one call. Could be amended, maybe Mat4.posHprMat(x,y,z,h,p,r).

Here in this case it won’t solve the problem, since no user API is involved. It is the automatic Bullet-to-SceneGraph synchronisation which creates the TransformStates. For each Bullet object which has its transform changed we need to update the net transform of the corresponding PandaNode after the simulation step, that is, at the end of the do_physics method. Now, even if I convert the Bullet btTransform into a Mat4 and then use NodePath.set_mat, set_mat will internally first create a CPT(TransformState) and then apply it.

If I understand PandaNodes right then there is no way to move a PandaNode without making a new TransformState (where making also can mean getting it from the transformState cache), simply because the PandaNode stores the TransformState as a member.

timo · March 19, 2013, 3:35pm

drwr,

thanks for your reply! “garbage-collect-states 0” does indeed do the trick. The memory leak is gone and applications derived from ShowBase do not crash. What does this option actually do? Setting it to zero sounds like turning off some garbage collector, so at first I would have expected even more memory leaks with this setting. Anyhow, it should then probably be added to http://www.panda3d.org/manual/index.php/List_of_All_Config_Variables.

Just for explanation: In this part of my application I am only interested in the physics provided by the bullet bindings. I do not have any graphical output there, so I am just iterating the world and reading and setting some values in between. Therefore, it does not seem to make sense to use the complete panda3d infrastructure including tasks, run, etc.

Thanks again for your help, my problem seems solved for now.
Best
Timo