Bullet Causing Segfault on Ubuntu

Hi.

I have been using panda3d 1.8.1 on Ubuntu 13.10 without problems for some time. However, since adding multiple collision shapes to the same bulletRigidBody node, I’m getting a segfault whenever the rigid body node collides with another.

Using GDB I found the cause of the segfault to be:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff0eb3060 in Dtool_BulletContact_get_node0_178(_object*, _object*, _object*) () from /usr/lib/panda3d/libpandabullet.so

With the following trace:

#0  0x00007ffff0eb3060 in Dtool_BulletContact_get_node0_178(_object*, _object*, _object*) () from /usr/lib/panda3d/libpandabullet.so
#1  0x000000000052de05 in PyEval_EvalFrameEx ()
#2  0x000000000050697a in ?? ()
#3  0x00000000005301ae in PyEval_EvalFrameEx ()
#4  0x000000000050697a in ?? ()
#5  0x00000000004d53f4 in ?? ()
#6  0x000000000051bee6 in PyObject_Call ()
#7  0x00007ffff3f421a5 in Thread::call_python_func(_object*, _object*) ()
   from /usr/lib/panda3d/libpanda.so
#8  0x00007ffff3f62c9e in PythonTask::do_python_task() ()
   from /usr/lib/panda3d/libpanda.so
#9  0x00007ffff3f62ef8 in PythonTask::do_task() ()
   from /usr/lib/panda3d/libpanda.so
#10 0x00007ffff3f649c3 in AsyncTask::unlock_and_do_task() ()
   from /usr/lib/panda3d/libpanda.so
#11 0x00007ffff3f69e88 in AsyncTaskChain::service_one_task(AsyncTaskChain::AsyncTaskChainThread*) () from /usr/lib/panda3d/libpanda.so
#12 0x00007ffff3f6a3a0 in AsyncTaskChain::do_poll() ()
   from /usr/lib/panda3d/libpanda.so
#13 0x00007ffff3f6aa49 in AsyncTaskManager::poll() ()
   from /usr/lib/panda3d/libpanda.so
#14 0x00007ffff3f86d88 in Dtool_AsyncTaskManager_poll_121(_object*, _object*, _object*) () from /usr/lib/panda3d/libpanda.so
#15 0x000000000052de05 in PyEval_EvalFrameEx ()
#16 0x000000000052e672 in PyEval_EvalFrameEx ()
#17 0x0000000000505b24 in PyEval_EvalCodeEx ()
#18 0x000000000052f3f2 in PyEval_EvalFrameEx ()
#19 0x000000000052e672 in PyEval_EvalFrameEx ()
#20 0x0000000000567cdc in PyEval_EvalCode ()
#21 0x0000000000451adb in ?? ()
#22 0x0000000000451e5b in PyRun_FileExFlags ()
#23 0x0000000000452394 in PyRun_SimpleFileExFlags ()
#24 0x0000000000453ead in Py_Main ()
#25 0x00007ffff7816de5 in __libc_start_main ()
   from /lib/x86_64-linux-gnu/libc.so.6
#26 0x00000000005786be in _start ()

I don’t think this is an issue with my code, as it runs fine on other platforms. Maybe it is due to how I have built Panda?

I’m really stuck, so any pointers or advice would be great!

Thanks

James

Hmmm… the source code for this function is rather simple:

INLINE PandaNode *BulletContact::
get_node0() const {

  return _obj0 ? (PandaNode *)_obj0->getUserPointer() : NULL;
}

There is protection against _obj0 being NULL. It could be that the user pointer returned is NULL, or something which can’t be cast to a PandaNode.

I don’t know for sure, but I think I heard that Bullet uses temporary collision shapes in some cases when dealing with compounds (multiple shapes). Maybe something got mixed up when transferring user pointers to these temporary shapes, but this is just a wild guess. And the problem you described should then appear on all platforms.

Are you using the same version of the Bullet libs on all platforms?
Does this problem exist if you use the current Panda3D CVS code (and not 1.8.1)?

Using CVS is what I would recommend anyway, because many important bugfixes could not be added to the 1.8.x branch.

If you compile yourself anyway, could you try and use current Panda3D CVS + Bullet 1.8.2? I use this combination on Windows currently.

Casting NULL to a PandaNode pointer is in fact perfectly valid, and would result in a None on the Python side.

That code would produce a Segmentation fault when _obj0 is either uninitialised or has been deleted (or more rarely when the method is called on an invalid object). I’m guessing that Bullet might not guarantee the lifetime of the btCollisionObject beyond the collision callback, or something like that?

Alternatively, the problem could be in the bindings, that they do an operation on the returned PandaNode pointer which has become invalid. Could Panda perhaps have destroyed the PandaNode? Since you’re storing it as a regular pointer, are you making sure to increase the reference count manually (with ref()) when you store it so that Panda can’t destroy it in the meantime, making sure to unref() it when the object is deleted?
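
To illustrate the ref()/unref() idea, here is a minimal sketch. It assumes a hypothetical Node class with its own counter and a hypothetical RawHolder that stores a plain pointer; neither is Panda3D's real class, and the real ReferenceCount API may behave differently in detail. The point is only that whoever stores a raw pointer takes a reference for as long as the pointer is kept.

```cpp
#include <cassert>

// Hypothetical stand-in for a reference-counted node.
// This is NOT Panda3D's real PandaNode/ReferenceCount implementation.
class Node {
public:
  void ref() { ++_ref_count; }
  // Returns true while other references remain, false when the count hits zero.
  bool unref() {
    assert(_ref_count > 0);
    return --_ref_count != 0;
  }
  int get_ref_count() const { return _ref_count; }

private:
  int _ref_count = 0;
};

// Hypothetical holder that keeps a raw pointer, much like a user-pointer slot.
class RawHolder {
public:
  explicit RawHolder(Node *node) : _node(node) {
    if (_node) _node->ref();  // claim a reference while the pointer is stored
  }
  ~RawHolder() {
    // Release our reference; destroy the node if we held the last one.
    if (_node && !_node->unref()) delete _node;
  }
  RawHolder(const RawHolder &) = delete;
  RawHolder &operator=(const RawHolder &) = delete;

  Node *get() const { return _node; }

private:
  Node *_node;
};

// Returns the count seen while both the "scene graph" and the holder
// reference the node.
int demo_ref_count() {
  Node *node = new Node;
  node->ref();  // reference owned by the (imaginary) scene graph
  int seen = 0;
  {
    RawHolder holder(node);
    seen = node->get_ref_count();
    // The scene graph lets go, but the holder's reference keeps the node alive.
    bool still_alive = node->unref();
    assert(still_alive);
    assert(holder.get()->get_ref_count() == 1);
  }  // holder goes out of scope: last reference released, node deleted
  return seen;
}
```

Without the ref() in the constructor, the node would be destroyed as soon as the scene graph released it, and the stored pointer would dangle.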

Thanks for your replies guys.

Thanks enn0x, I will try this and let you know.
Edit: P.S. Did you mean Bullet 2.82? If so, I get a strange build error when trying to build with it =/

compilation terminated.
The following command returned a non-zero value: g++ -ftemplate-depth-30 -fPIC -c -o built/tmp/p3bullet_composite.o -Ibuilt/tmp -Ibuilt/include -I/usr/include/python2.7 -I/usr/include/eigen3 -Ipanda/src/bullet -pthread -msse2 -O2 -DBUILDING_PANDABULLET panda/src/bullet/p3bullet_composite.cxx

Thanks rdb. I’m sorry, but I’m not sure what you mean by increasing the reference count. Could you elaborate a bit?

Thanks James.

My reply was mostly directed at enn0x, sorry.

As for the compilation error, you snipped off the part containing the actual error - it would be helpful if you shared it.

Thanks rdb.

Sorry, I realised the compilation error was because my build script was applying a patch we wrote to expose some more Bullet functionality to Panda, which apparently isn’t compatible with the CVS version.

Having successfully built the CVS version, my issue has been resolved. Thanks so much for your help guys :smiley:

Thank you rdb, I think your post sets me on the right path.

I don’t think that _obj0 is uninitialized. OK, I haven’t written a CTOR for BulletContact, but since I am the only one who creates instances of BulletContact, and since I initialize all fields immediately after creation (BulletContactResult::addSingleResult), the value is whatever Bullet provides me for the btCollisionObject. As of today I would consider the missing CTOR bad practice, and I will fix it.

btCollisionObject is Bullet’s base class for all rigid bodies, soft bodies and so on. These objects are usually long-lived, i.e. they don’t get created/released for a single frame. Every btCollisionObject created is wrapped by a Panda3D object (e.g. BulletRigidBody, derived from PandaNode), which controls the lifetime of the btCollisionObject in its CTOR/DTOR. And the lifetime of the wrapper objects is guaranteed, because once they get added to a world I store them in private arrays within BulletWorld. So even calling remove_node() won’t destroy them.

However, I am starting to realize that there are ways in which _obj0 could become invalid. For example, if the body is removed from the world (and thus might get released) between the contact query (this is when BulletContact gets created) and the access to _obj0 (calling get_node0()). Another way would be the already mentioned temporary btCollisionObjects involved in multi-shape bodies. If Bullet really creates temporary objects, then I have no control over their lifetime.

If this is the reason then it is easy to fix: I resolve the corresponding PandaNode* at the time of the contact query, and store it in a PT(PandaNode).

The only drawback is that the PandaNode* will always be resolved, whether or not the user wants to access it. But this should be only a tiny performance penalty.
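
Roughly, the idea looks like the sketch below. It is not the actual Panda3D change: Node, CollisionObject and Contact are hypothetical stand-ins for PandaNode, btCollisionObject and BulletContact, and std::shared_ptr with enable_shared_from_this stands in for PT(PandaNode), on the assumption that both let you recover an owning pointer from a raw one.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for PandaNode. enable_shared_from_this lets us get
// an owning pointer back from a raw pointer, as an intrusive count would.
struct Node : std::enable_shared_from_this<Node> {
  explicit Node(int id) : id(id) {}
  int id;
};

// Hypothetical stand-in for btCollisionObject: carries an untyped user pointer.
struct CollisionObject {
  void *user_pointer = nullptr;
  void *getUserPointer() const { return user_pointer; }
};

// Contact that resolves the node eagerly, at query time, and holds a
// counted reference instead of remembering the collision object.
class Contact {
public:
  explicit Contact(const CollisionObject *obj0) {
    if (obj0 && obj0->getUserPointer()) {
      _node0 = static_cast<Node *>(obj0->getUserPointer())->shared_from_this();
    }
  }
  // Safe even if the collision object has been released in the meantime.
  Node *get_node0() const { return _node0.get(); }

private:
  std::shared_ptr<Node> _node0;  // stand-in for PT(PandaNode)
};

// The body and the world's reference both go away after the query, yet the
// contact can still hand out a valid node.
int demo_contact_outlives_body() {
  std::shared_ptr<Node> node = std::make_shared<Node>(7);
  std::unique_ptr<CollisionObject> body(new CollisionObject);
  body->user_pointer = node.get();

  Contact contact(body.get());  // contact query: resolve the node now

  body.reset();  // collision object released (e.g. removed from the world)
  node.reset();  // world drops its own reference to the node

  return contact.get_node0()->id;
}

// A missing collision object still yields a null node, as before.
bool demo_null_contact() {
  Contact contact(nullptr);
  return contact.get_node0() == nullptr;
}
```

The cost is one pointer resolution and one reference count update per contact, paid whether or not get_node0() is ever called.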

I will try to check in a fix this evening (if I find some spare time).