[solved] bug? help please :/

[size=100]History[/size]
[size=75](Skip to ‘Isolating the problem’ if you have a short attention span and just want to look at the three-line snippet that illustrates the actual problem.)[/size]

I was trying to re-create the panda physics example in C++ when I stumbled on a strange problem.

The application would segfault in PhysicsManager->attach_physical_node() with the following backtrace:

#0  0x080558e8 in PointerTo<Physical>::operator Physical* (this=0x15) at /usr/local/panda3d/include/pointerTo.I:83

#1  0x08055911 in PhysicalNode::get_physical (this=0x82af0f4, index=0) at /usr/local/panda3d/include/physicalNode.I:32

#2  0x08056ced in PhysicsManager::attach_physical_node (this=0xbfd0477c, 
    at /usr/local/panda3d/include/physicsManager.I:70

In other words, the physics manager calls get_physical() on the ActorNode and gets a bad pointer back (note this=0x15 in frame #0).

At first I figured I had forgotten to add something to the ActorNode, but as it turns out, the ActorNode constructor in the Panda source does this:

 ActorNode(const string &name) : PhysicalNode(name) {
   _contact_vector = LVector3f::zero();
   add_physical(new Physical(1, true));
   // <snip> ...

In other words, there should always be a Physical object in the _physicals vector (which lives in ActorNode’s parent class, PhysicalNode), since the constructor adds one.
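To make that reasoning concrete, here is a minimal standalone model of the invariant. It is a sketch under assumptions: Physical, PhysicalNode and ActorNode below are hypothetical stand-ins for the Panda classes, and std::shared_ptr replaces Panda's PT()/PointerTo. In a healthy process the count right after construction is exactly 1, which is why a huge count would point at memory corruption rather than at a missing add_physical() call.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins (not Panda's real classes), reduced to the one
// invariant discussed above: ActorNode's constructor always adds a Physical.
struct Physical {};

class PhysicalNode {
public:
  explicit PhysicalNode(std::string name) : _name(std::move(name)) {}
  void add_physical(std::shared_ptr<Physical> p) {
    _physicals.push_back(std::move(p));
  }
  std::size_t get_num_physicals() const { return _physicals.size(); }
  const std::string &get_name() const { return _name; }

private:
  std::string _name;
  // A by-value member: it is constructed (empty) before any derived
  // constructor body runs, so it never starts out "uninitialised".
  std::vector<std::shared_ptr<Physical>> _physicals;
};

class ActorNode : public PhysicalNode {
public:
  explicit ActorNode(const std::string &name) : PhysicalNode(name) {
    // Mirrors add_physical(new Physical(1, true)) in the real constructor.
    add_physical(std::make_shared<Physical>());
  }
};

// Sanity check one could run right after construction.
bool actor_invariant_holds(const ActorNode &an) {
  return an.get_num_physicals() == 1;
}
```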

So I continued debugging by printing ->get_num_physicals().
get_num_physicals() simply returns the vector’s size. The vector is declared by value as a member in the PhysicalNode header (not as a pointer), so there is no reason for it to be uninitialised. Yet I get:

num_physicals: 34261536

which smells like uninitialised/bad memory to me.

So I tried to make the simplest possible test case: just add a Physical.

[size=100]Isolating the problem[/size]
[size=75](Start reading here if you have a short attention span.)[/size]


#include <actorNode.h>
#include <physical.h>

int main(int argc, char* argv[])
{

  PT(ActorNode) an = new ActorNode("jetpack-guy-physics");
  PT(Physical) test = new Physical();
  
  // The line below throws these runtime errors:
  //
  // Invalid TypeHandle index -1208867616!  Is memory corrupt?
  //
  // testprog: dtool/src/dtoolbase/typeHandle.cxx:56:
  //   void TypeHandle::inc_memory_usage
  //     (TypeHandle::MemoryClass, int):
  //
  //     Assertion `rnode != (TypeRegistryNode *)__null' failed.
     
  an->add_physical(test);

  return 0;
}

As the comments in the source above say, this very simple, essentially three-line example throws runtime errors that point to uninitialised memory!

Nothing can possibly go out of scope, everything is allocated with new, and used immediately afterward. How is this possible?!

The only explanations I could think of were the object calling delete this in its constructor, a self-destructing assignment operator, or who knows what. None of the logical explanations made sense until I looked at how _physicals is defined in physicalNode.h:

  typedef pvector<PT(Physical)> PhysicalsVector;
  PhysicalsVector _physicals;

That would be fine if pvector were a normal STL vector.
But when I look at the pvector source, I see it uses custom STL allocators and does some very creative memory management:

class pvector : public vector<Type, pallocator_array<Type> >
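To illustrate why an allocation inside a pvector can end up in type-registry code at all, here is a minimal sketch of a std::vector with a custom allocator that does per-type bookkeeping on every allocate(). The names counting_allocator, usage_by_type and my_pvector are hypothetical; Panda's real pallocator_array and TypeHandle::inc_memory_usage differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <new>
#include <string>
#include <vector>

// Hypothetical per-type usage table; stands in for the bookkeeping that
// TypeHandle::inc_memory_usage does in Panda (NOT Panda's implementation).
inline std::map<std::string, long> &usage_by_type() {
  static std::map<std::string, long> usage;
  return usage;
}

// Hypothetical allocator: records bytes against a type tag before handing
// out memory, similar in spirit to pallocator_array in the backtrace.
template <class T>
struct counting_allocator {
  using value_type = T;
  const char *tag = "pvector";  // plays the role of pvector_type_handle

  counting_allocator() = default;
  template <class U>
  counting_allocator(const counting_allocator<U> &other) : tag(other.tag) {}

  T *allocate(std::size_t n) {
    usage_by_type()[tag] += static_cast<long>(n * sizeof(T));  // bookkeeping runs first
    return static_cast<T *>(::operator new(n * sizeof(T)));
  }
  void deallocate(T *p, std::size_t n) {
    usage_by_type()[tag] -= static_cast<long>(n * sizeof(T));
    ::operator delete(p);
  }
};

template <class T, class U>
bool operator==(const counting_allocator<T> &, const counting_allocator<U> &) {
  return true;
}
template <class T, class U>
bool operator!=(const counting_allocator<T> &, const counting_allocator<U> &) {
  return false;
}

// pvector-like alias: an ordinary std::vector with the custom allocator.
template <class T>
using my_pvector = std::vector<T, counting_allocator<T>>;
```

The point of the sketch is that push_back() on such a vector runs the allocator's bookkeeping before any element is stored, so if that bookkeeping reads a bad type handle, the failure surfaces inside push_back(), as in frames #4 to #8 of the gdb backtrace further down.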

So this little problem goes beyond what I can figure out, since lots of unusual things can happen when you’re dealing with ‘creative’ memory management.

So, what did I stumble upon here?
Did I stumble on a Panda bug, or am I violating the laws of nature and the universe? :slight_smile:

I recompiled Panda with full debugging enabled (-ggdb -O0). Below is the gdb output and backtrace from the little code snippet above:

(gdb) r
Starting program: ~/testprog

[Thread debugging using libthread_db enabled]
[New Thread 0xb402f6f0 (LWP 2220)]

Invalid TypeHandle index -1209232160!  Is memory corrupt?
testprog: dtool/src/dtoolbase/typeHandle.cxx:56: void TypeHandle::inc_memory_usage(TypeHandle::MemoryClass, int): Assertion `rnode != (TypeRegistryNode *)__null' failed.

Program received signal SIGABRT, Aborted.

[Switching to Thread 0xb402f6f0 (LWP 2220)]
0xffffe424 in __kernel_vsyscall ()

(gdb) bt

#0  0xffffe424 in __kernel_vsyscall ()

#1  0xb474e101 in raise () from /lib/libc.so.6

#2  0xb474f8e8 in abort () from /lib/libc.so.6

#3  0xb47477a5 in __assert_fail () from /lib/libc.so.6

#4  0xb4fd5bb0 in TypeHandle::inc_memory_usage (this=0x80d64e0, memory_class=TypeHandle::MC_array, size=0) at dtool/src/dtoolbase/typeHandle.cxx:56

#5  0x0805c873 in pallocator_array<PointerTo<Physical> >::allocate (this=0x80d64e0, n=1073741823) at /usr/local/panda3d/include/pallocator.T:53

#6  0x0805c8ca in std::_Vector_base<PointerTo<Physical>, pallocator_array<PointerTo<Physical> > >::_M_allocate (this=0x80d64e0, __n=1073741823)
    at /usr/lib/gcc/i686-pc-linux-gnu/4.3.2/include/g++-v4/bits/stl_vector.h:144

#7  0x0805ca64 in std::vector<PointerTo<Physical>, pallocator_array<PointerTo<Physical> > >::_M_insert_aux (this=0x80d64e0, __position={_M_current = 0x0}, 
    __x=@0xbf8ff720) at /usr/lib/gcc/i686-pc-linux-gnu/4.3.2/include/g++-v4/bits/vector.tcc:308

#8  0x0805cc6d in std::vector<PointerTo<Physical>, pallocator_array<PointerTo<Physical> > >::push_back (this=0x80d64e0, __x=@0xbf8ff720)
    at /usr/lib/gcc/i686-pc-linux-gnu/4.3.2/include/g++-v4/bits/stl_vector.h:694

#9  0x0805cd7e in PhysicalNode::add_physical (this=0x80d640c, physical=0x80d69d4) at /usr/local/panda3d/include/physicalNode.I:50

#10 0xb4e1f8e4 in ActorNode (this=0x80d640c, name=@0xbf8ff7a4) at panda/src/physics/actorNode.cxx:32

#11 0x0805b911 in main () at ../../src/client/main.cpp:35
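The n=1073741823 in frames #5 and #6 is worth a second look. One plausible reading (an assumption, not something confirmed in this thread) is that it is exactly the largest element count a 32-bit allocator reports for a 4-byte element, assuming PointerTo<Physical> is pointer-sized on i686 and max_size() is computed as the largest size_t divided by sizeof(T). That would mean vector's growth logic clamped a nonsensical size to max_size(), which is what you would expect if the vector's internal pointers were garbage. The arithmetic:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Largest element count representable for elements of `elem_size` bytes
// when sizes are 32-bit, i.e. (2^32 - 1) / elem_size, rounded down.
// Assumes a 32-bit size_t, as on the i686 build in this thread.
constexpr std::uint32_t max_elements_32(std::uint32_t elem_size) {
  return std::numeric_limits<std::uint32_t>::max() / elem_size;
}

// Assuming PointerTo<Physical> is 4 bytes (a single pointer on i686):
static_assert(max_elements_32(4) == 1073741823u,
              "same value as n in frames #5 and #6");
```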

Hmm, I just tried the same code and it works fine for me.

It looks like there is some static initialization that wasn’t performed properly. In particular, it appears that pvector_type_handle wasn’t initialized, which is strange (that should have been done when libdtoolbase.so was loaded and initialized, and init_system_type_handles() was called).

So, either there are two different pointers to pvector_type_handle in your memory address space, which means the run-time loader didn’t properly unify these, or the function init_system_type_handles() wasn’t called when libdtool.so was loaded, which means the run-time loader didn’t properly call the static initializers.
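Both failure modes can be illustrated with a deliberately simplified model. TypeHandleModel and TypeRegistryModel are hypothetical, not Panda's TypeHandle/TypeRegistry: a handle only gets a valid index when register_type() is called on that exact object, so if registration never runs, or the code doing the allocation reads a second copy of the handle, the lookup finds nothing, much like the "Invalid TypeHandle index" assertion. The model uses 0 as the "unset" index to stay deterministic; the index in the real error was an arbitrary-looking value.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified handle: index 0 means "never registered".
struct TypeHandleModel {
  int index = 0;
};

// Hypothetical registry: assigns 1-based indices on registration.
class TypeRegistryModel {
public:
  void register_type(TypeHandleModel &h, const std::string &name) {
    _names.push_back(name);
    h.index = static_cast<int>(_names.size());
  }

  // Returns nullptr for a handle that was never registered (or whose index
  // is out of range), in the spirit of the "Is memory corrupt?" check.
  const std::string *lookup(const TypeHandleModel &h) const {
    if (h.index <= 0 || h.index > static_cast<int>(_names.size())) {
      return nullptr;
    }
    return &_names[h.index - 1];
  }

private:
  std::vector<std::string> _names;
};
```

In this model, seeing a "registered" message only proves that some copy of the handle was registered, not necessarily the copy the allocator reads.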

Either way, it appears to be a problem with your run-time loader. I know there were some issues with the run-time loader on OS X 10.4, but that’s the only system I know about that has these kinds of problems. What OS are you building on?

Also, are you sure you’re not inadvertently bringing in the wrong version of some .so from a different build? And are you sure that all of your build is internally self-consistent (for instance, you didn’t change compile-time parameters between building dtool and panda, or something like that)?

David

Compiled on Gentoo GNU/Linux:

2.6.28-gentoo #2 SMP
Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux

With g++ version: 4.3.2 (Gentoo 4.3.2-r2 p1.5, pie-10.1.5)

Yes. The problems first showed up with a vanilla build (makepanda). To debug, I then rebuilt Panda with --optimize 1, but only after removing everything in the ./built directory and /usr/local/panda3d, so everything was built from scratch in one go. That didn’t resolve the problem either.

I then rebuilt a few more times, hacking custom compiler options into makepanda.py for -ggdb. I also tried -DUSE_STL_ALLOCATOR -DDO_MEMORY_USAGE to simplify the code around the crash site a bit and get a cleaner backtrace.

Before every rebuild, I cleaned out ./built and /usr/local/lib/panda.

I also double-checked with ldconfig -p | grep panda that the intended libraries are the ones being found.
Just now I also double-checked with ldd on my program that it links against those libraries. It all looks fine.

# ldconfig -p | grep -i panda
        libp3pystub.so (libc6) => /usr/local/panda3d/lib/libp3pystub.so
        libp3ptloader.so (libc6) => /usr/local/panda3d/lib/libp3ptloader.so
        libp3openal_audio.so (libc6) => /usr/local/panda3d/lib/libp3openal_audio.so
        libp3heapq.so (libc6) => /usr/local/panda3d/lib/libp3heapq.so
        libp3glstuff.so (libc6) => /usr/local/panda3d/lib/libp3glstuff.so
        libp3framework.so (libc6) => /usr/local/panda3d/lib/libp3framework.so
        libp3fmod_audio.so (libc6) => /usr/local/panda3d/lib/libp3fmod_audio.so
        libp3dtoolconfig.so (libc6) => /usr/local/panda3d/lib/libp3dtoolconfig.so
        libp3dtool.so (libc6) => /usr/local/panda3d/lib/libp3dtool.so
        libp3direct.so (libc6) => /usr/local/panda3d/lib/libp3direct.so
        libpandastripped.so (libc6) => /usr/local/panda3d/lib/libpandastripped.so
        libpandaskel.so (libc6) => /usr/local/panda3d/lib/libpandaskel.so
        libpandaphysics.so (libc6) => /usr/local/panda3d/lib/libpandaphysics.so
        libpandaode.so (libc6) => /usr/local/panda3d/lib/libpandaode.so
        libpandamesa.so (libc6) => /usr/local/panda3d/lib/libpandamesa.so
        libpandagl.so (libc6) => /usr/local/panda3d/lib/libpandagl.so
        libpandafx.so (libc6) => /usr/local/panda3d/lib/libpandafx.so
        libpandaexpress.so (libc6) => /usr/local/panda3d/lib/libpandaexpress.so
        libpandaeggstripped.so (libc6) => /usr/local/panda3d/lib/libpandaeggstripped.so
        libpandaegg.so (libc6) => /usr/local/panda3d/lib/libpandaegg.so
        libpanda.so (libc6) => /usr/local/panda3d/lib/libpanda.so
        libfmodexp.so (libc6) => /usr/local/panda3d/lib/libfmodexp.so
        libfmodex.so (libc6) => /usr/local/panda3d/lib/libfmodex.so
        libfmod-3.74.so (libc6) => /usr/local/panda3d/lib/libfmod-3.74.so
        libCgGL.so (ELF) => /usr/local/panda3d/lib/libCgGL.so
        libCg.so (ELF) => /usr/local/panda3d/lib/libCg.so


# ldd testprog
        linux-gate.so.1 =>  (0xffffe000)
        libpandafx.so => /usr/local/panda3d/lib/libpandafx.so (0xb7e8c000)
        libpandaexpress.so => /usr/local/panda3d/lib/libpandaexpress.so (0xb7b6e000)
        libpanda.so => /usr/local/panda3d/lib/libpanda.so (0xb5186000)
        libp3dtoolconfig.so => /usr/local/panda3d/lib/libp3dtoolconfig.so (0xb50c5000)
        libp3dtool.so => /usr/local/panda3d/lib/libp3dtool.so (0xb506f000)
        libp3pystub.so => /usr/local/panda3d/lib/libp3pystub.so (0xb5069000)
        libpandaphysics.so => /usr/local/panda3d/lib/libpandaphysics.so (0xb4e67000)
        libp3direct.so => /usr/local/panda3d/lib/libp3direct.so (0xb4c60000)
        libp3framework.so => /usr/local/panda3d/lib/libp3framework.so (0xb4bdb000)
        libpthread.so.0 => /lib/libpthread.so.0 (0xb4bc4000)
        libdl.so.2 => /lib/libdl.so.2 (0xb4bbf000)
        libutil.so.1 => /lib/libutil.so.1 (0xb4bbb000)
        libpython2.5.so.1.0 => /usr/lib/libpython2.5.so.1.0 (0xb4a82000)
        libstdc++.so.6 => /usr/lib/gcc/i686-pc-linux-gnu/4.3.2/libstdc++.so.6 (0xb4995000)
        libm.so.6 => /lib/libm.so.6 (0xb496f000)
        libgcc_s.so.1 => /usr/lib/gcc/i686-pc-linux-gnu/4.3.2/libgcc_s.so.1 (0xb4961000)
        libc.so.6 => /lib/libc.so.6 (0xb4830000)
        libCg.so => /usr/local/panda3d/lib/libCg.so (0xb4529000)
        libz.so.1 => /lib/libz.so.1 (0xb4515000)
        libssl.so.0.9.8 => /usr/lib/libssl.so.0.9.8 (0xb44cc000)
        libpng12.so.0 => /usr/lib/libpng12.so.0 (0xb44a5000)
        libjpeg.so.62 => /usr/lib/libjpeg.so.62 (0xb447d000)
        libtiff.so.3 => /usr/lib/libtiff.so.3 (0xb4424000)
        libfreetype.so.6 => /usr/lib/libfreetype.so.6 (0xb438f000)
        /lib/ld-linux.so.2 (0xb7f27000)
        libgssapi_krb5.so.2 => /usr/lib/libgssapi_krb5.so.2 (0xb4365000)
        libkrb5.so.3 => /usr/lib/libkrb5.so.3 (0xb42d5000)
        libcom_err.so.2 => /lib/libcom_err.so.2 (0xb42d1000)
        libk5crypto.so.3 => /usr/lib/libk5crypto.so.3 (0xb42a9000)
        libresolv.so.2 => /lib/libresolv.so.2 (0xb4297000)
        libcrypto.so.0.9.8 => /usr/lib/libcrypto.so.0.9.8 (0xb4146000)
        libkrb5support.so.0 => /usr/lib/libkrb5support.so.0 (0xb413c000)

This is Panda 1.5.4. I guess I could grab the latest CVS version, and maybe try an older gcc version.

I tried with a 3.x gcc (3.4.6) - didn’t resolve anything.

I tried manually calling init_system_type_handles(), e.g.:

#include <actorNode.h>
#include <physical.h>


int main(int argc, char* argv[])
{


  init_system_type_handles();

  PT(ActorNode) an = new ActorNode("jetpack-guy-physics");
  PT(Physical) test = new Physical();
 
  // The line below throws these runtime errors:
  //
  // Invalid TypeHandle index -1208867616!  Is memory corrupt?
  //
  // testprog: dtool/src/dtoolbase/typeHandle.cxx:56:
  //   void TypeHandle::inc_memory_usage
  //     (TypeHandle::MemoryClass, int):
  //
  //     Assertion `rnode != (TypeRegistryNode *)__null' failed.
     
  an->add_physical(test);

  return 0;
}

But that doesn’t fix the crash; the same issue remains.
So that function must somehow not be doing what it’s supposed to do.

I did a small test.

In dtool/src/dtoolbase/register_type.cxx, where the pvector type gets registered, I inserted a small debug print:


    // ...
    std::cout << "registering type pvector" << std::endl;
    std::cout.flush();
    register_type(pvector_type_handle, "pvector");
    // ...
 

Now my compiled test program shows the type being registered, but it still crashes:

$ bin/crashtest 
registering type pvector
Segmentation fault

I modified the register_type function for extra debugging; see:
linkerror.com/stuff/typeRegistry.cxx
With that version, I get the following output:
linkerror.com/stuff/out.txt

So it looks more and more to me like the initialization is called and completes just fine…


I was going to try the CVS version (1.6.0), but I can’t even get that to compile.
It seemed to need a different fmodex version, so I grabbed the latest one and got a bit further. Then it couldn’t find the Python headers and libraries, so I hacked makepanda.py to add the output of python-config --libs and python-config --cflags to the compiler flags. That got me a bit further, but now I get:

built/bin/interrogate -srcdir panda/src/express -Ipanda/src/express -D__STDC__=1 -D__cplusplus -D__inline -D__const=const -D__i386__ -DFORCE_INLINING -oc built/tmp/libexpress_igate.cxx -od built/pandac/input/libexpress.in -fnames -string -refcount -assert -python-native -Sbuilt/include/parser-inc -Ipanda/src/express -S"/usr/include/python2.5" -S"built/tmp" -S"built/include" -DBUILDING_PANDAEXPRESS -module pandaexpress -library libexpress buffer.h checksumHashGenerator.h circBuffer.h config_express.h datagram.h datagramGenerator.h datagramIterator.h datagramSink.h dcast.h encrypt_string.h error_utils.h export_dtool.h express_composite.cxx hashGeneratorBase.h hashVal.h indirectLess.h memoryInfo.h memoryUsage.h memoryUsagePointerCounts.h memoryUsagePointers.h multifile.h namable.h nodePointerTo.h nodePointerToBase.h nodeReferenceCount.h ordered_vector.h pStatCollectorForwardBase.h password_hash.h patchfile.h pointerTo.h pointerToArray.h pointerToArrayBase.h pointerToBase.h pointerToVoid.h profileTimer.h pta_float.h pta_uchar.h ramfile.h referenceCount.h stringDecoder.h subStream.h subStreamBuf.h textEncoder.h threadSafePointerTo.h threadSafePointerToBase.h trueClock.h typedReferenceCount.h typedef.h unicodeLatinMap.h vector_float.h vector_uchar.h virtualFile.h virtualFileComposite.h virtualFileList.h virtualFileMount.h virtualFileMountMultifile.h virtualFileMountSystem.h virtualFileSimple.h virtualFileSystem.h weakPointerCallback.h weakPointerTo.h weakPointerToBase.h weakPointerToVoid.h weakReferenceList.h windowsRegistry.h zStream.h zStreamBuf.h
        *** Error in ~/development/lib/panda3d_cvs/built/include/dtoolbase_cc.h near line 105, column 2:
        syntax error, unexpected IDENTIFIER, expecting '{' or ';' or ':' or '='
Error parsing file: 'buffer.h'

Line 105 in dtoolbase_cc.h is:

typedef ios::seekdir ios_seekdir;

I don’t see anything wrong with that so I guess I’m still stuck. :frowning:
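As a quick check that the line itself is valid C++, the following standalone sketch compiles with a conforming compiler. That is consistent with the error coming from interrogate's own C++ parser (which reads the stand-in headers passed via -Sbuilt/include/parser-inc) rather than from the header being wrong. The using std::ios; line is an assumption standing in for however dtoolbase_cc.h brings std names into scope.

```cpp
#include <cassert>
#include <ios>
#include <type_traits>

// Assumption for this sketch: only the one std name we need is pulled in.
using std::ios;

// The line interrogate rejected. A conforming compiler accepts it:
// ios is std::basic_ios<char>, which inherits seekdir from std::ios_base.
typedef ios::seekdir ios_seekdir;

// Checked at compile time: the typedef names the standard seek-direction type.
static_assert(std::is_same<ios_seekdir, std::ios_base::seekdir>::value,
              "ios::seekdir is std::ios_base::seekdir");
```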

Help? :slight_smile:

Maybe something is wrong with your parser-inc directory? This contains header files that interrogate needs to properly understand the C++ code it reads. It is strange that it yelled about line 105, though, after a string of similar lines 101 - 104.

You certainly appear to be running in a strange alien universe, compared to the rest of us. :confused: Did other C++ examples work properly before you got to the jetpack guy problem? Does pview run properly? Do the Python examples work properly?

David

Well, I just re-downloaded it to rule that out…
Unless something’s wrong with panda3d.org/download/panda3d-1.5 … 5.4.tar.gz, I don’t see how that could happen.

Also, there were no errors during compilation or anything… :frowning:

The Python examples and pview run just fine, and so did my own code until I used the physics library with the code above.

Problem solved.

pro-rsoft helped me tackle it; he was kind enough to spend over two hours on this one. If anyone is ever concerned about support in Panda3D, they shouldn’t be! :smiley:

The cause of it isn’t any less mysterious though…

I was linking with -lp3pystub in combination with python-config --libs, which includes -lpython2.5, and pystub shadows all the functions Panda uses from Python. I’m not sure whether that could cause issues…

Either way, the problem went away when he compiled 1.6.0 for me on my machine, so it may have been something that was fixed in 1.6.0. :slight_smile:

So, problem solved! :slight_smile:

Infinite thanks to pro-rsoft for this!

Wow, that’s a good one.

Thanks, pro-rsoft!

David