Anyone up for OSX work?


#202

Mhm, that explains it, it wasn’t a debug build. Try again with the new dmg I just uploaded. It also includes a potential fix to what might be the problem, but I’m not sure.

Doesn’t pview crash too when you hit the “P” key? (sorry, I forgot to mention that.) (Also, a full traceback would be useful too, with “bt full”.)


#203

Sorry about that, I didn’t know of the “P” key :slight_smile:

Yes, it crashes.

The new dmg doesn’t seem to have debug symbols either, since even if I don’t have the sources installed, it should show something like this:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000090
0x00001fdf in main (argc=1, argv=0xbffff548) at test.c:14
14	test.c: No such file or directory.
	in test.c
(gdb) list
9	in test.c

Here’s the binary info:

machine:proj user$ ls -l /Applications/Panda3D/1.6.0/bin/pview
-rwxr-xr-x@ 1 user  admin  114096 28 feb 13:55 /Applications/Panda3D/1.6.0/bin/pview
machine:proj user$ file /Applications/Panda3D/1.6.0/bin/pview
/Applications/Panda3D/1.6.0/bin/pview: Mach-O universal binary with 2 architectures
/Applications/Panda3D/1.6.0/bin/pview (for architecture ppc7400):	Mach-O executable ppc
/Applications/Panda3D/1.6.0/bin/pview (for architecture i386):	Mach-O executable i386

Here is the gdb output from the latest version:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000000
0x02da7caf in RenderState::get_generated_shader ()
(gdb) where
#0  0x02da7caf in RenderState::get_generated_shader ()
#1  0x01126136 in GLGraphicsStateGuardian::set_state_and_transform ()
#2  0x02faa923 in CullBinStateSorted::draw ()
#3  0x02ceda10 in CullResult::draw ()
#4  0x031c2ad6 in GraphicsEngine::do_draw ()
#5  0x031c4429 in GraphicsEngine::draw_bins ()
#6  0x031c4768 in GraphicsEngine::draw_bins ()
#7  0x031c759a in GraphicsEngine::WindowRenderer::do_frame ()
#8  0x031d8b0e in GraphicsEngine::render_frame ()
#9  0x00068a7b in PandaFramework::task_igloop ()
#10 0x03251874 in GenericAsyncTask::do_task ()
#11 0x03255cf4 in AsyncTask::unlock_and_do_task ()
#12 0x0325f5f8 in AsyncTaskChain::service_one_task ()
#13 0x0325ff26 in AsyncTaskChain::do_poll ()
#14 0x0325ffdb in AsyncTaskManager::poll ()
#15 0x00068a45 in PandaFramework::do_frame ()
#16 0x00068c0c in PandaFramework::main_loop ()
#17 0x00002d80 in main ()
(gdb) bt full
#0  0x02da7caf in RenderState::get_generated_shader ()
No symbol table info available.
#1  0x01126136 in GLGraphicsStateGuardian::set_state_and_transform ()
No symbol table info available.
#2  0x02faa923 in CullBinStateSorted::draw ()
No symbol table info available.
#3  0x02ceda10 in CullResult::draw ()
No symbol table info available.
#4  0x031c2ad6 in GraphicsEngine::do_draw ()
No symbol table info available.
#5  0x031c4429 in GraphicsEngine::draw_bins ()
No symbol table info available.
#6  0x031c4768 in GraphicsEngine::draw_bins ()
No symbol table info available.
#7  0x031c759a in GraphicsEngine::WindowRenderer::do_frame ()
No symbol table info available.
#8  0x031d8b0e in GraphicsEngine::render_frame ()
No symbol table info available.
#9  0x00068a7b in PandaFramework::task_igloop ()
No symbol table info available.
#10 0x03251874 in GenericAsyncTask::do_task ()
No symbol table info available.
#11 0x03255cf4 in AsyncTask::unlock_and_do_task ()
No symbol table info available.
#12 0x0325f5f8 in AsyncTaskChain::service_one_task ()
No symbol table info available.
#13 0x0325ff26 in AsyncTaskChain::do_poll ()
No symbol table info available.
#14 0x0325ffdb in AsyncTaskManager::poll ()
No symbol table info available.
#15 0x00068a45 in PandaFramework::do_frame ()
No symbol table info available.
#16 0x00068c0c in PandaFramework::main_loop ()
No symbol table info available.
#17 0x00002d80 in main ()
No symbol table info available.
(gdb) 

#204

Darn it. Must be a bug in makepanda.
I’ve made it print some debug info now - can you try the new dmg?


#205

Since where and bt full don’t give any usable info, I’ll skip those.

$ pview
Known pipe types:
  osxGraphicsPipe
(all display modules loaded.)
a:0x1c698714:3
b:1
c:0xf8e6cc:3
d:0xf8e1a4:16
1:0xf8e6cc
2:0
3:0
Bus error

$ gdb pview
...
a:0x1c82a524:3
b:1
c:0xf8e6cc:3
d:0xf8e1a4:16
1:0xf8e6cc
2:0
3:0

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000000
0x02da7b6e in RenderState::get_generated_shader ()

The first run is direct; the second is through gdb.


#206

The problem seems to be that ShaderGeneratorBase::get_default() returns NULL. I am lost though, I don’t get the ShaderGeneratorBase thing - it’s new to me. I’m even more confused as to why I’m not getting the error here.
It appears David created that file when he separated pgraph and pgraphnodes.
It’s supposed to be initialized in config_pgraphnodes.cxx:

ShaderGenerator::set_default(new ShaderGenerator());

But I don’t see ShaderGeneratorBase::set_default being called anywhere.

drwr, could you maybe shine some light on this?


#207

I only created ShaderGeneratorBase to allow the division of pgraph into two smaller directories, pgraph and pgraphnodes, simply because pgraph was getting too large to compile on certain platforms.

But ShaderGeneratorBase::set_default() is the same thing as ShaderGenerator::set_default(), which is, as you noted, called in config_pgraphnodes. There’s no need to call anything else.

David


#208

It’s very weird that it’s still NULL on his machine, then. WhiteFang, try out the new build, which prints some more info, to see if set_default gets called at all.


#209

From the March 02 version:

$ pview
Set here
default SG :0x1013024
default SGB:0x1013024
Known pipe types:
osxGraphicsPipe
(all display modules loaded.)
a:0x1c716814:3
b:1
c:0xf8e6cc:3
d:0xf8e1a4:16
1:0xf8e6cc
2:0
3:0
Bus error


#210

Mysterious. I’m stumped.

It means the default certainly does get set to a ShaderGenerator instance by set_default.
But later, get_default returns NULL.
Somewhere in between that, it must have been set to NULL.
But I can’t find any single reference to set_default or _default_generator in the Panda source except for the one in pgraphnodes.


#211

I concur. There are no references that I can find.

Furthermore, the code:

ShaderGenerator *ShaderGenerator::
get_default() {
  if (_default_generator == (ShaderGenerator *)NULL) {
    _default_generator = new ShaderGenerator;
  }
  return _default_generator;
}

Can’t get much simpler…

GCC on Mac OS X returns 0 for uninitialized variables - same as NULL, so the if() should work.

So if this is the problem, then I’m stumped as well :neutral_face:


#212

Try putting a cerr statement within the get_default() method, to ensure that it is being called and that it is initializing the pointer correctly. Also put a cerr statement when the pointer is being queried, to ensure that this happens after get_default() has been called.

David


#213

Okay. WhiteFang, try the new DMG. I recompiled it completely from scratch (this time, gdb “bt full” should work too). I added some extra debug output to set_default, get_default, and the constructor and destructor.


#214

(gdb) run
Starting program: /Applications/Panda3D/1.6.0/bin/pview
Reading symbols for shared libraries ++++++++++warning: .o file “/Developer/SDKs/MacOSX10.4u.sdk/usr/lib/gcc/i686-apple-darwin9/4.0.1/libgcc_eh.a(unwind-dw2.o)” more recent than executable timestamp
warning: .o file “/Developer/SDKs/MacOSX10.4u.sdk/usr/lib/gcc/i686-apple-darwin9/4.0.1/libgcc_eh.a(unwind-dw2-fde-darwin.o)” more recent than executable timestamp
… done
Set here
SGB Constructed : 0x7212f74
current :0
set_default called, with SGB 0x7212f74
SGB::get_default called: 0x7212f74:1
default SG :0x7212f74
SGB::get_default called: 0x7212f74:1
default SGB:0x7212f74
Reading symbols for shared libraries warning: Could not find object file “/Users/pro-rsoft/panda3d/built/tmp/pandagl_pandagl.o” - no debug information available for “panda/metalibs/pandagl/pandagl.cxx”.

warning: Could not find object file “/Users/pro-rsoft/panda3d/built/tmp/glgsg_config_glgsg.o” - no debug information available for “panda/src/glgsg/config_glgsg.cxx”.

warning: Could not find object file “/Users/pro-rsoft/panda3d/built/tmp/glgsg_glgsg.o” - no debug information available for “panda/src/glgsg/glgsg.cxx”.

warning: Could not find object file “/Users/pro-rsoft/panda3d/built/tmp/osxdisplay_composite.o” - no debug information available for “panda/src/osxdisplay/osxdisplay_composite.mm”.

.warning: Could not find object file “/Users/pro-rsoft/panda3d/built/tmp/glstuff_glpure.o” - no debug information available for “panda/src/glstuff/glpure.cxx”.

… done
Known pipe types:
osxGraphicsPipe
(all display modules loaded.)
Reading symbols for shared libraries . done
Reading symbols for shared libraries . done
Reading symbols for shared libraries . done
Reading symbols for shared libraries . done
Reading symbols for shared libraries . done
Reading symbols for shared libraries . done
Reading symbols for shared libraries … done
Reading symbols for shared libraries . done
Reading symbols for shared libraries … done
Reading symbols for shared libraries . done
Reading symbols for shared libraries . done
Reading symbols for shared libraries … done
Reading symbols for shared libraries . done
Reading symbols for shared libraries . done
a:0x1e429964:3
b:1
c:0x27d6cc:3
d:0x27d1a4:16
1:0x27d6cc
2:0
SGB::get_default called: 0
3:0

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000000
0x039f4dfe in RenderState::get_generated_shader ()
(gdb) where
#0 0x039f4dfe in RenderState::get_generated_shader ()
#1 0x07325969 in GLGraphicsStateGuardian::set_state_and_transform ()
#2 0x03bf86e7 in CullHandler::draw ()
#3 0x03bf6d31 in CullBinStateSorted::draw ()
#4 0x0392eefc in CullResult::draw ()
#5 0x03e0d6c1 in GraphicsEngine::do_draw ()
#6 0x03e0e3c9 in GraphicsEngine::draw_bins ()
#7 0x03e0e6f4 in GraphicsEngine::draw_bins ()
#8 0x03e116b5 in GraphicsEngine::WindowRenderer::do_frame ()
#9 0x03e124ab in GraphicsEngine::render_frame ()
#10 0x000d870d in PandaFramework::task_igloop ()
#11 0x03ea0dd4 in GenericAsyncTask::do_task ()
#12 0x03ea9c8f in AsyncTask::unlock_and_do_task ()
#13 0x03eade4e in AsyncTaskChain::service_one_task ()
#14 0x03eae981 in AsyncTaskChain::do_poll ()
#15 0x03eaea6c in AsyncTaskManager::poll ()
#16 0x000d8505 in PandaFramework::do_frame ()
#17 0x000d8549 in PandaFramework::main_loop ()
#18 0x00003288 in main ()
(gdb) bt full
#0 0x039f4dfe in RenderState::get_generated_shader ()
No symbol table info available.
#1 0x07325969 in GLGraphicsStateGuardian::set_state_and_transform ()
No symbol table info available.
#2 0x03bf86e7 in CullHandler::draw ()
No symbol table info available.
#3 0x03bf6d31 in CullBinStateSorted::draw ()
No symbol table info available.
#4 0x0392eefc in CullResult::draw ()
No symbol table info available.
#5 0x03e0d6c1 in GraphicsEngine::do_draw ()
No symbol table info available.
#6 0x03e0e3c9 in GraphicsEngine::draw_bins ()
No symbol table info available.
#7 0x03e0e6f4 in GraphicsEngine::draw_bins ()
No symbol table info available.
#8 0x03e116b5 in GraphicsEngine::WindowRenderer::do_frame ()
No symbol table info available.
#9 0x03e124ab in GraphicsEngine::render_frame ()
No symbol table info available.
#10 0x000d870d in PandaFramework::task_igloop ()
No symbol table info available.
#11 0x03ea0dd4 in GenericAsyncTask::do_task ()
No symbol table info available.
#12 0x03ea9c8f in AsyncTask::unlock_and_do_task ()
No symbol table info available.
#13 0x03eade4e in AsyncTaskChain::service_one_task ()
No symbol table info available.
#14 0x03eae981 in AsyncTaskChain::do_poll ()
No symbol table info available.
#15 0x03eaea6c in AsyncTaskManager::poll ()
No symbol table info available.
#16 0x000d8505 in PandaFramework::do_frame ()
No symbol table info available.
#17 0x000d8549 in PandaFramework::main_loop ()
No symbol table info available.
#18 0x00003288 in main ()
No symbol table info available.
(gdb)


#215

Really? That’s weird. Can you try the new DMG? (not through gdb this time)
It just prints out the address of the default SG pointer.

I still don’t know why gdb prints no extra debug information.


#216

I’ll try when I get home (in a few hours).

I did some testing, and yes, if a variable isn’t assigned a value, it returns 0.

That is - unless there hasn’t been declared/assigned a value to some other variable. Then it returns 4096.

$ cat und.c
#include <stdio.h>

void main(void)
{
	int i=4;
	int j;
	int *k=&i;
	int *l;
	printf("%d\n%d\n",i,j);
	printf("%d\n%d\n",k,l);
}

gives

$ ./und
4
0
-1073743696
4096

#217

No, I doubt it. It’s just returning a random value, whatever happens to be in memory at that particular address at the time the program started. It might be 0 more often than not, but that’s just the way the ball bounces.

David


#218

And the latest run :slight_smile:

$ pview
Set here
SGB Constructed : 0x7213014
  current :0x5cbc698
set_default called
  new 0x5cbc698
SGB::get_default called: 0x5cbc698
default SG :0x7213014
SGB::get_default called: 0x5cbc698
default SGB:0x7213014
Known pipe types:
  osxGraphicsPipe
(all display modules loaded.)
a:0x1e29daf4:3
b:1
c:0x27d6cc:3
d:0x27d1a4:16
1:0x27d6cc
2:0
SGB::get_default called: 0x5cbc698
3:0
Bus error

#219

The address of the pointer is the same. Nothing else sets the variable. Still, suddenly it becomes NULL.
So unless Panda calls *((int*)0x5cbc698) = 0; somewhere, this is impossible in my eyes. I’m lost here.


#220

Ah, I think I understand what’s going on. This is a static-init ordering problem.

This is one of those real nasty C++ problems; and one that I’m quite familiar with (we’ve been fighting it in various forms for years). It’s also one of the reasons I’m not at all a fan of doing a lot of stuff automatically in static init, but one of our early Panda developers thought this was a swell idea and started us down this path, and it’s too late to go back now.

Static init is a concept that was introduced with the development of C++ and its constructors. Originally, when all compiled programs were written in C or some similar non-object-oriented language, there wasn’t much code that ran before main() was called; just some startup stuff hardcoded into the system runtime libraries. C allows you to define global or “static” variables outside of any function scope, and even give them initial values, like this:

int x = 10;
int main() {
  ...
}

which means that at the time main() is called, x already exists and has the value 10. This was implemented by preloading a memory image that already had the right bits in the right place when it was loaded from disk; no code was necessary to run before main in order to assign 10 to x.

But, now introduce C++ and its constructors. Now you can declare an object outside of main that has a constructor. According to C++ semantics, that constructor has to be called to initialize that object, and thus you now have user code that is running before main:

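#include <iostream>
using std::cerr;

class Thing {
public:
  // Runs during static initialization, before main() is entered.
  Thing() { cerr << "initializing\n"; }
};
Thing x;
int main() {
  cerr << "running main\n";
  return 0;
}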

This caused a sea change in system library support, because suddenly the system runtime loader has to support calling user code automatically when a program is started, or even when a .so is loaded in at runtime.

But anyway. Part of Panda’s low-level design takes advantage of these static initializers to call all sorts of setup functions when the libraries are loaded. init_libpgraphnodes() is one of those functions, and one of the things it calls is ShaderGenerator::set_default(new ShaderGenerator()). This gets called at static init time, by virtue of a class object with a constructor, and so it is supposed to be called automatically when libpgraphnodes.so gets loaded into the running program. So, we’re supposed to be guaranteed that the ShaderGenerator already has a default value set by the time we start running.

But wait! We also have a static constructor in libpgraph.so. It looks like this:

PT(ShaderGeneratorBase) ShaderGeneratorBase::_default_generator;

Don’t see the static constructor? It’s hard to see, isn’t it? Welcome to the joys of C++, where code can be hidden from the programmer. In fact, there’s a default constructor for the class PT(ShaderGeneratorBase), and the default constructor’s job is to initialize its pointer to NULL.

So, as long as libpgraph.so’s static constructors are called before the ones in libpgraphnodes.so, then everything is good: the default constructor for _default_generator will be called, ensuring that pointer is NULL. Then the static constructors in libpgraphnodes.so will be called, which will call set_default(), reassigning the pointer to a valid value. But, if the static constructors happen to get called in the opposite order, we have a terrible situation: the set_default() will be called first, assigning the pointer to a valid value, and then the default constructor will be called later, reassigning the pointer to NULL! That’s certainly what’s happening here.

Unfortunately, the system does not guarantee any ordering of static init constructors between different .so’s. It’s absolutely unpredictable. So on one system, it might call these in the correct order, and on another system, it might call them in the incorrect order. The ordering might even change from one day to the next.

So, basically, I introduced this bug when I split up libpgraph.so and libpgraphnodes.so, because in doing so I introduced a nondeterministic behavior between these static initializers. But because C++ tries so hard to make things automatic, the bug is extremely hard to see until it bites you, and you spend days isolating it down to discover that a pointer is getting reset to NULL after you had thought it was properly set.

I’ll fix the bug now. It’s easy to fix, by replacing the PT(ShaderGeneratorBase) with an ordinary ShaderGeneratorBase * pointer. The reason this will fix the problem is that an ordinary pointer doesn’t have a constructor, so its default value will be set to NULL by preloading the memory image, and so there’s no longer an ordering issue between static initializers. (I’ll also have to explicitly manage the reference counts in set_default() to compensate for this change, but that’s not so bad.)

My apologies for the long trip down a dark corridor I caused you guys.

David


#221

It was in this moment, when all hope had faded, that Isild-- oh, wrong line :slight_smile:

But seriously, thanks a lot! You’re the best :slight_smile: Also thanks for the clear explanation; next time something like this occurs, I’ll know what is actually happening.

We’ll know for sure when compiling has finished, but I’d bet my bottom dollar on it.