interrogate2swig.py


#3

Obviously, Hugh is some kind of mad scientist.

The thing that scares me about this is that every 3D engine with Python bindings that I’ve started messing around with sooner or later starts changing them to swig, or boost, or whatever, for whatever reasons… and then the bindings end up being dropped and have to be picked up by someone else. Who promptly decides to do it using a different method! Of course, that’s not too likely here, since Python is decidedly tied to Panda3d as the primary language supported and documented.

I think that if you guys change the way it works, a primary concern should be whether it will make it easier for developers to contribute new C++ code to Panda3d. The users only using Python are not going to be affected, by and large. So it should be most important that the method used is the one that makes extending Panda3d easy, and accessible to the most coders.

Would using SWIG make it easier (or harder) for someone to create bindings for other languages such as java or ruby?


#4

This change (if it works) will probably be largely transparent to the developer. Developers will continue to specify the Python interface using the PUBLISHED: section in the headerfiles.

Of course, Swig is relatively well-known, so anyone who wanted to dig into the details of how the build works would have a reasonable chance of knowing straightaway how Swig works; and of course they can use it on other projects.

[quote]Would using SWIG make it easier (or harder) for someone to create bindings for other languages such as java or ruby?[/quote]

Swig supports the following languages:

  • C#
  • Chicken
  • Guile
  • Java
  • Modula-3
  • MzScheme
  • Ocaml
  • Perl5
  • PHP4
  • Pike
  • Python
  • Ruby
  • Tcl

I think it’s almost as easy as specifying -java instead of -python to generate bindings in java.
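
As a rough illustration of that point, here is a minimal sketch of how a build script might assemble the swig command line for different target languages. The helper name `swig_command` and the file names are hypothetical; `-c++`, `-python`, `-java` and `-o` are standard swig options. Any Python-specific typemaps would still need per-language equivalents, so this only covers the command-line side.

```python
def swig_command(interface_file, language="python", output=None):
    """Build (but do not run) a swig command line for one target language."""
    cmd = ["swig", "-c++", "-" + language]
    if output is not None:
        # -o names the generated C++ wrapper file
        cmd += ["-o", output]
    cmd.append(interface_file)
    return cmd


# Hypothetical file names, for illustration only.
print(" ".join(swig_command("panda.i", "python", "panda_wrap.cxx")))
print(" ".join(swig_command("panda.i", "java", "panda_wrap.cxx")))
```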

Some background on SWIG: “Originally developed in 1995, SWIG was first used by scientists in the Theoretical Physics Division at Los Alamos National Laboratory for building user interfaces to simulation codes running on the Connection Machine 5 supercomputer. In this environment, scientists needed to work with huge amounts of simulation data, complex hardware, and a constantly changing code base. The use of a scripting language interface provided a simple yet highly flexible foundation for solving these types of problems. SWIG simplifies development by largely automating the task of scripting language integration--allowing developers and users to focus on more important problems.”

Hugh


#5

Current status:

  • swig will run without errors now
  • it occurred to me that I don’t need to process macros manually in the script: Swig is perfectly capable of handling them itself, as long as we tell it about the macro definitions
  • issues with compiling resulting file, see below
  • possible namespace issues, see below

Template-style defines

Panda uses defines instead of templates, which probably makes this a lot easier than it otherwise would be. Nevertheless we do have to do something with these defines.

  • FLOATNAME and FLOATTYPE are being defined to name##f and float respectively.
  • CDP is being hardcoded to name##Swig, since presumably these classes are so low-level that they don’t actually matter.
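
A minimal sketch of how the script might emit these definitions at the top of the generated .i file, so that swig's own preprocessor expands them. It assumes FLOATNAME and CDP are function-like macros taking a single `name` argument; the helper `macro_preamble` and its parameters are hypothetical.

```python
def macro_preamble(float_suffix="f", float_type="float"):
    """Return #define lines to place at the head of the swig interface file."""
    lines = [
        # FLOATNAME(LVector3) would expand to LVector3f with the default suffix
        "#define FLOATNAME(name) name##%s" % float_suffix,
        "#define FLOATTYPE %s" % float_type,
        # CDP is hardcoded, as described in the list above
        "#define CDP(name) name##Swig",
    ]
    return "\n".join(lines) + "\n"


print(macro_preamble())
```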

Info on swig pass

Swig will take the generated .i file, and process it in about 30 seconds.

The output is:

  • panda.py (1.5MB)
  • panda_wrap.cxx (9.5MB)

Currently it ignores anything with any hint of being a template, since we’re not telling it about templates (it expects a %template line for each template instantiation, which is doable, but we haven’t done this yet).

Compilation issues

Since we just took all the .h files in panda/src, it looks like we’re probably taking more than we should. For example, char_headers.h and a few other _headers files don’t seem to want to build on their own.

Also, a lot of these .h files want files from the thirdparty directory, which I’d rather exclude for now, for a few reasons. At least, make it optional.

Script has been modified to:

  • only take files from a specific set of directories (audio, chan, char, collide, device, dgraph etc), missing out the hardware specific ones, such as mesagl, wgl etc
  • ignore any files starting in test_ or ending in _headers.h

… but errors persist.
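
The two filtering rules above could be sketched roughly as follows. The function name `wanted_header` is hypothetical, and the directory set only contains the directories named in the list; the real script's list is longer.

```python
import os

# Only the directories named above; the full list is not given in this post.
ALLOWED_DIRS = {"audio", "chan", "char", "collide", "device", "dgraph"}


def wanted_header(path, allowed_dirs=ALLOWED_DIRS):
    """Return True if this header should go into the swig interface."""
    directory = os.path.basename(os.path.dirname(path))
    name = os.path.basename(path)
    if directory not in allowed_dirs:
        return False  # e.g. hardware-specific directories such as wgl
    if not name.endswith(".h"):
        return False
    if name.startswith("test_") or name.endswith("_headers.h"):
        return False
    return True


# Hypothetical paths, for illustration only.
candidates = [
    "panda/src/char/character.h",
    "panda/src/char/char_headers.h",
    "panda/src/wgl/wglGraphicsPipe.h",
]
print([p for p in candidates if wanted_header(p)])
```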

Current strategy is to think about whether we can plug the script into makepanda.py somehow, so that makepanda.py is going to describe what headers we process.

Presumably, makepanda.py can tell us:

  • the module name
  • the header files associated with this module name
  • composite file names (not sure why/if we need these?)

Worst case, makepanda.py can simply call something like registerfilesforswigprocessing.py, and later we take the output from that, and process it in one go, in batch-type style.

Namespace issues

Swig doesn’t do anything with namespaces: it just ignores them.

Looking at the manual for Panda, it looks like interrogate is providing some namespace information?

For example, we have:

import direct.directbase.DirectStart

So, direct is the module name, and I’m guessing that directbase is a namespace? And DirectStart is probably a class?

This would essentially break backwards-compatibility with old scripts?

However, the impact could perhaps be limited to just the import commands at the head of each script?

Hugh

Current script is at


#6

Things looking resolved:

  • Integration with makepanda.py works ok; this will probably help with file selection; it also means we know which module a file is associated with

New issues:

  • since we stripped out all the preprocessor symbols, the headerfile includes classes that shouldn’t be there, and which are not visible in the headerfiles at compile time

Possible solutions:

  • run all headerfiles through the preprocessor before mashing them into the interface
  • persuade swig to handle the #ifdefs etc for us. One issue with this is that our sorting code assumes that preprocessor directives don’t matter, and handles sorting at the class level, treating what is in “class … { … };” as a unit.

Current strategy: run all headerfiles through preprocessor

  • we’ll use the compiler option to do preprocessing and add the # comments. This seems to work with both g++ and cl
  • we’ll use a parser that will strip everything from this output that didn’t come from our original file
  • this will hopefully have the happy side-effect that we no longer have to consider macro expansion replacement at all
  • then we’ll put it through the grinder above
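
A minimal sketch of the stripping step, assuming the preprocessor is run with line markers enabled (for example `g++ -E` or `cl /E`). g++ emits markers of the form `# 12 "file.h" 1` and cl emits `#line 12 "file.h"`. The function name `keep_own_lines` is hypothetical, and matching on the bare file name assumes header names are unique across the tree.

```python
import re

# Matches both '# 12 "file.h" 1' (g++) and '#line 12 "file.h"' (cl).
LINEMARKER = re.compile(r'^\s*#\s*(?:line\s+)?\d+\s+"([^"]*)"')


def keep_own_lines(preprocessed_text, header_name):
    """Keep only the lines that originated in header_name itself."""
    keep = False
    kept = []
    for line in preprocessed_text.splitlines():
        match = LINEMARKER.match(line)
        if match:
            # Normalise Windows separators (cl writes them doubled).
            current = match.group(1).replace("\\\\", "/").replace("\\", "/")
            keep = current.split("/")[-1] == header_name
            continue  # the marker itself is never copied to the output
        if keep and line.strip():
            kept.append(line)  # blank lines are dropped
    return "\n".join(kept)


sample = (
    '# 1 "foo.h"\n'
    "class Foo {};\n"
    '# 1 "bar.h" 1\n'
    "class Bar {};\n"
    '# 3 "foo.h" 2\n'
    "class Baz {};\n"
)
print(keep_own_lines(sample, "foo.h"))
```

Because macros are already expanded in the preprocessor output, the text that survives this filter needs no further macro handling.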

Hugh


#7

This is sounding really promising.

As to the C++ namespaces, that should not be an issue; Panda doesn’t really use them (at one point we had to support compilers that didn’t understand them).

The Python-scoping convention of things like “direct.directbase.DirectStart” is only scoping Python filenames, not C++ namespace scopes.

However, I think we do publish the occasional nested class here and there.

David


#8

Ok, that’s cool.

Yes, there’s a nested class in panda/src/downloader/downloadDB.h, but luckily I don’t think this is a really critical class for getting Panda running and checking the core is working ok?

Progress update

  • swig runs to completion with no errors, for both pandaexpress and panda
  • the swig-generated .cxx file compiles now! :smiley:
  • and the libpanda.dll link has run to completion

Timings

Swig: ~30 seconds
Compiling Swig output: ~1-2 minutes

Current methodology

interrogate function in makepanda.py rewritten to do simply:

  • run each header file through the preprocessor
  • add all the information it received as arguments to an xml file “swiginfo.xml”
  • swiginfo is a set of elements, each of which has a modulename, a libraryname, and a list of headerfiles
  • we also store a list of defines and include paths in swiginfo just in case that might be useful.

interrogatemodule function modified:

  • just a placeholder for now; returns immediately

new function GenerateSwigFile created:

  • closes swiginfofile, appending to the end first to close the document element
  • calls interrogate2swig.py, passing in the module name, and the path to swiginfo.xml
  • interrogate2swig.py will read all the information from swiginfo.xml corresponding to that module, and generate the sorted swig interface file; basically the code from above
  • back in makepanda.py, swig will be executed on this interface file to generate the appropriate .cxx file, and also the .py file

We call GenerateSwigFile just before linking libpandaexpress.dll and libpanda.dll, then we compile the resulting .cxx, and ensure this is in the list of objects to link for that dll.
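
To make the swiginfo.xml idea concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The element and attribute names (`swiginfo`, `entry`, `headerfile`, `modulename`, `libraryname`) and the sample library and header names are assumptions for illustration; the actual file layout is not shown in this thread. The sketch also builds the tree in memory rather than appending to the file as described above.

```python
import xml.etree.ElementTree as ET


def add_entry(root, module, library, headers):
    """Record one interrogate() call: its module, library and header files."""
    entry = ET.SubElement(root, "entry", modulename=module, libraryname=library)
    for header in headers:
        ET.SubElement(entry, "headerfile", name=header)
    return entry


def headers_for_module(root, module):
    """Collect every header file registered for the given module."""
    return [
        header.get("name")
        for entry in root.findall("entry")
        if entry.get("modulename") == module
        for header in entry.findall("headerfile")
    ]


root = ET.Element("swiginfo")
add_entry(root, "panda", "libchar", ["character.h"])
add_entry(root, "pandaexpress", "libexpress", ["datagram.h"])

# Round-trip through text, as interrogate2swig.py would read it from disk.
reloaded = ET.fromstring(ET.tostring(root, encoding="unicode"))
print(headers_for_module(reloaded, "panda"))
```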

Observations

  • preprocessing the headerfiles makes things a lot easier
  • we could probably just preprocess them in one go, rather than one by one; but doing them one by one doesn’t take very long (5-10 minutes)
  • defining CPPPARSER during the preprocessing is essential, because it maps PUBLISHED: to __published:, rather than to public:, which we can detect in interrogate2swig.py
  • swig doesn’t really understand inherited, nested enums, so the script replaces EnumName by ClassName::EnumName where applicable
  • we skip any file that uses templates, because (i) swig doesn’t like them too much and (ii) the interface published to Python is pretty much template-free anyway
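
The __published: detection mentioned above could look something like this. It is a simplified sketch that assumes each access label sits on its own line in the preprocessed class body; the function name `published_members` is hypothetical, and the real script may well handle more cases.

```python
import re

ACCESS_LABEL = re.compile(r"^\s*(__published|public|protected|private)\s*:\s*$")


def published_members(class_body):
    """Return only the lines that fall under a __published: label."""
    keep = False
    kept = []
    for line in class_body.splitlines():
        match = ACCESS_LABEL.match(line)
        if match:
            # Each label switches copying on or off for the lines after it.
            keep = match.group(1) == "__published"
            continue
        if keep:
            kept.append(line)
    return "\n".join(kept)


body = """private:
  int _x;
__published:
  int get_x() const;
public:
  void internal();
__published:
  void set_x(int x);"""
print(published_members(body))
```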

Overall, it’s looking promising, but I’m anticipating a few new issues when I actually try using the resulting python file in a few minutes.

Hugh


#9

Status update

Managed to get this far:


F:\dev\pandacvspymake>python
Python 2.2.3 (#42, May 30 2003, 18:12:08) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import framework
prc_dir_envvars PANDA_PRC_DIR
prc_dir_envvar_list[i] PANDA_PRC_DIR
prc_dir
>>> import sys
>>> a = framework.PandaFramework()
>>> a.open_framework( len(sys.argv), sys.argv )
>>> a.set_window_title("blah")
>>> window = a.open_window()
Known pipe types:
  wglGraphicsPipe
(3 aux display modules not yet loaded.)

However, at this point, Panda crashes. Not really sure why, or where to go from here. The arguments should be being passed in ok, since I’ve checked in a test piece of code.

Methodology update

  • Created a framework.i file, to save having to understand what Direct does.
  • added in typemaps for the types we are using above
  • typemaps basically map between for example string & and a python type. We’ve got: string &, int &, and char **&.

Hugh


#10

Updates

Not too much further really :confused:

  • building libpanda.dll and libframework.dll separately from the python wrappers,
  • then we link _panda.dll from libpanda.lib and libpanda_module.obj; and we do the same for _framework.dll.

pview.exe runs just fine, so it’s a little (very) odd that the Python-wrapped version crashes. I can’t think of anything that could be particularly crashoriphically different between the two.

The crash occurs in the open_window() call within the wrapper, ie on this line:


    cout << "about to call openwindow" << endl;
    result = (WindowFramework *)(arg1)->open_window();
    cout << "Never gets to here" << endl;

where arg1 is:


PandaFramework *arg1 = (PandaFramework *) 0 ;

I’m really hoping I don’t have to delve into the code to diagnose this, because libpanda.dll takes around 4-8 minutes to link on my machine, even with all optimizations disabled, and incremental linking activated.

Hugh


#11

It does seem very strange. Maybe something didn’t get initialized via static init properly somehow? Could it be a problem with reference counts, something that Panda expected to keep around and Python unexpectedly deleted? But I don’t think there’s anything of that nature returned by the framework library yet.

libframework.dll is its own DLL, which takes very little time to link, so maybe you can discover the problem by delving just within this module.

David


#12

Discovered that there is a free full-featured debugger for msvc: windbg ( microsoft.com/whdc/devtools/ … fault.mspx ) :smiley: It doesn’t seem to be widely advertised. It does pretty much everything the full Visual Studio IDE one does.

Whilst poking around through this, I came across the following:


// We absolutely depend on the static initialization of these pointers
// to happen at load time, before any static constructors are called.
void *(*global_operator_new)(size_t size) = &default_operator_new;
void (*global_operator_delete)(void *ptr) = &default_operator_delete;

No idea if it’s to do with this, but I couldn’t help thinking this looks ominous?

Hugh


#13

Update:

the crash is occurring with the following call stack:


libpanda!FrameBufferProperties::operator=
libpanda!GraphicsEngine::get_frame_buffer_properties
libpanda!GraphicsEngine::make_gsg
libframework!WindowFramework::open_window
libframework!PandaFramework::open_window
_framework!_wrap_PandaFramework_open_window__SWIG_0
_framework!_wrap_PandaFramework_open_window

The crash is occurring in GraphicsEngine::make_gsg, because the GraphicsEngine object’s _frame_buffer_properties pointer is invalid/uninitialized.

Hugh


#14

Woot! Solved!

The problem was not actually (directly) related to Python AFAIK. Basically, we are missing the _engine = 0 initializer in PandaFramework::PandaFramework. Slotted that in and a window appears!

Possibly the memory we receive when we are linked with Python is not initialized to 0, whereas for some reason when we are linked with pview.cxx the memory is already clean?

Hugh


#15

Screenshot:



#16

Ok, got this working with FractalSpline and ported pview to Python as a proof of concept.

Screenshot:

Toolkit with the files you need (you’ll need to download swig, and the latest CVS files first; well those from 2 days ago to be honest, but it will probably work with the latest ones):

manageddreams.com/pandawiki/file … oolkit.zip

This will work on Windows. It may work on unix, but completely untested.

Hugh


#17

To be honest however, what I’m thinking I will probably do for my own project is: leave the Panda build as is, and just link my new classes to the current libpanda.dll, using swig to wrap just my own classes.

The advantage for me and my project is that there is no requirement to build Panda3D itself, or to write patch files for makepanda.py etc.

The advantage for Panda is that I’ll probably stop pushing for Panda to migrate to swig. Of course, this could be a double-edged sword.

In any case, if using wrap-around linking works, and if I’m still thinking along similar lines tomorrow, this basically means that the kit above will not be maintained unless someone else picks it up and uses it.

Nevertheless, as a feasibility study for using Swig, I think it is pretty useful. It’s also not bad as a prototype for how one could create a swig-based build system for Panda.

It shows that:

  • swig can be used successfully to wrap Panda3D
  • swig is much faster than interrogate (30 seconds compared to several hours); and it’s not memory intensive either
  • swig is mature and well designed, following a KISS philosophy

Lessons learned as far as actually using it in a build system, assuming no modification to actual source-code, or manual interface-file building:

  • passing the header files through the c preprocessor first makes life much easier, especially if one defines the CPPPARSER macro
  • swig really wants to know everything about your class, including protected members, in order to decide whether the class should have a constructor.
  • swig doesn’t need to know, and doesn’t care, about any base classes that you don’t care about advertising to your scripting language, even if your advertised classes derive from them
  • swig doesn’t like inherited, nested enums; it won’t handle nested classes at all
  • swig does need to know about any classes you’ll actually use for parameter passing
  • swig does need the classes in sorted order in the interface file
  • I guess one could also put each class in its own interface file and just use %import statements to load the interfaces of classes on which it depends
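
The sorted-order requirement can be met with a simple multi-pass sort, sketched below. This is not the actual code from interrogate2swig.py; the function name `sort_classes` and the dependency table are illustrative. Dependencies that are not themselves in the table (base classes we are not advertising) are ignored, in line with the observation above that swig does not need to know about them.

```python
def sort_classes(deps_by_class):
    """Order classes so each one appears after the known classes it needs."""
    remaining = dict(deps_by_class)
    ordered = []
    while remaining:
        # A class is ready once every dependency is already emitted,
        # or is not part of the interface at all.
        ready = sorted(
            name
            for name, deps in remaining.items()
            if all(d in ordered or d not in deps_by_class for d in deps)
        )
        if not ready:
            raise ValueError("circular dependency among: %s" % sorted(remaining))
        for name in ready:
            ordered.append(name)
            del remaining[name]
    return ordered


# Illustrative dependency table; TypedObject is deliberately not advertised.
classes = {
    "PointLight": ["LightNode"],
    "LightNode": ["PandaNode"],
    "PandaNode": ["TypedObject"],
}
print(sort_classes(classes))
```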

Other observations:

  • there’s no particular requirement for the c++ wrapper files to be linked statically inside libpanda.dll etc; it could be at least as easy to link them to separate dlls that link to libpanda.dll at runtime
  • this reduces the build time for libpanda.dll, and generally makes the build system easier to manage
  • there’s no particular requirement for all the panda classes to be linked inside the same wrapper dll; all these wrapper dlls will be linking with the same libpanda.dll anyway (same static data).

Final thoughts:

  • swig is definitely much faster than interrogate
  • but there’s a fair wodge of work to migrate cleanly from interrogate to swig
  • it could arguably be easier to just upgrade interrogate a little

Hugh

Note that there is a file missing from the above kit, which is pandatypes.i, which should go in the directory above makepanda, panda etc, and which looks as follows:


// Copyright Hugh Perkins 2005
// This code is public domain.

%typemap(in) string & {
   /* Check if is a string */
   if (PyString_Check($input)) {
      //$1 = (string *)malloc( sizeof( string * ) );
      $1 = new string( PyString_AsString( $input ) );
   } else {
      PyErr_SetString(PyExc_TypeError,"not a string");
      return NULL;
   }
}

%typemap(freearg) string & {
   delete( $1 );
   //free( $1 );
}

%typemap(in) string {
   /* Check if is a string */
   if (PyString_Check($input)) {
      $1 = string( PyString_AsString( $input ) );
   } else {
      PyErr_SetString(PyExc_TypeError,"not a string");
      return NULL;
   }
}

%typemap(freearg) string {
   //delete( $1 );
}

%typemap(in) char ** {
   /* Check if is a list */
   if (PyList_Check($input)) {
      int size = PyList_Size($input);
      int i = 0;
      $1 = (char **) malloc((size+1)*sizeof(char *));
      for (i = 0; i < size; i++) {
         PyObject *o = PyList_GetItem($input,i);
         if (PyString_Check(o))
            $1[i] = PyString_AsString(PyList_GetItem($input,i));
         else {
            PyErr_SetString(PyExc_TypeError,"list must contain strings");
            free($1);
            return NULL;
         }
      }
      $1[i] = 0;
   } else {
      PyErr_SetString(PyExc_TypeError,"not a list");
      return NULL;
   }
}
// This cleans up the char ** array we malloc'd before the function call
%typemap(freearg) char ** {
   free( (char *) $1);
}

%typemap(in) char **& {
   /* Check if is a list */
   if (PyList_Check($input)) {
      int size = PyList_Size($input);
      int i = 0;
    //  cout << "allocating array, for " << size << " elements" << endl;
      $1 = (char ***)malloc( sizeof( char ** ));
     // cout << "allocated pointer to pointer" << endl;
      *($1) = (char **) malloc((size+1)*sizeof(char *));
    //  cout << "assinged to array variable" << endl;
      char **array = *$1;      
    //  cout << "got array" << endl;
      for (i = 0; i < size; i++) {
         PyObject *o = PyList_GetItem($input,i);
         if (PyString_Check(o)) {
           //  cout << "adding string " << PyString_AsString(PyList_GetItem($input,i)) << endl;
            array[i] = PyString_AsString(PyList_GetItem($input,i));
         }
         else {
            PyErr_SetString(PyExc_TypeError,"list must contain strings");
            free(array);
            free($1);
            return NULL;
         }
      }
      array[i] = 0;
   //   cout << "done" << endl;
   } else {
      PyErr_SetString(PyExc_TypeError,"not a list");
      return NULL;
   }
}
// This cleans up the char ** array we malloc'd before the function call
%typemap(freearg) char **& {
   //   cout << "freeing mem" << endl;
   free( (char *) (*$1));
   free( (char *) ($1));
    //  cout << "done" << endl;
}

#18

Some documentation on how the kit above works, just in case it is useful.

In makepanda.py:

  • interrogate doesn’t actually do very much any more

  • interrogate writes the name of the header files to a global list,

  • and it writes out all the information to an xml file swiginfo.xml

  • interrogatemodule calls generateswigfile

  • generateswigfile is a new function that preprocesses all the headerfiles, specified in the global list, naming the processed files xxx.pre

  • then it calls interrogate2swig.py, passing in the name of the module (ie panda, pandaexpress, framework etc)

  • interrogate2swig.py is going to generate a swig interface file in built/swig, called something like panda.i or pandaexpress.i

  • makepanda.py will execute swig on this interface file, to generate the wrapper .cxx file (libpanda_module.cxx, libpandaexpress_module.cxx, etc) and the python interface file (panda.py, framework.py etc)

  • then it will compile it

  • the wrapper will be linked with the appropriate dll (eg libpanda.dll) to create a new dll with an _ prefix, for example _framework.dll or _panda.dll. The name is important, otherwise the python file won’t link with it correctly at runtime.

  • the rest of makepanda.py is largely unchanged
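
The naming conventions in the steps above can be collected in one place, roughly like this. The helper name `wrapper_names` and the dictionary keys are hypothetical; the file name patterns are the ones described in the list.

```python
def wrapper_names(module):
    """Return the file names the swig step produces for one module."""
    return {
        "interface": "built/swig/%s.i" % module,
        "wrapper_source": "lib%s_module.cxx" % module,
        "python_module": "%s.py" % module,
        # The generated .py file imports the extension by this exact name.
        "extension_dll": "_%s.dll" % module,
    }


for module in ("panda", "pandaexpress", "framework"):
    print(wrapper_names(module))
```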

interrogate2swig.py:

  • reads the module name and swiginfo.xml filepath from the commandline
  • reads the list of all headerfiles from the swiginfo file, for that module
  • filters this list against an exclusions list, basically files that are hard to deal with at this time, and generally files that aren’t necessary in the Python interface (but not always)
  • if the headerfilename doesn’t end in _src.h, adds it to a list of #include files that the C/C++ wrapper file will include, so that it has access to the actual underlying headerfiles.
  • the _src.h files are excluded, because there are special rules for including these files; basically only the corresponding file without a _src postfix should be included, so this is what we do
  • then the preprocessed headerfile is read in, ie the .pre file that makepanda.py generated earlier
  • all class members not in __published: are stripped
  • __declspec(import) and/or __declspec(export) are stripped
  • nested classes are stripped
  • a few enum uses are prefixed with the classname, eg ShadeModel -> qpGeomEnums::ShadeModel (handleenums function)
  • certain classes have additional members added to them, eg PointLight. These members exist in the original headers, but only in public:, not in published:.
  • there’s a load of stuff to process macros; but these are largely obsolete now that we are running the files through a preprocessor beforehand, in makepanda.py
  • it writes out the processed, filtered class to a file pandaraw.swg
  • next it sorts this file in batch mode, making several passes, to create the final swig interface file built/swig/panda.i
  • it also adds in a few %include and %import commands to the head of this interface file, so that it works correctly
  • for example %include “pandatypemaps.i” is added, to provide support for using strings and char*&[] as arguments

Known outstanding issues:

  • overloaded member functions that use strings won’t be correctly selected at runtime, eg TexturePool_load_texture(“myfile.jpg”)
  • I think the argument name needs to be replaced by INPUT.
  • It may be sufficient to detect whether there is a const or not. If there is a const, replace by const string &INPUT; and if not replace by string &OUTPUT.
  • one would also need to add a new typemap function for string &OUTPUT to the pandatypemaps.i file that is at the bottom of the post above
  • it may be that strings are almost never returned via the arguments list, which would simplify this somewhat
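
The proposed const/non-const rule could be sketched as a text rewrite over each published signature. This is untested against swig itself and only illustrates the renaming idea; `mark_string_args` is a hypothetical name, and the sketch assumes the headers spell the type as plain `string` and that every parameter is named.

```python
import re

# An optional 'const', then 'string &', then the parameter name.
STRING_REF = re.compile(r"(const\s+)?\bstring\s*&\s*\w+")


def mark_string_args(signature):
    """Rename string reference parameters to INPUT or OUTPUT."""

    def replace(match):
        if match.group(1):
            return "const string &INPUT"  # const reference: passed in only
        return "string &OUTPUT"  # non-const reference: assumed to be a result

    return STRING_REF.sub(replace, signature)


print(mark_string_args("Texture *load_texture(const string &filename);"))
print(mark_string_args("bool get_value(string &result);"))
```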

Hugh


#19

This is all good stuff. I think the ten-million-dollar question, still to be answered, is whether the swig-generated code runs faster or slower than the interrogate-generated code, especially the new version of interrogate that we have in-house and will make available soon.

Build time is important for developers, and the prospect of a much faster build time is exciting. Although I think the currently very large runtime for interrogate is indicative of some bug: it didn’t used to take so long, or so much memory; and surely that bug can be fixed.

I think the most important thing to be gained from a switch to swig would be the adoption of a well-maintained tool (well-maintained by someone else :slight_smile:). But that does cut both ways, too, because that means someone needs to maintain the interface to swig, as both swig and Panda continue to grow.

There are still a few stumbling blocks to overcome before we can even consider wholeheartedly embracing swig. The biggest is the large body of code that is already written using the interrogate-generated interface; we would probably need to find a way to finesse the swig interface into being mostly compatible with the interrogate interface. The first 90% of this is renaming the methods the same way we do, which is trivial; but the rest of it may be trickier (automatic reference count maintenance?).

David


#20

[quote]

There are still a few stumbling blocks to overcome before we can even consider wholeheartedly embracing swig. The biggest is the large body of code that is already written using the interrogate-generated interface; we would probably need to find a way to finesse the swig interface into being mostly compatible with the interrogate interface. The first 90% of this is renaming the methods the same way we do, which is trivial; but the rest of it may be trickier (automatic reference count maintenance?).
[/quote]

Yes, this is my thought too. It felt a little like climbing one of those convex mountains: you can see the top about 100m away, so you think you can get there in just a few minutes, but when you get there, you realize that wasn’t actually the top.

So, there’s no way I could handle the migration to swig on my own (not without a lot of time anyway; for example I don’t even know what Direct does at this point).

I have to say, the more I think about this, if I was going to run a production switch to swig, I’d probably switch to using swig interface files as the published reference. I don’t want to say this because it does imply a change to what developers see, but I do believe it’s the cleanest way.

These files can easily be generated initially (interrogate2swig.py can do this; it has a tree walk which generates a .swg file for each header file, right next to it). However, once they’ve been generated, they should probably be maintained as the reference, in place of the __published: interface.

Then, a migration would run like this:

  • run an initial runtime performance benchmark against interrogate
  • simplify the published classes to not use anything nested, including enums or other classes
  • modify interrogate2swig.py to generate the %rename directives so that the function naming is compatible with interrogate
  • run the interface file generation
  • tweak the interface files (eg add the as_node implementation to public: for LightNode, so that PointLight can be instantiated)
  • check that everything builds ok
  • benchmark runtime performance against interrogate
  • check that everything builds ok again
  • switch build methods
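
For the %rename step, a minimal sketch of generating the directives is shown below. It assumes, purely for illustration, that the interrogate-compatible name is the camelCase form of the C++ method name (open_window becoming openWindow); the thread does not spell out the renaming rule. `%rename(new) old;` is standard swig syntax, while `camel_case` and `rename_directives` are hypothetical helpers.

```python
def camel_case(name):
    """Convert snake_case to camelCase, e.g. open_window -> openWindow."""
    parts = name.split("_")
    return parts[0] + "".join(p[:1].upper() + p[1:] for p in parts[1:])


def rename_directives(method_names):
    """Return one %rename line per method whose name actually changes."""
    lines = []
    for name in method_names:
        new_name = camel_case(name)
        if new_name != name:
            lines.append("%%rename(%s) %s;" % (new_name, name))
    return "\n".join(lines)


print(rename_directives(["open_window", "set_window_title", "run"]))
```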

Hugh


#21

Did this go anywhere?


#22

Just what you see here. Are you offering to pick up the baton and run with it?

David