Memory leak in DirectGuiWidget

rdb · January 2, 2009, 2:33pm

There’s a memory leak in DirectGuiWidget: (and thus in all gui widgets)

from direct.directbase import DirectStart
from direct.gui.DirectGuiBase import DirectGuiWidget
import gc
while True:
  model = DirectGuiWidget()
  model.removeNode()
  del model
  gc.collect()
  print len(gc.get_objects())

Hypnos · January 2, 2009, 2:43pm

(could be that this only affects 1.6, we havent tested it on older versions)

ynjh_jo · January 2, 2009, 4:32pm

No, there is NOT.
Dude, it’s destroy(), instead of removeNode().

rdb · January 2, 2009, 5:23pm

But certainly this method should be called in the destructor, not?

drwr · January 2, 2009, 7:45pm

Yes, but you haven’t called the destructor when you call del model. In fact, the purpose of the destroy() method is to clean up all of the references to the widget, and make it possible to call the destructor.

Python destructors are problematic; they only get called when the very last reference to the object goes away. But if an object has outstanding references to it in some global table (in this case, the messenger table), the destructor will never get called. This is why we have had to make methods like destroy() and cleanup(), whose purpose is generally to remove all of these references.

The syntactical similarity of Python’s “del obj” and C++'s “delete obj” confuses just about everyone. These two statements look very similar, but in fact they are completely different. In C++, this is a direct call to the destructor, and it means that any other references to the object are now invalid (and will cause a crash if you try to use them). In Python, all it does is delete the local reference, which will only invoke the destructor if there are no other references. In Python, “del obj” does nothing to obj at all, at least not directly.

David

rdb · January 2, 2009, 8:12pm

In the case of the sample above, ‘del’ decreased the reference count to 0 just for the gc.collect() function to collect it. The ‘del’ command can certainly be useful in a large flat non-nested code with huge objects.

Also, wouldn’t it be an idea to add a constructor which simply calls destroy()? This would at least minimize these issues. I mean, the reference count will most likely reach 0 eventually, and when it does, there’s a chance that you didn’t call destroy() yet.

drwr · January 2, 2009, 8:39pm

No, actually, it didn’t. It decreased the reference count to 12:

>>> sys.getrefcount(model)
13

One of the 13 references is the name “model” in the local space, so deleting this removes that reference, leaving 12. What are the other 12 references? They are the circular created within DirectGuiWidget. That is, they are references stored within DirectGuiWidget that reference itself, directly or indirectly. It is these references which are cleaned up by the call to destroy().

Everyone, upon discovering this, always wants to put a call to destroy() in the destructor. But this cannot work, because unless you calling destroy(), the destructor will never be called.

Another approach would be to avoid the circular references in the first place. But this would mean you wouldn’t be able to store useful things like pointers to messenger hooks or function pointers or such. Because we want to be able to store these things, we need an explicit destroy() method.

This is kind of a consequence of Python’s initial life as a reference-counted language, and its subsequent half-hearted embracing of a garbage collector. Reference-counted languages occasionally require explicit calls to destroy()-like methods, to clean up inevitable circular references. Garbage-collected languages don’t require this, but Python’s garbage collector doesn’t have enough authority to break circular references; it can only report them.

David

rdb · January 2, 2009, 8:42pm

Ah! Thanks for the clear explanation, I didn’t know of any circular references.

Hypnos · January 2, 2009, 8:47pm

i dont know if this affects this thread, but the initial problem is that when creating a directbutton/directframe, and removing it, there is a memory leak…

i think adding destroy to removeNode would help solving this problem, doesnt it?

(it should be enougth to call removeNode, not destroy as well from the python side)

rdb · January 2, 2009, 8:51pm

Well, removeNode actually removes the PandaNode the NodePath points to, and is implemented in C++, while the destroy() is meant for cleaning up the Python stuff (which probably also calls removeNode()). So that would require something like an upcast from a C++ class to a Python type if you were to call destroy() from C++ code.

Actually, maybe it would be possible just to override removeNode in the Python code.

Hypnos · January 2, 2009, 9:24pm

while it’s important to learn how panda3d works, consistency would be much better for me (and other ppl’s).

Eigther every node should be destroyed using destroy() or removeNode(), but having diffrent ways for everything (i know it’s only nodepath’s and dgui-nodes) is just making things more complex then it should be.

drwr · January 2, 2009, 10:12pm

I agree. There should be only one method name for cleaning up all objects, no matter what kind of objects they are. Instead, we have cleanup(), destroy(), removeNode(), delete(), and maybe others, and you just have to know what kind of object you have, and learn what method name is the right one to call. It’s a problem.

If Panda had been designed as a complete system, we would certainly have designed it that way. However, it wasn’t designed like that; it evolved, over a long period of time, and each of these method names was invented at a different time by a different programmer working on a different part of the system. Now that we have all these different methods, it’s difficult to unify them, because we need to maintain compatibility with existing code.

It would probably be possible to unify them if we really really wanted to. But in practice, it’s not really that terrible. It makes the Panda3D learning curve a bit steeper, but that’s about all. The human mind is actually good at learning lots of exotic and disparate rules, and applying them in the right place. This is why natural languages like English are so self-inconsistent; but it doesn’t bother people except first-year English students.

David

treeform · January 2, 2009, 10:22pm

That’s a very lame excuse.

drwr · January 2, 2009, 11:30pm

Which part is lame? That it’s hard to fix because we have to maintain backward compatibility, or that it’s not high on our list of problems to fix because it’s not such a bad problem in practice?

David

ynjh_jo · January 3, 2009, 4:56am

It must be for your panda3d-editor.

For now, you blame panda for not providing consistent destructors.
Later on, when you let the user to insert any class, which turns out to be a deep inheritance class, say from Actor and DirectObject, how would you like to destroy it ?
Using removeNode wouldn’t make any sense at all. And if you don’t call ignoreAll(), it won’t be destroyed at all, the event hook is still there and MIGHT bite you at the next frames.
I’m afraid, at the end you’d blame the user for creating such class.

Complaining about this on panda side won’t help, since the possibilities are endless, and maybe eventually you’ll have to deal with it anyway.

In my IDE, user can create any kind of crazy class, and I must be able to destroy its instance, so I built a common destructors table :

destructors = {
          'Task':['remove'],
          'FSM':['cleanup'],
          'Actor':['stop','cleanup'],
          'AudioSound':['stop'],
          'MovieTexture':['releaseAll'],
          'DirectObject':['ignoreAll'],
          'DirectDialog':['cleanup'],
          'DirectFrame':['destroy'],
          'OnscreenText':['destroy'],
          'MetaInterval':['pause'],
          'LerpFunctionInterval':['pause'],
          'LerpNodePathInterval':['pause'],
          'NodePath':['removeNode'],
          'CollisionTraverser':['clearColliders'],
          'CollisionHandlerPhysical':['clearColliders','clearRecorder'],
          }

which is used in this sequence :
[1] iterate on object’s dictionary and do [2] & [3] on each attribute, and also recursively iterate (unless already visited) on it if it’s a sequence (tuple, list (deque is not there yet)) or a dictionary
[2] iterate and destroy each object’s baseclass (except Actor)
[3] destroy object’s class
and I have to do final cleanup on render, render2d, and aspect2d, in case of unsaved-as-attribute object.

Perhaps that’s the behavior you’re after.

rdb · January 3, 2009, 9:32am

I’m not blaming Panda. I’m just wondering about potential other solutions.

Of course not, since the DirectObject destructor will get called too, which should ignore the events. Without DirectObject, how would you catch the events anyways?

Hypnos · January 3, 2009, 12:31pm

thanks ynjh_jo that is indeed a nice list how to destroy stuff, maybe it should go into the manual as well. (destroying stuff is something i often search and wont find)

but my real point here is that it would be helpful to have it consistent. i am not blaming anything or anyone, i am just trying to start a talk about something panda could consider for a mayor version change.

maybe another solution would be to remove python functions that dont work the way a user expects? like removing the removeNode on dgui elements. or adding a del (not del) function which is just a wrapper for the individual destroy functions…

(i know, i often start talks about basics that dont really fit into the development cycle. however i have a different view on panda then a developer that has been into the engine for a long time. and i think it’s worth talking about such things from time to time, even if it doesnt change anything right now, it may at a later time)

zuck · January 3, 2009, 5:12pm

Well, you could use the classic “duplicate/deprecate/delete” (yes, I’ve invented tha name now ) code life cycle on these methods:

Add a new method “del” (or something else) which implements the destructor logic in each class.
Change original method body (‘destroy’, ‘cleanup’, ‘removeNode’, ecc.) so it calls the new method and shows a runtime “deprecate” message (so the user knows that method will be removed in the future and he shoulds switch on new one).
After X releases of Panda3D you can finally remove all deprecated methods from code and unify all destructors in a unique method name.

drwr · January 3, 2009, 7:00pm

Right, this is the way we would eventually solve the problem and make things more consistently-named. It’s a long road, though, and in practice deprecating functions has had inconsistent success (people tend to continue using the functions they’re familiar with, long after they have been officially deprecated). But we could get there eventually.

Still, as ynjh_jo pointed out, having a more consistent naming in Panda doesn’t really solve the fundamental problem here. There are bigger issues, and to a certain extent, making the method names consistent may actually hide some of these issues and lead to bigger, harder-to-solve problems for naive programmers.

As an example: suppose we agreed that all objects should have a destroy() method which cleaned them up properly. So we define a NodePath.destroy() which does the same thing that removeNode() does now. And we also let DirectGuiWidget.destroy() continue to do what it does now. And so on. So now everything can be cleaned up with destroy(), and there’s one less set of weird rules you have to learn to use Panda. Great!

But wait! It’s not that simple. Because DirectGuiWidget.destroy() does more than destroy itself; it also destroys all of its children that are also DirectGuiWidgets, and their children as well, and so on. This is an important convenience because frequently you create hierarchies of these things, and it would be a nuisance to have to store pointers to every one of these and call destroy() on each one.

On the other hand, NodePath.removeNode() does no such thing. It doesn’t do anything to its children at all; and this is important too because we call NodePath.removeNode() a lot, and we don’t want to pay the cost of walking the entire hierarchy looking around for things that need to be cleaned up every time we call it.

So, now we get used to calling thing.destroy() to clean up a DirectGuiWidget, and thing.destroy() to remove a hierarchy of nodes, and we forget that really they are different things, and now we will try to call root.destroy() on a plain NodePath to remove a hierarchy of nodes that includes DirectGuiWidgets, but the DirectGuiWidgets won’t get cleaned up. And we won’t know why, and we may not even notice the problem until much later, when our application crashes mysteriously.

So at the end of the day, we’ve just moved the problem, from a relatively small problem that will confuse new Panda users for a day or two, to a much more subtle, more difficult to diagnose problem that could stump expert Panda users for a week or more.

I’m not trying to justify the mishmash of naming conventions, or say that we shouldn’t attempt to make Panda more self-consistent. I’m just pointing out that the problem is bigger than it first appears.

David

zuck · January 4, 2009, 12:42pm

Yes, don’t you worry So, in my opinion is clearer to specialize a unique (polymorphed) name method instead of use a different name for each one.

For example (with DirectGUIWidget and NodePath):

NodePath:

def destroy(self, deep=False):
     # destroy me.
     if deep is True:
         # destroy all childrens.


DirectGUIWidget:

def destroy(self):
    # destroy me and all childrens.

Obviously this is a simplification of the problem but in my opinion the solution isn’t so difficult. Of course we need to improve the documentation