Tracking down memory leaks when creating thousands of scenes

andyljones · May 21, 2019, 1:59pm

I’m using panda3d for a reinforcement learning project. Part of RL is generating the same simulated environment thousands or even millions of times and testing how a smart agent interacts with it. I’m finding that every time I create and destroy the environment I burn a few hundred KB of memory, and eventually that kills the machines I’m running my experiments on.

The full project isn’t ready for public release, but I’ve put together a stripped-down version here which can be run directly from the command line with minimal setup. There are instructions at the top of the file. It creates and destroys a very simple procedurally-generated environment ten times, and uses panda3d’s MemoryUsage module to see what’s left behind. When I run the script (on panda3d 1.10.1 and macOS 10.14.3), this is the output:

Leaked 0.13MB of memory in 10 loops. Pointer counts follow:

   LightAttrib  PerspectiveLens  PointLight  RenderState  Texture  TextureAttrib
0            1                6           1            1        1              1
1            2               12           2            2        2              2
2            3               18           3            3        3              3
3            4               24           4            4        4              4
4            5               30           5            5        5              5
5            6               36           6            6        6              6
6            7               42           7            7        7              7
7            8               48           8            8        8              8
8            9               54           9            9        9              9
9           10               60          10           10       10             10

Each row reports the state of memory after one create/destroy iteration. As you can see, it burns roughly 13KB per loop and leaves various light and texture objects lying around. The two lines in the script marked #TODO are the culprits; commenting them out zeros the memory leak.
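As a quick cross-check of the numbers above, here is a small sketch that works out the per-loop growth from the first and last rows of the table. The dictionaries are copied by hand from the output; the 1MB = 1000KB conversion is an assumption about how the script reports sizes.

```python
# Pointer counts copied from row 0 and row 9 of the table above.
first = {"LightAttrib": 1, "PerspectiveLens": 6, "PointLight": 1,
         "RenderState": 1, "Texture": 1, "TextureAttrib": 1}
last = {"LightAttrib": 10, "PerspectiveLens": 60, "PointLight": 10,
        "RenderState": 10, "Texture": 10, "TextureAttrib": 10}
loops_between = 9  # number of iterations separating row 0 from row 9

# Extra live pointers left behind by each create/destroy iteration.
growth_per_loop = {
    name: (last[name] - first[name]) / loops_between for name in first
}

# Headline figure from the output: 0.13MB leaked over 10 loops.
leaked_mb, loops = 0.13, 10
leaked_kb_per_loop = leaked_mb * 1000 / loops  # assumes 1MB = 1000KB

for name, growth in growth_per_loop.items():
    print(f"{name}: +{growth:g} per loop")
print(f"about {leaked_kb_per_loop:.0f}KB per loop")
```

Every class grows by one pointer per loop except PerspectiveLens, which grows by six, so each iteration leaves exactly one full set of scene objects behind.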

Most of the script is memory tracking. The scene itself is nigh-on trivial:

    from panda3d.core import GeomNode, PointLight, Texture

    # `root` is the NodePath the scene hangs off (created elsewhere in the script)
    root.set_shader_auto()

    # Add a textured surface
    surf = root.attach_new_node(GeomNode('wall'))
    tex = Texture()
    tex.setup_2d_texture(256, 1, Texture.T_unsigned_byte, Texture.F_luminance)
    surf.set_texture(tex)  # TODO: This is the line that generates half the memory leaks.

    # Add point lights
    point = root.attach_new_node(PointLight('point_light'))
    root.set_light(point)  # TODO: This is the line that generates the other half of the memory leaks.

    # Merge the static geometry under root
    root.flatten_strong()

I’m already removing all nodepaths and clearing the transform-state and render-state caches with

    # Remove all the nodes I created
    for p in descendents(root):
        p.remove_node()

    # Clear various caches
    TransformState.garbage_collect()
    RenderState.garbage_collect()

What am I missing? Where are these remnants recorded? How do I kill them dead?

Edit: Now with newly-simplified script

Thaumaturge · May 21, 2019, 5:41pm

Hmm… I believe that you can remove a light from a node via “clearLight”, as in “render.clearLight(self.myLight)”. I don’t know offhand whether it’s supposed to be required (it makes sense that it might be), but you might try it.

Similarly, it looks like there’s a “clearTexture” method in NodePath (see the API here); I don’t know whether it’s the right method to call, but it may be worth investigating.

andyljones · May 21, 2019, 6:06pm

Good idea! Unfortunately

    for p in descendents(root):
        p.clear_texture()
        p.clear_light()
        p.remove_node()

gives the same leaked pointers.

That did prompt me to check what the ref counts of each group were after one iteration:

LightAttrib        4
PerspectiveLens    3
PointLight         3
RenderState        3
Texture            3
TextureAttrib      4

As I think my memory tracking contributes +2 to every ref count, it’s LightAttrib and TextureAttrib that are being cached somewhere. In fact, TextureAttrib has a pointer to the Texture, and LightAttrib has a pointer to the PointLight, which in turn has a pointer to the PerspectiveLens.

So now the mystery is where the Attribs are both being held, and what’s pointing to the RenderState.
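The arithmetic behind that reading of the ref counts can be sketched as below. The +2 tracking overhead is the guess stated above, not a measured value, and the "more than one remaining holder" threshold is just an illustrative way of flagging the objects that something else must be holding.

```python
# Ref counts observed after one create/destroy iteration (from the list above).
raw_counts = {
    "LightAttrib": 4,
    "PerspectiveLens": 3,
    "PointLight": 3,
    "RenderState": 3,
    "Texture": 3,
    "TextureAttrib": 4,
}

# Assumption: the memory-tracking code itself adds 2 references to each object.
TRACKING_OVERHEAD = 2

# References that remain once the tracker's own references are discounted.
adjusted = {name: count - TRACKING_OVERHEAD for name, count in raw_counts.items()}

# Objects with more than one remaining holder are the interesting ones.
suspects = sorted(name for name, count in adjusted.items() if count > 1)
print(suspects)
```

Under that assumption only the two Attrib classes stand out, which matches the conclusion above.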

Thaumaturge · May 21, 2019, 6:11pm

Hmmm… Does “descendents(root)” include “root” in its results? Because you appear to be setting the light on the root, not its descendants.

Other than that, I don’t know offhand–I’ll bow out in favour of others who might know better than I on this.

andyljones · May 21, 2019, 6:39pm

It does include the root.

Bit more mystery: calling RenderState.garbage_collect() twice in a row, immediately after each other, frees the RenderState and two of the Attrib pointers, getting me down to

LightAttrib        3
PerspectiveLens    3
PointLight         3
Texture            3
TextureAttrib      3

So now the mysteries are ‘what’s caching the Attribs’ and ‘why does the RenderState garbage collector need to be called twice’.

Updated the script with all of this.

andyljones · May 21, 2019, 7:27pm

And I’ve just discovered that calling RenderState.garbage_collect() three times in a row on each create/destroy iteration clears the leak.

w h a t

Part of my surprise is that it only gets rid of a single layer of pointers each time. That’s unusual for garbage collection, but if it’s expected to be called periodically then sure, fine. The rest of my surprise is that calling it three times per create/destroy works, but once per create/destroy doesn’t - I’d expect later iterations to clear up the mess of earlier ones.
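One way to picture the layer-by-layer behaviour is the toy model below. It is not panda3d code: `ToyCache` is a hypothetical stand-in in which each pass frees only the entries that nothing else referenced when the pass began, so a chain of three objects needs three passes. `collect_until_stable` is likewise a hypothetical helper that repeats a collector until a pass frees nothing.

```python
class ToyCache:
    """Toy cache in which each entry may hold a reference to one child entry."""

    def __init__(self):
        self.refs = {}   # entry name -> number of references from other entries
        self.child = {}  # entry name -> name of the entry it holds, or None

    def add(self, name, child=None):
        # Children must be added before the entries that hold them.
        self.refs.setdefault(name, 0)
        self.child[name] = child
        if child is not None:
            self.refs[child] += 1

    def garbage_collect(self):
        # Decide what to free from a snapshot taken before the pass starts,
        # so entries released during this pass wait for the next one.
        doomed = [name for name in self.child if self.refs[name] == 0]
        for name in doomed:
            held = self.child.pop(name)
            del self.refs[name]
            if held is not None:
                self.refs[held] -= 1
        return len(doomed)


def collect_until_stable(collect, max_passes=10):
    """Call collect() until it frees nothing; return how many passes freed something."""
    passes = 0
    for _ in range(max_passes):
        if collect() == 0:
            break
        passes += 1
    return passes


cache = ToyCache()
cache.add("light")
cache.add("attrib", child="light")
cache.add("state", child="attrib")

passes_needed = collect_until_stable(cache.garbage_collect)
print(passes_needed, "passes needed; entries left:", len(cache.child))
```

With panda3d, one might pass `RenderState.garbage_collect` as `collect`, on the assumption that it returns the number of states freed in that pass; that is worth confirming against the API reference for your version.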

Doing some more experiments, it seems my memory inspection is grabbing references to the ‘lower layers’ of objects before the garbage collect catches up, so they never get destroyed. Which means the memory leak in the minimal script is not the one killing me in prod.
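The general effect, where an inspection tool that holds strong references keeps alive the very objects it is counting, can be shown in plain Python. This is an analogy only: panda3d objects are reference-counted on the C++ side, `Node` here is a hypothetical stand-in, and the immediate cleanup relies on CPython's reference counting.

```python
import weakref


class Node:
    """Hypothetical stand-in for a reference-counted scene object."""

    def __init__(self, name):
        self.name = name


def count_alive(weak_refs):
    """Count how many weakly-referenced objects still exist."""
    return sum(1 for ref in weak_refs if ref() is not None)


strong_snapshot = []  # tracker that stores the objects themselves
weak_snapshot = []    # tracker that stores only weak references

a = Node("a")
strong_snapshot.append(a)           # the tracker now co-owns `a`

b = Node("b")
weak_snapshot.append(weakref.ref(b))  # the tracker observes `b` without owning it

# The "scene" drops its own references, as a destroy step would.
del a, b

strong_alive = len(strong_snapshot)       # `a` survives only because of the tracker
weak_alive = count_alive(weak_snapshot)   # `b` is gone once its last owner lets go

print("kept alive by strong tracker:", strong_alive)
print("kept alive by weak tracker:", weak_alive)
```

The practical upshot is that a leak report should be taken after the snapshot from the previous iteration has itself been released, otherwise the report measures its own references.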

Welp. This has been an education.