Strange behaviour from scene graph

Hello,

I have a BIG problem :frowning:

I’m creating a NodePath that is not parented to “render”.
I use this NodePath as the root for many objects (divided into several subtrees).

The problem seems to be that, even though that scene graph is not rendered (it has no parent), it still affects the performance of the engine. In fact, if I attach many objects to it (about 10000), my engine drops down to 40 fps, even though the scene (render) itself is empty!!!

I cannot understand why a node, not parented to render, can affect the global performance in this way.
Please consider that, if I attach only 2 objects to this NodePath, I get more than 580 fps! So it seems that, even though Panda uses a scene graph, nodes not attached to the main scene graph still heavily affect performance.

How can I avoid this?
I really need help, or this could kill my game, since I need to build a huge scenario “offline”.

Please help me!!!

Thank you!

That’s odd. Are you sure you didn’t miscode something?
Can you post some test code?

It’s difficult, since the code is inside my game (a lot of classes).

Do you know of any memory-related performance problems that can occur even when a node is not inside the “render” tree?

Well… maybe you activated the memory tracking mechanism in your prc file and forgot to disable it (track-memory-usage or similar)?
Give it a little more time and someone will be able to check this issue better than me… :wink:
You could try to isolate your test in a small test script and see if it reproduces the problem.

No memory tracking.
I’m trying to reproduce the problem in a small piece of code.

Thank you!

You could also run analyze() or pstats to look for useful information. :wink:

Ok, I found the problem and… it’s crazy (I cannot understand why)!

Look at this code:

from direct.showbase.ShowBase import ShowBase
from panda3d.core import *
from math import cos


class MyApp(ShowBase):
 
    def __init__(self):
        ShowBase.__init__(self)
        
        base.camLens.setFar(1000)
        
        theLens = base.cam.node().getLens()
        theLens.setFar(50)

        # Load the environment model.
        theBarrel = self.loader.loadModel("models/barrel.egg")
        
        counter = 0

        cache = NodePath("cache")

        NUM_OF_OBJECTS = 333
        
        for x in range(NUM_OF_OBJECTS):
            for y in range(NUM_OF_OBJECTS):
                placeholder = cache.attachNewNode("Barrel")
                #placeholder.setPos(x*1.5, y*1.5, 0)
                theBarrel.instanceTo(placeholder)

                #b = theBarrel.copyTo(cache).setPos(x*1.5, y*1.5, 0)

                counter = counter + 1

        #cache.reparentTo(render)

        print("counter: " + str(counter))

app = MyApp()
app.run()

(Use any model you wish.)
I attach some objects (see NUM_OF_OBJECTS) to a node called “cache”, which is not linked to “render”.
In the source code you can see that:

#placeholder.setPos(x*1.5, y*1.5, 0)

is disabled. If I enable it (remove the “#” comment), performance goes down!
Check my results:

setPos() disabled:
NUM_OF_OBJECTS=33 --> 700 FPS
NUM_OF_OBJECTS=333 --> 700 FPS

setPos() enabled:
NUM_OF_OBJECTS=33 --> 700 FPS
NUM_OF_OBJECTS=333 --> 156 FPS

So it seems there is a problem related to object position, even though the “cache” node is not linked to “render” :open_mouth:
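
To put those frame rates in perspective, here is a rough back-of-the-envelope sketch in plain Python (no Panda needed). It only uses the numbers reported above (700 FPS versus 156 FPS with NUM_OF_OBJECTS=333); the helper name frame_time_ms is made up for illustration, and spreading the extra time evenly over all nodes is an assumption, not something measured.

```python
def frame_time_ms(fps):
    """Convert a frame rate into the time spent on one frame, in milliseconds."""
    return 1000.0 / fps


NUM_OF_OBJECTS = 333
node_count = NUM_OF_OBJECTS * NUM_OF_OBJECTS  # placeholders created by the two loops

fast_ms = frame_time_ms(700.0)  # reported with setPos() disabled
slow_ms = frame_time_ms(156.0)  # reported with setPos() enabled

# Extra time per frame, and the same figure divided evenly over all nodes
# (an assumption: the real cost may not be distributed per node at all).
extra_ms = slow_ms - fast_ms
extra_ns_per_node = extra_ms * 1e6 / node_count

print("extra time per frame: %.2f ms" % extra_ms)
print("extra time per node per frame: %.0f ns" % extra_ns_per_node)
```

That works out to roughly 5 ms of extra work per frame, or about 45 ns per placeholder per frame if the cost really is spread evenly.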

UPDATE: I tried PStats (panda3d.org/manual/index.php/PStats) and found this result:
PStats says that, when using setPos(), a large amount of time is spent in “App” (the blue bar). But I’m new to Panda, and I don’t know what that means!!!
App? OK, it is the application… but how can I get more details about “App”? What’s inside?

EDIT: From the manual I found this:

In Panda’s nomenclature, “App” is any time spent in the application yourself, i.e. your program. This is your main loop, including any Python code (or C++ code) you write to control your particular game’s logic. It also includes any Panda-based calculations that must be performed synchronously with this application code; for instance, the collision traversal is usually considered to be part of App.

But I have no loops, no tasks and no collisions, so it looks more like a Panda bug related to setPos(). :cry:

I’m on my cell and can’t see the code properly, but are there two loops of 333 each? So you have 333 x 333 ≈ 110k nodes. With setPos() you add a transform to each node. I don’t know how Panda stores these (some 4x4 matrix?), but that can’t be free.

Do you really need that many nodes at once?
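
For a sense of scale, the estimate above can be worked out in a few lines of plain Python. This is only a sketch: the “one 4x4 matrix of single-precision floats per node” layout is the guess made in the post, not a confirmed description of how Panda stores transforms.

```python
NUM_OF_OBJECTS = 333

# Two nested loops of NUM_OF_OBJECTS iterations each.
node_count = NUM_OF_OBJECTS * NUM_OF_OBJECTS

# Assumed storage per node (a guess, not Panda's documented layout):
FLOATS_PER_MATRIX = 16  # 4x4 matrix
BYTES_PER_FLOAT = 4     # single precision

matrix_bytes = node_count * FLOATS_PER_MATRIX * BYTES_PER_FLOAT
matrix_mib = matrix_bytes / (1024.0 * 1024.0)

print("nodes: %d" % node_count)
print("transform storage under this assumption: %.1f MiB" % matrix_mib)
```

Under that assumption the transforms alone come to under 7 MiB for 110889 nodes, so the raw storage is small; any per-frame cost would have to come from processing them, not from their size.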

Yes, I created that many models, but if I don’t call setPos() there seems to be no issue.
Furthermore, the root of this “big block” is not under “render”.
So why does Panda care about models outside the “render” tree?

o_O that’s so weird…
I can’t test now (my PC is broken and there’s no Panda on my sister’s notebook), but you have given much more information and will probably get answers much faster now.

One last test would be to call:
cache.flattenLight(), cache.flattenMedium() or cache.flattenStrong() after the reparentTo and see how it behaves (that’s not a solution, only a test, because even without these calls you shouldn’t get any performance penalty in this case, afaik).

Yes, I know those functions. But my concern is why Panda spends time on something outside “render”. Furthermore, the time is spent only if I use setPos()!!! It’s funny!

I agree with you (I mentioned those methods only as a test).
I know that we should avoid unnecessary transforms and make good use of those methods… but hell! The node isn’t even in render! lol =P

A loooong shot: maybe you ran out of memory and swapping is killing your performance? (I bet you didn’t, but anyway…)

Ok, maybe we made another step forward.

The problem is not really related to setPos() itself but to the positions of the objects.
Let me try to be clearer:

I replaced the instruction:

placeholder.setPos(x*1.5, y*1.5, 0)

With this one:

placeholder.setPos(0, 0, 0)

And the frame rate goes back up: 700 fps.
So it seems that Panda applies some kind of culling even to objects outside render.
If I put 100000 objects at the same position, they can be handled quickly, but if I put 100000 objects at different positions the overall volume is much larger. It seems that Panda spends time managing that large occupied volume!!!

But why does Panda manage my nodes if they are not inside the render tree?!?!?!

Have you tried putting all the objects at the same position, but one other than (0, 0, 0)?
Maybe it’s more related to needing a non-identity matrix?
Run other tests with setHpr or setScale too…
I can’t see how the rendering code would ever reach the cache node. =/

I tried with placeholder.setPos(9999, 9999, 9999) and it works great!

So it seems the problem is the “offset” (or the volume occupied by all the objects).

I tried this:

        for x in range(NUM_OF_OBJECTS):
            placeholder = cache.attachNewNode("Barrel")
            placeholder.setPos(2, 2, 2)
            theBarrel.instanceTo(placeholder)

            for y in range(NUM_OF_OBJECTS):
                placeholder = cache.attachNewNode("Barrel")
                placeholder.setPos(9999, 9999, 9999)
                theBarrel.instanceTo(placeholder)

                counter = counter + 1

and I get great FPS!
So the problem is not the “volume” (in fact, in the previous example I put a ton of objects at (9999, 9999, 9999) and a ton of objects at (2, 2, 2)).

Maybe it is related to the objects being “sparse”?
My brain is going mad!

This one also gets me great FPS:

        for x in range(NUM_OF_OBJECTS):
            for y in range(NUM_OF_OBJECTS):
                placeholder = cache.attachNewNode("Barrel")
                placeholder.setPos(x, 9999, 9999)
                theBarrel.instanceTo(placeholder)

But this one gives me bad FPS:

        for x in range(NUM_OF_OBJECTS):
            for y in range(NUM_OF_OBJECTS):
                placeholder = cache.attachNewNode("Barrel")
                placeholder.setPos(x, y, 9999) # <-------------------
                theBarrel.instanceTo(placeholder)
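
One thing the tests so far have in common can be checked in plain Python: how many different positions each variant produces. The sketch below is only bookkeeping over the setPos() arguments posted in this thread; the names TESTS and count_distinct are made up for illustration, and the “fast”/“slow” labels are simply the results reported above. The mixed test with (2, 2, 2) and (9999, 9999, 9999) is left out of the table, but it uses just 2 distinct positions and was also reported as fast.

```python
NUM_OF_OBJECTS = 333

# (label, position function, result reported in this thread)
TESTS = [
    ("setPos(x*1.5, y*1.5, 0)", lambda x, y: (x * 1.5, y * 1.5, 0), "slow"),
    ("setPos(0, 0, 0)", lambda x, y: (0, 0, 0), "fast"),
    ("setPos(9999, 9999, 9999)", lambda x, y: (9999, 9999, 9999), "fast"),
    ("setPos(x, 9999, 9999)", lambda x, y: (x, 9999, 9999), "fast"),
    ("setPos(x, y, 9999)", lambda x, y: (x, y, 9999), "slow"),
]


def count_distinct(position_fn, n=NUM_OF_OBJECTS):
    """Return how many different positions an n-by-n double loop produces."""
    return len(set(position_fn(x, y) for x in range(n) for y in range(n)))


counts = {}
for label, position_fn, reported in TESTS:
    counts[label] = count_distinct(position_fn)
    print("%-26s %7d distinct positions, reported %s"
          % (label, counts[label], reported))
```

The two slow variants are exactly the ones with 110889 distinct positions, while every fast variant has 333 or fewer. That is only a correlation drawn from the numbers in this thread; it does not by itself explain what Panda is doing internally.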

What’s weird to me is:
you don’t save the cache node, so it should have been garbage collected once __init__() finishes.

Are you using 1.8.0?

Yes, I use 1.8.0.
I don’t think garbage collection is involved, since I allocated memory for those nodes.
The garbage collector does not free that memory, since I am using it.

I tested the latest Panda nightly build, and the problem still persists.

Can someone explain this problem to me?
In the meantime, I will open a ticket in the bug tracker (I’m almost sure this is a bug!).