flattening kills performance

Well, with that inflammatory title, I begin.

The project I’m working on involves rendering tens of thousands of boxes. I’m currently testing it rendering 4096 boxes (a 16×16×16 grid).

Naturally, this is quite slow (about 1.7 fps unflattened), so I started reading about performance optimization in Panda3D, and quickly came across node flattening; it sounds perfect as, at least for now, no boxes do anything independently at all except come into existence.

After the basic startup of getting a window up and running, I throw in a few pieces of code pertinent to Panda3D:

    self.taskMgr.add(self.updateSceneTask, "UpdateSceneTask")
    self.block_root = self.render.attachNewNode("block_root")
    base.setFrameRateMeter(True)
    self.added_blocks = False
    self.first_time = True

All of this should be quite familiar, apart from the added_blocks and first_time variables, which keep track of whether new blocks have been added in updateSceneTask, which works like so:


    def updateSceneTask(self, task):
        if self.first_time:
            self.first_time = False
            for x in range(16):
                for y in range(16):
                    for z in range(16):
                        self.added_blocks = True
                        new_block = self.loader.loadModel("models/tribe/block")
                        new_block.setScale(1, 1, 1)
                        new_block.setPos(x * 2 - 16, y * 2 + 25, -2 * z - 5)
                        new_block.reparentTo(self.block_root)

        if self.added_blocks:
            self.block_root.flattenStrong()
            self.added_blocks = False
        return Task.cont

Now, I’ve butchered that code a bit to remove a whole lot of stuff extraneous to this problem, so if that doesn’t make perfect sense, let me know and I’ll clarify.

Basically, though, the end result is that the task goes through and creates 4096 blocks one time, adds them to the self.block_root node, and attempts to flatten them, once. This completely butchers performance. Unflattened, I can kind of pan the camera around and get a look at things. Flattened, no such luck. The window all but hard locks. Anyone care to show me the error of my ways?

Flatten can take quite a while to process, depending on what you flatten, so the hard lock you experience might simply be the time it takes to flatten the objects together. Try with fewer cubes first, and see what happens as you increase the number.

By the way, flattening everything into one big mesh is not a good idea, as it breaks the culling process.
It is better to flatten your stuff into a couple of separate nodes, so culling can throw out all blocks outside the frustum, increasing performance.
You should aim for fewer than 300 individual nodes on-screen.
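To make that concrete, here is a minimal sketch (plain Python, not Panda3D API; `CHUNK`, `chunk_key`, and `group_blocks` are hypothetical names) of grouping block coordinates into chunks, so each chunk becomes one flattened node that the frustum culler can accept or reject as a unit:

```python
# Group a 16x16x16 block grid into 4x4x4-block chunks (64 chunks total).
# Each chunk would get its own NodePath, be filled with blocks, and then
# be flattened, keeping the node count well under the ~300 budget.
# CHUNK, chunk_key, and group_blocks are hypothetical helper names.
CHUNK = 4

def chunk_key(x, y, z):
    """Return the chunk a block at (x, y, z) belongs to."""
    return (x // CHUNK, y // CHUNK, z // CHUNK)

def group_blocks(coords):
    """Bucket block coordinates by chunk; each bucket is one
    flatten-and-cull unit."""
    chunks = {}
    for c in coords:
        chunks.setdefault(chunk_key(*c), []).append(c)
    return chunks

grid = [(x, y, z) for x in range(16) for y in range(16) for z in range(16)]
chunks = group_blocks(grid)
# 16/4 = 4 chunks per axis -> 64 chunks of 4*4*4 = 64 blocks each.
```

The chunk size is a tuning knob: bigger chunks mean fewer batches but coarser culling, smaller chunks the reverse.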

Thanks, that is helpful. I have altered my program significantly to perform some performance tests.

Now, I have a program that is solely the surface area of a cube with a length of 16 blocks, with every vertical slice being its own flattened node. That gives me around 1500 models spread out across 16 nodes. I don’t think it’s really being done in a way that will enable me to cull anything, since SOME elements of all of those blocks are in place, but it did improve performance significantly. I am now getting 30-40 FPS as opposed to less than 2 without any flattening.

However, this is definitely not the kind of performance I was hoping for. The perplexing thing is that I have no tasks, only an initialization script, which creates those nodes, flattens them, and then does nothing. This leaves one core of my CPU pegged at 100%. It leaves my graphics processor pegged at about…5%. Actually, the GPU has LESS of a load than that.

I find it strange that a fairly simple 3D scene with little to no external processing hits my CPU so hard and my GPU so little.

I feel like I must be doing something wrong / missing some optimization that would help with the CPU.

So, here’s the code I’m using now:

    def __init__(self):
        ShowBase.__init__(self)
        self.blocks = {}
        self.added_blocks = False

        self.block_node = {}
        for i in range(16):
            self.block_node[i] = self.render.attachNewNode("block_root" + str(i))
        for x in range(16):
            self.blocks[x] = {}
            for y in range(16):
                self.blocks[x][y] = {}
                for z in range(16):
                    # range(16) runs 0..15, so the far faces are at 15,
                    # not 16 -- testing against 16 would never match.
                    if x == 0 or y == 0 or z == 0 or x == 15 or y == 15 or z == 15:
                        self.blocks[x][y][z] = self.loader.loadModel("models/tribe/dirt")
                        #self.blocks[x][y][z] = self.loader.loadModel("models/tribe" + self.world.block_dictionary[world_blocks[x][y][z]])
                        self.blocks[x][y][z].setScale(1, 1, 1)
                        self.blocks[x][y][z].setPos(x * 2 - 16, y * 2 + 25, -2 * z - 5)
                        self.blocks[x][y][z].reparentTo(self.block_node[x])
            self.block_node[x].flattenStrong()

        base.setFrameRateMeter(True)

Is there some optimization in situations like this to give the CPU a bit more breathing room?

You can use PStats to see the number of individual Batches (roughly the same as Geoms, which are also reported) you’re pushing through the pipe. Roughly, your CPU load will be proportional to the number of batches, while your GPU load depends on several other things that are harder to measure.

If all of your cubes are made up of lots of different states (e.g. a different texture on each face or something like that), then you might successfully flatten them down into only a handful of nodes, but you might still have many different batches within those nodes, resulting in the kind of performance envelope you describe.

You can also use NodePath.analyze() to get a rough idea of the number of different Geoms in your scene; this is a bit easier to use than PStats.

David

Thanks, drwr, that knowledge has been helpful in my quest for optimizing performance.

Hi Laereom,

I ran into the same issue rendering a bunch of boxes. I can finally draw 60x60x20 boxes that are 1x1x1 in size at about 30fps.

The secret is to build GeomNodes that contain only the visible faces, rather than loading a whole model per box. It’s easiest to break them into 4x4x4 chunks. This is not easy to do and takes a lot of math and tinkering. There are some Minecraft-clone threads around that may be of some help in that regard. But you will simply not be able to load that many models into your scene without huge performance problems.
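The core of that idea can be sketched without any Panda3D calls (`visible_faces` is a hypothetical helper, not library API): a face of a voxel only needs geometry if no neighboring voxel covers it.

```python
# Find the faces of a voxel set that actually need geometry: a face is
# visible only if the neighboring cell in that direction is empty.
# Only these faces would be turned into quads in a GeomNode.
# visible_faces is a hypothetical helper, not part of Panda3D.
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
             (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def visible_faces(voxels):
    """voxels: set of (x, y, z) filled cells. Returns a list of
    (cell, direction) pairs for every exposed face."""
    faces = []
    for (x, y, z) in voxels:
        for (dx, dy, dz) in NEIGHBORS:
            if (x + dx, y + dy, z + dz) not in voxels:
                faces.append(((x, y, z), (dx, dy, dz)))
    return faces

# A solid 2x2x2 cube: 8 voxels x 6 faces = 48 candidate faces, but only
# the 24 outer faces survive; all interior faces are skipped.
cube = {(x, y, z) for x in range(2) for y in range(2) for z in range(2)}
```

For large solid volumes the savings are dramatic, since interior blocks contribute no geometry at all.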

Good luck!

So, flattening is no longer killing my performance. It’s just not doing a lot for it either way.

As I understood it, flattenStrong essentially collapses all of the child nodes of a parent into one big node/geom and thereby reduces the number of batches.

Is that correct? If so, could someone give me a brief example of how to go about that? So far, I’ve tried reparenting all of the nodes in a given chunk to the same node and flattening, with all of them having the same texture and scale, varying only in x, y, z position, and each one is still an entire Geom itself after my call to flattenStrong.

Since drwr is like the smartest person ever, let me just quote him from another thread:

[Help with closing primitives]

Maybe I wasn’t totally clear.

The reason it’s not doing anything to my performance is that it doesn’t seem to actually be reducing the number of geoms, despite the fact that the nodes I’m calling it on have 64 children using the exact same model, texture, and scale – the only thing they differ in is position. However, it just doesn’t seem to actually be able to flatten them. I am wondering what the specific qualifications are to get flattenStrong to collapse Geoms together.

So it’s not so much a matter of overall performance and the balancing act between flattening and not flattening, but rather getting flattening to actually…flatten.

I believe you should call clearModelNodes()
on your nodepath first before calling flattenStrong(), as flatten does not go beyond model nodes.
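In code it is just a matter of call order; a minimal sketch (the `flatten_chunk` helper name is an assumption, but the two methods are real Panda3D NodePath calls):

```python
def flatten_chunk(chunk_np):
    """Flatten one chunk NodePath into as few Geoms as possible.

    clearModelNodes() strips the ModelRoot nodes that loader.loadModel()
    inserts; without that, flattenStrong() won't merge geometry across
    model boundaries and leaves one Geom per loaded block.
    flatten_chunk is a hypothetical wrapper, not Panda3D API.
    """
    chunk_np.clearModelNodes()   # must come first
    chunk_np.flattenStrong()
    return chunk_np
```

In the earlier loop, that pair of calls would replace the bare `self.block_node[x].flattenStrong()`.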

Hah, embarrassingly, I had read about that in a thread yesterday, stuck it into some buggy code of mine, didn’t see any results, and promptly forgot about it.

Now, though…
Woo, wow, that is fast. I went from 4 FPS maxing out one core to 60 FPS hitting it at 10-15%. That’s sexy. Thanks.

If self.Land points to model.loadModel("LandModel"), and I call clearModelNodes() on self.Land,

self.Land.clearModelNodes()

can I still perform regular nodepath operations using self.Land?

Example:
self.Land.removeNode()
self.Land.setScale()
self.Land.setTexture()

I believe the answer is no because, strictly speaking, the node no longer exists, although I’m not 100% sure. I think your question comes at it from an angle that isn’t quite right, or I’m misunderstanding you.

What you really want to be calling clearModelNodes() on is the parent node of self.Land, not self.Land itself.

At that point, self.Land is no longer an individual node but just vertices in the parent node, so no operations can be performed on it individually. There may still be ways to alter those vertices, though, since as I understand it the vast majority of the information remains as part of the parent node.

Textures are an exception to that, I think, and a few other things may be, too – if you want to change the textures, they have to remain as individual nodes. I remember reading somewhere that having one texture file that effectively contains a bunch of textures and playing around with UV mapping can be a workaround, though. I am now speaking far beyond my level of competence.
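The atlas workaround mentioned above can be sketched without any Panda3D calls: pack N×N tiles into one texture, then remap each block’s 0..1 UVs into its tile’s sub-rectangle (the helper name `atlas_uv` is hypothetical). In Panda3D the resulting scale/offset pair could then be applied with `setTexScale`/`setTexOffset` before flattening, so all blocks share one texture state.

```python
# Map a tile index in an n-by-n texture atlas to the UV scale and
# offset that confine a model's 0..1 UVs to that tile.
# atlas_uv is a hypothetical helper, not Panda3D API.
def atlas_uv(tile, n):
    """Return (scale, (u_offset, v_offset)) for tile number `tile`,
    counted row-major from the bottom-left of an n x n atlas."""
    scale = 1.0 / n
    col = tile % n
    row = tile // n
    return scale, (col * scale, row * scale)
```

Because every block then shares one texture, flattenStrong has no state differences left to keep the Geoms apart.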

I guess I really didn’t need to use clearModelNodes(). It made no difference to my fps, which runs around 300 to 370+ fps in full-screen desktop mode, wide screen (faster in true full-screen mode).

The only time my fps drops close to 60 is when I’m rotating an object by means of a “picking ray.” No surprise there; rays are expensive, even more so if the geometry receiving the ray has a high polygon count. I can eliminate that fps drop by allowing players to rotate an object only before placement, instead of anytime they want to.

Some of the rotating fps drop is due to the fact that I have some bad test models. Some of my models don’t drop fps as badly when rotating. Models that have some issue when loaded up in P3D will cause an additional fps drop. The only problem for me is that I can’t tell what’s wrong with the models, since they look normal enough.

All my test textures are poorly sized, too. They aren’t 256x256, 512x512, 64x64, etc. They are ridiculous sizes: 640x450, 2000x3000… That’s why I’m losing some performance as well.

Oh well, they’re just test models and textures anyway. I will be more careful during the actual graphic stage of development. (hopefully)

PS, I haven’t used egg-optchar on my actors yet (with keepall). I wonder if I could get some fps gain if I used it later?