flattening kills performance

Well, with that inflammatory title, I begin.

The project I’m working on involves rendering tens of thousands of boxes. I’m currently testing it rendering 4096 boxes (a 16×16×16 grid).

Naturally, this is quite slow (about 1.7 fps unflattened), so I started reading about performance optimization in Panda3D, and quickly came across node flattening; it sounds perfect as, at least for now, no boxes do anything independently at all except come into existence.

After the basic startup of getting a window up and running, I throw in a few pieces of code pertinent to Panda3D:

    self.taskMgr.add(self.updateSceneTask, "UpdateSceneTask")
    self.block_root = self.render.attachNewNode("block_root")
    base.setFrameRateMeter(True)
    self.added_blocks = False
    self.first_time = True

All of this should be quite familiar, apart from the added_blocks and first_time variables, which keep track of whether new blocks have been added in updateSceneTask, which works like so:


    def updateSceneTask(self, task):
        if self.first_time:
            self.first_time = False
            for x in range(16):
                for y in range(16):
                    for z in range(16):
                        self.added_blocks = True
                        new_block = self.loader.loadModel("models/tribe/block")
                        new_block.setScale(1, 1, 1)
                        new_block.setPos(x * 2 - 16, y * 2 + 25, -2 * z - 5)
                        new_block.reparentTo(self.block_root)

        if self.added_blocks:
            self.block_root.flattenStrong()
            self.added_blocks = False
        return Task.cont

Now, I’ve butchered that code a bit to remove a whole lot of stuff extraneous to this problem, so if that doesn’t make perfect sense, let me know and I’ll clarify.

Basically, though, the end result is that the task goes through and creates 4096 blocks one time, adds them to the self.block_root node, and attempts to flatten them, once. This completely butchers performance. Unflattened, I can kind of pan the camera around and get a look at things. Flattened, no such luck. The window all but hard locks. Anyone care to show me the error of my ways?

Flatten can take quite a while to process, depending on what you flatten, so the hard lock you experience might simply be the time it takes to flatten the objects together. Try with fewer cubes first, and see what happens as you increase the number.

By the way, flattening everything into one big mesh is not a good idea, as it breaks the culling process.
It is better to flatten your stuff into a couple of separate nodes, so culling can throw out all blocks outside the frustum, increasing performance.
You should aim for fewer than 300 individual nodes on-screen.
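To make that concrete, here is a minimal sketch (plain Python, not Panda3D API; `CHUNK`, `chunk_key`, and `group_blocks` are hypothetical names) of grouping block coordinates into chunks, so each chunk becomes one flattened node that the frustum culler can accept or reject as a unit:

```python
# Group a 16x16x16 block grid into 4x4x4-block chunks (64 chunks total).
# Each chunk would get its own NodePath, be filled with blocks, and then
# be flattened, keeping the node count well under the ~300 budget.
# CHUNK, chunk_key, and group_blocks are hypothetical helper names.
CHUNK = 4

def chunk_key(x, y, z):
    """Return the chunk a block at (x, y, z) belongs to."""
    return (x // CHUNK, y // CHUNK, z // CHUNK)

def group_blocks(coords):
    """Bucket block coordinates by chunk; each bucket is one
    flatten-and-cull unit."""
    chunks = {}
    for c in coords:
        chunks.setdefault(chunk_key(*c), []).append(c)
    return chunks

grid = [(x, y, z) for x in range(16) for y in range(16) for z in range(16)]
chunks = group_blocks(grid)
# 16/4 = 4 chunks per axis -> 64 chunks of 4*4*4 = 64 blocks each.
```

The chunk size is a tuning knob: bigger chunks mean fewer batches but coarser culling, smaller chunks the reverse.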

Thanks, that is helpful. I have altered my program significantly to perform some performance tests.

Now, I have a program that is solely the surface area of a cube with a length of 16 blocks, with every vertical slice being its own flattened node. That gives me around 1500 models spread out across 16 nodes. I don’t think it’s really being done in a way that will enable me to cull anything, since SOME elements of all of those blocks are in place, but it did improve performance significantly. I am now getting 30-40 FPS as opposed to less than 2 without any flattening.

However, this is definitely not the kind of performance I was hoping for. The perplexing thing is that I have no tasks, only an initialization script, which creates those nodes, flattens them, and then does nothing. This leaves one core of my CPU pegged at 100%. It leaves my graphics processor pegged at about…5%. Actually, the GPU has LESS of a load than that.

I find it strange that a fairly simple 3D scene with little to no external processing hits my CPU so hard and my GPU so little.

I feel like I must be doing something wrong / missing some optimization that would help with the CPU.

So, here’s the code I’m using now:

    def __init__(self):
        ShowBase.__init__(self)
        self.blocks = {}
        self.added_blocks = False

        self.block_node = {}
        for i in range(16):
            self.block_node[i] = self.render.attachNewNode("block_root" + str(i))
        for x in range(16):
            self.blocks[x] = {}
            for y in range(16):
                self.blocks[x][y] = {}
                for z in range(16):
                    # range(16) runs 0..15, so the far faces are at 15,
                    # not 16 -- testing against 16 would never match.
                    if x == 0 or y == 0 or z == 0 or x == 15 or y == 15 or z == 15:
                        self.blocks[x][y][z] = self.loader.loadModel("models/tribe/dirt")
                        #self.blocks[x][y][z] = self.loader.loadModel("models/tribe" + self.world.block_dictionary[world_blocks[x][y][z]])
                        self.blocks[x][y][z].setScale(1, 1, 1)
                        self.blocks[x][y][z].setPos(x * 2 - 16, y * 2 + 25, -2 * z - 5)
                        self.blocks[x][y][z].reparentTo(self.block_node[x])
            self.block_node[x].flattenStrong()

        base.setFrameRateMeter(True)

Is there some optimization in situations like this to give the CPU a bit more breathing room?

You can use PStats to see the number of individual Batches (roughly the same as Geoms, which are also reported) you’re pushing through the pipe. Roughly, your CPU load will be proportional to the number of batches, while your GPU load depends on several other things that are harder to measure.

If all of your cubes are made up of lots of different states (e.g. a different texture on each face or something like that), then you might successfully flatten them down into only a handful of nodes, but you might still have many different batches within those nodes, resulting in the kind of performance envelope you describe.

You can also use NodePath.analyze() to get a rough idea of the number of different Geoms in your scene; this is a bit easier to use than PStats.

David

Thanks, drwr, that knowledge has been helpful in my quest for optimizing performance.

Hi Laereom,

I ran into the same issue rendering a bunch of boxes. I can finally draw 60x60x20 boxes that are 1x1x1 in size at about 30fps.

The secret is to build GeomNodes that contain only the visible faces, rather than loading a whole model per box. It’s easiest to break them into 4x4x4 chunks. This is not easy to do and takes a lot of math and tinkering. There are some Minecraft-clone threads around that may be of some help in that regard. But you will simply not be able to load that many models into your scene without huge performance problems.
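The core of that idea can be sketched without any Panda3D calls (`visible_faces` is a hypothetical helper, not library API): a face of a voxel only needs geometry if no neighboring voxel covers it.

```python
# Find the faces of a voxel set that actually need geometry: a face is
# visible only if the neighboring cell in that direction is empty.
# Only these faces would be turned into quads in a GeomNode.
# visible_faces is a hypothetical helper, not part of Panda3D.
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
             (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def visible_faces(voxels):
    """voxels: set of (x, y, z) filled cells. Returns a list of
    (cell, direction) pairs for every exposed face."""
    faces = []
    for (x, y, z) in voxels:
        for (dx, dy, dz) in NEIGHBORS:
            if (x + dx, y + dy, z + dz) not in voxels:
                faces.append(((x, y, z), (dx, dy, dz)))
    return faces

# A solid 2x2x2 cube: 8 voxels x 6 faces = 48 candidate faces, but only
# the 24 outer faces survive; all interior faces are skipped.
cube = {(x, y, z) for x in range(2) for y in range(2) for z in range(2)}
```

For large solid volumes the savings are dramatic, since interior blocks contribute no geometry at all.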

Good luck!

So, flattening is no longer killing my performance. It’s just not doing a lot for it either way.

As I understood it, flattenStrong essentially collapses all of the child nodes of a parent into one big node/geom and thereby reduces the number of batches.

Is that correct? If so, could someone give me a brief example of how to go about that? So far, I’ve tried reparenting all of the nodes in a given chunk to the same node and flattening, with all of them having the same texture and scale, varying only in x, y, z position, and each one is still an entire Geom itself after my call to flattenStrong.

Since drwr is like the smartest person ever, let me just quote him from another thread:

[Help with closing primitives]

Maybe I wasn’t totally clear.

The reason it’s not doing anything to my performance is that it doesn’t seem to actually be reducing the number of geoms, despite the fact that the nodes I’m calling it on have 64 children using the exact same model, texture, and scale – the only thing they differ in is position. However, it just doesn’t seem to actually be able to flatten them. I am wondering what the specific qualifications are to get flattenStrong to collapse Geoms together.

So it’s not so much a matter of overall performance and the balancing act between flattening and not flattening, but rather getting flattening to actually…flatten.

I believe you should call clearModelNodes()
on your nodepath first before calling flattenStrong(), as flatten does not go beyond model nodes.
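In code it is just a matter of call order; a minimal sketch (the `flatten_chunk` helper name is an assumption, but the two methods are real Panda3D NodePath calls):

```python
def flatten_chunk(chunk_np):
    """Flatten one chunk NodePath into as few Geoms as possible.

    clearModelNodes() strips the ModelRoot nodes that loader.loadModel()
    inserts; without that, flattenStrong() won't merge geometry across
    model boundaries and leaves one Geom per loaded block.
    flatten_chunk is a hypothetical wrapper, not Panda3D API.
    """
    chunk_np.clearModelNodes()   # must come first
    chunk_np.flattenStrong()
    return chunk_np
```

In the earlier loop, that pair of calls would replace the bare `self.block_node[x].flattenStrong()`.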

Hah, embarrassingly, I had read about that in a thread yesterday, stuck it into some buggy code of mine, didn’t see any results, and promptly forgot about it.

Now, though…
Woo, wow, that is fast. I went from 4 FPS maxing out one core to 60 FPS hitting it at 10-15%. That’s sexy. Thanks.

If self.Land points to model.loadModel("LandModel"), and I call clearModelNodes() on self.Land,

self.Land.clearModelNodes()

can I still perform regular nodepath operations using self.Land?

Example:
self.Land.removeNode()
self.Land.setScale()
self.Land.setTexture()

I believe the answer is no because, strictly speaking, the node no longer exists, although I’m not 100% sure. I think your question comes at it from an angle that isn’t quite right, or I’m misunderstanding you.

What you really want to be calling clearModelNodes() on is the parent node of self.Land, not self.Land itself.

At that point, self.Land is no longer an individual node but just vertices in the parent node, so no operations can be performed on it individually. There may still be ways to alter those vertices, though, since as I understand it the vast majority of the information remains as part of the parent node.

Textures are an exception to that, I think, and a few other things may be, too – if you want to change the textures, they have to remain as individual nodes. I remember reading somewhere that having one texture file that effectively contains a bunch of textures and playing around with UV mapping can be a workaround, though. I am now speaking far beyond my level of competence.
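The atlas workaround mentioned above can be sketched without any Panda3D calls: pack N×N tiles into one texture, then remap each block’s 0..1 UVs into its tile’s sub-rectangle (the helper name `atlas_uv` is hypothetical). In Panda3D the resulting scale/offset pair could then be applied with `setTexScale`/`setTexOffset` before flattening, so all blocks share one texture state.

```python
# Map a tile index in an n-by-n texture atlas to the UV scale and
# offset that confine a model's 0..1 UVs to that tile.
# atlas_uv is a hypothetical helper, not Panda3D API.
def atlas_uv(tile, n):
    """Return (scale, (u_offset, v_offset)) for tile number `tile`,
    counted row-major from the bottom-left of an n x n atlas."""
    scale = 1.0 / n
    col = tile % n
    row = tile // n
    return scale, (col * scale, row * scale)
```

Because every block then shares one texture, flattenStrong has no state differences left to keep the Geoms apart.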

I guess I really didn’t need to use clearModelNodes(). It made no difference to my fps, which runs around 300 to 370+ fps in full-screen desktop mode, wide screen (faster in true full-screen mode).

The only time my fps drops close to 60 is when I’m rotating an object by means of a “picking ray.” No surprise there; rays are expensive, even more so if the geometry receiving the ray has a high polygon count. I can eliminate that fps drop by allowing players to rotate an object only before placement, instead of anytime they want to.

Some of the rotating fps drop is due to the fact that I have some bad test models. Some of my models don’t drop fps as badly when rotating. Models that have some issue when loaded up in P3D will cause an additional fps drop. The only problem for me is that I can’t tell what’s wrong with the models, since they look normal enough.

All my test textures are poorly sized, too. They aren’t 256x256, 512x512, 64x64, etc. They are ridiculous sizes: 640x450, 2000x3000… That’s why I’m losing some performance as well.

Oh well, they’re just test models and textures anyway. I will be more careful during the actual graphic stage of development. (hopefully)

PS, I haven’t used egg-optchar on my actors yet (with keepall). I wonder if I could get some fps gain if I used it later?