Newbie here, need help with optimization

tastyfish · December 11, 2015, 12:27pm

I’m working on a grid-based RPG with simple graphics (just meshes and textures with two global lights, no shadows, no bumpmapping etc.). I assumed graphics like that would run very smoothly, but I’m getting 25-60 FPS, depending on how many nodes are visible in the scene. Here are my screenshots from profiling:

slow: [profiler screenshot not included]

fast: [profiler screenshot not included]

I noticed that most of the frame time is spent in Draw, and most of Draw is spent in Set State. I’m not sure what that means exactly. Does it mean that the rendering state is changed too often? How can I optimise this? I tried using bins to render the objects in different orders (front to back, back to front, state sort, …), but nothing seemed to have any impact on FPS.

Each tile and wall segment is a separate node that is moved into the right place with setPos(). My view distance isn’t infinite; I have it set so that I can see about 8 tiles in front of the camera.

I’m running this on an Ubuntu laptop with an Intel integrated GPU and a GeForce GT 540M, and both give more or less the same framerate (I’m running the GeForce with “optirun python main.py”; am I doing it right?).

I’ll be glad for any help! Thanks.

rdb · December 11, 2015, 1:40pm

I’m counting a lot of state changes, which indicates that Panda is spending a lot of time telling OpenGL that the parameters of the object that is being rendered have changed. The fact that “Set state” is taking a lot of time would seem to confirm this.
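A toy model (not Panda’s actual renderer) can make this concrete, and may also explain why reordering with bins made no difference: sorting by state only groups objects that already share a state, so if every tile carries its own transform, every draw still needs a transform change. The `DrawItem` class, the four texture names and the positions below are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrawItem:
    """Hypothetical render item: one texture plus one node transform."""
    texture: str
    transform: tuple  # (x, y) position; (0, 0) stands in for identity


def count_changes(items, key):
    """Count how often key(item) differs from the previous item's value.

    The first item counts as one change, since its state must be set too."""
    changes = 0
    current = object()  # sentinel that never equals a real value
    for item in items:
        value = key(item)
        if value != current:
            changes += 1
            current = value
    return changes


def texture_of(item):
    return item.texture


def transform_of(item):
    return item.transform


# 12 tiles cycling through 4 textures, each at its own grid position.
scene = [DrawItem(f"tex{i % 4}", (i, 0)) for i in range(12)]
state_sorted = sorted(scene, key=texture_of)

# "Baking" transforms into the vertices leaves every node at identity.
baked = [DrawItem(item.texture, (0, 0)) for item in state_sorted]

for label, order in [("unsorted", scene), ("state-sorted", state_sorted), ("baked", baked)]:
    print(label, count_changes(order, texture_of), count_changes(order, transform_of))
```

In this model, sorting cuts texture changes from 12 to 4, but transform changes stay at 12 until the transforms are baked, after which only 1 remains.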

You should consider calling flattenLight() on your scene (or the static parts thereof). This will “bake” the transformation onto the vertices of the objects. This should at least cut down the number of transform state changes.

Another panel that would have been useful in the screenshot is the “Batches” panel, which shows how many separate draw calls are being made to the GPU. Given the number of nodes and the fair bit of time spent in “Primitive”, you are probably sending more than a few hundred batches to the GPU. You should generally aim for no more than a few hundred draw calls per frame if you want good performance, especially on mobile hardware.

One way to reduce this is to call flattenStrong() on your scene, which goes even further than flattenLight() in that it will try to merge together objects with the same textures and materials. This can greatly reduce the number of draw calls. Since this makes it no longer possible to move the objects individually, you should either protect the movable objects by inserting a ModelNode above them, or call flattenStrong() before you insert the movable objects into the scene graph.

One thing to note is that if you have a big world, flattening the entire world into one object would be a bad idea, because Panda can no longer effectively cull away the parts that are out of view. It is better to divide up the world into “zones” or “groups” that are flattened together. If you load in the scene from a single big egg file, this can be automated by adding something like this to Config.prc:

egg-flatten-radius 20.0

This will group together nodes within a radius of 20 units. This can also be done using the SceneGraphReducer class at runtime.

Also, note that models loaded in via loader.loadModel are protected by default from being flattened with other models (although they may be flattened internally) via a ModelNode that is inserted at the root of the model hierarchy. You can call model.clearModelNodes() to prevent this. That would be a good thing to call for static objects that you don’t intend to move after flattening.

If you want to know which GPU Panda is using, put “notify-level-glgsg debug” in Config.prc. You can also query this using base.win.getGsg().getDriverRenderer().

tastyfish · December 12, 2015, 10:37am

I briefly tried flattenLight() and it greatly increased my FPS. I didn’t know about these methods. I will have to play around with grouping the objects as you said but now I know what to focus on, thank you very much!