Flattening issues

I know I’ve brought this up before (see https://bugs.launchpad.net/panda3d/+bug/913658 ), but I’m still stuck. I have some models I want to flatten. Even if I set every RenderState on every node and every Geom to an empty RenderState, flattenStrong() only takes me from 21499 Geoms to 589. It successfully reduces the model to a single GeomNode, but that’s way too many Geoms! It should be 2 (one for the lines, one for the triangles). I’ve tried removing all the lines, and that does not solve the issue either.
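Stripping the states amounts to something like this (a minimal sketch, not the exact script from the zip linked below; the helper name is just for illustration):

from panda3d.core import RenderState

def strip_render_states(root):
    # Clear the state on the root, on every node below it, and on every
    # Geom of every GeomNode, so differing RenderStates can't block merging.
    empty = RenderState.makeEmpty()
    root.node().setState(empty)
    matches = root.findAllMatches('**')
    for i in range(matches.getNumPaths()):
        node = matches.getPath(i).node()
        node.setState(empty)
        if node.isGeomNode():
            for g in range(node.getNumGeoms()):
                node.setGeomState(g, empty)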

Before flatten (after removing all renderStates):

28796 total nodes (including 0 instances); 0 LODNodes.
6732 transforms; 0% of nodes have some render attribute.
21499 Geoms, with 21499 GeomVertexDatas and 3 GeomVertexFormats, appear on 21519 GeomNodes.
639320 vertices, 634375 normals, 0 colors, 437750 texture coordinates.
GeomVertexData arrays occupy 28872K memory.
GeomPrimitive arrays occupy 2604K memory.
38381 GeomVertexArrayDatas are redundant, wasting 7184K.
18026 GeomPrimitive arrays are redundant, wasting 2112K.
464874 triangles:
148314 of these are on 68157 tristrips (2.17606 average tris per strip).
316560 of these are independent triangles.
3240 lines, 0 points.
36 textures, estimated minimum 26301K texture memory required.

After flatten:

2 total nodes (including 0 instances); 0 LODNodes.
0 transforms; 0% of nodes have some render attribute.
589 Geoms, with 326 GeomVertexDatas and 2 GeomVertexFormats, appear on 1 GeomNodes.
639320 vertices, 639320 normals, 0 colors, 575005 texture coordinates.
34468 normals are too long, 176977 are too short. Average normal length is 0.871142
GeomVertexData arrays occupy 32953K memory.
GeomPrimitive arrays occupy 2737K memory.
3 GeomVertexArrayDatas are redundant, wasting 141K.
46 GeomPrimitive arrays are redundant, wasting 139K.
464874 triangles:
0 of these are on 0 tristrips.
464874 of these are independent triangles.
3240 lines, 0 points.
0 textures, estimated minimum 0K texture memory required.

And node.ls() produces:

ModelRoot rest_FromDae.egg
GeomNode SketchUp (589 geoms)

Here is a smaller example (this time with the lines removed). Before flatten:

1253 total nodes (including 0 instances); 0 LODNodes.
232 transforms; 0% of nodes have some render attribute.
1019 Geoms, with 1019 GeomVertexDatas and 2 GeomVertexFormats, appear on 1019 GeomNodes.
31012 vertices, 31012 normals, 0 colors, 29400 texture coordinates.
GeomVertexData arrays occupy 1678K memory.
GeomPrimitive arrays occupy 99K memory.
3752 GeomVertexArrayDatas are redundant, wasting 1497K.
993 GeomPrimitive arrays are redundant, wasting 95K.
18678 triangles:
18150 of these are on 8208 tristrips (2.21126 average tris per strip).
528 of these are independent triangles.
0 textures, estimated minimum 0K texture memory required.

After flatten:

2 total nodes (including 0 instances); 0 LODNodes.
0 transforms; 0% of nodes have some render attribute.
16 Geoms, with 16 GeomVertexDatas and 2 GeomVertexFormats, appear on 1 GeomNodes.
31012 vertices, 31012 normals, 0 colors, 30402 texture coordinates.
0 normals are too long, 78 are too short. Average normal length is 1
GeomVertexData arrays occupy 1708K memory.
GeomPrimitive arrays occupy 104K memory.
21 GeomVertexArrayDatas are redundant, wasting 348K.
7 GeomPrimitive arrays are redundant, wasting 47K.
18678 triangles:
17036 of these are on 7725 tristrips (2.20531 average tris per strip).
1642 of these are independent triangles.
0 textures, estimated minimum 0K texture memory required.

And node.ls() now produces:

ModelRoot rest_FromDae.egg
GeomNode SketchUp (16 geoms)

It should be exactly 1 Geom, but instead it’s 16! Well, maybe 2 Geoms (one for each of the two GeomVertexFormats).

The model for this smaller example, along with the source code that strips the RenderStates, removes the lines, and flattens it, is here:
http://craig.p3dp.com/Misc/flattenIssue.zip

This has been killing my frame rates for the last year or so, and it’s getting kind of annoying to have 3-8 fps now that my maps are getting bigger. Even with proper chunking and LOD, each chunk at each LOD has roughly 10+ times the Geoms it should.

Any ideas on how to fix this, or why it happens?

Thanks!

It’s a lot of vertices. flattenStrong() will not put more than max-collect-vertices vertices into a single GeomVertexData, or more than max-collect-indices indices into a single GeomPrimitive. The default value for both of these settings is 65536, because most graphics drivers can’t handle more than that without losing performance anyway. Your particular graphics card may vary (and you can ask what it claims to support with gsg.getMaxVerticesPerArray() and gsg.getMaxVerticesPerPrimitive()).
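For example, you can query the window’s GSG like this (a minimal sketch, assuming a running ShowBase):

from direct.showbase.ShowBase import ShowBase

base = ShowBase()                        # or use the base you already have
gsg = base.win.getGsg()                  # the GSG for the default window
print(gsg.getMaxVerticesPerArray())      # driver's reported per-array vertex limit
print(gsg.getMaxVerticesPerPrimitive())  # driver's reported per-primitive index limit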

David

The two models in this example end up with 1085 and 1938 vertices per Geom, respectively. getMaxVerticesPerArray and getMaxVerticesPerPrimitive report 2048 and 150000 respectively. This may explain part of the issue. 2048 is really small for maxVerticesPerArray, isn’t it? I can render 200000-vertex meshes on this graphics card at good frame rates with Panda3D.

This may explain why I get different Geom counts on my Mac versus my Windows and Linux machines (though since my Windows and Linux machines are the same hardware, I guess it’s just that the drivers differ).

If the default is 65536, why would both these values be vastly different from the defaults (in opposite directions)?

I tried changing them with the config vars “max-collect-vertices” and “egg-max-vertices”, but it did not change the reported limit from the GSG. So how can I change the limit?

I’m on Mac OS X 10.6 running Panda3D 1.8. This machine has an NVIDIA GeForce 9600M GT graphics card, though I have similar issues on my Linux/Windows box with Intel HD 4000 graphics.

Thanks!

Ah, the number 2048 is coming from your graphics driver, in response to the OpenGL query glGetIntegerv(GL_MAX_ELEMENTS_VERTICES). You can see this happen if you set “notify-level-display debug”. This is the graphics driver saying that it cannot handle more than this without some kind of performance issue. It doesn’t say how severe the performance issue would be, and if your own performance measurements contradict this claim, you might have to take it up with NVidia.
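If it’s easier than editing Config.prc, you can turn that on from code before the window opens, e.g.:

from panda3d.core import loadPrcFileData

# Must run before the window/GSG is created for the debug output to appear.
loadPrcFileData("", "notify-level-display debug")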

Panda is helpfully respecting the requested limit; it chooses the lower of your max-collect-vertices and the driver value reported for GL_MAX_ELEMENTS_VERTICES.

Unfortunately, there’s not an interface to easily override this driver setting. It didn’t occur to me that it would ever be necessary, or even a good idea.

But you can override it. You just have to use the lower-level SceneGraphReducer, instead of using the high-level NodePath::flatten_strong() (which internally creates and uses a SceneGraphReducer). In fact, the definition for flatten_strong() is this:

int NodePath::
flatten_strong() {
  nassertr_always(!is_empty(), 0);
  SceneGraphReducer gr;
  gr.apply_attribs(node());
  int num_removed = gr.flatten(node(), ~0);

  if (flatten_geoms) {
    gr.make_compatible_state(node());
    gr.collect_vertex_data(node(), ~(SceneGraphReducer::CVD_format | SceneGraphReducer::CVD_name | SceneGraphReducer::CVD_animation_type));
    gr.unify(node(), false);
  }

  return num_removed;
}

Here flatten_geoms is a config variable whose default value is true. You can do the same thing in Python; just add a call to gr.clearGsg() right after constructing the SceneGraphReducer to remove the dependency on your own GSG’s reported limitations.
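Translated into Python, it might look like this (a sketch using the camel-case method names of the 1.8 bindings; the function name itself is just for illustration):

from panda3d.core import SceneGraphReducer

def flatten_strong_ignoring_gsg(np):
    # Same steps as NodePath::flatten_strong() above, but with the GSG
    # cleared so collectVertexData() is limited only by max-collect-vertices.
    gr = SceneGraphReducer()
    gr.clearGsg()  # ignore the graphics driver's reported per-array limit
    gr.applyAttribs(np.node())
    num_removed = gr.flatten(np.node(), ~0)
    gr.makeCompatibleState(np.node())
    gr.collectVertexData(np.node(),
                         ~(SceneGraphReducer.CVDFormat |
                           SceneGraphReducer.CVDName |
                           SceneGraphReducer.CVDAnimationType))
    gr.unify(np.node(), False)
    return num_removed

Then call it on the model root in your preprocessor in place of flattenStrong(), e.g. flatten_strong_ignoring_gsg(model).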

David

That worked! Thanks! It only slightly helped frame rate, but should make debugging the rest of my issues much simpler.

This is exactly what I wanted. Since this is part of my model preprocessor, clearing GPU-specific details is appropriate.

Thanks for the detail. Now I understand what was going on, and why!

Edit: and with flattening “fixed”, I located my other big frame-rate killer: I was using a ray for the camera collision, not a segment. Oops. 10 fps -> 70 fps for those 2 fixes together. Now everything is back to being properly FPS-limited by how many shaders/buffers I enable. My chunking and LOD are now paying off big time :slight_smile:
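For anyone else who hits this: the change amounts to building the camera collider from a CollisionSegment, which only tests the finite span between two endpoints, instead of a CollisionRay, which extends to infinity and touches far more geometry. A rough sketch, with placeholder endpoints and masks:

from panda3d.core import BitMask32, CollisionNode, CollisionSegment

segment = CollisionSegment(0, 0, 0, 0, -10, 2)   # endpoints are placeholders
collider = CollisionNode('camera-collider')       # node name is arbitrary
collider.addSolid(segment)
collider.setFromCollideMask(BitMask32.bit(0))     # placeholder; match your scene's into mask
collider.setIntoCollideMask(BitMask32.allOff())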

This is a great example of why I like Panda3D so much: when I get things right, it runs great. And when I don’t, there are great profiling tools, and great help available!