Hmm… On reflection, I wonder whether my problem isn’t simply that I have a pretty weak machine, especially in terms of graphics. Perhaps things that would normally be handed to the GPU are being executed on the CPU in my case?
I’ll confess that I’m rather hoping that my game will at least run playably on this machine; for one thing, this is my development machine, and I’m not likely to get something faster any time soon.
I changed my clearing code to the following:
for region in self.surfaceCamera.node().getDisplayRegions():
    region.setClearActive(0, 0)
    region.setClearActive(1, 0)
    region.setClearActive(2, 0)
for region in base.win.getDisplayRegions():
    region.setClearActive(0, 0)
    region.setClearActive(1, 0)
    region.setClearActive(2, 0)
(I wasn’t sure what to use for the first parameter, so I guessed that clearing the first three of each would be reasonably safe.)
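A version that sidesteps the question of what the first parameter means might look like the sketch below. It assumes the named DrawableRegion methods setClearColorActive, setClearDepthActive and setClearStencilActive are available in the Panda3D version in use; disable_clears is just a hypothetical helper name, not part of Panda3D.

```python
def disable_clears(regions):
    """Switch off colour, depth and stencil clearing on each region.

    Assumes each region is a Panda3D DrawableRegion (such as a
    DisplayRegion), which provides the named set*Active methods below.
    """
    for region in regions:
        region.setClearColorActive(False)
        region.setClearDepthActive(False)
        region.setClearStencilActive(False)


# Hypothetical usage, mirroring the two loops above:
# disable_clears(self.surfaceCamera.node().getDisplayRegions())
# disable_clears(base.win.getDisplayRegions())
```

Using the named methods makes it explicit which clears are being turned off, rather than relying on guessed indices. I believe the window itself is also a DrawableRegion with its own clear flags, so passing [base.win] as well may be worth trying.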
There seems to be little apparent effect (perhaps a change of one or two milliseconds, although that might be down to random variation).
(The following results were all obtained with “gl-finish 1” still present in my .prc file.)
“Cull” seems to account for very little time overall, although, for what it’s worth, dr_0 and dr_1 do seem to take up a reasonable portion of that time.
Is that the value reported in the graph labelled “Primitive batches”? If so, it seems to hold steady at 7.
5 total nodes (including 0 instances); 0 LODNodes.
1 transforms; 40% of nodes have some render attribute.
2 Geoms, with 1 GeomVertexDatas and 1 GeomVertexFormats, appear on 1 GeomNodes.
8 vertices, 8 normals, 0 colors, 8 texture coordinates.
GeomVertexData arrays occupy 1K memory.
GeomPrimitive arrays occupy 1K memory.
4 triangles:
  4 of these are on 2 tristrips (2 average tris per strip).
  0 of these are independent triangles.
2 textures, estimated minimum 768K texture memory required.
I don’t believe I have any shaders active at all: “setShaderAuto” shouldn’t be getting called, and I’m not doing any shader work of my own.