Adding geom data all at once

Atlas · January 29, 2020, 11:04pm

Hi, I’m having a bit of a problem. I’m attempting to create models at runtime via GeomVertexWriter. Any time a model needs to be modified, I find myself looping over every vertex, normal, and UV to re-add the data via addData. This method of generation is pretty costly (and slow, and unnecessarily complex), so I was wondering whether there might be a better way to do this. At the moment I’m considering saving the model data separately as a list or dict (that way I don’t have to process the model data all over again), modifying what needs to be removed or added, and then using a function to add all that data to the model at once. I’m not sure if there’s an easier or better way to do this, though… And I’m not really sure how to translate a list into a usable model format either. Any thoughts or suggestions are very welcome.
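The "keep the data separately, edit it, then rebuild in one pass" idea can be sketched with the standard library alone. The dict layout, the `(x, y, z, side)` keys and the `flatten_positions` helper below are hypothetical, not Panda3D API; the flat float32 array is the kind of thing you would then hand to Panda3D, either row by row through a GeomVertexWriter or in bulk through the buffer protocol.

```python
from array import array

# Hypothetical face store: each quad is keyed by an (x, y, z, side) tuple, so
# individual faces can be added or removed without touching the others.
faces = {
    (0, 0, 0, "top"): ((0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)),
    (1, 0, 0, "top"): ((1, 0, 1), (2, 0, 1), (2, 1, 1), (1, 1, 1)),
}

def flatten_positions(face_store):
    """Flatten every quad's corners into one contiguous float32 array."""
    positions = array("f")
    for corners in face_store.values():
        for corner in corners:
            positions.extend(corner)
    return positions

# Edit the source data first, then rebuild the flat array in a single pass.
del faces[(1, 0, 0, "top")]
flat = flatten_positions(faces)
print(len(flat))  # 12: one quad * 4 corners * 3 floats
```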

Thank you in advance.

rdb · January 30, 2020, 11:04am

Firstly, there are a few ways to speed up creating geometry through GeomVertexWriter fairly dramatically. You can call vdata.setNumRows(n) with the number of rows you’re going to write. This makes Panda preallocate the vertex data at the right size, so that it doesn’t need to resize it on the fly every time you add a new row.
If you’ve done that, you can call setData* instead of addData*. The two are identical except that setData* doesn’t check whether it has already reached the end of the vertex data (a check that is no longer necessary once you have set the correct number of rows).

Even more efficient is to use the Python buffer protocol to get raw access to a GeomVertexArrayData. Then you can write the floating-point values directly into the buffer, or even load in a numpy array. This is a little more difficult, though, because you have to understand how the data is laid out in memory.
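To illustrate what "understanding the memory layout" involves, here is a standard-library sketch that preallocates one buffer (the same idea as setNumRows) and packs each row at a computed byte offset through a memoryview. The nine-float32 row (x, y, z, r, g, b, a, u, v) is a hypothetical layout chosen for illustration; Panda3D's actual V3c4t2 layout may differ (the color column, for example, may not be four float32 values), so check the array format's stride and each column's start offset before writing raw bytes into real vertex data.

```python
import struct

# Hypothetical interleaved row: x, y, z, r, g, b, a, u, v as nine little-endian
# float32 values (36 bytes). A real Panda3D format may use a different layout.
ROW = struct.Struct("<9f")

def fill_rows(rows):
    """Pack (pos, color, uv) tuples into one preallocated byte buffer."""
    buf = bytearray(ROW.size * len(rows))  # allocate once, like setNumRows
    view = memoryview(buf)
    for i, (pos, color, uv) in enumerate(rows):
        # Row i starts at byte offset i * stride.
        ROW.pack_into(view, i * ROW.size, *pos, *color, *uv)
    return buf

buf = fill_rows([
    ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0, 1.0), (0.0, 0.0)),
    ((1.0, 0.0, 0.0), (1.0, 1.0, 1.0, 1.0), (1.0, 0.0)),
])
print(len(buf))                  # 72: 2 rows * 36 bytes
print(ROW.unpack_from(buf, 36))  # (1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0)
```

The same packing could target a writable buffer exposed by the vertex data instead of a bytearray; the offsets are the part that has to match the real format.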

Atlas · January 30, 2020, 6:48pm

Thank you for the reply! Oddly enough, I see no change in performance when adding setNumRows to my vertex data. I already know how many rows I’m going to write, but adding it in or commenting it out makes no difference to the FPS.

I’m using the V3c4t2 format for my vertices. While we’re here, I’m also wondering whether it’s more efficient to create a global set of GeomVertexWriters and run all the data through those, or to create independent GeomVertexWriters inside each instance of a modifiable object?

Thanks again for the help, by the way!

rdb · January 31, 2020, 9:42am

It might be worth using a Python profiler to find out which lines of code are taking the most time. Then you can optimize based on that.
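A minimal way to do that with the standard library's cProfile is to profile a single call and sort by tottime (time spent in the function's own code, excluding callees). `build_mesh` here is a hypothetical stand-in for the real mesh-building function.

```python
import cProfile
import io
import pstats

def build_mesh(n):
    """Hypothetical stand-in for a mesh builder: pure-Python work in a loop."""
    out = []
    for i in range(n):
        out.append((i, i + 1, i + 3))
    return out

profiler = cProfile.Profile()
# runcall profiles the function directly, without wrapping it in exec().
result = profiler.runcall(build_mesh, 10_000)

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("tottime").print_stats(5)  # top 5 entries by own time
report = stream.getvalue()
print(report)
```

Sorting by tottime rather than cumtime points at the function whose own code is slow, rather than every function above it in the call stack.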

I don’t think you can get away with creating a global set of GeomVertexWriters. If the GeomVertexWriter ends up being the bottleneck, I would advise moving ahead with using the Python buffer protocol.

Atlas · February 6, 2020, 9:20pm

Profilers are a godsend! The profiler made finding the problem super easy!
But I’m not sure fixing said problem will be… Apparently somewhere in the code, a single exec call is taking 1.5 times longer than the rest of the entire mesh generation function. I generally avoid exec since it can cause security issues, so there are no exec calls in my script. Does Panda3D call exec somewhere in its internal geometry creation code?

rdb · February 7, 2020, 11:33am

No, Panda only calls it when loading particle effects, not in the geometry creation code.

It should be possible to deduce from the profiler capture what is calling exec in this case. If you posted the output, we may be able to help you find it.

Atlas · February 7, 2020, 6:42pm

That would be very helpful. Thank you!

Here’s the output from cProfile:

         79328 function calls in 0.051 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.051    0.051 <string>:1(<module>)
        1    0.022    0.022    0.051    0.051 modelgenerator.py:69(internal_build)
        1    0.000    0.000    0.051    0.051 {built-in method builtins.exec}
    11332    0.016    0.000    0.016    0.000 {method 'addVertices' of 'panda3d.core.GeomPrimitive' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
    22664    0.003    0.000    0.003    0.000 {method 'setData2f' of 'panda3d.core.GeomVertexWriter' objects}
    22664    0.006    0.000    0.006    0.000 {method 'setData3f' of 'panda3d.core.GeomVertexWriter' objects}
    22664    0.004    0.000    0.004    0.000 {method 'setData4f' of 'panda3d.core.GeomVertexWriter' objects}

This is as far as I was able to track it on my own. internal_build is the method that converts my current format into a viewable model by iterating across a list of vertex locations and using setData3f.
…Also, I don’t know if this is relevant, but I’m still on version 1.10.2. I’ve modified my copy to allow GUI elements to accept different button clicks to run different scripts.

rdb · February 7, 2020, 9:14pm

It’s clear that exec is just what’s calling the code you’re profiling in this instance. It has no time under tottime so it’s nothing to worry about.
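The exec row comes from the profiler itself: when you profile a statement given as a string (as `cProfile.run("...")` does), cProfile runs it through exec(), which produces the `<string>:1(<module>)` and `builtins.exec` entries. This sketch reproduces that and shows the exec entry's own time (tottime) is tiny compared with its cumulative time. `internal_build` here is a hypothetical stand-in.

```python
import cProfile

def internal_build():
    """Hypothetical stand-in for the real mesh builder."""
    return sum(i * i for i in range(50_000))

profiler = cProfile.Profile()
# Profiling a *string* makes cProfile run it via exec(), which is where the
# "<string>:1(<module>)" and "builtins.exec" rows in the output come from.
profiler.runctx("internal_build()", globals(), locals())
profiler.create_stats()

# Each stats entry maps (file, line, name) -> (cc, ncalls, tottime, cumtime, callers).
for (filename, lineno, name), (cc, nc, tt, ct, callers) in profiler.stats.items():
    if "exec" in name or name == "internal_build":
        print(f"{name:40s} tottime={tt:.4f} cumtime={ct:.4f}")
```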

It looks like the biggest bottleneck is the general Python code in the internal_build function, rather than the methods that it calls. If you showed me the function source, I might be able to suggest some microoptimizations (and if we can’t squeeze anything more out of it, that might be a moment to consider Cython or a C extension).
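One common Python microoptimization of this kind is hoisting a bound-method lookup out of a hot loop, so the attribute is looked up once instead of on every iteration. `FakeWriter` and its `set_data3f` method are hypothetical stand-ins for a GeomVertexWriter, used only so the sketch runs without Panda3D. The size of the gain varies by Python version (newer interpreters make attribute lookups cheaper), so no timing result is claimed here.

```python
import timeit

class FakeWriter:
    """Hypothetical stand-in for a GeomVertexWriter: just records values."""
    def __init__(self):
        self.rows = []

    def set_data3f(self, x, y, z):
        self.rows.append((x, y, z))

def build_lookup_each_time(writer, points):
    for x, y, z in points:
        writer.set_data3f(x, y, z)  # attribute lookup on every iteration

def build_hoisted(writer, points):
    set_data3f = writer.set_data3f  # look the bound method up once
    for x, y, z in points:
        set_data3f(x, y, z)

points = [(i, i, i) for i in range(20_000)]
w1, w2 = FakeWriter(), FakeWriter()
build_lookup_each_time(w1, points)
build_hoisted(w2, points)

# Compare timings on your own machine; both versions produce identical rows.
for fn in (build_lookup_each_time, build_hoisted):
    t = timeit.timeit(lambda: fn(FakeWriter(), points), number=20)
    print(f"{fn.__name__}: {t:.3f}s")
```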

addVertices also takes a decent chunk of the time. If you were to call primitive.reserveNumVertices(n) ahead of time with however many vertices you plan to add, I think the cost of this call might be reduced.

Atlas · February 8, 2020, 6:20pm

I tried reserveNumVertices but there was no change in speed. Here’s the source from internal_build: https://pastebin.com/T0vW3d6r

Interestingly enough, I had already tried to optimize it as much as possible using advice you gave others on similar topics. The code itself was adapted from a cube-creation example I found.

Unless there’s just something horrible I’ve done to bog everything down (which is entirely likely!), a Cython or C extension is probably the way to go. It’s something I’ve been considering anyway. Running the current code completely stalls the game. It’s not for very long, but it’s definitely long enough to notice the freeze. I would still be interested in optimizations to the current code, though!

Thank you for taking the time to look over this.

rdb · February 10, 2020, 2:44pm

There is room for microoptimization, such as getting rid of all the getattrs in the inner loops, and simplifying the math in the loop. You can also use a memoryview to directly manipulate the index array. Give this a try (untested, there might be mistakes):

    def internal_build(self):
        tss = self.TSS
        # modelInstance and materialIndex are defined elsewhere in the original code.
        faces = modelInstance.faces
        setVertData = self.vertex.setData3f
        setClrData = self.color.setData4f
        setUvData = self.UV.setData2f

        invTss = Vec2(1.0 / tss[0], 1.0 / tss[1])
        topLeft = Vec2(0, invTss[1])
        topRight = Vec2(invTss[0], invTss[1])
        bottomRight = Vec2(invTss[0], 0)
        locScale = invTss * (1.0 / 32.0)

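        # Resize the index array to 6 indices (two triangles) per quad face. The new
        # rows are left uninitialized, but every one is overwritten in the loop below.
        verts = self.triangles.modifyVertices()
        verts.uncleanSetNumRows(len(faces) * 6)
        # Index the view per element; each element holds one vertex index.
        vertView = memoryview(verts)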
        vertexId = 0
        indexId = 0

        for x1, y1, z1, x2, y2, z2, texture, color in faces:
            # It's faster if you store these in the correct order to begin with!
            if x1 != x2:
                setVertData(x1, y1, z1)
                setVertData(x2, y1, z1)
                setVertData(x2, y2, z2)
                setVertData(x1, y2, z2)
            else:
                setVertData(x1, y1, z1)
                setVertData(x2, y2, z1)
                setVertData(x2, y2, z2)
                setVertData(x1, y1, z2)

            # It's probably faster if you store your color as a VBase4 to begin with!
            setClrData(color, color, color, 1.0)
            setClrData(color, color, color, 1.0)
            setClrData(color, color, color, 1.0)
            setClrData(color, color, color, 1.0)

            location = Vec2(*materialIndex[texture])
            location.componentwiseMult(locScale)

            setUvData(location + topLeft)
            setUvData(location)
            setUvData(location + bottomRight)
            setUvData(location + topRight)

            vertView[indexId] = vertexId
            vertView[indexId + 1] = vertexId + 1
            vertView[indexId + 2] = vertexId + 3
            vertView[indexId + 3] = vertexId + 1
            vertView[indexId + 4] = vertexId + 2
            vertView[indexId + 5] = vertexId + 3
            vertexId += 4
            indexId += 6

        self.faceCounter += len(faces)

There may be further microoptimizations to be found, but it’s probably diminishing returns after this point.
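The index pattern from the loop above (corners 0, 1, 3 and 1, 2, 3 of each quad) can be tried out on its own with a standard-library typed array written through a memoryview. The typecode switch reflects a real concern with raw index writes: 16-bit indices cannot address more than 65536 vertices, so for larger meshes the Panda3D primitive's index type would also need to be wide enough (GeomPrimitive has a setIndexType method for this). `quad_indices` is a hypothetical helper, not part of Panda3D.

```python
from array import array

def quad_indices(num_quads):
    """Two triangles per quad (corners 0,1,3 and 1,2,3), as in the loop above."""
    num_vertices = num_quads * 4
    # 'H' is unsigned 16-bit; vertex ids up to 65535 fit. 'I' is 32-bit on
    # mainstream platforms (the array module only guarantees 2 bytes).
    typecode = "H" if num_vertices <= 65536 else "I"
    indices = array(typecode, [0]) * (num_quads * 6)  # preallocate all slots
    pattern = (0, 1, 3, 1, 2, 3)
    with memoryview(indices) as view:
        for quad in range(num_quads):
            base = quad * 4   # first vertex of this quad
            start = quad * 6  # first index slot of this quad
            for offset, corner in enumerate(pattern):
                view[start + offset] = base + corner
    return indices

print(quad_indices(2).tolist())  # [0, 1, 3, 1, 2, 3, 4, 5, 7, 5, 6, 7]
```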

Atlas · February 11, 2020, 3:39pm

That’s super fast!
According to the profiler, it’s almost twice as fast.

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.029    0.029 <string>:1(<module>)
        1    0.018    0.018    0.029    0.029 modelgenerator.py:78(internal_build)
        1    0.000    0.000    0.029    0.029 {built-in method builtins.exec}
        2    0.000    0.000    0.000    0.000 {built-in method builtins.len}
     5666    0.000    0.000    0.000    0.000 {method 'componentwiseMult' of 'panda3d.core.LVecBase2f' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {method 'modifyVertices' of 'panda3d.core.GeomPrimitive' objects}
    22664    0.002    0.000    0.002    0.000 {method 'setData2f' of 'panda3d.core.GeomVertexWriter' objects}
    22664    0.004    0.000    0.004    0.000 {method 'setData3f' of 'panda3d.core.GeomVertexWriter' objects}
    22664    0.004    0.000    0.004    0.000 {method 'setData4f' of 'panda3d.core.GeomVertexWriter' objects}
        1    0.000    0.000    0.000    0.000 {method 'uncleanSetNumRows' of 'panda3d.core.GeomVertexArrayData' objects}

The first model loads instantly. Scenes with 10+ models still lag a bit, so to compensate I put in a momentary delay between loads. Between that and this new optimized function, I’m getting a solid 27 out of 30 fps! Eventually I might write a C extension in case less powerful systems struggle to run my game. But for now, these optimizations are an absolute lifesaver. Thank you!
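A generalization of "a delay between loads" is to build at most one model per frame, so no single frame carries all the work. Below is a standard-library sketch using a generator that yields after each build; the plain `for` loop stands in for the frame loop. In Panda3D this would typically be driven from a task that calls `next()` once per frame and returns `task.cont` until the generator is exhausted. `build_queue` and the model names are hypothetical.

```python
from collections import deque

def build_queue(model_names, build):
    """Build one model per step, yielding so work is spread over frames."""
    pending = deque(model_names)
    while pending:
        build(pending.popleft())
        yield  # hand control back to the frame loop

built = []
job = build_queue(["house", "tree", "rock"], built.append)

# Stand-in for a per-frame callback (e.g. a Panda3D task function).
for frame in range(5):
    if next(job, "done") == "done":
        break
    print(f"frame {frame}: built {built[-1]}")
```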