Currently, the code is written to assume it only has to process the default set of texture coordinates for each vertex. This means that as it walks through the set of vertices, creating cards for each one, it reads (a) the vertex position, (b) the vertex color, and (c) the texture coordinate, and applies all of those to the four corners of the quad that it generates. If TexGenAttrib.MPointSprite is applied, then instead of (c) copying the existing texture coordinate, it does (c2): it generates a new set of texture coordinates spanning (0,0)-(1,1) for the quad.
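To make that concrete, here is a minimal sketch of the current single-stage logic. The types and the function name are illustrative, not Panda's actual internals; corner position offsets are omitted for brevity:

```cpp
#include <vector>

struct Vec2 { float u, v; };
struct Vec3 { float x, y, z; };
struct Vec4 { float r, g, b, a; };
struct Vertex { Vec3 pos; Vec4 color; Vec2 texcoord; };

// For each point vertex, emit four corner vertices forming a quad.
// With point sprites enabled, the stored texcoord is ignored and a
// fresh (0,0)-(1,1) set is generated instead.
std::vector<Vertex> make_cards(const std::vector<Vertex> &points,
                               bool point_sprite) {
  static const Vec2 generated[4] = {{0,0},{1,0},{1,1},{0,1}};
  std::vector<Vertex> out;
  for (const Vertex &v : points) {
    for (int i = 0; i < 4; ++i) {
      Vertex c;
      c.pos = v.pos;      // (a) vertex position (corner offset omitted)
      c.color = v.color;  // (b) vertex color
      // (c) copy the existing texcoord, or (c2) generate a new one.
      c.texcoord = point_sprite ? generated[i] : v.texcoord;
      out.push_back(c);
    }
  }
  return out;
}
```

The key point is that the choice between (c) and (c2) is a single hardcoded branch, made once, for the one and only set of texture coordinates.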
In order to handle multiple texture stages and multiple sets of texture coordinates, this algorithm has to become a lot more sophisticated. Before it begins processing, it has to examine the set of texture stages and figure out which ones keep their texture coordinates and which ones generate new ones. Then it has to store this information in a data structure: a list of texture coordinates that have to be copied, and another list of texture coordinates that have to be generated. For each vertex, it then has to walk through these lists and copy or generate the appropriate texture coordinates.
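A sketch of that preprocessing step might look like the following. Again, the stage and mode names here are hypothetical stand-ins, not Panda's real TexGenAttrib machinery:

```cpp
#include <vector>

enum class TexGenMode { None, PointSprite };
struct Stage { int texcoord_index; TexGenMode gen; };

struct TexcoordPlan {
  std::vector<int> copy_stages;      // coords copied from the vertex
  std::vector<int> generate_stages;  // coords given fresh (0,0)-(1,1)
};

// Examine all texture stages once, up front, and sort them into the
// two lists that the per-vertex loop will walk.
TexcoordPlan classify(const std::vector<Stage> &stages) {
  TexcoordPlan plan;
  for (const Stage &s : stages) {
    if (s.gen == TexGenMode::PointSprite) {
      plan.generate_stages.push_back(s.texcoord_index);
    } else {
      plan.copy_stages.push_back(s.texcoord_index);
    }
  }
  return plan;
}

// The quad generator would then do something like this per vertex:
//   for (int idx : plan.copy_stages)     copy coord idx to each corner;
//   for (int idx : plan.generate_stages) write (0,0)-(1,1) per corner;
```

The classification itself is cheap because it runs once per Geom; it is the per-vertex list walk below it that carries the cost.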
So this means the logic is more complicated; instead of simply doing either (c) or (c2) for each vertex, it has to walk through a list and process each item. Even in the ordinary case, where the list only has one item in it, the overhead of walking through a list is more than that of hardcoding (c) or (c2).
Normally this additional overhead wouldn’t be that big a deal, but this is very low-level code that has to run many thousands of times a frame, in order to process the thousands of vertices that you might have; and so even a very tiny difference can add up to a noticeable drop in frame rate.
There are many cases in which a software package such as Panda must choose to do the less correct behavior in order to improve performance for 99% of the use cases. Collisions are a classic example of these kinds of compromises. It happens a lot in rendering, too, and this would not be the first case in Panda where this kind of compromise is made.
Still, it’s possible to fix it without imposing additional overhead for all cases, but it means replicating the code at the outer level to handle each case separately. You make a good argument for doing this, and I’ll put it on my list of things to do.
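The shape of that fix might look something like this sketch: the outer loop is replicated so the two common single-stage cases stay hardcoded, and only genuinely multi-stage setups pay for the list walk. The function and its simplified signature are mine, for illustration only:

```cpp
#include <vector>

struct Vec2 { float u, v; };

// Expand one texcoord per point vertex into four per quad corner,
// choosing among three specialized loops based on the stage counts.
std::vector<Vec2> expand_texcoords(const std::vector<Vec2> &src,
                                   int copy_stages, int gen_stages) {
  static const Vec2 corners[4] = {{0,0},{1,0},{1,1},{0,1}};
  std::vector<Vec2> out;
  if (copy_stages == 1 && gen_stages == 0) {
    // Fast path, hardcoded (c): copy the one texcoord, no list walk.
    for (const Vec2 &t : src)
      for (int i = 0; i < 4; ++i) out.push_back(t);
  } else if (copy_stages == 0 && gen_stages == 1) {
    // Fast path, hardcoded (c2): generate (0,0)-(1,1), no list walk.
    for (std::size_t v = 0; v < src.size(); ++v)
      for (int i = 0; i < 4; ++i) out.push_back(corners[i]);
  } else {
    // General path: walk the copy and generate lists for every
    // corner of every vertex.
    for (const Vec2 &t : src) {
      for (int i = 0; i < 4; ++i) {
        for (int s = 0; s < copy_stages; ++s) out.push_back(t);
        for (int s = 0; s < gen_stages; ++s) out.push_back(corners[i]);
      }
    }
  }
  return out;
}
```

The cost of this approach is exactly the one mentioned above: the per-corner logic is duplicated once per specialized case, so every future fix to it has to be applied in several places.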
Of course, if someone else wanted to volunteer to do the needed work and submit patches, it would happen sooner.