setPos is a bit slow?

I’m trying to move hundreds of objects on the screen at a time, and I’ve noticed that setPos is a bit slow – even for nodes that are hidden.

To move 900 hidden smileys around, PStats tells me that the 900 setPos calls took 10.5ms, and another 2.5ms was spent in 'Bounds.'

App->Showcode->task_testmov 10.5ms
*->Bounds 2.5ms

I tried adding transform-cache 0 to the config file. This brought the execution time of the task that contains the 900 setPos calls down to 5ms, but it still took 3ms to do 'Bounds.'

Is there a way to turn off all of the bounds checking/recalculation associated with setPos? I've tried (for every smiley)

 t.node().setBounds(OmniBoundingVolume())
 t.node().setFinal(True)

But it doesn’t seem to do much. Also, is there a way to make setPos perform faster? I know that 5ms/900 ≈ 5.6 µs per call is not a very long time, but it still seems like a long time to set a few matrix elements.

Thanks,

Zhao

tlist = []

for i in range(0, 900):
    t = loader.loadModel('smiley')
    t.setPos( i*10,0,0)
    t.reparentTo(render)
    #t.node().setBounds(OmniBoundingVolume())
    #t.node().setFinal(True)
    t.hide()
    tlist.append(t)    

def task_testmov(task):
    for i in tlist:
        t = 0.01
        p = i.getPos()
        i.setPos( p[0] + t, p[1], [2]) )
    return task.cont

taskMgr.add(task_testmov, 'testmov')

run()

P.S. The entire task takes ~1ms to perform without the setPos.

I suppose you could disable culling on the parent node of all the smileys, and that might avoid the need to recompute their bounding volumes.

But your render time is going to suffer if you really have 900 independently-moving nodes, quite regardless of the time spent in setPos().

David

Not every object is going to be onscreen at the same time. Most will be hidden. But I would still like to set their position in the scene graph. Otherwise, I would have to come up with some tortuous, error-prone caching mechanism.

How do I completely disable the culling?

You completely disable culling by disabling it on the root node, e.g. render. Your pasted code is incomplete, so be sure you are doing it correctly:

render.node().setBounds(OmniBoundingVolume())
render.node().setFinal(True)

But disabling culling doesn’t necessarily defeat the automatic computation of bounding volumes. All it does is remove the need to use bounding volumes in culling; if any other process (for instance, collisions, or an explicit operation in Python code) requires a bounding volume, then the bounding volumes will still have to be recomputed.

On the other hand, if you stash() your nodes instead of hide()ing them, then they will not contribute to their parent’s bounding volume in the first place, and there will be no need to recompute bounding volumes when you move them around. That might make more sense in your case.

David

David, thanks for the stash suggestion. It completely removed the 3ms bounds check. So it’s down to 5ms now for 900 setPos’s. I still think that’s a really long time, but I’ll try to distribute the calls over several frames.
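Spreading the calls over several frames could be sketched roughly as follows. This is plain Python with no Panda3D dependency; `make_batches` and `StaggeredUpdater` are hypothetical helper names, and `moved.append` stands in for whatever per-node update (for example a setPos call) you would actually run.

```python
def make_batches(items, frames):
    """Split items into at most `frames` contiguous, roughly equal slices."""
    if frames < 1:
        raise ValueError("frames must be at least 1")
    size = max(1, -(-len(items) // frames))  # ceiling division, never 0
    return [items[i:i + size] for i in range(0, len(items), size)]


class StaggeredUpdater:
    """Runs `update` on one batch per call to step(), cycling through batches."""

    def __init__(self, items, frames, update):
        self.batches = make_batches(items, frames)
        self.update = update
        self.index = 0

    def step(self):
        if not self.batches:
            return
        for item in self.batches[self.index]:
            self.update(item)
        self.index = (self.index + 1) % len(self.batches)


# Demo: 900 items spread over 3 frames, 300 per frame.
moved = []
updater = StaggeredUpdater(list(range(900)), 3, moved.append)
updater.step()
first_frame_count = len(moved)
updater.step()
updater.step()
print(first_frame_count, len(moved))
```

In a task you would call updater.step() once per frame and return task.cont. Each node is then updated only every third frame, so any per-update displacement has to be scaled up accordingly.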

The line:

i.setPos( p[0] + t, p[1], [2]) ) 

seems to have an extra closing parenthesis. Did you mean to write:

i.setPos( p[0] + t, p[1], [2])

or did you mean to write:

i.setPos( (p[0] + t, p[1], [2]) ) 

? I ask because the first form will be slightly faster than the second: the first form calls a C++ method directly, while the second form requires the construction of a temporary LVecBase3f from the temporary 3-tuple that you are creating on the fly.

Edit: you could try setFluidPos(), which might be a little bit less expensive because it omits one of the steps that setPos() must do.

Also note that setPos() and setFluidPos() both have to obtain the original transform and modify it to change its pos, in case you have a rotation and/or scale that you don’t want to change. Assuming you don’t have anything other than a pos, it might be slightly faster to replace the entire transform. Unfortunately, there are no Python-exposed methods that make this possible in a single call, and the per-call overhead of Python itself will probably dwarf any optimization benefits that you’d gain if you made the multiple calls necessary to achieve it. But you could try setPosHprScale(x, y, z, 0, 0, 0, 1, 1, 1) as the closest available possibility.

David

Thanks, David, for your tips. It’s somewhat manageable now. If it becomes more of a problem down the line, I’ll try to do some digging in the C++ portion of the code.

I’m curious if any of my tips in the above post were specifically useful, and if so, which ones helped. It’s a useful point of information for me to know for future optimizations.

David

My conclusion from the testing is that setMat and setFluidPos are about equally fast for setting the pos, but I will most likely use setMat since I can set the hpr with it as well. In practice, for most things, I think you will want to change the hpr too if/when the pos changes.

If you are going to optimize this part of Panda, I suggest that you optimize it for setMat. One scenario that is going to become more and more common is to run the world/physics simulator in a second process. Such a simulator will have its own internal representation and will most likely communicate with the renderer using a long list of Mat4s instead of separate Point3Pos + Vbase4Hprs.

My numbers for the tests were as follows:

[code]
import random

tlist = []

# 30 x 30 grid = 900 stashed smileys
for i in range(0, 30):
    for j in range(0, 30):
        t = loader.loadModel('smiley')
        t.setPos(i * 10, j * 10, 0)
        t.reparentTo(render)
        t.stash()
        #t.hide()
        tlist.append(t)

tMat4 = Mat4(1, 0, 0, 1,
             0, 1, 0, 1,
             0, 0, 1, 1,
             0, 0, 0, 0)

p = (1, 2, 3)

def testmov(task):
    for i in tlist:
        t = random.random() * 0.01
        # Uncomment exactly one of these calls per test run.
        #i.setFluidPos(p[0] + t, p[1], p[2])
        #i.setPos(p[0] + t, p[1], p[2])
        i.setMat(tMat4)
    return task.cont
[/code]

setPos + hide + transform-cache normal
testmov: 9.5ms
bounds: 3ms

setPos + stashed + transform-cache normal
testmov: 10-12ms
bounds: 0

setPos + hide + transform-cache 0
testmov: 6ms
bounds: 3ms

setPos + stashed + transform-cache 0
testmov: 6ms
bounds: ~0

setFluidPos + stashed + transform-cache 0
testmov: 3.8ms

setMat4 + stashed + transform-cache 0
testmov: 4ms

setPosScale + stashed + transform-cache 0
testmov: 7ms
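
To put the stashed, transform-cache 0 numbers on a per-node basis, the arithmetic can be sketched like this. It only divides the reported task times by the 900 nodes in the test; the dictionary labels are shorthand, and the results still include the Python loop and random.random() overhead, not just the transform call.

```python
NODE_COUNT = 900  # 30 x 30 grid in the test above

# Reported testmov times in milliseconds (stashed, transform-cache 0).
task_time_ms = {
    "setPos": 6.0,
    "setFluidPos": 3.8,
    "setMat4": 4.0,
    "setPosScale": 7.0,
}

# Convert each task time to microseconds per node.
per_call_us = {
    label: ms * 1000.0 / NODE_COUNT for label, ms in task_time_ms.items()
}

for label, us in per_call_us.items():
    print("%-12s %.1f us per node" % (label, us))
```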