y-coordinate performance

Performance decreases significantly when nodes are placed farther from the camera. The following script reproduces it:

from panda3d.core import loadPrcFileData
loadPrcFileData("", "show-frame-rate-meter 1")
loadPrcFileData("", "geom-cache-size 0")
#loadPrcFileData("", "want-pstats 1")
import direct.directbase.DirectStart
from panda3d.core import *
import random as ran

# One reusable CardMaker that generates unit-sized quads.
cardmaker = CardMaker('cardmaker')
cardmaker.setFrame(-.5, .5, -.5, .5)

colHigh = VBase3D(1, .776, 0)  # unused in this snippet
colLow = VBase3D(0, .620, 1)

# Scatter 5000 cards under a single parent node.
top = render.attachNewNode('top')
for j in range(5000):
    c = NodePath(cardmaker.generate())
    c.setColor(*colLow)
    # The y range of 0-10000 is the slow case; 0-100 is fast.
    p1 = ran.uniform(0, 100), ran.uniform(0, 10000), ran.uniform(0, 100)
    c.setPos(*p1)
    c.reparentTo(top)

base.cam.setPos(50, -25, 50)

run()

When I change the y-coordinate range of ‘p1’ from 10000 to 100, performance increases significantly.
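For reference, the fast case is the same script with only the middle component of ‘p1’ changed:

p1 = ran.uniform(0, 100), ran.uniform(0, 100), ran.uniform(0, 100)  # y range cut from 0-10000 to 0-100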

That’s a good one. I don’t know. Maybe something about the way your graphics card resolves sub-pixel aliasing, so that smaller objects are more likely to cause some extra work in the driver? It seems like a bit of a stretch.

But, honestly, some drivers’ performance characteristics have always completely mystified me. Let me know if you can isolate it further.

David

If it’s worth anything, the large range caused a “Thread Block” delay of 50 ms in PStats, and the small range caused a “Thread Block” delay of 17 ms. But it was just a hunch to check that first, and since I found a difference there could be others. If that’s inconsistent with the other observations I reported, then it’s probably spurious and not the problem.

Hmm, I’d be highly suspicious of variance in “Thread Block”. That time category includes the amount of time the Panda process is blocked by the operating system while other processes run (including PStats itself), so it’s subject to quite a bit of distractingly random variation.

If you can run PStats on another machine, that will help reduce interference from PStats. It’s also helpful to reduce the frequency at which PStats updates by changing pstats-max-rate, as described in another thread here.
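For example, something like this at startup (a minimal sketch: the host address is hypothetical, and pstats-max-rate is in reports per second; check the manual for your version’s default):

loadPrcFileData("", "want-pstats 1")
loadPrcFileData("", "pstats-host 192.168.1.5")  # hypothetical: the machine running the PStats server
loadPrcFileData("", "pstats-max-rate 5")        # throttle PStats reports to about 5 per second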

David

I would like to see a text (or Excel) dump of descriptive statistics instead of inspecting them visually: say, the five quartiles (sic) of each category, or some cocktail of times sampled every so often rather than every frame. Perhaps the folks at CMU could point us to a neural-net package for classifying segments that occur in the data. That could relieve some of the tool’s interference with… itself.
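Something like this post-processing sketch is what I mean (hypothetical: it assumes the per-frame times for one category have already been dumped and parsed into a list of milliseconds):

import statistics

def five_number_summary(times_ms):
    # min, Q1, median, Q3, max: the "five quartiles" above
    q1, median, q3 = statistics.quantiles(times_ms, n=4)
    return min(times_ms), q1, median, q3, max(times_ms)

# e.g. draw_ms = [27.1, 26.8, 33.0, ...]  # example values only
# print(five_number_summary(draw_ms))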

For whatever reason, the Wait category didn’t show the same discrepancy. (Concurrent observations included closing Chromium.) This time, Draw stood out as the culprit, so I investigated it.

On 0-100, Draw took less than 33 ms, about 27 by my estimate.
On 0-10000, Draw took 100 ms.

I didn’t record a Piano Roll reading.

OK, that sounds like it’s likely to be a more meaningful measurement, but all it tells you is that something in your driver takes longer to process polygons that are farther away (or possibly smaller). So we’re back to the first guess.

You can use text-stats instead of pstats (and capture stderr to a file via text-stats >t.log 2>&1) to get a text log of all of the PStats data. It will tend to be big.
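In the script you posted, that just means uncommenting the want-pstats line before launching text-stats:

loadPrcFileData("", "want-pstats 1")  # report to whatever stats server is listening (text-stats here)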

David

There’s a really low likelihood that the discrepancy is due to a bug in Panda, and a lower one still that it’s a bug anyone we have is willing or able to fix. Some parts of Panda have advanced as far as our knowledge and concentration allow.

Here’s the address of our chief exec of statistical descriptions for the pstats module. You should put your heads together and get your five quartiles drawn up. Preferably something we don’t need a render farm to tabulate.

Otherwise, there was a rider to the health care bill that mandated “graphics driver documentation exchanges”, so you can go there and get detailed descriptions of your driver with your choice of feature-price combinations. The tome that explains this could be pricey.