Understanding (and reducing) latency

ckemere · September 14, 2021, 6:59pm

Hi,

I’ve created some code that renders a procedurally generated model of a simple maze. The y-position is controlled by a wheel connected to a rotary encoder which communicates to the program using a ZMQ socket connection. I have defined a Task which receives messages and updates the position variable. Everything is working approximately fine, and the reported frame rate is usually my monitor refresh rate (60 Hz).

Yesterday, I instrumented it with a photosensor, and discovered that the scene updates are occurring with a lag of between 3 and 4 frames. The 1 frame variability makes sense to me (the update will occur sometime between frames), but I don’t understand the 3-frame baseline. Is there some sort of double buffering that I can turn off?

More detail on my Task - I poll on the socket with a 0.01 ms timeout for the first message and a 0 ms timeout for subsequent ones (I expect ~8-9 messages per frame), and only finally update the position when no more messages are available.
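The polling pattern described above (a short wait for the first message, then non-blocking reads until nothing is left, keeping only the newest value) can be sketched as follows. This is a minimal illustration rather than the code from the repository: a standard-library `queue.Queue` stands in for the ZMQ socket, and `drain_latest` is a hypothetical helper name.

```python
import queue


def drain_latest(q, first_timeout_s=0.00001):
    """Return the newest pending message, or None if nothing arrived.

    q is a stand-in for the socket (an assumption for illustration).
    first_timeout_s is the 0.01 ms wait for the first message, in seconds.
    """
    try:
        # Wait briefly for the first message of this frame.
        latest = q.get(timeout=first_timeout_s)
    except queue.Empty:
        return None
    while True:
        try:
            # Zero-timeout reads for the remaining messages.
            latest = q.get_nowait()
        except queue.Empty:
            # Nothing left: only now hand back the final position.
            return latest
```

Only the last value is applied per frame, so the cost of the loop grows with the number of queued messages but the scene is updated once.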

One more detail - I should mention that this is running on a Raspberry Pi 4B. Panda reports that I can reach stable frame rates up to 75 Hz, but I really should also try this on my desktop.

(The code is here - github.com/kemerelab/PyRenderMaze. It’s hard to make a minimal implementation b/c the instrumentation requires some hardware.)

Thanks!
Caleb

Thaumaturge · September 14, 2021, 7:34pm

Could it be that the processing of your messages is taking long enough that around three frames of activity are required in order to clear the message-queue?

As a test, what happens if you take one message per frame, dump all others, and then update the scene? Do you still encounter that three-frame latency?

ckemere · September 14, 2021, 8:57pm

Dropping the data rate by a factor of 4 did not reduce latency. It actually increased it by ~1/3rd of a frame, but that’s expected because my detection approach here is to cycle through the maze and detect an intensity change at a particular location. (That gives me hardware timestamps from my encoder and also hardware timestamps associated with the intensity change.)

Thaumaturge · September 14, 2021, 8:58pm

Ah, fair enough!

In that case, I’m afraid that I don’t know networking all that well, so for now I’ll bow out and leave this for others!

ckemere · September 15, 2021, 3:19pm

Interestingly, when I forced my refresh rate to 75 Hz, I found that the latency decreased in absolute terms (in milliseconds), but remained constant at between 3 and 4 frames.
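To make the frames-versus-milliseconds point concrete, here is a small conversion sketch. The 3 to 4 frame figures and the two refresh rates come from the posts above; `frames_to_ms` is a hypothetical helper, and the printed values are plain arithmetic, not new measurements.

```python
def frames_to_ms(frames, refresh_hz):
    """Convert a latency expressed in frames to milliseconds."""
    return frames * 1000.0 / refresh_hz


for hz in (60, 75):
    low = frames_to_ms(3, hz)
    high = frames_to_ms(4, hz)
    print(f"{hz} Hz: {low:.1f} to {high:.1f} ms")
# 60 Hz: 50.0 to 66.7 ms
# 75 Hz: 40.0 to 53.3 ms
```

A latency that scales with the frame period like this is consistent with frames being queued somewhere in the pipeline, rather than a fixed processing delay.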

I also tested using my laptop rather than the Pi, and found roughly the same latency. So the question is: is this a Panda3D effect, or is it caused by my graphics card/OpenGL pipeline?

ckemere · September 20, 2021, 7:36pm

I just verified that I get the same latency when using Ubuntu and a bare X11 installation (no window manager). So I surmise that Panda3D is triple buffering somehow. Is there a way to avoid this? (Is there filtering or something like that which can be turned off?)

Thanks!

serega-kkz · September 20, 2021, 7:45pm

Maybe it’s worth checking the driver settings; buffering is usually configured there.

rdb · September 20, 2021, 7:57pm

Ways to reduce latency in Config.prc: (try them individually or together)

auto-flip 1
sync-video 0
back-buffers 0

The default is to optimize for performance, not latency.

Also, verify the sort value on your task, relative to the sort value of Panda’s graphics loop task.

ckemere · September 20, 2021, 8:25pm

Thanks for this!!!

Two quick questions: to run with back-buffers 0, do I need to do anything special? When I enable that, I find that I just get a blank screen.

Second, is there a way to see the sort value of the graphics loop? I assume to minimize pipeline latency, I’d want my data/model update to happen before the graphics loop execution. So that would mean a very high sort value?

Thanks!

Epihaius · September 20, 2021, 8:42pm

As far as I know, the sort value of the graphics loop is 50.
To be sure, you can call print(base.task_mgr), which lists all currently running tasks with their sort values (the graphics loop is called igLoop).

No, if you want your own task to run before the graphics loop, its sort value needs to be lower, e.g. 49.
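A minimal sketch of registering a task so that it runs ahead of the graphics loop. It assumes the igLoop sort value of 50 mentioned above (worth confirming with print(base.task_mgr)); `add_before_render` and the task name are hypothetical, and the task manager is passed in so the helper does not depend on a running ShowBase.

```python
# Sort value reported in this thread for igLoop; treat it as an assumption
# and confirm it with print(base.task_mgr).
IG_LOOP_SORT = 50


def add_before_render(task_mgr, func, name="update-position",
                      sort=IG_LOOP_SORT - 1):
    """Register func with a sort value lower than the graphics loop.

    Tasks with lower sort values run earlier within a frame, so the
    position update is applied before the frame is rendered.
    """
    if sort >= IG_LOOP_SORT:
        raise ValueError("sort must be below the graphics loop's sort value")
    return task_mgr.add(func, name, sort=sort)
```

Usage would look like add_before_render(base.task_mgr, self.read_messages), where read_messages is whatever task function drains the socket.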

ckemere · September 20, 2021, 9:28pm

Thanks for that! I will double check, but it sounds like my current sort value (=1) should be putting me in the right order already.

Regarding the Config.prc parameters, I have a confession - my application is inheriting from ShowBase rather than implementing the various pieces. Could this be causing my extra latency?

Thanks!

rdb · October 26, 2021, 9:55am

I don’t think inheriting from ShowBase would cause such a problem. However, make sure these settings are applied before the ShowBase constructor is invoked; otherwise they may not take effect.

It is odd that back-buffers 0 is giving you trouble. I’m not seeing issues when using the pandagles2 renderer with this setting on my laptop. It may be a driver-specific issue.
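For anyone setting these from Python rather than from a Config.prc file, the ordering could look like the sketch below. The three settings are the ones listed earlier in the thread; `LOW_LATENCY_PRC` and `main` are hypothetical names, and whether back-buffers 0 works appears to depend on the driver.

```python
# Settings suggested earlier in this thread. back-buffers 0 produced a blank
# screen for the original poster, so it may need to be removed on some drivers.
LOW_LATENCY_PRC = "\n".join([
    "auto-flip 1",
    "sync-video 0",
    "back-buffers 0",
])


def main():
    # Imported here so the settings above can be inspected without Panda3D.
    from panda3d.core import loadPrcFileData
    from direct.showbase.ShowBase import ShowBase

    # Apply the settings BEFORE constructing ShowBase, since the window is
    # opened by the constructor.
    loadPrcFileData("", LOW_LATENCY_PRC)

    base = ShowBase()
    base.run()
```

Call main() from the application's entry point. A class that inherits from ShowBase works the same way, as long as loadPrcFileData runs before the class is instantiated.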