Hey, just in case.
I managed to get all this working properly by ensuring that bullet processing is allocated to a task_chain whose thread duration is shorter than the others (i.e. simply put bullet alone in its own task_chain).
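For anyone wanting to try the same thing, here is a minimal sketch of what "bullet alone in one task_chain" could look like. It is not my exact code: the helper name, the chain name `bullet_chain` and the task name `bullet-update` are made up for illustration, and `numThreads=1` is an assumption about how the chain is configured. It only relies on `setupTaskChain()` and `add(..., taskChain=...)` on the task manager and on `doPhysics()` on the Bullet world.

```python
def put_bullet_in_own_chain(task_mgr, world, get_dt, chain_name="bullet_chain"):
    """Register the physics step as the only task on a dedicated task chain.

    task_mgr   -- Panda3D's task manager (the global `taskMgr`)
    world      -- a BulletWorld (anything with a doPhysics(dt) method)
    get_dt     -- callable returning the elapsed time, e.g. globalClock.getDt
    chain_name -- hypothetical name, pick whatever you like
    """
    # Assumption: one thread for this chain, so the physics step does not
    # wait behind the slower tasks living on other chains.
    task_mgr.setupTaskChain(chain_name, numThreads=1)

    def bullet_update(task):
        # Advance the simulation by the time elapsed since the last frame.
        world.doPhysics(get_dt())
        return task.cont  # ask the task manager to run this task again

    # The physics step is the only task added to this chain.
    task_mgr.add(bullet_update, "bullet-update", taskChain=chain_name)
    return bullet_update
```

In a running ShowBase application this would be called as `put_bullet_in_own_chain(taskMgr, world, globalClock.getDt)`. Whether the chain really runs in its own thread depends on how your Panda3D build handles threading, so treat it as a starting point.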
Here are the results, with no visible lag any longer (the bullet task is #4):
## TASK 1 re-exerciced after 0.0794035 seconds
## TASK 3 re-exerciced after 0.020194 seconds
## TASK 4 re-exerciced after 0.0202004 seconds
## TASK 2 re-exerciced after 0.0201275 seconds
## TASK 4 re-exerciced after 0.0191932 seconds
## TASK 3 re-exerciced after 0.0191997 seconds
## TASK 2 re-exerciced after 0.0191966 seconds
## TASK 3 re-exerciced after 0.0226837 seconds
## TASK 4 re-exerciced after 0.0226882 seconds
## TASK 2 re-exerciced after 0.022687 seconds
## TASK 4 re-exerciced after 0.0195121 seconds
## TASK 3 re-exerciced after 0.0195166 seconds
## TASK 2 re-exerciced after 0.0195143 seconds
## TASK 1 re-exerciced after 0.0855763 seconds
## TASK 4 re-exerciced after 0.0227592 seconds
## TASK 3 re-exerciced after 0.0227775 seconds
## TASK 2 re-exerciced after 0.0227659 seconds
## TASK 3 re-exerciced after 0.0202487 seconds
## TASK 4 re-exerciced after 0.0202973 seconds
## TASK 2 re-exerciced after 0.020325 seconds
## TASK 3 re-exerciced after 0.0192568 seconds
## TASK 4 re-exerciced after 0.0192333 seconds
## TASK 2 re-exerciced after 0.0191966 seconds
## TASK 3 re-exerciced after 0.0184327 seconds
## TASK 4 re-exerciced after 0.0184335 seconds
## TASK 2 re-exerciced after 0.0184457 seconds
## TASK 1 re-exerciced after 0.0802387 seconds
That gives something around 42 fps with spy printing enabled, and roughly 60 fps without it.
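If you want to turn spy print lines like the ones above into a per-task rate, a rough sketch could look like this. The function name and the regular expression are mine, and it assumes the log format is exactly `## TASK <n> re-exerciced after <t> seconds`. Note that it gives how often each task runs, which is not necessarily the same number as the rendered fps.

```python
import re
from collections import defaultdict

# Matches one spy print line, e.g. "## TASK 4 re-exerciced after 0.0202004 seconds"
LINE = re.compile(r"## TASK (\d+) re-exerciced after ([0-9.]+) seconds")

def mean_rate_per_task(log_text):
    """Return {task number: average runs per second} from the spy print log."""
    intervals = defaultdict(list)
    for match in LINE.finditer(log_text):
        task_id = int(match.group(1))
        intervals[task_id].append(float(match.group(2)))
    # Average rate = number of intervals / total time they cover.
    return {task: len(values) / sum(values) for task, values in intervals.items()}
```

Paste the log into a string and call `mean_rate_per_task(log)` to compare the tasks side by side.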

Is there any hidden damping coefficient somewhere? (I’ve checked node.getLinearDamping(), and it returns 0.)