hey, just in case.
I managed to get all this working properly by ensuring the Bullet processing is allocated to a task_chain whose thread duration is shorter than that of the other chains (i.e. simply put Bullet alone in its own task_chain).
Here are the results, with no visible lag any longer (the Bullet task is #4):
```
## TASK 1 re-exerciced after 0.0794035 seconds
## TASK 3 re-exerciced after 0.020194 seconds
## TASK 4 re-exerciced after 0.0202004 seconds
## TASK 2 re-exerciced after 0.0201275 seconds
## TASK 4 re-exerciced after 0.0191932 seconds
## TASK 3 re-exerciced after 0.0191997 seconds
## TASK 2 re-exerciced after 0.0191966 seconds
## TASK 3 re-exerciced after 0.0226837 seconds
## TASK 4 re-exerciced after 0.0226882 seconds
## TASK 2 re-exerciced after 0.022687 seconds
## TASK 4 re-exerciced after 0.0195121 seconds
## TASK 3 re-exerciced after 0.0195166 seconds
## TASK 2 re-exerciced after 0.0195143 seconds
## TASK 1 re-exerciced after 0.0855763 seconds
## TASK 4 re-exerciced after 0.0227592 seconds
## TASK 3 re-exerciced after 0.0227775 seconds
## TASK 2 re-exerciced after 0.0227659 seconds
## TASK 3 re-exerciced after 0.0202487 seconds
## TASK 4 re-exerciced after 0.0202973 seconds
## TASK 2 re-exerciced after 0.020325 seconds
## TASK 3 re-exerciced after 0.0192568 seconds
## TASK 4 re-exerciced after 0.0192333 seconds
## TASK 2 re-exerciced after 0.0191966 seconds
## TASK 3 re-exerciced after 0.0184327 seconds
## TASK 4 re-exerciced after 0.0184335 seconds
## TASK 2 re-exerciced after 0.0184457 seconds
## TASK 1 re-exerciced after 0.0802387 seconds
```
This gives around 42 fps with the spy printing enabled, and roughly 60 fps without it.
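If it helps anyone reading these timings, here is a small stdlib-only sketch that parses spy-print lines in the format shown above and reports each task's mean re-execution interval and the corresponding rate. The `summarize` function and the regex are my own helpers for illustration, not part of Panda3D or Bullet.

```python
import re
from collections import defaultdict
from statistics import mean

# Matches spy-print lines such as
# "## TASK 4 re-exerciced after 0.0202004 seconds"
LINE_RE = re.compile(r"TASK (\d+) re-exerciced after ([0-9.]+) seconds")


def summarize(log_text):
    """Return {task_id: (mean_interval_s, rate_hz)}, sorted by task id."""
    intervals = defaultdict(list)
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m:  # lines that do not match the format are ignored
            intervals[int(m.group(1))].append(float(m.group(2)))
    return {
        task: (mean(vals), 1.0 / mean(vals))
        for task, vals in sorted(intervals.items())
    }


if __name__ == "__main__":
    sample = """\
## TASK 1 re-exerciced after 0.0794035 seconds
## TASK 3 re-exerciced after 0.020194 seconds
## TASK 4 re-exerciced after 0.0202004 seconds
## TASK 2 re-exerciced after 0.0201275 seconds
## TASK 4 re-exerciced after 0.0191932 seconds
## TASK 1 re-exerciced after 0.0855763 seconds
"""
    for task, (interval, rate) in summarize(sample).items():
        print(f"task {task}: every {interval * 1000:.1f} ms (~{rate:.0f} Hz)")
```

Fed the full log above, it reports roughly 20 ms (about 49 Hz) for tasks 2, 3 and 4, and roughly 82 ms (about 12 Hz) for task 1.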