Running parallel calculations for a simulation-based game

Hi! New user here. I’ve been getting to know Panda3D after starting development of a game idea early this year in Ursina. I’m moving my codebase to be slightly more Panda3D-based to get at the graphics and the NodePath system more easily, and for the richer online documentation and support.

I’m building a game based on a particle simulation, which involves updating the positions of a large number of objects every turn. There’s no player input to worry about for most of these: the player interacts by adding and removing atoms rather than moving atoms directly. My integrator is the fourth-order Runge-Kutta method, so every tick it runs four iterations of the force calculations. Those four iterations have to run sequentially, because each one feeds the weights for the next when estimating the motion over the tick. Within an iteration, though, the per-atom calculations can all run in parallel: no atom needs to know what any other atom will do next, it only needs the field state at that timestep, which is available once the previous iteration (or the previous whole update tick) has completed.
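To make that structure concrete, here’s a minimal sketch of the kind of update I have in mind, written as plain NumPy with a made-up acceleration(positions) function and timestep dt (so not code from my project): the four stages run one after another, but each stage operates on every atom at once.

import numpy as np

def rk4_step(r, v, dt, acceleration):
    # r, v: (N, 3) arrays of positions and velocities for all N atoms
    # acceleration: function mapping (N, 3) positions -> (N, 3) accelerations
    # Stage 1: slopes at the start of the tick
    k1v = acceleration(r)
    k1r = v
    # Stage 2: slopes at the midpoint, using stage-1 estimates
    k2v = acceleration(r + 0.5 * dt * k1r)
    k2r = v + 0.5 * dt * k1v
    # Stage 3: midpoint again, using stage-2 estimates
    k3v = acceleration(r + 0.5 * dt * k2r)
    k3r = v + 0.5 * dt * k2v
    # Stage 4: slopes at the end of the tick
    k4v = acceleration(r + dt * k3r)
    k4r = v + dt * k3v
    # Weighted RK4 combination of the four stages
    v_new = v + dt / 6.0 * (k1v + 2 * k2v + 2 * k3v + k4v)
    r_new = r + dt / 6.0 * (k1r + 2 * k2r + 2 * k3r + k4r)
    return r_new, v_new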

I have to admit, I’m really struggling to figure out how to use tasks for this. It looks like I’ll need the async/await keywords in my update method, but am I forced to go through global variables if I want the tasks to run asynchronously? It feels messy to give each calculator access to the entire dataset of positions/velocities, but I’m stuck for any other way to ‘feed’ the correct variables to the tasks.
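The kind of thing I’m picturing is something like the sketch below: handing each task only what it needs through extraArgs when it’s created, rather than having it reach for globals. The names here (integrate_one and so on) are made up for illustration, not taken from my actual code.

# Sketch: pass each task its sphere index plus the shared arrays via
# extraArgs, instead of reading module-level globals inside the task.
def integrate_one(index, positions, velocities, task):
    # positions/velocities are NumPy arrays shared with the caller,
    # so writes to positions[index] are visible to the main update loop
    ...
    return task.done

for i in range(sphereNum):
    taskMgr.add(integrate_one, "sphere_ode",
                extraArgs=[i, positions, velocities],
                appendTask=True,
                taskChain="physTaskChain")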

This feels tangled up with another question: is there a way to manually continue tasks in between ticks? It would make sense to me if there were a way to tell an entire task chain “queue all your tasks to run, and deposit their return values in these variables”, and then “wait for all tasks to complete before repeating, but now deposit their return values over here”. Every time I try to work out how to do this, I end up with either an absolute mess of duplicate code and extra bookkeeping variables to manage flow, or tangled up in the messy polymorphism of the AsyncTask class again. I tried inheriting from that class as suggested in the API reference, but couldn’t get the interpreter to consider the subclass an instance of AsyncTask, so I scrapped all the AsyncTask-related code.
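For illustration, the control flow I keep trying to write looks roughly like the following; calc_one_sphere is a placeholder for the per-atom calculation, and I’m honestly not sure this is the intended way to use the task API.

# Sketch: an async update task that queues one task per sphere onto a
# threaded task chain for each RK4 stage, and awaits them before moving on.
async def update(self, task):
    for stage in range(4):
        self.rk4step = stage
        stage_tasks = [
            self.taskMgr.add(self.calc_one_sphere, "sphere_ode",
                             extraArgs=[i], appendTask=True,
                             taskChain="physTaskChain")
            for i in range(sphereNum)
        ]
        # wait for every task in this stage before starting the next stage
        for t in stage_tasks:
            await t
    # ...combine the four stages' weights into the final update here...
    return task.cont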

Sorry for the absolute mess of a post, but I wanted to give as much detail as I could on what I’ve tried so far and what’s gone on. I’m mostly testing things out here (please don’t judge my learner-code too badly!): chembattle/test zone/pandaTest.py at main · gossfunkel/chembattle · GitHub

An update!

I tried implementing it with async and member variables in my ShowBase class. The framerate is great! But nothing happens :grinning_face_with_smiling_eyes:

I suspect the threads are fighting for access to the arrays, since they’re NumPy array objects rather than explicit memory locations. This feels messy - and it is, because it hasn’t worked since reorganising for tasks :sweat_smile:
In __init__():

self.spheres = np.zeros(sphereNum)
self.thetas = np.zeros(sphereNum)
self.sphere = self.loader.loadModel("../sphere")
self.sphere.setScale(0.5)
self.sphereNodep = NodePath('spheres') # placeholders can be attached to another node to spawn groups of instances
for i in range(sphereNum):
	placeholder = self.sphereNodep.attachNewNode("Sphere-Placeholder")
	placeholder.setScale(0.5)
	placeholder.setPos(pos[i,0],pos[i,1],pos[i,2])
	self.sphere.instanceTo(placeholder)
	self.thetas[i] = 0.0
placeholder = render.attachNewNode("SphereGroup")
placeholder.setPos(-50,180,5)
self.sphereNodep.instanceTo(placeholder)

# create a task chain capable of multithreading
self.taskMgr.setupTaskChain('physTaskChain', 
							numThreads=8, 
							threadPriority=1)
# initialise variables and weights for rk4 in scope, each as an array of size sphereNum, populated with 3d vectors
self.k = np.array([np.zeros((sphereNum,3)) for _ in range(4)])
self.r = pos
self.v = vel
self.t = 0.0
self.rk4step = 0
self.tasks = []
# generate a task to calculate for each sphere
for i in range(sphereNum):
	self.tasks.append(taskMgr.doMethodLater(0, self.ode, "sphere_ode", extraArgs=[i],
		taskChain="physTaskChain", sort=0, appendTask=True))

and the task methods:

def ode(self, i, task): 
	force = -dLJP(self.r,i) + coul(self.r,i) #+ grav(r,i) 								  #!!! CURRENTLY, ADDING GRAVITY HALVES THE FRAMERATE
	self.k[self.rk4step,i] = np.array([np.transpose(np.transpose(force) / mass[i])]) 	#!!!!!!!!!!!!!!!!! hacky - only works while masses are equal
	self.v[i] = self.v[i] + self.k[self.rk4step,i] * self.t 		# find r'(t) = v(t) from a = r''(t)
	self.r[i] = self.r[i] + self.v[i] * self.t 						# find r(t)
	return task.done	

async def update(self, task):
	global pos, vel

	for i in range(4):
		if (taskMgr.mgr.getActiveTasks().hasTask(self.tasks[sphereNum-1])): # wait for previous round of tasks to finish
			for tsk in taskMgr.mgr.getActiveTasks():
				if (tsk.name == "sphere_ode"):
					await tsk
		self.r = pos 
		self.rk4step = i
		if (i == 0): 
			self.t = 0
			self.v = vel
		elif (i == 1 or i == 2): 
			self.t = setdt/2. 
			self.v = vel + self.k[i-1] * setdt/2
		else: 
			self.t = setdt
			self.v = vel + self.k[3] * setdt

		for tsk in self.taskMgr.getDoLaters(): # run one task per sphere
			if (tsk.name == "sphere_ode"):
				tsk.again
		
	if (taskMgr.getTasksNamed("sphere_ode") != 0): # wait for previous round of tasks to finish
		for tsk in taskMgr.getTasksNamed("sphere_ode"):
			await tsk

	vel = self.v + setdt/6*(self.k[0] + 2*self.k[1] + 2*self.k[2] + self.k[3])
	pos = self.r

	# # update the positions of the spheres:
	for i in range(sphereNum):
		self.sphereNodep.getChild(i).setFluidPos(pos[i,0],pos[i,1],pos[i,2])

	return task.cont

There are a number of different philosophies regarding the proper way to do simultaneous operations of this sort.

Have you considered using parallels? Sequences and Parallels — Panda3D Manual
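Something along these lines, as a rough sketch; the function names are placeholders rather than anything from your project:

from direct.interval.IntervalGlobal import Sequence, Parallel, Func

# Placeholder for the per-sphere, per-stage calculation
def compute_stage(stage, sphere_index):
    pass

def make_physics_sequence(sphere_count):
    # Each RK4 stage becomes a Parallel of per-sphere Func intervals,
    # and the four stages are chained in a Sequence so they run in order.
    stages = [Parallel(*[Func(compute_stage, stage, i)
                         for i in range(sphere_count)])
              for stage in range(4)]
    return Sequence(*stages)

physics = make_physics_sequence(100)
physics.start()  # e.g. kick this off from your update task each tick

One thing to keep in mind: intervals are stepped on the main thread, so this organises the work rather than spreading it across cores.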

Thanks for the reply! Yes, I’m being quite ambitious by jumping into this project; I’m just seeing how far I can get with the knowledge I have and can gain. I was wondering about those. I’ve been reading through the documentation to figure out how best to assign my tasks to a Parallel, and then chain those Parallels in a Sequence that can be called every update, but it’s been hard to see which classes I should instantiate or subclass to use Parallels the way they’re meant to be used. Is there anything more in-depth than that manual page, with a bit more explanation than the API reference or source?

I tend to write procedural/functional code, so corralling classes together would likely be somebody else’s cup of tea I’m afraid. I believe that conventional class-based software architecture often leads to more problems than it solves.

Indeed, ShowBase itself shows this to an extent, as it is effectively a “god class” which allows rapid development in Panda3D.

I’ve read/heard this perspective in a few places! I’d love to know what procedural approaches to this would look like, but am I right in saying that it takes a good foundation in functional programming to dive into things like that?

I’ve seen other devs in a number of different contexts struggle to understand procedural code, as they were trained for years in class-based software architecture. So, I’m not sure what it would take in your specific case. Best of luck though, and feel free to share code here if you would like to do so.

Thank you! Yes, it seems like a very different sort of workflow - though I’d love to hear of any guides you would suggest as someone who prefers this style of coding, particularly for simulation or games programming!

I’m trialling a method with parallels but am still not getting movement on-screen, so I’ll poke about and see if I’ve made a silly mistake somewhere. Open to trying new approaches where they come up! It’s been a while since I studied compsci so I’m approaching it with a very open mind, and have been enjoying restructuring my code to make it more data-oriented and reduce the number of objects flying around (bless numpy).

Latest update: switching the Tasks for Parallels has done the trick! At least, I hope that was it, because I also noticed I had my timestep set too small to produce very visible motion. In any case, I can now loop my physics calculations over 300 times per tick without dropping below 40 fps, but adding a few more spheres brings it crashing down to 10 fps no matter how many loops of the calculation I do. I’d hoped I had set the spheres up properly for hardware instancing, but I’m also updating their positions one by one, so it looks like my deferred problem of figuring out how to pass my whole NumPy array to the vertex shader every tick is now on the table!
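For when I get to that, I think the rough shape is something like the sketch below: set an instance count on the group node and hand the per-sphere positions to the shader as an array input, then offset each instance in the vertex shader. I haven’t tried this against my scene yet, so treat the names (and the assumption that a suitable custom shader is applied to the node) as placeholders rather than working code.

from panda3d.core import PTA_LVecBase3f, LVecBase3f

# Sketch: upload one offset per instance as a shader input array each tick.
# sphereNodep and pos are meant to be the group node and (N, 3) position
# array from my earlier snippets; "offsets" is whatever uniform name the
# custom vertex shader declares.
offsets = PTA_LVecBase3f()
for i in range(sphereNum):
    offsets.pushBack(LVecBase3f(pos[i, 0], pos[i, 1], pos[i, 2]))

self.sphereNodep.setInstanceCount(sphereNum)
self.sphereNodep.setShaderInput("offsets", offsets)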


If I’ve understood your task correctly, @wezy posted his experiments with buffers for implementing particle-motion calculations on the forum.


Hell yeah, thank you!!

Some thoughts/observations:

  • Disable vsync when profiling to avoid weird “stair-steps” in fps
  • Frame time is better for profiling since it is linear; going from 30 fps to 40 fps is a much bigger improvement than going from 300 fps to 310 fps, so a difference of “10 fps” means nothing without context (see the quick numbers after this list)
  • Multi-threading in Python isn’t going to use more CPU cores because of Python’s Global Interpreter Lock (GIL); multiprocessing can work around this, but it has its own issues
  • NumPy will help get a lot more out of a single thread
  • Compute shaders are well suited to this kind of thing, but the learning curve can be steep
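To put rough numbers on the frame-time point above (nothing project-specific, just the arithmetic):

# Frame time in milliseconds for a given frame rate
def frame_ms(fps):
    return 1000.0 / fps

print(frame_ms(30) - frame_ms(40))    # ~8.3 ms gained going from 30 to 40 fps
print(frame_ms(300) - frame_ms(310))  # ~0.1 ms gained going from 300 to 310 fps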