Hi everyone,
I’d like to know your suggestions on which approach would best fit my performance problem, since I’m a newbie to multi-threading and multi-processing.
My last machine (an Intel i5 with 2 cores and integrated graphics) was taking several seconds to perform a single time step (taskMgr.step()), even with several optimizations. I had already used all my tricks, but performance was still not ideal. So last month, in order to speed things up, I decided to buy a “monster” machine to run my AI simulations. One of the goals is to run the simulation 24/7. To be honest, things got better with the new machine, but not better enough. I suspect the main reason is that my program is using only a single processor/thread instead of the many available cores, because of the GIL among other things.
Basically, my program gets information from cameras, Bullet collisions, etc., to feed a neural network with senses of vision, touch, etc. Each sense instance is represented by a node which has a class to handle that sense’s work. Example:
import time
import multiprocessing
from direct.showbase.ShowBase import ShowBase
from direct.task import Task
from panda3d.core import loadPrcFileData
from panda3d.bullet import BulletWorld

class Eye:
    def __init__(self, args):
        # Configure eye parameters like name, relative position, etc.
        # Create a camera spot in the scene
        pass

    def process(self):
        # Get the camera spot's rendered image, representing the image reaching the retina
        pass

class Skin:
    def __init__(self, args):
        # Configure skin parameters like name, relative position, etc.
        # Create feelers (neurons that detect the depth of a touch using Bullet ray collisions)
        pass

    def process(self):
        # Get which feelers were touched by an object in the scene. A skin has several
        # feelers, so it's necessary to call rayTestAll(origin, tip) for each feeler to
        # detect the touch depth, and this is slow and there's no alternative!
        pass

eye1 = Eye(...)
eye2 = Eye(...)
skin1 = Skin(...)
skin2 = Skin(...)
sense_nodes = [eye1, eye2, skin1, skin2]
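For reference, this is roughly what a single feeler test inside Skin.process() looks like (a simplified sketch; the positions are placeholder values and self.world stands for my reference to the BulletWorld):

```python
from panda3d.core import Point3

# One feeler, from its base to its tip, in world space (placeholder values)
origin = Point3(0, 0, 1)
tip = Point3(0, 0, 0)

# rayTestAll returns every hit along the ray, not just the closest one
result = self.world.rayTestAll(origin, tip)
for hit in result.getHits():
    # getHitFraction() is the 0..1 position of the hit along the ray,
    # so the touch depth is the remaining distance toward the tip
    depth = (1.0 - hit.getHitFraction()) * (tip - origin).length()
```

With dozens of feelers per skin, these calls add up every frame, which is why Skin.process() dominates the step time.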
class Simulation(ShowBase):
    def __init__(self):
        # threading-model must be set before ShowBase creates the window,
        # otherwise it has no effect. I feel no performance difference
        # with this option. Is it enabled by default?
        loadPrcFileData("", "threading-model Cull/Draw")
        ShowBase.__init__(self)
        self.physics_manager = BulletWorld()
        self.start = time.time()
        self.step = 1
        self.taskMgr.add(self.update, "update")

    def update(self, task):
        self.updateCamera()
        self.physics_manager.doPhysics(globalClock.getDt())
        for node in sense_nodes:
            node.process()
        # My attempt using Python multiprocessing. It didn't work. :-(
        #processes = []
        #for node in sense_nodes:
        #    p = multiprocessing.Process(target=node.process)
        #    processes.append(p)
        #    p.start()
        #for process in processes:
        #    process.join()
        self.saveState(self.step)  # Save the object states to disk
        if self.step == 50:
            print(time.time() - self.start)  # Print the time taken to process 50 time steps
        self.step += 1
        return Task.cont
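As far as I can tell, part of the reason the multiprocessing attempt fails is that Process has to pickle its target on platforms that use the spawn start method (e.g. Windows), and a bound method like node.process drags its whole instance along, including unpicklable handles such as cameras or windows. A minimal demonstration with a stand-in object (the Lock stands in for a Panda3D camera handle):

```python
import pickle
import threading

class FakeEye:
    def __init__(self):
        # Stand-in for a Panda3D camera handle: an OS-level
        # resource that cannot be serialized
        self.camera = threading.Lock()

    def process(self):
        return "image"

eye = FakeEye()
try:
    # Pickling the bound method pickles the whole instance too
    pickle.dumps(eye.process)
    print("picklable")
except TypeError:
    print("not picklable")  # → not picklable
```

And even on Linux, where fork avoids the pickling step, each child process gets its own copy of the scene graph, so whatever process() computes never reaches the parent unless it is explicitly sent back through a queue or pipe.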
The new machine’s configuration:
- 2 Xeon E5-2678 v3 chips (2.30 GHz, 12 cores / 24 threads each)
- 1 Radeon RX 570 graphics card with 8 GB
- 32 GB RAM (server memory)
- 1 SSD storage unit
I get 48 logical processors when I run:
import multiprocessing
multiprocessing.cpu_count()
Then the question is: which approach (taskChain, threading2, Python multiprocessing, etc.) should I use to get the best out of the 48 logical processors? Keep in mind that each node’s data doesn’t depend on any other node’s data, i.e. they don’t need to be synchronized with each other, but each node does need to finish its processing before the default taskManager iterates again (taskMgr.step()).
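To give the discussion something concrete, this is roughly how I understand a task-chain version of the sense processing would look (the chain name and thread count are made up, and I’m not sure frameSync is the right option for “finish before the next frame”):

```python
# Create a chain with its own pool of threads; frameSync is supposed to
# hold the frame until all tasks on the chain have finished it
self.taskMgr.setupTaskChain("senses", numThreads=4, frameSync=True)

def make_sense_task(node):
    def sense_task(task):
        node.process()
        return task.cont
    return sense_task

for i, node in enumerate(sense_nodes):
    self.taskMgr.add(make_sense_task(node), "sense-%d" % i, taskChain="senses")
```

As I understand it, this only helps if Panda3D was built with true threading enabled, and the GIL would still serialize any pure-Python work inside process(). Is that right, or is one of the other approaches a better fit?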