Panda3D and head tracking with OpenCV: performance issues

Hi, I’m having some trouble speeding up my code. My head-tracking program using OpenCV works fine, but its performance is lacking. I downloaded the new 1.8.0 alpha version of Panda3D, which has true threading enabled, and moved all my OpenCV code to another thread. The performance gain was minimal.

On my machine it takes around 0.13 s for OpenCV to process a single webcam frame. Although I have a dual-core system and all the webcam processing is done in a separate thread (using a Panda3D thread, not the default Python one), processor usage doesn’t go above 35% (it’s a dual-core hyperthreaded processor, so 50% means both cores are at maximum). I’m getting 10 fps; if I turn off head tracking I get 300 fps.

It seems I’m running into a Global Interpreter Lock (GIL) problem, which means my code isn’t really executing in parallel. I thought that since Panda3D and OpenCV do all the heavy lifting in C++, the GIL would not drastically reduce my performance. Is this the case, or am I doing something wrong? Is there any way to increase performance without porting the head tracking to C++? Am I doing the threading wrong?

from direct.stdpy.threading import Thread
from direct.task import Task
import cv2.cv as cv
from time import sleep

class camThread(Thread):
    def __init__(self):
        Thread.__init__(self)
        # frame_copy and cascade are assigned elsewhere before start()
        self.frame_copy = None
        self.cascade = None
        self.ready = False  # True while pt1/pt2 hold a result not yet consumed
        self.pt1 = self.pt2 = 0

    # my thread main loop
    def run(self):
        while True:
            if not self.ready:
                # returns the position of the detected face, calls OpenCV functions
                x, y, w, h = self.detect_and_draw(self.frame_copy, self.cascade)
                # some basic math to find the center of the face
                # (frame_copy is assumed to be the full-size webcam frame)
                self.pt1 = x + w/2 - self.frame_copy.width/4
                self.pt2 = y + h/2 - self.frame_copy.height/4
                self.ready = True
            else:
                sleep(0)

# my Panda3D task that reads the thread's results
# self.camThread is the thread object that manages the head tracking
def cameraTask(self, task):
    if self.camThread.ready:
        pt1 = self.camThread.pt1
        pt2 = self.camThread.pt2
        self.camThread.ready = False
        # changes the camera position based on the head position
        self.look(pt1, pt2, 2)
    return Task.cont

If I change the sleep argument from 0 to a larger value I get more fps, but the head tracking gets choppier. I know I should be using locks to manage camThread.ready, but in this simple example it shouldn’t really be a problem.
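One way to get rid of both the sleep(0) busy-wait and the unlocked ready flag is to let the worker block on an event until the render task asks for a new result. This is a minimal sketch, not the poster’s code: it uses the standard threading module so it runs on its own (Panda3D’s direct.stdpy.threading is intended as a drop-in replacement for that module, so the same structure should carry over, but that is an assumption). HeadTracker, take_result and fake_detect are hypothetical names, and fake_detect just sleeps in place of detect_and_draw. Note that this only removes the polling overhead; it does nothing about a detector call that holds the GIL.

```python
import threading
import time


class HeadTracker(threading.Thread):
    """Hypothetical worker: runs a slow detector in the background and hands
    results to the render loop. It blocks on an Event instead of spinning on
    sleep(0), and a Lock guards the shared result."""

    def __init__(self, detect):
        threading.Thread.__init__(self)
        self.daemon = True                    # don't keep the app alive on exit
        self.detect = detect                  # callable returning (pt1, pt2)
        self.lock = threading.Lock()          # protects self.result
        self.want_result = threading.Event()  # set when the consumer wants more
        self.want_result.set()                # request a first result right away
        self.result = None                    # latest unconsumed (pt1, pt2)

    def run(self):
        while True:
            self.want_result.wait()           # sleeps without burning CPU
            self.want_result.clear()
            pts = self.detect()               # the slow part (e.g. Haar detection)
            with self.lock:
                self.result = pts

    def take_result(self):
        """Return the newest result, or None if none is ready yet.
        Taking a result asks the worker to compute the next one."""
        with self.lock:
            pts, self.result = self.result, None
        if pts is not None:
            self.want_result.set()
        return pts


def fake_detect():
    """Hypothetical stand-in for detect_and_draw plus the center math."""
    time.sleep(0.05)
    return (10, 20)


if __name__ == "__main__":
    tracker = HeadTracker(fake_detect)
    tracker.start()
    pts = None
    while pts is None:                        # a render task would poll once per frame
        pts = tracker.take_result()
        time.sleep(0.01)
    print("head at %d, %d" % pts)
```

In cameraTask you would call take_result() once per frame and only call self.look() when it returns a tuple; like the original, the worker computes a new position only after the previous one has been consumed.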

Which function in this example calls OpenCV? Is it a C++ function? Whichever function it is, I assume from what you say that at some point your child thread calls a C++ function that takes about 0.13 s to return. Unless that function releases the Python GIL while it runs, the lock stays held for the whole call, preventing any Python code in the main thread from running for the entire 0.13 s.

If it’s not written to be specifically Python-aware, it won’t release the GIL. How is this function exposed to Python? Interrogate can add the appropriate Python calls to do this if the function is tagged with the BLOCKING keyword, but if you’re using some other tool, you’ll have to add them yourself.

David
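The point about the GIL can be illustrated without OpenCV. Python’s ctypes module releases the GIL around calls made through ctypes.CDLL but keeps holding it for calls made through ctypes.PyDLL, so calling the same C function both ways shows the difference. This sketch assumes a POSIX system whose libc exports usleep, which stands in for a slow cv.HaarDetectObjects call; main_thread_ticks is a hypothetical helper, and the example says nothing about how the OpenCV wrapper itself is built.

```python
import ctypes
import ctypes.util
import threading
import time

# Assumption: a POSIX system where libc exports usleep(useconds_t).
LIBC = ctypes.util.find_library("c")


def main_thread_ticks(loader, micros=200000):
    """Call libc's usleep() in a worker thread and count how often the
    main thread manages to run Python code before the call returns."""
    usleep = loader(LIBC).usleep
    usleep.argtypes = [ctypes.c_uint]
    worker = threading.Thread(target=usleep, args=(micros,))
    ticks = 0
    worker.start()
    while worker.is_alive():
        ticks += 1
        time.sleep(0.001)
    worker.join()
    return ticks


if __name__ == "__main__":
    # CDLL releases the GIL around the foreign call; PyDLL keeps holding it.
    print("CDLL  (GIL released): %d ticks" % main_thread_ticks(ctypes.CDLL))
    print("PyDLL (GIL held):     %d ticks" % main_thread_ticks(ctypes.PyDLL))
```

Expect a large count for CDLL (the main thread keeps running during the 0.2 s call) and a count near zero for PyDLL (the main thread is frozen until usleep returns); exact numbers depend on the machine.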

I call OpenCV directly through the Python wrapper it provides.
In my case self.detect_and_draw() calls lots of OpenCV functions, but the really slow one is cv.HaarDetectObjects, which takes the 0.13 s. I assume it calls the corresponding C++ function.

Since that function comes from a wrapper, I assumed it would release the GIL and let Panda3D process more frames, but that doesn’t seem to be happening. I’m unsure whether it’s a limitation of the wrapper or a problem with my code.

I’ll post the detect_and_draw function here, but it’s mostly OpenCV mumbo-jumbo.

# detection parameters (assumed values, as in OpenCV's facedetect.py sample)
min_size = (20, 20)
image_scale = 2
haar_scale = 1.2
min_neighbors = 2
haar_flags = 0

def detect_and_draw(self, img, cascade):
    # allocate temporary images
    gray = cv.CreateImage((img.width, img.height), 8, 1)
    small_img = cv.CreateImage((cv.Round(img.width / image_scale),
                                cv.Round(img.height / image_scale)), 8, 1)

    # convert color input image to grayscale
    cv.CvtColor(img, gray, cv.CV_BGR2GRAY)

    # scale input image for faster processing
    cv.Resize(gray, small_img, cv.CV_INTER_LINEAR)

    cv.EqualizeHist(small_img, small_img)
    x = y = w = h = 0
    if cascade:
        t = cv.GetTickCount()
        # this is the call that takes about 0.13 s
        faces = cv.HaarDetectObjects(small_img, cascade, cv.CreateMemStorage(0),
                                     haar_scale, min_neighbors, haar_flags, min_size)
        t = cv.GetTickCount() - t
        #print "detection time = %gms" % (t/(cv.GetTickFrequency()*1000.))
        if faces:
            for ((x, y, w, h), n) in faces:
                # the input to cv.HaarDetectObjects was resized, so scale the
                # bounding box of each face and convert it to two CvPoints
                pt1 = (int(x * image_scale), int(y * image_scale))
                pt2 = (int((x + w) * image_scale), int((y + h) * image_scale))
                cv.Rectangle(img, pt1, pt2, cv.RGB(255, 0, 0), 3, 8, 0)

    self.img = img
    # x, y, w, h belong to the last detected face, in small_img coordinates
    return x, y, w, h

I’d be a bit surprised if the standard OpenCV Python wrapper released the GIL by default. You might want to ask this question on the OpenCV forums, or ask whoever provides the Python wrappers for OpenCV.

But assuming it doesn’t, you might need to provide your own wrapper function instead, meaning you’ll have to dive at least a little bit into C++ coding.

David

Thanks drwr. I assumed I was doing something wrong, but it seems you are right: the wrapper is not releasing the GIL. I will ask around on the OpenCV forums.

OpenCV requires NumPy to be used with Python. It might be releasing the GIL for some of the work, but it’s still doing some math in Python, which requires holding the GIL.