ffmpeg/Threading issues in 1.8.0?

Hi,
Just to mention that I’m encountering freezes and crashes more and more frequently; they come sporadically from ffmpeg video and audio processing.

One crash example coming from FfmpegVideoCursor.cxx:

#ifdef HAVE_SWSCALE
struct SwsContext *convert_ctx = sws_getContext(_size_x, _size_y,
_video_ctx->pix_fmt, _size_x, _size_y,
PIX_FMT_BGR24, SWS_FAST_BILINEAR, NULL, NULL, NULL);
nassertv(convert_ctx != NULL);
sws_scale(convert_ctx, _frame->data, _frame->linesize,
0, _size_y, _frame_out->data, _frame_out->linesize); // ######## here !!!
sws_freeContext(convert_ctx);

Call stack at crash time:

>	libpanda.dll!FfmpegVideoCursor::export_frame(unsigned char * data=0x2fcad218, bool bgra=false, int bufx=512)  Line 249 + 0x24 bytes	C++
 	libpanda.dll!FfmpegVideoCursor::fetch_into_texture(double time=13.078053811331131, Texture * t=0x04ac8340, int page=0)  Line 482	C++
 	libpanda.dll!MovieTexture::cull_callback(CullTraverser * __formal=, CullTraverser * __formal=)  Line 389 + 0x19 bytes	C++
 	libpanda.dll!TextureAttrib::cull_callback(CullTraverser * trav=0x3891f3e0, const CullTraverserData & data={...})  Line 459 + 0x12 bytes	C++
 	libpanda.dll!RenderState::cull_callback(CullTraverser * trav=0x3891f3e0, const CullTraverserData & data={...})  Line 238 + 0x13 bytes	C++
 	libpanda.dll!GeomNode::add_for_draw(CullTraverser * trav=, CullTraverserData & data=)  Line 555 + 0x33 bytes	C++
 	libpanda.dll!CullTraverser::traverse_below(CullTraverserData & data={...})  Line 260	C++
 	libpanda.dll!CullTraverser::traverse(CullTraverserData & data={...})  Line 231	C++
 	libpanda.dll!CullTraverser::traverse_below(CullTraverserData & data={...})  Line 290	C++
 	libpanda.dll!CullTraverser::traverse(CullTraverserData & data={...})  Line 231	C++
 	libpanda.dll!CullTraverser::traverse(const NodePath & root={...})  Line 170	C++
 	libpanda.dll!GraphicsEngine::do_cull(CullHandler * cull_handler=0x00000000, SceneSetup * scene_setup=0x04acabb8, GraphicsStateGuardian * gsg=0x03693b50, Thread * current_thread=0x036920b8)  Line 1136	C++
 	libpanda.dll!GraphicsEngine::cull_to_bins(GraphicsOutput * win=, DisplayRegion * dr=, Thread * current_thread=)  Line 1443 + 0x15 bytes	C++
 	libpanda.dll!GraphicsOutput::get_active_display_region(int n=25821025)  Line 880 + 0x19 bytes	C++
 	libpanda.dll!BoundingSphere::BoundingSphere(const BoundingSphere & __that={...})  + 0x33 bytes	C++
 	libpanda.dll!GraphicsOutput::RenderTexture::~RenderTexture()  + 0x43 bytes	C++
 	libp3framework.dll!PandaFramework::task_igloop(GenericAsyncTask * task=0xfcb3eb1e, void * data=0x00000000)  Line 1581	C++
 	libpanda.dll!AsyncTask::unlock_and_do_task()  Line 457	C++
 	libp3framework.dll!PandaFramework::do_frame(Thread * current_thread=0x036920b8)  Line 845	C++

My config:

-win7
-build 1.8.0 from cvs
-ffmpeg libraries as per David's latest 3rd-party 32-bit libs (this is the only difference I can see from what I've used in the past months)
-*** use of threading through 3 task_chains

Could this be related to threading issues?

What are your other task chains doing? Anything related to texturing?

I haven’t been experiencing any crashes, so we have to isolate what’s different between you and me. I’m not running your application, so that’s one possibility. Maybe it’s the media: is there something unique about your movie files? That’s another possibility.

How often does this crash happen? Is it always in the same place?

Are you using the graphics-threading-model approach to multithreaded rendering? It doesn’t look like it from your call stack.

David

Yes they do; for instance, a couple of camera views are used as inputs to a sort of ‘mixer shader’ that renders into a texture.

nothing special, regular avi files
and for audio: mp3 or wav
Don’t know if relevant but for one avi, I noticed recently the following warning:
:movies:ffmpeg(warning): Invalid and inefficient vfw-avi packed B frames detected

Sporadically (i.e. sometimes it only shows up a few minutes after the app is launched, or whenever the Panda camera position changes).
** In fact, what happens more often is some form of ‘freeze’, giving the classic Windows ‘the app is not responding’ message.

You mean app/cull/draw threading? No, I’ve tried it and it crashes straight away (I think this may be due to re-using shader-generated textures in multipass).

I don’t understand this–how can you be doing any rendering processing in a task chain? All rendering has to be done by the single igloop task.

David

Hi, sorry for the confused explanation. What I meant is that some rendering parameters (mostly shader inputs) are being changed dynamically within task_chains. But this should not have an impact on ffmpeg…

To be more precise, here is the (simplified) overall backbone structure of the app

    Sketch of the c++ app:
    ----------------------
    load geometries and actors
    load avi videos as textures ie TexturePool::load_texture("movie.avi")    
    load sound files
    initiate network exchanges
    define cameras, hidden buffers, specific shaders
    
    then dispatch tasks:
    
    // task chain 2
	> collision traverser	
	> audio_update_task -> mostly AudioManager->update();
	
	// task chain 3
	> UDP (in & out) exchanges
	> actors animation through control joints
	> change camera lens set-up and some shaders control parameters
	
	// task chain 4
	> bullet physics
	
	// task chain 1 (master?)
	> dynamically set some shader input parameters   
    
    then run mainloop:
    ------------------
    while ( (framework.do_frame(current_thread)) && (!exit_application) ) {
		CIntervalManager::get_global_ptr()->step();   // Step the interval manager

		#ifdef EXTRA
		Berkelium::update();     // pass some inputs to berkelium
		if (nvidia_special_on) { // grab texture context for possible 
			GLTextureContext *GLtexContext = DCAST(GLTextureContext,WBuffertexture->prepare_now(1,gsg->get_prepared_objects(),gsg));
			GLuint tex_GLindex = GLtexContext->_index;
		}
		#endif
	}
    

Hmm, if you set “support-threads 0” in your Config.prc file, it will force all the task chains to run on the main thread. If you do this, does it also make the crashes go away?

David

Ok, I’m now running with:
load_prc_file_data("", "support-threads 0");

Of course, as an immediate consequence the fps is divided by roughly 6! :cry:

Anyway, for the time being I’m not experiencing any crash or freeze, but be aware that in the previous case these crashes were sporadic (i.e. they appeared from time to time…)

I’ll keep testing the app in no-thread mode and see what happens.

One more interesting thing:

I’ve also tried a different approach, i.e.:
load_prc_file_data("", "support-threads 1");
load_prc_file_data("", "threading-model /Draw");

and only one task_chain (ie no task_chain threading)

The outcome is:

no more crash at launch with “threading-model /Draw”, it seems that having only one task_chain helped on this
barely any performance improvement from previous post
ie fps now divided by 4.5

As of now, the ffmpeg crash or freeze is not manifesting, but I’m still exercising the app…

Status: in case it helps…

After rebuilding Panda 1.8.0 following the FMOD change David reported yesterday evening in CVS (as per the “setOneShot and RTMCopyRam?” thread):

config:

(build Windows7 --no-python --optimize 4)

  load_prc_file_data("", "lock-to-one-cpu 0");
  load_prc_file_data("", "support-threads 1");
  load_prc_file_data("", "sync_video 1");

+ 4 task_chains

PLUS: :smiley:

No longer getting
:movies:ffmpeg(warning): Invalid and inefficient vfw-avi packed B frames detected

No longer getting the weird sound acceleration, i.e. the strange audio phenomenon that was happening whenever a sound->play() was issued in a task_chain

average FPS = 59.8 fps !!

bullet ok

MINUS: :cry:

Still sporadic app freezes, i.e. "application not responding".

That’s it for the time being:
Will try to make a Panda build with --override "DEBUG_THREADS = 1"

Hi, as a follow up, I’ve isolated a call stack when the sporadic bug occurs. Here it is:

In CopyOnWritePointer.cxx:

CPT(CopyOnWriteObject) CopyOnWritePointer::
get_read_pointer() const {
  if (_object == (CopyOnWriteObject *)NULL) {
    return NULL;
  }

  Thread *current_thread = Thread::get_current_thread();

  MutexHolder holder(_object->_lock_mutex); // ***** crashes here ***
  while (_object->_lock_status == CopyOnWriteObject::LS_locked_write) {
    if (_object->_locking_thread == current_thread) {
      return _object;

Call stack at crash time:

	libpanda.dll!CopyOnWritePointer::get_read_pointer()  Line 39 + 0x12 bytes	C++
 	libpanda.dll!CopyOnWritePointerTo<AnimPreloadTable>::get_read_pointer()  Line 285 + 0x10 bytes	C++
 	libpanda.dll!GeomVertexDataPipelineReader::make_array_readers()  Line 2164 + 0x1f bytes	C++
 	libpanda.dll!Geom::draw(GraphicsStateGuardianBase * gsg=0x01af3b50, const GeomMunger * munger=0x045d81a8, const GeomVertexData * vertex_data=0x2a2d6f2c, bool force=false, Thread * current_thread=0x01af20b8)  Line 1219	C++
 	libpanda.dll!CullBinStateSorted::draw(bool force=false, Thread * current_thread=0x01af20b8)  Line 90 + 0x2c bytes	C++
 	libpanda.dll!CullResult::draw(Thread * current_thread=0x01af20b8)  Line 286 + 0x29 bytes	C++
 	libpanda.dll!GraphicsEngine::do_draw(CullResult * cull_result=0x28493110, SceneSetup * scene_setup=0x2ff5ff88, GraphicsOutput * win=0x04587a38, DisplayRegion * dr=0x04588a38, Thread * current_thread=0x01af20b8)  Line 1903	C++
 	libpanda.dll!GraphicsEngine::draw_bins(GraphicsOutput * win=, DisplayRegion * dr=, Thread * current_thread=)  Line 1543	C++
 	libpanda.dll!GraphicsEngine::draw_bins(const ov_set<PointerTo<GraphicsOutput>,IndirectLess<GraphicsOutput> > & wlist={...}, Thread * current_thread=0x01af20b8)  Line 1494 + 0xf bytes	C++
 	libpanda.dll!GraphicsEngine::WindowRenderer::do_frame(GraphicsEngine * engine=0x01af2ab0, Thread * current_thread=0x00000000)  Line 2476	C++
 	libpanda.dll!GraphicsEngine::render_frame()  Line 744	C++
 	libp3framework.dll!PandaFramework::task_igloop(GenericAsyncTask * task=0x01b08d3c, void * data=0x00ad9ba0)  Line 1581	C++
 	libpanda.dll!GenericAsyncTask::do_task()  Line 76 + 0x10 bytes	C++
 	libpanda.dll!AsyncTask::unlock_and_do_task()  Line 457	C++
 	libpanda.dll!AsyncTaskChain::service_one_task(AsyncTaskChain::AsyncTaskChainThread * thread=)  Line 770 + 0xd bytes	C++
 	libpanda.dll!AsyncTaskChain::do_poll()  Line 1306	C++

Hmm, that’s trouble. That call stack seems to imply that one of the Geoms in the scene graph has gone bad; specifically, it references an invalid GeomVertexData. No idea how that could have happened, but it certainly isn’t directly related to the ffmpeg library, which has nothing to do with GeomVertexData.

Of course, it could be indirectly related to ffmpeg, if that module has overrun memory and corrupted the heap, but if so we’re in a very bad situation indeed.

Does this crash only happen if you play video files?

David

Hi,
Well, as mentioned above

This since yesterday when you changed some stuff in FMOD.
So it seems that video/audio misbehaviour in thread mode (in my case multi task_chain) was healed by your changes. (Having video or audio appear to be ok, at least as good as they behaved several weeks ago).

So as of today, it seems that ffmpeg is not the key suspect, so maybe the title of this forum thread is no longer correct and should refer to threading…

What I’m simply experiencing are sporadic (not very often) freezes or crashes as described in my previous post.

Are you operating on models in your sub-threads, flattening, loading, or otherwise manipulating models that might be simultaneously observed by the main thread or some other thread? Simply reparenting models or changing their transforms should be safe, but low-level manipulation of a node’s vertices, via GeomVertexWriter, or via flatten, might not be, unless you’re certain that no other thread can find the node at the same time you’re manipulating it.

David
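One way to stay on the safe side of the advice above is to let sub-threads only describe the low-level change and have the main thread apply it. The following is a minimal sketch, assuming standard C++11 threading primitives; `MainThreadQueue`, `post` and `run_pending` are hypothetical names and not part of Panda3D.

```cpp
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical helper (not a Panda3D class): worker threads post closures,
// and the main thread runs them at a point where no other thread is
// reading the data they modify.
class MainThreadQueue {
public:
  // May be called from any thread.
  void post(std::function<void()> job) {
    std::lock_guard<std::mutex> guard(_lock);
    _jobs.push(std::move(job));
  }

  // Call only from the main thread, e.g. once per frame before rendering.
  // Returns the number of jobs that were executed.
  int run_pending() {
    std::queue<std::function<void()>> batch;
    {
      // Hold the lock only long enough to take ownership of the jobs.
      std::lock_guard<std::mutex> guard(_lock);
      std::swap(batch, _jobs);
    }
    int count = 0;
    while (!batch.empty()) {
      batch.front()();
      batch.pop();
      ++count;
    }
    return count;
  }

private:
  std::mutex _lock;
  std::queue<std::function<void()>> _jobs;
};
```

In this sketch a threaded task would call `post` with a closure that does the vertex-level work, and the main loop would call `run_pending` once per frame, so that work never overlaps with cull or draw.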

I don’t know if it is relevant but the only thing I can think of is having character controlled joint updates in one task_chain, and a collision pusher in another one.
I’ve been wondering for a while if this might generate any conflict?

If not, I can’t think of any low level stuff, besides changing dynamically some shader inputs…

Now, maybe Bullet, which is running on its own task_chain, is doing low-level stuff when displaying debug nodes, I don’t know.

The collision pusher is only updating transforms, so that should be OK, but the character joint updates can cause vertices to be recomputed, which could be trouble, perhaps. Are you explicitly updating character joints in a threaded task? You could try temporarily disabling this to see if it makes a difference.

David

Yes, since in the threaded task I’m getting joint update information over UDP from a remote controller.

You mean disabling threading or leaving this specific task in the master task_chain?

Just this specific task; I think we’ve already established that disabling threading in general seems to help. Now we’re trying to figure out which particular action is causing problems with threading.

David
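The isolation step described above (move one task at a time back to the main thread and watch whether the crashes stop) can be sketched as a small helper. This is only an illustration: `choose_chain` and the task names are hypothetical, and it assumes that Panda's main-thread chain is the one named "default".

```cpp
#include <set>
#include <string>

// Hypothetical helper: returns the name of the task chain a task should be
// added to. Tasks listed in `pinned_to_main` go to the "default" chain
// (assumed here to be serviced by the main thread); all others keep the
// threaded chain they normally use.
std::string choose_chain(const std::string &task_name,
                         const std::string &threaded_chain,
                         const std::set<std::string> &pinned_to_main) {
  if (pinned_to_main.count(task_name) != 0) {
    return "default";
  }
  return threaded_chain;
}
```

The result would be passed to `set_task_chain()` when the task is created; changing the contents of `pinned_to_main` between runs then narrows down which task is the one that misbehaves when threaded.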

Might be that Bullet is causing this problem. The BulletDebugNode does manipulate geometry, and it does this in a dirty way, i. e. keeping around some pointers in local members.

@jean-claude:
Does it help to leave out the BulletDebugNode?

@enn0x

I thought that BulletDebugNode could be one way or another involved… I’ll try to run without it and let you know how it goes.

@David

I left the character joint update task in the main task_chain -> no difference, i.e. I’m still having some unexpected freezes/crashes.

BTW, in order to be thread safe (at least as much as possible), I’ve ensured that some C++ global variables are protected by a mutex, as per the following oversimplified sketch:

int val;        // global variable
HANDLE mut_val; // associated mutex

AsyncTask::DoneStatus Task1(GenericAsyncTask* task, void* data) {
   WaitForSingleObject(mut_val,INFINITE);
   val--;
   ReleaseMutex(mut_val);
   return AsyncTask::DS_cont;
}

AsyncTask::DoneStatus Task2(GenericAsyncTask* task, void* data) {
   WaitForSingleObject(mut_val,INFINITE);
   val = val%30;
   ReleaseMutex(mut_val);
   return AsyncTask::DS_cont;
}

static void go_ahead (const Event* event, void* data) {
   WaitForSingleObject(mut_val,INFINITE);
   val++;
   ReleaseMutex(mut_val);
}

static void event_button (const Event* event, void* data) {
...
   if (condition) go_ahead(event, data);  // forward the arguments go_ahead expects
...
}

main
----
...
// create mutex for int val
mut_val = CreateMutex(NULL,FALSE,"mutex");

// assign tasks to task_chain threads
...
T_task1 = new GenericAsyncTask("Task1", &Task1, NULL);
T_task1->set_task_chain("task_chain_2");
taskMgr->add(T_task1);

T_task2 = new GenericAsyncTask("Task2", &Task2, NULL);
T_task2->set_task_chain("task_chain_3");
taskMgr->add(T_task2);
...

// then run panda loop advancing frames
...

// on exit delete mutex
CloseHandle(mut_val);
...

I’d like to understand if this is overkill (paranoid) or if Panda is already taking care of all or part of this?
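For comparison, here is the same pattern as a minimal portable sketch, assuming C++11 `std::mutex` and `std::lock_guard` instead of the Win32 calls; `run_demo` is a hypothetical driver added only for illustration. A library cannot know about an application's own globals, so some locking of this kind is presumably still needed when several task chains touch them. Panda also has its own mutex classes (the `MutexHolder` in the call stack above is one), which are not used here.

```cpp
#include <mutex>
#include <thread>

int val = 0;          // shared global, as in the sketch above
std::mutex val_lock;  // guards every access to val

void decrement() {
  // The guard releases the mutex on scope exit, including early returns.
  std::lock_guard<std::mutex> guard(val_lock);
  --val;
}

void increment() {
  std::lock_guard<std::mutex> guard(val_lock);
  ++val;
}

// Hypothetical driver: runs both operations concurrently. With the lock
// held around each update, no increment or decrement is lost.
int run_demo(int iterations) {
  val = 0;
  std::thread a([=] { for (int i = 0; i < iterations; ++i) increment(); });
  std::thread b([=] { for (int i = 0; i < iterations; ++i) decrement(); });
  a.join();
  b.join();
  return val;  // both threads have finished, so reading without the lock is safe
}
```

The scoped guard removes the risk of forgetting a matching `ReleaseMutex` call on some code path, which is one way a hand-written lock can itself cause a freeze.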

@enn0x

After removing BulletDebugNode calls, I don’t experience any freeze. It seems to help :smiley:

(although I can’t be absolutely certain, since the freezes have so far occurred only sporadically…)