Garbage collect states: how to optimize performance?

Currently, according to PStats, my game spends about 1.5 ms per frame in garbage collection. This is about 15% of the “App” time.
How would I go about analyzing what is being garbage collected in order to reduce the time spent? Presumably this would entail recycling more things instead of throwing them away each frame.

teedee, I have already filed a bug report about this.
As stated in the report, there’s something very strange going on with the GarbageCollect timing: img339.imageshack.us/img339/949/ … tela2i.png

You can check the report here:
bugs.launchpad.net/panda3d/+bug/1021053

To summarize one solution, I’d say:

I think I just have a lot of states being destroyed. Changing the setting makes no performance difference for me. Yes, the garbage collect entry disappears from PStats, but that time is just spread across all the other categories. That makes sense, since toggling the PRC setting only changes whether the garbage collection happens immediately or at the end of the frame.
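To make the point that “the time just moves elsewhere” concrete, here is a toy Python model. It is a hypothetical sketch, not Panda3D code: it assumes every released state costs the same fixed amount to destroy, and the phase names are made up. Immediate collection charges that cost to whichever phase releases the state; deferred collection charges it all to one end-of-frame bucket. The total is the same either way.

```python
# Toy model (hypothetical, not Panda3D code): the same destruction work is
# either charged to whichever phase releases the state ("immediate") or
# batched into one end-of-frame "garbage collect" phase ("deferred").
from collections import defaultdict


def attribute_cost(releases, deferred, cost_per_state=1.0):
    """releases: list of (phase_name, num_states_released) for one frame.
    Returns a dict mapping phase name -> time charged to that phase."""
    charged = defaultdict(float)
    pending = 0
    for phase, count in releases:
        if deferred:
            pending += count  # keep unused states around until frame end
        else:
            charged[phase] += count * cost_per_state  # destroy right away
    if deferred and pending:
        charged["garbage collect"] += pending * cost_per_state
    return dict(charged)


frame = [("app", 300), ("cull", 150), ("draw", 50)]
immediate = attribute_cost(frame, deferred=False)
deferred = attribute_cost(frame, deferred=True)
print(immediate)  # {'app': 300.0, 'cull': 150.0, 'draw': 50.0}
print(deferred)   # {'garbage collect': 500.0}
print(sum(immediate.values()) == sum(deferred.values()))  # True
```

Under this assumption, switching the deferred collection off can only redistribute the cost across the other PStats categories; actually reducing it means creating and releasing fewer states.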

You could try disabling the transform and render state caches altogether with:

transform-cache 0
state-cache 0

Or disable only one or the other.

David

For me, there is one place on my map with bad performance. There, garbageCollectStates uses 43.6363 ms of App’s 51.6243 ms total (that’s the worst case I have seen).

“garbage-collect-states #f” seems to fix it. It removes the “garbageCollectStates” section from the report and seems to help the frame rate somewhat.

That’s a worst case, but in that case garbageCollectStates takes more time than “draw”, much more than “cull”, and accounts for 85% of “App”. In total, garbageCollectStates is 34% of the frame time.

In a location where I get a much better frame rate, garbageCollectStates only takes 1.7 ms of the total frame time of 16.8 ms.
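As a quick sanity check on these numbers (plain arithmetic, using only the figures quoted above): 43.6363 / 51.6243 is about 84.5%, which rounds to the 85% mentioned, and a 34% share of the frame implies a worst-case frame of roughly 128 ms.

```python
# Ratios from the PStats numbers quoted in this post (ms per frame).
worst_gc, worst_app = 43.6363, 51.6243
print(f"GC share of App (worst case): {worst_gc / worst_app:.1%}")    # 84.5%

# "34% of the frame time" implies a frame of roughly worst_gc / 0.34 ms.
implied_frame = worst_gc / 0.34
print(f"Implied worst-case frame time: {implied_frame:.0f} ms")      # 128 ms

good_gc, good_frame = 1.7, 16.8
print(f"GC share of frame (good spot): {good_gc / good_frame:.1%}")  # 10.1%
```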

Craig, if I understand correctly, that’s pretty much the problem I reported too… In my case it seems even worse, as shown in the image I posted: 85% of the time is spent in TransformState::garbage_collect.

In my case it was caused by heavy flattening. (The time shown in the chart counts only the post-flatten activity.)

If I set that option in the PRC, the frame rate isn’t affected, even with the flattening.

I think it is worth investigating why that is happening.

I have a lot of flattened stuff, but it is flattened, then saved to bam files, then loaded from there when needed by my client.

I do a lot of crazy stuff (such as manually adjusting every render state on every node and geom I render), so my case is not a simple example of the issue at all.

I happen to have a lot of instancing, used together with nested LOD nodes, multiple cameras re-rendering the scene with different default render states to different buffers, and a bunch of tasks updating shader inputs, compass effects, etc.

So, while I do use some flattening, I have no idea whether that’s the issue at all.

Can’t view the image :(


Well, in my case setting “transform-cache 0” caused my game to crash while loading. However, “state-cache 0” seems to have cut the garbage collect time down to about 1/3 of the original cost, and the time doesn’t appear to have shifted anywhere else. 1 ms for free, yay. :)

Does this mean that my states were changing so often that it cost more to maintain the cache than the benefit it provided?

Yes, that sounds likely.
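A toy cache makes this trade-off easier to see. The sketch below is hypothetical (ToyStateCache and simulate are illustrative names, not Panda3D’s actual state cache): it counts hits and misses for states that repeat every frame versus states that are new every frame. When nearly every lookup misses, the cache adds bookkeeping without saving any work, and in a real cache each of those entries would later have to be garbage collected.

```python
# Toy illustration (hypothetical; not Panda3D's actual cache): a cache only
# pays off when lookups find states that already exist.
class ToyStateCache:
    def __init__(self):
        self._entries = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        """Return the cached state for key, creating it on a miss."""
        if key in self._entries:
            self.hits += 1
        else:
            self.misses += 1
            self._entries[key] = ("state", key)
        return self._entries[key]

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def simulate(frames, states_per_frame, reuse):
    cache = ToyStateCache()
    for frame in range(frames):
        for i in range(states_per_frame):
            # reuse=True: the same attribute combinations every frame.
            # reuse=False: e.g. a per-frame value baked into every state.
            key = i if reuse else (frame, i)
            cache.get(key)
    return cache


stable = simulate(frames=60, states_per_frame=100, reuse=True)
churn = simulate(frames=60, states_per_frame=100, reuse=False)
print(f"stable states: hit rate {stable.hit_rate():.0%}")   # 98%
print(f"changing states: hit rate {churn.hit_rate():.0%}")  # 0%
```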

I don’t know why “transform-cache 0” should cause a crash, though. That’s strange and disturbing.

David

Here is the call stack; I don’t know if it is of much use. The state-cache setting resolves my issue nicely, though.

Unhandled exception at 0x77e715de in python.exe: 0x00000000: The operation completed successfully.

 	ntdll.dll!77e715de() 	
 	[Frames below may be incorrect and/or missing, no symbols loaded for ntdll.dll]	
 	ntdll.dll!77e715de() 	
 	ntdll.dll!77e6014e() 	
 	ntdll.dll!77e9100b() 	
 	ntdll.dll!77e9d6b2() 	
 	ntdll.dll!77e9d554() 	
 	kernel32.dll!77157a0d() 	
 	msvcr90.dll!74b621cc() 	
 	msvcr90.dll!74b62411() 	
 	python26.dll!1e05b62a() 	
 	ntdll.dll!77e6faca() 	
 	ntdll.dll!77e89d6c() 	
 	ntdll.dll!77e83cee() 	
 	msvcr90.dll!74baa4f5() 	
 	ntdll.dll!77e7e67f() 	
 	KernelBase.dll!770a37ed() 	
 	python26.dll!1e05b62a() 	
 	python26.dll!1e05b5fb() 	
 	base.pyd!02221992() 	
 	msvcr90.dll!74bad0d9() 	
 	python.exe!1d001188() 	
 	msvcr90.dll!74baf914() 	
 	msvcr90.dll!74bb18de() 	
 	python.exe!1d0015f5() 	
 	ntdll.dll!77eab459() 	
 	ntdll.dll!77eab42b() 	
 	ntdll.dll!77eab3ce() 	
 	ntdll.dll!77e60133() 	
>	libpanda.dll!CullBinFrontToBack::add_object(CullableObject * object=0x00000023, Thread * current_thread=0x0027e398)  Line 75	C++
 	libpanda.dll!CullBinFrontToBack::add_object(CullableObject * object=0x00000000, Thread * current_thread=0x00000000)  Line 75	C++
 	libpanda.dll!TypedWritable::~TypedWritable()  Line 57 + 0x10 bytes	C++
 	libp3dtool.dll!TypeHandle::dec_memory_usage(TypeHandle::MemoryClass memory_class=-1, int size=268460172)  Line 95 + 0x13 bytes	C++
 	libpandaexpress.dll!___clean_type_info_names_internal()  + 0x7b2d bytes	C++
 	00000015()	
 	user32.dll!758a360e() 	
 	ddraw.dll!710febc6() 

Hmm, are you perhaps running with the threading-model in effect? I’m not sure that setting transform-cache or state-cache to 0 is supported when you’re using the threaded pipeline; that model requires collecting all of the states at the end of the frame, or you risk a crash due to a race condition.

David

The opposite, actually: I’m running a build with threading completely disabled (HAVE_THREADS disabled in makepanda).