PyCEGUI / CallbackNode render bug with shader generator

whitelynx · March 9, 2011, 4:24pm

I’m having an issue using PyCEGUI in Panda3D with the shader generator enabled. I’m using morgul’s CEGUI integration from the “[Beta] CEGUI 0.7.5 in Python” topic, which requires PyCEGUI. (To get it on Windows, simply easy_install PyCEGUI; on Ubuntu, see the instructions at https://dev.skewedaspect.com/precursors/wiki/Development/PyCEGUI_Ubuntu1010.) This isn’t posted in that topic because neither morgul nor I have any idea what’s causing this, and it seems to be something beyond the scope of the PyCEGUI bindings.

The issue is that the CEGUI windows sometimes disappear when a lit, normal-mapped model is loaded and becomes visible. The issue only manifests when all three of these are true: there is a normal-mapped model on screen, there is a directional or point light (or possibly another type of light) enabled, and setShaderAuto() has been called. If any of those stops being true (unloading the model, shutting off the directional light, or calling setShaderOff()), the problem disappears. If the model is loaded off screen and then you turn to look at it, the windows will be fine until it comes on screen, at which point they will disappear.

I have a test case which can be downloaded from http://people.g33xnexus.com/media-whitelynx/cegui-panda-bug-2011-03-09.tar.bz2.

In the test case app:

model 1 (the asteroid from our game) reliably triggers the issue
model 2 (normal-mapped room from bumpmap tutorial) will sometimes trigger the issue
model 3 (panda) never does

Possibly also useful to note:

The PyCEGUI bindings for Panda3D use a CallbackNode to render.
For some reason, when the TaharezLook window in the bottom left corner disappears, it leaves behind a subtle translucent grey rectangle.

drwr · March 10, 2011, 1:07am

I haven’t looked at any of the code yet, but the symptom sounds like it might be a failure to call GeomDrawCallbackData::set_lost_state(true) at the end of the callback. Could this be the problem?

Further explanation: by default, the draw callback as triggered by the CallbackNode assumes that the callback will leave the render state exactly as it found it. If it doesn’t, bad things can happen as Panda’s model of the state gets out of sync with the actual render state. However, the draw callback receives a GeomDrawCallbackData object as its cbdata pointer, and you can call cbdata->set_lost_state(true) to indicate that the render state is undefined at the end of the draw callback. If you make this call, then Panda will make no assumptions about the render state and will reset anything it needs to.
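As a rough illustration of the suggestion above, here is a minimal sketch of a draw callback wrapper that always marks the render state as lost when it finishes. The names make_draw_callback and render_gui are hypothetical, and the commented-out wiring assumes the panda3d.core import path (older builds expose the same classes through pandac.PandaModules); only setLostState() comes from this thread.

```python
def make_draw_callback(render_gui):
    """Wrap a GUI render function for use as a CallbackNode draw callback.

    render_gui is a hypothetical zero-argument function that issues the
    GUI's own OpenGL calls (for example, CEGUI's renderGUI step).
    """
    def draw_callback(cbdata):
        try:
            render_gui()
        finally:
            # cbdata is the GeomDrawCallbackData passed in by Panda.
            # Marking the state as lost tells Panda not to trust its
            # cached model of the render state after this callback.
            cbdata.setLostState(True)
    return draw_callback


# Assumed wiring inside a Panda3D application (not executed here):
#   from panda3d.core import CallbackNode, PythonCallbackObject
#   node = CallbackNode("cegui")
#   node.setDrawCallback(PythonCallbackObject(make_draw_callback(render_gui)))
#   render2d.attachNewNode(node)
```

The try/finally is a design choice of this sketch: the state is flagged as lost even if the GUI render raises partway through.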

If this isn’t the problem, it is possible that there is a bug in the handling for set_lost_state() that isn’t properly resetting some key shader-related state.

David

whitelynx · March 10, 2011, 3:16pm

I tried calling data.setLostState(True) at the end of the render callback, and it didn’t seem to help. That makes sense to me, since the problem seems to be Panda messing up CEGUI’s rendering, and not the other way around. On the other hand, it doesn’t seem like a bad call to have in there, since it may avoid some future issues with Panda rendering. Anything I can do to test out the set_lost_state() possibility?

drwr · March 11, 2011, 12:28am

Hmm, perhaps it’s necessary to put an explicit call to disable the current shader at the beginning of the callback? There’s no guarantee what the OpenGL state is when the callback begins, and perhaps CEGUI assumes there is no shader in effect.

Edit: never mind, I’m a dope. There is a guarantee about the OpenGL state at that point: it’s whatever state is applied to the CallbackNode itself. So there should be no shader applied unless it is actually applied to the CallbackNode. So, this does sound like something’s going wrong in the underlying handling somehow.

How to debug it is a little tricky unless you’re good with C++ and debuggers.

David

morgul · March 11, 2011, 4:58am

WhiteLynx and I are working on this together. We’re both pretty good with both C++ and debuggers. (We were using a C++ engine a few years ago, and we both work at a company where some of the development is still C++. We’re old hands at using gdb.)

Any suggestions on how to start debugging this would be welcome. We’re not quite sure where to look, or what we’re looking for.

drwr · March 11, 2011, 9:19pm

I would start by explicitly unbinding the shader at the start of the CEGUI callback, just to prove whether that is or isn’t the problem. Since the auto-generated shaders are CG, I guess that means calling cgGLDisableProfile().

If unbinding the shader makes a difference, then it proves that Panda failed to unbind it itself, which is a bug in Panda, and we’ll have some detective work to do. If that doesn’t make a difference, though, then there must be some other problem, and again we’ll have some detective work to do.

David

whitelynx · March 14, 2011, 6:07am

How do we get the profile to disable? The CEGUI callback is in Python, so I’m using ctypes to load libcgGL and call cgGLDisableProfile, but it takes a parameter (see http.developer.nvidia.com/Cg/cgG … ofile.html) which determines what profile to disable. Is there actually any way of getting the correct profile?

It looks like libcg’s cgGetProfile() might be a way to get the right profile, but I’d still have to know the name of the profile I need to disable…

whitelynx · March 14, 2011, 7:07am

Ok, as a start, I tried disabling the following profiles at the beginning of the callback:

CG_PROFILE_ARBVP1
CG_PROFILE_ARBFP1
CG_PROFILE_FP40
CG_PROFILE_VP40

The first 2 had no visible effect, but when I tried disabling the 3rd or 4th, the following error message was printed to the console every frame:

:display:gsg:glgsg(error): panda/src/glstuff/glShaderContext_src.cxx, line 671: The profile is not supported.

I also tried re-enabling the profiles at the end of the callback, but this didn’t seem to change anything.

Despite that error being printed out repeatedly, there was no visible change in the look of the normal-mapped model, and the CEGUI windows never showed up.

Any suggestions on other profiles I should try disabling? Here’s the full list: (pulled from cg.h, since I had trouble finding it in an online reference anywhere)

CG_PROFILE_UNKNOWN = 6145
CG_PROFILE_VP20    = 6146
CG_PROFILE_FP20    = 6147
CG_PROFILE_VP30    = 6148
CG_PROFILE_FP30    = 6149
CG_PROFILE_ARBVP1  = 6150
CG_PROFILE_FP40    = 6151
CG_PROFILE_ARBFP1  = 7000
CG_PROFILE_VP40    = 7001
CG_PROFILE_GLSLV   = 7007 # GLSL vertex shader
CG_PROFILE_GLSLF   = 7008 # GLSL fragment shader
CG_PROFILE_GLSLG   = 7016 # GLSL geometry shader
CG_PROFILE_GLSLC   = 7009 # Combined GLSL program
CG_PROFILE_GPU_FP  = 7010 # Deprecated alias for CG_PROFILE_GP4FP
CG_PROFILE_GPU_VP  = 7011 # Deprecated alias for CG_PROFILE_GP4VP
CG_PROFILE_GPU_GP  = 7012 # Deprecated alias for CG_PROFILE_GP4GP
CG_PROFILE_GP4FP   = 7010 # NV_gpu_program4 fragment program
CG_PROFILE_GP4VP   = 7011 # NV_gpu_program4 vertex program
CG_PROFILE_GP4GP   = 7012 # NV_gpu_program4 geometry program
CG_PROFILE_GP5FP   = 7017 # NV_gpu_program5 fragment program
CG_PROFILE_GP5VP   = 7018 # NV_gpu_program5 vertex program
CG_PROFILE_GP5GP   = 7019 # NV_gpu_program5 geometry program
CG_PROFILE_GP5TCP  = 7020 # NV_tessellation_program5 tessellation control program
CG_PROFILE_GP5TEP  = 7021 # NV_tessellation_program5 tessellation evaluation program
CG_PROFILE_VS_1_1  = 6153
CG_PROFILE_VS_2_0  = 6154
CG_PROFILE_VS_2_X  = 6155
CG_PROFILE_VS_2_SW = 6156
CG_PROFILE_PS_1_1  = 6159
CG_PROFILE_PS_1_2  = 6160
CG_PROFILE_PS_1_3  = 6161
CG_PROFILE_PS_2_0  = 6162
CG_PROFILE_PS_2_X  = 6163
CG_PROFILE_PS_2_SW = 6164
CG_PROFILE_VS_3_0  = 6157 # DX9 vertex shader
CG_PROFILE_PS_3_0  = 6165 # DX9 pixel shader
CG_PROFILE_HLSLV   = 6158 # DX9 HLSL vertex shader
CG_PROFILE_HLSLF   = 6166 # DX9 HLSL fragment shader
CG_PROFILE_VS_4_0  = 6167 # DX10 vertex shader
CG_PROFILE_PS_4_0  = 6168 # DX10 pixel shader
CG_PROFILE_GS_4_0  = 6169 # DX10 geometry shader
CG_PROFILE_VS_5_0  = 6170 # DX11 vertex shader
CG_PROFILE_PS_5_0  = 6171 # DX11 pixel shader
CG_PROFILE_GS_5_0  = 6172 # DX11 geometry shader
CG_PROFILE_HS_5_0  = 6173 # DX11 hull shader (tessellation control)
CG_PROFILE_DS_5_0  = 6174 # DX11 domain shader (tessellation evaluation)
CG_PROFILE_GENERIC = 7002

rdb · March 14, 2011, 7:22am

If you have “basic-shaders-only” set to “#t”, then arbvp1/arbfp1 should be enough. Otherwise, you might want to disable glslf/glslv.
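For anyone following along, the ctypes approach described earlier in the thread might look roughly like the sketch below. The profile values are the ones from the cg.h list above; the helper names load_cggl and disable_profiles, and the library name passed to find_library, are assumptions for illustration. It only makes sense to call disable_profiles from inside the draw callback, while Panda's OpenGL context is current.

```python
import ctypes
import ctypes.util

# Profile values copied from the cg.h list posted in this thread.
CG_PROFILES = {
    "arbvp1": 6150,
    "arbfp1": 7000,
    "glslv": 7007,
    "glslf": 7008,
}


def load_cggl():
    """Return a handle to the CgGL library, or None if it is unavailable.

    The library name "CgGL" is an assumption; it may differ per platform.
    """
    path = ctypes.util.find_library("CgGL")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    # void cgGLDisableProfile(CGprofile profile); CGprofile is a C enum.
    lib.cgGLDisableProfile.argtypes = [ctypes.c_int]
    lib.cgGLDisableProfile.restype = None
    return lib


def disable_profiles(lib, names):
    """Disable each named Cg profile on the current OpenGL context."""
    for name in names:
        lib.cgGLDisableProfile(CG_PROFILES[name])


# Inside the draw callback, with basic-shaders-only enabled:
#   disable_profiles(cggl, ["arbvp1", "arbfp1"])
# otherwise:
#   disable_profiles(cggl, ["glslv", "glslf"])
```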

whitelynx · March 14, 2011, 7:22pm

‘basic-shaders-only’ is set to ‘#t’ on both my machines, and I’ve tried either disabling arbvp1/arbfp1 or glslf/glslv on both, and no combination seems to have any effect; the CEGUI windows still disappear in the same conditions as before. I’ve also tried both in conjunction with data.setLostState(True), without any effect.

Anything else I can check?

drwr · March 14, 2011, 8:04pm

Hmm, that’s strange all right.

Let’s assume you’re correctly disabling the shader, which means it’s not a problem with the shader being left enabled for the CEGUI callback. Maybe it’s something subtle like a driver bug preventing the shader from being fully disabled? Or maybe the CEGUI stuff is making its own assumptions about OpenGL state being left in a particular configuration?

It might help to clarify precisely the circumstances that cause the gui to disappear. For instance, is it a rendering order thing? Does the gui disappear only when it is rendered right after the normal-mapped object, or right before it, or in either case? You can determine this by explicitly setting the render order on everything in the scene, for instance with a setBin() operation. Although it occurs to me now that your normal-mapped object is probably in render, and your gui is probably in render2d; and render2d is drawn after render.

So, you can play with the relative draw order of render and render2d, e.g.:

base.cam.node().getDisplayRegion(0).setSort(200)
base.cam2d.node().getDisplayRegion(0).setSort(100)

to reverse the sort order of render2d and render. You might also need to play with the depth and color clearing; make sure render is not set to clear the color buffer:

base.cam.node().getDisplayRegion(0).setClearColorActive(False)

David

whitelynx · March 14, 2011, 9:47pm

I just tried the setSort() calls you suggested (and set cam to not clear the color buffer, but cam2d to clear it instead) and it still exhibits exactly the same behavior.

drwr · March 14, 2011, 11:04pm

Well, so we have a strange conundrum indeed. Does rendering a non-normal-mapped object after the normal-mapped object help? Try adding an ordinary, non-shader-rendered object to the scene, and ensure it gets drawn last with object.setBin('fixed', 0). If this also has no effect, then it strongly suggests that OpenGL state is not the problem, which is quite mystifying, because what else could it be?

But somehow we have the observation that the presence of a normal-mapped object in the scene breaks the CEGUI rendering.

Maybe the next step is to try to diagnose precisely what is going wrong with the CEGUI rendering. Does CEGUI provide any simpler rendering modes that might be less fragile, for instance? Failing that, does it have a way to inspect all of the OpenGL calls that it makes?

What happens if you add your own simple OpenGL calls to the CEGUI callback to, say, draw a triangle on the screen? Does that triangle still draw when the CEGUI doesn’t? If the triangle disappears too, then we can tackle what’s making all of your output disappear. If the triangle remains but CEGUI disappears, then what the heck is CEGUI doing if it’s not drawing triangles?

David

whitelynx · March 20, 2011, 1:49am

Well, this actually fixed it. I can reliably fix the disappearance of the CEGUI windows by adding a panda (models/panda-model) in the ‘fixed’ bin, and break it again by removing the panda. Also, while the panda is loaded and displayed in the ‘fixed’ bin, switching between different models (normal-mapped and non-normal-mapped) in the default bin has no effect on CEGUI; it won’t break as long as there’s a plain model in ‘fixed’.

The weird thing is that loading the ‘models/abstractroom’ model in ‘fixed’ also seems to reliably fix it.
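A minimal sketch of that experiment, for reference. The helper name add_plain_model_last is hypothetical; loader and render stand for the usual ShowBase globals, and the exact code in the test application may differ.

```python
def add_plain_model_last(loader, render, path="models/panda-model"):
    """Load a model and force it to be drawn after the default bins.

    loader and render are expected to be Panda3D's usual loader and
    render objects; they are passed in so the helper has no globals.
    """
    model = loader.loadModel(path)
    model.reparentTo(render)
    # The 'fixed' bin is drawn after the default opaque and transparent
    # bins, so this model is among the last things rendered in the scene.
    model.setBin("fixed", 0)
    return model


# In the test application this would be called roughly as:
#   plain = add_plain_model_last(loader, render)
# and undone with:
#   plain.removeNode()
```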

I’ve uploaded a new version of the test application at http://people.g33xnexus.com/media-whitelynx/cegui-panda-bug-2011-03-19.tar.bz2; it includes code to test out several of the fixes we’ve tried so far, and it shows loading a non-broken model in the ‘fixed’ bin rectifying the issue.

So, it seems that it’s some sort of state that isn’t getting cleaned up before the callback… any ideas on how to figure out what isn’t getting cleaned up?

drwr · March 22, 2011, 12:33am

I’ve just successfully run your test program and reproduced the error. I hope to get a chance in the next couple of days to take a look deeper into the code and try to track down the source.

David

whitelynx · March 22, 2011, 5:12am

Awesome, thanks for the help!

drwr · March 26, 2011, 12:22am

Found it!

The problem is not (exactly) a bug in Panda, nor is it (exactly) a bug in CEGUI. But it does have a simple solution, which I have just committed.

Specifically, the problem is that CEGUI doesn’t make use of the glActiveTexture() call, which was introduced to OpenGL as of version 1.3. This OpenGL function is designed to set the texture stage that all following texture functions will operate on; the intention is that you should always precede any texture operations with a call to glActiveTexture() to specify the texture stage you intend to work on, similar to the glMatrixMode() function which should always precede any call to glLoadMatrix() etc.

Prior to OpenGL 1.3, there was only one texture stage, so this function didn’t exist. OpenGL programs written for 1.2 or earlier therefore never make this call, and always operate on the default stage, texture stage 0.

CEGUI is apparently written for 1.2 or earlier, since it never makes this call. That’s OK as long as it’s the only program controlling the OpenGL state. But when you have another program (like Panda) in the mix, CEGUI will fail if the other program happened to leave the active texture stage set to something other than texture stage 0.

So, the workaround is for Panda to explicitly reset the active texture stage to 0 before making the OpenGL draw callback. I think this is a perfectly reasonable workaround, and also protects against other OpenGL callbacks that make a similar assumption.
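To make the failure mode concrete, here is a toy model of the texture-unit state in plain Python. It is not real OpenGL: the class and function names are invented for illustration, and it assumes (as a simplification) that the GUI quads sample from texture stage 0.

```python
class ToyGL:
    """Toy model of OpenGL texture-unit state; not real OpenGL."""

    def __init__(self, units=4):
        self.active_unit = 0          # what glActiveTexture() selects
        self.bound = [None] * units   # texture bound on each unit

    def active_texture(self, unit):   # stands in for glActiveTexture()
        self.active_unit = unit

    def bind_texture(self, name):     # stands in for glBindTexture()
        self.bound[self.active_unit] = name


def draw_normal_mapped_model(gl):
    """Multitexturing: the last unit touched stays active afterwards."""
    gl.active_texture(0)
    gl.bind_texture("diffuse")
    gl.active_texture(1)
    gl.bind_texture("normal_map")


def draw_gui_gl12_style(gl):
    """GL 1.2-style code: binds a texture without selecting a unit."""
    gl.bind_texture("gui_atlas")
    # Assumption of this toy: the GUI is textured from unit 0.
    return gl.bound[0]


gl = ToyGL()
draw_normal_mapped_model(gl)
broken = draw_gui_gl12_style(gl)  # bind went to unit 1; unit 0 is stale

gl.active_texture(0)              # the workaround: reset the active unit
fixed = draw_gui_gl12_style(gl)   # bind now lands on unit 0

print(broken, fixed)
```

In this toy, the first GUI draw ends up using whatever was left on unit 0 ("diffuse"), and the second uses "gui_atlas". Presumably this is also why drawing a plain, single-textured model last hid the problem: it would leave unit 0 active.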

You can pick up this workaround by getting the latest Panda code from the CVS repository. Or, you can simply add a call to glActiveTexture(GL_TEXTURE0) at the beginning of the CEGUI callback.
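A sketch of the second option, assuming PyOpenGL is used to reach glActiveTexture() from Python. The helper name make_cegui_callback and the render_gui parameter are hypothetical; the GL module is passed in as an argument so the snippet does not depend on any one binding.

```python
def make_cegui_callback(render_gui, gl):
    """Build a draw callback that resets the active texture stage first.

    gl is any object exposing glActiveTexture and GL_TEXTURE0, for
    example PyOpenGL's OpenGL.GL module (an assumption of this sketch).
    render_gui is a hypothetical function that performs the CEGUI draw.
    """
    def draw_callback(cbdata):
        # CEGUI never calls glActiveTexture(), so make sure its texture
        # calls land on stage 0 regardless of what was drawn before.
        gl.glActiveTexture(gl.GL_TEXTURE0)
        render_gui()
    return draw_callback


# Assumed usage with PyOpenGL installed:
#   from OpenGL import GL
#   callback = make_cegui_callback(render_gui, GL)
```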

David

whitelynx · March 27, 2011, 3:17am

Awesome, that workaround works great! I just added the glActiveTexture(GL_TEXTURE0) call to the start of my render callback, so our other devs wouldn’t have to run CVS in order for it to work. Thanks for helping debug this!