[] Stack depth issue in EggPrimitive

FYI: Just experienced a stack overflow in EggPrimitive.
As far as I remember I’ve already seen this several months ago…

This happens at this point:

void EggPrimitive::
r_set_connected_shading(int stack_depth, EggPrimitive::Shading shading, 
                        const EggAttributes *neighbor, 
                        ConnectedShadingNodes &next_nodes) {
  if (stack_depth > 10000) {
    // Too deep.  Limit recursion.
    ConnectedShadingNode next;
    next._shading = shading;
    next._neighbor = neighbor;
    next_nodes.push_back(next);
    return;
  }

Call stack:

Note that the stack_depth is 6186 and has not reached the protective threshold of 10000

Oh, huh. How about that, there’s already recursion protection in there, and maybe the threshold is just too high. I assume it works if you hand-edit the threshold to, say, 5000?

I’ll make this a config setting. Thanks!

Edit: egg-recursion-limit config setting added, default value is 5000.

David

Hi,
Just quickly patched libpandaegg.dll with the suggested limit value of 5000.
It works ok. Thanks!

BTW. Just for my understanding: what’s the implication of changing the stack_depth limit in EggPrimitive::
r_set_connected_shading(…)?

It just means the function will have to fall back to a less-efficient approach when the stack limit is exceeded.

If the stack limit is set too low, the algorithm will take a little bit longer to compute, but it will reach the correct answer eventually. If (as we have discovered) the stack limit is too high, then the program will crash when the physical stack is exceeded.

On non-Windows platforms, there isn’t really a stack limit on the main thread, so this is only an issue on Windows, or on any platform if you are loading egg files in a sub-thread.

David

Hi,
Sorry to reopen this one, but just to let you know that the limits stroke again (ie gnerated a crash) with the default egg_recursion_limit value 5000

>	libpandaegg.dll!EggPrimitive::r_set_connected_shading(int stack_depth=3966, EggPrimitive::Shading shading=S_per_vertex, const EggAttributes * neighbor=0x21a652d0, pvector<EggPrimitive::ConnectedShadingNode> & next_nodes={...})  Line 1225 + 0x1a bytes	C++
 	libpandaegg.dll!EggPrimitive::r_set_connected_shading(int stack_depth=3965, EggPrimitive::Shading shading=S_per_vertex, const EggAttributes * neighbor=0x21a65600, pvector<EggPrimitive::ConnectedShadingNode> & next_nodes={...})  Line 1282 + 0x41 bytes	C++
 	libpandaegg.dll!EggPrimitive::r_set_connected_shading(int stack_depth=3964, EggPrimitive::Shading shading=S_per_vertex, const EggAttributes * neighbor=0x21a64e10, pvector<EggPrimitive::ConnectedShadingNode> & next_nodes={...})  Line 1282 + 0x41 bytes	C++

This actually raises the question:
since one can’t really define ahead what could be a decent limit based on its app and the size of the eggfiles to be used. Would it be possible to get a smart adaptive approach to the matter ie define dynamically the limit wrt the current stack level and system max stack_size?

I don’t know how to do that reliably. There’s no easy way to ask the C++ runtime (or the compiler) how much is added to the stack with each recursive call. I guess we can try to be sneaky by examining the stack register changing each time, but that’s pretty darn hacky.

Note that the appropriate limit has nothing to do with the egg files or the particular app. The actual limit is a hard value based on the stack size and the number of bytes consumed with each recursion call, both of which are determined at compilation time and have nothing to do with the runtime environment. (The choice of egg file does have an impact on whether you crash or not, but only because some egg files don’t require as much recursion, and may never approach the hard stack limit.)

So, it’s probably best just to find what that value is in the worst-case compilation environment. The easiest way I can think of to find that value is experimentally: when it crashes in a particular build environment, lower it until it doesn’t crash.

So, if 5000 is still too high in your environment, how about 2500?

David

Adopting an empirical approach is indeed one way to go:
load_prc_file_data("", “egg-recursion-limit 3000”);
solved the specific issue.

Having said so, a possible heuristic to auto-define on-the-fly the limit would be welcome, since getting a production app crash would definiively leave an end-user scratching his head…

What about for instance including in a egg-file a tag with its maximum recursion level, or at least including such an info in the asssociared bam file at the time it is created in the cache? And then derive the recursion-limit limit from this info…

Well, the egg file wouldn’t know its maximum recursion limit ahead of time; but it could theoretically be measured when the egg file is loaded, and stored in the bam file. But what useful information would that tell us?

This whole approach to limiting recursion is an ugly hack, anyway. A more robust approach would be not to use a recursive algorithm at all, or to design it with explicit data structures that grow without bounds even when the stack is limited. But no one wants to rewrite the egg loader to work that way; what a nuisance. Besides, that might not be any faster than setting a hard limit on recursion.

If there’s any worry about accidentally exceeding the limit we set in the future, I say we just set the limit low enough that that likelihood is sufficiently remote we shouldn’t need to worry about it. It was 10000 for years before anyone had a problem; I guess in the intervening years, the compiler changed a few properties, or maybe we expanded the size of some structures, or something. Now we find that we’ve only got about 1/3 as much space as we did originally.

All right, let’s set the limit to 1000, and we’ll be good for another 10 years. Maybe by the time we reach this new limit, the Windows compiler will do a better job of handling recursion without imposing a fixed stack size.

David

ok, I think too 1000 should fit the bill in most cases.

Having said so, I have the feeeling that this issue became visible these last days as a result of expanded memory aligment (eigen…).
It is somewhat related since as far as I remember the very time I experienced it in the past was through my use of Intel compiler on which I imposed mmx 16byte alignment…