eigen


#22

Hey David,
Forgive us for putting you under pressure… but you know Panda’s groupies are like spoiled children waiting anxiously for Xmas morning!
:laughing: :laughing:


#23

As far as I believe now, everything is correct and working. Please let me know how your own experiments go. :slight_smile:

David


#24

The first thing I notice is that Panda is using a lot more system memory. Running my game, the frame rate was actually a bit lower, but then I remembered that I needed to clear out the BAM cache. Unfortunately, it is now crashing with an out of memory message (presumably while generating BAM files):

Couldn't allocate memory page of size 131072: Not enough storage is available to
 process this command.


This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

That is for the game client. The server does run and takes up about 1.2GB of RAM as opposed to 0.6GB previously.


#25

Hmm, that’s strange. I would have expected a bit more memory usage, due to the 16-byte memory alignment requirements, but not double. Maybe the code itself is now much bigger due to the template explosion problem? This bears investigation.

David


#26

My early experiments seem to indicate about 20-30% more memory required by ordinary Panda operations. I’m trying a compile now with the original alternative malloc scheme to see how much of this bloat is due to alignment requirements and how much is due to the change in malloc schemes.

Is that 30% additional requirement enough to push you over the edge on the client? How close were you running previously? How much does memory grow before your client dies?

Edit: another thought–are you perhaps comparing against the memory usage in 1.7.2? It’s also possible that the threading change with 1.8.0 is another major contributor to more memory consumption.

Would you be satisfied to use a Panda3D compiled with Win64, to allow it to take advantage of more than 2GB of RAM?

David


#27

I am comparing against my previous build from CVS; I have disabled threading in both builds.

I may not have waited long enough for my original memory measurement: the server seems to be using 0.9GB on the previous build now, so the increase to 1.2GB would be consistent with the 30% you are seeing. Actually, now that I think about it, that is a ton of memory for what is actually going on in the server, so perhaps that is something for me to investigate.

It seems to use a lot more memory when it has to cache EGG to BAM files, and it does not release that memory. For example, on the previous build (before eigen), on the first run of the client after clearing the BAM cache it is using almost 1.5GB, but after closing it and running again it is using only 0.8GB. It is likely that this is the real reason why it is hitting the 32-bit memory limit, and it’s just that this extra 30% on top of that has pushed it over the limit. It looks like the client is at about 1.8GB when it crashes. If it was going to be taking up 1.5GB in the previous build, then that extra 30% would put it at about 2GB.
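The arithmetic behind that estimate can be checked with a few lines of Python. This is only a back-of-the-envelope sketch using the figures quoted in this thread; the variable names are made up for illustration.

```python
# Rough check of the numbers quoted in this thread (all figures in GB).
first_run_before_eigen = 1.5  # client, first run after clearing the BAM cache
overhead = 0.30               # approximate extra memory reported for the new build
win32_limit = 2.0             # default user address space of a 32-bit process

# Projected first-run usage if the whole footprint grows by ~30%.
projected = first_run_before_eigen * (1 + overhead)
headroom = win32_limit - projected

print(f"projected first-run usage: {projected:.2f} GB")
print(f"headroom below the 2GB limit: {headroom:.2f} GB")
```

The projection lands just under the limit, which leaves essentially no headroom, so a crash somewhere before 2GB (for example from fragmentation) would not be surprising.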

I can’t rely on a 64-bit build since end users may be running a 32-bit OS.


#28

Hi teedee,
As a quick & dirty workaround you can try
editbin /LARGEADDRESSAWARE yourapp.exe
This should provide a true 3GB addressable space on 32-bit machines

EDIT: recommendation: simply do it as a post-build command


#29

It appears after some experimentation that this 30% memory bloat is due to the need to impose an external 16-byte alignment on top of the system malloc library. If we go back to an alternative malloc library that can provide 16-byte alignment natively, this waste goes away. I’ll run a few more tests to confirm this, then commit this change.

jean-claude’s advice is good, though. Also, you might consider distributing your game to end-users with your egg files already prebuilt to bams, to avoid this potential problem in general (the egg loader was never intended to be particularly memory-efficient; it was primarily designed for the developer, not the end user).

David


#30

Yes the plan would be to eventually distribute as BAM files. I just didn’t want to maintain a 64-bit build for development in addition to the 32-bit for distribution. The large address tweak sounds like it would be a good solution for development though.

About the EGG loader, I just found it strange that the memory it uses sticks around even after the game is loaded up. Shouldn’t that memory it uses to convert to BAM be able to be released once it has finished loading?

I suppose I could have some pre-process that runs and loads up the EGG files to populate the BAM cache before running the game itself. I had assumed that is what was happening under the hood anyways, but maybe I’ve misunderstood.
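A pre-process along those lines might look roughly like the sketch below. It is only an illustration: `find_egg_files` and `warm_cache` are hypothetical helper names, and the actual loading is passed in as a callable (for instance a ShowBase `loader.loadModel`), on the assumption that loading an egg file with the model cache enabled is what writes the cached BAM.

```python
import os
from typing import Callable, List


def find_egg_files(root: str) -> List[str]:
    """Return all .egg / .egg.pz files under root, in sorted order."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith((".egg", ".egg.pz")):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


def warm_cache(root: str, load_model: Callable[[str], object]) -> List[str]:
    """Load every egg file under root once so the BAM cache gets populated.

    load_model is whatever callable actually loads a model; it is injected
    so this sketch does not depend on a particular Panda3D setup.
    """
    loaded = []
    for path in find_egg_files(root):
        load_model(path)  # result is discarded; only the cache side effect matters
        loaded.append(path)
    return loaded
```

Because this would run as a separate process, whatever memory the egg loader holds on to goes away when that process exits, before the game itself starts.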


#31

just for my understanding

Assuming the use of alternative malloc library:
(1) what about the /Zp16 option in the compiler?
(2) can EIGEN_ALIGN16 then still be used as recommended in eigen3?
i.e. for instance typedef Eigen3::Map<Eigen3::Vector4f, Eigen3::Aligned> Vector4fMap;


#32

Yes, it does, mostly. It doesn’t actually get returned to the system–most allocated memory doesn’t–but that memory can still be subsequently reused by Panda operations, to a point limited by fragmentation and related problems. But I think the fundamental problem with loading egg files in the runtime client is that once the graphics context has been created, there is much memory already allocated for that purpose, and then the egg loader adds memory on top of that–and it becomes easy to exceed the paltry 2GB limit that Win32 provides.

This affects only the compile-time packing of structures, which we are already handling correctly with the EIGEN_ALIGN16 and other related definitions. But these kinds of alignment rules all assume that the structures start on a 16-byte aligned block of memory, which might not be true for the memory returned by malloc(). (Actually, I’ve seen differing reports on whether this is true or not for Win32 malloc(), and I just thought it best to assume there exist cases for which it’s not true without actually testing it. For instance, maybe it’s true for Win7, but not on WinXP; and who wants to go around and test all of the existing Windows versions?)

Yes, but again, this refers only to the compiler packing, and it fundamentally assumes that the runtime memory is already aligned (which is what Panda will be responsible for guaranteeing one way or another, especially if you use PANDA_MALLOC() / PANDA_FREE() to manage your memory allocations).
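To illustrate the distinction between compile-time packing and runtime alignment, here is a minimal sketch of the usual way to impose 16-byte alignment on top of an allocator that does not guarantee it. It uses plain integers as stand-in addresses; `align_up` and `padded_request` are hypothetical names for illustration and are not Panda's or Eigen's actual implementation.

```python
ALIGNMENT = 16  # bytes, as required for SSE-friendly data


def align_up(address: int, alignment: int = ALIGNMENT) -> int:
    """Round address up to the next multiple of alignment (a power of two)."""
    return (address + alignment - 1) & ~(alignment - 1)


def padded_request(size: int, alignment: int = ALIGNMENT) -> int:
    """Bytes to request from an unaligned allocator so that an aligned
    block of `size` bytes is guaranteed to fit inside the result."""
    return size + alignment - 1


# Hypothetical address returned by an allocator that is only 8-byte aligned.
raw = 0x1008
aligned = align_up(raw)
print(hex(raw), "->", hex(aligned), "padding:", aligned - raw)
```

A real implementation also has to remember the original pointer somewhere so it can be handed back to free(), which adds a little more per-allocation overhead on top of the padding.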

David


#33

ok, so basically you’re directly handling a Panda aligned_malloc & aligned_free, and heap management (garbage collection).

I’m asking since, for SSEx optimization with the Intel compiler, I’ve redefined new/delete in some cases.
But maybe I should simply rely upon PANDA_MALLOC().

btw.
(1) I refrain from using STL stuff (list, set, map, vector, …) since I’ve seen a waste of memory (I’d say +40% sometimes) and a drop in performance
(2) in some cases using alloca (i.e. allocating on the stack instead of allocating on the heap) has proven quite efficient…


#34

You can also inherit from MemoryBase, which gives you a redefined operator new/delete that calls down to PANDA_MALLOC(). Note that this is guaranteed to be 16-byte aligned only when compiling with Eigen.

There are volumes of opinions across the internet about the pros and cons of STL and its relative inefficiency. It’s true it’s not the most efficient toolkit, memory-wise or CPU-wise, but it’s a decent tradeoff between performance and developer effort. And at this point we’re well-committed to relying on STL heavily throughout Panda. :wink: In the very inner loops, we are more likely to rely on hand-rolled structures for optimum performance.

Agreed! I love alloca; it’s practically free. It’s also unfortunately only occasionally useful due to its nature.

David


#35

Shoot, I was wrong about the source of the 30% memory bloat. It’s not related to the malloc scheme at all. I’ll have to investigate further. It takes a while to iterate because my build times are so slow now.

David


#36

BTW, there is something that has puzzled me for a couple of days: I see some structures like this being generated at compile time in several places:

What kind of matrix is this? A five dimensional matrix?? What for??

example:


#37

Some of those 3’s are actually bitmask options for Eigen’s template class. I’m not sure offhand what they all stand for; but this is part of the nature of template libraries: the compiler symbols are really hard to decipher. :wink:

David


#38

Just wanted to give a little update.
Either by my own accidental doing or by recent changes in Panda, my server app is now using less than 100MB of memory instead of 900MB. That is huge.
Using the 3GB patch on Python lets me keep my 32-bit build for development since the memory eaten by the EGG loader will not push the client over the memory limit. Thanks for that suggestion jean-claude!
With eigen enabled I am getting a frame rate increase somewhere in the range of 5-10%, basically for free. :slight_smile:
Did you ever figure out where the memory bloat was coming from? I seem to get about the same memory usage regardless of enabling eigen or not in the build.


#39

When I looked closer, I didn’t find any memory bloat at all. Maybe there’s something unique to your particular scene that now causes more memory usage than in previous versions?

Or maybe the memory bloat is in the egg loader only, for instance because we changed that default recursion limit? You can try setting “egg-recursion-limit 10000” to see if it makes a difference.

David


#40

I tried setting it to 1000, 10000, and 1000000. It didn’t seem to make any difference in memory usage.
The extra memory from the EGG loader isn’t really an issue for me now with the 3GB patch, and it won’t affect end users who will have BAM files instead.
I’m not sure if my issue with GLSL shaders is related to these changes or not, but otherwise everything seems to be working fine.


#41

There appears to be a problem with OrthographicLens when eigen is used in the build, in that it behaves differently and spits out warnings in some circumstances.
Example code:

from panda3d.core import OrthographicLens
from direct.showbase.ShowBase import ShowBase

class Game(ShowBase):
    def __init__(self):
        """Get the game ready to play."""
        ShowBase.__init__(self)
        self.model = self.loader.loadModel('smiley')
        self.model.reparentTo(self.render)
        # Replace the default perspective lens with an orthographic one.
        self.lens = OrthographicLens()
        self.cam.node().setLens(self.lens)
        # Setting the film size and then the aspect ratio triggers the warning.
        self.lens.setFilmSize(50)
        self.lens.setAspectRatio(1)

game = Game()
game.run()

The warning (spammed repeatedly):

linmath(warning): Tried to invert singular LMatrix4