RFC - how should I store my world data?

I am wanting to make a clone of Minecraft (see: www.minecraft.net). I already have the basic functionality (see: video) but am having trouble getting my mind around how to store the map data.

Here are the main hurdles I see:

  1. Large amount of blocks 256x256x64 = ~4.2million (I want a larger world than this)
  2. A pickle of 3 nested lists to represent 256x256x64 matrix of the letter “a” = 25mb or 175k zipped
  3. Divide the world into chunks so that blocks near the player can be picked and ones far from the player can be flattenStrong().
  4. Ability to check and update the local world file w/o re-downloading the whole thing.

So here is my plan to tackle all of this:

  1. Blocks contain data like their type and position
  2. Divide the world up into 4x4x4 “chunks” of blocks (64 blocks)
  3. Create a 64x64x16 matrix of chunks for a full map of 256x256x64 blocks.

All of this data would be saved into one massive nested list and pickled.

Around the player I would create a 3x3x3 matrix of chunks (loaded from above pickle) that are parented to render (so that the player can manipulate them). The player would be positioned inside the middle chunk and when they moved out of the middle the chunks behind them would be flattenStrong() and the chunks infront would be parented to render. This 3x3x3 matrix of chunks around the player would be at the center of a much larger matrix of chunks that have flattenStrong() so that the player can see a vast world.

This is really the only way I can think of to manage this large amount of blocks but the nested 3D matrix idea hurts my brain. Please let me know if you can think of some improvements to this system, or an easier way to tackle this problem.

Thanks in advance for all of your input.

  • Daniel.

Im not sure if you want to actually store pandanodes if yes then it would be 400mb+ which would take quite a while to load and unpickle. If you dont it will be quite a lot of data, too.

Databases are exactly made for big datasets so I would suggest to use one like the python-integrated shelve or sqlite3 instead of a nested list pickled as a whole. (Of course you could also store each (super-)block as an individual file to disk.)
Anyway you would have to convert your coordinates to strings for accessing the database and flatten your nested list. So yourlist[0][2][1] has to become a (database-)entry with the key “0 2 1” in shelve (x=0,y=2,z=1 would also be possible in sqlite3) for example.

the magic word is called “octree”

I would recommend using sqlite3 instead of shelve, as it’s easy to add a direct pickle object(using a recipe found in The Python Cookbook, also found online at http://code.activestate.com/recipes/252531-storing-binary-data-in-sqlite/).

You also get the advantage of having a network-enabled database, so your game can pull the worlds off of a server.
Then, once you’ve downloaded it, you can tell sqlite3 to read from the file, instead of downloading it again.

P.S: I’m also storing block-based worlds in my game/game-development kit, but it just needs to store about 10,000 bricks max, so I can’t really help you there. Although the bricks can have color, transparency, and they can be modified by Lua scripts, so maybe I can help a little.

And ThomasEgi, octrees are a data type, not a serialization method, but it would be better than a flat list :slight_smile:

I’m just not sure if you(the OP author) want to know where to serialize your data, or what to put your data in. Could you clarify?

Thank you for all of the responses.

My original idea was to have each “block” just be a string containing the attributes of each in-game block so the string would look like, block[0][0][0] = “wood,95hp” since the world could be destroyed and each block could have hp or other states like on fire or rotting.

sqlite3 might be an option but I have not done any tests to see which is faster, accessing my data from nested lists or from one huge 4.3million element sqlite3 table with keys like “x0 y0 z0”. It might be faster to use non unique keys for 3 columns x y and z in the table so it would be like “SELECT block FROM world WHERE x=0 AND y=0 AND z=0”

I could also test a combination of these systems like an SQL table of chunks containing the pickle of the 4x4x4 blocks. This would bring the number of SQL entries for a map down to around 65k and most of the time I will want an entire chunk instead of individual blocks.

My knowledge of octree(s) ends at what you can find on en.wikipedia.org/wiki/Octree . So I have no clue how to even get started testing this method. Do we have any simple examples of using octrees floating around anywhere?


Admittedly, that’s(http://en.wikipedia.org/wiki/Octree) where my knowledge of octrees ends as well, so I can’t help with that :blush:

Now, with sqlite3 vs. a nested list, performance should be about equal, since SQLite performs optimizations that a Python list can’t like caching(sticking stuff in RAM), etc…

But I haven’t run any tests, either, so I’m not sure.
But some databases(e.g, for truly huge web sites) have that many elements(and sometimes even more).

octrees are a way to store and access volumetric data with high efficiency.

imagine you have a cubical world of 512x512x512 blocks.
the first level of octree would split that in 8 blocks of 256x256 each. given your initial world state you could say the upper 4 are all empty, the lower ones all filled with earth unless specified otherwise in a different sub-tree. the same repeats over and over untill you reach the blocksize one, or all homogenous blocks.

octrees are efficient to search through and you can easily build optimized geometry from them.
this will not only make your world files a lot smaller but also help you to determine which cubes may, or may not be visible given your field of view and camera position/orientation.

i also highly recommend to store your block information not in strings but in single-byte values.
this will also help greatly to reduce storage-size while allowing higher performance over pickle and zipping.

your biggest problem will be the number of nodes. since you should keep the number of visible nodes below 300 or your gpu will start to slow down a lot.
so instead of traditional nodes you may want to write your own algorithm which creates the neccessary geometry on the fly based on the octree information.
or implement an intelligent mechanism of grouping blocks together as suited.

Great reply ThomasEgi, as you suggested the octree is probably the perfect solution to my problem.

It appears that by using octrees I could easily flattenStrong() whole branches that are far away from player. I could quickly search and retrieve the individual block data near the player. And even save some space when branches contain all identical blocks.

I will defiantly be doing a bit of reading to learn all of the details of this data structure.

I did find an example of a python octree here which was made for the python-ogre project. I will update you guys with my progress or if i run into any issues.


There is an error in the octree code I posted a link to above so here is the full code with the error corrected:

# python Octree v.1

# Is now more like a true octree (ie: partitions space containing objects)

# Important Points to remember:
# The OctNode positions do not correspond to any object position
# rather they are seperate containers which may contain objects
# or other nodes.

# An OctNode which which holds less objects than MAX_OBJECTS_PER_CUBE
# is a LeafNode; it has no branches, but holds a list of objects contained within
# its boundaries. The list of objects is held in the leafNode's 'data' property

# If more objects are added to an OctNode, taking the object count over MAX_OBJECTS_PER_CUBE
# Then the cube has to subdivide itself, and arrange its objects in the new child nodes.
# The new octNode itself contains no objects, but its children should.

# Psyco may well speed this script up considerably, but results seem to vary.

# TODO: Add support for multi-threading for node insertion and/or searching

#### Global Variables ####

# This defines the maximum objects an LeafNode can hold, before it gets subdivided again.

# This dictionary is used by the findBranch function, to return the correct branch index
DIRLOOKUP = {"3":0, "2":1, "-2":2, "-1":3, "1":4, "0":5, "-4":6, "-3":7}

#### End Globals ####

# Try importing psyco, in case it makes any speed difference
# ( Speed increase seems to vary depending on system ).
    import psyco
    print "Could not import psyco, speed may suffer :)"

class OctNode:
    # New Octnode Class, can be appended to as well i think
    def __init__(self, position, size, data):
        # OctNode Cubes have a position and size
        # position is related to, but not the same as the objects the node contains.
        self.position = position
        self.size = size

        # All OctNodes will be leaf nodes at first
        # Then subdivided later as more objects get added
        self.isLeafNode = True

        # store our object, typically this will be one, but maybe more
        self.data = data
        # might as well give it some emtpy branches while we are here.
        self.branches = [None, None, None, None, None, None, None, None]

        # The cube's bounding coordinates -- Not currently used
        self.ldb = (position[0] - (size / 2), position[1] - (size / 2), position[2] - (size / 2))
        self.ruf = (position[0] + (size / 2), position[1] + (size / 2), position[2] + (size / 2))

class Octree:
    def __init__(self, worldSize):
        # Init the world bounding root cube
        # all world geometry is inside this
        # it will first be created as a leaf node (ie, without branches)
        # this is because it has no objects, which is less than MAX_OBJECTS_PER_CUBE
        # if we insert more objects into it than MAX_OBJECTS_PER_CUBE, then it will subdivide itself.
        self.root = self.addNode((0,0,0), worldSize, [])
        self.worldSize = worldSize

    def addNode(self, position, size, objects):
        # This creates the actual OctNode itself.
        return OctNode(position, size, objects)

    def insertNode(self, root, size, parent, objData):
        if root == None:
            # we're inserting a single object, so if we reach an empty node, insert it here
            # Our new node will be a leaf with one object, our object
            # More may be added later, or the node maybe subdivided if too many are added
            # Find the Real Geometric centre point of our new node:
            # Found from the position of the parent node supplied in the arguments
            pos = parent.position
            # offset is halfway across the size allocated for this node
            offset = size / 2
            # find out which direction we're heading in
            branch = self.findBranch(parent, objData.position)
            # new center = parent position + (branch direction * offset)
            newCenter = (0,0,0)
            if branch == 0:
                # left down back
                newCenter = (pos[0] - offset, pos[1] - offset, pos[2] - offset )
            elif branch == 1:
                # left down forwards
                newCenter = (pos[0] - offset, pos[1] - offset, pos[2] + offset )
            elif branch == 2:
                # right down forwards
                newCenter = (pos[0] + offset, pos[1] - offset, pos[2] + offset )
            elif branch == 3:
                # right down back
                newCenter = (pos[0] + offset, pos[1] - offset, pos[2] - offset )

            elif branch == 4:
                # left up back
                newCenter = (pos[0] - offset, pos[1] + offset, pos[2] - offset )

            elif branch == 5:
                # left up forward
                newCenter = (pos[0] - offset, pos[1] + offset, pos[2] + offset )
            elif branch == 6:
                # right up forward
                newCenter = (pos[0] + offset, pos[1] + offset, pos[2] + offset )

            elif branch == 7:
                # right up back
                newCenter = (pos[0] + offset, pos[1] + offset, pos[2] - offset )
            # Now we know the centre point of the new node
            # we already know the size as supplied by the parent node
            # So create a new node at this position in the tree
            # print "Adding Node of size: " + str(size / 2) + " at " + str(newCenter)
            return self.addNode(newCenter, size, [objData])
        #else: are we not at our position, but not at a leaf node either
        elif root.position != objData.position and root.isLeafNode == False:
            # we're in an octNode still, we need to traverse further
            branch = self.findBranch(root, objData.position)
            # Find the new scale we working with
            newSize = root.size / 2
            # Perform the same operation on the appropriate branch recursively
            root.branches[branch] = self.insertNode(root.branches[branch], newSize, root, objData)
        # else, is this node a leaf node with objects already in it?
        elif root.isLeafNode:
            # We've reached a leaf node. This has no branches yet, but does hold
            # some objects, at the moment, this has to be less objects than MAX_OBJECTS_PER_CUBE
            # otherwise this would not be a leafNode (elementary my dear watson).
            # if we add the node to this branch will we be over the limit?
            if len(root.data) < MAX_OBJECTS_PER_CUBE:
                # No? then Add to the Node's list of objects and we're done
                #return root
            elif len(root.data) == MAX_OBJECTS_PER_CUBE:
                # Adding this object to this leaf takes us over the limit
                # So we have to subdivide the leaf and redistribute the objects
                # on the new children. 
                # Add the new object to pre-existing list
                # copy the list
                objList = root.data
                # Clear this node's data
                root.data = None
                # Its not a leaf node anymore
                root.isLeafNode = False
                # Calculate the size of the new children
                newSize = root.size / 2
                # distribute the objects on the new tree
                # print "Subdividing Node sized at: " + str(root.size) + " at " + str(root.position)
                for ob in objList:
                    branch = self.findBranch(root, ob.position)
                    root.branches[branch] = self.insertNode(root.branches[branch], newSize, root, ob)
        return root

    def findPosition(self, root, position):
        # Basic collision lookup that finds the leaf node containing the specified position
        # Returns the child objects of the leaf, or None if the leaf is empty or none
        if root == None:
            return None
        elif root.isLeafNode:
            return root.data
            branch = self.findBranch(root, position)
            return self.findPosition(root.branches[branch], position)

    def findBranch(self, root, position):
        # helper function
        # returns an index corresponding to a branch
        # pointing in the direction we want to go
        vec1 = root.position
        vec2 = position
        result = 0
        # Equation created by adding nodes with known branch directions
        # into the tree, and comparing results.
        # See DIRLOOKUP above for the corresponding return values and branch indices
        for i in range(3):
            if vec1[i] <= vec2[i]:
                result += (-4 / (i + 1) / 2)
                result += (4 / (i + 1) / 2)
        result = DIRLOOKUP[str(result)]
        return result

## ---------------------------------------------------------------------------------------------------##

if __name__ == "__main__":

    ### Object Insertion Test ###
    # So lets test the adding:
    import random
    import time

    #Dummy object class to test with
    class TestObject:
        def __init__(self, name, position):
            self.name = name
            self.position = position

    # Create a new octree, size of world
    myTree = Octree(15000.0000)

    # Number of objects we intend to add.

    # Number of collisions we're going to test

    # Insert some random objects and time it
    Start = time.time()
    for x in range(NUM_TEST_OBJECTS):
        name = "Node__" + str(x)
        pos = (random.randrange(-4500.000, 4500.000), random.randrange(-4500.00, 4500.00), random.randrange(-4500.00, 4500.00))
        testOb = TestObject(name, pos)
        myTree.insertNode(myTree.root, 15000.000, myTree.root, testOb)
    End = time.time() - Start

    # print some results.
    print str(NUM_TEST_OBJECTS) + "-Node Tree Generated in " + str(End) + " Seconds"
    print "Tree Leaves contain a maximum of " + str(MAX_OBJECTS_PER_CUBE) + " objects each."

    ### Lookup Tests ###

    # Look up some random positions and time it
    Start = time.time()
    for x in range(NUM_COLLISION_LOOKUPS):
        pos = (random.randrange(-4500.000, 4500.000), random.randrange(-4500.00, 4500.00), random.randrange(-4500.00, 4500.00))
        result = myTree.findPosition(myTree.root, pos)
        # This proves that results are being returned - but may result in a large printout
        # I'd just comment it out and trust me :)
        # print "Results for test at: " + str(pos)
        # if result != None:
        #    for i in result:
        #        print i.name, i.position,
        # print
    End = time.time() - Start

    # print some results.
    print str(NUM_COLLISION_LOOKUPS) + " Collision Lookups performed in " + str(End) + " Seconds"
    print "Tree Leaves contain a maximum of " + str(MAX_OBJECTS_PER_CUBE) + " objects each."

    x = raw_input("Press any key (Wheres the any key?):")

The error was calculating a new center for branch 6. the signs should be +++ instead of ±-


When I read the HD Minecraft thread the other day I almost posted suggesting they use panda over unity.

Here is another proposition, why not use minecraft’s data storage format? Granted it does prevent you from deviating from functionality. On the other hand it allows people to use all the tools and map makers already created for mindcraft greatly improving the 3rd party tool ecosystem right away. It also means people can use their existing creations.

Well croxis, I read your reply right as I finish up stripping down the above code to fit my needs.

Unfortunately I have found that creating the octree takes a long time (around 100s for a 128x128x128 cube). Then a pickle of that data structure is 340mb and takes around 10 minutes to pickle.

Loading the pickle also takes a long time but once loaded, searching for specific cubes is VERY fast.

After reading: minecraftwiki.net/wiki/Alpha_Level_Format

I am going to test a similar system. My plan is to have chunks of similar size 16x16x64 and store them in folders bassed on their position in the world so it would be like “world/x/y/chunk.dat” I have a tiled terrain system I set up a while back for allowing me to load and destroy egg files as the player approached them. I will test all this out and see how it goes.

Well it might not be too late, you can still use the file format for storage and the octree for in memory :stuck_out_tongue:

Have you looked at the panda3d and cython blog entry?: panda3d.org/blog/?p=173 Using cython can get to near C speeds which may help your octree creation times.

the whole idea about octrees is to store only the absolutely neccessary data. if you fill each leaf until you reached blocksize 1 regardless of it’s content there wont be any benefit. in this case you could aswell simply store the whole mapdata in a giant array and index it with offset values. you may be able to push the data down to 16byte per block, loading should be fast,searching aswell + you can directly write it down to disk as binary file.

This thread is very interesting as I am also worrying about the exact same problem (although I am not making a Minecraft clone). alogon3, I hope you keep posting your findings on this subject :smiley: It would be greatly appreciated.

BR // Astrogee

I choose to go with just saving the nested lists with pickle and gzip. I can load a 16x16x16 chunk from disk in about 4ms. One chunk takes up a little under 4k on disk.

Here is a very simplistic version of the “tile/chunk terrain” system I use. Just posting it here incase anyone else needs something similar.

NOTE: Hold right click to move the camera around. The tiles do not fit together perfectly because I do not know the exact dimensions of the model used and the model does not have 0,0 set to the center or corner of the model.

from direct.showbase.ShowBase import ShowBase
from direct.task import Task

class MyApp(ShowBase):
    def __init__(self):

        self.size = 100 # this should be set to the size of the map (not accurate for models/environment)
        self.grid = [[0,0], [1,0], [1,1], [0,1], [-1,1], [-1,0], [-1,-1], [0,-1], [1,-1]]
        self.oldPos = None
        #set up initial world.
        self.tiles = [None,None,None,None,None,None,None,None,None,]
        for n in range(len(self.tiles)):
            self.tiles[n] = self.loader.loadModel("models/environment")
        # Add the checkPos procedure to the task manager.
        self.taskMgr.add(self.checkPos, "CheckPlayerPos")
    # if the player is on a new tile, move around the tiles.
    # this could load new map files infront of the player or detach areas behind the player
    def updateWorld(self, pos):
    	for n in range(len(self.grid)):
    		tilePosX = (self.grid[n][0] + pos[0])*self.size
    		tilePosY = (self.grid[n][1] + pos[1])*self.size
    #task to check if the player has moved to a new tile since last frame
    def checkPos(self, task):
        cameraTilePos = [int(self.camera.getX()/self.size),
        if cameraTilePos != self.oldPos:
        self.oldPos = cameraTilePos
        return Task.cont
app = MyApp()

Thank you for a very interesting thread!

Can you post an update on your progress?

At Patrik’s request here is a small update of my progress:

Right now I am using the tile system above to load the block data from files (chunks) into a massive nested list. Then I place a small portion of this block data in the scene using a similar system of smaller subChunks.

My subChunks are much smaller (4x4x4 blocks) and I am still adjusting this size to find the best performance. A 3x3 grid of subChunks with the player in the middle are placed directly into the scene and all of the chunks further from the player are flattenStrong and placed into a rigid body combiner. This allows the player to see a massive world created by hundreds of thousands of blocks with out hurting performance.

As a player moves onto a new subChunks, the subChunks behind them are flattenStrong and placed into the rigid body combiner. The subChunks in front of them are detached and removed from the rigid body combiner then recreated and placed directly into the scene. This is actually a very quick process.

The issue with this system comes when I need to call the rigid body combiner’s collect() function. This call is very expensive (I know the manual warned me of this) and depending on the size and amount of subChunks this call could take anywhere from 2 to 10 seconds.

I might try moving some of this environment management stuff to its own thread or some other trickery but I suspect that a better solution would be to dynamically generate a mesh from the block data and change my system to allow the player to modify this mesh.

As usual I am always open to suggestions, comments and rude remarks.


I just recently found this thread and I was wondering how it was coming along.

Not that it really matters, but for the code above:

self.size = 123

That will make it fit together without the overlapping.

I apologize for the slow response, I have been quite busy lately and have not worked on this too much.

I am however, happy to report I have made some progress! I finally understand octrees. I am getting my head around perlin noise functions and have the following screen shot to show for it:


This screen shot is of a 256x256x256 octree containing 332,390 cubes. The holes in the map are not an error, just where the perlin noise function choose that the height of those areas should be 0,

The FPS in this image is a bit low, I noticed a drop of about 50fps when I used cl.ly to take the screen shot.

Here is a closer image where you can see the combined blocks: cl.ly/1Y0F2o1e0g0o3f1h0e3z

All of the geometry for this image is created on the fly (not using a egg file) this really helps out the performance because there is just one geom node in the scene graph and I only draw the exposed faces.

Side Note: Minecraft uses sections that are sized 16x16x128, my current app is able to consolidate a 16x16x16 octree in 0.6ms and able to recreate all of the geometry in 4.6ms I think this would be quick enough to just re draw these smaller chunks when blocks are added or removed.