Flattening faster (and instanced animated grass)

So, I’ve got this crazy script that puts lots and lots of grass on a terrain, flattens it into fewer nodes and writes it all to a bam file:

from panda3d.core import *
import direct.directbase.DirectStart
import random

geoterrain = GeoMipTerrain("mySimpleTerrain")
geoterrain.setHeightfield("hf.png")
geoterrain.setMinLevel(4)
geoterrain.generate()
geoterrain.getRoot().setSz(100) 
grass1=loader.loadModel('grass_mini1.egg') 
grass2=loader.loadModel('grass_mini2.egg') 
grass3=loader.loadModel('grass_mini3.egg') 
grass4=loader.loadModel('grass_mini4.egg') 
grass_dict={}
root=NodePath("root")           
grass=render.attachNewNode('grass')
grass_map=PNMImage()
grass_map.read("grass_map.png")
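# for every pixel of the 512x512 grass map, pick one of the four grass models (or none) based on brightness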
for x in xrange(512):    
    for y in xrange(512):
        color=grass_map.getBright(x,511-y)       
        z=geoterrain.getElevation(x,y)*100
        model=grass1
        if color <0.9:
            model=grass2
        if color <0.7:
            model=grass3    
        if color <0.5:
            model=grass4
        if color<0.3:
            model=None
        if model!=None:
            grass_dict[(x,y)]=model.copyTo(root)
            grass_dict[(x,y)].setPos(x,y,z)
            grass_dict[(x,y)].setH(random.randrange(360)) #optional random rotation?
            grass_dict[(x,y)].clearModelNodes()
x_offset=0
y_offset=0
nodes=0
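# reparent the copies in 64x64 chunks and flatten each chunk into a single node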
while y_offset<512:            
    group=grass.attachNewNode('group')    
    print y_offset, x_offset, "nodes:", nodes
    nodes=0
    for x in range(x_offset, x_offset+64):
        for y in range(y_offset, y_offset+64):
            if (x,y) in grass_dict:
                grass_dict[(x,y)].reparentTo(group)
                nodes=nodes+1
    group.flattenStrong()
    x_offset=x_offset+64
    if x_offset>=512:
        x_offset=0
        y_offset=y_offset+64
grass.writeBamFile('grass1.bam')       
grass.analyze()        
    
run()  

Models and textures if someone wants to run this:
sendspace.com/file/t573k1

It works, but it takes about an hour or so to finish. Any ideas how I could make this work faster?

This is how it looks:
i.imgur.com/3m2L1Gs.png
i.imgur.com/wBvNwTM.jpg

It’s sad that this either takes 60+ minutes to generate or ~350 MB on the hard drive once generated. Looks like I’m cooked without hardware instancing (can’t find/make code that runs on ATI) :cry:

I use a GeomVertexWriter and synthesise the grass blades on the fly.

That’s an idea… too bad I don’t know where to start :mrgreen:

Is there any way to copy what I’ve already got (groups of blades in egg files) or do I have to generate them one vertex at a time?

What kind of improvement in speed can I hope for? I’ve got a 512x512 map, about 10-30 blades per pixel, and each blade is 4 triangles, so in the worst case I need to generate 31,457,280 triangles :open_mouth:

I can’t ship a game with 300 MB of grass per level, and I don’t think writing that much data to disk is something players will be happy about. It would have to generate all that grass in under a minute - is that even plausible? I hate to waste time on something that won’t work in a real-life scenario :unamused:

Hmm. Did you plan to show all 31 million triangles at the same time? I think not - and some of that grass the player will never see. Try to generate only a small visible part; maybe it’s quick enough to do on the fly in the game (possibly in a thread)?
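Something like this, maybe - a rough, untested sketch; makeGrassChunk here is a made-up helper that builds one 64x64 patch the same way the loop in the first post does:

CHUNK = 64
active_chunks = {}

def updateGrass(task):
    # figure out which chunk the camera is over
    cam = base.camera.getPos(render)
    cx = int(cam.getX()) // CHUNK
    cy = int(cam.getY()) // CHUNK
    # make sure the 3x3 block of chunks around the camera exists
    for i in xrange(cx - 1, cx + 2):
        for j in xrange(cy - 1, cy + 2):
            if (i, j) not in active_chunks:
                active_chunks[(i, j)] = makeGrassChunk(i, j)  # made-up helper
    # drop chunks that are now far away (with a little margin so they
    # don't get rebuilt the moment the camera turns around)
    for key in list(active_chunks.keys()):
        if abs(key[0] - cx) > 2 or abs(key[1] - cy) > 2:
            active_chunks.pop(key).removeNode()
    return task.cont

taskMgr.add(updateGrass, 'updateGrass')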

Ninth is right. You’ll never get that kind of performance on contemporary computers.

I knocked up a simple GeomVertexWriter class to generate 512x512x20 blades. On my machine it takes 7 minutes to finish. If I reduce it to 2 geometries per unit instead of 20, it only takes 22 seconds.

Even when it did eventually generate all of the data, in the format that you used (flattened), all of the blades were being rendered all of the time, tanking my fps.

There are tricks you can use to fake grass rendering, which you can google. This one looks promising. http.developer.nvidia.com/GPUGem … _ch07.html

I’ve read that part of GPU Gems a few times, but it might as well be written in Elvish runes for me… I know how it could be done, but I still don’t know how to do it :mrgreen:

With the first approach, after a very, very long wait, I got a 344 MB bam file that loaded quite fast (under a minute… ok, let’s call it slow), looked good (see the screenshots in post #1) and took my framerate from ~45-60 FPS to ~25-30 FPS. Just loading the bam into pview gives me anything from 40 to 250 FPS. Acceptable.

Using a GeomVertexWriter as rdb suggested got something generated fast; this is my code (so far):

from panda3d.core import *
from direct.directbase import DirectStart
import random

def makeGrassBlades():
    format=GeomVertexFormat.getV3n3cpt2()
    vdata=GeomVertexData('blade', format, Geom.UHStatic)
    vertex=GeomVertexWriter(vdata, 'vertex')
    normal=GeomVertexWriter(vdata, 'normal')
    texcoord=GeomVertexWriter(vdata, 'texcoord')
    blade=Geom(vdata)
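    # 20x20 blades, 5 vertices each; positions, normals and UVs copied from a single blade exported to egg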
       
    for x in xrange(20):
        for y in xrange(20):
            r=random.uniform(0, 0.4)
            vertex.addData3f((x*0.4)-0.0291534+r, (y*0.4)+0.0101984+r, 0.0445018)
            vertex.addData3f((x*0.4)+0.0338934+r, (y*0.4)+0.041644+r, 0.83197)
            vertex.addData3f((x*0.4)+0.0304494+r, (y*0.4)-0.00795362+r, 0.360315)
            vertex.addData3f((x*0.4)-0.0432457+r, (y*0.4)-0.0362444+r, 0.0416673)
            vertex.addData3f((x*0.4)-0.0291534+r, (y*0.4)+0.0101984+r, 0.0445018)
            normal.addData3f(0.493197, 0.854242, -0.164399)
            normal.addData3f( -0.859338, 0.496139, -0.124035 )
            normal.addData3f(-0.759642, -0.637797, -0.127114)
            normal.addData3f(0.974147, -0.0584713, -0.218218)
            normal.addData3f(0.493197, 0.854242, -0.164399)
            texcoord.addData2f(0.0478854, 0.000499576)
            texcoord.addData2f(0.353887, 0.9995)
            texcoord.addData2f(0.999501, 0.363477)
            texcoord.addData2f(0.729119, 0.000499576)
            texcoord.addData2f(0.000499547, 0.000499576)    
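    # one GeomTriangles primitive per blade (4 triangles each), 400 primitives in this Geom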
    
    for z in xrange(0, 2000, 5): 
        triangles=GeomTriangles(Geom.UHStatic ) 
        triangles.addVertices(0+z,1+z,2+z)
        triangles.addVertices(2+z,3+z,0+z)
        triangles.addVertices(1+z,4+z,2+z)
        triangles.addVertices(3+z,2+z,4+z)            
        blade.addPrimitive(triangles)
    
    snode=GeomNode('node')    
    snode.addGeom(blade)    
    return snode
    
 
for x in range(4):
    for y in range(4):
        grass_group=render.attachNewNode(makeGrassBlades())
        grass_group.setTexture(loader.loadTexture("grass_mini.png"))        
        grass_group.setColor(1,1,1,1)
        grass_group.setPos(x*8,y*8,0)

#grass_group.analyze()

# Create some lighting
ambientLight = AmbientLight("ambientLight")
ambientLight.setColor(Vec4(.3, .3, .3, 1))
directionalLight = DirectionalLight("directionalLight")
directionalLight.setDirection(Vec3(-5, -5, -5))
directionalLight.setColor(Vec4(1, 1, 1, 1))
directionalLight.setSpecularColor(Vec4(1, 1, 1, 1))
render.setLight(render.attachNewNode(directionalLight))
render.setLight(render.attachNewNode(ambientLight))
render.setShaderAuto()

run()

So each Geom has 400 blades, 2000 verts and 1600 triangles; I’ve got 16 of these, but I also tested it with 64. It struggles to render at ~20-30 FPS :cry:

I don’t know why it’s so slow compared to the flattened version (which has way more vertices). I used 4 triangles, and the original grass blade was made from 2 triangle strips. Could that be it? How do you use triangle strips anyway? I got most of the numbers here by looking at a single blade exported to an egg file… I don’t really know what I’m doing :neutral_face:

Are you generating each blade of grass as one or more triangles? If so, I strongly recommend that you switch to textured quads/triangle pairs: you should end up with far less geometry to render, and you may even end up with a more convincing result, thanks to the detail that can be included in textures at minimal additional cost.

For an example, look at section 7.3.2 of the GPU Gems page linked-to above.

That said, as you indicated, you don’t seem to have very many Geoms or vertices, so I’m not sure why your current arrangement is so slow… :confused:
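As for the triangle-strip question: a single textured quad built as one strip might look roughly like this (an untested sketch; for grass cards you’d probably also want setTwoSided(True) on the resulting NodePath):

from panda3d.core import *

def makeGrassQuad(width=0.4, height=0.8):
    vformat = GeomVertexFormat.getV3n3t2()
    vdata = GeomVertexData('grass_quad', vformat, Geom.UHStatic)
    vertex = GeomVertexWriter(vdata, 'vertex')
    normal = GeomVertexWriter(vdata, 'normal')
    texcoord = GeomVertexWriter(vdata, 'texcoord')
    # strip order: bottom-left, top-left, bottom-right, top-right
    for x, z, u, v in ((0.0, 0.0, 0.0, 0.0),
                       (0.0, height, 0.0, 1.0),
                       (width, 0.0, 1.0, 0.0),
                       (width, height, 1.0, 1.0)):
        vertex.addData3f(x, 0.0, z)
        normal.addData3f(0.0, -1.0, 0.0)
        texcoord.addData2f(u, v)
    strip = GeomTristrips(Geom.UHStatic)
    strip.addConsecutiveVertices(0, 4)  # 4 vertices -> 2 triangles
    strip.closePrimitive()
    quad = Geom(vdata)
    quad.addPrimitive(strip)
    node = GeomNode('grass_quad')
    node.addGeom(quad)
    return node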

Test 1: “as is” ~30 FPS

Test 2: flattenStrong on each grass_group ~400-450 FPS

So, the differences are in the colors and in the GeomPrimitive count.

Ah! You nailed it!
Calling flattenStrong on the groups also gave me 10x more speed. And it still loads as fast as, or faster than, loading the jumbo bam from disk.
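The change is just one extra call in the loop from my earlier post, in case anyone wants it:

for x in range(4):
    for y in range(4):
        grass_group = render.attachNewNode(makeGrassBlades())
        grass_group.setTexture(loader.loadTexture("grass_mini.png"))
        grass_group.setColor(1, 1, 1, 1)
        grass_group.setPos(x * 8, y * 8, 0)
        # collapse the 400 separate GeomTriangles in each patch into as few primitives as possible
        grass_group.flattenStrong()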

You could also consider using a geometry shader to generate the grass blades on the GPU.

In theory I could, but I’m still not very good at the shader business, so unless some shader genius (looks at ninth) shares something like that, I’ll just stick to what I’ve got. Maybe I’ll come back to this “grass knoll” in about 2 weeks; for now I’ve got different stuff to code.

Heh. I haven’t worked with geometry shaders before, but it’s a chance to do it )

I didn’t abandon this idea, and now, with some GLSL magic (I’m starting to like this shader business), I present… a lot-o-grass:

Download link:
sendspace.com/file/ouswfc

Run the file ‘gl_grass.py’ to view the demo, or ‘grass_gen4.py’ to generate some new grass blades for later use.
The generator will want a height map and a ‘grass map’; a grass map is nothing more than a black-and-white mask - black areas will get no grass, white areas will be seeded with grass blades.

The animation could be better, but I find it more realist-ish (I wouldn’t say realistic) than the typical sway-back-and-forth flat-quad animation.

To run it you will need a GPU that supports ARB_draw_instanced.

Looks pretty good, but my test produced the following:

I get the following error at start-up, which I’m assuming might have something to do with all the missing grass in my shot:

:grutil(warning): Rescaling heightfield image hf.png from 512x512 to 513x513 pixels.
:display:gsg:glgsg(error): An error occurred while compiling shader!
0(4) : warning C7572: OpenGL requires extension names to begin with 'GL_'
0(19) : error C7532: global variable gl_InstanceID requires "#version 140" or later
0(19) : error C0000: ... or #extension GL_EXT_gpu_shader4 : enable
0(19) : error C0000: ... or #extension GL_EXT_draw_instanced : enable

I’m running on an nvidia card so maybe that’s the problem.

Judging by the error, the shader file contains something like:

#extension EXT_draw_instanced : enable

Whereas it should contain:

#extension GL_EXT_draw_instanced : enable

It has:

#extension ARB_draw_instanced : enable

GL_EXT_draw_instanced was not supported by my ATI card.

csloss77-> can you try replacing that line in the vert shader to see if it works?

@csloss77 Just try replacing the GLSL version in the shader headers. It works for me with #version 130 or #version 140.

@wezu thanks for sharing )

Ok, I’ve tried all of the above suggestions on two systems without a change in result; the shader still fails to compile. The cards in the two systems are: (1) GeForce GTX 770, (2) ATI Sapphire X1150. Changing the headers to 130 or 140 just produces a list of further errors about variables being deprecated. Here’s the error from the second system (it happens with both the GL_EXT and ARB headers):

WARNING: 0:3: extension 'GL_EXT_draw_instanced' is not supported
ERROR: 0:19: 'gl_InstanceID' : requires extension support: GL_EXT_gpu_shader4
ERROR: 0:19: 'gl_InstanceID' : requires extension support: GL_EXT_gpu_shader4
WARNING: 0:19: implicit cast from int to float
WARNING: 0:19: implicit cast from int to float
ERROR: 0:19: 'gl_InstanceID' : requires extension support: GL_EXT_gpu_shader4
WARNING: 0:19: implicit cast from int to float
ERROR: 0:19: 'gl_InstanceID' : requires extension support: GL_EXT_gpu_shader4
ERROR: 0:20: 'gl_InstanceID' : requires extension support: GL_EXT_gpu_shader4
ERROR: 0:20: 'gl_InstanceID' : requires extension support: GL_EXT_gpu_shader4
WARNING: 0:20: implicit cast from int to float
WARNING: 0:20: implicit cast from int to float
ERROR: 0:20: 'gl_InstanceID' : requires extension support: GL_EXT_gpu_shader4
WARNING: 0:20: implicit cast from int to float
ERROR: 0:20: 'gl_InstanceID' : requires extension support: GL_EXT_gpu_shader4
ERROR:  compilation errors.  No code generated.

The second card is 6 years old, so that might be the problem in this case.

The GTX 770 is a monster; it can’t be that it has no support for instancing - unless my memory is wrong, that card is just a few months old.
Maybe if the driver asks for GL_EXT_gpu_shader4, then that’s the extension that should be enabled? Would you be willing to give it yet another go and add #extension GL_EXT_gpu_shader4 : enable ?

I found the docs here:
www.opengl.org/registry/specs/ARB/draw_instanced.txt
It says ‘EXT_gpu_shader4 or NV_vertex_program4 or OpenGL 3.0 is required’, so maybe one needs to enable NV_vertex_program4 (if it’s supported)?

My ATI card (3000 series) seems happy with ARB_draw_instanced… but maybe the correct way is to use GL_ARB_draw_instanced?

I think I could write some ‘ifdefs’ if I knew what works on NVIDIA cards… BTW, is there a way to ask the driver from Python for supported extensions? If a card has no support for instancing, then drawing a few sad blades of grass is pointless - it’d be best not to draw any at all.
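I did find that the GraphicsStateGuardian exposes some driver info and capability flags to Python - I’m not sure which of these are available in every Panda version, or that the flag below maps exactly to ARB_draw_instanced, but something like this might do as a rough check:

# inside the usual DirectStart / ShowBase app, so 'base' exists
gsg = base.win.getGsg()
print gsg.getDriverVendor(), gsg.getDriverRenderer(), gsg.getDriverVersion()
# should roughly correspond to hardware instancing support
if not gsg.getSupportsGeometryInstancing():
    print "no hardware instancing - better to skip the fancy grass entirely"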

Just to be clear, that second error I posted is from the Sapphire X1550; the GTX 770 is the first one. Going back to the GTX, I did try “#extension GL_EXT_gpu_shader4 : enable”, and it gets rid of the original error, but then there seems to be a version mismatch. If I use 120 or 130 I get:

:display:gsg:glgsg(error): An error occurred while compiling shader!
0(32) : error C7532: global function texture requires "#version 130" or later

but switching to 140 produces:

:display:gsg:glgsg(error): An error occurred while compiling shader!
0(13) : warning C7555: 'varying' is deprecated, use 'in/out' instead
0(14) : warning C7555: 'varying' is deprecated, use 'in/out' instead
0(18) : error C7533: global variable gl_Vertex is deprecated after version 120
0(19) : error C7533: global variable gl_Color is deprecated after version 120
0(22) : error C7533: global variable gl_ModelViewProjectionMatrix is deprecated
after version 120
0(23) : error C7533: global variable gl_TexCoord is deprecated after version 120
0(23) : error C7533: global variable gl_MultiTexCoord0 is deprecated after version 120
0(25) : error C7533: global variable gl_NormalMatrix is deprecated after version 120
0(25) : error C7533: global variable gl_Normal is deprecated after version 120
0(26) : error C7533: global variable gl_LightSource is deprecated after version
120
0(27) : error C7533: global variable gl_FrontMaterial is deprecated after version 120
0(29) : error C7533: global variable gl_LightModel is deprecated after version 120

Maybe the shader is incompatible with version 140, but the GTX won’t accept anything less? I’m fairly sketchy with shaders, and what little experience I have is only with Cg. I’d like to try to get it working though; I could definitely see myself trying something like this for my own project.