Simple Instancing

Following on from the topic here, I’ve put together a cut-down version of the instancing sample that JUST does instancing, and should hopefully work out of the box.

If you need more advanced usage, including different textures for each instance, check out etc.cmu.edu/projects/pandase/downloads.html, which is where the example below is adapted from.

Anyway, this will create 200 smileys with a random position, rotation and scale. I’ve added some extra comments, cleaned things up a bit, and removed culling.

If you’re looking to render things like an asteroid field, trees, or mobs, it might help.

The Python file:

#
# Authors: Federico Perazzi, Deepak Murali
# Last Updated: 28/08/2013
#
# This tutorial will demonstrate how to use Hardware Based Instancing
#
# Requires advancedInputs.cg shader

from pandac.PandaModules import loadPrcFileData

loadPrcFileData('', 'sync-video 1')
loadPrcFileData('', 'show-frame-rate-meter 1')

import random, sys, os, math
import direct.directbase.DirectStart
from   direct.showbase.DirectObject import DirectObject
from pandac.PandaModules  import *
from direct.gui.DirectGui import OnscreenText
from direct.task.Task     import Task
from pandac.PandaModules import Vec3,Vec4,Point3


class World(DirectObject):

  def __init__(self):

    # Check if the graphics hardware supports this program
    assert base.win.getGsg().getSupportsCgProfile("gp4vp"),\
        "Profile gp4vp is not supported by the hardware."
    assert base.win.getGsg().getSupportsCgProfile("gp4fp"),\
        "Profile gp4fp is not supported by the hardware."
    assert base.win.getGsg().getSupports2dTextureArray(),\
        "2D-Texture Array is not supported by the hardware."
    assert base.win.getGsg().getSupportsGeometryInstancing(),\
        "Geometry Instancing is not supported by the hardware."

    # This creates the on screen title
    self.title = OnscreenText(text="Panda3D: Simple Instancing",
                              style=1, fg=(1,1,1,1),
                              pos=(0.57,0.90), scale = .07) 

    # Set the background color
    base.setBackgroundColor(.66, .714, 1)

    # Hit ESC to exit
    base.accept("escape",sys.exit)

    # How many instances should be shown?
    # Limited to 255
    # If you want to change this, remember to also change the number in the shader!
    # TODO: See if we can add / remove instances without changing the size of the arrays
    self.instanceNum = 200

    # Create the dummy NodePath root. In the full PandaSE sample this NodePath
    # is handed to the PandAI library, which updates the positions of the nodes;
    # here we just set random transforms once. The model matrix of each node in
    # this graph is used to feed the shader, which, through hardware instancing,
    # renders one instance per node.
    self.dummyNodePathRoot = NodePath('dummy-path')
    self.dummyNodePath = []

    # Attach nodes to the dummy path. No models are loaded.
    for i in range(self.instanceNum):
      self.dummyNodePath.append(NodePath('dummy%.3d'%i))
      self.dummyNodePath[i].setPos(random.randrange(0,50),random.randrange(0,50),0)
      self.dummyNodePath[i].reparentTo(self.dummyNodePathRoot)
      self.dummyNodePath[i].setScale(random.uniform(0.2,0.5))
      self.dummyNodePath[i].setH(random.randrange(0,360))
    
    # Create another NodePath. This NodePath contains the model that will be
    # instantiated multiple times directly inside the shader, at the positions
    # defined by the dummy NodePath's nodes.
    self.orginalNode = loader.loadModel('smiley.egg.pz')
    self.orginalNode.reparentTo(render)    
    self.orginalNode.setShader(loader.loadShader("advancedInputs.cg"))
    self.orginalNode.setInstanceCount(self.instanceNum)

    # Send an array of textures to the shader.
    # Use this to set a different texture for each instance
    # (see the texture-array sketch after the shader listing below).
    # self.orginalNode.setShaderInput('mtex_0',self.ralphTextures)

    # Custom bounds, as we don't want to cull all the instances if the original
    # scrolls out of view.
    # We should cull the instances properly though based on if the instance
    # is visible or not.
    self.orginalNode.node().setBounds(OmniBoundingVolume())
    self.orginalNode.node().setFinal(True)   

    # Add a task to update the shader inputs every frame.
    # This means you can update the dummyNodePath positions and the
    # instances will move as well.
    taskMgr.add(self.IssueShaderParameters, "ShaderParameters")

  def IssueShaderParameters(self, task):

    # Set the instance count based on the current number of instances
    self.orginalNode.setInstanceCount(self.instanceNum)

    # Retrieve the view matrix needed in the shader.
    # Need to update this every frame to allow for camera movement
    self.viewMatrix = self.orginalNode.getMat(base.camera)

    # Retrieve model matrices from the dummy NodePath
    self.modelMatrices = [nodePath.getMat(self.dummyNodePathRoot) for nodePath in self.dummyNodePath]
    
    # Compute the modelview matrix for each node 
    self.modelViewMatrices = [UnalignedLMatrix4f(modelMatrix * self.viewMatrix) for modelMatrix in self.modelMatrices]

    # Send array of 4x4 matrices to the shader.
    self.orginalNode.setShaderInput('offset', self.modelViewMatrices)

    return Task.cont
	
w = World()
run()

advancedInputs.cg shader:

//Cg
//Cg profile gp4vp gp4fp

// Total number of instances
#define NUMBER_OF_INSTANCES 200

// Number of textures we pass in the texture array
#define NUMBER_OF_TEXTURES 15

// This matrix represents a ninety-degree rotation around the X axis.
// It is used to transform a vertex from the Panda3D coordinate system
// (Z up) to the OpenGL coordinate system (right handed - Y up).
const float4x4 to_apiview = {{1.0, 0.0, 0.0, 0.0},
                             {0.0, 0.0, 1.0, 0.0},
                             {0.0,-1.0, 0.0, 0.0},
                             {0.0, 0.0, 0.0, 1.0}};

// Compute the inverse of the transpose of an affine
// matrix composed of a rotation and a translation.
float4x4 inverse_transposed(float4x4 matrix) {
  float4x4 R = matrix;
  float4x4 T = {{1.0, 0.0, 0.0, 0.0},
                {0.0, 1.0, 0.0, 0.0},
                {0.0, 0.0, 1.0, 0.0},
                {-R[0].w, -R[1].w, -R[2].w, 1.0}};
  R[0].w = 0;
  R[1].w = 0;
  R[2].w = 0;

  return mul(T, R);
}

// Vertex data entering the Vertex Shader
struct VertexDataIN {
  float4 vtx_position  :POSITION;  // object-space
  float3 vtx_normal    :NORMAL;    // object-space
  float4 vtx_color     :COLOR0;
  float2 vtx_texcoord0 :TEXCOORD0;
  int l_id             :INSTANCEID;
};

// Vertex data coming out of the Vertex Shader
struct VertexDataOUT{
  float4 o_position :POSITION;  // clip-space
  float3 o_normal   :TEXCOORD1; // eye-space
  float3 o_texcoord :TEXCOORD3;
};

// Vertex Shader
void vshader(in  VertexDataIN IN,
             out VertexDataOUT OUT,
             uniform float4x4 mat_projection,
             uniform float4x4 offset[NUMBER_OF_INSTANCES])
{
  // Transform IN.vtx_position from object (model) space to clip space.
  OUT.o_position  = mul(mul(mul(mat_projection, to_apiview),offset[IN.l_id]),IN.vtx_position);
  // Transform IN.vtx_normal from object (model) space to eye space.
  OUT.o_normal    = mul(mul(to_apiview,inverse_transposed(offset[IN.l_id])),float4(IN.vtx_normal,0)).xyz;

  // Use the third texture coordinate to select which layer of the
  // 2D texture array should be sampled for this instance.
  OUT.o_texcoord = float3(IN.vtx_texcoord0,IN.l_id%NUMBER_OF_TEXTURES);
}

// Fragment Shader
void fshader(in VertexDataOUT vIN,
             sampler2D tex_0,
             // Uncomment this (and the mtex_0 shader input in the Python file)
             // to sample a different texture per instance:
             // sampler2DArray mtex_0,
             out float4 o_color :COLOR)
{
   // Fetch the texture color from the texture array
   // o_color = tex2DARRAY(mtex_0, vIN.o_texcoord);

   // Or just use the model's existing texture
   o_color = tex2D(tex_0, vIN.o_texcoord.xy);

   // Apply simple diffuse lighting to the final color.
   o_color *= (max(dot(normalize(vIN.o_normal),normalize((3.0).xxx)),0)+(0.2).xxxx)*1.3;
}
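
Side note on the commented-out mtex_0 input: if you do want a different texture per instance, you would build a 2D texture array on the Python side, pass it in as mtex_0, and re-enable the sampler2DArray / tex2DARRAY lines in the fragment shader. Something roughly like this inside World.__init__ (a sketch only; the texture paths are placeholders and I haven't tested it against this cut-down version):

    # Build a 15-page 2D texture array and bind it to the shader's
    # 'mtex_0' sampler2DArray input. The file names are placeholders.
    self.ralphTextures = Texture()
    self.ralphTextures.setup2dTextureArray(15)
    for i in range(15):
      # read(fullpath, z, n, read_pages, read_mipmaps) loads one image
      # into page (layer) i of the array
      self.ralphTextures.read('textures/instance_%02d.png' % i, i, 0, False, False)

    self.orginalNode.setShaderInput('mtex_0', self.ralphTextures)

Keep NUMBER_OF_TEXTURES in the shader in sync with the number of pages you load, otherwise the l_id % NUMBER_OF_TEXTURES lookup will select layers that don't exist.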

That’s cool, but there is official engine-supported instancing too: panda3d.org/manual/index.php/Instancing

This is not the same. instanceTo will make a copy of the mesh on the CPU and it will be sent to the GPU in a new batch. With hardware instancing you send the mesh to the GPU once and tell the GPU to render that mesh, say, 200 times.
A big number of independent meshes can slow the rendering down; the bus just can’t send more data no matter how fast the GPU and/or CPU is. With hardware instancing you cut down the amount of data sent between the CPU and GPU.
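
For reference, the instanceTo style from that manual page looks roughly like this (a minimal sketch along the lines of the manual's example, not tested here):

import random
import direct.directbase.DirectStart

# One model, loaded once...
smiley = loader.loadModel('smiley.egg.pz')

# ...but every placeholder created with instanceTo() still ends up as
# its own node to cull and its own batch to issue.
for i in range(200):
  placeholder = render.attachNewNode('instance-%d' % i)
  placeholder.setPos(random.randrange(0, 50), random.randrange(0, 50), 0)
  placeholder.setH(random.randrange(0, 360))
  smiley.instanceTo(placeholder)

run()

So with a few thousand instances you quickly become batch/bus limited, which is exactly what the hardware version above avoids by issuing a single instanced draw call for all of them.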

Ah, got it. You are a bit wrong though:

It has only one mesh on the CPU but renders it with one batch per instance.

Great contribution :slight_smile:

The animations are shared, but as far as I know the models are still sent on a per-object basis to the GPU. See the blog: panda3d.org/blog/?p=44