Panda3D benchmark for 2d graphics

maxxim · September 28, 2019, 8:21pm

Hello. I want to use Panda3D to display 2D sprites. I decided to test how many sprites on the screen it can handle.

Using CardMaker I could get only 1500 static sprites on screen before FPS dropped below 60.
2000 sprites give only 40 FPS, and 8000 sprites are already down to 10 FPS.
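To put these figures in perspective, it can help to convert FPS into frame time. The sketch below uses only the numbers quoted above; the linear cost model (all frame time spent on sprites, cost proportional to their count) is an assumption made for illustration, not a measurement.

```python
# Frame rates reported above, keyed by sprite count.
reported_fps = {1500: 60, 2000: 40, 8000: 10}


def frame_time_ms(fps):
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


for count, fps in sorted(reported_fps.items()):
    ms = frame_time_ms(fps)
    # Rough per-sprite cost, ASSUMING the whole frame is spent on sprites
    # and that the cost grows linearly with the number of sprites.
    us_per_sprite = ms * 1000.0 / count
    print("{:>5} sprites: {:6.2f} ms/frame, ~{:.1f} us per sprite".format(
        count, ms, us_per_sprite))
```

Under this rough model the per-sprite cost at 2000 and 8000 sprites comes out the same, which would be consistent with a fixed overhead per node. The 1500-sprite figure may be capped by vsync, so it is best read as an upper bound.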

I am running Windows 10 on Xeon E3 and GeForce 1050.

What am I doing wrong? Is there a better way to draw sprites?
Parts of the code below are borrowed from the forum, particularly the create_sprites_node_setup function.

import random

from panda3d.core import TextureStage, CardMaker, TextNode

from direct.showbase.ShowBase import ShowBase


def create_sprites_node_setup(screenWidth, screenHeight, parent):
    aspect_ratio = parent.getScale()[0]

    screenOrigin = parent.attachNewNode('screen_origin')
    screenNode = screenOrigin.attachNewNode('screen_node')

    screenOrigin.setPos(-1.0 / aspect_ratio, 0.0, 1.0)
    screenOrigin.setScale(2.0, 1.0, -2.0)

    screenNode.setPos(0, 0, 0)

    screenNode.setScale(1.0 / (aspect_ratio * screenWidth), 1.0, 1.0 / screenHeight)
    screenNode.setTexScale(TextureStage.getDefault(), 1.0, -1.0)

    return screenNode


def add_sprites():
    """Add another batch of randomly placed sprites, one node per sprite."""
    global counter
    n = 500
    for _ in range(n):
        rx = random.randint(0, SCREEN_WIDTH)
        rz = random.randint(0, SCREEN_HEIGHT)

        # Attach directly to the pixel-space root instead of reparenting later.
        spriteNP = sprites_root.attachNewNode(cm.generate())
        spriteNP.setTexture(texture)
        spriteNP.setPos(rx, 0, rz)
        spriteNP.setScale(texture.getOrigFileXSize(), 1.0, texture.getOrigFileYSize())
        spriteNP.setTransparency(True)

    counter += n

    change_counter()


def change_counter():
    global counter
    text.setText("Sprite count: {} \nPress Space to add more".format(counter))


SCREEN_WIDTH = 800
SCREEN_HEIGHT = 600

base = ShowBase()
base.setFrameRateMeter(True)
base.accept('space', add_sprites)

sprites_root = create_sprites_node_setup(SCREEN_WIDTH, SCREEN_HEIGHT, base.render2d)
texture = base.loader.loadTexture("panda.png")
cm = CardMaker('spritesMaker')

counter = 0
text = TextNode('counter')
textNodePath = base.render2dp.attachNewNode(text)
textNodePath.setScale(0.07)
textNodePath.setPos(-1, 0, .9)
textNodePath.setColor(1.0, 0.0, 0.0, 1)

change_counter()

base.run()

maxxim · September 28, 2019, 8:23pm

panda.png is this file from the official repo, but you can use any file

Thaumaturge · September 28, 2019, 11:17pm

If I’m not much mistaken, one big problem with this approach is that you’re generating as many nodes as you have sprites–and 1500 nodes is quite a lot, I believe.

If you want to draw that many sprites, you could perhaps look at a shader-based approach–I think that that’s how some games handle large numbers of particles.

wezu · September 29, 2019, 8:13am

The problem is the number of nodes. You should always aim to have no more than 100 or so nodes on screen, otherwise performance will suffer.

I can think of 3 ways to do it:

1. Flatten multiple sprites into one node. You will no longer be able to change the position/scale/rotation of individual sprites, and they all need to use the same texture.

2. Use point render mode. This way each vertex is rendered as a point and you can control the size of the points with a shader; you will also need the shader to set the position and texture of the sprite. Depending on hardware, there may be an upper limit to the size of the sprite.

3. Hardware instancing. Tell the GPU to render the same sprite as many times as you need, and use a shader just like above to control position, shape and color.

maxxim · September 29, 2019, 11:16am

Thanks, wezu and Thaumaturge. Does it mean the scene graph itself is the bottleneck, regardless of node type? Is it because the renderer pushes vertex data to the GPU for each node individually in a Python loop?

That is sad, then. I wonder how it would perform on a 10-year-old machine or some Atom netbook.

eldee · September 29, 2019, 11:57am

You don’t need shaders or manually created geometry to render sprites with Panda3D, and it will work with decent performance even on an old machine. You simply need to use GeomPoints. If your hardware does not support hardware points, Panda will create a quad geometry for you and instantiate it for all the points in one draw call.

See https://www.panda3d.org/manual/?title=GeomPrimitive

For activating hardware sprites, use

load_prc_file_data("", "hardware-point-sprites #t")

maxxim · September 29, 2019, 12:07pm

Thanks for the advice, I will try it out.

Epihaius · September 29, 2019, 12:34pm

And if you want to control the size of individual point sprites, you can use a vertex format that contains a size column and change that size attribute at runtime using a GeomVertexWriter.

maxxim · September 29, 2019, 1:18pm

Okay, using one GeomNode and MPointSprite textures, I was able to get up to 30 000 sprites and 60 FPS on my machine. Good! Again, I borrowed some of the code from this forum:

from direct.showbase.ShowBase import ShowBase
from panda3d.core import GeomVertexFormat, GeomVertexData, GeomVertexWriter
from panda3d.core import Geom, GeomPoints, GeomNode, NodePath
from panda3d.core import TextureStage, TexGenAttrib, TransparencyAttrib
from random import uniform


def create_sprites_node_setup(screenWidth, screenHeight, parent):
    aspect_ratio = parent.getScale()[0]

    screenOrigin = parent.attachNewNode('screen_origin')
    screenNode = screenOrigin.attachNewNode('screen_node')

    screenOrigin.setPos(-1.0 / aspect_ratio, 0.0, 1.0)
    screenOrigin.setScale(2.0, 1.0, -2.0)

    screenNode.setPos(0, 0, 0)

    screenNode.setScale(1.0 / (aspect_ratio * screenWidth), 1.0, 1.0 / screenHeight)
    screenNode.setTexScale(TextureStage.getDefault(), 1.0, -1.0)

    return screenNode


SCREEN_WIDTH = 800
SCREEN_HEIGHT = 600
NUM_SPRITES = 30000

base = ShowBase()
base.setFrameRateMeter(True)
sprites_root = create_sprites_node_setup(SCREEN_WIDTH, SCREEN_HEIGHT, base.render2d)

# vertex writer
vdata = GeomVertexData('points', GeomVertexFormat.getV3(), Geom.UHDynamic)
vwriter = GeomVertexWriter(vdata, 'vertex')

# random sprites
for i in range(NUM_SPRITES):
    vwriter.addData3f(uniform(0, SCREEN_WIDTH), uniform(0, 0), uniform(0, SCREEN_HEIGHT))

# create geom
points = GeomPoints(Geom.UHDynamic)
points.addNextVertices(NUM_SPRITES)
points.closePrimitive()
geo = Geom(vdata)
geo.addPrimitive(points)
gnode = GeomNode('points')
gnode.addGeom(geo)
np = sprites_root.attachNewNode(gnode)

# point sprite effect
texture = base.loader.loadTexture("panda.png")
np.setTransparency(TransparencyAttrib.M_alpha)
np.setTexGen(TextureStage.getDefault(), TexGenAttrib.MPointSprite)
np.setTexture(texture)
np.setRenderModeThickness(32)

base.run()

Now I have a question. How can I change the vertex data at runtime, e.g. for particle effects? Is using shaders the only way? (I have zero experience with shaders so far.)

Epihaius · September 29, 2019, 1:56pm

This modified version of your code randomly changes the size of sprites:

from direct.showbase.ShowBase import ShowBase
from panda3d.core import GeomVertexArrayFormat, GeomVertexFormat, GeomVertexData, GeomVertexWriter
from panda3d.core import Geom, GeomPoints, GeomNode, NodePath, InternalName
from panda3d.core import TextureStage, TexGenAttrib, TransparencyAttrib
from random import uniform, randint, random


def create_sprites_node_setup(screenWidth, screenHeight, parent):
    aspect_ratio = parent.getScale()[0]

    screenOrigin = parent.attachNewNode('screen_origin')
    screenNode = screenOrigin.attachNewNode('screen_node')

    screenOrigin.setPos(-1.0 / aspect_ratio, 0.0, 1.0)
    screenOrigin.setScale(2.0, 1.0, -2.0)

    screenNode.setPos(0, 0, 0)

    screenNode.setScale(1.0 / (aspect_ratio * screenWidth), 1.0, 1.0 / screenHeight)
    screenNode.setTexScale(TextureStage.getDefault(), 1.0, -1.0)

    return screenNode


SCREEN_WIDTH = 800
SCREEN_HEIGHT = 600
NUM_SPRITES = 300

base = ShowBase()
base.setFrameRateMeter(True)
sprites_root = create_sprites_node_setup(SCREEN_WIDTH, SCREEN_HEIGHT, base.render2d)

# vertex writer
array_format = GeomVertexArrayFormat()
array_format.add_column(InternalName.make("vertex"), 3, Geom.NT_float32, Geom.C_point)
array_format.add_column(InternalName.make("size"), 1, Geom.NT_float32, Geom.C_other)
vertex_format = GeomVertexFormat()
vertex_format.add_array(array_format)
vertex_format = GeomVertexFormat.register_format(vertex_format)
#vdata = GeomVertexData('points', GeomVertexFormat.getV3(), Geom.UHDynamic)
vdata = GeomVertexData('points', vertex_format, Geom.UHDynamic)
vwriter = GeomVertexWriter(vdata, 'vertex')
swriter = GeomVertexWriter(vdata, 'size')

# random sprites
for i in range(NUM_SPRITES):
    vwriter.addData3(uniform(0, SCREEN_WIDTH), uniform(0, 0), uniform(0, SCREEN_HEIGHT))
    swriter.addData1(32)

# create geom
points = GeomPoints(Geom.UHDynamic)
points.addNextVertices(NUM_SPRITES)
# points.closePrimitive()
geo = Geom(vdata)
geo.addPrimitive(points)
gnode = GeomNode('points')
gnode.addGeom(geo)
np = sprites_root.attachNewNode(gnode)

# point sprite effect
texture = base.loader.loadTexture("panda.png")
np.setTransparency(TransparencyAttrib.M_alpha)
np.setTexGen(TextureStage.getDefault(), TexGenAttrib.MPointSprite)
np.setTexture(texture)
# np.setRenderModeThickness(32)


def set_sprite_size(row_index, new_size):

    vdata = np.node().modify_geom(0).modify_vertex_data()
    swriter = GeomVertexWriter(vdata, "size")
    swriter.set_row(row_index)
    swriter.set_data1(new_size)


def change_size_task(task):

    row_index = randint(0, NUM_SPRITES - 1)
    new_size = random() * 64.
    set_sprite_size(row_index, new_size)

    return task.again


base.task_mgr.do_method_later(.5, change_size_task, "change_size")


base.run()

It uses a custom vertex format that includes a size column. Because of this, the call to setRenderModeThickness no longer has any effect; you need to initialize the sizes manually.

Some general remarks:

please don’t use calls like addData3f, but addData3, as the former leads to a crash if Panda is compiled for double precision, while the latter will make Panda always do the right thing;
calling closePrimitive is only necessary for complex GeomPrimitive types, not for simple ones like GeomPoints, GeomTriangles etc.

Hope this helps.

maxxim · September 29, 2019, 2:12pm

Yes, this helps, thank you!

I wonder why this is not abstracted in Panda3D to a class, it looks like basic functionality.

It would be good to have a “Shared Geometry Node” (or something like this) for point sprites that stores only a pointer to the row in the shared vertex data object.

Thaumaturge · September 29, 2019, 2:25pm

Ah, I didn’t know about this feature–that’s really neat!

maxxim · October 6, 2019, 9:09pm

OK, I thought that the bottleneck was Python. But it seems that it isn’t.
I did the same benchmark in C++ and got THE SAME results. FPS drops below 30 at ~3000 sprites. This is exactly the same as with Python.

#include "pandaFramework.h"
#include "pandaSystem.h"
#include "cardMaker.h"
#include "texturePool.h"
#include "textureStage.h"
#include "load_prc_file.h"

#include <iostream>
#include <random>
#include <string>

// Panda window
PandaFramework framework;
WindowFramework* window;
int SCREEN_WIDTH = 1024;
int SCREEN_HEIGHT = 768;

// Texture
Texture* tex;
int TEX_WIDTH = 32;
int TEX_HEIGHT= 32;

// CardMaker
CardMaker* cm;

// Root node to attach sprites to
NodePath spritesRoot;

// Random number generator
std::random_device rd;
std::mt19937 rng(rd());
std::uniform_int_distribution<int> uniX(0, SCREEN_WIDTH);
std::uniform_int_distribution<int> uniZ(0, SCREEN_HEIGHT);

// Number of sprites to create
int NUM_SPRITES = 3000;


NodePath createSpritesRoot() {
	NodePath parent = window->get_render_2d();
	// Keep the scale as a float; an int would truncate non-integer scales.
	PN_stdfloat aspect = parent.get_scale()[0];

	NodePath screenOrigin = parent.attach_new_node("screen_origin");
	NodePath screenNode = screenOrigin.attach_new_node("screen_node");

	screenOrigin.set_pos(-1.0 / aspect, 0.0, 1.0);
	screenOrigin.set_scale(2.0, 1.0, -2.0);

	screenNode.set_pos(0, 0, 0);

	screenNode.set_scale(1.0 / (aspect * SCREEN_WIDTH), 1.0, 1.0 / SCREEN_HEIGHT);
	screenNode.set_tex_scale(TextureStage::get_default(), 1.0, -1.0);

	return screenNode;
}

void addSprites() {
	PT(PandaNode) sprite = cm->generate();
	NodePath np = spritesRoot.attach_new_node(sprite);
	np.set_texture(tex);
		
	int rx = uniX(rng);
	int rz = uniZ(rng);

	np.set_pos(rx, 0, rz);
	np.set_scale(TEX_WIDTH, 1.0, TEX_HEIGHT);
	np.set_transparency(TransparencyAttrib::Mode::M_alpha);
}

int main(int argc, char* argv[]) {
	load_prc_file_data("", "show-frame-rate-meter true");
	std::string winSizeStr = "win-size " + std::to_string(SCREEN_WIDTH) + " " + std::to_string(SCREEN_HEIGHT);
	load_prc_file_data("", winSizeStr);

	framework.open_framework(argc, argv);
	framework.set_window_title("My Panda3D Window");
	window = framework.open_window();
	
	// Load texture
	cm = new CardMaker("sprites");
	tex = TexturePool::load_texture("panda.png");
	spritesRoot = createSpritesRoot();

	for (int i = 0; i < NUM_SPRITES; i++)
	{
		addSprites();
	}

	framework.main_loop();
	framework.close_framework();
	return (0);
}

Is it the scene graph that slows everything down?
Or is it Panda’s render system, which does a separate draw call for each sprite?

serega-kkz · October 6, 2019, 9:27pm

I think you can use RenderDoc or another graphics debugger to find out how the sprites are rendered.

maxxim · October 6, 2019, 9:35pm

Thanks for your suggestion, @serega-kkz. It seems that I’ve found the answer on the forums:

It looks to me like Panda3D switches textures, blend modes, etc. (I am not a graphics programmer, so I don’t really know) for every node, whether or not it uses the same texture, and sends the nodes individually to the render pipeline.

serega-kkz · October 6, 2019, 9:43pm

For the sake of experiment, you can try RigidBodyCombiner. You may lose access to the sprite geometry, but it should show whether the scene graph is the bottleneck.

maxxim · October 6, 2019, 9:57pm

Seems like a cool feature, I didn’t know about it. Can you give me some advice on how to make it work in a 2D plane?

serega-kkz · October 6, 2019, 10:15pm

I never thought about it. But you can easily use an orthographic camera for rendering.

maxxim · October 6, 2019, 10:20pm

Never mind, I got it to work! Thanks, Serega. I’ve just attached the rbc-node to the sprite_root.

Easily got 10 000 static sprites at 60 FPS!
15 000 sprites at 40 FPS and 20 000 sprites at 30 FPS.

This is pretty good. I think it is enough for me.

One more question: is this feature available in the C++ API?

serega-kkz · October 6, 2019, 10:23pm

Judging by the forum, yes.