Youtube video - "How to Create Minecraft in Python and Panda3D" , also a question about it

Nice to see better produced tutorials for this engine:

One specific question I have, how does he keep the framerate high when just using instanceTo? From my understanding instanceTo is not GPU instancing, so how is he getting a decent fps with hundreds of individual meshes in the frame? Has instanceTo been updated to use shaders since last time I checked? Has Panda3D renderer itself been updated to use some dynamic optimization under the hood? Or is it still a very resource heavy way of doing it, just on a beefy setup?

https://docs.panda3d.org/1.10/python/programming/scene-graph/instancing#instancing-an-important-caveat
It’s still rendering a decent amount of triangles. (48012 in the entire scene)

Here’s the show-scene-graph-analyzer-meter output on my machine.
Running on a Core-i7-8750H with a GTX 1050 Ti and getting ~25FPS.

That, I fear, would be a question to ask the person who made the video. For one thing, it might well be worth asking for the specs of the machine on which they were running!

[edit]
One thing that I do notice in a quick look at the video, however, is that the video-author seems to be using a fairly small window when running the game. That might be helping their frame-rate!
[/edit]

To the best of my knowledge, no.

Specifically, the NodePath “instanceTo”/“instanceUnderNode” methods aren’t actually intended for the sort of instancing that you’re thinking of, if I’m not much mistaken. (The naming is perhaps a bit of a problem, as it may well give the impression that these methods are intended for that sort of instancing.)

Rather, I believe that they’re intended simply to reduce the impact of skeletal animation–a process that can be somewhat CPU intensive for large numbers of skeletons.

They do this by allowing the engine to animate a mesh once, and then essentially copy the result to the indicated places and with the indicated states. (This isn’t an exactly accurate description of how it works, but in effect it’s much the same.)

rdb would know better, but I don’t think that I’ve heard of any such thing.

For the sort of instancing that you’re thinking of, custom shaders might be called for. You might find one or more snippets if you search the forum!

However, there are a few non-instancing tricks that might perhaps be used to improve performance.

For example, this seems like the sort of scene to which the Rigid Body Combiner might apply quite well, or some clever usage of MeshDrawer.

I’m surprised the GTX10xx series can even show 4000 individual geoms in one frame at 25 fps.
This is what I was trying to say: the code would be very impractical unless real instancing, or at least some proper occlusion culling, were used.

Seems like it’s one of the tutorials that has prioritized production quality over the quality of the actual material, for clicks. This is not a practical way to handle voxels.

Relevant:

Here’s how I’ve done a lot of cubes on a 3D grid (or “voxels” if you prefer) which allows nearly a half million cubes on my machine (mind you, at a heavily reduced framerate):


    # Assumes: from panda3d.core import NodePath, Point3
    def create_geometry(self):
        self.cube_model = self.loader.load_model("1m_cube.gltf")
        self.cube_model.set_scale(0.49)
        self.cube_model.set_name("CubeModel")

        self.instance_root = NodePath("InstanceRoot")
        self.instance_root.reparent_to(self.render)

        for x in range(self.size):
            for y in range(self.size):
                for z in range(self.size):
                    if self.grid[x][y][z]:
                        instance = self.instance_root.attach_new_node("Instance")
                        instance.set_pos(Point3(x, y, z))
                        self.cube_model.instance_to(instance)

This may or may not be performant enough for your application/specific hardware goal.

In all fairness, the use of “instanceTo”/“instanceUnderNode” would I think potentially reduce construction time, mesh-count, and RAM usage, and may (depending on the scene hierarchy) also reduce node-count.

Sorry, but I think you pretty much did what the tutorial author did. I’ve already mentioned the issue I believe exists with using instanceTo().

I mean, it’s just cubes; I don’t see the RAM use not being minuscule even with tens of thousands of them.

What do you mean by meshes? I assume you’re referring to the Panda3D Geom object. I don’t see how instanceTo can reduce Geom count, like I mentioned before, I believe that’s the job of geometry shader instancing.

I don’t know what you mean by node count. That should have no effect on performance and you probably want to preserve the individual cube nodes since the terrain is dynamic and interactive (you could for example check for collisions while treating each cube as its own collision node without having it rendered as its own geom).

The only potentially practical way to do this in Panda3D, short of custom instancing shaders, would maybe be procedural geometry, which is very well explained in the manual: having the entire terrain, or pages of it, as a single Geom. But I’ve never done that for non-static geoms, so I don’t know whether it would still be as fast as shader instancing.

Just using instanceTo() and calling it a day I think would only improve the initialization of the scene, like you mentioned.

Because, when using “instanceTo”/“instanceUnderNode”, there is only one copy of the referenced mesh.

You see, those methods work, as I recall, by creating node-hierarchies in which multiple nodes end up pointing to the same underlying mesh. The renderer, working its way through the hierarchy, ends up visiting that single mesh more than once, and thus renders it more than once–but with transforms influenced by the nodes along that specific path through the hierarchy, thus resulting in a different final transform.

Thus there is just one geom–and the resources consumed by just one geom–even though the mesh appears multiple times in the scene.

That is one way of doing it–and, I would guess, a faster way.

Still, “instanceTo”/“instanceUnderNode” should, I now realise, offer some improvement in performance. I’d expect said approach to perform somewhere between the naive, presumably-slow approach of multiple meshes and the presumably-very-fast approach of shader-based instancing.

Oh, node-count can have a big effect on performance.

Indeed, keeping the number of nodes visible at any given moment to a minimum can be an important optimisation!

I mean, with the “instanceTo”/“instanceUnderNode” approach, you do still have intermediary nodes that might be used to move or hide individual cubes.

I wouldn’t entirely write off either the “instanceTo”/“instanceUnderNode” approach.

Or–as I mentioned above–the “RigidBodyCombiner” approach.

(I also mentioned MeshDrawer–but that’s essentially a procedural-geometry approach, which is what you were referring to.)

All the above discussion said, this is a testable hypothesis: put together a small program that can be easily configured to construct a bunch of cubes either naively (i.e. loading each as a separate mesh) or via “instanceTo”, and then see what sort of frame-time you get with each.

[edit]
In fact, I decided to do just that:

Here below is the program that I whipped up–please do critique if I’ve missed something!

from panda3d.core import loadPrcFileData
loadPrcFileData("", "show-frame-rate-meter #t")
loadPrcFileData("", "frame-rate-meter-milliseconds #t")

from direct.showbase.ShowBase import ShowBase

USE_INSTANCE_TO = True

class Game(ShowBase):
    def __init__(self):
        ShowBase.__init__(self)

        if USE_INSTANCE_TO:
            self.baseModel = self.loader.loadModel("sphere")
            generationMethod = self.generateByInstanceTo
        else:
            generationMethod = self.generateByNaiveLoading

        for i in range(100):
            for j in range(100):
                np = generationMethod()
                np.setPos((i - 50) * 2, 400, (j - 50) * 2)

    def generateByInstanceTo(self):
        np = self.render.attachNewNode("mew")
        self.baseModel.instanceTo(np)
        return np

    def generateByNaiveLoading(self):
        np = self.loader.loadModel("sphere")
        np.reparentTo(self.render)
        return np

app = Game()
app.run()

As to the results…

Indeed, it seems that there’s little to no difference in performance for a static scene!

(Although a quick-and-dirty test with the RigidBodyCombiner suggested that it did help significantly–albeit as long as the vertex-count per object was kept low. But then, if we’re talking about cubes, then that is the case…)

On my 6700 XT machine, the instanceTo version of your test program there runs ~10 milliseconds faster (57 ms vs. 67 ms for the naive model load version). Also there is a noticeable startup lag time while the naive model load version, well, loads, whereas the instanceTo version starts pretty much instantly for me. I’m guessing the differences become more noticeable on fairly fast hardware like this machine has.

Sorry Thaumaturge, but I’m pretty sure you’re wrong. You’re using the term “Mesh”, while in the Panda3D API there is no such concept; there is Geom. I think that by “Mesh” you are also referring to Geom.

“Geom is the smallest piece into which Panda will subdivide the scene for rendering; in any given frame, either an entire Geom is rendered, or none of it is.”

But then you seem to confuse “Node” (I assume you mean either NodePath or GeomNode) with what a Mesh/Geom is.
You can have many NodePaths and Panda3D wouldn’t care; each is probably less than a few kilobytes of data in RAM and has nothing to do with the GPU. Pretty sure even an empty GeomNode should have no impact, although there wouldn’t be much point in having an empty one. The number of Geoms in the camera view is what is important, not the nodes/NodePaths. This is the GPU bottleneck: even modern GPUs, which can render a billion polygons, still struggle with hundreds of individual Geoms/meshes in the camera view.

What I’m describing is a known limitation in Panda, in fact I’ve known about it since 2009! Hardware Geometry Instancing | Panda3D

" But doesn’t Panda3D already support instancing?

Currently, Panda3D supports instancing of animated models. That is entirely unrelated to geometry instancing. The existing instancing system only exists to improve performance if you have a lot of animated models, by reducing the amount of vertex displacements that are done by Panda3D’s animation system. Geometry instancing, on the other hand, exists to greatly reduce the amount of data that is passed to the video card. Whether the model is animated or not is irrelevant with the new instancing system."

As for the RigidBodyCombiner, it’s been years since I’ve dealt with it; my guess is that it internally does the procedural-geometry approach I mentioned. I do remember it also had limitations, but can’t quite remember what; I think per-object culling breaks, at least. The RigidBodyCombiner existed before rdb made that blog post, so he probably knows better why the RBC is not a good option when you need a hundred thousand trees or cubes.

Mesh and Geom can be two distinct concepts. I usually use “mesh” as the geometry of a 3d model (as opposed to sliders, textures, bones, etc.) while a Geom is exactly as that definition you grabbed. It’s important to note that a single “Model” in panda3d is mostly conceptual, with the beginning of a model in the scene graph being represented by a modelNode. Furthermore, if a geomNode holds multiple geoms, it would be fair to say those geoms are all part of the same “mesh”, right?

as explained here, the purpose of instanceTo() is to create a new nodePath that references the same pandaNode. if you have thousands of copies of a single pandaNode, and the data inside that pandaNode never changes between copies, (i.e. character nodes all playing the same animation in sync) you can create multiple nodePaths that all reference the same pandaNode in memory, and thus updating only that one animated character’s data.

NodePaths don’t really have a direct effect on render performance, unless you set a bunch of render attributes on them. panda3d lets you set attributes and have them propagate down the scene path. Since an attribute is a change to a node’s render state, this means that panda3d won’t be able to flatten these nodes/nodepaths down and send them to the GPU in a single batch.

It’s less a limitation of panda3d and more a limitation of computers and graphics cards in general. See here.

Ah, interesting, and surprising! (But good to know!)

It may be as you say: that the difference is more visible on faster hardware. Or perhaps there’s a bottleneck elsewhere in my system that’s swamping the effect of “instanceTo”.

At the least, it doesn’t seem that using “instanceTo” in this way is disadvantageous.

What frame-times were you seeing for each approach? (I think that I was getting about 40ms either way–and about 20-something with the RigidBodyCombiner.)

Not really, no; I’m not confusing them at all.

It’s not just about RAM and GPU-speed–in this case, the more nodes there are, the more work the culling system has to do in traversing those nodes.

Remember: there is more than one potential bottleneck in a game. There’s RAM, graphics memory, GPU speed, communication speed to the GPU, CPU speed, and I daresay more besides!

Pretty much, I imagine.

But I took your mention of “procedural geometry” to refer to developer-coded procedural work, rather than via Panda-provided systems.

At the least I think that it’s worth having it mentioned that Panda does provide some systems by which one can automate or simplify such procedural generation!

Of course! All systems do, I daresay.

The thing is, I’m not convinced that those are quite the same case.

In the case of trees, one might be switching between levels-of-detail quite frequently as the player moves around, which I imagine wouldn’t work well with the RigidBodyCombiner.

Conversely, a cube isn’t likely to benefit as much from levels-of-detail, and so the RigidBodyCombiner might help there.

Sorry, but you can’t just say no and call it a day; I explained why I think you’re wrong, and you didn’t address it. I might as well respond with “Yes, really, you do. You are really confusing them.” What’s the value in that kind of discussion? None.

When you use an API method that neither benefits the code nor does what its name implies, that’s just bad code (and a bad coding tutorial), regardless of whether it has negative side effects on performance. The fact that it doesn’t really do anything, or doesn’t do what was implied, and in this instance doesn’t optimize much, IS the disadvantage.

It’s not just about RAM and GPU-speed–in this case, the more nodes there are, the more work the culling system has to do in traversing those nodes.

Again, nodes don’t render anything by themselves. If something isn’t rendered it’s not processed by the GPU. In Panda you need a Geom for GPU to be involved. And isn’t culling done on the CPU anyway?

Remember: there is more than one potential bottleneck in a game. There’s RAM, graphics memory, GPU speed, communication speed to the GPU, CPU speed, and I daresay more besides!

Sure, but this is off-topic, and we could spend hours discussing general optimizations that aren’t the issue for the subject of this topic.

Cubes are destructible, so the mesh data being rendered changes dynamically; LOD also dynamically changes the mesh data being rendered, so it’s the same as far as the GPU is concerned. In theory, LOD could be implemented with the RBC for this reason.
Again, I don’t remember much about the RBC, but I do remember it had disadvantages, and there should be a reason why rdb didn’t mention it. I’ll let him clarify this if he sees this topic.

Not in Panda3d, which was my point.

Okay so you’re using your own terms and concepts here outside of the game engine/API. I’m not sure how this is a response to what I said.

No, you’d usually say the GameObject/NodePath/whatever your engine calls it has multiple meshes/Geoms. This is important because of how the GPU works, because of optimization, and because of how culling and render attributes apply at the Mesh/Geom level (for example, if you want to control the color or material of a specific part of your object, then that part needs to be its own Mesh/Geom).
You could call each of your multi-mesh NodePaths a single “mesh” because it visually looks like a single part, but you would risk causing confusion for the above reason.

as explained here, the purpose of instanceTo() is to create a new nodePath that references the same pandaNode. if you have thousands of copies of a single pandaNode, and the data inside that pandaNode never changes between copies, (i.e. character nodes all playing the same animation in sync) you can create multiple nodePaths that all reference the same pandaNode in memory, and thus updating only that one animated character’s data.

Okay, the blog post I linked to explains the same.

NodePaths don’t really have a direct effect on render performance, unless you set a bunch of render attributes on them. panda3d lets you set attributes and have them propagate down the scene path.

Not if the NodePath is empty like in my example. You need a Mesh/Geom for the GPU to be involved.

Since an attribute is a change to a node’s render state, this means that panda3d won’t be able to flatten these nodes/nodepaths down and send them to the GPU in a single batch.

flattenStrong() will ignore that and do it anyway, choosing the render state of one of the NodePaths to apply to the new singular merged NodePath.

It’s less a limitation of panda3d and more a limitation of computers and graphics cards in general. See here.

It’s a limitation of Panda3D because Panda3D does not natively support hardware instancing. It would be a limitation of GPUs in this context if GPUs didn’t have a solution for it, but they do; Panda3D simply hasn’t implemented that feature. An analogy to your line of reasoning: it’s as if you said my custom rasterizer engine was slow because it used the CPU instead of the GPU, and I responded that it’s not a limitation of my engine but of the CPU.

The thing is, your assertion there was unfounded, I feel.

But all right, let me be clear:

In my previous posts, I believe that I was using the following terms in these ways:

  • Mesh: An actual set of vertices; a model.
    • In Panda, these are stored in GeomNodes, but I wasn’t bothering with distinguishing the terms so finely for the purposes of this discussion. The point was “an actual thing that is rendered”.
  • Node: Any node in the scene-hierarchy, whether it contains a Geom or not.

But note that my response there was to a post in which Simulan established that–for some hardware at least–it does provide a benefit.

What I’m saying, then, is that it provides benefit for some systems, and no deficit for the remaining systems, and so it’s a net positive.

But it’s not all about rendering. (See below for a fuller response, since the next few points are related.)

Indeed, that’s (a part of) my point! (Again, see below.)

It’s not, really:

The question is that of whether the use “instanceTo”/“instanceUnderNode” provides a performance boost.

You’re arguing that only things that speed up rendering are worth considering.

I’m arguing that the CPU can be a bottleneck, too, and that optimising there can thus provide a performance–and thus frame-rate–boost.

Thus “instanceTo”/“instanceUnderNode” may not speed up rendering, but it can nevertheless improve performance.

Indeed, this is what we see with its intended purpose: animating skeletal meshes on the CPU can be slow, and this provides a performance boost by reducing the amount of animation to be done.

It’s not quite the same in practice however, I suspect:

In the case of trees, one wants to swap out one mesh for another: a high-poly tree for a low-poly tree; a low-poly tree for an impostor; and back again.

While that could, I imagine, be done with the RigidBodyCombiner, it may well be that it’s less effective when handled in that way.

In the case of cubes, it’s more a matter of… just showing or hiding them. Something that I suspect is perfectly fine with the RigidBodyCombiner.

But yes, I’d be happy for rdb to weigh in on this!

Actually, I think that it does affect the state-sorting that Panda does.

Perhaps there’s some clever combining going on behind the scenes, but in general, the more states one has (whether transforms, or shaders, or render-effects)–even I believe on empty nodes–the more work the engine has to do before rendering.

I’m sorry but you’re just being confusing here. In Panda, this is how you generate geometry and put it in the scene to be ready for rendering: GeomVertexData → GeomPrimitive → Geom → GeomNode → NodePath.
“Mesh”, “model”, these aren’t Panda terms. Only now it’s clear that by Mesh you mean Geom, and only because you stated that GeomNode contains Meshes. You still didn’t explicitly say you mean Geom by “Mesh”. This is what made it confusing to understand what you were talking about exactly.

I’m arguing that the CPU can be a bottleneck

But it’s not here, so how is it relevant to the topic?

Simulan established that–for some hardware at least–it does provide a benefit.

At the initialization phase, for a few milliseconds, somehow. That’s not really the main issue here; it’s pretty much a non-issue.

But it’s not all about rendering.

Again, it is for this topic.

Actually, I think that it does affect the state-sorting that Panda does.

My comment you responded to was that it does not have effect on rendering (GPU) performance.

the more states one has (whether transforms, or shaders, or render-effects)–even I believe on empty nodes–the more work the engine has to do before rendering.

That work is minuscule for this topic, and it’s not related to the GPU.

To summarize, I don’t see the minuscule benefits to CPU/RAM being at all relevant to this topic of making Minecraft-type voxel terrain in Panda, where the implementation takes no care for the number of Geoms/Meshes in the scene and is thus impractical. Since the two common game engines today, Unity and Unreal, both have proper hardware-instancing support, viewers of the tutorial are going to assume it does what it really doesn’t, and the tutorial author takes no care to clarify that, or maybe doesn’t even know about it himself.
The video author chose to introduce viewers to the Panda3D engine with the worst possible example: by showing how to do something with it that it cannot do practically in realtime at a stable frame rate without a custom implementation of the needed feature. It’s a bad tutorial both for the (reputation of the) engine and for the viewers. Viewers are going to run the code, see the low fps, and likely move on to another engine. The few viewers who stick around and ask on the forum how to solve the issue are not going to be persuaded by being told “you can code your own shader for that”, since it’s 2023 and almost no one has time to write the engine; they expect the engine to support this already.
Is my gripe with the video clearer now?

Indeed, this engine is not for game developers looking for the easiest way to make a game. I’m not the maker of the tutorial you’re referring to, but I imagine they picked a “Minecraft clone” project due to the popularity of that kind of thing, not because it was necessarily a great fit for the engine.

It’s not that it’s not a good fit, it’s a great example if you want to demonstrate why the engine is not good.

Yes, I grok your general thrust here, and please feel free to use Unreal Engine 5, Unity, Godot, or any of the other ones.

I think it’s worth noting that the conclusions of sutemp are incorrect with respect to Panda3D, since hardware instancing is of use only for specific types of games. Out-of-the-box support doesn’t make much sense, as it may be redundant.

Panda3D does not restrict the user from using this technique; there is a lot of code in the forum examples proving this.

As for creating software instances, Panda handles this in accordance with OOP principles; you do not need anything extra. You can load one box into the scene 1000 times while only one copy of its data is loaded into RAM. Only when attributes are changed will the data affected by the change be duplicated. However, this does not really improve rendering speed, since the attributes and transform of the object are still transferred to graphics memory 1000 times.

Hardware instancing solves this problem; its application depends on the programmer’s skill, and Panda does not impose any restrictions here.

Sorry, I completely disagree with this point. Also, state your opinions as opinions: I’m not “incorrect”, and what I say doesn’t “not make any sense”, simply because you disagree. It’s not respectful.

Even though I use Panda3D for some projects, I’m not going to sit here and pretend that a basic, old feature like hardware instancing is something each developer has to implement for their specific application because it can’t be standardized. Other engines prove this is a false idea.

This is just us coping with using an outdated engine which is not helpful if we want it to at least slowly improve.

Lumberyard also seems to have an automatic way to perform hardware instancing via the “DrawCall Batching” feature.