[SOLVED] Performance Issue: primitive->add_vertices()

tobspr · July 29, 2013, 4:23pm

I’m generating chunks dynamically. I have ~4000 vertices, and ~8000 faces. So I’m calling

vertex_writer->add_data3f(x,y,0);
texcoord_writer->add_data2f(x,y);

around 4000 times, and:

g_tris->add_vertices(p1, p2, p3);

around 8000 times.

Without these calls, my code needs 0.0ms. With these calls, it needs 140ms. Is that expected?

Let me know if I should provide some code.
I’m also wondering whether it’s a problem that I’m passing the writers and the triangle primitive through pointers to another function (which then creates the faces)?

Thanks in advance
Tobias

drwr · July 29, 2013, 7:05pm

Since the Python wrapper functions turn around and call the C++ functions of the same names, it’s hard to believe that a Python implementation would be faster than a C++ function that calls exactly the same functions. So, perhaps the bottleneck is somewhere else, or perhaps your two implementations are not exactly the same?

David

tobspr · July 30, 2013, 6:50am

Yeah, my Python implementation isn’t exactly the same, so maybe it’s not comparable … but I am wondering why loader.loadBamFile is so much faster (it takes 0.0 ms to load the same geometry), as it also generates geometry … maybe I can do it the same way? Is it possible to pass a bunch of vertices to the GeomVertexWriters? Or change the write pointer?

Edit: I have already found this forum post: Faster way to build a mesh than GeomWriter ? and will give it a try … But is there a concrete example for modifying the data? I used

printf("'%s'", g_data->get_array(0)->get_handle()->get_data());

and got ‘∟’ for ~500 vertices … which won’t help me much … also, I am wondering in which format the data is stored in the string?

Thanks in advance, and sorry for so many questions
Tobias

drwr · July 30, 2013, 5:01pm

It shouldn’t take 140 ms to call add_data() and add_vertices() 8000 times. Something else must be slowing you down.

That said, if you want to format the vertex buffer by some other means, you can then stuff the formatted buffer directly into the GeomVertexData as described by these other threads. The data is in the form of embedded raw binary data, exactly as described by the GeomFormat; that is, if your GeomFormat says the first column is three floats beginning at byte 0 and continuing for 12 bytes, then the first 12 bytes of your data is three raw floats. Of course it won’t be a string that you can print via Python’s “%s” operator.

David

tobspr · July 30, 2013, 6:05pm

I have added a define in my code which, when enabled, deactivates exactly these calls. With the define enabled, it takes 0.0 ms … so it’s definitely the add_data() and add_vertices() calls. I’m already creating the GeomVertexData and so on as described here: https://www.panda3d.org/manual/index.php/Creating_and_filling_a_GeomVertexData

Maybe there is a problem because I’m passing pointers to the writers to the functions?

Back to editing the raw data:

g_data->get_array(0)->get_handle()->get_data()

I did not expect an output like ‘0.000|0.000|1.000’, for example. I was just wondering why I was getting back only one char, but that was only because of non-printable characters. I checked the size and got ‘91546256’, so I’m able to read the data … but how do I find out which column takes which bytes, and in which order? I’m using V3T2, so it basically uses 5 floats per vertex, but how are these floats stored?

Is there a code snippet how to generate the new string? It would really help to see some example doing this.

I’ve also attached the source code … the calls are in line 29 and 56. Feel free to correct me

Thanks for your help!
Tobias
ChunkGenerator.zip (4.07 KB)

rdb · July 30, 2013, 11:40pm

I think that there’s a method for reserving the number of vertices that you intend to add. This will cause Panda to preallocate a buffer of the required size, which should be faster than adding your vertices one by one.

drwr · July 31, 2013, 12:31am

rdb’s suggestion is a good one: you can call GeomVertexData::reserve_num_rows() ahead of time to set the number of vertices you expect to add; and you can call GeomPrimitive::reserve_num_vertices() ahead of time to set the number of index numbers you expect to add. This may be a significant speed up.

But if it’s not, it’s worth figuring out why you’re running so slow, instead of just tossing these very useful functions out the window prematurely. I’d start by comparing your C++ code to your Python code, which you say runs quickly. What’s different?

If you print your GeomVertexFormat, it will tell you exactly how the data is packed. For instance:

>>> print GeomVertexFormat.getV3t2()
Array 0:
  Array format (stride = 20):
    vertex(3f) float32 point start at 0
    texcoord(2f) float32 texcoord start at 12

This tells you that the format is one array, with 20 bytes per row (stride). Of these 20 bytes, the first 12 (3f * float32) are the three floats of the vertex, and the next 8 (2f * float32) are the two floats of the texture coordinate. You can use the struct module with a format string of ‘fffff’ to unpack (or pack) these five values 20 bytes at a time. If you are writing this in C++, you can copy the values into a buffer using memcpy or some such.
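The struct-based packing described above can be sketched as follows. This is only a minimal illustration, not code from the thread: the names `ROW`, `pack_rows`, and `unpack_rows` are made up for the example, and it assumes the buffer uses the machine's native byte order with no padding (hence the `=` prefix). Handing the resulting bytes to Panda3D is left out so that the sketch runs without Panda3D installed.

```python
import struct

# One V3T2 row: vertex(3f) at byte 0, texcoord(2f) at byte 12.
# "=" means native byte order without padding (an assumption of this sketch).
ROW = struct.Struct("=5f")


def pack_rows(rows):
    """Pack an iterable of (x, y, z, u, v) tuples into one bytes object."""
    return b"".join(ROW.pack(*row) for row in rows)


def unpack_rows(data):
    """Split a packed buffer back into (x, y, z, u, v) tuples, one per row."""
    return [ROW.unpack_from(data, offset)
            for offset in range(0, len(data), ROW.size)]


# Hypothetical sample data: three vertices of a unit triangle.
rows = [
    (0.0, 0.0, 0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0, 1.0, 0.0),
    (0.0, 1.0, 0.0, 0.0, 1.0),
]
buffer = pack_rows(rows)

# The texcoord of row 1 starts 12 bytes into that row.
u, v = struct.unpack_from("=2f", buffer, 1 * ROW.size + 12)
```

Building all rows first and joining them once avoids one call per vertex, which is the point of the raw-buffer approach.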

But it’s probably just easier to figure out why your add_vertex() and whatnot is slow, and fix that.

David

tobspr · July 31, 2013, 9:08am

GeomVertexData::reserve_num_rows() did partially work for me … I tested it with a basic plane (1800 vertices), and it is now around 4-10 times faster (takes ~16ms), but still not fast enough (my code may take 0.5ms max, if not less).

I forgot that my Python code would load a bam file when the chunk is already cached … that explains why it ran so fast. If I disable caching, it’s slower than the C++ code …

Edit: I am now creating the binary string … It seems to work … and is much faster (almost 0.0ms)!
My next attempt would be to modify the primitive … are the vertex indices stored in the same way?
I’m also wondering why the data is internally stored as a binary string … and not as an array of floats, for example.

Edit 2: I got the primitive to work … but it seems to mix up my vertices (see http://prntscr.com/1ik7ce)? I passed the data to the primitive as 3 int32s, so the stride is 12 … is that correct?
Besides, I can now generate 1000 chunks in 0.01ms (~4 million vertices) … so the performance problem is fixed.

Thanks for your help!
Tobias

drwr · July 31, 2013, 9:01pm

Vertex indices are stored as a buffer of single integers, usually 16-bit integers. So the stride is 2 in that case. You can use GeomVertexArrayData::get_array_format() to see the format it is in by default, or you can replace the format with 32-bit integer data if you require 32-bit indexes (but there is often a render performance cost for going to 32-bit indexes).
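To illustrate the index layout, here is a small sketch that packs triangle indices as unsigned 16-bit integers. The helper name `pack_indices` is hypothetical, and the explicit little-endian prefix `<` is an assumption (typical for x86 machines). The last lines show what happens when 32-bit integers are read back as 16-bit ones; this may be what produced the mixed-up vertices reported earlier in the thread, assuming the primitive was still using its default 16-bit format.

```python
import struct


def pack_indices(triangles):
    """Pack (i0, i1, i2) tuples as consecutive uint16 values, 2 bytes each."""
    flat = [index for triangle in triangles for index in triangle]
    if flat and max(flat) > 0xFFFF:
        raise ValueError("index does not fit into 16 bits")
    # "<" forces little-endian so the result is the same on every machine.
    return struct.pack("<%dH" % len(flat), *flat)


# Hypothetical sample data: two triangles forming a quad.
triangles = [(0, 1, 2), (2, 1, 3)]
index_buffer = pack_indices(triangles)

# Packing the same triangle as int32 and reading it as uint16
# interleaves zeros between the real indices.
as_int32 = struct.pack("<3i", 0, 1, 2)
misread = struct.unpack("<6H", as_int32)
```

With six indices the 16-bit buffer is 12 bytes long, whereas a single int32 triangle already takes 12 bytes on its own.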

I don’t think this question makes sense. Data is data. Whether it’s a binary string or an array of floats is just a question of how you look at the data, and has nothing to do with how the data is stored; they’re both the same thing in memory.

David

tobspr · August 1, 2013, 8:11am

It’s finally working now! Thanks for your help!

It’s stored the same way in memory, yeah, but why is its type string and not float*? I mean, it contains floats (or int16s, for example). In my case, I first had to fill an array of floats and then generate a string from it … instead of directly passing the array.

Tobias

rdb · August 1, 2013, 9:48am

The array doesn’t necessarily need to contain just floats. It could be a combination of floats and unsigned bytes (for colour information or so). This all depends on the format you’ve configured the array to be. Furthermore, Python doesn’t even have a float* type. (Python 3 does have a “bytes” type to represent a memory buffer; it is equivalent to Python 2’s “string” type.)
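A short sketch of such a mixed row, assuming a hypothetical layout of three float32 position values followed by four unsigned bytes of RGBA colour (16 bytes per row). The layout, the name `pack_vertex`, and the sample values are illustrative only and are not taken from a particular Panda3D format.

```python
import struct

# Hypothetical row layout: 3 x float32 position, then 4 x uint8 colour.
# "=" means native byte order without padding, so the row is 12 + 4 = 16 bytes.
ROW = struct.Struct("=3f4B")


def pack_vertex(position, rgba):
    """Pack one row from an (x, y, z) tuple and an (r, g, b, a) tuple."""
    return ROW.pack(*position, *rgba)


row = pack_vertex((1.0, 2.0, 0.5), (255, 128, 0, 255))
x, y, z, r, g, b, a = ROW.unpack(row)
```

A plain array of floats could not describe this row, which is one reason a raw byte buffer plus a format description is the more general representation.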

drwr · August 1, 2013, 4:36pm

Note that GeomVertexArrayDataHandle::set_data() exists for the Python programmer’s convenience. It accepts a string because Python (as rdb points out) uses strings for all raw-data purposes.

If you’re working in C++, you can use GeomVertexArrayDataHandle::get_write_pointer() to copy the data from your own float array without having to go through a string. Be sure you call set_num_rows() first to allocate the memory you’re about to copy into, and know what you’re doing when you do low-level copies like this.

David

tobspr · August 2, 2013, 5:11am

get_write_pointer() sounds much better, I’ll try it now

Thanks
Tobias