Slow TextNode behaviour

Someone on IRC complained that OnscreenText gets slow when it has to display a lot of text.
So, I made this tiny sample code:

from direct.directbase import DirectStart
from pandac.PandaModules import TextNode

debugtext = TextNode('node name')
debugtext.setText("aaaa")
textNodePath = aspect2d.attachNewNode(debugtext)

def updateText(task):
  text = "lorem ipsum dolor sit amet\n" * 20
  debugtext.setText(text)
  return task.cont

taskMgr.add(updateText, "aaa")
run()

Try it – I don’t get more than 32 fps on my 8600. -.-
Just 20x “lorem ipsum dolor sit amet”. Is it supposed to be so slow?
PStats tells me 99% of the time goes into “Generate Text”, totalling 28 ms per frame here. Am I doing something wrong, is this normal, or is it a bug in Panda?

pro-rsoft

only 12 FPS here

No, it really is slow. It does a lot of work to assemble the text, so you will want to minimize unnecessary calls to setText() or related functions that force the text to be recomputed.

Though you just reminded me that earlier this month another Panda user submitted a patch to me that is intended to reduce this problem by reducing the amount of work done during text assembly, and I have been sitting on this patch the whole time. Shame on me. I’ll see about incorporating it promptly.

David
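The advice above to minimize calls to setText() can be sketched in user code as a small guard that only forwards a string when it differs from the last one sent. This is a minimal sketch, not part of Panda3D: CachedText and FakeNode are hypothetical names, and it assumes the node regenerates its geometry on every setText() call, as the sample at the top of the thread suggests.

```python
class CachedText:
    """Forward setText() to a text node only when the string changes.

    `node` is anything with a setText(str) method, for example a
    TextNode or an OnscreenText. (Hypothetical helper, not a Panda3D class.)
    """

    def __init__(self, node):
        self.node = node
        self.last = None   # last string actually sent to the node
        self.updates = 0   # how many times the node was really updated

    def set(self, text):
        # Skip the expensive regeneration when nothing changed.
        if text == self.last:
            return False
        self.node.setText(text)
        self.last = text
        self.updates += 1
        return True


class FakeNode:
    """Stand-in for a TextNode so the sketch runs without Panda3D."""

    def __init__(self):
        self.text = ""

    def setText(self, text):
        self.text = text


if __name__ == "__main__":
    cached = CachedText(FakeNode())
    # Simulate 100 frames in which the displayed value changes every 10 frames.
    for frame in range(100):
        cached.set("score: %d" % (frame // 10))
    print("frames: 100, real updates: %d" % cached.updates)
```

In a task like updateText above, calling cached.set(text) instead of debugtext.setText(text) would leave the node untouched on frames where the string is the same.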

Hmm, I’m not sure how the TextAssembler works, but wouldn’t it be an improvement if it cached the generated letters so they can be reused? In this case, it would save all the letters from “Lorem ipsu dl a” and reuse them on the next line?
Once it finished generating the text it can throw away the stored letters.

No, it does cache letters. What’s slow is not the time it takes to generate the individual letters, but rather the final flattenStrong() operation, which copies all the vertices into a single GeomVertexData, assembles all the GeomPrimitives into a single GeomPrimitive, and combines them all into a single Geom.

When there are many letters in the text, this operation can take several milliseconds.

David

I’ve committed the contributed code in question. It does seem to make the generate text operation about twice as fast, a decided improvement. It is, however, still on the slow side, so it’s still probably wise to minimize calls to setText().

On the other hand, if you really want to change your text frequently, you can try putting this in your Config.prc file:

text-flatten 0

this will remove the call to flattenStrong() within the text generation process. Changing the text will be much faster, but rendering the resulting text will be slower.

If you have the latest code I’ve just checked in, you will also need to add:

text-dynamic-merge 0

to achieve the same effect.

David
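The same settings can also be supplied from Python instead of editing Config.prc. The following is a rough sketch under a few assumptions: prc_text and apply_text_settings are hypothetical helper names, loadPrcFileData is imported from panda3d.core (older releases such as the one used in this thread expose it through pandac.PandaModules), and the values are simply the ones quoted in the post above. The thread does not say whether changing these variables after text has been generated takes effect, so the sketch applies them up front.

```python
# Settings quoted in the post above; adjust to taste.
TEXT_SETTINGS = [
    ("text-flatten", 0),
    ("text-dynamic-merge", 0),
]


def prc_text(settings):
    """Format (name, value) pairs as Config.prc lines."""
    return "\n".join("%s %s" % (name, value) for name, value in settings)


def apply_text_settings(settings=TEXT_SETTINGS):
    """Feed the settings to Panda3D if it is installed; return the prc text."""
    data = prc_text(settings)
    try:
        from panda3d.core import loadPrcFileData
    except ImportError:
        # Panda3D is not available; nothing to apply.
        return data
    # The first argument is just a label for this in-memory config page.
    loadPrcFileData("", data)
    return data


if __name__ == "__main__":
    # Call this before creating any TextNode or OnscreenText.
    print(apply_text_settings())
```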

oh wow maybe that’s what’s eating my UI!

I do regenerate some text quite often.

David, what’s fixed by that patch?

I just found that filling a vertex data by row, as advertised in the manual, is SLOWER than filling it by column.
By column is almost 31% faster.

from pandac.PandaModules import *
import direct.directbase.DirectStart

VCTarray = GeomVertexArrayFormat()
VCTarray.addColumn(InternalName.make('vertex'), 3, Geom.NTFloat32, Geom.CPoint)
VCTarray.addColumn(InternalName.make('color'), 3, Geom.NTFloat32, Geom.CColor)
VCTarray.addColumn(InternalName.make('texcoord'), 2, Geom.NTFloat32, Geom.CTexcoord)
VCTformat = GeomVertexFormat()
VCTformat.addArray(VCTarray)
VCTVtxFormat = GeomVertexFormat.registerFormat(VCTformat)

vdata = GeomVertexData('textline', VCTVtxFormat, Geom.UHStatic)
prim = GeomTriangles(Geom.UHStatic)
vW = GeomVertexWriter(vdata, 'vertex')
cW = GeomVertexWriter(vdata, 'color')
tW = GeomVertexWriter(vdata, 'texcoord')

iterRange=range(80000)
idx4=0
startT=globalClock.getRealTime()


if 0:
  '''____AS SLOW AS SUGGESTED IN THE MANUAL____
  elapsed : 8.00554908868
  '''
  for i in iterRange:
      vW.addData3f(1,1,1)
      cW.addData3f(1,1,1)
      tW.addData2f(1,1)

      vW.addData3f(2,2,2)
      cW.addData3f(2,2,2)
      tW.addData2f(2,2)

      vW.addData3f(3,3,3)
      cW.addData3f(3,3,3)
      tW.addData2f(3,3)

      vW.addData3f(4,4,4)
      cW.addData3f(4,4,4)
      tW.addData2f(4,4)

      prim.addConsecutiveVertices(idx4,3)
      prim.addVertices(idx4+3,idx4+2,idx4+1)
      idx4+=4
else:
  '''____THIS IS --> 30.9% <-- FASTER____
  elapsed : 5.52589131891
  '''
  for i in iterRange:
      vW.addData3f(1,1,1)
      vW.addData3f(2,2,2)
      vW.addData3f(3,3,3)
      vW.addData3f(4,4,4)
  for i in iterRange:
      cW.addData3f(1,1,1)
      cW.addData3f(2,2,2)
      cW.addData3f(3,3,3)
      cW.addData3f(4,4,4)
  for i in iterRange:
      tW.addData2f(1,1)
      tW.addData2f(2,2)
      tW.addData2f(3,3)
      tW.addData2f(4,4)
      prim.addConsecutiveVertices(idx4,3)
      prim.addVertices(idx4+3,idx4+2,idx4+1)
      idx4+=4

prim.closePrimitive()
geom = Geom(vdata)
geom.addPrimitive(prim)

# Written so that it runs under both Python 2 and Python 3.
print('elapsed : %s' % (globalClock.getRealTime() - startT))

Hmm, interesting. I’ll investigate this difference, though my first suspicion is that the only reason for the difference is better CPU mem caching in the by-column approach. This isn’t related to the TextNode performance issues, though.

David

Ironic that this thread came up; I’ve been trying to track down the culprit of my fps hit. Been setting text on four objects at 30 Hz …

Ty guys

Do you mean to tell me that the flattening operation is already by-column?

Yes, in fact, when the flatten operation uses a GeomVertexWriter or GeomVertexRewriter, it is generally by column, simply because that’s the simplest way to walk through the data. But much of the time for flatten is spent in other tasks.

And I oversimplified when I stated that the cost of TextNode generation was for flatten. A good part of it is the construction and destruction of hundreds of Geom copies, and so forth (which is a big part of what the aforementioned optimization was improving).

David

Note: I’ve just checked in an overhaul to the text generation that makes generating text significantly faster (I measured 75x as fast!)

I also checked in a change that will make flattening text significantly faster in the most common cases.

The recommended optimal settings for text in Panda 1.10 are “text-flatten 0” and “text-dynamic-merge 1”.

good news!
imo (in Panda 1.8.1), TextNode was slow at updating even with flattening & merge set to 0, but rendering is not slow if flattened & merged.
does the new method improve updating speed without sacrificing rendering speed?

Yes, the new method combines the best of both worlds: it’s faster to generate and it’s faster to render. Faster than any existing combination of those config flags will give you in 1.8 or 1.9.

In fact, I’m considering just removing the “text-dynamic-merge” option, since in 1.10 it is faster in all circumstances to leave it at “1”, except perhaps in rare situations where the new method can’t be used (e.g. custom graphics, extruded text). Setting it to “0” in 1.10 disables the new system, which will cause everything to be slow.

Enabling “text-flatten” will still incur a very slight performance drop in exchange for a small gain in rendering performance in the case where you have a text with lots of different colors. I’ve also increased the performance of flattening text, though, so the penalty is very small. Even with it enabled, generating text is still a lot faster than in 1.9.

is this new feature in Panda 1.9.1, or only in future versions?

No, it is only available in the development versions of Panda3D, and will be available in Panda3D 1.10.0.

i found a problem with OnscreenText (don’t know whether it only exists in this specific Panda 1.10 build or in every Panda version, i only have Panda 1.10 installed).
with the “threading-model Cull/Draw” setting, sometimes some OnscreenText nodes are not rendered (i see some nodes flashing quickly). with the single-threaded setting, that doesn’t happen.