I had a discussion on Discord with a couple of people from the UI team recently about rendering text and thought it appropriate to post here to share it better.
What I want to cover is essentially the problem of rendering arbitrary text in a real-time context. There are a number of requirements here for which the most naive solutions yield less than ideal results:
- Support arbitrary text resizing. For example, if text is placed in a 3D world and the player walks very close to it, a single glyph may cover the entire screen. Other cases include animating UI text sizes.
- Very efficient rendering in terms of both GPU and CPU resources. Large spikes are particularly undesirable, even as the text changes in various ways.
- Dynamic visual effects such as shadows, outlines, gradients or glow.
- Support for arbitrary glyphs as defined by font files, including emoji and other symbols.
Before I go into the industry solutions I know of, let’s cover the source data we are actually trying to render. Mike Acton would be proud.
- Text is usually defined as a series of Unicode “Code Points”.
- A Font is a collection of Glyphs, together with a mapping from Code Points to Glyphs.
- A Glyph is a representation of the actual visual artifact that is put on the screen.
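To make the terms above concrete, here is a deliberately simplified sketch of that data model in Rust (all names hypothetical; real fonts map code points to glyphs through a "cmap" table plus shaping rules, and the mapping is not one-to-one):

```rust
use std::collections::HashMap;

/// A Unicode code point, e.g. 'A' is U+0041.
type CodePoint = u32;

/// A glyph: the visual artifact, reduced here to an id and an advance width.
struct Glyph {
    id: u16,
    advance: f32, // horizontal advance in font units
}

/// A font: a collection of glyphs, keyed by code point.
struct Font {
    glyphs: HashMap<CodePoint, Glyph>,
}

impl Font {
    fn glyph(&self, cp: CodePoint) -> Option<&Glyph> {
        self.glyphs.get(&cp)
    }
}
```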
The most popular glyph representation in fonts is vector outlines (TrueType and OpenType, for example), which may be familiar if you've done graphic design in Illustrator or worked with SVG. A vector font uses lines and curves to represent each glyph.
So to summarize: the job of a text rendering system is to take a position on the screen and an input string of Unicode Code Points, and output pixels into an image buffer.
Seems pretty straight-forward so far, right?
There are essentially two parts:
- Find out which glyphs go where based on the string input. This stage outputs vector primitives and positional information.
- Render the vector primitives.
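The first stage can be sketched as a function signature. This is a naive stand-in (names and the fixed-advance layout are hypothetical); a real implementation delegates to a shaping library and handles kerning, bidi and so on:

```rust
/// A glyph positioned on screen, the output of the layout stage.
struct PositionedGlyph {
    glyph_id: u16,
    x: f32,
    y: f32,
}

/// Stage 1: layout. Unicode text in, positioned glyphs out.
/// This toy version assumes left-to-right text and a fixed advance;
/// real layout is where shaping, kerning and bidi happen.
fn layout(text: &str, origin: (f32, f32), advance: f32) -> Vec<PositionedGlyph> {
    text.chars()
        .enumerate()
        .map(|(i, c)| PositionedGlyph {
            glyph_id: c as u16, // placeholder: real code maps via the font's cmap
            x: origin.0 + i as f32 * advance,
            y: origin.1,
        })
        .collect()
}
```

Stage 2 then consumes the positioned glyphs and rasterizes their vector primitives, which is where the rest of this post lives.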
Should be easy & fast, right? Well, it turns out languages are hard: there are right-to-left scripts, composite characters, control codes and other fun stuff that make the first part a bit more complicated. Laying out glyphs, doing kerning and handling all the Unicode code point logic is a lot of work, but usually the OS or a third-party library can handle it for us. Unfortunately, because of the (necessary) complexity of Unicode, this part is hard to offload to the GPU.
Rendering vector primitives on the GPU is kind of solved now? I mean, people have done it, but I don’t think it’s actually used in any of the big game engines yet.
CPU Texture Rasterization
While languages are hard, your OS or libraries like FreeType have good, simple-to-use support for taking Unicode text and rasterizing it onto a bitmap texture at a specific font size. The texture can then be uploaded to the GPU and rendered in-engine like any other texture.
The good part about this technique is that it’s really simple to implement, really easy to render and the CPU rasterization is generally quite optimized. The benefits really end there though, unfortunately.
While CPU rasterization is generally fast, it’s not fast enough to run every frame on every text string on the screen. To implement this properly, you’ll need to cache both the layout and rasterization step. The most popular approach is to maintain cache textures with glyphs on the GPU and render them just like you’d render a spritesheet. Sprite = Glyph, essentially. It’s possible to upload only the glyphs that changed when new glyphs need to be added to the GPU cache so it doesn’t need to break your GPU bandwidth budget either.
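A minimal sketch of such a glyph cache (names and the allocation strategy are hypothetical; a real cache packs glyphs into an atlas with a shelf or skyline packer):

```rust
use std::collections::HashMap;

/// Key: which glyph at which pixel size. Scaling a cached bitmap
/// would blur, so the size must be part of the key.
#[derive(PartialEq, Eq, Hash, Clone, Copy)]
struct GlyphKey {
    glyph_id: u16,
    px_size: u32,
}

/// Where the rasterized glyph lives in the cache texture.
#[derive(Clone, Copy)]
struct CacheSlot {
    x: u32,
    y: u32,
    w: u32,
    h: u32,
}

struct GlyphCache {
    slots: HashMap<GlyphKey, CacheSlot>,
    uploads: Vec<GlyphKey>, // newly rasterized glyphs pending GPU upload
}

impl GlyphCache {
    fn new() -> Self {
        GlyphCache { slots: HashMap::new(), uploads: Vec::new() }
    }

    /// Return the slot for a glyph, rasterizing and queueing a GPU upload
    /// only on a cache miss, so steady-state frames upload nothing.
    fn get_or_insert(&mut self, key: GlyphKey, alloc: impl FnOnce() -> CacheSlot) -> CacheSlot {
        if let Some(slot) = self.slots.get(&key) {
            return *slot;
        }
        let slot = alloc(); // real code would rasterize and pack into the atlas here
        self.slots.insert(key, slot);
        self.uploads.push(key);
        slot
    }
}
```

Only the entries in `uploads` need to hit the GPU each frame, which is what keeps this within a bandwidth budget.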
Since glyphs are stored in a texture, you can’t render at a different pixel screen size without artifacts. This usually means you need to keep text in your game at a few set sizes and maintain separate caches for each size.
Dynamic visual effects such as shadows, outlines or glows are a bit hacky to implement. Usually you’d draw each glyph N extra times in each direction with a different color to create outlines, shadows or glows.
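The multi-draw hack looks roughly like this (a sketch; `draw` stands in for whatever pushes a sprite draw command in your renderer):

```rust
/// Fake an outline by drawing the glyph once per offset in the outline
/// color, then once in the fill color on top. Eight extra draws per
/// glyph, which is why this is considered hacky.
fn draw_outlined(x: f32, y: f32, thickness: f32, mut draw: impl FnMut(f32, f32, [f32; 4])) {
    let outline = [0.0, 0.0, 0.0, 1.0]; // black outline
    let fill = [1.0, 1.0, 1.0, 1.0];    // white fill
    for dx in [-1.0, 0.0, 1.0] {
        for dy in [-1.0, 0.0, 1.0] {
            if dx != 0.0 || dy != 0.0 {
                draw(x + dx * thickness, y + dy * thickness, outline);
            }
        }
    }
    draw(x, y, fill); // the glyph itself, drawn last so it sits on top
}
```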
On the plus side, implementing this solution can unify the sprite and text rendering passes. Rendering a text string will be a matter of rendering a number of sprites at very specific positions, so a UiText component may produce a number of sprite draw commands to the sprite rendering pass for example.
Unity’s Text implementation uses this solution.
Signed Distance Fields encoded in Texture
Chris Green’s beautiful paper, “Improved Alpha-Tested Magnification for Vector Textures and Special Effects”, has inspired a lot of implementations for good-looking text rendering in games. Valve’s games use it, and for Unity there’s TextMeshPro.
Basically: each pixel stores the distance to the closest glyph edge instead of the alpha value of the rasterized glyph. But you should really read the paper.
- Rendering the text requires a relatively simple custom shader so it’s quite fast.
- Since the texture encodes distance instead of alpha, bilinear interpolation works well. The text scales from very small to very large without many visible artifacts.
- Outlines, glows and shadows are easy to implement thanks to the distance field.
- Generating the distance field takes a really long time, especially with the method detailed in the paper. This generally means it’s not viable to generate new glyphs for the cache at runtime: all glyphs you want to use in your game need to be pre-generated. That makes it a non-starter for player-generated text, but it’s fine for all static text.
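The shading side is small enough to sketch. Per Green's paper, the texture stores distance with 0.5 at the glyph edge, and the shader maps the sampled value to coverage with a smooth threshold; an outline is just a second, lower threshold on the same distance. Shown here as plain Rust for clarity (in practice this is a few lines of fragment shader, with `softness` derived from screen-space derivatives):

```rust
/// GLSL-style smoothstep: 0 below e0, 1 above e1, smooth in between.
fn smoothstep(e0: f32, e1: f32, x: f32) -> f32 {
    let t = ((x - e0) / (e1 - e0)).clamp(0.0, 1.0);
    t * t * (3.0 - 2.0 * t)
}

/// `dist` is the sampled distance-field value in [0, 1]; 0.5 is the edge.
/// `softness` controls the antialiasing width.
fn glyph_alpha(dist: f32, softness: f32) -> f32 {
    smoothstep(0.5 - softness, 0.5 + softness, dist)
}

/// Outline: threshold the same distance field a bit further out.
/// Composite this under the fill to get an outlined glyph.
fn outline_alpha(dist: f32, softness: f32, width: f32) -> f32 {
    smoothstep(0.5 - width - softness, 0.5 - width + softness, dist)
}
```

Glows and drop shadows fall out the same way: sample the field with an offset or a wider, softer threshold.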
The implementation of this solution looks quite similar to the previous one, except that generating the texture cache and the rendering pass differ, and you generally need fewer sizes. Maintaining the texture cache, looking up glyphs and so on will be identical. It does require a custom shader and pass, though.
Rendering Vectors on the GPU
Aras’ “Font Rendering is Getting Interesting” is a good survey of the solutions in this space from about a year ago, and you might prefer his article over my post. Since he posted it, a commercial product called Slug has appeared that claims to render quadratic Bézier curves directly on the GPU. Quadratic Bézier curves can be created from the vector primitives output by font libraries, so this solves the second step (rendering vector primitives) entirely on the GPU. Since it rasterizes on the GPU, scaling artifacts are entirely eliminated, but the method may not work on older hardware due to its compute requirements.
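For reference, the curve primitive these renderers consume is tiny. TrueType outlines are already quadratic Béziers; cubic outlines (CFF-flavored OpenType) have to be converted or handled separately. A point on a quadratic segment is just:

```rust
/// Evaluate a quadratic Bézier segment at parameter t in [0, 1].
/// p0 and p2 are on-curve endpoints, p1 is the off-curve control point.
fn quad_bezier(p0: (f32, f32), p1: (f32, f32), p2: (f32, f32), t: f32) -> (f32, f32) {
    let u = 1.0 - t;
    (
        u * u * p0.0 + 2.0 * u * t * p1.0 + t * t * p2.0,
        u * u * p0.1 + 2.0 * u * t * p1.1 + t * t * p2.1,
    )
}
```

The hard part a Slug-style renderer solves is not evaluating the curve but computing per-pixel coverage from many such segments in the fragment shader, which is where the compute requirements come from.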
Where do we go?
RustType and gfx-glyph are already used in Amethyst to implement solution #1. I think a unification of the sprite and text rendering passes is a good next step.
After that, maybe rendering vectors on the GPU is a more forward-thinking step than implementing the Distance Field in Texture thing?