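Rendering outlines, unless you render only a dozen characters in total, remains a "no go" because of the number of vertices needed per character to approximate curvature. There have been approaches that evaluate Bézier curves in the pixel shader instead, but these are not easily antialiased (which is trivial with a distance-map-textured quad), and evaluating curves in the shader is still computationally much more expensive than necessary.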
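The best trade-off between "fast" and "quality" is still textured quads with a signed distance field texture. It is very slightly slower than using a plain, normal textured quad, but not by much. The quality, on the other hand, is in an entirely different ballpark. The results are truly stunning, it is as fast as you can get, and effects such as glow are trivially easy to add, too. Also, the technique can be downgraded nicely to older hardware, if needed.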
See the famous Valve paper for the technique.
The technique is conceptually similar to how implicit surfaces (metaballs and such) work, though it does not generate polygons. It runs entirely in the pixel shader and takes the distance sampled from the texture as a distance function. Everything above a chosen threshold (usually 0.5) is "in", everything else is "out". In the simplest case, on 10 year old non-shader-capable hardware, setting the alpha test threshold to 0.5 will do that exact thing (though without special effects and antialiasing).
If one wants to add a little more weight to the font (faux bold), a slightly smaller threshold will do the trick without modifying a single line of code (just change your "font_weight" uniform). For a glow effect, one simply considers everything above one threshold as "in" and everything above another (smaller) threshold as "out, but in glow", and LERPs between the two. Antialiasing works similarly.
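To make the thresholding concrete, here is a minimal CPU-side sketch of what the pixel shader does with one sampled distance value. It is written in Python only so that it can be run anywhere; the function names, the default glow threshold, and the antialiasing width are my own assumptions for illustration, not values from the Valve paper.

```python
def smoothstep(edge0, edge1, x):
    """Clamped Hermite interpolation, same formula as GLSL smoothstep()."""
    t = min(max((x - edge0) / (edge1 - edge0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)


def alpha_test(distance, threshold=0.5):
    """Simplest case: hard in/out decision, no antialiasing."""
    return distance > threshold


def coverage(distance, font_weight=0.5, aa_width=0.02):
    """Antialiased 'in' amount: a narrow ramp around the threshold.
    A smaller font_weight makes the glyph bolder."""
    return smoothstep(font_weight - aa_width, font_weight + aa_width, distance)


def shade(distance, font_weight=0.5, glow_threshold=0.3, aa_width=0.02,
          text_rgb=(1.0, 1.0, 1.0), glow_rgb=(0.2, 0.6, 1.0)):
    """Return (r, g, b, a) for one sampled distance in [0, 1].
    glow_threshold must be smaller than font_weight (assumed, not checked)."""
    inside = coverage(distance, font_weight, aa_width)
    # "Out, but in glow": fades in between the two thresholds.
    glow = smoothstep(glow_threshold, font_weight, distance)
    # LERP between the glow colour and the text colour.
    rgb = tuple(g + (t - g) * inside for g, t in zip(glow_rgb, text_rgb))
    alpha = glow + (1.0 - glow) * inside
    return rgb + (alpha,)
```

In a real shader, `distance` would be the texture sample and `font_weight` a uniform; changing only that uniform gives the faux bold described above.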
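By using an 8-bit signed distance value instead of a single bit, this technique increases the effective resolution of your texture map 16-fold in each dimension (instead of black and white, all possible shades are used, so we get 256 times the information from the same storage, and 256 = 16 × 16). But even if you magnify far beyond 16x, the result still looks quite acceptable. Long straight lines will eventually become a bit wiggly, but there will be none of the typical "blocky" sampling artefacts.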
You can use a geometry shader to generate the quads out of points (reducing bus bandwidth), but honestly the gains are rather marginal. The same is true for instanced character rendering as described in GPG8. The overhead of instancing is only amortized if you have a lot of text to draw. The gains, in my opinion, do not justify the added complexity and the loss of downgradeability. Plus, you are either limited by the number of constant registers, or you have to read from a texture buffer object, which is non-optimal for cache coherence (and the intent was to optimize to begin with!).
A simple, plain old vertex buffer is just as fast (possibly faster) if you schedule the upload a bit ahead of time, and it will run on all hardware built during the last 15 years. It is also not limited to any particular number of characters in your font, nor to a particular number of characters to render.
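For illustration, filling such a vertex buffer on the CPU could look roughly like the sketch below. The glyph table, the kerning table, and the vertex layout (x, y, s, t as 32-bit floats, with 16-bit indices) are made-up assumptions; real metrics would come from whatever tool generated your font atlas.

```python
from array import array

# Hypothetical metrics: advance width, quad size, and atlas rectangle.
GLYPHS = {
    "A": {"advance": 0.6, "size": (0.6, 0.7), "uv": (0.0, 0.0, 0.1, 0.1)},
    "V": {"advance": 0.6, "size": (0.6, 0.7), "uv": (0.1, 0.0, 0.2, 0.1)},
}
KERNING = {("A", "V"): -0.08}  # hypothetical kerning pair


def build_text_vertices(text, glyphs, kerning, origin=(0.0, 0.0)):
    """Return (vertices, indices) for one line of text.
    Each character becomes 4 vertices (x, y, s, t) and 6 indices."""
    vertices = array("f")
    indices = array("H")  # 16-bit indices: fine for up to 16383 quads
    pen_x, pen_y = origin
    prev = None
    for quad, ch in enumerate(text):
        glyph = glyphs[ch]
        pen_x += kerning.get((prev, ch), 0.0)  # strictly sequential step
        w, h = glyph["size"]
        s0, t0, s1, t1 = glyph["uv"]
        x0, y0, x1, y1 = pen_x, pen_y, pen_x + w, pen_y + h
        vertices.extend((x0, y0, s0, t0,
                         x1, y0, s1, t0,
                         x1, y1, s1, t1,
                         x0, y1, s0, t1))
        base = 4 * quad
        indices.extend((base, base + 1, base + 2, base, base + 2, base + 3))
        pen_x += glyph["advance"]
        prev = ch
    return vertices, indices
```

The two arrays can then be handed to the usual buffer upload calls as raw bytes (for example via `vertices.tobytes()`).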
If you are sure that you do not have more than 256 characters in your font, texture arrays may be worth considering as a way to save bus bandwidth, similar to generating quads from points in the geometry shader. When using an array texture, the texture coordinates of all quads have identical, constant s and t coordinates and only differ in the r coordinate, which is equal to the character index to render.
But as with the other techniques, the expected gains are marginal and come at the cost of being incompatible with previous-generation hardware.
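A small sketch of what the array-texture coordinates look like, assuming a hypothetical `layer_of` mapping from characters to texture layers: the s and t values are the same four corners for every quad, so only the r value (the layer) carries per-character information.

```python
# The same four (s, t) corners are reused for every quad.
CORNERS = ((0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0))


def array_texture_coords(text, layer_of):
    """Return one (s, t, r) triple per quad corner.
    layer_of is a hypothetical mapping: character -> array texture layer."""
    coords = []
    for ch in text:
        r = float(layer_of[ch])
        coords.extend((s, t, r) for s, t in CORNERS)
    return coords


def layers_only(text, layer_of):
    """The only per-character data that actually differs: the layer index."""
    return [layer_of[ch] for ch in text]
```

Since s and t are constant, they could be derived in the shader instead of being uploaded, which is where the bandwidth saving would come from.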
Jonathan Dummer provides a handy tool for generating distance textures (see its description page).
Update:
As more recently pointed out in Programmable Vertex Pulling (D. Rákos, "OpenGL Insights", pp. 239), there is no significant extra latency or overhead associated with pulling vertex data programmatically from the shader on the newest generations of GPUs, as compared to doing the same using the standard fixed function.
Also, the latest generations of GPUs have more and more reasonably sized general-purpose L2 caches (e.g. 1536 KiB on NVIDIA Kepler), so one may expect the incoherent access problem when pulling random offsets for the quad corners from a buffer texture to be less of a problem.
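This makes the idea of pulling constant data (such as quad sizes) from a buffer texture even more attractive. A hypothetical implementation could thus reduce PCIe and memory transfers, as well as GPU memory, to a minimum with an approach like this: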
1. Upload only a character index (one per character to be displayed) as the sole input to a vertex shader that passes on this index and gl_VertexID, and amplify that to 4 points in the geometry shader, still having the character index and the vertex ID (this will be "gl_PrimitiveID made available in the vertex shader") as the sole attributes, and capture this via transform feedback. This will be fast, because there are only two output attributes (the main bottleneck in the GS), and it is close to a "no-op" otherwise in both stages.
2. Bind a buffer texture which contains, for each character in the font, the textured quad's vertex positions relative to the base point (these are basically the "font metrics"). This data can be compressed to 4 numbers per quad by storing only the offset of the bottom left vertex and encoding the width and height of the axis-aligned box. Assuming half floats, this will be 8 bytes of constant buffer per character, so a typical 256-character font could fit completely into 2 KiB of L1 cache.
3. Set a uniform for the baseline.
4. Bind a buffer texture with horizontal offsets. These could probably even be calculated on the GPU, but it is much easier and more efficient to do that kind of thing on the CPU, as it is a strictly sequential operation and not at all trivial (think of kerning). Also, it would need another feedback pass, which would be another sync point.
5. Render the previously generated data from the feedback buffer. The vertex shader pulls the horizontal offset of the base point and the offsets of the corner vertices from buffer objects (using the primitive ID and the character index). The original vertex ID of the submitted vertices is now our "primitive ID" (remember the GS turned the vertices into quads).
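The metrics buffer described in the steps above can be sketched on the CPU as follows. This is only an illustration of the data layout under my own assumptions (little-endian half floats, corners numbered counter-clockwise from the bottom left); `corner_offset` mimics the lookup a vertex shader would do with a texel fetch, it is not shader code.

```python
import struct

BYTES_PER_CHAR = 8  # 4 half floats: offset_x, offset_y, width, height


def pack_font_metrics(metrics):
    """metrics: one (offset_x, offset_y, width, height) tuple per character
    index. Returns the bytes one would upload into the buffer texture."""
    return b"".join(struct.pack("<4e", *m) for m in metrics)


def corner_offset(buffer, char_index, corner):
    """Position of one quad corner relative to the base point.
    corner: 0 = bottom left, 1 = bottom right, 2 = top right, 3 = top left
    (an assumed numbering)."""
    ox, oy, w, h = struct.unpack_from("<4e", buffer, BYTES_PER_CHAR * char_index)
    dx = w if corner in (1, 2) else 0.0
    dy = h if corner in (2, 3) else 0.0
    return (ox + dx, oy + dy)
```

With 256 characters, the packed buffer is 256 × 8 = 2048 bytes, which is the 2 KiB figure mentioned above.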
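Like this, one could ideally reduce the required vertex bandwidth by 75% (amortized), though it would only be able to render a single line. If one wanted to render several lines in one draw call, one would need to add the baseline to the buffer texture rather than using a uniform (making the bandwidth gains smaller).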
However, even assuming a 75% reduction -- since the vertex data to display "reasonable" amounts of text is only somewhere around 50-100kiB (which is practically zero to a GPU or a PCIe bus) -- I still doubt that the added complexity and losing backwards-compatibility is really worth the trouble. Reducing zero by 75% is still only zero. I have admittedly not tried the above approach, and more research would be needed to make a truly qualified statement. But still, unless someone can demonstrate a truly stunning performance difference (using "normal" amounts of text, not billions of characters!), my point of view remains that for the vertex data, a simple, plain old vertex buffer is justifiably good enough to be considered part of a "state of the art solution". It's simple and straightforward, it works, and it works well.
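To put rough numbers on that, here is a back-of-the-envelope calculation. The vertex layout (x, y, s, t as 32-bit floats, four vertices per quad, index data ignored) and the figure of 1000 characters are assumptions chosen for illustration only.

```python
BYTES_PER_VERTEX = 4 * 4  # assumed layout: x, y, s, t as 32-bit floats
VERTICES_PER_QUAD = 4     # indexed quads; index data is not counted here


def vertex_bytes(num_chars):
    """Vertex data needed for num_chars characters with a plain buffer."""
    return num_chars * VERTICES_PER_QUAD * BYTES_PER_VERTEX


def bytes_saved(num_chars, reduction_percent=75):
    """Bytes saved if the vertex data shrinks by the given percentage."""
    return vertex_bytes(num_chars) * reduction_percent // 100


full = vertex_bytes(1000)   # 64000 bytes, i.e. 62.5 KiB
saved = bytes_saved(1000)   # 48000 bytes
```

Under these assumptions, a thousand characters cost about 62.5 KiB, and the elaborate scheme would save less than 50 KiB per upload.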
Since "OpenGL Insights" is mentioned above, it is worth also pointing out the chapter "2D Shape Rendering by Distance Fields" by Stefan Gustavson, which explains distance field rendering in great detail.
Update 2016:
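Meanwhile, there exist several additional techniques which aim to remove the corner rounding artefacts that become disturbing at extreme magnifications.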
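One approach simply uses pseudo-distance fields instead of distance fields (the difference being that the distance is not the shortest distance to the actual outline, but the shortest distance to the outline or an imaginary line protruding over the edge). This is somewhat better, and it runs at the same speed (identical shader), using the same amount of texture memory.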
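Another approach uses a three-channel texture (details and an implementation are available on GitHub). This aims to be an improvement over the and-or hacks used previously to address the issue. The quality is good and it is only slightly, almost unnoticeably, slower, but it uses three times as much texture memory. Also, extra effects (e.g. glow) are harder to get right.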
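How the three channels are combined is not spelled out here. A common formulation of multi-channel distance fields, which I am assuming is the one meant, takes the median of the three channel values and then thresholds that exactly like a single-channel distance. A minimal sketch with made-up function names:

```python
def median3(r, g, b):
    """Median of three values without sorting (maps well to shader min/max)."""
    return max(min(r, g), min(max(r, g), b))


def msdf_inside(r, g, b, threshold=0.5):
    """Assumed reconstruction: threshold the median of the three channels,
    the same way a single-channel distance would be thresholded."""
    return median3(r, g, b) > threshold
```

Everything after the median (antialiasing, faux bold) would then work as with an ordinary distance field.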
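Lastly, storing the actual Bézier curves making up the characters and evaluating them in a fragment shader has become practical, with slightly inferior performance (but not so much that it is a problem) and stunning results even at the highest magnifications.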
A WebGL demo renders a large PDF in real time using this technique.