Sunday, August 31, 2008

Brute Force Tesselation and Quadratic Approximation of Subdivision Surfaces

**Update** This post is somehow outdated and I just lost this small demo. Sorry for that.

Hello there, next post!
Here no relationship with ray tracing; on the contrary, it is all about rasterization and tesselation.
First, some screen shots of the demo provided with this post:

The standard mesh with no tesselation (2408 triangles)

A close up on the tesselated surface in wireframe mode (10,000,000 triangles). About 200 millions triangles per second are displayed on my low-end 8400GS (Here the frame rate is lower due to the wireframe mode slow on Nvidia)

A simple procedural displacement is applied on the tesselated mesh

This technique actually relies on several techniques developed by Tamy Boubekeur.
  • Patch Tesselation
The first thing to do is to tesselate the input mesh. To achieve this, we simply replace each triangle by a patch, i.e. a set of barycentric coordinate triangles. Then, instead of displaying each triangle, we replace each of them by an instanciated patch. In the demo, we therefore create two vertex streams. The first one contains all the data for each triangle and another one simply contains the barycentric coordinates of the triangles of a patch. Then, the instanciation ability of Direct3D allows us to repeat the patch along the surface of the object. For more details about the technique, may one refer to the Graphics Hardware '05 article by Tamy here. However, in this article, no Direct3D instanciation is used.

So, why is this method good? Actually, if the patches are large enough (i.e. if the tesselation rate is high), the pre-transform and post-transform caches used by the GPUs become more and more efficient so that we directly get the maximum theoretical performance of the rasterizers. For example, the frame rate does not change when you use 64 times more triangles. In fact, we can expect the same level of performance for the GPUs which will support tesselation units.without the need to use any hand-coded patches. In my 8400GS, I got 200 millions triangles per second but we may easily expect 1 billion triangles per second on DirectX11 GPUs.

By the way, I use in the demo a simple optimization of the patch using the very good and simple indexation optimizer, "Tipsify" (you may find the reference here). This reorders the patch indexation in linear time using only triangle / point adjacency while maintaining a very good level of performance (and this is much simpler and much faster that stripification algorithms like TriStrip or NvStrip).
  • Approximation of Subdivision Surfaces
The tesselation algorithm via the Direct3D instanciation is the core system to provide many triangles. However, the way triangles are generated (via tesselated patches) does not seem appropriate to the evaluation of subdivision surfaces since they are complex recursive algorithms. (By the way, one may find excellent references and details on subdivision surfaces in the awesomo tutorial located here). In the article, QAS (located here), Tamy proposes to replace the recursive evaluation of the subdivision surface by an analytical approximation. The principle is pretty simple: first, project the triangles vertices and the middle of every edge on the limit surface (i.e. the subdivision surface) and compute a quadratic Bezier triangle which fits these 6 points (the vertices and the edge middle points). This leads to a pretty fast and nice approximation of subdivsion surfaces.
  • About Adaptive Tesselation
In this article, Tamy proposes a very practical way to adaptively tesselate patches without cracks. This may lead for example to simple terrain rendering algorithms (if we ignore the possible memory problems which may require texture mip-mapping or page faulting systems). However, adaptive tesselation on current hardware (or rather APIs since Radeon already have tesselation units) may be less efficient for some cases since it requires to change the patches and therefore to queue more draw calls. However, with nicer APIs (like DX11), this will lead to super efficient solutions.
  • Some Details about Asset Production
What can we do with displacement? Actually, many tools already provide the assets. Using ZBrush, we already have the mesh + normal map + displacement map. Rendering this is simple: tesselate the mesh a subdivision surface approximation, displace it and light it correctly using the normal maps in the shader. Another super nice tool would to extend the mega-texture technique (the original article by Sylvain Lefebvre may be found here) to directly paint displacement on objects. However, the QAS technique may suffer from serious drawback when the artists decide to use creasing. In a modeler like Maya, the artist may work in subdivsion surface mode. He can view the tesselated mesh while edition the control mesh. He can then crease triangles, edges or vertices in any level of tesselation and then see the resulting subdisvion surface. For this kind of input, the QAS algorithm requires deep modification. Nevertheless, even if it seems necessary to provide ways to crease the geometry, none of the artists I work with produces the asset in this mode (A ZBrush approach + polygonal mode edition is generally the way they produce meshes).
  • The Demo
Here is the demo. This demo does not include displacement maps but I quickly hacked a stupid procedural displacement in the shader (that you may change). Using a displacement map does not change the performance on a 8800GT (due to the large number of texture units on it). On the 8600GT I have in office, the application becomes texture bound. I do not provide the complete source code yet since it is a bit in a dirty state. I will however try to post it ASAP.



Tamy Boubekeur said...

Hi Ben,
Nice post, and interesting comments on actual DX implementation. You're right about the modeling procedure: an approximation model like QAS is not adapted for multiscale editing. The input geometry is rather to be defined as a coarse mesh + disp-map than a multiresolution subdivision surface (Zorin 97), where geometry can be edited at any level. Note that multiscale editing may still be performed by using hierarchical disp-maps, such as wavelet-based ones. But it remains less flexible than direct polygonal editing at arbitrary level. Overall, models done with ZBrush are better than the ones donewith Maya or 3DS Max (unless they are converted to dips-subsurf) for QAS rendering.

About the tessellation, note that there is an implementation of the Graphics hardware 2005 paper in the nVidia OpenGL SDK (v10). They call that Instanced Tessellation.

Last, two tricks to reduce the number of drawing calls when doing adaptive tessellation:
1. reorder calls to cluster patches sharing the same refinement configuration.
2. constraint the depth-tag metric to impose a smoothly varying refinement level (this allows to make the matrix of refinement patterns sparse and makes bigger the clusters of the previous trick).
In fact, the adaptive refinement kernel was done to be as generic as possible, but it can be specialized to a particular scene to avoid a specific performance defect, such as memory issues or the number of drawing calls.


Tamy Boubekeur said...

There was an interesting talk by NVIdia at SIGGRAPH 08 about those topics, which is now online:

Unknown said...

Hi Bouliiii,

Do you still have the source/binaries for the demo. I'm trying to implement this myself in directx9 however there are a few steps I don't quite understand.

There seems to be very few directx9 demos of Instance tessellation.

It would be great if you could re-post the source or perhaps email it to me?