**Update** This post is somehow outdated and I just lost this small demo. Sorry for that.

Hello there, next post!

Here no relationship with ray tracing; on the contrary, it is all about rasterization and tesselation.

First, some screen shots of the demo provided with this post:

Hello there, next post!

Here no relationship with ray tracing; on the contrary, it is all about rasterization and tesselation.

First, some screen shots of the demo provided with this post:

A close up on the tesselated surface in wireframe mode (10,000,000 triangles). About 200 millions triangles per second are displayed on my low-end 8400GS (Here the frame rate is lower due to the wireframe mode slow on Nvidia)

__

A simple procedural displacement is applied on the tesselated mesh

__

This technique actually relies on several techniques developed by Tamy Boubekeur.
__

A simple procedural displacement is applied on the tesselated mesh

__

- Patch Tesselation

The first thing to do is to tesselate the input mesh. To achieve this, we simply replace each triangle by a patch, i.e. a set of barycentric coordinate triangles. Then, instead of displaying each triangle, we replace each of them by an instanciated patch. In the demo, we therefore create two vertex streams. The first one contains all the data for each triangle and another one simply contains the barycentric coordinates of the triangles of a patch. Then, the instanciation ability of Direct3D allows us to repeat the patch along the surface of the object. For more details about the technique, may one refer to the Graphics Hardware '05 article by Tamy here. However, in this article, no Direct3D instanciation is used.

So, why is this method good? Actually, if the patches are large enough (i.e. if the tesselation rate is high), the pre-transform and post-transform caches used by the GPUs become more and more efficient so that we directly get the maximum theoretical performance of the rasterizers. For example, the frame rate does not change when you use 64 times more triangles. In fact, we can expect the same level of performance for the GPUs which will support tesselation units.without the need to use any hand-coded patches. In my 8400GS, I got 200 millions triangles per second but we may easily expect 1 billion triangles per second on DirectX11 GPUs.

By the way, I use in the demo a simple optimization of the patch using the very good and simple indexation optimizer, "Tipsify" (you may find the reference here). This reorders the patch indexation in linear time using only triangle / point adjacency while maintaining a very good level of performance (and this is much simpler and much faster that stripification algorithms like TriStrip or NvStrip).

So, why is this method good? Actually, if the patches are large enough (i.e. if the tesselation rate is high), the pre-transform and post-transform caches used by the GPUs become more and more efficient so that we directly get the maximum theoretical performance of the rasterizers. For example, the frame rate does not change when you use 64 times more triangles. In fact, we can expect the same level of performance for the GPUs which will support tesselation units.without the need to use any hand-coded patches. In my 8400GS, I got 200 millions triangles per second but we may easily expect 1 billion triangles per second on DirectX11 GPUs.

By the way, I use in the demo a simple optimization of the patch using the very good and simple indexation optimizer, "Tipsify" (you may find the reference here). This reorders the patch indexation in linear time using only triangle / point adjacency while maintaining a very good level of performance (and this is much simpler and much faster that stripification algorithms like TriStrip or NvStrip).

- Approximation of Subdivision Surfaces

The tesselation algorithm via the Direct3D instanciation is the core system to provide many triangles. However, the way triangles are generated (via tesselated patches) does not seem appropriate to the evaluation of subdivision surfaces since they are complex recursive algorithms. (By the way, one may find excellent references and details on subdivision surfaces in the awesomo tutorial located here). In the article, QAS (located here), Tamy proposes to replace the recursive evaluation of the subdivision surface by an analytical approximation. The principle is pretty simple: first, project the triangles vertices and the middle of every edge on the limit surface (i.e. the subdivision surface) and compute a quadratic Bezier triangle which fits these 6 points (the vertices and the edge middle points). This leads to a pretty fast and nice approximation of subdivsion surfaces.

Enjoy!

- About Adaptive Tesselation

- Some Details about Asset Production

- The Demo

Enjoy!

## 3 comments:

Hi Ben,

Nice post, and interesting comments on actual DX implementation. You're right about the modeling procedure: an approximation model like QAS is not adapted for multiscale editing. The input geometry is rather to be defined as a coarse mesh + disp-map than a multiresolution subdivision surface (Zorin 97), where geometry can be edited at any level. Note that multiscale editing may still be performed by using hierarchical disp-maps, such as wavelet-based ones. But it remains less flexible than direct polygonal editing at arbitrary level. Overall, models done with ZBrush are better than the ones donewith Maya or 3DS Max (unless they are converted to dips-subsurf) for QAS rendering.

About the tessellation, note that there is an implementation of the Graphics hardware 2005 paper in the nVidia OpenGL SDK (v10). They call that Instanced Tessellation.

Last, two tricks to reduce the number of drawing calls when doing adaptive tessellation:

1. reorder calls to cluster patches sharing the same refinement configuration.

2. constraint the depth-tag metric to impose a smoothly varying refinement level (this allows to make the matrix of refinement patterns sparse and makes bigger the clusters of the previous trick).

In fact, the adaptive refinement kernel was done to be as generic as possible, but it can be specialized to a particular scene to avoid a specific performance defect, such as memory issues or the number of drawing calls.

Tamy

There was an interesting talk by NVIdia at SIGGRAPH 08 about those topics, which is now online: http://developer.nvidia.com/object/siggraph-2008-Subdiv.html

Hi Bouliiii,

Do you still have the source/binaries for the demo. I'm trying to implement this myself in directx9 however there are a few steps I don't quite understand.

There seems to be very few directx9 demos of Instance tessellation.

It would be great if you could re-post the source or perhaps email it to me?

Post a Comment