Packet data format for a triangle node - Development of the Hybris rendering architecture

3.2 Development of the Hybris rendering architecture

3.2.11 Packet data format for a triangle node

The tile renderer back-end expects a description of the triangles it has to render for each tile. A data packet protocol for transmitting and storing the information required to express a triangle is therefore needed. To keep things simple, the packet defines only a single triangle. Although using individual triangles may be redundant, it keeps the design simple, as on-the-fly repartitioning of triangle strips and meshes into tiles is considered too complex (for now). Thus a multi-triangle packet format will not be studied. Some other architectures, e.g. InfiniteReality and Neon [158, 141], allow triangle strips, but they do not have to deal with the added complexity of tile partitioning. Single-triangle packets may be compared to fixed-length instructions in the RISC¹ architecture design philosophy, while multi-triangle mesh or triangle strip packets may be compared to CISC.

There are several possible options for defining the packet data format, three of which will be identified and discussed below.

Raw. Stores start and end coordinates for each edge, i.e. the three vertices of each triangle are stored, possibly rounded to integer or sub-pixel coordinates. For each vertex, triangle parameter values are also stored, e.g. colours, texture coordinates and depth.

Differential. Stores only a differential expression of the triangle. Starting values for incremental linear interpolation of the edges as well as parameters along the edges are stored, along with differentials to add at each step in forward differencing interpolations.

Plane Equations. Stores half-plane edge functions describing the three edges of the triangle, as well as plane equations for each of the triangle’s parameters.

Each of these three methods has advantages and disadvantages, depending on how we choose to partition the rendering architecture.

Raw triangle description

The first method, Raw, is conceptually the simplest, as the three vertices of a triangle can be stored and transmitted virtually unchanged, except for some rounding and packing depending on the desired bit-count. When a raw triangle is received it is necessary to set up the interpolation parameters, which involves a division for each edge to calculate edge slopes. A traditional full-frame rendering engine only reads each packet once, so it may not matter much if these divisions are done before or after transmission of the triangle. However, a tile-based renderer must perform these edge slope divisions for every tile the triangle overlaps, making this approach questionable for tile engines. An advantage is good compression properties, as the vertex data size can be reduced by simple quantization.

¹ RISC: Reduced Instruction Set Computer, CISC: Complex Instruction Set Computer, see [174].

Systems known to use Raw transmission of triangles include the Neon [140, 141], which allows quantization of data down to 12 bytes per vertex using fixed-point notations, resulting in triangle packet sizes down to 36 bytes in the simplest case. This allows transmission of up to 2.6 million triangles/s using 32-bit PCI. Other examples are [116, 115] and the Silicon Graphics GTX [4, 63], which internally uses a three-step decomposition into first raw triangles, then raw edges and finally raw spans, while the SGI InfiniteReality [158] uses raw triangles (or triangle strips) directly on its internal bus.
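The packet size arithmetic above can be illustrated with a minimal sketch. The field layout below (12.4 fixed-point x and y, a 32-bit depth value and four 8-bit colour channels) is purely an assumption chosen to add up to 12 bytes per vertex; it is not the documented Neon format, and the names `VERTEX_FORMAT`, `pack_vertex` and `pack_raw_triangle` are hypothetical.

```python
import struct

# Hypothetical 12-byte vertex layout (an assumption, not Neon's real format):
#   x, y : unsigned 16-bit fixed point with 4 sub-pixel bits (12.4)
#   z    : unsigned 32-bit depth
#   rgba : four 8-bit colour channels
VERTEX_FORMAT = "<HHI4B"          # "<" = little-endian, no padding
VERTEX_SIZE = struct.calcsize(VERTEX_FORMAT)
SUBPIXEL_BITS = 4


def to_fixed(value, frac_bits=SUBPIXEL_BITS):
    """Quantize a floating-point coordinate to fixed point."""
    return int(round(value * (1 << frac_bits)))


def pack_vertex(x, y, z, rgba):
    """Pack one quantized vertex into 12 bytes."""
    return struct.pack(VERTEX_FORMAT, to_fixed(x), to_fixed(y), z, *rgba)


def pack_raw_triangle(vertices):
    """A raw triangle packet is simply its three packed vertices."""
    return b"".join(pack_vertex(*v) for v in vertices)


triangle = [
    (10.25, 20.5, 1000, (255, 0, 0, 255)),
    (40.0, 22.75, 1200, (0, 255, 0, 255)),
    (25.5, 60.0, 900, (0, 0, 255, 255)),
]
packet = pack_raw_triangle(triangle)

# Bus bandwidth needed for the quoted triangle rate, in bytes per second.
bandwidth = len(packet) * 2.6e6
```

With 36-byte packets, 2.6 million triangles/s corresponds to about 93.6 MB/s, which fits within the roughly 133 MB/s theoretical peak of a 32-bit PCI bus if the common 33 MHz variant is assumed.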

Differential triangle description

The second method, Differential, performs the triangle set-up divisions required for edge slope calculations before storing the data in the triangle node packet. This approach is tied to an implementation which uses slope-based interpolation, either by direct evaluation or by forward differencing. A naive implementation would need to store interpolation start values for the first vertex of each edge, and compute parameter interpolation slopes for each span from the interpolated parameter values at the edges. Examples of the differential method are found in the 3Dlabs GLint, the 3dfx Voodoo Graphics accelerator and other early PC graphics accelerators without hardware support for triangle set-up. Most other graphics architectures transfer raw triangles, even though they use differential expressions internally, requiring them to do triangle set-up every time the triangle is read. Since a global framebuffer architecture only reads a triangle node once, this can be an advantage.
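As a rough illustration of the set-up step described here, the sketch below performs the single division per edge once and then walks the edge by forward differencing, adding the stored slope at each scanline. It assumes integer scanline coordinates for the edge end points; `EdgeDifferential`, `setup_edge` and `walk_edge` are hypothetical names and do not reflect the actual Hybris packet fields.

```python
from dataclasses import dataclass


@dataclass
class EdgeDifferential:
    """Differential description of one edge (illustrative fields only)."""
    x_start: float   # x coordinate on the first scanline
    dx_dy: float     # amount added to x for each scanline step
    y_start: int     # first scanline covered by the edge
    y_end: int       # one past the last scanline covered


def setup_edge(x0, y0, x1, y1):
    """Set-up: one division per edge, done before the packet is stored."""
    if y1 == y0:
        raise ValueError("horizontal edge has no slope in y")
    dx_dy = (x1 - x0) / (y1 - y0)
    return EdgeDifferential(x0, dx_dy, y0, y1)


def walk_edge(edge):
    """Forward differencing: only additions are needed per scanline."""
    x = edge.x_start
    samples = []
    for y in range(edge.y_start, edge.y_end):
        samples.append((y, x))
        x += edge.dx_dy
    return samples


edge = setup_edge(0.0, 0, 8.0, 4)
samples = walk_edge(edge)
```

Because `setup_edge` runs once when the node is written, a reader that walks the edge several times (once per overlapped tile) only repeats the additions in `walk_edge`.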

Since we choose to support only triangles, parameters can be expressed by plane equations as discussed earlier. By storing the x-axis derivative of each parameter's plane equation we can avoid a division per span. In addition, this allows us to skip interpolation of parameters along edges at the end of the spans, or at both the beginning and end of spans if the y-axis plane equation derivative is also stored, although this will complicate incremental evaluation. A better approach is to allow an adaptive incremental interpolation direction for interpolating along spans.
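To make the role of the stored derivative concrete, here is a minimal sketch that derives the x- and y-derivatives of a parameter's plane equation from three vertices and then fills a span using additions only. The function names and the sample values are assumptions for illustration; the division by `denom` happens once per triangle, not once per span.

```python
def plane_gradients(v0, v1, v2):
    """Return (dp/dx, dp/dy) of the plane through three (x, y, p) vertices."""
    (x0, y0, p0), (x1, y1, p1), (x2, y2, p2) = v0, v1, v2
    # Twice the signed triangle area; zero means a degenerate triangle.
    denom = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if denom == 0:
        raise ValueError("degenerate triangle")
    dp_dx = ((p1 - p0) * (y2 - y0) - (p2 - p0) * (y1 - y0)) / denom
    dp_dy = ((p2 - p0) * (x1 - x0) - (p1 - p0) * (x2 - x0)) / denom
    return dp_dx, dp_dy


def interpolate_span(p_start, dp_dx, length):
    """Step along a span by adding the stored x derivative per pixel."""
    values = []
    p = p_start
    for _ in range(length):
        values.append(p)
        p += dp_dx
    return values


# Parameter p is 0 at (0, 0), 8 at (4, 0) and 4 at (0, 4).
dp_dx, dp_dy = plane_gradients((0, 0, 0.0), (4, 0, 8.0), (0, 4, 4.0))

# Span on scanline y = 1 starting at x = 0: start value is p(0, 1).
span = interpolate_span(p_start=0.0 + dp_dy * 1, dp_dx=dp_dx, length=3)
```

Without the stored `dp_dx`, each span would need its own slope computed from the parameter values at its two ends, which is the per-span division the text refers to.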

Since one side of a screen-projected triangle will always have only one edge, while the two other edges will be on the other side (one might be a top/bottom edge), we can start span interpolation from the side of the triangle which has one edge.

The advantage is that parameter interpolation start values and slopes are only required for one edge, saving both space and calculations, but the span interpolator must allow both left-to-right and right-to-left incremental evaluation, and a method for identifying the direction is needed. Figure 3.9 shows how the adaptive bidirectional incremental interpolation method works. A future implementation of the adaptive interpolation direction method might extend the concept to allow both top-to-bottom and bottom-to-top scanline interpolation.

Figure 3.9: Adaptive bidirectional incremental interpolation for triangle rendering. Left: Left-to-right incremental interpolation. Right: Right-to-left incremental interpolation. (Labels in the figure: "Left edge, interpolate xleft only"; "Right edge, interpolate xright & parameters"; "Right edge, interpolate xright only"; "Left edge, interpolate xleft & parameters"; "Span interpolation"; "Span".)
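One possible method for identifying the direction is sketched below; it is an assumption for illustration and not necessarily the test used in Hybris. The vertices are sorted by y, the edge from the topmost to the bottommost vertex is taken as the single-edge side, and the sign of a cross product tells whether the middle vertex lies to its right or left. Screen coordinates with y growing downwards are assumed, and `span_direction` is a hypothetical name.

```python
def span_direction(v0, v1, v2):
    """Pick the span interpolation direction for a screen-space triangle.

    Vertices are (x, y) tuples with y growing downwards. Spans start at
    the side that consists of a single edge (top vertex to bottom vertex).
    """
    top, mid, bot = sorted((v0, v1, v2), key=lambda v: v[1])
    # The sign of this cross product tells on which side of the long edge
    # the middle vertex lies; no division is required.
    cross = ((bot[0] - top[0]) * (mid[1] - top[1])
             - (bot[1] - top[1]) * (mid[0] - top[0]))
    if cross < 0:
        # Middle vertex is to the right, so the single edge is the left side.
        return "left_to_right"
    if cross > 0:
        # Middle vertex is to the left, so the single edge is the right side.
        return "right_to_left"
    return "degenerate"


direction_a = span_direction((0, 0), (4, 2), (0, 4))
direction_b = span_direction((0, 0), (-4, 2), (0, 4))
```

The result could be stored as a single flag in the packet, so that the span interpolator only needs to negate its step direction rather than repeat the test per tile.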

By storing differential expressions we can reduce the number of necessary operations per tile, as triangle set-up per tile would be magnified by the bucket sorting overlap factor. Combined with the optimizations above, the differential method is judged to be a good data format for the triangle heap.

To summarize, the triangle heap node packet data format in Hybris uses differential expressions, some of which are based on plane equations. There seems to be a trade-off between computing triangle set-up before or after sending the triangles over a redistribution network in a parallel renderer, depending on whether we perform set-up once before writing the node or possibly multiple times after reading the node in the tile renderers. The ratio of the number of writes to the number of reads influences this, and depends on the bucket sorting overlap factor.
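The trade-off can be made concrete with a small count of set-up operations. The sketch assumes, purely for illustration, that triangles are binned by their bounding boxes into square tiles of a hypothetical size of 32 pixels; the actual Hybris bucket sorting may bin differently, and the function names are not taken from the source.

```python
def tiles_overlapped(bbox, tile_size):
    """Number of tiles touched by a bounding box (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = bbox
    tiles_x = int(x_max // tile_size) - int(x_min // tile_size) + 1
    tiles_y = int(y_max // tile_size) - int(y_min // tile_size) + 1
    return tiles_x * tiles_y


def setup_counts(bboxes, tile_size):
    """Compare set-up before writing with set-up after every read."""
    setups_before = len(bboxes)                       # once per node written
    setups_after = sum(tiles_overlapped(b, tile_size) for b in bboxes)
    overlap_factor = setups_after / setups_before     # reads per write
    return setups_before, setups_after, overlap_factor


bboxes = [
    (0, 0, 10, 10),     # fits in one tile
    (30, 30, 34, 34),   # straddles a tile corner: four tiles
    (0, 0, 70, 10),     # spans three tiles horizontally
]
before, after, overlap = setup_counts(bboxes, tile_size=32)
```

In this toy scene, set-up after reading costs 8 set-up operations against 3 when it is done before writing, so the saving grows directly with the overlap factor.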

As an example, the Hybris dual-CPU parallel software implementation uses two writers and two readers, where each reader may read a node multiple times depending on the overlap. As the triangle set-up calculations are stored and reused, this architecture works well in software. The hardware tile renderer back-end implementations benefit from not having to do triangle set-up in the back-end, which simplifies their design greatly.

Plane equation triangle description

The last method, Plane Equations, performs evaluation of the parameters required to express edge functions and plane equations prior to storage. The edge functions and plane equations are subsequently used to render the triangle using either direct evaluation or an incremental algorithm such as Pineda's [184]. An advantage of using edge and plane equations is that their parameters can be calculated completely without use of divisions. Since edge and plane equation parameters can be calculated directly from the raw triangle vertices, implementations tend to use on-the-fly plane equation set-up from a raw triangle description. The example from before, Neon, transmits raw triangles but uses edge function set-up and incremental evaluation internally. The design in [235] also transmits raw triangles, although it also uses internal buffering of plane equation parameters.
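A minimal sketch of division-free edge function set-up follows. It uses the common linear form E(x, y) = A·x + B·y + C, whose coefficients come from subtractions and multiplications of the two edge vertices only; the function names are hypothetical and the code illustrates the general idea rather than any particular hardware implementation.

```python
def edge_function(v0, v1):
    """Coefficients (A, B, C) of E(x, y) = A*x + B*y + C for edge v0 -> v1.

    E is zero on the edge and has opposite signs on its two sides.
    Only subtractions and multiplications are used, no divisions.
    """
    (x0, y0), (x1, y1) = v0, v1
    return (y0 - y1, x1 - x0, x0 * y1 - x1 * y0)


def evaluate(edge, x, y):
    """Direct evaluation of an edge function at a sample point."""
    a, b, c = edge
    return a * x + b * y + c


def inside(edges, x, y):
    """A point is inside if all edge functions agree in sign (either winding)."""
    values = [evaluate(e, x, y) for e in edges]
    return all(v >= 0 for v in values) or all(v <= 0 for v in values)


v0, v1, v2 = (0, 0), (4, 0), (0, 4)
edges = [edge_function(v0, v1), edge_function(v1, v2), edge_function(v2, v0)]

# Incremental evaluation: stepping one pixel in x just adds A.
e_here = evaluate(edges[1], 1, 1)
e_next = e_here + edges[1][0]
```

The incremental step in the last two lines shows why this form suits hardware: once E is known at one pixel, moving to a neighbouring pixel costs a single addition per edge.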

The best match for the plane equation description model is the SGI RealityEngine [5], which transmits plane equation packets on its internal triangle bus. However, this approach was abandoned in the InfiniteReality [158], which transmits raw vertices on its internal redistribution bus, forming triangles at the receiving end by interpreting the vertex stream as triangles or triangle strips. Pixel-Planes [64, 65] first evaluates the plane equation parameters and then transmits them to an array of pixel processors, each of which evaluates the plane equation in parallel. PowerVR [188] is suspected to transmit plane equations, although nothing concrete about this has been published.

In document Design for Scalability in 3D Computer Graphics Architectures (pages 74-77)