
In this chapter we have discussed parallel rendering in general, with a special focus on scalability. The state of the art in current scalable commercial rendering architectures has been covered. An introduction to general concepts in parallel rendering has been given, together with an overview of some of the available options for implementing a scalable graphics architecture. This overview seems to point towards a primarily sort-middle architecture based on image-parallel subdivision of the screen into many small square tiles mapped to virtual local framebuffers. For each tile, bucket sorting and buffering of work are used to load balance the jobs across virtual processors, each optimized for rendering one small square tile. In addition, a partial sort-first architecture using object-parallel subdivision of the 3D model input data looks promising: the input data is split into many small sub-objects to distribute work over several geometry processors while maintaining data coherence. Finally, sort-last is used to assemble the final image from tiles. Image composition of overlapping tiles might allow the architecture to scale even further, provided that correct handling of transparency is not an issue.
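The per-tile bucket sorting step of such a sort-middle architecture can be sketched as follows. This is a minimal illustration under stated assumptions, not the Hybris code: the tile size `TILE`, the function name `bin_triangles`, the screen-space triangle representation and the conservative bounding-box overlap test are all hypothetical choices made for the example.

```python
import math
from collections import defaultdict

TILE = 8  # hypothetical tile size in pixels (assumption, not a Hybris parameter)


def bin_triangles(triangles, width, height, tile=TILE):
    """Bucket-sort triangles into every screen tile their bounding box overlaps.

    triangles: list of triangles, each a sequence of three (x, y) screen positions.
    Returns a dict mapping (tile_x, tile_y) to the list of triangle indices in that bucket.
    """
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    buckets = defaultdict(list)
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Conservative overlap test: bounding box in tile units, clamped to the screen.
        tx0 = max(0, math.floor(min(xs)) // tile)
        tx1 = min(tiles_x - 1, math.floor(max(xs)) // tile)
        ty0 = max(0, math.floor(min(ys)) // tile)
        ty1 = min(tiles_y - 1, math.floor(max(ys)) // tile)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                # Appending in input order keeps each bucket in submission order.
                buckets[(tx, ty)].append(index)
    return dict(buckets)


tris = [[(1, 1), (6, 2), (3, 7)], [(5, 5), (20, 5), (12, 12)]]
buckets = bin_triangles(tris, width=32, height=16)
print(buckets[(0, 0)])  # [0, 1]
```

Each bucket can then be handed to the virtual processor that owns the tile. A bounding-box test may place a triangle in a few tiles it does not actually cover; that costs some extra work per tile but never loses a primitive.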

Designing a Scalable Graphics Architecture

This chapter presents an analysis of how the Hybris graphics system architecture is designed and implemented, at a level of abstraction slightly above the possible implementations. The intention is to specify a portable and scalable architecture which may be implemented using many different software- or hardware-based computer technologies. Hybris is designed to be scalable and to reduce the computational load at many levels. The graphics architecture was originally named HPGA, short for Hybrid Parallel Graphics Architecture, and was later renamed Hybris (Danish for hubris). The architecture is a hybrid because it applies a combination of several types of parallelism in order to scale. Chapter 18 of [63] defines a hybrid-parallel graphics system as one which uses a combination of object-order and image-order rasterization techniques.

3.1 Understanding the problem

In order to define and implement a scalable graphics architecture, we need to understand the operation of all parts of the graphics pipeline. This understanding will guide the design of a scalable graphics architecture.

Traditionally, the graphics pipeline is a serial pipeline that processes one graphics primitive at a time. While this model is easy to implement in both software and hardware, a straightforward implementation of it is not necessarily optimal.

Designing a scalable architecture for graphics forces us to take a new look at the graphics pipeline. Distributed processing is needed for good scalability, leading to an architecture composed of multiple localized data processing units. The processing units are not identical; several different unit types are needed to form the graphics pipeline. Starting from the overview of scalability in graphics architectures given in the previous chapter, which outlines a generic parallel graphics architecture, we must now find a practical implementation.

The Hybris graphics architecture has mainly been developed in a software environment, and it reflects useful software implementation methods. The architecture is targeted towards an implementation that makes efficient use of CPU and system resources such as instruction scheduling, caches and memory bandwidth, and the 3D graphics rendering algorithms used in Hybris are optimized towards the same goals. Using a general-purpose computer for development has allowed us to reason about the design at a level of abstraction above a straightforward implementation of the archetypical graphics pipeline. The main difference between the software implementation and the standard graphics pipeline is how memory is used.

The use of memory in the graphics architecture allows buffering of temporary data and variables. While buffering enables re-use of previously calculated data values, an equally important aspect is that the memory can be partitioned to match the data coherence present in computer graphics. A useful way to improve memory usage in any system is to apply loop fusion and strip mining techniques [117, 143, 228], which are examined in relation to Hybris in the paper [88]; see also the next chapter.
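To make the two transformations concrete, the sketch below applies them to two hypothetical per-vertex stages, `transform` and `shade`, which are placeholders and not Hybris functions; the strip length `STRIP` is likewise an assumed value. Python will not show the cache effects; the example only illustrates the shape of the code before and after the transformation.

```python
STRIP = 64  # hypothetical strip length; in practice chosen so a strip's temporaries fit in cache


def transform(v):
    # Stand-in for a per-vertex stage (assumption: a simple scale and bias).
    return 2.0 * v + 1.0


def shade(t):
    # Stand-in for a following per-vertex stage (assumption: clamp to a maximum).
    return min(t, 100.0)


def separate_loops(data):
    # Two independent loops: a full-length temporary array is written, then read back.
    temp = [transform(v) for v in data]
    return [shade(t) for t in temp]


def strip_mined_fused(data, strip=STRIP):
    out = []
    # Strip mining: each loop is split into an outer loop over strips and an inner loop
    # over the elements of one strip. Loop fusion: the outer strip loops of both stages
    # are merged, so each strip's temporary is consumed right after it is produced.
    for start in range(0, len(data), strip):
        temp = [transform(v) for v in data[start:start + strip]]  # at most `strip` elements
        out.extend(shade(t) for t in temp)
    return out
```

Both functions compute the same result; the difference is that the fused, strip-mined version never holds more than one strip of temporary data, which is what allows the intermediate values to stay in cache in a compiled implementation.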

To enable a scalable architecture, a workload distribution scheme must be applied, together with a practical way to collect and combine partial results into the final result, in this case one rendered frame of interactive real-time animation.
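One possible shape for distribution and collection is sketched below, assuming a tile-based split of the frame. The tile size `TILE`, the helpers `shade_pixel`, `render_tile` and `render_frame`, and the use of a thread pool are all illustrative assumptions. Python threads share one interpreter lock, so the sketch shows the structure of the scheme rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

TILE = 4  # hypothetical tile size in pixels


def shade_pixel(x, y):
    # Hypothetical per-pixel shading function standing in for real rasterization work.
    return (7 * x + 13 * y) % 256


def render_tile(x0, y0, width, height, tile=TILE):
    """Render one tile into a private buffer, a stand-in for a virtual local framebuffer."""
    rows = []
    for y in range(y0, min(y0 + tile, height)):
        rows.append([shade_pixel(x, y) for x in range(x0, min(x0 + tile, width))])
    return x0, y0, rows


def render_frame(width, height, workers=4, tile=TILE):
    origins = [(x, y) for y in range(0, height, tile) for x in range(0, width, tile)]
    frame = [[0] * width for _ in range(height)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Distribution: every tile is an independent job taken by whichever worker is free.
        jobs = [pool.submit(render_tile, x0, y0, width, height, tile) for x0, y0 in origins]
        # Collection: copy each finished tile into the final frame, in completion order.
        for job in as_completed(jobs):
            x0, y0, rows = job.result()
            for dy, row in enumerate(rows):
                frame[y0 + dy][x0:x0 + len(row)] = row
    return frame
```

Because tiles do not overlap, the partial results can be combined by simple copying in any order; overlapping partial images would instead need a composition step such as a depth comparison.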

The software version of the Hybris rendering architecture was originally developed by breaking the rendering pipeline apart into several independent functions or loops, each reading its input data from memory and writing its output data back to memory. This isolation of components made it possible to test each component separately and to test various implementations of each component. However, when the components are configured into a graphics pipeline, this approach is not necessarily optimal compared to the direct pipeline approach, in which no such temporary data is stored in memory.
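The contrast between the decomposed and the direct pipeline can be sketched with three toy stages: transform, back-face culling and bounding-box setup. These stages, their names and the rule that positive signed area means front-facing are assumptions for illustration only; they are not the actual Hybris components.

```python
def transform_stage(triangles, scale):
    # Reads every triangle and writes a complete temporary array of transformed triangles.
    return [[(x * scale, y * scale) for x, y in tri] for tri in triangles]


def signed_area(tri):
    (ax, ay), (bx, by), (cx, cy) = tri
    return 0.5 * ((bx - ax) * (cy - ay) - (cx - ax) * (by - ay))


def cull_stage(triangles):
    # Keeps only triangles with positive signed area (assumed front-facing winding).
    return [tri for tri in triangles if signed_area(tri) > 0]


def bounding_box(tri):
    xs = [x for x, _ in tri]
    ys = [y for _, y in tri]
    return (min(xs), min(ys), max(xs), max(ys))


def setup_stage(triangles):
    # Produces an axis-aligned bounding box per triangle for a later rasterization stage.
    return [bounding_box(tri) for tri in triangles]


def staged_pipeline(triangles, scale):
    # Decomposed form: each stage is a separate loop whose output lives in memory,
    # so stages can be tested, replaced or redistributed independently.
    transformed = transform_stage(triangles, scale)
    visible = cull_stage(transformed)
    return setup_stage(visible)


def direct_pipeline(triangles, scale):
    # Direct form: one primitive at a time through all stages, no intermediate arrays.
    out = []
    for tri in triangles:
        t = [(x * scale, y * scale) for x, y in tri]
        if signed_area(t) > 0:
            out.append(bounding_box(t))
    return out


tris = [[(0, 0), (4, 0), (0, 4)], [(0, 0), (0, 4), (4, 0)], [(1, 1), (3, 1), (2, 5)]]
print(staged_pipeline(tris, 2))  # [(0, 0, 8, 8), (2, 2, 6, 10)]
```

The staged form makes each stage individually testable at the cost of writing and re-reading full intermediate arrays; the direct form avoids that memory traffic but couples the stages together.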

The advantage of exposing temporary data in memory is greater freedom to experiment with various memory access schemes for data reuse and data access coherence, as well as providing a means of data redistribution for parallel implementations. Data blocking or chunking schemes have proved to be a very efficient method for optimizing data access performance in computer systems equipped with caches. Previously, data blocking schemes were widely employed in supercomputer applications, e.g. in implementations of the LINPACK and ScaLAPACK mathematical subroutine libraries. Today these techniques are not restricted to supercomputers, but can be employed on modern personal computers as well as in new ASIC technologies with enough space for on-chip memories.
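The textbook illustration of data blocking in dense linear algebra is a blocked matrix multiplication, sketched below in the general spirit of the libraries mentioned above; it is not taken from them. The block size `BLOCK` and the function names are assumptions.

```python
BLOCK = 16  # hypothetical block size; in practice tuned so three blocks fit in cache


def matmul_naive(a, b):
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            aik = a[i][k]
            for j in range(p):
                c[i][j] += aik * b[k][j]
    return c


def matmul_blocked(a, b, block=BLOCK):
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    # The outer three loops walk over blocks; the inner three loops work on one
    # block-sized sub-problem whose data can stay resident in cache while it is reused.
    for ii in range(0, n, block):
        for kk in range(0, m, block):
            for jj in range(0, p, block):
                for i in range(ii, min(ii + block, n)):
                    ci, ai = c[i], a[i]
                    for k in range(kk, min(kk + block, m)):
                        aik, bk = ai[k], b[k]
                        for j in range(jj, min(jj + block, p)):
                            ci[j] += aik * bk[j]
    return c
```

For each output element the blocked version adds the partial products in the same `k` order as the naive loop; only the order in which memory is visited changes, which is the essence of data blocking.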

The Hybris graphics architecture is an attempt at applying data blocking techniques to computer graphics. This was done by experimenting with various code transformations, manually applying techniques such as loop fusion and strip mining to achieve good cache and memory utilization.