Tech-Hounds.com

Because gamers play games, not benchmarks




A Primer to 3D Graphics - Part 2

After reading part 1, you should have a basic understanding of how 3D graphics work. We've discussed how objects are defined in 3D space, the math involved and how we try to 'emulate' the real world in 3D graphics. We even discussed a little bit about the filtering and anti-aliasing techniques that are widely used in real time 3D graphics today. In part 2, we'll talk about the technical aspects of 3D graphics.

Going Deeper with 3D Graphics

The whole process of displaying 3D graphics on your screen is usually called rendering. Much of this process is done in the graphics pipeline. You might be wondering what this 'pipeline' is - well, it's just a term describing the series of tasks your PC must complete to render 3D graphics. Basically, we've talked about them in part 1. First there's the geometry stage, where we describe the object, perform the needed translation, rotation and scaling, clip the world and do perspective correction. We may also choose to light our object, so the lights have to be calculated too. Then there's the rasterization stage, where we add color, texture and other special effects to our objects. Information from the geometry stage, including lighting, will also be used here. Afterwards, the PC sends the image to your screen. Let's go deeper into these topics and discuss them one by one.

Geometry

Transformation & Lighting

While pixels and texels get all the attention these days, none of them would matter without the basic geometry operations and definitions widely used in 3D graphics. They really are the basic building blocks of 3D. Without them, we would still be looking at mostly 2D images.

From part 1, we know some of the math required for 3D graphics. Operations such as translation (moving an object parallel to an axis), rotation (rotating an object around an axis) and scaling (changing the size of the object along an axis) are usually called transformation operations. Geometry transformations also include the matrix operations needed to present the 3D world to the viewer - us. Every operation - translation, rotation, scaling - is done and then 'perspectively corrected' in reference to the camera - our perspective. If one part of an object - for example the hand of a human model - is moved, the graphics hardware must perform the needed transformation on the related vertices and polygons (the hand). These transformations are usually done in relation to the object's point of origin (its center). Then the graphics hardware must convert the resulting position of the hand into world coordinate space. This operation provides the position of the model in the world. The last matrix conversion relates the position of the model to that of the camera - is it visible to the camera? There's also the operation to portray depth - usually scaling towards the center of the screen, which serves as the camera's point of focus.

Translation matrix

Scaling matrix

X Rotation matrix

Y Rotation matrix

Z Rotation matrix
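
Since the matrix images are easy to lose track of, here is a minimal sketch of the five matrices in Python. It assumes the common 4 x 4 form with column vectors (x, y, z, w) and angles in radians, which may differ in layout from the figures in the original article. The function names are made up for this example.

```python
import math

def translation(tx, ty, tz):
    """4x4 matrix that moves a point by (tx, ty, tz)."""
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]

def scaling(sx, sy, sz):
    """4x4 matrix that scales a point along each axis."""
    return [[sx, 0, 0, 0],
            [0, sy, 0, 0],
            [0, 0, sz, 0],
            [0, 0, 0, 1]]

def rotation_x(angle):
    """4x4 matrix that rotates a point around the X axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1, 0, 0, 0],
            [0, c, -s, 0],
            [0, s, c, 0],
            [0, 0, 0, 1]]

def rotation_y(angle):
    """4x4 matrix that rotates a point around the Y axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0, s, 0],
            [0, 1, 0, 0],
            [-s, 0, c, 0],
            [0, 0, 0, 1]]

def rotation_z(angle):
    """4x4 matrix that rotates a point around the Z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0, 0],
            [s, c, 0, 0],
            [0, 0, 1, 0],
            [0, 0, 0, 1]]

def transform(matrix, point):
    """Multiply a 4x4 matrix by a column vector (x, y, z, w)."""
    return tuple(sum(matrix[r][c] * point[c] for c in range(4))
                 for r in range(4))

p = (1.0, 0.0, 0.0, 1.0)
print(transform(translation(2, 3, 4), p))  # (3.0, 3.0, 4.0, 1.0)
print(transform(scaling(2, 2, 2), p))      # (2.0, 0.0, 0.0, 1.0)
```

Note that translation only works because of the fourth value w = 1, which is one reason vertices carry four values instead of three.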

Additional operations such as back face culling and clipping are also done. These operations are used to minimize unnecessary workload. We don't want 'unseen' objects, polygons and vertices to be included in the rasterization process. Of course, the next step will be to plot the positions of each object, polygon and vertex on the screen - perspectively corrected. But before doing that, we could choose to do some light calculations. They will provide visual cues for the viewer to complete the illusion of seeing the scene in 3D. So, as you can see, geometry transformations play an important role in 3D graphics.

How a vertex in 3D space is drawn on a 2D screen

How an edge / line in 3D space is drawn on a 2D screen

The matrix transforms as defined in OpenGL

Here's a snippet from the OpenGL standard book from www.opengl.org that describes matrix operations involved in 3D graphics: "The vertex coordinates that are presented to the GL are termed object coordinates. The model-view matrix is applied to these coordinates to yield eye coordinates. Then another matrix, called the projection matrix, is applied to eye coordinates to yield clip coordinates. A perspective division is carried out on clip coordinates to yield normalized device coordinates. An additional viewport transformation is applied to convert these coordinates into window coordinates."
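
The chain of coordinate spaces in that quote can be sketched in a few lines of Python. This is only an illustration: the matrices, the 640 x 480 viewport and the function names are assumptions made for the example, not values taken from OpenGL or from this article.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def to_window(obj, model_view, projection, viewport):
    """Object coordinates -> eye -> clip -> normalized device -> window."""
    eye = mat_vec(model_view, obj)         # model-view matrix
    clip = mat_vec(projection, eye)        # projection matrix
    w = clip[3]
    ndc = [clip[0] / w, clip[1] / w, clip[2] / w]  # perspective division
    x0, y0, width, height = viewport       # viewport transformation
    win_x = x0 + (ndc[0] + 1) * width / 2
    win_y = y0 + (ndc[1] + 1) * height / 2
    win_z = (ndc[2] + 1) / 2               # depth mapped to 0..1
    return win_x, win_y, win_z

# Assumed example setup: the camera sits 5 units back from the object,
# with a simple perspective projection (near plane 1, far plane 10).
near, far = 1.0, 10.0
model_view = [[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, -5],
              [0, 0, 0, 1]]
projection = [[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, -(far + near) / (far - near), -2 * far * near / (far - near)],
              [0, 0, -1, 0]]
viewport = (0, 0, 640, 480)

# A vertex at the object's origin lands in the middle of the window.
print(to_window([0, 0, 0, 1], model_view, projection, viewport))
```

The division by w is what makes distant objects shrink towards the center of the screen.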

A shape as defined in 3D space, projected onto a 2D screen

Scan conversion fills our shape with color, drawing the shape by filling the pixels affected

After the coordinates are plotted on the screen, the graphics hardware renders the screen, filling the frame buffer line by line until all the pixels are drawn. The rasterization process fills each pixel with the appropriate texture, comparing Z values when needed and performing filtering to make sure the appropriate texture samples are chosen. These operations can be done either entirely on the graphics card or only partly. Early graphics hardware relied solely on the processing power of the main processor for geometry processing. Newer hardware can do all the processing on its own. Not only is the processor freed from the additional burden of geometry processing, but the dedicated geometry hardware on these newer graphics cards is leaps and bounds ahead of what can be done in software. An added benefit is that the rasterization hardware doesn't have to wait for geometry data from the processor, since the data is already in the graphics card's memory and processed internally.

Extra Tasks of Geometry Processing

One measure of how fast a graphics card can do geometry processing is the number of vertices / triangles / polygons it can process per second. The higher the number, the more objects we can have on screen, and the more detailed they can be. As faster and more powerful hardware becomes available, the level of detail and the number of polygons we're seeing in games have been steadily rising. Back in the days of NVIDIA's TNT and 3dfx Voodoo2, the game Quake III Arena used models with 750 polygons, quite detailed for that time. Compare that to Doom 3, which uses 5000 polygons per character on average. Of course, you also need to consider the polygons of all the other objects and the whole 'map' or 'level' itself - this can reach around 40,000 to 60,000 polygons. Even the fastest processors today will slow to a crawl with a scene that size if all geometry processing is done in software by the processor.

However, geometry processing is not just about transformation and matrix operations. One of the possibilities opened up by faster hardware is curves. Rather than using lots of edges to approximate a curve, we can actually send the mathematical equation of a curve to the geometry units in the graphics hardware. Not only can we use this to draw perfect curves (or splines), but also polygons using curves as their edges - usually called N-patches (or higher order surfaces). With this method, we don't have to use as many vertices and polygons, which means less memory usage for our 3D model. Remember, each vertex is defined with at least three 32 bit values (at least 12 bytes of data per vertex). The more vertices we use, the more memory we have to set aside for that model. Take for example our average Doom 3 model: assuming a polygon has 4 vertices, the memory usage for one model will be around 5000 polygons x 4 vertices x 12 bytes = 240,000 bytes, or about 234 KBytes! (The actual figure can be lower, since polygons share at least two vertices with one another.) Think that's small? Not if you have to set aside that much memory for every frame of animation, which can range from 50 to 150 frames to cover the full range of movement the model has!
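
The arithmetic above is easy to check with a short Python sketch. It uses the article's own assumptions (4 vertices per polygon, three 32 bit values per vertex, no vertex sharing); the function name is made up for this example.

```python
BYTES_PER_VALUE = 4  # one 32 bit value

def model_bytes(polygons, vertices_per_polygon=4, values_per_vertex=3):
    """Worst-case vertex memory for one pose of a model (no shared vertices)."""
    return polygons * vertices_per_polygon * values_per_vertex * BYTES_PER_VALUE

one_pose = model_bytes(5000)
print(one_pose)         # 240000 bytes
print(one_pose / 1024)  # 234.375 KBytes

# Storing every animation frame separately multiplies the cost.
for frames in (50, 150):
    megabytes = one_pose * frames / (1024 * 1024)
    print(frames, "frames:", round(megabytes, 1), "MBytes")
```

With 50 to 150 stored frames, that single model grows to roughly 11 to 34 MBytes, which shows why sharing vertices and using fewer polygons matter.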

Even if the graphics hardware prefers to work natively with straight edges and triangles, it can implement a dynamic level of detail for approximating curves: using more detail when the object is near the camera and less detail when it's far away. Curves do require significant calculation, though. Instead of just two vectors or vertices, we need four (the additional two control the angle of the curve). An interpolation involving the two additional vectors is needed for each part of the curve. Higher order surfaces are not widely used today because of the performance penalty, and most graphics cards on the market don't have the necessary processing power, but with faster hardware this will definitely improve in the future.
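
One common curve that fits this description (two end points plus two control points) is the cubic Bezier curve. The article doesn't name a specific curve type, so take the sketch below as an assumption. The level of detail rule is also made up purely for illustration.

```python
def lerp(a, b, t):
    """Linear interpolation between points a and b."""
    return tuple(a_i + (b_i - a_i) * t for a_i, b_i in zip(a, b))

def cubic_bezier(p0, p1, p2, p3, t):
    """Point on the curve at t (0..1), by repeated interpolation.

    p0 and p3 are the end points, p1 and p2 are the two extra
    control points that bend the curve.
    """
    a, b, c = lerp(p0, p1, t), lerp(p1, p2, t), lerp(p2, p3, t)
    d, e = lerp(a, b, t), lerp(b, c, t)
    return lerp(d, e, t)

def tessellate(p0, p1, p2, p3, segments):
    """Approximate the curve with straight edges the hardware can draw."""
    return [cubic_bezier(p0, p1, p2, p3, i / segments)
            for i in range(segments + 1)]

def segments_for_distance(distance, near_segments=16, min_segments=2):
    """Hypothetical level of detail rule: fewer edges when far away."""
    return max(min_segments, int(near_segments / max(distance, 1.0)))

curve = ((0, 0), (0, 1), (1, 1), (1, 0))
print(cubic_bezier(*curve, 0.5))                        # (0.5, 0.75)
print(len(tessellate(*curve, segments_for_distance(1))))    # 17 points up close
print(len(tessellate(*curve, segments_for_distance(100))))  # 3 points far away
```

Only four points are stored no matter how smooth the curve is drawn, which is where the memory saving comes from. The cost is the extra interpolation work for every point generated.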

Another use of geometry processing power in graphics hardware today is lighting. Remember what we talked about in part 1, about lighting, shading and vertex shaders? Newer graphics hardware can provide very acceptable per pixel shading at very fast frame rates. The actual per pixel color calculation is done in the rasterization stage, but the much needed groundwork can now be done in hardware too. This includes the cross product calculation, which provides us with the normal of a triangle or polygon, and the dot product calculation, which provides the light intensity for each pixel from the angle between the incoming light and the normal at that pixel. Of course, with very detailed worlds and models, we still need very fast hardware, since we need to do this for every light in the scene and for every model.

Lambert diffuse model

Phong specular model

As a reminder, here are the mathematical expressions for computing diffuse and specular reflection. In the formulas below, '.' denotes the dot product and 'x' the cross product.

Diffuse model = Light Intensity * Proportion of Diffuse Light Reflected * (Light Vector . Normal Vector)

Specular model = 2 (Normal Vector . Light Vector) (Normal Vector . Viewer Vector) - (Viewer Vector . Light Vector)

The two can be combined to get the complete reflection model:

Intensity (on surface) = Light Intensity * Proportion of Light Reflected + [Diffuse Model + Specular Model^g]

The dot product of two vectors is a single number, defined as A . B = Xa * Xb + Ya * Yb + Za * Zb. The cross product, A x B = C, gives a third vector C that is perpendicular to both A and B, which is why it is used to find normals.
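
Here is a minimal Python sketch of those building blocks: the dot product, the cross product, and the diffuse and specular terms as stated above. It assumes all direction vectors are unit length and clamps negative results to zero (a common practice the article doesn't mention). The function names and the exponent value are made up for this example.

```python
import math

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    """Vector perpendicular to both a and b."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(c / length for c in v)

def triangle_normal(v0, v1, v2):
    """Normal of a triangle from the cross product of two of its edges."""
    edge1 = tuple(b - a for a, b in zip(v0, v1))
    edge2 = tuple(b - a for a, b in zip(v0, v2))
    return normalize(cross(edge1, edge2))

def diffuse(light_intensity, k_diffuse, light_vec, normal):
    """Lambert term; surfaces facing away from the light get 0 (assumed clamp)."""
    return light_intensity * k_diffuse * max(0.0, dot(light_vec, normal))

def specular(light_vec, normal, view_vec, g):
    """Phong term: 2(N.L)(N.V) - (V.L), raised to the power g."""
    r_dot_v = (2 * dot(normal, light_vec) * dot(normal, view_vec)
               - dot(view_vec, light_vec))
    return max(0.0, r_dot_v) ** g

n = triangle_normal((0, 0, 0), (1, 0, 0), (0, 1, 0))
print(n)  # (0.0, 0.0, 1.0)

light = (0.0, 0.0, 1.0)   # light shining straight down the normal
viewer = (0.0, 0.0, 1.0)  # viewer looking straight at the surface
print(diffuse(1.0, 0.8, light, n))     # 0.8
print(specular(light, n, viewer, 32))  # 1.0
```

Every light in the scene repeats these dot products for every lit point, which is why doing them in hardware pays off.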

In general, graphics programmers still rely on a combination of dynamic lighting and offline pre-rendered lights in their game engines. If you play Doom 3 without shadows, you'll notice that while moving objects don't cast realistic shadows anymore, many of the shadows cast on other objects by lights that don't change (lights that can't be turned on / off and don't move) are still present. These are essentially 'free' since they're 'built' into the textures.

All of this is done by the geometry processing unit of the chip. Since resources and performance are not infinite, graphics programmers must balance them all. For example, if a game doesn't use lots of dynamic lights, it will be more appropriate to use the geometry power for models and levels with more polygons and details. Bigger levels with low detail models might also be a possibility, using detailed shaders to 'hide' the lack of detail. So, with newer games relying more and more on both pixel and vertex shaders, the raw number of vertices and pixels per second that graphics hardware can process becomes less of a concern. It's a matter of how flexible and how capable the hardware is.

Geometry Performance Considerations

Just like your processor, an ordinary graphics chip uses dedicated units - geometry or vertex units - for geometry processing. The number of units determines how many vertices can be processed per cycle. Remember, these units must operate on at least four 32 bit floating point values, since we use a 32 bit value for each axis to define the vertex's position. Some of you might notice something strange here - since we're only using three axes, what's the deal with the fourth value? Well, you're right, we only need three values as coordinates. But remember, we're drawing a 3D scene onto a 2D screen, which means we need an additional variable (look back at the matrix definitions). This fourth value, usually called w, is what makes translation and the perspective division possible with matrices.

The internal parts of GeForce 6800

The Internal parts of GeForce 6600

With a single geometry (or vertex) unit, we can process a vertex per cycle. Since the smallest unit in any 3D model is a polygon with four vertices or a triangle with three vertices (a polygon can be thought of as two triangles sharing two vertices), it makes sense to use at least three geometry units. With four geometry units, we can process four vertices, two triangles or one polygon per cycle. Let's go back to the GeForce 6800 - it has six vertex units. This means we get six vertices or four triangles per cycle (assuming some of the triangles share vertices). Since the GeForce 6800 runs at 400 MHz, this should give us about 2.4 billion vertices per second. If you read the official specs, you'll notice that NVIDIA only claims 600 million vertices per second. Is something wrong here? Well, that's 600 million 'lit' vertices per second, since it factors in the additional computation for lighting (6 vertex units x 400 MHz / 4). With the same formula, the GeForce 6600, which has only three vertex units, comes out at about 375 million vertices per second (3 vertex units x 500 MHz / 4). This means the GeForce 6600 has less geometry processing power than the GeForce 6800, but it really doesn't matter much anyway. It's a known fact that consumer 3D graphics hardware is more focused on pixel processing than vertex processing. Games will most likely not reach that many polygons in a single level. For one, developers write their software so that it can run well on a typical computer, which uses off the shelf consumer 3D graphics hardware (and not professional graphics hardware that emphasizes geometry processing). Second, the level of geometry processing power varies quite a lot between one card and another. Some users will have the best, high end hardware (like the GeForce 6800) while most will likely use more affordable, mainstream solutions (like the GeForce 6600). And again, the lack of detail from fewer polygons can be hidden by a good shader. Take a look at Doom 3: you'll probably see some blockiness on the models, although they're pretty detailed. This is because they're not using higher order surfaces or lots of polygons to approximate a curve.
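
The vertex rate formula used in this section can be written as a small Python function. The "4 cycles per lit vertex" figure is simply read off the article's own "/ 4" and is not an official NVIDIA number; the function name is made up for this example.

```python
def vertices_per_second(vertex_units, clock_mhz, cycles_per_vertex=1):
    """Peak vertex rate: units x clock, divided by the cycles each vertex needs."""
    return vertex_units * clock_mhz * 1_000_000 // cycles_per_vertex

# GeForce 6800: six vertex units at 400 MHz
print(vertices_per_second(6, 400))     # 2400000000 (raw peak)
print(vertices_per_second(6, 400, 4))  # 600000000 ('lit' vertices)

# GeForce 6600: three vertex units at 500 MHz
print(vertices_per_second(3, 500, 4))  # 375000000 ('lit' vertices)
```

These are theoretical peaks. Real games get less, since the vertex units also wait on memory and on the rest of the pipeline.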

Rasterization

Frame Buffer

When browsing through all those 3D graphics FAQs and articles, no doubt you will come across this term - frame buffer. Simply put, a frame buffer is a buffer or container that holds frames or images, usually from a series of images (animation). These images may be pre-rendered, as in the case of video or still images, or processed in real time, as is the case with real time 3D graphics. Quite literally, think of the frame buffer as your PC's screen.

The screen resolution and color depth you're using will determine how big the frame buffer is going to be. A resolution of 1024 x 768, 32 bit color takes about 3 MB (1024 x 768 x 4 RGBA channels x 8 bits). Using larger resolutions will require more memory - a 1600 x 1200 resolution at 32 bit color requires around 7.3 MB. Each pixel in the frame buffer is drawn one at a time, or more accurately line by line (there are 768 lines in our example). To provide the illusion of moving images or seamless animation, the graphics hardware and your PC must produce a series of images, at least 25 or 30 of them each second. At that rate, the series of images seems to blend together to our eyes. As you can see, this means lots of work - the computer must provide about 23.6 million pixels every second for a 1024 x 768, 32 bit color, 30 fps animation, and about 47 million at 60 fps.
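
These figures can be reproduced with a short Python sketch. The function names are made up for this example, and 1 MB is taken as 1024 x 1024 bytes.

```python
def frame_buffer_bytes(width, height, bits_per_pixel=32):
    """Memory needed to hold one frame."""
    return width * height * bits_per_pixel // 8

def pixels_per_second(width, height, fps):
    """How many pixels must be produced each second."""
    return width * height * fps

MB = 1024 * 1024

print(frame_buffer_bytes(1024, 768) / MB)             # 3.0
print(round(frame_buffer_bytes(1600, 1200) / MB, 2))  # 7.32
print(pixels_per_second(1024, 768, 30))               # 23592960
print(pixels_per_second(1024, 768, 60))               # 47185920
```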

Each green line represents a line on your screen. After all the lines are drawn, the graphics card starts over with another frame.

Since we're talking about real time 3D graphics, we're talking about graphics we're seeing on the screen. And when we're talking about the screen, we must also consider the physical aspects of a monitor. If you recall, monitors refresh the screen several times per second - at a rate of at least 60 Hz, sometimes more. In 3D graphics, this means we're targeting 60 frames per second. There will be times when our graphics hardware draws faster or slower than 60 Hz or 60 fps. If we're rendering images faster than the monitor can display them, the monitor will sometimes start drawing the new image before the old one is completely done. This produces an artifact called 'tearing'. To avoid it, you could either render slower (which we don't want) or use higher refresh rates - we want the refresh rate and the frame rate kept in sync. Vertical sync (v-sync) opts for the first option, telling your graphics hardware to discard images or not to render faster than the current refresh rate of your monitor. That's why people turn off v-sync when conducting tests and benchmarks. This ensures the graphics card can render as fast as it can.

While v-sync may help maintain a much smoother frame rate and avoid 'tearing', using vertical sync alone won't stop 'jerkiness' caused by fluctuations in frame rate. Just look at this example: the screen is being refreshed at 75 Hz and our graphics card and processor are rendering between 50 and 100 fps. There can be as much as a 25 fps difference between what we're seeing and what the computer 'sees'. If there's a sudden drop or rise in fps, we might still be seeing 'old' images that are queued for the frame buffer although the computer is already rendering another scene. To catch up, the software may choose to 'skip' some images. Our eyes notice the absence of those frames and interpret the not-so-smooth transition between frames as jerkiness. Then there's the problem of 'locking' the frame buffer. An application may tell the graphics card that it has 'locked' the frame buffer to draw an image. When this happens, the graphics card must wait until the frame buffer is unlocked - that's a waste of processing power!

Since the frame buffer is still being filled, the monitor still displays the old image.

With two buffers, the monitor can alternately get the image from each buffer rather than wait for it to be filled.

To fix this, graphics hardware uses more than just one main frame buffer. By using copies of the buffer with double and triple buffering, the graphics hardware can fill these buffer copies first, regardless of what's inside the actual frame buffer (the final image you see on the screen). The graphics card then simply transfers the content from the copy to the actual frame buffer. For triple buffering, the graphics hardware merely 'switches' between two copies instead of using just one. So, regardless of drastic fluctuations, the frame buffer will always be filled with a steady, constant supply of images.
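
The idea of double buffering can be sketched with a tiny Python class. This is a simplified illustration, not how any particular driver works: the class name is made up, and the sketch swaps the two buffers instead of copying one into the other, which is a common variant often called page flipping.

```python
class DoubleBuffer:
    """Two frames: the monitor reads 'front' while we draw into 'back'."""

    def __init__(self, width, height):
        self.front = [[0] * width for _ in range(height)]
        self.back = [[0] * width for _ in range(height)]

    def render(self, color):
        """Fill the back buffer line by line; the front buffer is untouched."""
        for row in self.back:
            for x in range(len(row)):
                row[x] = color

    def swap(self):
        """Show the finished frame; the old front becomes the new back."""
        self.front, self.back = self.back, self.front

screen = DoubleBuffer(4, 3)
screen.render(7)
print(screen.front[0][0])  # 0 (the monitor still shows the old frame)
screen.swap()
print(screen.front[0][0])  # 7 (the finished frame is now visible)
```

Triple buffering would add a second back buffer, so rendering can continue while a finished frame waits for the monitor.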

Using double and triple buffering is not without a price - mainly memory, because the graphics hardware must set aside memory to hold these buffers. This memory could otherwise be used to hold other information we need. So, not surprisingly, graphics cards only use double buffering by default.

Depth Buffer

One of the most important buffers used in 3D graphics is the depth or Z buffer. If you recall part 1, we use the Z value to store just how far a vertex or an object is from the camera. Each time the graphics hardware renders a pixel, it stores the Z value of that pixel in the depth buffer. If another object covers the same pixel, the hardware will do a second pass on that pixel. It will then compare the Z values of both objects to decide which one should be drawn - naturally the one closest to the viewer. A smaller Z value means that particular object is nearer, or in front of the other one. Now, this is done for every pixel on the screen - that means we have to set aside memory for a depth buffer about the same size as the frame buffer. There will be a difference in the range of numbers we store for each pixel - we can use either 16 bit or 24 bit. Why choose a higher bit depth? Well, remember that we're representing depth (distance) here along with perspective correction. We have to be accurate, or else objects may render in the wrong place and all kinds of bad things can happen. 16 bit numbers take less memory, but 24 bit numbers are far more accurate. Hardware nowadays features Z compression, so even a 24 bit depth buffer doesn't take as much space as it used to.
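
The depth test itself is very simple, as this minimal Python sketch shows. The function names are made up for this example, and real hardware stores Z as 16 or 24 bit integers rather than Python floats.

```python
def make_buffers(width, height, background="sky"):
    """A depth buffer (everything infinitely far) and a color buffer."""
    depth = [[float("inf")] * width for _ in range(height)]
    color = [[background] * width for _ in range(height)]
    return depth, color

def draw_pixel(depth, color, x, y, z, c):
    """Draw only if this pixel is closer than what is already stored."""
    if z < depth[y][x]:
        depth[y][x] = z
        color[y][x] = c
        return True
    return False  # hidden behind something nearer, nothing is written

depth, color = make_buffers(2, 2)
draw_pixel(depth, color, 0, 0, 5.0, "far wall")
draw_pixel(depth, color, 0, 0, 2.0, "near crate")   # closer, replaces the wall
draw_pixel(depth, color, 0, 0, 9.0, "hidden lamp")  # farther, rejected
print(color[0][0])  # near crate

# Number of distinct depth levels for each bit depth
print(2 ** 16)  # 65536
print(2 ** 24)  # 16777216
```

In this example the wall was fully drawn before the crate replaced it. That wasted work is exactly the overdraw problem.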

A Z buffer is an exact duplicate of the frame buffer, only filled with Z value rather than actual colored pixels

While the depth buffer is very simple to implement, it has one major flaw: it's not efficient in scenes with very high overdraw, where there are lots of objects overlapping each other - we might be filling the same pixels over and over. The processing power spent on overlapping pixels can and should be used where it counts, on the pixels that are actually visible. Hardware manufacturers have implemented techniques to minimize overdraw, such as hierarchical Z and, of course, Z compression. While they won't give us 100% efficiency, they're generally quite effective in minimizing overdraw (around 90 to 95%). These features keep any performance loss from processing overlapping pixels to a minimum.

Texture Buffer

Since most PC games use textures, all consumer 3D graphics hardware features not only a frame and depth buffer but also a texture buffer (which is used quite extensively). Just like the name implies, a texture buffer holds all texture data. Early graphics hardware with very limited memory only supported 8 bit or indexed color textures, but now we're seeing 24 bit (RGB) and 32 bit (RGB with an alpha channel) textures as the norm.

Most of the memory available on graphics cards today is used as the texture buffer. To save space and bandwidth, we could compress all of these textures. Indeed, texture compression is widely used now. There are several compression schemes and methods, each providing different compression ratios, but they basically use the same technique - pack the data more efficiently and lose the bits least likely to be noticed by our eyes.

With texture compression, we can make textures take less space, or we can choose to keep the same space and go for a more detailed, higher resolution texture. Unlike ordinary image compression such as JPEG and GIF, texture compression can't be too liberal, otherwise we're going to notice compression artifacts in our textures. Textures still have to retain much of their data since they're also going to be filtered or blended with other textures. In most cases, a compression ratio of 4:1 is good enough.

Stencil Buffer

Perhaps the easiest way to explain this feature is with an analogy. Think of the stencil buffer as a placeholder for very specific things - a 'quick and dirty sketch' if you will. A graphics programmer can use the stencil buffer to do some additional processing for some effects. One such use is rendering a shadow - it's really nothing but an outline of an object seen from the direction of the shadow casting light. The graphics programmer can tell the graphics card to render the scene from that vantage point into the stencil buffer, filling it with information that indicates which part of the scene is inside the outline (effectively what's inside the shadow). This information is then used to render the shadows from the viewer's perspective (transforming the light coordinates into camera coordinates). Shadows are generally not colored - they come in shades from fully dark to fully lit. An 8 bit stencil buffer is enough to provide 256 levels of shadow.

Memory Considerations

Almost all 3D graphics hardware features a frame buffer, a depth buffer and of course a texture buffer. A stencil buffer may also be present. All of these buffers take up your graphics card's memory. Since it's limited, we can only use so much of it. Most of the time, the frame buffer (taking into account double or triple buffering) and the depth buffer take first priority. The frame buffer holds the final output, the one you'll see on your screen. This will, of course, be written to constantly. The depth buffer will also be frequently read and written to. So, whatever memory the graphics card has left is used for the texture buffer.

If your graphics card's memory is quite limited, there might not be enough space to store all the textures (because most of it is taken by the frame and depth buffers). This will cause performance issues, more so if you're playing a game that uses lots of textures. Large, highly detailed textures may force the graphics card to store some of them in main memory (RAM), which is a lot slower than the local graphics card memory. Texture compression may help, but not by much. In this case, we have to try to fit all textures inside the graphics card memory. This was once a primary concern, but since graphics cards nowadays come with lots of memory (the largest available today is 512 MB), capacity is no longer an issue.

What's more important now is bandwidth. If you remember, bandwidth is influenced by two things - memory speed in Hz and bus width. The bigger the bus and / or the faster the memory, the bigger the bandwidth. Even if you can fit all the textures inside the graphics card memory, you still need lots of bandwidth to actually transfer the textures back and forth to the chip. This is one of the main reasons why low end / entry level graphics cards are better suited for resolutions lower than 1024 x 768, 32 bit color. Both the capacity (128 MB) and bandwidth (64 or 128 bit bus) of these cards can't keep up with the demands of a higher resolution. Even more so if you're planning to play with lots of effects and features such as anti-aliasing and anisotropic filtering.
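
The relation between bus width, memory speed and bandwidth can be sketched as a one-line formula in Python. The function name is made up, and the 1000 MHz effective clock in the example is an assumed round figure used only to show the effect of bus width.

```python
def memory_bandwidth_gb_s(bus_width_bits, effective_clock_mhz):
    """Peak bandwidth = bytes moved per transfer x transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_clock_mhz / 1000

# Same memory speed, different bus widths
for bus in (64, 128, 256):
    print(bus, "bit bus:", memory_bandwidth_gb_s(bus, 1000), "GB/s")
```

At the same memory clock, doubling the bus width doubles the bandwidth. That is the main way high end cards pull ahead of mainstream ones.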

To illustrate what's going on, let's calculate some numbers: we know that a resolution of 1024 x 768, 32 bit color takes up about 3 MB. If we're planning to hit a constant frame rate of about 60 fps, roughly speaking we will need 60 frames x 3 MB every second, which is 180 MB/s. Don't forget there's the depth buffer - at 24 bit, that's another 135 MB/s. Of course, we also need bandwidth for textures. Most graphics cards today can hold and process a texture up to 2048 x 2048, 32 bit color - roughly 16 MB in size. Textures used for some game models are limited to 512 x 512, 32 bit color, so we can actually fit 16 such textures in the space of our 2048 x 2048 texture. We also have to factor filtering into the equation, which can use from 4 samples (bilinear) up to 16 or even 32 samples - about 30.7 GB/s (16 MB x 60 fps x 32 samples) of bandwidth! That's a total of about 31 GB/s of bandwidth in all! With Z and texture compression, we can actually get away with less bandwidth, although there's some additional latency.

A GeForce 6800 has around 33.6 GB/s of bandwidth, so it's enough for our example. Compare that to the 'measly' 14.4 to 16 GB/s of bandwidth of its sibling, the GeForce 6600. If you don't remember, the main difference between the two is the bus width (256 vs 128 bit); the memory type and clock are about the same. So, if we want to use the same settings, we have to compromise a bit (even after using Z and texture compression). For example, one compromise we can make is to use a maximum of 16 samples for filtering on the GeForce 6600, which is good enough for most people. This means we only need 15.36 GB/s of bandwidth (16 MB x 60 fps x 16 samples), just right for the GeForce 6600.
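
The rough bandwidth budget used in this example can be reproduced in Python. This is a back-of-the-envelope sketch that follows the article's simplifications: every buffer is touched once per frame, the whole 16 MB texture is read once per sample, and 1 GB/s is counted as 1000 MB/s. The function names are made up for this example.

```python
MB = 1024 * 1024

def buffer_traffic_mb_s(width, height, bytes_per_pixel, fps):
    """Traffic for writing one full-screen buffer every frame."""
    return width * height * bytes_per_pixel * fps / MB

def texture_traffic_mb_s(texture_mb, fps, samples):
    """Traffic for reading the texture once per filter sample, per frame."""
    return texture_mb * fps * samples

frame = buffer_traffic_mb_s(1024, 768, 4, 60)  # 32 bit color
depth = buffer_traffic_mb_s(1024, 768, 3, 60)  # 24 bit depth
print(frame, depth)                            # 180.0 135.0

for samples in (32, 16):
    textures = texture_traffic_mb_s(16, 60, samples)
    total_gb_s = (frame + depth + textures) / 1000
    print(samples, "samples:", textures / 1000, "GB/s for textures,",
          round(total_gb_s, 2), "GB/s in total")
```

Texture traffic dwarfs the frame and depth buffers. This is why cutting the number of filter samples is the first compromise to make on a card with a narrower bus.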

Rasterization Performance Considerations

Even all the bandwidth and memory in the world won't help performance if your graphics card is slow to begin with. Just as with your main processor, we need a balance between the available bandwidth and the processing power of the graphics chip. We usually measure processing power by how many pixels or texels the chip can process per cycle (or per clock). The more pixels we can draw per cycle, the faster we can draw the entire frame buffer.

Texel and pixel processing is usually done inside the graphics chip. Just like your processor, these chips are made up of tiny units, some of which are called TMUs or Texture Mapping Units. Additional units let the chip split the workload instead of using just a single unit to process it all. After the texels are processed, they can be passed to dedicated filtering units, or filtered internally if the TMUs are capable of doing so. Before shaders arrived, texel processing was actually very easy, so these units were quite simple and small. With better fabrication and design, we can put more than one TMU inside a graphics chip. With two TMUs, this effectively doubles the pixel processing power.

TMUs have also become faster and more powerful. Arranging TMUs in an array - a 'pipeline' - allows us to process either one pixel with additional filtering for free, or two texels with an additional pass for filtering. Under this scenario, multitexturing can now be done in a single pass. With 4 'pipelines', we can get either 4 pixels (single textured texels) with trilinear filtering or 4 multitextured pixels with bilinear filtering. Nowadays, it's not that rare to see graphics cards which can process 8, 12 or 16 texels/pixels per cycle.

When full fledged shader hardware arrived, these TMUs were effectively replaced by more generalized, powerful ALUs (Arithmetic Logic Units). They can still function just like your basic TMUs, but they offer more flexibility because they can process instructions - shaders - as well. Unlike multitexturing, which only involves simple, limited processing, shaders can be short or long, taking several cycles to complete. So, to maximize performance, these ALUs are designed to process several instructions at a time. A simpler way of putting it would be like this: if an instruction needs 3 cycles to complete, then working on 3 of them at once means that, on average, one instruction is completed every cycle. We also get the advantage of doing 3 short instructions in a single cycle! To get even more out of these ALUs, they can 'pack' several pieces of data together in one large chunk. So, instead of processing four 32 bit texels one at a time, they can put them together as one 128 bit chunk. This method, often called SIMD (Single Instruction Multiple Data), has been used by processors for a long time (MMX, 3DNow!, SSE, AltiVec).

The internal parts of GeForce 6800

The Internal parts of GeForce 6600

Now, let's go back to our previous example, the GeForce 6800. The chip on this graphics card can process 12 or 16 texels per clock. Since it's running at 400 MHz, that gives us about 4.8 to 6.4 billion texels per second. Remember, a texel is actually a pixel in the texture - a texture element. Since the graphics card can process that many texels, it needs 19.2 to 25.6 GB/s (4.8 and 6.4 billion x 32 bits) of bandwidth for the textures. The 33.6 GB/s that it has is more than enough (with some additional headroom for other tasks).

On the other hand, the GeForce 6600 can only process 8 texels per clock. It's running at 500 MHz, providing us with 4 billion texels per second. The needed bandwidth should be around 16 GB/s, which a 128 bit bus with 500 MHz DDR memory (effectively 1 GHz) can supply. This is preferred over a 256 bit bus with 250 MHz DDR memory, because making a graphics card with a 256 bit bus is more expensive. The memory for both cards is basically the same, so manufacturers don't have to place separate orders, plus they can shift memory stock between the two. If demand is high for the 6600, the memory allocated for the 6800 can be used first and vice versa.
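
The texel rate and the texture bandwidth it implies can be checked with a short Python sketch. The function names are made up for this example. Each texel is taken as 32 bits (4 bytes), as in the article, and 1 GB/s is counted as one billion bytes per second.

```python
def texels_per_second(texels_per_clock, clock_mhz):
    """Peak texel rate of the chip."""
    return texels_per_clock * clock_mhz * 1_000_000

def texture_bandwidth_gb_s(texel_rate, bytes_per_texel=4):
    """Bandwidth needed to feed that many 32 bit texels every second."""
    return texel_rate * bytes_per_texel / 1_000_000_000

cards = [("GeForce 6800, 12 texels/clock", 12, 400),
         ("GeForce 6800, 16 texels/clock", 16, 400),
         ("GeForce 6600, 8 texels/clock", 8, 500)]

for name, per_clock, mhz in cards:
    rate = texels_per_second(per_clock, mhz)
    print(name, "->", rate / 1e9, "billion texels/s,",
          texture_bandwidth_gb_s(rate), "GB/s")
```

The results are 19.2, 25.6 and 16.0 GB/s, matching the figures quoted for the two cards.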

The GeForce 6800 uses four ALUs that can process up to four texels per ALU, and they can process shaders as well. Several instructions can be done in a single cycle since each ALU can process two instructions per cycle (with some restrictions). More complex ones might require several cycles. Under the traditional 'pipeline' assumption, the GeForce 6800's ALUs basically allow us to calculate either 1 anisotropic filtered pixel with 16 samples, 2 trilinear filtered pixels or 4 bilinear filtered pixels. The GeForce 6600 is equipped with a different, although similar, ALU. It uses a single ALU capable of processing eight texels per cycle, but optimized to produce four pixels per cycle. This arrangement, much like the old TMU array, allows it to process some instructions (for shaders) or perform filtering / multitexturing more effectively. One clear advantage is that we get either 1 relatively free trilinear filtered pixel or 1 anisotropic filtered pixel with up to 16 samples (in two passes) with almost no performance drop. That's very good for a card with half the pixel processing performance of the GeForce 6800.

Confused? To put it simply, the processing power of any graphics card is very much related to how many texels it can process per cycle. The general consensus is that the more TMUs (or ALUs), the better. But the way we organize the TMUs and their capabilities also influences performance. Remember that on the screen, we're viewing at least bilinear filtered textures, not just point sampled ones. If we 'stack' TMUs into an array, we get more multitexturing power and possibly trilinear filtering for free. Anisotropic filtering does need another pass, but we can get additional samples for free with the second TMU. Combined with enough speed and memory bandwidth, we can get relatively free anisotropic filtering! Without stacked TMUs, we must do another pass to filter each texel, and even more passes are needed if we're going to do trilinear and, of course, anisotropic filtering. But a high enough clock and the sheer number of texels we can produce per cycle alleviate the penalty for multiple passes.
