A Primer to 3D Graphics - Part 1
Much has changed in PC graphics. Gone are the days when most of what we saw on our screens was 2D images, animation and video. In the last five years, 3D graphics has become less of a buzzword and more of a reality, so to speak. Nowadays, we hardly see a new graphics chip without some 3D graphics capabilities built in. Even the graphics we're seeing on our cellular phones and handheld devices are moving to 3D.
Through the years, we have seen overall image quality improve. Better images and animation are now the standard, made possible by faster and more capable hardware. In fact, with every new generation, the tech demos get closer and closer to cinematic quality, the holy grail of real time graphics. No wonder more and more attention has been given to 3D graphics, particularly in the area of real time consumer 3D graphics.
Computer Graphics 101
Before we delve deeper into 3D graphics, we should take a look at how they are made possible. Undeniably, computer generated graphics are very closely related to math. Instead of drawing with a pen on a piece of paper, we work with virtual ones. In this case, the virtual paper is the screen. Drawing a line is simple enough: put our virtual pen on one dot and move it towards another dot. Slopes and curves are drawn in much the same way, although more calculations are needed.
With PC graphics, we must provide the dots' positions to the computer. So we divide the screen into a grid using what we call a coordinate system, just like in basic geometry. We now have two axes - x and y - where x is the horizontal axis and y is the vertical axis. The width and height of the grid can be any number (or even infinite), but most of the time we use the number of pixels - the resolution. For a screen resolution of 1024 x 768 pixels, we would most likely use 1024 as the maximum value on the x axis and 768 on the y axis. Since the pixels are square, we use the same unit for x and y. Of course, these numbers may have decimal values to allow more precision. Now we just need a pair of coordinates (x,y) for each of our two dots and we can begin drawing our straight line.
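As a rough illustration of the idea above, here is a minimal Python sketch that steps from one dot to the other and records the pixels the virtual pen passes over. The function name `draw_line` and the rounding rule are assumptions made for this example; real graphics hardware uses more refined methods.

```python
import math

def draw_line(x0, y0, x1, y1):
    """Return the pixel coordinates visited while moving from (x0, y0) to (x1, y1)."""
    dx, dy = x1 - x0, y1 - y0
    # Take one step per pixel along the longer axis so the line has no gaps.
    steps = math.ceil(max(abs(dx), abs(dy)))
    if steps == 0:
        return [(math.floor(x0 + 0.5), math.floor(y0 + 0.5))]
    pixels = []
    for i in range(steps + 1):
        t = i / steps  # fraction of the way from the first dot to the second
        x = x0 + t * dx
        y = y0 + t * dy
        # Round to the nearest pixel, since coordinates may carry decimal values.
        pixels.append((math.floor(x + 0.5), math.floor(y + 0.5)))
    return pixels

print(draw_line(20, 5, 20, 10))  # both dots share x = 20, so the line is vertical
print(draw_line(0, 0, 8, 3))     # a sloped line
```

Stepping along the longer axis is one simple way to make sure every pixel along the line gets visited exactly once.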

If both dots share the same number on a given axis (for example 20,5 and 20,10), we have a line that runs parallel to the other axis (in our case both dots have x = 20, so the line is vertical, parallel to the y axis). If they don't, we have a sloped line, either downwards or upwards. When we view the line, we usually don't see it as a line connecting two dots, but just as a line. For example, when drawing, we draw the first dot and then move our pen to the second one. This method of drawing is called vector drawing. As you can see, we can describe a vector either by providing coordinates for both dots, or by giving the coordinates of the first dot plus the direction and distance to the second. Using vectors is preferred since we can store a lot more information inside a vector and use it to draw a straight line or curve. There are other advantages as well, but we'll get into those later.

We can make all kinds of shapes using simple lines and curves: a box, a square, a triangle, or more complicated shapes such as a pentagon, hexagon, ellipse or circle - whatever you like. Even the complicated shapes are mostly made up of lines and curves, so we can draw them just by placing dots and connecting them. But more importantly, we can also draw them using a reference dot in the center and then placing the other dots relative to that center dot.

Why is this so important? Look at a circle - you can't draw a smooth circle just by connecting a handful of dots. The easiest way to draw a circle is to draw a curve around the center (our reference dot) so that every part of the curve is the same distance from it. An ellipse can be drawn the same way, just by stretching the distance along one direction to get that ellipse shape. Defining a shape with vectors also allows us to easily perform rotation and scaling. For example, to rotate a shape about its center, just rotate the vectors in relation to the center. We can also scale the shape by increasing the distance from the center. Is that all? No, let's go wild - we could rotate and scale it in relation to any dot, even one outside the shape!
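The rotation and scaling described above can be sketched in a few lines of Python. The helper names (`rotate_about`, `scale_about`, `circle_points`) are made up for this example, and the pivot can be any dot, inside or outside the shape. Note that on a screen where y grows downwards, a positive angle will appear to turn clockwise.

```python
import math

def rotate_about(point, pivot, degrees):
    """Rotate a 2D point around a pivot dot by the given angle."""
    angle = math.radians(degrees)
    # Vector from the pivot to the point.
    vx, vy = point[0] - pivot[0], point[1] - pivot[1]
    rx = vx * math.cos(angle) - vy * math.sin(angle)
    ry = vx * math.sin(angle) + vy * math.cos(angle)
    return (pivot[0] + rx, pivot[1] + ry)

def scale_about(point, pivot, factor):
    """Move a point away from (or towards) the pivot by a scale factor."""
    return (pivot[0] + (point[0] - pivot[0]) * factor,
            pivot[1] + (point[1] - pivot[1]) * factor)

def circle_points(center, radius, count):
    """Dots at the same distance from the center, approximating a circle."""
    return [(center[0] + radius * math.cos(2 * math.pi * i / count),
             center[1] + radius * math.sin(2 * math.pi * i / count))
            for i in range(count)]

square = [(1, 1), (3, 1), (3, 3), (1, 3)]
center = (2, 2)
print([rotate_about(p, center, 45) for p in square])    # rotate about its own center
print([scale_about(p, center, 2) for p in square])      # double its size
print([rotate_about(p, (10, 10), 90) for p in square])  # pivot outside the shape
```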



Of course, drawing empty shapes is hardly interesting, even in computer graphics. What we can do is fill them with color. Most colors can be represented by a combination of red, green and blue, so we mix these three in varying degrees, with each one stored in its own 'channel'. Using 8 bits per color channel, we can represent about 16.7 million colors on the screen (an 8 bit number can hold a value from 0 to 255, which gives 256 levels per channel, so the total is 256 x 256 x 256 = 16,777,216). An additional 8 bits can be used to control the transparency of the shape - we usually call this the alpha channel. That's why we rarely see any difference in color between 24 bit and 32 bit graphics: the extra 8 bits hold alpha, not more colors. Of course, we can fill a shape with just one color, but we can do more, since each pixel within the shape can have its own color - for example a bitmap of a scanned photograph, a gradient fill or even a pattern.
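To make the arithmetic concrete, here is a small Python sketch that packs four 8 bit channels into one 32 bit value and counts the available colors. The channel order used here (alpha in the top byte, then red, green, blue) is only one possible layout and is an assumption of this example.

```python
def pack_argb(red, green, blue, alpha=255):
    """Pack four 8 bit channels into a single 32 bit integer (assumed ARGB order)."""
    for channel in (red, green, blue, alpha):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must fit in 8 bits (0 to 255)")
    return (alpha << 24) | (red << 16) | (green << 8) | blue

def unpack_argb(value):
    """Return (red, green, blue, alpha) from a packed 32 bit value."""
    return ((value >> 16) & 0xFF, (value >> 8) & 0xFF,
            value & 0xFF, (value >> 24) & 0xFF)

levels_per_channel = 2 ** 8             # 256 levels: 0 to 255
total_colors = levels_per_channel ** 3  # red x green x blue
print(total_colors)                     # 16777216
print(hex(pack_argb(255, 128, 0)))      # opaque orange
```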
Bringing it to 3D
While 2D graphics uses only two axes - the x and y axes - 3D adds another axis, the z axis, into the mix. This z axis represents depth and allows us to convey not just lines, curves and shapes but also volumes. Look at a cube - it's basically nothing more than six squares, where each corner dot is shared by three squares. A pyramid is made up of four triangles and a square on the bottom. Even a ball is really just a circle rotated about an axis. You get the idea.
3D graphics allows us to present objects that look very close to what we see in real life. If you look at a cube from the front, you can only see the side facing you. Turn it slightly, and you'll see the others. We know that all sides of a cube have the same dimensions, but how the cube appears to us depends on the viewing angle and distance - this is called perspective. To our eyes, the back side of the cube is further away and therefore looks smaller. A good 3D implementation must take this into account, since we see everything around us in 3D and in perspective.
One of the obvious effects of perspective is that the slant of the cube's sides appears to change as we move it left and right. It's actually easy to duplicate this effect in 3D - just shift the vectors for the back side of the cube. First, we need a vector describing the cube's position relative to the center of the screen (using the center of the cube). Then we 'correct' the vectors of all the dots in the cube to account for perspective, pulling the dots that are further away closer to the center of the view. It's the same for any object, even if it is not a cube.
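One common way to apply this kind of perspective 'correction' is to divide x and y by the depth z, so that distant dots move towards the center of the view. The sketch below assumes the viewer sits at the origin looking along the positive z axis; the function name `project` and the `focal_length` parameter are assumptions of this example.

```python
def project(point, focal_length=1.0):
    """Project a 3D point onto a 2D view plane using a perspective divide."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the viewer (z > 0)")
    # The further away the point (larger z), the closer it lands to the center.
    return (focal_length * x / z, focal_length * y / z)

# The same corner offset on the front and back face of a cube.
front_corner = (1.0, 1.0, 2.0)
back_corner = (1.0, 1.0, 4.0)
print(project(front_corner))  # (0.5, 0.5)
print(project(back_corner))   # (0.25, 0.25)
```

The back corner lands closer to the center than the front one, which is why the back face of the cube looks smaller.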
When we move an object along (or parallel to) an axis, this is called translation. Since every dot in the box is defined relative to its center, we only need to shift the center of the box relative to the center of the screen. Remember how rotation works in 2D? It's the same principle - rotate the center of the box relative to the center of the screen. Of course, since we're working in 3D, all these operations can be done on the x, y and z axes. For rotation, this gives us roll, yaw and pitch.
If we can move objects in 3D, shouldn't we be able to move ourselves as well? Well, yes! That's one of the greatest things about 3D: you can move your view just like you do in real life. You can move nearer to or further from an object, and look up/down or left/right. Doing this on our virtual screen doesn't mean we're actually moving you or the camera. Instead, we move your perspective of everything in the 3D virtual world - we move everything you see in relation to you (or more accurately, the camera's position). This provides the illusion of movement to your eyes, although you hardly move at all! So every time you move in 3D, the computer must transform (translate and rotate) all the objects in relation to you and take your perspective into account.
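Moving the world instead of the camera can be sketched as follows. This is a simplified example that only handles the camera's position and its yaw (turning left/right); the conventions used (camera looks along +z when yaw is 0, positive yaw turns towards +x) and the name `world_to_camera` are assumptions of this sketch.

```python
import math

def world_to_camera(point, camera_pos, camera_yaw_degrees):
    """Express a world-space point relative to the camera."""
    # Translate: shift the world so the camera sits at the origin.
    x = point[0] - camera_pos[0]
    y = point[1] - camera_pos[1]
    z = point[2] - camera_pos[2]
    # Rotate: turn the world the opposite way to the camera's yaw.
    a = math.radians(camera_yaw_degrees)
    cx = x * math.cos(a) - z * math.sin(a)
    cz = x * math.sin(a) + z * math.cos(a)
    return (cx, y, cz)

# Camera 5 units behind the origin, looking straight ahead:
print(world_to_camera((0, 0, 0), (0, 0, -5), 0))
# Camera at the origin, turned 90 degrees to face +x:
print(world_to_camera((1, 0, 0), (0, 0, 0), 90))
```

In both cases the object ends up directly in front of the camera (positive z in camera space), even though only the object's coordinates were changed.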
To convey a better sense of volume, we need to 'fill' the box's sides so it looks solid. Otherwise, you can't tell which side is facing you. This is done by passing information about the order of the dots - each of which is usually called a vertex (vertices if more than one). By convention, if the vertices of a side are listed clockwise, that side faces outward, and vice versa (some systems use the opposite convention). The direction a side faces is called its 'normal'. The lines connecting the vertices are aptly called 'edges'. When all of them are put together, we have either a triangle (3 vertices and 3 edges) or a polygon. A polygon can have more than 3 vertices and edges, but the great thing is that any polygon can easily be built from triangles. The computer compares the normal of every side of every object to the direction the camera (or you) is pointing. If the object is in your view and its normals are facing in your general direction, the object (or rather the sides that face you) will be drawn. To keep the back of an object from showing through, the computer may simply draw the back sides first and then the front sides over them. This is usually done by comparing the coordinates of the vertices (in relation to you and the camera) to see which ones are further along the z axis.
Unfortunately, this process will draw every object in the world - even the ones not in view! This is of course not efficient - we want the computer to draw only the objects in our field of vision. Most of the time, this is done by defining a box or a set of planes representing our field of vision and then comparing the vertex coordinates against it. An object whose vertices are all outside will not be drawn. An object with some vertices inside and some outside will still be drawn, and of course so will an object with all its vertices inside. This process is called clipping, since with it we 'clip' the 3D world as if we're looking at it through a box-shaped hole - our field of vision. Normally, we define our field of vision as a 90 degree area, both up/down and left/right.
Distance also plays a role in clipping. For instance, if you have ever used a camera, you'll have noticed that objects in front of or behind the point of focus are blurred - out of focus. In a similar way, we need a near and a far clipping plane to tell the computer not to draw objects that are too near or too far, because we only want to see the objects in range. This is again done by choosing values along the z axis for the two clipping planes, and the computer then compares each vertex's coordinates with these values.
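The visibility test described in the last two paragraphs might look like the sketch below. It assumes points are already expressed relative to the camera (camera at the origin, looking along +z), uses the 90 degree field of vision mentioned above, and picks arbitrary near and far values. It follows the simplified rule from the text (draw the object if any vertex is visible); real clipping also cuts partially visible triangles at the boundary.

```python
import math

def point_in_view(point, near=1.0, far=100.0, fov_degrees=90.0):
    """True if a camera-space point lies inside the field of vision."""
    x, y, z = point
    # Near and far clipping planes: reject points too close or too distant.
    if z < near or z > far:
        return False
    # At depth z, the view extends this far left/right and up/down.
    half_extent = z * math.tan(math.radians(fov_degrees) / 2.0)
    return abs(x) <= half_extent and abs(y) <= half_extent

def object_is_drawn(vertices, **view):
    """Draw the object if at least one of its vertices is visible."""
    return any(point_in_view(v, **view) for v in vertices)

print(point_in_view((0, 0, 5)))                    # straight ahead
print(point_in_view((10, 0, 5)))                   # too far to the side
print(object_is_drawn([(0, 0, 5), (50, 0, 5)]))    # partly visible
print(object_is_drawn([(50, 0, 5), (60, 0, 5)]))   # entirely outside
```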
In addition to this, there is a technique that tells the computer not to draw the sides that are facing away from us. This is usually called back face culling. After clipping is done and before the objects are drawn, back face culling determines which sides of an object face away from our view and stops them from being drawn.
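A back face test can be sketched with a cross product and a dot product. The winding convention here (vertices listed counter-clockwise when seen from the front, in a right-handed coordinate system) is an assumption of this example; with the opposite convention the comparison is simply reversed. All function names are made up for illustration.

```python
def subtract(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def face_normal(v0, v1, v2):
    """Normal of a triangle, derived from the order of its vertices."""
    return cross(subtract(v1, v0), subtract(v2, v0))

def is_front_facing(v0, v1, v2, camera_pos):
    """True if the triangle's normal points towards the camera."""
    to_camera = subtract(camera_pos, v0)
    return dot(face_normal(v0, v1, v2), to_camera) > 0

triangle = ((0, 0, 0), (1, 0, 0), (0, 1, 0))
print(is_front_facing(*triangle, camera_pos=(0, 0, 5)))   # seen from the front
print(is_front_facing(*triangle, camera_pos=(0, 0, -5)))  # seen from behind: culled
```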
Another way to exploit distance in 3D graphics is what's usually called Level of Detail (LOD). When using LOD, we make several versions of the same object and display the most detailed version when it's close and the least detailed version when it's far. This makes sense, since in real life we hardly notice the details of objects far from our focus. Using a less detailed version saves memory and processing time, which can then be spent on the most detailed versions - where it really counts.
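Choosing a version by distance can be as simple as the sketch below. The distance thresholds are arbitrary values picked for illustration, and `choose_lod` is a hypothetical helper, not part of any particular engine.

```python
def choose_lod(distance, thresholds=(10.0, 50.0)):
    """Return 0 for the most detailed version, higher numbers for coarser ones."""
    for level, limit in enumerate(thresholds):
        if distance < limit:
            return level
    return len(thresholds)  # beyond the last threshold: least detailed version

models = ["high detail", "medium detail", "low detail"]
for distance in (5.0, 20.0, 200.0):
    print(distance, models[choose_lod(distance)])
```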
A Shady World
Up till now, you'll notice that our objects have the same color on all their sides. Needless to say, this is not a natural phenomenon. In real life, light and shadow are everywhere. They give us visual cues about an object, such as whether it is facing toward or away from us, how smooth its surface is, and so on. Shadows give our brain important information about where an object is and just how far it is from the ground.
What most people seem to misunderstand is that we don't actually see light itself in terms of photons; instead we see how light interacts with the objects all around us. Light interacts through reflection, refraction and deflection, and some of it is even absorbed. 3D graphics can't model light itself (at least it's not practical with current hardware), so we settle for the next best thing - modeling how light interacts with objects. This way we can approximate how light behaves in terms of how we see it. That is why good lighting and shadows are so important to 3D graphics.
The earliest form of shading - modeling how light interacts with an object - is based on the understanding that when light 'hits' an object, it is reflected in all directions with the same intensity. Lambert shading laid the groundwork for all the 3D shading algorithms after it by offering a simple calculation based on the incidence of light (the angle of the light relative to the surface).
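The Lambert calculation boils down to the cosine of the angle between the surface normal and the direction to the light, which for unit-length vectors is just their dot product. A minimal sketch (function names are assumptions of this example):

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def lambert(normal, to_light):
    """Diffuse intensity between 0 and 1 for a surface lit from one direction."""
    n = normalize(normal)
    l = normalize(to_light)
    # Surfaces facing away from the light receive nothing, hence the clamp.
    return max(0.0, dot(n, l))

print(lambert((0, 0, 1), (0, 0, 1)))   # light hits head on: full intensity
print(lambert((0, 0, 1), (1, 0, 0)))   # light grazes the surface: no intensity
print(lambert((0, 0, 1), (0, 0, -1)))  # light is behind the surface
```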
Gouraud shading takes it a step further. The problem with Lambert shading is that the resulting image doesn't give us a smooth transition between lit areas. This is because each triangle or polygon of an object is lit with a single, different intensity. The solution is simple: calculate the intensity at each vertex and interpolate between the vertices, resulting in a smooth transition of intensity across the object.

Unfortunately, Gouraud shading has some drawbacks. For one, it can't offer specular highlights, since it only calculates diffuse light. Specular highlights occur where an area of an object reflects light directly toward our eyes. It also does not do a very good job of lighting a low polygon object, since it needs lots of vertices to capture the highlights. Phong shading solves these problems by doing the calculation not per vertex, but per pixel. As with Gouraud shading, something is still interpolated across the surface, but here it is the normal rather than the intensity, and the per pixel calculation allows us to get specular highlights even on a low polygon object. Phong shading is not without a drawback - the sheer number of calculations that must be done per pixel. But if you have fast hardware (or are more than willing to wait), the resulting image is very pleasing and more acceptable to the eye than Gouraud shading.
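The difference between the two approaches can be sketched along a single edge between two vertices. In this example the highlight should appear halfway between the vertices; the per-vertex version misses it, while the per-pixel version finds it. The specular formula used here (a reflection vector raised to a `shininess` power) is one common choice and an assumption of this sketch, as are all the function names.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def lerp(a, b, t):
    """Linear interpolation between two vectors."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))

def specular(normal, to_light, to_viewer, shininess=32):
    n, l, v = normalize(normal), normalize(to_light), normalize(to_viewer)
    n_dot_l = dot(n, l)
    if n_dot_l <= 0:
        return 0.0
    # Mirror the light direction about the normal.
    reflected = tuple(2 * n_dot_l * nc - lc for nc, lc in zip(n, l))
    return max(0.0, dot(reflected, v)) ** shininess

def gouraud(n0, n1, t, to_light, to_viewer):
    """Light the two vertices, then interpolate the resulting intensities."""
    i0 = specular(n0, to_light, to_viewer)
    i1 = specular(n1, to_light, to_viewer)
    return i0 + (i1 - i0) * t

def phong(n0, n1, t, to_light, to_viewer):
    """Interpolate the normal, then light the pixel itself."""
    return specular(lerp(n0, n1, t), to_light, to_viewer)

tilt = math.radians(30)
n0 = (-math.sin(tilt), 0.0, math.cos(tilt))  # normal at the first vertex
n1 = (math.sin(tilt), 0.0, math.cos(tilt))   # normal at the second vertex
light = viewer = (0.0, 0.0, 1.0)

print(gouraud(n0, n1, 0.5, light, viewer))  # almost zero: highlight is lost
print(phong(n0, n1, 0.5, light, viewer))    # close to 1: highlight is visible
```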

Of course, 3D graphics doesn't stop there. In real life, the surface of an object is never perfectly smooth. Every surface has microscopic fractures, holes, indents and so on. Surfaces also behave differently when interacting with light - some reflect more than others, and some are even translucent or transmissive. Blinn shading incorporates all of this, giving us a more complicated model but also more believable images than Phong shading.
What may surprise you is that all these algorithms were actually developed very early in computer history - in the 1970s. Up till the advent of affordable 3D graphics hardware, they were only used for non-real time rendering. The first generation of consumer 3D graphics hardware mostly relied on Gouraud shading, since it offers an affordable compromise between image quality and rendering performance. The latest hardware is powerful enough to do Phong and even Blinn shading in real time, although some approximations and compromises may still be used.
The Texture Component
If you look at the objects around you, you'll see that even if they have the same basic shape, they differ a lot in detail - textures, colors, materials and so on. Take a brick wall, for example - we know that a wall is really just a simple flat surface and could easily be modeled as a box, even a cube. But then we lose the illusion of the bricks. While we could use polygons to model each brick in our wall, that would be too expensive and not very practical. Thankfully, we have an easier solution.
Remember how we talked about filling a 2D shape with just about any color, even a combination of colors divided into pixels (the bitmap)? Well, we can also use that in 3D graphics. In this case, the bitmap becomes a texture. We simply take a picture of a brick wall and place it on our 3D box. Now we have a 3D brick wall! Well, sort of anyway.

Combined with Gouraud shading, texture mapping provides some very pleasing images. Of course, they're still lacking in some ways - remember that Gouraud shading can't provide specular highlights and produces artifacts on low polygon objects. This was a huge problem for real time 3D graphics, since most early hardware could only do Gouraud shading and rendering becomes very slow with high polygon objects. The solution is again very practical - process the lighting for the entire world before rendering (usually with offline rendering methods such as ray tracing and radiosity), then store the results INSIDE the textures, so no additional lighting calculations are needed at run time. This method works beautifully for worlds and objects where the lights are static. For the ones where they aren't, we can fall back on simple Gouraud shading.
More effects with multi texturing
The methods described above did a pretty good job with the hardware available at the time. But that doesn't mean they were enough. Everyone wanted faster, better graphics and the industry was more than happy to oblige. We want shadows, bullet marks, smoke, water, reflections - if you see it in real life, we want it. Blob shadows, decals for bullet marks, transparent alpha textures for smoke, and environment mapping became the norm. These special effects were made possible by better, faster, more capable texturing hardware in consumer 3D graphics.
One of the important features that made all of this possible is multi texturing. Looking back at Blinn shading, you'll notice that we essentially have a light, a surface, and information about the roughness of the surface (usually called a 'bump' map). Doing Blinn shading in real time was not very practical, but we could 'fake' it with multiple textures - one for each component (light, surface and 'bump' map). Since most of the hardware available at the time could process two textures per cycle, this fake Blinn shading only required two passes in most cases (more if you used better than bilinear filtering). But it was still a lot faster than doing it in software! If you didn't want the performance penalty, you could just toss away the bump map, so the hardware would render in one pass. We could even forgo Gouraud shading entirely by using a texture for the light - typically a circle of solid color that gradually fades.
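Combining a surface texture with a light texture usually comes down to a per pixel multiply. The sketch below uses tiny textures stored as Python lists and integer math in the 0 to 255 range; the function name `modulate` and the data layout are assumptions of this example, not how any particular graphics chip stores its textures.

```python
def modulate(base, light):
    """Darken each texel of the base texture by the matching light texel.

    base  : rows of (red, green, blue) tuples, each channel 0 to 255
    light : rows of brightness values, 0 (dark) to 255 (fully lit)
    """
    result = []
    for base_row, light_row in zip(base, light):
        row = []
        for texel, brightness in zip(base_row, light_row):
            row.append(tuple(channel * brightness // 255 for channel in texel))
        result.append(row)
    return result

brick = [[(200, 100, 50), (200, 100, 50)]]  # a 2 x 1 surface texture
glow = [[255, 128]]                         # fully lit, then about half lit
print(modulate(brick, glow))
```

The second texel comes out at roughly half brightness, which is how a fading circle in the light texture ends up looking like a pool of light on the wall.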
Multi texturing really opened up a whole new era of effects, but the most important part was the preliminary use of shaders. Think of shaders as equations - after all, a shading algorithm is really just an equation. But since multi texturing was not designed specifically for shader use, it is limited and constrained when 'forced' to work this way. Even so, graphics programmers 'hacked' their way to better effects. After this, it was very clear what the next step would be: developer controlled multi texturing - programmable shaders.
Shaders
With programmable shaders, 3D graphics began a new journey. Graphics programmers have more control over the rendering process. They can specify what data will be processed but, more importantly, how to process it. They can even use shaders to build the data to be used on the fly! Shaders have also extended to polygons in the form of vertex shaders, while shaders dealing with textures are usually called pixel shaders. All of this innovation allows the industry to offer consumers more realistic images, with shadows that look like they do in real life. Real time 3D graphics has progressed rapidly and now provides images comparable to early offline rendering solutions such as ray tracing and radiosity. Even effects such as depth of field, motion blur and bloom are now possible in real time, again with some caveats.
Now, remember what we said about shading algorithms and multi texturing, particularly 'faking' Blinn shading. With programmable shaders, the results are better. Remember that to produce convincing images, we need to calculate just how much light is reflected by the surface at every pixel. To find the specular component, we also have to compare the direction of the incoming light with the direction of the camera. The pixels with the appropriate angles and normals are filled with specular highlights, while the highlight gradually fades in the pixels around them, leaving only diffuse light. With multi texturing hacks, all of this had to be done through some creative and clever coding, but with shaders, programmers can send the code more or less the way it was originally written directly to the graphics chip. With the help of a shader compiler inside the graphics card drivers, this code is translated into instructions the hardware can understand.
Newer hardware has more capable shader units and faster performance. With it, shaders can have many more instructions, branches, loops and so on, just like the software that runs on your processor.
Floating point precision
One of the most intriguing changes that came with shaders is the move from integer numbers to floating point numbers. This is important because with integers, we may lose accuracy during processing. Take this as an example: multiplying by a rounded decimal number such as 0.33 instead of the more accurate 1/3. If we look at the shading algorithms, most of them involve multiplication. The result of one or two operations may not differ much, but what if we're doing more than that? 0.33^4 (0.01185921) is not the same as (1/3)^4 (0.01234567...).
As you can see from our very simple example above, the number of bits also plays an important role, since the more bits we have, the greater the range and precision of the results we can represent. But you have to remember that all these computations are done for every pixel drawn on your screen. The difference between these two numbers may not be noticeable to your eyes, particularly if the developer is smart about it. That's why not many people see a difference between 16, 24 and 32 bit floating point precision when looking at images from consumer 3D graphics hardware. But as more and more developers and programmers jump on the shader bandwagon and begin writing long, complex shaders, this small difference could become noticeable someday. Right now, games are only beginning to rely on shaders - mostly because the hardware is more widespread now, but also because of performance. A new graphics card can sometimes offer up to a 50% increase in shader processing power, and this trend should continue into the future.
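The 0.33 versus 1/3 example can be checked with exact fractions, which also shows how the error grows as more multiplications are chained together. This is only an illustration of accumulated rounding error using Python's standard `fractions` module; it does not model how any particular graphics chip stores its numbers.

```python
from fractions import Fraction

def relative_error(approx, exact, multiplications):
    """Relative error after multiplying a value by itself repeatedly."""
    approx_result = approx ** multiplications
    exact_result = exact ** multiplications
    return float(abs(exact_result - approx_result) / exact_result)

rounded = Fraction(33, 100)  # 0.33
third = Fraction(1, 3)       # exactly 1/3

print(float(rounded ** 4))   # 0.01185921
print(float(third ** 4))     # about 0.012345679

for count in (1, 4, 16, 64):
    print(count, relative_error(rounded, third, count))
```

The error starts at 1% for a single multiplication, reaches almost 4% after four, and keeps growing from there, which is why long shaders are more sensitive to precision than short ones.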
Graphical Features - The Must Have Eye Candy
Ever look at your old 3D games? Since they're old, chances are you'll see a lot of blockiness. Blockiness in shapes usually results from using too few polygons, while blockiness in textures results from too little texture detail / resolution. This is normal, since these games were built with the graphics cards available at the time in mind. Newer graphics cards are more capable (and a whole lot faster), so you can either run low detail objects / textures very fast, or high detail objects / textures fast (enough). Of course, game developers choose the latter for their newer games. Let's see what features are commonplace in today's games and applications.
LOD - Level of Detail
High detail objects and textures take up space in your graphics card's memory but, even more importantly, they eat up bandwidth. Instead of using them all the time, games and applications keep several versions of the same object, ranging from very high and high down to low detail (hence the term level of detail). The game or application decides which one to use depending on how far away the object is - high detail when it is near and low detail when it is far. Needless to say, this method uses bandwidth effectively, since low detail objects and textures don't take up much bandwidth even though they cover most of your view, and they are generally not what you are focusing on. More of the available bandwidth can be used to display objects and textures that are much nearer, where it really counts.
Texture Filtering
Just like a 2D image, a texture looks best when it is displayed at its native resolution; for example, a 512 x 512 texture occupying 512 x 512 pixels (a third of the pixels on a 1024 x 768 screen). When an object is far away, the texture stays the same but takes up less screen space. This means a single pixel (on your monitor's screen) may actually cover several different texels (texture elements). Conversely, when the object is very near, one texel may be spread over several pixels, resulting in blockiness. To minimize these graphical artifacts, we need to filter the texels. Filtering here means we take a sample of several texels, compute an averaged value and use it to fill each pixel of your screen.
There are basically four general methods of filtering: nearest (point sampling), bilinear, trilinear and anisotropic filtering. Point sampling was mostly used in old games with software rendering (where your processor does all the work). It's the fastest method and puts the least burden on the hardware, but it looks really, really ugly! This is because we only use one texel, regardless of whether or not it represents what the image should look like on your monitor.
Bilinear filtering was the norm for quite a while, since it was quite fast on first generation consumer 3D accelerators and graphics cards. Bilinear filtering takes several texels (usually four) and computes a weighted average to be used. Unfortunately, when used with the LOD technique for textures (otherwise known as MIP-mapping), it produces a very visible artifact called MIP-banding. MIP-banding is an artifact where we 'see' a clearly defined transition line between one MIP level and the next.
Almost all games nowadays have moved to at least trilinear filtering to counter this artifact. Trilinear filtering uses the same method as bilinear filtering but takes samples from both the higher and the lower detail MIP level. By mixing the samples from both MIP levels, we get a smooth transition.
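The three simpler filters can be sketched on a tiny grayscale texture stored as a list of rows. The coordinate convention (u and v running from 0 to 1 across the texture, texel centers at half-texel offsets, edges clamped) and all function names are assumptions of this example; actual hardware differs in the details.

```python
import math

def texel(texture, x, y):
    """Fetch one texel, clamping coordinates to the texture edges."""
    height, width = len(texture), len(texture[0])
    x = min(max(x, 0), width - 1)
    y = min(max(y, 0), height - 1)
    return texture[y][x]

def sample_nearest(texture, u, v):
    """Point sampling: use the single texel that contains (u, v)."""
    height, width = len(texture), len(texture[0])
    return texel(texture, math.floor(u * width), math.floor(v * height))

def sample_bilinear(texture, u, v):
    """Weighted average of the four texels surrounding (u, v)."""
    height, width = len(texture), len(texture[0])
    x, y = u * width - 0.5, v * height - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    top = texel(texture, x0, y0) * (1 - fx) + texel(texture, x0 + 1, y0) * fx
    bottom = texel(texture, x0, y0 + 1) * (1 - fx) + texel(texture, x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy

def sample_trilinear(fine, coarse, u, v, blend):
    """Bilinear samples from two MIP levels, mixed by a blend factor (0 to 1)."""
    return (sample_bilinear(fine, u, v) * (1 - blend)
            + sample_bilinear(coarse, u, v) * blend)

fine = [[0, 100],
        [0, 100]]   # 2 x 2 MIP level
coarse = [[50]]     # 1 x 1 MIP level (average of the level above)

print(sample_nearest(fine, 0.75, 0.25))
print(sample_bilinear(fine, 0.5, 0.5))
print(sample_trilinear(fine, coarse, 0.25, 0.5, 0.5))
```

Sliding the `blend` factor from 0 to 1 as the object moves away is what replaces the hard MIP-banding line with a gradual transition.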
Nowadays, all the latest games use anisotropic filtering. Anisotropic filtering is still fairly new for consumer graphics cards. It received much attention when the Radeon 8500 came out. ATI made some compromises in the Radeon 8500's method of anisotropic filtering: it only works together with bilinear filtering, not with trilinear filtering. This was finally fixed on the Radeon 9700, where anisotropic and trilinear filtering can be combined. NVIDIA followed suit with their GeForceFX 5900 line but used a different method and calculation. On the current generation (GeForce 6 and X700 / X8x0 series), the two pretty much use the same method (ATI has some edge though). Even now, both manufacturers make some compromises to ensure anisotropic filtering doesn't put too much strain on frame rate.
What's the deal anyway? Well, an artifact that's apparent with trilinear filtering is that as a texture gets further away from the camera, it becomes more blurry and less clear. This happens because trilinear filtering uses a sample pattern that assumes the texture is directly facing the camera and doesn't take the camera's perspective into account. We actually need more texel samples if the triangle / polygon is seen in perspective (anisotropic). Anisotropic filtering uses a modified sample pattern that takes into account the 'shape' of the texture, perspective wise. Obviously this means more samples than bilinear or even trilinear filtering, so it's quite heavy.
Anti-aliasing
When people talk about anti aliasing, they usually refer to the process of removing the jaggies or staircase effect that's visible on lines in 3D graphics. If you look at old 3D games from 3 to 5 years ago, you'll probably see that besides the blockiness, the outlines of objects are not smooth. This is even more obvious when you move the camera, either sideways or forwards and backwards. This artifact - aliasing - happens because your screen is made up of square pixels, and when a line is slanted / sloped, the computer must decide which pixels around the line are drawn. At too low a resolution, the pixels are quite big, so the line 'jumps' from pixel to pixel. Using a higher resolution lessens this effect, but anti aliasing does it better (and smoother), to the point where you don't notice it anymore.
In most cases, anti aliasing works like this: for every pixel drawn on your screen, your graphics hardware must decide how that pixel is filled so that the aliasing artifacts (the pixel 'jump') are smoothed away. This means using more samples - just like filtering. The hardware can either render more sample pixels and then calculate a single, final pixel from them, or it can sample just parts of the pixel. The first method is generally called super sampling, which is very heavy since your graphics card must basically render the same image at twice the resolution or more. The second method, which is widely used today, is called multi sampling and is a lot lighter than super sampling. The number of samples can range from 2 to 6 or even 8. Of course, the more samples, the better the image and also the heavier the load on your graphics card. For the most part, 4 sample anti aliasing (4x FSAA) is good enough and fast enough on current hardware. If you're very picky about image quality, you can increase the number of samples, the resolution or both, but there will be a performance penalty.
The number of samples is not the only factor determining image quality. The sample pattern is also important. If the samples are laid out in a square, grid-like array, some edges may still look 'jagged'. This sample pattern is called ordered grid. A much better pattern slightly tilts the grid, greatly improving image quality. This sample pattern is called rotated grid or jittered. Current consumer 3D graphics hardware such as NVIDIA's GeForce 6 and ATI's X series uses a rotated grid when anti aliasing, but older hardware such as the GeForceFX series uses an ordered grid. ATI has been using rotated grid anti aliasing since the Radeon 9700.
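Why the tilted pattern helps can be seen by sliding a vertical edge across a single pixel and counting how many different shades each pattern can produce. The sample positions below are illustrative assumptions (a plain 2 x 2 grid and a commonly cited rotated layout); the exact positions used by any given graphics chip may differ.

```python
# Sample positions inside one pixel, with (0, 0) and (1, 1) as opposite corners.
ORDERED_GRID = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
ROTATED_GRID = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]

def coverage(samples, edge_x):
    """Fraction of samples lying to the left of a vertical edge at edge_x."""
    inside = sum(1 for sample_x, _ in samples if sample_x < edge_x)
    return inside / len(samples)

def coverage_levels(samples, steps=100):
    """All distinct shades produced as the edge sweeps across the pixel."""
    return sorted({coverage(samples, i / steps) for i in range(steps + 1)})

print(coverage_levels(ORDERED_GRID))  # [0.0, 0.5, 1.0]
print(coverage_levels(ROTATED_GRID))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

With the same four samples, the ordered grid only offers one in-between shade on a near-vertical edge, because its samples line up in two columns. The rotated grid places every sample in its own column, so the edge fades through three in-between shades instead.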