Benchmarking on Trial - Part 1
Performance evaluations are never easy, and the debate over synthetic versus real world benchmarks is a never-ending one among reviewers, the general public and even manufacturers. However, it is important to realize that there is a difference between benchmarks and reviews. Benchmarks are usually one part of a full review: the part that focuses on performance, since performance is relatively easy to measure compared with other attributes of a product (image / sound quality, ease of use and installation etc). Since most people put performance first when buying hardware, it's only natural for reviews to focus more on performance than on, say, value-added features. Benchmarks are usually run on a fixed setup where only one component is changed, namely the hardware being benchmarked. As a reference point, a comparable product (similar in performance, capabilities or price) is also benchmarked. The results are then compared to see which product offers the higher performance according to the benchmark's metrics.
Once we have a performance comparison, the reviewer will usually add other objective and subjective evaluations, ranging from their experience with the product to notable differences in features and output quality (for graphics cards, usually the visual quality of the rendered images). Noise, heat output and power consumption are also becoming more important than ever. Price may also be an influencing factor. Let's face it: if two products basically offer the same performance, we would go with either the one with more features or the one with the lower price, all other things (warranty / support) being equal. The comparison gets more difficult if the two products differ in both price and performance. In these cases a price / performance ratio (sometimes with features factored in) is usually used, but this approach doesn't address the concern of buyers on a strict budget, who will spend a certain amount of money on a product and no more.
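To make the budget point concrete, here is a minimal Python sketch. It uses hypothetical products with made-up prices and scores (not measured results) and contrasts ranking by price / performance ratio with the choice a strict-budget buyer would make: the fastest product at or below their limit. The names `Product`, `best_by_ratio` and `fastest_within` are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Product:
    name: str
    price: float  # purchase price (any single currency)
    score: float  # benchmark result, e.g. average frames per second


def best_by_ratio(products: list[Product]) -> Product:
    """Product with the most performance per unit of price."""
    return max(products, key=lambda p: p.score / p.price)


def fastest_within(products: list[Product], budget: float) -> Optional[Product]:
    """Fastest product the buyer can afford, or None if nothing fits the budget."""
    affordable = [p for p in products if p.price <= budget]
    return max(affordable, key=lambda p: p.score, default=None)


# Hypothetical line-up; the numbers are illustrative only.
cards = [
    Product("Card A", price=100, score=50),   # 0.50 points per unit of price
    Product("Card B", price=200, score=90),   # 0.45
    Product("Card C", price=300, score=120),  # 0.40
]

print(best_by_ratio(cards).name)        # Card A
print(fastest_within(cards, 200).name)  # Card B
```

The ratio favors Card A, yet a buyer whose ceiling is 200 would likely pick Card B, the fastest card they can afford. Neither rule captures both concerns on its own.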
Back to the subject: benchmarks. Since we want to evaluate a piece of hardware, a specific part of a system, reviewers typically use benchmarks that scale with that particular component, be it the processor, memory, graphics card or something else. Synthetic benchmarks are the ideal testing tool in this regard, since they completely isolate one aspect of a system. However, keep in mind that performance in real applications and real life scenarios is often bottlenecked by several components at once. For instance, a graphics card can only render as fast as the processor can supply data to it, and the processor can only process data once that data is ready in memory. It's an overly simplistic example, but it's not far from the truth. Just look at the components involved in that statement: graphics card, processor and memory. Behind them sit the processor's front side bus, the memory controller and even the graphics card's interface. That's why reviewers in general (and the general public too) favor benchmark results from real world applications or games that they use / play. Those results have more bearing on the performance they can expect after adding a particular piece of hardware to their system.
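The bottleneck argument can be expressed as a toy model. The sketch below assumes the processor and graphics card work in parallel, so the slower of the two sets the frame rate; the function names and the frame rates in the example are hypothetical. It shows why a card that doubles its score in an isolated (synthetic) test may deliver a much smaller gain in a real system.

```python
def effective_fps(cpu_fps: float, gpu_fps: float) -> float:
    """Frame rate when CPU and GPU work in parallel: the slower stage sets the pace.

    cpu_fps: frames per second the processor (and memory) can prepare.
    gpu_fps: frames per second the graphics card could render if never starved.
    """
    return min(cpu_fps, gpu_fps)


def upgrade_gain(cpu_fps: float, old_gpu_fps: float, new_gpu_fps: float) -> tuple[float, float]:
    """Speed-up from a graphics card upgrade: (isolated test, whole system)."""
    isolated = new_gpu_fps / old_gpu_fps
    system = effective_fps(cpu_fps, new_gpu_fps) / effective_fps(cpu_fps, old_gpu_fps)
    return isolated, system


# Hypothetical: the processor can feed 60 fps; the old card renders 45, the new one 90.
isolated, system = upgrade_gain(cpu_fps=60, old_gpu_fps=45, new_gpu_fps=90)
print(f"isolated: {isolated:.2f}x, in the system: {system:.2f}x")  # isolated: 2.00x, in the system: 1.33x
```

Real systems are rarely this clean, since the stages overlap only partly and the bottleneck shifts from frame to frame. That is why measurements with real applications are still needed.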
But even the use of real world applications and games as benchmarks has come under fire in recent years. Since benchmarks need to produce repeatable / reproducible results, they often rely on a fixed series of actions or scenarios. While benchmark makers typically state that these scenarios are chosen to be as close as possible to real world, day to day use, they cannot cover every type of use. Benchmarks built on real world applications or games are not immune to this. It is especially true of graphics benchmarks based on games. Remember that game developers do not write their game code around graphics alone; they must balance the workload between the processor and the graphics card. A graphically oriented benchmark, usually a replay or timedemo that focuses solely on graphics performance, therefore won't produce results that reflect the typical frame rate across the entire game. Some moments in a game are graphically intensive and others are less so, and their frame rates naturally differ: less intensive scenes run at higher frame rates and more intensive scenes at lower ones. There will also be scenes where you are bottlenecked by the processor or memory subsystem rather than the graphics card. In those scenes the graphics card can't reach its peak potential, and results are lower than the card alone could deliver.
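A small sketch of why a graphics-only timedemo and a full play session report different numbers. It assumes a hypothetical session of three scenes with made-up per-scene limits, reuses the simple "slower component wins" model, and computes the session's average as total frames divided by total time. The scene names and figures are illustrative, not measurements, and it assumes a timedemo of a scene reports only the graphics card's own limit.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    name: str
    seconds: float  # time spent in this part of the game
    cpu_fps: float  # rate the processor / memory subsystem can sustain here
    gpu_fps: float  # rate the graphics card could sustain here on its own

    @property
    def fps(self) -> float:
        # Simple bottleneck model: the slower component sets the frame rate.
        return min(self.cpu_fps, self.gpu_fps)


def average_fps(scenes: list[Scene]) -> float:
    """Time-weighted average over a session: total frames / total seconds."""
    total_frames = sum(s.fps * s.seconds for s in scenes)
    total_seconds = sum(s.seconds for s in scenes)
    return total_frames / total_seconds


session = [
    Scene("heavy battle", seconds=60, cpu_fps=80, gpu_fps=40),    # graphics-bound
    Scene("exploration", seconds=120, cpu_fps=150, gpu_fps=120),  # light scene
    Scene("crowded town", seconds=60, cpu_fps=50, gpu_fps=100),   # processor-bound
]

# Assumed: a timedemo of the battle isolates the graphics card, so it reports gpu_fps only.
print("timedemo:", session[0].gpu_fps)         # 40
print("whole session:", average_fps(session))  # 82.5
print("town, card's potential vs actual:", session[2].gpu_fps, session[2].fps)  # 100 50
```

The timedemo figure is neither wrong nor representative of the whole session: it simply describes one graphically heavy scene.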
So in that regard, graphics benchmarks using timedemos or replays are similar to synthetic benchmarks: they isolate the load on one specific piece of hardware, in this case the graphics card, despite being real world applications or games. As a natural progression, reviewers and manufacturers are now using more 'real world' performance testing, with games in particular. The idea is to monitor frame rates during actual gameplay rather than replays or timedemos, so that the results are as close as possible to actual use, in this case the gameplay frame rates users and gamers can expect. To maintain repeatability and reproducibility, each gameplay session is played in a specific manner so that the outcome is similar each time. Because the variation between runs is higher than with timedemos, the results of repeated runs are averaged and used as a proxy for that scenario.
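Averaging repeated runs can be sketched with Python's standard `statistics` module. The run results below are invented for illustration. Alongside the mean, the sketch reports the sample standard deviation and the coefficient of variation (standard deviation divided by mean), one common way to show how much more gameplay runs scatter than timedemo runs. The helper name `summarize_runs` is hypothetical.

```python
import statistics


def summarize_runs(fps_runs: list[float]) -> tuple[float, float, float]:
    """Mean, sample standard deviation and coefficient of variation of repeated runs."""
    if len(fps_runs) < 2:
        raise ValueError("need at least two runs to estimate run-to-run variation")
    mean = statistics.mean(fps_runs)
    stdev = statistics.stdev(fps_runs)  # sample (n - 1) standard deviation
    return mean, stdev, stdev / mean


# Hypothetical average frame rates from repeated runs of the same scenario.
timedemo_runs = [61.2, 61.4, 61.3]
gameplay_runs = [55.0, 60.0, 58.0, 62.0, 57.0]

for label, runs in (("timedemo", timedemo_runs), ("gameplay", gameplay_runs)):
    mean, stdev, cv = summarize_runs(runs)
    print(f"{label}: mean {mean:.1f} fps, stdev {stdev:.2f}, cv {cv:.1%}")
```

Since the uncertainty of an average shrinks only with the square root of the number of runs, a noisier method like gameplay testing tends to need more repetitions than a timedemo to pin down its result equally well.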
Today we're going to share our experiences with both of these benchmarking approaches: gameplay testing sessions and replay / timedemo runs. This article is not meant as an argument for or against either of them, but rather as an 'adventure' exploring the peculiarities of both.