Tech-Hounds.com

Because gamers play games, not benchmarks




How We Evaluate

Performance: 60 % of total score
Features: 30 % of total score
Documentation: 10 % of total score

How We Measure Performance

To measure the performance of a product, we install it in a fixed platform setup. After installing and configuring the drivers, we run tests using benchmarks we have chosen for this purpose. The benchmark results are the performance metrics for that product. We give every benchmark the same weight, in order to give a more complete view of the product's performance. Using a weighted average, the product is then given an overall performance score. Information about the test setup used is included in the evaluation notes.
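As a rough illustration of how these numbers could combine, here is a minimal Python sketch. It assumes each benchmark result has already been converted to a score on a 0 to 100 scale, which the text above does not specify. The function names are hypothetical; only the 60/30/10 weights and the equal weighting of benchmarks come from this page.

```python
# Category weights as listed under "How We Evaluate".
WEIGHTS = {"performance": 0.60, "features": 0.30, "documentation": 0.10}


def performance_score(benchmark_scores):
    """Equal-weight average of per-benchmark scores (assumed to be 0-100)."""
    if not benchmark_scores:
        raise ValueError("at least one benchmark score is required")
    return sum(benchmark_scores) / len(benchmark_scores)


def total_score(benchmark_scores, features, documentation):
    """Combine the three category scores using the published weights."""
    parts = {
        "performance": performance_score(benchmark_scores),
        "features": features,
        "documentation": documentation,
    }
    return sum(WEIGHTS[name] * value for name, value in parts.items())


if __name__ == "__main__":
    # Hypothetical product: three benchmark scores plus two category scores.
    print(round(total_score([80, 90, 70], features=75, documentation=60), 2))
```

With these made-up inputs the performance category averages to 80, so the performance part alone contributes 48 of the final points.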

What is a Benchmark?

There are many ways to define a benchmark when we're talking about computers and peripherals.

"A benchmark is a test, or series of tests, run on a product or platform to determine the performance of that product or platform under a certain, specific load. Benchmarks may be purely synthetic, testing the product at a low level, or they may try to emulate real world applications, or even use real world applications under certain scenarios, usually meant to represent real world situations, that will produce repeatable, measurable results with small degrees of variance. Some benchmarks allow users to change their settings, so that users can evaluate different aspects of a certain product or platform. Using different settings means the product will probably give different results for each setting, so only products or platforms using the same settings should be compared with each other. Inevitably, there will be variances between products, since they may use different methods of running the task the benchmark specifies, or there may be other differences (clock speed, bandwidth, instructions per cycle) between the products or platforms. In this case, users must make sure that the end data, whether binary, image or audio, is as close to one another as possible. A note should be made when such assurances cannot be achieved. The measurable result of any benchmark is a performance metric, an indication of the performance of the tested product or platform under that particular test. A full, complete performance evaluation means using several benchmarks, testing them with different settings and under different loads, so we can get a comprehensive 'snapshot' of what kind of performance the product or platform offers."

In short, benchmarks are tools to measure the performance of, in this case, either a whole computer system or a certain peripheral. Using benchmarks, we can get an idea of what performance to expect from that system or peripheral. This way we can choose the best available option that will fulfill our needs. Remember that benchmarks are not an evaluation, only a part of one. As you can see, we don't rely on benchmarks alone but also evaluate features and documentation. Users and readers would be wise to also consider the price of a product, since price is still one of the factors influencing most people's buying decisions.

But Gamers Play Games, not Benchmarks

Over the years, much attention has been given to the benchmarking scene. Developers have made both synthetic benchmarks and tests emulating real world scenarios, often working with hardware manufacturers to make sure no compatibility or performance issues exist. These tools used to exist solely as professional tools. Nowadays, more and more of the benchmarks available on the market are targeted at gaming performance, and benchmarks are used to evaluate products ranging from full corporate servers to PDAs and smartphones. Professional and general users alike rely on benchmark results to help them make their buying decisions.

No disrespect to the hardworking developers of synthetic benchmarks, but their software is still not a game. It may or may not reflect real world gaming performance, but that is not the issue here. We prefer to use real games for our benchmarks, so our users and readers can be sure that the performance results of a product we have tested are the performance they can expect. Basically, this means getting as close as possible to actual gameplay and, of course, using the game itself.

Of course, we can't use every game in existence. Not only is that impossible, it is also pointless. We don't believe benchmarking every game under the sun would benefit our users and readers. Instead, we have chosen several game titles which we feel are not only good games but also good benchmarks. These titles have been carefully chosen to represent the various game genres available. The reason is that game developers usually follow some general conventions of a genre, despite using different underlying software or 'engines' for their games. Thus, the performance of one game can often be similar to that of other games in the same genre. We made this decision because, in our own experience, two games using the same engine don't necessarily deliver the same or similar gameplay or benchmark results. In addition, the titles should also be very popular with gamers in their respective genres, or highly regarded by gamers and game reviewers alike.

What makes a good game benchmark?

Built-in benchmarking tool or replay feature
To be a benchmark, a game must have either a built-in benchmarking tool or a replay feature. Typically, benchmarking tools come in the form of a 'timedemo'. An ordinary timedemo is actually a replay of gameplay, the difference being that timedemos are played back as fast as possible. The number of frames rendered is then divided by the time taken for the timedemo to complete, and the result is the average fps of that timedemo. If the game does not feature a benchmarking tool or timedemo, we must use the replay feature with external tools to compute the results we want. For this, we use the FRAPS software utility. Bear in mind that using external software does have some impact on performance, but we're confident that any influence is negligible.
Benchmarks must reflect or represent gameplay
Of course, having a timedemo or replay is only the beginning. Since we want to see what the performance will be like in gaming situations, the timedemo or replay must also reflect or represent gameplay scenarios, or be as close to them as possible. A timedemo or replay that does not use the player's perspective, does not feature gaming situations (empty levels or maps), or does not use effects commonly used throughout the game does not reflect or represent gameplay, at least in our opinion.
Benchmarks must be repeatable and produce repeatable, similar results
Timedemos and replays must also be repeatable, and repeated runs should give the same results or at the very least results very close to each other. By repeating timedemos and replays, we can be sure that the results are valid and that no other influencing factors have come into play. This way, when we change the settings or switch platforms or products, any differences in results will be caused only by that change or switch.
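The timedemo arithmetic and the repeatability requirement can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the 5 % tolerance is an assumed default, not a rule taken from any benchmarking tool.

```python
def average_fps(frame_count, elapsed_seconds):
    """Average frame rate of a timedemo: frames rendered / time taken."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return frame_count / elapsed_seconds


def relative_spread(fps_results):
    """Spread of repeated runs: (highest - lowest) / mean of all runs."""
    mean = sum(fps_results) / len(fps_results)
    return (max(fps_results) - min(fps_results)) / mean


def is_repeatable(fps_results, tolerance=0.05):
    """True when repeated runs stay within the assumed tolerance."""
    return relative_spread(fps_results) <= tolerance


if __name__ == "__main__":
    # Hypothetical run: 6000 frames played back in 50 seconds.
    print(average_fps(6000, 50.0))
    # Three hypothetical repeats of the same timedemo.
    print(is_repeatable([120.0, 123.0, 121.5]))
```

If `is_repeatable` returns False, the runs would be repeated before any result is recorded.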

Variances and Significance

Of course, even repeated runs using the same settings and platform or product may vary. These differences are often caused by external factors such as loading data from the hard drive into memory. If this is the case, we repeat the runs until the differences are minimal and not significant. An example would be a difference of 3 fps between results of 120 and 123 fps: that is a 2.5 % difference, still within a standard 5 % deviation and not very noticeable in gameplay. The same 3 fps difference is significant if the results are 25 and 28 fps (a 12 % difference).
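The arithmetic behind this example is easy to check. The sketch below is a minimal illustration with hypothetical function names; it measures the difference relative to the lower of the two results, which is how the 12 % figure above works out.

```python
def percent_difference(fps_a, fps_b):
    """Difference between two results as a percentage of the lower one."""
    low, high = sorted((fps_a, fps_b))
    return (high - low) / low * 100.0


def is_significant(fps_a, fps_b, threshold_percent=5.0):
    """True when the difference exceeds the chosen deviation threshold."""
    return percent_difference(fps_a, fps_b) > threshold_percent


if __name__ == "__main__":
    for pair in ((120, 123), (25, 28)):
        diff = percent_difference(*pair)
        print(pair, round(diff, 1), is_significant(*pair))
```

The same 3 fps gap is small at 120 fps and large at 25 fps because the baseline it is divided by is so different.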

Bugs, Optimizations and Different Rendering Methods

As with any other software, games and benchmarks often have bugs. To minimize the impact of bugs, we update the game binaries with the most recent final update from the game's developer or publisher. Results from different binaries should not be compared without checking whether the patch contains performance fixes or any other changes affecting performance. Of course, this means we also update graphics card and motherboard drivers, which are often updated monthly. Drivers can also offer performance fixes, particularly graphics card drivers. This means benchmark results obtained with different drivers should not be compared 'as is' without a proper evaluation of the performance impact.

Unlike game updates, graphics card drivers may contain performance enhancements and optimizations for a specific application or game. While these speed up performance, they complicate the benchmark and evaluation process, since the driver may override or alter some settings at the expense of the image quality displayed. Illegal optimizations may also inflate benchmark results and thus invalidate them, even more so if these optimizations are only activated for benchmarks and not during gameplay. We have taken steps to ensure our benchmark results are valid. Image quality tests, both static (i.e. screenshots) and dynamic (playing the game and looking for artifacts, errors or differences), are done to ensure the image quality is the same or at least very close. We also compare the frame rate in benchmark results to the real world gaming experience. If we discover any anomalies during testing, we will note them along with any relevant information. Graphical settings and features are activated through the game options when possible, and through the driver's settings panel when the game doesn't support those settings and features.

Unlike the developers of purely synthetic benchmarks, game developers may also choose to optimize their game for a specific product or platform, either right out of the box or through an update. These developer optimizations are acceptable, since at the very least the developers have made sure through their own internal testing that the image quality and gameplay have not deviated far from what they should be. A note will also be given in this case.

The games and benchmarks

As time goes by, newer games may also be included. We're already thinking of using Splinter Cell Chaos Theory, but we are still examining it to see whether or not it fits our needs. Other games we're considering as candidates are Call of Duty 2 and Dungeon Siege 2, once they're finally available.

Examining What Influences Benchmark Results

Graphics Card and Processor Impact: Using the Right Benchmark for Each
Unlike synthetic benchmarks, which isolate each peripheral in the system when testing, game benchmarks are influenced by many factors. The processor, the memory and the graphics card are the peripherals that directly influence game benchmarks. To find out what their influence on the benchmark results is, we run the game benchmarks on one platform, switching processors and using different settings. The purpose of this exercise is to determine which benchmarks we will use for evaluating processors and which for evaluating graphics cards. System benchmarks are benchmarks whose results scale significantly with increases in processor performance. Graphics benchmarks, on the other hand, are benchmarks whose results scale significantly with changes in resolution, graphical effects or features (such as FSAA and anisotropic filtering).
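One way to picture this sorting exercise is the small Python sketch below. It is not our actual procedure: the function names are hypothetical, and the 10 % cut-off for "scales significantly" is an assumption made for the example.

```python
def relative_gain(before_fps, after_fps):
    """Fractional increase going from before_fps to after_fps."""
    return (after_fps - before_fps) / before_fps


def classify_benchmark(fps_slow_cpu, fps_fast_cpu,
                       fps_high_res, fps_low_res, threshold=0.10):
    """Label a benchmark by what its results scale with.

    Returns a list that may contain "system", "graphics", both, or neither.
    The threshold is an assumed value, not a published rule.
    """
    labels = []
    if relative_gain(fps_slow_cpu, fps_fast_cpu) >= threshold:
        labels.append("system")      # faster processor, clearly higher fps
    if relative_gain(fps_high_res, fps_low_res) >= threshold:
        labels.append("graphics")    # lower resolution, clearly higher fps
    return labels


if __name__ == "__main__":
    # Processor figures are averages quoted later on this page;
    # the resolution figures are made up for illustration.
    print(classify_benchmark(64.22, 87.0, fps_high_res=86.0, fps_low_res=87.0))
```

A game that earns both labels, as some simulation and action titles do, can serve as both a system and a graphics benchmark.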
Not All Games are Created Equal: Genre and Performance Considerations
Through testing, we found that the notion of developers designing games according to the general conventions of the game's genre is generally true. Simple, fast paced action games favor high frame rates and often scale better with graphical effects, resolution and graphical features. This means they are generally good graphics card benchmarks. Simulation games, on the other hand, can go either way, since they benefit from both graphics card and processor performance increases. Sophisticated real time strategy games often gain large benefits only from increases in processor performance. So strategy games are usually used only as system benchmarks, while simulation and action games can be used for both.
What Users Notice Most: Frame rate, Details, FSAA then Anisotropic Filtering
Since we're talking about benchmarks that reflect or represent gameplay, we must also choose the settings that users are most likely to use. Users with different systems will have different settings, so how do we determine the ideal benchmark settings and setup? Generally, users judge gameplay performance by the game's frame rate, the details and effects displayed by the game, and the graphical features activated (such as FSAA and anisotropic filtering). On a fixed platform, users prefer to sacrifice graphical features rather than put up with low frame rates or low details. They will also sacrifice details and resolution if the frame rate is still too low for the gameplay to be enjoyable.

So we will conduct our tests at high detail levels at a resolution of 1024 x 768 in 32 bit color for system benchmarks. For graphics benchmarks, we will use several resolutions, such as 1024 x 768, 1280 x 1024 and 1600 x 1200, in 32 bit color. We will also conduct separate tests for FSAA and anisotropic filtering.
Resolution: Do People Really Play Games Above 1024 x 768, 32 bit?
The answer to that question is yes, although these gamers make up only a very small part of the population. Most gamers still play at 1024 x 768 in 32 bit color. There are reasons for this, usually related to monitor capabilities such as maximum resolution and refresh rate, but performance is also a very significant factor. Most people don't use high end graphics cards and systems, so for them choosing a high detail setting at a resolution above 1024 x 768 in 32 bit color often means low frame rates, which make the gameplay suffer. Then why test the performance of cards and systems at resolutions such as 1280 x 1024 or 1600 x 1200? At these resolutions, the graphics card itself becomes the bottleneck, which makes the test a graphics benchmark. This of course assumes the system can push very high frame rates, since nothing has changed except the resolution. Graphics benchmarks often scale better with resolution than with graphical features, so benchmarking at high resolutions makes more sense than cranking up the graphical features.
The Elusive Frame Rate: 30, 60, 120 Fps or More?
This question will always pop up, since in many ways the frame rate you get does determine how much you can enjoy a game. Some people can tolerate low fps, while others can't. In fact, one scientific study concluded that our eyes can detect a single differing frame in a 200 frames per second animation. So how many fps is enough? As a general rule, frame rates below 25 or 30 fps are not desirable. Below 25 fps, our eyes notice the lag and interactivity with the game can suffer, hurting gameplay. Frame rates between 30 and 60 fps are fine and can be considered normal, so 30 fps is the minimum frame rate we want. Between 60 and 120 fps, the animation 'feels' really fluid and seamless, which actually helps gameplay because you can immerse yourself further. Above 120 fps, it doesn't really matter, since most CRTs only support maximum refresh rates of 120 Hz. Combining this with what we know about significance, the difference in performance between two products can be considered significant if it is at least 10 % (3 fps at 25 to 30 fps, 6 fps at 30 to 60 fps, 12 fps above 120 fps).
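The bands and the 10 % rule of thumb can be written down as a short Python sketch. The labels and function names are hypothetical, and where exactly each boundary falls (for example whether 30 fps itself counts as "normal") is an assumption, since the text only gives approximate ranges.

```python
def describe_frame_rate(fps):
    """Map a frame rate to the rough comfort bands described above."""
    if fps < 30:
        return "undesirable"
    if fps < 60:
        return "normal"
    if fps <= 120:
        return "fluid"
    return "beyond a 120 Hz refresh rate"


def significant_step(fps, fraction=0.10):
    """Smallest fps difference treated as significant at this frame rate."""
    return fps * fraction


if __name__ == "__main__":
    for fps in (25, 30, 60, 120, 150):
        print(fps, describe_frame_rate(fps), round(significant_step(fps), 1))
```

At 30, 60 and 120 fps the 10 % step works out to 3, 6 and 12 fps, matching the figures quoted in the paragraph.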
Don't Settle for Average: The Minimum Fps Hurts the Most
Most benchmarking tools and timedemos produce only a single result: the average frame rate. Since gameplay is very varied, there will be times when you experience higher or lower than average frame rates. We don't really care about the higher frame rates, since they often occur when you're looking at an empty space such as a wall or the sky. Most of us are more concerned with minimum frame rates. Since minimum frame rates can have different causes, we must be careful when evaluating a product or platform. Often they are caused by engine limitations or other background processes not related to graphical effects and features. In these cases, the minimum frame rates are caused by system limitations, since the processor can't deliver instructions to the graphics card fast enough. But there are times when the minimum frame rate is caused by graphical effects or features, which makes it important for our graphics evaluation.

This is one area we pay close attention to when we choose our benchmarks. Since we examine the frame rate over the entire timedemo or replay with different settings and processors, we can determine what causes these minimum frame rates and make a note of it when we make our evaluation and recommendations. If the minimum frame rates differ across different processors, they're very likely caused by system limitations. Too many of them means the benchmark is most definitely system limited. The same goes for minimum frame rates caused by graphics card limitations.
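A minimal Python sketch of this check follows. It assumes the frame rate has been logged as a list of fps samples over the run; the function names and the 10 % threshold are hypothetical, chosen only to illustrate the idea of comparing minimums across processors.

```python
def summarize(fps_samples):
    """Return (minimum, average, maximum) for a list of fps samples."""
    if not fps_samples:
        raise ValueError("fps_samples must not be empty")
    average = sum(fps_samples) / len(fps_samples)
    return min(fps_samples), average, max(fps_samples)


def minimum_follows_processor(min_fps_slow_cpu, min_fps_fast_cpu,
                              threshold=0.10):
    """True when the minimum fps rises noticeably with a faster processor,
    which hints at a system limitation rather than a graphics one."""
    gain = (min_fps_fast_cpu - min_fps_slow_cpu) / min_fps_slow_cpu
    return gain >= threshold


if __name__ == "__main__":
    # Made-up per-second samples from one run.
    print(summarize([31.0, 48.0, 62.0, 55.0]))
    # Minimums of 31 fps and 37 fps on two different processors.
    print(minimum_follows_processor(31.0, 37.0))
```

A minimum that does not move between processors only tells us the processor is not the cause at that point; it does not by itself prove the graphics card is.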
Fps Frequency Distribution - What the Average Fps Couldn't Tell You
There are times when even the minimum, average and maximum fps together don't quite show the whole picture. Sometimes the real world gameplay experience tells a different story. One of the reasons is that the frame rate distribution across the whole timedemo isn't always even, as it would be in a normal distribution. In a normal distribution, samples with minimum values are often offset by samples with maximum values. That's why we examine the frequency distribution of the game benchmarks we use. With it, we can see just how long (in seconds) the benchmark spends within a given range of frame rates.

Since we're more concerned with lower frame rates than higher ones, we divide the frame rates into ranges 5 to 10 fps wide, from 25 to 125 fps. In some instances we also use even lower frame rates, such as 15 fps. Again, the whole idea is to give you a more complete picture of what's happening, rather than just showing results such as the minimum, average and maximum fps.
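A frequency distribution like this can be computed with a short Python sketch. It assumes one fps sample was logged per second of the run, so that counting samples in a range gives seconds spent in that range. The function name and the fixed 10 fps range width are choices made for the example.

```python
def fps_time_distribution(fps_per_second, start=25, stop=125, width=10):
    """Seconds spent in each fps range, given one fps sample per second."""
    buckets = {f"<{start}": 0}
    for low in range(start, stop, width):
        buckets[f"{low}-{low + width}"] = 0
    buckets[f">={stop}"] = 0

    for fps in fps_per_second:
        if fps < start:
            buckets[f"<{start}"] += 1
        elif fps >= stop:
            buckets[f">={stop}"] += 1
        else:
            # Lower edge of the range this sample falls into.
            low = start + int((fps - start) // width) * width
            buckets[f"{low}-{low + width}"] += 1
    return buckets


if __name__ == "__main__":
    # Made-up samples: six seconds of a run.
    samples = [20, 30, 34.9, 35, 64, 130]
    for fps_range, seconds in fps_time_distribution(samples).items():
        if seconds:
            print(fps_range, seconds)
```

Running the same function on the logs from two processors shows where the seconds shift between ranges, which the average alone hides.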

Conclusion: System and Graphics Benchmarks We will Use

System benchmarks:
Dungeon Siege benchmark
[Figure: Dungeon Siege Processor Scaling. Time spent (0 to 32 sec) in each fps range from 25 to 125 fps.]
[Figure: Dungeon Siege fps Progress. Frame rate (0 to 200 fps) over the course of the benchmark.]

This action game with RPG elements may be old, but it represents the RPG genre quite well. The graphics are outdated by now, but most RPGs concentrate primarily on statistics and probability calculations, which makes them very system dependent. As you can see from the graphs, when we switched processors from an Athlon XP 2500+ to an Athlon 64 3000+, we gained quite a significant performance increase. The average frame rate jumps from 64.22 fps to 87 fps. In the last part of the benchmark, where it is heaviest, the nominal difference in minimum fps may not seem large, but it is still significant: frame rates rise from around 25 fps to 30 fps (a 20 % increase, give or take). You can also see that the game spends more time at higher frame rates (above 75 fps) on the Athlon 64 system than on the Athlon XP system. These are the reasons why we chose this game as one of our system benchmarks.
Splinter Cell - Tbilisi Demo 1
[Figure: Splinter Cell - Tbilisi 1 Processor Scaling. Time spent (0 to 16 sec) in each fps range from 25 to 125 fps.]
[Figure: Splinter Cell - Tbilisi 1 fps Progress. Frame rate (0 to 70 fps) over the course of the demo.]

This stealth action game using the Unreal engine caused quite a stir when it came out on the PC, since it was one of the few titles at the time that made heavy use of shaders. There are newer sequels with improvements and newer graphical enhancements, but for the most part the engine is not that different (except for Chaos Theory, which we're looking at very closely as a substitute or complementary benchmark). This game is both a good system benchmark and a good graphics benchmark, so we're using it for both. For the system benchmark, we're using the Tbilisi Demo 1, since, as you can see from the processor scaling graph, it scales very well with increases in processor performance. We couldn't see this if we looked only at the minimum frame rates (no difference in fps) and average frame rates (a 5.35 fps difference). The Tbilisi Demo 1 reflects the gameplay and environments most often seen in this game better than the Caspian Oil Refinery demo does, so it's very likely you will experience the same fps in other levels throughout the game.
Lock On: Modern Air Combat - F-15 Demo 01
[Figure: Lock-On Processor Scaling. Time spent (0 to 72 sec) in each fps range from 25 to 125 fps.]
[Figure: Lock-On fps Progress. Frame rate (0 to 150 fps) over the course of the demo.]

Looking at this simulation of modern air combat, you might assume that it's very graphics limited, since it's so visually appealing. While it may look nicer than other flight simulators, it shares the same limitation: this game is very system dependent. The Athlon 64 and Athlon XP systems spend roughly the same amount of time at the lower frame rates (from 15 to 35 fps), although the Athlon 64 system spends more time at higher frame rates (from 55 to 100 fps) than the Athlon XP system. We think it will be some time before we find a processor fast enough to give us minimum frame rates above 25 fps in this game. Thankfully, the rating increase of 500 between the Athlon XP and the Athlon 64 does help a bit (you spend 5 to 10 seconds less at the lower frame rates).
Nascar 2003 Season - Custom Replay
[Figure: Nascar 2003 Processor Scaling. Time spent (0 to 90 sec) in each fps range from 25 to 125 fps.]
[Figure: Nascar 2003 fps Progress. Frame rate (0 to 80 fps) over the course of the replay.]

Like other racing simulations, Nascar 2003 uses complex AI to challenge the player. This and a variety of other factors (such as physics and statistics calculations) make simulations generally system limited. So it's quite refreshing to find that this game actually scales very well with both resolution and increases in processor performance. On the Athlon 64 system, we hardly ever drop below 35 fps, while the Athlon XP system dips below 35 fps quite a lot, usually when we're driving very close to several cars or there's a collision on the race track.
Full Spectrum Warrior - Custom Replay
[Figure: Full Spectrum Warrior Processor Scaling. Time spent (0 to 70 sec) in each fps range from 25 to 125 fps.]
[Figure: Full Spectrum Warrior fps Progress. Frame rate (0 to 90 fps) over the course of the replay.]

This game spawned a new genre when it first appeared, combining real time strategy with a third person perspective. Of course, this also means it's one of the more visually appealing strategy games, using lots of detail and even shaders. Make no mistake though, this is definitely a strategy game and very system limited. We can already see the improvement from moving to a faster processor just by looking at the average fps (from 48.09 fps to 62.60 fps). But more importantly, a faster processor means higher minimum fps as well (37 fps compared to 31 fps). This can be examined more closely in the processor scaling graph. There we can even see that we spend more time in total at the higher frame rates just by using the faster processor.
Rome: Total War - Custom Battle
[Figure: Rome Total War Processor Scaling. Time spent (0 to 90 sec) in each fps range from 25 to 125 fps.]
[Figure: Rome Total War fps Progress. Frame rate (0 to 32 fps) over the course of the battle.]

Complex real time strategy games don't get much more complex than this. This is probably why the processor is the only thing that's relevant here. As you can see from the processor scaling graph, higher fps can only be experienced with a faster processor. Even on the Athlon 64, setting the unit scale to large means playing at around 15 to 25 fps most of the time.

Due to the nature of this game, we couldn't keep the replay exactly the same. Repeated tests don't always have the same outcome, but they do stay close to each other up to a certain point. So we will only record with FRAPS up to that point, using the utility's feature to stop a benchmark after a set time. Results for this game will be used for system benchmarks only and should be considered additional information. They will never be used as a tie breaker when comparing products.
Call of Duty - Dawnville timedemo1
[Figure: Call of Duty Resolution Scaling. Time spent (0 to 18 sec) in each fps range from 25 to 125 fps.]
[Figure: Call of Duty fps Progress. Frame rate (0 to 200 fps) over the course of the timedemo.]
[Figure: Call of Duty Resolution Scaling. Time spent (0 to 16 sec) in each fps range from 25 to 125 fps.]
[Figure: Call of Duty fps Progress. Frame rate (0 to 200 fps) over the course of the timedemo.]