Tech-Hounds.com

Because gamers play games, not benchmarks





Memory Related Factors

Another area we want to examine is further down the pipe: memory. By default, Core 2 Duo processors, even the highest clocked Core 2 Duo X6800, run at only 266 MHz FSB (1066 MHz effective). To offset the bandwidth penalty, Intel gave these processors larger caches. Conroe processors (E6600, E6700 and X6800) come with 4 MB of cache, while Allendale processors (E6300 and E6400) come with 2 MB, half the amount of their more expensive siblings. The chipset also supports asynchronous memory clocks. For example, at the official FSB of 266 MHz you can run your DDR2 modules at 333, 400 or 533 MHz (DDR2-667, 800 and 1066, respectively). We see two problems with this picture. First, judging from past experience, asynchronous memory clocks are not all they're cracked up to be: memory controllers have to be specifically tuned for asynchronous settings, and Intel chipsets have traditionally worked best in synchronous modes. Second, memory timings usually get 'looser' at higher frequencies, so latencies are higher at higher clocks. Despite the higher raw bandwidth, we might not see much improvement in performance with asynchronous modes.
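For readers who want to check the clock arithmetic, here is a rough Python sketch. It assumes the usual figures for this platform: the Core 2 FSB moves four transfers per bus clock, DDR2 moves two per memory clock, and "266 MHz" is really 800/3 MHz. The helper names are ours and purely illustrative.

```python
from fractions import Fraction

# "266 MHz" is really 800/3 MHz; exact fractions avoid rounding noise.
FSB_266 = Fraction(800, 3)

def fsb_transfer_rate(fsb_mhz):
    """Quad-pumped FSB: 4 transfers per bus clock, result in MT/s."""
    return 4 * fsb_mhz

def ddr2_transfer_rate(mem_mhz):
    """DDR2 transfers data twice per memory clock, result in MT/s."""
    return 2 * mem_mhz

def fsb_dram_ratio(fsb_mhz, mem_mhz):
    """Bus clock : memory clock as a string; '1:1' means synchronous."""
    r = Fraction(mem_mhz) / Fraction(fsb_mhz)
    return f"{r.denominator}:{r.numerator}"

print(f"FSB: {float(fsb_transfer_rate(FSB_266)):.1f} MT/s")
# Memory clocks from the text: 266 (sync), 333, 400 and 533 MHz.
for mem in (Fraction(800, 3), Fraction(1000, 3), Fraction(400), Fraction(1600, 3)):
    print(f"memory {float(mem):.1f} MHz -> {float(ddr2_transfer_rate(mem)):.1f} MT/s, "
          f"FSB:DRAM {fsb_dram_ratio(FSB_266, mem)}")
```

Only the 266 MHz memory clock gives a 1:1 ratio; the other three settings are the asynchronous ones discussed above.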

Most articles on the subject have also stuck to the official FSB spec of 266 MHz. We are particularly interested in whether the same behavior is present at higher FSBs, which fits perfectly with our overclocking experiment. If Intel's chipsets are capped by the FSB, which method should we use to increase memory bandwidth: higher memory clocks with asynchronous settings, tighter memory timings with synchronous settings, or a little of both? What kind of gains can we expect from each? This is especially interesting since many memory manufacturers have been touting DDR2-1066 and faster modules for some time now. Some even tout low latencies, but is there a real need for them?
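One way to frame the FSB cap question is to compare theoretical peak bandwidths. The sketch below assumes a 64-bit (8-byte) FSB and 64-bit DDR2 channels, and it assumes a dual-channel memory setup, which the article does not state. Measured throughput, such as the Membench figures later on, will be well below these peaks.

```python
from fractions import Fraction

BUS_BYTES = 8  # assumption: FSB and each DDR2 channel are 64 bits wide

def peak_mb_s(mts, channels=1):
    """Theoretical peak in MB/s: transfers per second x bytes per transfer x channels."""
    return mts * BUS_BYTES * channels

def cpu_visible_peak(fsb_mts, ddr2_mts, channels=2):
    """The CPU cannot receive data faster than the slower link.
    channels=2 (dual-channel DDR2) is an assumption."""
    return min(peak_mb_s(fsb_mts), peak_mb_s(ddr2_mts, channels))

fsb_1066 = Fraction(3200, 3)  # 4 x 266.67 MHz
print(f"FSB 1066 peak: {float(peak_mb_s(fsb_1066)):.0f} MB/s")
for name, mts in (("DDR2-533", Fraction(1600, 3)), ("DDR2-667", Fraction(2000, 3)),
                  ("DDR2-800", 800), ("DDR2-1066", Fraction(3200, 3))):
    print(f"{name}: memory peak {float(peak_mb_s(mts, 2)):.0f} MB/s, "
          f"CPU-visible peak {float(cpu_visible_peak(fsb_1066, mts)):.0f} MB/s")
```

Under these assumptions the 1066 MT/s FSB tops out at about 8533 MB/s, the same as dual-channel DDR2-533, while a 1600 MT/s FSB (400 MHz) matches dual-channel DDR2-800 at 12800 MB/s.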

Low Risk, Low Hassle Overclocking

Overclocking, voltage and cooling are closely related. If you overclock a component, it will most likely produce more heat than at stock. That means you have to pay attention to cooling: sometimes adding a fan is enough, sometimes you need more exotic methods such as water cooling or thermoelectric solutions. Raising the voltage adds another problem: more voltage means more watts, and more watts means more heat, which in turn calls for a more effective cooling solution to keep temperatures low (or low enough). For the purpose of this article, we decided NOT to raise any voltage settings, be it processor, memory or chipset. In fact, the Core 2 Duo E6300 we're using actually runs at a lower voltage than specified, mostly to strike a balance between stability and temperature. Without raising the voltage, the overclocks we reached are fairly tame (2.8 GHz with stock air cooling), but they are also hassle-free, easy and less risky than if we had raised any voltages. No doubt we could have reached higher clocks with more voltage and better cooling, but that's not what we're interested in for this article.
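The "more voltage, more watts" point follows the usual first-order model for CMOS dynamic power, P ≈ C·V²·f. A minimal Python sketch, assuming capacitance and switching activity stay fixed, ignoring leakage, and assuming the 2.8 GHz overclock is the E6300's 7x multiplier at a 400 MHz FSB:

```python
MULTIPLIER = 7  # E6300 default multiplier (7 x 266 MHz at stock)

def core_clock_mhz(fsb_mhz, multiplier=MULTIPLIER):
    """Core clock = multiplier x FSB clock."""
    return multiplier * fsb_mhz

def relative_dynamic_power(f_ratio, v_ratio):
    """First-order CMOS model: dynamic power ~ C * V^2 * f.
    Capacitance and activity assumed unchanged; leakage ignored."""
    return v_ratio ** 2 * f_ratio

stock = core_clock_mhz(800 / 3)    # ~1866.7 MHz
overclocked = core_clock_mhz(400)  # 2800 MHz
f_ratio = overclocked / stock      # ~1.5
print(f"{f_ratio:.2f}x clock, same voltage: "
      f"{relative_dynamic_power(f_ratio, 1.0):.2f}x dynamic power")
print(f"same clock, +10% voltage: "
      f"{relative_dynamic_power(1.0, 1.1):.2f}x dynamic power")
```

Under this model the 1.5x clock increase at unchanged voltage costs roughly 1.5x dynamic power, while a 10 percent voltage increase alone costs about 21 percent more, because voltage enters squared.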

Preliminary Tests

Since we're examining very specific areas, we need very isolated tests, which means resorting to synthetic benchmarks. We're using two: SuperPi 1.1 (unmodified) and Sciencemark 2.0's Membench. SuperPi is very sensitive to processor speed and memory bandwidth, which makes it ideal for preliminary tests. Although it's a very good benchmark, SuperPi results are influenced by several components: the processor (clock) and memory (timings and bandwidth). That's why we need a second benchmark. Sciencemark's Membench results give us a much clearer view of bandwidth and latency, so we can determine which one has the bigger influence on performance. Keep in mind that these are only preliminary test results, which we hope will help us find the combination of FSB and memory settings that offers the best increase in performance.
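Memory latency tests of this kind are commonly built on dependent loads ("pointer chasing"): each load's address comes from the previous load, so the accesses cannot overlap and the time per hop approximates latency. We don't know Membench's exact internals, so treat this Python sketch as an illustration of the general access pattern at a given stride, not of Sciencemark's code. In Python the timing is dominated by interpreter overhead; real tools do this in C or assembly, and many randomize the chain so hardware prefetchers cannot predict it.

```python
import time

def build_stride_chain(buffer_bytes, stride_bytes, word_bytes=4):
    """Next-index table: slot i points stride_bytes further on, wrapping at the end."""
    n = buffer_bytes // word_bytes
    step = stride_bytes // word_bytes
    return [(i + step) % n for i in range(n)]

def chase(chain, hops, start=0):
    """Dependent loads: each index is read from the previous slot."""
    i = start
    for _ in range(hops):
        i = chain[i]
    return i

def ns_per_hop(chain, hops):
    """Average time per hop. In Python this mostly measures the interpreter."""
    t0 = time.perf_counter_ns()
    chase(chain, hops)
    return (time.perf_counter_ns() - t0) / hops

chain = build_stride_chain(64, 16)  # 16 four-byte slots, 16-byte stride
print(chase(chain, 4))              # back at slot 0 after 4 hops
```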

Here is the SPD data from the A-DATA Vitesta modules.


Synchronous vs Asynchronous Mode

First off, let's look at the asynchronous test results from Sciencemark's Membench. We kept the processor at its default clock of 7 x 266 MHz (about 1866 MHz), then ran the tests under three settings: memory at 533 MHz (synchronous), and at 667 and 800 MHz (both asynchronous). For this test, we let the motherboard apply the SPD timing values, which means memory timings differ for each memory clock.


Memory              Sync 533 MHz   Async 667 MHz   Async 800 MHz
FSB                 266 MHz        266 MHz         266 MHz
Bandwidth (MB/s)    4609.8         5016.16         5079
Latency (cycles)
  4-byte stride     3              3               3
  16-byte stride    7              7               6
  64-byte stride    28             26              25
  256-byte stride   99             89              83
  512-byte stride   112            102             98


Change vs. Sync 533 MHz   Async 667 MHz   Async 800 MHz
Bandwidth                 +8.82%          +10.18%
Latency (cycles)
  4-byte stride           0               0
  16-byte stride          0               -1
  64-byte stride          -2              -3
  256-byte stride         -10             -16
  512-byte stride         -10             -14
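The comparison rows are simple relative changes for bandwidth and cycle differences for latency. This small Python sketch (helper names are illustrative) recomputes them from the raw FSB 266 MHz figures; the same two functions reproduce the other comparison tables in this section.

```python
def pct_change(new, base):
    """Relative change in percent."""
    return (new - base) / base * 100

def cycle_deltas(new, base):
    """Per-stride latency difference in cycles (negative = fewer cycles)."""
    return [n - b for n, b in zip(new, base)]

# FSB 266 MHz results; latencies for 4/16/64/256/512-byte strides.
sync_533 = {"bw": 4609.8, "lat": [3, 7, 28, 99, 112]}
async_667 = {"bw": 5016.16, "lat": [3, 7, 26, 89, 102]}
async_800 = {"bw": 5079.0, "lat": [3, 6, 25, 83, 98]}

for name, run in (("Async 667", async_667), ("Async 800", async_800)):
    print(f"{name}: {pct_change(run['bw'], sync_533['bw']):+.2f}% bandwidth, "
          f"latency deltas {cycle_deltas(run['lat'], sync_533['lat'])}")
```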

Next, we raise the FSB to 333 MHz, which is supposedly the new official FSB for newer Core 2 based Intel processors and Intel's Bearlake chipsets.


Memory              Sync 667 MHz   Async 833 MHz
FSB                 333 MHz        333 MHz
Bandwidth (MB/s)    5295.66        5649.92
Latency (cycles)
  4-byte stride     3              3
  16-byte stride    8              8
  64-byte stride    34             31
  256-byte stride   120            104
  512-byte stride   138            122


Change vs. Sync 667 MHz   Async 833 MHz
Bandwidth                 +6.69%
Latency (cycles)
  4-byte stride           0
  16-byte stride          0
  64-byte stride          -3
  256-byte stride         -16
  512-byte stride         -16

Notice that bandwidth gains with asynchronous memory settings are relatively small (mostly below 10 percent, or around 400 MB/s), even with slightly lower latencies. Below you can see the results of the same test, this time using synchronous settings. Keep in mind that this means raising the FSB to keep it in sync with the memory speed (FSB 266 with memory at 533 MHz, FSB 333 with 667 MHz, and FSB 400 with 800 MHz).


Memory              Sync 533 MHz   Sync 667 MHz   Sync 800 MHz
FSB                 266 MHz        333 MHz        400 MHz
Bandwidth (MB/s)    4609.8         5295.66        6262.48
Latency (cycles)
  4-byte stride     3              3              3
  16-byte stride    7              8              9
  64-byte stride    28             34             35
  256-byte stride   99             120            125
  512-byte stride   112            138            143


Change vs. Sync 533 MHz   Sync 667 MHz   Sync 800 MHz
Bandwidth                 +14.88%        +35.85%
Latency (cycles)
  4-byte stride           0              0
  16-byte stride          +1             +2
  64-byte stride          +6             +7
  256-byte stride         +21            +26
  512-byte stride         +26            +31

Notice how things have changed: bandwidth gains are significant and much more noticeable (about 15 percent at 667 MHz and 36 percent at 800 MHz). Assuming memory clocks and timings are the same between synchronous and asynchronous settings, the extra bandwidth has to come from somewhere else; we believe it comes from the inner workings of the chipset, or rather the memory controller. The table below gives a clearer comparison.


Memory              Async 667 MHz   Sync 667 MHz   Async 800 MHz   Sync 800 MHz
FSB                 266 MHz         333 MHz        266 MHz         400 MHz
Bandwidth (MB/s)    5016.16         5295.66        5079            6262.48
Latency (cycles)
  4-byte stride     3               3              3               3
  16-byte stride    7               8              6               9
  64-byte stride    26              34             25              35
  256-byte stride   89              120            83              125
  512-byte stride   102             138            98              143


Sync vs. async            667 MHz   800 MHz
Bandwidth                 +5.57%    +23.30%
Latency (cycles)
  4-byte stride           0         0
  16-byte stride          +1        +3
  64-byte stride          +8        +10
  256-byte stride         +31       +42
  512-byte stride         +36       +45

Just for confirmation, let's take a peek at CPU-Z's memory timing dump.

[Screenshots: CPU-Z memory timings at 667 MHz and 800 MHz]

That single CAS latency difference may account for part of the gap between synchronous and asynchronous 667 MHz, but at 800 MHz the timings are identical, so they cannot explain the latency gap there. Conclusion: apart from that one CAS latency difference, memory timings are the same between synchronous and asynchronous settings. Evidently, Intel chipsets, in this case the Intel P965, are still designed with synchronous rather than asynchronous memory settings in mind, just like their predecessors. Wait, you might say, aren't these results caused by raising the processor's FSB? Remember, though, that we're not looking at performance numbers here, but at memory subsystem bandwidth and latencies.

Here's an explanation:

By raising the FSB, we raise the clock at which the chipset operates. If internal chipset / memory controller latencies and memory timings were constant, we should have seen less time spent on idle cycles. We didn't. Instead, latencies were higher in synchronous mode than in asynchronous mode at the same memory clock (and timings). On top of that, higher FSBs showed higher latencies than lower FSBs (266 MHz vs 333 MHz vs 400 MHz), and these are latencies inside the chipset / memory controller, not on the memory modules. Later we will see why this is important.
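Membench reports these latencies in cycles. Assuming those are CPU core cycles (the 3-cycle result at the 4-byte stride matches the Core 2's L1 cache latency) and that the multiplier stayed at 7x for every run, the core clock rises with the FSB, so the same absolute delay shows up as more cycles at a higher FSB. Converting to nanoseconds is one way to compare runs made at different core clocks. A rough Python sketch using the 512-byte stride figures from the tables above:

```python
from fractions import Fraction

MULTIPLIER = 7  # assumption: the 7x multiplier was unchanged for all runs

# Exact FSB clocks: "266" = 800/3 MHz, "333" = 1000/3 MHz.
FSB = {266: Fraction(800, 3), 333: Fraction(1000, 3), 400: Fraction(400)}

def cycles_to_ns(cycles, fsb_mhz):
    """Latency in CPU core cycles -> nanoseconds (1000 / MHz = ns per cycle)."""
    return cycles * 1000 / (MULTIPLIER * fsb_mhz)

# (label, 512-byte stride latency in cycles, FSB key)
runs = [
    ("Async 667", 102, 266),
    ("Sync 667", 138, 333),
    ("Async 800", 98, 266),
    ("Sync 800", 143, 400),
]
for label, cycles, fsb in runs:
    print(f"{label}: {float(cycles_to_ns(cycles, FSB[fsb])):.1f} ns")
```

Under those assumptions it prints about 54.6, 59.1, 52.5 and 51.1 ns for Async 667, Sync 667, Async 800 and Sync 800, respectively.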
