Saturday, June 27, 2009

Debunking Turbo Boost FUD

Some self-proclaimed technical professionals from AM(FU)Dzone are spreading FUD, again, about Intel Turbo Boost. Some raise intelligent doubts about it, while others raise pure FUD, especially the moderator I was having fun with in my last post :) So, what am I trying to do here? Explain what Turbo Boost really is while having some fun at their expense, again, of course! :)

The Intel Turbo Boost technology white paper is available from http://download.intel.com/design/processor/applnots/320354.pdf , and all my explanations are based solely on this document. Why bother repeating what has already been documented? Because some questions about Turbo mode can only be answered by pulling together points from various places in it, and may need some minimal understanding of the platform architecture and its terms.

Intel® Turbo Boost technology automatically allows processor cores to run faster than the base operating frequency if the processor is operating below its rated power, temperature, and current specification limits. Put another way, it uses the available headroom to operate at a higher frequency. Where does this headroom come from (not talking about possible down-binning to fulfill demand, nor the manufacturing guardband)? Different software exercises the CPU differently, and most software running at the base operating frequency leaves some headroom in the rated power/temperature/current. On top of that, if the software does not take up all cores, some cores will go into an idle/inactive state (C3 or deeper for Nehalem), and this creates even more headroom for the boost. Let's look at this FUD from the FUDZone:

AMDZone.com • View topic - AMD's Magny Cours Architecture revealed: "So in order to accelerate single-threaded performance, ALL CORES are dissipating more power" Clearly this guy either does not know what he is talking about or is spreading FUD on purpose. Per the white paper's description, for software to trigger the single-threaded boost, the other cores have to be in an inactive state in the first place, so how are they going to dissipate more power? :) A core in C3 consumes very little power, and a core in C6 even less.
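To make the headroom idea concrete, here is a minimal sketch of the decision as I read it from the white paper: step up only while all three of power, temperature, and current are below their limits, and step back down toward the base frequency otherwise. The names (`Limits`, `Reading`, `next_frequency_mhz`), the one-step-at-a-time policy, and every number in the usage note are my own illustrative assumptions, not Intel's actual control algorithm.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    """Rated limits of the part (hypothetical values are supplied by the caller)."""
    power_w: float
    temp_c: float
    current_a: float


@dataclass
class Reading:
    """One sample of the measured operating conditions."""
    power_w: float
    temp_c: float
    current_a: float


def has_headroom(reading: Reading, limits: Limits) -> bool:
    # Headroom exists only if ALL three quantities are below their limits.
    return (
        reading.power_w < limits.power_w
        and reading.temp_c < limits.temp_c
        and reading.current_a < limits.current_a
    )


def next_frequency_mhz(current_mhz: int, base_mhz: int, max_turbo_mhz: int,
                       step_mhz: int, reading: Reading, limits: Limits) -> int:
    # Assumed policy: move one step at a time, never above the turbo ceiling
    # and never below the base operating frequency.
    if has_headroom(reading, limits):
        return min(current_mhz + step_mhz, max_turbo_mhz)
    return max(current_mhz - step_mhz, base_mhz)
```

For example, with made-up limits `Limits(130, 100, 145)`, a reading of `Reading(90, 70, 100)` lets a core at 2660 MHz step up to 2793 MHz, while a reading of `Reading(135, 70, 100)` sends a core at 2793 MHz back to 2660 MHz. Nothing in this loop ever runs the part above its rated limits.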

AMDZone.com • View topic - AMD's Magny Cours Architecture revealed: "So although CPU may have thermal detector to throttle its own clock rate, but the attempts to exceed TDP will affect the whole system as the extra heat must be dissipated to the environment. I believe this is why Turbo mode is turned off in many/most IT and datacenter environments." More FUD here :) How is Turbo Boost exceeding the TDP? By definition it obeys the TDP (it uses the power/temperature/current headroom). And wow, this guy 'knows' for a fact that Turbo mode is turned off in many/most IT and datacenter environments!!

AMDZone.com • View topic - AMD's Magny Cours Architecture revealed: "A clean implementation of such idea should take TDP restriction and software controllability into account. The latter is favorable to server/worktation environment so a system reboot is not needed to enable/disable the feature. In addition, it might also be preferable to implement better thread/process affinity to cores, so the running threads/processes won't jump across different cores excessively." The first part is especially funny when compared to the second quote later. As for software control to enable/disable the feature without a reboot: without touching on any internal information, this could be implemented today with ACPI (hint: read the spec on the P-state related methods), but I do not think any OEM does that, as there is little reason to allow it at runtime anyway. Now let's have fun with the first part:

http://www.amdzone.com/phpbb3/viewtopic.php?p=160965#p160965: "Lets take a slightly different point of view and ask what would it take for AMD to implement "Turbo Mode"? What mechanisms are missing?
...
For AMD to implement something like the Turbo mode, two extra things are needed. (1) An on-die thermal sensor that accurately detects core temperature. (2) The "reverse" CnQ driver that set target P-state to ones that are above the default max. According to AMD's latest revision guide, the thermal sensor problem in Phenom is already fixed in the Phenom II revision, so requirement 1 is met. What is needed is then BIOS/driver support for one or two extra P-states above the current default maximum.

Implemented correctly, it should be as reliable as the CnQ (probably more reliable than Intel's Turbo Mode). I suppose AOD/3 is already doing something similar. We know how good AMD's CnQ is, and how easy it is to active/support it. I really don't see such "self-overclocking" is much a big deal."
Per what he described, AMD would just need to measure the temperature (while Intel measuring power/temperature/current, which clearly correlates with the TDP by the way, is somehow not enough :))

Enough talking about the FUD; let's look at the other intelligent/unintelligent doubts there (at least not totally ill-intentioned).

AMDZone.com • View topic - Valencia and 16-core Interlagos are based on Bulldozer!: "This feature is really ONLY about the benchmarks. Having a chip clocked at it's highest and then underclocking is a much better design."

http://www.amdzone.com/phpbb3/viewtopic.php?p=160964#p160964 : "WHY would they bother?

FIRST:

Actually I see TurboBoost as being "kind of" dishonest. If the chip can run faster and has passed testing and validation at the higher speed, then why isn't it just clocked higher? It is easy to see that a much more eloquent and cleaner design would be to go ahead clock it at the fastest it can be tested and validated at and then put in a mechanism to downclock cores independently if they are not needed. OH WAIT WE ALREADY HAVE THAT.

Thus: With the availability of dynamic underclocking the only reason for TurboBoost to exist is as a marketing gimmick. It might fool some people, and cause others to ignore the truth, but in reality it isn't that useful.

This is all about binning. For most of the shipped products, I do not think the CPU in question could run at the higher frequency while staying within its TDP category and reliability period. Besides, right now in the market it is fused as a 2/1/1/1 frequency bin boost; in the future there could be W/X/Y/Z, with W much bigger than Z. Such a chip definitely cannot be marked with frequency W as its base operating frequency. It cannot be marked with frequency Z either, if you understand that some software exercises the CPU more than other software. Turbo is useful in that 'other software' environment, without breaking the TDP category.
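A small worked example of what a 2/1/1/1 fuse means in frequency terms. I am assuming here that one bin is one 133 MHz bus-clock step on Nehalem, that the first number applies when only one core is active and the last when all four are, and that the base frequency is 2660 MHz; the function name `max_turbo_mhz` and these values are mine, purely for illustration.

```python
BIN_MHZ = 133  # assumption: one turbo bin equals one ~133 MHz bus-clock step


def max_turbo_mhz(base_mhz: int, bins_by_active_cores: tuple, active_cores: int) -> int:
    """Highest frequency allowed for a given number of active cores.

    bins_by_active_cores[0] is the bin count with one active core,
    bins_by_active_cores[-1] the bin count with every core active.
    """
    if not 1 <= active_cores <= len(bins_by_active_cores):
        raise ValueError("active_cores is out of range for this part")
    return base_mhz + bins_by_active_cores[active_cores - 1] * BIN_MHZ


# A hypothetical 2660 MHz part fused as 2/1/1/1:
for cores in range(1, 5):
    print(cores, "active core(s):", max_turbo_mhz(2660, (2, 1, 1, 1), cores), "MHz")
```

Under these assumptions the single-core ceiling is 2926 MHz and the all-core ceiling is 2793 MHz. Neither is guaranteed for every workload, which is why neither number can be printed on the box as the base operating frequency.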

SECOND:

Something even better than TurboBoost would be to allow the system to overclock differently for various specific applications. The user should be able to specify how much to overclock and set other parameters such as voltage. OH WAIT WE ALREADY HAVE THAT. But of course since it's not in microcode or in the bios then many people don't consider that "allowed" for benchmarking. And these same people will adamantly insist that TB be allowed for benchmarking because "real system performance" is more important than clock per clock comparisons. But that same argument can used to defend TB also applies to using tools such as AOD. (Personally I think BOTH should not be used for comparative benchmarking. Comparative means we want to KNOW how they compare. Dynamic features only confuse the issue.)

I am not too sure why one could claim this, as the hardware-assisted TB can help 'OC' safely by measuring the CPU headroom, while the recommended approach actually relies on user settings for each application, which might or might not work properly. I do like the AOD profile ability for certain usages, which I won't describe here :)

THIRD:

And one of the most relevant points for SERVERS: Most experienced administrators do NOT want added complexity in their systems. Period. Especially when it only adds another point of failure. And as others have mentioned: The small amount of "the best performance at all costs" people don't make up a large enough demographic to spend time and money on."


I have no experience in server administration, so I won't comment much on this, but I believe this is a lame excuse: plenty of other features add far more complexity to the system than Turbo mode, which merely changes frequency dynamically, something already implemented in the reverse direction (SpeedStep).

OK, enough of the Q&A and fun :)
