Wednesday, September 09, 2009

How to make Turbo Boost work under Linux

I just saw a funny accusation that Intel's Turbo Boost is broken and does not work in Linux ... hence one needs to turn off Turbo Boost when running benchmarks ... plus some wishes that AMD's implementation would 'fix' that bug ... and of course, this is from the AMFUDZone :)

Well, here are some short, simple rules for any OS to run Turbo Boost:
  1. The said OS must support ACPI
  2. The BIOS in use must support ACPI
  3. The EIST and Turbo Boost BIOS options must be turned on
  4. Cstate support should be enabled
  5. The said OS must turn on its power management features, for both Pstates and Cstates
  6. Among the Pstate entries, P1 should correspond to the chip's default frequency

Here are the brief explanations.

Turbo Boost is entered through Pstate-0. Thus the system (BIOS and OS) must support ACPI and have those options turned on. Turbo Boost is guarded by thermal and power headroom, so enabling (deeper) Cstates helps the CPU run at a higher frequency, because headroom is more likely to be available.

The sixth requirement is not quite obvious. I have personally seen a Linux variant trip a kernel debug check when this is not fulfilled.
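A rough sketch, in Python, of how one might check these rules on a Linux box. It assumes the usual cpufreq/cpuidle sysfs layout under /sys/devices/system/cpu and a driver such as acpi-cpufreq that exposes scaling_available_frequencies. The function names are mine, and the "+1 MHz" test is only a heuristic based on the common convention of listing the Turbo P0 entry 1 MHz above P1; none of that comes from this post.

```python
from pathlib import Path

# Assumed sysfs location for the first logical CPU.
CPU0 = Path("/sys/devices/system/cpu/cpu0")


def read_text(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def looks_like_turbo_table(freqs_khz):
    """Heuristic: the top entry (P0) sits exactly 1 MHz above the next (P1)."""
    ordered = sorted(set(freqs_khz), reverse=True)
    return len(ordered) >= 2 and ordered[0] - ordered[1] == 1000


def report(cpu=CPU0):
    """Collect the Pstate and Cstate facts the rules above depend on."""
    raw = read_text(cpu / "cpufreq" / "scaling_available_frequencies") or ""
    freqs = [int(tok) for tok in raw.split()]
    cstates = [read_text(p / "name") for p in sorted(cpu.glob("cpuidle/state*"))]
    return {
        "driver": read_text(cpu / "cpufreq" / "scaling_driver"),
        "governor": read_text(cpu / "cpufreq" / "scaling_governor"),
        "frequencies_khz": freqs,
        "turbo_entry_present": looks_like_turbo_table(freqs),
        "cstates": cstates,
    }


if __name__ == "__main__":
    for key, value in report().items():
        print(f"{key}: {value}")
```

If the frequency list comes back empty, the driver in use probably does not publish a Pstate table this way, and the heuristic says nothing either way.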

A side note: some people also claim that the CPU runs at a higher frequency unnecessarily. Actually, this is untrue. It again depends on the user's choice of power policy. Take Windows XP for example: if a user chooses the power scheme "Home/Office Desk", the CPU(s) would run at Pstate-0 most of the time (except when entering an enhanced Cstate, where it drops to a lower Pstate before idling), so the CPU would be under Turbo Boost most of the time. But this makes little difference: if the given CPU did not support Turbo Boost, it would be running at its default max frequency with this setting anyway. If one is concerned about this, one could just use the power scheme "Portable/Laptop", like I did, even on a desktop system. Then when you are doing light work, your system just runs at a lower Pstate, and enters Turbo under high load.

Then there is the usual accusation that this wastes power on a server that is idling most of the time ... wait, if one enables Turbo Boost on a server and is concerned about power ... shouldn't one turn on the power-saving policy so that it enters a lower Pstate at low usage ??? :) Anyway, I am not a system admin and am not sure about server power policies, so feel free to correct me, either from your experience or with known data. If you are sharing your viewpoint/guess, please add words such as "think/guess", or put the statement in the form of a question like I did :)

Then there is the even funnier statement that one would buy a CPU with Turbo Boost and then should turn it off, because one wants "Consistent Results", or the claim that "With turbo mode, the additional clock rates is not guaranteed" ... and those are likely the same folks who would turn on their CnQ :)

Saturday, June 27, 2009

Debunking Turbo Boost FUD

Some self-proclaimed technical professionals from AM(FU)Dzone are spreading FUD, again, about Intel Turbo Boost. Some raise intelligent doubts about it, while some raise pure FUD, especially the moderator I was having fun with in my last post :) So, what am I trying to do here? Explain what Turbo Boost really is while having some fun with them, again, of course! :)

Intel's Turbo Boost technology white paper is available from http://download.intel.com/design/processor/applnots/320354.pdf , and all my explanations are based solely on this document. Why bother repeating what has already been documented? Because answering some questions about Turbo mode requires pulling together points from various places in it, and may need some minimal understanding of the platform architecture and terms.

Intel® Turbo Boost technology automatically allows processor cores to run faster than the base operating frequency if the processor is operating below rated power, temperature, and current specification limits. Put another way, it uses available headroom to operate at a higher frequency. Where does this headroom come from (not talking about possible down-binning to fulfill demand, nor the manufacturing guardband)? Different software exercises the CPU differently, and most of it, running at the base operating frequency, leaves some headroom in the rated power/temperature/current. On top of that, if the software does not occupy all cores, some cores will go into an idle/inactive state (C3 and deeper for Nehalem), which creates even more headroom for the boost. Let's look at this FUD from the FUDZone:

AMDZone.com • View topic - AMD's Magny Cours Architecture revealed: "So in order to accelerate single-threaded performance, ALL CORES are dissipating more power" Clearly this guy either does not know what he is talking about or is spreading FUD on purpose. Per the white paper's description, for software to trigger the single-threaded boost, the other cores must be in an inactive state in the first place, so how are they going to dissipate more power? :) When a core is in C3, it consumes very little power, and even less in C6.

AMDZone.com • View topic - AMD's Magny Cours Architecture revealed: "So although CPU may have thermal detector to throttle its own clock rate, but the attempts to exceed TDP will affect the whole system as the extra heat must be dissipated to the environment. I believe this is why Turbo mode is turned off in many/most IT and datacenter environments." More FUD here :) How is Turbo Boost exceeding the TDP here? By definition it obeys the TDP (it uses the power/temperature/current headroom). And wow, this guy 'knows' for a fact that Turbo mode is turned off in many/most IT and datacenter environments!!
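To make the "by definition" point concrete, here is a toy sketch of the gating rule as the white paper words it: boost only while power, temperature, and current are all below their rated limits. The class name, the function name, and every number are made up for illustration; the real decision is made inside the processor, not by code like this.

```python
from dataclasses import dataclass


@dataclass
class Envelope:
    """One set of power / temperature / current values (hypothetical units)."""
    power_w: float
    temp_c: float
    current_a: float


def may_boost(now, limits):
    """True only when all three readings are below their rated limits."""
    return (now.power_w < limits.power_w
            and now.temp_c < limits.temp_c
            and now.current_a < limits.current_a)


# Made-up limits and readings, purely for illustration.
rated = Envelope(power_w=130.0, temp_c=68.0, current_a=145.0)
light_load = Envelope(power_w=85.0, temp_c=55.0, current_a=90.0)
heavy_load = Envelope(power_w=130.0, temp_c=66.0, current_a=140.0)

print(may_boost(light_load, rated))  # headroom on all three
print(may_boost(heavy_load, rated))  # power already at the limit
```

With a rule of this shape, the boost stops as soon as any one limit is reached, which is why it cannot be a mechanism for exceeding the TDP.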

AMDZone.com • View topic - AMD's Magny Cours Architecture revealed: "A clean implementation of such idea should take TDP restriction and software controllability into account. The latter is favorable to server/worktation environment so a system reboot is not needed to enable/disable the feature. In addition, it might also be preferable to implement better thread/process affinity to cores, so the running threads/processes won't jump across different cores excessively." The first part is especially funny when compared with the second quote below. As for software control to enable/disable the feature without a reboot and without touching any internal information, this could be implemented today with ACPI (hint: read the spec on the Pstate-related methods), but I do not think any OEM is doing that, since why allow that at runtime anyway? Now let's have fun with the first part:

http://www.amdzone.com/phpbb3/viewtopic.php?p=160965#p160965: "Lets take a slightly different point of view and ask what would it take for AMD to implement "Turbo Mode"? What mechanisms are missing?
...
For AMD to implement something like the Turbo mode, two extra things are needed. (1) An on-die thermal sensor that accurately detects core temperature. (2) The "reverse" CnQ driver that set target P-state to ones that are above the default max. According to AMD's latest revision guide, the thermal sensor problem in Phenom is already fixed in the Phenom II revision, so requirement 1 is met. What is needed is then BIOS/driver support for one or two extra P-states above the current default maximum.

Implemented correctly, it should be as reliable as the CnQ (probably more reliable than Intel's Turbo Mode). I suppose AOD/3 is already doing something similar. We know how good AMD's CnQ is, and how easy it is to active/support it. I really don't see such "self-overclocking" is much a big deal."
Per what he described, AMD just needs to measure the temperature (while Intel's measuring of power/temperature/current, which clearly correlates with the TDP btw, is not enough :)).

Enough talking about the FUD; let's look at the other intelligent/unintelligent doubts there (at least not totally ill-intentioned).

AMDZone.com • View topic - Valencia and 16-core Interlagos are based on Bulldozer!: "This feature is really ONLY about the benchmarks. Having a chip clocked at it's highest and then underclocking is a much better design."

http://www.amdzone.com/phpbb3/viewtopic.php?p=160964#p160964 : "WHY would they bother?

FIRST:

Actually I see TurboBoost as being "kind of" dishonest. If the chip can run faster and has passed testing and validation at the higher speed, then why isn't it just clocked higher? It is easy to see that a much more eloquent and cleaner design would be to go ahead clock it at the fastest it can be tested and validated at and then put in a mechanism to downclock cores independently if they are not needed. OH WAIT WE ALREADY HAVE THAT.

Thus: With the availability of dynamic underclocking the only reason for TurboBoost to exist is as a marketing gimmick. It might fool some people, and cause others to ignore the truth, but in reality it isn't that useful.

This is all about binning. For most of the shipped products, I do not think that the said CPU could run at the higher frequency while maintaining its TDP category and reliability period. Besides, right now in the market it is fused as a 2/1/1/1 frequency bin boost; in the future there could be W/X/Y/Z, with W much bigger than Z. Such a chip definitely cannot be marked with frequency W as its base operating frequency. It cannot be marked with frequency Z either, if you understand that some software exercises the CPU more than other software. Turbo is useful in that 'other software' environment without breaking the TDP category.
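For a feel of what a 2/1/1/1 fuse setting means in numbers, here is a small sketch. It assumes one bin is one step of the 133.33 MHz Nehalem base clock, and that the first figure applies when a single core is active and the last when all four are; the function name and the 2.66 GHz example part are mine, not from the white paper.

```python
# Assumed size of one frequency bin: one step of the Nehalem base clock.
BIN_MHZ = 133.33


def max_turbo_mhz(base_mhz, active_cores, bins=(2, 1, 1, 1)):
    """Upper bound on the boosted frequency for a given number of active cores.

    bins[i] is the number of bins allowed when i + 1 cores are active
    (assumed ordering). Whether the boost actually happens still depends
    on the power/temperature/current headroom at that moment.
    """
    if not 1 <= active_cores <= len(bins):
        raise ValueError("active_cores must be between 1 and len(bins)")
    return base_mhz + bins[active_cores - 1] * BIN_MHZ


for cores in range(1, 5):
    print(cores, round(max_turbo_mhz(2666.67, cores)))
```

The same function with a hypothetical W/X/Y/Z tuple shows why neither W nor Z can be the marked frequency: the result is a ceiling that depends on how many cores are busy, not a speed the chip is guaranteed to hold.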

SECOND:

Something even better than TurboBoost would be to allow the system to overclock differently for various specific applications. The user should be able to specify how much to overclock and set other parameters such as voltage. OH WAIT WE ALREADY HAVE THAT. But of course since it's not in microcode or in the bios then many people don't consider that "allowed" for benchmarking. And these same people will adamantly insist that TB be allowed for benchmarking because "real system performance" is more important than clock per clock comparisons. But that same argument can used to defend TB also applies to using tools such as AOD. (Personally I think BOTH should not be used for comparative benchmarking. Comparative means we want to KNOW how they compare. Dynamic features only confuse the issue.)

I am not too sure how one could claim this, as the hardware-assisted TB can 'OC' safely by measuring the CPU headroom, while the recommended approach is actually a user setting for each application, which might or might not work properly. I do like the AOD profile ability for certain usages, which I won't describe here :)

THIRD:

And one of the most relevant points for SERVERS: Most experienced administrators do NOT want added complexity in their systems. Period. Especially when it only adds another point of failure. And as others have mentioned: The small amount of "the best performance at all costs" people don't make up a large enough demographic to spend time and money on."


I have no experience in server administration and thus I won't comment much on this, but I believe this is a lame excuse, because many other features add more complexity to the system than Turbo mode, which merely changes the frequency dynamically, something that has long been implemented in the reverse direction (SpeedStep).

Ok, enough of the Q&A and fun:).

Thursday, July 03, 2008

Fanboism

It has been quite a while since my last post. It is actually more than a year! I have been commenting mostly in Roborat64's blog instead of working on any of my own postings. I also once tried posting in AMDzone under the name p4nee (it has a special meaning in Chinese :)), teasing them about their double standards, but got banned at the 7th post, because I returned the same word to someone who had used it on me. I have stopped posting since. Why should I let those fanbois control my ability to post?

Nevertheless, I still visit that site for some jokes (believe me, they are! :)), and once in a while Scientia's blog too, which is getting fewer commenters now. Some of the jokes there are just too outstanding, and thus I have decided to keep them here. Below is the first one, and there will be more to come when I have time to dig out some of their older threads or find some new ones. Enjoy! :)

From AMDZone

by abinstein on Thu Jul 03, 2008 2:00 am

Maybe I'm just naive but I don't think nVidia's problem is AMD's gain. AMD's #1 enemy is Intel, which anyone with a clear mind knows that it plays leaps and bounds dirtier than nVidia or any company on earth. If nVidia becomes weaker, it will bow lower to Intel's monopoly force, which in the end hurts AMD and the whole industry.

X86 is an instruction that should've been gone long ago but got life-supported by Intel's monopoly tactics. Now Intel's trying to put x86 into graphics? Please guys, if not for x86, with the same engineering effort the industry has put into PC, we could've been running 4-5GHz Power6-like CPUs on our desktops! Lets still hope nVidia and its GPGPU gets enough momentum to stop Larrabee.

by abinstein on Thu Jul 03, 2008 4:06 am
Woofermazing wrote:My memory is pretty vague, but wasn't Intel planning on having Itanium filter down to the desktop, and then AMD foiled that with the Athlon 64?

I believe AMD supported x86 for a good reason: they got a very good implementation (at that time) of the ISA, K7, which runs faster and scales better than any other implementation including Intel's P6 at that time. With any other instruction set (Power, MISP, EPIC, ...) AMD would've been non-competitive at all.

...

Monday, February 26, 2007

PCI, Torrenza, CELL, Network Processor and Fusion

This won't be a lengthy article, as I don't usually spend a long time writing a blog :)

Torrenza is a good technology, but it is not a totally new one. Every function that it tries to provide exists today. Co-processing exists on all sorts of interfaces, with PCI being the most common. Such co-processing does not require low-latency access by/to the CPU; bandwidth is the single biggest factor. Torrenza adds low latency to the picture, which will indeed create a new frontier of co-processing for applications requiring low latency at the system level. However, while it claims to be open, it is not as 'open' as PCI and all its derivatives. PCI specs are easily available in full detail, and the entire technology is guaranteed to be free. Geneseo will fill the gap for low-latency co-processor interfacing and will see much wider adoption, even though it is late to the game. Today's economy is an economy of scale. Whoever's technology is the most open, the most used, and has the most industry backing will win.

While all of these try to provide more co-processing power at the system level, chip designs with internal dynamic co-processing already exist or soon will. CELL is perhaps the best-known example, where the co-processing can be programmed dynamically. I have a strong feeling that the concept came from IBM's own network processor, a business IBM exited a few months before CELL launched. Intel still sells its IXP range of network processors.

Fusion, to me, is just a funky name used for marketing purposes. It is another level down: no dynamic co-processing at runtime, only at design/manufacturing time. It gives AMD a jigsaw-puzzle ability to pick and match components within a piece of silicon. Anyway, this technology is not new at all; a single piece of silicon today can already be packaged into multiple chip SKUs by fusing (or by a derivative design of) some of its components. All AMD plans to do here is add a GPU (and of course some changes to make it homogeneous system-wise). AMD is smart enough to target the mobile platform first, where power is its key strength.

After internal dynamic co-processing, maybe the next step is an FPGA within a chip, which would provide an even more dynamic co-processing nature.




Thursday, February 22, 2007

Barcelona's code name is not K10

So many people have been trying to guess the correct code name for AMD's coming Barcelona processor. INQ (and later those fanbois Sharikou, Scientia and the like) wished to call it K10. Well, it is not K10.

Guess what, K10 is still under development, so it can't be K10. K9 is just a bad name.

Again, Barcelona's code name 'is not' K10, period.

p/s: K8L is more likely referring to Barcelona

Wednesday, December 13, 2006

The Secret of MegaTasking - Revealed!

Just had a chance to talk to Henry from the A company, to understand the funky term - MegaTasking ...

Q: What is multitasking?
A: The running of two or more programs in one computer at the same time.

Q: Then what is the difference between MegaTasking and multitasking?
A: MegaTasking is similar in sense to multitasking, but it is more than just computing.

Q: Can you elaborate further?
A: Sure! MegaTasking is about convergence. While our competitor is/was talking about computing, networking and communication convergence, we aim much higher. It converges almost everything in your daily life. For what you do in the living room, kitchen, laundry, neighbor, society, lan party, (342 words omitted). With our sophisticated design, advanced process, brilliant individuals, smart executives, enormous fanbois base, (1178 words omitted) ...

Q: So ...? What does it have to do with the living room? Are you talking about your 'Live' stuff?
A: Nope, more than that.

Q: Is that so?
A: Yup, the moment you turn on the MegaTasking in a living room, you got a PC and heater.

Q: What if it is summer?
A: Err ... you just turn your living room into a Sauna Spa.

Q: ... then the kitchen?
A: With a proper casing, you just got yourself an oven. You can look at the screen for the recipe and bake the cake at the same time!

Q: ... ... then the laundry?
A: Just put your wet clothes close to the fan, it will instantly dry. Better than most of the commercially available clothes drier!

Q: ... ... ... then the neighbor?
A: What's more fun than directing the noise at your stupid neighbor who uses our competitor's product? I'm sure our fanbois base would love this.

Q: ... ... ... ... and anything else?
A: What's even more fun than 'legally' disturbing your opponent with noise and heat in a lan party game competition?

Q: ... ... ... ... ... and some more?
A: MegaTasking is an innovation for someone innovative. Think about it for a few minutes, I'm sure you can list out more than what I have said.

Q: ... ok, anything else to say?
A: Yup, our MegaTasking is fast.

Q: How fast?
A: It can consume 1 MegaWatt-hour in just 41 days, considering full-day usage. Our competitor is not even close to that.

Q: What about the computing speed?
A: Sorry, I gotta take a whiz ... bye.

Friday, December 08, 2006

Another Joke of the Day

Quote from http://www.newsfactor.com/story.xhtml?story_id=013001BYD5Z4
In making the announcement, AMD executives said that even at 90 nanometers and 90 watts, its chips, on average, consume half the power of an Intel Core 2 Duo. But Athlon's power consumption will drop even further, AMD said, with the 65-nm chips that will run at an average 65 watts.

Wow, is the AMD executive hiring criterion the ability to lie without blinking an eye? :)