Thursday, February 21, 2008


Isaiah revealed: VIA's new low-power architecture
And then there were three

So much of the PC world's coverage is focused on the horse race between Intel and its archrival AMD that we often forget about the other x86 processor company out there, the one that's still well-known among the crowd of tweakers, hackers, and enthusiasts who build their own home firewall boxes and in-car PCs. I'm talking, of course, about VIA, maker of the low-power, low-cost, and also relatively low-performance x86 processors at the heart of many special-purpose DIY boxes. VIA's processors, designed by the company's Centaur subsidiary, focus on keeping costs and power down at the expense of performance.

VIA's newly launched processor architecture, known for the last three years by its codename, "Isaiah," will keep the company's focus on cost and power intact while taking things in a substantially different direction. In short, this year will see something truly odd happen on the low end of the x86 market: VIA and Intel will, architecturally speaking, switch places. Intel will take a giant step down the power/performance ladder with the debut of Silverthorne/Diamondville, its first in-order x86 processor design since the original Pentium, while VIA will attempt to move up into Intel's territory with Isaiah, its first-ever out-of-order, fully buzzword-compliant design.

In this brief article, I'll give an overview of Isaiah and of what VIA hopes to accomplish with this new design. Most of the high-level details of Isaiah have been known since at least 2004, when VIA began publicizing the forthcoming processor's general feature list (i.e., 64-bit support, out-of-order execution, vector processing, memory disambiguation, and others). So I'll focus here on a recap of those features and on a broader look at the market that VIA is headed into.

Introducing Isaiah

                                                                                              

The Isaiah processor, which was first unveiled at the Fall Processor Forum in 2004, will start shipping in the spring of this year. The new-from-the-ground-up processor is fabricated on an unnamed 65nm process (VIA isn't ready to reveal who its foundry is), and at some point a year or more out it will shift to 45nm. As is typical of VIA, the company will use process shrinks to gain cost and power advantages, not to increase performance by ramping up clockspeed. "Good enough" performance is the goal, and now that the company has made the leap to out-of-order execution (see below), it can focus on maturing the basic Isaiah design by eliminating bottlenecks with each core revision.
  Die plot of the Isaiah

(Note: Much of the language in VIA's architecture white paper stresses that this or that feature is characteristic of the "initial" Isaiah architecture, with "initial" in italics for extra stress in some cases. VIA has also said a number of times, both in the white paper and in my conversation with them, how much they've learned in designing and implementing Isaiah over the past few years. So we can expect parameters like decode width, issue width, buffer depth, and so on to change with the next core revision.)

Isaiah is pin-compatible with existing C7 processors, and VIA is promising two to four times the performance of C7 while staying within the same power and thermal envelope. The company also says that Isaiah will sell for about the same price as existing C7 parts, which means that VIA parts will continue to be favorites with the small form factor crowd that builds Mini-, Nano-, and Pico-ITX systems.

Though the current Isaiah parts are single-core, VIA assured me that it does have a dual-core variant in the works, but wouldn't say much more. Isaiah was designed with dual-core in mind, and Centaur's president, Glenn Henry, suggested that a dual-core part would probably happen at the 45nm node.

Isaiah also brings to VIA's line-up support for the latest and greatest in the alphabet soup of x86 ISA extensions that AMD and Intel have introduced over the past few years. Intel's virtualization extensions are supported in the new processor, as are the various SSE flavors.

The new processor also adds support for security features, like an on-board random number generator, hardware acceleration for popular crypto algorithms, and a "secure execution mode" that lets instructions access a private "volatile secure memory" (VSM) area on the chip. This mode and its special memory pool are both unique to VIA's products, and I don't yet know enough about them to go into any detail. They're likely to find use in some application-specific embedded situations.

Isaiah's front end

The shift to out-of-order execution means that Isaiah joins the rest of the modern x86 processors in breaking down long, variable-length x86 instructions into shorter, uniform micro-ops (or "uops"). This uop translation means that Isaiah's front-end pipeline is now fairly bulky and features the stages and hardware that readers of my book or my past processor articles will recognize from comparable Intel and AMD designs.

                                       
Isaiah's architecture. Image source: VIA

Isaiah fetches instructions into a two-cycle decode phase that can take in three x86 instructions of any size or type per cycle. VIA claims that the decode phase can do both Conroe-style macro-fusion of some x86 instruction combinations, like compare and jump, and micro-op fusion of instructions that use different issue ports. As in Intel's Conroe (also known as the Core 2 Duo), these two types of fusion cut down on the amount of bookkeeping logic needed to track in-flight instructions.
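To make the payoff of macro-fusion concrete, here's a toy sketch of my own (not VIA's or Intel's actual decoder logic): an adjacent compare + conditional jump pair gets merged into a single tracked micro-op, so the back end has one fewer in-flight entry to book-keep.

```python
# Toy macro-fusion pass: merge an adjacent cmp + conditional-jump pair
# into one micro-op. This is an illustration, not real decoder hardware.
def macro_fuse(instructions):
    fused = []
    i = 0
    while i < len(instructions):
        if (i + 1 < len(instructions)
                and instructions[i][0] == "cmp"
                and instructions[i + 1][0].startswith("j")):
            # One tracked entry instead of two for the cmp+jump pair.
            fused.append(("cmp+" + instructions[i + 1][0],)
                         + instructions[i][1:] + instructions[i + 1][1:])
            i += 2
        else:
            fused.append(instructions[i])
            i += 1
    return fused

stream = [("mov", "eax", "[mem]"), ("cmp", "eax", "0"), ("jne", "loop"),
          ("add", "ebx", "1")]
print(macro_fuse(stream))  # cmp/jne become one entry: 3 uops instead of 4
```

The win isn't execution speed per se; it's that fewer tracked entries means less ROB and scheduler bookkeeping per x86 instruction retired.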


                                     
Isaiah's branch prediction. Image source: VIA

Recent years have shown that for more deeply pipelined out-of-order machines, branch prediction is one of the places where microarchitects get the best power/performance return-on-investment for transistors spent. So like its competition from Intel and AMD, Isaiah spends quite a few resources on branch prediction in both the fetch and decode phases of its pipeline. The processor has a total of eight branch predictors spread out over two of its fetch and translate/decode stages, each of which targets a different type of branch, and all of which vote to determine the prediction and branch target for speculative execution.
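VIA hasn't published how the eight predictors' outputs are actually combined, so the following is a generic sketch of my own: specialized predictors each vote on the branches they handle and abstain on the rest, with a simple majority deciding the speculative direction.

```python
# Hedged sketch of multi-predictor voting (the predictor bank and the
# majority rule here are illustrative assumptions, not VIA's design).
def predict(branch, predictors):
    votes = [p(branch) for p in predictors]
    votes = [v for v in votes if v is not None]  # drop abstentions
    taken = sum(1 for v in votes if v)
    return taken * 2 > len(votes)  # simple majority wins

def backward_taken(branch):
    # Loop heuristic: votes "taken" on backward branches, abstains otherwise.
    return True if branch["target"] < branch["pc"] else None

def static_not_taken(branch):
    return False

def always_taken(branch):
    return True

loop = {"pc": 0x100, "target": 0x0F0}     # backward branch (a loop)
forward = {"pc": 0x100, "target": 0x110}  # forward branch
bank = [backward_taken, static_not_taken, always_taken]
print(predict(loop, bank))     # True: two of three vote "taken"
print(predict(forward, bank))  # False: abstention leaves a 1-1 tie
```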

Isaiah's back end

Isaiah's decode phase passes instructions into a uop queue, where they fall into the processor's re-order buffer (ROB) for register renaming and allocation into reservation stations (RS). There's no word from VIA on how deep the ROB and RS buffers are, but if people are dying to know I can ask and I'm sure they will tell me.

The ROB + RS hardware, which is the heart of the out-of-order engine, issues up to seven instructions per cycle to any of seven execution units:
Two 64-bit integer units
Two 128-bit vector/floating-point units
Three load-store units (store-address, store-data, and load-data)

The FP/vector units, which VIA calls "media units," aren't symmetric; in a typical division of labor, one focuses on addition and the other on multiplication (and probably permutes). These FP/vector units have fairly robust floating-point capabilities—more so than I would expect from a processor that will mostly be used in embedded and low-power situations, which typically see more integer-intensive, branchy code.

In particular, the FP units can do any type of floating-point add (vector, scalar, double-precision, or single-precision) in only two clocks—at least one clock less than the Core 2 Duo's three- or four-clock latency. The FP multiply hardware is similarly speedy, and is capable of executing single-precision multiplies in three clocks and double-precision multiplies in four.
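To see why add latency matters, consider a serial chain of dependent FP adds, where each add has to wait for the previous result. Dependent operations can't overlap, so the chain's runtime is simply its length times the per-op latency; here's that back-of-the-envelope arithmetic using the figures above:

```python
# Dependent-chain arithmetic: ops that wait on each other can't overlap,
# so total cycles = chain length x per-op latency.
def chain_cycles(n_ops, latency):
    return n_ops * latency

n = 100  # a serial chain of 100 dependent FP adds
print("2-cycle add:", chain_cycles(n, 2))  # 200 cycles (Isaiah's claimed latency)
print("3-cycle add:", chain_cycles(n, 3))  # 300 cycles (a Core 2-style latency)
```

On dependency-bound FP code, that one-cycle difference is a straight 33 percent cut in chain runtime.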

VIA attributes this low FP latency to a "completely new algorithm" for floating-point adds, but I have my suspicions that Isaiah may also be sacrificing some clockspeed scaling headroom in order to keep its pipelines short. Even if it is, this won't affect it much, since the point of Isaiah is low power and not high clockspeeds.

Data flow and cache hierarchy

On the data-flow side of the processor, Isaiah can do the kind of memory disambiguation that Intel's Core 2 Duo uses to commit memory writes out-of-order. Isaiah also can do store merging, where smaller writes are combined with larger writes in the write buffer and sent out to memory as a group.
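Store merging is easy to picture with a toy model: stores to the same line-sized region of the write buffer collapse into one entry and reach memory as a single transaction. (The 16-byte line size below is an assumption for brevity; real hardware works at cache-line granularity.)

```python
# Sketch of write-combining in a store buffer (illustrative, not VIA's
# implementation). Assumed 16-byte merge granularity.
LINE = 16

def merge_stores(stores):
    # stores: list of (address, size_in_bytes) tuples in program order.
    buffer = {}
    for addr, size in stores:
        line = addr // LINE
        buffer.setdefault(line, []).append((addr, size))
    return buffer  # one memory transaction per line, not per store

writes = [(0x1000, 4), (0x1004, 4), (0x1008, 8), (0x2000, 4)]
merged = merge_stores(writes)
print(len(writes), "stores ->", len(merged), "transactions")  # 4 -> 2
```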

Isaiah's cache hierarchy has some important differences from its competitors'. Its 64K L1 instruction and data caches are twice as large as the typical 32K L1 caches of its competitors. The processor's L1 and 1MB L2 caches are also exclusive, meaning that data that resides in the L1 is not present in the L2, and vice versa. This exclusive design can have its drawbacks, but it makes the L1 + L2 function like a single, larger cache. Both the L1 and L2 caches are 16-way set associative.
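The exclusive arrangement is easiest to see in a toy model: a line lives in exactly one level, L1 victims are demoted into the L2, and an L2 hit promotes the line back up and out of the L2. (Real caches are set-associative; this sketch of mine uses tiny capacities and FIFO eviction for clarity.)

```python
# Toy exclusive L1/L2 pair: together they hold L1 + L2 distinct lines.
# Illustrative only; eviction policy and sizes are assumptions.
from collections import OrderedDict

class ExclusivePair:
    def __init__(self, l1_lines, l2_lines):
        self.l1 = OrderedDict()
        self.l2 = OrderedDict()
        self.l1_cap, self.l2_cap = l1_lines, l2_lines

    def access(self, line):
        if line in self.l1:
            return "L1 hit"
        if line in self.l2:
            del self.l2[line]        # promoted: leaves the L2 entirely
            self._fill_l1(line)
            return "L2 hit"
        self._fill_l1(line)          # miss: fill L1 directly, bypassing L2
        return "miss"

    def _fill_l1(self, line):
        if len(self.l1) >= self.l1_cap:
            victim, _ = self.l1.popitem(last=False)
            self.l2[victim] = True   # L1 victim demoted into the L2
            if len(self.l2) > self.l2_cap:
                self.l2.popitem(last=False)
        self.l1[line] = True

c = ExclusivePair(l1_lines=2, l2_lines=4)
for ln in ["A", "B", "C"]:
    c.access(ln)                     # "A" gets evicted from L1 into L2
print(c.access("A"))                 # "L2 hit": the victim was preserved
```

In an inclusive design, "A" would have occupied space in both levels the whole time; exclusivity is what lets the two capacities add up.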

Isaiah also has a special data prefetch cache that should help it save space in the regular cache hierarchy. Data that's prefetched doesn't typically get accessed more than once, so there's no need to take up regular cache space with it. The data prefetch cache solves this problem by putting prefetched data into a special 64-line cache.
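The idea sketches easily: prefetched lines land in a small side buffer and are served from there on demand, so single-use streaming data never displaces lines in the main L1. (The FIFO replacement and drop-after-use behavior here are my illustrative assumptions; VIA hasn't detailed the policy.)

```python
# Sketch of a 64-line side buffer for prefetched data (illustrative;
# replacement and promotion policies are assumptions, not VIA's spec).
from collections import OrderedDict

class PrefetchCache:
    def __init__(self, lines=64):
        self.buf = OrderedDict()
        self.cap = lines

    def prefetch(self, line):
        if line not in self.buf:
            if len(self.buf) >= self.cap:
                self.buf.popitem(last=False)   # FIFO replacement
            self.buf[line] = True

    def demand_read(self, line):
        # Serve the hit from the side buffer; the main L1 is untouched.
        return self.buf.pop(line, None) is not None

pc = PrefetchCache()
pc.prefetch(0x40)
print(pc.demand_read(0x40))  # True: served without polluting the L1
print(pc.demand_read(0x40))  # False: single-use data is simply gone
```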


Conclusions                                                                     
                                                                    
VIA is aiming Isaiah at the same segment that Intel, AMD, and ARM are targeting with their forthcoming processors: the so-called mobile Internet device (MID) and ultramobile PC (UMPC). Right now, the fate of the UMPC as a form factor has yet to be decided, so there's no guarantee that there will even be a market there for everyone to fight over. It's possible that the real action will be in the emerging flash-based laptop form factor (see the Asus Eee PC or Apple's new MacBook Air) or in the (increasingly ill-named) "smartphone" category.

                                                  

Regardless of what happens to the UMPC, though, 2008 will see a mix of low-power x86 devices, both in-order and out-of-order, from all three x86 players as they go head-to-head with the increasingly complex RISC processors from companies like ARM and MIPS that currently own the low-power space. ARM in particular will throw its hat into the out-of-order ring later this year with its forthcoming dual-core Cortex A9 processor. Whether any of the x86 contenders, going out-of-order like VIA or in-order like Intel, can take on the reigning champ of the mobile and embedded space remains to be seen. But Isaiah looks like a worthy contender at the very least.

Appendix: making the leap to out-of-order

How could Isaiah possibly deliver on a two to four times performance increase over its predecessor in the same thermal and power envelope? Part of the answer is in the jump to 65nm, but tied up with that move is a change to the way that the processor executes instructions that's enabled by Isaiah's enlarged transistor budget.

With its in-order design built on an aging but inexpensive 90nm process technology, the VIA C7 (codenamed Esther) and its predecessors have long lagged the x86 competition in terms of performance-enhancing microarchitectural features. Specifically, VIA's processors—even the ones launched as recently as 2005—have so far lacked the one crucial architectural feature that separates the Pentium from the Pentium Pro and that is the hallmark of almost all modern desktop- and server-oriented processor designs: an instruction window.

In a nutshell, an instruction window is what enables modern processors to dynamically reorganize the sequential instruction stream so that instructions can execute inside the processor in an order other than the one in which they were placed by a programmer. On an in-order processor like the C7, an instruction that takes a long time to execute or that's waiting for data can stall the processor so that no other work gets done until that instruction finishes executing. In contrast, an out-of-order processor with an instruction window can allow instructions to flow around the problem instruction so that the processor continues to work; the instruction window's bookkeeping apparatus enables it to put the instructions back into their original order before writing their results out to programmer-visible memory.
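The stall-versus-flow-around behavior described above can be shown with a toy single-issue scheduler of my own devising (grossly simplified: one instruction issued per cycle, no renaming, a program-order list standing in for the window):

```python
# Toy in-order vs. out-of-order issue (illustrative, single-issue).
# instrs: (name, latency_in_cycles, indices_of_dependencies), program order.
def run(instrs, out_of_order):
    done_at, cycle = {}, 0
    pending = list(range(len(instrs)))
    while pending:
        issued = None
        for i in pending:
            name, lat, deps = instrs[i]
            if all(done_at.get(d, float("inf")) <= cycle for d in deps):
                issued = i
                break
            if not out_of_order:
                break  # in-order: can't look past a stalled instruction
        if issued is not None:
            done_at[issued] = cycle + instrs[issued][1]
            pending.remove(issued)
        cycle += 1
    return cycle

prog = [("load r1", 10, []),       # long-latency cache miss
        ("add r2,r1", 1, [0]),     # depends on the load
        ("mul r3,r4", 1, []),      # independent work
        ("sub r5,r6", 1, [])]      # independent work
print("in-order:", run(prog, False), "cycles")      # 13
print("out-of-order:", run(prog, True), "cycles")   # 11
```

The in-order core sits idle behind the load for nine cycles; the out-of-order core spends two of those cycles retiring the independent `mul` and `sub`, which is exactly the work-under-a-stall win the instruction window buys.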

As you might imagine, an instruction window is fairly complex to implement, and the costs of the bookkeeping apparatus described above don't even tell the whole story. Because an out-of-order core can reorder and track only small, uniform instructions, the long, variable-length x86 instructions that come into the processor must be broken down into a series of uniform instructions called micro-ops. So the front end of the processor, which is the part that prepares the variable-length x86 instructions for execution, also balloons with extra translation and decoding hardware.

All told, the transition from in-order to out-of-order is a massive leap in hardware complexity and size for an x86 processor, which is why it has taken the power-conscious VIA line so long to get here. Conversely, jettisoning all of that extra hardware can save tons of die space and power, which is why Intel's forthcoming mobile-oriented Silverthorne/Diamondville processor is an in-order design.
