In a triumph of PR right up there with suggesting that Intel executives ever badgered Microsoft executives into doing anything, IBM this week introduced a new generation of mainframe computers. The IBM System z10 is smaller, faster, and cooler, with more memory and more storage (more of everything, in fact), and all that is crammed into less of everything than was the case with the z9 machine it replaces. The z10 is touted as more of a super-duper virtualization server than traditional big iron. The only problem is that every bit of its superior performance can be easily attributed to Moore’s Law. The darned thing actually should be faster than it is. There’s a mainframe revolution going on all right, but it’s not at IBM. The real mainframe revolt is taking place inside your generic Linux box, as well as at an outfit called Azul Systems.
I’m perfectly happy for IBM to introduce a great new mainframe computer. It’s just that the 85 percent faster, 85 percent smaller and a little bit cheaper z10 is coming three years after the z9, and Moore’s Law says sister machines that far apart ought to be 200 percent faster, not 85 percent — a fact that IBM managed to ignore while touting the new machine’s unsubstantiated equivalence to 1,500 Intel single-processor servers.
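For anyone who wants to check that math, here is a rough sketch of the Moore’s Law arithmetic. It assumes performance simply doubles every fixed number of months, which is a simplification made here for illustration; the function name and the two doubling periods (18 and 24 months, the commonly quoted ones) are not from IBM. Three years then works out to somewhere between roughly 180 and 300 percent faster, which brackets the 200 percent figure and sits well above 85.

```python
def percent_faster(months_apart, doubling_months):
    """Percent speedup if performance doubles every `doubling_months` months.

    Assumption: a clean exponential, which real hardware only approximates.
    """
    speedup = 2 ** (months_apart / doubling_months)
    return (speedup - 1) * 100


# The z9 and z10 are three years (36 months) apart.
for doubling in (18, 24):
    print(f"doubling every {doubling} months: "
          f"{percent_faster(36, doubling):.0f}% faster after 3 years")
# doubling every 18 months: 300% faster after 3 years
# doubling every 24 months: 183% faster after 3 years
```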
Where were the hard questions? Did anyone do the math? The tricked-out z10 that’s the supposed equivalent of 1,500 beige boxes costs close to $20 million, which works out to $13,333 per beige box — hardly a cost savings. Even taking into account the data center space savings, power savings, and possibly (far from guaranteed) savings on administration, the z10 really isn’t much of a deal unless you use it for one thing and one thing only — replacing a z9.
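The division is simple enough to do on a napkin, but here it is as a small sketch so the numbers can be swapped out. The function name is made up for illustration, and the inputs are just the round figures quoted above.

```python
def price_per_replaced_server(system_price, servers_replaced):
    """Mainframe price spread evenly over the servers it claims to replace."""
    return system_price / servers_replaced


# Figures from the paragraph above: close to $20 million, 1,500 servers.
per_box = price_per_replaced_server(20_000_000, 1_500)
print(f"${per_box:,.0f} per beige box")
# $13,333 per beige box
```

Put another way, the z10 only wins on price if each beige box, with its share of power, space, and administration, would otherwise cost more than that.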
So the newfangled mainframe is really just an oldfangled mainframe after all, which I am sure is comforting for folks who like to buy oldfangled mainframes.
But those sketchily described IBM benchmarks are, themselves, dubious. IBM never fully explains its own benchmarks nor does it even allow others to benchmark IBM mainframe computers. So nobody really knows how fast the z10 is or how many Intel boxes it can replace if those boxes are actually DOING something.
Remember the stories folks like me wrote a few years back about an earlier IBM mainframe running 40,000+ instances of SUSE Linux under VM on one machine? I wonder how many of those 40,000 parallel Linux images were simultaneously running Doom? My guess is none were.
Far more interesting to me is the vastly increasing utility of Linux as what I would consider a mainframe-equivalent operating system. That is primarily due to the open source OS’s newfound skill with multiple threads, which goes a long way toward making efficient use of those multi-core processors we are all so excited to buy yet barely use.
As I wrote a few weeks ago in a column on semiconductor voltage leakage of all things, all this multi-core stuff is really about keeping benchmark performance up while keeping clock speeds down so the CPUs don’t overheat. Unlike the benchmark programs, most desktop applications still run on a single processor core and have no good way to take efficient advantage of this extra oomph.
But that’s changing. Linux used to be especially bad at dealing with multiple program threads, for example: so bad that the rule of thumb was it simply wasn’t worth even trying under most conditions. But that was with the archaic Linux 2.4 kernel. Now we have Linux 2.6 and a new library called NPTL, the Native POSIX Thread Library, to change all that.
NPTL has been in the enterprise versions of Red Hat Linux for a while, but now it is here for the rest of us, too. With NPTL, hundreds of thousands of threads on one machine are now very possible. And where it used to be an issue when many threads competed for data structures (think about 1,000 threads all trying to update a hash table), we now have data structures where no thread waits for any other. In fact, if one thread gets swapped out before it’s done doing the update, the next thread detects this and helps finish the job.
The upshot is superior performance IF applications are prepared to take advantage.
“My e-mail application runs on a four-core Opteron server,” says a techie friend of mine, “but I’ve seen it have over 4,000 simultaneous connections, meaning 4,000 separate threads (where I’m using ‘thread’ to describe a lightweight process) competing for those four CPUs. And looking at the stats, my CPUs are running under five percent almost all the time. This stuff really has come a long way.”
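To see why thousands of threads can coexist with nearly idle CPUs, here is a minimal sketch in Python. It is not my friend’s mail server, whose code isn’t shown. It assumes the connections spend almost all their time waiting, which is modeled here with a `threading.Event`; the names `run_idle_connections`, `n_threads`, and `hold_seconds` are made up for illustration. On Linux, CPython threads are ordinary OS threads (pthreads), so each one is scheduled by the kernel.

```python
import threading
import time


def run_idle_connections(n_threads=200, hold_seconds=0.2):
    """Start n_threads that mostly wait, the way idle connections do.

    Returns (threads_finished, cpu_seconds, wall_seconds).
    """
    release = threading.Event()
    finished = []
    finished_lock = threading.Lock()

    def connection(conn_id):
        # Blocked in wait(): the thread exists but uses essentially no CPU.
        release.wait()
        with finished_lock:
            finished.append(conn_id)

    threads = [threading.Thread(target=connection, args=(i,))
               for i in range(n_threads)]

    cpu_start = time.process_time()
    wall_start = time.perf_counter()
    for t in threads:
        t.start()
    time.sleep(hold_seconds)  # every "connection" sits idle here
    release.set()             # wake them all to do a sliver of work
    for t in threads:
        t.join()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return len(finished), cpu, wall


if __name__ == "__main__":
    done, cpu, wall = run_idle_connections()
    print(f"{done} threads, {cpu:.3f}s CPU over {wall:.3f}s wall clock")
```

The exact numbers will vary by machine, but the CPU time should be a small fraction of the wall-clock time: a waiting thread costs memory and a scheduler entry, not processor cycles.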
But not nearly as far as Azul Systems has gone in ITS redefinition of the mainframe. As far as I can tell, Azul has pushed models for thread management and process concurrency further than any other company.
Azul makes custom multi-core server appliances. You can buy a 14U Azul box with up to 768 processor cores and 768 gigabytes of memory. The processors are of Azul’s own design, at least for now.
But what’s a server appliance? In the case of Azul, the appliance is a kind of Java co-processor that sits on the network providing compute assistance to many different Java applications running on many different machines.
Java has always been a great language for writing big apps that can be virtualized across a bunch of processors or machines. But while Java was flexible and elegant, it wasn’t always very fast, the biggest problem being processor delays caused by Java’s automatic garbage collection routines. Azul handles garbage collection in hardware rather than in software, making it a continuous process that keeps garbage heap sizes down and performance up.
Language geeks used to sit around arguing about the comparative performance of Java with, say, C or C++ and some (maybe I should actually write “Sun”) would claim that Java was just as fast as C++. And it was, for everything except getting work done because of intermittent garbage collection delays. Well now Azul — not just with its custom hardware but also with its unique multi-core Java Virtual Machine — has made those arguments moot: Java finally IS as fast as C++.
For that matter, there is no reason to believe that Azul’s architecture has to be limited to Java and couldn’t be extended to C++, too.
To me what’s exciting here is Azul’s redefinition of big iron. That z10 box from IBM, for example, can look to the network like 1,500 little servers running a variety of operating systems. That’s useful to a point, but not especially flexible. Azul’s appliance doesn’t replace servers in this sense of substituting one virtualized instance for what might previously have been a discrete hardware device. Instead, Azul ASSISTS existing servers with their Java processing needs with the result that fewer total servers are required.
Servers aren’t replaced; they are made unnecessary at a typical ratio of 10-to-1, according to Azul. So what might have required 100 blade servers can be done FASTER (Azul claims 5-50X) with 10 blade servers and an Azul appliance. Now that Azul box is not cheap, costing close to $1,000 per CPU core, but that’s comparable to blade server prices and vastly cheaper than mainframe power that isn’t nearly as flexible.
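Here is the same back-of-the-envelope treatment for Azul’s numbers. Treat it as a sketch under stated assumptions: the function names are invented for illustration, the 10-to-1 ratio is Azul’s claim rather than a measurement, and pricing a fully loaded 768-core box at a flat $1,000 per core is an assumption that gives a rough ceiling, since the figure quoted is “close to” that.

```python
import math


def servers_after_assist(servers_before, consolidation_ratio):
    """Servers still needed at a claimed N-to-1 consolidation ratio."""
    return math.ceil(servers_before / consolidation_ratio)


def appliance_price(cores, price_per_core):
    """Rough appliance price from a flat per-core figure (an assumption)."""
    return cores * price_per_core


print(servers_after_assist(100, 10))         # 10
print(f"${appliance_price(768, 1_000):,}")   # $768,000
```

Set that next to the close-to-$20 million z10 and the gap is hard to miss, even allowing for the two boxes doing rather different jobs.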
And flexibility is what this is all about, because Azul’s assistance is provided both transparently and transiently. Java apps don’t have to be rewritten to accept assistance from the Azul appliance. If it is visible on the network, the appliance can assist ANY Java app, with that assistance coming in proportion to the amount of help required based on the number of cores available.
Now imagine how this would work in a data center. Unlike a traditional mainframe that would take over from some number of servers, the Azul box would assist EVERY server in the room as needed, so that you might need a big Azul box for every thousand or so servers, with that total number of servers dramatically diminished because of the dynamically shared overhead.
This is simply more efficient computing — something we don’t often see.
There are other concurrency architectures out there like Appistry (which I wrote about back when it was called Tsunami before we unfortunately HAD a Tsunami — what sort of marketing bad luck is that?). But where Appistry spreads the compute load concurrently across hundreds or thousands of computers, Azul ASSISTS hundreds or thousands of servers or server images with their compute requirements as needed.
Bear Stearns runs its back office with Azul assistance, but many customers use Azul boxes to accelerate their websites.
Since I am not a big company guy who cares very much about what big companies do, what I find exciting about Azul’s approach is how it could be applied in the kinds of data centers where I am typically renting either virtual or dedicated servers. If an Azul box were installed on that network, my little app would instantly and mysteriously run up to 50 times faster.
Cool.