Henri Kuiper (@henrikuiper) has an insightful observation illustrating one of the many reasons IBM z Systems are essential to secure, reliable enterprise computing.

On March 9, Google’s “Project Zero” published proof that a wide variety of computers are vulnerable to DRAM “rowhammer” attacks. As chip density increases, it’s becoming ever more difficult to confine and control electrons within such tiny dimensions. DRAM (Dynamic Random Access Memory), an IBM invention, is an extremely important component in practically every computer, including IBM z Systems. DRAM chips form the computer’s main memory, and if a computer’s memory does not function correctly, results are unpredictable or worse.

That’s exactly what Google’s security research team discovered, building on earlier research by Yoongu Kim and others. Using a specially (but easily) crafted piece of code, they were able to alter bit values in DRAM that the code should not have had access to, memory that should have been protected. The reason, in everyday language, is that electrons “hopped,” “leaked,” and/or “flipped” (take your pick). Individual DRAM chips are now sufficiently unreliable (and to some extent always have been) that you cannot depend on them faithfully storing and retrieving binary zeroes and ones. You can “hammer” DRAM with certain access patterns that disturb adjacent memory locations. In fact, as the Google team demonstrated, it’s possible to exploit this particular hardware vulnerability to gain complete, unauthorized control of an entire system. At present nobody knows how to protect fully against this vulnerability, because the problem is a fundamental, undesirable characteristic of the hardware.
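To make the disturbance mechanism concrete, here is a toy software model of the effect. This is an illustration only: real rowhammer abuses the physics of DRAM cells and cannot be reproduced in pure software, and the flip threshold below is an invented number for the sketch, not a real hardware parameter.

```python
# Toy model of the DRAM "rowhammer" disturbance effect (illustration only).
# Assumption for the sketch: activating a row disturbs its physical
# neighbors, and past some threshold a bit in a neighbor flips.

FLIP_THRESHOLD = 100_000  # hypothetical activations before a neighbor bit flips

class ToyDRAM:
    def __init__(self, num_rows):
        self.rows = [0b11111111] * num_rows   # every row starts with all bits set
        self.disturbance = [0] * num_rows     # accumulated disturbance per row

    def activate(self, row):
        """Read (activate) a row; its physical neighbors accumulate disturbance."""
        for neighbor in (row - 1, row + 1):
            if 0 <= neighbor < len(self.rows):
                self.disturbance[neighbor] += 1
                if self.disturbance[neighbor] >= FLIP_THRESHOLD:
                    # A bit flips in a row the attacker never wrote to.
                    self.rows[neighbor] ^= 0b00000001
                    self.disturbance[neighbor] = 0
        return self.rows[row]

dram = ToyDRAM(num_rows=3)
# "Hammer" rows 0 and 2; row 1 sits between them and is never written.
for _ in range(FLIP_THRESHOLD // 2):
    dram.activate(0)
    dram.activate(2)

print(f"victim row 1 now reads {dram.rows[1]:#010b}")
```

The attacker only ever reads rows 0 and 2, yet a bit in row 1 changes, which is exactly why this class of bug bypasses normal memory protection.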

That’s alarming! It’s not totally surprising, at least to those who’ve been paying attention, but the publication of this proof of concept and its test results is quite disturbing. The IT security community has a lot of work to do, and so will hardware manufacturers. Hardware manufacturers will likely need to implement at least what the IBM PC had way back in 1981: DRAM parity checking. (Though that’s likely not enough protection even in lower security contexts, especially as DRAM circuits continue to shrink.)
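For readers who haven’t met parity memory: the 1981-era scheme stores one extra bit per byte, chosen so the total count of 1-bits is even. Here’s a minimal sketch, which also shows why parity alone is weak protection: it detects a single flipped bit but is blind to a double flip.

```python
# Minimal sketch of byte-level even-parity checking, the scheme the
# original IBM PC used for its DRAM: one extra bit per byte.

def parity_bit(byte: int) -> int:
    """Return the even-parity bit for an 8-bit value."""
    return bin(byte).count("1") % 2

def store(byte: int) -> tuple[int, int]:
    """Store a byte together with its parity bit."""
    return byte, parity_bit(byte)

def check(byte: int, stored_parity: int) -> bool:
    """True if the byte still matches its parity bit."""
    return parity_bit(byte) == stored_parity

data, p = store(0b10110010)
assert check(data, p)                  # intact byte passes
corrupted = data ^ 0b00000100          # one bit flips (rowhammer-style)
assert not check(corrupted, p)         # parity detects it (but cannot correct it)
double = data ^ 0b00000110             # two bits flip...
assert check(double, p)                # ...and slip through undetected
```

Detection without correction means a parity machine can only halt on error, and multi-bit disturbances pass silently, which is why the post argues parity won’t be enough.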

OK, so what about IBM z Systems? Now the “secret” is revealed. Many years ago IBM’s engineers predicted this class of problems, and they wanted to keep IBM mainframes from ever approaching them. In the previous decade the engineers got to work redesigning and improving the memory subsystems in IBM mainframes, anticipating the DRAM density-related problems they would face in the component supply chain even when selecting the absolute best components. In 2010, IBM introduced the z196 mainframe with a brand new memory subsystem incorporating a breakthrough innovation: the RAIM (Redundant Array of Independent Memory) design. RAIM is analogous to RAID (Redundant Array of Independent Disks) for magnetic and solid state disks. All data are cross-checked on both reads and writes, and the failure (or misbehavior) of any single component does not threaten data integrity. In fact, IBM’s RAIM design can tolerate up to triple component failures and keep running, with no application interruption. (It’s also possible, and common, to configure an IBM z System so that in the unlikely event a memory hardware failure requires eventual service, even the memory repair occurs without interrupting applications.) You can “rowhammer” IBM z RAIM as much as you like, but you’re absolutely not going to flip bits outside your own authorized storage area.
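The RAID analogy can be made concrete with a toy XOR-parity model: data striped across memory channels plus one redundancy channel, so that losing an entire channel still leaves every word reconstructible. To be clear, this is a simplified sketch of the idea, not IBM’s actual RAIM implementation, which layers ECC and additional checking on top of channel-level redundancy.

```python
# Toy illustration of the RAIM idea: stripe each word across memory
# channels and add one XOR-parity channel, so any single channel can
# fail (or misbehave) and the word is still reconstructed.
# This is NOT IBM's actual design; it is a RAID-style sketch.

DATA_CHANNELS = 4  # toy model: 4 data channels + 1 parity channel

def write_word(word: bytes) -> list[bytes]:
    """Split a 4-byte word across data channels, appending an XOR parity channel."""
    assert len(word) == DATA_CHANNELS
    chunks = [bytes([b]) for b in word]
    parity = 0
    for b in word:
        parity ^= b
    return chunks + [bytes([parity])]

def read_word(channels: list) -> bytes:
    """Reassemble the word, reconstructing one failed (None) channel via XOR."""
    missing = [i for i, c in enumerate(channels) if c is None]
    assert len(missing) <= 1, "toy model tolerates one failed channel"
    if missing:
        xor = 0
        for c in channels:
            if c is not None:
                xor ^= c[0]
        channels = channels.copy()
        channels[missing[0]] = bytes([xor])
    return b"".join(channels[:DATA_CHANNELS])

stored = write_word(b"DATA")
stored[2] = None                      # an entire channel fails mid-operation
assert read_word(stored) == b"DATA"   # the running application never notices
```

The key property is the same one the post describes: reads and writes are cross-checked against redundant information, so no single misbehaving component can silently corrupt data.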

The latest IBM z13 mainframe, now shipping, incorporates the third generation RAIM subsystem, and every mainframe IBM sells includes at least second generation RAIM. You cannot disable RAIM protections, and you cannot configure an IBM mainframe without RAIM. RAIM is standard, not optional. When you buy or lease an IBM mainframe, the amount of memory you acquire — to pick a random example, 480GiB — is customer usable. Physically there are many more of the highest quality DRAM components inside the machine to support RAIM. But when you order a machine with 480GiB usable, you get 480GiB usable, after RAIM overhead. You can now order an IBM z13 mainframe with as much as 10TiB of usable, RAIM-protected memory.

IBM z Systems are the only servers in the world featuring this extreme, innovative level of memory protection, and that’s been true for nearly half a decade and counting. IBM stands alone here. As we’ve now discovered (or been reminded), RAIM is not only critical to ensuring the continuous operation of your applications, it’s also critical to ensuring the utmost security.

OK, but what about ECC memory? IBM invented Chipkill advanced ECC memory, too, and all IBM servers feature at least Chipkill memory. It’s an important, essential technology, and it works well. (Google’s security researchers were not able to demonstrate “rowhammer” vulnerabilities in the ECC memory systems they tested. My prediction is that ECC will quickly emerge as the minimum requirement to secure DRAM from rowhammer and similar attacks, even in client devices.) But ECC isn’t RAIM. RAIM is a big step beyond ECC memory, providing greater assurance of application availability even as DRAM densities continue to increase. That makes sense, of course: one of the core principles of IBM z Systems design is to maximize safety margins as much as possible. Mission critical means mission critical. So if you need the utmost in memory reliability (and often you do), then quite simply you need an IBM z System and its unique RAIM design.
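The step up from parity to ECC is that ECC can correct errors, not just detect them. A classic textbook example is the Hamming(7,4) code below, which corrects any single flipped bit in a 4-bit value. Real server ECC (and Chipkill) works on much wider words and can survive a whole failed DRAM chip; this sketch just shows the correct-a-single-bit-flip idea.

```python
# Toy ECC example: Hamming(7,4), which corrects any single-bit error.
# Parity bits sit at codeword positions 1, 2, and 4; data bits d0..d3
# sit at positions 3, 5, 6, and 7.

def hamming_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit Hamming codeword."""
    d = [(nibble >> i) & 1 for i in range(4)]   # d0..d3
    p1 = d[0] ^ d[1] ^ d[3]                     # covers positions 1,3,5,7
    p2 = d[0] ^ d[2] ^ d[3]                     # covers positions 2,3,6,7
    p3 = d[1] ^ d[2] ^ d[3]                     # covers positions 4,5,6,7
    bits = [p1, p2, d[0], p3, d[1], d[2], d[3]] # positions 1..7
    return sum(b << i for i, b in enumerate(bits))

def hamming_decode(codeword: int) -> int:
    """Decode a 7-bit codeword, correcting a single-bit error if present."""
    bits = [(codeword >> i) & 1 for i in range(7)]
    # Recompute the three parity checks; together they name the bad position.
    s1 = bits[0] ^ bits[2] ^ bits[4] ^ bits[6]
    s2 = bits[1] ^ bits[2] ^ bits[5] ^ bits[6]
    s3 = bits[3] ^ bits[4] ^ bits[5] ^ bits[6]
    syndrome = s1 | (s2 << 1) | (s3 << 2)
    if syndrome:                    # nonzero syndrome = position of the flipped bit
        bits[syndrome - 1] ^= 1
    return bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)

word = 0b1011
code = hamming_encode(word)
flipped = code ^ (1 << 3)                 # one stored bit flips
assert hamming_decode(flipped) == word    # ECC corrects it transparently
```

Whichever single bit flips, the three recomputed parity checks pinpoint it and the decoder repairs it in place; RAIM then goes further still by surviving failures ECC alone cannot.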

Change is good

I don’t know if you’ve ever seen “The Day the Earth Stood Still,” but I always thought this scene holds a universal truth:

“Technology is not your problem. The problem is you: you lack the will to change.”

So why do we need to change?

It’s my firm belief that technology, by itself, is never the solution to a problem. There is just no way a slick and sexy looking ‘appliance’ by itself will magically fix your technology problems. Sooner or later the ‘real’ issue will pop up and cause the same problem over and over again.

Take for example a company facing severe challenges keeping its ‘backup window’ within the allotted time frame. Just getting faster drives and faster networks might solve the short-term issue, but as long as they don’t ‘take a step back’ and review what it really is they’re backing up, and compare that to what really needs to be backed up, the ‘gain’ will likely be marginal and temporary in nature.

Taking that step back to reevaluate past decisions and configurations should always be the first course of action when facing ‘problems’ (for me this holds true beyond the technology domain). Just as the car would never have been invented if we had kept searching for ‘better and faster horses,’ I believe we will make the biggest progress in (Enterprise) IT once we allow change across the whole playing field.

But why don’t we?

This means we should be willing to change our infrastructure, our operating systems, our applications and middleware, our procedures, and maybe even our job descriptions…

And for a lot of us, change is not something we like.

“I’ve always done it this way” and “But that would take a lot of effort” are frequently heard excuses for not changing. Whenever these “arguments” are given I cannot help but think about those demotivational posters.

When your basement gets too packed with the ‘stuff’ you keep storing down there, would your solution be to move to a house with a bigger basement? Or would it be smarter (and more efficient) to reevaluate your procedures regarding what to store there?

“To improve is to change; to be perfect is to change often.” 

Looking forward to your reactions in the comments section……


Beat the Bashers!

(Image credit: Kevin Dooley - http://www.flickr.com/photos/pagedooley/)

Being the Mainframe Junkie that I am, I’ve been in a lot of situations where my platform has been under severe attack.

No, I’m not talking about DDoS attacks, but rather an ideological attack rooted in the (false) belief that the mainframe sucks, is old, and will not continue to be among us for very much longer.

I have been working on The Mainframe (yes, double capitals) since 1998 and have always heard people say my precious Mainframe will be made redundant within the next 5 years.

As of today, The Mainframe is still here and it’s getting bigger and bigger, not smaller and smaller. The Mainframe is still the dominant architecture in most of the Fortune 500 companies and a lot of governmental institutions, and as of late it is growing into the number one platform when it comes to Large Linux Infrastructures.

Seeing as I’ve been invited to join the (ever growing) ranks of The Millennial Mainframer, I deemed it fit to share some of my experiences where the unknowing started bashing The Mainframe, and how I have ‘retorted’ to these blasphemies.

For my first post I will rant about Cost and Legacy……

Cost: “It’s too expensive”

Expensive is a relative term to start with. Yes, you have to dish out large amounts of money to acquire a machine packed to the brim with all the awesomeness that makes up a Mainframe. But when you take a step back and realize this is one machine hosting a multitude of workloads, manageable with a small number of FTEs (Full-Time Equivalents), it’s a whole different ball game all of a sudden!

There’s no ‘let’s buy another server for this application’ approach. There’s no linear growth in the ‘workloads-to-manpower’ graph, and because utilization rates on Mainframes are a multiple of the rates on platforms of a lesser order, there’s a much more economically viable picture to be drawn.

Aside from plain acquisition and personnel costs there is the ‘software licensing cost’. My advice: get down to the bottom of the figures when it comes to cost. For one, I am pretty sure the PVU Pricing Model for WebSphere will turn out to be cheaper on a Mainframe platform.

[EDITOR ~ Paul Gamble: If any readers could comment and walk me through this I would be eternally grateful…and smarter!!!]

Since The Mainframe can host a variety of different workloads, it’s a dead giveaway that it will be cheaper to bring extra workload to The Mainframe than to offload workload elsewhere…

Legacy: “It’s Dinosaur stuff”

This must be the biggest frustration on my end. The similarity between Dino’s and The Mainframe holds only when you look at their awesomeness. For the rest, the comparison goes completely bust.

I shall be the last to disagree that Mainframes date back to ‘ancient times’ (much like dinosaurs). But where at one stage T-Rex and his friends became extinct (possibly due to changing environments), The Mainframe has shown us that it is capable of continuing its evolution to stay sustainable.

The Mainframe has been virtualizing since, like, forever. From before I was born it was capable of running multiple workloads, in multiple ‘ADDRESS SPACES’ (images, they say), on hardware less powerful than the CPUs of today.

“But my …….. (insert any x86 argument) has a gazillion virtualization instructions.”

Yeah I know, sorry for ya, we’ve got one on The Mainframe, it’s called SIE (Start Interpretive Execution) and we don’t pronounce it ‘sigh’ for nothing 🙂

[Follow Henri Kuiper @ http://zdevops.tumblr.com/]

About Henri:

I’m a level 76 Fire Mage.
Oh noes, born in 1976, computer junkie since I was like 7 (BBC Micro), and moved up from Commodore to MSX, then Amiga, then Linux. Been a Mainframe Junkie since 1998 and consider myself a full-fledged systems programmer. I understand (and work on) the mainframe from the I/O drawer all the way up to the end-user application.

Proud member of the Millennial Mainframer 🙂