z/OS is IBM’s flagship operating system for its mission-critical z Systems enterprise servers. Earlier this week IBM lifted the veil on the upcoming z/OS 2.3 release, with general availability planned for September 2017. IBM’s Early Support Program is potentially available to those z/OS licensees who can’t wait that long. (Ask “your friendly IBM representative” if you’d like to join the ESP.) The preview announcement is chock-full of interesting new features. I am particularly interested in these items:

  • z/OS 2.3 will require an IBM zEC12, zBC12, or higher model machine. That’s a fairly aggressive leap forward, but in my view it’s a sensible one. Setting a higher baseline means that z/OS can do more to exploit newer instructions and other newer model features.
  • z/OS and its predecessor, OS/390, have included encryption services for decades, but this release makes encryption of data sets, zFS, and CF structures much easier to implement because there’s no need to change applications.
  • Communications Server will incorporate a network security “sanity check” (my term) feature, to warn administrators if they’re not being at least reasonably careful to encrypt network connections. (Philip Young might appreciate this new feature.)
  • z/OSMF will support a standard software installation package for z/OS, based on some recent vendor consensus. Consequently z/OSMF will support both SMP/E and non-SMP/E (but also standard) software installation.
  • z/OSMF will integrate the z/OS incident log with IBM’s support database to find likely APARs (fixes or documented workarounds) automatically.
  • XML System Services will use 64-bit address spaces, to handle larger XML documents.
  • Eight-character TSO/E user IDs!
  • IBM Knowledge Center for z/OS will include a message lookup feature.
  • The Sub-Capacity Reporting Tool (SCRT) is moving into the base operating system, and IBM is opening it up to software vendors.
  • Workload Manager will have a “maximum zIIP” control (my term) that can be applied to particular service classes. WLM will require all zIIP-eligible work within these controlled service classes to execute only on zIIPs, never on general purpose engines.
  • Online migration of HFS to zFS, and online repair of zFS. (IBM expects that the z/OS release after Version 2.3 will be the last to support HFS.)
  • RACF user ID to e-mail conversion and mapping.
  • JES2 job notifications and z/OSMF event notifications via e-mail.
  • Although z/OS 2.3 will still be available on tape media, IBM says it’ll discontinue tape media deliveries of z/OS and related products in the future. Electronic delivery (Internet downloads) and DVD media will continue to be available.

Why do eBay’s employees (plural!) have access to all of eBay’s master customer databases? DB2 for z/OS offers strict security separation (e.g. SECADM, DBADM, and other, highly selective authorities). eBay (and PayPal) should be using a real mainframe database with real security to protect its customers and its business reputation.

Choose the right tool for the job, eBay.


How to Destroy a Sysplex

To say we had an interesting Business Recovery Exercise this week would be an understatement!

Since bringing our BR (business recovery) / DR (disaster recovery) solution in house, rather than performing it offsite, we’ve had a total of five BR Exercises this year alone. This is pretty impressive for our shop since we used to go YEARS between BR Exercises. Now our clients can declare a BR Exercise without prior notice to ensure our infrastructure is sound and solid.

Our infrastructure IS sound and solid…provided no one messes with it!

Two months earlier I was doing what I thought was helpful cleanup in RACF. I was adding a new PROFILE for a monitoring application. Our RACF expert had just recently retired and our new RACF person was not quite trained and up to speed.

On occasion I would go in and “fix up” some things in RACF, trying to be helpful. Although I had ADMIN rights to reset PASSWORDS when I’m on-call, I’m not really supposed to mess around in RACF.

But what’s the worst that can happen?

I honestly thought I was doing something good by deleting a VERY suspicious * (G)ENERIC profile.

[Screenshot: RACF profiles]
(* I have my very own screenshot auditing script that captures my screen every minute on my workstation. It was able to capture the quiet destruction of the sysplex.)

To me this generic profile seemed a security risk, so I decided to take matters into my own hands (since the new guy surely was not going to) and DELETED this profile!

[Screenshot: RACF delete]
Uh oh!!!
[Screenshot: RACF warning]
“You still have a chance to undo this, Paul!!!”
[Screenshot: RACF refresh]
Nope. Profile is deleted.

Quiet Disaster

What I didn’t realize was that instead of making the system more secure, I had deleted a VERY important PROFILE that’s used at IPL.

As Michael Cairns excellently describes in his article “Addressing Common RACF Configuration Issues”, that * GENERIC PROFILE was the catchall profile.

[The] class SURROGAT profile consisting simply of "**" or "*.*" (sometimes called a catchall profile). It applies to all user IDs that aren't matched by a more specific profile and probably covers your user ID unless steps have been taken to avoid this.
Without a catchall generic profile of some kind in the class STARTED, a previously undefined started task will fall back to the contents of ICHRIN03. 
If fallback to ICHRIN03 can happen, you need to know what privileges it's granting.

That’s exactly what happened.

We started the Business Recovery Exercise, and on the very first IPL the system came to a screeching halt. Apparently JES2 (Job Entry Subsystem) did not have authority, and ICHRIN03 was poorly coded.
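
For readers who don't live in RACF, the fallback order described in the quote above can be modeled roughly like this. It is only a toy sketch in Python, not how RACF actually implements generic matching: the function name, the profile names, and the user IDs (`MONUSER`, `STCUSER`) are all made up for illustration, and shell-style wildcard matching stands in for RACF's real generic-profile rules.

```python
from fnmatch import fnmatchcase

def resolve_started_task(name, started_profiles, ichrin03):
    """Toy model: find the user ID for a started task named 'member.jobname'.

    started_profiles: dict of STARTED-class profile name -> user ID (hypothetical)
    ichrin03: dict of procedure name -> user ID (hypothetical stand-in for the table)
    Returns (source, user_id), or None when nothing matches. What the real
    system does in that case is outside this sketch.
    """
    # 1. A discrete (exact-name) profile wins.
    if "*" not in name and name in started_profiles:
        return ("STARTED discrete", started_profiles[name])

    # 2. Generic profiles, most specific first. Specificity is approximated
    #    here by "fewest wildcards, then longest pattern".
    generics = [p for p in started_profiles if "*" in p]
    for pattern in sorted(generics, key=lambda p: (p.count("*"), -len(p))):
        if fnmatchcase(name, pattern):
            return ("STARTED generic", started_profiles[pattern])

    # 3. No STARTED profile at all: fall back to the ICHRIN03 table,
    #    keyed in this model by procedure (member) name only.
    proc = name.split(".", 1)[0]
    if proc in ichrin03:
        return ("ICHRIN03", ichrin03[proc])
    return None

# Before the cleanup: the catchall covers JES2.
profiles = {"MONITOR.*": "MONUSER", "**": "STCUSER"}
print(resolve_started_task("JES2.JES2", profiles, {}))  # ('STARTED generic', 'STCUSER')

# After deleting the catchall, with an ICHRIN03 that has no JES2 entry.
del profiles["**"]
print(resolve_started_task("JES2.JES2", profiles, {}))  # None
```

In this model nothing looks wrong until a task that quietly depended on the catchall has to be started again, which is why the problem only surfaced at IPL.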

But…NOTHING has changed!!!!

Imagine the frustration my colleagues (and I, before the discovery) were experiencing. Here we were doing our FIFTH BR exercise this year. It always worked. It never failed. We had a perfect mirror of our working production. Nothing had changed!

To make a long story (and painful one for me) short, we opened a Service Request with Severity 1 with IBM.  This is equivalent to calling 911 or pressing the nuclear panic button when you need IBM support and need it fast!

We were directed to a teleconference with their JES and RACF experts, who with their AWE-INSPIRING expertise guided us to the discovery that yes, we were missing that * GENERIC PROFILE in RACF. Since JES2 at our shop started in a certain sequence, we were unable to re-create this PROFILE on our BR system.

Since this was a mirror of our production we discovered that we were in fact vulnerable on our PRODUCTION SYSTEM!!!

If we had IPL’d any of our production LPARs, meaning recycled them, there was NO WAY they were coming back up. JES2 would have run into the same authority error and the entire system would be, in a manner of speaking…toast!

Luckily we caught this and were able to RECREATE the profile on our PRODUCTION system so we could mirror it over to the BR SYSTEM and finish the exercise.

Take away lessons:

  1. NEVER…  EVER…   MESS WITH RACF! (At least without knowing what you’re doing.  My RACF roles have been relinquished to the appropriate people.)
  2. Business / Disaster Recovery Exercises are there for a REASON!  If you’re not doing it at your shop, how do you know you’re not vulnerable?