Techno Musings Blog - Content and Information Management Hitachi - Inspire the Next


Techno Musings

Sometimes You Can’t Help Yourself…

And you make a mistake that is rather like watching a train wreck, which is a highly relevant analogy for this post. Last week I was headed from Shin-Yokohama to Odawara and accidentally got on the wrong train. Four hours and a 5 minute stop in Nagoya later, I was right back where I started. You see, my brain mixed up the Nozomi with the Kodama. As a result I sprinted to the platform and got on the train. About 10 seconds later, just after the doors closed, I realized my mistake (a fellow passenger even joked about my “Oh no” vocalization), and there was no turning back at that point. I had to take a seat, get some work done (fortunately I have a laptop with a long battery life), and contemplate my mistake, burning it into long term memory. Needless to say I will not make that mistake again, and I will be extra vigilant in checking which Shinkansen I’m supposed to get on. Because someone else did the same thing a couple of months ago, I was able to learn from them: in Nagoya I went straight from the train I had taken from Shin-Yokohama to a return Nozomi back to where I came from. So I was able to recover, and I made lemonade out of lemons by doing a bunch of work on the train, another lesson I learned from a colleague, Jim.

However, this was not the first mistake I’ve made, and I want to tell the story of another one from my IT career: the dreaded “rm -rf . /” on a Solaris 2.6 machine as root. Just as on the Nozomi, I realized the mistake through a horrible flash of adrenaline right after I ran the recursive remove command. Fortunately, someone at Sun (yes, I know, sOracle now) designed the file system hierarchy such that the second root level directory to be removed is /devices; I think the first was /bin. This meant that the device files for all of the raw devices holding the file systems were removed first. If you understand UNIX and UNIX-like systems you should immediately understand that, since the raw devices were removed first, the removal of anything after that was basically halted. So the “rm” exited with an error, and I still had that sinking feeling.

Fortunately we had a really smart problem solver at the time, and he devised a scheme whereby we booted off of an OS CD and recovered the /devices directory from another backup on a tape. We then booted the Solaris system, but first instructed it to rediscover all of its devices; I think that from the boot prompt you have to perform a “boot -r”. This did the trick and was a pretty solid way to recover the system (note that I skipped a few steps).

This system was really important, and we did not know just how important until the “rm” event happened. You see, it was the source code repository for a huge project and it had not been through the backup cycle in nearly a year or so. While the lemon part of this was the recovery, the lemonade part was that we got it back into the backup schedule.

Doing this kind of low level system recovery work is pretty crucial to any problem solver in the IT space. I’ve done my fair share of booting from install media and running lilo, or creating files with “echo” and “cat”, to recover a system. Hey, sometimes when “ls” is not available, “echo *” is the only thing you can use to list files in a directory.
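For readers who have never had to work without “ls”, here is a minimal sketch of the “echo *” idea, assuming a POSIX-style shell in which “printf”, “read” and “[” are builtins (the original Solaris 2.6 Bourne shell would need adjusting, since it lacks the ${var##pattern} expansion). The function names list_dir and show_file are made up for this illustration.

```shell
#!/bin/sh
# Sketch only: list a directory and print a file using shell builtins,
# for the situation where ls and cat have been removed.
# list_dir and show_file are hypothetical helper names.

list_dir() {
    dir=${1:-.}
    # "*" skips dot files, so a second pattern picks up hidden entries.
    # (Names that start with two dots are not covered by this sketch.)
    for entry in "$dir"/* "$dir"/.[!.]*; do
        # An unmatched pattern is left as literal text; skip names that do not exist.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        # Strip the leading directory part and print one name per line.
        printf '%s\n' "${entry##*/}"
    done
}

show_file() {
    # Read the file line by line; the extra test keeps a final line
    # that has no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line"
    done < "$1"
}
```

Running “list_dir /etc” or “show_file /etc/passwd” gives roughly what “ls -A” and “cat” would, with the shell doing the globbing and reading itself.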

Okay, so now that I’ve outed myself, I want to point out why I’m writing this post in the first place: design. As engineers we need to put on our “black hats” and think about what happens if the system fails. We need to think about how our users might recover from such a failure and, where possible, about putting in safeguards that prevent them from making the mistake in the first place. Hitachi takes this approach already: sometimes you find it in enhanced usability, while other times you find it in not being able to do something because it is dangerous. Safety is something that Hitachi really does get in spades, and we have to, because we build nuclear reactors and heavy earth moving equipment. Operator error in these two spaces alone has serious consequences. While I’m sure my competitors and the analysts will call this a weakness, I can tell you it is not. So Hitachi really is a place where your data can rest safely.
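To make the safeguard idea concrete, here is a small sketch of a guard around “rm” that refuses to run when any target resolves to the root directory. It is an illustration of the design principle only, not something taken from a Hitachi product; guarded_rm and is_root_path are hypothetical names, and a POSIX-style shell is assumed.

```shell
#!/bin/sh
# Sketch only: refuse a delete when any target resolves to "/".
# guarded_rm and is_root_path are hypothetical names for this illustration.

is_root_path() {
    # Only directories can resolve to the root directory.
    [ -d "$1" ] || return 1
    # Resolve to a physical path in a subshell so the caller's cwd is untouched.
    resolved=$(cd "$1" 2>/dev/null && pwd -P) || return 1
    [ "$resolved" = "/" ]
}

guarded_rm() {
    for target in "$@"; do
        case $target in
            -*) continue ;;   # skip options such as -rf
        esac
        if is_root_path "$target"; then
            printf 'guarded_rm: refusing to remove %s (resolves to /)\n' "$target" >&2
            return 1
        fi
    done
    # No target resolved to "/", so hand everything to the real rm.
    rm "$@"
}
```

With this in place, “guarded_rm -rf . /” stops at the stray “/” before anything is deleted. GNU rm ships a similar protection as --preserve-root; the point here is that the dangerous case is checked up front rather than left to the operator.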


Comments (4)


miho on 16 Mar 2010 at 8:58 pm

I laughed at first! My friend also made the same mistake as you.
And I also made a mistake with the “rm” command.
Hitachi storage should have a feature against human error, like the flash-recovery function of Oracle database ;-)

Michael Hay on 17 Mar 2010 at 8:51 pm

Nice to hear from you Miho! Yes I laugh at myself now too and it was a good lesson to learn. I know for a fact that we do put things into Hitachi storage which help prevent bad things from happening for our users. I know because I helped make sure that some of them are in our products. For example, HTSM after every migration has the array do an overwrite of the original source LUN, or if you are really concerned you can do a shred operation to make sure that all of the data is wiped. There are more things that we do, but I just wanted to supply one example.

Greg G on 29 Mar 2010 at 7:28 pm

My favorite stupid UNIX trick, which can be career limiting at the wrong time and on the wrong server:

last | reboot

(missing grep before reboot when trying to find the last time a server rebooted. Answer is: right now!!)

Sorry to hear about your Nagoya mistake. I go there all the time, and now I guess I’ll need to make sure I don’t accidentally go to Odawara, since the tracks are so close. I’ve never thought about it :-)

Michael Hay on 30 Mar 2010 at 1:40 am

Well fortunately in both cases I could easily recover which is a good thing. In other conditions at least the dreaded “rm” would have landed me a pink piece of paper which is not something you want to get.
