Seen in today’s issue of the RISKS digest:
In the process of upgrading its storage management, PlusNet deleted more than 700GB of its customers’ e-mail and disabled the ability of about half its 140,000 users to send and receive new e-mail. “At the time of making this change the engineer had two management console sessions open one to the backup storage system and one to live storage. These both have the same interface, and until [then] it was impossible to open more than one connection to any part of the storage system at once.” Patches were installed, but the engineer assumed he was working with the backup rather than the live server. Thus, “the command to reconfigure the disk pack and remove all data therein was made to the wrong server.”
It’s for things like this that the RISKS digest should be required reading for software professionals, especially in enterprise software. Tools make it easier to do useful things, but they also make it easier to do very stupid things. Additional automation (that we are working on right now) can help prevent these problems. But automation has corner cases of its own, and those may open the door to even bigger failures.
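As a rough illustration of the kind of guard that automation can add, here is a minimal Python sketch. Everything in it is hypothetical and made up for this example: the `Session` type, the host names, and the `wipe_disk_pack` function have nothing to do with the actual storage product involved. The idea is simply that a destructive command refuses to run unless the session is tagged with the role the operator expects and the operator retypes the name of the host about to be wiped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """A hypothetical management-console session (illustration only)."""
    host: str  # made-up name, e.g. "store-live-01"
    role: str  # "live" or "backup"


class RefusedError(Exception):
    """Raised when a destructive command is not allowed to proceed."""


def guard_destructive(session: Session, expected_role: str, typed_host: str) -> None:
    """Raise RefusedError unless the operator has shown which system they are on."""
    # Check 1: the session must be attached to the kind of storage the operator expects.
    if session.role != expected_role:
        raise RefusedError(
            f"session is attached to {session.role!r} storage, expected {expected_role!r}"
        )
    # Check 2: the operator must retype the exact host name of this session.
    if typed_host != session.host:
        raise RefusedError(
            f"typed {typed_host!r}, but this session is {session.host!r}"
        )


def wipe_disk_pack(session: Session, expected_role: str, typed_host: str) -> str:
    """Stand-in for a destructive command; it only returns a message here."""
    guard_destructive(session, expected_role, typed_host)
    return f"wiped disk pack on {session.host}"
```

With two identical-looking consoles open, running the command in the wrong one fails at the first check, because that session is tagged `live` while the operator asked for `backup`. The corner case is easy to see too: the guard is only as good as the `role` label, so a mislabeled session would pass both checks and the added confidence would make the mistake worse.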
[UPDATE: turns out it was Sun.]