Sorry, no server for you today
by William (@vambenepe on Twitter)
Imagine that you are leasing a new car. Of course you plan to stay current on your lease payments. When you take delivery of the car, it comes with a loaded gun mounted on the dashboard and pointed at the driver’s head. The sales guy assures you that the gun has been programmed to only discharge if you fall behind in your payments. As long as you keep paying, what could go wrong, he asks?
Ask this poor VMware customer (whose virtual machines suddenly refused to power up) what could go wrong. According to a company spokesman, “an issue has been uncovered with ESX 3.5 Update 2 and ESXi 3.5 that causes the product license to expire on August 12”.
Why does anyone agree to use mission-critical infrastructure software that has such a kill switch? Enough things can go wrong with complex software that we don’t need to engineer additional causes of failure.
[UPDATED 2008/8/15: A less dramatic but related example: a Microsoft employee had his Win Server 2008 release candidate license expire on him. Sure, it's an RC, so you shouldn't have production-quality expectations of it, but that means that the "kill switch" code is there. Even if you plan to free the final release from this constraint, the fact that the code was there at one point means that things can go wrong. This is what happened with VMware, BTW: "the problem is caused by a build timeout that was mistakenly left enabled for the release build".]
[UPDATED 2008/9/2: A more thorough analysis of the importance of asking "why is this (license enforcement) in the code in the first place" rather than "how did this bug slip through".]
August 14th, 2008 at 10:25 am
Been there, done that. What a nice surprise when machines would fail with a “General System Error” message. Luckily for us, these VMs were used for development, not production. My whole team rants about VMware ESX, its quirkiness, the insistence with which it begs for vmware-tools to be deployed… The basic VM hypervisor is a commodity, pretty much the way OSes have gone, so I would not be surprised if VMware focussed solely on managing VMs, regardless of their provenance.
Nice blog by the way.
August 18th, 2008 at 1:24 am
I think the “leasing car & gun” situation is not new… conventional operating systems could exhibit the same problematic behaviour. I mean, even if you are not using VMware ESX or any other virtualization technology, it could happen that your old-fashioned operating system refuses to boot due to some obscure bug that only its vendor knows about (theoretically). I didn’t dig in, but I think that several examples could be found.
The fact is that whenever your services/application/solution/etc. rely on someone else’s piece of software (OS or hypervisor), you *need* to trust its vendor. So, what VMware and others have to care about is the perceived reliability of their virtualization products (and, of course, this kind of news damages their image). Moreover, some customers may prefer open software solutions (the Xen hypervisor being the most relevant case nowadays, I guess), by the old argument that the more eyes you keep on the code (so, less “obscurity”), the fewer bugs the code will have and, therefore, the more reliable customers perceive those products to be.
August 18th, 2008 at 9:20 am
Fermin: Sure, there are always bugs. But most of them are part of the implementation of a feature that is (presumably) of some value to the user. License enforcement is a path of failure (designed to be a single point of failure) that is specifically added to the system and yet doesn’t provide any customer benefit. It just seems to be in a different category.