William Vambenepe's blog

IT management in a changing IT world

kid rock ringtoneshalloween theme ringtonemp3 ringtoneaxe ringtone

Archive for the 'Amazon' Category

30
Jul
2008

Grid cloudification

by William Vambenepe

Grid computing is moulting and, to no surprise, the new skin has “cloud” written all over it.

That’s one way to interpret the announcement today that HP, Intel and Yahoo are going to launch a compute cloud. Seeing Intel and HP work together on this is no surprise. Back at HP I had some involvement with the collaboration between HP Labs and Intel on PlanetLab.

I have only read the Gigaom article and Steve’s, so this post is not an analysis of the announcement. Just a few questions that come to mind. They can be most concisely expressed by trying to understand the difference with Amazon’s EC2. The quotes below all come from the Gigaom article.

“six physical locations” -> Amazon has availability zones, including the choice of three geographies.

“between 1,000 and 4,000 mostly Intel cores” -> According to this well-publicized story, Amazon can deliver 5,000 servers (each linked to at least one physical core) to one customer without breaking a sweat.

“We want, unlike other partnerships including Google and IBM’s where the lower-level stacks are not provided in a open manner to the world, open access to all levels of the hardware” -> The quote seems to conveniently avoid comparison with EC2 which provides a much lower abstraction level: virtual machines with mountable raw block storage devices. How much lower can you go without handing out access cards to physically walk into the datacenter? Access to the BMC on the motherboard? Access to some internal bus? Remote-controlled little robots that will slide cards in and out of a chassis?

“researchers will be able to access the cloud through a proposal process later this year” -> Ec2 offers pay-as-you go, which tends to be a good driver for people to use the infrastructure efficiently. And of course someone can always give researchers a grant in the form of EC2 rent money.

Just to be clear, I am not belittling the announcement because for one thing I haven’t read much about it and for another I probably know many of the HP Labs people involved and they are part of the “mucho sapiens” branch of “homo sapiens”. I know they wouldn’t bother putting this out if it was nothing more than giving researchers some free EC2 time.

But these are the questions I’ll be trying to answer for myself as I read more about this project.

21
Jul
2008

Animoto is no infrastructure flexibility benchmark

by William Vambenepe

I have nothing against Animoto. From what I know about them (mostly from John’s podcast with Brad Jefferson) they built their system, using EC2, in a very smart way.

But I do have something against their story being used to set the benchmark for infrastructure flexibility. For those who haven’t heard it five times already, the summary of “their story” is ramping up from 50 to 5000 machines in a week (according to the podcast). Or from 50 to 3500 (according to the this AWS blog entry). Whatever. If I auto-generate my load (which is mostly what they did when they decided to auto-create a custom video for each new user) I too can create the need for a thousands of machines.

This was probably a good business decision for Animoto. They got plenty of visibility at a low cost. Plus the extra publicity from being an EC2 success story (I for one would never have heard of them through their other channels). Good for them. Good for Amazon who made it possible. And who got a poster child out of it. Good for the facebookers who got to waste another 30 seconds of their time straining their eyes. Everyone is happy, no animal got hurt in the process, hurray.

That’s all good but it doesn’t mean that from now on any utility computing solution needs to support ramping up by a factor of 100 in a week. What if Animoto had been STD’ed (slashdoted, technoratied and dugg) at the same time as the Facebook burst, resulting in the need for 50,000 servers? Would 1,000 X be the new benchmark? What if a few of the sites that target the “lonely guy” demographic decided to use Animoto for… ok let’s not got there.

There are three types of user requirements. The Animoto use case is clearly not in the first category but I am not convinced it’s in the third one either.

  1. The “pulled out of thin air” requirements that someone makes up on the fly to justify a feature that they’ve already decided needs to be there. Most frequently encountered in standards working groups.
  2. The “it happened” requirements that assumes that because something happened sometimes somewhere it needs to be supported all the time everywhere.
  3. The “it makes business sense” requirements that include a cost-value analysis. The kind that comes not from asking “would you like this” to a customer but rather “how much more would you pay for this” or “what other feature would you trade for this”.

When cloud computing succeeds (i.e. when you stop hearing about it all the time and, hopefully, we go back to calling it “utility computing”), it will be because the third category of requirements will have been identified and met. Best exemplified by the attitude of Tarus (from OpenNMS) in the latest Redmonk podcast (paraphrased): sure we’ll customize OpenNMS for cloud environments; as soon as someone pays us to do it.

30
Jun
2008

Moving towards utility/cloud computing standards?

by William Vambenepe

This Forbes article (via John) channels 3Tera’s Bert Armijo’s call for standardization of utility computing. He calls it “Open Cloud” and it would “allow a company’s IT systems to be shared between different cloud computing services and moved freely between them“. Bert talks a bit more about it on his blog and, while he doesn’t reference the Forbes interview (too modest?), he points to Cloudscape as the vision.

A few early thoughts on all this:

  • No offense to Forbes but I wouldn’t read too much into the article. Being Forbes, they get quotes from a list of well-known people/companies (Google and Amazon spokespeople, Forrester analyst, Nick Carr). But these quotes all address the generic idea of utility computing standards, not the specifics of Bert’s project.
  • Saying that “several small cloud-computing firms including Elastra and Rightscale are already on board with 3Tera’s standards group” is ambiguous. Are they on-board with specific goals and a candidate specification? Or are they on board with the general idea that it might be time to talk about some kind of standard in the general area of utility computing?
  • IEEE and W3C are listed as possible hosts for the effort, but they don’t seem like a very good match for this area. I would have thought of DMTF, OASIS or even OGF first. On the face of it, DMTF might be the best place but I fear that companies like 3Tera, Rightscale and Elastra would be eaten alive by the board member companies there. It would be almost impossible for them to drive their vision to completion, unlike what they can do in an OASIS working group.
  • A new consortium might be an option, but a risky and expensive one. I have sometimes wondered (after seeing sad episodes of well-meaning and capable start-ups being ripped apart by entrenched large vendors in standards groups) why VCs don’t play a more active role in standards. Standards sound like the kind of thing VCs should be helping their companies with. VC firms are pretty used to working together, jointly investing in companies. Creating a new standard consortium might be too hard for 3Tera, but if the VCs behind 3Tera, Elastra and Rightscale got together and looked at the utility computing companies in their portfolios, it might make sense to join forces on some well-scoped standardization effort that may not otherwise be given a chance in existing groups.
  • I hope Bert will look into the history of DCML, a similar effort (it was about data center automation, which utility computing is not that far from once you peel away the glossy pictures) spearheaded by a few best-of-bread companies but ignored by the big boys. It didn’t really take off. If it had, utility computing standards might now be built as an update/extension of that specification. Of course DCML started as a new consortium and ended as an OASIS “member section” (a glorified working group), so this puts a grain of salt on my “create a new consortium and/or OASIS group” suggestion above.
  • The effort can’t afford to be disconnected from other standards in the virtualization and IT management domains. How does the effort relate to OVF? To WS-Management? To existing modeling frameworks? That’s the main draw towards DMTF as a host.
  • What’s the open source side of this effort? As John mentions during the latest Redmonk/Willis IT management podcast (starting around minute 24), there needs to a open source side to this. Actually, John thinks all you need is the open source side. Coté brings up Eucalyptus. BTW, if you want an existing combination of standards and open source, have a look at CDDLM (standard) and SmartFrog (implementation, now with EC2/S3 deployment)
  • There seems to be some solid technical raw material to start from. 3Tera’s ADL, combined with Elastra’s ECML/EDML, presumably captures a fair amount of field expertise already. But when you think of them as a starting point to standardization, the mindset needs to switch from “what does my product need to work” to “what will the market adopt that also helps my product to work”.
  • One big question (at least from my perspective) is that of the line between infrastructure and applications. Call me biased, but I think this effort should focus on the infrastructure layer. And provide hooks to allow application-level automation to drive it.
  • The other question is with regards to the management aspect of the resulting system and the role management plays in whatever standard specification comes out of Bert’s effort.

Bottom line: I applaud Bert’s efforts but I couldn’t sleep well tonight if I didn’t also warn him that “there be dragons”.

And for those who haven’t seen it yet, here is a very good document on the topic (but it is focused on big vendors, not on how smaller companies can play the standards game).

[UPDATED 2008/6/30: A couple hours after posting this, I see that Coté has just published a blog post that elaborates on his view of cloud standards. As an addition to the podcast I mentioned earlier.]

[UPDATED 2008/7/2: If you read this in your feed viewer (rather than directly on vambenepe.com) and you don't see the comments, you should go have a look. There are many clarifications and some additional insight from the best authorities on the topic. Thanks a lot to all the commenters.]

31
May
2008

Google App Engine: less is more

by William Vambenepe

“If you have a stove, a saucepan and a bottle of cold water, how can you make boiling water?”

If you ask this question to a mathematician, they’ll think about it a while, and finally tell you to pour the water in the saucepan, light up the stove and put the saucepan on it until the water boils. Makes sense. Then ask them a slightly different question: “if you have a stove and a saucepan filled with cold water, how can you make boiling water?”. They’ll look at you and ask “can I also have a bottle”? If you agree to that request they’ll triumphantly announce: “pour the water from the saucepan into the bottle and we are back to the previous problem, which is already solved.”

In addition to making fun of mathematicians, this is a good illustration of the “fake machine” approach to utility computing embodied by Amazon’s EC2. There is plenty of practical value in emulating physical machines (either in your data center, using VMWare/Xen/OVM or at a utility provider’s site, e.g. EC2). They are all rooted in the fact that there is a huge amount of code written with the assumption that it is running on an identified physical machine (or set of machines), and you want to keep using that code. This will remain true for many many years to come, but is it the future of utility computing?

Google’s App Engine is a clear break from this set of assumptions. From this perspective, the App Engine is more interesting for what it doesn’t provide than for what it provides. As the description of the Sandbox explains:

“An App Engine application runs on many web servers simultaneously. Any web request can go to any web server, and multiple requests from the same user may be handled by different web servers. Distribution across multiple web servers is how App Engine ensures your application stays available while serving many simultaneous users [not to mention that this is also how they keep their costs low -- William]. To allow App Engine to distribute your application in this way, the application runs in a restricted ’sandbox’ environment.”

The page then goes on to succinctly list the limitations of the sandbox (no filesystem, limited networking, no threads, no long-lived requests, no low-level OS functions). The limitations are better described and commented upon here but even that article misses one major limitation, mentioned here: the lack of scheduler/cron.

Rather than a feature-by-feature comparison between the App Engine and EC2 (which Amazon would won handily at this point), what is interesting is to compare the underlying philosophies. Even with Amazon EC2, you don’t get every single feature your local hardware can deliver. For example, in its initial release EC2 didn’t offer a filesystem, only a storage-as-a-service interface (S3 and then SimpleDB). But Amazon worked hard to fix this as quickly as possible in order to be appear as similar to a physical infrastructure as possible. In this entry, announcing persistent storage for EC2, Amazon’s CTO takes pain to highlight this achievement:

“Persistent storage for Amazon EC2 will be offered in the form of storage volumes which you can mount into your EC2 instance as a raw block storage device. It basically looks like an unformatted hard disk. Once you have the volume mounted for the first time you can format it with any file system you want or if you have advanced applications such as high-end database engines, you could use it directly.”

and

“And the great thing is it that it is all done with using standard technologies such that you can use this with any kind of application, middleware or any infrastructure software, whether it is legacy or brand new.”

Amazon works hard to hide (from the application code) the fact that the infrastructure is a huge, shared, distributed system. The beauty (and business value) of their offering is that while the legacy code thinks it is running in a good old data center, the paying customer derives benefits from the fact that this is not the case (e.g. fast/easy/cheap provisioning and reduced management responsibilities).

Google, on the other hand, embraces the change in underlying infrastructure and requires your code to use new abstractions that are optimized for that infrastructure.

To use an automotive analogy, Amazon is offering car drivers to switch to a gas/electric hybrid that refuels in today’s gas stations while Google is pushing for a direct jump to hydrogen fuel cells.

History is rarely kind to promoters of radical departures. The software industry is especially fond of layering the new on top of the old (a practice that has been enabled by the constant increase in underlying computing capacity). If you are wondering why your command prompt, shell terminal or text editor opens with a default width of 80 characters, take a trip back to 1928, when IBM defined its 80-columns punch card format. Will Google beat the odds or be forced to be more accommodating of existing code?

It’s not the idea of moving to a more abstracted development framework that worries me about Google’s offering (JEE, Spring and Ruby on Rails show that developers want this move anyway, for productivity reasons, even if there is no change in the underlying infrastructure to further motivate it). It’s the fact that by defining their offering at the level of this framework (as opposed to one level below, like Amazon), Google puts itself in the position of having to select the right framework. Sure, they can support more than one. But the speed of evolution in that area of the software industry shows that it’s not mature enough (yet?) for any party to guess where application frameworks are going. Community experimentation has been driving application frameworks, and Google App Engine can’t support this. It can only select and freeze a few framework.

Time will tell which approach works best, whether they should exist side by side or whether they slowly merge into a “best of both worlds” offering (Amazon already offers many features, like snapshots, that aim for this “best of both worlds”). Unmanaged code (e.g. C/C++ compiled programs) and managed code (JVM or CLR) have been coexisting for a while now. Traditional applications and utility-enabled applications may do so in the future. For all I know, Google may decide that it makes business sense for them too to offer a Xen-based solution like EC2 and Amazon may decide to offer a more abstracted utility computing environment along the lines of the App Engine. But at this point, I am glad that the leaders in utility computing have taken different paths as this will allow the whole industry to experiment and progress more quickly.

The comparison is somewhat blurred by the fact that the Google offering has not reached the same maturity level as Amazon’s. It has restrictions that are not directly related to the requirements of the underlying infrastructure. For example, I don’t see how the distributed infrastructure prevents the existence of a scheduling service for background jobs. I expect this to be fixed soon. Also, Amazon has a full commercial offering, with a price list and an ecosystem of tools, why Google only offers a very limited beta environment for which you can’t buy extra capacity (but this too is changing).