Category Archives: Cloud Computing

The battle of the Cloud Frameworks: Application Servers redux?

The battle of the Cloud Frameworks has started, and it will look a lot like the battle of the Application Servers which played out over the last decade and a half. Cloud Frameworks (which manage IT automation and runtime outsourcing) are to the Programmable Datacenter what Application Servers are to the individual IT server. In the longer term, these battlefronts may merge, but for now we’ve been transported back in time, to the early days of Web programming. The underlying dynamic is the same. It starts with a disruptive IT event (part new technology, part new mindset). 15 years ago the disruptive event was the Web. Today it’s Cloud Computing.

Stage 1

It always starts with very simple use cases. For the Web, in the mid-nineties, the basic use case was “how do I return HTML that is generated by a script as opposed to a static file”. For Cloud Computing today, it is “how do I programmatically create, launch and stop servers as opposed to having to physically install them”.

In that sense, the IaaS APIs of today are the equivalent of the Common Gateway Interface (CGI) circa 1993/1994. Like the EC2 API and its brethren, CGI was not optimized, not polished, but it met the basic use cases and allowed many developers to write their first Web apps (which we just called “CGI scripts” at the time).
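To make that Stage-1 use case concrete, here is a minimal sketch of “create, inspect, stop” against EC2 using the boto3 Python SDK (a later-generation AWS library, used purely for illustration); the region, AMI id and instance type are placeholders, not recommendations.

```python
# Minimal sketch of the Stage-1 IaaS use case: programmatically create,
# inspect and stop a server. Region, AMI id and instance type are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# "Create and launch a server" -- the Cloud equivalent of the first CGI script
reservation = ec2.run_instances(
    ImageId="ami-12345678",   # placeholder AMI
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
)
instance_id = reservation["Instances"][0]["InstanceId"]

# Check its state...
state = ec2.describe_instances(InstanceIds=[instance_id])
print(state["Reservations"][0]["Instances"][0]["State"]["Name"])

# ...and stop paying for it when done.
ec2.terminate_instances(InstanceIds=[instance_id])
```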

Stage 2

But the limitations soon became apparent. In the CGI case, it had to do with performance (the cost of the “one process per request” approach). Plus, the business potential was becoming clearer and attracted a different breed of contenders than just academic and research institutions. So we got NSAPI, ISAPI, FastCGI, Apache Modules, JServ, ZDAC…

We haven’t reached that stage for Cloud yet. That will be when the IaaS APIs start to support events, enumerations, queries, federated identity etc…

Stage 3

Stage 2 looked like the real deal, when we were in it, but little did we know that we were still just nibbling on the hors d’oeuvres. And it was short-lived. People quickly decided that they wanted more than a way to handle HTTP requests. If the Web was going to be central to most programs, then all aspects of programming had to fit well in the context of the Web. We didn’t want Web servers anymore, we wanted application servers (re-purposing a term that had been used for client-server). It needed more features, covering data access, encapsulation, UI frameworks, integration, sessions. It also needed to meet non-functional requirements: availability, scalability (hello clustering), management, identity…

That turned into the battle between the various Java application servers as well as between Java and Microsoft (with .Net coming along), along with other technology stacks. That’s where things got really interesting too, because we explored different ways to attack the problem. People could still program at the HTTP request level. They could use MVC frameworks, ColdFusion/ASP/JSP/PHP-style markup-driven applications, or portals and other higher-level modular authoring frameworks. They got access to adapters, message buses, process flows and other asynchronous mechanisms. It became clear that there was not just one way to write Web applications. And the discovery is still going on, as illustrated by the later emergence of Ruby on Rails and similar frameworks.

Stage 4

Stage 3 is not over for Web applications, but stage 4 is already there, as illustrated by the fact that some of the gurus of stage 3 have jumped to stage 4. It’s when the Web is everywhere. Clients are everywhere and so are servers for that matter. The distinction blurs. We’re just starting to figure out the applications that will define this stage, and the frameworks that will best support them. The game is far from over.

So what does it mean for Cloud Frameworks?

If, like me, you think that the development of Cloud Frameworks will follow a path similar to that of Application Servers, then the quick retrospective above can be used as an (imperfect) crystal ball. I don’t pretend to be a Middleware historian or that these four stages are the most accurate representation, but I think they are a reasonable perspective. And they hold some lessons for Cloud Frameworks.

It’s early

We are at stage 1. I’ll admit that my decision to separate stages 1 and 2 is debatable and mainly serves to illustrate how early in the process we are with Cloud frameworks. Current IaaS APIs (and the toolkits that support them) are the equivalent of CGI (and the early httpd), something that’s still around (Google App Engine emulates CGI in its Python incarnation) but that almost no-one programs to directly anymore. It’s raw, it’s clunky, it’s primitive. But it was a needed starting point that launched the whole field of Web development. Just as IaaS APIs like EC2 have launched the field of Cloud Computing.

Cloud Frameworks will need to go through the equivalent of all the other stages. First, the IaaS APIs will get more optimized and capable (stage 2). Then, at stage 3, we will focus on higher-level, more productive abstraction layers (generally referred to as PaaS) at which point we should expect a thousand different approaches to bloom, and several of them to survive. I will not hazard a guess as to what stage 4 will look like (here is my guess for stage 3, in two parts).

No need to rush standards

One benefit of this retrospective is to highlight the tragedy of Cloud standards compared to Web development standards. Wouldn’t we be better off today if the development leads of AWS and a couple of other Cloud providers had been openly cooperating in a Cloud equivalent of the www-talk mailing list of yore? Out of it came a rough agreement on HTML and CGI that allowed developers to write basic Web applications in a reasonably portable way. If the same informal collaboration had taken place for IaaS APIs, we’d have a simple de-facto consensus that would support the low-hanging fruits of basic IaaS. It would allow Cloud developers to support the simplest use cases, and relieve the self-defeating pressure to standardize too early. Standards played a huge role in the development of Application Servers (especially of the Java kind), but that really took place as part of stage 3. In the absence of an equivalent to CGI in the Cloud world, we are at risk of rushing the standardization without the benefit of the experimentation and lessons that come in stage 3.

I am not trying to sugar-coat the history of Web standards. The HTML saga is nothing to be inspired by. But there was an original effort to build consensus that wasn’t even attempted with Cloud APIs. I like the staged process of a rough consensus that covers the basic use cases, followed by experimentation and proprietary specifications and later a more formal standardization effort. If we skip the rough consensus stage, as we did with Cloud, we end up rushing to the final step (to the tune of “customers demand Cloud standards”) even though all we need for now is an interoperable way to meet the basic use cases.

Winners and losers

Whoever you think of as the current leaders of the Application Server battle (hint: I work for one), they were not the obvious leaders of stages 1 and 2. So don’t be in too much of a hurry to crown the Cloud Framework kings. Those you think of today may turn out to be the Netscapes of that battle.

New roles

It’s not just new technology. The development of Cloud Frameworks will shape the roles of the people involved. We used to have designers who thought their job was done when they produced a picture or at best a FrameMaker or QuarkXPress document, which is what they were used to. We had “webmasters” who thought they were set for life with their new Apache skills, then quickly had to evolve or make way, a lesson for IaaS gurus of today. Under terms like “DevOps” new roles are created and existing roles are transformed. Nobody yet knows what will stick. But if I was an EC2 guru today I’d make sure to not get stuck providing just that. Don’t wait for other domains of Cloud expertise to be in higher demand than your current IaaS area, as by then you’ll be too late.

It’s the stack

There aren’t many companies out there making a living selling a stand-alone Web server. Even Zeus, who has a nice one, seems to be downplaying it on its site compared to its application delivery products. The combined pressure of commoditization (hello Apache) and of the demand for a full stack has made it pretty hard to sell just a Web server.

Similarly, it’s going to be hard to stay in business selling just portions of a Cloud Framework. For example, just provisioning, just monitoring, just IaaS-level features, etc. That’s well-understood and it’s fueling a lot of the acquisitions (e.g. VMWare’s purchase of SpringSource which in turn recently purchased RabbitMQ) and partnerships (e.g. recently between Eucalyptus and GroundWork though rarely do such “partnerships” rise to the level of integration of a real framework).

It’s not even clear what the right scope for a Cloud Framework is. What makes a full stack and what is beyond it? Is it just software to manage a private Cloud environment and/or deployments into public Clouds? Does the framework also include the actual public Cloud service? Does it include hardware in some sort of “private Cloud in a box”, of the kind that this recent Dell/Ubuntu announcement seems to be inching towards?

Integration

If indeed we can go by the history of Application Servers to predict the future of Cloud Frameworks, then we’ll have a few stacks (with different levels of completeness, standardized or proprietary). This is what happened for Web development (the JEE stack, the .Net stack, a more loosely-defined alternative stack which is mostly open-source, niche stacks like the backend offered by Adobe for Flash apps, etc) and at some point the effort moved from focusing on standardizing the different application environment technology alternatives (e.g. J2EE) towards standardizing how the different platforms can interoperate (e.g. WS-*). I expect the same thing for Cloud Frameworks, especially as they grow out of stages 1 and 2 and embrace what we call today PaaS. At which point the two battlefields (Application Servers and Cloud Frameworks) will merge. And when this happens, I just can’t picture how one stack/framework will suffice for all. So we’ll have to define meaningful integration between them and make them work together.

If you’re a spectator, grab plenty of popcorn. If you’re a soldier in this battle, get ready for a long campaign.

7 Comments

Filed under Application Mgmt, Automation, Big picture, Cloud Computing, Everything, Mgmt integration, Middleware, Standards, Utility computing

Steve Ballmer gets Cloud

Steve Ballmer wants devops

Devops? What’s devops? See these articles:

3 Comments

Filed under Cloud Computing, DevOps, Everything, IT Systems Mgmt, Microsoft, People

Smoothing a discrete world

For the short term (until we sell one) there are three cars in my household. A manual transmission, an automatic and a CVT (continuously variable transmission). This makes me uniquely qualified to write about Cloud Computing.

That’s because Cloud Computing is yet another area in which the manual/automatic transmission analogy can be put to good use. We can even stretch it to a 4-layer analogy (now that’s elasticity):

Manual transmission

That’s traditional IT. Scaling up or down is done manually, by a skilled operator. It’s usually not rocket science but it takes practice to do it well. At least if you want it to be reliable, smooth and mostly unnoticed by the passengers.

Manumatic transmission (a.k.a. Tiptronic)

The driver still decides when to shift up or down, but only gives the command. The actual process of shifting is automated. This is how many Cloud-hosted applications work. The scale-up/down action is automated but still contingent on being triggered by an administrator. Which is what most IaaS-deployed apps should probably aspire to at this point in time despite the glossy brochures about everything being entirely automated.

Automatic transmission

That’s when the scale up/down process is not just automated in its execution but also triggered automatically, based on some metrics (e.g. load, response time) and some policies. The scenario described in the aforementioned glossy brochures.
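A minimal sketch of what that “automatic transmission” amounts to: a policy loop that watches a metric and adjusts capacity without a human in the loop. The metric source and scaling call below are stubbed placeholders, not any particular provider’s API.

```python
# Sketch of the "automatic transmission" case: the scale up/down decision is
# taken by a policy loop rather than an administrator. The metric source and
# scaling call are stubbed out; a real system would read a monitoring service
# and call the provider's API.
import time

def average_load() -> float:
    """Placeholder: return the current load metric (e.g. CPU %, response time)."""
    return 42.0

def set_capacity(n: int) -> None:
    """Placeholder: ask the IaaS/PaaS layer for n instances."""
    print(f"requesting capacity of {n} instances")

SCALE_UP_AT, SCALE_DOWN_AT = 75.0, 25.0   # policy thresholds
capacity = 2

def tick() -> None:
    global capacity
    load = average_load()
    if load > SCALE_UP_AT:
        capacity += 1          # downshift: more torque
        set_capacity(capacity)
    elif load < SCALE_DOWN_AT and capacity > 1:
        capacity -= 1          # upshift: cruise with fewer instances
        set_capacity(capacity)

if __name__ == "__main__":
    for _ in range(3):         # in reality this loop would run forever
        tick()
        time.sleep(1)
```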

Continuously variable transmission

That’s when the notion of discrete gears goes away. You don’t think in terms of what gear you’re in but how much torque you want. On the IT side, you’re in PaaS territory. You don’t measure the number of servers, but rather a continuously variable application capacity metric. At least in theory (most PaaS implementations often betray the underlying work, e.g. via a spike in application response time when the app is not-so-transparently deployed to a new node).

More?

OK, that’s the analogy. There are many more of the same kind. Would you like to hear how hybrid Cloud deployments (private+public) are like hybrid cars (gas+electric)? How virtualization is like carpooling (including how you can also be inconvenienced by the BO of a co-hosted VM)? Do you want to know why painting flames on the side of your servers doesn’t make them go faster?

Driving and IT management have a lot in common, including bringing out the foul-mouth in us when things go wrong.

So, anyone wants to buy a manual VW Golf Turbo? Low mileage. Cloud-checked.

4 Comments

Filed under Application Mgmt, Automation, Cloud Computing, Everything, IT Systems Mgmt, Utility computing

“Freeing SaaS from Cloud”: slides and notes from Cloud Connect keynote

I got invited to give a short keynote presentation during the Cloud Connect conference this week at the Santa Clara Convention Center (thanks Shlomo and Alistair). Here are the slides (as PPT and PDF). They are visual support for my bad jokes rather than a medium for the actual message. So here is an annotated version.

I used this first slide (a compilation of representations of the 3-layer Cloud stack) to poke some fun at this ubiquitous model of the Cloud architecture. Like all models, it’s neither true nor false. It’s just more or less useful to tackle a given task. While this 3-layer stack can be relevant in the context of discussing economic aspects of Cloud Computing (e.g. Opex vs. Capex in an on-demand world), it is useless and even misleading in the context of some more actionable topics for SaaS: chiefly, how you deliver such services, how you consume them and how you manage them.

In those contexts, you shouldn’t let yourself get too distracted by the “aaS” aspect of SaaS and focus on what it really is.

Which is… a web application (by which I include both HTML access for humans and programmatic access via APIs). To illustrate this point, I summarized the content of this blog entry. No need to repeat it here. The bottom line is that any distinction between SaaS and POWA (Plain Old Web Applications) is at worst arbitrary and at best concerned with the business relationship between the provider and the consumer rather than technical aspects of the application.

Which means that for most technical aspect of how SaaS is delivered, consumed and managed, what you should care about is that you are dealing with a Web application, not a Cloud service. To illustrate this, I put up the…

… guillotine slide. Which is probably the only thing people will remember from the presentation, based on the ample feedback I got about it. It probably didn’t hurt that I also made fun of my country of origin (you can never go wrong making fun of France), saying that the guillotine was our preferred way of solving any problem and also the last reliable piece of technology invented in France (no customer has ever come back to complain). Plus, enough people in the audience seemed to share my lassitude with the 3-layer Cloud stack to find its beheading cathartic.

Come to think of it, there are more similarities. The guillotine is to the axe what Cloud Computing is to traditional IT. So I may use it again in Cloud presentations.

Of course this beheading is a bit excessive. There are some aspects for which the IaaS/PaaS/SaaS continuum makes sense, e.g. around security and compliance, in areas related to multi-tenancy, the delegation of control to a third party, etc. To the extent that these concerns can be addressed consistently across the Cloud stack, they should be.

But focusing on these “Cloud” aspects of SaaS is missing the forest for the trees.

A large part of the Cloud value proposition is increased flexibility. At the infrastructure level, being able to provision a server in minutes rather than days or weeks, being able to turn one off and instantly stop paying for it, are huge gains in flexibility. It doesn’t work quite that way at the application level. You rarely have 500 new employees joining overnight who need to have their email and CRM accounts provisioned. This is not to minimize the difficulties of deploying and scaling individual applications (any improvement is welcome on this). But those difficulties are not what is crippling the ability of IT to respond to business needs.

Rather, at the application level, the true measure of flexibility is the control you maintain on your business processes and their orchestration across applications. How empowered (or scared) you are to change them (either because you want to, e.g. entering a new business, or because you have to, e.g. a new law). How well your enterprise architecture has been defined and implemented. How much visibility you have into the transactions going through your business applications.

It’s about dealing with composite applications, whether their components are on-premise or “in the Cloud”. Even applications like Salesforce.com see a large number of invocations come through their APIs rather than their HTML front-end. Which means that there are some business applications calling them (either other SaaS, custom applications or packaged applications with an integration to Salesforce). Which means that the actual business transactions go across a composite system and have to be managed as such, independently of the deployment model of each participating application.

[Side note: One joke that fell completely flat was that it was unlikely that the invocations of Salesforce through the Web services APIs were the work of salespeople telnetting to port 80 and typing HTTP and SOAP headers. Maybe I spoke too quickly (as I often do), or the audience was less nerdy than I expected (though I recognized some high-ranking members of the nerd aristocracy). Or maybe they all got it but didn’t laugh because I forgot to take encryption into account?]

At this point I launched into a very short summary of the benefits of SOA governance/management, real user experience monitoring, BTM and application-centric IT management in general. Which is very succinctly summarized on the slide by the “SOA” label on the receiving bucket. I would have needed another 10 minutes to do this subject justice. Maybe at Cloud Connect 2011? Alistair?

This picture of me giving the presentation at Cloud Connect is the work of Alex Dunne.

The guillotine picture is the work of Rusty Boxcars who didn’t just take the photo but apparently built the model (yes it’s a small-size model, look closely). Here it is in its original, unedited, glory. My edited version is available under the same CC license to anyone who wants to grab it.

14 Comments

Filed under Application Mgmt, Business Process, Cloud Computing, Conference, Everything, Governance, Mgmt integration, SaaS, Trade show, Utility computing

Standards Disconnect at Cloud Connect

Yesterday’s panel session on the future of Cloud standards at Cloud Connect is still resonating on Twitter tonight. Many were shocked by how acrimonious the debate turned. It didn’t have to be that way but I am not surprised that it was.

The debate was set up and moderated by Bob Marcus (ET-Strategies CTO and master standards coordinator). On stage were Krishna Sankar (Cisco and DMTF Cloud incubator), Archie Reed (HP and CSA), Winston Bumpus (VMWare and DMTF), a gentleman whose name I unfortunately forgot (and who isn’t listed on the program) and me.

If the goal was to glamorize Cloud standards, it was a complete failure. If the goal was to come out with some solutions and agreements, it was also a failure. But if the goal, as I believe, was to surface the current issues, complexities, emotions and misunderstandings surrounding Cloud standards, then I’d say it was a success.

I am not going to attempt to summarize the whole discussion. Charles Babcock, who was in the audience, does a good enough job in this InformationWeek article and, unlike me, he doesn’t have a horse in the race [side note: I am not sure why my country of origin is relevant to his article, but my guess is that this is the main thing he remembered from my presentation during the Cloud Connect keynote earlier that morning, thanks to the “guillotine” slide].

Instead of reporting on what happened during the standards discussion, I’ll just make one comment and provide one take-away.

The comment: the dangers of marketing standards

Early in the session, audience member Reuven Cohen complained that standards organizations don’t do enough to market their specifications. Winston was more than happy to address this and talk about all the marketing work that DMTF does, including trade shows and PR. He added that this is one of the reasons why DMTF needs to charge membership fees, to pay for this marketing. I agree with Winston at one level. Indeed, the DMTF does what he describes and puts a fair amount of efforts into marketing itself and its work. But I disagree with Reuven and Winston that this is a good thing.

First it doesn’t really help. I don’t think that distributing pens and tee-shirts to IT admins and CIO-wannabes results in higher adoption of your standard. Because the end users don’t really care what standard is used. They just want a standard. Whether it comes from DMTF, SNIA, OGF, or OASIS is the least of their concerns. Those that you have to convince to adopt your standard are the vendors and the service providers. The Amazon, Rackspace and GoGrid of the world. The Microsoft, Oracle, VMWare and smaller ones like… Enomaly (Reuven’s company). The highly-specialized consultants who work with them, like Randy. And also, very importantly, the open source developers who provide all the Cloud libraries and frameworks that are the lifeblood of many deployments. I have enough faith left in human nature to assume that all these guys make their strategic standards decisions on a bit more context than exhibit hall loot and press releases. Well, at least we do where I work.

But this traditional approach to marketing is worse than not helping. It’s actually actively harmful, for two reasons. The first is that the cost of these activities, as Winston acknowledges, creates a barrier for participation by requiring higher dues. To Winston it’s an unfortunate side effect, to me it’s a killer. Not necessarily because dropping the membership fee by 50% would bring that many more participants. But because the organizations become so dependent on dues that they are paranoid about making anything public for fear of lowering the incentive for members to keep paying. Which is the worst thing you can do if you want the experts and open source developers, who are the best chance Cloud standards have to not repeat the mistakes of the past, to engage with the standard. Not necessarily as members of the group, but also from the outside. Assuming the work happens in public, which is the key issue.

The other reason why it’s harmful to have a standards organization involved in such traditional marketing is that it has a tendency to become a conduit for promoting the agenda of the board members. Promoting a given standard or organization sounds good, until you realize that it’s rarely so pure and unbiased. The trade shows in which the organization participates are often vendor-specific (e.g. Microsoft Management Summit, VMWorld…). The announcements are timed to coincide with relevant corporate announcements. The press releases contain quotes from board members who promote themselves at the same time as the organization. Officers speaking to the press on behalf of the standards organization are often also identified by their position in their company. Etc. The more a standards organization is involved in marketing, the more its low-level members are effectively subsidizing the marketing efforts of the board members. Standards have enough inherent conflicts of interest to not add more opportunities.

Just to be clear, that issue of standards marketing is not what consumed most of the time during the session. But it came up and, since I didn’t get a chance to express my view on it while on the panel, I am using this blog instead.

My take-away from the panel, on the other hand, is focused on the heart of the discussion that took place.

The take away: confirmation that we are going too fast, too early

Based on this discussion and other experiences, my current feeling on Cloud standards is that it is too early. If you think the practical experience we have today in Cloud Computing corresponds to what the practice of Cloud Computing will be in 10 years, then please go ahead and standardize. But let me tell you that you’re a fool.

The portion of Cloud Computing in which we have some significant experience (get a VM, attach a volume, assign an IP) will still be relevant in 10 years, but it will be a small fraction of Cloud Computing. I can tell you that much even if I can’t tell you what the whole will be. I have my ideas about what the whole will look like but it’s just a guess. Anybody who pretends to know is fooling you, themselves, or both.

I understand the pain of customers today who just want to have a bit more flexibility and portability within the limited scope of the VM/Volume/IP offering. If we really want to do a standard today, fine. Let’s do a very small and pragmatic standard that addresses this. Just a subset of the EC2 API. Don’t attempt to standardize the virtual disk format. Don’t worry about application-level features inside the VM. Don’t sweat the REST or SOA purity aspects of the interface too much either. Don’t stress about scalability of the management API and batching of actions. Just make it simple and provide a reference implementation. A few HTTP messages to provision, attach, update and delete VMs, volumes and IPs. That would be fine. Anything else (and more is indeed needed) would be vendor extensions for now.
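To give a sense of scale, here is roughly what that “very small and pragmatic standard” could boil down to on the wire. The base URL, paths and payload fields below are hypothetical, invented purely to show how small the surface area would be.

```python
# What the "small and pragmatic" standard described above might boil down to:
# a handful of HTTP messages to manage VMs, volumes and IPs. The base URL,
# paths and payload fields are hypothetical, just to show the surface area.
import requests

BASE = "https://cloud.example.com/api"   # hypothetical provider endpoint

# Provision a VM
vm = requests.post(f"{BASE}/vms", json={"image": "img-123", "size": "small"}).json()

# Provision a volume and attach it to the VM
vol = requests.post(f"{BASE}/volumes", json={"size_gb": 20}).json()
requests.post(f"{BASE}/vms/{vm['id']}/attachments", json={"volume": vol["id"]})

# Allocate an IP and bind it
ip = requests.post(f"{BASE}/ips", json={}).json()
requests.put(f"{BASE}/vms/{vm['id']}/ip", json={"ip": ip["address"]})

# Update, then delete
requests.put(f"{BASE}/vms/{vm['id']}", json={"size": "medium"})
requests.delete(f"{BASE}/vms/{vm['id']}")
```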

Unfortunately, neither of these (waiting, or a limited first standard) is going to happen.

Saying “it’s too early” in the standards world is the same as saying nothing. It puts you out of the game and has no other effect. Amazon, the clear leader in the space, has taken just this position. How has this been understood? Simply as “well I guess we’ll do it without them”. It’s sad, but all it takes is one significant (but not necessarily leading) company trying to capitalize on some market influence to force the standards train to leave the station. And at that point it’s a hard decision for the others not to join the pursuit. In the same way that it only takes one bellicose country among pacifists to start a war.

Prepare yourself for some collateral damage.

While I would prefer for this not to proceed now (not speaking for my employer on this blog, remember), that doesn’t mean one should necessarily stay on the sidelines rather than make lemonade out of lemons. But having opened the Cloud Connect panel session with somewhat of a mea culpa (at least for my portion of responsibility) with regard to the failures of the previous IT management standardization wave, I am not too happy to see the seeds of another collective mea culpa being sown, for when we’ve made a mess of Cloud standards too. It’s not a given yet. Just a very high risk. As was made clear yesterday.

9 Comments

Filed under Amazon, Cloud Computing, Conference, DMTF, Everything, Mgmt integration, People, Portability, Standards, Trade show, Utility computing

Two versions of a protocol is one too many

There is always a temptation, when facing a hard design decision in the process of creating an interface or a protocol, to produce two (or more) versions. It’s sometimes a good idea, as a way to explore where each one takes you so you can make a more informed choice. But we know how this invariably ends up. Documents get published that arguably should not. It’s even harder in a standards working group, where someone was asked (or at least encouraged) by the group to create each of the alternative specifications. Canning one is at best socially awkward (despite the appearances, not everyone in standards is a psychopath or a sadist) and often politically impossible.

And yet, it has to be done. Compare the alternatives, then pick one and commit. Don’t confuse being accommodating with being weak.

The typical example these days is of course SOAP versus REST: the temptation is to support both rather than make a choice. This applies to standards and to proprietary interfaces. When a standard does this, it hurts rather than promotes interoperability. Vendors have a bit more of an excuse when they offer a choice (“the customer is always right”) but in reality it forces customers to play Russian roulette whether they want it or not. Because one of the alternatives will eventually be left behind (either discarded or maintained but not improved). If you balance the small immediate customer benefit of using the interface style they are most used to against the risk of redoing the integration down the road, the value proposition of offering several options crumbles.

[Pedantic disclaimer: I use the term “REST” in this post the way it is often (incorrectly) used, to mean pretty much anything that uses HTTP without a SOAP wrapper. The technical issues are a topic for other posts.]

CMDBf

CMDBf v1 is a DMTF standard. It is a SOAP-based protocol. For v2, it has been suggested that there should be a REST version. I don’t know what the CMDBf group (in which I participate) will end up doing but I’ve made my position clear: I could go either way (remain with SOAP or dump it) but I do not want to have two versions of the protocol (one SOAP, one REST). If we think we’re better off with a REST version, then let’s make v2 REST-only. Supporting both mechanisms in v2 would be stupid. They would address the same use cases and only serve to provide political ass-coverage. There is no functional need for both. The argument that we need to keep supporting SOAP for the benefit of those who implemented v1 doesn’t fly. Nobody is saying that, as an implementer, you need to turn off your v1 services the second you launch the v2 version.

DMTF Cloud

Between the specifications submitted directly to DMTF, the specifications developed by DMTF “partner” organizations and the existing DMTF protocols, the DMTF Cloud effort is presented with a mix of SOAP, RESTful and XML-RPC-over-HTTP options. In the process of deciding what to create or adopt I am sure that the temptation will be high to take the easy route of supporting several versions to placate everyone. But such a “consensus” would be achieved on the back of the implementers so I very much hope it won’t be the case.

When it is appropriate

There are cases where supporting alternative options is worth the cost. But it typically happens when they serve very different use cases. Think of SAX versus DOM, which have clearly differentiated sweet spots. In the Cloud world, Amazon S3 gives us interesting examples of both justified and extraneous alternatives. The extraneous one is the choice between REST and SOAP for the S3 API. I often praise AWS for its innovation and pragmatism, but this is an example of something that only looks pragmatic. On the other hand, the AWS import/export mechanism is a useful alternative. It allows you to physically ship a device with a few terabytes of data to Amazon. This is technically an alternative to the S3 programmatic interface, but one with obviously differentiated use cases. I recommend you reserve the use of “alternative APIs” for such scenarios.

If it didn’t work for Tiger Woods, it won’t work for your Cloud API either. Learn to commit.

[CLARIFICATION: based on some of the early Twitter feedback on this entry, I want to clarify that it’s alternative versions that I am against, not successive versions (i.e. an evolution of the interface over time). How to manage successive versions properly is a whole other debate.]

4 Comments

Filed under Amazon, API, Cloud Computing, CMDB, CMDBf, DMTF, Everything, IT Systems Mgmt, Protocols, REST, SOAP, Specs, Standards, Utility computing, Web services

Square peg, REST hole

For all its goodness, REST sometimes feels like trying to fit a square peg in the proverbial round hole. Some interaction patterns just don’t lend themselves well to the REST approach. Here are a few examples, taken from the field of IT/Cloud management.

Long-lived operations. You can’t just hang on for a synchronous response. Tim Bray best described the situation, which he called Slow REST. Do you create an “action in progress” resource?
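The “action in progress” resource usually ends up looking something like the following sketch on the wire: a 202 Accepted plus the URL of a job resource that the client then polls. Endpoints are hypothetical.

```python
# One common shape for the "action in progress" resource mentioned above:
# the long-lived operation returns 202 Accepted plus the URL of a job
# resource, which the client then polls. Endpoints are hypothetical.
import time
import requests

resp = requests.post("https://cloud.example.com/api/vms/vm-42/actions/snapshot")
assert resp.status_code == 202
job_url = resp.headers["Location"]        # the "action in progress" resource

while True:
    job = requests.get(job_url).json()
    if job["status"] in ("succeeded", "failed"):
        break
    time.sleep(5)                         # back to polling, which is what we wanted to avoid
```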

Query: how do you query for “all the instances of app foo deployed in a container that has patch 1234 installed” in a to-each-resource-its-own-URL world? I’ve seen proposals that create a “query” resource and build it up incrementally by POSTing constraints to it. Very RESTful. Very impractical too.
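For illustration, here is what that “build up a query resource by POSTing constraints” proposal tends to look like (hypothetical endpoints); counting the round trips is enough to see why it gets called impractical.

```python
# The "query resource built up by POSTing constraints" approach described
# above, sketched with hypothetical endpoints.
import requests

BASE = "https://cloud.example.com/api"

query = requests.post(f"{BASE}/queries", json={}).json()                 # 1. create the query resource
qurl = f"{BASE}/queries/{query['id']}"
requests.post(f"{qurl}/constraints", json={"app": "foo"})                 # 2. add a constraint
requests.post(f"{qurl}/constraints", json={"container_patch": "1234"})    # 3. add another
results = requests.get(f"{qurl}/results").json()                          # 4. finally fetch the results
```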

Events: the process of creating and managing subscriptions maps well to the resource-oriented RESTful approach. It’s when you consider event delivery mechanisms that things get nasty. You quickly end up worrying a lot more about firewalls and the cost of keeping HTTP connections open than about RESTful purity.

Enumeration: what if your resource state is a very long document and you’d rather retrieve it in increments? A basic GET is not going to cut it. You either have to improve on GET or, once again, create a specifically crafted resource (an enumeration context) to serve as a crutch for your protocol.
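“Improving on GET” usually means paging: fetch the state in increments, carrying an enumeration context (a marker, in this hypothetical sketch) from one call to the next.

```python
# Paged retrieval of a very long representation, carrying an enumeration
# context (here a simple marker) between calls. URL and parameters are
# hypothetical.
import requests

items, marker = [], None
while True:
    params = {"limit": 100}
    if marker:
        params["marker"] = marker
    page = requests.get("https://cloud.example.com/api/vms", params=params).json()
    items.extend(page["items"])
    marker = page.get("next_marker")
    if not marker:                 # no more pages: the enumeration is complete
        break
```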

Filtering: take that same resource with a very long representation. Say you just want a small piece of it (e.g. one XML element). How do you retrieve just that piece?

Collections: it’s hard to manage many resources as one when they each have their own control endpoint. It’s especially infuriating when the URLs look like http://myCloud.com/resources/XXX where XXX, the only variable part, is a resource Id and you know – you just know – that there is one application processing all your messages and yet you can’t send it a unique message and tell it to apply the same request to a list of resources.

The afterlife: how do you retrieve data about a resource once it’s gone? Which is what a DELETE does to it. Except just because it’s been removed operationally doesn’t mean you have no interest in retrieving data about it.

I am not saying that these patterns cannot be supported in a RESTful way. In fact, the problem is that they can. A crafty engineer can come up with carefully-defined resources that would support all such usages. But at the cost of polluting the resource model with artifacts that have little to do with the business at hand and a lot more with the limitations of the access mechanism.

Now if we move from trying to do things in “the REST way” to doing them in “a way that is as simple as possible and uses HTTP smartly where appropriate” then we’re in a better situation as we don’t have to contort ourselves. It doesn’t mean that the problems above go away. Events, for example, are challenging to support even outside of any REST constraint. It just means we’re not tying one hand behind our back.

The risk of course is to lose out on many of the important benefits of REST (simplicity, robustness of links, flexibility…). Which is why it’s not a matter of using REST or not but a matter of using ideas from REST in a practical way.

With WS-*, on the other hand, we get a square peg to fit in a square hole. The problem there is that the peg is twice as wide as the hole…

37 Comments

Filed under Cloud Computing, Everything, Implementation, Modeling, Protocols, Query, REST, Utility computing

HP has submitted a specification to the DMTF Cloud incubator

When I lamented, in a previous post, that I couldn’t tell you about recent submissions to the DMTF Cloud incubator, one of those I had in mind was a submission from HP. I can now write this, because the author of the specification, Nigel Cook, has recently blogged about it. Unfortunately he isn’t publishing the specification itself, just an announcement that it was submitted. Hopefully he is currently going through the long approval process to make the submitted document public (been there, done that, I know it takes time).

In the blog, Nigel makes a good argument for the need to go beyond a hypervisor-centric view of Cloud computing. Even at the IaaS layer there are cases of automated-but-not-virtualized deployment that have all the characteristics of Cloud computing and need to be supported by Cloud management APIs. Not to mention OS-level isolation like Solaris Containers.

Nigel also offers a spirited defense of SOAP-based protocols. I don’t necessarily agree with all his points (“one could easily map the web service definition I described to REST if that was important” suggests an “it’s just SOAP without the wrapper” view of REST), but I am glad he is launching this debate. We need to discuss this rather than assume that REST is the obvious answer. Remember, a few years ago SOAP was just as obvious an answer to any protocol question. It may well be that indeed REST comes out ahead of this discussion, but the process will force us to be explicit about what benefits of REST we are trying to achieve and will allow us to be practical in the way we approach it.

4 Comments

Filed under Automation, Cloud Computing, DMTF, Everything, HP, IT Systems Mgmt, Mgmt integration, Specs, Standards, Utility computing, Virtualization

Waiting for events (in Cloud APIs)

Events/alerts/notifications have been a central concept in IT management at least since the first SNMP trap was emitted, and probably even long before that. And yet they are curiously absent from all the Cloud management APIs/protocols. If you think that’s because “THE CLOUD CHANGES EVERYTHING” then you may have to think again. Over the last few days, two of the most experienced practitioners of Cloud computing pointed out that this omission is a real pain in the neck. RightScale’s Thorsten von Eicken was first to request “an event based interface instead of a request-reply based interface”, pointing out that “we run a good number of machines that do nothing but chew up 100% cpu polling EC2 to detect changes”. George Reese seconded and started to sketch a solution. And while these blog posts gave the issue increased visibility recently, it has been a recurring topic on the AWS Forum and other similar discussion boards for quite some time. For example, in this thread going back to 2006, an Amazon employee wrote that “this is a feature we’ve discussed recently and we’re looking at options” (incidentally, I see a post by Thorsten in that old thread). We’re still waiting.

Let’s look at what it would take to define such a feature.

I have some experience with events for IT management, having been involved in the WS-Notification family of specifications and having co-chaired the OASIS technical committee that standardized them. This post is not about foisting WS-Notification on Cloud APIs, but just about surfacing some of the questions that come up when you try to standardize such a mechanism. While the main use cases for WS-Notification came from IT (and Grid) management, it was supposed to be a generic mechanism. A Cloud-centric eventing protocol can be made simpler by focusing on fewer use cases (Cloud scenarios only). In addition, WS-Notification was marred by the complexity-is-a-sign-of-greatness spirit of the time. On this too, a Cloud eventing protocol could improve things by keeping IBM at bay… sorry, simplicity in mind.

Types of event

When you poll the state of a resource to see if anything changed, you don’t have to tell the provider what kind of change you are interested in. If, on the other hand, you want the provider to notify you, then they need to know what you care about. You may not want to be notified on every single change in the resource state. How do you describe the changes you care about? Is there an agreed-upon set of states for the resource and you are only notified on state transitions? Can you indicate the minimum severity level for an event to be emitted? Who determines the severity of an event? Or do you get to specify what fields in the resource state you want to watch? What about numeric values for which you may not want to be notified of every change but only when a threshold is crossed? Do you get to specify a query and get notified whenever the query result changes? In WS-Notification some of this is handled by WS-Topics, which I still like conceptually (I co-edited it) but which is too complex for the task at hand.

Event formats

What format are the events serialized in? How is the event metadata captured (e.g. time stamp of observation, which may not be the same as the time at which the notification message was sent)? If the event payload is a representation of the new state of the resource, does it indicate which fields changed (and what the old values were)? How do you keep event payloads consistent with the resource representation in the request/response interactions? If many events occur near the same time, can you group them in one notification message for better scalability?

Subscription creation

Presumably you need a subscription mechanism. Is the subscription set in stone when the resource is created? Or can you come later and subscribe? If subscription is an operation on the resource itself, how do you subscribe for events on something that doesn’t exist yet (e.g. “create a VM and notify me once it’s started”)? Do you get to set subscriptions on a per-resource basis? Or is this a global setting for all the resources that you own? Can you have two different subscriptions on the same resource (e.g. a “critical events only” subscription that exists throughout the life of the resource, plus a “lots of events please” subscription that you keep for a few hours while troubleshooting)?
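To make those questions concrete, here is what a hypothetical “create subscription” request might carry. Every field name below is invented; each one maps to one of the open design questions (what to watch, severity, delivery, expiration).

```python
# A hypothetical "create subscription" request, just to make the design
# questions above concrete. Every field name is invented.
import requests

subscription = {
    "resource": "vm-42",                      # or a query? or "everything I own"?
    "filter": {
        "min_severity": "warning",            # who defines the severity levels?
        "fields": ["state", "public_ip"],     # notify only on these field changes?
    },
    "delivery": {"endpoint": "https://listener.example.com/events"},
    "expires_in": 86400,                      # who gets to cap this?
}
resp = requests.post("https://cloud.example.com/api/subscriptions", json=subscription)
print(resp.status_code, resp.json())
```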

Subscription management

Do you get to come back and update/pause/delete a subscription? Do you get to change what filter the subscription carries? Or is it set in stone until the subscription expires? Can you change the delivery endpoint? What if events fail to be delivered? Does the provider cancel your subscription? After how many failures? Does it just pause it for a few hours? Keep trying?

Subscription expiration

Who sets the expiration period? The subscriber? Can the provider set a max duration? Do you get a warning message before the subscription expires? Can you renew a subscription or do you have to create a new one? Do you get a message telling you that it has expired? Where are these subscription-lifecycle messages sent? To the same endpoint as the regular messages? What if your subscription is being killed because your delivery endpoint is down? Clearly it makes no sense to send the warning message to that same endpoint. Do you provide a separate “subscription management” endpoint (different from the event delivery endpoint) when you subscribe? Alternatively, does an email message get sent to the registered user who set the subscription?

Delivery reliability

How reliable do you want the notifications to be? Should the emitter retry until it has received a confirmation? How long does it keep messages that can’t be delivered? Some may have a very short shelf life while others are still useful weeks later. If you don’t have a reliable mechanism but you really “need to know about a lost server within a minute of it disappearing” (the example George gives) then in reality you may still have to poll just to make sure that an event wasn’t lost. If you haven’t received an event in a while, how can you test if the subscription is still working? Should subscriptions send a heartbeat message once in a while?

Delivery mechanism

How do you deliver notifications? Do you keep HTTP connections open through tricks similar to how self-updating web pages work (e.g. COMET, long polling and soon WebSockets)? Or do you just provide a listener endpoint to which the notifier tries to connect (which, in the case of public cloud deployments, means you need to have a publicly-addressable listener, but hopefully not on the same Cloud infrastructure)? Do you use XMPP? AMQP? Email? Can I have you hold my events and let me come pull them?
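Of the options above, the “hold my events and let me come pull them” variant, via long polling, is the simplest to sketch: the client keeps a request open and the provider answers when events are available or the timeout expires. The URL and parameters are hypothetical.

```python
# Sketch of long-polling event delivery ("hold my events and let me come pull
# them"): the server holds the request open until events arrive or the timeout
# expires. Hypothetical URL and parameters.
import requests

cursor = None
while True:
    resp = requests.get(
        "https://cloud.example.com/api/events",
        params={"cursor": cursor, "timeout": 60},   # server holds the request open
        timeout=70,                                 # client-side guard, slightly above
    )
    payload = resp.json()
    for event in payload.get("events", []):
        print(event)                                # hand off to the real handler
    cursor = payload.get("cursor", cursor)          # resume point for the next call
```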

Security

Do you need to verify the origin of the events you receive? Or do you assume they may be forged and always initiate a connection to the provider to double-check? And on the other side, what are the security requirements for event delivery? If a user loses some of their privileges, do you have to go and cancel the still-active subscriptions that they created?

Throttling

Is there a maximum event rate? Do you get charged for the events the Cloud provider sends you? How do you make sure that someone doesn’t create a subscription pointing to the wrong endpoint (either erroneously or maliciously, e.g. DoS)? Do you send a test message at registration asking the delivery endpoint to acknowledge that they indeed want to receive these notifications?

Conclusion

My goal is not to argue that we cannot have a simple yet good enough notification system or to scare anyone away from attempting to define it. It’s just to show that it’s not as simple as it may seem at first blush. But there probably is a sweet spot and people like Thorsten and George are very well qualified to find it.

[UPDATED 2010/4/7: Amazon releases AWS Simple Notification Service. Not just as an eventing feature for the Cloud API, but as a generic notification service. Which can, of course, also carry Cloud management events. Though at this point you’re on your own to publish them from your instances; it doesn’t look like the AWS infrastructure can do it for you. Which means, for example, that you’re not going to be able to publish an event for a sudden crash.]
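Since, per the update above, you are on your own to publish management events from your instances, here is roughly what that looks like against SNS, sketched with the boto3 SDK (which postdates this post); the topic name and message content are illustrative only.

```python
# Publishing a custom Cloud management event to an SNS topic from inside an
# instance. Topic name and message content are illustrative only.
import json
import boto3

sns = boto3.client("sns", region_name="us-east-1")
topic_arn = sns.create_topic(Name="cloud-mgmt-events")["TopicArn"]  # idempotent

sns.publish(
    TopicArn=topic_arn,
    Subject="instance state change",
    Message=json.dumps({"instance": "i-12345678", "new_state": "degraded"}),
)
```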

11 Comments

Filed under API, Application Mgmt, Automation, Cloud Computing, Desired State, Everything, IT Systems Mgmt, Manageability, Mgmt integration, Protocols, Specs, Standards, Tech, Utility computing

Can Cloud standards be saved?

Then: Web services standards

One of the most frustrating aspects of how Web services standards shot themselves in the foot via unchecked complexity is that plenty of people were pointing out the problem as it happened. Mark Baker (to whom I noticed Don Box also paid tribute recently) is the poster child. I remember Tom Jordahl tirelessly arguing for keeping it simple in the WSDL working group. Amberpoint’s Fred Carter did it in WSDM (in the post announcing the recent Amberpoint acquisition, I mentioned that “their engineers brought to the [WSDM] group a unique level of experience and practical-mindedness” but I could have added “… which we, the large companies, mostly ignored.”)

The commonality between all these voices is that they didn’t come from the large companies. Instead they came from the “specialists” (independent contractors and representatives from small, specialized companies). Many of the WS-* debates were fought along alliance lines. Depending on the season it could be “IBM vs. Microsoft”, “IBM+Microsoft vs. Oracle”, “IBM+HP vs. Microsoft+Intel”, etc… They’d battle over one another’s proposals but tacitly agree to brush off those from the smaller players. At least if they contained anything radically different from the content of the submission by the large companies. And simplicity is radical.

Now: Cloud standards

I do not reminisce about the WS-* standards wars just for old times’ sake or the joy of self-flagellation. I also hope that the current (and very important) wave of standards, related to all things Cloud, can do better than the Web services wave did with regard to involving on-the-ground experts.

Even though I still work for a large company, I’d like to see this fixed for Cloud standards. Not because I am a good guy (though I hope I am), but because I now realize that in the long run this lack of perspective even hurts the large companies themselves. We (and that includes IBM and Microsoft, the ringleaders of the WS-* effort) would be better off now if we had paid more attention then.

Here are two reasons why the necessity to involve and include specialists is even more applicable to Cloud standards than it was to Web services.

First, there are many more individuals (or small companies) today with a lot of practical Cloud experience than there were small players with practical Web services experience when the WS-* standardization started (Shlomo Swidler, Mitch Garnaat, Randy Bias, John M. Willis, Sam Johnston, David Kavanagh, Adrian Cole, Edward M. Goldberg, Eric Hammond, Thorsten von Eicken and Guy Rosen come to mind, though this is nowhere near an exhaustive list). Which means there is even more to gain by ensuring that the Cloud standard process is open to them, should they choose to engage in some form.

Second, there is a transparency problem much larger than with Web services standards. For all their flaws, W3C and OASIS, where most of the WS-* work took place, are relatively transparent. Their processes and IP policies are clear and, most importantly, their mailing list archives are open to the public. DMTF, where VMWare, Fujitsu and others have submitted Cloud specifications, is at the other end of the transparency spectrum. A few examples of what I mean by that:

  • I can tell you that VMWare and Fujitsu submitted specifications to DMTF, because the two companies each issued a press release to announce it. I can’t tell you which others did (and you can’t read their submissions) because these companies didn’t think it worthy of a press release. And DMTF keeps the submission confidential. That’s why I blogged about the vCloud submission and the Fujitsu submission but couldn’t provide equivalent analysis for the others.
  • The mailing lists of DMTF working groups are confidential. Even a DMTF member cannot see the message archive of a group unless he/she is a member of that specific group. The general public cannot see anything at all. And unless I missed it on the site, they cannot even know what DMTF working groups exist. It makes you wonder whether Dick Cheney decided to call his social club of energy company executives a “Task Force” because he was inspired by the secrecy of the DMTF (“Distributed Management Task Force”). Even when the work is finished and the standard published, the DMTF won’t release the mailing list archive, even though these discussions can be a great reference for people who later use the specification.
  • Working documents are also confidential. Working groups can decide to publish some intermediate work, but this needs to be an explicit decision of the group, then approved by its parent group, and in practice it happens rarely (mileage varies depending on the groups).
  • Even when a document is published, the process to provide feedback from the outside seems designed to thwart any attempt. Or at least that’s what it does in practice. Having blogged a fair amount on technical details of two DMTF standards (CMDBf and WS-Management) I often get questions and comments about these specifications from readers. I encourage them to bring their comments to the group and point them to the official feedback page. Not once have I, as a working group participant, seen the comments come out on the other end of the process.

So let’s recap. People outside of DMTF don’t know what work is going on (even if they happen to know that a working group called “Cloud this” or “Cloud that” has been started, the charter documents and therefore the precise scope and list of deliverables are also confidential). Even if they knew, they couldn’t get to see the work. And even if they did, there is no convenient way for them to provide feedback (which would probably arrive too late anyway). And joining the organization would be quite a selfless act because they then have to pay for the privilege of sharing their expertise while not being included in the real deciding circles anyway (unless they are ready to pony up for the top membership levels). That’s because of the unclear and unstable processes as well as the inordinate influence of board members and officers who are all also company representatives (in W3C, the strong staff balances the influence of the sponsors; in OASIS, the bylaws limit arbitrariness by the board members).

What we are missing out on

Many in the standards community have heard me rant on this topic before. What pushed me over the edge and motivated me to write this entry was stumbling on a crystal clear illustration of what we are missing out on. I submit to you this post by Adrian Cole and the follow-up (twice) by Thorsten von Eicken. After spending two days at a face-to-face meeting of the DMTF Cloud incubator (in an undisclosed location) this week, I’ll just say that these posts illustrate a level of practicality and a grounding in real-life Cloud usage that was not evident in all the discussions of the incubator. You don’t see Adrian and Thorsten arguing about the meaning of the word “infrastructure”, do you? I’d love to point you to the DMTF meeting minutes so you can judge for yourself, but by now you should understand why I can’t.

So instead of helping in the forum where big vendors submit their specifications, the specialists (some of them at least) go work in OGF, and produce OCCI (here is the mailing list archive). When Thorsten von Eicken blogs about his experience using Cloud APIs, they welcome the feedback and engage him to look at their work. The OCCI work is nice, but my concern is that we are now going to end up with at least two sets of standard specifications (in addition to the multitude of company-controlled specifications, like the ubiquitous EC2 API). One from the big companies and one from the specialists. And if you think that the simplest, clearest and most practical one will automatically win, well I envy your optimism. Up to a point. I don’t know if one specification will crush the other, if we’ll have a “reconciliation” process, if one is going to be used in “private Clouds” and the other in “public Clouds” or if the conflict will just make both mostly irrelevant. What I do know is that this is not what I want to see happen. Rather, the big vendors (whose imprimatur is needed) and the specialists (whose experience is indispensable) should work together to make the standard technically practical and widely adopted. I don’t care where it happens. I don’t know whether now is the right time or too early. I just know that when the time comes it needs to be done right. And I don’t like the way it’s shaping up at the moment. Well-meaning but toothless efforts like cloud-standards.org don’t make me feel better.

I know this blog post will be read both by my friends in DMTF and by my friends in Clouderati. I just want them to meet. That could be quite a party.

IBM was on to something when it produced this standards participation policy (which I commented on in a cynical-yet-supportive way – and yes I realize the same cynicism can apply to me). But I haven’t heard of any practical effect of this policy change. Has anyone seen any? Isn’t the Cloud standard wave the right time to translate it into action?

Transparency first

I realize that it takes more than transparency to convince specialists to take a look at what a working group is doing and share their thoughts. Even in a fully transparent situation, specialists will eventually give up if they are stonewalled by process lawyers or just ignored and marginalized (many working group participants have little bandwidth and typically take their cues from the big vendors even in the absence of explicit corporate alignment). And this is hard to fix. Processes serve a purpose. While they can be used against the smaller players, they also in many cases protect them. Plus, for every enlightened specialist who gets discouraged, there is a nutcase who gets neutralized by the need to put up a clear proposal and follow a process. I don’t see a good way to prevent large vendors from using the process to pressure smaller ones if that’s what they intend to do. Let’s at least prevent this from happening unintentionally. Maybe some of my colleagues from large companies will also ask themselves whether it wouldn’t be to their own benefit to actually help qualified specialists to contribute. Some “positive discrimination” might be in order, to lighten the process burden in some way for those with practical expertise, limited resources, and the willingness to offer some could-otherwise-be-billable hours.

In any case, improving transparency is the simplest, fastest and most obvious step that needs to be taken. Not doing it because it won’t solve everything is like not doing CPR on someone on the pretext that it would only restart his heart but not cure his rheumatism.

What’s at risk if we fail to leverage the huge amount of practical Cloud expertise from smaller players in the standards work? Nothing less than an impractical set of specifications that will fail to realize the promises of Cloud interoperability. And quite possibly even delay them. We’ve seen it before, haven’t we?

Notice how I haven’t mentioned customers? It’s a typical “feel-good” line in every lament about standards to say that “we need more customer involvement”. It’s true, but the lament is old and hasn’t, in my experience, solved anything. And today’s economic climate makes me even more dubious that direct customer involvement is going to keep us on track for this standardization wave (though I’d love to be proven wrong). Opening the door to on-the-ground-working-with-customers experts with a very neutral and pragmatic perspective has a better chance of success in my mind.

As a point of clarification, I am not asking large companies to pick a few small companies out of their partner ecosystem and give them a 10% discount on their alliance membership fee in exchange for showing up in the standards groups and supporting their friendly sponsor. This is a common trick, used to pack a committee, get the votes and create an impression of overwhelming industry support. Nobody should pick who the specialists are. We should do all we can to encourage them to come. It will be pretty clear who they are when they start to ask pointed questions about the work.

Finally, from the archives, a more humorous look at how various standards bodies compare. And the proof that my complaints about DMTF secrecy aren’t new.

12 Comments

Filed under Cloud Computing, CMDBf, DMTF, Everything, HP, IBM, Mgmt integration, Microsoft, Oracle, People, Protocols, Specs, Standards, Utility computing, VMware, W3C, Web services, WS-Management

Is Business Process Execution the killer app for PaaS?

Have you noticed the slow build-up of business process engines available “as a service”? Force.com recently introduced a “Visual Process Manager”. Amazon is looking for product managers to help customers “securely compos[e] processes using capabilities from all parts of their organization as well as those outside their organization, including existing legacy applications, long-running activities, human interactions, cloud services, or even complex processes provided by business partners”. I’ve read somewhere (can’t find a link right now) that WSO2 was planning to make its Business Process Server available as a Cloud service. I haven’t tracked Azure very closely, but I expect AppFabric to soon support a BizTalk-like process engine. And I wouldn’t be surprised if VMWare decided to make an acquisition in the area of business process execution.

Attacking PaaS from the business process angle is counter-intuitive. Rather, isn’t the obvious low-hanging fruit for PaaS a simple synchronous HTTP request handler (e.g. a servlet or its Python, Ruby, etc equivalent)? Which is what Google App Engine (GAE) and Heroku mainly provide. GAE almost defined PaaS as a category in the same way that Amazon EC2 defined IaaS. The expectation that a CGI or servlet-like container naturally precedes a business process engine is also reinforced by the history of middleware stacks. Simple HTTP request-response is the first thing that gets defined (the first version of the servlet package was java.servlet.* since it even predates javax), the first thing that gets standardized (JSR 53: servlet 2.3 and JSP 1.2) and the first thing that gets widely commoditized (e.g. Apache Tomcat). Rather than a core part of the middleware stack, business process engines (BPEL and the like) are typically thought of as a more “advanced” or “enterprise” capability, one that comes later, as part of the extended middleware stack.

But nothing says it has to be that way. If you think about it a bit longer, there are some reasons why business process execution might actually be a more logical beachhead for PaaS than simple HTTP request handlers.

1) Small contract

Architecturally, the contract between a business process engine and the deployed entities (process definitions) is much smaller than the contract of a GAE-style HTTP handler. Those GAE contracts include an entire programming language and lots of libraries. A BPEL container, on the other hand, has a simple contract. It’s documented in one specification (plus a few dependencies) and offers basic activities like routing logic, message correlation, simple data manipulation, compensation handlers and service invocation. You may not think of BPEL as “simple” but would you rather implement a BPEL engine or a complete Python interpreter along with most of the core libraries? I thought so. That’s what I mean by a simpler (narrower) contract. And BPEL is just one example; I suspect some PaaS platforms will take a more bare-bones approach (e.g. no “scopes”).

Just like “good fences make good neighbors”, small contracts make good Cloud services. When your container only interprets a business process definition (typically an XML document), you don’t need to worry about intercepting/preventing all the nasty low-level APIs (e.g. unfettered network access, filesystem reads, OS calls…) that are not acceptable in a PaaS situation. But that is what Google had to do in the process of paring down a general-purpose programming language to fit into the constraints of a PaaS container. There is no intrinsic reason why a synchronous HTTP request handler has to have access to image-manipulation libraries and a business process handler doesn’t. But the use cases tend to push you in that direction and the expectations have been set. As a result, a business process engine is architecturally a better candidate for being delivered as a Cloud service.
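
To make the “small contract” point concrete, here is a minimal sketch of what such a container could look like. This is not BPEL (the activity vocabulary and document shape are made up for the example, and the URLs are placeholders); the point is simply that the deployed artifact is pure data, so the container controls every capability a process can use, service invocation included.

import json
import urllib.request

# A deliberately tiny, made-up activity vocabulary (not BPEL): the deployed
# artifact is pure data, so the container decides what a process can do.
PROCESS = json.loads("""
{
  "name": "order-to-invoice",
  "activities": [
    {"type": "invoke", "var": "order",   "url": "https://orders.example/123"},
    {"type": "assign", "var": "status",  "value": "received"},
    {"type": "invoke", "var": "invoice", "url": "https://invoicing.example/create"}
  ]
}
""")

def run(process):
    """Interpret a process document; this function is the whole deployment contract."""
    variables = {}
    for activity in process["activities"]:
        if activity["type"] == "assign":
            # Simple data manipulation: set a process variable to a literal value.
            variables[activity["var"]] = activity["value"]
        elif activity["type"] == "invoke":
            # Service invocation is the only door to the outside world, and the
            # container mediates it (think auth, quotas, logging).
            with urllib.request.urlopen(activity["url"]) as response:
                variables[activity["var"]] = response.read()
        else:
            raise ValueError("unknown activity type: %s" % activity["type"])
    return variables

Contrast that with a PaaS container that accepts arbitrary Python: it has to fence off files, sockets, threads and native libraries one by one, which is roughly the work Google had to do for App Engine.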

2) Major differentiator over IaaS-based solutions

Practically speaking, it is pretty easy today to get a (synchronous) Web app framework up and running “in the Cloud”. Provisioning a Django, PHP, RoR or Tomcat (plus the Java framework of your choice) stack on EC2 is a well-traveled path. Even auto-scaling these things is pretty well understood. I am the first one to scream that “here is an AMI of our server stack” is *not* the same as PaaS, but truth be told many people are happy enough with it. As a result, the benefit of going from a “web app on IaaS” situation to a GAE-like situation is not perceived as very compelling. I suspect the realization may hit later, but for now people are happy to trade the simplified administration and extra scalability of PaaS for the ability to keep their current framework (MySQL and all) unchanged.

There is no fundamental reason why you can’t run a business process engine on top of an IaaS-provisioned infrastructure. It’s just that you are mostly on your own at this point. Even if you find an existing public AMI that meets your needs, I doubt you’ll find a well-tested way to manage, backup and auto-scale this system (marrying IaaS-level invocations with container-level and DB-level tasks). Or if you do it will probably cost you. In that “new frontier” context, a true PaaS alternative to the “build it on top of IaaS” approach is a lot more compelling than if all you need is yet another RoR-on-EC2 system.

When deciding whether to walk back to your hotel after dinner or take a cab, you don’t just consider the distance. How familiar you are with the neighborhood and how safe it appears are also important parameters.

3) There is an existing market

This may not be obvious to people who come to PaaS from a Web application framework perspective, but there is a large market for business process engines in enterprise integration scenarios. Whether it’s Oracle Fusion Middleware, Microsoft BizTalk, webMethods (now Software AG) or others, this is a very common and useful tool in the enterprise computing toolbox. If this is the market you are after (rather than creating Facebook apps or the next Twitter), then you have to address this need. Not to mention that business process engines are often used for partner integration scenarios (which makes hosting in a public Cloud a natural choice).

Conclusion

In the end, both synchronous and asynchronous execution engines are useful, as are other core services like storage (here is my proposed list of PaaS container types). I just wanted to bring some attention to business process execution because I think PaaS is the context in which its profile will rise to higher prominence. I also anticipate that this rise will lead to some very interesting progress and innovation in the way these processes are defined, deployed and managed. We haven’t yet seen, in this area, the relentless evolutionary pressure that has shaped today’s synchronous Web application frameworks. Fun times ahead.

[UPDATED 2010/2/18: More information about Salesforce.com’s Visual Process Manager.]

1 Comment

Filed under Application Mgmt, BPEL, BPM, Business Process, Cloud Computing, Everything, Google App Engine, Middleware, PaaS, Portability, Tech, Utility computing

Everything you always wanted to know about SaaS but were afraid to ask

What makes one Web application “Software as a Service” (SaaS) and another a “plain old Web application” (POWA)? Or is there no such distinction?

Wouldn’t it be convenient if we had an answer that has some functional relevance? Here are the different axes on which I (unsuccessfully) tried to project Web applications to sort them between SaaS and POWA:

1) Amount of data

Hypothesis: If the Web application stores a lot of your data, it’s SaaS, otherwise it’s a POWA. For example, Webmails like GMail and Yahoo Mail have a lot. So do hosted CRM systems. They are all SaaS. Zillow and Expedia do not, so they are POWA.

2) Criticality of data

Hypothesis: Flickr and YouTube may have many Gigabytes of your data, but it’s not critical data (and you most likely have a local copy), so they are POWA. An on-line password manager, or a federated identification service, stores little information about you but it’s important information so they are SaaS.

3) Customization

Hypothesis: If you’ve invested time to customize the Web application (like Salesforce.com) then it’s SaaS. If, on the other hand, users all get pretty much the same interface, give or take some minor branding (e.g. YouTube or GMail), then it’s a POWA.

4) Complexity

Hypothesis: If it takes many man-years to build, it’s SaaS, otherwise it’s a POWA. So Google Search is SaaS, but twitpics.com is a POWA.

5) Organization mandate

Hypothesis: If all your employees use the same Web application for a given task, that’s SaaS. For example, some companies mandate that all travel reservations go through American Express or Carlson Wagonlit so they are SaaS. If your company lets you use Expedia or any travel site of your choice and you just expense the bill, then it’s a POWA.

6) Administration

Hypothesis: If your company has its own administrator(s) to manage the use of the Web application by its employees, then it’s SaaS. Otherwise it’s a POWA. For example, Google Apps is SaaS, but Facebook is a POWA.

7) Payment

Hypothesis: If you pay for using the Web application, it’s SaaS. In that respect, LotusLive (and porn sites) are SaaS. Facebook and Twitter are POWA.

8) SLA

Hypothesis: If you have guarantees of service, and they are backed by some compensation, then it’s SaaS. For example Aplicor and AWS S3 are SaaS, but (last I heard) SalesForce.com is not.

9) Compliance

Hypothesis: If the Web application conforms to your compliance needs, then it’s SaaS. By this metric, pretty much nothing is SaaS if you have any serious compliance requirement, it’s all POWA.

10) On-premise alternative

Hypothesis: If the Web application replaces something that you could do on-premise, then it’s SaaS. If it intrinsically involves a third-party provider, then it’s a POWA. Zoho (which replaces Office+Sharepoint) is SaaS, while Amazon (the store) and EBay are POWA. You couldn’t  run your own version of them, at least not as a replacement for the public versions (you could run an internal auction site, but that’s a different use case from what you use EBay for).

11) On demand

Hypothesis: If you can initiate service via just a Web request it’s SaaS. If it requires you to interact with a human it’s a POWA. Or is it the other way around? Most silly little Web apps out there would fit the SaaS bill by this definition, while LotusLive-type deployments (or HP Cloud Assure) would not. So are we now saying that Cloud requires *more* friction?

12) Scalable

Hypothesis: If you can quickly ramp up your usage (in terms of load per user and/or number of users) without having to warn the provider and without noticing performance degradations then it’s SaaS, otherwise it’s POWA. The Amazon storefront and GMail are SaaS, Twitter is a POWA.

13) Pay per use

Hypothesis: If your usage of the Web application is metered and your bill is based on it then it’s SaaS. Otherwise it’s a POWA. iTunes is SaaS. GMail is a POWA.

14) Primarily computational

Hypothesis: If the service provided is primarily computational (storage/processing of data) it’s SaaS, otherwise it’s a POWA. GMail and Salesforce are SaaS. A Web application for pizza delivery or travel reservation is a POWA. Banking services are SaaS (let’s chew on that one for a second).

15) Programmatic usage

Hypothesis: If the web application is primarily used programmatically (e.g. SOA) it’s SaaS, if it’s primarily used by humans through the Web then it’s a POWA. There aren’t that many SaaS applications by that measure. Maybe some financial/trading/information services? Or AWS S3 (which has the amazing property of belonging to IaaS, PaaS and SaaS depending on who you ask). Or iTunes?

16) Standard interface

Hypothesis: If there is a standard interface to the Web application, then it’s SaaS, otherwise it’s just a POWA. Webmails (at least those that support IMAP) are SaaS. Most other Web applications are POWA, by virtue of there being very few standard application-level interfaces (remember RosettaNet?).

17) Gets the CIO on magazine covers

Hypothesis: If adoption of a Web application gets the CIO on the cover of trade magazines, then it’s SaaS. Otherwise it’s a POWA. The problem with this definition is that it’s a moving line. The first CIO to use external email or CRM probably landed on a few cover pages. Soon nothing short of putting your pacemaker firmware in the Cloud will.

18) Open Source

Hypothesis: If the Web application is built using Open Source software it’s SaaS. Otherwise it’s a POWA. WordPress.com and the various Zimbra hosted providers are therefore SaaS but SalesForce.com is not. Yes, that’s a silly criterion to even consider. But someone was going to bring it up, so let’s kill it now.

19) Enterprise usage

Hypothesis: Web applications used for business are SaaS. Those used by consumers are POWA. For example GMail (as part of Google Apps) is SaaS while the regular GMail is… a POWA.

Bottom Line

Which of these sorting criteria work for you? I don’t find any of them convincing. Many feel very artificial, as the example for the last one illustrates. Several (like the first three) seem to test how tied to the Web application you are (with the hypothesis that commitment means true SaaS while a one-night-stand is just casual SaaS, aka POWA). But this goes against the common notion that “Cloud” means more flexibility, not less.

I could list more questions, but this list should suffice to illustrate the difficulty of establishing a litmus test that separates SaaS from POWA. Even if we give up on a clear definition, I don’t feel like I can even say that “I know SaaS when I see it”. At least not in a way that I can expect most others to agree with.

Are there actually useful characteristics that I am forgetting to test against? Is it a carefully weighted combination of those above? In the end, is there a reasonable way to tell SaaS apart from POWA? If we can’t find an answer that people eventually agree on and that has functional relevance (by which I mean that there is a meaningful difference between how you consume/manage SaaS vs. POWA) then I’ll have to conclude that the term SaaS only exists for some of the following (bad) reasons:

  • A two-level stack (IaaS and PaaS) looks silly,
  • adding SaaS to IaaS and PaaS instantly inflates the size, adoption and relevance of the “Cloud” umbrella, by co-opting pre-existing and already-thriving businesses,
  • conversely, riding the coattails of the hot “Cloud” buzzword helps providers of such applications. SaaS sounds much better than the worn-out “ASP” appellation, especially with ASP.Net cheapening it.

After throwing stones, here is my proposal.

There is a definition I would like to use to decide which Web applications qualify for the SaaS appellation: any Web application which satisfies all (or at least several) of the layer 3 benefits in this “Taxonomy of Cloud Computing Benefits”. The first test is to get authentication and authorization right. The second is to offer proper programmatic remote interfaces. Etc (there are 5, remember to scroll down to “layer 3” to find them).

But if we take that list as a definition, rather than just as an aspiration, then let’s face the truth: right now there isn’t a single SaaS offering in the market.

5 Comments

Filed under Application Mgmt, Cloud Computing, Everything, SaaS, Utility computing

Generalizing the Cloud vs. SOA Governance debate

There have been some interesting discussions recently about the relationship between Cloud management and SOA management/governance (run-time and design-time). My only regret is that they are a bit too focused on determining winners and losers rather than defining what victory looks like (a bit like arguing whether the smartphone is the triumph of the phone over the computer or of the computer over the phone instead of discussing what makes a good smartphone).

To define victory, we need to answer this seemingly simple question: in what ways is the relationship between a VM and its hypervisor different from the relationship between two communicating applications?

More generally, there are three broad categories of relationships between the “active” elements of an IT system (by “active” I am excluding configuration, organization, management and security artifacts, like patch, department, ticket and user, respectively, to concentrate instead on the elements that are on the invocation path at runtime). We need to understand if/how/why these categories differ in how we manage them:

  • Deployment relationships: a machine (or VM) in a physical host (or hypervisor), a JEE application in an application server, a business process in a process engine, etc…
  • Infrastructure dependency relationships (other than containment): from an application to the DB that persists its data, from an application tier to the web server that fronts it, from a batch job to the scheduler that launches it, etc…
  • Application dependency relationships: from an application to a web service it invokes, from a mash-up to an Atom feed it pulls, from a portal to a remote portlet, etc…

In the old days, the lines between these categories seemed pretty clear and we rarely even thought of them in the same terms. They were created and managed in different ways, by different people, at different times. Some were established as part of a process, others in a more ad-hoc way. Some took place by walking around with a CD, others via a console, others via a centralized repository. Some of these relationships were inventoried in spreadsheets, others on white boards, some in CMDBs, others just in code and in someone’s head. Some involved senior IT staff, others were up to developers and others were left to whoever was manning the controls when stuff broke.

It was a bit like the relationships you have with the taxi that takes you to the airport, the TSA agent who scans you and the pilot who flies you to your destination. You know they are all involved in your travel, but they are very distinct in how you experience and approach them.

It all changes with the Cloud (used as a shorthand for virtualization, management automation, on-demand provisioning, 3rd-party hosting, metered usage, etc…). The advent of the hypervisor is the most obvious source of change: relationships that were mostly static become dynamic; also, where you used to manage just the parts (the host and the OS, often even mixed as one), you now manage not just the parts but the relationship between them (the deployment of a VM in a hypervisor). But it’s not just hypervisors. It’s frameworks, APIs, models, protocols, tools. Put them all together and you realize that:

  • the IT resources involved in all three categories of relationships can all be thought of as services being consumed (an “X86+ethernet emulation” service exposed by the hypervisor, a “JEE-compatible platform” service exposed by the application server, an “RDB service” exposed by the database, a Web service exposed via SOAP or XML/JSON over HTTP, etc…),
  • they can also be set up as services, by simply sending a request to the API of the service provider,
  • not only can they be set up as services, they are also invoked as such, via well-documented (and often standard) interfaces,
  • they can also all be managed in a similar service-centric way, via performance metrics, SLAs, policies, etc,
  • your orchestration code may have to deal with all three categories (e.g. an application slowdown might be addressed either by modifying its application dependencies, reconfiguring its infrastructure or initiating a new deployment),
  • the relationships in all these categories now have the potential to cross organization boundaries and involve external providers, possibly with usage-based billing,
  • as a result of all this, your IT automation system really needs a simple, consistent, standard way to handle all these relationships. Automation works best when you’ve simplified and standardized the environment to which it is applied.

If you’re a SOA person, your mental model for this is SOA++ and you pull out your SOA management and governance (config and runtime) tools. If you are of the WS-* persuasion of SOA, you go back to WS-Management, try to see what it would take to slap a WSDL on a hypervisor and start dreaming of OVF over MTOM/XOP. If you’re into middleware modeling you might start to have visions of SCA models that extend all the way down to the hardware, or at least of getting SCA and OSGi to ally and conquer the world. If you’re a CMDB person, you may tell yourself that now is the time for the CMDB to do what you’ve been pretending it was doing all along and actually extend all the way into the application. Then you may have that “single source of truth” on which the automation code can reliably work. Or if you see the world through the “Cloud API” goggles, then this “consistent and standard” way to manage relationships at all three layers looks like what your Cloud API of choice will eventually do, as it grows from IaaS to PaaS and SaaS.

Your background may shape your reference model for this unified service-centric approach to IT management, but the bottom line is that we’d all like a nice, clear conceptual model to bridge and unify Cloud (provisioning and containment), application configuration and SOA relationships. A model in which we have services/containers with well-defined operational contracts (and on-demand provisioning interfaces). Consumers/components with well-defined requirements. APIs to connect the two, with predictable results (both in functional and non-functional terms). Policies and SLAs to fine-tune the quality of service. A management framework that monitors these policies and SLAs. A common security infrastructure that gets out of the way. A metering/billing framework that spans all these interactions. All this while keeping out of sight all the resource-specific work needed behind the scenes, so that the automation code can look as Zen as a Japanese garden.
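
To make that wish list a little more tangible, here is a minimal sketch of what a unified, service-centric view of these relationships could look like as a data model. All the names are mine, invented for the example; the only point is that one small set of concepts can describe a VM-in-hypervisor deployment, an application-to-database dependency and an application-to-service invocation alike.

from dataclasses import dataclass, field
from enum import Enum

class RelationshipKind(Enum):
    DEPLOYMENT = "deployment"                     # VM in hypervisor, JEE app in app server...
    INFRASTRUCTURE_DEPENDENCY = "infrastructure"  # app -> database, tier -> web server...
    APPLICATION_DEPENDENCY = "application"        # app -> web service, mash-up -> feed...

@dataclass
class ServiceContract:
    """Operational contract exposed by a provider (container or service)."""
    interface_url: str                        # where to invoke / provision it
    slas: dict = field(default_factory=dict)  # e.g. {"availability": 0.999}
    metering: str = "none"                    # e.g. "per-request", "per-hour"

@dataclass
class Relationship:
    """One consumer-to-provider edge, whichever of the three categories it belongs to."""
    kind: RelationshipKind
    consumer: str
    provider: ServiceContract
    policies: dict = field(default_factory=dict)

# The same structure describes all three categories of relationships:
edges = [
    Relationship(RelationshipKind.DEPLOYMENT, "vm-42",
                 ServiceContract("https://hypervisor.example/api", {"availability": 0.999})),
    Relationship(RelationshipKind.INFRASTRUCTURE_DEPENDENCY, "billing-app",
                 ServiceContract("https://db.example/orders", metering="per-hour")),
    Relationship(RelationshipKind.APPLICATION_DEPENDENCY, "billing-app",
                 ServiceContract("https://partner.example/tax-service", metering="per-request")),
]

Whether the real thing ends up looking like SCA, a federated CMDB or a grown-up Cloud API, this is the level at which policies, SLAs and metering want to attach.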

It doesn’t mean that there won’t be separations, roles, processes. We may still want to partition the IT management tasks, but we should first have a chance to rejigger what’s in each category. It might, for example, make sense to handle provider relationships in a consistent way whether they are “deployment relationships” (e.g. EC2 or your private IaaS Cloud) or “application dependency relationships” (e.g. SOA, internal or external). On the other hand, some of the relationships currently lumped in the “infrastructure dependency relationships” category because they are “config files stuff” may find different homes depending on whether they remain low-level and resource-specific or are absorbed in a higher-level platform contract. Any fracture in the management of this overall IT infrastructure should be voluntary, based on legal, financial or human requirements. And not based on protocol, model, security and tool disconnects, on legacy approaches, or on myopic metering that we later rationalize as “the way we’d want things to be anyway because that’s what we are used to”.

In the application configuration management universe, there is a planetary collision scheduled between the hypervisor-centric view of the world (where virtual disk formats wrap themselves in OVF, then something like OVA to address, at least at launch time, application and infrastructure dependency relationships) and the application-model view of the world (SOA, SCA, Microsoft Oslo at least as it was initially defined, various application frameworks…). Microsoft Azure will have an answer, VMware/SpringSource will have one, Oracle will too (though I can’t talk about it), Amazon might (especially as it keeps adding to its PaaS portfolio) or it might let its ecosystem sort it out, IBM probably has Rational, WebSphere and Tivoli distinguished engineers locked into a room, discussing and over-engineering it at this very minute, etc.

There is a lot at stake, and it would be nice if this was driven (industry-wide or at least within each of the contenders) by a clear understanding of what we are aiming for rather than a race to cobble together partial solutions based on existing control points and products (e.g. the hypervisor-centric party).

[UPDATED 2010/1/25: For an illustration of my statement that “if you’re a SOA person, your mental model for this is SOA++”, see Joe McKendrick’s “SOA’s Seven Greatest Mysteries Unveiled” (bullet #6: “When you get right down to it, cloud is the acquisition or provisioning of reusable services that cross enterprise walls. (…)  They are service oriented architecture, and they rely on SOA-based principles to function.”)]

6 Comments

Filed under Application Mgmt, Automation, Cloud Computing, CMDB, Everything, Governance, IT Systems Mgmt, ITIL, Mgmt integration, Middleware, Modeling, OSGi, SCA, Utility computing, Virtualization, WS-Management

Backward-compatible vs. forward-compatible: a tale of two clouds

There is the Cloud that provides value by requiring as few changes as possible. And there is the Cloud that provides value by raising the abstraction and operation level. The backward-compatible Cloud versus the forward-compatible Cloud.

The main selling point of the backward-compatible Cloud is that you can take your existing applications, tools, configurations, customizations, processes etc and transition them more or less as they are. It’s what allowed hypervisors to spread so quickly in the enterprise.

The main selling point of the forward-compatible Cloud is that you are more productive and focused. Fewer configuration items to worry about, fewer stack components to install/monitor/update, you can focus on your application and your business goals. You develop and manage at the level of application concepts, not systems. Bottom line, you write and deploy applications more quickly, cheaply and reliably.

To a large extent this maps to the distinction between IaaS and PaaS, but it’s not that simple. For example, a PaaS that endeavors to be a complete JEE environment is mainly aiming for the backward-compatible value proposition. On the other hand, EC2 spot instances, while part of the IaaS layer, are of the forward-compatible kind: not meant to run your current applications unchanged, but rather to give you ways to create applications that better align with your business goals.

Part of the confusion is that it’s sometimes unclear whether a given environment is aiming for forward-compatibility (and voluntary simplification) or whether its goal is backward-compatibility but it hasn’t yet achieved it. Take EC2 for example. At first it didn’t look much like a traditional datacenter, beyond the ability to create hosts. Then we got fixed IP, EBS, boot from EBS, etc and it got more and more realistic to run applications unchanged. But not quite, as this recent complaint by Hoff illustrates. He wants a lot more control over the network setup so he can deploy existing n-tier applications that have specific network topology/config requirements without re-engineering them. It’s a perfectly reasonable request, in the context of the backward-compatible Cloud value proposition. But one that will never be granted by a Cloud that aims for forward-compatibility.

Similarly, the forward-compatible Cloud doesn’t always successfully abstract away lower-level concerns. It’s one thing to say you don’t have to worry about backup and security but it means that you now have to make sure that your Cloud provider handles them at an acceptable level. And even on technical grounds, abstractions still leak. Take Google App Engine, for example. In theory you only deal with requests and don’t even think about the servers that process them (you have no idea how many servers are used). That’s nice, but once in a while your Java application gets a DeadlineExceededException. That’s because the GAE platform had to start using a new JVM to serve this request (for example, your traffic is growing or the JVM previously used went down) and it took too long for the application to load in the new JVM, resulting in this loading request being killed. So you, as the developer, have to take special steps to mitigate a problem that originates at a lower level of the stack than you’re supposed to be concerned about.

All in all, the distinction between backward-compatible and forward-compatible Clouds is not a classification (most Cloud environments are a mix). Rather, it’s another mental axis on which to project your Cloud plans. It’s another way to think about the benefits that you expect from your use of the Cloud. Both providers and consumers should understand what they are aiming for on that axis. Hopefully this can help prevent shouting matches of the “it’s a bug, no it’s a feature” variety.

[UPDATED 2010/3/4: Apparently, Steve Ballmer thinks along the same lines. Though the way he sees it, Azure is forward-compatible, while Amazon is backward-compatible: “I think Amazon has done a nice job of helping you take the server-based programming model – the programming model of yesterday, that is not scale-agnostic – and then bringing it into the cloud. On the other hand, what we’re trying to do with Azure is let you write a different kind of application.“]

[UPDATED 2010/3/5: I now have the quasi-proof that indeed Steve Ballmer stole the idea from my blog. Look at this entry in my HTTP log. This visitor came the evening before Steve’s “Cloud” talk at the University of Washington. I guess I am not the only one to procrastinate until the 11th hour when I have a deadline. Every piece of information in this log entry points at Steve Ballmer. How can it be anyone other than him?

131.107.0.71 - - [03/Mar/2010:23:51:52 -0800] "GET /archives/1198 HTTP/1.0" 200 4820 "http://www.bing.com/search?q=Brilliant+Cloud+Insight" "Mozilla/1.22 (compatible; MSIE 2.0; Microsoft Bob)"

(in case you are not fluent in the syntax and semantics of HTTP log files, this is a joke)]

9 Comments

Filed under Amazon, Application Mgmt, Cloud Computing, Everything, Google App Engine, IT Systems Mgmt, Utility computing, Virtualization

Taxonomy of Cloud Computing Benefits

One of the heavily discussed Cloud topics in early 2009 was a Cloud Computing taxonomy. Now that this theme has died down (with limited results), and to start 2010 in a similar vein, here is a proposal for a taxonomy of the benefits of Cloud Computing.

Just as the original Cloud Computing taxonomy only had three layers (IaaS/PaaS/SaaS), so does this taxonomy of Cloud benefits. The point of this post is to promote the third layer. I describe layers 1 and 2 mainly to better call out what’s specific about layer 3.

Layer 1 (infrastructure: “let someone else do it”)

This is the bare-bones, inherent benefit of Cloud Computing: you don’t have to deal with the hardware. In practice, it means:

  • no need to worry about power/cooling,
  • on-demand provisioning/deprovisioning (machines appear/disappear in a way physical machines do not),
  • not responsible for physical security (though responsible for ensuring that the provider has an acceptable security level),
  • economies of scale (for equipment purchase and operations),
  • potential environmental benefits,
  • etc…

Layer 2 (management: “let a program do it”)

More specifically, more automated IT management. This does not require Cloud Computing (you can have a highly automated IT management environment on premise), but the move to Cloud Computing is the trigger that is making it really happen. While this capability is not an inherent benefit of Cloud Computing, the Cloud makes it:

  • Needed: You don’t get to put color tags on machines, you don’t get to bring a DVD to install a new application, you don’t get to open a machine to insert more memory, you don’t get to go retrieve a backup tape, label it and put it in a safe, etc. Of course losing these “privileges” doesn’t sound bad considering that they are mostly chores, but it means that you have to design alternative (and mostly programmatic) ways to perform the functions that these tasks addressed.
  • Easier: Cloud environments are highly API-driven. Many IT tools from the previous generation were console-centric (people would go out and buy “a network/event/system management console”) with APIs/protocols as a secondary thought. In Cloud environments, tools are a lot more API-centric with the console as an adjunct (does anyone have stats about the ratio of EC2 instances provisioned via the AWS console versus the APIs? A short provisioning sketch follows this list.). This is also why even though a lot of people wanted standard management protocols (of the WSDM/WS-Management generation), there wasn’t as much of a realization of their importance in the old environment (and not as much pressure to create them and eagerness to adopt them). The stakes and visibility are a lot higher in Cloud environments and that’s why this second wave of protocols will have to succeed where the previous one fell short.
  • More beneficial: Once you have automated IT management in a traditional data center, what you get is fewer employees needed and somewhat better utilization. But you are still gated by the time/process to purchase/install new machines and the cost of unused machines (at least with automation you don’t have to pay their power/cooling). You don’t get the “just what I need” level of infrastructure usage that the same automation work allows in a Cloud setting.
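
To illustrate that API-centric style, here is a minimal provisioning sketch using the boto library (the Python EC2 library of that era). The region and AMI id are placeholders, and credentials are assumed to come from the usual boto configuration; this is an illustration, not a recommended production script.

import time

import boto.ec2

# Credentials come from the environment or ~/.boto; region and AMI id are placeholders.
conn = boto.ec2.connect_to_region("us-east-1")
reservation = conn.run_instances("ami-00000000", instance_type="m1.small",
                                 min_count=1, max_count=1)
instance = reservation.instances[0]

# The same lifecycle a console click drives, but scriptable and repeatable.
while instance.state != "running":
    time.sleep(5)
    instance.update()
print(instance.id, instance.public_dns_name)

conn.terminate_instances(instance_ids=[instance.id])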

Layer 3 (applications: “do it right”)

In short, use the move to the Cloud as an opportunity to fix some of the key issues of today’s applications. Think of the Cloud switch as a second Y2K, 10 years later: like in 2000, not only are there things that the transition requires you to fix, there are also many things that aren’t exactly required to fix but still make sense to fix as part of the larger modernization effort. Of course the Cloud move is missing that ever-so-valuable project management motivator of a firm deadline, but hopefully competitive pressure can play a similar role.

What are these issues? Here is a partial list:

  • Security: at least authentication and authorization. We have SSO/Federation systems, both enterprise-type and Web-centric and they often suck in practice. Whether it’s because of the protocols, the implementations, the tools or the mindset. Plus, there are too many of them. As applications gained mouths and ears and started to communicate with one another, the problem became obvious. If, in the Cloud, you also want them to grow legs and be able to move around (wholly or in parts) then it really really has to get fixed. Not to mention the “all or nothing” delegation model that I am surprised hasn’t yet created a major disaster (let’s see what 2010 has in store). I suggested a band-aid fix earlier, but this needs a real solution (the Cloud Security Alliance provides some guidance in this document, see “domain 12” for IAM).
  • Get remote application interfaces right. It’s been discussed, manifesto’ed, buried and lampooned many times before (this was my humble take on it). Whether it’s because of WS-* or, more likely, java2wsdl we have been delayed in this but it simply has to happen. Call it SOAP, zenSOAP, REST, practical REST or whatever you want. Just make sure that all important functions and data are accessible via clear, documented, consistent, easy-to-use, on-the-wire interfaces. Once we have these interfaces, and only then, we can worry about reliably composing/orchestrating applications that cross organizational boundaries.
  • Related to the previous point, clean up the incestuous relationship between an application and its data. Actually, it’s not “its” data. It’s the data it works on.
  • Deliver application-centric IT management. Quit losing and (badly) re-creating information: for example, an application deployment followed by a black-box discovery (“what did I just do”?). Or after-the-fact re-establishing correlations between events on different servers (“what was this about”?). Application management too often looks like a day in the life of a senile person.
  • Fault-tolerance and disaster recovery. It is too often lacking (or untested, which is the same) for applications that are just below the perceived threshold of requiring it to be done right. That threshold needs to be lowered and the move to the Cloud can be used to make this possible.

[You should also read Tim Bray’s perspective (and Stefan Tilkov’s comment) on the process/methodology/tools for enterprise applications, an orthogonal (but related) area of improvement. More fundamental.]

As I mentioned above, these are mostly not Cloud specific (though it is possible to create a Cloud connection for each). They are things that we have known about and tried to fix for a while. But the pace has been pretty slow and there is an opportunity for the Cloud transition to do more than just hand out the keys of the datacenter.

What kinds of benefits are you aiming for in your Cloud plans?

[UPDATED 2010/01/11: An interesting take on a similar topic by Brenda Michelson: 5 Enduring Aspects of Cloud Computing]

[UPDATED 2010/01/14: Along the same lines (but looking at it in the other direction), an interesting graph from Alistair Croll of Bitcurrent.]

11 Comments

Filed under Application Mgmt, Automation, Cloud Computing, Ecology, Everything, IT Systems Mgmt, Mgmt integration, Security, Utility computing

PaaS as the path to MDA?

Lots of communities think of Cloud Computing as the realization of a vision that they have been pursuing for a while (“sure we didn’t call it Cloud back then but…”). Just ask the Grid folks, the dynamic data center folks (DCML, IBM’s “Autonomic Computing”, HP’s “Adaptive Enterprise”, Microsoft’s DSI), the ASP community, and those of us who toiled on what was going to be the SOAP-based management stack for all IT (e.g. my HP colleagues and I can selectively quote mentions of “adaptation mechanisms like resource reservation, allocation/de-allocation” and “management as a service” in this WSMF white paper from 2003 to portray WSMF as a precursor to all the Cloud APIs of today).

I thought of another such community today, as I ran into older OMG specifications: the Model-Driven Architecture (MDA) community. I have no idea what people in this community actually think of Cloud Computing, but it seems to me that PaaS is a chance to come close to part of their vision. For two reasons: PaaS makes it easier and more rewarding, all at the same time, to practice model-driven design. More bang for less buck.

Easier

My understanding of the MDA value proposition is that it would allow you to create a high-level design (at the level of something like an augmented version of UML) and have it automatically turn into executable code (e.g. that can run in a JEE or .NET container). I am probably making it sound more naive than it really is, but not by much. That’s a mighty wide gap to bridge, for QVT and friends, from UMLish to byte-code and it’s no surprise that the practical benefits of MDA are still to be seen (to put it kindly).

In a PaaS/SaaS world, on the other hand, you are mapping to something that is higher level than byte code. Depending on what types of PaaS containers you envision, some of the abstractions provided by these containers (e.g. business process execution, event processing) are a lot closer to the concepts manipulated in your PIM (Platform-independent model, the UMLish mentioned above). Thus a smaller gap to bridge and a better chance of it being automagical. Especially if you add a few SaaS building blocks to the mix.

More rewarding

Not only should it be easier to map a PIM to a PaaS deployment environment, the benefits you get once you are done are incommensurably greater. Rather than getting a dump of opaque auto-generated byte-code running in a regular JVM/CLR, you get an environment in which the design concepts (actors/services, process, rules, events) and the business model elements are first class citizens of the platform management infrastructure. So that you can monitor and set policies on the same things that you manipulate in your PIM. As opposed to falling down to the lowest common denominator of CPU/memory metrics. Or, god forbid, trying to diagnose/optimize machine-generated code.

We shall see

I wasn’t thinking of Microsoft SQL Server Modeling (previously known as Oslo) when I wrote this, but Doug Purdy’s tweet made the connection for me. And indeed, one can see in SQLSM+Azure the leading candidate today for realizing the MDA vision… minus the OMG MDA specifications.

[Note: I wasn’t planning to blog this, but after I tweeted the basic idea (“Attempting MDA (model-driven architecture) before inventing model-driven deployment and mgmt was hopeless. Now possibly getting there.”) Shlomo requested more details and I got frustrated by the difficulty of explaining my point in twitterisms. In effect, this blog entry is just an expanded tweet, not something as intensely believed, fanatically researched and authoritatively supported as my usual blog posts (ah!).]

[UPDATED 2009/12/29: Some relevant presentations from OMG-land, thanks to Jean Bezivin. Though I don’t see mention of any specific plan to use/adapt MOF/XMI/QVT/etc for the Cloud.]

4 Comments

Filed under Application Mgmt, Automation, Azure, BPM, Business Process, Cloud Computing, Everything, Implementation, Microsoft, Middleware, Modeling, Specs, Standards, Utility computing

REST in practice for IT and Cloud management (part 3: wrap-up)

[Preface: a few months ago I shared some thoughts about how REST was (or could be) applied to IT and Cloud management. Part 1 was a comparison of the RESTful aspects of four well-known IaaS Cloud APIs and part 2 was an analysis of how REST applies to configuration management. Both of these entries received well-informed reader comments BTW, so if you read the posts but didn’t come back for the comments you really owe it to yourself to do so now. At the time, I jotted down thoughts for subsequent entries in this series, but I never got around to posting them. Since the topic seems to be getting a lot of attention these days (especially in DMTF) I decided to go back to these notes and see if I could extract a few practical recommendations in the form of a wrap-up.]

The findings listed below should be relevant whether your protocol is trying to be truly RESTful, just HTTP-centric or even zen-SOAPy. Many of the issues that arise when creating a protocol that maps well to IT management use cases should transcend these variations and that’s what I try to cover.

Finding #1: Relationships (links) are first-class entities (a.k.a. “hypermedia”)

The clear conclusion of both part 1 and part 2 was that the most relevant part of REST for IT and Cloud management is the use of hypermedia. IT management enjoys a head start on this compared to other domains, because its models are already rich in explicit relationships (e.g. CIM associations), as opposed to other business domains in which relationships are more implicit (to the end user at least). But REST teaches us that just having relationships in your model is not enough. They need to be exposed in a way that maps directly to the protocol, so that following a relationship is an infrastructure-level task, not an application-level task: passing an ID as a parameter for some domain-specific function is not it.

This doesn’t violate the rule to not mix the protocol and the model because the alignment should take place in the metamodel. XML is famously weak in that respect, but that’s where Atom steps in, handling relationships in a generic way. Similarly, support for references is, in addition to its embrace of Schematron, one of the main benefits of SML (extra kudos for apparently dropping the “EPR” reference scheme between submission and standardization, in favor of just the “URI” scheme). Not to mention RDFa and friends. Or HTTP Link headers (explained) for link-challenged types.
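
As a (simplistic) illustration of the difference, here is a sketch of a client that navigates from a VM representation to related resources purely by following links advertised in the representation itself. The JSON shape, link relations and URLs are all made up for the example; the point is that “follow the hosted-on link” is an infrastructure-level operation, not a call to some domain-specific lookup function.

import json
import urllib.request

def get_json(url):
    """GET a resource representation; assumes the server returns JSON."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read())

def follow(representation, rel):
    """Follow a relationship advertised in the representation itself:
    no out-of-band ID plus domain-specific lookup, the link is the contract."""
    for link in representation.get("links", []):
        if link["rel"] == rel:
            return get_json(link["href"])
    raise KeyError("no link with rel=%r" % rel)

# Placeholder URL and link relations, purely illustrative.
vm = get_json("https://cloud.example/vms/42")
host = follow(vm, "hosted-on")        # deployment relationship
disks = follow(vm, "attached-disks")  # infrastructure dependency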

Finding #2: Put IDs on steroids

There is little to argue about the value of clearly identifying things of interest and we didn’t wait for the Web to realize this. But it is also one of the most vexing and complex problems in many areas of computing (including IT management). Some of the long-standing questions include:

  • Use an opaque ID (some random-looking string of characters) or an ID grounded in “unique” properties of the resource (if you can find any)?
  • At what point does a thing stop being the same (typical example: if I replace each hardware component of a server one after the other, at which point is it not the same server anymore? Does it make sense for the IT guys to slap an “asset id” sticker on the plastic box around it?)
  • How do you deal with reconciling two resources (with their own IDs) when you realize they represent the same thing?

REST guidelines don’t help with these questions. There often is an assumption, which is true for many web apps, that the application “owns” the resource. My “inbox” only exists as a resource within the mail server application (e.g. Gmail or an Exchange server). Whatever URI GMail assigns for it is the URI for my inbox, period. Things are not as simple when the resources exist outside of any specific application. Take a server, for example: the board management controller (or the hypervisor in the case of a VM), the OS management layer and the management agent installed on the machine all have claims to report on the machine (and therefore a need to identify it).

To some extent, Cloud computing simplifies many of these issues by providing controllers that “own” infrastructure resources and can authoritatively identify them. But it really is only pushing the problem to the next level of the stack.

Making the ID a URI doesn’t magically answer these questions. Though it helps in that it lets you leverage reconciliation mechanisms developed around URIs (such as <atom:link rel=”alternate”> or owl:sameAs). What REST does is add another constraint to this ID mechanism: Make the IDs dereferenceable URLs rather than just URIs.

I buy into this. A simple GET on a resource URI doesn’t solve everything but it has so many advantages that it should be attempted in all cases. And make this HTTP GET please (see finding #6).

In this adoption of GET, we just have to deal with small details such as:

  • What URL do I use for resources that have more than one agent/controller?
  • How close to the resource do I point this URL? If it’s too close to it then it may change as the resource evolves (e.g. network changes) or be affected by the resource performance (e.g. a crashed machine or application that does not respond to its management API). If it’s further removed from the resource, then I introduce a scope (e.g. one controller) within which the resource has to remain, which may cause scalability concerns (how many VMs can/should one controller handle, what if I want to migrate a VM across the ocean…).

These are somewhat corner cases (and the more automation and virtualization you get, the fewer possible controllers you have per resource). While they need to be addressed, they don’t come close to negating the value of dereferenceable IDs. In addition, there are plenty of mechanisms to help with the issues above, from links in the representations (obviously) to RDDL-style lightweight directory to a last resort “give Saint Peter a call” mechanism (the original WSRF proposal had a sub-specification called WS-RenewableReferences that would let you ask for a new version of an expired EPR but it was never published — WS-Naming in then-GGF also touched on that with its reference resolvers — showing once again that the base challenges don’t change as fast as technology flavors).

Implicit in this is the fact that URIs are vastly superior to EPRs. The latter were only just a band-aid on a broken system (which may have started back when WSDL 1.1 decided to define “ports” as message aggregators that can have only one URL) and it’s been more debilitating to SOAP than any other interoperability issue. Web services containers internalized this assumption to the point of providing a stunted dispatch mechanism that made it very hard to assign distinct URLs to resources.

Finding #3: If REST told you to jump off a bridge, would you do it?

Adherence to REST is not required to get the benefits I describe in this series. There is a lot to be inspired by in REST, but it shouldn’t be a religion. Sure, if you squint hard enough (and poke it here and there) you can call your interface RESTful, but why bother with the contortions if some parts are not so, as long as they don’t detract from the value of REST in the other parts. As in all conversions, the most fervent adherents of RPC will likely be tempted to become its most vocal denouncers once they’re born again. This is a tired scenario that we don’t need to repeat. Don’t think of it as a conversion but as a new perspective.

Look at the “RESTful with many parameters?” comment thread on Stefan Tilkov’s excellent InfoQ introduction to REST. It starts with some shared distaste for parameter-laden URIs and a search for a more RESTful approach. This gets suggested:

You could do a post on some URI like ./query/product_dep which would create a query resource. Now you “add” products to the query either by sending a product uri list with the initial post or by calling post on ./query/product_dep/{id}. With every post to the query resource the get on the query resource would change.

Yeah, you could. But how about an RPC-like query operation rather than having yet another resource lifecycle to manage just for the sake of being REST-compliant? And BTW, how do you think any sane consumer of your API is going to handle this? You guessed it, by packaging the POST/POST/GET/DELETE in one convenient client-side library function called “query”. As much as I criticize RPC-centric toolkits (see finding #5 below), it would be justified in this case.
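
For the record, here is roughly what that “convenient client-side library function” ends up looking like. This is a sketch: the URLs and the payload/response shapes are extrapolated from the comment quoted above, not taken from any real API.

import json
import urllib.request

def _request(method, url, body=None):
    """One HTTP exchange; assumes JSON in and out for the sake of the example."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as response:
        payload = response.read()
        return json.loads(payload) if payload else None

def query(base_url, product_uris):
    """Hide the POST/GET/DELETE choreography behind one RPC-looking call."""
    created = _request("POST", base_url + "/query/product_dep",
                       {"products": product_uris})
    query_url = created["href"]            # assumes the response links to the new query resource
    try:
        return _request("GET", query_url)  # fetch the query result
    finally:
        _request("DELETE", query_url)      # dispose of the query resource we never really wanted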

Either you understand why/how REST principles benefit you or you don’t. If you do, then use this understanding to interpret the REST principles to best fit your needs. If you don’t, then no amount of CONTENT-TYPE-pixie-dust-spreading, GET-PUT-POST-DELETE-golden-rule-following and HATEOAS-magical-incantation-reciting will help you. That’s the whole point, for me at least, of this three-part investigation. Stefan says essentially the same, but in a converse way, in his article: “there are often reasons why one would violate a REST constraint, simply because every constraint induces some trade-off that might not be acceptable in a particular situation. But often, REST constraints are violated due to a simple lack of understanding of their benefits.” He says “understand why you violate” and I say “understand why you obey”. It is essentially the same (if you’re into stereotypes you can attribute the difference to his Germanic heritage and my Gallic blood).

Even worse than bending your interface to appear RESTful, don’t cherry-pick your use cases to only keep those that you feel you can properly address via REST, leaving the others aside. Conversely, don’t add requirements just because REST makes them easy to support (interesting how quickly “why do you force me to manage the lifecycle of yet another resource just to run a query” turns into “isn’t this great, you can share queries among users and you can handle long-running queries, I am sure we need this”).

This is not to say that you should not create a fully RESTful system. Just that you don’t necessarily have to and you can still get many benefits as long as you open your eyes to the cost/benefits trade-off involved.

Finding #4: Learn humility from REST

Beyond the technology, there is a vibe behind REST design. You can copy the technology and still miss it. I described it in 2005 as Humble Architecture, and applied it to SOA at the time. But it describes REST just as well:

More practically, this means that the key things to keep in mind when creating a service, is that you are not at the center of the universe, that you don’t know who is going to consume your service, that you don’t know what they are going to do with it, that you are not necessarily the one who can make the best use of the information you have access to and that you should be willing to share it with others openly…

The SOA Manifesto recently called this “intrinsic interoperability”.

In IT management terms, it means that you can RESTify your CMDB and your event console and your asset management software and your automation engine all you want, if you see your code as the ultimate consumer and the one that knows best, as the UI that users have to go through, the “ultimate source of truth” and the “manager of managers” then it doesn’t matter how well you use HTTP.

Finding #5: Beware of tools bearing gifts

To a large extent, the great thing about REST is how few tools there are to take it away from you. So you’re pretty much forced to understand what is going on in your contract as opposed to being kept ignorant by a wsdl2java type of toolkit. Sure, Java (and .NET) have improved in that regard, but really the cultural damage is done and the expectations have been set. Contrast this to “the ‘router’ is just a big case statement over URI-matching regexps”, from Tim Bray’s post on the Sun Cloud API, one of my main inspirations for this investigation.

REST is not inherently immune to the tool-controlling-the-hand syndrome. It’s just a matter of time until such tools try to make REST “accessible” to the “normal” developer (who can supposedly prevent thread deadlocks but not parse XML). Joe Gregorio warns about this in the context of WADL (to summarize: WADL brings XSD which leads to code generation). Keep this in mind next time someone states that REST is more “loosely coupled” than SOAP. It’s how you use it that matters.

Finding #6: Use screws, not glue, so we can peer inside and then close the lid again

The “view source” option is how I and many others learned HTML. It unfortunately created a generation of HTML monsters who never went past version 3.2 (the marbled background makes me feel young again). But it also fueled the explosion of the Web. On-the-wire inspection through soapUI is what allowed me to perform this investigation and report on it (WMI has allowed this for years, but WS-Management is what made it accessible and usable for anyone on any platform). This was, of course, in the context of SOAP which is also inspectable. Still, in that respect nothing beats plain HTTP which is why I recommend HTTP GET in finding #2 (make IDs dereferenceable) even though I don’t expect that the one-page-per-resource view is going to be the only way to access it in the finished product.

These (HTML source, on-the-wire XML and resource-description pages) rarely hit the human eye and yet their presence enables the development of the more commonly used views. Making it as easy as possible to see what is going on under the covers helps with learning, with debugging, with extending and with innovating. 99% of web users don’t look at the HTML source (and 99.99% of them never see the HTTP requests), but the Web would not be what it is to them if this inspectability hadn’t been there to fuel its development.
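To make this concrete, here is a minimal sketch of what “view source for resources” amounts to once IDs are dereferenceable: a plain GET and a dump of what actually went over the wire. It uses only the Python standard library, and the resource URI is a made-up placeholder rather than anything from the products mentioned above.

    # inspect_resource.py - dereference a resource ID with a plain HTTP GET
    # and print what actually goes over the wire (the URI is a placeholder).
    import urllib.request

    resource_id = "http://example.com/cmdb/servers/web-42"  # hypothetical ID

    req = urllib.request.Request(resource_id, headers={"Accept": "application/xml"})
    with urllib.request.urlopen(req) as resp:
        print(resp.status, resp.reason)
        for name, value in resp.getheaders():
            print(name + ": " + value)
        print()
        print(resp.read().decode("utf-8"))

That is the entire tooling requirement; anything that raises this bar also raises the bar for learning, debugging and extending.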

Along the same line, make as few assumptions as possible about the consumers in your interfaces. Which, in practice, often means documenting what goes on the wire. WSDL/WADL can be used as a format, but they are at most one small component. Human-readable semantics are much more important.

Finding #7: Nothing is free

Part of what was so attractive about SOAP is everything you were going to get “for free” by using it. Message-level security (for all those use cases where your message starts over HTTP, then hops onto a train, then gets delivered by a carrier pigeon). Reliable messaging. Transactionality. Intermediaries (they were going to be a big deal in SOAP, as you can see in vestigial form today in the Nodes/Roles left in the spec – also, do you remember WS-Routing? I do.)

And it’s true that by now there is a body of specifications that supports this as composable SOAP headers. But the lack of usage of these features contrasts with how often they were bandied about in the early days of SOAP.

Well, I am detecting some of the same in the REST camp. How often have you heard about how REST enables caching? Or about how content types allow an ISP to compress images on the fly to speed up delivery over dial-up? Like in the SOAP case, these are real features and sometimes useful. It doesn’t mean that they are valuable to you. And if they are not, then don’t let them be used as justifications. Especially since they are not free. If caching doesn’t help me (because of low volume, because security considerations prevent a shared cache, etc.) then its presence actually adds a cost to me, since I now have to worry about whether something is cached or not and deal with ETags. Or I have to consistently remember to request that the cache be bypassed.
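To illustrate the point, here is a small sketch (standard-library Python, hypothetical URI) of the bookkeeping that “free” caching actually asks of a client: remembering ETags, handling 304 responses, and explicitly asking intermediaries to step aside when a stale answer is not acceptable.

    # conditional_get.py - the client-side bookkeeping behind HTTP caching
    # (the URI is a placeholder).
    import urllib.request, urllib.error

    url = "http://example.com/console/events"  # hypothetical resource

    # First fetch: remember the ETag the server handed us.
    with urllib.request.urlopen(url) as resp:
        etag = resp.headers.get("ETag")
        body = resp.read()

    # Later fetch: send If-None-Match and be prepared for a 304.
    req = urllib.request.Request(url, headers={"If-None-Match": etag or ""})
    try:
        with urllib.request.urlopen(req) as resp:
            body = resp.read()        # the representation changed
    except urllib.error.HTTPError as e:
        if e.code != 304:             # 304 means the copy we have is still good
            raise

    # And when staleness is not acceptable, you have to remember to ask
    # intermediaries to step aside.
    req = urllib.request.Request(url, headers={"Cache-Control": "no-cache"})

None of this is hard, but none of it is free either.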

Finding #8: Start by sweeping your front door

Before you agonize about how RESTful your back-end management protocol is, how about you make sure that your management application (the user front-end) is a decent Web application? One with cool URIs, where the back button works, where bookmarks work, where the data is not hidden in some over-encompassing Flash/Silverlight thingy. Just saying.

***

Now for some questions still unanswered.

Question #1: Is this a flea market?

I am highly dubious of content negotiation and yet I can see many advantages to it. Mostly along the lines of finding #6: make it easy for people to look under the hood and get hold of the data. If you let them specify how they want to see the data, it’s obviously easier.

But there is no free lunch. Even if your infrastructure takes care of generating these different views for you (“no coding, just check the box”), you are expanding the surface of your contract. This means more documentation, more testing, more interoperability problems and more friction when the time comes to modify the interface.

I don’t have enough experience with format negotiation to define the sweet spot of this practice. Is it one XML representation and one HTML, period (everything else gets produced by the client by transforming the XML)? But is the XML Atom-wrapped or not? What about RDF? What about JSON? And let’s not forget that SOAP wrapper; how hard can it be to add? But soon enough we are in legacy hell.
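As a concrete illustration of how each negotiated representation widens the contract, here is a minimal client-side sketch (standard-library Python, placeholder URI). Every Accept value the server honors in a loop like this is one more representation you have to document, test and keep stable.

    # conneg.py - ask for the same resource in different formats
    # (the URI is a placeholder).
    import urllib.request

    url = "http://example.com/automation/jobs/1234"  # hypothetical resource

    for media_type in ("application/xml", "application/json", "text/html"):
        req = urllib.request.Request(url, headers={"Accept": media_type})
        with urllib.request.urlopen(req) as resp:
            print(media_type, "->", resp.headers.get("Content-Type"))

Multiply this by the number of clients and the number of versions you will carry, and the “legacy hell” above stops being hypothetical.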

Question #2: Mime-types?

The second part of Joe Gregorio’s WADL entry is all about Mime types and I have a harder time following him there. For one thing, I am a bit puzzled by the different directions in which Mime types go at the same time. For example, we have image formats (e.g. “image/png”), packaging/compression formats (e.g. “application/zip”) and application formats (e.g. “application/vnd.oasis.opendocument.text” or “application/msword”). But what if I have a zip full of PNG images? And aren’t modern word processing formats basically a zip of XML files? If I don’t have the appropriate viewer, maybe I’d like them to be at least recognized as ZIP files. I don’t see support for such composition and taxonomy in these types.

And even within one type, things seem a bit messy in practice. Looking at the registered applications in the “options” menu of my Firefox browser, I see plenty of duplication:

  • application/zip vs. application/x-zip-compressed
  • application/ms-powerpoint vs. application/vnd.ms-powerpoint
  • application/sdp vs. application/x-sdp
  • audio/mpeg vs. audio/x-mpeg
  • video/x-ms-asf vs. video/x-ms-asf-plugin

I also wonder how deep I want to take my Mime types. Sure, I can use Atom as a package, but if the items I am passing around happen to be CIM classes (serialized to XML), doesn’t it make sense to advertise this? And within these classes, can I let you know which domain (e.g. which namespace) my resources are in (virtual machines versus support tickets)?
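For what it’s worth, here is one possible way to “advertise” such a payload, sketched in Python with made-up names (the vnd.example media type and the collection URI are placeholders, not anything standardized): wrap the CIM-derived XML in an Atom entry and let the content type announce what is inside.

    # typed_payload.py - Atom as the package, a vendor media type as the label
    # (all names are invented for illustration).
    import urllib.request

    entry = """<?xml version="1.0"?>
    <entry xmlns="http://www.w3.org/2005/Atom">
      <title>vm-042</title>
      <content type="application/vnd.example.cim+xml">
        <!-- CIM-derived serialization of the virtual machine would go here -->
      </content>
    </entry>"""

    req = urllib.request.Request(
        "http://example.com/pool/vms",           # hypothetical collection
        data=entry.encode("utf-8"),
        headers={"Content-Type": "application/atom+xml;type=entry"},
        method="POST",
    )
    # urllib.request.urlopen(req) would send it; omitted here.

Whether that label belongs in the media type, in a link relation or in the XML itself is precisely the question I don’t have a firm answer to.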

These questions may simply be a reflection of my lack of maturity in the fine art of using Mime types as part of protocol design. My experience with them is more of the “find the type that works through trial and error and then leave it alone” kind.

[Side note: the first time I had to pay attention to Mime types was back in 1995/1996, playing with non-parsed headers and the multipart/x-mixed-replace type to bring some dynamism to web pages (that was before JavaScript or even animated GIFs). The site is still up, but the admins have messed up the Apache config so that the CGIs aren’t executed anymore but return the Python code. So, here are some early Python experiments from yours truly: this script was a “pushed” countdown and this one was a “pushed” image animation. Cool stuff at the time, though not in a “get a date” kind of way.]
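For the curious, here is a rough reconstruction of that server-push technique as a self-contained Python script (not the original scripts, and aimed at period browsers): the connection stays open and each part of the multipart/x-mixed-replace response replaces the previous one.

    # pushed_countdown.py - rough reconstruction of the multipart/x-mixed-replace
    # "server push" trick; period browsers rendered each part as it arrived.
    import time
    from http.server import BaseHTTPRequestHandler, HTTPServer

    BOUNDARY = "endofsection"

    class PushHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header("Content-Type",
                             "multipart/x-mixed-replace; boundary=" + BOUNDARY)
            self.end_headers()
            for n in range(10, -1, -1):          # the "pushed" countdown
                part = "<html><body><h1>%d</h1></body></html>" % n
                self.wfile.write(("--" + BOUNDARY + "\r\n"
                                  "Content-Type: text/html\r\n\r\n"
                                  + part + "\r\n").encode("ascii"))
                self.wfile.flush()
                time.sleep(1)
            self.wfile.write(("--" + BOUNDARY + "--\r\n").encode("ascii"))

    HTTPServer(("", 8000), PushHandler).serve_forever()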

On the other hand, I very much agree with Joe’s point that “less is more”, i.e. that by not dictating how the semantics of a Mime type are defined the system forces you to think about the proper way to define them (e.g. an English-language RFC). As opposed to WSDL/XSD which gives the impression that once your XML validator turns green you’re done describing your interface. These syntactic validations are a complement at best, and usually not a very useful one (see “fat-bottomed specs”).

In comments on previous posts, Stu Charlton also emphasizes the value that Mime types bring. “Hypermedia advocates exposing a variety of links for such state-transitions, along with potentially unique media types to describe interfaces to those transitions.” I get the hypermedia concept, the HATEOAS approach and its very practical benefits. But I am still dubious about the role of Mime types in achieving them and I am not the only one with such qualms. I have too much respect for Joe and Stu to dismiss it entirely, but until I get an example that makes it “click” in practice for me I won’t sweat about Mime types too much.

Question #3: Riding the Zeitgeist?

That’s a practical question rather than a technical one, but as a protocol creator/promoter you are going to have to decide whether you market it as “RESTful”. If I have learned one thing in my past involvement with standards it is that marketing/positioning/impressions matter for standards as much as for products. To a large extent, for Clouds, Linked Data is a more appropriate label. But that provides little marketing/credibility oomph with CIOs compared to REST (and less buzzword-compliance for the tech press). So maybe you want to write your spec based on Linked Data and then market it with a REST ribbon (the two are very compatible anyway). Just keep in mind that REST is the obvious choice for protocols in 2009 in the same way that SOAP was a few years ago.

Of course this is not an issue if your specification is truly RESTful. But none of the current Cloud “RESTful” APIs is, and I don’t expect this to change. At least if you go by Roy Fielding’s definition (or Paul’s handy summary):

A REST API must not define fixed resource names or hierarchies (an obvious coupling of client and server). Servers must have the freedom to control their own namespace. Instead, allow servers to instruct clients on how to construct appropriate URIs, such as is done in HTML forms and URI templates, by defining those instructions within media types and link relations. [Failure here implies that clients are assuming a resource structure due to out-of band information, such as a domain-specific standard, which is the data-oriented equivalent to RPC’s functional coupling].

And (in a comment) Mark Baker adds:

I’ve reviewed lots of “REST APIs”, many of them privately for clients, and a common theme I’ve noticed is that most folks coming from a CORBA/DCE/DCOM/WS-* background, despite all the REST knowledge I’ve implanted into their heads, still cannot get away from the need to “specify the interface”. Sometimes this manifests itself through predefined relationships between resources, specifying URI structure, or listing the possible response codes received from different resources in response to the standard 4 methods (usually a combination of all those). I expect it’s just habit. But a second round of harping on the uniform interface – that every service has the same interface and so any service-specific interface specification only serves to increase coupling – sets them straight.

So the question of whether you want to market yourself as RESTful (rather than just as “inspired by the proper use of HTTP illustrated by REST”) is relevant, if only because you may find the father of REST throwing (POSTing?) tomatoes at you. There is always a risk in wearing clothes that look good but don’t quite fit you. The worst time for your pants to fall off is when you suddenly have to start running.

For more on this, refer to Ted Neward’s excellent Roy decoder ring where he not only explains what Roy means but more importantly clarifies that “if you’re not doing REST, it doesn’t mean that your API sucks” (to which I’d add that it is actually more likely to suck if you try to ape REST than if you allow yourself to be loosely inspired by it).

***

Wrapping up the wrap-up

There is one key topic that I had originally included in this wrap-up but decided to remove: extensibility. Mark Hapner brings it up in a comment on a previous post:

It is interesting to note that HTML does not provide namespaces but this hasn’t limited its capabilities. The reason is that links are a very effective mechanism for composing resources. Rather than composition via complicated ‘embedding’ mechanisms such as namespaces, the web composes resources via links. If HTML hadn’t provided open-ended, embeddable links there would be no web.

I am the kind of guy who would have namespace-qualified his children when naming them (had my wife not stepped in) so I don’t necessarily see “extension via links” as a negation of the need for namespaces (best example: RDF). The whole topic of embedding versus linking is a great one but this post doesn’t need another thousand words and the “REST in practice” umbrella is not necessarily the best one for this discussion. So I hereby conclude my “REST in practice for IT and Cloud management” series, with the intent to eventually start a “Linked Data in practice for IT and Cloud management” series in which extensibility will be properly handled. And we can also talk about querying (conspicuously absent from Cloud APIs, unless CMDBf is now a Cloud API) and versioning. As a teaser for the application of Linked Data to IT/Cloud, I will leave you with what Vint Cerf has to say.

[UPDATED 2010/1/27: I still haven’t written the promised “Linked Data in practice for IT and Cloud management” post, but this explanation of the usage of Linked Data for data.gov.uk pretty much says it all. I may still write a post describing how what Jeni says about government data applies to Cloud management APIs, but it’s almost too obvious to bother. Actually, there may be reasons why Cloud management benefits even more from Linked Data than UK government data, so it may still be worth a post. At some point. When I convince myself that it may influence things rather than be background noise.]


Filed under API, Application Mgmt, Automation, Cloud Computing, Everything, IT Systems Mgmt, Manageability, Mgmt integration, Modeling, Protocols, REST, Semantic tech, SOA, SOAP, Specs, Utility computing

Can I get a price check on this AMI?

I almost titled this entry “Cloud + Tivoli = $” in reference to the previous one (“Cloud + proprietary software = ♥”). In that earlier entry, I described the opportunity for Cloud providers to benefit themselves, their customers and software vendors by drastically reducing the friction involved in using proprietary software (rather than open source software). The example I used was Windows EC2 instances. But it’s not the best example because there is a very tight relationship between Amazon and Microsoft on this. In many ways, these Windows instances are “hard-coded” in EC2: they have a special credential retrieval mechanism, their price appears in the main EC2 price list, etc. This cannot scale as a generic Amazon-mediated payment service for many software vendors.

Rather than the special case of Windows instances, the more interesting situation to look at is the availability of vendor-provided EC2 instances at a higher price. So I went to look a bit more into this, and I came out… very confused and $20 poorer.

Earlier in the week, I had noticed an announcement of IBM Tivoli on EC2 that explained that “the hourly price for Tivoli on EC2 includes an IBM license”. This seemed like a perfect place for me to start. My first question was “how much does it cost”? The blog entry doesn’t say. It links to a Tivoli on EC2 FAQ on the IBM site, which doesn’t say either (apparently IBM’s target customers work in recession-proof industries and do not “frequently ask” about prices). I then followed the link to the overall IBM and AWS FAQ but it just states that “charges will be announced by Amazon Web Services in the coming months”. Both FAQs explain how to use your traditional IBM license on EC2, but that’s not what I am after. At this point, I feel like a third-world tourist who entered a high-end jewelry store in Paris where no price is displayed. Call me plebeian, but I am more accustomed to Target-like stores with price-check scanners in the aisles…

I hypothesized that the AWS console might show me the price when I select the Tivoli AMIs. But no such luck. Tired of searching, and since I was already in the console, I figured I’d just launch an instance and see the hourly cost in my account usage. Since it comes in three versions (depending on how many targets you want Tivoli to manage), I launched one of each. Additionally, for one of them I launched instances of two different sizes so I can verify that the price difference is equal to the base EC2 price difference between such instance sizes. Here is what I got:

[Image: ec2-tivoli-bill]

Of course, by the time my account usage page was updated (it took a few hours) I had found the price list which in retrospect wasn’t that hard to find (from Amazon, not IBM).

So maybe I am not the brightest droplet in the cloud, but for 20 bucks I consider that at least I bought the right to make a point: these prices should not be just on some web page. They should be accessible at the time of launch, in the console. And also in the EC2 API, so that the various EC2 tools can retrieve them. Whether it’s just for human display or to use as part of some automation logic, this should be available in an authoritative manner, without the need to scrape a page.
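To be clear about what I am asking for, here is a purely hypothetical sketch in Python; “describe_pricing” and every field in it are invented for illustration and are not part of the EC2 API. The point is simply that a launch-time decision, whether made by a human or by automation logic, should be able to see the price.

    # price_check.py - what I wish the API offered; 'describe_pricing' and all
    # of its fields are INVENTED for illustration, not part of the EC2 API.
    def describe_pricing(ami_id):
        # Imagine this wrapping a signed API call and returning something like:
        return {
            "ami_id": ami_id,
            "currency": "USD",
            "hourly": {"m1.large": 2.40, "m1.xlarge": 4.80},  # made-up prices
            "includes_software_license": True,
        }

    price = describe_pricing("ami-12345678")        # placeholder AMI id
    if price["hourly"]["m1.large"] > 1.00:          # automation could decide
        print("too rich for this experiment")       # before launching anything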

The other thing that bothers me is the need to decide upfront whether I want to launch a Tivoli instance to manage 50 virtual cores, 200 virtual cores or 600 virtual cores. That feels very inelastic for an EC2 deployment. I want to be charged for the actual number of virtual cores I am managing at any point in time. I realize the difficulty in metering this way (the need for Tivoli to report this to AWS, the issue of trust…) but hopefully it will eventually get there.

While I am talking about future improvements, another limitation is that there can currently only be one vendor per AMI. What if someone wants to write an application that runs on top of Oracle Middleware and package it as a paid AMI? It would be nice if Amazon eventually allowed the price of the instance to be split three ways (Amazon, Oracle, application vendor).

In any case, now you know why this investigation left me poorer. The confused part comes from the fact that I had earlier experimented with Amazon Paid AMIs and it was an entirely different experience. Better in some ways: you get a clear price list upfront such as this.

[Image: ec2-paid-ami-price]

But not as good in other ways: you have to purchase the paid AMI in a way that is somewhat disconnected from the launch of the instance. And for some reason you pay for it directly out of your credit card, as opposed to it going to your AWS usage account along with all your other charges. I would expect that many customers will use these paid AMIs as part of a larger EC2 deployment, and as such it seems awkward to have them billed separately.

But overall, it’s the disconnect between the two that confuses me. Are there two different types of paid AMIs (three if you include the Windows EC2 instances)? What am I missing?

The next step in my investigation should probably be to create an AMI and set a price on it, so I get the vendor’s view in addition to the consumer’s view. And maybe I can earn my $20 back in the process…


Filed under Amazon, Application Mgmt, Business, Cloud Computing, Everything, IBM, IT Systems Mgmt, Utility computing

Cloud + proprietary software = ♥

When I left HP for Oracle, in the summer of 2007, a friend made the argument that Cloud Computing (I think we were still saying “Utility Computing” at the time) would be the death of proprietary software.

There was a lot to support this view. EC2 was one year old and its usage was overwhelmingly based on open source software (OSS). For proprietary software, there was no clear understanding of how licensing terms applied to Cloud deployments. Not only would most sales reps not know the answer, back then they probably wouldn’t have comprehended the question.

Two and a half years later… well it doesn’t look all that different at first blush. EC2 seems to still be a largely OSS-dominated playground. Especially since the most obvious (though not necessarily the most accurate) measure is to peruse the description/content of public AMIs. They are (predictably since you can’t generally redistribute proprietary software) almost entirely OSS-based, with the exception of the public AMIs provided by software vendors themselves (Oracle, IBM…).

And yet the situation has improved for usage of proprietary software in the Cloud. Most software vendors have embraced Cloud deployments and clarified licensing issues (taking the example of Oracle, here is its Cloud Computing Center page, an overview of the licensing policy for Cloud deployments and the AWS/Oracle partnership page to encourage the use of Oracle software on EC2; none of this existed in 2007).

But these can be called reactive adaptations. At best (depending on the variability of your load and whether you use an Unlimited License Agreement), these moves have brought the “proprietary vs. OSS” debate in the Cloud back to the same parameters as for on-premise deployments. They have simply corrected the initial challenge of not even knowing how to license proprietary software for Cloud deployments.

But it’s not stopping here. What we are seeing is some of the parameters of the “proprietary vs. OSS” debate becoming more friendly towards proprietary software in the Cloud than on-premise. I am referring to the emergence of EC2 instances for which the software licenses are included in the per-hour rate for server instances. This pretty much removes license management as a concern altogether. The main example is Windows EC2 instances. As a user, you don’t have to deal with any Windows license consideration. You just request a machine and use it. Which of course doesn’t mean you are not paying for Windows. These Windows instances are more expensive than comparable Linux instances (e.g. $0.48 versus $0.34 per hour in the case of a “standard/large” instance) and Amazon takes care of paying Microsoft.

The removal of license management as a concern may not make a big difference to large corporations that have an Unlimited License Agreement, but for smaller companies it may take a chunk out of the reasons to use open source software. Not only do you not have to track license usage (and renewal), you never have to spend time with a sales rep. You don’t have to ask yourself at what point in your beta program you’ve moved from a legitimate use of the (often free) development license to a situation in which you need a production license. Including the software license directly in the cost of the base Cloud resource (e.g. the virtual machine instance) makes planning (and auto-scaling) easier: you use the same algorithm as in the “free license” situation, just with a different per hour cost. The trade-off becomes quantitative rather than qualitative. You can trade a given software stack against a faster CPU or more memory or more storage, depending on which combination serves your needs better. It doesn’t matter if the value you get from the instance comes from the software in the image or the hardware. This moves IaaS closer to PaaS (force.com doesn’t itemize your bill between hardware cost and software cost). Anything that makes IaaS more like PaaS is good for IaaS.
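A back-of-the-envelope sketch of the planning argument above, using the per-hour rates from the Windows example and a made-up upfront license fee for comparison: once the license is folded into the hourly rate, the cost model for planning and auto-scaling stays the same; only the constant changes.

    # license_math.py - rough monthly cost comparison (illustrative numbers only)
    HOURS_PER_MONTH = 730

    def monthly_cost(instances, hourly_rate, upfront_license=0.0):
        return instances * hourly_rate * HOURS_PER_MONTH + upfront_license

    linux = monthly_cost(4, 0.34)            # rate from the example above
    bundled = monthly_cost(4, 0.48)          # license included in the rate
    byol = monthly_cost(4, 0.34, upfront_license=2000.0)  # made-up license fee

    print("Linux: $%.0f  bundled: $%.0f  bring-your-own-license: $%.0f"
          % (linux, bundled, byol))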

From an earlier post, about virtual appliances:

As with all things computer-related, the issue is going to get blurrier and then irrelevant. The great thing about software is that there is no solid line. In this case, we will eventually get more customized appliances (via appliance builders or model-driven appliance generation) blurring the line between installed software and appliance-based software.

I was referring to a blurring line in terms of how software is managed, but it’s also true in terms of how it is licensed.

There are of course many other reasons (than license management) why people use open source software rather than proprietary. The most obvious is the cost of the license (which, as we have seen, doesn’t go away but just gets incorporated in the base Cloud instance rate). Or they may simply prefer a given open source product independently of any licensing aspect. Some need access to the underlying code, to customize/improve it for their purpose. Or they may be leery of depending on one entity for the long-term viability of their platform. There may even be some who suspect any software they don’t examine/compile themselves of containing backdoors (though these people are presumably not candidates for Cloud deployments in the first place). Etc. These reasons remain pretty much unchanged in the Cloud. But, anecdotally at least, removing license management concerns from both manual and automated tasks is a big improvement.

If Cloud providers get it right (which would require being smarter than wireless service providers, a low bar) and software vendors play ball, the “proprietary vs. OSS” debate may become more favorable to proprietary software in the Cloud than it is on-premise. For the benefit of customers, software vendors and Cloud providers. Hopefully Amazon will succeed where telcos mostly failed, in providing a convenient application metering/billing service for 3rd-party software offered on top of their infrastructural services (without otherwise getting in the way). Anybody remember the Minitel? Today we’d call that a “Terminal as a Service” offering, and if it did one thing well (beyond displaying green characters) it was billing. Which reminds me, I probably still have one in my parents’ basement.

[Note: every page of this blog mentions, at the bottom, that “the statements and opinions expressed here are my own and do not necessarily represent those of Oracle Corporation” but in this case I will repeat it here, in the body of the post.]

[UPDATED 2009/12/29: The Register seems to agree. In fact, they come close to paraphrasing this blog entry:

“It’s proprietary applications offered by enterprise mainstays such as Oracle, IBM, and other big vendors that may turn out to be the big winners. The big vendors simply manipulated and corrected their licensing strategies to offer their applications in an on-demand or subscription manner.

Amazonian middlemen

AWS, for example, now offers EC2 instances for which the software licenses are included in the per-hour rate for server instances. This means that users who want to run Windows applications don’t have to deal with dreaded Windows licensing – instead, they simply request a machine and use it while Amazon deals with paying Microsoft.”]

[UPDATED 2010/1/25: I think this “Cloud as Monetization Strategy for Open Source” post by Geva Perry (based on an earlier post by Savio Rodrigues ) confirms that in the Cloud the line between open source and proprietary software is thinning.]

[UPDATED 2010/11/12: Related and interesting post on the AWS blog: Cloud Licensing Models That Exist Today]


Filed under Amazon, Application Mgmt, Automation, Business, Cloud Computing, Everything, Open source, Oracle, Utility computing, Virtual appliance

Can your hypervisor radio for air support?

As I was reading about Microsoft Azure recently, a military analogy came to my mind. Hypervisors are tanks. Application development and runtime platforms compose the air force.

Tanks (and more generally the mechanization of ground forces) transformed war in the 20th century. They multiplied the fighting capabilities of individuals and changed the way war was fought. A traditional army didn’t stand a chance against a mechanized one. More importantly, a mechanized army that used the new tools with the old mindset didn’t stand a chance against a similarly equipped army that had rethought its strategy to take advantage of the new capabilities. Consider France at the beginning of WWII, where tanks were just cannons on wheels, spread evenly along the front line to support ground troops. Contrast this with how Germany, as part of the Blitzkrieg, used tanks and radios to create highly mobile – and yet coordinated – units that caused havoc in the linear French defense.

Exercise for the reader who wants to push the analogy further:

  • Describe how Blitzkrieg-style mobility of troops (based on tanks and motorized troop transports) compares to Live Migration of virtual machines.
  • Describe how the use of radios by these troops compares to the use of monitoring and control protocols to frame IT management actions.

Tanks (hypervisors) were a game-changer in a world of foot soldiers (dedicated servers).

But no matter how good your tanks are, you are at a disadvantage if the other party achieves air superiority. A less sophisticated/numerous ground force that benefits from strong air support is likely to prevail over a stronger ground force with no such support. That’s what came to my mind as I read about how Azure plans to cover the IaaS layer, but in the context of an application-and-data-centric approach. An approach in which hypervisors are not left to fend for themselves, with only the limited view of the horizon from the periscope of their turrets, but rather are orchestrated, supported (and even deployed) from the air, from the application platform.

[Image: C-130 tank airdrop]

(Yes, I am referring to the Azure vision as it was presented at PDC09, not necessarily the currently available bits.)

Does your Cloud vendor/provider need an air force?

Exercise for the reader who wants to push the analogy to the stratosphere:

  • Describe how business logic/process, business transaction management and business intelligence are equivalent to satellites, surveying the battlefield and providing actionable intelligence.

The new Cloud stack (“military-cloud complex” version):

[Image: cloud-military-stack]

[Note: I have no expertise in military history (or strategy) beyond high school classes about WWI and WWII, plus a couple of history books and a few war movies. My goal here is less to be accurate on military concerns (though I hope to be) than to draw an analogy which may be meaningful to fellow IT management geeks who share my level of (in)expertise in military matters. This is just yet another way in which I try to explain that, for Clouds as for plain old IT management, “it’s the application, stupid”.]


Filed under Application Mgmt, Azure, Cloud Computing, Everything, IT Systems Mgmt, Mgmt integration, Utility computing, Virtualization