Archive for the ‘Utility computing’ Category

Post

PaaS as the path to MDA?

In Application Mgmt,Automation,Azure,BPM,Business Process,Cloud Computing,Everything,Implementation,Microsoft,Middleware,Modeling,Specs,Standards,Utility computing on December 28, 2009 by @vambenepe

Lots of communities think of Cloud Computing as the realization of a vision that they have been pusuing for a while (“sure we didn’t call it Cloud back then but…”). Just ask the Grid folks, the dynamic data center folks (DCML, IBM’s “Autonomic Computing”, HP’s “Adaptive Enterprise”,  Microsoft’s DSI), the ASP community, and those of us who toiled on what was going to be the SOAP-based management stack for all IT (e.g. my HP colleagues and I can selectively quote mentions of “adaptation mechanisms like resource reservation, allocation/de-allocation” and “management as a service” in this WSMF white paper from 2003 to portray WSMF as a precursor to all the Cloud APIs of today).

I thought of another such community today, as I ran into older OMG specifications: the Model-Driven Architecture (MDA) community. I have no idea what people in this community actually think of Cloud Computing, but it seems to me that PaaS is a chance to come close to part of their vision. For two reasons: PaaS makes it easier and more rewarding, all at the same time, to practice model-driven design. More bang for less buck.

Easier

My understanding of the MDA value proposition is that it would allow you to create a high-level design (at the level of something like an augmented version of UML) and have it automatically turn into executable code (e.g. that can run in a JEE or .NET container). I am probably making it sound more naive than it really is, but not by much. That’s a might wide gap to bridge, for QVT and friends, from UMLish to byte-code and it’s no surprise that the practical benefits of MDA are still to be seen (to put it kindly).

In a PaaS/SaaS world, on the other hand, you are mapping to something that is higher level than byte code. Depending on what types of PaaS containers you envision, some of the abstractions provided by these containers (e.g. business process execution, event processing) are a lot closer to the concepts manipulated in your PIM (Platform-independent model, the UMLish mentioned above). Thus a smaller gap to bridge and a better chance of it being automagical. Especially if you add a few SaaS building blocks to the mix.

More rewarding

Not only should it be easier to map a PIM to a PaaS deployment environments, the benefits you get once you are done are incommensurably greater. Rather than getting a dump of opaque auto-generated byte-code running in a regular JVM/CLR, you get an environments in which the design concepts (actors/services, process, rules, events) and the business model elements are first class citizens of the platform management infrastructure. So that you can monitor and set policies on the same things that you manipulate in you PIM. As opposed to falling down to the lowest common denominator of CPU/memory metrics. Or, god forbid, trying to diagnose/optimize machine-generated code.

We shall see

I wasn’t thinking of Microsoft SQL Server Modeling (previously known as Oslo) when I wrote this, but Doug Purdy’s tweet made the connection for me. And indeed, one can see in SQLSM+Azure the leading candidate today to realizing the MDA vision… minus the OMG MDA specifications.

[Note: I wasn't planning to blog this, but after I tweeted the basic idea ("Attempting MDA (model-driven architecture) before inventing model-driven deployment and mgmt was hopeless. Now possibly getting there.") Shlomo requested more details and I got frustrated by the difficulty to explain my point in twitterisms. In effect, this blog entry is just an expanded tweet, not something as intensely believed, fanatically researched and authoritatively supported as my usual blog posts (ah!).]

[UPDATED 2009/12/29: Some relevant presentations from OMG-land, thanks to Jean Bezivin. Though I don't see mention of any specific plan to use/adapt MOF/XMI/QVT/etc for the Cloud.]

Post

REST in practice for IT and Cloud management (part 3: wrap-up)

In API,Application Mgmt,Automation,Cloud Computing,Everything,IT Systems Mgmt,Manageability,Mgmt integration,Modeling,Protocols,REST,Semantic tech,SOA,SOAP,Specs,Utility computing on December 10, 2009 by @vambenepe

[Preface: a few months ago I shared some thoughts about how REST was (or could) be applied to IT and Cloud management. Part 1 was a comparison of the RESTful aspects of four well-known IaaS Cloud APIs and part 2 was an analysis of how REST applies to configuration management. Both of these entries received well-informed reader comments BTW, so if you read the posts but didn't come back for the comments you really owe it to yourself to do so now. At the time, I jotted down thoughts for subsequent entries in this series, but I never got around to posting them. Since the topic seems to be getting a lot of attention these days (especially in DMTF) I decided to go back to these notes and see if I could extract a few practical recommendations in the form of a wrap-up.]

The findings listed below should be relevant whether your protocol is trying to be truly RESTful, just HTTP-centric or even zen-SOAPy. Many of the issues that arise when creating a protocol that maps well to IT management use cases should transcend these variations and that’s what I try to cover.

Finding #1: Relationships (links) are first-class entities (a.k.a. “hypermedia”)

The clear conclusion of both part 1 and part 2 was that the most relevant part of REST for IT and Cloud management is the use of hypermedia. IT management enjoys a head start on this compared to other domains, because its models are already rich in explicit relationships (e.g. CIM associations), as opposed to other business domains in which relationships are more implicit (to the end user at least). But REST teaches us that just having relationships in your model is not enough. They need to be exposed in a way that maps directly to the protocol, so that following a relationship is an infrastructure-level task, not an application-level task: passing an ID as a parameter for some domain-specific function is not it.

This doesn’t violate the rule to not mix the protocol and the model because the alignment should take place in the metamodel. XML is famously weak in that respect, but that’s where Atom steps in, handling relationships in a generic way. Similarly, support for references is, in addition to its accolade to Schematron, one of the main benefits of SML (extra kudos for apparently dropping the “EPR” reference scheme between submission and standardization, in favor of just the “URI” scheme). Not to mention RDFa and friends. Or HTTP Link headers (explained) for link-challenged types.

Finding #2: Put IDs on steroids

There is little to argue about the value of clearly identifying things of interest and we didn’t wait for the Web to realize this. But it is also one of the most vexing and complex problems in many areas of computing (including IT management). Some of the long-standing questions include:

  • Use an opaque ID (some random-looking string a characters) or an ID grounded in “unique” properties of the resource (if you can find any)?
  • At what point does a thing stop being the same (typical example: if I replace each hardware component of a server one after the other, at which point is it not the same server anymore? Does it make sense for the IT guys to slap an “asset id” sticker on the plastic box around it?)
  • How do you deal with reconciling two resources (with their own IDs) when you realize they represent the same thing?

REST guidelines don’t help with these questions. There often is an assumption, which is true for many web apps, that the application “owns” the resource. My “inbox” only exists as a resource within the mail server application (e.g. Gmail or an Exchange server). Whatever URI GMail assigns for it is the URI for my inbox, period. Things are not as simple when the resources exist outside of any specific application: take a server, for example: the board management controller (or the hypervisor in the case of a VM), the OS management layer and the management agent installed on the machine all have claims to report on the machine (and therefore a need to identify it).

To some extent, Cloud computing simplifies many of these issues by providing controllers that “own” infrastructure resources and can authoritatively identify them. But it really is only pushing the problem to the next level of the stack.

Making the ID a URI doesn’t magically answer these questions. Though it helps in that it lets you leverage reconciliation mechanisms developed around URIs (such as <atom:link rel=”alternate”> or owl:sameAs). What REST does is add another constraint to this ID mechanism: Make the IDs dereferenceable URLs rather than just URIs.

I buy into this. A simple GET on a resource URI doesn’t solve everything but it has so many advantages that it should be attempted in all cases. And make this HTTP GET please (see finding #6).

In this adoption of GET, we just have to deal with small details such as:

  • What URL do I use for resources that have more than one agent/controller?
  • How close to the resource do I point this URL? If it’s too close to it then it may change as the resource evolves (e.g. network changes) or be affected by the resource performance (e.g. a crashed machine or application that does not respond to its management API). If it’s removed from the resource, then I introduce a scope (e.g. one controller) within which the resource has to remain, which may cause scalability concerns (how many VMs can/should one controller handle, what if I want to migrate a VM across the ocean…).

These are somewhat corner cases (and the more automation and virtualization you get, the fewer possible controllers you have per resource). While they need to be addressed, they don’t come close to negating the value of dereferenceable IDs. In addition, there are plenty of mechanisms to help with the issues above, from links in the representations (obviously) to RDDL-style lightweight directory to a last resort “give Saint Peter a call” mechanism (the original WSRF proposal had a sub-specification called WS-RenewableReferences that would let you ask for a new version of an expired EPR but it was never published — WS-Naming in then-GGF also touched on that with its reference resolvers — showing once again that the base challenges don’t change as fast as technology flavors).

Implicit in this is the fact that URIs are vastly superior to EPRs. The latter were only just a band-aid on a broken system (which may have started back when WSDL 1.1 decided to define “ports” as message aggregators that can have only one URL) and it’s been more debilitating to SOAP than any other interoperability issue. Web services containers internalized this assumption to the point of providing a stunted dispatch mechanism that made it very hard to assign distinct URLs to resources.

Finding #3: If REST told you to jump off a bridge, would you do it?

Adherence to REST is not required to get the benefits I describe in this series. There is a lot to be inspired by in REST, but it shouldn’t be a religion. Sure, if you squint hard enough (and poke it here and there) you can call your interface RESTful, but why bother with the contortions if some parts are not so. As long as they don’t detract from the value of REST in the other parts. As in all conversions, the most fervent adepts of RPC will likely be tempted to become its most violent denunciators once they’re born again. This is a tired scenario that we don’t need to repeat. Don’t think of it as a conversion but as a new perspective.

Look at the “RESTful with many parameters?” comment thread on Stefan Tilkov’s excellent InfoQ introduction to REST. It starts with some shared distaste for parameter-laden URIs and a search for a more RESTful approach. This gets suggested:

You could do a post on some URI like ./query/product_dep which would create a query resource. Now you “add” products to the query either by sending a product uri list with the initial post or by calling post on ./query/product_dep/{id}. With every post to the query resource the get on the query resource would change.

Yeah, you could. But how about an RPC-like query operation rather than having yet another resource lifecycle to manage just for the sake of being REST-compliant? And BTW, how do you think any sane consumer of your API is going to handle this? You guessed it, by packaging the POST/POST/GET/DELETE in one convenient client-side library function called “query”. As much as I criticize RPC-centric toolkits (see finding #5 below), it would be justified in this case.

Either you understand why/how REST principles benefit you or you don’t. If you do, then use this understanding to interpret the REST principles to best fit your needs. If you don’t, then no amount of CONTENT-TYPE-pixie-dust-spreading, GET-PUT-POST-DELETE-golden-rule-following and HATEOAS-magical-incantation-reciting will help you. That’s the whole point, for me at least, of this tree-part investigation. Stefan says essential the same, but in a converse way, in his article: “there are often reasons why one would violate a REST constraint, simply because every constraint induces some trade-off that might not be acceptable in a particular situation. But often, REST constraints are violated due to a simple lack of understanding of their benefits.” He says “understand why you violate” and I say “understand why you obey”. It is essentially the same (if you’re into stereotypes you can attribute the difference to his Germanic heritage and my Gallic blood).

Even worse than bending your interface to appear RESTful, don’t cherry-pick your use cases to only keep those that you feel you can properly address via REST, leaving the others aside. Conversely, don’t add requirements just because REST makes them easy to support (interesting how quickly “why do you force me to manage the lifecycle of yet another resource just to run a query” turns into “isn’t this great, you can share queries among users and you can handle long-running queries, I am sure we need this”).

This is not to say that you should not create a fully RESTful system. Just that you don’t necessarily have to and you can still get many benefits as long as you open your eyes to the cost/benefits trade-off involved.

Finding #4: Learn humility from REST

Beyond the technology, there is a vibe behind REST design. You can copy the technology and still miss it. I described it in 2005 as Humble Architecture, and applied to SOA at the time. But it describes REST just as well:

More practically, this means that the key things to keep in mind when creating a service, is that you are not at the center of the universe, that you don’t know who is going to consume your service, that you don’t know what they are going to do with it, that you are not necessarily the one who can make the best use of the information you have access to and that you should be willing to share it with others openly…

The SOA Manifesto recently called this “intrinsic interoperability”.

In IT management terms, it means that you can RESTify your CMDB and your event console and your asset management software and your automation engine all you want, if you see your code as the ultimate consumer and the one that knows best, as the UI that users have to go through, the “ultimate source of truth” and the “manager of managers” then it doesn’t matter how well you use HTTP.

Finding #5: Beware of tools bearing gifts

To a large extent, the great thing about REST is how few tools there are to take it away from you. So you’re pretty much forced to understand what is going on in your contract as opposed to being kept ignorant by a wsdl2java type of toolkit. Sure, Java (and .NET) have improved in that regard, but really the cultural damage is done and the expectations have been set. Contrast this to “the ‘router’ is just a big case statement over URI-matching regexps”, from Tim Bray’s post on the Sun Cloud API, one of my main inspirations for this investigation.

REST is not inherently immune to the tool-controlling-the-hand syndrome. It’s just a matter of time until such tools try to make REST “accessible” to the “normal” developer (who can supposedly prevent thread deadlocks but not parse XML). Joe Gregorio warns about this in the context of WADL (to summarize: WADL brings XSD which leads to code generation). Keep this in mind next time someone states that REST is more “loosely coupled” than SOAP. It’s how you use it that matters.

Finding #6: Use screws, not glue, so we can peer inside and then close the lid again

The “view source” option is how I and many others learned HTML. It unfortunately created a generation of HTML monsters who never went past version 3.2 (the marbled background makes me feel young again). But it also fueled the explosion of the Web. On-the-wire inspection through soapUI is what allowed me to perform this investigation and report on it (WMI has allowed this for years, but WS-Management is what made it accessible and usable for anyone on any platform). This was, of course, in the context of SOAP which is also inspectable. Still, in that respect nothing beats plain HTTP which is why I recommend HTTP GET in finding #2 (make IDs dereferenceable) even though I don’t expect that the one-page-per-resource view is going to be the only way to access it in the finished product.

These (HTML source, on-the-wire XML and resource-description pages) rarely hit the human eye and yet their presence enables the development of the more commonly used views. Making it as easy as possible to see what is going on under the covers helps with learning, with debugging, with extending and with innovating. In the same way that 99% of web users don’t look at the HTML source (and 99.99% of them don’t see the HTTP requests) but the Web would not be what it is to them if this inspectability wasn’t been there to fuel its development.

Along the same line, make as few assumptions as possible about the consumers in your interfaces. Which, in practice, often means document what goes on the wire. WSDL/WADL can be used as a format, but they are at most one small component. Human-readable semantics are much more important.

Finding #7: Nothing is free

Part of what was so attractive about SOAP is everything you were going to get “for free” by using it. Message-level security (for all these use cases where your messages starts over HTTP, then hops onto a train, then get delivered by a carrier pigeon). Reliable messaging. Transactionality. Intermediaries (they were going to be a big deal in SOAP, as you can see in vestigial form today in the Nodes/Roles left in the spec – also, do you remember WS-Routing? I do.)

And it’s true that by now there is a body of specifications that support this as composable SOAP headers. But the lack of usage of these features contrasts with how often they were bandied in the early days of SOAP.

Well, I am detecting some of the same in the REST camp. How often have you heard about how REST enables caching? Or about how content types allows an ISP to compress images on the fly to speed up delivery over dial-up? Like in the SOAP case, these are real features and sometimes useful. It doesn’t mean that they are valuable to you. And if they are not, then don’t let them be used as justifications. Especially since they are not free. If caching doesn’t help me (because of low volume, because security considerations prevent a shared cache, etc) then its presence actually adds a cost to me, since I now have to worry whether something is cached or not and deal with ETags. Or I have to consistently remember to request the cache to be bypassed.

Finding #8: Starting by sweeping you front door.

Before you agonize about how RESTful your back-end management protocol is, how about you make sure that your management application (the user front-end) is a decent Web application? One with cool URIs , where the back button works, where bookmarks work, where the data is not hidden in some over-encompassing Flash/Silverlight thingy. Just saying.

***

Now for some questions still unanswered.

Question #1: Is this a flee market?

I am highly dubious of content negotiation and yet I can see many advantages to it. Mostly along the lines of finding #6: make it easy for people to look under the hood and get hold of the data. If you let them specify how they want to see the data, it’s obviously easier.

But there is no free lunch. Even if your infrastructure takes care of generating these different views for you (“no coding, just check the box”), you are expanding the surface of your contract. This means more documentation, more testing, more interoperability problems and more friction when time comes to modify the interface.

I don’t have enough experience with format negotiation to define the sweetspot of this practice. Is it one XML representation and one HTML, period (everything else get produced by the client by transforming the XML)? But is the XML Atom-wrapped or not? What about RDF? What about JSON? Not to forget that SOAP wrapper, how hard can it be to add. But soon enough we are in legacy hell.

Question #2: Mime-types?

The second part of Joe Gregorio’s WADL entry is all about Mime types and I have a harder time following him there. For one thing, I am a bit puzzled by the different directions in which Mime types go at the same time. For example, we have image formats (e.g. “image/png”), packaging/compression formats (e.g. “application/zip”) and application formats (e.g. “application/vnd.oasis.opendocument.text” or “application/msword”). But what if I have a zip full of PNG images? And aren’t modern word processing formats basically a zip of XML files? If I don’t have the appropriate viewer, maybe I’d like them to be at least recognized as ZIP files. I don’t see support for such composition and taxonomy in these types.

And even within one type, things seem a bit messy in practice. Looking at the registered applications in the “options” menu of my Firefox browser, I see plenty of duplication:

  • application/zip vs. application/x-zip-compressed
  • application/ms-powerpoint vs. application/vnd.ms-powerpoint
  • application/sdp vs. application/x-sdp
  • audio/mpeg vs. audio/x-mpeg
  • video/x-ms-asf vs. video/x-ms-asf-plugin

I also wonder at what level of depth I want to take my Mime types. Sure I can use Atom as a package but if the items I am passing around happen to be CIM classes (serialized to XML), doesn’t it make sense to advertise this? And within these classes, can I let you know which domain (e.g. which namespace) my resources are in (virtual machines versus support tickets)?

These questions may simply be a reflection of my lack of maturity in the fine art of using Mime types as part of protocol design. My experience with them is more of the “find the type that works through trial and error and then leave it alone” kind.

[Side note: the first time I had to pay attention to Mime types was back in 1995/1996, playing with non-parsed headers and the multipart/x-mixed-replace type to bring some dynamism to web pages (that was before JavaScript or even animated GIFs). The site is still up, but the admins have messed up the Apache config so that the CGIs aren't executed anymore but return the Python code. So, here are some early Python experiments from yours truly: this script was a "pushed" countdown and this one was a "pushed" image animation. Cool stuff at the time, though not in a "get a date" kind of way.]

On the other hand, I very much agree with Joe’s point that “less is more”, i.e. that by not dictating how the semantics of a Mime type are defined the system forces you to think about the proper way to define them (e.g. an English-language RFC). As opposed to WSDL/XSD which gives the impression that once your XML validator turns green you’re done describing your interface. These syntactic validations are a complement at best, and usually not a very useful one (see “fat-bottomed specs”).

In comments on previous posts, Stu Charlton also emphasizes the value that Mime types bring. “Hypermedia advocates exposing a variety of links for such state-transitions, along with potentially unique media types to describe interfaces to those transitions.” I get the hypermedia concept, the HATEOAS approach and its very practical benefits. But I am still dubious about the role of Mime types in achieving them and I am not the only one with such qualms. I have too much respect for Joe and Stu to dismiss it entirely, but until I get an example that makes it “click” in practice for me I won’t sweat about Mime types too much.

Question #3: Riding the Zeitgeist?

That’s a practical question rather than a technical one, but as a protocol creator/promoter you are going to have to decide whether you market it as “RESTful”. If I have learned one thing in my past involvement with standards it is that marketing/positioning/impressions matter for standards as much as for products. To a large extent, for Clouds, Linked Data is a more appropriate label. But that provides little marketing/credibility humph with CIOs compared to REST (and less buzzword-compliance for the tech press). So maybe you want to write your spec based on Linked Data and then market it with a REST ribbon (the two are very compatible anyway). Just keep in mind that REST is the obvious choice for protocols in 2009 in the same way that SOAP was a few years ago.

Of course this is not an issue if you specification is truly RESTful. But none of the current Cloud “RESTful” APIs is, and I don’t expect this to change. At least if you go by Roy Fielding’s definition (or Paul’s handy summary):

A REST API must not define fixed resource names or hierarchies (an obvious coupling of client and server). Servers must have the freedom to control their own namespace. Instead, allow servers to instruct clients on how to construct appropriate URIs, such as is done in HTML forms and URI templates, by defining those instructions within media types and link relations. [Failure here implies that clients are assuming a resource structure due to out-of band information, such as a domain-specific standard, which is the data-oriented equivalent to RPC's functional coupling].

And (in a comment) Mark Baker adds:

I’ve reviewed lots of “REST APIs”, many of them privately for clients, and a common theme I’ve noticed is that most folks coming from a CORBA/DCE/DCOM/WS-* background, despite all the REST knowledge I’ve implanted into their heads, still cannot get away from the need to “specify the interface”. Sometimes this manifests itself through predefined relationships between resources, specifying URI structure, or listing the possible response codes received from different resources in response to the standard 4 methods (usually a combination of all those). I expect it’s just habit. But a second round of harping on the uniform interface – that every service has the same interface and so any service-specific interface specification only serves to increase coupling – sets them straight.

So the question of whether you want to market yourself as RESTful (rather than just as “inspired by the proper use of HTTP illustrated by REST”) is relevant, if only because you may find the father of REST throwing (POSTing?) tomatoes at you. There is always a risk in wearing clothes that look good but don’t quite fit you. The worst time for your pants to fall off is when you suddenly have to start running.

For more on this, refer to Ted Neward’s excellent Roy decoder ring where he not only explains what Roy means but more importantly clarifies that “if you’re not doing REST, it doesn’t mean that your API sucks” (to which I’d add that it is actually more likely to suck if you try to ape REST than if you allow yourself to be loosely inspired by it).

***

Wrapping up the wrap-up

There is one key topic that I had originally included in this wrap-up but decided to remove: extensibility. Mark Hapner brings it up in a comment on a previous post:

It is interesting to note that HTML does not provide namespaces but this hasn’t limited its capabilities. The reason is that links are a very effective mechanism for composing resources. Rather than composition via complicated ‘embedding’ mechanisms such as namespaces, the web composes resources via links. If HTML hadn’t provided open-ended, embeddable links there would be no web.

I am the kind of guy who would have namespace-qualified his children when naming them (had my wife not stepped in) so I don’t necessarily see “extension via links” as a negation of the need for namespaces (best example: RDF). The whole topic of embedding versus linking is a great one but this post doesn’t need another thousand words and the “REST in practice” umbrella is not necessarily the best one for this discussion. So I hereby conclude my “REST in practice for IT and Cloud management” series, with the intent to eventually start a “Linked Data in practice for IT and Cloud management” series in which extensibility will be properly handled. And we can also talk about querying (conspicuously absent from Cloud APIs, unless CMDBf is now a Cloud API) and versioning. As a teaser for the application of Linked Data to IT/Cloud, I will leave you with what Vint Cerf has to say.

[UPDATED 2010/1/27: I still haven't written the promised "Linked Data in practice for IT and Cloud management" post, but this explanation of the usage of Linked Data for data.gov.uk pretty much says it all. I may still write a post describing how what Jeni says about government data applies to Cloud management APIs, but it's almost too obvious to bother. Actually, there may be reasons why Cloud management benefits even more from Linked Data than UK government data, so it may still be worth a post. At some point. When I convince myself that it may influence things rather than be background noise.]

Post

Can I get a price check on this AMI?

In Amazon,Application Mgmt,Business,Cloud Computing,Everything,IBM,IT Systems Mgmt,Utility computing on December 4, 2009 by @vambenepe

I almost titled this entry “Cloud + Tivoli = $” in reference to the previous one (“Cloud + proprietary software = ♥”). In that earlier entry, I described the opportunity for Cloud providers to benefit themselves, their customers and software vendors by drastically reducing the frictions involved in using proprietary software (rather than open source software). The example I used was Windows EC2 instances. But it’s not the best example because there is a very tight relationship between Amazon and Microsoft on this. In many ways, these Windows instances are “hard-coded” in EC2: they have a special credential retrieval mechanism, their price appears in the main EC2 price list, etc. This cannot scale as a generic Amazon-mediated payment service for many software vendors.

Rather than the special case of Windows instances, the more interesting situation to look at is the availability of vendor-provided EC2 instances at a higher price. So I went to look a bit more into this, and I came out… very confused and $20 poorer.

Earlier in the week, I had noticed an announcement of IBM Tivoli on EC2 that explained that “the hourly price for Tivoli on EC2 includes an IBM license”. This seemed like a perfect place for me to start. My first question was “how much does it cost”? The blog entry doesn’t say. It links to a Tivoli on EC2 FAQ on the IBM site, which doesn’t say either (apparently IBM’s target customers work in recession-proof industries and do not “frequently ask” about prices). I then followed the link to the overall IBM and AWS FAQ but it just states that “charges will be announced by Amazon Web Services in the coming months”. Both FAQs explain how to use your traditional IBM license on EC2, but that’s not what I am after. At this point, I feel like a third-world tourist who entered a high-end jewelry store in Paris where no price is displayed. Call me plebeian, but I am more accustomed to Target-like stores with price-check scanners in the aisles…

I hypothesized that the AWS console might show me the price when I select the Tivoli AMIs. But no such luck. Tired of searching, and since I was already in the console, I figured I’d just launch an instance and see the hourly cost in my account usage. Since it comes in three versions (depending on how many targets you want Tivoli to manage), I launched one of each. Additionally, for one of them I launched instances of two different sizes so I can verify that the price difference is equal to the base EC2 price difference between such instance sizes. Here is what I got:

ec2-tivoli-bill

Of course, by the time my account usage page was updated (it took a few hours) I had found the price list which in retrospect wasn’t that hard to find (from Amazon, not IBM).

So maybe I am not the brightest droplet in the cloud, but for 20 bucks I consider that at least I bought the right to make a point: these prices should not be just on some web page. They should be accessible at the time of launch, in the console. And also in the EC2 API, so that the various EC2 tools can retrieve them. Whether it’s just for human display or to use as part of some automation logic, this should be available in an authoritative manner, without the need to scrape a page.

The other thing that bothers me is the need to decide upfront whether I want to launch a Tivoli instance to manage 50 virtual cores, 200 virtual cores or 600 virtual cores. That feels very inelastic for an EC2 deployment. I want to be charged for the actual number of virtual cores I am managing at any point in time. I realize the difficulty in metering this way (the need for Tivoli to report this to AWS, the issue of trust…) but hopefully it will eventually get there.

While I am talking about future improvements, another limitation is that there can currently only be one vendor per AMI. What if someone wants to write an application that runs on top of Oracle Middleware and package this as a paid AMI? It would be nice if Amazon eventually allowed the price of the instance to be split three-ways (Amazon, Oracle, application vendor).

In any case, now you know why this investigation left me poorer. The confused part comes from the fact that I had earlier experimented with Amazon Paid AMIs and it was an entirely different experience. Better in some ways: you get a clear price list upfront such as this.

ec2-paid-ami-price

But not as good in other ways: you have to purchase the paid AMI in a way that it is somewhat disconnected from the launch of the instance. And for some reason you paid for this directly out of your credit card as opposed to it going to your AWS usage account along with all your other charges. I would expect that many customers will use these paid AMIs are part of a larger EC2 deployment and as such it seems awkward to have it billed separately.

But overall, it’s the disconnect between the two that the confuses me. Are there two different types of paid AMIs (three if you include the Windows EC2 instances)? What am I missing?

The next step in my investigation should probably be to create an AMI and set a price on it, so I get the vendor’s view in addition to the consumer’s view. And maybe I can earn my $20 back in the process…

Post

Cloud + proprietary software = ♥

In Amazon,Application Mgmt,Automation,Business,Cloud Computing,Everything,Open source,Oracle,Utility computing,Virtual appliance on December 2, 2009 by @vambenepe

When I left HP for Oracle, in the summer of 2007, a friend made the argument that Cloud Computing (I think we were still saying “Utility Computing” at the time) would be the death of proprietary software.

There was a lot to support this view. EC2 was one year old and its usage was overwhelmingly based on open source software (OSS). For proprietary software, there was no clear understanding of how licensing terms applied to Cloud deployments. Not only would most sales reps not know the answer, back then they probably wouldn’t have comprehended the question.

Two and a half years later… well it doesn’t look all that different at first blush. EC2 seems to still be a largely OSS-dominated playground. Especially since the most obvious (though not necessarily the most accurate) measure is to peruse the description/content of public AMIs. They are (predictably since you can’t generally redistribute proprietary software) almost entirely OSS-based, with the exception of the public AMIs provided by software vendors themselves (Oracle, IBM…).

And yet the situation has improved for usage of proprietary software in the Cloud. Most software vendors have embraced Cloud deployments and clarified licensing issues (taking the example of Oracle, here is its Cloud Computing Center page, an overview of the licensing policy for Cloud deployments and the AWS/Oracle partnership page to encourage the use of Oracle software on EC2; none of this existed in 2007).

But these can be called reactive adaptations. At best (depending on the variability of your load and whether you use an Unlimited License Agreement), these moves have brought the “proprietary vs. OSS” debate in the Cloud back to the same parameters as for on-premise deployments. They have simply corrected the initial challenge of not even knowing how to license proprietary software for Cloud deployments.

But it’s not stopping here. What we are seeing is some of the parameters for the “proprietary vs. OSS” debate become more friendly towards proprietary software in the Cloud than on-premise. I am referring to the emergence of EC2 instances for which the software licenses are included in the per-hour rate for server instances. This pretty much removes license management as a concern altogether. The main example is Windows EC2 instances. As a user, you don’t have to deal with any Windows license consideration. You just request a machine and use it. Which of course doesn’t mean you are not paying for Windows. These Windows instances are more expensive than comparable Linux instances (e.g. $0.34 versus $0.48 per hour in the case of a “standard/large” instance) and Amazon takes care of paying Microsoft.

The removal of license management as a concern may not make a big difference to large corporations that have an Unlimited License Agreement, but for smaller companies it may take a chunk out of the reasons to use open source software. Not only do you not have to track license usage (and renewal), you never have to spend time with a sales rep. You don’t have to ask yourself at what point in your beta program you’ve moved from a legitimate use of the (often free) development license to a situation in which you need a production license. Including the software license directly in the cost of the base Cloud resource (e.g. the virtual machine instance) makes planning (and auto-scaling) easier: you use the same algorithm as in the “free license” situation, just with a different per hour cost. The trade-off becomes quantitative rather than qualitative. You can trade a given software stack against a faster CPU or more memory or more storage, depending on which combination serves your needs better. It doesn’t matter if the value you get from the instance comes from the software in the image or the hardware. This moves IaaS closer to PaaS (force.com doesn’t itemize your bill between hardware cost and software cost). Anything that makes IaaS more like PaaS is good for IaaS.

From an earlier post, about virtual appliances:

As with all things computer-related, the issue is going to get blurrier and then irrelevant . The great thing about software is that there is no solid line. In this case, we will eventually get more customized appliances (via appliance builders or model-driven appliance generation) blurring the line between installed software and appliance-based software.

I was referring to a blurring line in terms of how software is managed, but it’s also true in terms of how it is licensed.

There are of course many other reasons (than license management) why people use open source software rather than proprietary. The most obvious being the cost of the license (which, as we have seen, doesn’t go away but just gets incorporated in the base Cloud instance rate). Or they may simply prefer a given open source product independently of any licensing aspect. Some need access to the underlying code, to customize/improve it for their purpose. Or they may be leery of depending on one entity for the long-term viability of their platform. There may even be some who suspect any software that they don’t examine/compile themselves to contain backdoors (though these people are presumably not candidates for Cloud deployments in he first place). Etc. These reasons remain pretty much unchanged in the Cloud. But, anecdotally at least, removing license management concerns from both manual and automated tasks is a big improvement.

If Cloud providers get it right (which would require being smarter than wireless service providers, a low bar) and software vendors play ball, the “proprietary vs. OSS” debate may become more favorable to proprietary software in the Cloud than it is on-premise. For the benefit of customers, software vendors and Cloud providers. Hopefully Amazon will succeed where telcos mostly failed, in providing a convenient application metering/billing service for 3rd-party software offered on top of their infrastructural services (without otherwise getting in the way). Anybody remembers the Minitel? Today we’d call that a “Terminal as a Service” offering and if it did one thing well (beyond displaying green characters) it was billing. Which reminds me, I probably still have one in my parent’s basement.

[Note: every page of this blog mentions, at the bottom, that "the statements and opinions expressed here are my own and do not necessarily represent those of Oracle Corporation" but in this case I will repeat it here, in the body of the post.]

[UPDATED 2009/12/29: The Register seems to agree. In fact, they come close to paraphrasing this blog entry:

"It's proprietary applications offered by enterprise mainstays such as Oracle, IBM, and other big vendors that may turn out to be the big winners. The big vendors simply manipulated and corrected their licensing strategies to offer their applications in an on-demand or subscription manner.

Amazonian middlemen

AWS, for example, now offers EC2 instances for which the software licenses are included in the per-hour rate for server instances. This means that users who want to run Windows applications don't have to deal with dreaded Windows licensing - instead, they simply request a machine and use it while Amazon deals with paying Microsoft."]

[UPDATED 2010/1/25: I think this "Cloud as Monetization Strategy for Open Source" post by Geva Perry (based on an earlier post by Savio Rodrigues ) confirms that in the Cloud the line between open source and proprietary software is thinning.]

[UPDATED 2010/11/12: Related and interesting post on the AWS blog: Cloud Licensing Models That Exist Today]

Post

Can your hypervisor radio for air support?

In Application Mgmt,Azure,Cloud Computing,Everything,IT Systems Mgmt,Mgmt integration,Utility computing,Virtualization on November 25, 2009 by @vambenepe

As I was reading about Microsoft Azure recently, a military analogy came to my mind. Hypervisors are tanks. Application development and runtime platforms compose the air force.

Tanks (and more generally the mechanization of ground forces) transformed war in the 20th century. They multiplied the fighting capabilities of individuals and changed the way war was fought. A traditional army didn’t stand a chance against a mechanized one. More importantly, a mechanized army that used the new tools with the old mindset didn’t stand a chance against a similarly equipped army that had rethought its strategy to take advantage of the new capabilities. Consider France at the beginning of WWII, where tanks were just canons on wheels, spread evenly along the front line to support ground troops. Contrast this with how Germany, as part of the Blitzkrieg, used tanks and radios to create highly mobile – and yet coordinated – units that caused havoc in the linear French defense.

Exercise for the reader who wants to push the analogy further:

  • Describe how Blitzkrieg-style mobility of troops (based on tanks and motorized troop transports) compares to Live Migration of virtual machines.
  • Describe how the use of radios by these troops compares to the use of monitoring and control protocols to frame IT management actions.

Tanks (hypervisor) were a game-changer in a world of foot soldiers (dedicated servers).

But no matter how good your tanks are, you are at a disadvantage if the other party achieves air superiority. A less sophisticated/numerous ground force that benefits from strong air support is likely to prevail over a stronger ground force with no such support. That’s what came to my mind as I read about how Azure plans to cover the IaaS layer, but in the context of an application-and-data-centric approach. Where hypervisors are not left to fend for themselves based on the limited view of the horizion from the periscope of their turrets but rather orchestrated, supported (and even deployed) from the air, from the application platform.

C-130 tank airdrop

(Yes, I am referring to the Azure vision as it was presented at PDC09, not necessarily the currently available bits.)

Does your Cloud vendor/provider need an air force?

Exercise for the reader who wants to push the analogy to the stratosphere:

  • Describe how business logic/process, business transaction management and business intelligence are equivalent to satellites, surveying the battlefield and providing actionable intelligence.

The new Cloud stack (“military-cloud complex” version):

cloud-military-stack

[Note: I have no expertise in military history (or strategy) beyond high school classes about WWI and WWII, plus a couple of history books and a few war movies. My goal here is less to be accurate on military concerns (though I hope to be) than to draw an analogy which may be meaningful to fellow IT management geeks who share my level of (in)expertise in military matters. This is just yet another way in which I try to explain that, for Clouds as for plain old IT management, "it's the application, stupid".]

Post

Review of Fujitsu’s IaaS Cloud API submission to DMTF

In Automation,Cloud Computing,DMTF,Everything,IT Systems Mgmt,Mgmt integration,Modeling,Specs,Standards,Utility computing,Virtualization on November 19, 2009 by @vambenepe

Things are heating up in the DMTF Cloud incubator. Back in September, VMWare submitted its vCloud API (or rather a “reader’s digest” version of it) to the group. Last week, the group released a white paper titled “Interoperable Clouds”. And a second submission, from Fujitsu, was made last week and publicly announced today.

The Fujitsu submission is called an “API design”. What this means is that it doesn’t tell you anything about what things look like on the wire. It could materialize as another “XML over HTTP” protocol (with or without SOAP wrapper), but it could just as well be implemented as a binary RPC protocol. It’s really more of an esquisse of a resource model than a remote API. The only invocation-related aspect of the document is that it defines explicit operations on various resources (though not their input and outputs). This suggest that the most obvious mapping would be to some XML/HTTP RPC protocol (SOAPy or not). In that sense, it stands out a bit from the more recent Cloud API proposals that take a “RESTful” rather than RPC approach. But in these days of enthusiastic REST-washing I am pretty sure a determined designer could produce a RESTful-looking (but contorted) set of resources that would channel the operations in the specification as HTTP-like verbs on these resources.

Since there are few protocol aspects to this “API design”, if we are to compare it to other “Cloud APIs”, it’s really the resource model that’s worth evaluating. The obvious comparison is to the EC2 model as it provides a pretty similar set of infrastructure resources (it’s entirely focused on the IaaS layer). It lacks EC2 capabilities around availability, security and monitoring. But it adds to the EC2 resource model the notions of VDC (“virtual data center”, a container of IaaS resources), VSYS (see below) and a lightly-defined EFM (Extended Function Module) concept which intends to encompass all kinds of network/security appliances (and presumably makes up for the lack of security groups).

The heart of the specification is the VSYS and its accompanying VSYS Descriptor. We are encouraged to think of the VSYS Descriptor as an extension of OVF that lets you specify this kind of environment:

Example content for a VSYS Descriptor

Example content for a VSYS Descriptor

By forcing the initial VSYS instance to be based on a VSYS Descriptor, but then allowing the VSYS to drift away from the descriptor via direct management actions, the specification takes a middle-of-the-road approach to the “model-based versus procedural” debate. Disciples of the procedural approach will presumably start from a very generic and unconstrained VSYS Descriptor and, from there, script their way to happiness. Model geeks will look for a way to keep the system configuration in sync with a VSYS Descriptor.

How this will work is completely undefined. There is supposed to be a getVSYSConfiguration() operation which “returns the configuration information on the VSYS” but there is no format/content proposed for the response payload. Is this supposed to return every single config file, every setting (OS, MW, application) on all the servers in the VSYS? Surely not. But what then is it supposed to return? The specification defines five VSYS attributes (VSYSID, creator, createTime, description and baseDescriptor) so I know what getSYSAttributes() returns. But leaving getVSYSConfiguration() undefined is like handing someone an airplane maintenance manual that simply reads “put the right part in the right place”. A similar feature is also left as an exercise to the reader in section that sketches an “external configuration service”. We are provided with a URL convention to address the service, but zero information about the format and content of the configuration instructions provided to the VServer.

EC2 has a keypair access mechanism for Linux instances and a clumsy password-retrieval system for Windows instances. The Fujitsu proposal adopts the lowest common denominator (actually the greatest common divisor, but that’s a lost rhetorical cause): random password generation/retrieval for everyone.

I also noticed the statement that a VServer must be “implemented as a virtual machine” which is an unnecessary constraint/assumption. The opposite statement is later made for EFMs, which “can be implemented in various ways (e.g. run on virtual machines or not)”, so I don’t want to read too much into the “hypervisor-required” VServer statement which probably just needs an editorial clean-up.

From a political perspective this specification looks more like a case of “can I play with you? I brought some marbles” than a more aggressive “listen everybody, we’re playing soccer now and I am the captain”. In other words, this may not be as much an attempt to shape the outcome of the incubator as much as to contribute to its work and position Fujitsu as a respected member whose participation needs to be acknowledged.

While this is an alternative submission to the vCloud API, I don’t think VMWare will feel very challenged by it. The specification’s core (VSYS Descriptor) intends to build on OVF, which should be music to VMWare’s ears (it’s the model, not the protocol, which is strategic). And it is light enough on technical details that it will be pretty easy for vCloud to claim that it, indeed, aligns with the intent of this “design”.

All in all, it is good to see companies take the time to write down what they expect out of the DMTF work. And it’s refreshing to see genuine single-company contributions rather than pre-negotiated documents by a clique. Whether they look more like implementable specifications of position paper, they all provide good input to the DMTF Cloud incubator.

Post

Desirable technical characteristics of PaaS

In Application Mgmt,Cloud Computing,Everything,IT Systems Mgmt,Manageability,Mgmt integration,Middleware,PaaS,Utility computing on November 10, 2009 by @vambenepe

PaaS can most dramatically improve the IT experience in four areas:

  • Hosting/operations efficiency
  • Application-centric management
  • Development productivity
  • Security

To do so, there are technical characteristics that PaaS frameworks should eventually exhibit. These are not technical characteristics of a given PaaS container, they are shared characteristics that go across all container types, no matter what the operational capabilities of the containers are.

Here is a rough and unorganized list of the desirable characteristics (meta-capabilities) of PaaS Cloud containers:

  • An application component model that supports deployment/configuration across all PaaS container types.
  • Explicit interactions/invocations between application components (resilient connections between component: infrastructure-level retry/reroute)
  • Uniform and consistent request tracking across all components. Ability to intercept component-to-component communication.
  • Short-term (or externally persisted) state so that all instances can be quickly redirected out of any one node.
  • Subset of platform management interface exposed to consumer, along with out of the box application management. Application metrics consolidated at application level rather than node level.
  • Consistent, model-based application management interface across all container types. Hooks for component code to provide its manageability in the same framework.
  • Minimal footprint of any container node for limited patching requirements.
  • Assistance for debugging platform-hosted code (see this entry).
  • No encroachment of container technology on application contract (e.g. no forced URL structure).
  • Application uniformly scalable to the limit of the underlying hardware (no imposed partitioning).
  • Shared authentication / authorization / auditing across containers.
  • Minimum contract/interface exposed by each container.
  • Governance of application services, aligned (in model/protocols) with the container management interfaces.
  • [UPDATE: need to add metering+billing as William Louth pointed out in a comment]

This applies across the board to public, private and hybrid PaaS. The distinctions between these delivery models are real but at a different level. The important thing is that the PaaS administrator is different from the application administrator in all cases. On the other hand, most of these technical characteristics are not achievable for lower-level Cloud resources (like virtual hosts and low-level storage) which is why the IaaS form of Cloud leaves the Cloud promise only partially fulfilled.

Post

Enumeration of PaaS container types

In Application Mgmt,Cloud Computing,Everything,Middleware,PaaS,Utility computing on November 9, 2009 by @vambenepe

What do we need from an on-demand application platform for enterprise software? Here is a proposed classification of container types on which you can create/scale/manage applications. Your application is made of modules that run on these containers (e.g. one app may have two “synchronous web request” modules, one “structured data service” module and one “scheduled job” module). All tied together using an application component model that dovetails with the capabilities of the different containers. The proposed containers map to the following capabilities:

  • Synchronous web request processing
  • Long-lived (persisted) process execution with introspectable/declarative flow
  • Event processing
  • Persistence (a few different types: structured vs. buckets/files vs. object cache)
  • Batch/background/scheduled/queued processing
  • Possibly some advanced Web/mobile UI (portals, human task flows…)
  • User management (self-contained or as an abstraction layer for other systems)
  • [UPDATED 2009/11/19: I should have included pluggable protocol/format adapters.]

While new applications may be built to run purely on such a platform, you will not be able to run your IT solely on it anytime in the foreseeable future. So if you are thinking about it from the perspective of the entire universe of virtualization containers needed to support your IT system, you’ll need to add lower-level container types, such as:

  • Guest hosts (typically from hypervisors)
  • Low-level (e.g. block) storage
  • Networking between them

Another way to think about it is that Cloud/Utility computing is about having pools of resources dynamically allocated. There needs to be few enough pools that each pool is large enough and used across enough consumers to derive efficiencies from the act of sharing. But there needs to be enough pool types that applications have a complete infrastructure to run on that lets them abstracts away what is not business logic. This list is an attempt at this middle ground.

This is a classification of containers, not a description. There are many different ways to realize each of them. The synchronous  web request processing could look like a servlet engine, a Python handler or a PHP page. The persisted execution could be a BPEL engine or some other state machine definition/execution engine. The structured data interface could look like SQL, XQuery or SPARQL, etc… In addition, more specialized application infrastructure elements (e.g. video streaming, analytics) might enter the picture for some applications.

I hope to see the discussion move beyond “IaaS vs. PaaS” towards talking about more specific container types that are needed/supported by the different virtualization stacks. My application doesn’t care if the file container is considered IaaS or PaaS.

The next post will list desired characteristics of the PaaS environment (meta-capabilities that go across all the container types in the first list but may not be available from the lower-level containers in the second list).

Post

Missing out on the OCCI fun

In Cloud Computing,Everything,People,Protocols,SOAP,SOAP header,Specs,Standards,Utility computing on October 21, 2009 by @vambenepe

As a recovering “design by committee” offender I have to be careful when lurking near standards groups mailing list, for fear my instincts may take over and I might join the fray. But tonight a few tweets containing alluring words like “header” and “metadata” got the better of me and sent me plowing through a long and heated discussion thread in the OGF OCCI mailing list archive.

I found the discussion fascinating, both from a technical perspective and a theatrical perspective.

Technically, the discussion is about whether to use HTTP headers to carry “metadata” (by which I think they  mean everything that’s not part of the business payload, e.g. an OVF document or other domain-specific payload). I don’t have enough context on the specific proposal to care to express my opinion on its merits, but what I find very interesting is that this shines another light on the age-old issue of how to carry non-payload info when designing a protocol. Whatever you call these data fields, you have to specify (by decreasing order of architectural importance):

  • How you deal with unknown fields: mustUnderstand or mustIgnore semantics.
  • How you keep them apart (prevent two people defining fields by the same name, telling different versions apart).
  • How you parse their content (and are they all parsed in the same manner or is it specific to each field).
  • Where they go.

SOAP provides one set of answers.

  • You can tag each one with a mustUnderstand attribute to force any consumer who doesn’t understand them to fault.
  • They are namespace-qualified.
  • They are XML-formatted.
  • They go at the top of the XML doc, in a section called the SOAP header.

You may agree or not with the approach SOAP took, but it’s important to realize that at its core SOAP is just this: the answer (in the form of the SOAP processing model) to these simple questions (here is more about the SOAP processing model and the abuses it has suffered if you’re interested). WSDL is something else. The WS-* stack is also something else. It’s probably too late to rescue SOAP from these associations, but I wanted to point this out for the record.

Whatever you answer to the four “non-payload data fields” questions above, there are many practical concerns that you have to consider when validating your proposal. They may not all be relevant to your use case, but then explicitly decide that they are not. They are things like:

  • Performance
  • Ability to process in a stream-based system
  • Ease of development (tool support, runtime accessibility…)
  • Ease of debugging
  • Field length limitations
  • Security
  • Ability to structure the data in the fields
  • Ability to use different transports (way overplayed in SOAP, but not totally irrelevant either)
  • Ability to survive intermediaries / proxies

Now leaving the technology aside, this OCCI email thread is also interesting from a human and organizational perspective. Another take on the good old Commedia dell standarte. Again, I don’t have enough context in the history of this specific group to have an opinion about the dynamics. I’ll just say that things are a bit more “free-flowing” than when people like my friend Dave Snelling were in charge in OGF. In any case, it’s great that the debate is taking place in public. If it had been a closed discussion they probably would not have benefited from Tim Bray dropping in to share his experience. On the plus side, they would have avoided my pontifications…

Post

Cloud platform patching conundrum: PaaS has it much worse than IaaS and SaaS

In Application Mgmt,Cloud Computing,Everything,Google App Engine,Governance,ITIL,Manageability,Mgmt integration,PaaS,SaaS,Utility computing,Virtualization on October 15, 2009 by @vambenepe

The potential user impact of changes (e.g. patches or config changes) made on the Cloud infrastructure (by the Cloud provider) is a sore point in the Cloud value proposition (see Hoff’s take for example). You have no control over patching/config actions taken by the provider, any of which could potentially affect you. In a traditional data center, you can test the various changes on specific applications; you don’t have to apply them at the same time on all servers; and you can even decide to skip some infrastructure patches not relevant to your application (“if it aint’ broken…”). Not so in a Cloud environment, where you may not even know about a change until after the fact. And you have no control over the timing and the roll-out of the patch, so that some of your instances may be running on patched nodes and others may not (good luck with troubleshooting that).

Unfortunately, this is even worse for PaaS than IaaS. Simply because you seat on a lot more infrastructure that is opaque to you. In a IaaS environment, the only thing that can change is the hardware (rarely a cause of problem) and the hypervisor (or equivalent Cloud OS). In a PaaS environment, it’s all that plus whatever flavor of OS and application container is used. Depending on how streamlined this all is (just enough OS/AS versus a traditional deployment), that’s potentially a lot of code and configuration. Troubleshooting is also somewhat easier in a IaaS setup because the error logs are localized (or localizable) to a specific instance. Not necessarily so with PaaS (and even if you could localize the error, you couldn’t guarantee that your troubleshooting test runs on the same node anyway).

In a way, PaaS is squeezed between IaaS and SaaS on this. IaaS gets away with a manageable problem because the opaque infrastructure is not too thick. For SaaS it’s manageable too because the consumer is typically either a human (who is a lot more resilient to change) or a very simple and well-understood interface (e.g. IMAP or some Web services). Contrast this with PaaS where the contract is that of an application container (e.g. JEE, RoR, Django).There are all kinds of subtle behaviors (e.g, timing/ordering issues) that are not part of the contract and can surface after a patch: for example, a bug in the application that was never found because before the patch things always happened in a certain order that the application implicitly – and erroneously – relied on. That’s exactly why you always test your key applications today even if the OS/AS patch should, in theory, not change anything for the application. And it’s not just patches that can do that. For example, network upgrades can introduce timing changes that surface new issues in the application.

And it goes both ways. Just like you can be hurt by the Cloud provider patching things, you can be hurt by them not patching things. What if there is an obscure bug in their infrastructure that only affects your application. First you have to convince them to troubleshoot with you. Then you have to convince them to produce (or get their software vendor to produce) and deploy a patch.

So what are the solutions? Is PaaS doomed to never go beyond hobbyists? Of course not. The possible solutions are:

  • Write a bug-free and high-performance PaaS infrastructure from the start, one that never needs to be changed in any way. How hard could it be? ;-)
  • More realistically, narrowly define container types to reduce both the contract and the size of the underlying implementation of each instance. For example, rather than deploying a full JEE+SOA container componentize the application so that each component can deploy in a small container (e.g. a servlet engine, a process management engine, a rule engine, etc). As a result, the interface exposed by each container type can be more easily and fully tested. And because each instance is slimmer, it requires fewer patches over time.
  • PaaS providers may give their users some amount of visibility and control over this. For example, by announcing upgrades ahead of time, providing updated nodes to test on early and allowing users to specify “freeze” periods where nothing changes (unless an urgent security patch is needed, presumably). Time for a Cloud “refresh” in ITIL/ITSM-land?
  • The PaaS providers may also be able to facilitate debugging of infrastructure-related problem. For example by stamping the logs with a version ID for the infrastructure on the node that generated the log entry. And the ability to request that a test runs on a node with the same version. Keeping in mind that in a SOA / Composite world, the root cause of a problem found on one node may be a configuration change on a different node…

Some closing notes:

  • Another incarnation of this problem is likely to show up in the form of PaaS certification. We should not assume that just because you use a PaaS you are the developer of the application. Why can’t I license an ISV app that runs on GAE? But then, what does the ISV certify against? A given PaaS provider, e.g. Google? A given version of the PaaS infrastructure (if there is such a thing… Google advertises versions of the GAE SDK, but not of the actual GAE runtime)? Or maybe a given PaaS software stack, e.g. the Oracle/Microsoft/IBM/VMWare/JBoss/etc, meaning that any Cloud provider who uses this software stack is certified?
  • I have only discussed here changes to the underlying platform that do not change the contract (or at least only introduce backward-compatible changes, i.e. add APIs but don’t remove any). The matter of non-compatible platform updates (and version coexistence) is also a whole other ball of wax, one that comes with echoes of SOA governance discussions (because in PaaS we are talking about pure software contracts, not hardware or hardware-like contracts). Another area in which PaaS has larger challenges than IaaS.
  • Finally, for an illustration of how a highly focused and specialized container cuts down on the need for config changes, look at this photo from earlier today during the presentation of JRockit Virtual Edition at Oracle Open World. This slide shows (in font size 3, don’t worry you’re not supposed to be able to read), the list of configuration files present on a normal Linux instance, versus a stripped-down (“JeOS”) Linux, versus JRockit VE.


By the way, JRockit VE is very interesting and the environment today is much more favorable than when BEA first did it, but that’s a topic for another post.

[UPDATED 2009/10/22: For more on this (in an EC2-centric context) see section 4 ("service problem resolution") of this IBM paper. It ends with "another possible direction is to develop new mechanisms or APIs to enable cloud users to directly and automatically query and correlate application level events with lower level hardware information to better identify the root cause of the problem".]

[UPDATES 2012/4/1: An example of a PaaS platform update which didn't go well.]

Post

PaaS as a satisfying and success-ready hobbyist platform

In Cloud Computing,Everything,Google,Google App Engine,Implementation,Mashup,Microsoft,PaaS,Utility computing,WSO2 on September 30, 2009 by @vambenepe

I don’t know anyone in Silicon Valley who can code and doesn’t fantasize about writing an accidental killer app. One that gets designed during a long layover in DEN and implemented in a rainy weekend (El Nino is my VC). One that was only supposed to meet the needs of a few friends and is used by half of the world a few months/years later.

I am not talking about seasoned entrepreneurs, who have a network, discipline, resources and enough experience to know that it takes a lot more than a cool idea. Rather, about programming hobbyists (who may of may not be programmers in their day jobs),

By definition, hobbyists only do things that are satisfying. In the rarefied air of Silicon Valley, it also helps if there is a conceivable “upside” to dream about. Platform as a Service (PaaS, e.g. Google App Engine) provides both to software-oriented hobbyist. And make it very cheap (borderline free), which doesn’t hurt.

Satisfying

In a well-crafted PaaS environment, the development process and the result are both satisfying. I am not a Google shrill, but GAE is a fair example. The barrier to entry is very low (the download is less than 10MB and contains all you need to get started). In an hour you have an application running locally. In an hour and 5 minutes you have it deployed and accessible on the web for all. And yet this ease of bootstrap does not come at the cost of too many longer-term limitations (now that the environment has gown a bit from the original limitations and provides scheduled and background jobs). Unlike Yahoo Pipes, for which the first impression is “nifty!” and the second is “gimme a textual representation of my pipe now!”.

Beyond the easy ramp-up, the main source of satisfaction developing in a PaaS environment is that you spend 99% of your time working on the application. Not the OS, not the firewall, not the application container, not the database. Not to mention having to deal with your co-lo provider or the leased line for the servers in the basement. If you are a hobbyist with only a few spare hours per week, that’s a make or break deal. It also means that you have a fighting chance of developing a secure application because you are responsible for a much smaller surface of attack.

Eons ago (in computing time), Visual Basic was the name of the game for these people. More recent was the rise of PHP. It dramatically lowered the barrier to adoption and provided a quick route to a working web application. I know several non-professional developers (e.g. web designers) who are scared of any “normal” programming language but happily write PHP (often of equivalent complexity BTW). Combine this with the wide availability of ISP-managed PHP environments and you get close to what GAE gives you. At the risk of adding to the annoying trend of retroactively cloudifying everything, I think of ISP-hosted PHP as the first generation of PaaS. But it is focused on “show what’s in the DB” scenarios rather than service-centric / mash-up / web 2.0 integration. And even for DB-centric scenarios, by and large PHP coders don’t want to think too much about model and queries (and it shows). I think Google decided to go with Python rather than the easy route of aping the hosted PHP environments in large part to avoid hitting such ceilings down the road. Not surprisingly, PHP support is currently the most requested GAE feature, ahead of Perl and Ruby. Lets see if Google tries to get the PHP community on board or prefers to stay clear of such PaaS legacy (already!).

Ready for success

In the unlikely event that your application catches fire and sees wide adoption (which is not impossible, especially if well integrated in a social network), what are you going to do about it? Keeping in mind two constraints: first, this is a part-time hobby of yours. Second, don’t dream of riches. We are talking about an influx of facebookers or twitterers here, with no intention to pay for anything. But click they will. If you were going to answer: “I get funding and hire a real IT staff” then think again. You most likely won’t get funding for your toy app without revenue potential. And even if you do, by the time you have it it’s too late and people have moved on because you were not there when the spotlight was on you.

With a PaaS-based application you have a fighting chance. If the spike is short enough, you may not even hit the limit of the free quota. If it does, you have the choice of whether you are willing to pay to support the extra traffic or not. No change in code required (though it may be advised anyway, if your app wasn’t architected for efficient scaling – PaaS doesn’t entirely take this off your hands).

That “sudden spike” story is a commonly-invoked use case for EC2. And it’s probably true for a start-up with an IT staff (of at least one full-time person). But despite Amazon’s efforts (and other providers such as RightScale) this type of scaling is something you have to architect for and putting it in place takes away from the time you spend coding application features. It also means that you are responsible for more infrastructure (OS and application container at least). Not to mention that IaaS providers don’t usually offer free resources for limited usage, the way Google does (I suspect 99% of GAE apps never get over this quota). Even if a small EC2 instance is not very expensive (though it adds up over time if you keep it up for that occasional user), the difference between “free” and “cheap” is significant. As an application provider you’d like for this not to be the case with your users, but as a consumer of infrastructure service you’re on the other side of the deal, aren’t you?

There is a reason why suburbia-bound SUVs are advertised crossing mountain streams. The “I could if I wanted to” line has appeal. For the software hobbyist, knowing that your application won’t crash if it happened to meet success (even if only for a couple of days, e.g. the Slashdot effect) is a good feeling (“I could if *they* wanted to”). In truth this occasion is rare (and likely to end up like this), but you are ready for the eventuality. And if there are enough such hobbyists, then statistically some will encounter it.

The provider’s upside

That last point brings the topic of the PaaS provider’s upside in this. I have read several critical comments arguing that no company will rewrite their application for GAE (true) and that no start-up will write their new code for it either because of the risk of lock-in (also true: “being bought by Google” is not a bad outcome but “has to be bought by Google” is a bad exit strategy). But I think this misses the point of casting such a wide nest and starting with creating a great tool for hobbyists.

After all, Google has made a great business monetizing millions of small sites none of which makes much money by itself. At the very least least this can grow the web and, symbiotically, Google. With  two possible upsides:

  • Some of these hobbyist applications may actually take off and Google becomes their natural partner/godfather (including managing their user accounts). For example, wouldn’t it be nice for Google if Craigslist or Twitter was running on GAE?
  • The platform eventually evolves into something that makes sense for start-ups, SMBs and/or enterprises to use. Google works out the kinks with less demanding users first.

Interesting times

Two closing thoughts, which I’ll leave undeveloped for now:

  • There is an especially good synergy between mobile apps and PaaS. Once you get past the restaurant tip calculators, many mobile apps need a server-hosted sidekick to do the heavy lifting of gathering/storing/transforming data. As a hobbyist, you want to spend most of your time making you mobile app cool. Which leaves even less time for administrating server components. On the server side, you are even less likely to want to deal with anything but application logic. PaaS is especially attractive in these scenarios. Google and Microsoft have to navigate these waters carefully but look for some synergy/integration stories around GAE + Android and Azure + Windows Mobile respectively. Not clear what Apple’s story is here or if they think they need one. If it surfaces as an issue then we have yet another reason to restart the “Apple buys Adobe” rumor. Or maybe Sanjiva will get a middle-of-the-night call from Steve Jobs…
  • A platform to build/run your application is one thing. A way to reach users is another (arguably much more critical). Things like mobile app stores (especially Apple’s of course), Facebook and next generation app stores. But this goes  beyond the scope of this post.

Just to be clear, I am not in any way suggesting that PaaS is only for hobbyists. I am just saying that right now it is a great tool for them, the best way for an individual programmer to have fun and have an impact. This doesn’t take away from the value that PaaS will eventually deliver to larger organizations.

[UPDATED 2009/10/4: Microsoft Azure apparently supports PHP.]

Post

The future (2006 version), has arrived

In Application Mgmt,Automation,Cloud Computing,CMDB,CMDB Federation,CMDBf,Desired State,Everything,IT Systems Mgmt,Manageability,Mgmt integration,Modeling,Protocols,SML,Specs,Standards,Utility computing,WS-Management on September 24, 2009 by @vambenepe

Remember 2006? Things were starting to fall into place for IT management integration and automation:

  • SDD was already on its way to cleanly describe/package/manage the lifecycle of simple and composite applications alike,
  • the first version of SML came out to capture all the relevant constraints of complex and composite systems and open the door to “desired-state management”,
  • the CMDBf effort was started to seamlessly integrate all sources of configuration and provide a bird-eye view of your entire IT infrastructure, and
  • the WSDM/WS-Management convergence/reconciliation was announced and promised to free management consoles from supporting many resource discovery, collection and control mechanisms and from having platform/library dependencies between the manager and its targets.

It looked like we were a year or two from standardization on all these and another year or two from shipping implementations. Things were looking good.

Good news: the schedule was respected. SDD, SML and CMDBf are now all standards (at OASIS, W3C and DMTF respectively). And today the Eclipse COSMOS project announced the release of COSMOS 1.1 which implements them all. The WSDM/WS-Management convergence is the only one that didn’t quite go according to the plan but it is about to come out as a standard too (in a pared-down form).

Bad news: nobody cares. We’ve moved on to “private clouds”.

Having been involved with these specifications in various degrees (a little bit on SDD, a fair amount on SML and a lot on CMDBf and WSDM/WS-Management) I am not as detached as my sarcastic tone may suggest. But as they say in action movies, “don’t let sentiments get in the way of the mission”.

There is still a chance to reuse parts of this stack (e.g. the CMDBf query language) and there are lessons to learn from our errors. The over-promising, the technical misjudgments, the political bickering, the lack of concrete customer validation, etc. To some extent this work was also victim of collateral damages from the excesses of WS-* (I am looking at you WS-Addressing). We also failed to notice the rise of the hypervisor in our peripheral vision.

I tried to capture some important lessons in this post-mortem. For the edification of the cloud generation. I also see a pendulum in action. Where we over-engineered I now see some under-engineering (overly granular interaction models, overemphasis on the virtual machine as the unit of everything, simplistic constraint models, underestimation of config/patching issues…). Things will come around and may eventually look familiar (suggested exercise: compare PubSubHubBub with WS-Notification).

As long as each iteration gets us closer to the goal things are good.

See you in 2012. Same place, same day, same time.

Post

Thoughts on the “Simple Cloud API”

In Application Mgmt,Cloud Computing,Everything,IBM,Manageability,Mgmt integration,Microsoft,Middleware,Open source,Portability,Utility computing on September 22, 2009 by @vambenepe

PHP developers with Cloud aspirations rejoice! Zend has announced a PHP toolkit (called the Simple Cloud API project) to abstract and access application-level Cloud services. This is not just YACA (yet another Cloud API), as there are interesting differences between this and all the other Cloud toolkits out there.

First it’s PHP, which was not covered by the existing toolkits. Considering how many web applications are written in PHP (including the one that serves this very blog) this may seem strange, until you realize that most Cloud toolkits out there are focused on provisioning/managing low-level compute resources of the IaaS kind. Something that is far out of PHP’s sweetspot and much more practically handled with Java, Python, Ruby or some .NET language accessible via PowerShell.

Which takes us to the second, and arguably most interesting, characteristic of this toolkit: it is focused on application-level Cloud services (files, documents and queues for now) rather than infrastructure-level. In other word, it’s the first (to my knowledge) PaaS toolkit.

I also notice that Zend has gotten endorsements from IBM, Microsoft, Nirvanix, Rackspace and GoGrid. The first two especially seem to have impressed InfoWorld. Let’s keep in mind that at this point all we are talking about are canned quotes in a press release. Which rank only above politician campaign promises as predictor of behavior. In any case that can’t be the full extent of IBM and Microsoft’s response to the VMWare/Cisco push on IaaS standards. But it may suggest that their response will move the battlefield to include PaaS, which would be a smart move.

Now for a few more acerbic comments:

  • It has “simple” in its name, like SOAP (as Pete Lacey famously lampooned). In the long term this tends to negatively correlate with simplicity, just like the presence of “democratic” in the official name of a country does not bode well for actual democracy.
  • Please, don’t shorten “Simple Cloud API” to SCA which is already claimed in a (potentially) closely related field.
  • Reuven Cohen is technically correct to see it as “a way to create other higher level programmatic API interfaces such as REST or SOAP using an easy, yet portable PHP programming environment”. But pay attention to how many turtles are on this pile: the native provider API, the adapter to the “simple cloud API”, the SOAP or REST remote API and the consuming application’s native API. How much real isolation are you getting when you build your house on such a wobbly foundation

[UPDATE: Comments from someone in the know:  a programmer working on adding Azure support for this Simple Cloud API project.]

Post

Look Ma, no hypervisor!

In Application Mgmt,Cloud Computing,Everything,Google App Engine,Implementation,IT Systems Mgmt,Middleware,Utility computing,Virtualization,VMware,XenSource on September 21, 2009 by @vambenepe

Encouraged by hypervisor vendors, the confusion between virtualization and Cloud Computing is rampant. In the industry, the term “virtualization” (and its corollary, “virtual machine”) is used in so many different ways that it has lost all usefulness. For a recent example, read the introduction of this SNIA/OGF white paper (on Cloud Storage) which asserts that “the new technology underlying this is the system virtual machine that allows multiple instances of an operating system and associated applications to run on single physical machine. Delivering this over the network, on demand, is termed Infrastructure as a Service (IaaS)”.

In fact, even IaaS-type Cloud services don’t imply the use of hypervisors.

We need to decouple the Cloud interface/contract (e.g. “what are the types of resources that can I provision on demand? hosts, app servers, storage capacity, app services…”) from the underlying implementation (e.g. “are hypervisors used by the Cloud provider?”). At the risk of spelling out things that may be obvious to many readers of this blog, here is a simplified matrix of Cloud Computing systems, designed to illustrate that all combinations of interface and implementation are possible and in many cases even reasonable.

IaaS interface PaaS interface
Hypervisor used Yes! (see #1) Yes! (see #2)
Hypervisor not used Yes! (see #3) Yes! (see #4)

#1: IaaS interface, hypervisor-based implementation

This is a very common approach these days, both in public Clouds (EC2, Rackspace and presumably at some point the VMWare vCloud Express service providers) and private Clouds (Citrix, Sun, Oracle, Eucalyptus, VMWare…). Basically, you take a bunch of servers, put hypervisors on all of them and make VMs running on these hypervisors available to the Cloud customers.

But despite its predominance, this is not the only path to a Cloud, not even to an IaaS (e.g. “x86 hosts on demand”) Cloud. The following three other scenarios are all valid too.

#2: PaaS interface, hypervisor-based implementation

This is the road SpringSource has been on, first with Cloud Foundry (using AWS EC2 which is based on the Xen hypervisor) and presumably soon on top of VMWare.

#3: IaaS interface, no hypervisor in the implementation

Let’s remember that the utility computing vision (before the term fell in desuetude in favor of “cloud”) has been around before x86 hypervisors were so common. Take Loudcloud as an illustration. They were building what is now called a “public Cloud” starting back in 1999 and not using any hypervisor. Just bare metal provisioning and advanced provisioning automation software. Then they sold the hosting part to EDS (now HP) and only kept the software, under the name Opsware (now HP too, incidentally). That software was meant to create what we now call a “private Cloud”. See this old DCML announcement as one example of the Opsware vision. And no hypervisor was harmed in the making of this movie.

At the current point in time, the hardware (e.g. multiple cores, shared memory) and software (hypervisors, legacy apps) environment is such that hypervisor-based solutions seem to have an edge over those based on automated provisioning/configuration alone. But these things tend to change quickly in our industry… Especially if you factor in non-technical considerations like compliance, fear of data leakage and the risk of having the hardware underlying your application seized because of an investigation involving another tenant…

And this is not going into finner techno-philosophical points about the different types of hypervisors. Not to mention mainframe LPARs… One could build a hypervisor-free IaaS solution on these.

To some extent, you may even put the “pwned” machines (in a botnet) in this “IaaS with no hypervisor” category (with the small difference that what’s being made available is an x86 with an OS, typically Windows, already installed). If you factor out externalities (like the FBI breaking down your front door at 6:00AM) this approach has claims as the most cost-effective form of Cloud computing available today… Solaris zones are another example of possible foundation for a hypervisor-free IaaS-like offering (here too, with an OS rather than a “raw host” as the interface).

#4: PaaS interface, no hypervisor in the implementation

In the public sphere, this corresponds to Google App Engine.

In the private sphere, several companies have built it themselves on top of WebLogic, by adding some level of “on-demand” application provisioning in order to streamline the relationship between the IT group running the servers and the business groups who want to deploy applications on them. Something that one should ideally be able to buy rather than build.

Waiting for the question to become irrelevant

Like most deeply-ingrained confusions, the conflation of virtualization and Cloud Computing won’t be dispelled as much as made irrelevant. The four categories enumerated in this post are a point-in-time view of a continuously evolving system. What may start today as a bundle of a hypervisor, an OS and an app server may become a somewhat monolithic “PaaS engine” over time as the components are more tightly integrated. That “engine” may have memory isolation mechanisms that look a lot like a hypervisor. But it may not be able to host a generic OS. In the same way that whales don’t have fingers and toes and yet they are still very much apparent in their skeleton.

[UPDATED 2009/10/8: A real-life example of #3! On-demand servers via bare metal provisioning (via Sam). No hypervisor in the picture. See also here.]

[UPDATED 2009/12/29: Another non-hypervisor Cloud provider! NewServers. Here is their API. And a Q&A.]

Post

Cloud Data Management Interface (CDMI) draft released

In Cloud Computing,Everything,Protocols,REST,Specs,Standards,Utility computing,Virtualization on September 15, 2009 by @vambenepe Tagged:

Have you developed “Cloud API fatigue” from seeing too many IaaS “Cloud APIs” lately? Are you starting to wonder how many different ways there can possibly be to launch a virtual machine via an HTTP POST? Are you wondering why everybody else seems to equate Cloud computing with on-demand server instances?

If yes, then CDMI will come as a breath of fresh air. This specification (just a draft at this point) is a rare example of a different beast. Coming out of SNIA, it endeavors to standardize the way storage resources are managed and accessed in a Cloud environment. They call this DaaS (Data storage as a Service).

The specification has two components (which may benefit from being separated in two specifications at some point). One (called “control paths”) is an interface to manage a data storage service. That interface is expected to work across many forms of data storage from block storage (like AWS EBS) to filesystems (e.g. NFS) to object stores with a CRUD interface (similar to the WebDAV volumes of the Sun API). It also mentions a “simple table space storage” storage form, but that part is pretty fuzzy.

The second component of CDMI (called “data paths”) only applies to the CRUD object store and it describes a RESTful interface for accessing it. This figure from the specification does a good job of illustrating the two different APIs in the specification (and the different types of storage envisioned).

One of the most interesting sections in the document describes the way in which the authors envision the ability to export the storage resources provisioned/managed through CDMI to other Cloud APIs. They illustrate it in an example involving OCCI (see also this joint white paper). This is very interesting and another sign that we need a shared RESTful resource control framework for Cloud computing as a first layer of standardization. One of the reasons I used to justify this claim two weeks ago was that “there will not be one API that provides control of [all the different forms of Cloud Computing], but they can share a base protocol that will make life a lot easier for developers. These Clouds won’t be isolated, developers will use them as a continuum.” One week later, this draft specification illustrates the point very well.

[As a somewhat related side note, this interesting post about what it takes to provide a large-scale resilient data service (the Google App Engine data store). And more about the Google File System in general.]

Post

Toolkits to wrap and bridge Cloud management protocols

In Amazon,API,Automation,Cloud Computing,Everything,Google App Engine,Implementation,IT Systems Mgmt,Manageability,Mgmt integration,Open source,Protocols,Utility computing on September 11, 2009 by @vambenepe

Cloud development toolkits like Libcloud (for Python) and jcloud (for Java) have been around for some time, but over the last two months they have been joined by several other open source contenders. They all claim to abstract the on-the-wire Cloud management protocols sufficiently to let you access different Clouds via the same code; while at the same time providing objects in your programming language of choice and saving you the trouble of dealing with on-the-wire messages. By focusing on interoperability, they slot themselves below the larger role of a “Cloud broker” (which also deals with tasks like transfer and choice). Here is the list, starting with the more recent contenders:

DeltaCloud shares the same goal of translating between different Cloud management protocols but they present their own interface as yet another Cloud REST API/protocol rather than a language-specific toolkit. More along the lines of what UCI is trying to do (not sure what’s up with that project, I recorded my skepticism earlier and am still waiting to be pleasantly surprised).

Of course there are also programming toolkits that are specific to one Cloud provider. They are language-specific wrappers around one Cloud management protocol. AWS protocols (EC2, S3, etc…) represent the most common case, for example amazon-ec2 (a Ruby Gem), Power-EC2Dream (in C# which gives it the tantalizing advantage of being invokable via PowerShell) and typica (for Java). For Clouds beyond AWS, check out the various RightScale Ruby Gems.

The main point of this entry was to list the cross-Cloud development toolkits in the bullet list above. But if you’re in the mood for some pontification you can keep reading.

For some reason, what used to be called “protocols” is often called “APIs” in Cloud settings. Witness the Sun Cloud “API” or the vCloud “API” which only define XML formats for on-the-wire messages. I have never heard of CIM/XML over HTTP, WSDM or WS-Management being referred as APIs though they occupy a very similar place. They are usually considered “protocols”.

It’s a just question of definition whether an on-the-wire protocol (rather than a language-specific set of objects/methods) qualifies as an “Application Programming Interface”. It’s not an “interface” in the Java sense of the term. But I can “program” against it so it could go either way. On this blog I have gone along with the “API” term because that seemed widely used, though in verbal conversations I have tended to stick to “protocol”. One problem with “API” is that it pushes you towards mixing the “what” and the “how” and not respecting the protocol/model dichotomy.

Where is becomes relevant is when you start to see language-specific APIs for Cloud control pop-up as listed above. You now have two classes of things called “API” and it gets a bit confusing. Is it time to bring back the “protocol” term for on-the-wire definitions?

As a developer, whether you’re better off eating your Cloud noodles using chopsticks (on-the-wire protocol definitions) or a fork (language-specific APIs) is an important decision that will stay with you and may come back to bit you (e.g. when the interfaces are versioned). There is a place for both of course, but if we are to learn anything from WS-* it’s that we went way too far in the “give me a java stub” direction. Which doesn’t mean there is no room for them, but be careful how far from the wire semantics you get. It become even trickier when your stub tries not jsut to bridge between XML and Java but also to smooth out the differences between several on-the-wire protocols, as the toolkits above do. The hope, of course, is that there will eventually be enough standardization of on-the-wire protocols to make this a moot point.

Post

Separating model from protocol in Cloud APIs

In API,Automation,Cloud Computing,Everything,IT Systems Mgmt,Manageability,Mgmt integration,Modeling,Protocols,REST,Specs,Standards,Utility computing on September 4, 2009 by @vambenepe

What happened to the separation between the model and the protocol in management APIs? For all the arguments we had in the design of WSDM and WS-Management, this was one fundamental concept that took little discussion before everyone agreed: that the protocol (the interaction model and the on-the-wire shape of the messages used) should be defined in a way that is agnostic to the type of resource being managed (computers, elevators or toasters — the perennial silly example). To this end, WSDM took pains to release MUWS (Management Using Web Services) and MOWS (Management Of Web Services) as two different specifications.

Contrast that to the different Cloud APIs (there is a new one released every other day). If they have one thing in common it is that they happily ignore this principle and tackle protocol concerns alongside the resource model. Here are my guesses as to why that is:

1) It’s a land grad

The goal is not to produce the best long-term API, it’s to be out early, to stake your claim and to gain leverage, so that you can steer the final standard close to your implementation. Editorial niceties like properly factoring the specification are not major concerns, there will be plenty of time for this during the standardization process. In fact, leaving such improvements for the standardization phase is a nice way to make it look like the group is not just rubberstamping, while not changing much that actually impacts your implementation. The good old “give them something insignificant to argue about” trick. It works BTW.

As an example of how rushed some of these submissions can be, did you notice that what VMWare submitted to DMTF this week is the vCloud API Specification v0.8 (a 7-page document that is simply a list of operations), not the accompanying vCloud API programming guide v0.8 which is ten times longer and is the real specification, the place where the operation semantics, payload formats and protocol considerations are actually described and without which the previous document cannot possibly be implemented. Presumably the VMWare team was pressed to release on time for a VMWorld announcement and they came up with this to be able to submit without finishing all the needed editorial work. I assume this will follow soon and in the meantime the DMTF members will retrieve the programming guide from the VMWare site in order to make sense of what was submitted to them.

This kind of rush is not rare in the history of specification submission, even those that have been in the work for a long time . For example, the initial CBE submission by IBM had “IBM Confidential” all over the specification and a mention that one should retrieve the most up to date version from the “Autonomic Computing Problem Determination Offering Team Notes Database” (presumably non-IBMers were supposed to break into the server).

If lack of time is the main reason why all these APIs do not factor out the protocol aspects then I have no problem, there is plenty of time to address it. But I suspect that there may be other reasons, that some may see it as a feature rather than a bug. For example:

2) Anything but WS-*

SOAP-based interfaces (WS-* or WS-DeathStar) have a bad rap and doing anything in the opposite way is a crowd pleaser (well, in the blogosphere at least). Modularity and composition of specifications is a major driving force behind the WS-* work, therefore it is bad and we should make all specifications of the new REST order stand-alone.

3) Keep it simple

A more benevolent way to put it is the concern to keep things simple. If you factor specifications out you put on the developer the burden of assembling the complete documentation, plus you introduce versioning issues between the parts. One API document that fully describes the contract is simpler.

4) We don’t need no stinking’ protocol, we have HTTP

Isn’t this the protocol? Through the magic of REST, all that’s needed is a resource model, right? But if you look in the specifications you see sections about authentication, fault handling, long-lived operations, enumeration of long result sets, etc… Things that have nothing to do with the resource model.

So what?

Why is this confluence of model and protocol in one specification bad? If nothing else, the “keep it simple” argument (#3) above has plenty of merits, doesn’t it? Aren’t WSDM and WS-Management just over-engineered?

They may be, but not because they offer this separation. Consider the following practical benefits of separating the protocol from the model:

1) We can at least agree on one part

Thanks to the “REST is the new black” attitude in Cloud circles, there are lots of commonalities between these various Cloud APIs. Especially the more recent ones, those that I think of as “second generation” APIs: vCloud, Sun API, GoGrid and OCCI (Amazon EC2 is the main “1st generation” Cloud API, back when people weren’t too self-conscious about not just using HTTP but really “doing REST”). As an example of convergence between second generation specifications, see for example, how vCloud and the Sun API both use “202 Accepted” and a dedicated “status” resource to handle long-lived operations. More comparisons here.

Where they differ on such protocol matters, it wouldn’t be hard to modify one’s implementation to use an alternative approach. Things become a lot more sensitive when you touch the resource model, which reflects the actual capabilities of the Cloud management infrastructure. How much flexibility in the network setup? What kind of application provisioning? What affinity/anti-affinity control level? Can I get block-level storage? Etc. Having to implement the other guy’s interface in these matters is not just a matter of glue code, it’s a major product feature. As a result, the resource model is a much more strategic control point than the protocol. Would you rather dictate the terms of a contract or the color of the ink in which it is printed?

That being the case, I suspect that there could be relatively quick and painless agreement on that first layer of the Cloud API: a set of protocol considerations, based on HTTP and REST, that provide a resource control framework with support for security, events, long-running operations, faults, many-as-one semantics, enumeration, etc. Or rather, that if there is to be a “quick and painless” agreement on anything related to Cloud computing standards it can only be on something that is limited to protocol concerns. It doesn’t have to be long and complex. It doesn’t have to be factored in 8 different specifications like WS-* did. It can be just one specification. Keep it simple, ignore all use cases that aren’t related to Cloud Computing. In the end, please call it MUR (Management Using REST)… ;-)

2) Many Clouds, one protocol to rule them all

Whichever Cloud taxonomy strikes your fancy (I am so disappointed that SADIST-PIMP hasn’t caught on), it’s pretty clear that there will not be one kind of Cloud. There will be at least some IaaS, some PaaS and plenty of SaaS. There will not be one API that provides control of them all, but they can share a base protocol that will make life a lot easier for developers. These Clouds won’t be isolated, developers will use them as a continuum.

3) Not just one access model

As much as it makes sense to start from simple and mostly synchronous operations, there will be many different interaction models for Cloud Computing. In addition to the base operations, we may get more of a desired-state/blueprint interaction pattern, based on the same resource model. Or, somewhere in-between, some kind of stored execution flow where modules are passed around rather than individual operations. Also, as the level of automation increases you may want a base framework that is more event-friendly for rapid close-loop management. And there are other considerations involved (like resource monitoring, policies…) not currently covered by these specifications but that can surely reuse the protocol aspects. By factoring out the resource model, you make it possible for these other interaction patterns to emerge in a compatible way.

The current Cloud APIs are not far away from this clean factoring. It would be an easy task to extract protocol considerations as a separate document, in large part due to the fact that REST prevents you from burying the resource model inside convoluted operation semantics. To some extent it’s just a partitioning issue, but the same can be said of many intractable and bloody armed conflicts around the world… Good fences make good neighbors in the world of IT specs too.

[UPDATE: Soon after this entry went to "press" (meaning soon after I pressed the publish button), I noticed this report of a "REST-*" proposal by Mark Little of RedHat/JBOSS. I will reserve judgment until Mark has blogged about it or I have seen some other authoritative description. We may be talking about the same thing here. Or maybe not. The REST-* name surprises me a bit as I would expect opponents of such a proposal to name it just this way. We'll see.]

[UPDATE 2009/9/6: Apparently I am something like the 26th person to think of the "one protocol/API to rule them all" sentence. We geeks have such a shallow set of shared cultural references it's scary at times.]

[UPDATED 2009/11/12: Lori MacVittie has a very nice follow-up on this, with examples and interesting analogies. Check it out.]

Post

VMWare publishes (and submits) vCloud API

In API,Application Mgmt,Automation,Cloud Computing,DMTF,Everything,IT Systems Mgmt,Mgmt integration,Protocols,REST,Specs,Standards,Utility computing,Virtualization,VMware on September 2, 2009 by @vambenepe

VMWare published its vCloud API yesterday (it was previously only available to a few partners) and submitted it to the DMTF, as had been previously announced. So much for my speculations involving IBM.

It may be time to update the Cloud API comparison. After a very quick first pass, vCloud looks quite similar to the Sun Cloud API (that’s a compliment). For example, they both handle long-lived operations via a “202 Accepted” complemented by a resource that represents the progress (“status” for Sun, “task” for vCloud). A very visible (but not critical) difference is the use of JSON (Sun) versus XML (vCloud).

As expected, OVF/OVA is central to vCloud. More once I have read the whole specification.

In any case, things are going to get interesting in the DMTF Cloud incubator. I there a path to adoption?Assuming that Amazon keeps sitting it out, what will the other Cloud vendors with an API (Rackspace, GoGrid, Sun…) do? I doubt they ever had plans/aspirations to own or even drive the standard, but how much are they willing to let VMWare do it? How much does Citrix/Xen want to steer standards versus simply implement them in the context of the Xen Cloud project? What about OGF/OCCI with which the DMTF is supposedly collaborating?How much support is VMWare going to receive from its service provider partners? How much traction does VMWare have with Cisco, HP (server division) and IBM on this? What are the plans at Oracle and Microsoft? Speaking of Microsoft, maybe it will at some point want its standard strategy playbook back. At least when VMWare is done using it.

Post

Are these your files? I found them on my cloud

In Amazon,Cloud Computing,Everything,Google App Engine,Security,Utility computing,Virtualization,Xen on September 2, 2009 by @vambenepe

Drip drip drip… Is this the sound of your cloud leaking?

It can happen in different ways. See for example this recent research paper, titled “Hey, You, Get Off of My Cloud: Exploring Information Leakage in Third-Party Compute Clouds”. It’s a nice read, especially if you find side channels interesting (I came up with one recently, in a different context).

In the first part of the paper, the authors show how to get your EC2 instance co-located (i.e. running in in the same hypervisor) with the instance you are targeting (the one you want to spy on). Once this is achieved, they describe side channel attacks to glean information from this situation.

This paper got me thinking. I noticed that it does not mention trying to go after disk blocks and memory. I don’t know if they didn’t try or they tried and were defeated.

For disk blocks (the most obvious attack vector), Amazon is no dummy and their “proprietary  disk  virtualization  layer  automatically  wipes every block of storage used by  the customer, and guarantees  that one customer’s data  is never exposed to another” as explained in the AWS Security Whitepaper. In fact, they are so confident of this that they don’t even bother forbidding block-based recovery attempts in the AWS customer agreement (they seem mostly concerned about attacks that are not specific to hypervisor environments, like port scanning or network-based DOS). I took this as an invitation to verify their claims, so I launched a few Linux/ext3 and Windows/NTFS instances, attached a couple of EBS volumes to them and ran off-the-shelf file recovery tools. Sure enough, nothing was found on  /dev/sda2 (the empty 150GB partition of local storage that comes with each instance) or on the EBS volumes. They are not bluffing.

On the other hand, there were plenty of recoverable files on /dev/sda1. Here is what a Foremost scan returned on two instances (both of them created from public Fedora AMIs).

The first one:

Finish: Tue Sep  1 05:04:52 2009

5640 FILES EXTRACTED

jpg:= 14
gif:= 670
htm:= 1183
exe:= 2
png:= 3771
------------------------------------------------------------------

And the second one:

Finish: Wed Sep  2 00:32:16 2009

17236 FILES EXTRACTED

jpg:= 236
gif:= 2313
rif:= 11
htm:= 4886
zip:= 182
exe:= 6
png:= 9594
pdf:= 8
------------------------------------------------------------------

These are blocks in the AMI itself, not blocks that were left on the volumes on which the AMI was installed. In other words, all instances built from the same AMI will provide the exact same recoverable files. The C: drive of the Windows instance also had some recoverable files. Not surprisingly they were Windows setup files.

I don’t see this as an AWS flaw. They do a great job providing cleanly wiped raw volumes and it’s the responsibility of the AMI creator not to snapshot recoverable blocks. I am just not sure that everyone out there who makes AMIs available is aware of this. My simple Foremost scans above only looked for the default file types known out of the box by Foremost. I suspect that if I added support for .pem files (used by AWS to store private keys) there may well be a few such files recoverable in some of the publicly accessible AMIs…

Again, kudos to Amazon, but I also wonder if this feature opens a possible DOS approach on AWS: it doesn’t cost me much to create a 1TB EBS volume and to destroy it seconds later. But for Amazon, that’s a lot of blocks to wipe. I wonder how many such instantaneous create/delete actions on large EBS volumes it would take to put a large chunk of AWS storage capacity in the “unavailable – pending wipe” state… That’s assuming that they proactively wipe all the physical blocks. If instead the wipe is virtual (their virtualization layer returns zero as the value for any free block, no matter what the physical value of the block) then this attack wouldn’t work. Or maybe they keep track of the blocks that were written and only wipe these.

Then there is the RAM. The AWS security paper tells us that the physical RAM is kept separated between instances (presumably they don’t use ballooning or the more ambitious Xen Transcendent Memory). But they don’t say anything about what happens when a new instance gets hold of the RAM of a terminated instance.

Amazon probably makes sure the RAM is reset, as the disk blocks are. But what about your private Cloud infrastructure? While the prospect of such Cloud leakage is most terrifying in a public cloud scenario (anyone could make use of it to go after you), in practice I suspect that these attack vectors are currently a lot more exploitable in the various “private clouds” out there. And that for many of these private clouds you don’t need to resort to the exotic side channels described in the “get off of my cloud” paper. Amazon has been around the block (no pun intended) a few times, but not all the private cloud frameworks out there have.

One possible conclusion is that you want to make sure that your cloud vendor does more than writing scripts to orchestrate invocations of the hypervisor APIs. They need to understand the storage, computing and networking infrastructure in details. There is a messy physical world under your clean shinny virtual world. They need to know how to think about security at the system level.

Another one is that this is a mostly an issue for hypervisor-based utility computing and a possible trump card for higher level of virtualization, e.g. PaaS. The attacks described in the paper (as well as block-based file recovery) would not work on Google App Engine. What does co-residency mean in a world where subsequent requests to the same application could hit any machine (though in practice it’s unlikely to be so random)? You don’t get “deployed” to the same host as your intended victim. At best you happen to have a few requests executed while a few requests of your target run on the same physical machine. It’s a lot harder to exploit. More importantly, the attack surface is much more restrained. No direct memory access, no low-level scheduler data, no filesystem… The OS to hardware interface that hypervisors emulate was meant to let the OS control the hardware. The GAE interface/SDK, on the other hand, was meant to give the application just enough capabilities to perform its task, in a way that is as removed from the hardware as possible. Of course there is still an underlying physical reality in the GAE case and there are sure to be some leaks there too. But the small attack surface makes them a lot harder to exploit.

[UPDATED 2009/9/8: Amazon just improved the ability to smoothly update your access certificates. So hopefully any such certificate found on recoverable blocks in an AMI will be out of data and unusable.]

[UPDATED 2009/9/24: Some good security practices that help protect you against block analysis and many other forms of attack.]

[UPDATED 2009/10/15: At Oracle Open World this week, I was assured by an Amazon AWS employee that the DOS scenario I describe in this post would not be a problem for them. But no technical detail as to why that is. Also, you get billed a minimum of one hour for each EBS volume you provision, so that attack would not be as cheap as I thought (unless you use a stolen credit card).]

Post

Thoughts on VMWare, SpringSource and PaaS

In Application Mgmt,Cloud Computing,Everything,Middleware,Spring,Utility computing,Virtualization,VMware on August 26, 2009 by @vambenepe

I am late to the party for  commenting on the upstream and downstream acquisitions involving SpringSource. I was away on vacations, but Rod Johnson obviously didn’t have too many holiday plans of his own in August.

First came the acquisition by VMWare. Then the acquisition of Cloud Foundry and the launch of its SpringSource reincarnation.

You’ve all read a lot about this already, so I’ll limit myself to a few bullet-point comments.

  • I was wrong to think at the time of the Hyperic acquisition that SpringSource would focus on app-centric management, BTM and transaction tracing more than Cloud computing and automation.
  • This move by VMWare helps me make some of the points I have been trying to make internally about Cloud computing.
  • This is a step in the progress from “fake machines” to true “virtual machines” (note to self: I may have to stop referring to “fake machines” as “VMWare-style virtual machines”).
  • Savio Rodrigues makes some interesting points, especially on the difference between a framework and a runtime.
  • Many people have hypervisors, a management console and middleware bits. If you are industry-darlings VMWare/SpringSource people seem more willing to assume that you can put them all in a bag, shake it and out comes a PaaS platform than if you are boring old Oracle, Microsoft or IBM. Fine. But let’s see how the (very real) potential gets delivered. Kudos to Adrian Colyer for taking a shot at describing it in a reasonable way, though there is still a fair amount of hand-waving… and already a drift towards the “I don’t need no cluster, I have a hypervisor and everything is a VM” reflex.
  • The “what does it mean for RedHat” angle seems to miss the point to me and be a byproduct of over-focusing on the “open source” aspect which is not all that relevant. This is more about Oracle, Microsoft and IBM than RedHat in my view.
  • Won’t it be fun when Cisco, VMWare and BMC are all one company and little SpringSource calls the shots from within (I have seen this happen more than once during my days at HP Software)?
  • I have no opinion on the question of whether VMWare over-paid or not. I’ll tell you in two years… :-)