Category Archives: Cloud Computing

Cloud APIs need to be complemented by Cloud processes

A lot of attention has been focused on technical standards for Cloud computing, especially over the last month (e.g. DMTF incubator announcement). That’s fine, but before we go crazy with detailed technical standards let’s realize that for Cloud computing (of the public variety at least) to take off we’ll need just as much standardization of non-technical interactions. Namely processes.

This, to me, is one of the most interesting angles on the recent announcement by Amazon AWS that they now support (in limited beta) the ability to load data from storage that is physically shipped to them. Have a look at this announcement and you’ll notice that it spends more time describing a logistical process (how to pack, how to ship…) than technical interfaces (storage device requirements, how to create a manifest…). It is still part of the “AWS Developer Guide” but clearly these instructions are not just for developers.

Many more such processes need to be “standardized” (or at least documented) before companies can efficiently use public Clouds (and to some extent even private Clouds). Let’s take SLAs as an example. It sounds good when a Cloud provider says “we offer SLAs”. But what does it mean? Does it mean “we advertise some SLA numbers, you’re responsible for contacting us (through a phone number hidden somewhere on our site) when you think we’ve violated them; if we agree with your measurements then you may get a check in the mail at some point in the future”? Not so useful. If, on the other hand, there is a clear definition of the metric that the SLA applies to, a clear definition of how it gets measured (do we trust provider performance reports, customer measurement, a third party monitor…), a clear process to claim a refund and a clear process to actually provide the refund (credit for future service or direct payment, when/how is the payment made…), then it becomes more useful.
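To make the “useful” version concrete, here is a minimal sketch (in Python, with made-up numbers and a hypothetical availability metric, not any provider’s actual terms) of what an unambiguous refund process boils down to once the metric, the measurement source and the credit schedule are pinned down:

```python
from dataclasses import dataclass

# Hypothetical SLA terms: illustrative numbers, not any provider's actual policy.
SLA_TARGET = 0.9995          # availability promised over the billing period
CREDIT_SCHEDULE = [          # (availability floor, fraction of the bill credited)
    (0.999, 0.10),
    (0.99, 0.25),
]

@dataclass
class BillingPeriod:
    total_minutes: int       # length of the billing period
    downtime_minutes: float  # as measured by the agreed-upon source (provider report, customer probe, third party...)
    monthly_bill: float

def sla_credit(period: BillingPeriod) -> float:
    """Return the credit owed for the period, 0.0 if the SLA was met."""
    availability = 1 - period.downtime_minutes / period.total_minutes
    if availability >= SLA_TARGET:
        return 0.0
    for floor, rate in CREDIT_SCHEDULE:
        if availability >= floor:
            return period.monthly_bill * rate
    return period.monthly_bill * 0.50   # below every floor: maximum credit

# Example: a 30-day month (43,200 minutes) with 30 minutes of measured downtime
print(sla_credit(BillingPeriod(43_200, 30, 1_000.0)))   # -> 100.0
```

The arithmetic is the easy part; the process questions (who produces the downtime number, how and when the credit actually gets paid) are what needs standardizing.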

I picked the SLA enforcement example because it happens to be an area that the TMF (TeleManagement Forum) has made partially available as a teaser for its eTOM business process framework (aimed at telco providers). The full list of eTOM processes is only available to paying subscribers. One of the goals of the eTOM process framework is “to simplify procurement, serving as a common language between service providers and suppliers”. Another way to say it is that eTOM “recognizes that the enterprise interacts with external parties, and that the enterprise may need to interact with process flows defined by external parties, as in ebusiness interactions”. Exactly what we are talking about when it comes to making public Clouds easily consumable by enterprises. SLA management is just one small part of the overall eTOM framework (if you look for it in this eTOM overview poster it’s in purple, under “assurance”, in the first row).

My point is not to assert that Cloud providers should adopt eTOM. Nobody adopts eTOM directly as a blueprint anyway. But, while the cultures and maturity levels are sometimes different, it is also hard to argue that Cloud providers have nothing to learn from telco providers (many of which are becoming Cloud providers themselves). I shudder at the idea of AT&T teaching another company how to handle customer service, but have you ever tried to call Google?

Readers of this blog are likely to be more familiar with ITIL than eTOM (which, incidentally, incorporates parts of ITIL in its latest version, 8.0). For those who don’t know about either, one way to think about it is that Cloud providers would implement processes that look somewhat like eTOM processes, Cloud consumers would implement IT management processes that more or less follow ITIL best practices, and these two sets of processes need to meet for public Clouds to work. I touched on this a few months ago, when I commented on the incorporation of Cloud services in an IT service catalog.

My main point is not about ITIL or eTOM. It’s simply that there are important process aspects to delivering/consuming Cloud services and that they have so far been overshadowed by the technical aspects. The processes sketched in the AWS import/export capability represent the first drop of an upcoming shower.

[UPDATED 2009/5/22: More on telcos becoming Cloud providers from EMC’s Chuck Hollis, with a retort by James Governor. Just listing these as FYI but my main point in this post is not about telcos, it’s about the need to clarify processes, independently of whether the provider is Amazon or AT&T. It’s just that the telcos have been working on such process standardization for a long time. Hoff provides another example of where process standardization is needed in Cloud relationships: right to audit.]

2 Comments

Filed under Amazon, Application Mgmt, Business Process, Cloud Computing, Everything, IT Systems Mgmt, ITIL, Standards, Utility computing

IT automation: the seven roads to management middleware

You can call it a “Cloud operating system”, an “adaptive infrastructure framework” or simply “IT management middleware” (my vote) as you prefer. It’s the software that underpins the automation engine of your Cloud. You can’t have a Cloud without an automation engine, unless you live in a country where IT admins run really fast, never push the wrong button, never plug a cable in the wrong port, can interpret blinking lights at a rate of 9,600 baud and are very cheap. The automation engine is what technically makes a Cloud. That engine is an application whose business is to know what needs to be done to maintain the IT environment you use in a state that is acceptable to you at any point in time (where your definition of “acceptable” can evolve). Like any application, you want to keep its business logic neatly isolated from the mundane tasks that it relies on. These mundane tasks include things like:

  • collecting events and delivering them to the right place
  • collecting metrics of the different IT elements
  • discovering available resources and accessing them (with or without agents)
  • performing coordinated actions on IT elements
  • maintaining an audit of management actions
  • securing the management interactions
  • managing long-running tasks and processes
  • etc

That’s what management middleware does. It doesn’t automate anything by itself, but it provides an environment in which it is feasible to implement automation. This middleware is useful even if you don’t automate anything, but it often doesn’t get called out in that scenario. On the other hand, automation means capturing more business logic in software which makes it imperative to clearly layer concerns, at which point the IT management middleware can be more clearly identified within the overall IT management infrastructure.
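To make the layering concrete, here is a minimal Python sketch (the interface and the names are mine, not any product’s API) of the kind of surface such middleware might present to the automation logic sitting on top of it:

```python
from abc import ABC, abstractmethod
from typing import Callable, Iterable, Sequence

class ManagementMiddleware(ABC):
    """Illustrative interface: the mundane services an automation engine relies on."""

    @abstractmethod
    def subscribe(self, event_filter: str, handler: Callable[[dict], None]) -> None:
        """Collect events and deliver them to the right handler."""

    @abstractmethod
    def collect_metrics(self, resource_id: str) -> dict:
        """Collect metrics for one IT element."""

    @abstractmethod
    def discover(self, resource_type: str) -> Iterable[str]:
        """Discover available resources (with or without agents)."""

    @abstractmethod
    def act(self, resource_ids: Sequence[str], action: str, audit_user: str) -> str:
        """Perform a coordinated action, record it in the audit trail and
        return a task id for this potentially long-running operation."""

class ScaleOutRule:
    """The automation engine's business logic, kept free of plumbing concerns."""

    def __init__(self, middleware: ManagementMiddleware, cpu_threshold: float):
        self.mw = middleware
        self.cpu_threshold = cpu_threshold

    def evaluate(self) -> None:
        for server in self.mw.discover("app_server"):
            if self.mw.collect_metrics(server).get("cpu_util", 0.0) > self.cpu_threshold:
                self.mw.act([server], "add_capacity", audit_user="automation-engine")
```

The point of the split is that ScaleOutRule contains only the business decision; agents, protocols, audit trails and long-running task handling all stay behind the middleware interface.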

This is happening in many different ways. I can count seven roads to IT management middleware, seven ways in which it is emerging as an identifiable actor in data centers. Each road represents a different history and comes with different assumptions and mindsets. And yet, they go after the same base problem of enabling IT management automation. Here is a quick overview of these seven roads.

Road #1: “these scripts have to grow up”

This road starts from all the scripts common in IT operations and matures them. It’s based on the realization that they are crucial business assets, just like the applications that they support. And that they implement reusable patterns. Alex Honor described it well here. Puppet and PowerShell are in this category.
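As a rough illustration of what “growing up” means in practice (generic Python rather than Puppet or PowerShell syntax), the recurring pattern is the move from one-shot imperative scripts to idempotent resources that converge on a desired state and can be reused across hosts:

```python
import subprocess

def ensure_package(name: str) -> bool:
    """Idempotent resource: converge on "package installed" instead of blindly installing.
    Returns True if a change was made. Assumes a Debian-style host; purely illustrative."""
    installed = subprocess.run(["dpkg", "-s", name], capture_output=True).returncode == 0
    if installed:
        return False  # already in the desired state: do nothing
    subprocess.run(["apt-get", "install", "-y", name], check=True)
    return True

# The "manifest" becomes data that any host can converge on, not a throwaway script
DESIRED_PACKAGES = ["nginx", "ntp"]
changed = [pkg for pkg in DESIRED_PACKAGES if ensure_package(pkg)]
print(f"changed: {changed}")
```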

Road #2: “it’s just another integration job”

We’ve been doing computer integration almost since the second piece of software was written. There are plenty of mechanisms available to do so. IT management is just another integration problem, so let’s present it in a way that allows us to use our favorite integration tools on it. That’s the driver behind the use of Web Services for management integration (e.g. WSDM/WS-Management): create interfaces to manageable resources so that existing middleware (mostly J2EE application servers, along with their WS stack) can be used to solve the “enterprise IT management” integration problems in a robust and reliable way. The same logic is behind the current wave of REST-based IT management efforts (see this presentation). REST is a good integration approach, so let’s turn IT resources into RESTful resources so we can apply this generic integration mechanism to enterprise IT management. Different tool, but same logical approach. Which is why they can be easily compared.
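As a small sketch of that mindset (using the requests library against a hypothetical management endpoint; the URL and payload are invented), once the IT resource is exposed as a RESTful resource, generic HTTP tooling is the only integration middleware you need:

```python
import requests

BASE = "https://mgmt.example.com/api"  # hypothetical management endpoint

# Read the representation of a managed resource...
resp = requests.get(f"{BASE}/servers/web-01", timeout=10)
resp.raise_for_status()
server = resp.json()

# ...and act on it with the same generic verbs any integration tool understands.
if server.get("status") == "degraded":
    action = requests.post(f"{BASE}/servers/web-01/actions",
                           json={"action": "restart"}, timeout=10)
    action.raise_for_status()
```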

Road #3: “top-down”

This is the “high road”, the one most intellectually satisfying and most promising in the long term. But also the one with the highest hurdle right out of the gate. In this approach, you create a top-down model of your system and you try to mediate management actions through this model. But for this to be practical, you need to hit the sweet spot in many dimensions. You need composable sub-models at a level of granularity that makes them maintainable. You need to force enough uniformity, but not so much as to lose all optimizations. You need to decide which of the myriad configuration variables you include in your model. You can’t take the traditional approach of “I’ll model it and display it to my user who can decide what to do with it”, because the user now is a piece of software and it can’t make a judgment of whether it is OK for parameter foo to differ from the desired state or not. This has been worked on for a long time (remember HP’s UDC?) with steady but slow progress. Elastra has some of the most interesting technology there, and a healthy dose of realism and opportunism to make it work.

Think of it as SCA components/composites, but not just for software artifacts. Rather, it’s SCA for all IT elements, with wires and policies that are just rich enough to allow meaningful optimization but not so rich as to be unmanageable. If you can pull off such model-driven IT management middleware, then the automation code almost writes itself on top of it.
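Here is a toy sketch of the model-driven idea (the model and its attributes are invented): the middleware holds a desired model, observes the actual state, and automation is reduced to closing the gap, which is exactly where the “is this discrepancy acceptable?” judgment problem shows up:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TierModel:
    """A deliberately tiny, invented sub-model: instance count plus one config parameter."""
    instances: int
    heap_mb: int

def reconcile(desired: TierModel, observed: TierModel) -> list:
    """Return the actions needed to close the gap between the model and reality."""
    actions = []
    if observed.instances < desired.instances:
        actions.append(f"provision {desired.instances - observed.instances} instance(s)")
    elif observed.instances > desired.instances:
        actions.append(f"retire {observed.instances - desired.instances} instance(s)")
    if observed.heap_mb != desired.heap_mb:
        # The judgment problem from the paragraph above: is this drift worth a disruptive fix?
        actions.append(f"reconfigure heap to {desired.heap_mb} MB")
    return actions

print(reconcile(TierModel(instances=4, heap_mb=2048),
                TierModel(instances=3, heap_mb=1024)))
```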

Road #4: “management integration is another feature of our management console”

That was the road followed by the Big Four. Buy enough of their products (CMDBs, network management console, operations console, service desk, etc…) and you’ll get APIs that allow you to leverage their discovery, collection, eventing and process management features. So you can write your automation on top. At least on paper. In reality, these APIs are too inconsistent and import/export-oriented to really support SOA-style (or REST-style) integration, even though they usually have a SOAP and/or plain HTTP option available. It’s a challenge just to get point-to-point integration between these products, let alone a true set of management services that can be orchestrated. These vendors know it, and rather than turning their product suites into a real SOI (Service Oriented Infrastructure) they have decided to build/buy automation engines on the side that can be hard-wired with the existing IT management products. That’s your IT management middleware, but it comes bundled with the automation engine rather than as an independent layer.

Road #5: “management integration is another feature of our hypervisor”

If the virtual machine (in the x86 virtualization sense of the term, a.k.a. a fake machine) is the basic building block of your IT infrastructure then hypervisor interfaces to manipulate these VMs are pretty much all you need in terms of middleware to build data center automation on top, right? Are we done then? Not really, since there is a lot more to an application than the VMs on which it runs. Still, hypervisors bring the potential of automation to what used to be a hardware domain and as such play a big part in the composition of the IT management middleware of modern data centers.
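For illustration, this is roughly what “hypervisor interface as middleware” looks like through the libvirt Python bindings (assuming a reasonably recent binding and a local libvirt daemon; error handling omitted): a uniform way to discover and act on VMs, but nothing about the application that runs inside them:

```python
import libvirt  # pip install libvirt-python; requires a local libvirt daemon

conn = libvirt.open("qemu:///system")  # connect to the hypervisor
try:
    for dom in conn.listAllDomains():  # discovery, but only down to the VM boundary
        state, max_mem, mem, vcpus, cpu_time = dom.info()
        print(dom.name(), "running" if dom.isActive() else "stopped", f"{vcpus} vCPU(s)")
        # Control is equally VM-centric: start or stop a machine, not "deploy my application"
        # if not dom.isActive():
        #     dom.create()
finally:
    conn.close()
```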

Road #6: “make it all the same”

From what I understand about how Yahoo, Google (see section 2.1 “System Health Infrastructure”), Microsoft (see the “device manager” and “collection service” parts of Autopilot) and others run their Web applications, they have put a lot of work into that management middleware and have made simplicity a key design goal for it. To that end, they are willing to accept drastic limitations at both ends of the IT infrastructure chain: at the bottom, they actively limit the heterogeneity of resources in the data center. At the top, they limit the capabilities exposed to the business applications. In an extreme scenario, all servers are the same and all the business applications are written to a few execution/persistence/communication environments (think of the GAE SDK as an example). Even if you only approximate this ideal, it’s a dramatic simplification that makes your IT management middleware much simpler and thinner.

Road #7: “it’s the Grid”

The Grid computing and HPC (High Performance Computing) communities have long been active in this area. There is a lot of relevant expertise in all the Grid work, but we also need to understand the difference between IT management middleware and the Grid infrastructure as defined by OGSI. OGSI defines a virtualization layer on which to build applications. It doesn’t define how to deploy, manage and configure the physical datacenter infrastructure that allows OGSI interfaces to be exposed to consumers. With regard to HPC, we also need to keep in mind that the profile of the applications is very different from that of your typical enterprise application (especially the user-driven apps as opposed to batch jobs). In HPC environments, CPUs can run at full capacity for days and new requests just go in a queue. The Web applications of your typical enterprise don’t have this luxury and usually need spare capacity.

All these approaches can complement each other and I am not trying to pin each product/vendor to just one approach. In this post (motivated by this podcast), Stu Charlton discusses the overlap and differences between some of these approaches. Rather than a taxonomy of products, this list of seven roads to IT management middleware is simply a cultural history, a reading guide to understand the background, vocabulary and assumptions of the different solutions. This list cuts across the declarative versus procedural debate (#1 is clearly procedural, #3 is clearly declarative, the others could go either way).

[UPDATED 2009/6/23: Stu has a somewhat related (similarly structured but much more entertainingly written) list of Cloud Computing approaches. I feel good that I have one more item in my list than him.]

3 Comments

Filed under Application Mgmt, Automation, Cloud Computing, Everything, Grid, IT Systems Mgmt, Mgmt integration, Middleware, Utility computing

The law of conservation of hype

To the various conservation laws from physics (e.g. of energy and of momentum), one can add the law of conservation of hype. In the IT industry, as in others, there is only so much bandwidth for over-hyped concepts. Old ones have to move out of the limelight to make room for new ones, independently of their usefulness.

Here is this law, I think, illustrated in action. After running a Google Trends report on “web services”, “SOA”, “virtualization” and “cloud computing”, I downloaded the underlying data and added one line: the total search volume across all four terms. Here is the result:

SOA/WS/Cloud/Virtualization search volume trend

The black line, “total”, is remarkably flat (if you ignore the annual Christmas-time drop). There is a surge in late 2007 for both WS and SOA that I can’t really link to anything (Microsoft first announced Oslo around that time, but I doubt this explains it). Other than this, there is a nice continuity that seems to graphically support the following narrative:

Web services were the hot thing in the beginning of the decade among people who sell and buy corporate IT systems. Then the cool kids decided that Web services were just an implementation technology but what matters is the underlying pattern. So “SOA” became the word to go after. Just ask Sys-con: exit “Web Services Journal”, hello “SOA World magazine”. Meanwhile “virtualization” had been slowly growing and suddenly came Cloud computing. These two are largely orthogonal concerns to the SOA/WS pair, but it doesn’t matter. Since they interest the same people, the law of conservation of hype demands that room be made. So down goes SOA.

The bottom line (and the reason why I ran these queries on Google Trends to start with) is that I feel that application integration and architecture concerns have been pushed out of the limelight by Cloud computing, but that important work is still going on there (some definition work and a lot of implementation work). Work that in fact will become critical when Cloud computing grows out of its VM-centric adolescent phase. I plan to write more entries about this connection (between Cloud computing and application architecture) in the future.

[Side note: I also put this post in the crazyStats category because I understand that by carefully picking the terms you include you can show any trend you want for the “total”. My real point is not about proving “the law of conservation of hype” (though I believe in it). Rather, it is captured in the previous paragraph.]

5 Comments

Filed under Cloud Computing, CrazyStats, Everything, SOA, Virtualization

Hyperic joins SpringSource

SpringSource’s Rod Johnson tells us today that his company just bought Hyperic. The press release is a bit more specific, announcing that SpringSource acquired “substantially all of the assets of Hyperic”, which sounds different from acquiring the company itself. Maybe not for SpringSource customers, but possibly for current Hyperic customers (and investors). Acquiring the assets of an open source company may sound like a bit of an oxymoron (though I understand it’s not just about the source code), but Hyperic is what’s called an “open core” company, which means not all the code is open source (see Tarus’ take on it). But the main difference between this and forking might be that you are getting the key employees, who are nice enough to their investors to do it in an orderly way.

Anyway, this is not a business or HR blog, it’s about the technology. And on that front, this looks like an interesting way for SpringSource to expand their monitoring from just the application down into some parts of the infrastructure, at least to some extent. SpringSource’s AMS (Application Management Suite) was already based on Hyperic, so the integration headaches should be minimal. And Hyperic has been doing some Cloud monitoring work too (see this podcast if you want to learn more about it), which if nothing else is PR gold these days (I am not saying it’s just that, but it is that for sure).

As a side note, it is ironic that Hyperic (which started inside Covalent until Javier Soltero spun it off and became its CEO) is now reunited with its mothership (SpringSource acquired Covalent last year).

I am a big proponent of management capabilities in application infrastructure. I applauded Rod Johnson for writing something along the same line last year and I am pleased to see him really push this approach with this acquisition.

Here are the questions that come to my mind when I read about this deal (keep in mind that this is competition from my perspective, so feel free to “question my questions” as you read):

I was going to ask whether this acquisition means that Hyperic users who don’t care for Spring are going to see diminishing value as the product becomes more tied to Spring. But if you look at what Hyperic gives you on the resources it manages, it’s mainly a list of metrics and a few control operations. These will still be there because they’ll be needed for the Spring-centric view anyway. It would be more of a question if Hyperic had advanced discovery features (e.g. examine all the config files of the managed resources and extract infrastructure topology from them). I would wonder if these would still be maintained/improved for non-Spring middleware. But again, not an issue here since I don’t think there is much of this in Hyperic today. And since presumably SpringSource made the acquisition in part to cover more resources types in their management offering (Rod talks about DB and VM management in his post), the list of supported infrastructure elements (OS, DB, VM, network…) will presumably grow rather than shrink. What may be trimmed down eventually is the list of application runtimes currently supported. If you’re a Hyperic/Coldfusion user you should probably attend the upcoming webcast to hear about the plans.

Still on the topic of Hyperic’s monitoring-only capabilities, it means that if Rod Johnson really wants to provide everything for Java developers to put “applications into production without the mediation of operations”, as he says, then he should keep his checkbook open (as a side note, if a developer puts “applications into production” then s/he doesn’t bypass operations but rather becomes operations; you may not think of yourself as one, but if you’re the one who gets called when the application crashes then you are in “operations”). SpringSource is still a long way from offering the complete picture. Here are my guesses for the management features on Rod’s grocery list:

  • configuration management – many potential acquisition candidates
  • in depth database management (going beyond the “you want metrics? we’ve got metrics!” approach to DB management) – fewer candidates

As far as in-house development goes, I would expect this acquisition to first yield some auto-discovery of application (and infrastructure) topology in a Spring environment. Then they’ll have to decide if they want to double-down on Cloud support and build/buy more automation features or rather focus on application-centric management and join the fray of BTM / transaction tracing. Doing both at the same time would be very ambitious. This Register article seems to imply the former (Cloud) but my guess is that SpringSource will make the smart choice of focusing on the latter (application-centric management). I see in the Register that “Peter Cooper-Ellis, SpringSource’s senior vice president of engineering and product management called management of the cloud and virtualized datacenters a strategic driver for the deal”. But this sounds more like telling a buzzword-hungry reporter what he wants to hear than actual strategy to me. We’ll see. I hope this acquisition and its follow-through will help move the industry in the right direction of application-centric management, something that will take more than one company.

[UPDATED 2009/5/7: A nice article on the acquisition by Charles Humble at InfoQ. Though I have to take issue with the assertion that “many aspects of monitoring that are essential in a data centre, such as OS and network monitoring, are irrelevant in the context of the cloud”.]

[UPDATED 2009/6/23: Via Coté, an announcement that shows that the Cloud angle might have more post-acquisition juice than I expected. Unless this thing coasted on momentum alone.]

3 Comments

Filed under Application Mgmt, Business, Cloud Computing, Everything, IT Systems Mgmt, Manageability, Middleware, Open source, Spring, Utility computing

Cloud API: what’s cooking between IBM and VMWare?

In the previous entry, I declared that I had a “guess as to why [the DMTF Cloud] incubator was created without a submission”, that I may later reveal. Well here it is: VMWare and IBM are negotiating a joint Cloud API submission to DMTF and need more time before they can submit it.

This is 100% speculation on my part. It’s not even based on rumors or leaks. I made it up. Here are the data points that influenced me. You decide what they’re worth.

  • VMWare has announced numerous times (comments here and here) that they would submit a vCloud API to DMTF in the first half of 2009.
  • In the transcript of this VMWare webcast we learn that an important part of the vCloud API is its adoption of REST as part of a move towards more abstraction and simplicity (“this is not simply proxy-ing of VIM APIs”).
  • IBM, meanwhile, has been trying to establish a SOAP-based IT management framework for a while. Unsuccessfully so far. WSDM was a first failed attempt. The WS-Management/WSDM reconciliation was another one (I was in the same boat on both of these). The WS-RA working group at W3C (where the ashes of WS-RT are smoldering) could be where the third attempt springs from. But IBM is currently very quiet about their plans (compared to all the conference talks, PowerPoint slides and white papers that heralded the previous two attempts). They obviously haven’t given up, but they are planning the next move. And the emergence of Cloud computing in the meantime is redefining the IT automation landscape in a way that they will make sure to incorporate in their updated standards plans.
  • Then comes the DMTF Cloud incubator of which the co-chairs are from VMWare and IBM (“interim” co-chairs in theory, but we know how these things go). Which seems to imply an agreement around a proposal (this is what the incubator process is explicitly designed for: “allow vendors aligned with a certain proposal to move forward and produce an interoperability specification”). But there is no associated specification submission, which suggests that the agreed-upon proposal is still being negotiated.

VMWare has a lot of momentum in a virtualization-focused view of IT automation (the predominant view right now, though I am not sure it will always be) and IBM sees them as the right partner for their third attempt (HP was the main partner in the first, Microsoft in the second). VMWare knows that they are going against Microsoft and they need IBM’s strength to control the standard. This could justify an alliance.

It seems pretty clear that VMWare has an API specification already (they supposedly even gave it to partners). It is also pretty clear that IBM would not agree to it in a wholesale way. For technical and pride reasons. They did it for OVF because it is a narrow specification, but a more comprehensive Cloud API would touch on a lot of aspects where IBM has set ideas and existing products. Here are some of the aspects that may be in contention.

REST versus WS-* – Yes, that old rathole. Having just moved to REST, the VMWare folks probably don’t feel like turning around. IBM has invested a lot in a WS-* approach over the years. It doesn’t mean that they won’t go with the REST approach, but it would take them some time to get over it. Lots of fellows and distinguished engineers would need to be convinced. There are some very REST-friendly parts in IBM (in Rational, in WebSphere) but Tivoli has seemed a lot less so to me. The worst outcome is if they offer both options. If you see this (or if you see XPath/XQuery expressions embedded inside URLs or HTTP headers), run for the escape hatches.

While REST versus WS-* is an easy one to latch onto, I don’t think it’s the most important issue. Both parties are smart enough to realize it’s not that critical (it’s the model, not the protocol, that matters).

CBE/WEF – IBM has been trying to get a standard stamp on its Common Base Event format (CBE) forever. When they did (as WEF, the WSDM Event Format) it was in a simplified form (by yours truly, among others) and part of a standard that wasn’t widely adopted. But it’s still there in Tivoli and you can expect it to resurface in some form in their next proposal.

Software packaging – I am not sure what’s up with SDD, but whether it’s this specification or something else I would expect that IBM would have a lot to say about software packaging and patching. A lot more than VMWare probably cares about. Expect IBM’s fingerprints all over that part.

Security – I have criticized IBM many times for the “security considerations” boilerplate that they stick on every specification. But this is an area in which it actually makes sense to have a very focused security analysis, something that IBM could do a lot better than VMWare, I suspect.

ITSM / ITIL – In addition to the technical aspect of IT management operations, there are plenty of process and human aspects. Many areas of ITSM are applicable (e.g. I have written about the role of service catalogs, or you can think about the link to CMDBf). IBM has a lot more exposure there than VMWare.

Grid – IBM’s insistence on aligning Grid computing and IT management is one of the things that weighed WSDM down. Will they repeat this? In a way, Cloud computing *is* that junction of IT management and Grid that they were after with WSRF. But how much of the existing GGF Grid infrastructure are they going to try to accommodate? I don’t think they’ll be too rigid on this, but it’s worth watching.

Seeing how the topics above are handled in the VMWare/IBM proposal (if such a proposal ever materializes) will tell the alert readers a lot about the balance of power between VMWare and IBM.

As a side note, there are very smart people in the EMC CTO office (starting with the CTO himself and my friend Tom Maguire) who came from IBM and are veterans of the WSDM/WSRF/OGSI efforts. These people could play an interesting role in the IBM/VMWare relationship if the corporate arrangement between EMC and VMWare allows it (my guess is it doesn’t). Another interesting side note is to ask what Microsoft would do if indeed VMWare and IBM were dancing together on this. Microsoft is listed in the members of the DMTF Cloud incubator, but I notice a certain detachment in this post from Steve Martin. For now at least.

Did I mention that this is all pure speculation on my part? We’ll see what happens. Hopefully it’s at least entertaining. And even if I am wrong, the questions raised (around the links between previous IT management efforts and the new wave of Cloud standards) are relevant anyway. I am still in “lessons learned” mode on this.

[UPDATED 2009/5/5: Here is a first-hand source for the data point that VMWare plans to submit the vCloud API (rather than second-hand reports from reporters): Winston Bumpus (VMWare’s Director of Standards Architecture) says that “VMware announced its intention to submit its key elements of the vCloud API to an existing standards organization for the basis of developing an industry standard”.]

1 Comment

Filed under Automation, Cloud Computing, DMTF, Everything, Grid, IBM, IT Systems Mgmt, Mgmt integration, OVF, SOAP, Specs, Standards, Utility computing, Virtualization, VMware

A pulp view of Cloud computing politics

As promised, here are some more thoughts on the creation by DMTF of an incubator for Cloud standards. The first part of this entry asks whether DMTF will play nicely with the other kids in the playground. The second part examines the choice of the “incubator” process in DMTF for this work.

Sharing the sandbox with the other kids

In other words, will the DMTF seek collaboration with other standards bodies, as well as less-structured organizations (the different Cloud forums and interest groups out there) and other communities (e.g. open source projects)? The short answer is “no”, for reasons explained below.

The main reason is that companies don’t have the same level of influence in all organizations. Unless you’re IBM, who goes in force pretty much everywhere, you place your bets. If you are very influential in organization A but not in B, then the choice of whether a given piece of work happens in A or B decides the amount of influence you’ll have on it. That’s very concrete. When companies see it that way, the public-facing discussions about the “core competencies” of the different organizations are just hand-waving that has little actual weight in the decision. Just like plaintiffs pick friendly jurisdictions in which to file suit (e.g. East Texas for patent holders), companies try to choose the standards organization they want the game to be played in. As a result, companies influential in the DMTF want the DMTF to do the work and companies influential in other organizations would rather have the other organization. Since, by definition, those influential companies shape the will of the organizations, you see organizations always trying to grow to cover more ground. For example, VMWare has invested quite a lot in DMTF. I don’t know if they are even members of OGF (at least they are not organizational members) so it makes a huge difference to them. Sure they could just as well ramp up in OGF. But at a cost.

That’s a general rule that applies to the DMTF as much as to the others. But collaboration is especially hard for DMTF because it is on the “opaque” side of the openness scale (e.g. compare it to OASIS, W3C and OGF which have large amounts of publicly-accessible working documents and mailing list archives). It’s hard to collaborate if the others can’t even see what you’re doing.

But, you may ask, doesn’t the Cloud incubator charter list “Work register(s) with appropriate alliance partners” as a deliverable, and aren’t “work registers” what DMTF calls its collaboration agreements with other organizations? Surely they are taking this collaboration to heart, aren’t they? Let me tell you a story.

Once upon a time, there was a work register in place between DMTF and the OASIS WSDM technical committee which said things like “OASIS web service standardization for resource sharing and provisioning will be cross-leveraged in DMTF’s CIM and WBEM standards” and “recommendations related to management of and management using web services will be submitted to OASIS”. Then Microsoft submitted WS-Management, a replacement for WSDM, to DMTF and DMTF used the work register as a doormat.

Don’t get me wrong though. I do believe that Cloud standards are closely related to IT management automation and that the DMTF has a central role to play there. I am not arguing against DMTF’s attempt to tackle this. I am just doing a reality check on the prospect of open and meaningful collaboration with other organizations.

OGF is not standing still and has also staked its claim to the Cloud (also focusing on the IaaS form of Cloud computing): it’s called OCCI for Open Cloud Computing Interface and will share its documents here. OGF and DMTF have long had a work register too (it includes an eerily familiar sounding sentence, “Grid technology will be cross-leveraged in the DMTF’s CIM and WBEM standards”). Looks like it is going to endure its first stress test.

As for the less structured Cloud gatherings (like CCIF), they’ll be welcome as long as they play the cheerleader role (“If this group forms a Cloud trade association, I can see us establishing an alliance with the DMTF to coordinate the messaging and driving adoption of the DMTF standards”) or are happy providing feedback into a black hole (“DMTF already has a process for providing feedback: http://www.dmtf.org/standards/feedback/ so no additional legal agreements need be made for community members to provide their input”). These are from Mark Carlson, the DMTF VP of Alliances, in a thread about the incubator announcement on the CCIF mailing list. BTW, Mark is a very fair-minded person and an ardent promoter of collaboration (disclosure: he once gave me a ride in a cool Volvo convertible to the Martha’s Vineyard airport so I could catch my puddle-jumper back to Boston, so I owe him). It’s not him personally, it’s the DMTF that is so tightfisted.

The use of the “incubator” process

This second part is for standards junkies and other process wonks who run their family dinners by Robert’s Rules of Order. Normal people should feel free to move on.

I am not at all surprised to see the incubator process being used here, but I am surprised to see it used in the absence of a submitted specification. I expected VMWare to submit a vCloud API document to this group. What’s a rubber stamp for if you don’t have a piece of paper to stamp with it?

I have my guess as to why this incubator was created without a submission, but that’s a topic for a future post (a good soap opera writer knows to pace the drama).

In any case, this leaves us in an interesting situation. The incubator process document (DSP 4008) itself says that “the purpose of this is to allow vendors aligned with a certain proposal to move forward and produce an interoperability specification without being blocked by those who would prefer a different proposal”. What’s the “proposal” that members of this incubator align with? That Cloud computing is important? Not something that too many people would dispute at this time.

This has interesting repercussions from a process standpoint. The incubator process pushes you towards an informational specification that is then sent to a new working group for quick ratification. The quick ratification is, in effect, the reward for doing the work in the incubator rather than in private. But this Cloud incubator is currently chartered to produce proposed changes to OVF and other DMTF standards (rather than a new specification). Say it does that, what happens to the proposed changes then? Presumably they are sent to the working groups that own the original specifications, but what directives do these groups get from the board? Are they expected to roll over and alter their specifications as demanded by the Cloud incubator? Or do these changes come as comments like any other, for the groups to handle however they see fit?

Take a concrete example. Oracle, BMC, CA and Fujitsu are very involved in the DMTF CMDBf working group but not (that I can see) in the incubator. If the Cloud incubator comes up with changes needed in CMDBf for Cloud usage, are these companies supposed to accept the changes even if they are disruptive to the original goals of the CMDBf specification? Same goes for WS-Management and even OVF. It’s one thing for an incubator to produce its own specification; it is another thing entirely to go and try to change someone else’s work. Presumably this wouldn’t stand (or would it?).

The lack of a submission to this incubator may end up creating a lot of argument about the interpretation of DSP 4008. For one thing, the DSP is not precise about when a submission to an incubator can take place. Since an incubator is meant to assemble people who agree with a given proposal, you’d expect that the proposal would be there at the start (so people can self-select and only join if they buy into it). But this is not explicit in the process.

The more Cloud API standardization unfolds, the more it looks like the previous attempt.

[UPDATED 2009/5/5: I just saw that Winston Bumpus has been blogging recently on the VMWare exec blog. Hopefully he will soon have his own feed for those of us interested in Cloud standards, an area in which he is a major actor. In this entry he describes his view of the DMTF incubator process. It doesn’t really align with my reading of the incubator process document though. Winston sees it as “a place for ideas to be developed or incubate before specifications are created”, while I see the process as geared towards work that starts from an existing submission. In any case, what really matters is less what the process says than how it is used, and so far it seems that it is being used as Winston describes.]

4 Comments

Filed under Cloud Computing, CMDBf, DMTF, Everything, Grid, IT Systems Mgmt, Mgmt integration, Standards, Utility computing, VMware

DMTF calls the ball on Cloud standards

To the surprise of no industry watcher (and especially of the small subset of them who read this blog), the DMTF has announced today (warning, PDF) that they are creating their very first “incubator” group, chartered with standardizing deployment, management and portability of Cloud systems. You probably skipped it at the time (you’re forgiven), but you may now be motivated to go back and read this short analysis of the DMTF incubator process. And now you know why I bothered to look into this never-used, two-year-old process. Since it was DMTF-internal information, I couldn’t at the time explain that my motivation was the preparations under way for this Cloud computing incubator.

Since the press release talks about Cloud compatibility and since I am obviously in very self-referencing mood today, I have to point to this “reality check on Cloud portability” for a historical perspective.

Three things to notice in the charter (warning, PDF) of the incubator:

First and foremost, it explicitly takes a very IaaS-centric view of Cloud computing. And within that, a very VM-driven view. VMWare could have written it…

“Virtualization technology and the evolution from software packages that can be created and deployed as a collection of virtual images is becoming the primary focus for delivering and managing software solutions into enterprise customers today”. I guess the “is becoming” formulation provides enough wiggle room (interesting rhetorical twist that lets you make a prediction and yet use the present tense) that one can’t really call them on it and ask how many enterprise software systems are actually delivered and managed as virtual machines today (see my colleague Adam’s view of what it will take).

Let’s next look at the description of the deliverables:

  • Cloud taxonomy:
    – Terms and definitions
  • Cloud Interoperability whitepaper
  • Informational specifications:
    – Proposed OVF changes for cloud usage
    – Proposed Profiles for management of resources exposed by a cloud
    – Proposed changes to other DMTF standards
  • Requirements for trust for cloud resource management
  • Work register(s) with appropriate alliance partners (See below)

We find the requisite “cloud taxonomy” (all the blog chatter about this a few months ago died without producing much alignment beyond the good old “IaaS, PaaS and SaaS”, or did I miss something?). The interesting aspect to notice is the lack of any new specification in the list. Just adjustments to the current ones (including OVF) and some profiling on top. I guess we are much closer to Cloud interoperability and portability than I thought! And the lessons from the past have been learned.

The third thing to notice is the name of the “interim co-chairs”. Who happen to be from VMWare and IBM. Who also happen to be the DMTF President and DMTF Chairman. In case you had any doubt, this is very high profile in DMTF. Especially for something that’s theoretically only an “incubator”. It may just be an egg, but there is a baby T-Rex in it.

Who’s missing from the party? Two groups of people. First, DMTF members who chose not to join (Oracle, CA, BMC…). And more importantly, the non-DMTF members who may nevertheless have a few ideas about Clouds: Google, Amazon, Salesforce and all the small Cloud pure-plays. You know, the kind of people who publish their docs in HTML rather than just PDF.

[Note: this is a quick first take written over lunch. More thoughts about the choice of the “incubator process” and the prospects for collaboration with other standards groups to follow, maybe as soon as tonight. — UPDATE: done]

3 Comments

Filed under Cloud Computing, DMTF, Everything, IT Systems Mgmt, OVF, Portability, Standards, Utility computing, Virtualization, VMware

A post-mortem on the previous IT management revolution

Before rushing to standardize “Cloud APIs”, let’s take a look back at the previous attempt to tackle the same problem, which is one of IT management integration and automation. I am referring to the definition of specifications that attempted to use the then-emerging SOAP-based Web services framework to easily integrate IT management systems and their targets.

Leaving aside the “Cloud” spin of today and the “Web services” frenzy of yesterday, the underlying problem remains to provide IT services (mostly applications) in a way that offers the best balance of performance, availability, security and economy. Concretely, it is about being able to deploy whatever IT infrastructure and application bits need to be deployed, configure them and take any required ongoing action (patch, update, scale up/down, optimize…) to keep them humming so customers don’t notice anything bothersome and you don’t break any regulation. Or rather, so that any disruption a customer sees and any mandate you violate cost you less than avoiding them would have.

The realization that IT systems are moving more and more towards distributed/connected applications was the primary reason that pushed us towards the definition of Web services protocols geared towards management interactions. By providing a uniform and network-friendly interface, we hoped to make it convenient to integrate management tasks vertically (between layers of the IT stack) and horizontally (across distributed applications). The latter is why we focused so much on managing new entities such as Web services, their execution environments and their conversations. I’ll refer you to the WSMF submission that my HP colleagues and I made to OASIS in 2003 for the first consistent definition of such a management framework. The overview white paper even has a use case called “management as a service” if you’re still not convinced of the alignment with today’s Cloud-talk.

Of course there are some differences between Web service management protocols and Cloud APIs. Virtualization capabilities are more advanced than when the WS effort started. The prospect of using hosted resources is more realistic (though still unproven as a mainstream business practice). Open source components are expected to play a larger role. But none of these considerations fundamentally changes the task at hand.

Let’s start with a quick round-up and update on the most relevant efforts and their status.

Protocols

WSMF (Web Services Management Framework): an HP-created set of specifications, submitted to the OASIS WSDM working group (see below). Was subsumed into WSDM. Not only a protocol BTW, it includes a basic model for Web services-related artifacts.

WS-Manageability: An IBM-led alternative to parts of WSDM, also submitted to OASIS WSDM.

WSDM (Web Services Distributed Management): An OASIS technical committee. Produced two standards (a protocol, “Management Using Web Services” and a model of Web services, “Management Of Web Services”). Makes use of WSRF (see below). Saw a few implementations but never achieved real adoption.

OGSI (Open Grid Services Infrastructure): A GGF (the organization now known as OGF) standard to provide a service-oriented resource manipulation infrastructure for Grid computing. Replaced with WSRF.

WSRF: An OASIS technical committee which produced several standards (the main one is WS-ResourceProperties). Started as an attempt to align the GGF/OGSI approach to resource access with the IT management approach (represented by WSDM). Saw some adoption and is currently quietly in use under the covers in the GGF/OGF space. Basically replaced OGSI but didn’t make it in the IT management world because its vehicle there, WSDM, didn’t.

WS-Management: A DMTF standard, based on a Microsoft-led submission. Similar to WSDM in many ways. Won the adoption battle with it. Based on WS-Transfer and WS-Enumeration.

WS-ResourceTransfer (aka WS-RT): An attempt to reconcile the underlying foundations of WSDM and WS-Management. Stalled as a private effort (IBM, Microsoft, HP, Intel). Was later submitted to the W3C WS-RA working group (see below).

WSRA (Web Services Resource Access): A W3C working group created to standardize the specifications that WS-Management is built on (WS-Transfer etc) and to add features to them in the form of WS-RT (which was also submitted there, in order to be finalized). This is (presumably) the last attempt at standardizing a SOAP-based access framework for distributed resources. Whether the window of opportunity to do so is still open is unclear. Work is ongoing.

WS-ResourceCatalog: A discovery helper companion specification to WS-Management. Started as a Microsoft document, went through the “WSDM/WS-Management reconciliation” effort, emerged as a new specification that was submitted to DMTF in May 2007. Not heard of since.

CMDBf (Configuration Management Database Federation): A DMTF working group (and soon to be standard) that mainly defines a SOAP-based protocol to query repositories of configuration information. Not linked with (or dependent on) any of the specifications listed above (it is debatable whether it belongs in this list or is part of a new breed).

Modeling

DCML (Data Center Markup Language): The first comprehensive effort to model key elements of a data center, their relationships and their policies. Led by EDS and Opsware. Never managed to attract the major management vendors. Transitioned to an OASIS member section and died of being ignored.

SDM (System Definition Model): A Microsoft specification to model an IT system in a way that includes constraints and validation, with the goal of improving automation and better linking the different phases of the application lifecycle. Was the starting point for SML.

SML (Service Modeling Language): Currently a W3C “proposed recommendation” (soon to be a recommendation, I assume) with the same goals as SDM. It was created, starting from SDM, by a consortium of companies that eventually submitted it to W3C. No known adoption other than the Eclipse COSMOS project (Microsoft was supposed to use it, but there hasn’t been any news on that front for a while). Technically, it is a combination of XSD and Schematron. It appears dead, unless it turns out that Microsoft is indeed using it (I don’t know whether System Center is still using SDM, whether they are adopting SML, whether they are moving towards M or whether they have given up on the model-centric vision).

CML (Common Model Library): An effort by the SML authors to create a set of model elements using the SML metamodel. Appears to be dead (no news in a long time and the cml-project.org domain name that was used seems abandoned).

SDD (Solution Deployment Descriptor): An OASIS standard to define a packaging mechanism meant to simplify the deployment and configuration of software units. It is to an application archive what OVF is to a virtual disk. Little adoption that I know of, but maybe I have a blind spot on this.

OVF (Open Virtualization Format): A recently released DMTF standard. Defines a packaging and descriptor format to distribute virtual machines. It does not define a common virtual machine format, but a wrapper around it. Seems to have some momentum. Like CMDBf, it may be best thought of as part of a new breed rather than directly associated with WS-Management and friends.

This is not an exhaustive list. I have left aside the eventing aspects (WS-Notification, WS-Eventing, WS-EventNotification) because, while relevant, they are a larger discussion and this entry is too long already (see here and here for some updates from late last year on the eventing front). It also does not cover the Grid work (other than OGSI/WSRF to the extent that they intersect with the IT management world), even though a lot of the work that took place there is just as relevant to Cloud computing as the IT management work listed above. Especially CDDLM/CDL, an abandoned effort to port SmartFrog to the then-hot XML standards, from which there are plenty of relevant lessons to extract.

The lessons

What does this inventory tell us that’s relevant to future Cloud API standardization work? The first lesson is that protocols are easy and models are hard. WS-Management and WSDM technically get the job done. CMDBf will be a good query language. But none of the model-related efforts listed above seem to have hit the mark of “doing the job”. With the possible exception of OVF, which is promising (though the current expectations on it are often beyond what it really delivers). In general, the more focused and narrow a modeling effort is, the more successful it seems to be (with OVF as the most focused of the list and CML as the other extreme). That’s lesson learned number two: models that encompass a wide range of systems are attractive, but impossible to deliver. Models that focus on a small sub-area are the way to go. The question is whether these specialized models can at least share a common metamodel or other base building blocks (a type system, a serialization, a relationship model, a constraint mechanism, etc), which would make life easier for orchestrators. SML tries (tried?) to be all that, with no luck. RDF could be all that, but hasn’t managed to get noticed in this context. The OVF and SDD examples seem to suggest that the best we’ll get is XML as a shared foundation (a type system and a serialization). At this point, I am ready to throw in the towel on achieving more modeling uniformity than XML provides, and ready to do the needed transformations in code instead. At least until the next window of opportunity arrives.

I wish that rather than being 80% protocols and 20% models, the effort in the WS-based wave of IT management standards had been the other way around. So we’d have a bit more to show for our work, for example a clear, complete and useful way to capture the operational configuration of application delivery services (VPN, cache, SSL, compression, DoS protection…). Even if the actual specification turns out to not make it, its content should be able to inform its successor (in the same way that even if you don’t use CIM to model your server it is interesting to see what attributes CIM has for a server).
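Purely as an illustration of what “capturing the operational configuration” of such a delivery service might look like (the attributes are invented, and in practice this would be a schema or model document rather than code), a narrow, focused model is mostly a matter of naming the handful of settings that operators actually care about:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SslTermination:
    """Invented mini-model for one delivery concern: SSL offload."""
    certificate_ref: str          # pointer into whatever certificate store is used
    min_protocol: str = "TLSv1.2"
    ciphers: List[str] = field(default_factory=list)

@dataclass
class CachePolicy:
    """Invented mini-model for the caching concern."""
    max_age_seconds: int = 300
    vary_headers: List[str] = field(default_factory=lambda: ["Accept-Encoding"])

@dataclass
class DeliveryServiceConfig:
    """Narrow, domain-focused composite: small enough to stay faithful to reality."""
    virtual_host: str
    backends: List[str]
    ssl: Optional[SslTermination] = None
    cache: Optional[CachePolicy] = None
```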

It’s less true with protocols. Either you use them (and they’re very valuable) or you don’t (and they’re largely irrelevant). They don’t capture domain knowledge that’s intrinsically valuable. What value does WSDM provide, for example, now that it’s collecting dust? How much will the experience inform its successor (other than trying to avoid the WS-Addressing disaster)? The trend today seems to be that a more direct use of HTTP (“REST”) will replace these protocols. Sure. Fine. But anyone who expects this break from the past to be a vaccination against past problems is in for a nasty surprise. Because, and I am repeating myself, it’s the model, stupid. Not the protocol. Something I (hopefully) explained in my comments on the Sun Cloud API (before I knew that caring about this API might actually become part of my day job) and something on which I’ll come back in a future post.

Another lesson is the need for clear use cases. Yes, it feels silly to utter such an obvious statement. But trust me, standards groups still haven’t gotten this. It wasn’t until years spent on WSDM and then WS-Management that I realized that most people were not going after management integration, as I was, but rather manageability. Where “manageability” is concerned with discovering and monitoring individual resources, while “management integration” is concerned with providing a systematic view of the environment, with automation as the goal. In other words, manageability standards can allow you to get a traditional IT management console without the need for agents. Management integration standards can allow you to coordinate your management systems and automate their orchestration. WS-Management is for manageability. CMDBf is in the management integration category. Many of the (very respectful and civilized) head-butting sessions I engaged in during the WSDM effort can be traced back to the difference between these two sets of use cases. And there is plenty of room for such disconnect in the so-loosely-defined “Cloud” world.

We have also learned (or re-learned) that arbitrary non-backward compatible versioning, e.g. for political or procedural reasons as with WS-Addressing, is a crime. XML namespaces (of the XSD and WSDL types, as well as URIs used in similar ways in specifications, e.g. to identify a dialect or profile) are tricky, because they don’t have backward compatibility metadata and because of the practice of using organizations’ domain names in the URI (as opposed to specification-specific names that can be easily transferred, e.g. cmdbf.org versus dmtf.org/cmdbf). In the WS-based management world, we inherited these problems at the protocol level from the generic WS stack. Our hands are more or less clean, but only because we didn’t have enough success/longevity to generate our own versioning problems at the model level. But those would have been there had these models been able to see the light of day (CML) or see adoption (DCML).

There are also practical lessons that can be learned about the tactics and strategies of the main players. Because it looks like they may not change very much, as corporations or even as individuals. Karla Norsworthy speaks for IBM on Cloud interoperability standards in this article. Andrew Layman represented Microsoft in the post-Manifestogate Cloud patch-up meeting in New York. Winston Bumpus is driving the standards strategy at VMWare. These are all veterans of the WS-Management, WSDM and related wars collaborations (and more generally the whole WS-* effort for the first two). For the details of what there is to learn from the past in that area, you’ll have to corner me in a hotel bar and buy me a few drinks though. I am pretty sure you’d get your money’s worth (I am not a heavy drinker)…

In summary, here are my recommendations for standardizing Cloud APIs, based on lessons from the Web services management effort. The theme is “focus on domain models”. The line items:

  • Have clear goals for each effort. E.g. is your use case to deploy and run an existing application in a Cloud-like automated environment, or is it to create new applications that efficiently take advantage of the added flexibility? Very different problems.
  • If you want to use OVF, then beef it up to better apply to Cloud situations, but keep it focused on VM packaging: don’t try to grow it into the complete model for the entire data center (e.g. a new DCML).
  • Complement OVF with similar specifications for other domains, like the application delivery systems listed above. Informally try to keep these different specifications consistent, but don’t over-engineer it by repeating the SML attempt. It is more important to have each specification map well to its domain of application than it is to have perfect consistency between them. Discrepancies can be bridged in code, or in a later incarnation.
  • As you segment by domain, as suggested in the previous two bullets, don’t segment the models any further within each domain. Handle configuration, installation and monitoring issues as a whole.
  • Don’t sweat the protocols. HTTP, plain old SOAP (don’t call it POS) or WS-* will meet your needs. Pick one. You don’t have a scalability challenge as much as you have a model challenge, so don’t get distracted here. If you use REST, do it in the mindset that Tim Bray describes: “If you’re going to do bits-on-the-wire, Why not use HTTP? And if you’re going to use HTTP, use it right. That’s all.” Not as something that needs to scale to Web scale or as a rebuff of WS-*.
  • Beware of versioning. Version for operational changes only, not organizational reasons. Provide metadata to assert and encourage backward compatibility.

This is not a recipe for the ideal result but it is what I see as practically achievable. And fault-tolerant, in the sense that the failure of one piece would not negate the value of the others. As much as I have constrained expectations for Cloud portability, I still want it to improve to the extent possible. If we can’t get a consistent RDF-based (or RDF-like in many ways) modeling framework, let’s at least apply ourselves to properly understanding and modeling the important areas.

In addition to these general lessons, there remains the question of which specific specifications will/should transition to the Cloud universe. Clearly not all of them, since not all of them even made it in the “regular” IT management world for which they were designed. How many then? Not surprisingly (since IBM had a big role in most of them), Karla Norsworthy, in the interview mentioned above, asserts that “infrastructure as a service, or virtualization as a paradigm for deployment, is a situation where a lot of existing interoperability work that the industry has done will surely work to allow integration of services”. And just as unsurprisingly, Amazon’s Adam Selipsky, whose company has nothing to do with the previous wave but finds itself in a leadership position with regard to Cloud computing, is a lot more circumspect: “whether existing standards can be transferred to this case [of cloud computing] or if it’s a new topic is [too] early to say”. OVF is an obvious candidate. WS-Management is by far the most widely implemented of the bunch, so that gives it an edge too (it is apparently already in use for Cloud monitoring, according to this press release by an “innovation leader in automated network and systems monitoring software” that I had never heard of). Then there is the question of what IBM has in mind for WS-RT (and other specifications that the WS-RA working group is toiling on). If it’s not used as part of a Cloud API then I really don’t know what it will be used for. But selling it as such is going to be an uphill battle. CMDBf is a candidate too, as a model-neutral way to manage the configuration of a distributed system. But here I am, violating two of my own recommendations (“focus on models” and “don’t isolate config from other modeling aspects”). I guess it will take another pass to really learn…

[UPDATED 2009/5/7: Senior moment! When writing this entry I forgot that I wrote an earlier entry (in late 2007) specifically to describe the difference between “manageability” and “management integration”. So here it is, if you care for more details on this topic.]

5 Comments

Filed under Automation, Cloud Computing, Everything, IT Systems Mgmt, Manageability, Mgmt integration, Modeling, People, Portability, REST, SML, SOAP, Specs, Standards, Utility computing, Virtualization, WS-Management, WS-ResourceCatalog, WS-ResourceTransfer

Reality check on Cloud portability

SD Times recently published an interesting article about “cloud interoperability”. It has some well-informed opinions. But, like all Cloud-related discussions, it also suffers from mixing a bunch of things. The word “interoperability” is alternatively applied to the Cloud infrastructure services (in which case this “interoperability” is a way to provide application “portability”) and to the Cloud-hosted applications themselves.

Application-level interoperability (“look, my GAE-hosted app successfully sent an HTTP request to an Azure-hosted app, open the champagne”) is not very new or exciting anymore and is often used as an interoperability smokescreen (hello Salesforce.com). Many of these interop concerns are long solved and the others (like authentication and data migration) need to be solved in ways that don’t care whether the application is hosted in your Silicon Valley garage or near the Columbia river.

Cloud infrastructure compatibility (in other words application portability) is the more interesting discussion. I keep reading that it is needed (“no vendor lock-in, not ever again”) for enterprises to move to the Cloud. Being a natural-born cynic, I always ask myself whether those asking for it are naive (sometimes) or have ulterior motives (e.g. trying to catch up with Amazon by entangling them in the standards net – some of my fellow cynics see the Open Cloud Manifesto as just this).

Because the reality is that, Manifesto or no Manifesto, you are not going to get application portability across IaaS-type Cloud providers. At least for production applications. Sorry. As a consolation prize, you may get some runtime portability such that we’ll be shown nice demos of prototype apps moving from one provider to another (either as applications or as virtual machines). Clap clap until you realize that they left behind their monitoring capabilities, or that their configuration rules don’t validate anything anymore. And that your printer ran out of red ink when printing the latest compliance report. Oops.

Maybe I am biased because they are both my friends and ex-colleagues, but the HP guys make the most sense in the SD Times article. Tim Hall has it right when he suggests “that the industry should focus on specific problems that it is going to solve around deployment and standardized monitoring”. And the other HP Tim, Mr. van Ash, rightly points out that we should “stop promising miracles”, which Forrester’s Jeffrey Hammond echoes, saying that there is a difference between a standard and “plug-and-play in reality”.

Tim Hall uses SQL as an example of a realistic common baseline. J2EE would be another one. They provide a good reality check. Standards are always supposed to prevent vendor lock-in. And there is a need for some of that, of course. But look at the track records. How many applications do you know that are certified and supported on any SQL database, any Unix operating system and any J2EE app server? And yet, standardizing queries on relational data and standardizing an enterprise-class runtime environment for one programming language are pretty constrained scopes in the grand scheme of things. At least compared to all the aspects that you need to standardize to provide real Cloud portability (security, monitoring, provisioning, configuration, language runtime and/or OS, data storage/retrieval, network configuration, integration with local apps, metering/billing, etc). And we’re supposed to put together a nice bundle of standards that will guarantee drag-and-drop portability across all these concerns? In how many lifetimes? By then, Cloud computing will have been replaced by the next big thing (galaxycomputing.com is still available BTW).

Not to mention that this standardization comes hand in hand with constraints on what you can do. That’s why I read Amazon’s Adam Selipsky’s comment that allowing customers to do “whatever they want” is vital as a way to say “get real” to requests for application portability, while allowing him to sound helpful rather than obstructionist.

This doesn’t mean that these standards are not useful. They make application portability possible if not free. They make for much improved productivity through generic tools and reusable developer knowledge. We still need all this.

Here is the best that can realistically happen in the “application portability across IaaS providers” area for at least 10 years:

  • a set of partial standards for small parts of the Cloud computing domain (see list above), many of which already exist.
  • a set of RightScale-like tools that do a lot of the grunt work of mapping/hiding/transforming between providers, with various degrees of success.
  • the need for application providers to certify their applications on Cloud providers one by one anyway and to provide cloning/migration as a feature of the application rather than an infrastructure-level task.

That’s assuming that IaaS providers become a major business, that there remains a difference between service providers and software providers. The other option is that the whole Cloud excitement goes back to SaaS only, that application creators are also hosting providers, that the only resource you get in a “utility” fashion is the application itself. At which point application portability is not a concern anymore and we go back to “only” worrying about data portability and application interoperability, an easier problem and one on which we have come a long way already. If this is what comes to pass then the challenge of Cloud portability may well be one of the main reasons. Along with the lack of revenue/margin potential for many of the actors in an IaaS world, as my CEO is fond of pointing out.

[UPDATED 2009/4/22: F5’s Lori MacVittie provides a very nice illustration of the same point, in her explanation of why OVF is not a cloud portability silver bullet.]

[UPDATED 2009/6/1: Soon after posting this entry I was contacted by people at SD Times about turning it into a “guest view” article in the June issue. It has just been published. It’s also in the paper version.]

5 Comments

Filed under Amazon, Application Mgmt, Articles, Cloud Computing, Everything, Google App Engine, HP, IT Systems Mgmt, Mgmt integration, People, Portability, Specs, Standards, Utility computing

Open Cloud Manifesto, circa 2004

The mini-scandal of last week was the manifesto-gate. The mini-scandal of this week is shaping up to be the Ulitzer-gate (if you want to make sure not to miss next week’s IT scandal, subscribe to the Register feed; ferreting these out and adding a bass-heavy soundtrack is their specialty).

Turns out I am one of these Ulitzer “unaware authors” through two articles I wrote a while ago for the Web services Journal, a paper publication by Sys-con (based on a request from HP PR) and a blog post I allowed Sys-con to republish. Looks like Ulitzer and Sys-con are one and the same. Three articles, spaced two years apart. That’s enough to earn me a dedicated home page at Ulitzer and a rank of 1,000 among their more than 6,000 authors. Makes you wonder how much the 5,000 “authors” behind me have (unknowingly) produced… Whatever. At least it’s all content that I authorized Sys-con to use, not something that was lifted from my blog as apparently happened to others.

Turns out the oldest of these articles (“From Web Services Management to Utility Computing”, from 2004) is not that different from the recently-published (and amply maligned) Open Cloud Manifesto. I described my article at the time as “an attempt to explain how the different efforts going on in the industry around Web services, grid, SOA management, virtualization, utility computing, <insert your favorite buzzword>, fit together to provide organizations with the flexibility and efficiency they need from their IT in order to thrive.”

It ends with “while it would be easier to develop an end-to-end model specific to one company’s offering, standardization allows the integration of the management capabilities of all the components that compose enterprise services. We must keep the pressure on vendors to deliver modular and composable specifications (for format, function, and protocol) that expose management capabilities of infrastructure services, applications, and business processes in such a way that these capabilities can be composed by the next generation of management applications.”

Sure it has a lot more emphasis on WS-* specs than is compatible with the current zeitgeist, and it uses the now-obsolete term of “utility computing” rather than the nebulous alternative currently en vogue, but isn’t the main message there?

Just to be clear, I am not laying pretentious claims of prescience and vision (at least not in this entry). There are plenty of documents (e.g. from the Grid community) that make the same points in more eloquent terms and starting many years prior. It’s just fun to see this link from today’s scandal to the one from last week.

For old times’ sake, here is the content of the 2004 article:

From Web Services Management to Utility Computing
by William Vambenepe

Enterprise services are created by combining infrastructure services, applications, and business processes. To be able to adapt quickly to business changes, enterprise IT must evolve from management of individual resources to management of interrelated services. This will be achieved through the development of composable and modular standards that expose the management capabilities of the building blocks of enterprise services. The Web services platform is an enabler of this transformation: a Web services-based management infrastructure provides a channel that is appropriate for dynamic resource provisioning, allocation, and configuration – often called utility computing.

We can consider this management infrastructure as a four-layered architecture. Starting at the foundation layer, the work on the base Web services infrastructure is far from over. First, until WSDL 2.0 is widely deployed, designers have to compose around the deficiencies of WSDL 1.1, such as the lack of portType inheritance. Second, there is still no standard for referencing Web services. Finally, key specifications such as WSRF (Web Services Resource Framework) and WSN (Web Services Notification), without which people were left to reinvent Web services interfaces to access stateful resources, have only recently reached the standards community. These issues are being resolved and a set of building blocks for accessing resources through an SOA (service-oriented architecture) is shaping up. It is critical that these building blocks be modular and composable to allow incremental adoption and separation of concerns.

Moving from the foundation to the management protocol layer, the OASIS WSDM (Web Services Distributed Management) technical committee, through its MUWS (Management Using Web Services) specification, is the key articulation point between the base Web services architecture and utility computing. Both the IT management community and the Grid community rely on MUWS. It defines how to express and exercise manageability capabilities through Web services, putting in place a management channel that is more interoperable and accessible than ever before.

Next is the modeling layer. Information models need to be composed so that a service can be represented based on the services that it is assembled from, be they peer or infrastructure services. Since these will be described by different models, the management channel (MUWS) needs to be model-agnostic in order to support a model-centric architecture. For example, CIM (Common Information Model) is a model that focuses on concrete resources. The DMTF WS-CIM subgroup must now open CIM to the Web services platform by developing a standard way to expose CIM-modeled resources through MUWS. Other models provide representations for service security, service-level agreements (SLA), etc. Only by composing these models will, for example, an auction service SLA be adequately managed as it depends on a combination of the performance of the servers on which the service runs, the application server that hosts it, the other services (authentication, billing, etc.) that it makes use of, and the business process engine that controls the bidding. Once this model-centric architecture is in place, management actions can be policy-driven through explicit constraints.

Finally, at the top layer, the architecture includes a set of common services for utility computing. They are being defined collaboratively by DMTF (Utility Computing working group) and GGF (OGSA working group).

All the pieces are falling into place but much remains to be done to allow comprehensive management of enterprise services in a model-centric way through Web services standards. While it would be easier to develop an end-to-end model specific to one company’s offering, standardization allows the integration of the management capabilities of all the components that compose enterprise services. We must keep the pressure on vendors to deliver modular and composable specifications (for format, function, and protocol) that expose management capabilities of infrastructure services, applications, and business processes in such a way that these capabilities can be composed by the next generation of management applications. These applications will use this to synchronize business and IT and to capitalize on change.

Comments Off on Open Cloud Manifesto, circa 2004

Filed under Application Mgmt, Articles, Automation, Business Process, Cloud Computing, Everything, IT Systems Mgmt, Mgmt integration, Modeling, Specs, Standards, Utility computing, Virtualization

Manifesto hyperventilation

The Cloud Manifesto debacle of the last few days is another sign of the landgrab atmosphere in the Cloud interoperability/standardization space. It’s like the crowd at 6:00AM in front of an electronics store on the day of the Big Sale. This is not the first spark and it won’t be the last one.

Two aspects of this crisis (soon to be anecdote) are especially ironic.

First, as many have picked up, the irony of a Microsoft manager complaining about a document having to be signed “as is” is something to savor slowly. I know Microsoft is a large company and I can believe that Steven Martin may never personally have engaged in such practices. But our credulity gets stretched when, in the same post, he lauds the WS-* process as an example of “organic” industry collaboration. Pick any WS-* spec and try contacting the listed co-authors other than Microsoft and (usually but not always) IBM. Ask them how much input they had and how much they were able to change. The answer won’t always be zero, but there will be plenty of zeros.

And these WS-* documents were standards candidates: technical specifications that these “take it or leave it” authors would presumably eventually have to implement in their products. Which takes me to the second point of irony, one I haven’t seen mentioned elsewhere:

This is a manifesto folks! I am not familiar with all of these other manifestos (plus this one) but at least for those I know one of their defining characteristics is that they stand in opposition to other views. Some may seem non-controversial now, but at the time of their publication they were very much so. Otherwise they wouldn’t have been called “manifestos”. The very idea of a manifesto being criticized for not being inclusive enough makes my head spin.

If anything (based on this draft), what the “manifesto” can be criticized for is being too meek and consensual to be worthy of that name. How many important manifestos give their readers a “whenever appropriate” escape clause in their guiding principles?

[Note: If you are in an investigative mood, start with this whois search and if the name of the registrant sounds familiar it may be because you’ve read this blog post before.]

[UPDATED 2009/3/30: It’s live.]

[UPDATED 2009/3/31: Mike DiPetrillo also thinks this is a bit too motherhood-and-applepie for being called a “manifesto”. Thomas Bittman’s translation of the manifesto is also good.]

1 Comment

Filed under Cloud Computing, Everything, Microsoft, People, Portability, Standards, Utility computing

OVF 1.0 and beyond

OVF 1.0 just got released as a DMTF standard. Here is the specification and its companion white paper. After a quick scan I didn’t see any major change from the submitted version, which is consistent with the content of the “preliminary standard” from last year.

The interesting question is what comes next, especially with regard to VMWare’s vCloud. The VMWare press release stated that “as one of the original authors of the Open Virtualization Format (OVF) standard now released from the Distributed Management Task Force (DMTF), VMware will build upon that work by submitting a draft of its VMware vCloud API to enable consistent mobility, provisioning, management, and service assurance of applications running in internal and external clouds” and Drue Reeves at the Burton group commented on this (Drue, we’re still waiting for part II). I see no reason to believe that VMWare is going to stop playing by the Microsoft playbook in DMTF as it appears to be quite successful so far (I’ll pat myself on the back for predicting over a year ago that “OVF might only be the beginning” for VMWare at DMTF).

This results in what looks like a landgrab from DMTF in Cloud standards. Meanwhile, in Washington DC yesterday, the Strategies and Technologies for Cloud Computing Interoperability (SATCCI) workshop took place. At this point all I know about it is the report from Reuven Cohen that I just read (hopefully Stu, Krishna and other bloggers who participated will provide additional perspectives). From Reuven’s report, Winston Bumpus (Director of Standards Architecture at VMware and President of the DMTF) described OVF as “an ideal cloud migration and deployment package”. Which may be true but is a pretty recent repurposing (the spec and the white paper don’t even mention this application). And while the DMTF is going full speed ahead on this, Reuven reports that “Craig Lee, President of the Open Grid Forum suggested that we need to take more time to examine the overlap between various standards groups, mapping the opportunities for collaboration”. Sure thing. The old timers might remember that when the DMTF decided to run with Microsoft’s WS-Management it wasn’t just OASIS (where WSDM was created) that eventually got hosed but also OGF (then called GGF), which relied on the WSRF/WSDM stack. At the time too there were discussions to identify and reconcile the overlap, for all the good they did (disclosure: I have some history there).

We’ve seen this in the WS-* game before. At the end it’s not so much a matter of what the standards bodies do (and even less of what they say), it’s a matter of what the big players do and where they choose to take their marbles. To the extent that you can separate the two, which becomes tricky in the case of vendor-run bodies like WS-I and DMTF. As I have written before, “at the end, it comes down to what [you think] a standard should be”.

[UPDATED 2009/3/26: Stu has now written a report on the SATCCI meeting.]

5 Comments

Filed under Cloud Computing, Conference, DMTF, Everything, Grid, IT Systems Mgmt, OVF, Portability, Specs, Standards, Utility computing, Virtualization, VMware, WS-Management

Cloud computing: would you like flexibility with your simplicity?

The recent announcement of the Sun Cloud, and more specifically its API, is a good occasion to think about how much simplicity we really want in our datacenter automation mechanisms. The Sun API is very simple and its authors are proud of that fact. Indeed they should be proud of avoiding unneeded complexity. They have probably also kept out (at least so far) some needed complexity.

First, let’s focus on the important part:

It’s not REST that matters, it’s the rest

Most of the comments on the API focus on the fact that it’s RESTful. The authoritative source on this is Tim Bray’s description of the API, which he helped shape. But Tim is very down-to-earth about the reasons to use REST:

Why REST? · It’s a sensible question. The chief virtue of RESTful interfaces is massive scaling. But gimme a break, these are data-center management operations; a typical transaction frequency would be a single-digit number per week, with the single digit often being “0”, and it wouldn’t be surprising if a big multi-cluster staged-boot operation had a latency of minutes. The data-center controls are unlikely to be a bottleneck.

Why, then? Simply because we wanted a bits-on-the-wire interface. APIs, in the general case, suck; and are really hard to make portable. Bits-on-the-wire are ultimately flexible and interoperable. If you’re going to do bits-on-the-wire, Why not use HTTP? And if you’re going to use HTTP, use it right. That’s all.

The use of REST is not a fundamental characteristic of the API. In other words, if this API turns out to be useful I can rewrite it as a SOAP API and it would still be useful. Unless the SOAP API is made purposely complicated, it would only be marginally harder to use, not fundamentally less useful.

In fact, we may find out. If the rumor is confirmed and IBM decides to Tivolify (rather than kill) the Sun Cloud, the whole thing can be refactored as WS-RT/XML/XQuery (and maybe WS-ResourceCatalog) in five days, four of which would be spent capturing, sedating and restraining Tim Bray (and his “spec machete”) with the last one used for coding.

In the case of the Sun Cloud API, REST makes the API simpler in the same way that a keyless system makes a car easier to operate. You don’t have to fumble for the key, but you still need to know how to parallel park, change a tire and operate the stereo.

By using REST, the Sun team has kept away some arbitrary complexity (e.g. fine-grained PUT; instead Sun decides what the two valid sets of input parameters to create a cluster are). But that’s only a small percentage of the potential complexity of the system. Not to mention that most developers will use libraries rather than on-the-wire protocols, so they won’t see any difference. Instead, the real deal is:

The model

By “the model” I mean both the resource model and the capabilities of the resources. For capabilities, I don’t care whether a virtual machine can be started via an HTTP GET request on a URL that ends with ?control=start, or via a SOAP message with the wsa:Action header set to http://iloveclouds.com/vm/start or via an RPC call to a Start(…) method. I just care that the model includes the capability to start a VM. And the list of states a VM can be in.
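Here is a minimal sketch (hypothetical names, not any actual specification) of what that model information looks like once you strip away the bindings: the states a VM can be in and the capabilities it exposes, each with the states in which it can be invoked.

VM_STATES = ("STOPPED", "STARTING", "RUNNING", "STOPPING", "FAILED")

# Each capability declares the states from which it can be invoked and the state it leads to.
VM_CAPABILITIES = {
    "start":  {"valid_from": ("STOPPED",),          "leads_to": "STARTING"},
    "stop":   {"valid_from": ("RUNNING",),          "leads_to": "STOPPING"},
    "reboot": {"valid_from": ("RUNNING", "FAILED"), "leads_to": "STARTING"},
}

def can_invoke(current_state, capability):
    # True if the capability makes sense in the current state, whatever the wire protocol.
    cap = VM_CAPABILITIES.get(capability)
    return cap is not None and current_state in cap["valid_from"]

Agreement on that kind of information is what buys interoperability; the wire binding is a detail layered on top of it.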

Look at a datacenter today. Make an inventory of all the networking equipment, storage, servers, hypervisors, operating systems and infrastructure services that it contains. Consider all the configuration settings of all these resources (as they would be represented in a complete, authoritative and consistent CMDB, that most elusive creature). Add to it all the controls and APIs they expose. That’s a lot of data, even if you don’t consider the applications layer. That’s a few orders of magnitude larger than what the model in the Sun Cloud API can describe. That gap (between our CMDB model and the Sun Cloud model) is what we should look at and analyze. Why are they so far apart? How big is the ideal datacenter automation and virtualization model?

Among other things, these hundreds of configuration settings in your current datacenter are used to optimize deployments. No-one would miss the pain of dealing with the optimizations if they went away, but we would miss the performance benefits they bring. So what replaces them if the model is too simple to support any tweak? Is the infrastructure behind the API auto-optimized, based on actual application patterns? Now that would be real progress towards simplicity and may allow us to rely on an API as simple as the Sun API. But the industry has been trying to do this with little success for a long time. I expect incremental, not radical, progress on this. Alternatively, does Cloud Computing change the economics to the point where performance optimizations through configurations are no longer cost-efficient, where scaling out is the answer? Hard to make this a general statement, considering how difficult it remains for many applications to scale out. And this sounds very SUV-like in these footprint-aware times (we see how well the “stretch the hood and add two cylinders to the engine” approach worked for Detroit).

Sun might very well have this covered under the hood. But I don’t know that I want to assume that they have an auto-optimizing system just because they produced an API that would benefit from having it underneath.

Not to mention that not all configuration tweaks have to do with performance optimization. Some of them are driven by licensing, organizational, risk and compliance considerations. If auto-detecting an application performance profile is hard, try auto-detecting its regulatory requirements.

Complexity with a purpose

The right place to be, between the “omniscient CMDB model” and the “Sun Cloud model”, is somewhere in the middle, with a couple of incrementally complex layers. Of course they are so far apart that saying “somewhere in the middle” is a cop-out. The current level of complexity is very hard to manage by humans (assisted by processes and tools, e.g. ITIL) and impossible to really automate. A lot of the complexity and variability is arbitrary rather than flexibility-inducing. We need to reduce this (all-out standardization is one way, stack integration is another). But the simplicity of the model in the Sun Cloud API is too extreme. Look at Amazon EC2. Everyone lauds the simplicity of the APIs and everyone, in the same breath, asks for more options (different instance types, availability zones, reserved instances…). Amazon (and Sun too, I assume) is taking the eminently rational approach of starting from simple and adding complexity (sorry, flexibility) as needed. That’s great. Just don’t get too enamored with the initial simplicity.

[UPDATED 2009/3/20: James Governor lauds the simplicity of Amazon’s cloud offering. If I understand him correctly, he sees simplicity as coming not just from “few options” but also from backward compatibility with current app infrastructure. That second part is what William Louth criticizes in his comment below. At the very least I like to keep the two separated: “how intrinsically simple is it” and “how backward compatible is it”, even though both can be seen as providing the benefit of simplicity.]

7 Comments

Filed under Automation, Cloud Computing, Everything, IT Systems Mgmt, Modeling, REST, Specs, Tech, Utility computing, Virtualization

Exploring “IT management in a changing IT world”

The tagline for this blog is “IT management in a changing IT world”. Of course nobody but their authors care about blog taglines. Still, in the unlikely event that I am asked to expand on the “changing IT world” part I would do it as follows.

The changes currently at work in the IT world can be organized along three axes:

  • IT infrastructure and management
  • Application development and delivery
  • Business and regulation

Each of these categories is ridiculously large. It’s only through the prism of the relationships between them that they provide any value. Think about three balls linked by coil springs.

If you give one of these balls a shake, you will start a hard-to-predict dance between them. This is similar to how the three domains above relate to one another. Changes in one (say a new focus on regulatory compliance in the “business” area, the emergence of virtualization technology in the “infrastructure” area or the appearance of Web 2.0 applications in the “application” area) start a complex movement involving all three. It takes a while to achieve a new equilibrium (and in practice it is never achieved since changes occur too often, adding stimulus to an already excited system). For a visual illustration, see this little YouTube video (but imagine that the three balls are arranged in a triangle rather than linearly and that every so often one of them gets pulled in a random direction).

This is not new of course. There have been changes in these three areas for as long as IT has existed (starting before it was called IT) and they have always driven changes in how IT is managed. To some extent they also have always influenced one another. The “new” part is that the connections are a lot tighter now, that the springs have a much higher force constant (the “k” in “F=-kx”). So here is my attempt at mapping today’s hot buzzwords on a map organized along these areas.

Before you ask: yes of course I have a very rigorous methodology, based on very precise quantitative data, to establish with certainty the exact x, y and z coordinates of each label. Buzzword topology is a precise science.

You may notice that the buzziest buzzword (at least currently), “Cloud”, does not appear on the map. It’s because it buzzes so much that it would be all over it, engulfing what currently appears as “virtualization”, “datacenter automation”, “IaaS”, “PaaS”, “SaaS” and “opex/capex”. There are two main parts in the “Cloud” buzzword: the “Technical Cloud” and the “Business Cloud”. The “Technical Cloud” is where we take virtualization and standardization (of machines, networks and application infrastructure) and turn that mind-boggling complexity into a manageable system that can be programmed to deliver applications (Cisco recently called it “Unified Computing”; HP, IBM and others have been trying to describe and brand it for a long time). Building on these technical capabilities comes the second part of “Cloud”, the “Business Cloud”. It is the ability to use infrastructure owned by a third party (presumably one able to leverage economies of scale) and all the possibilities this opens in the business realm. That’s what “Cloud” started as, back when it was known as “Utility Computing” and before it was applied to everything under the sun. A recent illustration of the relationship between the “Technical Cloud” and the “Business Cloud” is the introduction of vCloud by VMWare (their vision includes using VMotion technology, a piece of the “Technical Cloud”, not just to move machines between neighboring hypervisors but between organizations, enabling the “Business Cloud”). Anyway, that’s why “Cloud” is not on the map. It is actually all over it.

The system displayed on the map is vibrating very intensely right now, and I don’t see this changing anytime soon. Just for fun, here are candidates for future boxes on the map:

  • In the “IT infrastructure and management” category, maybe one day we’ll get to real metadata-driven management integration across the stack (as opposed to the more limited “application modeling” area listed above), whether through RDF or not.
  • In the “application development and delivery” category, maybe Doug Purdy’s vision “to make everyone a programmer (even if they don’t know it)” will be realized, whether through Oslo or not.
  • In the “business and regulation” category, maybe one day corporations will actually start caring about the customer data they are entrusted with (but only if mishandling it finally costs them more than “sorry about that, here is a one year credit monitoring subscription, now go away”).

In summary, the evolution of IT management is driven not only by changes in IT technology but also by changes in two other fields (“application development and delivery” and “business and regulation”) with which it is tightly connected. Both of these fields are also in a very dynamic state. And they also influence one another, resulting in a complex three-way dance. You can’t understand the trajectory and moves of one dancer without seeing the others.

That’s what I mean by “IT management in a changing IT world”. Thanks for asking.

[UPDATED 2009/6/25: For more on the “technical cloud” versus “business cloud”, go read Neil Ward-Dutton’s nice explanation. He actually breaks down the “business cloud” in two (separating the economic aspect from the strategic aspect).]

1 Comment

Filed under Application Mgmt, Automation, Big picture, BPM, BSM, Business, Cloud Computing, Everything, IT Systems Mgmt, ITIL, Mgmt integration, Open source, Utility computing, Virtualization

Cloudman to the rescue?

GMail had a hiccup last night. Last month it had a more serious outage and so did AWS.

In the movie:
Superman: Easy, miss. I’ve got you.
Lois Lane: You’ve got me? Who’s got you?

In the IT world:
Cloud provider: Easy, CIO. I’ve got you.
CIO: You’ve got me? Who’s got you?

With apologies to Werner Vogels.

Comments Off on Cloudman to the rescue?

Filed under Amazon, Application Mgmt, Cloud Computing, Everything, Google, IT Systems Mgmt, Utility computing

Towards making Cloud services more consumable by enterprises

If you are a hard core network/system management person who has been very suspicious of all the ITIL/ITSM bullshit from the boss, and even more suspicious of the “Cloud” nonsense that occupies the interns while they should be monitoring the event console, then I have bad news for you: they are mating. Not the interns (as far as I know), I am talking about ITSM and the Cloud.

If, on the other hand, you are an ITSM and Cloud enthusiast who sees himself/herself leveraging all these nifty ITSM/BSM tools and shiny Cloud services to become the ultimate CIO, delivering unprecedented business value, compliance and IT efficiency from your iPhone at the beach, then you’ll see this marriage as good news, a sign that your move to Hawaii is approaching.

I am referring to the announcement by Rodrigo Flores that his company, newScale, has released a new product, newScale FrontOffice for Virtual Data Centers, to incorporate Cloud-based services in their IT Service Catalog.

This is not sexy but it’s the kind of support that some classes of enterprises will need in order to really make use of Cloud services. Eventually, Cloud providers are going to have to move their focus away from cool technology and developer evangelism towards making their services easily consumable by corporations. Otherwise they’ll be like an office supply store that doesn’t take American Express.

While the direction is very interesting I can’t comment on how much value this new product actually provides because the company seems to engage in anti-marketing activities. If you want to dig a bit deeper than the press release, you get redirected to this page which requires your info in order to download the “complimentary information brief”. The confirmation page promises an email “containing the information you requested within the next 30 minutes”. I thought that the info I requested was the brief. What I got instead was an email asking me to call them. I don’t know if it’s my unpronounceable last name or the oracle.com in my email address that scares them (I’d guess the latter), but that seems like a lot of precautions for a “complimentary information brief”. Unless this is an attempt to grab the “Cloud” buzzword with little meat to back it up (not that anyone would do this, of course). Hopefully they’ll make more information publicly available when they get around to it.

[UPDATED 2009/2/25: Via Coté, I just saw what I consider supporting evidence from Gartner that, once we’re out of the early adopter phase, Cloud firms will need to focus less on pleasing developers and more on pleasing IT people: “But my observation from client interactions is that cloud adoption in established, larger organizations (my typical client is $100m+ in revenue) is, and will be, driven by Operations, and not by Development.”]

[UPDATED 2009/3/3: I got the PDF brief, thanks newScale (Ken and Mark). Many of the benefits it describes assume that there is a pretty robust automation/provisioning infrastructure to back it up, in addition to the Catalog itself. E.g. the Catalog alone will not allow you to “shorten the provisioning cycle time to minutes instead of months”. The brief lists adapter kits to VMWare/EC2 and more internally focused tools (HP and BMC, presumably through their Opsware and BladeLogic acquisitions respectively). So on the “public cloud” side it’s EC2 for now, not surprisingly. Integration with many of the Cloud tools (like RightScale) could be tricky since these tools bundle a catalog with the automation engine. If we ever do get a useful ontology of Cloud services this catalog would be a natural user for it, when it expands to other services beyond EC2 and tries to help you compare them. I guess newScale wouldn’t appreciate it if I provided a direct link to the PDF, so go request it to see for yourself.]

[UPDATED 2009/3/23: Speaking of managing your IT systems from your cell phone at the beach: VMWare vCenter Mobile Access.]

3 Comments

Filed under Cloud Computing, Everything, Governance, IT Systems Mgmt, ITIL, Mgmt integration, Utility computing

The datacenter as a programmable entity

This is an exciting time for those who want to shrink the computer. They are having a field day playing with devices powered by Android, the iPhone’s Cocoa, Palm’s new WebOS, Windows Mobile, JavaFX (maybe one day) and, to a lesser extent, the Blackberry.

But times are good too for those who want to go the other way and program larger things rather than smaller ones. If you are interested in thinking about datacenters as a programmable entity, you are in luck: for these long plane trips when you run out of battery, bring a printout of the proceedings of the research meeting organized last year in Cambridge by Microsoft and HP Labs, titled “The Rise and Rise of the Declarative Datacentre”. When you’re back on-line go check the presentations on the site.

And if you liked Paul Anderson’s “Programming the Data Centre” presentation at the Cambridge meeting, you can also read his “Programming the Virtual Infrastructure” slides from LISA 08. More LISA 08 presentations here.

I got the link to Paul Anderson’s second presentation (and maybe also the first one, some time ago) from Steve Loughran, who also adds a few comments, starting with the debate between the declarative and procedural approaches. This question has plenty of down-the-road implications. There is a lot to like about the declarative approach in terms of composition, manageability and more generally as a framework to manage complexity via encapsulation.

A simple analogy for this debate is to think about driving directions. The declarative approach is for me to give you a map with a circle on it showing where my house is and let you find your way. It’s more work for you but it’s also more resilient. The procedural approach is for me to give you a set of turn-by-turn directions, based on where you are coming from. If you miss one turn or if one road happens to be blocked at the time, then you’re in trouble.

That being said, there are enough powerful and useful PowerShell or Puppet scripts out there to give you pause before discarding procedural approaches. While the declarative (aka “desired state”, “policy-driven” and sometimes “model-based”) approach looks a lot more elegant, at this point in time the real work usually gets done via scripts, deployment procedures or the like.
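To illustrate the difference with a minimal sketch (a hypothetical “node” object stands in for whatever real provisioning tool you use): the procedural version is a fixed sequence of steps, while the declarative version states a desired end state and relies on a reconciliation loop to figure out which steps are still needed from wherever the system currently is.

# Procedural: turn-by-turn directions.
def provision_procedurally(node):
    node.install_package("apache2")
    node.write_config("/etc/apache2/ports.conf", "Listen 8080")
    node.start_service("apache2")

# Declarative: a map with a circle on it.
DESIRED_STATE = {
    "packages": {"apache2": "installed"},
    "files":    {"/etc/apache2/ports.conf": "Listen 8080"},
    "services": {"apache2": "running"},
}

def reconcile(node, desired=DESIRED_STATE):
    # Only perform the actions needed to close the gap between observed and desired state.
    for pkg, state in desired["packages"].items():
        if state == "installed" and not node.has_package(pkg):
            node.install_package(pkg)
    for path, content in desired["files"].items():
        if node.read_file(path) != content:
            node.write_config(path, content)
    for svc, state in desired["services"].items():
        if state == "running" and not node.service_running(svc):
            node.start_service(svc)

The second version copes with missed turns and blocked roads; the first is what most deployments still look like today.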

In addition to academia, the competition between these approaches is playing out right now between all the companies and products that want to help you automate and manage your cloud deployments (public and/or private): for example, RightScale scripts (custom scripts and RightScripts, see here and here) versus the more declarative ECML/EDML documents from Elastra. Or the very declarative approach taken by SmartFrog.

5 Comments

Filed under Automation, Cloud Computing, Conference, Desired State, Everything, Grid, Implementation, Research, Tech, Utility computing

Long-running processes on Google App Engine: it finally works

I am probably taking things a bit too personally, but I feel like I just successfully guilt-tripped the Google App Engine (GAE) team. Just last night I was complaining that they were teasing me (supporting urllib2, but an older version, without timeout support). And tonight I noticed a new post on their blog, announcing the end of these “High CPU Requests” that have been the bane of my GAE experience.

The reason why I was looking for timeout support in the first place is to avoid generating these dreaded “High CPU Requests” that quickly result in your application being disabled. It’s all explained here and here.

But now that they are gone, I don’t need timeouts anymore. Around 11:00PM Pacific (on Thursday 2/12) I restarted my application. All it does is create 5 new simple entities in the datastore, one every second (it sleeps for one second between each entry). Then it spawns its successor, which will do the same thing, ad vitam aeternam. The application’s name, “rere”, is short for “request relay”, the pattern used to emulate a long running process. The default page for the app (available here) just returns a list of the 30 last entities created. The point is to illustrate that a single original request spawned an ever-lasting computing task on GAE.

Here is the code:

#!/usr/bin/env python
#
# Copyright 2009 William Vambenepe
#

import wsgiref.handlers
import os
import logging
import time

from google.appengine.ext import db
from google.appengine.ext.webapp import template
from google.appengine.ext import webapp
from google.appengine.api import urlfetch

numberOfHeartbeatsViewed = 30
secondsDurationOfTaskWait = 1
numberOfTasksPerRequest = 5

class HeartBeat(db.Model):
  requestId = db.IntegerProperty()
  date = db.DateTimeProperty(auto_now_add=True)

# The mere existence of an instance of this class in the DB means that the relay has to stop.
class StopExec(db.Model):
  date = db.DateTimeProperty(auto_now_add=True)

class MainHandler(webapp.RequestHandler):
  def get(self):
    hbs = HeartBeat.all().order("-date").fetch(numberOfHeartbeatsViewed)
    template_values = {"hbs": hbs}
    path = os.path.join(os.path.dirname(__file__), "index.html")
    self.response.out.write(template.render(path, template_values))

class StartHandler(webapp.RequestHandler):
  def get(self):
    if (StopExec.all().count() == 0):
      try:
        id = int(self.request.get("id"))
      except (TypeError, ValueError):
        id = 0
      try:
        logging.debug("Request " + str(id) + " launching background task.")
        loopCount = 0
        while(loopCount < numberOfTasksPerRequest):
          hb = HeartBeat()
          hb.requestId = id
          hb.put()
          logging.debug("Request " + str(id) + " saved heartbeat #" + str(loopCount))
          time.sleep(secondsDurationOfTaskWait)
          loopCount = loopCount+1
      finally:
        logging.debug("Launching successor request with id=" + str(id+1))

# This silly back and forth between the two URLs is because of
# "App cannot fetch the same URL as the one used for the request" error.
        if (self.request.url.find("start2") == -1):
          urlfetch.fetch("http://localhost/start2?id=" + str(id+1))
        else:
          urlfetch.fetch("http://localhost/start?id=" + str(id+1))
        logging.debug("Request " + str(id) + " completed")

def main():
  application = webapp.WSGIApplication([("/", MainHandler), ("/start", StartHandler), ("/start2", StartHandler)], debug=True)
  wsgiref.handlers.CGIHandler().run(application)

if __name__ == "__main__":
  main()

One thing I had to change from the earlier version (written using version 1.1.0 of the GAE SDK) is that urlfetch now returns an error if your app tries to invoke itself at the same URL (“App cannot fetch the same URL as the one used for the request”). So I have to alternate between http://localhost/start and http://localhost/start2, both of which are mapped to the same handler. This was added sometime between SDK 1.1.0 and SDK 1.1.9. If it is aimed at preventing the kind of baton-passing that I am doing, it is pretty inefficient considering how easy it is to circumvent.

It is now 1:02AM Pacific the next day (Friday 2/13) and the process is still progressing, based on the single HTTP request I sent to it at 11:00PM the previous evening. The result page currently returns:

  • From request # 1056, with date Fri, 13 Feb 2009 09:02:15 +0000
  • From request # 1056, with date Fri, 13 Feb 2009 09:02:14 +0000
  • From request # 1056, with date Fri, 13 Feb 2009 09:02:13 +0000
  • From request # 1056, with date Fri, 13 Feb 2009 09:02:12 +0000
  • From request # 1056, with date Fri, 13 Feb 2009 09:02:11 +0000
  • From request # 1055, with date Fri, 13 Feb 2009 09:02:10 +0000
  • From request # 1055, with date Fri, 13 Feb 2009 09:02:09 +0000
  • From request # 1055, with date Fri, 13 Feb 2009 09:02:08 +0000
  • From request # 1055, with date Fri, 13 Feb 2009 09:02:07 +0000
  • From request # 1055, with date Fri, 13 Feb 2009 09:02:06 +0000
  • From request # 1054, with date Fri, 13 Feb 2009 09:02:05 +0000
  • From request # 1054, with date Fri, 13 Feb 2009 09:02:04 +0000
  • From request # 1054, with date Fri, 13 Feb 2009 09:02:03 +0000
  • From request # 1054, with date Fri, 13 Feb 2009 09:02:02 +0000
  • From request # 1054, with date Fri, 13 Feb 2009 09:02:01 +0000
  • From request # 1053, with date Fri, 13 Feb 2009 09:01:59 +0000
  • From request # 1053, with date Fri, 13 Feb 2009 09:01:58 +0000

Which shows that 1056 successive requests have participated in the relay (the last one just happened, at 09:02:15 UTC which is 1:02AM Pacific).

Hopefully it will still be running when I wake up tomorrow.

[UPDATED 2009/2/13, 9:08AM Pacific: It’s alive!

  • From request # 6411, with date Fri, 13 Feb 2009 17:08:45 +0000
  • From request # 6410, with date Fri, 13 Feb 2009 17:08:44 +0000
  • From request # 6410, with date Fri, 13 Feb 2009 17:08:43 +0000
  • From request # 6410, with date Fri, 13 Feb 2009 17:08:42 +0000
  • From request # 6410, with date Fri, 13 Feb 2009 17:08:41 +0000
  • From request # 6410, with date Fri, 13 Feb 2009 17:08:40 +0000
  • From request # 6409, with date Fri, 13 Feb 2009 17:08:39 +0000

BTW, the code provided uses localhost to run on my local machine. The version uploaded to Google of course replaces this with rere.appspot.com.]

[UPDATED 2009/5/1: For some reason this entry is attracting a lot of comment spam, so I am disabling comments. Contact me if you’d like to comment.]

4 Comments

Filed under Cloud Computing, Everything, Google App Engine, Implementation, Tech, Utility computing

Google App Engine is teasing me

Version 1.1.9 of the Google App Engine (GAE) SDK was released earlier this week. The first item in the announcement covers the big news, that developers can “use the Python standard libraries urllib, urllib2 or httplib to make HTTP requests”. My first thought reading this was that I was finally going to be able to use timeouts on outgoing HTTP requests. I care about this because my earlier attempt to emulate a long-running process in GAE was stymied by the GAE quota system, something I think I can work around if I can timeout a request once it has spawned its successor (more precisely, once it has spawned the successor of the incoming request that created the new outgoing request).

I got the new SDK last evening and moved the code from using urlfetch to using urllib2 (with timeout). On my local machine it seems to work, but the quota system (that I am trying to finesse) doesn’t run in the SDK. So the only real test happens once you deploy the app in the Google environment. Which is when I realized that GAE uses Python 2.5.2 and that the timeout parameter in urllib2 (and httplib) came with 2.6. Slap.
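For what it’s worth, on stock Python 2.5 the usual workaround is a process-wide socket timeout rather than a per-request parameter. Here is a minimal sketch (hypothetical URL, and no claim that GAE’s sandbox honors it, which is the whole question):

import socket
import urllib2

socket.setdefaulttimeout(5)  # seconds; applies to all sockets created afterwards

try:
    response = urllib2.urlopen("http://example.org/start?id=1")
    body = response.read()
except urllib2.URLError, e:
    # a timed-out request typically surfaces as a URLError wrapping socket.timeout
    body = None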

I was especially disappointed because the links for urllib2 and httplib in the GAE 1.1.9 SDK announcement take us to the Python 2.6.1 documentation. The timeout parameter is right there at the top of these pages, staring at me. It would be more accurate for this announcement to point to the 2.5.2 documentation (here it is for urllib2 and httplib).

It doesn’t really matter of course because this is just a toy project. And a real scheduler seems to be in the works (see this pre-announcement and this work in progress). I just do this as a fun way to get a glimpse of what it takes to turn existing infrastructure into a *aaS product, something that is going on in different ways in many places these days. Linux wasn’t created to run on something else than hardware. Xen wasn’t created to support EC2. Python wasn’t created to support GAE.

And GAE wasn’t created to support long-running processes. But I haven’t given up.

1 Comment

Filed under Cloud Computing, Everything, Google App Engine, Implementation, Tech, Utility computing

UCI: setting RDF for failure?

I don’t get it. I just read Reuven Cohen’s description of the Unified Cloud Interface project that he recently started. It’s nothing less than using RDF to create “a Semantic Cloud Infrastructure capable of adapting to a variety of methodologies / architectures and completely agnostic to any specific API or platform being described.”

What made me fall off my chair is the methodology/architecture part of this statement. It’s hard enough (but doable) to use RDF to map philosophically similar APIs. It’s a non-starter to use it to bridge architectural and methodological differences. I have spent a fair amount of time looking at Semantic Web technologies in the context of modeling IT systems (see the “semantic tech” category of this blog). While I think they would be a great foundation I don’t see them ever coming anywhere near what Reuven describes.

But to be fair, I am not sure what he really is describing. There are a few overly ambitious proclamations like the one above and this paragraph:

The key drivers of a unified cloud interface (UCI) is “One abstraction to Rule them All” – an API for other API’s. A singular abstraction that can encompass the entire infrastructure stack as well as emerging cloud centric technologies through a unified interface. What a semantic model enables for UCI is a capability to bridge both cloud based API’s such as Amazon Web Services with existing protocols and standards, regardless of the level of adoption of the underlying API’s or technology. The goal is simple, develop your application once, deploy anywhere at anytime for any reason.

But in his piece you’ll also find CIM being cited as an example. There are good things to be said about CIM, but it certainly is not “a dynamic computing model that can, under certain conditions, be ‘trained’ to appropriately ‘learn’ the meaning of related cloud & infrastructure resources” (or, in the case of CIM, computer system resources). Good luck “training” CIM to “learn” anything. It’s CIM that’s going to train you to do it its way, period.

The CIM example (and other standards he lists) paints the picture of defining a standard API for Cloud Computing and forcing all providers to use it. That’s the conventional approach to universality. If that’s what UCI is after then it is technically achievable. And RDF might be a very good technical foundation for it. Whether anyone can pull this off politically and commercially at this stage is a different question of course. In any case, such an effort would have nothing to do with magically wrapping whatever API each provider has defined and whatever architecture/methodology they chose.

And further down we see a sketch of another, much more modest, vision, when Reuven talks about how “these web resources could just as easily be ‘cloud resources’ or API’s” which seems to represent a whole API as an RDF resource. Sure, then you can use RDF/OWL to capture versioning information between them, backward compatibility etc. Probably very useful, but that’s a very different scope.
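That more modest reading I can picture. Here is a minimal sketch of it (hypothetical vocabulary and resource names, using the rdflib library): each API version is an RDF resource and the graph records versioning and compatibility assertions about it, without pretending to capture the semantics of the APIs themselves.

from rdflib import Graph, Namespace

EX = Namespace("http://example.org/cloud-api#")  # hypothetical vocabulary

g = Graph()
g.add((EX["ProviderAPI_v2"], EX["versionOf"], EX["ProviderAPI"]))
g.add((EX["ProviderAPI_v2"], EX["backwardCompatibleWith"], EX["ProviderAPI_v1"]))

# What does v2 claim to be backward compatible with?
for target in g.objects(EX["ProviderAPI_v2"], EX["backwardCompatibleWith"]):
    print target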

So which is it? Reuven is a thought leader in Cloud Computing, so I want to think I am missing his point.

So far, I haven’t seen any Cloud taxonomy that is reasonably complete and has received broad support. Shouldn’t we first try to come up with a human-readable taxonomy before we try to turn it into a machine-readable ontology? In my previous post I explicitly stayed away from being pedantic about the difference between the terms, but the confusion between a taxonomy and an ontology seems to be part of what’s going on here.

The sad thing is that they (you know, them) will point to this as a proof that Semantic Web technologies don’t work.

Or maybe I’ve just set myself up for a generous portion of humble pie on April 2nd (when Reuven says an “initial functional draft UCI implementation, taxonomy and ontology” will be unveiled). I’d love to be surprised. And my ego has taken worse hits before.

[UPDATED 2009/2/10: You should read Steve Oberlin’s take on this overall taxonomy/ontology discussion. He knows the topic, carefully reads the posts that he comments on, packs a healthy dose of skepticism and takes the time to explain what taxonomies and ontologies are, which was overdue. Plus, I just love sites that don’t feel the need to use decorative pictures. His doesn’t have a single image file which means that even if he didn’t have superb credentials (which he does) he’d get my respect by default. A blog to watch.]

1 Comment

Filed under Automation, Cloud Computing, Everything, Modeling, Portability, RDF, Semantic tech, Standards, Tech, Utility computing