William Vambenepe's blog

IT management in a changing IT world

One way to think about mental health is by looking at how best way to buy viagra online and successfully a person functions.In the Republic of Georgia, the Eliava Institute of Bacteriophage, drug generic viagraThe first major morphological change was the evolution of a bipedal viagra online overnight delivery adaptation from an arboreal or semi-arboreal one, with all its attendant adaptations, such as a valgus knee, low intermembral index (long legs relative to the arms), and reduced upper-body strength.Exercise is the performance of movements in order to develop or buy generic viagra physical fitness and overall health.The Persian scientist Avicenna introduced experimental medicine, cheapest viagra contagious diseases, introduced quarantine and clinical trials, and described many anaesthetics and medical and therapeutic drugs, in The Canon of Medicine.

Archive for the 'Management integration' Category

18
Aug
2010

The necessity of PaaS: Will Microsoft be the Singapore of Cloud Computing?

by William (@vambenepe on Twitter)

From ancient Mesopotamia to, more recently, Holland, Switzerland, Japan, Singapore and Korea, the success of many societies has been in part credited to their lack of natural resources. The theory being that it motivated them to rely on human capital, commerce and innovation rather than resource extraction. This approach eventually put them ahead of their better-endowed neighbors.

A similar dynamic may well propel Microsoft ahead in PaaS (Platform as a Service): IaaS with Windows is so painful that it may force Microsoft to focus on PaaS. The motivation is strong to “go up the stack” when the alternative is to cultivate the arid land of Windows-based IaaS.

I should disclose that I work for one of Microsoft’s main competitors, Oracle (though this blog only represents personal opinions), and that I am not an expert Windows system administrator. But I have enough experience to have seen some of the many reasons why Windows feels like a much less IaaS-friendly environment than Linux: e.g. the lack of SSH, the cumbersomeness of RDP, the constraints of the Windows license enforcement system, the Windows update mechanism, the immaturity of scripting, the difficulty of managing Windows from non-Windows machines (despite WS-Management), etc. For a simple illustration, go to EC2 and compare, between a Windows AMI and a Linux AMI, the steps (and time) needed to get from selecting an image to the point where you’re logged in and in control of a VM. And if you think that’s bad, things get even worse when we’re not just talking about a few long-lived Windows server instances in the Cloud but a highly dynamic environment in which all steps have to be automated and repeatable.

I am not saying that there aren’t ways around all this, just like it’s not impossible to grow grapes in Holland. It’s just usually not worth the effort. This recent post by RighScale illustrates both how hard it is but also that it is possible if you’re determined. The question is what benefits you get from Windows guests in IaaS and whether they justify the extra work. And also the additional license fee (while many of the issues are technical, others stem more from Microsoft’s refusal to acknowledge that the OS is a commodity). [Side note: this discussion is about Windows as a guest OS and not about the comparative virtues of Hyper-V, Xen-based hypervisors and VMWare.]

Under the DSI banner, Microsoft has been working for a while on improving the management/automation infrastructure for Windows, with tools like PowerShell (which I like a lot). These efforts pre-date the Cloud wave but definitely help Windows try to hold it own on the IaaS battleground. Still, it’s an uphill battle compared with Linux. So it makes perfect sense for Microsoft to move the battle to PaaS.

Just like commerce and innovation will, in the long term, bring more prosperity than focusing on mining and agriculture, PaaS will, in the long term, yield more benefits than IaaS. Even though it’s harder at first. That’s the good news for Microsoft.

On the other hand, lack of natural resources is not a guarantee of success either (as many poor desertic countries can testify) and Microsoft will have to fight to be successful in PaaS. But the work on Azure and many research efforts, like the “next-generation programming model for the cloud” (codename “Orleans”) that Mary Jo Foley revealed today, indicate that they are taking it very seriously. Their approach is not restricted by a VM-centric vision, which is often tempting for hypervisor and OS vendors. Microsoft’s move to PaaS is also facilitated by the fact that, while system administration and automation may not be a strength, development tools and application platforms are.

The forward-compatible Cloud will soon overshadow the backward-compatible Cloud and I expect Microsoft to play a role in it. They have to.

05
Aug
2010

Updates on Microsoft Oslo and “SSH on Windows”

by William (@vambenepe on Twitter)

I’ve been tracking the modeling technology previously known as “Microsoft Oslo” with a sympathetic eye for the almost three years since it’s been introduced. I look at it from the perspective of model-driven IT management but the news hadn’t been good on that front lately (except for Douglas Purdy’s encouraging hint).

The prospects got even bleaker today, at least according to the usually-well-informed Mary Jo Foley, who writes: “Multiple contacts of mine are telling me that Microsoft has decided to shelve Quadrant and ‘refocus’ M.” Is “M” the end of the SDM/SML/M model-driven management approach at Microsoft? Or is the “refocus” a hint that M is returning “home” to address IT management use cases? Time (or Doug) will tell…

While we’re talking about Microsoft and IT automation, I have one piece of free advice for the Microsofties: people *really* want to SSH into Windows servers. Here’s how I know. This blog rarely talks about Microsoft but over the course of two successive weekends over a year ago I toyed with ways to remotely manage Windows machines using publicly documented protocols. In effect, showing what to send on the wire (from Linux or any platform) to leverage the SOAP-based management capabilities in recent versions of Windows. To my surprise, these posts (1, 2, 3) still draw a disproportionate amount of traffic. And whenever I look at my httpd logs, I can count on seeing search engine queries related to “windows native ssh” or similar keywords.

If heterogeneous Cloud is something Microsoft cares about they need to better leverage the potential of the PowerShell Remoting Protocol. They can release open-source Python, Java and Ruby client-side libraries. Alternatively, they can drastically simplify the protocol, rather than its current “binary over SOAP” (you read this right) incarnation. Because the poor Kridek who is looking for the “WSDL for WinRM / Remote Powershell” is in for a nasty surprise if he finds it and thinks he’ll get a ready-to-use stub out of it.

That being said, a brave developer willing to suck it up and create such a Python/Ruby/Java library would probably make some people very grateful.

18
Jul
2010

Introducing the Oracle Cloud API

by William (@vambenepe on Twitter)

Oracle recently published a Cloud management API on OTN and also submitted a subset of the API to the new DMTF Cloud Management working group. The OTN specification, titled “Oracle Cloud Resource Model API”, is available here. In typical DMTF fashion, the DMTF-submitted specification is not publicly available (if you have a DMTF account and are a member of the right group you can find it here). It is titled the “Oracle Cloud Elemental Resource Model” and is essentially the same as the OTN version, minus sections 9.2, 9.4, 9.6, 9.8, 9.9 and 9.10 (I’ll explain below why these sections have been removed from the DMTF submission). Here is also a slideset that was recently used to present the submitted specification at a DMTF meeting.

So why two documents? Because they serve different purposes. The Elemental Resource Model, submitted to DMTF, represents the technical foundation for the IaaS layer. It’s not all of IaaS, just its core. You can think of its scope as that of the base EC2 service (boot a VM from an image, attach a volume, connect to a network). It’s the part that appears in all the various IaaS APIs out there, and that looks very similar, in its model, across all of them. It’s the part that’s ripe for a simple standard, hopefully free of much of the drama of a more open-ended and speculative effort. A standard that can come out quickly and provide interoperability right out of the gate (for the simple use cases it supports), not after years of plugfests and profiles. This is the narrow scope I described in an earlier rant about Cloud standards:

I understand the pain of customers today who just want to have a bit more flexibility and portability within the limited scope of the VM/Volume/IP offering. If we really want to do a standard today, fine. Let’s do a very small and pragmatic standard that addresses this. Just a subset of the EC2 API. Don’t attempt to standardize the virtual disk format. Don’t worry about application-level features inside the VM. Don’t sweat the REST or SOA purity aspects of the interface too much either. Don’t stress about scalability of the management API and batching of actions. Just make it simple and provide a reference implementation. A few HTTP messages to provision, attach, update and delete VMs, volumes and IPs. That would be fine. Anything else (and more is indeed needed) would be vendor extensions for now.

Of course IaaS goes beyond the scope of the Elemental Resource Model. We’ll need load balancing. We’ll need tunneling to the private datacenter. We’ll need low-latency sub-networks. We’ll need the ability to map multi-tier applications to different security zones. Etc. Some Cloud platforms support some of these (e.g. Amazon has an answer to all but the last one), but there is a lot more divergence (both in the “what” and the “how”) between the various Cloud APIs on this. That part of IaaS is not ready for standardization.

Then there are the extensions that attempt to make the IaaS APIs more application-aware. These too exist in some Cloud APIs (e.g. vCloud vApp) but not others. They haven’t naturally converged between implementations. They haven’t seen nearly as much usage in the industry as the base IaaS features. It would be a mistake to overreach in the initial phase of IaaS standardization and try to tackle these questions. It would not just delay the availability of a standard for the base IaaS use cases, it would put its emergence and adoption in jeopardy.

This is why Oracle withheld these application-aware aspects from the DMTF submission, though we are sharing them in the specification published on OTN. We want to expose them and get feedback. We’re open to collaborating on them, maybe even in the scope of a standard group if that’s the best way to ensure an open IP framework for the work. But it shouldn’t make the upcoming DMTF IaaS specification more complex and speculative than it needs to be, so we are keeping them as separate extensions. Not to mention that DMTF as an organization has a lot more infrastructure expertise than middleware and application expertise.

Again, the “Elemental Resource Model” specification submitted to DMTF is the same as the “Oracle Cloud Resource Model API” on OTN except that it has a different license (a license grant to DMTF instead of the usual OTN license) and is missing some resources in the list of resource types (section 9).

Both specifications share the exact same protocol aspects. It’s pretty cleanly RESTful and uses a JSON serialization. The credit for the nice RESTful protocol goes to the folks who created the original Sun Cloud API as this is pretty much what the Oracle Cloud API adopted in its entirety. Tim Bray described the genesis and design philosophy of the Sun Cloud API last year. He also described his role and explained that “most of the heavy lifting was done by Craig McClanahan with guidance from Lew Tucker“. It’s a shame that the Oracle specification fails to credit the Sun team and I kick myself for not noticing this in my reviews. This heritage was noted from the get go in the slides and is, in my mind, a selling point for the specification. When I reviewed the main Cloud APIs available last summer (the first part in a “REST in practice for IT and Cloud management” series), I liked Sun’s protocol design the best.

The resource model, while still based on the Sun Cloud API, has seen many more changes. That’s where our tireless editor, Jack Yu, with help from Mark Carlson, has spent most of the countless hours he devoted to the specification. I won’t do a point to point comparison of the Sun model and the Oracle model, but in general most of the changes and additions are motivated by use cases that are more heavily tilted towards private clouds and compatibility with existing application infrastructure. For example, the semantics of a Zone have been relaxed to allow a private Cloud administrator to choose how to partition the Cloud (by location is an obvious option, but it could also by security zone or by organizational ownership, as heretic as this may sound to Cloud purists).

The most important differences between the DMTF and OTN versions relate to the support for assemblies, which are groups of VMs that jointly participate in the delivery of a composite application. This goes hand-in-hand with the recently-released Oracle Virtual Assembly Builder, a framework for creating, packing, deploying and configuring multi-tier applications. To support this approach, the Cloud Resource Model (but not the Elemental Model, as explained above) adds resource types such as AssemblyTemplate, AssemblyInstance and ScalabilityGroup.

So what now? The DMTF working group has received a large number of IaaS APIs as submissions (though not the one that matters most or the one that may well soon matter a lot too). If all goes well it will succeed in delivering a simple and useful standard for the base IaaS use cases, and we’ll be down to a somewhat manageable triplet (EC2, RackSpace/OpenStack and DMTF) of IaaS specifications. If not (either because the DMTF group tries to bite too much or because it succumbs to infighting) then DMTF will be out of the game entirely and it will be between EC2, OpenStack and a bunch of private specifications. It will be the reign of toolkits/library/brokers and hell on earth for all those who think that such a bridging approach is as good as a standard. And for this reason it will have to coalesce at some point.

As far as the more application-centric approach to hypervisor-based Cloud, well, the interesting things are really just starting. Let’s experiment. And let’s talk.

16
Jun
2010

CMDB in the Cloud: not your father’s CMDB

by William (@vambenepe on Twitter)

Bernd Harzog recently wrote a blog entry to examine whether “the CMDB [is] irrelevant in a Virtual and Cloud based world“. If I can paraphrase, his conclusion is that there will be something that looks like a CMDB but the current CMDB products are ill-equipped to fulfill that function. Here are the main reasons he gives for this prognostic:

  1. A whole new class of data gets created by the virtualization platform – specifically how the virtualization platform itself is configured in support of the guests and the applications that run on the guest.
  2. A whole new set of relationships between the elements in this data get created – specifically new relationships between hosts, hypervisors, guests, virtual networks and virtual storage get created that existing CMDB’s were not built to handle.
  3. New information gets created at a very rapid rate. Hundreds of new guests can get provisioned in time periods much too short to allow for the traditional Extract, Transform and Load processes that feed CMDB’s to be able to keep up.
  4. The environment can change at a rate that existing CMDB’s cannot keep up with. Something as simple as vMotion events can create thousands of configuration changes in a few minutes, something that the entire CMDB architecture is simply not designed to keep up with.
  5. Having portions of IT assets running in a public cloud introduces significant data collection challenges. Leading edge APM vendors like New Relic and AppDynamics have produced APM products that allow these products to collect the data that they need in a cloud friendly way. However, we are still a long way away from having a generic ability to collect the configuration data underlying a cloud based IT infrastructure – notwithstanding the fact that many current cloud vendors would not make this data available to their customers in the first place.
  6. The scope of the CMDB needs to expand beyond just asset and configuration data and incorporate Infrastructure Performance, Applications Performance and Service assurance information in order to be relevant in the virtualization and cloud based worlds.

I wanted to expand on some of these points.

New model elements for Cloud (bullets #1 and #2)

These first bullets are not the killers. Sure, the current CMDBs were designed before the rise of virtualized environment, but they are usually built on a solid modeling foundation that can easily be extend with new resources classes. I don’t think that extending the model to describe VM, VNets, Volumes, hypervisors and their relationships to the physical infrastructure is the real challenge.

New approach to “discovery” (bullets #3 and #4)

This, on the other hand is much more of a “dinosaurs meet meteorite” kind of historical event. A large part of the value provided by current CMDBs is their ability to automate resource discovery. This is often achieved via polling/scanning (at the hardware level) and heuristics/templates (directory names, port numbers, packet inspection, bird entrails…) for application discovery. It’s imprecise but often good enough in static environments (and when it fails, the CMDB complements the automatic discovery with a reconciliation process to let the admin clean things up). And it used to be all you could get anyway so there wasn’t much point complaining about the limitations. The crown jewel of many of today’s big CMDBs can often be traced back to smart start-ups specialized in application discovery/mapping, like Appilog (now HP, by way of Mercury) and nLayers (now EMC). And more recently the purchase of Tideway by BMC (ironically – but unsurprisingly – often cast in Cloud terms).

But this is not going to cut it in “the Cloud” (by which I really mean in a highly automated IT environment). As Bernd Harzog explains, the rate of change can completely overwhelm such discovery heuristics (plus, some of the network scans they sometimes use will get you in trouble in public clouds). And more importantly, there now is a better way. Why discover when you can ask? If resources are created via API calls, there are also API calls to find out which resources exist and how they are configured. This goes beyond the resources accessible via IaaS APIs, like what VMWare, EC2 and OVM let you retrieve. This “don’t guess, ask” approach to discovery needs to also apply at the application level. Rather than guessing what software is installed via packet inspection or filesystem spelunking, we need application-aware discovery that retrieves the application and configuration and dependencies from the application itself (or its underlying framework). And builds a model in which the connections between application entities are expressed in terms of the configuration settings that drive them rather than the side effects by which they can be noticed.

If I can borrow the words of Lew Cirne:

“All solutions built in the pre-cloud era are modeled on jvms (or their equivalent), hosts and ports, rather than the logical application running in a more fluid environment. If the solution identifies a web application by host/port or some other infrastructural id, then you cannot effectively manage it in a cloud environment, since the app will move and grow, and your management system (that is, everything offered by the Big 4, as well as all infrastructure management companies that pay lip service to the application) will provide nearly-useless visibility and extraordinarily high TCO.”

I don’t agree with everything in Lew Cirne’s post, but this diagnostic is correct and well worded. He later adds:

“So application management becomes the strategic center or gravity for the client of a public or private cloud, and infrastructure-centric tools (even ones that claim to be cloudy) take on a lesser role.”

Which is also very true even if counter-intuitive for those who think that

cloud = virtualization (in the “fake machine” interpretation of virtualization)

Embracing such a VM-centric view naturally raises the profile of infrastructure management compared to application management, which is a fallacy in Cloud computing.

Drawing the line between Cloud infrastructure management and application management (bullet #5)

This is another key change that traditional CMDBs are going to have a hard time with. In a Big-4 CMDB, you’re after the mythical “single source of truth”. Even in a federated CMDB (which doesn’t really exist anyway), you’re trying to have a unified logical (if not physical) repository of information. There is an assumption that you want to manage everything from one place, so you can see all the inter-dependencies, across all layers of the stack (even if individual users may have a scope that is limited by permissions). Not so with public Clouds and even, I would argue, any private Cloud that is more than just a “cloud” sticker slapped on an old infrastructure. The fact that there is a clean line between the infrastructure model and the application model is not a limitation. It is empowering. Even if your Cloud provider was willing to expose a detailed view of the underlying infrastructure you should resist the temptation to accept. Despite the fact that it might be handy in the short term and provide an interesting perspective, it is self-defeating in the long term from the perspective of realizing the productivity improvements promised by the Cloud. These improvements require that the infrastructure administrator be freed from application-specific issues and focus on meeting the contract of the platform. And that the application administrator be freed from infrastructure-level concerns (while at the same time being empowered to diagnose application-level concerns). This doesn’t mean that the application and infrastructure models should be disconnected. There is a contract and both models (infrastructure and consumption) should represent it in the same way. It draws a line, albeit one with some width.

Blurring the line between configuration and monitoring (bullet #6)

This is another shortcoming of current CMDBs, but one that I think is more easily addressed. The “contract” between the Cloud infrastructure and the consuming application materializes itself in a mix of configuration settings, administrative capabilities and monitoring data. This contract is not just represented by the configuration-centric Cloud API that immediately comes to mind. It also includes the management capabilities and monitoring points of the resulting instances/runtimes.

Wither CMDB?

Whether all these considerations mean that traditional CMDBs are doomed in the Cloud as Bernd Harzog posits, I don’t know. In this post, BMC’s Kia Behnia acknowledges the importance of application management, though it’s not clear that he agrees with their primacy. I am also waiting to see whether the application management portfolio he has assembled can really maps to the new methods of application discovery and management.

But these are resourceful organization, with plenty of smart people (as I can testify: in the end of my HP tenure I worked with the very sharp CMDB team that came from the Mercury acquisition). And let’s keep in mind that customers also value the continuity of support of their environment. Most of them will be dealing with a mix of old-style and Cloud applications and they’ll be looking for a unified management approach. This helps CMDB incumbents. If you doubt the power to continuity, take a minute to realize that the entire value proposition of hypervisor-style virtualization is centered around it. It’s the value of backward-compatibility versus forward-compatibility. in addition, CMDBs are evolving into CMS and are a lot more than configuration repositories. They are an important supporting tool for IT management processes. Whether, and how, these processes apply to “the Cloud” is a topic for another post. In the meantime, read what the IT Skeptic and Rodrigo Flores have to say.

I wouldn’t be so quick to count the Big-4 out, even though I work every day towards that goal, building Oracle’s application and middleware management capabilities in conjunction with my colleagues focused on infrastructure management.

If the topic of application-centric management in the age of Cloud is of interest to you (and it must be if you’ve read this long entry all the way to the end), You might also find this previous entry relevant: “Generalizing the Cloud vs. SOA Governance debate“.

25
May
2010

Dear Cloud API, your fault line is showing

by William (@vambenepe on Twitter)

Most APIs are like hospital gowns. They seem to provide good coverage, until you turn around.

I am talking about the dreadful state of fault reporting in remote APIs, from Twitter to Cloud interfaces. They are badly described in the interface documentation and the implementations often don’t even conform to what little is documented.

If, when reading a specification, you get the impression that the “normal” part of the specification is the result of hours of whiteboard debate but that the section that describes the faults is a stream-of-consciousness late-night dump that no-one reviewed, well… you’re most likely right. And this is not only the case for standard-by-committee kind of specifications. Even when the specification is written to match the behavior of an existing implementation, error handling is often incorrectly and incompletely described. In part because developers may not even know what their application returns in all error conditions.

After learning the lessons of SOAP-RPC, programmers are now more willing to acknowledge and understand the on-the-wire messages received and produced. But when it comes to faults, there is still a tendency to throw their hands in the air, write to the application log and then let the stack do whatever it does when an unhandled exception occurs, on-the-wire compliance be damned. If that means sending an HTML error message in response to a request for a JSON payload, so be it. After all, it’s just a fault.

But even if fault messages may only represent 0.001% of the messages your application sends, they still represent 85% of those that the client-side developers will look at.

Client developers can’t even reverse-engineer the fault behavior by hitting a reference implementation (whether official or de-facto) the way they do with regular messages. That’s because while you can generate response messages for any successful request, you don’t know what error conditions to simulate. You can’t tell your Cloud provider “please bring down your user account database for five minutes so I can see what faults you really send me when that happens”. Also, when testing against a live application you may get a different fault behavior depending on the time of day. A late-night coder (or a daytime coder in another time zone) might never see the various faults emitted when the application (like Twitter) is over capacity. And yet these will be quite common at peak time (when the coder is busy with his day job… or sleeping).

All these reasons make it even more important to carefully (and accurately) document fault behavior.

The move to REST makes matters even worse, in part because it removes SOAP faults. There’s nothing magical about SOAP faults, but at least they force you to think about providing an information payload inside your fault message. Many REST APIs replace that with HTTP error codes, often accompanied by a one-line description with a sometimes unclear relationship with the semantics of the application. Either it’s a standard error code, which by definition is very generic or it’s an application-defined code at which point it most likely overlaps with one or more standard codes and you don’t know when you should expect one or the other. Either way, there is too much faith put in the HTTP code versus the payload of the error. Let’s be realistic. There are very few things most applications can do automatically in response to a fault. Mainly:

  • Ask the user to re-enter credentials (if it’s an authentication/permission issue)
  • Retry (immediately or after some time)
  • Report a problem and fail

So make sure that your HTTP errors support this simple decision tree. Beyond that point, listing a panoply of application-specific error codes looks like an attempt to look “RESTful” by overdoing it. In most cases, application-specific error codes are too detailed for most automated processing and not detailed enough to help the developer understand and correct the issue. I am not against using them but what matters most is the payload data that comes along.

On that aspect, implementations generally fail in one of two extremes. Some of them tell you nothing. For example the payload is a string that just repeats what the documentation says about the error code. Others dump the kitchen sink on you and you get a full stack trace of where the error occurred in the server implementation. The former is justified as a security precaution. The latter as a way to help you debug. More likely, they both just reflect laziness.

In the ideal world, you’d get a detailed error payload telling you exactly which of the input parameters the application choked on and why. Not just vague words like “invalid”. Is parameter “foo” invalid for syntactical reasons? Is it invalid because inconsistent with another parameter value in the request? Is it invalid because it doesn’t match the state on the server side? Realistically, implementations often can’t spend too many CPU cycles analyzing errors and generating such detailed reports. That’s fine, but then they can include a link to a wiki a knowledge base where more details are available about the error, its common causes and the workarounds.

Your API should document all messages accurately and comprehensively. Faults are messages too.

28
Apr
2010

PaaS portability challenges and the VMforce example

by William (@vambenepe on Twitter)

The VMforce announcement is a great step for SalesForce, in large part because it lets them address a recurring concern about the force.com PaaS offering: the lack of portability of Apex applications. Now they can be written using Java and Spring instead. A great illustration of how painful this issue was for SalesForce is to see the contortions that Peter Coffee goes through just to acknowledge it: “On the downside, a project might be delayed by debates—some in good faith, others driven by vendor FUD—over the perception of platform lock-in. Political barriers, far more than technical barriers, have likely delayed many organizations’ return on the advantages of the cloud”. The issue is not lock-in it’s the potential delays that may come from the perception of lock-in. Poetic.

Similarly, portability between clouds is also a big theme in Steve Herrod’s blog covering VMforce as illustrated by the figure below. The message is that “write once run anywhere” is coming to the Cloud.

Because this is such a big part of the VMforce value proposition, both from the SalesForce and the VMWare/SpringSource side (as well as for PaaS in general), it’s worth looking at the portability aspect in more details. At least to the extent that we can do so based on this pre-announcement (VMforce is not open for developers yet). And while I am taking VMforce as an example, all the considerations below apply to any enterprise PaaS offering. VMforce just happens to be one of the brave pioneers, willing to take a first step into the jungle.

Beyond the use of Java as a programming language and Spring as a framework, the portability also comes from the supporting tools. This is something I did not cover in my initial analysis of VMforce but that Michael Cote covers well on his blog and Carl Brooks in his comment. Unlike the more general considerations in my previous post, these matters of tooling are hard to discuss until the tools are actually out. We can describe what they “could”, “should” and “would” do all day long, but in the end we need to look at the application in practice and see what, if anything, needs to change when I redirect my deployment target from one cloud to the other. As SalesForce’s Umit Yalcinalp commented, “the details are going to be forthcoming in the coming months and it is too early to speculate”.

So rather than speculating on what VMforce tooling will do, I’ll describe what portability questions any PaaS platform would have to address (or explicitly decline to address).

Code portability

That’s the easiest to address. Thanks to Java, the runtime portability problem for the core language is pretty much solved. Still, moving applications around require changes to way the application communicates with its infrastructure. Can your libraries and frameworks for data access and identity, for example, successfully encapsulate and hide the different kinds of data/identity stores behind them? Even when the stores are functionally equivalent (e.g. SQL, LDAP), they may have operational differences that matter for an enterprise application. Especially if the database is delivered (and paid for) as a service. I may well design my application differently depending on whether I am charged by the amount of data in the DB, by the number of requests to the DB, by the quantity of app-to-DB traffic or by the total processing time of my requests in the DB. Apparently force.com considers the number of “database objects” in its pricing plans and going over 200 pushes you from the “Enterprise” version to the more expensive “Unlimited” version. If I run against my local relational database I don’t think twice about having 201 “database objects”. But if I run in force.com and I otherwise can live within the limits of the “Enterprise” version I’d probably be tempted to slightly alter my data model to fit under 200 objects. The example is borderline silly, but the underlying truth is that not all differences in application infrastructure can be automatically encapsulated by libraries.

While code portability is a solvable problem for a reasonably large set of use cases, things get hairier for the more demanding applications. A large part of the PaaS value proposition is contingent on the willingness to give up some low-level optimizations. This, and harder portability in some cases, may just have to be part of the cost of running demanding applications in a PaaS environment. Or just keep these off PaaS for now. This is part of the backward-compatible versus forward compatible Cloud dilemma.

Data portability

I have covered data portability in the previous entry, in response to Steve Herrod’s comment that “you should be able to extract the code from the cloud it currently runs in and move it, along with its data, to another cloud choice”. Your data in the force.com database can already be moved somewhere else… as long as you’re willing to write code to get it and perform any needed transformation. In theory, any data that you can read is data that you can move (thus fulfilling Steve’s promise). The question is at what cost. Presumably Steve is referring to data migration tools that VMWare will build (or acquire) and make part of its cloud enablement platform. Another way in which VMWare is trying to assemble a more complete middleware portfolio (see Oracle ODI for an example of a complete data integration offering, which goes far beyond ETL).

There is a subtle difference between the intrinsic portability of Java (which will run in any JRE, modulo JDK version) and the extrinsic portability of data which can in theory be moved anywhere but each place you move it to may require a different process. A car and an oak armoire are both “portable”, but one is designed for moving while the other will only move if you bring a truck and two strong guys.

Application service portability

I covered this in my previous entry and Bob Warfield summarized it as “take advantage of all those juicy services and it will be hard to back out of that platform, Java or no Java”. He is referring to all the platform services (search, reporting, mobile, integration, BPM, IdM, administration) that make a large part of the force.com value proposition. They won’t be waiting for you in your private cloud (though some may be remotely invocable, depending on how SalesForce wants to play its cards). Applications that depend on them will have to be changed, at least until we have standards interfaces for all these services (don’t hold your breath).

Management portability

Even if you can seamlessly migrate your application and your data from your internal servers to force.com, what do you think is going to happen to your management console, especially if it uses operating system agents? These agents are not coming along for the ride, that’s for sure. Are you going to tell your administrators that rather than having a centralized configuration/monitoring/event console they are going to have to look at cute “monitoring” web pages for each application? And all the transaction tracing, event correlation, configuration policy and end-user monitoring features they were relying on are unfortunate victims of the relentless march of progress? Good luck with that sale.

VMWare’s answer will probably be that they will eventually provide you with all the management capabilities that you need. And it’s a fair one, along the lines of the “Application-to-Disk Management” message at the recent launch of Oracle Enterprise Manager 11G. With the difference that EM is not the only way to manage a top-to-bottom Oracle stack, just the one that we think is the best. BMC and HP aren’t locked out.

VMWare and SpringSource (+Hyperic) could indeed theoretically assemble a full-fledged management solution. But this doesn’t happen overnight, even with acquisitions as I know from experience both at HP Software and currently at Oracle. Integration (of management domains across the stack, of acquired application management products, of support data/services from oracle.com) is one of the main advances in Enterprise Manager 11G and it took work.

And even then, this leads to the next logical question. If you can move from cloud to cloud but you are forced to use VMWare development, deployment and management tools, haven’t you traded one lock-in problem for another?

Not to mention that your portability between clouds, if it depends on VMWare tools, is limited to VMWare-powered clouds (private or public). In effect, there are now three levels of portability:

  • not portable (only runs on VMforce)
  • portable to any cloud (public or private) built using VMWare infrastructure
  • portable to any Java/Spring Cloud platform

Is your application portable the way cash is portable, or the way a gift card is portable (across stores of a retail chain)?

If this reminds you of the java portability debates of the early days of Enterprise Java that’s no surprise. Remember, we’re replaying the tape.

21
Apr
2010

The battle of the Cloud Frameworks: Application Servers redux?

by William (@vambenepe on Twitter)

The battle of the Cloud Frameworks has started, and it will look a lot like the battle of the Application Servers which played out over the last decade and a half. Cloud Frameworks (which manage IT automation and runtime outsourcing) are to the Programmable Datacenter what Application Servers are to the individual IT server. In the longer term, these battlefronts may merge, but for now we’ve been transported back in time, to the early days of Web programming. The underlying dynamic is the same. It starts with a disruptive IT event (part new technology, part new mindset). 15 years ago the disruptive event was the Web. Today it’s Cloud Computing.

Stage 1

It always starts with very simple use cases. For the Web, in the mid-nineties, the basic use case was “how do I return HTML that is generated by a script as opposed to a static file”. For Cloud Computing today, it is “how do I programmatically create, launch and stop servers as opposed to having to physically install them”.

In that sense, the IaaS APIs of today are the equivalent of the Common Gateway Interface (CGI) circa 1993/1994. Like the EC2 API and its brethren, CGI was not optimized, not polished, but it met the basic use cases and allowed many developers to write their first Web apps (which we just called “CGI scripts” at the time).

Stage 2

But the limitations became soon apparent. In the CGI case, it had to do with performance (the cost of the “one process per request” approach). Plus, the business potential was becoming clearer and attracted a different breed of contenders than just academic and research institutions. So we got NSAPI, ISAPI, FastCGI, Apache Modules, JServ, ZDAC…

We haven’t reached that stage for Cloud yet. That will be when the IaaS APIs start to support events, enumerations, queries, federated identity etc…

Stage 3

Stage 2 looked like the real deal, when we were in it, but little did we know that we were still just nibbling on the hors d’oeuvres. And it was short-lived. People quickly decided that they wanted more than a way to handle HTTP requests. If the Web was going to be central to most programs, then all aspects of programming had to fit well in the context of the Web. We didn’t want Web servers anymore, we wanted application servers (re-purposing a term that had been used for client-server). It needed more features, covering data access, encapsulation, UI frameworks, integration, sessions. It also needed to meet non-functional requirements: availability, scalability (hello clustering), management, identity…

That turned into the battle between the various Java application servers as well as between Java and Microsoft (with .Net coming along), along with other technology stacks. That’s where things got really interesting too, because we explored different ways to attack the problem. People could still program at the HTTP request level. They could use MVC framework, ColdFusion/ASP/JSP/PHP-style markup-driven applications, or portals and other higher-level modular authoring frameworks. They got access to adapters, message buses, process flows and other asynchronous mechanisms. It became clear that there was not just one way to write Web applications. And the discovery is still going on, as illustrated by the later emergence of Ruby on Rails and similar frameworks.

Stage 4

Stage 3 is not over for Web applications, but stage 4 is already there, as illustrated by the fact that some of the gurus of stage 3 have jumped to stage 4. It’s when the Web is everywhere. Clients are everywhere and so are servers for that matter. The distinction blurs. We’re just starting to figure out the applications that will define this stage, and the frameworks that will best support them. The game is far from over.

So what does it mean for Cloud Frameworks?

If, like me, you think that the development of Cloud Frameworks will follow a path similar to that of Application Servers, then the quick retrospective above can be used as a (imperfect) crystal ball. I don’t pretend to be a Middleware historian or that these four stages are the most accurate representation, but I think they are a reasonable perspective. And they hold some lessons for Cloud Frameworks.

It’s early

We are at stage 1. I’ll admit that my decision to separate stages 1 and 2 is debatable and mainly serves to illustrate how early in the process we are with Cloud frameworks. Current IaaS APIs (and the toolkits that support them) are the equivalent of CGI (and the early httpd), something that’s still around (Google App Engine emulates CGI in its Python incarnation) but almost no-one programs to directly anymore. It’s raw, it’s clunky, it’s primitive. But it was a needed starting point that launched the whole field of Web development. Just like IaaS APIs like EC2 have launched the field Cloud Computing.

Cloud Frameworks will need to go through the equivalent of all the other stages. First, the IaaS APIs will get more optimized and capable (stage 2). Then, at stage 3, we will focus on higher-level, more productive abstraction layers (generally referred to as PaaS) at which point we should expect a thousand different approaches to bloom, and several of them to survive. I will not hazard a guess as to what stage 4 will look like (here is my guess for stage 3, in two parts).

No need to rush standards

One benefit of this retrospective is to highlight the tragedy of Cloud standards compared to Web development standards. Wouldn’t we be better off today if the development leads of AWS and a couple of other Cloud providers had been openly cooperating in a Cloud equivalent of the www-talk mailing list of yore? Out of it came a rough agreement on HTML and CGI that allowed developers to write basic Web applications in a reasonably portable way. If the same informal collaboration had taken place for IaaS APIs, we’d have a simple de-facto consensus that would support the low-hanging fruits of basic IaaS. It would allow Cloud developers to support the simplest use cases, and relieve the self-defeating pressure to standardize too early. Standards played a huge role in the development of Application Servers (especially of the Java kind), but that really took place as part of stage 3. In the absence of an equivalent to CGI in the Cloud world, we are at risk of rushing the standardization without the benefit of the experimentation and lessons that come in stage 3.

I am not trying to sugar-coat the history of Web standards. The HTML saga is nothing to be inspired by. But there was an original effort to build consensus that wasn’t even attempted with Cloud APIs. I like the staged process of a rough consensus that covers the basic use cases, followed by experimentation and proprietary specifications and later a more formal standardization effort. If we skip the rough consensus stage, as we did with Cloud, we end up rushing to do final step (to the tune of “customers demand Cloud standards”) even though all we need for now is an interoperable way to meet the basic use cases.

Winners and losers

Whoever you think of as the current leaders of the Application Server battle (hint: I work for one), they were not the obvious leaders of stages 1 and 2. So don’t be in too much of a hurry to crown the Cloud Framework kings. Those you think of today may turn out to be the Netscapes of that battle.

New roles

It’s not just new technology. The development of Cloud Frameworks will shape the roles of the people involved. We used to have designers who thought their job was done when they produced a picture or at best a FrameMaker or QuarkXPress document, which is what they were used to. We had “webmasters” who thought they were set for life with their new Apache skills, then quickly had to evolve or make way, a lesson for IaaS gurus of today. Under terms like “DevOps” new roles are created and existing roles are transformed. Nobody yet knows what will stick. But if I was an EC2 guru today I’d make sure to not get stuck providing just that. Don’t wait for other domains of Cloud expertise to be in higher demand than your current IaaS area, as by then you’ll be too late.

It’s the stack

There aren’t many companies out there making a living selling a stand-alone Web server. Even Zeus, who has a nice one, seems to be downplaying it on its site compared to its application delivery products. The combined pressure of commoditization (hello Apache) and of the demand for a full stack has made it pretty hard to sell just a Web server.

Similarly, it’s going to be hard to stay in business selling just portions of a Cloud Framework. For example, just provisioning, just monitoring, just IaaS-level features, etc. That’s well-understood and it’s fueling a lot of the acquisitions (e.g. VMWare’s purchase of SpringSource which in turn recently purchased RabbitMQ) and partnerships (e.g. recently between Eucalyptus and GroundWork though rarely do such “partnerships” rise to the level of integration of a real framework).

It’s not even clear what the right scope for a Cloud Framework is. What makes a full stack and what is beyond it? Is it just software to manage a private Cloud environment and/or deployments into public Clouds? Does the framework also include the actual public Cloud service? Does it include hardware in some sort of “private Cloud in a box”, of the kind that this recent Dell/Ubuntu announcement seems to be inching towards?

Integration

If indeed we can go by the history of Application Server to predict the future of Cloud Frameworks, then we’ll have a few stacks (with different levels of completeness, standardized or proprietary). This is what happened for Web development (the JEE stack, the .Net stack, a more loosely-defined alternative stack which is mostly open-source, niche stacks like the backend offered by Adobe for Flash apps, etc) and at some point the effort moved from focusing on standardizing the different application environment technology alternatives (e.g. J2EE) towards standardizing how the different platforms can interoperate (e.g. WS-*). I expect the same thing for Cloud Frameworks, especially as they grow out of stages 1 and 2 and embrace what we call today PaaS. At which point the two battlefields (Application Servers and Cloud Frameworks) will merge. And when this happens, I just can’t picture how one stack/framework will suffice for all. So we’ll have to define meaningful integration between them and make them work.

If you’re a spectator, grab plenty of popcorn. If you’re a soldier in this battle, get ready for a long campaign.

02
Apr
2010

Enterprise application integration patterns for IT management: a blast from the past or from the future?

by William (@vambenepe on Twitter)

In a recent blog post, Don Ferguson (CTO at CA) describes CA Catalyst, a major architectural overhaul which “applies enterprise application integration patterns to the problem of integrating IT management systems”. Reading this was fascinating to me. Not because the content was some kind of revelation, but exactly for the opposite reason. Because it is so familiar.

For the better part of the last decade, I tried to build just this at HP. In the process, I worked with (and sometimes against) Don’s colleague at IBM, who were on the same mission. Both companies wanted a flexible and reliable integration platform for all aspects of IT management. We had decided to use Web services and SOA to achieve it. The Web services management protocols that I worked on (WSMF, WSDM, WS-Management and the “reconciliation stack”) were meant for this. We were after management integration more than manageability. Then came CMDBf, another piece of the puzzle. From what I could tell, the focus on SOA and Web services had made Don (who was then Mr. WebSphere) the spiritual father of this effort at IBM, even though he wasn’t at the time focused on IT management.

As far as I know, neither IBM nor HP got there. I covered some of the reasons in this post-mortem. The standards bickering. The focus on protocols rather than models. The confusion between the CMDB as a tool for process/service management versus a tool for software integration. Within HP, the turmoil from the many software acquisitions didn’t help, and there were other reasons. I am not sure at this point whether either company is still aiming for this vision or if they are taking a different approach.

But apparently CA is still on this path, and got somewhere. At least according to Don’s post. I have no insight into what was built beyond what’s in the post. I am not endorsing CA Catalyst, just agreeing with the design goals listed by Don. If indeed they have built it, and the integration framework resists the test of time, that’s impressive. And exciting. It apparently even uses some the same pieces we were planning to use, namely WS-Management and CMDBf (I am reluctantly associated with the first and proudly with the second).

While most readers might not share my historical connection with this work, this is still relevant and important to anyone who cares about IT management in the enterprise. If you’re planning to be at CA World, go listen to Don. Web services may have a bad name, but the technical problems of IT management integration remain. There are only a few routes to IT management automation (I count seven, the one taken by CA is #2). You can throw away SOAP if you want, you still need to deal with protocol compatibility, model alignment and instance reconciliation. You need to centralize or orchestrate the management operations performed. You need to be able to integrate with complementary products or at the very least to effectively incorporate your acquisitions. It’s hard stuff.

Bonus point to Don for not forcing a “Cloud” angle for extra sparkle. This is core IT management.

19
Mar
2010

“Freeing SaaS from Cloud”: slides and notes from Cloud Connect keynote

by William (@vambenepe on Twitter)

I got invited to give a short keynote presentation during the Cloud Connect conference this week at the Santa Clara Convention Center (thanks Shlomo and Alistair). Here are the slides (as PPT and PDF). They are visual support for my bad jokes rather than a medium for the actual message. So here is an annotated version.

I used this first slide (a compilation of representations of the 3-layer Cloud stack) to poke some fun at this ubiquitous model of the Cloud architecture. Like all models, it’s neither true nor false. It’s just more or less useful to tackle a given task. While this 3-layer stack can be relevant in the context of discussing economic aspects of Cloud Computing (e.g. Opex vs. Capex in an on-demand world), it is useless and even misleading in the context of some more actionable topics for SaaS: chiefly, how you deliver such services, how you consume them and how you manage them.

In those contexts, you shouldn’t let yourself get too distracted by the “aaS” aspect of SaaS and focus on what it really is.

Which is… a web application (by which I include both HTML access for humans and programmatic access via APIs.). To illustrate this point, I summarized the content of this blog entry. No need to repeat it here. The bottom line is that any distinction between SaaS and POWA (Plain Old Web Applications) is at worst arbitrary and at best concerned with the business relationship between the provider and the consumer rather than  technical aspects of the application.

Which means that for most technical aspect of how SaaS is delivered, consumed and managed, what you should care about is that you are dealing with a Web application, not a Cloud service. To illustrate this, I put up the…

… guillotine slide. Which is probably the only thing people will remember from the presentation, based on the ample feedback I got about it. It probably didn’t hurt that I also made fun of my country of origin (you can never go wrong making fun of France), saying that the guillotine was our preferred way of solving any problem and also the last reliable piece of technology invented in France (no customer has ever come back to complain). Plus, enough people in the audience seemed to share my lassitude with the 3-layer Cloud stack to find its beheading cathartic.

Come to think about it, there are more similarities. The guillotine is to the axe what Cloud Computing is to traditional IT. So I may use it again in Cloud presentations.

Of course this beheading is a bit excessive. There are some aspects for which the IaaS/PaaS/SaaS continuum makes sense, e.g. around security and compliance. In areas related to multi-tenancy and the delegation of control to a third party, etc. To the extent that these concerns can be addressed consistently across the Cloud stack they should be.

But focusing on these “Cloud” aspects of SaaS is missing the forest for the tree.

A large part of the Cloud value proposition is increased flexibility. At the infrastructure level, being able to provision a server in minutes rather than days or weeks, being able to turn one off and instantly stop paying for it, are huge gains in flexibility. It doesn’t work quite that way at the application level. You rarely have 500 new employees joining overnight who need to have their email and CRM accounts provisioned. This is not to minimize the difficulties of deploying and scaling individual applications (any improvement is welcome on this). But those difficulties are not what is crippling the ability of IT to respond to business needs.

Rather, at the application level, the true measure of flexibility is the control you maintain on your business processes and their orchestration across applications. How empowered (or scared) you are to change them (either because you want to, e.g. entering a new business, or because you have to, e.g. a new law). How well your enterprise architecture has been defined and implemented. How much visibility you have into the transactions going through your business applications.

It’s about dealing with composite applications, whether or not its components are on-premise or “in the Cloud”. Even applications like Salesforce.com see a large number of invocations from their APIs rather than their HTML front-end. Which means that there are some business applications calling them (either other SaaS, custom applications or packaged applications with an integration to Salesforce). Which means that the actual business transactions go across a composite system and have to be managed as such, independently of the deployment model of each participating application.

[Side note: One joke that fell completely flat was that it was unlikely that the invocations of Salesforce  through the Web services APIs be the works of sales people telneting to port 80 and typing HTTP and SOAP headers. Maybe I spoke too quickly (as I often do), or the audience was less nerdy than I expected (though I recognized some high-ranking members of the nerd aristocracy). Or maybe they all got it but didn't laugh because I forgot to take encryption into account?]

At this point I launched into a very short summary of the benefits of SOA governance/management, real user experience monitoring, BTM and application-centric IT management in general. Which is very succinctly summarized on the slide by the “SOA” label on the receiving bucket. I would have needed another 10 minutes to do this subject justice. Maybe at Cloud Connect 2011? Alistair?

This picture of me giving the presentation at Cloud Connect is the work of Alex Dunne.

The guillotine picture is the work of Rusty Boxcars who didn’t just take the photo but apparently built the model (yes it’s a small-size model, look closely). Here it is in its original, unedited, glory. My edited version is available under the same CC license to anyone who wants to grab it.

18
Mar
2010

Standards Disconnect at Cloud Connect

by William (@vambenepe on Twitter)

Yesterday’s panel session on the future of Cloud standards at Cloud Connect is still resonating on Twitter tonight. Many were shocked by how acrimonious the debate turned. It didn’t have to be that way but I am not surprised that it was.

The debate was set up and moderated by Bob Marcus (ET-Strategies CTO and master standards coordinator). On stage were Krishna Sankar (Cisco and DMTF Cloud incubator), Archie Reed (HP and CSA), Winston Bumpus (VMWare and DMTF), a gentleman whose name I unfortunately forgot (and who isn’t listed on the program) and me.

If the goal was to glamorize Cloud standards, it was a complete failure. If the goal was to come out with some solutions and agreements, it was also a failure. But if the goal, as I believe, was to surface the current issues, complexities, emotions and misunderstandings surrounding Cloud standards, then I’d say it was a success.

I am not going to attempt to summarize the whole discussion. Charles Babcock, who was in the audience, does a good enough job in this InformationWeek article and, unlike me, he doesn’t have a horse in the race [side note: I am not sure why my country of origin is relevant to his article, but my guess is that this is the main thing he remembered from my presentation during the Cloud Connect keynote earlier that morning, thanks to the "guillotine" slide].

Instead of reporting on what happened during the standards discussion, I’ll just make one comment and provide one take-away.

The comment: the dangers of marketing standards

Early in the session, audience member Reuven Cohen complained that standards organizations don’t do enough to market their specifications. Winston was more than happy to address this and talk about all the marketing work that DMTF does, including trade shows and PR. He added that this is one of the reasons why DMTF needs to charge membership fees, to pay for this marketing. I agree with Winston at one level. Indeed, the DMTF does what he describes and puts a fair amount of efforts into marketing itself and its work. But I disagree with Reuven and Winston that this is a good thing.

First it doesn’t really help. I don’t think that distributing pens and tee-shirts to IT admins and CIO-wannabes results in higher adoption of your standard. Because the end users don’t really care what standard is used. They just want a standard. Whether it comes from DMTF, SNIA, OGF, or OASIS is the least of their concerns. Those that you have to convince to adopt your standard are the vendors and the service providers. The Amazon, Rackspace and GoGrid of the world. The Microsoft, Oracle, VMWare and smaller ones like… Enomaly (Reuven’s company). The highly-specialized consultants who work with them, like Randy. And also, very importantly, the open source developers who provide all the Cloud libraries and frameworks that are the lifeblood of many deployments. I have enough faith left in human nature to assume that all these guys make their strategic standards decisions on a bit more context than exhibit hall loot and press releases. Well, at least we do where I work.

But this traditional approach to marketing is worse than not helping. It’s actually actively harmful, for two reasons. The first is that the cost of these activities, as Winston acknowledges, creates a barrier for participation by requiring higher dues. To Winston it’s an unfortunate side effect, to me it’s a killer. Not necessarily because dropping the membership fee by 50% would bring that many more participants. But because the organizations become so dependent on dues that they are paranoid about making anything public for fear of lowering the incentive for members to keep paying. Which is the worst thing you can do if you want the experts and open source developers, who are the best chance Cloud standards have to not repeat the mistakes of the past, to engage with the standard. Not necessarily as members of the group, also from the outside. Assuming the work happens in public, which is the key issue.

The other reason why it’s harmful to have a standards organization involved in such traditional marketing is that it has a tendency to become a conduit for promoting the agenda of the board members. Promoting a given standard or organization sounds good, until you realize that it’s rarely so pure and unbiased. The trade shows in which the organization participates are often vendor-specific (e.g. Microsoft Management Summit, VMWorld…). The announcements are timed to coincide with relevant corporate announcements. The press releases contain quotes from board members who promote themselves at the same time as the organization. Officers speaking to the press on behalf of the standards organization are often also identified by their position in their company. Etc. The more a standards organization is involved in marketing, the more its low-level members are effectively subsiding the marketing efforts of the board members. Standards have enough inherent conflicts of interest to not add more opportunities.

Just to be clear, that issue of standards marketing is not what consumed most of the time during the session. But it came up and I since I didn’t get a chance to express my view on this while on the panel, I used this blog instead.

My take-away from the panel, on the other hand, is focused on the heart of the discussion that took place.

The take away: confirmation that we are going too fast, too early

Based on this discussion and other experiences, my current feeling on Cloud standards is that it is too early. If you think the practical experience we have today in Cloud Computing corresponds to what the practice of Cloud Computing will be in 10 years, then please go ahead and standardize. But let me tell you that you’re a fool.

The portion of Cloud Computing in which we have some significant experience (get a VM, attach a volume, assign an IP) will still be relevant in 10 years, but it will be a small fraction of Cloud Computing. I can tell you that much even if I can’t tell you what the whole will be. I have my ideas about what the whole will look like but it’s just a guess. Anybody who pretends to know is fooling you, themselves, or both.

I understand the pain of customers today who just want to have a bit more flexibility and portability within the limited scope of the VM/Volume/IP offering. If we really want to do a standard today, fine. Let’s do a very small and pragmatic standard that addresses this. Just a subset of the EC2 API. Don’t attempt to standardize the virtual disk format. Don’t worry about application-level features inside the VM. Don’t sweat the REST or SOA purity aspects of the interface too much either. Don’t stress about scalability of the management API and batching of actions. Just make it simple and provide a reference implementation. A few HTTP messages to provision, attach, update and delete VMs, volumes and IPs. That would be fine. Anything else (and more is indeed needed) would be vendor extensions for now.

Unfortunately, neither of these (waiting, or a limited first standard) is going to happen.

Saying “it’s too early” in the standards world is the same as saying nothing. It puts you out of the game and has no other effect. Amazon, the clear leader in the space, has taken just this position. How has this been understood? Simply as “well I guess we’ll do it without them”. It’s sad, but all it takes is one significant (but not necessarily leader) company trying to capitalize on some market influence to force the standards train to leave the station. And it’s a hard decision for others to not engage the pursuit at that point. In the same way that it only takes one bellicose country among pacifists to start a war.

Prepare yourself for some collateral damages.

While I would prefer for this not to proceed now (not speaking for my employer on this blog, remember), it doesn’t mean that one should necessarily stay on the sidelines rather than make lemonade out of lemons. But having opened the Cloud Connect panel session with somewhat of a mea-culpa (at least for my portion of responsibility) with regards to the failures of the previous IT management standardization wave, it doesn’t make me too happy to see the seeds of another collective mea-culpa, when we’ve made a mess of Cloud standards too. It’s not a given yet. Just a very high risk. As was made clear yesterday.

19
Feb
2010

HP has submitted a specification to the DMTF Cloud incubator

by William (@vambenepe on Twitter)

When I lamented, in a previous post, that I couldn’t tell you about recent submissions to the DMTF Cloud incubator, one of those I had in mind was a submission from HP. I can now write this, because the author of the specification, Nigel Cook, has recently blogged about it. Unfortunately he is isn’t publishing the specification itself, just an announcement that it was submitted. Hopefully he is currently going through the long approval process to make the submitted document public (been there, done that, I know it takes time).

In the blog, Nigel makes a good argument for the need to go beyond a hypervisor-centric view of Cloud computing. Even at the IaaS layer there are cases of automated-but-not-virtualized deployment that have all the characteristics of Cloud computing and need to be supported by Cloud management APIs. Not to mention OS-level isolation like Solaris Containers.

Nigel also offers a spirited defense of SOAP-based protocols. I don’t necessarily agree with all his points (“one could easily map the web service definition I described to REST if that was important” suggests a “it’s just SOAP without the wrapper” view of REST), but I am glad he is launching this debate. We need to discuss this rather than assume that REST is the obvious answer. Remember, a few years ago SOAP was just as obvious an answer to any protocol question. It may well be that indeed REST comes out ahead of this discussion, but the process will force us to be explicit about what benefits of REST we are trying to achieve and will allow us to be practical in the way we approach it.

17
Feb
2010

Waiting for events (in Cloud APIs)

by William (@vambenepe on Twitter)

Events/alerts/notifications have been a central concept in IT management at least since the first SNMP trap was emitted, and probably even long before that. And yet they are curiously absent from all the Cloud management APIs/protocols. If you think that’s because “THE CLOUD CHANGES EVERYTHING” then you may have to think again. Over the last few days, two of the most experienced practitioners of Cloud computing pointed out that this omission is a real pain in the neck. RightScale’s Thorsten von Eicken was first to request “an event based interface instead of a request-reply based interface”, pointing out that “we run a good number of machines that do nothing but chew up 100% cpu polling EC2 to detect changes”. George Reese seconded and started to sketch a solution. And while these blog posts gave the issue increased visibility recently, it has been a recurring topic on the AWS Forum and other similar discussion boards for quite some time. For example, in this thread going back to 2006, an Amazon employee wrote that “this is a feature we’ve discussed recently and we’re looking at options” (incidentally, I see a post by Thorsten in that old thread). We’re still waiting.

Let’s look at what it would take to define such a feature.

I have some experience with events for IT management, having been involved in the WS-Notification family of specifications and having co-chaired the OASIS technical committee that standardized them. This post is not about foisting WS-Notification on Cloud APIs, but just about surfacing some of the questions that come up when you try to standardize such a mechanism. While the main use cases for WS-Notification came from IT (and Grid) management, it was supposed to be a generic mechanism. A Cloud-centric eventing protocol can be made simpler by focusing on fewer use cases (Cloud scenarios only). In addition, WS-Notification was marred by the complexity-is-a-sign-of-greatness spirit of the time . On this too, a Cloud eventing protocol could improve things by keeping IBM at bay simplicity in mind.

Types of event

When you pull the state of a resource to see if anything changed,  you don’t have to tell the provider what kind of change you are interested in. If, on the other hand, you want the provider to notify you, then they need to know what you care about. You may not want to be notified on every single change in the resource state. How do you describe the changes you care about? Is there an agreed-upon set of states for the resource and you are only notified on state transitions? Can you indicate the minimum severity level for an event to be emitted? Who determines the severity of an event? Or do you get to specify what fields in the resource state you want to watch? What about numeric values for which you may not want to be notified of every change but only when a threshold is crossed? Do you get to specify a query and get notified whenever the query result changes? In WS-Notification some of this is handled by WS-Topics which I still like conceptually (I co-edited it) but is too complex for the task at hand.

Event formats

What format are the events serialized in? How is the even metadata captured (e.g. time stamp of observation, which may not be the same as the time at which the notification message was sent)? If the event payload is a representation of the new state of the resource, does it indicate what field changes (and what the old value was)? How do you keep event payloads consistent with the resource representation in the request/response interactions? If many events occur near the same time, can you group them in one notification message for better scalability?

Subscription creation

Presumably you need a subscription mechanism. Is the subscription set in stone when the resource is created? Or can you come later and subscribe? If subscription is an operation on the resource itself, how do you subscribe for events on something that doesn’t exist yet (e.g. “create a VM and notify me once it’s started”)? Do you get to set subscriptions on a per-resource-basis? Or is this a global setting for all the resources that you own? Can you have two different subscriptions on the same resource (e.g. a “critical events only” subscription that exist throughout the life of the resource, plus a “lots of events please” subscription that you keep for a few hours while troubleshooting)?

Subscription management

Do you get to come back and update/pause/delete a subscription? Do you get to change what filter the subscription carries? Or is it set in stone until the subscription expires? Can you change the delivery endpoint? What if events fail to be delivered? Does the provider cancel your subscription? After how many failures? Does it just pause it for a few hours? Keep trying?

Subscription expiration

Who sets the expiration period? The subscriber? Can the provider set a max duration? Do you get a warning message before the subscription expires? Can you renew a subscription or do you have to create a new one? Do you get a message telling you that it has expired? Where are these subscription-lifecycle messages sent? To the same endpoint as the regular messages? What if your subscription is being killed because your deliver endpoint is down, clearly it makes no sense to send the warning message to that same endpoint. Do you provide a separate “subscription management” endpoint (different from the event delivery endpoint) when you subscribe? Alternatively, does an email message get sent to the registered user who set the subscription?

Delivery reliability

How reliable do you want the notifications to be? Should the emitter retry until they’ve received a confirmation? How long do they keep messages that can’t be delivered? Some may have a very short shelf life while others are still useful weeks later. If you don’t have a reliable mechanism but you really “need to know about a lost server within a minute of it disappearing” (the example Georges gives) then in reality you may still have to poll just to make sure that an event wasn’t lost. If you haven’t received an event in a while, how can you test if the subscription is still working? Should subscriptions send a heartbeat message once a while?

Delivery mechanism

How do you deliver notifications? Do you keep HTTP connections open through tricks similar to how self-updating web pages work (e.g. COMET, long polling and soon WebSockets)? Or do you just provide a listener endpoint to which the notifier tries to connect (which, in the case of public cloud deployments, means you need to have a publicly-addressable listener, but hopefully not on the same Cloud infrastructure). Do you use XMPP? AMQP? Email? Can I have you hold my events and let me come pull them?

Security

Do you need to verify the origin of the events you receive? Or do you assume they may be forged and always initiate a connection to the provider to double-check? And on the other side, what are the security requirements for event delivery? If a user looses some of their privileges, do you have to go and cancel the still-active subscriptions that they created?

Throttling

Is there a maximum event rate? Do you get charged for the events the Cloud provider sends you? How do you make sure that someone doesn’t create a subscription pointing to the wrong endpoint (either erroneously or maliciously, e.g. DoS). Do you send a test message at registration asking the delivery endpoint to acknowledge that they indeed want to receive these notifications?

Conclusion

My goal is not to argue that we cannot have a simple yet good enough notification system or to scare anyone from attempting to define it. It’s just to show that it’s not as simple as it may seem at first blush. But there probably is a sweetspot and people like Thorsten and George are very well qualified to find it.

[UPDATED 2010/4/7: Amazon releases AWS Simple notification Service. Not just as an eventing feature for the Cloud API, as a generic notification service. Which can, of course, also carry Cloud management events. Though at this point you're on your own to publish them from your instances, it doesn't look like the AWS infrastructure can do it for you. Which means, for example, that you're not going to be able to publish an event for a sudden crash.]

Categories