Category Archives: Protocols

REST + RDF, finally a practical solution?

The W3C has recently approved the creation of the Linked Data Platform (LDP) Working Group. The charter contains its official marching orders. Its co-chair Erik Wilde shared his thoughts on the endeavor.

This is good. Back in 2009, I concluded a series of three blog posts on “REST in practice for IT and Cloud management” with:

I hereby conclude my “REST in practice for IT and Cloud management” series, with the intent to eventually start a “Linked Data in practice for IT and Cloud management” series.

I never wrote that later part, because my work took me away from that pursuit and there wasn’t much point writing down ideas which I hadn’t  put to the test. But if this W3C working group is successful, they will give us just that.

That’s a big “if” though. Religious debates and paralyzing disconnects between theorists and practitioners are all-too-common in tech, but REST and Semantic Web (of which RDF is the foundation) are especially vulnerable. Bringing these two together and trying to tame both sets of daemons at the same time is a daring proposition.

On the other hand, there is already a fair amount of relevant real-life experience (e.g. – read Jeni Tennison on the choice of Linked Data). Plus, Erik is a great pick to lead this effort (I haven’t met his co-chair, IBM’s Arnaud Le Hors). And maybe REST and RDF have reached the mythical point where even the trolls are tired and practicality can prevail. One can always dream.

Here are a few different ways to think about this work:

The “REST doesn’t provide interoperability” perspective

RESTful API authors too often think they can make the economy of a metamodel. Or that a format (like XML or JSON) can be used as a metamodel. Or they punt the problem towards defining a multitude of MIME types. This will never buy you interoperability. Stu explained it before. Most problems I see addressed via RESTful APIs, in the IT/Cloud management realm, are modeling problems first and only secondarily protocol/interaction problems. And their failures are failures of modeling. LDP should bring modeling discipline to REST-land.

The “RDF was too much, too soon” perspective

The RDF stack is mired in complexity. By the time people outside of academia had formed a set of modeling requirements that cried for RDF, the Semantic Web community was already deep in the weeds and had overloaded its basic metamodel with enough classification and inference technology to bury its core value as a simple graph-oriented and web-friendly metamodel. What XSD-fever was to making XML seem overly complex, OWL-fever was to RDF. Tenfold.

Everything that the LDP working group is trying to achieve can be achieved today with existing Semantic Web technologies. Technically speaking, no new work is needed. But only a handful of people understand these technologies enough to know what to use and what to ignore, and as such this application doesn’t have a chance to materialize. Which is why the LDP group is needed. But there’s a reason why its starting point document is called a “profile”. No new technology is needed. Only clarity and agreement.

For the record, I like OWL. It may be the technology that most influenced the way I think about modeling. But the predominance of RDFS and OWL (along with ugly serializations) in Semantic Web discussions kept RDF safely out of sight of those in industry who could have used it. Who knows what would have happened if a graph query language (SPARQL) had been prioritized ahead of inference technology (OWL)?

The Cloud API perspective

The scope of the LDP group is much larger than Cloud APIs, but my interest in it is mostly grounded in Cloud API use cases. And I see no reason why the requirements of Cloud APIs would not be 100% relevant to this effort.

What does this mean for the Cloud API debate? Nothing in the short term, but if this group succeeds, the result will probably be the best technical foundation for large parts of the Cloud management landscape. Which doesn’t mean it will be adopted, of course. The LDP timeline calls for completion in 2014. Who knows what the actual end date will be and what the Cloud API situation will be at that point. AWS APIs might be entrenched de-facto standards, or people may be accustomed to using several APIs (via libraries that abstract them away). Or maybe the industry will be clamoring for reunification and LDP will arrive just on time to underpin it. Though the track record is not good for such “reunifications”.

The “ghost of WS-*” perspective

Look at the 16 “technical issues” in the LCD working group charter. I can map each one to the relevant WS-* specification. E.g. see this as it relates to #8. As I’ve argued many times on this blog, the problems that WSMF/WSDM/WS-Mgmt/WS-RA and friends addressed didn’t go away with the demise of these specifications. Here is yet another attempt to tackle them.

The standards politics perspective

Another “fun” part of WS-*, beyond the joy of wrangling with XSD and dealing with multiple versions of foundational specifications, was the politics. Which mostly articulated around IBM and Microsoft. Well, guess what the primary competition to LDP is? OData, from Microsoft. I don’t know what the dynamics will be this time around, Microsoft and IBM alone don’t command nearly as much influence over the Cloud infrastructure landscape as they did over the XML middleware standardization effort.

And that’s just the corporate politics. The politics between standards organizations (and those who make their living in them) can be just as hairy; you can expect that DMTF will fight W3C, and any other organization which steps up, for control of the “Cloud management” stack. Not to mention the usual coo-petition between de facto and de jure organizations.

The “I told you so” perspective

When CMDBf started, I lobbied hard to base it on RDF. I explained that you could use it as just a graph-based metamodel, that you could  ignore the ontology and inference part of the stack. Which is pretty much what LDP is doing today. But I failed to convince the group, so we created our own metamodel (at least we explicitly defined one) and our own graph query language and that became CMDBf v1. Of course it was also SOAP-based.

KISS and markup

In closing, I’ll just make a plea for practicality to drive this effort. It’s OK to break REST orthodoxy. And not everything needs to be in RDF either. An overarching graph model is needed, but the detailed description of the nodes can very well remain in JSON, XML, or whatever format does the job for that node type.

All the best to LDP.


Filed under API, Cloud Computing, CMDB Federation, CMDBf, DMTF, Everything, Graph query, IBM, Linked Data, Microsoft, Modeling, Protocols, Query, RDF, REST, Semantic tech, SPARQL, Specs, Standards, Utility computing, W3C

The war on RSS

If the lords of the Internet have their way, the days of RSS are numbered.


John Gruber was right, when pointing to Dan Frakes’ review of the Mail app in Mountain Lion, to highlight the fact that the application drops support for RSS (he calls it an “interesting omission”, which is both correct and understated). It is indeed the most interesting aspect of the review, even though it’s buried at the bottom of the article; Along with the mention that RSS support appears to also be removed from Safari.

[side note: here is the correct link for the Safari information; Dan Frakes’ article mistakenly points to a staging server only available to MacWorld employees.]

It’s not just John Gruber and I who think that’s significant. The disappearance of RSS is pretty much the topic of every comment on the two MacWorld articles (for Mail and Safari). That’s heartening. It’s going to take a lot of agitation to reverse the trend for RSS.

The Mountain Lion setback, assuming it’s not reversed before the OS ships, is just the last of many blows to RSS.


Every twitter profile used to exhibit an RSS icon with the URL of a feed containing the user’s tweets. It’s gone. Don’t assume that’s just the result of a minimalist design because (a) the design is not minimalist and (b) the feed URL is also gone from the page metadata.

The RSS feeds still exist (mine is but to find them you have to know the userid of the user. In other words, knowing that my twitter username is @vambenepe is not sufficient, you have to know that the userid for @vambenepe is 18518601. Which is not something that you can find on my profile page. Unless, that is, you are willing to wade through the HTML source and look for this element:

<div data-user-id="18518601" data-screen-name="vambenepe">

If you know the Twitter API you can retrieve the RSS URL that way, but neither that nor the HTML source method is usable for most people.

That’s too bad. Before I signed up for Twitter, I simply subscribed to the RSS feeds of a few Twitter users. It got me hooked. Obviously, Twitter doesn’t see much value in this anymore. I suspect that they may even see a negative value, a leak in their monetization strategy.

[Updated on 2013/3/1: Unsurprisingly, Twitter is pulling the plug on RSS/Atom entirely.]


It used to be that if any page advertised an RSS feed in its metadata, Firefox would show an RSS icon in the address bar to call your attention to it and let you subscribe in your favorite newsreader. At some point, between Firefox 3 and Firefox 10, this disappeared. Now, you have to launch the “view page info” pop-up and click on “feeds” to see them listed. Or look for “subscribe to this page” in the “bookmarks” menu. Neither is hard, but the discoverability of the feeds is diminished. That’s especially unfortunate in the case of sites that don’t look like blogs but go the extra mile of offering relevant feeds. It makes discovering these harder.


Google has done a lot for RSS, but as a result it has put itself in position to kill it, either accidentally or on purpose. Google Reader is a nice tool, but, just like there has not been any new webmail after GMail, there hasn’t been any new hosted feed reader after Google Reader.

If Google closed GMail (or removed email support from it), email would survive as a communication mechanism (removing email from GMail is hard to imagine today, but keep in mind that Google’s survival doesn’t require GMail but they appear to consider it a matter of life or death for Google+ to succeed). If, on the other hand, Google closed Reader, would RSS survive? Doubtful. And Google has already tweaked Reader to benefit Google+. Not, for now, in a way that harms its RSS support. But whatever Google+ needs from Reader, Google+ will get.

[Updated 2013/3/13: Adios Google Reader. But I’m now a Google employee and won’t comment further.]

As far as the Chrome browser is concerned, I can’t find a way to have it acknowledge the presence of feeds in a page at all. Unlike Firefox, not even “view page info” shows them; It appears that the only way is to look for the feed URLs in the HTML source.


I don’t use Facebook, but for the benefit of this blog post I did some actual research and logged into my account there. I looked for a feed on a friend’s page. None in sight. Unlike Twitter, who started with a very open philosophy, I’m guessing Facebook never supported feeds so it’s probably not a regression in their case. Just a confirmation that no help should be expected from that side.

[update: in fact, Facebook used to offer RSS and killed it too.]

Not looking good for RSS

The good news is that there’s at least one thing that Facebook, Apple, Twitter and (to a lesser extent so far) Google seem to agree on. The bad news is that it’s that RSS, one of the beacons of openness on the internet, is the enemy.

[side note: The RSS/Atom question is irrelevant in this context and I purposedly didn’t mention Atom to not confuse things. If anyone who’s shunning RSS tells you that if it wasn’t for the RSS/Atom confusion they’d be happy to use a standard syndication format, they’re pulling your leg; same thing if they say that syndication is “too hard for users”.]


Filed under Apple, Big picture, Everything, Facebook, Google, Protocols, Social networks, Specs, Standards, Twitter

DMTF publishes draft of Cloud API

Note to anyone who still cares about IaaS standards: the DMTF has published a work in progress.

There was a lot of interest in the topic in 2009 and 2010. Some heated debates took place during Cloud conferences and a few symposiums were organized to try to coordinate various standard efforts. The DMTF started an “incubator” on the topic. Many companies brought submissions to the table, in various levels of maturity: VMware, Fujitsu, HP, Telefonica, Oracle and RedHat. IBM and Microsoft might also have submitted something, I can’t remember for sure.

The DMTF has been chugging along. The incubator turned into a working group. Unfortunately (but unsurprisingly), it limited itself to the usual suspects (and not all the independent Cloud experts out there) and kept the process confidential. But this week it partially lifted the curtain by publishing two work-in-progress documents.

They can be found at but if you read this after March 2012 they won’t be there anymore, as DMTF likes to “expire” its work-in-progress documents. The two docs are:

The first one is the interesting one, and the one you should read if you want to see where the DMTF is going. It’s a RESTful specification (at the cost of some contortions, e.g. section It supports both JSON and XML (bad idea). It plans to use RelaxNG instead of XSD (good idea). And also CIM/MOF (not a joke, see the second document for proof). The specification is pretty ambitious (it covers not just lifecycle operations but also monitoring and events) and well written, especially for a work in progress (props to Gil Pilz).

I am surprised by how little reaction there has been to this publication considering how hotly debated the topic used to be. Why is that?

A cynic would attribute this to people having given up on DMTF providing a Cloud API that has any chance of wide adoption (the adjoining CIM document sure won’t help reassure DMTF skeptics).

To the contrary, an optimist will see this low-key publication as a sign that the passions have cooled, that the trusted providers of enterprise software are sitting at the same table and forging consensus, and that the industry is happy to defer to them.

More likely, I think people have, by now, enough Cloud experience to understand that standardizing IaaS APIs is a minor part of the problem of interoperability (not to mention the even harder goal of portability). The serialization and plumbing aspects don’t matter much, and if they do to you then there are some good libraries that provide mappings for your favorite language. What matters is the diversity of resources and services exposed by Cloud providers. Those choices strongly shape the design of your application, much more than the choice between JSON and XML for the control API. And nobody is, at the moment, in position to standardize these services.

So congrats to the DMTF Cloud Working Group for the milestone, and please get the API finalized. Hopefully it will at least achieve the goal of narrowing down the plumbing choices to three (AWS, OpenStack and DMTF). But that’s not going to solve the hard problem.


Filed under API, Application Mgmt, Automation, Cloud Computing, DMTF, Everything, IaaS, IT Systems Mgmt, Manageability, Mgmt integration, Modeling, Portability, Protocols, REST, Specs, Standards, Tech, Utility computing, Virtual appliance, Virtualization

AJAX+REST as the latest architectural mirage

If the Web wasn’t tragically amnesic, I could show you 15-year old articles explaining how XSLT was about to revolutionize Web applications. In that vision, your Web server would return an XML file with all the data; and alongside that XML, an XSLT (which describes how to transform the XML into HTML). The browser would run it on the XML data and display the resulting HTML. Voila! This was going to bring all kinds of benefits over the old server-spits-out-HTML model. The XML could be easily consumed by other applications (not just humans) and different XSLTs could be used to adapt to the various client platforms.

This hasn’t panned out. At least not in that form. Enters AJAX. The XML doc is still there, though it usually wants to be called JSON. The XSLT is now a big pile of JavaScript. That model has many advantages over the XSLT model, the first one being that you don’t have to use XSLT (and I’m talking as someone who actually enjoys XPath). It’s a lot more flexible, you can do small updates and partial page refresh, etc. But does it also maintain the architectural cleanliness of having a data API separated from the rendering logic?

Sometimes. Lori MacVittie describes that model. That’s how the cool kids do it and they make sure to repeat in every sentence that their Web app uses the same API as 3rd party apps. The Twitter web app, for example, is in this category, as Mike Loukides describes. As is Apache Orion (the diagram below comes from the Orion architecture)

That’s one model, and it is conceptually very elegant. One API, many consumers. The Web site of the service provider is just another consumer. Easy versioning. An application management dream (one API to manage, a well-defined set of operations and flows to test, trace and diagnose). From a security perspective, it offers the smallest possible attack surface. Easy interoperability between different applications consuming the same API. All goodies.

And not just theoretical goodies, there are situations where it is the right model.

And yet I am still dubious that it’s going to be the dominant model. Clients of the same service support different interaction models and it’s hard for a single API to work well for all without sprawling out of control (to the point where calling it “one API” becomes a fig leaf). But if you want to keep the API surface small, you might end up with chatty apps. Not to mention the temptation for service providers to give their software special access over those of their partners/competitors (e.g. other Twitter clients).

Take Google+. As of this writing, the web site is up and obviously very AJAX-driven. And yet the API is not available. There maybe non-technical reasons for it, but if the Google+ web site was just another consumer of the API then wouldn’t, by definition, the API already be up?

The decision of whether to expose the interface consumed by your AJAX app as an open API also has ramifications in the implementation strategy. Such an approach pretty much rules out using frameworks that integrate server-side and browser-side development and pushes you towards writing them separately (and thus controlling all the details of how they interact so that you can make sure it happens in a way that’s consumable by 3rd parties). Though the reverse is not true. You may decide that you don’t want that API exposed to 3rd parties and yet still manually define it and keep your server-side and browser-side code at arms length.

If you decide to go the “one REST API for all” route and forgo frameworks that integrate browser code and server code, how much are you leaving on the table? After all, preeminent developers love to sneer at such frameworks. But that’s a short-sighted view.

Some tennis players think of their racket as one tool. Others, who own a stringing machine, think of the frame and the string as two tools, that they expertly combine. Similarly, not all Web developers want to think of their client framework and their server framework as two tools. Using them as one, pre-assembled, tool may not provide the most optimal code, but may still be the optimal use of your development resources.

There’s a bit of Ricardian angle to this. Even if you can produce better JavaScript (by “better” I mean better suited to your need) than the framework, you have a higher Comparative Advantage in developing business logic than JavaScript so you should focus your efforts there and “import” the JavaScript from the framework (which is utterly incompetent in creating business logic).

Just like, in Ricardo’s famous example, Portugal is better off importing its cloth from England (and focusing on producing wine) even though it is, in absolute term, more able to produce cloth than England is.

Which contradicts Matt Raible’s statement that “the only people that like Ajax integrated into their web frameworks are programmers scared of JavaScript (who probably shouldn’t be developing your UI).” His characterization is sometimes correct, but not absolute as he asserts.

I wouldn’t write Google+ with ADF, but it provides benefits to large class of applications, especially internal applications. Where you’re willing to give away some design control for the benefit of faster development and better-tested JavaScript.

Then there is the orthogonal question of whether AJAX technologies are well-suited to a RESTful architecture. You may think it’s obvious, since both are natively designed for the Web. But a wine glass and a steering wheel are both natively designed for the human hand; that doesn’t make them a good pair. Here’s one way to plant doubt in your mind: if AJAX was a natural fit for REST, would we need the atrocity known as the hashbang? Would AJAX applications need to be made crawlable? Reuven Cohen asserts that “AJAX is quite possibly the worst way to consume a RESTful API”, but unfortunately he doesn’t develop the demonstration. Maybe a topic for a future post.

“Because that’s the way it’s done now” was a bad reason to transform perfectly-functional XML-RPC into “message-oriented” SOAP. It also is a bad reason for assuming that your Web application needs to be AJAX-on-REST.

I’ll leave the last word to Stefan Tilkov: “Don’t confuse integration architecture with application architecture.” His talk doesn’t focus on how to build Web UIs, but the main lesson applies. Here’s the video and here are the slides (warning: Flash and PDF, respectively, which is sadly ironic for such a good presentation about Web technology).


Filed under API, Application Mgmt, Everything, JavaScript, Middleware, Mobile, Protocols, REST, Tech, Web services

Yoga framework for REST-like partial resource access

A tweet by Stefan Tilkov brought Yoga to my attention, “a framework for supporting REST-like URI requests with field selectors”.

As the name suggests, “Yoga” lets you practice some contortions that would strain a run-of-the-mill REST programmer. Basically, you can use a request like

GET /teams/4234.json?selector=:(members:(id,name,birthday)

to retrieve the id, name and birthday of all members of a softball team, rather than having to retrieve the team roaster and then do a GET on each and every team member to retrieve their name and birthday (and lots of other information you don’t care about).

Where have I seen this before? That use case came up over and over again when we were using SOAP Web services for resource management. I have personally crafted support for it a few times. Using this blog to support my memory, here is the list of SOAP-related management efforts listed in the “post-mortem on the previous IT management revolution”:

WSMF, WS-Manageability, WSDM, OGSI, WSRF, WS-Management, WS-ResourceTransfer, WSRA, WS-ResourceCatalog, CMDBf

Each one of them supports this “partial access” use case: WS-Management has :

WSMF, WS-Manageability, WSDM, OGSI, WSRF, WS-Management, WS-ResourceTransfer, WSRA, WS-ResourceCatalog, CMDBf

Each one of them supports this “partial access” use case: WS-Management has SelectorSet, WSRF has ResourceProperties, CMDBf has ContentSelector, WSRA has Fragments, etc.

Years ago, I also created the XMLFrag SOAP header to attack a more general version of this problem. There may be something to salvage in all this for people willing to break REST orthodoxy (with the full knowledge of what they gain and what they loose).

I’m not being sarcastic when I ask “where have I seen this before”. The problem hasn’t gone away just because we failed to solve it in a pragmatic way with SOAP. If the industry is moving towards HTTP+JSON then we’ll need to solve it again on that ground and it’s no surprise if the solution looks similar.

I have a sense of what’s coming next. XPath-for-JSON-over-the-wire. See, getting individual properties is nice, but sometimes you want more. You want to select only the members of the team who are above 14 years old. Or you just want to count these members rather than retrieve specific information about them individually. Or you just want a list of all the cities they live in. Etc.

But even though we want this, I am not convinced (anymore) that we need it.

What I know we need is better support for graph queries. Kingsley Idehen once provided a good explanation of why that is and how SPARQL and XML query languages (or now JSON query languages) complement one another (wouldn’t that be a nice trifecta: RDF/OWL’s precise modeling, JSON’s friendly syntax and SPARQL’s graph support – but I digress).

Going back to partial resource access, the last feature is the biggie: a fine-grained mechanism to update resource properties. That one is extra-hard.


Filed under API, CMDBf, Everything, Graph query, IT Systems Mgmt, Manageability, Mgmt integration, Modeling, Protocols, Query, REST, SOAP, SOAP header, Specs, Standards, Web services, WS-Management, WS-ResourceCatalog, WS-ResourceTransfer, WS-Transfer, XMLFrag, XPath

Backward-looking API ToS vs. forwarding-looking API usage

From the developer’s perspective, most terms of service (ToS) documents for public APIs contain only constraints and no real permission. That’s because they generally don’t come with any guarantee about the future. At best, they tell you what you were authorized to do yesterday. Since the version you’re looking at may already be outdated, it tells you nothing about what you can do now or tomorrow. That perspective is often ignored when debating a change in ToS. The new version may forbid you from doing something that was not forbidden before, but it’s not actually taking anything away from you. The only thing it would take away from you is the right to do something from now on, but you were never granted such right over that period of time.

For example, the Twitter API ToS:

“The Rules will evolve along with our ecosystem as developers continue to innovate and find new, creative ways to use the Twitter API, so please check back periodically to see the most current version.”

and more specifically

“Twitter may update or modify the Twitter API, Rules, and other terms and conditions, including the Display Guidelines, from time to time its sole discretion by posting the changes on this site or by otherwise notifying you (such notice may be via email). You acknowledge that these updates and modifications may adversely affect how your Service accesses or communicates with the Twitter API. If any change is unacceptable to you, your only recourse is to terminate this agreement by ceasing all use of the Twitter API and Twitter Content. Your continued access or use of the Twitter API or any Twitter Content will constitute binding acceptance of the change.”

Changes in the terms of service can have a major effect on consuming application, going as far as making the whole purpose of the application impossible. For example, any Twitter app is a the mercy of the next corporate strategy change decided at Twitter HQ. That can happen without any change whatsoever to the API from a technical perspective.

If building your business on a proprietary API is sharecropping, then changes in ToS are the “droit de cuissage” (a.k.a “droit du seigneur“) that comes with it. Be ready for it to be exercised at any point.

Even in the cases where the ToS contain a guarantee over time, there is often enough language to allow the provider to wiggle out of it. Case in point, the recent decommission of the Google Translate API.

In the API ToS, Google includes a generous three years deprecation window:

“If Google in its discretion chooses to cease providing the current version of the Service whether through discontinuation of the Service or by upgrading the Service to a newer version, the current version of the Service will be deprecated and become a Deprecated Version of the Service. Google will issue an announcement if the current version of the Service will be deprecated. For a period of 3 years after an announcement (the “Deprecation Period”), Google will use commercially reasonable efforts to continue to operate the Deprecated Version of the Service and to respond to problems with the Deprecated Version of the Service deemed by Google in its discretion to be critical. During the Deprecation Period, no new features will be added to the Deprecated Version of the Service.”

But further down, Google reserves the right to do away with the Deprecation Period for several reasons, among which:

“- providing the Deprecated Version of the Service could create a substantial economic burden on Google as determined by Google in its reasonable good faith judgment; or
– providing the Deprecated Version of the Service could create a security risk or material technical burden upon Google as determined by Google in its reasonable good faith judgment.”

A “substantial economic burden” or a “material technical burden”. I am not a lawyer, but good luck going to court against Google arguing that maintaining the API doesn’t provide any such burden if Google claims it does (if I was a Google lawyer, I’d ask “if there’s no such burden, why don’t you provide it yourself?”).

And that’s exactly what Google is claiming in the case of the Translate API. There’s a reason why the words “substantial economic burden” are included in this statement at the very top of the home page for the Google Translate API.

“Important: The Google Translate API has been officially deprecated as of May 26, 2011. Due to the substantial economic burden caused by extensive abuse, the number of requests you may make per day will be limited and the API will be shut off completely on December 1, 2011. For website translations, we encourage you to use the Google Translate Element.”

[For more information about Google’s translation services and what might really be behind this change, I recommend reading this interesting analysis. Whether or not the author is guessing right, it’s an interesting perspective on how the marriage between Search and Advertising turns into a love triangle (with Translation as the third party) when you take a worldwide perspective.]

Protection against such changes (changes in ToS or the API going away entirely) would be a nice value-add for the companies that offer services around managing API usage. I vaguely remember an earlier announcement about Apigee providing a way to invoke the Twitter API that was sheltered from Twitter’s throttling (presumably via some agreement between Apigee and Twitter). Maybe Apigee can also negotiate long-term bidirectional ToS agreements with API providers on behalf of its customers. If I was building a business on top of a third-party service like Twitter, I’d probably value that kind of guarantee even more than the cool technical tools Apigee provides around API usage.

What’s more important in an API? Whether it’s REST or SOAP or whether you can count on using it for some time in the future?

Comments Off on Backward-looking API ToS vs. forwarding-looking API usage

Filed under API, Business, Everything, Google, Protocols

Comments on “The Good, the Bad, and the Ugly of REST APIs”

A survivor of intimate contact with many Cloud APIs, George Reese shared his thoughts about the experience in a blog post titled “The Good, the Bad, and the Ugly of REST APIs“.

Here are the highlights of his verdict, with some comments.

“Supporting both JSON and XML [is good]”

I disagree: Two versions of a protocol is one too many (the post behind this link doesn’t specifically discuss the JSON/XML dichotomy but its logic applies to that situation, as Tim Bray pointed out in a comment).

“REST is good, SOAP is bad”

Not necessarily true for all integration projects, but in the context of Cloud APIs, I agree. As long as it’s “pragmatic REST”, not the kind that involves silly contortions to please the REST police.

“Meaningful error messages help a lot”

True and yet rarely done properly.

“Providing solid API documentation reduces my need for your help”

Goes without saying (for a good laugh, check out the commenter on George’s blog entry who wrote that “if you document an API, you API immediately ceases to have anything to do with REST” which I want to believe was meant as a joke but appears written in earnest).

“Map your API model to the way your data is consumed, not your data/object model”

Very important. This is a core part of Humble Architecture.

“Using OAuth authentication doesn’t map well for system-to-system interaction”


“Throttling is a terrible thing to do”

I don’t agree with that sweeping statement, but when George expands on this thought what he really seems to mean is more along the lines of “if you’re going to throttle, do it smartly and responsibly”, which I can’t disagree with.

“And while we’re at it, chatty APIs suck”

Yes. And one of the main causes of API chattiness is fear of angering the REST gods by violating the sacred ritual. Either ignore that fear or, if you can’t, hire an expensive REST consultant to rationalize a less-chatty design with some media-type black magic and REST-bless it.

Finally George ends by listing three “ugly” aspects of bad APIs (“returning HTML in your response body”, “failing to realize that a 4xx error means I messed up and a 5xx means you messed up” and “side-effects to 500 errors are evil”) which I agree on but I see those as a continuation of the earlier point about paying attention to the error messages you return (because that’s what the developers who invoke your API will be staring at most of the time, even if they represents only 0.01% of the messages you return).

What’s most interesting is what’s NOT in George’s list. No nit-picking about REST purity. That tells you something about what matters to implementers.

If I haven’t yet exhausted my quota of self-referential links, you can read REST in practice for IT and Cloud management for more on the topic.


Filed under API, Cloud Computing, Everything, Implementation, Manageability, Mgmt integration, Modeling, Protocols, REST, SOAP, Specs, Tech

The API, the whole API and nothing but the API

When programming against a remote service, do you like to be provided with a library (or service stub) or do you prefer “the API, the whole API, nothing but the API”?

A dedicated library (assuming it is compatible with your programming language of choice) is the simplest way to get invocations flowing. On the other hand, if you expect your client to last longer than one night of tinkering then you’re usually well-advised to resist making use of such a library in your code. Save yourself license issues, support issues, packaging issues and lifecycle issues. Also, decide for yourself what the right interaction model with the remote API is for your app.

One of the key motivations of SOAP was to prevent having to get stubs from the service provider. That remains an implicit design goals of the recent HTTP APIs (often called “RESTful”). You should be able to call the API directly from your application. If you use a library, e.g. an authentication library, it’s a third party library, not one provided by the service provider you are trying to connect to.

So are provider-provided (!) libraries always bad? Not necessarily, they can be a good learning/testing tool. Just don’t try to actually embed them in your app. Use them to generate queries on the wire that you can learn from. In that context, a nice feature of these libraries is the ability to write out the exact message that they put on the wire so you don’t have to intercept it yourself (especially if messages are encrypted on the wire). Even better if you can see the library code, but even as a black box they are a pretty useful way to clarify the more obscure parts of the API.

A few closing comments:

– In a way, this usage pattern is similar to a tool like the WLST Recorder in the WebLogic Administration Console. You perform the actions using the familiar environment of the Console, and you get back a set of WLST commands as a starting point for writing your script. When you execute your script, there is no functional dependency on the recorder, it’s a WLST script like any other.

– While we’re talking about downloadable libraries that are primarily used as a learning/testing tool, a test endpoint for the API would be nice too (either as part of the library or as a hosted service at a well-known URL). In the case of most social networks, you can create a dummy account for testing; but some other services can’t be tested in a way that is as harmless and inexpensive.

– This question of provider-supplied libraries is one of the reasons why I lament the use of the term “API” as it is currently prevalent. Call me old-fashioned, but to me the “API” is the programmatic interface (e.g. the Java interface) presented by the library. The on-the-wire contract is, in my world, called a service contract or a protocol. As in, the Twitter protocol, or the Amazon EC2 protocol, etc… But then again, I was also the last one to accept to use the stupid term of “Cloud Computing” instead of “Utility Computing”. Twitter conversations don’t offer the luxury of articulating such reticence so I’ve given up and now use “Cloud Computing” and “API” in the prevalent way.

[UPDATE: How timely! Seconds after publishing this entry I noticed a new trackback on a previous entry on this blog (Cloud APIs are like military parades). The trackback is an article from ProgrammableWeb, asking the exact same question I am addressing here: Should Cloud APIs Focus on Client Libraries More Than Endpoints?]


Filed under API, Automation, Everything, Implementation, Protocols, REST, SOAP

The REST bubble

Just yesterday I was writing about how Cloud APIs are like military parades. To some extent, their REST rigor is a way to enforce implementation discipline. But a large part of it is mostly bling aimed at showing how strong (for an army) or smart (for an API) the people in charge are.

Case in point, APIs that have very simple requirements and yet make a big deal of the fact that they are perfectly RESTful.

Just today, I learned (via the ever-informative InfoQ) about the JBoss SteamCannon project (a PaaS wrapper for Java and Ruby apps that can deploy on different host infrastructures like EC2 and VirtualBox). The project looks very interesting, but the API doc made me shake my head.

The very first thing you read is three paragraphs telling you that the API is fully HATEOS (Hypermedia as the Engine of Application State) compliant (our URLs are opaque, you hear me, opaque!) and an invitation to go read Roy’s famous take-down of these other APIs that unduly call themselves RESTful even though they don’t give HATEOS any love.

So here I am, a developer trying to deploy my WAR file on SteamCannon and that’s the API document I find.

Instead of the REST finger-wagging, can I have a short overview of what functions your API offers? Or maybe an example of a request call and its response?

I don’t mean to pick on SteamCannon specifically, it just happens to be a new Cloud API that I discovered today (all the Cloud API out there also spend too much time telling you how RESTful they are and not enough time showing you how simple they are). But when an API document starts with a REST lesson and when PowerPoint-waving sales reps pitch “RESTful APIs” to executives you know this REST thing has gone way beyond anything related to “the fundamentals”.

We have a REST bubble on our hands.

Again, I am not criticizing REST itself. I am criticizing its religious and ostentatious application rather than its practical use based on actual requirements (this was my take on its practical aspects in the context of Cloud APIs).


Filed under API, Application Mgmt, Cloud Computing, Everything, JBoss, Mgmt integration, Protocols, REST, Utility computing

Cloud APIs are like military parades

The previous post (“Amazon proves that REST doesn’t matter for Cloud APIs”) attracted some interesting comments on the blog itself, on Hacker News and in a response post by Mike Pearce (where I assume the photo is supposed to represent me being an AWS fanboy). I failed to promptly follow-up on it and address the response, then the holidays came. But Mark Little was kind enough to pick the entry up for discussion on InfoQ yesterday which brought new readers and motivated me to write a follow-up.

Mark did a very good job at summarizing my point and he understood that I wasn’t talking about the value (or lack of value) of REST in general. Just about whether it is useful and important in the very narrow field of Cloud APIs. In that context at least, what seems to matter most is simplicity. And REST is not intrinsically simpler.

It isn’t a controversial statement in most places that RPC is easier than REST for developers performing simple tasks. But on the blogosphere I guess it needs to be argued.

Method calls is how normal developers write normal code. Doing it over the wire is the smallest change needed to invoke a remote API. The complexity with RPC has never been conceptual, it’s been in the plumbing. How do I serialize my method call and send it over? CORBA, RMI and SOAP tried to address that, none of them fully succeeded in keeping it simple and yet generic enough for the Internet. XML-RPC somehow (and unfortunately) got passed over in the process.

So what did AWS do? They pretty much solved that problem by using parameters in the URL as a dead-simple way to pass function parameters. And you get the response as an XML doc. In effect, it’s one-half of XML-RPC. Amazon did not invent this pattern. And the mechanism has some shortcomings. But it’s a pragmatic approach. You get the conceptual simplicity of RPC, without the need to agree on an RPC framework that tries to address way more than what you need. Good deal.

So, when Mike asksDoes the fact that AWS use their own implementation of an API instead of a standard like, oh, I don’t know, REST, frustrate developers who really don’t want to have to learn another method of communicating with AWS?” and goes on to answer “Yes”, I scratch my head. I’ve met many developers struggling to understand REST. I’ve never met a developer intimidated by RPC. As to the claim that REST is a “standard”, I’d like to read the spec. Please don’t point me to a PhD dissertation.

That being said, I am very aware that simplicity can come back to bite you, when it’s not just simple but simplistic and the task at hand demands more. Andrew Wahbe hit the nail on the head in a comment on my original post:

Exposing an API for a unique service offered by a single vendor is not going to get much benefit from being RESTful.

Revisit the issue when you are trying to get a single client to work across a wide range of cloud APIs offered by different vendors; I’m willing to bet that REST would help a lot there. If this never happens — the industry decides that a custom client for each Cloud API is sufficient (e.g. not enough offerings on the market, or whatever), then REST may never be needed.

Andrew has the right perspective. The usage patterns for Cloud APIs may evolve to the point where the benefits of following the rules of REST become compelling. I just don’t think we’re there and frankly I am not holding my breath. There are lots of functional improvements needed in Cloud services before the burning issue becomes one of orchestrating between Cloud providers. And while a shared RESTful API would be the easiest to orchestrate, a shared RPC API will still be very reasonably manageable. The issue will mostly be one of shared semantics more than protocol.

Mike’s second retort was that it was illogical for me to say that software developers are mostly isolated from REST because they use Cloud libraries. Aren’t these libraries written by developers? What about these, he asks. Well, one of them, Boto‘s Mitch Garnaat left a comment:

Good post. The vast majority of AWS (or any cloud provider’s) users never see the API. They interact through language libraries or via web-based client apps. So, the only people who really care are RESTafarians, and library developers (like me).

Perhaps it’s possible to have an API that’s so bad it prevents people from using it but the AWS Query API is no where near that bad. It’s fairly consistent and pretty easy to code to. It’s just not REST.

Yup. If REST is the goal, then this API doesn’t reach it. If usefulness is the goal, then it does just fine.

Mike’s third retort was to take issue with that statement I made:

The Rackspace people are technically right when they point out the benefits of their API compared to Amazon’s. But it’s a rounding error compared to the innovation, pragmatism and frequency of iteration that distinguishes the services provided by Amazon. It’s the content that matters.

Mike thinks that

If Rackspace are ‘technically’ right, then they’re right. There’s no gray area. Morally, they’re also right and mentally, physically and spiritually, they’re right.

Sure. They’re technically, mentally, physically and spiritually right. They may even be legally, ethically, metaphysically and scientifically right. Amazon is only practically right.

This is not a ding on Rackspace. They’ll have to compete with Amazon on service (and price), not on API, as they well know and as they are doing. But they are racing against a fast horse.

More generally, the debate about how much the technical merits of an API matters (beyond the point where it gets the job done) is a recurring one. I am talking as a recovering over-engineer.

In a post almost a year ago, James Watters declared that it matters. Mitch Garnaat weighed on the other side: given how few people use the raw API we probably spend too much time worrying about details, maybe we worry too much about aesthetics, I still wonder whether we obsess over the details of the API’s a bit too much (in case you can’t tell, I’m a big fan of Mitch).

Speaking of people I admire, Shlomo Swidler (in general, only library developers use the raw HTTP. Everyone else uses a library) and Joe Arnold (library integration (fog / jclouds / libcloud) is more important for new #IaaS providers than an API) make the right point. Rather than spending hours obsessing about the finer points of your API, spend the time writing love letters to Mitch and Adrian so they support you in their libraries (also, allocate less of your design time to RESTfulness and more to the less glamorous subject of error handling).

OK, I’ll pile on two more expert testimonies. Righscale’s Thorsten von Eicken (the API itself is more a programming exercise than a fundamental issue, it’s the semantics of the resources behind the API that really matter) and F5’s Lori MacVittie (the World Doesn’t Care About APIs).

Bottom line, I see APIs a bit like military parades. Soldiers know better than to walk in tight formation, wearing bright colors and to the sound of fanfare into the battlefield. So why are parade exercises so prevalent in all armies? My guess is that they are used to impress potential enemies, reassure citizens and reflect on the strength of the country’s leaders. But military parades are also a way to ensure internal discipline. You may not need to use parade moves on the battlefield, but the fact that the unit is disciplined enough to perform them means they are also disciplined enough for the tasks that matter. Let’s focus on that angle for Cloud APIs. If your RPC API is consistent enough that its underlying model could be used as the basis for a REST API, you’re probably fine. You don’t need the drum rolls, stiff steps and the silly hats. And no need to salute either.


Filed under Amazon, API, Automation, Cloud Computing, Everything, IT Systems Mgmt, Mgmt integration, Protocols, REST, Specs, Utility computing

Partial resource update, one more time

Alex Scordellis has a good blog post about how to handle partial PUT in REST. It starts by explaining why partial PUT is needed in the first place. And then (including in the comments) it runs into the issues this brings and proposes some solutions.

I have bad news. There are many more issues.

Let’s pick a simple example. What does it mean if an element is not present in a partial update? Is it an explicit omission, intended to represent the need to remove this element in the representation? Or does it mean “don’t change its current value”. If the latter, then how do I do removal? Do I need partial DELETE like I have partial PUT? Hopefully not, but then I have to have a mechanism to remove elements as part of a PUT. Empty value? That doesn’t necessarily mean the same thing as an absent element. Nil value? And how do I handle this with JSON?

And how do you deal with repeating elements? If you PUT an element of that type, is it an addition or a replacement? If replacement, which one(s) are you replacing? Or do you force me to PUT the entire list? No matter how long it is? Even if it increases the risk of concurrency issues?

Lots of similar issues. These two are just off the top of my head, memories from hours locked in a room with my HP, IBM, Intel and Microsoft accomplices.

You know what you end up with? You end up with this. Partial Put in WS-RT. I can hear you scream from here.

I am the ghost of dead partial update mechanisms, coming back to haunt you…

As much as WS-* was criticized for re-inventing HTTP, what we see here is HTTP people re-inventing partial resource update mechanisms like those in WSDM, WS-Management and WS-ResourceTransfer. Which is fine, I am in no way advocating that they should re-use these specs.

But let’s realize that while a lot of the complexity in WS-* was unnecessary, some of it actually was a reflection of the complexity of the task at hand. And that complexity doesn’t go away because you get rid of a SOAP envelope and of stupid WS-Addressing headers.

The good news is that we’ve made a lot of the mistakes already and we’ve learned some lessons (see this technical rant, this post-mortem or this experiment). The bad news is that there are plenty of new mistakes waiting to be made.

Good luck. I mean it sincerely.


Filed under API, Everything, IT Systems Mgmt, Manageability, Protocols, REST, Specs, Tech, WS-Management, WS-ResourceTransfer, WS-Transfer, XMLFrag

Redeeming the service description document

A bicycle is a convenient way to go buy cigarettes. Until one day you realize that buying cigarettes is a bad idea. At which point you throw away your bicycle.

Sounds illogical? Well, that’s pretty much what the industry has done with service descriptions. It went this way: people used WSDL (and stub generation tools built around it) to build distributed applications that shared too much. Some people eventually realized that was a bad approach. So they threw out the whole idea of a service description. And now, in the age of APIs, we are no more advanced than we were 15 years ago in terms of documenting application contracts. It’s tragic.

The main fallacies involved in this stagnation are:

  • Assuming that service descriptions are meant to auto-generate all-encompassing program stubs,
  • Looking for the One True Description for a given service,
  • Automatically validating messages based on the service description.

I’ll leave the first one aside, it’s been widely covered. Let’s drill in a bit into the other two.

There is NOT One True Description for a given service

Many years ago, in the same galaxy where we live today (only a few miles from here, actually), was a development team which had to implement a web service for a specific WSDL. They fed the WSDL to their SOAP stack. This was back in the days when WSDL interoperability was a “promise” in the “political campaign” sense of the term so of course it didn’t work. As a result, they gave up on their SOAP stack and implemented the service as a servlet. Which, for a team new to XML, meant a long delay and countless bugs. I’ll always remember the look on the dev lead’s face when I showed him how 2 minutes and a text editor were all you needed to turn the offending WSDL in to a completely equivalent WSDL (from the point of view of on-the-wire messages) that their toolkit would accept.

(I forgot what the exact issue was, maybe having operations with different exchange patterns within the same PortType; or maybe it used an XSD construct not supported by the toolkit, and it was just a matter of removing this constraint and moving it from schema to code. In any case something that could easily be changed by editing the WSDL and the consumer of the service wouldn’t need to know anything about it.)

A service description is not the literal word of God. That’s true no matter where you get it from (unless it’s hand-delivered by an angel, I guess). Just because adding “?wsdl” to the URL of a Web service returns an XML document doesn’t mean it’s The One True Description for that service. It’s just the most convenient one to generate for the app server on which the service is deployed.

One of the things that most hurts XML as an on-the-wire format is XSD. But not in the sense that “XSD is bad”. Sure, it has plenty of warts, but what really hurts XML is not XSD per se as much as the all-too-common assumption that if you use XML you need to have an XSD for it (see fat-bottomed specs, the key message of which I believe is still true even though SML and SML-IF are now dead).

I’ve had several conversations like this one:

– The best part about using JSON and REST was that we didn’t have to deal with XSD.
– So what do you use as your service contract?
– Nothing. Just a human-readable wiki page.
– If you don’t need a service contract, why did you feel like you had to write an XSD when you were doing XML? Why not have a similar wiki page describing the XML format?
– …

It’s perfectly fine to have service descriptions that are optimized to meet a specific need rather than automatically focusing on syntax validation. Not all consumers of a service contract need to be using the same description. It’s ok to have different service descriptions for different users and/or purposes. Which takes us to the next fallacy. What are service descriptions for if not syntax validation?

A service description does NOT mean you have to validate messages

As helpful as “validation” may seem as a concept, it often boils down to rejecting messages that could otherwise be successfully processed. Which doesn’t sound quite as useful, does it?

There are many other ways in which service descriptions could be useful, but they have been largely neglected because of the focus on syntactic validation and stub generation. Leaving aside development use cases and looking at my area of focus (application management), here are a few use cases for service descriptions:

Creating test messages (aka “synthetic transactions”)

A common practice in application management is to send test messages at regular intervals (often from various locations, e.g. branch offices) to measure the availability and response time of an application from the consumer’s perspective. If a WSDL is available for the service, we use this to generate the skeleton of the test message, and let the admin fill in appropriate values. Rather than a WSDL we’d much rather have a “ready-to-use” (possibly after admin review) test message that would be provided as part of the service description. Especially as it would be defined by the application creator, who presumably knows a lot more about that makes a safe and yet relevant message to send to the application as a ping.

Attaching policies and SLAs

One of the things that WSDLs are often used for, beyond syntax validation and stub generation, is to attach policies and SLAs. For that purpose, you really don’t need the XSD message definition that makes up so much of the WSDL. You really just need a way to identify operations on which to attach policies and SLAs. We could use a much simpler description language than WSDL for this. But if you throw away the very notion of a description language, you’ve thrown away the baby (a classification of the requests processed by the service) along with the bathwater (a syntax validation mechanism).

Governance / versioning

One benefit of having a service description document is that you can see when it changes. Even if you reduce this to a simple binary value (did it change since I last checked, y/n) there’s value in this. Even better if you can introspect the description document to see which requests are affected by the change. And whether the change is backward-compatible. Offering the “before” XSD and the “after” XSD is almost useless for automatic processing. It’s unlikely that some automated XSD inspection can tell me whether I can keep using my previous messages or I need to update them. A simple machine-readable declaration of that fact would be a lot more useful.

I just listed three, but there are other application management use cases, like governance/auditing, that need a service description.

In the SOAP world, we usually make do with WSDL for these tasks, not because it’s the best tool (all we really need is a way to classify requests in “buckets” – call them “operations” if you want – based on the content of the message) but because WSDL is the only understanding that is shared between the caller and the application.

By now some of you may have already drafted in your head the comment you are going to post explaining why this is not a problem if people just use REST. And it’s true that with REST there is a default categorization of incoming messages. A simple matrix with the various verbs as columns (GET, POST…) and the various resource types as rows. Each message can be unambiguously placed in one cell of this matrix, so I don’t need a service description document to have a request classification on which I can attach SLAs and policies. Granted, but keep these three things in mind:

  • This default categorization by verb and resource type can be a quite granular. Typically you wouldn’t have that many different policies on your application. So someone who understands the application still needs to group the invocations into message categories at the right level of granularity.
  • This matrix is only meaningful for the subset of “RESTful” apps that are truly… RESTful. Not for all the apps that use REST where it’s an easy mental mapping but then define resource types called “operations” or “actions” that are just a REST veneer over RPC.
  • Even if using REST was a silver bullet that eliminated the need for service definitions, as an application management vendor I don’t get to pick the applications I manage. I have to have a solution for what customers actually do. If I restricted myself to only managing RESTful applications, I’d shrink my addressable market by a few orders of magnitude. I don’t have an MBA, but it sounds like a bad idea.

This is not a SOAP versus REST post. This is not a XML versus JSON post. This is not a WSDL versus WADL post. This is just a post lamenting the fact that the industry seems to have either boxed service definitions into a very limited use case, or given up on them altogether. It I wasn’t recovering from standards burnout, I’d look into a versatile mechanism for describing application services in a way that is geared towards message classification more than validation.

Comments Off on Redeeming the service description document

Filed under API, Application Mgmt, Everything, Governance, IT Systems Mgmt, Manageability, Mashup, Mgmt integration, Middleware, Modeling, Protocols, REST, SML, SOA, Specs, Standards

URL shorteners and privacy: The Good, the Bad and the Cookie

The table below compares various URL shorteners based on how much they value service performance and the privacy of their users.

Here is the short version of the reading guide: a URL shorterner which gives a high priority to reliability, performance and privacy will use a 301 (“Moved Permanently”) response code, will not use cache control headers and will not use cookies. A URL shortener which gives high priority to its own ability to monetize its traffic by tracking users will do one or more of these things.

Here is how a few of the most popular shorteners perform by this measure (red is bad).

For the long version (and an explanation of how I came to create this table) read below the table.

Service name Cookie Status code Caching limitations (Twitter) 301 5 min tracking 301 301 (Google) 301 24h (WordPress) 301 301 10h (Facebook) (*) 301 tracking 301 301 tracking 301 no caching tracking 301 (**) 301 tracking 301 301 10h tracking 301 302 no caching 302 no caching


(*) Facebook’s service,, tries to set a cookie but its content is “locale=en_US” and cannot be used for identification. In addition, it sets the domain to “” in the Set-Cookie directive but since the response comes from another domain ( the cookie is actually never returned by the browser and therefore useless. It looks like this is a leftover configuration setting copied from the normal servers. Defying all expectations, Facebook comes out as one of the most privacy-friendly URL shorteners.

(**) limits the cache to being “private” which means that your browser can cache the result but a shared proxy (e.g. your company’s proxy) should not cache it. Forcing each user behind that proxy to resolve the URL once. I magnanimously did not ding them for this, even though it’s sub-optimal.

Now for the longer explanation

Despite the potential it offers to stretch out our tweets, I wasn’t too impressed when I learned of Twitter’s plan to roll out (and mandate) its own URL shortening service. My fundamental issue is that URL shortening is made necessary by an arbitrary decision on Twitter’s part (the 140 character limit and the fact that URLs count toward it) and that it would be entirely within their power to make these abominations unneeded. Or, at least, much more rarely needed (when came out, the main use case was to insert a very long URL in an email without having problems with carriage returns, not to turn third-world countries into purveyors of silly domain names).

Beyond this fundamental issue, my main concerns about Twitter’s mechanism are that it reduces privacy and it demands that you break the HTTP specification.

From a privacy perspective, the issue is that anyone who clicks on these links tells Twitter where they are going. And Twitter can collect and correlate these actions. The easiest way for them (or any other URL shortener) to do this is to use cookies. Cookies aren’t often used as part of redirections, but technically nothing prevents them. So I wanted to see if Twitter used them.

[Side note: in practice there are ways to track your browser without using identifying cookies, not to mention simply using the IP address which works quite well on people who browse from home. Still, identifying cookies are the preferred method.]

From a specification conformance perspective, the problem is that Twitter announced that they would modify the Terms of Service of their API to prevent you from replacing the short URL with the real location once you’ve resolved it the first time (as of this writing they apparently haven’t yet made the ToS change). That behavior would be in violation of the HTTP specification if the redirection used status code 301 (“Moved Permanently”) which states that “any future references to this resource SHOULD use one of the returned URIs” and “clients with link editing capabilities ought to automatically re-link references to the Request-URI to one or more of the new references returned by the server“. So I wanted to see whether indeed returns a 301 (and asks us to violate the spec) or if they use a Temporary Redirect (302 or the new 307) in which case the specification would not be violated but other problems would arise (for example, search engines would not give you PageRank karma for such a link).

The other (spec-compliant) way to force a 301 to call back home once a while is the (strange but legal) practice of using cache control headers on permanent redirections. So I also wanted to see how behaves on that front.

And then I decided to also test a few other services, which is how the table above came to be.

Comments Off on URL shorteners and privacy: The Good, the Bad and the Cookie

Filed under Everything, Facebook, Google, Protocols, Security, Social networks, Tech, Testing, Twitter

Updates on Microsoft Oslo and “SSH on Windows”

I’ve been tracking the modeling technology previously known as “Microsoft Oslo” with a sympathetic eye for the almost three years since it’s been introduced. I look at it from the perspective of model-driven IT management but the news hadn’t been good on that front lately (except for Douglas Purdy’s encouraging hint).

The prospects got even bleaker today, at least according to the usually-well-informed Mary Jo Foley, who writes: “Multiple contacts of mine are telling me that Microsoft has decided to shelve Quadrant and ‘refocus’ M.” Is “M” the end of the SDM/SML/M model-driven management approach at Microsoft? Or is the “refocus” a hint that M is returning “home” to address IT management use cases? Time (or Doug) will tell…

While we’re talking about Microsoft and IT automation, I have one piece of free advice for the Microsofties: people *really* want to SSH into Windows servers. Here’s how I know. This blog rarely talks about Microsoft but over the course of two successive weekends over a year ago I toyed with ways to remotely manage Windows machines using publicly documented protocols. In effect, showing what to send on the wire (from Linux or any platform) to leverage the SOAP-based management capabilities in recent versions of Windows. To my surprise, these posts (1, 2, 3) still draw a disproportionate amount of traffic. And whenever I look at my httpd logs, I can count on seeing search engine queries related to “windows native ssh” or similar keywords.

If heterogeneous Cloud is something Microsoft cares about they need to better leverage the potential of the PowerShell Remoting Protocol. They can release open-source Python, Java and Ruby client-side libraries. Alternatively, they can drastically simplify the protocol, rather than its current “binary over SOAP” (you read this right) incarnation. Because the poor Kridek who is looking for the “WSDL for WinRM / Remote Powershell” is in for a nasty surprise if he finds it and thinks he’ll get a ready-to-use stub out of it.

That being said, a brave developer willing to suck it up and create such a Python/Ruby/Java library would probably make some people very grateful.


Filed under Application Mgmt, Automation, Everything, Implementation, IT Systems Mgmt, Manageability, Mgmt integration, Microsoft, Modeling, Oslo, Protocols, SML, SOAP, Specs, Tech, WS-Management

Introducing the Oracle Cloud API

Oracle recently published a Cloud management API on OTN and also submitted a subset of the API to the new DMTF Cloud Management working group. The OTN specification, titled “Oracle Cloud Resource Model API”, is available here. In typical DMTF fashion, the DMTF-submitted specification is not publicly available (if you have a DMTF account and are a member of the right group you can find it here). It is titled the “Oracle Cloud Elemental Resource Model” and is essentially the same as the OTN version, minus sections 9.2, 9.4, 9.6, 9.8, 9.9 and 9.10 (I’ll explain below why these sections have been removed from the DMTF submission). Here is also a slideset that was recently used to present the submitted specification at a DMTF meeting.

So why two documents? Because they serve different purposes. The Elemental Resource Model, submitted to DMTF, represents the technical foundation for the IaaS layer. It’s not all of IaaS, just its core. You can think of its scope as that of the base EC2 service (boot a VM from an image, attach a volume, connect to a network). It’s the part that appears in all the various IaaS APIs out there, and that looks very similar, in its model, across all of them. It’s the part that’s ripe for a simple standard, hopefully free of much of the drama of a more open-ended and speculative effort. A standard that can come out quickly and provide interoperability right out of the gate (for the simple use cases it supports), not after years of plugfests and profiles. This is the narrow scope I described in an earlier rant about Cloud standards:

I understand the pain of customers today who just want to have a bit more flexibility and portability within the limited scope of the VM/Volume/IP offering. If we really want to do a standard today, fine. Let’s do a very small and pragmatic standard that addresses this. Just a subset of the EC2 API. Don’t attempt to standardize the virtual disk format. Don’t worry about application-level features inside the VM. Don’t sweat the REST or SOA purity aspects of the interface too much either. Don’t stress about scalability of the management API and batching of actions. Just make it simple and provide a reference implementation. A few HTTP messages to provision, attach, update and delete VMs, volumes and IPs. That would be fine. Anything else (and more is indeed needed) would be vendor extensions for now.

Of course IaaS goes beyond the scope of the Elemental Resource Model. We’ll need load balancing. We’ll need tunneling to the private datacenter. We’ll need low-latency sub-networks. We’ll need the ability to map multi-tier applications to different security zones. Etc. Some Cloud platforms support some of these (e.g. Amazon has an answer to all but the last one), but there is a lot more divergence (both in the “what” and the “how”) between the various Cloud APIs on this. That part of IaaS is not ready for standardization.

Then there are the extensions that attempt to make the IaaS APIs more application-aware. These too exist in some Cloud APIs (e.g. vCloud vApp) but not others. They haven’t naturally converged between implementations. They haven’t seen nearly as much usage in the industry as the base IaaS features. It would be a mistake to overreach in the initial phase of IaaS standardization and try to tackle these questions. It would not just delay the availability of a standard for the base IaaS use cases, it would put its emergence and adoption in jeopardy.

This is why Oracle withheld these application-aware aspects from the DMTF submission, though we are sharing them in the specification published on OTN. We want to expose them and get feedback. We’re open to collaborating on them, maybe even in the scope of a standard group if that’s the best way to ensure an open IP framework for the work. But it shouldn’t make the upcoming DMTF IaaS specification more complex and speculative than it needs to be, so we are keeping them as separate extensions. Not to mention that DMTF as an organization has a lot more infrastructure expertise than middleware and application expertise.

Again, the “Elemental Resource Model” specification submitted to DMTF is the same as the “Oracle Cloud Resource Model API” on OTN except that it has a different license (a license grant to DMTF instead of the usual OTN license) and is missing some resources in the list of resource types (section 9).

Both specifications share the exact same protocol aspects. It’s pretty cleanly RESTful and uses a JSON serialization. The credit for the nice RESTful protocol goes to the folks who created the original Sun Cloud API as this is pretty much what the Oracle Cloud API adopted in its entirety. Tim Bray described the genesis and design philosophy of the Sun Cloud API last year. He also described his role and explained that “most of the heavy lifting was done by Craig McClanahan with guidance from Lew Tucker“. It’s a shame that the Oracle specification fails to credit the Sun team and I kick myself for not noticing this in my reviews. This heritage was noted from the get go in the slides and is, in my mind, a selling point for the specification. When I reviewed the main Cloud APIs available last summer (the first part in a “REST in practice for IT and Cloud management” series), I liked Sun’s protocol design the best.

The resource model, while still based on the Sun Cloud API, has seen many more changes. That’s where our tireless editor, Jack Yu, with help from Mark Carlson, has spent most of the countless hours he devoted to the specification. I won’t do a point to point comparison of the Sun model and the Oracle model, but in general most of the changes and additions are motivated by use cases that are more heavily tilted towards private clouds and compatibility with existing application infrastructure. For example, the semantics of a Zone have been relaxed to allow a private Cloud administrator to choose how to partition the Cloud (by location is an obvious option, but it could also by security zone or by organizational ownership, as heretic as this may sound to Cloud purists).

The most important differences between the DMTF and OTN versions relate to the support for assemblies, which are groups of VMs that jointly participate in the delivery of a composite application. This goes hand-in-hand with the recently-released Oracle Virtual Assembly Builder, a framework for creating, packing, deploying and configuring multi-tier applications. To support this approach, the Cloud Resource Model (but not the Elemental Model, as explained above) adds resource types such as AssemblyTemplate, AssemblyInstance and ScalabilityGroup.

So what now? The DMTF working group has received a large number of IaaS APIs as submissions (though not the one that matters most or the one that may well soon matter a lot too). If all goes well it will succeed in delivering a simple and useful standard for the base IaaS use cases, and we’ll be down to a somewhat manageable triplet (EC2, RackSpace/OpenStack and DMTF) of IaaS specifications. If not (either because the DMTF group tries to bite too much or because it succumbs to infighting) then DMTF will be out of the game entirely and it will be between EC2, OpenStack and a bunch of private specifications. It will be the reign of toolkits/library/brokers and hell on earth for all those who think that such a bridging approach is as good as a standard. And for this reason it will have to coalesce at some point.

As far as the more application-centric approach to hypervisor-based Cloud, well, the interesting things are really just starting. Let’s experiment. And let’s talk.


Filed under Amazon, API, Application Mgmt, Cloud Computing, DMTF, Everything, IT Systems Mgmt, Mgmt integration, Modeling, OpenStack, Oracle, Portability, Protocols, Specs, Standards, Utility computing, Virtual appliance, Virtualization

Dear Cloud API, your fault line is showing

Most APIs are like hospital gowns. They seem to provide good coverage, until you turn around.

I am talking about the dreadful state of fault reporting in remote APIs, from Twitter to Cloud interfaces. They are badly described in the interface documentation and the implementations often don’t even conform to what little is documented.

If, when reading a specification, you get the impression that the “normal” part of the specification is the result of hours of whiteboard debate but that the section that describes the faults is a stream-of-consciousness late-night dump that no-one reviewed, well… you’re most likely right. And this is not only the case for standard-by-committee kind of specifications. Even when the specification is written to match the behavior of an existing implementation, error handling is often incorrectly and incompletely described. In part because developers may not even know what their application returns in all error conditions.

After learning the lessons of SOAP-RPC, programmers are now more willing to acknowledge and understand the on-the-wire messages received and produced. But when it comes to faults, there is still a tendency to throw their hands in the air, write to the application log and then let the stack do whatever it does when an unhandled exception occurs, on-the-wire compliance be damned. If that means sending an HTML error message in response to a request for a JSON payload, so be it. After all, it’s just a fault.

But even if fault messages may only represent 0.001% of the messages your application sends, they still represent 85% of those that the client-side developers will look at.

Client developers can’t even reverse-engineer the fault behavior by hitting a reference implementation (whether official or de-facto) the way they do with regular messages. That’s because while you can generate response messages for any successful request, you don’t know what error conditions to simulate. You can’t tell your Cloud provider “please bring down your user account database for five minutes so I can see what faults you really send me when that happens”. Also, when testing against a live application you may get a different fault behavior depending on the time of day. A late-night coder (or a daytime coder in another time zone) might never see the various faults emitted when the application (like Twitter) is over capacity. And yet these will be quite common at peak time (when the coder is busy with his day job… or sleeping).

All these reasons make it even more important to carefully (and accurately) document fault behavior.

The move to REST makes matters even worse, in part because it removes SOAP faults. There’s nothing magical about SOAP faults, but at least they force you to think about providing an information payload inside your fault message. Many REST APIs replace that with HTTP error codes, often accompanied by a one-line description with a sometimes unclear relationship with the semantics of the application. Either it’s a standard error code, which by definition is very generic or it’s an application-defined code at which point it most likely overlaps with one or more standard codes and you don’t know when you should expect one or the other. Either way, there is too much faith put in the HTTP code versus the payload of the error. Let’s be realistic. There are very few things most applications can do automatically in response to a fault. Mainly:

  • Ask the user to re-enter credentials (if it’s an authentication/permission issue)
  • Retry (immediately or after some time)
  • Report a problem and fail

So make sure that your HTTP errors support this simple decision tree. Beyond that point, listing a panoply of application-specific error codes looks like an attempt to look “RESTful” by overdoing it. In most cases, application-specific error codes are too detailed for most automated processing and not detailed enough to help the developer understand and correct the issue. I am not against using them but what matters most is the payload data that comes along.

On that aspect, implementations generally fail in one of two extremes. Some of them tell you nothing. For example the payload is a string that just repeats what the documentation says about the error code. Others dump the kitchen sink on you and you get a full stack trace of where the error occurred in the server implementation. The former is justified as a security precaution. The latter as a way to help you debug. More likely, they both just reflect laziness.

In the ideal world, you’d get a detailed error payload telling you exactly which of the input parameters the application choked on and why. Not just vague words like “invalid”. Is parameter “foo” invalid for syntactical reasons? Is it invalid because inconsistent with another parameter value in the request? Is it invalid because it doesn’t match the state on the server side? Realistically, implementations often can’t spend too many CPU cycles analyzing errors and generating such detailed reports. That’s fine, but then they can include a link to a wiki a knowledge base where more details are available about the error, its common causes and the workarounds.

Your API should document all messages accurately and comprehensively. Faults are messages too.


Filed under API, Application Mgmt, Automation, Cloud Computing, Everything, Mgmt integration, Protocols, REST, SOAP, Specs, Standards, Tech, Testing, Twitter, Utility computing

Enterprise application integration patterns for IT management: a blast from the past or from the future?

In a recent blog post, Don Ferguson (CTO at CA) describes CA Catalyst, a major architectural overhaul which “applies enterprise application integration patterns to the problem of integrating IT management systems”. Reading this was fascinating to me. Not because the content was some kind of revelation, but exactly for the opposite reason. Because it is so familiar.

For the better part of the last decade, I tried to build just this at HP. In the process, I worked with (and sometimes against) Don’s colleague at IBM, who were on the same mission. Both companies wanted a flexible and reliable integration platform for all aspects of IT management. We had decided to use Web services and SOA to achieve it. The Web services management protocols that I worked on (WSMF, WSDM, WS-Management and the “reconciliation stack”) were meant for this. We were after management integration more than manageability. Then came CMDBf, another piece of the puzzle. From what I could tell, the focus on SOA and Web services had made Don (who was then Mr. WebSphere) the spiritual father of this effort at IBM, even though he wasn’t at the time focused on IT management.

As far as I know, neither IBM nor HP got there. I covered some of the reasons in this post-mortem. The standards bickering. The focus on protocols rather than models. The confusion between the CMDB as a tool for process/service management versus a tool for software integration. Within HP, the turmoil from the many software acquisitions didn’t help, and there were other reasons. I am not sure at this point whether either company is still aiming for this vision or if they are taking a different approach.

But apparently CA is still on this path, and got somewhere. At least according to Don’s post. I have no insight into what was built beyond what’s in the post. I am not endorsing CA Catalyst, just agreeing with the design goals listed by Don. If indeed they have built it, and the integration framework resists the test of time, that’s impressive. And exciting. It apparently even uses some the same pieces we were planning to use, namely WS-Management and CMDBf (I am reluctantly associated with the first and proudly with the second).

While most readers might not share my historical connection with this work, this is still relevant and important to anyone who cares about IT management in the enterprise. If you’re planning to be at CA World, go listen to Don. Web services may have a bad name, but the technical problems of IT management integration remain. There are only a few routes to IT management automation (I count seven, the one taken by CA is #2). You can throw away SOAP if you want, you still need to deal with protocol compatibility, model alignment and instance reconciliation. You need to centralize or orchestrate the management operations performed. You need to be able to integrate with complementary products or at the very least to effectively incorporate your acquisitions. It’s hard stuff.

Bonus point to Don for not forcing a “Cloud” angle for extra sparkle. This is core IT management.

Comments Off on Enterprise application integration patterns for IT management: a blast from the past or from the future?

Filed under Automation, CA, CMDB, CMDB Federation, CMDBf, Everything, IT Systems Mgmt, Mgmt integration, Modeling, People, Protocols, SOAP, Specs, Standards, Tech, Web services, WS-Management

Two versions of a protocol is one too many

There is always a temptation, when facing a hard design decision in the process of creating an interface or a protocol, to produce two (or more) versions. It’s sometimes a good idea, as a way to explore where each one takes you so you can make a more informed choice. But we know how this invariably ends up. Documents get published that arguably should not. It’s even harder in a standard working group, where someone was asked (or at least encouraged) by the group to create each of the alternative specifications. Canning one is at best socially awkward (despite the appearances, not everyone in standards is a psychopath or a sadist) and often politically impossible.

And yet, it has to be done. Compare the alternatives, then pick one and commit. Don’t confuse being accommodating with being weak.

The typical example these days is of course SOAP versus REST: the temptation is to support both rather than make a choice. This applies to standards and to proprietary interfaces. When a standard does this, it hurts rather than promote interoperability. Vendors have a bit more of an excuse when they offer a choice (“the customer is always right”) but in reality it forces customers to play Russian roulette whether they want it or not. Because one of the alternatives will eventually be left behind (either discarded or maintained but not improved). If you balance the small immediate customer benefit of using the interface style they are most used to with the risk of redoing the integration down the road, the value proposition of offering several options crumbles.

[Pedantic disclaimer: I use the term “REST” in this post the way it is often (incorrectly) used, to mean pretty much anything that uses HTTP without a SOAP wrapper. The technical issues are a topic for other posts.]


CMDBf v1 is a DMTF standard. It is a SOAP-based protocol. For v2, it has been suggested that there should a REST version. I don’t know what the CMDBf group (in which I participate) will end up doing but I’ve made my position clear: I could go either way (remain with SOAP or dump it) but I do not want to have two versions of the protocol (one SOAP one REST). If we think we’re better off with a REST version, then let’s make v2 REST-only. Supporting both mechanisms in v2 would be stupid. They would address the same use cases and only serve to provide political ass-coverage. There is no functional need for both. The argument that we need to keep supporting SOAP for the benefit of those who implemented v1 doesn’t fly. As an implementer, nobody is saying that you need to turn off your v1 services the second you launch the v2 version.

DMTF Cloud

Between the specifications submitted directly to DMTF, the specifications developed by DMTF “partner” organizations and the existing DMTF protocols, the DMTF Cloud effort is presented with a mix of SOAP, RESTful and XML-RPC-over-HTTP options. In the process of deciding what to create or adopt I am sure that the temptation will be high to take the easy route of supporting several versions to placate everyone. But such a “consensus” would be achieved on the back of the implementers so I very much hope it won’t be the case.

When it is appropriate

There are cases where supporting alternatives options is worth the cost. But it typically happens when they serve very different use cases. Think of SAX versus DOM, which have clearly differentiated sweetspots. In the Cloud world, Amazon S3 gives us interesting examples of both justified and extraneous alternatives. The extraneous one is the choice between REST and SOAP for the S3 API. I often praise AWS for its innovation and pragmatism, but this is an example of something that only looks pragmatic. On the other hand, the AWS import/export mechanism is a useful alternative. It allows you to physically ship a device with a few terabytes of data to Amazon. This is technically an alternative to the S3 programmatic interface, but one with obviously differentiated use cases. I recommend you reserve the use of “alternative APIs” for such scenarios.

If it didn’t work for Tiger Woods, it won’t work for your Cloud API either. Learn to commit.

[CLARIFICATION: based on some of the early Twitter feedback on this entry, I want to clarify that it’s alternative versions that I am against, not successive versions (i.e. an evolution of the interface over time). How to manage successive versions properly is a whole other debate.]


Filed under Amazon, API, Cloud Computing, CMDB, CMDBf, DMTF, Everything, IT Systems Mgmt, Protocols, REST, SOAP, Specs, Standards, Utility computing, Web services

Square peg, REST hole

For all its goodness, REST sometimes feels like trying to fit a square peg in the proverbial round hole. Some interaction patterns just don’t lend themselves well to the REST approach. Here are a few examples, taken from the field of IT/Cloud management.

Long-lived operations. You can’t just hang on for a synchronous response. Tim Bray best described the situation, which he called Slow REST. Do you create an “action in progress” resource?

Query: how do you query for “all the instances of app foo deployed in a container that has patch 1234 installed” in a to-each-resource-its-own-URL world? I’ve seen proposals that create a “query” resource and build it up incrementally by POSTing constraints to it. Very RESTful. Very impractical too.

Events: the process of creating and managing subscriptions maps well to the resource-oriented RESTful approach. It’s when you consider event delivery mechanisms that things get nasty. You quickly end up worrying a lot more about firewalls and the cost of keeping HTTP connections open than about RESTful purity.

Enumeration: what if your resource state is a very long document and you’d rather retrieve it in increments? A basic GET is not going to cut it. You either have to improve on GET or, once again, create a specifically crafted resource (an enumeration context) to serve as a crutch for your protocol.

Filtering: take that same resource with a very long representation. Say you just want a small piece of it (e.g. one XML element). How do you retrieve just that piece?

Collections: it’s hard to manage many resources as one when they each have their own control endpoint. It’s especially infuriating when the URLs look like where XXX, the only variable part, is a resource Id and you know – you just know – that there is one application processing all your messages and yet you can’t send it a unique message and tell it to apply the same request to a list of resources.

The afterlife: how do you retrieve data about a resource once it’s gone? Which is what a DELETE does to it. Except just because it’s been removed operationally doesn’t mean you have no interest in retrieving data about it.

I am not saying that these patterns cannot be supported in a RESTful way. In fact, the problem is that they can. A crafty engineer can come up with carefully-defined resources that would support all such usages. But at the cost of polluting the resource model with artifacts that have little to do with the business at hand and a lot more with the limitations of the access mechanism.

Now if we move from trying to do things in “the REST way” to doing them in “a way that is as simple as possible and uses HTTP smartly where appropriate” then we’re in a better situation as we don’t have to contort ourselves. It doesn’t mean that the problems above go away. Events, for example, are challenging to support even outside of any REST constraint. It just means we’re not tying one hand behind our back.

The risk of course is to loose out on many of the important benefits of REST (simplicity, robustness of links, flexibility…). Which is why it’s not a matter of using REST or not but a matter of using ideas from REST in a practical way.

With WS-*, on the other hand, we get a square peg to fit in a square hole. The problem there is that the peg is twice as wide as the hole…


Filed under Cloud Computing, Everything, Implementation, Modeling, Protocols, Query, REST, Utility computing

Waiting for events (in Cloud APIs)

Events/alerts/notifications have been a central concept in IT management at least since the first SNMP trap was emitted, and probably even long before that. And yet they are curiously absent from all the Cloud management APIs/protocols. If you think that’s because “THE CLOUD CHANGES EVERYTHING” then you may have to think again. Over the last few days, two of the most experienced practitioners of Cloud computing pointed out that this omission is a real pain in the neck. RightScale’s Thorsten von Eicken was first to request “an event based interface instead of a request-reply based interface”, pointing out that “we run a good number of machines that do nothing but chew up 100% cpu polling EC2 to detect changes”. George Reese seconded and started to sketch a solution. And while these blog posts gave the issue increased visibility recently, it has been a recurring topic on the AWS Forum and other similar discussion boards for quite some time. For example, in this thread going back to 2006, an Amazon employee wrote that “this is a feature we’ve discussed recently and we’re looking at options” (incidentally, I see a post by Thorsten in that old thread). We’re still waiting.

Let’s look at what it would take to define such a feature.

I have some experience with events for IT management, having been involved in the WS-Notification family of specifications and having co-chaired the OASIS technical committee that standardized them. This post is not about foisting WS-Notification on Cloud APIs, but just about surfacing some of the questions that come up when you try to standardize such a mechanism. While the main use cases for WS-Notification came from IT (and Grid) management, it was supposed to be a generic mechanism. A Cloud-centric eventing protocol can be made simpler by focusing on fewer use cases (Cloud scenarios only). In addition, WS-Notification was marred by the complexity-is-a-sign-of-greatness spirit of the time . On this too, a Cloud eventing protocol could improve things by keeping IBM at bay simplicity in mind.

Types of event

When you pull the state of a resource to see if anything changed,  you don’t have to tell the provider what kind of change you are interested in. If, on the other hand, you want the provider to notify you, then they need to know what you care about. You may not want to be notified on every single change in the resource state. How do you describe the changes you care about? Is there an agreed-upon set of states for the resource and you are only notified on state transitions? Can you indicate the minimum severity level for an event to be emitted? Who determines the severity of an event? Or do you get to specify what fields in the resource state you want to watch? What about numeric values for which you may not want to be notified of every change but only when a threshold is crossed? Do you get to specify a query and get notified whenever the query result changes? In WS-Notification some of this is handled by WS-Topics which I still like conceptually (I co-edited it) but is too complex for the task at hand.

Event formats

What format are the events serialized in? How is the even metadata captured (e.g. time stamp of observation, which may not be the same as the time at which the notification message was sent)? If the event payload is a representation of the new state of the resource, does it indicate what field changes (and what the old value was)? How do you keep event payloads consistent with the resource representation in the request/response interactions? If many events occur near the same time, can you group them in one notification message for better scalability?

Subscription creation

Presumably you need a subscription mechanism. Is the subscription set in stone when the resource is created? Or can you come later and subscribe? If subscription is an operation on the resource itself, how do you subscribe for events on something that doesn’t exist yet (e.g. “create a VM and notify me once it’s started”)? Do you get to set subscriptions on a per-resource-basis? Or is this a global setting for all the resources that you own? Can you have two different subscriptions on the same resource (e.g. a “critical events only” subscription that exist throughout the life of the resource, plus a “lots of events please” subscription that you keep for a few hours while troubleshooting)?

Subscription management

Do you get to come back and update/pause/delete a subscription? Do you get to change what filter the subscription carries? Or is it set in stone until the subscription expires? Can you change the delivery endpoint? What if events fail to be delivered? Does the provider cancel your subscription? After how many failures? Does it just pause it for a few hours? Keep trying?

Subscription expiration

Who sets the expiration period? The subscriber? Can the provider set a max duration? Do you get a warning message before the subscription expires? Can you renew a subscription or do you have to create a new one? Do you get a message telling you that it has expired? Where are these subscription-lifecycle messages sent? To the same endpoint as the regular messages? What if your subscription is being killed because your deliver endpoint is down, clearly it makes no sense to send the warning message to that same endpoint. Do you provide a separate “subscription management” endpoint (different from the event delivery endpoint) when you subscribe? Alternatively, does an email message get sent to the registered user who set the subscription?

Delivery reliability

How reliable do you want the notifications to be? Should the emitter retry until they’ve received a confirmation? How long do they keep messages that can’t be delivered? Some may have a very short shelf life while others are still useful weeks later. If you don’t have a reliable mechanism but you really “need to know about a lost server within a minute of it disappearing” (the example Georges gives) then in reality you may still have to poll just to make sure that an event wasn’t lost. If you haven’t received an event in a while, how can you test if the subscription is still working? Should subscriptions send a heartbeat message once a while?

Delivery mechanism

How do you deliver notifications? Do you keep HTTP connections open through tricks similar to how self-updating web pages work (e.g. COMET, long polling and soon WebSockets)? Or do you just provide a listener endpoint to which the notifier tries to connect (which, in the case of public cloud deployments, means you need to have a publicly-addressable listener, but hopefully not on the same Cloud infrastructure). Do you use XMPP? AMQP? Email? Can I have you hold my events and let me come pull them?


Do you need to verify the origin of the events you receive? Or do you assume they may be forged and always initiate a connection to the provider to double-check? And on the other side, what are the security requirements for event delivery? If a user looses some of their privileges, do you have to go and cancel the still-active subscriptions that they created?


Is there a maximum event rate? Do you get charged for the events the Cloud provider sends you? How do you make sure that someone doesn’t create a subscription pointing to the wrong endpoint (either erroneously or maliciously, e.g. DoS). Do you send a test message at registration asking the delivery endpoint to acknowledge that they indeed want to receive these notifications?


My goal is not to argue that we cannot have a simple yet good enough notification system or to scare anyone from attempting to define it. It’s just to show that it’s not as simple as it may seem at first blush. But there probably is a sweetspot and people like Thorsten and George are very well qualified to find it.

[UPDATED 2010/4/7: Amazon releases AWS Simple notification Service. Not just as an eventing feature for the Cloud API, as a generic notification service. Which can, of course, also carry Cloud management events. Though at this point you’re on your own to publish them from your instances, it doesn’t look like the AWS infrastructure can do it for you. Which means, for example, that you’re not going to be able to publish an event for a sudden crash.]


Filed under API, Application Mgmt, Automation, Cloud Computing, Desired State, Everything, IT Systems Mgmt, Manageability, Mgmt integration, Protocols, Specs, Standards, Tech, Utility computing