Category Archives: Utility computing

Big Data in the Cloud at Google I/O

Last week was a great party for the entire Google developer family, including Google Cloud Platform. And within the Cloud Platform, Big Data processing services. Which is where my focus has been in the almost two years I’ve been at Google.

It started with a bang, when our fearless leader Urs unveiled Cloud Dataflow in the keynote. Supported by a very timely demo (streaming analytics for a World Cup game) by my colleague Eric.

After the keynote, we had three live sessions:

In “Big Data, the Cloud Way“, I gave an overview of the main large-scale data processing services on Google Cloud:

  • Cloud Pub/Sub, a newly-announced service which provides reliable, many-to-many, asynchronous messaging,
  • the aforementioned Cloud Dataflow, to implement data processing pipelines which can run either in streaming or batch mode,
  • BigQuery, an existing service for large-scale SQL-based data processing at interactive speed, and
  • support for Hadoop and Spark, making it very easy to deploy and use them “the Cloud Way”, well integrated with other storage and processing services of Google Cloud Platform.

The next day, in “The Dawn of Fast Data“, Marwa and Reuven described Cloud Dataflow in a lot more details, including code samples. They showed how to easily construct a streaming pipeline which keeps a constantly-updated lookup table of most popular Twitter hashtags for a given prefix. They also explained how Cloud Dataflow builds on over a decade of data processing innovation at Google to optimize processing pipelines and free users from the burden of deploying, configuring, tuning and managing the needed infrastructure. Just like Cloud Pub/Sub and BigQuery do for event handling and SQL analytics, respectively.

Later that afternoon, Felipe and Jordan showed how to build predictive models in “Predicting the future with the Google Cloud Platform“.

We had also prepared some recorded short presentations. To learn more about how easy and efficient it is to use Hadoop and Spark on Google Cloud Platform, you should listen to Dennis in “Open Source Data Analytics“. To learn more about block storage options (including SSD, both local and remote), listen to Jay in “Optimizing disk I/O in the cloud“.

It was gratifying to see well-informed people recognize the importance of these announcement and partners understand how this will benefit their customers. As well as some good press coverage.

It’s liberating to now be able to talk freely about recent progress on our quest to equip Google Cloud users with easy to use data processing tools. Everyone can benefit from Google’s experience making developers productive while efficiently processing data at large scale. With great power comes great productivity.

1 Comment

Filed under Big Data, BigQuery, Cloud Computing, Cloud Dataflow, Everything, Google Cloud Platform, Implementation, Open source, Query, Spark, Tech, Utility computing

Pyramid of needs for Cloud management


Comments Off on Pyramid of needs for Cloud management

Filed under Application Mgmt, Automation, Business, Cloud Computing, Everything, Governance, IT Systems Mgmt, Manageability, Utility computing

PaaS lets you pick the right tool for the job, without having to worry about the additional operational complexity

In a recent blog post, Dan McKinley explains “Why MongoDB Never Worked Out at Etsy“. In short, the usefulness of using MongoDB in addition to their existing MySQL didn’t justify the additional operational complexity of managing another infrastructure service.

This highlights the least appreciated benefit of PaaS: PaaS lets you pick the right tool for the job, without having to worry about the additional operational complexity.

I tried to explain this a year ago in this InfoQ article. But the title was cringe-worthy and the article was too long.

So this blog will be short. I even made the main point bold; and put it in the title.


Comments Off on PaaS lets you pick the right tool for the job, without having to worry about the additional operational complexity

Filed under Cloud Computing, PaaS, Uncategorized, Utility computing

Ovum on Cloud evolution

By Laurent Lachal:

Two years ago, Ovum expected SaaS to evolve towards business process-as-a-service (BPaaS). The transition has already started. For example, Oracle’s SaaS strategy does not focus on each discrete SaaS application in its portfolio. Instead, Oracle focuses on the integration of its SaaS offering into end-to-end processes such as the “B2C customer experience” process that integrates Oracle WebCenter Sites, Oracle Siebel Marketing, Oracle Endeca, Oracle ATG Commerce, Oracle Financials, Procurement, and Supply Chain, as well as Oracle RightNow CX Cloud Service to achieve a consistent and personalized experience across all channels. Oracle underpins its SaaS-to-BPaaS story with strong iPaaS capabilities. Indeed, iPaaS is to become a key ingredient of the SaaS-to-BPaaS transition.

Remember, the integration is the application.


Comments Off on Ovum on Cloud evolution

Filed under Business Process, Cloud Computing, Everything, Utility computing

Are you the dumb money in the Cloud?

Another paper came out measuring the performance diversity of Cloud VMs within the same advertised type. Here’s the paper (PDF), here are the slides (PDF) and here is the video (this was presented back in June but I hadn’t seen it until today’s post in the High Scalability blog). Once again, the research shows a large discrepancy. The authors assert that “by selecting better-performing instances to complete the same task, end-users of Amazon EC2 platform can achieve up to 30% cost saving”.

I’ve heard people describing how they use “instance tasting”. Every time they get a new instance, they run performance tests (either on CPU like this paper, or i/o, or both, depending on how they plan to use the VM). They quickly terminate the instances that don’t perform well, to improve their bang/buck ratio. I don’t know how prevalent that practice is, but clearly the more sophisticated users have ways to game the system optimize their consumption.

But this doesn’t happen in a vacuum. It necessarily increases the likelihood that less sophisticated users will end up with (comparatively over-priced) lower-performing instances. Especially in Clouds with high heterogeneity and which have little unused capacity. It’s like coming to the fruit salad at a buffet and finding all the berries gone. I hope you like watermelon and honeydew.

Wall Street has a term for this. For people who don’t understand the system in details, who don’t have access to insider information and don’t use sophisticated trading technology: the dumb money.

Comments Off on Are you the dumb money in the Cloud?

Filed under Business, Cloud Computing, Everything, Performance, Utility computing

The enterprise Cloud battle will be an integration battle

The real threat the Cloud poses for incumbent enterprise software vendors is not the obvious one. It’s not about hosting. Sure, they would prefer if nothing changed and they could stick to shipping software and letting customers deal with installing, managing and operating it. Those were the good times.

But these days are over; Oracle, SAP and others realize it. They’re too smart to fight for a lost cause and try to stop this change (though of course they wouldn’t mind if it slowed down). They understand that they will have to host and operate their software themselves. Right now, their datacenter operations and software stacks aren’t as optimized for that model as those of native Cloud providers (at least the good ones), but they’ll get there. They have the smarts and the resources. They don’t have to re-invent it either, the skills and techniques needed will spread through the industry. They’ll also make some acquisitions to hurry things up. In the meantime, they’re probably willing to swallow the operational costs of under-optimized operations in order to prevent those of their customers most eager to move to the Cloud from defecting.

That transition, from running on the customer’s machines to running on the software vendor’s machine, while inconvenient for the vendors, is not by itself a fundamental threat.

The scary part, for enterprise software vendors transitioning to the SaaS model, is whether the enterprise application integration model will also change in the process.

[note: I include “customization” in the larger umbrella of “integration”.]

Enterprise software integration is hard and risky. Once you’ve invested in integrating your enterprise applications with one another (and/or with your partners’ applications), that integration becomes the #1 reason why you don’t want to change your applications. Or even upgrade them. That’s because the integration is an extension of the application being integrated. You can’t change the app and keep the integration. SOA didn’t change that. Both from a protocol perspective (the joys of SOAP and WS-* interoperability) and from a model perspective, SOA-type integration projects are still tightly bound to the applications they were designed for.

For all intents and purposes, the integration is subservient to the applications.

The opportunity (or risk, depending on which side you’re on) is if that model flips over as part of the move to Cloud Computing. If the integration become central and the applications become peripheral.

The integration is the application.

Which is really just pushing “the network is the computer” up the stack.

Just like cell phone operators don’t want to be a “dumb pipe”, enterprise software vendors don’t want to become a “dumb endpoint”. They want to own the glue.

That’s why Salesforce built, acquired Heroku and is now all over “enterprise social networking” (there’s no such thing as “enterprise social networking” by the way, it’s just better groupware, integrated with enterprise applications). That’s why WorkDay’s only acquisition to date, as early as 2008, was an enterprise integration vendor (CapeClear). And why Oracle, in building its public Cloud, is making sure to not just offer its applications as SaaS but also build a portfolio of integration-friendly services (“the technology services are for IT staff and software developers to build and extend business applications”).

How could this be flipped over, freeing the integration from being in the shadow of the application (and running on infrastructure provided by the application vendor)? Architecturally, this looks like a Web integration use case, as Erik Wilde posited. But that’s hard to imagine in practice, when you think of the massive amounts of data and the ever-growing requirements to extract value from it, quickly and at scale. Even if application vendors exposed good HTTP/REST APIs for their data, you don’t want to run Big Data analysis over these remote APIs.

So, how does the integration free itself from being controlled by the SaaS vendor, while retaining high-throughput and low-latency access to the data it requires? And how does the integration get high-throughput/low-latency access to data sets from different applications (by different vendors) at the same time? Well, that’s where the cliffhanger at the end of the previous blog post comes from.

Maybe the solution is to have SaaS applications run on a public Cloud. Or, in the terms used by the previous blog post, to run enterprise Solution Clouds (SaaS) on top of public Deployment Clouds (PaaS and IaaS). Two SaaS applications (by different vendors) which run on the same Deployment Cloud can then, via some level of coordination controlled by the customer, access each other’s data or let an “integration” application access both of their data sets. The customer-controlled coordination (and by “customer” here I mean the enterprise which subscribes to the SaaS apps) is two-fold: ensuring that the two SaaS applications are deployed in close proximity (in the same “Cloud zone” or whatever proximity abstraction the Deployment Cloud provider exposes); and setting appropriate access permission.

Enterprise application vendors will fight to maintain control of application integration. They will have reasons, some of them valid, supporting the case that it’s better if you use them to run your integration. But the most fundamental of these reasons, locality of the data, doesn’t have to be a blocker. That can be bridged by running on a shared public Deployment Cloud.

Remember, the integration is the application.


Filed under Business, Business Process, Cloud Computing, Enterprise Applications, Everything, Integration, Utility computing

Will Clouds run on Clouds?

Or, more precisely, will enterprise Solution Clouds run on Deployment Clouds?

I’ve previously made the case that we shouldn’t include SaaS in the “Cloud” (rather, it’s just Web apps). Obvioulsy my guillotine didn’t cut it against the lure of Cloud-branding. Fine, then let’s agree that we have two types of Clouds: Solution Clouds and Deployment Clouds.

Solution Clouds are the same as SaaS (and I’ll use the terms interchangeably). Enterprise Solution Clouds provide common functions like HR, CRM, Financial, Collaboration, etc. Every company needs them and they are similar between companies.

Deployment Clouds are where you deploy applications. That definition covers both IaaS and PaaS, which are part of the same continuum.

You subscribe to a Solution Cloud; you deploy on a Deployment Cloud.

Considering these two types of Clouds separately doesn’t mean they’re not connected. The same providers may offer both, and they may be tightly packaged together. But they’re still different in their nature.

The reason for this little lexicological excursion is to formulate this question: will enterprise Solution Clouds run on Deployment Clouds or on their own infrastructure? Can application-centric ISVs compete, in the enterprise market, by providing a Solution Cloud on top of someone else’s Deployment Cloud? Or will the most successful enterprise Solution Clouds be run by full-stack operators?

Right now, most incumbent enterprise software vendors (Oracle, SAP…), and the Cloud-only enterprise vendors with the most adoption (SalesForce, WorkDay…) offer Cloud services by operating their own infrastructure. On the other hand, there are many successful SaaS vendors targeting smaller companies which run their operations on top of a Deployment Cloud (e.g. the Google Cloud Platform or AWS). That even seems to be the default mode of operation for these vendors. Why not enterprise vendors? Let’s look at the potential reasons for this, in order to divine if it may change in the future.

  • It could be that it’s a historical accident, either because the currently successful enterprise providers necessarily started before Deployment Clouds were available (it takes time to build an enterprise Solution Cloud) or they started from existing on-premise software which didn’t lend itself to running on a Deployment Cloud. Or these SaaS providers were simply not culturally or technically ready to use Deployment Clouds.
  • It could be that it’s necessary in order to provide the necessary level of security and compliance. Every enterprise SaaS vendor has a “security” whitepaper which plays up the fact that they run their own datacenter, allowing them to ensure that all the guards have a  CSSLP, a fifth-degree karate black belt, a scary-looking goatee and that they follow a documented process when they brush their teeth in the morning. If that’s a reason, then it won’t change until Deployment Clouds can offer the same level of compliance (and prove it). Which is happening, BTW.
  • It could be that enterprise Solution Clouds are large enough (or have vocation to become large enough) that there are little economies of scale in sharing the infrastructure with other workload.
  • It could be that they can optimize the infrastructure to better serve their particular type of workload.

Any other reason? Will enterprise Solution Cloud always run on their own infrastructure? The end result probably hinges a lot on whether, once it fully moves to a Cloud delivery model, the enterprise SaaS landscape is as concentrated as the traditional enterprise application landscape was, or whether the move to Cloud opens the door to a more diverse ecosystem. The answer to this hinges, in turn, on Cloud application integration, which will be the topic of the next post.

1 Comment

Filed under Business, Cloud Computing, Everything, Uncategorized, Utility computing

Joining Google

Next Monday, I will start at Google, in the Cloud Platform team.

I’ve been watching that platform, and especially Google App Engine (GAE), since it started in 2008. It shaped my thoughts on Cloud Computing and on the tension between PaaS and IaaS. In my first post about GAE, 4.5 years ago, I wrote about that tension:

History is rarely kind to promoters of radical departures. The software industry is especially fond of layering the new on top of the old (a practice that has been enabled by the constant increase in underlying computing capacity). If you are wondering why your command prompt, shell terminal or text editor opens with a default width of 80 characters, take a trip back to 1928, when IBM defined its 80-columns punch card format. Will Google beat the odds or be forced to be more accommodating of existing code?

This debate (which I later characterized as “backward-compatible vs. forward-compatible”) is still ongoing. App Engine has grown a lot and shed its early limitations (I had a lot of fun trying to engineer around them in the early days). Google’s Cloud Platform today is also a lot more than App Engine, with Cloud Storage, Compute Engine, etc. It’s much more welcoming to existing applications.

The core question remains, however. How far, and how quickly will we move from the abstractions inherited from seeing the physical server as the natural unit of computation? What benefits will we derive from this transformation and will they make it worthwhile? Where’s the next point of equilibrium in the storm provoked by these shifts:

  • IT management technology was ripe for a change, applying to itself the automation capabilities that it had brought to other domains.
  • Software platforms were ripe for a change, as we keep discovering all the Web can be, all the data we can handle, and how best to take advantage of both.
  • The business of IT was ripe for a change, having grown too important to escape scrutiny of its inefficiency and sluggishness.

These three transformations didn’t have to take place at the same time. But they are, which leaves us with a fascinating multi-variable equation to optimize. I believe Google is the right place to crack this nut.

This is my view today, looking at the larger Cloud environment and observing Google’s Compute Platform from the outside. In a week’s time, I’ll be looking at it from the inside. October me may scoff at the naïveté of September me; or not. Either way, I’m looking forward to it.


Filed under Cloud Computing, Everything, Google, Google App Engine, Google Cloud Platform, People, Uncategorized, Utility computing

Google Compute Engine, the compete engine

Google is going to give Amazon AWS a run for its money. It’s the right move for Google and great news for everyone.

But that wasn’t plan A. Google was way ahead of everybody with a PaaS solution, Google App Engine, which was the embodiment of “forward compatibility” (rather than “backward compatibility”). I’m pretty sure that the plan, when they launched GAE in 2008, didn’t include “and in 2012 we’ll start offering raw VMs”. But GAE (and PaaS in general), while it made some inroads, failed to generate the level of adoption that many of us expected. Google smartly understood that they had to adjust.

“2012 will be the year of PaaS” returns 2,510 search results on Google, while “2012 will be the year of IaaS” returns only 2 results, both of which relate to a quote by Randy Bias which actually expresses quite a different feeling when read in full: “2012 will be the year of IaaS cloud failures”. We all got it wrong about the inexorable rise of PaaS in 2012.

But saying that, in 2012, IaaS still dominates PaaS, while not wrong, is an oversimplification.

At a more fine-grained level, Google Compute Engine is just another proof that the distinction between IaaS and PaaS was always artificial. The idea that you deploy your applications either at the IaaS or at the PaaS level was a fallacy. There is a continuum of application services, including VMs, various forms of storage, various levels of routing, various flavors of code hosting, various API-centric utility functions, etc. You can call one end of the spectrum “IaaS” and the other end “PaaS”, but most Cloud applications live in the continuum, not at either end. Amazon started from the left and moved to the right, Google is doing the opposite. Amazon’s initial approach was more successful at generating adoption. But it’s still early in the game.

As a side note, this is going to be a challenge for the Cloud Foundry ecosystem. To play in that league, Cloud Foundry has to either find a way to cover the full IaaS-to-PaaS continuum or it needs to efficiently integrate with more IaaS-centric Cloud frameworks. That will be a technical challenge, and also a political one. Or Cloud Foundry needs to define a separate space for itself. For example in Clouds which are centered around a strong SaaS offering and mainly work at higher levels of abstraction.

A few more thoughts:

  • If people still had lingering doubts about whether Google is serious about being a Cloud provider, the addition of Google Compute Engine (and, earlier, Google Cloud Storage) should put those to rest.
  • Here comes yet-another-IaaS API. And potentially a major one.
  • It’s quite a testament to what Linux has achieved that Google Compute Engine is Linux-only and nobody even bats an eye.
  • In the end, this may well turn into a battle of marketplaces more than a battle of Cloud environment. Just like in mobile.


Filed under Amazon, Cloud Computing, Everything, Google, Google App Engine, Google Cloud Platform, Utility computing

Podcast with Oracle Cloud experts

A couple of weeks ago, Bob Rhubart (who runs the OTN architect community) assembled four of us who are involved in Oracle’s Cloud efforts, public and private. The conversation turned into a four-part podcast, of which the first part is now available. The participants were James Baty (VP of Global Enterprise Architecture Program), Mark Nelson (lead architect for Oracle’s public Cloud among other responsibilities), Ajay Srivastava (VP of OnDemand platform), and me.

I think the conversation will give a good idea of the many ways in which we think about Cloud at Oracle. Our customers both provide and consume Cloud services. Oracle itself provides both private Cloud systems and a global public Cloud. All these are connected, both in terms of usage patterns (hybrid) and architecture/implementation (via technology sharing between our products and our Cloud services, such as Enterprise Manager’s central role in running the Oracle Cloud).

That makes for many topics and a lively conversation when the four of us get together.

Comments Off on Podcast with Oracle Cloud experts

Filed under Cloud Computing, Everything, Oracle, Oracle Cloud, People, Podcast, Portability, Standards, Utility computing

REST + RDF, finally a practical solution?

The W3C has recently approved the creation of the Linked Data Platform (LDP) Working Group. The charter contains its official marching orders. Its co-chair Erik Wilde shared his thoughts on the endeavor.

This is good. Back in 2009, I concluded a series of three blog posts on “REST in practice for IT and Cloud management” with:

I hereby conclude my “REST in practice for IT and Cloud management” series, with the intent to eventually start a “Linked Data in practice for IT and Cloud management” series.

I never wrote that later part, because my work took me away from that pursuit and there wasn’t much point writing down ideas which I hadn’t  put to the test. But if this W3C working group is successful, they will give us just that.

That’s a big “if” though. Religious debates and paralyzing disconnects between theorists and practitioners are all-too-common in tech, but REST and Semantic Web (of which RDF is the foundation) are especially vulnerable. Bringing these two together and trying to tame both sets of daemons at the same time is a daring proposition.

On the other hand, there is already a fair amount of relevant real-life experience (e.g. – read Jeni Tennison on the choice of Linked Data). Plus, Erik is a great pick to lead this effort (I haven’t met his co-chair, IBM’s Arnaud Le Hors). And maybe REST and RDF have reached the mythical point where even the trolls are tired and practicality can prevail. One can always dream.

Here are a few different ways to think about this work:

The “REST doesn’t provide interoperability” perspective

RESTful API authors too often think they can make the economy of a metamodel. Or that a format (like XML or JSON) can be used as a metamodel. Or they punt the problem towards defining a multitude of MIME types. This will never buy you interoperability. Stu explained it before. Most problems I see addressed via RESTful APIs, in the IT/Cloud management realm, are modeling problems first and only secondarily protocol/interaction problems. And their failures are failures of modeling. LDP should bring modeling discipline to REST-land.

The “RDF was too much, too soon” perspective

The RDF stack is mired in complexity. By the time people outside of academia had formed a set of modeling requirements that cried for RDF, the Semantic Web community was already deep in the weeds and had overloaded its basic metamodel with enough classification and inference technology to bury its core value as a simple graph-oriented and web-friendly metamodel. What XSD-fever was to making XML seem overly complex, OWL-fever was to RDF. Tenfold.

Everything that the LDP working group is trying to achieve can be achieved today with existing Semantic Web technologies. Technically speaking, no new work is needed. But only a handful of people understand these technologies enough to know what to use and what to ignore, and as such this application doesn’t have a chance to materialize. Which is why the LDP group is needed. But there’s a reason why its starting point document is called a “profile”. No new technology is needed. Only clarity and agreement.

For the record, I like OWL. It may be the technology that most influenced the way I think about modeling. But the predominance of RDFS and OWL (along with ugly serializations) in Semantic Web discussions kept RDF safely out of sight of those in industry who could have used it. Who knows what would have happened if a graph query language (SPARQL) had been prioritized ahead of inference technology (OWL)?

The Cloud API perspective

The scope of the LDP group is much larger than Cloud APIs, but my interest in it is mostly grounded in Cloud API use cases. And I see no reason why the requirements of Cloud APIs would not be 100% relevant to this effort.

What does this mean for the Cloud API debate? Nothing in the short term, but if this group succeeds, the result will probably be the best technical foundation for large parts of the Cloud management landscape. Which doesn’t mean it will be adopted, of course. The LDP timeline calls for completion in 2014. Who knows what the actual end date will be and what the Cloud API situation will be at that point. AWS APIs might be entrenched de-facto standards, or people may be accustomed to using several APIs (via libraries that abstract them away). Or maybe the industry will be clamoring for reunification and LDP will arrive just on time to underpin it. Though the track record is not good for such “reunifications”.

The “ghost of WS-*” perspective

Look at the 16 “technical issues” in the LCD working group charter. I can map each one to the relevant WS-* specification. E.g. see this as it relates to #8. As I’ve argued many times on this blog, the problems that WSMF/WSDM/WS-Mgmt/WS-RA and friends addressed didn’t go away with the demise of these specifications. Here is yet another attempt to tackle them.

The standards politics perspective

Another “fun” part of WS-*, beyond the joy of wrangling with XSD and dealing with multiple versions of foundational specifications, was the politics. Which mostly articulated around IBM and Microsoft. Well, guess what the primary competition to LDP is? OData, from Microsoft. I don’t know what the dynamics will be this time around, Microsoft and IBM alone don’t command nearly as much influence over the Cloud infrastructure landscape as they did over the XML middleware standardization effort.

And that’s just the corporate politics. The politics between standards organizations (and those who make their living in them) can be just as hairy; you can expect that DMTF will fight W3C, and any other organization which steps up, for control of the “Cloud management” stack. Not to mention the usual coo-petition between de facto and de jure organizations.

The “I told you so” perspective

When CMDBf started, I lobbied hard to base it on RDF. I explained that you could use it as just a graph-based metamodel, that you could  ignore the ontology and inference part of the stack. Which is pretty much what LDP is doing today. But I failed to convince the group, so we created our own metamodel (at least we explicitly defined one) and our own graph query language and that became CMDBf v1. Of course it was also SOAP-based.

KISS and markup

In closing, I’ll just make a plea for practicality to drive this effort. It’s OK to break REST orthodoxy. And not everything needs to be in RDF either. An overarching graph model is needed, but the detailed description of the nodes can very well remain in JSON, XML, or whatever format does the job for that node type.

All the best to LDP.


Filed under API, Cloud Computing, CMDB Federation, CMDBf, DMTF, Everything, Graph query, IBM, Linked Data, Microsoft, Modeling, Protocols, Query, RDF, REST, Semantic tech, SPARQL, Specs, Standards, Utility computing, W3C

GAE Traffic Splitting

Interesting addition to the Google App Engine (GAE) platform in release 1.6.3:  Traffic Splitting lets you run several versions of your application (using a DNS sub-domain for each version) and choose to direct a certain percentage of requests to a specific version. This lets you, among other things, slowly phase in your updates and test the result on a small set of users.

That’s nice, but until I read the documentation for the feature I had assumed (and hoped) it was something else.

Rather than using traffic splitting to test different versions of my app (something which the platform now makes convenient but which I could have implemented on my own), it would be nice if that mechanism could be used to test updates to the GAE platform itself. As described in “Come for the PaaS Functional Model, stay for the Cloud Operational Model“, it’s wishful thinking to assume that changes to the PaaS platform (an update applied by Google admins) cannot have a negative effect on your application. In other words, “When NoOps meets Murphy’s Law, my money is on Murphy“.

What would be nice is if Google could give application owners advanced warning of a platform change and let them use the Traffic Splitting feature to direct a portion of the incoming requests to application instances running on the new platform. And also a way to include the platform version in all log messages.

Here’s the issue as I described it in the aforementioned “Cloud Operational Model” post:

In other words, if a patch or update is worth testing in a staging environment if you were to apply it on-premise, what makes you think that it’s less likely to cause a problem if it’s the Cloud provider who rolls it out? Sure, in most cases it will work just fine and you can sing the praise of “NoOps”. Until the day when things go wrong, your users are affected and you’re taken completely off-guard. Good luck debugging that problem, when you don’t even know that an infrastructure change is being rolled out and when it might not even have been rolled out uniformly across all instances of your application.

How is that handled in your provider’s Operational Model? Do you have visibility into the change schedule? Do you have the option to test your application on the new infrastructure or to at least influence in any way how and when the change gets rolled out to your instances?

Hopefully, the addition of Traffic Splitting to Google App Engine is a step towards addressing that issue.


Filed under Application Mgmt, Automation, Cloud Computing, DevOps, Everything, Google, Google App Engine, Utility computing

Come for the PaaS Functional Model, stay for the Cloud Operational Model

The Functional Model of PaaS is nice, but the Operational Model matters more.

Let’s first define these terms.

The Functional Model is what the platform does for you. For example, in the case of AWS S3, it means storing objects and making them accessible via HTTP.

The Operational Model is how you consume the platform service. How you request it, how you manage it, how much it costs, basically the total sum of the responsibility you have to accept if you use the features in the Functional Model. In the case of S3, the Operational Model is made of an API/UI to manage it, a bill that comes every month, and a support channel which depends on the contract you bought.

The Operational Model is where the S (“service”) in “PaaS” takes over from the P (“platform”). The Operational Model is not always as glamorous as new runtime features. But it’s what makes Cloud Cloud. If a provider doesn’t offer the specific platform feature your application developers desire, you can work around it. Either by using a slightly-less optimal approach or by building the feature yourself on top of lower-level building blocks (as Netflix did with Cassandra on EC2 before DynamoDB was an option). But if your provider doesn’t offer an Operational Model that supports your processes and business requirements, then you’re getting a hipster’s app server, not a real PaaS. It doesn’t matter how easy it was to put together a proof-of-concept on top of that PaaS if using it in production is playing Russian roulette with your business.

If the Cloud Operational Model is so important, what defines it and what makes a good Operational Model? In short, the Operational Model must be able to integrate with the consumer’s key processes: the business processes, the development processes, the IT processes, the customer support processes, the compliance processes, etc.

To make things more concrete, here are some of the key aspects of the Operational Model.

Deployment / configuration / management

I won’t spend much time on this one, as it’s the most understood aspect. Most Clouds offer both a UI and an API to let you provision and control the artifacts (e.g. VMs, application containers, etc) via which you access the PaaS functional interface. But, while necessary, this API is only a piece of a complete operational interface.


What happens when things go wrong? What support channels do you have access to? Every Cloud provider will show you a list of support options, but what’s really behind these options? And do they have the capability (technical and logistical) to handle all your issues? Do they have deep expertise in all the software components that make up their infrastructure (especially in PaaS) from top to bottom? Do they run their own datacenter or do they themselves rely on a customer support channel for any issue at that level?


I personally think discussions around SLAs are overblown (it seems like people try to reduce the entire Cloud Operational Model to a provisioning API plus an SLA, which is comically simplistic). But SLAs are indeed part of the Operational Model.

Infrastructure change management

It’s very nice how, in a PaaS setting, the Cloud provider takes care of all change management tasks (including patching) for the infrastructure. But the fact that your Cloud provider and you agree on this doesn’t neutralize Murphy’s law any more than me wearing Michael Jordan sneakers neutralizes the law of gravity when I (try to) dunk.

In other words, if a patch or update is worth testing in a staging environment if you were to apply it on-premise, what makes you think that it’s less likely to cause a problem if it’s the Cloud provider who rolls it out? Sure, in most cases it will work just fine and you can sing the praise of “NoOps”. Until the day when things go wrong, your users are affected and you’re taken completely off-guard. Good luck debugging that problem, when you don’t even know that an infrastructure change is being rolled out and when it might not even have been rolled out uniformly across all instances of your application.

How is that handled in your provider’s Operational Model? Do you have visibility into the change schedule? Do you have the option to test your application on the new infrastructure or to at least influence in any way how and when the change gets rolled out to your instances?

Note: I’ve covered this in more details before and so has Chris Hoff.


Developers have assembled a panoply of diagnostic tools (memory/thread analysis, BTM, user experience, logging, tracing…) for the on-premise model. Many of these won’t work in PaaS settings because they require a console on the local machine, or an agent, or a specific port open, or a specific feature enabled in the runtime. But the need doesn’t go away. How does your PaaS Operational Model support that process?

Customer support

You’re a customer of your Cloud, but you have customers of your own and you have to support them. Do you have the tools to react to their issues involving your Cloud-deployed application? Can you link their service requests with the related actions and data exposed via your Cloud’s operational interface?

Security / compliance

Security is part of what a Cloud provider has to worry about. The problem is, it’s a very relative concept. The issue is not what security the Cloud provider needs, it’s what security its customers need. They have requirements. They have mandates. They have regulations and audits. In short, they have their own security processes. The key question, from their perspective, is not whether the provider’s security is “good”, but whether it accommodates their own security process. Which is why security is not a “trust us” black box (I don’t think anyone has coined “NoSec” yet, but it can’t be far behind “NoOps”) but an integral part of the Cloud Operational Model.

Business management

The oft-repeated mantra is that Cloud replaces capital expenses (CapExp) with operational expenses (OpEx). There’s a lot more to it than that, but it surely contributes a lot to OpEx and that needs to be managed. How does the Cloud Operational Model support this? Are buyer-side roles clearly identified (who can create an account, who can deploy a service instance, who can manage a deployed instance, etc) and do they map well to the organizational structure of the consumer organization? Can charges be segmented and attributed to various cost centers? Can quotas be set? Can consumption/cost projections be run?

We all (at least those of us who aren’t accountants) love a great story about how some employee used a credit card to get from the Cloud something that the normal corporate process would not allow (or at too high a cost). These are fun for a while, but it’s not sustainable. This doesn’t mean organizations will not be able to take advantage of the flexibility of Cloud, but they will only be able to do it if the Cloud Operational Model provides the needed support to meet the requirements of internal control processes.


Some of the ways in which the Cloud Operational Model materializes can be unexpected. They can seem old-fashioned. Let’s take Amazon Web Services (AWS) as an example. When they started, ownership of AWS resources was tied to an individual user’s Amazon account. That’s a big Operational Model no-no. They’ve moved past that point. As an illustration of how the Operational Model materializes, here are some of the features that are part of Amazon’s:

  • You can Fedex a drive and have Amazon load the data to S3.
  • You can optimize your costs for flexible workloads via spot instances.
  • The monitoring console (and API) will let you know ahead of time (when possible) which instances need to be rebooted and which will need to be terminated because they run on a soon-to-be-decommissioned server. Now you could argue that it’s a limitation of the AWS platform (lack of live migration) but that’s not the point here. Limitations exists and the role of the Operational Model is to provide the tools to handle them in an acceptable way.
  • Amazon has a program to put customers in touch with qualified System Integrators.
  • You can use your Amazon support channel for questions related to some 3rd party software (though I don’t know what the depth of that support is).
  • To support your security and compliance requirements, AWS support multi-factor authentication and has achieved some certifications and accreditations.
  • Instance status checks can help streamline your diagnostic flows.

These Operational Model features don’t generate nearly as much discussion as new Functional Model features (“oh, look, a NoSQL AWS service!”) . That’s OK. The Operational Model doesn’t seek the limelight.

Business applications are involved, in some form, in almost every activity taking place in a company. Those activities take many different forms, from a developer debugging an application to an executive examining operational expenses. The PaaS Operational Model must meet their needs.


Filed under Amazon, API, Application Mgmt, Automation, Business, Business Process, Cloud Computing, Everything, Manageability, Mgmt integration, PaaS, Utility computing

Everything is PaaSible

That’s the title of an article I wrote for InfoQ and which went live today.

If you can get past the punny title you’ll read about the following points:

  • In traditional (and IaaS) environments, many available application infrastructure features remain rarely used because of the cost (perceived or real) or adding them to the operational environment.
  • Most PaaS environments of today don’t even let you make use of these features, at any cost, because of  constraints imposed by PaaS providers for the sake of simplifying and streamlining their operations.
  • In the future, PaaS will not only make these available but available at a negligible incremental operational cost.
  • Even beyond that, PaaS will make available application services that are, in traditional settings, completely out of scope for the application programmer. Early examples include CDN, DNS and loab balancing services offered, for example, by Amazon. An application developer in most traditional data centers would have to jump through endless hoops if she wanted to control these services within the application. I believe that these network-related services are just the low hanging fruits and many more once-unthinkable infrastructure services will become programmable as part of the application.

PaaS will become less about “hosting” and more about offering application services. In other words, going back to the formula I proposed on Twitter:

Cloud = Hosting + SOA

IaaS is a lot more “hosting” than SOA, PaaS is a lot more “SOA” (application infrastructure services available via APIs) than “hosting”.

You can read the full article for more.

Comments Off on Everything is PaaSible

Filed under API, Application Mgmt, Articles, Cloud Computing, Everything, Manageability, Mgmt integration, Middleware, PaaS, Utility computing

Introducing Enterprise Manager Cloud Control 12c

Oracle Enterprise Manager Cloud Control 12c, the new version of Oracle’s IT management product came out a few weeks ago, during Open World (video highlights of the launch). That release was known internally for a while as “NG” for “next generation” because the updates it contains are far more numerous and profound than your average release. The design for “NG” (now “12c”) started long before Enterprise Manager Grid Control 11g, the previous major release, shipped. The underlying framework has been drastically improved, from the modeling capabilities, to extensibility, scalability, incident management and, most visibly, UI.

If you’re not an existing EM user then those framework upgrades won’t be as visible to you as the feature upgrades and additions. And there are plenty of those as well, from database management to application management and configuration management. The most visible addition is the all-new self-service portal through which an EM-driven private Cloud can be made available. This supports IaaS-level services (individual VMs or assemblies composed of multiple coordinated VMs) and DBaaS services (we’ve also announced and demonstrated upcoming PaaS services). And it’s not just about delivering these services via lifecycle automation, a lot of work has also gone into supporting the business and organizational aspects of delivering services in a private Cloud: quotas, chargeback, cost centers, maintenance plans, etc…

EM Cloud Control is the first Oracle product with the “12c” suffix. You probably guessed it, the “c” stands for “Cloud”. If you consider the central role that IT management software plays in Cloud Computing I think it’s appropriate for EM to lead the way. And there’s a lot more “c” on the way.

Many (short and focused) demo videos are available. For more information, see the product marketing page, the more technical overview of capabilities or the even more technical product documentation. Or you can just download the product (or, for production systems, get it on eDelivery).

If you missed the launch at Open World, EM12c introduction events are taking place all over the world in November and December. They start today, November 3rd, in Athens, Riga and Beijing.

We’re eager to hear back from users about this release. I’ve run into many users blogging about installing EM12c and I’ll keep eye out for their reports after using it for a bit.

Comments Off on Introducing Enterprise Manager Cloud Control 12c

Filed under Application Mgmt, Cloud Computing, Everything, IT Systems Mgmt, Oracle, Utility computing

DMTF publishes draft of Cloud API

Note to anyone who still cares about IaaS standards: the DMTF has published a work in progress.

There was a lot of interest in the topic in 2009 and 2010. Some heated debates took place during Cloud conferences and a few symposiums were organized to try to coordinate various standard efforts. The DMTF started an “incubator” on the topic. Many companies brought submissions to the table, in various levels of maturity: VMware, Fujitsu, HP, Telefonica, Oracle and RedHat. IBM and Microsoft might also have submitted something, I can’t remember for sure.

The DMTF has been chugging along. The incubator turned into a working group. Unfortunately (but unsurprisingly), it limited itself to the usual suspects (and not all the independent Cloud experts out there) and kept the process confidential. But this week it partially lifted the curtain by publishing two work-in-progress documents.

They can be found at but if you read this after March 2012 they won’t be there anymore, as DMTF likes to “expire” its work-in-progress documents. The two docs are:

The first one is the interesting one, and the one you should read if you want to see where the DMTF is going. It’s a RESTful specification (at the cost of some contortions, e.g. section It supports both JSON and XML (bad idea). It plans to use RelaxNG instead of XSD (good idea). And also CIM/MOF (not a joke, see the second document for proof). The specification is pretty ambitious (it covers not just lifecycle operations but also monitoring and events) and well written, especially for a work in progress (props to Gil Pilz).

I am surprised by how little reaction there has been to this publication considering how hotly debated the topic used to be. Why is that?

A cynic would attribute this to people having given up on DMTF providing a Cloud API that has any chance of wide adoption (the adjoining CIM document sure won’t help reassure DMTF skeptics).

To the contrary, an optimist will see this low-key publication as a sign that the passions have cooled, that the trusted providers of enterprise software are sitting at the same table and forging consensus, and that the industry is happy to defer to them.

More likely, I think people have, by now, enough Cloud experience to understand that standardizing IaaS APIs is a minor part of the problem of interoperability (not to mention the even harder goal of portability). The serialization and plumbing aspects don’t matter much, and if they do to you then there are some good libraries that provide mappings for your favorite language. What matters is the diversity of resources and services exposed by Cloud providers. Those choices strongly shape the design of your application, much more than the choice between JSON and XML for the control API. And nobody is, at the moment, in position to standardize these services.

So congrats to the DMTF Cloud Working Group for the milestone, and please get the API finalized. Hopefully it will at least achieve the goal of narrowing down the plumbing choices to three (AWS, OpenStack and DMTF). But that’s not going to solve the hard problem.


Filed under API, Application Mgmt, Automation, Cloud Computing, DMTF, Everything, IaaS, IT Systems Mgmt, Manageability, Mgmt integration, Modeling, Portability, Protocols, REST, Specs, Standards, Tech, Utility computing, Virtual appliance, Virtualization

Perspectives on acquisition

Interesting analysis (by Gartner’s Lydia Leong) on the acquisition of by Citrix (apparently for 100x revenues) and its position as a cheaper alternative for vCloud (at least until OpenStack Nova becomes stable).

Great read, even though that part:

“[Zygna] uses to provide Amazon-compatible (and thus Rightscale-compatible) infrastructure internally, letting it easily move workloads across their own infrastructure and Amazon’s.”

is a bit of a simplification.

While I’m at it, here’s another take on, this time from an OSS license perspective. Namely, the difference between building your business on GPL (like Eucalyptus) or Apache 2 (like the more community-driven open source projects such as OpenStack).

Towards the end, there’s also a nice nod to the Oracle Cloud API:

“DMTF has been receiving other submissions for an API standard. Oracle has made its submission public.  It is based on an earlier Sun proposal, and it is the best API we have yet seen. Furthermore, Oracle has identified a core subset to allow initial early adoption, as well as areas where vendors (including themselves and crucially VMware) may continue to extend to allow differentiation.”

Here’s more on the Oracle Cloud API, including an explanation of the “core/extension” split mentioned above.


Comments Off on Perspectives on acquisition

Filed under Cloud Computing, DMTF, Everything, Governance, Mgmt integration, Open source, OpenStack, Oracle, Specs, Standards, Utility computing, Virtualization, VMware

Reading IBM’s proposed standard for Cloud Architecture

Did you enjoy the first version of IBM’s Cloud Computing Reference Architecture? Did you even get certified on it? Then rejoice, because there’s a new version. IBM  recently submitted the IBM Cloud Computing Reference Architecture 2.0 to The Open Group.

I’m a bit out of practice reading this kind of IBMese (let’s just say that The Open Group was the right place to submit it) but I would never let my readers down. So, even though these box-within-a-box-within-a-box diagrams (see section 2) give me flashbacks to the days of OGF and WSRF, I soldiered on.

I didn’t understand the goal of the document enough to give you a fair summary, but I can share some thoughts.

It starts by talking a lot about SOA. I initially thought this was to make the point that Glen Daniels articulated very well in this tweet:

Yup, correct SOA patterns (loose coupling, dyn refs, coarse interfaces…) are exactly what you need for cloud apps. You knew this.

But no. Rather than Glen’s astute remark, IBM’s point is one meta-level lower. It’s that “Cloud solutions are SOA solutions”. Which I have a harder time parsing. If you though “service” was overloaded before…

While some of the IBM authors are SOA experts, others apparently come from a Telco background so we get OSS/BSS analogies next.

By that point, I’ve learned that Cloud is like SOA except when it’s like Telco (but there’s probably another reference architecture somewhere that explains that Telco is SOA, so it all adds up).

One thing that chagrined me was that even though this document is very high-level it still manages to go down into implementatin technologies long enough to assert, wrongly, that virtualization is required for Cloud solutions. Another Cloud canard repeated here is the IaaS/PaaS/SaaS segmentation of the Cloud world, to which IBM adds a BPaaS (Business Process as a Service) layer for good measure (for my take on how Cloud relates to SOA, and how I dislike the IaaS/PaaS/SaaS pyramid, see this write-up of the presentation I gave at last year’s Cloud Connect, especially the 3rd picture).

It gets a lot better if you persevere to page 29, where the “Architecture Principles” finally get introduced (if had been asked to edit the paper, I would have only kept the last 6 pages). They are:

  1. Design for Cloud-scale Efficiencies: When realizing cloud characteristics such as elasticity, self-service access, and flexible sourcing, the cloud design is strictly oriented to high cloud scale efficiencies and short time-to-delivery/time-to-change. (“Efficiency Principle”)
  2. Support Lean Service Management: The Common Cloud Management Platform fosters lean and lightweight service management policies, processes, and technologies. (“Lightweightness Principle”)
  3. Identify and Leverage Commonalities: All commonalities are identified and leveraged in cloud service design. (“Economies-of-scale principle”)
  4. Define and Manage generically along the Lifecycle of Cloud Services: Be generic across I/P/S/BPaaS & provide ‘exploitation’ mechanism to support various cloud services using a shared, common management platform (“Genericity”).

Each principle gets a nickname, thanks to which IBM can refer to this list as the ELEG principles (Efficiency, Lightweightness, Economies-of-scale, Genericity). It also spells GLEE, but apparently that’s wasn’t the prefered sequence.

The first principle is hard to disagree with. The second also rings true, including its dings on ITIL (but the irony of IBM exhaulting “Lightweightness” is hard to ignore). The third and fourth principles (by that time I had lost too many brain cells to understand how they differ) really scared me. While I can understand the motivation, they elicited a vision of zombies in blue suits (presumably undead IBM Distinguish Engineers and Fellows) staggering towards me: “frameworks… we want frameworks…”.

There you go. If you want more information (and, more importantly, unbiased information) go read the Reference Architecture yourself. I am not involved in The Open Group, and I have no idea what it plans to do with it (and if it has received other submissions of the same type). Though I wouldn’t be surprised if I see, in 5 years, some panic sales rep asking an internal mailing list “The customer RPF asks for a mapping of our solution to the Open Group Cloud Reference Architecture and apparently IBM has 94 slides about it, what do I do? Has anyone heard about this Reference Architecture? This is urgent.”

Urgent things are long in the making.

1 Comment

Filed under Application Mgmt, Automation, Big picture, BPM, BSM, Business Process, Cloud Computing, Everything, Governance, IBM, IT Systems Mgmt, ITIL, Mgmt integration, Utility computing

CloudFormation in context

I’ve been very positive about AWS CloudFormation (both in tweet and blog form) since its announcement . I want to clarify that it’s not the technology that excites me. There’s nothing earth-shattering in it. CloudFormation only covers deployment and doesn’t help you with configuration, monitoring, diagnostic and ongoing lifecycle. It’s been done before (including probably a half-dozen times within IBM alone, I would guess). We’ve had much more powerful and flexible frameworks for a long time (I can’t even remember when SmartFrog first came out). And we’ve had frameworks with better tools (though history suggests that tools for CloudFormation are already in the works, not necessarily inside Amazon).

Here are some non-technical reasons why I tweeted that “I have a feeling that the AWS CloudFormation format might become an even more fundamental de-facto standard than the EC2 API” even before trying it out.

It’s simple to use. There are two main reasons for this (and the fact that it uses JSON rather than XML is not one of them):
– It only support a small set of features
– It “hard-codes” resource types (e.g. EC2, Beanstalk, RDS…) rather than focusing on an abstract and extensible mechanism

It combines a format and an API. You’d think it’s obvious that the two are complementary. What can you do with a format if you don’t have an API to exchange documents in that format? Well, turns out there are lots of free-floating model formats out there for which there is no defined API. And they are still wondering why they never saw any adoption.

It merges IaaS and PaaS. AWS has always defied the “IaaS vs. PaaS” view of the Cloud. By bridging both, CloudFormation is a great way to provide a smooth transition. I expect most of the early templates to be very EC2-centric (are as most AWS deployments) and over time to move to a pattern in which EC2 resources are just used for what doesn’t fit in more specialized containers).

It comes at the right time. It picks the low-hanging fruits of the AWS automation ecosystem. The evangelism and proof of concept for templatized deployments have already taken place.

It provides a natural grouping of the various AWS resources you are currently consuming. They are now part of an explicit deployment context.

It’s free (the resources provisioned are not free, of course, but the fact that they came out of a CloudFormation deployment doesn’t change the cost).

Comments Off on CloudFormation in context

Filed under Amazon, Application Mgmt, Automation, Cloud Computing, Everything, Mgmt integration, Modeling, PaaS, Specs, Utility computing

AWS CloudFormation is the iPhone of Cloud services

Expanding on tweet that I wrote soon after the announcement of AWS CloudFormation.

The iPhone unifies the GPS, phone, PDA, camera and camcorder. CloudFormation does the same for infrastructure services (VMs, volumes, network…) and some platform services (Beanstalk, RDS, SimpleDB, SQS, SNS…). You don’t think about whether you should grab a phone or a PDA, you grab an iPhone and start using the feature you need. It’s the default tool. Similarly with CloudFormation, you won’t start by thinking about what AWS service you want to use. Rather, you grab a CloudFormation template and modify it as needed. The template (or the template editor) is the default tool.

The iPhone doesn’t just group features that used to be provided by many devices. It also allows these features to collaborate. It’s not that you get a PDA and a phone side-by-side in one device. You can press the “call” button from the “PDA” feature. CloudFormation doesn’t just bundle deployments to various AWS services, it wires them together.

Anyone can write apps for the iPhone. Anyone can write apps that use CloudFormation.

There’s an App Store for iPhone apps. On the CloudFormation side, it will probably come soon (right now Amazon has made templates available on S3, but that’s not a real store). Amazon has developed example templates for a set of common applications, but expect application authors to take ownership of that task soon. They’ll consider it one of their deliverables. Right next to the “download” button you’ll start seeing a “deploy to AWS” button. Guess which one will eventually be used the most?

It’s Apple’s platform and your applications have to comply with their policy. AWS is not as much of a control freak as Apple and doesn’t have an upfront approval process, but it has its terms of service and they too can get you kicked out.

The iPhone is not a standard platform (though you may consider it a de-facto standard). Same for AWS CloudFormation.

There are alternatives to the iPhone that define themselves primarily as being more open than it, mainly Android. Same for AWS with OpenStack (which probably will soon have its CloudFormation equivalent).

The iPhone infiltrated itself into corporations at the ground level, even if the CIO initially saw no reason to look beyond BlackBerry for corporate needs. Same with AWS.

Any other parallel? Any fundamental difference I missed?

1 Comment

Filed under Amazon, Application Mgmt, Automation, Cloud Computing, Everything, Mgmt integration, Modeling, OpenStack, PaaS, Portability, Specs, Tech, Utility computing, Virtual appliance