Archive for the ‘Cloud Computing’ Category

Post

Come for the PaaS Functional Model, stay for the Cloud Operational Model

In Amazon,API,Application Mgmt,Automation,Business,Business Process,Cloud Computing,Everything,Manageability,Mgmt integration,PaaS,Utility computing on February 2, 2012 by @vambenepe

The Functional Model of PaaS is nice, but the Operational Model matters more.

Let’s first define these terms.

The Functional Model is what the platform does for you. For example, in the case of AWS S3, it means storing objects and making them accessible via HTTP.

The Operational Model is how you consume the platform service. How you request it, how you manage it, how much it costs, basically the total sum of the responsibility you have to accept if you use the features in the Functional Model. In the case of S3, the Operational Model is made of an API/UI to manage it, a bill that comes every month, and a support channel which depends on the contract you bought.

The Operational Model is where the S (“service”) in “PaaS” takes over from the P (“platform”). The Operational Model is not always as glamorous as new runtime features. But it’s what makes Cloud Cloud. If a provider doesn’t offer the specific platform feature your application developers desire, you can work around it. Either by using a slightly-less optimal approach or by building the feature yourself on top of lower-level building blocks (as Netflix did with Cassandra on EC2 before DynamoDB was an option). But if your provider doesn’t offer an Operational Model that supports your processes and business requirements, then you’re getting a hipster’s app server, not a real PaaS. It doesn’t matter how easy it was to put together a proof-of-concept on top of that PaaS if using it in production is playing Russian roulette with your business.

If the Cloud Operational Model is so important, what defines it and what makes a good Operational Model? In short, the Operational Model must be able to integrate with the consumer’s key processes: the business processes, the development processes, the IT processes, the customer support processes, the compliance processes, etc.

To make things more concrete, here are some of the key aspects of the Operational Model.

Deployment / configuration / management

I won’t spend much time on this one, as it’s the most understood aspect. Most Clouds offer both a UI and an API to let you provision and control the artifacts (e.g. VMs, application containers, etc) via which you access the PaaS functional interface. But, while necessary, this API is only a piece of a complete operational interface.

Support

What happens when things go wrong? What support channels do you have access to? Every Cloud provider will show you a list of support options, but what’s really behind these options? And do they have the capability (technical and logistical) to handle all your issues? Do they have deep expertise in all the software components that make up their infrastructure (especially in PaaS) from top to bottom? Do they run their own datacenter or do they themselves rely on a customer support channel for any issue at that level?

SLAs

I personally think discussions around SLAs are overblown (it seems like people try to reduce the entire Cloud Operational Model to a provisioning API plus an SLA, which is comically simplistic). But SLAs are indeed part of the Operational Model.

Infrastructure change management

It’s very nice how, in a PaaS setting, the Cloud provider takes care of all change management tasks (including patching) for the infrastructure. But the fact that your Cloud provider and you agree on this doesn’t neutralize Murphy’s law any more than me wearing Michael Jordan sneakers neutralizes the law of gravity when I (try to) dunk.

In other words, if a patch or update is worth testing in a staging environment if you were to apply it on-premise, what makes you think that it’s less likely to cause a problem if it’s the Cloud provider who rolls it out? Sure, in most cases it will work just fine and you can sing the praise of “NoOps”. Until the day when things go wrong, your users are affected and you’re taken completely off-guard. Good luck debugging that problem, when you don’t even know that an infrastructure change is being rolled out and when it might not even have been rolled out uniformly across all instances of your application.

How is that handled in your provider’s Operational Model? Do you have visibility into the change schedule? Do you have the option to test your application on the new infrastructure or to at least influence in any way how and when the change gets rolled out to your instances?

Note: I’ve covered this in more detail before and so has Chris Hoff.

Diagnostic

Developers have assembled a panoply of diagnostic tools (memory/thread analysis, BTM, user experience, logging, tracing…) for the on-premise model. Many of these won’t work in PaaS settings because they require a console on the local machine, or an agent, or a specific port open, or a specific feature enabled in the runtime. But the need doesn’t go away. How does your PaaS Operational Model support that process?

Customer support

You’re a customer of your Cloud, but you have customers of your own and you have to support them. Do you have the tools to react to their issues involving your Cloud-deployed application? Can you link their service requests with the related actions and data exposed via your Cloud’s operational interface?

Security / compliance

Security is part of what a Cloud provider has to worry about. The problem is, it’s a very relative concept. The issue is not what security the Cloud provider needs, it’s what security its customers need. They have requirements. They have mandates. They have regulations and audits. In short, they have their own security processes. The key question, from their perspective, is not whether the provider’s security is “good”, but whether it accommodates their own security process. Which is why security is not a “trust us” black box (I don’t think anyone has coined “NoSec” yet, but it can’t be far behind “NoOps”) but an integral part of the Cloud Operational Model.

Business management

The oft-repeated mantra is that Cloud replaces capital expenses (CapEx) with operational expenses (OpEx). There’s a lot more to it than that, but it surely contributes a lot to OpEx and that needs to be managed. How does the Cloud Operational Model support this? Are buyer-side roles clearly identified (who can create an account, who can deploy a service instance, who can manage a deployed instance, etc) and do they map well to the organizational structure of the consumer organization? Can charges be segmented and attributed to various cost centers? Can quotas be set? Can consumption/cost projections be run?
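To make those last three questions concrete, here is a minimal sketch of what the consumer side would want to compute from billing data. Everything in it (the `UsageRecord` shape, the cost center names, the dollar amounts, the linear projection) is an assumption made up for illustration, not any provider's billing format or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One billing line item (hypothetical shape, not a provider format)."""
    cost_center: str
    service: str
    cost_usd: float


def charges_by_cost_center(records):
    """Segment charges: total spend attributed to each cost center."""
    totals = defaultdict(float)
    for record in records:
        totals[record.cost_center] += record.cost_usd
    return dict(totals)


def over_quota(totals, quotas):
    """Return the cost centers whose spend exceeds their quota.

    Cost centers without a quota entry are treated as unlimited.
    """
    return sorted(
        center for center, spent in totals.items()
        if center in quotas and spent > quotas[center]
    )


def project_month_end(spent_so_far, day_of_month, days_in_month):
    """Naive linear projection of month-end cost from spend to date."""
    if not 1 <= day_of_month <= days_in_month:
        raise ValueError("day_of_month must be within the month")
    return spent_so_far / day_of_month * days_in_month


# Illustrative data only.
records = [
    UsageRecord("marketing", "storage", 120.0),
    UsageRecord("marketing", "compute", 300.0),
    UsageRecord("engineering", "compute", 900.0),
]
totals = charges_by_cost_center(records)
flagged = over_quota(totals, {"marketing": 400.0, "engineering": 1000.0})
```

None of this is hard to write. The point is that it only works if the Operational Model exposes usage data at a granularity that maps to your organization.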

We all (at least those of us who aren’t accountants) love a great story about how some employee used a credit card to get from the Cloud something that the normal corporate process would not allow (or at too high a cost). These are fun for a while, but it’s not sustainable. This doesn’t mean organizations will not be able to take advantage of the flexibility of Cloud, but they will only be able to do it if the Cloud Operational Model provides the needed support to meet the requirements of internal control processes.

Conclusion

Some of the ways in which the Cloud Operational Model materializes can be unexpected. They can seem old-fashioned. Let’s take Amazon Web Services (AWS) as an example. When they started, ownership of AWS resources was tied to an individual user’s Amazon account. That’s a big Operational Model no-no. They’ve moved past that point. As an illustration of how the Operational Model materializes, here are some of the features that are part of Amazon’s:

  • You can Fedex a drive and have Amazon load the data to S3.
  • You can optimize your costs for flexible workloads via spot instances.
  • The monitoring console (and API) will let you know ahead of time (when possible) which instances need to be rebooted and which will need to be terminated because they run on a soon-to-be-decommissioned server. Now you could argue that it’s a limitation of the AWS platform (lack of live migration) but that’s not the point here. Limitations exist and the role of the Operational Model is to provide the tools to handle them in an acceptable way.
  • Amazon has a program to put customers in touch with qualified System Integrators.
  • You can use your Amazon support channel for questions related to some 3rd party software (though I don’t know what the depth of that support is).
  • To support your security and compliance requirements, AWS supports multi-factor authentication and has achieved some certifications and accreditations.
  • Instance status checks can help streamline your diagnostic flows.
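As an illustration of the reboot/termination notices in the list above, here is a minimal sketch of how a consumer might turn such advance notices into planned actions. The `ScheduledEvent` shape, the `"reboot"` / `"retirement"` labels, the instance IDs and the two-day lead time are all assumptions for this example; this is not the AWS API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ScheduledEvent:
    """A provider-announced maintenance event (hypothetical shape)."""
    instance_id: str
    kind: str             # "reboot" or "retirement" (labels assumed here)
    not_before: datetime  # earliest time the provider may act


def plan_action(event, now, lead_time=timedelta(days=2)):
    """Pick a response to one announced event.

    - retirement: the instance will go away, so replace it ahead of time
    - reboot due within lead_time: drain traffic and reboot on our terms
    - reboot further out: put it on the change calendar
    - anything else: hand it to a human
    """
    if event.kind == "retirement":
        return "replace"
    if event.kind == "reboot":
        if event.not_before - now <= lead_time:
            return "drain-and-reboot"
        return "schedule"
    return "investigate"


# Illustrative data only.
now = datetime(2012, 2, 2, 9, 0)
events = [
    ScheduledEvent("i-aaaa", "retirement", now + timedelta(days=10)),
    ScheduledEvent("i-bbbb", "reboot", now + timedelta(days=1)),
    ScheduledEvent("i-cccc", "reboot", now + timedelta(days=7)),
]
plan = {event.instance_id: plan_action(event, now) for event in events}
```

The limitation is still there, but with advance notice it becomes something you can plan around rather than an outage you discover.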

These Operational Model features don’t generate nearly as much discussion as new Functional Model features (“oh, look, a NoSQL AWS service!”). That’s OK. The Operational Model doesn’t seek the limelight.

Business applications are involved, in some form, in almost every activity taking place in a company. Those activities take many different forms, from a developer debugging an application to an executive examining operational expenses. The PaaS Operational Model must meet their needs.

Post

Everything is PaaSible

In API,Application Mgmt,Articles,Cloud Computing,Everything,Manageability,Mgmt integration,Middleware,PaaS,Utility computing on December 15, 2011 by @vambenepe

That’s the title of an article I wrote for InfoQ and which went live today.

If you can get past the punny title you’ll read about the following points:

  • In traditional (and IaaS) environments, many available application infrastructure features remain rarely used because of the cost (perceived or real) of adding them to the operational environment.
  • Most PaaS environments of today don’t even let you make use of these features, at any cost, because of constraints imposed by PaaS providers for the sake of simplifying and streamlining their operations.
  • In the future, PaaS will not only make these available but available at a negligible incremental operational cost.
  • Even beyond that, PaaS will make available application services that are, in traditional settings, completely out of scope for the application programmer. Early examples include CDN, DNS and load balancing services offered, for example, by Amazon. An application developer in most traditional data centers would have to jump through endless hoops if she wanted to control these services within the application. I believe that these network-related services are just the low-hanging fruit and many more once-unthinkable infrastructure services will become programmable as part of the application.

PaaS will become less about “hosting” and more about offering application services. In other words, going back to the formula I proposed on Twitter:

Cloud = Hosting + SOA

IaaS is a lot more “hosting” than SOA, PaaS is a lot more “SOA” (application infrastructure services available via APIs) than “hosting”.

You can read the full article for more.

Post

Introducing Enterprise Manager Cloud Control 12c

In Application Mgmt,Cloud Computing,Everything,IT Systems Mgmt,Oracle,Utility computing on November 3, 2011 by @vambenepe

Oracle Enterprise Manager Cloud Control 12c, the new version of Oracle’s IT management product, came out a few weeks ago, during Open World (video highlights of the launch). That release was known internally for a while as “NG” for “next generation” because the updates it contains are far more numerous and profound than your average release. The design for “NG” (now “12c”) started long before Enterprise Manager Grid Control 11g, the previous major release, shipped. The underlying framework has been drastically improved, from the modeling capabilities, to extensibility, scalability, incident management and, most visibly, UI.

If you’re not an existing EM user then those framework upgrades won’t be as visible to you as the feature upgrades and additions. And there are plenty of those as well, from database management to application management and configuration management. The most visible addition is the all-new self-service portal through which an EM-driven private Cloud can be made available. This supports IaaS-level services (individual VMs or assemblies composed of multiple coordinated VMs) and DBaaS services (we’ve also announced and demonstrated upcoming PaaS services). And it’s not just about delivering these services via lifecycle automation, a lot of work has also gone into supporting the business and organizational aspects of delivering services in a private Cloud: quotas, chargeback, cost centers, maintenance plans, etc…

EM Cloud Control is the first Oracle product with the “12c” suffix. You probably guessed it, the “c” stands for “Cloud”. If you consider the central role that IT management software plays in Cloud Computing I think it’s appropriate for EM to lead the way. And there’s a lot more “c” on the way.

Many (short and focused) demo videos are available. For more information, see the product marketing page, the more technical overview of capabilities or the even more technical product documentation. Or you can just download the product (or, for production systems, get it on eDelivery).

If you missed the launch at Open World, EM12c introduction events are taking place all over the world in November and December. They start today, November 3rd, in Athens, Riga and Beijing.

We’re eager to hear back from users about this release. I’ve run into many users blogging about installing EM12c and I’ll keep an eye out for their reports after using it for a bit.

Post

DMTF publishes draft of Cloud API

In API,Application Mgmt,Automation,Cloud Computing,DMTF,Everything,IaaS,IT Systems Mgmt,Manageability,Mgmt integration,Modeling,Portability,Protocols,REST,Specs,Standards,Tech,Utility computing,Virtual appliance,Virtualization on September 18, 2011 by @vambenepe

Note to anyone who still cares about IaaS standards: the DMTF has published a work in progress.

There was a lot of interest in the topic in 2009 and 2010. Some heated debates took place during Cloud conferences and a few symposiums were organized to try to coordinate various standard efforts. The DMTF started an “incubator” on the topic. Many companies brought submissions to the table, in various levels of maturity: VMware, Fujitsu, HP, Telefonica, Oracle and RedHat. IBM and Microsoft might also have submitted something, I can’t remember for sure.

The DMTF has been chugging along. The incubator turned into a working group. Unfortunately (but unsurprisingly), it limited itself to the usual suspects (and not all the independent Cloud experts out there) and kept the process confidential. But this week it partially lifted the curtain by publishing two work-in-progress documents.

They can be found at http://dmtf.org/standards/cloud but if you read this after March 2012 they won’t be there anymore, as DMTF likes to “expire” its work-in-progress documents. The two docs are:

The first one is the interesting one, and the one you should read if you want to see where the DMTF is going. It’s a RESTful specification (at the cost of some contortions, e.g. section 4.2.1.3.1). It supports both JSON and XML (bad idea). It plans to use RelaxNG instead of XSD (good idea). And also CIM/MOF (not a joke, see the second document for proof). The specification is pretty ambitious (it covers not just lifecycle operations but also monitoring and events) and well written, especially for a work in progress (props to Gil Pilz).

I am surprised by how little reaction there has been to this publication considering how hotly debated the topic used to be. Why is that?

A cynic would attribute this to people having given up on DMTF providing a Cloud API that has any chance of wide adoption (the adjoining CIM document sure won’t help reassure DMTF skeptics).

To the contrary, an optimist will see this low-key publication as a sign that the passions have cooled, that the trusted providers of enterprise software are sitting at the same table and forging consensus, and that the industry is happy to defer to them.

More likely, I think people have, by now, enough Cloud experience to understand that standardizing IaaS APIs is a minor part of the problem of interoperability (not to mention the even harder goal of portability). The serialization and plumbing aspects don’t matter much, and if they do to you then there are some good libraries that provide mappings for your favorite language. What matters is the diversity of resources and services exposed by Cloud providers. Those choices strongly shape the design of your application, much more than the choice between JSON and XML for the control API. And nobody is, at the moment, in position to standardize these services.

So congrats to the DMTF Cloud Working Group for the milestone, and please get the API finalized. Hopefully it will at least achieve the goal of narrowing down the plumbing choices to three (AWS, OpenStack and DMTF). But that’s not going to solve the hard problem.

Post

Perspectives on Cloud.com acquisition

In Cloud Computing,DMTF,Everything,Governance,Mgmt integration,Open source,OpenStack,Oracle,Specs,Standards,Utility computing,Virtualization,VMware on August 11, 2011 by @vambenepe

Interesting analysis (by Gartner’s Lydia Leong) on the acquisition of Cloud.com by Citrix (apparently for 100x revenues) and its position as a cheaper alternative for vCloud (at least until OpenStack Nova becomes stable).

Great read, even though that part:

“[Zynga] uses Cloud.com to provide Amazon-compatible (and thus Rightscale-compatible) infrastructure internally, letting it easily move workloads across their own infrastructure and Amazon’s.”

is a bit of a simplification.

While I’m at it, here’s another take on Cloud.com, this time from an OSS license perspective. Namely, the difference between building your business on GPL (like Eucalyptus) or Apache 2 (like the more community-driven open source projects such as OpenStack).

Towards the end, there’s also a nice nod to the Oracle Cloud API:

“DMTF has been receiving other submissions for an API standard. Oracle has made its submission public.  It is based on an earlier Sun proposal, and it is the best API we have yet seen. Furthermore, Oracle has identified a core subset to allow initial early adoption, as well as areas where vendors (including themselves and crucially VMware) may continue to extend to allow differentiation.”

Here’s more on the Oracle Cloud API, including an explanation of the “core/extension” split mentioned above.

 

Post

Why is there more public PaaS than private PaaS?

In Cloud Computing,Everything,IaaS,Middleware,PaaS on August 5, 2011 by @vambenepe

I asked on Twitter: “For IaaS there’s a fair mix of public and private. But PaaS seems very titled towards public right now. Any idea why?”

Here are the responses I collected:

@wrecks47 challenged the proposition:

  • I see the opposite. I observe much more activity in private PaaS rather than public PaaS.

Others seemed to agree and offered these explanations:

@reillyusa and @mfratto think it’s because it’s too hard to build a private PaaS:

  • Complex and not too many understandable reference architectures – personified by why Azure appliance is taking so long to appear.
  • much harder to build a PaaS in house?

@ryanprociuk and @garnaat think PaaS are very specific (though it’s not clear to me how that explains its lack of private deployment):

  • PaaS may not be as generic as IaaS, specific to a technical solution. IMO
  • I think a private PaaS might be very domain-specific. Current PaaS target a narrow range of scale which can be generic.

@somic thinks that in a private setting (presumably without specialized app services) there isn’t much to gain by offering PaaS versus letting people run a container on top of IaaS (though this begs the question why don’t private PaaS provide these services like public PaaS do):

  • imho today’s paas, in a private deployment, is just a webapp container – not revolutionary enough to justify a move

@robcheng thinks it’s mostly politics:

  • the same agendas that cause companies to embrace private cloud make them suspicious of PaaS (whose jobs/teams become obsolete?)

@cloud_borat’s interpretation is just that middleware marketers aren’t as savvy as their infrastructure counterparts:

  • our expert analyst Igor say that because app server marketing people suck more and not label appserver as private PaaS

Thanks all!

[UPDATED 2011/7/5: This was originally a Google+ post, but it really belongs as a blog post here so I am glad this is where you are reading it. The next post explains why.]

Post

Comments on “The Good, the Bad, and the Ugly of REST APIs”

In Cloud Computing,Everything,Implementation,Manageability,Mgmt integration,Modeling,Protocols,REST,SOAP,Specs,Tech on June 6, 2011 by @vambenepe

A survivor of intimate contact with many Cloud APIs, George Reese shared his thoughts about the experience in a blog post titled “The Good, the Bad, and the Ugly of REST APIs”.

Here are the highlights of his verdict, with some comments.

“Supporting both JSON and XML [is good]”

I disagree: Two versions of a protocol is one too many (the post behind this link doesn’t specifically discuss the JSON/XML dichotomy but its logic applies to that situation, as Tim Bray pointed out in a comment).

“REST is good, SOAP is bad”

Not necessarily true for all integration projects, but in the context of Cloud APIs, I agree. As long as it’s “pragmatic REST”, not the kind that involves silly contortions to please the REST police.

“Meaningful error messages help a lot”

True and yet rarely done properly.

“Providing solid API documentation reduces my need for your help”

Goes without saying (for a good laugh, check out the commenter on George’s blog entry who wrote that “if you document an API, you API immediately ceases to have anything to do with REST” which I want to believe was meant as a joke but appears written in earnest).

“Map your API model to the way your data is consumed, not your data/object model”

Very important. This is a core part of Humble Architecture.

“Using OAuth authentication doesn’t map well for system-to-system interaction”

Agreed.

“Throttling is a terrible thing to do”

I don’t agree with that sweeping statement, but when George expands on this thought what he really seems to mean is more along the lines of “if you’re going to throttle, do it smartly and responsibly”, which I can’t disagree with.
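For what “smartly and responsibly” could look like, here is a minimal sketch: a token bucket that, when it refuses a call, tells the caller how long to wait instead of just failing. This is one common technique, not something George or any specific Cloud API prescribes; the class name, the rates and the choice of answering with HTTP 429 plus a `Retry-After` header are assumptions for illustration.

```python
import math
import time


class TokenBucket:
    """Allow short bursts up to `capacity`, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        if rate_per_sec <= 0 or capacity <= 0:
            raise ValueError("rate_per_sec and capacity must be positive")
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock  # injectable so the bucket can be tested
        self.tokens = float(capacity)
        self.updated = clock()

    def try_acquire(self):
        """Return 0.0 if the call is allowed, else the seconds to wait."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return 0.0
        return (1.0 - self.tokens) / self.rate


def respond(bucket):
    """Map the bucket's decision to a (status, headers) pair."""
    wait = bucket.try_acquire()
    if wait == 0.0:
        return 200, {}
    # Tell the caller when to come back rather than leaving them guessing.
    return 429, {"Retry-After": str(math.ceil(wait))}
```

A client that receives the second kind of response knows it was throttled (not broken) and when to retry, which is most of what separates responsible throttling from the terrible kind.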

“And while we’re at it, chatty APIs suck”

Yes. And one of the main causes of API chattiness is fear of angering the REST gods by violating the sacred ritual. Either ignore that fear or, if you can’t, hire an expensive REST consultant to rationalize a less-chatty design with some media-type black magic and REST-bless it.

Finally George ends by listing three “ugly” aspects of bad APIs (“returning HTML in your response body”, “failing to realize that a 4xx error means I messed up and a 5xx means you messed up” and “side-effects to 500 errors are evil”) which I agree with, but I see those as a continuation of the earlier point about paying attention to the error messages you return (because that’s what the developers who invoke your API will be staring at most of the time, even if they represent only 0.01% of the messages you return).
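Here is a minimal sketch of what those three points mean for code on both sides of the API. The function names and the JSON error shape are assumptions for this example, not taken from George’s post or from any particular API.

```python
import json


def classify_status(status):
    """Say who 'messed up' based on the HTTP status code class."""
    if 200 <= status < 300:
        return "success"
    if 400 <= status < 500:
        return "caller_error"    # fix the request before trying again
    if 500 <= status < 600:
        return "provider_error"  # the service failed, not the caller
    return "other"


def should_retry(status, idempotent):
    """Retry only provider errors, and only if repeating the call is safe.

    If a 5xx can leave side effects behind, blindly retrying a
    non-idempotent call risks doing the work twice.
    """
    return classify_status(status) == "provider_error" and idempotent


def error_body(status, code, message):
    """Build a machine-readable error payload instead of an HTML page."""
    payload = {"status": status, "code": code, "message": message}
    return json.dumps(payload, sort_keys=True)
```

If the provider guarantees that a 500 never leaves side effects behind, the `idempotent` check becomes unnecessary, which is exactly why that guarantee is worth asking for.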

What’s most interesting is what’s NOT in George’s list. No nit-picking about REST purity. That tells you something about what matters to implementers.

If I haven’t yet exhausted my quota of self-referential links, you can read REST in practice for IT and Cloud management for more on the topic.

Post

“Toyota Friend”: It’s cool! It’s social! It’s cloud! It’s… spam.

In Automation,Business,Cloud Computing,Everything,Off-topic on May 23, 2011 by @vambenepe

Michael Coté has it right: “all roads lead to better junk mail.”

We can take “road” literally in this case since Toyota has teamed up with Salesforce.com to “build Toyota Friend social network for Toyota customers and their cars”.

If you’re tired of “I am getting a fat-free decaf latte at Starbucks” FourSquare messages, wait until you start receiving “my car is getting a lead-free 95-octane pure arabica gas refill at Chevron”. That’s because Toyota owners will get to “choose to extend their communication to family, friends, and others through public social networks such as Twitter and Facebook”.

Leaving “family and friends” aside (they will beg you to), the main goal of this social network is to connect “Toyota customers with their cars, their dealership, and with Toyota”. And for what purpose? The press release has an example:

For example, if an EV or PHV is running low on battery power, Toyota Friend would notify the driver to re-charge in the form of a “tweet”-like alert.

That’s pretty handy, but every car I’ve ever owned has sent me a “tweet-like” alert in the form of a light on the dashboard when I got low on fuel.

Toyota’s partner, Salesforce, also shares its excitement about this (they bring the Cloud angle), and offers another example of its benefits:

Would you like to know if your dealer’s service department has a big empty space on its calendar tomorrow morning, and is willing to offer you a sizable discount on routine service if you’ll bring the car in then instead of waiting another 100 miles?

Ten years ago, the fancy way to justify spamming people was to say that you offered “personalization”. Look at this old advertisement (which lists Toyota as a customer) about how “personalization” is the way to better connect with customers and get them to buy more. Today, we’ve replaced “personalization” with “social media” but it’s the exact same value proposition to the company (coupled with a shiny new way to feed it to its customers).

BTW, the company behind the advertisement? Broadvision. Remember Broadvision? Internet bubble darling, its share price hit over $20,000 (split-adjusted to today). According to the ad above, it was at the time “the world’s second leading e-commerce vendor in terms of licensing revenues, just behind Netscape and ahead of Oracle, IBM, and even Microsoft” and “the Internet commerce firm listed in Bloomberg’s Top 100 Stocks”. Today, it’s considered a Micro-cap stock. Which reminds me, I still haven’t gotten around to buying some LinkedIn…

Notice who’s missing from the list of people you’ll connect to using Toyota’s social network? Independent repair shops and owners forums (outside Toyota). Now, if this social network was used to let me and third-party shops retrieve all diagnostic information about my car and all related knowledge from Toyota and online forums that would be valuable. But that’s the last thing on earth Toyota wants.

A while ago, a strange-looking icon lit up on the dashboard of my Prius. Looking at it, I had no idea what it meant. A Web search (which did not land on Toyota’s site of course) told me it indicated low tire pressure (I had a slow leak). Even then, I had no idea which tire it was. Now at that point it’s probably a good idea to check all four of them anyway, but you’d think that with two LCD screens available in the car they’d have a way to show you precise and accurate messages rather than cryptic icons. It’s pretty clear that the whole thing is designed with the one and only goal of making you go to your friendly Toyota dealership.

Which is why, without having seen this “Toyota Friend” network in action, I am pretty sure I know it will be just another way to spam me and try to scare me away from bringing my car anywhere but to Toyota.

Dear Toyota, I don’t want “social”, I want “open”.

In the meantime, and since you care about my family, please fix the problem that is infuriating my Japanese-American father-in-law: that the voice recognition in his Japan-made car doesn’t understand his accented English. Thanks.

Post

Reading IBM’s proposed standard for Cloud Architecture

In Application Mgmt,Automation,Big picture,BPM,BSM,Business Process,Cloud Computing,Everything,Governance,IBM,IT Systems Mgmt,ITIL,Mgmt integration,Utility computing on March 7, 2011 by @vambenepe

Did you enjoy the first version of IBM’s Cloud Computing Reference Architecture? Did you even get certified on it? Then rejoice, because there’s a new version. IBM recently submitted the IBM Cloud Computing Reference Architecture 2.0 to The Open Group.

I’m a bit out of practice reading this kind of IBMese (let’s just say that The Open Group was the right place to submit it) but I would never let my readers down. So, even though these box-within-a-box-within-a-box diagrams (see section 2) give me flashbacks to the days of OGF and WSRF, I soldiered on.

I didn’t understand the goal of the document enough to give you a fair summary, but I can share some thoughts.

It starts by talking a lot about SOA. I initially thought this was to make the point that Glen Daniels articulated very well in this tweet:

Yup, correct SOA patterns (loose coupling, dyn refs, coarse interfaces…) are exactly what you need for cloud apps. You knew this.

But no. Rather than Glen’s astute remark, IBM’s point is one meta-level lower. It’s that “Cloud solutions are SOA solutions”. Which I have a harder time parsing. If you thought “service” was overloaded before…

While some of the IBM authors are SOA experts, others apparently come from a Telco background so we get OSS/BSS analogies next.

By that point, I’ve learned that Cloud is like SOA except when it’s like Telco (but there’s probably another reference architecture somewhere that explains that Telco is SOA, so it all adds up).

One thing that chagrined me was that even though this document is very high-level it still manages to go down into implementation technologies long enough to assert, wrongly, that virtualization is required for Cloud solutions. Another Cloud canard repeated here is the IaaS/PaaS/SaaS segmentation of the Cloud world, to which IBM adds a BPaaS (Business Process as a Service) layer for good measure (for my take on how Cloud relates to SOA, and how I dislike the IaaS/PaaS/SaaS pyramid, see this write-up of the presentation I gave at last year’s Cloud Connect, especially the 3rd picture).

It gets a lot better if you persevere to page 29, where the “Architecture Principles” finally get introduced (if I had been asked to edit the paper, I would have only kept the last 6 pages). They are:

  1. Design for Cloud-scale Efficiencies: When realizing cloud characteristics such as elasticity, self-service access, and flexible sourcing, the cloud design is strictly oriented to high cloud scale efficiencies and short time-to-delivery/time-to-change. (“Efficiency Principle”)
  2. Support Lean Service Management: The Common Cloud Management Platform fosters lean and lightweight service management policies, processes, and technologies. (“Lightweightness Principle”)
  3. Identify and Leverage Commonalities: All commonalities are identified and leveraged in cloud service design. (“Economies-of-scale principle”)
  4. Define and Manage generically along the Lifecycle of Cloud Services: Be generic across I/P/S/BPaaS & provide ‘exploitation’ mechanism to support various cloud services using a shared, common management platform (“Genericity”).

Each principle gets a nickname, thanks to which IBM can refer to this list as the ELEG principles (Efficiency, Lightweightness, Economies-of-scale, Genericity). It also spells GLEE, but apparently that wasn’t the preferred sequence.

The first principle is hard to disagree with. The second also rings true, including its dings on ITIL (but the irony of IBM exalting “Lightweightness” is hard to ignore). The third and fourth principles (by that time I had lost too many brain cells to understand how they differ) really scared me. While I can understand the motivation, they elicited a vision of zombies in blue suits (presumably undead IBM Distinguished Engineers and Fellows) staggering towards me: “frameworks… we want frameworks…”.

There you go. If you want more information (and, more importantly, unbiased information) go read the Reference Architecture yourself. I am not involved in The Open Group, and I have no idea what it plans to do with it (and if it has received other submissions of the same type). Though I wouldn’t be surprised if I see, in 5 years, some panicked sales rep asking an internal mailing list “The customer RFP asks for a mapping of our solution to the Open Group Cloud Reference Architecture and apparently IBM has 94 slides about it, what do I do? Has anyone heard about this Reference Architecture? This is urgent.”

Urgent things are long in the making.

Post

CloudFormation in context

In Amazon,Application Mgmt,Automation,Cloud Computing,Everything,Mgmt integration,Modeling,PaaS,Specs,Utility computing on February 27, 2011 by @vambenepe

I’ve been very positive about AWS CloudFormation (both in tweet and blog form) since its announcement. I want to clarify that it’s not the technology that excites me. There’s nothing earth-shattering in it. CloudFormation only covers deployment and doesn’t help you with configuration, monitoring, diagnostic and ongoing lifecycle. It’s been done before (including probably a half-dozen times within IBM alone, I would guess). We’ve had much more powerful and flexible frameworks for a long time (I can’t even remember when SmartFrog first came out). And we’ve had frameworks with better tools (though history suggests that tools for CloudFormation are already in the works, not necessarily inside Amazon).

Here are some non-technical reasons why I tweeted that “I have a feeling that the AWS CloudFormation format might become an even more fundamental de-facto standard than the EC2 API” even before trying it out.

It’s simple to use. There are two main reasons for this (and the fact that it uses JSON rather than XML is not one of them):
- It only supports a small set of features
- It “hard-codes” resource types (e.g. EC2, Beanstalk, RDS…) rather than focusing on an abstract and extensible mechanism

It combines a format and an API. You’d think it’s obvious that the two are complementary. What can you do with a format if you don’t have an API to exchange documents in that format? Well, turns out there are lots of free-floating model formats out there for which there is no defined API. And they are still wondering why they never saw any adoption.

It merges IaaS and PaaS. AWS has always defied the “IaaS vs. PaaS” view of the Cloud. By bridging both, CloudFormation is a great way to provide a smooth transition. I expect most of the early templates to be very EC2-centric (as are most AWS deployments) and over time to move to a pattern in which EC2 resources are just used for what doesn’t fit in more specialized containers.

It comes at the right time. It picks the low-hanging fruit of the AWS automation ecosystem. The evangelism and proof of concept for templatized deployments have already taken place.

It provides a natural grouping of the various AWS resources you are currently consuming. They are now part of an explicit deployment context.

It’s free (the resources provisioned are not free, of course, but the fact that they came out of a CloudFormation deployment doesn’t change the cost).

Post

AWS CloudFormation is the iPhone of Cloud services

In Amazon,Application Mgmt,Automation,Cloud Computing,Everything,Mgmt integration,Modeling,OpenStack,PaaS,Portability,Specs,Tech,Utility computing,Virtual appliance on February 26, 2011 by @vambenepe

Expanding on a tweet that I wrote soon after the announcement of AWS CloudFormation.

The iPhone unifies the GPS, phone, PDA, camera and camcorder. CloudFormation does the same for infrastructure services (VMs, volumes, network…) and some platform services (Beanstalk, RDS, SimpleDB, SQS, SNS…). You don’t think about whether you should grab a phone or a PDA, you grab an iPhone and start using the feature you need. It’s the default tool. Similarly with CloudFormation, you won’t start by thinking about what AWS service you want to use. Rather, you grab a CloudFormation template and modify it as needed. The template (or the template editor) is the default tool.

The iPhone doesn’t just group features that used to be provided by many devices. It also allows these features to collaborate. It’s not that you get a PDA and a phone side-by-side in one device. You can press the “call” button from the “PDA” feature. CloudFormation doesn’t just bundle deployments to various AWS services, it wires them together.

Anyone can write apps for the iPhone. Anyone can write apps that use CloudFormation.

There’s an App Store for iPhone apps. On the CloudFormation side, it will probably come soon (right now Amazon has made templates available on S3, but that’s not a real store). Amazon has developed example templates for a set of common applications, but expect application authors to take ownership of that task soon. They’ll consider it one of their deliverables. Right next to the “download” button you’ll start seeing a “deploy to AWS” button. Guess which one will eventually be used the most?

It’s Apple’s platform and your applications have to comply with their policy. AWS is not as much of a control freak as Apple and doesn’t have an upfront approval process, but it has its terms of service and they too can get you kicked out.

The iPhone is not a standard platform (though you may consider it a de-facto standard). Same for AWS CloudFormation.

There are alternatives to the iPhone that define themselves primarily as being more open than it, mainly Android. Same for AWS with OpenStack (which probably will soon have its CloudFormation equivalent).

The iPhone infiltrated itself into corporations at the ground level, even if the CIO initially saw no reason to look beyond BlackBerry for corporate needs. Same with AWS.

Any other parallel? Any fundamental difference I missed?

Post

Defining Cloud from the provider perspective

In Cloud Computing,Everything,IT Systems Mgmt,Utility computing on February 13, 2011 by @vambenepe

I have a new definition for Cloud Computing. No, really.

Many discussions attempted to define Cloud Computing from the perspective of the consumer. To the point where asking “what’s a Cloud” has become a private joke for “let’s waste some time”. Eventually, people settled on the NIST set of definitions either because they like them (probability 0.1), they got tired of arguing (probability 0.4) or they want to sell to the government (probability 0.5).

Well, I have another one. Mine is a definition from the perspective of the Cloud provider (or the creator of Cloud-enablement software). And it’s a simple one.

A Cloud is a computing environment in which the runtime infrastructure and the management infrastructure are indistinguishable.

Ask engineers at Google App Engine to separate their code between the runtime part and the management part. They might not even understand the question.

For companies (like Oracle, where I work) that have a runtime division (Fusion Middleware for us) and a management division (Enterprise Manager), both of which ship products, it’s a challenge.

For companies which only offer one or the other, it’s a huge challenge.

For engineers who have to put it all together, it’s a great time to be in business.

Post

The REST bubble

In Application Mgmt,Cloud Computing,Everything,JBoss,Mgmt integration,Protocols,REST,Utility computing on January 26, 2011 by @vambenepe

Just yesterday I was writing about how Cloud APIs are like military parades. To some extent, their REST rigor is a way to enforce implementation discipline. But a large part of it is mostly bling aimed at showing how strong (for an army) or smart (for an API) the people in charge are.

Case in point, APIs that have very simple requirements and yet make a big deal of the fact that they are perfectly RESTful.

Just today, I learned (via the ever-informative InfoQ) about the JBoss SteamCannon project (a PaaS wrapper for Java and Ruby apps that can deploy on different host infrastructures like EC2 and VirtualBox). The project looks very interesting, but the API doc made me shake my head.

The very first thing you read is three paragraphs telling you that the API is fully HATEOAS (Hypermedia as the Engine of Application State) compliant (our URLs are opaque, you hear me, opaque!) and an invitation to go read Roy’s famous take-down of these other APIs that unduly call themselves RESTful even though they don’t give HATEOAS any love.

So here I am, a developer trying to deploy my WAR file on SteamCannon and that’s the API document I find.

Instead of the REST finger-wagging, can I have a short overview of what functions your API offers? Or maybe an example of a request call and its response?

I don’t mean to pick on SteamCannon specifically, it just happens to be a new Cloud API that I discovered today (all the Cloud APIs out there also spend too much time telling you how RESTful they are and not enough time showing you how simple they are). But when an API document starts with a REST lesson and when PowerPoint-waving sales reps pitch “RESTful APIs” to executives you know this REST thing has gone way beyond anything related to “the fundamentals”.

We have a REST bubble on our hands.

Again, I am not criticizing REST itself. I am criticizing its religious and ostentatious application rather than its practical use based on actual requirements (this was my take on its practical aspects in the context of Cloud APIs).

Post

Cloud APIs are like military parades

In Amazon,Automation,Cloud Computing,Everything,IT Systems Mgmt,Mgmt integration,Protocols,REST,Specs,Utility computing on January 24, 2011 by @vambenepe

The previous post (“Amazon proves that REST doesn’t matter for Cloud APIs”) attracted some interesting comments on the blog itself, on Hacker News and in a response post by Mike Pearce (where I assume the photo is supposed to represent me being an AWS fanboy). I failed to promptly follow up on it and address the response, then the holidays came. But Mark Little was kind enough to pick the entry up for discussion on InfoQ yesterday which brought new readers and motivated me to write a follow-up.

Mark did a very good job at summarizing my point and he understood that I wasn’t talking about the value (or lack of value) of REST in general. Just about whether it is useful and important in the very narrow field of Cloud APIs. In that context at least, what seems to matter most is simplicity. And REST is not intrinsically simpler.

It isn’t a controversial statement in most places that RPC is easier than REST for developers performing simple tasks. But on the blogosphere I guess it needs to be argued.

Method calls are how normal developers write normal code. Doing it over the wire is the smallest change needed to invoke a remote API. The complexity with RPC has never been conceptual, it’s been in the plumbing. How do I serialize my method call and send it over? CORBA, RMI and SOAP tried to address that, none of them fully succeeded in keeping it simple and yet generic enough for the Internet. XML-RPC somehow (and unfortunately) got passed over in the process.

So what did AWS do? They pretty much solved that problem by using parameters in the URL as a dead-simple way to pass function parameters. And you get the response as an XML doc. In effect, it’s one-half of XML-RPC. Amazon did not invent this pattern. And the mechanism has some shortcomings. But it’s a pragmatic approach. You get the conceptual simplicity of RPC, without the need to agree on an RPC framework that tries to address way more than what you need. Good deal.
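To illustrate the response half of that exchange, here is a minimal Python sketch that reads return values out of a flat XML document. The response body, its element names and the `parse_response` helper are all made up for illustration; they are not taken from the AWS documentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body; the element names are assumptions for this sketch.
SAMPLE_RESPONSE = """<CreateKeyPairResponse>
  <keyName>demo-key</keyName>
  <keyFingerprint>00:11:22:33</keyFingerprint>
</CreateKeyPairResponse>"""


def parse_response(xml_text):
    """Turn a flat XML response into (operation name, dict of return values)."""
    root = ET.fromstring(xml_text)
    values = {child.tag: (child.text or "").strip() for child in root}
    return root.tag, values


operation, result = parse_response(SAMPLE_RESPONSE)
print(operation, result)
```

The caller gets back something that looks like the return value of a local function, which is the conceptual simplicity of RPC described above.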

So, when Mike asks “Does the fact that AWS use their own implementation of an API instead of a standard like, oh, I don’t know, REST, frustrate developers who really don’t want to have to learn another method of communicating with AWS?” and goes on to answer “Yes”, I scratch my head. I’ve met many developers struggling to understand REST. I’ve never met a developer intimidated by RPC. As to the claim that REST is a “standard”, I’d like to read the spec. Please don’t point me to a PhD dissertation.

That being said, I am very aware that simplicity can come back to bite you, when it’s not just simple but simplistic and the task at hand demands more. Andrew Wahbe hit the nail on the head in a comment on my original post:

Exposing an API for a unique service offered by a single vendor is not going to get much benefit from being RESTful.

Revisit the issue when you are trying to get a single client to work across a wide range of cloud APIs offered by different vendors; I’m willing to bet that REST would help a lot there. If this never happens — the industry decides that a custom client for each Cloud API is sufficient (e.g. not enough offerings on the market, or whatever), then REST may never be needed.

Andrew has the right perspective. The usage patterns for Cloud APIs may evolve to the point where the benefits of following the rules of REST become compelling. I just don’t think we’re there and frankly I am not holding my breath. There are lots of functional improvements needed in Cloud services before the burning issue becomes one of orchestrating between Cloud providers. And while a shared RESTful API would be the easiest to orchestrate, a shared RPC API will still be very reasonably manageable. The issue will mostly be one of shared semantics more than protocol.

Mike’s second retort was that it was illogical for me to say that software developers are mostly isolated from REST because they use Cloud libraries. Aren’t these libraries written by developers? What about these, he asks. Well, one of them, Boto‘s Mitch Garnaat left a comment:

Good post. The vast majority of AWS (or any cloud provider’s) users never see the API. They interact through language libraries or via web-based client apps. So, the only people who really care are RESTafarians, and library developers (like me).

Perhaps it’s possible to have an API that’s so bad it prevents people from using it but the AWS Query API is no where near that bad. It’s fairly consistent and pretty easy to code to. It’s just not REST.

Yup. If REST is the goal, then this API doesn’t reach it. If usefulness is the goal, then it does just fine.

Mike’s third retort was to take issue with that statement I made:

The Rackspace people are technically right when they point out the benefits of their API compared to Amazon’s. But it’s a rounding error compared to the innovation, pragmatism and frequency of iteration that distinguishes the services provided by Amazon. It’s the content that matters.

Mike thinks that

If Rackspace are ‘technically’ right, then they’re right. There’s no gray area. Morally, they’re also right and mentally, physically and spiritually, they’re right.

Sure. They’re technically, mentally, physically and spiritually right. They may even be legally, ethically, metaphysically and scientifically right. Amazon is only practically right.

This is not a ding on Rackspace. They’ll have to compete with Amazon on service (and price), not on API, as they well know and as they are doing. But they are racing against a fast horse.

More generally, the debate about how much the technical merits of an API matter (beyond the point where it gets the job done) is a recurring one. I am talking as a recovering over-engineer.

In a post almost a year ago, James Watters declared that it matters. Mitch Garnaat weighed in on the other side: given how few people use the raw API we probably spend too much time worrying about details, maybe we worry too much about aesthetics, I still wonder whether we obsess over the details of the API’s a bit too much (in case you can’t tell, I’m a big fan of Mitch).

Speaking of people I admire, Shlomo Swidler (in general, only library developers use the raw HTTP. Everyone else uses a library) and Joe Arnold (library integration (fog / jclouds / libcloud) is more important for new #IaaS providers than an API) make the right point. Rather than spending hours obsessing about the finer points of your API, spend the time writing love letters to Mitch and Adrian so they support you in their libraries (also, allocate less of your design time to RESTfulness and more to the less glamorous subject of error handling).

OK, I’ll pile on two more expert testimonies. RightScale’s Thorsten von Eicken (the API itself is more a programming exercise than a fundamental issue, it’s the semantics of the resources behind the API that really matter) and F5’s Lori MacVittie (the World Doesn’t Care About APIs).

Bottom line, I see APIs a bit like military parades. Soldiers know better than to walk in tight formation, wearing bright colors and to the sound of fanfare into the battlefield. So why are parade exercises so prevalent in all armies? My guess is that they are used to impress potential enemies, reassure citizens and reflect on the strength of the country’s leaders. But military parades are also a way to ensure internal discipline. You may not need to use parade moves on the battlefield, but the fact that the unit is disciplined enough to perform them means they are also disciplined enough for the tasks that matter. Let’s focus on that angle for Cloud APIs. If your RPC API is consistent enough that its underlying model could be used as the basis for a REST API, you’re probably fine. You don’t need the drum rolls, stiff steps and the silly hats. And no need to salute either.

Post

Amazon proves that REST doesn’t matter for Cloud APIs

In Amazon,Application Mgmt,Cloud Computing,Everything,Implementation,Mgmt integration,REST,Specs,Utility computing on December 6, 2010 by @vambenepe

Every time a new Cloud API is announced, its “RESTfulness” is heralded as if it was a MUST HAVE feature. And yet, the most successful of all Cloud APIs, the AWS API set, is not RESTful.

We are far enough down the road by now to conclude that this isn’t a fluke. It proves that REST doesn’t matter, at least for Cloud management APIs (there are web-scale applications of an entirely different class for which it does). By “doesn’t matter”, I don’t mean that it’s a bad choice. Just that it is not significantly different from reasonable alternatives, like RPC.

AWS mostly uses RPC over HTTP. You send HTTP GET requests, with instructions like ?Action=CreateKeyPair added in the URL. Or DeleteKeyPair. Same for any other resource (volume, snapshot, security group…). Amazon doesn’t pretend it’s RESTful, they just call it “Query API” (except for the DevPay API, where they call it “REST-Query” for unclear reasons).
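To make the shape of such a call concrete, here is a minimal Python sketch that builds a Query-style URL. The endpoint host, the `KeyName` parameter and the `query_url` helper are assumptions for illustration; real AWS requests also carry authentication and signature parameters, which are deliberately left out.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint, for illustration only.
ENDPOINT = "https://ec2.example.com/"


def query_url(action, **params):
    """Build an RPC-over-HTTP URL: the operation name goes in 'Action'
    and each function argument becomes a plain query parameter."""
    query = {"Action": action}
    query.update(params)
    return ENDPOINT + "?" + urlencode(query)


create = query_url("CreateKeyPair", KeyName="demo-key")
delete = query_url("DeleteKeyPair", KeyName="demo-key")
print(create)
print(delete)
```

Switching from one operation to another only changes the value of `Action`; the URL path never identifies a resource, which is why the style is RPC rather than REST.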

Has this lack of RESTfulness stopped anyone from using it? Has it limited the scale of systems deployed on AWS? Does it limit the flexibility of the Cloud offering and somehow force people to consume more resources than they need? Has it made the Amazon Cloud less secure? Has it restricted the scope of platforms and languages from which the API can be invoked? Does it require more experienced engineers than competing solutions?

I don’t see any sign that the answer is “yes” to any of these questions. Considering the scale of the service, it would be a multi-million-dollar blunder if indeed one of them had a positive answer.

Here’s a rule of thumb. If most invocations of your API come via libraries for object-oriented languages that more or less map each HTTP request to a method call, it probably doesn’t matter very much how RESTful your API is.
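As a rough picture of what such a library does, here is a Python sketch of a hypothetical client in which every method call turns into one HTTP request. The `QueryClient` class, the endpoint and the injected `send` function are assumptions for this example and do not reflect how any particular library (Boto included) is implemented.

```python
from urllib.parse import urlencode


class QueryClient:
    """Hypothetical client: any method call becomes one HTTP request."""

    def __init__(self, endpoint, send):
        self.endpoint = endpoint
        # 'send' performs the HTTP GET; it is injected so the sketch runs offline.
        self.send = send

    def __getattr__(self, name):
        # Map a Python-style name to an action name: create_key_pair -> CreateKeyPair
        action = "".join(part.capitalize() for part in name.split("_"))

        def call(**params):
            query = urlencode({"Action": action, **params})
            return self.send(self.endpoint + "?" + query)

        return call


sent = []


def fake_send(url):
    """Stand-in for a real HTTP call: records the URL and returns a dummy body."""
    sent.append(url)
    return "<ok/>"


client = QueryClient("https://cloud.example.com/", fake_send)
reply = client.create_key_pair(KeyName="demo-key")
print(sent[0])
```

The application developer only ever sees `client.create_key_pair(...)`. Whether the wire format behind it is RESTful or not is invisible at that level.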

The Rackspace people are technically right when they point out the benefits of their API compared to Amazon’s. But it’s a rounding error compared to the innovation, pragmatism and frequency of iteration that distinguishes the services provided by Amazon. It’s the content that matters.

If you think it’s rich, for someone who wrote a series of posts examining “REST in practice for IT and Cloud management” (part 1, part 2 and part 3), to now declare that REST doesn’t matter, well go back to these posts. I explicitly set them up as an effort to investigate whether (and in what way) it mattered and made it clear that my intuition was that actual RESTfulness didn’t matter as much as simplicity. The AWS API being an example of the latter without the former. As I wrote in my review of the Sun Cloud API, “it’s not REST that matters, it’s the rest”. One and a half years later, I think the case is closed.

Post

Nice incremental progress in Google App Engine SDK 1.4

In Application Mgmt,Cloud Computing,Everything,Google,Google App Engine,IT Systems Mgmt,Manageability,Middleware,PaaS,Utility computing on December 2, 2010 by @vambenepe

When Google released version 1.3.8 of the Google App Engine SDK in October, they introduced an instance console, showing you how many instances are serving your application and some basic metrics about these instances. I wrote a blog to consider the implications of providing this level of visibility to application administrators. It also pointed out some shortcomings of this first version of the console.

The most glaring problem was that the console showed an “average latency” which was just a straight average of the latencies of all the instances, independently of the traffic they see. Which is a meaningless number.

Today, Google released an update to the SDK (1.4), and along with it some minor updates to the instance console. Except that, as you can see below, the screen capture in their announcement happens to show three instances that have processed exactly the same number of messages. Which means that we can’t tell whether they have fixed the “unweighted average” problem or not. Is this just by chance? Google, WTF? (which stands for “what’s the formula?”, of course).

I decided it was worth spending a few minutes to find the answer. I don’t have any app currently in use on GAE, but it doesn’t take much work to generate enough load to wake up one of my old apps and get it to spin a couple of instances. Here is the resulting console instance:

If you run the numbers, you can see that they’ve fixed that issue; the average latency is now weighted based on instance traffic. Thanks Google for listening.

Apparently, not all the updates have trickled down to my version of the instance console. The “requests”, “errors” and “age” columns are missing. I assume they’re on their way. Seeing the age of the instances, especially, is a nice addition, one of those I requested in my blog.

In the grand scheme of things, these minor updates to the console (which remains quite basic) are not the big news. The major announcement with SDK 1.4 is that the dreaded 30 seconds limit on execution time has been lifted for background tasks (those from Task Queue and Cron). It’s now a much more manageable 10 minutes. This doesn’t apply to the execution of Web requests served by your app.

Google App Engine has been under criticism recently, and that 30-second limit (along with reliability issues) figured prominently in the complaints. Assuming the reliability issues are also coming under control, this update will go a long way towards addressing these issues.

Just so you realize how lucky you are if you are just now starting with Google App Engine, here are the kind of hoops you had to jump through, in the early days, to process any task that took a significant amount of time. This was done a year before the Cron and Task Queue features were added to GAE.

Another nice addition with SDK 1.4 is that you can now retrieve the source code of your application from Google’s servers. Of course you should never need that if you are rigorous and well-organized… Presumably this is only for Python since in the Java case Google’s servers never see the source code.

The steady progress of the GAE SDK continues.

Post

Cloud management is to traditional IT management what spreadsheets are to calculators

In Application Mgmt,Automation,Big picture,Business,Cloud Computing,DevOps,Everything,IT Systems Mgmt,Manageability,Mgmt integration,Utility computing on November 3, 2010 by @vambenepe

It’s all in the title of the post. An elevator pitch short enough for a 1-story ride. A description for business people. People who don’t want to hear about models, virtualization, blueprints and devops. But people who also don’t want to be insulted with vague claims about “business/IT alignment” and “agility”.

The focus is on repeatability. Repeatability saves work and allows new approaches. I’ve found spreadsheets (and “super-spreadsheets”, i.e. more advanced BI tools) to be a good analogy for business people. Compared to analysts furiously operating calculators, spreadsheets save work and prevent errors. But beyond these cost savings, they allow you to do things you wouldn’t even try to do without them. It’s not just the same process, done faster and cheaper. It’s a more mature way of running your business.

Same with the “Cloud” style of IT management.

Post

Lifting the curtain on PaaS Cloud infrastructure (can you handle the truth?)

In Application Mgmt,Automation,Cloud Computing,Everything,Google App Engine,IT Systems Mgmt,Manageability,Mgmt integration,Middleware,PaaS,Utility computing on October 19, 2010 by @vambenepe

The promise of PaaS is that application owners don’t need to worry about the infrastructure that powers the application. They just provide application artifacts (e.g. WAR files) and everything else is taken care of. Backups. Scaling. Infrastructure patching. Network configuration. Geographic distribution. Etc. All these headaches are gone. Just pick from a menu of quality of service options (and the corresponding price list). Make your choice and forget about it.

In theory.

In practice no abstraction is leak-proof and the abstractions provided by PaaS environments are even more porous than average. The first goal of PaaS providers should be to shore them up, in order to deliver on the PaaS value proposition of simplification. But at some point you also have to acknowledge that there are some irreducible leaks and take pragmatic steps to help application administrators deal with them. The worst thing you can do is have application owners suffer from a leaky abstraction and refuse to even acknowledge it because it breaks your nice mental model.

Google App Engine (GAE) gives us a nice and simple example. When you first deploy an application on GAE, it is deployed as just one instance. As traffic increases, a second instance comes up to handle the load. Then a third. If traffic decreases, one instance may disappear. Or one of them may just go away for no reason (that you’re aware of).

It would be nice if you could deploy your application on what looks like a single, infinitely scalable, machine and not ever have to worry about horizontal scale-out. But that’s just not possible (at a reasonable cost) so Google doesn’t try particularly hard to hide the fact that many instances can be involved. You can choose to ignore that fact and your application will still work. But you’ll notice that some requests take a lot more time to complete than others (which is typically the case for the first request to hit a new instance). And some requests will find an empty local cache even though your application has had uninterrupted traffic. If you choose to live with the “one infinitely scalable machine” simplification, these are inexplicable and unpredictable events.

Last week, as part of the release of the GAE SDK 1.3.8, Google went one step further in acknowledging that several instances can serve your application, and helping you deal with it. They now give you a console (pictured below) which shows the instances currently serving your application.

I am very glad that they added this console, because it clearly puts on the table the question of how much your PaaS provider should open the kimono. What’s the right amount of visibility, somewhere between “one infinitely scalable computer” and giving you fan speeds and CPU temperature?

I don’t know what the answer is, but unfortunately I am pretty sure this console is not it. It is supposed to be useful “in debugging your application and also understanding its performance characteristics”. Hmm, how so exactly? Not only is this console very simple, it’s almost useless. Let me enumerate the ways.

Misleading

Actually it’s worse than useless, it’s misleading. As we can see on the screen shot, two of the instances saw no traffic during the collection period (which, BTW, we don’t know the length of), while the third one did all the work. At the top, we see an “average latency” value. Averaging latency across instances is meaningless if you don’t weight it properly. In this case, all the requests went to the instance that had an average latency of 1709ms, but apparently the overall average latency of the application is 569.7ms (yes, that’s 1709/3). Swell.
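The arithmetic is easy to reproduce. The Python sketch below contrasts the straight average with a traffic-weighted one, using the 1709ms figure from the screen shot. The request count of 10 for the busy instance is an assumed value (the console does not show it), and the idle instances are assumed to report a latency of 0.

```python
def unweighted_average(instances):
    """Straight average of per-instance latencies, ignoring traffic."""
    return sum(latency for latency, _ in instances) / len(instances)


def weighted_average(instances):
    """Average latency weighted by the number of requests each instance served."""
    total_requests = sum(requests for _, requests in instances)
    if total_requests == 0:
        return 0.0
    weighted_sum = sum(latency * requests for latency, requests in instances)
    return weighted_sum / total_requests


# (average latency in ms, request count); the count of 10 is an assumption
instances = [(1709.0, 10), (0.0, 0), (0.0, 0)]

print(round(unweighted_average(instances), 1))  # the misleading figure
print(weighted_average(instances))              # what users actually experienced
```

With these inputs the unweighted figure rounds to 569.7ms, while the weighted one stays at 1709ms, which is the latency every request really saw.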

No instance identification

What happens when the console is refreshed? Maybe there will only be two instances. How do I know which one went away? Or say there are still three, how do I know these are the same three? For all I know it could be one old instance and two new ones. The single most important data point (from the application administrator’s perspective) is when a new instance comes up. I have no way, in this UI, to know reliably when that happens: no instance identification, no indication of the age of an instance.

Average memory

So we get the average memory per instance. What are we supposed to do with that information? What’s a good number, what’s a bad number? How much memory is available? Is my app memory-bound, CPU-bound or IO-bound on this instance?

Configuration management

As I have described before, change and configuration management in a PaaS setting is a thorny problem. This console doesn’t tackle it. Nowhere does it say which version of the GAE platform each instance is running. Google announces GAE SDK releases (the bits you download), but these releases are mostly made of new platform features, so they imply a corresponding update to Google’s servers. That can’t happen instantly, there must be some kind of roll-out (whether the instances can be hot-patched or need to be recycled). Which means that the instances of my application are transitioned from one platform version to another (and presumably that at a given point in time all the instances of my application may not be using the same platform version). Maybe that’s the source of my problem. Wouldn’t it be nice if I knew which platform version an instance runs? Wouldn’t it be nice if my log files included that? Wouldn’t it be nice if I could request an app to run on a specific platform version for debugging purposes? Sure, in theory all the upgrades are backward-compatible, so it “shouldn’t matter”. But as explained above, “the worst thing you can do is have application owners suffer from a leaky abstraction and refuse to even acknowledge it”.

OK, so the instance monitoring console Google just rolled out is seriously lacking. As is too often the case with IT monitoring systems, it reports what is convenient to collect, not what is useful. I’m sure they’ll fix it over time. What this console does well (and really the main point of this blog) is illustrate the challenge of how much information about the underlying infrastructure should be surfaced.

Surface too little and you leave application administrators powerless. Surface more data but no control and you’ll leave them frustrated. Surface some controls (e.g. a way to configure the scaling out strategy) and you’ve taken away some of the PaaS simplicity and also added constraints to your infrastructure management strategy, making it potentially less efficient. If you go down that route, you can end up with the other flavor of PaaS, the IaaS-based PaaS in which you have an automated way to create a deployment but what you hand back to the application administrator is a set of VMs to manage.

That IaaS-centric PaaS is a well-understood beast, to which many existing tools and management practices can be applied. The “pure PaaS” approach pioneered by GAE is much more of a terra incognita from a management perspective. I don’t know, for example, whether exposing the platform version of each instance, as described above, is a good idea. How leaky is the “platform upgrades are always backward-compatible” assumption? Google, and others, are experimenting with the right abstraction level, APIs, tools, and processes to expose to application administrators. That’s how we’ll find out.

Post

Exalogic, EC2-on-OVM, Oracle Linux: The Oracle Open World early recap

In Amazon,Application Mgmt,Cloud Computing,Conference,Everything,Linux,Manageability,Middleware,Open source,Oracle,Oracle Open World,OVM,Tech,Trade show,Utility computing,Virtualization,Xen on September 20, 2010 by @vambenepe

Among all the announcements at Oracle Open World so far, here is a summary of those I was the most impatient to blog about.

Oracle Exalogic Elastic Cloud

This was the largest part of Larry’s keynote; he called it “one big honkin’ cloud”. An impressive piece of hardware (360 2.93GHz cores, 2.8TB of RAM, 960GB SSD, 40TB disk for one full rack) with excellent InfiniBand connectivity between the nodes. And you can extend the InfiniBand connectivity to other Exalogic and/or Exadata racks. The whole package is optimized for the Oracle Fusion Middleware stack (WebLogic, Coherence…) and managed by Oracle Enterprise Manager.

This is really just the start of a long lineage of optimized, pre-packaged, simplified (for application administrators and infrastructure administrators) application platforms. Management will play a central role and I am very excited about everything Enterprise Manager can and will bring to it.

If “Exalogic Elastic Cloud” is too taxing to say, you can shorten it to “Exalogic” or even just “EL”. Please, just don’t call it “E2C”. We don’t want to get into a trademark fight with our good friends at Amazon, especially since the next important announcement is…

Run certified Oracle software on OVM at Amazon

Oracle and Amazon have announced that AWS will offer virtual machines that run on top of OVM (Oracle’s hypervisor). Many Oracle products have been certified in this configuration; AMIs will soon be available. There is a joint support process in place between Amazon and Oracle. The virtual machines use hard partitioning and the licensing rules are the same as those that apply if you use OVM and hard partitioning in your own datacenter. You can transfer licenses between AWS and your data center.

One interesting aspect is that there is no extra fee on Amazon’s part for this. Which means that you can run an EC2 VM with Oracle Linux on OVM (an Oracle-tested combination) for the same price (without Oracle Linux support) as some other Linux distribution (also without support) on Amazon’s flavor of Xen. And install any software, including non-Oracle, on this VM. This is not the primary intent of this partnership, but I am curious to see if some people will take advantage of it.

Speaking of Oracle Linux, the next announcement is…

The Unbreakable Enterprise Kernel for Oracle Linux

In addition to the RedHat-compatible kernel that Oracle has been providing for a while (and will keep supporting), Oracle will also offer its own Linux kernel. I am not enough of a Linux geek to get teary-eyed about the birth announcement of a new kernel, but here is why I think this is an important milestone. The stratification of the application runtime stack is largely a relic of the past, when each layer had enough innovation to justify combining them as you see fit. Nowadays, the innovation is not in the hypervisor, in the OS or in the JVM as much as it is in how effectively they all combine. JRockit Virtual Edition is a clear indicator of things to come. Application runtimes will eventually be highly integrated and optimized. No more scheduler on top of a scheduler on top of a scheduler. If you squint, you’ll be able to recognize aspects of a hypervisor here, aspects of an OS there and aspects of a JVM somewhere else. But it will be mostly of interest to historians.

Oracle has by far the most expertise in JVMs and over the years has built a considerable amount of expertise in hypervisors. With the addition of Solaris and this new milestone in Linux access and expertise, what we are seeing is the emergence of a company for which there will be no technical barrier to innovation on making all these pieces work efficiently together. And, unlike many competitors who derive most of their revenues from parts of this infrastructure, no revenue-protection handcuffs hampering innovation either.

Fusion Apps

Larry also talked about Fusion Apps, but I believe he plans to spend more time on this during his Wednesday keynote, so I’ll leave this topic aside for now. Just remember that Enterprise Manager loves Fusion Apps.

And what about Enterprise Manager?

We don’t have many attention-grabbing Enterprise Manager product announcements at Oracle Open World 2010, because we had a big launch of Enterprise Manager 11g earlier this year, in which a lot of new features were released. Technically this is no longer Oracle Open World news, but many attendees have not seen these features yet so we are busy giving demos, hands-on labs and presentations. From an application and middleware perspective, we focus on end-to-end management (e.g. from user experience to BTM to SOA management to Java diagnostics to SQL) for faster resolution, application lifecycle integration (provisioning, configuration management, testing) for lower TCO and unified coverage of all the key parts of the Oracle portfolio for productivity and reliability. We are also sharing some plans and our vision on topics such as application management, Cloud, support integration etc. But in this post, I have chosen to only focus on new product announcements. Things that were not publicly known 48 hours ago. I am also not covering JavaOne (see Alexis). There is just too much going on this week…

Just kidding, we like it this way. And so do the customers I’ve been talking to.

Post

The PaaS Lament: In the Cloud, application administrators should administrate applications

In Application Mgmt,Cloud Computing,Everything,IT Systems Mgmt,Manageability,Mgmt integration,Middleware,PaaS,Utility computing,Virtualization on September 12, 2010 by @vambenepe

Some organizations just have “systems administrators” in charge of their applications. Others call out an “application administrator” role but it is usually overloaded: it doesn’t separate the application platform administrator from the true application administrator. The first one is in charge of the application runtime infrastructure (e.g. the application server, SOA tools, MDM, IdM, message bus, etc). The second is in charge of the applications themselves (e.g. Java applications and the various artifacts that are used to customize the middleware stack to serve the application).

In effect, I am describing something close to the split between the DBA and the application administrators. The first step is to turn this duo (app admin, DBA) into a triplet (app admin, platform admin, DBA). That would be progress, but such a triplet is not what I am really after as it is too strongly tied to a traditional 3-tier architecture. What we really need is a first-order separation between the application administrator and the infrastructure administrators (note the plural). And then, if needed, a second-order split between a handful of different infrastructure administrators, one of which may be a DBA (or a DBA++, having expanded to all data storage services, not just relational), another of which may be an application platform administrator.

There are two reasons for the current unfortunate amalgam of the “application administrator” and “application platform administrator” roles. A bad one and a good one.

The bad reason is a shortcoming of the majority of middleware products. While they generally do a good job on performance, reliability and developer productivity, they generally do a poor job at providing a clean separation of the performance/administration functions that are relevant to the runtime and those that are relevant to the deployed applications. Their usual role definitions are structured more along the lines of what actions you can perform than of what entities you can perform them on. From a runtime perspective, the applications are not well isolated from one another either, which means that in real life you have to consider the entire system (the middleware and all deployed applications) if you want to make changes in a safe way.
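As a rough illustration of the difference, here is a minimal sketch of role definitions scoped by entity as well as by action. It does not describe any real middleware product; the class names (Grant, Role), the action names and the entity labels ("runtime", "app:billing", "app:payroll") are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """Permission to perform one action on one entity."""
    action: str  # e.g. "restart", "view_logs", "patch"
    entity: str  # e.g. "runtime" or "app:billing" (hypothetical labels)


class Role:
    """A named set of (action, entity) grants."""

    def __init__(self, name, grants):
        self.name = name
        self.grants = frozenset(grants)

    def allows(self, action, entity):
        """True only if this exact action is granted on this exact entity."""
        return Grant(action, entity) in self.grants


# The application administrator is scoped to one deployed application.
billing_admin = Role("billing-app-admin", [
    Grant("restart", "app:billing"),
    Grant("view_logs", "app:billing"),
])

# The platform administrator is scoped to the shared runtime.
platform_admin = Role("platform-admin", [
    Grant("restart", "runtime"),
    Grant("patch", "runtime"),
])
```

An action-only model would hand "restart" to both roles without saying what may be restarted. Here billing_admin.allows("restart", "app:billing") holds while billing_admin.allows("restart", "runtime") does not, which is the kind of separation the two roles need.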

The good reason for the current lack of separation between application administrators and middleware administrators is that middleware products have generally done a good job of supporting development innovation and optimization. Frameworks appear and evolve to respond to the challenges encountered by developers. Knobs and dials are exposed which allow heavy customization of the runtime to meet the performance and feature needs of a specific application. With developers driving what middleware is used and how it is used, it’s a natural consequence that the middleware is managed in tight correlation with how the application is managed.

Just like there is tension between DBAs and the “application people” (application administrators and/or developers), there is an inherent tension in the split I am advocating between application management and application platform management. The tension flows from the previous paragraph (the “good reason” for the current amalgam): a split between application administrators and application platform administrators would have the downside of dampening application platform innovation. Or rather it redirects it, in a mutation not unlike the move from artisans to industry. Rather than focusing on highly-specialized frameworks and highly-tuned runtimes, the application platform innovation is redirected towards the goals of extreme cost efficiency, high reliability, consistent security and scalability-by-default. These become the main objectives of the application platform administrator. In that perspective, the focus of the application architect and the application administrator needs to switch from taking advantage of the customizability of the runtime to optimize local-node performance towards taking advantage of the dynamism of the application platform to optimize for scalability and economy.

Innovation in terms of new frameworks and programming models takes a hit in that model, but there are ways to compensate. The services offered by the platform can be at different levels of generality. The more generic ones can be used to host innovative application frameworks and tools. For example, a highly-specialized service like an identity management system is hard to use for another purpose, but on the other hand a JVM can be used to host not just business applications but also platform-like things like Hadoop. They can run in the “application space” until they are mature enough to be incorporated in the “application platform space” and become the responsibility of the application platform administrator.

The need to keep a door open for innovation is part of why, as much as I believe in PaaS, I don’t think IaaS is going away anytime soon. Not only do we need VMs for backward-looking legacy apps, we also need polyvalent platforms, like a VM, for forward-looking purposes, to allow developers to influence platform innovation, based on their needs and ideas.

Forget the guillotine, maybe I should carry an axe around. That may help get the point across, that I want to slice application administrators in two, head to toe. PaaS is not a question of runtime. It’s a question of administrative roles.