Category Archives: Research

The necessity of PaaS: Will Microsoft be the Singapore of Cloud Computing?

From ancient Mesopotamia to, more recently, Holland, Switzerland, Japan, Singapore and Korea, the success of many societies has been in part credited to their lack of natural resources. The theory being that it motivated them to rely on human capital, commerce and innovation rather than resource extraction. This approach eventually put them ahead of their better-endowed neighbors.

A similar dynamic may well propel Microsoft ahead in PaaS (Platform as a Service): IaaS with Windows is so painful that it may force Microsoft to focus on PaaS. The motivation is strong to “go up the stack” when the alternative is to cultivate the arid land of Windows-based IaaS.

I should disclose that I work for one of Microsoft’s main competitors, Oracle (though this blog only represents personal opinions), and that I am not an expert Windows system administrator. But I have enough experience to have seen some of the many reasons why Windows feels like a much less IaaS-friendly environment than Linux: e.g. the lack of SSH, the cumbersomeness of RDP, the constraints of the Windows license enforcement system, the Windows update mechanism, the immaturity of scripting, the difficulty of managing Windows from non-Windows machines (despite WS-Management), etc. For a simple illustration, go to EC2 and compare, between a Windows AMI and a Linux AMI, the steps (and time) needed to get from selecting an image to the point where you’re logged in and in control of a VM. And if you think that’s bad, things get even worse when we’re not just talking about a few long-lived Windows server instances in the Cloud but a highly dynamic environment in which all steps have to be automated and repeatable.

I am not saying that there aren’t ways around all this, just like it’s not impossible to grow grapes in Holland. It’s just usually not worth the effort. This recent post by RightScale illustrates both how hard it is and that it is possible if you’re determined. The question is what benefits you get from Windows guests in IaaS and whether they justify the extra work, as well as the additional license fee (while many of the issues are technical, others stem more from Microsoft’s refusal to acknowledge that the OS is a commodity). [Side note: this discussion is about Windows as a guest OS and not about the comparative virtues of Hyper-V, Xen-based hypervisors and VMware.]

Under the DSI banner, Microsoft has been working for a while on improving the management/automation infrastructure for Windows, with tools like PowerShell (which I like a lot). These efforts pre-date the Cloud wave but definitely help Windows try to hold its own on the IaaS battleground. Still, it’s an uphill battle compared with Linux. So it makes perfect sense for Microsoft to move the battle to PaaS.

Just like commerce and innovation will, in the long term, bring more prosperity than focusing on mining and agriculture, PaaS will, in the long term, yield more benefits than IaaS. Even though it’s harder at first. That’s the good news for Microsoft.

On the other hand, lack of natural resources is not a guarantee of success either (as many poor desert countries can testify) and Microsoft will have to fight to be successful in PaaS. But the work on Azure and many research efforts, like the “next-generation programming model for the cloud” (codename “Orleans”) that Mary Jo Foley revealed today, indicate that they are taking it very seriously. Their approach is not restricted by a VM-centric vision, which is often tempting for hypervisor and OS vendors. Microsoft’s move to PaaS is also facilitated by the fact that, while system administration and automation may not be a strength, development tools and application platforms are.

The forward-compatible Cloud will soon overshadow the backward-compatible Cloud and I expect Microsoft to play a role in it. They have to.

Filed under Application Mgmt, Automation, Azure, Cloud Computing, DevOps, Everything, IT Systems Mgmt, Linux, Manageability, Mgmt integration, Microsoft, Middleware, Oslo, PaaS, Research, Utility computing, WS-Management

The datacenter as a programmable entity

This is an exciting time for those who want to shrink the computer. They are having a field day playing with devices powered by Android, the iPhone’s Cocoa, Palm’s new WebOS, Windows Mobile, JavaFX (maybe one day) and, to a lesser extent, the Blackberry.

But times are good too for those who want to go the other way and program larger things rather than smaller ones. If you are interested in thinking about datacenters as a programmable entity, you are in luck: for those long plane trips when you run out of battery, bring a printout of the proceedings of the research meeting organized last year in Cambridge by Microsoft and HP Labs, titled “The Rise and Rise of the Declarative Datacentre”. When you’re back online, go check the presentations on the site.

And if you liked Paul Anderson’s “Programming the Data Centre” presentation at the Cambridge meeting, you can also read his “Programming the Virtual Infrastructure” slides from LISA 08. More LISA 08 presentations here.

I got the link to Paul Anderson’s second presentation (and maybe also the first one, some time ago) from Steve Loughran, who also adds a few comments, starting with the debate between the declarative and procedural approaches. This question has plenty of down-the-road implications. There is a lot to like about the declarative approach in terms of composition, manageability and more generally as a framework to manage complexity via encapsulation.

A simple analogy for this debate is to think about driving directions. The declarative approach is for me to give you a map with a circle on it showing where my house is and let you find your way. It’s more work for you but it’s also more resilient. The procedural approach is for me to give you a set of turn-by-turn directions, based on where you are coming from. If you miss one turn or if one road happens to be blocked at the time, then you’re in trouble.

That being said, there are enough powerful and useful PowerShell or Puppet scripts out there to give you pause before discarding procedural approaches. While the declarative (aka “desired state”, “policy-driven” and sometimes “model-based”) approach looks a lot more elegant, at this point in time the real work usually gets done via scripts, deployment procedures or the like.
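To make the driving-directions analogy a bit more concrete, here is a minimal Python sketch of the two styles. Everything in it is hypothetical (the `state` dictionary, the `webserver` package, the function names); it is not how PowerShell, Puppet or any other tool works internally, just an illustration of "follow the steps" versus "converge on a desired state".

```python
# Hypothetical example: all names are illustrative, not taken from any real tool.

def procedural_setup(state):
    """Turn-by-turn directions: each step assumes the expected starting point."""
    state["packages"].append("webserver")        # step 1: install
    state["services"]["webserver"] = "running"   # step 2: start
    return state


def converge(state, desired):
    """Desired state: compare where we are with the target and fix only the gaps."""
    actions = []
    for pkg in desired["packages"]:
        if pkg not in state["packages"]:
            state["packages"].append(pkg)
            actions.append(f"install {pkg}")
    for svc, status in desired["services"].items():
        if state["services"].get(svc) != status:
            state["services"][svc] = status
            actions.append(f"set {svc} to {status}")
    return actions


desired = {"packages": ["webserver"], "services": {"webserver": "running"}}

# A starting point the script author did not anticipate: package already installed.
state = {"packages": ["webserver"], "services": {}}
first_run = converge(state, desired)    # only the missing step is performed
second_run = converge(state, desired)   # nothing left to do

# The procedural version blindly repeats step 1 from the same starting point.
blind = procedural_setup({"packages": ["webserver"], "services": {}})
```

The declarative function costs more to write (it has to inspect the current state), which is the "more work for you" part of the map analogy, but it copes with a starting point nobody planned for.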

In addition to academia, the competition between these approaches is playing out right now between all the companies and products that want to help you automate and manage your cloud deployments (public and/or private): for example, RightScale scripts (custom scripts and RightScripts, see here and here) versus the more declarative ECML/EDML documents from Elastra. Or the very declarative approach taken by SmartFrog.

Filed under Automation, Cloud Computing, Conference, Desired State, Everything, Grid, Implementation, Research, Tech, Utility computing

Where will you be when the Semantic Web gets Grid’ed?

I see the tide rising for semantic technologies. On the other hand, I wonder if they don’t need to fail in order to succeed.

Let’s use the Grid effort as an example. By “Grid effort” I mean the work that took place in and around OGF (or GGF as it was known before its merger with EGA). That community, mostly made of researchers and academics, was defining “utility computing” and creating related technology (e.g. OGSA, OGSI, GridFTP, JSDL, SAGA as specs, Globus and Platform as implementations) when Amazon was still a bookstore. There was an expectation that, as large-scale, flexible, distributed computing became a more pressing need for the industry at large, the Grid vision and technology would find their way into the broader market. That’s probably why IBM (and to a lesser extent HP) invested in the effort. Instead, what we are seeing is a new approach to utility computing (marketed as “cloud computing”), delivered by Amazon and others. It addresses utility computing with a different technology than Grid. With X86 virtualization as a catalyst, “cloud computing” delivers flexible, large-scale computing capabilities in a way that, to the users, looks a lot like their current environment. They still have servers with operating systems and applications on them. It’s not as elegant and optimized as service factories, service references (GSR), service handles (GSH), etc., but it maps a lot better to administrators’ skills and tools (and to running the current code unchanged). Incremental changes with quick ROI beat paradigm shifts 9 times out of 10.

Is this indicative of what is going to happen with semantic technologies? Let’s break it down chronologically:

  1. Trailblazers (often faced with larger/harder problems than the rest of us) come up with a vision and a different way to think about what computers can do (e.g. the “computers -> compute grid” transition).
  2. They develop innovative technology, with a strong theoretical underpinning (OGSA-BES and those listed above).
  3. There are some successful deployments, but the adoption is mostly limited to a few niches. It is seen as too complex and too different from current practices for broad adoption.
  4. Outsiders use incremental technology to deliver 80% of the vision with 20% of the complexity. Hype and adoption ensue.

If we are lucky, the end result will look more like the nicely abstracted utility computing vision than the “did you patch your EC2 Xen images today” cloud computing landscape. But that’s a necessary step that Grid computing failed to leapfrog.

Semantic web technologies can easily be mapped to the first three bullets. Replace “computers -> compute grid” with “documents/data -> information” in the first one. Fill in RDF, RDFS, OWL (with all its flavors), SPARQL etc. as counterparts to OGSA-BES and friends in the second. For the third, consider life sciences and defense as niche markets in which semantic technologies are seeing practical adoption. What form will bullet #4 take for semantic technology (e.g. who is going to be the EC2 of semantic technology)? Or is this where it diverges from Grid and instead gets adopted in its “original” form?

Filed under Everything, Grid, HP, IBM, RDF, Research, Semantic tech, Specs, Standards, Tech, Utility computing, Virtualization

An interesting business process query language

While doing some research on the different ways to probe and squeeze business process definitions to extract insight relevant for IT management, I ran into this very interesting paper: Querying Business Processes. It defines a query language (called BP-QL) to query process definitions. Not much in common with CMDB Federation at first sight, and CMDBf was not on my mind at the time. Until I looked at the description of the query language that the researchers came up with. It is strikingly similar to the CMDBf query language. This is not very surprising since both are graph-based query languages that rely on patterns (where the patterns mix topological aspects with constraints on node/link properties).

CMDBf is more complete in some respects. It supports properties on the relationships, not just the items. The “depthLimit” element provides more control than BP-QL’s double-headed edges. BP-QL has its own extra features, including support for joins (something we discussed in CMDBf and that could be added to the specification) and negation at the graph level (e.g. A and B are not connected by any relationship of type “foo”, which may be useful but one should remember that CMDB discovery is rarely guaranteed to be comprehensive so an open-world approach is often preferable).
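For readers who haven’t seen this style of query, here is a rough Python sketch of the idea both languages share: a pattern made of constraints on the items, constraints on the relationships between them, and a limit on how many hops may separate the two ends. This is neither CMDBf nor BP-QL syntax; the graph, the `match` function and the `depth_limit` parameter are all made up for illustration.

```python
from collections import deque

# Hypothetical in-memory graph: items with properties, typed relationships between them.
items = {
    "order-process": {"type": "BusinessProcess"},
    "order-app": {"type": "Application"},
    "app-server-1": {"type": "Container"},
    "host-42": {"type": "Host", "os": "Linux"},
}
relationships = [
    ("order-process", "order-app", {"type": "dependsOn"}),
    ("order-app", "app-server-1", {"type": "dependsOn"}),
    ("app-server-1", "host-42", {"type": "dependsOn"}),
]


def match(source_filter, target_filter, rel_filter, depth_limit):
    """Return (source, target) pairs linked by a chain of at most depth_limit
    relationships that all satisfy rel_filter."""
    results = []
    for start, props in items.items():
        if not source_filter(props):
            continue
        # Breadth-first walk, tracking how many hops were taken so far.
        queue = deque([(start, 0)])
        seen = {start}
        while queue:
            node, depth = queue.popleft()
            if depth > 0 and target_filter(items[node]):
                results.append((start, node))
            if depth == depth_limit:
                continue  # do not follow relationships past the limit
            for src, dst, rel_props in relationships:
                if src == node and dst not in seen and rel_filter(rel_props):
                    seen.add(dst)
                    queue.append((dst, depth + 1))
    return results


def is_process(props):
    return props["type"] == "BusinessProcess"


def is_host(props):
    return props["type"] == "Host"


def depends_on(rel_props):
    return rel_props["type"] == "dependsOn"


within_two_hops = match(is_process, is_host, depends_on, depth_limit=2)
within_three_hops = match(is_process, is_host, depends_on, depth_limit=3)
```

With this toy data the host sits three hops away from the process, so the first query comes back empty and the second one finds the pair. Note that the relationship filter looks at properties on the relationship itself, which is the kind of constraint mentioned above as a CMDBf feature.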

Assuming a suitable CMDB model for business processes, a CMDBf-compliant CMDB should cover many of the simpler use cases addressed by BP-QL. And reciprocally, the more advanced features in BP-QL are not really specific to business process definitions (even though that’s the scope of the paper) and could well be applied to CMDBf. I was also very interested by the BP-QL “compact representation” and the implementation choices. I hadn’t heard of Active XML before, something to look into especially if, as the paper hints, it does a better job than XQuery at dealing with idrefs. And Active XML introduces some interesting federation (or at least distribution) capabilities that are not currently exploited by BP-QL but which I find intriguing and which reinforce the parallel with the declared goal of CMDBf.

Is this similarity between the query languages just an interesting pattern to notice? Or is there more to it? The parallel between BP-QL and CMDBf invites the question of whether one should model business processes in a CMDB. And if so, is a business process represented by just one CI or do you break it down into a model similar to the one the BP-QL query language works on? You would need to go that far if you wanted to use queries to the CMDB to answer questions such as those handled by the BP-QL engine. And by doing this in the context of a CMDB that contains a lot more than just process definitions, you’d be able to enrich the queries with considerations from other domains, such as application or host topology. Modeling business process steps/activities may seem like very fine-grained modeling for a CMDB, but isn’t this part of the sales pitch for federated CMDBs, that participants in the federation can provide different levels of granularity? Of course, CMDB federation might never work out. If it does work and if we use it that way, we are not talking about just supporting change management processes (which are more likely to take place at the level of the overall process definition than the individual step) but rather about management integration for a wide variety of use cases. If that means we need to drop the term CMDB along the way (and leave it for the sole usage of the IT process people), I am more than happy to oblige.

[UPDATE on 2008/01/11: Prof. Milo pointed me to this follow-up paper that proposes a similar looking query language except that this time it is targeted at monitoring process instances rather than analyzing process definitions. And the monitoring runs as a set of BPEL processes within the monitored BPEL engine. Her group is doing some very interesting work.]

Filed under Business Process, CMDB Federation, CMDBf, Everything, Graph query, Mgmt integration, Query, Research

WWW2007 call for papers

I am on the program committee this year again for the Web services track at the International World Wide Web Conference. So please send us some great papers.

Filed under Everything, Research

A map to federated IT model repositories

Using scissors and tape, one can stitch street maps and road maps together to obtain an aggregated map showing how to go from downtown Palo Alto to downtown San Francisco. The equivalent in IT management is to stitch together different model repositories by federating them, as a way to get a complete view of an IT system of interest. As we go about creating the infrastructure for model federation, there is a lot to be learned from the evolution of street maps.

Let’s go back to paper maps for a minute. A map of the Bay Area will tell me what highways to take to go from Palo Alto to SF. But it won’t help me get from a specific house in Palo Alto to the highway and once in SF it won’t help me get from the highway to a specific restaurant. For this, I need to find maps of downtown Palo Alto and downtown SF and somehow stitch the three maps together for an end-to-end view. Of course all these maps have different orientations, different scales, partial overlap, different legends, etc. Compare this to using Google Maps, which covers the entire itinerary and allows the user to zoom in and out at will.

Let’s now go back to IT management. In order to make IT systems more adaptable, the level of automation in their management must drastically increase. This requires simplification. Trying to capture all the complexity of a system in one automation point is neither scalable nor maintainable. But one cannot simply wave a wand and make a system simpler. The basic building blocks of IT are not getting simpler: the number of transistors on a chip is going up, the number of lines of code in an application is going up, the number of data items in a customer record is going up. Literal simplification would be going back to mechanical calculators and paper records… What I really mean by simplification is decomposing the system into decision points (or control points) that process information and take action at a certain level of granularity. For example, an “employee provisioning” control point is written in terms of “mail account provisioning” and “payroll addition”, not in terms of “increasing size of a DB table”. That’s simplification. Of course, someone needs to worry about allocating enough space in the database. There is another control point at that lower level of granularity. The challenge in front of us is to find a way to seamlessly integrate the models at these different levels of granularity. Because they are obviously linked. The performance and reliability of the “employee provisioning” service is affected by the performance and reliability of the database. Management services need to be able to navigate across these models. We need to do this in a way inspired by Google Maps, not by stitching paper maps. Let’s use the difference between these two types of maps to explore the requirements of infrastructure for IT models federation.

Right level of granularity

The publishers of a paper map decide, based on space constraints, which streets are shown. With Google Maps, as you zoom in and out smaller streets show up and disappear. Similarly, an IT model should be exposed in a way that allows the consumer to decide what level of granularity is presented.

Machine-readable

Paper maps are for people, Google Maps can be used by people and programs. IT models must be exposed in a way that doesn’t assume a human sitting in front of a console is the consumer of the information.

Open to metadata and additional info

To add information to a paper map, you have to retrieve the information, find out where on the map it belongs and manually add it there. Google Maps lets you overlay any information directly on top of the map (see Housingmaps.com). Similarly, IT model federation requires the ability to link metadata and extra model information about model elements to the representation of the model, even if that information resides outside the model repository.

Standards-based

Google provides documentation for its maps service. It’s not a standard, but at least it’s documented and publicly accessible. Presumably they are not going to sue their users for patent violation. Time will tell whether this is good enough for the mapping world. In the IT management world, this will not be enough. Customers demand real standards to protect their investment, speed up deployment and prevent unneeded integration costs. Vendors need it as protection (however imperfect) against patent threats, as a way to focus their energy on value-added products rather than plumbing and just because smart customers demand it.

Seamless integration

I don’t know if Google gets all its mapping information from one source or from several, and I don’t need to know it. As I move North, South, East, West and zoom in and out, it is a seamless experience. The same needs to be true in the way federated models are exposed. The framework through which this takes place should provide seamless integration across sources. It should also simplify, as much as possible, the discovery of the right source for the information needed.

Support for different metamodels

Not all maps use the same classification and legend. Similarly, not all model repositories use the same meta-model. Two meta-models might have the notion of “owner” of a resource but name it differently and provide different information about the owner. Seamless integration requires support for model bridging.
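Here is what the “owner” case could look like in a few lines of Python. The two record shapes, the field names (`owner`, `responsibleParty`) and the bridge functions are all invented for this sketch; real meta-models differ in many more ways than this.

```python
# Two hypothetical repositories describing the owner of a resource differently.
record_a = {"id": "db-7", "owner": "jane@example.com"}
record_b = {
    "id": "host-42",
    "responsibleParty": {"name": "Raj", "email": "raj@example.com"},
}


def bridge_a(record):
    """Repository A only stores an e-mail address for the owner."""
    return {"id": record["id"], "owner_email": record["owner"], "owner_name": None}


def bridge_b(record):
    """Repository B calls the owner 'responsibleParty' and also stores a name."""
    party = record["responsibleParty"]
    return {
        "id": record["id"],
        "owner_email": party["email"],
        "owner_name": party["name"],
    }


# The consumer works with one shape, whichever repository the data came from.
federated = [bridge_a(record_a), bridge_b(record_b)]
owners = {r["id"]: r["owner_email"] for r in federated}
```

The bridge makes the gap explicit rather than hiding it: repository A has no owner name to offer, so the common shape carries `None` instead of a guess.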

Searchable

Federated model repositories need to be efficiently searchable.

Up to date

Paper maps age quickly. Google Maps is more likely (but not guaranteed) to be up to date. Federated models must be as close a representation of the real state of the system as possible.

Secure

As you are composing information from different sources, the seamless navigation among these resources needs to be matched by similar seamless integration in the way the access is secured, using security federation.

Note 1: When I talk about navigating “models” in this entry, I am referring to an instance model that describes a system. For example, such a “model” can be a set of applications along with the containers in which they live, the OS these containers run on and the servers that host them. That’s one “model”. If the information is distributed among a set of MBean servers, CIMOMs, etc., then this is a federated model. I know some people don’t call this a “model” and I am not married to this word. Based on the analogy used in this entry, “system map” and “federated system map” would work just as well.

Note 2: This entry corresponds to the presentation I gave when participating in a panel (which I also moderated) on “Quality of Manageability of Web Services” at the IEEE ICWS 2005 conference in Orlando last week. The other speakers were Dr. Hemant Jain (UW Milwaukee), Dr. Hai Jin (Huazhong University of Science and Technology), Heather Kreger (IBM) and Dr. Geng Lin (Cisco). Unfortunately, the presentation was made quite challenging when (1) the microphone stopped working (it was in a large ballroom), (2) a rainstorm had us compete with the sound of thunder, (3) torrential rain started to fall on the roof of our one-story building, turning the room into a resonance box and, to top it off, (4) the power went off completely in the entire hotel, leaving me to try to continue talking by the light of the laptop screen and the emergency exit lights… With all this plus time constraints, I am not sure I did a good job making my point clear. This entry hopefully does a better job than the presentation. The conference was quite interesting. In addition to the panel I also presented a co-authored paper based on an HP Labs project. The paper is titled “Dealing with Scale and Adaptation of Global Web Services Management”. The conference also allowed me to finally meet Steve Loughran face to face. Congrats to Steve and Ed Smith for being awarded the “Best paper” award for “Rethinking the Java SOAP stack”, also known as “the Alpine paper”. When a paper gets a nickname you know it is having an impact…

Filed under Everything, Research, Tech