William Vambenepe's blog

IT management in a changing IT world

Archive for the 'Research' Category

31
Mar
2008

Where will you be when the Semantic Web gets Grid’ed?

by William Vambenepe

I see the tide rising for semantic technologies. On the other hand, I wonder if they don’t need to fail in order to succeed.

Let’s use the Grid effort as an example. By “Grid effort” I mean the work that took place in and around OGF (or GGF as it was known before its merger w/ EGA). That community, mostly made of researchers and academics, was defining “utility computing” and creating related technology (e.g. OGSA, OGSI, GridFTP, JSDL, SAGA as specs, Globus and Platform as implementations) when Amazon was still a bookstore. There was an expectation that, as large-scale, flexible, distributed computing became a more pressing need for the industry at large, the Grid vision and technology would find their way into the broader market. That’s probably why IBM (and to a lesser extent HP) invested in the effort. Instead, what we are seeing is a new approach to utility computing (marketed as “cloud computing”), delivered by Amazon and others. It addresses utility computing with a different technology than Grid. With X86 virtualization as a catalyst, “cloud computing” delivers flexible, large-scale computing capabilities in a way that, to the users, looks a lot like their current environment. They still have servers with operating systems and applications on them. It’s not as elegant and optimized as service factories, service references (GSR), service handle (GSH), etc but it maps a lot better to administrators’ skills and tools (and to running the current code unchanged). Incremental changes with quick ROI beat paradigm shifts 9 times out of 10.

Is this indicative of what is going to happen with semantic technologies? Let’s break it down chronologically:

  1. Trailblazers (often faced with larger/harder problems than the rest of us) come up with a vision and a different way to think about what computers can do (e.g. the “computers -> compute grid” transition).
  2. They develop innovative technology, with a strong theoretical underpinning (OGSA-BES and those listed above).
  3. There are some successful deployments, but the adoption is mostly limited to a few niches. It is seen as too complex and too different from current practices for broad adoption.
  4. Outsiders use incremental technology to deliver 80% of the vision with 20% of the complexity. Hype and adoption ensue.

If we are lucky, the end result will look more like the nicely abstracted utility computing vision than the “did you patch your EC2 Xen images today” cloud computing landscape. But that’s a necessary step that Grid computing failed to leapfrog.

Semantic web technologies can easily be mapped to the first three bullets. Replace “computers -> computer grid” with “documents/data -> information” in the first one. Fill in RDF, RDFS, OWL (with all its flavors), SPARQL etc as counterparts to OGSA-BES and friends in the second. For the third, consider life sciences and defense as niche markets in which semantic technologies are seeing practical adoption. What form will bullet #4 take for semantic technology (e.g. who is going to be the EC2 of semantic technology)? Or is this where it diverges from Grid and instead gets adopted in its “original” form?

10
Jan
2008

An interesting business process query language

by William Vambenepe

While doing some research on the different ways to probe and squeeze business process definitions to extract insight relevant for IT management I ran into this very interesting paper: Querying Business Processes. It defines a query language (called BP-QL) to query process definitions. Not much in common with CMDB Federation at first sight, and CMDBf was not on my mind at the time. Until I looked at the description of the query language that the researchers came up with. It is strikingly similar to the CMDBf query language. This is not very surprising since both are graph-based query languages that rely on patterns (where the patterns mix topological aspects with constraints on node/link properties).

CMDBf is more complete in some respects. It supports properties on the relationships, not just the items. The “depthLimit” element provide more control than BP-QL’s double-headed edges. BP-QL has its own extra features, including support for joins (something we discussed in CMDBf and that could be added to the specification) and negation at the graph level (e.g. A and B are not connected by any relationship of type “foo”, which may be useful but one should remember that CMDB discovery is rarely guaranteed to be comprehensive so an open-world approach is often preferable).

Assuming a suitable CMDB model for business processes, a CMDBf-compliant CMDB should cover many of the simpler use cases addressed by BP-QL. And reciprocally, the more advanced features in BP-QL are not really specific to business process definitions (even though that’s the scope of the paper) and could well be applied to CMDBf. I was also very interested by the BP-QL “compact representation” and the implementation choices. I hadn’t heard of Active XML before, something to look into especially if, as the paper hints, it does a better job than XQuery at dealing with idrefs. And Active XML introduces some interesting federation (or at least distribution) capabilities that are not currently exploited by BP-QL but which I find intriguing and which reinforce the parallel with the declared goal of CMDBf.

Is this similarity between the query languages just an interesting pattern to notice? Or is there more to it? The parallel between BP-QL and CMDBf invites the question of whether one should model business processes in a CMDB. And if so, is a business process represented by just one CI or do you break it down into a model similar to the one the BP-QL query language works on? You would need to go that far if you wanted to use queries to the CMDB to answer questions such as those handled by the BP-QL engine. And by doing this in the context of a CMDB that contains a lot more than just process definitions, you’d be able to enrich the queries with considerations from other domains, such as application or host topology. Modeling business process steps/activities may seem like very fine-grained modeling for a CMDB, but isn’t this part of the sales pitch for federated CMDBs, that participants in the federation can provide different levels of granularity? Of course, CMDB federation might never work out. If it does work and if we use it that way, we are not talking about just supporting change management processes (which are more likely to take place at the level of the overall process definition than the individual step) but rather about management integration for a wide variety of use cases. If that means we need to drop the term CMDB along the way (and leave it for the sole usage of the IT process people), I am more than happy to oblige.

[UPDATE on 2008/01/11: Prof. Milo pointed me to this follow-up paper that proposes a similar looking query language except that this time it is targeted at monitoring process instances rather than analyzing process definitions. And the monitoring runs as a set of BPEL processes within the monitored BPEL engine. Her group is doing some very interesting work.]

28
Sep
2006

WWW2007 call for papers

by William Vambenepe

I am on the program committee this year again for the Web services track at International World Wide Web Conference. So please send us some great papers.

19
Jul
2005

A map to federated IT model repositories

by William Vambenepe

Using scissors and tape, one can stitch street maps and road maps together to obtain an aggregated map showing how to go from downtown Palo Alto to downtown San Francisco. The equivalent in IT management is to stitch together different model repositories by federating them, as a way to get a complete view of an IT system of interest. As we go about creating the infrastructure for model federation, there is a lot to be learned from the evolution of street maps.

Let’s go back to paper maps for a minute. A map of the Bay Area will tell me what highways to take to go from Palo Alto to SF. But it won’t help me get from a specific house in Palo Alto to the highway and once in SF it won’t help me get from the highway to a specific restaurant. For this, I need to find maps of downtown Palo Alto and downtown SF and somehow stitch the three maps together for an end to end view. Of course all these maps have different orientations, different scales, partial overlap, different legends, etc. Compare this to using Google maps which covers the entire itinerary and allows the user to zoom in and out at will.

Let’s now go back to IT management. In order to make IT systems more adaptable, the level of automation in their management must drastically increase. This requires simplification. Trying to capture all the complexity of a system in one automation point is neither scalable nor maintainable. But one cannot simply wave a wand and make a system simpler. The basic building blocks of IT are not getting simpler: the number of transistors on a chip is going up, the number of lines of code in an application is going up, the number of data items in a customer record is going up. Literal simplification would be going back to mechanical calculators and paper records… What I really mean by simplification is decomposing the system into decision points (or control points) that process information and take action at a certain level of granularity. For example, an “employee provisioning” control point is written in terms of “mail account provisioning” and “payroll addition”, not in terms of “increasing size of a DB table”. That’s simplification. Of course, someone needs to worry about allocating enough space in the database. There is another control point at that lower level of granularity. The challenge in front of us is to find a way to seamlessly integrate the models at these different levels of granularity. Because they are obviously linked. The performance and reliability of the “employee provisioning” service is affected by the performance and reliability of the database. Management services need to be able to navigate across these models. We need to do this in a way inspired by Google Maps, not by stitching paper maps. Let’s use the difference between these two types of maps to explore the requirements of infrastructure for IT models federation.

Right level of granularity

The publishers of a paper map decide, based on space constraints, which streets are shown. With Google Maps, as you zoom in and out smaller streets show up and disappear. Similarly, an IT model should be exposed in a way that allows the consumer to decide what level of granularity is presented.

Machine-readable

Paper maps are for people, Google Maps can be used by people and programs. IT models must be exposed in a way that doesn’t assume a human sitting in front of a console is the consumer of the information.

Open to metadata and additional info

To add information to a paper map, you have to retrieve the information, find out where on the map it belongs and manually add it there. Google map lets you overlay any information directly on top of the map (see Housingmaps.com). Similarly, IT model federation requires the ability to link metadata and extra model information about model elements to the representation of the model, even if that information resides outside the model repository.

Standards-based

Google provides documentation for its maps service. It’s not a standard, but at least it’s documented and publicly accessible. Presumably they are not going to sue their users for patent violation. Time will tell whether this is good enough for the mapping world. In the IT management world, this will not be enough. Customers demand real standards to protect their investment, speed up deployment and prevent unneeded integration costs. Vendors need it as protection (however imperfect) against patent threats, as a way to focus their energy on value-added products rather than plumbing and just because smart customers demand it.

Seamless integration

I don’t know if Google gets all its mapping information from one source or from several, and I don’t need to know it. As I move North, South, East, West and zoom in and out, it is a seamless experience. The same needs to be true in the way federated models are exposed. The framework through which this takes place should provide seamless integration across sources. And simplify as much as possible discovery of the right source for the information needed.

Support for different metamodels

Not all maps use the same classification and legend. Similarly, not all models repositories use the same meta-model. Two meta-models might have the notion of “owner” of a resource but call it differently and provide different information about the owner. Seamless integration requires support for model bridging.

Searchable

Federated models repositories need to be efficiently searchable.

Up to date

Paper maps age quickly. Google Maps is more likely (but not guaranteed) to be up to date. Federated models must be as close a representation of the real state of the system as possible.

Secure

As you are composing information from different sources, the seamless navigation among these resources needs to be matched by similar seamless integration in the way the access is secured, using security federation.

Note 1: When I talk about navigating “models” in this entry, I am referring to an instance model that describes a system. For example, such a “model” can be a set of applications along with the containers in which they live, the OS these containers run on and the servers that host them. That’s one “model”. If the information is distributed among a set of MBean servers, CMOM, etc, then this is a federated model. I know some people don’t call this a “model” and I am not married to this word. Based on the analogy used in this entry, “system map” and “federated system map” would work just as well.

Note 2: This entry corresponds to the presentation I gave when participating in a panel (which I also moderated) on “Quality of Manageability of Web Services” at the IEEE ICWS 2005 conference in Orlando last week. The other speakers were Dr. Hemant Jain (UW Milwaukee), Dr. Hai Jin (Huazhong University of Science and Technology), Heather Kreger (IBM), Dr. Geng Lin (Cisco). Unfortunately, the presentation was made quite challenging when (1) the microphone stopped working (it was in a large ballroom), (2) a rainstorm had us compete with the sound of thunder, (3) torrential rain started to fall on the roof of our one-story building, turning the room into a resonance box and, to top it off, (4) the power went off completely in the entire hotel leaving me to try to continue talking by the light of the laptop screen and the emergency exit lights…. With all this plus time constraints, I am not sure I did a good job making my point clear. This entry hopefully does a better job than the presentation. The conference was quite interesting. In addition to the panel I also presented a co-authored paper based on an HP Lab project. The paper is titled “Dealing with Scale and Adaptation of Global Web Services Management”. The conference also allowed me to finally meet Steve Loughran face to face. Congrats to Steve and Ed Smith for being awarded the “Best paper” award for “Rethinking the Java SOAP stack“, also known as “the Alpine paper”. When a papers gets a nickname you know it is having an impact…

Categories