Category Archives: Specs

March 31, 2008

Where will you be when the Semantic Web gets Grid’ed?

I see the tide rising for semantic technologies. On the other hand, I wonder if they don’t need to fail in order to succeed.

Let’s use the Grid effort as an example. By “Grid effort” I mean the work that took place in and around OGF (or GGF as it was known before its merger w/ EGA). That community, mostly made of researchers and academics, was defining “utility computing” and creating related technology (e.g. OGSA, OGSI, GridFTP, JSDL, SAGA as specs, Globus and Platform as implementations) when Amazon was still a bookstore. There was an expectation that, as large-scale, flexible, distributed computing became a more pressing need for the industry at large, the Grid vision and technology would find their way into the broader market. That’s probably why IBM (and to a lesser extent HP) invested in the effort. Instead, what we are seeing is a new approach to utility computing (marketed as “cloud computing”), delivered by Amazon and others. It addresses utility computing with a different technology than Grid. With X86 virtualization as a catalyst, “cloud computing” delivers flexible, large-scale computing capabilities in a way that, to the users, looks a lot like their current environment. They still have servers with operating systems and applications on them. It’s not as elegant and optimized as service factories, service references (GSR), service handle (GSH), etc but it maps a lot better to administrators’ skills and tools (and to running the current code unchanged). Incremental changes with quick ROI beat paradigm shifts 9 times out of 10.

Is this indicative of what is going to happen with semantic technologies? Let’s break it down chronologically:

Trailblazers (often faced with larger/harder problems than the rest of us) come up with a vision and a different way to think about what computers can do (e.g. the “computers -> compute grid” transition).
They develop innovative technology, with a strong theoretical underpinning (OGSA-BES and those listed above).
There are some successful deployments, but the adoption is mostly limited to a few niches. It is seen as too complex and too different from current practices for broad adoption.
Outsiders use incremental technology to deliver 80% of the vision with 20% of the complexity. Hype and adoption ensue.

If we are lucky, the end result will look more like the nicely abstracted utility computing vision than the “did you patch your EC2 Xen images today” cloud computing landscape. But that’s a necessary step that Grid computing failed to leapfrog.

Semantic web technologies can easily be mapped to the first three bullets. Replace “computers -> computer grid” with “documents/data -> information” in the first one. Fill in RDF, RDFS, OWL (with all its flavors), SPARQL etc as counterparts to OGSA-BES and friends in the second. For the third, consider life sciences and defense as niche markets in which semantic technologies are seeing practical adoption. What form will bullet #4 take for semantic technology (e.g. who is going to be the EC2 of semantic technology)? Or is this where it diverges from Grid and instead gets adopted in its “original” form?

1 Comment

Filed under Everything, Grid, HP, IBM, RDF, Research, Semantic tech, Specs, Standards, Tech, Utility computing, Virtualization

February 26, 2008

Unintentional comedy

With these two words, “unintentional comedy”, Damon Edwards finds the perfect description of the IT management standards scene. In a very restrained way, he captures it all:

the irony,
the predictability,
the unstated rules of the genre,
the stereotypical roles that keep reappearing: the bully, the calculator, the rambler, the simple-minded (that’s the one I used to play),
the pretentiousness,
the importance of appearances,
the necessity of conflict and tension,
the repetitiveness,
and the fact that after a while people tend to behave as caricatures of themselves.

I don’t mind being (with many others) the butt of the joke when the joke is right on. Plus, I made a similar analogy in the past: Commedia dell (stand)arte (once there, make sure you also follow the link to Umit’s verses).

To be fair, I don’t think this is limited to IT management standards. Other standard areas behave alike (OOXML vs. ODF anyone?). You can also see the bullet points above in action in many open source mailing lists. And most of all in the blogosphere. BTW Damon, why do you think the server for this blog is stage.vambenepe.com and not a more neutral blog.vambenepe.com? It’s not that I got mixed up between my staging server and my production server. It’s that I see a lot of comedy aspects to our part of the blogosphere and I wanted to acknowledge that I, like others, assume a persona on my blog and through it I play a role in a big comedy. Which is not as dismissive as it sounds, comedy can be an excellent vehicle to convey important and serious ideas. But we need people like you to remind us, from time to time, that comedy it is.

5 Comments

Filed under Everything, IT Systems Mgmt, Specs, Standards

February 23, 2008

SCA, OGSi and Spring from an IT management perspective

March starts next week and the middleware blogging bees are busy collecting OSGi-nectar, Spring-nectar, SCA-nectar, bringing it all back to the hive and seeing what kind of honey they can make from it.

Like James Governor, I had to train myself to stop associating OSGi with OGSI (which was the framework created by GGF, now OGF, to implement OGSA, and was – not very successfully – replaced with OASIS’s WSRF, want more acronyms?). Having established that OSGi does not relate to OGSI, how does it relate to SCA and Spring? What with the Sprint-OSGi integration and this call to integrate OSGi and SCA (something Paremus says they already do)? The third leg of the triangle (SCA-Spring integration) is included in the base SCA framework. Call this a disclosure or a plug as you prefer, I’ll note that many of my Oracle colleagues on the middleware side of the house are instrumental in these efforts (Hal, Greg, Khanderao, Dave…).

There is also a white paper (getting a little dated but still very much worth reading) that describes the potential integrations in this triangle in very clear and concrete terms (a rare achievement for this kind of exercise). It ends with “simplicity, flexibility, manageability, testability, reusability. A key combination for enterprise developers”. I am happy to grant the “flexibility” (thanks OSGi), “testability” (thanks Spring) and “reusability” (thanks SCA) claims. Not so for simplicity at this point unless you are one of the handful of people involved in all three efforts. As for the “manageability”, let’s call it “manageability potential” and remain friends.

That last part, manageability, is of course what interests me the most in this area. I mentioned this before in the context of SCA alone but the conjunction of SCA with Spring and/or OSGi only increases the potential. What happened with BPEL adoption provides a good illustration of this:

There are lots of JEE management tools and technologies out there, with different levels of impact on application performance (ideally low enough that they are suitable for production systems). The extent to which enterprise Java has been instrumented, probed and analyzed is unprecedented. These tools are often focused on the performance more than the configuration/dependency aspects of the application, partly because that’s easier to measure. And while they are very useful, they struggle with the task of relating what they measure to a business view of the application, especially in the case of composite applications with many shared components. Enter BPEL. Like SCA, BPEL wasn’t designed for manageability. It was meant for increased productivity, portability and flexibility. It was designed to support the SOA vision of service re-use and to allow more tasks to be moved from Java coding to infrastructure configuration. All this it helps with indeed. But at the same time, it also provides very useful metadata for application management. Both in terms of highlighting the application flow (through activities) and in terms of clarifying the dependencies and associated policies (through partner links). This allowed a new breed of application management tools to emerge that hungrily consumer BPEL process definitions and use them to better relate application management to the user-visible aspects of the application.

But the visibility provided by BPEL only goes so far, and soon the application management tools are back in bytecode instrumentation, heap analysis, transaction tracing, etc. Using a mix of standard mechanisms and “top secret”, “patent pending” tricks. In addition to all of their well-known benefits, SCA, OGSi and Spring also help fill that gap. They provide extra application metadata that can be used by application management tools to provide more application context to management tasks. A simple example is that SCA’s service/reference mechanism extends BPEL partner links to components not implemented with BPEL (and provides a more complete policy framework). Of course, all this metadata doesn’t just magically organize itself in an application management framework and there is a lot of work to harness its value (thus the “potential” qualifier I added to “manageability”). But SCA, OSGi and Spring can improve application management in ways similar to what BPEL does.

Here I am again, taking exciting middleware technologies and squeezing them to extract boring management value. But if you can, like me, get excited about these management aspects then you want to follow the efforts around the conjunction of these three technologies. I understand SCA, but I need to spend more time on OGSi and Spring. Maybe this post is my way of motivating myself to do it (I wish my mental processes were instrumented with better metadata so I could answer this question with more certainty – oh please shoot me now).

And while this is all exciting, part of me also wonders whether it’s not too early to risk connecting these specifications too tightly. I have seen too many “standards framework” kind of powerpoint slides that show how a bunch of under-development specifications would precisely work together to meet all the needs of the world. I may have even written one myself. If one thing is certain in that space, it’s that the failure rate is high and over-eager re-use and linkage between specifications kills. That was one of the errors of WSDM. For a contemporary version, look at this “Leveraging CMDBf” plan at Eclipse. I am very supportive of the effort to create an open-source implementation of the CMDBf specification, but mixing a bunch of other unproven and evolving specifications (in addition to CMDBf, I see WS-ResourceCatalog, SML and a “TBD” WS API which I can’t imagine will be anything other than WS-ResourceTransfer) is very risky. And of course IBM’s good old CBE. Was this HTML page auto-generated from an IBM “standards strategy” powerpoint document? But I digress…

Bonus question: what’s the best acronym to refer to OGSi+SCA+Spring. OSS? Taken (twice). SOS? Taken (and too desperate-sounding). SSO? Taken (twice). OS2? Taken. S2O? Available, as far as I can tell, but who wants a name so easily confused with the stinky and acid-rain causing sulfur dioxide (SO2)? Any suggestion? Did I hear J3EE in the back of the room?

10 Comments

Filed under Everything, IT Systems Mgmt, OSGi, SCA, Specs, Spring, Standards

February 18, 2008

JSR262 (JMX over WS-Management) public review

If you care about exposing or accessing MBeans via WS-Management, now is a good time to read the public review draft of the JSR262 spec.

JSR262 is very much on the “manageability” side of the “manageability vs. management integration” chasm, which is not the most exciting side to me. But more commonality in manageability protocols is good, I guess, and this falls inside the WS-Management window of opportunity so it may help tip the balance.

There is also a nice white paper which does a nice job of retracing the history from JMX to JMX Remote API to JSR 262 and the different efforts along the way to provide access to the JMX API from outside of the local JVM. The white paper is actually too accurate for its own good: it explains well that models and protocols should be orthogonal (there is a section titled “The Holy Grail of Management: Model, Data and Protocol Independence”) which only highlights the shortcomings of JSR262 in that regard.

In a what looks from the outside like a wonderful exercise of “when you have a hammer” (and also “when you work in a hammer factory” like the JCP), this whole Java app management effort has been API-driven rather than model-driven. What we don’t get out of all this is a clearly defined metamodel and a set of model elements for Java apps with an XML serialization that can be queried and updated. What we do get is a mapping of “WS-Management protocol operations to MBean and MBean server operations” that “exposes JMX technology MBeans as WS-Management resources”.

Yes it now goes over HTTP so it can more easily fool firewalls, but I am yet to see such a need in manageability scenarios (other than from hackers who I am sure are very encouraged by the development). Yes it is easier for a non-Java endpoint to interact with a JSR262 endpoint than before but this is an incremental improvement above the previous JMX over RMI over IIOP because the messages involved still reflect the underlying API.

Maybe that’s all ok. There may very well not be much management integration possible at the level of details provided by JMX APIs. Management integration is probably better served at the SCA and OSGi levels anyway. Having JSR262 just provide incremental progress towards easier Java manageability by HP OVO and the like may be all we should ask of it. I told some of the JSR262 guys, back when they were creating their own XML over HTTP protocol to skirt the WS-Management vs. WSDM debate, that they should build on WS-Management and I am glad they took that route (no idea how much influence my opinion had on this). I just can’t get really excited about the whole thing.

All the details on the current status of JSR262 on Jean-Francois Denise’s blog.

6 Comments

Filed under Everything, JMX, Manageability, Mgmt integration, Specs, Standards, WS-Management

February 15, 2008

Comparing Joe Gregorio’s RESTful Partial Updates to WS-ResourceTransfer

Joe Gregorio just proposed a way to do RESTful partial updates. I am not in that boat anymore but, along with my then-colleagues from HP, Microsoft, IBM and Intel, I have spent a fair bit of time trying to address the same problem, albeit in a SOAP-based way. That was WS-ResourceTransfer (WS-RT) which has been out as a draft since summer 2006. In a way, Joe’s proposal is to AtomPub what WS-ResourceTransfer is to WS-Transfer, retrofitting a partial resource update on top of a “full update” mechanism. Because of this, I read his proposal with interest. I have mentioned before that WS-RT isn’t the best-looking cow in the corral so I was ready to like Joe’s presumably simpler approach.

I don’t think it meets the bill for partial update requirements in IT management scenarios.

This is not a REST versus SOAP kind of thing and I am not about to launch in a “how do you do end to end encryption and reliable messaging” tirade. I think it is perfectly possible to meet most management scenarios in a RESTful way. And BTW, I think most management scenarios do not need partial updates at all.

But for those that do, there is just too little flexibility in Joe’s proposal. Not that it means it’s a bad proposal, I don’t have much of an idea of what his use cases are. The proposal might be perfectly adequate for them. But when I read his proposal, it’s IT management I was mentally trying to apply it to and it came short in that regard.

Joe’s proposal requires the server to annotate any element that can be updated. On the positive side, this “puts the server firmly in control of what sub-sections of a document it is willing to handle partial updates on” which can indeed be useful. On the negative side it is not very flexible. If you are interacting with a desired-state controller, the rules that govern what you can and cannot change are often a lot more complex than “you can change X, you can’t change Y”. Look at SML validation for an example.

Another aspect is that the requester has to explicitly name the elements to replace. That could make for a long list. And it creates a risk of race conditions. If I want to change all the elements that have an attribute “foo” with a value “bar” I need to retrieve them first so that I can find their xml:id. Then I need to send a message to update them. But if someone changed them in the meantime, they may not have the “bar” value anymore and I am going to end up updating elements that should not be updated. Again, not necessarily a problem in some domains. An update mechanism that lets you point at the target set via something like XPath helps prevent this round-tripping (at a significant complexity cost unfortunately, something WS-RT tries to address with simplified dialects of XPath).

Joe volunteers another obvious limitation when he writes that “this doesn’t solve all the partial update scenarios, for example this doesn’t help if you have a long sub-list that you want to append to”. Indeed. And it’s even worse if you care about element order. Not something that we normally care much about in IT management (UML, MOF, etc don’t have a notion of property order) but the overuse of XSD in system modeling has resulted in order being important to avoid validation failures (because it’s really hard to write an XSD that doesn’t care about order even though it is often not meaningful to the domain being modeled).

In early 2007, I wrote an implementation of WS-RT and in the process I found many gaps in the specification, especially in the PUT operation. It is not the ideal answer in any way. If one was to try to fix it, a good place to start might be to make the specification a bit less flexible (e.g. restricting the change granularity to the level of an element, not an attribute and not a text node). There is plenty of room to find the simplicity/flexibility sweetspot for IT management scenarios between what WS-RT tries to offer and what Joe’s proposal offers.

Comments Off on Comparing Joe Gregorio’s RESTful Partial Updates to WS-ResourceTransfer

Filed under Everything, IT Systems Mgmt, Specs, WS-ResourceTransfer

February 13, 2008

Microsoft ditches SML, returns to SDM?

I gave in to the temptation of a tabloid-style title for this post, but the resulting guilt forces me to quickly explain that it is speculation and not based on any information other than what is in the links below (none of which explicitly refers to SDM or SML). And of course I work for a Microsoft competitor, so keep your skeptic hat on, as always.

The smoke that makes me picture that SML/SDM fire comes from this post on the Service Center team blog. In it, the product marketing manager for System Center Service Manager announces that the product will not ship until 2010. Here are the reasons given.

“The relevant feedback here can be summarized as:

Improve performance
Enhance integration with the rest of the System Center product family and with the wider Microsoft product offering

To meet these requirements we have decided to replace specific components of the Service Manager infrastructure. We will also take this opportunity to align the product with the rest of the System Center family by taking advantage of proven technologies in use in those products”

Let’s rewind a little bit and bring some context. Microsoft developed the Service Definition Model (SDM) to try to capture a consistent model of IT resources. There are several versions of SDM out there, and one of them is currently used by Operations Manager. It is how you capture domain-specific knowledge in a Management Pack (Microsoft’s name for a plug-in that lets you bring a new target type to Operations Manager). In order to get more people to write management packs that Operations Manager can consume, Microsoft decided to standardize SDM. It approached companies like IBM and HP and the SDM specification became SML. Except that there was a lot in SDM that looked like XSD, so SML was refactored as an extension of XSD (pulling in additions from Schematron) rather than a more stand-alone, management-specific approach like SDM. As I’ve argued before (look for the “XSD in SML” paragraph), in retrospect this was the wrong choice. SML was submitted to W3C and is now well advanced towards completion as a standard. Microsoft was forging ahead with the transition from SDM to SML and when they announced their upcoming CMDB they made it clear that it would use SML as its native metamodel (“we’re taking SML and making it the schema for CMDB” said Kirill Tatarinov who then headed the Service Center group).

Back to the present time. This NetworkWorld article clarifies that it’s a redesign of the CMDB part of Service Center that is causing the delay: “beta testing revealed performance and scalability issues with the CMDB and Microsoft plans to rebuild its architecture using components already used in Operations Manager.” More specifically, Robert Reynolds, a “group product planner for System Center” explains that “the core model-based data store in Operations Manager has the basic pieces that we need”. That “model-based data store” is the one that uses SDM. As a side note, I would very much like to know what part of the “performance and scalability issues” come from using XSD (where a lot of complications come from features not relevant for systems management).

Thus the “enhance integration with the rest of the System Center product family” in the original blog post reads a lot like dumping SML as the metamodel for the CMDB in favor of SDM (or an updated version of SDM). QED. Kind of.

In addition to the problems Microsoft uncovered with the Service Center Beta, the upcoming changes around project Oslo might have further weakened the justification for using SML. In another FUD-spreading blog post, I hypothesized about what Oslo means for SML/CML. This recent development with the CMDB reinforces that view.

I understand that there is probably more to this decision at Microsoft than the SML/SDM question but this aspect is the one that may have an impact not just on Microsoft customers but on others who are considering using SML. In the larger scheme of things, the overarching technical question is whether one metamodel (be it SDM, SML, MOF or something else) can efficiently be used to represent models across the entire IT stack. I am growing increasingly convinced that it cannot.

4 Comments

Filed under CMDB, Everything, IT Systems Mgmt, Microsoft, Oslo, SML, Specs, Standards

January 30, 2008

HP’s GIFt to the SOA world

I just noticed a press release from HP to announce the release of GIF (the Governance Interoperability Framework). In short, GIF is a specification that describes how to use the HP SOA Registry for governance tasks that go beyond what UDDI can do. It has been around for a long time inside Systinet then Mercury then HP and some partners had been somewhat enrolled in the program (whatever that meant) but it wasn’t clear what HP was really planning to do with it. Looks like they have decided to put some muscle into it by attracting more partners and releasing it. Or at least announcing that they would release it. I can’t find it on the HP site so I can’t see if and how the specification changed since when I was in HP. It will be interesting to see if they present it as a neutral specification of which the HP SOA Registry is one implementation or as something that requires that registry.

I also looked for it on Wikipedia since the press release declares that it will be made available there but to no avail. That part puzzles me a bit since this would be pretty atypical for Wikipedia. At most there could be an article about GIF that links to the specification on hp.com. And even then, you’d have to convince Wikipedia editors that the topic is “worthy of notice”. Or maybe they meant to refer to an HP wiki and some confused editor turned that into Wikipedia?

The press release has a few additional items (yet more fancy-sounding SOA offering from HP Services and some new OEM-friendly packaging for the Registry) but they don’t seem too exciting to me. The GIF action is what could be interesting if things really get moving on this. In any case, congratulations to Luc and Dave.

[UPDATED 2008/2/4: turns out Luc isn’t at HP anymore, he’s joined Active Endpoints as Senior Director of Product Management. Double congrats then, one set for the past work on GIF and the other for the new job.]

[UPDATED 2008/2/5: there is now a Wikipedia page with a description of GIF. But still no sign of the specification itself on the HP GIF page.]

[UPDATED 2008/3/16: you can now download the spec (but you’ll need to register with HP first).]

1 Comment

Filed under Everything, Governance, HP, Specs

January 13, 2008

A sign of life from the CML working group

In a recent entry on this blog I remarked that “apparently restaurant owners are easy preys for incompetent Web site designers” because they often end up with stupid Flash-only web sites. Well, it’s not just restaurant owners apparently, it’s also leading technology companies like IBM, Microsoft, HP, Cisco, etc. For proof, have a look at the web site for the CML effort: http://www.cml-project.org/

The only reason I would subject you to this awful site is that there is some news about CML. The group of companies involved in this effort has released a white paper to explain what CML is. You can spare yourself the Flash part by getting the white paper directly. If you want to spare yourself even more, you will also skip reading the white paper and just read my not-so-enthusiastic summary bellow. And you need neither Flash nor Acrobat to do so.

CML is going to solve parts or all of all problems related to IT management, and it will be based on SML. So much we can tell from the paper. Any other information that we glean there is contradicted somewhere else in the same paper.

No, really, isn’t CML a set of model elements expressed in SML? Yes, it says so:

“At its core, CML is a collection of models which are expressed as XML documents that describe IT entities and their relationships. As the basis for common modeling elements and semantics, the models describe information which can be exchanged between management tools and managed resources. This information is about IT systems and includes infrastructure (e.g., servers, application servers, Web services), logical entities (e.g., software license, incident reports, IT roles), and relationships (e.g., hosted, is hosted by, supplied by).”

But the next paragraph says:

“However, CML does not attempt to present a single model or single set of models which will ensure integration. Instead, CML provides a means for creating new models or extending, combining, or evolving existing models.”

We are also told that even though it provides a model of a server, it doesn’t supercede CIM. Oh, and it may not be just models either, it will also tell you about on-the-wire protocols:

“recommendations in the documentation, including the need to transmit models in an interoperable manner via an agreed upon set of wire protocols”

The “CML overview” picture leaves us with the same impression that it does everything and across the entire lifecycle at that. And I must admit that the nuance between “common modeling elements” and “shared modeling elements” escapes me, even after reading the definitions. I sounds like they are all reusable but some are more reusable than others…

If you are looking for a ray of hope, I found mine in the acknowledgement of the potential role of RDF/OWL which, combined with section 4.1.5, can be seen as hinting at an effort towards creating some simple ontologies for management integration. Which could be very useful if well-scoped.

If you are wondering about timelines, you’re out of luck. When referring to CML the white paper mixes present and future tenses, and also throws in some conditionals for good measure. Hard to guess how much is real at this point. And the next steps are? More scenarios and some guidelines. See you in 2009 (which BTW is consistent with my little theory about Oslo’s impact on CML).

All in all, this doesn’t mean nothing valuable will ever come out of CML. It just means that the group still hasn’t figured out what it wants to be when it grows up.

[UPDATED on 2008/01/24: Good news! They revamped the site to remove the stupid Flash interface. I hope my rant provided ammunitions to those inside the CML group who pushed for sanity. Also, they have put out a press release to announce, retroactively, the white paper. No surprise in the content of the press release and the associated vendor quotes. I wish that whoever wrote the quote for my ex-boss Mark Potts knew the difference between “compliment” and “complement” though.]

1 Comment

Filed under CML, Everything, Flash, Specs

December 19, 2007

How not to re-use XML technologies

I like XML. Call me crazy but I find it relatively easy to work with. Whether it is hand-editing an XML document in a text editor, manipulating it programmatically (as long as you pick a reasonable API, e.g. XOM in Java), transforming it (e.g. XSLT) or querying an XML back-end through XPath/XQuery. Sure it carries useless features that betray its roots in the publishing world (processing instructions anyone?), sure the whole attribute/element overlap doesn’t have much value for systems modeling, but overall it hits a good compromise between human readability and machine processing and it has a pretty solid extensibility story with namespaces.

In addition, the XML toolbox of specifications is very large and offers standard-based answers to many XML-related tasks. That’s good, but when composing a solution it also means that one needs to keep two things in mind:

not all these XML specifications are technically sound (even if they carry a W3C stamp of approval), and
just because XML’s inherent flexibility lets one stretch a round hole, it doesn’t mean it’s a good idea to jam a square peg into it.

The domain of IT management provides examples for both of these risks. These examples constitute some of the technical deficiencies of management-related XML specifications that I mentioned in the previous post. More specifically, let’s look at three instances of XML mis-use that relate to management-related specifications. We will see:

a terrible XML specification that infects any solution it touches (WS-Addressing, used in WS-Management),
a mediocre XML specification that has plenty of warts but can be useful for a class of problems, except in this case it isn’t (XSD, used in SML), and
a very good XML specification except it is used in the wrong place (XPath, used in CMDBf).

Let’s go through them one by one.

WS-Addressing in WS-Management

The main defect of WS-Management (and of WSDM before it) is probably its use of WS-Addressing. SOAP needs WS-Addressing like a migraine patient needs a bullet in the head (actually, four bullets in the head since we got to deal with four successive versions). SOAP didn’t need a new addressing model, it already had URIs. It just needed a message correlation mechanism. But what we got is many useless headers (like wsa:Action) and the awful EPR construct which solves a problem that didn’t exist and creates many very real new ones. One can imagine nifty hacks that would be enabled by a templating mechanism for SOAP (I indulged myself and sketched one to facilicate mash-up style integrations with SOAP) but if that’s what we’re after then there is no reason to limit it to headers.

XSD in SML

The words “Microsoft” and “bully” often appear in the same sentence, but invariably “Microsoft” is the subject not the object of the bullying. Well, to some extent we have a reverse example here, as unlikely as it may seem. Microsoft created an XML-based meta-model called SDM that included capabilities that looked like parts of XSD. When they opened it up to the industry and floated the idea of standardizing it, they heard back pretty loudly that it would have to re-use XSD rather than “re-invent” it. So they did and that ended up as SML. Except it was the wrong choice and in retrospect I think it would have been better to improve on the original SDM to create a management-specific meta-model than swallow XSD (SML does profile out a few of the more obscure features of XSD, like xs:redefine, but that’s marginal). Syntactic validation of documents is very different from validation of IT models. Of course this may all be irrelevant anyway if SML doesn’t get adopted, which at this point still looks like the most likely outcome (due to things like the failure of CML to produce any model element so far, the ever-changing technical strategy for DSI and of course the XSD-induced complexity of SML).

XPath in CMDBf

I have already covered this in my review of CMDBf 1.0. The main problem is that while XML is a fine interchange format for the CMDBf specification, one should not assume that it is the native format of the data stores that get connected. Using XPath as a selector language makes life difficult for those who don’t use XML as their backend format. Especially when it is not just XPath 1.0 but also the much more complex XPath 2.0. To make matters worse, there is no interoperable serialization format for XPath 1.0 nodesets, which will prevent any kind of interoperability on this. That omission can be easily fixed (and I am sure it will be fixed in DMTF) but that won’t address the primary concern. In the context of CMDBf, XPath/XQuery is an excellent implementation choice for some situations, but not something that should be pushed at the level of the protocol. For example, because XPath is based on the XML model, it has clear notions of order of elements. But what if I have an OO or an RDF-based backend? What am I to make of a selector that says that the “foo” element has to come after the “bar” element? There is no notion of order in Java attributes and/or RDF properties.

Revisionism?

My name (in the context of my previous job at HP) appears in all three management specifications listed above (in increasing level of involvement as contributor for WS-Management, co-author for SML and co-editor for CMDBf) so I am not a neutral observer on these questions. My goal here is not to de-associate myself from these specifications or pick and choose the sections I want to be associated with (we can have this discussion over drinks if anyone is interested). Some of these concerns I had at the time the specifications were being written and I was overruled by the majority. Other weren’t as clear to me then as they are now (my view of WS-Addressing has moved over time from “mostly harmless” to “toxic”). I am sure all other authors have a list of things they wished had come out differently. And while this article lists deficiencies of these specifications, I am not throwing the baby with the bathwater. I wrote recently about WS-Management’s potential for providing consistency for resource manageability. I have good hopes for CMDBf, now in the DTMF, not necessarily as a federation technology but as a useful basis for increased interoperability between configuration repositories. SML has the most dubious fate at this time because, unlike the other two, it hasn’t (yet?) transcended its original supporter to become something that many companies clearly see fitting in their plans.

[UPDATED 2008/3/27: For an extreme example of purposely abusing XML technologies (namely XPath in that case) in a scenario in which it is not the right tool for the job (graph queries), check out this XPath brain teasers article.]

4 Comments

Filed under CMDB Federation, CMDBf, Everything, IT Systems Mgmt, Microsoft, SML, SOAP, SOAP header, Specs, Standards, Tech, WS-Management, XOM

November 28, 2007

CMDBf now in the hands of the DMTF

It’s now official, the CMDBf specification has been submitted to the DMTF and will be standardized there. Here is the press release and here is the specification (unchanged) republished on the DMTF site. The CMDBf working group was created a while ago at the DMTF but I didn’t report it since it wasn’t clear to me whether that was public information or not. The press release makes this clear now.

As a side note, this is one of my ongoing frustrations with the DMTF. Almost everything happens in private with no publicly-accessible URL until a press release comes out and of course lots of interesting things happen that don’t get a press release. I have heard many times that the DMTF is working on opening up the process, but I still haven’t seen much change. If this had been OASIS or W3C, the call for formation of the new working group would have been publicly accessible even before the group was created. OK, end of ranting.

As always, there isn’t much useful information to be gleaned from the text of the press release. Only that, as expected, the authors addressed the question of how this relates to CIM, since for many DMTF=CIM. So the press release proactively declares that the CMDBf work will not be limited to CIM-modeled configuration data. What this means in practice will be seen later (e.g. will there be CIM-specific extensions?).

Having seen how executive quotes for press releases get generated I hate to read too much into them, but another thing I can’t help noticing in the press release is that none of the quotes from the companies submitting the specification tout federation, but simply “integration” or “sharing”. For example: “integration and interoperability” (BMC), “share data” (CA), “sharing of information” (HP), “view, track and change information” (IBM), “exchange data” (Microsoft). This more realistic assessment of what the specification does stands in contrast to the way the DMTF presents it in the press release : “this specification provides a standard way to federate management data stored in multiple different data models”. At this point, it doesn’t really provide federation and especially not across different models.

All in all, it’s as good thing for this work to be moved to a standards organization. I may join the CMDBf group at the DMTF to track it, but I don’t plan to engage very much as this area isn’t my focus anymore now that I am at Oracle. But of course everything is linked at some level in the management field.

[UPDATE on 2007/11/30: two days after posting this message I got the monthly DMTF newsletter which touches on points I raise here. So here are the relevant links. First, Mike Baskey, DMTF Chairman, shares his view on what CMDBf means for DMTF. Second, as if to respond to my rant on the opacity of the DMTF, Josh Cohen, DMTF Vice-chairman, gives an update on process improvements. Some progress indeed, but still a far cry from opening up mailing list archives so that observers can see in real time what issues are addressed and can go back in time to understand how a specific technical decision was made and what were the considerations.]

Comments Off on CMDBf now in the hands of the DMTF

Filed under CMDB, CMDB Federation, CMDBf, DMTF, Everything, IT Systems Mgmt, ITIL, Specs, Standards

November 6, 2007

Illustrative algorithm for CMDBf 1.0 Query operation

When I posted an algorithm for the server side implementation of a CMDBf Query call for version 0.95 of the specification, the interoperability testing session based on that version was over and I was pretty sure no-one but those of us who participated in that session would write an implementation of 0.95. But I published the algorithm anyway since I thought it was helpful to anyone who wanted to understand the specification in depth, even if they were not implementing it. Now that 1.o is out, there is a much higher probability of people implementing the specification, so I figured it would be worth updating the algorithm to take into account the changes from 0.95 to 1.0. So here it is.

One caveat. This algorithm assumes that the query request does not make use of the xpathExpression element because, as I have explained in my review of CMDBf 1.0, I don’t think interoperability is achievable on this feature in the current state of the specification.

As a note of caution, the previous version of the algorithm was backed by my implementation of CMDBf 0.95 for the interoperability testing, so I felt pretty confident about it. For this version of the algorithm I have not written a corresponding implementation and I have not done interoperability testing with anyone, it’s just based on my reading of the specification. The handling of depthLimit in particular is a little tricky and needs to be validated by implementation (what with creating a bunch of dummy item and relationship templates with temporary names and later going back to the original template names), please let me know if you find it flawed.

And, as previously, this is in no way an optimal implementation strategy. It is the most direct and obvious set of steps that I can come up with to implement the Query call in a way that exactly conforms to the specification. There are lots of ways to make this go faster, such as the ones I mentioned in a previous post (e.g. breaking out of loops once an instance has been removed, or not recalculating L1 and L2 over and over again for relationships in the same working set that share a source/target) plus new ones such as being smarter than my brute-force approach to handling depthLimit (in step 2).

All this said, here is the algorithm:

1) for each itemTemplate, calculate the set of all items (including relationships since they are a subclass of item) that obey the instanceIdConstraint and recordConstraint elements in the template (if present). Call this the working set for the itemTemplate.
2) for each relationshipTemplate RT that has a depthLimit element:

2.1) for i ranging from 1 to the value of maxIntermediateItems for RT:

2.1.1) create an itemTemplate that is an exact copy of the itemTemplate referenced by RT’s sourceTemplate, except that it has a new, unique temporary id (keep a record linking that new id to the id of the original source itemTemplate).
2.1.2) create an itemTemplate that is an exact copy of the itemTemplate referenced by RT’s targetTemplate, except that it has a new, unique, temporary id (keep a record linking that new id to the id of the original target itemTemplate).
2.1.3) for j ranging 1 from i:

2.1.3.1) create an itemTemplate that is an exact copy of the itemTemplate referenced by RT’s intermediateItemTemplate, except that it has a new, unique, temporary id (keep a record linking that new id to the id of the original intermediary itemTemplate).
2.1.3.2) create a relationshipTemplate that is an exact copy of RT, except that its source is the itemTemplate created in the previous iteration of the current loop (or the itemTemplate created in step 2.1.1 if j=1), its target is the itemTemplate created in the previous step and it has a new, unique, temporary id (keep a record linking that new id to RT’s id).

2.1.4) create a relationshipTemplate that is an exact copy of RT, except that its source is the last itemTemplate created in the 2.1.3 loop, its target is the itemTemplate created in 2.1.2 and it has a new, unique, temporary id (keep a record linking that new id to RT’s id).

3) for each relationshipTemplate calculate the set of all relationships that obey the instanceIdConstraint and recordConstraint elements in the template (if present). Call this the working set for the relationshipTemplate.
4) set need_to_loop = true
5) while (need_to_loop == true)

5.1) set need_to_loop = false
5.2) for each relationshipTemplate RT

5.2.1) let ITsource be the itemTemplate that is referenced as sourceTemplate by RT. Calculate the set of all items (including relationships since they are a subclass of item) that obey at least one of the instanceIdConstraint elements in ITsource (assuming there is at least one such element) and all the recordConstraint elements in ITsource. Call this the working set for ITsource.
5.2.2) let ITtarget be the itemTemplate that is referenced as targetTemplate by RT. Calculate the set of all items (including relationships since they are a subclass of item) that obey at least one of the instanceIdConstraint elements in ITtarget (assuming there is at least one such element) and all the recordConstraint elements in ITtarget. Call this the working set for ITtarget.
5.2.3) for each relationship R in the working set for RT

5.2.3.1) if the source of R is not in the working set for ITsource, then remove R from the RT working set
5.2.3.2) if the target of R is not in the working set for ITtarget, then remove R from the RT working set
5.2.3.3) if RT has a source/@minimum or a source/@maximum attribute

5.2.3.3.1) find the list L1 of all relationships in the working set for RT that have the same source as R
5.2.3.3.2) if RT has source/@minimum and the cardinality of L1 is less than this minimum then remove all relationships in L1 from the RT working set
5.2.3.3.3) if RT has source/@maximum and the cardinality of L1 is more than this maximum then remove all relationships in L1 from the RT working set

5.2.3.4) if RT has a target/@minimum or a target/@maximum attribute

5.2.3.4.1) find the list L2 of all relationships in the working set for RT that have the same target as R
5.2.3.4.2) if RT has target/@minimum and the cardinality of L2 is less than this minimum then remove all relationships in L2 from the RT working set
5.2.3.4.3) if RT has target/@maximum and the cardinality of L2 is more than this maximum then remove all relationships in L2 from the RT working set

5.3) for each itemTemplate IT:

5.3.1) let sourceRTset be the set of all relationshipTemplates that references IT as its sourceTemplate
5.3.2) let targetRTset be the set of all relationshipTemplates that references IT as its targetTemplate
5.3.3) for each item I in the IT working set

5.3.3.1) for each relationshipTemplate sourceRT in sourceRTset, if there is no relationship in the working set for sourceRT that uses I as its source, remove I from the IT working set and set need_to_loop to true
5.3.3.2) for each relationshipTemplate targetRT in targetRTset, if there is no relationship in the working set for targetRT that uses I as its source, remove I from the IT working set and set need_to_loop to true

6) process the eventual contentSelector elements and/or the @suppressFromResult attributes on the templates that have matching items/relationships in the response to remove or pair down items and relationships as requested
7) package the resulting items and relationships in a way that conforms to the CMDBf response message format (including putting each item in the <nodes> element with the appropriate @templateId attribute and putting each relationship in the <edges> element with the appropriate @templateId).
8) replace all the temporary template ids (from step 2) that appear in templateId attributes in the response with the original ids of the items and template based on the records that were kept in step 2.

Just to clarify things, what I do in step 2 is simply make explicit all the itemTemplates and relationshipTemplates that are made implicit by the depthLimit element, so that we can provide with a simpler algorithm after that assumes that all relationshipTemplate correspond to direct relationships (no intermediary). And in step 8 I hide the fact that this took place.

[UPDATED 2009/5/1: For some reason this entry is attracting a lot of comment spam, so I am disabling comments. Contact me if you’d like to comment.]

8 Comments

Filed under CMDB, CMDB Federation, CMDBf, Graph query, IT Systems Mgmt, Pseudo-algorithm, Query, Specs, Standards, Tech

October 28, 2007

Review of the CMDBf specification version 1.0

Having read the recently released CMDBf 1.0 specification over the weekend, I see several improvements since 0.95, including:

the introduction of depthLimit
the lastModified metadata element
the ability to specify more than one instanceId in a template
the ability to advertise what parts of the specification you implement
the definition of faults

But while 1.0 is more complete than 0.95, I think it makes it harder to achieve interoperability. Here are the main friction points for interop:

New role for XPath

The xpathExpression element (which replaces xpath1Selector) changes in two very important ways. First, rather than being limited to XPath 1.0, it now also allows XPath 2.0. Support for this is a lot harder to achieve for people who don’t use XML as the backend format for their data. Considering the current state of adoption of XPath 2.0 and the low level of XML complexity exposed by most CMDB models, I don’t think it was opportune to bring this into CMDBf yet. And my guess is that most implementations will stay away from this. But there is a second change, less obvious but even more problematic. XPath is not just another constraint mechanism for a CMDBf template anymore, one that returns a boolean result indicating whether the instance meets the constraint or not, as it used to be in 0.95. It is now an alternative selection and filtering mechanism that lives in parallel to all the other elements in a template (and can’t mix with them). Overall, I think this change goes too far in the direction of turning a shared agreement to exchange data in XML into an assumption that the internal data models are all based on XML. And the killer with regards to interoperability is that the specification says nothing about how the resulting node sets are serialized in the response. There may be a serialization for the XPath 2.0 model, but there is no such thing for XPath 1.0 and I don’t see in the current state of the specification how two implementations have any chance to interoperate when using this feature.

Introduction of linkDepth

As I mentioned earlier, linkDepth is a very useful addition (even though it pales in comparison to the inferencing capabilities that could have been derived from basing CMDBf on RDF). But it is also a complicated feature. The intermediateItemTemplate attribute is a good re-use of the existing plumbing, but it needs at least a detailed example. I trust that the group will generate one once they’ve caught their breath from putting out the specification.

Service capability metadata

There is a new section (#6) to provide ways to describe what CMDBf features an implementation supports. But it is a very granular representation. Basically, for every feature you can describe if you support it or not. So someone may describe that they support everything inside propertyValue, except for the “like” operator. And someone else might support all the operators but not the caseSensitive modifier. That might be ok for human consumptions, but automated scenarios rely on pre-programmed queries and that is made very hard by all the possible combinations of options. What we need is a few well-defined profiles that people implement fully. Starting of course with a profile that rules out xpathExpression.

Record metadata

This new version introduces metadata on records. While recordId and lastModified are probably well understood and interoperably usable I am a bit more dubious about whether baselineId and snapshotId are going to be interoperable across vendors based on their limited description in the specification. The nice thing is that this metadata can not only be returned but also searched on. Well, at least that’s the intent. But this goes through the recordMetadata attribute on propertyValue which, while present in the pseudo-schema, is missing in the XSD…

The contentSelector element

This new element is more flexible that the propertySubsetDirective element that it replaces. In addition to specifying what properties you want returned it also allows you to specify that you only want certain record types and/or that you only want the record(s) that were used to satisfy constraints in the template. Those are nice additions, but the way the second part is implemented (through the use of the matchedRecords attribute) seems to assume that only one record in the instance was used to match all the constraints in the template. This is not necessarily the case, an instance can be selected by having different records match the different constraints in the template as long as it has at least one matching record per constraint (line 765 says “the item satisfies all the constraints”, not “a record of the item satisfies all the constraints” and you can also see this in the example in section 4.2 where the records mentioned on lines 637 and 639 don’t have to be the same). So do you return all records that have a role in matching the template, or only those (if there is any) that matches all the constraints on their own as the text seems to imply? And if several record combinations inside an instance can be used to match the constraints in a template, do I return all of them or can I just pick any subset that matches? Also, how can I say that I want all records that established the template match, independently of their type? There doesn’t seem to be a way to do this, or is it by putting a contentSelector element with no child element and the matchedRecords attribute set to false? There won’t be much interoperability on this feature until all this is clarified.

Relationships as items

A major change between 0.95 and 1.0 is that now a relationship can match an itemTemplate. For example, if you ask for all items that were modified during the last 24 hours you will get all the items and all the relationships that meet that criteria while in the previous version you’d have to explicitly request the relationships with a relationshipTemplate if you wanted to get them too). There is a good case to be made for either view and the one that works best largely depends on your backend implementation technology (RDF, objects, SQL, CIM…). But the important thing is for the spec to be clear and on this point I think the change wasn’t made explicit enough in the query section of the specification. If Van hadn’t called my attention to this on his blog, I would have missed this important change.

Security boilerplate

There is a person at IBM (probably located in a well-stoked underground bunker in upstate NY) who has instilled the fear of god in all IBM employees (at least all those who author publicly available specifications) and forces them to include a boilerplate “security considerations” section everywhere. I have co-authored several documents with IBM employees and it never fails, even thought it doesn’t add anything useful to the specification. You should see the look of fear on the face of the IBM employees when someone else suggests doing without it. We somehow managed to sneak one such slimmer specification past the IBMers with CMDBf 0.95 but I see that this has been “corrected” in 1.0. I hope that whatever painful punishment Scott, Jacob, Andrew and Mark (or their families and pets) were subjected to in the process by the IBM security ogre wasn’t too cruel. Sure, this doesn’t really impact interoperability, but now that I don’t work for a company that makes money from ink anymore, I have even less patience for this bloating.

OK, that’s enough back seat driving for now. Hopefully the standards group that will take over the specification will address all these questions. In the context of the entire specification, these are pretty small issues and mostly easy to fix. And the CMDBf group can go on to address the hard issues of federation (including security-related issues that abound in this field if one really wants to tackle them). The current specification is a useful graph-oriented query language that is a good match for CMDB data. But it’s really just a query language (plus a simple registration system).

[UPDATE: while updating the CMDBf query algorithm, I noticed another small error: maxIntermediateItems is an attribute in the pseudo-schema but an element in the schema. Something else to fix in the next version.]

3 Comments

Filed under CMDB, CMDB Federation, CMDBf, Everything, Graph query, IT Systems Mgmt, ITIL, Query, Specs, Standards, Tech

October 25, 2007

CMDBf 1.0 specification released

The CMDBf committee has just released version 1.0 of the specification. Van Wiles has an overview of the changes between 0.95 and 1.0. I left HP soon after 0.95 was released and that’s when my participation in CMDBf ended, so Van’s summary is very useful to me. The changes he lists are not surprising and some of them already existed in draft form before 0.95 publication. I need to spend some intimate time with the specification to review the changes to the template mechanism in more details. Some of the changes have the potential to make the specification quite a bit harder to implement. This is especially the case for the introduction of “depthLimit” (but it’s probably a needed feature anyway). And the fact that relationships can now match item selectors will make things either easier or harder to implement, depending on your implementation choice (e.g. straight-to-SQL/XML or through an OO or RDF model). Congrats to the group. We should soon hear about submission for standardization.

Comments Off on CMDBf 1.0 specification released

Filed under CMDB Federation, CMDBf, Everything, IT Systems Mgmt, Query, Specs, Standards, Tech

September 16, 2007

A review of OVF from a systems management perspective

I finally took a look at OVF, the virtual machine distribution specification that was recently submitted to DMTF. The document is authored by VMware and XenSource, but they are joined in the submission to DMTF by some other biggies, namely Microsoft, HP, IBM and Dell.

Overall, the specification does a good job of going after the low-hanging fruits of VM distribution/portability. And the white paper is very good. I wish I could say that all the specifications I have been associated with came accompanied by such a clear description of what they are about.

I am not a virtualization, operating system or hardware expert. I am mostly looking at this specification from the systems management perspective. More specifically I see virtualization and standardization as two of the many threads that create a great opportunity for increased automation of IT management and more focus on the application rather than the infrastructure (which is part of why I am now at Oracle). Since OVF falls in both the “virtualization” and “standardization” buckets, it got my attention. And the stated goal of the specification (“facilitate the automated, secure management not only of virtual machines but the appliance as a functional unit”, see section 3.1) seems to fit very well with this perspective.

On the other hand, the authors explicitly state that in the first version of the specification they are addressing the package/distribution stage and the deployment stage, not the earlier stage (development) or the later ones (management and retirement). This sidesteps many of the harder issues, which is part of why I write that the specification goes after the low-hanging fruits (nothing wrong with starting that way BTW).

The other reason for the “low hanging fruit” statement is that OVF is just a wrapper around proprietary virtual disk formats. It is not a common virtual disk format. I’ve read in several news reports that this specification provides portability across VM platforms. It’s sad but almost expected that the IT press would get this important nuance wrong, it’s more disappointing when analysts (who should know better) do, as for example the Burton Group which writes in its analysis “so when OVF is supported on Xen and VMware virtualization platforms for example, a VM packaged on a VMware hypervisor can run on a Xen hypervisor, and vice-versa”. That’s only if someone at some point in the chain translates from the Xen virtual disk format to the VMware one. OVF will provide deployment metadata and will allow you to package both virtual disks in a TAR if you so desire, but it will not do the translation for you. And the OVF authors are pretty up front about this (for example, the white paper states that “the act of packaging a virtual machine into an OVF package does not guarantee universal portability or install-ability across all hypervisors”). On a side note, this reminds me a bit of how the Sun/Microsoft Web SSO MEX and Web SSO Interop Profile specifications were supposed to bridge Passport with WS-Federation which was a huge overstatement. Except that in that case, the vendors were encouraging the misconception (which the IT press happily picked up) while in the OVF case it seems like the vendors are upfront about the limitations.

There is nothing rocket-science about OVF and even as a non-virtualization expert it makes sense to me. I was very intrigued by the promise that the specification “directly supports the configuration of multi-tier applications and the composition of virtual machines to deliver composed services” but this turns out to be a bit of an overstatement. Basically, you can distribute the VMs across networks by specifying a network name for each VM. I can easily understand the simple case, where all the VMs are on the same network and talking to one another. But there is no way (that I can see) to specify the network topology that joins different networks together, e.g. saying that there is a firewall between networks “blue” and “red” that only allows traffic on port 80). So why would I create an OVF file that composes several virtual machines if they are going to be deployed on networks that have no relationships to one another? I guess the one use case I can think of would be if one of the virtual machines was assigned to two networks and acted as a gateway/firewall between them. But that’s not a very common and scalable way to run your networks. There is a reason why Cisco sells $30 billions of networking gear every year. So what’s the point of this lightweight distributed deployment? Is it just for that use case where the network gear is also virtualized, in the expectation of future progress in that domain? Is this just a common anchor point to be later extended with more advanced network topology descriptions? This looks to me like an attempt to pick a low-hanging fruit that wasn’t there.

Departing from usual practice, this submission doesn’t seem to come with any license grant, which must have greatly facilitated its release and the recruitment of supporters for the submission. But it should be a red flag for adopters. It’s worth keeping track of its IP status as the work progresses. Unless things have changed recently, DMTF’s IP policy is pretty weak so the fact that works happens there doesn’t guarantee much protection per se to the adopters. Interestingly, there are two sections (6.2 about the virtual disk format and 11.3 about the communication between the guest software and the deployment platform) where the choice of words suggests the intervention of patent lawyers: phrases like “unencumbered specification” (presumably unencumbered with licensing requirements) and “someone skilled in the art”. Which is not surprising since this is the part where the VMWare-specific, Xen-specific or Microsoft-specific specifications would plug in.

Speaking of lawyers, the section that allows the EULA to be shipped with the virtual appliance is very simplistic. It’s just a human-readable piece of text in the OVF file. The specification somewhat naively mentions that “if unattended installs are allowed, all embedded license sections are implicitly accepted”. Great, thanks, enterprises love to implicitly accept licensing terms. I would hope that the next version will provide, at least, a way to have a URI to identify the EULA so that I can maintain a list of pre-approved EULAs for which unattended deployment is possible. Automation of IT management is supposed to makes things faster and cheaper. Having a busy and expensive lawyer read a EULA as part of my deployment process goes against both objectives.

It’s nice of the authors to do the work of formatting the specification using the DMTF-approved DSPxxxx format before submitting to the organization. But using a targetnamespace in the dmtf.org domain when the specification is just a submission seems pretty tacky to me, unless they got a green light from the DMTF ahead of time. Also, it looks a little crass on the part of VMware to wrap the specification inside their corporate white paper template (cover page and back page) if this is a joint publication. See the links at http://www.vmware.com/appliances/learn/ovf.html. Even though for all I know VMware might have done most of the actual work. That’s why the links that I used to the white paper and the specification are those at XenSource, which offers the plain version. But then again, this specification is pretty much a wrapper around a virtual disk file, so graphically wrapping it may have seemed appropriate…

OK, now for some XML nitpicking.

I am not a fan of leaving elementformdefault set to “unqualified” but it’s their right to do so. But then they qualify all the attributes in the specification examples. That looks a little awkward to me (I tend to do the opposite and qualify the elements but not the attributes) and, more importantly, it violates the schema in appendix since the schema leaves attributeFormDefault to its default value (unqualified). I would rather run a validation before makings this accusation, but where are the stand-alone XSD files? The white paper states that “it is the intention of the authors to ensure that the first version of the specification is implemented in their products, and so the vendors of virtual appliances and other ISV enablement, can develop to this version of the specification” but do you really expect us to copy/paste from PDF and then manually remove the line numbers and header/footer content that comes along? Sorry, I have better things to do (like whine about it on this blog) so I haven’t run the validation to verify that the examples are indeed in violation. But that’s at least how they look to me.

I also have a problem with the Section and Content elements that are just shells defined by the value of their xsi:type attribute. The authors claim it’s for extensibility (“the use of xsi:type is a core part of making the OVF extensible, since additional type definitions for sections can be added”) but there are better ways to do extensibility in XML (remember, that’s what the X stands for). It would be better to define an element per type (disk, network…). They could possibly be based on the same generic type in XSD. And this way you get more syntactic flexibility and you get the option to have sub-types of sub-types rather than a flat list. Interestingly, there is a comment inside the XSD that defines the Section type that reads “the base class for a section. Subclassing this is the most common form of extensibility”. That’s the right approach, but somehow it got dropped at some point.

Finally, the specification seems to have been formated based on WS-Management (which is the first specification that mixed the traditional WS-spec conventions with the DMTF DSPxxxx format), which may explain why WS-Management is listed as a reference at the end even though it is not used anywhere in the specification. That’s fine but it shows in a few places where more editing is needed. For example requirement R1.5-1 states that “conformant services of this specification MUST use this XML namespace Universal Resource Identifier (URI): http://schemas.dmtf.org/ovf”. I know what a conformant service is for WS-Management but I don’t know what it is for this specification. Also, the namespace that this requirement uses is actually not defined or used by this specification, so this requirement is pretty meaningless. The table of namespaces that follows just after is missing some namespaces. For example, the prefix “xsi” is used on line 457 (xsi:any and xsi:AnyAttribute) and I want to say it’s the wrong one as xsi is usually assigned to “http://www.w3.org/2001/XMLSchema-instance” and not “http://www.w3.org/2001/XMLSchema” but since the prefix is not in the table I guess it’s anyone’s guess (and BTW, it’s “anyAttribute”, not “AnyAttribute”).

By this point I may sound like I don’t like the specification. Not at all. I still stand with what I wrote in the second paragraph. It’s a good specification and the subset of problems that it addresses is a useful subset. There are a few things to fix in the current content and several more specifications to write to complement it, but it’s a very good first step and I am glad to see VMware and XenSource collaborating on this. Microsoft is nominally in support at this point, but it remains to be seen to what extent. I haven’t seen them in the past very interested in standards effort that they are not driving and so far this doesn’t appear to be something they are driving.

6 Comments

Filed under DMTF, Everything, IT Systems Mgmt, OVF, Specs, Standards, Tech, Virtualization, VMware, XenSource

August 3, 2007

Tutorial and pseudo-algorithm for CMDBF Query operation

[UPDATE: an updated version of the algorithm that complies to version 1.0 of the specification (and not 0.95 as in this post) is now available]

The CMDBF Query operation (section 4 of the CMDBF specification) quickly becomes very intuitive to use, but it can first look a little strange to people used to SQL-style queries, because of its graph-based nature. I hope the normative text and the examples in the spec mitigate this. But just in case, here is some additional information that doesn’t belong in the spec but can be useful. Think of it as a very first draft of a primer/tutorial about that Query interface.

The easiest way to think about this interface is to think graphically. Imagine that you’re not writing your query in XML, but instead creating it in a GUI. You want to find all Windows XP machines that are owned by someone in the marketing department. First you create a circle in the GUI that represents the machine (through a Visio-like drag-and-drop into the query composer window). You right-click on that circle and in the right-click menu you select the option called “type” and you set it to “computerSystem”. Then you right-click to select “add property constraint”. You enter “OS_Version” as the property name (if your GUI tools is any good it will present you with a list to choose from, based on the previously selected type) and “WindowsXP” as the value. Already you’ve created a query that selects all Windows XP machines. Not a bad start.

But you only want the machines that are owned by marketing people. So go ahead and create another circle to represent a person. Use similar right-click actions to set the type to “person” and to set the “department” property to “marketing”.

If you submit the query at this point, you’ll get a list of all Windows XP machines and all people in the marketing department. So, rather than reducing the result (by removing machines not owned by marketing people), you have expanded it (by adding records for the marketing people). Oops.

The next step is to create a relationship that constrains the machine to belong to someone in marketing. This is done very simply in our handy GUI tool by drawing an arrow that goes from the “person” circle to the “machine” circle. This requires that there be a relationship between the two. If we want to further ensure that this relationship is of type “owns” (and not “uses” for example), we do this by right-clicking on the arrow (like we do on circles), selecting “type” and setting its value to “owns”.

If we run the query now, we get the list of all Windows XP machines owned by marketing people. We also get the list of the marketing people who own these machines and we get the relationships between people and machines. So we now have want we wanted. But maybe a little more. Maybe we only care about the list of machines, we don’t want to retrieve all the data about the marketing people. As long as we know the machines are owned by marketing people, that’s all that we care about. We don’t need to know what specific person owns what specific machine. We could simply ignore the people and relationships in the response. Or we could enrich the query to specify that the people and relationship need not be returned (but they are still part of the query in the sense that they limit what machines get returned).We do this by right-clicking on the “person” circle, and selecting the “suppress” option. Similarly, select the “suppress” option on the relationship (the arrow) too.

This query will return a list of all Windows XP machines that are owned by someone in marketing, and nothing else.

Here is what the query could look like graphically (here I assume that the GUI tools represents the fact that the arrow and the “person” circle have the “suppress” option selected on them by turning their solid lines into dotted lines and their text into italics):

The most intuitive way to think about what happens when the query gets processed is that the program looks for all instances of the patterns described by the query. In other words, it tries to superimpose the requested graph everywhere on the graph of available data and selects all the instances where the requested graph “fits”.

What does the GUI tool do behind the scene to turn this query into the proper XML, as described by the spec?

For each circle, it creates an <itemTemplate> element. For each arrow, it creates a <relationshipTemplate> element and sets its <source> and <target> elements to the right item templates. For each constraint on a circle or arrow (i.e. when we set the type or when we set the value or a give property) it creates the appropriate selector and embeds it in the <itemTemplate> or <relationshipTemplate> that corresponds to this circle or arrow. Finally, it sets the @dropDirective attribute to “true” on all the <itemTemplate> and <relationshipTemplate> elements that corresponds to circles and arrows on which the “suppress” option was selected.

Here is what the resulting query looks like in XML:

<query xmlns="http://schemas.cmdbf.org/0-9-5/datamodel">
  <itemTemplate id="machine">
    <propertyValueSelector namespace="http://example.com/computerModel" localName="OS_Version">
      <equal>Windows XP</equal>
    </propertyValueSelector>
  <recordTypeSelector namespace="http://example.com/computerModel" localName="computerSystem"/>
  </itemTemplate>
  <itemTemplate id="person" dropDirective="true">
    <propertyValueSelector namespace="http://example.com/peopleModel" localName="department">
      <equal>marketing</equal>
    </propertyValueSelector>
    <recordTypeSelector namespace="http://example.com/peopleModel" localName="person"/>
  </itemTemplate>
  <relationshipTemplate id="administers" dropDirective="true">
    <recordTypeSelector namespace="http://example.com/computerModel" localName="owns"/>
    <source ref="person"/>
    <target ref="machine"/>
  </relationshipTemplate>
</query>

Note: like all query language, the actual query depends of course on the underlying model. In this example, I assumed that the OS version is represented as a property of the machine. More commonly, the OS will be a node of its own that has a relationship with the machine. So you’d have another circle for the OS. With a property constraint on that circle (version=”WindowsXP”) and a line representing a “runs” relationship between the machine circle and the OS circle. Similarly, “marketing” could be a node of its own that people have a relationship with, rather than just a property of each person. None of this changes the logic behind the Query operation.

Now, this is nice for the user of the query, but what about the poor developer who gets the 50-pages spec thrown on his/her desk and has 2 weeks to make sense of it and implement the server side of the query? I’ve said above that the program “tries to superimpose the requested graph everywhere on the graph of available data and selects all the instances where the requested graph fits” but that’s a lot easier to write as a sentence than to implement. So here is a pseudo-algorithm to help.

1) for each itemTemplate calculate the set of all items that obey all the selectors in the template. Call this the working set for the itemTemplate.
2) for each relationshipTemplate calculate the set of all relationships that obey all the selectors in the template. Call this the working set for the relationshipTemplate.
3) set need_to_loop = true
4) while (need_to_loop == true)

4.1) set need_to_loop = false
4.2) for each relationshipTemplate RT

4.2.1) let ITsource be the itemTemplate that is referenced as source by RT
4.2.2) let ITtarget be the itemTemplate that is referenced as target by RT
4.2.3) for each relationship R in the working set for RT

4.2.3.1) if the source of R is not in the working set for ITsource, then remove R from the RT working set
4.2.3.2) if the target of R is not in the working set for ITtarget, then remove R from the RT working set
4.2.3.3) if RT has a source/@minimum or a source/@maximum attribute

4.2.3.3.1) find the list L1 of all relationships in the working set for RT that have the same source as R
4.2.3.3.2) if RT has source/@minimum and the cardinality of L1 is less than this minimum then remove all relationships in L1 from the RT working set
4.2.3.3.3) if RT has source/@maximum and the cardinality of L1 is more than this maximum then remove all relationships in L1 from the RT working set

4.2.3.4) if RT has a target/@minimum or a target/@maximum attribute

4.2.3.4.1) find the list L2 of all relationships in the working set for RT that have the same target as R
4.2.3.4.2) if RT has target/@minimum and the cardinality of L2 is less than this minimum then remove all relationships in L2 from the RT working set
4.2.3.4.3) if RT has target/@maximum and the cardinality of L2 is more than this maximum then remove all relationships in L2 from the RT working set

4.3) for each itemTemplate IT

4.3.1) let sourceRTset be the set of all relationshipTemplates that references IT as its source
4.3.2) let targetRTset be the set of all relationshipTemplates that references IT as its target
4.3.3) for each item I in the IT working set

4.3.3.1) for each relationshipTemplate sourceRT in sourceRTset, if there is no relationship in the working set for sourceRT that uses I as its source, remove I from the IT working set and set need_to_loop to true
4.3.3.2) for each relationshipTemplate targetRT in targetRTset, if there is no relationship in the working set for targetRT that uses I as its source, remove I from the IT working set and set need_to_loop to true

5) process all directives (dropDirective or propertySubsetDirective) to remove or pair down items and relationships as requested
6) package the resulting items and relationships in a way that conforms to the CMDBF response message format (including putting each item in the <nodes> element with the appropriate @templateId attribute and putting each relationship in the <edges> element with the appropriate @templateId).

There are all kinds of optimizations possible here (like breaking out of loops once an instance has been removed, or not recalculating L1 and L2 over and over again for relationships in the same working set that share a source/target), but this is the most basic form or the algorithm. The goal is to illuminate the spec, not to provide an optimal implementation strategy.

In this post, I have focused on describing and illustrating the topological aspects of the query language. The other concepts that come into play are the Selector and the Directive mechanisms. But these are a lot more familiar to people used to SQL and I think they are sufficiently explained in the spec. So I have assumed here (in steps 1, 2 and 5 of the pseudo-algorithm) that they are well understood.

4 Comments

Filed under CMDB Federation, CMDBf, Everything, Graph query, Pseudo-algorithm, Query, Specs, Standards, Tutorial

August 2, 2007

First release of the CMDBF specification

The CMDBF (CMDB Federation) group just released a first public draft of the specification. Here is a direct link to it (PDF, 1MB). That’s a very nice achievement for a group that was a bit slow to find its pace but has been very productive since the beginning of 2007. Having the interop test drive our efforts had a lot to do with it, this is an approach to repeat the next time around.

This spec will look a lot more familiar to people used to reading WS-* specs than to people used to reading ITIL manuals. The concepts and use cases are (hopefully) consistent with ITIL, but the meat of the spec is about defining interoperable SOAP-based message exchanges that realize these use cases. The spec is about defining SOAP payloads, not IT management best practices. Please adjust your expectations accordingly and you’ll like the spec a lot more.

The most useful and important part (in my mind at least), is the definition of the Query service (section 4). This is used in many interactions. It is used by clients to query a CMDB that federates many MDRs (Management Data Repositories). It is used by the clients to go interact directly with the MDRs is they choose to. And it is used by the federating CMDB to retrieve data from the MDRs. Even in the more static scenarios, in which data is replicated from the MDRs into the CMDB (instead of true federation with no replication), the Query service is still the way for clients to access federated data from the CMDB.

So, yet another query language? Indeed. But one that natively supports concepts that are key to the kind of queries most useful for CMDB scenarios, namely relationships traversal and types. This is a topological query language. I hope the simple example in section 4.2 gives a good example of what this means.

Neither SQL nor XPath/XQuery has this topology-friendly approach (which is not to say that you can’t create useful queries on CMDB data with these query languages). On the other hand, there is one query language that is inherently relationship-oriented, that has native support for the notion of class and that has received a lot more attention, interop testing and implementation experience than our effort. One that I would have loved for us to leverage in CMDBF. It’s SPARQL. But semantic web technologies seem once again to be doomed by the perception that they are too much “out there” despite all the efforts of its proponents to make them connect with XML and other non-RDF views of the world.

Final caveat, this is a first draft. It is known to be incomplete (you’ll find text boxes that describe known gaps) and not all features in the spec were tested at the interop. We need more interop, review and development before it is robust. And most importantly, a lot of the difficult aspects of federation aren’t sufficiently addressed (reconciliation, model differences, source tracing, administrative metadata…) But it is at a point where we think it gives a good idea of how we are approaching the problem.

Equipped with this foreword, I wish you a pleasant read.

3 Comments

Filed under CMDB Federation, CMDBf, Everything, Specs, Tech

July 12, 2007

Gutting the SOAP processing model

One thing I have constantly witnessed in the SOAP design work I have been involved in is the fear of headers. There are several aspects to this, and it’s very hard to sort out the causes and the consequences as they are now linked in a vicious cycle.

For one thing, headers in SOAP messages don’t map well to RPC and we all know SOAP’s history there. Even now, as people have learned that they’re supposed to say they are not doing RPC (“look, my WSDL says doc/literal therefore I am not doing RPC”), the code is still RPC-ish with the grand-children of the body being serialized into Java (or another language) objects and passed as arguments to an operation inside a machine-generated stub. No room for pesky headers.

In addition (and maybe because of this view), tools often bend over backward to prevent you from processing headers along with the rest of the message, requiring instead some handlers to be separately written and registered/configured in whatever way (usually poorly documented as people aren’t expected to do this) the SOAP stack requires.

And the circle goes on, with poor tooling support for headers being used to justify staying away from them in SOAP-based interfaces.

I wrote above that “headers in SOAP messages don’t map well to RPC” but the more accurate sentence is “headers that are not handled by the infrastructure don’t map well to RPC”. If the headers are generic enough to be processed by the SOAP stack before getting to the programmer, then the RPC paradigm can still apply. Halleluiah. Which is why there is no shortage of specs that define headers. But those are the specs written by the “big boys” of Web services, the same who build SOAP stacks. They trust themselves to create code in their stacks to handle WS-Addressing, WS-Security and WS-ReliableMessaging headers for the developers. Those headers are ok. But the little guys shouldn’t be trusted with headers. They’d poke their eyes out. Or, as Elliotte Rusty Harold recently wrote (on a related topic, namely WS-* versus REST), “developers have to be told what to do and kept from getting their grubby little hands all over the network protocols because they can’t be trusted to make the right choices”. The difference is that Elliotte is blasting any use of WS-* while I am advocating a use of SOAP that gives developers full access to the network protocol. Which goes in the same direction but without closing the door on using WS-* specs when they actually help.

I don’t mean to keep banging on my IBM friends, but I’ve spent a lot of time developing specs with well respected IBM engineers who will say openly that headers should not contain “application information”, by which they mean that headers should be processed by the SOAP stack and not require the programmer to deal with them.

That sounds very kind, and certainly making life easy for programmers is good, but what it does is rob them of the single most useful feature in SOAP. Let’s take a step back and look at what SOAP is. If we charitably leave aside the parts that are lipstick on the SOAP-RPC pig (e.g. SOAP data model, SOAP encoding and other RPC features), we are left with:

the SOAP processing model
the SOAP extensibility model
SOAP faults

Of these three, the first two are all about headers. Loose the ability to define headers in your interface, you loose 2/3 of the benefits of SOAP (and maybe even more, I would argue that SOAP faults aren’t nearly as useful as the other two items in the list).

The SOAP processing model describes how headers can be processed by intermediaries and, most important of all, it clearly spells out how the mustUnderstand attribute works. So that everyone who processes a SOAP message knows (or should know) that a header with no such attribute (or where the attribute has a value of “false”) can be ignored if not recognized, while the message must not be processed if a non-understood header has this attribute set to “true”. It doesn’t sound like much, but it is huge in terms of reliably supporting features and extending an existing set of SOAP messages.

When they are deprived of this mechanism (or rather, scared away from using it), people who create SOAP interfaces end up re-specifying this very common behavior over and over by attaching it to parts of the body. But then it has to be understood and implemented correctly every time rather than leveraging a built-in feature of SOAP that any SOAP engine must (if compliance with the SOAP spec has any meaning) implement.

Case in point, the wsman:OptimizeEnumeration element in WS-Management. The normal pattern for an enumeration (see WS-Enumeration) is to first send an Enumerate request to retrieve an enumeration context and then a series of Pull requests to get the data. In this approach, the minimum number of interactions to retrieve even a small enumeration set is 2 (one Enumerate and one Pull). The OptimizeEnumeration element was designed to allow one to request that the Enumerate call be also the first Pull so that we can save one interaction. Not a big deal if you are going to send 500 Pull messages to retrieve the data, but if the enumeration set is small enough to retrieve all the data in one or two Pull messages then you can optimize traffic by up to 50%.

WS-Management decided to do that with an extension element in the body rather than a SOAP header. The problem then is what happens when a WS-Enumeration implementation that doesn’t use the WS-Management additions gets that element. There is no general rule in SOAP about how to handle this. In this case, the WS-Enumeration specs says to ignore the extensions, but it only says so in the “notation conventions” section, and as a SHOULD, not a MUST.

Not surprisingly, some implementers made the wrong assumption on what the presence of wsman:OptimizeEnumeration means and as a result their code faults on receiving a message with this option, rather than ignoring it. Something that would likely not have happened had the SOAP processing model been used rather than re-specifying this behavior.

Using a header not only makes it clear when an extension can be ignored, it also provides, at no additional cost, the ability to specify when it should not be ignored. Rather than baking this into the spec as WS-Management tries to do (not very successfully), using headers allows a sender to decide based on his/her specific needs and capabilities whether support for this extension is a must or not in this specific instance. Usually, flexibility and interoperability tug in opposing directions, but using headers rather than body elements for protocol extensions improves both!

Two more comments about the use of headers in WS-Management:

It shows that it’s not all IBM’s fault, because IBM had no involvement in the development of the WS-Management standard,
WS-Management is far from being the worst offender at under-using SOAP headers, since it makes good use of them in other places (which probably wouldn’t have been the case had the first bullet not been true). But not consistently enough.

Now let’s look at WS-ResourceTransfer for another example. This too illustrates that those who ignore the SOAP processing model just end up re-inventing it in the body, but it also gives a beautiful illustration of the kind of compromises that come from design by committee (and I was part of this compromise so I am as guilty as the other authors).

WS-ResourceTransfer creates a set of extensions on top of the WS-Transfer specification. The main one is the ability to retrieve only a portion of the document as opposed to the whole document. Defining this extension as a SOAP header (as WS-Management does it) leverages the SOAP processing model. Doing so means you don’t have to write any text in your spec about whether this header can be ignored or not, and the behavior can be specified clearly and interoperably by the sender, through the use of the mustUnderstand attribute. In some case (say if the underlying XML document is in fact an entire database with gigabytes of data) I definitely want the receiver to fault if it doesn’t understand that I only want a portion of the document, so I will set mustUnderstand to true. In other cases (if the subset is not much smaller than the entire document and I am willing to wad through the entire document to extract the relevant data if I have to) then I’ll set mustUnderstand to “false” so that if the receiver can’t honor the extension then I get the whole document instead of a fault.

But with WS-ResourceTransfer, we had to deal with the “headers should not contain application information” view of the world in which it is heresy to use headers for these extensions (why the portion of the document to retrieve is considered “application information” while addressing information is not is beyond my reach and frankly not a debate worth having because this whole dichotomy only makes sense in an RPC view of the world).

So, between the “this information must be in the body” camp and the “let’s leverage the SOAP processing model” camp (my view, as should be obvious by now), the only possible compromise was to do both, with a mustUnderstand header that just says “look in the body to find the extensions that you must follow”. So much for elegance and simplicity. And since that header applies to all the extensions, you have to make all of them mustUnderstand or none of them. For example, you can’t have a wsrt:Create message in which the wsmex:Metadata extension must be understood but the wsrt:Fragment extension can be ignored.

WS-Management and WS-ResourceTransfer share the characteristic of being designed as extensions to existing specifications. But even when creating a SOAP-based interaction from scratch, one benefits from using the SOAP processing model rather than re-inventing ways to express whether support for an option is required or not. I have seen several other examples (in interfaces that aren’t public like the two examples above) that also re-invent this very common behavior for no good reason. Simply because developers are scared of headers.

These examples illustrate a simple point. Most of the value in SOAP resides in how headers are used in the SOAP processing model and the SOAP extensibility model. Steering developers away from defining headers is robbing them of that value. Rather than thinking in terms of header and body, I prefer to think of them as header and tail.

6 Comments

Filed under Everything, SOAP, SOAP header, Specs, Standards, Tech

June 25, 2007

CMDBF interoperability testing (alpha edition)

Last week, a bunch of us from the CMDBF author companies got together in a room for our first interoperability testing event. It was based on a subset of the spec. As usual with these kinds of events, the first hurdle was to get the network setup right (we had a mix of test endpoints running on remote servers and on our laptops; getting all laptops to talk to one another on a local network was easy; getting them all to talk to the remote endpoints over the internet was easy too; but getting both at the same time took a bit of work).

Once this was taken care of, the interop tests went pretty smoothly. A few problems were found but they were fixed on the fly and by the end we had happy MDRs (Management Data Repositories) talking to happy CMDBs who themselves were accessed by happy client applications. All this using the CMDBF-defined query and update interfaces.

Next steps are to update the spec with the lessons from the interop and to complete it with a few additional features that we put out of scope for the interop. Stay tuned.

Comments Off on CMDBF interoperability testing (alpha edition)

Filed under CMDB Federation, CMDBf, Everything, Implementation, Specs, Standards

May 31, 2007

CMDBF update

My CMDBF colleague from BMC Van Wiles has a short update on the state of the CMDBF work. This is an occasion for me to point to his blog. I know that several of my readers are very interested in the CMDBF work, and they should probably monitor Van’s posts in addition to mine.

Like Van, I see the pace accelerating, which is good. More importantly, the quality of the discussion has really improved, not just the quantity. We’ve spent way too much time in UML-land (come on people, this is a protocol, get the key concepts and move on, no need to go in UML to the same level of detail that the XML contains), but the breakthrough was the move to pseudo-schema, and from there to example XML instance docs and finally to java code for the interop. I am having a lot of fun writing this code. I used this opportunity to take XOM on a road test and it’s been a very friendly companion.

Now that we’re getting close to the CMDBF interop, I feel a lot better about the effort than I did at the beginning of the year. It’s still going to be challenging to make the interop. Even if we succeed, the interop only covers a portion of what the spec needs to deliver. But it will be a very good milestone.

1 Comment

Filed under Everything, Implementation, Specs, Standards, Tech

May 29, 2007

XMLFrag SOAP header

This HTML document titled “XMLFrag header” describes a proposal I wrote for a header that would allow one to target a SOAP message to a subset of the resource to which the message is sent. This is useful in cases where messages are used to interact with systems that have a well-known model, as is often the case in the IT systems management world. If you’ve ever used or been intrigued by WS-Management’s “FragmentTransfer” header, or WS-ResourceProperties’s “QueryResourceProperties” message or WS-RessourceTransfer’s “Expression” element then you may be interested by a different approach that is operation-independent. Please read the example at the beginning of the proposal for an explanation of what XMLFrag does.

I think it’s nifty, but I am not sure how much need for this there is. Are the composition/mash-up scenarios that this allows important enough to replace well established alternatives, such as WS-Management’s “FragmentTransfer” header? I am not sure. Which is why I am putting this out, to see if there’s any interest.

One possible approach would be to generalize WS-Management’s current mechanism to support the features presented here. This could be done very easily and in a backward-compatible way by declaring that the response wrapper (wsman:XmlFragment) is not applied for all dialect but defined in a per-dialect fashion. The currently defined dialects would keep using the wsman:XmlFragment wrapper, but a new dialect could be defined that didn’t use a wrapper and would behave like the XMLFrag element defined in my proposal.

A few additional notes and comments:

This proposal has been called “WS-SubversiveAddressing” by some of my IBM friends who have very definite ideas about what belongs in SOAP headers, what belongs in the body and what headers are appropriate to use as reference properties in an EPR. And this proposal seems to break all these rules. But since the rules seem more inspired by lack of flexibility in the WebSphere message routing/processing capabilities than by true architectural constraints I am not too worried. I would even say that this proposal represents the only kind of header that really make sense to use as reference properties. Headers that sometimes are set by the sender explicitly and sometimes are hard-coded in the EPR used. Using reference properties for headers that only come from reference properties (and aren’t expected to ever be set explicitly by the sender) is a sign of lack of ability of the stack to route messages based on URL. A too-frequent limitation of Java SOAP stacks. But I am digressing…

I didn’t think there was anything worthy of a patent in this, but in these sad times you can’t be sure that someone is not going to try to get one for it, so just to be sure HP published this a year ago in Research Disclosure so that no-one can patent it. If you want to check, it starts on page 627 of the May 2006 Research Disclosure (not available on-line unless you have an account w/ them, but you can order it). So in reality, this document has already been public for a year, but in a pretty hidden form. This post just gives it a little bit more visibility.

This is not a spec. It is just a description of what a spec could do. It doesn’t have normative language, it doesn’t provide formal syntax (pseudo-schema, XSD and/or other) and it doesn’t address some of the details that would be needed for interoperability (e.g. what namespaces declarations are in context for the XPath evaluation).

Why use a car fleet as an example for illustration? For the same reason that the SML spec uses a university (class/student) example. To pick a domain that is different from the expected domain of application of the technology to not invite modeling discussions that are irrelevant to the proposal and to not biaise people’s view of what this could be used for.

Is there a relationship between this and federation efforts? Sort of. This could be very useful when exposed by a a federator, if the model is very hierachical. But it doesn’t work so well for graph-type models. Which is what CMDBF does and which is why CMDBF is coming up with a more graph-oriented approach. But that too is a different topic.

4 Comments

Filed under Everything, SOAP, SOAP header, Specs, Tech, XMLFrag