Category Archives: Tech

The elusive XPath nodeset serialization

I have been involved in various capacities with five different specifications that define a GET (or GET-like) operation that takes as input an XPath expression used to pinpoint the subset of the XML document that should be retrieved (here is a quick history as of a couple of years ago, more has happened since). And I must shamefully admit that all but one are simply impossible to implement in an interoperable way.

That’s because they instruct implementers to return an XPath nodeset in the response SOAP message but say nothing about how to serialize the nodeset. While an XPath nodeset contains the kind of things that make up an XML document, it is not an XML document by itself. There is an infinite number of possible ways to serialize an XPath nodeset into XML. To have any hope of interoperability on this, a serialization algorithm has to be clearly described by the specification. Which hasn’t happened.

Let’s start with WS-ResourceProperties (WS-RP). It has a QueryResourceProperties operation that takes an XPath expression as input. The specification says that “the response MUST contain an XML serialization of the results of evaluating the QueryExpression against the resource properties document“. Great, thanks. The example provided happens to return a nodeset with only one node (a boolean), which is implicitly serialized into the text representation of that boolean. What if there is more than one node in the nodeset? What about other types of nodes?

Moving on to WS-Management, which defines a SOAP header that uses XPath to qualify a WS-Transfer GET request such that it only retrieves a subset of the target XML document. While it does a better job than WS-RP at describing the input (e.g. it specifies the context node and what namespace declarations are in scope for the XPath evaluation) it is even more cavalier than WS-RP in describing the output: “the output (lines 53-55) is like that supplied by a typical XPath processor and might or might not contain XML namespace information or attributes“. By “a typical XPath processor” we should understand MSXML I suppose. But as far as I know a “typical XPath processor” doesn’t return XML, it returns language-specific data structures (e.g. a C# or Java object, like a nu.xom.Nodes instance). And here too, the examples only use single-node nodesets.

WS-ResourceTransfer (WS-RT) was supposed to be the convergence of these two efforts, so presumably it would have learned from their mistakes. While it is better written in general than its predecessors, it fails just as badly with regards to specifying the nodeset serialization. And once again, the example provided uses a nodeset with just one node.

And then came the CMDBf query operation which, for some unclear reason, was deemed in need of a built-in XPath transformation of records. As I pointed out in my review of CMDBf 1.0 at the time, this feature was added without taking the trouble to define the XML serialization of the resulting nodeset. And there isn’t even an example of the XPath serialization.

It is sad in a way, but the only specification that acknowledges the problem and addresses it came before any of the four above even got started. It is the WSMF (Web Services Management Framework) work that we did at HP, and more specifically the “note on dynamic attributes and meta information” (not available at HP anymore but available from archive.org). This specification was the first one to define a GET operation that is qualified by an XPath expression. Unlike its successors it also explicitly narrowed down the types of nodes that could be selected (“The manager MUST NOT send as input an XPath statement that returns a nodeset containing nodes other than element, attribute and namespace nodes“). And for those valid types it described how to serialize them in XML (“When a node in the result nodeset is an attribute node, for the sake of the response it is serialized as an element node which has the same name as the name of the original attribute (see example 4 for an illustration). The element is in the same namespace as the namespace the attribute it represents is in. This applies to namespace nodes as well, they are serialized like an attributes in the xmlns namespace“). Turning an attribute into an element of the same QName might not be the smartest thing in retrospect (after all there may be an element by that QName already) but at least we recognized and addressed the problem.
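
Here is a minimal illustration of that WSMF rule (the element and attribute names are invented for the example):

  <!-- resource document fragment -->
  <ex:disk xmlns:ex="http://example.com/wsmf-demo" ex:capacity="500GB"/>

  <!-- XPath sent by the manager -->
  //@ex:capacity

  <!-- the selected attribute node, serialized in the response per the WSMF rule -->
  <ex:capacity xmlns:ex="http://example.com/wsmf-demo">500GB</ex:capacity>

The QName collision mentioned above is easy to see here: if the model also defined an ex:capacity element, the recipient could not tell which one was selected.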

But all is good now, I am told, because XPath 2.0 is here, along with a clean data model and a well-described serialization.

Not so. Anyone wanting to use XPath for a SOAP-based query language still would have to specify a serialization.

The first problem with the W3C serialization is that the XML output method doesn’t work for all nodesets. Try to use it on a nodeset that contains a top-level attribute node and you get error err:SENR0001. And even for the nodesets it accepts, it sometimes returns less-than-useful results. For example, if your XPath is of the form /employee/name/text() and you have four employees, the result will look something like this:

“Joe SmithKathy O’ConnorHelen MartinBrian Jones”

Concatenated text values without separators. I guess W3C is like a department store, they don’t offer complimentary wrapping anymore…

That’s why the nux.xom.xquery.ResultSequenceSerializer class had to define its own wrapping mechanism to produce a useful XML serialization. The API gives you the choice between the W3C_ALGORITHM and the WRAP_ALGORITHM.
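
For the curious, here is roughly what a “wrap” serialization amounts to, sketched with XOM (this is illustrative code, not the nux API; the result/item wrapper names are invented for the example). Each node of the nodeset gets its own wrapper element, so elements, attributes and text nodes all survive the trip:

  import java.io.StringReader;
  import nu.xom.*;

  // needs XOM (plus its Jaxen-based XPath support) on the classpath
  public class NodesetWrapper {

      // Wrap each node of the nodeset in its own <item> element under a <result> root,
      // so that element, attribute and text nodes can all be carried in XML.
      public static Element wrap(Nodes nodes) {
          Element wrapper = new Element("result");
          for (int i = 0; i < nodes.size(); i++) {
              Node node = nodes.get(i);
              Element item = new Element("item");
              if (node instanceof Attribute) {
                  // attributes cannot be children of an element, so carry name + value as an attribute of the wrapper
                  Attribute att = (Attribute) node;
                  item.addAttribute(new Attribute(att.getQualifiedName(), att.getNamespaceURI(), att.getValue()));
              } else if (node instanceof Text) {
                  item.appendChild(node.getValue());
              } else {
                  item.appendChild(node.copy());
              }
              wrapper.appendChild(item);
          }
          return wrapper;
      }

      public static void main(String[] args) throws Exception {
          String xml = "<employees><employee><name>Joe Smith</name></employee>"
                     + "<employee><name>Kathy O'Connor</name></employee></employees>";
          Document doc = new Builder().build(new StringReader(xml));
          Nodes names = doc.query("/employees/employee/name/text()");
          // prints <result><item>Joe Smith</item><item>Kathy O'Connor</item></result>
          System.out.println(wrap(names).toXML());
      }
  }

Wrapping each node solves both the concatenation problem and the top-level attribute problem, at the cost of requiring both ends to agree on the wrapper vocabulary. Which is exactly the kind of thing these specifications needed to spell out.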

Bottom line, and however much some would like to think of it that way, XPath (1 or 2) is not an XML subsetting/transformation mechanism. It could be used to create one (as XSLT does), but you have to do your own plumbing.

In addition to the technical aspects of this discussion, what else can be learned from this sad state of things? The fact that all these specifications define an XPath-driven query mechanism that is simply broken (beyond the simplest use cases) without anyone even noticing tells me that there isn’t a real need for full XPath query over SOAP (and I am talking about XPath 1.0, the introduction of XPath 2.0 in CMDBf is even more out there). A way to retrieve individual elements (and maybe text values) is all that is needed for 99% of the use cases addressed by these specifications. Users would be better served (especially in a version 1.0) by specifications that cover the simple case correctly than by overly generic, complex and poorly documented features. There is always time to add features later if the initial specification is successful enough that users encounter its limitations.

3 Comments

Filed under CMDB Federation, CMDBf, Everything, SOAP, Specs, Standards, Tech, W3C, WS-Management, WS-ResourceTransfer, XPath

Between skinny and bloated

Spring’s Rod Johnson writes today about the future he sees for Java Bloatware (his unkind term for Java EE middleware). Of course, as Mr. Spring, he is far from neutral. Of course he is focusing on a certain class of applications (web-centered, mostly greenfield, which is a huge – and sexy – segment, but not in any way the only kind of applications). Of course he underestimates how established technology that works remains in use long after it may have ceased to be the optimal solution for new developments. But even taking all that into account, he makes some good points about the proliferation of rarely-used capabilities in Java EE and the associated cost. Most of those points are well understood and are driving the more modular approach taken by Java EE 6. As well as the adoption of OSGi (see here and here for BEA’s example). In addition, as Rod mentions, the JCP now has to share the playground with other framework standardization efforts like SCA.

The most interesting part of Rod’s post from my perspective, is this prediction:

“The market will need to address the gap between Tomcat and WebLogic/WebSphere. Currently an important part of the market is neglected. The majority of Java web applications are most at home on Tomcat. A minority actually want some of the more esoteric functionality of a full-blown application server, such as JCA, or specialized capabilities such as distributed transaction management. But a larger minority need some of the operational and management features of those products, but are not interested in the esoteric APIs and the bloat they bring along with them. As more and more end user companies look to phase out legacy application servers in favor of better suited technologies, there will inevitably be a response to market demand, with products that hit the sweet spot and bridge this gap.”

Right on. This is the second time in a week that we see an acknowledgment of the importance of application manageability coming from SpringSource. Whether this mid-point demand will be met from the top down by a more modular Java EE stack or from the bottom up by building on top of Tomcat (or some non-Java HTTP server) remains to be seen. The two aren’t exclusive either.

I expect that hosted application frameworks like the recently announced Google App Engine will also aim at that “more than Tomcat, less than J2EE” sweet spot. But the cost/benefit formula of a more full-featured (or “bloated” if you prefer Rod’s terminology) environment might turn out to be different in a “hosted framework” situation.

1 Comment

Filed under Application Mgmt, Everything, IT Systems Mgmt, JMX, Manageability, Spring, Tech

Where will you be when the Semantic Web gets Grid’ed?

I see the tide rising for semantic technologies. On the other hand, I wonder if they don’t need to fail in order to succeed.

Let’s use the Grid effort as an example. By “Grid effort” I mean the work that took place in and around OGF (or GGF as it was known before its merger w/ EGA). That community, mostly made of researchers and academics, was defining “utility computing” and creating related technology (e.g. OGSA, OGSI, GridFTP, JSDL, SAGA as specs, Globus and Platform as implementations) when Amazon was still a bookstore. There was an expectation that, as large-scale, flexible, distributed computing became a more pressing need for the industry at large, the Grid vision and technology would find their way into the broader market. That’s probably why IBM (and to a lesser extent HP) invested in the effort. Instead, what we are seeing is a new approach to utility computing (marketed as “cloud computing”), delivered by Amazon and others. It addresses utility computing with a different technology than Grid. With X86 virtualization as a catalyst, “cloud computing” delivers flexible, large-scale computing capabilities in a way that, to the users, looks a lot like their current environment. They still have servers with operating systems and applications on them. It’s not as elegant and optimized as service factories, service references (GSR), service handle (GSH), etc but it maps a lot better to administrators’ skills and tools (and to running the current code unchanged). Incremental changes with quick ROI beat paradigm shifts 9 times out of 10.

Is this indicative of what is going to happen with semantic technologies? Let’s break it down chronologically:

  1. Trailblazers (often faced with larger/harder problems than the rest of us) come up with a vision and a different way to think about what computers can do (e.g. the “computers -> compute grid” transition).
  2. They develop innovative technology, with a strong theoretical underpinning (OGSA-BES and those listed above).
  3. There are some successful deployments, but the adoption is mostly limited to a few niches. It is seen as too complex and too different from current practices for broad adoption.
  4. Outsiders use incremental technology to deliver 80% of the vision with 20% of the complexity. Hype and adoption ensue.

If we are lucky, the end result will look more like the nicely abstracted utility computing vision than the “did you patch your EC2 Xen images today” cloud computing landscape. But that’s a necessary step that Grid computing failed to leapfrog.

Semantic web technologies can easily be mapped to the first three bullets. Replace “computers -> compute grid” with “documents/data -> information” in the first one. Fill in RDF, RDFS, OWL (with all its flavors), SPARQL etc as counterparts to OGSA-BES and friends in the second. For the third, consider life sciences and defense as niche markets in which semantic technologies are seeing practical adoption. What form will bullet #4 take for semantic technology (e.g. who is going to be the EC2 of semantic technology)? Or is this where it diverges from Grid and instead gets adopted in its “original” form?

1 Comment

Filed under Everything, Grid, HP, IBM, RDF, Research, Semantic tech, Specs, Standards, Tech, Utility computing, Virtualization

Elastra and data center configuration formats

I heard tonight for the first time of a company called Elastra. It sounds like they are trying to address a variation of the data center automation use cases covered by Opsware (now HP) and Bladelogic (now BMC). Elastra seems to be in an awareness-building phase and as far as I can tell it’s working (since I heard about them). They got to me through John’s blog. They are also using the more conventional PR channel (and in that context they follow all the cheesy conventions: you get to “unlock the value”, with “the leading provider” who gives you “a new product that revolutionizes…” etc, all before the end of the first paragraph). And while I am making fun of the PR-talk I can’t help zeroing in on this quote from the CEO, who “wanted to pick up where utility computing left off – to go beyond the VM and toward virtualizing complex applications that span many machines and networks”. Does he feel the need to narrowly redefine “utility computing” (who knew that all that time “utility computing” was just referring to a single hypervisor?) as a way to justify the need for the new “cloud” buzzword (you’ll notice that I haven’t quite given up yet, this post is in the “utility computing” category and I still do not have a “cloud” category)?

The implied difference with Opsware and Bladelogic seems to be that while these incumbents (hey Bladelogic, how does it feel to be an “incumbent”?) automate data center management tasks in old boring data centers, Elastra does it in clouds. More specifically “public and private compute clouds”. I think I know roughly what a public cloud is supposed to be (e.g. EC2), but a private cloud? How is that different from a data center? Is a private cloud a data center that has the Elastra management software deployed? In that case, how is automating private clouds with Elastra different from automating data centers with Elastra? Basically it sounds like they don’t want to be seen as competing with Opsware and Bladelogic so they try to redefine the category. Which makes it easier to claim (see above) to be “the leading provider of software for designing, deploying, and managing applications in public and private compute clouds” without having the discovery or change management capabilities of Opsware (or anywhere near the same number of customers).

John seems impressed by their “public cloud” capabilities (I don’t think he has actually tested them yet though) and I trust him on that. Knowing the complexities of internal data centers, I am a lot more doubtful of the “private cloud” claims (at least if I interpret them correctly).

Anyway, I am getting carried away with some easy nitpicking on the PR-talk, but in truth it uses a pretty standard level of obfuscation/hype for this type of press release. Sad, I know.

The interesting thing (and the reason I started this blog entry in the first place) is that they seem to have created structures to capture system design (ECML) and deployment (EDML) rules. From John’s blog:

“At the core of Elastra’s architecture are the system design specifications called ECML and EDML. ECML is an XML markup language to specify a cloud design (i.e., multiple system design of firewalls, load balancers, app servers, db servers, etc…). The EDML markup provides the provisioning instructions.”

John generously adds “Elastra seems to be the first to have designed their autonomics into a standards language” which seems to assume that anything in XML is a standard. Leaving the “standard” debate aside, an XML format does tend to improve interoperability and that’s a good thing.

So where are the specifications for these ECML and EDML formats? I would be very interested in reading them, but they don’t appear to be available anywhere. Maybe that would be a good first step towards making them industry standards.

I would be especially interested in comparing this to what the SML/CML effort is going after. Here are some propositions that need to be validated or disproved. Comparing SML/CML to ECML/EDML could help shed light on them:

  • SML/CML encompasses important and useful datacenter automation use cases.
  • Some level of standardization of cross-domain system design/deployment/management is needed.
  • SML/CML will be too late.
  • SML/CML will try to do too many things at once.

You can perform the same exercise with OVF. Why isn’t OVF based on SML? If you look at the benefits that could theoretically be derived from that approach (hardware, VM, network and application configuration all in the same metamodel) it tells you all that is attractive about SML. At the same time, if you look at the fact that OVF is happening while CML doesn’t seem to go anywhere, it tells you that the “from the very top all the way down to the very bottom” approach that SML is going after is very difficult to pull off. Especially with so many cooks in the kitchen.

And BTW, what is the relationship between ECML/EDML and OVF? I’d like to find out where the Elastra specifications land in all this. In the worst case, they are just an XML rendering of the internals of the Elastra application, mixing all domains of the IT stack. The OOXML of data center automation if you want. In the best case, it is a supple connective tissue that links stiffer domain-specific formats.

[UPDATED 2008/3/26: Elastra’s “introduction to elastic programing” white paper has a few words about the relationship between OVF and EDML: “EDML builds on the foundation laid by Open Virtual Machine Format (OVF) and extends that language’s capabilities to specify ways in which applications are deployed onto a Virtual Machine system”. Encouraging, if still vague.]

[UPDATED 2008/3/31: A week ago I hadn’t heard of Elastra and now I learn that I had been tracking the blog of its lead-architect-to-be all along! Maybe Stu will one day explain what a “private cloud” is. His description of his new company seems to confirm my impression that they are really focused (for now at least) on “public clouds” and not the Opsware-like “private clouds” automation capabilities. Maybe the “private clouds” are just in the business plan (and marketing literature) to be able to show a huge potential market to VCs so they pony up the funds. Or maybe they really plan to go after this too. Being able to seamlessly integrate both (for mixed deployments) is the holy grail; I am just dubious that focusing on this rather than doing one or the other “right” is the best starting point for a new company. My guess is that despite the “private cloud” talk, they are really focusing on “public clouds” for now. That’s what I would do anyway.]

[UPDATED on 2008/6/25: Stephen O’Grady has an interesting post about the role of standards in Cloud computing. But he only looks at it from the perspective of possible standardization of the interfaces used by today’s Cloud providers. A full analysis also needs to include the role, in Cloud Computing, of standards (app runtime standards, IT management standards, system modeling standards, etc…) that started before Cloud computing was big. Not everything in Cloud computing is new. And even less is new about how it will be used. Especially if, as I expect, utility computing and on-premise computing are going to become more and more intertwined, resulting in the need to manage them as a whole. If my app is deployed at Amazon, why doesn’t it (and its hosts) show up in my CMDB and in my monitoring panel? As Coté recently wrote, “as the use of cloud computing for an extension of data centers evolves, you could see a stronger linking between Hyperic’s main product, HQ and something like Cloud Status”.]

9 Comments

Filed under Automation, CML, Everything, IT Systems Mgmt, OVF, SML, Tech, Utility computing, Virtualization

IT management for the personal CIO

In the previous post, I described how one can easily run their own web applications to beneficially replace many popular web sites. It was really meant as background for the present article, which is more relevant to the “IT management” topic of this blog.

Despite my assertion that recent developments (and the efforts of some hosting providers) have made the proposition of running your own web apps “easy”, it is still not as easy as it should be. What IT management tools would a “personal CIO” need to manage their personal web applications? Here are a few scenarios:

  • get a catalog of available applications that can be installed and/or updated
  • analyze technical requirements (e.g. PHP version) of an application and make sure it can be installed on your infrastructure
  • migrate data and configuration between comparable applications (or different versions of the same application)
  • migrate applications from one hosting provider to another
  • back-up/snapshot data and configuration
  • central access to application stats/logs in simple format
  • uptime, response time monitoring
  • central access to user management (share users and configure across all your applications)
  • domain name management (registration, renewal)

As the CIO of my personal web applications, I don’t need to see Linux patches that need to be applied or network latency problems. If my hosting provider doesn’t take care of these without me even noticing, I am moving to another provider. What I need to see are the controls that make sense to a user of these applications. Many of the bullets listed above correspond to capabilities that are available today, but in a very brittle and hard-to-put-together form. My hosting provider has a one-click update feature but they have a limited application catalog. I wouldn’t trust them to measure uptime and response time for my sites, but there are third party services that do it. I wouldn’t expect my hosting provider to make it easy to move my apps to a new hosting provider, but it would be nice if someone else offered this. Etc. A neutral web application management service for the “personal CIO” could bring all this together and more. While I am at it, it could also help me back up/manage my devices and computers at home and manage/monitor my DSL or cable connection.

1 Comment

Filed under Everything, IT Systems Mgmt, Portability, Tech

My web apps and me

Registering a domain name: $10 per year
Hosting it with all the features you may need: $80 per year
Controlling your on-line life: priceless

To be frank, the main reason that I do not use Facebook or MySpace is that I am not very social to start with. But, believe it or not, I have a few friends and family members with whom I share photos and personal stories. Not to mention this blog for different kinds of friends and different kinds of stories (you are missing out on the cute toddler photos).

Rather than doing so on Facebook, MySpace, BlogSpot, Flickr, Picasa or whatever the Microsoft copies of these sites are, I maintain a couple of blogs and on-line photo albums on vambenepe.com. They all provide user access control and RSS-based syndication so no-one has to come to vambenepe.com just to check on them. No annoying advertising, no selling out of privacy and no risk of being jerked around by bait-and-switch (or simply directionless) business strategies (“in order to serve you better, we have decided that you will no longer be able to download the high-resolution version of your photos, but you can use them to print with our approved print-by-mail partners”). Have you noticed how people usually do not say “I use Facebook” but rather “I am on Facebook” as if riding a mechanical bull?

The interesting thing is that it doesn’t take a computer genius to set things up in such a way. I use Dreamhost and it, like similar hosting providers, gives you all you need. From the super-easy (e.g. they run WordPress for you) to the slightly more personal (they provide a one-click install of your own WordPress instance backed by your own database) to the do-it-yourself (they give you a PHP or RoR environment to create/deploy whatever app you want). Sure you can further upgrade to a dedicated server if you want to install a servlet container or a CodeGears environment, but my point is that you don’t need to come anywhere near this to own and run your own on-line life. You never need to see a Unix shell, unless you want to.

This is not replacing Facebook lock-in with Dreamhost lock-in. We are talking about an open-source application (WordPress) backed by a MySQL database. I can move it to any other hosting provider. And of course it’s not just blogging (WordPress) but also wiki (MediaWiki), forum (phpBB), etc.

Not that every shiny new on-line service can be replaced with a self-hosted application. You may have to wait a bit. For example, there is more to Facebook than a blog plus photo hosting. But guess what. Sounds like Bob Bickel is on the case. I very much hope that Bob and the ex-Bluestone gang aren’t just going to give us a “Facebook in a box” but also something more innovative, that makes it easy for people to run and own their side of a Facebook-like presence, with the ability to connect with other implementations for the social interactions.

We have always been able to run our own web applications, but it used to be a lot of work. My college nights were soothed by the hum of an always-running Linux server (actually a desktop used as a server) under my desk on which I ran my own SMTP server and HTTPd. My daughter’s “soothing ocean waves” baby toy sounds just the same. There were no turnkey web apps available at the time. I wrote and ran my own Web-based calendar management application in Python. When I left campus, I could have bought some co-locating service but it was a hassle and not cheap, so I didn’t bother [*].

I have a lot less time (and Linux administration skills) now than when I left university, so how come it is now attractive for me to run my own web apps again? What changed in the environment?

The main driver is the rise of the LAMP stack and especially PHP. For all the flaws of the platform and the ugliness of the code, PHP has sparked a huge ecosystem. Not just in terms of developers but also of administrators: most hosting providers are now very comfortable offering and managing PHP services.

The other driver is the rise of virtualization. Amazon hosts Xen images for you. But it’s not just the hypervisor version of virtualization. My Dreamhost server, for example, is not a Xen or VMWare virtual machine. It’s just a regular server that I share with other users but Dreamhost has created an environment that provides enough isolation from other users to meet my needs as an individual. The poor man’s virtualization if you will. Good enough.

These two trends (PHP and virtualization) have allowed Dreamhost and others to create an easy-to-use environment in which people can run and deploy web applications. And it becomes easier every day for someone to compete with Dreamhost on this. Their value to me is not in the hardware they run. It’s in the environment they provide that prevents me from having to do low-level LAMP administration that I don’t have time for. Someone could create such an environment and run it on top of Amazon’s utility computing offering. Which is why I am convinced that such environments will be around for the foreseeable future, Dreamhost or no Dreamhost. Running your own web applications won’t be just for geeks anymore, just like using a GPS is not just for the geeks anymore.

Of course this is not a panacea and it won’t allow you to capture all aspects of your on-line life. You can’t host your eBay ratings. You can’t host your Amazon rank as a reviewer. It takes more than just technology to break free, but technology has underpinned many business changes before. In addition to the rise of LAMP and virtualization already mentioned, I am watching with interest the different efforts around data portability: dataportability.org, OpenID, OpenSocial, Facebook API… Except for OpenID, these efforts are driven by Web service providers hoping to canalize the demand for integration. But if they are successful, they should give rise to open source applications you can host on your own to enjoy these services without the lock-in. One should also watch tools like WSO2’s Mashup Server and JackBe Presto for their potential to rescue captive data and exploit freed data. On the “social networks” side, the RDF community has been buzzing recently with news that Google is now indexing FOAF documents and exposing the content through its OpenSocial interface.

Bottom line, when you are invited to create a page, account or URL that will represent you or your data, take a second to ask yourself what it would take to do the same thing under your domain name. You don’t need to be a survivalist freak hiding in a mountain cabin in Montana (“it’s been eight years now, I wonder if they’ve started to rebuild cities after the Y2K apocalypse…”) to see value in more self-reliance on the web, especially when it can be easily achieved.

Yes, there is a connection between this post and the topic of this blog, IT management. It will be revealed in the next post (note to self: work on your cliffhangers).

[*] Some of my graduating colleagues took their machines to the dorm basement and plugged them into a switch there. Those Linux Slackware machines had amazing uptimes of months and years. Their demise didn’t come from bugs, hacking or component failures (even when cats made their litter inside a running computer with an open case) but from the fire marshal, and only after a couple of years (the network admins had agreed to turn a blind eye).

[UPDATED 2008/7/7: Oh, yeah, another reason to run your own apps is that you won’t end up threatened with jail time for violating the terms of service. You can still end up in trouble if you misbehave, but they’ll have to charge you with something more real, not a whatever-sticks approach.]

[UPDATED 2009/12/30: Ringside (the Bob Bickel endeavor that I mention above), closed a few months after this post. Too bad. We still need what they were working on.]

2 Comments

Filed under Everything, Portability, Tech, Virtualization

WSO2 Mashup Server

I see that WSO2 has just released version 1.0 of their Mashup Server. Congratulations to Jonathan and the rest of the team. I haven’t played with the earlier betas of the Mashup Server but I have read enough about it to be interested. Now that it’s been released, it might be a good time to invest a few hours to look into it (I just downloaded it and I filed a small documentation bug already). I know (and like) many of the WSO2 guys (Jonathan, but also Sanjiva and Glen) from the early days of the W3C WSDL working group. Plus, you have to give credit to a company that offers visibility on its web site not just to its board and management team but also to its engineers.

But the Mashup Server is not interesting to me just because I know some of its authors. There are two more important reasons. One is that it is the integration product in WSO2’s portfolio that is the most different in its approach from the many integration products in Oracle Fusion Middleware. We want Oracle Enterprise Manager to do an outstanding job at managing Oracle Fusion Middleware, but we also want it to manage other integration approaches (we manage Tomcat for example). At this point there is of course no market demand for managing WSO2’s Mashup Server, but from an architectural perspective it’s a good alternative to keep in mind along with the BPEL, ESB, ODI, etc that are already in heavy use. I am always interested in perspectives that help make sure that the most abstract application/service management concepts remain suitably abstracted, so learning a bit about the Mashup Server can’t hurt. I’ll know more once I’ve looked at it, but my impression is that the Mashup Server is somewhere between BPEL and Ruby on Rails (or TurboGears) in terms of declarativity and introspectability (yes I like to make up words) for management purposes.

This may well be a sweet spot and it’s my second reason for being interested in the Mashup Server. I am always interested in tools that help with quick prototyping and the best tool is different for each job. The Mashup Server is pretty unique and I can imagine it being a nice tool for some management integration prototypes once the participating services have been suitably XML-ized (something that Oracle Fusion Middleware makes easy).

Interestingly, the release of this JavaScript-based platform comes on the same day that Joe Gregorio declares JavaScript to be the new SmallTalk.

Comments Off on WSO2 Mashup Server

Filed under Everything, JavaScript, Mashup, Mgmt integration, Tech, WSO2

Top 10 lists and virtualization management

Over the last few months, I have seen two “top 10” lists with almost the same title and nearly zero overlap in content. One is Network World’s “10 virtualization companies to watch” published in August 2007. The other is CIO’s “10 Virtualization Vendors to Watch in 2008” published three months later. To be precise, there is only one company present in both lists, Marathon Technologies. Congratulations to them (note to self: hire their PR firm when I start my own company). Things are happening quickly in that field, but I doubt the landscape changed drastically in these three months (even though the announcement of Oracle’s Virtual Machine product came during that period). So what is this discrepancy telling us?

If anything, this is a sign of the immaturity of the emerging ecosystem around virtualization technologies. That being said, it could well be that what all this really reflects is the superficiality of these “top 10” lists and the fact that they measure PR efforts more than any market/technology fact (note to self: try to become less cynical in 2008) (note to self: actually, don’t).

So let’s not read too much into the discrepancy. Less striking but more interesting is the fact that these lists are focused on management tools rather than hypervisors. It is as if the competitive landscape for hypervisors was already defined. And, as should come as no surprise, it is defined in a way that closely mirrors the operating system landscape, with Xen as Linux (the various Xen-based offerings correspond to the Linux distributions), VMWare as Solaris (good luck) and Microsoft as, well, Microsoft.

In the case of Windows and Hyper-V, it is actually bundled as one product. We’ll see this happen more and more on the Linux/Xen side as well, as illustrated by Oracle’s offering. I wouldn’t be surprised to see this bundling so common that people start to refer to it as “LinuX” with a capital X.

Side note: I tried to see if the word “LinuX” is already being used but neither Google nor Yahoo nor MSN seems to support case-sensitive searching. From the pre-Google days I remember that Altavista supported it (a lower-case search term meant “any capitalization”, any upper-case letter in the search term meant “this exact capitalization”) but they seem to have dropped it too. Is this too computationally demanding at this scale? Is there no way to do a case-sensitive search on the Web?

With regards to management tools for virtualized environments, I feel pretty safe in predicting that the focus will move from niche products (like those on these lists) that deal specifically with managing virtualization technology to the effort of managing virtual entities in the context of the overall IT management effort. Just as happened with security management and SOA management. And of course that will involve the acquisition of some of the niche players, for which they are already positioning themselves. The only way I could be proven wrong on such a prediction is by forecasting a date, so I’ll leave it safely open ended…

As another side note, since I mention Network World maybe I should disclose that I wrote a couple of articles for them (on topics like model-based management) in the past. But when filtering for bias on this blog it’s probably a lot more relevant to keep in mind that I am currently employed by Oracle than to know what journal/magazine I’ve been published in.

Comments Off on Top 10 lists and virtualization management

Filed under Everything, IT Systems Mgmt, Linux, Microsoft, Oracle, OVM, Tech, Virtualization, VMware, XenSource

How not to re-use XML technologies

I like XML. Call me crazy but I find it relatively easy to work with. Whether it is hand-editing an XML document in a text editor, manipulating it programmatically (as long as you pick a reasonable API, e.g. XOM in Java), transforming it (e.g. XSLT) or querying an XML back-end through XPath/XQuery. Sure it carries useless features that betray its roots in the publishing world (processing instructions anyone?), sure the whole attribute/element overlap doesn’t have much value for systems modeling, but overall it hits a good compromise between human readability and machine processing and it has a pretty solid extensibility story with namespaces.

In addition, the XML toolbox of specifications is very large and offers standard-based answers to many XML-related tasks. That’s good, but when composing a solution it also means that one needs to keep two things in mind:

  • not all these XML specifications are technically sound (even if they carry a W3C stamp of approval), and
  • just because XML’s inherent flexibility lets one stretch a round hole, it doesn’t mean it’s a good idea to jam a square peg into it.

The domain of IT management provides examples for both of these risks. These examples constitute some of the technical deficiencies of management-related XML specifications that I mentioned in the previous post. More specifically, let’s look at three instances of XML mis-use that relate to management-related specifications. We will see:

  • a terrible XML specification that infects any solution it touches (WS-Addressing, used in WS-Management),
  • a mediocre XML specification that has plenty of warts but can be useful for a class of problems, except in this case it isn’t (XSD, used in SML), and
  • a very good XML specification except it is used in the wrong place (XPath, used in CMDBf).

Let’s go through them one by one.

WS-Addressing in WS-Management

The main defect of WS-Management (and of WSDM before it) is probably its use of WS-Addressing. SOAP needs WS-Addressing like a migraine patient needs a bullet in the head (actually, four bullets in the head since we got to deal with four successive versions). SOAP didn’t need a new addressing model, it already had URIs. It just needed a message correlation mechanism. But what we got is many useless headers (like wsa:Action) and the awful EPR construct which solves a problem that didn’t exist and creates many very real new ones. One can imagine nifty hacks that would be enabled by a templating mechanism for SOAP (I indulged myself and sketched one to facilitate mash-up style integrations with SOAP) but if that’s what we’re after then there is no reason to limit it to headers.

XSD in SML

The words “Microsoft” and “bully” often appear in the same sentence, but invariably “Microsoft” is the subject not the object of the bullying. Well, to some extent we have a reverse example here, as unlikely as it may seem. Microsoft created an XML-based meta-model called SDM that included capabilities that looked like parts of XSD. When they opened it up to the industry and floated the idea of standardizing it, they heard back pretty loudly that it would have to re-use XSD rather than “re-invent” it. So they did and that ended up as SML. Except it was the wrong choice and in retrospect I think it would have been better to improve on the original SDM to create a management-specific meta-model than swallow XSD (SML does profile out a few of the more obscure features of XSD, like xs:redefine, but that’s marginal). Syntactic validation of documents is very different from validation of IT models. Of course this may all be irrelevant anyway if SML doesn’t get adopted, which at this point still looks like the most likely outcome (due to things like the failure of CML to produce any model element so far, the ever-changing technical strategy for DSI and of course the XSD-induced complexity of SML).

XPath in CMDBf

I have already covered this in my review of CMDBf 1.0. The main problem is that while XML is a fine interchange format for the CMDBf specification, one should not assume that it is the native format of the data stores that get connected. Using XPath as a selector language makes life difficult for those who don’t use XML as their backend format. Especially when it is not just XPath 1.0 but also the much more complex XPath 2.0. To make matters worse, there is no interoperable serialization format for XPath 1.0 nodesets, which will prevent any kind of interoperability on this. That omission can be easily fixed (and I am sure it will be fixed in DMTF) but that won’t address the primary concern. In the context of CMDBf, XPath/XQuery is an excellent implementation choice for some situations, but not something that should be pushed at the level of the protocol. For example, because XPath is based on the XML model, it has clear notions of order of elements. But what if I have an OO or an RDF-based backend? What am I to make of a selector that says that the “foo” element has to come after the “bar” element? There is no notion of order in Java attributes and/or RDF properties.
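
To make the order problem concrete, consider record constraints along these lines (element names invented for the example):

  /record/part[2]                         selects the second part, in document order
  /record/foo[preceding-sibling::bar]     selects a foo that appears after a bar

Both are perfectly meaningful against an XML store, but there is no document order to appeal to among the properties of an RDF resource or the fields of a Java object, so a non-XML repository has no faithful way to evaluate them.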

Revisionism?

My name (in the context of my previous job at HP) appears in all three management specifications listed above (in increasing level of involvement as contributor for WS-Management, co-author for SML and co-editor for CMDBf) so I am not a neutral observer on these questions. My goal here is not to de-associate myself from these specifications or pick and choose the sections I want to be associated with (we can have this discussion over drinks if anyone is interested). Some of these concerns I had at the time the specifications were being written and I was overruled by the majority. Others weren’t as clear to me then as they are now (my view of WS-Addressing has moved over time from “mostly harmless” to “toxic”). I am sure all other authors have a list of things they wished had come out differently. And while this article lists deficiencies of these specifications, I am not throwing the baby out with the bathwater. I wrote recently about WS-Management’s potential for providing consistency for resource manageability. I have good hopes for CMDBf, now in the DMTF, not necessarily as a federation technology but as a useful basis for increased interoperability between configuration repositories. SML has the most dubious fate at this time because, unlike the other two, it hasn’t (yet?) transcended its original supporter to become something that many companies clearly see fitting in their plans.

[UPDATED 2008/3/27: For an extreme example of purposely abusing XML technologies (namely XPath in that case) in a scenario in which it is not the right tool for the job (graph queries), check out this XPath brain teasers article.]

4 Comments

Filed under CMDB Federation, CMDBf, Everything, IT Systems Mgmt, Microsoft, SML, SOAP, SOAP header, Specs, Standards, Tech, WS-Management, XOM

Oracle has joined the VM party

On the occasion of the introduction of the Oracle Virtual Machine (OVM) at Oracle World a couple of weeks ago, here are a few thoughts about virtual machines in general. As usual when talking about virtualization (see the OVF review), I come to this mainly from a systems management perspective.

Many of the commonly listed benefits of VMWare-style (I guess I can also now say OVM-style) virtualization make perfect sense. It obviously makes it easier to test on different platforms/configurations and it is a convenient (modulo disk space availability) way to distribute ready-to-use prototypes and demos. And those were, not surprisingly, the places where the technology was first used when it appeared on X86 platforms many years ago (I’ll note that the Oracle VM won’t be very useful for the second application because it only runs on bare metal while in the demo scenario you usually want to be able to run it on the host OS that normally runs your laptop). And then there is the server consolidation argument (and associated hardware/power/cooling/space savings) which is where virtualization enters the data center, where it becomes relevant to Oracle, and where its relationship with IT management becomes clear. But the value goes beyond the direct benefits of server consolidation. It also lies in the additional flexibility in the management of the infrastructure and the potential for increased automation of management tasks.

Sentences that contain both the words “challenge” and “opportunity” are usually so corny they make me cringe, but I’ll have to give in this one time: virtualization is both a challenge and an opportunity for IT management. Most of today’s users of virtualization in data centers probably feel that the technology has made IT management harder for them. It introduces many new considerations, at the same time technical (e.g. the performance of virtual machines on the same host is interdependent), compliance-related (e.g. virtualization can create de-facto super-users) and financial (e.g. application licensing). And many management tools have not yet incorporated these new requirements, or at least not in a way that is fully integrated with the rest of the management infrastructure. But in the longer run the increased uniformity and flexibility provided by a virtualized infrastructure raise the ability to automate and optimize management tasks. We will go from a situation where virtualization is justified by statements such as “the savings from consolidation justify the increased management complexity” to a situation where the justification is “we’re doing this for the increased flexibility (through more automated management that virtualization enables), and server consolidation is icing on the cake”.

As a side note, having so many pieces of the stack (one more now with OVM) at Oracle is very interesting from a technical/architectural point of view. Not that Oracle would want to restrict itself to managing scenarios that utilize its VM, its OS, its App Server, its DB, etc. But having the whole stack in-house provides plenty of opportunity for integration and innovation in the management space. These capabilities also need to be delivered in heterogeneous environments but are a lot easier to develop and mature when you can openly collaborate with engineers in all these domains. Having done this through standards and partnerships in the past, I am pleased to be in a position to have these discussions inside the same company for a change.

1 Comment

Filed under Everything, IT Systems Mgmt, Oracle, Oracle Open World, OVM, Tech, Virtualization, VMware

Illustrative algorithm for CMDBf 1.0 Query operation

When I posted an algorithm for the server side implementation of a CMDBf Query call for version 0.95 of the specification, the interoperability testing session based on that version was over and I was pretty sure no-one but those of us who participated in that session would write an implementation of 0.95. But I published the algorithm anyway since I thought it was helpful to anyone who wanted to understand the specification in depth, even if they were not implementing it. Now that 1.0 is out, there is a much higher probability of people implementing the specification, so I figured it would be worth updating the algorithm to take into account the changes from 0.95 to 1.0. So here it is.

One caveat. This algorithm assumes that the query request does not make use of the xpathExpression element because, as I have explained in my review of CMDBf 1.0, I don’t think interoperability is achievable on this feature in the current state of the specification.

As a note of caution, the previous version of the algorithm was backed by my implementation of CMDBf 0.95 for the interoperability testing, so I felt pretty confident about it. For this version of the algorithm I have not written a corresponding implementation and I have not done interoperability testing with anyone, it’s just based on my reading of the specification. The handling of depthLimit in particular is a little tricky and needs to be validated by implementation (what with creating a bunch of dummy item and relationship templates with temporary names and later going back to the original template names), please let me know if you find it flawed.

And, as previously, this is in no way an optimal implementation strategy. It is the most direct and obvious set of steps that I can come up with to implement the Query call in a way that exactly conforms to the specification. There are lots of ways to make this go faster, such as the ones I mentioned in a previous post (e.g. breaking out of loops once an instance has been removed, or not recalculating L1 and L2 over and over again for relationships in the same working set that share a source/target) plus new ones such as being smarter than my brute-force approach to handling depthLimit (in step 2).

All this said, here is the algorithm:

1) for each itemTemplate, calculate the set of all items (including relationships since they are a subclass of item) that obey the instanceIdConstraint and recordConstraint elements in the template (if present). Call this the working set for the itemTemplate.
2) for each relationshipTemplate RT that has a depthLimit element:

2.1) for i ranging from 1 to the value of maxIntermediateItems for RT:

2.1.1) create an itemTemplate that is an exact copy of the itemTemplate referenced by RT’s sourceTemplate, except that it has a new, unique temporary id (keep a record linking that new id to the id of the original source itemTemplate).
2.1.2) create an itemTemplate that is an exact copy of the itemTemplate referenced by RT’s targetTemplate, except that it has a new, unique, temporary id (keep a record linking that new id to the id of the original target itemTemplate).
2.1.3) for j ranging from 1 to i:

2.1.3.1) create an itemTemplate that is an exact copy of the itemTemplate referenced by RT’s intermediateItemTemplate, except that it has a new, unique, temporary id (keep a record linking that new id to the id of the original intermediary itemTemplate).
2.1.3.2) create a relationshipTemplate that is an exact copy of RT, except that its source is the itemTemplate created in the previous iteration of the current loop (or the itemTemplate created in step 2.1.1 if j=1), its target is the itemTemplate created in the previous step and it has a new, unique, temporary id (keep a record linking that new id to RT’s id).

2.1.4) create a relationshipTemplate that is an exact copy of RT, except that its source is the last itemTemplate created in the 2.1.3 loop, its target is the itemTemplate created in 2.1.2 and it has a new, unique, temporary id (keep a record linking that new id to RT’s id).

3) for each relationshipTemplate calculate the set of all relationships that obey the instanceIdConstraint and recordConstraint elements in the template (if present). Call this the working set for the relationshipTemplate.
4) set need_to_loop = true
5) while (need_to_loop == true)

5.1) set need_to_loop = false
5.2) for each relationshipTemplate RT

5.2.1) let ITsource be the itemTemplate that is referenced as sourceTemplate by RT. Calculate the set of all items (including relationships since they are a subclass of item) that obey at least one of the instanceIdConstraint elements in ITsource (assuming there is at least one such element) and all the recordConstraint elements in ITsource. Call this the working set for ITsource.
5.2.2) let ITtarget be the itemTemplate that is referenced as targetTemplate by RT. Calculate the set of all items (including relationships since they are a subclass of item) that obey at least one of the instanceIdConstraint elements in ITtarget (assuming there is at least one such element) and all the recordConstraint elements in ITtarget. Call this the working set for ITtarget.
5.2.3) for each relationship R in the working set for RT

5.2.3.1) if the source of R is not in the working set for ITsource, then remove R from the RT working set
5.2.3.2) if the target of R is not in the working set for ITtarget, then remove R from the RT working set
5.2.3.3) if RT has a source/@minimum or a source/@maximum attribute

5.2.3.3.1) find the list L1 of all relationships in the working set for RT that have the same source as R
5.2.3.3.2) if RT has source/@minimum and the cardinality of L1 is less than this minimum then remove all relationships in L1 from the RT working set
5.2.3.3.3) if RT has source/@maximum and the cardinality of L1 is more than this maximum then remove all relationships in L1 from the RT working set

5.2.3.4) if RT has a target/@minimum or a target/@maximum attribute

5.2.3.4.1) find the list L2 of all relationships in the working set for RT that have the same target as R
5.2.3.4.2) if RT has target/@minimum and the cardinality of L2 is less than this minimum then remove all relationships in L2 from the RT working set
5.2.3.4.3) if RT has target/@maximum and the cardinality of L2 is more than this maximum then remove all relationships in L2 from the RT working set

5.3) for each itemTemplate IT:

5.3.1) let sourceRTset be the set of all relationshipTemplates that reference IT as their sourceTemplate
5.3.2) let targetRTset be the set of all relationshipTemplates that reference IT as their targetTemplate
5.3.3) for each item I in the IT working set

5.3.3.1) for each relationshipTemplate sourceRT in sourceRTset, if there is no relationship in the working set for sourceRT that uses I as its source, remove I from the IT working set and set need_to_loop to true
5.3.3.2) for each relationshipTemplate targetRT in targetRTset, if there is no relationship in the working set for targetRT that uses I as its target, remove I from the IT working set and set need_to_loop to true

6) process any contentSelector elements and/or @suppressFromResult attributes on the templates that have matching items/relationships in the response to remove or pare down items and relationships as requested
7) package the resulting items and relationships in a way that conforms to the CMDBf response message format (including putting each item in the <nodes> element with the appropriate @templateId attribute and putting each relationship in the <edges> element with the appropriate @templateId).
8) replace all the temporary template ids (from step 2) that appear in templateId attributes in the response with the original itemTemplate and relationshipTemplate ids, based on the records that were kept in step 2.

Just to clarify things, what I do in step 2 is simply make explicit all the itemTemplates and relationshipTemplates that are made implicit by the depthLimit element, so that the rest of the algorithm can be kept simpler by assuming that all relationshipTemplates correspond to direct relationships (no intermediary). And in step 8 I hide the fact that this took place.
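
To give a more concrete feel for the mutual pruning in steps 4 and 5, here is a minimal Java sketch (the class and field names are made up for the example; constraint evaluation, cardinality checks, depthLimit expansion and response packaging from the other steps are all left out):

  import java.util.*;

  class Rel { String source, target; Rel(String s, String t) { source = s; target = t; } }

  class RelTemplate {
      String sourceTemplateId, targetTemplateId;
      Set<Rel> workingSet = new HashSet<>();
      RelTemplate(String s, String t) { sourceTemplateId = s; targetTemplateId = t; }
  }

  public class CmdbfPruning {

      // itemSets maps an itemTemplate id to its working set of item ids (built in step 1)
      static void prune(Map<String, Set<String>> itemSets, List<RelTemplate> relTemplates) {
          boolean needToLoop = true;                                   // step 4
          while (needToLoop) {                                         // step 5
              needToLoop = false;
              // steps 5.2.3.1 / 5.2.3.2: drop relationships whose source or target item fell out
              for (RelTemplate rt : relTemplates) {
                  Set<String> sources = itemSets.get(rt.sourceTemplateId);
                  Set<String> targets = itemSets.get(rt.targetTemplateId);
                  rt.workingSet.removeIf(r -> !sources.contains(r.source) || !targets.contains(r.target));
              }
              // step 5.3.3: drop items no longer used by every relationship template that references them
              for (Map.Entry<String, Set<String>> entry : itemSets.entrySet()) {
                  String itemTemplateId = entry.getKey();
                  Set<String> items = entry.getValue();
                  for (RelTemplate rt : relTemplates) {
                      if (rt.sourceTemplateId.equals(itemTemplateId)
                              && items.removeIf(i -> rt.workingSet.stream().noneMatch(r -> r.source.equals(i)))) {
                          needToLoop = true;                           // step 5.3.3.1
                      }
                      if (rt.targetTemplateId.equals(itemTemplateId)
                              && items.removeIf(i -> rt.workingSet.stream().noneMatch(r -> r.target.equals(i)))) {
                          needToLoop = true;                           // step 5.3.3.2
                      }
                  }
              }
          }
      }
  }

The loop terminates because each pass either removes at least one item or leaves need_to_loop at false, which is the same fixed-point argument that makes step 5 of the algorithm finite.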

[UPDATED 2009/5/1: For some reason this entry is attracting a lot of comment spam, so I am disabling comments. Contact me if you’d like to comment.]

8 Comments

Filed under CMDB, CMDB Federation, CMDBf, Graph query, IT Systems Mgmt, Pseudo-algorithm, Query, Specs, Standards, Tech

The Oslo accords (presumably between composite application modeling and systems management)?

Microsoft introduced an umbrella project called Oslo at their SOA and Business Process conference this week. There is very little information available but it seems to have two main components: improving the ability of the Microsoft platform to support SOA-style distributed applications and improving the use of models to develop and manage applications. At first sight there isn’t anything new. The SOA talk is similar to any number of “why SOA” presentations available from dozens of companies. And the modeling aspect is the same story that Microsoft has been pitching with DSI for years. The real news is that the two stories are being linked (at least at the marketing level, which is a starting point) and that the application development people have taken over the application modeling baton from the System Center group.

Over the last few years, I worked with people from System Center on different standards related to DSI, including SML, which they see as the heart of the modeling effort. One of the things that kept me skeptical when hearing the DSI pitch was seeing the System Center team make announcements and promises about how SML would be central to the development experience in Visual Studio. I am pretty sure I know who’s the gorilla and who’s the chimp at Microsoft between Visual Studio / .Net Framework on the one hand and System Center on the other. The application model is too central to the developer experience for the Visual Studio group not to own it. It looks like it’s now happening and it’s a good thing.

The only content I could find on Oslo that’s not PR fluff is a report from Directions on Microsoft, which mostly talks about incremental improvements to BizTalk. Towards the end, there is a small section about a “repository” that will “provide centralized storage of composite application components”. At that point I can’t help remembering the blog post from David Chappell about why it wouldn’t make sense for Microsoft to support SCA. Through comments in his post as well as a blog post of my own, I followed up with the assertion that the application component model also plays a very important role for management. And at the risk of sounding self-congratulatory, the Oslo announcement seems to vindicate that view. I see that David was a speaker at the Microsoft conference where Oslo was announced and he has very good insights into both the application development and the systems management efforts at Microsoft. So hopefully he’ll soon have a white paper or a blog entry out to share some insights.

If you’re wondering what this means for the technical work that has been going on under the DSI umbrella so far, you can only read the tea leaves. It could be that the application development people adopted the whole SML/CML technology stack as promoted by their System Center colleagues and are going to use it as is. Or, at the other extreme, it could be a complete reset that leads to the creation of a component model that is much less general and much more application-centric. Of course, no matter which one happens (or something in the middle), it will be presented as a perfectly smooth and controlled evolution of the DSI vision (get ready for some nice spin at MMS2008). If you are adopting SML because you expect Microsoft to base its application component model on it, you might want to wait a bit until more details emerge about Oslo. For example, after calling XSD “a schema language that attempts to be a floor wax, dessert topping, and personal lubricant all at the same time”, you have to wonder whether Don Box would advocate using SML (80% of which is XSD) as the most effective metamodel for an application component model…

Let’s end with this quote from the Directions on Microsoft report on Oslo, regarding application integration: “SAP and Oracle are better positioned in that regard, and so their customers will want to investigate these vendors’ composite application platforms along side Microsoft’s”. Can’t disagree with that. A good place to start this investigation would be the upcoming Oracle Open World.

3 Comments

Filed under IT Systems Mgmt, Microsoft, Oslo, SCA, SML, Tech

Review of the CMDBf specification version 1.0

Having read the recently released CMDBf 1.0 specification over the weekend, I see several improvements since 0.95, including:

  • the introduction of depthLimit
  • the lastModified metadata element
  • the ability to specify more than one instanceId in a template
  • the ability to advertise what parts of the specification you implement
  • the definition of faults

But while 1.0 is more complete than 0.95, I think it makes it harder to achieve interoperability. Here are the main friction points for interop:

New role for XPath

The xpathExpression element (which replaces xpath1Selector) changes in two very important ways. First, rather than being limited to XPath 1.0, it now also allows XPath 2.0. Support for this is a lot harder to achieve for people who don’t use XML as the backend format for their data. Considering the current state of adoption of XPath 2.0 and the low level of XML complexity exposed by most CMDB models, I don’t think it was opportune to bring this into CMDBf yet. And my guess is that most implementations will stay away from this. But there is a second change, less obvious but even more problematic. XPath is not just another constraint mechanism for a CMDBf template anymore, one that returns a boolean result indicating whether the instance meets the constraint or not, as it used to be in 0.95. It is now an alternative selection and filtering mechanism that lives in parallel to all the other elements in a template (and can’t mix with them). Overall, I think this change goes too far in the direction of turning a shared agreement to exchange data in XML into an assumption that the internal data models are all based on XML. And the killer with regards to interoperability is that the specification says nothing about how the resulting nodesets are serialized in the response. There may be a serialization for the XPath 2.0 data model, but there is no such thing for XPath 1.0 and I don’t see in the current state of the specification how two implementations have any chance to interoperate when using this feature.
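To make the problem concrete, here is a minimal Java sketch (using XOM, but any XPath 1.0 API leaves you in the same place) of what an implementation that does keep its records as XML ends up holding after evaluating an xpathExpression. The record content and the wrapper element are made up for the example; the point is that the specification provides no equivalent of that wrapper, nor of the attribute-handling choice, so every implementation will invent its own.

    import nu.xom.*;

    public class XPathSerializationGap {
        public static void main(String[] args) throws Exception {
            // A made-up record, just for illustration.
            String record =
                "<record xmlns='urn:example:mdr'>" +
                "  <os name='Linux'><version>2.6.22</version></os>" +
                "  <ipAddress>192.168.0.1</ipAddress>" +
                "</record>";
            Document doc = new Builder().build(record, null);

            // Evaluate an XPath 1.0 expression; the result is a nodeset, not a document.
            XPathContext ctx = new XPathContext("ex", "urn:example:mdr");
            Nodes nodes = doc.query("//ex:os/@name | //ex:ipAddress", ctx);

            // CMDBf says nothing about how to put these nodes in the response.
            // Each implementation has to invent its own wrapper and its own way to
            // represent attribute nodes, so no two implementations will agree.
            Element wrapper = new Element("xpathResult");  // made-up element, not in the spec
            for (int i = 0; i < nodes.size(); i++) {
                Node n = nodes.get(i);
                if (n instanceof Attribute) {
                    // One arbitrary choice among many: turn the attribute into an element.
                    Element e = new Element(((Attribute) n).getLocalName());
                    e.appendChild(n.getValue());
                    wrapper.appendChild(e);
                } else {
                    wrapper.appendChild(n.copy());
                }
            }
            System.out.println(wrapper.toXML());
        }
    }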

Introduction of depthLimit

As I mentioned earlier, depthLimit is a very useful addition (even though it pales in comparison to the inferencing capabilities that could have been derived from basing CMDBf on RDF). But it is also a complicated feature. The intermediateItemTemplate attribute is a good re-use of the existing plumbing, but it needs at least a detailed example. I trust that the group will generate one once they’ve caught their breath from putting out the specification.

Service capability metadata

There is a new section (#6) to provide ways to describe what CMDBf features an implementation supports. But it is a very granular representation. Basically, for every feature you can describe whether you support it or not. So someone may describe that they support everything inside propertyValue, except for the “like” operator. And someone else might support all the operators but not the caseSensitive modifier. That might be ok for human consumption, but automated scenarios rely on pre-programmed queries and that is made very hard by all the possible combinations of options. What we need is a few well-defined profiles that people implement fully. Starting of course with a profile that rules out xpathExpression.

Record metadata

This new version introduces metadata on records. While recordId and lastModified are probably well understood and interoperably usable, I am a bit more dubious about whether baselineId and snapshotId are going to be interoperable across vendors based on their limited description in the specification. The nice thing is that this metadata can not only be returned but also searched on. Well, at least that’s the intent. But this goes through the recordMetadata attribute on propertyValue which, while present in the pseudo-schema, is missing in the XSD…

The contentSelector element

This new element is more flexible than the propertySubsetDirective element that it replaces. In addition to specifying what properties you want returned it also allows you to specify that you only want certain record types and/or that you only want the record(s) that were used to satisfy constraints in the template. Those are nice additions, but the way the second part is implemented (through the use of the matchedRecords attribute) seems to assume that only one record in the instance was used to match all the constraints in the template. This is not necessarily the case: an instance can be selected by having different records match the different constraints in the template as long as it has at least one matching record per constraint (line 765 says “the item satisfies all the constraints”, not “a record of the item satisfies all the constraints” and you can also see this in the example in section 4.2 where the records mentioned on lines 637 and 639 don’t have to be the same). So do you return all records that have a role in matching the template, or only those (if any) that match all the constraints on their own, as the text seems to imply? And if several record combinations inside an instance can be used to match the constraints in a template, do I return all of them or can I just pick any subset that matches? Also, how can I say that I want all records that established the template match, independently of their type? There doesn’t seem to be a way to do this, or is it by putting a contentSelector element with no child element and the matchedRecords attribute set to false? There won’t be much interoperability on this feature until all this is clarified.

Relationships as items

A major change between 0.95 and 1.0 is that now a relationship can match an itemTemplate. For example, if you ask for all items that were modified during the last 24 hours you will get all the items and all the relationships that meet that criterion (while in the previous version you’d have to explicitly request the relationships with a relationshipTemplate if you wanted to get them too). There is a good case to be made for either view and the one that works best largely depends on your backend implementation technology (RDF, objects, SQL, CIM…). But the important thing is for the spec to be clear and on this point I think the change wasn’t made explicit enough in the query section of the specification. If Van hadn’t called my attention to this on his blog, I would have missed this important change.

Security boilerplate

There is a person at IBM (probably located in a well-stocked underground bunker in upstate NY) who has instilled the fear of god in all IBM employees (at least all those who author publicly available specifications) and forces them to include a boilerplate “security considerations” section everywhere. I have co-authored several documents with IBM employees and it never fails, even though it doesn’t add anything useful to the specification. You should see the look of fear on the face of the IBM employees when someone else suggests doing without it. We somehow managed to sneak one such slimmer specification past the IBMers with CMDBf 0.95 but I see that this has been “corrected” in 1.0. I hope that whatever painful punishment Scott, Jacob, Andrew and Mark (or their families and pets) were subjected to in the process by the IBM security ogre wasn’t too cruel. Sure, this doesn’t really impact interoperability, but now that I don’t work for a company that makes money from ink anymore, I have even less patience for this bloating.

OK, that’s enough back seat driving for now. Hopefully the standards group that will take over the specification will address all these questions. In the context of the entire specification, these are pretty small issues and mostly easy to fix. And the CMDBf group can go on to address the hard issues of federation (including security-related issues that abound in this field if one really wants to tackle them). The current specification is a useful graph-oriented query language that is a good match for CMDB data. But it’s really just a query language (plus a simple registration system).

[UPDATE: while updating the CMDBf query algorithm, I noticed another small error: maxIntermediateItems is an attribute in the pseudo-schema but an element in the schema. Something else to fix in the next version.]

3 Comments

Filed under CMDB, CMDB Federation, CMDBf, Everything, Graph query, IT Systems Mgmt, ITIL, Query, Specs, Standards, Tech

CMDBf 1.0 specification released

The CMDBf committee has just released version 1.0 of the specification. Van Wiles has an overview of the changes between 0.95 and 1.0. I left HP soon after 0.95 was released and that’s when my participation in CMDBf ended, so Van’s summary is very useful to me. The changes he lists are not surprising and some of them already existed in draft form before 0.95 publication. I need to spend some intimate time with the specification to review the changes to the template mechanism in more detail. Some of the changes have the potential to make the specification quite a bit harder to implement. This is especially the case for the introduction of “depthLimit” (but it’s probably a needed feature anyway). And the fact that relationships can now match item selectors will make things either easier or harder to implement, depending on your implementation choice (e.g. straight-to-SQL/XML or through an OO or RDF model). Congrats to the group. We should soon hear about submission for standardization.

Comments Off on CMDBf 1.0 specification released

Filed under CMDB Federation, CMDBf, Everything, IT Systems Mgmt, Query, Specs, Standards, Tech

Grid-enabled SOA article

Dave Chappell and David Berry have recently published an article in SOA Magazine titled “The Next-Generation, Grid-Enabled Service-Oriented Architecture”. I had unexpectedly gotten a quick overview of this work a few weeks ago, when I ran into Dave Chappell at the Oracle gym (since I was coming out of an early morning swim it took Dave a couple of seconds to recognize me, as I walked through the weight room leaving a trail of water behind me). Even if you are more interested in systems management than middleware, this article is worth your reading time because it describes a class of problems (or rather, opportunities) that cut across middleware and IT management. Namely, providing the best environment for scalable, reliable and flexible SOA applications. In other words, making the theoretical promises of SOA practically achievable in real scenarios. The article mentions “self-healing management and SLA enforcement” and it implies lots of capabilities to automatically provision, configure and manage the underlying elements of the Grid as well as the SOA applications that make use of it. Those are the capabilities that I am now working on as part of the Oracle Enterprise Manager team. And the beauty of doing this at Oracle is that we can work on this hand in hand with people like Dave to make sure that we don’t create artificial barriers between middleware and systems management.

1 Comment

Filed under Articles, Everything, Grid, Grid-enabled SOA, IT Systems Mgmt, Tech

A review of OVF from a systems management perspective

I finally took a look at OVF, the virtual machine distribution specification that was recently submitted to DMTF. The document is authored by VMware and XenSource, but they are joined in the submission to DMTF by some other biggies, namely Microsoft, HP, IBM and Dell.

Overall, the specification does a good job of going after the low-hanging fruits of VM distribution/portability. And the white paper is very good. I wish I could say that all the specifications I have been associated with came accompanied by such a clear description of what they are about.

I am not a virtualization, operating system or hardware expert. I am mostly looking at this specification from the systems management perspective. More specifically I see virtualization and standardization as two of the many threads that create a great opportunity for increased automation of IT management and more focus on the application rather than the infrastructure (which is part of why I am now at Oracle). Since OVF falls in both the “virtualization” and “standardization” buckets, it got my attention. And the stated goal of the specification (“facilitate the automated, secure management not only of virtual machines but the appliance as a functional unit”, see section 3.1) seems to fit very well with this perspective.

On the other hand, the authors explicitly state that in the first version of the specification they are addressing the package/distribution stage and the deployment stage, not the earlier stage (development) or the later ones (management and retirement). This sidesteps many of the harder issues, which is part of why I write that the specification goes after the low-hanging fruits (nothing wrong with starting that way BTW).

The other reason for the “low hanging fruit” statement is that OVF is just a wrapper around proprietary virtual disk formats. It is not a common virtual disk format. I’ve read in several news reports that this specification provides portability across VM platforms. It’s sad but almost expected that the IT press would get this important nuance wrong; it’s more disappointing when analysts (who should know better) do, as for example the Burton Group, which writes in its analysis “so when OVF is supported on Xen and VMware virtualization platforms for example, a VM packaged on a VMware hypervisor can run on a Xen hypervisor, and vice-versa”. That’s only if someone at some point in the chain translates from the Xen virtual disk format to the VMware one. OVF will provide deployment metadata and will allow you to package both virtual disks in a TAR if you so desire, but it will not do the translation for you. And the OVF authors are pretty up front about this (for example, the white paper states that “the act of packaging a virtual machine into an OVF package does not guarantee universal portability or install-ability across all hypervisors”). On a side note, this reminds me a bit of how the Sun/Microsoft Web SSO MEX and Web SSO Interop Profile specifications were supposed to bridge Passport with WS-Federation, which was a huge overstatement. Except that in that case, the vendors were encouraging the misconception (which the IT press happily picked up) while in the OVF case it seems like the vendors are upfront about the limitations.

There is nothing rocket-science about OVF and even as a non-virtualization expert it makes sense to me. I was very intrigued by the promise that the specification “directly supports the configuration of multi-tier applications and the composition of virtual machines to deliver composed services” but this turns out to be a bit of an overstatement. Basically, you can distribute the VMs across networks by specifying a network name for each VM. I can easily understand the simple case, where all the VMs are on the same network and talking to one another. But there is no way (that I can see) to specify the network topology that joins different networks together, e.g. saying that there is a firewall between networks “blue” and “red” that only allows traffic on port 80. So why would I create an OVF file that composes several virtual machines if they are going to be deployed on networks that have no relationship to one another? I guess the one use case I can think of would be if one of the virtual machines was assigned to two networks and acted as a gateway/firewall between them. But that’s not a very common and scalable way to run your networks. There is a reason why Cisco sells $30 billion of networking gear every year. So what’s the point of this lightweight distributed deployment? Is it just for that use case where the network gear is also virtualized, in the expectation of future progress in that domain? Is this just a common anchor point to be later extended with more advanced network topology descriptions? This looks to me like an attempt to pick a low-hanging fruit that wasn’t there.

Departing from usual practice, this submission doesn’t seem to come with any license grant, which must have greatly facilitated its release and the recruitment of supporters for the submission. But it should be a red flag for adopters. It’s worth keeping track of its IP status as the work progresses. Unless things have changed recently, DMTF’s IP policy is pretty weak so the fact that work happens there doesn’t guarantee much protection per se to the adopters. Interestingly, there are two sections (6.2 about the virtual disk format and 11.3 about the communication between the guest software and the deployment platform) where the choice of words suggests the intervention of patent lawyers: phrases like “unencumbered specification” (presumably unencumbered with licensing requirements) and “someone skilled in the art”. Which is not surprising since this is the part where the VMWare-specific, Xen-specific or Microsoft-specific specifications would plug in.

Speaking of lawyers, the section that allows the EULA to be shipped with the virtual appliance is very simplistic. It’s just a human-readable piece of text in the OVF file. The specification somewhat naively mentions that “if unattended installs are allowed, all embedded license sections are implicitly accepted”. Great, thanks, enterprises love to implicitly accept licensing terms. I would hope that the next version will provide, at least, a way to have a URI to identify the EULA so that I can maintain a list of pre-approved EULAs for which unattended deployment is possible. Automation of IT management is supposed to make things faster and cheaper. Having a busy and expensive lawyer read a EULA as part of my deployment process goes against both objectives.

It’s nice of the authors to do the work of formatting the specification using the DMTF-approved DSPxxxx format before submitting to the organization. But using a target namespace in the dmtf.org domain when the specification is just a submission seems pretty tacky to me, unless they got a green light from the DMTF ahead of time. Also, it looks a little crass on the part of VMware to wrap the specification inside their corporate white paper template (cover page and back page) if this is a joint publication. See the links at http://www.vmware.com/appliances/learn/ovf.html. Even though for all I know VMware might have done most of the actual work. That’s why the links that I used to the white paper and the specification are those at XenSource, which offers the plain version. But then again, this specification is pretty much a wrapper around a virtual disk file, so graphically wrapping it may have seemed appropriate…

OK, now for some XML nitpicking.

I am not a fan of leaving elementFormDefault set to “unqualified” but it’s their right to do so. But then they qualify all the attributes in the specification examples. That looks a little awkward to me (I tend to do the opposite and qualify the elements but not the attributes) and, more importantly, it violates the schema in the appendix since the schema leaves attributeFormDefault to its default value (unqualified). I would rather run a validation before making this accusation, but where are the stand-alone XSD files? The white paper states that “it is the intention of the authors to ensure that the first version of the specification is implemented in their products, and so the vendors of virtual appliances and other ISV enablement, can develop to this version of the specification” but do you really expect us to copy/paste from PDF and then manually remove the line numbers and header/footer content that comes along? Sorry, I have better things to do (like whine about it on this blog) so I haven’t run the validation to verify that the examples are indeed in violation. But that’s at least how they look to me.
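For what it’s worth, the check itself is trivial to run once the schema and an example are available as stand-alone files. Here is a rough Java sketch, with hypothetical file names since the submission only ships them inside the PDF:

    import javax.xml.XMLConstants;
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.validation.Schema;
    import javax.xml.validation.SchemaFactory;
    import javax.xml.validation.Validator;
    import java.io.File;

    public class OvfExampleCheck {
        public static void main(String[] args) throws Exception {
            // File names are placeholders; the OVF submission doesn't provide extracted files.
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = sf.newSchema(new File("ovf-envelope.xsd"));
            Validator validator = schema.newValidator();
            // If the examples really qualify their attributes while attributeFormDefault
            // is left unqualified, this is where the validation error would show up.
            validator.validate(new StreamSource(new File("ovf-example.xml")));
            System.out.println("Example is valid against the schema");
        }
    }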

I also have a problem with the Section and Content elements that are just shells defined by the value of their xsi:type attribute. The authors claim it’s for extensibility (“the use of xsi:type is a core part of making the OVF extensible, since additional type definitions for sections can be added”) but there are better ways to do extensibility in XML (remember, that’s what the X stands for). It would be better to define an element per type (disk, network…). They could possibly be based on the same generic type in XSD. And this way you get more syntactic flexibility and you get the option to have sub-types of sub-types rather than a flat list. Interestingly, there is a comment inside the XSD that defines the Section type that reads “the base class for a section. Subclassing this is the most common form of extensibility”. That’s the right approach, but somehow it got dropped at some point.

Finally, the specification seems to have been formatted based on WS-Management (which is the first specification that mixed the traditional WS-spec conventions with the DMTF DSPxxxx format), which may explain why WS-Management is listed as a reference at the end even though it is not used anywhere in the specification. That’s fine but it shows in a few places where more editing is needed. For example requirement R1.5-1 states that “conformant services of this specification MUST use this XML namespace Universal Resource Identifier (URI): http://schemas.dmtf.org/ovf”. I know what a conformant service is for WS-Management but I don’t know what it is for this specification. Also, the namespace that this requirement uses is actually not defined or used by this specification, so this requirement is pretty meaningless. The table of namespaces that follows just after is missing some namespaces. For example, the prefix “xsi” is used on line 457 (xsi:any and xsi:AnyAttribute) and I want to say it’s the wrong one as xsi is usually assigned to “http://www.w3.org/2001/XMLSchema-instance” and not “http://www.w3.org/2001/XMLSchema” but since the prefix is not in the table I guess it’s anyone’s guess (and BTW, it’s “anyAttribute”, not “AnyAttribute”).

By this point I may sound like I don’t like the specification. Not at all. I still stand with what I wrote in the second paragraph. It’s a good specification and the subset of problems that it addresses is a useful subset. There are a few things to fix in the current content and several more specifications to write to complement it, but it’s a very good first step and I am glad to see VMware and XenSource collaborating on this. Microsoft is nominally in support at this point, but it remains to be seen to what extent. I haven’t seen them in the past very interested in standards efforts that they are not driving and so far this doesn’t appear to be something they are driving.

6 Comments

Filed under DMTF, Everything, IT Systems Mgmt, OVF, Specs, Standards, Tech, Virtualization, VMware, XenSource

First release of the CMDBF specification

The CMDBF (CMDB Federation) group just released a first public draft of the specification. Here is a direct link to it (PDF, 1MB). That’s a very nice achievement for a group that was a bit slow to find its pace but has been very productive since the beginning of 2007. Having the interop test drive our efforts had a lot to do with it; this is an approach to repeat the next time around.

This spec will look a lot more familiar to people used to reading WS-* specs than to people used to reading ITIL manuals. The concepts and use cases are (hopefully) consistent with ITIL, but the meat of the spec is about defining interoperable SOAP-based message exchanges that realize these use cases. The spec is about defining SOAP payloads, not IT management best practices. Please adjust your expectations accordingly and you’ll like the spec a lot more.

The most useful and important part (in my mind at least), is the definition of the Query service (section 4). This is used in many interactions. It is used by clients to query a CMDB that federates many MDRs (Management Data Repositories). It is used by the clients to go interact directly with the MDRs if they choose to. And it is used by the federating CMDB to retrieve data from the MDRs. Even in the more static scenarios, in which data is replicated from the MDRs into the CMDB (instead of true federation with no replication), the Query service is still the way for clients to access federated data from the CMDB.

So, yet another query language? Indeed. But one that natively supports concepts that are key to the kind of queries most useful for CMDB scenarios, namely relationship traversal and types. This is a topological query language. I hope the simple example in section 4.2 gives a good idea of what this means.

Neither SQL nor XPath/XQuery has this topology-friendly approach (which is not to say that you can’t create useful queries on CMDB data with these query languages). On the other hand, there is one query language that is inherently relationship-oriented, that has native support for the notion of class and that has received a lot more attention, interop testing and implementation experience than our effort. One that I would have loved for us to leverage in CMDBF. It’s SPARQL. But semantic web technologies seem once again to be doomed by the perception that they are too much “out there” despite all the efforts of their proponents to make them connect with XML and other non-RDF views of the world.

Final caveat, this is a first draft. It is known to be incomplete (you’ll find text boxes that describe known gaps) and not all features in the spec were tested at the interop. We need more interop, review and development before it is robust. And most importantly, a lot of the difficult aspects of federation aren’t sufficiently addressed (reconciliation, model differences, source tracing, administrative metadata…) But it is at a point where we think it gives a good idea of how we are approaching the problem.

Equipped with this foreword, I wish you a pleasant read.

3 Comments

Filed under CMDB Federation, CMDBf, Everything, Specs, Tech

Gutting the SOAP processing model

One thing I have constantly witnessed in the SOAP design work I have been involved in is the fear of headers. There are several aspects to this, and it’s very hard to sort out the causes and the consequences as they are now linked in a vicious cycle.

For one thing, headers in SOAP messages don’t map well to RPC and we all know SOAP’s history there. Even now, as people have learned that they’re supposed to say they are not doing RPC (“look, my WSDL says doc/literal therefore I am not doing RPC”), the code is still RPC-ish with the grandchildren of the body being serialized into Java (or another language) objects and passed as arguments to an operation inside a machine-generated stub. No room for pesky headers.

In addition (and maybe because of this view), tools often bend over backward to prevent you from processing headers along with the rest of the message, requiring instead some handlers to be separately written and registered/configured in whatever way (usually poorly documented as people aren’t expected to do this) the SOAP stack requires.

And the circle goes on, with poor tooling support for headers being used to justify staying away from them in SOAP-based interfaces.

I wrote above that “headers in SOAP messages don’t map well to RPC” but the more accurate sentence is “headers that are not handled by the infrastructure don’t map well to RPC”. If the headers are generic enough to be processed by the SOAP stack before getting to the programmer, then the RPC paradigm can still apply. Hallelujah. Which is why there is no shortage of specs that define headers. But those are the specs written by the “big boys” of Web services, the same ones who build SOAP stacks. They trust themselves to create code in their stacks to handle WS-Addressing, WS-Security and WS-ReliableMessaging headers for the developers. Those headers are ok. But the little guys shouldn’t be trusted with headers. They’d poke their eyes out. Or, as Elliotte Rusty Harold recently wrote (on a related topic, namely WS-* versus REST), “developers have to be told what to do and kept from getting their grubby little hands all over the network protocols because they can’t be trusted to make the right choices”. The difference is that Elliotte is blasting any use of WS-* while I am advocating a use of SOAP that gives developers full access to the network protocol. Which goes in the same direction but without closing the door on using WS-* specs when they actually help.

I don’t mean to keep banging on my IBM friends, but I’ve spent a lot of time developing specs with well respected IBM engineers who will say openly that headers should not contain “application information”, by which they mean that headers should be processed by the SOAP stack and not require the programmer to deal with them.

That sounds very kind, and certainly making life easy for programmers is good, but what it does is rob them of the single most useful feature in SOAP. Let’s take a step back and look at what SOAP is. If we charitably leave aside the parts that are lipstick on the SOAP-RPC pig (e.g. SOAP data model, SOAP encoding and other RPC features), we are left with:

  • the SOAP processing model
  • the SOAP extensibility model
  • SOAP faults

Of these three, the first two are all about headers. Lose the ability to define headers in your interface and you lose two-thirds of the benefits of SOAP (and maybe even more; I would argue that SOAP faults aren’t nearly as useful as the other two items in the list).

The SOAP processing model describes how headers can be processed by intermediaries and, most important of all, it clearly spells out how the mustUnderstand attribute works. So that everyone who processes a SOAP message knows (or should know) that a header with no such attribute (or where the attribute has a value of “false”) can be ignored if not recognized, while the message must not be processed if a non-understood header has this attribute set to “true”. It doesn’t sound like much, but it is huge in terms of reliably supporting features and extending an existing set of SOAP messages.
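For those who have only ever seen SOAP through generated stubs, here is a minimal sketch of what it takes for a sender to use this mechanism, using the standard SAAJ API; the extension header QName and its content are made up for the example:

    import javax.xml.namespace.QName;
    import javax.xml.soap.*;

    public class MustUnderstandSender {
        public static void main(String[] args) throws Exception {
            SOAPMessage msg = MessageFactory.newInstance().createMessage();
            SOAPHeader header = msg.getSOAPHeader();

            // A made-up extension header, standing in for something like a fragment selector.
            QName extension = new QName("urn:example:ext", "OnlyGiveMeASubset", "ext");
            SOAPHeaderElement h = header.addHeaderElement(extension);
            h.addTextNode("/some/xpath/expression");

            // The sender, not the spec author, decides how important the extension is:
            // true  -> a receiver that doesn't recognize the header must fault,
            // false -> a receiver that doesn't recognize it can simply ignore it.
            h.setMustUnderstand(true);

            msg.saveChanges();
            msg.writeTo(System.out);
        }
    }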

When they are deprived of this mechanism (or rather, scared away from using it), people who create SOAP interfaces end up re-specifying this very common behavior over and over by attaching it to parts of the body. But then it has to be understood and implemented correctly every time rather than leveraging a built-in feature of SOAP that any SOAP engine must (if compliance with the SOAP spec has any meaning) implement.

Case in point, the wsman:OptimizeEnumeration element in WS-Management. The normal pattern for an enumeration (see WS-Enumeration) is to first send an Enumerate request to retrieve an enumeration context and then a series of Pull requests to get the data. In this approach, the minimum number of interactions to retrieve even a small enumeration set is 2 (one Enumerate and one Pull). The OptimizeEnumeration element was designed to allow one to request that the Enumerate call also act as the first Pull so that we can save one interaction. Not a big deal if you are going to send 500 Pull messages to retrieve the data, but if the enumeration set is small enough to retrieve all the data in one or two Pull messages then you can optimize traffic by up to 50%.

WS-Management decided to do that with an extension element in the body rather than a SOAP header. The problem then is what happens when a WS-Enumeration implementation that doesn’t use the WS-Management additions gets that element. There is no general rule in SOAP about how to handle this. In this case, the WS-Enumeration spec says to ignore the extensions, but it only says so in the “notation conventions” section, and as a SHOULD, not a MUST.

Not surprisingly, some implementers made the wrong assumption on what the presence of wsman:OptimizeEnumeration means and as a result their code faults on receiving a message with this option, rather than ignoring it. Something that would likely not have happened had the SOAP processing model been used rather than re-specifying this behavior.

Using a header not only makes it clear when an extension can be ignored, it also provides, at no additional cost, the ability to specify when it should not be ignored. Rather than baking this into the spec as WS-Management tries to do (not very successfully), using headers allows a sender to decide based on his/her specific needs and capabilities whether support for this extension is a must or not in this specific instance. Usually, flexibility and interoperability tug in opposing directions, but using headers rather than body elements for protocol extensions improves both!
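And on the receiving side the check is entirely generic, which is exactly why it doesn’t need to be re-specified for each extension. Here is a rough sketch of what any compliant SOAP engine does, spelled out by hand with SAAJ and ignoring the role/actor subtleties for brevity; the set of understood headers is whatever this particular receiver happens to implement:

    import javax.xml.namespace.QName;
    import javax.xml.soap.*;
    import java.util.Iterator;
    import java.util.Set;

    public class MustUnderstandReceiver {
        // The generic check mandated by the SOAP processing model, sketched by hand.
        static void checkHeaders(SOAPMessage msg, Set<QName> understood) throws SOAPException {
            Iterator<?> it = msg.getSOAPHeader().examineAllHeaderElements();
            while (it.hasNext()) {
                SOAPHeaderElement h = (SOAPHeaderElement) it.next();
                if (!understood.contains(h.getElementQName())) {
                    if (h.getMustUnderstand()) {
                        // A real stack would generate a soap:MustUnderstand fault here;
                        // the message must not be processed.
                        throw new SOAPException("MustUnderstand fault for " + h.getElementQName());
                    }
                    // Otherwise the header is simply ignored; no spec text needed to say so.
                }
            }
        }
    }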

Two more comments about the use of headers in WS-Management:

  • It shows that it’s not all IBM’s fault, because IBM had no involvement in the development of the WS-Management standard,
  • WS-Management is far from being the worst offender at under-using SOAP headers, since it makes good use of them in other places (which probably wouldn’t have been the case had the first bullet not been true). But not consistently enough.

Now let’s look at WS-ResourceTransfer for another example. This too illustrates that those who ignore the SOAP processing model just end up re-inventing it in the body, but it also gives a beautiful illustration of the kind of compromises that come from design by committee (and I was part of this compromise so I am as guilty as the other authors).

WS-ResourceTransfer creates a set of extensions on top of the WS-Transfer specification. The main one is the ability to retrieve only a portion of the document as opposed to the whole document. Defining this extension as a SOAP header (as WS-Management does it) leverages the SOAP processing model. Doing so means you don’t have to write any text in your spec about whether this header can be ignored or not, and the behavior can be specified clearly and interoperably by the sender, through the use of the mustUnderstand attribute. In some cases (say if the underlying XML document is in fact an entire database with gigabytes of data) I definitely want the receiver to fault if it doesn’t understand that I only want a portion of the document, so I will set mustUnderstand to true. In other cases (if the subset is not much smaller than the entire document and I am willing to wade through the entire document to extract the relevant data if I have to) then I’ll set mustUnderstand to “false” so that if the receiver can’t honor the extension then I get the whole document instead of a fault.

But with WS-ResourceTransfer, we had to deal with the “headers should not contain application information” view of the world in which it is heresy to use headers for these extensions (why the portion of the document to retrieve is considered “application information” while addressing information is not is beyond my reach and frankly not a debate worth having because this whole dichotomy only makes sense in an RPC view of the world).

So, between the “this information must be in the body” camp and the “let’s leverage the SOAP processing model” camp (my view, as should be obvious by now), the only possible compromise was to do both, with a mustUnderstand header that just says “look in the body to find the extensions that you must follow”. So much for elegance and simplicity. And since that header applies to all the extensions, you have to make all of them mustUnderstand or none of them. For example, you can’t have a wsrt:Create message in which the wsmex:Metadata extension must be understood but the wsrt:Fragment extension can be ignored.

WS-Management and WS-ResourceTransfer share the characteristic of being designed as extensions to existing specifications. But even when creating a SOAP-based interaction from scratch, one benefits from using the SOAP processing model rather than re-inventing ways to express whether support for an option is required or not. I have seen several other examples (in interfaces that aren’t public like the two examples above) that also re-invent this very common behavior for no good reason. Simply because developers are scared of headers.

These examples illustrate a simple point. Most of the value in SOAP resides in how headers are used in the SOAP processing model and the SOAP extensibility model. Steering developers away from defining headers is robbing them of that value. Rather than thinking in terms of header and body, I prefer to think of them as header and tail.

6 Comments

Filed under Everything, SOAP, SOAP header, Specs, Standards, Tech

CMDBF update

My CMDBF colleague from BMC, Van Wiles, has a short update on the state of the CMDBF work. This is an occasion for me to point to his blog. I know that several of my readers are very interested in the CMDBF work, and they should probably monitor Van’s posts in addition to mine.

Like Van, I see the pace accelerating, which is good. More importantly, the quality of the discussion has really improved, not just the quantity. We’ve spent way too much time in UML-land (come on people, this is a protocol, get the key concepts and move on, no need to go in UML to the same level of detail that the XML contains), but the breakthrough was the move to pseudo-schema, and from there to example XML instance docs and finally to Java code for the interop. I am having a lot of fun writing this code. I used this opportunity to take XOM on a road test and it’s been a very friendly companion.

Now that we’re getting close to the CMDBF interop, I feel a lot better about the effort than I did at the beginning of the year. It’s still going to be challenging to make the interop. Even if we succeed, the interop only covers a portion of what the spec needs to deliver. But it will be a very good milestone.

1 Comment

Filed under Everything, Implementation, Specs, Standards, Tech

XMLFrag SOAP header

This HTML document titled “XMLFrag header” describes a proposal I wrote for a header that would allow one to target a SOAP message to a subset of the resource to which the message is sent. This is useful in cases where messages are used to interact with systems that have a well-known model, as is often the case in the IT systems management world. If you’ve ever used or been intrigued by WS-Management’s “FragmentTransfer” header, or WS-ResourceProperties’s “QueryResourceProperties” message or WS-ResourceTransfer’s “Expression” element then you may be interested by a different approach that is operation-independent. Please read the example at the beginning of the proposal for an explanation of what XMLFrag does.
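If it helps to picture the mechanics without opening that document, here is a rough sketch of the evaluation step on the receiving end, assuming the header simply carries an XPath 1.0 expression (the actual proposal also deals with details such as the context node, namespace declarations and response handling, which this glosses over). The resource document and the XPath are made up for the example:

    import nu.xom.*;

    public class XmlFragReceiverSketch {
        // Hypothetical: the header carries an XPath 1.0 expression as its text content.
        static Nodes selectTarget(Document resourceXml, String xpathFromHeader, XPathContext ctx) {
            return resourceXml.query(xpathFromHeader, ctx);
        }

        public static void main(String[] args) throws Exception {
            // A made-up resource document (a small car fleet, the kind of neutral
            // domain the proposal itself uses for illustration).
            Document fleet = new Builder().build(
                "<fleet xmlns='urn:example:fleet'>" +
                "  <car vin='123'><color>blue</color></car>" +
                "  <car vin='456'><color>red</color></car>" +
                "</fleet>", null);

            XPathContext ctx = new XPathContext("f", "urn:example:fleet");
            // The operation carried in the body (a GET, a Put...) would then be applied
            // to this subset instead of to the whole resource document.
            Nodes target = selectTarget(fleet, "/f:fleet/f:car[@vin='123']", ctx);
            System.out.println(target.get(0).toXML());
        }
    }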

I think it’s nifty, but I am not sure how much need for this there is. Are the composition/mash-up scenarios that this allows important enough to replace well established alternatives, such as WS-Management’s “FragmentTransfer” header? I am not sure. Which is why I am putting this out, to see if there’s any interest.

One possible approach would be to generalize WS-Management’s current mechanism to support the features presented here. This could be done very easily and in a backward-compatible way by declaring that the response wrapper (wsman:XmlFragment) is not applied for all dialects but defined in a per-dialect fashion. The currently defined dialects would keep using the wsman:XmlFragment wrapper, but a new dialect could be defined that didn’t use a wrapper and would behave like the XMLFrag element defined in my proposal.

A few additional notes and comments:

This proposal has been called “WS-SubversiveAddressing” by some of my IBM friends who have very definite ideas about what belongs in SOAP headers, what belongs in the body and what headers are appropriate to use as reference properties in an EPR. And this proposal seems to break all these rules. But since the rules seem more inspired by lack of flexibility in the WebSphere message routing/processing capabilities than by true architectural constraints, I am not too worried. I would even say that this proposal represents the only kind of header that really makes sense to use as reference properties. Headers that sometimes are set by the sender explicitly and sometimes are hard-coded in the EPR used. Using reference properties for headers that only come from reference properties (and aren’t expected to ever be set explicitly by the sender) is a sign of lack of ability of the stack to route messages based on URL. A too-frequent limitation of Java SOAP stacks. But I am digressing…

I didn’t think there was anything worthy of a patent in this, but in these sad times you can’t be sure that someone is not going to try to get one for it, so just to be sure HP published this a year ago in Research Disclosure so that no-one can patent it. If you want to check, it starts on page 627 of the May 2006 Research Disclosure (not available on-line unless you have an account w/ them, but you can order it). So in reality, this document has already been public for a year, but in a pretty hidden form. This post just gives it a little bit more visibility.

This is not a spec. It is just a description of what a spec could do. It doesn’t have normative language, it doesn’t provide formal syntax (pseudo-schema, XSD and/or other) and it doesn’t address some of the details that would be needed for interoperability (e.g. what namespace declarations are in context for the XPath evaluation).

Why use a car fleet as an example for illustration? For the same reason that the SML spec uses a university (class/student) example: to pick a domain that is different from the expected domain of application of the technology, so as not to invite modeling discussions that are irrelevant to the proposal and not to bias people’s view of what this could be used for.

Is there a relationship between this and federation efforts? Sort of. This could be very useful when exposed by a federator, if the model is very hierarchical. But it doesn’t work so well for graph-type models. Which is what CMDBF does and which is why CMDBF is coming up with a more graph-oriented approach. But that too is a different topic.

4 Comments

Filed under Everything, SOAP, SOAP header, Specs, Tech, XMLFrag