SML versus the fat-bottomed specs

SML is, if I simplify, XSD augmented with Schematron. For those, like me, who aren’t fond of XSD, this is not very exciting… until you look at things in a different light. Rather than seeing SML as yet another spec that forces you toward XSD (like WSDL does), consider that the fact that SML uses XSD may be your ticket *out* of XSD hell. Let me explain.

I wrote above that I am not fond of XSD, and yet I see the value of having SML make use of it. Like it or not, many people and organizations have made heavy use of XSD to define well-known and reusable XML elements. And there is a lot of tooling (design time and runtime) for it. Breaking away from XSD altogether is possible (and advisable in many cases), but hard to do in places like systems management that have already invested heavily in using XSD.

The problem is that XSD is a document description language. It works well when the “document” abstraction is a good match. For example, when I retrieve an XHTML page from a Web site, I want the paragraphs to be in the right order, and the document abstraction fits. On the other hand, when I retrieve the configuration of a server, I don’t necessarily care whether the description of the CPU comes before or after the description of the network card. I am still retrieving a document (because XML forces this abstraction), but I don’t have the same requirements on its structure as I do for a document meant for publishing, like a Web page. For the non-publishing kind of interaction, a contract (a bullet list of things you can count on) is a better abstraction than a document.

XSD works better for publishing scenarios, where you want to control every aspect of the document. It doesn’t work as well when you only have some constraints that need to be met (e.g., the memory size must be a number) and other things don’t matter to you (e.g., the order of some of the elements). Because of XSD’s quirks, people often end up arbitrarily fixing the order of elements where it isn’t needed (using xsd:sequence), and sometimes even have to introduce unneeded elements (to escape the dreaded Unique Particle Attribution, or UPA, rule). Things get even worse when you have to extend or version an existing XSD, because all those arbitrary constraints get in the way. Other metamodels, like RDF, avoid many of these problems by taking the assertion, rather than the document, as the base concept, but that is a topic for another post.
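To make the ordering point concrete, here is a minimal Python sketch (standard library only) contrasting an order-sensitive check, which mimics what an xsd:sequence content model demands, with an assertion-style check that only enforces the constraints that actually matter. It is not an XSD validator; the `server`, `cpu`, `networkCard` and `memory` element names and both helper functions are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Two serializations of the same (hypothetical) server configuration.
# They differ only in the order of the child elements.
DOC_A = """<server>
  <cpu>x86_64</cpu>
  <networkCard>eth0</networkCard>
  <memory>16384</memory>
</server>"""

DOC_B = """<server>
  <memory>16384</memory>
  <cpu>x86_64</cpu>
  <networkCard>eth0</networkCard>
</server>"""


def matches_sequence(root, expected_tags):
    """Mimic an xsd:sequence content model: the children must be exactly
    these elements, each once, in exactly this order."""
    return [child.tag for child in root] == expected_tags


def satisfies_constraints(root):
    """Check only what we care about, ignoring order: each required child
    appears exactly once and the memory text is all digits."""
    for tag in ("cpu", "networkCard", "memory"):
        if len(root.findall(tag)) != 1:
            return False
    return (root.findtext("memory") or "").strip().isdigit()


if __name__ == "__main__":
    for name, doc in (("A", DOC_A), ("B", DOC_B)):
        root = ET.fromstring(doc)
        print(name,
              matches_sequence(root, ["cpu", "networkCard", "memory"]),
              satisfies_constraints(root))
```

Both documents describe the same server, yet only the first passes the sequence-style check, while the assertion-style check accepts both.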

One nice thing about the syntax constraints usually imposed by an XSD is that they make mapping a piece of XML into a Java (or other language) object more efficient. Semantically, it doesn’t really matter whether the zip code comes before or after the city name. In US postal addresses the zip code typically comes after the city; in France it’s the other way around. And on this point (unlike the stupid MMDDYY date format, don’t get me started on that one) you can make a case either way, since in some places a zip code covers several cities and in others a city contains several zip codes. But whichever order you choose, you may be able to write a faster parser if you know in what order to expect the elements.

So I don’t mind at all having an XSD that describes a reusable type for elements that are very often used as an information atom, like an address. (On the other hand, mapping an entire XML document into a Java object is often the wrong way to handle it.)

By now you are getting an idea of what I want from an XML contract language. I want reusable elements that are small and potentially tightly defined (XSD definitions for a set of global element declarations, or GEDs). And I want assertions that describe the rules a set of such elements must obey to be valid as a unit per the contract. This is where SML comes in. Because it provides a way to package XSD and Schematron together, I can’t help thinking of it as a possible alternative to an all-XSD view of the world, provided people have the discipline to use the XSD part only to describe small reusable elements, and to rely on XPath-driven Schematron constraints for the contract rules that tie these GEDs into a meaningful unit.
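Here is a rough sketch of what such a contract could look like, written in Python rather than in actual SML-IF or Schematron, and assuming a hypothetical `deployment` unit built from small `server` and `site` elements (whose individual shapes a tight XSD could pin down). Each rule pairs a context path with an assertion and a message, loosely mirroring Schematron’s rule/assert structure. Because ElementTree only supports a subset of XPath, the assertions are Python callables rather than XPath expressions.

```python
import xml.etree.ElementTree as ET

# Hypothetical unit made of small, reusable elements. Note that the
# children of each <server> appear in different orders.
UNIT = """<deployment>
  <server site="paris"><memory>16384</memory><cpu>x86_64</cpu></server>
  <site id="paris"><city>Paris</city><zip>75001</zip></site>
  <server site="nyc"><cpu>arm64</cpu><memory>8192</memory></server>
</deployment>"""


def rules(root):
    """Contract rules as (context path, assertion, message) triples.
    Context paths use the XPath subset that ElementTree understands."""
    site_ids = {s.get("id") for s in root.findall("site")}
    return [
        (".", lambda n: len(n.findall("server")) >= 1,
         "a deployment has at least one server"),
        ("server", lambda n: (n.findtext("memory") or "").isdigit(),
         "server memory is a number"),
        ("server", lambda n: n.get("site") in site_ids,
         "server/@site refers to a declared site"),
    ]


def check(root):
    """Evaluate every rule against every node its context path selects
    and collect the messages of the assertions that fail."""
    failures = []
    for context, test, message in rules(root):
        for node in root.findall(context):
            if not test(node):
                failures.append(message)
    return failures


if __name__ == "__main__":
    for problem in check(ET.fromstring(UNIT)):
        print("contract violation:", problem)
```

Running it reports the one broken rule (the `nyc` site is never declared), and the order of elements inside each `server` is never examined. In real Schematron each rule would be an `sch:rule` with a `context` attribute and each check an `sch:assert` with a full XPath `test`.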

A few notes:

– I am fully aware (being part of it) that SML wasn’t created as a generic contract language for XML-based interaction, but as a desired-state modeling language. The usage I am suggesting here is clearly a hack that abuses the syntax provided by SML (actually SML-IF). I am not even sure that the SML-IF packaging would be an entirely convenient vehicle for this approach; I haven’t done the experimentation needed to validate that. It just seems to be in the ballpark of the requirements.

– I find it ironic that the approach to an XML contract language described above is already how many XML specs are defined in their human-readable sections (at least in the SOAP world): a list of pseudo-XPath statements, each with a description of what to expect at the end of it. But somehow, at the bottom of each of these specs, we get a huge XSD that imposes many extra constraints with no justification in the semantics of the spec, rather than a set of XPath-driven Schematron statements that provide a machine-readable equivalent of the human-readable rules expressed in pseudo-XPath. As the Queen song (almost) says: “Fat bottomed specs you make the SOAPin world go dumb”.
