A new SPIN on enriching a model with domain knowledge (constraints and inferences)

Back when I was at HP and we got involved with what turned into SML (now a W3C candidate recommendation), we tried to make a case for the specification to be based on RDF/OWL rather than XML/XSD/Schematron. It was a strange situation from a technical perspective because RDF is a better foundation for an IT model than XML, but on the other hand XSD/Schematron is a better choice for validation than OWL. OWL is focused on inference, not validation (because of both fundamental design choices, e.g. the open world assumption, and language expressiveness limitations).

So our options were to either use the right way to represent the system (RDF) combined with the wrong way to capture constraints (OWL) or to use the wrong way to represent the system (XML) combined with the right way to constrain it (mostly Schematron, with some limited help from XSD). In the end, of course, this subtle technical debate was crushed under the steamroller of vendor politics and RDF never got a fair chance anyway.

The point of this little background story is to describe the context in which I read this announcement from Holger Knublauch of TopQuadrant: the new version of their TopBraid Composer tool introduces SPIN, a way to complement OWL with a SPARQL-based constraint checking and inference mechanism.

This relates to SML in two ways.

First, there are similarities in the approach: Schematron leverages the XPath language, used to query XML, to create validation rules. SML then marries Schematron with XSD, for a more powerful validation mechanism. Compare this to SPIN: SPIN leverages the SPARQL query language, used to query RDF, to create validation/inference rules. SPIN also marries this with OWL, for a more powerful validation/inference mechanism.
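
To make the “query language as rule language” idea concrete, here is a minimal sketch in plain Python. It is not SPIN or Schematron syntax, and every name in it (the `ex:` identifiers, the `check_exactly_one` helper) is made up for illustration. The graph is a bare set of triples and the rule is a pattern match over it: any instance that does not have exactly one value for a property is reported as a violation. This is the closed-world reading you want for validation, where a missing value is an error rather than something that is merely not known yet.

```python
# Hypothetical toy data: a graph is a set of (subject, predicate, object) triples.
TRIPLES = {
    ("ex:server1", "rdf:type", "ex:Server"),
    ("ex:server1", "ex:primaryIP", "10.0.0.1"),
    ("ex:server2", "rdf:type", "ex:Server"),
    ("ex:server2", "ex:primaryIP", "10.0.0.2"),
    ("ex:server2", "ex:primaryIP", "10.0.0.3"),
    ("ex:server3", "rdf:type", "ex:Server"),
}


def instances_of(triples, cls):
    """Return the subjects explicitly typed as cls, in sorted order."""
    return sorted(s for s, p, o in triples if p == "rdf:type" and o == cls)


def values(triples, subject, prop):
    """Return the values of prop that are actually present for subject."""
    return sorted(o for s, p, o in triples if s == subject and p == prop)


def check_exactly_one(triples, cls, prop):
    """Closed-world constraint: map each violating instance to the values found."""
    violations = {}
    for inst in instances_of(triples, cls):
        found = values(triples, inst, prop)
        if len(found) != 1:
            violations[inst] = found
    return violations


violations = check_exactly_one(TRIPLES, "ex:Server", "ex:primaryIP")
for inst, found in violations.items():
    print(f"{inst}: expected exactly one ex:primaryIP, found {len(found)}")
```

Only what is in the graph counts here, so `ex:server3` is flagged for having no address at all. Under OWL's open world assumption, that missing value would not by itself be treated as an error.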

But beyond the mirroring structures of SPIN and SML, the most interesting thing is that it looks like SPIN could nicely solve the conundrum, described above, of RDF being the right foundation for modeling IT systems but OWL being the wrong constraint mechanism. SPIN may do a better job than SML at what SML is aiming to do (validation rules). And at the same time, you get “for free” (or as close to “for free” as you can get with software, which is still far from “free”) a pretty powerful inference mechanism. The most powerful I know of, short of using a general programming language to capture your inference rules (and good luck with maintaining these rules).

This may sound like sci-fi, but it’s the next logical step for IT configuration standardization. Let’s look at where we are today:

  • SML (at W3C) is an attempt to standardize the expression of constraints.
  • CMDBf (at DMTF) is standardizing how the model content is queried (and, to some limited extent at this point, federated).
  • And recently IBM authored a proposal for a reconciliation specification for items in the model and sent it to an Eclipse group (COSMOS).

But once you tackle reconciliation, you are already half-way into inferencing territory. At least if you want to reconcile between models, not just between instances expressed in the same model. Because the models may not be defined at the same level of granularity, and before you can reconcile items you need to infer finer-grained entities in your coarser-grained model (or vice-versa) so that you can reconcile apples with apples.
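
Here is a rough sketch of that “infer first, then reconcile” step, again in plain Python with invented names (the `ex:` identifiers, `infer_hosted_on`, `closure`) rather than in any real rule language. One model describes applications running in VMs that run on hosts; the other only knows which host an application is on. A single CONSTRUCT-like rule derives the coarser relationship from the finer one, after which the two models can be compared triple for triple.

```python
# Hypothetical fine-grained model: applications run in VMs, VMs run on hosts.
FINE = {
    ("ex:billing", "ex:runsIn", "ex:vm7"),
    ("ex:vm7", "ex:runsOn", "ex:host42"),
}

# Hypothetical coarse-grained model: applications are hosted directly on hosts.
COARSE = {
    ("ex:billing", "ex:hostedOn", "ex:host42"),
    ("ex:payroll", "ex:hostedOn", "ex:host42"),
}


def infer_hosted_on(triples):
    """Rule: (?app runsIn ?vm) and (?vm runsOn ?host) => (?app hostedOn ?host)."""
    return {
        (app, "ex:hostedOn", host)
        for app, p1, vm in triples if p1 == "ex:runsIn"
        for vm2, p2, host in triples if p2 == "ex:runsOn" and vm2 == vm
    }


def closure(triples, rules):
    """Apply every rule repeatedly until no new triples are produced."""
    result = set(triples)
    while True:
        new = set().union(*(rule(result) for rule in rules)) - result
        if not new:
            return result
        result |= new


# Bring the fine-grained model up to the granularity of the coarse one.
fine_closed = closure(FINE, [infer_hosted_on])
hosted = {t for t in fine_closed if t[1] == "ex:hostedOn"}

# Reconcile: which statements do both models agree on, and which are unmatched?
matched = hosted & COARSE
only_in_coarse = COARSE - hosted

print("matched:", sorted(matched))
print("only in coarse model:", sorted(only_in_coarse))
```

The rule is the part a domain expert would want to write once and reuse across repositories. The fixpoint loop and the set comparison are generic plumbing.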

Today, inferencing for IT models is done as part of the “discovery packs” that you can buy along with your IT management model repository. But not very well, in general. Because the way you write such a discovery module for the HP Universal CMDB is very different from how you write it for the BMC CMDB, IBM’s CCMDB or as a plug-in for Oracle Enterprise Manager or Microsoft System Center. Not to mention the smaller, more specialized, players. As a result, there is little incentive for 3rd party domain experts to put work into capturing inference rules since the work cannot be widely leveraged.

I am going a bit off-topic here, but one interesting thing about standardization of inferencing for IT management, if it happens, is that it is going to be very hard to not use RDF, OWL and some flavor of SPARQL (SPIN or equivalent) there. And once you do that, the XML-based constraint mechanisms (SML or others) are going to be in for a rough ride. After resisting the RDF stack for constraints, queries and basic reconciliation (because the added value was supposedly not “worth the cost” for each of these separately), the XML dam might get a crack for inferencing. And once RDF starts to trickle through that crack, the whole dam is going to come down in a big wave. Just to be clear, this is a prophetic long-term vision, not a prediction for 2009 (unfortunately).

In the meantime, I’d like to take this SPIN feature for a… spin (sorry) when I find some time. We’ll see if I can install the new beta of TopBraid Composer despite having used up, a year ago, my evaluation license of the earlier version of the product. Despite what I had hoped at some point, this is not directly applicable to my current work, so I am not sure I want to buy a license. But who knows, SPIN may turn out to be the change that eventually puts RDF back on my “day job” list (one can dream)…

It’s also nice that Holger took the trouble to deliver SPIN not just as a feature of his product but also as a stand-alone specification, which should make it pretty easy for anyone who has a SPARQL engine handy to support it. Hopefully the next step will be for him to clarify the IP terms for the specification and to decide whether or not he wants to eventually submit it for standardization. Maybe to the W3C SML working group? :-) I’d have a hard time resisting joining if he did.

12 Responses to A new SPIN on enriching a model with domain knowledge (constraints and inferences)

  1. Interesting!

    There may be another option here as well… Tools like Pellet add SWRL/Datalog rules which could be used for constraints. I think that this is still preliminary work in their AL-Log engine, but it looks similar/valuable.

    Andrea

  2. Thanks a lot for discussing SPIN on your blog, William!

    (Unsurprisingly) I couldn’t agree more with your observations. The company I work for (TopQuadrant) has quite a number of “real-world” customers from industry, government and the military, and one of the main complaints that we always tend to hear is that the open-world semantics of OWL don’t match their use cases and expectations. In many cases, our users (and the tools) simply ignore that the open-world assumption (and the lack of a unique-name assumption) exists in OWL and instead pretend that an OWL cardinality restriction of 1 really means just “one value allowed” – which is incorrect according to the formal semantics of OWL. SPARQL has a much clearer answer here: you get what you see in the graph, and not more. All that SPARQL does is pattern matching against the triples, and thus it has closed-world semantics and the unique name assumption. Having said this, there are of course still use cases where bringing in OWL semantics has benefits, but the choice should be yours if you want to activate those semantics for your use case. I gave an example of how to express a large subset of OWL using SPIN in a follow-up posting [1].

    And yes, a general strength of RDF-based technology is for linking models. Everything in RDF has a unique URI to start with, but then it’s also possible to link arbitrary concepts together by reusing or sub-property-ing properties. We have many customers (such as NASA) who are very keen on integrating their large stock of existing XML dialects to make information exchange more efficient. We are building ontologies for them and with them to capture the domains of interest and thus avoid redundancies etc. I wrote about the SPIN-based unit conversion that is one of the outcomes of this work [2]. A constant problem that my colleagues face in this XML-related work is the weakness of expressivity of XML Schema – a format mainly geared for tree structures whereas most of the real-world is better modeled as graphs.

    Another aspect of RDF that SPIN rides on is the vision of a distributed self-describing data structure. In the Semantic Web, both classes and instances live in the same space and can be queried using the same mechanisms. SPIN takes this idea to extremes: you can not only define classes and properties, but even define executable semantics of those and use this mechanism to build your own modeling languages. I wrote about this elsewhere as well, but my article on SPIN templates [3] might be a good place to start. The key idea is that if someone introduces a new concept such as a constraint check on units then he or she can publish this constraint check by means of a SPIN template that lives at a specific URI for anyone (human or machine) to look up and execute. No other hard-coded magic is needed anywhere as long as your tools understand SPARQL. This approach can be used to build truly open model-driven applications, and the computer game on my blog is one example [4] of this vision.

    Anyway, there is certainly more to evolve in the SPARQL space. And yes, we may be going to W3C with SPIN in the future. Right now we are in beta phase and I haven’t even properly announced this work yet to the wider Semantic Web community. So far responses from our users have been very positive and I know of at least one company that is even considering creating their own SPIN implementation. Since it’s all based on SPARQL, a large number of vendors with SPARQL support (including Oracle 11g :) ) could in principle add SPIN-like technology to their stack.

    Holger

    [1] http://composing-the-semantic-web.blogspot.com/2009/01/owl-2-rl-in-sparql-using-spin.html
    [2] http://composing-the-semantic-web.blogspot.com/2009/01/video-sparql-based-unit-conversion-with.html
    [3] http://composing-the-semantic-web.blogspot.com/2009/01/understanding-spin-templates.html
    [4] http://composing-the-semantic-web.blogspot.com/2009/01/spin-box-sparql-based-computer-game.html

  3. Stu

    We are using SPARQL heavily for query, validation, and “controlled inference” at Elastra (I really hope we can share some of our work publicly soon). We chose RDF and SPARQL over XSD/Schematron etc. for many of the reasons listed above; SPIN looks like a very interesting general approach for the mainstream (woulda been nice 4 months ago ;).

  4. Pingback: William Vambenepe’s blog » Blog Archive » CMDBf is a lot more and a lot less than you think

  5. Sylvere

    Holger, SPIN is nice work.
    I went through your blog, especially this page http://composing-the-semantic-web.blogspot.com/2009/01/object-oriented-semantic-web-with-spin.html
    I have just one question, about the square example: if I’m not wrong, we can also use SWRL to define the same constraint, right? So what is the difference between using SPIN or SWRL?

  6. Sylvere: you’ve posted on the wrong blog. Holger’s blog is http://composing-the-semantic-web.blogspot.com/.

  7. Sylvere

    William: I apologize, but I did it on purpose since I was not able to post on Holger’s blog.

  8. Sylvere: no problem. I sent Holger an email to let him know.

  9. Sylvere, thanks for your feedback on SPIN. Yes SPIN and SWRL are related: they both have an RDF syntax to embed rules into ontologies. They both have similar expressivity and in this toy example with rectangles and squares you could pick either language. I see various advantages of SPIN over SWRL as below, mostly having to do with SPIN’s use of SPARQL.

    – SPIN is more expressive than SWRL because SPARQL is (e.g. you have richer filter expressions and you can really use the semantic web as part of your queries via named graphs).
    – SPIN will benefit from the further evolution of SPARQL, which is a very active W3C standard with lots of tools. As far as I can tell, SWRL is not being actively developed right now apart from a RIF mapping.
    – SPIN is more extensible, e.g. you can create your own functions and templates. With SWRL you are limited to whatever hard-coded execution library has been provided by the engine. SPIN functions are first-class citizens and can be shared together with the data models.
    – SPIN has explicit support for constraint checking.
    – SPIN is object-oriented and therefore arguably easier to maintain. Also rule execution and constraint checking can be scoped better due to the OO attachment.

    You may want to get additional opinions by posting a similar question to the SWRL proponents such as the Protege-OWL 3 people. I have seen some postings there that compare SWRL with SPARQL, but I largely disagree with their conclusions. In particular they seem to claim that only SWRL can use OWL DL semantics, which is untrue because you can of course execute SPARQL on top of a triple store with OWL DL inferencing turned on. For additional questions on SPIN I recommend either contacting me directly or via the TopBraid Composer mailing list http://groups.google.com/group/topbraid-composer-users. Thanks.

  10. Sylvere

    Holger, thanks for your answer, especially since I think that I’m not the first one to ask.
    Your answer is complete and exactly what I was looking for.
    I will join your group on Google and might come back there with a couple of questions.

  11. Pingback: William Vambenepe — OWL news you can use

  12. Pingback: SPIN overview « Passion for the Web Of Data