William Vambenepe's blog

IT management in a changing IT world

Archive for the 'RDF' Category

28 Oct 2009

OWL news you can use

by William (@vambenepe on Twitter)

The W3C released OWL 2 today. Most readers of this blog are IT management people (whether they call it “cloud computing” or “boring old system management”) and don’t follow RDF, OWL, SPARQL etc too closely (if at all). Yet there is a lot of potential value in using these technologies for IT management, so I thought it might be helpful to provide some practical resources on the topic. I have selected articles that cover the special (some may say “twisted”) approach of using OWL and its friends for validation rather than just inference, as this use case is very relevant to IT management.

Of course you can also go to the W3C standard itself, starting with the overview of OWL 2.

Just so you don’t feel lonely if you decide to explore this path, have a look at Elastra’s sexy technology stack. ECML, EDML and EMML are all defined as OWL ontologies.

22 Jul 2009

Anthology of blog posts about protocols and data formats

by William (@vambenepe on Twitter)

I just finished reading or re-reading a half-dozen great short texts about data formats and protocols, in the XML/RDF space.

I started with this “do we need WADL” post by Joe Gregorio (since the previous entry made me go back to WADL which is used by Rackspace). Under the guise of a Q&A about WADL, Joe’s post disposes of the notion that IDL-based code generation is any good (of course the reference on this topic is Steve’s Alpine paper, but Joe very elegantly captures the gist of it in a few sentences). He then explains what it really takes to specify a protocol (hint: it’s not just a syntax). This is about WSDL and XSD as much as WADL.

When I reached the point in Joe’s Q&A where he discusses whether one should ever create a new protocol, I remembered a post on this very topic from Tim Bray, which I easily Googled back to life. Two of them actually, one about why you shouldn’t do it and the other about how to do it since he knows his advice will be ignored. There are so many lessons in these that I won’t even attempt to summarize.

Tim’s second piece then delivered me to this excellent article about the various facets of RDF. It’s six years old but still true. Though if it was written today I expect it would add “graph query language” and possibly even “constraint language” as facets of RDF.

While I am at it, I should add to this list this bird’s-eye view of all the XML obstacles that pedestrians run into (I have highlighted this entry in a previous post).

These are all very well written articles by people who think very clearly about the domain. None of them technically taught me anything I didn’t know before, but they definitely helped me clarify my thoughts (and find the words to explain them to others).

We’re not artists. We’re not scientists. We’re not mathematicians. But there is some beauty in computer protocol design too. These writings are museum pieces, in the “lasting/worthwhile” sense of the term (not the “old/outdated” sense that it often has in the computing world). Don’t rush to read them, they are all several years old and have aged very well. Wait until you have the time to read them carefully.

I didn’t set out to create a best-of compilation of writings about protocols and data formats. I just happened to run into these great entries in a 30-minute period and I was impressed by how much “above average” they all are. Is it luck? Does the topic of computer protocols naturally attract good thinkers and writers? Am I just in a good mood tonight? Who knows.

There must be others, possibly even better. Elliotte Rusty Harold occasionally surfaces one through his not-so-daily “quote of the day”. Suggestions for more articles of this caliber are welcome. A thousand monkeys may not be able to produce Hamlet, but a thousand bloggers may come close to an equivalent of Feynman’s lectures.

12 Jun 2009

With M (Oslo), is Microsoft on the path to reinventing RDF?

by William (@vambenepe on Twitter)

I have given up, at least for now, on understanding what Microsoft wants Oslo (and more specifically the “M” part) to be. I used to pull my hair out reading inconsistent articles and interviews about what M tries to be (graphical programming! DSL! IT models! generic parser! application components! workflow! SOA framework! generic data layer! SQL/T-SQL for dummies! JSON replacement! all of the above!). Douglas Purdy makes a valiant 4-part effort (1, 2, 3, 4) but it’s still not crisp enough for my small brain. Even David Chappell, explainer extraordinaire, seems to throw up his hands (“a modeling platform that can be applied in lots of different ways”, which BTW is the most exact, if vague, description I’ve heard). Rather than articles, I now mainly look at the base specifications and technical documents that show what it actually is. That’s what I did when the Oslo SDK first came out last year. A new technical document came out recently, an update to the MGraph Object Model, so I took another look.

And it turns out that MGraph is… RDF. Or rather, “RDF minus entailment”. And with turtle as the base representation rather than an add-on.

Look at section 3 (“RDF concepts”) in this table of contents from W3C. It describes the core RDF concepts. Keep the first five concepts (sections 3.1 to 3.5) and drop the last one (“3.6: Entailment”). You have MGraph, a graph-oriented object model.

On top of this, the RDF community adds reasoning capabilities with RDF entailment, RDFS, OWL, SWRL, SPIN, etc and a variety of engines that implement these different levels of reasoning.

Microsoft, on the other hand, seems to ignore that direction. Instead, it focuses on creating a good mapping from this graph object model to programming languages. In two directions:

  • from programming languages to the graph model: they make it easy for you to create a domain-specific language (DSL) that can easily be turned into M instances.
  • from the graph model to programming languages: they make it easy for you to work on these M instances (including storing them) using the .NET technology stack.

So, if Microsoft is indeed reinventing RDF as the title of this entry provocatively suggests, then they are taking an interesting detour on the way. Rather than going straight to “model-based inferencing”, they are first focusing on mapping the core MGraph concepts to programming (by regular developers) and user interactions (with regular users). Something that for a long time had not gotten much attention in the RDF world beyond pointing developers to Jena (though it seemed to have improved over the last few years with companies like TopQuadrant; ironically, the Oslo model browser/editor is code-named “Quadrant”).

Whether the Oslo team sees the inferencing fun as a later addition or something that’s not needed is another question, on which I don’t see any hint at this point.

I hope they eventually get to it. But I like the fact that they cleanly separate the ability to represent and manipulate the graph model from the question of whether instances can be inferred. We could use such a reusable graph representation mechanism. Did CMDBf, for example, really have to create a new graph-oriented metamodel and query language? I failed to convince the group to adopt RDF/SPARQL, but I may have been more successful if there had been a cleanly-separated “static” version of RDF/SPARQL, a way to represent and query a graph independently of whether the edges and nodes in the graph (and their types) are declared or inferred. Instead, the RDF stack has entailment deeply embedded and that’s very scary to many.

But as much as I like this separation, I can’t help squirming when I see the first example in the MGraph document:

// Populate a small village with some people
Villagers => {
  Jenn => Person { Name => 'Jennifer', Age => 28, Spouse => Rich },
  Rich => Person { Name => 'Richard', Age => 26, Spouse => Jenn },
  Charly => Person { Name => 'Charlotte', Age => 12 }
},
HaveSpouses => { Villagers.Rich, Villagers.Jenn }

That last line is an eyesore to anyone who has been anywhere near RDF. I have just declared that Rich and Jenn are one another’s spouse, why do I have to add a line that says that they have spouses? What I want is to say that participation in a “Spouse” relationship entails membership in the “hasSpouse” class. And BTW, I also want to mark the “Spouse” relationship as symmetric so I only have to declare it one way and the inverse can be inferred.
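To make the wish concrete, here is a minimal sketch of the two entailments in plain Python rather than in M or any RDF toolkit. The triple layout, the rule tables and the `entail` function are all assumptions made for illustration; only the names Jenn, Rich, Charly, Spouse and HaveSpouses come from the example above.

```python
# Hypothetical mini graph: a set of (subject, predicate, object) triples.
# The Spouse edge is declared in one direction only.
declared = {
    ("Jenn", "Spouse", "Rich"),
    ("Jenn", "Name", "Jennifer"),
    ("Rich", "Name", "Richard"),
    ("Charly", "Name", "Charlotte"),
}

# Assumed rule tables (not part of MGraph):
SYMMETRIC = {"Spouse"}                        # predicates that hold both ways
CLASS_OF_SUBJECT = {"Spouse": "HaveSpouses"}  # using the predicate implies membership


def entail(triples):
    """Return the declared triples plus those derived by the two rules."""
    inferred = set(triples)
    # Rule 1: a symmetric predicate also holds in the reverse direction.
    for s, p, o in triples:
        if p in SYMMETRIC:
            inferred.add((o, p, s))
    # Rule 2: the subject of a listed predicate is a member of the mapped class.
    for s, p, o in list(inferred):
        if p in CLASS_OF_SUBJECT:
            inferred.add((s, "type", CLASS_OF_SUBJECT[p]))
    return inferred


graph = entail(declared)
have_spouses = sorted(s for s, p, o in graph if p == "type" and o == "HaveSpouses")
print(have_spouses)  # ['Jenn', 'Rich']
```

With rules like these, the explicit HaveSpouses line and the second Spouse declaration become derivable rather than something the author has to keep in sync by hand.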

So maybe I don’t really know what I want on this. I want the graph model to be separated from the inference logic and yet I want the syntactic simplicity that derives from base entailments like the example above. Is June too early to start a Christmas wish list?

While I am at it, can we please stop putting people’s ages in the model rather than their dates of birth? I know it’s just an example, but I see it over and over in so many modeling examples. And it’s so wrong in 99% of cases. It just hurts.
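A small sketch of why the date of birth is the better thing to store: the age can always be derived from it, while a stored age silently goes stale. The birth date below is invented for the example and `age_on` is a hypothetical helper, not something from the MGraph document.

```python
from datetime import date


def age_on(date_of_birth: date, today: date) -> int:
    """Derive the age in whole years on a given day."""
    years = today.year - date_of_birth.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


charly_born = date(1997, 3, 15)  # made-up date of birth
print(age_on(charly_born, date(2009, 6, 12)))  # 12
print(age_on(charly_born, date(2010, 6, 12)))  # 13, with no change to the model
```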

There are other things about MGraph edges that look strange if you are used to RDF. For example, edges can be labeled or not, as illustrated on this first example of the graph model:

In this example, “Age” is a labeled edge that points to the atomic node “42”, while the credit score is modeled as a non-atomic node linked from the person via an unlabeled edge. Presumably the “credit score” node is also linked to an atomic node (not shown) that contains the actual score value (e.g. “800”). I can see why one would want to call out the credit score as a node rather than having an edge (labeled “credit score”) that goes to an atomic node containing the actual credit score value (similar to how “age” is handled). For one thing, you may want to attach additional data to that “credit score” node (when was it calculated, which reporting agency provided it, etc) so it helps to have it be a node. But making this edge unlabeled worries me. Originally you may only think of one possible relationship type between a person and a credit score (the person has a credit score). But others may pop up further down the road, e.g. the person could be a loan agent who orders the credit score but the score is about a customer. So now you create a new edge label (“orders”) to link the loan agent person to the credit score. But what happens to all the code that was written previously and navigates the relationship from the person to the score with the expectation that the score is about the person? Do you think that code was careful to only navigate “unlabeled” edges? Unlikely. Most likely it just grabbed whatever credit score was linked to the person. If that code is applied to a person who happens to also be a loan agent, it might well grab a credit score about other people which happened to be ordered by the loan agent. These unlabeled edges remind me of the practice of not bothering with a “version” field in the first version of your work because, hey, there is only one version so far.
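The failure mode can be sketched in a few lines of Python. Everything here is hypothetical (the edge tuples, the node names, the use of `None` for an unlabeled edge); it only illustrates how code written before the “orders” label existed would behave afterwards.

```python
# Edges as (source, label, target); None stands in for an unlabeled edge.
edges = [
    ("alice", None, "score_alice"),    # Alice's own credit score
    ("alice", "orders", "score_bob"),  # added later: Alice is also a loan agent
]
node_kind = {"score_alice": "CreditScore", "score_bob": "CreditScore"}


def scores_naive(person):
    """What early code likely does: grab any credit score linked to the person."""
    return [t for s, label, t in edges
            if s == person and node_kind.get(t) == "CreditScore"]


def scores_unlabeled_only(person):
    """What the code would have needed to do from day one."""
    return [t for s, label, t in edges
            if s == person and label is None and node_kind.get(t) == "CreditScore"]


print(scores_naive("alice"))           # ['score_alice', 'score_bob']
print(scores_unlabeled_only("alice"))  # ['score_alice']
```

The naive version returns a score that is about somebody else, which is exactly the kind of bug a labeled edge would have prevented.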

The restriction that a node can have at most one edge with a given label coming out of it is another one that puzzles me. Though it may explain why an unlabeled edge is used for the credit score (since you can get several credit scores for the same person, if you ask different rating agencies). But if unlabeled edges are just a way to free yourself from this restriction then it would be better to remove the restriction rather than work around it. Let’s take the “Spouse” label as an example. For one thing in some countries/cultures having more than one such edge might be possible. And having several ex-spouses is possible in many places. Why would the “ex-spouse” relationship have to be defined differently from “spouse”? What about children? How is this modeled? Would we be forced to have a chain of edges from parent to 1st child to next sibling to next sibling, etc? Good luck dealing with half-siblings. And my model may not care so much about capturing the order (especially if the date of birth is already captured anyway). This reminds me of how most XML document formats force element order in places where it is not semantically meaningful, just because of XSD’s bias towards “sequence”.
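As a rough illustration of the difference, here are the two ways of modeling children in Python. The labels FirstChild, NextSibling and Child, and the names in the data, are invented for this sketch; I am not claiming this is how M would have you write it.

```python
from collections import defaultdict

# (a) At most one edge per label: children have to be chained.
chained = {
    "Pat": {"FirstChild": "Ann"},
    "Ann": {"NextSibling": "Ben"},
    "Ben": {},
}


def children_chained(parent):
    """Walk the FirstChild/NextSibling chain to collect the children."""
    result = []
    child = chained[parent].get("FirstChild")
    while child is not None:
        result.append(child)
        child = chained[child].get("NextSibling")
    return result


# (b) Repeated labels allowed: one Child edge per child, no imposed order.
multi = defaultdict(list)
for source, label, target in [("Pat", "Child", "Ann"), ("Pat", "Child", "Ben")]:
    multi[(source, label)].append(target)

print(children_chained("Pat"))          # ['Ann', 'Ben']
print(sorted(multi[("Pat", "Child")]))  # ['Ann', 'Ben']
```

In variant (a) the sibling chain belongs to one parent, so a half-sibling who shares only the other parent has no clean place in it. In variant (b) each parent simply carries its own Child edges.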

Having started this entry by declaring that I don’t understand what M tries to be, I really shouldn’t be criticizing its design choices. The “weird” aspects I point out are only weird in the context of a certain usage but they may make perfect sense in the usage that the Oslo team has in mind. So I’ll stop here. The bottom line is that there are traces, in M, of a nice, reusable, graph-oriented data model with strong bridges (in both directions) to programming languages and user interfaces. That is appealing to me. There are also some strange restrictions that puzzle me. We’ll see where this goes (hopefully this article, “Designing Domains and Models Using M”, will soon contain more than “to be submitted” and I can better understand the M approach). In any case, kudos to the team for being so open about their work and the evolution of their design.

02 Mar 2009

CMDBf is a lot more and a lot less than you think

by William (@vambenepe on Twitter)

The DMTF CMDBf working group has recently published an updated draft of its specification. The final version should follow soon and I don’t expect major changes so now is not a bad time to start thinking about what this baby can do.

Since CMDBf stands for “configuration management database federation”, you might think the obvious answer to the “what can it do” question is “build a federation of configuration management databases”. Except it’s not. Despite its name, CMDBf provides little support for federation unless you take a very loose definition of the term. The specification gives you a query language and a very simple registration interface, with a sprinkle of metadata to improve interoperability. The query language lets you talk to a CMDB to retrieve information on configuration items (CIs) that it knows about. The registration interface lets you keep a CMDB informed of changes to CIs that it may care about. If you want to build on top of this a real federation, one that scales to the type of environment that CMDBs are used for today, you have to go further than what the specification provides. What CMDBf does give you is some amount of integration between CMDBs (at the protocol level at least, not at the model level). It may not sound like much but it is a lot of progress on the current situation and the right incremental step, whether you are aiming for true federation as the end goal or not.

That’s the “a lot less than you think” part. So, what’s the “a lot more than you think” part? Good stuff all around:

CMDBf provides a metamodel that is well-suited for complex IT systems and it provides an elegant graph-oriented query language on top of it. The most convenient representation for an IT system is neither “one big XML document” nor “a sea of nodes and edges”. CMDBf gives you a middle ground: a graph model with XML leaf nodes. So you can precisely model the relationships between your IT elements using explicit relationships (with their own records), but you can also attach a well-understood piece of XML to an item as a record without having to break that XML into a bunch of tiny relationships.
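Here is a minimal sketch of that middle ground, using only Python’s standard library. It is not the CMDBf query syntax or wire format. The item ids, record contents and the `select_items` and `related` helpers are assumptions made up for the illustration. The point is the shape: relationships are traversed in the graph, while path expressions are only ever evaluated inside a single XML record.

```python
import xml.etree.ElementTree as ET

# Items carry XML records as leaves of the graph (made-up data).
items = {
    "host1": ET.fromstring("<Computer><name>web01</name><os>Linux</os></Computer>"),
    "host2": ET.fromstring("<Computer><name>db01</name><os>Solaris</os></Computer>"),
    "app1": ET.fromstring("<Application><name>storefront</name></Application>"),
    "app2": ET.fromstring("<Application><name>billing</name></Application>"),
}

# Relationships are first-class and have their own records:
# (relationship id, source item id, target item id, XML record).
relationships = [
    ("r1", "app1", "host1", ET.fromstring("<runsOn><since>2009-01-15</since></runsOn>")),
    ("r2", "app2", "host2", ET.fromstring("<runsOn><since>2008-11-02</since></runsOn>")),
]


def select_items(record_tag, path=None, value=None):
    """Ids of items whose record has the given tag and, optionally, whose
    first element at `path` (evaluated inside that record only) equals `value`."""
    selected = set()
    for item_id, record in items.items():
        if record.tag != record_tag:
            continue
        if path is None or record.findtext(path) == value:
            selected.add(item_id)
    return selected


def related(source_ids, rel_tag, target_ids):
    """Graph step: pairs linked by a relationship whose record has tag `rel_tag`."""
    return [(s, t) for rel_id, s, t, record in relationships
            if record.tag == rel_tag and s in source_ids and t in target_ids]


apps = select_items("Application")
linux_hosts = select_items("Computer", "os", "Linux")
print(related(apps, "runsOn", linux_hosts))  # [('app1', 'host1')]
```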

I am pretty sure there are other domains, beyond IT systems, for which this would be useful. It will be interesting to see if the CMDBf specification gets considered outside of its intended scope. But these domains are more likely to end up using RDF/OWL/SPARQL instead. Not everyone has made the leap from XML as a tool to XML as a religion, which made CMDBf necessary for us. But let’s not veer into another rant.

Let’s go back instead to describing how useful CMDBf can be to IT systems management, independently of any “federation” objective. Let me put it this way: if one was to create from scratch a configuration store for IT systems they should strongly consider the CMDBf conceptual model as the base metamodel. And something along the lines of the CMDBf Query (though not necessarily through its XML serialization) as the native query language for it. Most CMDBf implementers of course are not in this situation. Rather than writing the store from scratch they will create a CMDBf wrapper/interface on their current CMDB. And that’s fine too. CMDBf will work well as an interoperability protocol. Putting aside my gripes about XPath overuse, CMDBf strikes a reasonable balance that makes it implementable on top of any back-end technology (relational, XML, RDF, in-memory objects, bags of name-value pairs…). And the query patterns it supports map well to CMDB-to-CMDB integration use cases. But it is underselling it, in my view, to restrict it to this over-the-wire interoperability scenario. CMDBf also provides a very useful foundation for local access to the CMDB. CMDBf graph queries can support powerful visualization of the content of the CMDB. They can support the definition of configuration rules. They can support in-depth inspection of relationships (e.g. fault tree).

And that may just be the beginning. It could take three directions after v1:

The first one, as always for a standard, is that it is ignored and becomes irrelevant. I have to reluctantly list this one first, because it is statistically the most likely for a new standard. Especially one that is not a ratification of an existing de facto standard. And one that threatens an important control point for vendors. A slight variation on this scenario is for CMDBf to succeed from a marketing perspective, as a checkmark that most vendors tick, but not as a true technology. This is the “smokescreen” scenario from Mr. Skeptic. One scenario that worries me is that CMDBf could fail because of the poor models of the CMDBs that implement it. If your IT model is not granular enough or if it matches the UI of your application more than the semantics of the IT components, then CMDBf will expose these shortcomings and probably be blamed for them (with bad models, “shoot the messenger” becomes “shoot the protocol”).

The second possible direction is that CMDBf provides enough value in integrating CMDBs that people want more and challenge the group to deliver on the “f” part, federation. That could take the form of a combination of:

  • better integration with other protocols (mostly from the WS-Management family, like WS-Enumeration and WS-Eventing),
  • reconciliation support (here are ways to address it),
  • some model transformations or canonical models,
  • some optimizations in the query mechanism for distributed queries (e.g. data partition rules).

The third possible direction (not exclusive) is for CMDBf to become the basis for a standard rule language for IT models. Yeah, another one (remember SML?). SPIN and SML show us how a generic query language can be used to support configuration rules. I very much like SPIN but it requires adopting RDF as a metamodel, which is a hard sell in XML-land. SML suffers technically from being too reliant on an inappropriate validation tool (XSD) and treating relationships as a second thought rather than an integral part of the model. Which is fine in many areas (EMF does it too), but not, in my view, when modeling IT systems.

If we are not going to use RDF/SPIN then let’s copy them. We can use the CMDBf metamodel (graph-based) where SPIN uses RDF. We can use the CMDBf query language (graph-oriented) where SPIN uses SPARQL. Since CMDBf queries use XPath, we see some commonalities with SML (which uses XPath through Schematron). But in CMDBf XPath is scoped to the leaf nodes of the graph, not the entire model as it is in SML. In other words, SML adds relationship traversal to XPath, while CMDBf adds XPath to its relationship-aware queries. It’s a matter of who’s on top. It sounds academic but it isn’t.

Does the industry really want standardized, re-usable configuration rules? SML/CML seem to say no. The push towards Cloud interop, on the other hand, begs for it. At least if you believe in programming your environment in a way that is partially declarative rather than entirely procedural.

[UPDATED 2009/3/5: Rob England (a.k.a. Mr. Skeptic as I refer to him above) provides a geek-to-English translation for this post. Neat!]

02 Feb 2009

UCI: setting RDF for failure?

by William (@vambenepe on Twitter)

I don’t get it. I just read Reuven Cohen’s description of the Unified Cloud Interface project that he recently started. It’s nothing less than using RDF to create “a Semantic Cloud Infrastructure capable of adapting to a variety of methodologies / architectures and completely agnostic to any specific API or platform being described.”

What made me fall off my chair is the methodology/architecture part of this statement. It’s hard enough (but doable) to use RDF to map philosophically similar APIs. It’s a non-starter to use it to bridge architectural and methodological differences. I have spent a fair amount of time looking at Semantic Web technologies in the context of modeling IT systems (see the “semantic tech” category of this blog). While I think they would be a great foundation I don’t see them ever coming anywhere near what Reuven describes.

But to be fair, I am not sure what he really is describing. There are a few overly ambitious proclamations like the one above and this paragraph:

The key drivers of a unified cloud interface (UCI) is “One abstraction to Rule them All” – an API for other API’s. A singular abstraction that can encompass the entire infrastructure stack as well as emerging cloud centric technologies through a unified interface. What a semantic model enables for UCI is a capability to bridge both cloud based API’s such as Amazon Web Services with existing protocols and standards, regardless of the level of adoption of the underlying API’s or technology. The goal is simple, develop your application once, deploy anywhere at anytime for any reason.

But in his piece you’ll also find CIM being cited as an example. There are good things to be said about CIM, but it certainly is not “a dynamic computing model that can, under certain conditions, be ‘trained’ to appropriately ‘learn’ the meaning of related cloud & infrastructure resources” (or, in the case of CIM, computer system resources). Good luck “training” CIM to “learn” anything. It’s CIM that’s going to train you to do it its way, period.

The CIM example (and other standards he lists) paints the picture of defining a standard API for Cloud Computing and forcing all providers to use it. That’s the conventional approach to universality. If that’s what UCI is after then it is technically achievable. And RDF might be a very good technical foundation for it. Whether anyone can pull this off politically and commercially at this stage is a different question of course. In any case, such an effort would have nothing to do with magically wrapping whatever API each provider has defined and whatever architecture/methodology they chose.

And further down we see a sketch of another, much more modest, vision, when Reuven talks about how “these web resources could just as easily be ‘cloud resources’ or API’s” which seems to represent a whole API as an RDF resource. Sure, then you can use RDF/OWL to capture versioning information between them, backward compatibility etc. Probably very useful, but that’s a very different scope.

So which is it? Reuven is a thought leader in Cloud Computing, so I want to think I am missing his point.

So far, I haven’t seen any Cloud taxonomy that is reasonably complete and has received broad support. Shouldn’t we first try to come up with a human-readable taxonomy before we try to turn it into a machine-readable ontology? In my previous post I explicitly stayed away from being pedantic about the difference between the terms, but the confusion between a taxonomy and an ontology seems to be part of what’s going on here.

The sad thing is that they (you know, them) will point to this as a proof that Semantic Web technologies don’t work.

Or maybe I’ve just set myself up for a generous portion of humble pie on April 2nd (when Reuven says an “initial functional draft UCI implementation, taxonomy and ontology” will be unveiled). I’d love to be surprised. And my ego has taken worse hits before.

[UPDATED 2009/2/10: You should read Steve Oberlin's take on this overall taxonomy/ontology discussion. He knows the topic, carefully reads the posts that he comments on, packs a healthy dose of skepticism and takes the time to explain what taxonomies and ontologies are, which was overdue. Plus, I just love sites that don't feel the need to use decorative pictures. His doesn't have a single image file which means that even if he didn't have superb credentials (which he does) he'd get my respect by default. A blog to watch.]

12 Jan 2009

A new SPIN on enriching a model with domain knowledge (constraints and inferences)

by William (@vambenepe on Twitter)

Back when I was at HP and we got involved with what turned into SML (now a W3C candidate recommendation), we tried to make a case for the specification to be based on RDF/OWL rather than XML/XSD/Schematron. It was a strange situation from a technical perspective because RDF is a better foundation for an IT model than XML, but on the other hand XSD/Schematron is a better choice for validation than OWL. OWL is focused on inference, not validation (because of both fundamental design choices, e.g. the open world assumption, and language expressiveness limitations).
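A loose illustration of the open world point, in plain Python rather than OWL. The rule “every server has an owner”, the triples and the function names are all invented for this sketch, and the second function only mimics the spirit of what a reasoner does with an existential restriction; it is not real OWL semantics.

```python
# Made-up facts as (subject, predicate, object) triples.
facts = {
    ("srv1", "type", "Server"),
    ("srv1", "ownedBy", "ops-team"),
    ("srv2", "type", "Server"),  # no owner recorded
}


def servers_without_owner(triples):
    servers = {s for s, p, o in triples if p == "type" and o == "Server"}
    owned = {s for s, p, o in triples if p == "ownedBy"}
    return sorted(servers - owned)


def validate_closed_world(triples):
    """Validation reading: what is not stated is false, so a missing owner is an error."""
    return [f"{s}: missing ownedBy" for s in servers_without_owner(triples)]


def infer_open_world(triples):
    """Inference reading: a missing owner is merely unknown, so assume an unnamed one."""
    extra = {(s, "ownedBy", f"_:ownerOf_{s}") for s in servers_without_owner(triples)}
    return set(triples) | extra


print(validate_closed_world(facts))                    # ['srv2: missing ownedBy']
print(validate_closed_world(infer_open_world(facts)))  # []
```

The same rule that a validator would use to flag srv2 is, under the open world reading, a licence to conclude that an owner exists somewhere. That is why it never reports the problem an IT administrator actually cares about.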

So our options were to either use the right way to represent the system (RDF) combined with the wrong way to capture constraints (OWL) or to use the wrong way to represent the system (XML) combined with the right way to constrain it (mostly Schematron, with some limited help from XSD). At the end, of course, this subtle technical debate was crushed under the steamroller of vendor politics and RDF never got a fair chance anyway.

The point of this little background story is to describe the context in which I read this announcement from Holger Knublauch of TopQuadrant: the new version of their TopBraid Composer tool introduces SPIN, a way to complement OWL with a SPARQL-based constraint checking and inference mechanism.

This relates to SML in two ways.

First, there are similarities in the approach: Schematron leverages the XPath language, used to query XML, to create validation rules. SML then marries Schematron with XSD, for a more powerful validation mechanism. Compare this to SPIN: SPIN leverages the SPARQL query language, used to query RDF, to create validation/inference rules. SPIN also marries this with OWL, for a more powerful validation/inference mechanism.

But beyond the mirroring structures of SPIN and SML, the most interesting thing is that it looks like SPIN could nicely solve the conundrum, described above, of RDF being the right foundation for modeling IT systems but OWL being the wrong constraint mechanism. SPIN may do a better job than SML at what SML is aiming to do (validation rules). And at the same time, you get “for free” (or as close to “for free” as you can get with software, which is still far from “free”) a pretty powerful inference mechanism. The most powerful I know of, short of using a general programming language to capture your inference rules (and good luck with maintaining these rules).

This may sound like sci-fi, but it’s the next logical step for IT configuration standardization. Let’s look at where we are today:

  • SML (at W3C) is an attempt to standardize the expression of constraints.
  • CMDBf (at DMTF) is standardizing how the model content is queried (and, to some limited extent at this point, federated).
  • And recently IBM authored a proposal for a reconciliation specification for items in the model and sent it to an Eclipse group (COSMOS).

But once you tackle reconciliation, you are already half-way into inferencing territory. At least if you want to reconcile between models, not just between instances expressed in the same model. Because the models may not be defined at the same level of granularity, and before you can reconcile items you need to infer finer-grained entities in your coarser-grained model (or vice-versa) so that you can reconcile apples with apples.
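A toy example of what “infer before you reconcile” could look like, under heavy assumptions: one source only records a CPU count per host, the other lists individual CPUs, and identity is decided by a (host, slot) pair. None of this comes from the IBM proposal or from any CMDB product; it is just a sketch of the granularity problem.

```python
# Coarse-grained model: one record per host, CPUs only as a count (made-up data).
coarse = {"web01": {"cpu_count": 2}}

# Fine-grained model: one item per discovered CPU (made-up data).
fine = [
    {"kind": "CPU", "host": "web01", "slot": 0},
    {"kind": "CPU", "host": "web01", "slot": 1},
    {"kind": "CPU", "host": "db01", "slot": 0},
]


def infer_cpus(coarse_model):
    """Inference step: expand each cpu_count into individual CPU entities."""
    return [{"kind": "CPU", "host": host, "slot": slot}
            for host, attrs in sorted(coarse_model.items())
            for slot in range(attrs["cpu_count"])]


def identity(item):
    """Assumed reconciliation key for a CPU."""
    return (item["host"], item["slot"])


def reconcile(inferred, discovered):
    """Reconciliation step: match items that are now at the same granularity."""
    known = {identity(i) for i in inferred}
    matched = [identity(d) for d in discovered if identity(d) in known]
    unmatched = [identity(d) for d in discovered if identity(d) not in known]
    return matched, unmatched


matched, unmatched = reconcile(infer_cpus(coarse), fine)
print(matched)    # [('web01', 0), ('web01', 1)]
print(unmatched)  # [('db01', 0)]
```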

Today, inferencing for IT models is done as part of the “discovery packs” that you can buy along with your IT management model repository. But not very well, in general. Because the way you write such a discovery module for the HP Universal CMDB is very different from how you write it for the BMC CMDB, IBM’s CCMDB or as a plug-in for Oracle Enterprise Manager or Microsoft System Center. Not to mention the smaller, more specialized, players. As a result, there is little incentive for 3rd party domain experts to put work into capturing inference rules since the work cannot be widely leveraged.

I am going a bit off-topic here, but one interesting thing about standardization of inferencing for IT management, if it happens, is that it is going to be very hard to not use RDF, OWL and some flavor of SPARQL (SPIN or equivalent) there. And once you do that, the XML-based constraint mechanisms (SML or others) are going to be in for a rough ride. After resisting the RDF stack for constraints, queries and basic reconciliation (because the added value was supposedly not “worth the cost” for each of these separately), the XML dam might get a crack for inferencing. And once RDF starts to trickle through that crack, the whole dam is going to come down in a big wave. Just to be clear, this is a prophetic long-term vision, not a prediction for 2009 (unfortunately).

In the meantime, I’d like to take this SPIN feature for a… spin (sorry) when I find some time. We’ll see if I can install the new beta of TopBraid Composer despite having used up, a year ago, my evaluation license of the earlier version of the product. Despite what I had hoped at some point, this is not directly applicable to my current work, so I am not sure I want to buy a license. But who knows, SPIN may turn out to be the change that eventually puts RDF back on my “day job” list (one can dream)…

It’s also nice that Holger took the pain to deliver SPIN not just as a feature of his product but also as a stand-alone specification, which should make it pretty easy for anyone who has a SPARQL engine handy to support it. Hopefully the next step will be for him to clarify the IP terms for the specification and to decide whether or not he wants to eventually submit it for standardization. Maybe to the W3C SML working group? :-) I’d have a hard time resisting joining if he did.

17 Sep 2008

Here be (XML) dragons

by William (@vambenepe on Twitter)

Spoiler alert: if you like to learn things the hard way, don’t follow this link. It points to a clear description of all the problems, frustrations, disillusions and “ah ah!” moments that are ahead of you as you start to use XML and grow into an expert.

If, on the other hand, you like to be fully prepared and informed when you choose a technology and if you don’t mind sacrificing some adventure and excitement in the process, then you owe it to yourself to read Erik Wilde and Robert Glushko’s XML Fever article. Even if you already consider yourself an XML expert. Especially if you do.

I knew I would like it when I read this in the introduction:

Advanced strains of XML fever often take hold after exposure to the proliferation of more complex and esoteric XML-based technologies layered on top of it. These advanced diseases are harder to catch, but they are also harder to remedy because people who have caught these advanced strains tend to congregate with others with the same diseases and they are continually reinfecting each other.

Oh yes they do. And they speak with such authority that they infect others around them. People who don’t even understand these “more complex and esoteric XML-based technologies” end up being convinced of their magical properties and the need to use them.

I am not going to attempt to summarize the article because it is too tightly packed with great content to be summarized without being butchered. The “tree trauma” section alone could probably save the world billions of dollars in lost productivity if it were widely read. I’ll just quote a few sections to motivate you to go read the whole thing.

Tree tremors. Whereas tree trauma (discussed earlier) is a basic strain of XML fever caused by the various flavors of trees in XML technologies, tree tremors are a more serious condition afflicting victims trying to manage data in XML that is not inherently tree-structured. The most common causes are data models requiring nontree graph structures and document models needing overlapping structures. In both cases, mapping these models to XML’s tree model results in XML structures that cannot conveniently represent the application-level model.

(…)

The choice of schema languages, however, is more often determined by available tool support and acquired habits than by a thorough analysis of what would be the most appropriate language.

(…)

Triple shock. While RDF itself is simple, large datasets easily contain millions of triples (for truly large datasets this can go up to billions), and managing and querying such a big dataset can become a considerable challenge. If the schema of these large datasets is simple, but ontology overkill has set in and it has been reformulated as an ontology, handling this dataset may become considerably harder, without any immediate benefit.

This is true not just for RDF (a graph model that can be serialized in XML) but for any non-tree model that can be serialized in XML (which is to say any model one can think of). Including every graph model.

Maybe it would help if the article stated more clearly that it’s ok to serialize such a model as XML (e.g. for transmission) as long as you don’t process it (at the application level) as XML. As long as it gets accessed using an API and concepts that are aligned with the semantics of the model.

Imagine that you are receiving an RDF dataset over the wire. You could (if your app runs on the network card rather than on the CPU) process it as a bunch of electrical impulses, but that wouldn’t be very convenient. You could process it as a bunch of bits, but that’s still hard. You could process it as a character stream but that’s not that much better. You could process it as XML but that’s still not great. Or you could process it as RDF triples and be home on time to have dinner with your family. It’s not the fact that it is represented as XML at some point that’s the problem, it’s the fact that your application processes it as XML. Said another way, just because it makes sense to store it or to send it over the network in XML doesn’t mean that you have to process it as XML in your application.
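To make the "process it as triples" point concrete, here is a minimal sketch in Python. It assumes the dataset has already been parsed into (subject, predicate, object) tuples by some RDF library; the `ex:` names and the `match` helper are made up for illustration and are not any particular library's API.

```python
# Hypothetical data: a tiny graph held as (subject, predicate, object) tuples.
# In a real application an RDF library would build this from the wire format.
TRIPLES = {
    ("ex:server1", "rdf:type", "ex:Server"),
    ("ex:server1", "ex:hosts", "ex:app1"),
    ("ex:app1", "rdf:type", "ex:Application"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# "What does server1 host?" is asked in terms of the graph model.
# No element names, attributes or nesting depth are involved.
hosted = [o for (_, _, o) in match(TRIPLES, s="ex:server1", p="ex:hosts")]
print(hosted)
```

The question is phrased against the model (subjects, predicates, objects), so it stays the same whichever serialization the data arrived in.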

There is at least one more problem (not covered by the article) that people will eventually run into. You’d think that XML technologies are a consistent and complementary set. Not true. The lack of consistency is illustrated by the “tree trauma” section of the article. But there is also a complementarity problem, in the sense that there are large gaps between the specifications, as anyone who has tried to serialize an XPath nodeset has found out.

As the article points out, all this doesn’t mean that XML is bad or useless. XML technologies can be very useful, but not for all tasks.

26
Aug
2008

All I know about RDF/OWL I learned in preschool

by William (@vambenepe on Twitter)

I don’t want to seem pretentious, but back in preschool I was a star student. At least when it came to potatoes. I am not sure what it’s called in US preschools, but what we meant by a potato, in my French classroom, was an oval shape in which you put objects. The typical example had two overlapping ovals, one for green things and the other for animals. A green armchair goes in the non-overlapping part of the “green” oval. A lion goes in the non-overlapping part of the “animal” oval. A green frog goes in the intersection. A non-green bus goes outside of both ovals. Etc.

As you probably remember, there are many variations on this, including cases where more than two ovals overlap. The hardest part was when we had to draw the ovals ourselves as opposed to positioning objects in pre-drawn ovals: we had to decide whether to make these ovals overlap or not. Typically they would first be drawn separately until an object that belonged to both would come up, prompting some head-scratching and, hopefully, a redrawing of the boundaries. Some ovals were even entirely contained within a larger oval! Hours of fun! I loved it.

[Side note: meanwhile, of course, the cool kids were punching one another in the face or stealing somebody's lunch money. But they are now stuck with boring million-dollar-a-year jobs as cosmetic surgeons or Wall Street bankers (respectively) while I enjoy the glamorous occupation of modeling IT systems. Who's laughing now?]

To a large extent, these potatoes really are all you need to understand about RDFS and OWL classes. OO people, especially, are worried about “multiple inheritance”. But we are not talking about programmatic objects here, in which inheritance brings methods with it. Just about intersecting potatoes. Subclassing is just putting a potato inside another one. Unions and intersections are just misshaped potatoes made by following the contours of existing potatoes. How hard can all that be?
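For those who prefer code to crayons, the potatoes can be sketched with plain Python sets. This is only an analogy for class membership under the assumptions of the preschool example (the `amphibians` potato is an addition for illustration); it is not how an OWL reasoner represents classes.

```python
# Each potato is a set of the things drawn inside it.
green = {"armchair", "frog"}
animals = {"lion", "frog"}
everything = {"armchair", "frog", "lion", "bus"}

# Intersection and union are just new potatoes traced along existing contours.
green_animals = green & animals
green_or_animal = green | animals
outside_both = everything - green_or_animal

# Subclassing is a potato drawn inside another one (hypothetical extra potato).
amphibians = {"frog"}
is_subclass = amphibians <= animals


def potatoes_containing(thing, potatoes):
    """List the names of all potatoes that contain the given thing."""
    return sorted(name for name, members in potatoes.items() if thing in members)


# The frog sits in several potatoes at once, and nothing is "inherited".
print(potatoes_containing("frog", {"green": green, "animals": animals,
                                   "amphibians": amphibians}))
```

Belonging to several potatoes at once is the whole of "multiple inheritance" here: membership, with no methods tagging along.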

Sure there are these “properties” you’ve heard about, but that’s just adding an arrow to show that the lion is sitting on the armchair. Or eating the frog.

Just don’t bring up the fact that these arrows can themselves be classified inside their own potatoes, or the school bully (Alex Emmel) will get you.

12
Jun
2008

Mapping CIM associations to CMDBf relationships

by William (@vambenepe on Twitter)

This post started as a comment on the blog of Van Wiles. When it became too long (and turned into a therapeutic rant at the end) I turned it into a blog post of its own. Please, read Van’s post first. Here is my response to him:

Hi Van. Sounds like what you are after is not a mapping of the CIM_Dependency association to a CMDBf record type (anyone can make up such a mapping as you point out), but a generic algorithm to map any CIM association to a corresponding CMDBf relationship record type. Correct? That algorithm needs to handle the fact that the CIM metamodel has the concept of relationship roles while the CMDBf metamodel doesn’t.

Here is a possible such mapping:

  1. Take a CIM association (called “myAssociation”) that has two roles (called “thisOne” and “theOtherOne”).
  2. Take the item whose role name comes first alphabetically and make it the source (in this example, it is “theOtherOne”)
  3. Take the item whose role name comes second alphabetically and make it the target (in this example, it is “thisOne”)
  4. Generate a CMDBf record type called “{associationName}_from_{firstRoleNameAlphabetically}_to_{secondRoleNameAlphabetically}”

You’re done. The new CMDBf record type is “myAssociation_from_theOtherOne_to_thisOne”, the source is the item with the role “theOtherOne” and the target is the item with the role “thisOne”. Everyone who follows this algorithm (of course it needs to be formally defined and evangelized, there is no guarantee here unless we bake CIM-specific concepts in the core CMDBf specification, which would be a mistake) will produce the same CMDBf relationship record type for a given CIM association.

Applied to the CIM_Dependency example, this would generate a “CIM_Dependency_from_Antecedent_to_Dependent” CMDBf record type, in which the source is the CIM Antecedent and the target is the CIM Dependent.
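The steps above can be sketched in a few lines of Python. The function name and the shape of the returned dictionary are made up for illustration. The sketch also assumes that "alphabetically" means Python's default string ordering (by code point, so uppercase sorts before lowercase), which is exactly the kind of detail a formal definition of the algorithm would have to pin down.

```python
def cmdbf_record_type(association_name, role_a, role_b):
    """Derive a CMDBf relationship record type from a two-role CIM association.

    Assumption: roles are ordered with Python's default string comparison.
    """
    # The role that sorts first becomes the source, the other the target.
    source_role, target_role = sorted([role_a, role_b])
    return {
        "recordType": f"{association_name}_from_{source_role}_to_{target_role}",
        "source": source_role,
        "target": target_role,
    }


# The order in which the roles are passed in does not matter.
print(cmdbf_record_type("myAssociation", "thisOne", "theOtherOne"))
print(cmdbf_record_type("CIM_Dependency", "Dependent", "Antecedent"))
```

Because the result depends only on the association name and the two role names, two parties running this independently end up with the same record type.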

Alternatively, you can have the algorithm generate two CMDBf relationship record types (one going in each direction) for each CIM association. So you don’t have to arbitrarily pick the first one (alphabetically) as the source. But then you need to have model metadata to capture the fact that these relationships are the inverse of one another (and imply one another). As you well know, I have been advocating for the use of RDF/RDFS/OWL in CMDBf for a while. :-)

In the end, there are three potential approaches:

1) Someone (the CMDBf group or someone else) creates an authoritative mapping for all CIM associations (or at least all the useful ones) and we expect anyone who uses the CIM model with CMDBf to use that mapping.

2) Someone (again, the CMDBf group or someone else) defines a normative CIM to CMDBf mapping, e.g. the one above, and we expect anyone who generates a CMDBf relationship record type from a CIM association to use this mapping algorithm. From a pure logical perspective, it is the same as defining a CMDBf record type for each CIM association (approach 1), but it is less work and it doesn’t have to be updated every time a CIM association is created/versioned. At the cost of uglier (more arbitrary) CMDBf record types being defined.

3) We let people define the relationships in whatever way they choose and we provide a model metadata framework (aka ontology language) to allow mappings between these approaches. For example, you define, in your namespace, a van:CIM-inspired-dependency CMDBf record type that goes from antecedent to dependent. Separately, I defined, in my namespace, a william:CIM-like-dependency CMDBf record type that carries the same semantics (defined, not so precisely BTW but that’s a different topic, by CIM) except that its source is the dependent and its target is the antecedent. The inverse of yours. A suitable ontology language would allow someone (you, me, or a third party who has to assemble a system that uses both relationship types) to assert that mine is the inverse of yours. Once this assertion is captured, a request for any [A]—(van:CIM-inspired-dependency)—>[B] would also return the instances of [B]—(william:CIM-like-dependency)—>[A] because they are known to be the same. And you know how I am going to conclude, of course: OWL (specifically owl:inverseOf) provides just this.
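To illustrate the effect of approach 3, here is a rough Python sketch of what a single inverse assertion buys you at query time. It is not an OWL reasoner and uses no RDF library: the `ex:` items, the `FACTS` set and the `query` helper are hypothetical, and only the two relationship names come from the example above.

```python
# One assertion, made once by you, me or a third party:
# "william:CIM-like-dependency is the inverse of van:CIM-inspired-dependency".
INVERSE_ASSERTIONS = {
    "van:CIM-inspired-dependency": "william:CIM-like-dependency",
}
# An inverse relation is symmetric, so index it in both directions.
INVERSES = {**INVERSE_ASSERTIONS,
            **{v: k for k, v in INVERSE_ASSERTIONS.items()}}

# Hypothetical relationship records as (source, record type, target).
FACTS = {
    ("ex:db", "van:CIM-inspired-dependency", "ex:app"),         # antecedent -> dependent
    ("ex:portal", "william:CIM-like-dependency", "ex:ldap"),    # dependent -> antecedent
}


def query(facts, record_type):
    """Return (source, target) pairs for record_type, including implied ones."""
    results = {(s, o) for (s, p, o) in facts if p == record_type}
    inverse = INVERSES.get(record_type)
    if inverse is not None:
        # Every inverse record implies one of the requested type, flipped.
        results |= {(o, s) for (s, p, o) in facts if p == inverse}
    return sorted(results)


print(query(FACTS, "van:CIM-inspired-dependency"))
```

A query phrased in either vocabulary returns records created in both, without anyone having to rewrite their data.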

BTW, approach 3 is not incompatible with 1 or 2. Whether or not we define mappings for CIM relationships and whether or not that mapping gets adopted, there will be plenty of cases in a federated scenario in which you need to reconcile models (CIM-based or not). Model metadata (aka an ontology language) is useful anyway.

Readers who only care about the technical aspects and have little time for rants can stop reading here. But, since I haven’t addressed any constructive criticism to the DMTF in a while, I can’t resist the opportunity to point out that if the mailing list archives for the DMTF working groups were publicly available, we wouldn’t have to have these discussions on our personal blogs. I am very glad that Van posted this on his blog because it is a question that many people will have. Whatever the CMDBf specification ends up doing, developers and architects who make use of it will benefit from having access to the deliberations and considerations that resulted in the specification being what it is. There are many emails in the CMDBf mailing list private archive that I am sure would be useful to future CMDBf implementers, but if they don’t show up on Google they don’t exist for any practical purpose. When grappling with the finer points of some specification or programming language I have often Googled my way into email archives (or old specification drafts) of the working groups that designed them. Sometimes I come out thinking “oh, ok, now I understand why they chose that approach” and other times it’s “ok, that’s what I suspected, these guys were high”. Either way, it’s useful to me as a user of the specification. W3C is the best example (of making working group records available, not of being high): not only is the mailing list available but the phone meetings often have a supporting IRC channel in which key points of the discussion get captured and archived. Here is an example. Making life easier for implementers is probably the single most important thing to make a specification successful. And ultimately, that’s the DMTF’s success too.

And it’s not just for developers and architects. It also impacts industry observers and pundits. Like the IT Skeptic who looked into CMDBf and reported “nothing on the DMTF website but press releases. try to find anything by navigating from the homepage”. And you wonder why his article is titled “the CMDB Federation proceeeds (sic) at its usual glacial pace”. There is good work going on, but there is no way for him to see it. This too is bad for the adoption and credibility of DMTF specifications.

Isn’t it ironic that the DMTF expends resources to sponsor a “hospitality suite” at the Burton Group Catalyst conference (presumably to spread the word about the good work taking place in the organization) but fails to make it easy for the industry to see that same good work taking place? It’s like a main street retail shop that advertises in the newspaper but covers its store window with cardboard, preventing passersby from seeing what’s on offer. I notice that all the other “hospitality suites” seem to be staffed by for-profit vendors (Oracle, IBM, Cisco, Microsoft etc are all there). Somehow W3C and OASIS (whose work is very relevant to some of the conference themes, like identity management and SOA) don’t feel the need to give away pens and key chains at the conference.

Dear DMTF, open source is not just good for code.

20
May
2008

I have seen the future of CMDBf

by William (@vambenepe on Twitter)

I got a sneak peek at CMDBf v2 today.

I am calling it v2 based on the assumption that the one being currently standardized in DMTF will end up being called 1.0 (because it’s the first one out of DMTF) or 1.1 (to prevent confusion with the submitted version).

At the Semantic Technology Conference, David Booth from HP presented his work (along with his partner, Steve Battle from HP Labs) to provide a SPARQL front-end to HP’s Universal CMDB (the engine under what was the Mercury MAM product). Here are the slides.

The mapping from SPARQL to TQL (the native query interface for UCMDB) was made pretty easy by the fact that TQL is a graph-oriented query language. How much harder would it be to similarly transform a CMDBf (v1) query interface into a SPARQL query interface (and vice-versa)? Not much. The only added difficulty would come from the CMDBf XPath constraints. TQL has a property value mechanism that is very similar to CMDBf’s “propertyValue” constraint and maps well to SPARQL functions. The introduction of XPath as a constraint language in CMDBf makes things harder. It could be handled by adding XPath support to the SPARQL engine using function extensibility. Or by turning the entire XML into RDF and emulating XPath in SPARQL. But in either case, you’ll have impedance mismatch at some point because concepts such as element order that exist in XPath have no native equivalent in RDF.
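The element-order mismatch is easy to see with a small Python sketch. It uses the standard library's ElementTree (which supports simple positional predicates such as `ip[1]`) and a deliberately naive element-to-triple mapping. That mapping, the `ex:` names and the sample record are assumptions made for illustration; this is not how Gloze or the HP prototype does the conversion.

```python
import xml.etree.ElementTree as ET

# Two hypothetical records that differ only in the order of their <ip> children.
DOC_A = "<item><ip>10.0.0.1</ip><ip>10.0.0.2</ip></item>"
DOC_B = "<item><ip>10.0.0.2</ip><ip>10.0.0.1</ip></item>"


def first_ip(xml_text):
    """Evaluate a positional XPath constraint: the first <ip> child."""
    return ET.fromstring(xml_text).find("ip[1]").text


def to_triples(xml_text):
    """Naive mapping: one (subject, predicate, object) triple per child element."""
    root = ET.fromstring(xml_text)
    return {("ex:item", "ex:" + child.tag, child.text) for child in root}


# XPath can tell the two documents apart...
print(first_ip(DOC_A), first_ip(DOC_B))
# ...but as plain sets of triples they are the same graph.
print(to_triples(DOC_A) == to_triples(DOC_B))
```

Preserving the order on the RDF side means encoding it explicitly (for example with an index property or an ordered collection), which is where the emulation of XPath stops being free.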

The use of XPath in selectors on the other hand is not a problem. HP’s prototype uses Gloze (available as a Jena package) to turn the XML returned by UCMDB into RDF. An XSLT transform could turn that same XML into a CMDBf-valid XML response instead and that XSLT could easily handle the XPath selectors from the query request. This is another reason why constraints and selectors should remain separate in CMDBf (fortunately the specification is back to doing this properly).

Here is why I call this prototype CMDBf v2: The CMDBf effort (v1 or 1.1), in its current form of re-inventing a graph query, can succeed. Let’s assume the working group strikes a reasonable balance between completeness and complexity, and vendors choose to compete on innovation and execution rather than lock-in (insert cynical comment here). CMDBf may then end up being supported by the main CMDB vendors. It wouldn’t provide federation capabilities, but having a common CMDB query interface supported by the Big Four would help with management integration. And yet, while the value would be real, it would only provide a little help to solve a larger problem:

  • As a technology limited to IT systems management, it would be unlikely to see widely available tools (e.g. user consoles and language-specific libraries).
  • It wouldn’t get the kind of robustness and interoperability that comes from wide adoption. While pretty similar, there might be some minor differences in the various implementations. Once your implementation has been tweaked to work with the implementations from the Big Four, you’ll call it done. Just like SNMP, another technology that is specific to IT systems management (see it happen here).
  • Even if it works perfectly at the query level, it will just hasten the time when developers run into the real problem, model interoperability. CMDBf doesn’t help at all with this. In fact, it makes it harder by hard-coding some dependencies on an XML back-end (the XPath constraints).

In the long run, IT management has to become more automated and integrated. That’s a given. The way it happens may or may not go through CMDB-like configuration stores. But if it does, we’ll have to eventually move beyond CMDBf (v1) towards something that addresses the three requirements above. And federation. I don’t know if it will be called CMDBf v2, and/or if it will come from the DMTF (by then, the CMDBf brand might be an asset or a liability depending on developer experience with the specification). But I strongly suspect (“probability 0.8” as a Gartner analyst might put it) that it will use semantic technologies. Because the real, hard, underlying problem is a problem of semantic integration. In that sense, David and Steve’s prototype is a sneak peek at what will come after CMDBf v1/1.1.

Pretty much since the beginning of CMDBf I have been pushing for it to ideally embrace SPARQL (with no success) or to at least stay close to it conceptually in order to make the eventual mapping/evolution smooth (with a bit more success). This includes pushing for a topological query language, trying to keep XML idiosyncrasies at bay and keeping constraints and selectors cleanly separated. Rather than working within the CMDBf group, David took the alternative approach of simply doing it. Hopefully this will help convince people of the value of re-using semantic web technology for IT systems management. Yes semantic technologies have been designed for a much more general use case. But the use cases that CMDB systems address are a subset of the use cases addressed by semantic technologies. It’s hard for domain experts to see their domain as just a subset of a larger problem, but this is the case here. Isn’t HTTP serving the IT management community better than a systems management-specific alternative would?

By the way, there is no inferencing taking place in the HP prototype. We are just talking about re-using an existing, well thought-through graph query language. Sure, OWL inferencing and some rules could be seamlessly layered on top of this. But this is in no way required to do (better) what CMDBf v1 tries to do.

And then there is the “federation” question. Who do you trust more to deliver this? A bunch of IT system management architects in DMTF or the web and query experts at W3C, HP Labs etc who designed and implemented SPARQL over many years? BTW, it sounds like SPARQL federation was discussed at WWW 2008, based on these meeting notes (search for “federation”).

Categories