William Vambenepe's blog

IT management in a changing IT world

order viagraviagra prices

Archive for the 'Semantic tech' Category

12
Jun
2008

Mapping CIM associations to CMDBf relationships

by William Vambenepe

This post started as a comment on the blog of Van Wiles. When it became too long (and turned into a therapeutic rant at the end) I turned it into a blog post of its own. Please, read Van’s post first. Here is my response to him:

Hi Van. Sounds like what you are after is not a mapping of the CIM_Dependency association to a CMDBf record type (anyone can make up such a mapping as you point out), but a generic algorithm to map any CIM association to a corresponding CMDBf relationship record type. Correct? That algorithm needs to handle the fact that the CIM metamodel has the concept of relationship roles while the CMDBf metamodel doesn’t.

Here is a possible such mapping:

  1. Take a CIM association (called “myAssociation”) that has two roles (called “thisOne” and “theOtherOne”).
  2. Take the item that has role name that comes first alphabetically and make it the source (in this example, it is “theOtherOne”)
  3. Take the item that has role name that comes second alphabetically and make it the target (in this example, it is “thisOne”)
  4. Generate a CMDBf record type called “{associationName} _from_ {firstRoleNameAlphabetically} _to_ {secondRoleNameAlphabetically}”

You’re done. The new CMDBf record type is “myAssociation_from_theOtherOne_to_thisOne”, the source is the item with the role “theOtherOne” and the target is the item with the role “thisOne”. Everyone who follows this algorithm (of course it needs to be formally defined and evangelized, there is no guarantee here unless we bake CIM-specific concepts in the core CMDBf specification, which would be a mistake) will produce the same CMDBf relationship record type for a given CIM association.

Applied to the CIM_Dependency example, this would generate a “CIM_Dependency_from_Antecedent_to_Dependent” CMDBf record type, in which the source is the CIM Antecedent and the target is the CIM Dependent.

Alternatively, you can have the algorithm generate two CMDBf relationship record types (one going in each direction) for each CIM association. So you don’t have to arbitrarily pick the first one (alphabetically) as the source. But then you need to have model metadata to capture the fact that these relationships are the inverse of one another (and imply one another). As you well know,I have been advocating for the use of RDF/RDFS/OWL in CMDBf for a while. :-)

In the end, there are three potential approaches:

1) Someone (the CMDBf group or someone else) creates an authoritative mapping for all CIM associations (or at least all the useful ones) and we expect anyone who uses the CIM model with CMDBf to use that mapping.

2) Someone (again, the CMDBf group or someone else) defines a normative CIM to CMDBf mapping, e.g. the one above, and we expect anyone who generates a CMDBf relationship record type from a CIM association to use this mapping algorithm. From a pure logical perspective, it is the same as defining a CMDBf record type for each CIM association (approach 1), but it is less work and it doesn’t have to be updated every time a CIM association is created/versioned. At the cost of uglier (more arbitrary) CMDBf record types being defined.

3) We let people define the relationships in whatever way they choose and we provide a model metadata framework (aka ontology language) to allow mappings between these approaches. For example, you define, in your namespace, a van:CIM-inspired-dependency CMDBf record type that goes from antecedent to dependent. Separately, I defined, in my namespace, a william:CIM-like-dependency CMDBf record type that carries the same semantics (defined, not so precisely BTW but that’s a different topic, by CIM) except that its source is the dependent and its target is the antecedent. The inverse of yours. A suitable ontology language would allow someone (you, me, or a third party who has to assemble a system that uses both relationship types) to assert that mine is the inverse of yours. Once this assertion is captured, a request for any [A]—(van:CIM-inspired-dependency)—>[B] would also return the instances of [B]—(william:CIM-like-dependency)—>[A] because they are known to be the same. And you know how I am going to conclude, of course: OWL (specifically owl:inverseOf) provides just this.

BTW, approach 3 is not incompatible with 1 or 2. Whether or not we define mappings for CIM relationships and whether or not that mapping gets adopted, there will be plenty of cases in a federated scenario in which you need to reconcile models (CIM-based or not). Model metadata (aka an ontology language) is useful anyway.

Readers who only care about the technical aspects and have little time for rants can stop reading here. But, since I haven’t addressed any constructive criticism to the DMTF in a while, I can’t resist the opportunity to point out that if the mailing list archives for the DMTF working groups were publicly available, we wouldn’t have to have these discussions on our personal blogs. I am very glad that Van posted this on his blog because it is a question that many people will have. Whatever the CMDBf specification ends up doing, developers and architects who make use of it will benefit from having access to the deliberations and considerations that resulted in the specification being what it is. There are many emails in the CMDBf mailing list private archive that I am sure would be useful to future CMDBf implementers, but if they don’t show up on Google they don’t exist for any practical purpose. When grappling with the finer points of some specification or programming language I have often Googled my way into email archives (or old specification drafts) of the working groups that designed them. Sometimes I come out thinking “oh, ok, now I understand why they chose that approach” and other times it’s “ok, that’s what I suspected, these guys were high”. Either way, it’s useful to me as a user of the specification. W3C is the best example (of making working group records available, not of being high): not only is the mailing list available but the phone meetings often have a supporting IRC channel in which key points of the discussion get captured and archived. Here is an example. Making life easier for implementers is probably the single most important thing to make a specification successful. And ultimately, that’s the DMTF’s success too.

And it’s not just for developers and architects. It also impacts industry observers and pundits. Like the IT Skeptic who looked into CMDBf and reported “nothing on the DMTF website but press releases. try to find anything by navigating from the homepage”. And you wonder why his article is titled “the CMDB Federation proceeeds (sic) at its usual glacial pace”. There is good work going on, but there is no way for him to see it. This too is bad for the adoption and credibility of DMTF specifications.

Isn’t it ironic that the DMTF expends resources to sponsor a “hospitality suite” at the Burton Group Catalyst conference (presumably to spread the word about the good work taking place in the organization) but fails to make it easy for the industry to see that same good work taking place? It’s like a main street retail shop that advertises in the newspaper but covers its store window with cardboard, preventing passersby from seeing what’s on offer. I notice that all the other “hospitality suites” seem to be staffed by for-profit vendors (Oracle, IBM, Cisco, Microsoft etc are all there). Somehow W3C and OASIS (whose work is very relevant to some of the conference themes, like identity management and SOA) don’t feel the need to give away pens and key chains at the conference.

Dear DMTF, open source is not just good for code.

20
May
2008

I have seen the future of CMDBf

by William Vambenepe

I got a sneak peak at CMDBf v2 today.

I am calling it v2 based on the assumption that the one being currently standardized in DMTF will end up being called 1.0 (because it’s the first one out of DMTF) or 1.1 (to prevent confusion with the submitted version).

At the Semantic Technology Conference, David Booth from HP presented his work (along with his partner, Steve Battle from HP Labs) to provide a SPARQL front-end to HP’s Universal CMDB (the engine under what was the Mercury MAM product). Here are the slides.

The mapping from SPARQL to TQL (the native query interface for UCMDB) was made pretty easy by the fact that TQL is a graph-oriented query language. How much harder would it be to similarly transform a CMDBf (v1) query interface into a SPARQL query interface (and vice-versa)? Not much. The only added difficulty would come from the CMDBf XPath constraints. TQL has a property value mechanism that is very similar to CMDBf’s “propertyValue” constraint and maps well to SPARQL functions. The introduction of XPath as a constraint language in CMDBf makes things harder. It could be handled by adding XPath support to the SPARQL engine using function extensibility. Or by turning the entire XML into RDF and emulating XPath in SPARQL. But in either case, you’ll have impedence mismatch at some point because concepts such as element order that exist in XPath have no native equivalent in RDF.

The use of XPath in selectors on the other hand is not a problem. HP’s prototype uses Gloze (available as a Jena package) to turn the XML returned by UCMDB into RDF. An XSLT transform could turn that same XML into a CMDBf-valid XML response instead and that XSLT could easily handle the XPath selectors from the query request. This is another reason why constraints and selectors should remain separate in CMDBf (fortunately the specification is back to doing this properly).

Here is why I call this prototype CMDBf v2: The CMDBf effort (v1 or 1.1), in its current form of re-inventing a graph query, can succeed. Let’s assume the working group strikes a reasonable balance between completeness and complexity, and vendors choose to compete on innovation and execution rather than lock-in (insert cynical comment here). CMDBf may then end up being supported by the main CMDB vendors. It wouldn’t provide federation capabilities, but having a common CMDB query interface supported by the Big Four would help with management integration. And yet, while the value would be real, it would only provide a little help to solve a larger problem:

  • As a technology limited to IT systems management, it would be unlikely to see widely available tools (e.g. user consoles and language-specific libraries).
  • It wouldn’t get the kind of robustness and interoperability that comes from wide adoption. While pretty similar, there might be some minor differences in the various implementations. Once your implementation has been tweaked to work with the implementations from the Big Four, you’ll call it done. Just like SNMP, another technology that is specific to IT systems management (see it happen here).
  • Even if it works perfectly at the query level, it will just hasten the time when developers run into the real problem, model interoperability. CMDBf doesn’t help at all with this. In fact, it makes it harder by hard-coding some dependencies on an XML back-end (the XPath constraints).

In the long run, IT management has to become more automated and integrated. That’s a given. The way it happens may or may not go through CMDB-like configuration stores. But if it does, we’ll have to eventually move beyond CMDBf (v1) towards something that addresses the three requirements above. And federation. I don’t know if it will be called CMDBf v2, and/or if it will come from the DMTF (by then, the CMDBf brand might be an asset or a liability depending on developer experience with the specification). But I strongly suspect (”probability 0.8″ as a Gartner analyst might put it) that it will use semantic technologies. Because the real, hard, underlying problem is a problem of semantic integration. In that sense, David and Steve’s prototype is a sneak peek at what will come after CMDBf v1/1.1.

Pretty much since the beginning of CMDBf I have been pushing for it to ideally embrace SPARQL (with no success) or to at least stay close to it conceptually in order to make the eventual mapping/evolution smooth (with a bit more success). This includes pushing for a topological query language, trying to keep XML idiosyncrasies at bay and keeping constraints and selectors cleanly separated. Rather than working within the CMDBf group, David took the alternative approach of simply doing it. Hopefully this will help convince people of the value of re-using semantic web technology for IT systems management. Yes semantic technologies have been designed for a much more general use case. But the use cases that CMDB systems address are a subset of the use cases addressed by semantic technologies. It’s hard for domain experts to see their domain as just a subset of a larger problem, but this is the case here. Isn’t HTTP serving the IT management community better than a systems management-specific alternative would?

By the way, there is no inferencing taking place in the HP prototype. We are just talking about re-using an existing, well though-through graph query language. Sure OWL inferencing and some rules could be seamless layered on top of this. But this is in no way required to do (better) what CMDBf v1 tries to do.

And then there is the “federation” question. Who do you trust more to deliver this? A bunch of IT system management architects in DMTF or the web and query experts at W3C, HP Labs etc who designed and implemented SPARQL over many years? BTW, it sounds like SPQARL federation was discussed at WWW 2008, based on these meeting notes (search for “federation”).

19
May
2008

At the Semantic Technology Conference this week

by William Vambenepe

I am going to spend some time, starting tomorrow, at the 2008 Semantic Technology Conference in San Jose. I have a few ideas (on ways to use semantic technologies for IT management) that I am trying to validate, improve or kill, as appropriate. I will of course be interested in any discussion/presentation related to applying semantic technologies to IT management. But I am going there in a pretty open frame of mind. Peripheral vision is important when considering technology that is so generic in nature and that has been used in many different fields. In general, I am more interested in concrete projects, lessons learned, best practices etc than new raw technology. There is already a lot more raw semantic web technology available than we can hope to make use of in IT management anytime soon. The question is more around the best ways to practically and opportunistically deliver value in a field that has plenty of existing relational schemas, XML descriptors, semi-standard class models, semi-structured events and in-Joe’s-head-only domain knowledge nuggets floating around.

If you are going to go there and have any interest in the application of semantic technologies to IT management, I’d be happy to hook up.

31
Mar
2008

Where will you be when the Semantic Web gets Grid’ed?

by William Vambenepe

I see the tide rising for semantic technologies. On the other hand, I wonder if they don’t need to fail in order to succeed.

Let’s use the Grid effort as an example. By “Grid effort” I mean the work that took place in and around OGF (or GGF as it was known before its merger w/ EGA). That community, mostly made of researchers and academics, was defining “utility computing” and creating related technology (e.g. OGSA, OGSI, GridFTP, JSDL, SAGA as specs, Globus and Platform as implementations) when Amazon was still a bookstore. There was an expectation that, as large-scale, flexible, distributed computing became a more pressing need for the industry at large, the Grid vision and technology would find their way into the broader market. That’s probably why IBM (and to a lesser extent HP) invested in the effort. Instead, what we are seeing is a new approach to utility computing (marketed as “cloud computing”), delivered by Amazon and others. It addresses utility computing with a different technology than Grid. With X86 virtualization as a catalyst, “cloud computing” delivers flexible, large-scale computing capabilities in a way that, to the users, looks a lot like their current environment. They still have servers with operating systems and applications on them. It’s not as elegant and optimized as service factories, service references (GSR), service handle (GSH), etc but it maps a lot better to administrators’ skills and tools (and to running the current code unchanged). Incremental changes with quick ROI beat paradigm shifts 9 times out of 10.

Is this indicative of what is going to happen with semantic technologies? Let’s break it down chronologically:

  1. Trailblazers (often faced with larger/harder problems than the rest of us) come up with a vision and a different way to think about what computers can do (e.g. the “computers -> compute grid” transition).
  2. They develop innovative technology, with a strong theoretical underpinning (OGSA-BES and those listed above).
  3. There are some successful deployments, but the adoption is mostly limited to a few niches. It is seen as too complex and too different from current practices for broad adoption.
  4. Outsiders use incremental technology to deliver 80% of the vision with 20% of the complexity. Hype and adoption ensue.

If we are lucky, the end result will look more like the nicely abstracted utility computing vision than the “did you patch your EC2 Xen images today” cloud computing landscape. But that’s a necessary step that Grid computing failed to leapfrog.

Semantic web technologies can easily be mapped to the first three bullets. Replace “computers -> computer grid” with “documents/data -> information” in the first one. Fill in RDF, RDFS, OWL (with all its flavors), SPARQL etc as counterparts to OGSA-BES and friends in the second. For the third, consider life sciences and defense as niche markets in which semantic technologies are seeing practical adoption. What form will bullet #4 take for semantic technology (e.g. who is going to be the EC2 of semantic technology)? Or is this where it diverges from Grid and instead gets adopted in its “original” form?

18
Mar
2008

Ontology governance?

by William Vambenepe

How do organizations that make heavy use of ontologies implement design-time governance?

A quick search tonight didn’t return much on that topic (note to Google: “governance” is not synonymous with “government”). Am I imagining a problem that doesn’t really exist? Am I too influenced by SOA governance use cases?

Sure, a lot of the pain that SOA governance tries to address is self-inflicted. When used to deal with contract versioning hell (hello XSD) and brittle implementations (hello stub generation), SOA governance is just a bandage on a self-shot foot. Thanks to the open-world assumption, deliberate modeling decisions (e.g. no ordering unless required), a very simple metamodel and maybe some built-in versioning support (e.g. owl:backwardCompatibleWith, owl:incompatibleWith, owl:priorVersion, owl:versionInfo, etc, with which I don’t have much experience), in the RDF/OWL world these self-inflicted wounds are a lot less likely.

On the other hand, there are aspects of SOA governance that are more than lipstick on a pig and it seems that some of those should apply to ontology governance. You still need to discover artifacts. You may still have incompatible versions. You still have to deal with the collaborative aspects of having different people responsible for different parts. You may still need a review/approval process, or other lifecycle aspects. And that’s just at the ontology design level. At the service level you have the same questions of discovery, protocol, query capabilities, etc.

What are the best practices for this in the semantic world? What are the tools? Or alternatively, why is this less important than I think?

Maybe this upcoming book will have some answers to these practical concerns. It was recommended to me by someone who reviewed a draft and had good things to say about it, if not quite as enthusiastically as Toru Ishida from Kyoto University (from the editorial reviews on Amazon):

“At the time when the world needs to find consensus on a wide range of subjects, publication of this book carries special importance. Crossing over East-West cultural differences, I hope semantic web technology contributes to bridge different ontologies and helps build the foundation for consensus towards the global community.”

If semantic technologies can bring world peace, surely they can help with IT management integration…

PS: if you got to this page through a Web search along the lines of “XSD versioning hell” I am sorry that you won’t find the solution to your problems here. Tim Ewald and Dave Orchard have recommendations for you. But the fact that XML experts like Tim and Dave have some kind of (partial) workarounds doesn’t mean the industry doesn’t have a problem. Or you can put your hopes in XSD 1.1 if you are so inclined (see these slides for an overview of the versioning-related changes).

13
Mar
2008

Another IT event standard? I’ll believe it when I CEE it.

by William Vambenepe

Looks like there is yet another attempt to standardize IT events. It’s called the Common Event Expression (CEE). My cynicism would have prevented me from paying much attention to it (how many failed attempts at this do we really need?) if I hadn’t noticed an “event taxonomy” as the first deliverable listed on the home page. These days I am a sucker for the T word. So I dug around a bit and found out that they have a publicly-archived mailing list on which we can see a working draft of a CEE white paper. It looks pretty polished but it is nonetheless a working draft and I am keeping this in mind when reading it (it wouldn’t be fair to hold the group to something they haven’t yet agreed to release).

The first reassuring thing I see (in the “prior efforts” section) is that they are indeed very aware of all the proprietary log formats and all the (mostly failed) past standardization attempts. They are going into this open-eyed (read the “why should we attempt yet another log standard event” section and see if it convinces you). I should disclose that I have some history with one of these proprietary standards (and failed standardization attempts) that probably contributes to my cynicism on the topic. It took place when IBM tried to push their proprietary CBE format into WSDM, which they partially succeeded in doing (as the WSDM Event Format). This all became a moot point when WSDM stalled, but I had become pretty familiar with CBE in the process.

The major advance in CEE is that, unlike previous efforts, it separates the semantics (which they propose to capture in a taxonomy) from the representation. The paper is a bit sloppy at times (e.g. “while the syntax is unique, it can be expressed and transmitted in a number of different ways” uses, I think, “syntax” to mean “semantics”) but that’s the sense I get. That’s nice but I am not sure it goes far enough.

The best part about having a blog is that you get to give unsolicited advice, and that’s what I am about to do. If I wanted to bring real progress to the world of standardized IT logging, I would leave aside the representation part and focus on ontologies. At two levels: first, I would identify a framework for capturing ontologies. I say “identify”, not “invent”, because it has already been invented and implemented. It’s just a matter of selecting relevant parts and explaining how they apply to expressing the semantics of IT events. Then I would define a few ontologies that are applicable to IT events. Yes, plural. There isn’t one ontology for IT events. It depends both on what the events are about (networking, applications, sensors…) and what they are used for (security audit, performance analysis, change management…).

The thing about logs is that when you collect them you don’t necessarily know what they are going to be used for. Which is why you need to collect them in a way that is as close to what really happened as possible. Any transformation towards a more abstracted/general representation looses some information that may turn out to be needed. For example, messages often have several potential ID fields (transport-level, header, application logic…) and if you pick one of them to map it to the canonical messageId field you may loose the others. Let logs be captured in non-standard ways, focus on creating flexible means to attach and process common semantics on top of them.

Should I be optimistic? I look at this proposed list of CEE fields and I think “nope, they’re just going to produce another CBE” (the name similarity doesn’t help). Then I read “by eliminating subjective information, such as perceived impact or importance, sometimes seen in current log messages…” in the white paper draft and I want to kiss (metaphorically, at least until I see a photo) whoever wrote this. Because it shows an understanding of the difference between the base facts and the domain-specific interpretations. Interpretations are useful of course, but should be separated (and ideally automatically mapped to the base facts using ontology-driven rules). I especially like this example because it illustrates one of the points I tried to make during the WSDM/CBE discussions, that severity is relative. It changes based on time (e.g. a malfunction in an order-booking system might be critical towards the end of the quarter but not at the beginning) and based on the perspective of the event consumer (e.g. the disappearance of a $5 cable is trivial from an asset management perspective but critical from an operations perspective if that cable connects your production DB to the network). Not only does CBE (and, to be fair, several other log formats) consider the severity to be intrinsic to the event, it also goes out of its way to say that “it is not mutable once it is set”. Glad to see that the CEE people have a better understanding.

Another sentence that gives me both hope and fear is “another, similar approach would be to define a pseudo-language with subjects, objects, verbs, etc along with a finite set of words”. That’s on the right tracks, but why re-invent? Doesn’t it sound a lot like subject/predicate/object? CEE is hosted by MITRE which has plenty of semantic web expertise. Why not take these guys out to lunch one day and have a chat?

More thoughts on CEE (and its relationship with XDAS) on the Burton Group blog.

Let’s finish on a hopeful note. The “CEE roadmap” sees three phases of adoption for the taxonomy work. The second one is “publish a taxonomy and talk to software vendors for adoption”. The third one is “increase adoption of taxonomy across various logs; have vendors map all new log messages to a taxonomy”. Wouldn’t it be beautiful if it was that simple and free of politics? I wonder if there is a chapter about software standards in The Audacity of Hope.

07
Mar
2008

Oracle semantic technologies resources

by William Vambenepe

I have started to look at the semantic technologies available in Oracle’s portfolio and so far I like what I see. At HP, I had access to top experts on semantic technologies (mostly from HP Labs) but no special product (not counting Jena which is available to everyone). At Oracle, I find both top experts and very robust products. If you too are looking into Oracle’s offering related to semantic technologies, here are a few links to publicly-available resources that I have found useful. This is filtered based on my interests (yours may be different, for example I skip content related to life sciences applications).

The main page (and what should be your starting point on that topic) is the Semantic Technologies Center on OTN. Most of the other resources listed below are only a click or two away from there. The Semantic Technologies Forum is the right place for questions. The Semantic Web page on the Oracle Wiki doesn’t contain much right now but that may change.

For an overview of the semantic technology capabilities and their applicability, start with Semantic Data Integration for the Enterprise (white paper) and Why, When, and How to Use Oracle Database 11g Semantic Technologies (slides). Then look at Enterprise Semantic Web in Practice (slides) for many real-life examples.

When you are ready to take advantage of the Oracle semantic technologies capabilities, start with The Semantic Web for Application Developers (slides) followed by RDF Support in Oracle RDBMS (these are more detailed slides but they seem based on 10gR2 rather than 11g so not as complete, no OWL for example). Then grab a thermos of coffee and lock yourself in the basement for a while with the Oracle Database Semantic Technologies Developer’s Guide (also available as a hundred-page PDF).

At that point, you may chose to look into the design choices (with performance analysis) that where made in the Oracle implementation by reading A Scalable RDBMS-Based Inference Engine for RDFS/OWL. There is also a Wiki page on OWLPrime to describe the subset of OWL supported in 11g. Finally, you can turn to the Inference Best Practices with RDFS/OWL white paper for tuning tips on 11g.

To get the actual bits, you can download the Oracle 11g Database on OTN. The semantic technologies support is in the Spatial option for the database, which is included in the base download.

I will keep updating this page as interesting new resouces are created (or as I discover existing ones). For resources on semantic technologies in general (non Oracle specific) good sources are Dave Beckett’s page, the W3C (list of resources or standardization activities) or the Cover Pages.