Category Archives: Semantic tech

REST + RDF, finally a practical solution?

The W3C has recently approved the creation of the Linked Data Platform (LDP) Working Group. The charter contains its official marching orders. Its co-chair Erik Wilde shared his thoughts on the endeavor.

This is good. Back in 2009, I concluded a series of three blog posts on “REST in practice for IT and Cloud management” with:

I hereby conclude my “REST in practice for IT and Cloud management” series, with the intent to eventually start a “Linked Data in practice for IT and Cloud management” series.

I never wrote that later part, because my work took me away from that pursuit and there wasn’t much point writing down ideas which I hadn’t put to the test. But if this W3C working group is successful, they will give us just that.

That’s a big “if” though. Religious debates and paralyzing disconnects between theorists and practitioners are all too common in tech, and REST and the Semantic Web (of which RDF is the foundation) are especially vulnerable. Bringing these two together and trying to tame both sets of demons at the same time is a daring proposition.

On the other hand, there is already a fair amount of relevant real-life experience (e.g. data.gov.uk – read Jeni Tennison on the choice of Linked Data). Plus, Erik is a great pick to lead this effort (I haven’t met his co-chair, IBM’s Arnaud Le Hors). And maybe REST and RDF have reached the mythical point where even the trolls are tired and practicality can prevail. One can always dream.

Here are a few different ways to think about this work:

The “REST doesn’t provide interoperability” perspective

RESTful API authors too often think they can do without a metamodel. Or that a format (like XML or JSON) can be used as a metamodel. Or they punt the problem towards defining a multitude of MIME types. This will never buy you interoperability. Stu explained it before. Most problems I see addressed via RESTful APIs, in the IT/Cloud management realm, are modeling problems first and only secondarily protocol/interaction problems. And their failures are failures of modeling. LDP should bring modeling discipline to REST-land.

The “RDF was too much, too soon” perspective

The RDF stack is mired in complexity. By the time people outside of academia had formed a set of modeling requirements that cried for RDF, the Semantic Web community was already deep in the weeds and had overloaded its basic metamodel with enough classification and inference technology to bury its core value as a simple graph-oriented and web-friendly metamodel. What XSD-fever was to making XML seem overly complex, OWL-fever was to RDF. Tenfold.

Everything that the LDP working group is trying to achieve can be achieved today with existing Semantic Web technologies. Technically speaking, no new work is needed. But only a handful of people understand these technologies enough to know what to use and what to ignore, and as such this application doesn’t have a chance to materialize. Which is why the LDP group is needed. But there’s a reason why its starting point document is called a “profile”. No new technology is needed. Only clarity and agreement.

For the record, I like OWL. It may be the technology that most influenced the way I think about modeling. But the predominance of RDFS and OWL (along with ugly serializations) in Semantic Web discussions kept RDF safely out of sight of those in industry who could have used it. Who knows what would have happened if a graph query language (SPARQL) had been prioritized ahead of inference technology (OWL)?
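
To make that “graph metamodel plus query, hold the inference” option concrete, here is a minimal sketch in Python with rdflib (all the namespaces and resource names are invented for illustration). It treats RDF as nothing more than a web-friendly graph and asks it a question with SPARQL; no RDFS or OWL anywhere in sight.

from rdflib import Graph, Literal, Namespace, RDF

# Hypothetical vocabulary for IT/Cloud resources; the names are illustrative only.
EX = Namespace("http://example.org/cloud/")

g = Graph()
g.add((EX.vm42, RDF.type, EX.VirtualMachine))
g.add((EX.vm42, EX.hostedOn, EX.server7))
g.add((EX.vm42, EX.memoryMB, Literal(4096)))
g.add((EX.server7, RDF.type, EX.PhysicalServer))

# Plain graph pattern matching, no entailment involved.
q = """
SELECT ?vm ?host WHERE {
  ?vm a ex:VirtualMachine ;
      ex:hostedOn ?host .
}
"""
for row in g.query(q, initNs={"ex": EX}):
    print(row.vm, "runs on", row.host)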

The Cloud API perspective

The scope of the LDP group is much larger than Cloud APIs, but my interest in it is mostly grounded in Cloud API use cases. And I see no reason why the requirements of Cloud APIs would not be 100% relevant to this effort.

What does this mean for the Cloud API debate? Nothing in the short term, but if this group succeeds, the result will probably be the best technical foundation for large parts of the Cloud management landscape. Which doesn’t mean it will be adopted, of course. The LDP timeline calls for completion in 2014. Who knows what the actual end date will be and what the Cloud API situation will be at that point. AWS APIs might be entrenched de-facto standards, or people may be accustomed to using several APIs (via libraries that abstract them away). Or maybe the industry will be clamoring for reunification and LDP will arrive just on time to underpin it. Though the track record is not good for such “reunifications”.

The “ghost of WS-*” perspective

Look at the 16 “technical issues” in the LDP working group charter. I can map each one to the relevant WS-* specification. E.g. see this as it relates to #8. As I’ve argued many times on this blog, the problems that WSMF/WSDM/WS-Mgmt/WS-RA and friends addressed didn’t go away with the demise of these specifications. Here is yet another attempt to tackle them.

The standards politics perspective

Another “fun” part of WS-*, beyond the joy of wrangling with XSD and dealing with multiple versions of foundational specifications, was the politics, which mostly revolved around IBM and Microsoft. Well, guess what the primary competition to LDP is? OData, from Microsoft. I don’t know what the dynamics will be this time around; Microsoft and IBM alone don’t command nearly as much influence over the Cloud infrastructure landscape as they did over the XML middleware standardization effort.

And that’s just the corporate politics. The politics between standards organizations (and those who make their living in them) can be just as hairy; you can expect that DMTF will fight W3C, and any other organization which steps up, for control of the “Cloud management” stack. Not to mention the usual co-opetition between de facto and de jure organizations.

The “I told you so” perspective

When CMDBf started, I lobbied hard to base it on RDF. I explained that you could use it as just a graph-based metamodel, that you could  ignore the ontology and inference part of the stack. Which is pretty much what LDP is doing today. But I failed to convince the group, so we created our own metamodel (at least we explicitly defined one) and our own graph query language and that became CMDBf v1. Of course it was also SOAP-based.

KISS and markup

In closing, I’ll just make a plea for practicality to drive this effort. It’s OK to break REST orthodoxy. And not everything needs to be in RDF either. An overarching graph model is needed, but the detailed description of the nodes can very well remain in JSON, XML, or whatever format does the job for that node type.
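
As a hypothetical sketch of that split (Python with rdflib; every name and URL below is made up), the overarching graph carries identity and relationships while the per-node details stay in whatever format suits them, here plain JSON:

import json
from rdflib import Graph, Namespace, RDF, URIRef

EX = Namespace("http://example.org/cloud/")
g = Graph()

# The overarching graph only captures identity and relationships...
g.add((EX.vm42, RDF.type, EX.VirtualMachine))
g.add((EX.vm42, EX.hostedOn, EX.server7))
# ...and points to a detail document in whatever format fits that node type.
g.add((EX.vm42, EX.detailsDocument, URIRef("http://example.org/cloud/vm42.json")))

# The detail document itself stays plain JSON, retrieved and parsed separately.
vm42_details = json.loads('{"cores": 2, "memoryMB": 4096, "image": "ubuntu-20.04"}')
print(vm42_details["cores"])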

All the best to LDP.

Don’t tell Facebook what you like, tell Twitter

There seems to be a lot to like technically about the announcements at Facebook’s f8 conference, especially for a Semantic Web aficionado. But I won’t have anything to do with it as a user. Along with the usual “your privacy is our toy” subtext, I really don’t like the lack of data portability. “Web 2.0” is starting to look a lot like “AOL 2.0”. Here is a better way to do it.

Taking the new “like” button as a simple example, I’d much rather tell Twitter what I like than Facebook. A simple #like hashtag in a tweet can be used to express positive feelings for what the tweet describes. Here is a quick list of the many advantages of this approach over the newly-introduced Facebook “like” feature.

It’s public

Your tweets are available to all. Your Facebook profile can still consume them, so if you think Facebook does the best job at organizing this information about you and your friends you can still go there to view the results. But other applications and networks can tap into the same data, so you can also benefit from innovation coming out of companies which do not want to be Facebook sharecroppers.

It’s publicly public

By which I mean that there is no pretense of privacy and no nasty surprise when trust is violated. Which is going to happen again and again. Especially when it’s not just a matter of displaying data but also of inferring new information based on the raw data collected. At which point it’s almost impossible to segregate access to the derived information based on the privacy settings of the individual data pieces. On Twitter, it’s all public, we all know it from the start, and as such we’re not fooled into sharing more than we should. See the fallacy of privacy settings.

It works on all things

Rather than only being on a web page, you can use a #like hashtag to describe any URI (dereferenceable or not) or even plain text. Just like RDF allows the value of an attribute to be either a URI or a scalar value (string, number…). For example, you can express that you like a quote or a verse of a poem by including them directly in the tweet. It’s not as identifiable as something that has a URI, but it can still be part of your profile. And smart consumers of this data might still be able to do some processing on it (e.g. recognizing it as a line from a song).

It can still be 1-click

You don’t necessarily have to copy/paste a URL (or text) into twitter. A web site can still do this for you, as long as it has your permission to post on your behalf. With that approach, it looks exactly like the Twitter “like” button to the user. You don’t have to be a Twitter user, just to have a Twitter account. No need for a Twitter client or to visit the Twitter web site if you don’t want to. It’s also OK if you have zero followers, Twitter is just a technical conduit in this approach.

It can evolve

The success of Twitter is also the success of self-organization as illustrated by the emergence of @replies, #hashtags and RT, directly from the users. Rather than having Facebook decide what verbs make sense to allow users to express their thoughts on the Web, let people decide and see what verbs emerge (e.g. to describe what you like, dislike, are curious about, are considering buying, etc). The only thing we need is an understanding that the hashtag qualifies the user’s attitude towards what’s described by the rest of the tweet. Or maybe hashtags should not be reused for this, maybe we need a new breed, “semtags” (semantic tags), with a different syntax, e.g. “^like”. This way you can semtag a hashtag, e.g. “^like #nyc” might replace “I ♥ NY” on Twitter feeds (and tee shirts). It can be as simple or as complex as needed, based on what sticks in the real world. Nerds like me will try to qualify it (e.g. “^!like” for “I don’t like”) and might even come up with ontologies (^love subClassOf ^like). These experiments will probably fail and that’s fine. Evolution thrives on failures.
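
For what it’s worth, here is a minimal sketch in Python of how a consumer might parse such semtags. The “^verb” / “^!verb” syntax is just the proposal above, not anything standard, and the regular expression is only one possible way to read it:

import re

# "^verb", optionally negated as "^!verb", followed by whatever the attitude applies to.
SEMTAG = re.compile(r"\^(!?)(\w+)\s+(.*)")

def parse_semtag(tweet):
    """Return (verb, negated, object) if the tweet starts with a semtag, else None."""
    m = SEMTAG.match(tweet.strip())
    if not m:
        return None
    negated, verb, obj = m.groups()
    return verb.lower(), bool(negated), obj.strip()

print(parse_semtag("^like https://stage.vambenepe.com/archives/1464"))
# -> ('like', False, 'https://stage.vambenepe.com/archives/1464')
print(parse_semtag("^!like #mondays"))
# -> ('like', True, '#mondays')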

It is transparent

Even if you let a site write these messages on your Twitter feed, you can see exactly what goes on. There is no secret channel as with Facebook. The fact that it goes on your Twitter timeline acts as a validation, ensuring that only relevant, human-readable messages get added to your profile. Which is the only way in which we can maintain control of our profile information. If sites start to send too much information or opaque information you’ll see it. And so will your followers. This will put pressure on sites to make the posted data sparse and meaningful, because they know that their users won’t want to scare away their followers with social spam. See, for example, how the outcries over foursquare spam seem to have forced a clean-up (or at least so it looks to me, but maybe it’s just because I’ve unfollowed the spammers). Keeping social profiling on a human scale is a feature, not a bug.

It is persisted in many places

Who do you think is more likely to be around in 20 years, Facebook or the Library of Congress? Tweets are archived in many places, including Twitter itself, of course, but also Google, Bing and the Library of Congress. Plus, it’s very easy for you to set up a system to save all your tweets. Even if Twitter disappears, all the data in your profile that was built from your tweets will still be around. And if Google, Bing and the Library of Congress all go dark before Facebook, well that’s fine because the profile data from your tweets can be there too.

In effect, you should think of Facebook as a repository and Twitter as a stream. Don’t publish directly to one repository. Publish to a stream and benefit from all the repositories and other consumers that tap into it. It’s a well-known enterprise integration pattern (message bus), but it’s not just good for enterprise applications.

In fact, more than Twitter itself it’s this pattern that I want to encourage. Twitter is just the most obvious implementation, at this time, of a profile data bus. It already has almost everything we need (though a more fine-grained authorization model, or a delegated authorization model, would make me more likely to allow sites to tweet on my behalf). What matters is the switch from social networks owning data to you owning your data and social networks competing on how much value they can deliver to you based on the data. For example, LinkedIn might be the best for work connections, Facebook for personal connections, Google for brute search/retrieval of information, etc. I don’t want to maintain different profile data and privacy settings for each of them. I have one global privacy setting, which controls what I share with the world. Based on this, I want these sites to compete on the value they provide to me. It may not be what Facebook wants, but it’s what works best for us.

If you like this proposal, you know what you have to do. Go ahead and tweet:

^like https://stage.vambenepe.com/archives/1464

Or just retweet it.

[UPDATED 2010/5/6: See the next post for some clarifications.]

REST in practice for IT and Cloud management (part 3: wrap-up)

[Preface: a few months ago I shared some thoughts about how REST was (or could) be applied to IT and Cloud management. Part 1 was a comparison of the RESTful aspects of four well-known IaaS Cloud APIs and part 2 was an analysis of how REST applies to configuration management. Both of these entries received well-informed reader comments BTW, so if you read the posts but didn’t come back for the comments you really owe it to yourself to do so now. At the time, I jotted down thoughts for subsequent entries in this series, but I never got around to posting them. Since the topic seems to be getting a lot of attention these days (especially in DMTF) I decided to go back to these notes and see if I could extract a few practical recommendations in the form of a wrap-up.]

The findings listed below should be relevant whether your protocol is trying to be truly RESTful, just HTTP-centric or even zen-SOAPy. Many of the issues that arise when creating a protocol that maps well to IT management use cases should transcend these variations and that’s what I try to cover.

Finding #1: Relationships (links) are first-class entities (a.k.a. “hypermedia”)

The clear conclusion of both part 1 and part 2 was that the most relevant part of REST for IT and Cloud management is the use of hypermedia. IT management enjoys a head start on this compared to other domains, because its models are already rich in explicit relationships (e.g. CIM associations), as opposed to other business domains in which relationships are more implicit (to the end user at least). But REST teaches us that just having relationships in your model is not enough. They need to be exposed in a way that maps directly to the protocol, so that following a relationship is an infrastructure-level task, not an application-level task: passing an ID as a parameter for some domain-specific function is not it.

This doesn’t violate the rule to not mix the protocol and the model because the alignment should take place in the metamodel. XML is famously weak in that respect, but that’s where Atom steps in, handling relationships in a generic way. Similarly, support for references is, in addition to its embrace of Schematron, one of the main benefits of SML (extra kudos for apparently dropping the “EPR” reference scheme between submission and standardization, in favor of just the “URI” scheme). Not to mention RDFa and friends. Or HTTP Link headers (explained) for link-challenged types.
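
As a hedged illustration of the difference (Python with the requests library; the endpoint, link relation and JSON layout are all assumptions), following a relationship becomes an infrastructure-level task when the link is exposed in the HTTP Link header or in the representation itself, rather than reconstructed by the client:

import requests

# Hypothetical management API; the client only hard-codes the entry URL and
# discovers everything else by following links.
entry = requests.get("https://mgmt.example.org/api/vms/42",
                     headers={"Accept": "application/json"})

# Option 1: the relationship is exposed as an HTTP Link header (requests parses it).
host_url = entry.links.get("related", {}).get("url")

# Option 2: the relationship is exposed inside the representation itself.
body = entry.json()
host_url = host_url or body.get("links", {}).get("hostedOn")

if host_url:
    host = requests.get(host_url, headers={"Accept": "application/json"}).json()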

Finding #2: Put IDs on steroids

There is little to argue about the value of clearly identifying things of interest and we didn’t wait for the Web to realize this. But it is also one of the most vexing and complex problems in many areas of computing (including IT management). Some of the long-standing questions include:

  • Use an opaque ID (some random-looking string of characters) or an ID grounded in “unique” properties of the resource (if you can find any)?
  • At what point does a thing stop being the same (typical example: if I replace each hardware component of a server one after the other, at which point is it not the same server anymore? Does it make sense for the IT guys to slap an “asset id” sticker on the plastic box around it?)
  • How do you deal with reconciling two resources (with their own IDs) when you realize they represent the same thing?

REST guidelines don’t help with these questions. There often is an assumption, which is true for many web apps, that the application “owns” the resource. My “inbox” only exists as a resource within the mail server application (e.g. Gmail or an Exchange server). Whatever URI Gmail assigns for it is the URI for my inbox, period. Things are not as simple when the resources exist outside of any specific application. Take a server, for example: the baseboard management controller (or the hypervisor, in the case of a VM), the OS management layer and the management agent installed on the machine all have a claim to report on the machine (and therefore a need to identify it).

To some extent, Cloud computing simplifies many of these issues by providing controllers that “own” infrastructure resources and can authoritatively identify them. But it really is only pushing the problem to the next level of the stack.

Making the ID a URI doesn’t magically answer these questions. Though it helps in that it lets you leverage reconciliation mechanisms developed around URIs (such as <atom:link rel=”alternate”> or owl:sameAs). What REST does is add another constraint to this ID mechanism: Make the IDs dereferenceable URLs rather than just URIs.

I buy into this. A simple GET on a resource URI doesn’t solve everything but it has so many advantages that it should be attempted in all cases. And make this HTTP GET please (see finding #6).
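
Here is a small sketch of the reconciliation side of this (Python with rdflib; the two controller URLs are hypothetical): each controller mints its own dereferenceable URL for the same box, and owl:sameAs ties them together without requiring any inference machinery beyond what the consumer chooses to apply.

from rdflib import Graph, Namespace
from rdflib.namespace import OWL

# Two hypothetical controllers, each with its own dereferenceable URL for the same machine.
HYP = Namespace("https://hypervisor.example.org/hosts/")
AGT = Namespace("https://agent.example.org/machines/")

g = Graph()
# Reconciliation: assert (or infer) that the two IDs denote the same resource.
g.add((HYP.host17, OWL.sameAs, AGT["srv-0042"]))

# A consumer can then GET either URL for details and merge what comes back
# under a single logical resource.
print(g.serialize(format="turtle"))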

In this adoption of GET, we just have to deal with small details such as:

  • What URL do I use for resources that have more than one agent/controller?
  • How close to the resource do I point this URL? If it’s too close to it then it may change as the resource evolves (e.g. network changes) or be affected by the resource performance (e.g. a crashed machine or application that does not respond to its management API). If it’s removed from the resource, then I introduce a scope (e.g. one controller) within which the resource has to remain, which may cause scalability concerns (how many VMs can/should one controller handle, what if I want to migrate a VM across the ocean…).

These are somewhat corner cases (and the more automation and virtualization you get, the fewer possible controllers you have per resource). While they need to be addressed, they don’t come close to negating the value of dereferenceable IDs. In addition, there are plenty of mechanisms to help with the issues above, from links in the representations (obviously) to RDDL-style lightweight directory to a last resort “give Saint Peter a call” mechanism (the original WSRF proposal had a sub-specification called WS-RenewableReferences that would let you ask for a new version of an expired EPR but it was never published — WS-Naming in then-GGF also touched on that with its reference resolvers — showing once again that the base challenges don’t change as fast as technology flavors).

Implicit in this is the fact that URIs are vastly superior to EPRs. The latter were only just a band-aid on a broken system (which may have started back when WSDL 1.1 decided to define “ports” as message aggregators that can have only one URL) and it’s been more debilitating to SOAP than any other interoperability issue. Web services containers internalized this assumption to the point of providing a stunted dispatch mechanism that made it very hard to assign distinct URLs to resources.

Finding #3: If REST told you to jump off a bridge, would you do it?

Adherence to REST is not required to get the benefits I describe in this series. There is a lot to be inspired by in REST, but it shouldn’t be a religion. Sure, if you squint hard enough (and poke it here and there) you can call your interface RESTful, but why bother with the contortions for the parts that aren’t, as long as they don’t detract from the value of REST in the other parts? As in all conversions, the most fervent adherents of RPC will likely be tempted to become its most vocal denouncers once they’re born again. This is a tired scenario that we don’t need to repeat. Don’t think of it as a conversion but as a new perspective.

Look at the “RESTful with many parameters?” comment thread on Stefan Tilkov’s excellent InfoQ introduction to REST. It starts with some shared distaste for parameter-laden URIs and a search for a more RESTful approach. This gets suggested:

You could do a post on some URI like ./query/product_dep which would create a query resource. Now you “add” products to the query either by sending a product uri list with the initial post or by calling post on ./query/product_dep/{id}. With every post to the query resource the get on the query resource would change.

Yeah, you could. But how about an RPC-like query operation rather than having yet another resource lifecycle to manage just for the sake of being REST-compliant? And BTW, how do you think any sane consumer of your API is going to handle this? You guessed it, by packaging the POST/POST/GET/DELETE in one convenient client-side library function called “query”. As much as I criticize RPC-centric toolkits (see finding #5 below), it would be justified in this case.
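
Here is roughly what that client-side wrapper ends up looking like (a sketch in Python with requests; the endpoint comes from the comment above, while the payload shape and the Location header are assumptions):

import requests

BASE = "https://api.example.org"  # hypothetical server hosting the query resources

def query_product_dep(product_uris):
    """Create a query resource, read its result, then clean it up.
    To the caller this is indistinguishable from a single RPC-style 'query' call."""
    created = requests.post(BASE + "/query/product_dep",
                            json={"products": product_uris})
    # Assumes the server returns the new query resource's URL in the Location header.
    query_url = created.headers["Location"]
    try:
        return requests.get(query_url).json()
    finally:
        requests.delete(query_url)

# deps = query_product_dep(["urn:product:123", "urn:product:456"])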

Either you understand why/how REST principles benefit you or you don’t. If you do, then use this understanding to interpret the REST principles to best fit your needs. If you don’t, then no amount of CONTENT-TYPE-pixie-dust-spreading, GET-PUT-POST-DELETE-golden-rule-following and HATEOAS-magical-incantation-reciting will help you. That’s the whole point, for me at least, of this three-part investigation. Stefan says essentially the same, but in a converse way, in his article: “there are often reasons why one would violate a REST constraint, simply because every constraint induces some trade-off that might not be acceptable in a particular situation. But often, REST constraints are violated due to a simple lack of understanding of their benefits.” He says “understand why you violate” and I say “understand why you obey”. It is essentially the same (if you’re into stereotypes you can attribute the difference to his Germanic heritage and my Gallic blood).

Even worse than bending your interface to appear RESTful, don’t cherry-pick your use cases to only keep those that you feel you can properly address via REST, leaving the others aside. Conversely, don’t add requirements just because REST makes them easy to support (interesting how quickly “why do you force me to manage the lifecycle of yet another resource just to run a query” turns into “isn’t this great, you can share queries among users and you can handle long-running queries, I am sure we need this”).

This is not to say that you should not create a fully RESTful system. Just that you don’t necessarily have to and you can still get many benefits as long as you open your eyes to the cost/benefits trade-off involved.

Finding #4: Learn humility from REST

Beyond the technology, there is a vibe behind REST design. You can copy the technology and still miss it. I described it in 2005 as Humble Architecture, and applied to SOA at the time. But it describes REST just as well:

More practically, this means that the key things to keep in mind when creating a service, is that you are not at the center of the universe, that you don’t know who is going to consume your service, that you don’t know what they are going to do with it, that you are not necessarily the one who can make the best use of the information you have access to and that you should be willing to share it with others openly…

The SOA Manifesto recently called this “intrinsic interoperability”.

In IT management terms, it means that you can RESTify your CMDB and your event console and your asset management software and your automation engine all you want, if you see your code as the ultimate consumer and the one that knows best, as the UI that users have to go through, the “ultimate source of truth” and the “manager of managers” then it doesn’t matter how well you use HTTP.

Finding #5: Beware of tools bearing gifts

To a large extent, the great thing about REST is how few tools there are to take it away from you. So you’re pretty much forced to understand what is going on in your contract as opposed to being kept ignorant by a wsdl2java type of toolkit. Sure, Java (and .NET) have improved in that regard, but really the cultural damage is done and the expectations have been set. Contrast this to “the ‘router’ is just a big case statement over URI-matching regexps”, from Tim Bray’s post on the Sun Cloud API, one of my main inspirations for this investigation.

REST is not inherently immune to the tool-controlling-the-hand syndrome. It’s just a matter of time until such tools try to make REST “accessible” to the “normal” developer (who can supposedly prevent thread deadlocks but not parse XML). Joe Gregorio warns about this in the context of WADL (to summarize: WADL brings XSD which leads to code generation). Keep this in mind next time someone states that REST is more “loosely coupled” than SOAP. It’s how you use it that matters.

Finding #6: Use screws, not glue, so we can peer inside and then close the lid again

The “view source” option is how I and many others learned HTML. It unfortunately created a generation of HTML monsters who never went past version 3.2 (the marbled background makes me feel young again). But it also fueled the explosion of the Web. On-the-wire inspection through soapUI is what allowed me to perform this investigation and report on it (WMI has allowed this for years, but WS-Management is what made it accessible and usable for anyone on any platform). This was, of course, in the context of SOAP which is also inspectable. Still, in that respect nothing beats plain HTTP which is why I recommend HTTP GET in finding #2 (make IDs dereferenceable) even though I don’t expect that the one-page-per-resource view is going to be the only way to access it in the finished product.

These (HTML source, on-the-wire XML and resource-description pages) rarely hit the human eye and yet their presence enables the development of the more commonly used views. Making it as easy as possible to see what is going on under the covers helps with learning, with debugging, with extending and with innovating. In the same way that 99% of web users don’t look at the HTML source (and 99.99% of them don’t see the HTTP requests) but the Web would not be what it is to them if this inspectability hadn’t been there to fuel its development.

Along the same line, make as few assumptions as possible about the consumers in your interfaces. Which, in practice, often means document what goes on the wire. WSDL/WADL can be used as a format, but they are at most one small component. Human-readable semantics are much more important.

Finding #7: Nothing is free

Part of what was so attractive about SOAP is everything you were going to get “for free” by using it. Message-level security (for all these use cases where your message starts over HTTP, then hops onto a train, then gets delivered by a carrier pigeon). Reliable messaging. Transactionality. Intermediaries (they were going to be a big deal in SOAP, as you can see in vestigial form today in the Nodes/Roles left in the spec – also, do you remember WS-Routing? I do.)

And it’s true that by now there is a body of specifications that support this as composable SOAP headers. But the lack of usage of these features contrasts with how often they were bandied about in the early days of SOAP.

Well, I am detecting some of the same in the REST camp. How often have you heard about how REST enables caching? Or about how content types allows an ISP to compress images on the fly to speed up delivery over dial-up? Like in the SOAP case, these are real features and sometimes useful. It doesn’t mean that they are valuable to you. And if they are not, then don’t let them be used as justifications. Especially since they are not free. If caching doesn’t help me (because of low volume, because security considerations prevent a shared cache, etc) then its presence actually adds a cost to me, since I now have to worry whether something is cached or not and deal with ETags. Or I have to consistently remember to request the cache to be bypassed.
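
To make that cost concrete, here is a sketch (Python with requests; the URL is made up) of what “benefiting” from caching actually asks of the client:

import requests

url = "https://mgmt.example.org/api/vms/42"   # hypothetical resource

# First fetch: remember the validator.
first = requests.get(url)
etag = first.headers.get("ETag")

# Later fetch: the caching machinery only pays off if I remember to use it...
headers = {"If-None-Match": etag} if etag else {}
later = requests.get(url, headers=headers)

if later.status_code == 304:
    representation = first.json()   # nothing changed, reuse what we already have
else:
    representation = later.json()

# ...and bypassing shared caches when freshness matters is yet another header to manage.
fresh = requests.get(url, headers={"Cache-Control": "no-cache"})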

Finding #8: Start by sweeping your front door

Before you agonize about how RESTful your back-end management protocol is, how about you make sure that your management application (the user front-end) is a decent Web application? One with cool URIs, where the back button works, where bookmarks work, where the data is not hidden in some over-encompassing Flash/Silverlight thingy. Just saying.

***

Now for some questions still unanswered.

Question #1: Is this a flea market?

I am highly dubious of content negotiation and yet I can see many advantages to it. Mostly along the lines of finding #6: make it easy for people to look under the hood and get hold of the data. If you let them specify how they want to see the data, it’s obviously easier.

But there is no free lunch. Even if your infrastructure takes care of generating these different views for you (“no coding, just check the box”), you are expanding the surface of your contract. This means more documentation, more testing, more interoperability problems and more friction when time comes to modify the interface.

I don’t have enough experience with format negotiation to define the sweet spot of this practice. Is it one XML representation and one HTML, period (everything else gets produced by the client by transforming the XML)? But is the XML Atom-wrapped or not? What about RDF? What about JSON? Not to forget that SOAP wrapper; how hard can it be to add? But soon enough we are in legacy hell.
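
For illustration only, here is a bare-bones sketch (Python, no framework; the formats and field names are arbitrary) of how each negotiated representation becomes one more branch of the contract to document, test and keep backward-compatible:

import json

def render_vm(vm, accept):
    """Pick a representation of a hypothetical 'vm' record based on the Accept header."""
    if "application/json" in accept:
        return "application/json", json.dumps(vm)
    if "application/atom+xml" in accept:
        entry = ('<entry xmlns="http://www.w3.org/2005/Atom">'
                 '<title>{name}</title><id>{id}</id></entry>').format(**vm)
        return "application/atom+xml", entry
    if "text/html" in accept:
        return "text/html", "<html><body><h1>{name}</h1></body></html>".format(**vm)
    # Default: plain XML.
    return "application/xml", '<vm id="{id}"><name>{name}</name></vm>'.format(**vm)

print(render_vm({"id": "vm42", "name": "build server"}, "application/json")[1])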

Question #2: Mime-types?

The second part of Joe Gregorio’s WADL entry is all about Mime types and I have a harder time following him there. For one thing, I am a bit puzzled by the different directions in which Mime types go at the same time. For example, we have image formats (e.g. “image/png”), packaging/compression formats (e.g. “application/zip”) and application formats (e.g. “application/vnd.oasis.opendocument.text” or “application/msword”). But what if I have a zip full of PNG images? And aren’t modern word processing formats basically a zip of XML files? If I don’t have the appropriate viewer, maybe I’d like them to be at least recognized as ZIP files. I don’t see support for such composition and taxonomy in these types.

And even within one type, things seem a bit messy in practice. Looking at the registered applications in the “options” menu of my Firefox browser, I see plenty of duplication:

  • application/zip vs. application/x-zip-compressed
  • application/ms-powerpoint vs. application/vnd.ms-powerpoint
  • application/sdp vs. application/x-sdp
  • audio/mpeg vs. audio/x-mpeg
  • video/x-ms-asf vs. video/x-ms-asf-plugin

I also wonder at what level of depth I want to take my Mime types. Sure I can use Atom as a package but if the items I am passing around happen to be CIM classes (serialized to XML), doesn’t it make sense to advertise this? And within these classes, can I let you know which domain (e.g. which namespace) my resources are in (virtual machines versus support tickets)?

These questions may simply be a reflection of my lack of maturity in the fine art of using Mime types as part of protocol design. My experience with them is more of the “find the type that works through trial and error and then leave it alone” kind.

[Side note: the first time I had to pay attention to Mime types was back in 1995/1996, playing with non-parsed headers and the multipart/x-mixed-replace type to bring some dynamism to web pages (that was before JavaScript or even animated GIFs). The site is still up, but the admins have messed up the Apache config so that the CGIs aren’t executed anymore but return the Python code. So, here are some early Python experiments from yours truly: this script was a “pushed” countdown and this one was a “pushed” image animation. Cool stuff at the time, though not in a “get a date” kind of way.]

On the other hand, I very much agree with Joe’s point that “less is more”, i.e. that by not dictating how the semantics of a Mime type are defined the system forces you to think about the proper way to define them (e.g. an English-language RFC). As opposed to WSDL/XSD which gives the impression that once your XML validator turns green you’re done describing your interface. These syntactic validations are a complement at best, and usually not a very useful one (see “fat-bottomed specs”).

In comments on previous posts, Stu Charlton also emphasizes the value that Mime types bring. “Hypermedia advocates exposing a variety of links for such state-transitions, along with potentially unique media types to describe interfaces to those transitions.” I get the hypermedia concept, the HATEOAS approach and its very practical benefits. But I am still dubious about the role of Mime types in achieving them and I am not the only one with such qualms. I have too much respect for Joe and Stu to dismiss it entirely, but until I get an example that makes it “click” in practice for me I won’t sweat about Mime types too much.

Question #3: Riding the Zeitgeist?

That’s a practical question rather than a technical one, but as a protocol creator/promoter you are going to have to decide whether you market it as “RESTful”. If I have learned one thing in my past involvement with standards it is that marketing/positioning/impressions matter for standards as much as for products. To a large extent, for Clouds, Linked Data is a more appropriate label. But that provides little marketing/credibility oomph with CIOs compared to REST (and less buzzword-compliance for the tech press). So maybe you want to write your spec based on Linked Data and then market it with a REST ribbon (the two are very compatible anyway). Just keep in mind that REST is the obvious choice for protocols in 2009 in the same way that SOAP was a few years ago.

Of course this is not an issue if your specification is truly RESTful. But none of the current Cloud “RESTful” APIs is, and I don’t expect this to change. At least if you go by Roy Fielding’s definition (or Paul’s handy summary):

A REST API must not define fixed resource names or hierarchies (an obvious coupling of client and server). Servers must have the freedom to control their own namespace. Instead, allow servers to instruct clients on how to construct appropriate URIs, such as is done in HTML forms and URI templates, by defining those instructions within media types and link relations. [Failure here implies that clients are assuming a resource structure due to out-of band information, such as a domain-specific standard, which is the data-oriented equivalent to RPC’s functional coupling].

And (in a comment) Mark Baker adds:

I’ve reviewed lots of “REST APIs”, many of them privately for clients, and a common theme I’ve noticed is that most folks coming from a CORBA/DCE/DCOM/WS-* background, despite all the REST knowledge I’ve implanted into their heads, still cannot get away from the need to “specify the interface”. Sometimes this manifests itself through predefined relationships between resources, specifying URI structure, or listing the possible response codes received from different resources in response to the standard 4 methods (usually a combination of all those). I expect it’s just habit. But a second round of harping on the uniform interface – that every service has the same interface and so any service-specific interface specification only serves to increase coupling – sets them straight.

So the question of whether you want to market yourself as RESTful (rather than just as “inspired by the proper use of HTTP illustrated by REST”) is relevant, if only because you may find the father of REST throwing (POSTing?) tomatoes at you. There is always a risk in wearing clothes that look good but don’t quite fit you. The worst time for your pants to fall off is when you suddenly have to start running.

For more on this, refer to Ted Neward’s excellent Roy decoder ring where he not only explains what Roy means but more importantly clarifies that “if you’re not doing REST, it doesn’t mean that your API sucks” (to which I’d add that it is actually more likely to suck if you try to ape REST than if you allow yourself to be loosely inspired by it).

***

Wrapping up the wrap-up

There is one key topic that I had originally included in this wrap-up but decided to remove: extensibility. Mark Hapner brings it up in a comment on a previous post:

It is interesting to note that HTML does not provide namespaces but this hasn’t limited its capabilities. The reason is that links are a very effective mechanism for composing resources. Rather than composition via complicated ‘embedding’ mechanisms such as namespaces, the web composes resources via links. If HTML hadn’t provided open-ended, embeddable links there would be no web.

I am the kind of guy who would have namespace-qualified his children when naming them (had my wife not stepped in) so I don’t necessarily see “extension via links” as a negation of the need for namespaces (best example: RDF). The whole topic of embedding versus linking is a great one but this post doesn’t need another thousand words and the “REST in practice” umbrella is not necessarily the best one for this discussion. So I hereby conclude my “REST in practice for IT and Cloud management” series, with the intent to eventually start a “Linked Data in practice for IT and Cloud management” series in which extensibility will be properly handled. And we can also talk about querying (conspicuously absent from Cloud APIs, unless CMDBf is now a Cloud API) and versioning. As a teaser for the application of Linked Data to IT/Cloud, I will leave you with what Vint Cerf has to say.

[UPDATED 2010/1/27: I still haven’t written the promised “Linked Data in practice for IT and Cloud management” post, but this explanation of the usage of Linked Data for data.gov.uk pretty much says it all. I may still write a post describing how what Jeni says about government data applies to Cloud management APIs, but it’s almost too obvious to bother. Actually, there may be reasons why Cloud management benefits even more from Linked Data than UK government data, so it may still be worth a post. At some point. When I convince myself that it may influence things rather than be background noise.]

OWL news you can use

The W3C released OWL 2 today. Most readers of this blog are IT management people (whether they call it “cloud computing” or “boring old system management”) and don’t follow RDF, OWL, SPARQL etc too closely (if at all). Yet there is a lot of potential value in using these technologies for IT management, so I thought it might be helpful to provide some practical resources on the topic. I have selected articles that cover the special (some may say “twisted”) approach of using OWL and its friends for validation rather than just inference, as this use case is very relevant to IT management.

Of course you can also go to the W3C standard itself, starting with the overview of OWL 2.

Just so you don’t feel lonely if you decide to explore this path, have a look at Elastra’s sexy technology stack. ECML, EDML and EMML are all defined as OWL ontologies.

With M (Oslo), is Microsoft on the path to reinventing RDF?

I have given up, at least for now, on understanding what Microsoft wants Oslo (and more specifically the “M” part) to be. I used to pull my hair out reading inconsistent articles and interviews about what M tries to be (graphical programming! DSL! IT models! generic parser! application components! workflow! SOA framework! generic data layer! SQL/T-SQL for dummies! JSON replacement! all of the above!). Douglas Purdy makes a valiant 4-part effort (1, 2, 3, 4) but it’s still not crisp enough for my small brain. Even David Chappell, explainer extraordinaire, seems to throw up his hands (“a modeling platform that can be applied in lots of different ways”, which BTW is the most exact, if vague, description I’ve heard). Rather than articles, I now mainly look at the base specifications and technical documents that show what it actually is. That’s what I did when the Oslo SDK first came out last year. A new technical document came out recently, an update to the MGraph Object Model, so I took another look.

And it turns out that MGraph is… RDF. Or rather, “RDF minus entailment”. And with Turtle as the base representation rather than an add-on.

Look at section 3 (“RDF concepts”) in this table of content from W3C. It describes the core RDF concepts. Keep the first five concepts (sections 3.1 to 3.5) and drop the last one (“3.6: Entailment”). You have MGraph, a graph-oriented object model.

On top of this, the RDF community adds reasoning capabilities with RDF entailment, RDFS, OWL, SWRL, SPIN, etc and a variety of engines that implement these different levels of reasoning.

Microsoft, on the other hand, seems to ignore that direction. Instead, it focuses on creating a good mapping from this graph object model to programming languages. In two directions:

  • from programming languages to the graph model: they make it easy for you to create a domain-specific language (DSL) that can easily be turned into M instances.
  • from the graph model to programming languages: they make it easy for you to work on these M instances (including storing them) using the .NET technology stack.

So, if Microsoft is indeed reinventing RDF as the title of this entry provocatively suggests, then they are taking an interesting detour on the way. Rather than going straight to “model-based inferencing”, they are first focusing on mapping the core MGraph concepts to programming (by regular developers) and user interactions (with regular users). Something that for a long time had not gotten much attention in the RDF world beyond pointing developers to Jena (though it seemed to have improved over the last few years with companies like TopQuadrant; ironically, the Oslo model browser/editor is code-named “Quadrant”).

Whether the Oslo team sees the inferencing fun as a later addition or something that’s not needed is another question, on which I don’t see any hint at this point.

I hope they eventually get to it. But I like the fact that they cleanly separate the ability to represent and manipulate the graph model from the question of whether instances can be inferred. We could use such a reusable graph representation mechanism. Did CMDBf, for example, really have to create a new graph-oriented metamodel and query language? I failed to convince the group to adopt RDF/SPARQL, but I may have been more successful if there had been a cleanly-separated “static” version of RDF/SPARQL, a way to represent and query a graph independently of whether the edges and nodes in the graph (and their types) are declared or inferred. Instead, the RDF stack has entailment deeply embedded and that’s very scary to many.

But as much as I like this separation, I can’t help squirming when I see the first example in the MGraph document:

// Populate a small village with some people
Villagers => {
  Jenn => Person { Name => 'Jennifer', Age => 28, Spouse => Rich },
  Rich => Person { Name => 'Richard', Age => 26, Spouse => Jenn },
  Charly => Person { Name => 'Charlotte', Age => 12 }
},
HaveSpouses => { Villagers.Rich, Villagers.Jenn }

That last line is an eyesore to anyone who has been anywhere near RDF. I have just declared that Rich and Jenn are one another’s spouse, why do I have to add a line that says that they have spouses? What I want is to say that participation in a “Spouse” relationship entails membership in the “hasSpouse” class. And BTW, I also want to mark the “Spouse” relationship as symmetric so I only have to declare it one way and the inverse can be inferred.
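
Here is what that looks like on the RDF side (a sketch in Python with rdflib plus the owlrl reasoner, assuming that package is available; the vocabulary names are made up): declare the property once as symmetric, give it a domain, and let entailment fill in the rest.

from rdflib import Graph, Namespace, RDF, RDFS
from rdflib.namespace import OWL
import owlrl  # OWL-RL reasoner, assumed installed (pip install owlrl)

EX = Namespace("http://example.org/village/")
g = Graph()

# Declare the modeling intent once, in the vocabulary...
g.add((EX.spouse, RDF.type, OWL.SymmetricProperty))
g.add((EX.spouse, RDFS.domain, EX.HasSpouse))  # having a spouse puts you in HasSpouse

# ...then state each fact a single time.
g.add((EX.Jenn, EX.spouse, EX.Rich))

# The reasoner materializes the rest: the inverse edge and the class memberships.
owlrl.DeductiveClosure(owlrl.OWLRL_Semantics).expand(g)
print((EX.Rich, EX.spouse, EX.Jenn) in g)      # True
print((EX.Rich, RDF.type, EX.HasSpouse) in g)  # True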

So maybe I don’t really know what I want on this. I want the graph model to be separated from the inference logic and yet I want the syntactic simplicity that derives from base entailments like the example above. Is June too early to start a Christmas wish list?

While I am at it, can we please stop putting people’s ages in the model rather than their dates of birth? I know it’s just an example, but I see it over and over in so many modeling examples. And it’s so wrong in 99% of cases. It just hurts.

There are other things about MGraph edges that look strange if you are used to RDF. For example, edges can be labeled or not, as illustrated by the first example of the graph model in the MGraph document:

In this example, “Age” is a labeled edge that points to the atomic node “42”, while the credit score is modeled as a non-atomic node linked from the person via an unlabeled edge. Presumably the “credit score” node is also linked to an atomic node (not shown) that contains the actual score value (e.g. “800”). I can see why one would want to call out the credit score as a node rather than having an edge (labeled “credit score”) that goes to an atomic node containing the actual credit score value (similar to how “age” is handled). For one thing, you may want to attach additional data to that “credit score” node (when was it calculated, which reporting agency provided it, etc) so it helps to have it be a node. But making this edge unlabeled worries me. Originally you may only think of one possible relationship type between a person and a credit score (the person has a credit score). But others may pop up further down the road, e.g. the person could be a loan agent who orders the credit score but the score is about a customer. So now you create a new edge label (“orders”) to link the loan agent person to the credit score. But what happens to all the code that was written previously and navigates the relationship from the person to the score with the expectation that the score is about the person? Do you think that code was careful to only navigate “unlabeled” edges? Unlikely. Most likely it just grabbed whatever credit score was linked to the person. If that code is applied to a person who happens to also be a loan agent, it might well grab a credit score about someone else which happened to be ordered by the loan agent. These unlabeled edges remind me of the practice of not bothering with a “version” field in the first version of your work because, hey, there is only one version so far.

The restriction that a node can have at most one edge with a given label coming out of it is another one that puzzles me. Though it may explain why an unlabeled edge is used for the credit score (since you can get several credit scores for the same person, if you ask different rating agencies). But if unlabeled edges are just a way to free yourself from this restriction then it would be better to remove the restriction rather than work around it. Let’s take the “Spouse” label as an example. For one thing in some countries/cultures having more than one such edge might be possible. And having several ex-spouses is possible in many places. Why would the “ex-spouse” relationship have to be defined differently from “spouse”? What about children? How is this modeled? Would we be forced to have a chain of edges from parent to 1st child to next sibling to next sibling, etc? Good luck dealing with half-siblings. And my model may not care so much about capturing the order (especially if the date of birth is already captured anyway). This reminds me of how most XML document formats force element order in places where it is not semantically meaningful, just because of XSD’s bias towards “sequence”.

Having started this entry by declaring that I don’t understand what M tries to be, I really shouldn’t be criticizing its design choices. The “weird” aspects I point out are only weird in the context of a certain usage but they may make perfect sense in the usage that the Oslo team has in mind. So I’ll stop here. The bottom line is that there are traces, in M, of a nice, reusable, graph-oriented data model with strong bridges (in both directions) to programming languages and user interfaces. That is appealing to me. There are also some strange restrictions that puzzle me. We’ll see where this goes (hopefully this article, “Designing Domains and Models Using M”, will soon contain more than “to be submitted” and I can better understand the M approach). In any case, kudos to the team for being so open about their work and the evolution of their design.

UCI: setting RDF for failure?

I don’t get it. I just read Reuven Cohen’s description of the Unified Cloud Interface project that he recently started. It’s nothing less than using RDF to create “a Semantic Cloud Infrastructure capable of adapting to a variety of methodologies / architectures and completely agnostic to any specific API or platform being described.”

What made me fall off my chair is the methodology/architecture part of this statement. It’s hard enough (but doable) to use RDF to map philosophically similar APIs. It’s a non-starter to use it to bridge architectural and methodological differences. I have spent a fair amount of time looking at Semantic Web technologies in the context of modeling IT systems (see the “semantic tech” category of this blog). While I think they would be a great foundation I don’t see them ever coming anywhere near what Reuven describes.

But to be fair, I am not sure what he really is describing. There are a few overly ambitious proclamations like the one above and this paragraph:

The key drivers of a unified cloud interface (UCI) is “One abstraction to Rule them All” – an API for other API’s. A singular abstraction that can encompass the entire infrastructure stack as well as emerging cloud centric technologies through a unified interface. What a semantic model enables for UCI is a capability to bridge both cloud based API’s such as Amazon Web Services with existing protocols and standards, regardless of the level of adoption of the underlying API’s or technology. The goal is simple, develop your application once, deploy anywhere at anytime for any reason.

But in his piece you’ll also find CIM being cited as an example. There are good things to be said about CIM, but it certainly is not “a dynamic computing model that can, under certain conditions, be ‘trained’ to appropriately ‘learn’ the meaning of related cloud & infrastructure resources” (or, in the case of CIM, computer system resources). Good luck “training” CIM to “learn” anything. It’s CIM that’s going to train you to do it its way, period.

The CIM example (and other standards he lists) paints the picture of defining a standard API for Cloud Computing and forcing all providers to use it. That’s the conventional approach to universality. If that’s what UCI is after then it is technically achievable. And RDF might be a very good technical foundation for it. Whether anyone can pull this off politically and commercially at this stage is a different question of course. In any case, such an effort would have nothing to do with magically wrapping whatever API each provider has defined and whatever architecture/methodology they chose.

And further down we see a sketch of another, much more modest, vision, when Reuven talks about how “these web resources could just as easily be ‘cloud resources’ or API’s” which seems to represent a whole API as an RDF resource. Sure, then you can use RDF/OWL to capture versioning information between them, backward compatibility etc. Probably very useful, but that’s a very different scope.

So which is it? Reuven is a thought leader in Cloud Computing, so I want to think I am missing his point.

So far, I haven’t seen any Cloud taxonomy that is reasonably complete and has received broad support. Shouldn’t we first try to come up with a human-readable taxonomy before we try to turn it into a machine-readable ontology? In my previous post I explicitly stayed away from being pedantic about the difference between the terms, but the confusion between a taxonomy and an ontology seems to be part of what’s going on here.

The sad thing is that they (you know, them) will point to this as a proof that Semantic Web technologies don’t work.

Or maybe I’ve just set myself up for a generous portion of humble pie on April 2nd (when Reuven says an “initial functional draft UCI implementation, taxonomy and ontology” will be unveiled). I’d love to be surprised. And my ego has taken worse hits before.

[UPDATED 2009/2/10: You should read Steve Oberlin’s take on this overall taxonomy/ontology discussion. He knows the topic, carefully reads the posts that he comments on, packs a healthy dose of skepticism and takes the time to explain what taxonomies and ontologies are, which was overdue. Plus, I just love sites that don’t feel the need to use decorative pictures. His doesn’t have a single image file which means that even if he didn’t have superb credentials (which he does) he’d get my respect by default. A blog to watch.]

A new SPIN on enriching a model with domain knowledge (constraints and inferences)

Back when I was at HP and we got involved with what turned into SML (now a W3C candidate recommendation), we tried to make a case for the specification to be based on RDF/OWL rather than XML/XSD/Schematron. It was a strange situation from a technical perspective because RDF is a better foundation for an IT model than XML, but on the other hand XSD/Schematron is a better choice for validation than OWL. OWL is focused on inference, not validation (because of both fundamental design choices, e.g. the open world assumption, and language expressiveness limitations).

So our options were to either use the right way to represent the system (RDF) combined with the wrong way to capture constraints (OWL) or to use the wrong way to represent the system (XML) combined with the right way to constrain it (mostly Schematron, with some limited help from XSD). At the end, of course, this subtle technical debate was crushed under the steamroller of vendor politics and RDF never got a fair chance anyway.

The point of this little background story is to describe the context in which I read this announcement from Holger Knublauch of TopQuadrant: the new version of their TopBraid Composer tool introduces SPIN, a way to complement OWL with a SPARQL-based constraint checking and inference mechanism.

This relates to SML in two ways.

First, there are similarities in the approach: Schematron leverages the XPath language, used to query XML, to create validation rules. SML then marries Schematron with XSD, for a more powerful validation mechanism. Compare this to SPIN: SPIN leverages the SPARQL query language, used to query RDF, to create validation/inference rules. SPIN also marries this with OWL, for a more powerful validation/inference mechanism.

But beyond the mirroring structures of SPIN and SML, the most interesting thing is that it looks like SPIN could nicely solve the conundrum, described above, of RDF being the right foundation for modeling IT systems but OWL being the wrong constraint mechanism. SPIN may do a better job than SML at what SML is aiming to do (validation rules). And at the same time, you get “for free” (or as close to “for free” as you can get with software, which is still far from “free”) a pretty powerful inference mechanism. The most powerful I know of, short of using a general programming language to capture your inference rules (and good luck with maintaining these rules).

This may sound like sci-fi, but it’s the next logical step for IT configuration standardization. Let’s look at where we are today:

  • SML (at W3C) is an attempt to standardize the expression of constraints.
  • CMDBf (at DMTF) is standardizing how the model content is queried (and, to some limited extent at this point, federated).
  • And recently IBM authored a proposal for a reconciliation specification for items in the model and sent it to an Eclipse group (COSMOS).

But once you tackle reconciliation, you are already half-way into inferencing territory. At least if you want to reconcile between models, not just between instances expressed in the same model. Because the models may not be defined at the same level of granularity, and before you can reconcile items you need to infer finer-grained entities in your coarser-grained model (or vice-versa) so that you can reconcile apples with apples.
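
Concretely, that granularity-bridging step is exactly the kind of thing a SPARQL-based rule could capture. Here is a rough sketch (Python with rdflib, made-up model vocabulary, not any real CMDB schema) that derives a coarse-grained item from finer-grained ones so the two models can be reconciled like for like.

    from rdflib import Graph, Namespace

    # Hypothetical vocabularies for a fine-grained and a coarse-grained model.
    FINE = Namespace("http://example.org/fine-grained#")
    COARSE = Namespace("http://example.org/coarse-grained#")

    g = Graph()
    g.add((FINE.dbInstance1, FINE.memberOfCluster, FINE.cluster9))
    g.add((FINE.dbInstance2, FINE.memberOfCluster, FINE.cluster9))

    # Inference rule: any cluster of database instances is, at the coarser
    # level of granularity, a single DatabaseService item.
    inferred = g.query("""
        PREFIX fine: <http://example.org/fine-grained#>
        PREFIX coarse: <http://example.org/coarse-grained#>
        CONSTRUCT { ?cluster a coarse:DatabaseService . ?instance coarse:partOf ?cluster . }
        WHERE { ?instance fine:memberOfCluster ?cluster . }
    """)
    for triple in inferred:
        g.add(triple)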

Today, inferencing for IT models is done as part of the “discovery packs” that you can buy along with your IT management model repository. But not very well, in general. Because the way you write such a discovery module for the HP Universal CMDB is very different from how you write it for the BMC CMDB, IBM’s CCMDB or as a plug-in for Oracle Enterprise Manager or Microsoft System Center. Not to mention the smaller, more specialized, players. As a result, there is little incentive for 3rd party domain experts to put work into capturing inference rules since the work cannot be widely leveraged.

I am going a bit off-topic here, but one interesting thing about standardization of inferencing for IT management, if it happens, is that it is going to be very hard to not use RDF, OWL and some flavor of SPARQL (SPIN or equivalent) there. And once you do that, the XML-based constraint mechanisms (SML or others) are going to be in for a rough ride. After resisting the RDF stack for constraints, queries and basic reconciliation (because the added value was supposedly not “worth the cost” for each of these separately), the XML dam might get a crack for inferencing. And once RDF starts to trickle through that crack, the whole dam is going to come down in a big wave. Just to be clear, this is a prophetic long-term vision, not a prediction for 2009 (unfortunately).

In the meantime, I’d like to take this SPIN feature a… spin (sorry) when I find some time. We’ll see if I can install the new beta of TopBraid Composer despite having used up, a year ago, my evaluation license of the earlier version of the product. Despite what I had hoped at some point, this is not directly applicable to my current work, so I am not sure I want to buy a license. But who knows, SPIN may turn out to be the change that eventually puts RDF back on my “day job” list (one can dream)…

It’s also nice that Holger took the trouble to deliver SPIN not just as a feature of his product but also as a stand-alone specification, which should make it pretty easy for anyone who has a SPARQL engine handy to support it. Hopefully the next step will be for him to clarify the IP terms for the specification and to decide whether or not he wants to eventually submit it for standardization. Maybe to the W3C SML working group? :-) I’d have a hard time resisting joining if he did.

12 Comments

Filed under CMDB, CMDBf, Everything, IT Systems Mgmt, Modeling, RDF, Semantic tech, SML, SPARQL, Specs, Tech, W3C

All I know about RDF/OWL I learned in preschool

I don’t want to seem pretentious, but back in preschool I was a star student. At least when it came to potatoes. I am not sure what it’s called in US preschools, but what we meant by a potato, in my French classroom, was an oval shape in which you put objects. The typical example had two overlapping ovals, one for green things and the other for animals. A green armchair goes in the non-overlapping part of the “green” oval. A lion goes in the non-overlapping part of the “animal” oval. A green frog goes in the intersection. A non-green bus goes outside of both ovals. Etc.

As you probably remember, there are many variations on this, including cases where more than two ovals overlap. The hardest part was when we had to draw the ovals ourselves as opposed to positioning objects in pre-drawn ovals: we had to decide whether to make these ovals overlap or not. Typically they would first be drawn separately until an object that belonged to both would come up, prompting some head-scratching and, hopefully, a redrawing of the boundaries. Some ovals were even entirely contained within a larger oval! Hours of fun! I loved it.

[Side note: meanwhile, of course, the cool kids were punching one another in the face or stealing somebody’s lunch money. But they are now stuck with boring million-dollar-a-year jobs as cosmetic surgeons or Wall Street bankers (respectively) while I enjoy the glamorous occupation of modeling IT systems. Who’s laughing now?]

To a large extent, these potatoes really are all you need to understand about RDFS and OWL classes. OO people, especially, are worried about “multiple inheritance”. But we are not talking about programmatic objects here, in which inheritance brings methods with it. Just about intersecting potatoes. Subclassing is just putting a potato inside another one. Unions and intersections are just misshaped potatoes made by following the contours of existing potatoes. How hard can all that be?

Sure there are these “properties” you’ve heard about, but that’s just adding an arrow to show that the lion is sitting on the armchair. Or eating the frog.
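
For anyone who wants to see the potatoes written down, here is a small sketch (Python with rdflib, made-up URIs): two overlapping potatoes, a potato contained in another one, an intersection, and an arrow between two of the objects.

    from rdflib import Graph, Namespace, BNode
    from rdflib.namespace import RDF, RDFS, OWL
    from rdflib.collection import Collection

    EX = Namespace("http://example.org/preschool#")
    g = Graph()

    # The two potatoes, and a potato entirely contained in another one.
    g.add((EX.GreenThing, RDF.type, OWL.Class))
    g.add((EX.Animal, RDF.type, OWL.Class))
    g.add((EX.Frog, RDFS.subClassOf, EX.Animal))

    # The overlap: green animals, as an owl:intersectionOf the two potatoes.
    members = BNode()
    Collection(g, members, [EX.GreenThing, EX.Animal])
    g.add((EX.GreenAnimal, RDF.type, OWL.Class))
    g.add((EX.GreenAnimal, OWL.intersectionOf, members))

    # Objects in the potatoes, plus an arrow between two of them.
    g.add((EX.kermit, RDF.type, EX.Frog))
    g.add((EX.kermit, RDF.type, EX.GreenThing))
    g.add((EX.leo, RDF.type, EX.Animal))
    g.add((EX.leo, EX.sittingOn, EX.greenArmchair))

    print(g.serialize(format="turtle"))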

Just don’t bring up the fact that these arrows can themselves be classified inside their own potatoes, or the school bully (Alex Emmel) will get you.

4 Comments

Filed under Everything, OWL, RDF, Semantic tech

Mapping CIM associations to CMDBf relationships

This post started as a comment on the blog of Van Wiles. When it became too long (and turned into a therapeutic rant at the end) I turned it into a blog post of its own. Please, read Van’s post first. Here is my response to him:

Hi Van. Sounds like what you are after is not a mapping of the CIM_Dependency association to a CMDBf record type (anyone can make up such a mapping as you point out), but a generic algorithm to map any CIM association to a corresponding CMDBf relationship record type. Correct? That algorithm needs to handle the fact that the CIM metamodel has the concept of relationship roles while the CMDBf metamodel doesn’t.

Here is a possible such mapping:

  1. Take a CIM association (called “myAssociation”) that has two roles (called “thisOne” and “theOtherOne”).
  2. Take the item whose role name comes first alphabetically and make it the source (in this example, it is “theOtherOne”)
  3. Take the item whose role name comes second alphabetically and make it the target (in this example, it is “thisOne”)
  4. Generate a CMDBf record type called “{associationName}_from_{firstRoleNameAlphabetically}_to_{secondRoleNameAlphabetically}”

You’re done. The new CMDBf record type is “myAssociation_from_theOtherOne_to_thisOne”, the source is the item with the role “theOtherOne” and the target is the item with the role “thisOne”. Everyone who follows this algorithm (of course it needs to be formally defined and evangelized, there is no guarantee here unless we bake CIM-specific concepts in the core CMDBf specification, which would be a mistake) will produce the same CMDBf relationship record type for a given CIM association.

Applied to the CIM_Dependency example, this would generate a “CIM_Dependency_from_Antecedent_to_Dependent” CMDBf record type, in which the source is the CIM Antecedent and the target is the CIM Dependent.
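
For illustration only, here is a rough sketch in Python of the algorithm described above; the function name and the dictionary-based input are mine, not anything defined by CIM or CMDBf.

    def cim_association_to_cmdbf_record_type(association_name, role_to_item):
        """Map a two-role CIM association to a CMDBf relationship record type.

        role_to_item: dict of {role_name: item}, expected to have exactly two entries.
        Returns (record_type_name, source_item, target_item).
        """
        # Alphabetical order of the role names decides source and target.
        first_role, second_role = sorted(role_to_item)
        record_type = f"{association_name}_from_{first_role}_to_{second_role}"
        return record_type, role_to_item[first_role], role_to_item[second_role]

    # Applied to the CIM_Dependency example:
    record_type, source, target = cim_association_to_cmdbf_record_type(
        "CIM_Dependency", {"Antecedent": "item-A", "Dependent": "item-B"}
    )
    print(record_type)  # CIM_Dependency_from_Antecedent_to_Dependent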

Alternatively, you can have the algorithm generate two CMDBf relationship record types (one going in each direction) for each CIM association. So you don’t have to arbitrarily pick the first one (alphabetically) as the source. But then you need to have model metadata to capture the fact that these relationships are the inverse of one another (and imply one another). As you well know, I have been advocating for the use of RDF/RDFS/OWL in CMDBf for a while. :-)

In the end, there are three potential approaches:

1) Someone (the CMDBf group or someone else) creates an authoritative mapping for all CIM associations (or at least all the useful ones) and we expect anyone who uses the CIM model with CMDBf to use that mapping.

2) Someone (again, the CMDBf group or someone else) defines a normative CIM to CMDBf mapping, e.g. the one above, and we expect anyone who generates a CMDBf relationship record type from a CIM association to use this mapping algorithm. From a pure logical perspective, it is the same as defining a CMDBf record type for each CIM association (approach 1), but it is less work and it doesn’t have to be updated every time a CIM association is created/versioned. At the cost of uglier (more arbitrary) CMDBf record types being defined.

3) We let people define the relationships in whatever way they choose and we provide a model metadata framework (aka ontology language) to allow mappings between these approaches. For example, you define, in your namespace, a van:CIM-inspired-dependency CMDBf record type that goes from antecedent to dependent. Separately, I define, in my namespace, a william:CIM-like-dependency CMDBf record type that carries the same semantics (defined by CIM, not very precisely BTW, but that’s a different topic) except that its source is the dependent and its target is the antecedent. The inverse of yours. A suitable ontology language would allow someone (you, me, or a third party who has to assemble a system that uses both relationship types) to assert that mine is the inverse of yours. Once this assertion is captured, a request for any [A]—(van:CIM-inspired-dependency)—>[B] would also return the instances of [B]—(william:CIM-like-dependency)—>[A] because they are known to be the same. And you know how I am going to conclude, of course: OWL (specifically owl:inverseOf) provides just this.
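
To make the owl:inverseOf point concrete, here is a small sketch (Python with rdflib, hypothetical namespaces) that declares the two relationship types as inverses and materializes the implied triples with a single SPARQL CONSTRUCT instead of a full OWL reasoner.

    from rdflib import Graph, Namespace
    from rdflib.namespace import OWL

    VAN = Namespace("http://example.org/van#")
    WILLIAM = Namespace("http://example.org/william#")

    g = Graph()
    # Two relationship types defined in different namespaces, declared inverses.
    g.add((WILLIAM["CIM-like-dependency"], OWL.inverseOf, VAN["CIM-inspired-dependency"]))
    # An instance asserted with only one of the two vocabularies.
    g.add((VAN.antecedentItem, VAN["CIM-inspired-dependency"], VAN.dependentItem))

    # Materialize the inverse triples so a query phrased either way returns the data.
    # (owl:inverseOf is symmetric; this sketch only handles the declared direction.)
    inferred = g.query("""
        PREFIX owl: <http://www.w3.org/2002/07/owl#>
        CONSTRUCT { ?b ?q ?a }
        WHERE { ?q owl:inverseOf ?p . ?a ?p ?b . }
    """)
    for triple in inferred:
        g.add(triple)

    print(list(g.triples((None, WILLIAM["CIM-like-dependency"], None))))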

BTW, approach 3 is not incompatible with 1 or 2. Whether or not we define mappings for CIM relationships and whether or not that mapping gets adopted, there will be plenty of cases in a federated scenario in which you need to reconcile models (CIM-based or not). Model metadata (aka an ontology language) is useful anyway.

Readers who only care about the technical aspects and have little time for rants can stop reading here. But, since I haven’t addressed any constructive criticism to the DMTF in a while, I can’t resist the opportunity to point out that if the mailing list archives for the DMTF working groups were publicly available, we wouldn’t have to have these discussions on our personal blogs. I am very glad that Van posted this on his blog because it is a question that many people will have. Whatever the CMDBf specification ends up doing, developers and architects who make use of it will benefit from having access to the deliberations and considerations that resulted in the specification being what it is. There are many emails in the CMDBf mailing list private archive that I am sure would be useful to future CMDBf implementers, but if they don’t show up on Google they don’t exist for any practical purpose. When grappling with the finer points of some specification or programming language I have often Googled my way into email archives (or old specification drafts) of the working groups that designed them. Sometimes I come out thinking “oh, ok, now I understand why they chose that approach” and other times it’s “ok, that’s what I suspected, these guys were high”. Either way, it’s useful to me as a user of the specification. W3C is the best example (of making working group records available, not of being high): not only is the mailing list available but the phone meetings often have a supporting IRC channel in which key points of the discussion get captured and archived. Here is an example. Making life easier for implementers is probably the single most important thing to make a specification successful. And ultimately, that’s the DMTF’s success too.

And it’s not just for developers and architects. It also impacts industry observers and pundits. Like the IT Skeptic who looked into CMDBf and reported “nothing on the DMTF website but press releases. try to find anything by navigating from the homepage”. And you wonder why his article is titled “the CMDB Federation proceeeds (sic) at its usual glacial pace”. There is good work going on, but there is no way for him to see it. This too is bad for the adoption and credibility of DMTF specifications.

Isn’t it ironic that the DMTF expends resources to sponsor a “hospitality suite” at the Burton Group Catalyst conference (presumably to spread the word about the good work taking place in the organization) but fails to make it easy for the industry to see that same good work taking place? It’s like a main street retail shop that advertises in the newspaper but covers its store window with cardboard, preventing passersby from seeing what’s on offer. I notice that all the other “hospitality suites” seem to be staffed by for-profit vendors (Oracle, IBM, Cisco, Microsoft etc are all there). Somehow W3C and OASIS (whose work is very relevant to some of the conference themes, like identity management and SOA) don’t feel the need to give away pens and key chains at the conference.

Dear DMTF, open source is not just good for code.

2 Comments

Filed under CA, CMDB Federation, CMDBf, Conference, DMTF, Everything, IT Systems Mgmt, Mgmt integration, Modeling, RDF, Semantic tech, Specs, Standards, Trade show, W3C

I have seen the future of CMDBf

I got a sneak peek at CMDBf v2 today.

I am calling it v2 based on the assumption that the one being currently standardized in DMTF will end up being called 1.0 (because it’s the first one out of DMTF) or 1.1 (to prevent confusion with the submitted version).

At the Semantic Technology Conference, David Booth from HP presented his work (along with his partner, Steve Battle from HP Labs) to provide a SPARQL front-end to HP’s Universal CMDB (the engine under what was the Mercury MAM product). Here are the slides.

The mapping from SPARQL to TQL (the native query interface for UCMDB) was made pretty easy by the fact that TQL is a graph-oriented query language. How much harder would it be to similarly transform a CMDBf (v1) query interface into a SPARQL query interface (and vice-versa)? Not much. The only added difficulty would come from the CMDBf XPath constraints. TQL has a property value mechanism that is very similar to CMDBf’s “propertyValue” constraint and maps well to SPARQL functions. The introduction of XPath as a constraint language in CMDBf makes things harder. It could be handled by adding XPath support to the SPARQL engine using function extensibility. Or by turning the entire XML into RDF and emulating XPath in SPARQL. But in either case, you’ll have an impedance mismatch at some point because concepts such as element order that exist in XPath have no native equivalent in RDF.
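
Setting XPath aside, the propertyValue-style constraints really are the easy part. As a rough illustration (Python with rdflib, made-up vocabulary, not actual TQL or CMDBf syntax), here is what such a constraint might look like as a plain SPARQL FILTER.

    from rdflib import Graph, Namespace, Literal

    EX = Namespace("http://example.org/cmdb#")
    g = Graph()
    g.add((EX.host1, EX.status, Literal("running")))
    g.add((EX.host2, EX.status, Literal("stopped")))

    # "propertyValue"-style constraint: items whose status equals "running".
    results = g.query("""
        PREFIX ex: <http://example.org/cmdb#>
        SELECT ?item WHERE {
            ?item ex:status ?value .
            FILTER (?value = "running")
        }
    """)
    for row in results:
        print(row.item)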

The use of XPath in selectors on the other hand is not a problem. HP’s prototype uses Gloze (available as a Jena package) to turn the XML returned by UCMDB into RDF. An XSLT transform could turn that same XML into a CMDBf-valid XML response instead and that XSLT could easily handle the XPath selectors from the query request. This is another reason why constraints and selectors should remain separate in CMDBf (fortunately the specification is back to doing this properly).

Here is why I call this prototype CMDBf v2: The CMDBf effort (v1 or 1.1), in its current form of re-inventing a graph query, can succeed. Let’s assume the working group strikes a reasonable balance between completeness and complexity, and vendors choose to compete on innovation and execution rather than lock-in (insert cynical comment here). CMDBf may then end up being supported by the main CMDB vendors. It wouldn’t provide federation capabilities, but having a common CMDB query interface supported by the Big Four would help with management integration. And yet, while the value would be real, it would only provide a little help to solve a larger problem:

  • As a technology limited to IT systems management, it would be unlikely to see widely available tools (e.g. user consoles and language-specific libraries).
  • It wouldn’t get the kind of robustness and interoperability that comes from wide adoption. While the implementations will be pretty similar, there will likely be minor differences among them. Once your implementation has been tweaked to work with the implementations from the Big Four, you’ll call it done. Just like SNMP, another technology that is specific to IT systems management (see it happen here).
  • Even if it works perfectly at the query level, it will just hasten the time when developers run into the real problem, model interoperability. CMDBf doesn’t help at all with this. In fact, it makes it harder by hard-coding some dependencies on an XML back-end (the XPath constraints).

In the long run, IT management has to become more automated and integrated. That’s a given. The way it happens may or may not go through CMDB-like configuration stores. But if it does, we’ll have to eventually move beyond CMDBf (v1) towards something that addresses the three requirements above. And federation. I don’t know if it will be called CMDBf v2, and/or if it will come from the DMTF (by then, the CMDBf brand might be an asset or a liability depending on developer experience with the specification). But I strongly suspect (“probability 0.8” as a Gartner analyst might put it) that it will use semantic technologies. Because the real, hard, underlying problem is a problem of semantic integration. In that sense, David and Steve’s prototype is a sneak peek at what will come after CMDBf v1/1.1.

Pretty much since the beginning of CMDBf I have been pushing for it to ideally embrace SPARQL (with no success) or to at least stay close to it conceptually in order to make the eventual mapping/evolution smooth (with a bit more success). This includes pushing for a topological query language, trying to keep XML idiosyncrasies at bay and keeping constraints and selectors cleanly separated. Rather than working within the CMDBf group, David took the alternative approach of simply doing it. Hopefully this will help convince people of the value of re-using semantic web technology for IT systems management. Yes semantic technologies have been designed for a much more general use case. But the use cases that CMDB systems address are a subset of the use cases addressed by semantic technologies. It’s hard for domain experts to see their domain as just a subset of a larger problem, but this is the case here. Isn’t HTTP serving the IT management community better than a systems management-specific alternative would?

By the way, there is no inferencing taking place in the HP prototype. We are just talking about re-using an existing, well thought-through graph query language. Sure, OWL inferencing and some rules could be seamlessly layered on top of this. But this is in no way required to do (better) what CMDBf v1 tries to do.

And then there is the “federation” question. Who do you trust more to deliver this? A bunch of IT system management architects in DMTF or the web and query experts at W3C, HP Labs etc who designed and implemented SPARQL over many years? BTW, it sounds like SPARQL federation was discussed at WWW 2008, based on these meeting notes (search for “federation”).

2 Comments

Filed under Automation, CMDB, CMDB Federation, CMDBf, Conference, DMTF, Everything, Graph query, HP, IT Systems Mgmt, Query, RDF, Semantic tech, SPARQL, Standards, W3C, XPath

At the Semantic Technology Conference this week

I am going to spend some time, starting tomorrow, at the 2008 Semantic Technology Conference in San Jose. I have a few ideas (on ways to use semantic technologies for IT management) that I am trying to validate, improve or kill, as appropriate. I will of course be interested in any discussion/presentation related to applying semantic technologies to IT management. But I am going there in a pretty open frame of mind. Peripheral vision is important when considering technology that is so generic in nature and that has been used in many different fields. In general, I am more interested in concrete projects, lessons learned, best practices etc than new raw technology. There is already a lot more raw semantic web technology available than we can hope to make use of in IT management anytime soon. The question is more around the best ways to practically and opportunistically deliver value in a field that has plenty of existing relational schemas, XML descriptors, semi-standard class models, semi-structured events and in-Joe’s-head-only domain knowledge nuggets floating around.

If you are going to go there and have any interest in the application of semantic technologies to IT management, I’d be happy to hook up.

Comments Off on At the Semantic Technology Conference this week

Filed under Conference, Everything, IT Systems Mgmt, OWL, RDF, Semantic tech

Where will you be when the Semantic Web gets Grid’ed?

I see the tide rising for semantic technologies. On the other hand, I wonder if they don’t need to fail in order to succeed.

Let’s use the Grid effort as an example. By “Grid effort” I mean the work that took place in and around OGF (or GGF as it was known before its merger w/ EGA). That community, mostly made up of researchers and academics, was defining “utility computing” and creating related technology (e.g. OGSA, OGSI, GridFTP, JSDL, SAGA as specs, Globus and Platform as implementations) when Amazon was still a bookstore. There was an expectation that, as large-scale, flexible, distributed computing became a more pressing need for the industry at large, the Grid vision and technology would find their way into the broader market. That’s probably why IBM (and to a lesser extent HP) invested in the effort. Instead, what we are seeing is a new approach to utility computing (marketed as “cloud computing”), delivered by Amazon and others. It addresses utility computing with a different technology than Grid. With X86 virtualization as a catalyst, “cloud computing” delivers flexible, large-scale computing capabilities in a way that, to the users, looks a lot like their current environment. They still have servers with operating systems and applications on them. It’s not as elegant and optimized as service factories, service references (GSR), service handles (GSH), etc but it maps a lot better to administrators’ skills and tools (and to running the current code unchanged). Incremental changes with quick ROI beat paradigm shifts 9 times out of 10.

Is this indicative of what is going to happen with semantic technologies? Let’s break it down chronologically:

  1. Trailblazers (often faced with larger/harder problems than the rest of us) come up with a vision and a different way to think about what computers can do (e.g. the “computers -> compute grid” transition).
  2. They develop innovative technology, with a strong theoretical underpinning (OGSA-BES and those listed above).
  3. There are some successful deployments, but the adoption is mostly limited to a few niches. It is seen as too complex and too different from current practices for broad adoption.
  4. Outsiders use incremental technology to deliver 80% of the vision with 20% of the complexity. Hype and adoption ensue.

If we are lucky, the end result will look more like the nicely abstracted utility computing vision than the “did you patch your EC2 Xen images today” cloud computing landscape. But that’s a necessary step that Grid computing failed to leapfrog.

Semantic web technologies can easily be mapped to the first three bullets. Replace “computers -> compute grid” with “documents/data -> information” in the first one. Fill in RDF, RDFS, OWL (with all its flavors), SPARQL etc as counterparts to OGSA-BES and friends in the second. For the third, consider life sciences and defense as niche markets in which semantic technologies are seeing practical adoption. What form will bullet #4 take for semantic technology (e.g. who is going to be the EC2 of semantic technology)? Or is this where it diverges from Grid and instead gets adopted in its “original” form?

1 Comment

Filed under Everything, Grid, HP, IBM, RDF, Research, Semantic tech, Specs, Standards, Tech, Utility computing, Virtualization

Ontology governance?

How do organizations that make heavy use of ontologies implement design-time governance?

A quick search tonight didn’t return much on that topic (note to Google: “governance” is not synonymous with “government”). Am I imagining a problem that doesn’t really exist? Am I too influenced by SOA governance use cases?

Sure, a lot of the pain that SOA governance tries to address is self-inflicted. When used to deal with contract versioning hell (hello XSD) and brittle implementations (hello stub generation), SOA governance is just a bandage on a self-shot foot. Thanks to the open-world assumption, deliberate modeling decisions (e.g. no ordering unless required), a very simple metamodel and maybe some built-in versioning support (e.g. owl:backwardCompatibleWith, owl:incompatibleWith, owl:priorVersion, owl:versionInfo, etc, with which I don’t have much experience), in the RDF/OWL world these self-inflicted wounds are a lot less likely.
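
For what it’s worth, here is what those versioning annotations look like in practice, as a tiny sketch (Python with rdflib, hypothetical ontology URIs).

    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import OWL, RDF

    g = Graph()
    v2 = URIRef("http://example.org/itmodel/2.0")
    v1 = URIRef("http://example.org/itmodel/1.0")

    # Ontology header carrying the OWL versioning annotations.
    g.add((v2, RDF.type, OWL.Ontology))
    g.add((v2, OWL.versionInfo, Literal("2.0")))
    g.add((v2, OWL.priorVersion, v1))
    g.add((v2, OWL.backwardCompatibleWith, v1))

    print(g.serialize(format="turtle"))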

On the other hand, there are aspects of SOA governance that are more than lipstick on a pig and it seems that some of those should apply to ontology governance. You still need to discover artifacts. You may still have incompatible versions. You still have to deal with the collaborative aspects of having different people responsible for different parts. You may still need a review/approval process, or other lifecycle aspects. And that’s just at the ontology design level. At the service level you have the same questions of discovery, protocol, query capabilities, etc.

What are the best practices for this in the semantic world? What are the tools? Or alternatively, why is this less important than I think?

Maybe this upcoming book will have some answers to these practical concerns. It was recommended to me by someone who reviewed a draft and had good things to say about it, if not quite as enthusiastically as Toru Ishida from Kyoto University (from the editorial reviews on Amazon):

“At the time when the world needs to find consensus on a wide range of subjects, publication of this book carries special importance. Crossing over East-West cultural differences, I hope semantic web technology contributes to bridge different ontologies and helps build the foundation for consensus towards the global community.”

If semantic technologies can bring world peace, surely they can help with IT management integration…

PS: if you got to this page through a Web search along the lines of “XSD versioning hell” I am sorry that you won’t find the solution to your problems here. Tim Ewald and Dave Orchard have recommendations for you. But the fact that XML experts like Tim and Dave have some kind of (partial) workarounds doesn’t mean the industry doesn’t have a problem. Or you can put your hopes in XSD 1.1 if you are so inclined (see these slides for an overview of the versioning-related changes).

1 Comment

Filed under Everything, Governance, OWL, Semantic tech

Another IT event standard? I’ll believe it when I CEE it.

Looks like there is yet another attempt to standardize IT events. It’s called the Common Event Expression (CEE). My cynicism would have prevented me from paying much attention to it (how many failed attempts at this do we really need?) if I hadn’t noticed an “event taxonomy” as the first deliverable listed on the home page. These days I am a sucker for the T word. So I dug around a bit and found out that they have a publicly-archived mailing list on which we can see a working draft of a CEE white paper. It looks pretty polished but it is nonetheless a working draft and I am keeping this in mind when reading it (it wouldn’t be fair to hold the group to something they haven’t yet agreed to release).

The first reassuring thing I see (in the “prior efforts” section) is that they are indeed very aware of all the proprietary log formats and all the (mostly failed) past standardization attempts. They are going into this open-eyed (read the “why should we attempt yet another log standard event” section and see if it convinces you). I should disclose that I have some history with one of these proprietary standards (and failed standardization attempts) that probably contributes to my cynicism on the topic. It took place when IBM tried to push their proprietary CBE format into WSDM, which they partially succeeded in doing (as the WSDM Event Format). This all became a moot point when WSDM stalled, but I had become pretty familiar with CBE in the process.

The major advance in CEE is that, unlike previous efforts, it separates the semantics (which they propose to capture in a taxonomy) from the representation. The paper is a bit sloppy at times (e.g. “while the syntax is unique, it can be expressed and transmitted in a number of different ways” uses, I think, “syntax” to mean “semantics”) but that’s the sense I get. That’s nice but I am not sure it goes far enough.

The best part about having a blog is that you get to give unsolicited advice, and that’s what I am about to do. If I wanted to bring real progress to the world of standardized IT logging, I would leave aside the representation part and focus on ontologies. At two levels: first, I would identify a framework for capturing ontologies. I say “identify”, not “invent”, because it has already been invented and implemented. It’s just a matter of selecting relevant parts and explaining how they apply to expressing the semantics of IT events. Then I would define a few ontologies that are applicable to IT events. Yes, plural. There isn’t one ontology for IT events. It depends both on what the events are about (networking, applications, sensors…) and what they are used for (security audit, performance analysis, change management…).

The thing about logs is that when you collect them you don’t necessarily know what they are going to be used for. Which is why you need to collect them in a way that is as close to what really happened as possible. Any transformation towards a more abstracted/general representation loses some information that may turn out to be needed. For example, messages often have several potential ID fields (transport-level, header, application logic…) and if you pick one of them to map it to the canonical messageId field you may lose the others. Let logs be captured in non-standard ways, focus on creating flexible means to attach and process common semantics on top of them.

Should I be optimistic? I look at this proposed list of CEE fields and I think “nope, they’re just going to produce another CBE” (the name similarity doesn’t help). Then I read “by eliminating subjective information, such as perceived impact or importance, sometimes seen in current log messages…” in the white paper draft and I want to kiss (metaphorically, at least until I see a photo) whoever wrote this. Because it shows an understanding of the difference between the base facts and the domain-specific interpretations. Interpretations are useful of course, but should be separated (and ideally automatically mapped to the base facts using ontology-driven rules). I especially like this example because it illustrates one of the points I tried to make during the WSDM/CBE discussions, that severity is relative. It changes based on time (e.g. a malfunction in an order-booking system might be critical towards the end of the quarter but not at the beginning) and based on the perspective of the event consumer (e.g. the disappearance of a $5 cable is trivial from an asset management perspective but critical from an operations perspective if that cable connects your production DB to the network). Not only does CBE (and, to be fair, several other log formats) consider the severity to be intrinsic to the event, it also goes out of its way to say that “it is not mutable once it is set”. Glad to see that the CEE people have a better understanding.

Another sentence that gives me both hope and fear is “another, similar approach would be to define a pseudo-language with subjects, objects, verbs, etc along with a finite set of words”. That’s on the right track, but why re-invent? Doesn’t it sound a lot like subject/predicate/object? CEE is hosted by MITRE, which has plenty of semantic web expertise. Why not take these guys out to lunch one day and have a chat?
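
To illustrate what “subjects, objects, verbs” looks like when you don’t re-invent it, here is a rough sketch (Python with rdflib, made-up event vocabulary) of a log event captured as base-fact triples, with a consumer-specific severity attached separately by a rule rather than baked into the event.

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF, XSD

    EV = Namespace("http://example.org/events#")

    g = Graph()
    event = EV.event42
    # Base facts: what actually happened, as reported by the source.
    g.add((event, RDF.type, EV.OrderBookingFailure))
    g.add((event, EV.reportedBy, EV.orderBookingSystem))
    g.add((event, EV.occurredAt, Literal("2008-09-29T23:55:00", datatype=XSD.dateTime)))

    # Interpretation: a consumer-specific rule applied on top of the base facts.
    # (Here: booking failures near quarter end are critical for the sales consumer.)
    is_quarter_end = True  # in practice, derived from the occurredAt value
    if (event, RDF.type, EV.OrderBookingFailure) in g and is_quarter_end:
        g.add((event, EV.severityForSalesOps, Literal("critical")))

    print(g.serialize(format="turtle"))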

More thoughts on CEE (and its relationship with XDAS) on the Burton Group blog.

Let’s finish on a hopeful note. The “CEE roadmap” sees three phases of adoption for the taxonomy work. The second one is “publish a taxonomy and talk to software vendors for adoption”. The third one is “increase adoption of taxonomy across various logs; have vendors map all new log messages to a taxonomy”. Wouldn’t it be beautiful if it was that simple and free of politics? I wonder if there is a chapter about software standards in The Audacity of Hope.

4 Comments

Filed under Everything, IT Systems Mgmt, Semantic tech, Standards

Oracle semantic technologies resources

I have started to look at the semantic technologies available in Oracle’s portfolio and so far I like what I see. At HP, I had access to top experts on semantic technologies (mostly from HP Labs) but no special product (not counting Jena which is available to everyone). At Oracle, I find both top experts and very robust products. If you too are looking into Oracle’s offering related to semantic technologies, here are a few links to publicly-available resources that I have found useful. This is filtered based on my interests (yours may be different, for example I skip content related to life sciences applications).

The main page (and what should be your starting point on that topic) is the Semantic Technologies Center on OTN. Most of the other resources listed below are only a click or two away from there. The Semantic Technologies Forum is the right place for questions. The Semantic Web page on the Oracle Wiki doesn’t contain much right now but that may change.

For an overview of the semantic technology capabilities and their applicability, start with Semantic Data Integration for the Enterprise (white paper) and Why, When, and How to Use Oracle Database 11g Semantic Technologies (slides). Then look at Enterprise Semantic Web in Practice (slides) for many real-life examples.

When you are ready to take advantage of the Oracle semantic technologies capabilities, start with The Semantic Web for Application Developers (slides) followed by RDF Support in Oracle RDBMS (these are more detailed slides, but they seem based on 10gR2 rather than 11g and are therefore less complete; no OWL, for example). Then grab a thermos of coffee and lock yourself in the basement for a while with the Oracle Database Semantic Technologies Developer’s Guide (also available as a hundred-page PDF).

At that point, you may choose to look into the design choices (with performance analysis) that were made in the Oracle implementation by reading A Scalable RDBMS-Based Inference Engine for RDFS/OWL. There is also a Wiki page on OWLPrime to describe the subset of OWL supported in 11g. Finally, you can turn to the Inference Best Practices with RDFS/OWL white paper for tuning tips on 11g.

To get the actual bits, you can download the Oracle 11g Database on OTN. The semantic technologies support is in the Spatial option for the database, which is included in the base download.

I will keep updating this page as interesting new resources are created (or as I discover existing ones). For resources on semantic technologies in general (not Oracle-specific) good sources are Dave Beckett’s page, the W3C (list of resources or standardization activities) or the Cover Pages.

1 Comment

Filed under Everything, Oracle, OWL, RDF, Semantic tech