I am now an Oracle employee. My last day at HP was last Friday. I have a lot of excellent memories of my almost nine years there. And the company is (finally) very serious about software and investing a lot in it. HP Software is a very good place to be. But so is Oracle and the very interesting position I was offered convinced me that now was the time to go. So I am now in the Enterprise Manager group with the title of Architect. More specifically, I am in the part of EM that manages Middleware and Applications. Which also means that I’ll get to interact with the ex-Bluestone people who were my colleagues at HP Middleware and later joined Oracle’s application server team (like Greg). And I just learned today that David Chappell (with whom I collaborated on several specs) recently joined that group too. This is a happening place.
Monthly Archives: August 2007
Tutorial and pseudo-algorithm for CMDBF Query operation
[UPDATE: an updated version of the algorithm that complies to version 1.0 of the specification (and not 0.95 as in this post) is now available]
The CMDBF Query operation (section 4 of the CMDBF specification) quickly becomes very intuitive to use, but it can first look a little strange to people used to SQL-style queries, because of its graph-based nature. I hope the normative text and the examples in the spec mitigate this. But just in case, here is some additional information that doesn’t belong in the spec but can be useful. Think of it as a very first draft of a primer/tutorial about that Query interface.
The easiest way to think about this interface is to think graphically. Imagine that you’re not writing your query in XML, but instead creating it in a GUI. You want to find all Windows XP machines that are owned by someone in the marketing department. First you create a circle in the GUI that represents the machine (through a Visio-like drag-and-drop into the query composer window). You right-click on that circle and in the right-click menu you select the option called “type” and you set it to “computerSystem”. Then you right-click to select “add property constraint”. You enter “OS_Version” as the property name (if your GUI tools is any good it will present you with a list to choose from, based on the previously selected type) and “WindowsXP” as the value. Already you’ve created a query that selects all Windows XP machines. Not a bad start.
But you only want the machines that are owned by marketing people. So go ahead and create another circle to represent a person. Use similar right-click actions to set the type to “person” and to set the “department” property to “marketing”.
If you submit the query at this point, you’ll get a list of all Windows XP machines and all people in the marketing department. So, rather than reducing the result (by removing machines not owned by marketing people), you have expanded it (by adding records for the marketing people). Oops.
The next step is to create a relationship that constrains the machine to belong to someone in marketing. This is done very simply in our handy GUI tool by drawing an arrow that goes from the “person” circle to the “machine” circle. This requires that there be a relationship between the two. If we want to further ensure that this relationship is of type “owns” (and not “uses” for example), we do this by right-clicking on the arrow (like we do on circles), selecting “type” and setting its value to “owns”.
If we run the query now, we get the list of all Windows XP machines owned by marketing people. We also get the list of the marketing people who own these machines and we get the relationships between people and machines. So we now have want we wanted. But maybe a little more. Maybe we only care about the list of machines, we don’t want to retrieve all the data about the marketing people. As long as we know the machines are owned by marketing people, that’s all that we care about. We don’t need to know what specific person owns what specific machine. We could simply ignore the people and relationships in the response. Or we could enrich the query to specify that the people and relationship need not be returned (but they are still part of the query in the sense that they limit what machines get returned).We do this by right-clicking on the “person” circle, and selecting the “suppress” option. Similarly, select the “suppress” option on the relationship (the arrow) too.
This query will return a list of all Windows XP machines that are owned by someone in marketing, and nothing else.
Here is what the query could look like graphically (here I assume that the GUI tools represents the fact that the arrow and the “person” circle have the “suppress” option selected on them by turning their solid lines into dotted lines and their text into italics):

The most intuitive way to think about what happens when the query gets processed is that the program looks for all instances of the patterns described by the query. In other words, it tries to superimpose the requested graph everywhere on the graph of available data and selects all the instances where the requested graph “fits”.
What does the GUI tool do behind the scene to turn this query into the proper XML, as described by the spec?
For each circle, it creates an <itemTemplate> element. For each arrow, it creates a <relationshipTemplate> element and sets its <source> and <target> elements to the right item templates. For each constraint on a circle or arrow (i.e. when we set the type or when we set the value or a give property) it creates the appropriate selector and embeds it in the <itemTemplate> or <relationshipTemplate> that corresponds to this circle or arrow. Finally, it sets the @dropDirective attribute to “true” on all the <itemTemplate> and <relationshipTemplate> elements that corresponds to circles and arrows on which the “suppress” option was selected.
Here is what the resulting query looks like in XML:
<query xmlns="http://schemas.cmdbf.org/0-9-5/datamodel">
<itemTemplate id="machine">
<propertyValueSelector namespace="http://example.com/computerModel" localName="OS_Version">
<equal>Windows XP</equal>
</propertyValueSelector>
<recordTypeSelector namespace="http://example.com/computerModel" localName="computerSystem"/>
</itemTemplate>
<itemTemplate id="person" dropDirective="true">
<propertyValueSelector namespace="http://example.com/peopleModel" localName="department">
<equal>marketing</equal>
</propertyValueSelector>
<recordTypeSelector namespace="http://example.com/peopleModel" localName="person"/>
</itemTemplate>
<relationshipTemplate id="administers" dropDirective="true">
<recordTypeSelector namespace="http://example.com/computerModel" localName="owns"/>
<source ref="person"/>
<target ref="machine"/>
</relationshipTemplate>
</query>
Note: like all query language, the actual query depends of course on the underlying model. In this example, I assumed that the OS version is represented as a property of the machine. More commonly, the OS will be a node of its own that has a relationship with the machine. So you’d have another circle for the OS. With a property constraint on that circle (version=”WindowsXP”) and a line representing a “runs” relationship between the machine circle and the OS circle. Similarly, “marketing” could be a node of its own that people have a relationship with, rather than just a property of each person. None of this changes the logic behind the Query operation.
Now, this is nice for the user of the query, but what about the poor developer who gets the 50-pages spec thrown on his/her desk and has 2 weeks to make sense of it and implement the server side of the query? I’ve said above that the program “tries to superimpose the requested graph everywhere on the graph of available data and selects all the instances where the requested graph fits” but that’s a lot easier to write as a sentence than to implement. So here is a pseudo-algorithm to help.
1) for each itemTemplate calculate the set of all items that obey all the selectors in the template. Call this the working set for the itemTemplate.
2) for each relationshipTemplate calculate the set of all relationships that obey all the selectors in the template. Call this the working set for the relationshipTemplate.
3) set need_to_loop = true
4) while (need_to_loop == true)
4.1) set need_to_loop = false
4.2) for each relationshipTemplate RT4.2.1) let ITsource be the itemTemplate that is referenced as source by RT
4.2.2) let ITtarget be the itemTemplate that is referenced as target by RT
4.2.3) for each relationship R in the working set for RT4.2.3.1) if the source of R is not in the working set for ITsource, then remove R from the RT working set
4.2.3.2) if the target of R is not in the working set for ITtarget, then remove R from the RT working set
4.2.3.3) if RT has a source/@minimum or a source/@maximum attribute4.2.3.3.1) find the list L1 of all relationships in the working set for RT that have the same source as R
4.2.3.3.2) if RT has source/@minimum and the cardinality of L1 is less than this minimum then remove all relationships in L1 from the RT working set
4.2.3.3.3) if RT has source/@maximum and the cardinality of L1 is more than this maximum then remove all relationships in L1 from the RT working set4.2.3.4) if RT has a target/@minimum or a target/@maximum attribute
4.2.3.4.1) find the list L2 of all relationships in the working set for RT that have the same target as R
4.2.3.4.2) if RT has target/@minimum and the cardinality of L2 is less than this minimum then remove all relationships in L2 from the RT working set
4.2.3.4.3) if RT has target/@maximum and the cardinality of L2 is more than this maximum then remove all relationships in L2 from the RT working set4.3) for each itemTemplate IT
4.3.1) let sourceRTset be the set of all relationshipTemplates that references IT as its source
4.3.2) let targetRTset be the set of all relationshipTemplates that references IT as its target
4.3.3) for each item I in the IT working set4.3.3.1) for each relationshipTemplate sourceRT in sourceRTset, if there is no relationship in the working set for sourceRT that uses I as its source, remove I from the IT working set and set need_to_loop to true
4.3.3.2) for each relationshipTemplate targetRT in targetRTset, if there is no relationship in the working set for targetRT that uses I as its source, remove I from the IT working set and set need_to_loop to true
5) process all directives (dropDirective or propertySubsetDirective) to remove or pair down items and relationships as requested
6) package the resulting items and relationships in a way that conforms to the CMDBF response message format (including putting each item in the <nodes> element with the appropriate @templateId attribute and putting each relationship in the <edges> element with the appropriate @templateId).
There are all kinds of optimizations possible here (like breaking out of loops once an instance has been removed, or not recalculating L1 and L2 over and over again for relationships in the same working set that share a source/target), but this is the most basic form or the algorithm. The goal is to illuminate the spec, not to provide an optimal implementation strategy.
In this post, I have focused on describing and illustrating the topological aspects of the query language. The other concepts that come into play are the Selector and the Directive mechanisms. But these are a lot more familiar to people used to SQL and I think they are sufficiently explained in the spec. So I have assumed here (in steps 1, 2 and 5 of the pseudo-algorithm) that they are well understood.
Filed under CMDB Federation, CMDBf, Everything, Graph query, Pseudo-algorithm, Query, Specs, Standards, Tutorial
First release of the CMDBF specification
The CMDBF (CMDB Federation) group just released a first public draft of the specification. Here is a direct link to it (PDF, 1MB). That’s a very nice achievement for a group that was a bit slow to find its pace but has been very productive since the beginning of 2007. Having the interop test drive our efforts had a lot to do with it, this is an approach to repeat the next time around.
This spec will look a lot more familiar to people used to reading WS-* specs than to people used to reading ITIL manuals. The concepts and use cases are (hopefully) consistent with ITIL, but the meat of the spec is about defining interoperable SOAP-based message exchanges that realize these use cases. The spec is about defining SOAP payloads, not IT management best practices. Please adjust your expectations accordingly and you’ll like the spec a lot more.
The most useful and important part (in my mind at least), is the definition of the Query service (section 4). This is used in many interactions. It is used by clients to query a CMDB that federates many MDRs (Management Data Repositories). It is used by the clients to go interact directly with the MDRs is they choose to. And it is used by the federating CMDB to retrieve data from the MDRs. Even in the more static scenarios, in which data is replicated from the MDRs into the CMDB (instead of true federation with no replication), the Query service is still the way for clients to access federated data from the CMDB.
So, yet another query language? Indeed. But one that natively supports concepts that are key to the kind of queries most useful for CMDB scenarios, namely relationships traversal and types. This is a topological query language. I hope the simple example in section 4.2 gives a good example of what this means.
Neither SQL nor XPath/XQuery has this topology-friendly approach (which is not to say that you can’t create useful queries on CMDB data with these query languages). On the other hand, there is one query language that is inherently relationship-oriented, that has native support for the notion of class and that has received a lot more attention, interop testing and implementation experience than our effort. One that I would have loved for us to leverage in CMDBF. It’s SPARQL. But semantic web technologies seem once again to be doomed by the perception that they are too much “out there” despite all the efforts of its proponents to make them connect with XML and other non-RDF views of the world.
Final caveat, this is a first draft. It is known to be incomplete (you’ll find text boxes that describe known gaps) and not all features in the spec were tested at the interop. We need more interop, review and development before it is robust. And most importantly, a lot of the difficult aspects of federation aren’t sufficiently addressed (reconciliation, model differences, source tracing, administrative metadata…) But it is at a point where we think it gives a good idea of how we are approaching the problem.
Equipped with this foreword, I wish you a pleasant read.
Filed under CMDB Federation, CMDBf, Everything, Specs, Tech