The art of reconciling items in your IT management model

Whether you call it a CMDB or some other name, any repository of IT model elements has the problem of establishing whether two entities are the same or not. Here is a quick map of the problem space.

Why there is no “one true solution”

There is no “true” answer to the “sameness” question. The following example illustrates this, even though it is not necessarily representative of datacenter scenarios. Ask any gamer to tell you the history of that 3-year-old PC under their desk. The power cord might be the only original piece left after they’ve upgraded the RAM, video card, sound card, hard disk(s), DVD drive and power supply. Not to mention the tragic overclocking accident that took the life of the motherboard/CPU. After the upgrade/replacement of each of these parts, the user still thought of the machine as the same PC, just upgraded/fixed. But how can it be the same as at the beginning if pretty much every single part has changed? And when the time came to reinstall Windows, the registration probably failed because Microsoft decided that the same license was being used for a new machine. Sameness is in the eye of the beholder.

And it’s not just a hardware problem. When you upgrade your Oracle Database and start using a new ORACLE_HOME, it may feel like the same database to most users (including the applications that talk to the database) but a more executable-centric view might conclude that it is a new database.

Defining what makes an IT element unique is not a matter of truth. It’s a matter of usefulness for a given purpose. When trying to establish this for your model, if the conversation ever veers philosophical, you’re off track. This is engineering, not science. “A and B are the same” should be understood to be a shortcut for “it makes sense for my purpose to consider A and B to be the same”. Of course things become complicated when “my purpose” encompasses a whole set of use cases (add “management” after each of: performance, compliance, configuration, change, asset, business service, business transaction…).

How the problem arises

It can arise over time. For example, a management agent has to be reinstalled and it forgets the id that had been assigned to it by the server. When it comes back up, it reports what looks like a new item. But you want to reconcile that item with the historical data that came from the agent’s previous incarnation.

It can arise because you have different discovery channels for the same item. For example, a BPEL engine reports to the management server the processes it supports and that model includes the external Web services (partnerLink) invoked by the processes, thereby creating items for these external services in your repository. But some of those external services may be running on servers which you also monitor and the services (and more generally the applications that deliver them) may be separately discovered by the agents on these hosts, resulting in a potential duplicate representation in the repository.

Or the problem can be a result of the integration of IT management products. For example, that Dell server in my asset management system may be the same as the Linux host that runs my production database and appears in Oracle Enterprise Manager.

Fixing the mess

There are two stages to this:

First, you need some level of model alignment. In the general case, the different items that you are trying to reconcile are not expressed in the same model. The view of the server coming from the asset management system does not necessarily contain the same data as its view in your operations console. One contains the lease expiration date, the other one contains the amount of space left on the disk. Some data may be in both (e.g. number of CPUs, host name) but not necessarily in fields of the same name. Or with the same granularity (ownerName versus ownerFirstname plus ownerLastname). Not to mention type system differences (but if the items are already in the same repository you have presumably already forced some level of metamodel alignment). In short, you first have the challenge of model transformation, a more general problem. The advantage is that the entire model does not need to be translated for item reconciliation, only the subset of data needed to establish “sameness”: the identifying properties. And in some cases (e.g. when a standard model is used or when two instances of the same agent report on the item), the items to reconcile are already described in the same model and this step can be skipped.
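To make the alignment step concrete, here is a minimal Python sketch that projects two differently shaped records onto a shared set of identifying properties. The field names on the asset side (`Hostname`, `NumCPUs`) and on the console side (`host`, `cpus`) are made up for illustration, as are the sample values; only `ownerName`, `ownerFirstname` and `ownerLastname` come from the example above.

```python
def from_asset_system(record):
    """Project an asset-management record onto the shared identifying properties."""
    return {
        # Keep only the short, lower-cased host name so both sources agree.
        "host_name": record["Hostname"].strip().lower().split(".")[0],
        "cpu_count": int(record["NumCPUs"]),
        "owner_name": record["ownerName"].strip(),
    }


def from_operations_console(record):
    """Project an operations-console record onto the same properties."""
    return {
        "host_name": record["host"].strip().lower().split(".")[0],
        "cpu_count": int(record["cpus"]),
        # Different granularity: two fields on this side, one on the other.
        "owner_name": f'{record["ownerFirstname"]} {record["ownerLastname"]}'.strip(),
    }


asset_view = from_asset_system(
    {"Hostname": "DB01", "NumCPUs": "8", "ownerName": "Ada Lovelace",
     "leaseExpiration": "2009-06-30"}
)
ops_view = from_operations_console(
    {"host": "db01.example.com", "cpus": 8, "ownerFirstname": "Ada",
     "ownerLastname": "Lovelace", "diskFreeGb": 120}
)

# Lease date and free disk space are dropped: they say nothing about identity.
print(asset_view == ops_view)
```

Note that the rest of each record is left untranslated, which is the whole point: only the properties that matter for “sameness” need a common shape.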

Once the necessary level of model alignment has taken place (if needed) so that items can be compared, the real task of reconciliation takes place, based on domain knowledge. It could be through a set of scripts (Python’s mix of simplicity, portability, broad array of libraries and ease of integration makes it shine in this usage). It could be through some kind of reconciliation taxonomy, like the draft that IBM has contributed to the Eclipse COSMOS project. Or through metadata such as WSDM’s correlatable properties. [BTW, as the spec editor I got to insert dubious cultural references in the specification (see the <print:PrinterResourcePropDoc> example in the spec), but let me assure you that I have since matured… ;-)]
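As an illustration of the scripted approach, here is a small Python sketch that groups items by their identifying properties. The rule table, the property names and the sample items are all hypothetical; in practice the rules are exactly the domain knowledge mentioned above, and they would differ per item type and per use case.

```python
from collections import defaultdict

# Hypothetical rule table: for each item type, the properties that establish sameness.
IDENTIFYING_PROPERTIES = {
    "host": ("host_name", "serial_number"),
    "database": ("host_name", "sid"),
}


def identity_key(item):
    """Return a hashable key for the item, or None if it cannot be identified."""
    names = IDENTIFYING_PROPERTIES.get(item["type"])
    if names is None:
        return None
    values = tuple(item.get(name) for name in names)
    if any(value is None for value in values):
        return None  # an identifying property is missing: do not guess
    return (item["type"],) + values


def find_duplicates(items):
    """Group item ids that share an identity key; keep groups of two or more."""
    groups = defaultdict(list)
    for item in items:
        key = identity_key(item)
        if key is not None:
            groups[key].append(item["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


items = [
    {"id": "asset-17", "type": "host", "host_name": "db01", "serial_number": "X9"},
    {"id": "agent-42", "type": "host", "host_name": "db01", "serial_number": "X9"},
    {"id": "agent-43", "type": "host", "host_name": "web01", "serial_number": "Z3"},
    {"id": "agent-44", "type": "host", "host_name": "db01"},  # no serial: left alone
]
duplicates = find_duplicates(items)
print(duplicates)
```

Items that lack an identifying property are deliberately left unreconciled in this sketch. Whether to fall back on weaker matching in that case is a judgment call that depends on your purpose.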

These are not the only ways to reconcile items, but they are the approaches that can be followed based on just the data in the repository. Beyond that, you can run a dummy transaction and trace it (if possible) across different management systems to reconcile entities between them. There are plenty of other domain-specific tricks, depending on the item type (I remember a machine room, back in the days when each server had a CD drive, where a script to open the CD tray was used to allow the operator to put a sticker on the correct machine). In general, these approaches play on external variables that are not directly part of the model of the item and yet can be influenced through it. Similar to how the bulb temperature is used in the famous light bulb brain teaser. I guess the IT equivalent would be to load-stress an application and use IPMI to see which CPUs register a rise in temperature (note: not a recommended approach in production systems…).

Coming back to the IT model repository, you also need to have plumbing in place to deal with the result of the reconciliation: requests and data may come in that reference either one of the reconciled items and you need to be able to deal with that split personality, while providing a unified view in the general case. You also need to be ready to deal with potential data discrepancies between the items (either automatically or through a process that involves humans, but this is out of scope for this entry).
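One possible shape for that plumbing, sketched in Python under the assumption that every item has a string id: an alias registry that lets a request use any of the reconciled ids and still land on a single canonical item, plus a helper that lists the properties on which two views disagree. The class and function names are invented for this sketch.

```python
class AliasRegistry:
    """Maps every known item id to the canonical id of its reconciled group."""

    def __init__(self):
        self._canonical = {}

    def resolve(self, item_id):
        """Follow alias links until reaching an id that has no further alias."""
        while item_id in self._canonical:
            item_id = self._canonical[item_id]
        return item_id

    def merge(self, keep_id, alias_id):
        """Record that alias_id denotes the same item as keep_id."""
        keep_root = self.resolve(keep_id)
        alias_root = self.resolve(alias_id)
        if keep_root != alias_root:  # linking only roots avoids cycles
            self._canonical[alias_root] = keep_root


def discrepancies(first, second):
    """Return properties present in both views whose values disagree."""
    return {
        name: (first[name], second[name])
        for name in first.keys() & second.keys()
        if first[name] != second[name]
    }


registry = AliasRegistry()
registry.merge("asset-17", "agent-42")
registry.merge("agent-42", "bpel-partner-3")

# Any of the three ids now resolves to the same canonical item.
print(registry.resolve("bpel-partner-3"))
print(discrepancies({"host_name": "db01", "cpu_count": 8},
                    {"host_name": "db01", "cpu_count": 16}))
```

What to do with the reported discrepancies (pick a trusted source, keep both, open a ticket) is the part this sketch leaves out.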

Preventing the mess from happening

Can’t we just prevent the problem from occurring in the first place? To some extent yes. The main way to prevent it is to not reconcile what doesn’t need to be. This may sound heretical in these days of “single source of truth” and “end to end visibility” but reconciliation of key connection points is often enough. You may not need to have one single model that contains everything from your company’s employee directory to the fan speed of all your servers. It’s a matter of delivering on use cases, not hoarding data.

When you do want to consolidate and reconcile, one approach is to standardize on natural IDs for items of different types. But this requires domain experts to carefully select identifying (and therefore immutable) properties of the different object types, which sounds a lot easier than it is. And it requires convincing others to adopt this approach, an even harder task. But as the proverb (almost) goes, one ounce of convention is worth one pound of reconciliation.
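As a rough illustration of the natural ID idea, the Python sketch below derives a deterministic id from an item’s type and its identifying properties, so that two tools following the same convention arrive at the same id without talking to each other. The namespace name, the normalization rules and the choice of serial number and vendor as identifying properties are assumptions made for the example, not a recommendation.

```python
import uuid

# Hypothetical namespace for this repository; any fixed, agreed-upon UUID would do.
ITEM_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "cmdb.example.com")


def natural_id(item_type, **identifying_properties):
    """Derive a stable id from the item type and its identifying properties."""
    parts = [item_type] + [
        # Sort by property name and normalize values so that ordering,
        # case and stray whitespace do not change the resulting id.
        f"{name}={str(value).strip().lower()}"
        for name, value in sorted(identifying_properties.items())
    ]
    return uuid.uuid5(ITEM_NAMESPACE, "|".join(parts))


from_asset_tool = natural_id("host", serial_number="X9", vendor="Dell")
from_agent = natural_id("host", vendor="dell ", serial_number="x9")
other_host = natural_id("host", serial_number="Z3", vendor="Dell")

print(from_asset_tool == from_agent)
print(from_asset_tool == other_host)
```

The code is the easy part. The hard part, as noted above, is agreeing on which properties are truly identifying and immutable, and getting everyone to normalize them the same way.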

[Note: Whenever you talk about item reconciliation, the topic of correlating events is not far behind. It is assisted by a solid underlying IT model, but it has challenges of its own, so I’ll consider this out of scope for this discussion.]

