Category Archives: Everything

Oracle acquires mValent for application config management

mValent will become part of Enterprise Manager. It comes to complement other recent acquisition in the application management space: Auptyma, Moniforce, Empirix, ClearApp.

The announcement and FAQ are here.

More details about the acquired product and technology are on the mValent site, including here and here.

1 Comment

Filed under Application Mgmt, CMDB, Everything, IT Systems Mgmt, Mgmt integration, Modeling, Oracle

UCI: setting RDF for failure?

I don’t get it. I just read Reuven Cohen’s description of the Unified Cloud Interface project that he recently started. It’s nothing less than using RDF to create “a Semantic Cloud Infrastructure capable of adapting to a variety of methodologies / architectures and completely agnostic to any specific API or platform being described.”

What made me fall off my chair is the methodology/architecture part of this statement. It’s hard enough (but doable) to use RDF to map philosophically similar APIs. It’s a non-starter to use it to bridge architectural and methodological differences. I have spent a fair amount of time looking at Semantic Web technologies in the context of modeling IT systems (see the “semantic tech” category of this blog). While I think they would be a great foundation I don’t see them ever coming anywhere near what Reuven describes.

But to be fair, I am not sure what he really is describing. There are a few overly ambitious proclamations like the one above and this paragraph:

The key drivers of a unified cloud interface (UCI) is “One abstraction to Rule them All” – an API for other API’s. A singular abstraction that can encompass the entire infrastructure stack as well as emerging cloud centric technologies through a unified interface. What a semantic model enables for UCI is a capability to bridge both cloud based API’s such as Amazon Web Services with existing protocols and standards, regardless of the level of adoption of the underlying API’s or technology. The goal is simple, develop your application once, deploy anywhere at anytime for any reason.

But in his piece you’ll also find CIM being cited as an example. There are good things to be said about CIM, but it certainly is not “a dynamic computing model that can, under certain conditions, be ‘trained’ to appropriately ‘learn’ the meaning of related cloud & infrastructure resources” (or, in the case of CIM, computer system resources). Good luck “training” CIM to “learn” anything. It’s CIM that’s going to train you to do it its way, period.

The CIM example (and other standards he lists) paints the picture of defining a standard API for Cloud Computing and forcing all providers to use it. That’s the conventional approach to universality. If that’s what UCI is after then it is technically achievable. And RDF might be a very good technical foundation for it. Whether anyone can pull this off politically and commercially at this stage is a different question of course. In any case, such an effort would have nothing to do with magically wrapping whatever API each provider has defined and whatever architecture/methodology they chose.

And further down we see a sketch of another, much more modest, vision, when Reuven talks about how “these web resources could just as easily be ‘cloud resources’ or API’s” which seems to represent a whole API as an RDF resource. Sure, then you can use RDF/OWL to capture versioning information between them, backward compatibility etc. Probably very useful, but that’s a very different scope.

So which is it? Reuven is a thought leader in Cloud Computing, so I want to think I am missing his point.

So far, I haven’t seen any Cloud taxonomy that is reasonably complete and has received broad support. Shouldn’t we first try to come up with a human-readable taxonomy before we try to turn it into a machine-readable ontology? In my previous post I explicitly stayed away from being pedantic about the difference between the terms, but the confusion between a taxonomy and an ontology seems to be part of what’s going on here.

The sad thing is that they (you know, them) will point to this as a proof that Semantic Web technologies don’t work.

Or maybe I’ve just set myself up for a generous portion of humble pie on April 2nd (when Reuven says an “initial functional draft UCI implementation, taxonomy and ontology” will be unveiled). I’d love to be surprised. And my ego has taken worse hits before.

[UPDATED 2009/2/10: You should read Steve Oberlin’s take on this overall taxonomy/ontology discussion. He knows the topic, carefully reads the posts that he comments on, packs a healthy dose of skepticism and takes the time to explain what taxonomies and ontologies are, which was overdue. Plus, I just love sites that don’t feel the need to use decorative pictures. His doesn’t have a single image file which means that even if he didn’t have superb credentials (which he does) he’d get my respect by default. A blog to watch.]

1 Comment

Filed under Automation, Cloud Computing, Everything, Modeling, Portability, RDF, Semantic tech, Standards, Tech, Utility computing

Cloud ontology: to boldly go where ITIL, Grid and SOA have gone before

I wasn’t at the recent Cloud interop meeting in Mountain View, but based on blog reports from those who were (Stu, James and Bob) a key takeaway is that we need a “Cloud taxonomy”.

As John points out, there are plenty of existing proposals. The SPI taxonomy (SaaS/PaaS/IaaS), the SADIST-PIMP taxonomy (Storage, App, DB, Info, Security, Test, Platform, Integration, Management, Process – which I took the liberty to re-order) and the UCSB/IBM “Toward a Unified Ontology of Cloud Computing” paper.

Is it a matter of picking one of these? And if we do, how are we better of?

Let’s look at some related domains that are a bit more mature to see how they handled this and what benefits they got out of it. The three most relevant comparison I can come up with are ITSM, Grid Computing and SOA.

ITSM has ITIL, a key part of which is its glossary which provides value in and of itself, even if you don’t explicitly use the rest of the framework. Is this what the Cloud taxonomy should aim for? I contend that the Cloud field is not nearly mature enough to aim for this.

Grid Computing has OGSA, which started as a white paper containing a service model (section 6.1). It was later expanded (within GGF, a standards organization specific to Grid Computing) into this document and was used as a basis for many standard technical specification. Is the goal of the Cloud Ontology to follow a similar path? OGSA was more than a glossary though, it included a vision, goals and architectural recommendations. Sure you can start with the glossary, but I am willing to bet that a stand-alone glossary would change significantly if/when a Cloud Computing architecture is defined based on it. And there isn’t today, for Cloud Computing, the singularity of vision that the Globus gang brought to Grid Computing and that is necessary for an architecture.

SOA was never as structured. There was an early attempt, called the Web Services Architecture at W3C. Like OGSA, this is a high level architecture, not just a taxonomy. In any case, the WSA work didn’t really end up having much of an influence in SOA (or even Web services) practice. Then came the SOA Reference Model, an OASIS standard. Of the three (ITIL, OGSA, SOA-RM) this is what I think is closest to what a Cloud taxonomy should aim for at this point. An ITIL-like glossary would be too brittle (if it has any depth) and it’s too early for an OGSA-like architecture.

Again, I was not at the meeting and only know what was blogged about. This is not a criticism of the proposal, just some thoughts about how it might compare to similar efforts.

Side note #1: I prevented myself from going off about the difference between a controlled vocabulary, a glossary, a taxonomy and an ontology. These differences are not germane to the present discussion. And I am not wearing my pedantic hat today (for once). If some want to call a wedding cake an ontology, who am I to stop them?

Side note #2: Speaking of trying to be less pedantic in 2009, this entry represents my capitulation: I finally created a “Cloud Computing” category in this blog, rather than my prefered designation, “Utility Computing”, which I have been using so far. A small step for me, a microscopic step for humanity.

[UPDATED 2009/1/28: We’re getting there (fast!). Via John, this updated taxonomy graph from Chris Hoff is going in the right direction, I think. Next step: more text describing the relationships between the boxes to have more of a model.]

[UPDATED 2009/1/30: If you think that any single-vendor taxonomy will be suspicious, that a multi-vendor taxonomy won’t happen and that customers are too busy to do it, then you may be asking “aren’t analysts supposed to do this?”. Help may be on the way.]

8 Comments

Filed under Cloud Computing, Everything, Grid, ITIL, Modeling, Standards, Utility computing

The Hippocratic Oath for IT management systems

The first feature of any IT management application should be “primum non nocere”, or in English “first, do no harm”. Your IT management application should not ever:

  • crash the system it manages
  • materially degrade its performance
  • introduce security vulnerabilities to the system

It’s appropriate that the “primum non nocere” expression comes from a medical background, since IT management software acts, to a large extent, like the doctor of the datacenter. In that spirit, I wondered whether the Hippocratic Oath could also provide good guidelines to those designing IT management software.

Turns out, the original oath hasn’t aged so well (though it cannot hurt to remind IT consultants not to have sex with the IT staff of their customers or attempt to remove kidney stones on them).

On the other hand, there is a modern version (written by Louis Lasagna in 1964) that maps quite well to our domain. Here it is, with each paragraph followed (in bold font) by its translation for IT management. Just food for thoughts.

I swear to fulfill, to the best of my ability and judgment, this covenant:

I will respect the hard-won scientific gains of those physicians in whose steps I walk, and gladly share such knowledge as is mine with those who are to follow.

Follow standards, publish configuration best practices in your domain of expertise, contribute to open source projects.

I will apply, for the benefit of the sick, all measures [that] are required, avoiding those twin traps of overtreatment and therapeutic nihilism.

Provide useful features that fully address customer needs. Not a superficial product, but not a boatload of useless features and meaningless metrics either.

I will remember that there is art to medicine as well as science, and that warmth, sympathy, and understanding may outweigh the surgeon’s knife or the chemist’s drug.

Good technology is important, but so are support, documentation and user interface.

I will not be ashamed to say “I know not,” nor will I fail to call in my colleagues when the skills of another are needed for a patient’s recovery.

Don’t push “consulting engagements” to stretch your application to deliver features it is not designed for. Instead, integrate with products that perform the function well.

I will respect the privacy of my patients, for their problems are not disclosed to me that the world may know. Most especially must I tread with care in matters of life and death. If it is given me to save a life, all thanks. But it may also be within my power to take a life; this awesome responsibility must be faced with great humbleness and awareness of my own frailty. Above all, I must not play at God.

Implement roles, permissions and auditing. Be very careful not to inadvertently crash, slow or compromise the target system. Keep in mind, in your design, that you and your developers are fallible.

I will remember that I do not treat a fever chart, a cancerous growth, but a sick human being, whose illness may affect the person’s family and economic stability. My responsibility includes these related problems, if I am to care adequately for the sick.

You are not managing just a server, a JVM or a database. Considers the entire IT system and the business value it delivers.

I will prevent disease whenever I can, for prevention is preferable to cure.

Proactively monitor the system, detect problems before they impact the system and, when possible, automate resolution.

I will remember that I remain a member of society, with special obligations to all my fellow human beings, those sound of mind and body as well as the infirm.

Don’t cheat, or abuse your position. Be honest and decent with customers, partners and competitors.

If I do not violate this oath, may I enjoy life and art, respected while I live and remembered with affection thereafter. May I always act so as to preserve the finest traditions of my calling and may I long experience the joy of healing those who seek my help.

If you do all this, customers will keep buying from you and may even like you.

1 Comment

Filed under Business, Everything, IT Systems Mgmt

Sorry, CMDBf doesn’t make coffee either

The IT Skeptic is writing to us from his mountain retreat (via a time-delayed post on his blog), and the topic he felt safe to cover in such fashion (what journalists call an “evergreen”) is the fact that CMDBf is an orchestrated sham, brilliantly executed by IT management vendors.

I’d love to be part of something that’s brilliantly executed for once, even if it is a sham, but I am afraid this is not it. But first I should state the obvious, clarifying that even though I am a member of the CMDBf group at DMTF (and also an author of the original version, under my previous employer) I do not speak for the group or DMTF (or my employer for that matter). Just as myself, as always on this blog.

The problem that Rob England, Mr. Skeptic, has with the CMDBf specification is that it doesn’t do a bunch of things that he’d like it to do, such as specifying how data sources acquire data for their domain, how they store the data, how the underlying resources are reconfigured, what processes are followed etc. See the full list from his post. The list is a copy/paste from the CMDBf specification, with some comments added, so at the very least he has to admit that as far as “smokescreens” go this one is pretty upfront about its limitations…

He concludes that “this is once again a geeky technical solution to a cultural, organizational and procedural problem.” I have to ask: who expects DMTF specifications to solve “cultural, organizational and procedural” problems? Does CIM solve such problems? Does WBEM?

Human-to-human communication is a “cultural, organizational and procedural” problem and SMTP/POP/IMAP/etc (the interoperable protocols used by email systems) are just as geeky as CMDBf. They don’t solve the larger problem, only contribute to the solution. If CMDBf can contribute as much to datacenter management as SMTP/POP/IMAP contribute to human communication (minus the SPAM if possible), I’d call that a success.

And then there is this warning:

“WARNING: vendors will waive this white paper around to overcome buyer resistance to a mixed-vendor solution. For example if you already have availability monitoring from one of them, one of the other vendors will try to sell you their service desk and use this paper as a promise that the two will play nicely.”

Has anyone actually seen this happen? I am asking because so far, both at HP and Oracle, the only sales reps I have ever met who know of CMDBf heard about it from their customers. When asked about it, the sales person (or solutions engineer) sends a email to some internal mailing list asking “customer asking about something called cmdbf, do we do that?” and that’s how I get in touch with them. Not the other way around.

Also, if the objective really was to trick customers into “mixed-vendor solutions” then I also don’t really understand why vendors would go through the effort of collaborating on such a scheme since it’s a zero-sum game between them at the end.

As far as the glacial pace of progress (“Glacial advance. That’s the way the vendors want it” from an earlier post by the Skeptic), CMDBf is no race horse but I don’t see it going any slower than other standards. Slowness (I mean, deliberation) is part of the landscape. I would submit a slight twist on Hanlon’s razor: “Never attribute to malice that which can be adequately explained by legal, procedural and organizational inertia.”

Having said all this, some of Rob’s criticism is perfectly justified, such as his sarcasm about this sentence from the specification:

“The Federated CMDB operates in a closed environment, in which some security issues are less critical than in open access or public systems.”

OK, that’s stupid indeed. Especially in a public cloud environment where you don’t know who is renting the VM next door. I’ll ask the group to remove this. Actually, that whole appendix is useless and I pointed this out in my earlier review of CMDBf 1.0 (look for the “security boilerplate” section at the bottom of the review).

Rob could also have pointed out that this specification only addresses “federation” if you accept a very scaled-down definition of the term. What it does do is help with CMDB query and synchronization. Not the holy grail, but nothing to sneer at either.

Rob, next time you want to throw tomatoes at CMDBf while you’re on holiday, just give me the password to the site and I’ll do it for you… :-)

[UPDATED 2009/1/21: Rob responds via a comment on his original blog entry.]

2 Comments

Filed under BSM, CMDB Federation, CMDBf, DMTF, Everything, IT Systems Mgmt, ITIL, Mgmt integration, Security, Specs, Standards

A new SPIN on enriching a model with domain knowledge (constraints and inferences)

Back when I was at HP and we got involved with what turned into SML (now a W3C candidate recommendation), we tried to make a case for the specification to be based on RDF/OWL rather than XML/XSD/Schematron. It was a strange situation from a technical perspective because RDF is a better foundation for an IT model than XML, but on the other hand XSD/Schematron is a better choice for validation than OWL. OWL is focused on inference, not validation (because of both fundamental design choices, e.g. the open world assumption, and language expressiveness limitations).

So our options were to either use the right way to represent the system (RDF) combined with the wrong way to capture constraints (OWL) or to use the wrong way to represent the system (XML) combined with the right way to constrain it (mostly Schematron, with some limited help from XSD). At the end, of course, this subtle technical debate was crushed under the steamroller of vendor politics and RDF never got a fair chance anyway.

The point of this little background story is to describe the context in which I read this announcement from Holger Knublauch of TopQuadrant: the new version of their TopBraid Composer tool introduces SPIN, a way to complement OWL with a SPARQL-based constraint checking and inference mechanism.

This relates to SML in two ways.

First, there are similarities in the approach: Schematron leverages the XPath language, used to query XML, to create validation rules. SML then marries Schematron with XSD, for a more powerful validation mechanism. Compare this to SPIN: SPIN leverages the SPARQL query language, used to query RDF, to create validation/inference rules. SPIN also marries this with OWL, for a more powerful validation/inference mechanism.

But beyond the mirroring structures of SPIN and SML, the most interesting thing is that it looks like SPIN could nicely solve the conundrum, described above, of RDF being the right foundation for modeling IT systems but OWL being the wrong constraint mechanism. SPIN may do a better job than SML at what SML is aiming to do (validation rules). And at the same time, you get “for free” (or as close to “for free” as you can get with software, which is still far from “free”) a pretty powerful inference mechanism. The most powerful I know of, short of using a general programming language to capture your inference rules (and good luck with maintaining these rules).

This may sound like sci-fi, but it’s the next logical step for IT configuration standardization. Let’s look at where we are today:

  • SML (at W3C) is an attempt to standardize the expression of constraints.
  • CMDBf (at DMTF) is standardizing how the model content is queried (and, to some limited extent at this point, federated).
  • And recently IBM authored a proposal for a reconciliation specification for items in the model and sent it to an Eclipse group (COSMOS).

But once you tackle reconciliation, you are already half-way into inferencing territory. At least if you want to reconcile between models, not just between instances expressed in the same model. Because the models may not be defined at the same level of granularity, and before you can reconcile items you need to infer finer-grained entities in your coarser-grained model (or vice-versa) so that you can reconcile apples with apples.

Today, inferencing for IT models is done as part of the “discovery packs” that you can buy along with your IT management model repository. But not very well, in general. Because the way you write such a discovery module for the HP Universal CMDB is very different from how you write it for the BMC CMDB, IBM’s CCMDB or as a plug-in for Oracle Enterprise Manager or Microsoft System Center. Not to mention the smaller, more specialized, players. As a result, there is little incentive for 3rd party domain experts to put work into capturing inference rules since the work cannot be widely leveraged.

I am going a bit off-topic here, but one interesting thing about standardization of inferencing for IT management, if it happens, is that it is going to be very hard to not use RDF, OWL and some flavor of SPARQL (SPIN or equivalent) there. And once you do that, the XML-based constraint mechanisms (SML or others) are going to be in for a rough ride. After resisting the RDF stack for constraints, queries and basic reconciliation (because the added value was supposedly not “worth the cost” for each of these separately), the XML dam might get a crack for inferencing. And once RDF starts to trickle through that crack, the whole dam is going to come down in a big wave. Just to be clear, this is a prophetic long-term vision, not a prediction for 2009 (unfortunately).

In the meantime, I’d like to take this SPIN feature a… spin (sorry) when I find some time. We’ll see if I can install the new beta of TopBraid composer despite having used up, a year ago, my evaluation license of the earlier version of the product. Despite what I had hopped at some point, this is not directly applicable to my current work, so I am not sure I want to buy a license. But who knows, SPIN may turn out to be the change that eventually puts RDF back on my “day job” list (one can dream)…

It’s also nice that Holger took the pain to deliver SPIN not just as a feature of his product but also as a stand-alone specification, which should make it pretty easy for anyone who has a SPARQL engine handy to support it. Hopefully the next step will be for him to clarify the IP terms for the specification and to decide whether or not he wants to eventually submit it for standardization. Maybe to the W3C SML working group? :-) I’d have a hard time resisting joining if he did.

12 Comments

Filed under CMDB, CMDBf, Everything, IT Systems Mgmt, Modeling, RDF, Semantic tech, SML, SPARQL, Specs, Tech, W3C

Announcing Xen Transcendent Memory project

If you have more than one child, you’ve probably heard yourself say things like “if you are not using your train, you should let your brother play with it” more often than you’d like. The same happens in a datacenter (minus the screams and tears, at least usually). In that context, the rivaling siblings take the form of guest virtual machines and the toys in contention are the physical resources of the host system: CPU, I/O, memory. While virtualization platforms do a pretty good job at efficiently sharing the first two, the situation is not nearly as good for memory. It is often, as a result, the limiting factor for virtualization-driven consolidation. A new project aims to fix this.

The Oracle engineers working on the Xen-based Oracle Virtual Machine have just announced a new open source (GPL-licensed) project to improve the sharing of physical memory between guest virtual machines on the same physical system. It’s called Transcendent Memory, or tmem for short.

Much more information, including a comparison with VMWare’s memory balloon, is available from the project home page.

Another reason to come to the upcoming Xen Summit (February 24 and 25), hosted by Oracle here at headquarters.

Comments Off on Announcing Xen Transcendent Memory project

Filed under Everything, Linux, Open source, Oracle, OVM, Tech, Virtualization, Xen

The art of reconciling items in your IT management model

Whether you call it a CMDB or some other name, any repository of IT model elements has the problem of establishing whether two entities are the same or not. Here is a quick map of the problem space.

Why there is no “one true solution”

There is no “true” answer to the “sameness” question. The following example illustrates this, even though it is not necessarily representative of datacenter scenarios. Ask any gamer to tell you the history of that 3-year-old PC under their desk. The power cord might be the only original piece left after they’ve upgraded the RAM, video card, sound card, hard disk(s), DVD drive and power supply. Not to mention the tragic overclocking accident that took the life of the motherboard/CPU. After the upgrade/replacement of each of these parts, the user still thought of the machine as the same PC, just upgraded/fixed. But how can it be the same as at the beginning if pretty much every single part has changed? And when time came to reinstall Windows, the registration probably failed because Microsoft decided that the same license was being used for a new machine. Sameness is in the eye of the beholder.

And it’s not just a hardware problem. When you upgrade your Oracle Database and start using a new ORACLE_HOME, it may feel like the same database to most users (including the applications that talk to the database) but a more executable-centric view might conclude that it is a new database.

Defining what makes an IT element unique is not a matter of truth. It’s a matter of usefulness for a given purpose. When trying to establish this for your model, if the conversation ever veers philosophical, you’re off track. This is engineering, not science. “A and B are the same” should be understood to be a shortcut for “it makes sense for my purpose to consider A and B to be the same”. Of course things become complicated when “my purpose” encompasses a whole set of use cases (add “management” after each of: performance, compliance, configuration, change, asset, business service, business transaction…).

How the problem arises

It can arise over time. For example the management agent has to be reinstalled and it forgets the id that had been assigned by the server. When it comes back up, it reports what looks like a new item. But you want to reconcile it with the historical data that came from the agent’s previous incarnation.

It can arise because you have different discovery channels for the same item. For example, a BPEL engine reports to the management server the processes it supports and that model includes the external Web services (partnerLink) invoked by the processes, thereby creating items for these external services in your repository. But some of those external services may be running on servers which you also monitor and the services (and more generally the applications that deliver them) may be separately discovered by the agents on these hosts, resulting in a potential duplicate representation in the repository.

Or the problem can be a result of the integration of IT management products. For example, that Dell server in my asset management system may be the same as the Linux host that runs my production database and appears in Oracle Enterprise Manager.

Fixing the mess

There are two stages to this:

First, you need some level of model alignment. In the general case, the different items that you are trying to reconcile are not expressed in the same model. The view of the server coming from the asset management system does not necessarily contain the same data as its view in your operations console. One contains the lease expiration date, the other one contains the amount of space left on the disk. Some data may be in both (e.g. number of CPUs, host name) but not necessarily in fields of the same name. Or with the same granularity (ownerName versus ownerFirstnameownerLastname). Not to mention type system differences (but if the items are already in the same repository you have presumably already forced some level of metamodel alignment). In short, you first have the challenge of model transformation, a more general problem. With the advantage that the entire model does not need to be translated for item reconciliation, only the subset of data needed to establish “sameness”: the identifying properties. And in some cases (e.g. when a standard model is used or when two instances of the same agent report on the item), the items to reconcile are already described in the same model and this step can be skipped.

Once the necessary level of model alignment has taken place (if needed) so that items can be compared, the real task of reconciliation takes place, based on domain knowledge. It could be through a set of scripts (Python’s mix of simplicity, portability, broad array of libraries and ease of integration make it shine in this usage). It could be through some kind of reconciliation taxonomy, like this draft that IBM has contributed to the Eclipse COSMOS project. Or through metadata such as WSDM’s correlatable properties. [BTW, as the spec editor I got to insert dubious cultural references in the specification (see the <print:PrinterResourcePropDoc> example in section 5.3.3.1), but let me assure you that I have since matured… ;-)]

These are not the only ways to reconcile items, but they are the approaches that can be followed based on just the data in the repository. Beyond that, you can run a dummy transaction and trace it (if possible) across different management systems to reconcile entities between them. There are plenty of other domain-specific tricks, depending on the item type (I remember a machine room, back in the days when each server had a CD drive, where a script to open the CD tray was used to allow the operator to put a sticker on the correct machine). In general, these approaches play on external variables that are not directly part of the model of the item and yet can be influenced through it. Similar to how the bulb temperature is used in this famous brain teaser. I guess the IT equivalent would be to load-stress an application and use IPMI to see which CPUs register a rise in temperature (note: not a recommended approach in production systems…).

Coming back to the IT model repository, you also need to have plumbing in place to deal with the result of the reconciliation: requests and data may come in that reference either one of the reconciled items and you need to be able to deal with that split personality, while providing a unified view in the general case. You also need to be ready to deal with potential data discrepancy between the items (either automatically or through of process that involves humans, but this is out of scope for this entry).

Preventing the mess from happening

Can’t we just prevent the problem from occurring in the first place? To some extent yes. The main way to prevent it is to not reconcile what doesn’t need to be. This may sound heretical in these days of “single source of truth” and “end to end visibility” but reconciliation of key connection points is often enough. You may not need to have one single model that contains everything from your company’s employee directory to the fan speed of all your servers. It’s a matter of delivering on use cases, not hoarding data.

When you do want to consolidate and reconcile, one approach is to standardize on natural IDs for items of different types. But this requires domain experts to carefully select identifying (and therefore immutable) properties of the different object types, which sounds a lot easier than it is. And it requires convincing others to adopt this approach, an even harder task. But as the proverb (almost) goes, one ounce of convention is worth one pound of reconciliation.

[Note: Whenever you talk about item reconciliation, the topic of correlating events is not far behind. It is assisted by a solid underlying IT model, but it has challenges of its own, so I’ll consider this out of scope for this discussion.]

3 Comments

Filed under CMDB, Everything, IT Systems Mgmt, Mgmt integration, Modeling

Less is more: inventory of XPath subsets

Many specifications that manipulate XML content have taken the step to create their own subset of XPath. Typically, they need an XML query/pointer language but full XPath (or XPointer) is overkill for their purpose (I am talking about XPath 1.0 here, XPath 2.0 is usually over-over-kill-and-then-some). Defining a subset of XPath rather than inventing a query language from scratch is attractive because:

  • people are relatively familiar with XPath (at least the most common parts)
  • it is already specified so you can leverage the W3C spec-writing work
  • implementers who have access to an XPath engine get an implementation of the subset “for free” since the XPath engine will process statements from any XPath subset

Here is a quick inventory of spec-defined XPath subsets that I am aware of.

XML Schema

Section 3.11.6 of the XML Schema specification (part 1) defines a subset of XPath used to point to a set of elements that are the target of an identity constraint. Here is the BNF of the abbreviated form of the subset (you can also use the functionally equivalent full-length notation):

Selector   ::=    Path ( '|' Path )*
Path       ::=    ('.//')? Step ( '/' Step )*
Step       ::=    '.' | NameTest
NameTest   ::=    QName | '*' | NCName ':' '*'

Actually, there is a second subset defined, to point to the identifying key from the identified element. It is very similar to the previous one but it allows attribute nodes to be selected, via this modification:

Path       ::=    ('.//')? ( Step '/' )* ( Step | '@' NameTest )

According to the specification, these subsets were defined “in order to reduce the burden on implementers, in particular implementers of streaming processors”. As this article points out, stream-friendliness of an XPath subset is a relative notion. But the subsets above seems, indeed, to fit the bill. They also include many simplifications that “reduce the burden on implementers” but have nothing to do with streaming, such as removing functions and all predicates (rather than simply restricting the content of predicates).

WS-Management and WS-ResourceTransfer

WS-Managment also defines two subsets (it calls them dialects) of XPath. I won’t copy the BNF definitions here, as they are pretty long. You can find them in Appendix D of the specification. They are called “XPath level 1” and “XPath level 2”.

The main use cases driving these dialects had to do with implementing WS-Management in resource-constrained environments, e.g. the board management controller of a server.

WS-ResourceTransfer took this idea from WS-Management and it too defines an “XPath level 1” dialect (see Appendix I of the specification)

Windows EventLog Remoting Protocol v6

Version 6.0  of the Microsoft EventLog Remoting Protocol, new in Vista, adds a filter mechanism (to select events in the log) that is based on a subset of XPath. Streaming appears to be a concern there too (“evaluation of each event MUST be restricted to forward-only, in-order, depth-first traversal of the XML”). And, being Microsoft, they also add some extensions to XPath.

CMDBf (coming soon)

CMDBf also defines a subset of XPath. It is not defined via a restricted syntax but via a limitation of the type of objects returned. There is no BNF provided. You can write your XPath any way you want as long as it only returns objects of the right types (e.g. nodesets containing comment nodes are out of luck). Think of it as “management by objectives” rather than micromanagement. The main driver here is not support for  streaming. It’s that since XPath nodeset serialization is a pain we only want to do it where there is a compelling use case. There is no point creating interoperability challenges for no practical benefit.

Others?

Except for XSD, all the examples above come from the IT management world, because that’s where I live. There are probably plenty of specifications in other domains which took similar steps, such as the Digital Talking Book ANSI standard (see this section).

And those are only XPath subsets defined as part of specifications. There are also XPath subsets defined by implementations (e.g. ElementTree’s limited XPath support). And others defined for research purposes (e.g. “Univariate XPath”, from this ACM article, that is quickly described in the previously-mentioned post about stream-friendly XPath subsets).

If you know of other interesting XPath subsets, please leave a comment.

Comments Off on Less is more: inventory of XPath subsets

Filed under Everything, Specs, Standards, WS-Management, WS-ResourceTransfer, XPath

Now I know why GAE has been killing me

I haven’t written about Google App Engine lately because I haven’t spent too much time using it. And the time I did spend was mostly consumed fighting JavaScript for some client-side aspects that have nothing to do with the fact that the server side is on GAE. I have only touched JavaScript occasionally in the 12 years since writing this game, and I am pretty rusted. Funny how Python came back to me almost instantly but JavaScript just doesn’t.

I am writing a quick post on GAE today because the Google team just published a blog entry that explains why my attempts to create a long-running process in GAE resulted in my application being disabled for having too many “high CPU requests”, a concept that was never documented in the quota system.

Google is now offering a “quota detailed dashboard” and, as this screen capture shows, it tells us that you only get 60 “high CPU requests” and that they replenish at a rate of 2 per minutes. So at least now I know.

What is still not clear is why a request that is doing nothing but waiting for a URLfetch to come back is considered by Google to use too much CPU…

2 Comments

Filed under Everything, Google App Engine, Implementation, Tech, Utility computing

HP introduces “Operations Manager i”

If you’ve seen a lot of news articles about HP’s IT management software this week (e.g. through Cote or Doug) it’s because the company held its Software Universe conference in Vienna this week and timed a bunch of announcements and PR events to match.

Most of the articles linked above just paraphrase the press releases and talking points. So if you’re going to get the company line, might as well get it straight from the horse’s mouth. Which we can now do through a new HP blog about BSM. The first article was penned by Mike Shaw and that’s enough for me to want to subscribe (I worked with Mike a few times when I was at HP and he is very sharp). I think Mike also wrote the other entries but since they are not signed (and the account name, “adsey007”, is pretty opaque) I am not sure. In any case, they are pretty good. This one gives an overview of the Vienna announcements. The next one describes in more details the OMi product. I am not in position to know how well it works but, according to the article, OMi takes the important step of modeling and managing events in the context of the overall model in the CMDB. Such that the event management features (e.g. correlation) can use the already-discovered relationships between the IT elements involved in the events (e.g. dependencies). The article also implies that the CMDB has been integrated with NNM (OpenView), Service Manager (Peregrine) and Server Automation (Opsware). Which is a lot of progress in 16 months since I left HP, so I am taking it with a grain of salt (we all know there are different levels of integration). The press release says that the CMDB is now integrated with 17 HP BTO applications, so you may need a whole salt shaker. In any case it’s great to see that Ramin and team are forging ahead, delivering products and driving the integration of the BTO portfolio.

The last paragraph (“OMi actually sits on top of existing HP Operations Manager installations…”) is intriguing and may provide a clue about the depth of the integration. In any case, OMi is something to keep an eye on as it is positioned to leverage a lot of the key strengths of the HP BTO portfolio.

BTW, this OMi product has nothing to do with this OMI which was a precursor to WSMF, WSDM and WS-Management. And which most people currently working in HP Software have never heard of.

2 Comments

Filed under Application Mgmt, Conference, Everything, HP, IT Systems Mgmt, Mgmt integration, Modeling, People

Here comes WSTF

A new Web services-related industry body has been announced today: WSTF (Web Services Test Forum). More details about it from Infoworld. My employer (Oracle) seems to be one of the drivers (along with IBM) but I am not personally involved.

A lot of hand-wringing, of course, about its relationship with WS-I. Which is understandable if you consider what WS-I was originally supposed to deliver (profiles, sample applications and testing tools). But not if you consider what it has actually delivered that is relevant (a couple of profiles, some time ago). WSTF could also be compared with the SOAPBuilders Yahoo group, but since that group has seen only two emails messages so far in 2008 (last one dated April 2nd), it seems safe to consider it dead. It would be interesting to know why that is though (it used to be pretty active in the early days) and what lesson WSTF may learn from it. Another effort you may want to compare this to is the Microsoft Web Services Protocol Workshops Process. It’s too early to tell, but they may turn out to be more closely related than meets the eye.

I noticed this innocuous-sounding sentence in the press release (warning, PDF): “As an open community, WSTF has made it easy to introduce new interoperability scenarios and approve work through simple majority governance”. You may wonder why this is important enough to figure in the press release.

I interpret it as a dog whistle call (heard only by those to whom it is intended) for the WS-I board. Microsoft’s Paul Cotton responds to it in his quote for Infoworld: “WS-I provides a proven and open organization and process that best suits our customers’ needs”. He also talks about “the incredible industry-wide momentum and leadership of WS-I”, which is indeed not very credible (especially the momentum part). The WS-I process, associated board politics and resulting inaction is what I was talking about in this entry (“veto rules being commonly invoked, stopping most of the activities that the resort was originally planning to offer”).

Speaking of this “Standardstown” blog entry, I should probably soon update it to include WSTF. What should it be? Maybe a trailer park in which customers bring their own lodging, put them side by side and see how they line up?

The current test scenarios seem to focus on fixing the interoperability mess that is WS-Addressing. I assume more will soon be added to test the different WS-* specifications out there. It will be interesting to see what direction WSTF takes after that. Will the payloads of the test messages be obvious dummy payloads (so that the focus is on testing the implementation of the WS-* protocols)? Or will they start to include real payloads (e.g. real purchase orders from real enterprise applications)? How about this: “dear vendor, I will only buy your wonderfully open, standard and interoperable Web services-based application when it is available as a WSTF endpoint and there are three other real-life products (including one from your main competitor) that successfully interoperate with the exact same SOAP messages I will be using”. This could become an interesting tussle between vendors as well as between vendors and buyers.

Alternatively, of course, WSTF could turn into a test of how much difference there is between a “standard” and a publicly specified and interop-tested interaction scenario…

A quick (and unsuccesful) Technorati search for some blog comments returns the “WSTF Dark Retribution Dinorobots Limited Giftset” which “includes all five Dinorobots in their sinister evil incarnation”. Can’t say you were not warned…

[UPDATED 2008/12/10: Gilbert Pilz, who was involved with WSTF from the start (and also left a comment on this entry, see below), wrote a detailed description of the problem WSTF tries to address and how Gilbert and others have structured WSTF to solve it.]

[UPDATED 2008/12/15: Via InfoQ, another long description of the goals and processes of WSTF, this time from Doug Davis.]

[UPDATES 2009/1/5: Chris Ferris also weighs in, including his view on the relationship with WS-I. Having participated in several of the early WS-I plenary meetings, I have to wonder if Chris had any double-entente in mind when he wrote that WS-I helps “the community understand where the bar is”.]

[UPDATED 2009/2/17: A response from Redmond.]

1 Comment

Filed under Everything, IBM, Implementation, Microsoft, Oracle, SOAP, Specs, Standards, Testing

What you’ve been spared (aka blog drafts boneyard #1)

I try to keep posts on this blog relevant to the general topic of IT management. Less than 10% of messages are in the “off-topic” category and even those are usually somewhat related to computer technology (mostly rants against the misuse of Flash and against the stupid ways in which US Social Security numbers are used). What this means in practice is that off-topic drafts are often abandoned when I realize that they are not relevant enough to make the cut. My “drafts” folder is a boneyard of such entries. Today, I am relaxing my standards and subjecting you to a list of them (they are still computer-related). Hopefully, either you find at least some of them interesting, or you come out with a renewed appreciation of what you’ve been spared over the years. Since they are all in one post, they are easy to just skip it altogether without being too tempted to hit the “unsubsribe” button for those who really only want to read about IT management (at least from me).

Here is a list of the topics covered below:

Messing with a blogger’s head

I recently looked at the HTTP logs for this site. Maybe I am the last blogger to realize this, but it looks like the online blog readers (e.g. Google Reader, Bloglines…) tell you how many subscribers they have for your feed. They do this through the user-agent HTTP header, which gets logged. It looks something like this:

Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; 102 subscribers; feed-id=…)

Of course that’s only on a per-feed basis, so you need to add all the feeds (Atom and the different RSS versions) to get a total. Still, it’s a lot more visibility than I had before.

My first thought was “hey, some people are reading, better watch what I write”. But I quickly discarded that in favor of a more intriguing idea: if bloggers use this data, how hard would it be to mess with their heads? After all, this is not verifiable. Anyone can send HTTP requests with any user-agent they want. I can pick a blog and starts sending HTTP GET requests on their feeds with a user agent that pretends to be “Feedfetcher-Google”. And I can set the “subscribers” number to anything I want. To not be too suspicious, I could slowly pump it up, to look like a realistic increase.

Of course, an alert blogger would probably smell a rat if the number of subscribers shoots up and the number of incoming links and comments didn’t change, if the site still didn’t show up near the top of Google searches, or if the technorati “authority” didn’t change. Etc. There are pleny to ways to reality-test this. But people have an amazing ability to suspend disbelief when they like what they see, however logic-defying. If you don’t believe me, I have a pile of mortgage-backed securities to sell you.

This stat-pumping experiment could be done as a practical joke. It could be done out of meanness.  It could be done as an unethical and pointless sociological study (how many subscribers does it take for someone to go buy a Porsche on the assumption that the traffic will eventually turn into $$$, how does the impression of popularity change the writing on the blog…). It could even be done as a fraud (guaranteed increase in your subscription numbers if you sign up for my blog marketing service or you get your money back: just check your logs to see the results… – of course you could also generate fake users to create real subscriptions). It hits bloggers where they are the most vulnerable: the ego.

If you are thinking of doing this as a way to be nice to someone who needs encouragements, it will probably backfire. Before you process, listen to act two of this radio show (description: “A group called Improv Everywhere decides that an unknown band, Ghosts of Pasha, playing their first ever tour in New York, ought to think they’re a smash hit. So they study the band’s music and then crowd the performance, pretending to be hard-core fans. Improv Everywhere just wants to make the band happy — to give them the best day of their lives. But the band doesn’t see it that way.”)

Google search suggestions

When you enter a Google search query (on google.com or in the Firefox search bar), as soon as you’ve typed a few characters it proposes to complete your search terms (BTW, it’s not just Google, it is now an well-know extension to OpenSearch but Google pioneered it, at least according to the spec). Something about this just doesn’t sound right. If you think you know what I am looking for, why not propose the most likely answers rather than trying to complete my search request? If you get it right, then I’ll stop typing and I’ll click. Plus, Google already concentrates viewers on a small set of pages for each search query, with this feature won’t they compound this by concentrating people to a smaller set of queries, further shrinking the Web?

Since Google feels free to give me plenty of unsolicited suggestions, here is mine to them. If you are going to hand-held people as they write their queries, provide suggestions that desambiguate rather than suggestions that overly constraint. For example, if I type “python”, I get these suggestions:

“python tutorial”, “python list”, “python strong”, “python ide”, “python download”, “python for loop”, “python datetime”, “python re”, “python time”, “python os”, all clearly about the programming language. Wouldn’t it be more useful to detect algorithmically that results from searching on “python” fall into three largely disjoint groups, to detect a common word in each group and to ask the user to qualify their “python” request with either “programming”, “snake” or “monty”? Rather than the simpler but, in my opinion, less valuable approach of showing the most popular search queries that start with “python”?

On the other hand, this “most popular” feature has one benefit: it provides plenty of fodder for pop psychology, as I found out when tried to ask Google why they provide these search suggestions. As soon as I typed “why”, I got suggestions including “why men cheat” and “why did I get married”.

The part I like about all this, is the meta-meta aspect. Google doesn’t only suggest what you might want to read based on your search, they even suggest what you might want to search on. What’s the next meta level? Suggesting that you want to do a web search when you’re not even thinking of doing one? You can bet they will if they can. What a butler indeed.

Google to navigate rather than search

Still on the topic of Google, but a positive comment this time. It struck me one day that pretty much every single bookmark I have in Firefox is for an Oracle-internal site, not the public Web. After thinking about it for a minute, I realized the reason: Google doesn’t index the Oracle intranet. When I find a good page there, I can’t be sure I’ll be able to find it again easily, so I bookmark it. On the Web, on the other hand, why bother bookmarking it. I pretty much know I can find it from my Firefox search bar.

Most of the time, when I use Google, it’s not to find a new page. It’s to get back to a specific page. Case in point, when I want to look something up in the XPath spec (which I have done a few times lately in the context a CMDBf). I know it’s on the W3C web site, I could go there and navigate to the page in a few clicks. I also have a copy of it on my disk, I could open my file explorer and get it from there. But instead I just type “xpath” in Google. Again, I am not looking really “searching” (trying to find information about XPath), I am just navigating (finding my way back to the spec).

So I started a post to share this brilliant insight, at which point I saw (using Google in “search” mode for once) that Robin Cannon has already perfectly described it.

So I’ll just add a few thoughts to complement what Robin wrote:

  • I am sure the implication in terms of advertising have long been studied by Google (I would guess that people who use Google for navigation are a lot less likely to click on ads than those who are actually searching).
  • AOL had to die for the “AOL keyword” to live.
  • There are serious privacy aspects to letting Google know what you’re up to all the time (but I am not logged into Google, I clean up my cookie jar relatively often and, at least at work, I am behind a large enough firewall to have a mostly anonymized IP).
  • Somewhat ironically, there a potential security benefits. For example, the HP employee credit union is called “Addison Avenue credit union”. Googling for “addison avenue” gets you right there. If you mistype the name and ask for “adison avenue”, you get a suggestion that maybe you meant “addison avenue”, along with a list of links related to “madison avenue”. That’s enough data to realize and correct your mistake. On the other hand, directly typing adisonavenue.com into the navigation bar could have taken you to a spoof site (in reality it takes you to a link farm, not quite as bad, but you never know what it will turn into tomorrow).

BTW, am I the only one who doesn’t know what 2 of the top 3 “Google Fastest Rising Search Terms 2007” relate to (from the list in Robin’s post)?

What is a computer

It started with this New Scientist article: Ten weirdest computers. With all these examples, how do we define what a computer is? Fundamentally, it’s a physical system that can process data. Meaning that you can define a logical data model that can be mapped to the physical characteristics of the system. And the system is such that it (through the laws of physics) changes in such a way that after a time its new physical configuration represents data that corresponds to a calculation that took place on your original data. You get the resulting data by measuring physical characteristics of the system (not necessarily the same physical characteristics that you controlled to represent the input data) and deriving the result data from it. In short, to use a computer:

  • Step 1: you create a system that represents your input data
  • Step 2: you let the laws of physics “do their thing” on the sytem
  • Step 3: you measure the system to derive your output data

For example, take a spring scale and a bunch of 1kg weights. That’s a computer. At least it can add (within a given range). To calculate “4+8” you put four 1kg weights on the scale, then you put eight more, then you read the number next to the needle and it should tell you “12”. This is an example in which the physical characteristics that you use to provide input data (putting weights on the scale) is different in nature from the physical characteristics that you measure to get the output (the position of the needle, which is really a way to measure the compression of the spring in the scale).

Based on this, we can ask the next (and more practically useful) question: what makes a *good* computer? It has the following characteristics:

  • easy to set up
  • easy to measure results at needed precision level
  • not too many side effects (e.g. energy consumption)
  • fast and versatile (planting a pine tree seed and waiting for a pine cone to come out in order to calculate a Fibonacci sequence is a little too slow and too specialized)
  • able to process large amounts of data (that’s where the mechanical scale doesn’t… scale).

On that last topic, there are two ways to process large amounts of data. The way used by current computers is to process little at once but very fast and in a way that makes it very easy to use the output of one operation as input to the next one. The alternative would be to compute a large problem in one go of the physical system. For example, maybe one day we’ll know how to represent a mathematical problem in DNA form, such that we know that the solution to the problem corresponds to the DNA sequence most useful to a bacteria in a given environment, e.g. most likely to resist a given antibiotic. Setting up the computation system, in this case, would be engineering the antibiotic that selects for the problem’s solution. You can put that antibiotic in your Petri dish (or in the food of your 1000 cows, now that’s a “computer farm”), wait for a few days, then sequence the DNA of the bacteria that’s in the dish (or in your cow’s “output” matter, think of it as a “core dump”).

You can think of it as the RISC versus CISC debate, except with many more orders of magnitude in difference between the alternatives.

It is also interesting to note that networks and storage mechanisms (the other two consitutive elements of a data center, along with computers) can be thought of in a very similar way. If step 2 doesn’t change the data and can be made to last long enough, you have a storage system (e.g. engrave text on stone, store stone for a few thousand years, read text from stone). If instead of being far apart in time the locations in which you perform steps 1 and 3 are far apart in space (with 2 still not changing the data), then you have a networking system.

Is this a site or a feed

Like 99% of the blogs out there, this site is just an HTML rendition of an RSS (or Atom) feed. Isn’t it a little silly to have millions of Web site (visited by humans) that have their structure dictated by a machine-to-machine protocol? It is especially ironic on a site like mine, which occasionally talks about data models and protocols (and on which you would therefore expect the difference between the two to be understood). But no. Every time a new release of CMDBf comes out, for example, I create a new post with an updated version of the pseudo-algorithm for performing a graph query. Rather than having one page that gets updated (with potentially a “history” feature to access older versions).

As much as I’d like to blame the limitations of WordPress, I think it’s more a sign of my laziness. There are plenty of WordPress extensions that I have never considered. Or I could move to Drupal. The key question is, is there a way to get a site that is more useful as a unit (“show me what information William provides on his site”), while keeping the value of the feed (“tell me when William adds new content”) and not adding to my workload?

Comments Off on What you’ve been spared (aka blog drafts boneyard #1)

Filed under Everything, Google, Off-topic

Who said WS-Transfer is for REST?

One more post on the “REST over SOAP” topic, recently revived by the birth of the W3C WS Resource Access working group. Then I’ll go quiet for a bit and let people actually working on it show me why I am wrong to worry about WS-RT.

Before that, I just want to clarify one thing. People seem to assume that WS-Transfer was created as a way to support the creation of RESTful systems that communicate over SOAP. As much as I can tell, this is simply not true.

I never worked for Microsoft and I was not in the room when WS-Transfer was created. But I know what WS-Transfer was created to support: chiefly, it was WS-Management and the Devices Profile for Web Services, neither of which claims to have anything to do with REST. It’s just that they both happen to deal with resources (that word again!) that have properties and they want to access (mostly retrieve, really) the values of these properties. But in both cases, these resources have a lot more than just state. You can call all sorts of type-specific operations on them. No uniform interface. It’s not REST and it’s not trying to be REST. The Devices Profile also happens to make heavy use of WS-Discovery and I am pretty sure that UDP broadcasts aren’t a recommend Web-scale design pattern. And no “hypermedia” in sight in either spec either.

A specification is not RESTful. An application system is. And most application systems that use WS-Transfer don’t even try to be RESTful. Mocking WS-Transfer for not being as good as HTTP to support REST systems is like mocking an airplane for not being as good as your hatchback for grocery shopping. It’s true, but who cares.

So let’s not reflexively attack WS-Transfer for assumed purposes. And similarly, let’s not reflexively defend WS-Transfer as a good way to build RESTful systems.

Just to clarify, this is not meant as a defense of WS-Transfer. I think that, at least in the context of its original purpose, it should be gutted to only its GET operation. The PUT and DELETE tasks should be handled by domain-specific operations. Which would have the consequence of making it look less like a REST wannabe. But my recommendation aims at improving its applicability to the management domain, not at making it comply to an architecture style that is not (at least currently) used in that domain.

4 Comments

Filed under Everything, IT Systems Mgmt, Manageability, Mgmt integration, REST, SOAP, Specs, WS-Transfer

IT management and Cloud: now some products

Many of us have been thinking (a bit) and talking (a lot) about the relationship between Clouds and good old IT management.  John understands both sides and produced a few good posts (like this one).

Maybe it’s just a coincidence that both Hyperic and CA recently made such announcements. In any case, it gives the impression that time has come for some actual product capabilities in the area of managing Cloud-based systems.

I haven’t investigated either, so keep your slideware shields up, but this is what I read:

From Javier Soltero’s “Announcing HQ 4.0”: “It also provides the first cloud-friendly management agent which allows users to manage cloud based virtual machines securely and reliably from either inside the cloud, or from HQ 4.0 installations inside your datacenter”. John approves.

And at CA World, according to InformationWeek, CA will announce a partnership with Amazon to provide management capabilities around Amazon’s EC2 utility computing platform, potentially including discovery of software running on EC2 instances, performance monitoring, configuration management, software deployment capabilities and provisioning”.

When someone looks into these two products (and others, soon to follow or alrady out and that I have missed), it will be interesting to see how these Cloud-friendly capabilities relate to the good old capabilities of management products: “software discovery”, “perf monitoring”, “config management”, “software deployment”, “provisioning”. That all sounds pretty familiar. Is it just a matter of pointing the old tools to an EC2 IP address? Is it all new capabilities, done in a new way? Or, more realistically, where does it land between these extrems? Where do you want them to land? It’s not so obvious.

Utility computing comes with an expectation of additional flexibility (now that is obvious). When tweaking IT management tools to address the domain, does one leave “in datacenter” capabilities the same and branch off to do cool things in the new land? Or do you raise the level of flexibility accross the board?

In other words, rather than snickering at them, maybe we should praise IT management vendors for whom the “look, I do Clouds” marketing spiel is just a repackaging of normal IT management features. Because it may mean that they’ve raised the bar on “in datacenter” automation capabilities. These Opsware and BladeLogic acquisitions have to come in somewhere, don’t they?

BTW, both of the announcements above also perpetuate the confusion between providing utility services (CA’s extended SaaS offering, Hyperic’s release of a pre-packaged Hyperic AMI) and the ability to manage Cloud-based systems. It’s all crammed in the same announcement/article because, hey, it’s all Cloud stuff.

Speaking of CA World, if I was there I would go to this session. At least for old time sake, and maybe to get some interesting ideas. Hopefully Don will blog about it after he is done presenting later today.

5 Comments

Filed under Amazon, Application Mgmt, Articles, CA, Conference, Everything, IT Systems Mgmt, Open source, Standards, Utility computing

WS Resource Access working group starting at W3C

Things went quiet for a while, but the W3C Web Services Resource Access Working Group has finally taken life, as was announced last week. It’s a well-know PR trick to announce bad news on a Friday such that it goes undetected, is it a coincidence that W3C picked a Friday for this announcement?

As you can tell by this last remark, I have no trouble containing my enthusiasm about this new group. Which should not come as a surprise to regular readers of this blog (see this, this, this and this, chronologically).

The most obvious potential pushback against this effort is the questionable architectural need to redo over SOAP what can be done over simple HTTP. Along the lines of Erik Wilde’s “HTTP over SOAP over HTTP” post. But I don’t expect too much noise about this aspect, because even on the blogosphere people eventually get tired of repeating the same arguments. If some really wanted to put up a fight against this, it would have been done when the group was first announced, not now. That resource modeling party is over.

While I understand the “WS-Transfer is just HTTP over SOAP over HTTP” argument, this is not my problem with this group. For one thing, this group is not really about WS-Transfer, it’s about WS-ResourceTransfer (WS-RT) which adds fine-grained resource access on top of WS-Transfer. Which is not something that HTTP gives you out of the box. You may argue that this is not needed (just model your addressable resources in a fine-grained way and use “hypermedia” to navigate between them) but I don’t really buy this. At least not in the context of IT management models, which is where the whole thing started. You may be able to architect an IT management system in such RESTful way, but even if you can it’s too far away from current IT modeling practices to be practical in many scenarios (unfortunately, as it would be a great complement to an RDF-based IT model). On the other hand, I am not convinced that this fine-grained access needs to go beyond “read” (i.e. no need for “fine-grained write”).

The next concern along that “HTTP over SOAP over HTTP” line of thought might then be why build this on top of SOAP rather than on top of HTTP. I don’t really buy this one either. SOAP, through the SOAP processing model (mainly the use of headers, something that WS-RT unfortunately butchers) is better suited than HTTP for such extensions. And enough of them have already been defined that you may want to piggyback on. The main problem with SOAP is the WS-Addressing tumor that grew on it (first I thoughts it was just a wart, but then it metastatized). WS-RT is affected by it, but it’s not intrinsic to WS-RT.

Finally, it would be a little hard for me to reject SOAP-based resources access altogether, having been associated with many such systems: WSMF, WSDM/WSRF, WS-Management and even WS-RT in its pre-submission days (and my pre-Oracle days). Not that I have signed away my rights to change my mind.

So my problem with WS-RAWG is not a fundamental architectural problem. It’s not even a problem with the defects in the current version of WS-RT. They are fixable and the alternative specifications aren’t beauty queens either.

Rather, my concerns are focused on the impact on the interoperability landscape.

When WS-RT started (when I was involved in it), it was as part of a convergence effort between HP, IBM, Intel and Microsoft. With the plan to use this to unify the competing WS-Management and WSDM/WSRF stacks. Sure it was also an opportunity to improve things a bit, but 90% of the value came from the convergence/unification aspect, not technical improvements.

With three of the four companies having given up on this, it isn’t much of a convergence anymore. Rather then paring-down the number of conflicting options that developers have to chose from (a choice that usually results in “I won’t pick either sine there is no consensus, I’ll just do it my own way”), this effort is going to increase it. One more candidate. WS-Management is not going to go away, and it’s pretty likely that in W3C WS-RT will move further away from it.

Not to mention the fact that CMDBf (and its SOAP-based graph-oriented query protocol) has since emerged and is progressing towards standardization. At this point, my (notoriously buggy) crystal ball shows a mix of WS-management and CMDBf taking the prize overall. With WS-Management used to access individual resources and CMDBf used to access any kind of overall system view. Which, as a side note, means that DMTF has really taken this game over (at least in the IT management domain) from W3C and OASIS. Not that W3C really wanted to be part of the game in the first place…

11 Comments

Filed under CMDBf, DMTF, Everything, HP, IBM, IT Systems Mgmt, Manageability, Mgmt integration, Microsoft, Query, REST, SOAP, SOAP header, Specs, Standards, W3C, WS-Management, WS-ResourceTransfer, WS-Transfer

Barack Obama’s first day on the job

A phone conversation.

– White House IT support.

– Hi, it’s Barack Obama.

– Good morning Mr. President and welcome to the White House.

– Thanks. Hey, I have a problem with the computer on my desk.

– Is it the screensaver? I know, it’s pretty embarrassing. President Bush got it from the vice-president and he really liked it. I was planning to remove it before you arrive this morning, but you got here before me. Sorry about that.

– Forget the screensaver. It’s the keyboard.

– Pretzel crumbs again, I am sure. Just shake it upside down.

– No it’s just the “Z” key.

– What about it?

– I’ve been pressing “control-Z” all morning. The economy is still a mess, the deficit is still huge, we’re still stuck in Iraq and Guantanamo is still open. And now my hand hurts. What gives?

– …

– Can you help?

– I am sorry Mr. President, I am afraid you cannot undo the work of the previous administration that easily.

– Really? Well, how on earth am I going to do it?

– I think it will take a lot more work.

– You’re positive I really can’t use “control-Z”?

– No you can’t.

[UPDATED 2008/11/9: Looks like he is not deterred: “Obama Weighs Quick Undoing of Bush Policy” (New York Times article, November 9, 2008)]

5 Comments

Filed under Everything, Off-topic

First in-depth look at Microsoft’s Oslo and the “M” modeling language

Microsoft’s PDC is taking place this week and more details were shared with the attendees about project Oslo, an effort announced last year to drastically improve the use of models across the application lifecycle. Some code is available (I think the Quadrant code is only for PDC attendees but the Oslo SDK is available to everyone). I am not at PDC, I didn’t see any presentation and I didn’t download any code. But Microsoft has also posted technical details on MSDN and, as far as I am concerned, that’s the most time-effective way to spend a couple of hours learning about Oslo. BTW, the way they share these early design descriptions and accept to make their evolution public is admirable.

For those who only want to spend 10 minutes rather than 2 hours, here are the thoughts that came to my mind as I was reading.

Overall I am somewhat underwhelmed, but not necessarily in a bad way. I know that’s a little schizophrenic so let me explain. After hearing a lot about how Oslo was the next big thing in modeling, it is a little surprising to read a document that can be summarized as “modeling is good, so go create some SQL tables and store them in a RDBMS”. That’s the underwhelming part. But on the other hand, it is more down to earth and practically-minded than I feared. And this is just a summary, in truth there is more than just “use SQL”.

Half of the MSDN documentation basically explains how to use SQL Server to store application models (as of today, the “Developing Models for the Metadata Store” section has only one sub-section, “SQL Server Guidelines for Modeling in the Oslo Repository“). Does this mean that all .NET applications will eventually have to carry with them a deployment of SQL Server 2008 even if they don’t use it to store the their operational data? Sure there are a few extra repository services (e.g. finer-grained change auditing) but most Oslo repository services are generic SQL Server features. That section has quite a lot of T-SQL, but it’s pretty readable. It also has a lot of dependencies on following naming conventions which makes me think that directly creating T-SQL code is not the best approach.

Fortunately there is an alternative, the “M” language. It’s a schema language with a built-in constraint mechanism. I found it more data-oriented (as opposed to resource-oriented) than I expected. Even though “each model is really a set of data structures, relationships, and constraints in serialized form“, there is a lot more support for data structures and constraints than for relationships. It’s just a foreign key. Relationships aren’t items and don’t have any property (or “field” as they’re called in “M”). For example, the relationship between a student’s enrollment record and a given class can’t have, as property, the grade that the student got for that class (as in the example in section 4.1.4 of the second LC of SML). To model this in “M” you need to create another item (e.g. “courseEnrollment”) and have a relationship from the student to that item and another one from that item to the “course” item itself. Or to replace the foreign key in the student table with a complex structure that contains both the foreign key and the properties of the relationship. At the end it has the same expressiveness potential, but in a less streamlined form. I assume Microsoft took this approach for performance reasons.

I am going on a limb here, but it may also be a difference between development-time concerns and operation-time concerns. During development (all the way to testing and packaging), you can still mostly get away with a relatively simple containment structure. You care about the components of your application and how they are packaged inside or next to one another. Sure you care about who calls who outside of the deployment unit but that’s not as core a concern as getting your class dependencies right, your tests in order and your installer configured. In fact, some of the “who calls who” bindings will be only be realized at runtime. Oslo, at least so far, clearly seems more focused on development time than operations so support for a relationship-rich model may not seem critical. At operations time, on the other hand, you don’t really care so much about how things were packaged before installation. You care a lot more about who invokes who (especially for modern distributed applications), what the network layout is, what resources a ticket is attached to, etc. The model looks a lot more like a graph with complex relationships. Something that “M” doesn’t seem ideally suited for.

Except for this caveat, I like “M”. It’s not anti-XML (you can represent values as XML if you’d like) but it avoids the “the answer is XML/XSD what is the question” approach to modeling that is sometimes a little too prevalent. “M” is a much better schema language for IT systems than XSD. I especially like its approach to types. A value is not intrinsically of a given type. A type is a condition that you happen to meet or not at the current time (“take heart little field, you can be anything you want when you grow up”). As such, you can be of several types at the same time. Refined types are potatoes inside potatoes (not sure if “M” supports definition of types as unions and/or intersection of existing types, for intersection I want to write something like”type NewType : OldType1 where this in OldType2” but there is no “this” in “M”). That approach to types (and the way constraints leverage types) is reminiscent of RDF/OWL. It’s a classification more than a typification, but I understand why they didn’t want to call it “class”. The similarities with RDF/OWL don’t go any further. As I wrote earlier, “M”is very data-focused and not resource-focused: as far as I can tell “M” types are defined syntactically, not semantically (the semantics come as a consequence). For example, I don’t think that you can assert that a given item representing a person is of type “friendly” if there is no corresponding data in the item. You’d have to first create a boolean field called “friendly” and define that those that have that field set to “true” are of type “friendly”. Unlike in RDF/OWL where you can just assert that a subject is “friendly”.

Here is another reason why you can’t have “semantics-only” types: “if you do not specify the type of a field or value, M infers a type for it“. Two things don’t sound quite right to me here. First a detail: the sentence (like others in the doc) talks about “the” type of a field of value, while there can be more than one. More importantly, what’s the point of this feature? How does it help me to have my IRC nickname classified as a post code or as a password just because it happens to be made of a compatible combination of letters and numbers? Maybe it makes sense as a storage optimization, but why does it make sense to expose this to the user?

I also like the way “extents” work. The current description of that feature is pretty limited, but based on how it is used in other parts I think one of its usages is to support a non-OO equivalent to inheritance: create two extents, one for the “superclass” and one for the “subclass” where each only contains the properties/fields defined at that level. You should get both of them in order to have the full picture (all the fields). This is, if I understand it correctly, similar to something I have been (unsuccessfully so far because “XML doesn’t do it this way”) trying to sell to the DMTF CMDBf working group: model inheritance through a set of non-overlapping records rather than dealing with a type hierarchy on record types. It’s not just that it makes relational storage easier (even though it does and that’s probably why “M” does it this way), it also makes your query/select operations a lot easier to specify and implement.

All in all (and without having gone through the exercise of defining actual models in “M”), it seems like a fine schema language (except that its dependency on the CLR base types is unpractical for users outside of the Microsoft universe) but I am not sure if it is beefy enough to be a good IT management metamodel. When the document says that “the Oslo repository provides open and flexible access to the data it contains, which enables direct access to SQL Server views of the underlying data. There are no complex data access layers or APIs” it sounds better than saying “it’s just SQL, so map your model to it and if you want relationships or type inheritance just build it on top of it and quit whining”. But it is an admission of limitation at the same time as a claim of simplicity. I also smell an assumption that LINQ will provide enough hand-holding that non-SQL-savvy developers will be ok. We’ll see.

And then there is MGrammar. Things get a little confusing at that point if you try to relate MGrammar to “M”. Actually, the FAQ states that “the M language consists of three parts: MGraph, MSchema and MGrammar“. This came a bit as a surprise to me since at that point I had finished reading (not in details but not too quickly either) the “M” documentation and I hadn’t seen these names mentioned once. Looks like there is some documentation consistency issues here, but that’s hardly surprising considering this is a “hyper-early (pre-alpha)” release as Doug Purdy puts it.

I think that everything that I have referred to as “M” above is MSchema.

MGrammar is something different altogether: it’s the source of the Domain Specific Language (DSL) references we’ve been hearing in relationship with Oslo. Technically, MGrammar is a BNF on steroids plus an automatically generated parser for your syntax. Cute. I assume that “M” (i.e. MSchema) is built as MGrammar-defined DSL but I am not sure why I would care. I am all for reuse and if someone at Microsoft thought that there was something reusable in the way they defined MSchema then it’s a good thing to expose this tool. But where does it come into play in application modeling? The last thing I want is people inventing completely independent languages to describe different domains. I am all for specialization, but a common underlying metamodel is pretty nice when you have to make sense of a whole system. I don’t see any such commonality in MGrammar: as far as I can tell it can be used to define anything from PostScript to sonnets.

From the FAQ, the connection point between MGrammar and MSchema is MGraph (MGrammar languages are parsed into an MGraph, MSchema “builds on MGraph”). That’s nice, but since neither the MSchema nor the MGrammar documentation mention MGraph I don’t really know what to make of this. David Chappell’s white paper also mentions MSchema and MGrammar but not MGraph. The introduction to the MGrammar Language Specification states that “the data that results from Mg [a.k.a. MGrammar] processing is compatible with Mg’s sister language, The Oslo Modeling Language, M, which provides a SQL-compatible schema and query language that can be used to further process the underlying information“. Compatible? I need more information here. In any case, MGrammar sounds like a fun project for a techie. Who am I to deny Microsoft engineers their fun. Jokes aside, I am probably missing something here seeing how prevalent the DSL message is in all discussions of Oslo. Look at the “highlights of this book” section for the upcoming Oslo/M book from the creators of the “M” language: half of it is about the DSL support and there must be a reason beyond pure geekery. As a side note, if you buy this book you need to understand what little shelf life it will have (I can give you a good price on a lightly-used Hailstorm/”.Net my services” specification book).

Aside from the “M” language itself, there are a few models described in the documentation. One corresponds to BPMN (actually, it says that it “closely aligns with” BPPMN 1.1, does this imply that they are not quite the same?). The fact that this model supports imports from Visio is a nice feature.

The Application model (one of the places where you can see “extents” in action) scares me a little bit because I doubt that two different people would use the same “extents” to describe the same software elements. Unless of course that’s being done for them by a pre-defined mapping to their development framework (.NET) enacted by their common development tool (Visual Studio). Which may be the assumption. Yet, the Application model is defined in generic terms, not Microsoft-specific (with a couple of slip-ups, like a WebApplicationModule being defined as a “Web application (module) implemented by IIS or WAS“. Maybe I’ll feel better about the generic applicability of this Application model when I see a full-fledged description (e.g. including relationship semantics as captured in foreign key field names) and an example.

At the bottom of that Application model, there is a lonely “Manageable” type to use if you have a LifecycleState field. This reinforces my impression that despite the claims to link development time with operational time, a lot of the focus to date has been on the former rather than the latter.

The ServiceModel model will look familiar to people familiar with SCA and is presumably complementary to the WorkflowModel and WorkflowServiceModel models, both of which are directly mapped to Windows Workflow Foundation. I guess that’s where Oslo and Dublin touch one another. I am still glad they are now clearly separated.

There is also a “Quadrant” model which concerns me a bit (it seems to be used to store customization of the Quadrant UI which, while convenient to store straight in the repository, doesn’t strike me as necessarily belonging there).

At this point, the question is not whether Microsoft can build Oslo as it is currently defined. SQL Server 2008 already exists, the usage guidelines aren’t unrealistic and even the “M-to-T-SQL” translation doesn’t seem too hard for Microsoft to implement (the SDK presumable already contains an implementation). I have no doubt they can deliver the system they describe. What I don’t know is whether and how it will be actually useful.

Describing “M” in details is good. Describing how the repository is implemented on top of SQL Server 2008 is interesting but not so relevant. What I’d like to see is a description of how all this gets used. How does it change the Visual Studio experience? How does it change the installation process/format? How does it support round-tripping between lifecycle stages (e.g. if the developer changes the workflow model, does that original BPMN model get consequently updated)? How does it relate to SLAs and policies? How does it apply to application monitoring? How does it apply to configuration management, to the change process? Etc. In short, what’s the Oslo ecosystem going to be.

These questions aren’t completely ignored in the MSDN documentation, but they are dispensed with in a couple of pages: “Application Development and Lifecycle Improvements” and “IT Operations Benefits“. The former states, for example, that “having the Oslo repository act as a central location for these models also enables a connection between the design and implementation models. This connection helps prevent these models from becoming disconnected during the development process“. Which all sounds good but is just a set of assertions that we have heard many times before (not just from Microsoft). How do “M” and the Oslo repository really make this true?

On the “IT Operations Benefits” side, things are equally blurry: “the Oslo repository can store all types of machine and application configuration data. When consistently updated, this configuration data is a catalog of the current state of all monitored machines and applications in the environment“. Notice the “when consistently updated” hand wave. That’s kind of the crux if you really want to manage across the lifecycle. How will they achieve this consistency? By centralizing all changes through a model-driven controller a la SDM/SML? Through ongoing discovery and/or change notifications? By relying on good old ITIL/MOF processes?

The FAQ declares that “having a common approach does not necessarily correlate to one physical store, but more of a federated model and we believe that some of the new Repository, along with existing investments in both Configuration Management Database (CMDB) and Team Foundation Server (TFS), will form the foundation for a common Microsoft metadata strategy and should be supported across our set of products“. OK, but who is the source of truth for application configuration data? The Oslo repository or the CMDB? Is one the desired state and the other the observed state? Does the CMDB go back to simply being a Service Desk (and if so, does the Oslo repository take on the responsibility to enforce change processes, something that requires more than the security model in Oslo)? If the CMDB is still going to use SML as its metamodel, how do you efficiently federate across such different metamodels as SML (i.e. XSD + schematron + relationships) and “M”?

Lots of questions remaining. What will Oslo have turned into in a few years? A business process design/implementation/monitoring suite (there is a strong workflow feel to many parts)? A generic drag-and-drop programming environment (“the fact that entire features are already described by models means that for a wide array of application and component categories you can start using visual tools to design and implement your components“)? A control center for end to end application management? All of the above? Nothing?

This was just a quick brain dump after reading the documents. Actually, I just realized it somehow got pretty long (congrats if you’re still reading). I hope this post is not too disorganized. Oslo is an interesting effort, but, as Microsoft is first to admit, it’s at a very early stage. I am just surprised that this first release spends so much time on the “how” rather than the “what”. Maybe it’s just because I only got my information from the MSDN documentation. We’ll see when more content from PDC finds its way online. I just want the slides, watching recorded presentations is rarely time-efficient (and you can expect them to require Silverlight).

Speaking of Silverlight, there is this new site on Oslo if you think watching some videos is worth installing Silverlight. Those screenshots don’t motivate me sufficiently.

[UPDATED 2008/10/30: Rather than going to bed I Googled around a bit and found a  post by Martin Fowler that answers some of my questions about MGrammar, MGraph and MSchema. MGraph is for instances, MSchema is for types. It answers some plumbing question, but I still have questions about expected usage and relevance to applications modeling.]

[UPDATED 2008/10/30: I also found the recordings and slides from past PDC sessions. Nice job Microsoft for this quick turnaround time, even if you require Sliverlight and/or the PPTX viewer. The sessions are:

  • TL23 A Lap around “Oslo” (Doug Purdy, Vijaye Raji)
  • TL27 “Oslo”: The Language (Don Box, David Langworthy)
  • TL18 “Oslo”: Customizing and Extending the Visual Design Experience (Don Box, Florian Voss)
  • TL28 “Oslo”: Repository and Models (Chris Sells)

The first two sessions (deliverd Tuesday) have a replay and slides, the others should, I assume, follow soon.]

[UPDATED 2008/11/3: A nice overview of Oslo by Aaron Skonnard. Unlike most other Oslo articles over the last week, this one tries to paint the (yet-to-be-realized) full picture of the Oslo ecoystem. He mentions that “other Microsoft products and technologies are expected to build on Oslo to provide other runtimes. A few that have already been announced include Microsoft System Center (Operations Manager) and Team Foundation Server (TFS) in Visual Studio Team System”. It’s interesting that he qualifies System Center to be more specifically “operations manager” rather than “configuration manager” but I wouldn’t read too much into it at this point.]

5 Comments

Filed under Application Mgmt, BPM, Business Process, CMDB, Everything, IT Systems Mgmt, Manageability, Mgmt integration, Microsoft, Middleware, Modeling, Oslo, SML, Specs, Tech

CMDBf work in progress

The DMTF CMDBf working group (of which I am part) has released a work in progress version of the CMDBf specification. The changes from the submitted version are minor. It’s mostly a move to the DMTF template. More important (but not drastic) changes should appear in the next release.

Comments Off on CMDBf work in progress

Filed under CMDB Federation, CMDBf, DMTF, Everything, Graph query, Specs, Standards

Dear Microsoft, here is my $0.25 Windows license fee for the month

Pricing is now available for Windows instances on Amazon EC2. More than the technical availability of Windows AMIs, the fact that you get pay your Windows license fee based on usage is a major change. This is where Microsoft’s announcement goes beyond Oracle’s EC2 announcement at Oracle Open World.

But why stop at EC2 instances? If I can do it there, why can’t I do it at home? Considering how rarely my home desktop is booted to Windows, I would love to pay my Windows license in a metered way. It would basically be limited to time spent editing video and participating in family Skype videconference (at least until I manage to get Skype full screen video to work on Ubuntu).

After all, why only Amazon and not other Cloud providers. And when this happens, I think I may become a cloud provider myself. It would be a small-scale operation. One physical CPU (my desktop). And one user (me). I would meter my usage and dutifully pay Microsoft every month based on the number of hours during which I was running Windows.

How much would that be? Well, a Linux Small Standard Image EC2 instance (the closest thing to my aging desktop) costs $0.10 per hour. The Windows version costs $0.125 per hour, so the Windows license on this machine costs 2.5 cents per hour. On a given month, I don’t use it for more than 10 hours (edit/render one DVD plus a few hours on Skype). That’s 25 cents. Does Microsoft take Paypal? Is the Microsoft tax about to get more progressive?

It will be interesting to see how Microsoft manages to be flexible on server OS licensing (where it has plenty of competition) and while keeping its highly profitable (and unfairly front-loaded and restrictive) desktop OS licensing intact.

[UPDATED 2009/1/19: What do you know, here is a Microsoft patent for a “Metered Pay-As-You-Go Computing Experience”, found through this article.]

1 Comment

Filed under Amazon, Business, Everything, Microsoft, Utility computing