Analyzing the DMTF incubator process

Depending on how you choose to look at it, either the DMTF has streamlined the process of defining standards or it has created a rubberstamping machine. I am referring to the “DMTF Standards Incubation Process”. It is recent, but not brand new (the DSP that defines it is dated April 6, 2007). I had heard about it but never really looked into it. Until this weekend, when I finally got motivated to investigate a bit. AFAIK this process has not yet produced any specification.

As I understand it, the goal of this incubation process is to allow a group of like-minded companies to get together in the DMTF and produce an “informational specification”, which is typically a refinement of a vendor submission. The informational specification would then go through a normal DMTF working group but often in an expedited fashion, only allowing limited changes. That’s not a guaranteed outcome, but it seems to be the “normal” case as envisioned in this process.

This overview should make it clear why this can be seen as a rubberstamping machine. Here are a few key points (the quotes come from the process description):

  • “Standards Incubators are often formed in conjunction with an initial baseline contribution by the founding members with the expectation that the group will serve to evolve and finalize that contribution” and later it says that leadership members should have a “commitment to maintaining alignment with the input submission”.
  • Only leadership-level DMTF members get to be on the review board (the part of the incubator that makes decisions). They have to fairly consider comments from the other companies, whatever “fairly” means.
  • Being a DMTF leadership-level company is necessary but not sufficient to sit on the review board of an incubator. If you are not there when the incubator is created, you have to be approved by the current leadership members to join. It is unclear to me whether any DMTF leadership-level company can join the “review board” of an incubator at the start or whether those who propose the incubator get to choose whom they let in.

There is nothing sneaky here: the incubator process is pretty upfront about being designed to allow a vendor-provided specification to be considered/reviewed/improved in a friendly environment, in which opponents are kept away. The question is what happens next. There are four possible dispositions once an incubator finishes its work and has produced an informational specification.

The first two, “bootstrap / expedited delivery” and “finalization”, are pretty close in practice. A working group is created that is restricted from making any significant change to the “informational specification”. The “bootstrap” approach only allows small corrections; “finalization” also allows some additions (but no change to what is already there). In other words, the working group is mandated to pretty much rubberstamp the informational specification: with these two dispositions, there is no opportunity (in the incubator or in the ratification group) for people to suggest technical approaches that are significantly different from those in the initial submission.

The one place where technical alternatives can be considered is when a competing incubator is created. At that point, the DMTF board may decide that a working group should be created to reconcile the two. Even then, the board may pick a winner (in which case the reconciliation amounts to adding to the selected specification some features that are only present in the other one, in effect protecting existing implementations of the winner). And if this is the path taken, the process makes it clear that the choice should be driven not by technical merit but by “adoption and momentum”. Which implies that the companies that ship the most products get to pick, since they can single-handedly create “adoption and momentum”.

And finally, the last possible disposition is “termination”, in which no further work on the informational specification takes place in the DMTF. But the barrier is pretty high for this path to be taken: the specification has to have “little adoption or industry interest”. It seems reasonable to interpret this to mean that it would only apply if the initial proponents (who created the incubator) have lost interest themselves; otherwise they alone would provide sufficient “industry interest” to prevent termination and force another outcome (which can only be one of the first two, the rubberstamping options, if there is no competing incubation group). And even if the work is indeed terminated, the specification remains available indefinitely as a “DMTF informational specification”, which people can (and will, if it serves their purpose) simplify to “a DMTF specification”. The difference between this and a DMTF standard will be lost on 99% of IT writers and IT buyers. I submit as evidence all the confusion about the status of a “W3C note” (confusion that prompted W3C to eventually rename it to “submission”). It will be even less clear with DMTF “informational specifications” because, unlike W3C notes, they did go through some modifications in the DMTF and they are indeed a product of the DMTF. I would not be surprised if some “DMTF informational specifications” stayed just that and lived happy lives as pseudo-standards. One thing that remains unclear is whether such a terminated informational specification can be taken over by another standards organization.

Bottom line: if you don’t like a submission to an incubator group you’d better put together an alternative (quickly) to stay in the game. And if your position is not “this is a bad technical approach” but rather “this is something we should do in a more open and deliberative way, or maybe later”, then you only get one chance to make your case: you’d better make a convincing argument to the board not to allow the incubator to be created, because after that there is little chance to stop the train. And how many standards organization boards do you know that say “no” to submissions from large vendors?

Having said all this, is this really a bad thing? Let’s look at it from the “glass half full” side.

First, let’s realize that this happens anyway. Companies get together (often around an initial document created by a single leader) to create a specification and then look for a standards organization to ratify it with as few changes as possible. Other than CIM, it seems that all recent DMTF efforts started out this way (WS-Management, CMDBf, OVF). This is how Microsoft (sometimes with IBM) built the whole WS-* stack. They even had a name for it (the “workshop process”) to try to make it sound more open than it was. I’ve been on the inside (SML, CMDBf, WSRF/WSRT) and outside (WS-Management and other WS-* specifications) of it and it’s a pain whether you’re inside or out. It’s very opaque. Efforts may die and nobody ever knows (for example, does anyone know what’s up with CML?). Even when those inside want to get feedback and share their work, they have to deal with a tangle of legal agreements that makes it unnecessarily hard for everyone. In addition, all the work and discussions that go into the submitted specification usually get lost as the work transitions to a standards body (e.g. no email archive and unclear IP/confidentiality rules about re-using them). And the fact that these efforts are private does not prevent companies from demanding guarantees that their submissions won’t be changed too much. For example, the WS-Management working group had an explicit goal in its charter to maintain compatibility with the submission, and the same debate was played out over and over again in drafting the charters of several OASIS and W3C groups. Companies play one standards organization against another if necessary to get this guarantee.

Anything that can take us away from this mess merits consideration even if it is not a perfect alternative (there isn’t any). The DMTF incubator process doesn’t seem to relax the control of the sponsor companies, but it provides some level of transparency (at least for DMTF members) and, presumably, some continuity between the incubation and ratification phases.

Standards organizations constantly get blamed for either being too slow/procedural (e.g. HTML at W3C) or being rubberstamping machines (e.g. OOXML at ISO). Or both at the same time (most WS-* work). Most steps an organization can take to address one criticism make the other worse. This “incubator” process is an example.

Everyone complains about “design by committee” and how inconsistent and bloated specifications become when everyone is listened to and made to feel included. The specifications end up with too many options (a killer of interoperability) and no guiding vision. A more constrained set of authors usually produces a simpler and more consistent specification. Has anyone ever seen a standard that is shorter than the submission that started it?

Not to mention the fact that working group chairs are often in an uncomfortable position, forced to choose between, on one hand, accusations of being dictatorial and, on the other hand, seeing their working group drift from one rathole to the next, with no end in sight. The standards world has its fair share of obstructionists and pontificators. Some do it on purpose (they have been mandated by their employer to prevent progress in a group), most do it just because of their personality (and the fact that their employer has no real interest in what the person does in this group, as long as his/her presence allows the employer to claim to be part of the game). Forcing people who want an alternative approach to actually put together a proposal is a way to keep pontificators at bay. Unfortunately, it also shuts out qualified people who know a domain well and want to share their knowledge but don’t have the time (or employer sponsorship) to put together an entire alternative specification around their proposal.

In the end, it comes down to what a standard should be. If you think a standard should capture the knowledge of most experts in the industry and give an equal voice to all organizations, then this is a step in the wrong direction. If, on the other hand, your position is that the big guys will effectively set standards anyway so it might as well be done in a way that is fast, relatively transparent and consistent with their implementation, then you’ll applaud this initiative.

In creating this process, the DMTF made a clear (though grammatically challenged) statement on this topic: “adoption and momentum may outweigh technical issue regarding success”.

2 Comments

Filed under DMTF, Everything, Mgmt integration, Specs, Standards

Oracle Enterprise Manager 10g R5

Oracle Enterprise Manager 10g R5 (a.k.a. 10.2.0.5) is coming out. It will be announced by our SVP on Tuesday (subscribe to the Webcast). InfoWorld got a preview. From the application management perspective, it includes new management capabilities for middleware products that came from BEA (WL and OSB) and for several Oracle applications (Siebel, EBS, PeopleSoft, Beehive, BRM). And lots of goodies in other areas (virtualization, database, automation…).

[UPDATED 2009/3/3: bits! bits! bits!]

[UPDATED 2009/3/4 (and later): My colleague Chung Wu provides more details, followed by drill-downs into specific areas. I’ll add links here as they become available:

Thanks Chung!]

Comments Off on Oracle Enterprise Manager 10g R5

Filed under Application Mgmt, Everything, IT Systems Mgmt, Oracle

Towards making Cloud services more consumable by enterprises

If you are a hard core network/system management person who has been very suspicious of all the ITIL/ITSM bullshit from the boss, and even more suspicious of the “Cloud” nonsense that occupies the interns while they should be monitoring the event console, then I have bad news for you: they are mating. Not the interns (as far as I know); I am talking about ITSM and the Cloud.

If, on the other hand, you are an ITSM and Cloud enthusiast who sees himself/herself leveraging all these nifty ITSM/BSM tools and shiny Cloud services to become the ultimate CIO, delivering unprecedented business value, compliance and IT efficiency from your iPhone at the beach, then you’ll see this marriage as good news, a sign that your move to Hawaii is approaching.

I am referring to the announcement by Rodrigo Flores that his company, newScale, has released a new product, newScale FrontOffice for Virtual Data Centers, to incorporate Cloud-based services in their IT Service Catalog.

This is not sexy but it’s the kind of support that some classes of enterprises will need in order to really make use of Cloud services. Eventually, Cloud providers are going to have to move their focus away from cool technology and developer evangelism towards making their services easily consumable by corporations. Otherwise they’ll be like an office supply store that doesn’t take American Express.

While the direction is very interesting I can’t comment on how much value this new product actually provides because the company seems to engage in anti-marketing activities. If you want to dig a bit deeper than the press release, you get redirected to this page which requires your info in order to download the “complimentary information brief”. The confirmation page promises an email “containing the information you requested within the next 30 minutes”. I thought that the info I requested was the brief. What I got instead was an email asking me to call them. I don’t know if it’s my unpronounceable last name or the oracle.com in my email address that scares them (I’d guess the latter) but that seems like a lot of precautions for a “complimentary information brief”. Unless this is an attempt to grab the “Cloud” buzzword with little meat to back it up (not that anyone would do this, of course). Hopefully they’ll make more information publicly available when they get around to it.

[UPDATED 2009/2/25: Via Coté, I just saw what I consider supporting evidence from Gartner that, once we’re out of the early adopter phase, Cloud firms will need to focus less on pleasing developers and more on pleasing IT people: “But my observation from client interactions is that cloud adoption in established, larger organizations (my typical client is $100m+ in revenue) is, and will be, driven by Operations, and not by Development.”]

[UPDATED 2009/3/3: I got the PDF brief, thanks newScale (Ken and Mark). Many of the benefits it describes assume that there is a pretty robust automation/provisioning infrastructure to back it up, in addition to the Catalog itself. E.g. the Catalog alone will not allow you to “shorten the provisioning cycle time to minutes instead of months”. The brief lists adapter kits to VMWare/EC2 and more internal-minded tools (HP and BMC, presumably through their Opsware and BladeLogic acquisitions respectively). So on the “public cloud” side it’s EC2 for now, not surprisingly. Integration with many of the Cloud tools (like RightScale) could be tricky since these tools bundle a catalog with the automation engine. If we ever do get a useful ontology of Cloud services, this catalog would be a natural user for it when it expands to other services beyond EC2 and tries to help you compare them. I guess newScale wouldn’t appreciate it if I provided a direct link to the PDF, so go request it to see for yourself.]

[UPDATED 2009/3/23: Speaking of managing your IT systems from your cell phone at the beach: VMWare vCenter Mobile Access.]

3 Comments

Filed under Cloud Computing, Everything, Governance, IT Systems Mgmt, ITIL, Mgmt integration, Utility computing

The datacenter as a programmable entity

This is an exciting time for those who want to shrink the computer. They are having a field day playing with devices powered by Android, the iPhone’s Cocoa, Palm’s new WebOS, Windows Mobile, JavaFX (maybe one day) and, to a lesser extent, the Blackberry.

But times are good too for those who want to go the other way and program larger things rather than smaller ones. If you are interested in thinking about the datacenter as a programmable entity, you are in luck: for those long plane trips when you run out of battery, bring a printout of the proceedings of the research meeting organized last year in Cambridge by Microsoft and HP Labs, titled “The Rise and Rise of the Declarative Datacentre”. When you’re back on-line go check the presentations on the site.

And if you liked Paul Anderson’s “Programming the Data Centre” presentation at the Cambridge meeting, you can also read his “Programming the Virtual Infrastructure” slides from LISA 08. More LISA 08 presentations here.

I got the link to Paul Anderson’s second presentation (and maybe also the first one, some time ago) from Steve Loughran, who also adds a few comments, starting with the debate between the declarative and procedural approaches. This question has plenty of down-the-road implications. There is a lot to like about the declarative approach in terms of composition, manageability and more generally as a framework to manage complexity via encapsulation.

A simple analogy for this debate is to think about driving directions. The declarative approach is for me to give you a map with a circle on it showing where my house is and let you find your way. It’s more work for you but it’s also more resilient. The procedural approach is for me to give you a set of turn-by-turn directions, based on where you are coming from. If you miss one turn or if one road happens to be blocked at the time, then you’re in trouble.

That being said, there are enough powerful and useful PowerShell or Puppet scripts out there to give you pause before discarding procedural approaches. While the declarative (aka “desired state”, “policy-driven” and sometimes “model-based”) approach looks a lot more elegant, at this point in time the real work usually gets done via scripts, deployment procedures or the like.
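To make the contrast concrete, here is a toy sketch (in Python, not tied to Puppet, PowerShell or any real tool; the items and states are made up): the procedural version encodes the steps, while the declarative version encodes the desired end state and lets a convergence loop work out the remaining steps.

# Toy model: the "host" is just a dict of item -> state, purely for illustration.
host = {"package:httpd": "absent", "service:httpd": "stopped"}

# Procedural: an ordered recipe, written against an assumed starting point.
def procedural_deploy(h):
    h["package:httpd"] = "installed"
    h["file:/etc/httpd/conf/httpd.conf"] = "present"
    h["service:httpd"] = "running"

# Declarative: a desired state plus a convergence loop that only touches items
# that are not already where they should be, so it is safe to re-run at any time
# (which is what makes it resilient to missed turns and blocked roads).
desired = {
    "package:httpd": "installed",
    "file:/etc/httpd/conf/httpd.conf": "present",
    "service:httpd": "running",
}

def converge(h, desired_state):
    for item, state in desired_state.items():
        if h.get(item) != state:
            h[item] = state   # in real life: install the package, start the service...

converge(host, desired)
print(host)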

In addition to academia, the competition between these approaches is playing out right now between all the companies and products that want to help you automate and manage your cloud deployments (public and/or private): for example, RightScale scripts (custom scripts and RightScripts, see here and here) versus the more declarative ECML/EDML documents from Elastra. Or the very declarative approach taken by SmartFrog.

5 Comments

Filed under Automation, Cloud Computing, Conference, Desired State, Everything, Grid, Implementation, Research, Tech, Utility computing

Is notification wrapping getting a bum rap?

Looks like the question of whether to wrap SOAP-based notifications is back. Like Gil I prefer to stay away from wrapping notifications but my reasons are somewhat different.

I am not convinced by WSDL-centric arguments one way or the other. Proponents of wrapping say that it gives them a WSDL they can use for creating a generic listener, while opponents say that avoiding wrapping gives them a WSDL that generates useful code (payload-aware). I am not a big fan of WSDL-based code generation, but even if you are going to do it nobody says that you have to do it based on the WSDL document that ships with the specification. You’re free to modify the WSDL any way you want before feeding it to your code generation tool, as long as the result correctly describes the messages. One can write an infinity of WSDL documents for a given set of messages, some more precise and others more high-level (in which you quickly hit an xs:any). So, if the spec gives you a WSDL where the payload is xs:any and you know that in your case the payload is going to be sec:intrusionDetected, feel free to insert that element in the WSDL before running wsdl2java or whatever.
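As a rough sketch of what I mean (the file names and the sec:intrusionDetected element are made up, and this glosses over namespace-prefix bookkeeping), you could specialize a copy of the spec’s WSDL before handing it to your code generation tool:

# Replace the generic xs:any payload placeholder with a concrete element
# reference in a copy of the WSDL, then point wsdl2java (or whatever) at the copy.
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

tree = ET.parse("notification.wsdl")             # the WSDL shipped with the spec
for parent in list(tree.getroot().iter()):
    for i, child in enumerate(list(parent)):
        if child.tag == "{%s}any" % XS:          # the "anything goes" payload
            parent.remove(child)
            parent.insert(i, ET.Element("{%s}element" % XS,
                                        {"ref": "sec:intrusionDetected"}))

# NB: ElementTree does not know that "sec:" inside an attribute value is a QName,
# so make sure the xmlns:sec declaration is present in the output document.
tree.write("notification-specialized.wsdl")      # feed this copy to the generator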

In the end, the question is not about what the WSDL in the specification looks like. The question is simply to what extent you know ahead of time the payload of the events you are going to have to handle. And you’d better know enough about the payload to create whatever logic your event consumer has to apply to the notification, whether that’s through WSDL or some other means. If you are not going to apply any payload-dependent logic (“generic sink”) then you don’t need to know anything about the payload. And I don’t see why someone needs a wrapper to create a generic sink.

Rather, what I don’t like about wrapping notifications is that you force them to be handled only as notifications, not as regular SOAP messages. You put them in a separate world and you make it hard for someone to create a service that can be invoked either in a subscription-driven way or in a direct way.

Here is a made-up example: consider a message to indicate that a physical intrusion has been detected in a building. There are many possible consumers for this message (local security staff, private security company, police, sound alarm, the cell phone of the owner, audit log, etc…). There are many possible sources for the message. In some cases, the message does not come from a subscription (e.g. a homeowner calls the security company and the operator enters data in a system that produces the message, or the sensor is hard-coded to sound the alarm). In others, there is a subscription (e.g. a home alarm system allows someone to register phone numbers and email addresses to which to send intrusion alerts). Sometimes something that starts as a subscription-based notification gets forwarded to someone who did not register for anything. It’s a good thing if web services that consume this message do not have to know (if they don’t care) whether this message originated because of a subscription or not. All they need to worry about is that there is a message that they have to respond to (e.g. by dispatching a patrol of clowns with orange lights on their car).

Here is a simpler analogy. Imagine that you have a filter in your email client to move all messages from Joe to a given folder. How much would you like to have to write the rule twice, one for messages that Joe sends to you directly and one for messages that Joe sends to a mailing list to which you are subscribed? Not very much I imagine.

At the same time, most notification systems are aware that they are processing notifications and there may be notification-related data that you’d like to have available in a consistent way (e.g. enough information to manage the subscription that resulted in you receiving this message). That’s fine but you don’t need an intrusive wrapper for this. Just use a SOAP header. It’s out of the way if you don’t care about it and it’s right there if you do (if you want to subject yourself to a two-year-old rant about how the SOAP processing model is unfortunately underutilized, be my guest).
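Here is a quick sketch of what that looks like from the consumer side (a hand-rolled illustration, not any particular toolkit; the payload and header names below are invented): the dispatch logic only looks at the Body, and the subscription metadata in the Header is consulted only if the consumer cares.

import xml.etree.ElementTree as ET

SOAP = "http://www.w3.org/2003/05/soap-envelope"
SEC = "http://example.com/security"        # hypothetical payload namespace
SUB = "http://example.com/subscription"    # hypothetical header namespace

def handle(envelope_xml):
    env = ET.fromstring(envelope_xml)
    body = env.find("{%s}Body" % SOAP)
    alert = body.find("{%s}intrusionDetected" % SEC)
    if alert is None:
        return                             # not a message this service handles
    dispatch_patrol(alert)                 # same payload logic, subscription or not

    # Subscription metadata, if any, travels as a header block: out of the way
    # if you don't care about it, right there if you do.
    header = env.find("{%s}Header" % SOAP)
    sub = header.find("{%s}subscriptionInfo" % SUB) if header is not None else None
    if sub is not None:
        print("delivered because of subscription", sub.get("id"))

def dispatch_patrol(alert):
    print("dispatching a patrol of clowns with orange lights")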

One place where you need some kind of wrapping is when delivering several events at a time (either because you use pull-style retrieval or because you find it more efficient to push them in batches). If that’s what you’re after (and you want to handle it within one SOAP message rather than boxcarring a set of SOAP messages) then go ahead and define a wrapper, but make it a specialized wrapper that serves this purpose: collecting notifications and properly attaching the relevant metadata to each. That’s a real purpose, not some WSDL make-believe.

Another use case is when you apply some transformation to the notification before sending it. Say that instead of returning a large notification you filter it by running an XPath on it and returning a serialization of the resulting node set (assuming you first solve the XPath serialization conundrum). You’d need some kind of wrapper to contain the result and put it in context, but again that should be a specialized wrapper for your filter mechanism. Not a generic wrapper.

It’s been a while since I really thought about this. My recollection may be flawed but I think I was already holding this position in the OASIS WS-Notification technical committee (which completed its work by publishing three standards in October 2006). I remember David Hull making a very eloquent case in the same direction (“wrapping” as a policy-advertised option, not part of the base framework), and strong pushback from IBM. I learned a lot about pub/sub systems from my WS-Notification committee co-chair, IBM’s Peter Niblett (a leading expert on the topic) while working on WS-Notification, but this is one area in which he did not convert me.

Comments Off on Is notification wrapping getting a bum rap?

Filed under Everything, Mashup, Mgmt integration, Middleware, SOAP, SOAP header, Specs, Standards, Tech

Long-running processes on Google App Engine: it finally works

I am probably taking things a bit too personally, but I feel like I just successfully guilt-tripped the Google App Engine (GAE) team. Just last night I was complaining that they were teasing me (supporting urllib2, but an older version, without timeout support). And tonight I noticed a new post on their blog, announcing the end of these “High CPU Requests” that have been the bane of my GAE experience.

The reason I was looking for timeout support in the first place was to avoid generating these dreaded “High CPU Requests”, which quickly result in your application being disabled. It’s all explained here and here.

But now that they are gone, I don’t need timeouts anymore. Around 11:00PM Pacific (on Thursday 2/12) I restarted my application. All it does is create 5 new simple entities in the datastore, one every second (it sleeps for one second between each entry). Then it spawns its successor, which will do the same thing, ad vitam aeternam. The application’s name, “rere”, is short for “request relay”, the pattern used to emulate a long running process. The default page for the app (available here) just returns a list of the 30 last entities created. The point is to illustrate that a single original request spawned an ever-lasting computing task on GAE.

Here is the code:

#!/usr/bin/env python
#
# Copyright 2009 William Vambenepe
#

import wsgiref.handlers
import os
import logging
import time

from google.appengine.ext import db
from google.appengine.ext.webapp import template
from google.appengine.ext import webapp
from google.appengine.api import urlfetch

numberOfHeartbeatsViewed = 30
secondsDurationOfTaskWait = 1
numberOfTasksPerRequest = 5

class HeartBeat(db.Model):
  requestId = db.IntegerProperty()
  date = db.DateTimeProperty(auto_now_add=True)

# The mere existence of an instance of this class in the DB means that the relay has to stop.
class StopExec(db.Model):
  date = db.DateTimeProperty(auto_now_add=True)

class MainHandler(webapp.RequestHandler):
  def get(self):
    hbs = HeartBeat.all().order("-date").fetch(numberOfHeartbeatsViewed)
    template_values = {"hbs": hbs}
    path = os.path.join(os.path.dirname(__file__), "index.html")
    self.response.out.write(template.render(path, template_values))

class StartHandler(webapp.RequestHandler):
  def get(self):
    if (StopExec.all().count() == 0):
      try:
        id = int(self.request.get("id"))
      except (TypeError, ValueError):
        id = 0
      try:
        logging.debug("Request " + str(id) + " launching background task.")
        loopCount = 0
        while(loopCount < numberOfTasksPerRequest):
          hb = HeartBeat()
          hb.requestId = id
          hb.put()
          logging.debug("Request " + str(id) + " saved heartbeat #" + str(loopCount))
          time.sleep(secondsDurationOfTaskWait)
          loopCount = loopCount+1
      finally:
        logging.debug("Launching successor request with id=" + str(id+1))

        # This silly back and forth between the two URLs is because of
        # "App cannot fetch the same URL as the one used for the request" error.
        if (self.request.url.find("start2") == -1):
          urlfetch.fetch("http://localhost/start2?id=" + str(id+1))
        else:
          urlfetch.fetch("http://localhost/start?id=" + str(id+1))
        logging.debug("Request " + str(id) + " completed")

def main():
  application = webapp.WSGIApplication([("/", MainHandler), ("/start", StartHandler), ("/start2", StartHandler)], debug=True)
  wsgiref.handlers.CGIHandler().run(application)

if __name__ == "__main__":
  main()

One thing I had to change from the earlier version (written using version 1.1.0 of the GAE SDK) is that urlfetch now returns an error if your app tries to invoke itself at the same URL (“App cannot fetch the same URL as the one used for the request”). So I have to alternate between http://localhost/start and http://localhost/start2, both of which are mapped to the same handler. This was added sometime between SDK 1.1.0 and SDK 1.1.9. If it is aimed at preventing the kind of baton-passing that I am doing, it is pretty ineffective considering how easy it is to circumvent.

It is now 1:02AM Pacific the next day (Friday 2/13) and the process is still progressing, based on the single HTTP request I sent to it at 11:00PM the previous evening. The result page currently returns:

  • From request # 1056, with date Fri, 13 Feb 2009 09:02:15 +0000
  • From request # 1056, with date Fri, 13 Feb 2009 09:02:14 +0000
  • From request # 1056, with date Fri, 13 Feb 2009 09:02:13 +0000
  • From request # 1056, with date Fri, 13 Feb 2009 09:02:12 +0000
  • From request # 1056, with date Fri, 13 Feb 2009 09:02:11 +0000
  • From request # 1055, with date Fri, 13 Feb 2009 09:02:10 +0000
  • From request # 1055, with date Fri, 13 Feb 2009 09:02:09 +0000
  • From request # 1055, with date Fri, 13 Feb 2009 09:02:08 +0000
  • From request # 1055, with date Fri, 13 Feb 2009 09:02:07 +0000
  • From request # 1055, with date Fri, 13 Feb 2009 09:02:06 +0000
  • From request # 1054, with date Fri, 13 Feb 2009 09:02:05 +0000
  • From request # 1054, with date Fri, 13 Feb 2009 09:02:04 +0000
  • From request # 1054, with date Fri, 13 Feb 2009 09:02:03 +0000
  • From request # 1054, with date Fri, 13 Feb 2009 09:02:02 +0000
  • From request # 1054, with date Fri, 13 Feb 2009 09:02:01 +0000
  • From request # 1053, with date Fri, 13 Feb 2009 09:01:59 +0000
  • From request # 1053, with date Fri, 13 Feb 2009 09:01:58 +0000

Which shows that 1056 successive requests have participated in the relay (the last one just happened, at 09:02:15 UTC which is 1:02AM Pacific).

Hopefully it will still be running when I wake up tomorrow.

[UPDATED 2009/2/13, 9:08AM Pacific: It’s alive!

  • From request # 6411, with date Fri, 13 Feb 2009 17:08:45 +0000
  • From request # 6410, with date Fri, 13 Feb 2009 17:08:44 +0000
  • From request # 6410, with date Fri, 13 Feb 2009 17:08:43 +0000
  • From request # 6410, with date Fri, 13 Feb 2009 17:08:42 +0000
  • From request # 6410, with date Fri, 13 Feb 2009 17:08:41 +0000
  • From request # 6410, with date Fri, 13 Feb 2009 17:08:40 +0000
  • From request # 6409, with date Fri, 13 Feb 2009 17:08:39 +0000

BTW, the code provided uses localhost to run on my local machine. The version uploaded to Google of course replaces this with rere.appspot.com.]

[UPDATED 2009/5/1: For some reason this entry is attracting a lot of comment spam, so I am disabling comments. Contact me if you’d like to comment.]

4 Comments

Filed under Cloud Computing, Everything, Google App Engine, Implementation, Tech, Utility computing

Google App Engine is teasing me

Version 1.1.9 of the Google App Engine (GAE) SDK was released earlier this week. The first item in the announcement covers the big news, that developers can “use the Python standard libraries urllib, urllib2 or httplib to make HTTP requests”. My first thought reading this was that I was finally going to be able to use timeouts on outgoing HTTP requests. I care about this because my earlier attempt to emulate a long-running process in GAE was stymied by the GAE quota system, something I think I can work around if I can timeout a request once it has spawned its successor (more precisely, once it has spawned the successor of the incoming request that created the new outgoing request).

I got the new SDK last evening and moved the code from using urlfetch to using urllib2 (with timeout). On my local machine it seems to work, but the quota system (that I am trying to finesse) doesn’t run in the SDK. So the only real test happens once you deploy the app in the Google environment. Which is when I realized that GAE uses Python 2.5.2 and that the timeout parameter in urllib2 (and httplib) came with 2.6. Slap.
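For the record, here is the difference in a nutshell (just a sketch; whether the GAE sandbox would even honor the Python 2.5-era workaround is a separate question):

import socket
import urllib2

# Python 2.6+: a per-request timeout, which is what I was hoping to use.
# response = urllib2.urlopen("http://localhost/start?id=1", timeout=5)

# Python 2.5 (what GAE runs): urlopen takes no timeout argument; the closest
# workaround is a process-wide default socket timeout, which affects every
# socket opened by the application.
socket.setdefaulttimeout(5)
response = urllib2.urlopen("http://localhost/start?id=1")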

I was especially disappointed because the links for urllib2 and httplib in the GAE 1.1.9 SDK announcement take us to the Python 2.6.1 documentation. The timeout parameter is right there at the top of these pages, staring at me. It would be more accurate for this announcement to point to the 2.5.2 documentation (here it is for urllib2 and httplib).

It doesn’t really matter of course because this is just a toy project. And a real scheduler seems to be in the works (see this pre-announcement and this work in progress). I just do this as a fun way to get a glimpse of what it takes to turn existing infrastructure into a *aaS product, something that is going on in different ways in many places these days. Linux wasn’t created to run on something else than hardware. Xen wasn’t created to support EC2. Python wasn’t created to support GAE.

And GAE wasn’t created to support long-running processes. But I haven’t given up.

1 Comment

Filed under Cloud Computing, Everything, Google App Engine, Implementation, Tech, Utility computing

Oracle acquires mValent for application config management

mValent will become part of Enterprise Manager. It comes to complement other recent acquisitions in the application management space: Auptyma, Moniforce, Empirix, ClearApp.

The announcement and FAQ are here.

More details about the acquired product and technology are on the mValent site, including here and here.

1 Comment

Filed under Application Mgmt, CMDB, Everything, IT Systems Mgmt, Mgmt integration, Modeling, Oracle

UCI: setting RDF for failure?

I don’t get it. I just read Reuven Cohen’s description of the Unified Cloud Interface project that he recently started. It’s nothing less than using RDF to create “a Semantic Cloud Infrastructure capable of adapting to a variety of methodologies / architectures and completely agnostic to any specific API or platform being described.”

What made me fall off my chair is the methodology/architecture part of this statement. It’s hard enough (but doable) to use RDF to map philosophically similar APIs. It’s a non-starter to use it to bridge architectural and methodological differences. I have spent a fair amount of time looking at Semantic Web technologies in the context of modeling IT systems (see the “semantic tech” category of this blog). While I think they would be a great foundation I don’t see them ever coming anywhere near what Reuven describes.

But to be fair, I am not sure what he really is describing. There are a few overly ambitious proclamations like the one above and this paragraph:

The key drivers of a unified cloud interface (UCI) is “One abstraction to Rule them All” – an API for other API’s. A singular abstraction that can encompass the entire infrastructure stack as well as emerging cloud centric technologies through a unified interface. What a semantic model enables for UCI is a capability to bridge both cloud based API’s such as Amazon Web Services with existing protocols and standards, regardless of the level of adoption of the underlying API’s or technology. The goal is simple, develop your application once, deploy anywhere at anytime for any reason.

But in his piece you’ll also find CIM being cited as an example. There are good things to be said about CIM, but it certainly is not “a dynamic computing model that can, under certain conditions, be ‘trained’ to appropriately ‘learn’ the meaning of related cloud & infrastructure resources” (or, in the case of CIM, computer system resources). Good luck “training” CIM to “learn” anything. It’s CIM that’s going to train you to do it its way, period.

The CIM example (and other standards he lists) paints the picture of defining a standard API for Cloud Computing and forcing all providers to use it. That’s the conventional approach to universality. If that’s what UCI is after then it is technically achievable. And RDF might be a very good technical foundation for it. Whether anyone can pull this off politically and commercially at this stage is a different question of course. In any case, such an effort would have nothing to do with magically wrapping whatever API each provider has defined and whatever architecture/methodology they chose.

And further down we see a sketch of another, much more modest, vision, when Reuven talks about how “these web resources could just as easily be ‘cloud resources’ or API’s” which seems to represent a whole API as an RDF resource. Sure, then you can use RDF/OWL to capture versioning information between them, backward compatibility etc. Probably very useful, but that’s a very different scope.

So which is it? Reuven is a thought leader in Cloud Computing, so I want to think I am missing his point.

So far, I haven’t seen any Cloud taxonomy that is reasonably complete and has received broad support. Shouldn’t we first try to come up with a human-readable taxonomy before we try to turn it into a machine-readable ontology? In my previous post I explicitly stayed away from being pedantic about the difference between the terms, but the confusion between a taxonomy and an ontology seems to be part of what’s going on here.

The sad thing is that they (you know, them) will point to this as a proof that Semantic Web technologies don’t work.

Or maybe I’ve just set myself up for a generous portion of humble pie on April 2nd (when Reuven says an “initial functional draft UCI implementation, taxonomy and ontology” will be unveiled). I’d love to be surprised. And my ego has taken worse hits before.

[UPDATED 2009/2/10: You should read Steve Oberlin’s take on this overall taxonomy/ontology discussion. He knows the topic, carefully reads the posts that he comments on, packs a healthy dose of skepticism and takes the time to explain what taxonomies and ontologies are, which was overdue. Plus, I just love sites that don’t feel the need to use decorative pictures. His doesn’t have a single image file which means that even if he didn’t have superb credentials (which he does) he’d get my respect by default. A blog to watch.]

1 Comment

Filed under Automation, Cloud Computing, Everything, Modeling, Portability, RDF, Semantic tech, Standards, Tech, Utility computing

Cloud ontology: to boldly go where ITIL, Grid and SOA have gone before

I wasn’t at the recent Cloud interop meeting in Mountain View, but based on blog reports from those who were (Stu, James and Bob) a key takeaway is that we need a “Cloud taxonomy”.

As John points out, there are plenty of existing proposals. The SPI taxonomy (SaaS/PaaS/IaaS), the SADIST-PIMP taxonomy (Storage, App, DB, Info, Security, Test, Platform, Integration, Management, Process – which I took the liberty to re-order) and the UCSB/IBM “Toward a Unified Ontology of Cloud Computing” paper.

Is it a matter of picking one of these? And if we do, how are we better off?

Let’s look at some related domains that are a bit more mature to see how they handled this and what benefits they got out of it. The three most relevant comparisons I can come up with are ITSM, Grid Computing and SOA.

ITSM has ITIL, a key part of which is its glossary which provides value in and of itself, even if you don’t explicitly use the rest of the framework. Is this what the Cloud taxonomy should aim for? I contend that the Cloud field is not nearly mature enough to aim for this.

Grid Computing has OGSA, which started as a white paper containing a service model (section 6.1). It was later expanded (within GGF, a standards organization specific to Grid Computing) into this document and was used as a basis for many standard technical specifications. Is the goal of the Cloud Ontology to follow a similar path? OGSA was more than a glossary, though: it included a vision, goals and architectural recommendations. Sure you can start with the glossary, but I am willing to bet that a stand-alone glossary would change significantly if/when a Cloud Computing architecture is defined based on it. And there isn’t today, for Cloud Computing, the singularity of vision that the Globus gang brought to Grid Computing and that is necessary for an architecture.

SOA was never as structured. There was an early attempt, called the Web Services Architecture, at W3C. Like OGSA, this is a high-level architecture, not just a taxonomy. In any case, the WSA work didn’t really end up having much of an influence on SOA (or even Web services) practice. Then came the SOA Reference Model, an OASIS standard. Of the three (ITIL, OGSA, SOA-RM) this is what I think is closest to what a Cloud taxonomy should aim for at this point. An ITIL-like glossary would be too brittle (if it has any depth) and it’s too early for an OGSA-like architecture.

Again, I was not at the meeting and only know what was blogged about. This is not a criticism of the proposal, just some thoughts about how it might compare to similar efforts.

Side note #1: I prevented myself from going off about the difference between a controlled vocabulary, a glossary, a taxonomy and an ontology. These differences are not germane to the present discussion. And I am not wearing my pedantic hat today (for once). If some want to call a wedding cake an ontology, who am I to stop them?

Side note #2: Speaking of trying to be less pedantic in 2009, this entry represents my capitulation: I finally created a “Cloud Computing” category in this blog, rather than my preferred designation, “Utility Computing”, which I have been using so far. A small step for me, a microscopic step for humanity.

[UPDATED 2009/1/28: We’re getting there (fast!). Via John, this updated taxonomy graph from Chris Hoff is going in the right direction, I think. Next step: more text describing the relationships between the boxes to have more of a model.]

[UPDATED 2009/1/30: If you think that any single-vendor taxonomy will be suspicious, that a multi-vendor taxonomy won’t happen and that customers are too busy to do it, then you may be asking “aren’t analysts supposed to do this?”. Help may be on the way.]

8 Comments

Filed under Cloud Computing, Everything, Grid, ITIL, Modeling, Standards, Utility computing

The Hippocratic Oath for IT management systems

The first feature of any IT management application should be “primum non nocere”, or in English “first, do no harm”. Your IT management application should not ever:

  • crash the system it manages
  • materially degrade its performance
  • introduce security vulnerabilities to the system

It’s appropriate that the “primum non nocere” expression comes from a medical background, since IT management software acts, to a large extent, like the doctor of the datacenter. In that spirit, I wondered whether the Hippocratic Oath could also provide good guidelines to those designing IT management software.

Turns out, the original oath hasn’t aged so well (though it cannot hurt to remind IT consultants not to have sex with the IT staff of their customers or to attempt to remove their kidney stones).

On the other hand, there is a modern version (written by Louis Lasagna in 1964) that maps quite well to our domain. Here it is, with each paragraph followed (in bold font) by its translation for IT management. Just food for thoughts.

I swear to fulfill, to the best of my ability and judgment, this covenant:

I will respect the hard-won scientific gains of those physicians in whose steps I walk, and gladly share such knowledge as is mine with those who are to follow.

Follow standards, publish configuration best practices in your domain of expertise, contribute to open source projects.

I will apply, for the benefit of the sick, all measures [that] are required, avoiding those twin traps of overtreatment and therapeutic nihilism.

Provide useful features that fully address customer needs. Not a superficial product, but not a boatload of useless features and meaningless metrics either.

I will remember that there is art to medicine as well as science, and that warmth, sympathy, and understanding may outweigh the surgeon’s knife or the chemist’s drug.

Good technology is important, but so are support, documentation and user interface.

I will not be ashamed to say “I know not,” nor will I fail to call in my colleagues when the skills of another are needed for a patient’s recovery.

Don’t push “consulting engagements” to stretch your application to deliver features it is not designed for. Instead, integrate with products that perform the function well.

I will respect the privacy of my patients, for their problems are not disclosed to me that the world may know. Most especially must I tread with care in matters of life and death. If it is given me to save a life, all thanks. But it may also be within my power to take a life; this awesome responsibility must be faced with great humbleness and awareness of my own frailty. Above all, I must not play at God.

Implement roles, permissions and auditing. Be very careful not to inadvertently crash, slow or compromise the target system. Keep in mind, in your design, that you and your developers are fallible.

I will remember that I do not treat a fever chart, a cancerous growth, but a sick human being, whose illness may affect the person’s family and economic stability. My responsibility includes these related problems, if I am to care adequately for the sick.

You are not managing just a server, a JVM or a database. Consider the entire IT system and the business value it delivers.

I will prevent disease whenever I can, for prevention is preferable to cure.

Proactively monitor the system, detect problems before they impact the system and, when possible, automate resolution.

I will remember that I remain a member of society, with special obligations to all my fellow human beings, those sound of mind and body as well as the infirm.

Don’t cheat, or abuse your position. Be honest and decent with customers, partners and competitors.

If I do not violate this oath, may I enjoy life and art, respected while I live and remembered with affection thereafter. May I always act so as to preserve the finest traditions of my calling and may I long experience the joy of healing those who seek my help.

If you do all this, customers will keep buying from you and may even like you.

1 Comment

Filed under Business, Everything, IT Systems Mgmt

Sorry, CMDBf doesn’t make coffee either

The IT Skeptic is writing to us from his mountain retreat (via a time-delayed post on his blog), and the topic he felt safe to cover in such fashion (what journalists call an “evergreen”) is the fact that CMDBf is an orchestrated sham, brilliantly executed by IT management vendors.

I’d love to be part of something that’s brilliantly executed for once, even if it is a sham, but I am afraid this is not it. But first I should state the obvious, clarifying that even though I am a member of the CMDBf group at DMTF (and also an author of the original version, under my previous employer) I do not speak for the group or DMTF (or my employer for that matter). Just as myself, as always on this blog.

The problem that Rob England, Mr. Skeptic, has with the CMDBf specification is that it doesn’t do a bunch of things that he’d like it to do, such as specifying how data sources acquire data for their domain, how they store the data, how the underlying resources are reconfigured, what processes are followed etc. See the full list from his post. The list is a copy/paste from the CMDBf specification, with some comments added, so at the very least he has to admit that as far as “smokescreens” go this one is pretty upfront about its limitations…

He concludes that “this is once again a geeky technical solution to a cultural, organizational and procedural problem.” I have to ask: who expects DMTF specifications to solve “cultural, organizational and procedural” problems? Does CIM solve such problems? Does WBEM?

Human-to-human communication is a “cultural, organizational and procedural” problem and SMTP/POP/IMAP/etc (the interoperable protocols used by email systems) are just as geeky as CMDBf. They don’t solve the larger problem, only contribute to the solution. If CMDBf can contribute as much to datacenter management as SMTP/POP/IMAP contribute to human communication (minus the SPAM if possible), I’d call that a success.

And then there is this warning:

“WARNING: vendors will waive this white paper around to overcome buyer resistance to a mixed-vendor solution. For example if you already have availability monitoring from one of them, one of the other vendors will try to sell you their service desk and use this paper as a promise that the two will play nicely.”

Has anyone actually seen this happen? I am asking because so far, both at HP and Oracle, the only sales reps I have ever met who know of CMDBf heard about it from their customers. When asked about it, the sales person (or solutions engineer) sends an email to some internal mailing list asking “customer asking about something called cmdbf, do we do that?” and that’s how I get in touch with them. Not the other way around.

Also, if the objective really was to trick customers into “mixed-vendor solutions” then I also don’t really understand why vendors would go through the effort of collaborating on such a scheme, since it’s a zero-sum game between them in the end.

As far as the glacial pace of progress (“Glacial advance. That’s the way the vendors want it” from an earlier post by the Skeptic), CMDBf is no race horse but I don’t see it going any slower than other standards. Slowness (I mean, deliberation) is part of the landscape. I would submit a slight twist on Hanlon’s razor: “Never attribute to malice that which can be adequately explained by legal, procedural and organizational inertia.”

Having said all this, some of Rob’s criticism is perfectly justified, such as his sarcasm about this sentence from the specification:

“The Federated CMDB operates in a closed environment, in which some security issues are less critical than in open access or public systems.”

OK, that’s stupid indeed. Especially in a public cloud environment where you don’t know who is renting the VM next door. I’ll ask the group to remove this. Actually, that whole appendix is useless and I pointed this out in my earlier review of CMDBf 1.0 (look for the “security boilerplate” section at the bottom of the review).

Rob could also have pointed out that this specification only addresses “federation” if you accept a very scaled-down definition of the term. What it does do is help with CMDB query and synchronization. Not the holy grail, but nothing to sneer at either.

Rob, next time you want to throw tomatoes at CMDBf while you’re on holiday, just give me the password to the site and I’ll do it for you… :-)

[UPDATED 2009/1/21: Rob responds via a comment on his original blog entry.]

2 Comments

Filed under BSM, CMDB Federation, CMDBf, DMTF, Everything, IT Systems Mgmt, ITIL, Mgmt integration, Security, Specs, Standards

A new SPIN on enriching a model with domain knowledge (constraints and inferences)

Back when I was at HP and we got involved with what turned into SML (now a W3C candidate recommendation), we tried to make a case for the specification to be based on RDF/OWL rather than XML/XSD/Schematron. It was a strange situation from a technical perspective because RDF is a better foundation for an IT model than XML, but on the other hand XSD/Schematron is a better choice for validation than OWL. OWL is focused on inference, not validation (because of both fundamental design choices, e.g. the open world assumption, and language expressiveness limitations).

So our options were to either use the right way to represent the system (RDF) combined with the wrong way to capture constraints (OWL) or to use the wrong way to represent the system (XML) combined with the right way to constrain it (mostly Schematron, with some limited help from XSD). In the end, of course, this subtle technical debate was crushed under the steamroller of vendor politics and RDF never got a fair chance anyway.

The point of this little background story is to describe the context in which I read this announcement from Holger Knublauch of TopQuadrant: the new version of their TopBraid Composer tool introduces SPIN, a way to complement OWL with a SPARQL-based constraint checking and inference mechanism.

This relates to SML in two ways.

First, there are similarities in the approach: Schematron leverages the XPath language, used to query XML, to create validation rules. SML then marries Schematron with XSD, for a more powerful validation mechanism. Compare this to SPIN: SPIN leverages the SPARQL query language, used to query RDF, to create validation/inference rules. SPIN also marries this with OWL, for a more powerful validation/inference mechanism.

But beyond the mirroring structures of SPIN and SML, the most interesting thing is that it looks like SPIN could nicely solve the conundrum, described above, of RDF being the right foundation for modeling IT systems but OWL being the wrong constraint mechanism. SPIN may do a better job than SML at what SML is aiming to do (validation rules). And at the same time, you get “for free” (or as close to “for free” as you can get with software, which is still far from “free”) a pretty powerful inference mechanism. The most powerful I know of, short of using a general programming language to capture your inference rules (and good luck with maintaining these rules).
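To give a flavor of the underlying idea (plain SPARQL over RDF via rdflib, not the actual SPIN syntax, which attaches such rules to class definitions; the class and property names are invented), a constraint can simply be a query that returns the violating resources:

from rdflib import Graph

g = Graph()
g.parse("datacenter.rdf")   # hypothetical RDF description of an IT system

# Constraint: every ex:Database must run on at least one ex:Host.
violations = g.query("""
    PREFIX ex: <http://example.com/it#>
    SELECT ?db WHERE {
        ?db a ex:Database .
        FILTER NOT EXISTS { ?db ex:runsOn ?host }
    }
""")

for row in violations:
    print("constraint violated by", row[0])

A CONSTRUCT query used the same way gives you the inference half: new triples derived from the ones already in the model.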

This may sound like sci-fi, but it’s the next logical step for IT configuration standardization. Let’s look at where we are today:

  • SML (at W3C) is an attempt to standardize the expression of constraints.
  • CMDBf (at DMTF) is standardizing how the model content is queried (and, to some limited extent at this point, federated).
  • And recently IBM authored a proposal for a reconciliation specification for items in the model and sent it to an Eclipse group (COSMOS).

But once you tackle reconciliation, you are already half-way into inferencing territory. At least if you want to reconcile between models, not just between instances expressed in the same model. Because the models may not be defined at the same level of granularity, and before you can reconcile items you need to infer finer-grained entities in your coarser-grained model (or vice-versa) so that you can reconcile apples with apples.

Today, inferencing for IT models is done as part of the “discovery packs” that you can buy along with your IT management model repository. But not very well, in general. Because the way you write such a discovery module for the HP Universal CMDB is very different from how you write it for the BMC CMDB, IBM’s CCMDB or as a plug-in for Oracle Enterprise Manager or Microsoft System Center. Not to mention the smaller, more specialized, players. As a result, there is little incentive for 3rd party domain experts to put work into capturing inference rules since the work cannot be widely leveraged.

I am going a bit off-topic here, but one interesting thing about standardization of inferencing for IT management, if it happens, is that it is going to be very hard to not use RDF, OWL and some flavor of SPARQL (SPIN or equivalent) there. And once you do that, the XML-based constraint mechanisms (SML or others) are going to be in for a rough ride. After resisting the RDF stack for constraints, queries and basic reconciliation (because the added value was supposedly not “worth the cost” for each of these separately), the XML dam might get a crack for inferencing. And once RDF starts to trickle through that crack, the whole dam is going to come down in a big wave. Just to be clear, this is a prophetic long-term vision, not a prediction for 2009 (unfortunately).

In the meantime, I’d like to take this SPIN feature for a… spin (sorry) when I find some time. We’ll see if I can install the new beta of TopBraid Composer despite having used up, a year ago, my evaluation license of the earlier version of the product. Despite what I had hoped at some point, this is not directly applicable to my current work, so I am not sure I want to buy a license. But who knows, SPIN may turn out to be the change that eventually puts RDF back on my “day job” list (one can dream)…

It’s also nice that Holger took the trouble to deliver SPIN not just as a feature of his product but also as a stand-alone specification, which should make it pretty easy for anyone who has a SPARQL engine handy to support it. Hopefully the next step will be for him to clarify the IP terms for the specification and to decide whether or not he wants to eventually submit it for standardization. Maybe to the W3C SML working group? :-) I’d have a hard time resisting joining if he did.

12 Comments

Filed under CMDB, CMDBf, Everything, IT Systems Mgmt, Modeling, RDF, Semantic tech, SML, SPARQL, Specs, Tech, W3C

Announcing Xen Transcendent Memory project

If you have more than one child, you’ve probably heard yourself say things like “if you are not using your train, you should let your brother play with it” more often than you’d like. The same happens in a datacenter (minus the screams and tears, at least usually). In that context, the rivaling siblings take the form of guest virtual machines and the toys in contention are the physical resources of the host system: CPU, I/O, memory. While virtualization platforms do a pretty good job at efficiently sharing the first two, the situation is not nearly as good for memory. It is often, as a result, the limiting factor for virtualization-driven consolidation. A new project aims to fix this.

The Oracle engineers working on the Xen-based Oracle Virtual Machine have just announced a new open source (GPL-licensed) project to improve the sharing of physical memory between guest virtual machines on the same physical system. It’s called Transcendent Memory, or tmem for short.

Much more information, including a comparison with VMware’s memory balloon, is available from the project home page.

Another reason to come to the upcoming Xen Summit (February 24 and 25), hosted by Oracle here at headquarters.

Comments Off on Announcing Xen Transcendent Memory project

Filed under Everything, Linux, Open source, Oracle, OVM, Tech, Virtualization, Xen

The art of reconciling items in your IT management model

Whether you call it a CMDB or some other name, any repository of IT model elements has the problem of establishing whether two entities are the same or not. Here is a quick map of the problem space.

Why there is no “one true solution”

There is no “true” answer to the “sameness” question. The following example illustrates this, even though it is not necessarily representative of datacenter scenarios. Ask any gamer to tell you the history of that 3-year-old PC under their desk. The power cord might be the only original piece left after they’ve upgraded the RAM, video card, sound card, hard disk(s), DVD drive and power supply. Not to mention the tragic overclocking accident that took the life of the motherboard/CPU. After the upgrade/replacement of each of these parts, the user still thought of the machine as the same PC, just upgraded/fixed. But how can it be the same as at the beginning if pretty much every single part has changed? And when time came to reinstall Windows, the registration probably failed because Microsoft decided that the same license was being used for a new machine. Sameness is in the eye of the beholder.

And it’s not just a hardware problem. When you upgrade your Oracle Database and start using a new ORACLE_HOME, it may feel like the same database to most users (including the applications that talk to the database) but a more executable-centric view might conclude that it is a new database.

Defining what makes an IT element unique is not a matter of truth. It’s a matter of usefulness for a given purpose. When trying to establish this for your model, if the conversation ever veers philosophical, you’re off track. This is engineering, not science. “A and B are the same” should be understood to be a shortcut for “it makes sense for my purpose to consider A and B to be the same”. Of course things become complicated when “my purpose” encompasses a whole set of use cases (add “management” after each of: performance, compliance, configuration, change, asset, business service, business transaction…).

How the problem arises

It can arise over time. For example the management agent has to be reinstalled and it forgets the id that had been assigned by the server. When it comes back up, it reports what looks like a new item. But you want to reconcile it with the historical data that came from the agent’s previous incarnation.

It can arise because you have different discovery channels for the same item. For example, a BPEL engine reports to the management server the processes it supports and that model includes the external Web services (partnerLink) invoked by the processes, thereby creating items for these external services in your repository. But some of those external services may be running on servers which you also monitor and the services (and more generally the applications that deliver them) may be separately discovered by the agents on these hosts, resulting in a potential duplicate representation in the repository.

Or the problem can be a result of the integration of IT management products. For example, that Dell server in my asset management system may be the same as the Linux host that runs my production database and appears in Oracle Enterprise Manager.

Fixing the mess

There are two stages to this:

First, you need some level of model alignment. In the general case, the different items that you are trying to reconcile are not expressed in the same model. The view of the server coming from the asset management system does not necessarily contain the same data as its view in your operations console. One contains the lease expiration date, the other one contains the amount of space left on the disk. Some data may be in both (e.g. number of CPUs, host name) but not necessarily in fields of the same name. Or with the same granularity (ownerName versus ownerFirstname + ownerLastname). Not to mention type system differences (but if the items are already in the same repository you have presumably already forced some level of metamodel alignment). In short, you first have the challenge of model transformation, a more general problem. With the advantage that the entire model does not need to be translated for item reconciliation, only the subset of data needed to establish “sameness”: the identifying properties. And in some cases (e.g. when a standard model is used or when two instances of the same agent report on the item), the items to reconcile are already described in the same model and this step can be skipped.
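
As a trivial illustration of that alignment step (all field names below are made up), extracting just the identifying properties of a server as seen by the asset management system and by the operations console could look like this:

def identity_from_asset_record(record):
    """Identifying properties from the asset management view (hypothetical fields)."""
    return {
        "hostname": record["hostName"].lower(),
        "serial": record["serialNumber"],
        "owner": record["ownerName"],                    # one field
    }

def identity_from_ops_record(record):
    """The same identifying properties from the operations view (hypothetical fields)."""
    return {
        "hostname": record["host"]["name"].lower(),
        "serial": record["host"]["serial"],
        "owner": record["ownerFirstname"] + " " + record["ownerLastname"],  # finer granularity
    }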

Once the necessary level of model alignment has taken place (if needed) so that items can be compared, the real task of reconciliation takes place, based on domain knowledge. It could be through a set of scripts (Python’s mix of simplicity, portability, broad array of libraries and ease of integration makes it shine in this usage). It could be through some kind of reconciliation taxonomy, like this draft that IBM has contributed to the Eclipse COSMOS project. Or through metadata such as WSDM’s correlatable properties. [BTW, as the spec editor I got to insert dubious cultural references in the specification (see the <print:PrinterResourcePropDoc> example in section 5.3.3.1), but let me assure you that I have since matured… ;-)]
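
Continuing the made-up records (and the identity_from_* helpers) from the sketch above, the matching rule itself can be as dumb or as subtle as the domain requires, for example: trust a serial number when both sides have one, otherwise fall back on a weaker combination.

def same_item(a, b):
    """Naive reconciliation rule over the aligned identifying properties."""
    if a.get("serial") and a["serial"] == b.get("serial"):
        return True
    return a["hostname"] == b["hostname"] and a["owner"] == b["owner"]

asset_items = [{"hostName": "DB01", "serialNumber": "XJ42", "ownerName": "Ada Lovelace"}]
ops_items = [{"host": {"name": "db01", "serial": "XJ42"},
              "ownerFirstname": "Ada", "ownerLastname": "Lovelace"}]

matches = [(a, o) for a in asset_items for o in ops_items
           if same_item(identity_from_asset_record(a), identity_from_ops_record(o))]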

These are not the only ways to reconcile items, but they are the approaches that can be followed based on just the data in the repository. Beyond that, you can run a dummy transaction and trace it (if possible) across different management systems to reconcile entities between them. There are plenty of other domain-specific tricks, depending on the item type (I remember a machine room, back in the days when each server had a CD drive, where a script to open the CD tray was used to allow the operator to put a sticker on the correct machine). In general, these approaches play on external variables that are not directly part of the model of the item and yet can be influenced through it. Similar to how the bulb temperature is used in this famous brain teaser. I guess the IT equivalent would be to load-stress an application and use IPMI to see which CPUs register a rise in temperature (note: not a recommended approach in production systems…).

Coming back to the IT model repository, you also need to have plumbing in place to deal with the result of the reconciliation: requests and data may come in that reference either one of the reconciled items and you need to be able to deal with that split personality, while providing a unified view in the general case. You also need to be ready to deal with potential data discrepancies between the items (either automatically or through a process that involves humans, but this is out of scope for this entry).

Preventing the mess from happening

Can’t we just prevent the problem from occurring in the first place? To some extent yes. The main way to prevent it is to not reconcile what doesn’t need to be. This may sound heretical in these days of “single source of truth” and “end to end visibility” but reconciliation of key connection points is often enough. You may not need to have one single model that contains everything from your company’s employee directory to the fan speed of all your servers. It’s a matter of delivering on use cases, not hoarding data.

When you do want to consolidate and reconcile, one approach is to standardize on natural IDs for items of different types. But this requires domain experts to carefully select identifying (and therefore immutable) properties of the different object types, which sounds a lot easier than it is. And it requires convincing others to adopt this approach, an even harder task. But as the proverb (almost) goes, one ounce of convention is worth one pound of reconciliation.
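
Here is a sketch of what such a convention could look like. The choice of identifying properties per type is precisely the part that requires the domain experts; the ones below are just for illustration.

NATURAL_KEYS = {
    # agreed-upon identifying (and hopefully immutable) properties per item type
    "host":     ("board_serial",),
    "database": ("host_id", "listener_port", "db_name"),
}

def natural_id(item_type, props):
    """Deterministic ID built only from the agreed identifying properties."""
    keys = NATURAL_KEYS[item_type]
    return item_type + ":" + "/".join(str(props[key]) for key in keys)

# natural_id("host", {"board_serial": "XJ42"}) -> "host:XJ42"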

[Note: Whenever you talk about item reconciliation, the topic of correlating events is not far behind. It is assisted by a solid underlying IT model, but it has challenges of its own, so I’ll consider this out of scope for this discussion.]

3 Comments

Filed under CMDB, Everything, IT Systems Mgmt, Mgmt integration, Modeling

Less is more: inventory of XPath subsets

Many specifications that manipulate XML content have taken the step to create their own subset of XPath. Typically, they need an XML query/pointer language but full XPath (or XPointer) is overkill for their purpose (I am talking about XPath 1.0 here, XPath 2.0 is usually over-over-kill-and-then-some). Defining a subset of XPath rather than inventing a query language from scratch is attractive because:

  • people are relatively familiar with XPath (at least the most common parts)
  • it is already specified so you can leverage the W3C spec-writing work
  • implementers who have access to an XPath engine get an implementation of the subset “for free”, since the engine will process expressions from any XPath subset (a quick illustration follows in the XML Schema section below)

Here is a quick inventory of spec-defined XPath subsets that I am aware of.

XML Schema

Section 3.11.6 of the XML Schema specification (part 1) defines a subset of XPath used to point to a set of elements that are the target of an identity constraint. Here is the BNF of the abbreviated form of the subset (you can also use the functionally equivalent full-length notation):

Selector   ::=    Path ( '|' Path )*
Path       ::=    ('.//')? Step ( '/' Step )*
Step       ::=    '.' | NameTest
NameTest   ::=    QName | '*' | NCName ':' '*'

Actually, there is a second subset defined, to point to the identifying key from the identified element. It is very similar to the previous one but it allows attribute nodes to be selected, via this modification:

Path       ::=    ('.//')? ( Step '/' )* ( Step | '@' NameTest )

According to the specification, these subsets were defined “in order to reduce the burden on implementers, in particular implementers of streaming processors”. As this article points out, stream-friendliness of an XPath subset is a relative notion. But the subsets above seem, indeed, to fit the bill. They also include many simplifications that “reduce the burden on implementers” but have nothing to do with streaming, such as removing functions and all predicates (rather than simply restricting the content of predicates).
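
To make the “implementation for free” point concrete: an expression that stays within the abbreviated subset above, such as .//order/item, is also plain XPath 1.0, so a general-purpose engine evaluates it unmodified. A quick sketch with lxml:

from lxml import etree

doc = etree.fromstring(
    b"<orders><order><item>disk</item><item>fan</item></order></orders>")

# ".//order/item" is legal both in the XSD selector subset and in full XPath 1.0,
# so the engine needs no special-casing.
selector = etree.XPath(".//order/item")
print([element.text for element in selector(doc)])   # ['disk', 'fan']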

WS-Management and WS-ResourceTransfer

WS-Management also defines two subsets (it calls them dialects) of XPath. I won’t copy the BNF definitions here, as they are pretty long. You can find them in Appendix D of the specification. They are called “XPath level 1” and “XPath level 2”.

The main use cases driving these dialects had to do with implementing WS-Management in resource-constrained environments, e.g. the board management controller of a server.

WS-ResourceTransfer took this idea from WS-Management and it too defines an “XPath level 1” dialect (see Appendix I of the specification).

Windows EventLog Remoting Protocol v6

Version 6.0 of the Microsoft EventLog Remoting Protocol, new in Vista, adds a filter mechanism (to select events in the log) that is based on a subset of XPath. Streaming appears to be a concern there too (“evaluation of each event MUST be restricted to forward-only, in-order, depth-first traversal of the XML”). And, being Microsoft, they also add some extensions to XPath.

CMDBf (coming soon)

CMDBf also defines a subset of XPath. It is not defined via a restricted syntax but via a limitation on the types of objects returned. There is no BNF provided. You can write your XPath any way you want as long as it only returns objects of the right types (e.g. nodesets containing comment nodes are out of luck). Think of it as “management by objectives” rather than micromanagement. The main driver here is not support for streaming. It’s that, since XPath nodeset serialization is a pain, we only want to do it where there is a compelling use case. There is no point creating interoperability challenges for no practical benefit.
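
A rough sketch of what enforcing the restriction on the result (rather than on the syntax) can look like, using lxml. The exact list of node types CMDBf allows is the spec’s business; this just flags comments and other non-element results.

from lxml import etree

doc = etree.fromstring(b"<a><!-- a comment --><b attr='1'>text</b></a>")

def returns_only_element_nodes(expression, tree):
    """Accept any XPath, but reject results that are not element nodes."""
    result = tree.xpath(expression)
    if not isinstance(result, list):          # number, string or boolean result
        return False
    return all(isinstance(node, etree._Element) and isinstance(node.tag, str)
               for node in result)            # comments and PIs have a non-string tag

print(returns_only_element_nodes("//b", doc))          # True
print(returns_only_element_nodes("//comment()", doc))  # False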

Others?

Except for XSD, all the examples above come from the IT management world, because that’s where I live. There are probably plenty of specifications in other domains which took similar steps, such as the Digital Talking Book ANSI standard (see this section).

And those are only XPath subsets defined as part of specifications. There are also XPath subsets defined by implementations (e.g. ElementTree’s limited XPath support). And others defined for research purposes (e.g. “Univariate XPath”, from this ACM article, that is quickly described in the previously-mentioned post about stream-friendly XPath subsets).

If you know of other interesting XPath subsets, please leave a comment.

Comments Off on Less is more: inventory of XPath subsets

Filed under Everything, Specs, Standards, WS-Management, WS-ResourceTransfer, XPath

Now I know why GAE has been killing me

I haven’t written about Google App Engine lately because I haven’t spent too much time using it. And the time I did spend was mostly consumed fighting JavaScript for some client-side aspects that have nothing to do with the fact that the server side is on GAE. I have only touched JavaScript occasionally in the 12 years since writing this game, and I am pretty rusty. Funny how Python came back to me almost instantly but JavaScript just doesn’t.

I am writing a quick post on GAE today because the Google team just published a blog entry that explains why my attempts to create a long-running process in GAE resulted in my application being disabled for having too many “high CPU requests”, a concept that was never documented in the quota system.

Google is now offering a “quota detailed dashboard” and, as this screen capture shows, it tells us that you only get 60 “high CPU requests” and that they replenish at a rate of 2 per minute. So at least now I know.

What is still not clear is why a request that is doing nothing but waiting for a URLfetch to come back is considered by Google to use too much CPU…

2 Comments

Filed under Everything, Google App Engine, Implementation, Tech, Utility computing

HP introduces “Operations Manager i”

If you’ve seen a lot of news articles about HP’s IT management software this week (e.g. through Cote or Doug), it’s because the company held its Software Universe conference in Vienna and timed a bunch of announcements and PR events to match.

Most of the articles linked above just paraphrase the press releases and talking points. So if you’re going to get the company line, might as well get it straight from the horse’s mouth. Which we can now do through a new HP blog about BSM. The first article was penned by Mike Shaw and that’s enough for me to want to subscribe (I worked with Mike a few times when I was at HP and he is very sharp). I think Mike also wrote the other entries but since they are not signed (and the account name, “adsey007”, is pretty opaque) I am not sure. In any case, they are pretty good. This one gives an overview of the Vienna announcements. The next one describes the OMi product in more detail. I am not in a position to know how well it works but, according to the article, OMi takes the important step of modeling and managing events in the context of the overall model in the CMDB, such that the event management features (e.g. correlation) can use the already-discovered relationships between the IT elements involved in the events (e.g. dependencies). The article also implies that the CMDB has been integrated with NNM (OpenView), Service Manager (Peregrine) and Server Automation (Opsware). Which is a lot of progress in the 16 months since I left HP, so I am taking it with a grain of salt (we all know there are different levels of integration). The press release says that the CMDB is now integrated with 17 HP BTO applications, so you may need a whole salt shaker. In any case it’s great to see that Ramin and team are forging ahead, delivering products and driving the integration of the BTO portfolio.

The last paragraph (“OMi actually sits on top of existing HP Operations Manager installations…”) is intriguing and may provide a clue about the depth of the integration. In any case, OMi is something to keep an eye on as it is positioned to leverage a lot of the key strengths of the HP BTO portfolio.

BTW, this OMi product has nothing to do with this OMI which was a precursor to WSMF, WSDM and WS-Management. And which most people currently working in HP Software have never heard of.

2 Comments

Filed under Application Mgmt, Conference, Everything, HP, IT Systems Mgmt, Mgmt integration, Modeling, People

Here comes WSTF

A new Web services-related industry body has been announced today: WSTF (Web Services Test Forum). More details about it from Infoworld. My employer (Oracle) seems to be one of the drivers (along with IBM) but I am not personally involved.

A lot of hand-wringing, of course, about its relationship with WS-I. Which is understandable if you consider what WS-I was originally supposed to deliver (profiles, sample applications and testing tools). But not if you consider what it has actually delivered that is relevant (a couple of profiles, some time ago). WSTF could also be compared with the SOAPBuilders Yahoo group, but since that group has seen only two email messages so far in 2008 (the last one dated April 2nd), it seems safe to consider it dead. It would be interesting to know why that is though (it used to be pretty active in the early days) and what lessons WSTF may learn from it. Another effort you may want to compare this to is the Microsoft Web Services Protocol Workshops Process. It’s too early to tell, but they may turn out to be more closely related than meets the eye.

I noticed this innocuous-sounding sentence in the press release (warning, PDF): “As an open community, WSTF has made it easy to introduce new interoperability scenarios and approve work through simple majority governance”. You may wonder why this is important enough to figure in the press release.

I interpret it as a dog whistle call (heard only by those for whom it is intended) for the WS-I board. Microsoft’s Paul Cotton responds to it in his quote for Infoworld: “WS-I provides a proven and open organization and process that best suits our customers’ needs”. He also talks about “the incredible industry-wide momentum and leadership of WS-I”, which is indeed not very credible (especially the momentum part). The WS-I process, associated board politics and resulting inaction are what I was talking about in this entry (“veto rules being commonly invoked, stopping most of the activities that the resort was originally planning to offer”).

Speaking of this “Standardstown” blog entry, I should probably soon update it to include WSTF. What should it be? Maybe a trailer park in which customers bring their own lodging, put them side by side and see how they line up?

The current test scenarios seem to focus on fixing the interoperability mess that is WS-Addressing. I assume more will soon be added to test the different WS-* specifications out there. It will be interesting to see what direction WSTF takes after that. Will the payloads of the test messages be obvious dummy payloads (so that the focus is on testing the implementation of the WS-* protocols)? Or will they start to include real payloads (e.g. real purchase orders from real enterprise applications)? How about this: “dear vendor, I will only buy your wonderfully open, standard and interoperable Web services-based application when it is available as a WSTF endpoint and there are three other real-life products (including one from your main competitor) that successfully interoperate with the exact same SOAP messages I will be using”. This could become an interesting tussle between vendors as well as between vendors and buyers.

Alternatively, of course, WSTF could turn into a test of how much difference there is between a “standard” and a publicly specified and interop-tested interaction scenario…

A quick (and unsuccessful) Technorati search for some blog comments returns the “WSTF Dark Retribution Dinorobots Limited Giftset” which “includes all five Dinorobots in their sinister evil incarnation”. Can’t say you were not warned…

[UPDATED 2008/12/10: Gilbert Pilz, who was involved with WSTF from the start (and also left a comment on this entry, see below), wrote a detailed description of the problem WSTF tries to address and how Gilbert and others have structured WSTF to solve it.]

[UPDATED 2008/12/15: Via InfoQ, another long description of the goals and processes of WSTF, this time from Doug Davis.]

[UPDATED 2009/1/5: Chris Ferris also weighs in, including his view on the relationship with WS-I. Having participated in several of the early WS-I plenary meetings, I have to wonder if Chris had any double-entente in mind when he wrote that WS-I helps “the community understand where the bar is”.]

[UPDATED 2009/2/17: A response from Redmond.]

1 Comment

Filed under Everything, IBM, Implementation, Microsoft, Oracle, SOAP, Specs, Standards, Testing

What you’ve been spared (aka blog drafts boneyard #1)

I try to keep posts on this blog relevant to the general topic of IT management. Less than 10% of messages are in the “off-topic” category and even those are usually somewhat related to computer technology (mostly rants against the misuse of Flash and against the stupid ways in which US Social Security numbers are used). What this means in practice is that off-topic drafts are often abandoned when I realize that they are not relevant enough to make the cut. My “drafts” folder is a boneyard of such entries. Today, I am relaxing my standards and subjecting you to a list of them (they are still computer-related). Hopefully, either you find at least some of them interesting, or you come out with a renewed appreciation of what you’ve been spared over the years. And since they are all in one post, it is easy to skip it altogether without being too tempted to hit the “unsubscribe” button, for those who really only want to read about IT management (at least from me).

Here is a list of the topics covered below:

  • Messing with a blogger’s head
  • Google search suggestions
  • Google to navigate rather than search
  • What is a computer
  • Is this a site or a feed

Messing with a blogger’s head

I recently looked at the HTTP logs for this site. Maybe I am the last blogger to realize this, but it looks like the online blog readers (e.g. Google Reader, Bloglines…) tell you how many subscribers they have for your feed. They do this through the user-agent HTTP header, which gets logged. It looks something like this:

Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; 102 subscribers; feed-id=…)

Of course that’s only on a per-feed basis, so you need to add all the feeds (Atom and the different RSS versions) to get a total. Still, it’s a lot more visibility than I had before.
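
For the curious, pulling these numbers out of an Apache-style access log takes a few lines of Python (the regular expression is adjusted to the user-agent format shown above; other aggregators word it differently):

import re

# "... 102 subscribers; feed-id=1234567890" inside the Feedfetcher-Google user-agent
PATTERN = re.compile(r"Feedfetcher-Google.*?(\d+) subscribers; feed-id=([\w.-]+)")

def subscriber_counts(log_lines):
    """Latest claimed subscriber count per feed-id seen in the log."""
    counts = {}
    for line in log_lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts

# total across the Atom and RSS feed variants:
# sum(subscriber_counts(open("access.log")).values())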

My first thought was “hey, some people are reading, better watch what I write”. But I quickly discarded that in favor of a more intriguing idea: if bloggers use this data, how hard would it be to mess with their heads? After all, this is not verifiable. Anyone can send HTTP requests with any user-agent they want. I can pick a blog and start sending HTTP GET requests on their feeds with a user agent that pretends to be “Feedfetcher-Google”. And I can set the “subscribers” number to anything I want. To not be too suspicious, I could slowly pump it up, to look like a realistic increase.
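
Nothing in HTTP verifies that header, which is the whole point. A sketch of how little it would take (the feed URL and the numbers are obviously made up):

import urllib.request

FAKE_UA = ("Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; "
           "9999 subscribers; feed-id=0000000000000000000)")

request = urllib.request.Request("http://example.com/feed/",
                                 headers={"User-Agent": FAKE_UA})
with urllib.request.urlopen(request) as response:
    response.read()   # one more "Google-reported" subscriber count in the logs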

Of course, an alert blogger would probably smell a rat if the number of subscribers shoots up while the number of incoming links and comments doesn’t change, the site still doesn’t show up near the top of Google searches, and the Technorati “authority” doesn’t budge. Etc. There are plenty of ways to reality-test this. But people have an amazing ability to suspend disbelief when they like what they see, however logic-defying. If you don’t believe me, I have a pile of mortgage-backed securities to sell you.

This stat-pumping experiment could be done as a practical joke. It could be done out of meanness.  It could be done as an unethical and pointless sociological study (how many subscribers does it take for someone to go buy a Porsche on the assumption that the traffic will eventually turn into $$$, how does the impression of popularity change the writing on the blog…). It could even be done as a fraud (guaranteed increase in your subscription numbers if you sign up for my blog marketing service or you get your money back: just check your logs to see the results… – of course you could also generate fake users to create real subscriptions). It hits bloggers where they are the most vulnerable: the ego.

If you are thinking of doing this as a way to be nice to someone who needs encouragement, it will probably backfire. Before you proceed, listen to act two of this radio show (description: “A group called Improv Everywhere decides that an unknown band, Ghosts of Pasha, playing their first ever tour in New York, ought to think they’re a smash hit. So they study the band’s music and then crowd the performance, pretending to be hard-core fans. Improv Everywhere just wants to make the band happy — to give them the best day of their lives. But the band doesn’t see it that way.”)

Google search suggestions

When you enter a Google search query (on google.com or in the Firefox search bar), as soon as you’ve typed a few characters it proposes to complete your search terms (BTW, it’s not just Google, it is now a well-known extension to OpenSearch, but Google pioneered it, at least according to the spec). Something about this just doesn’t sound right. If you think you know what I am looking for, why not propose the most likely answers rather than trying to complete my search request? If you get it right, then I’ll stop typing and I’ll click. Plus, Google already concentrates viewers on a small set of pages for each search query; with this feature, won’t they compound this by concentrating people on a smaller set of queries, further shrinking the Web?
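
For reference, the OpenSearch Suggestions extension boils down to a JSON array: the query, a list of completions and, optionally, descriptions and URLs. Something like the snippet below shows it; the endpoint and parameters are the ones commonly used by the Firefox search bar, so treat them as illustrative rather than as a stable API.

import json
import urllib.request

url = "https://suggestqueries.google.com/complete/search?client=firefox&q=python"
with urllib.request.urlopen(url) as response:
    payload = json.loads(response.read().decode("utf-8", errors="replace"))

# OpenSearch Suggestions format: [query, [completions], (descriptions), (URLs)]
query, completions = payload[0], payload[1]
print(completions)   # e.g. ['python tutorial', 'python list', ...]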

Since Google feels free to give me plenty of unsolicited suggestions, here is mine to them. If you are going to hand-hold people as they write their queries, provide suggestions that disambiguate rather than suggestions that overly constrain. For example, if I type “python”, I get these suggestions:

“python tutorial”, “python list”, “python string”, “python ide”, “python download”, “python for loop”, “python datetime”, “python re”, “python time”, “python os”, all clearly about the programming language. Wouldn’t it be more useful to detect algorithmically that results from searching on “python” fall into three largely disjoint groups, to detect a common word in each group and to ask the user to qualify their “python” request with either “programming”, “snake” or “monty”? Rather than the simpler but, in my opinion, less valuable approach of showing the most popular search queries that start with “python”?

On the other hand, this “most popular” feature has one benefit: it provides plenty of fodder for pop psychology, as I found out when I tried to ask Google why they provide these search suggestions. As soon as I typed “why”, I got suggestions including “why men cheat” and “why did I get married”.

The part I like about all this is the meta-meta aspect. Google doesn’t only suggest what you might want to read based on your search, they even suggest what you might want to search on. What’s the next meta level? Suggesting that you want to do a web search when you’re not even thinking of doing one? You can bet they will if they can. What a butler indeed.

Google to navigate rather than search

Still on the topic of Google, but a positive comment this time. It struck me one day that pretty much every single bookmark I have in Firefox is for an Oracle-internal site, not the public Web. After thinking about it for a minute, I realized the reason: Google doesn’t index the Oracle intranet. When I find a good page there, I can’t be sure I’ll be able to find it again easily, so I bookmark it. On the Web, on the other hand, why bother bookmarking it. I pretty much know I can find it from my Firefox search bar.

Most of the time, when I use Google, it’s not to find a new page. It’s to get back to a specific page. Case in point: when I want to look something up in the XPath spec (which I have done a few times lately in the context of CMDBf). I know it’s on the W3C web site, I could go there and navigate to the page in a few clicks. I also have a copy of it on my disk, I could open my file explorer and get it from there. But instead I just type “xpath” in Google. Again, I am not really “searching” (trying to find information about XPath), I am just navigating (finding my way back to the spec).

So I started a post to share this brilliant insight, at which point I saw (using Google in “search” mode for once) that Robin Cannon has already perfectly described it.

So I’ll just add a few thoughts to complement what Robin wrote:

  • I am sure the implications in terms of advertising have long been studied by Google (I would guess that people who use Google for navigation are a lot less likely to click on ads than those who are actually searching).
  • AOL had to die for the “AOL keyword” to live.
  • There are serious privacy aspects to letting Google know what you’re up to all the time (but I am not logged into Google, I clean up my cookie jar relatively often and, at least at work, I am behind a large enough firewall to have a mostly anonymized IP).
  • Somewhat ironically, there are potential security benefits. For example, the HP employee credit union is called “Addison Avenue credit union”. Googling for “addison avenue” gets you right there. If you mistype the name and ask for “adison avenue”, you get a suggestion that maybe you meant “addison avenue”, along with a list of links related to “madison avenue”. That’s enough data to realize and correct your mistake. On the other hand, directly typing adisonavenue.com into the navigation bar could have taken you to a spoof site (in reality it takes you to a link farm, not quite as bad, but you never know what it will turn into tomorrow).

BTW, am I the only one who doesn’t know what 2 of the top 3 “Google Fastest Rising Search Terms 2007” relate to (from the list in Robin’s post)?

What is a computer

It started with this New Scientist article: Ten weirdest computers. With all these examples, how do we define what a computer is? Fundamentally, it’s a physical system that can process data. Meaning that you can define a logical data model that can be mapped to the physical characteristics of the system. And the system is such that it (through the laws of physics) changes in such a way that after a time its new physical configuration represents data that corresponds to a calculation that took place on your original data. You get the resulting data by measuring physical characteristics of the system (not necessarily the same physical characteristics that you controlled to represent the input data) and deriving the result data from it. In short, to use a computer:

  • Step 1: you create a system that represents your input data
  • Step 2: you let the laws of physics “do their thing” on the system
  • Step 3: you measure the system to derive your output data

For example, take a spring scale and a bunch of 1kg weights. That’s a computer. At least it can add (within a given range). To calculate “4+8” you put four 1kg weights on the scale, then you put eight more, then you read the number next to the needle and it should tell you “12”. This is an example in which the physical characteristics that you use to provide input data (putting weights on the scale) is different in nature from the physical characteristics that you measure to get the output (the position of the needle, which is really a way to measure the compression of the spring in the scale).

Based on this, we can ask the next (and more practically useful) question: what makes a *good* computer? It has the following characteristics:

  • easy to set up
  • easy to measure results at needed precision level
  • not too many side effects (e.g. energy consumption)
  • fast and versatile (planting a pine tree seed and waiting for a pine cone to come out in order to calculate a Fibonacci sequence is a little too slow and too specialized)
  • able to process large amounts of data (that’s where the mechanical scale doesn’t… scale).

On that last topic, there are two ways to process large amounts of data. The way used by current computers is to process little at once but very fast and in a way that makes it very easy to use the output of one operation as input to the next one. The alternative would be to compute a large problem in one go of the physical system. For example, maybe one day we’ll know how to represent a mathematical problem in DNA form, such that we know that the solution to the problem corresponds to the DNA sequence most useful to a bacteria in a given environment, e.g. most likely to resist a given antibiotic. Setting up the computation system, in this case, would be engineering the antibiotic that selects for the problem’s solution. You can put that antibiotic in your Petri dish (or in the food of your 1000 cows, now that’s a “computer farm”), wait for a few days, then sequence the DNA of the bacteria that’s in the dish (or in your cow’s “output” matter, think of it as a “core dump”).

You can think of it as the RISC versus CISC debate, except with many more orders of magnitude in difference between the alternatives.

It is also interesting to note that networks and storage mechanisms (the other two constitutive elements of a data center, along with computers) can be thought of in a very similar way. If step 2 doesn’t change the data and can be made to last long enough, you have a storage system (e.g. engrave text on stone, store stone for a few thousand years, read text from stone). If, instead of being far apart in time, the places where you perform steps 1 and 3 are far apart in space (with step 2 still not changing the data), then you have a networking system.

Is this a site or a feed

Like 99% of the blogs out there, this site is just an HTML rendition of an RSS (or Atom) feed. Isn’t it a little silly to have millions of Web sites (visited by humans) that have their structure dictated by a machine-to-machine protocol? It is especially ironic on a site like mine, which occasionally talks about data models and protocols (and on which you would therefore expect the difference between the two to be understood). But no. Every time a new release of CMDBf comes out, for example, I create a new post with an updated version of the pseudo-algorithm for performing a graph query. Rather than having one page that gets updated (with potentially a “history” feature to access older versions).

As much as I’d like to blame the limitations of WordPress, I think it’s more a sign of my laziness. There are plenty of WordPress extensions that I have never considered. Or I could move to Drupal. The key question is, is there a way to get a site that is more useful as a unit (“show me what information William provides on his site”), while keeping the value of the feed (“tell me when William adds new content”) and not adding to my workload?

Comments Off on What you’ve been spared (aka blog drafts boneyard #1)

Filed under Everything, Google, Off-topic