William Vambenepe's blog

IT management in a changing IT world

buy generic viagraviagra attorneys

Archive for the 'Implementation' Category

13
Jun
2008

Some breathing room for Google App Engine requests

by William Vambenepe

As promised to Felix here is the code that shows how to give extra breathing room to Google App Engine (GAE) requests that may otherwise be killed for taking too long to complete. The approach is similar to the one previously described. But rather than trying to emulate a long-running process, I am simply allowing a request to spread its work over a handful of invocations, thus getting several 9 seconds slots (since this seems to be how much time GAE gives you per request right now).

If all your requests need this then you are going to run into the same “high CPU requests have a small quota, and if you exceed this quota, your app will be temporarily disabled” problem seen in the previous experiment. But if 90% of your requests complete in a normal time and only 10% of the requests need more time, then this approach can help prevent your users from getting an error for 1 out of every 10 requests. And you should fly under the radar of the GAE resource cop.

The way it works is simply that if your request is interrupted for having run too long the client gets a redirect to a new instance of the same handler. Because the code saves its results incrementally in the datastore, the new instance can build on the work of the previous one.

This specific example retrieves the ubuntu-8.04-server-i386.jigdo file (98K) from a handful of Ubuntu mirrors and returns the average/min/max download times (without checking if the transfer was successful or not). I also had to add a 1 second sleep after each fetch in order to trigger the DeadlineExceededError because the fetch operations go too quickly when running on GAE rather than my machine (I guess Google has better connectivity than my mediocre AT&T-provided DSL line, who would have thought).

#!/usr/bin/env python
#
# Copyright 2008 William Vambenepe
#

import wsgiref.handlers
import os
import logging
import time

from google.appengine.ext import db
from google.appengine.ext.webapp import template
from google.appengine.ext import webapp
from google.appengine.api import urlfetch
from google.appengine.runtime import DeadlineExceededError

targetUrls = ["http://mirror.anl.gov/pub/ubuntu-iso/CDs/hardy/ubuntu-8.04-server-i386.jigdo",
              "http://ubuntu.mirror.ac.za/ubuntu-release/hardy/ubuntu-8.04-server-i386.jigdo",
              "http://mirrors.cytanet.com.cy/linux/ubuntu/releases/hardy/ubuntu-8.04-server-i386.jigdo",
              "http://ftp.kaist.ac.kr/pub/ubuntu-cd/hardy/ubuntu-8.04-server-i386.jigdo",
              "http://ftp.itu.edu.tr/Mirror/Ubuntu/hardy/ubuntu-8.04-server-i386.jigdo",
              "http://ftp.belnet.be/mirror/ubuntu.com/releases/hardy/ubuntu-8.04-server-i386.jigdo",
              "http://ubuntu-releases.sh.cvut.cz/hardy/ubuntu-8.04-server-i386.jigdo",
              "http://ftp.crihan.fr/releases/hardy/ubuntu-8.04-server-i386.jigdo",
              "http://ftp.uni-kl.de/pub/linux/ubuntu.iso/hardy/ubuntu-8.04-server-i386.jigdo",
              "http://ftp.duth.gr/pub/ubuntu-releases/hardy/ubuntu-8.04-server-i386.jigdo",
              "http://no.releases.ubuntu.com/hardy/ubuntu-8.04-server-i386.jigdo",
              "http://neacm.fe.up.pt/pub/ubuntu-releases/hardy/ubuntu-8.04-server-i386.jigdo"]

class MeasurementSet(db.Model):
  iteration = db.IntegerProperty()
  measurements = db.ListProperty(float)

class MainHandler(webapp.RequestHandler):
  def get(self):
    try:
      key = self.request.get(”key”)
      set = MeasurementSet.get(key)
      if (set == None):
        raise ValueError
      set.iteration = set.iteration + 1
      set.put()
      logging.debug(”Resuming existing set, with key ” + str(key))
    except:
      set = MeasurementSet()
      set.iteration = 1
      set.measurements = []
      set.put()
      logging.debug(”Starting new set, with key ” + str(set.key()))
    try:
      # Dereference remaining URLs
      for target in targetUrls[len(set.measurements):]:
        startTime = time.time()
        urlfetch.fetch(target)
        timeElapsed = time.time() - startTime
        time.sleep(1)
        logging.debug(target + ” dereferenced in ” + str(timeElapsed) + ” sec”)
        set.measurements.append(timeElapsed)
        set.put()
      # We’re done dereferencing URLs, let’s publish the results
      longestIndex = 0
      shortestIndex = 0
      totalTime = set.measurements[0]
      for i in range(1, len(targetUrls)):
        totalTime = totalTime + set.measurements[i]
        if set.measurements[i] < set.measurements[shortestIndex]:
          shortestIndex = i
        elif set.measurements[i] > set.measurements[longestIndex]:
          longestIndex = i
      template_values = {”iterations”: set.iteration,
                         “longestTime”: set.measurements[longestIndex],
                         “longestTarget”: targetUrls[longestIndex],
                         “shortestTime”: set.measurements[shortestIndex],
                         “shortestTarget”: targetUrls[shortestIndex],
                         “average”: totalTime/len(targetUrls)}
      path = os.path.join(os.path.dirname(__file__), “steps.html”)
      self.response.out.write(template.render(path, template_values))
      logging.debug(”Set with key ” + str(set.key()) + ” has returned”)
    except DeadlineExceededError:
      logging.debug(”Set with key ” + str(set.key())
                    + ” interrupted during iteration “+ str(set.iteration)
                    + ” with ” + str(len(set.measurements)) + ” URLs retrieved”)
      self.redirect(”/steps?key=” + str(set.key()))
      logging.debug(”Set with key ” + str(set.key()) + ” sent redirection”)

def main():
  application = webapp.WSGIApplication([("/steps", MainHandler)], debug=True)
  wsgiref.handlers.CGIHandler().run(application)

if __name__ == “__main__”:
  main()

I can’t guarantee I will keep it available, but at the time of this writing the application is deployed here if you want to give it a spin. A typical run produces this kind of log:

06-13 01:44AM 36.814 /steps
XX.XX.XX.XX - - [13/06/2008:01:44:45 -0700] “GET /steps HTTP/1.1″ 302 0 - -
  D 06-13 01:44AM 36.847
    Starting new set, with key agN2YnByFAsSDk1lYXN1cmVtZW50U2V0GBoM
  D 06-13 01:44AM 37.870
    http://mirror.anl.gov/pub/ubuntu-iso/CDs/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.022078037262 sec
  D 06-13 01:44AM 38.913
    http://ubuntu.mirror.ac.za/ubuntu-release/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.0184168815613 sec
  D 06-13 01:44AM 39.962
    http://mirrors.cytanet.com.cy/linux/ubuntu/releases/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.0166189670563 sec
  D 06-13 01:44AM 41.12
    http://ftp.kaist.ac.kr/pub/ubuntu-cd/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.0205371379852 sec
  D 06-13 01:44AM 42.103
    http://ftp.itu.edu.tr/Mirror/Ubuntu/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.0197179317474 sec
  D 06-13 01:44AM 43.146
    http://ftp.belnet.be/mirror/ubuntu.com/releases/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.0171189308167 sec
  D 06-13 01:44AM 44.215
    http://ubuntu-releases.sh.cvut.cz/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.0160200595856 sec
  D 06-13 01:44AM 45.256
    http://ftp.crihan.fr/releases/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.015625 sec
  D 06-13 01:44AM 45.805
    Set with key agN2YnByFAsSDk1lYXN1cmVtZW50U2V0GBoM interrupted during iteration 1 with 8 URLs retrieved
  D 06-13 01:44AM 45.806
    Set with key agN2YnByFAsSDk1lYXN1cmVtZW50U2V0GBoM sent redirection
  W 06-13 01:44AM 45.808
    This request used a high amount of CPU, and was roughly 28.5 times over the average request CPU limit.
    High CPU requests have a small quota, and if you exceed this quota, your app will be temporarily disabled.

Followed by:

06-13 01:44AM 46.72 /steps?key=agN2YnByFAsSDk1lYXN1cmVtZW50U2V0GBoM
XX.XX.XX.XX - - [13/06/2008:01:44:50 -0700] “GET /steps?key=agN2YnByFAsSDk1lYXN1cmVtZW50U2V0GBoM HTTP/1.1″ 200 472
  D 06-13 01:44AM 46.110
    Resuming existing set, with key agN2YnByFAsSDk1lYXN1cmVtZW50U2V0GBoM
  D 06-13 01:44AM 47.128
    http://ftp.uni-kl.de/pub/linux/ubuntu.iso/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.016991853714 sec
  D 06-13 01:44AM 48.177
    http://ftp.duth.gr/pub/ubuntu-releases/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.0238039493561 sec
  D 06-13 01:44AM 49.318
    http://no.releases.ubuntu.com/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.0177929401398 sec
  D 06-13 01:44AM 50.378
    http://neacm.fe.up.pt/pub/ubuntu-releases/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.0226020812988 sec
  D 06-13 01:44AM 50.410
    Set with key agN2YnByFAsSDk1lYXN1cmVtZW50U2V0GBoM has returned
  W 06-13 01:44AM 50.413
  This request used a high amount of CPU, and was roughly 13.4 times over the average request CPU limit.
  High CPU requests have a small quota, and if you exceed this quota, your app will be temporarily disabled.

I believe we can optimize the performance by taking advantage of the fact that successive requests are likely (but not guaranteed) to hit the same instance, allowing global variables to be re-used rather than always going to the datastore. My code is a proof of concept, not an optimized implementation.

Of course, the alternative is to drive things from the client, using JavaScript HTTP requests (rather than HTTP redirect) to repeat the HTTP request until the work has been completed. The list of pros and cons of each approach is left as an exercise to the reader.

[UPDATED 2008/6/13: Added log output. Removed handling of "OverQuotaError" which was not useful since, unlike "DeadlineExceededError", quotas are not per-request. As a result, splitting the work over multiple requests doesn't help. Slowing down a request might help, at which point the approach above might come in handy to prevent this slowdown from triggering "DeadlineExceededError".]

[UPDATED 2008/6/30: Steve Jones provides an interesting analysis of the cut-off time for GAE. Confirms that it's mainly based on wall-clock time rather than CPU time. And that you can sometimes go just over 9 seconds but never up to 10 seconds, which is consistent with my (much less detailed and rigorous) observations.]

06
Jun
2008

Emulating a long-running process (and a scheduler) in Google App Engine

by William Vambenepe

As previously described, Google App Engine (GAE) doesn’t support long running processes. Each process lives in the context of an HTTP request handler and needs to complete within a few seconds. If you’re trying to get extra CPU cycles for some task then Amazon EC2, not GAE, is the right tool (including the option to get high-CPU instances for the CPU-intensive tasks).

More surprising is the fact that GAE doesn’t offer a scheduler. Your app can only get invoked when someone sends it an HTTP request and you can’t ask GAE to generate a canned request every so often (crontab-style). That seems both limiting and arbitrary. In fact, I would be surprised if GAE didn’t soon add support for this.

In the meantime, your best bet is to get an account on a separate server that lets you schedule jobs, at which point you can drive your GAE application from that external scheduler (through HTTP requests to your GAE app). But just for the intellectual exercise, how would one meet the need while staying entirely within the confines of the Google-provided infrastructure?

  • The most obvious option is to piggyback on HTTP requests from your visitors. But:
    • this assumes that you consistently get visitors at a frequency greater than your scheduler’s interval,
    • since you can’t launch sub-processes in GAE, this delays your responses to the visitor,
    • more worrisome, if your scheduled task takes more than a few seconds this means your application might be interrupted by GAE before you respond to the visitor, resulting in a failed request from their perspective.
  • You can try to improve a bit on this by doing this processing not as part of the main request from your visitor but rather by putting in the response HTML some JavaScript that will asynchronously send you HTTP requests in the background (typically not visible to the user). This way, a given visitor will give you repeated invocations for as long as the page is open in the browser. And you can set the invocation interval. You can even create some kind of server-controlled auto-modulation of the interval (increasing it as your number of concurrent visitors increases) so that you don’t eat all your Google-allocated incoming HTTP quota with these XMLHttpRequest invocations. This would probably be a very workable way to do it in practice even though:
    • it only works if your application has visitors who use web browsers, not if it only consumed by programs (e.g. through RSS feeds or other XML format),
    • it puts the burden on your visitors who may or may not appreciate it, assuming they realize it is happening (how would you feel if your real estate agent had to borrow your cell phone to arrange home visits for you and their other customers?).
  • While GAE doesn’t offer a scheduler, another Google service, Google Reader, offers one of sorts. If you register a feed there, Google’s FeedReader will retrieve it once a while (based on my logs, it happens approximately every hour for each of the two feeds for this blog). You can create multiple URLs that all map to the same handler and return some dummy RSS. If you register these feeds with Google Reader, they’ll get pulled once a while. Of course there is no guarantee that the pulling of the different feeds will be nicely spread out, but if you register enough of them you should manage to get invoked with a frequency compatible with you desired scheduler’s frequency.

That’s all nice, but it doesn’t entirely live within the GAE application. It depends on either the visitors or Google Reader. Can we do this entirely within GAE?

The idea is that since a GAE app can only executes within an HTTP request handler, which only runs for a few seconds, you can emulate a long-running process by automatically starting a successor request when the previous one is killed. This is made possible by two characteristics of the GAE runtime:

  • When an HTTP request is canceled on the client side, the request execution on the server is permitted to continue (until it returns or GAE kills it for having run too long).
  • When GAE kills a request for having run too long, it does it through an exception that you have a chance to handle (at least for a few seconds, until you get killed for good), which is when you initiate the HTTP request that spawns the successor process.

If you’ve watched (or played) Rugby, this is equivalent to passing the ball to a teammate during that short interval between when you’re tackled and when you hit the ground (I have no idea whether the analogy also applies to Rugby’s weird cousin called American Football).

In practice, all you have to do is structure your long running task like this:

class StartHandler(webapp.RequestHandler):
  def get(self):
    if (StopExec.all().count() == 0):
      try:
        id = int(self.request.get("id"))
        logging.debug("Request " + str(id) + " is starting its work.")
        # This is where you do your work
      finally:
        logging.debug("Request " + str(id) + " has been stopped.")
        # Save state to the datastore as needed
        logging.debug("Launching successor request with id=" + str(id+1))
        res = urlfetch.fetch("http://myGaeApp.appspot.com/start?id=" + str(id+1))

Once you have deployed this app, just point your browser to http://myGaeApp.appspot.com/start?id=0 (assuming of course that your GAE app is called “myGaeApp”) and the long-running process is started. You can hit the “stop” button on your browser and turn off your computer, the process (or more exactly the succession of processes) has a life of its own entirely within the GAE infrastructure.

The “if (StopExec.all().count() == 0)” statement is my way of keeping control over the beast (if only Dr. Frankenstein had as much foresight). StopExec is an entity type in the datastore for my app. If I want to kill this self-replicating process, I just need to create an entity of this type and the process will stop replicating. Without this, the only way to stop it would be to delete the whole application through the GAE dashboard. In general, using the datastore as shared memory is the way to communicate with this emulation of a long-running process.

A scheduler is an obvious example of a long-running process that could be implemented that way. But there are other examples. The only constraint is that your long-running process should expect to be interrupted (approximately every 9 seconds based on what I have seen so far). It will then re-start as part of a new instance of the same request handler class. You can communicate state between one instance and its successor either via the request parameters (like the “id” integer that I pass in the URL) or by writing to the datastore (in the “finally” clause) and reading from it (at the beginning of your task execution).

By the way, you can’t really test such a system using the toolkit Google provides for local testing, because that toolkit behaves very differently from the real GAE infrastructure in the way it controls long-running processes. You have to run it in the real GAE environment.

Does it work? For a while. The first time I launched it, it worked for almost 30 minutes (that’s a lot of 9 second-long processes). But I started to notice these worrisome warnings in the logs: “This request used a high amount of CPU, and was roughly 21.7 times over the average request CPU limit. High CPU requests have a small quota, and if you exceed this quota, your app will be temporarily disabled.”

And indeed, after 30 minutes of happiness my app was disabled for a bit.

My quota figures on the dashboard actually looked pretty good. This was not a very busy application.

CPU Used 175.81 of 199608.00 Gigacycles (0%)
Data Sent 0.00 of 2048.00 Megabytes (0%)
Data Received 0.00 of 2048.00 Megabytes (0%)
Emails Sent 0.00 of 2000.00 Emails (0%)
Megabytes Stored 0.05 of 500.00 Megabytes (0%)

But the warning in the logs points to some other restriction. Google doesn’t mind if you use a given number of CPU cycles through a lot of small requests, but it complains if you use the same number of cycles through a few longer requests. Which is not really captured in the “understanding application quotas” page. I also question whether my long requests actually consume more CPU than normal (shorter) requests. I stripped the application down to the point where the “this is where you do your work” part was doing nothing. The only actual work, in the “finally” clause, was to opens an HTTP connection and wait for it to return (which never happens) until the GAE runtime kills the request completely. Hard to see how this would actually use much CPU. Yet, same warning. The warning text is probably not very reflective of the actual algorithm that flags my request as a hog.

What this means is that no matter how small and slim the task is, the last line (with the urlfetch.fetch() call) by itself is enough to get my request identified as a hog. Which means that eventually the app is going to get disabled. Which is silly really because by that the time fetch() gets called nothing useful is happening in this request (the work has transitioned to the successor request) and I’d be happy to have it killed as soon as the successor has been spawned. But GAE doesn’t give you a way to set client-side timeout on outgoing HTTP requests. Neither can you configure the GAE cop to kill you early so that you don’t enter the territory of “this request used a high amount of CPU”.

I am pretty confident that the ability to set client-side HTTP timeout will be added to the urlfetch API. Even Google’s documentation acknowledges this limitation: “Note: Since your application must respond to the user’s request within several seconds, a URL fetch action to a slow remote server may cause your application to return a server error to the user. There is currently no way to specify a time limit to the URL fetch action.” Of course, by the time they fix this they may also have added a real scheduler…

In the meantime, this was a fun exploration of the GAE environment. It makes it clear to me that this environment is still a toy. But a very interesting and promising one.

28
May
2008

RESTful JMX access from someone who knows both sides

by William Vambenepe

Anyone interested in application manageability and/or management integration should read about Jean-Francois Denise’s prototype for RESTful Access to JMX Instrumentation. Not (at least for now) as something to make use of, but to force us to think pragmatically about the pros and cons of the WS-* stack when used for management integration.

The interesting question is: which of these two interfaces (the WS-Management-based interface being standardized or the HTTP-centric interface that Jean-Francois prototyped) makes it easier to write a cross-platform management application such as the poker-cheating demo at JavaOne 2008?

Some may say that he cheated in that demo by using the Microsoft-provided WinRM implementation of WS-Management on the VBScript side. Without it, it would have clearly been a lot harder to implement the WS-Management based protocol in VBScript than the REST approach. True, but that’s the exact point of standards, that they allow such libraries to be made available to assist implementers. The question is whether such a library is available for your platform/language, how good and interoperable that library is (it could actually hinder rather than help) and what is the cost to the project of depending on it. Which is why the question is hard to answer in absolute. I suspect that, even with WinRM, the simple use case demonstrated at JavaOne would have been easier to implement using straight HTTP but that things change quickly when you run into more demanding use cases (e.g. event notification with filters, sequencing of large responses into an enumeration…). Which is why I still think that the sweetspot would be a simplified WS-Management specification (freed of the WS-Addressing crud for example) that makes it easy (almost as easy as the HTTP-based interface) to implement simple use cases (like a GET) by hand but is still SOAP-based, which lets it seamlessly enter library-driven territory when more advanced features are added (e.g. WS-Security, WS-Enumeration…). Rather than the current situation in which there is a protocol-level disconnect between the HTTP interface (easy to implement by hand) and the WS-Management interface (for which manually implementation is a cruel - and hopefully unusual - punishment).

So, Jean-Francois, where is this JMX-REST work going now?

While you’re on Jean-Francois’ blog, another must-read is his account of the use of Wiseman and Metro in the WS Connector for JMX Agent RI.

As a side note (that runs all the way to the end of this post), Jean-Francois’ blog is a perfect illustration of the kind of blogs I like to subscribe to. He doesn’t feel the need to post all the time. But when he does (only four entries so far this year, three of them “must read”), he provides a lot of insight on a topic he really understands. That’s the magic of RSS/Atom. There is zero cost to me in keeping his feed in my reader (it doesn’t even appear until he posts something). The opposite of what used to be conventional knowledge (that you need to post often to “keep your readers engaged” as the HP guidelines for bloggers used to say). Leaving the technology aside (there is nothing to RSS/Atom technologically other than the fact that they happen to be agreed upon formats), my biggest hope for these specifications is that they promote that more thoughtful (and occasional) style of web publishing. In my grumpy days (are there others?), a “I can’t believe United lost my luggage again” or “look at the nice flowers in my backyard” post is an almost-automatic cause for unsubscribing (the “no country for old IT guys” series gets a free pass though).

And Jean-Francois even manages to repress his Frenchness enough to not take snipes at people just for the fun of it. Another thing I need to learn from him. For example, look at this paragraph from the post that describes his use of Wiseman and Metro:

“The JAX-WS Endpoint we developed is a Provider<SOAPMessage>. Simply annotating with @WebService was not possible. WS-Addressing makes intensive use of SOAP headers to convey part of the protocol information. To access to such headers, we need full access to the SOAP Message. After some redesigning of the existing code we extracted a WSManAgent Class that is accessible from a JAX-WS Endpoint or a Servlet.”

In one paragraph he describes how to do something that IBM has been claiming for years can’t be done (implement WS-Management on top of JAX-WS). And he doesn’t even rub it in. Is he a saint? Good think I am here to do the dirty work for him.

BTW, did anyone notice the irony that this diatribe (which, by now, is taking as much space as the original topic of the post) is an example of the kind of text that I am glad Jean-Francois doesn’t post? You can take the man out of standards, but you can’t take the double standard out of the man.

[UPDATED 2008/6/3: Jean-Francois now has a second post to continue his exploration of marrying the Zen philosophy with the JMX technology.]

20
May
2008

JSR262 public review ballot

by William Vambenepe

The Public Review Ballot for JSR #262 that took place in the Executive Committee for SE/EE has closed. I am not familiar enough with the JCP process to know exactly what this milestone represents. But the results are interesting in any case.

The vote narrowly passed with 6 yes, 5 no and 1 abstain.

The overiding concern listed by the “no” voters (and several of the “yes” voters) is the fact that JSR262 uses WS-Management (a DMTF standard) which itself makes use of specifications that have been submitted to W3C but are not currently in the process of standardization (WS-Transfer, WS-Eventing, WS-Enumeration). And that it uses an older version of a now-standard specification (WS-Addressing).

SAP makes the most insightful comment: that this is not really a JCP problem but a DMTF problem. Hopefully the DMTF (and Microsoft, since it controls the fate of the specifications in question) will step up to the plate on this. This is likely to happen. Even if the DMTF and Microsoft didn’t care about making the JCP happy (but they do, don’t they?), they will run into similar issues if/when they push WS-Management towards ANSI/ISO standardization.

Next to this “non-standard dependencies” issue, there is only one technical issue mentioned. As you guessed, it’s IBM whining about the lack of a WSDL to feed their tools. This is becoming so repetitive that I may eventually stop making fun of it (but don’t hold your breath, I am not known for being very good at ending long-running jokes). It is pretty ironic to hear IBM claim that without that WSDL you can’t implement the spec on JAX-WS when you know that the wiseman reference implementation by Sun and HP is based on JAX-WS…

25
Jun
2007

CMDBF interoperability testing (alpha edition)

by William Vambenepe

Last week, a bunch of us from the CMDBF author companies got together in a room for our first interoperability testing event. It was based on a subset of the spec. As usual with these kinds of events, the first hurdle was to get the network setup right (we had a mix of test endpoints running on remote servers and on our laptops; getting all laptops to talk to one another on a local network was easy; getting them all to talk to the remote endpoints over the internet was easy too; but getting both at the same time took a bit of work).

Once this was taken care of, the interop tests went pretty smoothly. A few problems were found but they were fixed on the fly and by the end we had happy MDRs (Management Data Repositories) talking to happy CMDBs who themselves were accessed by happy client applications. All this using the CMDBF-defined query and update interfaces.

Next steps are to update the spec with the lessons from the interop and to complete it with a few additional features that we put out of scope for the interop. Stay tuned.

31
May
2007

CMDBF update

by William Vambenepe

My CMDBF colleague from BMC Van Wiles has a short update on the state of the CMDBF work. This is an occasion for me to point to his blog. I know that several of my readers are very interested in the CMDBF work, and they should probably monitor Van’s posts in addition to mine.

Like Van, I see the pace accelerating, which is good. More importantly, the quality of the discussion has really improved, not just the quantity. We’ve spent way too much time in UML-land (come on people, this is a protocol, get the key concepts and move on, no need to go in UML to the same level of detail that the XML contains), but the breakthrough was the move to pseudo-schema, and from there to example XML instance docs and finally to java code for the interop. I am having a lot of fun writing this code. I used this opportunity to take XOM on a road test and it’s been a very friendly companion.

Now that we’re getting close to the CMDBF interop, I feel a lot better about the effort than I did at the beginning of the year. It’s still going to be challenging to make the interop. Even if we succeed, the interop only covers a portion of what the spec needs to deliver. But it will be a very good milestone.

27
May
2007

Omri on SOAP and WCF

by William Vambenepe

Omri Gazitt presumably couldn’t find sleep last Friday night, so he wrote a well thought-out blog post instead. Well worth the read. His view on this is both broad and practical. There is enough in this post to, once again, take me within inches of trying WCF. But the fact that for all practical purposes the resulting code can only be deployed on Windows stops me from making this investment.

And since he still couldn’t sleep he penned another entry shortly after. That one is good but a bit less convincing. Frankly, I don’t think the technical differences between Java/C# and “dynamic languages” have much to do with the fact that stubs hurt you more often than not when developing code to process XML messages. With a sentence like “in a typed language, the conventional wisdom is that generating a proxy for me based on some kind of description of the service will make it easier for me to call that service using my familiar language semantics” Omri takes pain to avoid saying whether he agrees with this view. But if he doesn’t (and I don’t think he does), you’d think that he’d be in a pretty good position (at least on the .NET side) to change the fact that, as he says “the way WSDL and XSD are used in platforms like J2EE and .NET tends to push you towards RPC”…

I haven’t used .NET since writing C# code back when HP was selling the Bluestone J2EE server and I was in charge of Web services interoperability, so I have limited expertise there. But Java has the exact same problem with its traditional focus on RPC (just ask Steve). I am currently writing a prototype in Java for the CMDB Federation specification that is still at an early stage. All based on directly processing the XML (mostly through a bunch of XPath queries) and it makes it a breeze to evolve as the draft spec changes. Thank you XOM (and soon Nux).

I very much agree with the point Omri is making (that relying on metadata to add complexity in order to remove it) is an issue, but it’s not just for dynamic languages.

18
Jul
2006

A look at Web services support at Microsoft

by William Vambenepe

A “high-level overview of Microsoft support for Web services across its product offerings” was recently published at MSDN. At times it sounds a bit like a superficial laundry list of specs (it really doesn’t mean much to say that a product “supports” a spec even though we all often resort to such vague statement). Also, the screen captures are not very informative. But considering the breath of material and the stated goal of providing a high-level overview this is a nice document. And if I imagine myself in the position of trying to write a similar document for HP, my appreciation for the work that went into it rises quickly.

Having all this listed in one place points out one disappointing aspect: the disconnect between Web services usage for “management and devices” and everything else Web services. If you look at the table at the bottom of the MSDN article, there is no checkbox for any management or device Web service technology in the ASMX 2.0, WSE 2.0, WSE 3.0 or WCF columns. These technologies only appear in the guts of the OS. So Vista can interact with devices using WS-Eventing, WS-Discovery and the Web services device profile, but I guess Visual Studio developers are not supposed to want to do anything with these interfaces. Similarly, Windows Server R2 provides access to its manageability information using WS-Management (actually, AFAIK it’s still an older, pre-standard version of WS-Management but we’ll pass on that), WS-Transfer, WS-Enumeration and WS-Eventing but the Visual Studio developers are again on their own to take advantage of this to write applications that take advantage of these capabilities.

This is disappointing because one of the most interesting aspects of using Web services for management is to ease integration between IT management and business applications, a necessary condition in order to make the former more aligned with the later. By using SOAP for both we are getting a bit closer than when one was SNMP and the other was RMI, but we won’t get to the end goal if there is no interoperability above the SOAP layer.

Hopefully this is only a “point in time” problem and we will soon see better support for Web services technologies used in management in the general Web services stack.

The larger question of course is that of the applicability (or lack of applicability) of generic XML transfer mechanisms (like WS-Transfer, WS-Eventing and WS-Enumeration) outside of the resource management domain. That’s a topic for a later post.

05
Jul
2006

RDF to XML tools for everyone in 2010?

by William Vambenepe

As has been abundantly commented on, a lot of the tool/runtime support for XML development is centered on mapping XML to objects. The assumption being that developers are familiar with OO and not XML and so tools provide value by allowing developers to work in the environment they are most comfortable in. Of course, little by little it becomes obvious that this “help” is not necessarily that helpful and that if the processing of XML document is core to the application then the developer is much better off learning XML concepts and working with them.

The question is, what will happen to the tools once we move beyond XML as the key representation. XML might still very well be around as the serialization (it sure makes transformations easy), but in many domains (IT management for one), we’ll have to go beyond XML for the semantics. Relying on containment and order is a very crude mechanism, and one that can’t be extended very well despite what the X in XML is supposed to stand for. Let’s assume that we move to something like RDF and that by that point most developers are comfortable with XML. Who wants to bet that tools would show up that try to prevent developers from having to learn RDF concepts and instead find twisted ways to process RDF statements as one big XML document, using the likes of XPath, XSLT and XQuery instead of SPARQL?

All the trials and errors around Web services tools in Java (especially) made it very clear how tools can hold you back just as much as they can help you move forward.

23
May
2006

Just-in-scope (or just-if-i-care) validation

by William Vambenepe

I started my Christmas wish list early this year. In January I described how I would like a development tool that tests XPath statements not just against instance documents but also against the schema. Since I am planning to be good this year for a change, I might get away with a second item on my wish list, so here it is. It’s the runtime companion of my January design-time request.

Once I have written an application that consumes messages by providing XPath expressions to retrieve the parts I care about, I would like to have the runtime validate that the incoming messages are compatible with the app. But not reject messages that I could process. One approach would be to schema-validate all incoming messages and reject the messages that fail. Assuming that I validated my XPath statements against the schema using the tool from my January wish, this should protect me. But this might also reject messages that I would be able to process. For example, even if the schema does not allow element extensibility at the end of the document, it shouldn’t break my application if the incoming message does contain an extra element at the end, if all I do with the message is retrieve the value of a well-defined foo/bar element. So what I would like is a runtime that is able to compare the incoming message with the schema I used and reject it only if the deviations from the schema are in locations that can possibly be reached by my application through the XPath statements it uses to access the message. Otherwise allow the message to be processed.

Steve, could Alpine do that?