Category Archives: Implementation

Some breathing room for Google App Engine requests

As promised to Felix, here is the code that shows how to give extra breathing room to Google App Engine (GAE) requests that may otherwise be killed for taking too long to complete. The approach is similar to the one previously described, but rather than trying to emulate a long-running process, I am simply allowing a request to spread its work over a handful of invocations, thus getting several 9-second slots (since this seems to be how much time GAE currently gives you per request).

If all your requests need this, then you are going to run into the same “high CPU requests have a small quota, and if you exceed this quota, your app will be temporarily disabled” problem seen in the previous experiment. But if 90% of your requests complete in a normal time and only 10% of the requests need more time, then this approach can help prevent your users from getting an error for 1 out of every 10 requests. And you should fly under the radar of the GAE resource cop.

The way it works is simply that if your request is interrupted for having run too long, the client gets a redirect to a new instance of the same handler. Because the code saves its results incrementally in the datastore, the new instance can build on the work of the previous one.

This specific example retrieves the ubuntu-8.04-server-i386.jigdo file (98K) from a handful of Ubuntu mirrors and returns the average/min/max download times (without checking whether the transfer was successful or not). I also had to add a 1-second sleep after each fetch in order to trigger the DeadlineExceededError, because the fetch operations go too quickly when running on GAE rather than on my machine (I guess Google has better connectivity than my mediocre AT&T-provided DSL line, who would have thought).

#!/usr/bin/env python
#
# Copyright 2008 William Vambenepe
#

import wsgiref.handlers
import os
import logging
import time

from google.appengine.ext import db
from google.appengine.ext.webapp import template
from google.appengine.ext import webapp
from google.appengine.api import urlfetch
from google.appengine.runtime import DeadlineExceededError

targetUrls = ["http://mirror.anl.gov/pub/ubuntu-iso/CDs/hardy/ubuntu-8.04-server-i386.jigdo",
              "http://ubuntu.mirror.ac.za/ubuntu-release/hardy/ubuntu-8.04-server-i386.jigdo",
              "http://mirrors.cytanet.com.cy/linux/ubuntu/releases/hardy/ubuntu-8.04-server-i386.jigdo",
              "http://ftp.kaist.ac.kr/pub/ubuntu-cd/hardy/ubuntu-8.04-server-i386.jigdo",
              "http://ftp.itu.edu.tr/Mirror/Ubuntu/hardy/ubuntu-8.04-server-i386.jigdo",
              "http://ftp.belnet.be/mirror/ubuntu.com/releases/hardy/ubuntu-8.04-server-i386.jigdo",
              "http://ubuntu-releases.sh.cvut.cz/hardy/ubuntu-8.04-server-i386.jigdo",
              "http://ftp.crihan.fr/releases/hardy/ubuntu-8.04-server-i386.jigdo",
              "http://ftp.uni-kl.de/pub/linux/ubuntu.iso/hardy/ubuntu-8.04-server-i386.jigdo",
              "http://ftp.duth.gr/pub/ubuntu-releases/hardy/ubuntu-8.04-server-i386.jigdo",
              "http://no.releases.ubuntu.com/hardy/ubuntu-8.04-server-i386.jigdo",
              "http://neacm.fe.up.pt/pub/ubuntu-releases/hardy/ubuntu-8.04-server-i386.jigdo"]

class MeasurementSet(db.Model):
  iteration = db.IntegerProperty()
  measurements = db.ListProperty(float)

class MainHandler(webapp.RequestHandler):
  def get(self):
    try:
      # Resume the set whose key was passed in the redirect, if any
      key = self.request.get("key")
      set = MeasurementSet.get(key)
      if set is None:
        raise ValueError
      set.iteration = set.iteration + 1
      set.put()
      logging.debug("Resuming existing set, with key " + str(key))
    except:
      # No key, or an invalid one: start a brand new measurement set
      set = MeasurementSet()
      set.iteration = 1
      set.measurements = []
      set.put()
      logging.debug("Starting new set, with key " + str(set.key()))
    try:
      # Dereference remaining URLs
      for target in targetUrls[len(set.measurements):]:
        startTime = time.time()
        urlfetch.fetch(target)
        timeElapsed = time.time() - startTime
        time.sleep(1)
        logging.debug(target + " dereferenced in " + str(timeElapsed) + " sec")
        set.measurements.append(timeElapsed)
        set.put()
      # We're done dereferencing URLs, let's publish the results
      longestIndex = 0
      shortestIndex = 0
      totalTime = set.measurements[0]
      for i in range(1, len(targetUrls)):
        totalTime = totalTime + set.measurements[i]
        if set.measurements[i] < set.measurements[shortestIndex]:
          shortestIndex = i
        elif set.measurements[i] > set.measurements[longestIndex]:
          longestIndex = i
      template_values = {"iterations": set.iteration,
                         "longestTime": set.measurements[longestIndex],
                         "longestTarget": targetUrls[longestIndex],
                         "shortestTime": set.measurements[shortestIndex],
                         "shortestTarget": targetUrls[shortestIndex],
                         "average": totalTime/len(targetUrls)}
      path = os.path.join(os.path.dirname(__file__), "steps.html")
      self.response.out.write(template.render(path, template_values))
      logging.debug("Set with key " + str(set.key()) + " has returned")
    except DeadlineExceededError:
      logging.debug("Set with key " + str(set.key())
                    + " interrupted during iteration "+ str(set.iteration)
                    + " with " + str(len(set.measurements)) + " URLs retrieved")
      self.redirect("/steps?key=" + str(set.key()))
      logging.debug("Set with key " + str(set.key()) + " sent redirection")

def main():
  application = webapp.WSGIApplication([("/steps", MainHandler)], debug=True)
  wsgiref.handlers.CGIHandler().run(application)

if __name__ == "__main__":
  main()

I can’t guarantee I will keep it available, but at the time of this writing the application is deployed here if you want to give it a spin. A typical run produces this kind of log:

06-13 01:44AM 36.814 /steps
XX.XX.XX.XX - - [13/06/2008:01:44:45 -0700] "GET /steps HTTP/1.1" 302 0 - -
  D 06-13 01:44AM 36.847
    Starting new set, with key agN2YnByFAsSDk1lYXN1cmVtZW50U2V0GBoM
  D 06-13 01:44AM 37.870
    http://mirror.anl.gov/pub/ubuntu-iso/CDs/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.022078037262 sec
  D 06-13 01:44AM 38.913
    http://ubuntu.mirror.ac.za/ubuntu-release/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.0184168815613 sec
  D 06-13 01:44AM 39.962
    http://mirrors.cytanet.com.cy/linux/ubuntu/releases/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.0166189670563 sec
  D 06-13 01:44AM 41.12
    http://ftp.kaist.ac.kr/pub/ubuntu-cd/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.0205371379852 sec
  D 06-13 01:44AM 42.103
    http://ftp.itu.edu.tr/Mirror/Ubuntu/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.0197179317474 sec
  D 06-13 01:44AM 43.146
    http://ftp.belnet.be/mirror/ubuntu.com/releases/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.0171189308167 sec
  D 06-13 01:44AM 44.215
    http://ubuntu-releases.sh.cvut.cz/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.0160200595856 sec
  D 06-13 01:44AM 45.256
    http://ftp.crihan.fr/releases/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.015625 sec
  D 06-13 01:44AM 45.805
    Set with key agN2YnByFAsSDk1lYXN1cmVtZW50U2V0GBoM interrupted during iteration 1 with 8 URLs retrieved
  D 06-13 01:44AM 45.806
    Set with key agN2YnByFAsSDk1lYXN1cmVtZW50U2V0GBoM sent redirection
  W 06-13 01:44AM 45.808
    This request used a high amount of CPU, and was roughly 28.5 times over the average request CPU limit.
    High CPU requests have a small quota, and if you exceed this quota, your app will be temporarily disabled.

Followed by:

06-13 01:44AM 46.72 /steps?key=agN2YnByFAsSDk1lYXN1cmVtZW50U2V0GBoM
XX.XX.XX.XX - - [13/06/2008:01:44:50 -0700] "GET /steps?key=agN2YnByFAsSDk1lYXN1cmVtZW50U2V0GBoM HTTP/1.1" 200 472
  D 06-13 01:44AM 46.110
    Resuming existing set, with key agN2YnByFAsSDk1lYXN1cmVtZW50U2V0GBoM
  D 06-13 01:44AM 47.128
    http://ftp.uni-kl.de/pub/linux/ubuntu.iso/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.016991853714 sec
  D 06-13 01:44AM 48.177
    http://ftp.duth.gr/pub/ubuntu-releases/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.0238039493561 sec
  D 06-13 01:44AM 49.318
    http://no.releases.ubuntu.com/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.0177929401398 sec
  D 06-13 01:44AM 50.378
    http://neacm.fe.up.pt/pub/ubuntu-releases/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.0226020812988 sec
  D 06-13 01:44AM 50.410
    Set with key agN2YnByFAsSDk1lYXN1cmVtZW50U2V0GBoM has returned
  W 06-13 01:44AM 50.413
    This request used a high amount of CPU, and was roughly 13.4 times over the average request CPU limit.
    High CPU requests have a small quota, and if you exceed this quota, your app will be temporarily disabled.

I believe we can optimize the performance by taking advantage of the fact that successive requests are likely (but not guaranteed) to hit the same instance, allowing global variables to be re-used rather than always going to the datastore. My code is a proof of concept, not an optimized implementation.
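
For illustration, here is a minimal sketch of that optimization (the cache dict and helper names are hypothetical, they are not part of the code above). Since GAE may reuse an application instance across requests, a module-level dict can serve as a best-effort cache, with the datastore remaining the source of truth because a redirect may land on a fresh instance:

instanceCache = {}  # maps str(key) -> MeasurementSet, per app instance

def loadSet(key):
  # Try the in-process cache first, fall back to the datastore
  cached = instanceCache.get(key)
  if cached is not None:
    return cached
  return MeasurementSet.get(key)

def saveSet(set):
  # Write through: update both the datastore and the cache
  set.put()
  instanceCache[str(set.key())] = set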

Of course, the alternative is to drive things from the client, using JavaScript HTTP requests (rather than HTTP redirect) to repeat the HTTP request until the work has been completed. The list of pros and cons of each approach is left as an exercise to the reader.

[UPDATED 2008/6/13: Added log output. Removed handling of “OverQuotaError” which was not useful since, unlike “DeadlineExceededError”, quotas are not per-request. As a result, splitting the work over multiple requests doesn’t help. Slowing down a request might help, at which point the approach above might come in handy to prevent this slowdown from triggering “DeadlineExceededError”.]

[UPDATED 2008/6/30: Steve Jones provides an interesting analysis of the cut-off time for GAE. Confirms that it’s mainly based on wall-clock time rather than CPU time. And that you can sometimes go just over 9 seconds but never up to 10 seconds, which is consistent with my (much less detailed and rigorous) observations.]

Filed under Everything, Google, Google App Engine, Implementation, Utility computing

Emulating a long-running process (and a scheduler) in Google App Engine

As previously described, Google App Engine (GAE) doesn’t support long running processes. Each process lives in the context of an HTTP request handler and needs to complete within a few seconds. If you’re trying to get extra CPU cycles for some task then Amazon EC2, not GAE, is the right tool (including the option to get high-CPU instances for the CPU-intensive tasks).

More surprising is the fact that GAE doesn’t offer a scheduler. Your app can only get invoked when someone sends it an HTTP request and you can’t ask GAE to generate a canned request every so often (crontab-style). That seems both limiting and arbitrary. In fact, I would be surprised if GAE didn’t soon add support for this.

In the meantime, your best bet is to get an account on a separate server that lets you schedule jobs, at which point you can drive your GAE application from that external scheduler (through HTTP requests to your GAE app). But just for the intellectual exercise, how would one meet the need while staying entirely within the confines of the Google-provided infrastructure?

  • The most obvious option is to piggyback on HTTP requests from your visitors. But:
    • this assumes that you consistently get visitors at a frequency greater than your scheduler’s interval,
    • since you can’t launch sub-processes in GAE, this delays your responses to the visitor,
    • more worrisome, if your scheduled task takes more than a few seconds this means your application might be interrupted by GAE before you respond to the visitor, resulting in a failed request from their perspective.
  • You can try to improve a bit on this by doing this processing not as part of the main request from your visitor but rather by putting in the response HTML some JavaScript that will asynchronously send you HTTP requests in the background (typically not visible to the user). This way, a given visitor will give you repeated invocations for as long as the page is open in the browser. And you can set the invocation interval. You can even create some kind of server-controlled auto-modulation of the interval (increasing it as your number of concurrent visitors increases) so that you don’t eat all your Google-allocated incoming HTTP quota with these XMLHttpRequest invocations. This would probably be a very workable way to do it in practice even though:
    • it only works if your application has visitors who use web browsers, not if it is only consumed by programs (e.g. through RSS feeds or other XML formats),
    • it puts the burden on your visitors who may or may not appreciate it, assuming they realize it is happening (how would you feel if your real estate agent had to borrow your cell phone to arrange home visits for you and their other customers?).
  • While GAE doesn’t offer a scheduler, another Google service, Google Reader, offers one of sorts. If you register a feed there, Google will retrieve it once in a while (based on my logs, this happens approximately every hour for each of the two feeds for this blog). You can create multiple URLs that all map to the same handler and return some dummy RSS (a sketch follows this list). If you register these feeds with Google Reader, they’ll get pulled once in a while. Of course there is no guarantee that the pulling of the different feeds will be nicely spread out, but if you register enough of them you should manage to get invoked with a frequency compatible with your desired scheduler frequency.
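
Here is a minimal sketch of that last trick (handler and URL names are hypothetical, and doScheduledWork stands for whatever periodic task you need to run). Each feed URL registered with Google Reader maps to the same handler, and every pull buys you one execution slot:

import wsgiref.handlers
from google.appengine.ext import webapp

DUMMY_RSS = ("<?xml version=\"1.0\"?><rss version=\"2.0\"><channel>"
             "<title>tick</title><link>http://example.org/</link>"
             "<description>dummy feed</description></channel></rss>")

class TickHandler(webapp.RequestHandler):
  def get(self):
    doScheduledWork()  # hypothetical: your periodic task goes here
    self.response.headers["Content-Type"] = "application/rss+xml"
    self.response.out.write(DUMMY_RSS)

def main():
  # Register as many aliases as needed to approach the desired frequency
  application = webapp.WSGIApplication([("/feed1", TickHandler),
                                        ("/feed2", TickHandler),
                                        ("/feed3", TickHandler)], debug=True)
  wsgiref.handlers.CGIHandler().run(application)

if __name__ == "__main__":
  main()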

That’s all nice, but it doesn’t entirely live within the GAE application. It depends on either the visitors or Google Reader. Can we do this entirely within GAE?

The idea is that since a GAE app can only execute within an HTTP request handler, which only runs for a few seconds, you can emulate a long-running process by automatically starting a successor request when the previous one is killed. This is made possible by two characteristics of the GAE runtime:

  • When an HTTP request is canceled on the client side, the request execution on the server is permitted to continue (until it returns or GAE kills it for having run too long).
  • When GAE kills a request for having run too long, it does it through an exception that you have a chance to handle (at least for a few seconds, until you get killed for good), which is when you initiate the HTTP request that spawns the successor process.

If you’ve watched (or played) Rugby, this is equivalent to passing the ball to a teammate during that short interval between when you’re tackled and when you hit the ground (I have no idea whether the analogy also applies to Rugby’s weird cousin called American Football).

In practice, all you have to do is structure your long-running task like this:

import logging

from google.appengine.ext import db
from google.appengine.ext import webapp
from google.appengine.api import urlfetch

class StopExec(db.Model):
  # Creating one entity of this type stops the self-replicating process
  pass

class StartHandler(webapp.RequestHandler):
  def get(self):
    if (StopExec.all().count() == 0):
      id = int(self.request.get("id"))
      try:
        logging.debug("Request " + str(id) + " is starting its work.")
        # This is where you do your work
      finally:
        logging.debug("Request " + str(id) + " has been stopped.")
        # Save state to the datastore as needed
        logging.debug("Launching successor request with id=" + str(id+1))
        res = urlfetch.fetch("http://myGaeApp.appspot.com/start?id=" + str(id+1))

Once you have deployed this app, just point your browser to http://myGaeApp.appspot.com/start?id=0 (assuming of course that your GAE app is called “myGaeApp”) and the long-running process is started. You can hit the “stop” button on your browser and turn off your computer; the process (or, more exactly, the succession of processes) has a life of its own, entirely within the GAE infrastructure.

The “if (StopExec.all().count() == 0)” statement is my way of keeping control over the beast (if only Dr. Frankenstein had as much foresight). StopExec is an entity type in the datastore for my app. If I want to kill this self-replicating process, I just need to create an entity of this type and the process will stop replicating. Without this, the only way to stop it would be to delete the whole application through the GAE dashboard. In general, using the datastore as shared memory is the way to communicate with this emulation of a long-running process.
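
As an illustration, the kill switch can itself be a trivial handler (a hypothetical sketch, reusing the StopExec model and the webapp imports from the snippet above):

class KillHandler(webapp.RequestHandler):
  def get(self):
    # One StopExec entity is enough to make the guard in StartHandler fail
    StopExec().put()
    self.response.out.write("Self-replicating process stopped.")

# Mapped alongside the start handler, for example:
# webapp.WSGIApplication([("/start", StartHandler), ("/stop", KillHandler)])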

A scheduler is an obvious example of a long-running process that could be implemented that way. But there are other examples. The only constraint is that your long-running process should expect to be interrupted (approximately every 9 seconds based on what I have seen so far). It will then re-start as part of a new instance of the same request handler class. You can communicate state between one instance and its successor either via the request parameters (like the “id” integer that I pass in the URL) or by writing to the datastore (in the “finally” clause) and reading from it (at the beginning of your task execution).

By the way, you can’t really test such a system using the toolkit Google provides for local testing, because that toolkit behaves very differently from the real GAE infrastructure in the way it controls long-running processes. You have to run it in the real GAE environment.

Does it work? For a while. The first time I launched it, it worked for almost 30 minutes (that’s a lot of 9 second-long processes). But I started to notice these worrisome warnings in the logs: “This request used a high amount of CPU, and was roughly 21.7 times over the average request CPU limit. High CPU requests have a small quota, and if you exceed this quota, your app will be temporarily disabled.”

And indeed, after 30 minutes of happiness my app was disabled for a bit.

My quota figures on the dashboard actually looked pretty good. This was not a very busy application.

CPU Used 175.81 of 199608.00 Gigacycles (0%)
Data Sent 0.00 of 2048.00 Megabytes (0%)
Data Received 0.00 of 2048.00 Megabytes (0%)
Emails Sent 0.00 of 2000.00 Emails (0%)
Megabytes Stored 0.05 of 500.00 Megabytes (0%)

But the warning in the logs points to some other restriction. Google doesn’t mind if you use a given number of CPU cycles through a lot of small requests, but it complains if you use the same number of cycles through a few longer requests. Which is not really captured in the “understanding application quotas” page. I also question whether my long requests actually consume more CPU than normal (shorter) requests. I stripped the application down to the point where the “this is where you do your work” part was doing nothing. The only actual work, in the “finally” clause, was to open an HTTP connection and wait for it to return (which never happens) until the GAE runtime kills the request completely. Hard to see how this would actually use much CPU. Yet, same warning. The warning text is probably not very reflective of the actual algorithm that flags my request as a hog.

What this means is that no matter how small and slim the task is, the last line (with the urlfetch.fetch() call) by itself is enough to get my request identified as a hog. Which means that eventually the app is going to get disabled. Which is silly really, because by the time fetch() gets called nothing useful is happening in this request (the work has transitioned to the successor request) and I’d be happy to have it killed as soon as the successor has been spawned. But GAE doesn’t give you a way to set a client-side timeout on outgoing HTTP requests. Neither can you configure the GAE cop to kill you early so that you don’t enter the territory of “this request used a high amount of CPU”.

I am pretty confident that the ability to set client-side HTTP timeout will be added to the urlfetch API. Even Google’s documentation acknowledges this limitation: “Note: Since your application must respond to the user’s request within several seconds, a URL fetch action to a slow remote server may cause your application to return a server error to the user. There is currently no way to specify a time limit to the URL fetch action.” Of course, by the time they fix this they may also have added a real scheduler…

In the meantime, this was a fun exploration of the GAE environment. It makes it clear to me that this environment is still a toy. But a very interesting and promising one.

[UPDATED 2009/28: Looks like a real GAE scheduler is coming.]

Filed under Brain teaser, Everything, Google, Google App Engine, Implementation, Testing, Utility computing

RESTful JMX access from someone who knows both sides

Anyone interested in application manageability and/or management integration should read about Jean-Francois Denise’s prototype for RESTful Access to JMX Instrumentation. Not (at least for now) as something to make use of, but to force us to think pragmatically about the pros and cons of the WS-* stack when used for management integration.

The interesting question is: which of these two interfaces (the WS-Management-based interface being standardized or the HTTP-centric interface that Jean-Francois prototyped) makes it easier to write a cross-platform management application such as the poker-cheating demo at JavaOne 2008?

Some may say that he cheated in that demo by using the Microsoft-provided WinRM implementation of WS-Management on the VBScript side. Without it, it would have clearly been a lot harder to implement the WS-Management-based protocol in VBScript than the REST approach. True, but that’s the exact point of standards, that they allow such libraries to be made available to assist implementers. The question is whether such a library is available for your platform/language, how good and interoperable that library is (it could actually hinder rather than help) and what is the cost to the project of depending on it. Which is why the question is hard to answer in absolute terms. I suspect that, even with WinRM, the simple use case demonstrated at JavaOne would have been easier to implement using straight HTTP but that things change quickly when you run into more demanding use cases (e.g. event notification with filters, sequencing of large responses into an enumeration…). Which is why I still think that the sweet spot would be a simplified WS-Management specification (freed of the WS-Addressing crud for example) that makes it easy (almost as easy as the HTTP-based interface) to implement simple use cases (like a GET) by hand but is still SOAP-based, which lets it seamlessly enter library-driven territory when more advanced features are added (e.g. WS-Security, WS-Enumeration…). Rather than the current situation in which there is a protocol-level disconnect between the HTTP interface (easy to implement by hand) and the WS-Management interface (for which manual implementation is a cruel, and hopefully unusual, punishment).

So, Jean-Francois, where is this JMX-REST work going now?

While you’re on Jean-Francois’ blog, another must-read is his account of the use of Wiseman and Metro in the WS Connector for JMX Agent RI.

As a side note (that runs all the way to the end of this post), Jean-Francois’ blog is a perfect illustration of the kind of blogs I like to subscribe to. He doesn’t feel the need to post all the time. But when he does (only four entries so far this year, three of them “must read”), he provides a lot of insight on a topic he really understands. That’s the magic of RSS/Atom. There is zero cost to me in keeping his feed in my reader (it doesn’t even appear until he posts something). The opposite of what used to be conventional knowledge (that you need to post often to “keep your readers engaged” as the HP guidelines for bloggers used to say). Leaving the technology aside (there is nothing to RSS/Atom technologically other than the fact that they happen to be agreed upon formats), my biggest hope for these specifications is that they promote that more thoughtful (and occasional) style of web publishing. In my grumpy days (are there others?), a “I can’t believe United lost my luggage again” or “look at the nice flowers in my backyard” post is an almost-automatic cause for unsubscribing (the “no country for old IT guys” series gets a free pass though).

And Jean-Francois even manages to repress his Frenchness enough to not snipe at people just for the fun of it. Another thing I need to learn from him. For example, look at this paragraph from the post that describes his use of Wiseman and Metro:

“The JAX-WS Endpoint we developed is a Provider<SOAPMessage>. Simply annotating with @WebService was not possible. WS-Addressing makes intensive use of SOAP headers to convey part of the protocol information. To access to such headers, we need full access to the SOAP Message. After some redesigning of the existing code we extracted a WSManAgent Class that is accessible from a JAX-WS Endpoint or a Servlet.”

In one paragraph he describes how to do something that IBM has been claiming for years can’t be done (implement WS-Management on top of JAX-WS). And he doesn’t even rub it in. Is he a saint? Good thing I am here to do the dirty work for him.

BTW, did anyone notice the irony that this diatribe (which, by now, is taking as much space as the original topic of the post) is an example of the kind of text that I am glad Jean-Francois doesn’t post? You can take the man out of standards, but you can’t take the double standard out of the man.

[UPDATED 2008/6/3: Jean-Francois now has a second post to continue his exploration of marrying the Zen philosophy with the JMX technology.]

Filed under Application Mgmt, CMDB Federation, Everything, Implementation, IT Systems Mgmt, JMX, Manageability, Mgmt integration, Open source, SOAP, SOAP header, Specs, Standards, WS-Management

JSR262 public review ballot

The Public Review Ballot for JSR #262 that took place in the Executive Committee for SE/EE has closed. I am not familiar enough with the JCP process to know exactly what this milestone represents. But the results are interesting in any case.

The vote narrowly passed with 6 yes, 5 no and 1 abstain.

The overriding concern listed by the “no” voters (and several of the “yes” voters) is the fact that JSR262 uses WS-Management (a DMTF standard), which itself makes use of specifications that have been submitted to W3C but are not currently in the process of standardization (WS-Transfer, WS-Eventing, WS-Enumeration). And that it uses an older version of a now-standard specification (WS-Addressing).

SAP makes the most insightful comment: that this is not really a JCP problem but a DMTF problem. Hopefully the DMTF (and Microsoft, since it controls the fate of the specifications in question) will step up to the plate on this. This is likely to happen. Even if the DMTF and Microsoft didn’t care about making the JCP happy (but they do, don’t they?), they will run into similar issues if/when they push WS-Management towards ANSI/ISO standardization.

Next to this “non-standard dependencies” issue, there is only one technical issue mentioned. As you guessed, it’s IBM whining about the lack of a WSDL to feed their tools. This is becoming so repetitive that I may eventually stop making fun of it (but don’t hold your breath, I am not known for being very good at ending long-running jokes). It is pretty ironic to hear IBM claim that without that WSDL you can’t implement the spec on JAX-WS when you know that the Wiseman reference implementation by Sun and HP is based on JAX-WS…

Filed under Application Mgmt, Everything, IBM, Implementation, ISO, IT Systems Mgmt, JMX, Manageability, Microsoft, Specs, Standards, WS-Management, WS-Transfer

CMDBF interoperability testing (alpha edition)

Last week, a bunch of us from the CMDBF author companies got together in a room for our first interoperability testing event. It was based on a subset of the spec. As usual with these kinds of events, the first hurdle was to get the network setup right (we had a mix of test endpoints running on remote servers and on our laptops; getting all laptops to talk to one another on a local network was easy; getting them all to talk to the remote endpoints over the internet was easy too; but getting both at the same time took a bit of work).

Once this was taken care of, the interop tests went pretty smoothly. A few problems were found but they were fixed on the fly and by the end we had happy MDRs (Management Data Repositories) talking to happy CMDBs who themselves were accessed by happy client applications. All this using the CMDBF-defined query and update interfaces.

Next steps are to update the spec with the lessons from the interop and to complete it with a few additional features that we put out of scope for the interop. Stay tuned.

Filed under CMDB Federation, CMDBf, Everything, Implementation, Specs, Standards

CMDBF update

My CMDBF colleague from BMC, Van Wiles, has a short update on the state of the CMDBF work. This is an occasion for me to point to his blog. I know that several of my readers are very interested in the CMDBF work, and they should probably monitor Van’s posts in addition to mine.

Like Van, I see the pace accelerating, which is good. More importantly, the quality of the discussion has really improved, not just the quantity. We’ve spent way too much time in UML-land (come on people, this is a protocol, get the key concepts and move on, no need to go in UML to the same level of detail that the XML contains), but the breakthrough was the move to pseudo-schema, and from there to example XML instance docs and finally to Java code for the interop. I am having a lot of fun writing this code. I used this opportunity to take XOM on a road test and it’s been a very friendly companion.

Now that we’re getting close to the CMDBF interop, I feel a lot better about the effort than I did at the beginning of the year. It’s still going to be challenging to make the interop. Even if we succeed, the interop only covers a portion of what the spec needs to deliver. But it will be a very good milestone.

Filed under Everything, Implementation, Specs, Standards, Tech

Omri on SOAP and WCF

Omri Gazitt presumably couldn’t find sleep last Friday night, so he wrote a well thought-out blog post instead. Well worth the read. His view on this is both broad and practical. There is enough in this post to, once again, take me within inches of trying WCF. But the fact that for all practical purposes the resulting code can only be deployed on Windows stops me from making this investment.

And since he still couldn’t sleep he penned another entry shortly after. That one is good but a bit less convincing. Frankly, I don’t think the technical differences between Java/C# and “dynamic languages” have much to do with the fact that stubs hurt you more often than not when developing code to process XML messages. With a sentence like “in a typed language, the conventional wisdom is that generating a proxy for me based on some kind of description of the service will make it easier for me to call that service using my familiar language semantics” Omri takes pains to avoid saying whether he agrees with this view. But if he doesn’t (and I don’t think he does), you’d think that he’d be in a pretty good position (at least on the .NET side) to change the fact that, as he says, “the way WSDL and XSD are used in platforms like J2EE and .NET tends to push you towards RPC”…

I haven’t used .NET since writing C# code back when HP was selling the Bluestone J2EE server and I was in charge of Web services interoperability, so I have limited expertise there. But Java has the exact same problem with its traditional focus on RPC (just ask Steve). I am currently writing a prototype in Java for the CMDB Federation specification, which is still at an early stage. It is all based on directly processing the XML (mostly through a bunch of XPath queries), which makes it a breeze to evolve as the draft spec changes. Thank you XOM (and soon Nux).

I very much agree with the point Omri is making (that relying on metadata adds complexity in an attempt to remove it), but it’s not just an issue for dynamic languages.

Filed under Everything, Implementation, SOAP, Tech, XOM

A look at Web services support at Microsoft

A “high-level overview of Microsoft support for Web services across its product offerings” was recently published at MSDN. At times it sounds a bit like a superficial laundry list of specs (it really doesn’t mean much to say that a product “supports” a spec, even though we all often resort to such vague statements). Also, the screen captures are not very informative. But considering the breadth of material and the stated goal of providing a high-level overview, this is a nice document. And if I imagine myself in the position of trying to write a similar document for HP, my appreciation for the work that went into it rises quickly.

Having all this listed in one place points out one disappointing aspect: the disconnect between Web services usage for “management and devices” and everything else Web services. If you look at the table at the bottom of the MSDN article, there is no checkbox for any management or device Web service technology in the ASMX 2.0, WSE 2.0, WSE 3.0 or WCF columns. These technologies only appear in the guts of the OS. So Vista can interact with devices using WS-Eventing, WS-Discovery and the Web services device profile, but I guess Visual Studio developers are not supposed to want to do anything with these interfaces. Similarly, Windows Server R2 provides access to its manageability information using WS-Management (actually, AFAIK it’s still an older, pre-standard version of WS-Management but we’ll pass on that), WS-Transfer, WS-Enumeration and WS-Eventing, but the Visual Studio developers are again on their own when writing applications that take advantage of these capabilities.

This is disappointing because one of the most interesting aspects of using Web services for management is to ease integration between IT management and business applications, a necessary condition in order to make the former more aligned with the latter. By using SOAP for both, we are getting a bit closer than when one was SNMP and the other was RMI, but we won’t get to the end goal if there is no interoperability above the SOAP layer.

Hopefully this is only a “point in time” problem and we will soon see better support for Web services technologies used in management in the general Web services stack.

The larger question of course is that of the applicability (or lack of applicability) of generic XML transfer mechanisms (like WS-Transfer, WS-Eventing and WS-Enumeration) outside of the resource management domain. That’s a topic for a later post.

Filed under Everything, Implementation, Tech

RDF to XML tools for everyone in 2010?

As has been abundantly commented on, a lot of the tool/runtime support for XML development is centered on mapping XML to objects. The assumption is that developers are familiar with OO and not XML, and so tools provide value by allowing developers to work in the environment they are most comfortable in. Of course, little by little it becomes obvious that this “help” is not necessarily that helpful and that if the processing of XML documents is core to the application then the developer is much better off learning XML concepts and working with them.

The question is: what will happen to the tools once we move beyond XML as the key representation? XML might still very well be around as the serialization (it sure makes transformations easy), but in many domains (IT management for one) we’ll have to go beyond XML for the semantics. Relying on containment and order is a very crude mechanism, and one that can’t be extended very well despite what the X in XML is supposed to stand for. Let’s assume that we move to something like RDF and that by that point most developers are comfortable with XML. Who wants to bet that tools will show up that try to prevent developers from having to learn RDF concepts and instead find twisted ways to process RDF statements as one big XML document, using the likes of XPath, XSLT and XQuery instead of SPARQL?
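
To make the contrast concrete, here is a hedged sketch (it assumes rdflib as the RDF library; any other would do). The same statement can be serialized in RDF/XML in several different layouts, so an XPath expression tied to one layout is brittle, while the SPARQL query matches the triple no matter how it was serialized:

from rdflib import Graph

RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                      xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://example.org/me">
    <foaf:name>William</foaf:name>
  </rdf:Description>
</rdf:RDF>"""

g = Graph()
g.parse(data=RDF_XML, format="xml")
# Matches the statement regardless of which RDF/XML variant produced it,
# unlike an XPath such as //foaf:name which assumes one particular layout
for row in g.query(
    "SELECT ?name WHERE { ?s <http://xmlns.com/foaf/0.1/name> ?name }"):
  print row[0]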

All the trials and errors around Web services tools in Java (especially) made it very clear how tools can hold you back just as much as they can help you move forward.

Filed under Everything, Implementation, Tech

Just-in-scope (or just-if-i-care) validation

I started my Christmas wish list early this year. In January I described how I would like a development tool that tests XPath statements not just against instance documents but also against the schema. Since I am planning to be good this year for a change, I might get away with a second item on my wish list, so here it is. It’s the runtime companion of my January design-time request.

Once I have written an application that consumes messages by providing XPath expressions to retrieve the parts I care about, I would like to have the runtime validate that the incoming messages are compatible with the app. But not reject messages that I could process. One approach would be to schema-validate all incoming messages and reject the messages that fail. Assuming that I validated my XPath statements against the schema using the tool from my January wish, this should protect me. But this might also reject messages that I would be able to process. For example, even if the schema does not allow element extensibility at the end of the document, it shouldn’t break my application if the incoming message does contain an extra element at the end, if all I do with the message is retrieve the value of a well-defined foo/bar element. So what I would like is a runtime that is able to compare the incoming message with the schema I used and reject it only if the deviations from the schema are in locations that can possibly be reached by my application through the XPath statements it uses to access the message. Otherwise allow the message to be processed.
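
In Python, the acceptance test could look like this minimal sketch (all names are hypothetical, and deviationPaths stands for whatever list of locations a validator reports as deviating from the schema):

def touches(usedPath, deviationPath):
  # Crude prefix test: a deviation at /a/b/c matters to an application
  # path /a/b or /a/b/c, but not to /x/y. Real logic would also have
  # to handle predicates, wildcards and "//" shortcuts.
  return (deviationPath.startswith(usedPath)
          or usedPath.startswith(deviationPath))

def acceptMessage(usedPaths, deviationPaths):
  for deviation in deviationPaths:
    if any(touches(used, deviation) for used in usedPaths):
      return False  # the application could trip over this deviation
  return True  # all deviations are out of the application's scope

# An extra trailing element doesn't affect an app that only reads /foo/bar:
print acceptMessage(["/foo/bar"], ["/foo/extraElement"])  # True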

Steve, could Alpine do that?

Filed under Everything, Implementation, Tech

Schema-based XPath tool

Most XML editors offer an XPath tool that allows one to test and fine-tune XPath expressions by running them against XML documents. Very helpful, but also potentially very deceptive. With such a tool it is very easy to convince oneself that an XPath expression is correct after running it against a few instance documents. And a month later the application behaves erratically (in many cases it probably won’t break; it will execute the request on the wrong element, which is worse) because the XPath expression is run on a different document and what it returns is not what the programmer had in mind. This is especially likely to occur as people use and abuse shortcuts such as “//” and ignore namespaces.

What we need is an XPath tool that can not only run the XPath against an instance document but also run it against a schema. In the latter case, the tool would flag any portion of the schema that could possibly correspond to a node in the resulting nodeset. It would force programmers to realize that their //bar can match the /foo/bar that they want to reach, but that it could also match something that falls under the xsd:any at the end of the schema. And the programmer has to deal with that.
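
A very rough sketch of the schema-checking side (assumptions throughout: only inline element declarations are handled, no named types or element references, and Python's ElementTree is used to walk the XSD):

import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

def flagMatches(elem, path, name, hits):
  # Walk the schema tree, recording every declaration "//name" could select
  for child in elem:
    if child.tag == XSD + "element" and child.get("name"):
      childPath = path + "/" + child.get("name")
      if child.get("name") == name:
        hits.append(childPath)
      flagMatches(child, childPath, name, hits)
    elif child.tag == XSD + "any":
      hits.append(path + "/* (xsd:any: could match too)")
    else:
      flagMatches(child, path, name, hits)

def whatCouldMatch(schemaFile, name):
  # Everything an XPath like "//name" could select according to the schema
  hits = []
  flagMatches(ET.parse(schemaFile).getroot(), "", name, hits)
  return hits  # e.g. ["/foo/bar", "/foo/* (xsd:any: could match too)"]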

Filed under Everything, Implementation, Tech

The joys of code-generated WSDL

I recently ran into a WSDL which included an operation called “main”. The type of the request message for the operation was of course:

<complexType name="ArrayOf_xsd_string">
  <complexContent>
    <restriction base="soapenc:Array">
      <attribute ref="soapenc:arrayType" wsdl:arrayType="xsd:string[]"/>
    </restriction>
  </complexContent>
</complexType>

Command line over SOAP. Nice…

Filed under Everything, Implementation

Apache WSRF, Pubscribe and Muse v1.0 Releases

The WSRF, Pubscribe and Muse teams at Apache have reached a major milestone in their work: a version 1.0 release. Congrats to the teams! Binary and source distributions of each project are available for download.

Filed under Everything, Implementation

New names for Apache projects

As part of the move out of incubation into full-fledged Apache projects, the WSRF, WS-Notif and WSDM MUWS implementations in Apache have seen some name and URL changes.

Filed under Everything, Implementation

So you want to build an EPR?

EPRs (Endpoint References, from WS-Addressing) are a shiny and exciting toy. But a sharp one too. So here is my contribution to try to prevent fingers from being cut and eyes from being poked out.

So far I have seen EPRs used for five main reasons, not all of them very inspired:

1) “Dispatching on URIs is not cool”

Some tools make it hard to dispatch on URI. As a result, when you have many instances of the same service, it is easier to write the service if the instance ID is in the message rather than in the endpoint URI. Fix the tools? Nah, let’s modify the messages instead. I guess that’s what happens when tool vendors drive the standards: you see specifications that fit the tools rather than the contrary. So EPRs are used to put information that should be in the URI in headers instead. REST-heads see this as a capital crime. I am not convinced it is so harmful in practice, but it is definitely not a satisfying justification for EPRs.

2) “I don’t want to send a WSDL doc around for just the endpoint URI”

People seem to have this notion that the WSDL is a “big and static” document and the EPR is a “small and dynamic” document. But WSDL was designed to allow design-time and run-time elements to be separated if needed. If all you want to send around is the URI at which the service is available, you can just send the URI. Or, if you want it wrapped, why not send a soap:address element (assuming the binding is well-known). After all, in many cases EPRs don’t contain the optional service element and its port attribute. If the binding is not known and you want to specify it, send around a wsdl:port element which contains the soap:address as well as the QName of the binding. And if you want to be able to include several ports (for example to offer multiple transports) or use the wsdl:import mechanism to point to the binding and portType, then ship around a simplified wsdl:definitions with only one service that itself contains the port(s) (if I remember correctly, WS-MessageDelivery tried to formalize this approach by calling a WSRef a wsdl:service element where all the ports use the same portType). And you can hang metadata off of a service element just as well as off of an EPR.

For some reason people are happy sending an EPR that contains only the address of the endpoint but not comfortable with sending a piece of WSDL of the same size that says the same thing. Again, not a huge deal now that people seem to have settled on using EPRs rather than service elements, but clearly not a satisfying justification for inventing EPRs in the first place.

3) “I can manage contexts without thinking about it”

Dynamically generated EPRs can be used as a replacement for an explicit context mechanism, such as those provided by WS-Context and WS-Coordination. By using EPRs for this, you save yourself the expense of supporting yet-another-spec. What do you lose? This paper gives you a detailed answer (it focuses on comparing EPRs to WS-Context rather than WS-Coordination for pretty obvious reasons, but I assume that on a purely technical level the authors would also recommend WS-Coordination over EPRs, right Greg?). In a shorter and simplified way, my take on the reason why you want to be careful using dynamic EPRs for context is that by doing so you merge the context identifier on the one hand and the endpoint with which you use this context on the other hand into one entity. Once this is done you can’t reliably separate them and you lose potentially valuable information. For example, assume that your company buys from a bunch of suppliers and for each purchase you get an EPR that allows you to track the purchase as it is shipped. These EPRs are essentially one blob to you and the only way to know which one comes through FedEx versus UPS is to look at the address and try to guess based on the domain name. But you are at the mercy of any kind of redirection or load-balancing or other infrastructure reason that might modify the address. That’s not a problem if all you care about is checking the ETA on the shipment; each EPR gives you enough information to do that. But if you also want to consolidate the orders that UPS is delivering to you or if you read in the paper about a potential UPS drivers strike and want to see how it would impact you, it would be nice to have each shipment be an explicit context ID associated with a real service (UPS or FedEx), rather than a mix of both at the same time. This way you can also go to UPS.com, ask about your shipments and easily map each entry returned to an existing shipment you are tracking. With EPRs rather than explicit context you can’t do this without additional agreements.

The ironic thing is that the kind of mess one can get into by using dynamic EPRs too widely instead of explicit context is very similar in nature to the management problems HP OpenView software solves. Discovery of resources, building relationship trees, impact analysis, event correlation, etc. We do it by using both nicely-designed protocols/models (the clean way) and by using heuristics and other hacks when needed. We do what it takes to make sense of the customer’s system. So we could just as well help you manage your shipments even if they were modeled as EPRs (in this example). But we’d rather work on solving existing problems and open new possibilities than fix problems that can be avoided. And BTW using dynamic EPRs is not always bad. Explicit contexts are sometimes overkill. But keep in mind that you are losing data by bundling the context with the endpoint. Actually, more than losing data, you are losing structure in your data. And these days the gold is less in the raw data than in its structure and the understanding you have of it.

4) “I use reference parameters to create new protocols, isn’t that cool!”

No, it’s not. If you want to define a SOAP header, go ahead: define an XML element and then describe the semantics associated with this element when it appears as a SOAP header. But why oh why define it as a “reference parameter” (or “reference property” depending on your version of WS-A)? The whole point of an EPR is to be passed around. If you are going to build the SOAP message locally, you don’t need to first build an EPR and then deconstruct it to extract the reference parameters out of it and insert them as SOAP headers. Just build the SOAP message by putting in the SOAP headers you know are needed. If your tooling requires going through an EPR to build the SOAP message, fine, that’s your problem, but don’t force this view on people who may want to use your protocol. For example, one can argue for or against the value of WS-Management‘s System and SelectorSet as SOAP headers, but it doesn’t make sense to define those as reference parameters rather than just SOAP headers (readers of this blog already know that I am the editor of the WSDM MUWS OASIS standard with which WS-Management overlaps, so go ahead and question my motives for picking on WS-Management). Once they are defined as SOAP headers, one can make the adventurous decision to hard-code them in EPRs and to send the EPRs to someone else. But that’s a completely orthogonal decision (and the topic of the fifth way EPRs are used, see below). Using EPRs to define protocols is definitely not a justification for EPRs, and one would have a strong case to argue that it violates the opacity of reference parameters specified in WS-Addressing.

5) “Look what I can do by hard-coding headers!”

The whole point of reference parameters is to make people include elements that they don’t understand in their SOAP headers (I don’t buy the multi-protocol aspect of WS-Addressing; as far as I am concerned it’s a SOAP thing). This mechanism is designed to open a door to hacking. Both in the good sense of the term (hacking as a clever use of technology, such as displaying Craigslist rental data on top of Google Maps without Craigslist or Google having to know about it), and in the bad sense of the term (getting things to happen that you should not be able to make happen). Here is an example of good use for reference parameters: if the Google search SOAP input message accepted a header that specifies what site to limit the search on (equivalent to adding “site:vambenepe.com” in the Google text box on Google.com), I could distribute to people an EPR to the vambenepe.com search service by just giving them an EPR pointing to the Google search service and adding a reference parameter that corresponds to the header instructing Google to limit the search to vambenepe.com.
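
To illustrate the mechanics (a sketch using ElementTree and made-up namespaces, not any real Google interface): whoever consumes such an EPR simply copies each reference parameter, without interpreting it, into the SOAP header of every message sent to the EPR's address:

import xml.etree.ElementTree as ET

SOAP = "{http://schemas.xmlsoap.org/soap/envelope/}"

def buildMessage(refParams, bodyElement):
  # refParams: the children of wsa:ReferenceParameters found in the EPR
  envelope = ET.Element(SOAP + "Envelope")
  header = ET.SubElement(envelope, SOAP + "Header")
  for param in refParams:
    header.append(param)  # copied as-is, opaque to the sender
  body = ET.SubElement(envelope, SOAP + "Body")
  body.append(bodyElement)
  return ET.tostring(envelope)

# The hypothetical site-restriction header from the example above:
site = ET.Element("{http://example.org/search}restrictToSite")
site.text = "vambenepe.com"
print buildMessage([site], ET.Element("{http://example.org/search}doSearch"))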

Some believe this is inherently evil and should be stopped, as expressed in this formal objection. I think this is a useful mechanism (to be used rarely and carefully) and I would like to see it survive. But there are two risks associated with this mechanism that people need to understand.

The first risk is that EPRs allow people to trick others into making statements that they don’t know they are making. This is explained in the formal objection from Anish and friends as their problem #1 (“Safety and Security”) and I agree with their description. But I don’t agree with the proposed solutions, as they prevent reference parameters from being treated by the service like any other SOAP header. Back last November I made an alternative proposal, using a wsa:CoverMyRearside element that would not have this drawback, and I know other people have made similar proposals. In any case, this risk can and should be addressed by the working group before the specification becomes a Recommendation, or people will stop agreeing to process reference parameters after a few high-profile hacks. Reference parameters will become the ActiveX of SOAP.

The second risk is more subtle and that one cannot be addressed by the specification. It is the fragility that will result from applications that share too many assumptions. I get suspicious when someone gives me directions to their house with instructions such as “turn left after the blue van” or “turn right after the barking dog”, don’t you? “We’re the house after the green barn” is a little better, but what if I want to re-use these directions a few years later? What’s the chance that the barn will be replaced or repainted? EPRs that contain reference parameters pose the same problem. Once you’ve sent the EPR, you don’t know how long it will be around, you don’t know who it will get forwarded to, you don’t know what the consumer will know. You need to spend at least as much effort picking what data you use as a reference parameter (if anything) as you spend designing schemas and WSDL documents. If your organization is smart enough to have a process to validate schemas (and you need that), that same process should approve any element that is put in a reference parameter.

Or you’ll poke your eye out.

Filed under Everything, Implementation, Security, Standards, Tech

Apollo, Hermes, Muse out of incubation at Apache

Apollo (WS-ResourceProperties open source implementation), Hermes (WS-Notification open source implementation) and Muse (WSDM MUWS open source implementation) are now full Apache projects, out of incubation mode. Congrats Ian and Sal!

Filed under Everything, Implementation

WSDM and WSRF progress in Apache

The Apache Muse and Apollo teams put out their first releases today.

Congratulations to the teams! Looking forward to the interop sessions.

Filed under Everything, Implementation

Globus toolkit in Apache?

As Savas already noted (and commented on), Globus apparently intends to contribute all of its toolkit to Apache (warning, this is a link to a Word doc). This follows an earlier co-submission, between HP and Globus, of the WSRF and WSN implementation. It will be interesting to see how the Web services project at Apache scales up with all these contributions.

Filed under Everything, Implementation

Can you hear the muse?

Check out this proposal. It brings three new incubator projects to the Web services activity in Apache. And they come with working code.

The projects are:

  • Muse, an open source implementation of WSDM MUWS. With existing code contributed by HP.
  • Apollo, an open source implementation of the WS-ResourceFramework (WSRF) specifications. With existing code contributed by HP and Globus.
  • Hermes, an open source implementation of the WS-Notification specifications. With existing code contributed by HP and Globus. You’ll also notice in the description of this project that it includes support for WS-Eventing (but this is not included in the contributed code).

Filed under Everything, Implementation