William Vambenepe's blog

IT management in a changing IT world

Repeatedly facing an object of one's fears may also buy discount online viagra the fight-or-flight response with respect to that stimulus -- e.In the Republic of Georgia, the Eliava Institute of Bacteriophage, drug generic viagraWatson, Italy (Salvador Luria), Switzerland (Alexandre Yersin), viagra online order (Kitasato Shibasaburo), and France (Jean-Martin Charcot, Claude Bernard, Paul Broca and others did significant work.At present, nearly all strains are resistant to penicillin, and many are viagra sale uk to nafcillin, leaving only a narrow selection of drugs such as vancomycin useful for treatment.The quality of the patient-physician relationship is generic viagra review to both parties.

13
Feb
2009

Long-running processes on Google App Engine: it finally works

by William (@vambenepe on Twitter)

I am probably taking things a bit too personally, but I feel like I just successfully guilt-tripped the Google App Engine (GAE) team. Just last night I was complaining that they were teasing me (supporting urllib2, but an older version, without timeout support). And tonight I noticed a new post on their blog, announcing the end of these “High CPU Requests” that have been the bane of my GAE experience.

The reason why I was looking for timeout support in the first place is to avoid generating these dreaded “High CPU Requests” that quickly result in your application being disabled. It’s all explained here and  here.

But now that they are gone, I don’t need timeouts anymore. Around 11:00PM Pacific (on Thursday 2/12) I restarted my application. All it does is create 5 new simple entities in the datastore, one every second (it sleeps for one second between each entry). Then it spawns its successor, which will do the same thing, ad vitam aeternam. The application’s name, “rere”, is short for “request relay”, the pattern used to emulate a long running process. The default page for the app (available here) just returns a list of the 30 last entities created. The point is to illustrate that a single original request spawned an ever-lasting computing task on GAE.

Here is the code:

#!/usr/bin/env python
#
# Copyright 2009 William Vambenepe
#

import wsgiref.handlers
import os
import logging
import time

from google.appengine.ext import db
from google.appengine.ext.webapp import template
from google.appengine.ext import webapp
from google.appengine.api import urlfetch

numberOfHeartbeatsViewed = 30
secondsDurationOfTaskWait = 1
numberOfTasksPerRequest = 5

class HeartBeat(db.Model):
  requestId = db.IntegerProperty()
  date = db.DateTimeProperty(auto_now_add=True)

# The mere existence of an instance of this class in the DB means that the relay has to stop.
class StopExec(db.Model):
  date = db.DateTimeProperty(auto_now_add=True)

class MainHandler(webapp.RequestHandler):
  def get(self):
    hbs = HeartBeat.all().order("-date").fetch(numberOfHeartbeatsViewed)
    template_values = {"hbs": hbs}
    path = os.path.join(os.path.dirname(__file__), "index.html")
    self.response.out.write(template.render(path, template_values))

class StartHandler(webapp.RequestHandler):
  def get(self):
    if (StopExec.all().count() == 0):
      try:
        id = int(self.request.get("id"))
      except (TypeError, ValueError):
        id = 0
      try:
        logging.debug("Request " + str(id) + " launching background task.")
        loopCount = 0
        while(loopCount < numberOfTasksPerRequest):
          hb = HeartBeat()
          hb.requestId = id
          hb.put()
          logging.debug("Request " + str(id) + " saved heartbeat #" + str(loopCount))
          time.sleep(secondsDurationOfTaskWait)
          loopCount = loopCount+1
      finally:
        logging.debug("Launching successor request with id=" + str(id+1))

# This silly back and forth between the two URLs is because of
# "App cannot fetch the same URL as the one used for the request" error.
        if (self.request.url.find("start2") == -1):
          urlfetch.fetch("http://localhost/start2?id=" + str(id+1))
        else:
          urlfetch.fetch("http://localhost/start?id=" + str(id+1))
        logging.debug("Request " + str(id) + " completed")

def main():
  application = webapp.WSGIApplication([("/", MainHandler), ("/start", StartHandler), ("/start2", StartHandler)], debug=True)
  wsgiref.handlers.CGIHandler().run(application)

if __name__ == "__main__":
  main()

One thing I had to change from the earlier version (written using version 1.1.0 of the GAE SDK) is that urlfetch now returns an error if your app tries to invoke itself at the same URL (“App cannot fetch the same URL as the one used for the request”). So I have to alternate between http://localhost/start and http://localhost/start2, both of which are mapped to the same handler. This was added sometimes between SDK 1.1.0 and SDK 1.1.9. If it is aimed at preventing the kind of batton-passing that I am doing, it is pretty unefficient considering how easy it is to circumvent.

It is now 1:02AM Pacific the next day (Friday 2/13) and the process is still progressing, based on the single HTTP request I sent to it at 11:00PM the previous evening. The result page currently returns:

  • From request # 1056, with date Fri, 13 Feb 2009 09:02:15 +0000
  • From request # 1056, with date Fri, 13 Feb 2009 09:02:14 +0000
  • From request # 1056, with date Fri, 13 Feb 2009 09:02:13 +0000
  • From request # 1056, with date Fri, 13 Feb 2009 09:02:12 +0000
  • From request # 1056, with date Fri, 13 Feb 2009 09:02:11 +0000
  • From request # 1055, with date Fri, 13 Feb 2009 09:02:10 +0000
  • From request # 1055, with date Fri, 13 Feb 2009 09:02:09 +0000
  • From request # 1055, with date Fri, 13 Feb 2009 09:02:08 +0000
  • From request # 1055, with date Fri, 13 Feb 2009 09:02:07 +0000
  • From request # 1055, with date Fri, 13 Feb 2009 09:02:06 +0000
  • From request # 1054, with date Fri, 13 Feb 2009 09:02:05 +0000
  • From request # 1054, with date Fri, 13 Feb 2009 09:02:04 +0000
  • From request # 1054, with date Fri, 13 Feb 2009 09:02:03 +0000
  • From request # 1054, with date Fri, 13 Feb 2009 09:02:02 +0000
  • From request # 1054, with date Fri, 13 Feb 2009 09:02:01 +0000
  • From request # 1053, with date Fri, 13 Feb 2009 09:01:59 +0000
  • From request # 1053, with date Fri, 13 Feb 2009 09:01:58 +0000

Which shows that 1056 successive requests have participated in the relay (the last one just happened, at 09:02:15 UTC which is 1:02AM Pacific).

Hopefully it will still be running when I wake up tomorrow.

[UPDATED 2009/2/13, 9:08AM Pacific: It's alive!

  • From request # 6411, with date Fri, 13 Feb 2009 17:08:45 +0000
  • From request # 6410, with date Fri, 13 Feb 2009 17:08:44 +0000
  • From request # 6410, with date Fri, 13 Feb 2009 17:08:43 +0000
  • From request # 6410, with date Fri, 13 Feb 2009 17:08:42 +0000
  • From request # 6410, with date Fri, 13 Feb 2009 17:08:41 +0000
  • From request # 6410, with date Fri, 13 Feb 2009 17:08:40 +0000
  • From request # 6409, with date Fri, 13 Feb 2009 17:08:39 +0000

BTW, the code provided uses localhost to run on my local machine. The version uploaded to Google of course replaces this with rere.appspot.com.]

[UPDATED 2009/5/1: For some reason this entry is attracting a lot of comment spam, so I am disabling comments. Contact me if you'd like to comment.]

Related posts:

  1. Emulating a long-running process (and a scheduler) in Google App Engine
  2. Some breathing room for Google App Engine requests
  3. Google App Engine is teasing me
  4. Google App Engine: less is more
  5. Is Business Process Execution the killer app for PaaS?
  6. Cloud APIs need to be complemented by Cloud processes
AddThis Social Bookmark Button Follow @vambenepe on Twitter.

4 Responses to “Long-running processes on Google App Engine: it finally works”

  1. Cloud Droplets #59 - Cloudveland | IT Management and Cloud Blog Says:

    [...] GAE and Long Running Processes [...]

  2. Joe Gregorio Says:

    The restriction on requesting the same URI is there to prevent accidental infinite loops, not intentional ones.

    -joe

  3. William Vambenepe Says:

    Thanks for the explanation Joe. Maybe the urlfetch API could have an optional “yes I really mean to invoke myself” parameter then.

  4. ray malone Says:

    Thank you for this. I’m using this on my site. The process has been running for several days now without flaw. The only change i made was to Try: the Fetch of the url since it times out and fills up your error log. Other than that this worked really well. Thanks