As promised to Felix, here is the code that shows how to give extra breathing room to Google App Engine (GAE) requests that might otherwise be killed for taking too long to complete. The approach is similar to the one previously described, but rather than trying to emulate a long-running process, I simply allow a request to spread its work over a handful of invocations, thus getting several 9-second slots (since that seems to be how much time GAE currently gives you per request).
If all your requests need this, you are going to run into the same “high CPU requests have a small quota, and if you exceed this quota, your app will be temporarily disabled” problem seen in the previous experiment. But if 90% of your requests complete in a normal time and only 10% need more, then this approach can prevent your users from getting an error on 1 out of every 10 requests, and you should fly under the radar of the GAE resource cop.
The way it works is simple: if your request is interrupted for having run too long, the client gets a redirect to a new instance of the same handler. Because the code saves its results incrementally in the datastore, the new instance can build on the work of the previous one.
This specific example retrieves the ubuntu-8.04-server-i386.jigdo file (98K) from a handful of Ubuntu mirrors and returns the average/min/max download times (without checking whether the transfer was successful). I also had to add a 1-second sleep after each fetch in order to trigger the DeadlineExceededError, because the fetch operations complete too quickly when running on GAE rather than on my machine (I guess Google has better connectivity than my mediocre AT&T-provided DSL line, who would have thought).
#!/usr/bin/env python
#
# Copyright 2008 William Vambenepe
#

import wsgiref.handlers
import os
import logging
import time

from google.appengine.ext import db
from google.appengine.ext.webapp import template
from google.appengine.ext import webapp
from google.appengine.api import urlfetch
from google.appengine.runtime import DeadlineExceededError

targetUrls = ["http://mirror.anl.gov/pub/ubuntu-iso/CDs/hardy/ubuntu-8.04-server-i386.jigdo",
              "http://ubuntu.mirror.ac.za/ubuntu-release/hardy/ubuntu-8.04-server-i386.jigdo",
              "http://mirrors.cytanet.com.cy/linux/ubuntu/releases/hardy/ubuntu-8.04-server-i386.jigdo",
              "http://ftp.kaist.ac.kr/pub/ubuntu-cd/hardy/ubuntu-8.04-server-i386.jigdo",
              "http://ftp.itu.edu.tr/Mirror/Ubuntu/hardy/ubuntu-8.04-server-i386.jigdo",
              "http://ftp.belnet.be/mirror/ubuntu.com/releases/hardy/ubuntu-8.04-server-i386.jigdo",
              "http://ubuntu-releases.sh.cvut.cz/hardy/ubuntu-8.04-server-i386.jigdo",
              "http://ftp.crihan.fr/releases/hardy/ubuntu-8.04-server-i386.jigdo",
              "http://ftp.uni-kl.de/pub/linux/ubuntu.iso/hardy/ubuntu-8.04-server-i386.jigdo",
              "http://ftp.duth.gr/pub/ubuntu-releases/hardy/ubuntu-8.04-server-i386.jigdo",
              "http://no.releases.ubuntu.com/hardy/ubuntu-8.04-server-i386.jigdo",
              "http://neacm.fe.up.pt/pub/ubuntu-releases/hardy/ubuntu-8.04-server-i386.jigdo"]

class MeasurementSet(db.Model):
    iteration = db.IntegerProperty()
    measurements = db.ListProperty(float)

class MainHandler(webapp.RequestHandler):
    def get(self):
        try:
            key = self.request.get("key")
            set = MeasurementSet.get(key)
            if set is None:
                raise ValueError
            set.iteration = set.iteration + 1
            set.put()
            logging.debug("Resuming existing set, with key " + str(key))
        except:
            set = MeasurementSet()
            set.iteration = 1
            set.measurements = []
            set.put()
            logging.debug("Starting new set, with key " + str(set.key()))
        try:
            # Dereference remaining URLs
            for target in targetUrls[len(set.measurements):]:
                startTime = time.time()
                urlfetch.fetch(target)
                timeElapsed = time.time() - startTime
                time.sleep(1)
                logging.debug(target + " dereferenced in " + str(timeElapsed) + " sec")
                set.measurements.append(timeElapsed)
                set.put()
            # We're done dereferencing URLs, let's publish the results
            longestIndex = 0
            shortestIndex = 0
            totalTime = set.measurements[0]
            for i in range(1, len(targetUrls)):
                totalTime = totalTime + set.measurements[i]
                if set.measurements[i] < set.measurements[shortestIndex]:
                    shortestIndex = i
                elif set.measurements[i] > set.measurements[longestIndex]:
                    longestIndex = i
            template_values = {"iterations": set.iteration,
                               "longestTime": set.measurements[longestIndex],
                               "longestTarget": targetUrls[longestIndex],
                               "shortestTime": set.measurements[shortestIndex],
                               "shortestTarget": targetUrls[shortestIndex],
                               "average": totalTime / len(targetUrls)}
            path = os.path.join(os.path.dirname(__file__), "steps.html")
            self.response.out.write(template.render(path, template_values))
            logging.debug("Set with key " + str(set.key()) + " has returned")
        except DeadlineExceededError:
            logging.debug("Set with key " + str(set.key())
                          + " interrupted during iteration " + str(set.iteration)
                          + " with " + str(len(set.measurements)) + " URLs retrieved")
            self.redirect("/steps?key=" + str(set.key()))
            logging.debug("Set with key " + str(set.key()) + " sent redirection")

def main():
    application = webapp.WSGIApplication([("/steps", MainHandler)], debug=True)
    wsgiref.handlers.CGIHandler().run(application)

if __name__ == "__main__":
    main()
I can’t guarantee I will keep it available, but at the time of this writing the application is deployed here if you want to give it a spin. A typical run produces this kind of log:
06-13 01:44AM 36.814 /steps
XX.XX.XX.XX - - [13/06/2008:01:44:45 -0700] "GET /steps HTTP/1.1" 302 0 - -
D 06-13 01:44AM 36.847 Starting new set, with key agN2YnByFAsSDk1lYXN1cmVtZW50U2V0GBoM
D 06-13 01:44AM 37.870 http://mirror.anl.gov/pub/ubuntu-iso/CDs/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.022078037262 sec
D 06-13 01:44AM 38.913 http://ubuntu.mirror.ac.za/ubuntu-release/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.0184168815613 sec
D 06-13 01:44AM 39.962 http://mirrors.cytanet.com.cy/linux/ubuntu/releases/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.0166189670563 sec
D 06-13 01:44AM 41.12 http://ftp.kaist.ac.kr/pub/ubuntu-cd/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.0205371379852 sec
D 06-13 01:44AM 42.103 http://ftp.itu.edu.tr/Mirror/Ubuntu/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.0197179317474 sec
D 06-13 01:44AM 43.146 http://ftp.belnet.be/mirror/ubuntu.com/releases/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.0171189308167 sec
D 06-13 01:44AM 44.215 http://ubuntu-releases.sh.cvut.cz/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.0160200595856 sec
D 06-13 01:44AM 45.256 http://ftp.crihan.fr/releases/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.015625 sec
D 06-13 01:44AM 45.805 Set with key agN2YnByFAsSDk1lYXN1cmVtZW50U2V0GBoM interrupted during iteration 1 with 8 URLs retrieved
D 06-13 01:44AM 45.806 Set with key agN2YnByFAsSDk1lYXN1cmVtZW50U2V0GBoM sent redirection
W 06-13 01:44AM 45.808 This request used a high amount of CPU, and was roughly 28.5 times over the average request CPU limit. High CPU requests have a small quota, and if you exceed this quota, your app will be temporarily disabled.
Followed by:
06-13 01:44AM 46.72 /steps?key=agN2YnByFAsSDk1lYXN1cmVtZW50U2V0GBoM
XX.XX.XX.XX - - [13/06/2008:01:44:50 -0700] "GET /steps?key=agN2YnByFAsSDk1lYXN1cmVtZW50U2V0GBoM HTTP/1.1" 200 472
D 06-13 01:44AM 46.110 Resuming existing set, with key agN2YnByFAsSDk1lYXN1cmVtZW50U2V0GBoM
D 06-13 01:44AM 47.128 http://ftp.uni-kl.de/pub/linux/ubuntu.iso/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.016991853714 sec
D 06-13 01:44AM 48.177 http://ftp.duth.gr/pub/ubuntu-releases/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.0238039493561 sec
D 06-13 01:44AM 49.318 http://no.releases.ubuntu.com/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.0177929401398 sec
D 06-13 01:44AM 50.378 http://neacm.fe.up.pt/pub/ubuntu-releases/hardy/ubuntu-8.04-server-i386.jigdo dereferenced in 0.0226020812988 sec
D 06-13 01:44AM 50.410 Set with key agN2YnByFAsSDk1lYXN1cmVtZW50U2V0GBoM has returned
W 06-13 01:44AM 50.413 This request used a high amount of CPU, and was roughly 13.4 times over the average request CPU limit. High CPU requests have a small quota, and if you exceed this quota, your app will be temporarily disabled.
I believe performance could be improved by taking advantage of the fact that successive requests are likely (but not guaranteed) to hit the same runtime instance, allowing global variables to be reused rather than always going to the datastore. My code is a proof of concept, not an optimized implementation.
Of course, the alternative is to drive things from the client, using JavaScript HTTP requests (rather than an HTTP redirect) to repeat the request until the work has completed. The list of pros and cons of each approach is left as an exercise for the reader.
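To make that client-driven loop concrete, here is a purely illustrative simulation of it, with both the handler and the HTTP round-trip faked in-process (make_worker and poll_until_done are hypothetical names, and the timed fetch is replaced by a trivial stand-in computation):

```python
def make_worker(urls, batch_size):
    """Simulate a handler that can only process batch_size URLs per request
    before its deadline, saving state between 'requests'."""
    state = {"done": 0, "results": []}
    def handle_request():
        # Each simulated request processes at most batch_size more URLs.
        for url in urls[state["done"]:state["done"] + batch_size]:
            state["results"].append(len(url))  # stand-in for a timed fetch
            state["done"] += 1
        return {"complete": state["done"] == len(urls),
                "results": list(state["results"])}
    return handle_request

def poll_until_done(handle_request, max_polls=10):
    """The client-side loop: re-issue the request until the work is complete.
    In a real page this would be a JavaScript XMLHttpRequest loop."""
    for _ in range(max_polls):
        status = handle_request()
        if status["complete"]:
            return status["results"]
    raise RuntimeError("work did not complete within max_polls requests")
```

The shape is the same as the redirect version, except that the retry decision lives in the client instead of in the DeadlineExceededError handler on the server.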
[UPDATED 2008/6/13: Added log output. Removed handling of “OverQuotaError” which was not useful since, unlike “DeadlineExceededError”, quotas are not per-request. As a result, splitting the work over multiple requests doesn’t help. Slowing down a request might help, at which point the approach above might come in handy to prevent this slowdown from triggering “DeadlineExceededError”.]
[UPDATED 2008/6/30: Steve Jones provides an interesting analysis of the cut-off time for GAE. Confirms that it’s mainly based on wall-clock time rather than CPU time. And that you can sometimes go just over 9 seconds but never up to 10 seconds, which is consistent with my (much less detailed and rigorous) observations.]