GAE Traffic Splitting

Interesting addition to the Google App Engine (GAE) platform in release 1.6.3: Traffic Splitting lets you run several versions of your application (using a DNS sub-domain for each version) and choose to direct a certain percentage of requests to a specific version. This lets you, among other things, slowly phase in your updates and test the result on a small set of users.
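
To make the idea concrete, here is a minimal sketch of percentage-based splitting. It is not how Google implements the feature (I don't know those details); the function name `pick_version`, the client key and the version labels are all made up for illustration. The sketch hashes a stable client identifier into one of 100 buckets, so a given client keeps landing on the same version.

```python
import hashlib


def pick_version(client_key, splits):
    """Return the version that should serve the client identified by client_key.

    splits is a list of (version, percent) pairs whose percents sum to 100.
    All names here are hypothetical; this is not a GAE API.
    """
    total = sum(percent for _, percent in splits)
    if total != 100:
        raise ValueError("percentages must sum to 100, got %r" % total)

    # Hash the key into a stable bucket in [0, 100) so that the same
    # client is always routed to the same version.
    digest = hashlib.sha256(client_key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100

    # Walk the cumulative percentages until the bucket falls in a range.
    upper = 0
    for version, percent in splits:
        upper += percent
        if bucket < upper:
            return version
    raise AssertionError("unreachable when percentages sum to 100")
```

With splits of `[("v1", 95), ("v2", 5)]`, roughly one client in twenty ends up on "v2", and each client stays on the version it was first assigned.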

That’s nice, but until I read the documentation for the feature I had assumed (and hoped) it was something else.

Rather than using traffic splitting to test different versions of my app (something which the platform now makes convenient but which I could have implemented on my own), it would be nice if that mechanism could be used to test updates to the GAE platform itself. As described in “Come for the PaaS Functional Model, stay for the Cloud Operational Model”, it’s wishful thinking to assume that changes to the PaaS platform (an update applied by Google admins) cannot have a negative effect on your application. In other words, “When NoOps meets Murphy’s Law, my money is on Murphy”.

What would be nice is if Google could give application owners advance warning of a platform change and let them use the Traffic Splitting feature to direct a portion of the incoming requests to application instances running on the new platform. It would also help to have a way to include the platform version in all log messages.
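
On the logging side, this is roughly what I have in mind, sketched with Python's standard `logging` module. The big assumption is that the platform exposes its version to the application somewhere; here I pretend it is in an environment variable called `PLATFORM_VERSION`, which is a name I made up. `PlatformVersionFilter` and `make_logger` are likewise hypothetical helpers, not part of any GAE SDK.

```python
import logging
import os


class PlatformVersionFilter(logging.Filter):
    """Attach the platform version to every log record.

    Assumes the version is available in an environment variable;
    the default name PLATFORM_VERSION is hypothetical.
    """

    def __init__(self, env_var="PLATFORM_VERSION"):
        super().__init__()
        self.env_var = env_var

    def filter(self, record):
        # Read the variable on every record so that a change of platform
        # version shows up in the logs as soon as it takes effect.
        record.platform_version = os.environ.get(self.env_var, "unknown")
        return True  # never drop the record, only annotate it


def make_logger(name, stream):
    """Build a logger whose lines are prefixed with the platform version."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "[platform=%(platform_version)s] %(levelname)s %(message)s"))
    handler.addFilter(PlatformVersionFilter())
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate lines from the root logger
    return logger
```

With every line tagged this way, a spike in errors can be lined up against the platform version that served the request, even if the update has only reached some of the instances.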

Here’s the issue as I described it in the aforementioned “Cloud Operational Model” post:

In other words, if a patch or update is worth testing in a staging environment if you were to apply it on-premise, what makes you think that it’s less likely to cause a problem if it’s the Cloud provider who rolls it out? Sure, in most cases it will work just fine and you can sing the praise of “NoOps”. Until the day when things go wrong, your users are affected and you’re taken completely off-guard. Good luck debugging that problem, when you don’t even know that an infrastructure change is being rolled out and when it might not even have been rolled out uniformly across all instances of your application.

How is that handled in your provider’s Operational Model? Do you have visibility into the change schedule? Do you have the option to test your application on the new infrastructure or to at least influence in any way how and when the change gets rolled out to your instances?

Hopefully, the addition of Traffic Splitting to Google App Engine is a step towards addressing that issue.



2 Responses to GAE Traffic Splitting

  1. Peter Knego

    GAE is exceptionally backwards compatible: they never change existing APIs (just add new ones for new features). So by design they never break your code, they just upgrade the service under the hood, but keep it backwards compatible. In my three years of using GAE, it has never happened that an update broke existing code.

    Did it ever happen to you that an upgrade to GAE broke your code? I think you are whining about a problem that does not exist.

    Also, if they had multiple versions of services, where users could upgrade at their leisure, they would end up with a maintenance nightmare, as some customers would never upgrade.

  2. @vambenepe

    Hi Peter,

    Bad timing for your comment that “GAE is exceptionally backwards compatible: they never change existing APIs”. Just today they announced that they are deprecating the Master/Slave Datastore and that they “strongly encourage you to migrate all your applications as soon as possible”.

    That being said, I agree with your larger point. This is a pretty rare event for GAE and indeed they rarely take features away from the runtime contract or change them in a way that requires a code change.

    But that’s not the issue I’m worried about in this post. It’s the backward-compatible changes I’m worried about. Because really they should be called “theoretically backward-compatible” changes, until they’ve been tested against my specific app. That’s what I’m asking for.

    And I agree that you can’t let app owners delay platform upgrades forever. At some point, you need to force them to move. But it would be nice to give them a time window during which they can test, so they can report bugs to you before they are forced to jump.