Category Archives: Google Cloud Platform

Big Data in the Cloud at Google I/O

Last week was a great party for the entire Google developer family, including Google Cloud Platform. And within the Cloud Platform, Big Data processing services. Which is where my focus has been in the almost two years I’ve been at Google.

It started with a bang, when our fearless leader Urs unveiled Cloud Dataflow in the keynote. Supported by a very timely demo (streaming analytics for a World Cup game) by my colleague Eric.

After the keynote, we had three live sessions:

In “Big Data, the Cloud Way“, I gave an overview of the main large-scale data processing services on Google Cloud:

  • Cloud Pub/Sub, a newly-announced service which provides reliable, many-to-many, asynchronous messaging,
  • the aforementioned Cloud Dataflow, to implement data processing pipelines which can run either in streaming or batch mode,
  • BigQuery, an existing service for large-scale SQL-based data processing at interactive speed, and
  • support for Hadoop and Spark, making it very easy to deploy and use them “the Cloud Way”, well integrated with other storage and processing services of Google Cloud Platform.

The next day, in “The Dawn of Fast Data“, Marwa and Reuven described Cloud Dataflow in a lot more details, including code samples. They showed how to easily construct a streaming pipeline which keeps a constantly-updated lookup table of most popular Twitter hashtags for a given prefix. They also explained how Cloud Dataflow builds on over a decade of data processing innovation at Google to optimize processing pipelines and free users from the burden of deploying, configuring, tuning and managing the needed infrastructure. Just like Cloud Pub/Sub and BigQuery do for event handling and SQL analytics, respectively.

Later that afternoon, Felipe and Jordan showed how to build predictive models in “Predicting the future with the Google Cloud Platform“.

We had also prepared some recorded short presentations. To learn more about how easy and efficient it is to use Hadoop and Spark on Google Cloud Platform, you should listen to Dennis in “Open Source Data Analytics“. To learn more about block storage options (including SSD, both local and remote), listen to Jay in “Optimizing disk I/O in the cloud“.

It was gratifying to see well-informed people recognize the importance of these announcement and partners understand how this will benefit their customers. As well as some good press coverage.

It’s liberating to now be able to talk freely about recent progress on our quest to equip Google Cloud users with easy to use data processing tools. Everyone can benefit from Google’s experience making developers productive while efficiently processing data at large scale. With great power comes great productivity.

1 Comment

Filed under Big Data, BigQuery, Cloud Computing, Cloud Dataflow, Everything, Google Cloud Platform, Implementation, Open source, Query, Spark, Tech, Utility computing

Joining Google

Next Monday, I will start at Google, in the Cloud Platform team.

I’ve been watching that platform, and especially Google App Engine (GAE), since it started in 2008. It shaped my thoughts on Cloud Computing and on the tension between PaaS and IaaS. In my first post about GAE, 4.5 years ago, I wrote about that tension:

History is rarely kind to promoters of radical departures. The software industry is especially fond of layering the new on top of the old (a practice that has been enabled by the constant increase in underlying computing capacity). If you are wondering why your command prompt, shell terminal or text editor opens with a default width of 80 characters, take a trip back to 1928, when IBM defined its 80-columns punch card format. Will Google beat the odds or be forced to be more accommodating of existing code?

This debate (which I later characterized as “backward-compatible vs. forward-compatible”) is still ongoing. App Engine has grown a lot and shed its early limitations (I had a lot of fun trying to engineer around them in the early days). Google’s Cloud Platform today is also a lot more than App Engine, with Cloud Storage, Compute Engine, etc. It’s much more welcoming to existing applications.

The core question remains, however. How far, and how quickly will we move from the abstractions inherited from seeing the physical server as the natural unit of computation? What benefits will we derive from this transformation and will they make it worthwhile? Where’s the next point of equilibrium in the storm provoked by these shifts:

  • IT management technology was ripe for a change, applying to itself the automation capabilities that it had brought to other domains.
  • Software platforms were ripe for a change, as we keep discovering all the Web can be, all the data we can handle, and how best to take advantage of both.
  • The business of IT was ripe for a change, having grown too important to escape scrutiny of its inefficiency and sluggishness.

These three transformations didn’t have to take place at the same time. But they are, which leaves us with a fascinating multi-variable equation to optimize. I believe Google is the right place to crack this nut.

This is my view today, looking at the larger Cloud environment and observing Google’s Compute Platform from the outside. In a week’s time, I’ll be looking at it from the inside. October me may scoff at the naïveté of September me; or not. Either way, I’m looking forward to it.


Filed under Cloud Computing, Everything, Google, Google App Engine, Google Cloud Platform, People, Uncategorized, Utility computing

Google Compute Engine, the compete engine

Google is going to give Amazon AWS a run for its money. It’s the right move for Google and great news for everyone.

But that wasn’t plan A. Google was way ahead of everybody with a PaaS solution, Google App Engine, which was the embodiment of “forward compatibility” (rather than “backward compatibility”). I’m pretty sure that the plan, when they launched GAE in 2008, didn’t include “and in 2012 we’ll start offering raw VMs”. But GAE (and PaaS in general), while it made some inroads, failed to generate the level of adoption that many of us expected. Google smartly understood that they had to adjust.

“2012 will be the year of PaaS” returns 2,510 search results on Google, while “2012 will be the year of IaaS” returns only 2 results, both of which relate to a quote by Randy Bias which actually expresses quite a different feeling when read in full: “2012 will be the year of IaaS cloud failures”. We all got it wrong about the inexorable rise of PaaS in 2012.

But saying that, in 2012, IaaS still dominates PaaS, while not wrong, is an oversimplification.

At a more fine-grained level, Google Compute Engine is just another proof that the distinction between IaaS and PaaS was always artificial. The idea that you deploy your applications either at the IaaS or at the PaaS level was a fallacy. There is a continuum of application services, including VMs, various forms of storage, various levels of routing, various flavors of code hosting, various API-centric utility functions, etc. You can call one end of the spectrum “IaaS” and the other end “PaaS”, but most Cloud applications live in the continuum, not at either end. Amazon started from the left and moved to the right, Google is doing the opposite. Amazon’s initial approach was more successful at generating adoption. But it’s still early in the game.

As a side note, this is going to be a challenge for the Cloud Foundry ecosystem. To play in that league, Cloud Foundry has to either find a way to cover the full IaaS-to-PaaS continuum or it needs to efficiently integrate with more IaaS-centric Cloud frameworks. That will be a technical challenge, and also a political one. Or Cloud Foundry needs to define a separate space for itself. For example in Clouds which are centered around a strong SaaS offering and mainly work at higher levels of abstraction.

A few more thoughts:

  • If people still had lingering doubts about whether Google is serious about being a Cloud provider, the addition of Google Compute Engine (and, earlier, Google Cloud Storage) should put those to rest.
  • Here comes yet-another-IaaS API. And potentially a major one.
  • It’s quite a testament to what Linux has achieved that Google Compute Engine is Linux-only and nobody even bats an eye.
  • In the end, this may well turn into a battle of marketplaces more than a battle of Cloud environment. Just like in mobile.


Filed under Amazon, Cloud Computing, Everything, Google, Google App Engine, Google Cloud Platform, Utility computing