Integration patterns for social data: the Open Social Data Bus

The previous entry, “Don’t tell Facebook what you like, tell Twitter”, used Twitter and Facebook as examples to illustrate a general point about the integration of social profile data. Unfortunately, the examples may have overshadowed the larger point. In that post, I didn’t consider Twitter as a social network but as a message conduit. Most people, on the other hand, think of Twitter as a social network (after all, which Twitterer is not watching his/her follower count?) and could come away with the impression that I was just saying that Twitter is a better social network than Facebook. That wasn’t my point.

The main point is about defining the right integration pattern for social data: is it a “message bus” pattern or a “shared database” pattern? For readers who haven’t had the joy of dealing with integration architecture and enterprise integration patterns, here is a one-paragraph primer:

The expense report application in a company needs to be in sync with the data in the HR system, so that an expense report can be sent to the right manager for review/approval. Implementing such application integration in an efficient, resilient and flexible way is hard. Battle-tested approaches (high-level “patterns”) have emerged that have been successful, in the right context. Architects have learned that 99% of the time they are better off asking themselves which of these enterprise integration patterns is right for their problem, rather than trying to invent a new approach. Two of the most common basic patterns are the “shared database” pattern and the “message bus” pattern. In the “shared database” pattern, all the applications read and write to the same repository. In the “message bus” pattern, applications post messages on a shared channel (the “bus”) and also listen on the channel for messages from other applications that they are interested in. It’s similar to a radio channel of the kind used by police and ham radio operators.

(diagrams by Hohpe/Woolf, under cc license)
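
To make the contrast concrete, here is a minimal sketch in Python of the two patterns, using the expense report/HR example from the primer. The class names (`SharedDatabase`, `MessageBus`), the message fields and the employee names are all made up for illustration; real integration middleware adds persistence, security and delivery guarantees that are ignored here.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]
Handler = Callable[[Message], None]


class SharedDatabase:
    """Shared database pattern: every application reads and writes one repository."""

    def __init__(self) -> None:
        self.records: Dict[str, Dict[str, str]] = defaultdict(dict)

    def write(self, user: str, field: str, value: str) -> None:
        self.records[user][field] = value

    def read(self, user: str, field: str) -> Optional[str]:
        return self.records[user].get(field)


class MessageBus:
    """Message bus pattern: applications post to a shared channel and listen on it."""

    def __init__(self) -> None:
        self.subscribers: List[Handler] = []

    def subscribe(self, handler: Handler) -> None:
        self.subscribers.append(handler)

    def publish(self, message: Message) -> None:
        # Every subscriber receives every message, like a radio channel.
        for handler in self.subscribers:
            handler(message)


# Shared database: the expense app looks up what the HR system stored.
db = SharedDatabase()
db.write("alice", "manager", "bob")             # written by the HR system
approver_from_db = db.read("alice", "manager")  # read by the expense app

# Message bus: HR announces the change and the expense app keeps its own copy.
bus = MessageBus()
expense_app_managers: Dict[str, str] = {}


def expense_app_listener(message: Message) -> None:
    # The expense app only reacts to the messages it is interested in.
    if message["type"] == "manager_changed":
        expense_app_managers[message["employee"]] = message["manager"]


seen_by_observer: List[Message] = []
bus.subscribe(expense_app_listener)
bus.subscribe(seen_by_observer.append)  # anyone on the channel can watch the traffic
bus.publish({"type": "manager_changed", "employee": "alice", "manager": "bob"})
```

In the bus version, the HR system never calls the expense app directly, and a second listener can record everything that goes by without either application being changed.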

Facebook wants your social data to be shared across sites and applications using the “shared database” pattern, in which Facebook is the central database (and also the primary application). What I described in the previous post was the use of a “message bus” pattern (in which Twitter was used as the bus).

A bus has the following advantages when applied to the problem of sharing social data:

  • All applications have equal access
  • The applications are loosely-coupled, meaning that changing one doesn’t break the others
  • If applications only communicate via the bus, you get to observe the data shared about you
  • It can scale well

There are lots of interesting considerations about how to build and operate such a bus: security, scalability, access protocols, payload format, etc. But they are secondary to the choice of the integration pattern. For the sake of illustration, Twitter’s approach to security is OAuth, their scalable architecture is described here, the access protocols here and the payload format here. Reasonable alternatives exist for all these functions.

It’s hard for me to imagine the content of the messages on this bus not resembling RDF-like subject/verb/object triplets, in which the subject is implicit (the user attached to the message). The verbs could be simple strings or represented by URIs and have an associated taxonomy. And as in RDF, the objects should be either URIs or simple values (mostly strings, of a limited size, be it 140 characters or something else). Possible examples (the subject is implicit, the verb is in square brackets):

[say] I just had coffeecake for breakfast
[like] http://www.hobees.com/
[location] http://www.hobees.com/redwood.html
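
As a rough sketch of how an application listening on the bus might read such messages, the Python below turns each one into a subject/verb/object triple. The `[verb] object` syntax is only the notation used in the examples above, not an existing format, and the `Triple` type, the `parse_message` function and the `william` handle are hypothetical names introduced for illustration.

```python
import re
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str         # implicit in the message: the user who posted it
    verb: str            # the part in square brackets
    obj: str             # a URI or a simple string value
    object_is_uri: bool


# A verb in square brackets, some whitespace, then the object.
_MESSAGE = re.compile(r"^\[(?P<verb>[^\]\s]+)\]\s+(?P<obj>.+)$")


def parse_message(author: str, text: str) -> Optional[Triple]:
    """Return a triple, or None if the text does not follow the [verb] convention."""
    match = _MESSAGE.match(text.strip())
    if match is None:
        return None
    obj = match.group("obj").strip()
    # Simplifying assumption: only http(s) URLs count as URIs here.
    is_uri = obj.startswith(("http://", "https://"))
    return Triple(author, match.group("verb"), obj, is_uri)


examples = [
    "[say] I just had coffeecake for breakfast",
    "[like] http://www.hobees.com/",
    "[location] http://www.hobees.com/redwood.html",
]
triples = [parse_message("william", text) for text in examples]
```

Messages that don’t follow the convention come back as `None`, so ordinary human chatter on the same channel can simply be skipped by applications that only care about structured statements.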

I still think Twitter is the most practical implementation of the Open Social Data Bus, for reasons I listed before:

  • It’s here today
  • It’s open and makes no pretense of (often violated) “privacy settings”
  • It can scale (give or take some growing pains and some still-drastic quota restrictions)
  • It has a delegated authorization model (though not quite as fine-grained as I’d like)
  • It already has a large ecosystem of provider/consumer applications
  • Humans look at the messages, ensuring that any integration of personal data will remain at a human scale and therefore controllable
  • It has proven to be a very successful environment for semantic tags to emerge spontaneously
  • It is persisted by many actors, including Google, Bing and the Library of Congress
  • Did I mention that it’s here today?

I remember discussions, in the early-to-mid-nineties, about whether the Internet, this quirky but fast-growing network, would turn into the expected global “information superhighway” or whether a superior one would have to emerge. This might seem like a silly discussion today but it wasn’t so obvious at the time. Wondering whether Twitter will turn out to be the Open Social Data Bus will seem just as silly in 15 years, though I don’t know if it will be deemed silly because the answer was obviously “no” or obviously “yes”…

The tension between Twitter as an infrastructure provider and Twitter as a competitor in the Twitter app marketplace is well-known. The company understands that what makes them different from other social networks is the ecosystem of applications that was enabled by this “message bus” pattern. Which is why, even as they announced that they were going to create their own applications to tap into the stream, they took pains to explain that they would be calling the same interfaces as everybody else.

On the other hand, Twitter obviously also needs to worry about making money. If their service becomes a low-level service, invisible to users (almost like DNS), then who is going to pay for the operations? Especially since the expectations on Twitter are currently so high that a “normal” rate of profit on operating such an infrastructure would be a huge letdown for investors. But this is not a post about the business prospects and strategic challenges of Twitter. It’s about allowing integration of social profile data in a way that benefits users.

I’d be fine with some other Open Social Data Bus implementation taking over and serving this need, as long as it fulfills the key requirements of being equally open to all applications and allowing individuals to control what gets posted about them. There are other avenues if Twitter cannot (or doesn’t want to) play this role. As the DNS example shows, it doesn’t necessarily have to be operated by a single operator. And there are a variety of funding models for such essential infrastructure (see “who funds root name server operations?” in the DNS root name servers FAQ). Alternatively, applications might be charged based on how much data they get from the bus.

Corporate support can take different forms. From wireless frequencies to wi-fi networks to DNS to supporting Firefox, Google has shown a willingness to support the development and operation of the internet infrastructure, confident that they’ll be in the best position to benefit from it. Especially if the alternative is what Pete Cashmore describes as “Google’s nightmare”.

You could even think of this service eventually falling under the “common carrier” model, with the corresponding legal constraints. Especially in societies that are more privacy-aware.

I don’t know what the right business/operating model is for the Open Social Data Bus. What I know is that it’s how I want my social profile data to flow between applications.

[UPDATED 2010/5/20: Some supporting evidence for my recollection of “discussions, in the early-to-mid-nineties, about whether the Internet, this quirky but fast-growing network, would turn into the expected global ‘information superhighway’ or whether a superior one would have to emerge”:

Gates’s 286-page book [The Road Ahead, 1995] mentions the World Wide Web on only four of its pages, and portrays the Internet as a subset of a much larger “Information Superhighway.” The Internet, wrote Gates, is one of “the important precursors of the information highway,” along with PCs, CD-ROMs, phone networks, and cable systems, but “none represents the actual information highway. … today’s Internet is not the information highway I imagine, although you can think of it as the beginning of the highway.”]

4 Responses to Integration patterns for social data: the Open Social Data Bus

  1. Pingback: William Vambenepe — Don’t tell Facebook what you like, tell Twitter

  2. I liked this post a lot. Enterprise integration meets telecom policy.

    My big reaction is that it is a little bit implied in here that Facebook wants to be a database for nefarious reasons of business advantage, and that a message bus is ultimately better for consumers. I’m not sure.

    Common carrier regulations came about because it was thought that what you call the message bus architecture was inherently worse and more dangerous for consumers. With a database architecture you can theoretically change providers. You could say that if there is some customer right to export your data and a common interchange format, it looks good for consumers. Then, if Facebook sux, go back to Orkut!

    But common carrier rules came about because what you here call the message bus architecture was extremely tempting for the bus provider because it often didn’t make sense to have multiple competing message buses. When the message bus was the railroad, there was a great temptation for the railroad barons to snoop on the contents of the message bus and strike horrible side deals. I’ll deliver your produce before it spoils if you give me a kickback. Or, I see that you are competing with someone who gives me a kickback so I’ll slow down your messages. It might be to the customer’s advantage to force the bus to be dumber (as common carrier rules sometimes do — restricting the bus driver from looking in the messages and restricting the bus driver from owning side businesses that depend on preferential access to the bus). But this has to be forced.

    So would you rather have:
    – an application/database single store architecture with multiple competing application/database providers (Facebook, MySpace, Friendster, Orkut, …) with an agreed-upon right to export your data and a format to transfer it between providers.

    OR

    – a message bus architecture with an evil bus operator and no regulations. And only one message bus. (“Oh you want to direct message Vambenepe? He’s not a preferred rider so GET OFF.”)

    I do take your points about transparency and coupling though. Nicely done!

    Christian

  3. Thanks for the perspective Christian. You’re right that if there is one such message bus then there will be a lot of temptation for its operator to abuse its power. That is the main reason why Twitter might not be the best operator (despite all the other reasons why it would work well). A better operator might be a consortium of various participants (as is the case for DNS root servers) or a regulated entity (thus the “common carrier”). I fully agree with this, which is why I mentioned “the key requirements of being equally open to all applications”.

    A DB-centric mechanism might provide more facility to pack up and move, except that Facebook has no intent to let people do so. So in effect you’ll also need some way to make sure that the DB operator plays by the rules.

    What I really like about the message bus is that you get to see everything that feeds into your profile. Unlike the DB where you don’t know exactly what goes between Facebook and Yelp (or whichever site the integration is with). Of course that doesn’t stop Facebook and Yelp from sharing data about you outside of the bus, but at least they don’t get to piggyback on the communication channel that you authorize and facilitate.

    Maybe it’s also my background of debugging protocol implementations by intercepting HTTP requests over the wire. I like this approach. This way, if there is a bug in my social profile (e.g. my Netflix movie recommendations get polluted by an identity confusion with my brother who has awful movie tastes) I can inspect the “on the wire” traffic to debug the problem! ;-)

  4. Stu

    I think both Facebook and Twitter embody both patterns, it’s just a matter of emphasis.

    The Facebook news feed is basically a similar bus, arguably one that’s had more history than Twitter.

    Twitter maintains a full indexed database of tweets, favorites, etc. It too is a shared database.

    The difference seems to be Twitter hasn’t built out customized views or aggregations of that data, like Facebook has with say, photos. Though we are starting to see that with services like twitpic, bit.ly, etc.

    IMO the Facebook backlash over privacy is rather overblown; at least they’re offering some. With Twitter you have none.

    Basically, I think the analogy of bus vs. database is a stretch. I like both and see both.