Category Archives: Facebook

The war on RSS

If the lords of the Internet have their way, the days of RSS are numbered.

Apple

John Gruber was right, when pointing to Dan Frakes’ review of the Mail app in Mountain Lion, to highlight the fact that the application drops support for RSS (he calls it an “interesting omission”, which is both correct and understated). It is indeed the most interesting aspect of the review, even though it’s buried at the bottom of the article; Along with the mention that RSS support appears to also be removed from Safari.

[side note: here is the correct link for the Safari information; Dan Frakes' article mistakenly points to a staging server only available to MacWorld employees.]

It’s not just John Gruber and I who think that’s significant. The disappearance of RSS is pretty much the topic of every comment on the two MacWorld articles (for Mail and Safari). That’s heartening. It’s going to take a lot of agitation to reverse the trend for RSS.

The Mountain Lion setback, assuming it’s not reversed before the OS ships, is just the last of many blows to RSS.

Twitter

Every twitter profile used to exhibit an RSS icon with the URL of a feed containing the user’s tweets. It’s gone. Don’t assume that’s just the result of a minimalist design because (a) the design is not minimalist and (b) the feed URL is also gone from the page metadata.

The RSS feeds still exist (mine is http://twitter.com/statuses/user_timeline/18518601.rss) but to find them you have to know the userid of the user. In other words, knowing that my twitter username is @vambenepe is not sufficient, you have to know that the userid for @vambenepe is 18518601. Which is not something that you can find on my profile page. Unless, that is, you are willing to wade through the HTML source and look for this element:

<div data-user-id="18518601" data-screen-name="vambenepe">

If you know the Twitter API you can retrieve the RSS URL that way, but neither that nor the HTML source method is usable for most people.

That’s too bad. Before I signed up for Twitter, I simply subscribed to the RSS feeds of a few Twitter users. It got me hooked. Obviously, Twitter doesn’t see much value in this anymore. I suspect that they may even see a negative value, a leak in their monetization strategy.

[Updated on 2013/3/1: Unsurprisingly, Twitter is pulling the plug on RSS/Atom entirely.]

Firefox

It used to be that if any page advertised an RSS feed in its metadata, Firefox would show an RSS icon in the address bar to call your attention to it and let you subscribe in your favorite newsreader. At some point, between Firefox 3 and Firefox 10, this disappeared. Now, you have to launch the “view page info” pop-up and click on “feeds” to see them listed. Or look for “subscribe to this page” in the “bookmarks” menu. Neither is hard, but the discoverability of the feeds is diminished. That’s especially unfortunate in the case of sites that don’t look like blogs but go the extra mile of offering relevant feeds. It makes discovering these harder.

Google

Google has done a lot for RSS, but as a result it has put itself in position to kill it, either accidentally or on purpose. Google Reader is a nice tool, but, just like there has not been any new webmail after GMail, there hasn’t been any new hosted feed reader after Google Reader.

If Google closed GMail (or removed email support from it), email would survive as a communication mechanism (removing email from GMail is hard to imagine today, but keep in mind that Google’s survival doesn’t require GMail but they appear to consider it a matter of life or death for Google+ to succeed). If, on the other hand, Google closed Reader, would RSS survive? Doubtful. And Google has already tweaked Reader to benefit Google+. Not, for now, in a way that harms its RSS support. But whatever Google+ needs from Reader, Google+ will get.

[Updated 2013/3/13: Adios Google Reader. But I'm now a Google employee and won't comment further.]

As far as the Chrome browser is concerned, I can’t find a way to have it acknowledge the presence of feeds in a page at all. Unlike Firefox, not even “view page info” shows them; It appears that the only way is to look for the feed URLs in the HTML source.

Facebook

I don’t use Facebook, but for the benefit of this blog post I did some actual research and logged into my account there. I looked for a feed on a friend’s page. None in sight. Unlike Twitter, who started with a very open philosophy, I’m guessing Facebook never supported feeds so it’s probably not a regression in their case. Just a confirmation that no help should be expected from that side.

[update: in fact, Facebook used to offer RSS and killed it too.]

Not looking good for RSS

The good news is that there’s at least one thing that Facebook, Apple, Twitter and (to a lesser extent so far) Google seem to agree on. The bad news is that it’s that RSS, one of the beacons of openness on the internet, is the enemy.

[side note: The RSS/Atom question is irrelevant in this context and I purposedly didn't mention Atom to not confuse things. If anyone who's shunning RSS tells you that if it wasn't for the RSS/Atom confusion they'd be happy to use a standard syndication format, they're pulling your leg; same thing if they say that syndication is "too hard for users".]

70 Comments

Filed under Apple, Big picture, Everything, Facebook, Google, Protocols, Social networks, Specs, Standards, Twitter

URL shorteners and privacy: The Good, the Bad and the Cookie

The table below compares various URL shorteners based on how much they value service performance and the privacy of their users.

Here is the short version of the reading guide: a URL shorterner which gives a high priority to reliability, performance and privacy will use a 301 (“Moved Permanently”) response code, will not use cache control headers and will not use cookies. A URL shortener which gives high priority to its own ability to monetize its traffic by tracking users will do one or more of these things.

Here is how a few of the most popular shorteners perform by this measure (red is bad).

For the long version (and an explanation of how I came to create this table) read below the table.

Service name Cookie Status code Caching limitations
t.co (Twitter) - 301 5 min
bit.ly tracking 301 -
tinyurl.com - 301 -
goo.gl (Google) - 301 24h
wp.me (WordPress) - 301 -
snurl.com - 301 10h
fb.me (Facebook) (*) 301 -
twurl.nl tracking 301 -
is.gd - - -
ping.fm - 301 -
p.ly tracking 301 no caching
ff.im tracking 301 (**)
u.nu - 301 -
tiny.cc tracking 301 -
snipurl.com - 301 10h
chkit.in tracking 301 -
ur1.ca - 302 no caching
digs.by - 302 no caching

Notes:

(*) Facebook’s service, fb.me, tries to set a cookie but its content is “locale=en_US” and cannot be used for identification. In addition, it sets the domain to “.facebook.com” in the Set-Cookie directive but since the response comes from another domain (fb.me) the cookie is actually never returned by the browser and therefore useless. It looks like this is a leftover configuration setting copied from the normal facebook.com servers. Defying all expectations, Facebook comes out as one of the most privacy-friendly URL shorteners.

(**) ff.im limits the cache to being “private” which means that your browser can cache the result but a shared proxy (e.g. your company’s proxy) should not cache it. Forcing each user behind that proxy to resolve the URL once. I magnanimously did not ding them for this, even though it’s sub-optimal.

Now for the longer explanation

Despite the potential it offers to stretch out our tweets, I wasn’t too impressed when I learned of Twitter’s plan to roll out (and mandate) its own URL shortening service. My fundamental issue is that URL shortening is made necessary by an arbitrary decision on Twitter’s part (the 140 character limit and the fact that URLs count toward it) and that it would be entirely within their power to make these abominations unneeded. Or, at least, much more rarely needed (when tinyurl.com came out, the main use case was to insert a very long URL in an email without having problems with carriage returns, not to turn third-world countries into purveyors of silly domain names).

Beyond this fundamental issue, my main concerns about Twitter’s t.co mechanism are that it reduces privacy and it demands that you break the HTTP specification.

From a privacy perspective, the issue is that anyone who clicks on these links tells Twitter where they are going. And Twitter can collect and correlate these actions. The easiest way for them (or any other URL shortener) to do this is to use cookies. Cookies aren’t often used as part of redirections, but technically nothing prevents them. So I wanted to see if Twitter used them.

[Side note: in practice there are ways to track your browser without using identifying cookies, not to mention simply using the IP address which works quite well on people who browse from home. Still, identifying cookies are the preferred method.]

From a specification conformance perspective, the problem is that Twitter announced that they would modify the Terms of Service of their API to prevent you from replacing the short URL with the real location once you’ve resolved it the first time (as of this writing they apparently haven’t yet made the ToS change). That behavior would be in violation of the HTTP specification if the redirection used status code 301 (“Moved Permanently”) which states that “any future references to this resource SHOULD use one of the returned URIs” and “clients with link editing capabilities ought to automatically re-link references to the Request-URI to one or more of the new references returned by the server“. So I wanted to see whether t.co indeed returns a 301 (and asks us to violate the spec) or if they use a Temporary Redirect (302 or the new 307) in which case the specification would not be violated but other problems would arise (for example, search engines would not give you PageRank karma for such a link).

The other (spec-compliant) way to force a 301 to call back home once a while is the (strange but legal) practice of using cache control headers on permanent redirections. So I also wanted to see how t.co behaves on that front.

And then I decided to also test a few other services, which is how the table above came to be.

Comments Off

Filed under Everything, Facebook, Google, Protocols, Security, Social networks, Tech, Testing, Twitter

Integration patterns for social data: the Open Social Data Bus

The previous entry, “Don’t tell Facebook what you like, tell Twitter“, used Twitter and Facebook as examples to illustrate a general point about the integration of social profile data. Unfortunately, the examples may have overshadowed the larger point. In the post, I didn’t consider Twitter as a social network but as a message conduit. Most people on the other hand think of Twitter as a social network (after all, which Twitterer is not watching his/her follower count?) and could come out with the impression that I was just saying that Twitter is a better social network than Facebook. It wasn’t my point.

The main point is about defining the right integration pattern for social data: is it a “message bus” pattern or a “shared database” pattern. For readers who haven’t had the joy of dealing with integration architecture and enterprise integration patterns, here is a one-paragraph primer:

The expense report application in a company needs to be in sync with the data in the HR system, so that an expense report can be sent to the right manager for review/approval. Implementing such application integration in an efficient, resilient and flexible way is hard. Battle-tested approaches (high-level “patterns”) have emerged that have been successful, in the right context. Architects have learned that 99% of the time they are better off asking themselves which of these enterprise integration patterns is right for their problem, rather than trying to invent a new approach. Two of the most common basic patterns are the “shared database” pattern and the “message bus” pattern. In the “shared database” pattern, all the applications read and write to the same repository. In the “message bus” pattern, applications post messages on a shared channel (the “bus”) and also listen on the channel for messages from other applications that they are interested in. It’s similar to a radio channel of the kind used by police and ham radio operators.

(diagrams by Hohpe/Woolf, under cc license)

Facebook wants your social data to be shared across sites and applications using the “shared database” pattern, in which Facebook is the central database (and also the primary application). What I described in the previous post was the use of a “message bus” pattern (in which Twitter was used as the bus).

A bus has the following advantages when applied to the problem of sharing social data:

  • All applications have equal access
  • The applications are loosely-coupled, meaning that changing one doesn’t break the others
  • If applications only communicate via the bus, you get to observe the data shared about you
  • It can scale well

There are lots of interesting considerations about how to build and operate such a bus: security, scalability, access protocols, payload format, etc. But they are secondary to the choice of the integration pattern. For the sake of illustration, Twitter’s approach to security is OAuth, their scalable architecture is described here, the access protocols here and the payload format here. Reasonable alternatives exist for all these functions.

It’s hard for me to imagine the content of the messages on this bus not resembling RDF-like subject/verb/object triplets, in which the subject is implicit (the user attached to the message). The verbs could be simple strings or represented by URIs and have an associated taxonomy. And as in RDF, the objects should be either URIs or simple values (mostly strings, of a limited size, be it 140 characters or something else). Possible examples (the subject is implicit, the verb is in square brackets):

[say] I just had coffeecake for breakfast
[like] http://www.hobees.com/
[location] http://www.hobees.com/redwood.html

I still think Twitter is the most practical implementation of the Open Social Data Bus, for reasons I listed before:

  • It’s here today
  • It’s open and makes no pretense of (often violated) “privacy settings”
  • It can scale (give or take some growing pains and some still-drastic quota restrictions)
  • It has a delegated authorization model (though not quite as fine-grained as I’d like)
  • It already has a large ecosystem of provider/consumer applications
  • Humans look at the messages, ensuring that any integration of personal data will remain at a human scale and therefore controllable
  • It has proven to be a very successful environment for semantic tags to emerge spontaneously
  • It is persisted by many actors, including Google, Bing and the Library of Congress
  • Did I mention that it’s here today?

I remember discussions, in the early-to-mid-nineties, about whether the Internet, this quirky but fast-growing network, would turn into the expected global “information superhighway” or whether a superior one would have to emerge. This might seem like a silly discussion today but it wasn’t so obvious at the time. Wondering whether Twitter will turn out to be the Open Social Data Bus will seem just as silly in 15 years, though I don’t know if it will be deemed silly because the answer was obviously “no” or obviously “yes”…

The tension between Twitter as an infrastructure provider and Twitter as a competitor in the Twitter app marketplace is well-known. The company understands that what makes them different from other social networks is the ecosystem of applications that was enabled by this “message bus” pattern. Which is why, even as they announced that they were going to create their own applications to tap into the stream, they took pains to explain that they would be calling the same interfaces as everybody else.

On the other hand, Twitter obviously also needs to worry about making money.  If their service becomes a low-level service, invisible to users (almost like DNS), then who is going to pay for the operations? Especially since the expectations on Twitter are currently so high that a “normal” rate of profit on operating such an infrastructure would be a huge letdown for investors. But this is not a post about the business prospects and strategic challenges of Twitter. It’s about allowing integration of social profile data in a way that benefits users.

I’d be fine with some other Open Social Data Bus implementation taking over and serving this need, as long as it fulfills the key requirements of being equally open to all applications and allowing individuals to control what gets posted about them. There are other avenues if Twitter cannot (or doesn’t want to) play this role. As the DNS example shows, it doesn’t necessarily have to be operated by a single operator. And there are a variety of funding models for such essential infrastructure (see “who funds root name server operations?” in the DNS root name servers FAQ). Alternatively, applications might be charged based on how much data they get from the bus.

Corporate support can take different forms. From wireless frequencies to wi-fi networks to DNS to supporting Firefox Google has shown a willingness to support the development and operation of the internet infrastructure, confident that they’ll be in the best position to benefit from it. Especially if the alternative is what Pete Cashmore describes as “Google’s nightmare“.

You could even think of this service eventually falling under the “common carrier” model, with the corresponding legal constraints. Especially in societies that are more privacy-aware.

I don’t know what the right business/operating model is for the Open Social Data Bus. What I know is that it’s how I want my social profile data to flow between applications.

[UPDATED 2010/5/20: Some supporting evidence for my recollection of "discussions, in the early-to-mid-nineties, about whether the Internet, this quirky but fast-growing network, would turn into the expected global 'information superhighway' or whether a superior one would have to emerge":

Gates's 286-page book [The Road Ahead, 1995] mentions the World Wide Web on only four of its pages, and portrays the Internet as a subset of a much a larger “Information Superhighway.” The Internet, wrote Gates, is one of “the important precursors of the information highway,” along with PCs, CD-ROMs, phone networks, and cable systems, but “none represents the actual information highway. … today’s Internet is not the information highway I imagine, although you can think of it as the beginning of the highway.”]

4 Comments

Filed under Everything, Facebook, Google, Social networks, Tech, Twitter

Don’t tell Facebook what you like, tell Twitter

There seems to be a lot to like technically about the announcements at Facebook’s f8 conference, especially for a Semantic Web aficionado. But I won’t have anything to do with it as a user. Along with the usual “your privacy is our toy” subtext, I really don’t like the lack of data portability. “Web 2.0″ is starting to look a lot like “AOL 2.0″. Here is a better way to do it.

Taking the new “like” button as a simple example, I’d much rather tell Twitter what I like than Facebook. A simple #like hashtag in a tweet can be used to express positive feelings for what the tweet describes. Here is a quick list of the many advantages of this approach over the newly-introduced Facebook “like” feature.

It’s public

Your tweets are available to all. Your Facebook profile can still consume them, so if you think Facebook does the best job at organizing this information about you and your friends you can still go there to view the results. But other applications and networks can tap into the same data, so you can also benefits from innovation coming out of companies which do not want to be Facebook sharecroppers.

It’s publicly public

By which I mean that there is no pretense of privacy and no nasty surprise when trust is violated. Which is going to happen again and again. Especially when it’s not just a matter of displaying data but also of inferring new information based on the raw data collected. At which point it’s almost impossible to segregate access to the derived information based on the privacy settings of the individual data pieces. On Twitter, it’s all public, we all know it from the start, and as such we’re not fooled into sharing more than we should. See the fallacy of privacy settings.

It works on all things

Rather than only being on a web page, you can use a #like hashtag to describe any URI (dereferenceable or not) or even plain text. Just like RDF allows the value of an attribute to be either a URI or a scalar value (string, number…). For example, you can express that you like a quote or a verse of a poem by including them directly in the tweet. It’s not as identifiable as something that has a URI, but it can still be part of your profile. And smart consumers of this data might still be able to do some processing on it (e.g. recognizing it as a line from a song).

It can still be 1-click

You don’t necessarily have to copy/paste a URL (or text) into twitter. A web site can still do this for you, as long as it has your permission to post on your behalf. With that approach, it looks exactly like the Twitter “like” button to the user. You don’t have to be a Twitter user, just to have a Twitter account. No need for a Twitter client or to visit the Twitter web site if you don’t want to. It’s also OK if you have zero followers, Twitter is just a technical conduit in this approach.

It can evolve

The success of Twitter is also the success of self-organization as illustrated by the emergence of @replies, #hashtags and RT, directly form the users. Rather than having Facebook decide what verbs make sense to allow users to express their thoughts on the Web, let people decide and see what verbs emerge (e.g. to describe what you like, dislike, are curious about, are considering buying, etc). The only thing we need is an understanding that the hashtag qualifies the user’s attitude towards what’s described by the rest of the tweet. Or maybe hashtags should not be reused for this, maybe we need a new breed, “semtags” (semantic tags), with a different syntax, e.g. “^like”. This way you can semtag a hashtag, e.g. “^like #nyc” might replace “I ♥ NY” on twitter feeds (and tee shirts). It can be as simple or as complex as needed, based on what sticks in the real world. Nerds like me will try to qualify it (e.g. “^!like” for “I don’t like”) and might even come up with ontologies (^love subClassOf ^like). These experiences will probably fail and that’s fine. Evolution strives on failures.

It is transparent

Even if you let a site write these messages on your twitter feed, you can see exactly what goes on. There is no secret channel as with Facebook. The fact that it goes on your Twitter timeline acts as a validation, ensuring that only relevant, human-readable messages get added to your profile. Which is the only way in which we can maintain control of our profile information. If sites start to send too much information or opaque information you’ll see it. And so will your followers. This will put pressure on sites to make the posted data sparse and meaningful, because they know that their users won’t want to scare away their followers with social spam. See, for example, how the outcries over foursquare spam seem to have forced a clean-up (or at least so it looks to me, but maybe it’s just because I’ve unfollowed the spammers). Keeping social profiling on a human scale is a bug, not a feature.

It is persisted in many places

Who do you think is more likely to be around in 20 years, Facebook or the Library of Congress? Tweets are archived in many places, including Twitter itself, of course, but also Google, Bing and the Library of Congress. Plus, it’s very easy for you to set up a system to save all your tweets. Even if Twitter disappears, all the data in your profile that was built from your tweets will still be around. And if Google, Bing and the Library of Congress all go dark before Facebook, well that’s fine because the profile data from your tweets can be there too.

In effect, you should think of Facebook as a repository and Twitter as a stream. Don’t publish directly to one repository. Publish to a stream and benefit from all the repositories and other consumers that tap into it. It’s a well-known enterprise integration pattern (message bus), but it’s not just good for enterprise applications.

In fact, more than Twitter itself it’s this pattern that I want to encourage. Twitter is just the most obvious implementation, at this time, of a profile data bus. It already has almost everything we need (though a more fine-grained authorization model, or a delegated authorization model, would make me more likely to allow sites to tweet on my behalf). What matters is the switch from social networks owning data to you owning your data and social networks competing on how much value they can deliver to you based on the data. For example, LinkedIn might be the best for work connections, Facebook for personal connections, Google for brute search/retrieval of information, etc. I don’t want to maintain different profile data and privacy settings for each of them. I have one global privacy settings, which controls what I share with the world. Based on this, I want these sites to compete on the value they provide to me. It may not be what Facebook wants, but if what works best for us.

If you like this proposal, you know what you have to do. Go ahead and tweet:

^like http://stage.vambenepe.com/archives/1464

Or just retweet it.

[UPDATED 2010/5/6: See the next post for some clarifications.]

10 Comments

Filed under Everything, Facebook, Google, Mashup, RDF, Semantic tech, Social networks, Twitter

The fallacy of privacy settings

Another round of “update your Facebook privacy settings right now” messages recently swept through Twitter and blogs. As also happened a few months ago, when Facebook last modified some privacy settings to better accommodate their business goals. This is borderline silly. So, once and for all, here is the rule:

Don’t put anything on any social network that you don’t want to be made public.

Don’t count on your privacy settings on the site to keep your “private” data out of the public eye. Here are the many ways in which they can fail (using Facebook as a stand-in for all the other social networks, this is not specific to Facebook):

  • You make a mistake when configuring the privacy settings
  • Facebook changes the privacy mechanisms on you during one of their privacy policy updates
  • Facebook has a security flaw that bypasses access control
  • One of you friends who has access to your private data accidentally/stupidly/maliciously shares it more widely
  • A Facebook application to which you grant access betrays your trust in accessing the data and exposing it
  • A Facebook application gets hacked
  • A Facebook application retains your data in its cache
  • Your account (or one of your friends’ account) gets hacked
  • Anonymized data that Facebook shares with researchers gets correlated back to real users
  • Some legal action (not necessarily related to you personally) results in a large amount of Facebook data (including yours) seized and exported for legal review
  • Facebook looses some backup media
  • Facebook gets acquired (or it goes out of business and its assets are sold to the highest bidder)
  • Facebook (or whoever runs their hardware) disposes of hardware without properly wiping it
  • [Added 2012/3/8] Your employer or schoold demands that you hand over your account password (or “friend” a monitor)
  • Etc…

All in all, you should not think of these privacy settings as locks protecting your data. Think of them as simply a “do not disturb” sign (or a necktie…) hanging on the knob of an unlocked door. I am not advising against using privacy settings, just against counting on them to work reliably. If you’d rather your work colleagues don’t see your holiday pictures, then set your privacy settings so they can’t see them. But if it would really bother you if they saw them, then don’t post the pictures on Facebook at all. Think of it like keeping a photo in your wallet. You get to choose who you show it to, until the day you forget your wallet in the office bathroom, or at a party, and someone opens it to find the owner. You already know this instinctively, which is why you probably wouldn’t carry photos in your wallet that shouldn’t be shown publicly. It’s the same on Facebook.

This is what was so disturbing about the Buzz/GMail privacy fiasco. It took data (your list of GMail contacts) that was not created for the purpose of sharing it with anyone, and turned this into profile data in a social network. People who signed up for GMail didn’t sign up for a social network, they signed up for a Web-based email. What Google wants, on the other hand, is a large social network like Facebook, so it tried to make GMail into one by auto-following GMail contacts in your Buzz profile. It’s as if your insurance company suddenly decided it wanted to enter the social networking business and announced one day that you were now “friends” with all their customers who share the same medical condition. And will you please log in and update your privacy settings if you have a problem with that, you backward-looking, privacy-hugging, profit-dissipating idiot.

On the other hand, that’s one thing I like about Twitter. By and large (except for the few people who lock their accounts) almost all the information you put in Twitter is expected to be public. There is no misrepresentation, confusion or surprise. I don’t consider this lack of configurable privacy as a sign that Twitter doesn’t respect the privacy of its users. To the contrary, I almost see this as the most privacy-friendly approach: make it clear that everything is public. Because it is anyway.

One could almost make a counter-intuitive case that providing privacy settings is anti-privacy because it gives an unwarranted sense of security and nudges users towards providing more private data than they otherwise would. At least if the policy settings are not contractual (can you sue Facebook for changing its privacy terms on you?). At least it’s been working that way so far for Facebook, intentionally of not, as illustrated by all the articles that stress the importance of setting our privacy settings right (implicit message: it’s ok to put private information as long as you set  privacy settings).

Yes you should have clear privacy settings. But the place to store them is in your brain and the place to enforce them is by controlling what your fingers do before data gets on Facebook. Facebook and similar networks can only leak data that they posses. A lot of that data comes from you directly uploading it. And that’s the point where you have control. After this, you really don’t. Other data comes from tracking and analyzing your activities and connections, without explicit data upload from you. That’s a lot harder for you to control (you rarely even get asked for your privacy preferences on this data), but that’s out of scope for this blog entry.

Just like banks that are too big to fail are too big to exist, data that is too sensitive to leak from Facebook is too sensitive to be on Facebook.

5 Comments

Filed under Everything, Facebook, Google, Off-topic, Security, Social networks, Twitter