Category Archives: Social networks

Big Data career adviser says you should be a… Big Data analyst

LinkedIn CEO Jeff Weiner wrote an interesting post on “the future of LinkedIn and the economic graph”. There’s a lot to like about his vision. The part about making education and career choices better informed by data especially resonates with me:

With the existence of an economic graph, we could look at where the jobs are in any given locality, identify the fastest growing jobs in that area, the skills required to obtain those jobs, the skills of the existing aggregate workforce there, and then quantify the size of the gap. Even more importantly, we could then provide a feed of that data to local vocational training facilities, junior colleges, etc. so they could develop a just-in-time curriculum that provides local job seekers the skills they need to obtain the jobs that are and will be, and not just the jobs that once were.

I consider myself very lucky. I happened to like computers and enjoy programming them. This eventually led me to an engineering degree, a specialization in Computer Science and a very enjoyable career in an attractive industry. I could have been similarly attracted by other domains which would have been unlikely to give me such great professional options. Not everyone is so lucky, and better data could help make better career and education choices. The benefits, both at the individual and societal levels, could be immense.

Of course, as with every Big Data example, you can’t expect a crystal ball either. It’s unlikely that the “economic graph” for France in 1994 would have told me: “this would be a good time to install Linux Slackware, learn Python and write your first CGI script”. It’s also debatable whether that “economic graph” would have been able to avoid one of the worst wastes of talent in recent times, when too many science and engineering graduates went into banking. The “economic graph” might actually have encouraged that.

But, even under moderate expectations, there is a lot of potential for better informed education and career decisions (both on the part of the training profession and the students themselves) and I am glad that LinkedIn is going after that. Along with the choice of a life partner (and other companies are after that problem), this is maybe the most important and least informed decision people will make in their lifetime.

Jeff Weiner also made a proclamation of openness in that same article:

Once realized, we then want to get out of the way and allow all of the nodes on this network to connect seamlessly by removing as much friction as possible and allowing all forms of capital, e.g. working capital, intellectual capital, and human capital, to flow to where it can best be leveraged.

I’m naturally suspicious of such claims. And a few hours later, I get a nice email from LinkedIn, announcing that as of tomorrow they are dropping the “blog link” application which, as far as I can tell, fetches recent posts from my blog and includes them on my LinkedIn profile. Seems to me that this was a nice and easy way to “allow all of the nodes on this network to connect seamlessly by removing as much friction as possible”…

1 Comment

Filed under Big Data, Everything, Linked Data, People, Social networks

The war on RSS

If the lords of the Internet have their way, the days of RSS are numbered.

Apple

John Gruber was right, when pointing to Dan Frakes’ review of the Mail app in Mountain Lion, to highlight the fact that the application drops support for RSS (he calls it an “interesting omission”, which is both correct and understated). It is indeed the most interesting aspect of the review, even though it’s buried at the bottom of the article, along with the mention that RSS support also appears to have been removed from Safari.

[side note: here is the correct link for the Safari information; Dan Frakes’ article mistakenly points to a staging server only available to MacWorld employees.]

It’s not just John Gruber and I who think that’s significant. The disappearance of RSS is pretty much the topic of every comment on the two MacWorld articles (for Mail and Safari). That’s heartening. It’s going to take a lot of agitation to reverse the trend for RSS.

The Mountain Lion setback, assuming it’s not reversed before the OS ships, is just the latest of many blows to RSS.

Twitter

Every Twitter profile used to exhibit an RSS icon with the URL of a feed containing the user’s tweets. It’s gone. Don’t assume that’s just the result of a minimalist design because (a) the design is not minimalist and (b) the feed URL is also gone from the page metadata.

The RSS feeds still exist (mine is http://twitter.com/statuses/user_timeline/18518601.rss) but to find them you have to know the userid of the user. In other words, knowing that my Twitter username is @vambenepe is not sufficient; you have to know that the userid for @vambenepe is 18518601. That is not something you can find on my profile page, unless you are willing to wade through the HTML source and look for this element:

<div data-user-id="18518601" data-screen-name="vambenepe">

If you know the Twitter API you can retrieve the RSS URL that way, but neither that nor the HTML source method is usable for most people.
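For the curious, the HTML source method can be automated. The following is only a sketch: it assumes the profile page still carries an element with `data-user-id` and `data-screen-name` attributes like the one shown above, and the names `UserIdFinder` and `feed_url_from_profile_html` are made up for this illustration. It works on HTML you have already downloaded and does no network access.

```python
from html.parser import HTMLParser


class UserIdFinder(HTMLParser):
    """Finds the data-user-id attribute that goes with a given screen name."""

    def __init__(self, screen_name):
        super().__init__()
        self.screen_name = screen_name.lower()
        self.user_id = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = attributes.get("data-screen-name") or ""
        # Keep the first element whose screen name matches the one requested.
        if self.user_id is None and name.lower() == self.screen_name:
            self.user_id = attributes.get("data-user-id")


def feed_url_from_profile_html(html, screen_name):
    """Return the RSS feed URL for screen_name, or None if no userid is found."""
    finder = UserIdFinder(screen_name)
    finder.feed(html)
    if finder.user_id is None:
        return None
    # URL pattern taken from the feed address quoted in the post.
    return "http://twitter.com/statuses/user_timeline/%s.rss" % finder.user_id
```

Feeding it the element quoted above with the screen name "vambenepe" yields the feed URL given earlier. Which rather proves the point: nobody should need a script to subscribe to a feed.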

That’s too bad. Before I signed up for Twitter, I simply subscribed to the RSS feeds of a few Twitter users. It got me hooked. Obviously, Twitter doesn’t see much value in this anymore. I suspect that they may even see a negative value, a leak in their monetization strategy.

[Updated on 2013/3/1: Unsurprisingly, Twitter is pulling the plug on RSS/Atom entirely.]

Firefox

It used to be that if any page advertised an RSS feed in its metadata, Firefox would show an RSS icon in the address bar to call your attention to it and let you subscribe in your favorite newsreader. At some point, between Firefox 3 and Firefox 10, this disappeared. Now, you have to launch the “view page info” pop-up and click on “feeds” to see them listed. Or look for “subscribe to this page” in the “bookmarks” menu. Neither is hard, but the discoverability of the feeds is diminished. That’s especially unfortunate in the case of sites that don’t look like blogs but go the extra mile of offering relevant feeds. It makes discovering these harder.

Google

Google has done a lot for RSS, but as a result it has put itself in position to kill it, either accidentally or on purpose. Google Reader is a nice tool, but, just like there has not been any new webmail after GMail, there hasn’t been any new hosted feed reader after Google Reader.

If Google closed GMail (or removed email support from it), email would survive as a communication mechanism (removing email from GMail is hard to imagine today, but keep in mind that Google’s survival doesn’t require GMail but they appear to consider it a matter of life or death for Google+ to succeed). If, on the other hand, Google closed Reader, would RSS survive? Doubtful. And Google has already tweaked Reader to benefit Google+. Not, for now, in a way that harms its RSS support. But whatever Google+ needs from Reader, Google+ will get.

[Updated 2013/3/13: Adios Google Reader. But I’m now a Google employee and won’t comment further.]

As far as the Chrome browser is concerned, I can’t find a way to have it acknowledge the presence of feeds in a page at all. Unlike Firefox, not even “view page info” shows them; it appears that the only way is to look for the feed URLs in the HTML source.
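Looking for feed URLs in the HTML source is at least easy to script. The sketch below assumes the page advertises its feeds through the usual autodiscovery convention, a `<link rel="alternate">` element with an RSS or Atom media type in its `type` attribute; `FeedLinkFinder` and `find_feeds` are names invented for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Media types commonly used to advertise feeds in page metadata.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects the feed URLs advertised by <link rel="alternate"> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        media_type = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if "alternate" in rel and media_type in FEED_TYPES and href:
            # Feed links are often relative, so resolve them against the page URL.
            self.feeds.append(urljoin(self.base_url, href))


def find_feeds(html, base_url):
    """Return the list of feed URLs advertised in the given HTML."""
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    return finder.feeds
```

This is roughly the check a browser has to perform before it can show a feed icon, so the missing icon is a product decision, not a technical hurdle.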

Facebook

I don’t use Facebook, but for the benefit of this blog post I did some actual research and logged into my account there. I looked for a feed on a friend’s page. None in sight. Unlike Twitter, who started with a very open philosophy, I’m guessing Facebook never supported feeds so it’s probably not a regression in their case. Just a confirmation that no help should be expected from that side.

[update: in fact, Facebook used to offer RSS and killed it too.]

Not looking good for RSS

The good news is that there’s at least one thing that Facebook, Apple, Twitter and (to a lesser extent so far) Google seem to agree on. The bad news is that it’s that RSS, one of the beacons of openness on the internet, is the enemy.

[side note: The RSS/Atom question is irrelevant in this context and I purposely didn’t mention Atom to not confuse things. If anyone who’s shunning RSS tells you that if it wasn’t for the RSS/Atom confusion they’d be happy to use a standard syndication format, they’re pulling your leg; same thing if they say that syndication is “too hard for users”.]

70 Comments

Filed under Apple, Big picture, Everything, Facebook, Google, Protocols, Social networks, Specs, Standards, Twitter

On resisting shiny objects

The previous post (compiling responses to my question on Twitter about why there seems to be more PaaS activity in public Clouds than private Clouds) is actually a slightly-edited repost of something I first posted on Google+.

I posted it as “public” which, Google+ says, means “Visible to anyone (public on the web)”. Except it isn’t. If I go to the link above in another browser (not logged into my Google account) I get nothing but an invitation to join Google+. How AOLy. How Facebooky. How non-Googly.

Maybe I’m doing something wrong. Or there’s a huge bug in Google+. Or I don’t understand what “public on the web” means. In any case, this is not what I need. I want to be able to point people on Twitter who saw my question (and, in some cases, responded to it) to a compilation of answers. Whether they use Google+ or not.

So I copy/pasted the compilation to my blog.

Then I realized that this is obviously what I should have done in the first place:

  • It’s truly public.
  • It brings activity to my somewhat-neglected blog.
  • My blog is about IT management and Cloud, which is the topic at hand, my Google+ stream is about… nothing really so far.
  • The terms of use (for me as the writer and for my readers) are mine.
  • I can format the way I want (human-readable text that acts as a link as opposed to having to show the URL, what a concept!).
  • I know it will be around and available in an open format (you’re probably reading this in an RSS/Atom reader, aren’t you?)
  • There is no ad and never will be any.
  • I get the HTTP log if I care to see the traffic to the page.
  • Commenters can use pseudonyms!

It hurts to admit it, but the thought process (or lack thereof) that led me to initially use Google+ goes along the lines of “I have this Google+ account that I’m not really using and I am sympathetic to Google+ (for reasons explained by James Fallows plus a genuine appreciation of the technical task) and, hey, here is something that could go on it so let’s put it there”. As opposed to what it should have been: “I have this piece of text full of links that I want to share, what would be the best place to do it?”, which screams “blog!” as the answer.

I consider myself generally pretty good at resisting shiny objects, but obviously I still need to work on it. I’m back to my previous opinion on Google+: it’s nice and well-built but right now I don’t really have a use for it.

I used to say “I haven’t found a use for it” but why should I search for one?

1 Comment

Filed under Big picture, Everything, Google, Off-topic, Portability, Social networks, Twitter

URL shorteners and privacy: The Good, the Bad and the Cookie

The table below compares various URL shorteners based on how much they value service performance and the privacy of their users.

Here is the short version of the reading guide: a URL shortener which gives a high priority to reliability, performance and privacy will use a 301 (“Moved Permanently”) response code, will not use cache control headers and will not use cookies. A URL shortener which gives high priority to its own ability to monetize its traffic by tracking users will do one or more of these things.

Here is how a few of the most popular shorteners perform by this measure (red is bad).

For the long version (and an explanation of how I came to create this table) read below the table.

Service name       | Cookie   | Status code | Caching limitations
t.co (Twitter)     |          | 301         | 5 min
bit.ly             | tracking | 301         |
tinyurl.com        |          | 301         |
goo.gl (Google)    |          | 301         | 24h
wp.me (WordPress)  |          | 301         |
snurl.com          |          | 301         | 10h
fb.me (Facebook)   | (*)      | 301         |
twurl.nl           | tracking | 301         |
is.gd              |          |             |
ping.fm            |          | 301         |
p.ly               | tracking | 301         | no caching
ff.im              | tracking | 301         | (**)
u.nu               |          | 301         |
tiny.cc            | tracking | 301         |
snipurl.com        |          | 301         | 10h
chkit.in           | tracking | 301         |
ur1.ca             |          | 302         | no caching
digs.by            |          | 302         | no caching

Notes:

(*) Facebook’s service, fb.me, tries to set a cookie but its content is “locale=en_US” and cannot be used for identification. In addition, it sets the domain to “.facebook.com” in the Set-Cookie directive but since the response comes from another domain (fb.me) the cookie is actually never returned by the browser and therefore useless. It looks like this is a leftover configuration setting copied from the normal facebook.com servers. Defying all expectations, Facebook comes out as one of the most privacy-friendly URL shorteners.

(**) ff.im limits the cache to being “private” which means that your browser can cache the result but a shared proxy (e.g. your company’s proxy) should not cache it, forcing each user behind that proxy to resolve the URL once. I magnanimously did not ding them for this, even though it’s sub-optimal.

Now for the longer explanation

Despite the potential it offers to stretch out our tweets, I wasn’t too impressed when I learned of Twitter’s plan to roll out (and mandate) its own URL shortening service. My fundamental issue is that URL shortening is made necessary by an arbitrary decision on Twitter’s part (the 140 character limit and the fact that URLs count toward it) and that it would be entirely within their power to make these abominations unneeded. Or, at least, much more rarely needed (when tinyurl.com came out, the main use case was to insert a very long URL in an email without having problems with carriage returns, not to turn third-world countries into purveyors of silly domain names).

Beyond this fundamental issue, my main concerns about Twitter’s t.co mechanism are that it reduces privacy and it demands that you break the HTTP specification.

From a privacy perspective, the issue is that anyone who clicks on these links tells Twitter where they are going. And Twitter can collect and correlate these actions. The easiest way for them (or any other URL shortener) to do this is to use cookies. Cookies aren’t often used as part of redirections, but technically nothing prevents them. So I wanted to see if Twitter used them.

[Side note: in practice there are ways to track your browser without using identifying cookies, not to mention simply using the IP address which works quite well on people who browse from home. Still, identifying cookies are the preferred method.]

From a specification conformance perspective, the problem is that Twitter announced that they would modify the Terms of Service of their API to prevent you from replacing the short URL with the real location once you’ve resolved it the first time (as of this writing they apparently haven’t yet made the ToS change). That behavior would be in violation of the HTTP specification if the redirection used status code 301 (“Moved Permanently”), for which the specification states that “any future references to this resource SHOULD use one of the returned URIs” and “clients with link editing capabilities ought to automatically re-link references to the Request-URI to one or more of the new references returned by the server”. So I wanted to see whether t.co indeed returns a 301 (and asks us to violate the spec) or if they use a Temporary Redirect (302 or the new 307) in which case the specification would not be violated but other problems would arise (for example, search engines would not give you PageRank karma for such a link).

The other (spec-compliant) way to force a 301 to call back home once in a while is the (strange but legal) practice of using cache control headers on permanent redirections. So I also wanted to see how t.co behaves on that front.

And then I decided to also test a few other services, which is how the table above came to be.
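If you want to repeat the experiment, the three checks from the reading guide (status code, cookies, cache control headers) can be sketched as below. This is not the script used to build the table. The names `grade_redirect` and `fetch_redirect` are invented for the illustration, and treating `Expires` as a caching limitation alongside `Cache-Control` is an assumption of this sketch.

```python
import http.client
from urllib.parse import urlsplit

# Headers counted as "caching limitations" (an assumption of this sketch).
CACHE_HEADERS = {"cache-control", "expires"}


def grade_redirect(status, headers):
    """Apply the reading guide to one redirect response.

    headers is a list of (name, value) pairs, as returned by fetch_redirect.
    """
    names = {name.lower() for name, _ in headers}
    return {
        "permanent": status == 301,
        "sets_cookie": "set-cookie" in names,
        "limits_caching": bool(names & CACHE_HEADERS),
    }


def fetch_redirect(short_url, timeout=10):
    """Request short_url once and return (status, headers).

    http.client does not follow redirects, so this is the shortener's own answer.
    """
    parts = urlsplit(short_url)
    if parts.scheme == "https":
        connection = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        connection = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        connection.request("HEAD", parts.path or "/")
        response = connection.getresponse()
        return response.status, response.getheaders()
    finally:
        connection.close()
```

A typical use would be `grade_redirect(*fetch_redirect(some_short_url))`. Some services may answer HEAD requests differently from GET, and the mere presence of a cookie does not make it a tracking cookie (see the fb.me note above), so the raw headers still deserve a look.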

Comments Off on URL shorteners and privacy: The Good, the Bad and the Cookie

Filed under Everything, Facebook, Google, Protocols, Security, Social networks, Tech, Testing, Twitter

Integration patterns for social data: the Open Social Data Bus

The previous entry, “Don’t tell Facebook what you like, tell Twitter“, used Twitter and Facebook as examples to illustrate a general point about the integration of social profile data. Unfortunately, the examples may have overshadowed the larger point. In the post, I didn’t consider Twitter as a social network but as a message conduit. Most people on the other hand think of Twitter as a social network (after all, which Twitterer is not watching his/her follower count?) and could come away with the impression that I was just saying that Twitter is a better social network than Facebook. It wasn’t my point.

The main point is about defining the right integration pattern for social data: is it a “message bus” pattern or a “shared database” pattern? For readers who haven’t had the joy of dealing with integration architecture and enterprise integration patterns, here is a one-paragraph primer:

The expense report application in a company needs to be in sync with the data in the HR system, so that an expense report can be sent to the right manager for review/approval. Implementing such application integration in an efficient, resilient and flexible way is hard. Battle-tested approaches (high-level “patterns”) have emerged that have been successful, in the right context. Architects have learned that 99% of the time they are better off asking themselves which of these enterprise integration patterns is right for their problem, rather than trying to invent a new approach. Two of the most common basic patterns are the “shared database” pattern and the “message bus” pattern. In the “shared database” pattern, all the applications read and write to the same repository. In the “message bus” pattern, applications post messages on a shared channel (the “bus”) and also listen on the channel for messages from other applications that they are interested in. It’s similar to a radio channel of the kind used by police and ham radio operators.

(diagrams by Hohpe/Woolf, under cc license)

Facebook wants your social data to be shared across sites and applications using the “shared database” pattern, in which Facebook is the central database (and also the primary application). What I described in the previous post was the use of a “message bus” pattern (in which Twitter was used as the bus).
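To make the “message bus” pattern from the primer concrete, here is a toy, in-process sketch. Everything in it is hypothetical (the `MessageBus` class, the `hr.manager_changed` topic, the two subscribers), and a real bus would run over a network with proper security. The point is only that the HR system publishes a message without knowing who listens to it.

```python
class MessageBus:
    """A toy in-process bus: applications subscribe to topics and publish messages."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, topic, handler):
        self._subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic, message):
        # Every subscriber gets the same message; the publisher knows none of them.
        for handler in list(self._subscribers.get(topic, [])):
            handler(message)


bus = MessageBus()

# The expense report application keeps its own view of who approves what.
manager_of = {}


def update_expense_app(message):
    manager_of[message["employee"]] = message["manager"]


# A second, unrelated consumer of the same messages.
audit_log = []

bus.subscribe("hr.manager_changed", update_expense_app)
bus.subscribe("hr.manager_changed", audit_log.append)

# The HR system announces a change once, on the shared channel.
bus.publish("hr.manager_changed", {"employee": "alice", "manager": "bob"})
```

In the “shared database” variant, both applications would instead read and write the same `manager_of` store directly, and whoever operates that store sits in the middle of every exchange.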

A bus has the following advantages when applied to the problem of sharing social data:

  • All applications have equal access
  • The applications are loosely-coupled, meaning that changing one doesn’t break the others
  • If applications only communicate via the bus, you get to observe the data shared about you
  • It can scale well

There are lots of interesting considerations about how to build and operate such a bus: security, scalability, access protocols, payload format, etc. But they are secondary to the choice of the integration pattern. For the sake of illustration, Twitter’s approach to security is OAuth, their scalable architecture is described here, the access protocols here and the payload format here. Reasonable alternatives exist for all these functions.

It’s hard for me to imagine the content of the messages on this bus not resembling RDF-like subject/verb/object triplets, in which the subject is implicit (the user attached to the message). The verbs could be simple strings or represented by URIs and have an associated taxonomy. And as in RDF, the objects should be either URIs or simple values (mostly strings, of a limited size, be it 140 characters or something else). Possible examples (the subject is implicit, the verb is in square brackets):

[say] I just had coffeecake for breakfast
[like] http://www.hobees.com/
[location] http://www.hobees.com/redwood.html
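A consumer of such messages could turn them into triples along these lines. This is a sketch of one possible reading of the bracket notation used in the examples above, not a proposed format: `Triple` and `parse_message` are invented names, and telling URIs from plain values by the presence of a scheme and a host is a simplification.

```python
import re
from typing import NamedTuple
from urllib.parse import urlsplit


class Triple(NamedTuple):
    subject: str      # implicit in the message: the user who posted it
    verb: str
    obj: str
    obj_is_uri: bool


# "[verb] object", using the notation of the examples above.
MESSAGE = re.compile(r"^\[(?P<verb>[^\]\s]+)\]\s+(?P<obj>.+)$")


def parse_message(user, text):
    """Turn one bus message into a (subject, verb, object) triple."""
    match = MESSAGE.match(text.strip())
    if match is None:
        raise ValueError("not a [verb] object message: %r" % text)
    obj = match.group("obj")
    parts = urlsplit(obj)
    # Simplification: an object counts as a URI if it has a scheme and a host.
    is_uri = bool(parts.scheme and parts.netloc)
    return Triple(user, match.group("verb"), obj, is_uri)
```

For instance, `parse_message("vambenepe", "[like] http://www.hobees.com/")` gives a triple whose object is flagged as a URI, while the coffeecake message gives a plain string value.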

I still think Twitter is the most practical implementation of the Open Social Data Bus, for reasons I listed before:

  • It’s here today
  • It’s open and makes no pretense of (often violated) “privacy settings”
  • It can scale (give or take some growing pains and some still-drastic quota restrictions)
  • It has a delegated authorization model (though not quite as fine-grained as I’d like)
  • It already has a large ecosystem of provider/consumer applications
  • Humans look at the messages, ensuring that any integration of personal data will remain at a human scale and therefore controllable
  • It has proven to be a very successful environment for semantic tags to emerge spontaneously
  • It is persisted by many actors, including Google, Bing and the Library of Congress
  • Did I mention that it’s here today?

I remember discussions, in the early-to-mid-nineties, about whether the Internet, this quirky but fast-growing network, would turn into the expected global “information superhighway” or whether a superior one would have to emerge. This might seem like a silly discussion today but it wasn’t so obvious at the time. Wondering whether Twitter will turn out to be the Open Social Data Bus will seem just as silly in 15 years, though I don’t know if it will be deemed silly because the answer was obviously “no” or obviously “yes”…

The tension between Twitter as an infrastructure provider and Twitter as a competitor in the Twitter app marketplace is well-known. The company understands that what makes them different from other social networks is the ecosystem of applications that was enabled by this “message bus” pattern. Which is why, even as they announced that they were going to create their own applications to tap into the stream, they took pains to explain that they would be calling the same interfaces as everybody else.

On the other hand, Twitter obviously also needs to worry about making money.  If their service becomes a low-level service, invisible to users (almost like DNS), then who is going to pay for the operations? Especially since the expectations on Twitter are currently so high that a “normal” rate of profit on operating such an infrastructure would be a huge letdown for investors. But this is not a post about the business prospects and strategic challenges of Twitter. It’s about allowing integration of social profile data in a way that benefits users.

I’d be fine with some other Open Social Data Bus implementation taking over and serving this need, as long as it fulfills the key requirements of being equally open to all applications and allowing individuals to control what gets posted about them. There are other avenues if Twitter cannot (or doesn’t want to) play this role. As the DNS example shows, it doesn’t necessarily have to be operated by a single operator. And there are a variety of funding models for such essential infrastructure (see “who funds root name server operations?” in the DNS root name servers FAQ). Alternatively, applications might be charged based on how much data they get from the bus.

Corporate support can take different forms. From wireless frequencies to wi-fi networks to DNS to supporting Firefox, Google has shown a willingness to support the development and operation of the internet infrastructure, confident that they’ll be in the best position to benefit from it. Especially if the alternative is what Pete Cashmore describes as “Google’s nightmare”.

You could even think of this service eventually falling under the “common carrier” model, with the corresponding legal constraints. Especially in societies that are more privacy-aware.

I don’t know what the right business/operating model is for the Open Social Data Bus. What I know is that it’s how I want my social profile data to flow between applications.

[UPDATED 2010/5/20: Some supporting evidence for my recollection of “discussions, in the early-to-mid-nineties, about whether the Internet, this quirky but fast-growing network, would turn into the expected global ‘information superhighway’ or whether a superior one would have to emerge”:

Gates’s 286-page book [The Road Ahead, 1995] mentions the World Wide Web on only four of its pages, and portrays the Internet as a subset of a much larger “Information Superhighway.” The Internet, wrote Gates, is one of “the important precursors of the information highway,” along with PCs, CD-ROMs, phone networks, and cable systems, but “none represents the actual information highway. … today’s Internet is not the information highway I imagine, although you can think of it as the beginning of the highway.”]

4 Comments

Filed under Everything, Facebook, Google, Social networks, Tech, Twitter

Don’t tell Facebook what you like, tell Twitter

There seems to be a lot to like technically about the announcements at Facebook’s f8 conference, especially for a Semantic Web aficionado. But I won’t have anything to do with it as a user. Along with the usual “your privacy is our toy” subtext, I really don’t like the lack of data portability. “Web 2.0” is starting to look a lot like “AOL 2.0”. Here is a better way to do it.

Taking the new “like” button as a simple example, I’d much rather tell Twitter what I like than Facebook. A simple #like hashtag in a tweet can be used to express positive feelings for what the tweet describes. Here is a quick list of the many advantages of this approach over the newly-introduced Facebook “like” feature.

It’s public

Your tweets are available to all. Your Facebook profile can still consume them, so if you think Facebook does the best job at organizing this information about you and your friends you can still go there to view the results. But other applications and networks can tap into the same data, so you can also benefit from innovation coming out of companies which do not want to be Facebook sharecroppers.

It’s publicly public

By which I mean that there is no pretense of privacy and no nasty surprise when trust is violated. Which is going to happen again and again. Especially when it’s not just a matter of displaying data but also of inferring new information based on the raw data collected. At which point it’s almost impossible to segregate access to the derived information based on the privacy settings of the individual data pieces. On Twitter, it’s all public, we all know it from the start, and as such we’re not fooled into sharing more than we should. See the fallacy of privacy settings.

It works on all things

Rather than only being on a web page, you can use a #like hashtag to describe any URI (dereferenceable or not) or even plain text. Just like RDF allows the value of an attribute to be either a URI or a scalar value (string, number…). For example, you can express that you like a quote or a verse of a poem by including them directly in the tweet. It’s not as identifiable as something that has a URI, but it can still be part of your profile. And smart consumers of this data might still be able to do some processing on it (e.g. recognizing it as a line from a song).

It can still be 1-click

You don’t necessarily have to copy/paste a URL (or text) into Twitter. A web site can still do this for you, as long as it has your permission to post on your behalf. With that approach, it looks exactly like the Facebook “like” button to the user. You don’t have to be a Twitter user, just to have a Twitter account. No need for a Twitter client or to visit the Twitter web site if you don’t want to. It’s also OK if you have zero followers; Twitter is just a technical conduit in this approach.

It can evolve

The success of Twitter is also the success of self-organization as illustrated by the emergence of @replies, #hashtags and RT, directly from the users. Rather than having Facebook decide what verbs make sense to allow users to express their thoughts on the Web, let people decide and see what verbs emerge (e.g. to describe what you like, dislike, are curious about, are considering buying, etc). The only thing we need is an understanding that the hashtag qualifies the user’s attitude towards what’s described by the rest of the tweet. Or maybe hashtags should not be reused for this, maybe we need a new breed, “semtags” (semantic tags), with a different syntax, e.g. “^like”. This way you can semtag a hashtag, e.g. “^like #nyc” might replace “I ♥ NY” on Twitter feeds (and tee shirts). It can be as simple or as complex as needed, based on what sticks in the real world. Nerds like me will try to qualify it (e.g. “^!like” for “I don’t like”) and might even come up with ontologies (^love subClassOf ^like). These experiments will probably fail and that’s fine. Evolution thrives on failures.
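As an illustration of how little machinery a consumer would need, here is a sketch of a semtag parser. The syntax it accepts (a leading “^”, an optional “!” for negation, a verb, then the thing being qualified) is just my reading of the examples above; `parse_semtag` is a made-up name and nothing here is an agreed convention.

```python
import re

# "^verb target" or "^!verb target"; the "!" marks a negated verb.
SEMTAG = re.compile(r"^\^(?P<negated>!?)(?P<verb>\w+)\s+(?P<target>.+)$")


def parse_semtag(tweet):
    """Return the verb, negation flag and target of a semtagged tweet, or None."""
    match = SEMTAG.match(tweet.strip())
    if match is None:
        return None  # an ordinary tweet, not a semtagged one
    return {
        "verb": match.group("verb"),
        "negated": match.group("negated") == "!",
        "target": match.group("target"),
    }
```

With this, “^like #nyc” parses to the verb “like” applied to “#nyc”, “^!like” followed by a URL parses to a negated “like”, and ordinary tweets are simply ignored.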

It is transparent

Even if you let a site write these messages on your Twitter feed, you can see exactly what goes on. There is no secret channel as with Facebook. The fact that it goes on your Twitter timeline acts as a validation, ensuring that only relevant, human-readable messages get added to your profile. Which is the only way in which we can maintain control of our profile information. If sites start to send too much information or opaque information you’ll see it. And so will your followers. This will put pressure on sites to make the posted data sparse and meaningful, because they know that their users won’t want to scare away their followers with social spam. See, for example, how the outcries over foursquare spam seem to have forced a clean-up (or at least so it looks to me, but maybe it’s just because I’ve unfollowed the spammers). Keeping social profiling on a human scale is a feature, not a bug.

It is persisted in many places

Who do you think is more likely to be around in 20 years, Facebook or the Library of Congress? Tweets are archived in many places, including Twitter itself, of course, but also Google, Bing and the Library of Congress. Plus, it’s very easy for you to set up a system to save all your tweets. Even if Twitter disappears, all the data in your profile that was built from your tweets will still be around. And if Google, Bing and the Library of Congress all go dark before Facebook, well that’s fine because the profile data from your tweets can be there too.

In effect, you should think of Facebook as a repository and Twitter as a stream. Don’t publish directly to one repository. Publish to a stream and benefit from all the repositories and other consumers that tap into it. It’s a well-known enterprise integration pattern (message bus), but it’s not just good for enterprise applications.

In fact, more than Twitter itself it’s this pattern that I want to encourage. Twitter is just the most obvious implementation, at this time, of a profile data bus. It already has almost everything we need (though a more fine-grained authorization model, or a delegated authorization model, would make me more likely to allow sites to tweet on my behalf). What matters is the switch from social networks owning data to you owning your data and social networks competing on how much value they can deliver to you based on the data. For example, LinkedIn might be the best for work connections, Facebook for personal connections, Google for brute search/retrieval of information, etc. I don’t want to maintain different profile data and privacy settings for each of them. I have one global privacy setting, which controls what I share with the world. Based on this, I want these sites to compete on the value they provide to me. It may not be what Facebook wants, but it’s what works best for us.

If you like this proposal, you know what you have to do. Go ahead and tweet:

^like http://stage.vambenepe.com/archives/1464

Or just retweet it.

[UPDATED 2010/5/6: See the next post for some clarifications.]

10 Comments

Filed under Everything, Facebook, Google, Mashup, RDF, Semantic tech, Social networks, Twitter

The fallacy of privacy settings

Another round of “update your Facebook privacy settings right now” messages recently swept through Twitter and blogs, just as happened a few months ago when Facebook last modified some privacy settings to better accommodate its business goals. This is borderline silly. So, once and for all, here is the rule:

Don’t put anything on any social network that you don’t want to be made public.

Don’t count on your privacy settings on the site to keep your “private” data out of the public eye. Here are the many ways in which they can fail (using Facebook as a stand-in for all the other social networks, this is not specific to Facebook):

  • You make a mistake when configuring the privacy settings
  • Facebook changes the privacy mechanisms on you during one of their privacy policy updates
  • Facebook has a security flaw that bypasses access control
  • One of your friends who has access to your private data accidentally/stupidly/maliciously shares it more widely
  • A Facebook application to which you grant access betrays your trust in accessing the data and exposing it
  • A Facebook application gets hacked
  • A Facebook application retains your data in its cache
  • Your account (or one of your friends’ accounts) gets hacked
  • Anonymized data that Facebook shares with researchers gets correlated back to real users
  • Some legal action (not necessarily related to you personally) results in a large amount of Facebook data (including yours) seized and exported for legal review
  • Facebook loses some backup media
  • Facebook gets acquired (or it goes out of business and its assets are sold to the highest bidder)
  • Facebook (or whoever runs their hardware) disposes of hardware without properly wiping it
  • [Added 2012/3/8] Your employer or school demands that you hand over your account password (or “friend” a monitor)
  • Etc…

All in all, you should not think of these privacy settings as locks protecting your data. Think of them as simply a “do not disturb” sign (or a necktie…) hanging on the knob of an unlocked door. I am not advising against using privacy settings, just against counting on them to work reliably. If you’d rather your work colleagues don’t see your holiday pictures, then set your privacy settings so they can’t see them. But if it would really bother you if they saw them, then don’t post the pictures on Facebook at all. Think of it like keeping a photo in your wallet. You get to choose who you show it to, until the day you forget your wallet in the office bathroom, or at a party, and someone opens it to find the owner. You already know this instinctively, which is why you probably wouldn’t carry photos in your wallet that shouldn’t be shown publicly. It’s the same on Facebook.

This is what was so disturbing about the Buzz/GMail privacy fiasco. It took data (your list of GMail contacts) that was not created for the purpose of sharing it with anyone, and turned this into profile data in a social network. People who signed up for GMail didn’t sign up for a social network, they signed up for a Web-based email. What Google wants, on the other hand, is a large social network like Facebook, so it tried to make GMail into one by auto-following GMail contacts in your Buzz profile. It’s as if your insurance company suddenly decided it wanted to enter the social networking business and announced one day that you were now “friends” with all their customers who share the same medical condition. And will you please log in and update your privacy settings if you have a problem with that, you backward-looking, privacy-hugging, profit-dissipating idiot.

On the other hand, that’s one thing I like about Twitter. By and large (except for the few people who lock their accounts) almost all the information you put in Twitter is expected to be public. There is no misrepresentation, confusion or surprise. I don’t consider this lack of configurable privacy as a sign that Twitter doesn’t respect the privacy of its users. To the contrary, I almost see this as the most privacy-friendly approach: make it clear that everything is public. Because it is anyway.

One could almost make a counter-intuitive case that providing privacy settings is anti-privacy because it gives an unwarranted sense of security and nudges users towards providing more private data than they otherwise would. At least if the privacy settings are not contractual (can you sue Facebook for changing its privacy terms on you?). It’s been working that way so far for Facebook, intentionally or not, as illustrated by all the articles that stress the importance of setting our privacy settings right (implicit message: it’s ok to put private information on the site as long as you set your privacy settings).

Yes, you should have clear privacy settings. But the place to store them is in your brain and the place to enforce them is by controlling what your fingers do before data gets on Facebook. Facebook and similar networks can only leak data that they possess. A lot of that data comes from you directly uploading it. And that’s the point where you have control. After this, you really don’t. Other data comes from tracking and analyzing your activities and connections, without explicit data upload from you. That’s a lot harder for you to control (you rarely even get asked for your privacy preferences on this data), but that’s out of scope for this blog entry.

Just like banks that are too big to fail are too big to exist, data that is too sensitive to leak from Facebook is too sensitive to be on Facebook.

Filed under Everything, Facebook, Google, Off-topic, Security, Social networks, Twitter

There should be a word for this (Blog/Twitter edition) part 2

Back in October (see “there should be a word for this” part 1) I listed a few concepts (related to twitter and/or blogging) for which new words were needed. Since it’s such a rich field, I barely scratched the surface. Here is the second installment.

#9 The temptation to repeat a brilliant tweet of yours that went unnoticed when you expected a RT storm in response (maybe it was a bad time of the day when everyone was offline? maybe it fell in a twitter mini-outage?)

#10 The new pair of eyes you get the second after you post a tweet.

#11 The act of sharing (e.g. via delicious…) or RTing a URL to an article you haven’t actually read (but you think it makes you look smart). For example, I’d love to give a test to everyone who RTed this entry.

#12 The shock of seeing a delivery error when DMing someone you were positive was following you (this is related to definition #1 from part 1, so Shlomo’s followimp could apply).

#13 The minimum number of people to follow on twitter, of blog feeds to subscribe to and of Facebook friends to have such that you can cycle through all three continuously and never run out of new content. In the TV world, the equivalent would be the minimum number of cable channels needed to cycle through them and never feel like you’ve established that there is nothing worth watching.

#14 The awful feeling when the twitter/blog/facebook cycle from #13 breaks on a Friday night because others have a life.

#15 When a twitter conversation has reached a dead-end because of the short form. When the response you get makes you wonder what the other person understood from your last tweet. But forcing a clarification would take a half-dozen tweets at least and risk turning you into a twoll (another coinage for the twitter era, by Andi Mann).

#16 The compression rate of a sentence: how hard it is to further compress it (e.g. in order to squeeze in an RT comment), whether all the easy shortcuts have been taken already.

Please submit your candidate terms for these definitions.

[UPDATED 2010/8/12: there is now a part 3.]

Filed under Everything, Media, Off-topic, Social networks, Twitter

Expanding on “twitter with a brain”

Chuck Shotton recently made a compelling case (“Twitter with a Brain“) for Twitter tools to allow the user to change the protocol endpoint. That is, instead of always going to twitter.com, you can tell your Twitter client to send all requests to myTweetInterceptor.me.com. Why would you do this? You should read his blog entry, but in short his point is that the intermediary can add all kinds of new features that neither the Twitter client nor Twitter itself support. As always in computer science, a new level of decoupling adds opportunities for extensions (and breakage too, of course).

I fully agree with what he writes and I would very much like to see his call to action answered. In fact, I want more than what he is asking for. So here is my call to action:

1) It’s not just Twitter

Why just Twitter? This should be true for any client using any protocol. Why not also the APIs for the various Google and Yahoo services? The APIs for the other social networks beyond Twitter? For shopping sites like Amazon and eBay? Etc. And of course for all the various Cloud providers out there. Just because I am using the Amazon EC2 API it doesn’t mean I necessarily want the requests to go straight to Amazon. Client tools should always make the endpoint configurable, period.

2) It’s not just the clients, it’s also (and especially) the third party sites

Chuck’s examples are about features that the Twitter clients could provide but don’t, so an intermediary would be an easy way to hack support for them (other ways presumably include modifying the client, if it is open source; writing a plug-in for it, if there is such a mechanism; or running a network interceptor on the local machine, unless the protocol is encrypted).

That’s nice and I’d love to see this, but the big deal for me is less with clients and more with third party sites. You know, all these sites that ask for your Twitter login/password. Or those that ask for your GMail/Yahoo account info to retrieve a list of your contacts. I never grant these requests, but I would consider it if they allowed me to tell them what endpoint URL to use. For example, rather than using my Twitter login to go straight to twitter.com, they would use a login/password that I create and talk to twitterIntermediary.vambenepe.com. The requests would be in the exact same shape as what they send today to Twitter, just directed to another URL. There, I could have a proxy that only allows some requests (e.g. “update twitter background image” but not “send update”) and forwards them using my real Twitter credentials. Or, for email accounts, I could have a proxy that allows requests that read my address book but not those that read my mails. The goal here is not to add features, it is to delegate trust in a fine-grained (and audited) manner. This, to me, is the burning need, rather than a 3rd place to implement Twitter lists.
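
Here is a rough sketch, in Python, of the kind of filtering proxy I have in mind. All the names are made up for illustration (`FilteringProxy`, the operation names, the `fake_upstream` function that stands in for the real service); none of this is the actual Twitter API. The third-party site only ever sees the throwaway login/password, the proxy forwards only the operations on an allow-list, and every decision goes into an audit log.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class FilteringProxy:
    allowed_operations: Set[str]          # operations I am willing to delegate
    proxy_credentials: Dict[str, str]     # throwaway login -> password given to the site
    forward: Callable[[str, dict], str]   # sends the request upstream with my real credentials
    audit_log: List[str] = field(default_factory=list)

    def handle(self, login: str, password: str, operation: str, params: dict) -> str:
        # Step 1: the caller must present the credentials I created for it.
        if self.proxy_credentials.get(login) != password:
            self.audit_log.append(f"DENIED bad-credentials {login} {operation}")
            raise PermissionError("unknown proxy credentials")
        # Step 2: only allow-listed operations get through.
        if operation not in self.allowed_operations:
            self.audit_log.append(f"DENIED {login} {operation}")
            raise PermissionError(f"operation not allowed: {operation}")
        # Step 3: record the call, then forward it unchanged.
        self.audit_log.append(f"ALLOWED {login} {operation}")
        return self.forward(operation, params)


def fake_upstream(operation: str, params: dict) -> str:
    """Stand-in for the real service; a real proxy would make an HTTP call here."""
    return f"upstream handled {operation}"


proxy = FilteringProxy(
    allowed_operations={"update_background_image"},
    proxy_credentials={"thirdparty-site": "throwaway-password"},
    forward=fake_upstream,
)

result = proxy.handle(
    "thirdparty-site", "throwaway-password",
    "update_background_image", {"image": "bg.png"},
)

try:
    proxy.handle("thirdparty-site", "throwaway-password", "send_update", {"status": "hello"})
    blocked = False
except PermissionError:
    blocked = True

print(result, blocked, proxy.audit_log)
```

The real credentials never leave the proxy, and revoking a site’s access is just a matter of deleting its throwaway login.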

I would probably write these proxies using a PaaS platform like the Google App Engine. Or maybe even Yahoo Pipes. I have long struggled to think of use cases for which Yahoo Pipes hits the sweet spot, and this may well be it. Especially if people write modules to handle specific APIs (e.g. a “Twitter API” module that shows all operations and lets you enable/disable them one by one in a pipe). The one thing missing would be a way for a pipe to keep a log of its invocations, for auditing.

You want access to my email and social network accounts? Give me the ability to filter your requests and you’ll get access. If it’s blind trust you want, I am afraid I have a very limited supply.

[Note: I wanted to add this as a comment on Chuck’s blog, but he doesn’t seem to allow them: “go start your own blog and/or shut up and eat your vegetables” is his recommendation. Since I already have my own blog, I guess I don’t have to eat my vegetables if I don’t want to. I just hope my kids don’t learn about this rule or they’ll be blogging in no time.]

[UPDATED 2009/11/30: WRT Chuck’s request, it looks like it’s being done already. But no luck with the third party sites so far, which is what I most want to see.]

Filed under Automation, Everything, Google App Engine, Implementation, Mashup, PaaS, Portability, Protocols, Security, Social networks, Twitter, Yahoo

There should be a word for this (Blog/Twitter edition)

I used to enjoy finishing The Atlantic with Barbara Wallraff’s “Word Fugitives” column every month. Until earlier this year, when it was replaced with Jeffrey Goldberg’s attempts at humor. For old times’ sake, I am borrowing the “Word Fugitives” format and applying it to the world of blogs and tweets. Here is a list of blog/twitter situations for which “there should be a word”.

#1 The ego-crushing realization, in the course of a face to face conversation covering topics you’ve written about, that the other person has not read your blog/tweets on this. Even though the first thing they told you when you met 10 minutes earlier is that they love your blog.

Candidate: followimp (from Shlomo).

#2 Conversely when someone brings up in the conversation something you wrote and had forgotten you did (maybe we need two words here, one if you are happy to be reminded of this and one if you’d rather not have been).

Candidates: twegreat and twegrets, respectively (from Shlomo).

#3 Seeing the corner of the blogo-twitto-sphere where you hang out light up in response to someone’s post even though you wrote up the same thing two years ago. At least you were trying to explain the same thing, but your brilliance went unnoticed.

Candidate: deja-lu.

#4 The frustrating (for system modelers at least) intermixing of data (your text) and metadata (e.g. the identification of the tweet you are responding to) in Twitter conversations.

Candidate: metamess.

#5 (This one comes from @Beaker) The art of carving up tweets from others to be able to retweet them in 140 characters.

Hoff has a suggestion: Twexter (Twitter + Dexter).

#6 The art of guessing early the Twitter #hashtag that will emerge as a winner for a given topic.

Candidate: foretweetude.

#7 The frustration of having too many blog drafts and no time to write them up.

Candidate: blocrastination. And Neil WD offered logjam in the comments.

#8 (added on 2009/10/22 after seeing this) The feeling of nakedness one has while his/her blog is offline.

Candidate: e-vanescence.

[UPDATED 2010/3/8: See part 2 for more.]

[UPDATED 2010/8/12: And part 3.]

Filed under Everything, Media, Off-topic, Social networks, Twitter

On Twitter

I created the @vambenepe Twitter account a while ago to reserve the username. Yesterday I posted three tweets, so I guess I am now “on Twitter”, in case anybody cares. We’ll see where this goes. @jamesurquhart gave me a kind (but intimidating) welcome and @Beaker hasn’t called me a “jackass” yet, so things are looking good. BTW, is it just me or has Cisco assembled a top-notch good cop / bad cop team? I hope I manage my blog-to-twitter expansion as well as they did.

The Cloud stuff is where the fun is, but if this Twitter thing is going to be of any use for real work I need to find who to follow in the IT management, application management and systems modeling areas. Any suggestions beyond @cote, @MouthOfOpenNMS, @dmcclure, @puppetmasterd and @theitskeptic (I feel like I am just Twitterifying my blogroll)?

And even then, finding people to follow seems to be the easy part. It took me about 20 minutes last night to realize that I am not going to read all the tweets (and I currently only follow 18 people). Worst case I’ll just track the direct mentions of my handle and some occasional hashtags during interesting announcements. And scan the rest once a week. I assume that’s what the Twitter natives like @cote do as well (I seeded my list by picking names I recognized from his 1,130-long follow list). Advice?

The other issue is the 140-character limit of course, but this should be easier to get used to. In the Apple/Palm tweet last night (about how this might show us what enforcement options standards bodies have) I wanted to invoke Stalin’s dismissive “The Pope! How many divisions has he got?” quote by replacing the pope with the USB Implementers Forum (USB-IF). But no room left unless I sacrificed the image of a working group chair breaking the kneecap of an offending implementer (which, as an ex-WG chair myself, I see some upside to).

Is it bad form to post multi-part tweets? How about, say, 50 parts? I need a protocol to guarantee delivery and order on top of the Twitter API. Maybe REST-* can help me… ;-)
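
Half-seriously, the order and missing-part side of such a protocol could be sketched in a few lines of Python. This is only a toy under my own assumptions: a fixed-width “index/total ” header (so at most 99 parts), splitting by character count rather than on word boundaries, and no actual call to the Twitter API. The function names are invented for the example.

```python
from typing import Dict, List

LIMIT = 140
PREFIX_LEN = len("01/50 ")  # fixed-width "index/total " header, up to 99 parts


def split_message(text: str) -> List[str]:
    """Cut text into numbered parts that each fit within LIMIT characters."""
    size = LIMIT - PREFIX_LEN
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    if len(chunks) > 99:
        raise ValueError("message too long for a two-digit part counter")
    total = len(chunks)
    return [f"{n:02d}/{total:02d} {chunk}" for n, chunk in enumerate(chunks, start=1)]


def reassemble(parts: List[str]) -> str:
    """Rebuild the text from parts received in any order; fail if one is missing."""
    if not parts:
        raise ValueError("no parts received")
    received: Dict[int, str] = {}
    total = 0
    for part in parts:
        header, chunk = part[:PREFIX_LEN], part[PREFIX_LEN:]
        index, total = (int(x) for x in header.strip().split("/"))
        received[index] = chunk  # a duplicate part simply overwrites itself
    missing = [n for n in range(1, total + 1) if n not in received]
    if missing:
        raise ValueError(f"missing parts: {missing}")  # the receiver would ask for a resend
    return "".join(received[n] for n in range(1, total + 1))


long_text = "x" * 300
parts = split_message(long_text)
print(len(parts), [len(p) for p in parts])
print(reassemble(list(reversed(parts))) == long_text)
```

Detecting a missing part is as far as this goes; actually getting it resent is left to the politeness of your followers.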

I also wanted to ping Andy Updegrove with the hope that he’d comment on the USB-IF letter (he has looked at the iPhone before, but not this specific issue) for an authoritative opinion. But he doesn’t seem to be on Twitter. The nerve!

And then there is the “follower” thing, which I guess I am now supposed to start obsessing about (folks, if I don’t have a hundred followers by week end the kitten gets it).

In the real world, there are a few people who return my emails and occasionally agree to have lunch with me, but that’s a far cry from calling them “followers”. Even my wife would spit her coffee if I referred to her as my “follower”. But on Twitter, I just posted three tweets yesterday and I already feel like a religious guru with my 24 “followers”.

Jokes aside (on the cult-leader overtones of the word “follower”), the fact that these people are identified is a nice improvement over blog subscribers (who, to me, are just an occasional number within the user-agent field in my Apache httpd logs), at least until they comment/email. Nice to “see” you.

One more step down the slippery slope towards total egomania. Blog > Twitter > Live webcam of the inside of my stomach.

Filed under Everything, Media, Off-topic, Social networks, Twitter