Agriculture Department and Census Bureau to the rescue

An article in today’s New York Times reports that “the Social Security numbers of tens of thousands of people who received loans or other financial assistance from two Agriculture Department programs were disclosed for years in a publicly available database”.

Almost there, folks! But tens of thousands is not enough, we need to cover everyone. The simplest effective way to dent the “identity-theft” (or more exactly “impersonation”) wave is to go beyond this first step and publish on a publicly accessible web site all social security numbers ever issued and the associated names. And get rid once and for all of the hypocritical assumption that SSNs have any authentication value. We need a reliable authentication infrastructure (either publicly-run as a government service or privately-run, that’s a topic for another day) and this SSN-based comedy is preventing its emergence by giving credit issuers (and others) a cheap and easy way to pretend that they have authenticated their customers.

Over the last couple of years, I have received two alerts that my SSN and other data have been “compromised” (one when Fidelity lost a laptop containing data about everyone enrolled in HP’s retirement plan and one from a university) and my wife has received three. Doesn’t this sound like a bad joke going on for too long (and I should know about bad jokes going on for too long, they are my specialty)? And of course this doesn’t count the thousands of employees at dentists’ offices, medical offices and many other businesses who have at some point had access to my data (and everybody else’s).

So, to the IT people at the Census Bureau I say “keep going”! But of course that’s not the reaction they had. The rest of the NY Times article goes on with the usual hypocritical (or uninformed) lamentations about putting people’s identities at risk. “We took swift action when this was brought to our attention, and took the information down,” says an Agriculture Department spokeswoman. And of course there is the usual “credit report monitoring” offer (allowing the credit report agencies to benefit from both sides of the SSN-for-authentication debacle). All of this is oblivious to the reality, even though it manifests itself further down in the article: “The database […] is used by many federal and state agencies, by researchers, by journalists and by other private citizens to track government spending. Thousands of copies of the database exist.”

Another quote from the article: “Federal agencies are under strict obligations to limit the use of Social Security numbers as an identifier”. The SSN is a fine identifier. It’s using it as a means of authentication that’s the problem.

[UPDATE] This is now a Slashdot thread. The comments are pouring in. Some get it (like here, and here). This one seems to get it too but then goes on to advocate dismantling the social security system which at this point is only connected by name to the issue at hand.

[UPDATED 2008/7/2: Sigh, sigh and more sigh while reading this article. The cat is so far out of the bag that a colony of mice has taken up residence in it. The goal shouldn’t be to try to make the SSN hard to get, it should be to make it useless to criminals. That approach isn’t even mentioned in the article.]


PowerPoint abuse

The National Security Archive is not a government organization, even though its name may sound like one, but a research institute hosted by The George Washington University. The Archive published today a copy of “CentCom PowerPoint Slides Briefed to White House and Rumsfeld in 2002, Obtained by National Security Archive through Freedom of Information Act”. A very interesting read, but commenting on the meat of this is way off-topic for this blog (I have an “off-topic” category, but not a “way off-topic” one). One side aspect is only a little bit off-topic though, so I’ll indulge myself: it’s about this reflection on the use of PowerPoint, by Lt. Gen. McKiernan as quoted in Thomas Ricks’ book Fiasco:

It’s quite frustrating the way this works, but the way we do things nowadays is combatant commanders brief their products in PowerPoint up in Washington to OSD and Secretary of Defense… In lieu of an order, or a frag [fragmentary] order, or plan, you get a set of PowerPoint slides… [T]hat is frustrating, because nobody wants to plan against PowerPoint slides.

It’s an old debate whether PowerPoint is a mostly good tool for presentations (that is often misused) or basically a crappy tool for presentations (if you’re in the Bay Area I can lend you the Tufte essay). I generally tend to fall towards the former view. But what should not be a matter of debate is whether a PowerPoint document is a good communication vehicle on its own (rather than as support for a presentation). I very much agree with Lt. Gen. McKiernan that it is definitely not. And by only changing a few words I could turn his quote into one that describes some interactions in several software companies I know of, including my employer. And I would guess non-software companies too: there is no reason why this would be limited to the software industry and the military.


Too late.

A larger (smoke) screen

After the 12% temperature rise, I recently ran into another creative use of percentages. Since I expect to run into many more of these (based on how many I’ve noticed in the past) and since they’re fun to point out I’ve created a new CrazyStats category.

This instance comes from a print advertisement for Samsung TVs, stating that their TVs with a 16:10 aspect ratio offer 30% more viewing surface than a 4:3 TV. Sorry, I don’t have a link but this advertisement (for computer monitors instead of TVs) repeats the “larger than 4:3 monitor” claim several times, albeit without quantifying it. This comparison makes no sense until you fix one dimension. And obviously it is to the advantage of the 16:10 screens to fix the height as being common between the two screens and then compare the surface (but even then, you only get a 20% advantage for the 16:10 compared to the 4:3, I don’t know how they came up with 30%). But if you fix the width as being the same then it’s the 4:3 that offers 20% more viewing surface…

Not that I don’t agree that 16:10 is a more useful aspect ratio (that’s what I bought for my monitor at home). But the “larger than 4:3” claim is meaningless. Next thing you know, people will start marketing 4:3 monitors as “16:12” to make them seem “bigger” than 16:10 monitors.
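For readers who want to check the arithmetic, here is a minimal sketch of the comparison. The function names are made up for the illustration. The equal-diagonal case is not discussed above; it is added here only because screens are commonly sold by diagonal size, so treat it as an extra assumption.

```python
from fractions import Fraction


def area_fixed_height(w, h):
    """Area of a w:h screen whose height is 1 unit."""
    return Fraction(w, h)


def area_fixed_width(w, h):
    """Area of a w:h screen whose width is 1 unit."""
    return Fraction(h, w)


def area_fixed_diagonal(w, h):
    """Area of a w:h screen whose diagonal is 1 unit.

    width = w / sqrt(w^2 + h^2) and height = h / sqrt(w^2 + h^2),
    so the area is w * h / (w^2 + h^2).
    """
    return Fraction(w * h, w * w + h * h)


def advantage_percent(bigger, smaller):
    """By how many percent 'bigger' exceeds 'smaller'."""
    return float((bigger / smaller - 1) * 100)


WIDE = (16, 10)
CLASSIC = (4, 3)

# Same height: the 16:10 screen wins.
same_height = advantage_percent(area_fixed_height(*WIDE), area_fixed_height(*CLASSIC))
# Same width: the 4:3 screen wins by the same margin.
same_width = advantage_percent(area_fixed_width(*CLASSIC), area_fixed_width(*WIDE))
# Same diagonal (extra case, not from the advertisement): the 4:3 screen wins again.
same_diagonal = advantage_percent(area_fixed_diagonal(*CLASSIC), area_fixed_diagonal(*WIDE))

# A "16:12" monitor is just a 4:3 monitor with a bigger-sounding label.
sixteen_twelve_is_four_three = Fraction(16, 12) == Fraction(4, 3)

print(same_height, same_width, same_diagonal, sixteen_twelve_is_four_three)
```

Whichever dimension is held fixed, 30% does not come out of these ratios.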

Commedia dell (stand)arte

There seems to be a micro-culture of people involved in internet standards. If you measure a micro-culture by the number of its private jokes, then this is definitely one. And there are other signs. A while back, I wrote about the mnot standard geek index. Now Umit has captured the essence of standards interactions in verses. And, according to Umit’s blog post, Jonathan is the Claude Levi-Strauss of this culture (I wasn’t at this presentation, Jonathan please send me the slides if you won’t post them).

One aspect that Umit doesn’t cover in her poems is the always-entertaining issue of naming things. I don’t have her talent for verses, so here is, in plain terms, what the discussion often sounds like:

Bob: I think we need to be able to (…). And the best way to do it is by adding an element. I’ll call it Foo for now in my description of how it would work, but I don’t care how we end up calling it as long as the feature is supported.
(30 minutes of discussions about element Foo and how it works, which ends with an agreement)
Chairperson: Great, so we agree to add element Foo.
Alice: Yes, but we need another name. I am not one to argue about names, but Foo makes it sound like (…). We should call it Bar.
Bob: No, we can’t call it Bar, people would think it is used for (…). I don’t really care about names either, but in this case Foo is the best name.
(4 hours of discussions about the name, that end up with a resolution that will be overturned a couple of times before the spec is completed)

If you think I am exaggerating, I know of a set of patterns for which we had all agreed on the definitions but we could not agree on the names. Since there happened to be 7 of them, we almost ended up naming them after the 7 dwarfs as a tie-breaker. Those are the WSDL 2.0 message exchange patterns. And in retrospect it’s good that we didn’t go with the dwarfs since an eighth one was later added (after I left the group). They now have names that sound like they come out of the Kama Sutra: In-Only, Robust In-Only, In-Out, In-Optional-Out, Out-Only, Robust Out-Only, Out-In, Out-Optional-In.

Harlequin and Pantalone would be proud.


Too hot to count

Here is another one to file in the “lies, damn lies and statistics” category: an article dated yesterday titled “Dutch bask in warmest autumn in three centuries” that starts with “The autumn of 2006 has been the warmest in the Netherlands for over 300 years, 12.5 percent hotter than the previous year which was already a record, meteorologists said.” We find out later where this 12.5% comes from: “The average temperature for the months leading up to November 17 was up to 13.5 degrees (56 degrees F), as compared to 12 degrees last year.” Except that such percentages don’t make much sense when applied to units that have an arbitrary zero. The same calculation using Fahrenheit degrees results in only a 5% temperature rise. Use the Kelvin scale and you’re down to a paltry 0.5% rise. Now imagine that a city has a 0.5C average one year and 1.5C average the next year. That’s a 200% increase in temperature! And if you live in a place that at some point gets a 0 degree temperature average, I would recommend moving out before the next year because you’re very likely to experience a terrifying *infinite* rise (or decrease) in temperature the following year!
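The numbers above are easy to reproduce. The following is only a sketch: the helper names are invented for the illustration, and the inputs are the two Celsius averages quoted in the article.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32


def celsius_to_kelvin(c):
    return c + 273.15


last_year_c, this_year_c = 12.0, 13.5

# Same physical warming, three different "percentages".
rise_celsius = percent_change(last_year_c, this_year_c)
rise_fahrenheit = percent_change(
    celsius_to_fahrenheit(last_year_c), celsius_to_fahrenheit(this_year_c)
)
rise_kelvin = percent_change(
    celsius_to_kelvin(last_year_c), celsius_to_kelvin(this_year_c)
)
print(round(rise_celsius, 1), round(rise_fahrenheit, 1), round(rise_kelvin, 1))

# The hypothetical city going from 0.5C to 1.5C.
rise_cold_city = percent_change(0.5, 1.5)

# And the city that averaged exactly 0 degrees the first year.
try:
    percent_change(0.0, 1.0)
    zero_baseline_fails = False
except ZeroDivisionError:
    zero_baseline_fails = True
```

Only the Kelvin figure is a ratio of physically meaningful quantities, since Kelvin is the only one of the three scales whose zero is not arbitrary.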

This article comes from Agence France-Presse. And the French education system is criticized for putting too much emphasis on Mathematics…

Working backwards

Werner Vogels describes Amazon’s approach to product definition as working backwards, starting with, in order, the press release, the FAQ, a definition of the customer experience and the user manual(s). A few comments:

  • When I was an R&D manager we didn’t go as far as starting from the press release. Well, to be honest none of the projects I managed was big enough to get its own press release anyway (things are different now that some of my work is on multi-company standards efforts where press releases are cheap). But we did write the user manual first in some cases. I can testify that in addition to providing a lot of clarity for the development team it also results in much better user manuals, because they are written based on what the thing does, rather than based on how the thing is implemented, as is often the case with after-the-fact user manuals.
  • When you do it the way Werner Vogels describes, the FAQ is more a list of “expected questions” than a real list of “frequently asked questions” but that still beats 80% of FAQs out there that are lists of “questions we’d like you to ask”.
  • This kind of reminds me of the French approach to dating. Starting from the end and working your way back to small talk.

Registry or not?

I recently had a meeting with people who practically could not imagine a form of discovery that didn’t involve a god-like central registry. Notifications and peer-to-peer relationships were heretical ideas on this call. Of course registries are good. And repositories are even better. But a registry is not the only way to discover services and it shouldn’t be. The delicious irony is that the meeting used NetMeeting and that we spent the first 5 minutes of the call repeating the IP address of the person hosting the NetMeeting to every single new participant upon joining, instead of simply using the registry that was available (NetMeeting’s directory).

The mnot standard geek index

Sometimes Amazon scares me. Last night I was browsing the site looking at some novels (nothing whatsoever to do with technology) and here is what I saw on the left side bar: a suggestion for an advice list called “So you’d like to… be a standards geek” by an Amazon user called mnotting who of course turns out to be Mark Nottingham. The scary part is that I know for a fact that I wasn’t logged in to the Amazon site and there was no Amazon cookie on my disk. So either this was a complete (and unlikely) coincidence or Amazon uses the not-so-dynamic IP address provided by my DSL provider to try to recognize me. And even then, my Amazon profile clearly flags me as someone interested in technology among other things, but I don’t see how it would flag me as a standards person unless it reads my email…

In any case, this tempted me to measure my level of standard geekiness and the result is that I rank a 3 out of 8. To get to this ranking, I only looked at the list of books. I ignored the travel gadgets such as battery chargers and cell phones because there are so many of these that the chance of having a match is pretty slim (my personal recommendation for those who work a lot in airplanes is a Tablet PC).

So, focusing on the books, my three points on the mnot standards geek index come from:

  • Machiavelli’s “The Prince”. I read it in French but I assume it still counts.
  • Robert’s Rules of Order. I can’t say I read every single page but I’ve browsed it enough to know where to look for things. I received my copy (in a different edition than Mark’s) from the hands of OASIS’ Jamie Clark when the WS-Notification Technical Committee, which I co-chair with Peter Niblett, was created.
  • TBL’s “Weaving the Web” of which I talked in a previous blog entry (BTW Mark you might want to check the URL you provided for this book in your list, it is incorrect and causes Amazon to not list this book in the recap of all products you recommend).

I don’t really know what to think of my score of 3/8. So I’d like the other standards geeks out there (Chris, DaveC, DaveO, Glen, Jeff, Marc, Mark, Gudge, Sanjiva, Jorgen, Tom and many others) to take the test and report their results so I know how serious my case is.


IBM roll (of eyes)

Maybe the reason why IBM is getting out of the PC business is to focus on the next growth opportunity: sushi. This week, the WSDM face to face meeting was hosted by IBM in Boulder, CO. We had lunch at the cafeteria there and discovered that they sell an “IBM roll” in the sushi stand (Richard, if the picture with your camera phone works out please email it to me). In case you are wondering, an “IBM” roll is made of eel, cucumber and avocado. And yes, it is the most expensive of all rolls ($3.49 for 4 instead of $3.19). Insert your own joke about the sushi coming with high-priced IBM consultants to help with the chopsticks…

[UPDATE: Turns out the IBM cafeteria is not the only place where you can get an “IBM roll”. If you are a visitor without a knowledgeable host in the Bay Area you might end up at Miyake in Palo Alto where you’ll see floating boats boasting Miyake’s version of the “IBM roll”. Not sure if this is coincidence or not, but the composition is actually very similar to the one found in the IBM cafeteria: “unagi, crab stick, avocado, cucumber”. They just add crab (and bad music I was told).]

Natural selection favors Tablet PC owners

When I got my Tablet PC (an HP tc1100) I was interested in the form factor, the ability to take hand-written notes and the ease of reviewing documents in cramped spaces, like an airplane seat. But I just learned that the fact that the keyboard stays cold (because the disk and the CPU are behind the LCD screen) has an additional advantage I hadn’t thought of.

Feeds and feedback

I got a few requests for syndication URLs for this blog, so here is where you can find them: Four of them to choose from! But I have to agree that one URL that can be found on the page beats four that can’t be found… No good reason why this wasn’t on the page by default and it should be fixed soon. In the meantime, you now know where to find them if you hadn’t yet guessed (like many did and Bloglines seems to do by default) that adding “rss.xml”, “rdf.xml” or “atom.xml” at the end of the blog URL was worth a try.

The sad thing is that there wasn’t really a way to let me know of that problem (lack of feed URL) either since this blog doesn’t currently support comments (should be fixed soon) and doesn’t even provide my email address. Not that my address is hard to find on Google since my last name is not common. But, in the interest of rich metadata it should still be available on this blog (doesn’t this fit well in the much-quoted discussion between Adam “just Google it” Bosworth and Marc “give me metadata or give me death” Canter?). So, until it is available permanently on the page, here is where to send feedback:

The paint is still fresh…

[UPDATE: the blog has since moved and these URLs are not correct anymore. The RSS feed is now at]

32K ought to be enough for everyone

A couple of weeks back I had to spend time moving my Outlook local folder file (the *.pst) from where it was by default (somewhere like “c:\documents and settings\username\local settings\application data\microsoft\outlook”) to “c:\mail”. And I also had to rename my rules (e.g. from something like “WSDM mailing list” to “w1”). And to merge several rules into one (so for example now all emails from go to the same “WS-I” folder, I can’t have one subfolder per working group anymore). That took some time (both to figure out what I needed to do and to make the changes) and left me less productive than before (non-descriptive rule names make them harder to manage and I have lost some granularity in the filtering by consolidating filters). Aren’t Outlook rules supposed to make you more productive rather than less?

So why all the trouble? Simply because Exchange says all your rules have to fit in 32K. So it’s ok to have endless signatures with quotes from other people (that somehow prove that you’re smart) or contact info for your kid’s favorite party clown. But rules, despite being vital to managing the flood of incoming email when you subscribe to several mailing lists, only get 32K.

The most infuriating aspect is that I can’t figure out why that is. The rules I use are stored on the server but executed on the client. Clearly it can’t be a matter of storage space on the server. It stores dozens of megabytes of email for me. Turning 32K into 1MB would make little difference. And I’d be happy to settle for a tiny bit less email space for some badly needed rule storage space. It can’t be because of computing resources to execute the rules either. They run on my client machine, not on the server. And my 32K turn into less than 20 rules. Surely, 20 simple rules (the typical rule is “if this comes from mailing list foo put it in the foo folder”) can’t overwhelm my machine. And if they do, let me decide whether it’s worth it to me or not.

Of course this is all the fault of the WS-Addressing WG. I had postponed making the needed changes because of lack of time, but the crazy traffic on the WS-Addressing mailing list forced me to make room for another filter so that emails could be properly dispatched to the right folder. Ironically, this is for an addressing specification. My take-away is that if Microsoft is only going to give us 32K to dispatch SOAP messages with WS-Addressing headers (like they decided to do for email) then I don’t understand why they are so fond of reference properties and reference parameters. Hopefully Don won’t let the Exchange architects anywhere near Indigo. ;-)

