
AOL Research exposes data; we've got a little sick feeling

(AOL has responded, saying they screwed up, and has taken the data down. More in the update here).


Here are some excerpts from a post by Adam D'Angelo, over at Caltech, about AOL Research's efforts to engage with the research community. Does anyone else think they've gone over the line with this?

AOL just released the logs of all searches done by 500,000 of their users over the course of three months earlier this year. That means that if you happened to be randomly chosen as one of these users, everything you searched for from March to May (2006) is now public information on the internet.

...The data is "anonymized", which to AOL means that each screenname was replaced with a unique number. "It is still a research question how much information needs to be anonymized to protect users," says Abdur from AOL. Here are some examples of what you can find in the data:

Among user 545605's searches are "shore hills park mays landing nj", "frank william sindoni md", "ceramic ashtrays", "transfer money to china", and "capital gains on sale of house"....I'm leaving out the worst of it - searches for names of specific people, addresses, telephone numbers, illegal drugs, and more. There is no question that law enforcement, employers, or friends could figure out who some of these people are....I hope others can find more examples in the data, which is up for download over here (scroll down to the 500Kusers.tgz file).

If you go to the site, there's a person even thanking AOL for this info in comments. We haven't looked at this very closely yet, and haven't talked with AOL. But so far, we're cringing.
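To make the concern concrete, here is a minimal sketch of the kind of "anonymization" the quoted post describes, where each screenname is swapped for a unique number. It is not AOL's actual process or file format: the input shape (a list of screen name and query pairs), the function names `pseudonymize` and `group_by_user`, and the sample data are all invented for illustration. What it shows is that the number still ties every query from one person together, so anyone with the released data can rebuild a per-user search history.

```python
from collections import defaultdict
from itertools import count


def pseudonymize(log):
    """Replace each screen name with an arbitrary unique number.

    `log` is a list of (screen_name, query) pairs (an assumed shape).
    The same screen name always maps to the same number, so every
    query stays linked to the others from that user.
    """
    ids = {}
    next_id = count(1)
    released = []
    for name, query in log:
        if name not in ids:
            ids[name] = next(next_id)
        released.append((ids[name], query))
    return released


def group_by_user(released):
    """Rebuild each user's search history from the released records."""
    profiles = defaultdict(list)
    for user_id, query in released:
        profiles[user_id].append(query)
    return dict(profiles)


# Invented sample data, not taken from the AOL file.
log = [
    ("screenname_a", "parks near example town"),
    ("screenname_b", "ceramic mugs"),
    ("screenname_a", "dr jane example md"),
    ("screenname_a", "tax on selling a house"),
]

released = pseudonymize(log)
profiles = group_by_user(released)
for user_id, queries in profiles.items():
    print(user_id, queries)
```

The screen names are gone from `released`, but user 1's three queries still sit side by side, and queries like these can point to a town, a doctor, and a financial event for the same person.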


Trackbacks

Links to blogs that reference this entry:

From: UMBC eBiquity
Does AOL’s search data compromise privacy?
Excerpt: ...
Tracked: August 6, 2006 5:41 PM
From: A Day in the Life of an Information Security Investigator
AOL Blows It: Releases Search Data on 500,000 Users!
Excerpt: AOL, what the %#$@$@ were you thinking? You provide a 440MB file of search queries from 500,000 of your customers for anyone to download? Your idea of 'de-identifying' the data is to replace the screen name with an arbitrary number?...
Tracked: August 6, 2006 10:38 PM
From: fredshouse
AOL discloses 650,000 AOL users' search data
Excerpt: Well this isn't going to help AOL's image. Over the weekend, AOL researchers posted a 400MB+ tarball of the raw search query data of some 650K AOL users over the period from March 1, 2006 to May 30, 2006. While...
Tracked: August 7, 2006 1:23 AM
From: SiliconBeat
AOL responds to data leak. They screwed up.
Excerpt: John Battelle has gotten an early response from AOL about the data leak that we posted about early yesterday. Here's the summary: This was a screw up, and we're angry and upset about it. It was an innocent enough attempt to reach out to the academic c...
Tracked: August 7, 2006 8:52 AM
From: Zoli's Blog
AOL Just Did the Unthinkable - Boycott AOL?
Excerpt: (Updated) Thank you, Google for resisting the DOJ's effort to obtain user search data. You put up a good fight to protect our privacy, and...
Tracked: August 7, 2006 11:46 AM

From: Platinax News
AOL’s huge data blunder
Excerpt: SPECIAL REPORT AOL have released a big chunk of user data to the internet in a huge blunder. The data was a record of 20 million searches on the AOL search engine, carried out by 650,000 AOL users over March to May of this year. The data was pr...
Tracked: August 8, 2006 1:23 PM
From: Research
Research
Excerpt: The network promotes synthesis and comparative Search through our extensive index.JupiterResearch provides unbiased rese...
Tracked: August 9, 2006 4:08 PM

Comments

http://research.microsoft.com/ur/us/fundingopps/RFPs/Search_2006_RFP.aspx

a few months ago - Microsoft launched an analogous project

Search Engines WEB on August 6, 2006 9:31 PM

I dare you to compare this to the HIPAA standard of "de-identified" information that health organizations are now using as the standard to release data.

breakingranks on August 6, 2006 10:59 PM

if you don't want to download 2 gigs and grep your way through, here's a site that'll let you search from a database: http://www.aolsearchdatabase.com .

daniel on August 8, 2006 12:29 AM

search engine proxies have been around for at least a few years. Why don't people start using them?

here's a free one. http://www.blackboxsearch.com

Bob on August 9, 2006 8:13 PM