Breaking the Netflix Prize dataset


Hell, this is good work. In October last year, Netflix released over 100 million movie ratings made by 500,000 subscribers to their online DVD rental service. The company then offered a prize of $1 million to anyone who could better the company’s system of DVD recommendation by 10 per cent or more.

Of course, Netflix assured everybody that the data had been anonymized by removing any personal details.

That turns out to have been a tad optimistic. Arvind Narayanan and Vitaly Shmatikov at the University of Texas at Austin have just de-anonymized it.

Here’s how: turns out that an individual’s set of ratings and the dates on which they were made are pretty unique, particularly if the ratings involve films outside the most popular 100 movies. So it’s straightforward to find a match by comparing the anonymized data against publicly available ratings on the Internet Movie Database (IMDb).

That’s exactly what Narayanan and Shmatikov have done. And get this: once the match is made, it immediately links the user to any private ratings on the Netflix database.

“Given a user’s public IMDb ratings, which the user posted voluntarily to selectively reveal some of his (or her; but we’ll use the male pronoun without loss of generality) movie likes and dislikes, we discover all the ratings that he entered privately into the Netflix system, presumably expecting that they will remain private.”
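For the curious, the matching idea can be sketched in a few lines of Python. This is a toy illustration, not the authors' actual algorithm: the data, the function names (`movie_weights`, `score`, `best_match`), the rarity weighting, the tolerances on stars and dates, and the `min_margin` cutoff are all assumptions made up for the example.

```python
from datetime import date
from math import log

# Toy data, invented for illustration: record id -> {movie: (stars, date rated)}.
anonymized = {
    "user_17": {
        "Popular Hit": (4, date(2005, 3, 1)),
        "Obscure Documentary": (5, date(2005, 3, 4)),
        "Cult Series": (5, date(2005, 4, 10)),
        "Private Pick": (1, date(2005, 5, 2)),
    },
    "user_42": {
        "Popular Hit": (4, date(2005, 3, 2)),
        "Another Hit": (3, date(2005, 6, 1)),
    },
    "user_99": {
        "Popular Hit": (5, date(2005, 1, 15)),
        "Cult Series": (2, date(2004, 11, 3)),
    },
}

# Hypothetical public profile, standing in for ratings someone posted openly.
public_profile = {
    "Popular Hit": (4, date(2005, 3, 2)),
    "Obscure Documentary": (5, date(2005, 3, 6)),
    "Cult Series": (5, date(2005, 4, 12)),
}


def movie_weights(records):
    """Weight each movie so that rarely rated titles count for more (assumed scheme)."""
    counts = {}
    for ratings in records.values():
        for movie in ratings:
            counts[movie] = counts.get(movie, 0) + 1
    return {movie: 1.0 / log(1 + n) for movie, n in counts.items()}


def score(public, record, weights, max_star_gap=1, max_day_gap=14):
    """Sum the weights of movies whose rating and date roughly agree."""
    total = 0.0
    for movie, (stars, when) in public.items():
        if movie not in record:
            continue
        rec_stars, rec_when = record[movie]
        close_stars = abs(stars - rec_stars) <= max_star_gap
        close_dates = abs((when - rec_when).days) <= max_day_gap
        if close_stars and close_dates:
            total += weights.get(movie, 0.0)
    return total


def best_match(public, records, min_margin=0.5):
    """Return the best-scoring record id, or None if the runner-up is too close.

    Assumes at least two records, so that a runner-up exists.
    """
    weights = movie_weights(records)
    ranked = sorted(
        ((score(public, rec, weights), uid) for uid, rec in records.items()),
        reverse=True,
    )
    (top_score, top_id), (runner_up_score, _) = ranked[0], ranked[1]
    if top_score - runner_up_score < min_margin:
        return None
    return top_id


match = best_match(public_profile, anonymized)
# Everything in the matched record that was never posted publicly is now exposed.
private = {m: r for m, r in anonymized[match].items() if m not in public_profile}
print(match, private)
```

On this made-up data the public profile lines up with `user_17` only, and the leftover "Private Pick" rating is what leaks. Note that a profile containing nothing but a rating for "Popular Hit" would tie between two records and return `None`, which is the toy version of why the less popular films do most of the work.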

So what, I hear ya ask.

Here’s what the dynamic duo have to say about one person whose data they outed:

“First, we can immediately find his political orientation based on his strong opinions about “Power and Terror: Noam Chomsky in Our Times” and “Fahrenheit 9/11.” Strong guesses about his religious views can be made based on his ratings on “Jesus of Nazareth” and “The Gospel of John”. He did not like “Super Size Me” at all; perhaps this implies something about his physical size? Both items that we found with predominantly gay themes, “Bent” and “Queer as folk” were rated one star out of five. He is a cultish follower of “Mystery Science Theater 3000”. This is far from all we found about this one person, but having made our point, we will spare the reader further lurid details.”

So Netflix may have inadvertently revealed the political affiliation, sexual orientation, BMI and God-knows-what else of 500,000 of their subscribers. Way to go!

Next up: the mobile phone datasets we talked about a coupla weeks back.

Ref: arxiv.org/abs/cs/0610105 : Robust De-anonymization of Large Datasets (How to Break Anonymity of the Netflix Prize Dataset)
