{"id":195,"date":"2007-12-31T00:36:41","date_gmt":"2007-12-31T05:36:41","guid":{"rendered":"http:\/\/arxivblog.com\/?p=195"},"modified":"2007-12-31T00:38:34","modified_gmt":"2007-12-31T05:38:34","slug":"the-top-posts-of-2007-number-2","status":"publish","type":"post","link":"http:\/\/arxivblog.com\/?p=195","title":{"rendered":"The top posts of 2007: number 2"},"content":{"rendered":"<p>Over the holiday period, the physics arxiv blog is re-running the most popular blogs (by page views) of 2007.<\/p>\n<blockquote><p><a href=\"http:\/\/arxivblog.com\/\/?p=142\">Breaking the Netflix prize dataset<\/a><br \/>\n27 November<\/p>\n<p><a href=\"http:\/\/arxivblog.com\/wp-content\/uploads\/2007\/11\/netflix.jpg\" title=\"Netflix data\"><img decoding=\"async\" src=\"http:\/\/arxivblog.com\/wp-content\/uploads\/2007\/11\/netflix.thumbnail.jpg\" alt=\"Netflix data\" \/><\/a><\/p>\n<p>Hell, this is good work. In October last year, Netflix released over 100 million movie ratings made by 500,000 subscribers to their online DVD rental service. The company then offered <a href=\"http:\/\/www.netflixprize.com\/\">a prize of $1 million<\/a> to anyone who could better the company\u2019s system of DVD recommendation by 10 per cent or more.<\/p>\n<p>Of course, Netflix assured everybody that the data had been anonymized by removing any personal details.<\/p>\n<p>That turns out to have been a tad optimistic. Arvind Narayanan and Vitaly Shmatikov at the University of Texas at Austin have just de-anonymized it.<\/p>\n<p>Here\u2019s how: it turns out that an individual\u2019s set of ratings and the dates on which they were made are pretty unique, particularly if the ratings involve films outside the most popular 100 movies. So it\u2019s straightforward to find a match by comparing the anonymized data against publicly available ratings on the Internet Movie Database (IMDb).<\/p>\n<p>That\u2019s exactly what Narayanan and Shmatikov have done. 
And get this, once the match is made, it immediately links the user to any private ratings on the Netflix database.<\/p>\n<blockquote><p><em>\u201cGiven a user\u2019s public IMDb ratings, which the user posted voluntarily to selectively reveal some of his (or her; but we\u2019ll use the male pronoun without loss of generality) movie likes and dislikes, we discover all the ratings that he entered privately into the Netflix system, presumably expecting that they will remain private.\u201d <\/em><\/p><\/blockquote>\n<p>So what, I hear ya ask.<\/p>\n<p>Here\u2019s what the dynamic duo have to say about one person whose data they outed:<\/p>\n<blockquote><p>\u201c<em>First, we can immediately find his political orientation based on his strong opinions about \u201cPower and Terror: Noam Chomsky in Our Times\u201d and \u201cFahrenheit 9\/11.\u201d Strong guesses about his religious views can be made based on his ratings on \u201cJesus of Nazareth\u201d and \u201cThe Gospel of John\u201d. He did not like \u201cSuper Size Me\u201d at all; perhaps this implies something about his physical size? Both items that we found with predominantly gay themes, \u201cBent\u201d and \u201cQueer as folk\u201d were rated one star out of five. He is a cultish follower of \u201cMystery Science Theater 3000\u201d. This is far from all we found about this one person, but having made our point, we will spare the reader further lurid details.\u201d<\/em><\/p><\/blockquote>\n<p>So Netflix may have inadvertently revealed the political affiliation, sexual orientation, BMI and God-knows-what else of 500,000 of their subscribers. 
Way to go!<\/p>\n<p>Next up: the <a href=\"http:\/\/arxivblog.com\/\/?p=88\">mobile phone datasets<\/a> we talked about a coupla weeks back.<\/p>\n<p>Ref: <a href=\"http:\/\/arxiv.org\/abs\/cs\/0610105\">arxiv.org\/abs\/cs\/0610105<\/a> : Robust De-anonymization of Large Datasets (How to Break Anonymity of the Netflix Prize Dataset)<\/p><\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>Over the holiday period, the physics arxiv blog is re-running the most popular blogs (by page views) of 2007. Breaking the Netflix prize dataset 27 November Hell, this is good work. In October last year, Netflix released over 100 million movie ratings made by 500,000 subscribers to their online DVD rental service. The company then [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[19],"tags":[],"class_list":["post-195","post","type-post","status-publish","format-standard","hentry","category-highlights"],"_links":{"self":[{"href":"http:\/\/arxivblog.com\/index.php?rest_route=\/wp\/v2\/posts\/195","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/arxivblog.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/arxivblog.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/arxivblog.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/arxivblog.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=195"}],"version-history":[{"count":0,"href":"http:\/\/arxivblog.com\/index.php?rest_route=\/wp\/v2\/posts\/195\/revisions"}],"wp:attachment":[{"href":"http:\/\/arxivblog.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=195"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/arxivblog.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=195"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/
\/arxivblog.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=195"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}