Predicting the popularity of online content

The page views for entries on this site in the last week range from more than 17,000 thousand for this story to around 100 for this one.

That just goes to show that when you post a blog entry, there’s no way of knowing how popular it will become. Right?

Not according to Gabor Szabo and Bernardo Huberman at HP Labs in Palo Alto, who reckon they can accurately forecast a story’s page views a month in advance by analysing its popularity during its first two hours on Digg.

They say the same can be done for YouTube postings, except that these need to be measured for 10 days before a similarly accurate forecast is possible. (That’s almost certainly because Digg stories quickly become outdated, while YouTube videos are still found long after they have been submitted.)

That’s not so astounding if all (or at least most) content has a similar long-tail-type viewing distribution. If the shape is the same for everything, measuring the early part of the curve tells you how the rest will be distributed.

But actually proving this experimentally is more impressive. In principle, it gives hosts a way of allocating resources such as bandwidth well in advance, which could be useful, especially if you can charge in advance too.
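To make the extrapolation idea concrete, here is a minimal sketch in Python. It assumes that the logarithm of later popularity is roughly a straight-line function of the logarithm of early popularity, which is my reading of the kind of relationship the paper relies on, not a reproduction of the authors' method. The function names and the view counts are invented purely for illustration.

```python
import math

def fit_log_linear(early, late):
    """Least-squares fit of ln(late) = a + b * ln(early)."""
    xs = [math.log(e) for e in early]
    ys = [math.log(l) for l in late]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def predict_late(early_count, a, b):
    """Extrapolate later popularity from an early count using the fit."""
    return math.exp(a + b * math.log(early_count))

# Made-up training data (NOT from the paper): views after the first
# two hours, and views after a month, for five older stories.
early_views = [5, 12, 30, 80, 200]
late_views = [60, 150, 330, 900, 2100]

a, b = fit_log_linear(early_views, late_views)

# Forecast for a new story that picked up 50 views in its first two hours.
forecast = predict_late(50, a, b)
print(round(forecast))
```

Fitting in log space is the natural choice when popularity spans several orders of magnitude, as the page views mentioned at the top of this post do: a straight line there corresponds to a power-law relationship between early and late counts.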

Ref: arxiv.org/abs/0811.0405: Predicting the Popularity of Online Content

One Response to “Predicting the popularity of online content”

  1. confuted says:

    17,000 thousand? 17 thousand, or 17 million?