26 June 2008

Boundless data does not mean the end of models

Chris Anderson is a very smart guy, and I agree with him on his concepts of the Long Tail. However, I think he's very wrong about this idea, which he lays out as follows:

"All models are wrong, but some are useful."

So proclaimed statistician George Box 30 years ago, and he was right. But what choice did we have? Only models, from cosmological equations to theories of human behavior, seemed to be able to consistently, if imperfectly, explain the world around us. Until now. Today companies like Google, which have grown up in an era of massively abundant data, don't have to settle for wrong models. Indeed, they don't have to settle for models at all.

Sixty years ago, digital computers made information readable. Twenty years ago, the Internet made it reachable. Ten years ago, the first search engine crawlers made it a single database. Now Google and like-minded companies are sifting through the most measured age in history, treating this massive corpus as a laboratory of the human condition. They are the children of the Petabyte Age.

The problem with this is that it only works in areas where: you don't have to understand the underlying phenomenon to use it (Google selling ads based on the advertiser's perception of value); you have readily reported, massive data sets (internet usage); and the items are highly mathematically related (which is difficult for taxonomy-driven fields like biology or astronomy). It won't work at all for cosmology and other non-observational sciences.

Thanks for trying, but get your head out of the internet (I blog, ironically).
