The problem with big data? No one really knows what they're doing
Even if you have heard the big data quote doing the rounds, it is worth repeating here:
Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.
Hard to beat that for pith! Its popularity is a sign of increasing disappointment in the promise of big data.
The big data hype started back in 2009 with Google’s Flu Trends paper in Nature. They used 50 million search terms and 450 million models to find the best combination of 45 search terms that helped the Centers for Disease Control and Prevention in the US predict winter flu epidemics, a week or two ahead of doctor-reported stats.
Will big data change the world?
That spawned a New York Times bestseller called Big Data. “Just as the Internet radically changed the world, so too will big data change fundamental aspects of life by giving it a quantitative dimension it never had before,” it promised.
In turn, VC investors started scouring Silicon Valley for big data start-ups that could cash in on this type of analysis, which itself led to the buzzword bingo that we have been seeing in corporate presentations this year.
Castlight, a US company that IPO-ed in March, was a classic. What do they do? Castlight is a pioneer in cloud-based software that aggregates large-scale data and applies sophisticated analytics, delivered through consumer-oriented applications, to make health care transparent.
Fizzle or boom?
And yet five years after the Google/Nature paper, Facebook thinks I’m an aspiring apprentice welder who likes bodybuilding and Asian girls. Apologies if I have mentioned that before. I am still trying to work out whether they have delved deeper into my soul than I have ever dared to myself, and whether or not I should enrol at a technical college in Singapore with cheap gym membership.
Disillusionment is inevitable. The BuzzFeed “Top 5 Big Data Fails” listicle is surely on its way, and then we will have hit the bottom.
What is the problem? Nothing really, it just takes time. Big data is more than just lots of data. It is a fundamentally different approach, using population data, correlations and data mining to find answers that are good enough. Most of this is anathema to classical statistics and precise business intelligence software.
It also takes a lot of dedicated computing power. Google et al. have the physical and human resources to apply to the task and are still in the early stages of developing truly valuable tools, whilst most companies are just finding their feet after the recession.
Our rising cynicism is the result of the eagerness that such extraordinary promise generates.
Big data Christian Greys
Nevertheless, there are some companies that are doing big data well. Jawbone are able to understand something of the relationship between sleep patterns and daytime activity, nudging the less active to sleep more like the more active. It is typical big data: correlation without understanding causation, but actionable information nonetheless. If Jawbone users sleep more, walk more, and get their friends to buy more Up bands, who cares which way round the causation works?
It is also interesting to see that IBM is starting to offer its Jeopardy! champion Watson as a cloud service. This will help corporates with lots of data gain some insights without having to hire their own data scientists and buy five-figure supercomputers from Cray.
The bottom line is that with a bit more practice and experimentation, fondling and fiddling if you like, companies can become big data Christian Greys. We are entering the big data college years and in three years it will make a lot more sense.