When Google tried to predict the flu

In late 2008, a team of engineers at Google post to their company blog about a correlation between search queries and CDC flu surveillance data.

Soon after, they write a journal paper and launch “Google Flu Trends” (GFT). Media outlets, including CBS, jump on the story.

Fast forward to February 2013: an article titled “When Google Got Flu Wrong” appears in the same publication to which they had first submitted their findings. The article points out that GFT had been predicting almost double what the CDC was reporting over the previous flu season.

Before long, Forbes, Wired Magazine, the Huffington Post and others are writing articles about how Google “failed” at predicting the flu.

By August 2015, the public-facing site for GFT is shut down.

What went wrong? How did Google Flu Trends go from being a shining example of using technology for social good to being the victim of mocking news headlines across the web? And was the response from media justified?

A solution looking for a problem


Dedicated to the idea that technology can help make the world a better place. – Description of Google.org on company history page

Long before HBO’s Silicon Valley made it cringe-worthy, tech companies were lining up to “make the world a better place”. Google was no different. They established “Google.org” in 2004, acting as the charitable arm of the organisation – committing around $100m of investment and grants each year to help tackle some of the world’s biggest challenges. One such initiative was called “Predict and Prevent”, with the specific purpose of preventing local outbreaks of disease from becoming pandemics (see the brief here).

Around the same time, the company was processing around 1.5 billion search queries each day (that number has hit at least 5.5 billion in recent years). Google had technology, smart people, data and a mission to “do good”. All they needed was a problem to solve…

… Enter Influenza:

  • Responsible for up to 500,000 deaths each year (worldwide)
  • Costs the US approx. $87b each year
  • Evolves more rapidly than any other virus or pathogen
  • Can jump across species
  • Has resulted in multiple pandemics killing millions of people.

The flu is a BIG problem, and it is vital that governments can monitor flu activity so that they can allocate public services and vaccines and minimise the impact of an epidemic or pandemic when it happens.

Traditional flu surveillance programs are administered by bodies such as the Centers for Disease Control and Prevention (CDC) in the US and Public Health England (PHE) in the UK. The programs rely on practitioners such as physicians and nurses, who submit data on the number of patients they have seen versus the number of those patients with Influenza-like Illness (ILI) symptoms. They also provide specimens for laboratory testing.

The process for monitoring is slow, with a reporting lag of around two weeks. Being able to cut down that two week reporting lag would potentially mean an earlier response to an emerging crisis, saving countless lives.

The solution had found a problem and the stage was set for Google to flex its technological and intellectual muscles.

Finding a correlation

Google hypothesised that the frequency of specific search terms may be highly correlated with the percentage of outpatient visits that reported ILI symptoms (as reported by the CDC).

Using around five years of search data, they computed a time series showing the weekly search count for 50 million of their most popular search queries. They then devised an automated way to identify which of these 50 million search queries most closely correlated to a time series of the CDC data.

A “bucket” of around 45 search queries was identified as giving the best result, with a mean correlation of around 0.9 with the CDC-reported ILI figures.
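The selection step can be sketched in a few lines. This is a minimal illustration of the idea, not Google’s actual pipeline (which also fit a regression model over the selected bucket and validated across regions); the function name and inputs here are our own:

```python
import numpy as np

def top_correlated_queries(query_counts, cdc_ili, n_top=45):
    """Rank candidate queries by how well their weekly time series
    correlates with the CDC-reported ILI series.

    query_counts: dict mapping query string -> array of weekly counts
    cdc_ili: array of weekly CDC-reported ILI percentages
    """
    scores = {}
    for query, counts in query_counts.items():
        # Pearson correlation between this query's series and the CDC data
        scores[query] = np.corrcoef(counts, cdc_ili)[0, 1]
    # Keep the n_top queries with the highest correlation
    return sorted(scores, key=scores.get, reverse=True)[:n_top]
```

Run over tens of millions of queries, a procedure like this will surface the handful of series that track the CDC curve most closely – which, as we will see, is also its weakness.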

During the 2007/08 flu season, Google used preliminary versions of their model and shared the results with the CDC. The key advantage was a reduction in the reporting lag from two weeks down to one day – providing an opportunity to predict the CDC data:

(Chart: Google Flu Trends estimates against CDC-reported ILI data)

Google turned their model into a public site they called Google Flu Trends (GFT). Revisions were made in late 2009 to adapt the model to learnings from the H1N1 pandemic.

Media outlets from CNN to the BBC to TechCrunch pounced on the story, with GFT quickly becoming a trophy to be waved around in conversations about technology’s ability to disrupt and replace the methods of old.

Amongst the noise and media frenzy, however, it is worth pointing out that Google had stated in their original paper that their model was NOT a replacement for traditional surveillance or lab-based diagnoses. They also stressed that panic and concern among healthy individuals could cause a surge in queries and exaggerated estimates.

But the hype train had already left the station at this point.

GFT gets a sore throat

In May 2010, a study titled “Google flu trends estimates off” started making ripples across the web. Researchers had found that whilst GFT was highly correlated with the surveillance of non-specific ILI, the model was 25% less accurate at estimating rates of laboratory-confirmed influenza.

Google’s response: “This doesn’t come as much of a surprise since the virologic data is telling a different story”.

Studies have shown that only 20-70% of cases reported as having ILI symptoms during flu season are actually caused by the flu virus. Google’s model predicted people reporting flu symptoms, not people actually having the flu.

Then, in early 2013, Nature, the same journal where GFT was first introduced to the world,  reported that Google was predicting almost double what was being reported by the CDC for the Christmas flu period.  The speculation was that increased media coverage during what had been a severe flu season had triggered excessive flu-related searches by people who were not actually sick.

Big Data Hubris and the end of GFT

About a year after the article in Nature, a paper titled “The Parable of Google Flu: Traps in Big Data Analysis” was released in Science. The paper built on the findings of its predecessor and went even further. Particularly scathing was its finding that GFT ran excessively high compared with the CDC for 100 out of 108 weeks starting August 2011.

They attributed the errors partly to “big data hubris”, describing Google’s initial attempt at GFT as “part flu detector, part winter detector”: the “ad-hoc” methodology had quite high odds of identifying search terms that were highly correlated with CDC data yet structurally unrelated to the flu (an example being “high school basketball”, a sport played predominantly in the winter months).
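The “winter detector” criticism is easy to reproduce: any strongly seasonal series will correlate highly with ILI rates, causal link or not. A toy demonstration with synthetic data (not the real query logs):

```python
import numpy as np

rng = np.random.default_rng(42)
weeks = np.arange(208)  # four years of weekly data

# Synthetic ILI rate: peaks each winter, plus noise
ili = 2 + np.cos(2 * np.pi * weeks / 52) + 0.1 * rng.standard_normal(208)

# A flu-unrelated but seasonal query volume, e.g. "high school basketball":
# it also peaks in winter, so it tracks the ILI curve closely
basketball = 5 + 3 * np.cos(2 * np.pi * weeks / 52) \
    + 0.3 * rng.standard_normal(208)

r = np.corrcoef(basketball, ili)[0, 1]
print(round(r, 2))  # strongly positive, despite no causal relationship
```

Screen 50 million queries against one target series and terms like this are almost guaranteed to slip into the bucket.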

They also pointed out that Google’s search algorithms and user behaviour would have changed regularly over the lifetime of GFT, which undoubtedly had an impact on GFT’s tracking capabilities. They termed this “algorithm dynamics”: GFT assumed that relative search activity was driven by external events, overlooking that it was also being cultivated and influenced by the engineering of Google itself. A problem that no doubt persists across other platforms such as Twitter and Facebook.

The subsequent months after the release of the paper saw a flood of critical articles across the media world. Some notable headlines included:

  • How Google Confused Basketball Fans with Flu Patients (Bloomberg)
  • Why Google Flu is a failure (Forbes)
  • What can we learn from the epic failure of Google Flu trends (Wired)
  • Google catches the cold… (Financial Times)

Google posted a blog article stating they were launching a new model that took official CDC flu data into account (though they didn’t specifically acknowledge the criticism about GFT’s accuracy).

However, within a year, another post announced that the public-facing site for GFT was going to be shut down and that data would instead be fed to institutions such as Boston Children’s Hospital and the CDC Influenza Division. Whilst the popular consensus is that this was a reaction to the public embarrassment handed down by the various media outlets, Google never formally acknowledged this.

Part hubris. Part delusional grandeur.

Whilst many of the scathing articles used the Science paper as ammunition, most failed to point out that the researchers were able to substantially improve on the performance of both GFT and the CDC by combining the two data sets and dynamically re-calibrating GFT.
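The core of that “dynamic recalibration” idea can be sketched simply: keep rescaling the GFT signal against the most recent CDC figures as they arrive. This is a deliberately simplified illustration of the concept, not the Science paper’s exact model, and the function below is our own construction:

```python
import numpy as np

def recalibrated_estimate(gft, cdc, week, window=26):
    """Rescale this week's GFT estimate using a rolling linear fit
    against CDC data from recent weeks (a simplified sketch of
    dynamic recalibration; the real CDC series arrives with a lag,
    which is ignored here for brevity).

    gft, cdc: arrays of weekly values
    week: index of the week being estimated
    """
    start = max(0, week - window)
    # Fit cdc ~= a * gft + b over the recent window
    a, b = np.polyfit(gft[start:week], cdc[start:week], deg=1)
    return a * gft[week] + b
```

If GFT is, say, systematically doubling the true rate – as it roughly was in 2012/13 – a rolling fit like this learns the scale factor from recent weeks and corrects the estimate, which is essentially why the combined model beat either source alone.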

The rise and fall of Google Flu Trends involved false assumptions and failings on many sides. There were erroneous assumptions and an over-reliance on big data, but beyond that, a familiar story played out:

Person creates a solution (with an important “catch”) > Crowd takes over. Ignores the “catch” > The solution fails (in part because of the catch) > The crowd turns on the creator.

In this case, a small team of smart people tested a theory they thought would add value to the world, BUT made some clear caveats that it was in no way perfect. The public took their solution, ignoring the caveats. Then, when the model failed, the public turned on its creators.

A derivative of this has played out in other ways. For example, following Tesla’s fatal Autopilot crash, Mobileye, which supplied core technology for Autopilot, suggested that the capabilities of the system were oversold by Tesla (the companies have since parted ways).

GFT provided a learning opportunity for all of us. Both enterprise and academia were exposed to a real world case study of the value that Big Data could add in supplementing traditional methods of analysis whilst also providing an opportunity for understanding some of its pitfalls.

And whilst the media hype and grandeur have faded, the work continues. Studies were already underway in 2014 to assess Wikipedia usage as a source for estimating ILI prevalence, whilst another study in October 2016 evaluated Google, Twitter and Wikipedia as tools for influenza surveillance.

 
