Applied Statistics Comes to the Rescue of Big Data


Today’s businesses, governments, and other organizations collect large amounts of data about their operations and their customers. Classic examples include data about Amazon customers, call center interactions, and Twitter activity. All that information is aggregated and analyzed to predict future customer needs, provide sensible purchase suggestions, and personalize the organization’s engagement with customers.

Another example is Google Flu Trends (GFT), an algorithm that attempts to predict flu outbreaks based on online search patterns. Although GFT is often cited as a success story about the power of Big Data, its performance has been questioned in Nature [1] and more recently in Science [2]. In particular, GFT missed the non-seasonal influenza pandemic in 2009. It turns out that GFT was only partially detecting the flu and partially just detecting the winter season, since the two events mostly coincided. After the prediction failure in 2009, the GFT algorithm was updated, and it continues to be improved by Google. Nevertheless, Lazer et al. [2] demonstrate that GFT suffers from a common form of Big Data hubris: the assumption that the availability of a large amount of data makes traditional data collection and analysis obsolete. The authors also show that a straightforward application of some traditional statistical methods would have avoided many of the problems with GFT.
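The confounding between flu and winter can be illustrated with a toy calculation. The sketch below is not GFT's algorithm and uses no real data. It assumes a hypothetical search index that is driven 30% by flu activity and 70% by the winter season, and compares a year in which flu peaks in winter with a year in which it peaks in summer. All names, weights, and curves are made up for illustration.

```python
import math

WEEKS = 52  # one year of weekly observations; week 0 is taken to be mid-winter


def winter_signal(week):
    """Seasonal curve in [0, 1]: peaks at week 0, bottoms out at week 26."""
    return 0.5 * (1 + math.cos(2 * math.pi * week / WEEKS))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def search_volume(flu):
    """Hypothetical search index: 30% driven by flu, 70% by the winter season."""
    return [0.3 * f + 0.7 * winter_signal(w) for w, f in enumerate(flu)]


def peak_week(series):
    """Index of the week with the highest value."""
    return max(range(len(series)), key=lambda w: series[w])


# Made-up flu activity, scaled to [0, 1], for two kinds of year.
seasonal_flu = [winter_signal(w) for w in range(WEEKS)]         # peaks in winter
off_season_flu = [winter_signal(w - 26) for w in range(WEEKS)]  # peaks in summer

r_seasonal = pearson(search_volume(seasonal_flu), seasonal_flu)
r_off_season = pearson(search_volume(off_season_flu), off_season_flu)

print("Correlation in a seasonal year:  ", round(r_seasonal, 3))
print("Correlation in an off-season year:", round(r_off_season, 3))
print("Search peak week (off-season year):", peak_week(search_volume(off_season_flu)))
print("Actual flu peak week (off-season): ", peak_week(off_season_flu))
```

In the seasonal year the search index tracks flu almost perfectly, so it looks like an excellent predictor. In the off-season year the same index still peaks in winter while the flu peaks half a year later, and the correlation turns negative. A predictor validated only on seasonal years would give no warning of this weakness.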

Jeff Leek [3] describes other examples where the lack of statistical expertise led to fundamental errors in genomics and economics. For instance, incorrect predictions of responses to chemotherapy resulted in serious consequences and cancelled clinical trials. In another example, two economists published a paper claiming that GDP growth was hindered by high government debt. Ultimately, the data did not support their claim. In this instance, the impact is more difficult to assess, but we do know that the paper was widely cited by regulators in many countries after the recent financial crisis and might have contributed to the exceptionally slow rate of recovery. Errors such as these can have serious consequences and can mostly be avoided through the use of proper applied statistical techniques.

Leek [3] also points out the absence of statisticians in many Big Data initiatives, including some high-impact events and organizations. One way to improve this situation is to produce more statisticians with applied skills who can work in the current environment of Big Data and analytics and collaborate with experts from other fields.

Our program in Applied Statistics is designed to serve that very purpose, so that the problems described earlier can be avoided when working on Big Data and analytics projects. Our students gain knowledge and skills in all major areas of applied statistics. They start by acquiring the necessary statistical programming skills, coupled with courses focused on the intricacies of statistical modeling as well as the statistical design and analysis of experiments. Various elective courses cover a wide range of specialized topics, including data mining, machine learning, and predictive analytics, all of which are necessary skills when working with data sets of all sizes.

Students can choose between two paths: an Advanced Certificate, which consists of only four courses, or a Master’s Degree, which requires 10 courses to complete. Our Master’s program offers five concentrations: Predictive Analytics, Data Mining/Machine Learning, Industrial, Biostatistics, and Theory.

RIT Online has been offering its Applied Statistics programs since 2000. Our instructors have extensive experience in this field. The courses are kept up to date with the latest content and best practices in the field. All of this is packaged with state-of-the-art technology for online delivery and consumption. We are proud to announce our two latest courses: Predictive Analytics, and Design and Analysis of Clinical Trials.

You can find more detail here:


[1] D. Butler, Nature 494, 155 (2013).

[2] David Lazer, Ryan Kennedy, Gary King, Alessandro Vespignani. “The Parable of Google Flu: Traps in Big Data Analysis,” Science, Vol. 343, 14 March 2014.

[3] Jeff Leek, “Why big data is in trouble: they forgot about applied statistics,” SimplyStats,

Peter Bajorski

About Peter Bajorski

Professor and Graduate Applied Statistics Program Chair, RIT