The term “big data” is bandied about so frequently that it has lost nearly all precise meaning and now refers only to a vaguely connected set of ideas about “modern data sets.” While the size of a data set does pose computational problems, our goal is to understand the corresponding changes to statistical methodology. This transition is summarized best in the Prologue of Brad Efron’s book, Large-Scale Inference, which was the inspiration for the title of this course.
“At the risk of drastic oversimplification, the history of statistics as a recognized discipline can be divided into three eras:
1. The age of Quetelet and his successors, in which huge census-level data sets were brought to bear on simple but important questions: Are there more male than female births? Is the rate of insanity rising?
2. The classical period of Pearson, Fisher, Neyman, Hotelling, and their successors, intellectual giants who developed a theory of optimal inference capable of wringing every drop of information out of a scientific experiment. The questions dealt with still tended to be simple—Is treatment A better than treatment B?—but the new methods were suited to the kinds of small data sets individual scientists might collect.
3. The era of scientific mass production, in which new technologies typified by the microarray allow a single team of scientists to produce data sets of a size Quetelet would envy. But now the flood of data is accompanied by a deluge of questions, perhaps thousands of estimates or hypothesis tests that the statistician is charged with answering together; not at all what the classical masters had in mind.”
Clearly, this course addresses the third era.
Lectures and exercises. The lectures introduce central concepts, which are then practiced and deepened in the exercise sessions. The exercises involve solving problems and using R for computation, simulation, and visualization. The lectures cover both the theoretical background and practical applications and coding.