This course is an introduction to data science. It emphasizes strategies for designing experiments, covering both workflow paradigms and the reproducibility and traceability of solutions. It also covers the data lifecycle, from acquisition through processing and analysis to long-term provision and reuse. Students are also introduced to the complex legal and ethical aspects of working with data.
The following topics are covered in the lectures:
- Introduction to Data Science
- Data and the data lifecycle
- Conceptual experiment design
- Workflow paradigms
- Data management, reproducibility and traceability
- Experiment error analysis and statistical testing
- Advanced experiment design
In addition, students complete two exercises.
The effort breakdown is:
Seven 2-hour lectures, including one multiple-choice quiz: 14h
Exercise 1: 15h
Exercise 2: 25h
Exam preparation: 20h
Exam: 1h
SUM: 75h
Syllabus
(all sessions in Seminarraum von Neumann, Wed, 16-18h)
BLOCK 1
18 Oct: Introduction to data science - data science process, algorithmic ethics, human-in-the-loop -Hanbury
25 Oct: Data and the data lifecycle (including an introduction to ethical and legal aspects) -Hanbury
BLOCK 2
8 Nov: Conceptual experiment design: planning and execution of experiments, CRISP-DM -Knees
22 Nov: Workflow paradigms and scientific workflow environments: Taverna, Kepler, Myexperiments.org; environment set-up: IPython, IPython Notebook Versioning, YesWorkflow, noWorkflow -Schindler, Knees
Exercise 1: Design an experimental workflow for a given dataset (start: 22 Nov, hand-in: 12 Dec)
BLOCK 3
29 Nov: Facilitating reproducibility and traceability; basics of data management planning and data stewardship -Rauber
6 Dec: Experiment error analysis and statistical testing -Knees
13 Dec: Deep experiment design (statistical power, application in workflows, metastudies, ...) -Knees
Exercise 2: Reproduce experimental results from a paper (start: 29 Nov, interim hand-in: 5 Dec, hand-in: 19 Jan)
24 Jan: Exam