This course gives an introduction to data science. The emphasis is on strategies for designing experiments, covering both workflow paradigms and the reproducibility and traceability of solutions. The course also covers the lifecycle of data, from acquisition through processing and analysis to long-term provision and reuse. Students are also introduced to the complex legal and ethical aspects of working with data.
The following topics are covered in the lectures:
- Introduction to Data Science
- Data and the data lifecycle
- Conceptual experiment design
- Workflow paradigms
- Data management, reproducibility and traceability
- Experiment error analysis and statistical testing
- Advanced experiment design
In addition, students complete two exercises.
The effort breakdown is:
- 7 lectures of 2 hours each: 14h
- Exercise 1: 15h
- Exercise 2 (incl. presentation): 25h
- Exam preparation: 20h
- Exam: 1h
- Total: 75h
Syllabus
(all sessions in FH HS2, Thursdays, 2-4 pm c.t.)
BLOCK 1
4.10.: Introduction to data science; the data science process - Hanbury
11.10.: Data and the data lifecycle; ethical and legal aspects - Hanbury
BLOCK 2
[18.10.: Optional: Machine learning primer - Knees]
25.10.: Conceptual experiment design: planning and execution of experiments, hypotheses - Knees
Exercise 1: Design an experimental workflow for a given dataset
22.11.: Workflow paradigms and scientific workflow environments: IPython, Jupyter Notebook, WEKA, graphical experimentation workflows - Schindler, Knees
BLOCK 3
29.11.: Facilitating reproducibility and traceability; basics of data management planning and data stewardship - Rauber
6.12.: Experiment error analysis and statistical testing 1 - Knees
20.12.: Experiment error analysis and statistical testing 2 - Knees
Exercise 2: Reproduce experimental results from a paper
17.1.: Presentations of Exercise 2
24.1.: Written Exam