This lecture covers the basic programming approaches in data science. The emphasis is on computational thinking: formulating problems and their solution spaces so that a computer can solve them. Methods for increasing the efficiency of the solutions are also presented. Use cases demonstrate the practical application of data science solutions.
The following topics are covered in the lectures:
In addition, two exercises will be done.
The effort breakdown is:
Python tutorial: 4h
6 2-hour lectures: 12h
Exercise 1: 15h
Exercise 2: 44h
SUM: 75h
All Lectures on Tuesday 11:00-13:00, Seminarraum Gödel, Favoritenstraße 9
BLOCK 1
Introduction to DOPP, Text stream processing (|, awk, regex, sed) [Böck] (7.11)
Python [Böck] (14.11)
SciPy, NumPy, vectorisation, Execution performance measurement - benchmarking [Böck] (21.11)
Data preparation, Structuring - Fusion of Data of Different Types and Quality - Pandas [Kiesling] (28.11)
Exercise
BLOCK 2
Data science solution approaches: fusion of techniques from multiple areas, data science case studies [Hanbury] (12.12)
Big Exercise: Solve a data science problem and implement the solution efficiently (solved individually)
BLOCK 3
Introduction to scaling algorithms to big data (which architecture is needed for which problem?); evaluating and selecting the tools that best satisfy a set of requirements; scaling data science to multiple application areas [Kiesling] (9.1)
Presentations of Big Exercise solutions and code (16.1)
Ex1, Ex2: 1..100 points. Minimum 35.
Grade=0.25*Ex1+0.75*Ex2. Minimum 50.
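As a rough illustration of the grading rule, here is a minimal Python sketch. It assumes that the minimum of 35 points applies to each exercise separately and that the minimum of 50 applies to the weighted grade. The page does not state this explicitly, so treat it as one possible reading. The function name course_grade is hypothetical.

```python
def course_grade(ex1, ex2):
    """Return (weighted grade, passed) for two exercise scores.

    Assumption: each exercise needs at least 35 points and the
    weighted grade needs at least 50 points to pass.
    """
    # Weighted sum as given on the course page: 25% Ex1, 75% Ex2.
    grade = 0.25 * ex1 + 0.75 * ex2
    passed = ex1 >= 35 and ex2 >= 35 and grade >= 50
    return grade, passed


print(course_grade(80, 60))  # (65.0, True)
```

Under this reading, a strong Exercise 1 cannot compensate for an Exercise 2 below 35 points, since both per-exercise minimums are checked independently of the weighted sum.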
Not necessary