Part 1: Lecture (public) - 8 units.
The lectures will give an overview of different categories of clustering and outlier detection, and will introduce fundamental methods as well as specialized methods for high-dimensional data.
An introductory chapter will survey the KDD pipeline. We will briefly discuss properties of feature spaces and distance measures.
The chapter on cluster analysis will cover partitional clustering and density-based clustering, and will outline the main ideas of hierarchical clustering. Finally, we will discuss challenges and some strategies for clustering high-dimensional data.
The chapter on outlier detection will survey standard methods from the data mining literature as well as the particular challenges and solutions for high-dimensional data.
During the lecture, we might occasionally examine the typical behavior of some representative algorithms on toy data sets. Participants interested in following these experiments on their own laptops are encouraged to download the latest release of ELKI (http://elki.dbs.ifi.lmu.de/), which requires Java (e.g., OpenJDK 7).
Part 2: Seminar, 16 participants (PhD school) - 6 units.
In the seminar, we will discuss recent literature on clustering and outlier detection, with a particular focus on high-dimensional data. Participants will prepare talks on papers and discuss the presented papers, drawing on the insights gained in the lecture.
In addition, we will have a short preparatory meeting on May 16, after the last lecture, to discuss the assignment of topics to the participants.
This is a visiting professor course of the Vienna PhD School of Informatics in the area of Media Informatics and Visual Computing.
Lecturer: Arthur Zimek, LMU Munich, Germany.
The first (minor) part of the grade will be based on the homework given in the lectures: participants in the seminar should prepare a short report summarizing their answers and send it to zimek@dbs.ifi.lmu.de by May 1, 2014. Please include, if possible, a picture of yourself and a short description of your research topics, to help me learn your names and prepare the seminar assignments.
The second part of the grade will be based on your presentation in the seminar. We will discuss the assignment of papers to the seminar participants on May 16, after the last lecture.