Fault-tolerant distributed algorithms are at the heart of any distributed system for critical applications and implement low-level services such as clock synchronization, group membership, and consensus. Suitable algorithms must work as specified despite the inherent uncertainty in network-coupled or shared-memory-coupled distributed systems, which is caused by varying or unknown communication delays and computing speeds and, in particular, by subsystem failures. Due to combinatorial explosion, it is often infeasible to verify the correct operation of such algorithms by means of model checking (or exhaustive testing). Correctness proofs based on formal-mathematical modelling are the only feasible alternative here.
This theoretical, graduate-level basic course provides an introduction to distributed algorithms and their formal-mathematical analysis. Apart from developing formal-mathematical skills in general, the course should enable its attendees to (1) become familiar with fundamental models, problems, algorithms, lower bounds and impossibility results, and proof techniques in distributed computing, (2) apply the lower bounds and impossibility results learned to new situations where appropriate, (3) design new distributed algorithms for new situations, using the algorithms and techniques learned as building blocks, and (4) find new lower bounds and impossibility results.
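To make the combinatorial explosion mentioned in the course motivation concrete, the following minimal Python sketch counts the interleavings an exhaustive checker would have to consider. It rests on a strong simplifying assumption that is not part of the course description: n independent processes each execute k atomic steps in a fixed local order, with no failures and no message delays. The function name `interleavings` is purely illustrative.

```python
from math import factorial


def interleavings(processes: int, steps: int) -> int:
    """Count the distinct global orderings of `processes` independent
    processes that each execute `steps` atomic steps in a fixed local order.

    This is the multinomial coefficient (n*k)! / (k!)^n.
    Assumption (illustrative only): no failures, no message delays.
    """
    return factorial(processes * steps) // factorial(steps) ** processes


if __name__ == "__main__":
    # Print how quickly the number of schedules grows with system size.
    for n, k in [(2, 2), (3, 3), (4, 5), (5, 10)]:
        print(f"{n} processes x {k} steps: {interleavings(n, k)} interleavings")
```

Even under these idealized assumptions the count grows far faster than exponentially in the system size. Failures and variable delays would typically add further branching, which is why the course relies on proofs rather than enumeration.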
The course is organized in the "Anglo-American style", which relies on continuous engagement throughout the semester: several quizzes and homework assignments ensure that (1) the topics taught in the lecture are absorbed efficiently and (2) individual formal-mathematical problem-solving skills are trained.
Basics: Execution runs, safety and liveness properties, causality and time.
Models: Message passing vs. shared memory, synchronous vs. asynchronous, failure models.
Algorithms: Leader election, mutual exclusion, clock synchronization, consensus, distributed snapshots.
Proof techniques: Impossibility proofs, lower bounds, simulation, indistinguishability, bivalence.
Anyone who wants to participate in the next course should subscribe to the TISS LVA-Forum & News as soon as possible. [Enrolling (via myTI) is only possible after passing the admission criterion, however.]
Textbook: Hagit Attiya, Jennifer Welch. Distributed Computing: Fundamentals, Simulations and Advanced Topics (2nd ed.), John Wiley and Sons, 2004. ISBN 0-471-45324-2
Familiarity with the analysis of sequential algorithms and elementary discrete mathematics; reasonable skills in devising mathematical proofs. Some background in distributed systems and fault-tolerant systems is helpful but not required.