6.824 Distributed systems

Schedule https://pdos.csail.mit.edu/6.824/schedule.html

Lecture 1 - 29. Aug 2020

Concepts

Distributed system: a set of cooperating computers that communicate with each other over a network to get some coherent task done.

Core structure

Lectures, papers, exams, labs, project

Labs

Debugging can be time consuming.

1. MapReduce
2. Raft for fault tolerance
3. K/V server
4. Sharded K/V service

Partition / Project

Reasons

Challenges

Infrastructure

Build abstractions that simplify the interface so that applications can rely on this infrastructure (abstractions that truly simplify are hard to find):

* Storage
* Communication
* Computation

Implementation topics

Performance

Crashed client after updating the first node
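A minimal sketch of the crash scenario, assuming a client that updates two replicas one after the other. The names `Replica`, `client_put`, and `crash_after` are hypothetical and only serve the illustration; the crash is simulated with an exception.

```python
class Replica:
    """Hypothetical in-memory node holding one copy of the data."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value


class ClientCrash(Exception):
    """Stands in for the client process dying mid-update."""


def client_put(replicas, key, value, crash_after=None):
    """Update the replicas one by one.

    If crash_after is set, the client "crashes" once that many
    replicas have been updated, leaving the rest untouched.
    """
    for i, replica in enumerate(replicas):
        if crash_after is not None and i == crash_after:
            raise ClientCrash("client died before updating every replica")
        replica.put(key, value)


r1, r2 = Replica(), Replica()
client_put([r1, r2], "x", "old")          # both replicas agree on "old"

try:
    client_put([r1, r2], "x", "new", crash_after=1)
except ClientCrash:
    pass                                   # nobody finishes the update

print(r1.data["x"], r2.data["x"])          # the two copies now disagree
```

After the simulated crash, a read served by the first node and a read served by the second node return different values for the same key.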

Map-reduce

Comes from Google. Building an index of the web is basically equivalent to sorting the entire data set.

A framework that makes it easy for non-specialists to write and run giant distributed computations.

Assumes that we have a set of inputs

Word count

Input1 -> Map -> <a : 1, b : 1 ... >
Input2 -> Map -> <b : 1 ... >
Input3 -> Map -> <a : 1, c : 1 ... >

func map(k, v)
	split v into words
	for each word w
		emit(w, "1")

func reduce(k, v)
	// v is the list of all values emitted for key k
	emit(len(v))
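The pseudocode above can be turned into a small runnable sketch. This is a single-process stand-in, not the distributed framework: `run_mapreduce` is a hypothetical helper that runs every map, groups the emitted pairs by key, and then calls reduce once per key. The input strings are assumed to match the three inputs in the diagram.

```python
from collections import defaultdict


def map_fn(key, value):
    """key: input name (unused); value: input text. Emits (word, "1") pairs."""
    return [(word, "1") for word in value.split()]


def reduce_fn(key, values):
    """key: a word; values: every "1" emitted for that word."""
    return str(len(values))


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Sequential stand-in for the framework: map, group by key, reduce."""
    intermediate = defaultdict(list)
    for name, text in inputs.items():
        for k, v in map_fn(name, text):
            intermediate[k].append(v)
    return {k: reduce_fn(k, vs) for k, vs in sorted(intermediate.items())}


inputs = {"Input1": "a b", "Input2": "b", "Input3": "a c"}
counts = run_mapreduce(inputs, map_fn, reduce_fn)
print(counts)  # {'a': '2', 'b': '2', 'c': '1'}
```

The application author only writes `map_fn` and `reduce_fn`; everything inside `run_mapreduce` is what the real framework would distribute across workers.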

PageRank: iterative, so it runs multiple MapReduce jobs.
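A rough sketch of what "multiple jobs" means here: each iteration is one full map/reduce pass, and the output of one job is the input of the next. This is a simplified PageRank on a made-up three-page graph. The damping factor 0.85, the iteration count, and the helper names (`run_job`, `pr_map`, `make_reduce`) are assumptions for illustration, not details from the lecture.

```python
from collections import defaultdict

DAMPING = 0.85  # conventional value; an assumption, not from the notes


def run_job(records, map_fn, reduce_fn):
    """One MapReduce job, run sequentially: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in records.items():
        for k2, v2 in map_fn(key, value):
            groups[k2].append(v2)
    return {k: reduce_fn(k, vs) for k, vs in groups.items()}


def pr_map(page, state):
    rank, links = state
    yield page, ("links", links)              # carry the graph to the next job
    for target in links:
        yield target, ("share", rank / len(links))


def make_reduce(n_pages):
    def pr_reduce(page, values):
        links, total = [], 0.0
        for tag, payload in values:
            if tag == "links":
                links = payload
            else:
                total += payload
        return ((1 - DAMPING) / n_pages + DAMPING * total, links)
    return pr_reduce


# Toy graph (hypothetical): page -> outgoing links, no dangling pages.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
n = len(graph)
state = {page: (1.0 / n, links) for page, links in graph.items()}

for _ in range(20):                           # one MapReduce job per iteration
    state = run_job(state, pr_map, make_reduce(n))

ranks = {page: rank for page, (rank, _) in state.items()}
print(ranks)
```

Each pass through the loop pays the full cost of a job (reading input, shuffling, writing output), which is why iterative algorithms are a comparatively expensive fit for this model.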

We care about the orchestration

Intermediate files are saved to a distributed filesystem that is available to every worker.

GFS (Google File System)
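One way to picture the intermediate files, as a sketch under assumptions: each map task splits its output into one file per reduce task, and each reduce task later reads its file from every map task. A local temporary directory stands in for the shared filesystem; the file naming `mr-<map>-<reduce>`, the JSON encoding, and `N_REDUCE = 2` are hypothetical choices for this example.

```python
import json
import os
import tempfile
import zlib

N_REDUCE = 2  # number of reduce tasks; arbitrary for this sketch


def partition(key):
    # Stable hash, so every worker sends a given key to the same reduce task.
    return zlib.crc32(key.encode("utf-8")) % N_REDUCE


def write_intermediate(shared_dir, map_task, pairs):
    """Map side: split one map task's output into N_REDUCE files."""
    buckets = [[] for _ in range(N_REDUCE)]
    for k, v in pairs:
        buckets[partition(k)].append([k, v])
    for r, bucket in enumerate(buckets):
        path = os.path.join(shared_dir, f"mr-{map_task}-{r}")
        with open(path, "w") as f:
            json.dump(bucket, f)


def read_partition(shared_dir, n_map, reduce_task):
    """Reduce side: gather this task's bucket from every map task."""
    grouped = {}
    for m in range(n_map):
        path = os.path.join(shared_dir, f"mr-{m}-{reduce_task}")
        with open(path) as f:
            for k, v in json.load(f):
                grouped.setdefault(k, []).append(v)
    return grouped


shared_dir = tempfile.mkdtemp()  # local stand-in for the shared filesystem
map_outputs = [
    [("a", "1"), ("b", "1")],
    [("b", "1")],
    [("a", "1"), ("c", "1")],
]
for m, pairs in enumerate(map_outputs):
    write_intermediate(shared_dir, m, pairs)

counts = {}
for r in range(N_REDUCE):
    for word, ones in read_partition(shared_dir, len(map_outputs), r).items():
        counts[word] = len(ones)
print(counts)
```

Because the files live in storage every worker can reach, any worker can run any reduce task, and a failed task can be rerun elsewhere from the same files.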

Learn about modern datacenters

Assumes that the network is fast