
Distributed Systems Overview

Ali Ghodsi alig@cs.berkeley.edu

Replicated State Machine (RSM)


Distributed Systems 101
Fault-tolerance (partial, byzantine, recovery, ...)
Concurrency (ordering, asynchrony, timing, ...)

Generic solution for distributed systems: Replicated State Machine approach


Represent your system as a deterministic state machine
Replicate the state machine
Feed inputs to all replicas in the same order

Total Order Reliable Broadcast aka Atomic Broadcast


Reliable broadcast
Either all correct nodes get the message or none do (even if the source fails)

Atomic Broadcast
Reliable broadcast that additionally guarantees: all messages are delivered in the same order at all nodes

A replicated state machine is trivial to build on top of atomic broadcast
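A minimal sketch of the three steps above, with illustrative names; atomic broadcast is modeled simply as one shared, ordered log, since the only property that matters here is that every replica sees the same commands in the same order:

```python
class Replica:
    """A deterministic state machine: here, a toy key-value store."""
    def __init__(self):
        self.store = {}

    def apply(self, command):
        # Commands must be deterministic: same input order => same state.
        op, key, value = command
        if op == "put":
            self.store[key] = value

def deliver_all(log, replicas):
    """Feed the totally ordered log to every replica."""
    for command in log:
        for replica in replicas:
            replica.apply(command)

# The atomic-broadcast layer would guarantee this single total order.
log = [("put", "x", 1), ("put", "y", 2), ("put", "x", 3)]
replicas = [Replica(), Replica(), Replica()]
deliver_all(log, replicas)
# All replicas end in the identical state: {"x": 3, "y": 2}
```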

Consensus?
Consensus problem
All nodes propose a value
All correct nodes must agree on one of the proposed values
Must eventually reach a decision (availability)

Atomic Broadcast ⇒ Consensus


Atomically broadcast your proposal; decide on the first value delivered

Consensus ⇒ Atomic Broadcast


Unreliably broadcast each message to all nodes
One consensus instance per round: propose the set of messages seen but not yet delivered
Each round, deliver one decided message
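The round-based reduction above, sketched in Python. `consensus` is a stand-in black box that returns the same decision to every caller for a given round (here: the first proposal wins); all names are illustrative:

```python
decisions = {}

def consensus(rnd, proposal):
    # Stand-in for a real consensus instance: the first proposal for a
    # round wins, so every node gets the same decision for that round.
    return decisions.setdefault(rnd, proposal)

class Node:
    def __init__(self):
        self.seen = set()        # messages received via unreliable broadcast
        self.delivered = []      # atomic-broadcast delivery order

    def receive(self, msg):
        self.seen.add(msg)

    def run_round(self, rnd):
        pending = self.seen - set(self.delivered)
        decided = consensus(rnd, frozenset(pending))
        # Deliver the decided messages in a deterministic (sorted) order.
        for msg in sorted(decided):
            if msg not in self.delivered:
                self.delivered.append(msg)

a, b = Node(), Node()
for m in ("m1", "m2", "m3"):
    a.receive(m)
    b.receive(m)
for rnd in (1, 2):
    a.run_round(rnd)
    b.run_round(rnd)
# Both nodes deliver the same messages in the same order.
```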

Atomic Broadcast equivalent to Consensus

Consensus impossible
No deterministic 1-crash-robust consensus algorithm exists for the asynchronous model

1-crash-robust
Up to one node may crash

Asynchronous model
No global clock
No bounded message delay

Life after impossibility of consensus? What to do?

Solving Consensus with Failure Detectors


A black box that tells us whether a node has failed

Perfect failure detector

Completeness
It will eventually tell us if a node has failed

Accuracy (no lying)
It will never tell us a node has failed if it hasn't

Rotating-coordinator algorithm for node p:
    xi := input
    for r := 1 to N do
        if r = p then
            forall j do send <val, xi, r> to j
            decide xi
        if collect <val, x, r> from r then
            xi := x
    end
    decide xi

Perfect FD ⇒ Consensus
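The rotating-coordinator idea can be simulated in a few lines. The perfect failure detector is modeled as exact knowledge of a `crashed` set, and each round's coordinator imposes its current value on all live nodes; the function and parameter names are illustrative:

```python
def rotating_coordinator(inputs, crashed):
    """inputs: node -> proposed value; crashed: set of crashed nodes."""
    nodes = sorted(inputs)
    value = dict(inputs)          # xi at each node
    for r in nodes:               # round r: node r is the coordinator
        if r in crashed:
            continue              # perfect FD: everyone learns r crashed
        for p in nodes:           # coordinator r imposes its current value
            if p not in crashed:
                value[p] = value[r]
    return {p: value[p] for p in nodes if p not in crashed}

# Node 1 crashes before its round; the remaining nodes still agree,
# on the value of the first correct coordinator (node 2).
print(rotating_coordinator({1: "a", 2: "b", 3: "c"}, crashed={1}))
# → {2: 'b', 3: 'b'}
```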

Solving Consensus
Consensus ⇒ Perfect FD?
No. We don't know whether a node actually failed or not!

Whats the weakest FD to solve consensus?


Fewest assumptions on top of the asynchronous model!

Enter Omega
Leader Election
Eventually, every correct node trusts some correct node
Eventually, no two correct nodes trust different correct nodes
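A common way to obtain these two properties on top of an eventually accurate failure detector is to have every node trust the smallest node id it does not suspect; once suspicions stabilize, all correct nodes converge on the same correct leader. A toy sketch with illustrative names:

```python
def omega_leader(all_nodes, suspected):
    """Trust the lowest-id node not currently suspected."""
    alive = [n for n in all_nodes if n not in suspected]
    return min(alive) if alive else None

nodes = [1, 2, 3, 4]
# Once the FD stabilizes and every correct node suspects exactly {1, 3},
# they all elect the same leader:
print(omega_leader(nodes, suspected={1, 3}))   # → 2
```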

Failure detection and leader election are the same


Failure detection captures failure behavior
detect failed nodes

Leader election also captures failure behavior


Detects a correct node (a single one, the same for all)

Formally, leader election is an FD


Always suspects all nodes except one (the leader)
Ensures some properties regarding that node

Weakest Failure Detector for Consensus


Omega is the weakest failure detector for consensus
How to prove it?
Easy to implement in practice

High Level View of Paxos


Elect a single proposer using Ω
Proposer imposes its proposal on everyone
Everyone decides. Done!

Several nodes might initially be proposers (contention)
Proposer attempts to enforce its decision
Might abort if there is contention (safety)
Ω ensures eventually one proposer succeeds (liveness)

Problem with Ω: contention among proposers

Solution is abortable consensus
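A compact single-decree Paxos sketch of the story above: a higher-ballot proposer can abort a lower one (contention), yet any value chosen by a majority is preserved (safety). Class and function names are illustrative, and Ω itself is not modeled:

```python
class Acceptor:
    def __init__(self):
        self.promised = -1        # highest ballot promised
        self.accepted = None      # (ballot, value) or None

    def prepare(self, ballot):
        if ballot > self.promised:
            self.promised = ballot
            return ("promise", self.accepted)
        return ("nack", None)

    def accept(self, ballot, value):
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return "accepted"
        return "nack"

def propose(acceptors, ballot, value):
    """Run phase 1 and phase 2 against a majority; None means aborted."""
    majority = len(acceptors) // 2 + 1
    replies = [a.prepare(ballot) for a in acceptors]
    granted = [acc for tag, acc in replies if tag == "promise"]
    if len(granted) < majority:
        return None                    # aborted: a higher ballot is around
    # Safety: adopt the value of the highest-ballot prior acceptance.
    prior = [acc for acc in granted if acc is not None]
    if prior:
        value = max(prior)[1]
    acks = [a.accept(ballot, value) for a in acceptors]
    return value if acks.count("accepted") >= majority else None

acceptors = [Acceptor() for _ in range(3)]
print(propose(acceptors, ballot=1, value="A"))   # → A (chosen)
print(propose(acceptors, ballot=2, value="B"))   # → A (B must yield to the chosen value)
print(propose(acceptors, ballot=1, value="C"))   # → None (stale ballot aborts)
```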


Replicated State Machine


Paxos approach (Lamport)
Client sends input to the Paxos leader
Leader executes a Paxos instance to agree on the command
Well-understood: many papers, optimizations

View-stamp approach (Liskov)


Have one leader that writes commands to a quorum (no Paxos)
When failures happen, use Paxos to agree
Less understood (Mazieres tutorial)

Paxos Siblings
Cheap Paxos (LM04)
Fewer messages
Directly contact a quorum (e.g. 3 nodes out of 5)
If you fail to get responses from those 3, expand to all 5

Fast Paxos (L06)


Reduce from 3 message delays to 2 message delays
Clients optimistically write to a quorum
Requires recovery

Paxos Siblings
Gaios/SMARTER (Bolosky11)
Makes logging to disk efficient for crash-recovery
Uses pipelining and batching

Generalized Paxos (LM05)


Commutative operations for the replicated state machine

Atomic Commit
Atomic Commit
Commit iff there are no failures and everyone votes commit
Otherwise abort
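The commit rule above, written as a pure function (a hypothetical helper, not from the slides):

```python
def atomic_commit_decision(votes, failed):
    """votes: participant -> 'commit' | 'abort'; failed: set of participants."""
    if failed:
        return "abort"                 # any failure forces abort
    if all(v == "commit" for v in votes.values()):
        return "commit"
    return "abort"

print(atomic_commit_decision({"tm1": "commit", "tm2": "commit"}, failed=set()))  # → commit
print(atomic_commit_decision({"tm1": "commit", "tm2": "abort"}, failed=set()))   # → abort
print(atomic_commit_decision({"tm1": "commit"}, failed={"tm2"}))                 # → abort
```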

Consensus on Transaction Commit (LG04)


One Paxos instance for every TM
Only commit if every instance said Commit

Reconfigurable Paxos
Change the set of nodes
Replace failed nodes
Add/remove nodes (changes the size of the quorum)

Lamport's idea
The set of nodes is part of the state machine's state
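A toy sketch of this idea: the membership set lives inside the replicated state and is changed by ordinary commands in the log, so the quorum size is always computed from the current state (all names are illustrative):

```python
class ReconfigurableRSM:
    def __init__(self, members):
        self.members = set(members)    # the node set is part of the state

    def quorum_size(self):
        return len(self.members) // 2 + 1

    def apply(self, command):
        # Membership changes are just more state-machine commands.
        op, node = command
        if op == "add":
            self.members.add(node)
        elif op == "remove":
            self.members.discard(node)

rsm = ReconfigurableRSM({"A", "B", "C"})
rsm.apply(("remove", "C"))
rsm.apply(("add", "D"))
print(sorted(rsm.members), rsm.quorum_size())   # → ['A', 'B', 'D'] 2
```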

SMART (Eurosys06)
Many problems (e.g. {A,B,C} -> {A,B,D} and A fails)
Basic idea: run multiple Paxos instances side by side
