
Distributed Systems Overview

Ali Ghodsi alig@cs.berkeley.edu

Replicated State Machine (RSM)


Distributed Systems 101
Fault-tolerance (partial, byzantine, recovery, ...)
Concurrency (ordering, asynchrony, timing, ...)

Generic solution for distributed systems: Replicated State Machine approach


Represent your system as a deterministic state machine
Replicate the state machine
Feed inputs to all replicas in the same order

Total Order Reliable Broadcast aka Atomic Broadcast


Reliable broadcast
Either all correct nodes get the message or none do (even if the source fails)

Atomic Broadcast
Reliable broadcast that additionally guarantees: all messages are delivered in the same order at all nodes

A replicated state machine is trivial to build on top of atomic broadcast
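A minimal sketch of the three steps above, with illustrative names; atomic broadcast is modeled simply as one shared, ordered log, since the only property that matters here is that every replica sees the same commands in the same order:

```python
class Replica:
    """A deterministic state machine: here, a toy key-value store."""
    def __init__(self):
        self.store = {}

    def apply(self, command):
        # Commands must be deterministic: same input order => same state.
        op, key, value = command
        if op == "put":
            self.store[key] = value

def deliver_all(log, replicas):
    """Feed the totally ordered log to every replica."""
    for command in log:
        for replica in replicas:
            replica.apply(command)

# The atomic-broadcast layer would guarantee this single total order.
log = [("put", "x", 1), ("put", "y", 2), ("put", "x", 3)]
replicas = [Replica(), Replica(), Replica()]
deliver_all(log, replicas)
# All replicas end in the identical state: {"x": 3, "y": 2}
```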

Consensus?
Consensus problem
All nodes propose a value
All correct nodes must agree on one of the proposed values
Must eventually reach a decision (availability)

Atomic Broadcast ⇒ Consensus


Atomically broadcast your proposal; decide on the first value delivered

Consensus ⇒ Atomic Broadcast


Unreliably broadcast each message to all nodes
One consensus instance per round: propose the set of messages seen but not yet delivered
Each round, deliver one decided message
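The round-based reduction above, sketched in Python. `consensus` is a stand-in black box that returns the same decision to every caller for a given round (here: the first proposal wins); all names are illustrative:

```python
decisions = {}

def consensus(rnd, proposal):
    # Stand-in for a real consensus instance: the first proposal for a
    # round wins, so every node gets the same decision for that round.
    return decisions.setdefault(rnd, proposal)

class Node:
    def __init__(self):
        self.seen = set()        # messages received via unreliable broadcast
        self.delivered = []      # atomic-broadcast delivery order

    def receive(self, msg):
        self.seen.add(msg)

    def run_round(self, rnd):
        pending = self.seen - set(self.delivered)
        decided = consensus(rnd, frozenset(pending))
        # Deliver the decided messages in a deterministic (sorted) order.
        for msg in sorted(decided):
            if msg not in self.delivered:
                self.delivered.append(msg)

a, b = Node(), Node()
for m in ("m1", "m2", "m3"):
    a.receive(m)
    b.receive(m)
for rnd in (1, 2):
    a.run_round(rnd)
    b.run_round(rnd)
# Both nodes deliver the same messages in the same order.
```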

Atomic Broadcast equivalent to Consensus

Consensus impossible
No deterministic 1-crash-robust consensus algorithm exists for the asynchronous model

1-crash-robust
Up to one node may crash

Asynchronous model
No global clock
No bounded message delay

Life after impossibility of consensus? What to do?

Solving Consensus with Failure Detectors


A black box that tells us whether a node has failed

Perfect failure detector

Completeness
It will eventually tell us if a node has failed

Accuracy (no lying)
It will never tell us a node has failed if it hasn't

Rotating-coordinator algorithm for node p:
    xi := input
    for r := 1 to N do
        if r = p then
            forall j do send <val, xi, r> to j
            decide xi
        if collect <val, x, r> from r then
            xi := x
    end
    decide xi

Perfect FD ⇒ Consensus
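The rotating-coordinator idea can be simulated in a few lines. The perfect failure detector is modeled as exact knowledge of a `crashed` set, and each round's coordinator imposes its current value on all live nodes; the function and parameter names are illustrative:

```python
def rotating_coordinator(inputs, crashed):
    """inputs: node -> proposed value; crashed: set of crashed nodes."""
    nodes = sorted(inputs)
    value = dict(inputs)          # xi at each node
    for r in nodes:               # round r: node r is the coordinator
        if r in crashed:
            continue              # perfect FD: everyone learns r crashed
        for p in nodes:           # coordinator r imposes its current value
            if p not in crashed:
                value[p] = value[r]
    return {p: value[p] for p in nodes if p not in crashed}

# Node 1 crashes before its round; the remaining nodes still agree,
# on the value of the first correct coordinator (node 2).
print(rotating_coordinator({1: "a", 2: "b", 3: "c"}, crashed={1}))
# → {2: 'b', 3: 'b'}
```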

Solving Consensus
Consensus ⇒ Perfect FD?
No. We don't know whether a node actually failed or not!

Whats the weakest FD to solve consensus?


Fewest assumptions on top of the asynchronous model!

Enter Omega
Leader Election
Eventually, every correct node trusts some correct node
Eventually, no two correct nodes trust different correct nodes
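A common way to obtain these two properties on top of an eventually accurate failure detector is to have every node trust the smallest node id it does not suspect; once suspicions stabilize, all correct nodes converge on the same correct leader. A toy sketch with illustrative names:

```python
def omega_leader(all_nodes, suspected):
    """Trust the lowest-id node not currently suspected."""
    alive = [n for n in all_nodes if n not in suspected]
    return min(alive) if alive else None

nodes = [1, 2, 3, 4]
# Once the FD stabilizes and every correct node suspects exactly {1, 3},
# they all elect the same leader:
print(omega_leader(nodes, suspected={1, 3}))   # → 2
```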

Failure detection and leader election are the same


Failure detection captures failure behavior
detect failed nodes

Leader election also captures failure behavior


Detects a correct node (a single one, the same for all)

Formally, leader election is an FD


Always suspects all nodes except one (the leader)
Ensures some properties regarding that node

Weakest Failure Detector for Consensus


Omega is the weakest failure detector for consensus
How to prove it?
Easy to implement in practice

High Level View of Paxos


Elect a single proposer using Ω
Proposer imposes its proposal on everyone
Everyone decides. Done!

Several nodes might initially be proposers (contention)
Proposer attempts to enforce its decision
Might abort if there is contention (safety)
Ω ensures eventually one proposer succeeds (liveness)

Problem with Ω: contention among proposers

Solution is abortable consensus
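A compact single-decree Paxos sketch of the story above: a higher-ballot proposer can abort a lower one (contention), yet any value chosen by a majority is preserved (safety). Class and function names are illustrative, and Ω itself is not modeled:

```python
class Acceptor:
    def __init__(self):
        self.promised = -1        # highest ballot promised
        self.accepted = None      # (ballot, value) or None

    def prepare(self, ballot):
        if ballot > self.promised:
            self.promised = ballot
            return ("promise", self.accepted)
        return ("nack", None)

    def accept(self, ballot, value):
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return "accepted"
        return "nack"

def propose(acceptors, ballot, value):
    """Run phase 1 and phase 2 against a majority; None means aborted."""
    majority = len(acceptors) // 2 + 1
    replies = [a.prepare(ballot) for a in acceptors]
    granted = [acc for tag, acc in replies if tag == "promise"]
    if len(granted) < majority:
        return None                    # aborted: a higher ballot is around
    # Safety: adopt the value of the highest-ballot prior acceptance.
    prior = [acc for acc in granted if acc is not None]
    if prior:
        value = max(prior)[1]
    acks = [a.accept(ballot, value) for a in acceptors]
    return value if acks.count("accepted") >= majority else None

acceptors = [Acceptor() for _ in range(3)]
print(propose(acceptors, ballot=1, value="A"))   # → A (chosen)
print(propose(acceptors, ballot=2, value="B"))   # → A (B must yield to the chosen value)
print(propose(acceptors, ballot=1, value="C"))   # → None (stale ballot aborts)
```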


Replicated State Machine


Paxos approach (Lamport)
Client sends input to the Paxos leader
Leader executes a Paxos instance to agree on the command
Well-understood: many papers, optimizations

View-stamp approach (Liskov)


Have one leader that writes commands to a quorum (no Paxos)
When failures happen, use Paxos to agree
Less understood (Mazieres tutorial)

Paxos Siblings
Cheap Paxos (LM04)
Fewer messages
Directly contact a quorum (e.g. 3 nodes out of 5)
If you fail to get responses from those 3, expand to all 5

Fast Paxos (L06)


Reduce from 3 message delays to 2 message delays
Clients optimistically write to a quorum
Requires recovery

Paxos Siblings
Gaios/SMARTER (Bolosky11)
Makes logging to disk efficient for crash-recovery
Uses pipelining and batching

Generalized Paxos (LM05)


Commutative operations for the replicated state machine

Atomic Commit
Atomic Commit
Commit iff there are no failures and everyone votes commit
Otherwise abort
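The commit rule above, written as a pure function (a hypothetical helper, not from the slides):

```python
def atomic_commit_decision(votes, failed):
    """votes: participant -> 'commit' | 'abort'; failed: set of participants."""
    if failed:
        return "abort"                 # any failure forces abort
    if all(v == "commit" for v in votes.values()):
        return "commit"
    return "abort"

print(atomic_commit_decision({"tm1": "commit", "tm2": "commit"}, failed=set()))  # → commit
print(atomic_commit_decision({"tm1": "commit", "tm2": "abort"}, failed=set()))   # → abort
print(atomic_commit_decision({"tm1": "commit"}, failed={"tm2"}))                 # → abort
```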

Consensus on Transaction Commit (LG04)


One Paxos instance for every TM
Only commit if every instance said Commit

Reconfigurable Paxos
Change the set of nodes
Replace failed nodes
Add/remove nodes (changes the size of the quorum)

Lamport's idea
The set of nodes is part of the state machine's state
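A toy sketch of this idea: the membership set lives inside the replicated state and is changed by ordinary commands in the log, so the quorum size is always computed from the current state (all names are illustrative):

```python
class ReconfigurableRSM:
    def __init__(self, members):
        self.members = set(members)    # the node set is part of the state

    def quorum_size(self):
        return len(self.members) // 2 + 1

    def apply(self, command):
        # Membership changes are just more state-machine commands.
        op, node = command
        if op == "add":
            self.members.add(node)
        elif op == "remove":
            self.members.discard(node)

rsm = ReconfigurableRSM({"A", "B", "C"})
rsm.apply(("remove", "C"))
rsm.apply(("add", "D"))
print(sorted(rsm.members), rsm.quorum_size())   # → ['A', 'B', 'D'] 2
```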

SMART (Eurosys06)
Many problems (e.g. {A,B,C} -> {A,B,D} and A fails)
Basic idea: run multiple Paxos instances side by side
