Escolar Documentos
Profissional Documentos
Cultura Documentos
riak bends.
Justin Sheehy
justin@basho.com
Perfection is Unattainable
3
Know How You Degrade
Plan it and understand it before your users do.
4
Know How You Degrade
Plan it and understand it before your users do.
5
Harvest and Yield
harvest: a fraction
data available / complete data
yield: a probability
queries completed / q's requested
in tension with each other:
(harvest * yield) ~ constant
10
Resilience is Attainable
11
Component Failure:
reboot of live database
12
Component Failure:
reboot of live database
13
Component Failure:
reboot of live database
"log-structured" databases
Example: bitcask
14
Bitcask
15
Bitcask
at
r m r
fo igge
fi l e b
ly i n g
n
o et h
-
d m
n
e so
p
ap t of
l e
p nen
sim po
om
a c
16
as
Component Failure:
reboot during record write
17
Component Failure:
reboot during record write
18
Component Failure:
reboot during record write
19
Zoom Out:
Bitcask is one part of Riak
20
Component Failure:
internal subsystem crash
Bugs can lurk anywhere.
Unpredictability, eek. X?
X?
X?
Typical mitigation: X?
complex exception-management X?
X?
X?
21
Component Failure:
internal subsystem crash
Stronger mitigation:
supervision trees and "let it crash"
Added bonus: simpler and clearer code
22
Zoom Out:
Virtual Nodes
...
23
Zoom Out:
Virtual Nodes
Many storage instances per server.
If one fails, whole system is okay.
...
24
Zoom Out:
Riak is a Distributed System
25
Component Failure:
reboot during record write
What about a half-written write?
{ok, Value}
{ok, Value}
{error, not_found}
27
Mitigation: quorum reads
{ok, Value}
client
{ok, Value}
{ok, Value}
{error, not_found}
28
Mitigation: quorum reads
{ok, Value}
client
{ok, Value}
{ok, Value}
{ok, Value}
client
{ok, Value}
{ok, Value}
{error, bad_crc}
{ok, Value}
30
Component Failure:
server down!
From a distributed system's point of view,
a whole server can be seen as "a component."
{ok, Value}
client
{ok, Value}
{ok, Value}
X
32
What about writes?
PUT Value
client
ok
ok
X
33
Mitigation: sloppy quorum
PUT Value
client
ok
ok
ok
ok
X
34
Mitigation: sloppy quorum
{ok, Value}
client
{ok, Value}
{ok, Value}
{ok, Value}
sloppy quorums work for reads too! X
35
sloppy quorums are sloppy
{ok, Value}
{ok, Value}
{ok, Value}
?
36
Mitigation: hinted handoff
37
Mitigation: hinted handoff
39
Component Failure:
datacenter-level outage
X
40
Mitigation:
masterless replication
X
still live!
41
Mitigation:
masterless replication
Justin Sheehy
justin@basho.com