Agenda
Introduction
Motivation
Problem definition
Out of scope
Contributions
Deduplication
Types of deduplication
Benefits
Development
Dedupeer File Storage
Software component for deduplication
The algorithms
Fundamentals
Detailing the algorithms
Algorithm in execution
Performance analysis
Compression analysis
Conclusion
Introduction
Increase in demand
Data compression techniques
Intra-file vs. inter-file
Motivation
Scarcity of software components for deduplication
Contributing to green storage
Providing benefits to storage systems, such as:
Reduction in the space required for storage
Reduction in data traffic
Reduction in the number of chunks
Problem definition
Detailed algorithms for deduplication using partitioned processing are scarce. There is also a lack of interoperable software components for deduplication that can be integrated into existing data storage systems.
What is deduplication
Deduplication is a technique that reduces the amount of space required for data storage by eliminating redundant blocks and/or files.
In deduplication, all data blocks or files that are duplicated in a storage system are reduced to a single copy, and the data that were deallocated are converted into a reference to the content kept in the system.
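The definition above can be illustrated with a minimal sketch. It assumes fixed-size chunks and SHA-1 as the chunk identifier; the function names `dedup_store` and `restore` are hypothetical and are not taken from the Dedupeer code.

```python
import hashlib


def dedup_store(data, chunk_size, store):
    """Split data into fixed-size chunks and keep one copy of each.

    `store` maps a SHA-1 digest to the chunk bytes. The returned list
    holds one digest per chunk, i.e. a reference to the stored content.
    """
    refs = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        key = hashlib.sha1(chunk).hexdigest()
        store.setdefault(key, chunk)  # a duplicate chunk is not stored again
        refs.append(key)
    return refs


def restore(refs, store):
    """Rebuild the original data by following the references."""
    return b"".join(store[key] for key in refs)


store = {}
refs = dedup_store(b"ABCDABCDEFGHABCD", 4, store)
print(len(refs), "chunks referenced,", len(store), "chunks stored")
```

Here the 4-byte chunk `ABCD` appears three times in the input but is kept only once, so four references point at two stored chunks.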
Out of scope
Chunks smaller than 8 KB or larger than 128 KB
Testing DeFS in a distributed setting
Testing the algorithm in post-processing mode
Using hashing algorithms other than SHA-1 and MD5
Contributions
The algorithm for data deduplication with partitioned processing, encapsulated in an interoperable software component.
The distributed data storage system, with storage management delegated to a non-relational database built on a peer-to-peer architecture with a ring topology.
The optimization of data redundancy discovery through the extra data load in partitioned processing, which reduces the loss of identification caused by splitting the file.
Example of a benefit
Development
Storage management
Fault tolerance
Failure recovery
Consistency
Availability
Request packing
The algorithms
rsync
How it works
Rolling checksum
If the checksum of a block of bytes is already known, the checksum of the following block can be obtained faster.
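The rolling checksum idea can be sketched as follows. This is the weak checksum described for rsync (two 16-bit sums combined into a 32-bit value), written here as an illustration. It is not claimed to be the Hash32 function used by Dedupeer, and the names `weak_checksum`, `roll` and `digest` are hypothetical.

```python
M = 1 << 16  # modulus of the two 16-bit sums


def weak_checksum(block):
    """Compute the two sums (a, b) from scratch for one block."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, size):
    """Update (a, b) when the window slides one byte to the right.

    out_byte leaves the window on the left, in_byte enters on the right.
    The cost is constant, independent of the window size.
    """
    a = (a - out_byte + in_byte) % M
    b = (b - size * out_byte + a) % M
    return a, b


def digest(a, b):
    """Combine the two sums into a single 32-bit value."""
    return a + (b << 16)


data = b"Lorem ipsum dolor sit amet."
size = 4
a, b = weak_checksum(data[:size])
for k in range(1, len(data) - size + 1):
    a, b = roll(a, b, data[k - 1], data[k + size - 1], size)
    # The rolled value equals the checksum recomputed from scratch.
    assert (a, b) == weak_checksum(data[k:k + size])
print("last window checksum:", digest(a, b))
```

Recomputing the sums for every window would cost time proportional to the block size at each position, while the rolled update only touches the byte that leaves and the byte that enters.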
Algorithm in execution
What follows demonstrates the deduplication algorithm with partitioned processing in execution.
The step-by-step trace was captured from the system running in debug mode with chunk size = 4 bytes and byte load = 8 bytes.
New file to be stored: "Lorem ipsum dolor AQUI sit amet."
4c 6f 72 65 6d 20 69 70 73 75 6d 20 64 6f 6c 6f 72 20 41 51 55 49 20 73 69 74 20 61 6d 65 74 2e

Chunks of the file already stored in the system:
ID 0 | 4c 6f 72 65 | Hash32: 63308178 | SHA-1: 730949e23ca46f310466fbf205ffb165aef1fd7b
ID 1 | 6d 20 69 70 | Hash32: 55968102 | SHA-1: da89a0ebe3dfb3a4c4cb2a758caf6515bc46c33d
ID 2 | 73 75 6d 20 | Hash32: 69534069 | SHA-1: aac8d75fb58b069e4c0ab23393ac474d53146d53
ID 3 | 64 6f 6c 6f | Hash32: 69468590 | SHA-1: 13f12aec0dad9421e0fdc3d8788343ba23e8fb47
ID 4 | 72 20 73 69 | Hash32: 58130798 | SHA-1: 9256c5d2d36c9f9c4947b98a1556de7437d6b790
ID 5 | 74 20 61 6d | Hash32: 56557922 | SHA-1: a1140808c69e1a3c1871996563fcebb8800b0ad3
ID 6 | 65 74 2e | Hash32: 38076679 | SHA-1: 7d03542f8187a51a05a64a5f4670b6432e7c75f9
Index structure: Hash32 -> hashmap<SHA-1, id>
63308178 -> <730949e23ca46f310466fbf205ffb165aef1fd7b, 0>
55968102 -> <da89a0ebe3dfb3a4c4cb2a758caf6515bc46c33d, 1>
69534069 -> <aac8d75fb58b069e4c0ab23393ac474d53146d53, 2>
69468590 -> <13f12aec0dad9421e0fdc3d8788343ba23e8fb47, 3>
58130798 -> <9256c5d2d36c9f9c4947b98a1556de7437d6b790, 4>
56557922 -> <a1140808c69e1a3c1871996563fcebb8800b0ad3, 5>
38076679 -> <7d03542f8187a51a05a64a5f4670b6432e7c75f9, 6>
The index (Hash32 -> hashmap<SHA-1, id>) is kept in main memory; the new file is on the hard disk.
Load bytes into memory: the first 8 bytes are read from the hard disk: 4c 6f 72 65 6d 20 69 70.
Does the index have the key? Hash32: 63308178. The key is found.
SHA-1: 730949e23ca46f310466fbf205ffb165aef1fd7b. Duplicate chunk.
Chunk-1 is stored as ref:0.
Does the index have the key? Hash32: 55968102. The key is found.
SHA-1: da89a0ebe3dfb3a4c4cb2a758caf6515bc46c33d. Duplicate chunk.
Chunk-2 is stored as ref:1.
Load bytes into memory: the next 8 bytes are read from the hard disk: 73 75 6d 20 64 6f 6c 6f.
[...] The same checks are repeated for these bytes.
Chunk-3 is stored as ref:2 and Chunk-4 as ref:3.
Load bytes into memory: the next 8 bytes are read from the hard disk: 72 20 41 51 55 49 20 73.
Does the index have the key? Hash32: 50004260. The key is not found.
The window then advances one byte at a time. The next Hash32 values (37355783, 48890160, 49611023) are not found either, and the bytes left behind by the window accumulate in a buffer: 72, then 72 20, until it holds 72 20 41 51.
Buffer full = new chunk.
Chunk-5 is stored with the bytes 72 20 41 51.
Does the index have the key? Hash32: 48365873. The key is not found and the window limit has been reached.
New chunk: Chunk-6 is stored with the bytes 55 49 20 73.
Load bytes into memory: the last 8 bytes are read from the hard disk: 69 74 20 61 6d 65 74 2e.
Does the index have the key? Hash32: 60883294. The key is not found, and the byte 69 goes to the buffer.
Does the index have the key? Hash32: 56557922. The key is found.
SHA-1: a1140808c69e1a3c1871996563fcebb8800b0ad3. Duplicate chunk.
New chunk: the buffer content is stored as Chunk-7 with the byte 69.
The remainder is smaller than a chunk.
Does the index have the key? Hash32: 38076679. The key is found.
SHA-1: 7d03542f8187a51a05a64a5f4670b6432e7c75f9. Duplicate chunk.
Chunk-8 and Chunk-9 are stored as references to the chunks with IDs 5 and 6 already kept in the system.
Final chunk map: Chunk-1 ref:0, Chunk-2 ref:1, Chunk-3 ref:2, Chunk-4 ref:3, Chunk-5 72 20 41 51, Chunk-6 55 49 20 73, Chunk-7 69, followed by the two references of Chunk-8 and Chunk-9.
After this processing, the 32-byte file was stored using only 9 bytes, since the rest of its content could be reused from another file already stored in the system.
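The walkthrough can be reproduced with a short sketch under stated assumptions: Adler-32 from `zlib` stands in for Hash32 (so the hash values differ from the ones on the slides), the window hash is recomputed at each position instead of rolled, and all function names are hypothetical. It is an illustration of the partitioned processing described here, not the Dedupeer implementation.

```python
import hashlib
import zlib

CHUNK_SIZE = 4  # chunk size used in the walkthrough
LOAD_SIZE = 8   # bytes loaded into memory per partition


def weak_hash(block):
    # Assumption: Adler-32 as a stand-in for the 32-bit hash.
    return zlib.adler32(block)


def strong_hash(block):
    return hashlib.sha1(block).hexdigest()


def build_index(stored):
    """Index the stored file as weak hash -> {SHA-1 -> chunk ID}."""
    chunks = [stored[i:i + CHUNK_SIZE]
              for i in range(0, len(stored), CHUNK_SIZE)]
    index = {}
    for chunk_id, chunk in enumerate(chunks):
        index.setdefault(weak_hash(chunk), {})[strong_hash(chunk)] = chunk_id
    return chunks, index


def lookup(index, block):
    """Return the ID of an identical stored chunk, or None."""
    candidates = index.get(weak_hash(block))
    if candidates is None:
        return None  # weak hash miss: SHA-1 is never computed
    return candidates.get(strong_hash(block))


def deduplicate(new_file, index):
    """Return a list of ("ref", id) and ("data", bytes) entries."""
    recipe = []
    for start in range(0, len(new_file), LOAD_SIZE):
        part = new_file[start:start + LOAD_SIZE]  # one partition in memory
        buffer = bytearray()  # bytes left behind by the sliding window
        pos = 0
        while pos + CHUNK_SIZE <= len(part):
            chunk_id = lookup(index, part[pos:pos + CHUNK_SIZE])
            if chunk_id is not None:
                if buffer:  # unmatched bytes become a new chunk first
                    recipe.append(("data", bytes(buffer)))
                    buffer.clear()
                recipe.append(("ref", chunk_id))
                pos += CHUNK_SIZE
            else:
                buffer.append(part[pos])  # slide the window by one byte
                pos += 1
                if len(buffer) == CHUNK_SIZE:  # buffer full = new chunk
                    recipe.append(("data", bytes(buffer)))
                    buffer.clear()
        tail = bytes(buffer) + part[pos:]  # what is left of the partition
        if tail:
            chunk_id = lookup(index, tail)
            if chunk_id is not None:
                recipe.append(("ref", chunk_id))
            else:
                recipe.append(("data", tail))
    return recipe


def rehydrate(recipe, chunks):
    """Rebuild the file from references and new data."""
    return b"".join(chunks[v] if kind == "ref" else v for kind, v in recipe)


stored = b"Lorem ipsum dolor sit amet."
new_file = b"Lorem ipsum dolor AQUI sit amet."
chunks, index = build_index(stored)
recipe = deduplicate(new_file, index)
new_bytes = sum(len(v) for kind, v in recipe if kind == "data")
print(recipe)
print(new_bytes, "of", len(new_file), "bytes stored as new data")
```

With these inputs the sketch stores 9 of 32 bytes as new data (`r AQ`, `UI s` and `i`) and turns everything else into references. The strong hash is computed only after a weak hash hit, which keeps the per-window cost low when most windows do not match.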
The algorithms
Extra data load
Problem solved
Performance and compression analysis
Datasets
The same data were used to compare the systems
Two datasets used
Linux source code (.tar)
stable: 3.9.8, dated June 27, 2013 (Latest Stable Kernel)
mainline: 3.10-rc7, dated June 22, 2013
Windows 7 Professional 32-bit with Service Pack 1, Intel Core 2 Quad Q6600 2.40 GHz processor, and 3.24 GB of DDR2 RAM.
Operations in the source-code test:
Storage
Deduplication
Deduplication + Storage
Rehydration
Performance analysis
All four operations were performed for chunks of 128 KB, 64 KB, 32 KB, 16 KB, and 8 KB.
Performance analysis (storage)
Mean time with MD5
Performance analysis (deduplication)
Mean time with MD5
Mean time with SHA-1
Non-normal data
Applying the Wilcoxon signed-rank test resulted in a p-value of 0.00109705.
The null hypothesis must therefore be rejected, and we conclude that the mean times of the algorithms are different.
Performance analysis (rehydration)
Mean time with MD5
Mean time with SHA-1
Compression analysis
Chunk map for the deduplication of the source code
128 KB -> 3%
64 KB -> 6.5%
32 KB -> 11%
16 KB -> 25%
8 KB -> 41%
79.397% for 128 KB, 85.488% for 64 KB, 90.740% for 32 KB, 95.731% for 16 KB, and 98.997% for 8 KB
Conclusion
With the deduplication algorithm with partitioned processing, it was possible to process files larger than the RAM of the computer used.
The efficiency of the algorithm was also demonstrated by the compression achieved between virtual machines, with savings reaching 98.997% of the data relative to the total size of the file stored in the system.
Future work
Buffer on disk.
Partitioned deduplication vs. full-load deduplication.
Evaluate the performance of DeFS in a distributed setting.
Modify DeFS to distribute deduplication tasks among nodes.
Adapt Dedupeer for data compression, such as zip and rar.
Thank you!
pfas@cin.ufpe.br
www.dedupeer.com