
Research work done at HLRS by Alejandro Calderon

On evaluating GPFS

Outline:
- Short description
- Metadata evaluation: fdtree
- Bandwidth evaluation: Bonnie, Iozone, IODD, IOP

acaldero @ arcos.inf.uc3m.es

HPC-Europa (HLRS)

GPFS description
http://www.ncsa.uiuc.edu/UserInfo/Data/filesystems/index.html

General Parallel File System (GPFS) is a parallel file system package developed by IBM.

History: originally developed for IBM's AIX operating system, then ported to Linux systems.

Features:
- Appears to work just like a traditional UNIX file system from the user-application level.
- Provides additional functionality and enhanced performance when accessed via parallel interfaces such as MPI-IO.
- Obtains high performance by striping data across multiple nodes and disks; striping is performed automatically at the block level, so all files larger than the designated block size are striped.
- Can be deployed in NSD or SAN configurations.
- Clusters hosting a GPFS file system can allow clusters at other geographical locations to mount that file system.
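As an illustration of the MPI-IO access path (a minimal sketch, not from the talk; the path /gpfs/example.dat and the 64 KB block size are assumptions): all ranks open one file on GPFS and each writes a disjoint block with a single collective call.

/* Minimal MPI-IO sketch: GPFS looks like a normal file system, so a plain
   path is enough; striping happens transparently below this interface. */
#include <mpi.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank;
    char buf[1 << 16];                       /* 64 KB per rank (assumption) */
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    memset(buf, 'x', sizeof(buf));

    MPI_File_open(MPI_COMM_WORLD, "/gpfs/example.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* collective write: each rank fills one disjoint 64 KB block */
    MPI_File_write_at_all(fh, (MPI_Offset)rank * sizeof(buf),
                          buf, (int)sizeof(buf), MPI_BYTE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}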


GPFS (Simple NSD Configuration)


GPFS evaluation (metadata)

fdtree

Used for testing the metadata performance of a file system: it creates several directories and files, across several levels.

Used on:
- Computers: noco-xyz
- Storage systems: local, NFS, GPFS


fdtree [local,NFS,GPFS]
./fdtree.bash -f 3 -d 5 -o X
[Chart: operations per second (directory creates, file creates, file removals, directory removals) on /gpfs, /tmp, and /mscratch; y-axis 0 to 2500 ops/sec]

fdtree on GPFS (Scenario 1)


ssh {x,...} fdtree.bash -f 3 -d 5 -o /gpfs...

Scenario 1: several nodes, several processes per node (P1 ... Pm on each node), different subtrees, many small files.

fdtree on GPFS (Scenario 1)

ssh {x,...} fdtree.bash -f 3 -d 5 -o /gpfs...

[Chart: operations per second (directory creates, file creates, file removals, directory removals) for the configurations 1n-1p, 4n-4p, 4n-8p, 4n-16p, 8n-8p, and 8n-16p (nodes-processes); y-axis 0 to 600 ops/sec]

fdtree on GPFS (Scenario 2)


ssh {x,...} fdtree.bash -l 1 -d 1 -f 1000 -s 500 -o /gpfs...

Scenario 2: several nodes, one process per node (P1 ... Px), same subtree, many small files.

fdtree on GPFS (Scenario 2)

ssh {x,...} fdtree.bash -l 1 -d 1 -f 1000 -s 500 -o /gpfs...

[Chart: file creates per second (0 to 45) for 1, 2, 4, and 8 processes (1 per node), working in the same directory vs. working in different directories]

Metadata cache on GPFS client

Working in a GPFS directory with 894 entries, "ls -als" needs to get each file's attributes from the GPFS metadata server.
Repeated runs of "time ls -als | wc -l" on noco186 (894 entries): real times of roughly 0m0.466s on the first run, 0m0.033s-0m0.034s on immediate repetitions, and 0m0.222s on a later run.

Within a couple of seconds, the contents of the cache seem to disappear.
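A minimal C sketch of this experiment (hypothetical; the talk simply timed ls -als): scan the directory's attributes twice in a row and once more after a pause, so cold, warm, and expired-cache times can be compared.

/* Time attribute scans of a directory, as "ls -als" would do. */
#include <stdio.h>
#include <dirent.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

static double scan_attrs(const char *dir)
{
    struct timespec t0, t1;
    struct dirent *e;
    struct stat st;
    char path[4096];

    clock_gettime(CLOCK_MONOTONIC, &t0);
    DIR *d = opendir(dir);
    if (!d)
        return -1.0;
    while ((e = readdir(d)) != NULL) {
        snprintf(path, sizeof(path), "%s/%s", dir, e->d_name);
        stat(path, &st);     /* one attribute lookup per entry, like ls -als */
    }
    closedir(d);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(int argc, char **argv)
{
    const char *dir = (argc > 1) ? argv[1] : ".";
    printf("cold scan:  %.3f s\n", scan_attrs(dir)); /* metadata server */
    printf("warm scan:  %.3f s\n", scan_attrs(dir)); /* client cache */
    sleep(3);                                        /* let the cache expire */
    printf("after 3 s:  %.3f s\n", scan_attrs(dir));
    return 0;
}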



fdtree results

Main conclusions

- Contention at the directory level: if two or more processes of a parallel application need to write data, make sure each one uses a different subdirectory of the GPFS workspace.
- Better results than NFS (but lower than the local file system).
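A minimal MPI sketch of this advice (the path /gpfs/workspace is an assumption): each rank creates and writes inside its own subdirectory, so file creates from different processes never meet in one directory.

/* One private subdirectory per MPI rank avoids directory-level contention. */
#include <mpi.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

int main(int argc, char **argv)
{
    int rank;
    char dir[256], path[512];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* "/gpfs/workspace" is an assumed path; adapt to the real workspace */
    snprintf(dir, sizeof(dir), "/gpfs/workspace/rank-%d", rank);
    mkdir(dir, 0755);                  /* each rank owns its own subdirectory */

    snprintf(path, sizeof(path), "%s/out.dat", dir);
    FILE *f = fopen(path, "w");        /* creates never collide in one dir */
    if (f) {
        fprintf(f, "data from rank %d\n", rank);
        fclose(f);
    }

    MPI_Finalize();
    return 0;
}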


GPFS performance (bandwidth)

Bonnie

Reads and writes a 2 GB file; measures write, rewrite, and read bandwidth.

Used on:
- Computers: cacau1, noco075
- Storage systems: GPFS


Bonnie on GPFS [write + re-write]


GPFS over NFS

bandwidth (MB/sec):

                 write    rewrite
cacau1-GPFS      51.86       3.43
noco075-GPFS    164.69      36.35

Bonnie on GPFS [read]


GPFS over NFS

bandwidth (MB/sec):

                  read
cacau1-GPFS      75.85
noco075-GPFS    232.38

GPFS performance (bandwidth)

Iozone

Writes and reads with several file sizes and access sizes; measures write and read bandwidth.

Used on:
- Computers: noco075
- Storage systems: GPFS


Iozone on GPFS [write]


[3D chart "Write on GPFS": bandwidth (MB/s, 0 to 1200) as a function of file size (64 KB to 512 MB) and record length]

Iozone on GPFS [read]


[3D chart "Read on GPFS": bandwidth (MB/s, 0 to 2500) as a function of file size (64 KB to 512 MB) and record length]

GPFS evaluation (bandwidth)

IODD


- Evaluates disk performance (disk and networking) by using several nodes.
- A dd-like command that can be run from MPI.

Used on:
- 2 and 4 nodes; 4, 8, 16, and 32 processes (1, 2, 3, and 4 per node), each writing a file of 1, 2, 4, 8, 16, or 32 GB.
- Using both the POSIX interface and the MPI-IO interface.
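The IODD source itself is not shown here, so the following is only a sketch of such a dd-like MPI program (the path, 1 MB block size, and block count are assumptions): each rank writes its own file through MPI-IO and rank 0 reports the aggregate write bandwidth.

/* Sketch of a dd-like MPI writer: one independent file per process. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank, nprocs;
    const int bs = 1 << 20;            /* 1 MB block size (assumption) */
    const int count = 2048;            /* 2048 blocks -> 2 GB per process */
    char fname[256];
    MPI_File fh;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    char *buf = malloc(bs);
    memset(buf, 'x', bs);

    /* one file per process; "/gpfs/iodd-test" is an assumed path */
    snprintf(fname, sizeof(fname), "/gpfs/iodd-test/file-%d.dat", rank);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    MPI_File_open(MPI_COMM_SELF, fname, MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);
    for (int i = 0; i < count; i++)
        MPI_File_write(fh, buf, bs, MPI_BYTE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
    MPI_Barrier(MPI_COMM_WORLD);
    t1 = MPI_Wtime();

    if (rank == 0)                     /* aggregate MB/s across all ranks */
        printf("aggregate write bandwidth: %.1f MB/s\n",
               (double)nprocs * count * bs / (1 << 20) / (t1 - t0));

    free(buf);
    MPI_Finalize();
    return 0;
}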


How IODD works


[Diagram: on each node, processes P1, P2, ..., Pm each write an independent file a, b, ..., n]

nodes = 2 or 4; processes = 4, 8, 16, or 32 (1, 2, 3, or 4 per node); file size = 1, 2, 4, 8, 16, or 32 GB

IODD on 2 nodes [MPI-IO]


[3D chart "GPFS (writing, 2 nodes)": bandwidth (MB/sec, 0 to 180) by file size (1 to 32 GB) and processes per node]

IODD on 4 nodes [MPI-IO]


[3D chart "GPFS (writing, 4 nodes)": bandwidth (MB/sec, 0 to 180) by file size (1 to 32 GB) and processes per node]

Differences by using different APIs


[Two 3D charts: GPFS write bandwidth (MB/sec) on 2 nodes by file size (1 to 32 GB) and processes per node, using the POSIX API vs. the MPI-IO API]

IODD on 2 GB [MPI-IO, same directory]


[3D chart "GPFS (writing, 1-32 nodes, same directory)": bandwidth (MB/sec, 0 to 160) by number of nodes (1, 2, 4, 8, 16, 32)]

IODD on 2 GB [MPI-IO, different directories]


[3D chart "GPFS (writing, 1-32 nodes, different directories)": bandwidth (MB/sec, 0 to 160) by number of nodes (1, 2, 4, 8, 16, 32)]

IODD results

Main conclusions

- Bandwidth decreases with the number of processes per node: beware of multithreaded applications with medium-to-high I/O bandwidth requirements per thread.
- It is very important to use MPI-IO, because this API lets users get more bandwidth.
- Bandwidth also decreases with more than 4 nodes.
- With large files, metadata management seems not to be the main bottleneck.


GPFS evaluation (bandwidth)


IOP

Measures the bandwidth obtained by writing and reading in parallel from several processes. The file size is divided by the number of processes, so each process works on an independent part of the file.

Used on:
- GPFS through MPI-IO (ROMIO on Open MPI)
- Two nodes writing a 2 GB file in parallel:
  - on independent files (non-shared)
  - on the same file (shared)


How IOP works


[Diagram: file per process (non-shared), where P1, P2, ..., Pm each write their own file, vs. segmented access (shared), where P1, P2, ..., Pm each write an independent part of one file]

2 nodes; m = 2 processes (1 per node); n = 2 GB file size


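A minimal sketch of the segmented (shared) mode (the real IOP source is not shown; the path /gpfs/iop-test/shared.dat and the 128 KB access size are assumptions): each rank writes its own contiguous segment of one shared 2 GB file at explicit offsets.

/* Segmented access to one shared file via MPI-IO explicit offsets. */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_File fh;
    const MPI_Offset fsize = 2LL << 30;  /* 2 GB shared file, as in the talk */
    const int xfer = 128 << 10;          /* 128 KB access size (assumption) */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    MPI_Offset seg = fsize / nprocs;     /* one independent segment per rank */
    MPI_Offset off = (MPI_Offset)rank * seg;

    char *buf = malloc(xfer);
    memset(buf, 'x', xfer);

    /* all ranks open the same file: the "shared" case */
    MPI_File_open(MPI_COMM_WORLD, "/gpfs/iop-test/shared.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* sweep this rank's segment in access-size chunks at explicit offsets
       (assumes seg is a multiple of xfer) */
    for (MPI_Offset pos = 0; pos < seg; pos += xfer)
        MPI_File_write_at(fh, off + pos, buf, xfer, MPI_BYTE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}

In the non-shared (file-per-process) case, each rank would instead open its own file with MPI_COMM_SELF, as in the IODD sketch above.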

IOP: Differences by using shared/non-shared


writing on file(s) over GPFS

[Chart: write bandwidth (MB/sec, 0 to 180) vs. access size (1 KB to 1 MB), NON-shared vs. shared]

IOP: Differences by using shared/non-shared


reading on file(s) over GPFS

[Chart: read bandwidth (MB/sec, 0 to 200) vs. access size (1 KB to 1 MB), NON-shared vs. shared]

GPFS writing in non-shared files

GPFS writing in a shared file


GPFS writing in shared file: the 128 KB magic number


[Chart: bandwidth (MB/sec, 0 to 140) vs. access size (1 KB to 1 MB) for the write, read, Rread, and Bread tests; the behavior changes around the 128 KB access size]

IOP results

Main conclusions

- If several processes try to write to the same file, even in independent areas, the performance decreases.
- With several independent files, results are similar across the tests; with a shared file they are more irregular.
- A magic number appears: 128 KB. At that access size the internal algorithm seems to change, and the bandwidth increases.

