On evaluating GPFS
acaldero @ arcos.inf.uc3m.es
HPC-Europa (HLRS)
GPFS description
http://www.ncsa.uiuc.edu/UserInfo/Data/filesystems/index.html
General Parallel File System (GPFS) is a parallel file system package developed by IBM.
History: Originally developed for IBM's AIX operating system, then ported to Linux systems.
Features:
- Appears to work just like a traditional UNIX file system from the user application level.
- Provides additional functionality and enhanced performance when accessed via parallel interfaces such as MPI-I/O.
- High performance is obtained by striping data across multiple nodes and disks. Striping is performed automatically at the block level, so all files larger than the designated block size are striped (see the sketch below).
- Can be deployed in NSD or SAN configurations.
- Clusters hosting a GPFS file system can allow other clusters at different geographical locations to mount that file system.
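As an illustration of block-level striping, here is a minimal C sketch of a round-robin placement model; this is an illustrative assumption, not GPFS's actual allocator, and the block size and disk count are hypothetical.

#include <stdio.h>
#include <stdint.h>

/* Illustrative round-robin striping model (an assumption, not GPFS's
 * real allocator): block i of a file lives on disk (i % ndisks). */
static int stripe_disk(uint64_t offset, uint64_t block_size, int ndisks)
{
    uint64_t block = offset / block_size;   /* file block holding the offset */
    return (int)(block % ndisks);           /* round-robin placement */
}

int main(void)
{
    const uint64_t bs = 256 * 1024;         /* hypothetical 256 KB block size */
    for (uint64_t off = 0; off < 8 * bs; off += bs)
        printf("offset %8llu -> disk %d\n",
               (unsigned long long)off, stripe_disk(off, bs, 4));
    return 0;
}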
fdtree
Used for testing the metadata performance of a file system: it creates several directories and files, across several levels of a directory tree.
Used on:
Computers:
Storage systems:
fdtree [local,NFS,GPFS]
./fdtree.bash -f 3 -d 5 -o X
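For reference, a minimal C sketch of the tree this command asks for, assuming -d sets the directories created per level and -f the files per directory; the depth used here is an arbitrary choice.

#include <stdio.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create `files` files and `dirs` subdirectories per level,
 * recursing `levels` deep -- the access pattern fdtree generates. */
static void make_tree(const char *base, int dirs, int files, int levels)
{
    char path[4096];

    for (int f = 0; f < files; f++) {
        snprintf(path, sizeof(path), "%s/file%d", base, f);
        int fd = open(path, O_CREAT | O_WRONLY, 0644);
        if (fd >= 0)
            close(fd);
    }
    if (levels == 0)
        return;
    for (int d = 0; d < dirs; d++) {
        snprintf(path, sizeof(path), "%s/dir%d", base, d);
        mkdir(path, 0755);
        make_tree(path, dirs, files, levels - 1);
    }
}

int main(void)
{
    mkdir("X", 0755);
    make_tree("X", 5, 3, 3);   /* -d 5 -f 3; depth 3 assumed for illustration */
    return 0;
}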
[Figure: Operations/Sec. (0-2500) for /gpfs, /tmp, and /mscratch]
Scenario 1: processes P1 … Pm on several nodes, several processes per node, different subtrees, many small files.
[Figure: directory creates, file creates, and file removals per second (Operations/Sec., 0-500) for configurations 1n-1p, 4n-4p, 4n-8p, 4n-16p, 8n-8p, and 8n-16p]
Scenario 2: processes P1 … Px on several nodes, one process per node, same subtree, many small files.
Working in a GPFS directory with 894 entries: "ls -als" needs to get each file's attributes from the GPFS metadata server.
[Terminal output: repeated runs of "time ls -als | wc -l" on hpc13782 and noco186.nec, each counting 894 entries; real times range from 0m0.009s to 0m0.466s]
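To see why such a listing is metadata-bound, here is a minimal C sketch of what "ls -als" does per entry: one stat() each, and on GPFS every attribute fetch may involve the metadata server.

#include <stdio.h>
#include <dirent.h>
#include <sys/stat.h>

/* Count entries the way `ls -als` walks them: one stat() per entry. */
int main(int argc, char **argv)
{
    const char *dirpath = (argc > 1) ? argv[1] : ".";
    DIR *d = opendir(dirpath);
    struct dirent *e;
    long n = 0;

    if (d == NULL)
        return 1;
    while ((e = readdir(d)) != NULL) {
        char path[4096];
        struct stat st;
        snprintf(path, sizeof(path), "%s/%s", dirpath, e->d_name);
        if (stat(path, &st) == 0)   /* attribute fetch, per entry */
            n++;
    }
    closedir(d);
    printf("%ld entries\n", n);
    return 0;
}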
fdtree results
Main conclusions
Contention at the directory level: if two or more processes from a parallel application need to write data, make sure each one uses a different subdirectory of the GPFS workspace (see the sketch below). Better results than NFS (but lower than the local file system).
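A minimal MPI sketch of this layout, where each rank creates and works inside its own subdirectory; directory names and the file count are arbitrary.

#include <mpi.h>
#include <stdio.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank;
    char dir[64], path[128];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* One private subdirectory per rank avoids contention on the
     * shared parent directory's metadata. */
    snprintf(dir, sizeof(dir), "rank%04d", rank);
    mkdir(dir, 0755);

    for (int i = 0; i < 100; i++) {
        snprintf(path, sizeof(path), "%s/file%04d", dir, i);
        int fd = open(path, O_CREAT | O_WRONLY, 0644);
        if (fd >= 0)
            close(fd);
    }

    MPI_Finalize();
    return 0;
}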
Bonnie
Used for testing the sequential write, rewrite, and read bandwidth of a file system.
Used on:
Computers:
Storage systems:
Bonnie write/rewrite bandwidth (MB/s):
              write    rewrite
cacau1-GPFS    51.86     3.43
noco075-GPFS  164.69    36.35
Bonnie read bandwidth (MB/s):
              read
cacau1-GPFS    75.85
noco075-GPFS  232.38
Iozone
Writes and reads with several file sizes and access sizes; measures write and read bandwidth (a minimal sketch of one such measurement follows the list below).
Used on:
Computers: noco075
Storage systems: GPFS
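A minimal C sketch of one cell of such a sweep (write bandwidth for one file size and one record length); the path under /gpfs and the sizes are illustrative.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

/* Time writing `fsize` bytes in records of `reclen` bytes: one cell of
 * the (file size x record length) sweep that Iozone automates. */
static double write_mbps(const char *path, size_t fsize, size_t reclen)
{
    char *buf = malloc(reclen);
    struct timeval t0, t1;

    memset(buf, 'x', reclen);
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    gettimeofday(&t0, NULL);
    for (size_t done = 0; done < fsize; done += reclen)
        if (write(fd, buf, reclen) < 0)
            break;
    fsync(fd);                              /* include flush time */
    gettimeofday(&t1, NULL);
    close(fd);
    free(buf);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    return fsize / (1024.0 * 1024.0) / secs;
}

int main(void)
{
    /* 64 MB file, 256 KB records; path under /gpfs is illustrative */
    printf("%.1f MB/s\n", write_mbps("/gpfs/testfile", 64 << 20, 256 << 10));
    return 0;
}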
[Figure: Iozone write bandwidth (MB/s) as a function of file size (64 KB to 512 MB) and record length (bytes); bandwidth from 0 to 1000 MB/s]
[Figure: Iozone read bandwidth (MB/s) as a function of file size (64 KB to 512 MB) and record length (bytes); bandwidth from 0 to 2000 MB/s]
IODD
Used on:
2 and 4 nodes; 4, 8, 16, and 32 processes (1, 2, 3, and 4 per node) writing files of 1, 2, 4, 8, 16, and 32 GB, using both the POSIX and the MPI-IO interfaces (a POSIX sketch follows the parameter list below).
acaldero @ arcos.inf.uc3m.es
HPC-Europa (HLRS)
19
Parameters:
nodes = 2 and 4
processes = 4, 8, 16, and 32 (1, 2, 3, and 4 per node)
file size = 1, 2, 4, 8, 16, and 32 GB
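A minimal C sketch of the POSIX-interface run, assuming each process writes a contiguous, independent chunk of the shared file with pwrite; the path and sizes are illustrative.

#include <mpi.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank, nprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const size_t fsize  = (size_t)1 << 30;   /* 1 GB total, for example */
    const size_t chunk  = fsize / nprocs;    /* contiguous region per rank */
    const size_t reclen = 1 << 20;           /* 1 MB accesses */
    char *buf = malloc(reclen);
    memset(buf, 'x', reclen);

    int fd = open("/gpfs/iodd.dat", O_CREAT | O_WRONLY, 0644);
    for (size_t done = 0; done < chunk; done += reclen)
        pwrite(fd, buf, reclen, (off_t)(rank * chunk + done));
    fsync(fd);
    close(fd);
    free(buf);

    MPI_Finalize();
    return 0;
}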
[Figure: IODD bandwidth by file size (1 to 32 GB); y-axis 0-70]
IODD results
Main conclusions
The bandwidth decreases with the number of processes per node: beware of multithreaded applications with medium-to-high I/O bandwidth requirements per thread. It is very important to use MPI-IO, because this API lets users get more bandwidth (see the sketch below). The bandwidth also decreases with more than 4 nodes.
With large files, metadata management seems not to be the main bottleneck.
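A minimal sketch of the MPI-IO variant this conclusion points at, using collective writes so the ROMIO layer can coalesce and align requests; the path and sizes are illustrative.

#include <mpi.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const MPI_Offset fsize  = (MPI_Offset)1 << 30;  /* 1 GB, for example */
    const MPI_Offset chunk  = fsize / nprocs;       /* region per rank */
    const int        reclen = 1 << 20;              /* 1 MB accesses */
    char *buf = malloc(reclen);
    memset(buf, 'x', reclen);

    MPI_File_open(MPI_COMM_WORLD, "/gpfs/iodd.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Collective writes let the MPI-IO layer see the global pattern. */
    for (MPI_Offset done = 0; done < chunk; done += reclen)
        MPI_File_write_at_all(fh, rank * chunk + done, buf, reclen,
                              MPI_BYTE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}

Collective calls give MPI-IO a global view of the access pattern, which is one reason this API can extract more bandwidth than independent POSIX writes.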
IOP
Gets the bandwidth obtained by writing and reading in parallel from several processes. The file size is divided by the number of processes, so each process works on an independent part of the file.
Used on:
GPFS through MPI-IO (ROMIO on Open MPI). Two nodes writing a 2 GB file in parallel.
[Diagram: processes P1, P2, …, Pm, each writing an independent region (blocks a, b, …, x) of a shared file]
[Figure: IOP bandwidth (0-120 MB/s) vs. access size (1 KB to 1 MB)]
[Figure: IOP bandwidth vs. access size (1 KB to 1 MB)]
[Figure: IOP bandwidth (0-100 MB/s) vs. access size (1 KB to 1 MB)]
IOP results
Main conclusions
If several processes try to write to the same file, even in independent areas, performance decreases. With several independent files, results are similar across tests; with a shared file they are more irregular.
A magic number appears: 128 KB. It seems that at this point the internal algorithm changes and the bandwidth increases (a sketch for reproducing this sweep follows).
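A minimal C sketch for reproducing this access-size sweep; the path and file size are illustrative, and the 128 KB knee should show up as a jump in the printed bandwidths if the same behavior holds.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

int main(void)
{
    const size_t fsize = (size_t)256 << 20;   /* 256 MB test file */
    char *buf = malloc(1 << 20);
    memset(buf, 'x', 1 << 20);

    /* Sweep access sizes from 1 KB to 1 MB over the same file size. */
    for (size_t acc = 1 << 10; acc <= (1 << 20); acc <<= 1) {
        int fd = open("/gpfs/iop.dat", O_CREAT | O_WRONLY | O_TRUNC, 0644);
        struct timeval t0, t1;

        gettimeofday(&t0, NULL);
        for (size_t off = 0; off < fsize; off += acc)
            if (pwrite(fd, buf, acc, (off_t)off) < 0)
                break;
        fsync(fd);
        gettimeofday(&t1, NULL);
        close(fd);

        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
        printf("%7zu B: %6.1f MB/s\n", acc, fsize / (1024.0 * 1024.0) / secs);
    }
    free(buf);
    return 0;
}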