Programming
Debugging and
Profiling
Introduction
● Debugging
– Many techniques same as for serial code
– Blocking can be hard to trace
– Sometimes serial code (or p=1) works, p>1 does not
● Profiling
– Knowing where your code is spending its time is
important
– Amdahl's Law
– Weak and Strong Scaling
General Debugging Approach
● Always get serial code running first!
– Much easier to debug problems
● Debugging tools cannot fix or detect flawed
algorithms
● Build your code in pieces, check each part
– e.g. if you need a numerical integrator, write a function
for it and check it by itself with known data
● Break code into separate files, avoid copying
code segments
– allows code reuse, only need to fix bugs once
Debugging Tools
● printf in C, write in Fortran!
● gdb: GNU debugger for use with gcc
– Shows line where program crashes
– Allows inspection of variables and call stack
– other compilers have similar debuggers
● valgrind: Heavyweight memory checker
– With dynamically allocated arrays, it is possible to
read from or write to invalid memory (e.g. already
freed, index too large, etc.) with no resulting crash
– Checks all memory accesses in your code are safe
printf (C) and write (Fortran)
● Printing out variable values is primary debugging
tool
– especially useful when program does not crash but
results are wrong
● Easy to redirect output to a file for analysis
– myprog > myprog.log
>gcc -g crash.c -o crash
>f77 -g crash.f -o crash
Using gdb
● Running
– Start program in gdb
– No program options on command line!
– Program options are given after “run” command
>gdb ./crash
GNU gdb 6.1-debian
Copyright 2004 Free Software Foundation, Inc.
.... [some lines removed] ....
This GDB was configured as "i386-linux"...
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb)run
Starting program: /path/to/file/crash
Program received signal SIGSEGV, Segmentation fault.
0x08048511 in main (argc=1, argv=0xbffff964) at crash.c:24
24 d[i][j].a = N*j+i;
(gdb)
Using gdb: Variables
● To examine a simple variable or expression:
(gdb) print i
● Example
Program received signal SIGSEGV, Segmentation fault.
0x08048511 in main (argc=1, argv=0xbffff964) at crash.c:24
24 d[i][j].a = N*j+i;
(gdb) print i
$8 = 4
(gdb) print j
$9 = 0
(gdb) print N*j + i
$10 = 4
(gdb) print $8
$11 = 4
(gdb)
Using gdb: Structures
● To examine a structure:
(gdb) print d
● Example
Program received signal SIGSEGV, Segmentation fault.
0x08048511 in main (argc=1, argv=0xbffff964) at crash.c:24
24 d[i][j].a = N*j+i;
(gdb) print d[1][2]
$16 = {a = 9, b = {3, 6561}}
(gdb) print d[1][2].b[1]
$17 = 6561
(gdb) print $16.a
$18 = 9
(gdb)
The Data Structure
typedef struct {
int a;
float b[2];
} DataStruct;
Using gdb: Pointers
● To examine a pointer and the value pointed to:
(gdb) print p
(gdb) print *p
● Example
Program received signal SIGSEGV, Segmentation fault.
0x08048511 in main (argc=1, argv=0xbffff964) at crash.c:24
24 d[i][j].a = N*j+i;
(gdb) print d
$19 = (DataStruct **) 0x80498f0
(gdb) print *d
$20 = (DataStruct *) 0x8049908
(gdb) print **d
$21 = {a = 0, b = {0, 0}}
Using gdb: Backtrace
● To find out the function calls to get where you
are, use backtrace
(gdb) backtrace
● Example
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 16384 (LWP 7137)]
0x08048464 in functionC (c=228) at backtrace.c:11
11 found[d] = 1;
(gdb) backtrace
#0 0x08048464 in functionC (c=228) at backtrace.c:11
#1 0x08048483 in functionB (b=114) at backtrace.c:18
#2 0x0804849e in functionA (a=57) at backtrace.c:26
#3 0x08048500 in main (argc=1, argv=0xbffff914) at backtrace.c:38
(gdb)
Using gdb: Moving Around
● You can move up and down the call stack to
examine variables there
(gdb) up
(gdb) down
● Example
(gdb) backtrace
#0 0x08048464 in functionC (c=228) at backtrace.c:11
#1 0x08048483 in functionB (b=114) at backtrace.c:18
#2 0x0804849e in functionA (a=57) at backtrace.c:26
#3 0x08048500 in main (argc=1, argv=0xbffff914) at backtrace.c:38
(gdb) up
#1 0x08048483 in functionB (b=114) at backtrace.c:18
18 return(c);
(gdb) print b
$1 = 114
(gdb) down
#0 0x08048464 in functionC (c=228) at backtrace.c:11
11 found[d] = 1;
(gdb) print d
$2 = 25
(gdb)
Breakpoints
● Breakpoints allow you to stop your program at a
certain point and examine its data
● Breakpoints can be set in many ways, including:
– By line number:
(gdb) break 21
(gdb) break filename:21
– By function name:
(gdb) break integrate
(gdb) break filename:integrate
>gdb ./backtrace
(gdb) break functionB
Breakpoint 1 at 0x8048476: file backtrace.c, line 17.
(gdb) run
Starting program: /path/to/file/backtrace
Starting loop
Breakpoint 1, functionB (b=2) at backtrace.c:17
17 c = functionC(b*2);
(gdb) clear functionB
Deleted breakpoint 1
(gdb) break 10
Breakpoint 2 at 0x804842a: file backtrace.c, line 10.
(gdb) continue
Continuing.
Breakpoint 2, functionC (c=4) at backtrace.c:10
10 d = sqrt(c)+10;
(gdb)
MPI Debugging
● Program is running simultaneously on multiple
computers
– Cannot just use gdb since it is interactive
● Easiest solution is printf/write
– Would like all processes to output at the same time
– Need to synchronize output
● Need MPI_Barrier function
– Will wait until all processes reach this point
MPI_Barrier(MPI_Comm comm)
MPI_Barrier Example
if (myRank == 0){
N = (int)(2*atof(argv[1]));
}
MPI_Bcast(&N, 1, MPI_INT, 0, MPI_COMM_WORLD);
partialSum = 0.0;
for (i=N-myRank;i>=0;i-=nProc){
partialSum += pow(-1.0, i)/(2*i + 1);
}
printf("Partial Sum from %d is %e\n", myRank, partialSum);
fflush(stdout);
MPI_Barrier(MPI_COMM_WORLD);
MPI_Reduce(&partialSum, &totalSum, 1, MPI_DOUBLE,
MPI_SUM, 0, MPI_COMM_WORLD);
if (myRank == 0){
printf("pi = %e\n", totalSum * 4);
}
Using gdb in Parallel with MPICH
● Debug using multiple processes on one machine
● Start p-1 copies of your program in gdb
● Turn off stopping on SIGUSR1 signal
>gdb myprog
GNU gdb 6.1-debian
Copyright 2004 Free Software Foundation, Inc.
... [some lines deleted] ...
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) handle SIGUSR1 noprint
Signal Stop Print Pass to program Description
SIGUSR1 No No Yes User defined signal 1
(gdb)
Using gdb in Parallel with MPICH
● Start process 0 using mpirun with -dbg option
● Give -p4norem option to your program
● Turn off stopping on SIGUSR1 signal
>mpirun -np 4 -dbg=gdb myprog -p4norem
GNU gdb 6.1-debian
Copyright 2004 Free Software Foundation, Inc.
... [some lines deleted] ...
Breakpoint 1, 0x0804c32c in PMPI_Init ()
(gdb) handle SIGUSR1 noprint
Signal Stop Print Pass to program Description
SIGUSR1 No No Yes User defined signal 1
(gdb)continue
Continuing.
waiting for process on host keitelxx:
/path/to/prog/calc_pi3 keitelxx 47895 -p4amslave
mpirun -np 4 xterm -e gdb ./calc_pi3
Aside: Output in Separate
Terminals
● Can use similar technique to view output in
separate terminals
– Don't forget the “read”, otherwise terminals will close
immediately
– Only works for LAM/MPI
mpirun -np 4 xterm -e "./calc_pi3 1e5; read"
Sample Screen Shot
Memory Checking
● Tools to check memory access
– More important in C with dynamic memory
allocation/freeing
● Tools include
– Valgrind: no recompilation necessary
– ElectricFence: recompile with -lefence or use
LD_PRELOAD
● Note: Both options slow code by a lot! Do not
use in production runs!
backtrace.c
int functionC(int c){
....
found[d] = 1;
return(d);
}
int functionB(int b){
....
}
int functionA(int a){
....
}
int main(int argc, char *argv[]){
found = (int *)calloc(25,sizeof(int));
printf("Starting loop\n");
for (i=1;i<100;i++){
functionA(i);
}
printf("Finished\n");
}
Output
>./backtrace
Starting loop
Finished
>
Everything looks good.... But is it really?
valgrind
>valgrind ./backtrace
==7833== Memcheck, a memory error detector for x86-linux.
==7833== Copyright (C) 2002-2004, and GNU GPL'd, by Julian Seward et al.
==7833== Using valgrind-2.2.0, a program supervision framework for x86-linux.
==7833== Copyright (C) 2000-2004, and GNU GPL'd, by Julian Seward et al.
==7833== For more details, rerun with: -v
==7833==
Starting loop
==7833== Invalid write of size 4
==7833== at 0x8048464: functionC (backtrace.c:11)
==7833== by 0x8048482: functionB (backtrace.c:17)
==7833== by 0x804849D: functionA (backtrace.c:23)
==7833== by 0x80484F4: main (backtrace.c:34)
==7833== Address 0x1BA7408C is 0 bytes after a block of size 100 alloc'd
==7833== at 0x1B905901: calloc (vg_replace_malloc.c:176)
==7833== by 0x80484C9: main (backtrace.c:31)
Finished
==7833==
==7833== ERROR SUMMARY: 43 errors from 1 contexts (suppressed: 13 from 1)
==7833== malloc/free: in use at exit: 100 bytes in 1 blocks.
==7833== malloc/free: 1 allocs, 0 frees, 100 bytes allocated.
==7833== For a detailed leak analysis, rerun with: --leak-check=yes
==7833== For counts of detected errors, rerun with: -v
Electric Fence
● Will cause the program to SEGFAULT on invalid
reads or writes
● Use gdb to track down locations
>gdb ./backtrace
(gdb) set environment LD_PRELOAD libefence.so.0.0
(gdb) run
Starting program: /path/to/code/backtrace
Electric Fence 2.1 Copyright (C) 1987-1998 Bruce Perens.
Starting loop
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 16384 (LWP 7848)]
0x08048464 in functionC (c=228) at backtrace.c:11
11 found[d] = 1;
(gdb) bt
#0 0x08048464 in functionC (c=228) at backtrace.c:11
#1 0x08048483 in functionB (b=114) at backtrace.c:17
#2 0x0804849e in functionA (a=57) at backtrace.c:23
#3 0x080484f5 in main (argc=1, argv=0xbffff904) at backtrace.c:34
(gdb)
Profiling
● As a rule of thumb, 90% of the runtime is spent in 10% of the code
– Focus optimization on this part
– No point optimizing code which does not affect overall
performance
● The challenge is to identify this part of the code
● Frequently, it is not where you expect
● Compile with -pg and run normally; this writes profiling data to gmon.out
>gcc -pg myprog.c -o myprog
>./myprog
Examining gprof Output
● To examine profiling information, use gprof
program
– generates a lot of output so pipe it through less or
redirect to a file
>gprof ./myprog | less
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
48.53 51.36 51.36 69881880 0.00 0.00 local_calculate_dpsi
12.04 64.11 12.75 139763760 0.00 0.00 neighbour_Psis
6.63 71.13 7.02 52411410 0.00 0.00 local_acc_dpsi_set_source
6.25 77.75 6.62 1747047 0.00 0.00 adjustEdgeDistances
4.41 82.42 4.67 10 0.47 7.98 calculate_dpsi
3.46 86.08 3.67 205201523 0.00 0.00 node_comparison
3.29 89.57 3.49 17470470 0.00 0.00 grow_nodes
3.29 93.05 3.49 1747047 0.00 0.00 calcEdgeDistance
2.17 95.35 2.30 17470470 0.00 0.00 local_advance_psi
2.04 97.51 2.16 17470470 0.00 0.00 local_accumulate_dpsi
.... [continued] ....
Flat Profile
● "% time": fraction of total time spent in the function
● "self seconds": total time spent in this function itself
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
48.53 51.36 51.36 69881880 0.00 0.00 local_calculate_dpsi
12.04 64.11 12.75 139763760 0.00 0.00 neighbour_Psis
6.63 71.13 7.02 52411410 0.00 0.00 local_acc_dpsi_set_source
6.25 77.75 6.62 1747047 0.00 0.00 adjustEdgeDistances
4.41 82.42 4.67 10 0.47 7.98 calculate_dpsi
3.46 86.08 3.67 205201523 0.00 0.00 node_comparison
3.29 89.57 3.49 17470470 0.00 0.00 grow_nodes
3.29 93.05 3.49 1747047 0.00 0.00 calcEdgeDistance
2.17 95.35 2.30 17470470 0.00 0.00 local_advance_psi
2.04 97.51 2.16 17470470 0.00 0.00 local_accumulate_dpsi
.... [continued] ....
Flat Profile
● "calls": # of times the function is called
● "self s/call": average time per call for the function to do its own work
● "total s/call": average time per call to do its own work plus its children
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
48.53 51.36 51.36 69881880 0.00 0.00 local_calculate_dpsi
12.04 64.11 12.75 139763760 0.00 0.00 neighbour_Psis
6.63 71.13 7.02 52411410 0.00 0.00 local_acc_dpsi_set_source
6.25 77.75 6.62 1747047 0.00 0.00 adjustEdgeDistances
4.41 82.42 4.67 10 0.47 7.98 calculate_dpsi
3.46 86.08 3.67 205201523 0.00 0.00 node_comparison
3.29 89.57 3.49 17470470 0.00 0.00 grow_nodes
3.29 93.05 3.49 1747047 0.00 0.00 calcEdgeDistance
2.17 95.35 2.30 17470470 0.00 0.00 local_advance_psi
2.04 97.51 2.16 17470470 0.00 0.00 local_accumulate_dpsi
.... [continued] ....
Call Graph
● Flat Profile shows overall summary and total
number of function calls
● But, can we find where the functions are being
called from?
– Of course!
● gprof output also contains “call graph” details
Call Graph
● The summary line (the one with the bracketed index) applies to this function
● Lines above the summary line: parent functions
Call graph (explanation follows)
index % time self children called name
<spontaneous>
[1] 99.9 0.08 105.63 main [1]
4.67 75.16 10/10 calculate_dpsi [2]
0.15 16.87 1/1 restore_state [4]
0.72 5.03 10/10 adjust_nodes [9]
0.72 2.30 10/10 advance_psi [14]
0.00 0.00 1/1 initialize_wavespace [34]
0.00 0.00 1/1 destroy_wavespace [33]
4.67 75.16 10/10 main [1]
[2] 75.4 4.67 75.16 10 calculate_dpsi [2]
51.36 12.75 69881880/69881880 local_calculate_dpsi [3]
7.02 0.00 52411410/52411410 local_acc_dpsi_set_source [7]
2.16 0.00 17470470/17470470 local_accumulate_dpsi [16]
1.81 0.00 17470470/17470470 local_set_source [17]
0.06 0.00 17470470/17470470 local_clear_dpsi [22]
0.01 0.00 100/141 node_index_minimum [26]
.... [continued] ....
● Lines below the summary line: child functions
Call Graph
● "self": time spent in this function itself
Call graph (explanation follows)
index % time self children called name
<spontaneous>
[1] 99.9 0.08 105.63 main [1]
4.67 75.16 10/10 calculate_dpsi [2]
0.15 16.87 1/1 restore_state [4]
0.72 5.03 10/10 adjust_nodes [9]
0.72 2.30 10/10 advance_psi [14]
0.00 0.00 1/1 initialize_wavespace [34]
0.00 0.00 1/1 destroy_wavespace [33]
4.67 75.16 10/10 main [1]
[2] 75.4 4.67 75.16 10 calculate_dpsi [2]
51.36 12.75 69881880/69881880 local_calculate_dpsi [3]
7.02 0.00 52411410/52411410 local_acc_dpsi_set_source [7]
2.16 0.00 17470470/17470470 local_accumulate_dpsi [16]
1.81 0.00 17470470/17470470 local_set_source [17]
0.06 0.00 17470470/17470470 local_clear_dpsi [22]
0.01 0.00 100/141 node_index_minimum [26]
.... [continued] ....
Call Graph
● For example, main takes 0.08 seconds itself and 105.63 seconds calling
other functions
● Of that, 4.67 + 75.16 = 79.83 s are spent in calculate_dpsi
● In calculate_dpsi, 51.36 s are spent in local_calculate_dpsi and 12.75 s
are spent in functions called from local_calculate_dpsi
Call graph (explanation follows)
index % time self children called name
<spontaneous>
[1] 99.9 0.08 105.63 main [1]
4.67 75.16 10/10 calculate_dpsi [2]
0.15 16.87 1/1 restore_state [4]
0.72 5.03 10/10 adjust_nodes [9]
0.72 2.30 10/10 advance_psi [14]
0.00 0.00 1/1 initialize_wavespace [34]
0.00 0.00 1/1 destroy_wavespace [33]
4.67 75.16 10/10 main [1]
[2] 75.4 4.67 75.16 10 calculate_dpsi [2]
51.36 12.75 69881880/69881880 local_calculate_dpsi [3]
7.02 0.00 52411410/52411410 local_acc_dpsi_set_source [7]
2.16 0.00 17470470/17470470 local_accumulate_dpsi [16]
1.81 0.00 17470470/17470470 local_set_source [17]
0.06 0.00 17470470/17470470 local_clear_dpsi [22]
0.01 0.00 100/141 node_index_minimum [26]
.... [continued] ....
Next Time
● Some of the topics for next time:
– MPI Topologies
– MPI Data Structures
– More Collective Communications
– All about Communicators
– Non-blocking communications
– Any topics you would like to hear about?
● Remember: No talks next week. We will resume
the week after (Feb 23)