Mert Hidayetoglu
University of Illinois at Urbana-Champaign, IL, USA
Inverse-Scattering Problems
[Figure: a transceiver illuminates an unknown object (?) and receivers (sensors) record the measured scattered field data. Mathematical and physical modeling of multiple scattering, together with algorithm design, leads to an inverse solver on parallel computers that produces the reconstructed solutions.]
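[Figure: monochromatic illumination at 0° with 128 receivers (RX); length scales 2λ and 6λ; frequency-domain plot in dB. The reconstruction is tied to the measured data through the Fourier relations ℱ and ℱ⁻¹.]
Case 2: Full Angle Sensing
[Figure: same setup with 128 RX, illumination at 0°, length scales 2λ and 6λ, frequency-domain plot in dB. The DC component is captured by these receivers.]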
Distorted Born Approximation
Scattering Equation:
𝐺 = 𝐺0 + 𝐺0 𝑂𝐺
Under Perturbation: 𝛿𝑂
𝑂 = 𝑂𝑏 + 𝛿𝑂
𝐺 = 𝐺𝑏 + 𝛿𝐺
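Background: 𝐺𝑏 = 𝐺0 + 𝐺0 𝑂𝑏 𝐺𝑏
𝐺𝑏 + 𝛿𝐺 = 𝐺0 + 𝐺0 (𝑂𝑏 + 𝛿𝑂)(𝐺𝑏 + 𝛿𝐺)
𝐺𝑏 + 𝛿𝐺 = 𝐺0 + 𝐺0 𝑂𝑏 𝐺𝑏 + 𝐺0 𝛿𝑂𝐺𝑏 + 𝐺0 𝑂𝑏 𝛿𝐺 + 𝐺0 𝛿𝑂𝛿𝐺
The last term, 𝐺0 𝛿𝑂𝛿𝐺, is a higher-order variation.
Subtracting the background equation and dropping that term: (𝐼 − 𝐺0 𝑂𝑏) 𝛿𝐺 ≈ 𝐺0 𝛿𝑂𝐺𝑏
Since 𝐺𝑏 = (𝐼 − 𝐺0 𝑂𝑏)⁻¹ 𝐺0:
𝛿𝐺 ≈ 𝐺𝑏 𝛿𝑂𝐺𝑏
Distorted Born Approximation (omits higher-order variations)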
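A small numerical check can make the approximation above concrete. The sketch below is not the solver from the slides: it uses tiny random complex matrices as hypothetical stand-ins for 𝐺0, 𝑂𝑏 and 𝛿𝑂, solves the scattering equation exactly, and measures how far 𝐺𝑏 𝛿𝑂𝐺𝑏 is from the true 𝛿𝐺. All function names and sizes are assumptions made for illustration.

```python
# Check that dG ~ Gb dO Gb is accurate to second order in the perturbation.
# The matrices are random stand-ins (assumption), not discretized operators.
import random

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def add(A, B, s=1.0):
    """Return A + s * B."""
    return [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def green(G0, O):
    """Solve G = G0 + G0 O G, i.e. (I - G0 O) G = G0."""
    return solve(add(identity(len(G0)), matmul(G0, O), -1.0), G0)

def max_abs(A):
    return max(abs(v) for row in A for v in row)

def born_error(eps, n=4, seed=0):
    """Max-entry error of Gb dO Gb against the exact dG for |dO| ~ eps."""
    rng = random.Random(seed)

    def rand(scale):
        return [[scale * complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
                 for _ in range(n)] for _ in range(n)]

    G0, Ob, D = rand(0.1), rand(0.1), rand(1.0)
    dO = [[eps * v for v in row] for row in D]
    Gb = green(G0, Ob)
    exact = add(green(G0, add(Ob, dO)), Gb, -1.0)   # true dG
    approx = matmul(matmul(Gb, dO), Gb)             # distorted Born
    return max_abs(add(exact, approx, -1.0))

ratio = born_error(1e-2) / born_error(1e-3)
```

Shrinking the perturbation by a factor of 10 should shrink the error by roughly a factor of 100, which is what `ratio` measures; that quadratic decay is the signature of a first-order approximation.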
[Figure: radiated field from a point source; near-field P2P evaluations between basis functions and testing functions]
Fast Solvers for Forward-Scattering Problems
Number of Levels: 11
Domain Size: 409.6λ
Number of Pixels: 16,777,216
Multiplication Time: 2.16 sec.
Max. Error: 1e-6
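The figures above imply a pixel size and show why a dense matrix is out of reach at this scale. The arithmetic below is a sketch under two assumptions that the slide does not state: the pixels form a square grid, and a dense interaction matrix would store one double-precision complex number (16 bytes) per entry.

```python
import math

n_pixels = 16_777_216            # from the slide
domain_size_wavelengths = 409.6  # from the slide

# Assumption: square grid, so the side length is the integer square root.
pixels_per_side = math.isqrt(n_pixels)
pixel_size_wavelengths = domain_size_wavelengths / pixels_per_side

# Assumption: dense N x N matrix of complex doubles (16 bytes per entry).
dense_bytes = n_pixels ** 2 * 16
dense_pib = dense_bytes / 2 ** 50

print(f"{pixels_per_side} x {pixels_per_side} pixels, "
      f"{pixel_size_wavelengths:.2f} wavelengths per pixel")
print(f"dense matrix storage: {dense_pib:.1f} PiB")
```

Under these assumptions the grid is 4096 × 4096 with 0.1λ pixels, and storing the full matrix would take about 4 PiB, which is the motivation for a fast, matrix-free multiplication.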
The Multilevel Fast Multipole Algorithm (MLFMA)
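[Figure: MLFMA tree over Region A and Region B, Levels 1 to 3. P2M builds multipole expansions at Level 1 and M2M aggregates them upward. M2L translates multipole expansions into local expansions for the far-field evaluations. L2L passes local expansions downward and L2P evaluates them at Level 1.]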
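[Figure: MLFMA matrix-vector multiplication (MVM) benchmarks. Labels in extracted order: Memory: 16.4 GB; MVM Time: 282 ms; Memory: 22.8 GB; Dimension: 409.6λ; Unknowns: 16.78 M; MVM Time: 17 s; Dimension: 12.8λ; Unknowns: 16.38 k; Memory: 49.4 MB; MVM Time: 15 ms]
[Figure: timings by parallel configuration: 235.3 (16x16); 3.44 s (16x1); 85.20 (16x16); 14.51 (16x1); 11.93 (16x32); 7.45 (16x1); Pure MPI]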
Multiplications $\bar{\boldsymbol{F}}_i \cdot \boldsymbol{O}$ and $\bar{\boldsymbol{F}}_i^{\dagger} \cdot \boldsymbol{\phi}_i$ can be performed independently
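The independence of the per-illumination multiplications can be illustrated with a toy example. The sketch below assumes the full operator is the row-wise stack of the blocks for each illumination i; the sizes and random entries are hypothetical. It applies each block and each adjoint block separately, then compares against the stacked operator.

```python
import random

def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def adjoint(A):
    """Conjugate transpose of a dense matrix stored as a list of rows."""
    return [[A[i][j].conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]

rng = random.Random(1)

def rand_c():
    return complex(rng.uniform(-1, 1), rng.uniform(-1, 1))

# Hypothetical sizes: 3 illuminations, 4 receivers, 5 pixels.
n_illum, n_rx, n_px = 3, 4, 5
F_blocks = [[[rand_c() for _ in range(n_px)] for _ in range(n_rx)]
            for _ in range(n_illum)]
O = [rand_c() for _ in range(n_px)]

# Forward products: one independent task per illumination.
phi_blocks = [matvec(F_i, O) for F_i in F_blocks]

# Adjoint products: independent per illumination, then summed (a reduction).
partials = [matvec(adjoint(F_i), phi_i)
            for F_i, phi_i in zip(F_blocks, phi_blocks)]
adj_blockwise = [sum(col) for col in zip(*partials)]

# Reference: the same two products with the stacked operator.
F_stacked = [row for F_i in F_blocks for row in F_i]
adj_stacked = matvec(adjoint(F_stacked), matvec(F_stacked, O))

max_diff = max(abs(a - b) for a, b in zip(adj_blockwise, adj_stacked))
```

Only the final sum over illuminations couples the tasks, so in a distributed setting it would map to a single reduction step after otherwise independent work.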
[Figure: solution time, inverse solution vs. forward solution]
[Figure panels: 𝜇 = 13, 𝜇 = 15, 𝜇 = 20, 𝜇 = 30, 𝜇 = 40, 𝜇 = 60]
Born Iterative Method
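BIM: $\delta\phi \approx G_0\,\delta O\,\phi_b$
$(\mu^2 \bar{\boldsymbol{I}} + \bar{\boldsymbol{F}}^{\dagger} \cdot \bar{\boldsymbol{F}}) \cdot \boldsymbol{O} = \bar{\boldsymbol{F}}^{\dagger} \cdot \Delta\boldsymbol{\phi}$
[Figure: error, 𝜇 = 60]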
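BIM vs. DBIM
BIM: $\delta\phi \approx G_0\,\delta O\,\phi_b$
DBIM: $\delta\phi \approx G_b\,\delta O\,\phi_b$
Normal Equation: $(\mu^2 \bar{\boldsymbol{I}} + \bar{\boldsymbol{F}}^{\dagger} \cdot \bar{\boldsymbol{F}}) \cdot \delta\boldsymbol{O} = \bar{\boldsymbol{F}}^{\dagger} \cdot \delta\boldsymbol{\phi}$
[Figure panels: 𝜇 = 40, 𝜇 = 60, 𝜇 = 80]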
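The regularized normal equation can be tried on a toy problem to see what the parameter 𝜇 does. The sketch below is a minimal dense solve, assuming a small random complex matrix in place of the real operator and hypothetical 𝜇 values. It is not the solver used for the slides, which would need an iterative, matrix-free method at the problem sizes quoted.

```python
import random

def adjoint(A):
    return [[A[i][j].conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def solve(A, b):
    """Solve the square system A x = b (Gauss-Jordan, partial pivoting)."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]

def tikhonov_update(F, dphi, mu):
    """Solve (mu^2 I + F^H F) dO = F^H dphi for the update dO."""
    Fh = adjoint(F)
    A = matmul(Fh, F)
    for i in range(len(A)):
        A[i][i] += mu ** 2
    return solve(A, matvec(Fh, dphi))

def norm(x):
    return sum(abs(v) ** 2 for v in x) ** 0.5

rng = random.Random(2)

def rand_c():
    return complex(rng.uniform(-1, 1), rng.uniform(-1, 1))

# Hypothetical toy problem: 6 measurements, 4 unknown pixels.
F = [[rand_c() for _ in range(4)] for _ in range(6)]
dphi = [rand_c() for _ in range(6)]

updates = {mu: tikhonov_update(F, dphi, mu) for mu in (0.5, 1.0, 2.0)}
for mu, dO in updates.items():
    print(f"mu = {mu}: |dO| = {norm(dO):.4f}")
```

A larger 𝜇 damps the update more strongly, so the norm of the solution falls as 𝜇 grows; that is the trade-off between stability and fidelity being tuned across the 𝜇 panels.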
[Figure: cluster layout. Nodes, each with several processor cores, are connected through a switch, with one master node. Execution regions run p MPI processes; communication (Comm.) regions exchange data between the MPI processes.]
Parallel Born Solutions
Forward Solution(s)
[Figure: forward solutions are distributed over T/X positions and T/X frequencies. In the MPI region, each of the p MPI processes runs its own MLFMA solves. In the MPI×OpenMP region, each process uses OpenMP threads (p×t threads in total).]
No communication here
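Because the forward solves need no communication, distributing them reduces to deciding which process owns which (position, frequency) pair. The sketch below shows one common choice, a contiguous block distribution. The slides do not say which distribution was used, so the scheme, the function names and the sizes are all assumptions, and plain Python stands in for MPI ranks.

```python
def tasks_for_rank(rank, n_ranks, n_tasks):
    """Contiguous range of task indices owned by `rank` (block distribution)."""
    base, extra = divmod(n_tasks, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)

def illumination(task, n_freqs):
    """Map a flat task index to (position index, frequency index)."""
    return divmod(task, n_freqs)

# Hypothetical setup: 4 T/X positions, 3 T/X frequencies, 5 processes.
n_positions, n_freqs, n_ranks = 4, 3, 5
n_tasks = n_positions * n_freqs

assignment = {
    rank: [illumination(t, n_freqs)
           for t in tasks_for_rank(rank, n_ranks, n_tasks)]
    for rank in range(n_ranks)
}

for rank, pairs in assignment.items():
    # Each rank would run one forward solve per pair, with no messages.
    print(f"rank {rank}: {pairs}")
```

With 12 solves on 5 processes, the first two ranks take three solves each and the rest take two, so the load imbalance is at most one solve.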
Parallel Born Solutions
Inverse Solution
[Figure: the inverse solution is distributed over T/X positions (1 to T) and T/X frequencies (1 to F). In the communication (Comm.) region, the master process sends $\boldsymbol{O}$ to the other processes (labeled Master and Slave). In the execution region, the p MPI processes, each with OpenMP threads (p×t threads in total), apply their own blocks, from $\bar{\boldsymbol{F}}_{1,1} \cdot \boldsymbol{O}$ to $\bar{\boldsymbol{F}}_{T,F} \cdot \boldsymbol{O}$; the corresponding vectors are labeled $\boldsymbol{b}_{1,1}$ to $\boldsymbol{b}_{F,T}$.]
Scaling on CPU Nodes
NCSA Blue Waters
4,228 XK Nodes
1 thread: 11.5 hours
16 threads: 1.2 hours
32 threads: 50 minutes
2,048 threads: 56 seconds
4,096 threads: 38 seconds
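Speedup and parallel efficiency follow directly from these timings. The sketch below assumes the pairing of thread counts and run times is 1 thread with 11.5 hours, 16 with 1.2 hours, 32 with 50 minutes, 2,048 with 56 seconds and 4,096 with 38 seconds. The pairings for 16 and 2,048 threads are inferred from the extracted label order, so treat those two rows with caution.

```python
# Run time in seconds per thread count (pairing partly inferred, see above).
timings_s = {
    1: 11.5 * 3600,
    16: 1.2 * 3600,
    32: 50 * 60,
    2048: 56,
    4096: 38,
}

def speedup(threads):
    """Speedup relative to the single-thread run."""
    return timings_s[1] / timings_s[threads]

def efficiency(threads):
    """Parallel efficiency: speedup divided by the number of threads."""
    return speedup(threads) / threads

for threads in sorted(timings_s):
    print(f"{threads:>5} threads: speedup {speedup(threads):8.1f}, "
          f"efficiency {efficiency(threads):.2f}")
```

Under that pairing, 4,096 threads give a speedup of roughly 1,090 over sequential execution, or about 27% parallel efficiency.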
OpenMP implementation on Blue Waters (within a single node)
1,048,576 Unknowns
8 Levels
Example: 16 OpenMP threads
default option: -j0
[Figure: floating-point units with core numbering 0 to 31 and 0 to 15; results labeled -j0 (default) and -j1]
CPU Nodes vs. GPU Nodes
Sequential Execution: 11.5 Hours
CPU Nodes: 50 min. (32 threads), 38 sec. (4,096 threads)
GPU Nodes: 10 min. (1 GPU), 7 sec. (128 GPUs)
Synthetic Reconstructions
𝛿𝜖(𝒓) = 𝜖(𝒓) − 𝜖𝑏
[Figure: reconstruction, labeled 102.4𝜆]
Computational Resources:
256 Computing Nodes
256 MPI Processes
32 Threads per Node
8,192 OpenMP Threads
Application Example: Ultrasound Imaging
Freq. Range: 2.2 MHz – 8.6 MHz
128 Frequencies
128 Receivers
1 Illumination
[Figure panels, each with 128 RX and linear scales: (1) line target, 0°; (2) line target, +30°; (3) 6λ, 0°; (4) 6λ, +30°]
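The acquisition parameters above fix the size of the data set, and the frequency range fixes the wavelengths involved once a sound speed is chosen. The sketch below assumes 1540 m/s, a commonly quoted average for soft tissue. That value is an assumption and does not come from the slides.

```python
n_frequencies = 128
n_receivers = 128
n_illuminations = 1

# One complex sample per (frequency, receiver, illumination) triple.
n_measurements = n_frequencies * n_receivers * n_illuminations

f_min_hz, f_max_hz = 2.2e6, 8.6e6   # frequency range from the slide
SPEED_OF_SOUND_M_S = 1540.0         # assumption: average soft tissue

def wavelength_mm(frequency_hz):
    """Wavelength in millimetres at the assumed sound speed."""
    return SPEED_OF_SOUND_M_S / frequency_hz * 1e3

print(f"measurements: {n_measurements}")
print(f"wavelength at 2.2 MHz: {wavelength_mm(f_min_hz):.3f} mm")
print(f"wavelength at 8.6 MHz: {wavelength_mm(f_max_hz):.3f} mm")
```

Under that assumption the wavelength runs from about 0.70 mm down to about 0.18 mm across the band, and the data set holds 16,384 samples.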
Current Problems
• Regularization of Born solvers
• Convergence problems with strong scatterers
• Limited-angle measurements
Future Plans
• Near-real-time imaging
• New & practical applications
Acknowledgments