
Diss. ETH No. 13484

Magnetic Resonance Imaging Based Correction
and Reconstruction of Positron Emission
Tomography Images

A dissertation submitted to the

SWISS FEDERAL INSTITUTE OF TECHNOLOGY

for the degree of


Doctor of Technical Sciences

presented by

Jonathan David Oakley

B.Sc., M.Phil.
born September 13, 1970

accepted on the recommendation of


Prof. Dr. Gábor Székely, examiner
Prof. Dr. Hans Herzog, co-examiner
Dr. John Missimer, co-examiner

Zurich, January 20, 2000


Abstract

Due to the inherently limited resolution of Positron Emission Tomography (PET) scanners
and the subsequent poor ability to resolve detail in reconstructed images, tracer activity
observed in active regions such as grey matter (GM) is often less than actual activity; in
inactive regions such as white matter (WM), more. Such errors become especially signifi-
cant in regions having dimensions of less than twice the available resolution. Accuracy of
the quantitative measurements taken is thus influenced by the undesirable partial volume
averaging of, for example, brain with cerebro-spinal fluid (CSF), bone and scalp regions.
Various efforts have been made to correct for this, and more recently, to address the more
subtle influence of the so-called partial volume effect (PVE) occurring across regions of
GM and WM.
The work presented in this report is directed toward the support of better quantitative analysis
in PET imaging. The emphasis here is to address the problem of limited resolution from
a purely information processing perspective; that is, ignoring the role played by the type
of tracer used and possible hardware modifications (crystal choice, et cetera), which will
also have a significant influence on such an analysis. On the basis of the data accumulated
by the acquisition process, efforts toward better quantification in PET images must begin
by operating on the raw, sinogram data. And, as is concluded in this work, all techniques
that operate at the image level alone - namely enhancement methods - would be better
formulated as part of the reconstruction step.
The general concern of this doctoral work is how to use associated structural informa-
tion to improve the quality of the PET images. This information is available in the form of
Magnetic Resonance Imaging (MRI) data, a modality capable of resolving fine structural
detail in the brain. Within this topic area, the specifics of the thesis work can be broadly
classified into four themes, and to each of these I can cite a contribution to the existing
knowledge and technology.
The first covers a method of PVE correction applied in accordance to associated struc-
tural information at the image level. A limiting factor of such techniques is the assumed
homogeneity in the within-tissue activity distribution. The implementation presented here
forsakes such an assumption, and models the variation in terms of basis functions.
The remaining themes arose from the shortcomings of this correction method, taking
the research into the field of statistical image reconstruction. Within this realm, the
second contribution comprises an interpolation technique that, by modelling uncertainty
about the emission source, is able to produce better regularised reconstructions of improved
resolution.
Akin to the second contribution, the third aspect of the work also seeks to address
shortcomings of the pixel-based correction methods, this time with regard to the use of the
“prior” information. Here, a more robust and theoretically stringent pseudo-Bayesian
approach1 has been developed, as well as extensions to the full Bayesian methods2. This
has included a scheme for coupling the pixel-based correction method with a reconstruction
scheme in order to relax to a solution that is a compromise between the two.
Having developed algorithms for reconstructing PET images constrained in accordance
to associated structural information, doubts remain concerning the degree of influence
afforded to the “prior”, as well as regarding the computational overheads. The fourth and
final aspect of this work deals with these issues, and in doing so this doctoral work has been
able to address each aspect of the initial project proposal. This involves an anisotropic
resolution representation of the image data, where the resolution is adapted in accordance
to either the structural information, or, where this is not to hand (which is actually likely
to be the case in most clinical settings), in accordance to local estimates of the image’s
signal to noise ratio. In simulated studies the resulting images have consistently shown
lower errors, and in real patient studies the results are impressive. Potential over-bias
resulting from the use of “prior” knowledge (that in many cases is likely to be inaccurate)
is removed, and the computational costs are acceptable for use in the clinical environment.

1. Those better suited to transmission image reconstruction.
2. Those better suited to emission image reconstruction.
Zusammenfassung

Bedingt durch die niedrige Auflösung von Positronenemissionstomographen (PET) besitzen
die rekonstruierten Bilder einen sehr eingeschränkten Detailreichtum. An den Grenzen
zwischen aktiven und inaktiven Regionen wird daher die Aktivität des Tracers in aktiven
Regionen, wie der grauen Substanz (GM), unterschätzt und in inaktiven Regionen, wie
der weissen Substanz (WM), überschätzt. Dieser Effekt ist besonders bedeutsam bei der
Abbildung von Strukturen, deren Ausdehnungen unter dem zweifachen der erreichbaren
Auflösung liegen. Die Genauigkeit quantitativer Messungen wird durch die unerwünschte
Mittelung über beispielsweise Hirnsubstanz, Gehirn-Rückenmark-Flüssigkeit (CSF), Kno-
chen und Kopfhaut, eingeschränkt. Die auftretende Mittelung bezeichnet man als „partial
volume effect“ (PVE). Zahlreiche Versuche wurden unternommen, dieses Problem zu lösen.
In letzter Zeit wird zudem untersucht, wie die schwieriger zu beherrschenden Effekte, die
an der Grenze zwischen weisser und grauer Substanz auftreten, kontrolliert werden können.
Im Rahmen der vorliegenden Doktorarbeit wurden Verfahren zur verbesserten quan-
titativen Rekonstruktion von Positronenemissionstomographien entwickelt. Das Problem
der eingeschränkten Auflösung wurde dabei ausschliesslich vom Informationsverarbeitungs-
standpunkt aus angegangen. Verbesserungsmöglichkeiten durch eine geeignetere Wahl der
Tracer und Modifikationen am Tomographen (Auswahl der Detektorkristalle, u.a.) wur-
den nicht berücksichtigt, obwohl ihnen in der Praxis eine bedeutende Rolle zukommt.
Bemühungen um eine bessere quantitative Rekonstruktion von PET-Bildern aus den bei
der Bildakquisition gewonnenen Daten müssen bei den unverarbeiteten Sinogrammdaten
ansetzen. Im Resümee dieser Arbeit wird sich zeigen, dass alle Verfahren, die nur auf der
Basis rekonstruierter Bilder arbeiten, die sogenannten Enhancement-Verfahren, besser als
Teil des Rekonstruktionsprozesses formuliert würden.
Das Hauptanliegen dieser Doktorarbeit ist, darzulegen, wie zugehörige strukturelle
Information die Qualität der PET Bildgebung verbessern kann. Derartige Information
ist in Form von Magnetresonanzbildern (MRI Bildern) in einer hochauflösenden Variante
vorhanden. Innerhalb dieser Thematik kann diese Arbeit in vier Schwerpunkte eingeteilt
werden, wobei jeweils ein Beitrag zum aktuellen Stand der Forschung und der Technologie
geleistet wurde. Der erste Schwerpunkt liegt in der Korrektur des Teilvolumeneffektes
(PVE) basierend auf zugehöriger struktureller Information auf der Ebene des Bildes. Der
begrenzende Faktor ist dabei die angenommene Homogenität der Aktivität innerhalb des
Gewebes. Die vorgestellte Implementierung verzichtet auf eine derartige Annahme und mo-
delliert diese Verteilung mit Basisfunktionen.
Die weiteren Themen ergeben sich aus den Defiziten dieser Korrekturmethode und
führen in das Gebiet der statistischen Bildrekonstruktion. Innerhalb dieses Gebiets stellt
der zweite Schwerpunkt einen Beitrag zur verbesserten Interpolation dar, welcher die Unsi-
cherheiten in den Emissionsquellen berücksichtigt und deshalb eine bessere Regularisierung
und eine verbesserte Auflösung der Rekonstruktion ermöglicht.


Ähnlich wie der zweite Beitrag korrigiert der dritte Aspekt dieser Arbeit ein Defizit
der pixelbasierten Korrektur, diesmal in Bezug auf die verwendete „prior“-Information. Dabei
wurde sowohl ein theoretisch konsequenter pseudo-Bayesscher Ansatz entwickelt als auch
eine vollständig Bayessche Korrekturmethode. Erstere eignet sich dabei besser für Trans-
missionsbildrekonstruktion und letztere für Emissionsbildrekonstruktion. Dies schliesst
auch ein Verfahren ein, welches die pixelbasierte Korrektur mit der Rekonstruktion ver-
bindet und so eine Lösung liefert, welche beide Verfahren umfasst.
Ausgehend von den Verfahren zur Rekonstruktion mittels zugehöriger struktureller
Information kommt die Frage auf, wie gross der Einfluss des „Priors“ und der benötigte
Rechenaufwand sind. Der vierte und abschliessende Aspekt dieser Arbeit behandelt diese
Punkte und erfüllt damit alle Anforderungen des ursprünglichen Projektantrages. Hierbei
kommt eine anisotrope Auflösungsdarstellung der Bilddaten zur Anwendung, bei welcher
die Auflösung entweder an die vorhandenen strukturellen Daten oder - falls diese fehlen
(was in der tatsächlichen klinischen Anwendung meist der Fall sein dürfte) - entsprechend
dem lokalen Signal-zu-Rausch-Verhältnis angepasst wird. In simulierten Studien zeig-
ten die resultierenden Bilder durchweg kleinere Fehler, und in Studien mit echten Patienten
konnten eindrucksvolle Resultate erzielt werden. Eine mögliche Überbewertung der oft un-
genauen „prior“-Information wird dadurch entfernt, und gleichzeitig wird der benötigte
Rechenaufwand auf ein für klinische Verhältnisse erträgliches Mass reduziert.
Contents

1 Introduction 1
1.1 Toward the Intrinsic Resolution of the PET Scanner . . . . . . . . . . . . . 1
1.2 Organisation of this Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2 Compartment Model Based Methods of PET Correction 9


2.1 Introduction - A Notion of Correction . . . . . . . . . . . . . . . . . . . . . 9
2.2 The Iterative Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3 Pixel-by-Pixel Correction - the Non-Iterative Methods . . . . . . . . . . . . 11
2.4 Anatomical Localisation of Functional Images . . . . . . . . . . . . . . . . . 12
2.5 [Videen et al. 1988]’s Method . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.6 An Extension to 3-D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.7 The Need for Increased Localisation . . . . . . . . . . . . . . . . . . . . . . 13
2.8 Accounting for “Inhomogeneity” within the Segmented Tissue Types . . . . 15
2.9 The Later Developments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.10 Concluding the Discussion on the Compartment Based Correction Methods 17

3 Developing a “Virtual Modality” PET Image 20


3.1 Intensity Normalisation - A Superresolution PET Image . . . . . . . . . . . 20
3.2 The Use of the Basis Functions . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.3 Deriving the New PET Data . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.3.1 Problems and Deficiences of the Method . . . . . . . . . . . . . . . . 24
3.3.2 A Different Segmentation . . . . . . . . . . . . . . . . . . . . . . . . 24
3.4 A Different Expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.5 Related Discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.5.1 The Intensity Transformation as a Forward Model . . . . . . . . . . 29
3.5.2 Regarding the Basis Functions . . . . . . . . . . . . . . . . . . . . . 29
3.5.3 Suitability as a Prior . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.5.4 Further Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.6 Thesis Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

4 The Arguments for Statistical Methods of Reconstruction 35


4.1 The Radon Transform and Filtered Back-Projection for PET Reconstruction 36
4.1.1 PET Reconstruction using Filtered Back-Projection . . . . . . . . . 37
4.2 Improving the Model of the PET Acquisition Process . . . . . . . . . . . . 39
4.2.1 Defining the Parameter Estimation Problem . . . . . . . . . . . . . . 40
4.3 Methods of Attenuation Correction . . . . . . . . . . . . . . . . . . . . . . . 41
4.4 Background Theory - How the Activity Distribution is Digitised . . . . . . 42

4.4.1 The Estimation Methods . . . . . . . . . . . . . . . . . . . . . . . . 43


4.4.2 Alternatives to Voxels for Representing the Reconstructed Images . 43
4.5 An Algorithm and an Objective Function . . . . . . . . . . . . . . . . . . . 44

5 Characteristics of the System Matrix 46


5.1 Defining our Estimation Problem . . . . . . . . . . . . . . . . . . . . . . . . 46
5.2 Implementing the System Matrix . . . . . . . . . . . . . . . . . . . . . . . . 48
5.2.1 Nearest Neighbour Approximation . . . . . . . . . . . . . . . . . . . 50
5.2.2 Accounting for Uncertainty in the System Matrix . . . . . . . . . . . 50
5.2.3 Regarding the Analogy to Image Reconstruction Using “Blobs” . . . 54
5.3 Cited Methods of Extending the System Matrix . . . . . . . . . . . . . . . . 54
5.3.1 An Efficient and Accurate Stochastic Model . . . . . . . . . . . . . . 57
5.4 Undesirable Characteristics of the System Matrix . . . . . . . . . . . . . . . 59
5.4.1 Exploiting the Sparse Nature of the System Matrix . . . . . . . . . . 59

6 Implementing PET Image Reconstruction in its Algebraic Form 61


6.1 Solving Simultaneous Equations of this Form . . . . . . . . . . . . . . . . . 61
6.1.1 Singular Value Decomposition . . . . . . . . . . . . . . . . . . . . . . 62
6.1.2 Linear Least-Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
6.1.3 Methods of Parameter Estimation . . . . . . . . . . . . . . . . . . . 63
6.2 The Algebraic Reconstruction Technique (ART) . . . . . . . . . . . . . . . 64

7 The Expectation-Maximization Algorithm 66


7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
7.2 A General Description of the Expectation Maximization Algorithm . . . . . 67
7.3 The EM Algorithm as Introduced by [Dempster et al. 1977] . . . . . . . . . 69
7.3.1 Estimating the Maximum Likelihood - the M-step . . . . . . . . . . 70
7.3.2 Augmenting the Observed Data - the E-step . . . . . . . . . . . . . 71
7.3.3 Getting back to our Algorithm . . . . . . . . . . . . . . . . . . . . . 72
7.3.4 Maximising our Likelihoods having Fixed our Unknown Data - The
M-step . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
7.4 [Shepp and Vardi 1982] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
7.4.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
7.4.2 Defining the Algorithm on the Basis of a Poisson Regression Model . 73
7.4.3 [Vardi et al. 1985]’s EM Algorithm; a Different Approach . . . . . . 75
7.5 The EM-ML Algorithm’s Implementation . . . . . . . . . . . . . . . . . . . 77

8 Characteristics of the Reconstruction Process 81


8.1 Criticisms of Statistical PET Reconstruction . . . . . . . . . . . . . . . . . 82
8.2 The Need for Prior Information and the Resulting Regularisation . . . . . . 83
8.2.1 A Brief Review of a Candidate Method . . . . . . . . . . . . . . . . 84
8.3 Penalisation Via the Filtering of Activity Estimations . . . . . . . . . . . . 87
8.3.1 Minimum Cross Entropy Reconstruction . . . . . . . . . . . . . . . . 88

9 Better Prior Models for Bayesian Methods of Reconstruction 97


9.1 Introducing Bayes’ Theorem for Tomographic Reconstruction . . . . . . . . 98
9.2 Regularisation Using A Priori Distributions . . . . . . . . . . . . . . . . . . 99
9.2.1 Choosing the Prior . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
9.2.2 Method of [Lipinski et al. 1997] . . . . . . . . . . . . . . . . . . . . . 101
9.2.3 Method of [Sastry and Carson 1997] . . . . . . . . . . . . . . . . . . 102

10 Applying New Gaussian Priors 108


10.1 The Form of the Existing Priors . . . . . . . . . . . . . . . . . . . . . . . . 108
10.2 The Application of the New Prior Distribution . . . . . . . . . . . . . . . . 109
10.3 Deriving the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
10.4 Selecting the Gaussian’s Standard Deviation (σ) . . . . . . . . . . . . . . . 111
10.4.1 Local Estimates of PVE . . . . . . . . . . . . . . . . . . . . . . . . . 111
10.4.2 Application of σ in the Algorithm . . . . . . . . . . . . . . . . . . . 112
10.5 Results of using the Alternative Gaussian Prior . . . . . . . . . . . . . . . . 114
10.5.1 Closing Remarks Regarding Hyperparameter Estimation . . . . . . . 116
10.6 Conclusions, Extensions and Future Work . . . . . . . . . . . . . . . . . . . 117

11 Multiresolution Representations 120


11.1 Resolution, Convergence and Noise Properties of PET Reconstruction . . . 121
11.1.1 The Basic Multiresolution Implementation . . . . . . . . . . . . . . . 122
11.1.2 Initial Interpolation Methods . . . . . . . . . . . . . . . . . . . . . . 123
11.1.3 An Expectation-Maximization-Expectation (EME) Algorithm? . . . 125
11.2 Anisotropic Image Representations . . . . . . . . . . . . . . . . . . . . . . . 127
11.2.1 Allowing a Convergence of the Resolution . . . . . . . . . . . . . . . 128
11.2.2 Adaptive Interpolation for an Anisotropic Reconstruction . . . . . . 128
11.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

12 Multilevel Approaches 139


12.1 Implementing the Ordered Subsets Algorithm . . . . . . . . . . . . . . . . . 139
12.2 A Subsampling of the Sinogram Space . . . . . . . . . . . . . . . . . . . . . 140
12.2.1 Method of [Herman et al. 1984] . . . . . . . . . . . . . . . . . . . . . 140
12.2.2 The Pros and the Cons of the Multilevel Approach . . . . . . . . . . 141
12.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

13 Conclusion and Closing Remarks 144


13.1 What has been achieved, and how this came about . . . . . . . . . . . . . . 144
13.2 Reconstructing the Prior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
13.3 Closing Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

A Algorithms Used 148


A.1 Discrete Cosine Basis Functions Algorithm . . . . . . . . . . . . . . . . . . 148
A.2 System Matrix Manipulation Algorithms . . . . . . . . . . . . . . . . . . . . 148
A.2.1 Algorithm 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
A.2.2 Algorithm 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
A.2.3 Algorithm 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
A.2.4 Algorithm 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
A.2.5 Algorithm 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150

A.2.6 Algorithm 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150


A.3 The ART Algorithm’s Implementation . . . . . . . . . . . . . . . . . . . . . 150
A.4 The EM-ML Algorithm’s Implementation . . . . . . . . . . . . . . . . . . . 153
A.5 Implementation of the Alternative Gaussian Algorithm . . . . . . . . . . . . 154
A.6 OSEM Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . . 155
A.6.1 OS-Cross Entropy Algorithm . . . . . . . . . . . . . . . . . . . . . . 155
A.6.2 OS-Bayesian Approach . . . . . . . . . . . . . . . . . . . . . . . . . . 156

B [Levitan and Herman 1987] 157


B.1 The Algorithm - MAP-EM . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
B.1.1 The EM-Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
B.1.2 EM-ML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
B.1.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

C Theoretical Background to Blobs 160


C.1 Spherical Basis Functions in Tomographic Reconstruction . . . . . . . . . . 160
C.1.1 Theoretical Background . . . . . . . . . . . . . . . . . . . . . . . . . 161

D Characteristics of the Poisson Distribution 166

E Additional Results for the Thesis’ Algorithms 169


E.1 Results for the Gaussian Field Prior Algorithm . . . . . . . . . . . . . . . . 169
E.2 Results for the Adaptive Interpolation Kernel . . . . . . . . . . . . . . . . . 169

F Glossary of Terms 186

Bibliography 190

Index 208
List of Figures

2.1 Example PET correction algorithms based either on equation 2.1 or on a
least-squares iterative algorithm. . . . . . . . . . . . . . . . . . . . . . . . . 19

3.1 Sixteen 2-D Discrete Cosine Transform basis functions. . . . . . . . . . . . . 23


3.2 A segmentation of a T1-weighted MRI image using the function proposed
in [Friston et al. 1995]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.3 The segmentation function (equation 3.3) itself. . . . . . . . . . . . . . . . . 25
3.4 The intensity transformation of [Friston et al. 1995]. . . . . . . . . . . . . . 26
3.5 The intensity transformation of [Friston et al. 1995]. . . . . . . . . . . . . . 26
3.6 The intensity transformation of the scheme proposed in this thesis using
real MRI-PET studies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.7 The intensity transformation of the scheme proposed in this thesis using a
mathematical phantom. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.8 How the basis functions alone are able to model the PET distribution. . . . 30
3.9 Domination of the intensity transformation by the MRI segmentation re-
sulting from too few basis functions. . . . . . . . . . . . . . . . . . . . . . . 31
3.10 The required granularity of the basis functions in order to avoid MRI bias. . 32

4.1 Relating the Radon transform to the geometry of a tomographic system. . . 36


4.2 Typical filter responses used in filtered back-projection. . . . . . . . . . . . 38
4.3 An example sinogram and how its indexes relate to the tomographic system’s
lines of response. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.4 How attenuation correction factors are estimated in the tomographic system. 42

5.1 Relating an image grid to the scanner’s field of view. . . . . . . . . . . . . . 47


5.2 How the coefficients of the system matrix are stored to disk. . . . . . . . . . 49
5.3 Relating each line of response to pixels in the image space. . . . . . . . . . 51
5.4 The simple trigonometry relating the lines of response to each pixel. . . . . 52
5.5 How uncertainty in the source of the emission is incorporated into the sys-
tem probabilities. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.6 The results of reconstructions using different system matrices, each using a
different interpolation scheme. . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.7 The Blob “Over Imposition” step used to convert a blob representation to
a voxel representation of a reconstructed image that used spherical basis
functions in the interpolation scheme. . . . . . . . . . . . . . . . . . . . . . 56

7.1 Maximum likelihood reconstructions for different normalisation methods. . 79



7.2 Maximum likelihood reconstructions for different pixel sizes and interpola-
tion schemes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

8.1 The checkerboard effect common to maximum likelihood based methods of
reconstruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
8.2 The Penalized-Weighted Least-Squares method of reconstruction. . . . . . . 87
8.3 How a mathematical phantom was created. . . . . . . . . . . . . . . . . . . 90
8.4 Reconstruction results using the weighted-Gaussian algorithm. . . . . . . . 92
8.5 Reconstruction results using a weighted-Gaussian algorithm. . . . . . . . . . 93
8.6 Reconstruction results using an algorithm proposed in this thesis based
on the “High Contrast” anisotropic filtering. . . . . . . . . . . . . . . . . . . 94
8.7 Reconstruction results using an algorithm proposed in this thesis based
on the “Wide Regions” anisotropic filtering. . . . . . . . . . . . . . . . . . . 95

9.1 Results using the Gaussian field algorithm and the global mean prior. . . . 107

10.1 Images of an entropy measure taken on a T1-weighted MRI slice. . . . . . . 112

11.1 How a simple (piecewise constant) interpolation scheme is used to infer
pixel values across image planes in the multiresolution representation of the
image data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
11.2 Results of the EM-ML multiresolution scheme. . . . . . . . . . . . . . . . . 124
11.3 How we infer a higher resolution when prior information is available. . . . . 127
11.4 Communicating the notion of a straightforward anisotropic resolution im-
plementation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
11.5 This figure shows how the interpolation method may vary in accordance to
either specific ROIs in the image or local estimates of noise. . . . . . . . . . 130
11.6 Different “Full-Width Half-Maximum maps” for different choices of noise
sensitivity. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
11.7 RMS error values of the reconstructed images using different sensitivity
values for the SNR based interpolation kernels. . . . . . . . . . . . . . . . . 133
11.8 Comparing the results of using the same OSEM reconstruction method for
various fixed and adaptive interpolation kernel methods. . . . . . . . . . . . 134
11.9 The average RMS error for the reconstructions taken at each pixel and at
each iteration (8 OSEM steps). . . . . . . . . . . . . . . . . . . . . . . . . . 135
11.10 How the log likelihood value varies with each iteration (8 OSEM steps) for
the anisotropic kernel approach. . . . . . . . . . . . . . . . . . . . . . . . . 135
11.11 The use of an anisotropic interpolation scheme determined using segmented
MRI data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
11.12 The use of an anisotropic interpolation scheme determined using local esti-
mates of noise. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137

12.1 How the original sinogram is sub-sampled using a simple neighbourhood
operator. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
12.2 The data set used in the multilevel experiments. . . . . . . . . . . . . . . . 142
12.3 Results based on reconstructions using the cross entropy approach and a
multilevel reconstruction scheme. . . . . . . . . . . . . . . . . . . . . . . . . 143

A.1 Storing the Non-Sparse System Matrix. . . . . . . . . . . . . . . . . . . . . 151


A.2 Initialising the pointers for the algorithms to address the relevant regions
of the above system matrix. . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

C.1 How both basis functions (a pixel and a blob) are perhaps centered in the
same position, but nonetheless have different extents. . . . . . . . . . . . . . 162
C.2 An example profile of a Kaiser-Bessel window function; a “blob”. . . . . . . 164

E.1 Images showing the results of the Bayesian method on real PET-MRI stud-
ies for different Gaussian fields. . . . . . . . . . . . . . . . . . . . . . . . . . 170
E.2 Images showing the results of the Bayesian method on real PET-MRI stud-
ies for a specific ROI. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
E.3 Plots showing the behaviour of the Bayesian algorithm with each iteration
step. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
E.4 Reconstructions using the Gaussian Field Bayesian algorithm, K=45. . . . . 173
E.5 Reconstructions using the Gaussian Field Bayesian algorithm, K=60. . . . . 174
E.6 More anisotropic (SNR-based) kernel interpolation results. Slice 20. . . . . 175
E.7 More anisotropic (ROI-based) kernel interpolation results. Slice 20. . . . . . 176
E.8 More anisotropic (SNR-based) kernel interpolation results. Slice 31. . . . . 177
E.9 More anisotropic (ROI-based) kernel interpolation results. Slice 31. . . . . . 178
E.10 More anisotropic (SNR-based) kernel interpolation results. Slice 35. . . . . 179
E.11 More anisotropic (SNR-based) kernel interpolation results. Slice 40. . . . . 180
E.12 More anisotropic (SNR-based) kernel interpolation results. Slice 50. . . . . 181
E.13 More anisotropic (SNR-based) kernel interpolation results. Slice 55. . . . . 182
E.14 Images showing the results of the reconstruction of the Jülich phantom
study using the variant (SNR-based) interpolation scheme. . . . . . . . . . . 183
E.15 Images showing the results of the reconstruction of the Jülich phantom
study using different cross entropy methods. . . . . . . . . . . . . . . . . . . 184
E.16 Images showing the results of the reconstruction of the Groningen study
using different cross entropy methods. . . . . . . . . . . . . . . . . . . . . . 185
1
Introduction

Positron Emission Tomography (PET) is a non-invasive functional imaging technique. A
radioactive tracer is introduced into the blood stream of the patient to be imaged. Its
distribution is then indicative of metabolic activity or blood flow. The isotope itself decays
quite quickly, emitting positrons which, within a very short distance, collide with electrons.
The annihilation that takes place - when matter and anti-matter meet - causes two gamma
emissions to occur in almost exactly opposite directions. These emissions are recorded in
the various detector crystals that surround the patient. The aim in PET quantification
is to accurately estimate the spatial distribution of the isotope concentration on the basis
of, ultimately, the projected counts as recorded by the scanning device.
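In the generic notation of the statistical reconstruction literature (the thesis introduces its own notation in chapter 5), this estimation task amounts to recovering a non-negative activity image λ from the measured counts y under a model of the form

    y_i \sim \mathrm{Poisson}\Big(\sum_j a_{ij}\,\lambda_j\Big), \qquad i = 1,\dots,I,

where a_{ij} denotes the probability that an emission in image element j is detected along the i-th line of response.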
Uncertainty in the emission source, low detector sensitivities, and poor signal statistics
combine to hinder the ability to resolve detail in the resulting images. The effect of this
limited resolution is such that the tracer activity observed in active regions such as grey
matter (GM) is often less than actual activity; in inactive regions such as white matter
(WM), it may be more. Accuracy of the quantitative measurements taken is thus adversely
affected by such undesirable partial volume averaging, the so-called partial volume effect
(PVE). This effect, for example, is very pronounced in the presence of cortical atrophy,
posing obvious problems for the quantitative analysis of such regions. A further example
occurs in areas of activity in the neocortex, where GM, WM and cerebro-spinal fluid
(CSF) spaces are so convoluted that the resulting signal reflects only the average tracer
concentration in all three compartments. That is, such intricate structures cannot yet be resolved
with respect to their tracer activity using PET instrumentation alone. Thus, as the
relative percentage of CSF, GM and WM varies regionally in the same subject and between
subjects - particularly as a result of aging or disease - any given change in the apparent
tracer concentration in PET may reflect a change in morphology rather than physiology
or biochemistry, and without the inclusion of an associated anatomical image, we are
potentially at a loss in deciphering this. How (and indeed, if) such “side-information” can
be used to improve the PET quantitation is the theme of this doctoral thesis.

1.1 Toward the Intrinsic Resolution of the PET Scanner


It is obviously desirable to improve the reconstructed resolution of the PET data. The
intrinsic resolution (about 3 mm in modern devices), as defined by the physical size of the
scanner’s detectors, is the theoretically best resolution achievable. In practice, however,
this is not achieved due to the smoothing process entailed during reconstruction. This
results in an image at the reconstructed resolution, which is typically in the region of
5 mm. To address this loss of resolution, we are presented with a number of alternatives,
most of which employ associated structural information.
Methods of PET correction (chapter 2) typically aim to redistribute the activity, tuck-
ing it back into GM regions in an effort to improve the quantitative results of the recon-
struction. Alternatively, more novel, iterative methods address resolution issues, attempt-
ing to zoom in on the data and ultimately go beyond its sampled resolution as determined
by the scanner’s crystal array, or at least compensate for the blurring characteristics of
the reconstruction process. One example of this approach would be the so-called “virtual
modality” PET data (as coined by [Friston et al. 1995]), which is the result of transform-
ing the intensities in a Magnetic Resonance Imaging (MRI) image such that it looks like
a deconvolved PET image. How appropriate such methods are is highly debatable, but
if the basic idea is better formulated to produce a forward model of the PET process on
the basis of the PET and MRI data, then there is at least a novel application within a
Bayesian framework for this transformed image. A suitable extension to these correction
models is therefore proposed in chapter 3, with the idea originally being to indeed provide
a model-driven approach to the interpretation of the observed activity using the Bayesian
formalism.
A loss in resolution may, alternatively, be addressed by more conventional inversion
techniques. This method of post-processing assumes that a convolution operation can ef-
fectively model the blur incurred due to the detector response, and that this is the primary
source of degradation. Indeed, deconvolution techniques have been successfully applied
in applications such as astronomical imaging (see [Jain 1989] for an overview), but the
noise conditions in emission data do not allow such off-the-shelf techniques to be applied
with the same degree of success. For emission data, the routine application of standard
signal processing methods is not usually appropriate for the task of deconvolution-based
image restoration. For example, the problems associated with the use of the [optimal]
Wiener filter stem from the inappropriate assumptions it makes about the image noise, and
from its tendency to blur edges in the presence of that noise. Such characteristic artifacts
in the emission data are ignored at one’s peril, as failure to account for these effects will
almost inevitably mean poor results. Furthermore, given the application of emission tomography,
it would be necessary to ask where such filtering methods are best employed. That is,
either in the image space or at the sinogram1 level; the latter case introducing us to the so-
called methods of “pre-reconstruction restoration” (see, for example, [Boulfelfel et al. 1992,
Gopal and Hebert 1994]). Such issues have been extensively researched, and basically cul-
minated in filtering methods applied during the filtered back-projection (FBP) [Barrett
and Swindell 1982] method of reconstruction2. These processes attempt to recover the
lost high-frequency signal components, but achieve this with only limited success (see
section 4.1).
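As a minimal illustration of the point being made here, the sketch below applies a classical frequency-domain Wiener deconvolution to a toy activity image blurred by a Gaussian PSF, once without noise and once with Poisson counting noise added. Everything in it - the toy image, the PSF width, the constant noise-to-signal parameter k - is an illustrative assumption rather than anything taken from this thesis or from the cited works.

```python
import numpy as np

def gaussian_psf(shape, fwhm_pix):
    """Centred 2-D Gaussian point spread function with the given FWHM (pixels)."""
    sigma = fwhm_pix / 2.355
    cy, cx = shape[0] // 2, shape[1] // 2
    y, x = np.indices(shape)
    psf = np.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))
    return psf / psf.sum()

def wiener_deconvolve(observed, psf, k=0.01):
    """Frequency-domain Wiener deconvolution with a constant noise-to-signal ratio k."""
    H = np.fft.fft2(np.fft.ifftshift(psf))
    G = np.fft.fft2(observed)
    W = np.conj(H) / (np.abs(H) ** 2 + k)
    return np.real(np.fft.ifft2(W * G))

# Toy "activity" image: a bright disc (active tissue) on a dim background.
rng = np.random.default_rng(0)
img = np.full((64, 64), 10.0)
yy, xx = np.indices(img.shape)
img[(yy - 32) ** 2 + (xx - 32) ** 2 < 12 ** 2] = 80.0

psf = gaussian_psf(img.shape, fwhm_pix=6.0)
blurred = np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(np.fft.ifftshift(psf))))
noisy = rng.poisson(np.maximum(blurred, 0.0)).astype(float)  # Poisson counting noise

for name, data in [("noise-free", blurred), ("Poisson noise", noisy)]:
    restored = wiener_deconvolve(data, psf)
    rmse = np.sqrt(np.mean((restored - img) ** 2))
    print(f"RMSE after Wiener deconvolution ({name}): {rmse:.2f}")
```

The gap between the two printed errors illustrates how quickly counting noise erodes whatever the deconvolution gains.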
In conditions of noisy data, the inverse-filtering paradigm is best avoided, and statistically
based methods of image restoration are considered more applicable [Andrews and
Hunt 1977]. Nonetheless, their effectiveness remains hindered by the quality of the initial
reconstruction. [Censor 1983] classified the reconstruction methods into two classes:
Fourier-based inversion methods and algebraic methods. The basic difference between these
two approaches is that the inversion methods are direct solutions to this particular
reconstruction problem, whereas the algebraic methods iteratively estimate the best solution.

1. The raw data measured by the scanner in its unreconstructed form. See chapter 4.
2. The reasons why an analytical solution is not possible are the subject of chapter 4. Suffice it to say,
for now, that to assume that the Radon transform adequately describes the acquisition process is incorrect.

In the clinical setting, PET data is, in the vast majority of cases, reconstructed using
the inversion scheme of FBP, despite the inferior quality of this method [de Jonge and
Blobkland 1999]. The resulting ill-conditioned nature of the data renders it unsuitable
for tasks of post-processing, and algebraic approaches to reconstruction are instead to be
preferred. The immediate advantage of these methods is that we may now model the
acquisition process and reconstruct the image data in a more regularised fashion. Statistical
methods of reconstruction develop this scheme by also modelling the distribution of
the observed data. Hence there is no sensible reason to apply statistical methods
of restoration solely at the image level; only geometric considerations are necessary to
allow the approaches to bridge the transition from sinogram to image whilst incorporating
restoration or other correction methods. Chapter 4 covers why it is therefore necessary to
use iterative reconstruction methods to produce better PET images.
Statistical methods of reconstruction are not, however, without short-comings of their
own. The problem of PET image reconstruction is a highly ill-posed one, and constraints
on the possible solutions are often required to allow a better form of determinism. Since
nearly all the data are subject to some uncertainty, a statistical approach is chosen to infer
the image intensities. This leads to a large system of equations, in which the dimensionality
of the reconstructed images determines the number of unknowns that must be solved (or
at least approximated). Of course, these would ideally be analytically derived, but the
aforementioned conditioning of the reconstruction process does not allow this to be done
in a robust and accurate way. Hence these methods iteratively estimate the unknowns on
the basis of the PET data (whose dimensionality defines the number of equations in the
system). Extensions to the existing statistical methods are primarily motivated by the
resolution of the PET data being limited by photon statistics, and also by the tendency
of the Maximum Likelihood Estimator (MLE), the most popular solution estimator, to
“over fit” the solution. In the latter case, so-called “checker-board” noise can dominate
the reconstruction as the wrong kind of variance is maximised. It is shown in chapter 5
that accounting for uncertainty in the emission source can do a great deal to minimise this,
and furthermore, as the effect is spatially variant, the multiscale approaches of chapter 11
were proposed. The idea here is to account for such artifacts on a regional basis, in
accordance to either better statistics or regions known a priori to be of more note.
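For orientation only - the full derivation is the subject of chapter 7, and the symbols below follow common usage rather than the thesis's own notation - the classical EM-ML update referred to here has the multiplicative form

    \lambda_j^{(k+1)} = \frac{\lambda_j^{(k)}}{\sum_i a_{ij}} \sum_i a_{ij}\, \frac{y_i}{\sum_{j'} a_{ij'}\,\lambda_{j'}^{(k)}},

and it is the unregularised repetition of this update on noisy data that produces the checker-board artifact described above.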
It is seen throughout this thesis that both the correction of PET data for the PVE
and the regularisation of the reconstructed image using statistical approaches all too often
adopt the assumption of homogeneous activity distribution in order to achieve reduced
noise solutions. It has been a driving theme of this research to steer solutions and algo-
rithms away from this assumption, as this alone is responsible for a significant loss in the
ability to resolve detail in the PET data. This ideal has driven the research described in
chapter 3 where the assumption is replaced by a model of within-tissue activity. It drives
also the method of chapter 10, where a Bayesian method is developed using a Gaussian
field prior variant across the image field. Each pixel is then given a prior estimate toward
which it may (or may not) be drawn. Previously, the approach taken has been to assume
homogeneity, and assign to each pixel a Gaussian that will drive it toward the expected
value for its tissue type. That is, the Gaussian fields are applied at only the resolution of
the tissue compartments (see chapter 9). As it is likely that a significant proportion
of the image intensities would exhibit within-compartment variation (heterogeneity),
the data is incompatible with the assumptions, and is subsequently penalised by the
algorithm. In such cases, one must be wary regarding the potential pit-fall of simply re-
constructing the prior itself, and the method of chapter 10, in giving more consideration
to the data itself and showing greater flexibility to incorporate arbitrary models, offers
one method of avoiding this.
Additional concerns arising from the over-influence of the prior are addressed in chap-
ter 11, which considers multiresolution representations of the image data. Here we in-
troduce a method for dealing with the measured data on a more localised basis, thus
presenting an effective anisotropic reconstruction. Chapter 12 then discusses
how the divide-and-conquer paradigm may be used to tackle the increasing computational
demands of the reconstruction algorithms and the ever-increasing dimensionality of the
data. Finally, chapter 13 closes by summarising what has and what has not been achieved
in these doctoral studies.

1.2 Organisation of this Thesis


Chapter 1 (this chapter) introduces the PET imaging modality and tomographic re-
construction as an active field of research. In describing the PET acquisition and
reconstruction processes current in clinical usage, characteristic deficiencies are cited
in order to give an overview of this research. Particular emphasis is given to the low
resolution of the resulting images, which subsequently hinders the effectiveness of
this modality for purposes of diagnosis. Approaches taken from the signal process-
ing repertoire are mentioned as potential methods for addressing this problem and
improving the overall quality of the images used within the clinical environment.
The advantages and disadvantages of such methods are mentioned on a relatively
abstract level sufficient only to justify the goals of the research in the broadest of
terms. This justification is nonetheless achieved, and the general project aims and outline are thus
formulated.

Chapter 2 presents methods of PET image correction. These aim to address the problem
of inaccurate quantification (low resolution) by using higher resolution MRI data
as an additional source of information. The methods are typically applied in a
non-iterative way at the pixel level. They use arithmetic operators to perform a
simple artificial boosting of the observed levels of activity (those assumed to be
underestimated), and a suppression of activity where it is assumed to be overestimated.
This is an attempt to address the characteristic problems of estimating activity
in the [perhaps convoluted] active and inactive regions of the brain, and to thus
correct for the PVE. The rather manifest disadvantages of these methods are aired
in detail, and their typically poor results are demonstrated. Lack of robustness
to segmentation and registration errors, along with unrealistic activity distribution
assumptions - namely, assumed homogeneity - in the definition of the correction
models are convincing arguments for not adopting the approaches reviewed in this
chapter.

Chapter 3 concerns itself with developing a pixel-based PVE correction method that
avoids the homogeneity assumption applied to each tissue class. In a seemingly
elaborate scheme, a new model is proposed based solely on quite rational and widely
accepted assumptions regarding tracer distributions. An expansion is derived that
allows the MRI data to have direct equivalence to the PET data. This firstly requires
its segmentation into meaningful tissue components, and secondly requires the use
of a convolution kernel to relate the different resolutions. The transformation de-
rived is applied to the intensities in the segmented MRI, and once the coefficients of
this transformation are determined, one is free to simulate a PET distribution at a
higher resolution (i.e., without the convolution kernel). This model is termed a “su-
perresolution” PET image as it exhibits an effective deconvolution of the observed
PET data, constrained by structure extracted in the MRI data. This image is the
result of a more detailed and realistic model, one that needs to make no assump-
tion regarding homogeneity for the within-compartment activity distributions, and
which may serve, it is later argued, as a more appropriate prior model for Bayesian
reconstruction approaches. The method is not without its limitations (and hence the
suggested application as only a prior), and these are given in the chapter’s closing
discussion. In turn, each limitation is later addressed, and solutions are found to lie
in the reconstruction algorithms of chapters 5, 7 and 10.

Chapter 4 applies the processing one step earlier, at the pre-reconstruction, or sinogram
level. This is motivated by the ineffectiveness of chapter 2’s correction
methods, which apply at the post-reconstruction, or image, level. Considering first
all possible methods of reconstruction, it is seen that the analytical inversion based
methods produce images of low resolution, high noise, and significant artifact, all of
which are liable to hinder any attempt at recovering a better quality result. In short,
the conclusion of this chapter is that statistical methods of reconstruction must be
applied if any effort toward better quantification in PET images is to be achieved.
Furthermore, if a statistical reconstruction technique must be developed, then it is
here that the methods of PET correction should be applied.

Chapter 5 presents the system model used in statistical methods of reconstruction. As
common sense would indicate, the better the model, the better the reconstruction.
One should not, however, lose sight of additional concerns such as computational
efficiency, which too must influence this choice. Typical models used in emission
tomography reconstruction are presented, and a discussion of what is and what is
not applicable leads to the formulation of the model used in this research. The
model itself relates the measured data to the image to be estimated in the form of
a single transformation matrix. The fundamental importance of the interpolation
methods used to relate measurements to pixel sites is demonstrated, culminating in a
method producing higher quality images. Importantly, it is seen that the traditional
methods of interpolation can benefit by accounting for uncertainty in the emission
source. Exactly how this so-called system matrix is implemented is shown, although
its size becomes unwieldy, necessitating either the use of sparse matrix techniques
or on-the-fly calculations of the coefficients. These too are explained, concluding a
comprehensive introduction to what is arguably the most important aspect of the
reconstruction process.
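The sketch below conveys the on-the-fly flavour of this: one row of a toy system matrix is computed for a single line of response, either by nearest-neighbour assignment or by spreading each sample along the line with a small Gaussian as a crude stand-in for uncertainty about the emission source. The parameterisation, sampling step and Gaussian width are assumptions of the sketch only; the coefficients, interpolation and storage actually used in this work are those defined in chapter 5 and appendix A.

```python
import numpy as np

def lor_row(n, angle, offset, sigma=0.0, step=0.25):
    """One (dense) row of a toy system matrix for an n x n image.

    The line of response is given by its angle (radians) and its signed
    perpendicular offset from the image centre (in pixels).  Sample points
    along the line either add weight to their nearest pixel (sigma == 0) or
    spread it over nearby pixels with a Gaussian of width sigma.
    """
    row = np.zeros((n, n))
    centre = (n - 1) / 2.0
    direction = np.array([np.cos(angle), np.sin(angle)])
    normal = np.array([-np.sin(angle), np.cos(angle)])
    p0 = centre + offset * normal                     # a point on the line
    reach = 0 if sigma == 0.0 else int(np.ceil(3 * sigma))
    for t in np.arange(-n, n, step):
        py, px = p0 + t * direction
        iy0, ix0 = int(round(py)), int(round(px))
        for iy in range(iy0 - reach, iy0 + reach + 1):
            for ix in range(ix0 - reach, ix0 + reach + 1):
                if 0 <= iy < n and 0 <= ix < n:
                    w = 1.0 if sigma == 0.0 else np.exp(
                        -((iy - py) ** 2 + (ix - px) ** 2) / (2.0 * sigma ** 2))
                    row[iy, ix] += step * w
    return row

sharp = lor_row(32, angle=np.pi / 6, offset=3.0, sigma=0.0)
spread = lor_row(32, angle=np.pi / 6, offset=3.0, sigma=0.8)
print("non-zero coefficients:", np.count_nonzero(sharp), "vs", np.count_nonzero(spread))
```

In practice only the non-zero coefficients of such a row would be stored, or recomputed on the fly, which is exactly the sparsity concern mentioned above.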

Chapter 6 frames the statistical reconstruction algorithms on the basis of the system
matrix presented in chapter 5. This is basically a review, serving as an introduction
and useful reference to the methods that follow. The reconstruction is formulated on
the basis of a linear system of equations, so numerical methods for solving such sys-
tems are briefly discussed ahead of the more explicit methods particular to this
emission tomographic application (i.e., a noise model). The implementation details
of one simple form of algebraic reconstruction are presented, thus providing context to
the development and use of the system model first presented in the previous chapter.
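For reference, and in generic notation rather than that of appendix A.3, one common form of the ART update cycles through the measurement equations, projecting the current estimate onto each in turn:

    \lambda^{(k+1)} = \lambda^{(k)} + \omega\, \frac{y_i - \langle a_i, \lambda^{(k)} \rangle}{\|a_i\|^2}\, a_i,

where a_i is the i-th row of the system matrix, y_i the corresponding measurement, and \omega a relaxation parameter.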

Chapter 7 introduces, reviews, and details the Expectation-Maximization Maximum
Likelihood (EM-ML) algorithm that plays a very important role in emission to-
mographic reconstruction, and, therefore, also in this project. Covered are the ideas
behind the original development of the algorithm - primarily its clever, constrained
optimisation approach - and hence its importance in the iterative process that consti-
tutes image reconstruction. In using a likelihood function that is to be maximised,
one is free to describe the characteristic distribution of the data (that is, free
to propose its functional form). This has allowed the emission process to be more
accurately modelled, and the introductory papers that exploited this were seminal
in the field of emission reconstruction. The EM-ML approach forms the basis for
all the methods later developed in this dissertation, so no necessary particulars are
overlooked in either the proof of its derivation or the implementation details that are
presented. Results are shown for reconstructions based on this algorithm in conjunc-
tion with the implementation of the system matrix first proposed in chapter 5. These
conclusively demonstrate the effectiveness of the described interpolation technique
used in defining the system matrix coefficients of chapter 5.

Chapter 8 documents the characteristics of statistical methods of reconstruction that
have traditionally remained a hindrance to PET quantification. The undesirable
properties of reconstructions based on the EM-ML algorithm are reviewed, and new
solutions are proposed. This serves to give better direction to the research, introduc-
ing the need to regularise the noisy solution resulting from this algorithm. The review
of regularisation methods provides a gentle introduction to the Bayesian paradigm,
which is applicable wherever sensible assumptions may be made with regard to the
expected solution. Relatively simple pseudo-Bayesian methods of penalisation are
demonstrated alongside my own approach that extends this work in terms of being
more theoretically attractive. These algorithms are not, however, suited to the task
of resolution recovery in emission imaging, but should certainly be given considera-
tion in transmission reconstruction whenever the requisite structural information is
to hand, and within compartment homogeneity is a valid assumption.

Chapter 9 reviews developments of the Bayesian paradigm first presented in chapter 8,
looking primarily at where better use of associated MRI data has been made than
that seen in the previous chapter. This typically involves defining firstly more de-
tailed assumptions regarding the expected activity distributions, and secondly the
relation these have to the MRI data. The resulting models are not over-complicated,
but when applied as priors within the Bayesian framework are nonetheless capable
(under certain conditions) of providing better resolution reconstructions than the
approaches previously seen. And this is an important juncture in the thesis: such
techniques, it is argued, are more suitable to PET correction per se than those
initially given in chapter 2. The activity model is more sophisticated (constrained
by energy fields), and more information may then be used from the MRI data in
terms of the expected magnitude of compartmentalised activity levels. One may
think of this as the formation of a forward activity model, but one which again has
the shortcoming of an assumption of homogeneity. Each pixel, depending on its
tissue classification, is driven toward the homogeneous distribution. More flexibility
is therefore necessary, and this is what is achieved in chapter 10.

Chapter 10’s purpose is to consider the possible worth of the correction method
developed in chapter 3 by defining an appropriate algorithm to exploit its potential as
a Bayesian prior. This, it is seen, is based on one approach highlighted in chapter 9,
with the variation being to extend the flexibility of the prior to allow an inhomogeneous
estimate for the tracer distribution. To do this, it is necessary to use an independent
energy field for each pixel, where information regarding the pixel’s tissue composition,
weighted in accordance to the observed data itself, is used to determine the standard
deviation of the Gaussian field. This then allows for sensible heuristics, which,
for example, allow GM to show the greatest variation. This is incorporated to
address a disadvantage common to most MRI-based PET reconstruction techniques;
namely the potential masking of pertinent within boundary activity. And hence
the variation shown in the application of the Gaussian field model is used to steer
the reconstruction solution toward the prior at the pixel resolution, instead of the
conventional approach applied at the resolution of the compartments. Furthermore,
a simple entropy measure taken on the MRI data can indicate to the reconstruction
procedure how homogeneous the tissue composition of the current region is, and how
much, therefore, the influence of the prior should be relaxed. Results are given to
demonstrate the worth of these methods, and also to indicate a better robustness to
both the segmentation and registration errors that one is likely to incur.
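Schematically - the actual hyperparameter choices are those developed in the chapter itself, not the ones suggested here - such a prior takes the per-pixel form

    p(\lambda) \propto \prod_j \exp\!\left(-\frac{(\lambda_j - \mu_j)^2}{2\sigma_j^2}\right),

with a prior mean \mu_j for each pixel (for example, taken from the model of chapter 3) and a standard deviation \sigma_j that is widened where the tissue composition is mixed or the prior is otherwise least trustworthy, so that the pull toward \mu_j is relaxed exactly there.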

Chapter 11 introduces multiresolution techniques to the reconstruction procedures. Resolution
is normally a function of three parameters (detector size, image variance and count rates
[Hoffman et al. 1982]), and so resolution issues can only be addressed in a variant
manner once the statistical approach to reconstruction has been adopted. It was envisaged that
the traditional multiresolution methods (those based on a uniform size pixel grid)
could help to address the rather unattractive convergence properties of the likelihood-
based methods of reconstruction and the issues of computational overheads3, and also
to maximise the potential resolution. This is researched, but despite the use of quite
elaborate methods, as well as the very recent citing in the literature of the worth of
such an approach, no significant gains were to be had. As such, the investigation
turned toward an anisotropic resolution representation of the emission data, argued
as being sensible on the basis of the data’s variant behaviour across the scanner’s
field of view (FOV). The nature of the acquisition process itself ensures this, and
the quality of the statistics in the image varies accordingly. The interpolation method
of chapter 5 can actually adapt in accordance to these statistics, allowing increasing
focus to be given in regions where the statistics would encourage this, and increased
regularisation elsewhere. Of equal interest is the adaption of the interpolation kernels
on the basis of the MRI data. This, it is shown, again allows us to focus on desired
regions of interest (ROI). The result is reduced noise and increased contrast in the
images for both real and simulated tests in comparison to the methods not using
this a priori information, and reduced least-squared errors for the simulated studies
using mathematical phantoms.

3. From the practical point of view, the optimisation process itself may be aided by the use of multires-
olution approaches, especially if the function to be optimised is non-convex.
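The following sketch isolates the adaptive-kernel idea: a crude local signal-to-noise estimate is mapped to a per-pixel kernel width, narrow where the statistics are good and wide where they are poor. The window size, the FWHM range and the squashing function are illustrative assumptions, not the choices made in chapter 11.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fwhm_map(image, fwhm_min=3.0, fwhm_max=8.0, window=7, sensitivity=1.0):
    """Map a crude local SNR estimate to a per-pixel interpolation-kernel FWHM."""
    mean = uniform_filter(image, size=window)
    sq_mean = uniform_filter(image ** 2, size=window)
    std = np.sqrt(np.maximum(sq_mean - mean ** 2, 1e-12))
    snr = mean / (std + 1e-12)
    score = snr / (snr + sensitivity)     # squash into (0, 1); higher SNR -> higher score
    return fwhm_max - score * (fwhm_max - fwhm_min)

# Example: a Poisson-noisy image with a high-count region embedded in a low-count one.
rng = np.random.default_rng(1)
counts = np.full((64, 64), 5.0)
counts[20:44, 20:44] = 60.0
noisy = rng.poisson(counts).astype(float)
widths = fwhm_map(noisy)
print("kernel FWHM range:", widths.min().round(2), "to", widths.max().round(2))
```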

Chapter 12 documents the so-called Multilevel Approaches to reconstruction [Herman
et al. 1984]; that is, those that operate on either a subset of the sinogram data
[Hudson and Larkin 1994], or on a sub-sampled sinogram set. This latter section of
the report devotes discussion to the various practicalities of implementing a clinically
acceptable algorithm - where a reduction in the data dimensionality is especially
important. On the basis of its excellent efficiency improvements, the subset approach
has paved the way for the general clinical acceptance of the statistical methods
of reconstruction, and hence its inclusion in this dissertation. Unfortunately, the
inclusion of prior information in the reduced subset framework is not straightforward.
The more rigorous Bayesian techniques (those of chapters 9 and 10, for example) are
unsuitable to these methods because of the information mis-match between the prior
and that which may be extracted from only a subset of the sinogram data. The use
of prior information in the method that closed chapter 11, however, is applicable,
and introduces no significant additional computational overheads to the existing
methods. On the other hand, the addressal of the information mis-match problem
seen in the full-Bayesian methods has prompted additional research into a sinogram
sub-sampling technique, which does not lessen the counts in the present data, only
the coarseness of their presentation. This can be shown to improve convergence
properties and the accuracy of the solution for at least one pseudo-Bayesian approach
(section 8.3.1), and is applicable to the full Bayesian approaches, although it is seen
to offer little or no improvement in terms of efficiency; all the reconstruction work
tends to be done at the final level of detail.

Chapter 13 closes the thesis by summarising the methods that have been introduced
in this work, what has been achieved and how this came about. It also discusses
problems relating to imposing too many constraints on the expected solution, and
the consequences that this might have. Finally, it discusses what has and what has
not been learned, and what conclusions may therefore be drawn.
2 Compartment Model Based Methods of PET Correction

This chapter covers the existing literature for pixel-by-pixel PET correction. This image-
space processing of PET images attempts to improve the data using both knowledge
regarding the subject and knowledge of the acquisition process. The correction methods cov-
ered can be considered to approximate a rather unconventional deconvolution, where the
solution is primarily based on the expected activity values given by a compartment model.

2.1 Introduction - A Notion of Correction


Given a reasonable segmentation of the MRI data into the three main constituent com-
ponents of the brain (WM, GM, and CSF), one is able to impose reasonable constraints
on the intensity ranges for activity within each of these compartments. This ensures,
for example, that the observed activity in CSF should vary between well defined bands;
so too should activity in WM (where the intensity bands would be of some magnitude
greater than those for CSF and also characterised by a higher mean); and also for GM,
where the range and mean would be the highest. On a scale of 0 to 255, say, we could
have CSF measurements constrained to between 0 and 10, WM to between 10 and 80,
and GM between 70 and 255. This compartment model approach forms the basis of the
pixel-by-pixel correction methods that re-evaluate the PET data in accordance to these
assumptions.
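To make the notion of a compartment model concrete, the following minimal sketch (written in Python purely for illustration; the band limits are the example values quoted above, and the names are hypothetical) shows how such intensity constraints might be imposed on a reconstructed PET slice given an MRI-derived tissue label map.

    import numpy as np

    # Illustrative compartment model: expected activity bands on a 0-255 scale,
    # taken from the example ranges quoted above (the WM and GM bands overlap).
    COMPARTMENT_BANDS = {
        "CSF": (0, 10),
        "WM": (10, 80),
        "GM": (70, 255),
    }

    def clip_to_band(pet_slice, label_slice, bands=COMPARTMENT_BANDS):
        """Constrain observed PET values to the band of the tissue label at each pixel.

        pet_slice   : 2-D array of reconstructed PET values (0-255 scale assumed).
        label_slice : 2-D array of labels ("CSF", "WM", "GM") from the MRI segmentation.
        """
        corrected = pet_slice.astype(float)
        for tissue, (lo, hi) in bands.items():
            mask = (label_slice == tissue)
            corrected[mask] = np.clip(corrected[mask], lo, hi)
        return corrected

The correction methods discussed below do considerably more than clip values, but the data structure captures the essential assumption: one fixed activity band per tissue class.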
The PET redistribution step uses the compartmentalised regions to localise the activity
distributions. This is done on the basis of boundaries of active brain regions and the
estimated point spread function (PSF) of the scanner. The basic principle is that the
measured activity levels that correspond to GM regions should be artificially “boosted” to
compensate for the PVE’s masking of its true value. This enhancement method attempts
to improve the disappointing results of analytical methods of deconvolution that, because
of the unfavourable noise conditions in emission images, are of little practical use (see,
for example, [Boulfelfel et al. 1992, Budinger et al. 1979, King et al. 1981, King et al. 1983,
King et al. 1984, Links et al. 1990, Todd-Pokropek 1983]).
The compartment model is made up of assumed activity distributions for WM, GM
and the CSF regions, for which the distributions are considered to be homogeneous. As
will be explained in the following, the inaccuracy of such assumptions limits the success of
the algorithms that have been developed, and the crudeness of the correction method is
likely to produce inaccurate results.

2.2 The Iterative Approaches


Of the PET correction methods that use associated higher resolution MRI data, three
iterative alternatives present themselves. The first is that of Knorr and co-workers in
Dusseldorf [Knorr et al. 1993]. The second is Friston’s self-acclaimed “virtual modality”
developed at the Functional Imaging Labs in London [Friston et al. 1995]. And the third,
having been developed in a collaboration associated to Friston’s group, is related to this
second method [Labbé et al. 1997].
[Knorr et al. 1993]’s approach to PET correction involves an iterative redistribution of
pixel values based on a deconvolution constrained by associated anatomical information.
This allows for an estimation of how much activity registered in a PET pixel originates
from a single MRI pixel on the basis of assumptions regarding the ratio of activity levels
in GM, WM and CSF. It would seem, however, that in the first instance this actually
over-complicates the issue by using the MRI data in a solely linear deconvolution process.
That is, if the PSF is considered to be stationary, then the MRI data has nothing to do
with the resolution correction process. In the second instance, the tendency is to rather
over-simplify matters, making fixed assumptions of relative magnitudes of activity levels
and their homogeneity.
At the Functional Imaging Laboratories, the Statistical Parametric Mapping (SPM)
approach is routinely used to assess the significance of regionally specific effects in func-
tional images. For accurate comparison, the image data (the relevant example being MRI
and PET) must first be brought into registration, then normalised, before finally being
smoothed [spm ]. These processes constitute the spatial and intensity transformations
requisite to the SPM procedures [Friston et al. 1995]. To derive the transformations,
a general linear model is formulated to [basically] yield an equation which relates what
is observed to what is expected. In deriving what is expected, it is actually possible to
relate the PET data to the MRI data at the resolution of the MRI data; i.e., without the
smoothing. In doing this, a high resolution PET image may be derived, which cannot be
taken too seriously alone, although it does offer good activity estimations that
may appropriately constrain the PET reconstruction process (see chapter 10). As such, a
possible adaptation of this method becomes a realistic option, and the further discussion
of this intensity transformation is therefore postponed until it is developed in chapter 3.
The third method was implemented by workers in collaboration with these laboratories,
and the approach is consequently similar [Labbé et al. 1997]. On the basis of probabilistic
segmentations of different tissue classes as well as different ROIs, a PET distribution is
simulated to fit the measured and reconstructed PET data. This involves finding differ-
ent coefficient values for each compartment such that the sum of all the compartments
multiplied by these coefficients resembles the PET data. This naturally involves using an
estimate of the scanner PSF, which is how the correction is possible: solve with the PSF
present, and then construct the compartments anew this time in the absence of the PSF
convolution.
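The coefficient-fitting step just described can be sketched as a linear least-squares problem. The fragment below (Python/NumPy, with illustrative names; it is not the authors' implementation, and a Gaussian of assumed FWHM stands in for the scanner PSF) fits one coefficient per compartment against the measured data and then rebuilds the compartments without the PSF.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def fit_compartment_coefficients(compartments, pet, fwhm_pixels):
        """Least-squares fit of one activity coefficient per compartment.

        compartments : list of 2-D probability maps (same grid as pet).
        pet          : measured, reconstructed PET image.
        fwhm_pixels  : assumed scanner PSF FWHM, in pixels.
        Returns (coefficients, corrected_image); the corrected image is the sum of
        the compartments weighted by the fitted coefficients, built without the PSF.
        """
        sigma = fwhm_pixels / 2.355                       # FWHM -> Gaussian sigma
        # Each column of the design matrix is a PSF-blurred compartment map.
        A = np.column_stack([gaussian_filter(c, sigma).ravel() for c in compartments])
        coeffs, *_ = np.linalg.lstsq(A, pet.ravel(), rcond=None)
        corrected = sum(k * c for k, c in zip(coeffs, compartments))
        return coeffs, corrected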
In each case, correction methods profit only in accordance with the accuracy of the models
used, which in turn are only as good as the relevance of the assumptions made, the basis
of which is homogeneity. As we shall see in the following, this is the limiting aspect of all
the pixel correction methods. The models are quite trivial, and their possible extensions
massively restricted. Alternative approaches should really involve research into deriving
the true nature of the activity distributions such that a better understanding of the image

signal may be achieved. This could, for example, mean developing simulation procedures
to generate simulated PET brain images from MRI data (along the lines of the McGill
University group in Montreal [Rousset et al. 1993b]), or by using Monte-Carlo simulation
methods to derive a density map from MRI data to be incorporated as a phantom in the
appropriate software. It is mere speculation here to presume that these would be fruitful
research directions, but it is certainly the case that the inadequacy of the chosen model
will impair the results in all but the most superficial of experiments, and work based on
developing the simulation models would, therefore, be effort well spent.

2.3 Pixel-by-Pixel Correction - the Non-Iterative Methods


The redistribution of PET data can only be done using prior knowledge to allow the
derivation of a higher resolution image. Such information is available, although it is never
too clear what assumptions can be made that relate activity to anatomy in a fashion
stringent enough to facilitate accurate and robust PET correction. This is of course
relevant in considering the both the original methods of “pixel-by-pixel correction”, as well
as their later localisation to brain and non-brain regions. This involves an extension to
3-D sets, increased inter-tissue localisation, and finally an attempt to allow inhomogeneity
in at least one of the compartments.
All of the pixel-by-pixel approaches are non-iterative, and as such, they all implement
rather elementary methods of PVE correction. The original methods involved
using X-Ray Computed Tomography (CT) images to correct PET studies for the effects
of atrophy [Herscovitch et al. 1986, Chawluk et al. 1987]. Due to its improved delin-
eation of tissue structure and high contrast between brain and CSF, the later application
of MRI imaging permitted a greater degree of accuracy than that achieved using the CT
data. Fully utilised, the MRI data allows for a more accurate segmentation [Wehrli 1988],
which in turn enables this greater accuracy in quantitative measures of, for example, brain
atrophy [Condon et al. 1986, Videen et al. 1988].
For each image volume, the methods of [Herscovitch et al. 1986, Chawluk et al. 1987]
begin by producing two 2-D images formed from a summation of the entire 3-D volume.
The PET data is then divided by the anatomical image on a pixel-by-pixel basis to yield
an image of activity expressed in terms of the CT imaged volume, and the correction
procedure is finished. That is, the correction is not localised in any sense of the term1 : it
does not use information pertaining to any higher segmentation of the anatomical
image, giving only a measure of activity per CT pixel, and the approach subsequently offers
no notion of regional measurements. Measurements desired from such localised regions
remain therefore significantly influenced by activity originating in neighbouring tissues
because of the PVE. For example, effects such as “over-spill” and “under-spill” occur to
mask the final measurement [Links et al. 1990], and this so-called non-correspondence
of regional information is exaggerated as a result of the low resolution of the PET data.
This requires the use of better delineated anatomical information to account for variations
in the typical levels of activity in tissued regions (i.e., across GM and WM), a perhaps
obvious extension to the methods of [Herscovitch et al. 1986, Chawluk et al. 1987], which
was only later considered in [Müller-Gärtner et al. 1992].
1
It is quite a bone of contention to define localised (see the paper). Also, for Videen's work the hi-resolution data is used just as reference. These methods could, therefore, be called: “Referencing PET data in Accordance to Higher Resolution Structural Images”, or some such thing.

2.4 Anatomical Localisation of Functional Images


[Videen et al. 1988] applied the same 2-D pixel division method, but was original in the use
of a segmented MRI image to localise the associated PET data. The segmentation however,
distinguished only brain and non-brain components. This meant that the corrected data
set was given only in units of tissue volume, thus ignoring any distinction between the
tissues themselves, and full use of the higher resolution anatomical image is not therefore
achieved.
The correction method itself is based on an algebraic operation, which first requires
that the image’s resolution is matched. To do this, the MRI data is convolved with a
Gaussian estimate of the PET scanner’s PSF, at which point we actually lose the higher
resolution information. The correction is then performed by dividing the PET image by
this convolved MRI data, the purpose of which is to recover activity toward the periphery
of the brain; the motivation being the masking of the observed data by inactivity
about these regions. Subsequently, the method results in an effective amplification of the
observed data wherever the filtering has left tissue values of less than unity, but greater
than zero.
Clearly this method of correction is very crude. Its necessary extension by [Meltzer
et al. 1990] was firstly to the 3-D case, and then later to account for “heterogeneous”
GM activity distributions [Meltzer et al. 1996]. Additionally, [Müller-Gärtner et al. 1992]
of the same collaboration, bridged these developments by describing how the GM tracer
concentration alone could be estimated. As these extensions remain based on the same
general correction by division principle, we detail briefly how this is applied ([Videen et
al. 1988]’s approach is a template, in effect, for the aforementioned later improvements),
and then discuss the extension to the 3-D case.

2.5 [Videen et al. 1988]’s Method


Firstly the MRI data is thresholded to create a binary brain image, where brain pixels
are set as on, and non-brain pixels are set as off. This image is then convolved with the
2-D PSF characteristic of the PET scanner used to create a corrected tissue image. Under
the assumption that where there is function, there is brain, the original PET image may
then be divided by the corrected tissue image to yield an image in which pixels represent
activity per actual tissue volume; rather than simply per spatial volume. That is, we have
a representative ratio of activity to brain structure, which in this case is the corrected
PET image. Hence, wherever a region that affects a reconstructed PET pixel constitutes
brain tissue alone, the pixel value is high (1), and the reconstruction is not therefore
attenuated in any way by the division. Elsewhere the value is less than 1, hence any
attenuation reflects activity’s distribution about anatomy, and the division operator will
increase the observed signal. It remains then only to divide each pixel in the original PET
image by its corresponding pixel in the corrected tissue image, and a division by this value
will amplify, or recover, the signal. The resulting corrected PET image is now a better
estimate of tissue activity expressed in units of brain tissue volume, but being the result
of pixel-by-pixel correction at the resolution of the smoothed MRI data only, the problem

of spatial resolution is not addressed.
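A minimal sketch of this division-based correction, in the spirit of [Videen et al. 1988], is given below (Python/NumPy, illustrative names; the floor guard against near-zero divisors far outside the brain is an added safeguard, not part of the original description).

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def videen_style_correction(pet, mri, brain_threshold, fwhm_pixels, floor=0.1):
        """Divide the PET image by a PSF-smoothed binary brain mask.

        brain_threshold : MRI intensity above which a pixel is treated as brain.
        floor           : smoothed-mask values below this are left uncorrected, a
                          safeguard against amplifying background noise.
        """
        brain = (mri > brain_threshold).astype(float)          # binary brain image
        tissue = gaussian_filter(brain, fwhm_pixels / 2.355)   # "corrected tissue image"
        corrected = pet.astype(float)
        ok = tissue > floor
        corrected[ok] = pet[ok] / tissue[ok]                   # amplify where 0 < tissue < 1
        return corrected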

2.6 An Extension to 3-D


[Videen et al. 1988]’s paper outlined how an extension to the 3-D case could be achieved.
The main points were the following. Firstly, accurate atrophy correction of each slice of
the PET data requires that potential sources of activity should also be considered from
outside of the current plane. This means that enough contiguous MRI sections must
encompass the [3-D] area about the slice to be corrected, thus accounting for all aspects of
influence2 . Secondly, MRI inhomogeneity must be considered in aligning the images (this
is pretty run-of-the-mill with respect to the image registration problem). And lastly, the
PSF of the PET scanner must be carefully accounted for in each dimension during the
convolution process.
In adherence to this, [Meltzer et al. 1990] used nine contiguous MRI slices to perform
the correction of a single PET slice3 . In this paper validation is well supported with
studies of two dementia patients. Phantom-based studies showed aspects of recovery of
up to 79% of the estimated theoretical loss. Additionally, the effect of small errors in
the registration of the image sets was investigated, leaving the following factors as cited
candidates for causing a correction of less than 100%:

• Partial pixel misregistration,

• Nonuniformity of the in-plane and axial PET PSF,

• Distortion due to the attenuation correction,

• Random coincidences, and

• Scatter coincidences.

An additional category that might be added to the above list could be the use of two
spatially invariant 2-D convolution kernels to emulate the 3-D PSF.
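On this last point, a 3-D PSF with differing in-plane and axial widths is easily emulated with an anisotropic Gaussian, as in the following sketch (Python/SciPy, illustrative names; the FWHM-to-sigma conversion factor 2*sqrt(2 ln 2) ≈ 2.355 is standard).

    from scipy.ndimage import gaussian_filter

    FWHM_TO_SIGMA = 1.0 / 2.355   # sigma = FWHM / (2 * sqrt(2 * ln 2))

    def smooth_with_3d_psf(volume, fwhm_inplane_mm, fwhm_axial_mm, voxel_mm):
        """Apply an anisotropic Gaussian approximation of a 3-D PET PSF.

        volume   : 3-D array ordered (z, y, x).
        voxel_mm : (dz, dy, dx) voxel dimensions in mm.
        The in-plane and axial FWHMs may differ; both are converted to sigmas
        in voxel units before filtering.
        """
        dz, dy, dx = voxel_mm
        sigma = (fwhm_axial_mm * FWHM_TO_SIGMA / dz,
                 fwhm_inplane_mm * FWHM_TO_SIGMA / dy,
                 fwhm_inplane_mm * FWHM_TO_SIGMA / dx)
        return gaussian_filter(volume, sigma)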

2.7 The Need for Increased Localisation


The pixel-by-pixel methods of correction discussed thus far fail to distinguish activity
distribution in terms of differing tissue types. Their rather ad hoc application of the
division procedure may substantially amplify the recorded level of activity in regions that
have little or no brain tissue. Furthermore, the effective units in which the resulting
corrected image is represented (per brain tissue volume) are in no way sufficient for most
quantitative studies of pathology. [Videen et al. 1988, Meltzer et al. 1990] do not,
therefore, help in respect of the desire to resolve radioactivity distributions in regions
affected by the PVE, and the subsequent associated over-spill and under-spill effects. These
objectives could only be achieved by firstly improving the segmentation, and secondly by
incorporating a term representing a simulation of the PET data at the resolution of the
MRI images.
2
Here, one must then consider how much greater the MRI volume must be with respect to the PET image. This should relate to the potential distribution of the most active region, which is typically given by the full-width half maximum (FWHM) of the gamma camera.
3
On this basis, we therefore require that the PET volume be surrounded by approximately four slices of MRI data. And this is not the restrictive limitation that might first be assumed. Typically, the physical ratio of MRI slices to PET slices might be in the region of 1 to 5, and as such, we can assume that only the end slices of each PET volume cannot be fully corrected.
GM is of greatest pathological importance due to it having the highest levels of blood
flow, receptor density, and glucose metabolism [Kameyama et al. 1979, Sokoloff et al. 1977,
Young et al. 1986]. Additionally, degenerative diseases, such as Alzheimer’s, are thought
to originate in GM regions [Mintun and Lee 1990]. Therefore, an important extension to
the previous work is to consider the effects of partial volume averaging between GM and
WM, which relates more directly to the structural changes likely to occur in diseases such
as Parkinson’s, Multiple Sclerosis (MS), Alzheimer’s or Schizophrenia. For example, in
the diagnosis of MS we would wish to monitor variation in WM, where its proliferation is
typically masked by activity in the surrounding GM [George et al. 1988]. And given that
CSF spaces probably do contribute weakly to the PET image [Blasberg et al. 1989], then
an accurate segmentation of the three component parts of brain tissue (GM, WM, and
CSF) is desirable to allow better efforts of PET correction. Given such a segmentation,
further advances in PET imaging depend solely on improvements in the resolution of
the PET scanner, and on a better modelling of its acquisition effects (and hence, better
reconstructions). To allow this, the PVEs due to varying GM content must be accounted
for, an issue addressed in section 2.8.
The work of [Müller-Gärtner et al. 1992] first addressed PVEs (specifically in GM)
by describing a method of detailing tracer concentration per unit volume of GM. And
because GM regions show the highest levels of activity, the pixel-division method for the
recovery of the observed activity levels within this compartment is relatively sensible. The
method is based on an accurate segmentation of the MRI image into its three constituent
parts (GM, WM and CSF) to derive - by way of various assumptions made on the relative
activity ratios - an expression for the GM concentration. This expression is then used
to determine how various intermediate image representations of the two data sets (MRI
and PET) are used [arithmetically] to derive an image of activity per unit volume of
GM. The assumptions are based around an assumed known distribution of the radioactive
concentration within the CSF (in fact, no concentration is expected here) and WM, and
also that the activity across all of the regions is homogeneous. The corrected PET image
(pc (~x)), is basically formed using the following:

p_c(\vec{x}) = p_p(\vec{x}) \times \frac{p_m(\vec{x})}{p_s(\vec{x})}, \qquad (2.1)

where pm (~x) is the measured and reconstructed PET image; pp (~x) is the “pure” PET
image, which is formed from the MRI segmentations; and ps (~x) is the simulated PET
data, which is basically the pure image convolved with a Gaussian like smoothing filter as
an approximation of the PET scanner’s PSF.
The results are good, although they are only validated on computer simulations. As
such, it is a feasibility study:

neocortical anatomy was only approximately replicated .... More realistic
agarose brain phantoms will need to be constructed to evaluate thoroughly
the method in various cases of cortical anatomy and subcortical structures.

These extensions were left to Frost’s group [Frost et al. 1995, Meltzer et al. 1996], and
are covered in the next section. Otherwise, the technique is sound and the method of cor-
rection is said to be accurate, although the assumptions made on the typical distributions
may not be so valid. Firstly, they assume that no concentration is expected in regions of
CSF, and (depending on the tracer used) this may not be the case. And secondly, the
“known” distribution of activity in WM is assumed homogeneous. In studies of MS, an
assumed known homogeneous distribution of activity about WM regions is inappropriate.
Perhaps an additional compartment is necessary (e.g., MS lesion), as the homogeneity
assumption must hold for these methods to derive the correction formula. Also, even
though GM activity is not considered [explicitly] to be uniform, the condition that allows
the formulation of an expression for GM activity only holds for cases of homogeneity.

2.8 Accounting for “Inhomogeneity” within the Segmented Tissue Types
Adding a further partition to include an additional region of homogeneous distribution is ex-
actly how [Frost et al. 1995] extends the work of [Müller-Gärtner et al. 1992]. Primarily
through the use of more appropriate validations, [Frost et al. 1995] is able to cite the
tendency of [Müller-Gärtner et al. 1992]’s method to “incompletely correct GM struc-
tures when local tissue concentrations are highly heterogeneous”. To this end, the three
compartment algorithm was extended to include a fourth, where the additional compart-
ment is applied within the GM regions, and hence the new notion of heterogeneity of GM
radioactivity. The details [and further evolution] of the method were later described in
[Meltzer et al. 1996] (of the same group), and are briefly covered in the following.
The work of [Labbé et al. 1997] is on exactly the same lines, although this is an iterative
approach. They separately derive distributions for an arbitrary number of compartments,
with a single coefficient value being assigned to each, representative of its activity level.
The approach due to Meltzer is said to improve partial volume correction of hetero-
geneous GM activity when structures of interest can be delineated by MRI [Meltzer et
al. 1996]. The method presented begins in accordance to the method of [Müller-Gärt-
ner et al. 1992]: an expression is derived for the GM radioactive concentration, which
is in terms of spatial distributions (from the MRI data), the observed PET image, and
known homogeneous concentrations in WM and CSF. To account for a spatially hetero-
geneous distribution of GM activity, the three-compartmental algorithm is extended to
include a fourth compartment. This fourth compartment is assumed to be homogeneous;
the overall GM region is therefore only heterogeneous in the sense that it consists of two
homogeneous distributions! Following an initial correction of the PET data - using the
original three-compartment approach - the GM regions are further subdivided to yield an
expression for the activity in this fourth compartment, the GM Volume of Interest (VOI)4 .
The expression requires, therefore, that the segmentation process is able to recover these
within-tissue regions5 . Hence the correction method is a two-step procedure. The initial
step determines “global” GM activity, which is then used as the basis for the second step
of determining GM-VOI activity.
4
On close inspection of the algorithm, it is clear that the VOI need not be part of the GM distribution. It is assumed as such because it is in this region that the likelihood of variation in activity is greater. Therefore, we might wish to apply a similar approach to capture heterogeneity in WM due to MS lesions, or whatever.
5
No real discussion on how the VOI is segmented is given. Described was that “additional subcortical VOI's were defined by the boundaries of the caudate and thalamus in the MR[I] data set.” And that, “delineation of the structure of interest by MR[I] is a requisite parameter for the algorithm,” which thus implies the need for significant manual intervention where the pre-processing uses a histogram fitting method of segmentation [Müller-Gärtner et al. 1992].
That two homogeneous distributions constitute a single heterogeneous distribution is
unlikely, and the base assumption of the method is, therefore, too crude. Only in situations
where the level of overall GM activity equals the activity in the VOI is the second correction
step valid. Otherwise it is not, and the residual error will propagate through to the final
derivation. (Indeed their suggestion of iterating the process to enable the inclusion of any
number of additional volumes of interest is by their own admission limited due to this
likely [cumulative] error propagation.)
The results of phantom tests are good, although they do come across as
rather manufactured toward the theoretical model. Remaining unresolved is the
assumption on the activity distributions, and the obvious reliance on segmentation and
registration accuracy implied by the basic formulation of these methods (equation 2.1).

2.9 The Later Developments


More recently, [Yang et al. 1996] presented a correction method that worked in accor-
dance with the approach of [Müller-Gärtner et al. 1992], redistributing the PET data at the
resolution of associated MRI images. Again, the method works under the assumptions of
homogeneous within-tissue distributions, known activity ratios across tissue types other
than GM, and a known and spatially invariant PSF of the PET scanner. This more recent
work is again based on equation 2.1 above. The first step of the correction method is the
formation of the correction map using the simulated PET data and the “pure” (expected)
activity levels derived according to the classification of the MRI data. The correction map
is then applied at the MRI pixel resolution (PIXEL) and at the region of interest (ROI)
resolution, which is at the resolution of tissue regions. These were their own methods,
but they also emulated the method of [Müller-Gärtner et al. 1992] (MG) for comparative
purposes.
The data used was manufactured from MRI scans, segmented into four components:
GM, WM, CSF, and - perhaps surprisingly - muscle (MSC). Sinograms were generated on
the basis of GM:WM:CSF:MSC activity ratios of the order 4:1:0:0. Poisson noise was then
added and the images were then reconstructed using the scanner’s own software (i.e., using
FBP). Noiseless simulated PET images were generated directly from the MRI data using
the aforementioned activity levels and a convolution whose kernel approximated the PSF
of the PET tomograph. Finally, real PET fluorine-18 fluorodeoxyglucose (FDG) images
were acquired. The MRI data was resliced to the dimensions of the PET data, and all the
image sets were brought into common alignment.
The results show the ROI method to be the most robust and accurate, and the MG
method to be over-sensitive to misestimates of WM activity. It would seem that their own
two methods of correction (PIXEL based, and ROI based), differed only in the resolution to
which they were applied (their ROIs were simply collections of pixels whose intensities were
averaged). One advantage, however, is that no assumptions are made on the magnitude
of the GM tracer distribution in [Yang et al. 1996], aside from it being homogeneous, of course.

2.10 Concluding the Discussion on the Compartment Based Correction Methods
It is clear that a homogeneous distribution of activity should not be assumed in any
attempts at modelling (and thus, correcting) the PET imaging process. For practical pur-
poses, such assumptions are adopted in different forms by all the methods seen above as
well as by [Knorr et al. 1993] for PET correction and deconvolution, and [Ma et al. 1993]
for PET simulation (a process applied to [Rousset et al. 1993b, Rousset et al. 1993a,
Rousset et al. 1995]). Specific details of actual distributions are scarce, but according to
[Friston et al. 1995], it is normally GM that shows the greatest amount of variance; CSF is
likely to show some, although this is negligible; and WM should show some aspects of vari-
ability in cases of diseases such as MS. Additionally, Friston cites lesions, focal activations
and field heterogeneities as typical factors that would violate the homogeneity assumption.
One good physiological argument is provided in the form of the caudate, whose uptake is
likely to be asymmetric, yet its classification (on the basis of MRI data) would probably
be solely to GM. And Parkinson's disease typically results in a very gradual change in
activity levels across the putamen. Justification is also to be found in the discussion of
[Ma et al. 1993], where the opinion is given that sharp activity changes occurring at
structural boundaries (i.e., those implied by the homogeneity assumption) are unlikely,
and that it is more probable that the distributions can be characterised by
some gradient in tracer level occurring within and possibly across structures. By allow-
ing an arbitrary number of compartments, [Labbé et al. 1997] offer one solution to this
problem in producing a correction method involving many homogeneous ROIs - the more
the better in fact, although this becomes an unrealistic proposition given the required
segmentations (the onus is on the user to make the correction). Furthermore, the varia-
tion exhibited in the resulting, corrected tracer distribution for whatever compartment,
is determined entirely by the segmentation of the associated MRI image. The PET data
is only used to derive a single coefficient term for each compartment, and local variation
results only from the use of probabilistic segmentation maps (i.e., those giving the degree
of affinity of each pixel to a particular class).
It is interesting to see how activity distributions are constrained, not so much because of
their assumed accuracy, but due to the difficulty of deriving more accurate yet sufficiently
practical known patterns of distribution. We have:

• [Knorr et al. 1993]’s GM:WM:CSF ratio of 4:1:0.

• [Ma et al. 1993]’s GM:WM:CSF ratio of 4:1:0. Additionally, [Kosugi et al. 1995]
found this ratio to give the smallest error difference when used as input to a multi-
layer neural network6 (see also [Ingvar et al. 1965]).

• [Rousset et al. 1993b]’s GM:WM:CSF ratio of 7:4:07 .


6
Here, a non-linear interpolating device.
7
The main values assigned by [Rousset et al. 1993b] are actually closer to those of his co-workers [Ma
et al. 1993] than the above ratios imply. In his work, [Rousset et al. 1993b] uses the additional GM
[sub-]compartments of the caudate nuclei and the putamen. These regions, although small, are assigned
high activity levels, which thus detract from the values assigned to the remainder of the GM region.

• [Yang et al. 1996]’s GM:WM:CSF:MS ratio of 4:1:0:0, where MS represents muscle.

• [Kiebel et al. 1997]’s GM:WM:CSF ratio of 10:3:1.

• [Sastry and Carson 1997]’s GM:WM:CSF:background ratio of 4:1:0.005:0 based on


the rat experiments of [Sokoloff et al. 1977].

The variation in these must make us really question the basis on which the figures were
originally chosen. That is, were these values determined in respect of previous activation
studies, or empirically in an effort to derive “nice” or feasible results? Certainly - and as
is evident in figure 2.1 (a) - by generating the test data in a manner absolutely consistent
with the model assumptions, we do of course get good results. In figure 2.1 (b) and
(c), however, the assumptions lose validity, and hence the quality of the results rapidly
deteriorates.
Finally, and as we shall see in detail, we should note here that it is one of the goals
of model-based reconstruction to provide images free of the PVE. This effort alone,
perhaps, permits the more accurate quantification of distributions in small structures,
the ultimate aim of this work. It has been shown in the literature ([Qi et al. 1997,
Sastry and Carson 1997]), that with the incorporation of an accurate projection model
(which must include resolution effects), this can at least be theoretically achieved. But
ahead of forsaking the pixel-based correction approach entirely, we first introduce our
own extension to the methods whose contribution is to abandon the previously required
assumption of homogeneity in the within-compartment activities.

[Figure 2.1: rows (a)-(d) of images; columns, left to right: Corrected PET, Measured PET, Simulated PET, Pure PET, GM Segmentation; row (d) additionally shows the WM and GM segmentations.]

Figure 2.1: This figure shows example PET correction algorithms. The first three rows are based
on equation 2.1 and demonstrate the sort of results acquired by the authors mentioned throughout
this chapter. From left to right the images in each column are: the corrected PET images (pc ); the
measured PET images (pm ); the simulated PET images (ps ); the “pure” PET images (pp ); and
the GM segmentations (WM and CSF segmentations are not shown). The ratio of the activity
distributions used for GM:WM:CSF:other were 10:3:1:0 for the top (a) and bottom rows (b), and
1:3:10:0 for the middle row (c). In regions where equation 2.1 would involve a divide by zero, the
original PET image value was taken. From the simulation of the PET distribution alone (images
third from left), we are able to see the inaccuracy of the result, although for the 1st row of images
(a), the results are sensible. This is because the data set comes from a simulation based on exactly
these activity ratios, and the segmentations are perfect. As soon as the activity assignments are
incorrectly assigned, then the result is much worse. The third row (c) shows the scheme for real
PET-MRI data. Important to note is that the simulated high resolution PET images (second from
right column) are derived according to very base assumptions on the activity distributions, and
without consulting the PET data itself. Indeed, as is apparent in the above, the effectiveness
of these correction procedures is limited by the effectiveness of the model, and if the model (the
simulated PET column in the above) does not even use the PET data, then the correction can be
nothing but crude. The second algorithm, shown in the bottom row (d), is that due to [Labbé et
al. 1997], which does use the PET data in deriving the simulation. The example above uses only
the aforementioned tissue compartments (i.e., no ROIs), and is shown for real PET and MRI data.
By using only a minimal number of compartments, the effectiveness of the procedure is hindered
in the sense that the correction follows all too closely the influence of that which is dominant, GM,
and the observed data becomes all but irrelevant.
3 Developing a “Virtual Modality” PET Image

The work presented in this chapter has a single aim: to develop a pixel-based PVE correc-
tion algorithm without assuming a within-compartment homogeneous distribution. Scep-
tical of the methods of the previous chapter, this work was instead developed with the
Bayesian methods of image reconstruction in mind. The initial idea was to develop a
realistic forward model that could be applied as a prior. We return to this issue in chap-
ter 10, but concentrate for now solely on the development of the correction algorithm. It
is based on an adaptation of the once used spatial and intensity normalisation approaches
of [Friston et al. 1995], who were also responsible for the “virtual modality” name.

3.1 Intensity Normalisation - A Superresolution PET Image


In the following, the intensity normalisation as used and developed by [Friston et al. 1995]
and co-workers is described. Their SPM methods require spatial and intensity
normalisation such that differently acquired images can be compared on a like-for-like
footing. For example, to compare MRI and PET images, they must first be brought
into co-registration. To enable this, the images themselves are said to be approximately
equivalent if the intensities are transformed in one image such that it resembles the other
and it is brought into spatial alignment. On this basis, a registration solution can be
derived by solving such an equivalence expressed as a system of linear equations, where
the expansion designed for this application uses a Taylor’s Series as the linearising device.
The reason for investigating this intensity transformation is that it can be used to make
the MRI image resemble the PET image at the resolution of the MRI image. The following
relates the original theory of [Friston et al. 1995].
If we assume that the images are already aligned, then the assumption is that the
originally constructed PET image (\lambda_j^o) can be described by an intensity transformation of
the [segmented] MRI image (m_j^s) and a convolution to relate the differences in resolution.
That is,

\lambda_j^o \approx h_j \ast \gamma_j\{m_j^s\}, \qquad (3.1)

where ∗ denotes convolution, hj is the convolution kernel that reflects the difference in
resolution between the PET and MRI data, and γj is the intensity transformation that we
wish to derive. A further assumption is that the PET signal stems predominantly from the

GM of the brain, and by delineating this in the image data, our intensity transformation
should apply with greatest influence in these regions. Thus, we denote the intensity
transformation in terms of a GM segmentation:

\gamma_j\{m_j\} = u_{0,j}\, s(m_j, v_j), \qquad (3.2)

where u_{0,j} is a spatially variant scaling coefficient, s_j() denotes our segmentation of
the MRI data (m_j), and v_j are deviations (errors) from our estimated GM mean intensity,
denoted v̄. Along with the coefficients, these deviations are our unknowns, which are
contained in the segmentation equation:

\exp\left( \frac{-\{m_j - (\bar{v} + v_j)\}^2}{2\sigma^2} \right), \qquad (3.3)

where σ is the standard deviation of the segmentation kernel (0.3 is the value used
in SPM). The estimate of a mean GM intensity is set at 52% of the maximum grey-scale
value in the MRI data. Given our two unknowns, the above equation 3.2 is not in a form
that allows a least-squares solution, hence a first order approximation of the Taylor Series
w.r.t. our [small] deviations in the GM intensity is used to expand the equation into a
solvable form. That is,
 
\gamma_j\{m_j\} \approx u_{0,j}\left[ s(m_j, 0) + v_j\, \frac{\partial s(m_j, 0)}{\partial v_j} \right]. \qquad (3.4)

Letting u_{0,j} v_j = u_{1,j}, and defining \frac{\partial s(m_j, v_j)}{\partial v_j} to be \frac{m_j - (\bar{v} + v_j)}{\sigma^2} \exp\left( \frac{-\{m_j - (\bar{v} + v_j)\}^2}{2\sigma^2} \right), our
expansion becomes:

\gamma_j\{m_j\} \approx u_{0,j} \exp\left( \frac{-\{m_j - \bar{v}\}^2}{2\sigma^2} \right) + u_{1,j} \left( \frac{m_j - \bar{v}}{\sigma^2} \right) \exp\left( \frac{-\{m_j - \bar{v}\}^2}{2\sigma^2} \right). \qquad (3.5)

Our non-stationary (but smoothly varying) coefficients u0,j and u1,j are expanded in
terms of B basis functions (β):

u_{i,j} = \sum_{b=0}^{B-1} u_{i;b}\, \beta^f_{b,j}, \qquad i \in \{0, 1\}, \qquad (3.6)

where f indexes a possible variation in scale, which we shall not use. In this case, our
unknowns reduce to the ui;b coefficients, of which there are only 2 × B. To estimate the
coefficients requires finding the solution to an equation of the form λo = A · u, where λo
is the original PET image. This linear system is the following,
 
\lambda^o \approx \left[\, h \cdot \mathrm{diag}\{s(m,0)\} \cdot \beta^f \quad h \cdot \mathrm{diag}\!\left\{\frac{\partial s}{\partial v}\right\} \cdot \beta^f \,\right]\left[\, u_0 \quad u_1 \,\right]. \qquad (3.7)

Given we have J pixels in our image data and ignoring for now the convolution term,
the matrix A that results from the above equation 3.7 is:
A = \begin{pmatrix}
\beta_{0,0}\, s(m_0, 0) & \cdots & \beta_{B-1,0}\, s(m_0, 0) & \beta_{0,0}\, \frac{\partial s(m_0, 0)}{\partial v_0} & \cdots & \beta_{B-1,0}\, \frac{\partial s(m_0, 0)}{\partial v_0} \\
\beta_{0,1}\, s(m_1, 0) & \cdots & \beta_{B-1,1}\, s(m_1, 0) & \beta_{0,1}\, \frac{\partial s(m_1, 0)}{\partial v_1} & \cdots & \beta_{B-1,1}\, \frac{\partial s(m_1, 0)}{\partial v_1} \\
\vdots & & \vdots & \vdots & & \vdots \\
\beta_{0,J-1}\, s(m_{J-1}, 0) & \cdots & \beta_{B-1,J-1}\, s(m_{J-1}, 0) & \beta_{0,J-1}\, \frac{\partial s(m_{J-1}, 0)}{\partial v_{J-1}} & \cdots & \beta_{B-1,J-1}\, \frac{\partial s(m_{J-1}, 0)}{\partial v_{J-1}}
\end{pmatrix} \qquad (3.8)
And the second matrix term in equation 3.7 is,

u = (u_{0;0}, u_{0;1}, \ldots, u_{0;B-1}, u_{1;0}, \ldots, u_{1;B-1})^T. \qquad (3.9)

Regarding the dimensionality of the data, let us say for J = 128^2, an (empirically
derived) suitable choice for B - the number of basis functions - would be around 16. Our
linear system contains J equations and only 2×B unknowns, thus the set is overdetermined
(J > 2 × B), and a solution is sought which comes closest to satisfying all equations
simultaneously [Press et al. 1988]. As we will see in section 6, we can define closeness in
its least-squares sense, and the problem reduces in most cases to a solvable linear problem.
This we do, and the least-squares solution of A · u = λo is determined by finding the vector
û which minimises the residual vector r = λo − A · û. This is done using the NAG fortran
routine f04jaf, which minimises the sum of squares of the residuals,

S = (\lambda^o - A \cdot \hat{u})^T (\lambda^o - A \cdot \hat{u}) = \| \lambda^o - A \cdot \hat{u} \|_2^2. \qquad (3.10)

The solution is found in two steps. Firstly, Householder transformations are used to
reduce A to a simpler form1 . And secondly, the least-squares solution is obtained with
back substitution. Iterative refinement of the solution is used until further corrections are
negligible, defined by a tolerance parameter supplied as an argument to the routine.
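For illustration, the same overdetermined solve can be expressed with a generic least-squares routine; the sketch below (Python/NumPy, illustrative names) uses NumPy's lstsq purely as a stand-in for the NAG routine f04jaf, minimising the same sum of squared residuals as equation 3.10.

    import numpy as np

    def solve_intensity_transform(A, pet_image):
        """Least-squares solution of A.u = lambda^o (equations 3.7-3.10).

        A         : J x 2B design matrix (columns: basis functions times s(m,0)
                    and times ds/dv, each convolved with the PSF).
        pet_image : the original PET image, flattened to length J.
        np.linalg.lstsq stands in here for the NAG routine f04jaf.
        """
        u_hat, residual, rank, _ = np.linalg.lstsq(A, pet_image.ravel(), rcond=None)
        return u_hat, residual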

3.2 The Use of the Basis Functions


In essence, the appropriate choice of basis functions sets a limit on the ability of the
least-squares model to accurately model the distribution in the data. This is done under
the original assumption that variations in these intensities are smooth. Hence the basis
functions are smooth, and orthogonal2. Originally [Friston et al. 1995], the functions cor-
responded to Fourier modes at spatial frequencies in steps of π/2, and their implementation
set consisted of thirteen of these. But, for algorithmic purposes, the lowest frequency
basis functions of the discrete cosine transform (DCT) are now used.
1
This involves reducing the matrix to its block diagonal form via a sequence of such transformations. This removes matrix elements below the diagonal.
2
Note, however, that when returning to the context of image priors, then this effectively meets the local variation assumption used by [Sastry and Carson 1997, Lipinski et al. 1997]. See chapter 9.
[Jain 1989] cites efficiency of implementation and effectiveness of operation as the two
major reasons for choosing these functions. The DCT packs a large fraction of the average
energy into relatively few components of the transform coefficients, and consequently has
excellent energy compaction for highly correlated data. Additionally, the cosine transform
of a vector of N elements can be calculated in O(N \ln N) operations via an N-point Fast
Fourier Transform (FFT) [Narasimha and Peterson 1978]. The N-by-N cosine transfer
function matrix, C = \{c(k, n)\}, is defined as,

c(k, n) = \begin{cases} \frac{1}{\sqrt{N}}, & k = 0,\; 0 \le n \le N-1 \\ \sqrt{\frac{2}{N}} \cos\!\left( \frac{\pi (2n+1) k}{2N} \right), & 1 \le k \le N-1,\; 0 \le n \le N-1 \end{cases} \qquad (3.11)


Figure 3.1: This figure shows 16 2-D Discrete Cosine Transform basis functions. These result from
the algorithm given in this section, where B = 4. Note in the above that horizontal frequencies
increase from left to right, vertical frequencies from top to bottom.

In this application, it is, of course, necessary to use 2-D basis functions. These are
shown in figure 3.1, and the algorithm for computing them is detailed in section A.1 of
appendix A.
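For illustration only (the appendix A.1 algorithm is not reproduced here), the separable construction of the 2-D basis images from the matrix of equation 3.11 can be sketched as follows (Python/NumPy; the function names are illustrative).

    import numpy as np

    def dct_matrix(N):
        """The N-by-N cosine transform matrix C = {c(k, n)} of equation 3.11."""
        n = np.arange(N)
        C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
        C[0, :] = 1.0 / np.sqrt(N)
        return C

    def dct_basis_images(image_size, order):
        """The order*order lowest-frequency separable 2-D DCT basis images
        (cf. figure 3.1, where order = 4 gives 16 basis functions)."""
        C = dct_matrix(image_size)
        # Outer products of the lowest-frequency rows give separable 2-D bases.
        return [np.outer(C[ky], C[kx]) for ky in range(order) for kx in range(order)]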
Obviously, the more basis functions that can be applied, the better the fit that can be
attained, but the higher the computational overheads. This is an important issue and will
therefore be returned to.

3.3 Deriving the New PET Data


Once a solution for the vector û is found, we use these values to derive a “superresolution”
PET image in the following manner. Recall that our intensity transformation hinges on
the facts that, firstly, intensities in the PET data stem mostly from GM regions in the brain,
and secondly that we relate these activity levels via a convolution that accounts for the
differences in resolution. This means that in removing the convolution term from the
matrix A, we are able to use the unknowns (equation 3.9) that were derived and produce
a higher resolution PET image, λnew , valid in accordance to these original assumptions:

\lambda_j^{\mathrm{new}} = \gamma_j\{m_j\}. \qquad (3.12)

The intensity transformed MRI image is said to represent a least-squares solution for
the underlying flow distribution seen in the PET image [Friston et al. 1995]. That is, the
transformed MRI image emulates a fully “restored” PET image, where the restoration is
predominantly a deconvolution, constrained by the delineation of GM distribution in the
MRI data.
In assigning PET values to structural objects, the transformation can be considered
to “constitute a functional categorisation of anatomical structures” of the sort implied by
a segmentation. Hence we are only emulating a “restored” PET image; and hence also
that the restoration, in embodying anatomical information, is only as good as the validity
of the assumption concerning activity distributions in the tissue types known to be in the
MRI data.

3.3.1 Problems and Deficiences of the Method


Firstly, our assumptions are weak. Although activity in PET data does originate predom-
inantly in GM, we must not ignore contributions from WM and even CSF. Secondly, and
most importantly, our segmentation based solely on the criterion given by a single inten-
sity value in equation 3.3 yields bad results. Using this method, we get the segmentation
result shown in figure 3.2 from a T1 weighted MRI image.
Obviously, this is an unsatisfactory result, as we only really derive noticeable signals
where intensities are the same as the GM estimate. This is highlighted in figure 3.3, where
the x-axis shows intensity values and the y-axis the segmentation value, where our GM estimate
is 130. That is, the segmentation function s(mj , 130) shows all too sharp a peak where
our estimate for pure GM is 130, tailing off either side of the estimate within a single
intensity step. Subsequently, our segmentation function results in isolated peaks in the
image where the pixel value coincides almost exactly with the GM estimate. Furthermore,
the situation is little improved by fixing the denominator in equation 3.3 to a large value.
In application, we do indeed end up with the poor results shown in figure 3.4, as the
variance allowed in this estimator (the vj terms of equation 3.2) accounted for little of its
inherent crudeness.
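The narrowness is easily checked numerically; the following snippet (purely illustrative) evaluates equation 3.3 (with v_j = 0) for σ = 0.3 and a GM estimate of 130, as in figure 3.3.

    import numpy as np

    def segmentation_kernel(m, gm_mean=130.0, sigma=0.3):
        """The segmentation function of equation 3.3 (with v_j = 0)."""
        return np.exp(-((m - gm_mean) ** 2) / (2.0 * sigma ** 2))

    # With sigma = 0.3 the kernel collapses within a single grey level:
    for intensity in (129.0, 130.0, 131.0):
        print(intensity, segmentation_kernel(intensity))
    # The value at 130 is 1.0, while both neighbouring grey levels fall below 0.004.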

3.3.2 A Different Segmentation


On reflection, it is quite surprising that the results presented in [Friston et al. 1995] are not
similar to those given here. Obviously a different segmentation is necessary, as confirmed by
these preliminary results. Altering the MRI data indicates that the results can improve, but this

Figure 3.2: On the left we see the segmentation of a T1-weighted MRI image, which itself is shown
on the right. This segmentation results from the use of the segmentation expression given
in equation 3.3. In this expression, σ is set, as recommended in [Friston et al. 1995], to
0.3. The effective segmentation kernel (as shown in figure 3.3), is obviously too narrow,
resulting in the highly selective segmentation result shown. Things improve as σ increases,
but in general, the segmentation remains a poor one.

[Figure 3.3: plot of the segmentation kernel (y-axis, 0 to 1) against MRI intensity (x-axis, 100 to 160), centred on the mean intensity value for GM.]

Figure 3.3: The segmentation function (equation 3.3) used to extract GM tissue about an esti-
mated mean. As the function shows, if a pixel’s intensity is anything more than a single intensity
step from the estimate, then it is not considered to be GM.

involves significant preprocessing, consisting of anisotropic filtering and user interaction


to determine an appropriate value for the GM intensity mean. A pre-segmented image
would certainly be more accurate, and this is exactly what is done in the following such
that all the MRI pixels marginally associated to GM are assigned a fixed intensity which
is then used as the intensity estimates for GM in the intensity normalisation procedure.
Consequently, the segmentation shown in the result of figure 3.5 is more meaningful.
Apart from this alteration of the segmentation, the implementation of Friston’s method
is a faithful one. In the paper [Friston et al. 1995], it is said that the derivatives are
calculated “by hand”. Here they are instead explicitly derived, but this should not have
any real significance given the nature of their segmentation term. In later documents (for
example, [Kiebel et al. 1997]) in which their methods of image normalisation are described
(spatial and intensity), the method has been dropped, indicating, perhaps, that in this
formulation, it just didn’t work3 .
3
Incidentally, the later approach for aligning different modality image sets is a clever one. Firstly, the PET image is registered to a PET-like image which is in a known registration to a model MRI image. The MRI data that is to be registered is aligned to the model MRI image, and hence the registration is now possible.

Figure 3.4: From left to right, the figures shown above are: the superresolution PET image derived
from the intensity transformed MRI data (without the convolution); the actual PET image;
the MRI intensity transformed image (with the convolution); the MRI GM segmentation;
and the original T1 MRI image.
Figure 3.5: Having changed the segmentation function. From left to right, the figures shown above
are: the superresolution PET image derived from the intensity transformed MRI data
(without the convolution); the actual PET image; the MRI intensity transformed image
derived from the intensity transformed MRI data (with the convolution); the MRI GM
segmentation; and the pre-segmented T1 MRI image.

3.4 A Different Expansion


A first, sensible extension to the method given above would be to include the tomographic
forward projection into the model, such that our tracer distribution be estimated on the
basis of the sinogram data. The intention for now, however, is to derive an image-space
correction method, so we shall postpone this issue until the discussion at the end of the
chapter. A complete reformulation of the intensity normalisation process is
anyway required, and this should be done as a three-compartment model, incorporating
known estimates of WM, and CSF activity levels in addition to the existing GM infor-
mation (the implicit fourth compartment being background, for which zero activity is
expected). The use of the Taylor’s Series to expand the terms into a solvable form, and
the modelling of intensity variation with basis functions are sensible, but only this latter
aspect of the expansion need be retained. This is what is done in the following,
where I use segmented images as input to the expansion procedure. The input images are
the result of a fuzzy segmentation algorithm [Gustafson and Kessel 1979], which gives us
effective probability maps of affinities to GM, WM and CSF, given as mg , mw and mc ,
respectively. The expansion follows the general form of equation 3.1, but we now say that,

\lambda_j^o \approx h_j \ast \left[\, g\, \gamma_j^g\{m_j^g\} + w\, \gamma_j^w\{m_j^w\} + c\, \gamma_j^c\{m_j^c\} \,\right], \qquad (3.13)

where each intensity transformation function is again made up of basis functions and
non-stationary coefficients (equation 3.14), and g, w, and c are normalised ratio contribu-
tions. For example, if we choose 10 : 3 : 1 for GM:WM:CSF (the model of [Kiebel et al.
1997]), then g = g/(g + w + c), and so on4. The basis functions are derived from:

\gamma_j^\eta\{m_j^\eta\} = m_j^\eta \sum_{b=0}^{B-1} u_{\eta;b}\, \beta_j^b, \qquad (3.14)

for η ∈ {g, w, c}. We again must determine the solution to an equation of the form:

λ = A · u, (3.15)

which is again overdetermined (J > 3 × B). Ignoring for now the convolution term,
the matrix A in this instance is of the form:

A = \begin{pmatrix}
\beta_{1,0}\, m_0^g & \cdots & \beta_{B,0}\, m_0^g & \beta_{1,0}\, m_0^w & \cdots & \beta_{B,0}\, m_0^w & \beta_{1,0}\, m_0^c & \cdots & \beta_{B,0}\, m_0^c \\
\beta_{1,1}\, m_1^g & \cdots & \beta_{B,1}\, m_1^g & \beta_{1,1}\, m_1^w & \cdots & \beta_{B,1}\, m_1^w & \beta_{1,1}\, m_1^c & \cdots & \beta_{B,1}\, m_1^c \\
\vdots & & \vdots & \vdots & & \vdots & \vdots & & \vdots \\
\beta_{1,J-1}\, m_{J-1}^g & \cdots & \beta_{B,J-1}\, m_{J-1}^g & \beta_{1,J-1}\, m_{J-1}^w & \cdots & \beta_{B,J-1}\, m_{J-1}^w & \beta_{1,J-1}\, m_{J-1}^c & \cdots & \beta_{B,J-1}\, m_{J-1}^c
\end{pmatrix} \qquad (3.16)

And the second 1 × 3B matrix that holds the coefficients of equation 3.17 is,

u = (u_{g;0}, \ldots, u_{g;B-1}, u_{w;0}, \ldots, u_{w;B-1}, u_{c;0}, \ldots, u_{c;B-1})^T. \qquad (3.17)
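Putting equations 3.13-3.17 together, the whole expansion can be sketched as below (Python/NumPy/SciPy with illustrative names; a Gaussian of assumed FWHM stands in for the PSF, and the ratio weights g, w and c are left to be absorbed into the coefficients, as noted in footnote 4). The fitted coefficients are then re-applied without the convolution, as in equation 3.12.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def virtual_modality_pet(pet, m_g, m_w, m_c, basis_images, fwhm_pixels):
        """Three-compartment intensity transformation (equations 3.13-3.17).

        pet          : measured PET image (J pixels).
        m_g, m_w, m_c: fuzzy GM/WM/CSF probability maps on the same grid.
        basis_images : list of B 2-D DCT basis images (see section 3.2).
        Returns the superresolution image gamma{m} of equation 3.12, i.e. the fit
        re-evaluated without the convolution term.
        """
        sigma = fwhm_pixels / 2.355
        maps = (m_g, m_w, m_c)
        # Columns without the PSF (used afterwards for the high-resolution image) ...
        cols_hi = [b * m for m in maps for b in basis_images]
        # ... and the same columns convolved with the PSF (used for the fit).
        A = np.column_stack([gaussian_filter(col, sigma).ravel() for col in cols_hi])
        u_hat, *_ = np.linalg.lstsq(A, pet.ravel(), rcond=None)
        lam_new = sum(u * col for u, col in zip(u_hat, cols_hi))
        return lam_new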

Results of the Intensity Transformation


The results of this expansion are shown in figure 3.6, and it is reasonable to say that
they are better. Indeed, when the PET image was simulated from the MRI data with
convolutions and Poisson additive noise, then the results are very nearly perfect (fig-
ure 3.7). Obviously this is practically rigged (our test data set matches the assumptions
of the algorithm exactly), but bear in mind that the benefits of accurate registration and
segmentations are invaluable [Cocosco et al. 1997, Lipinski et al. 1997], an advantage
figure 3.6 does not have.
4
Experimentation showed that the setting of these values would at best only aid the convergence of the
optimisation process, as they are absorbed into the coefficients of the basis functions. Hence the algorithm
is robust to errors in estimates of the activity ratios.

Important to note is that as we increase the number of basis functions to model the
transformation with better accuracy, we obviously get much better results. Unfortu-
nately, there is a strict limit on how many basis functions we can apply. This occurs
because the NAG routine used requires that the image matrix of equation 3.16 be de-
clared as double systemMatrix[2*totNumBasFuncs][xRes*yRes], which poses obvious
problems.
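The scale of the problem is easy to see with the figures used in this chapter (128-by-128 images and up to 64 basis functions per coefficient set); the following back-of-the-envelope calculation is purely illustrative.

    # Dense design matrix held by the least-squares routine:
    # J rows (image pixels) by 2B (or 3B for the three-compartment model) columns of doubles.
    x_res = y_res = 128
    B = 64
    bytes_per_double = 8
    size_bytes = x_res * y_res * 2 * B * bytes_per_double
    print(size_bytes / 2**20, "MiB")   # 16 MiB; grows linearly with B and with the image area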

[Figure 3.6: the intensity transformation with B = 64 basis functions (top) and B = 16 basis functions (bottom), on 64x64 pixel images. Panel columns: the new prior image; the original filtered back-projected PET data; the MRI intensity transformed image; the MRI segmentations (WM, GM); and the basis function granularity.]

Figure 3.6: From left to right, the figures (both top and bottom rows) shown above are: the
superresolution PET image; the actual PET image; the MRI intensity transformed image; the
MRI WM segmentation; the MRI GM segmentation; and the highest frequency (horizontal and
vertical) basis function image used. Note that the top image returned σ = 4.186, and the image
below (using fewer basis functions) returned σ = 10.78 from the NAG routine.

Figure 3.7: From left to right, the figures shown above are: the superresolution PET image; the
actual PET image; the MRI intensity transformed image; the MRI WM segmentation; and the MRI
GM segmentation. To note here is that given an accurate estimate of the activity distribution’s
delineation, and also that these distributions are homogeneous, exact recovery is possible. This is
also true of the pixel-based correction methods of chapter 2.

3.5 Related Discussions

3.5.1 The Intensity Transformation as a Forward Model

The intensity transformation can in many respects be thought of as a rather crude forward
model of the tracer uptake given some reasonable anatomical knowledge and an estimate
of the PET scanner’s PSF. On the basis of this forward model, it is possible to re-engineer
the solution having determined its free parameters.
It was along such lines that Kosugi and co-workers derived an intensity mapping using an artificial neural network [Kosugi et al. 1995]; finding the weights of the network's middle layer is very much akin to deriving the coefficients of equation 3.14, which are used to balance equation 3.13. And it is on exactly this basis that they derive a PET correction method: having derived a relation between MRI intensities and PET activity levels, they constrain subsequent correction techniques to adhere to this mapping. This may be acceptable for the study the network was trained on (such a multi-layer network will in any case approach the least-squares solution [Ripley 1996]), but it is irrelevant for any other: no two patients and no two acquisitions are the same, so the mapping that is valid for one study is not valid for another. Of course, the method presented in this thesis is very dependent on the accuracy of the tissue delineation, and the intensity transformation must be repeated for each study. But in comparison to the overheads associated with training a neural network to achieve this mapping, even setting aside the assumption in [Kosugi et al. 1995] of validity across studies, the basis function approach reported here is clearly more feasible.
In hypothesising how to relate [partially observed] activity to its underlying structure,
the conclusions are that one should not over-simplify matters via assumptions on homo-
geneity, and that it would also not be appropriate to over-characterise the activity-tissue
relationship as this would, in its limit, lead to the PET data becoming an irrelevance. The
problem is actually quite a subtle one, and one which will turn up time and time again
when trying to include priors in such inverse problems.

3.5.2 Regarding the Basis Functions

Choice of the Basis Functions

As figure 3.8 shows, the more basis functions that can be applied, the better the fit that can be attained. That is, with enough degrees of freedom one can fit any data, even data that is not worth fitting. But, as has been mentioned, this increases the computational overhead quite considerably. Empirically I have found that at least 6 basis functions are needed before a transformed 128-by-128 pixel MRI image visually approximates the PET image. Because this figure is so low, the selection of which basis functions are used has a fundamental effect on the resulting transformation. The choice of the cosine functions is certainly sensible, but it is also arbitrary. It would be very useful to have some physiological justification for the choice of these functions, such that one could capture global activity through the summation of appropriately modelled localised activations [Fox et al. 1985, Worsley et al. 1992].

[Figure 3.8 panels: top, 64 basis functions; bottom, 16 basis functions. Rows show the original PET images, the prior derived from uniform MRI images (the intensity transformed uniform images), and the granularity of the basis functions.]

Figure 3.8: This figure shows how the basis functions alone are able to model the PET distri-
bution. Here, 64 basis functions are used (corresponding to a dimension of 8) in the algorithm of
equation 3.13 for the top set of 64-by-64 pixel images, and 32 are used for the 32-by-32 images
below. In both cases, the MRI data input to the algorithm is a uniform image; i.e., uninformative.

The Notion of a Neighbourhood



In using fewer basis functions than image pixels (i.e., B < J), an explicit notion of a neighbourhood is imposed upon the modelling process. To be certain that any image data can be accurately modelled by basis functions, the basis functions must display a granularity that matches the resolution of the image data (requiring, for example, that the Nyquist limit be met). This is not, however, possible in this application, where the additional constraint of computational cost must be considered. As such, a neighbourhood property comes about because the prior that results has only one degree of freedom (its scaling coefficient) with respect to its contribution to the estimate. Therefore, the basis image of highest granularity is the most influential in resolving detail, but this granularity can only be effective over a severely restricted region in the image - its neighbourhood. In the transformation, then, the MRI image will dominate if there are not sufficient basis functions to allow sufficient variation. In figure 3.9, for example, the transformation uses only four basis images (B = 2), and the edges of the segmentation

estimates subsequently dominate. The result is that the transformation is too inflexible; it constrains the segmentation estimates too much, allowing them no considerable variation in their efforts to resemble the PET distribution with which they are associated. Indeed, the results are akin to those of the previous chapter's correction methods. In such situations, the MRI segmentation may actually remove detail. Consider the images in figure 3.10. These show regions in the image data where the MRI data exhibits a homogeneous tissue distribution. The PET data, however, shows some variation. If the neighbourhood of the basis functions is too large, the Nyquist rate is not met, and there is no way in which the variation shown in the PET data can be reproduced in the correction. It is necessary, therefore, to use basis functions that can capture PET activity at its finest possible localisation; otherwise, depending on the conditions, this too loosely constrained transformation is capable of removing edge information that was present in either of the original sets of input data.
If computational overheads were not a consideration, then this would not be a problem.
Then again, it would not be a problem to find a very good transformation of any image to
the PET data. This of course would make no sense, and the transformation would have
been given too many degrees of freedom; and even noise might be reproduced in the result.
As such, the basis functions should meet the need to find localised regions of activity, but
they should not go beyond this. In this respect, they could then indeed be thought of as
having a physiological interpretation [Fox et al. 1985].
[Figure 3.9 panels: the normalisation process using just 4 basis functions (dimension of the functions being 2); the images are 128x128 pixels, and the PSF was 4 pixels FWHM; the basis function granularity is also shown.]

Figure 3.9: The above intensity transformation was performed using only four basis functions.
The result is that the segmented MRI images must dominate because the transformation simply
does not have enough degrees of freedom to prevent this. With only 4 degrees of freedom (resulting
from the 4 scaling coefficients), each covers a rather large neighbourhood, and is unable to do
anything but scale all these higher frequency values by an equal amount. Their contrast, therefore,
remains. Interesting in the above is how similar the superresolution PET image is to that derived
using the PET correction method of [Müller-Gärtner et al. 1996] shown in figure 2.1. This, of
course, does make sense given that the segmentations used were the same, as were the initial
estimates of activity levels. Important to note is that the intensity transformation approach is
highly robust to errors in this assignment of activity levels, whereas the pixel-based correction methods are not.

3.5.3 Suitability as a Prior


It was never envisaged that the intensity transformed MRI segmented image would be
presented as a virtual modality image capable of replacing the reconstructed PET data.

[Figure 3.10 panels: MRI data; PET data; the necessary granularity of the basis functions.]

Figure 3.10: The image on the left shows a region in the MRI data of homogeneous tissue. The
corresponding PET data shows variation in activity, which is not considered to be noise. The
resulting correction could therefore mask this activity, and remove the detail. Such a situation
could occur, for example, in the case of a hypermetabolic tumor, which might otherwise not be
distinguishable from the normal GM tissue when located within such a homogeneous distribution
[Lipinski et al. 1997]. The necessary granularity of the basis functions required operates, therefore,
at twice the highest frequency of the observable PET data.

This superresolution PET image could in no way be acceptable as the image presented to the clinician or medical doctor. For one thing, its biological and pathological relevance may be all too tenuous, which would require a re-think of the original assumptions on the distributions. The application that was in mind when work began on its derivation was to provide support to the reconstruction process, either as a Bayesian prior in a Maximum A Posteriori (MAP) reconstruction, or as stochastic constraints allowing the inference of "missing data" in the EM-ML reconstruction. These two themes are returned to later in this report, and the remainder of this section develops the open questions that arose from an analysis of the method.

Uniqueness of the Solution - The Need for Linearisation?


One open question that remains of this approach is how unique the solution for the superresolution image is: is it the only distribution that explains the PET data? It is not clear from the original Friston paper whether the Taylor series expansion is done to enable a unique solution; the solution is the "best" only in the least-squares sense. Is such a linearisation a necessary extension to the method presented here? One must be careful, however, in posing this question. The transformation as described above has a potentially infinite number of basis functions that can be applied to model the data. Should a single set of parameters allow a solution, therefore, an infinite number of them will [Scales and Smith 1999].

Localising Confidence in the Global Fit


The methods demonstrated above derive global fits and return a value indicating the
quality of the fit. This means that errors (due, for example, to misregistrations or to a
poor segmentation) may cause the fit to be meaningless in certain regions. In minimising

the global least-squares difference, acceptance of wildly inaccurate fits in one region may
be the result of achieving a good fit elsewhere. In its envisaged application as a prior,
there is a need therefore, for its validation to be performed within localised regions. This
is covered quite effectively in section 10.4.1.

3.5.4 Further Constraints


The current solution is quite unconstrained in terms of the original estimates of activity
levels (from the PET data). For example, the solution can also yield negative values.
The first necessary adjustment to the least-squares estimator is therefore the imposition
of these positivity constraints. But to have any real physiological justification this would
also require using basis functions that themselves are non-negative, and as we shall see, this
presents the notion of using spherical basis functions instead of square indicator functions
(the traditional pixels) as basis functions in the reconstruction procedures (this is clarified
in section 5.2.3 and appendix C). The DCT’s ability to represent arbitrary signals in
such a compact manner is to a large extent possible because it can go negative. To seek
new basis functions that did not go negative would almost certainly mean that one basis
function per pixel would be necessary. Hence the correction scheme derived in the above
could only claim a true physiological justification were we to use a basis function centered
at each pixel, which argues in favour of iterative reconstruction. Indeed, if, as previously
mentioned, we include the forward projection in the model, then we have a fully-fledged
reconstruction algorithm whose equivalent would be an appropriately constrained Bayesian
procedure (chapter 9) using the interpolation method of (section 5.2.2).
Briefly, to spare the details for the appropriate sections, the basis function deblurring
would be accommodated by including “blobs” as the pixel basis functions, yet not using
a final “over-imposition step” (blob to voxel step [Lewitt 1992]); and the constraining of
the reconstructed image to that of the expected distribution (taken from the segmented
MRI image) can be achieved using a Gaussian field (introduced in chapter 9).
This latter alteration implements a penalisation term for estimates of significant vari-
ation in respect of the compartment-model like activity estimates. Hence the solution to
producing this forward model has become complicated, and it is quite possible that the
best approach will involve a Maximum Likelihood (ML) based estimation procedure. That
is, we really should instead work in the realm of the reconstruction algorithms alone.

3.6 Thesis Outlook


The least-squares fit of the intensity transformation gives, depending on the individual’s
agreement with the initial assumptions, a sensible estimate of the tracer distribution at
the resolution of the MRI data. The result is a high resolution, low noise prior, whose
application within the framework of a Bayesian reconstruction scheme may allow for an
appropriately corrected, well regularised, reconstructed PET image (this is to be tested in
chapter 10). But it is the above discussion which indicates the direction of the remaining
work. The current inclusion of negative coefficient terms in the intensity transformation’s
solution is the main weakness of the method. This invalidates the realism of the physical
model, if, that is, it seeks to use the basis functions to indicate supposed brain activations.
Considered simply as a [2-D] signal that should be modelled, then the DCT does its job in
a very efficient manner. But in the pursuit to model the physical system, these (negative

coefficients) are undesirable inclusions in the resulting high resolution PET image, neces-
sitated by the need to fit a global solution for the transformation. To correct for this, and
include the other desirable properties that were discussed above, the procedure must be
re-written to:
• Impose energy conservation on the solution (i.e., Σ_j pet_j = Σ_j new_j, where pet_j and new_j are the original PET image and the resulting high resolution PET image, respectively). This would again be necessary to achieve physical realism, and is a constraint that is not enforced in any of the correction methods of chapter 2. (A sketch imposing these constraints follows this list.)

• Enforce positivity constraints.

• Impose constraints to the estimated activity ratios of GM:WM:CSF.
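As a rough illustration of the first two points, the following sketch (Python/SciPy, hypothetical names) swaps the unconstrained least-squares solve for a non-negative one and rescales the result so that the total counts are conserved. Note that a non-negative image is only guaranteed if the columns of A (the basis-function-times-tissue-map products) are themselves non-negative, as argued above, and the third constraint on the activity ratios is not shown.

    import numpy as np
    from scipy.optimize import nnls

    def constrained_fit(A, pet):
        u, _ = nnls(A, pet.ravel())                # positivity constraint on the coefficients
        new = A @ u
        new *= pet.sum() / max(new.sum(), 1e-12)   # energy conservation: sum_j new_j = sum_j pet_j
        return new.reshape(pet.shape), u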

It is possible, for example, to use spherical (or "radial", in the neural network literature) basis functions to fit arbitrary signals, and the aforementioned distribution constraints might best be imposed on such a fit using methods of ridge regression [Orr 1996]. A prototype version was implemented in Matlab for 1-D signals, but one needs almost as many basis functions as pixels to model the signal. Also of possible importance could be a return to the paper of [Mǒch et al. 1997], which attempts to describe a generic approach to the problem of modelling functional images using artificial neural networks. At the time of first reading, however, I considered their apparent confusion of the terms "generalization" and "error cost" sufficient to render their work all but useless. These two possibilities can be considered the last remaining leads for the redevelopment of this PVE correction routine in a physically realistic manner. Beyond that, the conviction is that it is necessary to reconstruct on a one-basis-function-per-pixel footing, in which case it is only sensible to reconstruct using statistical routines. Furthermore, the three issues raised above regarding the rewriting of the procedures can then be addressed:

• Use any of the algebraic reconstruction techniques (chapter 6).

• Use, for example, the EM-ML algorithm (chapter 7).

• Use a Bayesian reconstruction method where the solution is constrained to specific


ranges using Gaussian fields (chapter 10).

As such, all of the above issues are addressed in the remainder of this thesis.
4
The Arguments for Statistical Methods of Reconstruction

PET reconstruction based on the inverse Radon transform is not naturally applicable to
instances of discrete data acquired in any tomographic imaging system. It is at best (in CT,
for example) a good approximation which will worsen as the quality of the data degrades.
Such degradation is typical in the case of emission tomography. The conventional FBP method used to reconstruct such images is, therefore, based wholly on a mathematical idealisation of tomographic imaging [Fessler 1994]¹. Furthermore, it is spatially invariant and adversely affected by limited statistics, as only in the highest count regions is the number of recorded events sufficient to enable reasonable results. It would thus benefit the FBP algorithm if it were able to adapt itself in accordance with the availability and quality of the statistics in the image. However, the FBP algorithm is formulated in a manner that makes either regional adaptivity or the incorporation of statistical information difficult.
Only by adopting the statistical approach to image reconstruction is it possible to in-
corporate detailed information pertaining to the characteristics of the acquisition process.
This knowledge is capable of supplying the appropriate bounding constraints to the highly
ill-posed problem of tomographic reconstruction.
One contribution of this work is to present an algorithm which allows the process to adapt itself on a regional basis to the quality of the available statistics, thereby allowing features of interest to be resolved at a finer scale than is possible globally (see section 11.2.2). This is achieved by addressing the main criticisms of the algebraic reconstruction methods, namely their convergence properties and their compromised (i.e., uniform) resolution.
The question of resolution recurs throughout this report. It has already been introduced as the motivation for the PET correction methods of chapter 2, and although the definition of a non-uniform resolution may be of secondary importance to this work, the notion of better quantification wherever possible is not.

1. Indeed, [Older and Johns 1993] show that convolution back-projection can be considered as an approximate solution to the general least-squares formulation of the reconstruction problem, the simplest of the statistical methods.

4.1 The Radon Transform and Filtered Back-Projection for PET Reconstruction
In CT imaging, the application of the inverse Radon transform to reconstruct the images
seems appropriate [Bates et al. 1983]. This assumes, therefore, that the tomographic
acquisition process is adequately described by the Radon transform [Radon 1917]. This
transformation is introduced in the following.
The projection space coordinates of the tomographic system are defined by the number
of rotation steps (θ) and the number of radial offsets (ρ) that the scanner can accommo-
date. That is, they arise naturally on the basis of the physical characteristics of the data
gathering mechanism itself, and the measured data, the sinogram, corresponds to these
dimensions. The Radon transform is defined according to these coordinates as a line in-
tegral along the angle θ taken from the axis of a cartesian space, and at a radial distance
ρ from its centre. This is shown in figure 4.1. The Radon transform of this object is
denoted R(λ(x, y)), where λ(x, y) is the object. We can define this object in terms of the
projection space coordinates using the Radon transform:
y(ρ, θ) = R(λ(x, y)) = ∫_{−∞}^{∞} λ(ρ cos θ − u sin θ, ρ sin θ + u cos θ) du,    (4.1)

where ρ = x cos θ + y sin θ. In PET, these line integrals are assumed to mathematically
describe the LORs along which the photons travel and are detected.

Figure 4.1: This figure introduces the geometry associated to the Radon transform. The line
integral is shown as passing through the object of interest; that drawn in normal cartesian space.
The line integral is defined by two parameters: its radial offset from the centre of the coordinate
space to its tangential points (ρ), and its angle (θ).
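For illustration, equation 4.1 can be discretised directly. The following sketch (Python/NumPy, nearest-pixel sampling, square image, no attenuation or noise; it is not the implementation used in this work) computes a sinogram y(ρ, θ) from a digitised distribution λ.

    import numpy as np

    def radon(image, n_theta=192, n_rho=None):
        # discrete form of equation 4.1: sample each line (rho, theta) at unit
        # steps in u and sum the nearest-pixel values of the image
        n = image.shape[0]
        n_rho = n_rho or n
        centre = (n - 1) / 2.0
        rhos = np.arange(n_rho) - (n_rho - 1) / 2.0
        u = np.arange(-n, n)
        sino = np.zeros((n_rho, n_theta))
        for t in range(n_theta):
            th = t * np.pi / n_theta
            c, s = np.cos(th), np.sin(th)
            for r, rho in enumerate(rhos):
                x = np.rint(rho * c - u * s + centre).astype(int)
                y = np.rint(rho * s + u * c + centre).astype(int)
                ok = (x >= 0) & (x < n) & (y >= 0) & (y < n)   # samples inside the image
                sino[r, t] = image[y[ok], x[ok]].sum()
        return sino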

4.1.1 PET Reconstruction using Filtered Back-Projection


Associated to the Radon transform is the back-projection operator, B:
R^{−1} ≈ λ̂(x, y) = B(x, y) = ∫_{0}^{π} y(x cos θ + y sin θ, θ) dθ    (4.2)

The operation of equation 4.2 is shown in [Deans 1983] to be linked to the adjoint Radon transform². But B ≠ R^{−1}; it approximates R^{−1} to a good degree of accuracy if and only if:

• The Measurements are Continuous; i.e., the Lines of Response are infinitely thin.
Indeed, λ̂(x, y) is an image of λ(x, y) blurred by a PSF of 1/√(x² + y²), where the distance is from the centre of the frequency transform. The effect occurring corresponds to the
projections from all views contributing too much to central regions of the image data.
Subsequently, sampling is sparser toward the image’s periphery. This results in two char-
acteristic artifacts of the method: increased blur at higher frequencies and the so-called
star pattern effect.
We attempt to correct for these problems by filtering the data, and hence the name
(FBP). A number of filters may be applied, each attempting to recover the loss of high
frequency information. That is, the reconstruction is complete following an attempt to
recover the information lost through the inherent blurring that occurs because of the back-
projection operation. Figure 4.2 shows two filters that might be used and their behaviour
in both the spatial and frequency domains. In the presence of uniform noise, however, a disproportionate amount of that noise will be amplified with respect to the image data [Jain 1989]. Consequently, variation in these filters is typically based on the choice of apodizing window, which trades off the recovery of high frequency components against keeping noise levels within acceptable bounds.
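To make this concrete, here is a minimal, textbook-style sketch of FBP (Python/NumPy, not the implementation used in this work): each projection is ramp-filtered in the frequency domain, optionally apodised with a cosine window, and equation 4.2 is then applied by interpolating the filtered projections at ρ = x cos θ + y sin θ.

    import numpy as np

    def fbp(sino, apodise=True):
        # sino is indexed as sino[rho, theta]
        n_rho, n_theta = sino.shape
        freqs = np.fft.fftfreq(n_rho)
        ramp = np.abs(freqs)                       # ideal ramp filter
        if apodise:
            ramp = ramp * np.cos(np.pi * freqs)    # lowpass cosine apodising window
        filtered = np.real(np.fft.ifft(np.fft.fft(sino, axis=0) * ramp[:, None], axis=0))

        n = n_rho
        xs = np.arange(n) - (n - 1) / 2.0
        X, Y = np.meshgrid(xs, xs)
        rhos = np.arange(n_rho) - (n_rho - 1) / 2.0
        image = np.zeros((n, n))
        for t in range(n_theta):                   # back-projection, equation 4.2
            th = t * np.pi / n_theta
            rho = X * np.cos(th) + Y * np.sin(th)
            image += np.interp(rho, rhos, filtered[:, t])
        return image * np.pi / n_theta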
Realistically, however, the PET acquisition system cannot be described with any ac-
curacy by the Radon transform. In comparison with CT, for example, PET is sparsely
sampled and has very poor statistics. These poor statistics result primarily from the
following [Budinger et al. 1979, pages 171–172]:

• Each event's energy level is rather loosely gated in order to enable some discrimination of scatter events. The associated deadtime of the detector crystals due to the light decay in the crystals was more than 400ns at the time of writing (for NaI(Tl)), is around 300ns for detectors made of Bismuth Germanate (Bi₄Ge₃O₁₂) crystals, and is around 40ns for the crystals of the next generation's scanners (Lu₂SiO₅, LSO) [Casey et al. 1996].

• Detection of coincidences also requires verification, which is a further delay of around


10-15ns for BGO [Ollinger and Fessler 1997].

• The administered dosage must be strictly limited to that which is considered safe
for the patient. Associated limitations include the difficulty and costs involved in
producing sufficient positron emitter activity.
2. More precisely, the adjoint Radon transform is two times the back-projection operator.

[Figure 4.2 panels: the Shepp-Logan spatial and frequency responses (left) and the lowpass cosine spatial and frequency responses (right), each plotted against the ideal spatial and frequency responses; the vertical axes show amplitude.]

Figure 4.2: The above figures show responses for two filters typically used in FBP. That on
the left is the Shepp-Logan filter, and that on the right, a Cosine filter. Note that as frequency
increases, the filter response increases. That is, the purpose of the filters is to recover the blurring
effect introduced by the back-projection operator. Theoretically, this can be exactly recovered (the
ideal response is shown). For real data (i.e., that which includes noise), the application of this filter
would have the undesirable effect of amplifying mostly noise. And hence the apodizing windows
shown in the figure that suppress this.

Assumptions must be made to mathematically relate the acquired data to that of


the imaged object. To say that the tomographic process can be described by the Radon
transform is to say that each sinogram value approximates a distinct line integral through
the imaged object, and we are thus able to use deterministic inversion formulas (e.g., FBP)
to reconstruct a function from this approximate line integral. Unfortunately, in allowing
this we are effectively presuming that each projection is infinitely thin, uniform in depth,
and unattenuated. Hence the line integrals (or approximations to) which represent the
Radon transform of the tracer distribution, require processing additional to their inversion
(to yield the image of the distribution), as the transform assumes a continuous sampling of
the emission. The line integrals do not approach infinite thinness because the physical size of the detecting crystals is quite large; we actually have tubes of response instead of lines of response (LORs). Furthermore, the sinogram values only approximate the actual mean
value for the line integral when the count-rate is sufficient. For low counting statistics the
estimate is a very poor one. Exact uniformity of depth is also impossible; this would be
true only if we recorded events occurring between detector pairs that faced each other,
and as we record emissions from [near enough3 ] any detector pair, the depth will vary a
great deal. And finally, there will always be some attenuation.
With such an incomplete formulation of the acquisition process it is no surprise that the final FBP images are sub-optimal and warrant the use of image restoration methods.
However, it is possible to directly incorporate large elements of this missing detail into our
reconstruction methods. This firstly necessitates improving the model of the acquisition
process.

3. The detectors may also belong to different detector rings, the number of which is determined by the Ring Difference setting, which is the upper limit on the number of contributing neighbours.


Figure 4.3: This figure shows a typical sinogram. Its characteristic sinusoidal shape is evident,
and its axes show its variation for differing values of θ and ρ, respectively the number of angular
views and radial offsets of the imaging device.

4.2 Improving the Model of the PET Acquisition Process


The crystal scintillators of the gamma camera operate in a counting fashion. Additive
energy relates to additive components in the image plane, and we would thus generally
assume that the formation process is linear in λ(x), our PET image (i.e., the activity distri-
bution), where x denotes a position vector in cartesian space. Given that emissions operate
under counting statistics, we can further assume that they occur according to a Poisson
process having the [unknown] intensity function of the PET image to be reconstructed.
The first complication in the formation of the model arises from a correction process:
the real-time correction for randoms4 using the delayed-window approach renders the data
non-Poisson [Yavuz and Fessler 1997]. A second complication is the nonlinearity which
exists in the form of a single deadtime correction factor [Daube-Witherspoon and Carson
1991], di (where i indexes the LORs). This can, when available, be calculated from an
empirical formula supplied by the manufacturer of the scanner5 .
A further significant component of the PET acquisition process results from scatter.
The predominant effect in PET is due to Compton scatter, although the photoelectric effect and Rayleigh scatter also contribute. In combination, these processes produce scatter fractions
of 10 to 20% in 2-D, rising to between 30 and 50% in 3-D. This contribution can to some
extent be removed through the simple thresholding of energy levels (a process occurring
4. Randoms (or accidental coincidences) are events mistakenly registered as occurring due to the same annihilation event, but are in fact the result of separate annihilation events. For this to happen, the events must occur at approximately the same time, and emit gamma rays at angles such that they combine to indicate a different emission entirely.
5. Detector deadtime refers to the finite period of time required during event processing to determine the validity of these events. This usually involves position calculations and the evaluation of energy levels to discriminate true events, during which time the detector is unable to process further events, and hence the name.

during the scanner deadtime), or by explicitly modelling this contribution in the hope of
confining and minimising its end effects (see, for example, [Bergström et al. 1983]).
Combining these elements, we are able to arrive at the following model for the recorded
events6 :

y_i = γ_i [η_i^t p_i m_i + η_i^r r_i + η_i^s s_i],    (4.3)

where m_i is the number of annihilations along the LOR indexed by i, p_i is the survival probability of a photon (that is, the probability that it does not interact as it propagates along the LOR), r_i is the number of accidental coincidences, s_i is the number of scatter events, η_i^t is the probability of detection for true events, η_i^r is the probability of detection for accidental coincidences, η_i^s is the probability of detection of scattered events, and γ_i is the probability of an event not being lost due to deadtime [Ollinger and Fessler 1997].

4.2.1 Defining the Parameter Estimation Problem


The survival probability, p_i, thought of as attenuation, is the simplest of the aforementioned characteristics to model. It is in fact measured, its value being derived as the ratio of the transmission and blank scans⁷. Randoms introduce a component of nonlinearity into the model, where the singles rates required to model the dependence on the source are not directly available. Hence estimates for these are normally obtained with the delayed coincidence window [Hoffman et al. 1981], and their contribution, η_i^r r_i, may to some extent be accounted for. This, however, results in the previously alluded to consequence of a non-Poisson distribution, which might then require an alteration of the underlying assumed distribution (see section 8.2.1).
We may take a pragmatic approach to dealing with scatter (we know it is there, but there is not much that can be done about it), and model its contribution on the basis of the manufacturer's estimate referred to above. In all, we should be able to assume that both scatter and random contributions are known and estimated separately.
Assuming that r_i(λ) = γ_i η_i^r r_i and s_i(λ) = γ_i η_i^s s_i, the model can now be re-written as an accumulation of counts over time [Fessler and Ollinger 1996]:
ȳ_i(λ) = d_i ( T ∫_x h_i(x) λ(x) dx + s_i(λ) + r_i(λ) ),    (4.4)

where d_i is the deadtime correction factor, T is the scan time, x is the position vector in image space, h_i(x) is the scatter-free point-response function of the ith detector pair⁸, s_i(λ) is the mean rate of detected scattered events for the ith detector pair, r_i(λ) is the mean rate of detected random coincidences for the ith detector pair, and the integral is over the scanner's field of view. Hence, the emission density determines the measurement means (almost linearly), and the random coincidences r_i(λ) depend nonlinearly on λ. Due
6. Note that the data modelled in this equation is stored in sinograms: 2-D arrays indexed by i, a simplified notation for combining the propagation distance ρ and the propagation angle θ which define the coordinates of the projection space. See figure 4.3.
7. The transmission scan records the number of transmitted events, t_i, for an external source containing a long-lived isotope. A blank scan is just a transmission scan without the presence of the patient or phantom study. Typically, such scans are performed on a daily basis as part of the scanner calibration process.
8. h_i(x) is the probability that a positron emitted from a nucleus at position x will produce a pair of annihilation events that are detected by the ith detector pair without scattering (including geometric effects, attenuation, and detector efficiencies).

to scanner deadtime, the measurement means for high count rates are highly nonlinear
functions of patient activity. There are, however, no attempts to include deadtime non-
linearity in this model, hence the typical approach is to separate the nonlinear deadtime
loss from the ideal linear relationship between λ and yi .
This model simplifies to the approach outlined in [Fessler and Ollinger 1996]. The measurement means ȳ_i are then given in terms of λ:

ȳ_i(λ) = Σ_{j=0}^{J−1} a_ij λ_j + s_i + r_i,    (4.5)

where the system matrix is defined as being

a_ij = d_i T ∫ h_i(x) φ_j(x) dx.    (4.6)

As before, T denotes the scan time, hi (x) is the point-response function of the ith
detector pair, where φj (x) is typically just the indicator function for the j th voxel (i.e.,
it is a basis function; see section 4.4 below), and di is the deadtime loss factor. Hence,
the function of the matrix is to perform the coordinate transformation between image space and projection space. And by being based on an exact stochastic model of the projection mea-
surements, it is no longer necessary to assume that each projection is infinitely thin,
uniform in depth, and unattenuated. It is apparent from equation 4.6 alone that this
system matrix will be of fundamental importance to the success of any reconstruction
algorithm, and as such, it warrants a chapter of its own (chapter 5).
The observed data itself, y_i, 0 ≤ i < I, is defined in terms of a Poisson regression model:

y_i ∼ Poisson{ȳ_i(λ)} = (ȳ_i^{y_i} / y_i!) exp(−ȳ_i),    (4.7)

where I is the number of coincident detector pairs, λ is the emission density, and ȳ_i(λ) is the mean of the ith measurement⁹. Given we are dealing with radioactive decay,
the appropriateness of this model is only questionable when the data is precorrected for
randoms (see chapter 8).
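As a small illustration of this statistical model (Python/NumPy, hypothetical names; the deadtime factor and scan time are pulled out of the system matrix as in equation 4.6), noisy sinogram data can be simulated from a known distribution by drawing Poisson samples around the means of equation 4.5:

    import numpy as np

    def simulate_measurements(A, lam, scatter, randoms, deadtime=1.0, scan_time=1.0, seed=0):
        # ybar_i = d_i * T * sum_j a_ij lam_j + s_i + r_i   (equations 4.4-4.5)
        ybar = deadtime * scan_time * (A @ lam) + scatter + randoms
        # y_i ~ Poisson(ybar_i)                             (equation 4.7)
        return np.random.default_rng(seed).poisson(ybar)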

4.3 Methods of Attenuation Correction


To allow any form of quantitative analysis, PET sinograms must be corrected for the effects
of attenuation. As photons travel through any attenuating medium, they will experience
an exponential decay in accordance with the attenuation coefficient, µ, of the medium. If the distance travelled in the medium is given by s, then the recorded energy E_r will be the attenuated original energy E_o for each LOR:

E_r = E_o exp(−µs).    (4.8)
This is obviously a simplification of what actually happens. For one thing, the attenua-
tion coefficient is unlikely to be homogeneous. Also, different attenuation media are likely
to be encountered, in which case the additional attenuation factors must be included:
E_r = E_o exp(−Σ_i µ_i(x, y) s_i),    (4.9)
9. Each i corresponds to a unique (ρ, θ) pair; that is, a unique LOR.

where i indexes the different media, µi (x, y) is the inhomogeneous attenuation coeffi-
cient for medium i (in 2-D), and si denotes the distance each photon must travel in this
medium10 .
The exponential term in equation 4.9 can also be interpreted as the probability of a
single photon passing through the medium. If we now consider the emission occurring in figure 4.4, then the photon travelling along s1 will reach the detector with probability p_1 = exp(−∫_{s1} µ ds), and the other photon will reach its detector with probability p_2 = exp(−∫_{s2} µ ds). Assuming the attenuation medium to be homogeneous, and given that the photons travel independently of each other, then the probability that this particular annihilation is recorded within the coincidence window, i.e., as a valid event, is given by [Kak and Slaney 1988]:

exp(−∫_{s1+s2} µ ds).    (4.10)
This is a very important result, as it is clear that this attenuation factor is independent
of the position of the annihilation along this LOR. As such, compensation for attenuation
can be achieved using a transmission study to correct for the emission study. This is
simply achieved by dividing the latter by the former.
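A minimal sketch of this correction (Python/NumPy, hypothetical array names): the survival probabilities p_i are taken as the ratio of the transmission and blank sinograms (section 4.2.1), and the emission sinogram is divided by them LOR by LOR.

    import numpy as np

    def attenuation_correct(emission, transmission, blank, eps=1e-12):
        p = transmission / np.maximum(blank, eps)   # survival probability p_i per LOR
        return emission / np.maximum(p, eps)        # attenuation-corrected emission data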

Figure 4.4: The annihilation event occurring along the LOR shown in the above figure yields two
photons that must travel distances s1 and s2 to reach their respective detector pairs. As explained
in the associated text, the attenuation factor is independent of the position of the annihilation along
the LOR. As the distance s1 , for example, increases and its photon’s energy undergoes greater
attenuation, s2 must decrease, and its photon will incur less. Hence the combined attenuation
factor remains constant.

4.4 Background Theory - How the Activity Distribution is Digitised
Classical estimation methods assume that the tracer distribution λ(x), can be represented
as a set of basis functions [Vardi et al. 1985, Ollinger and Fessler 1997]. Suppose then
10. Normally, the sum in equation 4.9 will be given as an integral over the distance s_i.

that the unknown continuous image distribution λ̂(x, y) (we consider only the 2-D case)
is approximated by a finite series, with (x, y) indexing the image space [Jain 1989]. We
can therefore express our distribution as:
λ̂(x, y) = Σ_m^M Σ_n^N f_{m,n} φ_{m,n}(x, y),    (4.11)

where the φm,n are a set of basis functions applied over this image space (see originally
equation 4.6); that is, they are indicator functions (equation 4.13) for the j th voxel (as-
sumed to be either off or on). This means that our sinogram data, y(ρ, θ), where ρ and
θ index the sinogram data in a continuous fashion, can be defined in the following terms:
y(ρ, θ) ≃ R λ̂(x, y) = Σ_m^M Σ_n^N f_{m,n} [R φ_{m,n}(x, y)] ≡ Σ_m^M Σ_n^N f_{m,n} r_{m,n}(ρ, θ; x, y),    (4.12)

where R is the Radon transform, and rm,n (ρ, θ; x, y) is the Radon transform of φm,n (x, y),
the projection matrix, which is computable in advance. The unknowns, fm,n , may be
solved as a set of simultaneous equations, and the λ̂(x, y) may thus be derived from
equation 4.11.

4.4.1 The Estimation Methods


To simplify matters, we assume that f_{m,n} ≃ λ(x, y) ≃ λ_{m,n}, given λ is digitised on an M × N grid such that it is assumed constant within each pixel region. That is,

φ_{m,n}(x, y) = 1 if (x, y) is inside the (m, n)th pixel, and 0 otherwise.    (4.13)

Putting the sinogram data onto a discrete grid, 0 ≤ r < R and 0 ≤ t < T, equation 4.12 subsequently becomes

y(ρ_r, θ_t) ≃ Σ_m^M Σ_n^N λ_{m,n} r_{m,n}(ρ_r, θ_t).    (4.14)

And mapping λ_{m,n} into a J × 1 (J = M × N) vector λ, in matrix form we get

y ≃ Aλ,    (4.15)

where y and A are I × 1 (I = R × T ) and I × J arrays, respectively. The reconstruction


problem is, therefore, to estimate λ from y. And hence the importance of the transition,
or system matrix, in either the statistical or algebraic techniques.
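As a toy illustration of this estimation problem (Python/NumPy; A is assumed to be an I x J projection matrix such as the one constructed in chapter 5, and all names here are hypothetical), the M x N image and R x T sinogram are simply flattened into the vectors of equation 4.15 and λ estimated by least squares:

    import numpy as np

    def estimate_image(A, sino, image_shape):
        y = sino.ravel()                             # I x 1 vector, I = R * T
        lam, *_ = np.linalg.lstsq(A, y, rcond=None)  # solve y ~ A lam (equation 4.15)
        return lam.reshape(image_shape)              # back to the M x N grid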

4.4.2 Alternatives to Voxels for Representing the Reconstructed Images


On the basis of work originally reported in [Lewitt 1990], an alternative representation
for voxels in image reconstruction algorithms was proposed [Lewitt 1992]. These were
based on generalised Bessel functions, Gaussian-like functions having finite support (see
the original papers for the in-depth theoretical evidence). Justification is on the basis
that the sharp edges that constitute the traditional indicator functions of equation 4.13

are considered unsuitable to this real-world application. But I would contend that, given sufficient sampling (i.e., in accordance with Nyquist's theory), there is no reason that the normal indicator functions cannot be used.
In the above definition of equation 4.12, the elements of the projection matrix, rm,n
are assumed to be the line integral lm,n of the m, nth basis function φm,n (x, y). As we
shall see in the next section when we consider in more detail how a projection matrix
can be built, the use of cubic voxels - spatially limited basis functions - yields a sparse
projection matrix. This finiteness is in many respects an attractive property when we
consider the implementation details, but it has unattractive properties that are manifested
in the reconstructed images as we shall also see. The main theoretical argument for the
use of generalised Bessel functions, however, is their additionally good localisation in the
frequency domain, which is infinite in the case of voxels. An effectively limited function is said to decay as rapidly as possible for points lying at increasing distance from its origin. This is only achievable using basis functions with multiple continuous derivatives in the spatial domain [Jacobs et al. 1998].
We will return to this topic in more detail, but what is basically being said is that
the projection matrix may be built by replacing the definition for the basis functions
φm,n of equation 4.13 above, with so-called blob-like functions (see appendix C). This is
communicated in figure 5.5 of section 5.2.2 where a scheme is implemented showing some
equivalence to that suggested in [Lewitt 1992]. Later sections of this report extend the
use of these methods, showing the validity of being flexible with respect to the nature of
the basis functions used.

4.5 An Algorithm and an Objective Function


Having decided on our basic model for the PET imaging process, and the desire to use sta-
tistical methods to produce the images, all that remains is the formulation of an algorithm.
[Fessler 1994] defines the five necessary components of Statistical Image Reconstruction
(SIR):

1. A finite parameterisation of the positron-annihilation distribution; for example, its representation as a discretised image. That is, formulate the voxel-based model of the brain such that a distinct intensity describes each voxel. Here we use λ_j, j = 0, ..., J − 1. This has been shown in section 4.4.1.

2. A system model that relates the unknown image to the expectation of each detector
measurement. Thus we relate the J pixels to the I detectors via an I-by-J matrix
(A) of conditional probabilities. The probabilities (aij ) are derived from physical
considerations (which may include geometric effects, attenuation, and detector ef-
ficiencies), and express the probability that an emission from voxel j is recorded
in detector i. This is known a priori for all I and J, and was first introduced in
equation 4.6. The development of this matrix is the subject of chapter 5.

3. A statistical model for how the detector measurements vary around their expecta-
tions. Normally this would be a Poisson regression model of equation 4.7.

4. An objective function that is to be maximised to find the image estimate. [Shepp


and Vardi 1982] and [Lange and Carson 1984] proposed the Maximum Likelihood

(ML) method to estimate λ0 , ..., λJ−1 from the measured counts in the I detectors.
This example is given later in equation 7.24 of section 7.4.

5. An algorithm, typically iterative, for maximising our objective function, including


specification of the initial estimate and stopping criterion. The development of such algorithms is the main theme of this thesis (chapters 8, 9, 10, 11 and 12).

Hence these components are covered throughout the remainder of this thesis.
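To preview how these five components fit together, here is a minimal sketch (Python/NumPy) using the multiplicative EM-ML update whose derivation is left to chapter 7; scatter and randoms are ignored and the stopping criterion is simply a fixed number of iterations.

    import numpy as np

    def em_ml(A, y, n_iter=50, eps=1e-12):
        # 1. parameterisation: lam_j, j = 0..J-1      3. statistical model: y ~ Poisson(A lam)
        # 2. system model: the I x J matrix A         4./5. objective and algorithm: EM-ML below
        lam = np.full(A.shape[1], y.sum() / max(A.sum(), eps))   # uniform initial estimate
        sens = np.maximum(A.sum(axis=0), eps)                    # sensitivity sum_i a_ij
        for _ in range(n_iter):
            ybar = A @ lam + eps                                 # forward projection
            lam = lam * (A.T @ (y / ybar)) / sens                # EM-ML multiplicative update
        return lam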
5
Characteristics of the System Matrix

As was indicated in the previous chapter, the definition of the transition, or system ma-
trix is of fundamental importance to the algebraic methods of tomographic reconstruction.
Primarily, it describes the tomographic process, thus relating the image data to the mea-
surements that are taken. This is done in the form of a probabilistic framework such that
coefficients are assigned relating an emission occurring within a known pixel site to each of
the detector pairs. How these probabilities are used to formulate an algorithm capable of
achieving a reconstruction of the image data is the subject of chapter 6, where we actually
see how straightforward the reconstruction process is, given a good stochastic model.
It is this chapter, therefore, that presents much of the groundwork required by the reconstruction methods. If the acquisition model is based on the forward Radon transform, and the pixel intersections follow the nearest neighbour method of interpolation (see figure 5.4), then the assignment of these values is relatively straightforward. But this is an approximation of the process, which would ideally include such details as attenuation and detector efficiencies, a model of the scatter fraction, and even deadtime effects. Its design is therefore developed to allow the best possible recovery of the initial distribution, and as such many issues must be addressed. We begin addressing them in the following.

5.1 Defining our Estimation Problem


The digitisation of the image grid (section 4.2.1) allows us to define our estimation problem.
We assume the following:

• λ is a 2-D image of J pixels, where λj denotes the expected number of annihilation


events occurring at the j th pixel; the PET activity distribution to be reconstructed.

• y denotes the I-D measurement vector, where yi denotes the coincidences counted
by the ith of the I detector pairs. This is the sinogram data recorded by the scanner.

• aij is the probability that a coincidence originated at the j th pixel is detected in the
ith detector pair. This forms the stochastic model of the acquisition process.

• ȳ_i denotes the mean, or expected, number of coincidences detected by the ith detector pair, such that ȳ_i = Σ_{j=0}^{J−1} a_ij λ_j.


Figure 5.1: Relating an image array to the acquisition process. Note the assumption here that the
digitised image touches the scanner’s FOV. The detectors of the scanner form a ring encompassing
the FOV. Within this FOV we place our image grid, indexed by j ∈ 0, ..., J − 1 pixels.

That is, because the emission of positrons is a Poisson process, y_i is a sample from a Poisson distribution with mean ȳ_i; or, y_i ∼ Poisson{ȳ_i(λ)} (see equation 4.7). In conjunction with this information, we formulate our PET reconstruction problem on the basis of the matrix form introduced in section 4.4.1. Namely,

y = Aλ, (5.1)

where A ∈ ℝ^{I×J} is the system matrix that contains the weight factors between each of the image pixels and each of the line orientations from the sinogram. Primarily, then, the matrix describes a transformation that emulates the acquisition process; i.e., an approximate Radon transform. To allow this, it assumes that the length of intersection of the ith ray with the jth pixel, denoted a_ij, represents the weight of contribution of the jth pixel to the total activity along the ith ray.
We further assume that our image of J pixels is 2-D, where J = M × N , such that
we will now index our pixels using indices m and n, and also that our sinogram data is
referenced in a similar manner. Thus, we let I = R × T , where R is the number of radial
offsets and T is the number of angular positions, and index individual elements with r
and t (r = 0, ..., R − 1 and t = 0, ..., T − 1). This allows us to construct our system matrix
using the following relation:

a_{rT+t, nM+m} = φ(ρ_r − x_m cos θ_t − y_n sin θ_t, θ_t),    (5.2)

where the (ρ_r, θ_t) reference the I elements of the sinogram, and φ(.) is the expansion function in the image domain specifying how the pixel at position (x_m, y_n) models the image domain having the continuous positions (x, y) (see equation 4.11).
We are saying therefore, that each matrix element is the Radon transform of the ex-
pansion function, with the ρ-parameter shifted according to the pixel position. Naturally,
as each element assigns the likelihood of an emission in a known position being detected

by a particular detector pair, then the matrix must be properly normalised. Hence,

Σ_{i=0}^{I−1} a_ij = 1.    (5.3)

This says that with a probability of 1, each event will be recorded in one of the detector
pairs, which is actually not the case. Some annihilations will inevitably go undetected, and it is then reasonable to assume that Σ_{i=0}^{I−1} a_ij ≤ 1. And as we shall see later in
the implementation of [Lange and Carson 1984], this property is accounted for as the
normalisation is corrected using an additional term.

5.2 Implementing the System Matrix


The system matrix can be built in a quite straightforward manner: visit each pixel and
check which LORs intersect with it; update the matrix accordingly. However, the normal-
isation of equation 5.3 requires visiting each pixel twice for each LOR, the second time
to make the normalisation adjustments to the assigned probabilities. Unfortunately, this
requires a second pass on the entire system matrix, so its construction can be relatively
slow. It must, however, be created just once for a given geometry. That is, each pixel
has associated to it a normalisation factor that can only be determined after all detector
pairs are visited for all pixels. This is a consequence of the chosen storage method (see
figure 5.2) and the normalisation requirement (equation 5.3).
We derive J normalisation factors by counting, for each pixel, the number of LORs (indexed by i) that intersect it. These counts are accumulated over all I LORs and then applied to each element of the system matrix, which involves visiting each pixel for each LOR a second time to normalise the probabilities. On the first run, each pixel is visited to determine whether or not it is intersected by the current LOR, and if so, we simply use our pixel width (∆x) for the value that is written to disk¹. On the second run these are adjusted according to the total number of LORs that intersected each pixel. The basic procedure is the following:

    reset count[0..J-1];
    /* first pass: write the raw intersection weights and count, for each pixel,
       how many LORs intersect it */
    for each LOR i = 0 to I-1
        for each pixel j = 0 to J-1
            if pixel j is intersected by LOR i
                increment count[j];
                write (∆x) to disk;
            else
                write (0) to disk;
            end;
        end;
    end;
    /* second pass: normalise each stored weight by its pixel's count */
    for each LOR i = 0 to I-1
        read system matrix row i (a[0..J-1]);
        for each pixel j = 0 to J-1
            if count[j] > 0
                write (a[j] ÷ count[j]) to disk;
            else
                write (0) to disk;
            end;
        end;
    end;

1. It is worth noting that Toft's code [Toft 1996] uses a predefined normalisation factor - determined from the sampling distances - which means, therefore, that the probabilities are not properly normalised. And although the algorithm follows that of [Lange and Carson 1984], the notion of energy preservation is lost.

The system matrix is thus stored in a kind of continuous stacked notation. Thought
of in 2-D it is easier to follow (figure 5.2).
What is important to note is that the current implementation assumes that the digi-
tised image touches the scanner FOV, as was shown in figure 5.1. But this is not always
the case, hence the implementation allows the user to adjust this to match the methods
of the ECAT software (from CTI) and that used by a Monte Carlo simulator developed
within our collaboration [Emert 1998]. Another assumption, used in relating physical pixel sizes to the size of the LOR tubes, is that each LOR's width is the same as the pixel width. This therefore relates the radial views to the pixel spacing, and
subsequently to the physical spacing. It is done through the parameter fovRatio, which is
simply based on the ratio of the pixel dimensionality to the radial samples dimensionality
(R from section 5.1).
[Figure 5.2 annotations: the matrix rows are indexed by i and the columns by j = 0, 1, 2, ..., J; e.g., J = 128*128 and I = 160 (rho) * 192 (theta); the 16384th and the 503,316,480th elements are marked.]

Figure 5.2: How the system matrix is stored to disk. Although the matrix A is very sparse, the
basic procedure is to store each of its elements aij to disk in a contiguous fashion. This is shown
in the above figure. The net result is, in this example, a 2Gbyte matrix, of which perhaps 98% of
the entries are zero. And hence the subsequent need for sparse matrix routines.
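For orientation, the storage arithmetic behind figure 5.2, and the obvious in-memory alternative using sparse matrix routines, can be sketched as follows (Python/SciPy; the assumption that each a_ij is stored as a 4-byte float is mine, chosen so that the total matches the 2 Gbyte figure):

    import numpy as np
    from scipy import sparse

    J = 128 * 128                    # image pixels
    I = 160 * 192                    # rho * theta sinogram bins
    n_elements = I * J               # 503,316,480 elements, as in figure 5.2
    dense_bytes = 4 * n_elements     # ~2 Gbyte at 4 bytes per element

    # With ~98% of the entries zero, a sparse representation stores only the
    # non-zero a_ij plus their indices; rows are filled while building and
    # then converted to compressed sparse row form for fast products.
    A = sparse.lil_matrix((I, J), dtype=np.float32)
    # ... populate A[i, j] for the intersected pixels ...
    A = A.tocsr()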

5.2.1 Nearest Neighbour Approximation


On the basis of the above algorithm, there are many strategies for populating the matrix.
For now, and to keep things simple, values are estimated using nearest neighbour inter-
polation. Misgivings regarding the use of this algorithm stem from the fact that only one
pixel for each point along the integration lines can be included. This then results in an
irregular, saw-tooth traversal of the pixels, which is an unrealistic representation of what
happens. An effective smoothing of these coefficients - or of the staggered path, depending
on how you think of the problem - adds firstly more physical realism to the model, and
secondly gives some implicit regularisation to the reconstructions. This is covered in more
detail later, with the basic concept of nearest neighbour interpolation being clarified in
the following.
At the exact centre of each pixel, (xm , yn ), is positioned a square of size ∆x × ∆x
which thus defines the physical dimensions of the pixel (we assume that they are square).
If the line with parameters (ρ_r, θ_t) crosses the square, then the corresponding (to the pixel and the sinogram observation) matrix element is set to ∆x, otherwise it is set to zero². Letting ρ′ = ρ_r − x_m cos θ_t − y_n sin θ_t, our algorithm becomes:

a_{rT+t, nM+m} = ∆x if |ρ′ cos θ_t| < ∆x/2 and |ρ′ sin θ_t| < ∆x/2, and 0 otherwise.    (5.4)

We have thus cast a cartesian grid of J square pixels covering enough of the scanner's FOV to entirely span our imaged object. Ignoring contributions from scatter and randoms, we assume that the length of intersection, or probability, of the ith ray with the jth pixel, denoted a_ij (∀i, j), represents the weight of contribution of the jth pixel to the total activity along the ith ray. In terms of back-projection, we are saying that the measurement total along the ith ray, y_i, is representative of the line integral of the unknown activity function along the path of the ray. In this model, the line integral is a finite sum, and the tomogram is described by the system of linear equations first given in equation 4.4 of chapter 4. This relationship between each LOR and the pixels in the image space is shown in figure 5.3³, and figure 5.4 shows the trigonometric relations involved.
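The following sketch (Python/NumPy, deliberately unoptimised and held densely in memory for clarity; the thesis implementation streams rows to disk as described in section 5.2) builds a system matrix with the nearest neighbour test of equation 5.4 and then normalises each column so that equation 5.3 holds exactly.

    import numpy as np

    def build_system_matrix(n_pix, n_rho, n_theta, dx=1.0):
        centre = (n_pix - 1) / 2.0
        xs = (np.arange(n_pix) - centre) * dx             # pixel centres x_m (and y_n)
        rhos = (np.arange(n_rho) - (n_rho - 1) / 2.0) * dx
        A = np.zeros((n_rho * n_theta, n_pix * n_pix))
        for t in range(n_theta):
            th = t * np.pi / n_theta
            c, s = np.cos(th), np.sin(th)
            for r, rho in enumerate(rhos):
                i = r * n_theta + t                       # LOR index rT + t
                for m, xm in enumerate(xs):
                    for n, yn in enumerate(xs):
                        rp = rho - xm * c - yn * s        # rho' of equation 5.4
                        if abs(rp * c) < dx / 2 and abs(rp * s) < dx / 2:
                            A[i, n * n_pix + m] = dx      # nearest neighbour weight
        col = A.sum(axis=0)
        A[:, col > 0] /= col[col > 0]                     # column normalisation (equation 5.3)
        return A

For example, build_system_matrix(64, 64, 96) returns a 6144 x 4096 matrix that can be used as A in equation 5.1.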

5.2.2 Accounting for Uncertainty in the System Matrix


There are many sources of uncertainty in the tomographic process. These are inherent
in both the source and the measuring apparatus. In PET, blurring, the result of such
uncertainty, occurs primarily for the following reasons4 :

• The photons are not exactly co-linear. The deviation that can occur, resulting from
some residual kinetic motion (recoil), can, for example, introduce a mis-positioning
of 1.7mm for an 800mm ring length [NAS 1996].
2. The notation follows that originally set out in section 5.1.
3. Note that the current implementation uses only values of 1 and 0 (pixel size), and does not yet accurately calculate the degree of overlap which is implied by the figure. To do this is computationally expensive, although, for accuracy, essential. Currently, however, it may be deemed to be an extravagance.
4. [Furuie et al. 1994] cites an additional reason for the blurring: the rebinning process that transforms the 3-D sinogram measurements into separate 2-D “bins” [Defrise et al. 1997]. I am not sure, however, if this is entirely correct.


Figure 5.3: Relating each LOR to pixels in the image space. The figure shows a single LOR
intersecting pixels in the image grid. For each pixel, j, the nearest neighbour method of interpola-
tion tells us whether or not a particular LOR has intersected it. Shown above in black is one such
intersection. To accurately model the system on the basis of this overlap alone is unrealistic given
the uncertainty inherent in the emission source.

• Positron range: before an annihilation event occurs, the positrons must find a free electron with which to annihilate. This distance depends on the actual nuclide and the composition of the object being imaged. For example, for ¹⁸F the blurring is quite small (≈ 0.5mm), yet this may increase in images of the lungs, where free electrons may be more sparsely distributed than is the case in, say, the brain. We could then expect uncertainty over regions of about 5mm.

• The photons may be scattered from one crystal to another, resulting in a mis-positioning of the detected photon. This effect occurs because the crystal interface cannot always be orthogonal to the direction of the photon's arrival, and it becomes more prominent as the emission sites occur on more oblique LORs. The result is an elongation of the resolution spread function in the crystals [NAS 1996], where a photon may penetrate through one or more crystals before stopping. This can cause in-plane resolution variation of up to 39% for BGO crystals [Hoffman et al. 1982].

• Additional artifacts resulting from patient movement can be considerable.

Considering the penultimate issue first, we find that a scanner can be approximately modelled as a linear, shift-invariant (LSI) system. That is, we can associate a response function with the measurements taken by the scanner: the scanner's PSF. Every operation of an LSI system reduces to a convolution, and this PSF is well modelled by a Gaussian kernel. Hence, one would think that the obvious extension to the aforementioned scheme is to associate

(Figure 5.4 diagram: a LOR with parameters ρr, θt crossing a pixel centred at (xm, yn).)
Figure 5.4: The simple trigonometry relating the LORs to each pixel. Shown is one LOR intersecting the image grid, but this time at a finer scale than that of figure 5.3. To determine whether this particular LOR - of angle θt and offset ρr - intersects the pixel shown, equation 5.4 must be satisfied. That is, the pixel is intersected if |ρ′ cos θt| < ∆x/2 and |ρ′ sin θt| < ∆x/2.

to each intersecting detector pair the system PSF. But although detector efficiencies and
attenuation factors (derivable from associated blank and transmission scans) are used,
factors such as the detector blur and inter-crystal penetration are not typically modelled
[Mumcuoğlu et al. 1994]. The exact computation of factors such as this resolution spread
function for every configuration and scanner is simply not practical. At best, workers such
as [Leahy and Qi 1998] use instead Monte Carlo simulations to yield a simplified model
of the detectors (see also [Mumcuoğlu et al. 1996]). One exception is to be found in [Qi
et al. 1997], for example, who model and deconvolve the system response within a MAP
reconstruction framework.
Sources of spatial uncertainty in the origin of the emission, however, should be modelled. One approach is to accept that there exists such possible variation, and then account for this in the tomograph model. That is, having determined which pixel the current LOR intersects, we can assign probabilities according to a probability distribution about that pixel, designed to express the uncertainty. This is demonstrated in figure 5.5 using a Gaussian probability density function (PDF), and sketches the model used in this dissertation. This is essentially the interpolation approach originally mooted by [Hanson and Wecksung 1985], which will have equivalences to the methods of [Lewitt 1992, Matej and Lewitt 1996, Jacobs et al. 1998] if their pixel basis functions are the same.
Indeed, [Hanson and Wecksung 1985] believed that many of the disagreements concerning
the development of the early CT reconstruction algorithms had their basis in the differing
approximations used in the projection estimation rather than in the underlying reconstruction algorithms. Although the approach taken here does have similarities, as the following shows, it was arrived at from an entirely different perspective: first and foremost, the intention was to model the physics of the imaging device. This, as we shall see, has led to the development of a quite different algorithm. Furthermore, given that we have a number of sources of uncertainty, the central limit theorem suggests that a Gaussian PDF is the most appropriate choice.
The procedure is then simply the following. The nearest-neighbour scheme - as employed in the algorithm of section 5.2 - determines which pixels each LOR intersects; then, for each of these, a Gaussian PDF, whose FWHM is set by the user of the software, is centred on that pixel in order to distribute the probabilities relating the LOR to pixels beyond the nearest neighbour alone. In this way, regions about the centre pixel are also assigned probabilities according to whether or not they are overlapped by this Gaussian PDF. Using the nearest neighbour scheme alone, these pixels would otherwise have been assigned zero probability for the current LOR, but our uncertainty in the location of the emission's source makes it necessary to include them.

(Figure 5.5 diagram, panels (a) and (b); labels: pixel intersection, LOR.)

Figure 5.5: How uncertainty in the source of the emission is incorporated into the system proba-
bilities. About each pixel judged to be intersected by the LOR according to the nearest neighbour
interpolation method, a PDF (shown as a 1-D profile above in (a)) is placed to distribute the
probabilities of an emission. This is shown for just a single pixel centre in (a). That is, it is
fairly likely (with a probability determined by the PDF) that the emission actually occurred in
one of the neighbouring pixels. (b) shows how a LOR of response might be represented in terms
of probabilities for the pixels it intersects.

The effect this has on the reconstructions is shown for various width PDFs in figure 5.6.
The images shown result from using the ML-based algorithm of chapter 7 for 10 iterations.
It is interesting to see that the variation is considerable, and that care, therefore, should be
taken in determining the FWHM to be used. This, however, can be related to the image
resolution, and a form of regularisation is imposed upon the solution, as the likelihoods are
no longer assigned in a crisp manner. Indeed, given a sensible choice for the FWHM, this
interpolation method will actually remove the “checkerboard effect”, the characteristic
noise artifact exhibited at high iterations of the most popular statistical reconstruction
algorithm (see section 8.1).
It is possible that this high-frequency noise results from the sharp-edged assignment of the coefficients under nearest-neighbour interpolation, as this introduces high-frequency components into the reconstruction routine. Such noise
occurs due to the ill-posed nature of the estimation problem (i.e., wherever the statistics

are bad), and is seen to manifest itself at the resolution of the pixel basis functions of the
forward projection model. That is, with a FWHM of 1, we are just using traditional, square
pixels, so the checkerboard effect exhibits noisy, single square pixels. Where the square
representation is effectively smoothed by a Gaussian, this effect is much less pronounced,
and only isolated “blobs” are seen to occur in the image. This effect is seen to be prominent
in regions of poor statistics, as might be expected.
[Schmidlin et al. 1994] considers this noise effect to stem from inaccuracies in the calculation of the system matrix coefficients. My argument, therefore, is: how can one accurately derive these values if the inherent uncertainty in the emission source is not modelled? And if an explicit form of regularisation is to be had according to the width of the kernel used to model the uncertainty, then why not apply it selectively - for instance in low-noise regions - with the possibility that the localised improvement in the signal-to-noise ratio (SNR) will effect a global improvement [Llacer et al. 1991]? This is certainly desirable, as in doing this, differing levels of emphasis can be applied to different regions and an overall improvement in the reconstruction results can indeed be observed (this theme is developed in section 11.2.2).

5.2.3 Regarding the Analogy to Image Reconstruction Using “Blobs”


It has been mentioned that the PDFs that have been used in deriving the system matrix
coefficients aij have some equivalence to the methods of [Hanson and Wecksung 1985,
Lewitt 1992], who use basis functions other than pixels to reconstruct the image data.
This equivalence, however, lies only in the interpolation method, as the basis functions
here are the traditional, square pixels, and the interpretation of this approach can only
amount to a modelling of the uncertainty in the emission. Principally, the algorithms
herein do not follow those championing the "blob" approach (so called in reference to the Gaussian-like nature of the basis functions) in one important detail. This is the final step
of “overimposing” the blob representation into its voxel equivalent, necessary only if your
basis functions are indeed blobs. As is discussed in sufficient detail in appendix C, this
basically involves visiting each blob and computing its contribution to those voxels that
it covers; it is simply a summation of the product of the blob coefficient values and
their functions. It is a necessary step when one adopts the theoretical rigours of using
Gaussian-like basis functions - they must be converted back to their voxel representation
- and hence it is unnecessary if one adopts the physical model of source uncertainty and
retains square basis functions, as is the case here. This last step of over-imposition actually effects a Gaussian-like convolution of the image (dependent, of course, on the choice of basis function), and, with the inclusion of my experimental results, I can conclude that it makes neither theoretical nor practical sense. The results of this step are demonstrated in
figure 5.7.

5.3 Cited Methods of Extending the System Matrix


We are now aware that the system matrix enables the reconstruction process to fully
incorporate whatever information is available to describe the acquisition process. This
typically includes the ability to describe aspects of tracer time decay, detector efficiency,
scatter and random contributions, noise, and resolution. Beyond uncertainties in the
emission source, we have ignored such contributions in the definition of our model, taking
only geometry as a definition of the acquisition process. Although this in itself would

(Figure 5.6 panels: reconstructions based on interpolation kernels of FWHM 3.43 mm (2 pixels), 5.145 mm (3 pixels), 6.86 mm (4 pixels) and 8.575 mm (5 pixels).)

Figure 5.6: The above figures show reconstructions using different system matrices. The images are 256-by-256 pixels in size, with each pixel 1.715 mm across. The sinogram data had a dimension of 288-by-144 (288 angular samples; 144 views). Each system matrix assumes a different FWHM
for the PDF’s centered at each pixel. This has an equivalence in using blobs as basis functions
instead of voxels (see text and appendix C). The overall effect obviously alters the probabilities for
an emission occurring at pixel k being recorded in the detector pair indexed by i (see section 5.2.1).
The variation is demonstrated by these figures, where the FWHM is in the range from 2 pixels
(3.43 mm) to 5 pixels (8.575 mm). The reconstructions used the Ordered Subsets EM (OSEM)
algorithm [Hudson and Larkin 1994] with 16 subsets, run for 20 iterations.

be essential for the development of a generally applicable reconstruction algorithm, for a more comprehensive model we would do better to follow the guidelines of [Mühllehner and Colsher 1982], who state the probabilities to be dependent on the following factors:
• Positron range and variations in angular separation from 180 degrees.
• Probability of attenuation.
• The spatial energy resolution of the detectors.
• Detector efficiency and dead time.

(Figure 5.7 panels: "Voxel Basis Function Image" (left) and "Blob Basis Function Image" (right); beneath them, the blob profile plotted against radius out to 6.86 mm.)

Figure 5.7: This figure shows the results of adhering to the conviction that PET images are
best reconstructed from “Blobs” [Lewitt 1992]. The image on the left shows the reconstruction
resulting from the use of spherical basis functions (a modified second order Bessel function, radius
= 6.86mm, α = 6.72351; see equation C.5) in the interpolation scheme used to define the system
matrix. Its profile is shown below the two images. If one insists on using these basis functions to
represent the image distribution, then the image on the left represents only the coefficients of these
basis functions. The image on the right is the result of converting these coefficients into their basis
function representation to be displayed on a pixel grid. That is, citing no physical justification for the method - beyond that of the world being constructed from natural structures - theoretical consistency requires this "over-imposition" step. For each coefficient, this requires a summation of the term over its basis function, which, being spherical, has the blurring-like effect demonstrated in this figure. By choosing instead to reconstruct pixels within a system that models the uncertainty at the emission source, this step is unwarranted.

• The position of the j th pixel relative to the ith detector pair.

With the exception of detector deadtime, these factors are independent of the source
activity. That is, techniques for calibrating PET systems in which the above factors have
been accounted for should, in most cases, be available on commercial PET systems. For
example, calibration for detector efficiency is performed on a routine basis and corrections
for attenuation are taken from measurements obtained using a rotating external transmis-
sion source. And even the deadtime correction factor can usually be calculated according
to a procedure supplied with the scanner. Currently, and not unreasonably, only the ge-
ometric relationships, and to a lesser extent the spatial source uncertainty, are included
in the present implementation. I say "lesser extent" because I am not sure that this approximation actually manages to capture this attribute in its entirety; verifying it, I feel, would involve extensive phantom studies. Still, it is clear that various corrections must be made if a more realistic model of the acquisition is to be approached. For example,
ignoring for now the use of the deadtime correction factor of equation 4.6, we see that
[Sastry and Carson 1997], follow a suitably comprehensive approach defining their system
matrix in the following manner:
\[
C_{ij} = \frac{a_{ij}}{A_i N_i},
\tag{5.5}
\]
where aij encodes the geometrical relationship between the image and the imaging device; Ai is an attenuation correction factor; Ni is a normalisation correction factor for variant detector efficiencies; and scatter and randoms are also incorporated into the reconstruction process following the model of equation 4.5. Although detailed, this implementation would be no less efficient to implement than the method currently used. The matrix stored to disk need only contain the aij elements of equation 5.5, which would then be adjusted according to a patient-dependent value, Ai, and the routinely derived, scanner-dependent normalisation factor, Ni.
Knowledge of the scan time is also an important element of the acquisition model,
primarily because the rate of decay can vary considerably within this period. [Ardekani
et al. 1996] capture this relationship by using the following model:

\[
C_{ij} = \tau^{-1}\bigl(1 - \exp(-\tau T)\bigr)\, a_{ij},
\tag{5.6}
\]
where T denotes the duration of the scan, and τ is the decay constant of the radionuclide.
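Taken together, equations 5.5 and 5.6 amount to a simple per-LOR scaling of the stored geometric elements. The Python sketch below (names and call structure are my own assumptions) indicates how such corrections could be applied without re-deriving the geometric matrix.

import numpy as np

def corrected_row(a_geom_row, A_i, N_i, tau, T):
    """Scale the geometric elements a_ij of one LOR by the attenuation
    factor A_i and normalisation factor N_i (equation 5.5), and by the
    decay factor for a scan of duration T with decay constant tau
    (equation 5.6)."""
    decay = (1.0 - np.exp(-tau * T)) / tau
    return decay * a_geom_row / (A_i * N_i)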
As these examples show, the majority of the requisite information, however, must come
from the manufacturers of the scanner, and this is currently not to hand. The scanner used
by the PET group associated with our institute (PRT2 - Positron Rotating Tomograph) was a prototype of the commercial ECAT ART (Advanced Rotating Tomograph) scanner, and applying known dead-time, scatter and random estimates, detector efficiencies and the like was not possible. Furthermore, the scanner is unable to operate in list-mode⁵, hence these values would be difficult to estimate. Regardless,
because of its importance, the theory is documented in the following, as when specified
accurately, the approach will undoubtedly provide the more accurate reconstructions of
the data.

5.3.1 An Efficient and Accurate Stochastic Model


It is important to note that the system matrix only really relates to the Radon transform
in the case of true events. Clearly, as scatter is dependent on the density of the material
along the photon path, such effects should either be modelled in the system matrix or
corrected for before a Radon transform based matrix is used. As [Leahy and Yan 1991]
suggest, a reasonable compromise between accuracy and computational complexity is to
model scatter with variant convolution processes [King et al. 1981, Bergström et al. 1983].
⁵ List-mode is a mode of scanner operation that outputs all available information acquired during the scan. For example, it gives the detector pairs for each registered event (which would later be converted into the corresponding (ρ, θ)-indexed sinogram bin), for each window. Random events are recorded and given separately.

Random events are another issue as their behaviour differs from those that are either
true or scattered. Their rates scale quadratically with the source activity, contributing,
therefore, an element of nonlinearity to the system. To account for both of these effects
would require the faithful implementation of the acquisition model given in section 4.2.1,
for example. Such an approach is adopted by [Leahy and Yan 1991] and his co-workers
who maintain the off-time random events as an additional set of Poisson data of mean
q̄i (see [Lange and Carson 1984]). Additionally, the modelling of scatter separate to the
trues, allows for the following model (see also [Mumcuoğlu et al. 1994, Mumcuoğlu et al.
1996]):

\[
E(y_i) = \sum_j \left( A^{t}_{ij}\lambda_j + A^{s}_{ij}\lambda_j \right) + q_i,
\tag{5.7}
\]
where A^t_ij is the probability matrix for true events, and A^s_ij is that for scattered events.
Unfortunately, sufficient knowledge that would allow the description of all the afore-
mentioned properties is firstly not available, and secondly would anyway yield a system
matrix of unwieldy proportions raising the question of its practicality. Subsequently, this
work opts for a simpler model, and an approach that addresses the use of sparse matrix
routines. However, we will close the remainder of this section by sketching out how these
various acquisition characteristics could be incorporated into the system matrix. The price
paid in employing these techniques is that the more accurate our matrix, the less sparse
its structure, and hence the possibility of reducing the matrix size with, for example, the
method given in section 5.4 below, is reduced.
As such, the only relevant literature is that which shows how the system matrix can
be handled relatively efficiently through the factorisation of its elements into a product of
probabilities, each due to one of the above listed causes. Kearfott, for example, writing in
[Vardi et al. 1985], adopts just such an approach:

\[
a_{ij} = a^{\mathrm{true}}_{ij} + a^{\mathrm{random}}_{ij},
\tag{5.8}
\]
where a_ij^true and a_ij^random are probabilities of detecting true and random coincidence events, and,
\[
a^{\mathrm{true}}_{ij} = a^{\mathrm{geom}}_{ij}\, f^{\mathrm{atten}} f^{\mathrm{wob}} f^{\mathrm{sens}} f^{\mathrm{ran}} f^{\mathrm{dead}},
\tag{5.9}
\]
where a_ij^geom depends only upon the PET and reconstruction matrix geometry and the f's represent corrections for attenuation [Huang et al. 1979], wobble, relative detector sensitivity, randoms [Hoffman et al. 1981], and deadtime, respectively. Just why the model accounts for randoms in both equations (the a_ij^random term in 5.8, and the f^ran term in 5.9) is not clear.
Another method is given by [Mumcuoğlu et al. 1994], who describe their I × J system
matrix with the following:

\[
a_{ij} = f^{\mathrm{detectors}} f^{\mathrm{atten}}\, a^{\mathrm{geom}}_{ij}\, f^{\mathrm{positron}},
\tag{5.10}
\]
where f^detectors is a banded I × I matrix that may include factors related to the detectors, including intrinsic and geometric sensitivity, crystal penetration, inter-crystal scatter as well as system deadtime; f^atten is a diagonal I × I matrix that contains the attenuation correction factors; a_ij^geom is an I × J matrix that contains the probabilities of detection in the absence of an attenuating medium; and f^positron is a J × J matrix that can model the effects of positron range [Derenzo 1986]⁶.
The forward projection scheme used in the reconstruction process involves multiplying
the [current estimate for the] image vector by each of these matrices in turn. Back-
projection subsequently requires multiplying the sinogram vector by the transpose of these
matrices in the reverse order.
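A minimal Python sketch of this factored projector follows, assuming each factor of equation 5.10 is available as a matrix (dense numpy arrays or scipy.sparse matrices both support the @ operator); it illustrates only the order of the multiplications, not how the factors themselves are built.

def forward_project(lam, f_det, f_atten, a_geom, f_pos):
    """Forward projection with the factored matrix of equation 5.10."""
    return f_det @ (f_atten @ (a_geom @ (f_pos @ lam)))

def back_project(sino, f_det, f_atten, a_geom, f_pos):
    """Back-projection: the transposes applied in the reverse order."""
    return f_pos.T @ (a_geom.T @ (f_atten.T @ (f_det.T @ sino)))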

5.4 Undesirable Characteristics of the System Matrix


Primarily as a result of the near co-linearity of the measurements (the LORs), the system matrix is near singular, and the linear algebra formalism based on the matrix's inversion is subsequently an ill-conditioned one. Furthermore, as a result of the uneven coverage of the projections in the image domain, some of the regions in the image will be under-determined, others over-determined. In the case of an over-determined region, the likelihood is that the solution is inconsistent in the sense that there exists no solution for λ = A⁻¹y.
Additionally, the matrix is very large and highly sparse. For example, to reconstruct a 2-D image of 128² pixels from a sinogram of 192 × 160 observations requires a matrix containing around 500 million elements (i.e., 128² × 192 × 160 elements), which amounts to solving for 128² unknowns from 192 × 160 equations. If each element is stored as type float, then the matrix will take up about 2 Gbytes of memory. Of these elements, perhaps less than 1% are non-zero, the result of only a few pixels having non-empty intersections with each particular ray. This may be dealt with by storing only the meaningful matrix elements (see section 5.4.1), but this does not, however, address the poorly conditioned structure of the data. Sensible preconditioning is, therefore, usually appropriate, of which perhaps that given by [Mumcuoğlu et al. 1994] is the best.

5.4.1 Exploiting the Sparse Nature of the System Matrix


Accuracy must be traded with computational overheads in deriving a reconstruction solu-
tion; the size of the system matrix must be suitably confined, and this confinement yields
the trade-off between the sparseness of the matrix and the exactness with which it is able
to describe the tomographic process. And this is an additional reason why, for the time-
being at least, it is sensible to allow a high-degree of sparseness in the system matrix such
that the vector operations that are implemented may exploit this, allowing for a reduction
in the computational overheads.
One current method calculates the matrix A once for the given scanner geometry7 .
However, a vast compression of the storage requirements can be achieved simply by storing
only the non-zero elements of the system matrix. This is a form of run-length encoding
with which it is possible to reduce this data structure to within that of the capabilities of
a modern workstation’s primary storage (i.e., memory virtual and physical). Indeed, this
has now been achieved [Kehren et al. 1999].
⁶ Although not given here, a factor that is perhaps of secondary importance, intercrystal penetration, can also be included [Mumcuoğlu et al. 1994].
⁷ We are here simplifying the problem by ignoring contributions due to factors such as efficiency and attenuation. These were dealt with in section 5.3.

Implementing this just involves the setting of small values below some arbitrary thresh-
old to zero. That is, for some threshold value γ, the following is done,

\[
\tilde{a}_{ij} =
\begin{cases}
a_{ij} & \text{if } a_{ij} > \gamma,\\
0 & \text{otherwise,}
\end{cases}
\tag{5.11}
\]

where, for example, γ = 0.0005 × max_ij(a_ij), ∀i, j. Using this scheme, the Z_i non-zero elements of each row of A are stored as a 1-D vector, a_iz (∀i, z ∈ 0..I−1, 0..Z_i). If all elements of A were nonzero, then Z_i would equal J (the number of pixels; M × N). It is assumed, however, that Z_i is significantly smaller than J; enough, that is, to justify the slight overhead required to associate the stored elements to those of the original matrix A with an additional indexing system. This is shown in appendix A, and the algorithms are given in sections A.2.4 to A.2.6 (with a_iz ⊂ a_ij, Z_i ≪ J, ∀i ∈ 0,..,I−1), where this
index requires an additional 1-D vector for each of the ai . The vector must also be of
length Zi , and its values correspond to the addresses of the column indices that relate the
aiz to the aij . Care should also be taken in assigning the threshold appropriately. Too
high, and the result effectively samples the likelihoods too infrequently, causing rather
erroneous patterns to form in the reconstruction. Too low, and the advantage from using
this method might be lost.
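A minimal Python sketch of this row-wise compression (the threshold of equation 5.11 plus the additional index vector) is given below; the function and variable names are illustrative only.

import numpy as np

def compress_row(a_row, gamma):
    """Equation 5.11: discard elements below gamma and keep only the Z_i
    non-zero values together with the column indices that relate them
    back to the full row."""
    cols = np.nonzero(a_row > gamma)[0]   # the additional 1-D index vector
    return a_row[cols], cols

# gamma would be chosen globally, e.g. gamma = 0.0005 * max over all a_ij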

On-The-Fly Calculation of the System Matrix


Most mention in the literature regarding the calculation of the system matrix coefficients
is of on-the-fly implementations. That is, no system matrix is actually stored, but the
individual values are calculated on an as-needed basis. This obviously results in repeated
calculations, but is very advantageous in terms of storage overheads; for fully 3-D recon-
struction, such methods are essential.
In this lazy evaluation of these constant terms, one must, however, be careful to ensure
that the system matrix values are correctly normalised. Errors in these values can to some
extent be corrected in the algorithm, as is discussed in section 7.5. As is also discussed in
this section, such adjustments are not accurate, and it is necessary, therefore, to do the
proper normalisation. But to allow this, it is not only necessary to know for each LOR
if the current pixel is intersected, it is also necessary to know what other LORs would
intersect this pixel and to what degree. A vast number of calculations can be avoided
simply by calculating these J normalisation terms off-line and saving them to be read
from disk. This is what is done in my implementation.
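A sketch of that off-line step follows, under the assumption that the matrix rows are available in the compressed form above; the function name is my own.

import numpy as np

def normalisation_terms(row_values, row_columns, n_pixels):
    """Accumulate sum_i a_ij for every pixel j once, off-line, so that an
    on-the-fly projector need not revisit all LORs for each pixel."""
    norm = np.zeros(n_pixels)
    for vals, cols in zip(row_values, row_columns):
        norm[cols] += vals
    return norm   # saved to disk and read back by the reconstruction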
6  Implementing PET Image Reconstruction in its Algebraic Form

This chapter introduces the basic implementation details of algebraic methods of image
reconstruction. It closes by showing how an example algorithm is implemented to solve this
system of linear equations that resulted from the discussions of chapter 5. A later chapter
is left the task of describing the more commonly used ML-based approach, which serves
also to introduce methods of regularisation required due to the ill-conditioned nature of
the reconstruction process. The discussion in the succeeding chapters subsequently turns
to the use of a priori or contextual models, and also the implementation of a multiscale
approach.
Returning to the theme of this chapter, however, we present image reconstruction in
its algebraic form, and thus introduce a number of considerations involved in choosing an
algorithm suitable for solving a large system of linear equations. In short, this chapter
introduces us to the non-optimised approaches of SIR.

6.1 Solving Simultaneous Equations of this Form


We formulated the reconstruction problem as that of estimating the image distribution
λ, from y on the basis of the system matrix A (equation 4.15). That is, our system of I linear equations (the number of measurements taken) must be solved for J unknowns (the number of pixels in our image). If I > J, the system is overdetermined and, in general, no exact solution exists, so an appropriate compromise must be chosen. If, however, J > I, or if J = I but the equations are degenerate, then there are effectively fewer equations than unknowns, there can be no unique solution, and the system is underdetermined [Press et al. 1988,
Green 1990]. In the latter case, the solution space consists of a particular solution λ̂ added
to any linear combination of [typically] J − I vectors (which are said to be in the nullspace
of the matrix A).
The task of finding the solution space of A using standard approaches from the field
of signal processing might involve:

• Singular Value Decomposition (SVD).

• Linear Least-Squares (the L2 norm).



• Methods of Parameter Estimation; ML or MAP estimation.

In selecting an appropriate algorithm, careful thought must be given to the known aspects of the system. In the application of PET reconstruction, the dimensionality of
our reconstructed image is typically less than that of the sinogram, hence our system is
an overdetermined one1 . As such, there is, in general, no solution vector to our equation
(Aλ = y). It happens frequently, however, that the best “compromise” solution is sought;
the one that comes closest to satisfying all equations simultaneously.
Solutions to this particular problem of parameter estimation are likely to lead to unde-
sirable fluctuations in the solution, particularly when the estimate is based on an L2 norm.
Because of this, we say that the system is ill-posed. Furthermore, it is typical that an
unstable solution is related to the ill-conditioned nature of our system matrix (see sec-
tion 5.4). The most straightforward approach to stabilising the solution is by constraining
it. This regularisation is necessary in PET reconstruction, and is found typically in either
the form of a penalising term [Fessler 1994], or through the use of Bayesian priors [Geman
and Geman 1984, Geman and McClure 1985]. We will later apply such techniques, but for
the time being at least, we shall discuss the more common approaches to this particular
inverse problem. Of these, perhaps the most widely used technique is that of SVD.

6.1.1 Singular Value Decomposition


The method of SVD analyses properties of the matrix that defines the system of equations.
The basic idea is to decompose this matrix into a simpler form, such that the equations
are more easily solved. Indeed, most of the classical iterative methods are based on such
matrix-splitting methods [Young 1971]. For the system matrix A ∈ ℝ^{I×J}, this would involve,

\[
A = U \Sigma V^{T},
\tag{6.1}
\]

in which the matrix Σ ∈ ℝ^{I×J} is a diagonal matrix with the singular values of A along this diagonal, Σ = diag(σ1, ..., σJ), and the matrices U ∈ ℝ^{I×I} and V ∈ ℝ^{J×J} are orthogonal. Normally, the singular values are arranged in decreasing order, σi ≥ σi+1, and the condition number of A is just the ratio of the largest and the smallest singular values.
To find the solution vector, we remap the problem of matrix inversion using SVD to re-
move the insignificant singular values of Σ. The resulting matrix Σ+ is better conditioned,
and may instead be used to formulate our inverse problem:

\[
\lambda = V \Sigma^{+} U^{T} y.
\tag{6.2}
\]

This approach certainly acts to stabilise the solution, and although this may be
achieved non-iteratively, we have, associated to the method, the large computational effort
required for calculating the eigenvalues and eigenvectors of what are huge image matrices.
On the other hand, this need only be done once, and off-line.
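For a toy system small enough to decompose, the truncated pseudo-inverse of equations 6.1 and 6.2 can be sketched in a few lines of Python; the tolerance and names here are illustrative assumptions only.

import numpy as np

def tsvd_reconstruct(A, y, rel_tol=1e-3):
    """Equations 6.1-6.2: A = U S V^T; singular values below
    rel_tol * s_max are removed before forming the pseudo-inverse."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.where(s > rel_tol * s[0], 1.0 / s, 0.0)   # Sigma^+
    return Vt.T @ (s_inv * (U.T @ y))                    # V Sigma^+ U^T y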
¹ We do of course have the freedom to decide on an image resolution appropriate to the quality of our data; one theme of this research.

6.1.2 Linear Least-Squares


Defining closeness to the solution in the least-squares sense, then our overdetermined
problem reduces to a (usually) solvable linear problem. In the linear least-squares problem,
the reduced set of equations to be solved can be written as a J × J set of equations,

\[
(A^{T}A)\lambda = A^{T}y,
\tag{6.3}
\]
where A^T denotes the matrix transpose of A. The above equations are called the
normal equations of the linear least-squares problem. To determine the unique solution
of an equation of the form y = Aλ by Gauss elimination and the like, requires that A
is a square matrix and det(A) ≠ 0. For a square matrix with det(A) = 0, the matrix
is singular, and there is no unique solution. In this case, at least one equation is lin-
early dependent and can be eliminated [Strang 1980]. In this manner we can reduce the
number of equations (by elimination), such that we have more unknowns than equations,
in which case the number of solutions becomes infinite, the solution requiring, therefore,
additional constraints. Nevertheless, the method is straightforward, and on the basis of
the acquisition model given in equation 4.5, the estimator can be rewritten as:

\[
(A^{T}A)^{-1}A^{T}(y - s - r),
\tag{6.4}
\]
where (A^T A)^{-1} A^T is the pseudo-inverse of A, and s and r are I-D vectors containing
the sinogram’s scatter and random contributions, respectively. This is then solvable in an
optimal sense under the assumption that the signal is Gaussian distributed [Sivia 1996].
As we shall see in discussing implementations for PET reconstruction, however, the direct
solution of the above normal equations is not necessarily the best way to find a least-
squares solution [Press et al. 1988]. For example, the approach is impractical for large A,
and its conventional application may yield negative estimates for λj [Fessler and Ollinger
1996], which is physically unrealistic.
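For completeness, a small Python sketch of the estimator of equation 6.4 on a toy problem, using numpy's least-squares solver; the function name and defaults are my own, and, as noted above, the estimate may contain negative, physically unrealistic values.

import numpy as np

def lls_estimate(A, y, s=None, r=None):
    """Equation 6.4: least-squares estimate after subtracting the scatter
    (s) and randoms (r) contributions from the sinogram."""
    b = y - (0 if s is None else s) - (0 if r is None else r)
    lam, *_ = np.linalg.lstsq(A, b, rcond=None)
    return lam   # unconstrained: negative values are possible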

6.1.3 Methods of Parameter Estimation


Maximum Likelihood (ML) Estimation
One estimator that allows us to assume signal distributions other than that of the Gaussian
is the Maximum Likelihood Estimator (MLE). When applied to such a system of equations,
this estimator yields its greatest value for the likelihood of the observed data, y, given an
estimate for underlying distribution, λ, that best fits the chosen distribution model. This
is denoted p(y|λ). For practical reasons, it is usually assumed that the observed data are independent², and as such the PDF for all the observations is the product of the individual distributions. Consequently, their conditional PDF reflects this:
\[
p(y \mid \lambda) = \prod_{i=0}^{I-1} p(y_i \mid \lambda).
\tag{6.5}
\]

The solution for the MLE is a question of function optimisation, aiming, that is, to
find the λ for which the first derivative of equation 6.5 is zero; i.e., the values for which
the estimator is maximal. As such, if we know the form of y’s distribution, and its
² Strictly speaking, we are actually using a pseudo-likelihood estimator.

relationship to λ (i.e., via the system matrix), then we may derive an estimation for our
activity distribution that better reflects the physical process. “Better” in the sense that by
assuming that the data is Poisson distributed, we thus more accurately model the process
of radioactive decay inherent to the PET imaging process. On the other hand, if we assume
that the data follows a Gaussian model, then the MLE can be shown to correspond to
the least-squares estimate. Under the assumption that the Poisson distribution is more
appropriate, the reconstruction methods will mainly be developed based upon ML-based
solutions.

Maximum A Posteriori (MAP) Estimation


Unfortunately, being applied to an inherently nonparametric estimation problem, ML
solutions require regularisation [Grenander 1978]; the estimations being derived from a
continuously valued space. Hence, when a prior model is used (taken, for example, from
associated MRI data), we reformulate the reconstruction approach as Bayesian. This
basically allows us to regularise our ML estimate using a priori information about the
solution, resulting in a MAP estimator for the reconstruction.
The parameters to be estimated can be thought of as amounting to a hypothesis of the
data, with a likelihood function telling us how well the model (i.e., the hypothesis) explains
the observed data. Bayes’ theorem “inverts” our previously seen estimator, allowing us
instead to derive our quantity of interest: the probability that the hypothesis is true given
the data [Sivia 1996]. That is, we relate what we want to know (the probability that our
estimate of the tracer distribution is true given the observed data), to what is probably
known (i.e., to a value that is more easily assigned - the probability that we would have
observed the measured data if our estimate of the tracer distribution were true). The
former is the posterior probability, and the latter is the likelihood function, respectively.
Bayes’ theorem is formally expressed as:

\[
P(\lambda \mid y) = \frac{p(y \mid \lambda) \times P(\lambda)}{p(y)},
\tag{6.6}
\]

where the term on the left hand side of the equation is our posterior distribution, p(y | λ) is our likelihood function, P(λ) our prior distribution, and p(y) is, in our problem of parameter estimation, a normalisation factor sometimes referred to as the evidence term. In the above we make the distinction between P and p, with the former indicating a probability, and the latter a [possibly continuous] distribution function.
This closes a brief introduction to the methods to hand. It should already be clear that it makes better sense to adopt methods inferring solutions based on
a more accurate modelling of the acquisition system. If not, then this will certainly be
clarified by a more detailed discussion of the theory (chapter 7), as these methods form
the basis to the algorithms developed in this project.

6.2 The Algebraic Reconstruction Technique (ART)


Perhaps the most straightforward method for finding a solution for λ̂ is by using the family
of Algebraic Reconstruction Techniques (ART), originally formulated by [Kaczmarz 1937]
as a general purpose process for finding the solutions to systems of linear equations. This

method constitutes a common series expansion approach, and by not modelling the actual
statistics of the image distribution, it is not a statistical method of reconstruction.
On the basis of an initial estimate, the ART techniques update each image intensity
along each LOR by a factor that compensates for the discrepancy between the measured
ray sum and the calculated ray sum. The general algorithm may be summarised as the
following [Gordon 1974, Censor 1983]:

\[
\lambda_j^{k+1} = \lambda_j^{k} + \eta\,\frac{y_i - a_i^{T}\lambda^{k}}{a_i^{T}a_i}\, a_{ij},
\tag{6.7}
\]

for all j ∈ J, with k denoting the iteration number, η representing a relaxation term
used to avoid over-damping around a solution, and the choice of i at each iteration may be
determined from a number of different schemes. It is therefore implemented as an iterative
algorithm, where the solution vector in iteration k is updated by adding a scaled version
of row i from the system matrix. Exactly how this algorithm is implemented is given in
section A.3 of appendix A.
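As an illustration only, a minimal Python sketch of the update of equation 6.7 with a cyclic choice of i (one of the possible schemes mentioned above); the relaxation value and names are my own assumptions.

import numpy as np

def art_reconstruct(A, y, n_sweeps=10, eta=0.5):
    """ART/Kaczmarz (equation 6.7): correct the current image along each
    LOR by the scaled discrepancy between measured and computed ray sums."""
    lam = np.zeros(A.shape[1])
    for _ in range(n_sweeps):
        for i in range(A.shape[0]):        # cyclic scheme for choosing i
            a_i = A[i]
            denom = a_i @ a_i
            if denom > 0.0:
                lam += eta * (y[i] - a_i @ lam) / denom * a_i
    return lam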
7  The Expectation-Maximization Algorithm

This review covers the theory behind the ML-based Expectation-Maximization (EM) algo-
rithm and its usage in PET image reconstruction. It aims to clarify ambiguous aspects of
the theory, which thus may be used as separate reference material during the development
of Bayesian or multiscale methods. This latter approach extends the normal EM, requiring
in fact, an additional expectation step; a theme presented in section 11.1.3.
In this chapter, the EM algorithm will be presented as a method for using the [incom-
pletely] observed data (y) to obtain a ML estimate of the parameter vector λ. Its original
derivation is given in its generalised form [Dempster et al. 1977], and this leads ultimately
to equation 7.33, an iterative algorithm for PET reconstruction.

7.1 Introduction
The EM algorithm was adopted by the PET reconstruction community for two reasons.
The most important of these was its use of the ML estimator, and its subsequent ability
to better model the underlying physics of the acquisition process; that is, the Poisson
process which describes the counting of the events in the detectors. The Poisson nature of
the data is thus readily incorporated into the reconstruction procedure, as shown in the
method of section 6.1.3, using the distribution of equation 4.7.
The second reason was its attractive optimisation properties: positivity constraints are
naturally imposed, and the objective function is convex (this is shown in section 7.4.3). The
problem of PET reconstruction maps well to this EM framework where it makes good sense
to recast our data into a larger, more manageable data-space for the purposes of optimising
an ill-conditioned data set (such simplifications are common in optimisation problems on
the basis of finding a simpler solution). Indeed, due to attenuation, scatter, inter-crystal
penetration and gating, we really do have an incomplete-data set with which to work,
and hence the later proposals of section 11.1.3. This final point also implies that the algorithm maps well to the envisaged redistribution of the PET data on the basis of the observed PET data set completed by an associated MRI scan; although not necessarily within the Bayesian framework that was introduced earlier (section 6.1.3).
In [Vardi et al. 1985], Kearfott summarises the advantages of ML reconstruction over
FBP reconstruction:

1. Statistical noise is reduced without excessive smoothing;



2. Noise is less correlated;

3. Non-negative values are guaranteed in the final reconstruction (matching physical reality);

4. The number of total counts is preserved (again matching physical reality);

5. Edges of the objects are more clearly defined and the noise outside the object is
eliminated;

6. A statistical measure of the goodness of reproduction is provided;

7. Nonuniformities in the response over the FOV - for example, variable resolution and
effective plane thickness - may be accounted for (this is the case for all statistical
techniques);

8. The subjective aspects of the reconstruction are eliminated (smoothing and edge en-
hancement are unnecessary, which implies the redundancy of the restoration meth-
ods);

9. Missing detector data should not produce artifacts or severe degradation;

10. Estimates of regional uncertainty in the measured data, desirable for error analysis
of PET physiological models, might be easily obtainable.
Of the points above, it is interesting to relate some of these to our objectives. The
first two points imply the all-desirable better conditioning of the data, which might, for
example, enable the more successful application of image restoration techniques. Most
importantly, points 6 and 10 are fundamental to our later intentions of redistributing the
PET data in accordance to an associated MRI data set, which we will return to later in
this report. This latter aspect is also captured in the interpolation model of section 5.2.2.
First, however, following the use of the procedure introduced separately for PET reconstruction by [Shepp and Vardi 1982] and [Lange and Carson 1984] requires looking in sufficient detail at the generic EM algorithm.

7.2 A General Description of the Expectation Maximization Algorithm
The EM algorithm is often used for ML estimation in situations where the estimate for the given
data cannot be derived in a straightforward manner. Such situations arise when the data
can be described as being incomplete.
The notion of incompleteness is difficult to define, but examples are to be found in
all manner of statistical contexts [McLachlan and Krishnan 1997]. For example, in PET
reconstruction ML estimation requires maximising a measure of the chance that we would
have obtained the data that we actually did observe (y), given the activity distribution
(λ) was known, or estimated. Clearly an iterative method of estimation is required to
find a solution but, as far as I am aware, it has not been shown that the function admits a closed-form solution. On the other hand, by coupling the observed
data closer to the parameters to be estimated, we are able to derive a data set for which
the estimation problem complexity is reduced. And this is how the complete data set is

formed. Following such a step, it can be shown that the MLE of the complete data yields a
convex function [Shepp and Vardi 1982]. In the case of PET reconstruction, the complete
data set is formed by a bridging of the [transformation] gap between the measured data
and the parameters to be estimated in a very tractable manner (see equation 7.34 below).
It is the definition of completeness - the appropriate manufacturing of the complete data
set on the basis of the incomplete, observed data set - that determines the effectiveness of
the algorithm.
The EM algorithm for PET image reconstruction was developed independently by
[Shepp and Vardi 1982, Lange and Carson 1984], and has since been the mainstay of most
subsequent techniques [Vardi et al. 1985, Snyder et al. 1987, DePierro 1991, Miller and
Wallis 1992, Ollinger 1994, DePierro 1995, Fessler et al. 1996]. This is easily justified:
recall, we can firstly formulate the problem as a Poisson regression model, and thus better
model the physics of the acquisition process; and secondly, our data set fits well to the
incomplete model1 . The algorithm consists of two steps [Dempster et al. 1977]. The
first is the Expectation-step (E-step), which seeks to address the problem of missing data.
In general, it aims to account for this by postulating a more complete data set on the
basis of the original, observed data. In more formal terms, it is the log likelihood of the
complete-data problem that is derived in the E-step. But because this estimation must be
based on unobservable data, we instead take the conditional expectation of the complete
data given the observed data and an initial estimate of our parameters. It is therefore
necessary to establish a relationship between the two likelihoods (that for the observed,
or incomplete data; and that for the complete data); and additionally, it is necessary to
estimate these missing values in a manner which would facilitate convergence toward the
most likely (in the ML sense) parameter estimates.
The process of completing a data set can be thought of as one of manufacturing
sufficient statistics. This is done under some constraint that binds the complete data
to the incomplete data, utilising, possibly, the new prior probability estimates for each
observation2 . We enforce a strict relationship between the two data sets by viewing the
observed vector y (the incomplete data), as a function of the complete data, x. Namely,
y = h(x), where h is some function of the complete data. Considering our incomplete and complete data to be derived from the sets 𝒴 and 𝒳 respectively, then x is known only to lie in the subset 𝒳(y), determined by our function h. Hence, we have defined a many-to-one relationship from 𝒳 to 𝒴.
To understand the algorithm requires considering the relationship between the sample
space for your known data and that of the larger space. We assume that the observed
data is a random vector y, which has the conditional PDF p(y|λ), where λ corresponds to
the parameters to be estimated. To maximise p(y|λ) with respect to λ, the EM algorithm
proceeds by embedding the sample space of y into a larger space where the problem is more
solvable. The algorithm forms the complete data, x, such that y is a function of x (that
is, y = h(x)). The random vector x is defined to have the density function p(x|λ) with
respect to some measure µ(x). On this basis, the relationship that allows us to recover
p(y|λ) is,
¹ Incomplete can also be thought of in the sense of unrecorded annihilations [Fessler and Karp 1997], or with respect to the general desire to redistribute the PET data to a higher resolution (see, for example, [Knorr et al. 1993]).
² Without such knowledge we are forced to apply uniform (or "uninformative") priors, and thus proceed in an entirely ad hoc fashion.

\[
p(y \mid \lambda) = \int_{\mathcal{X}(y)} p(x \mid \lambda)\, d\mu(x).
\tag{7.1}
\]
The retrieval of our observed data’s density function is, therefore, to be had via a
bounded integral on p(x|λ). In the present [imaging] context, all density functions are
discrete, so we have,

\[
L_i(\lambda) = p(y \mid \lambda) = \sum_{x \in \mathcal{X}(y)} p(x \mid \lambda),
\tag{7.2}
\]
where Li (λ) denotes the likelihood function of our measured, incomplete data given λ.
To summarise these two steps then,

• The Expectation Step - The E-step of the algorithm builds the log likelihood for
the complete data (log Lc ()) problem using the observed data set and the current
estimation of the parameters, denoted λ. This is manufactured on the basis of
the conditional expectation of the complete data, given the observed data and the
current estimate of the unknown parameters. This is denoted as,

E(log Lc (x)|y, λ). (7.3)

• The Maximisation Step - In the M-step, the conditional expectation of equation 7.3
is maximised with respect to λ in order to derive new estimates that should improve
the solution. Namely:

argmaxλ [E(log Lc (x)|y, λ)]. (7.4)

At this point we are assuming that the complete data has been chosen in such a
way that computing the ML estimator of λ from the complete data is simpler than
solving for Li (λ).

Outlining the Remainder of the Chapter


Having conceptually covered the workings of the algorithm, it is now necessary to substan-
tiate these ideas with an example, after which the reader might be well advised to re-read
the above. The next section therefore recounts the example presented in the paper of [Dempster et al. 1977]. As will be later shown, this example can be very easily adapted to the aims
of this project (section 11.1.3). In the third section of this report, we review the theory
behind the EM algorithm in PET reconstruction. And later sections hope to combine the
theory of parts 7.3 and 7.4 to extend the use of the EM algorithm in PET reconstruction.

7.3 The EM Algorithm as Introduced by [Dempster et al. 1977]
In the following, I quickly summarise the original example that Dempster, Laird and
Rubin used to introduce the EM algorithm [Dempster et al. 1977]. It is a nice, and

therefore much repeated, example, which has direct relevance to the method presented in
section 11.1.3.
The observed data, y = [y1 , y2 , y3 , y4 ]T = [125, 18, 20, 34]T , is postulated as arising
from a distribution of the following cell probabilities for y1 to y4 , respectively:
\[
\tfrac{1}{2} + \tfrac{1}{4}\varphi,\quad \tfrac{1}{4}(1-\varphi),\quad \tfrac{1}{4}(1-\varphi),\quad \tfrac{1}{4}\varphi,
\tag{7.5}
\]
where 0 ≤ ϕ ≤ 1. In this example, ϕ is our [1-D vector of] unknown parameter[s].
Given that the observed data, y, are independent variables, then their PDF follows a
multinomial distribution:

\[
L_i(\varphi) = p(y \mid \varphi) =
\frac{(y_1+y_2+y_3+y_4)!}{y_1!\,y_2!\,y_3!\,y_4!}
\left(\tfrac{1}{2}+\tfrac{1}{4}\varphi\right)^{y_1}
\left(\tfrac{1}{4}-\tfrac{1}{4}\varphi\right)^{y_2}
\left(\tfrac{1}{4}-\tfrac{1}{4}\varphi\right)^{y_3}
\left(\tfrac{1}{4}\varphi\right)^{y_4},
\]
which acts as the incomplete data likelihood function, Li (), given that it is conditional
on our prior probabilities.
To illustrate the EM algorithm, we represent y as incomplete data drawn instead from
a five category population, x = [y11 , y12 , y2 , y3 , y4 ]T , having the following probabilities:
\[
\tfrac{1}{2},\quad \tfrac{1}{4}\varphi,\quad \tfrac{1}{4}(1-\varphi),\quad \tfrac{1}{4}(1-\varphi),\quad \tfrac{1}{4}\varphi.
\tag{7.6}
\]
That is, the first of the four elements in y has been split into two sub-cells of probabilities 1/2 and (1/4)ϕ, respectively. Elements y11 and y12 result from the split, where the solution
to these unknowns is constrained in accordance to equation 7.2. Namely,
\[
p(y \mid \varphi) = \sum_{x} p(x \mid \varphi),
\tag{7.7}
\]

where p(x|ϕ) is the PDF for the complete data, defined now as the following:

\[
L_c(x) = p(x \mid \varphi) =
\frac{(y_{11}+y_{12}+y_2+y_3+y_4)!}{y_{11}!\,y_{12}!\,y_2!\,y_3!\,y_4!}
\left(\tfrac{1}{2}\right)^{y_{11}}
\left(\tfrac{1}{4}\varphi\right)^{y_{12}}
\left(\tfrac{1}{4}-\tfrac{1}{4}\varphi\right)^{y_2}
\left(\tfrac{1}{4}-\tfrac{1}{4}\varphi\right)^{y_3}
\left(\tfrac{1}{4}\varphi\right)^{y_4},
\tag{7.8}
\]

where Lc is our complete data likelihood function, and where the summation of equa-
tion 7.7 is over all values of x such that:

y1 = y11 + y12 . (7.9)

7.3.1 Estimating the Maximum Likelihood - the M-step


The EM algorithm actually operates by first producing a ML estimate, which is then
used to derive the missing values of the complete data (the E-step). Admittedly, the
starting ML estimate is arbitrary (positive being a necessary constraint), but the E-step
is impossible without it. The algorithm repeats this process iteratively until whatever
stopping criterion is met, with the last step of the process of course being an M-step. We define the log likelihood function on the basis of equation 7.8:

\[
l_c(x) = \log L_c(x) =
\log\!\left[\frac{(y_{11}+y_{12}+y_2+y_3+y_4)!}{y_{11}!\,y_{12}!\,y_2!\,y_3!\,y_4!}\right]
+ y_{11}\log\tfrac{1}{2}
+ y_{12}\log\!\left(\tfrac{1}{4}\varphi\right)
+ y_2\log\!\left(\tfrac{1}{4}-\tfrac{1}{4}\varphi\right)
+ y_3\log\!\left(\tfrac{1}{4}-\tfrac{1}{4}\varphi\right)
+ y_4\log\!\left(\tfrac{1}{4}\varphi\right).
\]
The ML estimate taken is that which maximises our likelihood function. Taking the first derivative of our log likelihood function (where we may ignore the first term as it is a constant) yields:
\[
\frac{d\, l_c(x)}{d\varphi} = \frac{y_{12}}{\varphi} - \frac{y_2+y_3}{1-\varphi} + \frac{y_4}{\varphi}.
\tag{7.10}
\]
Setting the above expression to zero and solving for ϕ̂ml gives,
\[
\hat{\varphi}_{ml} = \frac{y_{12}+y_4}{y_{12}+y_2+y_3+y_4}.
\tag{7.11}
\]
Hence, we need the E-step: since y12 is unobservable, we cannot solve for ϕ̂ml until we have completed the data (i.e., plugged an estimate of y12 back into the above equation).

7.3.2 Augmenting the Observed Data - the E-step


The E-step handles the problem of filling in for the “missing” data by averaging the
complete data log likelihood over its conditional distribution given the observed data y.
Retracing our initial steps, we made a starting estimate for ϕ0 such that on the first
iteration of the algorithm (and for subsequent iterations) the E-step uses this estimate to
compute the conditional expectation of lc (x0 ) given y 3 . This is denoted:
Ec (lc (xk+1 )|y, λk ). (7.12)
Hence this step is effected simply by replacing the unobservable data by their condi-
tional expectation given the observed data y, and the current estimate for λ.

An Aside Think of the problem in more practical terms: we have two unknowns
(y11 , y12 ), their prior probabilities ( 12 , 14 ϕ), and a constraint on their solution:
\[
p(y \mid \varphi) = \sum_{x \in \mathcal{X}(y)} p(x \mid \varphi),
\tag{7.13}
\]
where 𝒳(y) is the subset of 𝒳 determined by y = y(x), x ∈ 𝒳(y), which here is simply
y1 = y11 + y12 . The right hand side of this expression can be satisfied with 126 possible
solution pairs: (0, 125), (1, 124), ..., (125, 0), which is obviously of limited effectiveness. To
avoid such an ad hoc estimation approach, we use prior probabilities to constrain these
possible solutions yet further4 . Alone, however, equation 7.13 is simply evaluated by
summing all of these solution pairs with the values for (y2 , y3 , y4 ) remaining as previously
defined (18, 20, 34).
³ In plain English, it means that we take the average value from those which are most likely!
⁴ In the case of PET reconstruction, were we to follow the procedure given in this example use of the EM algorithm, then such prior knowledge would only be available in the form of anatomical data.

7.3.3 Getting back to our Algorithm


The conditional expectation of y11 and y12 is derived from a standard probability result:

\[
y_{11}^{k} = y_1\, \frac{\Pr(y_{11})}{\Pr(y_{11}) + \Pr(y_{12})}
\tag{7.14}
\]
\[
y_{12}^{k} = y_1\, \frac{\Pr(y_{12})}{\Pr(y_{11}) + \Pr(y_{12})}
\tag{7.15}
\]

We are thus able to derive the following expressions for our unknowns:
\[
y_{11}^{0} = \frac{\tfrac{1}{2}\, y_1}{\tfrac{1}{2} + \tfrac{1}{4}\varphi^{0}},
\tag{7.16}
\]
\[
y_{12}^{0} = \frac{\tfrac{1}{4}\, y_1 \varphi^{0}}{\tfrac{1}{2} + \tfrac{1}{4}\varphi^{0}},
\tag{7.17}
\]
and hence the need for a starting estimate for ϕ. At each iteration we have,
\[
y_{12}^{k} = \frac{\tfrac{1}{4}\, y_1 \varphi^{k}}{\tfrac{1}{2} + \tfrac{1}{4}\varphi^{k}}.
\tag{7.18}
\]

7.3.4 Maximising our Likelihoods having Fixed our Unknown Data - The M-step
The likelihoods between the complete and incomplete data have been appropriately cou-
pled, and having completed the E-step in which we augmented our observed data, the
M-step follows the normal ML estimation procedure to derive ϕ1 given y.
Earlier we derived the result that
\[
\hat{\varphi}_{ml} = \frac{y_{12}+y_4}{y_{12}+y_2+y_3+y_4},
\tag{7.19}
\]
hence we now know that
\[
\varphi^{1} = \frac{y_{12}^{0}+y_4}{y_{12}^{0}+y_2+y_3+y_4}.
\tag{7.20}
\]

The iterative process is therefore:

\[
\varphi^{k+1} = \frac{y_{12}^{k}+y_4}{y_{12}^{k}+y_2+y_3+y_4},
\tag{7.21}
\]
where y_12^k is defined as above in the E-step of equation 7.18.
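The whole example fits into a few lines of Python; with an arbitrary positive starting value, the iteration for these data should settle near ϕ ≈ 0.63 within a handful of steps. This is a sketch of equations 7.18 and 7.21 only, with names of my own choosing.

def dempster_em(y=(125, 18, 20, 34), phi=0.5, n_iter=20):
    """EM for the multinomial example: E-step (7.18), then M-step (7.21)."""
    y1, y2, y3, y4 = y
    for _ in range(n_iter):
        y12 = (0.25 * y1 * phi) / (0.5 + 0.25 * phi)   # E-step, eq. 7.18
        phi = (y12 + y4) / (y12 + y2 + y3 + y4)        # M-step, eq. 7.21
    return phi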

7.4 [Shepp and Vardi 1982]


This seminal paper, though brilliant, is not as lucid as it could be, in part because of
the notation. It is, however, original in formulating PET reconstruction within the EM
framework, work done in parallel to that of [Lange and Carson 1984]. The following review
outlines the steps they take to deriving the algorithm, showing a logical progression.
This firstly involves describing the derivation of the ML fixed-point iterative algorithm.
Following this, we derive the same algorithm using the complete and incomplete data
formalism [Dempster et al. 1977]. By showing these to be equivalent, we are assured of a
maximum likelihood algorithm that has the desirable optimisation properties of the EM
algorithm. This is denoted as the EM-ML algorithm.

7.4.1 Problem Definition


• Let λj be the number of emissions in pixel j, which are our parameters to be esti-
mated.

• Assume that once an emission occurs in pixel j, the conditional probability that it is detected in the ith tube5 is denoted a_{ij}, where 0 < \sum_{i=0}^{I-1} a_{ij} \le 1, and the “≤” says that once an emission occurs, it is not guaranteed that it will be detected.

• The problem requires estimating the means of these variables:


    \bar{y}_i = E(y_i) = \sum_{j=0}^{J-1} \lambda_j a_{ij}.    (7.22)

• And thus, the EM-based PET reconstruction problem is one of estimating λ_j, j = 0, ..., J − 1, on the basis of estimates for the means of the y_i.

7.4.2 Defining the Algorithm on the Basis of a Poisson Regression Model


Given the Poisson nature of our data, we define our distribution as:
    y_i \sim \mathrm{Poisson}(\bar{y}_i) = \frac{\bar{y}_i^{\,y_i}}{y_i!}\exp(-\bar{y}_i), \qquad i = 0, ..., I-1.    (7.23)
The likelihood function p(λ|y) is a measure of the chance that we would have obtained
the data that we actually observed (y), if the value of our tracer distribution (λ) was
given as known. Assuming then that the yi are independent Poisson random variables,
the likelihood of obtaining their means, \bar{y}_i, if the image vector is λ, is,

    L(\lambda) = p(y \mid \lambda) = \prod_{i=0}^{I-1} \frac{\bar{y}_i^{\,y_i}}{y_i!}\exp(-\bar{y}_i).    (7.24)

To ease the computations, we can take the natural logarithm of this function,
    l(\lambda) = \ln(L(\lambda)) = \sum_{i=0}^{I-1} \bigl\{ y_i \ln \bar{y}_i - \bar{y}_i - \ln y_i! \bigr\}.    (7.25)
5 Where i indexes our detector pairs, 0, ..., I − 1.

As \bar{y}_i = \sum_{j=0}^{J-1} a_{ij}\lambda_j, namely the inner product (forward projection) of the system matrix’s ith row and the current estimate of the image vector, this becomes,

    l(\lambda) = \sum_{i=0}^{I-1} y_i \ln\Bigl(\sum_{j=0}^{J-1} a_{ij}\lambda_j\Bigr) - \sum_{i=0}^{I-1}\sum_{j=0}^{J-1} a_{ij}\lambda_j - \sum_{i=0}^{I-1} \ln(y_i!).    (7.26)

We then maximise this log likelihood in the normal manner by setting its derivative
to zero (note that as the last term is a constant it may be ignored). This results in the
basic form of the algorithm:
    \frac{dl(\lambda)}{d\lambda_j} = -\sum_{i=0}^{I-1} a_{ij} + \sum_{i=0}^{I-1} \frac{a_{ij}\, y_i}{\sum_{j'=0}^{J-1} a_{ij'}\lambda_{j'}} = 0.    (7.27)

With the constraint of non-negativity, we derive the following Kuhn-Tucker conditions6:

    \lambda_j \frac{dl(\lambda)}{d\lambda_j} = 0,    (7.28)

where λ_j > 0, for all j = 0, ..., J − 1, and

    \frac{dl(\lambda)}{d\lambda_j} \le 0,    (7.29)

where λ_j = 0, for all j = 0, ..., J − 1. The first of these two expressions7, and the assumption that \sum_{i=0}^{I-1} a_{ij} = 1, allow the derivation of the following iterative scheme:
 
    0 = \lambda_j \frac{dl(\lambda)}{d\lambda_j} = \lambda_j\Bigl(-\sum_{i=0}^{I-1} a_{ij} + \sum_{i=0}^{I-1} \frac{a_{ij}\, y_i}{\sum_{j'=0}^{J-1} a_{ij'}\lambda_{j'}}\Bigr)    (7.30)

    \Rightarrow\quad \lambda_j^{k+1} = \lambda_j^{k} \sum_{i=0}^{I-1} \frac{a_{ij}\, y_i}{\sum_{j'=0}^{J-1} a_{ij'}\lambda_{j'}^{k}}.    (7.31)

Alternatively, if we do not properly normalise the system matrix probabilities (i.e., we can safely assume that \sum_i a_{ij} \le 1), then we would derive the following scheme:

    0 = -\lambda_j^{k+1}\sum_{i=0}^{I-1} a_{ij} + \lambda_j^{k}\sum_{i=0}^{I-1} \frac{a_{ij}\, y_i}{\sum_{j'=0}^{J-1} a_{ij'}\lambda_{j'}^{k}},    (7.32)

from which [Lange and Carson 1984] proposed the following:

    \lambda_j^{k+1} = \frac{\lambda_j^{k}}{\sum_{i=0}^{I-1} a_{ij}} \sum_{i=0}^{I-1} \frac{a_{ij}\, y_i}{\sum_{j'=0}^{J-1} a_{ij'}\lambda_{j'}^{k}}.    (7.33)
6 Kuhn-Tucker theorem is a theorem in non-linear programming which states that if a regularity condition holds for f and the functions hj are convex, then a solution x0 which satisfies the conditions hj for a vector of multipliers λ is a global minimum. The theorem is an extension to the use of Lagrange multipliers in constrained optimisation for constraints defined by equalities and inequalities [Luenberger 1973, pages 295–296].
7 This expression (equation 7.28) relates to the derivation of a so-called fixed-point iteration scheme [Luenberger 1973]. This scheme updates the solution iteratively in the direction of its gradient.

7.4.3 [Vardi et al. 1985]’s EM Algorithm; a Different Approach


Although not explicitly declared, the following, drawn from [Vardi et al. 1985], shows that the algorithm of equation 7.30 is an instance of the EM algorithm, having the nice property that l(λ^k) < l(λ^{k+1}), unless λ^k = λ^{k+1}. The incomplete data is, of course, that which is observed, y_i. The complete data is defined to be ψ_{ij}, the number of emissions occurring in pixel j and detected in the ith tube, as given in the following:

• In addition to the definitions introduced in section 7.4.1, we define the complete data
as being ψij , the number of emissions occurring in pixel j and registered in detector
pair i; intuitively then, this is a more complete description of the data. These values
can be obtained from each λj according to the probabilities aij , where i = 0, ..., I − 1
and j = 0, ..., J − 1, respectively. Subsequently, the ψij are themselves independent
Poisson random variables (whose means are therefore ψ̄ij ).

• The measured [incomplete] data can now be re-expressed in terms of the complete
data as:
    y_i = \sum_{j=0}^{J-1} \psi_{ij},    (7.34)

where i = 0, ..., I − 1, which is the many-to-one mapping defined in equation 7.2. We thus use these values to show that the proposed algorithm for the ML estimator of the reconstruction solution is an EM algorithm. It is characterised, therefore, by well defined and favourable optimisation properties.

• The alternative summation yields the total number of emissions that have occurred
in pixel j and have been detected in any of the crystals,
    \sum_{i=0}^{I-1} \bar{\psi}_{ij} = \lambda_j.    (7.35)

• And now the estimation problem is again defined as the estimation of the λj , j =
0, ..., J − 1, on the basis of the yi , via the formulation of the ψij and their means.

In accordance to the definition of the EM algorithm, this data may only be observed
through the measured data:
    \bar{y}_i = \sum_{j=0}^{J-1} \bar{\psi}_{ij},    (7.36)

and subsequently, y_i = \sum_{j=0}^{J-1} \psi_{ij} and \lambda_j = \sum_{i=0}^{I-1} \bar{\psi}_{ij}. It is also assumed that the true sinogram (void of noise) fits the solution perfectly:
    \bar{y}_i = \sum_{j=0}^{J-1} a_{ij}\lambda_j,    (7.37)

for i = 0, ..., I − 1, j = 0, ..., J − 1. Hence, \bar{\psi}_{ij} = a_{ij}\lambda_j, where \bar{\psi}_{ij} is the expectation value of \psi_{ij}. This can only be estimated given the observed data and the kth estimate of the image, and forms the E-Step: it determines the expected projections based on the current
estimate of the activity distribution, which for the complete data set requires relating these
to each pixel (i.e., to ψij ). This, resulting from a standard probability result, is given as,

    E(\psi_{ij}^{k+1} \mid y_i, \lambda_j^{k}) = \frac{a_{ij}\lambda_j^{k} y_i}{\sum_{j'=0}^{J-1} a_{ij'}\lambda_{j'}^{k}} = N_{ij}.    (7.38)

If we assume at each iteration that λk is the true λ, then on the basis of equation 7.35
we estimate the number of annihilations in pixel j through the unobserved data with the
M-step:
    \lambda_j^{k+1} = \sum_{i=0}^{I-1} E(\psi_{ij}^{k+1} \mid y_i, \lambda_j^{k}) = \sum_{i=0}^{I-1} \frac{a_{ij}\lambda_j^{k} y_i}{\sum_{j'} a_{ij'}\lambda_{j'}^{k}},    (7.39)

which is identical to the right-hand side of the iterative formula of equation 7.30. This
step thus estimates the new values for λ on the basis of the complete data set, and we see
therefore how the E- and M-steps fall together.
This is often termed [perhaps inappropriately] the back projection step, and is used
to relate the relative difference between the estimated and measured projections so as to
adjust the current activity distribution toward the most likely solution. Toward, that is,
the solution where the likelihood is maximal.

Another Proof
We can prove this result the other way by considering the likelihood function of the
complete data set:

    L_c(\psi) = \prod_{i=0}^{I-1}\prod_{j=0}^{J-1} \frac{\bar{\psi}_{ij}^{\,\psi_{ij}} \exp(-\bar{\psi}_{ij})}{\psi_{ij}!}.    (7.40)

The log likelihood thus becomes:


    l_c(\psi) = \ln L_c(\psi) = \sum_{i=0}^{I-1}\sum_{j=0}^{J-1} \bigl(\psi_{ij}\ln\bar{\psi}_{ij} - \bar{\psi}_{ij} - \ln\psi_{ij}!\bigr),    (7.41)

which is the same as:


    = \sum_{i=0}^{I-1}\sum_{j=0}^{J-1} \bigl(\psi_{ij}\ln a_{ij}\lambda_j - a_{ij}\lambda_j - \ln\psi_{ij}!\bigr).    (7.42)

The expectation value we seek, N_{ij}, which first appeared in equation 7.38, may therefore be plugged into this equation (i.e., replace all \psi_{ij} with E(\psi_{ij}^{k+1} \mid \lambda_j^{k}, y_i)) to give us:

    \sum_{i=0}^{I-1}\sum_{j=0}^{J-1} \bigl(N_{ij}\ln a_{ij}\lambda_j - a_{ij}\lambda_j - \ln N_{ij}!\bigr).    (7.43)

The maximisation of this function requires its differentiation with respect to λ_j:

    \frac{dl_c(\psi)}{d\lambda_j} = \frac{1}{\lambda_j}\sum_{i=0}^{I-1} N_{ij} - \sum_{i=0}^{I-1} a_{ij}.    (7.44)

By setting equation 7.44 above to zero, we are able to see that our best, current
estimate for λj must be:
    \hat{\lambda}_j = \frac{\sum_{i=0}^{I-1} N_{ij}}{\sum_{i=0}^{I-1} a_{ij}},    (7.45)

which is,
    \lambda_j^{k+1} = \frac{\lambda_j^{k}}{\sum_{i=0}^{I-1} a_{ij}} \sum_{i=0}^{I-1} \frac{a_{ij}\, y_i}{\sum_{j'=0}^{J-1} a_{ij'}\lambda_{j'}^{k}},    (7.46)

which is essentially the same as equation 7.30 given that our matrix A is properly normalised (i.e., \sum_{i=0}^{I-1} a_{ij} = 1), or equation 7.32 if this is not the case.

Properties The properties of this equation can be summarised as the following [Kauf-
man 1987]. Firstly, given an initial positive starting estimate (λ0 ), the iterates remain
positive because the values in the equation are always positive. This is certainly desirable
as it matches reality without the explicit check for negative values seen to be necessary in
the ART algorithm (section A.3). Secondly, the model assumes that there is a conservation
in the number of photons emitted and detected, unless otherwise specified in the system matrix. That is, on each iteration after the very first, the following holds: \sum_{j}^{J} \lambda_j = \sum_{i}^{I} y_i, where j indexes the J pixels and i indexes the I lines of response; or, if the a_{ij} are not properly normalised, then \sum_{i,j} a_{ij}\lambda_j = \sum_i y_i. Last, and most favourably, the algorithm converges monotonically to a global maximum of the log likelihood, l_c = \log L_c(\lambda). That is, the second derivative is of a negative semi-definite quadratic form [Lipinski 1996], and hence l_c is concave:

    \frac{d^2 l_c(\psi)}{d\lambda_j^2} = -\sum_{i=0}^{I-1} N_{ij}\frac{1}{\lambda_j^{2}}.    (7.47)
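A quick numerical check of the count-preservation property, under the assumption that the columns of A sum to one, can be written in a few lines; the sketch below uses arbitrary random data and sizes.

import numpy as np

rng = np.random.default_rng(0)
A = rng.random((30, 10))
A /= A.sum(axis=0)                          # normalise so that sum_i a_ij = 1
lam = rng.random(10) + 0.1                  # positive starting image
y = rng.poisson(50.0 * (A @ lam)).astype(float)
lam = lam * (A.T @ (y / (A @ lam)))         # one EM-ML update (equation 7.31 / 7.46)
print(np.isclose(lam.sum(), y.sum()))       # True: total counts are conserved after the first update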

7.5 The EM-ML Algorithm’s Implementation


The implementation of the algorithm can be broken down into four stages, where k is the
current iteration number [Kaufman 1987, Toft 1996]:

1. y^f = A\lambda^{k-1}. Thus, each iteration requires a forward projection to yield y^f.

2. y_i^q = y_i / y_i^f, \forall i. In this step, the y_i^f may be zero when a certain i corresponds to a line passing through no regions of activity. This should, therefore, be checked to avoid divide by zero errors.

3. \lambda^b = A^T y^q. Hence each iteration also requires a back projection.

4. \lambda_j^{k} = \lambda_j^{k-1}\lambda_j^{b} / s_j. This normalises the result, where s_j = \sum_{i=0}^{I-1} a_{ij}, \forall j.

Accordingly, the algorithm actually involves greater computational costs per iteration in comparison with FBP. The entire process involves an iterative loop preceded by an initialisation step. This is given in detail in section A.4 of appendix A.
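The four steps translate almost directly into code. The sketch below is a minimal NumPy rendering, assuming a dense system matrix A of shape (I, J), a measured sinogram y of length I and a positive starting image; all names are illustrative, and a practical implementation would of course use a sparse or on-the-fly system matrix.

import numpy as np

def em_ml(A, y, lam0, n_iter=20, eps=1e-12):
    lam = lam0.astype(float)
    s = A.sum(axis=0)                              # s_j = sum_i a_ij (sensitivity image)
    for _ in range(n_iter):
        y_f = A @ lam                              # step 1: forward projection
        y_q = y / np.maximum(y_f, eps)             # step 2: ratio, guarding against division by zero
        lam_b = A.T @ y_q                          # step 3: back projection of the ratio
        lam = lam * lam_b / np.maximum(s, eps)     # step 4: normalised multiplicative update
    return lam

With \sum_i a_{ij} = 1 the normalisation in step 4 has no effect and the update reduces to equations 7.30/7.31; with unnormalised coefficients it corresponds to the [Lange and Carson 1984] form of equation 7.33.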
Although the algorithm given in section A.4 only demonstrates the implementation of [Lange and Carson 1984], it is of course possible to adopt the [classic] algorithm of [Vardi et al. 1985]. That is, we adhere to the proper normalisation requirement (\sum_{i=0}^{I-1} a_{ij} = 1) and use equation 7.30 as the basis for our algorithm. It seems more sensible, however, to acknowledge that some emissions go undetected (i.e., \sum_i a_{ij} \le 1). My experience to date with these methods is that the main variation arises because the energy conservation assumption is far from being met. I have now settled on the use of the [Lange and Carson 1984] algorithm, ensuring however that the matrix coefficients are normalised. Typically inexact normalisation is assumed, and each day a calibration measure is taken such that an adjustment factor may be determined for each of the scans taken that day. This information we do not have, and the results of this possible variation are shown in figure 7.1. These indicate, visually at least (qualitatively), that the [Lange and Carson 1984] usage of a correction term is appropriate, but for quantitative results, true normalisation is required whenever the calibration adjustment is not done.

Closing Words More results of the software developed for this research are shown in
figure 7.2. These are reconstructed from 2-D slices generated from a fully 3-D data set
using the Fourier Rebinning (FORE) method of [Defrise et al. 1997]. Correction for scatter
is done using the method of [Watson et al. 1995]. No effort has been made to return the
statistics of the data to a truer Poisson distribution, as is shown to be beneficial to the
OSEM algorithm in [Comtat et al. 1998]. Hence it is fair to say that the reconstructions
shown in this figure are probably suboptimal.
We close and introduce the next chapter with a quote from [Leahy and Qi 1998]:

Since there is no closed form solution of the ML problem, solutions are com-
puted iteratively. ... In cases where the data are consistent [in terms of knowns
and unknowns?], ART will converge to the solution of equations y = Aλ. In
the inconsistent case, the iterations will not converge to a single solution and
the properties of the image at a particular stopping point will be dependent
on the sequence in which the data are ordered. [The] EM solution is slow to
converge and the images have high variance. The variance problem is inher-
ent in the ill-conditioning of the Fisher information matrix. In practice, it is
controlled in EM implementations with early stopping rules, post smoothing,
sieves[, and the like].

Alternatively, I have found this to be best controlled through a more appropriate interpolation method (see section 5.2.1).

ML-based reconstructions for the 1st, 16th, 31st and 48th slices of a 63 slice volume.

ML-based reconstruction without the proper normalisation of the system matrix values.

FBP reconstructions for the same 1st, 16th, 31st and 48th slices of the 63 slice volume.

Figure 7.1: The above figure shows the results of maximum likelihood reconstructions. Those in
the first row use the [Lange and Carson 1984] algorithm; in the second row are those using the
same algorithm, where on this occasion the correct normalisation of the system matrix coefficients,
aij , are ensured. And the final row shows FBP results using the ECAT 7 software for comparison.

[Figure 7.2 panels: Groningen FBP reconstruction to 128x128 pixels (3.43x3.43mm); OSEM (8 subsets) reconstruction to 128x128 pixels (3.43x3.43mm), FWHM = 3.43mm, 5 iterations; OSEM (8 subsets) reconstructions to 256x256 pixels (1.715x1.715mm) with FWHM = 3.43mm and FWHM = 5.145mm, 5 iterations.]

Figure 7.2: The above figure again shows the results using my ML-based reconstruction methods,
but this time reconstructing the images to 256-by-256 pixels. These use the OSEM algorithm run
for 5 iterations with 8 subsets, presented in a consecutive manner. The pixel dimensionality is
1.715mm for the 256-by-256 pixels, and 3.43mm for the 128-by-128 pixel images. The FWHMs
used are therefore 3.43mm (2 pixels) and 5.145mm (3 pixels).
8 Characteristics of the Reconstruction Process

This chapter introduces the main limitations of the algorithms thus far presented, serving
therefore to give context to the formulation of the real thesis work and the remainder
of this report. As much of this work will involve investigations into the performance of
the ML-based reconstruction algorithms, and then to MAP estimates within the Bayesian
framework, the aim of the chapter is to look at the artifacts of the likelihood algorithm.
Following the implementation, these characteristics have either surfaced as a result of
experimentation, or have been gleaned from the considerable volume of literature covering
this topic. The resulting need for regularisation is not questioned; only how this is done and on what basis it is done need be scrutinised. In short, one can either attempt
to improve the reconstructions using the measured PET data alone, or one can seek
additional constraints, from, for example, associated MRI data. Of these, the former can
only make limited assumptions a priori regarding the distribution, such as homogeneity
(or “piecewise-continuation”). The latter approach attempts to compartmentalise the
distributions and assign relative activity levels to each, which is again at odds with the
notion of resolution recovery.

One approach to improving the reconstructions on the basis of the PET data alone is given in the following sections, where penalisation terms are introduced to regularise the
solution. These typically penalise “roughness”, or some suitable measure of noise distor-
tion. The penalised approach of section 8.2 leads to an associated form of regularisation
that smoothes the image solutions. The extension to this method is to adjust the degree
of smoothing in accordance to additional structural information supplied from MRI data.
This will be our first example of how data from an additional, higher resolution modality
may be used to improve the PET reconstruction. The methods presented here are sup-
plied with the same information as that available to the later Bayesian methods, but, as
we shall see, the Bayesian methods become increasingly sophisticated in how the prior
knowledge is applied. The methods of this chapter provide in many senses quick-and-less-
dirty-than-pure-ML solutions to the reconstruction problem, albeit whilst showing some
neglect toward the MRI data’s potential. Nonetheless, they serve to nicely bridge the gap
between the previous chapter’s rather raw application of the likelihood estimator and the
alternative direction of chapter 9, which is where we demonstrate more effective use of the
a priori information from MRI data.

8.1 Criticisms of Statistical PET Reconstruction


It is the case that the results of PET reconstructions will, in comparison to other tomo-
graphic imaging modalities (for example, CT, or MRI), be noisy and of low resolution. The
reasons for this are predominantly due to the low dosages of radioactive isotope that may
be safely administered to the patient resulting in a weak and noisy signal. Compounding
this is the relatively poor signal response of the detectors themselves, requiring therefore
that individual detectors must cover sufficiently wide physical areas in order to improve the
counting statistics of the acquisition. This loss of spatial resolution not only restricts the
potential for resolving detail in the distribution, but further lessens the appropriateness
of analytical methods of reconstruction; we are left with a less continuous (i.e., more sparsely sampled) discrete Radon transform, rendering the use of its inverse for PET reconstruction inappropriate (see section 4.1). In respect of the statistical methods of reconstruction, fewer measurements (equations) are made available to our already ill-conditioned linear system.
To improve the stability of this algebraic system we are left with little choice but to regu-
larise the system, as otherwise the ill-posed nature of the process will lead to unacceptable
levels of noise in the resulting images. Hence the statistical methods are by no means
perfect, encountering many criticisms of their own.
The primary complaint of any statistical method of reconstruction must be that of
excessive computational costs. This we acknowledge, but it is not the real issue here.
Thanks mainly to increased computational capabilities, but also to clever and efficient
implementations (see [Hudson and Larkin 1994], and more recently [Kehren et al. 1999]),
statistical methods can now be used routinely in the clinical environment. Concern instead
must concentrate on how the ill-conditioned nature of emission tomography affects the result of ML-based estimation. Most important is how the ML estimates, as they become more refined, become excessively noisy.
Being a nonparametric estimation problem, such effects are likely [Grenander 1978],
and the wrong form of variance is increased. A pseudo over-fitting to the measured data
occurs to progressively worsen the final image. After a certain point, an increase in the
value of the likelihood function also increases the distortion due to high-frequency noise,
the origin of which is uncertain, although the interpolation method used to relate pixel sites
to LORs might be one source (see section 5.2.2). This is the start of over-convergence,
and where such deterioration begins depends upon a number of factors and is all but
intractable. The effect is shown clearly in figure 8.1, where the starting estimate was the
back-projected image. There was little change in the solution after 40 iterations.
This problem of over-convergence has been referred to as the unattractive closed form
of the EM-ML algorithm (in the sense that although the function is convex, the optimisa-
tion process will, on convergence, fit noise); occurring, it is said, because the probability
distributions of the precorrected measurements that form the basis of pure likelihood-based
methods are unknown (they no longer fit the Poisson regression model of equation 4.7,
which, as we will see, relates to Fessler's rejection of the Poisson model (section 8.2.1)). This basically says that even when the Poisson model is valid, the likelihood function alone is too ill-posed unless we make additional assumptions about the distribution. We
can define a number of sensible assumptions regarding the activity distributions, and the
reconstruction procedure becomes one of maximising a combination of the likelihood func-
tion and its prior distribution, which yields the Bayesian MAP solution [Sivia 1996]. This
approach is introduced in the next section.

Maximum Likelihood Reconstruction after 1, 2, 5, and 20 Iterations:

Maximum Likelihood Reconstruction after 40 and 100 Iterations; a Filtered-Back Projected Image; and a Penalised Weighted Least-Squares Result:

Figure 8.1: The images above show reconstructions where the likelihood function is Poisson, and
the EM-ML algorithm described in section 7.5 is used. Note the increase in noise occurring with
increased iteration number. The starting estimate to the reconstruction was the back-projected
image. The sinogram was 80-by-96 and the images are 64-by-64. Images shown are after 1, 2, 5,
20, 40, and then 100 iterations. The images in the bottom right hand corner are firstly a FBP
reconstruction, and secondly the penalised weighted least-squares method using the software of
[Fessler 1997]. Refer again to section 5.2.2 for a method of reducing this high frequency noise.

8.2 The Need for Prior Information and the Resulting Regularisation
In terms of statistical inference, the ML procedure converges - in the limit - to the true
parameter values as the number of cases approaches infinity [Neal 1993]. But this will
only occur if the stochastic model is correct (i.e., the system matrix accurately describes
the tomographic system), and the assumption of Poisson distributed data is true. 100%
accuracy therefore is in any practical circumstance impossible (how much physics can be
put onto the computer?), and hence the adoption of section 5.2.2’s uncertainty model.
Furthermore, situations where we are able to approach a training set of such magnitude
are unlikely, and the subsequent quality of our estimations based on ever decreasing input
data will suffer. Part solutions for this problem may be found in methods of post filtering
[Snyder et al. 1987], or by implementing some early convergence adaptation through the
development of new stopping rules [Veklerov and Llacer 1987, Llacer and Veklerov 1997,
Hebert 1990, Johnson 1994]. Post-filtering (or the application of sieves), however, can only attempt a recovery from a poorly derived stopping criterion, which applies also to the use of early convergence methods.
A better way to address this problem is to choose values to maximise the likelihood
function and a penalty term, where the introduction of this term has the purpose of
steering the result away from an “over fitted” solution (see, for example, [Fessler 1994,
DePierro 1995]). And as [Yavuz and Fessler 1997] put it:

Since image reconstruction is ill-conditioned, usually one includes a roughness penalty R(λ), in the objective function. From the Bayesian point of view, this roughness penalty can be thought of as a log-prior for λ.
The following candidate method introduces the regularisation approach based on a
penalisation term. Related to this are the cross entropy methods discussed in section 8.3,
which are basically filtering based approaches to regularisation. These are important,
however, primarily because they introduce to this thesis how associated MRI data can
be used to improve the quality of the reconstruction. This serves as a good introduction
to the Bayesian methods of the following chapter which, it is shown, make better use of
the a priori information than the methods of cross entropy. Important to note in these
discussions, is that with each form of regularisation, there is the possibility of inducing too
significant a bias into the result. Images that may appear nicer, may in fact be masking
true activity that would otherwise have been of interest. There is therefore a need to balance the desire for reduced noise against the ability to show variation in the activity.
This we should bear in mind when considering what is presented in the following.

8.2.1 A Brief Review of a Candidate Method


An important advocate of penalised methods is Professor Fessler in Michigan. Fessler’s
group mainly uses an objective function with a penalisation term, where, it is argued
[Fessler 1994, Fessler et al. 1996], that a least-squares objective function is as appropriate,
if not more so, than the Poisson likelihood function. The choice of penalisation function
primarily depends on the model for the emission distribution, for which the Poisson model
has been almost traditionally used. Fessler and co-workers, however, have proposed al-
ternative models for usage in the emission reconstruction community [Yavuz and Fessler
1997]. The view is also shared by [Leahy and Yan 1991], and holds significant credence.
Part of their reasoning is based on the central limit theorem’s prediction that the Pois-
son distribution will very quickly tend toward that of a Gaussian for all but the lowest
of densities. But their main argument relates to the manner in which prereconstruction
correction techniques affect the data.
Accurate quantification of activity must include corrections for randoms. The way
in which this is frequently done is based on the assumption that the random events are
additive Poisson variates with known means (see, for example, [Liow and Strother 1992,
Politte and Snyder 1991]). True coincidences will only occur within a given time window
related to the time taken for the photons to traverse the LOR. Randoms, however, do
not necessarily obey this rule, and may occur either ahead of, or after the trues. Indeed,
randoms have approximately uniform variability in the distances that they must cover
in order to be detected. As such, their distribution in time is almost even, and it is
possible to estimate their contribution by employing separate timing windows. Corrected
sinogram bins are subsequently incremented for each event arriving in the “trues” window,
and decremented for each event registered in the “randoms” window. This combination
(increments and decrements) of the two Poisson processes is not, however, a Poisson
process in the strictest sense, and this, coupled with what the central limit theorem tells
us, is the justification for a Gaussian model. Furthermore, scatter subtraction methods
will have a similar effect, and detector normalisation procedures are likely also to upset
the Poisson assumption [Furuie et al. 1994]. Such statistical effects of these corrections
- including those possibly incurred due to FORE rebinning [Defrise et al. 1997] - can be
included in the least-squares approach via a “weighting matrix”, as shown in the following.
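Before turning to that weighting matrix, a tiny simulation may make the point about randoms subtraction concrete; the count levels below are purely illustrative.

import numpy as np

rng = np.random.default_rng(0)
trues, randoms, n = 50.0, 20.0, 200_000
prompts = rng.poisson(trues + randoms, n)        # "trues" window: true plus random coincidences
delayed = rng.poisson(randoms, n)                # delayed window: an independent estimate of the randoms
corrected = prompts - delayed                    # the precorrected sinogram bin values
print(corrected.mean(), corrected.var())         # mean about 50, variance about 90 (= trues + 2 x randoms)

A Poisson variable would have its variance equal to its mean; the subtraction instead doubles the randoms' contribution to the variance, which is exactly the argument for abandoning the pure Poisson likelihood here.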

Adapted to PET from the work in X-ray transmission tomography of [Sauer and
Bouman 1993], the objective function used in [Fessler 1994] is a weighted least-squares
function:
    \frac{1}{2}(\hat{y} - A\lambda)^{T}\Sigma^{-1}(\hat{y} - A\lambda),    (8.1)
where A is the system matrix, λ is the tracer distribution, and ŷ represents an emission
sinogram that has been precorrected for the effects of deadtime, attenuation, detector
sensitivity, random events, and possibly scatter. The [covariance] matrix Σ is diagonal
with the ith entry, σi2 , an estimate of the variance of the ith precorrected measurement, ŷi .
This is the weighting matrix, whose accuracy is critical to the success of the algorithm as
well as the relevance of using a least-squares approach, as it is here that the aforementioned
statistical effects must be included. If used after FORE rebinning, it is important to know
the effects this procedure has on the data. But, if the input [3-D] sinograms are not
correlated, it is possible to determine precisely the effect the FORE procedure has on the
variance of the input data [Comtat et al. 1998].
The actual implementation of equation 8.1 requires, therefore, the addition of the
smoothness penalty term, thus yielding the following penalised, weighted least-squares
(PWLS) objective function:

    \Phi(\lambda) = \frac{1}{2}(\hat{y} - A\lambda)^{T}\Sigma^{-1}(\hat{y} - A\lambda) + \beta R(\lambda, w),    (8.2)
where R is our regularisation function acting on our estimates of the distribution, β
is a smoothing parameter controlling the degree of influence that the regularisation term
has on the overall function, and the objective function is to be minimised.
As always, R is chosen to suit the application. For example, penalty terms developed
to preserve edge structure whilst imposing smoothness within uniform regions have been
popular (see, for example, [Bouman and Sauer 1993, Geman and McClure 1985], and
section 8.3 below). In [Fessler 1994], where edge-preservation is deemed unnecessary given
the low count rates of the data, Fessler uses the following quadratic smoothness penalty:

    R(\lambda, w) = \frac{1}{2}\sum_{j}\sum_{k \in N_j} \frac{1}{2}\, w_{jk}(\lambda_j - \lambda_k)^2,    (8.3)

where N_j is the set of eight neighbours of the jth pixel, and the weights w_{jk} equal 1 for horizontal and vertical neighbours, and 1/\sqrt{2} for diagonal neighbours. As the formulation
shows, the nature of the objective function is to penalise increased variance. It is also
proven in [Fessler 1994] that the use of this penalty yields a convex objective function,
Φ. The resulting penalised estimator is identical in form to a MAP estimator for λ in
instances where λ is assumed to be a random vector with the following Gaussian prior

gw (λ) = ρ(w) exp(−βR(λ, w)), (8.4)

where ρ(w) is a normalisation constant dependent only on the weight map w (defined
from equation 8.3), and β > 0 [Hero et al. 1998]. That is, as the variance increases, this value decreases, and the prior is maximised by smooth solutions. It is a Gibbs prior, and the reconstruction of
λ in this manner yields a Gibbs random field with potential function R(λ, w).
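To make the pieces concrete, the sketch below evaluates the PWLS objective of equation 8.2 with the quadratic neighbourhood penalty of equation 8.3 for a 2-D image; A, the precorrected sinogram y_hat, the variance estimates sigma2 and the weight beta are assumed given, and all names are illustrative only.

import numpy as np

def quadratic_penalty(lam):
    # each unordered neighbour pair contributes (w_jk / 2) * (lam_j - lam_k)^2, as in equation 8.3
    w_d = 1.0 / np.sqrt(2.0)
    r = 0.5 * np.sum((lam[:, 1:] - lam[:, :-1]) ** 2)            # horizontal neighbours (weight 1)
    r += 0.5 * np.sum((lam[1:, :] - lam[:-1, :]) ** 2)           # vertical neighbours (weight 1)
    r += 0.5 * w_d * np.sum((lam[1:, 1:] - lam[:-1, :-1]) ** 2)  # diagonal neighbours
    r += 0.5 * w_d * np.sum((lam[1:, :-1] - lam[:-1, 1:]) ** 2)  # anti-diagonal neighbours
    return r

def pwls_objective(lam, A, y_hat, sigma2, beta):
    resid = y_hat - A @ lam.ravel()
    return 0.5 * np.sum(resid ** 2 / sigma2) + beta * quadratic_penalty(lam)

Minimising this function subject to non-negativity is what the SOR algorithm quoted below does, one pixel at a time.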

The choice of the penalty term also influences their choice of algorithm. On one hand,
the maximisation step of the EM-ML algorithm is considered too cumbersome when using
such regularisation methods, and on the other the gradient descent optimisation methods
cannot easily accommodate the constraint of nonnegativity. Instead, a successive over-
relaxation (SOR) algorithm is chosen, which applies the nonnegativity constraint to each
parameter, and is said to more naturally suit the choice of objective function. In the
following, λ̂ denotes the current estimate of the emission density λ, and aj denotes the j th
column of A. The algorithm from [Fessler 1994] begins with the following initialisation:

• \hat{\lambda} = FBP\{y\}

• \hat{r} = \hat{y} - A\hat{\lambda}

• s_j = a_j^{T}\Sigma^{-1}a_j

• d_j = s_j + \beta\sum_{k \in N_j} w_{jk}

where FBP denotes filtered back-projection, and the matrix s is a J × J symmetric matrix, s_{i,j} = s_{j,i}, defined as:
    s = \begin{pmatrix}
    \sum_{i=0}^{I-1}\frac{a_{i,0}a_{i,0}}{\Sigma_i} & \sum_{i=0}^{I-1}\frac{a_{i,0}a_{i,1}}{\Sigma_i} & \cdots & \sum_{i=0}^{I-1}\frac{a_{i,0}a_{i,J-1}}{\Sigma_i} \\
    \sum_{i=0}^{I-1}\frac{a_{i,1}a_{i,0}}{\Sigma_i} & \sum_{i=0}^{I-1}\frac{a_{i,1}a_{i,1}}{\Sigma_i} & \cdots & \sum_{i=0}^{I-1}\frac{a_{i,1}a_{i,J-1}}{\Sigma_i} \\
    \vdots & \vdots & \ddots & \vdots \\
    \sum_{i=0}^{I-1}\frac{a_{i,J-1}a_{i,0}}{\Sigma_i} & \sum_{i=0}^{I-1}\frac{a_{i,J-1}a_{i,1}}{\Sigma_i} & \cdots & \sum_{i=0}^{I-1}\frac{a_{i,J-1}a_{i,J-1}}{\Sigma_i}
    \end{pmatrix},    (8.5)

which typically requires 1/2 a Gbyte of storage space even when its symmetry is
exploited. And this is the reason why I’ve been in no rush to code this algorithm, despite
its suitability to a parallel implementation1 . Finally, the iterative steps of the algorithm
are:

for j = 0 to J-1;
    \hat{\lambda}_j^{old} \leftarrow \hat{\lambda}_j
    \hat{\lambda}_j^{new} \leftarrow \bigl(a_j^{T}\Sigma^{-1}\hat{r} + s_j\hat{\lambda}_j^{old} + \beta\sum_{k \in N_j} w_{jk}\hat{\lambda}_k\bigr) / d_j
    \hat{\lambda}_j \leftarrow \max\{0, (1 - \omega)\hat{\lambda}_j^{old} + \omega\hat{\lambda}_j^{new}\}
    \hat{r} \leftarrow \hat{r} + a_j(\hat{\lambda}_j^{old} - \hat{\lambda}_j)
end;

It converges to a unique solution for λ̂ ≥ 0 that minimises Φ. The convergence rate depends on ω [Sauer and Bouman 1993], and updates to λ̂ are done sequentially (a
1 It can, of course, be calculated on-the-fly. It is termed a “weighting matrix”, although it is a covariance matrix incorporating the system response through the first-order moment (given a Gaussian-like distribution). See also the algorithm due to [Kaufman 1993].

characteristic of the SOR algorithm). One potential advantage of this facet is that the residual r̂ may determine whether or not a pixel value has converged. This
will naturally suit adaptive convergence, which is why we have chosen to document the
algorithm in detail. One potential problem with this penalised method is, however, that
we can expect a better resolution in low count regions of the image2 . Certainly, at first
glance at least, this seems a rather surprising result.

The above images are FBP reconstructions using a sinc shaped filter. They are the 1st, 16th,
31st and 48th slices of a 63 slice volume.

The above show reconstructions using the PWLS of [Fessler 1995]. These are the 1st, 16th,
31st and 48th slices of the data set from the University Hospital in Groningen.

Figure 8.2: The above figure shows some results of the PWLS software (top row) in comparison
to FBP reconstructions of the same sinogram set (bottom row). The FBP reconstructions were
actually used as the starting estimates for the PWLS algorithms. The PWLS algorithms were
run for two conjugate gradient iterations followed by ten iterations of coordinate descent with the under-relaxation parameter of successive over-relaxation (SOR), ω, set to 0.6. The penalty term is a
quadratic one (φ(x) = x2 /2). It is not claimed that these parameters are optimally chosen - I am
not familiar enough with the software to do this. The results, however, may be compared to those
shown at the end of chapter 6, which discusses the implemented version of the ML algorithm.

8.3 Penalisation Via the Filtering of Activity Estimations


The literature has shown that it is beneficial to select a penalty in terms of a distance
measure from an estimator λp , where λp is obtained directly from the sinogram data by
means of a well-smoothed back-projection method [Herman et al. 1982]. The smoothing
is done to an extent sufficient to remove noise. The penalisation of roughness in an image
does the same thing, effecting a smoothing of the resulting activity distribution. The
choice of prior, or regularisation term, depends of course on the application. And these
2 One can basically expect a better resolution in low-count regions of the image [Fessler and Rogers 1996]! But, these non-uniformities are not a result of system response. Instead they are due to the interaction between the log likelihood and the penalty terms of the objective function. [Stayman and Fessler 1999] has sought to address this issue.

can vary significantly. Recent use, for example, has involved techniques such as Wavelet
decomposition [Mallat 1989] to regularise the reconstruction process [Wu 1993]. Wu uses
this method to extract the high frequency components of the perfectly reconstructed image
data to use as a prior for a MAP based reconstruction. This acts to smooth - and thus
regularise - the solution in the reported satisfactory manner. Unfortunately, such priors
are not typically available (i.e., those coming from a perfectly reconstructed image), and
although the exact manner in which this method simply effects prereconstruction filtering
is not clear, the analogy is a strong one. That is, the method is again a smoothing process;
albeit an elaborately conceived one.
More smoothing is par for the course in this section, which, in its simplest form, can be
achieved using the convolution operator and an appropriate kernel. The result can show
some equivalence to the penalisation term of equation 8.3: an image of increased SNR,
but also of reduced resolution. How sensible this might be depends on the initial assump-
tions made with regard to the activity distribution. For example, the above algorithm due
to [Fessler 1994] is applied to low-count data, where edge preservation is considered an
extravagance. Another example would be if one could assume that a PET image consists
of neighbourhoods showing negligible variance in activity, then a prior model based on a
filtered EM-ML reconstruction might indeed be appropriate. And this is exactly what is
done by [Alenius and Ruotsalainen 1997] and later [Seret 1998] who use a median filter
on the basis of its suitability for reducing speckle-like noise whilst showing some ability
to retain the image’s original structure [Pitas and Venetsanopoulos 1990]. Extending this
rather rudimentary option, [Ardekani et al. 1996] proposed the minimisation of a cross
entropy measure taken between an EM-ML reconstruction and its smoothed counterpart.
The extension offered by this work is to use an adaptive filter, which, based on edge infor-
mation from associated MRI data, smoothes the data whilst more successfully preserving
structure.
The main problem of such a method is that this filtering may also involve a masking of
functional detail away from the edge boundaries. Nonetheless, this is done, for example,
by [Hero et al. 1998] who adopt the estimator of equation 8.2 such that the weights
in equation 8.3 are adjusted in accordance to associated MRI data. In the following,
however, we discuss and demonstrate the methods due to [Ardekani et al. 1996] and [Som
et al. 1998], and then offer an alternative approach in the light of potential disadvantages.
This closes the chapter by introducing us to reconstruction methods that use additional
information for emission tomography reconstruction. In this and most cases, the additional
information is from higher resolution, lower noise, and better contrast MRI data.

8.3.1 Minimum Cross Entropy Reconstruction


An example of smoothness based priors that will introduce us to the Bayesian methods of
chapter 9 is the cross entropy methods originally used by [Liang et al. 1989, Nunez and
Llacer 1990, Byrne 1993]. Cross entropy, or the Kullback-Leibler distance [Kullback 1969],
is a measure of dissimilarity between two images. When the sinogram data is modelled
by the Poisson of equation 4.7, then the distance between the measured emission data (y)
and the forward projection of the reconstruction estimate (Aλ) can be shown to reduce
to a cross entropy measure3 . Used in the reconstruction of emission tomography images,
3 Note that the methods assume a Poisson model. Previously we looked at finding the maximum of an assumed Gaussian distribution, which is a least-squares problem. That is, as the final a posteriori probability is Gaussian, then finding the MAP estimate amounted to solving a weighted least-squares problem.

the measure is therefore to be minimised, and when no penalisation is applied, this is equivalent to the EM-ML algorithm. The incorporation of a prior can, in the simplest
case, be simply a smoothed version of the current estimate. The reconstruction method
due to [Ardekani et al. 1996], however, uses an adaptive smoothing filter to regularise
the [EM-ML] reconstruction solution. The kernel of this filter is derived on the basis of
gradients in the associated MRI data, and the resulting image is incorporated into the
reconstruction technique as the second image (the first being the EM-ML reconstruction)
in the cross entropy measure. The algorithm is thus formulated to minimise this measure
according to a weighting which determines the respective influence of each image. We look
at this method in the following where a variation of the same algorithm due to [Som et
al. 1998] and then my own approach is given. The pros and cons of these methods close
the chapter, nicely leading us to the more involved methods of chapter 9.
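For reference, the dissimilarity measure itself is easily written down. The sketch below uses the extended (generalised) Kullback-Leibler form between the measured sinogram y and a forward-projected estimate; this particular algebraic form is a common convention and is assumed here rather than quoted from the cited papers.

import numpy as np

def kl_distance(y, y_est, eps=1e-12):
    # extended Kullback-Leibler distance between measured counts y and the estimate y_est = A @ lam
    y = np.asarray(y, dtype=float)
    y_est = np.asarray(y_est, dtype=float)
    return float(np.sum(y_est - y + y * np.log(np.maximum(y, eps) / np.maximum(y_est, eps))))

The measure is zero only when the two sinograms agree, and minimising it with no penalty term reproduces the EM-ML behaviour described above.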

The Experimental Setup In the experiments described in the following, the system
matrix used did not model uncertainty in the emission source, and the normal method of
nearest neighbour interpolation was used. This produces the images in greatest need of
regularisation, and hence the examination of each algorithm is more rigorous.
The data set was computer simulated and is shown in figure 8.3. Combined segmen-
tations from the Montreal Brain Database [Cocosco et al. 1997] were used as delineated
regions to which activity could be assigned. The resulting activity map was then blurred
before being forward Radon transformed4 to form a sinogram set. Poisson noise was then
added to improve the realism of the data. The more accurate inclusion of Poisson noise
would typically involve a shifted Poisson model in order to emulate the effects of the correc-
tion methods used in the real acquisitions. That is, the means would not just be the pro-
jection data, but the projection data plus some normal distribution [Rowe and Dai 1992,
Furuie et al. 1994]:

    \mathrm{Poisson}\bigl(\bar{y} + N(0, \sigma^2)\bigr),    (8.6)

where \bar{y} is the simulated (noise free) projection data, and N(0, σ²) is a normal distribution of zero mean and σ standard deviation. Through empirical studies, [Furuie et al. 1994] suggest that this value should be set as,

    \sigma^2 = \bar{y}\bigl[(1 + 0.05\,\bar{y})^2 - 1\bigr].    (8.7)
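A minimal sketch of this shifted Poisson simulation is given below; the sinogram values are a stand-in, and clipping the perturbed means at zero is a practical safeguard not spelled out in equation 8.6.

import numpy as np

rng = np.random.default_rng(0)
y_bar = np.full((80, 96), 40.0)                                     # stand-in noise-free sinogram
sigma2 = y_bar * ((1.0 + 0.05 * y_bar) ** 2 - 1.0)                  # equation 8.7
means = np.maximum(y_bar + rng.normal(0.0, np.sqrt(sigma2)), 0.0)   # shifted means, clipped at zero
y_noisy = rng.poisson(means)                                        # equation 8.6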

As mentioned, the experiments involved three quite similar cross entropy algorithms.
These are described in the following, and their results are shown in figures 8.4, 8.5, 8.6,
and 8.7.

The Cross Entropy Method of [Ardekani et al. 1996]


The general idea of the approach of Ardekani is that a prior may be repeatedly estimated on the basis of the current estimate and its adaptively smoothed version. The adaptive smoothing is done by a weighted Gaussian, where gradients (taken as the absolute difference between two grey levels) in the associated MRI data determine this weighting.
4 Done using Matlab’s implementation of the Radon transform.

[Figure 8.3 panels: (a) original "MRI" data; (b) assigned PET (true) distribution; (c) resulting sinogram (blurred, forward projected, plus Poisson noise); (d) PWLS reconstruction; (e) OSEM using nearest neighbour (NN) interpolation; (f) OSEM, NN interpolation with Gaussian, FWHM = 2 pixels; (g) OSEM, NN interpolation with Gaussian, FWHM = 3 pixels.]

Figure 8.3: The top row of the above figure shows the input data used for the results given in
the following. The image on the left is the simulated MRI data (a) from which a PET distribution
is taken (b). This is done by assigning the GM, WM and CSF to the 10:3:1 ratio, and then
modulating the GM signal such that it shows some variation. This image was then blurred using a
Gaussian kernel, before being forward-projected to derive a sinogram (c). Poisson noise was then
added to simulate the counting effect in the detectors. The final sinogram (that which was used in
the following tests), is shown on the right. It should be noted that the generation of this sinogram
was consistent with the assumptions made of the model used in the PVE correction methods of
chapter 2. The next row shows example reconstructions of this data where no MRI data was
used. At the far left (d) is the penalised weighted least-squares method (as discussed above) using
a quadratic penalty function over a first-order neighbourhood, run for 10 iterations. The next
three images are all reconstructions using the OSEM [Hudson and Larkin 1994] algorithm run for
5 iterations using 8 subsets. In the first (e), normal nearest neighbour interpolation was used in
creating the system matrix. In the second (f), a Gaussian PDF of 2 pixels FWHM was modelled,
and in the third (g), this was extended to 3 pixels.

The filter is of the following form [Wang and Vagnucci 1981]:


    f_j = \begin{pmatrix}
    \frac{w_5}{W\sqrt{2}} & \frac{w_1}{W} & \frac{w_6}{W\sqrt{2}} \\[4pt]
    \frac{w_2}{W} & \frac{1}{2} & \frac{w_3}{W} \\[4pt]
    \frac{w_7}{W\sqrt{2}} & \frac{w_4}{W} & \frac{w_8}{W\sqrt{2}}
    \end{pmatrix},    (8.8)

where W = 2\sum_{k=1}^{4} w_k + \frac{2}{\sqrt{2}}\sum_{k=5}^{8} w_k,
, and,


2 if m0 = mk
wk = 1 (8.9)
km0 −mk k otherwise,
p
where km0 − mk k = (m0 − mk )2 , and m
~ is the MRI data.

On the basis of the above filter, a prior image, p is formed at each iteration (k) by

    p_j^{k} = f_j * \lambda_j^{k},    (8.10)

where λ is the current solution, and ∗ denotes the [2-D] convolution operator. The
algorithm can be summarised as the following:
    \lambda_j^{k+1} = \frac{\lambda_j^{k}}{\sum_{i=0}^{I-1} a_{ij}} \left\{ \sum_{i=0}^{I-1} \frac{y_i a_{ij}}{\sum_{j'=0}^{J-1}\lambda_{j'} a_{ij'}} - \beta\ln\!\left(\frac{\lambda_j^{k}}{p_j}\right) \right\}.    (8.11)

Although this algorithm (originally proposed for emission tomography reconstruction in [Nunez and Llacer 1990]) satisfies a positivity constraint on β, such that at the limit β → ∞, then λ_j = p_j, it is not, however, guaranteed to converge. As such, a hybrid
algorithm is presented, where if the following equation is satisfied,

    \sum_{i=0}^{I-1} \frac{y_i a_{ij}}{\sum_{j'=0}^{J-1}\lambda_{j'} a_{ij'}} > \beta\ln\!\left(\frac{\lambda_j^{k}}{p_j}\right),    (8.12)

then equation 8.11 is used. Else:

    \lambda_j^{k+1} = p_j \exp\!\left\{ \frac{-1}{\beta} \sum_{i=0}^{I-1} \frac{y_i a_{ij}}{\sum_{j'=0}^{J-1}\lambda_{j'} a_{ij'}} \right\}.    (8.13)

The results of running this algorithm using the test data of figure 8.3 for a 3-by-3
kernel are shown in figure 8.4.
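A compact NumPy sketch of the MRI-weighted smoothing of equations 8.8 to 8.10 and the penalised update of equation 8.11 is given below. It follows the reading of equation 8.8 in which the centre pixel keeps a weight of one half and the inverse-gradient weighted neighbours share the other half; the hybrid switch of equations 8.12 and 8.13 is omitted, negative values are clipped as a practical safeguard, and all array names are illustrative.

import numpy as np

def adaptive_prior(lam, mri, eps=1e-6):
    # space-variant smoothing of lam, weighted by inverse MRI gradients (equations 8.8-8.10)
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1),        # horizontal / vertical neighbours
               (-1, -1), (-1, 1), (1, -1), (1, 1)]      # diagonal neighbours
    num = np.zeros_like(lam, dtype=float)
    den = np.zeros_like(lam, dtype=float)
    for k, (di, dj) in enumerate(offsets):
        m_k = np.roll(mri, (di, dj), axis=(0, 1))
        grad = np.abs(mri.astype(float) - m_k)
        w = np.where(grad < eps, 2.0, 1.0 / np.maximum(grad, eps))   # equation 8.9
        if k >= 4:
            w = w / np.sqrt(2.0)                        # diagonal entries carry the 1/sqrt(2) factor
        num += w * np.roll(lam, (di, dj), axis=(0, 1))
        den += w
    return 0.5 * lam + 0.5 * num / np.maximum(den, eps)  # centre weight 1/2, neighbours share 1/2

def cross_entropy_update(lam, A, y, prior, beta, eps=1e-12):
    # one pass of equation 8.11 (without the hybrid switch of equations 8.12-8.13)
    shape = lam.shape
    lam_v, p_v = lam.ravel(), prior.ravel()
    ratio = y / np.maximum(A @ lam_v, eps)
    bracket = A.T @ ratio - beta * np.log(np.maximum(lam_v, eps) / np.maximum(p_v, eps))
    lam_new = lam_v / np.maximum(A.sum(axis=0), eps) * bracket
    return np.maximum(lam_new, 0.0).reshape(shape)

At each outer iteration the prior image is recomputed from the current estimate, so the reconstruction simply alternates adaptive_prior and cross_entropy_update.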

The Cross Entropy Method of [Som et al. 1998]

[Som et al. 1998] use the above algorithm, changing only the filter that is applied. The
kernel is again based on a Gaussian distribution, weighted by inverse gradients in the
anatomical image. Their 3-by-3 pixel filter is defined as the following:
 
    f_j = \frac{1}{W}\begin{pmatrix} w_5 & w_1 & w_6 \\ w_2 & w_0 & w_3 \\ w_7 & w_4 & w_8 \end{pmatrix},    (8.14)

where W = \sum_{k=0}^{8} w_k, and the w_k are defined as:

    w_k = \begin{cases} g_k(r) & \text{if } m_0 = m_k \\[4pt] g_k(r)\left(\dfrac{1}{\|m_0 - m_k\|}\right) & \text{otherwise.} \end{cases}    (8.15)

In the above gk (r) is the Gaussian kernel, specified according to r, the radial distance
from the centre pixel. Although given as a 3-by-3 kernel, I have implemented this algorithm
for arbitrary kernel sizes. Results are shown in figure 8.5 for a 3-by-3 kernel.
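The kernel construction itself is straightforward; the sketch below builds it for one interior pixel (i0, j0) of an MRI image m, with the Gaussian width sigma as an illustrative parameter.

import numpy as np

def som_kernel(m, i0, j0, size=3, sigma=1.0, eps=1e-6):
    half = size // 2
    w = np.zeros((size, size))
    for di in range(-half, half + 1):
        for dj in range(-half, half + 1):
            g = np.exp(-(di * di + dj * dj) / (2.0 * sigma ** 2))    # radial Gaussian g_k(r)
            grad = abs(float(m[i0 + di, j0 + dj]) - float(m[i0, j0]))
            w[di + half, dj + half] = g if grad < eps else g / grad  # equation 8.15
    return w / w.sum()                                               # normalise by W (equation 8.14)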

[Figure 8.4 panels: Beta = 0.1, Beta = 0.3, Beta = 0.5; plot: profiles for the Ardekani algorithm (True, Beta = 0.1, Beta = 0.3, Beta = 0.5).]

Figure 8.4: For the input data explained in figure 8.3, the results above show three reconstructions
based on the [Ardekani et al. 1996] algorithm. On the left, β of equation 8.11 was set to 0.1 (i.e.,
bias is shown toward the EM-ML result). In the middle β equalled 0.3, and on the right, β was set
to 0.5 (i.e., bias was given toward the prior). Thirty iterations of the algorithm were performed. At
the bottom, we see profiles for each reconstruction shown in comparison to the true distribution.

Cross Entropy using Anisotropic Filtering


Given the above methods of Ardekani and Som, a rather obvious new method that presents
itself is to use anisotropic filtering [Perona and Malik 1987] where the diffusion coefficient is
taken from the MRI data. Anisotropic filtering is a smoothing operation where the degree
of smoothing applied is determined by estimates of the local image gradient. The algorithm
is derived such that where the image gradients are large (i.e., where an edge is present),
then the smoothing is minimal. Where, however, there is no edge information, then the
smoothing is unrestricted. The result is a very useful and widely applied smoothing filter
as it typically preserves the structure in the image whilst improving the SNR (i.e., whilst
smoothing).
Image smoothing can be modelled as a diffusion equation, where for an arbitrary image,
I, the diffusion equation is:

    \frac{dI}{dt} = \mathrm{div}\bigl(c(I)\nabla I\bigr),    (8.16)

where c is the diffusion coefficient (taken as a measure on the image), div is the
divergence operator, and ∇I is the gradient of the image, I. It is necessary to choose c = 0
at boundaries, for example, and 1 elsewhere to encourage the desired effect. Remaining,

[Figure 8.5 panels: Beta = 0.1, Beta = 0.3, Beta = 0.5; plot: profiles for the Som algorithm (True, Beta = 0.1, Beta = 0.3, Beta = 0.5).]

Figure 8.5: For the input data explained in figure 8.3, the results above show three reconstructions
based on the [Ardekani et al. 1996] algorithm, this time using the filter given in equation 8.14 (i.e.,
it is the algorithm proposed by [Som et al. 1998]). In the middle β was 0.3, and on the right,
β was set to 0.5 (i.e., bias was given toward the prior). Thirty iterations of the algorithm were
performed. Again, the bottom figure shows profiles for each reconstruction in comparison to the
true distribution.

however, is a method to determine the locations of the edges in the image, E(x, y, t).
Assuming that this information can simply be derived from local image gradients, then
the diffusion coefficient must be chosen as a function of this measure: c = g(kEk), where
kEk denotes the magnitude of E. From [Perona and Malik 1987], two options for g are
given, where both are non-negative, monotonically decreasing functions, with g(0) = 1.
Firstly,

    g(\nabla I) = \exp\left(-\left(\frac{\|\nabla I\|}{K}\right)^{2}\right),    (8.17)

which is said to privilege edges of high contrast, and secondly,

    g(\nabla I) = \left(1 + \left(\frac{\|\nabla I\|}{K}\right)^{2}\right)^{-1},    (8.18)

which privileges wide regions over smaller ones. In both of these equations, the value
K is some constant that is normally determined empirically.
The necessary modification to the implementation of this filtering method such that it
is suitable for use in the cross entropy algorithm, is to simply derive ∇I from the MRI data
94 Chapter 8. Characteristics of the Reconstruction Process

whilst applying the resulting filter to the PET data. This then, forms the prior used during
the reconstruction process. Figures 8.6 and 8.7 show the results of using this algorithm
with the local image gradients estimated from equations 8.17 and 8.18, respectively.
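A minimal sketch of this cross-modal use of the diffusion filter follows: the conduction coefficient of equation 8.17 is computed from the MRI image, while the explicit diffusion update of equation 8.16 (discretised over the four nearest neighbours, as in [Perona and Malik 1987]) is applied to the current PET estimate. The default K = 6 and eight sweeps echo the values quoted in the figure captions; the time step is an illustrative choice, and np.roll wraps at the borders, which is adequate for a sketch.

import numpy as np

def mri_guided_diffusion(pet, mri, K=6.0, dt=0.2, n_steps=8):
    u = pet.astype(float)
    m = mri.astype(float)
    for _ in range(n_steps):
        for axis, shift in [(0, 1), (0, -1), (1, 1), (1, -1)]:
            d_mri = np.roll(m, shift, axis=axis) - m        # MRI difference toward this neighbour
            d_pet = np.roll(u, shift, axis=axis) - u        # PET difference toward this neighbour
            c = np.exp(-(d_mri / K) ** 2)                   # equation 8.17, evaluated on the MRI data
            u = u + dt * c * d_pet                          # explicit diffusion step (equation 8.16)
    return u

The result is then used as the prior image p in equation 8.11, exactly as the Gaussian-based filters were.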
[Figure 8.6 panels: Beta = 0.1, Beta = 0.3, Beta = 0.5; plot: profiles for the diffusion filtering (HC) algorithm (True, Beta = 0.1, Beta = 0.3, Beta = 0.5).]

Figure 8.6: For the input data explained in figure 8.3, the results above show three reconstructions
again based on the [Ardekani et al. 1996] algorithm, but this time using the anisotropic filter due to
[Perona and Malik 1987]. In these, the diffusion coefficient was defined using equation 8.17, which
favours regions of high contrast. On the left, β of equation 8.11 was set to 0.1 (i.e., bias is shown
toward the EM-ML result), the middle β was 0.3, and on the right, β was set to 0.4 (i.e., bias was
given toward the prior). The value of K in equation 8.17 was set to 6 for each result, and thirty
iterations of the algorithm were performed, with 8 runs of the diffusion filter being performed at
each iteration.

Closing Words on the Cross Entropy Approaches


Interesting in the above is the similarity between the approach due to [Som et al. 1998]
and that using the anisotropy filter. Both, it can be argued, outperform the original work
of [Ardekani et al. 1996]. Otherwise, I have found little to distinguish these methods5 .
Theoretically, it is certainly more attractive to use the diffusion filter approach than that
of a truncated Gaussian. There are good mathematical reasons for adopting the Gaussian
as a smoothing filter, these being primarily related to its good localisation properties in
both the spatial and frequency domains, which, in the case of the frequency domain, will
be entirely lost in the approaches due to both [Ardekani et al. 1996] and [Som et al. 1998].
If we look at the Fourier spectrum of the truncated Gaussian, then we will immediately be
5 Note that further results are to be found in figure 46 of appendix E.

[Figure 8.7 panels: Beta = 0.1, Beta = 0.3, Beta = 0.5; plot: profiles for the diffusion filtering (WR) algorithm (True, Beta = 0.1, Beta = 0.3, Beta = 0.5).]

Figure 8.7: For the input data explained in figure 8.3, the results above show three reconstructions
again based on the [Ardekani et al. 1996] algorithm, but this time using the anisotropic filter due to
[Perona and Malik 1987]. In these, the diffusion coefficient was defined using equation 8.18, which
favours wide regions. On the left, β of equation 8.11 was set to 0.1, the middle β was 0.3, and on
the right, β was set to 0.4. The value of K in equation 8.18 was set to 6 for each result, and thirty
runs of the algorithm were performed, with 8 iterations of the diffusion filter being performed at
each iteration.

able to recognise the high frequency components of such a kernel. Following its convolution
of the image data, it is again insightful to consider the result this has in the frequency
domain. Here, convolution becomes multiplication, and the high frequencies introduced
by the truncated kernel will be preserved. The methods of both [Ardekani et al. 1996] and
[Som et al. 1998], therefore, will lead to the introduction of high frequencies that were
not in the original image, are unlikely to have been present in the tracer distribution [Ma
et al. 1993], and are beyond the operating range of the scanning device! To avoid this
problem, one can instead use the diffusion filtering approaches introduced here.
Important to these approaches is balancing the usually conflicting goals of each term
in equation 8.11. The parameter β controls this, and its choice can best be made in an
empirical fashion. Such methods for hyperparameter estimation are also necessary in the
objective function of equation 8.2, and, as it turns out, is pretty much the case with all
such methods. We will later see the importance of a similar parameter in our own methods,
but we are fortunately able to offer sensible approaches for its appropriate selection. It is
another “fiddle-factor” as Fessler so nicely calls them [Fessler 1997], present somewhere in
almost all reconstruction algorithms.
The benefits of the cross entropy approaches are in my opinion twofold. Firstly, the

inherent simplicity of the algorithm is attractive in itself (there is no potentially over-elaborate model, for example). And secondly, intensities are not allocated in a manner
that could show significant variance away from their within-region neighbours (because
of the smoothing), which may be appropriate [Rousset et al. 1993a], or from the initial
reconstruction estimates based on the sinogram data alone (i.e., there are no wild estimates
here, and data is not pulled from thin air). It is to be noted then, that in applications
of transmission image reconstruction, where within-compartment homogeneity is to be
expected, these procedures are applicable.
For emission images - the application of this thesis work - problems arise, and the
disadvantages are more apparent. Firstly, the smoothing of the data has to be the wrong
approach given the end objectives of improved resolution, delineation and quantification of
the data. The notion is simply not compatible. Secondly, true activity variation is likely
to be masked, except where edges are present in the MRI data. Fortunately, the filter
kernel itself might be too wide to adversely effect (or harm) intricate ROIs such as GM,
but this is just a spin-off benefit which is by no means guaranteed to materialise. Thirdly,
potential side information, such as approximate activity levels (see [Sokoloff et al. 1977,
Ingvar et al. 1965, Ma et al. 1993, Rousset et al. 1993b]) or models to capture their
variation (see, for example, [Geman and McClure 1987, Mumcuoğlu et al. 1994]) is not
used. This obviously fails to utilise the full potential of the MRI modality, and the quality
of the reconstructions must correspondingly suffer. Finally, the policy of updating the
prior at each iteration of a Bayesian scheme is a dubious one, which, being out of the
scope of the current discussion, will be returned to in the next chapter.
9
Better Prior Models for Bayesian
Methods of Reconstruction

The methods of the previous chapter amount to a generalised form of the Bayesian MAP
estimation approach to reconstruction. Following this initial introduction to reconstruction
based on prior models, this chapter delves further into the Bayesian approach, primarily
to demonstrate the purpose of the more elaborate models. That is, in cases where MRI
data is available, then one can be more explicit in the use of the prior. Rather than simply
expecting a generally smooth reconstruction and encouraging transitions to occur at known
edge boundaries, the methods introduced in this chapter involve defining independent
expectations for each pixel value. Throughout, this chapter assumes that such information
is available, and shows therefore how this extended usage of prior knowledge is applied.
We have seen how, in the case where there is no real prior image model (in the form, for example, of a segmented MRI image), regularisation using a penalisation term may be used to constrain the solution. An alternative, however, would be to derive such
piecewise constraints based on random field models.
The random field models are probably the most common class of priors used in image analysis and processing, and, following the seminal work of [Geman and Geman 1984], the tomographic community was quick to see the potential of such models as regularisers in emission reconstruction [Geman and McClure 1985, Geman and McClure 1987, Levitan and Herman 1987, Hebert and Leahy 1989, Green 1990, Bouman and Sauer 1993, Mumcuoğlu et al. 1994, Mumcuoğlu et al. 1996]. Their popularity is ensured because the Markov random field (MRF) models are computationally tractable and able to capture many non-Gaussian aspects of images, such as edges [Suhail et al. 1998]. A key paper in the field of
PET reconstruction is that of [Mumcuoğlu et al. 1994] who address the slow convergence
and ill-conditioned properties of the ML estimator by using a MAP estimator based on
a MRF instead. Accordingly, they maximise over the posterior probability rather than
over the likelihood function, which imposes a regularising effect on the problem allowing
instability to be avoided [Mumcuoğlu et al. 1996]. They do not, however, use anatomical
information to do this, but are able to demonstrate very clearly the effectiveness of the
Bayesian approach.
Applicable though random fields are, PET reconstruction can further benefit from
information provided by MRI data (for example, [Lange et al. 1987, Gindi et al. 1991,
Leahy and Yan 1991, Lipinski et al. 1997, Sastry and Carson 1997]). In accordance with the cross entropy method of applying the penalisation only within well-bounded regions, the same considerations apply to limit the extent of the random field distributions. Namely,

that although PET and MRI produce images of different parameters, their underlying
anatomical structure is shared, hence the distribution of functional activity relates to,
and is dependent on, the spatial distribution of the structure. In contrast to the cross
entropy methods of section 8.3.1, however, random field models are capable of modelling
within-tissue activity variation.
Although the volume of research undertaken for such applications is immense, it has
matured significantly, allowing us to look only to the novel and most recently published
papers for direction. For example, [Gindi et al. 1991] derive MAP estimates of simulated PET data by incorporating line sites (originally proposed by [Geman and Geman 1984]), derived from associated MRI data, into the reconstruction. The assumption is that where there is a strong anatomical edge, it is likely to correspond to a change in activation; hence the line sites are used to enforce this. Within the edge-bounded regions
the assumption is that the activity can be modelled with a uniform distribution, which
the authors themselves admit to being unlikely. A similar method is also proposed by
[Leahy and Yan 1991]. Instead of using line sites, [Bowsher et al. 1996] incorporate a
segmentation model into their reconstruction process. Under the assumption that the
activity distribution can be modelled as consisting of regions of similarly activated voxels,
they use Bayes’ theorem and MRI priors to estimate the number of regions, and their
mean activity levels. The activity distribution is based on the Poisson distribution, and
a variation on the Iterated Conditional Modes (ICM) algorithm [Besag 1986] is used to
estimate the prior. Phantom studies are performed using the 3-D Hoffman phantom. In
the discussion, the authors conclude that:

First, anatomical structure is much clearer [in their result], which may assist
clinicians. Secondly, the Bayesian procedure may improve detection. Thirdly,
the Bayesian procedure may improve quantitation.

[Lipinski et al. 1997] reconstruct using MRI priors to form Markov and Gaussian energy
fields to be minimised. [Sastry and Carson 1997] also use this Bayesian approach to PET
reconstruction, building priors of a similar form, but concentrating only on the Gaussian
field based approach. The work based on the MRFs seems in the Lipinski paper to have
stabilised at a point at which further development is not obvious. Indeed, recent work
[Vollmar et al. 1999] based on the same algorithms offers, to the best of my knowledge, no
original extensions, and hence no improvements. This is also the reason why I have not pursued research in this direction. Of particular interest in the work of [Sastry and Carson 1997], however, was the introduction and estimation of tissue-type activities into a reconstruction algorithm, which sets this work apart from the other approaches by offering a more complete utilisation of the information provided by the MRI data. The method for estimating the tissue-type activities may, along with other modifications to the algorithm, be improvable. This is now covered in detail.

9.1 Introducing Bayes' Theorem for Tomographic Reconstruction
It is clear that under various conditions the penalisation methods constitute a subset of
simple methods of MAP estimation. It makes limited sense, therefore, to channel effort
entirely toward one whilst ignoring the whole. Nonetheless, when anatomical information

is available, then more explicit Bayesian inference is the appropriate framework. The
prior probability term is able to embody assumptions on the data’s distribution before
it is observed, and not simply in the form of boundaries to the distributions. Bayes’
theory allows us to update these assumptions in the light of the observed data (although,
strictly, one should not update the prior during the inference process), and thus convert our
problem of statistical inference into an application of probabilistic inference [Neal 1993].
As we have seen, the likelihood function p(y|λ) is a measure of the chance that we would
have obtained the data that we actually observed (y), if the value of our tracer distribution
(λ) was given as known. Tomographic reconstruction based on this function is a method
of statistical inference, selecting the parameters that give the most likely explanation of
the observed data. Using Bayes’ theory, we are able instead to retrieve a value describing
the state of knowledge about the tracer distribution given the measurements. Considering
the parameters to be estimated a hypothesis, then instead of defining the probability of
the data given the hypothesis (λ), we define the probability of the hypothesis given the
data [Sivia 1996], P (λ|y).
Critics of Bayesian approaches cite the subjective nature of the prior as a weakness.
But this would seem to ignore the fact that it is necessary to make assumptions to allow any form of inference. Bayesian methods allow us to maintain our faith (albeit subjective)
in our prior, and let the data tell us otherwise. That is, in the Bayesian scheme, as
the empirical evidence grows, we will eventually yield a result irrespective of our prior
knowledge; the posterior PDF will become dominated by the likelihood function (this was
shown in equation 6.6). Indeed, in such cases, the choice of the prior becomes largely
irrelevant [Sivia 1996].
Errors in our prior have then the potential to be corrected, and where the prior is
accurate, it may, in emission tomography applications, have the benefit of improving
the result elsewhere in the image. Among [Llacer et al. 1991]'s results from studies of prior distributions in Bayesian reconstructions was the indication that prior information applied in some areas of an imaging field has a tendency to improve the results of a
reconstruction elsewhere. This may at first seem a surprising and perhaps rather dubious
result, but when one considers how highly correlated an emission image is, and how its
piecing together is a problem of global optimisation with constraints such as positivity
and energy conservation, then increasing the certainty of the solution in one area is indeed
likely to aid the solution in another. Although these influential effects are reported as
being slight, the paper argues favourably for the incorporation of any sensible knowledge
regarding the distributions to be used, something that cannot be done in FBP reconstruc-
tion and is not done in reconstruction methods using neural networks as approximators
of the inverse-Radon transform.

9.2 Regularisation Using A Priori Distributions


By combining a prior distribution for our emission data, P(λ_0, ..., λ_{J−1}), with the conditional distribution for the observed data, p(y_0, ..., y_{I−1} | λ_0, ..., λ_{J−1}), we get a joint distribution over the parameters and the observed data:

p(\lambda_0, \dots, \lambda_{J-1}, y_0, \dots, y_{I-1}) = P(\lambda) \prod_i p(y_i \mid \lambda).    (9.1)

This is known as the product rule, from which a number of results may be obtained,
perhaps the most interesting of which being Bayes’s rule. Since the probability of λ and
y being true is the same as y and λ being true, then the right-hand side of equation 9.1
can be equated to p(y)P (λ|y) and:

P(\lambda \mid y) = \frac{P(\lambda)\, p(y \mid \lambda)}{p(y)},    (9.2)
which is Bayes’ theorem. Thereby, we are able to derive Bayes’s rule for the posterior
distribution, P (λ|y), given the observed values for y0 , ..., yI−1 , and constrain the likelihood
solution according to a prior of our choice. This prior distribution should describe the
properties of the image to be reconstructed, yielding through Bayes’ theorem a posterior
probability distribution for the image given the data. By maximising this belief over the set of allowable images, we derive a Bayesian estimate of the PET image, optimal with respect to the prior.

9.2.1 Choosing the Prior


In the field of image analysis, priors are chosen such that they best characterise lo-
cally structured image properties. For reasons of computational efficiency and theoretical
tractability, Gibbs distributions of the following form have proven popular [Winkler 1994]:
 
P(\lambda) = \frac{1}{Z} \exp\left(-\frac{1}{T} U(\lambda)\right),    (9.3)

where T is a constant, the temperature, which we will assume to be 1, and U (λ) is the
energy function, a sum of clique potentials, and Z is a normalisation term, the partition
function in the parlance of statistical physics.
Cliques are basically the chosen neighbourhoods, and as such, the energy function
simply monitors how neighbours interact. This, of course, depends on the potentials, whose
qualitative behaviour is decisive in balancing the competing demands of local smoothness
whilst permitting sharp transitions. Of the form φ_{jk}(λ_j − λ_k), where j and k index the pixels and k ≠ j, the potential functions are then - in the words of [Leahy and Qi 1998] -
chosen to reflect two conflicting image properties:

• Images are locally smooth;

• Except where they are not!

To avoid masking true edges, one may utilise structural images either to allow the definition of a number of distinct fields [Levitan and Herman 1987, Lipinski et al. 1997, Sastry and Carson 1997], or to directly employ “line processes” - allowable transitions in the fields [Gindi et al. 1991, Leahy and Yan 1991]. The characteristics of the potential
functions themselves are that they are monotonic, non-decreasing functions of the absolute
intensity difference between neighbouring pixels, |φ| = |(λj − λk )|. The square of this
function, for example, yields the Gauss-Markov prior used in the following section. This
typically results in smooth images, unlikely to generate significant intensity transitions.
[Bouman and Sauer 1993] extend this to use a generalised Gaussian model, the p-Gaussian
model, where ψ(φ) = |φ|p , 1 < p < 2, in an attempt to improve the likelihood of achieving
sharp transitions. A deep discussion of these functions is well beyond the scope of this

introduction, and the reader would do well to refer instead to [Leahy and Qi 1998] for PET
application specific approaches, or [Winkler 1994] for a more general, but very thorough
image science based overview. It suffices for now to mention [Green 1990]'s use of ψ(φ) = log cosh(φ), and [Geman and McClure 1985]'s ψ(φ) = −1/(1 + (φ/δ)²), with δ being a constant controlling the convexity of the potential¹, as ones that are routinely adopted in the literature.
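As an illustration of the qualitative differences between these choices, the following sketch evaluates the potentials mentioned above for an intensity difference φ = λ_j − λ_k. It is written in Python with NumPy; the function names are my own and the snippet is purely illustrative, not taken from any of the cited implementations.

    import numpy as np

    def quadratic(phi):
        # Gauss-Markov potential: penalises large differences heavily.
        return phi ** 2

    def p_gaussian(phi, p=1.5):
        # Generalised (p-)Gaussian potential of [Bouman and Sauer 1993], 1 < p < 2.
        return np.abs(phi) ** p

    def log_cosh(phi):
        # Potential used by [Green 1990]: quadratic near zero, linear for large |phi|.
        return np.log(np.cosh(phi))

    def geman_mcclure(phi, delta=1.0):
        # Non-convex potential of [Geman and McClure 1985]: saturates for large |phi|,
        # so genuine edges are penalised far less severely than under the quadratic.
        return -1.0 / (1.0 + (phi / delta) ** 2)

    # Compare how each potential penalises small and large intensity differences.
    for phi in (0.1, 1.0, 10.0):
        print(phi, quadratic(phi), p_gaussian(phi), log_cosh(phi), geman_mcclure(phi))

Comparing the printed values for small and large φ shows why the saturating potentials are preferred where sharp transitions must survive the regularisation.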

9.2.2 Method of [Lipinski et al. 1997]


A Bayesian reconstruction algorithm using a MRF to describe the relationships between
neighbouring pixels is given in [Lipinski et al. 1997]. Also presented is an algorithm
based on a Gaussian field, but this will be described in section 9.2.3. The implementation
operates over a second-order neighbourhood (eight neighbours in a 2-D plane) for pixels belonging to the same tissue compartment. That is, the boundaries of the random field correspond to the tissue boundaries, and hence sharper intensity transitions may be accommodated.
From [Lipinski et al. 1997], letting η_j be the neighbourhood about the j-th pixel, the Markov property is given as

p(\lambda_j \mid \lambda_k, k \neq j) = p(\lambda_j \mid \lambda_k, k \in \eta_j).    (9.4)
The a priori probability for the tracer distribution is defined as:

P(\lambda) = \frac{\exp(-\beta U(\lambda))}{Z},    (9.5)
where Z is a normalisation constant, β is a parameter defining the strength of the
inter-pixel relationships, and U (λ) is the energy function defined to operate over the
neighbourhood, or clique:
U(\lambda) = \sum_C V_C(\lambda),    (9.6)

where the cliques are indexed by C. VC are the potential functions that operate on
the cliques, typically evaluating the intensity differences returning a higher “potential” for
greater discrepancies, and a zero value when the intensities are the same. That chosen in
the paper is:
VC = ln(cosh |λj − λk |), (9.7)
where the λ are neighbours of the same tissue type found within the clique, C. For
neighbours of different tissue types, the potential is set to zero. Via Bayes’ theorem, we
derive our MAP estimator as:
P(\lambda \mid y) = \frac{p(y \mid \lambda)\, P(\lambda)}{p(y)}    (9.8)

\propto p(y \mid \lambda)\, P(\lambda)    (9.9)

= \prod_{i=0}^{I-1} \exp\!\left(-\sum_j a_{ij}\lambda_j\right) \frac{\left(\sum_j a_{ij}\lambda_j\right)^{y_i}}{y_i!} \; \frac{1}{Z}\exp\!\left(-\beta\sum_C V_C(\lambda)\right).    (9.10)
¹ It is interpreted as a scale parameter on the range of values of λ.

Taking the derivative of this expression with respect to the λ_j, and equating the result to zero, yields:

0 = \sum_i \frac{a_{ij} y_i}{\sum_{j'} a_{ij'}\lambda_{j'}} - \sum_i a_{ij} - \beta \frac{dU(\lambda)}{d\lambda_j},    (9.12)

which may be solved using the One Step Late (OSL) approach proposed by [Green 1990]. Here, the partial derivative dU(λ)/dλ_j is evaluated at the current estimate, λ^k, resulting in the simple update equation:

\lambda_j^{k+1} = \frac{\lambda_j^k}{\sum_i a_{ij} + \beta \left.\frac{dU(\lambda)}{d\lambda_j}\right|_{\lambda=\lambda^k}} \sum_i \frac{y_i a_{ij}}{\sum_{j'} a_{ij'}\lambda_{j'}^k},    (9.13)

where,

\frac{dU(\lambda)}{d\lambda_j} = \sum_C \frac{\sinh|\lambda_j - \lambda_k|}{\cosh|\lambda_j - \lambda_k|}.    (9.14)
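A minimal sketch of one OSL iteration in the style of equation 9.13 follows, written in Python with NumPy. The dense system matrix, the precomputed same-tissue neighbour lists and all variable names are assumptions made purely for illustration; note also that the signed tanh (the derivative of the ln cosh potential) is used for dU/dλ_j.

    import numpy as np

    def osl_update(lam, A, y, beta, neighbours):
        # lam        : current image estimate lambda^k, shape (J,)
        # A          : system matrix a_ij, shape (I, J)
        # y          : measured projections, shape (I,)
        # beta       : strength of the MRF prior
        # neighbours : list of index arrays; neighbours[j] holds the (assumed,
        #              precomputed) same-tissue neighbours of pixel j
        # dU/dlambda_j at the current estimate: the derivative of ln cosh is tanh.
        dU = np.array([np.sum(np.tanh(lam[j] - lam[neighbours[j]]))
                       for j in range(lam.size)])
        ybar = A @ lam                                    # expected projections
        backproj = A.T @ (y / np.maximum(ybar, 1e-12))    # sum_i y_i a_ij / ybar_i
        sens = A.sum(axis=0)                              # sum_i a_ij
        return lam * backproj / (sens + beta * dU)

In practice the neighbour lists would be derived from the segmented MRI data, with cross-tissue pairs excluded so that the potential is zero across tissue boundaries, as described above.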

9.2.3 Method of [Sastry and Carson 1997]


In this work, segmented MRI data were used to assign four tissue composition labels to
each PET voxel. These consisted of the relative contributions of CSF, WM, GM and other
(i.e., anything remaining), respectively. To achieve this, a fuzzy segmentation algorithm
was used allocating the probability of membership to each of the four classes; for example,
a pixel might be 32% CSF, 5% WM, 23% GM and 40% other. This then provided the basis
of their model, which describes the image data as a sum of activities for each [classified]
tissue type weighted according to this tissue composition. That is, the authors introduce
auxiliary variables λjn , representing the activity level of tissue type n at pixel j:
\lambda_j = \sum_n \lambda_{jn}.    (9.15)

This application of a tissue composition model is certainly original, but the remainder
of the approach is based on the same Gaussian field model as used by [Lipinski et al.
1997]. The assumptions then are very much the same:
• Global Homogeneity: within each tissue compartment, the activity concentrations
are Gauss distributed with a unique mean. This is expressed with a Gauss random
field (GRF).

• Local Homogeneity: within a homogeneous activity distribution, neighbouring pixels tend to have similar values. This they model with a MRF.
The first prior enforces no local neighbourhood properties on the reconstruction, but
instead assumes that an activity level corresponding to a tissue of a known class will not
be significantly different from that of the mean activity level for that class. In the case
of the second prior, the piece-wise smooth assumption returns, and the idealisation of
a PET image being one in which activities exhibit little variation across neighbouring
pixels is used. The changes allowed in activity are captured by the Gaussian fields, and the original estimates of the tissue activity are derived from knowledge of the tissue type.

Sastry and Carson actually apply this prior in combination with that of the first (the
so-called Smoothness-Gaussian prior), thus combining local variation with the constraint
that activity levels should remain within sensible global bounds, astutely capturing both
of the assumptions given above.
Ignoring for the moment how the individual components of the tracer distribution are estimated, we see that our final image (λ_j) is derived within the normal model of the PET acquisition system:

\bar{y}_i = E(y_i) = \sum_j \frac{a_{ij}}{A_i N_j} \lambda_j + r_i + s_i,    (9.16)

where ȳi is the expected value of the projection counts for the ith projection ray;
aij refers to our geometric probabilities2 of an emission occurring at the j th pixel being
detected in the ith LOR; Ai is an attenuation correction factor; Nj is a normalisation
correction factor for variant detector efficiencies; and ri and si are estimates for randoms
and scatter contributions in the ith detector pair, respectively. Returning to the notion of
a tissue-based estimation approach, the authors must reformulate the system matrix to
account for the weights assigned by the segmentation. Thus:
C_{ijn} = \frac{a_{ij} m_{jn}}{A_i N_i},    (9.17)
where m_{jn} is the affinity of the j-th pixel to one of the four classes, labelled by n, and \sum_n m_{jn} = 1. This model is then completely described by the following equation:

\bar{y}_i = \sum_{n=0}^{3} \sum_{j=0}^{J-1} C_{ijn} \lambda_{jn} + r_i + s_i.    (9.18)

That is, the projections are modelled as a summation of emissions across each of the
tissue classes, plus the scatter and random terms. As such, the solution for λj must be
derived from a summation of the individual estimates for the component activities, λjn .
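To make the indexing concrete, the sketch below builds the weighted system matrix of equation 9.17 and evaluates the forward model of equation 9.18 (Python with NumPy; the dense arrays and all variable names are assumptions of mine, used for illustration only).

    import numpy as np

    def tissue_system_matrix(a, m, A_corr, N_corr):
        # C_ijn = a_ij * m_jn / (A_i * N_i), as in equation 9.17.
        # a      : geometric probabilities, shape (I, J)
        # m      : fuzzy tissue composition, shape (J, 4), rows summing to 1
        # A_corr : attenuation correction factors, shape (I,)
        # N_corr : detector normalisation factors, shape (I,)
        corrected = a / (A_corr[:, None] * N_corr[:, None])   # shape (I, J)
        return corrected[:, :, None] * m[None, :, :]          # shape (I, J, 4)

    def forward_model(C, lam_tissue, r, s):
        # ybar_i = sum_n sum_j C_ijn * lambda_jn + r_i + s_i, as in equation 9.18.
        # lam_tissue : tissue-type activities lambda_jn, shape (J, 4)
        # r, s       : randoms and scatter estimates, shape (I,)
        return np.einsum('ijn,jn->i', C, lam_tissue) + r + s

A real implementation would of course use a sparse or on-the-fly geometric matrix rather than the dense array assumed here.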

Defining Prior Distributions of λ


In defining an image model, assumptions must be made regarding the nature of the image
distribution. Admittedly, these are subjective, the often criticised aspect of the Bayesian
formulation. For this work, and that of [Lipinski et al. 1997], the assumptions are given
above, and the priors used follow a Gibbs representation:

P(\lambda) = \frac{1}{Z}\exp(-U(\lambda)),    (9.19)
where U defines the energy function chosen to impose appropriate constraints on the
reconstructed activities, and Z is our normalisation term involving the summation over
all possible configurations of λ. This is of the form:
Z = \sum_{\lambda \in R} \exp(-U(\lambda)).    (9.20)
² Note that their system matrix is said to include effects due to detector resolution. See previous discussions of section 5.3.1.

Typically, Z is ignored, as proportionality in the equations allows this. Its estimation is required to optimally derive a solution, eliminating the need for estimating the hyperparameters of the random field distribution [Green 1990]. The idea now would be
to construct U such that low energy states are consistent with our expectations regarding
the distribution of λ. This is done in the following.

The Global Mean Prior The Global Mean Prior says that the distribution of activity
levels within each tissue type is reasonably close to the mean activity level of that type.
This is embodied in a Gaussian prior probability density Pg (λ), whose energy function is
minimal when the activities for each tissue type are closest to their global means, denoted
λ̄n . As such, the prior is then at a maximum.
P_g(\lambda) \propto \exp(-U_g(\lambda)),    (9.21)

where the energy function is

U_g(\lambda) = \sum_n \sum_j \frac{(\lambda_{jn} - \bar{\lambda}_n)^2}{2\sigma_n^2}.    (9.22)

Note that the choice of σ was fixed for all n tissue types, and that little experimentation had been done to optimise its value. For this prior, these standard deviations were set according to:

\sigma_n = \frac{K \bar{\lambda}_n}{2(2\ln 2)^{1/2} \cdot 100},    (9.23)

where the resulting Gaussian has a FWHM of K% of λ̄n.
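As a small numerical check of this relation (a Python sketch with arbitrary, assumed values), the standard deviation follows from the usual FWHM relation FWHM = 2(2 ln 2)^{1/2} σ:

    import math

    def sigma_from_fwhm_percent(K, lam_bar):
        # FWHM = (K / 100) * lam_bar, and sigma = FWHM / (2 * sqrt(2 * ln 2)).
        return (K / 100.0) * lam_bar / (2.0 * math.sqrt(2.0 * math.log(2.0)))

    # e.g. K = 50 (a 50% FWHM) about a mean tissue activity of 100 gives sigma of about 21.2.
    print(sigma_from_fwhm_percent(50, 100.0))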

The Smoothness-Global Mean Prior The Smoothness-Global Mean Prior adds the
further assumption that the variation of activity within each tissue type is locally corre-
lated, and as such a notion of a neighbourhood is included. This prior is used in conjunction
with the first:
P_{sg}(\lambda) \propto \exp(-(U_g(\lambda) + U_s(\lambda))),    (9.24)

where U_g is as defined above, and our new energy term, U_s, is defined as:

U_s(\lambda) = \frac{1}{\gamma} \sum_n \sum_j \sum_{j' \in N_j} \frac{(\lambda_{jn} - \lambda_{j'n})^2}{2\sigma^{s\,2}_{jn}}.    (9.25)

Here, the λ_{j'n} are activities for tissue n at pixel j', where j' references a neighbourhood about the j-th pixel, N_j, and γ denotes the total number of neighbours. Consequently, the expression sums and averages the values over this region. And as before, σ^s_{jn} determines the degree of influence of the prior, in this case being chosen as:

\sigma^s_{jn} = \frac{K_s \bar{\lambda}_{jn}}{2(2\ln 2)^{1/2} \cdot 100},    (9.26)

where the λ̄_{jn} are the mean activities of tissue type n taken about pixel j and its neighbours. If the average intensity value over a neighbourhood is low, then the corresponding Gaussian will be very narrow, and the prior estimates will always be adopted, which may be undesirable.

The Algorithm The algorithm used to derive the MAP estimates of λjn is based on the
EM-ML algorithm. Here, the complete data set is denoted, ψijn , which as before indicates
the number of emissions occurring in pixel j that are registered in the detector pair i, with
an additional index for the tissue type. That is:
y_i = \sum_j \sum_n \psi_{ijn} + r_i + s_i.    (9.27)

As we saw in equation 7.40 of chapter 7, the ψijn are also Poisson variables. The
likelihood derived in this paper, however, must include the individual tissue components
again indexed by n:
L_c(\psi) = \prod_i \prod_j \prod_n \exp(-C_{ijn}\lambda_{jn}) \frac{(C_{ijn}\lambda_{jn})^{\psi_{ijn}}}{\psi_{ijn}!}.    (9.28)

The iterative equation for λ using the global mean prior is defined to be:

\lambda^{k+1}_{jn} = \frac{1}{2}\left(\bar{\lambda}_n - \sigma_n^2 \sum_i C_{ijn}\right) + \frac{1}{2}\sqrt{\left(\bar{\lambda}_n - \sigma_n^2 \sum_i C_{ijn}\right)^2 + 4\sigma_n^2 \sum_i N^k_{ijn}},    (9.29)

where N^k_{ijn} is the expectation of ψ_{ijn} conditional on y and λ^k. This was given previously in equation 7.38, and is rewritten here to include the tissue compartments as:

E(\psi_{ijn} \mid y, \lambda^k) = \frac{C_{ijn}\lambda^k_{jn} y_i}{\sum_{l,m} C_{ilm}\lambda^k_{lm} + r_i + s_i},    (9.30)

where k defines the iteration number, and l and m are additional indexes for the image
pixels and tissue types, respectively.
Fundamental to the operation of the algorithm is the behaviour of the standard de-
viation term. This has properties that as σn → 0, the estimates tend toward λ̄n . On
the other hand, as σn → ∞, the algorithm approaches that of the EM-ML algorithm as
described in chapter 7. The iterative equation for the Smoothness-Global Mean prior is
obtained in a similar way, yielding:
\lambda^{k+1}_{jn} = \frac{1}{2}\left(\bar{\Lambda}_{jn} - S^2_{jn} \sum_i C_{ijn}\right) + \frac{1}{2}\sqrt{\left(\bar{\Lambda}_{jn} - S^2_{jn} \sum_i C_{ijn}\right)^2 + 4S^2_{jn} \sum_i N^k_{ijn}},    (9.31)
where S^2_{jn} is defined as:

S^2_{jn} = \left(\frac{1}{\sigma_n^2} + \frac{1}{\sigma^{s\,2}_{jn}}\right)^{-1},    (9.32)

and \bar{\Lambda}_{jn} is:

\bar{\Lambda}_{jn} = \left(\frac{\bar{\lambda}_n}{\sigma_n^2} + \frac{\bar{\lambda}_{jn}}{\sigma^{s\,2}_{jn}}\right) S^2_{jn}.    (9.33)

The difference between the algorithm given in equation 9.29 and that of equation 9.31 is said to lie:

in the replacement of σ²_n by S²_jn, and λ̄_n by Λ̄_jn. S²_jn and Λ̄_jn define the constraint strength and mean activity levels calculated by a weighted average of the values specified by the Gaussian and the smoothness constraints individually.

Least-Squares Activity Estimate All that remains to fully describe the algorithm presented in this paper is the definition of the mean activity for each tissue type, λ̄_n. This is done using a least-squares estimate from equation 9.18 under the assumption that activity in each tissue type is uniform (λ_{jn} = λ̄_n, ∀j). Defining y'_i ≡ y_i − r_i − s_i, and C'_{in} = \sum_j C_{ijn}, equation 9.18 can be rewritten as y'_i = \sum_n C'_{in} \bar{\lambda}_n. This is solvable using:

\bar{\lambda}_{ls} = [C'^T C']^{-1} [C'^T y'].    (9.34)
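A sketch of this least-squares step is given below (Python with NumPy; the array names are mine, and np.linalg.lstsq is used in place of the explicit normal-equation form written above, to which it is equivalent when C' has full column rank).

    import numpy as np

    def mean_tissue_activities(C, y, r, s):
        # Least-squares estimate of the mean activity per tissue type (equation 9.34).
        # C    : tissue-weighted system matrix C_ijn, shape (I, J, 4)
        # y    : measured projections, shape (I,)
        # r, s : randoms and scatter estimates, shape (I,)
        y_prime = y - r - s                 # y'_i
        C_prime = C.sum(axis=1)             # C'_in = sum_j C_ijn, shape (I, 4)
        lam_bar, *_ = np.linalg.lstsq(C_prime, y_prime, rcond=None)
        return lam_bar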

In the iterative reconstruction process, these estimates were continually updated, al-
though this showed no sign of improving results, and, furthermore, is questionable. Con-
sider the following from [Sivia 1996]:

When one first learns about Bayes’ theorem, and sees how the data modify
the prior through the likelihood function, there is occasionally a temptation to
use the resulting PDF as the prior for a re-analysis of the same data. It would
be erroneous to do this, and the results quite misleading. In order to justify
any data analysis procedure, we must be able to relate the PDF of interest to
others used in its calculation through the sum and product rules of probability
(or their corollaries). If we cannot do this, then the analysis will be suspect,
at best, and open to logical inconsistencies. If we persist in our folly and keep
repeating it, the resulting posterior PDF’s will become sharper and sharper;
we will just fool ourselves into thinking that the quantity of interest can be
estimated far more accurately than is warranted by the data.

Discussion This approach is clever in that it does not directly employ any smoothing
through the use of the priors. Instead, as the priors only apply to the tissue-type activities
λjn individually, the resulting images maintain contrast at edges corresponding to the
boundaries of these tissue classes. Such boundaries, however, are seemingly imposed on
the reconstruction regardless of what the data is otherwise telling us. That is, in cases
where variation is large away from the expected mean for that given tissue type, in using
only n Gaussian fields the algorithm simply does not have the flexibility to adapt to it.
Violations of the assumptions of the prior, for example, due to misregistration between
the MRI and the PET data, can subsequently introduce large biases, particularly for the
values of the control parameters that produce the greatest noise reduction. If the difference
between the prior (λ̄n ) and the reconstructed pixel value (λjn ) of equation (9.22) is large,
then our energy function increases, and our prior rapidly3 falls to zero. In this case, the
estimate for λjn is considered unacceptable by the algorithm, and hence the mean value for
the tissue class will instead be adopted (λ̄n ). Their experiments “suggest that there may
be a small advantage to iteratively updating the global tissue-type activities, particularly
if there are large errors in the initial estimates of λ̄n ”. However, with good estimates of
³ Dependent on σ_n.

[Figure 9.1: reconstruction size 128x128 (3.43x3.43mm). Panels: Starting Estimate (OSEM 5 its., 8 subsets; FWHM = 6.86mm), 1st, 2nd and 3rd Iteration, with likelihood values 3350560, 2843840, 2830518 and 2829472, respectively. The probabilistic segmentations used: GM, WM, CSF.]
Figure 9.1: The above figure shows some results using the Gaussian field approach of [Sastry
and Carson 1997] and the global mean prior. The top-left image shows the OSEM produced
initial estimate of the reconstruction. This is used as the mean tissue activity values (the λ̄n of
equation 9.22), hence a back-projected image would invalidate these. Three reconstructions are
shown starting from this OSEM image, these being (from left to right) at the first, second and
third iteration steps of the Bayesian procedure. The bottom row shows the segmentations that
were used to define the tissue components of the reconstruction.

these levels, performance did not depend on the updating of these priors, and the above
advice of [Sivia 1996] should be heeded.
A shared goal of this doctoral work is to address the PVE, and we are of a like-mind
concerning how this should be best achieved. Namely, that although post-reconstruction
methods for PVE removal (see chapter 2) are to hand, we think that the incorporation of
anatomical information directly into the reconstruction process offers the more effective
usage, and is likely to then yield the better results. Subsequently, the framework provided
by this work is very suitable for PET correction and reconstruction, all the more so in
situations where the prior estimates can be improved. But the inflexibility of the algorithm
is a worry, as the framework within which it has been developed will always prefer a homogeneous distribution. Addressing this issue is the subject of the next chapter.
10
Applying New Gaussian Priors

Chapter 3 concluded with an algorithm capable of deriving, in a very robust manner, a


high resolution PET image. Assuming a good co-registration, the quality of the resulting
image is limited by the quality of the segmentations made on the MRI data to delineate the
tissue regions. In efforts toward higher resolution reconstructions, just such a distribution applied as a prior may be appropriate. Low noise of course remains requisite, and again the PET correction derived in that chapter is applicable; applicable, that is, in conjunction with section 9.2.3.
This chapter develops this procedure so as to couple the correction and reconstruction procedures in order to best address the PVE. To do this, the method of [Sastry and Carson 1997] must be extended to incorporate a prior that models variation in each tissue compartment. It is assumed throughout, therefore, that references to the prior used for the PET image to be reconstructed relate to the PVE correction result of chapter 3, which we will refer to as the alternative Gaussian (AG) prior, P_AG(λ). This chapter covers the
background to the work summarised in [Oakley et al. 1999a].

10.1 The Form of the Existing Priors

The discussion in section 9.2.3 of [Sastry and Carson 1997]’s paper showed a typical way
in which prior knowledge regarding an image distribution is applied in the form of an
energy function to the appropriate optimisation function. As such, the energy term itself (equation 9.22) is minimal when the estimate for the parameter (λ_{jn}) matches the estimated mean (λ̄_n). Employed in the prior P_g(λ), and because of the exponential term of equation 9.21, this yields the desired maximum when the values are in exact agreement.
The a posteriori probability for tissue-type activities given the measured counts is:

P (λ|y) ∝ p(y|λ)P (λ). (10.1)

In the above, p(y|λ) is our likelihood function (of the known - Poisson - parametric
form), and P (λ) is the prior probability model, which was that developed in section 3.4,
whose application in the form of an energy term is now further discussed.

10.2 The Application of the New Prior Distribution


The prior is of the following form,

P(\lambda) = \frac{1}{Z}\exp(-U(\lambda)),    (10.2)
where Z is a normalisation term (the partition function), and U (λ) is the energy
function chosen to impose appropriate constraints on the reconstructed activities. Of this
form, [Sastry and Carson 1997] used two different yet similar priors, the global mean
prior and the smoothness-global mean prior. In application, both of these priors applied
across tissue types, allowing the authors to maintain edge information by estimating these
separately. This approach makes sense because separate Gaussian fields may then be
applied for each tissue type, preferably reflecting the expected difference in variation (see
[Friston et al. 1995]).
The assumption under which the Gaussian prior of [Sastry and Carson 1997] was
applied states that the distribution of activity levels within each tissue type is reasonably
close to the mean activity level of that type. This is termed their “global-mean” constraint,
which is expressed via a Gaussian prior probability density, Pg (λ) (equation 9.21). The
approach proposed here can be interpreted in two ways, depending on the choice of the
prior used. The simple interpretation is that it extends the global-mean constraint to allow
for inhomogeneous distributions. A second interpretation is the analogy to the smoothness-
global prior in the case where the prior is derived using the method presented in chapter 3.
In this case, the estimate of tissue activity comes from a single prior data set as estimated
using the method of section 3.4, and the smoothness term is implicitly replaced by the
constraints on local variation supplied from the basis functions in the manufacturing of the
prior. The energy term (equation 10.2) is minimal where the activity (λ_j) at each pixel equals the corresponding activity in the prior, λ^p_j, which is defined for each pixel.
The values of λpj are derived from equation 3.13 of section 3.4. Hence it is possible to avoid
the piecewise-smoothness assumption common to these methods (as the estimated prior
will not be piecewise-smooth), a disadvantage of the [Sastry and Carson 1997] method.
It is important to note, however, that in using the prior derived from section 3.4, con-
straints on the within-compartment distributions are not entirely relaxed, as the [valid]
assumption that neighbouring regions of the PET image should show similar activity levels
is made explicit by constraining the basis function coefficients to being locally stationary
but smoothly varying in the estimate of the PET distribution (equation 3.14). Indeed,
the choice of the dimensionality of the basis functions can in many senses be equivalent to the choice of the standard deviation (σ_{jn}) terms found in equation 9.25 of section 9.2.3, which defines the locality about which the energy term is applied. In effect then, the local
assumptions on the activity distribution can be met.
The prior is now defined as

PAG (λ) ∝ exp(−UAG (λ)), (10.3)

where the energy function [to be minimised] is Gaussian:

U_{AG}(\lambda) = \sum_j \frac{(\lambda_j - \lambda^p_j)^2}{2\sigma_j^2}.    (10.4)

10.3 Deriving the Algorithm


In accordance with chapter 7, the likelihood function of the complete data set, ψ_{ij}, is:

L_c(\bar{\psi}) = \prod_{i=0}^{I-1} \prod_{j=0}^{J-1} \frac{\bar{\psi}_{ij}^{\psi_{ij}} \exp(-\bar{\psi}_{ij})}{\psi_{ij}!}.    (10.5)

This was originally given in equation 7.40. We then derive the expectation value for the ψ_{ij}, as in equation 7.38:

E(\psi^{k+1}_{ij} \mid \lambda^k_j, y_i) = \frac{a_{ij}\lambda^k_j y_i}{\sum_{j'=0}^{J-1} a_{ij'}\lambda^k_{j'}} = N^k_{ij}.

The log likelihood as given in equation 7.43 is:

l_c(\bar{\psi}) = \ln L_c(\bar{\psi}) = \sum_{i=0}^{I-1} \sum_{j=0}^{J-1} \left(N^k_{ij}\ln(a_{ij}\lambda_j) - a_{ij}\lambda_j - \ln N^k_{ij}!\right).    (10.6)

In the Bayesian approach, instead of maximising the likelihood function, it is necessary to maximise the a posteriori probability:

\mathrm{map}(\bar{\psi}) = \mathrm{argmax}_\lambda \{L_c(\bar{\psi})\exp(-U_{AG}(\lambda))\}
\;\Rightarrow\; \mathrm{argmax}_\lambda \{l_c(\bar{\psi}) - U_{AG}(\lambda)\}
\;\Rightarrow\; \mathrm{argmax}_\lambda \left\{\sum_i \left(y_i \log \sum_j a_{ij}\lambda_j - \sum_j a_{ij}\lambda_j - \ln y_i!\right) - \sum_j \frac{(\lambda_j - \lambda^p_j)^2}{2\sigma_j^2}\right\}.    (10.7)
The prior knowledge can then be seen to act as a penalty function when maximising the
above term. The optimisation process itself can be derived by firstly taking the derivative
w.r.t. λj :

\frac{\partial\,\mathrm{map}()}{\partial \lambda_j} = \sum_i \left(\frac{a_{ij} y_i}{\sum_{j'} a_{ij'}\lambda_{j'}} - a_{ij}\right) - \frac{\lambda_j}{\sigma_j^2} + \frac{\lambda^p_j}{\sigma_j^2}.    (10.8)

To optimise, we set this derivative to zero, multiply throughout by λ_j to derive the fixed-point iterative scheme, and rearrange (multiply by −1) such that:

\frac{\lambda_j^2}{\sigma_j^2} + \lambda_j \left(\sum_{i=0}^{I-1} a_{ij} - \frac{\lambda^p_j}{\sigma_j^2}\right) - \sum_{i=0}^{I-1} N^k_{ij} = 0.    (10.9)

The solution of this quadratic equation involves first multiplying throughout by σ_j^2 to isolate the λ terms, taking λ_j = λ^{k+1}_j, and solving for the positive root. This yields the following iterative scheme, due originally to [Levitan and Herman 1987]:

\lambda^{k+1}_j = -\frac{1}{2}\left(\sigma_j^2 \sum_i a_{ij} - \lambda^p_j\right) + \frac{1}{2}\sqrt{\left(\lambda^p_j - \sigma_j^2 \sum_i a_{ij}\right)^2 + 4\sigma_j^2 \sum_i N^k_{ij}},    (10.10)

where N^k_{ij} may be extended, if possible, to include scatter and random terms, r_i and s_i respectively, giving a_{ij}\lambda^k_j y_i / (\sum_{j'} a_{ij'}\lambda^k_{j'} + r_i + s_i), and the meaning of the remaining terms and indices is as defined previously in equation 9.29. The implementation is detailed in section A.5 of appendix A.
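The update of equation 10.10 can be written very compactly. The sketch below (Python with NumPy; a dense system matrix and the variable names are assumptions for illustration, and scatter and randoms are omitted) performs one iteration of the scheme.

    import numpy as np

    def ag_map_update(lam, A, y, lam_prior, sigma):
        # One iteration of equation 10.10.
        # lam       : current estimate lambda^k, shape (J,)
        # A         : system matrix a_ij, shape (I, J)
        # y         : measured projections, shape (I,)
        # lam_prior : pixel-wise prior image lambda^p_j, shape (J,)
        # sigma     : pixel-wise Gaussian widths sigma_j, shape (J,)
        ybar = A @ lam                                     # expected projections
        N = lam * (A.T @ (y / np.maximum(ybar, 1e-12)))    # sum_i N^k_ij
        sens = A.sum(axis=0)                               # sum_i a_ij
        var = sigma ** 2
        b = var * sens - lam_prior
        return 0.5 * (-b + np.sqrt(b ** 2 + 4.0 * var * N))

In the limits this behaves as described in the following section: as σ_j tends to zero the update tends to the prior value λ^p_j, and as σ_j grows the EM-ML estimate dominates.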

10.4 Selecting the Gaussian’s Standard Deviation (σ)


The correction method of section 3.4 that derives an intensity transformed image based
on activity levels in its associated PET image allows an estimation of activity levels for
each PET pixel at the resolution of the MRI image. Using this result, the energy term of equation 10.4 can be applied at the pixel level, as it is now sensible to constrain the solution differently for each pixel.
This is done simply by varying the standard deviation term, σj , of equation 10.4.
How this should be varied is subject to experimentation, but one approach that might be
applicable is implied in figure 3.10. That is, the discussion regarding this figure concerned itself with the need for the intensity transformation to capture variation in the PET data within a homogeneous region of the MRI image. The conclusion was that the granularity of the basis functions must be sufficient to capture this variation, else true activity might be masked by the MRI data. Such caution should also be applied
when using the resulting prior in the energy term; all the more so given the computational
limitations currently restricting the resolution of the basis functions1 . That is, in order
to avoid masking true variation in activity, the PET data alone should determine the
outcome of the reconstruction in regions shown to be of a homogeneous tissue composition.
This would simply imply letting the likelihood function dominate in equation 10.7 above.
This corresponds to increasing the width of the Gaussian field, because as σj → 0, the
reconstruction tends toward the prior, and as σj → ∞, then the algorithm tends toward
the EM-ML solution. Alternatively, where we are sure of structural variation in the MRI
data, we should also assume variation in the activity levels of the PET image as the
underlying structure is common to both image sets. We can interpret the selection of this
hyperparameter in a physical sense. It gives us an estimate of the likely contribution from
the PVE. Where variation is high in the MRI data, then the PVE is likely to be greater.

10.4.1 Local Estimates of PVE

We mentioned that figure 3.10 implies the solution to how the σ term must vary. An
entropy measure taken on small windows about each pixel in either the MRI image or
the prior yields a measure of variation, or structure, in that image. This in turn can be
thought of as an approximate estimate for the likely PVE contribution.
Entropy is a measure of the lack of information. Its magnitude is smallest when all
of the probability mass is concentrated in one bin of the histogram. Thus, where the
structure is more notable, intensity variation is likely to be higher, and the magnitude of
the entropy will increase.

¹ See end of section 3.4.

Taken in small windows of the image data, entropy is defined as the following:

e_j = -\sum_{g=0}^{G} p_j(g) \log p_j(g),    (10.11)

where g indexes the G different grey-scales, j indexes the pixel about which the window
is centered, and the pj (g) are the histograms for those windows. In such regions where the
entropy is larger, then the σj term should also become larger as there is little structural
information, and the EM-ML solution should dominate. That is, σj should be proportional
to the entropy image. Figure 10.1 shows how an entropy measure can relate to a prior
(far left) for varying window sizes, about which the measure is taken.
[Figure: panels show the input image (a T1 weighted MRI slice) and entropy images for 3x3, 5x5 and 7x7 windows.]
Figure 10.1: From left to right, the figures shown above are: a T1 weighted MRI slice; entropy
measures of the MRI slice using a window of 3-by-3, 5-by-5 and 7-by-7 pixels, respectively. In
these images (of a resolution of 256-by-256 pixels), the darker the intensity, the lower the entropy;
i.e., the more notable the structure (equation 10.11).

The entropy measure used is calculated over a finite region, typically set to a 3-by-3
window. This is a relatively arbitrary choice, and may be improved as there are factors that
should hold influence. Firstly, in using the correction method of section 3.4, the granularity
achieved by the basis functions imposes a notion of a neighbourhood (section 3.5): it is
the largest spatial extent that cannot be adequately approximated. This is clearly visible,
for example in the demonstration of figure 3.8. Secondly, there is a notion of minimal
activation; the smallest spatial extent of an active region in the brain. However, this
information is probably more relevant in the choice of basis functions (it sets the upper
limit), so it will be largely ignored for now. Consequently, ej , the entropy image, should
be determined as a function of two variables: B, the number of basis images, and the MRI data around the j-th pixel.
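A simple sketch of the windowed entropy measure of equation 10.11 is given below (Python with NumPy; the window size, the number of grey-level bins and the function name are assumptions made for illustration).

    import numpy as np

    def entropy_image(img, window=3, n_bins=64):
        # Windowed entropy (equation 10.11) computed about every pixel of a 2-D image
        # (e.g. the MRI slice or the prior), assumed here to be scaled to [0, 1].
        half = window // 2
        padded = np.pad(img, half, mode='reflect')
        e = np.zeros_like(img, dtype=float)
        rows, cols = img.shape
        for r in range(rows):
            for c in range(cols):
                patch = padded[r:r + window, c:c + window]
                hist, _ = np.histogram(patch, bins=n_bins, range=(0.0, 1.0))
                p = hist / hist.sum()
                p = p[p > 0]                  # 0 log 0 is taken as 0
                e[r, c] = -np.sum(p * np.log(p))
        return e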

10.4.2 Application of σ in the Algorithm


Having decided that an entropy measure taken on the prior image should influence the
energy term, its actual value and range of variation should also be made explicit. Obviously
σ must be scaled to the image data. As such, it seems reasonable that it should be
developed based around the method of [Sastry and Carson 1997]. In using the tissue
compartment models, their approach is said to allow a physiological interpretation of the
parameter K, namely the percent variability of activity within tissue n. Accordingly,
the variation for each tissue class is weighted with respect to the expected intensity for
that class, which, being greater for GM, allows the greater variation to occur within this

tissue. Less variation is thus allowed in WM, and the least in CSF. This effect is actually
undocumented in the original paper, but it certainly makes good physiological sense (refer
again to [Friston et al. 1995]). To capture such variation, the standard deviation should
indeed be associated to its tissue class; it is only the means of the Gaussian that are
estimated at the pixel resolution. An additional option, however, involves the weighting
due to the entropy measure, ej . This measure is simply scaled to within 1 ± 0.6 (1.6
where the prior shows greatest structure, and 0.4 in completely homogeneous regions).
The resulting Gaussian standard deviations are thus calculated from:
\sigma_j = \frac{K \cdot C_j \cdot e_j}{2(2\ln 2)^{1/2} \cdot 100},    (10.12)
where the Cj are some scaling constants which, after much experimentation, are now
defined as:
Cj = mgm (j).λ̄gm + mwm (j).λ̄wm + mcsf (j).λ̄csf , (10.13)
where mgm , mwm and mcsf are the probabilistic segmentations of GM, WM and CSF,
respectively; and λ̄gm , λ̄wm and λ̄csf are the estimated tissue activity means for the classes
GM, WM and CSF. These are estimated from the mean of the reconstruction’s starting
estimate for λ (hence it is recommended to use an EM-ML reconstruction for this), and the
activity ratios GM:WM:CSF. The tissue means themselves are set according to:
\bar{\lambda}_{gm} = \frac{\sum_j \lambda(j)\, m_{gm}(j)}{J}, \qquad \bar{\lambda}_{wm} = \frac{\sum_j \lambda(j)\, m_{wm}(j)}{J}, \qquad \bar{\lambda}_{csf} = \frac{\sum_j \lambda(j)\, m_{csf}(j)}{J},    (10.14)
where mgm , mwm , and mcsf are the probabilistic segmentations of GM, WM and CSF,
respectively, and J the number of pixels. This estimate is entirely data driven and, as the results will show, performs well. In the case of exact registrations and
segmentations, we get the desired variability in terms of accommodating greatest variation
in GM regions. Furthermore, if true WM regions (according to the segmentation) contain
a tumor, for example, and the take-up of tracer to this region is high (i.e., an unexpectedly
variant distribution in the WM which could not be predicted from the segmentation alone),
then the σj values will increase for these regions, allowing this take-up to be reproduced in
the reconstruction. It would undoubtedly be masked using the global-mean algorithm of
[Sastry and Carson 1997], as Gaussians are applied only at the resolution of entire tissue
classes.
Unfortunately, in the case of segmentation or registration errors, a GM classified pixel in the MRI could be associated with a WM activity in the PET data, for example. As such, the above scheme will tend to average the estimates such that the standard deviations (but not their means, note) become close to equal. Because of this, an additional option is available to constrain the resulting Gaussians such that their variation is more indicative of the majority² tissue class shown in the MRI segmentation. Hence equation 10.14 becomes:
² It is important to remember that we use probabilistic segmentations. This allows the width of the Gaussians to adopt a continuum of different values between 100% GM and 100% WM, for example.

\bar{\lambda}_{gm} = 1.0 \cdot \frac{\sum_j \lambda(j)\, m_{gm}(j)}{J}, \qquad \bar{\lambda}_{wm} = 0.3 \cdot \frac{\sum_j \lambda(j)\, m_{wm}(j)}{J}, \qquad \bar{\lambda}_{csf} = 0.1 \cdot \frac{\sum_j \lambda(j)\, m_{csf}(j)}{J},    (10.15)

where the assumed activity ratios are 10:3:1:0 for the classes GM:WM:CSF:other.
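Putting equations 10.12 to 10.15 together, a sketch of the pixel-wise standard deviation computation might read as follows (Python with NumPy; the scaling of the entropy image to 1 ± 0.6 and all array names are assumptions of mine; the default weights correspond to the ratios of equation 10.15).

    import numpy as np

    def pixelwise_sigma(K, lam_start, m_gm, m_wm, m_csf, entropy,
                        weights=(1.0, 0.3, 0.1)):
        # Gaussian widths sigma_j of equation 10.12.
        # lam_start         : starting estimate of the reconstruction (e.g. EM-ML), shape (J,)
        # m_gm, m_wm, m_csf : probabilistic segmentations, shape (J,)
        # entropy           : windowed entropy image, shape (J,)
        J = lam_start.size
        # Tissue activity means (equations 10.14 / 10.15).
        lam_gm = weights[0] * np.sum(lam_start * m_gm) / J
        lam_wm = weights[1] * np.sum(lam_start * m_wm) / J
        lam_csf = weights[2] * np.sum(lam_start * m_csf) / J
        # Pixel-wise scaling constants C_j (equation 10.13).
        C = m_gm * lam_gm + m_wm * lam_wm + m_csf * lam_csf
        # Entropy weights scaled to 1 +/- 0.6.
        rng = max(entropy.max() - entropy.min(), 1e-12)
        e_scaled = 0.4 + 1.2 * (entropy - entropy.min()) / rng
        # Equation 10.12.
        return K * C * e_scaled / (2.0 * np.sqrt(2.0 * np.log(2.0)) * 100.0)

Passing weights of (1.0, 1.0, 1.0) recovers the purely data-driven means of equation 10.14.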
The width of each Gaussian field is, therefore, dependent on its segmentation and possibly an estimate of the PVE's influence. The result is that the more GM a pixel contains, the wider the Gaussian, and the greater the expected variability. The
Gaussian fields are not, however, fully specified. What is important is their chosen centres.
These are taken from the prior image estimate, which could then be the PVE corrected
image, as opposed to adopting simple tissue mean values. The effect these additional
constraining terms have on the reconstruction is shown in the bottom row of figure E.1, and
elaborated upon for a small ROI in figure E.2 (these figures are to be found in appendix E).
These are discussed in the following section.

10.5 Results of using the Alternative Gaussian Prior


The results of using the above presented scheme are given in figures E.1 and E.2 for one segmented set, and figures E.4 and E.5 for another (appendix E). In both cases, the
prior is taken from a least-squares fit of the MRI segmentations to the PET data using
activity ratios of 10:3:1 for GM:WM:CSF and an approximation of the PSF relating the
PET resolution to that of the MRI (it is shown in the far right, below the associated
MRI image). This is equivalent to the method of [Labbé et al. 1997] where only 3 tissue
compartments are used.
The first of these (figure E.1) shows (from left to right) increasing iteration steps
beginning at the same starting estimate, an OSEM reconstruction using 5 iterations and 8 subsets (shown in the middle on the far left). The images are each 128-by-128 pixels in size, each pixel measuring 3.43-by-3.43 mm. The top row shows the Bayesian reconstructions for
K = 50 and the Cj terms being estimated from equation 10.14. This choice of K from
equation 10.12 is a conservative one, but one designed to retain activity in the GM regions.
This it achieves, yet the GM segmentation imposes shape on the reconstruction solution,
and activity is seen to be enhanced in the cortical regions. The second row shows the
reconstructions using the EM-ML algorithm, which is of course equivalent to the Bayesian
method for K = ∞. The next row employs the entropy image (also shown - bottom
right) to add an additional constraining effect as discussed in section 10.4.1 above. This
should predominantly affect edge regions in the image, but as such regions are also those
most susceptible to registration errors, any error here will mask its contribution. This
seems to explain the lack of effect that is resulting from a theoretically and ideologically
sound procedure. The entropy image was taken about a 3-by-3 pixel window in the MRI
image, and the values scaled between 1 ± 0.6, as given in equation 10.11. The last row
instead shows the additional constraints of equation 10.15, designed especially for cases
of registration error. The results here are more pronounced, as highlighted for a specific

ROI shown in figure E.2. Here we see that we can allow the GM distribution to remain
essentially driven by the likelihood algorithm, yet we can tightly constrain the remaining
regions in order to improve the contrast between the tissue regions.

More important are the results of figures E.4 and E.5. These show the same algorithms
using a better registration (done by hand). The reconstructions are now far more convincing. In each case, the choice for the fiddle factor K was deliberately conservative (45 and 60) such that the activity in the GM regions is given almost “complete” freedom to assume its true value. Freedom is, however, not total, as the Gaussian fields do ensure that extreme variance (i.e., the kind that could cause the checker-board effect) cannot occur. But the use of the additional constraints on the WM and CSF distributions will give us good contrast at the GM/WM and GM/CSF borders, which is therefore a PVE correction. This is a very desirable effect, and one that again can only be achieved using the formulation of this chapter.

In this respect, the algorithms show significant promise. The key to the method is the flexibility allowed in applying a Gaussian field to each pixel, and the heuristics realised from using equation (10.14). A PVE corrected image can then be applied as the prior, and our confidence in its accuracy can be used directly to set K of equation (10.12). And as the results of figure E.2 emphasise, the point of the method is that wherever good heuristics are to be found, these can be built directly into the algorithm, as the necessary framework to accommodate this has now been developed. It is therefore envisaged that discussion with neurologists and clinicians would prove fruitful in this respect, as they are the people to involve in the further refinement of the algorithm.

Although the heuristics that were applied to derive the means of equation 10.14 should constrain the Gaussian fields according to the tissue type (i.e., greatest variance allowed for GM, and so on), as was mentioned above, segmentation and registration errors will affect these. This effect is especially important in the case of the additional entropy term. In
appendix E, figure E.3 shows the variation occurring as a result of the entropy measure.
Each row shows plots for the energy term (which is to be minimised) and the log likelihood
(which alone would be maximised) at iterative steps of the algorithm. This is the same
experiment as that shown in figure E.1, but with additional values for the K term. The
left hand-side (the first two columns) shows in each row first the energy value and then the
log likelihood value for K equal to 45, 50 and 55, each time without the application of the
entropy values. The right hand-side shows the same but this time with the application of
the entropy values. Only five iterates are shown as each time the algorithm very quickly
settles toward a solution, after which no significant change to either the energy or log
likelihood values is observed. It is interesting that for some values of K, the entropy measure has the effect of both increasing the log likelihood value and decreasing the Gaussian
field’s energy. This effect does not occur for higher values of K, however. The consistent
reduction of the energies should be noted, which relates of course to the tightening of the
constraints to the prior where variance in the structure is high. In such regions it can be
expected that the PET data alone is likely to differ significantly from the value in the [high
resolution] prior. And hence where the constraints are relaxed (in homogeneous regions), one would assume that the distribution of the prior is already similar to that in the PET signal, so the entropy term will in any case have little effect in these regions.

10.5.1 Closing Remarks Regarding Hyperparameter Estimation


It is likely however that only phantom studies will reveal the appropriate ranges for the
hyperparameters of the algorithm, which in part is a task of validation. Automated, data
driven methods are to hand (see, for example [Leahy and Qi 1998] for an overview), but
these methods are often over-complicated and subsequently unreliable for our needs. It
might be best to determine the choice of σ (the fiddle-factor), which relates to the tissue
affinity of the given pixel as well as the use of the entropy image, on a more global basis.
In many respects, this is the task of the K term, a more global parameter about which
the segmentation and entropy image sets a variation. In all cases, as σ → 0, we increase
the bias toward our prior image; and as σ → ∞, the algorithm approaches the conventional
EM-ML approach as given in section 7.5. One possibility for making the variation more
global would be to select this value according to the confidence of our [global] least-squares
fit of equation 3.13. The algorithm we use to derive this fit (NAG fortran routine f04jaf )
minimises the sum of squares of the residuals, as given in equation 3.10. The algorithm
returns this result,

\sigma_{nag} = \sqrt{\frac{r^T r}{J - k}},    (10.16)

where r = p − A · û (the residual to the solution), J is the number of rows in the matrix that contains the segmented MRI image combined with the basis functions (it is therefore the number of pixels in the image), and k is the rank of the matrix; if k = J, then σ_nag returns 0. Hence, in equation 10.12, it may be possible to relate this global measure of goodness of fit to C, the globally applied value that determines
the FWHM of the Gaussian field. This is just one suggestion for allowing the overall
confidence in the prior to influence the final reconstruction.
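For reference, an equivalent of the residual measure returned by f04jaf can be computed directly; the sketch below (Python with NumPy) assumes A and p stand for the design matrix and data of the fit of equation 3.13, which are defined outside this section.

    import numpy as np

    def lstsq_residual_sigma(A, p):
        # Least-squares fit and the residual measure of equation 10.16.
        u_hat, _, rank, _ = np.linalg.lstsq(A, p, rcond=None)
        r = p - A @ u_hat                   # residual vector
        J = A.shape[0]                      # number of rows (pixels)
        dof = J - rank                      # J - k in equation 10.16
        sigma_nag = np.sqrt(r @ r / dof) if dof > 0 else 0.0
        return u_hat, sigma_nag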
The appropriate steering of the algorithm (between the prior and the EM-ML esti-
mate) is then the subject of this discussion, which firstly digresses to include a review of
something similar that was attempted by [Ouyang et al. 1994].

Ouyang et al. 1994 Akin to the approach of [Ouyang et al. 1994], we also adopt
a scheme that allows the incorporation of a broad range of possible prior information.
Their work models intensity variation using Gibbs distributions compounded with line
sites to capture sharp activity steps given knowledge of the underlying tissue boundaries.
The work extends the usual line site approach (see [Gindi et al. 1991]) in order to avoid
possible estimation errors occurring where there are discrepancies in the anatomical and
functional boundaries. That is, the choice is similar to that of our own: either reconstruct
strictly in accordance to the information that the prior is giving you; or, alternatively, in
cases where the information given in the prior appears inconsistent, reconstruct with the
PET data alone. The resulting “weighted line site” method achieves this by considering
the joint probability distribution of the structural (CT or MRI) data and the PET data.
Considered is the cross-correlation between the anatomical and functional boundary maps.
When either of these probabilities - of a functional edge and of a structural edge - falls below a set threshold, then one or both indicates no edge, and the weighting is minimal.
Otherwise, the weighting increases with the joint probability, and hence the anatomical
boundary value must have the agreement of the functional value before the line potential
is determined.

This is non-Bayesian and ad hoc By the author’s own admission, the approach taken
in steering the solution between the two extremes is sensible, but ad hoc. This is because
the modification affects only the line site’s conditional distribution, while leaving
the conditional density of the intensity site unchanged. An ideal procedure would also
modify the potentials across the cliques, but these are fixed according to the pre-defined
boundaries from the anatomical image. That is, their weighting is applied after these
potentials have been calculated, although the intensities in the cliques are, of course, free
to vary at each iteration (within the constraints of the Gibbs distribution and possible
edge). It is basically a post-check for correctness.
What this paper lacks, however, is a comparison of the weighted line site approach to the
normal line site approach. Tests were organised such that false line sites were included,
but no conclusion is given for the case where true PET boundaries were evident within
homogeneous anatomical images. We conclude then that this is indeed a relatively ad hoc
weighting scheme applied to the prior in order to suppress anatomical edge influence if
unsupported by the functional data.

10.6 Conclusions, Extensions and Future Work


The method that has been described in this thesis for selecting the widths of the
Gaussian fields has managed to incorporate good heuristics, and without new ideas, it will
be difficult to improve on this. The values derived in equation 10.14 will allow greatest
variation in GM, less in WM and less still in CSF if the segmentation is correct and it is
indeed the case that GM shows greater activity than WM, which in turn is greater than
CSF. This will almost certainly be the case, but one must be aware that in pathological
studies it may not hold true. Tracer uptake can be significant, for example, in WM
tumors, and this will only be visible in the PET data. But here is the advantage of the
scheme presented. The values derived from equation 10.14 are data driven, and if the MRI
segmentation classifies the tumor as WM, then the Gaussian fields associated to the WM
will widen as a result of the increase in activity in this region. It is only in the
case of errors in the segmentation and registration processes that this scheme will fail, and
the fields are likely to average each other out. And hence the proposal of equation 10.15,
which is shown in figures E.1 and E.2 to allow the GM distribution to remain unchanged,
but in WM and CSF regions, the distribution is forced toward the prior. The effect seen
is exactly that which is desirable: better contrast definition around “busy” regions in
the MRI where the GM borders other tissues. In this way, the solution found for the
PVE corrected image is not based on a correction of the GM distribution, but instead
corrects the distribution in the neighbouring regions. Again, the method is susceptible to
registration errors, so it must be applied with caution.
Hence it is the appropriate selection of K and σj that is extremely important to the
algorithm proposed. Without the ability to analytically derive the normalisation terms
p(y) of equation 9.2 (the evidence term) and Z of equation 9.20 (the partition function), it
is not possible to estimate these values in the optimal Bayesian sense³. Instead, their accurate
selection can only be determined using studies where there is a valid ground truth against
which the algorithm’s development can be guided.

³ And even if we could produce the “optimal” reconstruction, the Bayesian formalism would offer us
more than one for a single model!

The above method arrived at a scheme for the choice of the σj, which is based on the
following heuristic considerations (a small illustrative sketch follows the list):

• σj should be narrow wherever there is evident structure in the MRI data alone; i.e.,
where the entropy indicates this.

• σj should remain unchanged in cases where both the MRI data and the prior (the vir-
tual PET image) show no significant variation either way. (An “in between mode”.)

• σj should increase where the MRI entropy indicates homogeneous tissue regions.

• σj should be largest where MRI is homogeneous and there is significant signal in the
data. That is, this is assumed not to constitute artifact.
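As referred to above, the following is a minimal sketch of how these four considerations might be turned into a per-pixel σj map. The entropy and signal thresholds, the particular σ values and the function name are all assumptions for illustration; only the ordering of the cases reflects the heuristics.

import numpy as np

def sigma_j_map(mri_entropy, data_signal, sigma_narrow=0.5, sigma_default=1.0,
                sigma_wide=2.0, sigma_widest=4.0, entropy_thresh=0.6, signal_thresh=2.0):
    # Start from the "in between" mode (neither MRI nor prior shows significant variation).
    sigma = np.full(mri_entropy.shape, sigma_default)
    structured = mri_entropy > entropy_thresh        # evident structure in the MRI alone
    homogeneous = ~structured                        # entropy indicates homogeneous tissue
    significant = data_signal > signal_thresh        # signal assumed not to be artifact
    sigma[structured] = sigma_narrow                 # narrow: bias strongly toward the prior
    sigma[homogeneous] = sigma_wide                  # wider in homogeneous regions
    sigma[homogeneous & significant] = sigma_widest  # widest: let the measured data dominate
    return sigma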

It was not possible, however, to have absolute confidence in the validity of the virtual
modality PET image of section 3.1, despite what the literature in chapter 2 tends to
indicate. There are limits in precision to the intensity transformation that are likely in
some places to be locally exaggerated. And hence the effort put into the reconstruction
methods in order to develop a scheme capable of including the additional constraints
necessary to alleviate these problems.
Limits in the precision do, to a large extent, reflect the limits in the assumptions that
can be made in relating activity distributions to tissue types. In cases of uncertainty, one
must err on the side of caution, as it would be folly to introduce a signal that was not
originally there. This consideration includes the algorithms performing the segmentation
and registration steps that may significantly increase inaccuracies and the level of uncer-
tainty. As such, each stage of these algorithms should be carefully validated, as errors will
propagate into the one described in this chapter.
Such errors unfortunately highlight the well known shortcomings in all such cross-
modality reconstruction methods. This work’s adoption of a fuzzy segmentation method
coupled to a summation of basis functions in order to estimate the underlying activity
distribution does, to a large extent, exhibit some robustness with respect to the latter of
these two issues. With respect to errors in the registration of the different image sets, however,
the algorithm, like its counterparts, reveals its frailty. Errors need be only very slight to
render the associated MRI data as all but useless.
A final problem with such methods is the computational overheads involved. The
Bayesian algorithms are very slow, and any improvement that might be offered in terms
of the aforementioned correct normalisation using the partition function and evidence
terms slow the algorithms yet further. Given the delicacies of the algorithms and the
difficulty of a divide-and-conquer implementation⁴, not to mention the costs associated to
the required additional scans, one must be realistic and concede that their adoption in the
clinical environment seems, for the time being at least, unlikely.
But on a more positive note, the algorithm developed in this chapter has outperformed
that of the [Sastry and Carson 1997] paper, and does avoid the homogeneity assumption.
It is also O(n) times quicker than that algorithm (where n is the number of tissue com-
partments used), and the scheme incorporates an automatic approach for choosing the
Gaussians that has better built-in heuristics. The computational overheads are troubling,
though, so the following two chapters look to these in their various forms. More important
in the next chapter, however, is the unbiased use of prior information.

⁴ It would seem that using only a subset of the sinogram data, as in [Hudson and Larkin 1994] for
example, introduces an information mismatch with respect to the prior, requiring that it be appropriately
scaled to balance this. We return to this issue in chapter 12.
11
Multiresolution Representations

Prior information applied within the Bayesian framework - in the form of an energy func-
tion - has shown itself to be a meaningful approach to the reconstruction and correction of
PET data. An alternative direction presented in this chapter is a multiresolution approach
for which two strategies are proposed.
The first is concerned with the more [traditional] pyramidal approach in which ini-
tial reconstructions found at a coarse resolution are fed as estimates to the next higher
resolution. The second is slightly more novel. It varies the resolving ability of the recon-
struction algorithm according to local properties in the image data. As shall be clarified
in the following, the intentions of both are to (i) better condition the reconstructed image
data, and (ii) improve the convergence properties of this optimisation process. We should
now be aware that ML-based reconstructions are critically dependent on the chosen pixel
resolution, and in general badly degrade at “clinically interesting” resolutions [Geman et
al. 1993]. It is the primary task of this chapter to investigate such issues, which will
involve the combination of many aspects of the previous chapters into attempts at further
improving the reconstruction quality of the PET data.
One issue here is that the traditional approach of EM-ML reconstruction is viewed as
not fully utilising the potential of the EM algorithm. That is, the algorithm does not quite
treat the PET reconstruction problem as one involving incomplete data, yet instead simply
reformulates the approach as a missing data problem for purposes of numerical convenience
in bridging the geometric transition between the projection and Cartesian spaces. In the
EM-ML scheme, the incomplete data (ψij of equation 7.34) are not directly measured;
nor are they “missing” in the sense of some mis-recording, but rather due to the
superpositioning of the Poisson photon rays. Hence the inability to directly observe the data
emitted at a known pixel position and registered in a known detector pair. It
would be a terrific achievement to be able to reformulate the PET reconstruction problem
as one in which a significant amount of the expected data really is missing, where the use
of prior knowledge about the tracer distributions could more appropriately augment the
observed data. This lends itself to the notion of a superresolution PET data set, with the
ultimate aim of resolving features of interest at a finer scale than is possible globally, by
progressively inferring more unknowns as and where our model and data allow this. This
theme is dealt with in section 11.1.3.
Many arguments for the development of a multiresolution representation are based on
the spatially variant nature of the measurement process. This concerns both the density
of sampling and observable data, and the convergence properties of the reconstruction
process. Uniform resolution - it seems - would be undesirable, especially if an improve-

ment of PET’s ability to resolve fine detail is required. Being able to adapt the resolution
properties in accordance to the data is another objective of this chapter, which is achieved
only through the development of a novel, anisotropic representation of the image space.
Section 11.2 formulates these ideas into a working algorithm, where once again, the asso-
ciated high-resolution prior image is important to the final procedure. Nonetheless, this
section also shows that simple, localised estimates of the SNR can also be successfully
utilised to determine a more suitable resolution, an improvement offered without the need
for prior information. An attractive proposition indeed.
In this chapter, computational issues must take a back seat to our current concern.
The divide-and-conquer paradigm - which normally provides the motivation for such mul-
tiresolution schemes - is actually better suited for efficiency yields when applied to the
sinogram data. This is shown in the next chapter, where a discussion of computational
costs can no longer be postponed if the algorithms developed in this thesis are to gain
clinical acceptance.

11.1 Resolution, Convergence and Noise Properties of PET Reconstruction
Typically, different regions of the image will converge at different rates [Liow and Strother
1993]. The convergence rate is object dependent, and so too are the resolution and noise
characteristics. [Wilson et al. 1994] analysed the noise properties of the EM-ML algorithm,
confirming object dependency, and rather surprisingly that mean and noise-free resolutions
were equivalent. [Tanaka 1987] presents the more intuitive result for EM-ML reconstructed
images that instances of higher frequencies in the image data converge more slowly than
the lower frequencies. This is the motivating factor behind the multiresolution approach
of [Ranganath et al. 1988, Raheja et al. 1999]: attempts are made at exploiting the fact
that the lower frequency components of the image can be reconstructed faster than the
higher frequency ones. [Liow et al. 1997] conclude for phantom studies reconstructed
with the EM-ML algorithm that resolution is independent of object position [in the FOV],
although this independence is not shared by object size - the effective resolution increased
with decreasing size due to the slower convergence rates for smaller objects. [Stamos et al.
1988] found a dependence on the surrounding structure for ML estimators, but failed to
consider any object other than a delta function for the actual measurements. [Wilson and
Tsui 1993] cover similar background, although the results are not so insightful: factors
influencing the resolution using EM-ML reconstructions are said to include the iteration
number, accuracy of the system model, and the position in the FOV. The resolution
itself may be interpreted as a function of position in the scanner’s FOV [Hoffman et al.
1982], which relates to the sampling of the FOV with the LORs. But resolution may
also be improved according to the availability of better statistics [Miller and Wallis 1992],
yet [Fessler and Rogers 1996] finds the opposite to be the case when using a penalised-
likelihood reconstruction algorithm (see section 8.2.1): “Paradoxically, for emission image
reconstruction, the local resolution is generally poorest in high count regions”.
The above clearly shows enough disparity in the conclusions to make the drawing of
any general opinion quite difficult. It is, however, clear that one stage of this project had
to involve a study of the resolution and convergence properties of the statistical algorithms
(with and without regularisation), as from the literature alone we cannot be certain of the

role played by object dependence. This implies the earlier convergence of lower frequency
image components. So would the reconstruction of a multiresolution PET data set allow
homogeneous convergence? We are sure, however, that resolution can be improved with
the availability of better statistics. Am I thus to assume that the convergence of the
EM-ML algorithm occurs at a time dependent on the statistics supplied by the emissions
themselves? If this is indeed to be the case, then one might consequently hypothesise
that the use of a spatially invariant approach to resolution is inappropriate (the notion of
average resolution anyway seems improper [Hoffman et al. 1982]), even in cases where low
variance convergence is achievable (i.e., the result of using penalising terms, for example).
In respect of the acquisition process, a prominent feature is that the acquired data is of
unevenly spaced samples; finer sampling is to be found toward the centre of the scanner’s
FOV. Image variance is determined by the number of counts collected during the scan,
and this, recall, also influences our resolution. Thus, compounded by the typically variant
distribution of unevenly sampled counts, the notion of a single, “optimal” resolution does
seem unsuitable. So, beginning at a very low resolution representation of the emission
density, the following methods are aimed at continually deriving the higher resolution
equivalent, but to do so only where our statistics allow this. And as this is likely to
show significant variation within the image field, the remaining sections of this chapter
investigate spatially variant representations of the image space.

11.1.1 The Basic Multiresolution Implementation

The multiresolution proposition is to work from an extremely low resolution reconstruction
(typically 2-by-2 pixels - in 2-D) toward a higher resolution reconstruction using the EM-
ML algorithm. That is, we work on a number of different image planes, where each image
plane is of a fixed, isotropic resolution. The resolution for the next image plane would
ideally be chosen to yield homogeneous convergence properties. The implementation thus
requires a different system matrix for each resolution stage, each describing a seemingly
different geometry. In respect of the limiting factor to the available resolution, voxel sizes
that are too small will simply lead to an over-parameterisation of the algorithm, which
basically means numerical instability. It is desirable to approach this limit using the PET
data alone, and then to see if it may be improved using MRI priors.
The following discusses the multiresolution implementation of the ML reconstruction
algorithm of section 7.5, where the process uses the starting estimate of a down-sampled,
back-projected image, typically reduced to a 2-by-2 pixel plane. It first iterates to a
solution at each current resolution (that of the current plane), the results of which are
taken as the starting estimate of the image on the next, higher resolution plane. As a rule,
missing values of a multivariate dataset may be replaced by the observed mean for that
variable (i.e., averaging, or piece-wise constant interpolation), or by some predicted value
from a regression model (using, for example, bicubic interpolation). As we will see, the
former case is rather naive and actually distorts the covariance structure of the dataset,
driving estimated variances toward zero; on the other hand, the use of a regression model is
said to exaggerate observed correlations [Schafer 1997]. Such considerations are important
in the use of this multiresolution estimation scheme.

11.1.2 Initial Interpolation Methods


The initial approach to the up-sampling and down-sampling used is shown in figure 11.1
for a simple averaging interpolation method. The first improvement to this scheme is
to instead fit the interpolated data to bicubic splines, thus ensuring the more visually
satisfying result [Maeland 1988, Press et al. 1988]. In this respect, the ideal interpolation
method for band-limited data is sinc interpolation, but to do this properly requires using
all pixels in the image just to estimate the value (i.e., the fit to the spline) at a single
interpolation point. This is obviously computationally expensive, hence compromises have
involved, for example, sinc interpolation using a Hanning window. But for the purposes of
the study presented here, I do not think that this level of sophistication is yet necessary.
The interpolation methods instead take us from the most simple (piece-wise constant), to
the most popular (bicubic splines), to a scheme for inferring PET pixel intensities at the
resolution of the MRI data (that of section 11.1.3).
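For reference, a minimal sketch of the two simpler schemes is given below, using scipy's zoom with spline order 0 (piece-wise constant averaging) and order 3 (the bicubic case); the function name, the factor-of-two step and the count-preserving rescaling are assumptions made for illustration.

import numpy as np
from scipy.ndimage import zoom

def upsample_estimate(image, method="bicubic"):
    # Take the estimate from the current plane up to the next, finer plane.
    order = 0 if method == "constant" else 3
    finer = zoom(image, 2.0, order=order)
    finer = np.clip(finer, 0.0, None)                # activities must remain non-negative
    finer *= image.sum() / max(finer.sum(), 1e-12)   # counts in should equal counts out
    return finer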

Figure 11.1: How the routine updateEstimates takes the images from one resolution to the next
using simple averaging. Case 1 occurs when we begin by estimating the first values of the EM-ML
reconstructed image from the back-projection algorithm. We must therefore reduce the resolution.
Case 2 is the result of increasing the resolution, where, in the case of the anisotropic resolution
implementation, we must also consider whether or not the regions are said to have “converged”.
As the text explains, this simple averaging (piece-wise constant) interpolation scheme leads to
significant block artifact in the reconstructed images, and erosion at their periphery. (In the
figure, MLMR denotes ML multiresolution.)

So the current implementation uses three different schemes, the results of which are
shown in figure 11.2. Clearly, the simple distribution of the values for a neighbourhood
over the new neighbourhood such that the combined intensities sum to the same value
is unsuitable (as shown in figure 11.1). Attractive though its simplicity may be, such
an averaging approach to interpolation is rather ad hoc, and results in the block artifact
we would wish to eradicate. Note also that the use of a multiresolution scheme initiated
at the lowest possible resolution (2-by-2) is unwarranted, and exaggerates the artifact,
resulting in additional loss of information; i.e., the missing chunks toward the edges. This

resulted from the averaging of the pixel values over regions that were mostly background.
Once lost - or thinned out over this area - the information is not recovered. Interestingly
enough, this may be desirable in cases where images exhibiting block-like structures are
to be reconstructed. In this instance, [Zou et al. 1996] are indeed able to produce a better
quality reconstruction using just such a method, although anything other than a mere
cursory glance at their results is enough to see that their phantom must actually lie
with its main plane along the image’s central axis in order to achieve this
improvement!
As can also be seen in this figure, the bicubic method of interpolation [Bartels et
al. 1987] is the most suitable for our requirements. Nonetheless, experimenting with this
scheme failed to reproduce the results given in [Raheja et al. 1999], who cite improvements
in both image quality and computation overheads. But a main theme of this thesis has
been the use of additional side-information of MRI data, so just how this is used in the
interpolation scheme is given in the following.
[Figure 11.2 layout: columns are no multiresolution, bicubic interpolation, interpolation using priors, and averaging interpolation; rows correspond to FWHM = 1, 2 and 3.]
Figure 11.2: The above images are the result of the EM-ML multiresolution scheme that is
described in this chapter. The images shown are in four columns, where the first column shows
normal reconstructions for purposes of comparison (i.e., reconstructed at a single resolution); the
second column gives the multiresolution reconstructions (starting with a 2-by-2 image grid) using
bicubic interpolation between levels; the third uses the MRI prior to weight the interpolation
values otherwise estimated using an unweighted average (an example EME algorithm); and the
final column shows the results of such an unweighted average, namely the method of figure 11.1.
Starting at a 2-by-2 image has meant, however, that the resulting images using the averaging based
interpolation (those in the 3rd and 4th columns) can simply lose regions of the image that were
averaged out at very low resolutions. See the text for explanations.

11.1.3 An Expectation-Maximization-Expectation (EME) Algorithm?


Associated to our PET data are MRI scans, so it is clear that we do not need to augment
the data with observed means. The images are typically more than twice the resolution of
the PET data, and when segmented into the compartment regions (GM, WM and CSF),
they may offer least-squares fitted estimates for tracer activity levels. Application of the
anatomical data in PET reconstruction increases our knowledge concerning the likelihood
of any particular reconstruction that is formed. This is also used, for example, to en-
courage intensity gradients where there are known structural boundaries in the MRI data,
or to penalise gradients in the PET distribution occurring within homogeneous
tissue regions; some regular distribution of activity is thus encouraged, for example, by
characterising it with a MRF [Gindi et al. 1991], or by the methods of chapter 8. The
proposal of chapter 3 was to develop a more realistic map under the assumption that there
exists a spatially variant, nonlinear function that is capable of transforming intensities in
the MRI data into those of the PET image [Friston et al. 1995]. In the multiresolution
scheme, we now have a new application for the MRI data, and that is in the interpolation
method. Outlined in the following then is an alternative to the methods of chapter 9 for
incorporating the structural MRI data into the reconstruction, the results of which are to
be seen in the third column of figure 11.2.
Firstly, we assume that we have a MRI data set of twice the resolution of the PET data
set (i.e., of a higher resolution). This means that on the basis of a reasonable segmentation,
we have prior knowledge concerning the tracer distribution at this higher resolution (i.e.,
at the resolution to be interpolated to). On this basis, we are able to build an image
of predicted activity levels [Sokoloff et al. 1977, Knorr et al. 1993, Ma et al. 1993,
Kosugi et al. 1995, Rousset et al. 1993b, Yang et al. 1996, Kiebel et al. 1997], as was
seen in chapter 2. Let us denote this prior knowledge as m_η, which gives the probabilities
of activity based solely on the segmentation results and our knowledge of tracer/tissue
distributions¹. Continuing with our use of the EM-ML algorithm, we can employ an
additional expectation step to infer activity values for the PET image at a resolution
approaching that of the MRI data. (This is well documented in the original paper due to
[Dempster et al. 1977], as well as the review of section 7.3.) Alternatively, this can be
thought of as the interpolation step.
Working in 1-D, we will look only at four PET observations (λ0, ..., λ3), and eight
segmented MRI observations (m_η(0), ..., m_η(7)), each of which is a vector of three
class affinities to the tissues GM, WM and CSF indexed by η. We are thus able to directly
follow the original example documented in section 7.3 of this thesis. The tracer distribution
that is responsible for the observed data is λ, and we wish to estimate the next resolution
step, the incomplete data, which we denote as φ = φ0,...,7 . That is, each observation of λ
must be split into two observations in accordance to the estimates given by the MRI data.
Assuming a multinomial distribution, we say that the likelihood of φ is simply:

L(\phi) = \frac{(\phi_0 + \dots + \phi_7)!}{\phi_0! \cdots \phi_7!}\,\Pr(\phi_0)^{\phi_0} \cdots \Pr(\phi_7)^{\phi_7}, \qquad (11.1)

where the Pr () are the probabilities of the individual observations derived directly from
the segmented MRI data and their activity ratio weightings (i.e., g : w : c = 10 : 3 : 1).
¹ Ultimately, we should associate to each observation a likelihood value associated to a Gaussian field
centered on these initial estimates (that is, akin to the methods of chapter 10).

That is, for φl, where l ∈ {0, ..., 7}, and ignoring issues of edge-wrapping, we must first derive
the following:

\Pr(\phi_l) = \frac{\sum_{\eta\in\{g,w,c\}} \eta \cdot m_\eta(l)}{\sum_{\eta\in\{g,w,c\}} \eta \cdot m_\eta(l) + \sum_{\eta\in\{g,w,c\}} \eta \cdot m_\eta(l+1)}, \qquad (11.2)

where the pixel at λ_{l/2} will be split into φ_l and φ_{l+1}. There is now the possibility of
adopting - after each iteration of the existing algorithm - this additional pseudo-E-step
(or inference) to estimate the higher resolution PET distribution.
In 2-D, each λj has associated to it four cell probabilities, which will allow us to estimate
our complete data set, the higher resolution image. Calling again φl the complete data set,
this now denotes the total number of emissions occurring in pixel l, where this digitisation
is at twice the [2-D] resolution of the PET data; i.e., l = 0, ..., (J ∗ 4) − 1. Within the EM
framework, we say,
\lambda_j = \sum_{n \in N_j} \phi_n, \qquad (11.3)

where Nj is the neighbourhood of [four] pixels at the MRI resolution that constitute a
single pixel at the resolution of the PET data. This is simply the basic probability result
that defines this additional E-step, in which we say:

\phi^{k+1}_{j+p} = \lambda_j\,\frac{\Pr(\phi^{k}_{j+p})}{\Pr(\phi^{k}_{j}) + \Pr(\phi^{k}_{j+1}) + \Pr(\phi^{k}_{j+2}) + \Pr(\phi^{k}_{j+3})}, \qquad (11.4)

where p = 0, ..., 3. This idea is shown in figure 11.3.
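To make the pseudo-E-step concrete, the sketch below splits each coarse PET pixel value among its four children in proportion to the MRI-derived probabilities, reproducing the arithmetic of figure 11.3 (e.g. 138 × 0.1775 ≈ 24.5). The 10:4:1:0 weights are taken from that figure; the function name and array shapes are otherwise assumptions.

import numpy as np

TISSUE_WEIGHTS = np.array([10.0, 4.0, 1.0, 0.0])    # GM : WM : CSF : other, as in figure 11.3

def eme_expectation_step(pet_coarse, mri_probs):
    # pet_coarse : (J1, J2) current PET estimate (the M-step result).
    # mri_probs  : (2*J1, 2*J2, 4) tissue probabilities at the finer resolution.
    predicted = mri_probs @ TISSUE_WEIGHTS           # predicted relative activity per fine pixel
    J1, J2 = pet_coarse.shape
    fine = np.zeros((2 * J1, 2 * J2))
    for j1 in range(J1):
        for j2 in range(J2):
            block = predicted[2*j1:2*j1+2, 2*j2:2*j2+2]
            total = block.sum()
            if total > 0:
                # Equation 11.4: each child gets lambda_j * Pr(phi) / (sum of the four Pr's).
                fine[2*j1:2*j1+2, 2*j2:2*j2+2] = pet_coarse[j1, j2] * block / total
            else:
                fine[2*j1:2*j1+2, 2*j2:2*j2+2] = pet_coarse[j1, j2] / 4.0
    return fine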

An Example Multiresolution Algorithm


One implemented version is given below in pseudo-code for purposes of illustration. More
subtle aspects, such as the inclusion of Markov chains linking the pixels across resolutions
(see [Rehrauer et al. 1998]), could now be included, or at the very least, the method
should be combined with the bicubic interpolation (i.e., weighted bicubic interpolation).
Note, that at each resolution iteration r, we will operate at an initial resolution (indexed
by j), and then step to the higher resolution (indexed above by l) each time. R denotes
the number of different scales we will have, which will be in the region of 4.

Initialise φ^0 as a reduced-resolution FBP PET image;

for each r in 0 to R−1;
    Create system matrix A^r;
    λ = φ;
    Within a local loop do;
        λ_j^k = λ_j^{k−1} · Σ_{i=0}^{I−1} [ a_{ij} y_i / Σ_{j'=0}^{J−1} a_{ij'} λ_{j'}^{k−1} ];
    end;
    Determine φ^{r+1} according to equation 11.4 above;
    for each j in 0 to J−1;
        if the variance of the j-th pixel estimate across iterations < threshold;
            label pixel j as having “converged”;
    end;
end;
[Figure 11.3 panels: the PET estimate at iteration k (the Maximization-step result, 138 in the example); the MRI segmentation at resolution k+1, with probabilities determined on a 10:4:1:0 ratio for GM:WM:CSF:Other; and the resulting PET estimate at resolution k+1 (e.g. 138 × 0.1775 ≈ 24.5), which is input to the next Expectation-step.]

Figure 11.3: The above figure shows how we infer a higher resolution when prior information is
available. This still results in considerable block-artifact, but less than that using
no prior information (see figure 11.2). It implements an additional expectation step for the data
being reconstructed.

Basing the inference step on a standard probability result is actually quite crude, but
this is the basis for the E-step in the EM-ML algorithm (section 7.3.2). There should
be appropriate constraints set on the “redistribution” of this data, in the sense that the
overall distribution will not follow any particular smoothing model. We also fall into the
trap discussed in [Schafer 1997] of seeking only to preserve the sample means while actually
distorting the covariance structure and biasing estimates. These, and other related issues,
can only be the subjects of future research, as the disappointing results led instead to the
scheme given in the following.

11.2 Anisotropic Image Representations - Toward Noise Suppression and Resolution Recovery
To reiterate, one goal of statistical methods of image reconstruction is to achieve resolu-
tion recovery and image uniformity by modelling the characteristics of the signal along

with the response of the system in a stochastic manner. The latter is specified in the defini-
tion of the system matrix - the transformation matrix used to describe the tomographic
process - and the more accurate the specification, the better the possibility of recovering
the original distribution. To address the unfavourable aspects of the statistical methods
(regarding convergence and ill-conditioned properties), one investigation involves allowing
this matrix to describe a variant resolution tomographic system, where the local resolution
is determined in accordance to the quality of the data. This firstly implies a better condi-
tioning of the reconstructed images (of lower noise and artifact), whilst secondly achieving
a reconstruction that is closer to the resolution of the acquired data.

11.2.1 Allowing a Convergence of the Resolution


The original idea for an anisotropic resolution scheme is communicated in figure 11.4. It
involves a number of image planes, the first of which begins with a uniform resolution.
Dependent on local tests for “convergence”, this resolution may or may not be held fixed
on the next image plane. Supplied as input to the reconstruction software is a threshold
value (conThresh). This sets a limit on the variance shown by each pixel estimated across
the iterations at each resolution level. If the variance is less than our threshold value, then
that pixel is considered to have “converged”, and its resolution, and originally its intensity
too, remain fixed for the remainder of the reconstruction.
Important to this method is the initial hypothesis that a homogeneous convergence
should be sought, yet how should this be done? Namely, how does one define “conver-
gence”? I finally came to the conclusion that individual pixels should not be allowed to converge
separately, as there is too much inter-pixel correlation in emission tomography images.
Instead, regions showing little variation should not go to a higher resolution, but they
will still be update-able; that is, on the basis of combined probabilities summed over
the region. For example, a_in (the probability of an emission occurring in the pixel of
the nth region being detected in detector pair i) for a region that at the previous lower
resolution showed insignificant variation² becomes a_in = Σ_{j∈N} a_ij, where N now denotes the
neighbourhood spanned by four pixels, indexed by j. This results in the desired anisotropic
resolution, but experiments again suffered from block artifact. In this case only a clever
adaptation of a cubic interpolation method will remove this. The short conclusion was that
traditional use of pixel and piecewise constant multiresolution approaches is unsuitable to
an anisotropic representation, requiring, therefore, an entirely new strategy.

² That is, stability in its estimated value across iterations.
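A minimal sketch of this coefficient merging is given below: once a neighbourhood N has been labelled as showing insignificant variation, its columns of the system matrix are summed into one column for the coarser region. The dense matrix and the bookkeeping are assumptions; this is not the thesis implementation.

import numpy as np

def merge_converged_regions(A, converged_neighbourhoods):
    # A : (I, J) system matrix with coefficients a_ij.
    # converged_neighbourhoods : list of arrays of pixel indices j, one per "converged" region.
    merged_columns = []
    absorbed = set()
    for N in converged_neighbourhoods:
        merged_columns.append(A[:, N].sum(axis=1))   # a_in = sum over j in N of a_ij
        absorbed.update(int(j) for j in N)
    # Pixels outside converged regions keep their own columns (their resolution is unchanged).
    kept = [A[:, j] for j in range(A.shape[1]) if j not in absorbed]
    return np.stack(merged_columns + kept, axis=1)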
The use of basis functions in the interpolation method used to derive the system matrix
coefficients (section 5.2.2) offers two interesting possibilities, however, and these are given
in the next section.

11.2.2 Adaptive Interpolation for an Anisotropic Reconstruction


In looking to vary the resolution on a single image plane, we are able to cite two general
directions to the work. The first involves making no assumptions about the data’s distri-
bution; that is, the case where no prior information is used. One must then look first at
the properties of the convergence and noise distributions alone. Important to note is how
variant they are across the image space, as well as how dependent this is on the image
resolution. Noise must also be carefully treated as no prior information means no implicit


Figure 11.4: The above figure communicates the notion of a straightforward anisotropic resolution
implementation. The idea is that the reconstruction algorithm begins at a very coarse resolution
and iterates to a quick solution. During this time, certain pixels are said to have “converged” (see
text), in which case, their size is fixed for the remainder of the reconstruction process, although
they may still be updated. This process continues by stepping to the next, higher resolution,
for which it again seeks a solution, again checking if pixels have “converged” on the way. This
continues until the final resolution is reached.

regularisation, and the consequent noise resulting from the overfitting of the MLE is sig-
nificant. The idea behind the multiscale approach is that it should form its own implicit
regularisation. The second direction of investigation is to employ regularisation and the
more explicit forms of prior knowledge within the multiscale setting. For both, the goals
of the reconstruction method are as before - noise suppression and resolution recovery.
Given in the following is a specification of how this is to be achieved.
In relating the LORs to the pixel sites within the FOV, section 5.2.2 adapted the
nearest neighbour method of interpolation to account for uncertainty in the emission
source. This has been shown to produce excellent reconstruction results with the standard EM-
ML algorithm (section 7.5), regularising in particular the effect of high frequency noise
at increased iteration number. The free parameter in these methods is the width of the
Gaussian kernel that is used to express the aforementioned uncertainty, and although the
choice of a suitable value is clearly related to the dimensionality of the pixel grid, the effect
of adapting its value to suit localised conditions in the image field is desirable.
By increasing the value beyond the assumed optimum, we notice that the images
appear smoothed. Conversely, a decrease will increase the variance. It would seem
therefore that the choice of kernel size located at each pixel site determines how focused
the reconstruction becomes. In regions that are more interesting it would make sense to
keep the kernel width small. On the other hand, regions that are of little interest should
be covered using larger kernels. As is demonstrated in the remainder of this chapter, such
adaptive regularisation can be automatically achieved on the basis of either the PET data
alone or using prior information. In the former case, the proposition is to estimate the
noise on a regional basis, and where dominant - with respect to the signal - then the kernel
width should increase accordingly. The second possibility uses the MRI data, but here in
an unbiased manner. A segmentation of the MRI data into its GM component is basically
a map of interesting regions. The suggestion is, therefore, to reduce the kernel where
there is more GM, and increase the kernel size away from such regions. This builds prior
information into the likelihoods assigned between pixels and LORs, but it is the measured

data alone which determines the resulting reconstructed image.


The hypothesis driving the methods of anisotropy proposed in the sequel is based on
the following. For regions that are clearly of less interest (i.e., those made up of inactive
tissue, or showing high noise), we may relax the ability to decipher detail in these
areas in order to improve the regularisation. The resulting smoothing effect will increase
the SNR in these regions, which, given the correlated nature of emission tomographic data,
is likely to benefit other regions of the image. This was an effect first observed by [Llacer
et al. 1991], who found that regularisation introduced in one area of the imaging field
showed the tendency to improve the results of a reconstruction elsewhere. We assume
that the checkerboard effect occurs only after the point of true convergence. Given that
convergence is object dependent, we must regularise the regions that converge fastest.

Figure 11.5: This figure shows how the interpolation method may vary in accordance to either
specific ROIs in the image or local estimates of noise. The kernels are said, therefore, to be
adaptive. In this figure, we see a single LOR crossing the image grid. The interpolation method of
nearest neighbours is used to decide which pixels are intersected by this LOR, and at each of these
the uncertainty of this LOR traversal is modelled with a Gaussian PDF. By constraining the size
of the Gaussian, we are effectively focusing our attention at the chosen region. Hence, the area
indicated by (a) corresponds, for example, to regions of better SNR or of greater GM constitution.
On the other hand, the pixels traversed in section (b) of the LOR are of less interest, the kernels
are subsequently wider, and the resulting reconstruction will not allow so much detail (which also
means less noise).

Furthermore, [Schmidlin et al. 1994] is of the opinion that the checkerboard effect
is a result of the incorrect assignment of the system matrix coefficients, and hence the
method of interpolation is all important. I will elaborate on this conclusion and postu-
late on the basis of my experimentation that the checkerboard effect is exaggerated as
a result of an observed sinogram value being associated to its true emission source with
a probability of zero (i.e., the data is effectively misplaced ). Because of the constraint
of energy preservation (counts in equalling counts out), the value must be placed
somewhere in the image data, and with many equally valid alternatives to its true origin,
the value may eventually occur in a rather arbitrary position. One manner in which
the incorrect assignments of these terms may be minimised, therefore, is to make the in-
terpolations wider, which is of course a quite cautious approach to the estimation of the

true distribution. But dependent on the quality of the image data and the knowledge of
the scanning device, it may be necessary. Such an implicit form of regularisation makes
particular sense, therefore, assuming that the checkerboard effect:

• Occurs at the frequency of the pixel basis functions used, and

• is more likely to be present in regions of poor statistics.

Local Estimates of Noise


Given the Poisson nature of the sinogram data (see appendix D), it is very straightforward
to make an estimation of the noise in the image itself [Alpert et al. 1982]. From a back-
projected reconstruction, we can take the number of counts in each pixel, Nj , and assume
that the variance is proportional to this value:

N_j \propto \sigma^2 = \mu, \qquad (11.5)

where µ is the mean of the distribution (which is equal to the variance if the distribution
is Poisson). Noise is estimated to be the standard deviation, which is proportional to \sqrt{N_j},
and the signal is the mean, which is proportional to N_j. As such, our SNR can be estimated
at each pixel as being proportional to:

\frac{N_j}{\sqrt{N_j}}, \qquad (11.6)

which is consistent with the definition of the SNR given in [Hoffman and Phelps 1986]:

\frac{\mu}{\sigma} = \frac{\sqrt{\sum_{j}^{J} N_j}}{\frac{1}{2}\, J^{3/4}}, \qquad (11.7)

where J is the total number of pixels.


In practice, a variance measure is taken about a window of K pixels. This firstly
involves finding the mean value for that window, µj, and then calculating the standard
deviation about this mean:

\sigma_j = \sqrt{\frac{1}{K_j} \sum_{k \in K_j} (\lambda_k - \mu_j)^2}, \qquad (11.8)

where k indexes the Kj neighbours of the current pixel of interest, λj , and the measure
is taken for all j pixels. This value determines the FWHM of the kernel as used in the
iterative reconstruction process.
The user of the associated software is then able to define the number of “levels” of
anisotropy, which simply corresponds to the number of different kernels that may be
realised. This, for example, would be three, in which case the range of values returned
from equation 11.8 is divided into three groups (see figure 11.6). Those values in the
lowest third are assumed to be of low SNR and are likely to be the earliest to converge.
As such, they need greatest regularisation, and the kernel positioned at the corresponding
pixel is set at the highest value (for example, a FWHM of 5 pixels). For values in the
middle third, the associated kernels would be set to FWHMs of 4 pixels (keeping with the

[Figure 11.6 panels: FWHM maps (FWHM = 2, 3 and 4 pixels) for threshold sensitivities of 0.0, 0.5, 0.8, 1.0, 1.3 and 1.7, with the corresponding reconstructions.]

Figure 11.6: The above figure shows different FWHM maps and the resulting reconstructions
based on an adaptive kernel used in the interpolation. The kernel is determined from the “FWHM
maps”, which in turn are taken from the back-projected image. The figure shows how varying
the sensitivity of the noise threshold alters the anisotropic representation for a given minimum
FWHM value (as supplied by the user), and a fixed number of levels (3). Clearly, as the sensitivity
increases, fewer and fewer regions are selected as belonging to the finest resolution granularity offered.

example). And the remaining kernels would be set to 2 pixels FWHM; i.e., these are those
that assume the best SNR. The user is free to set the sensitivity of the noise thresholds,
and the minimum kernel size. As figure 11.6 shows, this can make a significant difference
to the reconstruction result (root mean squared (RMS) errors are shown in figure 11.7).
Further results are given in appendix E.
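The sketch below summarises the procedure under the same assumptions: a windowed mean and standard deviation are taken from the back-projected image (equation 11.8), the resulting SNR estimates are split into the requested number of levels, and each level is mapped onto a FWHM, narrowest where the SNR is best. The exact way the sensitivity enters, and all names and defaults, are illustrative assumptions.

import numpy as np
from scipy.ndimage import uniform_filter

def fwhm_map_from_snr(backprojection, levels=3, min_fwhm=2.0, sensitivity=1.0, window=5):
    mean = uniform_filter(backprojection, size=window)
    mean_sq = uniform_filter(backprojection ** 2, size=window)
    sigma = np.sqrt(np.maximum(mean_sq - mean ** 2, 0.0))     # windowed standard deviation
    snr = mean / np.maximum(sigma, 1e-12)                     # ~ N_j / sqrt(N_j)
    # The sensitivity scales the thresholds that divide the SNR range into `levels` groups.
    edges = sensitivity * np.quantile(snr, np.linspace(0.0, 1.0, levels + 1)[1:-1])
    level = np.digitize(snr, edges)                           # 0 = poorest SNR ... levels-1 = best
    # Poorest SNR regions converge earliest and receive the widest kernels.
    return min_fwhm + (levels - 1 - level).astype(float)

With three levels and min_fwhm = 2 this yields FWHMs of 4, 3 and 2 pixels; the uneven 5, 4 and 2 of the example in the text could be obtained with an explicit lookup table instead.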

Local Regions of Interest


An alternative approach to adapting the interpolation kernels is to instead use the MRI
data to tell us where the ROIs are, and where, therefore, our focus of attention should
be. This corresponds to asking what regions of the image are likely to be slowest to
converge. Intuitively, one would say that GM tissue regions are of greatest interest as
here the most activity is to be found. Alternatively, for pathological studies, we might
actually want to concentrate on regions of WM. In either case, the software allows the
user the freedom to choose what regions are indeed of interest. This entails again using a


Figure 11.7: This figure shows RMS error values of the reconstructed images using different
sensitivity values for the SNR based interpolation kernels. These are errors resulting from using
the FWHM maps of figure 11.6. Note, that the optimal threshold value in this test turned out to
be the very conservative choice of 2.3, after which the errors began to increase again.

probabilistic segmentation of the MRI data, as the continuum between being fully GM,
for example, and containing no GM is of importance. As was the case with noise, the
kernels are determined as smallest when the pixel is of most interest, and largest when of
least interest. That is, a kernel of 2 pixels FWHM might be assigned to a pixel having
high affinity to GM, and 5 pixels FWHM to one with little or no affinity.
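The corresponding sketch for the ROI-driven variant simply maps the probabilistic GM segmentation (resampled onto the reconstruction grid) linearly onto kernel widths; the linear form, the bounds and the function name are again assumptions made for illustration.

import numpy as np

def fwhm_map_from_segmentation(gm_probability, min_fwhm=2.0, max_fwhm=5.0):
    # High GM affinity -> narrow kernel (most detail); little or no affinity -> wide kernel.
    gm = np.clip(gm_probability, 0.0, 1.0)
    return max_fwhm - gm * (max_fwhm - min_fwhm)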
Using the simulated data set of section 8.3.1 (shown in figure 8.3), tests were carried
out for varying kernel sizes with FWHMs of between 1 pixel and 5. The results for the
normal reconstructions are shown for those between 1 and 4 in the top row of figure 11.8.
Below these images are, from left to right: the [thresholded] GM segmentation used to
determine the FWHMs in the scheme mentioned above; the resulting reconstruction after 5
OSEM iterations; the resulting reconstruction based on the scheme of using local estimates
of the SNR to determine the FWHMs; those local estimates of the SNR taken from the
back-projected image.
There is a clear interpretation of the results of figure 11.8. One can say that the
image using the segmented data retains the interesting structure that is shown in the
image reconstructed using a fixed 2 pixel FWHM, whilst simultaneously showing the nice
regularisation in uninteresting (i.e., non GM) regions of the image reconstructed using a
FWHM of 3 pixels. It seems the perfect compromise, but this is the purpose of using the
simulation: we can test the results against some ground truth data. The results of this
are shown in figure 11.9 where we can immediately verify the benefit of the anisotropic
representation using the segmented data. The worth of the method using SNR estimates
cannot be gauged from this figure alone. Instead, one must notice that good noise threshold
values³ were not chosen, but the results of figure 11.7 show that this scheme also offers
an improvement over the fixed interpolation schemes. Thus one must be careful in choosing
the level of the desired “focusing” effect.

³ The so-called “threshold sensitivity” values are simply scaling values set such that the user of the
software may select an interpolation scheme where all kernels are either very small, or very large, or are
ranged somewhere between these extremes. This choice was approximately normalised between 0 and 2.5,
an arbitrary choice.

This is verified in figure 11.12, where real PET
data is reconstructed to 256-by-256 pixels using this scheme. Comparing the result of the


Figure 11.8: The images in the top row are reconstructions using the OSEM algorithm (8 Subsets)
for 5 iterations (40 OSEM steps). From left to right, the reconstructions were done using 1, 2,
3 and 4 pixel FWHMs respectively. Below we see the reconstructions using the variable FWHM
procedures. On the far left is the FWHM map derived from the GM segmentation that was used to
determine the FWHMs for the neighbouring image (that to its right). Here, a threshold of 0.5 was
used. And on the far right, is the level map based on local SNR estimates and used to determine
the FWHMs that produced the reconstruction to its left. This map was derived using a threshold
of 1.7.

reconstruction where the SNR levels are more selective to those of the non-variant method
shown in figure 7.2 substantiates this approach. For interest, the log likelihoods at each
OSEM iteration are shown in figure 11.10 to show the consistency of the results. Here
they are plotted at each OSEM iteration, such that 8 (the number of subsets) steps make
one full iteration.
On the basis of the above results, experimentation has begun using real PET-MRI
studies. Figure 11.11 shows this, and here is the resulting subjective interpretation. We
are confident that FDG take-up occurs predominantly in GM regions, so much of the work
using the PVE correction method of Bayesian regularisation attempts to “tuck” the distribu-
tion back into these regions. And this is exactly the effect that we are now witnessing
using our segmentation to control the kernel size of our interpolation method. Indeed, one
can immediately compare these results to those using a typical PVE correction method for
the same data set (figure 2.1 of section 2.9). The method given here i) avoids the bias of
the normal correction methods, and ii) adheres to the notion of energy conservation; that
is, counts in equals counts out, which for quantitative analysis is essential. Furthermore,
the method has a physically realistic interpretation regarding each element of its imple-
mentation. That is, we have succeeded in addressing the three requirements that closed
chapter 3 and thus set the scene for the thesis work! Finally, the method is essentially
OSEM, which means it operates at clinically acceptable speeds. Figure 11.11 shows the

[Figure 11.9: RMS error versus iteration; in (a) the segmentation-based and SNR-based (threshold 1.7) anisotropic schemes are plotted alongside the fixed FWHM = 3 reconstruction, and in (b) fixed kernels with FWHM = 1 to 5 are compared.]

Figure 11.9: The above figures show the average RMS error for the reconstructions taken at each
pixel and at each iteration (8 OSEM steps). (a) shows the results using the anisotropic FWHM
algorithms (where the segmentation method uses a threshold of 0.5 and the SNR estimates 1.7)
alongside the best performing normal reconstruction approach, and (b) shows those for the fixed
kernel algorithms. Note the RMS error for the back-projected image was 39 (not shown). After 8
OSEM steps, this has been reduced to around 5 for each of the algorithms.
[Figure 11.10: log likelihood versus OSEM subset iteration (5 full iterations); (a) the segmentation-determined and SNR-determined FWHM schemes, (b) fixed kernels with FWHM = 1, 2 and 3.]

Figure 11.10: The above figures show (a) how the log likelihood value varies with each iteration (8
OSEM steps) for the anisotropic kernel approach, and (b) how the log likelihood varies for the fixed
kernel approach. These are shown to emphasise the fact that although we can steadily increase our
likelihood estimate, this does not necessarily correspond to an improvement in the reconstruction.
This can be seen by comparing the above results to those of the RMS error (figure 11.9).

same data reconstructed based on SNR estimates; that is, without the use of the MRI
data. This result also seems to demonstrate the worth of an adaptive algorithm. Further
results are again given in appendix E.

11.3 Discussion
The concluding approach to an anisotropic representation of the image data has validated
the hypotheses that drove its development. We have increased regularisation where we need it
(i.e., reduced noise), and recovered resolution where this is desired.
It is interesting that these alternative approaches to inference using either the SNR
estimates or the MRI data actually build this information into the system matrix. This
could of course be done using the methods of section 11.1.3, such that each coefficient of
the matrix would be defined as being the probability that a coincidence originating in the
lth pixel is detected by the ith detector pair. Thus, A ∈ ℝ^{I×L}, where L denotes the number


Figure 11.11: The image on the top-left (a) is the GM segmentation used to determine the
kernel sizes for the anisotropic kernel scheme. The image on the top-right (b) is the corresponding
FWHM map. That shown on the bottom-right (d) is the result of this scheme, reconstructed using
the OSEM algorithm run for 5 iterations (the FWHM varying between 6.86 and 13.7 mm). The
image on the bottom-left (c) uses a fixed kernel size (FWHM = 2 pixels) and is also reconstructed
using the OSEM algorithm run for 5 iterations. Noticeable is how the tracer distribution seems to
be “tucked” back into the GM regions.

of pixels in our high-resolution image (l = 0, ..., (J ∗ 4) − 1), and I is as before, denoting


the number of sinogram measurements. Suddenly, the reconstruction problem has four
times the number of unknowns (L), but the same number of equations (I). Certainly, the
system is now underdetermined, but the MRI data gives us the constraints to regularise this.
Indeed, this approach is actually taken by [Sastry and Carson 1997], which is shown in
equation 9.17.
When asking what prior information can bring, we must be very careful in allowing it
to have direct influence over the reconstructed images. By building this information into
the system matrix coefficients in the manner described in section 11.2.2, the influence is
implicit as we only alter the probabilities of assignment in a discrete manner. In using the


Figure 11.12: The above images show different reconstructions for different slices of PET taken
from the Groningen data set. Each uses the local estimates of SNR to estimate the FWHM maps.
The SNR values are thresholded dependent on the number of anisotropy “levels” and the sensitivity
as chosen by the user of the software. In this example, we have used 3 levels, and correspondingly,
a FWHM map is made where the FWHM is narrowest in regions of good statistics (high SNR).
These values are then used to perform the reconstruction shown on the right. This image is of
256-by-256 pixels, and was reconstructed with the OSEM algorithm run for 5 iterations. Compare
this result with the image of figure 7.2 at the end of chapter 7. The FWHM map shown in 1(b)
uses a sensitivity threshold of 0.5, as does 3(b). That shown in 2(b) uses a threshold of 0.3.

Gaussian PDFs about LOR-intersected pixel sites (section 5.2.2), it was seen that the
width of the kernel directly controls the degree of smoothing, and is thus a free parameter
that should be chosen in accordance to i) the dimensionality of the image and ii) the degree
138 Chapter 11. Multiresolution Representations

of noise, or expected noise. This first point is fixed, but in this chapter we have succeeded
in showing how we can use knowledge of the noise to determine the most appropriate
interpolation scheme. This naturally extends to using the MRI data, where the kernel
size should be narrower where the activity is expected to be greater (i.e., where the GM
segmentation dominates). This is done under the assumption that we then effectively
smooth in regions of little interest (of low activity), and are more focused in regions of
greater interest (high activity). And on this basis we can address the conflicting goals of
noise suppression and resolution recovery.
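As a concrete illustration of the three-level scheme of figure 11.12, the following is a minimal C sketch (not the thesis implementation) of mapping a normalised local SNR - or, equivalently, a GM probability - estimate onto a FWHM map. The array names, the second threshold at half the user-chosen sensitivity, and the level values of 2, 3 and 4 pixels are assumptions chosen to mirror that example.

/*
 * Hypothetical sketch: map a normalised statistic in [0,1] (local SNR, or a
 * GM probability) onto one of three FWHM levels, narrowest where the
 * statistics are good, as in the three-level scheme of figure 11.12.
 * The intermediate threshold and the FWHM values are illustrative only.
 */
void buildFwhmMap(const float *statMap, float *fwhmMap, int nPixels,
                  float threshold)
{
    int j;
    for (j = 0; j < nPixels; j++) {
        if (statMap[j] >= threshold)
            fwhmMap[j] = 2.0f;            /* good statistics: narrow kernel */
        else if (statMap[j] >= 0.5f * threshold)
            fwhmMap[j] = 3.0f;            /* intermediate kernel            */
        else
            fwhmMap[j] = 4.0f;            /* poor statistics: broad kernel  */
    }
}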
12
Multilevel Approaches

Using any iterative method, the question of computational efficiency cannot be overlooked,
and the divide-and-conquer paradigm is undeniably attractive. In recent years, the adop-
tion of this approach for iterative methods of emission reconstruction has enabled the
algorithms to achieve the efficiency sufficient to be clinically acceptable (for a brief review,
see [Hutton et al. 1997]). This was only possible because of the work of [Hudson and
Larkin 1994], whose implementation of an EM-ML algorithm reduced the average computational
overhead by a factor of the order of the number of subsets used. The
method, Ordered Subsets EM (OSEM), achieves this by applying the divide-and-conquer
tactic: instead of using all projections, the progressive use of different subsets was shown
to produce very similar results at a reduced computational cost. To allow this, the order-
ing of the subsets was said to be important, requiring that they be maximally separate,
thus contributing maximal new information at each sub-iteration. A further constraint1
on the implementation is that of subset balance, requiring that the sum of counts in each
subset be equal for all subsets. The idea is that in this manner, an emission from each
pixel has equal probability of being detected in each of the subsets. In practice, however,
this objective may not actually be achievable because of spatially varying attenuation and
detector sensitivities [Leahy and Qi 1998].

12.1 Implementing the Ordered Subsets Algorithm


Defining λ̂0 to be the initial estimate of image, and λ̂k to be the estimate of the image
at the k th iteration, let S1 , S2 , ..., SN be the ordered subsets. From [Hudson and Larkin
1994], the algorithm is the following:

1) $k = 0$, $\hat{\lambda}^k$ initialised.
2) While $\hat{\lambda}^k$ not converged, do:
   a) $\lambda^1 = \hat{\lambda}^k$, $k = k + 1$;
   b) For subsets $n = 1, \ldots, N$:
      Project. Calculate expected values for the cumulated counts $y$ as
         $\mu^n_t = \sum_{j=0}^{J-1} a_{tj}\, \lambda^n_j, \qquad t \in S_n$.
      Calculate.
         $\lambda^{n+1}_j = \dfrac{\lambda^n_j}{\sum_{t \in S_n} a_{tj}} \sum_{t \in S_n} \dfrac{y_t\, a_{tj}}{\mu^n_t}$, for all $j = 0, \ldots, J-1$.
   c) $\hat{\lambda}^k = \lambda^{N+1}$.
3) End.
1
This is again only a recommendation of the original paper.

In the M-step (the forward projection), “the scaling of the images matches the activity
expected to be recorded on the projection and the aggregate count recorded on the
projection”:

$$\sum_{j=0}^{J-1} a_{.j}\, \lambda^n_j,$$

where $a_{.j} = \sum_{t \in S_n} a_{tj}$. Incorporated into the algorithm is the scaling step, where the
image $\hat{\lambda}$ obtained at 2(c) is replaced by $c\hat{\lambda}$, where

$$c = \frac{\sum_t y_t}{\sum_{j=0}^{J-1} \hat{\lambda}_j\, a_{.j}}.$$

Appropriate selection of the subsets is discussed in [Hudson and Larkin 1994], where the
two proposals are that they be nonoverlapping or cumulative. Sensible selection requires
that they be as orthogonal as possible, such that the number of subsets, s, is typically 14 for 196 projections [Michel
et al. 1998]. Personally, I have found no advantage to this method, and my algorithms
instead present the subsets in a consecutive order. Details of this implementation are
given in section A.6 of appendix A. Note that OSEM is not a true EM-ML algorithm, as
we can no longer guarantee the nice optimisation properties; the log likelihood value can
decrease at an iteration step, for example (see figure 11.10).
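For illustration, the following is a minimal C sketch (names, bounds and the flat subsets[] layout are assumptions, not the thesis code) of the two orderings discussed above: the consecutive ordering used in my implementation, and a simple interleaving that approximates the “maximally separate” recommendation.

/*
 * Hypothetical sketch: partition projection angles 0..nAngles-1 into
 * nSubsets subsets. With interleaved != 0, subset n holds every
 * nSubsets-th angle starting at n (a crude "maximally separate" ordering);
 * otherwise subset n holds a contiguous block (consecutive ordering).
 * subsets is a flat array of nSubsets * maxPerSubset entries.
 */
void buildSubsets(int nAngles, int nSubsets, int maxPerSubset,
                  int *subsets, int *subsetSize, int interleaved)
{
    int n, t;
    for (n = 0; n < nSubsets; n++)
        subsetSize[n] = 0;
    for (t = 0; t < nAngles; t++) {
        if (interleaved)
            n = t % nSubsets;                 /* spread the angles        */
        else
            n = (t * nSubsets) / nAngles;     /* contiguous blocks        */
        if (subsetSize[n] < maxPerSubset)
            subsets[n * maxPerSubset + subsetSize[n]++] = t;
    }
}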

12.2 A Subsampling of the Sinogram Space


[Herman et al. 1984] also proposed a multilevel reconstruction approach that involved the
subsampling of the sinogram data to reduce the dimensionality of the optimisation process
for early iterates. Blissfully unaware of the OSEM methods, I adopted and implemented
this method in the hope that the sub-sampling of the sinogram data would average out
the noise, ultimately leading to better conditioned reconstructions arrived at with better
convergence properties. It would now seem, however, that in all but a very few and
rather contrived examples, the method is not appropriate (see section 12.2.2). The
preconditioning of the data that results from the subsampling affects the best choice of image
resolution, leading to blocky artifacts in images for which the resolution is too high (or
the subsampling too deep). Nonetheless, in using the Bayesian methods, the application
of the prior is firstly straightforward (the sinogram is not a subset, it is just subsampled),
although this is not necessarily the case for OSEM2 , and can secondly be used to smooth
this blocky artifact. As such, the examples where the method seems valid involve Bayesian
regularisation. But, the following will not dwell for too long on this approach. It describes
the method, and then some results, demonstrating the aforementioned case where the
method proved beneficial.

12.2.1 Method of [Herman et al. 1984]


The following was motivated primarily by the computational overheads of statistical re-
construction methods, but also by the difficulties in estimating hyperparameters of the
2
I have not proved this, but it is what I heard at the fully 3-D reconstruction meeting, Amsterdam, 1999.

reconstruction process. This latter point would often mean that an excessive number of
iteration steps were required in order to achieve a good quality solution.
The justification for the method stems from arguments relating the image digitisation
to the sinogram data sampling. If the image digitisation greatly exceeds the degree of data
sampling, then the reconstructions will not be of a good quality. But, if the data sampling
is dense with respect to the digitisation, then this brings no advantages as equations of
the linear system associated with two neighbouring rays may yield essentially the same
information (to all intents and purposes, the algorithm considers them to be collinear).
The paper itself [Herman et al. 1984] defines a matrix that performs the sub-sampling
process; an averaging matrix is the applied example. The advantages to this are said to
be twofold:
• A single cycle through all the equations takes half the time (in the case where the
data is sub-sampled by just one level);

• Improved convergence rate.


The concept of the data sub-sampling is shown in figure 12.1, and exactly this method
was also adopted by [Ranganath et al. 1988], who unconvincingly cite advantages in terms
of both reconstruction quality and times.


Figure 12.1: The above figure shows how the original sinogram (bottom middle) is sub-sampled
using a simple neighbourhood operator. The final sinogram (top right) is then used as the input
data for the first iterates of the reconstruction process. Then, the algorithm iterates using the
progressively finer sampled sinograms, until it arrives at a solution using the original sinogram
data.
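As a concrete illustration of the neighbourhood averaging of figure 12.1, the following is a minimal C sketch of one level of subsampling; the row-major sinogram layout (nAngles × nBins), the averaging of radial-bin pairs and all names are assumptions, not the implementation reported here.

/*
 * Hypothetical sketch: one level of sinogram subsampling by averaging,
 * halving the number of radial bins per projection angle.
 * Layout assumed: sino[t * nBins + s], row-major, with nBins even.
 */
void subsampleSinogram(const float *sino, float *sinoSub,
                       int nAngles, int nBins)
{
    int t, s;
    for (t = 0; t < nAngles; t++)
        for (s = 0; s < nBins / 2; s++)
            sinoSub[t * (nBins / 2) + s] =
                0.5f * (sino[t * nBins + 2 * s] + sino[t * nBins + 2 * s + 1]);
}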

12.2.2 The Pros and the Cons of the Multilevel Approach


Attracted by the argument for using this approach, I investigated these methods, present-
ing conclusions to the work in [Oakley et al. 1999b]. The following summarises that work,

closing with what I consider to be the advantages and disadvantages of the method, and
thus ending this chapter.
The algorithm that was implemented took the exact form of the multilevel approach
[Herman et al. 1984]. The image to be reconstructed is held to a fixed digitisation, and
the sinogram data is read from smoothed and subsampled values, progressing back to
finer levels of sampling only as the iteration number increases (figure 12.1). Note the
difference, therefore, to the OSEM image reconstruction approach of [Hudson and Larkin
1994]. Both methods will, however, arrive at a non-ML solution, and the following studies
show that the averaging of the sinogram data to produce the subsampled sets can have
significant influence on the form of the reconstruction result. In many cases - but certainly
not all - this will imply a better conditioning of the input data. The data set used for the
experiments was simulated [Emert 1998], and is shown in figure 12.2.

[Figure 12.2 panels: the Monte Carlo simulation (the ground truth); the segmentation of the Monte Carlo activity regions; the back-projection of the Monte Carlo sinogram.]

Figure 12.2: The figure above shows the data set used in the multilevel experiments in profile.
On the left is the input emission data created using a simulation - it is the true distribution. The
plot in the centre shows a segmentation of the original distribution, delineating regions of different
activity levels, as would be the case for an MRI segmentation. And the plot on the right shows the
back-projected distribution of the simulated sinogram.

To test the multigrid approach, I used the cross entropy methods of chapter 8. The
first of these is that due to [Ardekani et al. 1996], the second that from [Som et al. 1998],
which I extended to run with an arbitrary filter size, and the third is the method first
introduced in this report and based on the diffusion filtering of [Perona and Malik 1987].
Much experimentation was done with these algorithms, and it was certainly the case
that the varying of the sinogram resolution had an influence on the final reconstruction
result. This, however, was in many instances a detrimental effect, and often at increased
computational cost. Indeed, these experiments were unable to substantiate the results
reported in [Ranganath et al. 1988].
The occasions for which the multilevel approach did offer improvements only really
occurred using the algorithm due to [Ardekani et al. 1996]. As, however, was discussed in
the original review of this approach (section 46), the method was shown to be the worst of
the cross entropy implementations, resulting obviously in better chances for improvement!
Still, the results of figure 12.3 do show clear advantages when one considers firstly the
better recovery of the true distribution, and secondly the potential speed increase had the
algorithm been set to converge following no further increase in the likelihood estimate.

[Figure 12.3 panels: top row, reconstructions using the [Ardekani et al. 1996] cross-entropy algorithm, run for 50 iterations with β = 0.2 and 0.4 respectively, together with the log-likelihood plots for the fully sampled sinogram data; bottom row, reconstructions using the same algorithm and a sub-sampled sinogram set (β = 0.2 and 0.4, respectively), together with the log-likelihood plot for the sub-sampled data.]

Figure 12.3: The above results are based on reconstructions using the cross entropy method of
[Ardekani et al. 1996]. Those at the top show the reconstruction result after 50 iterations with
the regularisation weighting term (see section 8.3.1), β, at 0.2 (top left) and 0.4 (top right).
Neighbouring each of the profiles on the right is a plot of the likelihood values at each of the 50
iterations. It can be seen that, in this [normal] use of the algorithm (i.e., where the sinogram
data used was not subsampled), the results are relatively poor, and convergence occurs at
around the 50th iteration. The figures in the row below, however, show the results of using the
same algorithm where the sinogram data was initially presented at a lower resolution (i.e., one
layer of subsampling). Here, the form of the solution is closer to that of the true emission (see
figure 12.2), and the maximum likelihood value is reached more quickly than when using no subsampling of
the sinogram data (this includes accounting for the work done at both levels of sinogram resolution).

12.3 Discussion
The short conclusion of this work is to keep an open mind regarding the
regularisation of the image data, but this is not of primary concern. What is of primary
concern is the need to ensure that any algorithm developed can be run without excessive
computational overheads such that it would be acceptable to the clinical environment.
Currently, this would mean that the statistical method should be developed in conjunction
with an algorithm offering the efficiency of implementation of the OSEM algorithm. If
this is not possible, then there is no pressing urgency to expend too much effort on an
algorithm until the computational resources necessary for its acceptance are available. The
anisotropic algorithms of the previous chapter adhere to this requirement.
13
Conclusion and Closing Remarks

13.1 What has been achieved, and how this came about
It proved a useful exercise to at least be involved in the PVE correction methods of
chapter 2. Not just because of the interesting extension to these methods that resulted,
but, more importantly, because it was actually in the identification of this new method’s
shortcomings that the thesis work was able to find its true direction.
Two of these shortcomings related to better constraints on the solution derived from
knowledge of the physical system. From the signal analysis perspective, the model devel-
oped in chapter 3 might have been both effective and efficient in terms of the number of
basis functions required to model an arbitrary signal, but it was unsatisfactory in terms
of its physical realism. Regardless of the niceties of the implementation, the model ac-
tually lost a good deal of this realism in its push to defeat the homogeneity assumption.
The issue of constraints outlined at the end of the chapter was therefore aimed toward
returning the model to something that fitted the physical system.
The first of these was simply the notion of energy conservation: sinogram counts should
be reproduced without loss or gain in the image data. This, we saw in chapters 4 and 5, is
something that naturally occurs when the image is derived from a reconstruction process
formulated using the algebraic methods.
The second constraint was positivity. Negative values often arising in reconstruction
or restoration solutions do so while our parameter range sweeps a continuum of values,
and if there is one model that fits the data, then there will be an infinite number of them
[Scales and Smith 1999]. How, therefore, can one have confidence in a solution without
appropriate constraints?
Positivity imposed upon our solution (or upon the basis function coefficients of equa-
tion 3.14) is a constraint that is almost unreasonable given the choice of basis functions
(the cosine function goes negative1 ). To meet this, it would be necessary to use more and
more basis functions, and for the quality of data that is now being produced from today’s
generation of PET scanners, the number of basis functions is fast approaching the upper
limit of one per pixel at the resolution of the MRI data. Even then, the issue of constrain-
ing the transformation remains unanswered, and this positivity requirement would have
to be enforced using something like Lagrange multipliers. This would be similar to the
use of Kuhn Tucker conditions (as applied to equation 7.27) to impose the positivity on
the EM-ML’s solution [Luenberger 1973]. Furthermore, in applying Gaussian kernels to
our nearest neighbour interpolation scheme (section 5.2.2), we are adopting [non-square]
1
And hence, actually, the ability to model arbitrary signals in efficient terms.

basis functions at each pixel! Moreover, this algorithm operates on the sinogram data
(i.e., ahead of any errors resulting from the reconstruction process), and includes a noise
model consistent with what is observed.
Such issues, of course, implied using methods of statistical reconstruction to meet the
constraints that would bring consistency with the physical system. Remaining was the
desire to impose the same realism on the solution taken from the MRI data; the biological
system. The transformation of equation 3.14 was derived using starting estimates to relate
the different tissue segmentations to activity in the PET according to known ratios. But
these terms (10, 3 and 1, for example, [Kiebel et al. 1997]) were immediately absorbed into
the coefficients of the basis function. On one hand, we can claim robustness to errors in
these initial estimates (i.e., robustness with respect to knowledge concerning the activity
ratios of different tissue classes), but on the other hand we fail to exploit a considerable
amount of information, which in the case of healthy patients, is highly valid. Perhaps the
most suitable method seen for imposing such constraints does this using energy functions
in order to encourage the distributions toward known estimates [Lipinski et al. 1997,
Sastry and Carson 1997]. This of course operates within a reconstruction algorithm,
although such Gaussian fields could be applied to the intensity transformation of chapter 3
as penalisers. But the methods of [Lipinski et al. 1997] and [Sastry and Carson 1997] also
impose the energy conservation and positivity constraints, as their algorithms are based
on the ML estimator and a Poisson model.
Statistical methods of image reconstruction are based on a modelling of the imaging
process to a level of detail restricted only by the available knowledge and computational
cost. On the basis of accurate system and biological models, statistical methods offer by far
the best prospects for the more accurate quantification of PET data [Lipinski et al. 1997,
Sastry and Carson 1997, Leahy and Qi 1998]. And we have also been able to demonstrate
similar evidence to support this conviction.
This firstly involved chapter 5’s development of a new interpolation model for relating
the LORs to the pixels in the reconstructed image, and the regularisation method of sec-
tion 8.3.1. An extension to the Bayesian method of [Sastry and Carson 1997] was proposed
in chapter 10, designed specifically to incorporate the PVE correction method of chapter 3,
or any alternative prior that avoids an assumption on homogeneity. Better heuristics were
developed to adjust the energy fields of this Bayesian method, and the resulting algorithm
exhibited improved results in comparison to the existing method. Chapter 11 then in-
troduced the notion of a variant pixel representation and how this might effectively be
used in PET reconstruction. Operating with a variant interpolation method, the scheme
developed the usage of prior information in an unbiased manner, robust to errors in the
registration and segmentation procedures, and with an improved RMS error for studies
using mathematical phantoms. Furthermore, this approach is readily extendable to local
estimates of noise in the PET data, such that improved reconstructions are also available
when no prior information is to hand.

13.2 Reconstructing the Prior


From any inspection of the work presented in the thesis, it is clear that there exists a
trade-off between noise and resolution, variance and regularisation, and then prior beliefs
and posterior outcomes. One treads a fine line with such issues, and in seeking to avoid

over-simplifying assumptions made of the data, this line just got finer. I will therefore urge
caution in these matters, as I have been particularly concerned given the importance of the
application. The following quickly summarises these concerns before the final conclusions
are collected.
Ideally, algorithms should be employed ahead of errors likely to be incurred from firstly
the segmentation of the MRI data, and secondly the errors resulting from the MRI to PET
registration step. This of course means that the PET reconstruction process would not
be able to use such side-information, so one should just be aware of this source of error,
and how influential it can be. Consider the following question raised in the discussion of
Brain PET 1997 by J.C. Baron to the paper from Olivier Rousset concerning his PVE
correction method.

It is very important to do corrections for partial volume effects and atrophy.


But, as Dr. Minoshima said, from the pathophysiological point of view, one has
to be very cautious when using these tools. For example, take a structure that
has lost part of its volume but the receptor number, for example, has stayed
the same. Then you will find, by correction, an increased density, whereas
the number has not changed. So this means that when you do these kinds of
corrections, you have to take into account the volume of the structure that you
have corrected. This also applies to Alzheimer’s disease and all the diseases
where atrophy is part of the disease. So if you correct for atrophy, you may miss
the effects of pathology you’re actually interested in. With the new generation
machines which have such a high resolution, do we still need correction for
these effects?

Rousset replied that he thought so, but I am not so sure. The new generation PET
scanners are now with us, and their use of higher sensitivity LSO crystals will lead to
improvements in resolution.
To tightly constrain a reconstruction is to introduce a biased inference of the recon-
struction’s outcome. In effect, the PVE correction algorithms of chapters 2 and 3 offer the
most sophisticated methods available for the generation of prior distributions. Nonethe-
less, the reconstruction of a prior is likely to mean an incorrect reconstruction, invalid for
any quantitative analysis of PET data. However apt certain assumptions regarding ratios
of activity levels in different tissues are, at the ultimately desired accuracy - or even at
the resolution of routine PET reconstructions - these assumptions can not be guaranteed
to hold. They are coarse generalisations that, in images of diseased brain regions (PET’s
main and unrivalled application), begin to lose their relationship to reality.
Ultimately, one’s only assurance of an unbiased, quantitatively valid, objective recon-
struction is to use the PET data alone, be as accurate and as sensible with respect to
the probabilistic model of the tomographic system as possible (i.e., the system matrix),
assume Poisson noise (and that is about as general as one can be), and reconstruct. Fi-
nally, the system model should have a physical interpretation to justify each of its design
aspects. The observed data should never be manufactured or arbitrary.
This argument favours the simplicity of the emission uncertainty interpolation model
(section 5.2.2), and also the anisotropic representation (section 11.2). When the MRI
data is to hand, then this approach has been demonstrated to show more accurate recon-
structions, as well as visually more appealing images. The Bayesian method developed in
chapter 10 delivers an unrivalled framework for combining the correction and reconstruc-

tion results, but although one can now employ sensible data driven heuristics regarding
the determination of the Gaussian fields (section 10.4), one can never be certain of striking
the correct balance between the prior and the likelihood terms.
A recent editorial discussing the general acceptance of statistical methods cites the
most prohibitive factor for their established clinical usage as being processing time [de
Jonge and Blobkland 1999]. The authors surveyed recent papers using emission tomo-
graphic studies, reporting that clinical usage of iterative methods only approached 10%.
They conclude that each new method should be shown to be at least as good as the cur-
rent gold standard before its introduction. Given that the gold standard is FBP, then
all that remains to be addressed is the issue of processing time. But given the fact that
a number of the presented algorithms operate with the computational complexity of the
OSEM implementation (which is not the case for most a priori tuned algorithms), then
this aspect has been previously addressed.

13.3 Closing Remarks


The trade-off between over- and under-constrained solutions is a delicate one. Clearly,
if we operate with too many unconstrained degrees of freedom, then we fail to exploit
what information was to hand, we may end up with what is obviously an unrealistic
reconstruction, and we are in turn left with little confidence in our solution. Alternatively, too
many constraints and we bias our solution, we undermine the value of the observed data,
and we are highly susceptible to errors in registration and segmentation procedures that
are not accurate in their automated implementations. The particular concern that has
arisen in this doctoral work has been the issue of homogeneity. By assuming homogeneity
within tissue classes one acquires considerable knowledge regarding the expected solution.
But this knowledge is fool’s gold, although to ignore it means overlooking the information
given to us from the MRI data.
One can therefore operate somewhere between these two extremes and remain relatively
satisfied with the observed results, despite being unsure of the whys and wherefores of what
brought them there. There probably is an optimal trade-off to be had, but this work alone
cannot be expected to determine where it sits or how one might locate it. Indeed, it
is an aspect of emission tomographic reconstruction in particular, and ill-posed inverse
problems in general, that is far from being satisfactorily answered. We must obviously
learn to live with such “fiddle factors”.
A
Algorithms Used

A.1 Discrete Cosine Basis Functions Algorithm


The following is the Discrete Cosine Basis function algorithm as described in section 3.2.

for by = 0 to B;
    for bx = 0 to B;
        if (by == 0) v = 1/√Y; else v = √(2/Y);
        if (bx == 0) u = 1/√X; else u = √(2/X);
        for y = 1 to Y;
            for x = 1 to X;
                cu = π(2(x−1)+1)bx / (2X);
                cv = π(2(y−1)+1)by / (2Y);
                basisImage(x,y) = u · v · cos(cu) · cos(cv);
            end;
        end;
    end;
end;
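The following is a minimal C rendering of the loop above, given only as a sketch; the row-major basisImage layout and the function name are assumptions.

#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/*
 * Sketch of the algorithm above: fills basisImage (X*Y floats, row-major)
 * with the (bx,by)-th discrete cosine basis function of section 3.2.
 */
void dctBasisImage(float *basisImage, int X, int Y, int bx, int by)
{
    int x, y;
    double u = (bx == 0) ? 1.0 / sqrt((double)X) : sqrt(2.0 / (double)X);
    double v = (by == 0) ? 1.0 / sqrt((double)Y) : sqrt(2.0 / (double)Y);

    for (y = 0; y < Y; y++) {
        for (x = 0; x < X; x++) {
            double cu = M_PI * (2.0 * x + 1.0) * bx / (2.0 * X);
            double cv = M_PI * (2.0 * y + 1.0) * by / (2.0 * Y);
            basisImage[y * X + x] = (float)(u * v * cos(cu) * cos(cv));
        }
    }
}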

A.2 System Matrix Manipulation Algorithms


A.2.1 Algorithm 1
Our first algorithm involves straightforward vector multiplication. This is implemented as
sparseMultiplication(FILE *fp, float *lambdaPtr, float *sinoYintPtr):

Aλ̂ → ỹ. (A.1)

for i = 0 to I-1;
    sum = 0;
    for j = 0 to J-1;
        sum += a_ij × λ̂(j);
    end;
    ỹ(i) = sum;
end;

A.2.2 Algorithm 2
This algorithm is implemented in sparseTransposeMultiplication(FILE *fp, float
*sinoYfloatPtr, float *lambdaPtr). It is essentially the same as that above, except
we now multiply the transpose of A with the sinogram vector to yield an estimate of λ:

AT ỹ → λ̂. (A.2)

for j = 0 to J-1;
    λ̂(j) = 0; /* Initialise all columns. */
end;

for i = 0 to I-1;
    /* Get one row of the system matrix and traverse it (fread). */
    for j = 0 to J-1;
        λ̂(j) += a_ij × ỹ(i);
    end;
end;

A.2.3 Algorithm 3
The second algorithm produces the scalar product between the ith row of the system matrix
and the estimated image vector. This is implemented as float sparseScalarProduct(FILE
*fp, float *lambdaPtr, int rowNumber):

ATi λ̂. (A.3)

scalar = 0; /* And i is known in advance */


for j = 0 to J-1;
scalar += aij × λ̂(j);
end;
return(scalar);

The following details how the previously defined algorithms may be altered to operate
on our new implementation of the system matrix (see figure A.1). An explanation of the
following algorithms (A.2.4 to A.2.6) is given in section 5.4.1 of the main text.

A.2.4 Algorithm 4
Our first algorithm involves straightforward vector multiplication:

Aλ̂ → ỹ. (A.4)



for i = 0 to I-1;
    sum = 0;
    /* Set a and j to point to the current row (figure A.2). */
    for z = 0 to Z(i)-1;
        sum += a(z) × λ̂(j(z));
    end;
    ỹ(i) = sum;
end;

A.2.5 Algorithm 5

AT ỹ → λ̂. (A.5)

for j = 0 to J-1;
    λ̂(j) = 0; /* Initialise all columns. */
end;

for i = 0 to I-1;
    /* Set a and j to point to the current row (figure A.2). */
    for z = 0 to Z(i)-1;
        λ̂(j(z)) += a(z) × ỹ(i);
    end;
end;

A.2.6 Algorithm 6
The second algorithm produces the scalar product between the ith row of the system
matrix and the estimated image vector:

ATi λ̂. (A.6)

scalar = 0; /* And i is known in advance */


for z = 0 to Z(i)-1;
scalar += a(z)×λ̂(j(z));
end;
return(scalar);

A.3 The ART Algorithm’s Implementation


There are three stages involved in implementing the ART algorithm, where each stage
uses the following basic matrix operations that are common to all of the statistical recon-
struction implementations:

• Matrix Vector Multiplication Aλ → y.

• Multiplication with the transpose of the matrix AT y → λ.



[Figure A.1 schematic: for each row i = 0, ..., I−1 of the system matrix, the row size Z_i, the nonzero values a_z (z = 0, ..., Z_i−1) and the corresponding column indices j_z that relate each a_iz to a_ij; for example, a_04 is the entry for i = 0 and z = 4. Z_max is stored first.]

Figure A.1: Storing the Non-Sparse System Matrix (see section 5.4.1). The very first integer
stored (Z_max) gives the maximum row size, which is useful for putting an upper limit in the
code when allocating memory structures. The next integer stored gives the size of the current row;
the row itself holds the likelihoods of an emission in a known pixel being detected by detector pair i, where
i is the current row number, and these values are stored as type float. Which pixel is meant is given by
the indices of the following row, which are stored as type int. This pattern is then repeated for each
detector pair.
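As an illustration of this storage scheme, the following is a minimal C sketch (a hypothetical helper, not the thesis code) of reading one row; it assumes the file pointer has already been advanced past the initial Z_max and that the buffers hold at least Z_max entries.

#include <stdio.h>

/*
 * Hypothetical sketch: read one row of the sparse system matrix as laid
 * out in figure A.1 - the row size Z_i (int), then Z_i values (float),
 * then Z_i column indices (int). Returns Z_i, or -1 on a read error.
 */
int readSparseRow(FILE *fp, float *aValues, int *jIndices)
{
    int Zi;
    if (fread(&Zi, sizeof(int), 1, fp) != 1)
        return -1;
    if (fread(aValues, sizeof(float), (size_t)Zi, fp) != (size_t)Zi)
        return -1;
    if (fread(jIndices, sizeof(int), (size_t)Zi, fp) != (size_t)Zi)
        return -1;
    return Zi;
}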

[Figure A.2 schematic: one row a of the system matrix, indexed by j, for a fixed i.]
Figure A.2: Initialising the pointers for the algorithms to address the relevant regions of the
above system matrix. Having read Zmax , the code jumps to read the current Zi and then the
corresponding two rows. This simply requires setting pointers to find the beginnings of these rows,
which can only be done with knowledge of Z0 , ..., Zi−1 such that the offset can be appropriately
derived.

• Scalar Product between the ith row of the system matrix and vector ATi λ.

These three algorithms are given above (sections A.2.1, A.2.2 and A.2.3). In the
following we sketch only the routines specific to the iterative ART algorithm of the earlier
definition (section 6.2).

Stage 1
Stage 1 requires:

$a_i^T a_i$. (A.7)

Calculating the denominator of equation 6.7 is a time-consuming task. The system


matrix must be accessed along its rows and also along its columns. Our method of storage,
however, means that we are only able to perform consecutive data accessing (i.e., in
the direction of the rows, only). Fortunately, the denominator is a constant, hence the
work is only done once during the initialisation of the algorithm. For all i, aNorm(i),
the denominator, is calculated by the routine sparseArtNormFactor(FILE *, float *).
This function takes and returns a pointer to our resulting 1-D index array (aNorm) along
with the system matrix file pointer as arguments.
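For illustration, a minimal sketch of what such a normalisation pass might look like under the storage layout of figure A.1 is given below; it reuses the hypothetical readSparseRow() helper sketched after that figure and assumes the file pointer is positioned at the first row (i.e., past Z_max). It is not the thesis routine itself.

#include <stdio.h>

int readSparseRow(FILE *fp, float *aValues, int *jIndices); /* sketched after figure A.1 */

/*
 * Hypothetical sketch of the normalisation pass: for each row i of the
 * sparse system matrix, accumulate a_i^T a_i (the denominator of
 * equation 6.7 / A.7). Buffers must hold at least Z_max entries.
 */
void computeArtNormFactors(FILE *fp, float *aNorm, int I,
                           float *aValues, int *jIndices)
{
    int i, z, Zi;
    for (i = 0; i < I; i++) {
        Zi = readSparseRow(fp, aValues, jIndices);  /* see sketch above */
        aNorm[i] = 0.0f;
        if (Zi < 0)
            return;                                 /* read error       */
        for (z = 0; z < Zi; z++)
            aNorm[i] += aValues[z] * aValues[z];
    }
}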

Stage 2
Stage 2 requires:

$a_i^T \lambda^k$. (A.8)

The central operator is the scalar product between a certain row, i, and the solution vector λ:

$$y_i = \sum_{j=0}^{J-1} a_{ij}\, \lambda_j = a_i^T \lambda. \qquad (A.9)$$

This is implemented in the code as sparseScalarProduct(FILE *fp, float *lambdaPtr,


int rowNumber).

Stage 3 - For our chosen i, determine updates for all j


The term $\dfrac{y_i - a_i^T \lambda^k}{a_i^T a_i}$ gives us our single weighting value. We apply this to each $a_{ij}$ for fixed
i, and add these j values to our previous estimate of the solution vector, $\lambda_j^k$.
The entire process involves firstly an initialisation step, and then the iterates toward
a solution. At each iterate, one particular row of the system matrix must be chosen. Our
choice of i can be made in a number of ways, and each will affect the final result [Jain 1989].
Often, i is chosen as a function of the current iterate number k; for example,
k mod I. Below we just select the value randomly from within bounds determined by the
matrix size.

/* Initialise the procedure: */
for j = 0 to J-1;
    tempImageData[j] = 0;
end;
sparseArtNormFactor(ptrToSysMatrixFile, aNorm); /* See section A.3 above */
for k = 0 to K-1;
    /* Derive the chosen i randomly: */
    i = rand() % I;
    product = sparseScalarProduct(ptrToSysMatrixFile, imageData, i); /* See section A.3 above */
    weight = (sinogram[i] - product) / aNorm[i];
    /* Update the image data: */
    for j = 0 to J-1;
        tempImageData[j] += weight * sysMat(i,j); /* See section A.3 above */
    end;
    for j = 0 to J-1;
        if (tempImageData[j] < 0.0)
            tempImageData[j] = imageData[j] = 0.0;
        else imageData[j] = tempImageData[j];
    end;
end;
return(imageData);

Note the use of tempImageData. This is used to allow the imposition of a non-
negativity constraint. This particular ART algorithm is consequently referred to as par-
tially constrained ART reconstruction [Jain 1989], ART2 or MART [Herman et al. 1973,
Lent 1976].

A.4 The EM-ML Algorithm’s Implementation


The following shows the pseudo code for the EM-ML algorithm as discussed in section 7.5
of chapter 7.

/* Initialise the procedure: */
for j = 0 to J-1;
    imageData[j] = sensible positive value;
    normVector[j] = 0.0;
end;
for i = 0 to I-1;
    /* Use `fread` to load row i of A into sysMatRow. */
    for j = 0 to J-1;
        normVector[j] += sysMatRow[j];    /* Σ_i a_ij */
    end;
end;
/* Begin the iterative procedure: */
for k = 0 to K-1;
    /* Do Stages 1 and 2 in parallel: */
    for i = 0 to I-1;
        /* If sinogram value is not zero ... */
        sinogramF = sparseScalarProduct(fp, imageData, i);    /* ← Σ_j' a_ij' λ_j' */
        sinogramQ[i] = sinogram[i] / sinogramF;               /* y_i ÷ Σ_j' a_ij' λ_j' */
    end; /* Ends Stages 1 and 2. */
    for j = 0 to J-1; /* Some initialisation. */
        sumImage[j] = 0.0;
    end;
    /* Begin Stage 3: */
    for i = 0 to I-1;
        /* Use `fread` to load row i of A into sysMatRow. */
        for j = 0 to J-1; /* Now addressing individual pixels: */
            sumImage[j] += sysMatRow[j] * sinogramQ[i];
        end;
    end; /* Ends Stage 3. */
    /* Begin Stage 4: */
    for j = 0 to J-1;
        imageData[j] *= sumImage[j] / normVector[j];
    end; /* Ends Stage 4 and the current iteration of the EM-ML algorithm. */
end;
return(imageData);

A.5 Implementation of the Alternative Gaussian Algorithm


The algorithm of equation 10.10 takes the form of equations 9.29 and 9.31 of section 10.3.
Variables are firstTerm, littleSum and bigSum. These are given by equation A.10 to
equation A.12, respectively:
$$\mathrm{firstTerm}_j = \lambda_j^p - \sigma_j^2 \sum_i a_{ij}, \qquad (A.10)$$

$$\mathrm{littleSum}_i = \sum_{j'} a_{ij'}\, \lambda_{j'}^k, \qquad (A.11)$$

$$\mathrm{bigSum} = \sum_i \frac{a_{ij}\, \lambda_j^k\, y_i}{\mathrm{littleSum}_i + R_i + S_i}. \qquad (A.12)$$

Notice that bigSum is the summation of the $N_{ij}^k$ terms of equation 9.29, such that our
iterative procedure is:

$$\lambda_j^{k+1} = \frac{1}{2}\,\mathrm{firstTerm}_j + \frac{1}{2}\sqrt{\mathrm{firstTerm}_j^2 + 4\sigma_j^2\,\mathrm{bigSum}}, \qquad (A.13)$$
where, as previously, i indicates the detectors, j the pixels, and k the iteration number.
In this case, aij is the value of the system matrix in its ith row and j th column (see
figure 5.2).
One iteration of the code is simply:
/* Initialise the normalisation vector Σ_i a_ij: */
for j = 0 to J-1;
    normVector[j] = 0.0;
end;
for i = 0 to I-1;
    /* Get one row of the system matrix and traverse it (fread). */
    for j = 0 to J-1;
        normVector[j] += sysMatRow[j];
    end;
end;
for j = 0 to J-1;
    firstTerm_j = λ^p_j − σ²_j × normVector[j];
end;

for j = 0 to J-1;
    bigSum = 0.0;
    for i = 0 to I-1;
        littleSum_i = 0.0;
        /* Get one row of the system matrix and traverse it (fread). */
        for jj = 0 to J-1;
            littleSum_i += a_{i,jj} × λ^k_{jj};
        end;
        bigSum += (a_{ij} × λ^k_j × y_i) ÷ littleSum_i;
    end;
    λ^{k+1}_j = ( firstTerm_j + sqrt( firstTerm_j² + 4 σ²_j × bigSum ) ) / 2;
end;

A.6 OSEM Implementation Details


The actual implementation (code) for the OSEM algorithm of section 12.1 simply follows
that of the EM-ML algorithm given in section 7.5. Letting $\{S_n\}_{n=1}^N$ be a disjoint partition
of the sinogram data, k the iteration number for a complete cycle of the algorithm, and
n the index for each sub-iteration, for which $\lambda^{(k,0)} = \lambda^{(k-1)}$ and $\lambda^{(k,N)} = \lambda^k$, then the
algorithm is implemented along the lines of:

$$\lambda_j^{(k,n)} = \frac{\lambda_j^{(k,n-1)}}{\sum_{i \in S_{n-1}} a_{ij}} \sum_{i \in S_{n-1}} \frac{a_{ij}\, y_i}{\sum_{j'} a_{ij'}\, \lambda_{j'}^{(k,n-1)}}, \qquad (A.14)$$

for all $j = 0, \ldots, J-1$ and $n = 1, \ldots, N$.
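To connect equation A.14 with the EM-ML code above, the following is a minimal C sketch of one sub-iteration; the getSystemMatrixRow() helper, the caller-allocated work arrays and all names are assumptions and not the thesis implementation.

#include <stdio.h>

int getSystemMatrixRow(FILE *fp, int i, float *sysMatRow, int J); /* assumed helper */

/*
 * Hypothetical sketch of one OSEM sub-iteration (equation A.14).
 * subsetRows lists the sinogram rows i in the current subset; sumImage
 * and normVector are caller-allocated work arrays of length J.
 */
void osemSubIteration(FILE *fp, const int *subsetRows, int nRowsInSubset,
                      const float *sinogram, float *imageData,
                      float *sysMatRow, float *sumImage, float *normVector,
                      int J)
{
    int r, i, j;
    float denom;

    for (j = 0; j < J; j++) {
        sumImage[j] = 0.0f;
        normVector[j] = 0.0f;
    }
    for (r = 0; r < nRowsInSubset; r++) {
        i = subsetRows[r];
        getSystemMatrixRow(fp, i, sysMatRow, J);
        denom = 0.0f;
        for (j = 0; j < J; j++)               /* sum_j' a_ij' * lambda_j' */
            denom += sysMatRow[j] * imageData[j];
        if (denom <= 0.0f)
            continue;
        for (j = 0; j < J; j++) {
            sumImage[j]   += sysMatRow[j] * sinogram[i] / denom;
            normVector[j] += sysMatRow[j];    /* sum over the subset of a_ij */
        }
    }
    for (j = 0; j < J; j++)
        if (normVector[j] > 0.0f)
            imageData[j] *= sumImage[j] / normVector[j];
}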

A.6.1 OS-Cross Entropy Algorithm


Along similar lines, we have:

1) $k = 0$, $\hat{\lambda}^k$ initialised.
2) While $\hat{\lambda}^k$ not converged, do:
   a) $\lambda^0 = \hat{\lambda}^k$, $k = k + 1$;
   b) For subsets $n = 1, \ldots, N$:
      if $\sum_{i \in S_{n-1}} \dfrac{y_i\, a_{ij}}{\sum_{j'} a_{ij'}\, \lambda_{j'}^{n-1}} > \beta \log\!\left(\dfrac{\lambda_j^{n-1}}{p_j}\right)$ then
         $\lambda_j^n = \dfrac{\lambda_j^{n-1}}{\sum_{i \in S_{n-1}} a_{ij}} \left( \sum_{i \in S_{n-1}} \dfrac{y_i\, a_{ij}}{\sum_{j'} a_{ij'}\, \lambda_{j'}^{n-1}} - \beta \log\!\left(\dfrac{\lambda_j^{n-1}}{p_j}\right) \right)$;
      else
         $\lambda_j^n = p_j \exp\!\left\{ -\dfrac{1}{\beta} \sum_{i \in S_{n-1}} \dfrac{y_i\, a_{ij}}{\sum_{j'} a_{ij'}\, \lambda_{j'}^{n-1}} \right\}$;
      end;
   c) $\hat{\lambda}^k = \lambda^N$.
3) End.

A.6.2 OS-Bayesian Approach


Likewise.

$$\lambda_j^{(k,n+1)} = \frac{1}{2}\left(-\sigma_j^2 \sum_{i \in S_n} a_{ij} + \lambda_j^p\right) + \frac{1}{2}\sqrt{\left(-\sigma_j^2 \sum_{i \in S_n} a_{ij} + \lambda_j^p\right)^2 + 4\sigma_j^2 \sum_{i \in S_n} N_{ij}^{(k,n)}}, \qquad (A.15)$$

where $N_{ij}^{(k,n)} = \dfrac{a_{ij}\, \lambda_j^{(k,n)}\, y_i}{\sum_{j'} a_{ij'}\, \lambda_{j'}^{(k,n)} + r_i + s_i}$.

As such, firstTerm becomes:

$$-\sigma_j^2 \sum_{i \in S_n} a_{ij} + \lambda_j^p; \qquad (A.16)$$

littleSum is

$$\sum_{j'} a_{ij'}\, \lambda_{j'}^{(k,n)}; \qquad (A.17)$$

and bigSum is:

$$\sum_{i \in S_n} \frac{a_{ij}\, \lambda_j^{(k,n)}\, y_i}{\mathrm{littleSum}_i + R_i + S_i}. \qquad (A.18)$$

One should possibly be careful with the OS approach in the Bayesian implementation
[Leahy and Qi 1998]:

However, both ICA (see [Fessler 1994]) and gradient based methods [Kaufman
1987, Mumcuoğlu et al. 1996, Fessler et al. 1996] produce faster convergence
than the EM-ML algorithm and its variants, and have the advantage over the
OSEM1 method that, when used to compute a MAP estimate, they are stable
at higher iterations so that the selection of the stopping point of the algorithm
is not critical.

It is unclear to me how great an influence the Bayesian prior should have on a recon-
struction process whose updates are derived from only a subset of the sinogram data. I
would guess that the influence should be scaled down accordingly, but I have yet to validate this.
1
But that’s not for a Bayesian version?
B
[Levitan and Herman 1987]

The paper due to [Levitan and Herman 1987] is important for many reasons. Primarily
it gives the derivation of the Bayesian reconstruction algorithm used to regularise the
solution with a Gaussian field. It also poses the question of the importance of the likelihood
function to the reconstruction algorithm, suggesting that it might be better to adopt
penalised least-squares, an approach later adopted with gusto.
A “simple” iterative formula is derived for a penalty function that is a weighted sum
of the squared deviations of image vector components from their a priori mean values,
denoted throughout as λp .

$$P(\lambda|y) = \frac{p(y|\lambda)\, P(\lambda)}{p(y)}. \qquad (B.1)$$

The Bayesian formulation (equation B.1) was first applied in [Hurwitz 1975]. The “checkerboard
effect” (irregular high amplitude patterns) reflects the inherent ill-posedness of the EM-
ML reconstruction and can be related to the so-called dimensional instability in non-
parametric probability density estimation (i.e., because the estimators are sought in a
continuous domain [Lipinski 1996]).

B.1 The Algorithm - MAP-EM


The combined MAP-EM approach results in a penalised ML reconstruction using specific
forms for the likelihood and penalty functions (see also [Geman and McClure 1985]).
[Liang and Hart 1987] also discuss the Gaussian prior, and use the EM-ML approach to
derive an iterative scheme.

 yi  P 
Y X − j aij λj
p(y|λ) =  aij λj  exp . (B.2)
yi !
i j

The choice of prior is the following,


 γ 
P (λ) = C exp − (λ − λp )T H(λ − λp ) , (B.3)
2

where C is a normalisation constant, $\lambda^p$ denotes the prior, and $\gamma$ and $H$ together
specify the covariance matrix of the Gaussian distribution. Hence this is of the quadratic
form,

$$\exp\!\left(-\sum_j \frac{(\lambda_j - \lambda_j^p)^2}{2\sigma^2}\right). \qquad (B.4)$$

Now, assuming H is a diagonal matrix (i.e., they assume no relationship between


neighbouring pixels),
$$\ln P(\lambda|y) \;\propto\; \ln p(y|\lambda) - \frac{\gamma}{2}(\lambda - \lambda^p)^T H (\lambda - \lambda^p) \qquad (B.5)$$
$$= \sum_i \left( y_i \ln\!\left(\sum_j a_{ij}\lambda_j\right) - \sum_j a_{ij}\lambda_j \right) - \frac{\gamma}{2}\sum_j h_j (\lambda_j - \lambda_j^p)^2. \qquad (B.6)$$

As such, the derivative is of the following form,

$$\frac{d \ln P(\lambda|y)}{d\lambda_j} = \frac{d \ln p(y|\lambda)}{d\lambda_j} - \gamma\, h_j (\lambda_j - \lambda_j^p). \qquad (B.7)$$

Note, it will be assumed that $\sum_i a_{ij} = 1$, for all j.

B.1.1 The EM-Algorithm


Let $\psi_{ij}$ be the complete data such that $y_i = \sum_j \psi_{ij}$. In the MAP-EM algorithm, instead of
maximising ln P (λ|y) directly, the conditional expectation of ln P (λ|y) over the complete
data set is being maximised [Dempster et al. 1977].

$$\ln P(\lambda|\psi) \;\propto\; \ln p(\psi|\lambda) - \frac{\gamma}{2}(\lambda - \lambda^p)^T H (\lambda - \lambda^p), \qquad (B.8)$$

where H denotes the covariance matrix of the Gaussian, simplified to the identity
matrix for implementation purposes.
Let p(ψ|y, λ̂) denote the conditional probability of ψ under the assumption that the
measurement y has been made and λ̂ the image vector. Then, for any function f of ψ
(e.g., a log likelihood function!), we define the conditional expectation of f by:
$$E_\psi[f(\psi)\,|\,y, \hat{\lambda}] = \sum_\psi f(\psi)\, p(\psi\,|\,y, \hat{\lambda}). \qquad (B.9)$$

It then follows from equation B.8 that (in saying f (ψ) = ln P (λ|ψ) - and then swapping
the ψ and λ terms):

$$E_\psi[\ln P(\lambda|\psi)\,|\,y, \hat{\lambda}] = E_\psi[\ln p(\psi|\lambda)\,|\,y, \hat{\lambda}] - \frac{\gamma}{2}(\lambda - \lambda^p)^T H (\lambda - \lambda^p). \qquad (B.10)$$

B.1.2 EM-ML
The EM gets from λk to λk+1 by first considering the conditional expectation of ln P (λ|·)
with λ̂ = λk , which is the E-step, and then choosing as λk+1 that λ which maximises the
conditional expectation (the M-step).
$$E_\psi(\ln p(\psi|\lambda)\,|\,y, \hat{\lambda}) = \sum_i \sum_j \left\{ \frac{y_i\, a_{ij}\, \hat{\lambda}_j}{\sum_{j'} a_{ij'}\, \hat{\lambda}_{j'}} \ln(a_{ij}\lambda_j) - a_{ij}\lambda_j \right\}, \qquad (B.11)$$

and hence,

$$E_\psi(\ln P(\lambda|\psi)\,|\,y, \lambda^k) = \sum_i \sum_j \left\{ \frac{y_i\, a_{ij}\, \lambda_j^k}{\sum_{j'} a_{ij'}\, \lambda_{j'}^k} \ln(a_{ij}\lambda_j) - a_{ij}\lambda_j \right\} - \frac{\gamma}{2}\sum_j h_j(\lambda_j - \lambda_j^p)^2, \qquad (B.12)$$

which is the E-step of the algorithm. The optimisation process involves the following.

$$\frac{dE_\psi(\ln P(\lambda|\psi)\,|\,y, \lambda^k)}{d\lambda_j} = \frac{\lambda_j^k}{\lambda_j} \sum_{i=0}^{I-1} \frac{a_{ij}\, y_i}{\sum_{j'} a_{ij'}\, \lambda_{j'}^k} - 1 - \gamma h_j(\lambda_j - \lambda_j^p) = 0. \qquad (B.13)$$

For $\lambda_j = \lambda_j^{k+1}$, and rearranging, we see that,

$$\gamma h_j \lambda_j^2 + \lambda_j\left(1 - \gamma h_j \lambda_j^p\right) - \lambda_j^k \sum_i \frac{a_{ij}\, y_i}{\sum_{j'} a_{ij'}\, \lambda_{j'}^k} = 0, \qquad (B.14)$$

and hence,

$$\lambda_j^{k+1} = \frac{1}{2\gamma h_j}\left( \gamma h_j \lambda_j^p - 1 + \sqrt{(\gamma h_j \lambda_j^p - 1)^2 + 4\gamma h_j \sum_i N_{ij}^k} \right), \qquad (B.15)$$

where $N_{ij}^k = \lambda_j^k \dfrac{a_{ij}\, y_i}{\sum_{j'} a_{ij'}\, \lambda_{j'}^k}$. Note that the derivations only concern those parameters
remaining to be estimated; i.e., some are held fixed in the EM-ML scheme. Note also that
only the positive root is taken in the above expression, and hence we obtain positive $\lambda_j^{k+1}$s
from the positive $\lambda_j^k$s. Finally, note that γ was empirically selected as 0.0005, and that as
the $h_j$ constituted a diagonal identity matrix, they may be ignored.
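As a small illustration of the update of equation B.15, the following C sketch (names are assumptions, not the original authors' code) applies it pixel-wise once the back-projected term $\sum_i N_{ij}^k$ has been accumulated into nTerm[j]:

#include <math.h>

/*
 * Hypothetical sketch of the MAP-EM update of equation B.15. nTerm[j]
 * holds sum_i N_ij^k; gamma and h[j] parameterise the Gaussian prior with
 * mean lambdaPrior[j].
 */
void mapEmUpdate(float *lambda, const float *lambdaPrior, const float *nTerm,
                 const float *h, float gamma, int J)
{
    int j;
    for (j = 0; j < J; j++) {
        double b = gamma * h[j] * lambdaPrior[j] - 1.0;   /* gamma*h*lambda^p - 1 */
        lambda[j] = (float)((b + sqrt(b * b + 4.0 * gamma * h[j] * nTerm[j]))
                            / (2.0 * gamma * h[j]));
    }
}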

B.1.3 Discussion
Only after 25 iterations does the MAP algorithm improve upon the least-squares error of
the best ML result, at which point the ML result deteriorates, the MAP result instead remaining
consistently near the point of optima. As the least-squares error value is of course not
available in real experiments, and hence the point of deterioration is unknown for the
ML algorithm, the authors suggest that just letting the MAP process “run long enough” is
sufficient to produce a fairly guaranteed result. But this point does coincide fairly well with
where the ML method seems to have found a likely solution; i.e., the likelihood function
shows only slight change thereafter.
The MAP algorithm does not adhere to the energy preservation principle, but the
authors show how this might be achieved via the introduction of a Lagrange multiplier,
$\eta\left(\sum_j \lambda_j - \sum_i y_i\right)$.
C
Theoretical Background to Blobs

This appendix gives some background theory to the work on image reconstruction using
spherically symmetric pixels, or “blobs”, to represent images, as opposed to the
conventional use of a square-shaped representation. This does relate to the reconstruction
schemes used in this project, so it is a necessary inclusion to the thesis. In the following,
the notation mostly follows that of [Lewitt 1992], and the introductory discussion is a re-
view of [Hanson and Wecksung 1985], who originally introduced the use of spherical basis
functions to tomographic reconstruction.

C.1 Spherical Basis Functions in Tomographic Reconstruction
Our unknown function, the image that is to be reconstructed, must always be represented
by a set of basis functions. Typically, this is the pixels of the image group, and the
coefficients of these basis functions are simply the pixel intensities. If one instead decides
to use different basis functions, then this of course affects the calculation of the system
matrix coefficients that relate the LORs to the basis function centres (previously pixels)
- for one thing, the basis functions are likely to have more coverage. The reconstruction
algorithms in all instances estimate the coefficients of the basis functions, which, in the
case of pixels, are simply the pixel intensities. Hence the reconstruction algorithms remain
the same, but on completion (having converged, for example), those using spherical basis
functions must derive the final image via an “over-imposition step” that redistributes each
coefficient value over the basis function that was centered at the position of the coefficient.
Hence, by including the basis functions in the system matrix, the forward projection of
the image estimate includes this summation step, reconstructing the sinogram data on the
basis of a blob-constructed image to allow the calculation of the error residual taken in
the sinogram space1 .
The motivation for this approach, first documented in [Hanson and Wecksung 1985],
was almost entirely aesthetic: “... sharp discontinuities at the pixel boundaries can be very
disconcerting to the eye”. To this end, desirable properties of a suitable basis function set are
those given in table C.1.
Note that non-local basis functions are unlikely to satisfy criteria (D) and (E), and
1
This error residual is the difference between the true sinogram data and that which was forward
projected from the image estimate. It is the basis for the updates in all of the algebraic reconstruction
algorithms.

A  Strong linear independence - necessary to specify the coefficients corresponding to a given reconstruction function uniquely.
B  Power of approximation - ability to represent the true function (the true image distribution).
C  Insensitivity to shift of the basis-function set - in order to allow for a more efficient implementation; this amounts to the requirement that the basis function be bandlimited.
D  Efficient computation of forward and backward projections - the aspects of the algebraic reconstruction algorithms that take most processing time.
E  Efficient implementation of reconstruction constraints - for example, accommodating the usage of prior knowledge.
F  Fidelity of appearance - the given basis for this approach.

Table C.1: The original criteria that motivated the use of spherical basis functions, or blobs.
Desirable properties of a suitable basis function set for tomographic image reconstruction [Hanson
and Wecksung 1985]

hence, ultimately, the motivation for the work of [Lewitt 1992]. Criterion (F), however,
drove this original work, and not, for example, the need for regularisation of the reconstruc-
tion solution or improved accuracy in deriving the system model. To this end, smooth,
positive, overlapping basis functions were preferred: firstly because of reduced visual
artifact, and secondly because of a claimed improvement in computational accuracy2.

C.1.1 Theoretical Background


The Regular Pixel Representation In 2-D, let (x, y) be the point relative to the
sampling centre - the would-be pixel centre - and let $\{(x_j, y_j)\,|\,j = 0, 1, \ldots, J-1\}$ be a set
of J points that are the nodes of a uniform grid, the components of each being multiples
of a given constant ∆. The image is represented as $\bar{\lambda}^{\mathrm{pixel}}$, which is
constructed as the sum of scaled and shifted square basis functions, φ:

$$\bar{\lambda}^{\mathrm{pixel}}(x, y) = \sum_{j=0}^{J-1} f_j^{\mathrm{pixel}}\, \phi(x - x_j, y - y_j), \qquad (C.1)$$

where φ, the unit pixel basis function, is defined as:

$$\phi(x, y) = \begin{cases} 1 & \text{if } |x|, |y| \le \Delta/2, \\ 0 & \text{otherwise,} \end{cases} \qquad (C.2)$$

and $\{f_j^{\mathrm{pixel}}\}$ is the set of coefficients of the image representation that is stored in the
computer. This defines the conventional pixel representation in image data, as was initially
introduced in section 4.4.
2
Actually, there is no evidence in the paper itself of this claim! Indeed, the authors themselves say:
“It is hard to quantify the [presumed] superiority of the more advanced basis functions because the usual
RMS measures of accuracy are inadequate to describe such minor, but significant, differences”. That is,
their computational accuracy was subjectively judged, despite the availability of ground truth data, and
the results are entirely qualitative.

[Figure C.1 schematic: a pixel and a blob basis function centred at the same position (x_j, y_j), with the blob extent larger than the pixel extent.]

Figure C.1: The above figure shows how both basis functions (a pixel and a blob) are perhaps
centered in the same position (xj , yj ), but nonetheless have different extents. In practice, however,
the LOR is unlikely to traverse the pixel centre, and the blob is therefore located off-centre.

The Spherical Symmetric Functions Representation Through the replacement of


φ with spherically symmetric basis functions, denoted b, we are now able to bring a new
image representation into consideration. Denoting k(x, y)k as the distance from the origin
(x, y), then b is defined to be a function of this distance, and the image representation, λ̄,
is:
J−1
X
λ̄ = fj bj , (C.3)
j=0

where we say bj = b(k(x − xj , y − yj )k). The choice of functions is taken from the
rigorous theoretical discussion of [Lewitt 1990], boiling down to those capable of efficiently
approximating an arbitrary function, whilst being well-bounded in both the spatial and
frequency domains. As one would expect, this is a Gaussian-like function, although the
Gaussian function itself cannot represent a uniform distribution (except in its limiting
case3 !). For spatial distributions that are smooth and continuous, it is said that λ̄ will be
a more appropriate image representation than λ̄pixel , which is discontinuous [Lewitt 1992]:

“The straight sides, 90° angles and artificially sharp boundaries of voxels are
not consistent with the properties of natural structures.”

Such basis functions contain an infinitely large band of frequencies, so, on the basis
that the reconstructions should not attempt to estimate components outside of the range
of the acquisition device’s operational limit, it makes sense to bandlimit. (On the other
hand, it is of course desirable not to restrict the reconstructions to the scanner’s response,
if one would wish, for example, to address the PVE.)

Direct Influences on the Reconstruction Process The final image must always be
stored in the normal pixel, raster format. Regardless of which particular basis functions
are used, what are estimated by the image reconstruction algorithms are the coefficients
of equations C.1 and C.3, the $f_j$. The main consideration, therefore, is the calculation of
the system matrix, the stochastic likelihoods of an emission occurring in a given location
3
This argues against physical realism! An emission image is more likely to constitute large numbers of
activations that would never result in a homogeneous distribution.

being detected in a given LOR. It affects, that is, the interpolation method used in the
following manner.
Let $\bar{p}_i$ denote the line integral of $\bar{\lambda}$ along the ith LOR. From C.3:

$$\bar{p}_i = \sum_{j=0}^{J-1} a_{ij}\, f_j, \qquad (C.4)$$

where aij is the line integral along the ith LOR of the j th shifted basis function; i.e.,
that which is centered at (xj , yj ). Its value depends upon [Lewitt 1992]:
• for pixels: the orientation of the line relative to the pixel and on the distance of the
line from the pixel centre.
• for blobs: only on the distance of the line from the basis function centre, r, and is
independent of the direction of the line integral.
And hence the cited efficiency. More important, however, is that the LORs are not
seen to pass through single, rather isolated pixels. Instead, they have a relationship to
the pixels neighbouring the current location of interest, which will increase as the blob
diameter increases.

Method and Results from [Hanson and Wecksung 1985] [Hanson and Wecksung
1985]’s experiments were based on a 32-by-32 pixel phantom, reconstructed by deriving
the same dimensionality of basis function coefficients, but visualised at 512-by-512 pixels.
Expansion of the [resulting] coefficient set fj for display purposes was done in two steps:
• The basis function expansion of equation C.3 is evaluated on a 256-by-256 image
grid by convolving the coefficients with a kernel that consists of the point sampling
of the basis function.
• Further interpolation is carried out to reach the final 512-by-512 resolution display
using bilinear interpolation for all but the square basis function, for which simple
replication is employed (piecewise continuous) to preserve the sharp boundaries of
the pixels.
As such, the workers employ a “zooming” method to preserve sharp boundaries in
the reconstruction, and then complain of sharp boundaries in the resulting image! In all,
the use of basis functions in [Hanson and Wecksung 1985] simply provided an improved
method of interpolation for display purposes, stating a result long since established; i.e.,
improved display magnification using cubic interpolation. Hence the emphasis for the work
was visualisation aesthetics and not the quantitative accuracy of the reconstructions.

Implementing the Basis Functions of [Lewitt 1990] Lewitt, one of the chief ex-
ternal contributors to the work of [Hanson and Wecksung 1985] continued the efforts,
aiming primarily to address the ability of the basis functions to approximate the image
distribution [Lewitt 1990] (criterion (B) of the original motivating factors given in
table C.1). To this end, he concludes that the best choice are Bessel functions, which have,
since this paper, been the routinely adopted approach of this collaboration [Lewitt 1992,
Matej and Lewitt 1996, Jacobs et al. 1998].
The family of functions is characterised by two parameters:

• the order, m, a non-negative integer; and


• the taper, α, which determines the shape of the blob, and is a real number.
The basis functions are thus described by $b_{m,\alpha}(r)$, for which both m and α are
determined using phantom experiments. Formally, the functions are defined as:

$$b_{m,\alpha,r}(d) = \frac{\left(1 - (d/r)^2\right)^{m/2}\, I_m\!\left(\alpha\sqrt{1 - (d/r)^2}\right)}{I_m(\alpha)}, \qquad (C.5)$$

where r is the radius of the basis functions, d is the distance away from the centre
(i.e., the function returns 0 for values of d > r), and the $I_m$ are modified Bessel functions of the
first kind, of order m [Press et al. 1988]. Figure C.2 shows an example profile of these
functions using a distance of 5 pixels, and α = 6.72351 and m = 2, which are given as
optimal in [Jacobs et al. 1998] among other papers that have been developed as a part of
this collaboration.
[Figure C.2 plot: an example “blob” profile, decreasing from 1 at the centre to 0 at a user-specified distance of 5 pixels.]

Figure C.2: The above shows the characteristic profile of the generalised Kaiser-Bessel window
functions used in the literature of [Lewitt 1990]. Note that it is Gaussian-like, although it has a
more naturally constrained spatial extent.

In deriving the coefficients of the system matrix, the algorithms using the blob basis
functions do not calculate the overlap of the LOR with each pixel. On this basis the
workers are able to cite an efficient implementation method in which the corresponding aij
depend only on the distance between the basis function centre and the LOR. This they
implement in the form of a look-up table (i.e., exact distances are not used), and the
distances themselves are accessed in an incremental manner (probably based on the classic
line tracing algorithm of [Bresenham 1965]).
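A minimal sketch of this kind of look-up-table scheme is given below. It is my own illustration under stated assumptions (the standard sinogram parameterisation x cos θ + y sin θ = s for an LOR, a numerically tabulated footprint, and nearest-entry look-up); the actual implementation details of the cited papers differ and are only partially documented.

```python
import numpy as np
from scipy.special import iv

def blob(d, m=2, alpha=6.72351, radius=5.0):
    """Radial blob profile of equation C.5 (see the previous sketch)."""
    u = np.clip(1.0 - (np.asarray(d, dtype=float) / radius) ** 2, 0.0, None)
    return (u ** (m / 2.0)) * iv(m, alpha * np.sqrt(u)) / iv(m, alpha)

# Tabulate the footprint: the line integral of the blob as a function of the
# perpendicular distance between the LOR and the blob centre.  Radial symmetry
# makes this a 1-D table (filled numerically here; closed forms also exist).
RADIUS, N_TABLE = 5.0, 512
table_d = np.linspace(0.0, RADIUS, N_TABLE)
t = np.linspace(-RADIUS, RADIUS, 2001)          # integration variable along the line
dt = t[1] - t[0]
footprint = np.array([blob(np.hypot(d, t)).sum() * dt for d in table_d])

def a_ij(x_j, y_j, theta_i, s_i):
    """Approximate system matrix element for the blob centred at (x_j, y_j) and
    the LOR parameterised by x*cos(theta_i) + y*sin(theta_i) = s_i."""
    dist = abs(x_j * np.cos(theta_i) + y_j * np.sin(theta_i) - s_i)
    if dist >= RADIUS:
        return 0.0
    return footprint[int(dist / RADIUS * (N_TABLE - 1))]   # nearest-entry look-up

print(a_ij(0.0, 0.0, 0.0, 1.5))   # an LOR passing 1.5 pixels from a blob at the origin
```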

A Bone of Contention In the case of pixels, the value of the reconstructed image
at any point can be found directly from the coefficients. In the case of the blobs, the
expression for λ̄ must be summed explicitly using the vector coefficients produced by the
final iteration of the algorithm in order to find the values of the reconstructed image at
the specific sample points. To me this makes no sense.
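To make that extra step concrete, the following sketch (my own illustration; for simplicity the blob centres are placed on the pixel grid, and all names are mine) converts a coefficient array into image values by explicitly summing the shifted blobs at the pixel sample points:

```python
import numpy as np
from scipy.special import iv

def blob(d, m=2, alpha=6.72351, radius=2.0):
    """Radial blob profile of equation C.5."""
    u = np.clip(1.0 - (np.asarray(d, dtype=float) / radius) ** 2, 0.0, None)
    return (u ** (m / 2.0)) * iv(m, alpha * np.sqrt(u)) / iv(m, alpha)

def blobs_to_image(coeffs, radius=2.0):
    """Sum the shifted blobs, weighted by their coefficients, at every pixel centre.

    Here the blob centres coincide with the pixel grid; in practice a coarser
    arrangement of centres (e.g. a body-centred cubic grid in 3D) is typically used.
    """
    ny, nx = coeffs.shape
    yy, xx = np.mgrid[0:ny, 0:nx].astype(float)
    image = np.zeros((ny, nx))
    for j in range(ny):
        for k in range(nx):
            if coeffs[j, k] != 0.0:
                image += coeffs[j, k] * blob(np.hypot(yy - j, xx - k), radius=radius)
    return image

# A single unit coefficient in the middle of a 16-by-16 grid spreads over its neighbours.
c = np.zeros((16, 16))
c[8, 8] = 1.0
print(np.round(blobs_to_image(c)[6:11, 6:11], 3))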
The preference for Bessel functions over, say, Gaussians is only because of their limited
spatial extent. I assume that the Gaussian may be truncated at twice its FWHM, where it
has fallen to less than 0.4% of its peak value; this seems appropriate, as the high-frequency
components resulting from the introduction of sharp-edged basis functions (pixels) are still
sensibly avoided.
What would be more convincing, in any of the fairly extensive literature that has
resulted from almost 10 years of research into spherical basis functions, is for the results
to be shown in image form. Admittedly, this was done in a few of the papers, but those
images show FBP algorithms outperforming the authors' EM-ML implementation, and
hence I can only doubt the validity of their conclusions (see, for example, [Matej et
al. 1994]).
D
Characteristics of the Poisson
Distribution

Before citing some basic information regarding the Poisson distribution, it is important
to clarify that this distribution only approximately describes the sinogram data fed to
the reconstruction algorithms. It is well known that radioactive decay obeys the Poisson
model, but because of the corrections made to the projection counts, the data that we
operate with does not follow this distribution.
The corrections are employed to combat effects due to attenuation, randoms (accidental
coincidences), and scatter. These corrections firstly amplify the magnitude of the noise
(see, for example, [Huang et al. 1979, Bergström et al. 1980, Hoffman et al. 1981, Hoffman
and Phelps 1986]), and secondly may result in [additional] spatial correlations [Rowe and
Dai 1992]. We must therefore question the use of an MLE with the Poisson distribution
as its functional form. Indeed, as this latter point implies [reduced] independence in the
emissions, the use of the Maximum Pseudo-Likelihood estimator (equation 7.24) is less
easily justified. (The likelihood estimator is actually a pseudo-likelihood estimator, as one
would learn from most statistical texts, because of the assumption made regarding the
independence of the probabilities. Our pseudo-estimator is, however, consistent; that is,
in its limit, i.e., for large images, it will approximate a true likelihood estimator [Guyon
and Künsch 1996].)
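A small simulation sketch (my own code, with a purely illustrative attenuation factor) shows the simplest consequence: scaling Poisson counts by correction factors breaks the variance-equals-mean relation on which the Poisson likelihood rests.

```python
import numpy as np

rng = np.random.default_rng(0)

true_mean = 50.0          # mean of the underlying (attenuated) Poisson counts
atten_factor = 4.0        # illustrative attenuation correction factor

raw = rng.poisson(true_mean, size=100_000)    # Poisson data: variance equals mean
corrected = atten_factor * raw                # "attenuation corrected" counts

print("raw:       mean %6.1f   variance %6.1f" % (raw.mean(), raw.var()))
print("corrected: mean %6.1f   variance %6.1f" % (corrected.mean(), corrected.var()))
# The corrected counts have variance of roughly atten_factor times their mean,
# so they no longer satisfy the Poisson relation variance = mean.
```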
It is no surprise, therefore, that workers such as [Yavuz and Fessler 1997] have sug-
gested different (Gaussian) models for the distribution. Their results are indeed impres-
sive, although the tendency elsewhere is, as in this thesis, to use the Poisson model.
Methods have even been developed to re-introduce Poisson statistics to the corrected
data [Comtat et al. 1998], again with impressive results. In short, not many are ready to
give up this assumption until clearly better (more accurate and more efficient) alternatives
are available.
The remainder of this appendix simply cites three classic references from the literature
that discuss the Poisson distribution. The first presents an example case for radioactive
decay, and the latter two discuss the theory specific to emission reconstruction.
From [Leo 1987]:

The Poisson distribution describes processes for which the single trial prob-
ability of success is very small but in which the number of trials is so large
that there is nevertheless a reasonable rate of events. To take a concrete ex-
ample, consider a typical radioactive source such as ¹³⁷Cs, which has a half-life
of 27 years. The probability per unit time for a single nucleus to decay is then
λ = ln 2 / 27 ≈ 0.026 per year ≈ 8.2 × 10⁻¹⁰ s⁻¹. A small probability indeed!
However, even a 1 µg sample of ¹³⁷Cs will contain about 10¹⁵ nuclei. Since
each nucleus constitutes a trial, the mean number of decays from the sample
will be µ = N p = 8.2 × 10⁵ decays/s, where N is the number of trials, and p
is the probability of success of a single trial. This satisfies the limiting condi-
tions [of the Binomial distribution - the definition of the Poisson], so that the
probability of observing r decays is given by:
\[
  P(r) = \frac{\mu^{r}\, e^{-\mu}}{r!}. \tag{D.1}
\]
Note that in this equation only the mean appears so that knowledge of N
and p is not always necessary. This is the usual case in experiments involving
radioactive processes or particle reactions where the mean counting rate is
known rather than the number of nuclei or particles in the beam. In many
problems also, the mean unit per dimension λ, e.g., the number of reactions
per second, is specified and it is desired to know the probability of observing r
events in t units, for example, t = 3s. An important point to note here is that
the mean in equation D.1 refers to the mean number in t units. We also find
that
\[
  \sigma^{2} = \mu, \tag{D.2}
\]
that is, the variance of the Poisson distribution is equal to the mean. The standard
deviation is then σ = √µ. Note that the distribution is not symmetric.
The peak or maximum of the distribution does not, therefore, correspond to
the mean. However, as µ becomes large, the distribution becomes more and
more symmetric and approaches a Gaussian form. For µ ≥ 20, a Gaussian
distribution with mean µ and variance σ² = µ, in fact, becomes a relatively
good approximation and can be used in place of the Poisson for numerical
calculations.
This last point is particularly pertinent to papers such as [Fessler 1994], as discussed in
section 8.2.1.
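As a quick numerical check of this approximation (my own illustration using scipy, not taken from [Leo 1987]), the Poisson probabilities for µ = 20 can be compared directly with the Gaussian density of the same mean and variance:

```python
import numpy as np
from scipy.stats import norm, poisson

mu = 20
k = np.arange(5, 36)                              # values around the mean
p_pois = poisson.pmf(k, mu)
p_gauss = norm.pdf(k, loc=mu, scale=np.sqrt(mu))  # Gaussian with mean mu, variance mu

idx = int(np.argmax(k == mu))
print("maximum absolute difference over k: %.4f" % np.max(np.abs(p_pois - p_gauss)))
print("relative error at k = mu:           %.2f%%"
      % (100.0 * abs(p_pois[idx] - p_gauss[idx]) / p_pois[idx]))
```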
From [Ollinger and Fessler 1997]:
If a deterministic finite number of nuclei are injected into the patient, then,
strictly speaking, a multinomial distribution would be more precise than the
Poisson assumption. However, in practice the exact number of nuclei is un-
known and may well be considered a random variable following a Poisson process;
furthermore, a Poisson process “thinned” by Bernoulli trials remains Poisson
[Macovski 1983], all of which leads to [the use of] the Poisson process.
Characteristics of the Poisson distribution [Lange and Carson 1984]:
• A Poisson process is completely determined by its mean.
• A non-negative, integer-valued random variable Z follows a Poisson distribution if:
\[
  P(Z = k) = e^{-\lambda}\,\frac{\lambda^{k}}{k!}, \tag{D.3}
\]
for some λ > 0.

Properties:

• Z has mean E(Z) = λ.

• The sum of any two [or more] Poisson processes is also a Poisson process.

• Suppose that a random number Z of particles is generated according to a Poisson
  distribution with mean λ. Let each particle be independently distributed to one of m
  categories, the kth of which occurs with probability pk. If we let Zk represent the
  random number of particles falling in category k, then Zk follows a Poisson distribution
  with mean λpk. Furthermore, the collection of random variables Z1, ..., Zm is
  independent. Since Z = Z1 + · · · + Zm, this implies that the numbers of particles
  falling in the categories are independent random variables.

This last point is interesting in that it would indeed justify the use of the pseudo-
likelihood estimator. Further discussion is given in the original paper with respect to this
issue.
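The thinning property is easy to verify numerically. The following sketch (illustrative parameters, my own code) generates a Poisson number of particles, distributes them multinomially over m = 3 categories, and checks that each category count behaves like an independent Poisson variable with mean λpk:

```python
import numpy as np

rng = np.random.default_rng(1)

lam = 1000.0                      # mean number of generated particles
p = np.array([0.5, 0.3, 0.2])     # probabilities for the m = 3 categories
n_trials = 20_000

counts = np.empty((n_trials, p.size))
for t in range(n_trials):
    z = rng.poisson(lam)                    # Poisson number of particles
    counts[t] = rng.multinomial(z, p)       # distribute them over the categories

# Each category count should be Poisson with mean lam * p_k, and the counts
# should be (approximately) uncorrelated across categories.
print("means:     ", np.round(counts.mean(axis=0), 1), "  expected:", lam * p)
print("variances: ", np.round(counts.var(axis=0), 1), "  expected:", lam * p)
print("corr(Z1, Z2) = %.3f" % np.corrcoef(counts[:, 0], counts[:, 1])[0, 1])
```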
E
Additional Results for the Thesis’
Algorithms

This appendix includes additional results for the algorithms of section 8.3.1, chapter 10
and section 11.2.

E.1 Results for the Gaussian Field Prior Algorithm


The first set of figures shows results for the algorithm of chapter 10. The variation seen is a
result of changing either the prior distribution (i.e., changing the means of each Gaussian
field), the widths of the Gaussians (for which a number of heuristics may be employed), or
both.

E.2 Results for the Adaptive Interpolation Kernel


This second section includes additional results for the algorithms of section 11.2, although
figure E.15 also shows the results of section 8.3.1's cross entropy algorithms for comparative
purposes.
The first set of images, figures E.6 to E.13, shows reconstructions from the FDG study
taken at the University Hospital in Groningen, the Netherlands. These were acquired
using the HRT+ scanner from CTI. Figure E.14 is a phantom study, the sinograms for
which were kindly donated by the PET group of the Institute for Medicine at the Nuclear
Research Centre in Jülich, Germany. Being made up of only three homogeneous regions,
this phantom allows a good comparison of both noise and resolution recovery, and hence
the additional comparison to section 8.3.1's cross entropy methods.
The "Alternative Gaussian"
Bayesian Scheme.
K= 50, Gaussians estimated
using the means of eq. 10.14.
(No entropy term.)
170

1st Iteration 2nd Iteration 3rd Iteration 4th Iteration 5th Iteration

Starting Estimate for each Maximum Likelihood Iterations


of the Methods. This is an starting from the OSEM estimate,
OSEM reconstruction of i.e., the Bayesian algorithm for K=0.
5 iterations (8 Subsets).
The Prior used. estimated
according to a least-squares
fit of the MRI data.

  
  

                                               

  
  

                                               

  

  

  

  

The "Alternative Gaussian"


Bayesian Scheme.
K= 50, Gaussians estimated
using the means of eq. 10.14,
but with an entropy term.
1st Iteration 2nd Iteration 3rd Iteration 4th Iteration 5th Iteration

The "Alternative Gaussian"


Bayesian Scheme.
using the means of eq. 10.15,
The Entropy Image used to
i.e., with the additional additionally constrain the
constraints. (No entropy Gaussian fields.
term).

Figure E.1: The above figure shows the results of the Bayesian method on real PET-MRI studies for different Gaussian fields. From left to right
we see results at each iteration step. Each of the four schemes (a different scheme being presented in each row) begins with a starting estimate
derived from an OSEM reconstruction after 5 iterations (8 subsets). The top row shows each iteration step for the Bayesian scheme using K = 50
(equation 10.12), where the means of the Gaussians come from the prior shown on the far right. The estimates for the σj terms are taken from
equation 10.14 without the entropy weighting. The next row shows how the EM-ML based reconstruction steps compare. This is of course equivalent
to the Bayesian method for K = ∞. The third row is the same as the first, but with the entropy weighting. The entropy image itself is shown on
the bottom right. The bottom row is the Bayesian method with the σj terms estimated from equation 10.15, and without the entropy measure.
The prior is taken from a least-squares fit of the MRI segmentations to the PET data using activity ratios of 10:3:1 for GM:WM:CSF and an
approximation of the PSF relating the PET resolution to that of the MRI. This is equivalent to the method of [Labbé et al. 1997] where only 3
tissue compartments are used. The entropy image was taken about a 3-by-3 pixel window in the MRI image, and the values scaled between 1 ± 0.6,
as given in equation 10.11. See also figure E.2.
[Figure E.2 panels: iterations 1 to 5 of the maximum likelihood algorithm and of the Alternative Gaussian scheme (K = 50) without and with the additional constraints; bottom row: the selected ROI shown on the prior, and the GM, WM and CSF segmentations.]
Figure E.2: The above figure shows the results of figure E.1 for (top row) the EM-ML algorithm, (second row) the Bayesian scheme of this chapter
with K = 50, and (third row) the same scheme with the additional constraints on the fields of equation 10.15 and K = 50. The bottom row shows
the prior used, and the probabilistic segmentations from which the additional heuristics were drawn (the brighter areas indicate the higher affinity
to the given tissue type). The point is, we are able to “tighten” the reconstruction solution to the prior according to what the segmentation tells
us. Activity in the GM regions can be reconstructed with less regularisation because we are able to “tidy up” in less interesting (noisier) regions.
This effect can be seen by analysing the above: in the bottom row, the CSF region is more tightly constrained to the prior than the row above, and
the contrast shown between it and the GM region is consequently enhanced. Such effects are only possible due to the flexibility offered in using
different Gaussian fields for each pixel.
[Figure E.3 plots: energy and log likelihood against iteration (0 to 5) for K = 45, K = 50 and K = 55, each without and with the entropy term.]
Figure E.3: The above plots show how the algorithm of chapter 10 behaves in terms of its energy and log likelihood values at each iteration step
(of which only 5 are shown as the algorithm quickly settles to a solution). Each row shows the result for one fixed value of K (see equation 10.12),
first without the entropy measure applied, and then with. The first column is the energy term (which is to be minimised) and the next column
is the likelihood (which should be maximised). This is then repeated for the same value of K but with the entropy term. The first value shown
is that due to the starting estimate, an OSEM reconstruction. Here, the likelihood was maximised without regularisation due to the prior. Hence
both the energy term and the likelihood values are high. See section 10.5 for the full discussion of these results.
[Figure E.4 panels: OSEM starting estimate; Gaussian field reconstructions with K = 45, K = 45 plus entropy, and K = 45 plus constraints (5 iterations each); the prior distribution; profiles, energy and log likelihood plots; the registered MRI slice and the entropy image.]
Figure E.4: The above shows more results using the algorithm of chapter 10. On this occasion, a better registration was available, and the results
make more sense. The image at the top-left is the starting estimate, an OSEM reconstruction. The algorithm was run for 5 more iterations, with
K fixed at 45 (see equation 10.12), but also with the entropy measure and then the additional constraints on the WM and CSF distributions.
These are shown second from left, third and fourth, respectively. The prior used is shown on the far right (this also gives some indication of the
segmentation). Profiles are shown under each image, and for the Bayesian reconstructions we also show energy and log likelihood plots. Note that
likelihoods remain pretty constant. On the bottom-left is the associated MRI image, and on the bottom right the entropy image.
[Figure E.5 panels: OSEM starting estimate; Gaussian field reconstructions with K = 60, K = 60 plus entropy, and K = 60 plus constraints (5 iterations each); the prior distribution; profiles, energy and log likelihood plots; the registered MRI slice and the entropy image.]
Figure E.5: The above shows more results using the algorithm of chapter 10, but this time with the more relaxed values for the Gaussian fields.
The image at the top-left is the starting estimate, an OSEM reconstruction. The algorithm was run for 5 more iterations, with K fixed at 60 (see
equation 10.12), but also with the entropy measure and then the additional constraints on the WM and CSF distributions. These are shown second
from left, third and fourth, respectively. The prior used is shown on the far right (this also gives some indication of the segmentation). Profiles are
shown under each image, and for the Bayesian reconstructions we also show energy and log likelihood plots. Note that likelihoods remain pretty
constant. On the bottom-left is the associated MRI image, and on the bottom right the entropy image.
[Figure E.6 panels: three reconstructions with profiles; kernel FWHMs 3.43 mm, 5.145 mm and 6.86 mm; sensitivity 0.3.]
Figure E.6: The above figure consists of three reconstructed slices (number 20) of the Groningen data set. Each is reconstructed to 256x256 pixels,
where each pixel is 1.714 mm². Each is reconstructed using 5 iterations of the OSEM algorithm, applied to 8 subsets presented in a sequential
fashion. On the left the image has been reconstructed using a fixed interpolation kernel of 3.43mm FWHM; in the middle kernels of 3.43mm,
5.145mm and 6.86mm were applied; and for the image on the right a fixed 5.145mm kernel was used. How the kernels were chosen for the central
image was based on SNR estimates taken from the back-projected image (section 11.2.2). The resulting FWHM map is shown as an icon in the
bottom-right corner of the central image. Here, the sensitivity threshold was 0.3. The profiles that are shown correspond to the white line crossing
the top of the image. The clearly observable effect is resolution recovery and reduced noise.
[Figure E.7 panels: three reconstructions with profiles; kernel FWHMs 3.43 mm, 5.145 mm and 6.86 mm; sensitivity 0.7.]
Figure E.7: The above figure consists of three reconstructed slices (number 20) of the Groningen data set. Each is reconstructed to 256x256
pixels, where each pixel is 1.714 mm². Each is reconstructed using 5 iterations of the OSEM algorithm, applied to 8 subsets presented in a
sequential fashion. On the left the image has been reconstructed using a fixed interpolation kernel of 3.43mm FWHM; in the middle kernels of
3.43mm, 5.145mm and 6.86mm were applied; and for the image on the right a fixed 5.145mm kernel was used. How the kernels were chosen for
the central image was based on a ROI taken from the GM segmentation (section 11.2.2), which may be compared to the previous figure that used
noise estimates. The resulting FWHM map is shown as an icon in the bottom-right corner of the central image. Here, the sensitivity threshold was
0.7. The profiles that are shown correspond to the white line crossing the top of the image.
[Figure E.8 panels: three reconstructions with profiles; kernel FWHMs 3.43 mm, 5.145 mm and 6.86 mm; sensitivity 0.5.]
Figure E.8: The above figure consists of three reconstructed slices (number 31) of the Groningen data set. Each is reconstructed to 256x256 pixels,
where each pixel is 1.714 mm². Each is reconstructed using 5 iterations of the OSEM algorithm, applied to 8 subsets presented in a sequential
fashion. On the left the image has been reconstructed using a fixed interpolation kernel of 3.43mm FWHM; in the middle kernels of 3.43mm,
5.145mm and 6.86mm were applied; and for the image on the right a fixed 5.145mm kernel was used. How the kernels were chosen for the central
image was based on SNR estimates taken from the back-projected image (section 11.2.2). The resulting FWHM map is shown as an icon between
the right two plots. The profiles that are shown correspond to the white line crossing the top of the image. The clearly observable effect is again
resolution recovery and reduced noise.


[Figure E.9 panels: three reconstructions with profiles; kernel FWHMs 3.43 mm, 5.145 mm and 6.86 mm; sensitivity 0.5.]
Figure E.9: The above figure consists of three reconstructed slices (number 31) of the Groningen data set. Each is reconstructed to 256x256
pixels, where each pixel is 1.714 mm². Each is reconstructed using 5 iterations of the OSEM algorithm, applied to 8 subsets presented in a
sequential fashion. On the left the image has been reconstructed using a fixed interpolation kernel of 3.43mm FWHM; in the middle kernels of
3.43mm, 5.145mm and 6.86mm were applied; and for the image on the right a fixed 5.145mm kernel was used. How the kernels were chosen for
the central image was based on a ROI taken from the GM segmentation (section 11.2.2), which may be compared to the previous figure that used
noise estimates. The resulting FWHM map is shown as an icon in the bottom-right corner of the central image. Here, the sensitivity threshold was
0.5. The profiles that are shown correspond to the white line crossing the top of the image.
[Figure E.10 panels: three reconstructions with profiles; kernel FWHMs 3.43 mm, 5.145 mm and 6.86 mm.]
Figure E.10: The above figure consists of three reconstructed slices (number 35) of the Groningen data set. Each is reconstructed to 256x256
pixels, where each pixel is 1.714 mm². Each is reconstructed using 5 iterations of the OSEM algorithm, applied to 8 subsets presented in a
sequential fashion. On the left the image has been reconstructed using a fixed interpolation kernel of 3.43mm FWHM; in the middle kernels of
3.43mm, 5.145mm and 6.86mm were applied; and for the image on the right a fixed 5.145mm kernel was used. How the kernels were chosen for the
central image was based on SNR estimates taken from the back-projected image (section 11.2.2). The resulting FWHM map is shown as an icon in
the bottom-right corner of the central image. The profiles that are shown correspond to the white line crossing the top of the image. Note again
the resolution recovery with excellent regularisation. Indeed, a clear “hot-spot” has been identified in this image that we can now assume, on the
basis of our experiments, is not an artifact.


[Figure E.11 panels: three reconstructions with profiles; kernel FWHMs 3.43 mm, 5.145 mm and 6.86 mm.]
Figure E.11: The above figure consists of three reconstructed slices (number 40) of the Groningen data set. Each is reconstructed to 256x256
pixels, where each pixel is 1.714 mm². Each is reconstructed using 5 iterations of the OSEM algorithm, applied to 8 subsets presented in a
sequential fashion. On the left the image has been reconstructed using a fixed interpolation kernel of 3.43mm FWHM; in the middle kernels of
3.43mm, 5.145mm and 6.86mm were applied; and for the image on the right a fixed 5.145mm kernel was used. How the kernels were chosen for the
central image was based on SNR estimates taken from the back-projected image (section 11.2.2). The resulting FWHM map is shown as an icon
in the bottom-right corner of the central image.


[Figure E.12 panels: three reconstructions with profiles; kernel FWHMs 3.43 mm, 5.145 mm and 6.86 mm.]
Figure E.12: The above figure consists of three reconstructed slices of the Groningen data set. Each is reconstructed to 256x256 pixels, where
each pixel is 1.714 mm². Each is reconstructed using 5 iterations of the OSEM algorithm, applied to 8 subsets presented in a sequential fashion.
On the left the image has been reconstructed using a fixed interpolation kernel of 3.43mm FWHM; in the middle kernels of 3.43mm, 5.145mm and
6.86mm were applied; and for the image on the right a fixed 5.145mm kernel was used. How the kernels were chosen for the central image was
based on SNR estimates taken from the back-projected image (section 11.2.2). The resulting FWHM map is shown as an icon in the bottom-right
corner of the central image.
[Figure E.13 panels: three reconstructions with profiles; kernel FWHMs 3.43 mm, 5.145 mm and 6.86 mm; sensitivity 0.3.]
Figure E.13: The above figure consists of three reconstructed slices (number 55) of the Groningen data set. Each is reconstructed to 256x256
pixels, where each pixel is 1.714 mm². Each is reconstructed using 5 iterations of the OSEM algorithm, applied to 8 subsets presented in a
sequential fashion. On the left the image has been reconstructed using a fixed interpolation kernel of 3.43mm FWHM; in the middle kernels of
3.43mm, 5.145mm and 6.86mm were applied; and for the image on the right a fixed 5.145mm kernel was used. How the kernels were chosen for the
central image was based on SNR estimates taken from the back-projected image (section 11.2.2). The resulting FWHM map is shown as an icon in
the bottom-right corner of the central image. Note again the resolution recovery, and it is also interesting to compare this reconstruction to that
of FBP used in Groningen (figure 7.2).


[Figure E.14 panels: three 128-by-128 OSEM reconstructions (2 iterations, 16 subsets) with horizontal profiles; left: fixed interpolation kernel; middle: variant interpolation kernel (FWHM 2 to 4 pixels); right: fixed interpolation kernel.]
Figure E.14: The above images (reconstructed to 128-by-128 pixels) show the results of using the same OSEM reconstruction procedure (2
iterations, 16 subsets) on the Jülich phantom study. Below each of these images is a 1-D profile passing horizontally through the centre. On the left,
the image is a result of the fixed interpolation scheme, where the FWHM is 2 pixels. Note in particular the noisy profile in homogeneous regions.
In the middle is the result of varying the interpolation scheme between 2 and 4 pixel FWHMs, as communicated by the icon FWHM map in the
top-left of the central reconstruction. Note again the recovery of the most active region (peaking at a value greater than that recovered in either of
the other two images), and the reduced noise in homogeneous regions. The image on the right is a reconstruction using a fixed interpolation kernel
of 3 pixels FWHM.
[Figure E.15 panels with profiles: the Ardekani algorithm (10 iterations, β = 0.1), the Som algorithm (10 iterations, β = 0.1), and the anisotropic diffusion adaption (10 iterations, β = 0.1, K = 10); the segmentation used is shown as an icon in the bottom-right.]
Figure E.15: The above images were each reconstructed using a cross entropy algorithm from section 8.3.1 run for 10 iterations. On the left is
the result of using the algorithm of [Ardekani et al. 1996], in the middle that due to [Som et al. 1998], and on the right that proposed in section 46
using equation 8.17. For each, the weighting of the prior’s influence (the β term of equation 8.11), is set to 0.1. The segmentation used from which
the local image gradients were derived is shown as an icon in the bottom-right.
[Figure E.16 panels with profiles: unregularised EM-ML reconstruction, Ardekani regularisation (β = 0.2), Som regularisation (β = 0.2) and diffusion regularisation (β = 0.2).]
Figure E.16: This figure shows reconstructions using the cross entropy algorithms given in section 8.3.1. The top row shows (from left to right):
the EM-ML reconstruction (i.e., an unregularised reconstruction); the result of using the algorithm of [Ardekani et al. 1996]; that due to [Som et
al. 1998]; and that proposed in section using equation 8.17. For each, the weighting of the prior’s influence (the β term of equation 8.11) is set to
0.2. All algorithms were run for 10 iterations. Below are the profiles of the image data, and we can easily recognise that smoothing is obviously
not compatible with the notion of resolution recovery. (Between two plots in the centre of the image is the GM segmentation from which the image
gradients were taken to determine the extent of the smoothing.)
F
Glossary of Terms

• AG - Alternative Gaussian: the prior of chapter 10.

• ART - Algebraic Reconstruction Techniques: the family of series expansion re-


construction techniques that update the reconstruction solution iteratively using a
weighted residual between the forward projected current estimate and the measured
data. This is done on a LOR by LOR basis. See [Gordon 1974].

• AVS - Advanced Visual Systems: an image processing tool kit and environment,
most suitable for the rapid prototyping of algorithms and the visualisation of their
results.

• BGO - Bismuth Germanate (Bi4Ge3O12): the crystals used in most PET scanners.

• BP - Back-projection. The operator associated to the inverse Radon transform (see


section 4.1.1).

• CSF - Cerebro Spinal Fluid: the brain’s shock-absorption system. Additionally


responsible for gradual changes in the chemical environment of the brain, which are
known to affect the resulting processing.

• CT - X-Ray Computed Tomography: a tomographic imaging modality based on the


phenomenon of X-ray absorption.

• CTI - Computer Technology Imaging Inc.: manufacturers of PET scanners. See


http://www.cti-pet.com

• DCT - Discrete Cosine Transformation: the image transformation used to derive the
component forms of an arbitrary image. This is used extensively in chapter 3.

• ECAT-ART - ECAT Advanced Rotating Tomograph: a scanner made by CTI.

• EM-ML - Maximum Likelihood Expectation Maximization: a family of parameter


optimization algorithms that extend the process to operate in two separate param-
eter spaces to ease the computational complexity. The algorithm is applicable to
estimation problems of so-called incomplete data, within which it is difficult to find
the solution. Instead, one makes the estimation in a more complete data space,
and then relates this solution to that of the incomplete space. This requires the
establishment of a relationship between the values. See [Dempster et al. 1977].

• EME - Expectation-Maximization-Expectation: the EM algorithm plus an addi-


tional E-step. See section 11.1.3 of this report.

• FBP - Filtered Back-Projection: the most commonly used method of PET recon-
struction. This is a direct inversion method based on the BP operator, as described
in section 4.1.1.

• FDG - Fluorodeoxyglucose: a radioactive tracer commonly used in PET studies of the brain.

• FFT - Fast Fourier Transform: efficient implementation of the Fourier transform.

• FWHM - Full-Width at Half-Maximum: a standard measure of the width of a peaked
function such as a PSF or spherical basis function: the width at the point where the
amplitude has fallen to half its maximum value. For a Gaussian, FWHM = 2√(2 ln 2) σ ≈
2.355 σ, where σ is the standard deviation.

• FOV - Field of View: the visible imaging field of any imaging device.

• GM - Grey Matter: densely packed (in terms of neuron density) brain tissue. Re-
sponsible, therefore, for the majority of the brain’s energy dissipation.

• GRF - Gaussian Random Field: an energy field described by a Gaussian distribution.


See section 9.2.3.

• ICA - Iterated Coordinate Ascent: See [Bouman and Sauer 1996].

• ICM - Iterated Conditional Modes: Proposed by [Besag 1986] as a feasible approxi-


mation to the MAP estimate of a segmentation solution based on a MRF prior.

• LOR - Line of Response: corresponds to a single sinogram measurement.

• LSI - Linear Shift-Invariant System: any signal processing system that can be characterised
solely by its linear and shift-invariant processing of the input signal.

• LSO - Lutetium Oxyorthosilicate (Lu2SiO5): the crystals used in the next generation of PET scanners.

• MAP - Maximum A Posteriori: MAP estimators maximise the posterior degree of belief
in an inference problem, given the observed data (usually wrapped in a likelihood
function) and an a priori belief in what is to be expected.

• MG - Müller-Gärtner algorithm of [Müller-Gärtner et al. 1992] discussed in sec-


tion 2.8 of this report.

• ML - Maximum Likelihood: see MLE.

• MLE - Maximum Likelihood Estimator: an estimator that selects the (unobserved)
parameter values most likely to have given rise to the observed (measured) data.

• ML-EM - see EM-ML.

• MRF - Markov Random Field: a random energy field of a distribution determined


from an explicit parameterisation and its neighbour properties.

• MRI - Magnetic Resonance Imaging: an imaging modality producing images of


proton density.

• MS - Multiple Sclerosis: sclerosis occurring in patches in the brain or the spinal cord,
or both.

• MSC - Muscle: an organ composed of bundles of fibrous tissue whose contraction
produces bodily movements.

• NAG - Numerical Algorithms Group: see http://www.nag.co.uk/.

• OSEM - Ordered Subsets Expectation Maximization: A fast and efficient EM recon-


struction algorithm that operates only on selected subsets of the sinogram data to
arrive at the reconstruction solution. The choice of the subsets is relatively arbitrary,
and the solution is visually almost the same as the traditional EM result, although
an order of magnitude faster. The well defined properties of the EM algorithm, such
as each iteration yielding an increased or equivalent (in the case of convergence)
likelihood value are not, however, retained. Nonetheless, the algorithm is now used
clinically. See [Hudson and Larkin 1994].

• OSL - One Step Late: the optimisation algorithm used to derive the maximum a
posteriori reconstruction result in [Green 1990]. The partial differential equation relating
the prior to the reconstruction solution that must be solved in the optimisation pro-
cess can only be evaluated using the reconstruction solution of the previous estimate.
And hence the procedure’s name.

• PDF - Probability Density Function: the function describing the probability
distribution of a random variable; in practice it is often represented by a histogram.

• PET - Positron Emission Tomography: Quantitative images of a radioactive tracer


uptake, indicative of flow, oxygen utilisation, or metabolic changes, depending on the
choice of compound to be labelled. See chapter 1.

• PIXEL: algorithm of [Yang et al. 1996] discussed in section 2.9 of this report.

• PRT-2 - Prototype Rotating Tomograph: a PET scanner made by CTI.

• PSF - Point Spread Function: the system response of the scanning device to an
impulse (Dirac pulse).

• PVE - Partial Volume Effect: the averaging of measured regions’ quantitative val-
ues, resulting from the finite resolution size. A function of the scanning device:
crystal size and efficiency.

• ROI - Region of Interest: any 2-D delineated area of the image considered to be
of interest. Typically measurements are taken over such a region, ignoring the
remainder of the data.

• SIR - Statistical Image Reconstruction: the reconstruction techniques based on a


stochastic geometric tomograph model, and a noise model for the observed data.
See [Shepp and Vardi 1982, Lange and Carson 1984].

• SNR - Signal to Noise Ratio: a measure of the quality of a signal. Its noiseless value
with respect to the noise. It can often only be estimated.

• SOR - Successive Over Relaxation: the optimisation algorithm used to derive the
reconstruction result in [Fessler 1994].

• SPECT / SPET: Single Photon Emission [Computed] Tomography: similar to PET


except that the isotopes used emit only a single photon.

• SPM - Statistical Parametric Mapping: the software system (and its associated
methods) for the analysis of functional imaging data developed at Hammersmith
Hospital and then the Institute of Neurology in London. See http://www.fil.ion.
ucl.ac.uk/.

• SVD - Singular Value Decomposition: a method of decomposing a matrix into a


simpler form by exploiting matrix orthogonality alone.

• VOI - Volume of Interest: any 3-D delineated area of the volume considered to
be of interest. Typically measurements are taken over such a region, ignoring the
remainder of the volume.

• WM - White Matter: sparsely packed brain tissue.


Bibliography

[Alenius and Ruotsalainen 1997] S Alenius and U Ruotsalainen. Bayesian image recon-
struction for emission tomography based on median root prior. Eur. J. Nucl. Med.,
24:258–265, 1997.
[Alpert et al. 1982] NM Alpert, DA Chessler, JA Correla, RH Ackerman, JY Chang,
S Finklestein, SM Davis, GL Brownell, and JM Taveras. Estimation of the local statisti-
cal noise in emission computed tomography. IEEE Trans. Med. Imag., MI-1(2):142–146,
1982.
[Andrews and Hunt 1977] HC Andrews and BR Hunt. Digital Image Restoration. Prentice
Hall Signal Processing Series, Englewood Cliffs, NJ, 1977.
[Ardekani et al. 1996] BA Ardekani, M Braun, BF Hutton, I Kanno, and H Iida. Minimum
cross-entropy reconstruction of PET images using prior anatomical information. Phys.
Med. Biol., 41:2497–2517, 1996.
[Barrett and Swindell 1982] HH Barrett and W Swindell. Radiological Imaging. Academic
Press, New York, 1982.
[Bartels et al. 1987] RH Bartels, JC Beatty, and BA Barsky. An Introduction to Splines
for use in Computer Graphics and Geometric Modeling. Morgan Kaufmann Publishers,
California, 1987.
[Bates et al. 1983] RHT Bates, KL Garden, and TM Peters. Overview of computerized
tomography with emphasis on future developments. Proceedings of the IEEE, 71(3):356–
372, 1983.
[Bergström et al. 1980] M Bergström, C Bohm, K Ericson, and L Ericksson. Corrections
for attenuation, scattered radiation, and random coincidences in a ring detector positron
emission transaxial tomograph. IEEE Trans. Nuc. Sci., 27:435–444, 1980.
[Bergström et al. 1983] M Bergström, L Ericksson, C Bohm, and et al. Correction for
scattered radiation in a ring detector positron camera by integral transformation of the
projections. Journal of Computed Assisted Tomography, 7:42–50, 1983.
[Besag 1986] JE Besag. On the statistical analysis of dirty pictures. Journal Royal Sta-
tistical Society, Series B., 48:259–302, 1986.
[Blasberg et al. 1989] RG Blasberg, RE Carson, R Kawai, and et al. Strategies for the
study of the opiate receptors in the brain: Application to the opiate antagonist cyclofoxy.
Journal of Cerebral Blood Flow and Metabolism, 9(1):732–739, 1989.

[Boulfelfel et al. 1992] D Boulfelfel, RM Rangayyan, LJ Hahn, and R Kloiber. Prerecon-


struction restoration of myocardial single photon emission tomography images. IEEE
Trans. Med. Imaging, 11(3):336–341, 1992.

[Bouman and Sauer 1993] CA Bouman and K Sauer. A generalized gaussian image model
for edge-preserving MAP estimation. IEEE Trans. Med. Imaging, 2:296–310, 1993.

[Bouman and Sauer 1996] CA Bouman and K Sauer. A unified approach to statistical
tomography using coordinate descent optimization. IEEE Trans. Med. Imaging, MI-
5(3):480–492, 1996.

[Bowsher et al. 1996] JE Bowsher, VE Johnson, TE Turkington, RJ Jaszczak, CE Floyd,


and RE Coleman. Bayesian reconstruction and use of anatomical a priori information
for emission tomography. IEEE Trans. Med. Imaging, 99:99–99, 1996.

[Bresenham 1965] JE Bresenham. Algorithm for computer control of a digital plotter.


IBM Systems Journal, 4(1):25–30, 1965.

[Budinger et al. 1979] TF Budinger, GT Gullberg, and RH Huesman. Image reconstruc-


tion from projections: Implementation and applications. pages 147–246. Springer Ver-
lag, 1979.

[Byrne 1993] CL Byrne. Iterative image reconstruction algorithms based on cross-entropy


minimization. volume 2, pages 96–103, 1993.

[Casey et al. 1996] ME Casey, L Eriksson, M Schmand, MS Andreaco, M Dahlbom,


R Nutt, and M Paulus. Investigation of LSO crystals for high spatial resolution positron
emission tomography. IEEE Trans. Nuc. Sci., 44(3):1109–1113, 1996.

[Censor 1983] Y Censor. Finite series-expansion reconstruction methods. Proc. of the


IEEE, 71:409–419, 1983.

[Chawluk et al. 1987] JB Chawluk, A Alavi, R Dann, HI Hurtig, S Bais, MJ Kushner,


RA Zimmerman, and M Reivich. Positron emission tomography in aging and dementia:
Effect of cerebral atrophy. Journal of Nuclear Medicine, 28:431–437, 1987.

[Cocosco et al. 1997] CA Cocosco, V Kollokian, RK-S Kwan, and AC Evans. Brainweb:
Online interface to a 3D MRI simulated brain database. In Proceedings of 3rd Interna-
tional Conference on Functional Mapping of the Human Brain, Copenhagen, 1997.

[Comtat et al. 1998] C Comtat, PE Kinahan, M Defrise, C Michel, and DW Townsend.


Fast reconstruction of 3D PET data with accurate statistical modeling. IEEE Trans.
Nuc. Sci., 45(3):1083–1089, 1998.

[Condon et al. 1986] BR Condon, J Patterson, D Wyper, and et al. A quantitative index
of ventricular and extraventricular intracranial csf volumes using MR imaging. Journal
of Comp. Assis. Tomog., 10:784–792, 1986.

[Daube-Witherspoon and Carson 1991] ME Daube-Witherspoon and RE Carson. Unified


deadtime correction model for PET. IEEE Trans. Med. Imag., MI-10:267–275, 1991.

[de Jonge and Blobkland 1999] FAA de Jonge and AK Blobkland. Statistical tomographic
reconstruction: How many more iterations to go? European Journal of Nuclear
Medicine, 26(10):1247–1250, 1999.

[Deans 1983] SR Deans. The radon transform and some of its applications. 1983.

[Defrise et al. 1997] M Defrise, PE Kinahan, DW Townsend, C Michel, M Sibomana, and


D Newport. Exact and approximate rebinning algorithms for 3D PET. IEEE Trans.
Med. Imag., MI-16:145–158, 1997.

[Dempster et al. 1977] AP Dempster, NM Laird, and DB Rubin. Maximum likelihood


from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B,
39:1–38, 1977.

[DePierro 1991] AR DePierro. A modified expectation maximization algorithm for penal-


ized likelihood estimation in emission tomography. Technical Report 65, Instituto de
Matemática, Universidade Estadual de Campinas, 1991.

[DePierro 1995] AR DePierro. A modified expectation maximization algorithm for penal-


ized likelihood estimation in emission tomography. IEEE Trans. Med. Imag., 14(1):132–
137, 1995.

[Derenzo 1986] SE Derenzo. Mathematical removal of positron range blurring in high


resolution tomography. IEEE Trans. Nuc. Sci., 33:565–569, 1986.

[Emert 1998] F Emert. Monte-Carlo-Simulationen der Bildentstehung in der 3D-


Positronenemissionstomographie. PhD thesis, Mathematisch-Naturwissenschaftlichen
Fakultät, Rheinischen Friedrich-Wilhelms-Universität, 1998.

[Fessler and Karp 1997] PE Kinahan, JA Fessler, and JS Karp. Statistical image recon-
struction in PET with compensation for missing data. IEEE Trans. Nucl. Sci., NS-
44(4):1552–1563, 1997.

[Fessler and Ollinger 1996] JA Fessler and JM Ollinger. Signal processing pitfalls in
positron emission tomography. Technical Report 302, Communications and Signal Pro-
cessing Laboratory, University of Michigan, 1996. Available from http://www.eecs.
umich.edu/~fessler.

[Fessler and Rogers 1996] JA Fessler and WL Rogers. Resolution properties of regu-
larized image reconstruction methods. Technical Report 297, Communications and
Signal Processing Laboratory, University of Michigan, 1996. Available from http:
//www.eecs.umich.edu/~fessler.

[Fessler et al. 1996] JA Fessler, EP Ficaro, NH Clinthorne, and K Lange. Grouped-


coordinate ascent algorithms for penalized-likelihood transmission image reconstruction.
IEEE Trans. Med. Imaging, 16(2):166–175, 1996.

[Fessler 1994] JA Fessler. Penalized weighted least-squares image reconstruction for


positron emission tomography. IEEE Trans. Med. Imaging, 13:290–300, 1994.

[Fessler 1997] JA Fessler. Aspire 3.0 user’s guide: A sparse iterative reconstruction library.
Technical Report 293, Communications and Signal Processing Laboratory, University
of Michigan, 1997. Available from http://www.eecs.umich.edu/.

[Fox et al. 1985] PT Fox, JS Perlmutter, and ME Raichle. A stereotactic method of


anatomical localization for positron emission tomography. Journal of Computed As-
sisted Tomography, 9:141–153, 1985.

[Friston et al. 1995] KJ Friston, J Ashburner, J-B Poline, CD Frith, JD Heather, and RSJ
Frackowiak. Spatial registration and normalization of images. Human Brain Mapping,
2:165–189, 1995.

[Frost et al. 1995] JJ Frost, CC Meltzer, and et al. MR-based correction of brain PET
measurements for heterogeneous gray matter radioactivity distribution. Neuroimage,
2:32–32, 1995.

[Furuie et al. 1994] SS Furuie, GT Herman, TK Narayan, JS Karp, RM Lewitt, and


S Matej. A methodology for testing for statistically significant differences between
fully 3D pet reconstruction algorithms. Phys. Med. Biol., 39:341–354, 1994.

[Geman and Geman 1984] S Geman and D Geman. Stochastic relaxation, gibbs distribu-
tions, and the bayesian restoration of images. IEEE Trans Patt. Anal. Mach. Intel.,
6(6):721–741, 1984.

[Geman and McClure 1985] S Geman and D McClure. Bayesian image analysis: An appli-
cation to single photon emission tomography. In Proceedings of the Statistical Computing
Section, pages 12–18. American Statistical Association, 1985.

[Geman and McClure 1987] S Geman and D McClure. Statistical models for tomographic
image reconstruction. Bull. Int. Statist. Inst., 52:5–21, 1987.

[Geman et al. 1993] S Geman, KM Manbeck, and DE McClure. A comprehensive sta-


tistical model for single-photon emission tomography. In R Chellapa and A Jain, edi-
tors, Markov Random Fields: Theory and Applications, pages 93–136. Academic Press,
Boston, 1993.

[George et al. 1988] AE George, LA Stylopoulus, MJ de Leon, H Rusinek, M Mourino,


and H Kowalski. MR imaging quantification of selective temporal lobe gray matter loss
in alzheimer disease. Radiology, 169:264–271, 1988.

[Gindi et al. 1991] G Gindi, M Lee, A Rangarajan, and IG Zubal. Bayesian reconstruction
of functional images using registered anatomical images as priors. In Colchester ACF
and Hawkes D, editors, Information Processing in Medical Imaging, pages 121–130.
Springer Verlag, 1991.

[Gopal and Hebert 1994] SS Gopal and TJ Hebert. Pre-reconstruction restoration of spect
projection images with a neural network. IEEE Trans. Nuc. Sci., 41(4):1620–1625, 1994.

[Gordon 1974] R Gordon. A tutorial on ART (algebraic reconstruction techniques). IEEE


Trans. Nucl. Sci., NS-21:78–82, 1974.

[Green 1990] PJ Green. Bayesian reconstruction from emission tomography data using a
modified EM algorithm. IEEE Trans. Med. Imag., 9:84–93, 1990.
[Grenander 1978] U Grenander. Abstract Inference. John Wiley and Sons, New York,
1978.
[Gustafson and Kessel 1979] DE Gustafson and W Kessel. Fuzzy clustering with a fuzzy
covariance matrix. IEEE CDC, 2:761–770, 1979.
[Guyon and Künsch 1996] X Guyon and HR Künsch. Asymptotic Comparison of Estima-
tors in the Ising Model, pages 177–198. Springer, Berlin, 1996.
[Hanson and Wecksung 1985] KM Hanson and GW Wecksung. Local basis-function ap-
proach to computed tomography. Applied Optics, 24(23):4028–4039, 1985.
[Hebert and Leahy 1989] T Hebert and R Leahy. A generalized EM algorithm for the 3D
bayesian reconstruction from poisson data using gibbs priors. IEEE Trans. Med. Imag.,
8(2):194–202, 1989.
[Hebert 1990] T Hebert. Statistical stopping criteria for iterative maximum likelihood
reconstruction. Phys. Med. Biol., 35:1221–1232, 1990.
[Herman et al. 1973] GT Herman, A Lent, and SW Rowland. Art: Mathematics and
applications; a report on the mathematical foundations and on the applicability to real
data of the algebraic reconstruction techniques. Journal Theor. Biol., 42:1–32, 1973.
[Herman et al. 1982] GT Herman, RA Robb, JE Gray, RM Lewitt, RA Reynolds,
B Smith, H Tuy, DP Hanson, and CM Katz. Reconstruction algorithms for dose reduc-
tion in x-ray computed tomography. In Proceedings MEDCOMP ’82, pages 448–455.
IEEE Computer Society, 1982.
[Herman et al. 1984] GT Herman, H Levkowitz, HK Tuy, and S McCormick. Multilevel
image reconstruction. In Rosenfeld A, editor, Multiresolution Image Processing and
Analysis, pages 121–135, Berlin, 1984. Springer-Verlag.
[Hero et al. 1998] AO Hero, R Piramuthu, JA Fessler, and SR Titus. Theory and imple-
mentation of minimax ECT image reconstruction with MRI side information. Technical
Report 312, Communications and Signal Processing Laboratory, University of Michigan,
1998. Available from http://www.eecs.umich.edu/.
[Herscovitch et al. 1986] P Herscovitch, AP Auchus, M Gado, D Chi, and MD Raichle.
Correction for positron emission tomography data for cerebral atrophy. Journal for
Cerebral Blood Flow and Metabolism, 6:120–124, 1986.
[Hoffman and Phelps 1986] EJ Hoffman and ME Phelps. Positron emission tomography:
Principles and quantitation. In M Phelps, J Mazziotta, and H Schelbert, editors,
Positron Emission Tomography and Autoradiography: Principles and Applications for
the Brain and Heart. Raven Press, New York, 1986.
[Hoffman et al. 1981] EJ Hoffman, S-C Huang, ME Phelps, and DE Kuhl. Quantitation
in positron emission tomography: 4. effect of accidental coincidences. Journ. Comp.
Assist. Tomogr., 5:391–400, 1981.

[Hoffman et al. 1982] EJ Hoffman, S-C Huang, D Plummer, and ME Phelps. Quantitation
in positron emission tomography: 6. the effect of nonuniform resolution. Journ. Comp.
Assist. Tomogr., 6(5):987–999, 1982.

[Huang et al. 1979] S-C Huang, EJ Hoffman, ME Phelps, and DE Kuhl. Quantitation in
positron emission tomography: 2. effects of inaccurate attenuation correction. Journ.
Comp. Assist. Tomogr., 3:804–814, 1979.

[Hudson and Larkin 1994] HM Hudson and RS Larkin. Accelerated image reconstruction
using ordered subsets of projection data. IEEE Trans. Med. Imag., 13:601–609, 1994.

[Hurwitz 1975] H Hurwitz. Entropy reduction in bayesian analysis of measurements. Phys.


Rev., A,, 12:698–706, 1975.

[Hutton et al. 1997] BF Hutton, HM Hudson, and FJ Beekman. A clinical perspective of


accelerated statistical reconstruction. European Journal of Nuclear Medicine, 24:797–
808, 1997.

[Ingvar et al. 1965] DH Ingvar, S Cronovist, and et al. Acta. Neural Scand., 41(12):72–78,
1965.

[Jacobs et al. 1998] F Jacobs, S Matej, and R Lewitt. Image reconstruction techniques
for PET. Technical Report R9810, University of Ghent, 1998. Available at http:
//petultra.rug.ac.be/~jacobs.

[Jain 1989] AK Jain. Fundamentals of Digital Image Processing. Prentice Hall, Baltimore
and London, second edition, 1989.

[Johnson 1994] VE Johnson. A note on stopping rules in EM-ML reconstruction of ECT


images. IEEE Trans. Med. Imag., 13:569–571, 1994.

[Kaczmarz 1937] S Kaczmarz. Angenäherte auflösung von systemen linearer gleichungen.


Bull. Acad. Polon. Sci. Lett. A., 35:355–357, 1937.

[Kak and Slaney 1988] AC Kak and M Slaney. Principles of Computerized Tomography.
IEEE Engineering in Medicine and Biology Society, 1988.

[Kameyama et al. 1979] M Kameyama, C Wasterlain, R Ackermann, D Finch, J Lear,


and D Kuhl. Neuronal responses of the hippocampal formation to injury: Blood flow,
glucose metabolism, and protein synthesis. Exp. Neurol., 79:329–346, 1979.

[Kaufman 1987] L Kaufman. Implementing and accelerating the EM algorithm for PET.
IEEE Trans. Med. Imag., 6(1):37–51, 1987.

[Kaufman 1993] L Kaufman. Maximum likelihood, least squares, and penalized least
squares for PET. IEEE Trans. Med. Imag., 12(2):200–214, 1993.

[Kehren et al. 1999] F Kehren, T Schmitz, H Herzog, and H Halling. Implementation


aspects of a fully iterative, parallel 3d-reconstruction-software for PET data. In 1999
International Meeting on Fully Three-Dimensional Image Reconstruction in Radiology
and Nuclear Medicine, pages 96–99, Egmond aan Zee, Netherlands, 1999.

[Kiebel et al. 1997] SJ Kiebel, J Ashburner, J-B Poline, and KJ Friston. A crossvalidation
of SPM and AIR. Neuroimage, 5:271–279, 1997.
[King et al. 1981] PH King, K Hubner, W Gibbs, and E Holloway. Noise identification
and removal in positron imaging systems. IEEE Trans. Nucl. Sci., NS-28(1):148–151,
1981.
[King et al. 1983] MA King, PW Doherty, and RB Schwinger. A wiener filter for nuclear
medicine images. Medical Physics, 10(6):876–880, 1983.
[King et al. 1984] MA King, RB Schwinger, PW Doherty, and BC Penney. Two-
dimensional filtering of SPECT images using metz and wiener filters. Journal of Nuclear
Medicine, 25:1234–1240, 1984.
[Knorr et al. 1993] U Knorr, Y Huang, G Schlaug, RJ Seitz, and H Steinmetz. High
resolution PET images through REDISTRIBUTION. In Lemke et al., editor, Comp.
Assis. Radiology, Berlin, 1993. Springer Verlag.
[Kosugi et al. 1995] Y Kosugi, M Sase, Y Suganimi, and J Nishikawa. Dissolution of
PVE in positron emission tomography by an inversion recovery technique with the MR-
embedded neural network model. Neuroimage, 2:S:35, 1995.
[Kullback 1969] S Kullback. Information Theory and Statistics. Dover, New York, 1969.

[Labbé et al. 1997] C Labbé, M Koepp, J Ashburner, T Spinks, M Richardson, J Dun-


can, and V Cunningham. Absolute PET quantification with correction for partial vol-
ume effects within cerebral structures. In RE Carson, ME Daube-Witherspoon, and
P Herscovitch, editors, Quantitative Functional Brain Imaging with Positron Emission
Tomography, pages 59–66. Academic Press, 1997.
[Lange and Carson 1984] K Lange and RE Carson. EM reconstruction algorithms for
emission and transmission tomography. Journal of Computed Tomography, 8(2):306–
316, 1984.
[Lange et al. 1987] K Lange, M Bahn, and L Roderick. A theoretical study of some max-
imum likelihood algorithms for emission and transmission tomography. IEEE. Trans.
Med. Imag., 6(2):106–114, 1987.
[Leahy and Qi 1998] RM Leahy and J Qi. Statistical approaches in quantitative positron
emission tomography. Technical report, University of Southern California, 1998. Avail-
able at http://sipi.usc.edu/~jqi/.
[Leahy and Yan 1991] R Leahy and X Yan. Incorporation of anatomical MR data for
improved functional imaging with PET. In ACF Colchester and D Hawkes, editors,
Information Processing in Medical Imaging, pages 105–120. Springer Verlag, 1991.
[Lent 1976] A Lent. A convergent algorithm for maximum entropy image restoration with
a medical x-ray application. In Shaw R, editor, Image Analysis and Evaluation, pages
249–257. Society of Photographic Scientists and Engineers, Washington DC, 1976.
[Leo 1987] WR Leo. Techniques for Nuclear and Particle Physics Experiments (A How
To Approach). Springer Verlag, 1987.

[Levitan and Herman 1987] E Levitan and GT Herman. A maximum a posteriori proba-
bility expectation maximization algorithm for image reconstruction in emission tomog-
raphy. IEEE Trans. Med. Imag., 6:185–192, 1987.
[Lewitt 1990] RM Lewitt. Multidimensional digital image representations using Kaiser-
Bessel window functions. J. Opt. Soc. Am., 7:1834–1846, 1990.
[Lewitt 1992] RM Lewitt. Alternatives to voxels for image representation in iterative
reconstruction algorithms. Phys. Med. Biol., 37(3):705–716, 1992.
[Liang and Hart 1987] Z Liang and H Hart. Bayesian image processing of data from
constrained source distributions; i: Non-valued, uncorrelated and correlated constraints.
Bull. Math. Bio., 49:51–74, 1987.
[Liang et al. 1989] Z Liang, R Jaszczak, and K Greer. On Bayesian image reconstruction
from projections: Uniform and nonuniform a priori source information. IEEE Trans.
Med. Imag., 8:227–235, 1989.
[Links et al. 1990] JM Links, JK Zubieta, CC Meltzer, MJ Stumpf, and JJ Frost. Influ-
ence of spatially heterogeneous background activity on hot object quantitation in brain
emission tomography. Journ. Comp. Assist. Tomogr., 20(4):680–687, 1990.
[Liow and Strother 1992] J-S Liow and SC Strother. The quantitative performance of the
maximum likelihood based reconstruction with image wide resolution convergence. Jour-
nal of Nuclear Medicine, 33(5):871–876, 1992.
[Liow and Strother 1993] J-S Liow and SC Strother. The convergence of object dependent
resolution in maximum likelihood based tomographic image reconstruction. Physics in
Medicine and Biology, 38:55–70, 1993.
[Liow et al. 1997] J-S Liow, SC Strother, K Rehm, and DA Rottenberg. Improved resolu-
tion for PET volume imaging through three-dimensional image reconstruction. Journal
of Nuclear Medicine, 38(10):1623–1631, 1997.
[Lipinski et al. 1997] B Lipinski, H Herzog, E Kops, W Oberschelp, and HW Müller-
Gärtner. Expectation maximization reconstruction of positron emission tomography
images using anatomical magnetic resonance information. IEEE Trans on Med. Imag.,
16(2):129–136, 1997.
[Lipinski 1996] B Lipinski. Rekonstruktion von positronen-emissions-tomographischen
Bildern unter Einbeziehung anatomischer Information. PhD thesis, Der Mathematisch-
Naturwissenschaftlichen Fakultät, Rheinischen-Westfälischen Technischen Hochschule
Aachen, 1996.
[Llacer and Veklerov 1997] J Llacer and E Veklerov. Feasible images and practical stop-
ping rules for iterative algorithms in emission tomography. IEEE Trans on Med. Imag.,
8:186–193, 1997.
[Llacer et al. 1991] J Llacer, E Veklerov, and J Nuñez. Preliminary examination of the use
of case specific medical information as prior in Bayesian reconstruction. In Colchester
ACF and Hawkes D, editors, Information Processing in Medical Imaging, pages 81–93.
Springer Verlag, 1991.

[Luenberger 1973] DG Luenberger. Linear and Nonlinear Programming. Addison-Wesley,


second edition, 1973.

[Ma et al. 1993] Y Ma, M Kamber, and AC Evans. 3D simulation of PET brain images
using segmented MRI data and positron tomograph characteristics. Comp. Med. Imag.
and Graph., 17(4):365–371, 1993.

[Macovski 1983] A Macovski. Medical Imaging Systems. Prentice-Hall, New Jersey, 1983.

[Maeland 1988] E Maeland. On the comparison of interpolation methods. IEEE Trans.


Med. Imag., 7(3):213–217, 1988.

[Mallat 1989] SG Mallat. A theory for the multiresolution signal decomposition: The
wavelet representation. IEEE Trans. Pattern Anal. Machine Intell., 11(7):674–693, 1989.

[Matej and Lewitt 1996] S Matej and RM Lewitt. Practical considerations for 3-D image
reconstruction using spherically symmetric volume elements. IEEE Transactions on
Medical Imaging, 15(1):68–78, 1996.

[Matej et al. 1994] S Matej, GT Herman, TK Narayan, SS Furuie, RM Lewitt, and


PE Kinahan. Evaluation of task-oriented performance of several fully 3D PET re-
construction algorithms. Phys. Med. Biol., 39:355–367, 1994.

[McLachlan and Krishnan 1997] GJ McLachlan and T Krishnan. The EM Algorithm and
Extensions. John Wiley and Sons, 1997.

[Meltzer et al. 1990] CC Meltzer, JP Leal, HS Mayberg, HN Wagner, and JJ Frost. Cor-
rection of PET data for partial volume effects in human cerebral cortex by MR imaging.
Journ. Comp. Assist. Tomogr., 14(4):561–570, 1990.

[Meltzer et al. 1996] CC Meltzer, JK Zubieta, JM Links, P Brakeman, MJ Stumpf, and


JJ Frost. MR-based correction of brain PET measurements for heterogeneous gray matter
radioactivity distribution. Journal of Cerebral Blood Flow and Metabolism, 16:650–658,
1996.

[Michel et al. 1998] C Michel, M Sibomana, M Lonneux, M Defrise, P Kinahan,


D Townsend, D Newport, and P Luk. FORE and OSEM: a practical solution for iter-
ative reconstruction in a clinical environment. Technical report, Université Catholique
de Louvain, 1998. Available at http://www.topo.ucl.ac.be/.

[Miller and Wallis 1992] TR Miller and JW Wallis. Clinically important characteristics
of maximum likelihood reconstruction. Journal of Nuclear Medicine, 33(9):1678–1684,
1992.

[Mintun and Lee 1990] MA Mintun and KS Lee. Mathematical realignment of paired PET
images to enable pixel-by-pixel subtraction. Journal of Nuclear Medicine, 31:186–191,
1990.

[Mühllehner and Colsher 1982] G Mühllehner and JG Colsher. Instrumentation in Com-
puted Emission Tomography, 99, 1982.

[Müller-Gärtner et al. 1992] HW Müller-Gärtner, JM Links, JL Prince, RN Bryan,


E McVeigh, JP Leal, C Davatzikos, and JJ Frost. Measurement of radiotracer concen-
tration in brain gray matter using positron emission tomography: MRI-based correction
for partial volume effects. Journal of Cerebral Blood Flow and Metabolism, 12:571–583,
1992.

[Mumcuoğlu et al. 1994] EÜ Mumcuoğlu, RM Leahy, SR Cherry, and Z Zhou. Fast
gradient-based methods for Bayesian reconstruction of transmission and emission PET
images. IEEE Trans. Med. Imag., 13(4):687–701, 1994.

[Mumcuoğlu et al. 1996] EÜ Mumcuoğlu, RM Leahy, and SR Cherry. Bayesian recon-
struction of PET images: Methodology and performance analysis. Phys. Med. Biol.,
41:1777–1807, 1996.

[Mørch et al. 1997] N Mørch, LK Hansen, SC Strother, C Svarer, DA Rottenberg,
B Lautrup, R Savoy, and OB Paulson. Nonlinear versus linear models in functional
neuroimaging: Learning curves and generalization crossover. In James Duncan and
Gene Gindi, editors, Information Processing in Medical Imaging, Chapel Hill, North
Carolina, 1997. Springer Verlag.

[Narasimha and Peterson 1978] MJ Narasimha and AM Peterson. On the computation of


the discrete cosine transformation. IEEE Trans. Commun., 26(6):924–936, 1978.

[NAS 1996] NAS. Mathematics and Physics of Emerging Biomedical Imaging. National
Academy of Sciences, 1996. Available at http://www.nap.edu/readingroom/books/
biomedical/.

[Neal 1993] RM Neal. Probabilistic inference using Markov chain Monte Carlo methods.
Technical Report CRG-TR-93-1, University of Toronto, 1993. Available at http://
www.cs.utoronto.ca/~radford/publications.html.

[Nunez and Llacer 1990] J Nunez and J Llacer. A fast Bayesian reconstruction algorithm
for emission tomography with entropy prior converging to feasible images. IEEE Trans.
Med. Imag., 9:159–171, 1990.

[Oakley et al. 1999a] J Oakley, J Missimer, and G Székely. MRI-based correction and re-
construction of PET images. In ER Hancock and M Pelillo, editors, Energy Minimiza-
tion Methods in Pattern Recognition and Computer Vision, pages 301–316. Springer,
1999.

[Oakley et al. 1999b] J Oakley, J Missimer, and G Székely. Multilevel Bayesian image
reconstruction and better priors for positron emission tomography. In 1999 International
Meeting on Fully Three-Dimensional Image Reconstruction in Radiology and Nuclear
Medicine, Egmond aan Zee, Netherlands, 1999.

[Older and Johns 1993] JK Older and PC Johns. Matrix formulation of computed tomo-
gram reconstruction. Phys. Med. Biol., 38:1051–1064, 1993.

[Ollinger and Fessler 1997] JM Ollinger and JA Fessler. Positron emission tomography.
IEEE Signal Processing Magazine, pages 43–55, January 1997.

[Ollinger 1994] JM Ollinger. Maximum likelihood reconstruction of transmission images


in emission computed tomography via the expectation maximization algorithm. IEEE
Trans. Med. Imag., 13(1):89–101, 1994.

[Orr 1996] MJL Orr. Introduction to radial basis function networks. Technical report,
Centre for Cognitive Science, University of Edinburgh, 1996. Available from http:
//www.anc.ed.ac.uk/~mjo/intro/intro.html.

[Ouyang et al. 1994] X Ouyang, WH Wong, and VE Johnson. Incorporation of correlated


structural images in PET image reconstruction. IEEE Trans. Med. Imag., 13(2):627–
640, 1994.

[Perona and Malik 1987] P Perona and J Malik. Scale-space and edge-detection using
anisotropic diffusion. IEEE Trans. Pattern Anal. Mach. Intell., 12(7):629–639, 1987.

[Pitas and Venetsanopoulos 1990] I Pitas and A Venetsanopoulos. Nonlinear Digital Fil-
ters: Principles and Applications. Kluwer Academic, New York, 1990.

[Politte and Snyder 1991] DG Politte and DL Snyder. Corrections for accidental coin-
cidences and attenuation in maximum-likelihood image reconstruction for positron-
emission tomography. IEEE Trans. Med. Imag., 10(1):82–89, 1991.

[Press et al. 1988] WH Press, BP Flannery, SA Teukolsky, and WT Vetterling. Numerical


Recipes in C. Cambridge University Press, 1988.

[Qi et al. 1997] J Qi, RM Leahy, EÜ Mumcuoğlu, SR Cherry, A Chatziioannou, and
TH Farquhar. High resolution 3D bayesian image reconstruction for microPET. 1997
International Meeting in Fully 3D Image Reconstruction, 1997.

[Radon 1917] J Radon. Über die Bestimmung von Funktionen durch ihre Integralwerte
längs gewisser Mannigfaltigkeiten. Ber. Ver. Sächs. Akad. Wiss. Leipzig, 69:262–277,
1917.

[Raheja et al. 1999] A Raheja, T Doniere, and AP Dhawan. Multiresolution expectation


maximization reconstruction algorithm for positron emission tomography using wavelet
processing. IEEE Trans. Nuc. Sci, 46(3), 1999.

[Ranganath et al. 1988] MV Ranganath, AP Dhawan, and N Mullani. A multigrid expec-


tation maximization reconstruction algorithm for positron emission tomography. IEEE
Trans. Med. Imag., 7(4):273–278, 1988.

[Rehrauer et al. 1998] H Rehrauer, K Seidel, and M Datcu. Multiscale image segmentation
with a dynamic label tree. In IGARSS’98, pages 1–2. Kingston-Upon-Thames, 1998.

[Ripley 1996] BD Ripley. Pattern Recognition and Neural Networks. Cambridge University
Press, Cambridge, 1996.

[Rousset et al. 1993a] OG Rousset, Y Ma, M Kamber, and AC Evans. Simulations of


radiotracer uptake in deep nuclei of human brain. Journal of Comp. Med. Imag. and
Graph., 17(4):373–379, 1993.

[Rousset et al. 1993b] OG Rousset, Y Ma, GC Léger, AH Gjedde, and AC Evans. Cor-
rection for partial volume effects in PET using MRI-based 3D simulations of individual
human brain metabolism. In Quantification of Brain Function, pages 113–126. Elsevier,
1993.

[Rousset et al. 1995] OG Rousset, Y Ma, S Marenco, DF Wong, and AC Evans. In vivo
correction for partial volume accuracy and precision. Neuroimage, 2:33–33, 1995.

[Rowe and Dai 1992] RW Rowe and S Dai. A pseudo-Poisson noise model for simulation
of positron emission tomographic data. Med. Phys., 19:1113–1119, 1992.

[Sastry and Carson 1997] S Sastry and RE Carson. Multimodality Bayesian algorithm for
image reconstruction in positron emission tomography: A tissue composition model.
IEEE Trans. Med. Imag., 16(6):750–761, 1997.

[Sauer and Bouman 1993] K Sauer and CA Bouman. A local update strategy for iterative
reconstruction from projections. IEEE Trans. Signal Processing, 41(2):534–548, 1993.

[Scales and Smith 1999] JA Scales and M Smith. Introductory Geophysical Inverse Theory
(Draft). Samizdat Press, 1999. Available at http://www.landau.mines.edu/~jscales.

[Schafer 1997] JL Schafer. Analysis of Incomplete Multivariate Data. Number 72 in Mono-


graphs on Statistics and Applied Probability. Chapman and Hall, 1997.

[Schmidlin et al. 1994] P Schmidlin, ME Bellmann, and J Doll. Iterative PET-


Bildrekonstruktion: Einfluss der Projektionsmethode auf das Rauschen. Nuklearmedizin,
33(2):A78, 1994.

[Seret 1998] A Seret. Median root prior and ordered subsets in Bayesian image reconstruc-
tion of single-photon emission tomography. Eur. J. Nucl. Med., (25):215–219, 1998.

[Shepp and Vardi 1982] LA Shepp and Y Vardi. Maximum likelihood reconstruction in
positron emission tomography. IEEE Trans. Med. Imag., 1:113–122, 1982.

[Sivia 1996] DS Sivia. Data Analysis: A Bayesian Tutorial. Oxford Science Publications,
1996.

[Snyder et al. 1987] DL Snyder, MI Miller, LJ Thomas, and DG Politte. Noise and edge
artifacts in maximum likelihood reconstruction for emission tomography. IEEE Trans.
Med. Imag., 6:228–238, 1987.

[Sokoloff et al. 1977] L Sokoloff, M Reivich, et al. The [14C]deoxyglucose method for
the measurement of local cerebral glucose utilization: Theory, procedure, and normal
values in the conscious and anesthetized albino rat. Journal of Neurochemistry, 28:897–
916, 1977.

[Som et al. 1998] S Som, BF Hutton, and M Braun. Properties of minimum cross-entropy
reconstruction of emission tomography with anatomically based prior. IEEE Trans.
Nucl. Sci., 45(6):3014–3021, 1998.

[spm ] See http://www.fil.ion.ucl.ac.uk/spm.



[Stamos et al. 1988] JA Stamos, WL Rogers, NH Clinthorne, and KF Koral. Object-


dependent performance comparison of two iterative reconstruction methods. IEEE
Trans. Nuc. Sci., 31(1):611–614, 1988.

[Stayman and Fessler 1999] JW Stayman and JA Fessler. Penalty design for uniform spa-
tial resolution in 3D penalized-likelihood image reconstruction. In 1999 International
Meeting on Fully Three-Dimensional Image Reconstruction in Radiology and Nuclear
Medicine, Egmond aan Zee, Netherlands, 1999.

[Strang 1980] G Strang. Linear Algebra and Its Applications. Academic Press, second
edition, 1980.

[Suhail et al. 1998] SS Suhail, CA Bouman, and K Sauer. ML parameter estimation for
Markov random fields with application to Bayesian tomography. IEEE Trans. Imag.
Proc., 7(7):1029–1044, 1998.

[Tanaka 1987] E Tanaka. A filtered iterative reconstruction algorithm for positron emis-
sion tomography. In CN de Graaf and MA Viergever, editors, Information Processing
in Medical Imaging, pages 217–233, New York, 1987. Plenum Press.

[Todd-Pokropek 1983] A Todd-Pokropek. Image processing in nuclear medicine. IEEE


Trans. Nuc. Sci., 27(3):1080–1094, 1983.

[Toft 1996] P Toft. The Radon Transform: Theory and Implementation. PhD thesis,
Technical University of Denmark, Section for Digital Signal Processing, 1996. Available
at http://www.ei.dtu.dk/staff/ptoft/ptoft.html.

[Vardi et al. 1985] Y Vardi, LA Shepp, and L Kaufman. A statistical model for positron
emission tomography. J. Amer. Stat. Assoc., 80(389):8–37, 1985.

[Veklerov and Llacer 1987] E Veklerov and J Llacer. Stopping rule for the MLE algorithm
based on statistical hypothesis testing. IEEE Trans. Med. Imag., 6(4):313–319, 1987.

[Videen et al. 1988] TO Videen, JS Perlmutter, MA Mintun, and ME Raichle. Regional


correction of positron emission tomography for the effects of cerebral atrophy. Journal
of Cerebral Blood Flow and Metabolism, 8:662–670, 1988.

[Vollmar et al. 1999] S Vollmar, W Eschner, K Wienhard, and U Pietrzyk. Iterative recon-
struction of emission tomography data with a-priori information. In 1999 International
Meeting on Fully Three-Dimensional Image Reconstruction in Radiology and Nuclear
Medicine, Egmond aan Zee, Netherlands, 1999.

[Wang and Vagnucci 1981] D Wang and A Vagnucci. Gradient inverse weighted smoothing
scheme and the evaluation of its performance. Comp. Vision. Graph. Image Proc.,
15:167–181, 1981.

[Watson et al. 1995] CC Watson, D Newport, and ME Casey. A single scatter simulation
technique for scatter correction in 3D PET. In P Grangeat and J-L Amans, editors, Three-
Dimensional Image Reconstruction in Radiation and Nuclear Medicine, pages 255–268,
Netherlands, 1995. Kluwer.

[Wehrli 1988] F Wehrli. Principles of magnetic imaging. In DD Stark and WG Bradley,


editors, MRI, pages 3–23, St Louis, 1988. Mosby.
[Wilson and Tsui 1993] DW Wilson and BMW Tsui. Noise properties of filtered-
backprojection and ML-EM reconstructed emission tomographic images. IEEE Trans.
Nuc. Sci., 40(4):1198–1203, 1993.
[Wilson et al. 1994] DW Wilson, BMW Tsui, and HH Barrett. Noise properties of the
EM-algorithm: II. Monte Carlo simulations. Phys. Med. Biol., 39(5):847–871, 1994.
[Winkler 1994] G Winkler. Image Analysis, Random Fields and Dynamic Monte Carlo
Methods. A Mathematical Introduction. Springer, 1994.
[Worsley et al. 1992] KJ Worsley, AC Evans, S Marrett, and P Neelin. A three-
dimensional statistical analysis for CBF activation studies in human brain. Journal of
Cerebral Blood Flow and Metabolism, 12:900–918, 1992.
[Wu 1993] Z Wu. MAP image reconstruction using wavelet decomposition. In HH Barrett
and AF Gmitro, editors, Information Processing in Medical Imaging, pages 354–371,
Flagstaff, Arizona, 1993. Springer Verlag.
[Yang et al. 1996] J Yang, SC Huang, M Mega, KP Lin, AW Toga, GW Small, and
ME Phelps. Investigation of partial volume correction methods for brain FDG PET studies.
IEEE Trans. Nucl. Sci., 43(6):3322–3327, 1996.
[Yavuz and Fessler 1997] M Yavuz and JA Fessler. New statistical models for randoms-
precorrected PET scans. In James Duncan and Gene Gindi, editors, Information Pro-
cessing in Medical Imaging, Chapel Hill, North Carolina, 1997. Springer Verlag.
[Young et al. 1986] AB Young, KA Frey, and BW Agranoff. In vitro and in vivo. pages
73–111, 1986.
[Young 1971] DM Young. Iterative Solution of Large Linear Systems. Academic, New
York, 1971.
[Zou et al. 1996] L-H Zou, Z Wang, and LE Roemer. Maximum entropy multi-resolution
EM tomography by adaptive subdivision. In Maximum Entropy and Bayesian Methods,
pages 79–89, Dordrecht, 1996. Kluwer.
Acknowledgment

I will be forever grateful to my supervisors for showing such genuine concern for my
welfare. I have been lucky, and I would therefore like to express sincere thanks to Professor
Dr. Gábor Székely and Dr. John Missimer.
I was also extremely lucky to have been examined by Professor Dr. Hans Herzog. His
advice has been invaluable to this doctoral work, and I only wish contact had been made
at an earlier stage.
My thanks also go to the PET group in Jülich for making my visit such a pleasant
one. In particular I would like to thank Frank Kehren whose advice and help significantly
improved my implementation.
In Groningen, I would like to thank those who supplied me with the data used in this
thesis: Nico Leenders, Paul Maguire, Linda Meiners and Antoon Willemsen.
Closer to home, I thank the people whom I worked with over the past three years. This
includes the wonderful BIWI group, particularly Professor Dr. Luc Van Gool, Michael
Schröder and Michael Quicken. Of those at the PSI, I thank Antonio Barchetti, Chantal
Martin, Steffi Magyar, Albert Romann and Peter Vontobel.
Finally, thanks also to Dr. Simon Arridge, Professor Dr. David Barber, Professor
Dr. Peter Bösiger, Dr. Michel Defrise, Professor Dr. Edwin Hancock, Dr. Christian Michel,
Carlos Sorzano, and, of course, Harry Redknapp.
Curriculum Vitae

Name: Jonathan David Oakley


Date of birth: 13. September 1970
Place of birth: Kingston-upon-Thames, England
Citizenship: British
Marital status: Single

1980–1986 Tiffin Boys School


1986–1988 Kingston College of Further Education
1988–1990 Trainee Engineer, GEC Plessey Telecommunications Ltd.
1990–1994 Computer Science degree at the University of York, England
1994–1996 Graduate Studies at University College, London, England
1997–1999 Research scientist, Paul Scherrer Institute, Switzerland
Index

algebraic reconstruction, 1, 61
Algebraic Reconstruction Techniques (ART), 64, 150
alternative Gaussian (AG), 108
anisotropic filtering, 92, 141
apodizing window, 37
attenuation coefficient, 41
back-projection, 37
basis function, 40
basis functions, 20, 22
Bayes' theorem, 64, 98, 99
Bayesian prior, 61
Bessel functions, 43, 163
BGO, 37
bicubic interpolation, 122
blob, 33, 43, 54
blobs, 160
central limit theorem, 50, 84
checker board effect, 1, 50, 128, 157
coincidence detection, 37
compartment model, 9
complete data, 67, 75
convergence, 82, 121
cross entropy, 83, 88, 141
crystal scintillator, 39
deadtime correction, 39
deconvolution, 9
discrete cosine function (DCT), 22
divide-and-conquer, 117, 120, 139
energy conservation, 33, 75, 144
entropy, 111
evidence term, 64, 117
Expectation step (E-step), 67
Expectation-Maximization (EM) algorithm, 66
Expectation-Maximization-Expectation (EME), 125
Fast Fourier Transform (FFT), 22
field of view (FOV), 40, 48
filtered back-projection, 37
fixed-point, 73
forward model, 29
FWHM maps, 128
Gaussian field, 33, 97, 102, 111, 144
Gibbs distributions, 84, 99
hyperparameter, 94
ill-conditioned, 59, 82
ill-posed, 50, 82
incomplete data, 67, 75
indicator function, 40, 42
intensity normalisation, 20
inter-crystal penetration, 50
intrinsic resolution, 1
inverse filtering, 1
Iterated Conditional Modes (ICM), 97
Kuhn-Tucker conditions, 73, 144
Kullback-Leibler distance, 88
L2 norm, 61
Lagrange multipliers, 73, 144
least-squares, 63
likelihood function, 64
line integral, 37
Line Of Response (LOR), 37
line sites, 97
list-mode, 54
local noise estimates, 131
local regions of interest, 132
localisation, 11
LSO, 37
Markov random field (MRF), 97, 125
Maximisation step (M-step), 67
Maximum A Posteriori (MAP), 31
maximum likelihood estimator, 1
Maximum Likelihood Estimator (MLE), 63
MG, 16
Monte Carlo, 48
multilevel, 140
multiresolution, 120
nearest neighbour interpolation, 50
neural network, 17, 29, 33, 98
Nyquist, 30, 43
on-the-fly calculations, 60
One Step Late (OSL), 101
over-imposition step, 33, 54, 160
over-spill, 11, 13
overdetermined, 59, 61
Partial Volume Effect (PVE), 1, 111
partition function, 109, 117
penalisation, 61, 81
penalisation term, 33
penalised weighted least-squares, 84
PIXEL, 16
pixel redistribution, 10
pixel-by-pixel correction, 9
point spread function (PSF), 9
Poisson distribution, 39, 73, 84, 166
positivity constraints, 33, 66
positron range, 50
posterior probability, 64
prior distribution, 31
quantitative results, 1
Radon transform, 35
randoms, 39, 84
reconstructed resolution, 1
regional measurements, 11
regularisation, 61
resolution spread function, 50
ring difference, 37
scatter, 39
sieves, 83
signal-to-noise ratio (SNR), 50
sinc interpolation, 123
Singular Value Decomposition (SVD), 61
sinogram, 1, 36
source uncertainty, 50
sparse matrices, 59
spatial normalisation, 20
Statistical Parametric Mapping, 10
statistical reconstruction, 1, 35, 44
stochastic model, 40
successive over-relaxation (SOR), 84
superresolution, 24
survival probability, 40
system matrix, 40, 46
Taylor's Series, 20
under-spill, 11, 13
underdetermined, 59, 61
variant resolution, 127, 128
virtual modality, 1, 10, 20
wavelets, 87
X-Ray Computed Tomography (CT), 11
