
NETHERLANDS GEODETIC COMMISSION

PUBLICATIONS ON GEODESY

NEW SERIES

VOLUME 8

NUMBER 1

THE GEOMETRY OF
GEODETIC INVERSE LINEAR MAPPING
AND NON-LINEAR ADJUSTMENT

by

P. J. G. TEUNISSEN

1985
RIJKSCOMMISSIE VOOR GEODESIE, THIJSSEWEG 11, DELFT, THE NETHERLANDS

PRINTED BY W. D. MEINEMA B.V., DELFT, THE NETHERLANDS

SUMMARY

This publication discusses

1° The problem of inverse linear mapping
and
2° The problem of non-linear adjustment.

After the introduction, which contains a motivation of our emphasis on geometric thinking, we commence in chapter II with the theory of inverse linear mapping. Amongst other things we show that every inverse B of a given linear map A can be uniquely characterized through the choice of three linear subspaces, denoted by S, C and D.
Chapter III elaborates on the consequences of the inverse linear mapping problem for planar, ellipsoidal and three dimensional geodetic networks. For various situations we construct sets of base vectors for the nullspace Nu(A) of the design map. The chapter is concluded with a discussion on the problem of connecting geodetic networks. We discuss, under fairly general assumptions concerning the admitted degrees of freedom of the networks involved, three alternative methods of connection.
Chapter IV treats the problem of non-linear adjustment. After a general problem statement and a brief introduction into Riemannian geometry, we discuss the local convergence behaviour of Gauss' iteration method (GM). A differential geometric approach is used throughout.
For both one dimensional and higher dimensional curved manifolds we show that the local behaviour of GM is asymptotically linear. Important conclusions are further that the local convergence behaviour of GM, 1° is predominantly determined by the least-squares residual vector and the corresponding extrinsic curvature of the manifold, 2° is invariant against reparametrizations in case of asymptotic linear convergence, 3° is asymptotically quadratic in case either the least-squares residual vector or the normal field B vanishes, 4° is determined by the Christoffel symbols of the second kind in case of asymptotic quadratic convergence and 5° will practically not be affected by line search strategies if both the least-squares residual vector and extrinsic curvature are small enough.
Next we discuss some conditions which assure global convergence of GM.
Thereupon we show that for a particular class of manifolds, namely ruled surfaces, important simplifications of the non-linear least-squares adjustment problem can be obtained through dimensional reduction. Application of this idea made it possible to obtain an inversion-free solution of a non-linear variant of the classical two dimensional Helmert transformation. This non-linear variant has been called the Symmetric Helmert transformation. We also give an inversion-free solution of the two dimensional Symmetric Helmert transformation when a non-trivial rotational invariant covariance structure is pre-supposed. After this we generalize our results to three dimensions.


In the remaining sections of chapter IV we give some suggestions as to how to estimate the extrinsic curvatures in practice; we estimate the curvature of some simple 2-dimensional geodetic networks and we briefly discuss some of the consequences of non-linearity for the statistical treatment of an adjustment. Hereby it is also shown that the bias of the least-squares residual vector is determined by the mean curvature of the manifold and that the bias of the least-squares parameter estimator is determined by the trace of the Christoffel symbols of the second kind.
The chapter is concluded with a brief discussion of some problems which are still open for future research.

ACKNOWLEDGEMENTS

The author gratefully acknowledges the support received from the following organisations:

The Netherlands Geodetic Commission for granting travelling funds,
The Netherlands Organisation for the Advancement of Pure Research (Nederlandse Organisatie voor Zuiver-Wetenschappelijk Onderzoek, Z.W.O.) for awarding a research grant, and
The Geodetic Institute of the Stuttgart University (FRG) for the facilities offered during the author's stay in Stuttgart.
Finally, special thanks go to Miss Janna Blotwijk for the excellent job she did in typing and preparing the final version of this publication.

THE GEOMETRY OF GEODETIC INVERSE LINEAR MAPPING
AND NON-LINEAR ADJUSTMENT

CONTENTS

SUMMARY
ACKNOWLEDGEMENTS

I.   INTRODUCTION

II.  GEOMETRY OF INVERSE LINEAR MAPPING
     1. The Principles
     2. Arbitrary Inverses Uniquely Characterized
     3. Injective and Surjective Maps
     4. Arbitrary Systems of Linear Equations and Arbitrary Inverses
     5. Some Common Type of Inverses and their Relation to the Subspaces S, C and D
     6. C- and S-Transformations

III. 1. Introduction
     2. Geodetic Networks and their Degrees of Freedom
        2.1. Planar networks
        2.2. Ellipsoidal networks
        2.3. Three dimensional networks
     3. (Free) Networks and their Connection
        3.1. Types of networks considered
        3.2. Three alternatives

IV.  GEOMETRY OF NON-LINEAR ADJUSTMENT
     1. General Problem Statement
     2. A Brief Introduction into Riemannian Geometry
     3. Orthogonal Projection onto a Parametrized Space Curve
        3.1. Gauss' iteration method
        3.2. The Frenet frame
        3.3. The "Kissing" circle
        3.4. One dimensional Gauss- and Weingarten equations
        3.5. Local convergence behaviour of Gauss' iteration method
        3.6. Examples
        3.7. Conclusions
     4. Orthogonal Projection onto a Parametrized Submanifold
        4.1. Gauss' method
        4.2. The Gauss' equation
        4.3. The normal field B
        4.4. The local rate of convergence
        4.5. Global convergence
     5. Supplements and Examples
        5.1. The two dimensional Helmert transformation
        5.2. Orthogonal projection onto a ruled surface
        5.3. The two dimensional Symmetric Helmert transformation
        5.4. The two dimensional Symmetric Helmert transformation with a non-trivial rotational invariant covariance structure
        5.5. The three dimensional Helmert transformation and its symmetrical generalization
        5.6. The extrinsic curvatures estimated
        5.7. Some two dimensional networks
     6. Some Statistical Considerations
     7. Epilogue

REFERENCES

I. INTRODUCTION

This publication intends to give a contribution to the theory of geodetic adjustment. The two main topics discussed are

1° The problem of inverse linear mapping
and
2° The problem of non-linear adjustment.

In our discussion of these two problems there is a strong emphasis on geometric thinking as a means of visualizing and thereby improving our understanding of methods of adjustment. It is namely our belief that a geometric approach to adjustment renders a more general and simpler treatment of various aspects of adjustment theory possible. Thus it is possible to carry through quite rigorous trains of reasoning in geometrical terms without translating them into algebra. This gives a considerable economy both in thought and in communication of thought. It also enables us to recognize and understand more easily the basic notions and essential concepts involved. And most important, perhaps, is the fact that our geometrical imagery in two and three dimensions suggests results for more dimensions and offers us a powerful tool of inductive and creative reasoning. At the same time, when precise mathematical reasoning is required it will be carried out in terms of the theory of finite dimensional vector spaces. This theory may be regarded as a precise mathematical framework underlying the heuristic patterns of geometric thought.
In geodesy it is very common to use geometric reasoning. In fact, geodesy benefited considerably from the development of the study of differential geometry, which was begun very early in history. Practical tasks in cartography and geodesy caused and influenced the creation of the classical theory of surfaces (Gauss, 1827; Helmert, 1880). And differential geometry can now be said to constitute an essential part of the foundation of both mathematical and physical geodesy (Marussi, 1952; Hotine, 1969; Grafarend, 1973).
But it was not only in the development of geodetic models that geometry played such a pivotal rôle. Also in geodetic adjustment theory, adjustment was soon considered as a geometrical problem. Very early, (Tienstra, 1947; 1948; 1956) already advocated the use of the Ricci-calculus in adjustment theory. It permits a consistent geometrization of the adjustment of correlated observations. His approach was later followed by (Baarda, 1967a,b; 1969), (Kooimans, 1958) and many others.
More recently we witness a renewed interest in the geometrization of adjustment theory. See e.g. (Vanicek, 1979), (Eeg, 1982), (Meissl, 1982), (Blais, 1983) or (Blaha, 1984). The incentive to this renewed interest is probably due to the introduction into geodesy of the modern theory of Hilbert spaces with kernel functions (Krarup, 1969). As (Moritz, 1979) has put it rather plainly, this theory can be seen as an infinitely dimensional generalization of Tienstra's theory of correlated observations in its geometrical interpretation.

Probably the best motivation for taking a geometric standpoint in discussing adjustment problems in linear models is given by the following discussion, which emphasizes the geometric interplay between best linear unbiased estimation and least-squares estimation:

Let y be a random vector in the m-dimensional Euclidean space M with metric tensor (.,.)_M. We assume that y has an expected value ȳ, i.e., E{y} = ȳ, where E{.} is the mathematical expectation operator, and that y has a covariance map Q_y: M* → M, defined by

     (y*_1, Q_y y*_2) = E{ (y*_1, y − ȳ)(y*_2, y − ȳ) }    for all y*_1, y*_2 ∈ M* .

The linear vector space M* denotes the dual space of M and is defined as the set of all real-valued (homogeneous) linear functions defined on M, i.e., each y*: M → IR is a linear function. Instead of writing y*(y_1) we will use a more symmetric formulation, by considering y*(y_1) as a bilinear function in the two variables y* and y_1. This bilinear function is denoted by (.,.): M* × M → IR and is defined by

     (y*, y_1) = y*(y_1)    for all y* ∈ M*, y_1 ∈ M .

The function (.,.) is called the duality pairing of M* and M into IR.

We define a linear model as

     E{y} = ȳ ∈ N̄ ,

where N̄ is a linear manifold in M. A linear manifold can best be viewed as a translated subspace. We will assume that N̄ = {y_1} + U, where y_1 is a fixed vector of M and U is an n-dimensional proper subspace of M.
The problem of linear estimation can now be formulated as: given an observation y_s on the random vector y, its covariance map Q_y and the linear manifold N̄, estimate the position of ȳ in N̄ ⊂ M.
If we restrict ourselves to Best Linear Unbiased Estimation (BLUE), then the problem of linear estimation can be formulated dually as: given a y*_s ∈ M*, find α̂ ∈ IR and ŷ* ∈ M* such that the inhomogeneous linear function h(y) = α̂ + (ŷ*, y) is a BLUE's estimator of (y*_s, ȳ). The function h(y) is said to be a BLUE's estimator of (y*_s, ȳ) if,

1° h(y) is a linear unbiased estimator of (y*_s, ȳ), i.e.,

     E{h(y)} = (y*_s, ȳ)    for all ȳ ∈ {y_1} + U ,                      (1.4.1°)

and

2° h(y) is best, i.e.,

     Variance{h(y)} ≤ Variance{g(y)}                                     (1.4.2°)

for all linear unbiased estimators g(y) = a + (y*, y), a ∈ IR, y* ∈ M*, of (y*_s, ȳ).

From (1.4.1°) it follows that

     α̂ = (y*_s − ŷ*, y_1)    and    (y*_s − ŷ*, U) = 0 ,

i.e.

     ŷ* ∈ {y*_s} + U⁰ ,                                                  (1.5)

since the set of y* ∈ M* for which (y*, U) = 0 forms a subspace of M*. It is called the annihilator of U ⊂ M and is denoted by U⁰ ⊂ M*.
Since the variance of h(y) is given by

     Variance{h(y)} = (ŷ*, Q_y ŷ*) ,                                     (1.6)

it follows from (1.4.2°) with (1.6) that ŷ* must satisfy

     (ŷ*, Q_y ŷ*) ≤ (y*, Q_y y*)    for all y* ∈ {y*_s} + U⁰ .

If we now define the dual metric (.,.)_{M*} of M* by pulling the metric of M back by Q_y, i.e.,

     (y*_1, y*_2)_{M*} = (y*_1, Q_y y*_2)    for all y*_1, y*_2 ∈ M* ,

then ŷ* must satisfy

     (ŷ*, ŷ*)_{M*} ≤ (y*, y*)_{M*}    for all y* ∈ {y*_s} + U⁰ .         (1.7)

Geometrically this problem can be seen as the problem of finding that point ŷ* in {y*_s} + U⁰ which has least distance to the origin of M*. And it will be intuitively clear that ŷ* is found by orthogonally projecting y*_s onto the orthogonal complement (U⁰)⊥ of U⁰ (see figure 1).

figure 1

Now, before we characterize the map which maps y*_s into ŷ*, let us first present some generalities on linear maps.
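As a small numerical aside (our own coordinate sketch, not part of the text): with the standard duality pairing (y*, y) = y*ᵀy on IR^m, the annihilator U⁰ of a subspace U spanned by the columns of a matrix is simply the null space of the transposed matrix, and its dimension is m − dim U. The particular matrix below is an invented illustration.

```python
import numpy as np

# Illustrative subspace U in R^3 (our own numbers): columns span a 2-dim U.
U = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# The annihilator U0 = { y* | (y*, U) = 0 } is the null space of U^T;
# the SVD gives it as the right-singular vectors beyond the rank.
_, s, Vt = np.linalg.svd(U.T)
rank = int(np.sum(s > 1e-12))
U0 = Vt[rank:].T                     # columns form a basis of U0

print(U0.shape[1])                   # dim U0 = m - dim U = 3 - 2 = 1
print(np.allclose(U0.T @ U, 0))      # every covector in U0 annihilates U
```

The same construction, with the metric pulled back by Q_y, underlies the projection onto (U⁰)⊥ used in the BLUE problem above.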

Let N and M be two linear vector spaces of dimensions n and m respectively, and let A: N → M be a linear map between them. Then we define the image of U ⊂ N under A as

     AU = { y ∈ M | y = Ax  for some x ∈ U } .

The inverse image of V ⊂ M under A is defined as

     A⁻¹(V) = { x ∈ N | Ax ∈ V } .

In the special case that U = N, the image of U under A is called the range space R(A) of A. And the inverse image of {0} ⊂ M under A is called the nullspace Nu(A) of A. It is easily verified that if V and U are linear subspaces of M and N respectively, so are AU and A⁻¹(V).
A linear map A: N → M is injective or one-to-one if for every x_1, x_2 ∈ N, x_1 ≠ x_2 implies that A x_1 ≠ A x_2. The map A is surjective or onto if AN = M. And A is called bijective or a bijection if A is both injective and surjective.
With the linear map A: N → M and the dual vector (or 1-form) y* ∈ M*, it follows that the composition y* ∘ A is a linear function which maps N into IR, i.e. y* ∘ A ∈ N*. Since the map A assigns the 1-form y* ∘ A ∈ N* to each y* ∈ M*, we see that the map A induces another linear map, A* say, which maps M* into N*. This map A* is called the dual map to A and is defined as

     A* y* = y* ∘ A .

With the duality pairing it is easily verified that

     (A* y*, x) = (y*, A x)    for all y* ∈ M*, x ∈ N .

An important consequence of this bilinear identity is that for a non-empty inverse image of subspace V ⊂ M under A, we have the duality relation

     (A⁻¹(V))⁰ = A*(V⁰) .

Note that here the four concepts of image, inverse image, annihilation and duality come together in one formula. For the special case that V = {0} the relation reduces to Nu(A)⁰ = R(A*).
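In coordinates the dual map A* is represented by the transpose Aᵀ, so the relation Nu(A)⁰ = R(A*) is the familiar statement that the row space of a matrix annihilates exactly its null space. A quick check on an invented rank-deficient matrix (our own numbers, not from the text):

```python
import numpy as np

# Our own example: a 3x3 matrix of rank 2, so Nu(A) is 1-dimensional.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])

# Basis of Nu(A) from the SVD: right-singular vectors beyond the rank.
_, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-12))
N_basis = Vt[rank:].T                 # columns span Nu(A)

# Every element of R(A*) (every row of A) annihilates Nu(A):
print(np.allclose(A @ N_basis, 0))    # True
# and dim R(A*) + dim Nu(A) = n, as the duality relation requires:
print(rank + N_basis.shape[1])        # 3
```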

Maps that play an important role in linear estimation are the so-called projector maps. Assume that the subspaces U and V of N are complementary, i.e. N = U ⊕ V, with "⊕" denoting the direct sum. Then for each x ∈ N we have the unique decomposition

     x = x_1 + x_2 ,  with  x_1 ∈ U, x_2 ∈ V ,

and we can now define a linear map P: N → N through

     P x = x_1 .

This map is called the projector which projects onto U and along V. It is denoted by P_{U,V} (see figure 2).

figure 2

If P projects onto U and along V then I − P, with I the identity map, projects onto V and along U. Thus

     I − P_{U,V} = P_{V,U} .                                             (1.14)

For their images and inverse images we have

     P_{U,V} N = U    and    P_{U,V}⁻¹({0}) = V .                        (1.15)

It is easily verified that the dual P* of a projector P is again a projector operating on the dual space. For we have with (1.12) and (1.15):

     (P_{U,V} N)⁰ = U⁰ = Nu(P*_{U,V})    and    (P_{U,V}⁻¹({0}))⁰ = V⁰ = P*_{U,V} N* .

Thus,

     P*_{U,V} = P_{V⁰,U⁰}                                                (1.16)

and

     (I − P_{U,V})* = P*_{V,U} = P_{U⁰,V⁰} .                             (1.17)

Finally we mention that one can check whether a linear map is a projector, by verifying whether the iterated operator coincides with the operator itself (idempotence).
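The idempotence check is easy to carry out numerically. The following sketch (our own small example, with an invented splitting of IR³) builds P_{U,V} from a basis adapted to N = U ⊕ V and verifies P² = P, that I − P projects onto V along U, and the image/inverse-image relations (1.15).

```python
import numpy as np

# Our own splitting of R^3: a 1-dim U and a complementary 2-dim V.
U = np.array([[1.0], [0.0], [0.0]])
V = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])

B = np.hstack([U, V])                 # basis of N adapted to N = U (+) V
D = np.zeros((3, 3)); D[0, 0] = 1.0   # keep the U-coordinate, kill the V-part
P = B @ D @ np.linalg.inv(B)          # projector onto U and along V

I = np.eye(3)
print(np.allclose(P @ P, P))                      # idempotence: P o P = P
print(np.allclose((I - P) @ (I - P), I - P))      # I - P is also a projector
print(np.allclose(P @ U, U), np.allclose(P @ V, 0))  # onto U, along V
```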
Now let us return to the point where we left our BLUE's problem. We noted that ŷ* could be found by orthogonally projecting y*_s onto (U⁰)⊥. Hence, the projector map needed is the one which projects onto (U⁰)⊥ and along U⁰, i.e., P_{(U⁰)⊥,U⁰}. From (1.6) and (1.17) it follows then that the linear function h(y) is the unique BLUE's estimator of (y*_s, ȳ):

     h(y) = ( (I − P_{(U⁰)⊥,U⁰}) y*_s , y_1 ) + ( P_{(U⁰)⊥,U⁰} y*_s , y ) ,

where y_1 is an arbitrary element of N̄.
Application of the definition of the dual map gives

     h(y) = ( y*_s , (I − P*_{(U⁰)⊥,U⁰}) y_1 + P*_{(U⁰)⊥,U⁰} y ) ,

and since P*_{(U⁰)⊥,U⁰} = P_{U,U⊥} we get

     ŷ = y_1 + P_{U,U⊥} (y_s − y_1) ,

in which we recognize the least-squares estimate which solves the dual problem

     (y_s − ŷ, y_s − ŷ)_M ≤ (y_s − ȳ, y_s − ȳ)_M    for all ȳ ∈ {y_1} + U     (1.20)

(see figure 3).

figure 3

Thus we have recovered the existing duality between BLUE's estimation and least-squares estimation. We minimize a sum of squares (1.20) and emerge with an optimum estimator, namely one which minimizes another sum of squares (1.8), the variance. From the geometrical viewpoint this arises simply from the duality between the so-called observation space M and estimator space M*, established by the duality pairing (y*, y).
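The coincidence of the BLUE with the least-squares estimate can be checked numerically. In coordinates, with the metric of M given by W = Q_y⁻¹, the orthogonal projector onto U = R(A) and along U⊥ is P = A(AᵀWA)⁻¹AᵀW, and applying it to the observation reproduces the weighted least-squares solution. All numbers below (design matrix, covariance, observation) are invented for the sketch.

```python
import numpy as np

# Invented linear model: U = R(A) is a 2-dim subspace of M = R^4.
A  = np.array([[1.0, 0.0],
               [1.0, 1.0],
               [1.0, 2.0],
               [1.0, 3.0]])
Qy = np.diag([1.0, 2.0, 1.0, 0.5])          # covariance map (diagonal for simplicity)
W  = np.linalg.inv(Qy)                      # metric of the observation space
ys = np.array([0.1, 1.2, 1.9, 3.2])         # observation (invented)

# Orthogonal projector P_{U,U-perp} with respect to the metric W:
P = A @ np.linalg.inv(A.T @ W @ A) @ A.T @ W
y_hat = P @ ys                              # projection of ys onto U

# The same point via the normal equations of weighted least-squares:
x_hat = np.linalg.solve(A.T @ W @ A, A.T @ W @ ys)
print(np.allclose(y_hat, A @ x_hat))        # projection = least-squares estimate
print(np.allclose(P @ P, P))                # P is indeed a projector
print(np.allclose(A.T @ W @ (ys - y_hat), 0))  # residual is W-orthogonal to U
```

The last line is the geometric content of (1.20): the least-squares residual is orthogonal, in the Q_y⁻¹ metric, to the manifold onto which we project.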


The above given result is of course the well known Gauss-Markov theorem which probabilistically justifies least-squares estimation in case of linear models.
Observe that the above discussion shows another advantage of geometric reasoning, namely that the language of geometry embodies an element of invariance. That is, geometric reasoning avoids unnecessary reference to particular sets of coordinate axes. Concepts such as linear projections and linear manifolds for instance, may be visualized in a coordinate-free or invariant way. All results obtained by an invariant approach therefore necessarily apply to all possible representations of the linear manifold N̄. That is, one could define N̄ by a linear map A from the parameter space N into the observation space M (in Tienstra's terminology this would be "standard problem II") or implicitly by a set of linear constraints ("standard problem I"). Even a mixed representation is possible.
Consequently, in general we have that if a coordinate representation is needed one can take the one which seems to be the most appropriate. That is, the use of a convenient basis rather than a basis fixed at the outset is a good illustration of the fact that coordinate-free does not mean freedom from coordinates so much as it means freedom to choose the appropriate coordinates for the task at hand.
With respect to our first topic, note that a direct consequence of the coordinate-free formulation is that the difficulties are evaded which might possibly occur when a non-injective linear map A is used to specify the linear model. This indicates that the actual problem of inverse linear mapping should not be considered to constitute an essential part of the problem of adjustment. That is, in the context of BLUE's estimation it is insignificant which pre-image of ȳ under A is taken. This viewpoint seems, however, still not generally agreed upon. The usually merely algebraic approach taken often makes one omit to distinguish between the actual adjustment problem and the actual inverse mapping problem. As a consequence, published studies in the geodetic literature dealing with the theory of inverse linear mapping often bypass, in our view, the essential concepts involved. We have therefore tried to present an alternative approach; one that is based on the idea that once the causes of the general inverse mapping problem are classified, the problem of inverse linear mapping itself is also solved. Our approach starts from the identification of the basic subspaces involved and next shows that the problem of inverse linear mapping can be reduced to a few essentials.
As to our second topic, that of non-linear adjustment, note that the Gauss-Markov theorem formulates a lot of "ifs" before it states why least-squares should be used: if the mean ȳ lies in a linear manifold N̄, if the covariance map is known to be Q_y, if we are willing to confine ourselves to estimates that are unbiased in the mean and if we are willing to apply the quality criterium of minimum variance, then the best estimate is to be had by least-squares. These are a lot of "ifs" and it would be interesting to ask "and if not?". For all "ifs" this would become a complicated task indeed. But it will be clear that the first "if", which called for manifold N̄ to be linear, already breaks down in case of non-linear models. Furthermore, in non-linear models a restriction to linear estimators does not seem reasonable anymore, because any estimator of ȳ must be a mapping from M into N̄, which will be curved in general. Hence, strictly speaking the Gauss-Markov theorem does not apply anymore in the non-linear case. And consequently one might question whether the excessive use of the theorem in the geodetic literature for theoretical developments is justifiable in all cases.
Since almost all functional relations in our geodetic models are non-linear, one may be surprised to realize how little attention the complicated problem area of non-linear geodetic adjustment has received. One has used and is still predominantly using the ideas, concepts and results from the theory of linear estimation. Of course, one may argue that probably most non-linear models are only moderately non-linear and thus permit the use of a linear(ized) model. This is true. However, it does in no way release us from the obligation of really proving whether a linear(ized) model is sufficient as approximation. What we need therefore is knowledge of how non-linearity manifests itself at the various stages of adjustment. Here we agree with (Kubik, 1967), who points out that a general theoretical and practical investigation into the various aspects of non-linear adjustment is still lacking.
In the geodetic literature we only know of a few publications in which non-linear adjustment problems are discussed. In the papers by (Pope, 1972), (Stark and Mikhail, 1973), (Pope, 1974) and (Celmins, 1981; 1982) some pitfalls to be avoided when applying variable transformations or when updating and re-evaluating function values in an iteration procedure are discussed. And in (Kubik, 1967) and (Kelley and Thompson, 1978) a brief review is given of some iteration methods. An investigation into the various effects of non-linearity was started in (Baarda, 1967a,b), (Alberda, 1969), (Grafarend, 1970) and more recently in (Krarup, 1982a). (Alberda, 1969) discusses the effect of non-linearity on the misclosures of condition equations when a linear least-squares estimator is used and illustrates the things mentioned with a quadrilateral. A similar discussion can be found in (Baarda, 1967b), where also an expression is derived for the bias in the estimators. (Grafarend, 1970) discusses a case where the circular normal distribution should replace the ordinary normal distribution. And finally (Baarda, 1967a) and (Krarup, 1982a) exemplify the effect of non-linearity with the aid of a circular model.
Although we accentuate some different and new aspects of non-linear adjustment, our contribution to the problem of non-linear geodetic adjustment should be seen as a continuation of the work done by the above mentioned authors. We must admit though that unfortunately we do not have a cut and dried answer to all questions. We do hope, however, that our discussion of non-linear adjustment will make one more susceptible to the intrinsic difficulties of non-linear adjustment and that the problem will receive more attention than it has received hitherto.
The plan of this publication is the following:
In chapter II we consider the geometry of inverse linear mapping. We will show that every inverse B of a linear map A can be uniquely characterized through the choice of three subspaces S, C and D. Furthermore, each of these three subspaces has an interesting interpretation of its own. In order to facilitate reference the basic results are summarized in table 1.
In chapter III we start by showing the consequences of the inverse mapping problem for 2- and 3-dimensional geodetic networks. This part is easy-going since the planar case has to some extent already been treated elsewhere in the geodetic literature. The second part of this chapter presents a discussion on the, in geodesy almost omnipresent, problem of connecting geodetic networks.
Finally, chapter IV makes a start with the problem of non-linear adjustment. A differential geometric approach is used throughout. We discuss Gauss' method in some detail and show how the extrinsic curvatures of a submanifold affect its local behaviour. And amongst other things, we also show how in some cases the geometry of the problem suggests important simplifications. Typical examples are our generalizations of the classical Helmert transformation.

II. GEOMETRY OF INVERSE LINEAR MAPPING

1. The principles

Many problems in physical science involve the estimation or computation of a number of unknown parameters which bear a linear (or linearized) relationship to a set of experimental data. The data may be contaminated by (systematic or random) errors, insufficient to determine the unknowns, redundant, or all of the above, and consequently questions as to existence, uniqueness, stability, approximation and the physical description of the set of solutions are all of interest.
In econometrics for instance (see e.g. Neeleman, 1973) the problem of insufficient data is discussed under the heading of "multi-collinearity", and the consequent lack of determinability of the parameters from the observations is known there as the "identification problem". And in geophysics, where the physical interpretation of an anomalous gravitational field involves deduction of the mass distribution which produces the anomalous field, there is a fundamental non-uniqueness in potential field inversion, such that, for instance, even complete, perfect data on the earth's surface cannot distinguish between two buried spherical density anomalies having the same anomalous mass but different radii (see e.g. Backus and Gilbert, 1968).

Also in geodesy similar problems can be recognized. The fact that the data are generally only measured at discrete points leaves one in physical geodesy, for instance, with the problem of determining a continuous unknown function from a finite set of data (see e.g. Rummel and Teunissen, 1982). Also the non-uniqueness in coordinate-system definitions makes itself felt when identifying, interpreting, qualifying and comparing results from geodetic network adjustments (see e.g. Baarda, 1973). The problem of connecting geodetic networks, which will be studied in chapter three, is a prime example in this respect.
All the above mentioned problems are very similar and even formally equivalent, if they are described in terms of a possibly inconsistent and under-determined linear system

     y = A x ,                                                           (1.1)

where A is a linear map from the n-dimensional parameter space N into the m-dimensional observation space M.


The first question that arises is whether a solution to (1.1) exists at all, i.e. whether the given vector y is an element of the range space R(A), y ∈ R(A). If this is the case we call the system consistent.
The system is certainly consistent if the rank of A, which is defined as rank A = dim R(A) = r, equals the dimension of M. In this case namely the range space R(A) equals M and therefore y ∈ M = R(A). In all other cases, r < dim M, consistency is no longer guaranteed, since it would be a mere coincidence if the given vector y ∈ M lies in the smaller dimensioned subspace R(A) ⊂ M. Consistency is thus guaranteed if y ∈ R(A) = Nu(A*)⁰.

Assuming consistency, the next question one might ask is whether the solution of (1.1) is unique or not, i.e. whether the vector y contains enough information to determine the vector x. If not, the system is said to be under-determined. The solution is only unique if the rank of A equals the dimension of its domain space N, i.e. if r = dim N. To see this, assume x_1 and x_2 ≠ x_1 to be two solutions to (1.1). Then A x_1 = A x_2 or A(x_1 − x_2) = 0 must hold. But this means that r < dim N.
From the above considerations it follows that it is the relation of r = dim R(A) to m = dim M and n = dim N which decides on the general character of a linear system. In case r = m = n, we know that a unique inverse map B of the bijective map A exists, with the properties

     B A = I    and    A B = I .                                         (1.2)
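These rank criteria are easy to test in coordinates. The sketch below (our own invented matrix, not from the text) checks consistency of a given right-hand side by asking whether appending it to A raises the rank, and checks uniqueness by comparing r with n.

```python
import numpy as np

# Invented system: m = 3 equations, n = 2 unknowns, rank r = 2.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 1.0]])
r = np.linalg.matrix_rank(A)
m, n = A.shape

print(r < m)                                  # True: consistency is not guaranteed

y_good = A @ np.array([1.0, 1.0])             # lies in R(A) by construction
y_bad  = y_good + np.array([1.0, -0.5, 0.0])  # generic perturbation leaves R(A)

# y is consistent iff appending it to A does not raise the rank:
print(np.linalg.matrix_rank(np.column_stack([A, y_good])) == r)   # True
print(np.linalg.matrix_rank(np.column_stack([A, y_bad])) == r)    # False
print(r == n)          # True: here a consistent system has a unique solution
```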

For non-bijective maps A, however, in general no map B can be found for which (1.2) holds. For such maps therefore a more relaxed type of inverse property is used. Guided by the idea that an inverse-like map B should solve any consistent system, that is, map B should furnish for each y ∈ R(A) some solution x = B y such that y = A B y, one obtains as defining property of B

     A B A = A .                                                         (1.3)

Maps B: M → N which satisfy this relaxed type of inverse condition are now called generalized inverses of A.
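One familiar member of this class is the Moore-Penrose pseudoinverse; the sketch below (our own rank-deficient example) checks that it satisfies the defining property (1.3) and that By indeed solves every consistent system.

```python
import numpy as np

# Invented rank-deficient map: rank 1 < min(m, n), so no two-sided inverse exists.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])

B = np.linalg.pinv(A)                 # one particular generalized inverse of A
print(np.allclose(A @ B @ A, A))      # defining property (1.3): ABA = A

y = A @ np.array([1.0, 0.0, 1.0])     # a consistent right-hand side, y in R(A)
x = B @ y
print(np.allclose(A @ x, y))          # By is indeed a solution of y = Ax
```

Note that B is far from unique: any map satisfying (1.3) would pass the same checks, which is precisely the non-uniqueness the subspaces S, C and D will be used to resolve.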

In the geodetic literature there is an overwhelming list of papers which deal with the theory of generalized inverses (see e.g. Teunissen, 1984a and the references cited in it). It more or less started with the pioneering work of Bjerhammar (Bjerhammar, 1951), who defined a generalized inverse for rectangular matrices. And after the publication of Penrose (Penrose, 1955) the literature of generalized inverses has proliferated rapidly ever since.


Many of the published studies, however, follow a rather algebraic approach making use of anonymous inverses which merely produce a solution to the linear system under consideration. As a consequence of this anonymity the essential concepts involved in the problem of inverse linear mapping often stay concealed. Sometimes it even seems that algebraic manipulations and the stacking of theorems, lemmas, corollaries, and what have you, are preferred to a clear geometric interpretation of what really is involved in the problem of inverse linear mapping.
In this chapter we therefore approach the problem of inverse mapping from a different viewpoint. Our approach is based on the idea that once the causes of the inverse mapping problem are classified, the problem of inverse mapping itself is also solved. The following reminder may be helpful. We know that a map is uniquely determined once its basis values are given. But as the theorem of the next section shows, condition (1.3) does not fully specify all the basis values of the map B. Hence its non-uniqueness. This means, however, that analogously to the case where a basis of a subspace can be extended in many ways to a basis which generates the whole space, various maps satisfying (1.3) can be found by specifying their failing basis values.
To give a pictorial explanation of our procedure, observe that in the general case of rank A = r < min.(m,n), the nullspace Nu(A) ⊂ N and range space R(A) ⊂ M both are proper subspaces. That is, they do not coincide with respectively N and M (see figure 4).

figure 4

Now, just like there are many ways in which a basis of a subspace can be extended to a basis which generates the whole space, there are many ways to extend the subspaces Nu(A) ⊂ N and R(A) ⊂ M to fill N and M respectively (see figure 5).

figure 5

Let us choose two arbitrary subspaces, say S ⊂ N and C⁰ ⊂ M, such that the direct sums

N = S ⊕ Nu(A) and M = R(A) ⊕ C⁰, with dim. S = rank A and dim. C⁰ = m − rank A,

coincide with N and M (see figure 6).

figure 6

The complementarity of S and Nu(A) then implies that the subspace S has a dimension which equals that of R(A), i.e. dim. S = dim. R(A). But this means that map A, when restricted to S, is bijective. There exist therefore linear maps B: M → N which, when restricted to R(A), become the inverse of A restricted to S (see figure 7):

B|_{R(A)} A|_S = I_S and A|_S B|_{R(A)} = I_{R(A)}.   (1.4)

figure 7

The inverse-like properties (1.4) are thus the ones which replace (1.2) in the general case of rank A = r < min.(m,n). The second equation of (1.4) can be rephrased as ABA = A, and therefore constitutes the classical definition of a generalized inverse of A. The first equation of (1.4) states that

B A x = x, ∀x ∈ S.   (1.5)

In the next section we will prove what is already intuitively clear, namely that equation (1.5) is equivalent to the classical definition (1.3), and therefore (1.5) can just as well be used as a definition of a generalized inverse. In fact, (1.5) has the advantage over (1.3) that it clearly shows why generalized inverses are not unique. The image of S under A is namely only a proper subspace of M. To find a particular map B which satisfies (1.5), we therefore need to specify its failing basis values.

2. Arbitrary inverses uniquely characterized

In this section we will follow our lead that a map is only uniquely determined once its basis values are completely specified.
As said, the usual way to define generalized inverses B of A is by requiring

A B A = A.   (2.1)

This expression, however, is not a very illuminating one, since it does not tell us what generalized inverses of A look like or how they can be computed. We will therefore rewrite expression (2.1) in such a form that it becomes relatively easy to understand the mapping characteristics of B. This is done by the following theorem:

Theorem

A B A = A  ⟺

1° For some unique S ⊂ N complementary to Nu(A), B A x = x, ∀x ∈ S, holds.

2° A B is a projector which projects onto R(A).

Proof of 1°
(⇒) From premultiplying ABA = A with B follows BABA = BA. The map BA is thus idempotent and therefore a projector from N into N. From ABA = A also follows that Nu(BA) = Nu(A). To see this, consider x ∈ Nu(BA). Then BAx = 0 or ABAx = Ax = 0, which means that x ∈ Nu(A). Thus Nu(BA) ⊂ Nu(A). Conversely, if x ∈ Nu(A), then Ax = 0 or BAx = 0, which means x ∈ Nu(BA). Thus we also have Nu(A) ⊂ Nu(BA). Hence Nu(BA) = Nu(A).
Now let us denote the subspace R(BA) by S, i.e. R(BA) = S. The projector property of BA then implies that BAx = x, ∀x ∈ S, and that N = R(BA) ⊕ Nu(BA). With R(BA) = S and Nu(BA) = Nu(A) we therefore have that N = S ⊕ Nu(A). Hence the complementarity of S and Nu(A).
(⇐) From N = S ⊕ Nu(A) follows the complementarity of S and Nu(A). We can therefore construct the projector P_{S,Nu(A)} = I − P_{Nu(A),S}. With this projector we can now replace B A x = x, ∀x ∈ S, by

B A x = P_{S,Nu(A)} x, ∀x ∈ N,

or finally, after premultiplication with A, and since A P_{S,Nu(A)} = A(I − P_{Nu(A),S}) = A, we get

A B A x = A x, ∀x ∈ N.

Proof of 2°
We omit the proof since it is straightforward.

The above theorem thus makes precise what already was made intuitively clear in section one.
There are now two important points which are put forward by the theorem. First of all, it states that every linear map B: M → N which satisfies

B A x = x, ∀x ∈ S,   (2.2)

with N = S ⊕ Nu(A), is a generalized inverse of A. And since

R(A) = A N = {y ∈ M | y = A x for some x ∈ N}
           = {y ∈ M | y = A x for some x = x' + x'', x' ∈ S, x'' ∈ Nu(A)}
           = {y ∈ M | y = A x' for some x' ∈ S}
           = A S,

this implies that a generalized inverse B of A maps the subspace R(A) ⊂ M onto a subspace S ⊂ N complementary to Nu(A). Map B therefore determines a one-to-one relation between R(A) and S, and is injective when restricted to the subspace R(A).
A second point that should be noted about the theorem is that it gives a way of constructing arbitrary generalized inverses of A. To see this, consider expression (2.2). Since R(A) = A N = A S, expression (2.2) only specifies how B maps a subspace, namely R(A), of M. Condition (2.2) is therefore not sufficient for determining map B uniquely. Thus in order to be able to compute a particular generalized inverse of A one also needs to specify how B maps a basis of a subspace complementary to R(A). Let us denote such a subspace by C⁰ ⊂ M, i.e. M = R(A) ⊕ C⁰. Then if e_i, i = 1,...,m, and e_α, α = 1,...,n, are bases of M and N, and c_ρ, ρ = 1,...,(m−r), forms a basis of C⁰, a particular generalized inverse B of A is uniquely characterized by specifying in addition to (2.2) how it maps C⁰, say:

B c_ρ = D_ρ^α e_α, ρ = 1,...,(m−r); α = 1,...,n   (2.3)

(Einstein's summation convention). The kernel letter D also denotes the subspace of N spanned by the images B c_ρ. Thus, in addition to (2.2), we have

B C⁰ = D, with M = R(A) ⊕ C⁰.   (2.4)

Although the choice for D ⊂ N is completely free, we will show that one can impose an extra condition, namely D ⊂ Nu(A), without affecting generality. Note that point 2° of the theorem says that A B is a projector, projecting onto the range space R(A) and along a space, say C̃⁰, complementary to R(A). With (2.4) we therefore get that A B maps the basis vectors of C̃⁰ to zero, i.e. B maps C̃⁰ into Nu(A). But this means that if B is characterized by mapping C⁰ onto D, there exists another subspace of M complementary to R(A) which is mapped by B into a subspace of Nu(A). We can therefore just as well start characterizing a particular generalized inverse B of A by (2.2) and (2.4), but now with the additional condition that D ⊂ Nu(A).
Summarizing, we have for the images of the two complementary subspaces R(A) = A S and C⁰ under B:

B A S = S and B C⁰ = D, with N = S ⊕ Nu(A), M = R(A) ⊕ C⁰ and D ⊂ Nu(A).   (2.5)

A few things are depicted in figure 8.

figure 8

Our objective of finding a unique representation of an arbitrary generalized inverse B of A can now be reached in a very simple way indeed. The only thing we have to do is to combine (2.2) and (2.3). If we take the coordinate expressions of B, S, C⁰, D and A to be

B e_i = B_i^α e_α,  s_q = S_q^α e_α,  c_ρ = C_ρ^i e_i,  d_ρ = D_ρ^α e_α  and  A e_α = A_α^i e_i,

with i = 1,...,m; α = 1,...,n; ρ = 1,...,(m−r); q = 1,...,r, where e_i and e_α are bases of M and N, and if we take s_q, c_ρ and d_ρ as bases of S, C⁰ and D, then (2.2) and (2.3) can be expressed in matrix notation as

B ( A S : C^⊥ ) = ( S : D ),
nxm  mxr mx(m−r)   nxr nx(m−r)

where the columns of C^⊥ span C⁰. Now, since the subspaces R(A) = A S and C⁰ are complementary, the m×m matrix ( A S : C^⊥ ) has full rank and is thus invertible. The unique representation of a particular generalized inverse B of A therefore becomes

B = ( S : D )( A S : C^⊥ )⁻¹.   (2.7)
nxm

A more symmetric representation is obtained if we substitute into (2.7) the easily verified matrix identity

( A S : C^⊥ )⁻¹ = [ (Cᵗ A S)⁻¹ Cᵗ ]
                  [ ((U^⊥)ᵗ C^⊥)⁻¹ (U^⊥)ᵗ ],

with R(U^⊥) = R(A)^⊥ = Nu(A*) and R(C) = (C⁰)^⊥ (recall that C^⊥ and U^⊥ are matrix representations of respectively the subspaces C⁰ and Nu(A*)); this gives

B = S (Cᵗ A S)⁻¹ Cᵗ + D ((U^⊥)ᵗ C^⊥)⁻¹ (U^⊥)ᵗ.   (2.8)
With (2.7) or (2.8) we thus have found one expression which covers all the generalized inverses of A. Furthermore we have the important result that each particular generalized inverse of A, defined through (2.2) and (2.3), is uniquely characterized by the choices made for the subspaces S, complementary to Nu(A), C⁰, complementary to R(A), and D, a subspace of Nu(A).
In the next two sections we will give the interpretation associated with the three subspaces S, C⁰ and D. Also the relation with the problem of solving an arbitrary system of linear equations will become clear then.
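The representation (2.7) lends itself to a direct numerical check. In the sketch below (numpy assumed) the 3×3 rank-2 matrix A and the bases chosen for S, C⁰ and D are hypothetical examples, not taken from the text; the assertions verify the classical definition (2.1), property (2.2) and the basis-value specification (2.3).

```python
import numpy as np

# hypothetical rank-2 map (m = n = 3, r = 2); Nu(A) = span{(1, 1, -1)}
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.]])

S = np.array([[1., 0.],            # n x r basis of S, complementary to Nu(A)
              [0., 1.],
              [0., 0.]])
C_perp = np.array([[0.],           # m x (m-r) basis of C0, complementary to R(A)
                   [0.],
                   [1.]])
D = np.array([[1.],                # n x (m-r) images B c = d, with D in Nu(A)
              [1.],
              [-1.]])

# unique representation (2.7): B (AS : C_perp) = (S : D)
B = np.hstack((S, D)) @ np.linalg.inv(np.hstack((A @ S, C_perp)))

assert np.allclose(A @ B @ A, A)       # classical definition (2.1)
assert np.allclose(B @ A @ S, S)       # (2.2): B A x = x for all x in S
assert np.allclose(B @ C_perp, D)      # (2.3): B maps C0 onto D
```

Varying any of the three choices S, C⁰ or D produces a different generalized inverse, which is exactly the non-uniqueness discussed above.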

3. Injective and surjective maps

From the theorem of the previous section we learnt that the inverse-like properties

B A x = x, ∀x ∈ S, and A B y = y, ∀y ∈ R(A),   (3.1)

hold for any arbitrary generalized inverse B of A. That is, the maps BA and AB behave like identity maps on respectively the subspaces S ⊂ N and R(A) ⊂ M. Thus in the special case that rank A = r = n, the generalized inverses of A become left-inverses, since then BA = I. And similarly they become right-inverses if rank A = r = m, because then AB = I holds.
In order to give an interpretation of the subspace S ⊂ N, let us now first concentrate on the special case that rank A = r = m.
If rank A = r = m, then R(A) = M, which implies that the subspaces complementary to R(A) reduce to C⁰ = {0}. With (2.5) we then also have that D = {0} (see figure 9). The general expression of right-inverses therefore readily follows from (2.8) as

B = S (A S)⁻¹, with N = S ⊕ Nu(A).   (3.2)
nxm     mxm

figure 9

Thus the only subspaces which play a rôle in the inverses of surjective maps are the subspaces S complementary to Nu(A).
In order to find out how (3.2) is related to the problem of solving a system of linear equations

y = A x,   (3.3)
mx1 mxn nx1

for which matrix A has full row rank m, first observe that the system is consistent for all y ∈ Rᵐ. With a particular generalized inverse (right-inverse), say B, of A, and R(V) = Nu(A), the solution set of (3.3), which actually represents a linear manifold in N, can therefore be written as

{x} = {x | x = B y + V α}.   (3.4)
           nx1  nx(n−r) (n−r)x1

By choosing α, say α := α1, we thus get as a particular solution x1 = B y + V α1, where α1 so to say contributes the extra information, which is lacking in y, to determine x1. Since R(B) = S, it follows from (3.4) that

(S^⊥)ᵗ x1 = (S^⊥)ᵗ V α1 =: c1.   (3.5)
(n−r)xn nx1

But this means that, since α1 or c1 contributes the extra information which is lacking in y to determine x1, equations (3.5) and (3.3) together suffice to determine x1 uniquely. Or in other words, the solution of the uniquely solvable system

[ y  ]   [ A      ]
[ c1 ] = [ (S^⊥)ᵗ ] x   (3.6)
(m+n−r)x1  (m+n−r)xn  nx1

is precisely x1:

x1 = ( S (A S)⁻¹ : V ((S^⊥)ᵗ V)⁻¹ ) [ y  ]
nx1   nx(m+n−r)                     [ c1 ], with R(V) = Nu(A).   (3.7)

Thus we have recovered the rule that, in order to find a particular solution to (3.3), say x1, we merely need to extend the system of linear equations from (3.3) to (3.6) by introducing the additional equations c1 = (S^⊥)ᵗ x, so that the extended matrix becomes square and regular. Furthermore the corresponding right-inverse of A is obtainable from the inverse of this extended matrix.
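The bordering rule of (3.6) and (3.7) can be sketched as follows; the full-row-rank system and the choice of S below are hypothetical examples (numpy assumed):

```python
import numpy as np

# hypothetical full-row-rank system (m = 1, n = 2, r = 1): y = A x
A = np.array([[1., 1.]])
y = np.array([2.])

S      = np.array([[1.], [0.]])   # chosen S complementary to Nu(A) = span{(1,-1)}
S_perp = np.array([[0.], [1.]])   # basis of the orthogonal complement of S
c1     = np.array([0.])           # extra information fixing the solution in S

# extended, square and regular system (3.6): [A ; S_perp^t] x = [y ; c1]
A_ext = np.vstack((A, S_perp.T))
x1 = np.linalg.solve(A_ext, np.concatenate((y, c1)))

assert np.allclose(A @ x1, y)            # x1 solves the original system
assert np.allclose(S_perp.T @ x1, c1)    # and lies in the chosen S

# the right-inverse of A sits in the first column block of the extended
# inverse and coincides with S (A S)^{-1} from (3.2)
B = np.linalg.inv(A_ext)[:, :1]
assert np.allclose(B, S @ np.linalg.inv(A @ S))
```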
Let us now consider the case rank A = r = n. Then all generalized inverses of A become left-inverses. Because of the injectivity of A we have that its nullspace reduces to Nu(A) = {0}. But this implies that S = N and D = {0}, since D ⊂ Nu(A) (see figure 10).

figure 10
For the dual map A*: M* → N* we therefore have a situation which is comparable to the one sketched in figure 9 (see figure 11). Now, taking advantage of our result (3.2), we find the general matrix representation of an arbitrary generalized inverse B* of A* to be

Bᵗ = C (Aᵗ C)⁻¹, with M* = C ⊕ Nu(A*).
mxn      nxn

figure 11

The general expression of left-inverses therefore readily follows as

B = (Cᵗ A)⁻¹ Cᵗ, with M = R(A) ⊕ C⁰.   (3.8)

Thus, dual to our result (3.2), we find that the only subspaces which play a rôle in the inverses of injective maps are the subspaces C⁰ complementary to R(A).

With the established duality relations it now also becomes easy to see how (3.8) is related to the problem of solving a generally inconsistent but otherwise uniquely determined system of linear equations

y = A x, with rank A = r = n.   (3.9)
mx1 mxn nx1

The dual of (3.6), modified to our present situation, gives namely

y = ( A : C^⊥ ) [ x ]
mx1  mxn mx(m−r) [ λ ],   (3.10)

and, dual to (3.7), the unique solution of (3.10) is given by:

[ x1 ]   [ (Cᵗ A)⁻¹ Cᵗ           ]
[ λ1 ] = [ ((U^⊥)ᵗ C^⊥)⁻¹ (U^⊥)ᵗ ] y, with R(U^⊥) = Nu(A*).   (3.11)
(n+m−r)x1   (n+m−r)xm              mx1

We therefore have recovered the dual rule that, in order to find a particular solution to (3.9), we need to extend the system of linear equations from (3.9) to (3.10) by introducing additional unknowns such that the extended matrix ( A : C^⊥ ) becomes square and regular. Furthermore the corresponding left-inverse of A is obtainable from the inverse of this extended matrix.
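Dually, the column-wise extension (3.10) can be sketched for a hypothetical inconsistent, full-column-rank system; here C⁰ happens to be chosen as R(A)^⊥, so the recovered x1 is in fact the least-squares solution:

```python
import numpy as np

# hypothetical inconsistent system with full column rank (m = 2, n = 1, r = 1)
A = np.array([[1.], [1.]])
y = np.array([1., 3.])            # y is not in R(A): inconsistent

C      = np.array([[1.], [1.]])   # basis of (C0)-perp
C_perp = np.array([[1.], [-1.]])  # basis of the chosen C0, complementary to R(A)

# column-wise extended, square and regular system (3.10): (A : C_perp)(x ; lam) = y
A_ext = np.hstack((A, C_perp))
x_lam = np.linalg.solve(A_ext, y)
x1 = x_lam[:1]

# the left-inverse (C^t A)^{-1} C^t of (3.8) reproduces the same x1
B = np.linalg.inv(C.T @ A) @ C.T
assert np.allclose(B @ y, x1)
assert np.allclose(B @ A, np.eye(1))      # B is a left-inverse: B A = I
```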

4. A r b i t r a r y systems o f linear equations and arbitrary inverses

In the previous section we showed that a particular solution of an underdetermined but otherwise consistent system of linear equations could be obtained by extending the matrix A row-wise. And especially the principal rôle played by the subspace S ⊂ N complementary to Nu(A) in removing the underdeterminability was demonstrated. Similarly we saw how consistency of an inconsistent, but otherwise uniquely determined system of linear equations was restored by extending the matrix A column-wise. And here the subspace C⁰ ⊂ M complementary to R(A) played the decisive rôle. We also observed a complete duality between these results; for the dual of an injective map is surjective and vice versa.

These results are, however, still not general enough. In particular we note that the subspace D ⊂ Nu(A) was annihilated as a consequence of the assumed injectivity and surjectivity. The reason for this will become clear if we consider the interpretation associated with the subspace D. Since S ∩ D = {0}, it follows from expression (2.8) that R(B) = S ⊕ D. With dim. S = dim. R(A) = rank A we therefore have that rank B ≥ rank A, with equality if and only if D = {0}. But this shows why the subspace D gets annihilated in case of injective and surjective maps. The left- (right-) inverses have namely the same rank as the injective (surjective) maps. From the above it also becomes clear that the rank of B is completely determined by the choice made for D. In particular B will have minimum rank if D is chosen to be D = {0}, and maximum rank, rank B = min.(m,n), if one can choose D such that dim. D = min.(m,n) − r.

Now, to see how the subspace D ⊂ Nu(A) gets incorporated in the general case, we consider a system of linear equations

y = A x, with rank A = r < min.(m,n),   (4.1)
mx1 mxn nx1

i.e. a system which is possibly inconsistent and underdetermined at the same time. From the rank deficiency of A in (4.1) follows that the unknowns x cannot be determined uniquely, even if y ∈ R(A). Thus the information contained in y is not sufficient to determine x uniquely. Following the same approach as before, we can at once remove this underdeterminability by extending (4.1) to

[ y ]   [ A      ]
[ c ] = [ (S^⊥)ᵗ ] x, with N = S ⊕ Nu(A).   (4.2)

But although the extended matrix of (4.2) has full column rank, the system can still be inconsistent. To remove possible inconsistency we therefore have to extend the matrix of (4.2) column-wise so that the resulting matrix becomes square and regular. Now since M = R(A) ⊕ C⁰, the following extension is a feasible one:

[ y ]   [ A       C^⊥ ] [ x ]
[ c ] = [ (S^⊥)ᵗ  0   ] [ λ ], with N = S ⊕ Nu(A) and M = R(A) ⊕ C⁰.   (4.3)
(m+n−r)x1  (m+n−r)x(m+n−r)  (m+n−r)x1

But the most general extension would be

[ y ]   [ A       C^⊥ ] [ x ]
[ c ] = [ (S^⊥)ᵗ  X   ] [ λ ], with N = S ⊕ Nu(A), M = R(A) ⊕ C⁰
(m+n−r)x1                  (m+n−r)x1

and the (n−r)×(m−r) matrix X being arbitrary. The unique solution of this extended system is then given by:

[ x ]   [ A       C^⊥ ]⁻¹ [ y ]
[ λ ] = [ (S^⊥)ᵗ  X   ]   [ c ].

In this expression we recognize, if we put X = −(S^⊥)ᵗ D, our general matrix representation (2.8) of an arbitrary generalized inverse B of A: the n×m top-left block of the inverted matrix is precisely B. Thus, as a generalization of (3.7) and (3.11), we have

x = B y + V ((S^⊥)ᵗ V)⁻¹ c, with R(V) = Nu(A) and R(U^⊥) = Nu(A*).

This result then completes the circle. In section one, namely, we started by describing the geometric principles behind inverse linear mapping. In section two these principles were made precise by the stated theorem. This theorem enabled us to find a unique representation covering all generalized inverses B of a linear map A. In section three we then specialized to injective and surjective maps, showing the relation between the corresponding inverses and the solutions of the corresponding systems of linear equations. And finally this section generalized these results to arbitrary systems of linear equations, whereby our general expression of generalized inverses was again obtained.
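The square, regular extension (4.3) can be sketched numerically for a hypothetical system that is inconsistent and underdetermined at the same time; the choices of S and C⁰ below are again free example choices (numpy assumed):

```python
import numpy as np

# hypothetical system with rank A = 2 < min(3,3)
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.]])
y = np.array([1., 1., 2.5])                    # y not in R(A)

S_perp = np.array([[0.], [0.], [1.]])          # fixes S = span{e1,e2}, compl. to Nu(A)
C_perp = np.array([[1.], [1.], [-1.]])         # basis of C0 = R(A)-perp
c = np.array([0.])

# [ A        C_perp ] [ x   ]   [ y ]
# [ S_perp^t   0    ] [ lam ] = [ c ]
K = np.block([[A, C_perp],
              [S_perp.T, np.zeros((1, 1))]])
sol = np.linalg.solve(K, np.concatenate((y, c)))
x, lam = sol[:3], sol[3:]

assert np.allclose(S_perp.T @ x, c)            # x lies in the chosen S
assert np.allclose(A @ x, y - C_perp @ lam)    # inconsistency moved into C0
```

Because C⁰ was taken here as R(A)^⊥, the term y − C_perp·lam is the least-squares adjusted observation vector; other choices of C⁰ distribute the inconsistency differently.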

5. Some common types of inverses and their relation to the subspaces S, C and D

With our interpretation of the three subspaces S, C and D, and an expression like (2.8), it now becomes very simple indeed to derive most of the standard results which one can find in the many textbooks available. See e.g. (Rao and Mitra, 1971). As a means of exemplification we show what rôle is played by the three subspaces S, C and D in the more common types of inverses used:

- least-squares inverses -

Let M be Euclidean with metric tensor (·,·)_M and let Q_y: M* → M be the covariance map defined by Q_y⁻¹ y = (y,·)_M. We know from chapter one that for B y to be a least-squares solution of min_x (y − A x, y − A x)_M,

A B = P_{U,U^⊥}, with U = R(A),   (5.1)

must hold. From (2.8) follows, however, that in general

A B = P_{U,C⁰}.   (5.2)

Namely, expression (2.8) shows that

A B = A S (Cᵗ A S)⁻¹ Cᵗ.   (5.3)
mxm

And since

A S (Cᵗ A S)⁻¹ Cᵗ · C^⊥ = 0 and A S (Cᵗ A S)⁻¹ Cᵗ · A S = A S,

it follows that (5.3) is the matrix representation of the projector P_{U,C⁰}. From comparing (5.1) and (5.2) we thus conclude that least-squares inverses are obtained by choosing

C⁰ = U^⊥,   (5.4)

while S and D may still be chosen arbitrarily. In matrices condition (5.4) reads

C^⊥ = Q_y U^⊥.   (5.5)
mx(m−r)  mxm mx(m−r)
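Condition (5.5) can be checked numerically. In the hypothetical sketch below, choosing C as a basis of Q_y⁻¹ R(A), so that C⁰ = Q_y U^⊥, turns the left-inverse (3.8) into the familiar weighted least-squares estimator (Aᵗ W A)⁻¹ Aᵗ W of adjustment theory; design matrix and weights are arbitrary example values (numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical full-column-rank design (m = 4, n = 2) and covariance matrix
A  = rng.normal(size=(4, 2))
Qy = np.diag([1., 2., 4., 8.])
W  = np.linalg.inv(Qy)

# C0 = Qy-orthogonal complement of U = R(A); equivalently the basis of
# (C0)-perp is C = W A
C = W @ A
B = np.linalg.inv(C.T @ A) @ C.T               # left-inverse with this choice of C

# B then reproduces the weighted least-squares estimator (A^t W A)^{-1} A^t W
B_ls = np.linalg.solve(A.T @ W @ A, A.T @ W)
assert np.allclose(B, B_ls)

# and A B is the W-orthogonal projector onto R(A): W A B is symmetric
assert np.allclose((W @ A @ B).T, W @ A @ B)
```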

- minimum norm inverses -

Let N be Euclidean with metric tensor (·,·)_N and let Q_x: N* → N be the covariance map defined by Q_x⁻¹ x = (x,·)_N. For x̂ to be the minimum norm solution of min_x (x,x)_N subject to y = A x, x̂ must be the orthogonal projection of the origin onto the linear manifold specified by y = A x. Hence,

B A = P_{S,Nu(A)}, with S = Nu(A)^⊥,   (5.6)

must hold. With the same reasoning as above we then find that the minimum norm inverses are obtained by choosing

S = Nu(A)^⊥,   (5.7)

while C⁰ and D may still be chosen arbitrarily. In matrices condition (5.7) reads

S = Q_x V.   (5.8)
nxr  nxn nxr

Note that since (5.7) implies that S = R(Q_x A*), conditions (5.4) and (5.7) are dually related.

- maximum and minimum rank inverses -

In the previous section we already indicated that by varying the choices for D ⊂ Nu(A) one can manipulate the rank of the corresponding generalized inverse. Inverses with maximum rank min.(m,n) were obtained if one could choose D such that dim. D = min.(m,n) − r, and minimum rank inverses were characterized by the choice D = {0}.
As we will see in the next section, the minimum rank inverses are by far the most important for statistical applications.
There is an interesting transformation property involved in the class of minimum rank inverses, which enables one to transform from an arbitrary inverse to a prespecified minimum rank inverse. To see this, recall that a minimum rank inverse, B1 say, of A, which is uniquely characterized by the choices S1, C1⁰ and D1 = {0}, satisfies the conditions

B1 A x = x, ∀x ∈ S1;  B1 C1⁰ = {0},
with N = S1 ⊕ Nu(A), S1 = B1 R(A), and M = U ⊕ C1⁰ = A S1 ⊕ Nu(B1), U = R(A).   (5.9)

And it can be represented as

B1 = S1 (C1ᵗ A S1)⁻¹ C1ᵗ.   (5.10)

But the linear map A itself also satisfies similar conditions. For an arbitrary generalized inverse, B say, of A we have namely

A B y = y, ∀y ∈ U;  A V⁰ = {0},
with M = U ⊕ C⁰ = A R(B) ⊕ C⁰, U = R(A), and N = S ⊕ V⁰ = B U ⊕ V⁰, V⁰ = Nu(A).   (5.11)

Upon comparing (5.11) with (5.9) we therefore conclude that the linear map A is representable in a way similar to that of B1 in (5.10), i.e.

A = U (Vᵗ B U)⁻¹ Vᵗ, with R(V) = Nu(A)^⊥ and R(U) = R(A),   (5.12)

and where B may be any arbitrary inverse of A.
Now, substitution of (5.12) into (5.10) gives

B1 = ( S1 (Vᵗ S1)⁻¹ Vᵗ ) B ( U (C1ᵗ U)⁻¹ C1ᵗ ).

In this last expression we recognize the matrix representations of the projectors P_{S1,Nu(A)} and P_{R(A),C1⁰}. Thus we have found the transformation rule

B1 = P_{S1,Nu(A)} B P_{R(A),C1⁰},   (5.13)

which shows how to obtain a prespecified minimum rank inverse from any arbitrary generalized inverse of A. Because of the reciprocal character of minimum rank inverses (A is namely again an inverse of its minimum rank inverses) they are often called reflexive inverses.
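The transformation rule (5.13) can be verified numerically. In the sketch below the pseudo-inverse serves as the arbitrary generalized inverse B; the matrix and the target choices S1, C1⁰ are hypothetical examples (numpy assumed):

```python
import numpy as np

# hypothetical rank-deficient map (rank A = 2 < min(3,3))
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.]])
V = np.array([[1., 0.], [0., 1.], [1., 1.]])    # basis of Nu(A)-perp
U = A[:, :2]                                    # basis of R(A)

B = np.linalg.pinv(A)                           # an arbitrary generalized inverse

# target minimum rank inverse: S1 = span{e1,e2}, C1_0 = span{e3}
S1 = np.array([[1., 0.], [0., 1.], [0., 0.]])
C1 = np.array([[1., 0.], [0., 1.], [0., 0.]])   # basis of (C1_0)-perp

P_S1 = S1 @ np.linalg.inv(V.T @ S1) @ V.T       # projector onto S1 along Nu(A)
P_RA = U @ np.linalg.inv(C1.T @ U) @ C1.T       # projector onto R(A) along C1_0

# transformation rule (5.13) reproduces the representation (5.10)
B1 = P_S1 @ B @ P_RA
assert np.allclose(B1, S1 @ np.linalg.inv(C1.T @ A @ S1) @ C1.T)
assert np.allclose(A @ B1 @ A, A)
```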

- minimum norm least-squares inverses -

The minimum norm least-squares solution x̂ of an inconsistent and underdetermined system of linear equations

y = A x, with rank A = r < min.(m,n),   (5.15)

is defined as the solution for which x̂ is the minimum norm solution of

ŷ = A x,   (5.16)

and ŷ is the least-squares solution of (5.15). Since the minimum norm solution of (5.16) is given by

x̂ = B ŷ,   (5.17)

where the inverse B of A is characterized by (5.7), and the least-squares solution of (5.15) is given by

ŷ = P_{U,U^⊥} y, with U = R(A),   (5.18)

it follows from the combination of (5.17) and (5.18), together with the transformation rule (5.13), that the minimum norm least-squares inverse of A is uniquely characterized by

S = Nu(A)^⊥, C⁰ = R(A)^⊥ and D = {0}.

Note that since no freedom is left in choosing the three subspaces, the minimum norm least-squares inverse must be unique.
In the special case that both N and M are endowed with the ordinary canonical metric, the minimum norm least-squares inverse is commonly known as the pseudo-inverse.
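For the canonical metric, the characterization S = Nu(A)^⊥, C⁰ = R(A)^⊥ and D = {0} inserted into (2.7) must therefore reproduce the pseudo-inverse. A numerical check on a hypothetical rank-deficient matrix (numpy assumed):

```python
import numpy as np

A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.]])                         # rank 2

S      = np.array([[1., 0.], [0., 1.], [1., 1.]])    # basis of Nu(A)-perp
C_perp = np.array([[1.], [1.], [-1.]])               # basis of R(A)-perp
D      = np.zeros((3, 1))                            # minimum rank: D = {0}

B = np.hstack((S, D)) @ np.linalg.inv(np.hstack((A @ S, C_perp)))
assert np.allclose(B, np.linalg.pinv(A))
```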

- constrained inverses -

So far we have been careful in stating the complementarity conditions for S ⊂ N and C⁰ ⊂ M. In the method of prolonging a matrix A this was reached by adding the minimum number of equations needed to the system y = A x so that determinability of x was restored, and the minimum number of unknowns so that the prolonged matrix became square and regular, i.e. so that consistency was restored.
Sometimes, however, one can come across the situation where the system of linear equations y = A x is appended with the restrictions (T^⊥)ᵗ x = c, with (T^⊥)ᵗ of order q×n and q > n − r. That is, with the restrictions that x should lie in a linear manifold parallel to a subspace T which is a proper subspace of an S complementary to Nu(A). In this case T thus fails to be complementary to Nu(A).
Although this situation differs from the ones we considered so far, it can be handled just as easily. By the method of prolongation we get namely

[ y ]   [ A      ]
[ c ] = [ (T^⊥)ᵗ ] x, with T ⊂ S, N = S ⊕ Nu(A) and M = A T ⊕ C⁰.   (5.20)

The solution of (5.20) then follows with the help of the matrix T (Cᵗ A T)⁻¹ Cᵗ, which is known as a constrained inverse of A (see e.g. Rao and Mitra, 1971).
Other types of constrained inverses can be obtained in a similar way.

To conclude this section we have summarized, in order to facilitate reference, the basic results in table 1.

table 1

6. C- and S-transformations

Now that we have found a geometric characterization of the inverse linear mapping problem, let us return to the linear estimation problem which was considered in chapter I.
Consider the linear model

ȳ ∈ {y1} + A N.   (6.1)

As we know (see (I.1.6)), the necessary and sufficiency conditions for the linear function

h(y) = (ȳ*, y_s − y1) + (y*_s, y1), for some ȳ* ∈ M*,

to be a linear unbiased estimator (LUE) of (y*_s, ȳ) are that ȳ* − y*_s annihilates A N. That is, ȳ* needs to be a point on the linear manifold {y*_s} + U⁰ in M*, with U⁰ = Nu(A*) the annihilator of U = A N (see figure 12).

figure 12

It will be clear that every point ȳ* on this linear manifold can be obtained by choosing an appropriate subspace C ⊂ M* complementary to U⁰ = Nu(A*) and then projecting the 1-form y*_s along U⁰ onto C. Hence,

ȳ* = P_{C,Nu(A*)} y*_s.   (6.2)

With (6.2) then follows that the class of linear unbiased estimators of (y*_s, ȳ) is given by:

h^(c)(y) = (P_{C,Nu(A*)} y*_s, y_s − y1) + (y*_s, y1),   (6.4)

where C ⊂ M* is arbitrary provided that M* = C ⊕ Nu(A*). Every such linear function is thus uniquely characterized by the choice made for C. And by varying the choices for C one varies the type of unbiased estimator. Since the projector P_{C,Nu(A*)} always projects along the nullspace of A* (see figure 13), we have that

P_{C2,Nu(A*)} P_{C1,Nu(A*)} = P_{C2,Nu(A*)}, for any two admissible choices C1, C2.

figure 13

The transformation between the corresponding 1-forms is therefore given by

ȳ*^(c2) = P_{C2,Nu(A*)} ȳ*^(c1),   (6.6)

and in accordance with the current terminology one could call such transformations C-transformations.
A typical example in which a particular choice for C is made can be found in the method of averages due to T. Mayer (Whittaker and Robinson, 1944, p. 258). In this method, which is sometimes used for polynomial approximations (see e.g. Morduchow and Levin, 1959), C is chosen such that the equations of a linear system y = A x are separated into n groups and after that groupwise summed.
Although more of such examples can be given, the most commonly applied estimator is of course the BLUE's estimator which is, as we know, characterized by the choice C = Q_y⁻¹ U, with U = R(A). It is interesting to note though, that since every (oblique) projector can be interpreted as an orthogonal projector with respect to an appropriate metric tensor, every unbiased estimator can be interpreted as a BLUE's estimator with respect to an appropriate covariance map, a fact which was already pointed out by (Baarda, 1967b, p. 34). To see this, assume that U (Cᵗ U)⁻¹ Cᵗ, with U = R(A) and M = R(A) ⊕ C⁰, is a matrix representation of the oblique projector P_{R(A),C⁰}. With the symmetric and positive-definite matrix Q̃_y⁻¹ = C Cᵗ + U^⊥ (U^⊥)ᵗ it follows then that

U (Cᵗ U)⁻¹ Cᵗ = U (Uᵗ Q̃_y⁻¹ U)⁻¹ Uᵗ Q̃_y⁻¹,

i.e. the oblique projector is orthogonal with respect to the metric Q̃_y⁻¹.
Thus the problem of comparing different unbiased estimators can in principle be restricted to the problem of analyzing the effect of assumptions on the metric tensor. See e.g. (Krarup, 1972).

Now let us assume that we have picked one particular 1-form, ȳ*^(c) say. It follows then from (6.4) that the corresponding unbiased estimate of ȳ ∈ M is given by:

ŷ^(c) = y1 + P*_{C,Nu(A*)} (y_s − y1) = y1 + P_{R(A),C⁰} (y_s − y1), with ŷ^(c) ∈ {y1} + A N.

And since the problem of removing inconsistency is in the above context of linear estimation essentially the problem of finding the estimate ŷ^(c), one could say that one has concluded the actual adjustment problem once ŷ^(c) is computed. In practice, however, one often requires a parameter representation of ŷ^(c). And here is thus where the actual inverse mapping problem enters. That is, in order to find a parameter representation of ŷ^(c) one needs a particular pre- or inverse image of ŷ^(c) − y1 under A. By means of a generalized inverse, B say, of A such an inverse image is obtained as

x̂ = B (ŷ^(c) − y1).   (6.7)

From the transformation rule (5.13) and (6.7) follows then that every inverse image of ŷ^(c) − y1 under A can be written as

x̂^(s,c) = P_{S,Nu(A)} B (ŷ^(c) − y1),

with N = S ⊕ Nu(A) and M* = C ⊕ Nu(A*), and where B is allowed to be any arbitrary inverse of A. The estimate x̂^(s,c) is thus uniquely characterized by the choices made for C and S.

To understand what x̂^(s,c) actually estimates, note that for every x*_s ∈ N* the number (x*_s, x̂^(s,c)) = (x*_s, P_{S,Nu(A)} B (ŷ^(c) − y1)) is a linear unbiased estimate of (x*_s, P_{S,Nu(A)} x̄). In other words, x̂^(s,c) is an unbiased estimate of P_{S,Nu(A)} x̄, but not of x̄ itself. This subtle difference as to what x̂^(s,c) actually estimates has sometimes been a source of confusion. See e.g. (Jackson, 1982).


Since the projector P_{S,Nu(A)} always projects along the nullspace of A (see figure 14), we have that

P_{S2,Nu(A)} P_{S1,Nu(A)} = P_{S2,Nu(A)}.

figure 14

The transformation between the various inverse images of ŷ^(c) − y1 under A is therefore given by

x̂^(s2,c) = P_{S2,Nu(A)} x̂^(s1,c).   (6.10)

Such transformations are now known as S-transformations. They were first introduced by Baarda in the context of free networks and used to obtain an invariant precision description of geodetic networks (see e.g. Baarda, 1973; Molenaar, 1981; Van Mierlo, 1979; or Teunissen, 1984a). Baarda has used the term "S-transformation", since the projector P_{S,Nu(A)} is in the case of geodetic networks derivable from the differential Similarity transformation. In the above general context, however, it would perhaps be more appropriate to call transformation (6.10) a Singularity transformation. This as opposed to the Consistency transformation (6.6).
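An S-transformation can be sketched for a hypothetical minimal leveling network: three heights are observed only through height differences, so Nu(A) = span{(1,1,1)} and a common shift is unobservable. Two inverse images of the same adjusted observations, one with point 1 held fixed and one with point 3 held fixed, are then related by the projector of (6.10) (numpy assumed; the observation values are invented):

```python
import numpy as np

# height-difference design matrix: Nu(A) = span{(1, 1, 1)}
A = np.array([[-1., 1., 0.],
              [0., -1., 1.],
              [-1., 0., 1.]])
y = np.array([1., 2., 3.2])                        # slightly inconsistent

y_hat = A @ np.linalg.lstsq(A, y, rcond=None)[0]   # adjusted observations

# inverse image in S1 (point 1 held fixed: solutions with x1 = 0)
A1 = A[:, 1:]
x_s1 = np.concatenate(([0.], np.linalg.lstsq(A1, y_hat, rcond=None)[0]))

# inverse image in S2 (point 3 held fixed) via the S-transformation (6.10):
# the projector onto S2 along Nu(A) subtracts the common shift fixing x3 = 0
P_S2 = np.eye(3) - np.outer(np.ones(3), np.array([0., 0., 1.]))
x_s2 = P_S2 @ x_s1

assert np.allclose(A @ x_s2, y_hat)                # same adjusted observations
assert np.isclose(x_s2[2], 0.)                     # but expressed with x3 = 0
```

Both x_s1 and x_s2 reproduce the same adjusted observations; only the datum (the choice of S) differs, which is exactly the invariance exploited in Baarda's precision analysis.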
Note the great resemblance between (6.6) and (6.10). From this comparison also follows the duality result that the C-transformations of A are the S-transformations of A*, and the S-transformations of A are the C-transformations of A*: the projector P_{S,Nu(A)} is the S-transformation of A and at the same time the C-transformation of A*.

In this section we have seen how the inverse linear mapping theory applies to the problem of linear estimation. We have seen that the actual problem of adjustment and the actual problem of inverse mapping, although dually related, are essentially two problems of a different kind. Were we only interested in adjustment, i.e. in removing inconsistency, then we would only be concerned with the subspace C ⊂ M*. But if one, in addition to removing inconsistency, is also interested in finding a particular pre- or inverse image of ŷ^(c) under A, then the choice of S ⊂ N comes to the fore. We would like to stress here the importance of the definite ordering: first adjustment and then inverse mapping, since it shows that in an estimation context no great value should be attached to the subspace D. In fact the only inverses of A which map y_s into the pre-image x̂^(s,c) of ŷ^(c) are the minimum rank inverses. And in particular one should be aware that one cannot get an arbitrary pre-image x̂^(s) of the least-squares estimate ŷ (= P_{R(A),R(A)^⊥} y_s) by mapping y_s ∈ M with an arbitrary least-squares inverse of A into N.

1. Introduction

In the preceding chapter we have seen how to characterize an arbitrary inverse of a linear map A: N -> M uniquely. In particular we saw how by choosing C^0 complementary to R(A) one could make an inconsistent system of linear equations consistent, and how S complementary to Nu(A) gave a way of restoring determinability. We also noted that although inconsistency and underdeterminability generally occur simultaneously if rank A = r < min(m,n), the actual problem of adjustment, i.e. the problem of removing inconsistency, and the actual problem of inverse mapping are essentially two problems of a different kind. They can therefore be dealt with separately.
In this chapter we will concentrate on the actual inverse mapping problem of geodetic networks. As an exemplification of the theory of S-transformations we discuss the non-uniqueness in coordinate system definitions and construct sets of base vectors for Nu(A). We also discuss the related problem of connecting geodetic networks.
Section two is devoted to the inverse mapping problem and section three to the problem of connecting networks. In section two we discuss successively the planar, ellipsoidal and three dimensional case. Although we recognize that the inverse mapping problem of two dimensional planar geodetic networks has already been discussed at length in the geodetic literature (see e.g. Teunissen, 1984a, and the references listed therein), we have reiterated some of the theory since it indicates very well the principles involved. Generalization to the ellipsoidal and three dimensional case then becomes rather straightforward.
For practical ellipsoidal networks an interesting feature turns out to be the numerical ill-conditioning of the inverse map. Namely, one will find that even after the admitted degree of freedom of the ellipsoidal model is taken care of, the estimated geodetic coordinates of practical ellipsoidal networks still lack precision. As a consequence the estimation problem of the ellipsoidal model turns out to be not too different from that of the planar model.
In our discussion of three dimensional networks we make a distinction between local surveys and networks covering a large area. For local surveys (e.g. for the purpose of construction works), it is likely that one is only interested in describing the point configuration of the network. Therefore, for such networks S-transformations that only transform coordinates (and their co-variances) will do. As an example we have given an analytic expression of the three dimensional S-transformation advocated by (Baarda, 1979). For large networks however, it will not be sufficient to consider only the coordinate-transforming S-transformations. In these cases one is almost surely also interested in a description of the fundamental directions like local verticals and the average terrestrial pole. That is, besides the network's point configuration also the configuration of the fundamental directions becomes of interest then. Hence, we also need S-transformations that transform both coordinates and orientation parameters.
Having given the various representations of Nu(A) which are needed to derive the appropriate S-transformations, we turn our attention in section three to the problem of connecting geodetic networks. Without exaggeration one can consider this problem of comparing and connecting overlapping point fields to be almost omnipresent in geodesy. In cartography for instance, the problem occurs when digitized map material needs to be transformed to a well-established known coordinate system such as a national system. And in photogrammetry when photogrammetric blocks need to be connected with terrestrial coordinate systems, or in case of stripwise block adjustment when the various strips need to be connected (Molenaar, 1981b). Also in surveying practice, where densification networks need to be tied to existing (often higher order) networks, the connection problem appears repeatedly (Brouwer et al., 1982). And on a more global scale when connecting satellite networks to national networks (Adam et al., 1982). Even in case of gravity surveys one sometimes needs to connect networks, e.g. relative gravity networks to existing well-established absolute gravity systems. And finally similar problems are encountered in deformation analysis (Van Mierlo, 1978). There networks measured at two or possibly more epochs need to be compared in order to affirm projected geophysical hypotheses.
In all the above cases the same principles for connecting networks can be applied, although of course the elaboration can differ from application to application, depending e.g. on the information available and the purposes one needs to serve. That is, although different solution strategies exist, all methods rely on the self-evident principle that the only information suited for comparing networks is the information which is common to both networks.
In our presentation we will discuss three methods for connecting geodetic networks. Although all three alternatives are considered to some extent in the geodetic literature, the treatment below accentuates some aspects which are not discussed elsewhere.

2. Geodetic networks and their degrees of freedom

2.1. Planar networks

Let us commence, in order to fix our minds, with the simple example of a two dimensional planar triangulation network in which only angles are measured (see e.g. figure 15). After adjusting the network (using e.g. a first standard problem formulation) we obtain a consistent set of adjusted angles which determines the shape of the network. In order to describe this shape we have many possibilities at hand. Each set of mutually independent adjusted angles, for instance, will do. In practice, however, one usually wants the result of an adjustment to be presented by means of coordinates, since they are more manageable than individual angles. The advantage of working with coordinates is namely that, once they are introduced, they all have one and the same reference in common.

figure 15

The benefit being that with coordinates the relative position of any two points in a network is easily obtained without need to bother about the way in which these two network points are connected by the measured elements. Consequently, coordinates are very tractable for drawing maps or making profiles of the whole or parts of the network. With this motivation in mind we are thus looking for a way to present our results of adjustment by means of (cartesian) coordinates.
However, in order to compute coordinates we first need to fix some reference, i.e. in the case of a planar triangulation network we need to fix the position, orientation and scale of the network. One way to accomplish this is of course by fixing two points of the network, i.e. by assigning arbitrary and non-stochastic coordinates to two points of the network. For instance, we can start by fixing the points P1 and P2 and then compute, with the aid of the adjusted angles, the coordinates of the points P3, P4, P5 and P6. Or, we can fix the points P3 and P1, and then compute the points P4, P5, P6 and P2. Let us for the moment leave in the middle which two points we fix. Let's just call them Pr and Ps. We then can write (see figure 16)

= yr
I

I
cosA
+ I
cos (A
+ n + a
1
rS
rS
S i
rS
rsi

figure 16
Linearization of (2.1) gives (the upper indices "0" indicate the approximate values):

dx_i = dx_r + x0_rs dln l_rs - y0_rs dA_rs + x0_si dln l_si - y0_si dA_rs - y0_si dalpha_rsi
dy_i = dy_r + y0_rs dln l_rs + x0_rs dA_rs + y0_si dln l_si + x0_si dA_rs + x0_si dalpha_rsi    (2.2)

which we can write, with x0_ri = x0_rs + x0_si, y0_ri = y0_rs + y0_si, as

(dx_i)   (x0_si dln(l_si/l_rs) - y0_si dalpha_rsi)   (dx_r)   (-y0_ri)          (x0_ri)
(dy_i) = (y0_si dln(l_si/l_rs) + x0_si dalpha_rsi) + (dy_r) + ( x0_ri) dA_rs + (y0_ri) dln l_rs .    (2.3)

Since all the angular type of information is collected in the first term on the right-hand side of (2.3) we see that, in order to be able to introduce coordinates, we need to assign a priori values to the second term. One way is of course to take points Pr and Ps as reference or base points by assigning to them the non-stochastic approximate coordinates x0_r, y0_r and x0_s, y0_s, i.e. by assuming that

dx_r = dy_r = dA_rs = dln l_rs = 0 .    (2.4)
The coordinates of any other point Pi of the network are then computed as

(dx_i^(r,s), dy_i^(r,s))' = (x0_si dln(l_si/l_rs) - y0_si dalpha_rsi , y0_si dln(l_si/l_rs) + x0_si dalpha_rsi)' ,    (2.5)

where the upper indices (r,s) indicate that these coordinates are computed with respect to the base points Pr and Ps.
Although the choice of fixing the two points Pr and Ps in (2.3) is an obvious one, there are also other ways of introducing coordinates. One could for instance take two other points of the network as base points, or fix linear combinations of the coordinate increments of network points. Essential is, irrespective of the choice made, that the positional, orientational and scale degrees of freedom of the network are taken care of. This is best seen by observing that (2.3) combined with (2.5) essentially constitutes the two dimensional differential similarity transformation:

(dx_i, dy_i)' = (dx_i^(r,s), dy_i^(r,s))' + (dt_x, dt_y)' + (-y0_i, x0_i)' dphi + (x0_i, y0_i)' dlambda ,    (2.6)

which follows from linearizing the similarity transformation under the assumptions that lambda0 = 1, phi0 = 0 and t0_x = t0_y = 0.
Since there are many different ways of introducing coordinates, it is important that one recognizes that in general

(dx_i^(r,s), dy_i^(r,s))' is not equal to (dx_i^(r',s'), dy_i^(r',s'))' .

Hence, if one wants to compare two sets of coordinates, where the two sets are computed from two different and independent observational campaigns (for instance for the purpose of a deformation analysis), it is essential that these coordinates are all defined with respect to the same reference.

Now in order to get all coordinates in the same reference system one needs to be able to transform from one system to another. For the above defined (r,s)-system this transformation is easily obtained. From substituting (2.8) into (2.3) there follows namely, with (2.5), the transformation rule (2.9), which shows how to transform from an arbitrary coordinate system to the prespecified (r,s)-system.
To find the general procedure for deriving such transformations, note that the definition of the (r,s)-system and the derivation of (2.9) followed from the decomposition formula (2.3). With (2.5) and (2.8) this decomposition formula reads in matrix notation as

x = x^(r,s) + V (dx_r, dy_r, dA_rs, dln l_rs)' ,    (2.10)

where x = (dx_r, dy_r, dx_s, dy_s, ..., dx_i, dy_i, ...)' and V is the matrix whose four columns describe the differential translations, rotation and scale change of (2.6) evaluated at the approximate coordinates.

Decomposition (2.10) is however just one of the many possible decompositions of x. An alternative decomposition follows if we premultiply (2.10) by

I = (I - V((S_i)'V)^(-1)(S_i)') + V((S_i)'V)^(-1)(S_i)' ,

where R(S_i) is arbitrary but complementary to R(V). We then get

x = (I - V((S_i)'V)^(-1)(S_i)') x + V((S_i)'V)^(-1)(S_i)' x .    (2.11)

And this expression decomposes x, just like (2.10), into a first part, which contains all the angular type of information, and a second part for which additional a priori information is needed. Now, just like decomposition (2.10) suggested to choose the restrictions (2.4), (2.11) suggests that we take

(S_i)' x = 0 .    (2.12)

The coordinates of the network points are then computed as

x^(s_i) = (I - V((S_i)'V)^(-1)(S_i)') x ,    (2.13)

where the upper index (s_i) refers to the choice (2.12). And analogously to (2.10) we find from substituting (2.13) into (2.11) that the transformation to the (s_i)-system is given by

x^(s_i) = (I - V((S_i)'V)^(-1)(S_i)') x^(s) .    (2.14)

This is the general expression one can use for deriving transformations like (2.9). We thus see that in order to derive such a transformation we only need to know R(V) and to choose an S_i such that R(S_i) is complementary to R(V).

So far we discussed planar networks of the angular type. But formula (2.14) is of course valid for other types of networks too. The only difference is that we need to modify R(V) accordingly. For a network in which azimuths and distances are measured, for instance, we find that the appropriate (differential) similarity transformation is in this case the one in which scale and rotation are excluded.

To link up with the theory of the previous chapter, note that in case of, for instance, an angular type of network all linear(ized) functions of the angular observables are invariant to the differential similarity transformation (2.6). Thus if the adjustment of the planar triangulation network of e.g. figure 15 is formulated as a linear(ized) model in the coordinate increments, then R(V) = Nu(A). Hence we recognize transformation (2.14) as an example of an S-transformation. Following (Baarda, 1973) we will therefore call the coordinate systems corresponding with choices like (2.12), S-systems.

At this point of our discussion it is perhaps fitting to make the following remark concerning the choice of S = R(S_i) complementary to R(V). Some authors, when dealing with free network adjustments, prefer to take the coordinate system definition corresponding to the choice

S_i = V .    (2.20)

This is of course a legitimate choice, since it is just one of the many possible. However, we cannot endorse their claim that one always should choose (2.20) because it gives the "best" coordinate system definition possible. They motivate their claim by pointing out that the covariance map of the pre-image of the BLUE's estimate yhat of ybar = Ax corresponding with the choice (2.20) has minimum trace, i.e. that

trace{(I - V(V'V)^(-1)V') Q (I - V(V'V)^(-1)V')'} <= trace{Q^(S)}

for all pre-images x^(S) of yhat under A. This in itself is true of course. In case of free networks however, it is unfortunately without any meaning. All the essential information available is namely contained in yhat, whereby x^(S) is nothing but a convenient way of representing this information. A theoretical basis for preferring (2.20) does therefore not exist in free network adjustments. At the most one can decide to choose (2.20) on the basis of computational convenience, which might in some cases be due to the symmetry of I - V(V'V)^(-1)V'.

One could also rephrase the above as follows: since every (oblique) projector can be interpreted as an orthogonal projector with respect to an appropriately chosen metric, the difference between the S-system corresponding with choice (2.20) and another arbitrary S-system can be interpreted as the difference in choosing a parameter-space norm, with (2.20) corresponding to the canonical parameter-space norm. And since there is no reason to prefer one particular norm above another, we do not have, as in physical geodesy, a norm choice problem in free network adjustments.

2.2 Ellipsoidal networks

So far we discussed the inverse linear mapping problem of planar geodetic networks. But let us now assume that we have to compute a geodetic network, the points of which are forced to lie on a given ellipsoid of revolution, defined by

x^2/a^2 + y^2/a^2 + z^2/b^2 = 1 ,    (2.21)

where a and b are respectively the ellipsoid's major and minor axes.
In view of the foregoing discussion the three main questions we are interested in are then: (i) how does the theory of S-transformations apply to the ellipsoidal model, (ii) how does it compare to the results we already obtained for the planar case, and (iii) what are the consequences for practical network computations.
On an intuitive basis it is not too difficult to answer these three questions provisionally. From the rotational symmetry of the ellipsoid of revolution follows namely that the ellipsoidal model will at most admit one degree of freedom. And since this degree of freedom is of the longitudinal type it follows that the ellipsoidal counterpart of transformation (2.6) will read as

dlambda_i = dlambda_i^(s) + de_z ,    (2.22)

where dlambda_i is the geodetic longitude increment of point Pi and de_z the differential rotation angle about the ellipsoid's axis of symmetry. Hence, transformation (2.22) can be used to derive the appropriate S-transformations for the ellipsoidal model.
As to the second question: if one wants to understand in what way and to what extent the ellipsoidal model differs from the planar model, we need a way of comparing both models. One can achieve this by considering the planar model as a special degenerate case of the ellipsoidal model. Assume therefore that we are given a geodetic triangle (i.e. a triangle bounded by geodesics) on the ellipsoid of revolution (2.21). By letting e^2 = (a^2 - b^2)/a^2, the first numerical eccentricity squared, approach zero, we get for the limit e^2 -> 0 that the ellipsoid of revolution becomes a sphere with radius R := a = b. Consequently, the given ellipsoidal triangle will become a spherical triangle for which then spherical geometry applies. Now, if we further proceed by letting the spherical curvature approach zero, then for the limit R -> infinity the sphere becomes identifiable with its own tangent planes. Hence, for increasing values of R the spherical triangle will ultimately reduce to an ordinary planar triangle. Summarizing, one could therefore say that the difference between ellipsoidal geometry and planar Euclidean geometry is primarily made up by the two factors e^2 and R. And one can thus expect that if both the ellipsoidal eccentricity factor e^2 and the spherical curvature 1/R are small enough, no significant differences will be recognizable between ellipsoidal geometry and planar Euclidean geometry.
But what about the admitted degrees of freedom? We note namely a drastic change in the maximal number of admitted degrees of freedom when the two limits e^2 -> 0 and R -> infinity are taken: the ellipsoidal model only admits the longitudinal degree of freedom, whereas the planar model admits a maximum of four degrees of freedom. Still, despite this difference in admitted degrees of freedom it seems reasonable to expect that the actual estimation problem of the ellipsoidal model will not be too different from that of the planar model if e^2 and 1/R both are small enough. Consequently, it can be questioned whether in this case transformation (2.22) suffices to characterize the degrees of freedom admitted by the ellipsoidal model. Theoretically it does of course. But for practical applications it becomes questionable whether the rotational degree of freedom as described by (2.22) is the only degree of freedom the ellipsoidal model admits if both e^2 and 1/R are small.
This then brings us to the third question concerning the consequences for practical network computations. Namely, the smaller e^2 and 1/R get, the worse the conditioning of the ellipsoidal network's design matrix A can be expected to be. That is, although theoretically the maximum defect of A equals one, it can be expected that for small enough values of e^2 and 1/R more than one of the columns of the design matrix A will show near linear dependencies. As a consequence one can therefore expect that the ill-conditioning of A will affect the estimation of the explanatory variables x in the linear model ybar = Ax. Intuitively one can understand this by realizing that the almost collinear variables do not provide information that is very different from that already inherent in others. It becomes difficult therefore to infer the separate influence of such explanatory variables on the response ybar. Consequently, the potential harm due to the ill-conditioning of the design matrix arises from the fact that a near collinear relation can readily result in a situation in which some of the observed systematic influences of the explanatory variables on the response is swamped by the model's random error term. And it will be clear that under these circumstances, estimation can be hindered.
To find out whether for practical ellipsoidal networks the estimation of geodetic coordinates is indeed hindered by the expected ill-conditioning of A, one can follow different but related routes. One way is to investigate numerically to what extent the shape of an ellipsoidal network as measured by its angles can be considered to be invariant to a change of its position, orientation and scale. Another way is to compute the non-zero singular values of A or the non-zero eigenvalues of the normal matrix A'A. Eigenvalues small relative to the largest eigenvalue of the normal matrix will then reflect the poor conditioning of A. And finally one could try to show analytically that the estimated geodetic coordinates lack precision if only the longitudinal degree of freedom is taken care of.
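The second route can be mimicked with a toy computation (our own illustration, not from the text): construct a design matrix whose last column is almost a linear combination of the others, and inspect its singular values. The near dependency shows up as one singular value that is orders of magnitude below the rest, exactly the symptom described above.

```python
import numpy as np

rng = np.random.default_rng(2)
A1 = rng.standard_normal((50, 5))          # well-conditioned part of the design matrix
# last column: almost a linear combination of the others, plus tiny noise
a2 = A1 @ rng.standard_normal(5) + 1e-6 * rng.standard_normal(50)
A = np.column_stack([A1, a2])

s = np.linalg.svd(A, compute_uv=False)
print("smallest/largest singular value:", s.min() / s.max())

assert s.min() / s.max() < 1e-5            # the near defect is clearly visible
assert s[:-1].min() / s.max() > 1e-3       # all other singular values are healthy
```

For an ellipsoidal triangulation network with small e^2 and 1/R one finds, analogously, four such small values of the normal matrix.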
The first approach, which is based on the idea that for planar geodetic networks of the angular type the invariance to position, orientation and scale changes is complete, has been followed by (Nibbelke, 1984). And he found that for practical ellipsoidal triangulation networks one can indeed consider the network's position, orientation and scale as non-estimable. That is, one is, just as in the planar case, forced to fix four linearly independent functions of the geodetic coordinate increments. The theoretical deformations of the network's shape, which possibly follow from these restrictions, are then negligible. The same conclusion was also reached by (Kube and Schnädelbach, 1975), who used the second approach. The reported eigenvalue computations, which were performed for the European network, show that in case of, for instance, an ellipsoidal triangulation network, four eigenvalues of the normal matrix will be so small that a sensible estimation of the network's position, orientation and scale is not attainable. This conclusion is also in agreement with the result found by (Krarup, 1982a), who indicated that the position of a trilateration network on an ellipsoid of revolution is practically non-estimable.
As an example, and also to support the above mentioned findings, we will now show analytically that the estimation of geodetic coordinates indeed lacks precision if only the longitudinal degree of freedom is taken care of. For this purpose assume that we have a full rank linear model

ybar = A x ,  with ybar of order m x 1 and A of order m x n,    (2.23)

in which x2 of x = (x1', x2)' has been identified as the parameter which is degraded by the ill-conditioning of A. With the partitioning A = (A1, A2), A1 of order m x (n-1), it follows then that the column vector A2 depends almost linearly on the columns of A1.

Using the reparametrization we can write (2.23) as

ybar = A1 (x1 + (A1'A1)^(-1)A1'A2 x2) + (I - A1(A1'A1)^(-1)A1')A2 x2

or as

ybar = A1 xbar1 + Abar2 x2 ,    (2.24)

with

Abar2 = (I - A1(A1'A1)^(-1)A1') A2 .    (2.25)

From the fact that A2 depends almost linearly on the columns of A1 now follows that one can reasonably expect Abar2 to be a rather short column vector. Geometrically this is seen as follows. Since I - A1(A1'A1)^(-1)A1' is an orthogonal projector, we have that (see figure 17)

Abar2' Abar2 = A2'(I - A1(A1'A1)^(-1)A1')A2 = A2'A2 sin^2(theta) ,    (2.26)

where theta denotes the angle between A2 and its orthogonal projection on the subspace spanned by the columns of A1.

figure 17

From the near linear dependency of A1 and A2 thus follows that the angle theta will be small. Hence, the length of Abar2 can be expected to be small if the length of A2 is not too large.

Now if we assume the covariance map of y to be Q_y = sigma^2 I, it follows from (2.24) and the orthogonality of A1 and Abar2 that the variance sigma^2_{x2} of x2 is given by

sigma^2_{x2} = sigma^2 / (Abar2' Abar2) = sigma^2 / (A2'A2 sin^2(theta)) = sigma^2 / (A2'(I - A1(A1'A1)^(-1)A1')A2) .    (2.27)

Hence, the estimation of x2 lacks precision if the length of Abar2 is too small. Thus in order to find out to what extent the diagnosed ill-conditioning of A affects the estimation of x2 we need to have a reasonable estimate of (2.27).
Since we know that the possible lack of precision of the estimated parameter x2 is a consequence of the near linear dependency between A1 and A2, it follows that there must exist a vector, z say, for which

A z = v    (2.28)

is small enough. From writing (2.28) as A1 z1 + A2 z2 = v, with z = (z1', z2)' scaled so that z2 = 1, we get

Abar2 = (I - A1(A1'A1)^(-1)A1') v .    (2.29)

Hence, expression (2.27) can also be written as

sigma^2_{x2} = sigma^2 / (v'(I - A1(A1'A1)^(-1)A1')v) .

With v'(I - A1(A1'A1)^(-1)A1')v <= v'v, we then get the lower bound

sigma^2_{x2} >= sigma^2 / (v'v) .    (2.30)

Thus if we are able to find a vector z such that the length of Az = v is small enough, we can use the lower bound of (2.30) to prove that the estimation of x2 indeed lacks precision.
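Relations (2.27) through (2.30) are easily checked numerically. In the sketch below (our own illustration, with made-up numbers) the variance of the last parameter obtained from the inverse normal matrix coincides with sigma^2/(Abar2'Abar2), and sigma^2/(v'v) with v = Az, z = (z1', 1)', is indeed a lower bound for it.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2 = 1.0

A1 = rng.standard_normal((30, 4))
z1 = rng.standard_normal(4)
A2 = -A1 @ z1 + 1e-3 * rng.standard_normal(30)   # nearly dependent last column
A = np.column_stack([A1, A2])

# Variance of the last parameter from the inverse normal matrix
var2 = sigma2 * np.linalg.inv(A.T @ A)[-1, -1]

# The same variance via (2.27): sigma^2 / (Abar2' Abar2)
P1 = A1 @ np.linalg.inv(A1.T @ A1) @ A1.T
A2bar = (np.eye(30) - P1) @ A2
assert np.isclose(var2, sigma2 / (A2bar @ A2bar))

# Lower bound (2.30) with z = (z1', 1)': v = A z is short, so the bound is large
v = A @ np.append(z1, 1.0)
assert var2 >= sigma2 / (v @ v) - 1e-9
assert sigma2 / (v @ v) > 1e4     # the bound alone already proves poor precision
```

The bound is useful precisely because it needs only a single short vector v = Az, not the full inversion of the (ill-conditioned) normal matrix.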
Now, to apply the above to our case of ellipsoidal networks, recall that we made it plausible that the difference between ellipsoidal, spherical and planar Euclidean geometry can be considered to be insignificant if both the factors e^2 and 1/R are small enough. One can therefore expect that for small enough values of e^2 and 1/R, the eigenvectors of spherical and planar networks' design matrices belonging to zero eigenvalues are the proper candidates for the z-vector of (2.28). For this purpose we thus first need to find the spherical analogon of (2.3) (or (2.15)).
We will start from the three dimensional differential similarity transformation

(dx_i, dy_i, dz_i)' = (dt_x, dt_y, dt_z)' - de x (x0_i, y0_i, z0_i)' + dkappa (x0_i, y0_i, z0_i)' .    (2.31)

With

| dx_i |   | cos(phi_i)cos(lambda_i)  -sin(phi_i)cos(lambda_i)  -sin(lambda_i) | | dh_i                          |
| dy_i | = | cos(phi_i)sin(lambda_i)  -sin(phi_i)sin(lambda_i)   cos(lambda_i) | | (M_i + h_i) dphi_i            |
| dz_i |   | sin(phi_i)                cos(phi_i)                0             | | (N_i + h_i)cos(phi_i) dlambda_i |

where phi_i, lambda_i and h_i are respectively the geodetic latitude, longitude and geometric height above the ellipsoid of point Pi, and N_i, M_i are the east-west and north-south radii of curvature, one can rewrite (2.31) in geodetic coordinates, by premultiplying with the transpose of the above rotation matrix, as

((M_i + h_i) dphi_i , (N_i + h_i)cos(phi_i) dlambda_i , dh_i)' = R_i' (dt - de x x0_i + dkappa x0_i) .    (2.32)

Its height component contains, amongst others, the terms cos(phi_i)cos(lambda_i) dt_x, cos(phi_i)sin(lambda_i) dt_y, sin(phi_i) dt_z, -e^2 N_i cos(phi_i)sin(phi_i)sin(lambda_i) de_x, e^2 N_i cos(phi_i)sin(phi_i)cos(lambda_i) de_y and (N_i(1 - e^2 sin^2(phi_i)) + h_i) dkappa, and no de_z term.
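The east-west and north-south radii of curvature N_i and M_i entering (2.32) have the standard closed forms N = a/(1 - e^2 sin^2 phi)^(1/2) and M = a(1 - e^2)/(1 - e^2 sin^2 phi)^(3/2); a small check of our own (the ellipsoid constants are illustrative, WGS84-like values):

```python
import math

def radii(phi, a=6378137.0, e2=6.69438e-3):
    """East-west (N) and north-south (M) radii of curvature of an ellipsoid."""
    w = math.sqrt(1.0 - e2 * math.sin(phi) ** 2)
    N = a / w                        # prime vertical radius of curvature
    M = a * (1.0 - e2) / w ** 3      # meridian radius of curvature
    return N, M

a, e2 = 6378137.0, 6.69438e-3
N0, M0 = radii(0.0)
assert math.isclose(N0, a)               # at the equator N = a
assert math.isclose(M0, a * (1 - e2))    # and M = a(1 - e^2) < N
N90, M90 = radii(math.pi / 2)
assert math.isclose(N90, M90)            # at the poles N = M = a / sqrt(1 - e^2)
```

The inequality M < N everywhere except at the poles reflects the flattening of the meridian ellipse.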
Since the network points are forced to lie on the ellipsoid of revolution, we must have that

dh_i = 0 and h_i = 0 ,  for all i = 1, ... .    (2.33)

Hence, it follows from (2.32) that

0 = cos(phi0_i)cos(lambda0_i) dt_x + cos(phi0_i)sin(lambda0_i) dt_y + sin(phi0_i) dt_z
    - e^2 N0_i cos(phi0_i)sin(phi0_i)sin(lambda0_i) de_x
    + e^2 N0_i cos(phi0_i)sin(phi0_i)cos(lambda0_i) de_y
    + N0_i (1 - e^2 sin^2(phi0_i)) dkappa ,  for all i = 1, ... .    (2.34)

But this means that for a regular network (i.e. a network which excludes cases like lambda_i = constant, for all i = 1, ...) situated on an ellipsoid of revolution we have that

dt_x = dt_y = dt_z = de_x = de_y = dkappa = 0 ,    (2.35)

which confirms our earlier statement that the ellipsoidal model only admits the longitudinal degree of freedom.
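The conclusion that only the longitudinal (z-rotation) degree of freedom survives can be made tangible with a first-order check (our own sketch, with made-up semi-axes): a differential rotation about the z-axis moves every point tangentially to the ellipsoid (2.21), whereas a rotation about the x-axis or a scale change produces a nonzero first-order change of F = x^2/a^2 + y^2/a^2 + z^2/b^2.

```python
import numpy as np

a, b = 2.0, 1.5                        # arbitrary semi-axes (illustrative numbers)
p = np.array([a * 0.6, a * 0.48, 0.0])
p[2] = b * np.sqrt(1 - (p[0]**2 + p[1]**2) / a**2)   # put the point on the ellipsoid

# Gradient of F(x,y,z) = x^2/a^2 + y^2/a^2 + z^2/b^2 at p
grad = np.array([2 * p[0] / a**2, 2 * p[1] / a**2, 2 * p[2] / b**2])

rot_z = np.array([-p[1], p[0], 0.0])   # displacement field of a z-rotation
rot_x = np.array([0.0, -p[2], p[1]])   # displacement field of an x-rotation
scale = p                              # displacement field of a scale change

assert np.isclose(grad @ rot_z, 0.0)   # dF = 0: the point stays on the ellipsoid
assert abs(grad @ rot_x) > 1e-6        # an x-rotation violates the surface condition
assert abs(grad @ scale) > 1e-6        # so does a change of scale
```

For a sphere (a = b) the x- and y-rotation checks would also give zero, which is exactly the transition to the three rotational degrees of freedom of the spherical model discussed next.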
In an analogous way we can find the type of degrees of freedom admitted by the spherical model. In spherical coordinates R_i, phi_i and lambda_i, transformation (2.32) will namely read as

(dR_i , R_i dphi_i , R_i cos(phi_i) dlambda_i)' = R_i' (dt - de x x0_i + dkappa x0_i) ,    (2.36)

with for the radial component

dR_i = cos(phi_i)cos(lambda_i) dt_x + cos(phi_i)sin(lambda_i) dt_y + sin(phi_i) dt_z + R0_i dkappa .

And by setting

R0_i = R and dR_i = 0 ,  for all i = 1, ... ,    (2.37)

we get that

dt_x = dt_y = dt_z = dkappa = 0 ,    (2.38)

from which follows with (2.36) that the spherical counterpart of (2.6) is given by the expression (2.39), containing only the three differential rotations de_x, de_y and de_z.
To find the expression which corresponds to (2.3) (or (2.15)), we first need to know the relation between (de_x, de_y, de_z)' and (dphi_r, dlambda_r, dA_rs)'; this is given by (2.40). Substitution of (2.40) into (2.39) then gives (2.41). The spherical analogon of (2.3) (or (2.15)) then finally follows from substituting

sin(l_ir/R) sin(A_ir) = sin(pi/2 - phi_i) sin(lambda_i - lambda_r) = cos(phi_i) sin(lambda_i - lambda_r)    (2.42.a)

and

sin(l_ir/R) cos(A_ir) = sin(pi/2 - phi_i) cos(pi/2 - phi_r) - cos(pi/2 - phi_i) sin(pi/2 - phi_r) cos(lambda_i - lambda_r)    (2.42.b)

into (2.41), which gives (2.43). Here (2.42.a) and (2.42.b) follow from applying the sine rule sin a / sin A = sin b / sin B and the so-called five-elements rule sin c cos a - cos c sin a cos B = sin b cos A (see figure 18) of spherical geometry.

Expression (2.43) shows, not surprisingly, that the spherical model admits a maximal number of three degrees of freedom, all of which are of the rotational type. Hence, we find that theoretically speaking the scale of a spherical network is estimable, even if only angles are measured. Those who are familiar with global aspects of differential geometry know this of course already from the Gauss-Bonnet formula. When applied to the sphere, this formula says that for a triangular region bounded by three geodesics the sum of the spherical triangle's interior angles minus pi equals the ratio of the area enclosed by the triangle and the square of the radius of the sphere (see e.g. Stoker, 1969).

figure 18

We are here thus confronted with a situation where angles alone suffice to determine scale. But still, although scale is theoretically estimable, one can expect, as was made clear in the foregoing introductory discussion, that if the spherical curvature is small enough scale will only be very poorly estimable. And indeed it turns out that for practical spherical networks, scale can be considered as non-estimable. See for instance (Molenaar, 1980a, p.20) or the earlier cited references.
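The Gauss-Bonnet statement just given can be verified on an octant triangle (our own illustration): three geodesics meeting at right angles enclose one eighth of the sphere, so the angular excess pi/2 equals area/R^2. For a triangle of practical survey size the excess is minute, which is why the scale information carried by the angles is so weak.

```python
import math

def excess_from_area(area, R):
    """Angular excess of a geodesic triangle on a sphere of radius R (Gauss-Bonnet)."""
    return area / R**2

R = 6400e3                                  # a sphere of roughly Earth size
octant = 4 * math.pi * R**2 / 8             # triangle with three right angles
assert math.isclose(excess_from_area(octant, R), math.pi / 2)

# A triangle with ~64 km legs: excess = 5e-5 rad, about 10 arc seconds
small = 0.5 * 64e3 * 64e3
assert math.isclose(excess_from_area(small, R), 5e-5)
```

An excess of a few arc seconds must be resolved against the measurement noise of the angles, which is exactly the swamping effect described in the ill-conditioning discussion above.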
In the same manner it is concluded in these publications that the scale, orientation and position of practical ellipsoidal networks can be considered to be non-estimable. To support these findings we will now show analytically that the geodetic coordinates lack precision if only the longitudinal degree of freedom of the ellipsoidal model is taken care of. For this purpose consider expression (2.43). The three columns of the matrix on the right-hand side of (2.43) span the nullspace of the design matrix of a spherical triangulation network, whereas the first column vector provides a basis of the nullspace of an ellipsoidal network's design matrix. Thus, if the eccentricity factor e^2 is small enough one can expect that both the second and third column vector of (2.43) get almost annihilated by the ellipsoidal network's design matrix. Hence, we can use one of these vectors, say the second column vector of (2.43), denoted by z in (2.44), to obtain an estimate of the lower bound (2.30) via (2.28).


Let us consider as an example an ellipsoidal trilateration network. According to (Helmert, 1880, p. 282) the ellipsoidal distance observation equation reads as:

(2.45)

where A⁰_ij denotes the ellipsoidal geodesic azimuth from P_i to P_j. We will abbreviate (2.45) as (2.46), where

a_k = ( ... , -sin A⁰_ij , -cos A⁰_ij , ... , -sin A⁰_ji , -cos A⁰_ji , ... )

is the kth row vector of the ellipsoidal network's design matrix. Using (2.44) we get


V_k = a_kᵗ z = - sin A⁰_ij sinΦ_i sin(λ_i - λ_r) - cos A⁰_ij cos(λ_i - λ_r)
              - sin A⁰_ji sinΦ_j sin(λ_j - λ_r) - cos A⁰_ji cos(λ_j - λ_r)          (2.47)

It will be clear that if the network is situated on a sphere, then V_k = 0. Let us therefore identify geodesic coordinates with spherical coordinates. With the spherical azimuths A_ij, which are obtained from identifying geodesic coordinates with spherical coordinates, and linear approximations like

sin A⁰_ij = sin A_ij + cos A_ij ΔA_ij ,

where A_ij denotes the spherical azimuth between the points P_i and P_j, we can rewrite (2.47) as


V_k = ( sin A_ij cos(λ_i - λ_r) - cos A_ij sinΦ_i sin(λ_i - λ_r) ) ΔA_ij
    + ( sin A_ji cos(λ_j - λ_r) - cos A_ji sinΦ_j sin(λ_j - λ_r) ) ΔA_ji          (2.48)

Repeated application of the sine rule and the five-elements rule of spherical geometry, together with estimates of |ΔA_ij| and |ΔA_ji| (see Helmert, 1880, p. 289), then finally gives the estimate sought. From this estimate and (2.30) it thus indeed follows that in case of practical ellipsoidal networks (l_ij = 64 km, R = 6400 km, σ_l = 10⁻⁶ l_ij, e = 1/300) geodetic coordinates will lack precision if only the longitudinal degree of freedom is taken care of.

2.3. Three dimensional networks

Now that we have considered the inverse mapping problem in two dimensions it is not too difficult to generalize to three dimensions.
We will first assume that only angles and distance ratios are measured in the three dimensional geodetic network. The generalization of (2.1) to three dimensions becomes then rather straightforward. To see this, observe that we can write (2.1) as


(2.50): the equations of (2.1), for the observables (l_si/l_sr) cos α_sr and (l_si/l_sr) sin α_sr, rewritten in terms of the vectors (l_rs sin A_rs, l_rs cos A_rs, 0)ᵗ and the skew-symmetric matrix

(  0  1  0 )
( -1  0  0 )
(  0  0  0 )

where the action of this matrix equals the action of "(0, 0, 1)ᵗ ×", with "×" denoting the vector- or cross product (2.51).
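The equivalence between the skew-symmetric matrix and the cross product with the third unit vector can be checked numerically; the sketch below (an illustration, not part of the original derivation) verifies M v = v × e₃ for an arbitrary vector:

```python
import numpy as np

# The skew-symmetric matrix appearing in (2.50)/(2.51):
M = np.array([[0.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
e3 = np.array([0.0, 0.0, 1.0])  # the unit vector replaced by n in 3D

v = np.array([2.0, -3.0, 5.0])
# M v = (v_y, -v_x, 0), which is exactly the cross product v x e3
assert np.allclose(M @ v, np.cross(v, e3))
```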

With (2.51), expression (2.50) therefore suggests the following generalization to three dimensions: (2.52), with the vectors

( l_rs sin Z_rs sin A_rs )
( l_rs sin Z_rs cos A_rs )
( l_rs cos Z_rs )

the distance ratios l_si/l_sr, the angles α_sr, and the unit normal n = (n₁ n₂ n₃)ᵗ taking over the role of (0, 0, 1)ᵗ in the cross products. Here Z_rs denotes the vertical angle of the line P_r P_s (see figure 19.a) and n is the unit normal of the plane through the points P_r, P_s and P_i (see figure 19.b), defined as in (2.53).

figure 19

We thus see that one way of introducing coordinates for three dimensional networks of the angular type is by starting to fix the two points P_r and P_s. This would then take care of six degrees of freedom: three translational degrees of freedom, two rotational degrees of freedom and one freedom of scale. The remaining rotational degree of freedom, namely rotation of the network around the line P_r P_s, is then taken care of by fixing the direction of the unit normal n in the plane perpendicular to the line P_r P_s. The so defined coordinate system thus corresponds to fixing two points P_r and P_s, and the plane through these two points and a third point, P_t say. Following (Baarda, 1979) we will denote this S-system as the (r,s;t)-system. The S^(r,s;t)-matrix by which the (r,s;t)-system is defined then follows from the restrictions (2.54), where n⁰ = (n⁰₁ n⁰₂ n⁰₃)ᵗ can be computed from (2.53) for i = t using approximate values. With the three dimensional differential similarity transformation (2.31), straightforward application of (2.14) then gives

(2.56), in which the S-transformed coordinate increments follow from the original increments Δx_i, Δy_i, Δz_i and the increments Δx_s, Δy_s, Δz_s through a matrix built from the approximate coordinate differences x⁰_rs, x⁰_rt, x⁰_ri, y⁰_tr, z⁰_tr, ... and the normal components n⁰₁, n⁰₂, n⁰₃.

Expression (2.56) can be considered as the natural generalization of (2.9). Namely, if we restrict our attention in (2.56) to the Δx- and Δy-parts of the points P_i, P_r and P_s, and also take

z⁰_i = 0  ∀i ,   n⁰₁ = n⁰₂ = 0 ,   n⁰₃ = 1 ,

we obtain (2.9) again.

In deriving the three dimensional S-transformation (2.56) we assumed that only angles and distance ratios were observed. But this assumption is generally only valid in local three dimensional surveys (e.g. construction works). In large three dimensional networks, one will usually have besides the angles and distance ratios also direction measurements like astronomical azimuth, latitude and longitude at one's disposal. It is likely then that one is not only interested in the (cartesian) coordinates describing the network's configuration but also in the orientation (and possibly scale) parameters describing fundamental directions like local verticals and the earth's average rotation axis. It seems therefore that for large three dimensional networks transformations like (2.56), which only transform coordinates (and their co-variances), do not really suffice. And this becomes even more apparent if one thinks of connecting such networks. For large networks we therefore need S-transformations that also transform orientation (and scale) parameters.
Now before deriving such S-transformations let us first draw a parallel with the two dimensional planar case. Since in practice the observation equations are usually written down in terms of directions r_ij and pseudo-distances l_ij instead of in terms of angles and distance ratios, the parameter vector of the linear model y = Ax will contain besides the coordinate increments also orientation- and scale unknowns. Hence, the linear model of two dimensional planar networks will in practice be of the same form as that of large three dimensional networks: (2.57), with x₁: coordinate unknowns; x₂: orientation- and/or scale unknowns.
Thus also in case of two dimensional networks one can in principle decide to involve the orientation- and scale unknowns in the many S-systems possible. Of course in practice one will not do so, since in two dimensional planar networks these unknowns are generally of no particular interest. But still, let us, for the sake of comparison between the two- and three dimensional case, pursue the idea of involving these unknowns in the many S-systems possible.
Consider for this purpose a two dimensional planar network with direction- and pseudo-distance measurements r_ij and l_ij. In figure 20 a part of such a network is drawn. The theodolite frames in points P_r and P_i are shown by dashed lines, and the directions P_r P'_r and P_i P'_i are the directions of zero reading.

figure 20
Analogous to (2.1) we can then write (2.58). And linearization gives (2.59). Hence, if the unknowns in the linear model (2.57) are ordered like

xᵗ = ( x₁ᵗ  x₂ᵗ ) = ( ... Δx_i , Δy_i ... ; ... Δθ_i , Δln κ_i ... ) ,

its nullspace would read as (2.60). A legitimate choice for defining an S-system would therefore be (2.61).

That is, instead of fixing coordinates like we did in (2.4) we may just as well fix one network point, one direction of zero-reading and one scale parameter. The corresponding S-transformation then follows from (2.59) as (2.62), where the upper index (r) indicates that these parameters are defined through the restrictions (2.61).


Note by the way that once one includes orientation- and scale parameters, one actually extends the notion of network configuration to cover both the point-configuration and the attitudes of the theodolite frames. And in fact the direction- and pseudo-distance observables r_ij and l_ij are then interpretable as angles and distance ratios. They become the invariants of transformation (2.62).
Now let us return to the three dimensional case and generalize the foregoing to three dimensions. We will start by assuming that only horizontal- and vertical direction measurements r_ij and Z_ij, and pseudo-distance measurements l_ij are available. We consider the following two types of right-handed orthonormal triads (see figure 21).
1⁰  The reference frame E_I, I = 1, 2, 3;
    it is to this reference frame that the coordinates x_i, y_i, z_i refer, i.e. the position vector of point P_i, denoted by x(P_i) = x^I(P_i) E_I, has with respect to the frame E_I the components x^{I=1}(P_i) = x_i, x^{I=2}(P_i) = y_i, x^{I=3}(P_i) = z_i.
2⁰  The theodolite frame T_I(P_i), I = 1, 2, 3, in point P_i;
    T_{I=3} points upwards in the direction of the theodolite's first axis,
    T_{I=2} points in the direction of zero reading, and
    T_{I=1} completes the right-handed system.

figure 21

The relation between the two frames E_I and T_I(P_i) is given by (2.63), where the rotation matrix R(θ_{1,i}, θ_{2,i}, θ_{3,i}) is the product of elementary rotations through the angles θ_{3,i}, θ_{2,i} and θ_{1,i}, with entries built from cos θ_{1,i}, sin θ_{1,i}, cos θ_{2,i}, sin θ_{2,i}, cos θ_{3,i} and sin θ_{3,i}.

Furthermore, we have for the difference vector x(P_r P_i) = x(P_i) - x(P_r) between the two points P_i and P_r that

x(P_r P_i) = ( κ_r l_ri sin Z_ri sin r_ri , κ_r l_ri sin Z_ri cos r_ri , κ_r l_ri cos Z_ri ) T_I(P_r) ,          (2.64)

where κ_r is a scale factor.
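Equation (2.64) is just the polar-to-cartesian conversion in the theodolite frame at P_r; a minimal sketch (with the scale factor κ_r set to 1, values made up) and a check that the vector length reproduces the pseudo-distance:

```python
import math

def local_vector(l, Z, r):
    """Cartesian components of the difference vector x(P_r P_i) in the
    theodolite frame at P_r, from pseudo-distance l, vertical angle Z
    and horizontal direction r, as in (2.64) with kappa_r = 1."""
    return (l * math.sin(Z) * math.sin(r),
            l * math.sin(Z) * math.cos(r),
            l * math.cos(Z))

x, y, z = local_vector(100.0, math.radians(60.0), math.radians(30.0))
# the length of the vector must reproduce the pseudo-distance l
assert abs(math.sqrt(x * x + y * y + z * z) - 100.0) < 1e-9
```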
From (2.63), (2.64) and x(P_r P_i) = x^I(P_i) E_I - x^I(P_r) E_I it then follows that

( x_i - x_r )                         ( κ_r l_ri sin Z_ri sin(θ_{3,r} + r_ri) )
( y_i - y_r )  =  R(θ_{1,r}, θ_{2,r}) ( κ_r l_ri sin Z_ri cos(θ_{3,r} + r_ri) )          (2.65)
( z_i - z_r )                         ( κ_r l_ri cos Z_ri )

with R(θ_{1,r}, θ_{2,r}) the rotation matrix over the angles θ_{1,r} and θ_{2,r}, which shows that one can start computing coordinates once the seven parameters x_r, y_r, z_r, θ_{1,r}, θ_{2,r}, θ_{3,r} and κ_r are fixed. Hence, a legitimate choice for defining an S-system would be

Δx_r = Δy_r = Δz_r = Δθ_{1,r} = Δθ_{2,r} = Δθ_{3,r} = Δln κ_r = 0 .          (2.66)

Since (2.65) generalizes the first two equations of (2.50), linearization of (2.65) would give us the three dimensional analogue of the first two equations in (2.59). But this is of course only half of the story. We also need to know how the last two equations of (2.59) read in three dimensions. For scale this is trivial:

Δln κ_i = Δln l_ri - Δln l_ir + Δln κ_r .          (2.67)

To find the corresponding transformation for the orientational parameters, though, we need to know how the orientational parameters θ_{1,i}, θ_{2,i}, θ_{3,i} in point P_i are affected by differential changes in the seven parameters x_r, y_r, z_r, θ_{1,r}, θ_{2,r}, θ_{3,r} and κ_r. Since we can rule out differential changes in the scale- and translational parameters, this leaves us with the problem of finding a differential relation which expresses the Δθ_{1,i}, Δθ_{2,i}, Δθ_{3,i} in terms of the observables and the parameters Δθ_{1,r}, Δθ_{2,r}, Δθ_{3,r}. Let us assume that the non-linear relation reads (2.68), where matrix K only contains functions of the observables. With (2.63), it then follows from (2.68) that
Linearization gives,

(2.69)
Since the first t e r m on t h e right-hand side of (2.69) only contains observables, it is t h e second t e r m
we a r e really interested in. In components (2.69) reads then
t

= ( observables )

(A01,i,A02,i,A93,i)

- ' 2O, r 1

e1

an

'OS

'

1, r

cos g

1
0
c0s(e2

(2.70)
We a r e now in t h e position to collect our results. From (2.701, (2.67) and t h e linearized expression of
(2.65) follows t h e t h r e e dimensional analogue of (2.59) a s

(2.71), where the seven columns of the matrix on the right-hand side contain, besides the entries 0 and ±1 and the first column of observables, terms built from the approximate coordinate differences x⁰_ri, y⁰_ri, z⁰_ri and the approximate angles θ⁰_{1,r}, θ⁰_{2,r}, θ⁰_{1,i}, θ⁰_{2,i}, such as cos θ⁰_{2,r}, sin θ⁰_{2,r}, tan θ⁰_{2,i}, cos(θ⁰_{2,i} - θ⁰_{2,r}) and sin(θ⁰_{2,i} - θ⁰_{2,r}).

Thus if the unknowns in the linear model of the three-dimensional network are ordered like

xᵗ = ( ... Δx_i , Δy_i , Δz_i ... ; ... Δθ_{1,i} , Δθ_{2,i} , Δθ_{3,i} , Δln κ_i ... ) ,

the linear model's nullspace would be spanned by the seven columns of the matrix on the right-hand side of (2.71). From this, the S-transformation corresponding to the choice (2.66) easily follows.
Note that so far we made no reference to the gravity field, i.e. the theodolite frames are allowed to assume any arbitrary attitude in space. Of course it is likely then, like it was in the two-dimensional case, that one has no special interest in computing the orientation- and scale unknowns. In such cases one would probably reduce these unknowns from the model, which would leave one with only coordinates. And then transformations like (2.56) will do.
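The mechanics of such an S-transformation can be sketched generically: given a basis V of the model's nullspace and restrictions that fix as many parameters as V has columns, the transformed increments follow from a projector along R(V). The toy example below (two points, translational nullspace only — a deliberate simplification, not the seven-column matrix of (2.71)) illustrates the idea:

```python
import numpy as np

def s_transform(x, V, fixed_idx):
    """S-transform increments x into the system that fixes the parameters
    with indices fixed_idx: x_S = (I - V (S^t V)^{-1} S^t) x, with V a
    basis of the nullspace and S^t selecting the fixed parameters."""
    n = len(x)
    St = np.zeros((V.shape[1], n))
    for row, idx in enumerate(fixed_idx):
        St[row, idx] = 1.0
    P = np.eye(n) - V @ np.linalg.inv(St @ V) @ St
    return P @ x

# Toy 2-point planar network with two translational degrees of freedom:
# parameters (dx1, dy1, dx2, dy2); nullspace = common translations.
V = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
x = np.array([0.3, -0.1, 0.7, 0.4])
xs = s_transform(x, V, fixed_idx=[0, 1])   # fix point 1: dx1 = dy1 = 0
assert np.allclose(xs[:2], 0.0)            # the restrictions are satisfied
assert np.allclose(xs[2:], x[2:] - x[:2])  # coordinate differences preserved
```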
Let us now assume that in addition to the horizontal- and vertical direction measurements r_ij and Z_ij, and pseudo-distance measurements l_ij, we also have the disposal of astronomical latitude Φ_i, longitude Λ_i and azimuth A_ij. We then need to introduce two new orthonormal triads:

3⁰  The earth-fixed frame *E_I, I = 1, 2, 3;
    *E_{I=3} points towards the average terrestrial pole (CIO),
    *E_{I=1} points towards the line of intersection of the plane of the average terrestrial equator and the plane containing the Greenwich vertical and parallel to *E_{I=3},
    *E_{I=2} completes the right-handed system.
4⁰  The local astronomical frame *T_I(P_i), I = 1, 2, 3, in point P_i;
    *T_{I=3} points towards the local astronomical zenith,
    *T_{I=2} points towards north,
    *T_{I=1} points towards east.

If we assume that the theodolite frames are levelled, then the following relations between the four triads E_I, *E_I, T_I(P_i) and *T_I(P_i) hold: (2.72), where α, β and γ are small rotation angles.

From (2.72) it follows that

Rᵗ(α, β, γ) = R(Φ_r, Λ_r) R(A_ri - r_ri) R(θ_{1,r}, θ_{2,r}, θ_{3,r}) .          (2.73)

Linearization, under the assumption that α⁰ = β⁰ = γ⁰ = 0, gives (2.74), a linear relation between (Δα, Δβ, Δγ)ᵗ on the one hand and ΔΦ_r, ΔΛ_r and Δθ_{1,r}, Δθ_{2,r}, Δθ_{3,r} on the other, with a coefficient matrix built from sin Λ_r, cos Λ_r, sin Φ_r and cos Φ_r.

Since α⁰ = β⁰ = γ⁰ = 0 we can replace Δθ_{1,i} and Δθ_{2,i} in (2.71) by ΔΦ_i and ΔΛ_i. With (2.74) we then find that for large three dimensional networks, in which also astronomical latitude, longitude and azimuth are measured, (2.71) generalizes to

(2.75), where the columns of the matrix on the right-hand side contain, besides 0 and ±1, terms such as (-z⁰_ri cos Φ_r sin Λ_r + y⁰_ri sin Φ_r), (z⁰_ri cos Φ_r cos Λ_r - x⁰_ri sin Φ_r), cos Φ_i sin(Λ_i - Λ_r) and cos Φ_i cos(Λ_i - Λ_r), and where the first column vector on the right-hand side of (2.71), in which it says "observables", is now denoted by a single symbol.

When viewing (2.75) one may wonder why there are still seven degrees of freedom. Aren't the Φ_i, Λ_i and A_ij supposed to take care of the rotational degrees of freedom? The reason for this apparent discrepancy is of course that the network's point configuration and fundamental directions are described with coordinates referring to the frame E_I, which is essentially an arbitrary one. We have chosen for this approach because it enables us to describe the most general situation, i.e. it allows us to introduce any reference system we like. That is, we do not restrict ourselves beforehand to those reference systems which might be the obvious ones to choose because of the available Φ_i, Λ_i and A_ij. But, would one aspire after this more conventional S-system definition, then decomposition formula (2.75) is easily modified. To see this, let us consider the two dimensional situation. Assume that azimuths A_ij, horizontal directions r_ij and distances l_ij are observed. By taking

the general case of describing the network in an arbitrary system (see figure 22) we get from linearizing

x_i = x_r + κ l_ri sin(A_ri - α) ,
y_i = y_r + κ l_ri cos(A_ri - α) ,
θ_i = A_ir - r_ir + π - α ,
ln κ_i = ln κ ,

that (2.77), where the upper indices (r,//) indicate that these coordinates are computed in the S-system which is defined through fixing the point P_r (Δx_r = Δy_r = 0), the scale parameter (Δln κ = 0) and the orientation parallel (if α⁰ = 0) to the north direction (Δα = 0). From decomposing (2.77) like

(2.78), which splits each coordinate increment Δx_i^(r,//), Δy_i^(r,//) into a part computed in the S-system that also fixes Δα and Δln κ, plus rotation- and scale terms with coefficients built from the approximate coordinates y⁰_ri, -x⁰_ri and x⁰_ri, y⁰_ri, multiplied by Δα and Δln κ respectively, it

follows that the reference systems one usually considers when azimuths and distances are observed are of the (//)-type. They are defined through Δα = 0, Δln κ = 0. Thus, although it is usually not explicitly stated, the conventional S-system chosen when azimuths and distances are measured is (2.79):

figure 22

In three dimensions (2.79) generalizes to (2.80), and it will now be clear that the usual phrase "astronomical latitude, longitude and azimuth take care of the rotational degrees of freedom" essentially means that one has fixed the orientation of the reference system through Δα = Δβ = Δγ = 0.
From (2.75) it follows that the decomposition corresponding to the S-system (2.80) is given by (2.81). The corresponding S-transformation is then easily found from bringing the second term on the right-hand side of (2.81) to the left-hand side (see also Teunissen, 1984a). Note that since Δα^(r,//) = Δβ^(r,//) = Δγ^(r,//) = 0, one can replace Δθ_{1,i}^(r,//) and Δθ_{2,i}^(r,//) in (2.81) by respectively ΔΦ_i^(r,//) and ΔΛ_i^(r,//).
Instead of (2.80) one could of course also consider still other types of S-system definitions. One could for instance take the restrictions given by (2.54). The orientation of the earth-fixed frame and the directions of the local verticals are then given by respectively Δα^(r,s;t), Δβ^(r,s;t), Δγ^(r,s;t) and Δθ_{1,i}^(r,s;t), Δθ_{2,i}^(r,s;t). And if one replaces the cartesian coordinates in e.g. (2.75) by geodetic coordinates, and the direction unknowns Δθ_{1,i}, Δθ_{2,i} by the deflection of the vertical components ξ, η through using

ξ_i = Δθ_{1,i} - Δφ_i ,   η_i = ( Δθ_{2,i} - Δλ_i ) cos Φ⁰_i ,

one can show that also the following sets of restrictions are legitimate choices for defining an S-system:

(a)  ξ_r = η_r = ΔA_rs = 0 ,  Δφ_r = Δλ_r = Δh_r = 0 ,  Δln κ = 0
     ( ⟺  Δθ_{1,r} = Δθ_{2,r} = ΔA_rs = Δφ_r = Δλ_r = Δh_r = 0 ,  Δln κ = 0 )
(b)  Δα = Δβ = Δγ = 0 ,  Δφ_r = Δλ_r = Δh_r = 0 ,  Δln κ = 0
(c)  Δα = Δβ = ΔA_rs = 0 ,  Δφ_r = Δλ_r = Δh_r = 0 ,  Δln κ = 0          (2.82)

(see also Strang v. Hees, 1977; Yeremeyev and Yurkina, 1969). And in this way many more sets of necessary and sufficient restrictions can be found. Note that also the geodetic coordinates should be given an upper index referring to the S-system through which they are defined.
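In finite (rather than differential) form the deflection components used above are the differences between astronomical and geodetic latitude and longitude, ξ = Φ - φ and η = (Λ - λ) cos φ. A small numerical illustration (all values made up):

```python
import math

def deflection_components(Phi, Lam, phi, lam):
    """Deflection of the vertical from astronomical (Phi, Lam) and
    geodetic (phi, lam) latitude/longitude, all in radians.
    xi: north-south component; eta: east-west component."""
    xi = Phi - phi
    eta = (Lam - lam) * math.cos(phi)
    return xi, eta

# A few arcseconds of deflection, a typical order of magnitude:
arcsec = math.pi / (180 * 3600)
xi, eta = deflection_components(0.9000 + 5 * arcsec, 0.1000 + 8 * arcsec,
                                0.9000, 0.1000)
print(xi / arcsec, eta / arcsec)   # xi = 5", eta = 8" * cos(phi)
```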
In principle of course there is no need for introducing deflection of the vertical components. For computing three dimensional networks one can just as well do without them. Due, however, to the fact that many existing large networks lack the necessary zenith distances, one has in the past preferred the classical method of reductions to a reference ellipsoid and computation by means of ellipsoidal quantities to the theoretically more attractive spatial triangulations of Bruns and Hotine (see e.g. Hotine, 1969; Torge and Wenzel, 1978; Engler et al., 1982). Instead of solving the height problem by using zenith distances one resorts to the astrogeodetic (or gravimetric) method. The problem of the network computation is then split into two nearly independent problems, namely the

(a)  φ, λ - problem, and the
(b)  ξ, η, h - problem.
The procedure followed is in short the following (see also Heiskanen and Moritz, 1967). One starts by defining a three dimensional S-system (geodetic datum). Usually one takes the datum given by (2.82.b) or (2.82.c). Using the approximate information available on φ⁰_P, λ⁰_P, h⁰_P, Φ⁰_P, Λ⁰_P, one then reduces the observed angles and distances to the ellipsoid and computes on it the geodetic coordinate increments Δφ_i, Δλ_i. After having solved for (a), one enters the solution of (b), where new heights and new deflections of the vertical need to be determined based on the new ellipsoidal values of φ_i and λ_i. With these new values the whole procedure is repeated. One can consider this iteration procedure as a block Gauss-Seidel type of iteration, in which a linear system in the two groups of unknowns is solved iteratively by alternating between the two groups.
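The alternation between the (a) and (b) problems can be mimicked on a small linear system; the block Gauss-Seidel sketch below (a toy, diagonally dominant system, not the geodetic normal equations themselves) alternates between the two groups of unknowns until the joint solution is reached:

```python
import numpy as np

def block_gauss_seidel(A11, A12, A21, A22, b1, b2, iters=50):
    """Solve [[A11, A12], [A21, A22]] [x1; x2] = [b1; b2] by alternating
    between the two blocks, mimicking the (a)/(b) split described above."""
    x1 = np.zeros(A11.shape[0])
    x2 = np.zeros(A22.shape[0])
    for _ in range(iters):
        x1 = np.linalg.solve(A11, b1 - A12 @ x2)   # step (a)
        x2 = np.linalg.solve(A22, b2 - A21 @ x1)   # step (b)
    return x1, x2

# Diagonally dominant toy system, so the iteration converges quickly.
A11 = np.array([[4.0, 1.0], [1.0, 3.0]])
A22 = np.array([[5.0]])
A12 = np.array([[0.5], [0.2]])
A21 = np.array([[0.3, 0.1]])
b1, b2 = np.array([1.0, 2.0]), np.array([3.0])

x1, x2 = block_gauss_seidel(A11, A12, A21, A22, b1, b2)
full = np.linalg.solve(np.block([[A11, A12], [A21, A22]]),
                       np.concatenate([b1, b2]))
assert np.allclose(np.concatenate([x1, x2]), full, atol=1e-10)
```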

A practical point of concern is, however, the reduction procedure. In many cases the necessary gravity field information, needed to perform a proper reduction of the observational data, is lacking (see e.g. Meissl, 1973; Teunissen, 1982, 1983). But if the necessary gravity field information is available, the classical method of reduction to the ellipsoid can be seen to be formally equivalent to the truly three dimensional method, and both methods, if applied correctly, will give the same results (Wolf, 1963a; Levallois, 1960). Hence, the final iterated solution of the classical method for the network's shape will be free from any deterministic effects of the arbitrarily introduced datum. The intermediate solutions of the iteration procedure, however, do theoretically depend on the choice of datum. It is gratifying to know therefore, as has been shown in subsection 2.2, that these effects are practically negligible.

3. (Free)networks and their connection

3.1. Types of networks considered

Now that we have given representations of Nu(A) in various situations we can start discussing the problem of connecting geodetic networks.
In principle this problem is not too difficult. Essential is to know the type of information the two networks have in common. Based on this information one can then formulate the appropriate model and perform the adjustment.
As to the methods of connecting geodetic networks one can distinguish between three solution strategies. Two of them need the parameters describing the two separate adjusted networks, while the third method starts from the assumption that the original observation equations (or rather the reduced normal equations) are still available.
In the first method (method I) use is made of condition equations. The idea is to eliminate first all non-common information from the two sets of parameters describing the two separate adjusted networks. This can be done by means of an appropriate S-transformation. The so transformed parameters are then finally used on an equal footing in the method of condition equations.
It is curious that this method has received so little attention in the literature. We only know of a few areas where it is applied (see e.g. Baarda, 1973; or Van Mierlo, 1978). An explanation could perhaps be the general aversion one has for the method of condition equations, since it is known to be cumbersome in computation. However, for our present application of connecting networks this argument does not hold. On the contrary, the method can in many cases be very tractable indeed.
The second method (method II) is essentially the counterpart of the above mentioned method. In this method one starts by determining the transformation parameters. This is done by means of a least-squares adjustment. After the adjustment one then applies the transformation parameters to obtain the final estimates of the parameters describing the two connected networks.
Method II seems to be very popular with those working on the problem of connecting satellite networks with terrestrial networks (see e.g. Peterson, 1974). A serious shortcoming of most discussions on this method is, however, that often the starting assumptions are not explicitly formulated. As we will see this may avenge itself on the general applicability of the method and also may affect the interpretability of the transformation parameters.
Finally the third method (method III) makes use of the so-called Helmert blocking procedure. It is therefore essentially a phased type of adjustment, applied to the original models of the two overlapping networks (e.g. Wolf, 1978).
Usually when one applies this method one starts from the principle that both reduced normals are regular, thereby suggesting that the two overlapping networks have no degrees of freedom at all. For a general application of the method this is of course too restrictive an assumption to start with. We will therefore have to show how the method applies in the general case.

From the above few remarks it will be clear that we feel that a truly general discussion of the problem of connecting geodetic networks has not yet been given in the literature. Either the assumptions are too restrictive to render a general application of the methods possible, or they are not precisely enough formulated. For a proper course of things let us therefore start by stating our basic

Assumptions

First consider the original models. We assume that the first network is described by the linear(ized) model

y = ( A_1 : A_2 ) ( Δx_1ᵗ , Δx_2ᵗ )ᵗ ,  Q_y ,   with dim. Nu(A_1 : A_2) = q ,          (3.1.1.a)

(y: mx1, A_1: mxn_1, A_2: mxn_2), and the second by

ȳ = ( Ā_1 : Ā_3 ) ( Δx̄_1ᵗ , Δx̄_3ᵗ )ᵗ ,  Q_ȳ ,   with dim. Nu(Ā_1 : Ā_3) = q̄ ,          (3.1.1.b)

(ȳ: mx1, Ā_1: mxn_1, Ā_3: mxn_3).

We further assume that the second network, apart from some additional degrees of freedom, has the same type of degrees of freedom as the first network. This means that we assume the nullspace of (3.1.1.a)'s normal reduced for Δx_2 to be a proper subspace of the nullspace of (3.1.1.b)'s normal reduced for Δx̄_3, i.e.

Nu(P_2 A_1) ⊂ Nu(P̄_3 Ā_1) ,

with the projectors

P_2 = I - A_2 ( A_2ᵗ Q_y⁻¹ A_2 )⁻¹ A_2ᵗ Q_y⁻¹   and   P̄_3 = I - Ā_3 ( Ā_3ᵗ Q_ȳ⁻¹ Ā_3 )⁻¹ Ā_3ᵗ Q_ȳ⁻¹ .

And finally we assume that the two networks are related by a transformation (3.1.1.d) with r transformation parameters.
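The reduced-normal projectors P_2 and P̄_3 defined above can be formed directly; the sketch below builds P_2 = I - A_2 (A_2ᵗ Q_y⁻¹ A_2)⁻¹ A_2ᵗ Q_y⁻¹ for a small made-up design matrix and checks its two defining properties (idempotency and annihilation of R(A_2)):

```python
import numpy as np

def reduced_projector(A2, Qy):
    """Projector P2 = I - A2 (A2^t Qy^-1 A2)^-1 A2^t Qy^-1, which reduces
    the normal equations for the parameters associated with A2."""
    Qinv = np.linalg.inv(Qy)
    N = A2.T @ Qinv @ A2               # reduced normal matrix
    return np.eye(A2.shape[0]) - A2 @ np.linalg.inv(N) @ A2.T @ Qinv

# Made-up full-rank design matrix and diagonal covariance:
A2 = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0], [1.0, 3.0]])
Qy = np.diag([1.0, 2.0, 1.0, 0.5])
P2 = reduced_projector(A2, Qy)
assert np.allclose(P2 @ P2, P2)               # idempotent
assert np.allclose(P2 @ A2, 0.0, atol=1e-12)  # annihilates R(A2)
```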

Since some of the derivations and formulae in the next section become quite elaborate, we will from time to time use the following


The f i r s t network can be thought o f as being a planar network determined f r o m distance

-,

astronomical azimuth

and angle measurements. And the second network can considered t o be planar

w i t h magnetic compass readings and angle observations only.


(A x1 ,A x 2 )

I f the parameters

increments only, then

NU(A :A
1
2

( A ; ~ , A; 3 )

and

are assumed t o contain cartesian coordinate

.. .. ..

.. ..
1 0

R(

0 1

1 0

N ~ ( A.A- ) = R (
1
3
O

) , w i t h q = 2 and

0
X.

0I )

withq=3.

.. .. ..i

.. ..

The second network has namely apart f r o m the t w o translational degrees o f freedom also an
additional freedom o f scale.
Furthermore, transformation (3.1.1.d)

would then be characterized by

with r = 4

and the nullspaces of the reduced normals by

Finally, using the decomposition

R(V_1) = R((V_1)_1) ⊕ R((V_1)_2) ,   R((V_1)_2) = Nu(P̄_3 Ā_1) ,

with

R((V_1)_1) = R( (..., y⁰_i, -x⁰_i, ...)ᵗ )   and   R((V_1)_2) spanned by the translation vectors (..., 1, 0, ...)ᵗ, (..., 0, 1, ...)ᵗ and the scale vector (..., x⁰_i, y⁰_i, ...)ᵗ,

we can identify the Δp_1 parameter of Δp = (Δp_1ᵗ, Δp_2ᵗ)ᵗ as a rotation angle and the Δp_2 parameters as respectively two translational and one scale parameter.

3.2. Three alternatives

Since the above mentioned first two methods are closely related we will discuss them together.

Method I and II

Both methods are applicable if the parameters describing the two separate adjusted networks, and their covariances, are available. Thus we assume given (see figure 23) the adjusted increments of the first network, computed with S = R(S) complementary to Nu(A_1 : A_2), and those of the second network, computed with S̄ = R(S̄) complementary to Nu(Ā_1 : Ā_3).          (3.2.1)

figure 23

Our goal is now to solve for the transformation parameters and the adjusted increments Δx̂_2(S), Δx̄_3(S̄) of the two networks. Here we implicitly assume that we wish our results to be expressed in the same coordinate system as that of the first network. For our example in subsection 3.1 this means that we wish our results to take the scale and orientation of the first network. This is a sensible choice, since the first network contains by assumption more information than the second (Nu(P_2 A_1) ⊂ Nu(P̄_3 Ā_1)). But if one so desires one could also proceed otherwise, viz. by adopting the orientation of the second network.


We believe that for explanatory purposes method I best shows the principles involved in connecting networks. Let us therefore first, before we proceed with the actual solution strategies of the two methods, consider the following simple but general enough situation. We assume to have measured two overlapping planar networks. And furthermore we assume that for both networks we have the disposal of distance-, azimuth- and angle observations. When adjusting the two networks separately we thus need to take care of the in both cases existing translational degrees of freedom. But as we know from the previous section this can be done in very many ways, the simplest way being to fix just one network point. Having done this we thus finally end up with two sets of coordinates, each describing one of the two separate adjusted networks. How are we now to compare these two coordinate sets? Not by blithely comparing the coordinates of corresponding network points, for these were introduced in a rather arbitrary way. In general namely, the two fixed network points will be different ones. In fact, even if one would have fixed the same network point in both networks, one still should exercise great care. This is because the numerical values assigned to the fixed point need not be identical for both networks. Now if we disregard this possibility for the moment and assume that the same set of approximate coordinates is used for linearizing the observation equations of both networks, we would have the inequality

That is, the two sets of adjusted Δ-parameters cannot be compared directly. But we know already from the previous section that one can easily take care of this discrepancy by applying the appropriate S-transformation. This S-transformation should enable us then to compare corresponding coordinate differences.
Now let us change the situation slightly and assume that the azimuth measurements of the first network are of the astronomical type and those of the second network follow from magnetic compass readings. Then we would have the inequality even if S = S̄. The reason being of course that the first network is orientated with respect to astronomical north and the second with respect to magnetic north. Thus the only information the two networks have in common is of the distance- and angular type. But again we can take care of this discrepancy by using the appropriate S-transformation, namely one that eliminates the azimuthal information from both networks.
Finally we complicate the situation a bit further by assuming that the second network lacks distance measurements, i.e. lacks scale. In this case we are in the situation as described by the example of the previous subsection 3.1, because both networks then still have their translational degrees of freedom, but now the second network also has an additional freedom of scale. In this case we thus certainly will have the inequality, irrespective of the choices made for S and S̄.

B u t as w i l l be clear now, one can again overcome this discrepancy by using the appropriate S-

transformation, namely one which reduces both networks to ones of the angular type.
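The eliminating effect of such an S-transformation can be sketched numerically. The following is a minimal sketch (not from the text, and with a hypothetical 2-D network): the columns of V span the similarity degrees of freedom, and the projector S maps any two coordinate sets that differ by an element of R(V) onto one and the same S-system.

```python
import numpy as np

# Hypothetical 2-D network with 4 points (8 coordinates).
rng = np.random.default_rng(1)
coords = rng.normal(size=(4, 2))          # approximate point positions
x, y = coords[:, 0], coords[:, 1]

# Columns of V span the degrees of freedom to be eliminated:
# two translations, a differential rotation and a differential scale.
V = np.zeros((8, 4))
V[0::2, 0] = 1.0                          # translation in x
V[1::2, 1] = 1.0                          # translation in y
V[0::2, 2], V[1::2, 2] = -y, x            # differential rotation
V[0::2, 3], V[1::2, 3] = x, y             # differential scale

# An S-transformation is a projector along R(V); the matrix D
# (here simply D = V) fixes the choice of S-basis.
D = V
S = np.eye(8) - V @ np.linalg.solve(D.T @ V, D.T)

# Two coordinate sets that differ by an arbitrary element of R(V)
# become identical after the S-transformation:
dx1 = rng.normal(size=8)
dx2 = dx1 + V @ rng.normal(size=4)        # same network, other datum
print(np.allclose(S @ dx1, S @ dx2))      # True
```

The design choice here is that S annihilates R(V) exactly (S @ V = 0), so any datum discrepancy lying in R(V) drops out before the coordinate sets are compared.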
Summarizing, we can conclude from the above discussion that although the causes for the
incompatibility of $\Delta\hat{x}_1^{(S)}$ and $\Delta\hat{\bar{x}}_1^{(\bar{S})}$ may differ, one can always find the appropriate S-
transformation to eliminate this discrepancy. And in view of our general assumptions (3.1.1) it follows
that an appropriate S-transformation would be one which eliminates the degrees of freedom spanned
by $R(V_1)$ from both networks. This would give us then the equality (3.2.3), or equivalently (3.2.4).
If the situation as sketched in the example of subsection 3.1 applies, (3.2.3) reads in cartesian
coordinates accordingly. The equivalent formulation (3.2.4) represents then an independent set of
$n-4$ angular condition equations, or a set of $n-4$ linear equations which is in one-to-one
correspondence to such a set of $n-4$ angular condition equations.

Some authors have expressed their hesitation towards the above described procedure for using S-transformations. They argue that by using an S-transformation which eliminates e.g. the available
azimuthal and scale information, one eliminates information which is important in its own right. This,
however, is in our opinion a misappreciation of the concept of S-transformations. The S-transformation
is in the first instance only applied to obtain the equality (3.2.3) or (3.2.4), on which
the adjustment for connecting both networks is then based. After the adjustment one can
always, if so desired, transform the adjusted coordinates back to one of the original coordinate
systems. In the above example for instance one can always transform back to the system of the first
network, the one that contains scale- and orientation information.
Now let us consider the actual solution strategies of the two methods I and II. We will start with
method I.
Although it is customary in the literature to start from model formulation (3.2.3), we will, for reasons
yet to be explained, start from model formulation (3.2.4). Straightforward application of the least-squares
algorithm for the method of condition equations gives then the solution (3.2.5).
This formulation of the least-squares solution of method I is however not yet in agreement with the
formulation one usually finds in the literature (see e.g. Baarda, 1973, p.125 or Van Mierlo, 1978, p.9-26).
We therefore have to rewrite (3.2.5) a bit. For this purpose take the following abbreviation

Since $R(A) = R(\tilde{S})$, it follows, if $B$ denotes an arbitrary inverse of $A$, that $AB$ is a projector which
projects onto $R(\tilde{S})$ and along a complementary subspace. Hence (3.2.6) holds. Premultiplying this
expression with $V_1\bigl(V_1^t(Q_{\hat{x}_1}^{(S)} + Q_{\hat{\bar{x}}_1}^{(\bar{S})})V_1\bigr)^{-1}V_1^t$ then gives the desired form. Hence, if we use the
customary notation (3.2.7) for a generalized inverse of the covariance matrix of the difference vector,
together with the projector $P_{R(\tilde{S}),R(V_1)}$, we can rewrite (3.2.5) as (3.2.8).

To finally transform the adjusted parameters $\Delta\hat{x}_1^{(\tilde{S})}$ to the coordinate system of the
first network, we need to determine the transformation parameters $\Delta\hat{p}$, with $R(S)$ complementary to
$R(V_1)$. From (3.1.1.d) it follows that the transformation parameters are easily found through (3.2.9).
Summarizing, we can thus write the solution of method I as (3.2.10), with $B$ an arbitrary inverse of $\tilde{S}$.

This is also the solution one can find in (Baarda, 1973), although there the result is derived under the
more restrictive assumption that $Nu(P_2A_1) = Nu(P_3A_1) = R(V_1)$.

When comparing (3.2.10.a) with (3.2.5) one may wonder which formulation is the more attractive
computation-wise. Formulation (3.2.10.a) suggests the customary practice of first applying an S-transformation,
namely (3.2.2), and then computing the inverse of $Q_{\hat{x}_1^{(\tilde{S})}}$. A more direct way is
however suggested by (3.2.7) and the method of prolongation discussed in sections 4 and 5 of chapter
II.

Note that in the special case of $n_2 = n_3 = 0$, $S = \bar{S}$ and $Nu(P_2A_1) = Nu(P_3A_1) = R(V_1)$, it follows
with our expression (4.5) of chapter II that (3.2.7) can be computed from

$V_1\bigl(V_1^t(Q_{\hat{x}_1}^{(S)} + Q_{\hat{\bar{x}}_1}^{(\bar{S})})V_1\bigr)^{-1}V_1^t$, which is a symmetric minimum rank inverse of $Q_{\hat{x}_1}^{(S)} + Q_{\hat{\bar{x}}_1}^{(\bar{S})}$.   (3.2.11)

In the general case that $Nu(P_2A_1) \subset Nu(P_3A_1) = R(V_1)$, this expression will cease to be a minimum
rank inverse of $Q_{\hat{x}_1}^{(S)} + Q_{\hat{\bar{x}}_1}^{(\bar{S})}$; instead it becomes a constrained rank inverse. With (5.21) of
chapter II follows then the corresponding expression (3.2.11').

Thus, since a representation of $R(V_1)$ is usually readily available, we see that instead of (3.2.10.a) one
can also use formulation (3.2.5) with (3.2.7) computed via (3.2.11') (or (3.2.11)).

Now let us consider method II. Its model formulation is the parametric counterpart of (3.2.4) and
reads as (3.2.12). Usually this model will constitute the differential similarity transformation,
e.g. when combining doppler networks with terrestrial networks (Peterson, 1974). However, since the
common unknowns of the two overlapping networks need not be restricted to coordinates, relation
(3.2.12) could be a kind of modified differential similarity transformation such as for instance (2.81).
In fact, relation (3.2.12) need not be restricted to the differential similarity transformation at all. It
could for instance also include additional "transformation" parameters which describe projected
geophysical hypotheses in a deformation analysis. Or it could include, say, a refraction model.
When we solve for (3.2.12) we immediately notice a difficulty which is often overlooked in t h e
literature. Namely, t h a t t h e covariance sum Q R1 ( S ) + Q i l ( ;) can turn out t o be singular. Assume
for instance t h a t

S = R ( S ) is complementary t o

Nu(P3Al ) and t h a t

c S . Then Nu(Qpl (
will exist. One could of

= R ( 3 ) is complementary t o

+ Q: (; ) ) 6 { 0} and no ordinary inverse of


X1
course ask oneselves then whether i t is possible t o

Q+]
t a k e a generalized inverse of Q R ( S ) + Q i l ( )
+

,3

Nu(P2A1)

. In some cases this is possible. We will refrain

however from f u r t h e r elaboration on this point, since if one really insists on using (3.2.12), one can
e i t h e r transform one of t h e covariance m a t r i c e s by means of an appropriate S-transformation so t h a t
t h e sum Q* ( S ) + Q: ( 5 )
becomes regular again, or, what is more practical, add t h e matrix
I I
1
1
( v ~ ) ( v ~t )o ~ Q * i s ) + Q; f 5 )
The solution of (3.2.12) follows then from straightforward

application of t h e least-squares algorithm. To show t h e close relationship with solution (3.2.10) we


will make use of a slight detour.
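The regularization just mentioned — adding $(V_1)(V_1)^t$ to a singular covariance sum — can be illustrated with a small numerical sketch (hypothetical matrices, not from the text): a covariance matrix singular along $R(V_1)$ is made regular without disturbing directions orthogonal to $R(V_1)$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical null-space basis V1 (degrees of freedom of the networks).
n, r = 6, 2
V1 = np.linalg.qr(rng.normal(size=(n, r)))[0]   # orthonormal columns

# Build a singular covariance matrix whose null space contains R(V1),
# as happens when the covariance sum is singular along R(V1).
P = np.eye(n) - V1 @ V1.T                       # projector onto R(V1)-perp
B = rng.normal(size=(n, n))
Q = P @ (B @ B.T) @ P                           # singular: Q @ V1 = 0

print(np.linalg.matrix_rank(Q))                 # 4, i.e. n - r

# Adding V1 V1^t restores regularity:
Q_reg = Q + V1 @ V1.T
print(np.linalg.matrix_rank(Q_reg))             # 6, i.e. full rank
```

Since Q_reg acts as the identity on $R(V_1)$ and as Q on its orthogonal complement, the added term only fills in the deficient directions.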
First consider the transformation parameters. With the aid of (3.2.5) we can write (3.2.9) as (3.2.13).
And since we have the projector identity

$I - \bigl(Q_{\hat{x}_1}^{(S)} + Q_{\hat{\bar{x}}_1}^{(\bar{S})}\bigr)V_1\bigl[V_1^t\bigl(Q_{\hat{x}_1}^{(S)} + Q_{\hat{\bar{x}}_1}^{(\bar{S})}\bigr)V_1\bigr]^{-1}V_1^t = P_{R(\tilde{S}),R(V_1)}$,

it follows from (3.2.13) an expression for the transformation parameters in terms of the difference
vector. In a similar way one can prove the corresponding expression for the adjusted parameters $\Delta\hat{x}_1^{(S)}$.
Summarizing, we can thus write the solution of method II as (3.2.15).

For the special case that $Q_{\hat{x}_1}^{(S)} + Q_{\hat{\bar{x}}_1}^{(\bar{S})}$ itself is regular, (3.2.15) without the additional term
$(V_1)(V_1)^t$ is the solution one usually finds cited in the literature (e.g. Adam et al., 1982). However, the
necessary relation with $Nu(P_2A_1)$ and $Nu(P_3A_1)$ is usually not made.
Note by the way that $Q_{\hat{x}_1}^{(S)} + Q_{\hat{\bar{x}}_1}^{(\bar{S})} + (V_1)(V_1)^t$ is a symmetric maximum rank inverse
of (3.2.7).

For those who are used to thinking in terms of S-systems, it may come as a surprise that one is
allowed to simply add the covariance maps $Q_{\hat{x}_1}^{(S)}$ and $Q_{\hat{\bar{x}}_1}^{(\bar{S})}$ of coordinates defined in
different S-systems. The reason is that the transformation parameters $\Delta p$ in model formulation
(3.2.12) already take care of the possible discrepancy between the two S-systems.

This brings us to another important point, namely that of the interpretability of the transformation
parameters $\Delta\hat{p}$. A shallow study of (3.2.15) might convince us that all transformation parameters
are estimable and that one is allowed, in the context of testing alternative hypotheses, to test
whether some or all of the transformation parameters are significant or not. Here, however, one
should exercise great care. In particular one should be aware that one cannot test whether an
arbitrary linear function $c^t\Delta p$ of the transformation parameters, with $c$ an $r\times 1$ vector, is zero or
not, i.e.:

$H_0:\ \Delta\hat{x}_1^{(S)} - \Delta\hat{\bar{x}}_1^{(S)} = V_1\Delta p,\ c^t\Delta p = 0$,
against
$H_A:\ \Delta\hat{x}_1^{(S)} - \Delta\hat{\bar{x}}_1^{(S)} = V_1\Delta p,\ c^t\Delta p \neq 0$.

The reason is that, in the general case we are considering here, one cannot treat all transformation
parameters on an equal footing. In case of our example of subsection 3.1, for instance, only the
orientational parameter is eligible for a test like the above.

Finally we like to point out the great resemblance between (3.2.10) and (3.2.15). The two methods
only differ in their order of computing the transformation parameters $\Delta\hat{p}$ and the increments
$\Delta\hat{x}_1^{(S)}$. Hence, in principle no preference can be given to either method, unless one chooses on
the basis of computational convenience. One can argue namely that method I is to be preferred since
it only needs the inverse (3.2.7) of the covariance matrix of the difference vector, whereas method II
needs the inverses of both $Q_{\hat{x}_1}^{(S)} + Q_{\hat{\bar{x}}_1}^{(\bar{S})} + (V_1)(V_1)^t$ and
$(V_1)^t\bigl(Q_{\hat{x}_1}^{(S)} + Q_{\hat{\bar{x}}_1}^{(\bar{S})} + (V_1)(V_1)^t\bigr)^{-1}(V_1)$.

Let us now consider method III.
The Helmert blocking method is essentially a phased type of adjustment applied to a second standard
problem formulation. Instead of performing the adjustment in one step, the original set of observation
equations is divided into two groups, each describing one of the two overlapping networks. After
having formed the corresponding normal systems one then reduces to obtain the reduced normals
pertaining to the common unknowns of the two networks. Through inversion of the sum of these
reduced normals one solves for the final adjusted values of the common unknowns. The remaining
unknowns are found by means of back-substitution (e.g. Wolf, 1978).
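The reduction and back-substitution steps can be sketched as follows. This is a minimal numerical illustration (not from the text) with hypothetical design matrices A1 (common unknowns) and A2, assuming a unit weight matrix; the reduced normal matrix is the Schur complement of the block of the network-specific unknowns.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical partitioned observation equations  y = A1 x1 + A2 x2,
# where x1 are the common unknowns and x2 the network-specific ones.
m, n1, n2 = 20, 3, 4
A1, A2 = rng.normal(size=(m, n1)), rng.normal(size=(m, n2))
x_true = rng.normal(size=n1 + n2)
y = np.hstack([A1, A2]) @ x_true          # exact (noise-free) data

# Full normal system (for comparison in the end).
A = np.hstack([A1, A2])
N = A.T @ A
n = A.T @ y

# Reduce for the x2-parameters: the Schur complement gives the
# reduced normal matrix and right-hand side for x1 alone.
N11, N12, N22 = A1.T @ A1, A1.T @ A2, A2.T @ A2
N_red = N11 - N12 @ np.linalg.solve(N22, N12.T)
n_red = A1.T @ y - N12 @ np.linalg.solve(N22, A2.T @ y)

x1 = np.linalg.solve(N_red, n_red)
# Back-substitution for the remaining unknowns:
x2 = np.linalg.solve(N22, A2.T @ (y - A1 @ x1))

print(np.allclose(np.hstack([x1, x2]), x_true))   # True (exact data)
```

In Helmert blocking, N_red is formed once per network; the reduced systems of the overlapping networks are then added and solved jointly for x1, after which each network back-substitutes for its own x2.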
If we reduce for the $\Delta x_2$-parameters, the normal system of (3.1.1.a) becomes the reduced normal
system in the common unknowns. Hence as a solution of (3.1.1.a) we have

$\Delta\hat{x}_2^{(S)} = (A_2^tQ_y^{-1}A_2)^{-1}A_2^tQ_y^{-1}(\Delta y - A_1\Delta\hat{x}_1^{(S)})$,   (3.2.16.a)

with $P_2 = I - A_2(A_2^tQ_y^{-1}A_2)^{-1}A_2^tQ_y^{-1}$ and $S = R(S)$ complementary to $Nu(P_2A_1)$.

In a similar way we find for (3.1.1.b) the solution

$\Delta\hat{\bar{x}}_3^{(\bar{S})} = (\bar{A}_3^tQ_{\bar{y}}^{-1}\bar{A}_3)^{-1}\bar{A}_3^tQ_{\bar{y}}^{-1}(\Delta\bar{y} - \bar{A}_1\Delta\hat{\bar{x}}_1^{(\bar{S})})$,   (3.2.16.b)

with $\bar{P}_3 = I - \bar{A}_3(\bar{A}_3^tQ_{\bar{y}}^{-1}\bar{A}_3)^{-1}\bar{A}_3^tQ_{\bar{y}}^{-1}$ and $\bar{S} = R(\bar{S})$ complementary to $Nu(\bar{P}_3\bar{A}_1)$.

Thus the reduced normals of (3.1.1.a) and (3.1.1.b) pertaining to the common unknowns are
respectively $N_1 = (P_2A_1)^tQ_y^{-1}(P_2A_1)$ and $\bar{N}_1 = (\bar{P}_3\bar{A}_1)^tQ_{\bar{y}}^{-1}(\bar{P}_3\bar{A}_1)$. However, in view of
our assumptions (3.1.1) we cannot simply add them together yet. What we need is a slight
modification of one of the two reduced normals $N_1$ and $\bar{N}_1$, such that relation (3.1.1.d) is taken care
of. That is, we either need to modify $N_1$ with the aid of $(V_1')$ to an $N_1'$ with $Nu(N_1') = R(V_1)$, or $\bar{N}_1$
with the aid of $(\bar{V}_1')$ to an $\bar{N}_1'$ with $Nu(\bar{N}_1') = R(V_1)$.

For our example of subsection 3.1 such a modification of $N_1$ would mean that we eliminate the scale-
and orientational information of the first network. And likewise, elimination of the orientational
information of the second network would correspond to modifying $\bar{N}_1$ to an $\bar{N}_1'$ with
$Nu(\bar{N}_1') = R(V_1)$.
Since by assumption the first network contains more information than the second, we will opt for
modifying $\bar{N}_1$. For our example this means that we eliminate the orientation of the second network
in favour of the astronomical orientation of the first network.

The modified reduced normal $\bar{N}_1'$ we are looking for will thus be the reduced normal of the relaxed
model (3.2.17). And since the solution of this relaxed model reads as (3.2.18), with
$\bar{P}_3 = I - \bar{A}_3(\bar{A}_3^tQ_{\bar{y}}^{-1}\bar{A}_3)^{-1}\bar{A}_3^tQ_{\bar{y}}^{-1}$ and $\bar{S} = R(\bar{S})$ complementary to
$Nu(\bar{P}_3\bar{A}_1) = R(\bar{V}_1) \supset Nu(P_2A_1)$, the reduced normal $\bar{N}_1'$ we are looking for follows, with
$Nu(\bar{N}_1') = R(V_1)$.

Note that since (3.2.18) is merely obtained from relaxing (3.1.1.b) to (3.2.17), the two solutions
(3.2.16.b) and (3.2.18) will be related by an appropriate S-transformation.

Now that we have the appropriate reduced normals $N_1$ and $\bar{N}_1'$, we can proceed with the Helmert
blocking method and add the two reduced normals to solve for the common unknowns. The remaining
unknowns are then found through back-substitution.
All in all the final solution reads as:

$\Delta\hat{x}_2^{(S)} = (A_2^tQ_y^{-1}A_2)^{-1}A_2^tQ_y^{-1}(\Delta y - A_1\Delta\hat{x}_1^{(S)})$ and
$\Delta\hat{\bar{x}}_3^{(\bar{S})} = (\bar{A}_3^tQ_{\bar{y}}^{-1}\bar{A}_3)^{-1}\bar{A}_3^tQ_{\bar{y}}^{-1}(\Delta\bar{y} - \bar{A}_1\Delta\hat{\bar{x}}_1^{(\bar{S})})$,   (3.2.19)

together with the common unknowns $\Delta\hat{x}_1^{(S)}$ and the transformation parameters $\Delta\hat{p}$, which can be
decomposed by means of $(V_1')$ and $(\bar{V}_1')$, with $S = R(S)$ complementary to $Nu(P_2A_1)$.

Thus if we take the customary abbreviations

$N_1 = (P_2A_1)^tQ_y^{-1}(P_2A_1)$, $\Delta n_1 = (P_2A_1)^tQ_y^{-1}\Delta y$,
$\bar{N}_1 = (\bar{P}_3\bar{A}_1)^tQ_{\bar{y}}^{-1}(\bar{P}_3\bar{A}_1)$, $\Delta\bar{n}_1 = (\bar{P}_3\bar{A}_1)^tQ_{\bar{y}}^{-1}\Delta\bar{y}$,   (3.2.20)

we can summarize the general procedure of method III as:

a) Reduce the normal systems of the two original models (3.1.1.a) and (3.1.1.b) to the reduced
   normal systems pertaining to the common unknowns: $N_1\Delta x_1 = \Delta n_1$ and $\bar{N}_1\Delta\bar{x}_1 = \Delta\bar{n}_1$.
b) Relax the reduced normal system of the second network with the aid of $(V_1')$.
c) Add the reduced normal system of the first network.
d) By means of further reduction one gets the solution for the common unknowns.
e) The remaining unknowns are found through back-substitution.   (3.2.21)

In the above approach to the Helmert blocking procedure we have seen that, as a consequence of our
general assumptions (3.1.1), the reduced normals $N_1$ and $\bar{N}_1$ are singular. Hence, in general one can
not start from the principle that both reduced normals are regular, unless 1° there are no degrees of
freedom involved, which is highly unlikely, or 2° one assumes that the degrees of freedom have already
been taken care of before applying the Helmert blocking procedure. The reduced normals will namely
be regular if for instance the S-systems of both networks are defined a priori in their non-overlapping
parts.
The question that remains to be answered is then whether one can still apply the procedure as
outlined in (3.2.21). With some slight modifications we will see that the answer is in the affirmative.
The important difference with (3.2.21) is however that we shall need additional transformation
parameters to take care of the a priori S-system definition.


L e t us s t a r t w i t h the t w o solutions one gets when the S-systems are defined i n the t w o nonoverlapping parts o f the t w o networks.
F o r the f i r s t network one would get instead of (3.2.16.a),
( s2

-1

-1

t -1
1
= ((prpl)
(P;A~))
(P;A~)
QY A Y
(
1
(s2)
t t -1
t t -1
A; s 2 ) = S ( S A Q A S )
S A Q ( A y - AIAG1
) , with
2
2
2 2 y
2 2
2 2 y
t

l0
AX
2"

t t -1
-1 t t -1
A S (S A Q A2S2)
S A Q
2 2
2 2 y
2 2 Y
= R ( S 2 ) complementary t o Nu(P A ) =
1 2
S2

P;

the solution

= I

and
R ( ( 1- A

(3.2.22.a)

t -1
-1 t -1
( A Q A1 ) A Q
)A2)
1 l Y
1 Y

and f o r the second network,

-1 -,,-

-1

t -1 = ( ( P ~ A ~Q; ) ( P ~ A ~ ) )( P p l )
Q Y- A Y ,
1
- t - t -1 " (S3)
2'
A;
B ( 5 t A- t Q--1A S )
S A Q- ( A y - A 1 ~ i l
) , with
(3.2.22.b)
3
3
3 3 y
3 3
3 3 y
-1
- r,
- t - t -1- - t - t -1
= I - A S ( S A Q - A S )
S A Q - ,and
3
3
3
3
y
3
3
-3-3y
- t -1- l - t -1
P3
= R (S
complementary t o Nu(P A
= R( ( 1 -A ( A Q- All
A Q- ) A 3
3
1 3
1 l Y
1 Y
^ (:
'

l0
AX

-,,- t

1
(

..

These two solutions are easily verified by transforming w i t h the appropriate S-transformations the
t w o solutions (3.2.16.a)

and (3.2.16.b).

For the Helmert blocking procedure we have in the above case the disposal of the reduced normal
systems

$N_1''\Delta x_1 = \Delta n_1''$ and $\bar{N}_1''\Delta\bar{x}_1 = \Delta\bar{n}_1''$,

with

$N_1'' = (P_2''A_1)^tQ_y^{-1}(P_2''A_1)$, $\Delta n_1'' = (P_2''A_1)^tQ_y^{-1}\Delta y$,
$\bar{N}_1'' = (\bar{P}_3''\bar{A}_1)^tQ_{\bar{y}}^{-1}(\bar{P}_3''\bar{A}_1)$, $\Delta\bar{n}_1'' = (\bar{P}_3''\bar{A}_1)^tQ_{\bar{y}}^{-1}\Delta\bar{y}$.

But as before we cannot simply add the two reduced normals $N_1''$ and $\bar{N}_1''$ to solve for the common
unknowns.

What we need is a modification of $\bar{N}_1''$: first, to get rid of the a priori $S_3$-system definition, which
will give us $\bar{N}_1$ back, and secondly, to incorporate $(V_1')$ to get $\bar{N}_1'$. Therefore, instead of
(3.2.21.b), the relaxed normal system needed reads (3.2.23.b), in which additional transformation
parameters $\Delta\bar{p}$ are needed to take care of the a priori $S_3$-system definition.
By adding $N_1''\Delta x_1 = \Delta n_1''$ to (3.2.23.b) we get (3.2.23.c), and after some reduction steps we obtain
(3.2.23.d). In a similar way as in (3.2.21) we then find the final solution as (3.2.23.e).
We thus see that also in the case where the S-systems of both networks are defined a priori, one can
apply the procedure as outlined in (3.2.21). The important difference is however that in the above case
additional transformation parameters are needed which, contrary to $\Delta p$, will not be
invariant to the choice of S-systems. This emphasizes once more our earlier remark about the
interpretability of the transformation parameters.
Note that solution (3.2.23.e) is essentially the same as solution (3.2.19) or (3.2.21). One can verify this
by showing that $\bigl(\Delta\hat{x}_1^{(S_2)}, \Delta\hat{x}_2^{(S_2)}, \Delta\hat{\bar{x}}_3^{(S_2)}\bigr)$ transforms with an appropriate S-transformation
to $\bigl(\Delta\hat{x}_1^{(S)}, \Delta\hat{x}_2^{(S)}, \Delta\hat{\bar{x}}_3^{(\bar{S})}\bigr)$.

In this section we have seen how the three customary methods for connecting geodetic networks
generalize if one starts from the general assumptions (3.1.1).
As to the first two methods, it is interesting to remark that in the geodetic literature one usually
assumes either one of the following two attitudes when discussing the problem of connecting geodetic
networks. Either one places the whole discussion in the context of free networks, thereby suggesting
that free networks are really something special and that they should not be confused, let alone be
compared, with "ordinary" networks. Or one assumes the attitude that the coordinates of the two
overlapping networks merely differ by a similarity transformation, which is easily taken care of by
estimating the transformation parameters in a least-squares sense. Both attitudes are however
needlessly restrictive. Although in the first approach one is normally very careful in stating what
type of networks are involved, one usually starts from the too restrictive assumption that
$Nu(P_2A_1) = Nu(P_3A_1) = R(V_1)$. In the second case, however, one often neglects to state the
basic starting assumptions. It is namely not enough to say that the two coordinate sets differ by a
similarity transformation. Important is to know what type of networks are involved. Only then will
one be able to identify which of the transformation parameters are estimable.
When reviewing the relevant geodetic literature, it is also interesting to note that those who assume
the above mentioned first attitude usually end up with the method of condition equations as solution
strategy, whereas those who assume the second attitude usually find themselves formulating the
problem in such a form that first the transformation parameters need estimation. But both methods
are of course equally applicable in principle. In fact, the aversion which is generally felt towards the
method of condition equations does not apply in the case of connecting networks, since one can argue
that method I is more tractable computation-wise than method II, in some cases at least.
As to the third method, we showed how one should go about when the S-systems are defined either
before or after the merging of the two reduced normals. Here also the fact that in general not all
transformation parameters can be treated on an equal footing became apparent.
Some authors have proposed, in the context of method III, to give weights to some of the
transformation parameters. They argue that in the case of, for instance, two networks which both are
known to contain orientational information, this is a way of deciding how much of the orientational
information of both networks is carried over to the final solution. This in itself is true of course, but
we do not think that in general this is an advisable way to go about, since it has an element of
arbitrariness in it. So far, namely, no objective criterium has been proposed on the basis of which to
decide to follow such a procedure. It seems therefore more advisable to decide on the basis of
statistical tests whether or not the two networks significantly differ in their orientation.
As a final remark we mention that in this chapter we have adopted the customary assumption that
the coordinate systems in which the two networks are described differ only differentially. If this is
not the case then one has to have recourse to either a preliminary transformation which makes the
two networks coincide approximately, or to an iteration. In the next chapter we will see that in some
cases one can do without an iteration and formulate an exact non-linear solution.

IV. GEOMETRY OF NON-LINEAR ADJUSTMENT

1. General problem statement

In the previous chapters we were primarily concerned with the linear model (1.1).

As a general solution of the linear unbiased estimation problem we found that the actual adjustment
problem was solved by the orthogonal projector and the actual inverse linear mapping problem by the
inverse map, where $B: M \to N$ is allowed to be any arbitrary inverse of the linear map $A: N \to M$.

In this chapter we take up the study of non-linear adjustment, a problem which heretofore has almost
been avoided in the geodetic literature. To this end we replace the linear map $A$ by a non-linear map
$y: N \to M$. Instead of the linear model (1.1) we then have the non-linear model.

It seems natural now to extend our results of the linear theory to the companion problem of non-linear
operators. But unfortunately one can very seldom extend the elegant formulations and solution
techniques from linear to non-linear situations.
In correspondence with the linear theory, the problem of non-linear adjustment can roughly be divided
into (a) the problem of finding the estimates $\hat{x}$ and $\hat{y}$, and (b) the problem of finding the statistical
properties of the estimators involved. In order to keep our non-linear adjustment problem
surmountable we will restrict ourselves to least-squares estimation, and we also assume for the
moment that the map $y$ is injective. Our non-linear least-squares adjustment problem then reads

$\min_{x\in N}\ E(x) = \min_{y\in\bar{N}=y(N)}\ \tfrac{1}{2}\,(y_s - y,\, y_s - y)_M = \tfrac{1}{2}\,(y_s - \hat{y},\, y_s - \hat{y})_M$   (1.3)

(the factor $\tfrac{1}{2}$ is merely inserted for convenience).


In order to solve for $\hat{y}$ and $\hat{x}$ we need non-linear maps $P: M \to \bar{N}$ and $y^{-1}: \bar{N} \to N$
such that

1° $\hat{y} = P(y_s)$, and 2° $\hat{x} = y^{-1}(\hat{y})$, with $y^{-1}\circ y = \text{identity}$.

Due, however, to the non-linearity of the map $y$, it is very seldom that one can find closed expressions for
the maps $P$ and $y^{-1}$ (there are exceptions!). In practice one will therefore have to have recourse to
methods which are iterative in nature. One starts with a given point $x_0$, the initial guess, and proceeds
to generate a sequence $x_0, x_1, x_2, \ldots$ which hopefully converges to the point $\hat{x}$. Most methods
which are discussed in the literature (see e.g. Ortega and Rheinboldt, 1970) adhere to the following
scheme:

$x_{q+1} = x_q + t_q\,\Delta x_q$, no summation over $q$,

with:

(i) Set $q=0$. An initial guess $x_0$ is provided externally.
(ii) Determine an increment vector $\Delta x_q$ in the direction of the proposed step.
(iii) Determine a scalar $t_q$ such that $\|y_s - y(x_q + t_q\Delta x_q)\|_M \leq \|y_s - y(x_q)\|_M$, i.e.,
such that the $q$th step may be considered an improvement over the $(q-1)$th step. The way in which $t_q$
is chosen is known as a line search strategy.
(iv) Test whether the termination criterion is met. If so, accept $x_{q+1}$ as the value of $\hat{x}$. If
not, increase $q$ by one and return to (ii).
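As a concrete illustration of such a scheme, the following sketch (a hypothetical example, not from the text) applies Gauss' iteration with constant step length $t_q = 1$ to the orthogonal projection of a point onto the unit circle, i.e. a one-parameter curved manifold.

```python
import numpy as np

# min ||ys - f(x)||^2 for the curve f(x) = (cos x, sin x): orthogonal
# projection of the point ys onto the unit circle.
def f(x):
    return np.array([np.cos(x), np.sin(x)])

def J(x):                        # Jacobian (tangent vector of the curve)
    return np.array([[-np.sin(x)], [np.cos(x)]])

ys = np.array([0.3, 0.8])
x = 0.0                          # initial guess x0
for q in range(50):
    r = ys - f(x)                # residual at the current iterate
    A = J(x)
    dx = np.linalg.solve(A.T @ A, A.T @ r)   # Gauss iteration step
    x += dx[0]                   # step length t_q = 1 (no line search)
    if abs(dx[0]) < 1e-12:       # termination criterion (iv)
        break

# The limit is the angle of ys, i.e. the projection onto the circle:
print(np.isclose(x, np.arctan2(0.8, 0.3)))   # True
```

The iterates converge linearly; the asymptotic rate observed here is about 0.146, which equals the curvature of the circle (k = 1) times the length of the least-squares residual vector (|1 − ‖ys‖| ≈ 0.146) — consistent with the role of curvature and residual discussed later in this chapter.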

Generally one can say that the individual methods falling under (1.4) differ in their choice of the
increment vector $\Delta x_q$ and the scalar $t_q$. The iterative techniques fall roughly into two classes:
direct search methods and gradient methods. Direct search methods are those which do not require
the explicit evaluation of any partial derivatives of the function $E$, but instead rely solely on values
of the objective function $E$, plus information gained from the earlier iterations. Gradient methods on
the other hand are those which select the direction $\Delta x_q$ using values of the partial derivatives of
the objective function $E$ with respect to the independent variables, as well as values of $E$ itself,
together with information gained from earlier iterations. The required derivatives, which for some
methods are of order higher than the first, can be obtained either analytically or numerically using
some finite difference scheme. This latter approach necessitates extra function evaluations close to
the current point $x_q$ and effectively reduces a gradient method to one of direct search.
We will not attempt to give an exhaustive list of iteration methods which could possibly solve our
adjustment problem (1.3). For a comprehensive survey of the various methods we refer the reader to
the encyclopaedic work of (Ortega and Rheinboldt, 1970). Instead, we restrict ourselves to the
gradient method which seems to be pre-eminently suited for our least-squares adjustment problem,
namely Gauss' iteration method. This method can be considered as the natural generalization of the
linear case and it is the only method which fully exploits the sum-of-squares structure of the
objective function $E$.
As to the second problem, namely that of finding the statistical properties of the estimators involved,
we will not present a complete treatment of the statistical theory dealing with non-linear
adjustment. We cannot expect a well working theory for the non-linear model as we know it for the
linear one. The probability distribution of the non-linear estimator of $x$, for instance, depends on both
the non-linear map $P$ and on the distribution of the data. Hence, it depends on the "true" values of $x$,
which are generally unknown. Therefore, even when we can derive a precise formula for the
distribution of the estimator, we can in general only evaluate the approximation obtained by
substituting the estimated parameter values for the "true" ones.

The plan for this chapter is the following.
As said, we will discuss Gauss' iteration method in some detail. We have chosen to make use of
differential geometry as a tool for studying Gauss' method. We strongly believe namely that geometry,
and in particular differential geometry, provides us with a better and richer understanding of the
complicated problem of non-linear adjustment. Many of the geometric concepts developed in
differential geometry turn out to be important indicators, qualitatively as well as quantitatively, of
how non-linearity manifests itself in the local behaviour of Gauss' method and in the statistical
properties of the estimators. We therefore commence in section 2 with a brief introduction into
Riemannian geometry.
In section 3 we consider the problem of univariate non-linear least-squares. That is, we consider the
problem of orthogonal projection onto a parametrized space curve. For this purpose we first study the
local geometry of a space curve with the aid of the so-called Frenet frame and Frenet formulae. The
geometrical impact of the Frenet formulae is that if $T$ and $N$ are respectively the unit tangent vector
and unit normal to a plane curve and $s$ its arclength parameter, then to an accuracy of the order of
the second power of the small quantity $\Delta s$ we have

$T + \Delta T = \cos(k\Delta s)\,T + \sin(k\Delta s)\,N$,
$N + \Delta N = -\sin(k\Delta s)\,T + \cos(k\Delta s)\,N$,

i.e., the Frenet formulae embody the fact that the Frenet frame $(T,N)$ undergoes a rotation
depending on the curvature $k$ of the plane curve as one moves from the point on the curve
corresponding to $s$ to the nearby point corresponding to $s + \Delta s$. It is this observation on which
most of our further developments are based.
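The rotation property of the Frenet frame can be checked numerically; for a circle of curvature k the rotation formulae are in fact exact, so the following sketch (a minimal hypothetical example) verifies them to machine precision.

```python
import numpy as np

# Frenet frame of a circle of radius 1/k, parametrized by arclength s:
# moving a step ds along the curve rotates the frame (T, N) by k*ds.
k, s, ds = 2.0, 0.4, 1e-3

def frame(s):
    T = np.array([np.cos(k * s), np.sin(k * s)])    # unit tangent
    N = np.array([-np.sin(k * s), np.cos(k * s)])   # unit normal
    return T, N

T0, N0 = frame(s)
T1, N1 = frame(s + ds)
a = k * ds                                          # rotation angle

# Frenet rotation formulae:
print(np.allclose(T1,  np.cos(a) * T0 + np.sin(a) * N0))   # True
print(np.allclose(N1, -np.sin(a) * T0 + np.cos(a) * N0))   # True
```

For a general plane curve the same relations hold to second order in ds, with k the local curvature.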


After having studied the local geometry of a space curve, we show how curvature affects the local
behaviour of Gauss' method. The section is closed with some examples and preliminary conclusions.
In section 4 we consider the case of multivariate non-linear least-squares adjustment. That is, we
consider the problem of orthogonal projection onto a parametrized submanifold. In order to
generalize the results of section 3 we have to find an appropriate generalization of the Frenet
formulae. This we find in the so-called Gauss' equation. With the aid of the normal field $B$, which can
be considered as the multivariate generalization of the second fundamental tensor $b$ known from
classical surface theory (see e.g. Stoker, 1969), we then show how the extrinsic curvatures of the
submanifold affect the local behaviour of the multivariate Gauss' iteration method. At the end of
subsection 4.4 we summarize the more important conclusions. The section is ended with a subsection
in which we show how Gauss' method can be made into a globally convergent iteration method.
In section 5 we start by considering the classical two dimensional Helmert transformation as a typical
example of a totally geodesic submanifold, i.e. a manifold for which all extrinsic curvatures are
identically zero. Next we show that for a particular class of manifolds, namely ruled surfaces,
important simplifications of the non-linear least-squares adjustment problem can be obtained through
dimensional reduction. Based on this idea we then present a non-linear generalization of the classical
two dimensional Helmert transformation, which we call the two dimensional Symmetric Helmert
transformation. We also give the solution of the two dimensional Symmetric Helmert transformation
when a non-trivial rotational invariant covariance structure is pre-supposed. After this we generalize
our results to three dimensions. Finally we give some suggestions as to how to estimate the extrinsic
curvatures in practice, and we estimate the curvature of some simple 2-dimensional geodetic
networks.
In the last but one section we briefly discuss some of the consequences of non-linearity for the
statistical treatment of an adjustment. We also show how the first moments of the estimators are
affected by curvature.

2. A brief introduction into Riemannian geometry

We cannot expect to convey here much of the theory of Riemannian geometry. For a comprehensive treatment of the theory we refer the reader to the relevant mathematical literature (see e.g. Spivak, 1975).
Riemannian geometry is a generalization of metric differential geometry of surfaces. Instead of surfaces one considers n-dimensional Riemannian manifolds. These are obtained from differential manifolds by introducing a Riemannian metric, that is, a metric defined by a quadratic differential form whose coefficients are the components of a two times covariant positive definite symmetric tensor field. The corresponding geometry is called Riemannian geometry.
Surfaces, with their usual metric inherited or induced from the ambient 3-dimensional Euclidean space, are 2-dimensional Riemannian manifolds, and part of our considerations will be a generalization of ideas from the theory of surfaces and curves. However, for n = 1 or 2 there are many simplifications that have no counterpart when n > 2. Consequently, a number of new facts and concepts will have to be introduced in the following sections.

In this section we only present briefly some of the basic notions of Riemannian geometry. We first consider manifolds. An n-dimensional differentiable or smooth manifold can roughly be described as a set of points tied together continuously and differentiably, so that the points in any sufficiently small region can be put into a one-to-one correspondence with an open set of points in IR^n. That correspondence furnishes then a coordinate system for the neighbourhood. Moreover the passage from one coordinate system to another is assumed to be smooth in the overlapping region.
The manifold concept generalizes and includes the special cases of the real line, plane, linear vector space and surfaces which are studied in the classical theory. The mathematician (see e.g. Hirsch, 1976) usually begins his development of differential topology by introducing some primitive concepts, such as sets and topology of sets, then builds an elaborate framework out of them and uses that framework to define the concept of a differential manifold. For our present application, however, we can ignore most of the topological aspects. They are either very natural, such as continuity and connectedness, or highly technical. Moreover, our analysis in subsequent sections will mainly be of a local nature, i.e. differential geometry in the small. For differential geometry in the small one can do without the global considerations in most cases, since one assumes that a single coordinate system without singularities covers the portion of the manifold studied.
We have chosen to define manifolds as subsets of some big, ambient space IR^k. This has the advantage that manifolds appear as objects already familiar to those who studied the classical theory of surfaces and it also enables us to surpass many of the topological concepts. Suppose that N is a subset of some big, ambient space IR^k. Then N is an n-dimensional manifold if it is locally diffeomorphic to IR^n; this means that each point of N possesses a neighbourhood V = V' ∩ N, for some open set V' of IR^k, which is diffeomorphic to an open set U of IR^n. The two sets U ⊂ IR^n and V ⊂ N are said to be diffeomorphic if there exists a map h: U → V which is one-to-one, onto and smooth in both directions. This diffeomorphism is called a parametrization of the neighbourhood V. Its inverse h^{-1}: V → U is called a coordinate system on V. When the map h^{-1} is written in coordinates, h^{-1} = (x^1, ..., x^n), the n functions x^a, a = 1, ..., n, are called coordinate functions.
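As a small sketch of these notions (our own illustration, not part of the text): take for the manifold the upper half of the unit circle in IR^2. The map h(u) = (cos u, sin u), defined on the open set U = (0, π) of IR^1, is a parametrization of this neighbourhood, and its inverse, computable with atan2, is the corresponding coordinate system with a single coordinate function.

```python
import math

def h(u):
    # parametrization h: U = (0, pi) -> V, with V the upper half of the unit circle
    return (math.cos(u), math.sin(u))

def h_inv(p):
    # coordinate system h^{-1}: V -> U; its single component is the coordinate function x^1
    x, y = p
    return math.atan2(y, x)

u = 1.2
print(abs(h_inv(h(u)) - u) < 1e-12)   # True: h is one-to-one with smooth inverse
```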

As a simple geodetic example of a manifold, let N be the set of all planar geodetic networks having, say, ½n number of points. Each planar geodetic network represents then a point of N. The most obvious way to give N a manifold structure is then by taking the diffeomorphism h^{-1}: N → IR^n as the identity map. The coordinate functions are then the standard cartesian coordinates. However, one could of course also take polar coordinates, cylindrical coordinates, spherical coordinates or any of the other customary curvilinear coordinates, provided they are suitably restricted so as to be one-to-one and have non-zero Jacobian determinant.
If two sets N̄ and N both are manifolds and N̄ ⊂ N, then N̄ is said to be a submanifold of N. In particular, any open set of N is a submanifold of N. Assume for instance that N̄ is the set of all planar geodetic networks having ½n number of points, with the additional restrictions that, say, some distances between some network points are taken to be constant. Then N̄ can be shown to be a submanifold of the above defined N.


Let us consider the linear approximation of a manifold N, i.e. its tangent space. The vectors in it are the tangent vectors to N. Let c be a point on the manifold N and let c trace out a curve c(t). In local coordinates the curve is given by c^a(t) = x^a(c(t)), a = 1, ..., n. The velocity vector to this curve is given by dc^a/dt. It is now established practice in differential geometry to generalize the classical definition of tangent vector, and to consider a differential operator as tangent vector. To do this we take a real-valued function E(x) defined on N and consider its rate of change along the curve c(t). The rate of change of E(x) in the direction of c(t) is dE/dt. In local coordinates this becomes ∂_a E dc^a/dt (here we have abbreviated ∂E/∂x^a by ∂_a E). In other words, dE/dt is obtained by applying the differential operator T = (dc^a/dt) ∂_a to E. It is T which we now define as the tangent to N at c in the direction given by the curve c(t). If we apply T to the local coordinate functions x^β, β = 1, ..., n, we obtain the traditional velocity vector, i.e. T(x^β) = dc^β/dt. So, a tangent vector T is now a differential operator of the form T = T^a ∂_a, with T^a = dc^a/dt. The space of all possible tangents at a point c is called the tangent space of N at c and is written as T_cN. In terms of local coordinates the differential operators ∂_a, a = 1, ..., n, form a basis of T_cN. If the components T^a(x) are smooth functions, then T is called a vector field on N.

In addition to partial differentiation, a second differential operator is commonly introduced on a manifold. This is the operator of covariant differentiation. It is closely related to the concept of connections.
The subject begins by observing that the tangent spaces T_xN, T_x'N at two neighbouring points x and x' change as one moves from x to x'. A connection is essentially a structure which endows one with the ability to compare two such tangent spaces at a pair of infinitesimally separated points. The connection is given by defining what is called parallel transport or parallel translation in N. Consider T_xN and T_x'N, and any curve, c say, joining x to x'. Let T be a tangent to the curve c at x; then T is said to be parallelly transported along the curve c if T is pushed from x to x' in such a way to always remain parallel to itself. If t is the parameter of the curve then the covariant derivative of T is the rate of change of T with respect to t. This covariant derivative will differ from the ordinary partial derivative; the quantity that measures this difference is the connection.
Let X and Y be vector fields on N. The covariant derivative of Y with respect to X is then denoted by ∇_X Y and it is a vector field on N. The application of the operator ∇ is defined to be linear in both its arguments and must satisfy the chain rule

    ∇_X (fY) = X(f) Y + f ∇_X Y ,        (2.1)

where f is any real-valued smooth function on N. With the local coordinate expressions X = X^α ∂_α, Y = Y^α ∂_α we therefore get

    ∇_X Y = X^α ∂_α(Y^β) ∂_β + X^α Y^β ∇_{∂_α} ∂_β ,

which shows that ∇_X Y is totally specified once ∇_{∂_α} ∂_β is given. It is customary to express these vector fields in the coordinate fields ∂_α as

    ∇_{∂_α} ∂_β = Γ^γ_{αβ} ∂_γ ,   α, β, γ = 1, ..., n .        (2.2)

The n³ real-valued smooth functions Γ^γ_{αβ} determine the connection and are called the connection coefficients.
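In coordinates (2.1) and (2.2) give (∇_X Y)^γ = X^α ∂_α Y^γ + Γ^γ_{αβ} X^α Y^β. The following sketch (our own illustration, with the partial derivatives replaced by finite differences) evaluates this for the polar coordinates (r, φ) of the Euclidean plane, whose non-zero connection coefficients are Γ^r_{φφ} = −r and Γ^φ_{rφ} = Γ^φ_{φr} = 1/r:

```python
# connection coefficients G[c][a][b] = Γ^c_{ab} for polar coordinates (r, φ)
def gamma(r, phi):
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -r          # Γ^r_{φφ}
    G[1][0][1] = 1.0 / r     # Γ^φ_{rφ}
    G[1][1][0] = 1.0 / r     # Γ^φ_{φr}
    return G

def covariant_derivative(X, Y, x, h=1e-6):
    """Components of ∇_X Y at x = (r, φ):
       (∇_X Y)^c = X^a ∂_a Y^c + Γ^c_{ab} X^a Y^b  (∂_a by central differences)."""
    n = len(x)
    Xc, Yc, G = X(x), Y(x), gamma(*x)
    out = []
    for c in range(n):
        val = 0.0
        for a in range(n):
            xp = list(x); xp[a] += h
            xm = list(x); xm[a] -= h
            dYc = (Y(xp)[c] - Y(xm)[c]) / (2 * h)   # ∂_a Y^c
            val += Xc[a] * dYc
            for b in range(n):
                val += G[c][a][b] * Xc[a] * Yc[b]
        out.append(val)
    return out

d_phi = lambda x: (0.0, 1.0)                              # the coordinate field ∂_φ
print(covariant_derivative(d_phi, d_phi, (2.0, 0.3)))     # → [-2.0, 0.0], i.e. ∇_{∂φ}∂φ = −r ∂_r
```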

Let c(t) be a curve in N. A vector field X on N is then said to be a parallel vector field along the curve c(t) if its covariant derivative with respect to the direction T = (dc^α/dt) ∂_α is identically zero, i.e.,

    ∇_T X = 0 .        (2.3)

There are special types of curves c(t) which are so-called self parallel. That is, parallel transport from t to t' takes the velocity vector at c(t) into the velocity vector at c(t'). These curves are called geodesics. Since the covariant derivative ∇_T T measures the rate of change of T in the direction T under parallel transport, an equation describing the above definition of a geodesic is simply

    ∇_T T = 0 ,        (2.4)

where T is the velocity vector of c(t). With T = (dc^α/dt) ∂_α, (2.1) and (2.2), (2.4) becomes in local coordinates

    d²c^γ/dt² + Γ^γ_{αβ} (dc^α/dt)(dc^β/dt) = 0 ,   γ = 1, ..., n .
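A numerical check of this equation (our own sketch, not from the text): in the polar coordinates of the Euclidean plane, with Γ^r_{φφ} = −r and Γ^φ_{rφ} = Γ^φ_{φr} = 1/r, the geodesics must be the straight lines of the plane. Integrating the system with a crude Euler scheme from a point and velocity taken on the vertical line x = 1 indeed keeps x = r cos φ constant:

```python
import math

def geodesic_step(state, dt):
    # state = (r, phi, vr, vphi); accelerations from
    # d²c^γ/dt² = −Γ^γ_{αβ} (dc^α/dt)(dc^β/dt), with Γ^r_{φφ} = −r, Γ^φ_{rφ} = 1/r
    r, phi, vr, vphi = state
    ar = r * vphi**2                 # −Γ^r_{φφ} vφ vφ
    aphi = -2.0 * vr * vphi / r      # −2 Γ^φ_{rφ} vr vφ
    return (r + vr*dt, phi + vphi*dt, vr + ar*dt, vphi + aphi*dt)

# start on the vertical line x = 1 at (r, φ) = (1, 0), with that line's velocity
state = (1.0, 0.0, 0.0, 1.0)
dt, steps = 1e-4, 10000
for _ in range(steps):
    state = geodesic_step(state, dt)

x = state[0] * math.cos(state[1])
print(abs(x - 1.0) < 0.01)   # True: the integrated geodesic stays on the straight line x = 1
```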

So far we have equipped the manifold N only with a connection given by the defining equation (2.2). We will now give it some additional structure. Assume given a smooth real-valued, symmetric and positive-definite bi-linear map (.,.): T_xN × T_xN → IR. A manifold equipped with such a bi-linear map is called a Riemannian manifold. The bi-linear map (.,.) is called the metric tensor and in local coordinates it is given by the smooth functions g_{αβ}(x) = (∂_α, ∂_β)_x.
There is a unique symmetric connection on a Riemannian manifold such that parallel translation preserves the Riemannian metric. It is called the Levi-Civita or Riemannian connection. It is that unique connection satisfying

    ∇_X Y − ∇_Y X = XY − YX ,        (2.5.a)
    X (Y, Z) = (∇_X Y, Z) + (Y, ∇_X Z) ,        (2.5.b)

for any vector fields X, Y and Z on N. A connection satisfying (2.5.a) is said to be symmetric or torsionfree, and a connection satisfying (2.5.b) is said to be metric.

Up till now we have considered only one manifold N. Let us now consider two manifolds N and M, and a smooth injective map y between them, i.e. y: N → M. Then the image N̄ = y(N) ⊂ M defines a submanifold of M.
The map y provides a way of mapping vectors on N into vectors on M. The image of T_xN under y is a tangent space of N̄ at y(x), and is denoted by T_{y(x)}N̄. This map between tangent spaces induced by y is written y_*: T_xN → T_{y(x)}N̄ and is called the push forward of y. The precise action on a vector X is such that given a function f on M, so that f(y(x)) is a function on N, then y_*(X) ∈ T_{y(x)}N̄ is defined by (y_* X) f = X f(y(x)). With X = X^α ∂_α this would give in local coordinates

    y_*(X) = X^α ∂_α y^i ∂_i ,

where y^i, i = 1, ..., m, are the local coordinate functions of M, ∂_i, i = 1, ..., m, the corresponding coordinate vector fields and y^i(x) the coordinization of the map y: N → M.

Although it is possible to suppress explicit reference to the map y, to identify N with the subset y(N) of M and each T_xN with the subspace y_*(T_xN) of T_{y(x)}M, we will not do so. Recall namely that also in the case of linear maps we are not used to identifying the range space with the domain space, although both spaces are isomorphic.

As a closing of this section we define the observation- and parameter space of our adjustment. In our least-squares adjustment context the observation space M is taken to be Euclidean with Euclidean metric (.,.)_M. The coefficients of the metric are given by the real-valued constants g_{ij} = (∂_i, ∂_j)_M. The connection compatible with the Euclidean metric of M will be denoted by D. And since D_{∂_i} ∂_j = 0, i, j = 1, ..., m, we have for any two vector fields V and W on M that D_V W = V(W^j) ∂_j, i.e. the covariant derivative reduces to the ordinary vector derivative. The directional derivative of a function f on M in a direction V will sometimes be denoted by D_V f.
Manifold N will play the role of the parameter space and the non-linear map y replaces the linear map A which has been used hitherto. Manifold N will be endowed with a Riemannian metric by pulling the metric of M back by y. That is, given the metric of M we define the metric of N by

    (X, Y)_{T_xN} = (y_* X, y_* Y)_M   for any X, Y ∈ T_xN .
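In local coordinates this pull-back reads g_{αβ}(x) = ∂_α y^i g_{ij} ∂_β y^j; for the standard metric of M this is g = JᵗJ, with J the Jacobian of y. A small sketch of our own (finite-difference derivatives, not part of the text):

```python
import math

def pullback_metric(y, x, h=1e-6):
    """g_{ab}(x) = sum_i (∂_a y^i)(∂_b y^i): the metric of N induced
       from a Euclidean observation space M through the map y."""
    n = len(x)
    def col(a):
        xp = list(x); xp[a] += h
        xm = list(x); xm[a] -= h
        return [(p - m) / (2*h) for p, m in zip(y(xp), y(xm))]   # ∂_a y
    J = [col(a) for a in range(n)]
    return [[sum(J[a][i]*J[b][i] for i in range(len(J[a])))
             for b in range(n)] for a in range(n)]

# the curve c(t) = (R cos t, R sin t) in M = IR^2: induced metric g(t) = R^2, constant
c = lambda x: (3.0*math.cos(x[0]), 3.0*math.sin(x[0]))
print(round(pullback_metric(c, [0.7])[0][0], 6))   # → 9.0
```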

3. Orthogonal projection onto a parametrized space curve

3.1. Gauss' iteration method

It seems reasonable that we should begin our discussion of non-linear least-squares adjustment with the simplest class of problems, namely those in which manifold N is one dimensional. In case of our least-squares problem

    min. (y_s − y, y_s − y)_M ,   y ∈ N̄ = y(N) ,

this means that we need to consider the problem of orthogonally projecting the observation-point y_s ∈ M onto a space curve. Since we like to denote the space curve by c(t), we replace the map y: N → M in this section by the map c: t ∈ IR = N → M. Our univariate least-squares adjustment problem reads then

    min. (y_s − y, y_s − y)_M ,   y ∈ N̄ = c(N) .        (3.3)

From geometric reasoning it will be clear that a necessary condition for ĉ to be the least-squares solution of (3.3), is that

    ( c_*(d/dt), y_s − ĉ )_M = 0        (3.4)

must hold, where d/dt is a basis of T_t̂ IR = T_t̂ N. In the linear case it was necessary and also sufficient for the residual vector to be orthogonal to the linear submanifold N̄ = AN. In the non-linear case however, it is necessary but not sufficient.
Since the residual vector y_s − ĉ needs to be orthogonal to the linear tangent space T_ĉ(c(N)) = T_ĉN̄ of the non-linear manifold N̄ = c(N) at ĉ, we need to know ĉ. But due to the assumed non-linearity of the mapping c: N = IR → M, the tangent space T_ĉ(c(N)) is generally unknown a priori. Hence our minimization problem cannot be solved directly. Expression (3.4) does however suggest a way of solving our adjustment problem. Instead of orthogonally projecting y_s onto the tangent space T_ĉN̄, one can take as a first approximation the orthogonal projection of y_s onto a nearby tangent space, T_{c(t_q)}N̄ say. Of course then, in general,

    ( c_*(d/dt), y_s − c(t_q) )_M ≠ 0 .        (3.5)

But by pulling the non-orthogonality as measured by (3.5) back to the Riemannian manifold N, we get

    Δt_q = g(t_q)^{-1} ( c_*(d/dt), y_s − c(t_q) )_M ,   with Δt_q (d/dt) ∈ T_{t_q}IR = T_{t_q}N ,        (3.6)

which suggests in local coordinates the following iteration procedure:

    t_{q+1} = t_q + g(t_q)^{-1} ( c_*(d/dt), y_s − c(t_q) )_M ,   q = 1, 2, ... ,        (3.7)

where g(t) is the induced metric of N = IR.
This is Gauss' iteration method and it consists of successively solving a linear least-distance adjustment problem until condition (3.4) is met.
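A minimal numerical sketch of scheme (3.7) (our own illustration, with c_*(d/dt) obtained by finite differences): orthogonal projection of an observation point onto the parabola c(t) = (t, t²) in M = IR². At convergence the orthogonality condition (3.4) is satisfied.

```python
def gauss_iterate(c, ys, t, steps=30, h=1e-6):
    # t_{q+1} = t_q + g(t_q)^{-1} ( c_*(d/dt), y_s - c(t_q) )_M, scheme (3.7)
    for _ in range(steps):
        p = c(t)
        v = [(a - b)/(2*h) for a, b in zip(c(t+h), c(t-h))]   # velocity by central differences
        g = sum(vi*vi for vi in v)                            # induced metric g(t)
        t += sum(vi*(yi - pi) for vi, yi, pi in zip(v, ys, p)) / g
    return t

c = lambda t: (t, t*t)               # a space curve (parabola) in M = IR^2
ys = (0.0, 2.0)                      # observation point
t_hat = gauss_iterate(c, ys, 1.0)

v = (1.0, 2.0*t_hat)                 # tangent at the solution
r = (ys[0] - t_hat, ys[1] - t_hat**2)
print(abs(v[0]*r[0] + v[1]*r[1]) < 1e-8)   # True: residual orthogonal to tangent, condition (3.4)
```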
Before we now proceed with studying the local behaviour of Gauss' iteration method (3.7), we will first derive some local geometric properties of the space curve c itself. An appropriate approach for studying the local geometry of curve c is by using

3.2. The Frenet frame

With the tangent field (or velocity field if one considers t ∈ IR to be a time parameter) V = c_*(d/dt) of curve c(t), we obtain for non-zero velocities the unit tangent field T as

    T = V / ||V||_M .

And since (T, T)_M = 1 for all admissible t ∈ IR, we have

    (D_T T, T)_M = 0 ,

which shows that D_T T is orthogonal to the unit tangent field T. We define the first curvature k1 as

    k1 = || D_T T ||_M ,

and when k1 > 0 the first normal N1 by

    N1 = D_T T / k1 .

Geometrically the first curvature k1 can be seen to determine the rate of change of the direction of the tangent to the curve with respect to its arclength, where arclength is defined as

    s(t) = ∫ ||V||_M dt .

The curvature k1 is a property of the curve c and it is invariant to a reparametrization.
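A numerical check of this invariance (our own sketch, using the definitions above with finite differences): for the circle c(t) = (R cos t, R sin t), and for the same circle under the reparametrization t = u³ + u, both computations give k1 = 1/R.

```python
import math

def first_curvature(c, t, h=1e-5):
    """k1 = ||D_T T||_M, with T = V/||V|| and D_T T = (dT/dt)/s', s' = ||V||."""
    def V(t):
        return [(p - m)/(2*h) for p, m in zip(c(t+h), c(t-h))]
    def T(t):
        v = V(t); n = math.sqrt(sum(x*x for x in v))
        return [x/n for x in v]
    dT = [(p - m)/(2*h) for p, m in zip(T(t+h), T(t-h))]
    sp = math.sqrt(sum(x*x for x in V(t)))
    return math.sqrt(sum(x*x for x in dT)) / sp

R = 2.0
circle  = lambda t: (R*math.cos(t), R*math.sin(t))
reparam = lambda u: circle(u**3 + u)          # same curve, different parameter
print(round(first_curvature(circle, 0.4), 4),
      round(first_curvature(reparam, 0.4), 4))   # both ≈ 1/R = 0.5
```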


From the orthogonality of N1 and T follows that

    (D_T N1 + k1 T, T)_M = 0 ,

which shows that T is orthogonal to D_T N1 + k1 T. Similarly it follows from ||N1||_M = 1 that D_T N1 + k1 T is orthogonal to both N1 and T. We now define the second curvature k2 as

    k2 = || D_T N1 + k1 T ||_M ,

and when k2 > 0 the second normal N2 by

    N2 = ( D_T N1 + k1 T ) / k2 .

We can proceed in this way to define k3, N3 etc. The vectors T, N1, N2, ... are called the Frenet vectors and the equations that express the D_T T, D_T Ni in terms of the Frenet vectors are called the Frenet equations. For the case m=3 they read as

    D_T T = k1 N1 ,   D_T N1 = −k1 T + k2 N2 ,   D_T N2 = −k2 N1 .        (3.13)

In order to find the relative position of the curve c with respect to its Frenet frame at some regular point, we can study the projections of the curve onto the planes of the Frenet frame. For convenience we assume that the curve c has been parametrized with the arclength parameter s. Now let our point, P say, correspond to the value s = 0 of the arclength parameter. The curve c(s) can then be written in the form

    c(s) = c(0) + s (dc/ds)_o + (s²/2) (d²c/ds²)_o + (s³/6) (d³c/ds³)_o + o(s³) .        (3.14)

The subscript "o" denotes that the value is taken at the point corresponding to s = 0. And Landau's o(.) symbol means that o(s³)/s³ → 0 if s → 0. Since dc/ds = T and D_T T = k1 N1, it follows that

    d²c/ds² = k1 N1   and   d³c/ds³ = D_T (k1 N1) = (dk1/ds) N1 + k1 (−k1 T + k2 N2) .

Substituting the above two expressions into (3.14) gives then, with k1, k2 and dk1/ds evaluated at s = 0,

    c(s) = c(0) + ( s − (s³/6) k1² ) T_o + ( (s²/2) k1 + (s³/6) dk1/ds ) N_{1,o} + (s³/6) k1 k2 N_{2,o} + o(s³) .

Choose now a special coordinate system in M such that the point P under consideration is the origin and the vectors T_o, N_{1,o} and N_{2,o} are the unit vectors of the first three coordinate axes. In this coordinate system the curve c(s) can be represented by the equations

    c¹(s) = s − (s³/6) k1²(0) + o(s³) ,
    c²(s) = (s²/2) k1(0) + (s³/6) (dk1/ds)(0) + o(s³) ,
    c³(s) = (s³/6) k1(0) k2(0) + o(s³) .        (3.15)

These equations are called the canonical representation of curve c(s) at s = 0, and the leading terms in it conveniently describe the behaviour of c(s) near the point corresponding to s = 0. It will be clear that many curves exist which have up to o(s³) the same canonical representation as c(s). That is, for s small enough these curves behave alike and are thus indistinguishable.

We will now give a characterization of such "kissing" curves and one of them, namely

3.3. The "kissing" circle

will be used for a further analysis of Gauss' iteration scheme (3.7).


Consider two curves c1(s1) and c2(s2) with a common point c1(0) = c2(0). s1 and s2 are taken as their natural arclength parameters. Let c1(s1=h) and c2(s2=h) be two points on respectively c1(s1) and c2(s2). We say that the two curves have a contact of order n if

    lim_{h→0} || c1(h) − c2(h) || / h^n = 0 ,   but   lim_{h→0} || c1(h) − c2(h) || / h^{n+1} ≠ 0 .

From this follows that two curves c1(s1) and c2(s2) have a contact of order n at a regular point corresponding to s1 = s2 = 0 if and only if

    (d^k c1^i / ds1^k)(0) = (d^k c2^i / ds2^k)(0) ,   k = 1, ..., n ,   i = 1, ..., m ,

where the coordinates of the two curves are given with respect to a fixed frame of M. With (3.15) follows then that two curves have a contact of order at least two at a common point P if and only if they have at P a common tangent vector T_o, a common normal N_{1,o} and moreover, the same curvature k1(0). All such curves will thus have the same canonical representation

And in the above sense of contact such curves can be considered each other's best approximation.
Now, if we recall our iteration scheme (3.7) we observe that only first order derivative information is used. Hence, for a small enough portion of the curve c(s) about the least-squares solution ĉ = c(0), we can replace the space curve c(s) by any curve having a contact of order at least two with c(s) at ĉ. In fact, with the same approximation we can replace the space curve c(s) by the circle

    C(s) = ĉ + k1(0)^{-1} [ sin(k1(0)s) T_o + (1 − cos(k1(0)s)) N_{1,o} ] .        (3.16)

This follows from the fact that C(s) has at s = 0 the same unit tangent T_o, the same first normal N_{1,o} and the same first curvature k1(0) as c(s). Thus we can use the circle C(s) to replace the curve c(s) in a neighborhood of P. The circle C(s) is known as the osculating (= "kissing") circle of c(s) at ĉ = c(0), or the circle of curvature.
Note that by replacing c(s) by C(s) we achieve a drastic simplification of our original non-linear least-squares adjustment problem. First of all we achieve a dramatic decrease in dimensionality: c(s) generally lies in the m-dimensional space M, whereas C(s) lies in a two-dimensional plane of M spanned by T_o and N_{1,o}. And secondly we can now exploit the simple geometry of the osculating circle C(s) in order to understand the local behaviour of Gauss' iteration method (3.7).
Consider therefore the situation as sketched in figure 24.

figure 24

ȳ_s is the orthogonal projection of the observation point y_s ∈ M onto the plane spanned by T_o and N_{1,o}, and C(s1) is the initial guess to start the iteration procedure. Since the orthogonal projection of y_s onto the tangent of C(s) at C(s1) gives the same increment Δs1 as the orthogonal projection of ȳ_s, we have for our first iteration step

    Δs1 = ( T(s1), ȳ_s − C(s1) )_M .        (3.17)

From the figure also follows that in a sufficiently small neighbourhood of ĉ,

    Δs1 = − ( k1(0)^{-1} − (N_{1,o}, ȳ_s − ĉ)_M ) sin φ1 ,        (3.18)

with

    φ1 = k1(0) s1 .        (3.19)

And with s_{q+1} = s_q + Δs_q, combination of (3.18) and (3.19) finally gives the relation

    s_{q+1} = k1(0) (N_{1,o}, ȳ_s − ĉ)_M s_q + o(s_q) .        (3.20)

From this expression we can now formulate several important conclusions concerning the local behaviour of Gauss' iteration method as applied to the curve c(s): First of all, expression (3.20) tells us that in case k1(0) ≠ 0, the local convergence behaviour of Gauss' iteration method as applied to the space curve c(s) is linear. That is, the computed arclength of the curve c(s) from ĉ to c(s_{q+1}) depends linearly on the computed arclength from ĉ to the point c(s_q) of the preceding step.
Secondly, a necessary condition for convergence of Gauss' iteration method is that

    k1(0) | (N_{1,o}, y_s − ĉ)_M | < 1 .        (3.21)

And thirdly, expression (3.20) shows that the local linear convergence behaviour is determined by two terms, namely the first curvature k1 of the curve c(s) at ĉ and the projection (N_{1,o}, y_s − ĉ)_M of the residual vector y_s − ĉ onto the first normal N1 at ĉ. Thus the smaller the curvature and the smaller the component of y_s − ĉ in the direction of N1, the faster Gauss' iteration method as applied to the space curve c(s) converges.
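These conclusions are easily verified numerically (our own sketch, not from the text): for a circle of radius R in IR² we have k1 = 1/R, and for an observation point at distance d from the centre the residual lies entirely along N1 with length |d − R|, so that (3.20) predicts an asymptotic convergence factor k1 |(N_{1,o}, y_s − ĉ)_M| = |1 − d/R|; condition (3.21) then requires d < 2R.

```python
import math

R, d = 1.0, 1.4                      # circle radius; distance of the observation point from the centre
c  = lambda t: (R*math.cos(t), R*math.sin(t))
ys = (d, 0.0)                        # least-squares solution: s_hat = t_hat = 0

t, ratios = 0.3, []
for _ in range(12):
    p = c(t)
    v = (-R*math.sin(t), R*math.cos(t))                            # velocity c_*(d/dt)
    t_new = t + (v[0]*(ys[0]-p[0]) + v[1]*(ys[1]-p[1])) / (R*R)    # g(t) = R^2
    ratios.append(abs(t_new) / abs(t))                             # |s_{q+1}| / |s_q|
    t = t_new

print(round(ratios[-1], 3))          # → 0.4, the predicted factor |1 - d/R|
```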


So far we assumed for convenience that the curve c: IR = N → M was parametrized with its arclength parameter s. But in general one would of course have an arbitrary parametrization c(t), with t ≠ s. The question that remains is then whether the above given conclusions still hold when t ≠ s.
To study this more general case, it seems appropriate to look for the direct analogon of the Frenet equations (3.13). These are given by the so-called

3.4 One dimensional Gauss- and Weingarten equations

From the definition of the arclength parameter s follows that

    s'(t) = ds/dt = ||V||_M .        (3.22)

We therefore have that V = s' T, and with (3.22) and D_T T = k1 N1 follows that

    D_V V = (s')^{-1} (s'') V + (s')² k1 N1 .

In a similar way we find that D_V N1 = (s') D_T N1 and D_V N2 = (s') D_T N2. With these last three equations we can now replace (3.13) by

    D_V V = (s')^{-1} (s'') V + (s')² k1 N1 ,
    D_V N1 = (s') ( −k1 T + k2 N2 ) ,
    D_V N2 = (s') ( −k2 N1 ) .        (3.23)

For m = 3, these equations can be considered as the one-dimensional analogons of the Gauss- and Weingarten equations.

3.5 Local convergence behaviour of Gauss' iteration method

Now let us return to our adjustment problem and see how the equations (3.23) come to our use for describing the local properties of iteration scheme (3.7).
First observe that (3.7) can also be written as

    t_{q+1} = t_q + Δt(t_q) ,   with Δt(t) = g(t)^{-1} ( c_*(d/dt), y_s − c(t) )_M .        (3.24)

Expanding the right-hand side into a Taylor series about the least-squares solution t̂ gives then with Δt(t̂) = 0:

    t_{q+1} − t̂ = ( 1 + (dΔt/dt)(t̂) ) (t_q − t̂) + o(t_q − t̂) .        (3.25)

And with

    D_V V = (s')^{-1} (s'') V + (s')² k1 N1 ,   (V, y_s − ĉ)_M = 0   and   g = (s')² ,

the above expression (3.25) reduces to

    t_{q+1} − t̂ = k1 (N1, y_s − ĉ)_M (t_q − t̂) + o(t_q − t̂) .        (3.26)

But this is exactly the result we obtained in (3.20) for the special case t = s, t̂ = 0. Hence, we have as a fourth conclusion that the local linear convergence behaviour of Gauss' iteration method as applied to the space curve c(t) is invariant to any admissible non-linear parameter transformation. It is thus idle hope to think that one can improve the convergence behaviour by changing to a different coordinate system.

Now let us assume that the first curvature k1 of the space curve c(t) is identically zero. Then

    D_T T = 0 ,

which means that the unit tangent vector T is parallel along the whole curve c(t). And since M is Euclidean by assumption, this means that the curve c(t) is a straight line. From (3.25) follows then that

    t_{q+1} − t̂ = o(t_q − t̂) .        (3.28)

And with g(t̂) = (s'(t̂))², expanding (3.24) one order further for k1 = 0, follows then that

    t_{q+1} − t̂ = ½ (s'(t̂))^{-1} s''(t̂) (t_q − t̂)² + o((t_q − t̂)²) .        (3.29)

Hence, for the case the curve c(t) is a straight line (k1 ≡ 0), Gauss' iteration scheme (3.7) will have a local quadratic convergence behaviour. But how is this possible? Doesn't orthogonal projection onto a straight line correspond to the case of linear least-squares adjustment? And if so, wouldn't that mean that iterations are superfluous? The answer is partly in the affirmative and partly in the negative. It essentially boils down to our earlier remarks made in the previous chapters, namely that adjustment in the general sense should be thought of as being composed of the problem of adjustment in the narrow sense, i.e. the problem of finding an estimate ŷ such that

    min. (y_s − y, y_s − y)_M ,   y ∈ N̄ ,

and the problem of inverse mapping, i.e. the problem of finding the pre-image x̂ of ŷ under the map y: N → M. Thus the actual adjustment part, namely that of finding the point ŷ in the submanifold N̄ of M which has smallest distance to y_s ∈ M, is essentially an observation space oriented problem. In this light we must therefore be more precise as to what we mean by "linear least-squares adjustment". Usually one means by "linear least-squares adjustment" that the coordinate functions y^i(x^a), i = 1, ..., m, a = 1, ..., n, of the map y are linear. We will, however, call a least-squares adjustment problem linear, if the submanifold N̄ of the Euclidean observation space M defined by the map y: N → M is linear or flat. For our problem of orthogonal projection onto the curve c this means that the adjustment problem is termed linear if k1 ≡ 0. But it also means that linear least-squares problems may admit non-linear functions c^i(t), i = 1, ..., m. The non-linearity in c^i(t) is then only caused by the choice of the parameter t. That is, by choosing another parameter it is possible to eliminate the non-linearity in c^i(t). In particular if one takes the arclength parameter s, or a linear function thereof, as parameter, the functions c^i(t) will become linear. As a consequence we see that the local quadratic convergence factor of (3.29) is not a property of the curve c(t) itself, but instead depends on its parametrization. In the special case namely of t = s, we would have (s')^{-1}s'' = 0, i.e. no iteration would be necessary then. Thus we see that with (3.29) we are actually solving for the inverse mapping problem, instead of the actual adjustment problem.
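The situation is easily reproduced numerically (our own sketch, not from the text): the curve below is a straight line in IR² (k1 ≡ 0) under the non-linear parametrization u(t) = t + t³, so scheme (3.7) still iterates, purely to solve the inverse mapping problem, and the error is squared at each step in accordance with (3.29):

```python
c  = lambda t: (1.0 + (t + t**3), 2.0*(t + t**3))   # straight line a + b u(t), a=(1,0), b=(1,2)
dc = lambda t: (1.0 + 3*t**2, 2.0*(1.0 + 3*t**2))   # velocity c_*(d/dt)
ys = (3.0, 1.0)                                     # observation point

def step(t):                                        # one step of scheme (3.7)
    v, p = dc(t), c(t)
    g = v[0]*v[0] + v[1]*v[1]                       # induced metric g(t)
    return t + (v[0]*(ys[0]-p[0]) + v[1]*(ys[1]-p[1])) / g

t_hat = 0.5                                         # iterate to machine accuracy for a reference
for _ in range(60):
    t_hat = step(t_hat)

t, errs = 0.9, []
for _ in range(4):
    t = step(t)
    errs.append(abs(t - t_hat))

# quadratic behaviour: err_{q+1}/err_q^2 approaches the constant (1/2)(s')^{-1} s'' of (3.29)
print([round(errs[q+1]/errs[q]**2, 2) for q in range(3)])
```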
To put the argument geometrically, consider an arbitrary parametrization of the straight line c such that the parameter t is not a linear function of the arclength s. The length ||V(t)||_M of the curve's tangent vector V changes then when moving along the curve from point to point. Hence, the coordinate expression of the induced metric of N, g(t) = ( c_*(d/dt), c_*(d/dt) )_M, will be a function of the parameter t. But this means that when one applies formula (3.7) of Gauss' iteration method one is in fact using two different "yardsticks". One yardstick given by the pulled back metric of the tangent space of the curve c at point c(t_q), namely g(t_q), and a second yardstick, namely g(t), the induced metric of the parameter space N itself. And it will be clear that the induced metric g(t_q) of the linear tangent space T_{t_q}N will be constant for the whole space, whereas the induced metric g(t) of N itself changes from point to point. Thus when one computes the tangent vector Δt_q (d/dt) and adds its coordinate Δt_q to t_q, to obtain t_{q+1} = t_q + Δt_q, one is in fact neglecting that T_{t_q}N and N are endowed with two different metric tensors (see figure 25).


figure 25: the observation space M with metric g_ij = (∂_i, ∂_j)_M, the tangent space T_{t_1}N with the constant metric g(t_1), and the parameter space N with the point-dependent metric g(t) = (d/dt, d/dt)_{t,N}

And because of this neglect one is, despite the flatness of N̄, still forced to have recourse to an iteration to find t̂. Note, however, that if one is not interested in t̂, but instead is satisfied with ŷ, no iteration is necessary. From the linearity of the submanifold N̄ = c(N) follows namely that the point ŷ = c(t_q) + Δt_q c_*(d/dt)|_{t_q}, obtained from a single iteration step, is independent of the choice for t_q.
Since (3.28) also holds for the case k1 ≠ 0 but y_s − ĉ = 0, it follows that we also have the local quadratic convergence rule (3.29) for zero residual vector adjustment problems. This is in fact not very surprising, since for both the cases k1 = 0 and y_s − ĉ = 0 we do not need an iteration to solve the actual adjustment problem. In case of k1 = 0 the actual adjustment problem is namely linear, and in case of y_s − ĉ = 0 the actual adjustment problem is indeed already solved a priori, since then ŷ = y_s. Thus for both the cases k1 = 0 and y_s − ĉ = 0 the iteration is only needed for the inverse mapping problem and not for the actual adjustment problem.

To illustrate the theory developed so far and to demonstrate the various effects mentioned, we will now give some examples.

3.6 Examples

Example 1: Orthogonal projection onto the curve O(2).


In this first example we take as non-linear model the two dimensional Helmert transformation only admitting a rotation. The non-linear model reads

    x̃_i = x̄_i cos θ − ȳ_i sin θ ,   ỹ_i = x̄_i sin θ + ȳ_i cos θ ,        (3.30)

where:
- i = 1, ..., n = number of network points,
- the tilde "~" sign stands for the mathematical expectation,
- x_i, y_i are cartesian coordinates of the network points,
- x̄_i, ȳ_i are the fixed given coordinates,
- y_s = (x_1, y_1, ..., x_n, y_n)ᵗ_s is the observation vector, and
- θ is the rotation angle to be estimated.

For the observation space M = IR^{2n} we take the standard metric, i.e. (∂_i, ∂_j)_M = δ_{ij}, with ∂_i, i = 1, ..., 2n, the standard basis.

I t w i l l be clear that the above model (3.30) determines a curve c(8 ) i n the observation-space M. To

solve f o r (3.30)

we therefore need t o project the observation vector

orthogonally onto c (8 )

( X l, y l,

. .. , n t Y nIt
X

F o r illustrative purposes we w i l l f i r s t derive expressions f o r the induced metric, the f i r s t curvature


kl

of

c(8 ) and the convergence factor cf. of Gauss' iteration method as applied t o (3.30).

this, we give the exact non-linear solution t o (3.30).


interpretation of model (3.30)

N o t e t h a t we can w r i t e model (3.30) i n the f o r m of

-Y

= (xl,Y1,---,X

And finally we w i l l give an alternative

by using the manifold structure of the group O(2) o f orthogonal

matrices of order 2.

where:

After

ntYn)

t
9

with

(e19e2)M

= 0,

(el,el)M

= (e2,e2)M

= 1

Hence our non-linear model (3.30) describes a circle which lies i n the two-dimensional plane spanned
by the orthonormal vectors el and e2 (see figure 26).

"Helmert transformation only admitting a rotationtt

figure 26
The radius o f this c i r c l e is given by the square r o o t of I.
Thus we have immediately t h a t

We also see a t once t h a t the arclength parameter

of c ( 0 ) is given by

f r o m which follows t h a t the induced m e t r i c is constant along c ( 0 ) .


Hence, i f by any chance the least-squares residual vector y

-6

is identical t o zero, Gausst i t e r a t i o n

method as applied t o (3.30) w i l l have a t h i r d order convergence behaviour.


To compute the local linear convergence factor of Gauss' iteration method as applied to (3.30), we need the length of the residual vector y_s − ĉ projected onto N_1, the first normal of c(θ). Thus we need the length of the pseudo residual vector ȳ_s − ĉ, where ȳ_s is the vector obtained by projecting y_s orthogonally onto the plane spanned by e_1 and e_2 (see figure 26). Hence,

  ȳ_s = λ̂ (e_1 cos θ̂ + e_2 sin θ̂).      (3.34)

Therefore (k_1 N_1, y_s − ĉ)_{ĉ,M} = 1 − λ̂, and with (3.33) follows then that

  cf. = 1 − λ̂,      (3.35)

with

  λ̂ = l^{−1} [ (e_1, y_s)_M cos θ̂ + (e_2, y_s)_M sin θ̂ ].      (3.35')

Note that (3.35') is precisely the estimate of the scale parameter which one obtains when solving for the two dimensional Helmert transformation

  x̃_i = λ x_i cos θ + λ y_i sin θ
  ỹ_i = −λ x_i sin θ + λ y_i cos θ,

admitting a rotation and scale (see also (5.12)).


Of course the above discussion is only meant as illustration. In practice one will not solve model (3.30) by using an iteration method, since an exact non-linear solution is readily available. From figure 26 follows namely that

  cos θ̂ = (e_1, y_s)_M / (√l ‖ȳ_s‖_M),   sin θ̂ = (e_2, y_s)_M / (√l ‖ȳ_s‖_M).

Hence,

  tan θ̂ = (e_2, y_s)_M / (e_1, y_s)_M.      (3.36)

It will also be clear from the figure that solution (3.36) is a global minimum of the minimization problem

  min_θ (y_s − c(θ), y_s − c(θ))_M,      (3.37)

except for the case ‖ȳ_s‖_M = 0. Then namely the solution is indefinite.
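The exact solution (3.36) is trivial to implement. The following sketch recovers θ̂ with the two-argument arctangent; the two-point network and the rotation angle 0.3 are made-up illustration values, not taken from the text:

```python
import math

# Closed-form solution of (3.30): project y_s onto span{e_1, e_2} and read
# off the angle, theta_hat = atan2((e_2, y_s), (e_1, y_s)), cf. (3.36).
xy = [(1.0, 0.0), (0.0, 2.0)]                  # given coordinates (x_i, y_i)
theta_true = 0.3                               # hypothetical true rotation

# simulate noise-free observations per (3.30)
ys = []
for x, y in xy:
    ys += [x * math.cos(theta_true) + y * math.sin(theta_true),
           -x * math.sin(theta_true) + y * math.cos(theta_true)]

e1 = [v for p in xy for v in p]                # (x_1, y_1, ..., x_n, y_n)
e2 = [v for x, y in xy for v in (y, -x)]       # (y_1, -x_1, ..., y_n, -x_n)
dot = lambda a, b: sum(u * v for u, v in zip(a, b))

theta_hat = math.atan2(dot(e2, ys), dot(e1, ys))
print(theta_hat)   # recovers 0.3 for noise-free data
```

For noisy data the same formula gives the least-squares estimate, since the projection onto the plane span{e_1, e_2} absorbs the residual.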

We will now give an alternative interpretation of the non-linear model (3.30). For the moment this alternative interpretation is only of theoretical interest. Observe that we can write model (3.30) in the form of

  [ x̃_1 ỹ_1 ]   [ x_1 y_1 ]
  [  ⋮    ⋮  ] = [  ⋮    ⋮  ] [  cos θ  sin θ ]
  [ x̃_n ỹ_n ]   [ x_n y_n ] [ −sin θ  cos θ ],      (3.38)

which we abbreviate as

  Ỹ = A R.      (3.39)

Thus Ỹ stands for the n×2 matrix on the left hand side of (3.38), A for the n×2 matrix on the right hand side and R for the 2×2 rotation matrix.

We will denote the linear vector space of n×2 real matrices by M(n×2), and the space of 2×2 orthogonal matrices by O(2). It can be shown that O(n) is an ½n(n−1) dimensional manifold. Thus, with the usual abbreviations

  M = M(n×2),  N = O(2),  and  N̄ = A O(2) ⊂ M,      (3.40)

we have that

  dim. M = 2n,  dim. N = dim. N̄ = 1,

and that A O(2) describes a curve in M.


To make our new formulation (3.39) compatible with (3.30) and the metric (3.31), we take for the metric tensor of M = M(n×2) the following definition:

  (·, ·)_M  =(def.)  trace[(·)ᵗ(·)].      (3.41)

It is easily verified that (·, ·)_M as given by (3.41) fulfils the necessary conditions of symmetry, bi-linearity and non-degeneracy.


With (3.39) and (3.41) we are now in the position of rephrasing our original least-squares problem (3.37) as

  min_{X ∈ N=O(2)} (y_s − A X, y_s − A X)_M = min_{X ∈ N=O(2)} trace[(y_s − A X)ᵗ (y_s − A X)].      (3.42)

And this is the formulation which we will use in our discussion of the three dimensional Helmert transformation (see subsection 5.5).

In the remaining four examples of this section we give some numerical results for some simple models, to demonstrate the various effects of Gauss' iteration method mentioned above. In all these examples we take the metric of M to be the standard metric.

Example 2: Orthogonal projection onto a unit circle.

Our model reads as: c^{i=1}(t) = cos(t), c^{i=2}(t) = sin(t).
The observation point given is: y_s^{i=1} = 0.5, y_s^{i=2} = 0.0, and our initial guess reads: t_0 = ¼π (rad.).

The numerical results are:

  iteration step q   t_q (rad.)   c^{i=1}(t_q)   c^{i=2}(t_q)
   1                 0.43178      0.90822        0.41849
   2                 0.22254      0.97534        0.22070
   3                 0.11218      0.99371        0.11195
   4                 0.05621      0.99842        0.05618
   5                 0.02812      0.99960        0.02812
   6                 0.01406      0.99990        0.01406
   7                 0.00703      0.99997        0.00703
   8                 0.00352      0.99999        0.00352
   9                 0.00176      0.99999        0.00176
  10                 0.00088      0.99999        0.00088
  11                 0.00044      0.99999        0.00044
  12                 0.00022      1.00000        0.00022
  13                 0.00011      1.00000        0.00011
  14                 0.00005      1.00000        0.00005
  15                 0.00003      1.00000        0.00003

  table 1

Since the unit circle has curvature k_1 = 1, we have with the observation point y_s^{i=1} = 0.5, y_s^{i=2} = 0.0 that (k_1 N_1, y_s − ĉ)_{ŷ,M} = 0.5. And this local convergence factor is indeed clearly recognizable from the above given numerical results.
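Example 2 is easy to reproduce. The sketch below implements Gauss' iteration (3.7) for the unit circle directly; since t̂ = 0 here, t_q itself is the error, and the printed ratio t_{q+1}/t_q settles at the predicted convergence factor 0.5:

```python
import math

# Gauss' iteration for projecting y_s onto the unit circle c(t) = (cos t, sin t):
# delta_t = <y_s - c(t), c'(t)> / <c'(t), c'(t)>, and here |c'(t)| = 1.
def gauss_step(t, ys):
    dx, dy = -math.sin(t), math.cos(t)          # tangent c'(t)
    return t + (ys[0] - math.cos(t)) * dx + (ys[1] - math.sin(t)) * dy

t = 0.25 * math.pi                              # initial guess t_0 = pi/4
for q in range(1, 7):
    t_new = gauss_step(t, (0.5, 0.0))
    print(q, round(t_new, 5), round(t_new / t, 4))  # ratio tends to 0.5
    t = t_new
```

(The printed iterates may differ from table 1 in the last decimal, depending on rounding in the original computation.)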

Example 3: Orthogonal projection onto a unit circle.

Again our model reads: c^{i=1}(t) = cos(t), c^{i=2}(t) = sin(t), but this time we have as observation point: y_s^{i=1} = 1.5, y_s^{i=2} = 0.0, and our initial guess reads: t_0 = ¼π (rad.).

The numerical results are:

  [table 2: iteration steps q = 1, ..., 6; the iterates t_q alternate in sign about the solution t̂ = 0]

Again we have here a curvature k_1 = 1. In contrast with example 2, however, we have that

  (k_1 N_1, y_s − ĉ)_{ŷ,M} = −0.5,

which follows from the fact that the residual vector y_s − ĉ has a direction opposite to that of N_1. Thus, when compared to example 2, this third example reveals another feature, namely that when the observation point y_s and the centre of curvature are on opposite sides of the curve, the convergence factor will be negative. As a consequence the steplength of each iteration step will be too long, resulting in an overshoot. Hence the oscillatory behaviour of the above iteration.

In the previous example the observation point y_s and the centre of curvature were on the same side of the curve. And in that case the steplength will be too short (see figure 27). This effect is indeed clearly recognized from table 1, where the points in the sequence t_1, t_2, ... approach t̂ from one side.

figure 27
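The sign flip is easy to see numerically. Rerunning the iteration of example 2 with the observation point of example 3 (same assumptions as in the previous sketch) gives iterates that alternate in sign:

```python
import math

# Same Gauss step as in example 2, now with y_s = (1.5, 0.0) outside the
# circle: the convergence factor is -0.5 and the iterates oscillate
# about the solution t_hat = 0.
def gauss_step(t, ys):
    return t + (ys[0] - math.cos(t)) * (-math.sin(t)) \
             + (ys[1] - math.sin(t)) * math.cos(t)

t = 0.25 * math.pi
for q in range(1, 7):
    t = gauss_step(t, (1.5, 0.0))
    print(q, round(t, 5))            # signs alternate: overshoot each step
```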

Example 4: Orthogonal projection onto a straight line.

Our model reads as: c^{i=1}(t) = e^{10t}, c^{i=2}(t) = e^{10t},
the observation point is given by: y_s^{i=1} = 0, y_s^{i=2} = 2e, and
the initial guess reads: t_0 = 0.

The numerical results are:

  iteration step q   t_q        c^{i=1}(t_q)   c^{i=2}(t_q)
  1                  0.17183    5.57494        5.57494
  2                  0.12059    3.33967        3.33967
  3                  0.10198    2.77267        2.77267
  4                  0.10002    2.71881        2.71881
  5                  0.10000    2.71828        2.71828

  table 3

Since the curve onto which the observation point is projected has no curvature, the local convergence behaviour of Gauss' iteration method as applied to the above model must be quadratic. In fact, with ½(s'(t̂))⁻¹ s''(t̂) = 5 for the above model, the local convergence rule of (3.29) is easily verified from table 3.

When viewing the column of t_q-values in table 3 we also notice another interesting feature. We see that all iterates t_q except the initial guess t_0 stay on the same side of the solution t̂. The explanation is that the induced metric function, which for the above model reads g(t) = 200 e^{20t}, is monotonic and increasing. With a monotonic and increasing metric function one will namely have an overshoot. In the above iteration this has the following effect. Since t_0 < t̂, we see that with the graph of g(t) we are going uphill. Hence, in the first iteration step we will have an overshoot. Thus t_1 > t̂. But for the next step this means that with the graph of g(t) we are going downhill. Hence, for the second and succeeding steps we will have an undershoot, which explains why t_1, t_2, ... all approach t̂ from the same side.
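A sketch of example 4; note that the induced metric now enters the step, since |c'(t)| is no longer 1. The iterates reproduce table 3, and the error roughly obeys |u_{q+1}| ≈ 5 u_q², i.e. rule (3.29) with ½(s')⁻¹s'' = 5:

```python
import math

# Example 4: Gauss' iteration for the straight line c(t) = (e^{10t}, e^{10t})
# and y_s = (0, 2e).  The curve has no curvature, so convergence is quadratic.
def gauss_step(t, ys):
    c = math.exp(10 * t)
    dc = 10 * c                      # both components have derivative 10 e^{10t}
    g = 2 * dc * dc                  # induced metric g(t) = 200 e^{20t}
    return t + ((ys[0] - c) * dc + (ys[1] - c) * dc) / g

t = 0.0
for q in range(1, 6):
    t = gauss_step(t, (0.0, 2 * math.e))
    print(q, round(t, 5))            # 0.17183, 0.12059, 0.10198, 0.10002, 0.10000
```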

Example 5: Orthogonal projection onto a unit circle with zero residual vector.

Our model reads: c^{i=1}(t) = cos(t), c^{i=2}(t) = sin(t),
the observation point is given by: y_s^{i=1} = 1.0, y_s^{i=2} = 0.0, and
the initial guess reads: t_0 = ¼π (rad.).

The numerical results are:

  iteration step q   t_q (rad.)   c^{i=1}(t_q)   c^{i=2}(t_q)
  1                  0.07821      0.99694        0.07821
  2                  0.00008      1.00000        0.00008
  3                  0.00000      1.00000        0.00000

Although the unit circle has a curvature of k_1 = 1, the observation point lies on the circle. Hence, we expect a local quadratic convergence behaviour governed by rule (3.29). However, a closer look at the above results reveals a third order behaviour instead of second order. The explanation is given by the fact that t equals the natural arclength parameter s of the unit circle. Thus ½(s'(t))⁻¹ s''(t) ≡ 0.

3.7. Conclusions

In this section we have considered the univariate minimization problem of orthogonally projecting a given observation point y_s onto a smooth curve c in M. As a natural generalization of the linear least-squares problem we obtained Gauss' iteration method (3.7), which consists of successively solving a linear least-distance adjustment problem until the necessary condition of orthogonality, (3.6), is fulfilled. At each iteration step q+1 the observation point y_s is orthogonally projected onto a new tangent space T_{c(t_{q+1})}(c(N)), which will be close to the previous one, T_{c(t_q)}(c(N)). Hence, the rate at which the tangential part of y_s − c(t) decreases will depend on the rate of change of tangent spaces. And since curvature is defined as the measure of the rate of change of tangents, one can expect the local behaviour of Gauss' iteration method to depend on the curvature of curve c. Through geometric reasoning we found that the local behaviour of Gauss' method is properly described by (3.25). Hence, a necessary condition for convergence is

  |(k_1 N_1, y_s − ĉ)_{ŷ,M}| < 1,

and the rate of convergence is linear. Moreover, it will be clear from the pictorial presentations given earlier that ĉ is a strict local minimum if

  (k_1 N_1, y_s − ĉ)_{ŷ,M} < 1.

We also found that the local convergence behaviour of Gauss' method is invariant to any non-linear admissible parameter transformation. The decisive factors which determine the local convergence rate are given by k_1 and y_s − ŷ. If either of them or both are equal to zero, then Gauss' method will have a local quadratic convergence behaviour. Instead of solving the actual adjustment problem, one is then solving for the inverse mapping problem: given ŷ, find the pre-image t̂ under map c: t ∈ N = ℝ → M. Consequently, the local quadratic convergence behaviour will not be invariant to a reparametrization.

In the next section we extend our results to the multivariate case. Can we expect the generalizations to be simple and straightforward? In most cases yes, although there are two points which are worth mentioning. Firstly, when we consider manifolds other than curves, we must in some way take care of the increase in dimensions. And secondly, we must recognize that a surface in a three dimensional space is the simplest object having its own internal or intrinsic geometry. In our investigation of the space curve c(t) we were led to the invariants of curvature. But these are invariants rather of the way the curve is situated in space than internal to the curve. That is, they are extrinsic invariants. A curve has no intrinsic invariants, since essentially the only candidate for this status is the natural parameter of arclength s. But s is by itself inadequate for distinguishing the curve from, for instance, a straight line, i.e. we can coordinatize a straight line with the same parameter s in such a way that distances along both curve and straight line are measured in the same way. For surfaces and manifolds in general the situation is different. It is impossible, for instance, to coordinatize the sphere so that the formula for distance on the sphere in terms of these coordinates is the same as the usual distance formula in the ambient space. A consequence is that where in the univariate case the possible local quadratic convergence behaviour of Gauss' method could be reduced to a third order behaviour by taking the arclength s as parameter, this will not be possible in the multivariate case.

4. Orthogonal projection onto a parametrized submanifold

4.1. Gauss' method

In this section we will consider Gauss' method for the multivariate case of non-linear least-squares adjustment. Thus we assume dim. N = n ≥ 1. Furthermore we assume that the imbedding of the n-dimensional manifold N into the m-dimensional space M is established by the injective non-linear map y, i.e. y: N → M, with image N̄ = y(N). When we speak of the metric of N we mean as before the induced metric, i.e. the metric obtained by pulling the metric of M back to N:

  (X, Y)_N = (y_*(X), y_*(Y))_M      for any vector fields X, Y on N.

Now, consider again the least-squares minimization problem

  min_{x ∈ N} (y_s − y(x), y_s − y(x))_M.      (4.1)

For x̂ to be a solution to (4.1) we have as necessary condition that the residual vector y_s − ŷ must be orthogonal to the tangent space T_ŷN̄ of N̄ at ŷ, i.e.

  (y_s − ŷ, y_*(X))_{ŷ,M} = 0      (4.2)

must hold at x̂ ∈ N. Due, however, to the assumed non-linearity of map y, the tangent space T_ŷN̄ is generally unknown a priori. Hence, our adjustment problem cannot be solved directly in general. But as in the previous section, (4.2) suggests that we take as a first approximation the orthogonal projection of y_s onto a chosen nearby tangent space T_{y_q}N̄ of N̄ at y_q = y(x_q). Of course, then generally

  (y_s − y_q, y_*(X_q))_{y_q,M} ≠ 0.      (4.3)

But by pulling the non-orthogonality as measured by (4.3) back to the Riemannian manifold N, we get

  (∂_α, Δx_q)_{x_q,N} = (y_*(∂_α), y_s − y_q)_{y_q,M},  with Δx_q ∈ T_{x_q}N.      (4.4)

And in local coordinates this expression suggests Gauss' iteration method:

  x^α_{q+1} = x^α_q + g^{αβ}(x_q) ∂_β y^i(x_q) g_{ij} (y_s − y(x_q))^j,  q = 0, 1, 2, ...      (4.5)

This scheme is thus the multivariate generalization of (3.7), and it consists of successively solving a linear least-distance adjustment problem until condition (4.2) is met.
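For the standard metric on M, scheme (4.5) reduces to the familiar normal equations of the linearized model. The sketch below iterates it for n = 2, m = 3; the sphere parametrization y is a hypothetical stand-in for a concrete geodetic model:

```python
import math

# Gauss' iteration (4.5) with the standard metric on M:
#   solve  J^t J dx = J^t (y_s - y(x_q)),   x_{q+1} = x_q + dx,
# where J is the Jacobian of y at x_q.
def y(x):        # y: R^2 -> R^3, (co-latitude, longitude) on the unit sphere
    a, b = x
    return [math.sin(a) * math.cos(b), math.sin(a) * math.sin(b), math.cos(a)]

def jac(x):
    a, b = x
    return [[math.cos(a) * math.cos(b), -math.sin(a) * math.sin(b)],
            [math.cos(a) * math.sin(b),  math.sin(a) * math.cos(b)],
            [-math.sin(a),               0.0]]

def gauss_step(x, ys):
    J, r = jac(x), [yi - ci for yi, ci in zip(ys, y(x))]
    # normal equations J^t J dx = J^t r, solved directly for n = 2
    a11 = sum(J[i][0] ** 2 for i in range(3))
    a12 = sum(J[i][0] * J[i][1] for i in range(3))
    a22 = sum(J[i][1] ** 2 for i in range(3))
    b1 = sum(J[i][0] * r[i] for i in range(3))
    b2 = sum(J[i][1] * r[i] for i in range(3))
    det = a11 * a22 - a12 * a12
    return [x[0] + (a22 * b1 - a12 * b2) / det,
            x[1] + (a11 * b2 - a12 * b1) / det]

ys = [0.6, 0.6, 0.6]          # observation point just off the sphere
x = [1.0, 1.0]                # initial guess
for _ in range(20):
    x = gauss_step(x, ys)
print(x)                      # converges to the orthogonal projection
```

Since the residual is small relative to the curvature here, the iteration converges quickly, in line with the discussion of the convergence factor below.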
In order to understand the local behaviour of Gauss' method we shall now proceed in a way similar to that of the previous section. One of the problems, however, we have to deal with is the increase in dimensions. Nevertheless, the linearity of the local rate of convergence of Gauss' method (4.5) is easily shown. From Taylorizing (4.5) about the least-squares solution x̂ follows namely an expression of the form

  x^α_{q+1} − x̂^α = A^α_β (x^β_q − x̂^β) + O(‖x_q − x̂‖²),      (4.6)

which proves our statement. Thus, for points close enough to the solution, the coordinate differences of the current point x_{q+1} and the solution x̂ depend linearly on the coordinate differences of the previous point x_q and x̂.

Upon comparing (4.6) with our univariate result (3.26) we see that we still lack a proper geometric interpretation of the convergence factor of Gauss' method (4.5), although we can expect that in some way the curvature behaviour of the submanifold N̄ at ŷ will be involved. To make this statement precise it seems appropriate that we look for the multivariate analogon of the decomposition as given in (3.23).

4.2. The Gauss equation

To do so, we first recall that the connection D of M satisfies

  D_{fV}W = f D_V W,   D_V(gW) = V(g)W + g D_V W,      (4.7)

for all smooth functions f, g: M → ℝ and vector fields V, W on M; that it is torsion-free, i.e.

  D_V W − D_W V = VW − WV,      (4.8)

for all vector fields V, W on M; and that it is metric, i.e.

  U(V, W)_M = (D_U V, W)_M + (V, D_U W)_M,      (4.9)

for all vector fields U, V, W on M.


We say that a vector field U on M is an extension of a vector field Z on N if U restricted to N̄ equals the pushforward of Z on N̄, i.e.

  U|_N̄ = y_*(Z),      (4.10)

or in components, U^i(y(x)) = ∂_α y^i(x) Z^α(x).

Now, let X, Y and Z be three vector fields on N and let V, W and U be their extensions. As in (3.23), we then decompose D_V W restricted to N̄ into a tangential and a normal part:

  D_V W|_N̄ = Tang.(D_V W|_N̄) + Norm.(D_V W|_N̄) = y_*(∇_X Y) + B(X, Y).      (4.11)
With the connection properties (4.7), (4.8) and (4.9) of D we can then derive the following properties for ∇ and the normal field B (see e.g. Spivak, 1975):

(i) Let f and g be smooth functions on M and denote their pullbacks by f̄ and ḡ respectively, i.e. f̄ = f ∘ y, ḡ = g ∘ y. From (4.7), (4.10) and (4.11) follows then that

  ∇_{f̄X}(ḡY) = f̄ X(ḡ) Y + f̄ ḡ ∇_X Y   and   B(f̄X, ḡY) = f̄ ḡ B(X, Y).      (4.12)

Since additivity is trivial to prove, these two equations show that ∇ defines an affine connection on N and that B is bilinear in its arguments.


(ii) From (4.8) follows that

  (D_V W − D_W V)|_N̄ = (VW − WV)|_N̄ = y_*(XY − YX).

And with (4.10) and (4.11) this gives

  y_*(∇_X Y) + B(X, Y) − y_*(∇_Y X) − B(Y, X) = y_*(XY − YX),

or

  ∇_X Y − ∇_Y X = XY − YX   and   B(X, Y) = B(Y, X).      (4.13)

But this shows that the torsion-freeness of D implies that ∇ is torsion-free and that B is symmetric in its arguments.

(iii) From (4.9), (4.10) and (4.11) follows that ∇ is metric, i.e.

  X(Y, Z)_N = (∇_X Y, Z)_N + (Y, ∇_X Z)_N.      (4.14)

Concluding, (4.12), (4.13) and (4.14) taken together show that ∇ is an affine, torsion-free and metric connection of N and that the normal field B is bilinear and symmetric in its arguments. Hence, ∇ is the unique Riemannian connection (also known as the induced or Levi-Civita connection) of N, which is completely described by the induced metric (·, ·)_N.

Those familiar with Gaussian surface theory will probably recognize the connection ∇ more easily if we show how the connection coefficients Γ^γ_{αβ}, defined by

  ∇_{∂_α}∂_β = Γ^γ_{αβ} ∂_γ,      (4.15)

can be computed from the coefficients of the induced metric tensor (·, ·)_N. Since we assume that partial derivatives commute, it follows from (4.13) for X = ∂_α, Y = ∂_β that

  Γ^γ_{αβ} = Γ^γ_{βα}.      (4.16)

With X = ∂_α, Y = ∂_β, Z = ∂_γ and (4.15), we can write (4.14) as

  ∂_α g_{βγ} = Γ^δ_{αβ} g_{δγ} + Γ^δ_{αγ} g_{βδ}.

Cyclically permuting the indices gives then three equations which, with (4.16), show that

  Γ^γ_{αβ} = ½ g^{γδ} (∂_α g_{βδ} + ∂_β g_{αδ} − ∂_δ g_{αβ}).      (4.17)

This is Christoffel's second identity, well known from surface theory.
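Identity (4.17) is easy to verify numerically. The sketch below differentiates the induced metric of the unit sphere (co-latitude/longitude coordinates, used here purely as an example metric) by central differences and evaluates (4.17):

```python
import math

# Numerical check of (4.17) for the unit sphere with coordinates x = (a, b)
# (co-latitude, longitude) and induced metric g = diag(1, sin(a)^2).
# Known values: Gamma^a_bb = -sin(a)cos(a), Gamma^b_ab = cos(a)/sin(a).
def g(x):
    return [[1.0, 0.0], [0.0, math.sin(x[0]) ** 2]]

def christoffel(x, h=1e-6):
    # dg[d][i][j] = numerical partial derivative of g_ij w.r.t. x^d
    dg = []
    for d in range(2):
        xp, xm = list(x), list(x)
        xp[d] += h; xm[d] -= h
        gp, gm = g(xp), g(xm)
        dg.append([[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(2)]
                   for i in range(2)])
    ginv = [[1.0, 0.0], [0.0, 1.0 / math.sin(x[0]) ** 2]]
    # (4.17): Gamma^c_ab = 1/2 g^{cd} (d_a g_bd + d_b g_ad - d_d g_ab)
    return [[[0.5 * sum(ginv[c][d] * (dg[a][b][d] + dg[b][a][d] - dg[d][a][b])
                        for d in range(2))
              for b in range(2)] for a in range(2)] for c in range(2)]

G = christoffel([0.7, 0.3])
print(G[0][1][1], -math.sin(0.7) * math.cos(0.7))   # Gamma^a_bb
print(G[1][0][1], math.cos(0.7) / math.sin(0.7))    # Gamma^b_ab
```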

The decomposition formula (4.11), which brought the above derived properties of ∇ and B about, is known as Gauss' equation. Its complementary counterpart, i.e. Weingarten's equation, is obtained from applying D_V to a normal field, N say, on N̄, followed by an orthogonal decomposition:

  D_V N|_N̄ = −y_*(K_N(X)) + D^⊥_X N.      (4.18)

And with a similar derivation as used above one can show that K_N(X) is bilinear in N and X, that D^⊥ is a metric connection for the normal bundle T^⊥N̄ of N̄ in M (see e.g. Spivak, 1975), and that Gauss' and Weingarten's equations are related through

  (K_N(X), Y)_N = (B(X, Y), N)_M.      (4.19)

We shall now show how equation (4.11) for V = y_*(∂_α), W = y_*(∂_β), i.e.

  D_V W|_N̄ = y_*(Γ^γ_{αβ} ∂_γ) + B(∂_α, ∂_β),      (4.20)

specializes to the first equation of (3.23) if we assume n = 1, and replace y by c and ∂_α by d/dt.

With these assumptions, (4.20) becomes

  D_V V|_c = c_*(Γ^t_{tt} d/dt) + B_c(d/dt, d/dt).      (4.21)

(We have given B_c the subindex "c" to emphasize that the normal field B_c belongs to the space curve c viewed as a one dimensional manifold.)

With (4.16) and (4.17) it follows that the first term of (4.21) can be written as

  c_*(Γ^t_{tt} d/dt) = c_*((s'(t))⁻¹ s''(t) d/dt).      (4.22)

For the second term of (4.21) it follows from (4.19), (4.21) and V = c_*(d/dt), if we put N = N_1, where N_1 is the unit normal, and K_{N_1}(d/dt) = k_1(t) d/dt, that

  B_c(d/dt, d/dt) = (s'(t))² k_1(t) N_1.      (4.23)

Thus, with (4.22) and (4.23), (4.21) can be written as

  D_V V|_c = c_*(½ g(t)⁻¹ (dg/dt)(t) d/dt) + g(t) k_1(t) N_1,      (4.24)

since g(t) = (s'(t))² and V = c_*(d/dt). And (4.24) is indeed the equation which we already derived in (3.23).
Note from comparing (4.20) and (4.24) that (s'(t))⁻¹ s''(t) d/dt generalizes to Γ^γ_{αβ} ∂_γ, and that (s'(t))² k_1(t) N_1 generalizes to B(∂_α, ∂_β). Hence, we can expect that the curvature behaviour of submanifold N̄ is contained in the normal field B. Let us therefore study the normal field B in more detail.

4.3. The normal field B

According to (4.23), the first curvature k_1(t) of a curve c: ℝ → M can be obtained from the normal field B_c through

  k_1(t) N_1 = (s'(t))⁻² B_c(d/dt, d/dt).      (4.25)
Now in order to find the proper multivariate generalization of this expression, one of the problems we have to deal with is the increase in dimensions. We can, however, get round this difficulty if we consider two curves, one in N, which we denote by c_1: ℝ → N, and one in N̄ ⊂ M, which we denote by c_2: ℝ → M. And furthermore we assume that c_2 = y ∘ c_1.

With the connections D and ∇ of M and N respectively, we can then apply the univariate Gauss decomposition formula twice, namely to curve c_2 and to curve c_1. With V = c_{2*}(d/dt) and X = c_{1*}(d/dt) this gives

  D_V V = c_{2*}((s'(t))⁻¹ s''(t) d/dt) + (s'(t))² k_{2,1}(t) N_{2,1},      (4.26.a)

and

  ∇_X X = c_{1*}((s'(t))⁻¹ s''(t) d/dt) + (s'(t))² k_{1,1}(t) N_{1,1},      (4.26.b)

where k_{2,1} and N_{2,1} are the first curvature and first normal of curve c_2 in M, and k_{1,1} and N_{1,1} are the first curvature and first normal of curve c_1 in N. Note that the arclength parameter s is equal for both curves, since V = y_*(X) and c_2 = y ∘ c_1.

Application of the multivariate Gauss decomposition formula (4.11) gives then

  D_V V = y_*(∇_X X) + B(X, X).

And substitution of (4.26.b) gives

  D_V V = c_{2*}((s'(t))⁻¹ s''(t) d/dt) + y_*((s'(t))² k_{1,1}(t) N_{1,1}) + B(X, X).      (4.27)

From comparing (4.27) with (4.26.a) follows then that

  (s'(t))² k_{2,1}(t) N_{2,1} = y_*((s'(t))² k_{1,1}(t) N_{1,1}) + B(X, X).      (4.28)

Hence, for curve c_2, which lies entirely in N̄ ⊂ M, the normal field B equals the orthogonal component of (s'(t))² k_{2,1}(t) N_{2,1}. Thus for an arbitrary unit normal N ∈ T^⊥N̄ we have

  (B(X, X), N)_M / (X, X)_N = (k_{2,1}(t) N_{2,1}, N)_M,      (4.29)

since (s'(t))² = (X, X)_N. We call (k_{2,1}(t) N_{2,1}, N)_M the extrinsic curvature of curve c_2 with respect to the unit normal N, and denote it by k_N(t) (the first curvature k_{1,1}(t) of curve c_1 in N is sometimes called the intrinsic or geodesic curvature). We can now write (4.28) as

  k_N(t) = (B(X, X), N)_M / (X, X)_N,

and this expression can be considered to be the proper generalization of (4.25).

As a consequence of the increase in dimensions we thus see that to every combination of a tangent vector X and a normal vector N there belongs an extrinsic curvature k_N. Tangent directions for which the extrinsic curvature k_N attains extreme values are called principal directions with respect to the unit normal N. And the corresponding extrinsic curvatures are called principal curvatures with respect to N. Thus in order to find the principal directions and curvatures for a chosen unit normal N, we need the extreme values of the ratio

  (B(X, X), N)_M / (X, X)_N.

Recall from linear algebra that this problem reduces to the eigenvalue problem

  [ (B(∂_α, ∂_β), N)_M − k_N g_{αβ} ] X^β = 0.      (4.30)

The eigenvectors determine then the mutually orthogonal principal directions, and the corresponding eigenvalues the principal curvatures.
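A numerical sketch of (4.30) for the cylinder in ℝ³ (an assumed example surface, not from the text): in the parametrization below the induced metric is the identity, so (4.30) becomes an ordinary symmetric eigenvalue problem for the matrix b_{αβ} = (B(∂_α, ∂_β), N)_M, whose entries follow from second derivatives of the imbedding:

```python
import math

# Principal curvatures of the cylinder y(u, v) = (cos u, sin u, v) with
# outward unit normal N = (cos u, sin u, 0): expected values are -1
# (circular direction) and 0 (ruling direction).
u, v, h = 0.4, 0.2, 1e-5

def y(u, v):
    return [math.cos(u), math.sin(u), v]

def d2(f, i, j):
    # mixed second partial derivative d^2 f / dx^i dx^j by central differences
    e = [[h, 0.0], [0.0, h]]
    p = lambda du, dv: f(u + du, v + dv)
    return [(p(e[i][0] + e[j][0], e[i][1] + e[j][1])[k]
             - p(e[i][0] - e[j][0], e[i][1] - e[j][1])[k]
             - p(-e[i][0] + e[j][0], -e[i][1] + e[j][1])[k]
             + p(-e[i][0] - e[j][0], -e[i][1] - e[j][1])[k]) / (4 * h * h)
            for k in range(3)]

N = [math.cos(u), math.sin(u), 0.0]
# b_ab = (B(d_a, d_b), N); the tangential part of the second derivatives
# drops out automatically when dotting with N.
b = [[sum(d2(y, a, c)[k] * N[k] for k in range(3)) for c in range(2)]
     for a in range(2)]
tr, det = b[0][0] + b[1][1], b[0][0] * b[1][1] - b[0][1] * b[1][0]
disc = math.sqrt(tr * tr - 4 * det)
print((tr + disc) / 2, (tr - disc) / 2)   # principal curvatures ~ 0 and -1
```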


We will denote the n principal curvatures for the normal direction N by k_N^r, r = 1, ..., n, and assume that k_N^1 ≥ k_N^2 ≥ ... ≥ k_N^n. The corresponding mutually orthogonal principal directions are denoted by X_r, r = 1, ..., n. For later reference we define the mean curvature of submanifold N̄ = y(N) for the normal direction N as the average trace of B:

  h_N = (1/n) g^{αβ} (B(∂_α, ∂_β), N)_M,      (4.31)

and the unique mean curvature normal of N̄ as

  h = Σ_{p=1}^{m−n} h_{N_p} N_p,      (4.32)

where N_p, p = 1, ..., (m−n), is an orthonormal basis of T^⊥N̄.

Now that we have found the geometric interpretation associated with the normal field B, let us return to our non-linear least-squares adjustment problem and apply our results to obtain a geometric interpretation of the local rate of convergence of Gauss' iteration method.

4.4. The local rate of convergence
Recall from (4.6) that the local convergence factor of Gauss' method is governed by the tangential effect of the second order derivatives of the map y at x̂. But Gauss' decomposition formula states that these second order derivatives decompose into the Christoffel symbols and the normal field B. Hence, with u_q = x_q − x̂,

  u^α_{q+1} = g^{αβ}(x̂) (B(∂_β, ∂_γ), y_s − ŷ)_{ŷ,M} u^γ_q + O(u^t_q u_q),      (4.34)

since y_s − ŷ ∈ T^⊥_ŷN̄. Thus we see that indeed the extrinsic curvatures of submanifold N̄ at ŷ with respect to the normal direction y_s − ŷ govern the local convergence factor of Gauss' method. We can rewrite (4.34) in a form which better resembles our univariate result (3.25) if we make use of the eigenvalue problem (4.30).

Assume therefore that X_r, r = 1, ..., n, forms an orthonormal basis of principal directions in T_x̂N. Then

  (B(∂_α, ∂_β), N)_{ŷ,M} X^β_r = k_N^r g_{αβ} X^β_r      (no summation over r).

With N the unit normal in the direction of the residual vector, i.e. y_s − ŷ = ‖y_s − ŷ‖_M N, and with the expansion u_q = Σ_r u^r_q X_r, expression (4.34) can then be written as

  u^r_{q+1} = (k_N^r N, y_s − ŷ)_{ŷ,M} u^r_q + O(u^t_q u_q),  r = 1, ..., n      (no summation over r).      (4.35)

Compare this with our univariate result (3.26).


With (4.35) we are now able to generalize some of our conclusions of section three:

(i) If max_r |(k_N^r N, y_s − ŷ)_{ŷ,M}| < 1 and x_0 is sufficiently close to x̂, then the sequence {x_q} generated by Gauss' method converges to x̂.

(ii) The local convergence behaviour of Gauss' method is determined by the combined effect of the curvature behaviour of submanifold N̄ at ŷ and the residual vector y_s − ŷ.

(iii) Since the extrinsic curvatures are a property of the submanifold N̄ itself, the local convergence behaviour of Gauss' method is invariant to any admissible parameter transformation. Hence, we cannot expect to speed up convergence in general by choosing a particular parametrization.

(iv) Gauss' method has a local linear rate of convergence. From (4.35) follows that the local convergence factor (lcf.) of Gauss' method reads

  lcf. = max.{ |(k_N^1 N, y_s − ŷ)_{ŷ,M}|, |(k_N^n N, y_s − ŷ)_{ŷ,M}| }.      (4.36)

Note that since (B(∂_α, ∂_β), N)_M need not be positive definite, the extrinsic curvatures can be either positive, zero or negative. But they are always real, since B is symmetric in its arguments.

(v) From the geometry of our non-linear least-squares problem follows that the solution x̂ is a strict local minimum if 1 − (k_N^1 N, y_s − ŷ)_{ŷ,M} > 0. The fact that x̂ is a strict local minimum does however not ensure local convergence of Gauss' method; see (4.36).

(vi) If (k_N^r N, y_s − ŷ)_{ŷ,M} < 0, then the observation point y_s and the corresponding centre of curvature along the principal direction X_r lie on opposite sides of the submanifold N̄. Consequently, one will overshoot the target in each iteration step if (k_N^r N, y_s − ŷ)_{ŷ,M} < 0. Hence, the iteration will then show an oscillatory behaviour along the direction X_r. Similarly, one will have an undershoot along the direction X_r if (k_N^r N, y_s − ŷ)_{ŷ,M} > 0 (see also example 3 of the previous section).

An interesting point of the above conclusion (vi) is that it indicates the possibility of adjusting the steplength in each iteration step with the aid of the curvature behaviour of N̄, so as to improve the convergence behaviour (4.35) of Gauss' method. Let us therefore pursue this argument a bit further. Instead of (4.5) we take

  x_{q+1} = x_q + t_q Δx_q,      (4.38)

where Δx_q is provided by Gauss' method and t_q is a positive scalar, chosen so as to adjust the steplength. Instead of (4.35) one would then get

  u^r_{q+1} = [ (1 − t_q) + (k_N^r N, y_s − ŷ)_{ŷ,M} t_q ] u^r_q + O(u^t_q u_q),  r = 1, ..., n; no summation over r.      (4.39)

As could be expected, it follows from (4.39) that the scalar t_q should be chosen less than one if all extrinsic curvatures are negative, and greater than one if all extrinsic curvatures are positive. Now let us investigate what the optimal choice of t_q would be. Since the in absolute value largest coefficient of u^r_q, r = 1, ..., n, in (4.39) is given by

  max.{ |(1 − t_q) + (k_N^1 N, y_s − ŷ)_{ŷ,M} t_q|, |(1 − t_q) + (k_N^n N, y_s − ŷ)_{ŷ,M} t_q| },

it follows that the optimal choice of t_q is given by the solution of

  min_{t_q > 0} max.{ |(1 − t_q) + (k_N^1 N, y_s − ŷ)_{ŷ,M} t_q|, |(1 − t_q) + (k_N^n N, y_s − ŷ)_{ŷ,M} t_q| }.

From figure 28 follows then that, if x̂ is a strict local minimum, the optimal choice for t_q is:

  t_q = [ 1 − ½((k_N^1 + k_N^n) N, y_s − ŷ)_{ŷ,M} ]⁻¹.      (4.40)

figure 28

Substitution of (4.40) into (4.39) gives then

  u^r_{q+1} = [ (k_N^r N, y_s − ŷ)_{ŷ,M} − ½((k_N^1 + k_N^n) N, y_s − ŷ)_{ŷ,M} ] / [ 1 − ½((k_N^1 + k_N^n) N, y_s − ŷ)_{ŷ,M} ] u^r_q + O(u^t_q u_q).      (4.41)

And from this follows that the smallest attainable linear convergence factor (lcf.) for Gauss' method with a line search strategy is given by:

  lcf. = |½((k_N^1 − k_N^n) N, y_s − ŷ)_{ŷ,M}| / [ 1 − ½((k_N^1 + k_N^n) N, y_s − ŷ)_{ŷ,M} ].      (4.42)

Note that although local convergence is now guaranteed if x̂ is a strict local minimum, convergence can still be very slow; namely when ½((k_N^1 − k_N^n) N, y_s − ŷ)_{ŷ,M} ≫ 0, for instance.
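For example 3 of the previous section all curvature information is available, so (4.40) can be evaluated exactly: n = 1, (k_N N, y_s − ŷ) = −0.5, hence t_q = 1/(1 − (−0.5)) = 2/3, and the bracketed coefficient in (4.39) vanishes. A sketch comparing the full step with this optimal step:

```python
import math

# Damped Gauss step x_{q+1} = x_q + t_q dx_q for the unit circle with
# y_s = (1.5, 0): with the optimal t_q = 2/3 of (4.40) the linear error
# term disappears and the oscillation of example 3 is gone.
def increment(t, ys):
    return (ys[0] - math.cos(t)) * (-math.sin(t)) \
         + (ys[1] - math.sin(t)) * math.cos(t)

for step in (1.0, 2.0 / 3.0):
    t = 0.25 * math.pi
    for _ in range(6):
        t = t + step * increment(t, (1.5, 0.0))
    print(step, t)   # the damped iteration lands far closer to t_hat = 0
```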

The above discussed Gauss' method with the optimal choice (4.40) is of course not practically executable as such, since we generally lack the curvature information needed. Nevertheless, the above results are of some importance, since with (4.42) we have obtained a lower bound on the linear convergence factor attainable for Gauss' method with a line search strategy. This means that when one decides to use a line search strategy in practice, one should choose a strategy which gives a rate of convergence close to (4.42).

Apart from the minimization rule, which will be used in the next section to establish global convergence, we shall not discuss in the sequel any of the existing line search strategies. For details the reader is therefore referred to the relevant literature (see e.g. Ortega & Rheinboldt, 1970). Our decision not to include a discussion on various line search strategies is mainly based on the following important conclusion:

(vii) If ½((k_N^1 + k_N^n) N, y_s − ŷ)_{ŷ,M} is small, then t_q = 1 is a good choice for a line search strategy (see (4.40)). Hence, for small residual adjustment problems and moderately curved submanifolds N̄, Gauss' method without a line search strategy has a close to optimal rate of convergence. In fact, if either B ≡ 0 or y_s = ŷ, one must choose t_q = 1 in order to assure a local quadratic convergence behaviour.

(viii)

From (4.35) follows that Gauss' method has a local quadratic convergence behaviour if either the normal field B vanishes identically on N̄, i.e. B ≡ 0, or y_s ∈ N̄, i.e. y_s = ŷ. Submanifolds for which B ≡ 0 are called totally geodesic. This as a generalization of the concept of a geodesic ("straight line"), for which the first curvature vanishes identically. The local quadratic convergence behaviour is described by

  u^γ_{q+1} = −½ Γ^γ_{αβ}(x̂) u^α_q u^β_q + O(‖u_q‖³).      (4.43)

Of course, we still have to prove (4.43). But it is reasonable to expect that (4.43) holds, since we know from the previous section that for geodesics Gauss' method has a local quadratic convergence behaviour with convergence factor −½(s'(t̂))⁻¹s''(t̂). And we also know that (s'(t))⁻¹s''(t) generalizes to the Christoffel symbols of the second kind Γ^γ_{αβ}.

If B ≡ 0, then N̄ = T_ŷN̄, which means that our actual adjustment problem is linear. Hence, if B ≡ 0, then for sure

  ŷ = y_q + P_{T_{y_q}N̄}(y_s − y_q),      (4.45)

from which follows that the projection step reaches ŷ at once. This already shows that indeed the convergence behaviour will be the same if either B ≡ 0 or y_s = ŷ holds. Remember that in both cases we are actually solving the inverse mapping problem: given ŷ, find the pre-image x̂ under map y. To prove the quadratic convergence behaviour (4.43), we Taylorize the right-hand side of x_{q+1} = x_q + Δx_q about the least-squares solution x̂. With (4.45) this gives a second order expression, (4.46), in the derivatives of the map y. But according to Gauss' decomposition formula these second order derivatives reduce, for B ≡ 0 or y_s = ŷ, to the Christoffel symbols Γ^γ_{αβ}(x̂). Hence, with (4.46) the quadratic convergence rule (4.43) follows.


(ix) As another generalization of the univariate case we have that the local quadratic convergence rule (4.43) is not invariant to non-linear reparametrization. This follows from the fact that the Christoffel symbols are not the components of a tensor. With the reparametrization x^α → x^{α'}(x^α), their transformation law reads namely as

  Γ^{α'}_{β'γ'} = [ Γ^α_{βγ} (∂x^β/∂x^{β'}) (∂x^γ/∂x^{γ'}) + ∂²x^α/∂x^{β'}∂x^{γ'} ] ∂x^{α'}/∂x^α.      (4.47)

Note that this is the generalization of the easily verifiable transformation rule

  (ds/dt')⁻¹ (d²s/dt'²) = (s'(t))⁻¹ s''(t) (dt/dt') + (d²t/dt'²) (dt/dt')⁻¹.

With respect to the univariate case there is, however, one big difference. In the univariate case we could always find a parametrization for which (s'(t))⁻¹s''(t) would vanish identically. In the multivariate case this is only possible if B ≡ 0. The explanation is that in the univariate case T_tN and N are identifiable irrespective of the curvature of the space curve c, whereas in the multivariate case T_xN and N are only identifiable if B ≡ 0. Namely, only if B ≡ 0 can one find a parametrization for which (·, ·)_N reduces to the standard metric globally.

Nevertheless there do exist parametrizations for which the Christoffel symbols Γ^γ_{αβ} vanish locally. Coordinates for which the Christoffel symbols vanish at a point, x_o say, are geodesic polar coordinates.
The procedure of finding geodesic polar coordinates is the following:
According to the theory of ordinary differential equations a geodesic c(s) through a point x_o is locally uniquely characterized by the coordinates of x_o = c(0) and the tangent vector c_*(d/ds) at x_o. Hence a point x = c(s) ∈ N on this geodesic can be identified by c_*(d/ds) at x_o and s. Or in coordinates: the point x ∈ N with coordinates x^α = c^α(s) can be identified locally with the point of T_{x_o}N having coordinates

    x̄^α = s (dc^α/ds)(0).   (4.48)

Thus, since the geodesic c(s) is locally uniquely characterized by x_o = c(0) and c_*(d/ds) at x_o, there exists locally a diffeomorphism from N into T_{x_o}N. Let us denote this map in coordinates by

    x̄^α = x̄^α(x^β).   (4.49)

From the Taylor expansion of c(s),

    c^α(s) = c^α(0) + s (dc^α/ds)(0) − ½ s² Γ^α_{βγ}(x_o)(dc^β/ds)(0)(dc^γ/ds)(0) + O(s³),

follows then with (4.48) that

    x^α = x_o^α + x̄^α − ½ Γ^α_{βγ}(x_o) x̄^β x̄^γ + … .

The inverse of this relation gives then

    x̄^α = (x^α − x_o^α) + ½ Γ^α_{βγ}(x_o)(x^β − x_o^β)(x^γ − x_o^γ) + …   (4.50)

as the desired expression for (4.49).

We can now view (4.50) as a nonlinear parameter transformation. It is admissible since the Jacobian determinant equals 1 at x_o. The new coordinates x̄^α are known as geodesic polar coordinates.
In these new coordinates the geodesic c(s) is found as the solution of

    d²x̄^α/ds² + Γ̄^α_{βγ} (dx̄^β/ds)(dx̄^γ/ds) = 0,

where the new Christoffel symbols Γ̄^α_{βγ} follow from (4.47) using (4.50). But as is easily verified the coefficients Γ̄^α_{βγ} vanish at x_o. Hence in a neighbourhood of x_o the geodesic c(s) is given in geodesic polar coordinates as a straight line through the origin of T_{x_o}N.

From the above discussion follows that if the coordinates x^α in (4.46) are geodesic polar coordinates at x̂ by chance, and B ≢ 0 but y_s = ŷ, then Gauss' method has a local third order convergence behaviour. Note by the way that since the geodesic polar coordinates are linear in s, we are indeed dealing here with the proper multivariate generalization of the case considered in the previous section where the univariate parameter t was chosen as a linear function of s so as to eliminate the necessity of iteration for solving the inverse mapping problem.
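The asymptotically linear local behaviour discussed above can be made concrete with a small numerical experiment (an added illustration, not part of the original text; standard metric and arbitrary numbers assumed). Gauss' method is applied to the unit circle, whose curvature is 1, so the predicted asymptotic error-reduction factor equals the residual norm d:

```python
# Gauss' iteration x_{q+1} = x_q + (J^T J)^{-1} J^T (y_s - y(x_q)) on the
# unit circle y(x) = (cos x, sin x). The observation y_s = (1+d)(cos a, sin a)
# has least-squares solution x_hat = a and residual norm d; since the circle
# has curvature 1, the predicted asymptotic error-reduction factor is d.
import math

d, a = 0.1, 0.5
ys = ((1 + d) * math.cos(a), (1 + d) * math.sin(a))

x = 0.2
errs = []
for q in range(12):
    J = (-math.sin(x), math.cos(x))            # Jacobian column (J^T J = 1)
    r = (ys[0] - math.cos(x), ys[1] - math.sin(x))
    x += J[0] * r[0] + J[1] * r[1]             # Gauss' incremental step
    errs.append(a - x)                         # error of current iterate

ratios = [e2 / e1 for e1, e2 in zip(errs, errs[1:]) if e1 != 0]
assert abs(abs(ratios[-1]) - d) < 1e-3         # linear factor ~ curvature * residual
```

The successive error ratios settle at −d, i.e. convergence is linear with a factor given by curvature times residual norm, in agreement with the theory.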

4.5. Global convergence

In the above local analysis of Gauss' method we have seen that both the initial guess x_o had to be sufficiently close to the solution x̂ and

    |k_{N_r}(x̂, y_s − ŷ)| ‖y_s − ŷ‖_M < 1

had to hold for all r = 1,…,n in order to assure convergence. For most practical problems we indeed believe that these conditions are satisfied. Nevertheless, it would be dissatisfactory not to have an iteration method which guarantees convergence almost independently of the chosen initial guess and curvature behaviour of the submanifold N̄. In the following we will therefore discuss the necessary conditions which assure global convergence. Note that the adjective "global" does not refer to x̂, but to the almost independency of the initial guess x_o, i.e. usually one will have global convergence to a local minimum.
The method we will discuss is essentially the above discussed Gauss' method, but now with the so-called minimization rule as line search strategy. In formulating the method we have chosen to start from some general principles so as to get a better understanding of how the various assumptions contribute to the overall proof of global convergence.
As a start we assume that we are given a sequence {x_q} for which

    E(x_{q+1}) ≤ E(x_q) for all q = 0,1,…   (4.52)

This seems a natural condition to start with since we are looking for an iteration method which can locate a local minimum of E. From (4.52) follows that the sequence {E(x_q)} converges to a limit, since the sum of squares function E is bounded from below (0 ≤ E(x), ∀x) and the sequence {E(x_q)} is non-increasing.

Now, in order to find an appropriate iteration method which generates a sequence {x_q} satisfying the conditions of (4.52), we first need to know, given an initial guess, in which direction to proceed. In ordinary vector analysis the gradient of a scalar field E is defined as the vector field ∂_α E, α = 1,…,n. And it is well known that −∂_α E points in the direction in which the function E decreases most rapidly locally. In view of (4.52) it seems therefore appropriate to proceed in the direction of −∂_α E. However, this ordinary definition of gradient is not invariant under a change of coordinates. With our geometric exposition of the preceding sections in mind we can therefore expect that the simplicity of the ordinary vector analytic definition of the gradient almost inevitably forces difficulties and awkwardness when problems involving change of coordinates are encountered. A way out of this dilemma is offered if we bring the requirements of invariance under change of coordinates to the foreground. Therefore, given a function E: N → ℝ, we define the gradient field, denoted by grad E, invariantly by

    (grad E, X)_N = X(E) for all vector fields X on N.   (4.53)

In local coordinates this expression reads as

    (grad E)^α g_{αβ} X^β = X^β ∂_β E.

And this gives

    (grad E)^α = g^{αβ} ∂_β E.   (4.54)

Since the direction for which (grad E, Δx)_N is minimized as function of Δx ≠ 0, with ‖Δx‖_N fixed, is given by

    Δx(x) = −grad E(x) ∈ T_xN,   (4.55)

it follows that Δx(x) points in the direction of maximal local decrease of E. Note that since

    ∂_α E(x) = −∂_α y^i(x) g_{ij} (y_s^j − y^j(x)),

the vector Δx(x) equals the incremental step as produced by Gauss' method (4.5).
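As a small check (an added sketch, not the book's derivation; the model y(x) = (x₁, x₂, x₁² + x₂²) and all numbers are arbitrary choices), one can verify numerically that the invariant gradient (4.54), computed with the pulled-back metric g_{αβ} = ∂_α y^i ∂_β y^i and a finite-difference ∂_β E, reproduces Gauss' incremental step (JᵀJ)⁻¹Jᵀ(y_s − y(x)):

```python
# For E(x) = 1/2 ||y_s - y(x)||^2 (standard metric on the observation space),
# minus the Riemannian gradient -g^{-1} dE with g = J^T J equals Gauss' step.
def y(x):
    return [x[0], x[1], x[0]**2 + x[1]**2]

def jac(x):
    return [[1.0, 0.0], [0.0, 1.0], [2*x[0], 2*x[1]]]

y_s = [0.3, -0.2, 1.0]

def E(x):
    r = [a - b for a, b in zip(y_s, y(x))]
    return 0.5 * sum(ri*ri for ri in r)

def solve2(A, b):                      # solve a 2x2 linear system by Cramer
    det = A[0][0]*A[1][1] - A[0][1]*A[1][0]
    return [(A[1][1]*b[0] - A[0][1]*b[1]) / det,
            (A[0][0]*b[1] - A[1][0]*b[0]) / det]

x = [0.4, 0.1]
J = jac(x)
g = [[sum(J[i][a]*J[i][b] for i in range(3)) for b in range(2)]
     for a in range(2)]               # pulled-back metric g = J^T J

h, dE = 1e-6, []                       # partial derivatives by central diff.
for a in range(2):
    xp, xm = list(x), list(x)
    xp[a] += h; xm[a] -= h
    dE.append((E(xp) - E(xm)) / (2*h))

grad = solve2(g, dE)                   # (grad E)^a = g^{ab} d_b E
r = [a - b for a, b in zip(y_s, y(x))]
Jtr = [sum(J[i][a]*r[i] for i in range(3)) for a in range(2)]
gauss_step = solve2(g, Jtr)            # Gauss' incremental step

assert all(abs(-grad[a] - gauss_step[a]) < 1e-5 for a in range(2))
```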

Hence, both the geometry of our non-linear least-squares problem as well as the fact that −grad E points in the direction of maximal local decrease of E, suggest that the vector Δx(x) as given by (4.55) is an appropriate choice for the direction of search. However, although Δx(x) points in the direction of maximal local decrease of E, this does not necessarily imply that the function value of E(x) decreases by taking Δx(x) as incremental step. In fact we already saw in the previous section that the descent property only holds if N̄ is moderately curved and x sufficiently close to x̂. So, we still need a rule according to which we can compute an appropriate x_{q+1} from x_q. Nevertheless, the above discussion is not without meaning since by agreeing upon taking Δx(x_q) as the direction of search we have reduced the dimensions of our problem essentially from n to 1. That is, by choosing a curve c_q: t ∈ ℝ → N with

    c_q(t=0) = x_q and c_{q*}(d/dt) = Δx(x_q) = −grad E(x_q),   (4.56)

we can define x_{q+1} by x_{q+1} = c_q(t_q), where t_q is an appropriate scalar so that E(x_{q+1}) ≤ E(x_q)
holds. That such a scalar exists is seen as follows. Since

    lim_{t↓0} [E(c_q(t)) − E(c_q(0))] / t = ∂_α E(x_q) (dc_q^α/dt)(0) = (grad E, c_{q*}(d/dt))_{x_q},

it follows with c_{q*}(d/dt) = Δx(x_q) = −grad E(x_q), that if Δx(x_q) ≠ 0,

    lim_{t↓0} [E(c_q(t)) − E(c_q(0))] / t = −(grad E, grad E)_{x_q} < 0.

Hence, if Δx(x_q) ≠ 0, there exists a δ > 0 so that E(c_q(t)) < E(c_q(0)) for all t ∈ (0,δ).
Thus if x_q is not a critical point of E it is always possible to choose a positive scalar t_q so that E(x_{q+1}) ≤ E(x_q) holds.
It seems appropriate to choose t_q so that the maximal possible decrease in E is obtained. This is the case when t_q is chosen so as to minimize E along the curve c_q(t). That is, when t_q is computed as the scalar satisfying the minimization rule

    E(c_q(t_q)) = min_{t>0} E(c_q(t)).   (4.58)

So far we did not specify the type of curve c_q(t) chosen. The simplest way computationwise would be to choose the curve c_q(t) so that its coordinate functions are given by

    c_q^α(t) = x_q^α + t Δx^α(x_q).

But other choices are also possible. And since the particular type of curve chosen is not important for our convergence analysis, we just assume that a rule is given which smoothly assigns a unique curve c_q: t ∈ ℝ → N to every point x_q so that the initial conditions (4.56) hold. That is, we assume that the coordinate functions c_q^α, α = 1,…,n, of the curve c_q are smooth functions of not only the parameter t but also of the initial conditions. Instead of c_q(t) we may therefore write c_q(t, x_q, Δx(x_q)) and by Taylor's formula we have

    c_q^α(t, x_q, Δx(x_q)) = x_q^α + t Δx^α(x_q) + t² φ^α(t, x_q, Δx(x_q)),   (4.59)

where the smooth functions φ^α depend on the rule given.

Summarizing, we have come up with the following iteration method:

(i) Choose an initial guess x_o and set q = 0. Choose a rule which smoothly assigns a unique curve c_q: t ∈ ℝ → N to every point x_q with the prescribed initial conditions c_q(0) = x_q and c_{q*}(d/dt) = Δx(x_q).
(ii) Compute Δx(x_q) = −grad E(x_q). If Δx(x_q) = 0 then stop.
(iii) Compute the scalar t_q satisfying E(c_q(t_q)) = min_{t>0} E(c_q(t)).
(iv) Compute x_{q+1} = c_q(t_q) and set q = q+1. Return to (ii).   (4.60)
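The four steps above can be sketched in code. The following is a minimal sketch under simplifying assumptions (ordinary Euclidean gradient on the parameter space, i.e. g_{αβ} = δ_{αβ}; straight search curves c_q(t) = x_q + tΔx; and a grid-plus-refinement approximation of the minimization rule); the model y(x) = (x₁, x₂, x₁² + x₂²) and observation point are arbitrary illustrative choices:

```python
# Steepest descent with (approximate) exact line search, following (4.60).
y_s = [0.3, -0.2, 1.0]

def E(x):
    r = [y_s[0] - x[0], y_s[1] - x[1], y_s[2] - (x[0]**2 + x[1]**2)]
    return 0.5 * sum(ri*ri for ri in r)

def grad(x):                                   # central-difference gradient
    h, g = 1e-7, []
    for a in range(2):
        xp, xm = list(x), list(x)
        xp[a] += h; xm[a] -= h
        g.append((E(xp) - E(xm)) / (2*h))
    return g

def line_min(f, t_max=2.0, n=200, refine=40):
    ts = [t_max*i/n for i in range(n+1)]       # grid includes t = 0, so the
    t = min(ts, key=f)                         # result can never increase E
    lo, hi = max(t - t_max/n, 0.0), t + t_max/n
    for _ in range(refine):                    # shrink (assumes local unimodality)
        m1, m2 = lo + (hi-lo)/3, hi - (hi-lo)/3
        if f(m1) < f(m2): hi = m2
        else: lo = m1
    tr = (lo + hi) / 2
    return tr if f(tr) < f(t) else t

x, history = [1.0, 0.0], []
for q in range(300):
    dx = [-ga for ga in grad(x)]               # step (ii)
    if max(abs(d) for d in dx) < 1e-9: break   # stop at a critical point
    t = line_min(lambda s: E([x[0]+s*dx[0], x[1]+s*dx[1]]))   # step (iii)
    history.append(E(x))
    x = [x[0]+t*dx[0], x[1]+t*dx[1]]           # step (iv)

assert all(b <= a + 1e-12 for a, b in zip(history, history[1:]))
assert history[-1] < 0.2                       # settles near the local minimum
```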

The sequence {x_q} generated by (4.60) is either finite or infinite. If it is finite then clearly its limit is a critical (or stationary) point of E by virtue of the stop command in (4.60). But if it is infinite then the only thing we know for sure is that the sequence {E(x_q)} has a limit. It is important to realize, however, that this by itself implies nothing about the validity of the final convergence statement which we set out to prove, namely that lim_{q→∞} x_q = x̂, with x̂ being a critical point of E. This is best seen by means of an example: Take E(x) = m·eˣ, where m is a real-valued constant, and x_q = 2^{−q}. Then lim_{q→∞} E(x_q) = m and lim_{q→∞} x_q = 0, but x = 0 is clearly not a critical point of E.
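The counterexample can be checked directly (m = 3 is an arbitrary choice):

```python
# E(x) = m*e^x along x_q = 2^(-q): the values E(x_q) form a non-increasing,
# convergent sequence with limit m, yet the limit point x = 0 is not critical,
# since E'(0) = m != 0.
import math

m = 3.0
E = lambda x: m * math.exp(x)
xs = [2.0**-q for q in range(60)]
Es = [E(x) for x in xs]

assert all(b <= a for a, b in zip(Es, Es[1:]))   # E(x_q) is non-increasing
assert abs(Es[-1] - m) < 1e-12                   # E(x_q) converges to m
assert m * math.exp(0.0) == m                    # but E'(0) = m, not 0
```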
In fact, the convergence of the sequence {E(x_q)} does in general not even imply the convergence of the sequence {x_q}. Therefore, in order to assure that the sequence {x_q} as generated by (4.60) converges to a critical point of E, we assume in addition to (4.52),

    that the initial guess x_o is chosen such that the level set L(x_o) = {x | E(x) ≤ E(x_o)} is bounded, and that the function values of E at critical points in L(x_o) are distinct.   (4.61)


With (4.61) we are now in the position to prove that the sequence {x_q} converges to a critical point of E. We will assume that the sequence {x_q} is infinite.
According to (4.52) we have E(x_{q+1}) ≤ E(x_q) for all q = 0,1,2,… . Hence, x_q ∈ L(x_o) for all q = 0,1,2,… . And since the level set L(x_o) is bounded by assumption, it follows that {x_q} has at least one convergent subsequence, say {x_{q_i}}, where q_{i+1} > q_i, and with limit lim_{i→∞} x_{q_i} = x̄.
We shall now prove by contradiction that x̄ is a critical point of E. Assume therefore that x̄ is not a critical point of E.
We denote the unique curve assigned to x̄ by c(t, x̄, Δx(x̄)), and the positive scalar t̄ satisfying

    E(c(t̄, x̄, Δx(x̄))) = min_{t>0} E(c(t, x̄, Δx(x̄)))

by t̄ = t(x̄). Similarly, we denote the unique curve assigned to an arbitrary point x by c(t, x, Δx(x)), and the scalar t' satisfying

    E(c(t', x, Δx(x))) = min_{t>0} E(c(t, x, Δx(x)))

by t' = t(x).
Now we define a function F(x) as

    F(x) = E(x) − E(c(t(x), x, Δx(x))).   (4.62)

Since F(x) is continuous by inspection and lim_{i→∞} x_{q_i} = x̄, it follows that

    lim_{i→∞} F(x_{q_i}) = F(x̄).

From the definition of the limit of a convergent sequence (see e.g. W. Fleming, 1977) follows then that for every ε > 0 there exists a positive integer r such that

    |F(x_{q_i}) − F(x̄)| < ε for every i ≥ r.   (4.63)
Since we assumed x̄ to be a non-critical point, we have F(x̄) > 0. Hence, we can take ε > 0 in (4.63) to be ε = ½ F(x̄). This gives us then

    F(x_{q_i}) ≥ ½ F(x̄) > 0 for every i ≥ r.   (4.64)

From

    E(c(t(x), x, Δx(x))) ≤ E(c(t, x, Δx(x))) for all t > 0, and E(c(t(x), x, Δx(x))) ≤ E(x),

or F(x) ≥ 0, follows then with (4.64) that

    E(c(t(x_{q_i}), x_{q_i}, Δx(x_{q_i}))) = E(x_{q_i}) − F(x_{q_i}) ≤ E(x_{q_i}) − ½ F(x̄) for every i ≥ r.

With E(x_{q_{i+1}}) ≤ E(x_{q_i + 1}) = E(c(t(x_{q_i}), x_{q_i}, Δx(x_{q_i}))) follows that

    E(x_{q_{i+1}}) ≤ E(x_{q_i}) − ½ F(x̄), with F(x̄) > 0, for every i ≥ r.

Hence,

    lim_{i→∞} E(x_{q_i}) = −∞.   (4.65)

Thus if x̄ is not a critical point then (4.65) must hold. But this contradicts the fact that {E(x_{q_i})} converges to E(x̄). Hence, x̄ must be a critical point of E.
To prove that the sequence {x_q} itself converges to a critical point of E, suppose that x̄ and x̄' are distinct limits of two convergent subsequences of {x_q}. We know then that x̄ and x̄' must be critical points of E. And since {E(x_q)} converges, we must have E(x̄) = E(x̄'). But this contradicts our assumption that the critical values of E are distinct. Hence we must have that x̄ = x̄', which means that the sequence {x_q} itself converges to a critical point.
This concludes the proof of the following global convergence theorem (Ortega & Rheinboldt, 1970):

    Let an initial guess x_o be chosen such that the level set L(x_o) = {x | E(x) ≤ E(x_o)} is bounded, and let the function values of E be distinct at critical points in L(x_o). Then the sequence {x_q} defined by (4.60) is either finite and terminates at a critical point of E, or it is infinite and converges to a critical point, i.e.

    lim_{q→∞} x_q = x̂ with grad E(x̂) = 0.   (4.66)

To conclude this section we will prove the following result on the rate of convergence of the globally convergent iteration method (4.60):

    If |k_r| ‖y_s − ŷ‖_M < 1 for all r = 1,…,n, then

    lim sup_{q→∞} [E(x_{q+1}) − E(x̂)] / [E(x_q) − E(x̂)] ≤ [ (λ_max − λ_min) / (λ_max + λ_min) ]²,   (4.67)

    with λ_r = 1 − k_r ‖y_s − ŷ‖_M, r = 1,…,n, the eigenvalues of the linear map H introduced below.

Recall from (4.60) that in order to generate the sequence {x_q} one should first decide upon a descent curve c_q(t, x_q, Δx(x_q)). Fortunately all methods for selecting such a curve are asymptotically equivalent in the sense that the curves are all tangent at the starting point x_q. That is, as the stepsize goes to zero the methods all move approximately along the same curve, which implies that the asymptotic properties of the sequence {x_q} are independent of the type of curve chosen provided that the initial conditions (4.56) hold. Hence, for the determination of the local rate of convergence we are free in choosing the type of curve c_q(t). For convenience we will assume therefore that the descent curve c_q(t) is a geodesic.
Now, before we prove (4.67) we will first prove that the linear map H: T_xN → T_xN defined by

    H X = ∇_X grad E for all X ∈ T_xN,   (4.68)

satisfies

    (H X, Y)_N = (X, Y)_N − (B(X,Y), N)_M for all X, Y ∈ T_xN,   (4.69)

where N = y_s − y(x). From (4.70) and the definition of the pushforward of grad E follows that −y_*(grad E) equals the tangential part of N. Writing N = −y_*(grad E) + N^⊥, with N^⊥ the normal component of N, and using

    D_{y_*(X)} (y_s − y) = −y_*(X),

this gives

    D_{y_*(X)} y_*(grad E) = y_*(X) + D_{y_*(X)} N^⊥.   (4.71)

Since

    0 = D_{y_*(X)} (N^⊥, y_*(Y))_M = (D_{y_*(X)} N^⊥, y_*(Y))_M + (N^⊥, D_{y_*(X)} y_*(Y))_M,

we can write (4.71) also as

    (D_{y_*(X)} y_*(grad E), y_*(Y))_M = (y_*(X), y_*(Y))_M − (N^⊥, D_{y_*(X)} y_*(Y))_M.

Two times application of Gauss' decomposition formula (4.18) — once to identify the tangential part of D_{y_*(X)} y_*(grad E) with y_*(H X), and once to identify the normal part of D_{y_*(X)} y_*(Y) with B(X,Y) — gives then (4.69), since (N^⊥, B(X,Y))_M = (N, B(X,Y))_M.

With (4.29), it follows from (4.69) that

    (H X, X)_N / (X, X)_N = 1 − k_N(X, N) ‖N‖_M, with N = y_s − y(x).   (4.72)

But for X = grad E(x_q), this is precisely, to a first order approximation, the inverse of the scalar t_q satisfying the minimization rule E(c_q(t_q)) = min_{t>0} E(c_q(t)). To see this, take a plane section of the submanifold N̄ through the points y(x_q), y_s and y(x_q) − y_*(grad E(x_q)), and approximate the resulting plane curve by its circle of curvature (see figure 29).

    figure 29: the plane curve y(c_q(t)) and its circle of curvature.

In a neighbourhood of y(x_q) this circle of curvature can then be considered as a sufficient approximation of the curve y(c_q(t)). Since the curve c_q(t) is a geodesic by assumption it follows that ‖c_{q*}(d/dt)‖_N is constant along the curve. Hence s, the parameter of arclength, is proportional to t. Since ‖c_{q*}(d/dt)‖_N = ‖grad E(x_q)‖_N, it follows therefore that

    s = ‖y_*(grad E(x_q))‖_M · t.   (4.73)

Furthermore we know that the scalar t_q satisfies the minimization rule. Therefore y(c_q(t_q)) must be, to the present approximation, the point of the circle of curvature closest to y_s. From figure 29 follows then an expression for the minimizing arclength s_q in terms of ‖y_*(grad E(x_q))‖_M and the curvature along the first normal N₁ of y(c_q(t)). With (4.73) follows then

    t_q ≈ [1 − k_{N₁} ‖y_s − y(x_q)‖_M]⁻¹.   (4.75)

Compare with (4.72).
To make relation (4.75) precise we recall that geodesics are characterized by

    ∇_V V = 0, with V = c_*(d/dt).

From the fact that t_q satisfies the minimization rule follows then

    (grad E, V)_{c_q(t_q)} = 0.

And with (4.68) and V_{c_q(0)} = c_{q*}(d/dt) = −grad E(x_q) this gives

    t_q = (grad E, grad E)_{x_q} / (H grad E, grad E)_{x_q} + O(‖grad E‖_{x_q}).   (4.76)

Compare with (4.75).
Compare w i t h (4.75).
Now, to continue our proof of (4.67), we substitute (4.76) into

    E(c_q(t_q)) − E(c_q(0)) = (grad E, V)_{c_q(0)} t_q + ½ (∇_V grad E, V)_{c_q(0)} t_q² + O(t_q³)
                            = −(grad E, grad E)_{x_q} t_q + ½ (H grad E, grad E)_{x_q} t_q² + O(t_q³),

and find

    E(x_{q+1}) − E(x_q) = −½ (grad E, grad E)²_{x_q} / (H grad E, grad E)_{x_q} + O(‖grad E‖³_{x_q}).   (4.77)

By assuming that x̂ and x_q are connected by a geodesic c(s) with c(0) = x_q and c(ŝ) = x̂, we can write

    E(c(ŝ)) − E(c(0)) = (grad E, W)_{x_q} ŝ + ½ (H W, W)_{x_q} ŝ² + O(ŝ³),   (4.78)

where W = c_*(d/ds). Since grad E(x̂) = 0, we have for an arbitrary parallel field U (i.e. ∇_{d/ds} U = 0) along c(s),

    0 = (grad E, U)_{x̂} = (grad E, U)_{x_q} + (H W, U)_{x_q} ŝ + O(ŝ²).

Hence,

    W ŝ = −H⁻¹ grad E(x_q) + O(‖grad E‖²_{x_q}).

Substitution into (4.78) gives then

    E(x̂) − E(x_q) = −½ (H⁻¹ grad E, grad E)_{x_q} + O(‖grad E‖³_{x_q}).   (4.79)

And subtracting this from (4.77) gives

    E(x_{q+1}) − E(x̂) = [ 1 − (grad E, grad E)²_{x_q} / ( (H grad E, grad E)_{x_q} (H⁻¹ grad E, grad E)_{x_q} ) ] (E(x_q) − E(x̂)) + O(‖grad E‖³_{x_q}),   (4.80)

with Δx(x_q) = −grad E(x_q) and ‖grad E‖_{x_q} → 0.
By assuming that x̂ is a strict local minimum of E, we can now apply Kantorovich's inequality to (4.80). Kantorovich's inequality (see Rao, 1973, p. 74) states namely that if a linear map H: T_xN → T_xN is positive definite and selfadjoint with eigenvalues 0 < λ₁ ≤ λ₂ ≤ … ≤ λ_n, then

    (H Δx, Δx)_N (H⁻¹ Δx, Δx)_N ≤ (λ₁ + λ_n)² / (4 λ₁ λ_n)   (4.81)

for all normalized Δx ∈ T_xN, i.e. (Δx, Δx)_N = 1. Since the eigenvalues of the linear map H read

    λ_r = 1 − k_r ‖y_s − ŷ‖_M, r = 1,…,n,

application of (4.81) to (4.80) finally gives the desired result (4.67).
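Kantorovich's inequality is easy to verify numerically; the following check (an added illustration with an arbitrarily chosen diagonal map H = diag(2, 5)) confirms both the bound and that it is attained for the balanced direction:

```python
# For every unit vector x: (Hx,x)(H^{-1}x,x) <= (l1+ln)^2 / (4*l1*ln),
# with equality for the direction mixing the extreme eigenvectors equally.
import math

l1, ln = 2.0, 5.0
bound = (l1 + ln)**2 / (4 * l1 * ln)           # = 49/40 = 1.225

worst = 0.0
for k in range(1000):
    t = math.pi * k / 1000
    c, s = math.cos(t), math.sin(t)            # unit vector (c, s)
    q = (l1*c*c + ln*s*s) * (c*c/l1 + s*s/ln)  # (Hx,x) * (H^{-1}x,x)
    assert q <= bound + 1e-12
    worst = max(worst, q)

assert abs(worst - bound) < 1e-3               # the bound is (nearly) attained
```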

5. Supplements and examples

In this section we will consider some examples to illustrate the theory developed in the previous
sections. Apart from the examples, we also present new results on the Helmert transformation and
give some suggestions as to how to estimate the extrinsic curvatures.

5.1. The two dimensional Helmert transformation

In subsection 3.6 we have seen that the solution of the Helmert transformation only admitting a rotation could be found by orthogonally projecting the observation point onto a circle with radius equalling the square root of the moment of inertia of the network involved. We will now generalize this result and consider the full Helmert transformation. That is, we will assume the scale- and translation parameters to be included as well.
Of course, the solution to the two dimensional Helmert transformation is well known (see e.g. Kochle, 1982). It is therefore not so much our purpose to present the solution, but to emphasize the geometry involved. And the method chosen for deriving the solution prepares us for the case considered in our next example.
The model of the Helmert transformation reads

    x_i = u_i λ cos θ − v_i λ sin θ + t_x + e_{x_i},
    y_i = u_i λ sin θ + v_i λ cos θ + t_y + e_{y_i},   (5.1)

where:

    i = 1,…,n, n = number of points,
    x_i, y_i are the cartesian coordinates of the network points in the first coordinate system, and u_i, v_i are the coordinates in the second coordinate system,
    λ, θ, t_x and t_y are respectively the scale, orientation and translation parameters, which need to be estimated, and
    e_{x_i}, e_{y_i} are the errors to be minimized in the 2-norm.

If we write model (5.1) as

    y_s = λ cos θ x₁ + λ sin θ x₂ + t_x x₃ + t_y x₄ + e,   (5.2)

where:

    x₁ = (u₁, v₁, …, u_n, v_n)ᵀ, x₂ = (−v₁, u₁, …, −v_n, u_n)ᵀ, x₃ = (1, 0, …, 1, 0)ᵀ, x₄ = (0, 1, …, 0, 1)ᵀ,   (5.2')

our least-squares problem becomes

    min_{λ,θ,t_x,t_y} E(λ,θ,t_x,t_y) = min_{λ,θ,t_x,t_y} ‖y_s − λ cos θ x₁ − λ sin θ x₂ − t_x x₃ − t_y x₄‖²_M.   (5.3)
We shall solve (5.3) by proceeding in two steps. First we assume λ and θ fixed and solve the subproblem

    min_{t_x,t_y} E(λ,θ,t_x,t_y).   (5.4)

Let t_x(λ,θ), t_y(λ,θ) denote the solution to (5.4) and formulate the second problem as

    min_{λ,θ} E(λ,θ,t_x(λ,θ),t_y(λ,θ)).   (5.5)

Let λ̂, θ̂ denote the solution of (5.5). The overall solution of our original least-squares problem (5.3) is then

    λ̂, θ̂, t̂_x = t_x(λ̂,θ̂), t̂_y = t_y(λ̂,θ̂).   (5.6)

By taking this two-step procedure we have separated our original four-dimensional least-squares problem (5.3) into two two-dimensional least-squares problems (5.4) and (5.5).
With the abbreviation

    y_s(λ,θ) = y_s − λ cos θ x₁ − λ sin θ x₂,   (5.7)

the first subproblem (5.4) becomes

    min_{t_x,t_y} E(λ,θ,t_x,t_y) = min_{t_x,t_y} ‖y_s(λ,θ) − t_x x₃ − t_y x₄‖²_M.   (5.8)

And geometrically this problem can of course be seen as the problem of finding the point in the plane spanned by the orthogonal vectors x₃ and x₄ (as before we assume that the observation space is endowed with the standard metric) which is nearest to y_s(λ,θ) (see figure 30).

    figure 30

Since the two vectors (1/√n) x₃ and (1/√n) x₄ are orthonormal, it follows that the point in the plane spanned by x₃ and x₄ closest to y_s(λ,θ) is

    (1/n)(x₃, y_s(λ,θ))_M x₃ + (1/n)(x₄, y_s(λ,θ))_M x₄.

Hence,

    t_x(λ,θ) = (1/n)(x₃, y_s(λ,θ))_M, t_y(λ,θ) = (1/n)(x₄, y_s(λ,θ))_M,

or with (5.7)

    t_x(λ,θ) = (1/n)(x₃, y_s − λ cos θ x₁ − λ sin θ x₂)_M,
    t_y(λ,θ) = (1/n)(x₄, y_s − λ cos θ x₁ − λ sin θ x₂)_M.   (5.9)

This concludes the first step.

To solve (5.5), we substitute (5.9) into (5.3) and find

    min_{λ,θ} E(λ,θ,t_x(λ,θ),t_y(λ,θ)) = min_{λ,θ} ‖y_s^c − λ cos θ x₁^c − λ sin θ x₂^c‖²_M,   (5.10)

where:

    y_s^c = y_s − (1/n)(x₃, y_s)_M x₃ − (1/n)(x₄, y_s)_M x₄,
    x₁^c = x₁ − (1/n)(x₃, x₁)_M x₃ − (1/n)(x₄, x₁)_M x₄,   (5.10')
    x₂^c = x₂ − (1/n)(x₃, x₂)_M x₃ − (1/n)(x₄, x₂)_M x₄.

The geometry of problem (5.10) is illustrated in figure 31.

    figure 31

Since the two vectors (1/R) x₁^c and (1/R) x₂^c, with R = ‖x₁^c‖_M = ‖x₂^c‖_M, are orthonormal, it follows that the point in the plane spanned by x₁^c and x₂^c closest to y_s^c is

    (1/R²)(x₁^c, y_s^c)_M x₁^c + (1/R²)(x₂^c, y_s^c)_M x₂^c.

Hence,

    λ̂ cos θ̂ = (1/R²)(x₁^c, y_s^c)_M, λ̂ sin θ̂ = (1/R²)(x₂^c, y_s^c)_M.
With (5.10') and (5.2') this can be written as

    θ̂ = tan⁻¹ [ Σᵢ (u_i^c y_i^c − v_i^c x_i^c) / Σᵢ (u_i^c x_i^c + v_i^c y_i^c) ],   (5.11)

    λ̂ = √[ (Σᵢ (u_i^c x_i^c + v_i^c y_i^c))² + (Σᵢ (u_i^c y_i^c − v_i^c x_i^c))² ] / Σᵢ ((u_i^c)² + (v_i^c)²),   (5.12)

where:

    x_i^c = x_i − (1/n) Σ_j x_j, y_i^c = y_i − (1/n) Σ_j y_j,
    u_i^c = u_i − (1/n) Σ_j u_j, v_i^c = v_i − (1/n) Σ_j v_j.

To find the least-squares solution for the translation parameters we substitute (5.11) into (5.9). With (5.10') and (5.2') this gives

    t̂_x = x̄ − ū λ̂ cos θ̂ + v̄ λ̂ sin θ̂,
    t̂_y = ȳ − ū λ̂ sin θ̂ − v̄ λ̂ cos θ̂,   (5.13)

where the bars denote the averages over the n points. The equations (5.11) and (5.12) together with (5.13) constitute the well known solution of the two dimensional Helmert transformation (see e.g. Kochle, 1982).
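The closed-form solution (5.11)-(5.13) can be sketched as follows (an added illustration; the function and variable names are of course our own). On noise-free synthetic data the parameters are recovered exactly:

```python
# 2D Helmert transformation: x = lam*(u*cos(th) - v*sin(th)) + tx,
#                            y = lam*(u*sin(th) + v*cos(th)) + ty.
import math

def helmert_2d(u, v, x, y):
    n = len(u)
    ub, vb, xb, yb = sum(u)/n, sum(v)/n, sum(x)/n, sum(y)/n
    uc = [a - ub for a in u]; vc = [a - vb for a in v]
    xc = [a - xb for a in x]; yc = [a - yb for a in y]
    p = sum(a*c + b*d for a, b, c, d in zip(uc, vc, xc, yc))  # sum(u x + v y)
    q = sum(a*d - b*c for a, b, c, d in zip(uc, vc, xc, yc))  # sum(u y - v x)
    s = sum(a*a + b*b for a, b in zip(uc, vc))
    lam = math.hypot(p, q) / s                                # (5.12)
    th = math.atan2(q, p)                                     # (5.11)
    tx = xb - ub*lam*math.cos(th) + vb*lam*math.sin(th)       # (5.13)
    ty = yb - ub*lam*math.sin(th) - vb*lam*math.cos(th)
    return lam, th, tx, ty

# synthetic check with known parameters
lam0, th0, tx0, ty0 = 1.5, 0.3, 2.0, -1.0
u = [0.0, 1.0, 2.0, 0.5]; v = [0.0, 0.5, -1.0, 2.0]
x = [lam0*(a*math.cos(th0) - b*math.sin(th0)) + tx0 for a, b in zip(u, v)]
y = [lam0*(a*math.sin(th0) + b*math.cos(th0)) + ty0 for a, b in zip(u, v)]
lam, th, tx, ty = helmert_2d(u, v, x, y)
assert all(abs(a - b) < 1e-9 for a, b in zip((lam, th, tx, ty),
                                             (lam0, th0, tx0, ty0)))
```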


Note that although the functions occurring in model (5.1) are non-linear, the actual adjustment problem is linear. That is, the submanifold N̄ as described by model (5.1) is a typical example of a totally geodesic manifold. The non-linearity present in (5.1) affects therefore only the inverse mapping problem. This follows from (5.11) if one solves for the parameters λ and θ.
Also note that we are by no means restricted to the particular two-step procedure chosen in (5.4) and (5.5). Instead of taking the above two-step procedure, we could for instance have decided to only fix θ first. In the first step we would then have to solve for λ(θ), t_x(θ) and t_y(θ). And this is still a linear adjustment problem. But when solving for θ in the second step, we would get a non-linear adjustment problem, namely that of orthogonally projecting onto a circle. Hence we see that where we started with an essentially linear adjustment problem we end up with two subproblems of which the second is non-linear. What has happened is of course that by fixing θ we have chosen to project onto a non-linear curve lying in an otherwise flat manifold. Thus generally speaking such a step procedure would not be very recommendable since it produces a non-linear problem out of a linear one. An interesting point is, however, that if we reverse the argument one should be able in some cases to get a linear subproblem out of an essentially non-linear problem by applying the appropriate step procedure. Think for instance of parametrized curved submanifolds which have linear, i.e. straight, coordinate lines. In the following we will consider a typical class of such manifolds.

5.2. Orthogonal projection onto a ruled surface

A ruled surface is a surface which has the property that through every point of the surface there passes a straight line which lies entirely in the surface. Thus the surface is covered by straight lines, called rulings, which form a family depending on one parameter.
In order to find a parametrization of a ruled surface choose on the surface a curve transversal to the rulings. Let this curve be given by c(t₁). At any point of this curve take a vector T of the ruling which passes through this point. This vector obviously depends on t₁. Thus we have T(t₁). Now we can write the equation of the surface as

    y(t₁, t₂) = c(t₁) + t₂ T(t₁).   (5.14)

The parameter t₁ indicates the ruling on the surface, and the parameter t₂ shows the position on the ruling.
If in an adjustment context the submanifold N̄ turns out to be a ruled surface, one can expect to take advantage of the special properties of N̄. N̄ will namely be flat in the directions of the rulings, whilst curved in the directions transversal to it. Hence, it might turn out to be advantageous to perform the adjustment in two steps. In the first step one would then solve a linear least-squares adjustment problem, and in the second step a non-linear adjustment problem of a reduced dimension. For the ruled surface (5.14) for instance, we would choose a point on the curve c(t₁), c(t₁⁰) say. To this point there corresponds a ruling with direction T(t₁⁰). The linear least-squares adjustment step consists then of orthogonally projecting the observation point y_s onto the ruling given by

    y(t₁⁰, t₂) = c(t₁⁰) + t₂ T(t₁⁰).   (5.15)

As solution we get an adjusted point on the surface which depends on the choice of ruling, i.e. on the choice t₁⁰:

    ŷ_s(t₁⁰) = c(t₁⁰) + t̂₂(t₁⁰) T(t₁⁰).   (5.16)

The second step consists then of orthogonally projecting y_s onto the curve given by (5.16). This problem is of course in general still non-linear, but it has the advantage of being of a smaller dimension than the original adjustment problem.
As an example one could think of a cylinder (this is in fact a very special ruled surface, since it is developable). Then we have (see figure 32):

    y^{i=1}(t₁) = R cos(t₁), y^{i=2}(t₁) = R sin(t₁), y^{i=3}(t₁) = 0, T(t₁) = (0, 0, 1)ᵀ.

In the first step we would choose t₁⁰ and project y_s onto the corresponding ruling. This would give us then

    t̂₂(t₁⁰) = y_s^{i=3}.

For the second step we would then need to minimize

    min_{t₁⁰} ( y_s − ŷ_s(t₁⁰), y_s − ŷ_s(t₁⁰) )_M.

    figure 32

It will be clear that the above described procedure also holds for ruled-type of manifolds.
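The two-step projection onto the cylinder of figure 32 can be sketched as follows (an added illustration; the numbers are arbitrary and the standard metric is assumed). Step 1 is linear along the ruling; step 2 is the remaining one-dimensional non-linear problem, which for the circle even has a closed-form solution used here as a check:

```python
# Cylinder of radius R with the third coordinate axis as axis.
import math

R = 2.0
y_s = (1.2, -0.7, 3.4)

def ruled_point(t1, t2):
    return (R*math.cos(t1), R*math.sin(t1), t2)   # c(t1) + t2*T(t1)

def dist2(p, q):
    return sum((a - b)**2 for a, b in zip(p, q))

t2_hat = y_s[2]                                   # step 1: same for every ruling
# step 2: coarse one-dimensional search over the ruling parameter t1
t1_hat = min((2*math.pi*k/20000 for k in range(20000)),
             key=lambda t1: dist2(y_s, ruled_point(t1, t2_hat)))

t1_exact = math.atan2(y_s[1], y_s[0]) % (2*math.pi)
assert abs(t1_hat - t1_exact) < 1e-3
assert abs(t2_hat - 3.4) < 1e-12
```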

5.3. The two dimensional Symmetric Helmert transformation

As a nice application of the idea described in the previous example we have what we shall call the two dimensional Symmetric Helmert transformation.
Recall the model of the two dimensional Helmert transformation (see (5.1)) and note that the model in its classical formulation favours one point field above the other. This can also be seen from the rather asymmetric solution of the scale parameter (see (5.12)).
It has bothered the present author for some time that one was satisfied with the classical formulation (5.1). A better formulation would namely be:

    x̃_i = ũ_i λ cos θ − ṽ_i λ sin θ + t_x,
    ỹ_i = ũ_i λ sin θ + ṽ_i λ cos θ + t_y,   (5.17)

where:

    i = 1,…,n, n = number of points,
    the tilde sign stands for the mathematical expectation,
    x_i, y_i and u_i, v_i are the "observed" cartesian coordinates of the network points in the two coordinate systems,
    λ, θ, t_x and t_y are the transformation parameters which need to be estimated, and
    ũ_i, ṽ_i are the cartesian coordinates which need to be estimated.

Of course, the submanifold as described by the classical Helmert transformation is totally geodesic. Hence, one could fear in the first instance that (5.17) can only be solved iteratively, i.e. through the process of linearization. However, in this example we will show that if one views model (5.17) as a ruled-type of manifold, one can in fact find its least-squares solution also analytically.
Note that if we fix ũ_i, ṽ_i, i = 1,…,n, in (5.17) we are back at the classical Helmert transformation, which was linear. Hence, the manifold N̄ as described by (5.17) is flat in directions transversal to the ũ_i,ṽ_i-coordinate lines. But if we fix λ and θ, we see that it is also flat in the directions transversal to the λ,θ-coordinate lines. Thus in the first adjustment step we can either fix the ũ_i, ṽ_i, i = 1,…,n, or λ and θ. It turns out that the choice of fixing λ and θ is the most advantageous one.

Skipping the tedious but trivial adjustment derivation we find for fixed λ and θ the solution of the first adjustment step as:

    û_i(λ,θ) = ū + (1 + λ²)⁻¹ (u_i^c + x_i^c λ cos θ + y_i^c λ sin θ),
    v̂_i(λ,θ) = v̄ + (1 + λ²)⁻¹ (v_i^c − x_i^c λ sin θ + y_i^c λ cos θ),
    t̂_x(λ,θ) = x̄ − ū λ cos θ + v̄ λ sin θ,
    t̂_y(λ,θ) = ȳ − ū λ sin θ − v̄ λ cos θ,   (5.18)

where:

    x_i^c = x_i − x̄, y_i^c = y_i − ȳ, u_i^c = u_i − ū, v_i^c = v_i − v̄,
    x̄ = (1/n) Σ_j x_j, ȳ = (1/n) Σ_j y_j, ū = (1/n) Σ_j u_j, v̄ = (1/n) Σ_j v_j.
Hence, for the second adjustment step we get, after substitution of (5.18), the reduced observation equations

    (x_i^c, y_i^c)ᵀ = λ R(θ) (û_i^c(λ,θ), v̂_i^c(λ,θ))ᵀ + (e_{x_i}, e_{y_i})ᵀ,
    (u_i^c, v_i^c)ᵀ = (û_i^c(λ,θ), v̂_i^c(λ,θ))ᵀ + (e_{u_i}, e_{v_i})ᵀ, with R(θ) = [cos θ  −sin θ; sin θ  cos θ],   (5.19)

where e are the residuals.
The sum of the squared residuals reads then:

    E(λ,θ) = (1 + λ²)⁻¹ Σᵢ ‖ (x_i^c, y_i^c)ᵀ − λ R(θ) (u_i^c, v_i^c)ᵀ ‖².   (5.20)

And this function needs to be minimized in order to find the least-squares estimates λ̂ and θ̂. The function is of course still non-linear (and non-quadratic) in λ and θ. However, observe that if we fix λ = 1, the model underlying the function of (5.20) equals, apart from the fact that we are dealing here with coordinates referring to the centres of gravity, the Helmert transformation (3.30) admitting only a rotation. Hence, for an arbitrarily fixed value of λ the minimum θ̂(λ) of (5.20) follows readily from (3.36) as

    θ̂(λ) = θ̂ = tan⁻¹ [ Σᵢ (u_i^c y_i^c − v_i^c x_i^c) / Σᵢ (u_i^c x_i^c + v_i^c y_i^c) ].   (5.21)

Note that not too surprisingly the estimated rotation angle is invariant to scale changes.


From substituting (5.21) into (5.20) we find

    f(λ) = (1 + λ²)⁻¹ Σᵢ ‖ (x_i^c, y_i^c)ᵀ − λ R(θ̂) (u_i^c, v_i^c)ᵀ ‖²,   (5.22)

which needs to be minimized in order to find λ̂.


With the reparametrization

  \lambda = \tan\phi ,  0 \le \phi < \tfrac{1}{2}\pi ,   (5.23)

we can write (5.22) also as

  f(\phi) = ( \cos\phi\, e_1 - \sin\phi\, e_2 ,\; \cos\phi\, e_1 - \sin\phi\, e_2 ) ,   (5.24)

where e_1 and e_2 are the 2n-vectors with components ( x_i^c , y_i^c ) and ( \bar x_i^c \cos\hat\theta + \bar y_i^c \sin\hat\theta ,\; -\bar x_i^c \sin\hat\theta + \bar y_i^c \cos\hat\theta ) respectively.
Observe that the function f(\phi) describes the distance from the origin to an ellipse lying in the plane spanned by the vectors e_1 and e_2. Hence, to minimize f(\phi) we need to find that point on the ellipse

  y(\phi) = \cos\phi\, e_1 - \sin\phi\, e_2

which is closest to the origin. This minimization problem results then in the following eigenvalue problem

  \begin{pmatrix} (e_1,e_1) & (e_1,e_2) \\ (e_1,e_2) & (e_2,e_2) \end{pmatrix} \begin{pmatrix} \cos\phi \\ -\sin\phi \end{pmatrix} = \mu \begin{pmatrix} \cos\phi \\ -\sin\phi \end{pmatrix} .   (5.25)

And the minimum of f(\phi) equals the smallest eigenvalue \mu_{min} of (5.25). The eigenvalues of (5.25) follow from

  \left| \begin{pmatrix} (e_1,e_1)-\mu & (e_1,e_2) \\ (e_1,e_2) & (e_2,e_2)-\mu \end{pmatrix} \right| = 0 .

Hence,

  \mu = \tfrac{1}{2}\{ (e_1,e_1)+(e_2,e_2) \mp [ ( (e_1,e_1)+(e_2,e_2) )^2 - 4( (e_1,e_1)(e_2,e_2) - ((e_1,e_2))^2 ) ]^{1/2} \} .   (5.26)

Substitution of \mu = \mu_{min} into (5.25) gives

  \tan\hat\phi = \{ (e_1,e_1)-(e_2,e_2) + [ ( (e_1,e_1)-(e_2,e_2) )^2 + 4((e_1,e_2))^2 ]^{1/2} \} / ( 2(e_1,e_2) ) ,

or with (5.23) and (e_1,e_2) = sgn((e_1,e_2))\,|(e_1,e_2)| ,

  \hat\lambda = sgn((e_1,e_2)) \{ (e_1,e_1)-(e_2,e_2) + [ ( (e_1,e_1)-(e_2,e_2) )^2 + 4((e_1,e_2))^2 ]^{1/2} \} / ( 2\,|(e_1,e_2)| ) .   (5.24')
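As a numerical sanity check on the eigenvalue solution just reconstructed, the following NumPy sketch (the vectors e1, e2 are arbitrary stand-ins) compares \mu_{min} of (5.26) and \tan\hat\phi of (5.24') with a brute-force scan of f(\phi):

```python
import numpy as np

rng = np.random.default_rng(2)
e1, e2 = rng.normal(size=4), rng.normal(size=4)   # two arbitrary vectors spanning a plane

g11, g22, g12 = e1 @ e1, e2 @ e2, e1 @ e2         # Gram entries (e1,e1), (e2,e2), (e1,e2)

# Smallest eigenvalue of the 2x2 Gram matrix (cf. (5.26)) ...
disc = np.sqrt((g11 - g22) ** 2 + 4 * g12 ** 2)
mu_min = 0.5 * (g11 + g22 - disc)

# ... and the minimizing angle (cf. (5.24'))
phi_hat = np.arctan((g11 - g22 + disc) / (2 * g12))

def f(phi):
    """f(phi) = ||cos(phi) e1 - sin(phi) e2||^2"""
    return (np.cos(phi) ** 2 * g11 - 2 * np.sin(phi) * np.cos(phi) * g12
            + np.sin(phi) ** 2 * g22)

phis = np.linspace(-0.5 * np.pi, 0.5 * np.pi, 200001)
assert abs(f(phis).min() - mu_min) < 1e-6         # brute-force minimum matches mu_min
assert abs(f(phi_hat) - mu_min) < 1e-12           # the closed-form angle attains it
```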

With (5.24'), (5.21) and (5.18) the least-squares solution of the two dimensional Symmetric Helmert transformation (5.17) finally becomes:

  \hat t_x = x_c - \bar x_c \hat\lambda \cos\hat\theta - \bar y_c \hat\lambda \sin\hat\theta ,
  \hat t_y = y_c + \bar x_c \hat\lambda \sin\hat\theta - \bar y_c \hat\lambda \cos\hat\theta ,
  \hat u_i = \bar x_c + (1+\hat\lambda^2)^{-1}( \bar x_i^c + x_i^c \hat\lambda \cos\hat\theta - y_i^c \hat\lambda \sin\hat\theta ) ,
  \hat v_i = \bar y_c + (1+\hat\lambda^2)^{-1}( \bar y_i^c + x_i^c \hat\lambda \sin\hat\theta + y_i^c \hat\lambda \cos\hat\theta ) ,   (5.27)

with \hat\theta from (5.21) and \hat\lambda from (5.24').
Note that the reciprocal scale parameter reads as

  \hat\lambda^{-1} = \{ \sum_{i=1}^{n}( (\bar x_i^c)^2+(\bar y_i^c)^2 ) - \sum_{i=1}^{n}( (x_i^c)^2+(y_i^c)^2 )
      + [ ( \sum_{i=1}^{n}( (x_i^c)^2+(y_i^c)^2 ) - \sum_{i=1}^{n}( (\bar x_i^c)^2+(\bar y_i^c)^2 ) )^2
      + 4( ( \sum_{i=1}^{n}( \bar x_i^c x_i^c + \bar y_i^c y_i^c ) )^2 + ( \sum_{i=1}^{n}( \bar y_i^c x_i^c - \bar x_i^c y_i^c ) )^2 ) ]^{1/2} \}
      / ( 2\, sgn((e_1,e_2)) [ ( \sum_{i=1}^{n}( \bar x_i^c x_i^c + \bar y_i^c y_i^c ) )^2 + ( \sum_{i=1}^{n}( \bar y_i^c x_i^c - \bar x_i^c y_i^c ) )^2 ]^{1/2} ) ,

which demonstrates the symmetry in our least-squares solution of the scale parameter: interchanging the roles of the two coordinate sets simply replaces \hat\lambda by \hat\lambda^{-1}. This in contrast to solution (5.12) of the classical Helmert transformation.
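Assuming the formulas as reconstructed above (rotation convention R = [[cos, sin], [-sin, cos]]), the complete estimator can be sketched in a few lines of NumPy; on noise-free simulated data it should reproduce the true parameters exactly. This is a sketch of the closed-form solution, not the book's original code:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
u, v = rng.normal(size=n), rng.normal(size=n)      # second ("bar") coordinate set
lam, th, tx, ty = 1.3, 0.4, 5.0, -2.0              # true scale, rotation, translation
x = tx + lam * ( u * np.cos(th) + v * np.sin(th))  # first set, noise-free
y = ty + lam * (-u * np.sin(th) + v * np.cos(th))

xc, yc, uc, vc = (a - a.mean() for a in (x, y, u, v))   # centre-of-gravity reduction

# Rotation estimate (5.21); it is invariant to the scale parameter
P, Q = (uc * xc + vc * yc).sum(), (vc * xc - uc * yc).sum()
th_hat = np.arctan2(Q, P)

# Scale estimate (5.24'): smallest-eigenvalue solution of the 2x2 Gram problem
e11 = (xc ** 2 + yc ** 2).sum()
e22 = (uc ** 2 + vc ** 2).sum()
e12 = np.hypot(P, Q)                                # (e1,e2), positive by choice of th_hat
lam_hat = (e11 - e22 + np.hypot(e11 - e22, 2 * e12)) / (2 * e12)

# Translations (5.27)
tx_hat = x.mean() - lam_hat * (u.mean() * np.cos(th_hat) + v.mean() * np.sin(th_hat))
ty_hat = y.mean() + lam_hat * (u.mean() * np.sin(th_hat) - v.mean() * np.cos(th_hat))

assert np.allclose([lam_hat, th_hat, tx_hat, ty_hat], [lam, th, tx, ty])
```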

5.4. The two dimensional Symmetric Helmert transformation with a rotational invariant covariance structure

Up till now we assumed the simplest structure possible for the covariance matrices of the observed cartesian coordinates. In many practical applications this assumption will do, but it will not be sufficient for all applications. Unfortunately one can not expect to find a solution like (5.27) if the observed coordinates are allowed to have an arbitrary covariance matrix. One of the reasons that the derivation of (5.27) went so smoothly is namely that the covariance matrices used for the two sets of coordinates are scaled versions of each other and are invariant to rotations. This indicates, however, that if we assume the covariance matrices Q and \bar Q of the two coordinate sets ( \ldots, x_i, y_i, \ldots )^t and ( \ldots, \bar x_i, \bar y_i, \ldots )^t to be of such a structure that

  k^2 Q = \bar Q  for some k \in \mathbb{R}^+ ,   (5.28)

and

  R^t Q R = Q ,   (5.29)

where R is a 2n \times 2n block diagonal matrix with equal 2 \times 2 blocks

  \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} ,

one should be able to generalize (5.27) accordingly.


Note that it follows from (5.28) and (5.29) that Q^{-1} consists of 2 \times 2 diagonal blocks of the type d^{ij} I_2, i.e. scalar multiples of the 2 \times 2 unit matrix. Hence, the Baarda-Alberda criterium matrix (see e.g. Baarda, 1973 or Teunissen, 1984b) is a proper candidate for Q.
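A quick NumPy illustration of the invariance demanded by (5.29): a covariance matrix built from scalar 2 x 2 blocks commutes with every block-diagonal rotation, and so does its inverse. The matrix K of block scalars below is an arbitrary stand-in:

```python
import numpy as np

rng = np.random.default_rng(1)
n, theta = 4, 0.7
K = rng.normal(size=(n, n)); K = K @ K.T + n * np.eye(n)   # SPD matrix of scalars d_ij

Q = np.kron(K, np.eye(2))                    # 2x2 blocks d_ij * I_2
R2 = np.array([[ np.cos(theta), np.sin(theta)],
               [-np.sin(theta), np.cos(theta)]])
R = np.kron(np.eye(n), R2)                   # block-diagonal rotation as in (5.29)

# R^t Q R = Q for every rotation angle ...
assert np.allclose(R.T @ Q @ R, Q)
# ... and the inverse keeps the same block type
assert np.allclose(R.T @ np.linalg.inv(Q) @ R, np.linalg.inv(Q))
```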
To solve for the Symmetric Helmert transformation with the new covariance structure (5.28), (5.29) we apply the same two-step procedure as used before.
For fixed \lambda and \theta we get then as solution of the first step:

  t_x(\lambda,\theta) = x_c - \lambda( \bar x_c \cos\theta + \bar y_c \sin\theta ) ,
  t_y(\lambda,\theta) = y_c + \lambda( \bar x_c \sin\theta - \bar y_c \cos\theta ) ,
  \hat u_i(\lambda,\theta) = \bar x_c + (1+(k\lambda)^2)^{-1}( \bar x_i^c + k^2\lambda( x_i^c \cos\theta - y_i^c \sin\theta ) ) ,
  \hat v_i(\lambda,\theta) = \bar y_c + (1+(k\lambda)^2)^{-1}( \bar y_i^c + k^2\lambda( x_i^c \sin\theta + y_i^c \cos\theta ) ) ,   (5.31)

where the centre-of-gravity coordinates are now the weighted means

  x_c = ( \sum_{i,j=1}^{n} d^{ij} )^{-1} \sum_{i,j=1}^{n} d^{ij} x_j ,  x_i^c = x_i - x_c ,

and similarly for y_c, \bar x_c and \bar y_c.

From this follows that we get for the second adjustment step:

  \begin{pmatrix} e_{x_i} \\ e_{y_i} \end{pmatrix} = (1+(k\lambda)^2)^{-1} \left[ \begin{pmatrix} x_i^c \\ y_i^c \end{pmatrix} - \lambda \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} \bar x_i^c \\ \bar y_i^c \end{pmatrix} \right] ,

  \begin{pmatrix} \bar e_{x_i} \\ \bar e_{y_i} \end{pmatrix} = k^2\lambda\,(1+(k\lambda)^2)^{-1} \left[ \lambda \begin{pmatrix} \bar x_i^c \\ \bar y_i^c \end{pmatrix} - \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x_i^c \\ y_i^c \end{pmatrix} \right] ,   (5.32)

where e are the residuals.
Hence, the weighted sum of the squared residuals reads then

  f(\lambda,\theta) = k^2 \sum_{i,j=1}^{n} d^{ij}( e_{x_i} e_{x_j} + e_{y_i} e_{y_j} ) + \sum_{i,j=1}^{n} d^{ij}( \bar e_{x_i}\bar e_{x_j} + \bar e_{y_i}\bar e_{y_j} ) ,   (5.33)

where d^{ij} is the scalar of the corresponding 2 \times 2 block of Q^{-1}, and this function needs to be minimized in order to find \hat\lambda and \hat\theta.

With the reparametrization

  k\lambda = \tan\phi ,  0 \le \phi < \tfrac{1}{2}\pi ,   (5.34)

we can rewrite (5.33) as

  f(\phi,\theta) = \begin{pmatrix} \sin\phi\cos\theta & \sin\phi\sin\theta & \cos\phi \end{pmatrix}
  \begin{pmatrix}
  ( \bar x_i^c d^{ij} \bar x_j^c + \bar y_i^c d^{ij} \bar y_j^c ) & 0 & -k( \bar x_i^c d^{ij} x_j^c + \bar y_i^c d^{ij} y_j^c ) \\
  0 & ( \bar x_i^c d^{ij} \bar x_j^c + \bar y_i^c d^{ij} \bar y_j^c ) & -k( \bar y_i^c d^{ij} x_j^c - \bar x_i^c d^{ij} y_j^c ) \\
  -k( \bar x_i^c d^{ij} x_j^c + \bar y_i^c d^{ij} y_j^c ) & -k( \bar y_i^c d^{ij} x_j^c - \bar x_i^c d^{ij} y_j^c ) & k^2( x_i^c d^{ij} x_j^c + y_i^c d^{ij} y_j^c )
  \end{pmatrix}
  \begin{pmatrix} \sin\phi\cos\theta \\ \sin\phi\sin\theta \\ \cos\phi \end{pmatrix} ,   (5.35)

where we have used Einstein's summation convention.


The least-squares problem (5.35) results in the following eigenvalue problem:

  \begin{pmatrix}
  ( \bar x_i^c d^{ij} \bar x_j^c + \bar y_i^c d^{ij} \bar y_j^c ) & 0 & -k( \bar x_i^c d^{ij} x_j^c + \bar y_i^c d^{ij} y_j^c ) \\
  0 & ( \bar x_i^c d^{ij} \bar x_j^c + \bar y_i^c d^{ij} \bar y_j^c ) & -k( \bar y_i^c d^{ij} x_j^c - \bar x_i^c d^{ij} y_j^c ) \\
  -k( \bar x_i^c d^{ij} x_j^c + \bar y_i^c d^{ij} y_j^c ) & -k( \bar y_i^c d^{ij} x_j^c - \bar x_i^c d^{ij} y_j^c ) & k^2( x_i^c d^{ij} x_j^c + y_i^c d^{ij} y_j^c )
  \end{pmatrix}
  \begin{pmatrix} \sin\phi\cos\theta \\ \sin\phi\sin\theta \\ \cos\phi \end{pmatrix} = \mu \begin{pmatrix} \sin\phi\cos\theta \\ \sin\phi\sin\theta \\ \cos\phi \end{pmatrix} .   (5.36)

And the minimum of (5.35) equals the smallest eigenvalue \mu_{min} of (5.36).
The smallest eigenvalue reads:

  \mu_{min} = \tfrac{1}{2}\{ ( \bar x_i^c d^{ij}\bar x_j^c + \bar y_i^c d^{ij}\bar y_j^c ) + k^2( x_i^c d^{ij} x_j^c + y_i^c d^{ij} y_j^c )
      - [ ( ( \bar x_i^c d^{ij}\bar x_j^c + \bar y_i^c d^{ij}\bar y_j^c ) - k^2( x_i^c d^{ij} x_j^c + y_i^c d^{ij} y_j^c ) )^2
      + 4k^2( ( \bar x_i^c d^{ij} x_j^c + \bar y_i^c d^{ij} y_j^c )^2 + ( \bar y_i^c d^{ij} x_j^c - \bar x_i^c d^{ij} y_j^c )^2 ) ]^{1/2} \} .   (5.37)

From the first two equations of (5.36) we find that

  \tan\hat\theta = ( \bar y_i^c d^{ij} x_j^c - \bar x_i^c d^{ij} y_j^c ) / ( \bar x_i^c d^{ij} x_j^c + \bar y_i^c d^{ij} y_j^c ) ,   (5.38)

and from the third equation we get

  k\hat\lambda = \tan\hat\phi = ( k^2( x_i^c d^{ij} x_j^c + y_i^c d^{ij} y_j^c ) - \mu_{min} )
      / ( k( ( \bar x_i^c d^{ij} x_j^c + \bar y_i^c d^{ij} y_j^c )\cos\hat\theta + ( \bar y_i^c d^{ij} x_j^c - \bar x_i^c d^{ij} y_j^c )\sin\hat\theta ) ) .

Together with (5.37) and (5.38) this finally gives

  k\hat\lambda = \{ k^2( x_i^c d^{ij} x_j^c + y_i^c d^{ij} y_j^c ) - ( \bar x_i^c d^{ij}\bar x_j^c + \bar y_i^c d^{ij}\bar y_j^c )
      + [ ( k^2( x_i^c d^{ij} x_j^c + y_i^c d^{ij} y_j^c ) - ( \bar x_i^c d^{ij}\bar x_j^c + \bar y_i^c d^{ij}\bar y_j^c ) )^2
      + 4k^2( ( \bar x_i^c d^{ij} x_j^c + \bar y_i^c d^{ij} y_j^c )^2 + ( \bar y_i^c d^{ij} x_j^c - \bar x_i^c d^{ij} y_j^c )^2 ) ]^{1/2} \}
      / ( 2k [ ( \bar x_i^c d^{ij} x_j^c + \bar y_i^c d^{ij} y_j^c )^2 + ( \bar y_i^c d^{ij} x_j^c - \bar x_i^c d^{ij} y_j^c )^2 ]^{1/2} ) .   (5.39)

The adjusted coordinates and translation parameters can be found by substituting (5.38) and (5.39) into (5.31).

5.5. The three dimensional Helmert transformation and its symmetrical generalization

Now that we have found the solution to the two dimensional Helmert transformation and its non-linear generalization, it is natural to try to generalize these results to three dimensions.
We will first consider the classical three dimensional Helmert transformation. The model for the three dimensional Helmert transformation reads:

  \begin{pmatrix} x_i \\ y_i \\ z_i \end{pmatrix} = \lambda\, R_3(\gamma) R_2(\beta) R_1(\alpha) \begin{pmatrix} u_i \\ v_i \\ w_i \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \\ t_z \end{pmatrix} + \begin{pmatrix} e_{x_i} \\ e_{y_i} \\ e_{z_i} \end{pmatrix} ,   (5.40)

where:

  i = 1, \ldots, n = number of network points,
  x_i, y_i, z_i are the "observed" coordinates of the network points in the first coordinate system,
  u_i, v_i, w_i are the fixed given coordinates in the second coordinate system,
  \lambda, (\alpha,\beta,\gamma) and (t_x,t_y,t_z) are respectively the scale, orientation and translation parameters,
  e_{x_i}, e_{y_i}, e_{z_i} are the errors, and

  R_1(\alpha) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & \sin\alpha \\ 0 & -\sin\alpha & \cos\alpha \end{pmatrix} ,
  R_2(\beta) = \begin{pmatrix} \cos\beta & 0 & -\sin\beta \\ 0 & 1 & 0 \\ \sin\beta & 0 & \cos\beta \end{pmatrix} ,
  R_3(\gamma) = \begin{pmatrix} \cos\gamma & \sin\gamma & 0 \\ -\sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix} .

In contrast to the two dimensional case, the submanifold of the three dimensional Helmert transformation is curved. This complicates matters considerably. However, a number of simplifications can be obtained if we again apply the appropriate two-step procedure. In the first step we therefore assume the orientation parameters \alpha, \beta and \gamma to be fixed, and solve for the scale \lambda(\alpha,\beta,\gamma) and translation parameters t_x(\alpha,\beta,\gamma), t_y(\alpha,\beta,\gamma), t_z(\alpha,\beta,\gamma). Since the first step consists of a linear adjustment problem, it is relatively easy to solve. The second adjustment step, where we have to solve for the orientation parameters, is however still non-linear. We will solve this second adjustment step by making use of the alternative formulation as discussed in example 1 of section 3.6.
To apply the alternative formulation which makes use of the trace operator, we take the abbreviations

  X_{n\times3} = ( \ldots, (x_i,y_i,z_i)^t, \ldots )^t ,  U_{n\times3} = ( \ldots, (u_i,v_i,w_i)^t, \ldots )^t ,
  H_{n\times1} = ( 1, \ldots, 1 )^t ,  t = ( t_x, t_y, t_z )^t ,  R = R_3(\gamma) R_2(\beta) R_1(\alpha) ,   (5.41)

and write (5.40) as

  X = \lambda U R^t + H t^t + E .   (5.42)

The first step of our adjustment problem reads then

  \min_{\lambda,t} f(\lambda,t) = \min_{\lambda,t} trace[ ( X - \lambda U R^t - H t^t )^t ( X - \lambda U R^t - H t^t ) ] .   (5.43)

To find the critical point of the function f(\lambda,t) the following results on matrix differentiation will be used. If \partial/\partial L = ( \partial/\partial L_{ij} ), then

  (a)  \partial\, trace( K L M ) / \partial L = K^t M^t ,
  (b)  \partial\, trace( L K L^t M ) / \partial L = 2 M L K , when K and M symmetric,
  (c)  \partial\, trace( L^t K L M ) / \partial L = 2 K L M , when K and M symmetric.   (5.44)

The proofs of these relations are straightforward, and we illustrate the method by proving (5.44.a). Let

  trace( K L M ) = K_{ij} L_{jk} M_{ki} ,

where Einstein's summation convention is understood. Then

  \partial\, trace( K L M ) / \partial L_{mn} = K_{im} M_{ni} = (K^t)_{mi} (M^t)_{in} ,

or

  \partial\, trace( K L M ) / \partial L = K^t M^t .
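Rule (5.44.a) is easy to verify numerically; the following minimal sketch compares the analytic gradient with central finite differences for arbitrary stand-in matrices:

```python
import numpy as np

rng = np.random.default_rng(3)
K, L, M = (rng.normal(size=(3, 3)) for _ in range(3))

# Analytic gradient of trace(K L M) with respect to L, rule (5.44.a)
grad = K.T @ M.T

# Central finite-difference check of every component
eps = 1e-6
num = np.zeros_like(L)
for i in range(3):
    for j in range(3):
        dL = np.zeros_like(L); dL[i, j] = eps
        num[i, j] = (np.trace(K @ (L + dL) @ M)
                     - np.trace(K @ (L - dL) @ M)) / (2 * eps)

assert np.allclose(num, grad, atol=1e-6)
```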

With the aid of (5.44) it follows from (5.43) that

  (a)  \partial f/\partial t = -2 ( X - \lambda U R^t )^t H + 2 n t = 0 ,
  (b)  \partial f/\partial\lambda = 2\lambda\, trace( U^t U ) - 2\, trace( ( X - H t^t )^t ( U R^t ) ) = 0 .   (5.45)

From (5.45.a) follows that

  t = n^{-1} ( X - \lambda U R^t )^t H .   (5.46)

Substitution of (5.46) into (5.45.b) gives

  \lambda\, trace( U^t ( I - n^{-1} H H^t ) U ) - trace( X^t ( I - n^{-1} H H^t ) U R^t ) = 0 .   (5.47)

Note that ( I - n^{-1} H H^t ) is a projector, i.e. ( I - n^{-1} H H^t )( I - n^{-1} H H^t ) = ( I - n^{-1} H H^t ). With the abbreviations

  U^c_{n\times3} = ( I - n^{-1} H H^t ) U  and  X^c_{n\times3} = ( I - n^{-1} H H^t ) X ,   (5.48)

it therefore follows from (5.47) that

  \hat\lambda = trace( (X^c)^t (U^c) R^t ) / trace( (U^c)^t (U^c) ) .   (5.49)
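The centring projector of (5.47)-(5.48) behaves as claimed; a few lines of NumPy make both the idempotence and the centre-of-gravity reduction explicit:

```python
import numpy as np

n = 5
H = np.ones((n, 1))
P = np.eye(n) - H @ H.T / n          # the projector (I - n^{-1} H H^t) of (5.47)

X = np.arange(15, dtype=float).reshape(n, 3)
Xc = P @ X                           # X^c of (5.48): coordinates reduced to the centre of gravity

assert np.allclose(P @ P, P)             # idempotent, hence a projector
assert np.allclose(Xc.mean(axis=0), 0)   # column means (centre of gravity) removed
```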

Formula (5.46) together with (5.49) constitute the solution of the first step. To formulate the second adjustment step, we substitute (5.46) and (5.49) into

  trace( ( X - \lambda U R^t - H t^t )^t ( X - \lambda U R^t - H t^t ) ) .

This gives for the second adjustment step:

  \min_{\alpha,\beta,\gamma} [ trace( (X^c)^t (X^c) ) - trace^2( (X^c)^t (U^c) R^t ) / trace( (U^c)^t (U^c) ) ]  subject to  R = R_3(\gamma) R_2(\beta) R_1(\alpha) .   (5.50)

Since we know that the scale parameter \lambda must be positive and that trace( (U^c)^t (U^c) ) is positive, it follows from (5.49) that trace( (X^c)^t (U^c) R^t ) must be positive. We can therefore rephrase our second adjustment step as

  \max_{\alpha,\beta,\gamma} trace( (X^c)^t (U^c) R^t )  subject to  R = R_3(\gamma) R_2(\beta) R_1(\alpha) .   (5.51)

To find the solution R of (5.51) we apply the singular value decomposition theorem (see e.g. Teunissen, 1984a) to the matrix (X^c)^t (U^c). This theorem says that the matrix (X^c)^t (U^c) may be factorized in the form

  (X^c)^t (U^c) = V_1 D V_2^t ,   (5.52)

where V_1 and V_2 are orthogonal matrices of order 3 \times 3, and D is a diagonal matrix of the form

  D = diag( d_1, d_2, d_3 ) ,

where d_i, i = 1,2,3, are the singular values of (X^c)^t (U^c), which may be ordered so that d_1 \ge d_2 \ge d_3 \ge 0.
From (5.52) follows that (U^c)^t (X^c) (X^c)^t (U^c) = V_2 D^2 V_2^t. Hence, the columns of V_2 give an orthonormal set of eigenvectors of the symmetric matrix (U^c)^t (X^c) (X^c)^t (U^c), and the d_i^2 are the corresponding eigenvalues.

Substitution of (5.52) into (5.51) gives

  \max_{\alpha,\beta,\gamma} trace( V_1 D V_2^t R^t )  subject to  R = R_3(\gamma) R_2(\beta) R_1(\alpha) .   (5.53)

Since for arbitrary matrices A and B, trace( A B ) = trace( B A ), we can rewrite (5.53) as

  \max_{\alpha,\beta,\gamma} trace( V_2^t R^t V_1 D )  subject to  R = R_3(\gamma) R_2(\beta) R_1(\alpha) .   (5.54)

If we denote the diagonal elements of the matrix V_2^t R^t V_1 by a_i, i = 1,2,3, it follows that

  trace( V_2^t R^t V_1 D ) = \sum_{i=1}^{3} a_i d_i .   (5.55)

Let us now first assume that all three singular values are non-zero. Then, since the singular values d_i are positive and the matrices in the triple product V_2^t R^t V_1 are orthogonal, it follows that (5.55) is maximal if a_i = 1, i = 1,2,3.

This implies then that (5.55) is maximal if and only if V_2^t R^t V_1 = I. Hence, our solution becomes

  \hat R = V_1 V_2^t ,   (5.56)

or with (5.52)

  \hat R = (X^c)^t (U^c)\, V_2 D^{-1} V_2^t .   (5.57)

Thus in case of non-zero singular values d_i, i = 1,2,3, the orthogonal matrix R can be found from the eigenvectors and corresponding eigenvalues of the symmetric matrix (U^c)^t (X^c) (X^c)^t (U^c). Since this matrix is of order 3 \times 3 its characteristic equation is a cubic,

  \mu^3 + a\mu^2 + b\mu + c = 0 .

Substitution of \mu = \nu - \tfrac{1}{3} a gives

  \nu^3 + p\nu + q = 0 ,   (5.58)

where

  p = b - \tfrac{1}{3} a^2  and  q = c + \tfrac{2}{27} a^3 - \tfrac{1}{3} a b .

According to the Cardanian formula (see e.g. Griffiths, 1947) the three roots of (5.58) are:

  \nu_1 = A + B ,  \nu_2 = \omega A + \omega^2 B ,  \nu_3 = \omega^2 A + \omega B ,   (5.59)

with

  A = ( -\tfrac{1}{2} q + ( \tfrac{1}{4} q^2 + \tfrac{1}{27} p^3 )^{1/2} )^{1/3} ,  B = ( -\tfrac{1}{2} q - ( \tfrac{1}{4} q^2 + \tfrac{1}{27} p^3 )^{1/2} )^{1/3} ,

where \omega = \cos\tfrac{2}{3}\pi + i \sin\tfrac{2}{3}\pi and i^2 = -1.
Thus with (5.58) and (5.59) one can compute the eigenvalues of the symmetric matrix (U^c)^t (X^c) (X^c)^t (U^c). Once the eigenvalues are known it becomes straightforward to compute the corresponding eigenvectors.
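The Cardano route of (5.58)-(5.59) can be checked numerically. The sketch below builds the characteristic cubic of a random symmetric 3 x 3 matrix and compares the Cardano roots with numpy's eigenvalues; the choice B = -p/(3A) is an implementation detail of mine, made to keep the two cube roots on consistent branches:

```python
import numpy as np

rng = np.random.default_rng(4)
G = rng.normal(size=(3, 3)); S = G @ G.T        # symmetric PSD stand-in matrix

# Characteristic cubic  mu^3 + a mu^2 + b mu + c = 0
a = -np.trace(S)
b = 0.5 * (np.trace(S) ** 2 - np.trace(S @ S))
c = -np.linalg.det(S)

# Reduced cubic nu^3 + p nu + q = 0 via mu = nu - a/3, cf. (5.58)
p = b - a ** 2 / 3
q = c + 2 * a ** 3 / 27 - a * b / 3

# Cardano's formula (5.59) with complex arithmetic; three real roots here
w = np.exp(2j * np.pi / 3)
disc = np.sqrt(complex(q ** 2 / 4 + p ** 3 / 27))
A = (-q / 2 + disc) ** (1 / 3)
B = -p / (3 * A)                                 # enforces A*B = -p/3
mus = np.sort([(A + B - a / 3).real,
               (w * A + w ** 2 * B - a / 3).real,
               (w ** 2 * A + w * B - a / 3).real])

assert np.allclose(mus, np.sort(np.linalg.eigvalsh(S)), atol=1e-8)
```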

Although the case of zero singular values will not occur very often in practice, let us now assume that one of the singular values, say d_j, equals zero. It follows then again that (5.55) is maximal if and only if R = V_1 V_2^t. With (5.52) we can therefore write

  \hat R = (X^c)^t (U^c)\, V_2 D^+ V_2^t + V_{1j} V_{2j}^t ,

where D^+ is the pseudo-inverse of D, and V_{1j} and V_{2j} are the j-th column vectors of V_1 and V_2 respectively.
Finally we consider the case of multiple zero singular values. The case d_1 = d_2 = d_3 = 0 is trivial, since then the orthogonal matrix R is indeterminate and may take any arbitrary form. In case only two of the singular values, say d_2 and d_3, equal zero, we find that (5.55) is maximized if R takes the form

  R = V_1 \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\varphi & \sin\varphi \\ 0 & -\sin\varphi & \cos\varphi \end{pmatrix} V_2^t ,

where \varphi is arbitrary. Thus in case the two singular values d_j and d_k equal zero we find with (5.52) that the orthogonal matrix R takes the form

  \hat R = (X^c)^t (U^c)\, V_2 D^+ V_2^t + ( V_{1j}\; V_{1k} ) \begin{pmatrix} \cos\varphi & \sin\varphi \\ -\sin\varphi & \cos\varphi \end{pmatrix} ( V_{2j}\; V_{2k} )^t ,

where \varphi is arbitrary.

In the geodetic literature a number of authors have studied the three dimensional Helmert transformation. The two most recent papers on the subject are (Sansò, 1973) and (Köchle, 1982). References to earlier papers can be found in (Schwidefsky and Ackermann, 1976).
Using the factorization of Cayley, (Köchle, 1982) arrives at an iterative solution for the orthogonal matrix R. (Sansò, 1973), on the other hand, formulates the solution for R through the use of quaternion algebra in terms of an eigenvalue problem of a symmetric 4 \times 4 matrix. His result is therefore to some extent comparable with our solution. Note, however, that our derivation is more general than Sansò's, since it does not require any restrictions on the number of columns in the matrices X and U in (5.42).
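The closed-form estimator of this section — \hat\lambda and \hat t from (5.49)/(5.46) and \hat R = V_1 V_2^t from the SVD of (X^c)^t(U^c) — can be sketched in a few lines of NumPy. On noise-free simulated data it should recover the true transformation exactly; this is a sketch under that assumption, not the book's own code:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 7
U = rng.normal(size=(n, 3))                  # fixed coordinates in the second system

# A proper rotation (via QR), scale and translation generate the "observations"
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R_true = Q if np.linalg.det(Q) > 0 else -Q
lam_true, t_true = 0.8, np.array([10.0, -3.0, 4.0])
X = lam_true * U @ R_true.T + t_true         # model (5.42), E = 0

# Centre-of-gravity reduction (5.48)
Xc, Uc = X - X.mean(axis=0), U - U.mean(axis=0)

# Rotation from the SVD of (X^c)^t (U^c), cf. (5.52) and (5.56)
V1, D, V2t = np.linalg.svd(Xc.T @ Uc)
R_hat = V1 @ V2t

# Scale (5.49) and translation (5.46)
lam_hat = np.trace(Xc.T @ Uc @ R_hat.T) / np.trace(Uc.T @ Uc)
t_hat = (X - lam_hat * U @ R_hat.T).mean(axis=0)

assert np.allclose(R_hat, R_true)
assert np.allclose([lam_hat], [lam_true]) and np.allclose(t_hat, t_true)
```

Note that for noisy data one may additionally wish to constrain det R̂ = +1, which the a_i = 1 argument of (5.55) presupposes.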

Now that we have found the solution of the three dimensional Helmert transformation (5.42), we will consider the three dimensional generalization of the Symmetric Helmert transformation (5.17). Using our alternative formulation the model can be written as

  X = \lambda U R^t + H t^t + E ,  \bar X = U + \bar E .

As in section 5.4 we assume that the covariance matrices Q and \bar Q of the two coordinate sets ( \ldots, x_i, y_i, z_i, \ldots )^t and ( \ldots, \bar x_i, \bar y_i, \bar z_i, \ldots )^t are of such a structure that Q^{-1} consists of 3 \times 3 diagonal blocks of the type d^{ij} I_3 and

  k^2 Q = \bar Q  for some k \in \mathbb{R}^+ .

Our adjustment problem becomes then

  \min_{u_i,v_i,w_i,\lambda,t_x,t_y,t_z,\alpha,\beta,\gamma} f( u_i,v_i,w_i,\lambda,t_x,t_y,t_z,\alpha,\beta,\gamma ) ,   (5.63)

with

  f = k^2\, trace[ ( X - \lambda U R^t - H t^t )^t G ( X - \lambda U R^t - H t^t ) ] + trace[ ( \bar X - U )^t G ( \bar X - U ) ] ,   (5.63')

and where the element of the n \times n symmetric matrix G on place ij is given by d^{ij}.
To solve (5.63) we will proceed in three steps. First we will fix the scale and orientation parameters \lambda, \alpha, \beta, \gamma:

  \min_{u_i,v_i,w_i,t_x,t_y,t_z} g( u_i,v_i,w_i,t_x,t_y,t_z ) .   (5.64)
With the aid of the matrix differentiation rules of (5.44) we find that the critical point of g should satisfy:

  (a)  \partial g/\partial U = 2( k^2\lambda^2 + 1 ) G U - 2\lambda k^2 G ( X - H t^t ) R - 2 G \bar X = 0 ,
  (b)  \partial g/\partial t = -2 ( X - \lambda U R^t )^t G H + 2 t H^t G H = 0 .   (5.65)

From (5.65.b) we find that

  t = ( H^t G H )^{-1} ( X - \lambda U R^t )^t G H .   (5.66)

Substitution of (5.66) into (5.65.a) gives

  ( 1 + (k\lambda)^2 ) U - (k\lambda)^2 H( H^t G H )^{-1} H^t G\, U = \bar X + k^2\lambda ( I - H( H^t G H )^{-1} H^t G ) X R .   (5.67)

Premultiplication with H( H^t G H )^{-1} H^t G shows that

  H( H^t G H )^{-1} H^t G ( U - \bar X ) = 0 .

Hence, we can write (5.67) also as

  ( 1 + (k\lambda)^2 )( U - H( H^t G H )^{-1} H^t G \bar X ) = ( I - H( H^t G H )^{-1} H^t G ) \bar X + k^2\lambda ( I - H( H^t G H )^{-1} H^t G ) X R .

With the abbreviations

  X^c = ( I - H( H^t G H )^{-1} H^t G ) X  and  \bar X^c = ( I - H( H^t G H )^{-1} H^t G ) \bar X ,

we thus find that

  \hat U = H( H^t G H )^{-1} H^t G \bar X + ( 1 + (k\lambda)^2 )^{-1} ( \bar X^c + k^2\lambda X^c R ) .   (5.68)

When we substitute (5.68) into (5.66) we find the translation vector as

  \hat t = ( H^t G H )^{-1} ( X - \lambda \hat U R^t )^t G H .   (5.69)
(5.68) and (5.69) constitute the solution o f our f i r s t adjustment step. Compare (5.68) and (5.69) w i t h
(5.31).
To commence with our second adjustment step we substitute (5.68) and (5.69) into (5.63') and find

  \min_{\lambda,R} \; k^2 ( 1 + (k\lambda)^2 )^{-1} trace[ ( X^c - \lambda \bar X^c R^t )^t G ( X^c - \lambda \bar X^c R^t ) ] .   (5.70)

In a similar way as (5.56) was derived, we find that for fixed scale the conditional minimum of (5.70) is obtained by

  \hat R = V_1 V_2^t ,   (5.71)

where the diagonal matrix D contains the singular values of (X^c)^t G (\bar X^c), and the column vectors of the orthogonal matrix V_2 are provided by the eigenvectors of the 3 \times 3 symmetric matrix (\bar X^c)^t G (X^c) (X^c)^t G (\bar X^c).

To find the least-squares estimate of \lambda, we substitute (5.71) into (5.70) and use the reparametrization

  k\lambda = \tan\phi ,  0 \le \phi < \tfrac{1}{2}\pi .   (5.72)

This gives

  f(\phi) = \begin{pmatrix} \cos\phi & -\sin\phi \end{pmatrix}
            \begin{pmatrix} k^2\, trace( (X^c)^t G (X^c) ) & k\, trace( (X^c)^t G (\bar X^c) \hat R^t ) \\
                            k\, trace( (X^c)^t G (\bar X^c) \hat R^t ) & trace( (\bar X^c)^t G (\bar X^c) ) \end{pmatrix}
            \begin{pmatrix} \cos\phi \\ -\sin\phi \end{pmatrix} .   (5.73)
The minimization of (5.73) leads then to the following eigenvalue problem

  \begin{pmatrix} k^2\, trace( (X^c)^t G (X^c) ) & k\, trace( (X^c)^t G (\bar X^c) \hat R^t ) \\
                  k\, trace( (X^c)^t G (\bar X^c) \hat R^t ) & trace( (\bar X^c)^t G (\bar X^c) ) \end{pmatrix}
  \begin{pmatrix} \cos\phi \\ -\sin\phi \end{pmatrix} = \mu \begin{pmatrix} \cos\phi \\ -\sin\phi \end{pmatrix} .   (5.74)

Since the minimum of (5.73) equals the smallest eigenvalue \mu_{min} of (5.74) it follows that

  k\hat\lambda = \tan\hat\phi = ( k^2\, trace( (X^c)^t G (X^c) ) - \mu_{min} ) / ( k\, trace( (X^c)^t G (\bar X^c) \hat R^t ) ) .   (5.75)

From (5.74) follows that

  \mu_{min} = \tfrac{1}{2}\{ k^2\, trace( (X^c)^t G (X^c) ) + trace( (\bar X^c)^t G (\bar X^c) )
      - [ ( k^2\, trace( (X^c)^t G (X^c) ) - trace( (\bar X^c)^t G (\bar X^c) ) )^2 + 4 k^2\, trace^2( (X^c)^t G (\bar X^c) \hat R^t ) ]^{1/2} \} .   (5.76)

Substitution of (5.76) into (5.75) then finally gives

  k\hat\lambda = \{ k^2\, trace( (X^c)^t G (X^c) ) - trace( (\bar X^c)^t G (\bar X^c) )
      + [ ( k^2\, trace( (X^c)^t G (X^c) ) - trace( (\bar X^c)^t G (\bar X^c) ) )^2 + 4 k^2\, trace^2( (X^c)^t G (\bar X^c) \hat R^t ) ]^{1/2} \}
      / ( 2 k\, trace( (X^c)^t G (\bar X^c) \hat R^t ) ) .   (5.77)

The least-squares estimates \hat t and \hat U are found from substituting (5.71) and (5.77) into (5.69) and (5.68) respectively.

5.6. The extrinsic curvatures estimated

In general, the problem of finding the curvature behaviour of submanifold \bar y(N) can only be solved through actual computation of the extrinsic curvatures k_N from the normal field B for a chosen tangent direction X and normal direction N. But, as will be clear from (4.30), the computation of the principal curvatures entails some extra expenses. It is therefore of some importance to have ways of finding realistic estimates for the extrinsic curvatures of \bar y(N). As we see it, there are three possibilities:

(i) Try to compute the extrinsic curvatures analytically. Those cases where this is possible will, however, be rare.
Let us take as an example the Symmetric Helmert transformation (5.17). For convenience we reparametrize (5.17) as

  x_i = a u_i + b v_i + t_x ,  y_i = -b u_i + a v_i + t_y ,  \bar x_i = u_i ,  \bar y_i = v_i ,   (5.78)

where a = \lambda\cos\theta and b = \lambda\sin\theta.
We assume that the observation equations y^I(x^\alpha), I = 1, \ldots, 4n, and the parameters x^\alpha, \alpha = 1, \ldots, 2n+4, of model (5.78) are ordered such that the design matrix \partial_\alpha y^I reads in partitioned form as

  \partial_\alpha y^I = \begin{pmatrix} A & B \\ C & D \end{pmatrix} ,   (5.79)

where:

matrix A is 2n \times 2n block-diagonal with equal 2 \times 2 blocks

  \begin{pmatrix} a & b \\ -b & a \end{pmatrix} ,

matrix B is 2n \times 4, consisting of the n stacked 2 \times 4 blocks

  \begin{pmatrix} u_i & v_i & 1 & 0 \\ v_i & -u_i & 0 & 1 \end{pmatrix} ,  i = 1, \ldots, n ,

  C = I_{2n}  and  D = 0_{2n\times4} .

We also assume that the observation space has the standard metric, i.e. g_{IJ} = \delta_{IJ}. It follows then that the induced metric g_{\alpha\beta} reads in partitioned form as

  g_{\alpha\beta} = \begin{pmatrix} (1+\lambda^2) I_{2n} & A^t B \\ B^t A & B^t B \end{pmatrix} ,   (5.80)

with A^t A = \lambda^2 I_{2n}.
Furthermore it follows from (5.78) that the non-zero second derivatives of y^I(x^\alpha) are given by:

  \partial^2 x_i / \partial u_i \partial a = 1 ,  \partial^2 x_i / \partial v_i \partial b = 1 ,  \partial^2 y_i / \partial u_i \partial b = -1 ,  \partial^2 y_i / \partial v_i \partial a = 1 ,   (5.81)

for i = 1, \ldots, n.
Hence for an arbitrary unit normal vector N, i.e. \partial_\alpha y^I \delta_{IJ} N^J = 0 and (N,N) = 1, the matrix ( B(\partial_\alpha,\partial_\beta), N )_M reads in partitioned form as

  ( B(\partial_\alpha,\partial_\beta), N )_M = \begin{pmatrix} 0 & F \\ F^t & 0 \end{pmatrix} ,   (5.82)

where F is 2n \times 4, consisting of the n stacked 2 \times 4 blocks

  \begin{pmatrix} N^{2i-1} & -N^{2i} & 0 & 0 \\ N^{2i} & N^{2i-1} & 0 & 0 \end{pmatrix} ,  i = 1, \ldots, n .

In order to determine the with the normal direction N corresponding principal curvatures, we need to solve the general eigenvalue problem

  | ( B(\partial_\alpha,\partial_\beta), N )_M - k_N\, g_{\alpha\beta} | = 0 .

With (5.80) and (5.82) this gives

  \left| \begin{pmatrix} -k_N (1+\lambda^2) I_{2n} & F - k_N A^t B \\ F^t - k_N B^t A & -k_N B^t B \end{pmatrix} \right| = 0 .   (5.83)

Now assume that k_N \neq 0. Then we can apply the following well-known result: if U is a regular square matrix, then

  \left| \begin{pmatrix} U & V \\ W & Z \end{pmatrix} \right| = |U|\; | Z - W U^{-1} V | .   (5.84)

This applied to (5.83) gives

  | -k_N B^t B + k_N^{-1} (1+\lambda^2)^{-1} ( F^t - k_N B^t A )( F - k_N A^t B ) | = 0 .   (5.85)

Since N is a normal vector it follows from (5.79) and (5.81) that

  \sum_{i=1}^{n}( N^{2i-1} u_i + N^{2i} v_i ) = 0 ,  \sum_{i=1}^{n}( N^{2i-1} v_i - N^{2i} u_i ) = 0 ,  \sum_{i=1}^{n} N^{2i-1} = 0 ,  \sum_{i=1}^{n} N^{2i} = 0 .

Hence F^t A^t B = B^t A F = 0. With A^t A = \lambda^2 I_{2n} this gives for (5.85)

  | F^t F - k_N^2\, B^t B | = 0 .   (5.86)

We can now apply the following variant of (5.84): if Z is a regular square matrix, then

  \left| \begin{pmatrix} U & V \\ W & Z \end{pmatrix} \right| = |Z|\; | U - V Z^{-1} W | .

This gives for (5.86)

  ( \sum_{I=1}^{2n}(N^I)^2 - k_N^2 ( \sum_{i=1}^{n}( u_i^2 + v_i^2 ) - n u_c^2 - n v_c^2 ) )^2 = 0 ,

with u_c = n^{-1}\sum_i u_i and v_c = n^{-1}\sum_i v_i. Thus the two non-zero principal curvatures of model (5.78) read

  k_{N_{1,2}} = \pm \left( \sum_{I=1}^{2n}(N^I)^2 \right)^{1/2} \left( \sum_{i=1}^{n}( (u_i^c)^2 + (v_i^c)^2 ) \right)^{-1/2} ,   (5.87)

where u_i^c = u_i - u_c and v_i^c = v_i - v_c.

(ii) Try to estimate the extrinsic curvatures with the help of the information which comes available during the iteration. Recall from section 3.6 that the numerical examples clearly betray the curvature involved.
In order to estimate the curvature during the iteration we need a manageable formula. Formula (4.37a) does therefore not apply, since it needs the least-squares solution in advance. The following formula, however, can be used:

  (5.88)

The proof of (5.88) goes similar to that of (4.37.a).
(iii)

Try t o o b t a i n rigorous bounds on t h e e x t r i n s i c curvatures. F r o m Gauss' decomposition f o r m u l a


(4.18) follows t h a t

L e t k denote the i n absolute value largest principal curvature f o r the normal direction

N =

/ 116 1

have then

M '

with

= y
S

where the m a t r i x norm

I I .I 1

-9.

According to the eigenvalue problem (4.30) we

is the spectral norm.

With (5.89) and the Cauchy-Schwarz inequality we obtain the following upperbound:

a8
gYY(R) = t r a c e ( g
(R)).

with

2
aaB

To estimate the spectral radius of

one can make use of the various exclusion

theorems known f r o m the literature. F o r instance, one of the simplest exclusion theorems is:
For all eigenvalues \mu of a matrix A_{\alpha\beta} one has

  |\mu| \le \max_{x \neq 0} \| A_{\alpha\beta}\, x^\beta \| / \| x^\beta \| ,   (5.92)

where \|\cdot\| is a chosen vector norm. For the max-norm \|x^\beta\| = \max_\beta |x^\beta| this becomes then

  |\mu| \le \max_\alpha \sum_\beta | A_{\alpha\beta} | ,   (5.93)

i.e. the largest absolute row sum of A_{\alpha\beta}.
For a diagonal dominant matrix one could take Gershgorin's theorem, which says that the union of all discs

  \{ \mu : | \mu - A_{\alpha\alpha} | \le \sum_{\beta=1,\,\beta\neq\alpha}^{n} | A_{\alpha\beta} | \}  (no summation over \alpha),   (5.94)

contains all eigenvalues of the n \times n matrix A_{\alpha\beta}.
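Both exclusion results can be illustrated in a few lines of NumPy; the matrix A below is an arbitrary diagonally dominant stand-in:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5
A = rng.normal(size=(n, n)) + np.diag(10.0 * np.arange(1, n + 1))  # diagonally dominant

centres = np.diag(A)
radii = np.abs(A).sum(axis=1) - np.abs(centres)   # off-diagonal absolute row sums

evals = np.linalg.eigvals(A)

# (5.94): every eigenvalue lies in at least one Gershgorin disc
for mu in evals:
    assert (np.abs(mu - centres) <= radii + 1e-9).any()

# (5.93): the spectral radius is bounded by the largest absolute row sum
assert np.abs(evals).max() <= np.abs(A).sum(axis=1).max() + 1e-9
```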
Instead of using exclusion theorems one could also try to compute the spectral radius of \partial^2_{\alpha\beta} y^i directly. This can turn out to be feasible especially when per observation equation only a few parameters are involved.
As an alternative to (5.91) we could make use of condition equations if they are available. Let y^p_*(y) = 0, p = 1, \ldots, (m-n), be the condition equations and let \dot y denote an arbitrary tangent field of the submanifold. Then

  ( y^p_*(y), \dot y )_M = 0 .

Hence,

  0 = D_X ( y^p_*(y), \dot y )_M = ( D_X\, y^p_*(y), \dot y )_M + ( y^p_*(y), D_X \dot y )_M ,

and with Gauss' decomposition formula (4.18) this gives

  ( B(X,\dot y), y^p_* )_M = - ( D_X\, y^p_*, \dot y )_M ,   (5.95)

since y^p_* is normal to the submanifold, so that the tangential part \nabla_X \dot y drops out of the inner product.
Now take X = \partial_\alpha and \dot y = \partial_\beta y in (5.95). Writing out the right-hand side in the first and second order derivatives of the condition functions, p, \sigma = 1, \ldots, (m-n), j, k, l = 1, \ldots, m, one obtains an expression (5.96) for the matrix of the second fundamental tensor corresponding to the normal direction N.
Expression (5.96) still looks horribly complicated. But we can simplify it somewhat by recalling the well known result that for two arbitrary matrices A^i{}_\alpha and B^\alpha{}_i, i = 1, \ldots, m, \alpha = 1, \ldots, n, the products A^i{}_\alpha B^\alpha{}_j and B^\alpha{}_i A^i{}_\beta have the same non-zero eigenvalues with the same multiplicities. Application of this result to (5.96) gives the simplified bound (5.97), since the eigenvalues of a projector equal one or zero.

5.7. Some two dimensional networks

(i) Recall that Gauss' method (4.5) has a local quadratic convergence behaviour in case the submanifold is totally geodesic. A typical example of a type of geodetic network for which this holds is a planar geodetic triangulation chain if the parameters are cartesian coordinates (see figure 33).
figure 33
(ii) In the previous subsection we observed that it may be worthwhile to try to compute the spectral radii of the matrices \partial^2_{\alpha\beta} y^i, i = 1, \ldots, m, directly if every observation equation only contains a few parameters. Fortunately, this is precisely the case in geodetic network theory. In case of a two dimensional trilateration network for instance only four parameters are involved in each observation.
By expanding the distance function l_{pq} connecting the network points P_p and P_q we get its 4 \times 4 Hessian with respect to the coordinates x_p, y_p, x_q, y_q. And it is not too difficult to verify that the maximum eigenvalue of this 4 \times 4 Hessian is given by

  2 / l_{pq} .   (5.98)
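The eigenvalue claim is easy to check numerically; the sketch below builds the 4 x 4 Hessian of the planar distance function by central differences (the point coordinates are arbitrary) and compares its largest eigenvalue with 2/l_pq:

```python
import numpy as np

def dist(z):
    """Planar distance; z = (x_p, y_p, x_q, y_q)."""
    return np.hypot(z[2] - z[0], z[3] - z[1])

z0 = np.array([1.0, 2.0, 4.0, 6.0])    # here l_pq = 5
eps = 1e-4

# Central-difference 4x4 Hessian of the distance function
H = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        ei, ej = np.eye(4)[i] * eps, np.eye(4)[j] * eps
        H[i, j] = (dist(z0 + ei + ej) - dist(z0 + ei - ej)
                   - dist(z0 - ei + ej) + dist(z0 - ei - ej)) / (4 * eps ** 2)

lmax = np.linalg.eigvalsh(H).max()
assert abs(lmax - 2 / dist(z0)) < 1e-4   # maximum eigenvalue equals 2 / l_pq
```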

Since in practice the observations are usually assumed to be uncorrelated with equal variance, we can write

  g_{ij} = diag( \ldots, \sigma^{-2}, \ldots ) .   (5.99)

Now if we also assume that all distances in the network are about the same, i.e. l_{pq} \doteq l, and that the variances of the estimated parameters do not differ too much, we get with (5.98) and (5.99) for (5.91) an upperbound on the extrinsic curvatures which is of the order of the reciprocal distance 2/l.
(iii)

As an example o f how t o apply (5.97) we take a t w o dimensional closed polygon i n which


every t w o neighbouring points are connected by one measured distance I and one measured
azimuth A. The t w o condition equations read then:

!?1 1.

i=l

cos A.

= 0,

and

f!

i=l

I.

sin^

= 0

I f we assume that the observations are uncorrelated and the variances satisfy

then

and

where t h e odd numbered residuals r e f e r t o the distance residuals and the even numbered
residuals r e f e r t o the azimuth residuals.
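The closure conditions (5.101) can be illustrated directly. In the sketch below the azimuths are, for simplicity, taken as angles from the x-axis rather than from north; the polygon vertices are arbitrary stand-ins:

```python
import numpy as np

rng = np.random.default_rng(7)
pts = rng.normal(size=(6, 2)) * 100        # vertices of a closed traverse
d = np.roll(pts, -1, axis=0) - pts         # edge vectors; the last edge closes the loop

l = np.hypot(d[:, 0], d[:, 1])             # measured distances l_i
A = np.arctan2(d[:, 1], d[:, 0])           # "azimuths" A_i (angles from the x-axis)

# (5.101): the weighted direction sums vanish for a closed polygon
assert abs((l * np.cos(A)).sum()) < 1e-9
assert abs((l * np.sin(A)).sum()) < 1e-9
```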
Furthermore it follows that the two 2n \times 2n matrices of weighted second derivatives of the condition functions, (5.104) and (5.105), are block-diagonal with 2 \times 2 blocks built from \sigma_{l_i}\cos A_i, \sigma_{l_i}\sin A_i, \sigma_{A_i} l_i \cos A_i and \sigma_{A_i} l_i \sin A_i. From (5.102), (5.103), (5.104) and (5.105) follows then that the 2n \times 2n matrix entering (5.97) is block-diagonal with 2 \times 2 blocks as well (5.106). The eigenvalues \lambda_i of matrix (5.106) read

  \lambda_i = \tfrac{1}{2}\{ ( a \cos A_i + b \sin A_i ) \pm [ 4( a^2 + b^2 ) - 3( a \cos A_i + b \sin A_i )^2 ]^{1/2} \} ,

where the constants a and b follow from (5.103). And from this follows a bound on the |\lambda_i|. Hence, with (5.104) we find for (5.97) an upperbound which is governed by the distance l for which \sigma^2_{l_i}/l_i is the greatest.

6. Some statistical considerations

In the previous sections we dealt with the problem of finding the least-squares solution. But it is of course not enough to compute a vector \hat x and state that this is the estimated value of the unknown \bar x. The step following the actual adjustment process is equally important. That is, one also needs to find the statistical properties of the estimators involved and formulate ways of testing statistical hypotheses. Unfortunately we are not able yet to present a complete treatment of the statistical theory dealing with non-linear geodesic estimation, although it will be clear that in considering non-linear models one cannot expect a well working theory as we know it for linear models. In the following we will restrict ourselves therefore to a few general remarks.

As we have seen, Gauss' method enabled us, given the observation point y_s \in M, to compute the least-squares estimate \hat x of \bar x, and with the map y: N \to M the least-squares estimate \hat y = y(\hat x) of \bar y, such that

  \hat y = P(y_s)  and  \hat x = y^{-1}( P(y_s) ) ,   (6.2)

where y^{-1} is a leftinverse of y: N \to M. In this way the least-squares estimation method defines, at least implicitly, two non-linear maps P: M \to \bar y(N) and y^{-1} \circ P: M \to N respectively. If the observation process which yielded our data were to be repeated, we would obtain different values for y_s. And application of the maps P and y^{-1} \circ P to the new data would yield different values of \hat y and \hat x. Thus the estimators \hat y and \hat x are themselves samples drawn from certain probability distributions which depend both on the nature of the maps involved and on the assumed normal distribution of the observations. For making statistical inferences it is therefore important to know the statistical properties of the estimators involved.
In case the coordinate functions y^i(x^\alpha) of the map y are linear, it is not difficult to derive the precise distribution of the least-squares estimators. The following distributional properties are well known:

  \hat x \sim N( \bar x, \sigma^2 ( \partial y^t \partial y )^{-1} ) ,  \hat y \sim N( \bar y, \sigma^2 P_{\bar y(N)} ) ,  \hat e \sim N( 0, \sigma^2 ( I - P_{\bar y(N)} ) ) ,  \| \hat e \|^2 / \sigma^2 \sim \chi^2( m - n ) ,   (6.3)

where P_{\bar y(N)} denotes the orthogonal projector onto the range of the design map.

However, these results do not carry over to the non-linear case. Only in the exceptional case that one is dealing with a totally geodesic submanifold will the last three distributional properties of (6.3) still hold. Of course, a similar complete theory as we know it for linear models can hardly be expected. Essential properties which are used repeatedly in the development of the linear theory break down completely in the non-linear case. Take for instance the mathematical expectation operator E\{\cdot\}. If z is a random variable and g is a non-linear map, then

  E\{ g(z) \} \neq g( E\{z\} ) ,

i.e., the mean of the image differs generally from the image of the mean. Hence, we can hardly expect our least-squares estimators to be unbiased in the non-linear case. Consequently, one cannot justify least-squares estimation anymore by referring to the Gauss-Markov theorem. Of course this by no means implies that one should do away with the least-squares estimators. Under the usual assumption of normality the least-squares estimators are namely still maximum likelihood estimators. Besides, when one overemphasizes the importance of exactly unbiased estimators, one can find oneself in an impossible situation. Very often namely we have a natural estimator which is, however, slightly biased. For example, if \hat z is a good unbiased estimator of \bar z, and if it is required to estimate g(\bar z), then it seems natural to estimate g(\bar z) by g(\hat z), although this estimator will nearly always be biased.
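The bias phenomenon is easy to demonstrate by simulation. With z ~ N(z̄, s²) and the non-linear map g(z) = z², the exact bias of g(z) as an estimator of g(z̄) is s², which the following sketch reproduces:

```python
import numpy as np

rng = np.random.default_rng(8)
zbar, s = 2.0, 0.5
z = rng.normal(zbar, s, size=1_000_000)   # unbiased samples of zbar

# E{g(z)} = zbar^2 + s^2, so g(z) = z^2 overestimates g(zbar) = zbar^2 by s^2
bias = (z ** 2).mean() - zbar ** 2
assert abs(bias - s ** 2) < 1e-2          # empirical bias close to s^2 = 0.25
```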


Another property that fails to carry over to the non-linear case is the property of estimability. Recall that with respect to the linear model \bar y \in A N \subset M, a linear function (x^*, x), with x^* \in N^*, x \in N, is usually defined to be an estimable function if it admits an unbiased linear estimator. However, this definition cannot be used for a non-linear model. First of all since a restriction to linear estimators is not reasonable anymore, and secondly since non-linear estimators are almost always biased. Thus what we need is a more general definition of estimability, one which for linear models reduces to the above given one. The answer is given by the dual relation

  R(A^*) = ( Nu(A) )^\perp .

This dual relation implies namely that either

  x^* = A^* y^*  for some  y^* \in M^* ,

or

  A x = 0  and  ( x^*, x ) \neq 0 ,

but not both hold. Hence, asking for an unbiased linear function (x^*, x) is equivalent to asking for a linear function (x^*, x) which is invariant to solutions of A x = 0 (see e.g. Grafarend and Schaffrin, 1974). Therefore in general it would seem more appropriate to couple the definition of estimability to the property of invariance.
Since it is impossible in general to derive precise formulae for the distributional properties of the non-linear estimators, the best we can do seems to be to find approximations. Three approaches suggest themselves:
When one has a non-linear model it is natural to hope that it is only moderately non-linear so that application of the linear theory is justified. In practical applications the first step taken should therefore be to prove whether a linear(ized) model is sufficient as approximation, because then the statistical treatment is much more simple. And since the origin of all complications in non-linear adjustment lies in the presence of curvatures, it seems reasonable to take the mean curvature as a measure of non-linearity. Let us therefore Taylorize the expressions in (6.2) about the true values \bar y = y(\bar x). With e = y_s - \bar y this gives:

  \hat y^k = ( P(y_s) )^k = \bar y^k + \partial_i ( P(\bar y) )^k e^i + \tfrac{1}{2} \partial^2_{ij} ( P(\bar y) )^k e^i e^j + \ldots

and

  \hat x^\alpha = ( y^{-1}\circ P(y_s) )^\alpha = \bar x^\alpha + \partial_i ( y^{-1}\circ P(\bar y) )^\alpha e^i + \tfrac{1}{2} \partial^2_{ij} ( y^{-1}\circ P(\bar y) )^\alpha e^i e^j + \ldots

By taking the expectation we find to an approximation of the order \sigma^2:

  E\{ \hat y^k - \bar y^k \} = \tfrac{1}{2} \sigma^2 \partial^2_{ij} ( P(\bar y) )^k g^{ij}  and  E\{ \hat x^\alpha - \bar x^\alpha \} = \tfrac{1}{2} \sigma^2 \partial^2_{ij} ( y^{-1}\circ P(\bar y) )^\alpha g^{ij} .   (6.6)

And with the definitions of the unique mean curvature normal h (see (4.33)) and the Christoffel
symbols of the second kind Γᵅ_βγ (see (4.17)), and by using the fact that y_s − P(y_s) ∈ T_ȳN⊥, one
will find that one can rewrite (6.6) as

(6.7)   (a)  E{ŷ − ȳ} = −½ σ² n h,
        (b)  E{x̂ᵅ − x̄ᵅ} = −½ σ² gᵝᵞ Γᵅ_βγ,

where N_p, p = 1, ..., (m − n), is an orthonormal basis of T_ȳN⊥ and α, β, γ = 1, ..., n
(see also Teunissen, 1984c).
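To make the structure of these second-order bias formulas concrete, here is a hedged one-dimensional sketch (all numbers hypothetical, not from the text): for the model ȳ = x̄² the estimator is x̂ = √y_s, and the predicted bias ½σ² d²x/dy² at ȳ can be checked against a Monte Carlo average.

```python
import numpy as np

# One-dimensional analogue of (6.6): x(y) = sqrt(y), true values x_bar = 2,
# y_bar = x_bar**2 = 4; predicted bias is (1/2) sigma^2 * d^2x/dy^2 at y_bar.
sigma, x_bar = 0.2, 2.0
y_bar = x_bar**2
d2x = -0.25 * y_bar**(-1.5)                  # second derivative of sqrt(y)
predicted_bias = 0.5 * sigma**2 * d2x        # = -6.25e-04 for these numbers

rng = np.random.default_rng(1)
y_s = y_bar + sigma * rng.standard_normal(1_000_000)
mc_bias = np.sqrt(y_s).mean() - x_bar        # Monte Carlo estimate of E{x_hat - x_bar}
print(predicted_bias, mc_bias)
```

The Monte Carlo average reproduces the curvature-induced bias to within sampling noise, which is what the "approximation of the order σ²" in (6.6) asserts.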


Thus the first moments of the parameters depend on the connection coefficients of N, whereas the
first moment of the residual vector depends on the mean curvature of submanifold y(N).
Hence, the first moments of the parameters can be manipulated by a change of parameter-choice,
whereas the first moment of the residual vector is invariant to such a change of parameters.
As an example, let us apply (6.7) to the two dimensional Symmetric Helmert transformation (5.17).
We assume that the observation space has the standard metric.
According to (5.87), the non-zero principal curvatures of model (5.17) for an arbitrary normal
direction N sum to zero, and hence the corresponding mean curvature reads

h_N = 0.

The bias in the parameters follows if one applies (6.7.b). With (6.7.a) it follows then that the
adjusted coordinates x̄ᵃ, ȳᵃ and xⁱ, i = 1, ..., n, of the Symmetric Helmert transformation
are unbiased.

Similar estimates as given by (6.7) can also be derived for the higher order moments of the non-linear
estimators.
Fortunately our rather pessimistic estimates in section 5 indicate that the application of the theory
of linear statistical inference is generally justified in geodetic network adjustments. But, we must
admit that it is not clear to us yet what to do when the model is significantly non-linear, and
therefore much more research needs to be done in this area. Such being the case, one may be surprised
to realize how little developed the statistical theory of non-linear estimation is for practical
applications. See for instance the survey papers (Cox, 1977) and (Bunke, 1980), the book (Goldfeld and
Quandt, 1972) and the very recent book (Humak, 1984).
An alternative way to estimate the properties of the distribution of the estimators involved would be
to use computer simulation. One could replicate the series of experiments as many times as one
pleases, each time with a new sample of errors drawn from the prescribed normal distribution, and so
obtain the relevant distributional properties by averaging over all replications. Although this
approach could give us valuable insight into the effect of non-linearity, it must be carried out on a
system whose parameters are known in advance, and such a system may not always be realistic. But
then again, since the distributions of the estimators involved depend on the actual distribution of the
observational data, which in its turn depends on the "true" values, which are generally unknown, one
is almost always faced with the problem that even when one can derive exact formulae for the
distributions one can evaluate only the approximation obtained by substituting the estimated
parameters for the true ones.
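Such a simulation might look as follows, a minimal sketch with hypothetical "true" values: a point with known coordinates, replicated noisy coordinate observations, and a non-linear estimator (the distance to the origin) averaged over all replications.

```python
import numpy as np

# Hypothetical system with known parameters: true point (3, 4), true distance 5.
# Each replication draws a fresh sample of normal errors and re-computes the
# non-linear estimate; averaging over replications approximates its first moment.
rng = np.random.default_rng(0)
true_xy, sigma, replications = np.array([3.0, 4.0]), 0.5, 200_000
samples = true_xy + sigma * rng.standard_normal((replications, 2))
r_hat = np.linalg.norm(samples, axis=1)       # non-linear estimator per replication

bias = r_hat.mean() - 5.0                     # distributional property by averaging
print(bias)                                   # small positive bias from curvature
```

Note that the simulation is only possible because the "true" point is prescribed in advance, which is exactly the limitation discussed above.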
Finally we mention the possibility to rely on results from asymptotic theory. The central idea of
asymptotic theory is that when the number m of observations is large and the errors of estimation
correspondingly small, simplifications become available that are not available in general. The rigorous
mathematical development involves limiting distributional results holding as m → ∞ and is closely
related to the classical limit theorems of probability theory. In recent years many researchers have
concentrated on developing an asymptotic theory for non-linear least-squares estimation. In
(Jennrich, 1969) a first complete account was given of the asymptotic properties of non-linear least-
squares estimators. And in (Schmidt, 1982) it was shown how the asymptotic theory can be utilized to
formulate asymptotically exact test statistics. See also the very recent book (Bierens, 1984). Roughly
speaking one can say that under suitable conditions one gets the same asymptotic results for the non-
linear model as for the linear one. Unfortunately, we doubt whether the results obtained up to now
can satisfy the requirements of applications in practice. In particular, the theory still seems to lack
statements concerning the accuracy of the approximations by limit distributions.

7. Epilogue

In this chapter we have tried to show how contemporary differential geometry can be used to improve
our understanding of non-linear adjustment. We have seen that unfortunately one can very seldom
extend the elegant formulations and solution techniques from linear to non-linear situations. For most
non-linear problems one will therefore have recourse, in practice, to methods which are iterative
in nature. As our analysis showed, Gauss' method is pre-eminently suited for non-linear adjustment
problems with small extrinsic curvature. On the whole, one could say that solutions to linear
problems are prefabricated, while exact solutions to non-linear problems are custom made. An
important example is our inversion-free solution to the Symmetric Helmert transformation.
Although we have treated a number of new aspects of non-linear adjustment, we must recognize that
we are only on the brink of understanding the complex of problems of non-linear adjustment. Many
problems and topics were left untouched or were not further elaborated upon.

For instance, in our proof of the global convergence theorem (4.66) we made use of the line search
strategy known as the minimization rule. However, its practical application is limited by the fact
that the line search must be exact, i.e., it requires that the exact minimum point of the function
E(c(t)) be found in order to determine x_{q+1}. Therefore in practice the exact minimization is
replaced by an inexact line search, in particular by a finite search process (see e.g. Ortega and
Rheinboldt, 1970).
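A typical finite search process of this kind is backtracking: halve the trial step until a sufficient-decrease test holds. The following is a hedged sketch; the function, constants and names below are illustrative, not taken from the text.

```python
def backtracking_step(E, x, dx, slope, t=1.0, c=1e-4):
    """Halve t until E(x + t*dx) satisfies an Armijo-type sufficient decrease.

    slope is the directional derivative (grad E, dx) at x, assumed negative.
    Terminates after finitely many halvings for any descent direction dx."""
    while E(x + t * dx) > E(x) + c * t * slope:
        t *= 0.5
    return t

# Illustrative quadratic: E(x) = x^2, starting at x = 1 with dx = -grad E = -2.
E = lambda x: x * x
t = backtracking_step(E, 1.0, -2.0, slope=-4.0)
print(t, E(1.0 + t * -2.0))                   # accepted step and decreased E
```

Unlike the minimization rule, this search inspects only finitely many trial points, which is what makes it usable in practice.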

In our discussion of Gauss' method, we assumed the non-linear map y to be injective. However, in
many practical applications the matrix of first derivatives ∂_α y becomes of non-maximum rank
(see e.g. chapter III) and the required inverses cannot be calculated. A way out of this dilemma is
then suggested by the theory of inverse linear mapping. Instead of an ordinary inverse of g_αβ, one
takes a generalized inverse, gᵝᵅ say, of g_αβ. To show that Δxᵝ = −gᵝᵅ ∂_α E, with
∂_α E = (grad E)_α, is still in a descent direction, note that

−(grad E, Δx) = ∂_α E gᵅᵝ ∂_β E.

Hence, if Δx ≠ 0 then −(grad E, Δx) > 0, which shows that Δx has a positive
component along the negative gradient and so is downhill.
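A quick numerical check of this descent property, sketched with an arbitrary rank-deficient linear(ized) model and the Moore-Penrose inverse as one particular generalized inverse:

```python
import numpy as np

# Rank-1 derivative matrix, so g = J^T J is singular and has no ordinary inverse.
J = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [1.0, 2.0]])
y_s = np.array([1.0, 0.5, 0.2])
x = np.zeros(2)

r = y_s - J @ x                               # residual at the current iterate
grad = -2.0 * J.T @ r                         # gradient of E(x) = |y_s - J x|^2
g = J.T @ J                                   # singular "normal matrix"
dx = -np.linalg.pinv(g) @ grad                # generalized-inverse step

print(grad @ dx)                              # negative: dx is downhill
```

The inner product (grad E, Δx) = −gradᵀ g⁺ grad is non-positive because g⁺ is symmetric positive semi-definite, confirming the descent argument above.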


As to the local rate of convergence of Gauss' method, recall that the extrinsic curvatures are a
property of the submanifold y(N). Therefore, the local convergence results obtained for Gauss' method
will remain unchanged if ∂_α y has non-maximum but locally constant rank.

Of the many iteration methods available, we only discussed Gauss' method. We did not mention any of
the possible alternative iteration methods such as, for instance, Newton's method, Levenberg-
Marquardt's compromise or the method of conjugate-gradients (see e.g. Ortega and Rheinboldt, 1970).
Although more intricate, these methods can become quite attractive in case of large curvature
problems since they take care, in one way or the other, of the curvature behaviour of the
submanifold y(N).
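For instance, a Levenberg-Marquardt-type step simply damps the normal matrix before inversion. The following illustrative sketch (all values hypothetical) shows how the damping parameter interpolates between Gauss' step (μ → 0) and a short gradient step (μ large), which is what tames ill-conditioned or strongly curved problems:

```python
import numpy as np

# Ill-conditioned derivative matrix: the undamped Gauss step would be huge
# along the weak direction; the damping parameter mu tames it.
J = np.array([[1.0, 0.0],
              [0.0, 1e-4]])
r = np.array([1.0, 1.0])                      # current residual vector
g = J.T @ J
mu = 1e-2                                     # damping parameter

dx = np.linalg.solve(g + mu * np.eye(2), J.T @ r)
print(dx, (J.T @ r) @ dx > 0)                 # bounded step, still downhill
```

Without damping the second component of the step would be of order 10⁴; with μ = 10⁻² it stays of order 10⁻², while the step remains a descent direction.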

Nor did we discuss the interesting point of view which is provided if one interprets the iteration
process as a dynamical system. Consider namely Gauss' method

(a)  Δxᵝ_q = gᵝᵅ(x_q) ∂_α yⁱ(x_q) g_ij (yʲ_s − yʲ(x_q)),
(b)  xᵝ_{q+1} = xᵝ_q + t_q Δxᵝ_q,

and assume that the positive scalar t_q is taken infinitesimally small in each iteration step. We
obtain then the autonomous dynamical system

(7.1)  dxᵝ/dt = gᵝᵅ(x) ∂_α yⁱ(x) g_ij (yʲ_s − yʲ(x)).

Its solution is a curve c(t) which passes through the initial value x₀ at time t = 0 and which has its
velocity given by the value of the vector field −grad E. Although the uniqueness theorem for systems
of differential equations implies that c(t) is never a critical point of E, this should not bother us
too much, since one can show that under suitable conditions lim_{t→∞} c(t) = x̂ with
grad E(x̂) = 0. This is like the pendulum paradox, which says that the pendulum once it is in
motion can never come to a state of rest, but only approximate one arbitrarily closely. Thus, given an
initial guess x₀ which is not a critical point of E, one can try to solve our non-linear adjustment
problem by solving the system of differential equations (7.1), using one of the many numerical
integration methods available.
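A minimal sketch of this idea (model and numbers hypothetical): for a one-parameter submanifold y(x) = (x, x²) and an observation point y_s, integrate the gradient system with a simple Euler scheme and watch the solution curve settle at a point where grad E vanishes.

```python
import numpy as np

# E(x) = |y_s - y(x)|^2 for the curve y(x) = (x, x^2) with the standard metric.
y_s = np.array([1.0, 2.0])
y = lambda x: np.array([x, x * x])
J = lambda x: np.array([1.0, 2.0 * x])        # dy/dx

x, h = 1.0, 0.01                               # initial guess, pseudo-time step
for _ in range(5000):                          # Euler integration of (7.1)
    x += h * 2.0 * J(x) @ (y_s - y(x))         # velocity field -grad E
grad_E = -2.0 * J(x) @ (y_s - y(x))
print(x, grad_E)                               # x approaches a critical point of E
```

The iterate only ever approximates the critical point, never reaches it exactly in finite time, which is precisely the pendulum paradox mentioned above.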

In connection with the above dynamical interpretation we also mention the potential value which a
study of the qualitative theory of the global behaviour of dynamical systems, and of Morse theory, can
have for a betterment of our understanding of non-linear adjustment. This qualitative theory is
namely concerned with the existence of equilibrium behaviour of a dynamical system, together with
questions of local and global stability (see e.g. Chillingworth, 1976; Hirsch and Smale, 1974). And
Morse theory studies, amongst other things, the equilibrium configurations of a gradient system. The
Morse inequalities, for instance, place restrictions on the number of critical points that a function E
can have due to the topology of the manifold on which it is defined (see e.g. Hirsch, 1976).

Finally we note that we omitted the important case of an implicitly defined submanifold. This
would correspond to a non-linearly constrained adjustment problem. Although the geometry of the
problem is not too different from the one discussed in this chapter, the various methods for actually
solving a constrained problem can become quite involved (see e.g. Hestenes, 1975). The usual way to
go about it is to prolong the original constrained problem with the aid of the Lagrange multiplier rule
to one which is unconstrained. It is interesting to point out that although the Lagrange multipliers are
often thought of as being merely dummy variables, which are just needed to prolong the constrained
problem into an unconstrained one, they actually have an important interpretation of their own. In
fact, there is a very rich duality theory connected with the Lagrangian formulation (see e.g.
Rockafellar, 1969). It goes back to the Legendre transformation of classical mechanics. The
Lagrangian formulation has namely the physical significance that it replaces the given (kinematical)
constraints by forces which maintain those constraints. As a result the multipliers equal the forces of
reaction (see e.g. Krarup, 1982b). The multipliers can therefore be used as test statistics. For linear
models one can show that the standardized Lagrangian multiplier equals Baarda's w-test statistic (see
Teunissen, 1984b).
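For a linear model the prolonged (Lagrangian) system can be written down directly. The following hedged sketch (arbitrary numbers, one sign convention for the multiplier assumed) solves min |y − Ax|² subject to Cx = c and recovers the multiplier alongside the solution:

```python
import numpy as np

# Bordered normal-equation system of the Lagrangian L = |y - Ax|^2 + 2l^T(Cx - c):
#   [ A^T A  C^T ] [ x ]   [ A^T y ]
#   [ C      0   ] [ l ] = [ c     ]
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([1.0, 2.0, 4.0])
C = np.array([[1.0, -1.0]])                   # constraint: x1 = x2
c = np.array([0.0])

N = A.T @ A
K = np.block([[N, C.T], [C, np.zeros((1, 1))]])
sol = np.linalg.solve(K, np.concatenate([A.T @ y, c]))
x_hat, lam = sol[:2], sol[2:]
print(x_hat, lam)                             # multiplier: "force" holding x1 = x2
```

The multiplier is not a mere dummy variable: it measures how hard the constraint must "pull" to be maintained, which is what makes it usable as a test statistic.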

That many more problems and topics related to non-linear adjustment can be brought forward is
indisputable. Many questions are still open for future research and it will probably take some time
before we understand non-linear geodesic adjustment as well as we understand linear adjustment. We
therefore conclude by expressing the wish that the rather unsurveyed area of non-linear adjustment
and statistical inference will receive more serious attention than it has received hitherto.

REFERENCES

Ádám, J., F. Halmos and M. Varga (1982): On the Concepts of Combination of Doppler Satellite and Terrestrial Geodetic Networks, Acta Geodaet., Geophys. et Montanist. Acad. Sci. Hung., Vol. 17(2), pp. 147-170.
Alberda, J.E. (1969): The Compliance of Least-Squares Estimates with the Condition Equations (In Dutch: "Het Voldoen van Kleinste-Kwadraten schattingen aan de Voorwaardevergelijkingen"), Laboratorium voor Geodetische Rekentechniek, R. 65, Delft.
Baarda, W. (1967a): Adjustment Theory, Part One (In Dutch: "Vereffeningstheorie, Eerste deel"), Laboratorium voor Geodetische Rekentechniek, Delft.
Baarda, W. (1967b): Statistical Concepts in Geodesy, Netherlands Geodetic Commission, Publications on Geodesy, New Series, Vol. 2, No. 4, Delft.
Baarda, W. (1969): A Testing Procedure for Use in Geodetic Networks, Netherlands Geodetic Commission, Publications on Geodesy, New Series, Vol. 2, No. 5, Delft.
Baarda, W. (1973): S-transformations and Criterion Matrices, Netherlands Geodetic Commission, Publications on Geodesy, New Series, Vol. 5, No. 1, Delft.
Baarda, W. (1978): Mathematical Models, European Organisation for Experimental Photogrammetric Research, Publ. Off. Nr. 11, pp. 73-101.
Baarda, W. (1979): A Connection between Geometric and Gravimetric Geodesy; A First Sketch, Netherlands Geodetic Commission, Publications on Geodesy, New Series, Vol. 6, No. 4, Delft.
Backus, G. and F. Gilbert (1968): The Resolving Power of Gross Earth Data, Geophys. J.R. astr. Soc., 16, pp. 169-205.
Bierens, H.J. (1984): Robust Methods and Asymptotic Theory in Nonlinear Econometrics, Lecture Notes in Economics and Mathematical Systems, Vol. 192, Springer-Verlag.
Bjerhammar, A. (1951): Rectangular Reciprocal Matrices, with Special Reference to Geodetic Calculations, Bull. Géod., 20, pp. 188-220.
Bjerhammar, A. (1973): Theory of Errors and Generalized Matrix Inverses, Elsevier, Amsterdam.
Blais, J.A.R. (1983): Duality Considerations in Linear Least-Squares Estimation, Manuscripta Geodaetica, Vol. 8, No. 2, pp. 199-213.
Blaha, G. (1984): Tensor Structure Applied to the Least-Squares Method, Revisited, Bull. Géod., 58, pp. 1-30.
Brouwer, F.J.J., D.T. van Daalen, F.T. Gravesteijn, H.M. de Heus, J.J. Kok and P.J.G. Teunissen (1982): The Delft Approach for the Design and Computation of Geodetic Networks, In: "Forty Years of Thought ...", Anniversary edition on the occasion of the 65th birthday of Professor W. Baarda, Vol. I, pp. 202-274.
Bunke, H. (1980): Parameter Estimation in Nonlinear Regression Models, in: Handbook of Statistics (P.R. Krishnaiah, ed.), Vol. 1, North-Holland Publishing Company, pp. 593-615.
Celmins, A. (1981): Least-Squares Model Fitting with Transformations of Variables, J. Statist. Comput. Simul., Vol. 14, pp. 17-39.
Celmins, A. (1982): Estimation of NMR Function Accuracies from Least-Squares Fitting, Journal of Magnetic Resonance, 50, pp. 373-381.
Chillingworth, D.R.J. (1976): Differential Topology with a View to Applications, Research Notes in Mathematics, No. 9, Pitman Publishing.
Cox, D.R. (1977): Nonlinear Models, Residuals and Transformations, Math. Operationsforsch. Statist., Ser. Statistics, Vol. 8, No. 1, pp. 3-22.
Eeg, J. (1982): Continuous Methods in Least-Squares Theory, Bollettino di Geodesia e Scienze Affini, Nr. 4, pp. 393-407.
Engler, K., E. Grafarend, P. Teunissen and J. Zaiser (1982): Testcomputations of Three-Dimensional Geodetic Networks with Observables in Geometry and Gravity Space, DGK, Reihe B, Heft Nr. 258/VII, München, pp. 119-141.
Flemming, W. (1977): Functions of Several Variables, Springer-Verlag.
Gauss, C.F. (1827): Allgemeine Flächentheorie (Disquisitiones Generales Circa Superficies Curvas), Deutsch herausgegeben von A. Wangerin, Leipzig, 1889.
Gauss, C.F. (1887): Abhandlungen zur Methode der Kleinsten Quadrate, Deutsch herausgegeben vom Königl. Preussischen Geodätischen Institut, Berlin, 1887.
Goldfeld, S.M. and R.E. Quandt (1972): Nonlinear Methods in Econometrics, North-Holland Publishing Company, Amsterdam.
Grafarend, E. (1970): Verallgemeinerte Methode der kleinsten Quadraten für Zyklische Variabele, ZfV, 95, Heft 4, pp. 117-121.
Grafarend, E. (1973): Attempts for a Unified Theory of Geodesy, Bull. Géod., 109, pp. 237-260.
Grafarend, E. and B. Schaffrin (1974): Equivalence of Estimable Quantities and Invariants in Geodetic Networks, ZfV, 101, pp. 485-491.
Grafarend, E., H. Heister, R. Kelm, H. Kropff and B. Schaffrin (1979): Optimierung Geodätischer Messoperationen, Herbert Wichmann Verlag, Karlsruhe, Band 11.
Grafarend, E.W. (1981): Kommentar eines Geodäten zu einer Arbeit E.B. Christoffels, in: E.B. Christoffel, The Influence of his Work on Mathematics and the Physical Sciences, edited by P.L. Butzer and F. Fehér, Birkhäuser Verlag.
Grafarend, E.W., E.H. Knickmeyer and B. Schaffrin (1982): Geodätische Datumstransformationen, ZfV, No. 1, pp. 15-25.
Griffiths, L.W. (1947): Introduction to the Theory of Equations, John Wiley & Sons, Inc., New York.
Heiskanen, W.A. and H. Moritz (1967): Physical Geodesy, Freeman and Co., San Francisco/London.
Helmert, F.R. (1880): Die Mathemat. und Physikal. Theorieen der Höheren Geodäsie, Leipzig.
Hestenes, M.R. (1975): Optimization Theory, The Finite Dimensional Case, John Wiley, New York.
Hirsch, M.W. (1976): Differential Topology, Springer-Verlag.
Hirsch, M.W. and S. Smale (1974): Differential Equations, Dynamical Systems and Linear Algebra, Academic Press, New York.
Hotine, M. (1969): Mathematical Geodesy, ESSA Monograph 2, Washington.
Humak, K.M.S. (1984): Statistische Methoden der Modellbildung, Bd. II, Akademie-Verlag, Berlin.
Jackson, J. (1982): Survey Adjustment, Survey Review, Vol. 26, No. 203, pp. 248-249.
Jennrich, R.I. (1969): Asymptotic Properties of Nonlinear Least Squares Estimators, The Annals of Mathematical Statistics, 40, pp. 633-643.
Kelley, R.P. and W.A. Thompson jr. (1978): Some Results on Nonlinear and Constrained Least Squares, Manuscripta Geodaetica, Vol. 3, pp. 299-320.
Köchle, R. (1982): Die Räumliche Helmerttransformation in Algebraischer Darstellung, Vermessung, Photogrammetrie, Kulturtechnik, 9, pp. 292-297.
Kooimans, A.H. (1958): Principles of the Calculus of Observations, Rapport Spécial, Neuvième Congrès International des Géomètres, Pays-Bas, pp. 301-310.
Krarup, T. (1969): A Contribution to the Mathematical Foundation of Physical Geodesy, Geod. Inst. København, Medd. No. 44.
Krarup, T. (1972): On the Geometry of Adjustment, ZfV, 97, Heft 10, pp. 440-445.
Krarup, T. (1982a): Non-Linear Adjustment and Curvature, In: "Daar heb ik veertig jaar over nagedacht ...", Feestbundel ter gelegenheid van de 65ste verjaardag van professor Baarda, Delft, pp. 145-159.
Krarup, T. (1982b): Mechanics of Adjustment, Peter Meissl Gedenkseminar, Geodätische Institute, T.U. Graz.
Kube, R. and K. Schnädelbach (1975): Geometrical Adjustment of the European Triangulation Networks - Report of the RETrig Computing Centre München, AIG, Section I, Publication No. 11.
Kubik, K. (1967): Iterative Methoden zur Lösung des Nichtlinearen Ausgleichsproblems, ZfV, Nr. 6, pp. 214-225.
Levallois, J.J. (1960): La Réhabilitation de la Géodésie Classique et la Géodésie Tridimensionelle, Bull. Géod., No. 68, pp. 193-199.
Marussi, A. (1952): Intrinsic Geodesy, The Ohio State Research Foundation, Project No. 485, Columbus.
Meissl, P. (1973): Distortions of Terrestrial Networks caused by Geoid Errors, Bollettino di Geodesia e Scienze Affini, N. 2, pp. 41-52.
Meissl, P. (1982): Least Squares Adjustment, A Modern Approach, Mitteilungen der geodätischen Institute der Technischen Universität Graz, Folge 43.
Molenaar, M. (1981a): A Further Inquiry into the Theory of S-transformations and Criterion Matrices, Netherlands Geodetic Commission, Publications on Geodesy, New Series, Vol. 7, Nr. 1, Delft.
Molenaar, M. (1981b): S-transformations and Artificial Covariance Matrices in Photogrammetry, ITC Journal, No. 1, pp. 70-79.
Morduchow, M. and L. Levin (1959): Comparison of the Method of Averages with the Method of Least-Squares: Fitting a Parabola, Presented at the 557th Meeting of the American Mathematical Society, New York.
Moritz, H. (1979): The Geometry of Least-Squares, Publications of the Finnish Geodetic Institute, No. 89, pp. 134-148.
Neeleman, D. (1973): Multicollinearity in Linear Economic Models, Tilburg University Press.
Nibbelke, P. (1984): Adjustment of Geodetic Networks on the Ellipsoid (In Dutch: "Vereffening van geodetische netwerken op de ellipsoïde"), thesis.
Ortega, J.M. and W.C. Rheinboldt (1970): Iterative Solution of Nonlinear Equations in Several Variables, Academic Press.
Penrose, R. (1955): A Generalized Inverse for Matrices, Proc. Cambridge Philos. Soc., 51, pp. 406-413.
Peterson, A.E. (1974): Merging of the Canadian Triangulation Network with the 1973 Doppler Satellite Data, The Canadian Surveyor, Vol. 28, No. 5, pp. 487-495.
Pope, A. (1972): Some Pitfalls to be Avoided in the Iterative Adjustment of Nonlinear Problems, Proceedings of the 38th Annual Meeting, American Society of Photogrammetry.
Pope, A. (1974): Two Approaches to Nonlinear Least-Squares Adjustments, The Canadian Surveyor, Vol. 28, No. 5, pp. 663-669.
Rao, C.R. (1973): Linear Statistical Inference and its Applications, Wiley, New York.
Rao, C.R. and S.K. Mitra (1971): Generalized Inverse of Matrices and its Applications, J. Wiley, New York.
Rockafellar, R.T. (1969): Convex Analysis, Princeton University Press, Princeton, N.J.
Rummel, R. and P.J.G. Teunissen (1982): A Connection between Geometric and Gravimetric Geodesy - Some Remarks on the Role of the Gravity Field, In: "Forty Years of Thought ...", Anniversary edition on the occasion of the 65th birthday of Professor W. Baarda, Vol. II, pp. 602-623.
Rummel, R. (1984): From the Observational Model to Gravity Parameter Estimation, Lecture Notes of the International Summer School on Local Gravity Field Approximation, Beijing, China, Aug. 21 to Sept. 4.
Sansò, F. (1973): An Exact Solution of the Roto-Translation Problem, Photogrammetria, 29, pp. 203-216.
Schmidt, W.H. (1982): Testing Hypotheses in Nonlinear Regressions, Math. Operationsforsch. Statist., Ser. Statistics, Vol. 13, No. 1, pp. 3-19.
Schwidefsky, K. and F. Ackermann (1975): Photogrammetrie, Grundlagen, Verfahren, Anwendungen, B.G. Teubner, Stuttgart.
Spivak, M. (1975): Differential Geometry, Vol. 1-5, Publish or Perish Inc.
Stark, E. and E. Mikhail (1973): Least-Squares and Non-Linear Functions, Photogrammetric Engineering, pp. 405-412.
Stoker, J.J. (1969): Differential Geometry, Wiley-Interscience.
Strang van Hees, G.L. (1977): Orientation of the Ellipsoid in Geodetic Networks, Delft Progress Report, 3, pp. 35-38.
Teunissen, P.J.G. (1980): Some Remarks on Gravimetric Geodesy, Reports of the Department of Geodesy, Section Mathematical and Physical Geodesy, No. 80.2, Delft.
Teunissen, P.J.G. (1982): Anholonomity when using the Development Method for the Reduction of Observations to the Reference Ellipsoid, Bull. Géod., 56, No. 4, pp. 356-363.
Teunissen, P.J.G. (1983): A Note on Anholonomity, Paper presented at the meeting on Geometric Geodesy, IUGG/IAG general assembly, Hamburg, 15-27 August 1983.
Teunissen, P.J.G. (1984a): Generalized Inverses, Adjustment, The Datum Problem and S-transformations, Lecture Notes, International School of Geodesy, 3rd Course: Optimization and Design of Geodetic Networks, Erice-Trapani-Sicily, 25 April - 10 May 1984.
Teunissen, P.J.G. (1984b): Quality Control in Geodetic Networks, Lecture Notes, International School of Geodesy, 3rd Course: Optimization and Design of Geodetic Networks, Erice-Trapani-Sicily, 25 April - 10 May 1984.
Teunissen, P.J.G. (1984c): A Note on the Use of Gauss' Formulas in Non-Linear Geodesic Adjustment, Paper presented at the 16th European Meeting of Statisticians, Marburg (FRG), 3-7 Sept. 1984.
Tienstra, J.M. (1947): An Extension of the Technique of the Method of Least-Squares to Correlated Observations, Bull. Géod., 6, pp. 301-335.
Tienstra, J.M. (1948): The Foundation of the Calculus of Observations and the Method of Least-Squares, Bull. Géod., 10, pp. 289-306.
Tienstra, J.M. (1956): Theory of the Adjustment of Normally Distributed Observations, N.V. Uitgeverij Argus, Amsterdam.
Torge, W. and H.G. Wenzel (1978): Dreidimensionale Ausgleichung des Testnetzes Westharz, DGK, Report B234, München.
Vanicek, P. (1979): Tensor Structure and the Least-Squares, Bull. Géod., Vol. 53, No. 3, pp. 221-225.
Van Mierlo, J. (1978): A Testing Procedure for Analysing Geodetic Deformation Measurements, F.I.G. Symp. on Deformation Measurements, Bonn.
Van Mierlo, J. (1979): Free Networks Adjustment and S-transformations, DGK, Reihe B, Nr. 252, pp. 41-54.
Whittaker, E. and G. Robinson (1944): The Calculus of Observations, Dover Publications, Inc.
Wolf, H. (1963a): Die Grundgleichungen der Dreidimensionalen Geodäsie in elementarer Darstellung, ZfV, Nr. 6, pp. 225-233.
Wolf, H. (1963b): Geometric Connection and Re-Orientation of Three-Dimensional Triangulation Nets, Bull. Géod., No. 68, pp. 165-169.
Wolf, H. (1978): The Helmert Block Method, Its Origin and Development, Proceedings Second International Symposium on Problems Related to the Redefinition of North American Geodetic Networks, pp. 319-326.
Yeremeyev, V.F. and M.I. Yurkina (1969): On Orientation of the Reference Geodetic Ellipsoid, Bull. Géod., No. 91, pp. 13-15.
