Modelling Future Telecom Systems

MODELLING FUTURE
TELECOMMUNICATIONS SYSTEMS
BT Telecommunications Series
The BT Telecommunications Series covers the broad spectrum of

telecommunications technology. Volumes are the result of research
and development carried out, or funded, by BT, and represent the
latest advances in the field.
The series include volumes on underlying technologies as well
as telecommunications. These books will be essential reading for
those in research and development in telecommunications, in
electronics and in computer science.
1. Neural Networks for Vision, Speech and Natural Language
Edited by R Linggard, D J Myers and C Nightingale
2. Audiovisual Telecommunications
Edited by N D Kenyon and C Nightingale
3. Digital Signal Processing in Telecommunications
Edited by F A Westall and S F A Ip
4. Telecommunications Local Networks
Edited by W K Ritchie and J R Stern
5. Optical Network Technology
Edited by D W Smith
6. Object Oriented Techniques in Telecommunications
Edited by E L Cusack and E S Cordingley
7. Modelling Future Telecommunications Systems
Edited by P Cochrane and D J T Heatley
MODELLING FUTURE
TELECOMMUNICATIONS
SYSTEMS
Edited by
P. Cochrane
Advanced Applications and Technologies
BT Laboratories
Martlesham Heath
UK
and
D.J.T. Heatley
Advanced Mobile Media
BT Laboratories
Martlesham Heath
UK
SPRINGER-SCIENCE+BUSINESS MEDIA, B.V .
First edition 1996

1996 Springer Science+Business Media Dordrecht
Originally published by Chapman & Hall in 1996
Softcover reprint of the hardcover 1st edition 1996
ISBN 978-1-4613-5850-3
ISBN 978-1-4615-2049-8 (eBook)
DOI 10.1007/978-1-4615-2049-8
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the UK Copyright Designs and Patents Act,
1988, this publication may not be reproduced, stored, or transmitted, in any form
or by any means, without the prior permission in writing of the publishers, or in
the case of reprographic reproduction only in accordance with the terms of the
licences issued by the Copyright Licensing Agency in the UK, or in accordance
with the terms of licences issued by the appropriate Reproduction Rights
Organization outside the UK. Enquiries concerning reproduction outside the
terms stated here should be sent to the publishers at the London address printed
on this page.
The publisher makes no representation, express or implied, with regard to the
accuracy of the information contained in this book and cannot accept any legal
responsibility or liability for any errors or omissions that may be made.
A catalogue record for this book is available from the British Library
(oo) Printed on permanent acid-free text paper, manufactured in accordance with

ANSI/NISO Z39.48-1992 and ANSI/NISO Z39.48-1984 (Permanence of Paper).
Contents
Contributors
Preface, Peter Cochrane and David J T Heatley
vii
IX
The future
P Cochrane
Modelling interactions between new services

M H Lyons
11
Fractal populations
S Appleby
22
Internal markets
I Adjali, J L Fernandez-Villacaiias Martin and M A Gell
45
Evaluation of Hopfield service assignment

M R W Manning and M A Gell
65
Hierarchical modelling
M A H Dempster
84
Graph-theoretical optimization methods

EAMedova
103
Distributed restoration
D Johnson, G N Brown, C P Botham, S L Beggs
and I Hawker
124
Intelligent switching
R Weber
144
10
Neural networks
S J Amin, S Olafsson and M A Gell
153
vi
CONTENTS
11
System and network reliability

P Cochrane and 0 J T Heatley
168
12
Pre-emptive network management
201
R A Butler and P Cochrane
13
Evolving software
224
C S Winter, P W A McIIlroy and
J L Fernandez-Villacanas Martin
14
Software agents for control
245
S Appleby and S Steward
15
Evolution of strategies
264
S Olafsson
16
Dynamic task allocation
285
S Olafsson
17
Complex behaviour in nonlinear systems
311
C T Pointon, R A Carrasco and M A Gell
Index
345
Contributors
I Adjali
Systems Research, BT Laboratories
S Amin
S Appleby
S L Beggs
Network Software Applications, BT

Laboratories
C P Botham
Network Modelling, BT Laboratories
G N Brown
R A Butler
Faculty of Science and Technology, The

Robert Gordon University
C A Carrasco
School of Engineering,
Staffordshire University
P Cochrane
Advanced Applications and Technologies,

BT Laboratories
M A H Dempster
Department of Mathematics, University of

Essex
J L Fernandez-Villacanas
Martin
M AGell
Multi-Business Zones Research, Ipswich
I Hawker
D J T Heatley
Advanced Mobile Media, BT Laboratories
D Johnson
M H Lyons
Applications Research, BT Laboratories
P W A Mcillroy
E A Medova
Department of Electronic Systems

Engineering, University of Essex
S Olafsson
viii CONTRIBUTORS
C T Pointon
School of Engineering,
Staffordshire University
S Steward
Distributed Systems, BT Laboratories
R Weber
Management Studies Group,

Cambridge University
C S Winter
Intelligent Systems Research,

BT Laboratories
Preface
Since the invention of the electric telegraph, generations of engineers have
concerned themselves with the modelling of systems and networks. Their goal
has been, and continues to be, the gaining of fundamental insights and
understanding leading to the optimum exploitation of available technology.
For over 130 years this has brought about startling advances in the development of transmission systems, switching and networks. We are now within
sight of realizing a global infrastructure that represents the nervous system
of the planet, with telecommunications governing and underpinning all of
mankind's activity. It is therefore vital that we continue to expand our
understanding of all facets of this global infrastructure, from the constituent
parts through to market demands.
At a time when national networks are achieving 100070 digital transmission
and switching, with optical fibre dominating over copper cables, and with
satellite and microwave radio, demand for mobility and flexible access is on
the increase, and a new awareness of complexity has arisen. Firstly, the world
of telecommunications is becoming increasingly complex and inherently
nonlinear, with the interaction of technologies, systems, networks and
customers proving extremely difficult to model. Secondly, the relevance of
established models and optimization criteria are becoming questionable as
we move towards the information society. For example, minimizing
bandwidth usage or charging for distance hardly seems appropriate when
both are becoming increasingly low cost and irrelevant with the deployment
of optical fibre systems. Conversely, optimizing the performance and cost
of system hardware and software independently of each other seems shortsighted when either can represent a dominant risk. In a similar vein we could
also challenge the continuation of established, but little understood,
technologies and approaches in software and packet switching.
The key question is whether we are optimizing the right parameters to
the right criteria. There are no universal answers or solutions to this question
as we live in a sea of rapidly changing technology, applications and demand.
Even a crude global model remains just a gleam in our engineering eye, but
a much coveted objective. In the meantime, we have to settle for an
independent and disconnected series of models and assume we can cope with
the rising level of chaos (in the mathematical sense)! Probably the single,
PREFACE
most focused hope that we can foster is the ideal of widespread (even global)
simplification. Switching and transmission systems hardware has already
undergone a meteoric rise in complexity, followed quite naturally by incredible
simplification, and there are now signs that software may ultimately share
the same good fortune. In contrast, their interaction with services, compounded by the unpredictability of the market-place, shows no such tendency
- so far!
The ideal of a single, all-embracing model that will identify and correctly
optimize the right parameters is undoubtedly some way off. It may even be
unattainable in the strict sense due to the rapid development of new technologies, services and societies, and so we may never attain true global
optimization. Nevertheless, work towards understanding that goal and the
barriers must continue. It is therefore the purpose of this book to highlight
a meaningful sample of the diverse developments in future system and
network modelling. Our selection has been purposeful and designed to
contrast with, and challenge, the progressively established wisdoms and
practices of previous decades. We contend that telecommunications is undergoing fundamental change across a broad range of technologies, and as such
can adopt new strategies to dramatic effect. The key difficulty is the
transformation of established human preconception. For example, one fibre
in a cable can be more reliable than ten in parallel; the duplication of power
supplies can realize a higher level of network reliability than alternative
routeing; conventional software routines amounting to millions of lines of
code can be replaced by just a few hundred by using the principles of artificial
life; conventional market models do not necessarily apply to telecommunications, etc. All of these are known to be true and yet fly in the face of
current expectations and acceptability.
In gathering together this selection of diverse topics, we have tried, with
the help of the best crystal ball available, to indicate the most likely directions
for the long-term development of telecommunications. In this task we have
enjoyed the full co-operation and support of the individual authors whose
respective works all support our future vision. That is not to say that there
have not been, or do not remain, points of contention. Quite the contrary.
Nor is our selection complete - we have merely taken a snapshot, the best
available at this epoch, to indicate some of the most promising and likely
directions. We hope that you, the reader, will find our selection agreeable
and that you will share in our excitement for the challenge ahead.
Peter Cochrane
David J T Heatley
THE FUTURE
P Cochrane
1.1
INTRODUCTION
Most telecommunications networks are still designed as if the constraints of

the copper and radio past were still with us - compress the signal, save
bandwidth, minimize hold times, continue with the old protocols and
switching architectures (although they are now the new bottle-necks), use
the established design rules despite technology changes! The apportionment
of operational and economic risks also seems vested in the past and demands
realignment. The wholesale reduction in the hardware content of networks
relative to the massive expansion of software is a prime example, as is the
interface on most telephonic services which is dictated by a base technology
that is now over 100 years old.
Today we sit at a crossroads where the advance of fibre optics has removed
the bandwidth bottle-neck and distance-related costs. Integrated electronics
has made information processing and storage limitless and effectively free,
whilst software has introduced the potential for system and network failures
on a grand scale. What should we be doing? First, we need to start thinking
'engineering systems' and not hardware, software, networks and applications
as if they were disconnected. Secondly, we need to embrace new tools and
techniques to help us understand the nonlinear world we live in. Thirdly,
we have to humanize the interfaces and information presentation to facilitate
understanding and control. Finally, we should not shy away from starting
again with a clean sheet of paper.
In this chapter we briefly examine some of these key issues and postulate
new possibilities based on concepts still at the research stage. We also
2 THE FUTURE
challenge a number of the established wisdoms and indicate the likely impact
of the changes forecast and the implications for future networks.
1.2
NEW NETWORKS
In less than 15 years, the longer transmission spans afforded by optical fibre
has seen a reduction in the number of switching nodes and repeater stations.
The arrival of the optical amplifier and network transparency will accelerate
this process and realize further improvements across a broad range of
parameters, including:
reduced component count;
improved reliability;
reduced power and raw material usage;
increased capacity and utility.
A further logical (and revolutionary) development will see the

concentration of more traffic on to single rather than several fibres in parallel.
Optical fibre transparency with wavelength division multiplexing (WDM) is
the more reliable option as the time to repair a single fibre is far shorter.
As transmission costs continue to fall, the balance between transmission,
switching and software has to be readdressed. Radical reductions in the
number of network nodes and repeater spans, consolidated switches, reductions in network management and constraints imposed by software are
good targets. We might also have to move away from calls that just span
the local area and expand, over some relatively short period, to encompass
all of the UK, then Europe, and gradually the whole planet. E-mail is
the only practical realization currently available - and it is a subscription service!
Whilst cellular radio technology is now well developed, and satellite mobile
will no doubt follow, we suffer an apparent lack of spectrum, although we
have yet to exploit microwave frequencies> 30 GHz. We also have the ability
to take radio off-air, up-convert it to the optical regime, feed it down fibre
that is transparent through optical amplification, down-convert and retransmit at a distant cell undistorted. In addition, the performance of freespace optics in the home and office has been demonstrated to be very similar
to microwaves, but with the advantage of a vastly greater bandwidth.
Research systems are already providing pico-cellular illumination of the desk,
personal computer, individual staff members, and offer the potential of active
SWITCHING AND TRANSMISSION
badges/communicators and inter-desk/computer links. Applications in the

local loop might also be anticipated as an alternative to fibre, copper and
radio.
Satellite technology has been eclipsed by optical fibre which carries over
55010 of all international traffic. However, satellite now has a new role in
broadcast, getting into difficult locations, mobility and service restoration.
The path delay ( - 300 ms) for geostationary satellites is problematic for real
time communications. However, there are proposals for low earth orbit
( - 1000 km) satellites to form a cellular system in the sky using 70 or so lowcost units. Other exciting developments include direct satellite-to-satellite links
using free-space optics, on-board signal processing and switching, individually
positioned multiple micro-beams, the use of much higher frequencies, and
even optical links from satellite to ground. All of this can be expected to
extend satellite technology by an order of magnitude (or so) beyond that of
today. Fundamentally, satellite systems look set to migrate to mobile/
difficult/rapid-access applications.
For global mobility the major challenges are likely to remain the
organizational and control software necessary to track and bill customers
- and the ability to deflect/hold/store calls that are traversing time zones
at unsocial hours.
With the increasing multiplicity of carriers, growth in mobility, demand
for new types of service, and growing complexity of signalling and control
for call and service set-up, as well as the complex nature of bits-on-demand
services, it may also be necessary to abandon the practice of charging for
bandwidth. We are already in the regime of carriers supporting signalling
overheads approaching 50% of network capacity, with billing and
management system costs exceeding 20% of turnover - and growing. The
future service trajectory requirement of customers, carriers and networks
cannot sustain such non-essential growth.
1.3
SWITCHING AND TRANSMISSION
Only a decade ago, a typical UK repeater station had to accommodate

2.4 Gbit/s of speech circuit capacity. Today, it accommodates 8 Gbit/s and
projections indicate that 40 Gbit/s will be required by the end of the
millennium. This packing density cannot be achieved with conventional
electronics alone - another degree of freedom is required, and wavelength
is the obvious choice. Recent developments have seen the demonstration of
suitable technology such as contactless 'D-type' (leaky feeder) fibres
embedded in printed circuit backplanes. When coupled to an erbium-doped
fibre amplifier, a lossless multitap facility is realized which can distribute
THE FUTURE
10 Gbit/s and higher rates almost endlessly. An interesting concept now arises
- the notion of the infinite backplane. It could be used to link, for example,
Birmingham, Sheffield and Leeds through the use of optically amplifying
fibre that offers total transparency. Such concepts naturally lead to the idea
of replacing switches by an optical ether operating in much the same way
as radio and satellite systems today. The difference is the near-infinite
bandwidth of the optical ether. Demonstrators have already shown that a
central office with up to two million lines could be replaced by an ether
system, but suitable optical technology is probably still some 15 or so years
away. Systems of this kind would see all the software, control and
functionality located at the periphery of networks with the Telco probably
becoming a bit carrier only!
1.4
SIGNAL FORMAT
Asynchronous transfer mode (ATM) is often quoted as the ultimate answer

for future bandwidth-on-demand services. In an electronic format, this view
is likely to be short lived as terminal equipment development will overtake
the ATM capacity available within the time-frame of network provision. It
will also become apparent that ATM switching nodes are inefficient when
realized on a large scale, as is the overall network throughput. Other major
problems are associated with the fixed and variable delays which necessitate
complex signal processing - often at customer terminals. Interestingly,
photonic ATM looks far more attractive as the capacity available resembles
the M25 motorway with a near infinite number of lanes! It also looks feasible
to waste some of the optical bandwidth to realize new forms of switching
on the basis of simple binary steering. This requires a greater number of
switching elements, but they can be simple (with only two output directions)
and have little or no software base. Such a possibility also falls in line with
the development of soliton and WDM-soliton systems.
A combination of customer/service demand and technology progression
is already changing the nature of telecommunications. In the midst of this
revolution it is interesting to contemplate the history of cyclic alternation
between digital and analogue formats. In parallel, computation has also
followed an alternating analogue/digital history. We might then presuppose that this alternation will continue into the future, and may even be
encouraged by the inherent qualities of optical fibre, network transparency,
ether nets and photonic computing. Perhaps the future will not be all-digital!
SOFTWARE
1.5
THE ECONOMICS OF ANALOGUE AND DIGITAL
During the 1970s many administrations completed studies that established

the combination of digital transmission and switching as the most economic
option. Programme decisions made at the time are only now seeing their full
realization with the near full digitalization of national and international
networks in the developed world. A point to be recognized here is that
historically the fixed assets of a telco are generally measured in Bn, and
any major network changes take 5-10 years to complete. Before fibre
technology, the asset ratio for long lines would be of the order of 50%
transmission and 50% switching. With the widespread introduction of optical
fibre, the transmission costs may now be as little as 10070, with some 70%
of all resources residing in the access network. This has significantly changed
the economic balance away from the original 'all-digital' figures. When
repeaters are spaced at > 50 km and optical amplification is introduced, an
all-analogue (or hybrid) system for global wideband services may then become
optimum.
1.6
SOFTWARE
In the software domain very minor things pose a considerable risk, which,
it appears, might grow exponentially in the future. New ways of negating
this increasing risk are necessary as the present trajectory looks unsustainable
in the long term. Perversely, the unreliability of hardware is coming down
rapidly whilst that of software is increasing, so much so that we are now
seeing sub-optimal system and network solutions. From any engineering
perspective this growing imbalance needs to be addressed. If it is not, we
can expect to suffer an increasing number of ever more dramatic failures.
It is somewhat remarkable that we should pursue a trajectory of
developing ever more complex software to do increasingly simple things. This
is especially so, when we are surrounded by organisms (moulds and insects)
that have the ability to perform complex co-operative tasks on the basis of
very little (or no) software. An ant colony is one example where very simple
rule-sets and a computer with - 200 (English garden ant) to 2000 (Patagonian
ant) neurons are capable of incredibly complex behaviour. In recent studies,
the autonomous network telepher (ANT) has been configured as a contender
for the future control of networks. Initial results from simulation studies
have shown considerable advantages over conventional software. For network
restoration, only 400 lines of ANT code replaced the > 106 lines presently
used in an operational network. Software on this scale 1000 lines) is within
THE FUTURE
the grasp of the designer's full understanding, and takes only a few days
to write and test by a one-man team.
1.7
NETWORK DISASTER SCALE
A number of network failures on a scale not previously experienced have

recently occurred. Quantifying their impact is now both of interest and
essential if future network design is to be correctly focused. The key difficulty
is the diversity of failure types, causes, mechanisms and customer impact.
A simple ranking of failure by severity uses the approach of Richter for
earthquakes. The total network capacity outage (loss of traffic) in customeraffected time is thus:
... (l.l)
where N = number of customer circuits affected,

T = total down time.
Exploiting the relationship with the Richter scale further, typical and
extreme events are as follows:
on the earthquake scale, 6.0 marks the boundary between minor and
major events - a magnitude 6 outage would represent, say, 100 000
people losing service for an average of 10 hours;
earthquakes in excess of 7.0 magnitude are definitely considered major

events - for a telecommunications network, outages above 7.0 are rare,
but the series of USA outages in the summer of 1991 were at such a level;
globally there appears to have been only one outage that exceeded level 8.
1.8
NETWORK MANAGEMENT
Monitoring systems and operations, extracting meaningful information, and

taking appropriate action to maintain a given grade of service is becoming
increasingly complex and expensive. Much of the data generated by networks
is redundant and the complexity of the management role increases in
proportion (at least) to the amount of data to be handled. Consider the
quantity of data generated when a network node failure generates a fault
report, and, in addition, other connected nodes also generate error reports.
PEOPLE AND RELIABILITY
For a fully interconnected network of N nodes, this results in one failure

report and error reports from the remaining nodes with which it
communicates, i.e. (N - 1) others. Allowing for two or more nodes failing
simultaneously, it can be shown that:
the mean number of reports per day = N 2/(MTBF in days)
'" (1.2)
For example, a network of 500 000 nodes with a mean time before failure
(MTBF) of 10 years will suffer an average of 137 node failures and will
generate an average of 68.5 million reports per day. Assuming each node
is communicating with all the others is, in general, unreasonable, and the
opposite extreme is the least connected case, which leads to:
the mean number of reports per day = [N 2/6]/(MTBF in days)
... (1.3)
Whilst there are network configurations and modes of operation that
realize a fault report rate proportional to N, the nature of telecommunications
networks to date tends to dictate an N 2 growth. A large national network
with thousands of nodes can generate information at rates of -1 Gbyte/day
under normal operating conditions. Clearly, maximizing the MTBF and
minimizing N have to be key design objectives. A generally hidden penalty
associated with the N 2 growth is the computer hardware and software, plus
transmission and monitoring hardware overhead. For very large networks
this is now growing to the point where it is starting to rival the revenue-earning
elements - a trend that cannot be justified or sustained.
1.9
PEOPLE AND RELIABILITY
The reliability of operational systems is commonly less than that predicted

at the design stage. Moreover, there are often significant differences between
the measured performance of individual units and large numbers co-located
in single installations. The sub-optimal behaviour may be masked by
maintenance activity, resulting in performance below specification. In most
cases this can be traced back to human activity - imperfect repair,
interference and incidental damage/weakening of individual or groups of
elements. A mathematical model assuming a finite probability of latent faults
being introduced by human intervention, i.e. repair/maintenance action
creating latent faults in the serviced unit or those nearby, reveals an overall
network performance reduction of 50010 to be a reasonable expectation. This
THE FUTURE
level of weakening is also supported by practical experience across a broad

range of equipment, system and network types and is applicable to all forms
of line plant, radio, electronic and photonic transmission, and switching and
computing hardware.
1.10
QUANTUM EFFECTS AND NODE REDUCTION
All of our experience of systems and networks to date, coupled with the
general development of photonics and electronics, points towards networks
of fewer and fewer nodes, vastly reduced hardware content, with potentially
limitless bandwidth through transparency. With networks of thousands of
nodes, failures tend to be localized and isolated - barring software-related
events! The impact of single or multiple failures is then effectively contained
by the 'law of large numbers' with individual customers experiencing a
reasonably uniform and flat grade of service. However, as the number of
nodes is reduced, the potential for catastrophic failures increases, with the
grade of service seen at the periphery becoming extremely variable. The point
at which such effects become apparent depends on the precise network type,
configuration, control and operation; but, as a general rule, networks with
< 50 nodes require design attention to avoid quantum effects occurring under
certain traffic and operational modes. A failure of a node or link today, for
a given network configuration and traffic pattern, may affect only a few
customers and go almost unnoticed. The same failure tomorrow could affect
large numbers of customers and be catastrophic purely due to a different
configuration and traffic pattern.
1.11
A GLOBAL MODEL
The modelling of telecommunications networks has traditionally seen isolated

activities concerned with traffic prediction/flow, topologies, switching,
transmission, performance, customer activities, etc. In each case the degree
of sophistication and refinement has reached a high level with reasonable
agreement between theory and practice. As we approach the 21st century,
the key challenge is now to produce global models that link all elements in
the process. Existing and isolated models are inadequate for the rapidly
changing technology and service base that is developing. It will be no use
looking at isolated performance parameters, bereft of the customer and
network activity. We need a global view to manage the increasingly important,
and complex, telecommunications infrastructure. Furthermore, the
perspective required will be that of the customer rather than the network
CONCLUSIONS
operator, for serving customer needs will become the essential credo as the
level of competition increases. Models giving an end-to-end network view
of the service - interfaces, protocols, signalling, connection, performance
and activity - are therefore the next big challenge.
1.12
CONCLUSIONS
The increasing speed at which consumer and office electronics can be

developed and deployed is promoting:
a rapid migration of increased computing power and artificial intelligence

towards the customer;
a growing diversity of service type and demand;
an increasingly mobile population;
a rising customer expectation in terms of service level/availability and

network performance.
Such a demand can only be satisfied by networks that can respond on

a similar time scale through:
increasing network transparency via photonic amplification and

switching;
reductions in the amount of network plant and number of nodes;
new forms of network software and control;
a migration to analogue or hybrid forms of signalling;
customers becoming responsible for service configuration and control;
the provision of bandwidth on demand;
new forms of tariff and billing;
a new regulatory regime.
Most of the technology to realize all the above is either to hand or at

least in the research laboratory. The key outstanding problems to be solved
include the realization of a suitable switching fabric with new forms of
software/control and management. However, before we can start the build
process we need the vital ingredient of suitable and adequate global models
- these we do not have!
10
THE FUTURE
BIBLIOGRAPHY
Cochrane P: 'Future trends in telecoms transmission', Proc IEEE F 13717, pp 669
(December 1984).
Cochrane P, Heatley 0 T J and Todd C J: lTV World Telecoms Conference, Geneva
91, p 105 (1991).
Cochrane P and Brain M C: IEEE Comsoc Mag 26/11, pp 45-60 (November 1988).
IEEE Special Issue: 'Fiber in the subscriber loop', LTS 3/4 (November 1992).
IEEE Special Issue: 'Realizing global communications', COMSOC Mag 30/10 (1992).
IEEE Special Issue: 'The 21st century subscriber loop', COMSOC Mag 29/3 (1991).
IEEE Special Issue: 'Global deployment of SOH complaint networks', COMSOC
Mag (August 1990).
IEEE Telecommunications Network Design & Planning, J-SAC 7/8 (October 1989).
World Communications - going global with a networked society, lTV Publication
(1991).
Hughes C J: 'Switching - state-of-the-art' , BT Technol J, 4 , No I, pp 5-19 and 4 ,
No 2, pp 5-17 (1986).
- Hawker I: 'Future trends in digital telecoms transmission networks', IEEE ECEJ,
2/6, pp 251-290 (December 1990).
Brown G N et al: 3rd lEE Conference on Telecommunications, Edinburgh, PP 319-323
(1991).
Olshanski R: 'Sixth channel FM video SCM optical communication system', IEEE
OFC'88, p 192 (1988).
Chidgey P J and Hill G R: 'Wavelength routeing for long haul networks', ICC-89,
p 23.3 (1989).
Healy P et al: 'SPIE digital optical computing II', 1215, PP 191-197 (1990).
MODELLING INTERACTIONS
BETWEEN NEW SERVICES
M H Lyons
2.1
INTRODUCTION
Telcos are faced with an increasingly volatile business environment arising

from the introduction of new technologies and services, the emergence of
competitors, and other changes in the business/economic/regulatory environment. In this complex situation, it can be difficult to see intuitively the full
impact of one particular change on a telco's overall operation. To help
manage this complexity, there is increasing interest in the development of
advanced economic models to investigate the impact on business of changes
in operating environments.
To support these models, there is a need to develop algorithms which
describe specific interactions within the model. These are often generic and
may be encountered in many different contexts. For example, competition
may take several forms:
competition between similar services offered by rival operators;
competition between alternative telecommunications services offering

similar facilities;
12
MODELLING SERVICE INTERACTIONS
competition between a telecommunications service and a rival (nontelecommunications) service offering similar facilities.
Competition between services will increase in the future. A large number

of new telecommunications services are planned for the future. These services
will have a major impact on future revenue streams through their own growth
and through the impact they may have on existing services. However,
predicting their usage presents considerable difficulty; detailed statistical
models are of little value due to the lack of any history. Insight into the
development of new services can be gained by the use of simple models which
reproduce the main features of a competitive market.
A number of workers have modelled competition between services and
products using equations originally developed for biological systems [1, 2] .
Careful curve fitting can give good estimates of future growth, but the
meanings of the various coefficients are not always clear. In addition, most
of these models are based on predator/prey relationships implying that
customers can change, e.g. from service A to service B, but not from B to
A. This is not necessarily the case when considering competition between
telecommunications services.
In this chapter, equations are developed with a view to understanding
the significance of the various parameters. It is assumed that a key parameter
affecting the growth of rival services is the degree to which they are
interconnected. The consequences of this assumption are examined in detail.
2.2
GROWTH MODEl
The size of the communications base which is accessed by a particular service

will strongly influence customer choice. A major factor is likely to be the
extent to which a customer using one service can communicate with customers
of competing services (i.e. the extent to which there is full interconnection
and equality of access between services). This can be described by an
interconnection parameter F j which is defined as follows:
Fj = the fraction of the total service class which is fully interconnected

with service i.
The model presented in this chapter assumes that other differences
between services can be represented by a single preference parameter Pj. This
GROWTH MODEL 13
parameter includes factors such as price and quality of service and is defined
formally as:
Pi = the probability that a customer will, all things being equal,
purchase service i. This definition implies EPi = 1.
I
A number of workers have developed equations describing the growth

of markets for products or services [3, 4] ; such equations are usually based
on the logistic equation. In the growth model considered here, competing
services are grouped into a single service class which is assumed to grow
exponentially at a rate R. However, this is for simplicity, and it is possible
to replace R with a more complex growth function.
After a unit time period, the distribution between services of two groups
must be considered:
new customers;
existing customers.
In the following sections, analytical expressions are derived separately

for each group.
2.2.1
New customers
If n\ and n2 are the number of existing customers to services 1 and 2 respectively, then N (the total number of existing customers to the service class)
is given by n\ + n2 and the number of new customers (oN) in unit time is

given by NR. The growth of services 1 and 2 is ont = TIN and
on2 = T2N respectively. Thus:
... (2.1)
It is assumed that growth of service i is proportional to the preference

parameter Pi and the interconnection parameter Fi Thus, Tt = Kpi F I and
T2 = KP2F2 where K is a constant. Then:
... (2.2)
and the growths of services 1 and 2 are given by:
14
[PtF/(P1F1 + P2 F2)]RN
... (2.3a)
+ P2 F 2)]RN
... (2.3b)
[P2F2/(P1F1
2.2.2
Existing customers
The number of existing customers (N) is constant during a unit time period.
However, some redistribution of existing customers between the services may
occur (Fig. 2.1).
service 2
service 1
J12 = CP2 F2n 1
n1
Fig. 2.1
J21 = CP1 F1n2
n2
Redistribution of existing customers between services.
It is assumed that, at any time, some customers will be transferring from

service I to service 2 and vice versa. The transfer rate to, for example, service
2 (J12) will be proportional to P2' F 2 and the number of customers in service
I (nt). A constant C describes the average willingness for customers to
transfer from one service to another. Thus the number of customers
transferring from service I to service 2 in unit time (J12) is given by:
... (2.4)
and the number transferring from service 2 to service I (J21 ) is:

... (2.5)
The redistribution of existing customers between services 1 and 2 is given

by the difference of these two rates:
onl
J21 - J 12 = C(P 1F 1n2 - P2F2nt)

... (2.6)
- on2
2.2.3
Overall growth
The overall equations describing net growth of services 1 and 2 in unit time
are obtained by summing the new customers arising from growth and transfer:
RESULTS AND DISCUSSION

.:lnl = [PIF/(PIF I +P2F2)]RN+C(PIFln2-PIF2nl)
.:ln2 = [P2F2/(PIFI
2.3
+ P2F2)] RN + C(P2F2nl -
PIFln2)
15
... (2.7a)
... (2.7b)
In this section, the model is used to establish the behaviour of competitive

systems with varying degrees of interconnection or access between rival
services. Three specific cases are considered:
equal access and full interconnection between services;
no access between services;
an intermediate situation in which there is unequal access between

competing services.
2.3.1
Case 1: equal access and full interconnection
When the two services are fully interconnected, a customer can communicate
with the whole of the service class, i.e. F I = F 2 = 1. This situation applies,
for example, to competition between rival PSTN operators. Substituting for
F I and F 2 in equation (2.7), and using the fact that (PI + P2) = 1, the
following expressions are obtained:
.:lnl
= PIRN + C(Pln2 -
P2nl) =PIRN + C(PIN - nl)
.:ln2 = P2RN + C(P2nl - Pln2) = P2RN + C(P2RN - n2)
... (2.8a)
... (2.8b)
Inspection of equations (2.8a) and (2.8b) show:
growth of service i is proportional to Pi;
the transfer rate tends to zero as
ni
tends to piN.
Over a period of time, an equilibrium is reached in which the market share

of each service is proportional to the value of its preference parameter (n/N
= Pi)' Rand C have no effect on the equilibrium market shares although
increasing the value of either parameter will reduce the time taken to reach
equilibrium. This behaviour is illustrated in Fig. 2.2 for R = 10070 per annum,
C = 0.1, PI = 0.6 and P2 = 0.4. Initially service 1 is - 1% of the size of
16 MODELLING SERVICE INTERACTIONS
'0 3000
0
z 2000
1000
0
Fig. 2.2
year
Growth of service I and service 2: full interconnection (R = IOOJo, C= 0.1, PI = 0.6).
service 2. The equilibrium market shares of services 1 and 2 are 60010 and
40% respectively, reflecting the values of PI and P2'
This model has been used to estimate the future growth of mobile
communications services in the UK by assuming that mobile could be
considered to be in competition with connections to the PSTN. The results
are shown in Fig. 2.3.
100000
-0 -o~<> -
PSTN
- - - - - - - -cellular
<Ii
~
iii
100
::l
()
10
lL-_...L..._......L_---lL...-_.J--_........_
1987
Fig. 2.3
1989
1991
1993
1995
1997
........_
1999
Growth of cellular communications networks, compared with PSTN.
17
In fitting the model, it was assumed that C = 0, since, in general, there

is little transfer between mobile and fixed communications services - in most
cases the customer to a cellular system will already have a fixed PSTN line
which is kept on joining the mobile service. Good agreement with existing
data [5,6] was obtained assuming an overall growth rate (PSTN connections
+ cellular) of 5070 and Pcellular = - 0.2. The model predicts the number of
UK cellular customers will rise to over 4 million by the year 2000.
2.3.2
Case 2: no access between services
A second extreme case arises when there is no interconnection or access

between the competing services. This can occur if a similar service is provided
on separate and non-interconnected networks - for example telex and
facsimile. Since customers to one service can only communicate with other
customers on the same service, it follows that the interconnection parameters
are F I = n/ Nand F z = nz/N. Substituting these expressions in equations
(2.7a) and (2. 7b) gives:
... (2.9a)
... (2.9b)
This reveals a very different behaviour from case 1; instead of reaching
an equilibrium, one service will grow at the expense of its rival until it
dominates the market. If PI >0.5, then service 1 dominates; if PI < 0.5, then
service 2 dominates. This is shown in Fig. 2.4 (for R = 10070, C= 1 and
PI = 0.7). As with case 1, increasing R or C reduces the time taken for the
preferred service to dominate the market.
The facsimile/telex market provides an ideal example of noninterconnected but competing services. There has been a world-wide decline
in the number of telex connections following the introduction of Group 3
facsimile in the early 1980s. In the UK, the growth of the combined market
for facsimile and telex has averaged 25070 since 1980. Unlike the US and
Japan, the UK telex market has continued to grow throughout much of the
1980s, although at a slow rate. The data [7] can be fitted to equations (2.9a)
and (2.9b) assuming R=250J0, C=0.25 andptelex=0.2 (Fig. 2.5). It can be
seen that from 1988 onwards telex is in slow decline, as in other major
markets.
18
12000
'"Qj
~:;,
o
'0
o
10
15
20
25
year
Fig.2.4
Growth of service 1 and service 2: no access between services (R= 10070, C= I,

PI =0.7).
Fig. 2.5
2.3.3
Development of UK facsimile and telex services.
Case 3: unequal access between competing services
As new telecommunications services are introduced, it will be increasingly

common for an intermediate case to occur in which access is in some way
unequal. An example of this is videotelephony where customers with normal
telephones can contact anyone on the network, but the benefits of
videotelephony can be enjoyed only when both caller and called have
videophones.
To illustrate this, consider the situation where customers to service 1 can
communicate with all customers in the service class (PI = 1) while customers
to service 2 can only communicate between themselves (F2 = n2/N). As
19
before, substituting in equations (2.7a) and (2.7b) gives the overall growth
for services 1 and 2:
~nl
=P1RN/(PI +P2n 2/N)+C(p\n2-P2n 2n t!N)
~n2 = P2 n 2R/(PI + P2n2/N) + C(P2 n 2n t!N
- p\n2)
... (2.10a)
... (2.IOb)
If service 2 (which can only communicate with itself) is the preferred

service, then the market will reach an equilibrium, as in case I described
above. However, in this case the equilibrium market share is given by nt!N
= PI/P2' When service 1 (which can communicate with the whole service
class) is the preferred service (i.e. PI > 0.5), then pt!P2 > I and service 1
will eventually dominate the market.
This model allows some comments to be made on the introduction of
videophones. If the domestic videophone market is considered, then
videotelephony can be regarded as being in competition with ordinary PSTN,
but access is unequal in that the advantages of a videophone require the
recipient of a call also to have a videophone. The model predicts that
videophone will displace POTS, providing that pricing and other considerations mean that it will be preferred by more than half the market. On
the other hand, if the preference falls below 50070 (either because of the high
purchase costs of videophones, poor quality, or because it requires subscribing
to a new system such as ISDN) then videotelephony will fail to gain a
significant share of the market. This is shown in Fig. 2.6 for preference
p=0.8
100.00
10.00
~
Ql
iii 1.00
or;
IJl
Q)
p=0.5
.><
iii
year
Fig. 2.6
Growth of installed videophones showing effect of preference parameter. Pvideo'
20
0.8 and Pvideo = 0.5. Both curves assume a growth rate for the
overall service class (videotelephony + POTS) of 5070, an initial market
penetration by videophones of 0.1 % and a value of C = I. It can be seen
that if P = 0.8, then videophones would reach an equilibrium 75% of the
market by 2010, whereas for P = 0.5, the market share remains static at a
mere 0.1 %. Smaller values of P would lead to a decline in market share.
Pvideo =
2.4
CONCLUSIONS
Understanding the factors which determine customer choice is necessary in

order to predict the growth of telecommunications traffic in the 21st century.
This chapter has concentrated on one aspect of competition between services
- the degree to which customers of one service can access customers to a
rival service. A market model has been developed, illustrating the effect that
differing degrees of access or interconnection can have on the development
of the market. The model was applied to a number of services including
cellular telephony, facsimile, telex, and videophones.
The model described here does not displace more detailed statistical
analyses, but provides a framework for discussing long-term trends, especially
when historical data is not available. In particular, it provides guidance on
whether a competitive market will reach an equilibrium (in which the market
is shared between several services) or become dominated by just one service.
The algorithms developed here have been incorporated into a number of more
complex models which involve competitive markets. Applications include the
growth of broadband services and the market for global visual services.
REFERENCES
I.
Armolavicius R J, Colagrosso P and Ross N E: 'Technology replacement models

based on population dynamics', Int Teletraff Congr, 11, p 255, Torino (1988).
2.
Veda T: 'Demand forecasting models for markets with competition', Int Teletraff
Congr, .!l, p 261 (1990).
3.
Bass F M: 'New product growth for consumer durables', Management Science,

12 ' p 215 (1965).
4.
Mahajan V and Muller E: 'Innovative diffusion and new product growth models
in marketing', J of Marketing, 43, p 55 (1979).
REFERENCES
21
5.
'UK Telecommunications Market', MDIS Ltd, 8 Eastgate Square, Chichester,

UK (1991).
6.
BT Today, p 19 (February 1994).
7.
'Yearbook of common carrier telecommunication statistics' (20th Edition), ITU

Geneva (1993).
3
FRACTAL
POPULATIONS
S Appleby
3.1
INTRODUCTION
This chapter presents a review of fractal and related techniques which may
be useful for the planning or analysis of large networks to serve the human
population. The work divides naturally into two areas:
firstly, the use of fractals for modelling and characterizing the spatial
distribution of human population;
secondly, fractal and similar methods for analysing large networks.
Finally, these two areas are combined to show how fractal structure in
the population affects the design of a distribution network.
The motivation for this review is to make the techniques described here
more widely known amongst the telecommunications engineering community
and to show how these techniques can be used.
The main reason for a telecommunications operator to be interested in
these techniques is because a graph-theoretical approach is not tractable for
large networks; there are too many possible network configurations. If an
underlying structure could be found in the population distribution, then it
might allow a number of problems to be solved without designing the network
FRACTAL GEOMETRY
23
in detail. This is particularly useful when the network uses mass-produced

components which are customized locally. For example, suppose there is a
need to choose between two alternative technologies for a particular kind
of network component, one of which has a high fixed cost but a low cost
per unit capacity and the other has a low fixed cost but a high cost per unit
capacity, then the more economical option to adopt would depend on the
spatial distribution of the population. Another application for this work
would be to carry out a sensitivity analysis to see whether small variations
in the population distribution would cause the cost or design of a network
to change dramatically.
The work reviewed in this chapter shows that there is a very clear structure
in the spatial distribution of the population and that the simplification
afforded by this structure enables some useful conclusions to be drawn
regarding the networks used to serve the population.
The chapter begins with a brief introduction to fractal geometry followed
by a review of work which uses fractal methods to characterize and model
the spatial population distribution. Then a review is presented of the work
which uses fractal techniques to analyse large graphs. Finally, a useful
connection between the fractal structure of the population and the cost of
a distribution network to serve that population is described.
3.2
FRACTAL GEOMETRY
Fractal geometry [I] has been developed as a tool to characterize sets of

points whose distribution in space is scale-invariant in either a deterministic
or statistical sense.
A fractal has no formal definition and so the concepts underlying fractal
geometry are generally introduced through a series of examples. One example
of a natural, statistical fractal is that of a coastline. Mandelbrot [I]
introduces the concept of a fractal through this example.
Consider a map which presents a reasonably accurate representation of
a coastline. Take a pair of dividers with a fixed gap between the points and
use them to estimate the length of the coastline. If the gap between the points
of the dividers is reduced and the measurement repeated it is observed that
the estimate of the length of the coastline increases. If this process is continued
one finds that the length of the coastline is approximately a power law
function of the divider gap. The power law relationship implies that as the
divider gap is reduced to zero in an attempt to get ever more accurate
estimates, then the length of the coast would tend to infinity.
It can be seen from this simple experiment that length is not a useful
measure for the size of coasts. If the length of the coast does follow a power
24
FRACTAL POPULATIONS
law then the exponent of the power law is a characteristic of the coast.
Mandelbrot proposed interpreting the value of the exponent of the power
law as an indicator of the dimension of the coast.
This 'divider dimension' is only one of many dimensions that may be
used to characterize a fractal. In general the process of measuring the
dimension of a shape proceeds as follows. One forms an approximation of
the shape such that all the detail below some length is obscured (in this chapter
this length will be called the resolution). The coastline example above used
the dividers set at a particular spacing to form an approximation of the
coastline which obscured all detail below the divider spacing. The coastline
can be approximated by joining the points where the dividers cross the coast.
The next step is to establish how much information is required to specify
the location of a point in the shape to within the resolution. In the case of
the coastline, the amount of information required to specify the pair of line
segment ends that straddle a point on the coastline is used. This is the
logarithm of the number of segments. The amount of information is then
plotted against the logarithm of the resolution. If the resulting graph is a
straight line then, for the purposes of this chapter, the shape is a fractal.
The dimension of the shape is the negative of the gradient of the line.
When measuring the dimension of a distribution such as the population
distribution it is more suitable to partition the plane into squares of a given
size and count the number of people living in each square in order to form
the approximation of the actual distribution at a particular resolution. In
this case the size of the squares is the resolution. The next concern is the
amount of information required to determine in which square a particular
member of the population (selected at random) lives. There are many different
information measures that could be used but these can all be shown to be
special cases of the generalized information given by:
I
_1_ log E pC(

l-q
i I
... (3.1)
where q is a real number and Pi is the probability that a member of the

population selected at random lives in the ith square.
Integer values of the parameter q select the more conventional information
measures whilst less conventional measures are given by intermediate values
of q. For example, as q-l, I q becomes the more familiar Shannon
information. The derivation of this information measure is described in Renyi
[2]. To calculate the generalized dimensions, I q is plotted against the
logarithm of the size of the squares. A line for any value of q can be plotted.
The gradient of the line as a function of q is called the generalized q dimension
and is denoted by D q
FRACTAL GEOGRAPHY
25
D q contains much more information about a distribution than anyone
dimension. It is analogous to specifying the whole set of moments of a

probability distribution rather than specifying just the mean.
3.3
FRACTAL GEOGRAPHY
A number of workers have observed connections between fractal geometry

and human geography. Arlinghaus [3] has commented on the connection
between the recursive hexagonal forms produced by central-place theory [4]
and edge-replacement fractals, such as the Koch curve. Batty, Longley and
co-workers have used aggregation processes that produce fractals as models
for the morphology of cities as they grow [5, 6] . De Cola [7] has calculated
the fractal dimensions of land put to different uses from Landsat images.
Goodchild and Mark [8] have critically reviewed the application of fractal
geometry to geographic phenomena.
There has also been a considerable interest in using fractal descriptors
to characterize geographic phenomena. The interest starts with Mandelbrot's
own question: 'How long is the coast of Britain?' [9]. Fractional Brownian
motion has been used extensively to model landscapes [1, 10].
One of the characteristics of many fractal-producing algorithms is that
very simple algorithms can produce subjectively very complex patterns. This
was seen in a quite spectacular way in the case of dynamic systems that have
chaotic attractors. Fractal geometry is useful in showing that apparently
complex processes can often be produced by simple underlying processes.
This phenomenological approach is consistent with the philosophy behind
earlier work on human geography which attempted to adopt simple rules from
physics to model population distribution (for examples of the earlier work,
see the review by Stewart and Warntz [11]).
Mandelbrot [1] suggested that scale-invariance often occurs as a result
of a relationship between length, area and volume. For example, consider
a machine which generates heat that needs to be dissipated. The rate of heat
generation is proportional to the volume of the machine and yet the ability
to dissipate heat is a function of the surface area of the machine and the
temperature difference between the machine and the cooling medium.
Another example would be a circuit board or VLSI integrated circuit. The
number of components is proportional to the area of the board and yet links
to and from the circuit board pass through the board's perimeter and are
thus restricted by the length of the perimeter.
One may propose a similar relationship between a perimeter and area for
towns. The number of people living in a town is dependent on the area of
26
FRACTAL POPULATIONS
the town and yet communication with the people in the town takes place
through the town's perimeter. This may explain the dendritic town
morphologies noted by Fotheringham, Batty and Longley [12].
In a series of papers Longley, Batty and co-workers [6, 12-15] investigated
the use of fractals for modelling urban morphology. They tried a number
of fractal-generating algorithms with the primary interest of discovering
whether simple algorithms could explain the complex shapes exhibited by
urban population distributions. Two algorithms of particular note are
diffusion limited aggregation (DLA) and the dielectric breakdown model.
A DLA cluster begins with a seed particle. A second particle is allowed
to randomly walk on a lattice until it collides with the seed particle (collision
meaning that it occupies a neighbouring lattice site) or until it wanders beyond
some limit whereupon it is discarded. Another particle is then released and
the process continues; particles either stick to the growing cluster or wander
beyond the given limit. Figure 3.1 shows a DLA cluster.
Clusters constructed in this way have no characteristic length. It is not
at all obvious why this should be the case since the lattice upon which the
cluster is built clearly has a characteristic length. The scale invariance seems
to be due to the way that the growing arms of the cluster screen the inner
perimeter sites from the diffusing particles.
DLA was originally proposed as a model for a number of natural processes
[16] . There have been many papers published which study different aspects
of DLA and similar growth processes. The sources that are of most relevance
to the current work are those that discuss the occupancy probability
distribution [17-24] and the various simple algorithms that produce complex
dendritic structures [25-27].
Measurements of the fractal dimension of the DLA clusters reveal that
the Do dimension is approximately 1.7. In comparison Batty, Longley and
Fotheringham [6] found the dimension of Taunton in Somerset to be between
1.6 and 1.7.
The DLA process in its original form has no parameters to adjust and
so cannot be fitted to actual data on the shape of towns. For example, a
method of adjusting the fractal dimension to fit that actually measured would
be beneficial. The dielectric breakdown model (DBM) has such a parameter
that can be adjusted.
DBM is closely related to DLA since both processes are governed by
Laplace's equation:
... (3.2)
In both DLA and DBM this is solved in two dimensions with the
appropriate boundary conditions which assume that the growing cluster is
FRACTAL GEOGRAPHY
Fig. 3.1
27
Diffusion-limited aggregation cluster with 4000 particles.
a conducting object at unit potential and that the cluster is surrounded by

a ring of infinite diameter at zero potential.
To grow an ordinary DLA cluster the probability that a site becomes
occupied would be made proportional to the potential at that site. Each time
a particle is added, the boundary conditions change, so the potentials need
to be recalculated. In the DBM model, the relationship between the potential
at a site and the occupation probability of that site is altered to produce a
range of clusters with different fractal dimensions.
The relationship:
p(x, y) ex (x, y)li
... (3.3)
has been demonstrated by Niemayer, Pietronero and Wiesmann [28]. The

parameter p. can be used to generate clusters of any dimension between 1
and 2. As p. tends to infinity the dimension tends to 1 and as p. tends to zero
the dimension tends to 2.
Actually implementing the DBM version of the diffusion equation requires
much more computational expense than the direct DLA version. At each stage
of the cluster's growth one needs to solve the diffusion equation to enable
28
FRACTAL POPULATIONS
the site occupation probabilities to be calculated. One then uses a random

number generator to select the actual site that is to be occupied.
DBM was used by Batty (5) to model urban growth. Other fractalgenerating algorithms have been used to model land use. For example, Batty
and Longley [29] use a recursive subdivision method for generating
hierarchical regions and then apply a random algorithm to decide what use
is assigned to each region based on the multinomial logic model [30]. The
aim of this work was to produce realistic-looking distributions of land use
which previous models had failed to do. The resulting algorithm is similar
to the multiplicative multinomial process (MMP) for generating fractal
measure distributions that is described in Stanley and Meakin [27].
3.4
FRACTAL DIMENSIONS OF THE POPULATION
The q dimensions have been measured for cities in the United States and
Great Britain [31].
Figures 3.2 and 3.3 show generalized information as a function of
resolution as q-l for the United States and for Great Britain. Natural
logarithms have been used to calculate information.
8
7
C 4
.Q
1U
E 3
~ 2
.......
O~-..,.~
~~
__
_---
.......;~--,..~
100
Fig. 3.2
Information min,max versus resolution (q = 1.0) (United States).
The information which is measured is sensitive to the choice of origin

for the squares used to partition the population, i.e. one may shift the whole
grid of squares relative to the population co-ordinate system to obtain
different information values. For the United States data a number of
information values were measured by displacing the origin of the squares
relative to the population co-ordinate system in 100 km steps. For Great
FRACTAL DIMENSIONS OF THE POPULATION
29
6
)(
ell
E 5
c
4
C-
"e
o
"~
:s<5
OL.10
~-----.....;;:lI--...J
resolution, km
Fig. 3.3
Information min, max versus resolution (q = 1.0) (Great Britain).
Britain the displacement was in 10 km steps. In each graph the maximum

information obtained is plotted as a dashed line and the minimum information
plotted as a continuous line. It makes most sense if the minimum information
is taken as the appropriate value since this would cause the information to
tend to zero as the resolution tends to infinity.
Both of the graphs have a remarkably well-defined linear region. For the
United States data this ranges from cell sizes of 50 km to around 2000 km.
For Great Britain the linear region extends from 20 km to around 500 km.
The graphs for England, Scotland and Wales individually are very similar
but with smaller linear regions corresponding to the smaller dimensions of
the countries.
Smaller values of q give the best results with the quality and extent of
the linear regions deteriorating significantly by the time q=3.0. Figure 3.4
shows information as a function of resolution when q = 3.0 for the United
States. There are two main sources of error in the data. Firstly, not all of
the population is included in the city population data. For each data set cities
with populations below a given threshold are neglected. Secondly, each city
is effectively replaced by a single point even though its population is
distributed over a finite area.
The finite population cut-off should have the most significant effect on
the information graphs for small values of q and small resolutions. This is
because, for small values of q, large numbers of squares with a smaller
population become relatively more important in the information calculation
than small numbers of squares with a larger population.
30
FRACTAL POPULATIONS
Figure 3.5 shows information plotted against resolution for the United
States as q-l when all cities with a population below 50 000 inhabitants
have been neglected. The graph shows that the quality and extent of the linear
region have been reduced. It is worth noting, though, that the gradient of
the linear region has only decreased slightly compared with the graph in
Fig. 3.2.
8
x
ell
E 6
c::
E5
c::
.Q
co
E3
.2
-- -
. 2
OL..-_....I100
.......
----:3o___---J
resolution, km
Fig. 3.4
Information min, max versus resolution (q = 3.0) (United States).

8
7
population cutoff
=50,000
.....
OL..-_......
100
_----
~----"""--~
resolution, km
Fig. 3.5
Information min,max versus resolution (q = 1.0) (United States).
The effect of replacing cities with points will be most pronounced for
larger q values and smaller resolutions. For example, Fig. 3.4 shows a
considerable variation in information for small resolutions.
LARGE GRAPHS
31
The values of D q for the United States and Great Britain are presented in
Tables 3.1 and 3.2.
Table 3.1
D q for the United States.
Dq
0.0
0.5
1.0
1.5
2.0
2.5
3.0
1.580.05
1.52 0.03
1.46 0.05
1.360.05
1.260.05
1.17 O.I
I.I!O.!
Table 3.2
D q for Great Britain.
Dq
0.0
0.5
1.0
1.5
2.0
2.5
3.0
1.55 0.05
1.53 0.05
1.49 0.05
1.45 0.05
1.290.!
1.160.1
1.1O0.!
3.5
LARGE GRAPHS
What constitutes a large graph will depend on the problem at hand. In general,
though, a graph is large when no deterministic technique is able to provide
a solution and one needs to resort to stochastic or approximate methods.
If it is true that the morphology of towns is determined to a large extent
by a volume/surface relationship, then there is likely to be a clear structure
to the communication that takes place between people as a function of their
locations since the freedom of communication is, in part, measured by the
surface of the town. There is some justification, therefore, in studying the
relationship between fractally distributed populations and the networks that
are required to interconnect that population. This has led to some work on
the concept of 'fractal graphs'.
3.5.1
Fractal graphs
The name 'fractal graph' was used by Bedrosian and co-workers [32-34]
to describe networks used to interconnect large populations that are
distributed in a fractal manner. The population distributions studied initially
were simple power-law distributions about a single population centre (as has
been suggested is the case for a town). The population was interconnected
32
FRACTAL POPULATIONS
by a minimum spanning tree where the cost of each link in the network was
simply equal to its length. Various statistics were then calculated for the
resulting network. In later work the populations were extended to multi-foci
populations based on DLA.
The single focus population distributions produced in this work may be
justified by early work on population distributions which used a power law
model of population density centred on a single point. The work of Batty
and Longley [13] provides some justification of the multi-foci population
distributions based on DLA. The networks produced are not at all realistic
though. The reason is that the total length of the links is not the only factor
determining the cost of the network. There are many other factors which
need to be included such as the costs of the nodes and the fact that the cost
of any network component is dependent on its capacity.
3.5.2
Percolation and the renormalization group
Fractals and graphs coincide in a quite different way as the result of a

percolation process.
Percolation may be classified into two types [35] - site percolation and
borid percolation. Site percolation is described first.
In a lattice of points where each site is occupied with probability P or
unoccupied with probability (l - p), the collections of occupied sites that are
connected through occupied neighbours are called clusters. In percolation
theory it is the statistics of such clusters that are of interest. In order to be
able to disregard finite lattice size effects it is generally assumed that the lattice
is infinitely large.
The key result of percolation theory is that for P below a certain critical
value, Pc, the probability that a single cluster will span the lattice is
vanishingly small. For P > Pc the probability that such a cluster exists is
unity. A cluster which spans an infinite lattice is called the incipient cluster.
The connection with fractals arises when P is close to Pc. One of the
statistics that is used to measure the percolation effect is the correlation length.
When P Pc the correlation length is very small. As p approaches Pc the
correlation tends to infinity. For cluster sizes below the correlation length,
the clusters appear to be scale invariant in the sense that the distributions
of the various statistics of the clusters (such as perimeter length) follow a
power law. These power-law distributions allow the definition of an
interrelated set of so-called critical exponents which govern the distribution
of cluster sizes as p approaches Pc.
Bond percolation is very similar to site percolation but here the bonds
which link sites are occupied with a certain probability. A cluster in this case
LARGE GRAPHS
33
consists of all of those occupied links which are interconnected. Again there
exists a critical probability and a series of interrelated exponents.
The connection between percolation theory and the study of networks
arises when one wishes to calculate the conductivity of a substance in which
the bonds may assume different impedance values with different probabilities.
One technique which has been successfully used to analyse systems near
critical points is that of the renormalization group [36]. Renormalization
group techniques have been used to predict the critical probabilities of a
number of percolation systems and have been used to estimate their critical
exponents [35]. The renormalization group is an approximation technique
whereby one iteratively replaces the lattice at one scale with the lattice at
a coarser scale. Take, for example, site percolation on a square, twodimensional lattice with lattice parameter b and with a probability that any
site is occupied of p, and then approximate this lattice by a lattice with a
larger lattice parameter, say 2b (see Fig. 3.6); then each point on the coarser
lattice will represent four points on the finer lattice. Since, in this case, interest
lies in the clusters of connected occupied sites, the collection of four sites
is replaced with an occupied site only if the four points on the finer lattice
are connected. The probability that a group of four adjacent sites on the
finer grid is connected can be estimated and, as a result, the probability that
the group is replaced by an occupied site can also be estimated. If this
probability is smaller than p then at each renormalization the occupation
probability will decrease and therefore the correlation length relative to the
renormalized lattice parameter will also decrease (correlation length increases
monotonically with p). The critical probability will be that which is invariant
under renormalization.
To understand the conductivity of a bond-percolation process one needs
to study the 'backbone' of the incipient cluster since the links which pass
no current will not contribute to the conductivity even if they are part of
the incipient cluster. Orbach [37] and Aharony et al [38] have studied the
dynamics of percolation clusters at the critical point (the latter workers in
terms of eigen-dimensions of fractal transfer matrices disussed in section
3.5.3).
Renormalization techniques with two parameters have also been used to
estimate the dimension of diffusion-limited aggregation clusters [17].
Site percolation has a particular relevance to the design of networks that
consist of a number of nodes interconnected by links. In this case, site
percolation can be used as a simple model for the propagation of states from
one node to another across the network. For example, if there is a certain
probability that a node will become overloaded then percolation theory may
be useful in determining the statistics of the regions of overloaded nodes.
34
FRACTAL POPULATIONS
lattice
parameter = b
--~-II"--~-II"-
Fig. 3.6
Example of renormalization group techniques applied to a two-dimensional lattice.
3.5.3
Interscale transfer matrices
The idea of an interscale transfer matrix (abbreviated to TMF for transfer

matrix of the fractal) is closely related to the renormalization group techniques
outlined in section 3.5.2 [39]. A TMF relates the fractal on one scale to the
fractal on another. Each element in the matrix represents the relative
FRACTALS AND DISTRIBUTION NETWORKS
35
frequency of occurrence of a shape selected from a set. It happens that a

considerable amount can be said about the TMFs. For example, the largest
eigenvalue of a TMF gives the fractal dimension of the structure produced
by the TMF. Other eigenvalues are the dimensions of fractal subsets. The
eigenvalues are always positive and the eigenvectors are orthogonal.
It is rather more difficult to interpret the eigenvectors of the TMFs since
these can have negative components and it is not clear what interpretation
should be given to the implication that a shape occurs with a negative
frequency.
It has been suggested that TMFs can be used to generate interconnections
which are close to the maximum entropy distributions produced by the
simulated annealing method [40]. It is asserted that using TMFs to produce
interconnections could lead to a method of producing near-optimal
interconnection distributions.
TMFs have not been widely used, although they have been successfully
applied to an analysis of the conductivity of an incipient bond-percolation
cluster [38].
3.6

3.6.1
A simple example
The kind of network considered will be a simple hierarchical star network.

The users are represented by the termination nodes of the network which
will be connected to the first layer nodes in such a way that each termination
node is connected to precisely one first layer node. Each of the first layer
nodes will in turn be connected to precisely one second layer node and so on.
An example will be used to illustrate the connection between the fractal
dimension of a population and the cost of a network that interconnects
that population. For this a very simple fractal is chosen - the Sierpinski
triangle [1].
Suppose that the Sierpinski triangle represents a rather idealized fractal
population distribution. Now, consider a very simple network in which there
is a single node that serves the whole population and which is placed at the
point that minimizes the total link cost (shown in Fig. 3.7).
Now compare this with using three nodes to serve the population (see
Fig. 3.8). To simplify the example, only the first layer of the network is
considered. This is justified by the fact that in most practical networks the
total network cost will be dominated by the lower layers since these consist
of large numbers of low capacity links. To minimize the cost in this layer
it is clear from the self-similarity of the distribution that the three nodes should
36
FRACTAL POPULATIONS
Fig. 3.7
Fig. 3.8
Sierpmski population served by a single node.
Sierpinski population served by three nodes.
37
be placed at the same location with respect to the half-size triangles as the
single node was with respect to the full-size triangle. Each of the smaller
triangles has one-half of the linear dimension of the larger triangle and one
third of the population. The total length of cable used in each of the three
smaller triangles will then be one sixth of that used when there was just a
single node. There are three smaller triangles so the total length of cable is
one half of that in the single node case.
In this example, the capacities of the links connecting the users to the
first layer nodes does not change as the number of nodes is increased. With
the assumption that the cost of a cable of given capacity is proportional to
its length, it is clear that the cost of the cable is reduced by one half when
the number of nodes is increased by a factor of three. The cable cost as a
function of the number bf nodes is then a power law whose exponent is
log(2)/log(3). This is the reciprocal of the Hausdorff dimension of the
Sierpfnski triangle [1].
As far as the cost of the nodes is concerned, the symmetry of the
distribution (at least in this example) means that each node will serve the
same fraction of the population and therefore will be expected to have the
same cost. The total cost of the nodes in a network layer will be proportional
to the number of nodes in that layer.
Knowing both the total cost of nodes and the total cable cost as a function
of the number of nodes means that one can choose the number of nodes
that minimizes the cost of the first layer in the network. This process can
be repeated for subsequent layers by treating the first layer nodes as the
population to be served by successive layers.
3.6.2
A more realistic example
So far the generalized q dimensions have been reviewed, and the connection
between the fractal properties of a population distribution and the cost of
a network to interconnect that population has been indicated. In this section
the actual population distributions of the United States and Great Britain
obtained from census data will be used to see if the relations that were
suggested in the previous section hold in realistic cases.
38
FRACTAL POPULAnONS
The population data is available as populations of cities and so it will

be assumed that each city is a node in the lowest layer of the network. The
cities effectively will be considered as constituting a zeroth layer of nodes.
The aim is to minimize the total cost of the network. Unfortunately, this
problem is intractable in the general case and so it will be necessary to make
some approximations. To begin, it is assumed that it is reasonable to build
the network from the bottom upwards, i.e. the number and locations of the
lower layer nodes are chosen without consideration of the higher layers of
the network. The justification for this is that the lower layers of the network
will consist of larger numbers of lower capacity components which are more
expensive (per unit capacity) than higher capacity components. This
approximation allows consideration of one layer of the network at a time.
A further approximation is that, for a given number of nodes, the
locations and catchment areas of the nodes are chosen using an algorithm
that is not guaranteed to produce the minimum cost layer (since no precise
algorithm is known). The algorithm used is a substantially modified version
of the so-called k-means algorithm.
The k-means algorithm is used in pattern recognition to identify clusters
of similar patterns. In this application the algorithm is used to find clusters
of cities that minimize the total squared length of the links between the cities
and the nodes serving those clusters. Total squared distance is used instead
of the total length since it is computationally more efficient to calculate the
centre of mass of a given subset of the population than it is to calculate the
point that minimizes total link length. The error introduced through this
choice has been studied elsewhere [41] and for the purpose of this study
it is unlikely to be significant.
The basic k-means algorithm proceeds in two stages. In the first stage
locations are chosen for the nodes and each city is assigned to its nearest
node. In the second stage the nodes are moved to the centre of mass of the
population in the catchment area of the node. The first stage is then repeated
by re-assigning the cities to their nearest cluster centre. The algorithm
continues by carrying out these two stages alternately until no new assignments
are made.
There are a number of problems with using the k-means algorithm. It
is not guaranteed to reach the global minimum solution - the final solution
depends on the starting locations for the nodes. Also, the algorithm tends
to terminate prematurely, particularly when the cities are sparsely distributed.
This is because the nodes may be some distance from their optimal locations
when no new assignments take place.
To attempt to overcome these problems a modified version of the k-means
algorithm is used.
From the discussion in the previous section it would be expected that the
total link length in a layer of the network would vary as a power-law function
39
of the number of nodes, and the index of the power law to be the inverse
of the fractal dimension of the distribution. This function can be inverted
so that the number of nodes is a function of the total link length. In this
case, the logarithm of the number nodes in the first layer of the network
will be equal to I q when q = O. This can be extended further to calculate I q
for other values of q as a function of total link length. In this case the total
link length plays the role of the resolution in the calculation of the generalized
q dimensions.
Figures 3.9 and 3.11 show the total link length plotted against the number
of nodes for the United States and the Great Britain respectively. Figures
3.10 and 3.12 show the generalized information plotted against the total link
length. The linear regions in the graphs are not as well defined as those
10 12
E
.c
i5>
c:
..9!
<Il
:0
CIl
0
.Q
109
10
100
number of nodes
Fig. 3.9
Total cable length versus number of nodes (United States).

7
6
c:
.2
iii 4
E
(;
~ 3
o"'---
-':-::--
total cable length, km
Fig. 3.10
Generalized information versus total cable length (United States).
40
FRACTAL POPULAnONS
number of nodes
Fig. 3.11
Total cable length versus number of nodes (Great Britain).
.2
iii
.E.!;;
109
total cable length. km
Fig. 3.12
Generalized information versus total cable length (Great Britain).
obtained when the information is calculated as described above for the q

dimensions. However, the gradients of the lines are consistent with those
obtained above.
There are some indications that the poor quality of the lines in Figs. 3.9
and 3.11 is due to the inconsistent performance of the k-means algorithm.
For example, the value of information should monotonically decrease with
link length. This is not the case in these graphs, which implies that the node
locations being chosen may be significantly differ~nt from the optimal
locations.
41
The conclusion from these results is that the generalized q dimensions

provide a statistical description of the population distribution which can be
used to estimate the cost of a layer of a distribution network. In the next
section some of the assumptions made so far will be removed to show that
a more general relationship exists between the q dimensions of the population
and the cost of a distribution network to serve that population.
3.6.3
A general distribution network
The previous section used the k-means algorithm to show that the moments
of cable length do indeed scale in the same way as in the Sierpinksi triangle
example. The next logical step would be to ask whether the generalized
dimensions can be related directly to the cost of a layer of a distribution
network.
To cope with the most general case of fractal population distribution (i.e.
where the distribution has well-defined but arbitrary generalized dimensions),
some assumptions need to be made regarding the way that the costs of the
network components depend on their parameters.
For network nodes it must be assumed that the cost of a node is a function
of the population served by that node only. The costs model for the node
is then represented by a weighted sum of power-law functions whose
exponents take on the same values of q as the q dimensions, Dqo In practice
this means that q should be between 0 and 3 since the dimensions are not
well defined outside of this range. This allows the possibility, say, that the
node cost function could be the sum of a constant value and a value that
is proportional to the number of people served by the node.
For links in the network, again it must be assumed that the cost of a link
is a function of the population served by the link, and that the cost of a link
is proportional to the length of the link.
With these assumptions regarding the cost models of the network
components, it is possible to expand the costs of each type of network
component in terms of the generalized entropy, and (with suitable
renormalization) as a function of the q dimensions. This means that
characterization of a population distribution in terms of its q dimensions
is enough to estimate the cost of each layer of a distribution network.
One benefit of deriving an equation to estimate the network cost is that
it can be differentiated and optimized with respect to parameters in the
equation. For example, if it is assumed that the catchment areas of the first
layer nodes all have the same diameters, then the optimum diameter can be
found very easily.
This connection between the generalized q dimensions and the cost of
a distribution network is reported in Appleby [42].
42
FRACTAL POPULATIONS
3.7
CONCLUSIONS
This chapter reviews work on three main topics - the fractal structure of
the spatial distribution of the human population, the analysis of scaleinvariant graphs and the use of the fractal nature of the population to estimate
the cost of a distribution network to serve the population.
The fact that there is such a strong, easy to characterize structure in the
spatial distribution of the population should simplify the task of making
design decisions regarding networks on a national scale. Such a structure could
also be used for issuing planning guidelines which are designed to minimize
the capital cost of a national network.
The work on scale-invariant graphs has introduced the concept of critical
phenomena to networking. This means that for large networks, small changes
in a network parameter may have very dramatic consequences, possibly
resulting in end-to-end blocking.
Finally, it has been shown that the fractal dimensions of the population
distribution impinge directly on the cost and hence the design of a distribution
network.
REFERENCES
1.
Mandelbrot B B: 'The fractal geometry of nature', Freeman, New Yark (1982).
2.
Renyi A: 'Probability theory', North-Holland, Amsterdam (1970).
3.
Arlinghaus S L: 'Fractals take a central place', Geografiska Annaler, 67B , pp

83-88 (1985).
4.
Christaller W: 'Central places in southern Germany', New Jersey: Englewood

Cliffs (1966).
5.
Batty M: 'Cities as fractals: simulating growth and form', in 'Fractals and Chaos',
pp 43-49, Springer-Verlag (1991).
6.
Batty M, Longley P and Fotheringham S: 'Urban growth and form: scaling,

fractal geometry, and diffusion limited aggregation', Environment and Planning
A, ~, pp 1447-1472 (1989).
7.
De Cola L: 'Fractal analysis of a classified landsat scene', Photogrammatic

Engineering and Remote Sensing, 55, No 5, pp 601-610 (1989).
8.
Goodchild M F and Mark D M: 'The fractal nature of geographic phenomena',

Annals of the Association of American Geographers, 77, No 2, pp 265-278
(1987).
-
9.
Mandelbrot B B: 'How long is the coast of Britain?', Science, ISS, pp 636-638

(1967).
.
-
REFERENCES
43
10. Lam N S N: 'Description and measurement of landsat TM images using fractals',

Photogrammetric Engineering and Remote Sensing, 56, No 2, pp 187-195
(1990).
II. Stewart J G and Warntz W: 'Macrogeography and social science', The
Geographical Review, 48, pp 167-184 (1958).
12. Fotheringham A S, Batty M and Longley P A: 'Diffusion Limited Aggregation
and the fractal nature of urban growth', Papers of the Regional Science
Association, 67, pp 55-69 (1989).
13. Batty M and Longley P A: 'Urban shapes as fractals', Area,
pp 215-221 (1987).
19, No 3,
14. Batty M and Longley P A: 'Fractal-based description of urban form',

Environment and Planning B, 14, pp 123-134 (1987).
15. Longley P A, Batty M and Shepherd J: 'The size, shape and dimension of urban
settlements', Transactions of the Institute of British Geographers, 16, No 1,
pp 75-94 (1991).
16. Witten T A and Sander L M: 'Diffusion-limited aggregation', Physical Review
B, 27, No 9, pp 5686-5697 (1982).
17. Gould H, Family F and Stanley H E: 'Kinetics of formation of randomly
branched aggregates: A renormalisation group approach', Physical Review
Letters, 50, No 9, pp 686-689 (1983).
18. Meakin P, Stanley H E, Coniglio A and Witten T A: 'Surfaces, interfaces and
screening of fractal structures', Physical Review A, 32, No 4, pp 2364-2369
(1985).
19. Garik P, Richter R, Hautman J and Ramanlal P: 'Deterministic solutions of
fractal growth', Physical Review A, 32, No 5, pp 3156-3159 (1985).
20. Turkevich L A and Scher H: 'Occupancy-probability scaling in diffusion-limited
aggregation', Physical Review Letters, 55 No 9, pp 1026-1031 (1985).
21. Amitrano C, Coniglio A and di Liberto F: 'Growth probability distribution in
kinetic aggregation processes', Physical Review Letters, 57 , No 8, pp 1016-1019
(1986).
22. Halsey T C, Meakin P and Procaccia I: 'Scaling structure of the surface layer
of diffusion-limited aggregates', Physical Review Letters, 56, No 8, pp 854-857
(1986).
23. Meakin P, Coniglio A, Stanley H E and Witten T A: 'Scaling properties for
the surfaces of fractal and non-fractal objects: An infinite hierarchy of critical
exponents', Physical Review A, 34, No 4, pp 3325-3340 (1986).
24. Meakin P: 'Scaling properties for the growth probability measure and harmonic
measure of fractal structures', Physical Review A, 35, No 5, pp 2234-2245
(1987).
-
44
FRACTAL POPULATIONS
25. Meakin P: 'Diffusion-controlled cluster formation in 2-6 dimensional space',

Physical Review A, 27, No 3, pp 1495-1507 (1983).
26. Meakin P: 'Cluster-growth processes on a two-dimensional lattice', Physical
Review B, 28, No 12, pp 6718-6732 (1983).
27. Stanley H E and Meakin P: 'Multifractal phenomena in physics and chemistry',
Nature, 335, pp 405-409 (1988).
28. Niemayer L, Pietronero Land Wiesmann H J: 'Fractal dimension of dielectric
breakdown', Physics Review Letters, 52, pp 1033-1036 (1984).
29. Batty M and Longley P A: 'The fractal simulation of urban structure',
Environment and Planning A, .!.., No 9, pp 1137-1274 (1986).
30. Anas A: 'Residential location markets and urban transportation', Academic
Press, New York (1982).
31. Appleby S: 'The q dimensions and alpha spectrum of the human population
distribution', submitted to Geographical Analysis (1994).
32. Bedrosian S D and Jaggard D L: 'Fractal-graph approach to large networks',
Proceedings of the IEEE, 75, No 7, pp 966-968 (1987).
33. Jaggard D L, Bedrosian S D and Dayanim J F: 'Large fractal-graph networks',
Proceedings of the IEEE International Symposium on Circuits and Systems, pp
948-951 (1987).
34. Bedrosian S D, Jaggard D L and Sun X: 'Multi-foci fractal graph networks',
International Symposium on Circuits and Systems, pp 2674-2676 (1990).
35. Stauffer D and Aharony A: 'Introduction to percolation theory', Taylor and
Francis, London (1992).
36. Wilson K G: Renormalisation group, critical phenomena and the kondo problem',
Reviews of Modern Physics, 47, pp 773-840 (1975).
37. Orbach R: 'Dynamics of fractal networks', Science,
1ll, pp 814-819 (1986).
38. Aharony A, Gefen Y, Kapitulnik A and Murat M: 'Fractal eigendimensionalities

for percolation clusters', Physical Review B, ll, No 7 (1985).
39. Mandelbrot B B, Gefen Y, Aharony A and Peyriere J: 'Fractals, their transfer
matrices and their eigendimensional sequences', Journal of Physics A, 18, pp
335-356 (1985).
40. Christie P and Styer S B: 'A fractal description of computer interconnections',
Report of University of Delaware, Electrical Engineering Department (1991).
41. Vergin R C and Rogers J B: 'An algorithm and computational procedure for
locating economic facilities', Management Science, 27, No 6 (1967).
42. Appleby S: 'Estimating the cost of a telecommunications network using the fractal
structure of the human population distribution', submitted to lEE Proceedings
on Communications (1995).
INTERNAL MARKETS
I Adjali, J L Fernandez-Villacanas Martin and M A Gel!
4.1
INTRODUCTION
The communications industry is going through a period of considerable

change as market forces and a proliferation of information technologies and
information service providers enable a convergence between telecommunications, computing and media to take hold. Competition, the dominant
process affecting the emergence and evolution of the new communications
industry, is going to revolutionize the ways in which companies will be
structured and run and the ways in which communications technologies will
be harnessed.
An indication of the qualitatively new market conditions which are being
created can be appreciated by looking at the initial evolution of the
communications market just within the UK; a similar picture can also be
painted for the USA, for example.
Figure 4.1 shows the number of operators in the UK has increased since
liberalisation of the incumbent monopolist in the early 1980s. Growth in the
number of operators in the communications market, perhaps now the most
open market in the world, is practically exponential, although some of the
initial market entrants have dropped out; saturation of the market is likely
to occur before the end of the decade as trading conditions become
increasingly harsh [2]. Such opening of markets to competition is becoming
a characteristic feature of the global economy and will allow a fundamental
restructuring of the global communications infrastructure and its modes of
governance.
46
INTERNAL MARKETS
80
60
.9
~
Ql
a.
0
'0
40
Qj
.0
E
::J
c:
20
0 .......
....-..
year
Fig. 4.1
Growth of UK telecommunications operators, 1980-90 [I].
The advent of large numbers of communications operators means that

the number of networks will increase; most of the new operators are building
their own networks. The transition from the public utility paradigm (PUP)
to the competitive industry paradigm (CIP) will bring about many changes
in the ways in which networks are operated and services are provided. Some
of the key differences between these paradigms are indicated in Table 4.1.
Table 4.1
Key differences between public utility and competitive industry paradigms.
Public utility paradigm
Competitive industry paradigm
Monopoly
Monopoly service provision
Business oriented
Stable pricing regime
One network
Large numbers of competitors

Competitive service provision
Customer oriented
Dynamic competitive pricing
Networks of networks
Competitive interworking issues
Pressures to decentralize
'Central Office' philosophy
The emergence of the CIP will bring enormous changes not only to the
ways in which a communications company carries out its business to win
and keep its customers but also to the basic philosophy underlying network
engineering. Having to interconnect and interwork with large numbers of
competitive networks may undermine the traditional philosophy of the
Central Office which has until now dominated telecommunications. It is
unlikely that the 'Central Office' philosophy will be able to cope with the
operation of global networks and their myriad interoperations with networks
INTRODUCTION
47
of networks. The emergence of networks of networks, operating within the

decentralizing global economy, by its very nature will suggest an unavoidable
migration from centralization towards competitive decentralization in
communications systems. The trends towards decentralization are already
apparent in many other areas of human activity and social structure.
The changes occurring in the communications industry fall basically into
two sets of challenges:
rapid business reorientation towards a decentralized competitive market

focusing on customers (see Fig. 4.2);
the accompanying reorientation of communications technology from

centralized networks towards networks incorporating a significant degree
of decentralization.
regulated monopolist
PUP
homogeneous network
Fig. 4.2
competitive access
provider
CIP
heterogeneous network
The PUP-CIP transition in the communications market.
These challenges are strongly interrelated and, for many, require

considerable changes in business and technology mind-set, particularly for
those monopolistic operators emerging from the PUP. These challenges also
require new tools and techniques with which to address the qualitatively new
issues which will arise as a result of operating in an increasingly competitive
market.
A characteristic feature of the communications market is that it is moving
into conditions of increasing disequilibrium; the continuation of any given
trend through market churn and market discontinuities cannot be guaranteed.
Evolution, pluralism and discontinuities are key features of this new market.
The market is no longer in steady state and therefore many of the basic tools
of economics, which have evolved to deal with the simple cases of markets
in equilibrium [3], such as those exemplified under the PUP, have little
relevance. New economic tools are required to tackle the challenges posed
48
INTERNAL MARKETS
by the real-world market conditions [4]; attempting to perform business

analysis using outdated techniques, such as equilibrium analysis and marginal
analysis, will not provide competitive edge.
To address this issue, a new theory of market systems has been developed
which provides a way of investigating economic behaviours in disequilibrium
markets. An important feature of the theory is the capability of describing
explicitly the effects of market fluctuations, a characteristic of today's markets
which is becoming all too apparent but which lies beyond the scope of
traditional economic frameworks. The new theory provides a powerful
framework for attempting to understand and unravel market processes in
the competitive communications sector. The availability of such tools will
enable evolution scenarios, continuous or otherwise, to be devised and tested
so that appropriate market strategies can be engineered and competitive edge
implemented.
An important application area of the theory is in helping to construct
competitive (real time) pricing strategies for communications services; with
increasing competition it is expected that pricing inside the communications
system, particularly towards the end of the decade, may begin to exhibit some
of the turbulence, volatility and discontinuities normally seen in the financial
markets. Strategic pricing of services may become an increasingly important
tool in maintaining competitive edge in a market in which customers will
become increasingly cost conscious. First indications of the trend towards
competitive pricing have been seen with the emergence of discounts, special
deals and targeted subscriptions in the UK and UK communications markets;
broadband services may provide the testing ground for competitive pricing
of services in real time. It is not clear that regulators will wish or be able
to regulate the automated intelligent processes associated with the emergence
of competitive real time pricing.
In addition to providing new techniques for analysing business and
devising strategic manoeuvres and tactics, the concept of market-oriented
analysis is also relevant to technology. It has been suggested that the
emergence of competitive markets outside networks will lead to the diffusion
of market-based processes into the communications networks themselves [5].
Such processes, such as real time pricing, will prompt not only the emergence
of intelligent automated business systems to feed competitive evolution in
the market-place, but may also play an important role in bringing a shift
towards increasing network decentralization. It is well accepted that realworld command economies are brittle and inefficient compared with marketoriented economies; the analogy which is being made is that 'Central Office'
communications systems, some of which have already shown the capability
for continent-wide day-long outage [6], are likely to be more brittle and
inefficient compared with decentralized systems. It is becoming a mandatory
THE MODEL 49
requirement for communications systems to be robust, adaptable and flexible

while increasing in levels of complexity in order to capture new market
opportunities. Market-based (decentralized) approaches, augmenting
conventional approaches, may provide a way of meeting such requirements.
Indeed, elements of negotiation, familiar in economics, are already encompassed to some degree within, for example, client/server, open distributed processing and asynchronous transfer mode (ATM) systems.
Decentralizing markets call for decentralising communications. Increasing
decentralization, which itself is not without risk, especially in mixed
communications environments, may become necessary to limit the size and
scope of network outages.
In this chapter an outline is given of the theory which has been developed
and a selection of results are presented for a simple (2-resource) system
illustrating various market behaviours which can occur and their connection
with physical parameters in the model. Many of the emergent behaviours
are highly non-intuitive and indicate the strategic importance of developing
tools for analysing how to compete in a future communications market which
will be seething with real competition, uncertainties and discontinuities.
4.2
THE MODEL
A central feature of open systems is the nonlinear nature of their dynamics,

which gives rise to a rich repertoire of behavioural regimes ranging from stable
equilibrium to oscillations and chaotic states. One has to construct a theory
which integrates both the deterministic evolution equation, responsible for
the macroscopic behaviour of the system, and the stochastic part which deals
with fluctuations within the system. Fluctuations arise because of uncertainty
and asynchronicity in the system.
A theoretical framework has already been formulated for describing a
self-organizing, open computational system with resources, free agents and
pay-off mediated interactions [7] that overcomes the limitations inherent
in previous work. Examples of market (agent/resource) processes inside
communications systems are:
bidding processes at the front-end of switches and processing units;
toll gates into information superhighways, transmission lines, databases

and other information systems;
bidding, bartering and auctioning processes for links, common channel

signalling, channel capacity and frequency allocation;
voting for carriers.
50
INTERNAL MARKETS
This approach allows the effects of fluctuations to be investigated

systematically in the form of a large system-size expansion due to Van
Kampen [8, 9]. Figure 4.3 shows an outline of the model which is described
in detail in Adjali et al [7]. Once a master equation for a general
agent/resource system has been written down, the master equation obtained
by describing a Markovian jump process is considered and then Van
Kampen's system-size expansion is applied. The deterministic equation for
the behaviour of the system arises as the lowest-order (dominant) term in
the expansion and is seen to coincide with the mean-field equation of Kephart
et al [10]. The main contribution of the fluctuations comes in the form of
a linear Fokker-Planck equation (FPE). Up to this order the noise in the
system is linear and the solution of the master equation is given by a Gaussian
(normal) distribution. Nonlinear effects of fluctuations are calculated as small
perturbations to the linear noise approximation.
computational system
the master equation
ata P(n,t) = 0' 1:1:

I
J;JI!I
[njP;(n')P(n',t) - njP;(n)P(n,t)]
the large system-size expansion

f=tI>+f~
the deterministic evolution equation

dtl>
- = p(tI-tI>
dt
the linear Fokker-Planck approximation
O'-'~ <0
at
O'-'~<F>
at
= a;<O;
= 2a'W> + a
1
2
first nonlinear fluctuation corrections
a
at
'~
1
f
,O'--<>=a,<>+-a
a <F>
at
0'-'_
Fig. 4.3
fa~<O
+ (2a; +
~ az)<F>
<F> +-a,
f2 '- W>
a
+ fa;' +
a';'
<~'>
+ a2
Main steps in the proposed model for an agent-resource system,
THE MODEL
51
In this model, the function p represents the probability that an agent in

the system will find one resource more attractive than the rest. In general,
the exact form of p is not known and will depend on several features of the
problem at hand, such as incomplete, uncertain or delayed information about
the available resources, as well as other factors (e.g. previous history of
dealing, trust, special offers, advertising) influencing the choice that agents
may make.
In the next section the expressions that relate p to the pay-off and the
incomplete knowledge in a system with two resources will be derived
analytically; the results for this simplified case will be analysed and the effect
of changing pay-offs and uncertainty parameters studied thoroughly.
However, these price-mediated interactions need to be modelled for a general
system containing an arbitrary number of agents and resources. The theory
described in Adjali et al [7] leaves the reward mechanism for each resource
as a free input parameter. The examples given in the next section assume
that each resource has a pay-off described by a polynomial of a certain degree
(from linear to cubic dependence) on one variable, j, the fraction of agents
using, or market share of, a resource.
A pay-off model is proposed that is based on the biological stratification
from carbon-based molecules to proteins and enzymes (see Fig. 4.4). In
biology, four organic molecules (bases) code for 20 different aminoacids in
a DNA chain; these chemically active units are themselves responsible for
the coding of thousands of proteins and enzymes that intervene in chemical
reactions in cells and, in general, organisms. How well an organism does
at the macroscopic level can be interpreted and modified by going down the
biological structure and changing (mutating) bits of information that result
in different macro-structures. The driving force behind improvement is the
survival of the fittest that kills off organisms that do not mutate to adapt
better to the environmental conditions. Similar concepts are relevant in
today's communications market - companies and networks must adapt to
the requirements of the global market-place.
This model starts from a set of numbers (it could well be a set of integers,
natural or real numbers). The system bases are a collection of quality boxes,
shown in Fig. 4.5, filled with a number from one of the previous sets; in
this model up to eight different bases have been used, although this number
could be extended as needed. A combination of these bases characterizes a
system parameter.
In this model it is assumed that a pay-off's dependence can be described
by a polynomial of third order; this covers constant, linear, quadratic and
cubic boxes for each coefficient. Also included are a random component box,
with the purpose of introducing extra noise, a relaxation box (different
systems may react with different speeds to changes in pay-offs), a smell box
52
INTERNAL MARKETS
biological pay-off mOdel)
I
I
I
_____ 1
I
I
I
I
._---,----I
I
I
-----1----I
I
I
I
I
_____ I
I
organism
systems
~------------~
I
I
I
pay-offs
macroscopic layer
how well organisms perlorm
Fig. 4.4
Biological analogy for the pay-off model.
Fig. 4.5
Box structure for any system parameter.
(introducing the idea of pay-offs weighted by the distance between systems

or awareness of others' activities and other market intelligence) and a greed
box which quantifies behavioural aspects influenced by market dynamics.
The system parameters are the ingredients of the pay-offs. The next section
will, for simplicity, refer only to one, j, number of agents in a certain resource;
the model, however, is quite general and allows for pay-offs depending not
only on j but also on the price of a service, total volume of business, profit
made, costs, profit re-investment and initial capital. Once a set of independent
system parameters has been established by working out what (if any) are the
relationships within the initial set (e.g. the profit could depend on the costs,
THE MODEL
53
the reinvestment of the profit, etc), the total pay-off would be the composition
of those for each individual system parameter identified in the independent
set. It should be noted that, as in its biological counterpart, the system bases
are common to all system parameters in the same way as aminoacids are
always combinations of three of the same four bases.
Depending on the content of the relaxation box, when a system parameter
decides to change its appreciation value (this time can vary for the different
parameters), there is said to be a pay-off mutation. The content of a random
number of boxes changes by the amount indicated in the greed box. This
mutation mechanism turns into a competitive process as systems try to
improve their pay-offs. In an environment with a large number of systems,
these learn to adopt evolutionary strategies that defeat (or suppress) the
weakest with lowest pay-offs. Social behaviours, such as parasitism, flock
formation, antiparasitism seen in biological systems, will emerge in these new
telecommunications, computational and market systems as players learn how
to change their attributes to dominate the market.
If a system is devised in which resources provide several services with
different pay-offs to a pool of randomly distributed agents, these competition
mechanisms would help the resources to become more attractive to the agents.
The pay-off pool may be finite, Le. resources cannot increase their pay-offs
permanently if there are no resources that are lowering theirs; this represents
a dynamic conservation process. The size of this pool can be fixed initially
or can be left to expand or contract. In a computational model the limits
would be imposed by the initial size of allocated memory and the processor
speed as resources would compete for space and CPU time. The agentresource system is open-ended when resources evaluate their pay-offs
depending on the model constraints instead of being fixed beforehand. There
is therefore a perpetual force, the pay-off mutations, pushing the system out
of local minima in the search for improved solutions.
So far consideration has been given to the resources changing their payoffs, competing and therefore communicating with other resources. But the
agents can also communicate with one another. Inter-agent communication
is achieved through the introduction of an alternative set of agent boxes that
allow for individual characteristics for each agent. Some of these boxes are
templates (e.g. 4 boxes with binary Is or Os code for 16 different templates)
that can be compared with other agents' templates. This comparison process
enables agents to construct plans for market tactics and strategies.
When the number of resources is low, the diversity is not high enough
for the resources to learn from evolution; that is why learning must come
from looking at the system and analysing the competitive behaviour of a few
strategies that have been previously introduced. An example along these lines
will be discussed in section 4.3.2.
54 INTERNAL MARKETS
Both the linear Fokker-Planck approximation and the first nonlinear

fluctuation corrections given in Fig. 4.3 correspond to a system with two
resources. For illustration purposes, this simplied case (fl = I -12' every
agent is either in one resource or the other) will be considered; more complex
systems, with a higher number of resources, have also been considered and
their stochastic components derived analytically.
In the following section new results for a two-resource system, shown
in Fig. 4.6, in both stationary and time-dependent states are presented and
discussed. An analytical expression is also given for the uncertainty parameter
a that affects the perception of each resource from the agent's point of view.
Finally, section 4.4 summarises the main points and indicates directions for
further work.
Fig. 4.6
Schematic diagram of a decentralized agent/resource system made of computational
agents sharing two resources. The system's behaviour is determined by the agents' evaluation
of the pay-off corresponding to each resource.
4.3
The objectives for the two-resource system are twofold:
to investigate the approximation scheme for fluctuations based on the

large-system size expansion of Van Kampen - the one-step Markovian
formulation of the problem would allow, in particular, calculation of
the exact probability distribution for time-independent solutions (see
equation (12) in Adjali et al [7]);
to study the case where resources (as well as agents) compete within the
market environment - as this requires that the resources change and
adapt their pay-offs to the current market situation, time-dependent
solutions with changing pay-offs would, therefore, be sought.
4.3.1
55
Stationary solution
Numerical values for the input parameters (coming into the transitiOn
probability function p) will be taken from Kephart et al [10] to provide
comparison with some previously published (but restrictive) results. In a
simple case, p can be made a function of the fractional number of agents
fusing resource I, through the payoffs G 1 and G2 for using resources I and
2 respectively:
... (4.1)
Figure 4.7 shows the pay-offs G 1 and G2 as a function of f. They model
a simple competitive behaviour (opposing gradients) between agents so that
the pay-off for using each resource decreases with the number of agents
already using the same resource.
An agent will therefore choose to switch to the other resource if its payoff is larger. The system reaches a stability point when the two pay-offs are
equal so agents will prefer staying with the resource they are using. For G 1
and G2 given in equation (4.1), this optimal behaviour of the system occurs
for f = 0.75, i.e. 75% of all agents are using resource l. The decision region
can be made less sharply defined by introducing an uncertainty element in
the pay-off evaluation of agents. This can be achieved by introducing
15
~
Q.
3
2
0L...-_ _. L -_ _....L.._ _--'-_--:--L._ _-.J

0.0
0.8
1.0
fraction of agents using resource 1

The linear pay-off functions 01 = 7 -II and 02 = 7 - 312 associated with resource
I and resource 2 respectively. The system is in equilibrium when the two pay-offs are equal,
i.e. at the crossing point between the two lines.
Fig. 4.7
56
INTERNAL MARKETS
Gaussian noise with standard deviation a around the true value of the payoff. The resulting transition probability p is given by:
p =
2I
1 + erl
(G(-G
2;-
... (4.2)
)]
and shown in Fig. 4.8 for a value a=0.125. The two limiting cases of a=O
and a = 00 correspond respectively to perfect knowledge (f = 0.75) and
complete lack of information on pay-offs, leading to the uniform distribution
of agents (f = 0.5).
1.0
Ql
~
::J
0.8
Ul
Ol
c:
'iii
0
0
"
0.6
,
,, "
()
'0 0.4
1i
til
.Q
0.2
0.
",
"",
""""
"
", "
_...L..._ _-"-_ _.....I_ _-'-......---.;_-'

0.0 I&;...._
"
0.0
0.2
0.4
0.6
0.8
1.0
fraction of agents using resource 1
Fig. 4.8
The transition probability (from resource 2 to resource I) p(f) corresponding to

the pay-offs shown in Fig. 4.7, for two values of the uncertainty parameter a. The intersection
with the line p = f gives the solution to the time-independent macroscopic equation.
By approximating 1 with its deterministic contribution cf>, a graphical

solution is obtained of the deterministic equation (p(cf = cf represented in
Fig. 4.8 by the crossing point between the curves p(cf and cf>. This point gives
the equilibrium solution which now, due to a non-zero value of cf>, is slightly
offset from the optimum value 1=0.75. The macroscopic value of 1 for
a=0.125 is cf> = 0.724.
In order to see how the Van Kampen approximation depends on the
uncertainty parameter a, the time-independent probability distribution has
been plotted for different orders in the approximation, as well as the exact
distribution, for three values of a (vertically) and three values of the number
of agents N (horizontally) in Fig. 4.9.
57
Fig. 4.9
The time-independent probability distribution for three different values of Nand

three different values of a, in three orders of the large system-size expansion (- = mean-field
(including linear noise) result, --- = first order nonlinear corrections included, -' -' - = second
order nonlinear corrections included, .... = exact solution).
As expected, the approximation works better for larger N, as all

approximating distributions are seen converging towards the exact curve.
However, for a fixed value of N, it is noticed that the approximation worsens
with decreasing a. The second order nonlinear corrections become larger for
small a, resulting in a shift in the peak of the distribution of the same order
as the one caused by the first order corrections. This shift gradually disappears
when a increases. Moreoever, for increasing a it seems that the role of
nonlinear fluctuations (in shifting the mean and spreading slightly the
distribution) is suppressed. These observations can be explained by noting
the relationship between internal noise (modelled by the stochastic variable
~ [7]) which is inherent in systems with a finite number of agents, and the
noise deliberately introduced by adding a random element in the agents'
decision-making processes (modelled by a). The result is that, in systems with
agents with (almost) perfect knowledge (small a), nonlinear fluctuations are
prominent (especially in the region of small N); a larger uncertainty in this
knowledge (large a) blurs the region of optimal decisions made by the agents
and suppresses the nonlinear effects of internal fluctuations. This latter
58
INTERNAL MARKETS
fact was noticed by Kephart et al [10 1 and used in systems with delayed
information to reduce the effects of persistent oscillations and chaos, which
are manifestations of nonlinearities in the fluctuations.
The following conclusions can be made:
the approximation works reasonably well for all values of a considered;
the first order nonlinear corrections are sufficient for correctly estimating
fluctuation effects in the system, especially if the uncertainty parameter
is not too small;
the approximation appears to be best suited for systems with a moderate

value of the global uncertainty parameter a ("" 0.5), where nonlinear
effects of fluctuations, although significant, converge rapidly in the
expansion - this may be the range of a to look for in realistic systems,
where agents are neither expected to have perfect knowledge nor be
completely ignorant about the pay-offs associated with their transactions.
The degree of complexity can be increased and then a study made of a

system in which the agents' perception of each resource is different; two
uncertainty parameters are defined, al and a2 for resources I and 2
respectively. This allows different environments to be assigned or simulated
and each resource (market penetration, advertising, geographical location,
radius of influence, etc) to be characterized. The new expression for p
(depending on at and a2) has been analytically derived:
... (4.3)
and coincides with equation (4.2) when al = a2'

So far this analysis has only looked at systems with a single macroscopic
stable behaviour, a consequence of the unique (stable) fixed point occurring
at the intersection between the linear pay-off functions 0l and O2 in Fig.
4.7. In order to see more clearly the effect of having two different uncertainty
parameters, this simple competitive behaviour will be changed by making
the pay-off functions nonlinear, i.e. introducing co-operation as well as
competition between agents in the system. Whereas competition meant that
agents would favour a resource if it had less agents using it, co-operation
is expressed by an increased pay-off when a resource is used by more agents.
The interplay of these two tendencies through nonlinear pay-offs leads to
a richer range of possible behaviours in the system.
The system's dynamics depends notably on the different values of
uncertainty parameters at and a2' In Fig. 4.10 the probability for a system
59
0.40
0.40
02= 0.04
~0.30
~ 0.20 Ifl ..
0.30
2=0.24
~ 0.20
.g
0.0.10
0.0.10
0.2
0.40
0.6
0,
0.40
~030
O2
~ 0.20
=0.54
2=0.98
0. 0 . 10
0.00
Fig. 4.10
0.8
Effect of different 02 for each resource for a system with cubic pay-offs (leading
to a bistable system), with uncertainty 0\.
with cubic pay-offs (leading to bistability) has been represented as a function

of the market share f and the uncertainty a\, for four different values of az.
For low values of al and a2 the agents are uniformly distributed into two
well-defined peaks of probability which correspond to states of nearly total
occupancy or total vacuum (the reverse situation happens in the second
resource).
By increasing a" a2 or both, the two peaks are seen to gradually get
closer to each other and merge into a single (symmetric) peak.
These critical values of a, and a2 can be found by inspection of the timeindependent macroscopic equation - they play the role of control parameters
which can change qualitatively the dynamical phase space of the system, in
this case from a system with two attractors to a system with a single one.
This is reminiscent of phase transitions in physical systems such as the
spontaneous magnetization of a ferromagnetic system which happens by
lowering the temperature below a critical value (Curie temperature). Above
this value the overall magnetization is zero and symmetric while below it there
are two possible states of opposite magnetization. By choosing one state or
the other, the system breaks its spatial symmetry, just like, by decreasing
a, or a2 below their critical values in the agent/resource system, a sudden
60
INTERNAL MARKETS
transition from an equal distribution of agents on the two resources to a

definite bias towards one or the other can be seen.
4.3.2
Time-dependent solution with changing pay-offs
So far results obtained through the time-independent simplification of the

model [7] have been presented. In order to extend these simulations to
account for time-dependent behaviour, the evolution of a system with two
resources and a number of agents 10<N<50 have been studied. The
deterministic equation (see Fig. 4.3) with the linear pay-offs as in equation
(4.1) and the fluctuation equations have been used in order to estimate their
effect on the evolution of the system. Each pay-off is descrioed by a linear
equation in the market share!of resource 1 (it should be noted that the system
is closed, thus!2 = 1- f) as:
... (4.4)
In this model notation, and for this particular example, the pay-offs
consist of only one system parameter,!, that is expressed in terms of five
system bases (linear, constant, random, relaxation and greed) from which
only the first two can mutate while the random, relaxation and greed
components are fixed.
Starting from an initial distribution!, which will depend on (11 and (12,
the less-dominant resource (0 2) mutates (increases) its slope and intercept
proportionally to I::..!=!- 0.5. In order to constrain the system it is assumed
that these increases are equally matched by the decreases in slope and intercept
for resource I:
O 2 = (c+ I::.. c)! + (d + I::.. d)
and
0 1 = (a-I::..c)!+(b-I::..d)
... (4.5)
where:
I::..c=-yl::..!+ 0
and
I::..d=cxl::..!+{3
... (4.6)
61
have two contributions each - one depending on how badly they are losing
to the competing resource (sensitivity) and another random component
introducing noise (using here, for simplicity, a = 'Y and 0 = (3).
The competing process goes as follows. The deterministic equation gives
an initial distribution/that allows the resources to calculate how much they
have to mutate their pay-offs to become more attractive to the agents. The
new pay-offs are re-introduced, together with al and a2, to calculate the new
probability p which is simultaneously used to solve the deterministic equation
and the fluctuation equations, thereby giving the market share value / as a
function of time and the evolving pay-offs.
As in biological systems, the rate at which mutations happen is
fundamental in achieving evolutionary improvement. In this simple case it
has been observed that after a mutation, irrespective of the initial
configuration, equilibrium is always reached after 100 units of time. This
is the relaxation time that must be introduced in order to take full advantage
of all the mutations; changes would otherwise happen too quickly for the
system to adapt and gain benefit.
Figure 4.11 shows the evolution of the market share of resource 1 in a
system with two resources and pay-offs, as in equation (4.1), taking
al =0.14, a2=0.40 and a step-size M=O.Ol. The evolving pay-offs were
updated every 100 time units, and calculated with a sensitivity parameter
a=O.1 and a noise parameter, {3, with random values in the interval (0,1).
The simulation starts with resource 1 having a market share of 26070 and
no initial fluctuations. At t = 100, as the system is nearly at its equilibrium
point (f"" 0.73), the first update of the pay-off values is introduced according
0.8
Q)
:5o
~ 0.6
Ol
c:
'iii
::l
III
E
Q)
0.4
1ijl
'0 02
c: .
o
.~
.:: O.OL-
~------I~---~
200
400
600
time
Fig. 4.11
Evolution in a competitive system with linear pay-offs (- is the deterministic

solution, ...... includes the contribution from fluctuations).
62
INTERNAL MARKETS
to the prescription in equation (4.5). This has the effect of changing the
equilibrium distribution in favour of resource 2; the market share!of resource
1 now decreases towards the value of 0.58, which it reaches fairly quickly
before the next pay-off update (at t = 200) takes place. This second update
again reduces the equilibrium distribution to the value of! = 0.49, resulting
in resource 1 having a slightly smaller market share. At the next update
(t = 300), resource 1 fights back and pulls the equilibrium value to! = 0.57.
A general 'tit-for-tat' behaviour is observed where the actions of the two
resources are equally matched (fighting with the same intensity). From the
long-time behaviour of the system observed in Fig. 4.11, it is further
concluded that, after a given number of iterations, the system will settle into
an equal distribution of agents over the two resources. It is interesting to
observe that fluctuations do not affect the evolution of the system drastically.
They seem, however, to introduce a short oscillation in the market share
before the system settles into the equilibrium distribution. Stochastic effects
may have a more important impact if nonlinear pay-offs are used. The
inclusion of more resources is also likely to remove the symmetrical 'tit-fortat' behaviour observed here and may lead to more complex strategies in the
competition between resources. These issues are being investigated and will
be reported elsewhere [11].
4.4
CONCLUSIONS
In this chapter, a model for market-like agent-resource systems has been

studied, whose formulation is based on one-step Markov processes and the
large-system size expansion of the master equation due to Van Kampen [8,
9] . This formulation enables a systematic treatment of fluctuations to be
carried out. A deterministic equation governing the dynamics of the system
in the limit of large numbers of agents arises as the lowest order contribution
in the expansion and coincides with the equation obtained in the mean-field
approach turning out to be a linear Fokker-Planck equation. Higher order
terms are included to provide nonlinear corrections to the FPE that are crucial
when the number of agents is relatively small and the mean-field theory
inadequate.
A model is also proposed for the agents' perceptions' pay-offs of a
resource based on the way information is dealt with in biological organisms.
New concepts are introduced for system bases and system parameters that
resemble those of organic bases and aminoacids. As in biology, competition
CONCLUSIONS
63
among different systems arises when mutations are introduced at the system
bases' level. New perception values arise in the form of compositions
of system parameters (genes) that are the result of random combinations of
mutated system bases.
In order to test the time-independent approximation in the case of this
agent/resource system, a system with two resources was used and pay-off
functions associated with the two resources were considered in order to model
a simple competitive strategy between agents, with an uncertainty parameter
monitoring the accuracy of information available. Sensitivity to accuracy of
the information available to agents was also studied; the main observation
is that higher uncertainty leads to the suppressing of nonlinear noise effects.
This is compatible with the conclusion in Kephart et al [10] that an increase
in the uncertainty parameter lowers the threshold for persistent oscillations
and chaos in systems with time delay, since these non-optimal behaviours
are the result of nonlinearities taking over in the dynamical equations. It has
also been shown how the one-step Markov formulation enables the exact timeindependent distribution to be found in the case of a bistable system, which
results from nonlinear pay-off functions.
Also a simplified time-evolution scenario has been modelled by making
two resources with linear pay-offs compete for the agents with two mutating
system bases (linear and constant) coding for only one genej. After an initial
period of instability the system adopts a 'tit-for-tat' cycle where one resource
dominates the other only to give way to the competing one after a fixed
relaxation interval. Time-dependent solutions, in complex systems with payoffs that result from the combination of several system parameters each with
up to eight mutating bases, have been studied in a separate work [I I] .
As indicated in the introduction, the emergence of markets inside
communications systems will lead to major changes in the ways in which
communications businesses are run. Increasing levels of competition will lead
to a speeding up of all business operations, including price setting for
communications services. Ability to unravel, understand and probe the myriad
of ultrafast business cycles, rhythms and discontinuities erupting inside the
global communications network will provide the intelligent operator with a
novel source of competitive edge. Competitive real time pricing for non-free
services, coupled with strategic planning and business operations, may emerge
as an important component of automated network intelligence. The work
reported here represents an initial step towards developing new tools which
will enable us to meet some of the challenges associated with the increasingly
competitive, turbulent and volatile communications business.
64
INTERNAL MARKETS
REFERENCES
1.
Reynolds R G: 'Growth in telecommunications: the tools for the job', published

in the proceedings of the lTV 6th World Telecommunication Forum (Policy
Symposium), Geneva (October 1991).
2.
Turner L and Hodges M: 'World market competition - the challenges for

business and Government', Global Shakeout, Century Business London (1992).
3.
Stiglitz J E: 'Information and economic analysis: a perspective', The Economic

Journal Supplement, 95, p 21 (1985).
4.
Littlechild S C: 'Elements of telecommunications economics', lEE, Telecommunications Series, 1.. (1979).
5.
Gell M A and Adjali I: 'Markets in open telecommunication service systems',

Telematics and Informatics, 10, No 2, P 131 (1993).
6.
Chalmers J: 'Catastrophe! the network nightmare', Public network Europe, 1 ,

P 30 (1990).
-
7.
Adjali I, Gell M A and Lunn T: 'Fluctuations in a decentralized agent-resource

system', Physical Review E, 49, P 3833 (1994).
8.
Van Kampen N G: 'A power series expansion of the master equation', Can J
Phys, 39, p 551 (1961).
9.
Van Kampen N G: 'Stochastic processes in physics and chemistry', North Holland

Publishing Company (1981).
10. Kephart J 0, Hogg T and Huberman B A: 'Dynamics of computational

ecosystems', Physical Review, A40, P 404 (1989).
11. Fermindez-Villacaiias Martin J L, Gell M A and Adjali I: 'Evolution in a selforganising agent-resource system', Proc IEEE Conf on Evolutionary
Computation (IEEE WCCI '94), Orlando, Florida (June 1994).
5
EVALUATION OF HOPFIELD
SERVICE ASSIGNMENT
M R W Manning and M A Cell
5.1
5.1.1
INTRODUCTION
The assignment problem
With the rapid increase in complexity of telecommunications and computational systems, an urgent requirement is the development of techniques
for dealing with difficult optimization problems, many of which are associated
with the assignment of tasks to resources. Typical application areas are:
network and service management;
distributed computer systems;
work management systems;
general scheduling, control or resource allocation problems, e.g. financial

transaction handling, traffic scheduling, automated production, reservation, storage or cargo-handling systems.
66
HOPFIELD SERVICE ASSIGNMENT
In such multiservice systems [1, 2] , the mechanism of assigning incoming

tasks to resources is an important factor in the efficiency of the overall system.
A flexible system can be constructed by considering each task as being suited
to a type of resource to a varying degree of success characterized by a gain
term. If the resource is completely unsuitable for the task, this term would
be zero; if it is ideally matched to the task, then the term would be equal
to unity. In between these two limits, the gain term would take on a range
of values, and in the extreme would become a continuous variable, which
is the assumption made in the present model. For a system of M available
resources, task i out of the N incoming tasks can be characterized by:
... (5.1)
where aij are the gain terms such that 0:5 aij :5 1 and Ti is the time estimated
by the task's agent that it would take for an ideally suited resource (aij = 1)
to carry out the task. An estimate of the actual execution time, if task i is
assigned to resource}, is taken to be T/ajj in the present model, so that time
increases as the resource becomes more unsuitable.
If it is assumed that tasks can be dealt with individually, then the
assignment problem can be tackled by simply allocating each task to the
resource corresponding to the highest gain term. In large systems, however,
there will be many incoming tasks at any given decision instant and the
problem then becomes one of switching the tasks through to the resources
in an optimum manner at high speed.
5.1.2
Background
The Hopfield neural network [3] has been extensively used for solving
optimization problems, such as the travelling salesman problem (TSP) [4],
but also for simpler problems such as its use for switching purposes [5-8].
The switching application is relevant as it is very similar to the assignment
problem, even though it is an easier optimization task because there are no
gain terms involved. The assignment problem, also known as the resource
allocation problem, has been dealt with in the form of the concentrator
assignment problem [9] and the list-matching problem [10]. It has also been
considered in the generalized higher-order case [11, 12].
The present study provides more information regarding the parameters
used and the various considerations in the simulation of the Hopfield net
for considerably larger problem sizes. Another aspect of the present work
is its emphasis on the performance of the Hopfield net at system level, in
particular highlighting its behaviour when operating in an overload condition.
INTRODUCTION
67
Because the assignment problem as formulated here is not NP-complete

such as is the case for the TSP [4] or for generalized assignment [I I ] , it
can be solved by linear programming techniques such as the Munkres
algorithm [13]. This algorithm, however, is limited in that it does not extend
to the generalized assignment case and does not allow many-to-many, instead
of the one-to-one, assignment as specified in the formulation here.
Nevertheless, it does provide optimal solutions for the simple assignment
problem, and in this case, the advantage of the neural net has to lie in the
speed gain obtained through its inherent parallel operation, especially when
implemented in hardware and when dealing with large-size problems.
Hardware implementations of neural nets have been used to solve the
assignment problem. In particular, a novel neuroprocessor architecture for
very large scale integration (VLSI) using a winner-takes-all strategy and
hysteretic annealing has provided excellent performance with optimal or nearoptimal results with fast-settling times [14]. This approach, however, is
specific to the linear assignment problem and as such does not aliow
generalization to higher-order problems [15].
More general-purpose neural-net VLSI chips have been developed and
tried out on the assignment problem [16, 17]. The present work intends to
provide a more detailed discussion of the parameters and general issues.
5.1.3
Framework
Studies into the performance of task distribution systems have been made,
and in particular, a comprehensive theoretical framework has recently been
developed for describing processes in service systems [2, 18]. The neural
network can be used within market-based systems [19], where the gain terms
may be considered as a function of cost or price structures. The objective
of the present study is to present the Hopfield net as an implementation
method for distributed task allocation in these and more general systems (Fig.
5.I).
Nevertheless, it is important to understand that the work presented in
this chapter is just a first step towards developing a more sophisticated model,
as the mechanisms controlling the adjustment of the gain terms Cl'ij are a
critical feature of the envisaged system [18,19] (see also Chapters 4 and
16). The present work does not deal with this aspect, but instead takes the
Cl'ij gain terms to be known a priori, assuming a uniform distribution of these
terms for the purposes of the simulations. The hardware-implemented neural
network would therefore perform the assignment operation only as part of
a more comprehensive processing system.
68
inputs:
tasks, jobs,
service-requests,
fault identification
Hopfield neural network
Fig. 5.1
Use of a Hopfield neural net to optimize the allocation of incoming tasks to available
resources. The objective of the neural net is to optimize the gain terms of the chosen task/resource
pairs.
This chapter introduces the Hopfield neural network, then uses the
presented model to solve a task allocation problem by making certain
assumptions about the task allocation process. The relevant factors affecting
the convergence of the neural net are discussed. Simulation results are
presented and, in particular, a range of performances is determined as the
optimization process is forced to become localized instead of being more
global. The merits of the Hopfield net approach are discussed, as well as
the extensions required for a general framework.
5.2
THE BASIC HOPFIElD MODEL
Neural networks consist of nonlinear computational nodes known as neurons,

which operate in a highly parallel manner. In the case of the Hopfield net,
the nodes are based on the McCulloch and Pitts neuron model and are fully
interconnected by weights; Fig. 5.2 shows one such neuron. The processing
function of the neurons is of the following form:
... (5.2)
BASIC HOPFIELD MODEL 69
Fig. 5.2
A neuron in the Hopfield net model.
Here the weight W pq connects the output v p of the pth neuron to the
input uq of the qth neuron, while lq represents an internal bias in the qth
neuron. The summation is over all neurons in the network. The activation
function f has to be some nonlinear monotonically increasing function a standard continuous-valued sigmoid function is used, with neuron outputs
ranging from 0 to 1:
... (5.3)
where {3 is the gain factor, which controls the steepness of the sigmoid
function.
Hopfield showed that the net will always converge providing the synaptic
strengths (weights) in the net are symmetric, i.e. if W pq = W qp for all
interconnections. The proof is based on the use of an overall energy function
associated with the net [3]:
E = - Yz E E
p q
WpqVpV q -
E
p
Ipv p
117
E l~~
p
"
f-l(V' p)dv' p .. (5.4)
where T is the neuron time constant, which throughout this work is set to
unity. Note that the lower bound of the integral term in E representing the
internal energy of each neuron is taken to be Yz instead of the zero value
given in Hopfield [3]. This is because the neuron sigmoid function in this
case ranges from 0 to 1, whereas in their model it is from - 1 to 1. Equation
(5.4) corresponds to the following dynamical equation:
70
... (5.5)
-dEldv
The energy function will decrease until it reaches one of its minima because
dEl dt::5 0 [3] - these equilibrium points are the attractors of the system,
corresponding to solutions of the energy function as determined by the set
of weights Wpq and biases I q in the network.
5.3
THE TASK ALLOCATION PROBLEM
The task allocation controller, implemented by a Hopfield neural net, has

to make a decision on which of the N incoming tasks should be allocated
to each of the M available resources based on the characteristics of the tasks.
The input to the Hopfield neural net can therefore be seen as being an N
x M matrix with entries consisting of gain terms aij as described in equation
(5.1). These terms are adjusted by any other factors influencing the decisionmaking process, e.g. the assignment of tasks requiring long execution times
could be delayed by reducing each term aij as a function of Tj in equation
(5.l). The output of the Hopfield neural net also consists of an N x M matrix,
which is referred to as the output matrix, where each row corresponding to
an input task has a non-zero entry in the column corresponding to the
processor to which the task has been assigned.
5.3.1
Assumptions
In this study, two basic assumptions are made about the system:
task indivisibility;
resource dedication.
The first assumption concerning the assignment of tasks is that the

workload required for each task cannot be split up between several processors
- each task is viewed as being indivisible. In fact, larger jobs will consist
of many such tasks, but each task can only be carried out by one processor.
For this reason, each row in the output matrix will have at most one nonzero entry. All entries will be zero if the task could not be assigned, either
because all available processors took on tasks which were better suited, or
because the gain terms corresponding to any of the processors that could
have been chosen failed to exceed a certain threshold to make the assignment
THE TASK ALLOCATION PROBLEM 71
worthwhile. If the assignment was successful, there will be one non-zero entry
in the row corresponding to the input task.
The second assumption is that each processor can take on at most one
task, and, unless interrupted, that processor is dedicated to the task to which
it has been assigned. Each column in the output matrix therefore has at most
one non-zero entry, depending on whether the processor corresponding to
that column has had a task assigned to it or not. This is not a restriction
on the type of processors used in the system, as long as, for any processor
capable of dealing with several tasks, the task allocation controller is made
aware of the number of effective processor, this amount being equal to the
number of tasks that the processor could handle at that instant in time. The
possibility of a certain level of control hierarchy is therefore also implied.
5.3.2
The Hopfield neural net solution
Because of the two assumptions made, there will be at most one non-zero
entry in any of the columns or rows of the output matrix, depending on
whether a task was assigned to the corresponding resource or not. In addition
to the minimization of both row and column sums, the purpose of the
controller is to maximize throughput, Le. to have the maximum number of
non-zero entries or assignments in the output matrix. These constraints can
be summarized by the first three terms in the following energy function that
the controller has to minimize:
E=A/2
i= I j=l 1= I
I;o<j
Vij Vii
+ e/2 (min(M,N) -
+ B/2
M
i:1 j ~ 1 Vij)
E v
Vk'
1)
j=1 i= I k=l
k;o< i
N
+ D i~ I j~ 1Vij(l -
Cl'ij)
... (5.6)
where Vij is the entry for the ith row and jth column. The first and second
terms correspond to row and column sum minimization respectively, while
the third term results in matrix sum maximization. The fourth term
corresponds to the quantity to be optimized, which in this case is the sum
of chosen gain terms Cl'ij. The objective is for the neural net to minimize the
sum of cost terms Cij = (l - Cl'ij) whilst still respecting the constraints
imposed by the first three terms.
The expression in equation (5.6) is compared to the energy function for
a Hopfield net in equation (5.4) to obtain the weight and bias terms. While
the integral term in equation (5.4) is required in addition to the above function
to ensure that there are two attractors corresponding to a neuron either being
72
active or inactive [20] (see Chapter 16),nevertheless one can neglect it if A,

Band C are made large compared to the term 1/{3. By considering the inverse
of the activation function (equation (5.3, it can be seen that the resulting
integral term is proportional to 1/{3 - a large gain factor (3 will therefore
reduce the effect of the integral [3]; another approach would be to have
a small neuron time constant 7 [21] . By taking a Hopfield net of size Nx M,
where the neuron outputs Vij are similarly indexed in two dimensions, the following expressions for weights and biases are obtained:
- AOik(l -
J..IJ
C/2 - D
0jl) -
Ojl(l -
0ik)
... (5.7)
(l - (Xij)
where Wij,kl is the weight between neuron ij and neuron kl, and I ij is the bias
for neuron ij; Oij = 1 if i = j and 0 otherwise. Substituting the weight and bias
terms back into the dynamical equation (5.5) gives:
dUi/dt =
- Uij -
EVil -
1=1
I;><j
k=1
Vkj
+ C/2 - D(l-
(Xij)
...
(5.8)
k;><i
where the neuron inputs Uij are initialized to zero. (Small random initialization values could also bJ used, but this was not found to make any
difference.) Equation (5.8) is the required differential equation to update
the neurons at each iteration, resulting in minimization of the energy function
and therefore leading to the solution of the optimization problem.
Once the neural net has converged using the dynamical equation (5.8),
then only N neurons (or min (N,M) neurons if N;zt.M in the case of a
rectangular array) will remain active, the rest having been turned off. If the
neuron in position (i,j) is active then this is the control for task i to be assigned
to resource j.
5.3.3
Options and refinements
The optimization term for the gain terms (Xij in the energy function (equation
(5.6 results in the neuron biases being adjusted by the (Xij (equation (5.7.
An alternative is to follow the switching approach [5] and simply initialize
the neuron inputs Uij with the values of the gain terms (Xij' Using a simplified
dynamical equation without the D-term (equation (5.8 will lead to the
neurons with the highest initial values being chosen that still satisfy the constraints given by the A, Band C terms. The method involves centring the
inputs around zero and multiplying them by a factor of, for example, 102
to improve convergence time [22]. While this approach does present a speed
advantage (approximately twice as fast), it leads to slightly worse results (by
HOPFIELD NEURAL NET IMPLEMENTATION
73
about 1%) and exhibits a higher sensitivity to the choice of the A, Band
C parameters (e.g. up to 3070 difference for the two sets given in section 5.4.2).
The C term in the energy function (equation (5.6 relies on the fact that
the dynamics are such that the summation of Vij terms always approaches
the min (M,N) term from below, thus guaranteeing that there is no signchange leading to erroneous operation. For more complex problems such
as the TSP [4], the C term is squared so that only the magnitude is considered
in the energy function. This results in the C /2 expression in the dynamical
equation (5.8 being replaced by C (min(M,N) - E E Vij)' As can be
expected, this modification leads to no performance gain for the simple assignment problem.
Tagliarini and Page [9] use a D term which is added to the weights Wij,kl
rather than the biases I ij The D term in this case is more complex, as the
symmetry of the weights matrix needs to be maintained. No advantage was
obtained from this method.
Brandt et al [10] use an additional feedback term FVjj, where F is a
constant, which is added to the right hand side of the dynamical equation
(5.8). This term originates from a slightly different formulation of the energy
function. Again, the results obtained were no better than for the simpler
dynamical equation (5.8).
5.4
HOPFIELD NEURAL NET IMPLEMENTATION

5.4.1
Attractors
To obtain neuron output states Vij of either zero or one, the final neuron
input Uij has to be either a large negative or a large positive value. To
guarantee the existence of these two attractors, i.e. asymptotically stable
states, the dynamical equation (5.8) is considered under conditions of
equilibrium, i.e. with du / dt = 0, for the two cases to obtain the following
(see Chapter 10):
Vij
=0
Vij =
~ Uij
= -A -B + C/2
~ Ujj =
-D(1-aij)
C/2 -D(1-aij) > 0
< 0
... (5.9)
... (5.10)
To satisfy these inequalities for any value of aij, equation (5.9) uses the
maximum value aij = 1, while in equation (5.10) the minimum value aij = 0
is taken. Additionally, it is beneficial (see Chapter 10) to keep the negative
attractor in equation (5.9) larger than the positive attractor in equation
(5.10). As the solution requires far more zeros than ones, a larger negative
attractor can improve the operation of the network. While the
74
network is intended also for asymmetric mput matrices, it is primarily

dimensioned for square matrices, so A = B can be set. All these conditions
lead to the following constraints for the optimization parameters:
A> CI2>D
... (5.11)
5.4.2
Parameter determination
A useful relationship was developed in Bousono and Manning [15] to

demonstrate the relationship between the various parameters. First of all,
the energy function in equation (5.4) is normalized to one of the optimization
parameters, such as C, so that E = CEO" Substituting v from equation (5.3)
into the first part of equation (5.5) gives:
Llv= -2{3v(l-v)dE/dv Llt
= -2v(l-v)C{3dEn/dvLlt
... (5.12)
It can be seen from this that the parameters {3, C and Llt playa similar
role in the evolution. Because of the decay term in the dynamical equation
(5.5) and also the updating process, there remain significant differences
between the way these terms affect the operation of the network.
The performance of the net is significantly affected by changes in the
gain factor {3 in the activation function of the neurons in equation (5.3). A
large value of {3 implies that the neuron outputs will nearly always be very
close to zero or to one. Because the neuron states are then close to their
attractors, this makes it impossible to escape from false local minima in which
the net may have become trapped. For this reason, one has to reduce {3,
provided that the optimization parameters are made large, although if {3 is
too small, the net will find it difficult to evaluate correctly the differences
between the gain terms aij, especially for large networks. Extensive testing
over the range of matrix sizes considered (up to 128 x 128) led to the choice
of {3 = 1.
The step-size Llt used in the updating procedure U new = Uold + (du / dt)Llt
of the neuron inputs is important. If Llt is too large, errors will result in the
updating. Simulations for a range of matrix sizes gave results close to optimal
from Lltz5.10- 2 onwards, with convergence times increasing rapidly from
Llt z 5.10 - 4. The chosen operating value of Llt = 10 - 2 showed no deterioration compared to smaller step-sizes.
The set of optimization parameters used was A = B = 1000, C = 25 and
D = I, which performed well over the range of problem sizes. An example
of a set which gave better results than the first set (0.3 % closer to optimal)
HOPFIELD NEURAL NET IMPLEMENTATION 75
over the range studied was A == B == 200, C == 100 and D == 1. Whilst the latter
set was fine for a uniform distribution of <Xij, the performance degraded for
the extreme case of using binary (Xij for large-size matrices (> 50 x 50) due
to the smaller value of A used. The parameters should work over the whole
range of sizes and ideally for any type of <Xij distribution. In any case, no
parameters were found which gave more than marginal improvements over
the first set.
5.4.3
Performance
The performance of the Hopfield net was evaluated for two types of
distribution of gain terms <Xij - uniformly distributed and binary. Figure
5.3 gives the performance over matrix sizes from 5 x 5 to 50 x 50 averaged
average
3.0
(ij
2.5
.S 2.0
a.
0
g
0~
1.5
1.0
,,
,,
, ...., ,
0.5
0.0
,,
,,
"
"
10
"
20
30
40
50
problem size
(a)
(ij
25
20
15
g
~
0
worst case
\
~
.... _- _--
10
5
0
...
10
20
30
40
50
problem size
(b)
Fig. 5.3
Performance of the Hopfield neural net compared with the optimal solution, showing
(a) average and (b) worst case. The comparison is for uniformly distributed (-) and for binary
(----) gain terms, (tij'
76
over 100 trials in each case. The average performance for the uniformly
distributed case is approximately 2070 from optimal, but gradually improving
with increasing matrix size. As can be expected from the simpler problem,
the extreme case of binary values for CYij results in near-optimal solutions.
Because of the random nature of the inputs, small-problem sizes suffer
from trials that locally give non-uniform distributions of CYij over the matrix,
resulting in reduced quality of solution. The worst-case solutions in
Fig. 5.3 (b) reflect the fact that the large-problem sizes result in more of a
uniform distribution of CYij across the matrix, which the neural net finds
easier to solve.
A different method of generating a uniform distribution of CYij across the
matrix gave averages that were of the order of 0.1 % (instead of 2 %) from
optimal for a 10 x 10 matrix.
The results compare well with published ones, such as an average of 7.5%
from optimal for 5 x 12 matrices [9], or 50% of trials within 3% of optimal
for 7 x 7 matrices [17]. The neuroprocessor architecture in Eberhardt et al
[14] outperforms the general-purpose neural-net structure used here, giving
worst case results of 0.5% from optimal for matrices up to size 64 x 64.
Difficulties exist with comparisons in general, as there is a level of uncertainty
as to how the input data has been generated.
5.4.4
Considerations
The operation of the neural network relies on the balance of the constraint
terms (A, Band C terms) and of the optimization term (D term) in equation
(5.6). The result is therefore not necessarily improved by simply increasing
the value of D, as the solution may move to the high CYij values too fast. The
net converges to a set of chosen neurons with some corresponding to higher
CYij, but with the whole set corresponding to an overall lower sum of CYij
terms. The goodness of the solution was most sensitive to the value of D
and did not improve by increasing D for the given parameter set.
A second possibility for increasing the goodness of the solution is to add
noise into the system [23]. In fact, this is a necessity, if binary CYij are used,
in order to escape local minima during convergence. Noise of order 10- 2
was found to be most appropriate for this. This was injected into the system
either by addition to the dynamical equation (5.8) or to the gain factor {3
in equation (5.3). This noise did not affect the goodness of the solution for
SIMULATION RESULTS
77
the uniformly distributed (Xij' More generally, many simulated annealing

schedules [23] were tried out, but did not improve on the quality of the results
obtained.
5.5
5.5.1
SIMULATION RESULTS
Basic performance
This section introduces the measurement expressions of utilization and grade

of service, and then evaluates the performance of the Hopfield net by considering these terms. The utilization is the percentage of resources that are
busy. It is evaluated over the length of a sampling interval, this being the
time in between two sampling instants when the Hopfield net carries out the
task allocation. If a task cannot be assigned to a resource because all resources
are busy, then this is classed as a lost request, assuming that there is no queuing in the system. The grade of service (GoS) is the number of lost requests
in the system as a percentage of the overall number of incoming requests.
A system is constructed by informing the neural net of the incoming tasks,
as defined in equation (5.1), and of the available resources to which the tasks
may be assigned. A process external to the neural net keeps track of which
resources are available. The Hopfield neural net was simulated using a
variable size depending on the number of tasks N and the number of resources
M. In practice, the neural net would be a fixed size so that the weights in
equation (5.7) do not need to be changed. In this case, the neurons for the
entries in the unused rows and columns have their biases set to a value
I min $.CI2-D (from equation (5.7, giving a performance found to be no
different from the variable-size approach.
The behaviour of the system is shown by plotting averaged results over
a range of relevant input rates for a 64-resource system. Different sampling
intervals are used, i.e. the time is varied between each instant when the
Hopfield net makes the assignments. Figures 5.4(a) and (b) demonstrate how
system performance deteriorates if the sampling interval chosen is too small.
If the net can only assign one task at a time, then it will only choose the
best resource for that task, irrespective of how detrimental that choice may
be to the assignment of future tasks. This corresponds to simply picking the
maximum gain term for each task on a first-come, first-served basis, which
is a function that is easily fulfilled by any non-neural approach.
78
i'::~~';'--=--==
:= 40
'S 20
o
o
20
40
60
80
arrival rate
(a)
~5-~ 60[
40
~
~ 2:
(..
..I.,
20
.,,'
...... ...,.
,**'
,I.-_E:../_/_,."
.......
I ..-
40
60
~,
80
arrival rate
(b)
Fig.5.4
Averaged utilization and number of lost requests for a 64-resource system against
task arrival rate. Performance is given for three sampling intervals: 0.2 (--), 0.07 (----) and
0.02 (-.-.-.). Optimization becomes global as the sampling interval is sufficiently increased, hence
the worse performance deterioration for small sampling intervals in overload condition (constant
task arrival rates, constant task times 7 j = I).
5.5.2
Overload conditions
In overload conditions, the number of lost requests is simply equal to the

excess of arrivals above those allowed once maximum utilization has been
reached, provided that the sampling interval is chosen correctly. For small
intervals, on the other hand, the system (or any other sequential assignment
process) fails badly by allowing a rapid increase in the number of lost requests.
A minor disadvantage is that when maximum utilization has been reached
in the case of larger sampling intervals, then the actual utilization will not
completely approach 100010 as with small sampling intervals (Fig. 5.4(b.
The reason for this is that, due to the larger interval, there will be on average
more idle time in between termination of tasks and the following sampling
instant, which is the earliest time at which a resource may receive its next
task. The solution to this is to have a system with an adaptive sampling
interval.
SIMULATION RESULTS 79
5.5.3
Scalability
The set of curves is similar for other matrix sizes. To demonstrate the
scalability of the application, the highest task arrival rate for which no lost
requests occur is found over a fixed number of runs of 500, using a sampling
interval of 0.2. In Fig. 5.5 this breakpoint is shown to increase linearly with
the number of resources in the system, given an identical distribution of gain
terms aij for all system sizes, which in this case is a uniform distribution
between 0 and 1. Figure 5.5 indicates that the system is well-behaved in that
an increase in the number of resources leads to a linear increase in overall
performance.
In terms of speed the neural net scales fairly well - 290 iterations for
a 10 x 10 matrix increases up to about 1000 iterations for a 200 x 200 matrix
for the parameters given in section 5.4.2. It should be noted that these figures
could probably be improved through the fine-tuning of the simulations. A
trade-off between convergence speed and goodness of solution can be made
by an appropriate choice of parameter set, e.g. halving these times for a drop
of a couple of percentage points in performance.
maximum arrival rate with no lost requests over 500 runs
100
Q)
~ 60
"iii
>
.~ 40
20
Fig. 5.5
Scalability of application - the maximum arrival rate, allowing no lost requests,
increases linearly with the number of resources (constant task arrival rates, task times Tj = I,
sampling interval = 0.2).
80 HOPFIELD SERVICE ASSIGNMENT
5.6
DISCUSSION AND IMPLICATIONS
The Hopfield neural net has been proposed as an optimization tool for
dynamic task assignment in an environment consisting of different types of
resource. The purpose of the present work has been to determine all
parameters affecting the operation of the Hopfield net, which then led on
to an assessment of the net's performance. Because the task assignment
problem is a general one, the results presented here can also be applied in
other planning or control systems.
Although the Munkres algorithm [13] works optimally and exceeds the
performance of the Hopfield net in the restricted case of linear assignment
only, the neural network has the advantage of flexibility of application. For
more general higher-order cases, the Hopfield net easily outperforms any
conventional algorithms [15]. It has also been successfully adapted for
carrying out many-to-many assignments instead of one-to-one assignments.
An example of this is 2-to-l assignment, if each processor can accept two
tasks rather than just one. This is done by replacing the summation terms
in the dynamical equation (5.8) by sums of product terms. Because of its
inherent parallelism, the neural net can obtain significant speed improvements
over the conventional algorithm, if it is implemented in a parallel hardware
structure [16, 17], and especially for large-size problems.
It is intended to construct a system of clusters of resources connected
to the network via Hopfield nets acting as controllers. A range of topologies
can be considered, e.g. in Fig. 5.6 a consortium of resource clusters is formed
around a ring network which is accessed in turn via a controller by the external
network, thus allowing a hierarchy of controllers to be set up if necessary.
Applications in such heterogeneous decentralized systems are discussed further
in Chapter 4.
A prerequisite for such systems is that a job consisting of many tasks
will require communication between the various tasks. So, in addition to the
execution costs, there will be communications costs incurred which will be
negligible if the tasks are performed on the same machine, and if carried
out within the same resource cluster will be less than for communications
between different clusters. The work described in this chapter has now been
successfully extended to take into account additional constraints, such as these
communications costs, allowing the method to be applied to more general
higher-order problems [15].
CONCLUSIONS 81
resource
cluster
Fig. 5.6
5.7
General network architecture.
CONCLUSIONS
The Hopfield neural net has been proposed as an implementation method

for the assignment of resources in a distributed multiservice environment,
82
showing that the assignment is carried out in a near-optimal manner. The

critical parameters for a general-purpose Hopfield neural network structure
were given. It was also shown that the sampling interval for decision-making
by the Hopfield net needs to be sufficiently large consistent with the task
arrival rate, otherwise the performance of the system deteriorates to that of
a sequential controller. Given a suitable choice of architecture parameters
as indicated here, the resulting system possesses both the flexibility and speed
to deal with assignment processes, especially when coupled with bidding
mechanisms which mediate assignments of tasks to resource clusters.
REFERENCES
1.
Hubermann B A (Ed): 'The ecology of computation', Elsevier Science, BV North

Holland (1988).
2.

Telematics and Informatics, 10, No 2 (June 1993).
3.
Hopfield J J: 'Neurons with graded response have collective computational

properties like those of two-state neurons', Proc Nat Acad Sci USA, !,
pp 3088-3092 (May 1984).
4.
Hopfield J J: 'Neural computation of decisions in optimization problems',

Biological Cybernetics, 52, pp 141-152 (1985).
5.
Ali M M and Nguyen H T: 'A neural net controller for a high-speed packet
switch', Proc Int Telecommunications Symposium, pp 493-497 (1990).
6.
Troudet T P and Walters S M: 'Hopfield neural net architecture for switch

control', IEEE Int Conf on Neural Nets (July 1988).
7.
Brown T X: 'Neural networks for switching', IEEE Comm Mag, 27 , pp 72-81

(November 1989).
8.
Brown T X and Liu K: 'Neural network design of a Banyan network controller' ,

IEEE Journal on Sel Areas in Communications, SAC-8, No 8, pp 1428-1438
-(October 1990).
9.
Tagliarini G A and Page E W: 'A neural-network solution to the concentrator

assignment problem', Proc of the IEEE Conf on Neural Information Processing
Systems, American Inst of Physics, pp 775-782 (1988).
10. Brandt R D, Wang Y, Laub A J and Mitra S K: 'Alternative networks for solving
the traveling salesman problem and the list-matching problem', Proc of the IEEE
Int Conf on Neural Networks, 1.., pp 333-340 (July 1988).
II. Fang Land Li T: 'Neural networks for generalized assignment', Proc of 2nd
lASTED lnt Symposium, 'Expert Systems and Neural Networks', pp 78-80
(August 1990).
REFERENCES
83
12. Li T and Fang L: 'A comparative study of competition based and mean field
networks using quadratic assignment', Proc of 2nd lASTED Int Symposium,
'Expert Systems and Neural Networks', pp 81-83 (August 1990).
13. Munkres J: 'Algorithms for assignment and transportation problems', J Soc Ind
Appl Math, ~, pp 32-38 (1957).
14. Eberhardt S P, Daud T, Kerns D A, Brown T X and Thakoor A P: 'Competitive
neural architecture for hardware solution to the assignment problem', Neural
Networks, -.i, pp 432-442 (1991).
15. Bousono C and Manning M: 'The Hopfield neural network applied to the
quadratic assignment problem', to appear in the Neural Computing Applications
Journal.
16. Moopenn A, Duong T and Thakoor A P: 'Digital-analog hybrid synapse chips
for electronic neural networks', in Touretzky D (Ed): 'Advances in Neural
Information Processing Systems 2', Morgan Kaufman Publishers, pp 769-776
(1990).
17. Duong T, Eberhardt S P, Tran M, Daud T and Thakoor A P: 'Learning and
optimization with cascaded VLSI neural network building-block chips', Int Joint
Conf on Neural Networks, 1-, pp 184-189 (June 1992).
18. Adjali I and Gell M A: 'Self-organization in open computational systems', Phys
Rev E, 49, (5-A), pp 3833-3842 (1994).
19. Gell M, Fernandez-Villacanas J L, Adjali I, Manning M and Amin S: 'Selforganization and markets inside communications', Proc of 6th Annual Conf on
Neural Networks, Genetic Algorithms and Chaos Theory: Intelligent Financial
and Business Systems, London (February 1994).
20. Amin S J, Olafsson Sand Gell M A: 'Constrained optimization for switching
using neural networks', Proc of the Int Workshop on Applic of Neural Networks
to Telecomms, INNS Press, pp 106-111 (1993).
21. Aiyer S V, Niramjam M and Fallside F: 'A theoretical investigation into the
performance of the Hopfield model', IEEE Trans on Neural Networks, 1-, No
2 (June 1990).
22. Manning M Rand Gell M A: 'A neural net service assignment model', BT Technol
J, 11, No 2, pp 50-56 (April 1994).
23. Hertz J, Krogh A and Palmer R G: 'Introduction to the theory of neural
computation', Santa Fe Institute, Addison-Wesley (1991).
HIERARCHICAL
MODELLING
M A H Dempster
6.1
INTRODUCTION
Management science has for over thirty years been concerned with
mathematical and computer models at the macro, meso and micro levels of
detail to support corporate decision-making in planning, management and
control, which reflects the classical three-level military hierarchical planning
concepts of strategic long-run, tactical medium-run and operational shortrun [1] (see Table 6.1). As planning moves down the corporate hierarchy
it becomes increasingly detailed and involves shorter timescales and many
more, but smaller, uncertainties. The mathematical modelling involved at
successively lower levels reflects these differences - paralleling the macro,
meso and micro-scale mathematical models of classical physics (e.g. see
Woods [2]) - increasing in complexity at each level (see Table 6.2). In a
stationary corporate environment, operational planning models - involving
mainly management and control functions - can become extremely complex.
In a highly dynamic uncertain environment, useful mathematical and
computer models tend to become simpler, as it is the strategic and tactical
decisions involving rarer major uncertainties which are critical for survival.
Over the last two decades, supported by rapid technological advances in
computing and telecommunications, complex corporate information systems
have developed, involving multiple decision-support systems at each level,
which, taken together, are referred to as hierarchical planning systems [3].
INTRODUCTION
85
Table 6.1
Planning, management and control hierarchy after Anthony [1). Strategic and
tactical levels handle complexities and uncertainties by aggregation in a stable environment in
order to focus on rare major environmental change.
Lead
Time
Level 1 Strategic
Level 2 Tactical
Level 3 Operational
Table 6.2
Cost
t t
Uncertainty
Macro
Meso
Micro
Three-level queuing network traffic approximations parallel on the network the

three-level approximations of classical physics in free space.
Physics
MACRO
Complexity
Fluid flows
High density
Irreversible
Queuing Networks
Reflected
Fluid
Flows
Euler (incompressible)
Navier-Stokes (compressible)
PDEs
MESO
Kinetics
Medium density
Partly reversible
Reflected
Brownian
Motions
Poisson, Boltzmann
Blazov PDEs
MICRO
Particle dynamics
Low density
Reversible
Standard
Discrete
Event (flow)
Processes
Many-body ODEs
About a decade ago it was proposed that the behaviour and performance
of such complex human-computer systems could be evaluated in terms of
relatively simple three-level stochastic optimization models [4, 5] (an idea
originally introduced in the context of military logistics by Dantzig in the
1940s). This has subsequently been demonstrated in the fields of
manufacturing and distribution.
A recent text [6] on integrated voice-data telecommunications network
design discusses the three-level corporate planning hierarchy in the context
of private corporate network design, management and control, providing
mathematical models at each level of detail and recommending the establishment of a corporate network team concerned with tasks at each level. In this
chapter tentative steps are taken towards appropriate 3-level stochastic
optimization models for understanding and designing integrated hierarchical
planning systems in the telecommunications industry. In this context, it should
86
HIERARCHICAL MODELLING
be noted that, while advance in telecommunications technology undoubtedly

appears exponential, this advance has been mainly reflected in rapid change
in range of service provision, while the large telecommunications firms who
dominate the world market face a more-or-less stable-growth environment.
Moreover, seventy years of deterministic and stochastic mathematical
modelling and performance analysis for telecommunications, and more
recently computing, is available for use in constructing hierarchical models.
This chapter will loosely focus on the three-level hierarchy (involving
mainly the transport and network layers of the seven-level OSI architecture
shown in Fig. 6.1) concerning what at the corporate level has been termed
[6] the tactical and operational network planning cycles - namely, the
successively lower-level optimization problems of network topology design
(and enhancement), switching node and link capacity provision (and
enhancement), and traffic management (involving flow control, routeing,
resilience and re-routeing in the event of failures). At each level it should
be noted that the relatively stable-growth demand environment for the
telecommunications industry dictates an optimal incremental enhancement
approach to all levels of planning, management and control over the network
r------------,
host
virtual network service
virtual session
trartlc Inputs and outputs
------.J..r:;;;;;;;;;;;;;;;l
virtual link for end 10 end messages
physical link
external
site
subnet
node
network
subnel
external
node
site
r-----------t---------------------------------------------------------t-----------t"
L
.J
.J
Fig. 6.1
Open systems interconnection (OSI) seven-layer architecture (7). Each layer presents
a virtual link to the next higher layer and data-flow rates and planning lead times vary directly
with depth.
THREE-LEVEL TRAFFIC APPROXIMATION
87
life cycle. Currently, queuing models are used to describe the network, while
deterministic optimization methods are the primary tools in both design and
routeing problems. The purpose of the hierarchical models proposed in this
chapter is to perform integrated optimization with random elements
represented at each level at an appropriate level of detail. To this end, the
next section outlines recent mathematical work on successively aggregated
meso and macro approximations of micro-level queuing networks - the
traditional tool of telecommunications performance engineering. Building
on the familiar discrete-event stochastic processes of queuing theory by
appropriate rescaling of time and aggregation of events, reflected Brownian
motions and deterministic fluid flow processes are involved (or, in the case
of bursty traffic, Markov modulated fluid stochastic processes) (see Table
6.2). In section 6.3, the recent (functional) central limit theory for stochastic
processes which yields these results is briefly outlined. These ideas are applied
to some preliminary two- and three-level planning models in section 6.4. In
the final section of the chapter a few directions for future research and open
mathematical problems are outlined, whose pursuit and solution would
contribute to the applicability of the approach introduced here to integrated
network planning for future multi-point, multimedia and multi-rate
connections as discussed, for example, in Hui [8].
6.2
Following Chen and Mandelbaum [9], an open (i.e. exogenous input)

network model is considered on an arbitrary graph G(N,A) of nodes (with
infinite buffer capacity) and (directed) links, with random routeing of particles
(calls, packets, messages) from each node on its output links to each other
node (see Fig. 6.2). (Similar results are available for closed networks.)
Classical sources for queuing network models are Kelly [10, 11] and
Kleinrock [12]. More recently consideration has been given [7, 8, 13] to
the application of such a generalized Jacksonian network to performance
analysis of packet-switched networks - both switching (defined with sets
of exogenous input-only and exogenous output-only nodes) and
communication (all nodes with exogenous and endogenous inputs and
outputs) - under Poisson input processes, negative exponential service times
and the Kleinrock independence assumption (packet arrival times between
node buffers are uncorrelated).
Here, arbitrary independent identically-distributed exogenous input interarrival and service-time processes are allowed at each node assuming only
the existence of a dynamic equilibrium (long-run) limiting (process)
distribution for the system, represented by an exogenous arrival rate 'A?, a
88
Fig. 6.2
Open queuing network model with independent identically-distributed inter-event
exogenous input and potential service processes and infinite queue buffers.
(potential) service rate p.j and the switching fractions Pjk of particles that are
routed directly to node k on link (j,k) after service at node j, on each of
the J: = INI nodesj in the network. The row v~.ctor/matrix triplet (AD, p.',
P) specifies the long-run average performance 6f the system and the (total)
inflow vector A' of (total) arrival rates Aj at nodes j = 1, ... ,J is the maximum
solution of the traffic equations:
A' = Ao
+ (A' /\
... (6.1)
p.')p,
where /\ denotes co-ordinatewise minima. Since the network is open, the

routeing transition matrix P has spectral radius a(P) < 1. (A closed network
has a(P) = 1 and Ao : = 0 ' .) The traffic intensity at node j is defined
to be Pj : = A/ p.j and node j is termed a strict bottleneck if Pj> 1, a nonbottleneck if Pj < 1 and a balanced (bottleneck) node if Pj = 1. Denote the
sets of non-bottleneck, balanced and strict bottleneck nodes as ex, (3 and l'
respectively. Let:
Q' := [Q'(t):= (Ql(t), ... ,QJ(t: t~OJ
denote the (integral) equilibrium queue length process representing the

network node occupancies (buffers and servers). In accordance with intuition,
it turns out that the (meso) diffusion and (macro) fluid approximations of
this process are supported on the sets {3 and l' of bottleneck nodes.
Moreover, it may be shown that Q may be expressed in terms of the
potential net throughput (or netput) process:
X' : = [X' (t) : = (Xt(t), ... ,XJ(t) : t ~
OJ
89
which represents the difference between the equilibrium cumulative input and
potential output processes (and which, together with P, completely specifies
the system) and the equilibrium lost output (due to empty nodes) process:
Y' : == [Y'(t) : ==(Y1(t), ... ,YJ(t : t ~
OJ
Indeed, given X' and P, the pair (Y' ,Q') is the unique solution of:
Q' ==
X'
Y' (I-P)~O'
... (6.2)
(where 0' is the process which is identically zero), for which Y' is nondecreasing with Y'(O) : == 0' and has co-ordinates Yj(t)~O, when QiU) ==
0, j == 1, ... ,J. These relations may be expressed as the statements that, given
X' and P', the process Y~ 0 is the unique non-negative solution of the
abstract order complementarity problem [14] defined by equation (6.2) and:
[X'
Y'(I-P)] 1\ .:lY' == 0'
... (6.3)
where Q' is defined by equation (6.2) and Y' == E1j:s(") .:lY' (ti), Le. the sum
of the jumps .:lY(tjO of the process Vat jump epochs tj up to time t. The
lost output process Y' is termed the regulator of the queue-length process
Q' and it follows that it is the least element process satisfying the inequalities
in equation (2) and non-negativity.
Aggregating numbers of particles by the scaling, e.g. X~/fit and
accelerating time by the scaling nt, the queue length process Q' emerges in
the (almost sure) limit as reflected Brownian motion (RBM) [15] (from a
functional central limit theorem for stochastic processes) regulated by a
suitable increasing (local time) process Yand driven by a Brownian motion
X (see Fig. 6.3). Specifically, the processes (X, Y, Q) represent the asymptotic
heavy-traffic diffusion approximations to the (X, Y, Q) processes of this
queuing network as the exogenous input rates ~3', service rates j.tn' and
initial potential throughputs X~ (0) increase at rate fit as n- 00. When
numbers of particles are more heavily aggregated by the scaling, e.g. X~/n
with the same time acceleration nt, the queue length process Q' emerges in
the (almost sure) limit as deterministic fluid flow (from a suitable functional
strong law of large numbers for stochastic processes) regulated by a
deterministic increasing process Y' and driven by the deterministic expected
potential net throughput process:
X':== [X'(t):== (~'-j.t')t: t~OJ
... (6.4)
90
heavy traffic diffusion approximation

(MESO)
aggregate particles by the average

nt
accelerate time by the scaling
functional central limit theorem ~
AAA
almost surely
(X, Y,a,)
(X, Y, a,)
inc~ea~
Brownian motio!
reflected Brownian motion
deterministic fluid approximation

(MACRO)
aggregate particles by the average

accelerate time by the scaling
nt
X In /
n
>
functional strong law of large numbers

(X, Y, a,)
almost surely
.
'h
/
expected potential throug put
X:
=(A.. J.l ) (.)
6<, Y, a,)
\.~
Increasing " "

reflected fluid flow
bursty traffic approximation

(MESOMACRO BOUNDARY)
Markov modulated fluid systems
(x.Y:'0': )
Fig. 6.3
Successive approximations to discrete event (particle) processes are obtained from

functional central limit theory by averaging arrivaldeparture (particle) events on an accelerated
time scale.
In this case the aggregated processes (X, Y, Q) represent the deterministic

fluid approximations to the original stochastic processes (X, Y, Q) as
stochastic fluctuations become negligible. In the case of both the meso
diffusion and the macro fluid approximations, given X (respectively X) and
P, V (respectively Y) represents the unique solution to the analogue of
equation (6.3) and is the unique non-negative least element solution of the
analogue of the inequality equation (6.2). It is also interesting to note the
intuitive result that both queue length approximations QI and Q' are nonzero only on the sets of balanced nodes {3 and of strict bottlenecks "y. (On
"Y the fluid approximation Q' builds up indefinitely; recall buffers have infinite
capacity.)
More specifically, the diffusion approximation to equation (6.3) is given
by:
[X' + V'(I-P)] t\ d V' =0 ' ,
... (6.5)
91
where y l is the non-negative increasing local time process [15] of the

Brownian motion netput process X' with respect to the boundary of the nonnegative orthant of IR J The lost output process Y has singularly continuous
sample paths Y with dY/dt = 0 for Lebesque almost all t ~ 0, so that equation
(6.5) must be interpreted (as for stochastic differential equations) in terms
of the Ito integral as:
'00
,\ 0
[X' + Y'(I-P)]dY=O
In the fluid approximation we have analogously, but more simply:

[X' + Y'(I-P)]I\Y' = 0'
... (6.6)
where Y' (t) = t(f-t' - A' )1\0 Hence the time derivative Yj (t) of the )th coordinate of the lost output process is identically zero unless f-tj > Aj, )Ea,
when the queue at) is identically 0 due to the fact that instantaneous outflow
rate exceeds instantaneous inflow rate. Formally, to apply the theory of
Borwein and Dempster [14] the sample paths of the processes (X, Y, Q),
(X, Y, Q) and (X, Y, Q) may be considered as elements of suitable spaces of
functions of time - respectively, left-limited right-continuous functions,
continuous functions and continuously differentiable functions.
From a geometric point of view, the resulting order-complementarity
problems represent abstractly the dynamical situation in which the appropriate
queue length process Q' evolves in the interior of the non-negative orthant
of IR J exactly as the corresponding potential throughput process X'
(representing all network nodes occupied) until it hits one or more lower
dimensional faces of this cone (representing empty nodes), when the
appropriate regulating lost output process y ' acts minimally to reflect it back
(in directions dictated by the Leontief matrix [ - P [14, 16]) into the interior
of the non-negative orthant (see Fig. 6.4).
I.
node 3
II.
II.
X(O)=Q(O)
II.
II.
Q(t)=X(t)
node 2
node 1
Fig. 6.4
Geometric representation of reflected Brownian motion (RBM) for a three-node

network showing regulation at empty nodes,
92
In order to represent bursty network traffic processes on the meso-macro

interface, the analogous set-up can be applied to a Markov modulated fluid
system (X, Y, Q) in which the driving potential throughput process X involves
deterministic fluid rates AI> J-tl with independent negative exponential holding
times with means p,1 in fluid states I which at state release events are
switched between with probabilities P/(PI + ... pd, I = 1, ... ,L. More
generally, X may be taken to be a piecewise deterministic Markov process
involving an arbitrary (possibly controlled) deterministic dynamical system
punctuated by (possibly controlled) random jumps [17, 18].
6.3
CENTRAL LIMIT THEORY FOR TRAFFIC PROCESSES
The purpose of this section is to give a short intuitive explanation of some

of the technical concepts alluded to in the previous section (more details and
further references may be found in Chen and Mandelbaum [9] and Harrison
[15 J).
It should be recalled that classical central limit theory treats the variants
of three powerful results concerning sequences [Xn : n = 1,2, ...] of
independent identically-distributed random (vector) variables under suitable
technical conditions (see Fig. 6.5). The celebrated strong law of large numbers
states that their sum Xl + ... + X n converges almost surely (in the sense
of almost all realized sample paths) to the deterministic random variable nlEX
as n- 00 (where lEX denotes expectation of the common distribution);
asymptotically, statistical fluctuations of a large sum of independent random
quantities cancel out. The central limit theorem states that the sum Xl +
... + X n has asymptotically a Gaussian (or normal) distribution with mean
nlEX and (co)variance (matrix) vnvarX (where varX denotes the (co)variance
(matrix) of the common distribution). The large deviation theorem states that
for la - lEX 1 >0 asymptotically:
p(IXI
+ ... + X n
nal >O)==O(e-nl(a),
... (6.7)
where:
I(a) : = sup (ea -loglEe 8X j

8
is an increasing concave function of the (vector) parameter a and 1.1 denotes

modulus (norm). Intuitively, the probability of a rare (tail) event is decreasing
exponentially in the parameters n and a and occurrences of the rare event
in the discrete time sum process [Xl + ... + X n : n = 1,2, ...] look like a
Poisson process for large n.
CENTRAL LIMIT THEORY FOR TRAFFIC PROCESSES
functional
theory
classical
theory
MICRO
{X n} n = 1,2,
{X n} n = 1,2,...
independent identically
distributed random variables
EX
=J.!
var X
n-oo
93
continuous time stochastic

processes
common mean function
EX (t) =J.!t
(t ~ 0)
=(J2
central limit theorem
"-
drift
almost all sample paths
MESO
a.s. \
X 1+..+X n -
2
BM(J.!, (J )
(J.! ' (J2) Brownian motion
extreme value
I =0 (e-nl(a))
P {X 1 + ... + Xn>na}
extreme values of BM (J.!,(J2)

negative exponentially
distributed with Poisson
occurence
I (a) := sUPe{ea - log Ee ex}
n-oo
MACRO
Fig. 6.5
strong law of large numbers
~dete@~
a.s.
nEX
X 1 + .... + Xn
a.s.
nEX
Central limit theory for traffic processes.
Functional central limit theory replaces the independent identically

distributed sequence X of random variables by a sequence of not necessarily
independently distributed continuous-time stochastic processes X n : =
(Xn(t) : t ;?: OJ with common mean function IEXn(t) : = (IEXn(t) : t ;?: OJ
to conclude similar asymptotic results for almost all sample paths of
the processes as n-oo (see Fig. 6.5). Here, for the superposition Xl + ...
+ X n of the underlying processes the mean function nIEXn is the asymptote
and Brownian motion - a Gaussian stochastic process - plays the role of
94
the Gaussian distribution in the central limit theorem. The large deviation
result (equation (6.7)) is represented by the negative exponential distribution
of the extreme values of the asymptotic Brownian motion and the Poisson
process nature of their occurrence over time.
A standard (vector) Brownian motion (or Wiener) process W has
continuous sample paths, a Gaussian state distribution N(O,tI)) for W(t),
t ~ 0, and stationary independent increments, i.e. the distributions of the
increments W(t)- W(s) depend only on t-s and the random (vector)
variables:
are independent for any n ~ 1 and 0 $ to < ... < t n < 00. The sample paths of
the Wiener process, although continuous, are extemely erratic (technically,
they are of unbounded variation and at almost all points of time they do
not possess a derivative). A process X is a (vector) Brownian motion with
drift IJ- and (co)variance (matrix) (J if it has the form:
X(t) = X(O)
IJ-t
(JW(t)
... (6.8)
where X(O) is independent of the Wiener process W. The properties of

continuous sample paths and stationary independent increments characterize
the Brownian motions, so that their Gaussian nature is a consequence of these
properties. More generally, a diffusion satisfies a generalized differential
version of equation (6.8) given by the formal stochastic differential equation:
with X(O) as in equation (6.8). A diffusion is a Gaussian (Markov) process

but no longer has independent increments due to the dependence of the drift
and variance on the current state of the process.
6.4
SOME SIMPLE HIERARCHICAL NETWORK PLANNING

MODElS
This section briefly outlines an approach to 3-level hierarchical stochastic

optimization models for the network design problem mentioned in section
6.2 - involving, successively, network topology design, link capacity
provision and routeing - in a (temporarily) stable random demand
environment. Such models take the general form:
HIERARCHICAL NETWORK PLANNING MODELS
95
... (6.9)
Here Xl is a (usually exponentially large) set of integer vectors Xl

representing possible network link allocations at total network implementation
cost g(Xl) on a geographically located set of potential network nodes and
X 2 is a similar set of integer vectors X2 representing the amount of capacity
allocated to the links of the network topology specified by the topology design
parameters Xl at cost g(Xl,X2)' The random vector d represents the stationary
state distribution of demands for traffic bandwidth between point-to-point
(potential) node pairs in the network over the planning period, which are
served by bandwidth allocated at the third level, and h: = h(Xt>X2' d) is the
random cost of unserved requests with network topology specified by Xl and
link capacities specified by X2' Random cost variables and even randomly
varying link-capacity allocations make no essential conceptual change to the
model (6.9).
A practical private line network planning problem involving requests for
DSI 1.5 Mbit/s capacity bandwidth has been treated by the Bell System as
a special case of the second and third levels of the model (6.9) under
simplifying assumptions [19]. The topology of the centrally-controlled
switched transport network is fixed in terms of remotely programmable digital
crossconnect systems (DCS), add-drop multiplexers (ADM) and a centralized
bandwidth manager which controls the crossconnects of these devices. A
connection is a OS 1 signal successfully transmitted between an origin and
destination DCS switch pair, and, when a customer makes a request for
bandwidth, the bandwidth manager performs the routeing function of seeking
and allocating a sequence of DCS switch connections to serve the demand.
Whenever a request is successfully routed, bandwidth is allocated to it until
a disconnect request is received. From time to time the bandwidth manager
will re-route ongoing calls for current network utilization efficiency.
Specifically, representing the installed network topology by the directed
graph G(N,A) with (switch) nodes n EN and (transport) links aEA, the second
and third level implementation of the three-level model is:
min
LE[h(x,d)]
... (6.10)
X2:0
S.t.
EXa -s.b,
aEA
where h represents the total number of unserved requests, b is the total

capacity in DSI units which can be allocated across network links, and the
natural integer restriction on X has been dropped in view of the high value of b.
96
The third level model is a version of the stochastic multicommodity

network flow problem of the form:
h(x,d) : = min E
Sw
wEW
s.t. E f p $. Ca +
pEQ.
pEP w
fp +
aEA
Xa
Sw =
dw
wE W
... (6.11)
... (6.12)
Here the (routed) flow f p is the random stationary state number of OSl
connections routed by the bandwidth manager using path pEPw' the set of
allowable paths (restricted to three in the application) associated with the
origin-destination (00) (node) pair wE W, Qa is the set of paths utilizing link
aEA and Sw is the random number of unserved OSl requests from the total
random demand dw associated with the pair w. The inequality (6.11) is the
(almost sure) capacity constraint involving the current embedded link
capacities Ca and the second stage decision variables of additional allocated
capacities X a , aEA, while equation (6.12) represents the demand constraints
with (almost sure) non-negative slacks sw' wE W, which drive the entire
planning process.
Although not stated this way, the demand d vector is modelled as a
stationary Markov modulated fluid state variable with equiprobable
independent rates on each link. Rate estimates come from Kalman filtering
of actual network traffic and involve 5-10 rates on 10000 pairs, leading
to an astronomical number 5 100_10 100 of network demand states. This is
beyond the range of current (and probably future) numerical algorithms [20]
for solving explicitly the complete certainty equivalent form of the two-stage
recourse problem (6.10) - even when the requirement of integral flows is
dropped. Hence, an iterative algorithm, termed stochastic decomposition
[21], combining Benders' decomposition with network state sampling, has
been employed to solve problem (6.10) in the sense of providing tight
confidence bounds (see also Oantzig and Infanger [22]) on expected unserved
requests. This solution was also validated by dynamically routed simulations,
all in reasonable computing times on contemporary UNIX workstations.
On the other hand, the full three-level model is a new order of
computational difficulty, even with continuous variable assumptions, without
considerable further simplifying analysis, which remains to be done.
97
Advance in this direction is contained in a recent proposal [23] for

capacity, virtual path and virtual channel allocation in an asynchronous
transfer mode (ATM) network. The ATM protocols - currently under
development (see Hui [8, 24] and Medova [23] for more details and further
references) - implement broadband integrated services digital network (BISDN) requirements in support of a wide range of audio, video and data
applications in the same network (see Fig. 6.6). Information transmitted in
an ATM network is packetized into 48-byte cells possessing an additional
5-byte header which contains virtual path and virtual channel identifier
information. Network functions may be hierarchically layered upwards from
the cell layer at increasingly slower time scales (see Fig. 6.7). Call admission
and traffic management in an ATM network is expected to exploit statistical
multiplexing and is associated with some form of user contract. A particular
feature of ATM networks is a committed grade of service (GoS) per
connection, regardless of what other traffic is being carried, so that a
connection appears as a dedicated circuit established at call set-up stage
between an origin-destination (switch node) pair. Call set-up can be
implemented by the assignment of a virtual path and the allocation of a
number of virtual channels of fixed nominal bandwidth. Actually, of course,
a call requires, at peak, at least its total assigned bandwidth, and statistical
multiplexing across calls of various types efficiently utilizes total link
capacities.
It is therefore not surprising that a more complex version of the previous
model involves chance or probability constraints of the form:
... (6.13)
where gw (: = 10- 3 say) is the probability of random cell flow Iw : =

EpEP w /p from all calls routed along allocated virtual paths pEPw between the
OD pair wE Wexceeding the effective bandwidth C w: = Cw(N~lI ,13~all)' Here,
Nc~1I is the maximum number of calls allowed by the network bandwidth
manager between an OD pair w in light of the requested traffic intensity
P~all : = "'callittcall in order to maintain gw and similar, but increasingly
stringent, GoS parameters gcal" gburst and gcell at lower network layers in
terms of call blocking, burst blocking and cell-loss probabilities respectively
[23,24].
The calculation of Nc~1I and the corresponding average effective
bandwidth per call Ccall : = Cwl Nc~1I to maintain the GoS probability gw
(and similar calculations at lower network layers) can be effected using
generalizations to suitable stochastic processes X of the large deviation result
98
network management centre
ATM network
customer premises
sources
muhiplexer
voice
data
video
voice
data
video
Fig. 6.6
ATM network.
layers
path
....
call
burst
cell
Fig. 6.7
jI
'
",.
0
0
C1
..............
0
h
"""n
traffic
state
GoS
"path
Npath
9path
"call
Ncall
9call
"burst
Nburst
9burst
"cell
Ncell
9cell
Hierarchical network layering by timescale in an ATM network [22].
(equation (6.7 [8]. One particularly simple proposal [24] involves a

3-layered Poisson-modulated Poisson process in which the state of a given
network layer at a fixed timescale is a Poisson process depending on the state
of the next higher layer, assumed to be quasi-static due to its slower timescale
(compare Fig. 6.9). This allows the use of the inverse of the standard Eriang
blocking formula to produce maximal states and corresponding effective
bandwidths (in say ATM cells/s) at each level. The method can be adapted
to heterogeneous Markov-modulated fluid call-type traffic processes by fitting
suitably scaled Poisson processes to the first two moments of the resulting
overall traffic processes at each network layer [8].
It follows from the above that, in order to maintain GoS at all levels of
the network, it suffices to replace total random cell flow fw and inequality
(6.13) by a deterministic number fw of virtual channels which meets the
effective bandwidth C w : = EpEPw Cp requirement (in virtual channel cell
99
capacity units) between the OD pair wE W. The result is a deterministic twostage planning model with second stage a classical multicommodity flow
problem involving link provision costs f3 a, rEA, and OD pair revenues rw '
wE W, namely:
min
s. t.
0:
aEA
- E
f3aCa
E f p)
... (6.14)
aEA
... (6.15)
Cw
wEW
... (6.16)
pEPw ,
wEW
... (6.17)
wEW
rw
E f p $Ca
pEQa
E fp
pEP w
fp
;::::
pEP w
When the number of virtual channels f p assigned to a virtual path p is

taken to be integral - given the effective bandwidths C w , wE W - the
second stage is a classical NP-hard multicommodity flow problem (unlikely
to be solved exactly in less than a time exponential in the number of network
nodes IA I and the number of OD pairs I Wi). On the other hand, when the
integral flow requirement is relaxed - as in the previous problem and as
is appropriate to the nominal nature of virtual channel capacities - the
relaxed problem becomes an easily solved linear programme for fixed effective
bandwidths Cw , wE W. The size of this deterministic equivalent (6.14)-(6.17)
of the original chance-constrained programme is very small relative to that
of the recourse formulation (equations (6.10)-(6.13 of the previous problem.
Expressions (6.14)-(6.17 define a linear programme of about 200 constraints
and 500 variables for the three-path 100 OD pair problem.) Of course, an
added computational overhead to this problem is the (off-line) calculation
of effective bandwidths C w for each set of OD traffic intensities Pw' wE W.
Since the number of edges and OD pairs is 0(n 2) for an n node network,
fast decomposition heuristics for solving problem (6.14)-(6.17) could be useful
for large networks and real time implementation. These iterate between a
determination of link capacities C a for fixed path flows f p , using shortest
path techniques, and a solution of the maximal revenue second stage
multicommodity flow problem for virtual path and virtual channel allocation
f p using the network simplex method to achieve the co-ordinated solution
[21]. For example, when using problem (6.14)-(6.17) for real time call
admission by the network bandwith manager, as a call-assigned effective
bandwidth Ccall between an OD pair wE W clears down, this bandwidth
would be available for one or more similar calls utilizing the same path
pEPw ' From time to time within a fixed traffic period, however, network
traffic rebalancing involving new calls (and possibly even calls in progress)
100
would be required by rapidly resolving problem (6.14)-(6.17) starting from

the solution to the previous instance. In practice, the existence of multiple
traffic periods over the network management cycle would require periodic
resolution of problem (6.14)-(6.17) to determine maximal revenue virtual path
and virtual channel assignments for traffic between all OD pairs - as would,
of course, network failures.
It is perhaps worth noting in conclusion that two-level hierarchical
optimization problems similar to expressions (6.14)-(6.17) have been
considered in Labourdette and Acampora [25] and Medova [26] in the
context of wavelength channel assignment and routeing in gigabit per second
lightwave networks with wavelength division multiplexing. At the first stage
of this flow and wavelength assignment problem [26] the virtual topology
of the network is set for a given traffic specification by the allocation of single
wavelength direct channels between a limited number of OD pairs. At the
second stage, all traffic is routed between OD pairs so as to minimize the
largest flow over the virtual links utilized. The network balancing optimal
flow assignment involves wavelength changes (hops) at transit nodes on virtual
routes between lower traffic level OD pairs. These considerations are
independent of the network physical fibre topology, which is expected to be
in the form of a bus, ring or coupled star employing optical transmitter/
receiver pairs at each physical node.
6.5
CONCLUSIONS AND FUTURE DIRECTIONS
This chapter treats two topics which are - at least in the author's view closely related. The first is a practical concern with the use of three-level
hierarchical models for integrated network planning; the second is a
mathematical concern with aggregating the flow of discrete network events
for use with more appropriate models at earlier higher levels of the planning
process, and providing a justification for the use of deterministic flow models
for network design.
Three-level hierarchical stochastic optimization models could help to aid
and understand, as an integrated whole, piecemeal complex computer-based
planning and management systems for future networks. This has been
tentatively demonstrated by the models of the previous section.
Progress towards this lofty goal would be aided by rigorously extending
the results of section 6.3 to queuing networks with finite node buffer
capacities, when problem (6.2)-(6.3) becomes the order complementarity
problem:
REFERENCES
101
Q' A A Y' = 0'

[B' -Q']A A Z' =0'
O'~Y'
O'~Z'
O'~Q/:=[X'+(Y'-Z')(/-P)]~B',
where B' is a constant process representing fixed node capacities and Z'
represents the buffer overflow loss process. An optimization problem on a
single node for such a system is studied in Harrison [15].
Progress would also be aided by an extension of the model (6.9) or (6.10)
of section 6.4 to incorporate a dynamic third stage allowing non-stationary
network demand processes to illuminate network capacity expansion
planning. Efficient process path sampling and numerical optimization
procedures based on nested Benders' decomposition have yet to be designed
for such models, but progress in efficient simulation of diffusion processes
[27] is relevant to this endeavour.
In conclusion, it is clear that the application to telecommunications
network planning of multilevel stochastic optimization models is mathematically and computationally challenging. Hopefully, this chapter has also
indicated their potential as practical aids to future network planning problems
in the industry.
REFERENCES
1.
Anthony R N: 'Planning and control systems: a framework for analysis', Harvard

U Press (1965).
2.
Woods L C: 'The thermodynamics of fluid systems', Oxford U Press (1975).
3.
Dirickx Y M I and Jennergren L P: 'Systems analysis by multilevel methods:

with applications to economics and management', Wiley, New York (1979).
4.
Dempster M A H: 'A stochastic approach to hierarchical planning and

scheduling', in Dempster M A H, Lenstra J K and Rinnooy Kan A G H (Eds):
'Deterministic and stochastic scheduling', Reidel, Dordrecht, pp 271-296 (1982).
5.
Dempster M A H, Fisher M L, Hansen L, Lageweg B, Lenstra J K and Rinnooy

Kan A G H: 'Analytical evaluation of hierarchical planning systems', Operations
Res, 29, pp 707-717 (1981).
6.
Sharma R L: 'Network topology optimization: the art and science of network

design', Van Nostrand Rhienhold, New York (1990).
7.
Bertsekas D P and Gallager R G: 'Data networks', Prentice-Hall, Englewood

Cliffs, NJ (1989).
8.
Hui J Y: 'Switching and traffic theory for integrated broadband networks',

Kluwer, Norwell, Mass (1990).
9.
Chen H and Mandelbaum A: 'Stochastic discrete flow networks: diffusion

approximations and bottlenecks', Annals of Probability, 19, pp 1463-1519
(1991).
-
102
10. Kelly F P: 'Reversibility and stochastic networks', Chapter 8, Wiley, New York
(1979).
II. Kelly F P: 'Loss networks', Ann Appl Probability,
-.L,
pp 319-378 (1991).
12. Kleinrock L: 'Queueing systems', Vols I and 2, Wiley, New York (1975).
13. Molloy M K: 'Fundamentals of performance modelling', Macmillan, New York
(1989).
14. Borwein J M and Dempster M A H: 'The order complementarity problem', Maths
of OR, 14, pp 534-554 (1989).
15. Harrison J M: 'Brownian motion and stochastic flow systems', Wiley, New York
(1985).
16. Chen H and Mandelbaum A: 'Leontief systems, RBVs and RBMs', in Davis
M H A and EIliott R J (Eds): 'Applied stochastic analysis', Gordon and Breach,
New York, pp 1-43 (1991).
17. Davis M H A: 'Piecewise-deterministic Markov processes: a general class of nondiffusion stochastic models', J Royal Stat Soc, B46, pp 353-388 (1984).
18. Dempster M A H: 'Optimal control of piecewise deterministic processes', in Davis
M H A and Elliott R J (Eds): 'Applied stochastic analysis', Gordon and Breach,
New York, pp 303-325 (1991).
19. Sen S, Doverspike R D and Cosares S: 'Network planning with random demand' ,
Tech Report, Systems and Industrial Engineering Dept, University of Arizona
(December 1992).
20. Dempster M A H and Gassmann H I: 'Computational comparison of algorithms
for dynamic stochastic programming', Submitted to ORSA J on Computing.
21. Higle J L and Sen S: 'Stochastic decomposition: an algorithm for two-stage linear
programs with recourse', Maths of OR, 16, pp 650-669 (1991).
22. Dantzig G Band Infanger G: 'Large scale stochastic linear programs: importance
sampling and Benders' decomposition', Tech Report SOL91-94, Dept of
Operations Research, Standford University [to appear in Ann of OR] (1991).
23. Medova E A: 'ATM admission control and routeing', Internal BT technical report
(December 1993).
24. Hui J Y, Gursoy M B, Moayeri N and Yates R D: 'A layered broadband switching
architecture with physical or virtual path configurations', IEEE J on Selected
Areas in Communications, .2.-, pp 1416-1426 (1991).
25. Labourdette J-F P and Acampora A S: 'Logically rearrangeable multihop
lightwave networks', IEEE Trans Comms, 39, pp 1223-1230 (1991).
26. Medova E A: 'Network flow algorithms for routeing in networks with wavelength
division multiplexing', Proc lith UK Teletraffic Symp, Cambridge (1994).
27. Newton N J: 'Variance reduction for simulated diffusions', Tech Report, Dept
of Electronic Systems Engineering, University of Essex (1992).
7
GRAPH-THEORETICAL
OPTIMIZATION METHODS
E A Medova
7.1
TELECOMMUNICATIONS NETWORKS AND GRAPHTHEORETICAL MODElS
Communications networks of any kind - from early telegraph and circuitswitched telephone networks to future integrated broadband networks - are
represented most naturally by a graph G(V,E), where vertices, or nodes, of
Vare essentially switches (telephones or computer terminals) and the edges
or arcs of E are the transmission links. Classification of networks, for example
into local area networks (LANs), metropolitan area networks (MANs) or
wide area networks (WANs), will result in a change of the technical definitions
of network nodes and their geographical coverage, but the graph representation preserves the concepts of 'interconnectivity' and 'reachability' in terms
of existing paths leading from anyone node to any other node. This is the
precise reason why graph-theoretical methods are of great importance for
design and routeing in telecommunications networks.
Graph theory has its own extensive vocabulary which differs slightly from
author to author. A knowledge of this theory is important since solutions
of graph problems based on intuition can be misleading and a slight change
of graph structure can turn a problem to one that is computationally
intractable. Although there have been many applications of graph theory
to network design and analysis over a long period, probabilistic analysis and
Erlang traffic theory prevails over it as a basic tool because of tradition and
the educational background of communications engineers.
104 GRAPH-THEORETICAL METHODS
The intention of this chapter is to introduce a minimum number of

definitions and to briefly review some of the principal concepts and graph
models which are directly applicable to the solution of practical
communications problems.
Formally, a graph G(V,E) consists of two sets of objects called nodes
(or vertices) and links (or edges), given as an unordered pair of nodes. For
a directed graph an arc (i,j) is viewed as an ordered pair. An arc (i,j) is called
outgoing from node i and incoming to node j and is to be distinguished from
the pair (i,i). A vertex i is adjacent to a vertex j in the vertex set V(G) if
(i,j) is an edge in the edge set E( G) and the edge (i,j) is incident with the
vertices i and j. The standard algebraic representation of a graph is given
by either the IV I by Ivi vertex adjacency matrix or the Ivi by IE I vertexedge incidence matrix, where, for example, Ivi denotes the number of
vertices.
The adjacency matrix of a directed graph is given by (see Fig. 7.1):
e.. . =
IJ
I if Vi and Vj are adjacent

0 otherwise
V':8r
e2
e3
V4
V3
e4
eS
Fig. 7.1
V, [0
V1
V2
V3
M: = V2 0
V3 0
V4 0
V4
i]
(set mjj = 0)
Adjacency matrix of a directed graph.
The node-arc incidence matrix of a directed graph is given by:
mij : =
1 if node ~ is starting point of link ei

1 if node Vi is terminal point of link ei
o otherwise
For the graph of Fig. 7.1 the incidence matrix is given by:
TELECOMMS NETWORKS MODELS
M:=
VI2
V
V3
V4
el
e2
e3
e4
-1I
0
0
1
0
-1
0
0
1
-1
0
0
1
0
-1
105
es
Jl
The dimensions of a graph are given by the cardinality I vi of V, called

the order of G and the cardinality lEI of E, called the size of G. The degree
of a vertex i is the number of edges that are incident to i. A graph of order
N in which every vertex is adjacent to every other vertex is called a complete
graph. In a complete graph every vertex has the same full degree. A graph
in which every vertex has the same, not necessarily full, degree is called a
regular graph.
We define a path from a vertex i in G to a vertex} in G as an alternating
sequence of vertices and edges. A path is called simple if it contains no
repeated edges and no repeated vertices. A cycle is a path for which the start
and end vertices are the same.
A graph that contains no simple cycle is said to be acyclic. A graph is
connected if for each pair of vertices i and}, there is a path starting at i and
ending at}. A tree is a connected graph with no cycle. A subgraph which
is a tree, and which contains all the vertices of a graph, is called a spanning
tree of the graph.
As the complexity of modern telecommunications networks grows, their
representation will require new tools and three-dimensional graphics. An
example is the tiered hierarchical structure, shown in Fig. 7.2 [1]. It can
be seen that various special graphs occur repeatedly as basic elements of any
complex network configuration at different levels of presentation.
Two basic network configurations are of interest in the modern
telecommunications environment:
switching networks (see Fig. 7.3), Le. open acyclic (no cycles) networks
with N input nodes, N output nodes and at least N log N internal nodes;
communications networks (see Fig. 7.4), i.e. closed networks on a total

number of N nodes with a bounded node degree (number of links
connected to a node).
Fig. 7.2
Fig. 7.3
Logical tiered hierarchical network [I).
Two examples of a four-path 16x 16 Omega network:

(left) disjoint path network and its redundancy graph, and
(right) non-disjoint path network and its redundancy graph.
TELECOMMS NETWORKS MODELS
(a)
(b)
107
(c)
Fig. 7.4
Some possible topologies for communications networks:

(a) irregular mesh, (b) fully connected mesh, (c) ring.
Traditionally, a switched network is represented by the crossbar

representation (Fig. 7.5(a. Here the crosspoint (node) indicates the switch,
and incoming and outgoing arcs are the input and output terminals. Another
model for a switching network is a bipartite graph (Fig. 7.5(b. The vertex
set V for such a graph is the disjoint union of sets VI and V2 , and every edge
in E has the form (vI> v2), where VI belongs to VI and V2 belongs to V2. In
combinatorial optimization the problem of finding a minimum (maximum)
cost connection between sets VI and V2 is called the assignment problem.
In the bipartite graph representation of a switching network, input and output
terminals are nodes and potential switching is represented by an arc.
A matching M on a graph O(V, E) is a set of edges of E(O), no two of
which are adjacent. A matching determines a regular subgraph of degree one.
For rearrangeable (point-to-point) switching with the single path property,
each switching state may be represented as a matching subgraph of the
switching graph (Fig. 7.5(c, i.e. each vertex has degree one in the bipartite
subgraph representation. For example, this representation is used in the design
of multistage interconnected networks with the self-routeing property, which
do not suffer from the time and space bottlenecks existing in conventional
centrally controlled switching architectures [2].
Another type of network topology is a ring, particularly important for
local and metropolitan area networks. At present the ring topology is being
deployed in SONET (synchronized optical network) with much stress on its
survivability, i.e. ability to maintain all communications in the event of office
(switch) failure [3] . One class of graphs that solve the minimum cost reliable
network design problem for ring networks are the so-called circulants, which
we will define later (see Fig. 7.4 (b) and (c.
11
11
12
12
13
In
On
13
1
2
3
(a)
(b)
11
12
13
In
On
Fig. 7.5
7.2
Switching network representations.
MODELS FOR RELIABILITY AND RESILIENCE OF

TRANSMISSION NETWORKS
In studying the vulnerability of a communications network due to various

failures, the basic concepts of connectivity are defined in terms of the vertex
connectivity and edge connectivity, and the relation between the connectivity of a graph and the number of disjoint paths that exist between pairs
of vertices in the graph.
The vertex connectivity of a graph G is the minimum number of vertices
whose removal results in a disconnected or trivial graph. A connected induced
subgraph of G of maximal order is called a component of G. Thus, a
connected graph consists of a single component. A vertex whose removal
increases the number of components in a graph is called a cut-vertex.
Analogously, the edge connectivity of a graph is the minimum number
of edges whose removal results in a disconnected or trivial graph. An edge
whose removal disconnects the graph is called a bridge. Menger's theorem
and many other variations and extensions of this famous result show that
NETWORK FLOW MODELS
109
connectivity of a graph is related to the number of disjoint paths between

distinct points in the graph [4].
The disjoint paths problem is, given a graph G and p pairs of vertices
of G, to decide if there are p mutually vertex-disjoint paths of G linking the
pairs. If p is part of the input of the problem then this is one of the
computationally intractable problems, even when G is restricted to be planar.
If p is fixed, however, it is more tractable, and for p = 2 there is a simple
algorithm [5]. In practice, it is conceptually important to know which
problem would lead to search procedures growing exponentially with the size
of the problem, Le. which are effectively computationally intractable for large
networks.
Both problems - to calculate the connectivity of a graph, or to define
disjoint paths between vertices which are guaranteed by a given level of
connectivity - are special topics of research in graph theory. For illustration,
see McHugh [6], where an example of a connectivity model for reliable
transmission between processors of a distributed communications system is
given in the form of a 'Byzantine general problem'.
For a graph of connectivity one, 'depth first search' is used to find the
connectivity and to identify the connecting paths. For higher levels of
connectivity these problems are generally solved by using 'maximum flow
algorithms' which are described next in the context of routeing problems.
7.3
NETWORK FLOW MODELS
A principal consideration in the design of the network topology represented

by the network-directed graph concerns optimal or near-optical performance
of the network in terms of message flow.
Of particular relevance are models, problems and algorithms concerning
(message) flows along links of a network graph between specific nodes. Such
models are referred to collectively as (network) flow models. In (open)
switching networks, input nodes act as message sources and output nodes
as message sinks. In closed communications networks each node may act
as a message source or sink or as an internal (trans-shipment) node through
which messages flow to other nodes (possibly after temporary storage in a
buffer).
To formulate models for optimal design and routeing of telecommunications networks, firstly the chosen performance measure, which is usually
associated with traffic congestion, must be specified. Traffic congestion can
be quantified in terms of statistics of the arrival processes of the network
buffer queues. These statistics determine the distributions of queue length
and waiting time at each link. For example, for a data network, quantity
of service is usually assessed in terms of throughput of the network measured
in terms of the average number of 'packets' transmitted per unit time and
quality of service is measured in terms of the average delay per packet. The
basic underlying quantities are of course random variables whose averages
and other statistics are used for performance assessments. Analytical
expressions for such measures are usually not accurate and are very difficult
to use in optimization models.
Stochastic optimization is a challenging area for research, with very
interesting applications to telecommunications, as for example in dynamic
alternative routeing (DAR) [7,8,9] and the work on private networks of
Higle and Sen [10, II].
An alternative is to use deterministic optimization models and to measure
performance on a link in terms of (perhaps a fixed factor times) the average
traffic carried by the link, with the implicit stationarity assumption that the
statistics of the traffic entering the network do not change over the time period
being studied. This assumption is adopted here and the formulation of flow
models as in Bertsekas and Gallager [12] is described.
The traffic arrival rate h is called the flow on link (i, j) expressed in
data-units/sec, where the data-units can be bits, packets, messages, etc. The
objective function to be optimized is of the form:
E D(
...)
Ij Jij
(i,j)
... (7.1)
where Dij is a monotonically increasing cost function. If the system

behaviour satisfies the Kleinrock independence approximation and Jackson's
theorem, then each link cost function may be expressed as:
... (7.2)
where qj is the transmission capacity of link (i, j) measured in the same

units as /;j and d ij is the processing and propagation delay. Another cost
function frequently used is:
D..(
...) = max [Jij
"/e]
Ij Jij
Ij
... (7.3)
Le. maximum link utilization.

To formulate the optimal routeing problem the following data is required:
W the set of all origin-destination (00) pairs w = (i, j) of distinct nodes i
and j;
P w the set of all directed paths connecting the origin and destination nodes
of the 00 pair w;
NETWORK FLOW MODELS
Xp
III
the flow (data-units/sec) along path p;
rw the (stationary) input traffic arrival rate, measured in data-units/sec,

entering the network at node i and destined for node j.
The routeing objective is to divide each rw among the many paths from
origins to destinations in such a way that the resulting total link-flow pattern
minimizes the cost function. Figures 7.6 and 7.7 illustrate this model.
Fig. 7.6
Example of routeing for WI = !1.6] 00 pair origin for aD pair w2
lr
= 11.4.5.6].
optimum path P w
I
w2
destination
for aD
pair w1
destination
for aD
pair w2
Fig. 7.7
Example of routeing for two 00 pairs -
two 'commodity' flows [(I.6).{2.5)J.
Therefore the optimization problem can be written as:

minimize E D ij (E
(i,i)
subject to E
x p)
... (7.4)
all paths p .
containing (I,j)
Xp =
rw for all wE W
pEPw
Xp ~ 0
for all pEPwand wE W.
Early formulations of flow models were given by Kleinrock [13] as

the 'capacity assignment', 'flow assignment', and 'capacity and flow
assignment' problems for the design of networks. These results and the
discussion of their application to the Arpanet still represent the most
significant work in this area, with much subsequent work along the same
lines. In the above formulations, nonlinear optimization techniques are
used on networks with a mesh topology (assumed without loss of
generality since non-existent arcs may be assigned very high costs).
Network optimization problems related to cost minimization or profit
maximization are often linear optimization problems. A linear program
is the problem of minimizing or maximizing a linear function subject
to linear constraints, where these constraints may include both inequalities
and equalities and the unknown variables may include both non-negative
variables and variables that are unrestricted in sign. When all variables
and constraints are required to be integers, the problem is called an integer
program. The linear programming specifications of the most important
problems for network design and routeing are given below (detailed
explanations can be found elsewhere [15-19]).
Trans-shipment problem - the problem is to determine a network flow

of minimal total arc (link) cost which satisfies conservation of flow (i.e.
inflow + supply = outflow + demand) at each node (see Fig. 7.8).
N(V,A)
Maximum flow problem (MF) - for the single commodity flow problem
consider the following notation:
NETWORK FLOW MODELS
113
iij
amount of flow from node i to node i

unit cost of flow from i to i
aj amount of flow required at node i:
Cjj
aj > 0
supply node
demand node
aj = 0 trans-shipment node.
aj < 0
A balanced network requires Eaj = 0, i.e. total supplies meet total

demands, and a flow of maximum amount is to be determined.
For the network given by a directed graph G(N,A), the next problem is
a generalization of MF.
Minimum cost-flow problem -
min E cIjJjj
f'..
(i,j)EA
s.t. E iij
j
- E fji
j
aj, iEN (flow conservation)
(bounded link capacity)
iij ~ 0
(integer),
(i,i) EA
... (7.5)
A solution is being sought for the constraints which will yield an extreme
value (minimum) of the objective (cost) function. When all costs Cjj are set
to -1, the problem becomes equivalent to MF.
The main idea of the primal cost improvement solution method [14] is
to start with a feasible flow vector and to generate a sequence of other feasible
flow vectors, each having a smaller primal cost than its predecessor. If the
current flow vector is not optimal, an improved flow vector can be obtained
by pushing flow along a simple cycle C with negative cost, where C+ and
C- are the sets of forward and backward arcs of C. The simplex method
[20] for finding negative cost cycles is the most successful in practice and
it can also be used to give the proofs of important analytical results concerning
graph algorithms for network flow problems.
It can be shown that a basic feasible solution B of the flow conservation
constraints corresponds to a subgraph NB which is a spanning tree of the
network represented by G. This is the principal result which relates the simplex
method of linear programming and graph-theoretical algorithms.
The network simplex method can in fact be used to solve a variety of
optimization problems such as assignment, transportation (both special cases
of trans-shipment involving bipartite graphs), and capacitated network flow
problems (a feasible flow is bounded by the capacities of the links, which

are represented by the additional capacity constraints). This method can also
be used to solve the shortest path problem and the maximum flow problem.
Other problems of practical importance are flow problems between
multiple pairs of origin-destination nodes.
Multiterminal maximum flow problem - this is the problem of finding the
maximal flows between all pairs of nodes in a network. Usually the solution
is obtained for one pair of nodes at a time. This leads to ( ~) MF solutions
and, for real time applications, needs good heuristics.
Multicommodity flow problem (MFP) - consider the global flow f and the
flow iij on each arc (i,i):
E };j
i=\ j=\
... (7.6)
where r is the number of commodities (corresponding to origin-destination

(00) pairs in a telecommunications network application). Let b~ be the
supply/demand vector of commodity k at node i. Then the problem becomes:
r
min E
c~J~j
k = \ (i,j)EA
S.t. E
f~j
[j :(i,j)EA)
r
E f~j
k=\
E f~i
[j :(i,j)EA)
Uij
b~ for all i,k (commodity flow conservation)
for all (i,i) (bounded total link capacity)

... (7.7)
The MFP belongs to the class of problems for which exact solutions in
integers are believed to be computationally unfeasible for large networks.
A standard heuristic uses linear programming to solve the problem in real
numbers and then adjusts the solution found to get an approximate integer
solution to the original problem [18]. A new heuristic procedure [21] has
been developed in the context of the optical network design problem using
the best known polynomial algorithms from Simeone et al [22].
Network design - any of the above problems can be modified to incorporate
a network design objective by adding the constraints:
NETWORK FLOW MODELS
115
E f~j::5
k=l
UijYij
for all (i,j)
... (7.8)
where Yij is a 0-1 variable which represents whether or not a link (i,j) is to
be included in the network with corresponding cost term qjYij'
When any of the above problems has a suitable special structure, a large
number of efficient non-simplex algorithms have been developed for solutions
of each particular problem. Non-simplex methods may often be classified
as either greedy methods or dynamic programming.
A greedy method works in a sequence of stages, considering one input
at a time. At each stage, a particular input forms part of an optimum solution
to the problem at hand. This is done by considering the inputs in an order
determined by some selection procedure which mayor may not be in terms
of the objective (cost) function of the problem. In some cases the greedy
algorithm generates a sub-optimal solution.
A well-known greedy algorithm is the Kruskal algorithm for finding
minimum spanning trees. Interest in spanning trees for networks arises from
the property that a spanning tree is a subgraph 0' of a (nondirected) graph
o such that V(O')= V(O) and 0' is connected with the smallest number of
links. If the nodes of 0 represent cities and the links represent possible
(bidirectional) communications links connecting two cities, then the minimum
number of links needed to connect n cities is n - 1. The spanning trees of
o represent all feasible choices. In practical situations, the links will have
weights assigned to them, e.g. the length of the link, the congestion on the
link, or the cost of construction of the link, etc. The design problem is to
select a set of communications links that would connect all the specified cities
and have minimum total cost or be of minimum length. Therefore the interest
here is in finding a spanning tree of 0 with minimum 'cost' (suitably
interpreted). A greedy method to obtain a minimum-cost spanning tree builds
this tree edge by edge. Kruskal's algorithm uses the optimization criterion
for choosing the next edge in the solution by considering the edges of the
graph in non-decreasing order of 'cost'.
Dynamic programming is another algorithm design method that can be
used when the solution to the problem at hand may be viewed as the result
of a sequence of decision stages. For some problems, an optimal sequence
of decisions may be found by making the decisions one at a time and never
making an erroneous decision. This is true for all problems (optimally)
solvable by the greedy method. For many other problems, it is not possible
to make stepwise decisions (based only on local information) in such a manner
that the sequence of decisions made is optimal. For example, the shortest
path from node i to node j in a network is impossible to find by the greedy
method. But to find a shortest path from node i to all other nodes in a network
G on n nodes, Dijkstra's (dynamic programming) algorithm yields an optimal
solution in O(n 2) basic steps.
One theoretical way to solve problems for which it is not possible to make
a sequence of stepwise decisions leading to an optimal decision sequence is
to try all possible decision sequences, which is termed complete enumeration
and usually involves a number of sequences exponential in the problem size.
Dynamic programming often reduces the amount of enumeration required
using the Principle of Optimality [19]:
'An optimal sequence of decisions has the property that, whatever the
initial state and decisions are, the remaining decisions must constitute
an optimal decision sequence with regard to the state resulting from the
first decision.'
The difference between the greedy method and dynamic programming
is that in the greedy method only one decision sequence is ever generated.
In dynamic programming many decision sequences may need to be generated
to solve the problem at hand. This is illustrated in the context of 'shortest'
path problems.
Shortest-path problems (SP) - three types of shortest-path problem and
corresponding solution methods are of interest:
from one node to another node, i.e. one origin-destination pair Dijkstra's algorithm;
from one node to all the others -
the all pairs problem with computational complexity O(n 3)

Ford algorithm.
Floyd-Warshall algorithm;
-
Bellman-
Let G(N,A) be a directed network with n nodes. Let C: = (C(i,} be an

adjacency cost matrix for G such that C(i, i) = 00, 1:s i:s n. Here C(i,}) is
the 'length' of link (arc) (i,}) if (i,})EA(G) and C(i,J) = 00 if (i,})~A(G).
Given the initial adjacency cost matrix C, the all-pairs shortest-path
problem is to determine a final matrix A such that A(i,}) is the length of
the shortest path from i to }.
Note that the Bellman-Ford recursive (dynamic progamming) algorithm
for this problem works only when G has no cycles of negative length.
NETWORK FLOW MODELS
117
Using Ck(i,j) to represent the length of the shortest path from i to j

going through no node of index greater than k, the following recursion is
obtained:
A(i,j) = min[min[ Ck - l(i, k) + Ck -l(k,j)] ,Ck(i,j)]
isksn
... (7.9)
Then Ck may be computed recursively from this formula by setting

CO(i,j): = C(i,j) and solving:
... (7.10)
For routeing in data networks a new asynchronous distributed version

of the Bellman-Ford algorithm has been proposed [14, 23]. However,
problems with stability arise in all adaptive routeing algorithms based on
shortest paths.
Network flow problems and routeing problems are often referred to
collectively as network layer protocols. A network layer protocol is a
collection of algorithms that may work independently, but may require an
exchange of information between themselves and are usually organized in
some hierarchical structure.
Examples of such embodied algorithms for the selection of routes are
shortest path algorithms (SPA), such as maximal flow or minimum cost-flow
algorithms involving link-capacity constraints. These algorithms must be
supported by further algorithms for co-ordination between all nodes of the
network. Usually such a co-ordination algorithm is some version of an
algorithm for finding the shortest spanning tree (SST) in a graph, such as
the Prim, Dijkstra or Kruskal algorithms [6, 19,24].
In some cases messages must be sent to a specified subset of the nodes
of the network and this subset must remain connected in the event of failures.
This is a Steiner tree problem [15]. When traffic between different OD pairs
must be treated as separate - for example it is transmitted at different
wavelengths in a fibre optic network [25] - the multicommodity flow
versions of simple network flow models result. Multicommodity flow
models can also be used to construct reconfiguration tables for network survivability [26].
In switched data networks, such as Tymnet, a centralized version of a
dynamic programming algorithm is used for routeing. Routeing decisions
are needed only at the time virtual circuits are set up. A simple version of
the basic dynamic programming algorithm involves the simultaneous
calculation of the shortest path from any node to all others [12]. For packetswitched networks, such as Arpanet, the asynchronous distributed version
of the Bellman-Ford shortest path algorithm has been proposed [12]. For
real time application it was shown that this basic algorithm converges to the
optimal routeing distances if the link lengths in the network stabilize and
all cycles have strictly positive length. However, this convergence can be very
slow, which is a particular problem in the case of link failure, when the
algorithm will keep iterating without effective end. This behaviour is known
as counting and in this case data messages cycle back and forth between nodes,
which is called looping. It is obvious that such a problem may completely
destroy communication, particularly in a high-speed network.
7.4
ALGORITHMIC AND COMMUNICATIONS COMPLEXITY
As has already been shown, in practical telecommunications applications it

is very important to have a precise knowledge of the algorithmic (time and
space) complexity of the mathematical optimization problems embodied in
the network protocol. The principle distinction to be made between problems
is to identify those which are polynomially solvable (i.e. for which efficient
algorithms exist) and those which are NP-hard [27] (i.e. for which
examination of essentially all potential solutions appears to be the basis of
any algorithm for their exact solution and hence their processing times are
likely to increase exponentially with the number n of nodes in the network
- see Fig. 7.9 [24]). Both the Steiner tree network problem and (integer)
multicommodity network-flow problems are of this latter type.
The main directions for tractable telecommunications optimization
problems thus involve models and (sometimes approximation) methods
(heuristics) from linear programming and network flows, polyhedral
combinatorics, nonlinear optimization and dynamic programming.
For a parallel distributed algorithm, the concept of communications
complexity is needed. Such a concept originally arose in design problems.
The examples of Figs. 7.10 and 7.11 [28] illustrate this concept. The problem
in this example is to design a systolic 'chip' that checks whether two strings
(Xl> X2,''''X n) and (Yl> Y2, ... ,Yn) of n bits, each arriving simultaneously on
2n inputs, are the same. A single bit at the output port is set to '1' if, and
only if, Xi = Yi for all i. In the Fig. 7.10 chip, the simplicity of topology is
compensated by time - bits are shifted back and forth in 2n steps. In the
Fig. 7.11 chip, (2n 2 - 2n) nodes allow the solution of the problem in O(log
n) time steps. Therefore space is traded for time. Such considerations can
be critical in the design of communications networks when exchange of
information between the nodes in the network is important.
ALGORITHMIC AND COMMUNICATIONS COMPLEXITY
119
65536
16384
4096
1024
256
64
16
4
Fig. 7.9
Fig. 7.10
x,
X2
x3
)(4
)(5
Fig. 7.11
Common computing time functions.
A chip with a single path topology.

)(6
x7
)(8
Vl
Y2
Y3
Y4
Ys
Y6
Y7
Ya
A chip with parallel comparisons.
Some new graph-theoretical models for communications applications have

recently been proposed which require relatively little routeing information.
A basic problem of routeing schemes in switching and communications
networks is to manage the trade-off between computing and transmission
efficiency and buffer storage memory. In Awerbuch and Peleg [23] a
switching network is designed involving a multiple multi-butterfly (multiple

splitter) topology with buffers of fixed size which exchanges packets between
n nodes in 0 (log n) steps (links) without blocking. The graph-theoretic nodecovering problem is used in Upfal [26] to construct a hierarchical adaptive
routeing scheme in an arbitrary communications network on n nodes
involving at most 0 (n 2) links with buffers of size 0 (n log n).
Another solution to routeing complexity difficulties might lie in adopting
a multi-ring topology which allows a combination of the tasks of route
selection and (control signal) broadcasting in one single problem. A simple
ring architecture allows the combination of basic information transmission
with the capability of broadcasting information on the current states of nodes
or links. Using wavelength division multiplexing, a multi-ring virtual topology
can be obtained by connecting each pair i and) of central offices (COs) with
a unique wavelength ~k> k= 1, ... ,K carried on a (single-mode) fibre path
passing through each CO node exactly once. In general, for i,) = 1, ... ,n,
K = n(n - 1) wavelengths are needed to implement the ring network, since
the link between COs i and) must use a different wavelength in each direction
to avoid interference as in Fig. 7.12. It is clear that such a network would
require a number of wavelengths well beyond current projections for
multiwavelength optical systems (e.g. a small network of eight COs would
require 56 wavelengths). A compromise solution using a physical multi-ring
(diverse protection) architecture has been proposed [29, 30], based on a
special case of the circulant travelling salesman problem (CTSP). With this
architecture the K = 0 (n 2) different wavelengths required for a single fibre
ring are reduced by a 'circulant' pattern of wavelength reuse to only K = n - 1
(or to K = n - 1 physical rings each carrying a single wavelength for a fully
independent system) as shown for n = 5 in Fig. 7.13. For an implementation
of logical connectivity with K transmitter-receiver pairs at each CO, each
operating at a different wavelength, the problem is to find the minimum
number K for the required number of diverse physical rings. Special algebraic
properties of the CTSP may be applied to the design of a multiwavelength
optical multi-ring network which is resilient to node and link failures [13].
7.5
CONCLUSIONS
It is important to stress that more attention be paid by telecommunications engineers to theoretical work already completed at a very advanced level in mathematics and theoretical computer science. However, the
chosen methods must be carefully tailored to the application at hand.
CONCLUSIONS
01
Fig.7.12
121
"1
"2
"3
"4
"5
"6
"7
A ring configured from an existing point-to-point fibre optic mesh network (a),
and assignment of wavelengths for full connectivity (b).
"1
'-2
"3
"4
"4
"1
"2
"3
"3
"4
"1
"2
"2
"3
"4
"1
"1
'-2
'"3
"4
01
Fig. 7.13
0
1
A multi-ring network with circulant wavelength assignment.
For example, the choice of objective function for optimization, as of that

for flow models of earlier networks, must be carefully studied, particularly
in light of the subsequent problem of flow updates in high-speed networks.
The high speeds of future networks will necessarily require parallel processing
and simple algorithms, some of which have been introduced here. Further
study of communications complexity in real time applications is needed for
comprehensive use of large volumes of existing optimization problem solution
algorithms.
REFERENCES
l.
Walker G R, Rea P A, Whalley S, Hinds M and Kings N J: 'Visualization of

telecommunications network data', BT Technol J, 11 ' No 4, pp 54-63 (October
1993).
2.
Lea C-T: 'Bipartite graph design principle for photonic switching systems', IEEE,
Trans Commun, 38, No 4, pp 529-538 (1990).
3.
Coan B A, Leland W E, Vecchi M P, Weinrib A and Wu L T: 'Using distributed

topology updates and preplanned configurations to achieve trunk network
survivability', IEEE, Trans on Reliability, 40, pp 404-427 (1991).
4.
Wilson R J: 'Introduction to graph theory', Oliver and Boyd (1972).
5.
Robertson N and Seymour P D: 'An outline of disjoint paths algorithm', in

Korte R et al (Eds): 'Algorithms and Combinatorics', ~,Springer-Verlag, Berlin
(1990).
6.
McHugh J A: 'Algorithmic graph theory', Prentice-Hall Inc (1990).
7.
Gibbens R J and Kelly F P: 'Dynamic routeing in fully connected networks',

IMA J on Math Control and Information, 1-, pp 77-lll (1990).
8.
Gibbens R J, Kelly F P and Key P B: 'Dynamic alternative routeing - modelling

and behaviour', Proc 12th Int Teletraffic Congr, Torino, Italy, 3.4A.3, pp 1-7
(1990).
9.
Kelly F P: 'Loss networks', Ann Applied Probability,
l, pp 319-378
(1991).
10. Higle J L and Sen S: 'Recourse constrained stochastic programming', Proc 6th
Int Conf on Stochastic Programming, Udine, Italy (1992).
II. Sen S, Doverspike R D and Cosares S: 'Network planning with random demand',
Research Report, Systems and Industrial Engineering Dept, University of Arizona
(December 1992).
12. Bertsekas D and Gallager R: 'Data networks', Prentice Hall, Englewood Cliffs
(1987).
13. Kleinrock L: 'Queuing systems: Vol II', Computer Applications (1976).
14. Bertsekas D: 'Linear network optimization, algorithms and codes', MIT Press
(1991).
15. Christofides N: 'Graph theory, an algorithmic approach', Academic Press (1975).
16. Gondran M and Minoux M: 'Graphs and algorithms', Wiley, New York (1984).
17. Hu T C: 'Combinatorial algorithms', Addison-Wesley (1982).
18. Hu T C: 'Integer programming and network flows', Addison-Wesley (1970).
19. Lawler E: 'Combinatorial optimization: networks and matroids', Holt, Rhinehart
and Winston, New York (1976).
20. Dantzig G B: 'Linear programming and extensions', Princeton (1963).
REFERENCES
123
21. Medova E A: 'Network flow algorithms for routeing in networks with wavelength
division multiplexing', Proc 11th UK Teletraffic Symposium, pp 3/1-3/10 (March
1994).
22. Simeone B, Toth P, Gallo G, Maffioli F and Pallotino S (Eds): 'Fortran codes
for network optimization', Annals of Operational Research, 11 (1988).
23. Awerbuch Band Peleg D: 'Routeing with polynomial communication-space tradeoff', Discrete Math, 2., pp 151-162 (1992).
24. Horowitz E and Sahni S: 'Fundamentals of computer algorithms', Computer
Science Press, Potomac, MD (1978).
25. Labordette J-F and Acompora A S: 'Partially reconfigurable multihop lightwave
networks', Proc IEEE Globecom '90, 300.6, pp 1-7 (1990).
26. Upfal E: 'An O(n logn) deterministic packet-routeing scheme', J of ACM, 39,
pp 55-70 (1992).
27. Garey M R and Johnson D S: Computers and intractability: a guide to the theory
of NP-completeness', W H Freeman and Co (1973).
28. Lovasz L: 'Communication complexity', in Korte R et al (Eds): 'Algorithms and
Combinatorics', .2.-, Springer-Verlag, Berlin (1990).
29. Medova E A: 'Optimum design of reconfigurable ring multiwavelength networks',
Proc Tenth UK Teletraffic Symposium, BT Laboratories, pp 9/1-9/9 (April 1993).
30. Medova E A: 'Using QAP bounds for the circulant TSP to design reconfigurable
networks', in Pardalos P and Wolkowics H (Eds): 'Proc DlMACS Workshop
on the QAP', American Mathematical Society, Providence (1994).
DISTRIBUTED RESTORATION
D Johnson, G N Brown, C P Botham, S L Beggs, I Hawker
8.1
INTRODUCTION
The introduction of software managed networks based on synchronous digital

hierarchy (SDH) and asynchronous transfer mode (ATM) technology will
enable faster network reconfiguration, greater flexibility and improved control
compared to existing network technologies. To increase resilience, distributed
restoration algorithms (DRAs) have been proposed [1-7] which seek to
restore routes autonomously within seconds following cable or node failure
without intervention from a central management facility. DRAs combine the
sub-second restoration speeds associated with ring and 'N + l' protection
schemes with the minimal spare capacity requirement of network-wide
restoration. They may also be designed to deal autonomously with disaster
scenarios such as multiple cable breaks and node failures.
This chapter gives an introduction to DRAs and the TENDRA (tool for
the analysis of transport networks employing DRAs) modelling tool developed
to evaluate them. The design of networks incorporating distributed restoration
techniques is also described and the applicability of distributed algorithms
to other areas of network management is discussed.
8.2
NETWORK PROTECTION -
AN OVERVIEW
Historically the protection of telecommunications networks has been realized

by automatic protection switching (APS) systems that include 'N + l' standby
NETWORK PROTECTION
125
[8, 9] and centrally controlled network restoration based on pre-planned

make-good paths [10]. However, both schemes have their limitations. 'N + l'
systems are fast, but only provide protection against individual line system
failures and not against whole cable or duct damage. Centrally controlled
schemes, using real time or pre-planned 'make-goods', provide better
utilization of spare capacity but are slower (typically a few minutes [11])
and depend on the integrity of stored information relating to the network
state.
Future SDH transmission networks will employ computer-controlled
digital crossconnect systems (DCSs), providing greater scope for flexible
routeing and fast reconfiguration. This enhanced functionality will enable
simple distributed algorithms to be loaded directly into the crossconnects for
restoration and bandwidth management, giving improvements in reliability,
resilience and utilization of spare capacity.
A summary chart (Fig. 8.1) compares the performance of various restoration strategies, where it is assumed that customers' service is only affected
by breaks lasting more than one second. It can be seen that link protection
and network protection under central control improve circuit availability by
reducing perceived down times. However, DRAs are able to restore service
within the call drop-out time for digital switches (about 1 sec), thus reducing
the number of faults seen by customers and greatly increasing the customerperceived mean time before failure (MTBF). Only rare multiple failure events
99.9999
99.999
.!!1
'(ij
-10
'5
~
'u
<f.
99.99
99.9
customer perceived MTBF, years
Fig. 8.1
Comparison of core network restoration strategies.
126 DISTRIBUTED RESTORAnON
will affect service to the customer. For data links with a drop-out time as
low as 500 ms, both the MTBF and circuit availability remain much higher
than for alternative restoration methods.
Comparing DRAs to 'end-to-end' path protection the latter is equally
fast but less flexible and much more expensive in standby hardware. DRAs
allow protection capacity to be shared across the network, considerably
reducing the redundancy necessary for a given level of restorability.
8.3
DISTRIBUTED RESTORATION PRINCIPLES
Distributed restoration uses a simple but efficient flood search to quickly

and automatically identify alternative routes after a link or node failure within
the network. There is no need for network databases or control software
as the network acts as its own database. Any new line system or node added
to the network is automatically protected, subject to spare capacity
availability, since there are no protection plans to modify. Each node knows
only its own unique identity and contains a simple set of rules that tell it
how to react when it sees an alarm or a message from a neighbouring node.
To achieve fast message passing, and hence fast restoration, all signalling
should take place along line systems, not via conventional ISO/OSI
(International Standards Organization/open systems interconnection)
signalling stacks. For example, in an SDH network, messages would be
transported in frame overhead bytes carrying a signature impressed
continually on every line system by the end nodes. Registers in all the nodes'
ports store the transmitted and received signatures and notify the node
processor when a signature change is detected due to failure, resulting in a
restoration procedure described below.
8.3.1
Example span restoration
The following example of span restoration illustrates the operation of a DRA

on a simple 5-node network consisting of eight spans with two working links
and one protection link per span (Fig. 8.2(a. The network reacts to a span
failure as follows.
Alarms are detected at the end nodes of the failed span.
PRINCIPLES
127
Each end node compares its own unique identity number with the NID
(node identity) field lodged in its receive register, i.e. the identity of the
node at the other end of the failed span. The node with the lowest number
will become a sender node and the other will become a chooser node
(Fig. 8.2(b)). This selection is arbitrary but necessary to ensure that sender
and chooser nodes are clearly identified.
The sender node sets the target fields of the signatures on all of its
protection line systems to the identity of the chooser node and the source
field to its own identity. This is the start of the sender flooding phase
which is used to identify make-good paths (Fig. 8.2(c)).
Nodes receiving these new signatures react by changing the source and
target fields on their outgoing protection links to the same values, thus
rebroadcasting the flood message. These intermediate nodes are said to
playa tandem role (Fig. 8.2(d)).
Eventually the chooser receives a signature with a target field which

matches its own identity (Fig. 8.2(e)) showing that a restoration route
exists. The chooser then seeks to physically establish the route by sending
an acknowledgement signature back on the same line system.
Records of how incoming signatures have 'mapped' to outgoing

signatures have been kept at each tandem node along the route. Hence
the acknowledgment signature can trace a route back to the sender and
live traffic can be switched over. Tandem nodes make the physical
connections necessary to set up the restoration route when they pass on
the acknowledgement signatures (Fig. 8.2(f)).
Multiple diverse routes can be found either by the sender applying

different index numbers to the signatures it sends on different line systems
or by re-applying the algorithm until enough routes have been found. This
is necessary because it is unlikely that one restoration route will have sufficient
spare capacity to restore all the capacity from the failed span.
A repeat field in the signatures is used to indicate the number of spans
a potential restoration route will take, and hence enable control of the
maximum acceptable path length.
128
la) etwork be'Ote span failure
(d) 'n'ermediate nodes enler landem Slate
(b) Alarms received and roles cleelcled
(e) Chooser r8C8IV8S signatures
(e) Sender flood"'ll
(I) Network whh restoration routes sel up
[3
alarm
prolectkln line syslem

working line system
[3
signature
)(
key
Fig. 8.2
Operation of a distributed restoration algorithm on a simple 5-node network.
TENDRA SIMULAnON MODEL
8.4
129
THE TENORA SIMULATION MODEL
TENDRA is a tool developed to evaluate distributed network algorithms.

It can be applied to any network topology where data relating to the topology,
elements and the DRA to be simulated (Fig. 8.3) are known, and may also
include mean time to repair (MTTR), mean time between failure (MTBF),
cost and bit rate.
There are currently (Fig. 8.4) four simulation modes:
all-spans mode - each span will fail in turn and invoke the DRA to
find alternative routes, the time taken to find each replacement route
for all failed links being recorded and displayed for each span;
interactive mode -
free-run mode - a discrete event availability simulation, through which

elements are failed and repaired randomly in accordance with their MTBF
and MTTR;
all-nodes mode - similar to all spans mode, except that each node in
the network fails in turn.
allows the user to select which element to fail;
type 01 ORA
and relevant
data
physical
network data
and topology
volume 01
messaging
traffic
graphical
representation
01 ORA
Fig. 8.3
Data and control flow in and out of the TENDRA model.
cable
link
Fig. 8.4
Construction of a simple network.
TENDRA outputs include:
time to restore individual links following a failure;
restoration routes found by an algorithm in response to a failure;
number of messages sent during a restoration event.
8.5
SIMULATED NETWORK RESTORATION TIMES
This section presents simulation results for span restoration times based on
a hypothetical SDH transport network. Protection links were added to the
network using a heuristic algorithm (described in section 8.8) to enable
restoration of any single span failure. The basic topology of the network
(Fig. 8.5) comprises 30 nodes and 57 spans with a total of 332 working links.
The simulation was run using TENDRA in the all-spans mode described
earlier.
SIMULATED NETWORK RESTORATION TIMES
Fig. 8.5
131
Test network.
The results (Fig. 8.6) indicate that distributed span restoration in an SDH
network is feasible in - 1 sec. This offers the possibility of restoration within
the call drop-out threshold, providing customers with an uninterrupted
service; 5 ms processing and 20 ms crossconnection times were assumed, which
represent modest modifications to current crossconnect specifications [12].
132 DISTRIBUTED RESTORATION

40
35
30
real time and pre-planned
CI real time only
25
0-.!2.
-0
Q)
.9
'"~
"'~'""
20
15
10
5
0
0
500
restoration time, ms
Fig. 8.6
8.6
Result of experiment to cut all spans sequentially.
ALTERNATIVE APPROACHES TO DISTRIBUTED

RESTORATION
Ideally networks should be resilient not just to span failures but also to
multiple cable failures and node failures. All network faults should be
imperceptible to customers. In this section alternative approaches to
distributed restoration are considered [13] to determine which offer the best
prospects for achieving this aim.
8.6.1
Real time versus pre-planned span restoration
Instead of using a DRA to find restoration in real time, look-up tables of

restoration routes could be built while the network is operating normally,
using the processing power available during operational periods to reduce
the search activity when failure events occur. Pre-planned restoration can
provide up to a fourfold speed advantage (Fig. 8.6) [14]. However, it does
not guarantee successful restoration since there is always a possibility that
the solution may be out of date. An optimum solution is to use real time
protection as a back-up to pre-planned restoration to produce a 100%
restorable system.
Table 8.1 summarizes the main features of pre-planned and real time
restoration.
ALTERNATIVE APPROACHES
Table 8.1
133
Pre-planned versus real time distributed restoration.
Pre-planned
Real time
Speed
very fast
fast
Storage required
grows with network size
none
Risk of non-restoration
small
extremely small
8.6.2
Span versus path restoration
A further option is for the DRA to construct a restoration between the end
nodes of each failed path 1 rather than the end nodes of the failed span 2
Path restoration is more efficient in its use of spare capacity than span
restoration because the whole path is re-routed and re-optimized. It is also
more flexible because it can restore multiple span and node failures. Figure
8.7 illustrates how path restoration can be initiated - a span carrying two
paths has failed, and the end nodes of the span have detected the failure and
have sent messages along the affected paths to notify their end nodes. A sender
and chooser node are selected for each path and restoration completed as
previously described. Table 8.2 summarizes the main features of span versus
path restoration for span failures.
sender 2
path 1//-...
...
. //~
sender 1
detecting node
failure
chooser 1
Fig. 8.7
Table 8.2
end node of path
Path restoration initiation.
Comparison of span and path restoration for span failure.
Span restoration
Speed
fast
Path restoration
Spare capacity utilization
moderate
moderate
very good
Extensible for node restoration
difficult
simple
I A 'path' is defined here as a bi-directional circuit, routed from one node to another via any
number of intermediate nodes.
2 A 'span' refers to the collection of all line systems directly between two nodes.
8.6.3
Extension to node restoration
Node restoration seeks to restore all paths through a failed node. The two
principal methods are path restoration, as described in section 8.6.2, or local
restoration of paths within spans adjacent to the failed node (Fig. 8.8).
In the latter method, every node records the identities of the previous two
nodes visited by each path so that, when a node sees an alarm, it is able to
initiate restoration either with its neighbour or with its neighbour's
neighbours.
Table 8.3 compares the features of node restoration between adjacent
nodes and end-to-end path restoration.
8.6.4
Integrated restoration strategies
From the preceding discussion, it can be seen that no one method of applying
distributed restoration is better than all others in all circumstances. For
--0
D
,;'
,8
Fig. 8.8
Table 8.3
node A can see that

paths on the alarmed
span have passed
through nodes B,C,D & E
Use of path tracing.
Comparison of approaches to node restoration.
Restoration between
adjacent nodes
End-to-end path
restoration
Speed
fast
moderate
Spare capacity
utilization
moderate
good
Extensible for multiple

node restoration
difficult
simple
DISTRIBUTED RESTORATION IN REAL NETWORKS
135
example, an algorithm which can restore node failures may be inefficient

at handling span failures. This has led to the notion of adapting and
combining algorithms to form integrated restoration strategies [14]. An
example strategy is outlined below:
try pre-planned span restoration first (very fast);
try real time span restoration for any spans not restored;
try real time end-to-end path restoration for any outstanding faults (slow,
but copes with any network fault including node failures).
A general form of this strategy is illustrated in Fig. 8.9.
pre-planned
span
restoration
first-level
protection
second-level
protection
real time
span
restoration
third-level
protection
real time
path
restoration
Fig. 8.9
8.7
pre-planned
path
restoration
pre-planned
node
restoration
Multi-level restoration strategy.
EVOLUTION TOWARDS DISTRIBUTED RESTORATION IN

REAL NETWORKS
Migration from fully centralized restoration to fully distributed restoration

should be an evolutionary process in telecommunications networks as follows.
Stage 1 - central database and centrally controlled restoration, where
restoration routes:
are determined in real time when a fault occurs; or
are determined in advance and stored centrally.
Restoration would typically be achieved in 2-5 min [11], limited by signalling

and database access times.
Stage 2 - centralized creation of restoration plans downloaded to network
element managers and implemented locally. It requires extra memory in the
network element managers, but avoids reliance on the network management
centre when a fault occurs. If the network changes frequently, there may
be a significant signalling overhead in downloading restoration plans, and
hence the plans may become out of date. Restoration times of - 1 sec are
possible because all information is stored locally.
Stage 3 - distributed creation of restoration plans stored in network element
managers. The storage requirements are the same as for stage 2. In addition,
message passing is required between neighbouring nodes in order to find the
restoration routes (in an SDH network, frame overhead bytes could be used).
The benefit of this is that the network is now independent of the management
centre. It will, of course, inform the management centre of any distributed
restoration actions taken. Faster processors in element managers may be
desirable to allow the network to update its plans within a reasonable time
following a change to the topology, e.g. adding a new transmission system.
Again, restoration times could be - 1 sec.
Stage 4 - distributed real time restoration in addition to stage 3. This
provides greater resilience than stage 3, because real time distributed
restoration can be used as a back-up if the pre-computed restoration plan
fails. Once again restoration times could be - 1 sec.
Key features of the four stages are summarized in Table 8.4.
Table 8.4
Summary of restoration strategies.
Stage
Route
finding
Route
storage
Dependency
on management centre
central preplanned or
real time
central preplanned
distributed
pre-planned
distributed
pre-planned
and real
time
central
total
distributed
distributed
for preplanning
none
distributed
none
3
4
Restoration
time
2-5 min
I sec if plans OK
I sec
< I sec
PLANNING FOR DISTRIBUTED RESTORAnON
8.8
137
PLANNING FOR DISTRIBUTED RESTORATION
For restoration algorithms to give a desired quality of service, sufficient spare

capacity must be planned into the network. Networks built from DCSs have
a meshed architecture and do not require protection capacity to be dedicated
to the restoration of any particular fault. Consequently, the number (and
cost) of protection systems can be much lower than for a comparable ringbased or N + M protected network. Several approaches to optimizing spare
capacity have been proposed, including heuristic algorithms [4, 15], linear
programming [5] and simulated annealing.
A heuristic algorithm [IS] which minimizes the number of protection
links required to make a network resilient to single span failures has been
extended to optimize network cost and to protect against multiple span and
node failures [16].
The resulting designs can be used with either distributed or centralized
restoration. The algorithm:
designs new protection networks or finds optimum places to add

protection to existing networks;
runs in a time which scales polynomially, not exponentially, with network

size;
produces graphs showing the effects of adding and removing protection

systems.
Figure 8.10 shows the restorability versus redundancy plot for an example
network design (see Fig. 8.5).
The benefit of optimizing true cost rather than the number of protection
links depends on the variation of link costs within the network. Networks
containing a wide variation in link lengths, a range of environments, or which
use a mixture of transmission technologies (e.g. fibre and radio) may benefit
significantly from true-cost optimization.
Placing spare capacity in such a way that the restoration algorithm can
find all the make-good routes is harder for node failures than for span failures
since the design and restoration algorithms must use the same technique for
handling contention for spare capacity between the failed paths. An extended
heuristic design algorithm solves this problem by prioritizing restoration
actions.
The planning algorithm described above has been applied successfully
to networks across Europe.
redundancy, %
Fig. 8.10
8.9
Restorability versus redundancy plot for example network design.
OTHER APPLICATIONS FOR DISTRIBUTED CONTROL
Taking a broader view of network management, distributed route finding

algorithms can be used for applications other than restoration. Examples
include circuit assignment, network audit and overload avoidance. Distributed
restoration becomes increasingly attractive when deployed as part of a suite
of such applications since it then offers the possibility of removing significant
central network control costs. Potentially, distributed control could make
a significant impact on the shaded areas of the CNA-M (Co-operative
Network Architecture - Management) function diagram shown in Fig. 8.11.
Fig. 8.11
CNA-M network management functions.
CONCLUSIONS
139
For example, with distributed circuit assignment deployed in future SDH

networks, customers could be able to set up a new private circuit immediately
on demand, without the need for detailed route planning by the network
operator. Distributed route-finding algorithms would be embedded in the
network down to the 2 Mbit/s level to provide 'Megastream' on demand.
The customer or network operator would initiate a 'sender' process at one
end of the required circuit and a 'chooser' process at the other. The network
itself would then work out where capacity was available to route the new
demand and, after checking the acceptability of the route, install the circuit.
No databases would need to be accessed to establish the route. A distributed
circuit assignment algorithm has been developed with four main stages:
the network operator (or end user) selects the source (sender) and
destination (chooser) nodes and requests a number of circuits;
the sender automatically transmits messages to each of its nearest neighbours, and messages 'flood' the whole network eventually reaching the
chooser; network resources are reserved at each node during flooding;
the chooser transmits an acknowledgement to the sender along each

selected path established above;
the sender transmits a confirmation along each temporary path, seizing

required network resources permanently and releasing unwanted capacity.
Simple metrics have been associated with the assigned routes to regulate
path length to avoid areas of the network which are already heavily loaded.
The algorithm has been validated on a test network (Fig. 8.12) and all circuits
were assigned in less than 100 ms, for example, between nodes X and Y.
8.10
CONCLUSIONS
DRAs are simple, fast and can find multiple diverse routes around a network
failure. No databases are required and no co-ordinated or centralized control
is needed to find the routes. Any new line system or node is automatically
protected without the need to modify protection plans.
DRAs have several distinct advantages over traditional centrally
controlled approaches.
The algorithms are fast. Recent simulation work using the TENDRA
simulation model has demonstrated the potential for sub-second
restoration in an SDH network compared to minutes for centralized
140
'""""
~---"
\
........
\....
\
....
....
........."
.....
.....
"",',
..........
,,
.....
1{q
II
I
I
I
\
,
,
"
I
,
I
:
\,
I
I
\\
\
\ I
"
~
\
x---.,\
I
I I
I I
I I
II
II
II
"
.... -
h--.:q\
~ -
e .... ~""~ ....
<f-:.:-----~-,'
, -'& \
'"
----b
"rr""tI',.,,'"
~,
Fig. 8.12
Test network for distributed circuit assignment.
schemes. This opens up the possibility for restoration to be completed

before the call drop-out threshold of a digital switch is exceeded, thus
avoiding customers being disconnected.
No databases are required during restoration, hence there is no problem

ensuring stored information is up to date and correct. Moreover, no
updating of information is required when extensions are made to the
network. The algorithms cover network extensions automatically, subject
to spare capacity availability.
REFERENCES
141
Increased resilience is achieved because there is no central controller

involved in the restoration process.
However, it is important to conduct accurate simulations of DRAs so

that a good understanding of the overall effects of distributed restoration
is obtained.
Restoration using pre-planning makes full use of crossconnect processing
power to ensure rapid response, but requires local storage. Real time
algorithms require minimal storage and are never out-of-date but are slower.
Accordingly, pre-planned restoration should be complemented by real time
restoration, and a multilayer framework has been developed to accommodate
such an approach.
Flexibility, low cost and simplicity are of paramount importance. A
strategy for evolving in stages from centralized restoration to distributed
restoration has also been described, each stage offering more of the benefits
of distributed restoration. These techniques offer many advantages and are
likely to be used increasingly in telecommunications networks in the next few
years.
APPENDIX
List of acronyms
APS
ATM
DeS
DRA
GUI
ISO
MTBF
MTTR
OSI
SDH
TENDRA
automatic protection switching

asynchronous transfer mode
digital crossconnect systems
distributed restoration algorithm
graphical user interface
International Standards Organisation
mean time before failure
mean time to repair
open systems interconnection
synchronous digital hierarchy
tool for the evaluation of network
distribution restoration algorithms
REFERENCES
l.
Grover W D and Venables B D: 'The self-healing network protocol: functional

description and implementation', TR-92-1O, TRLabs, Edmonton, Canada (1992).
142 DISTRIBUTED RESTORATION

2.
Han Yang C and Hasegawa S: 'FITNESS - failure immunization technology

for network service survivability', Proc IEEE Global Conf Commun
(GLOBECOM '88), Hollywood, USA, pp 1549-1554 (1988).
3.
Homine H, Chujo T, Ogura T, Miyazaki K and Soejima T: 'A distributed

restoration algorithm for multiple-link and node failures of transport networks' ,
Proc IEEE Global Conf Commun, (GLOBECOM '90), San Diego, USA, pp
459-463 (1990).
4.
Chujo T, Komine H, Miyazaki K, Ogura T and Soejima T: 'Distributed selfhealing network and its optimum spare-capacity assignment algorithm',
Electronics and Communications in Japan, Part 1, 74, No 7 (1991).
5.
Sakauchi H, Nishimura Y and Hasegawa S: 'A self-healing network with an

economical spare-channel assignment', Proc IEEE Global Conf Commun, San
Diego, USA, pp 438-443 (1990).
6.
Brown G N, Donachie S J, Beggs S L, Johnson D and Botham C P: 'TENDRA

- a simulation tool for the analysis of transport networks employing distributed
restoration algorithms', lEE Colloquium on 'Resilience in Optical Networks',
lEE, Savoy Place, London (October 1992).
7.
Johnson D, Beggs S L, Brown G N, Botham C P and Hawker I: 'Distributed

algorithms for restoration in telecommunications networks', Proc 5th Bangor
Communications Symposium, University of Wales, Bangor, UK, pp 76-79 (June
1993).
8.
Schickner M J: 'Service protection in the trunk network: Part 1 - Basic

Principles', British Telecommunications Eng J, 1-, pp 89-95 (July 1988).
9.
Schickner M J: 'Service protection in the trunk network: Part 2 - AutomaticallySwitched (I-for-N) Route protection system', British Telecommunications Eng
J, Vol 1-, pp 96-100 (July 1988).
10. SchicknerM J: 'Service Protection in the Trunk Network: Part 3 Automatically-Switched Digital Service Protection Network', British
Telecommunications Eng J, 1-, pp 101-109 (July 1988).
11. McCafferty J K and Spada M E: 'Network restoration: putting the network back
together again - faster', Telephone Engineer and Management, 97, No 16,
pp 25-28 (August 1993).
12. Bellcore: 'Digital cross-connect systems in transport network survivability', SRTNWT-002514, Issue 1 (January 1993).
13. Johnson D, Brown G N, Beggs S L, Botham C P, Hawker I, Chng R S K, Sinclair
M C and O'Mahony M J: 'Distributed restoration strategies in telecommunications networks', Proc International Communications Conference (lCC'94),
New Orleans, USA (May 1994).
14. Chng R S K, Sinclair M C, Donachie S J and O'Mahony M J: 'Distributed
restoration algorithm for multiple failures in a reconfigurable network', Proc
5th Bangor Communications Symposium, University of Wales, Bangor, UK,
pp 203-206 (June 1994).
REFERENCES
143
15. Grover W D, Bilodeau T D and Venables B D: 'Near optimal spare capacity

planning in a mesh restorable network', Proc IEEE Global Conf Commun,
Phoenix, USA, pp 2007-2012 (1991).
16. Johnson D: 'A heuristic technique for designing protection networks', lEE
Colloquium on 'International Transmission Systems', lEE Savoy Place, London
(February 1994).
9
INTELLIGENT SWITCHING
R Weber
9.1
INTRODUCTION
The traffic in an ATM (asynchronous transfer mode) network is packaged

in cells and carried over links between switches in the network. Traffic sources
are bursty and so for periods of time cells may arrive at a given switch faster
than they can be switched to output links. Switches are buffered to cope with
short-term bursts of greater than average traffic intensity, but cells will be
lost when buffers are full.
Quality of service (QoS) depends on many factors, but it is typically a
function of the cell-loss rate. A problem for the designer of call admission
and routeing protocols is to estimate how many calls can be carried through
a switch, while meeting such a QoS constraint. Typically, the cell-loss rate
should be comparable to the rate of losses due to transmission in the fibre.
This is about one in 108 cells.
Two lines of research are described. Both are concerned with the problem
of deciding whether additional calls can be admitted without raising the cellloss rate above a given threshold. Section 9.2 describes a method for making
an on-line estimate of the cell-loss rate and for deciding if additional calls
can be accepted and routed through a switch. Section 9.3 explains how to
associate an effective bandwidth with each class of traffic, so that if there
are m classes of traffic, and class i has an effective bandwidth aj' then a
switch of bandwidth c can safely accommodate N j calls of class i if, and only
if,
I Nja j ~ c.
E::
ON-LINE ESTIMATOR OF SPARE CAPACITY
9.2
145
AN ON-LINE ESTIMATOR OF SPARE CAPACITY
Consider a single switch, whose bandwidth is such that it can route C cells
per second and whose input buffer is B cells. It is reasonable to imagine that
calls can be classified into m classes - such as video, telephone, fax or file
transfer; calls in the same class have the same statistical characteristics.
Suppose that the proportion of calls in class i is fixed at Pj' and so if there
are N calls in progress the number in class i is Npj' Denote by F(N,B,c) the
average number of cells that are lost per second. Suppose the QoS constraint
requires the frequency of cell loss to be less than some small amount, say
F(N,B,c)-s.1O- 8 . Given this constraint, one asks for the value of the
maximum possible N. This is a difficult question. One way to address it is
by a queuing theory approach - one selects a probabilistic model for the
burst traffic and then attempts to calculate F(N,B,c). However, any traffic
model that is simple enough to be treated by this type of analysis is unlikely
to be rich enough to encompass the variety of traffic characteristics one would
expect to meet in practice.
The approach here is different. On-line measurements are made to
estimate the cell-loss rate and to decide whether additional calls can be
admitted. For example, if the switch is presently carrying 200 calls, the present
cell-loss rate might be estimated as less than 10- 8 and further that it will
not become greater than 10- 8 even if a further 20 calls were to be routed
through the switch. The problem with this approach is that it is difficult to
estimate the cell-loss rate when this rate is so small. Cells are lost very
infrequently and there will be little information on which to base an estimate
of the loss rate. Furthermore, it is not at all clear how F(N,B,c) scales in
N. What is the number of extra calls that can be safely routed through the
switch if all we know is that L(200,B,c) is about 10- IO ?
Fortunately, key insights are provided by the theory of large deviations.
This is a theory that is concerned with rare events - precisely such events
as infrequent buffer overflows. The theory is a rich one that has many
applications (see Bucklew [1]). Three important insights of the theory are
that:
the frequency of a given type of rare event tends to zero exponentially

in the values of important system parameters;
times between successive occurrences of a rare event are distributed as

exponential random variables;
if a rare event occurs, then it does so in the most likely of the ways that
it can happen.
146
It is the first of these that suggests how F(N,B,c) might be estimated.

Suppose a buffer of size B starts from empty, receives input from N bursty
sources and is served at a constant rate c. Let (N,B,c) be the probability
that the buffer overflows before it is next empty. Since an overflow is a rare
event, the probability of this event is small. Clearly (N,B,c) is much the
same as F(N,B,c). In fact, they are the same to within approximately a
constant multiple. It can be shown that the following asymptotics in B apply
- they are stated here for <1>, but hold for F as well. They are illustrated
in Fig. 9.1, in which some simulation estimates of F(N,B,c) are plotted against
B for a Markov modulated fluid source traffic model:
log (N,B,c)
- BH(N,c)
... (9.1)
o(B)
where:
H(1
+ E)N,c)
= H(N,c/(1
+ E
... (9.2)
Note that equations (9.1) and (9.2) imply that for any k> 1:
<1>(1 +E)N,B,c) =
[(N,B/k,c/(1
+ E) +
O(B)]k
... (9.3)
In equations (9.1) and (9.3) the terms in o(B) are such that o(B)/B- 0
as B-oo. In, other words, these are asymptotics for large B. Equation (3),
with E = 0, provides a method of estimating (N,B,c). The idea is to observe
E
:0
<ll
.g
Q.
~
.Q
~
u
N=340
on/off = 25/50 ms
/\, = 2500 cells/s
c = 350 000 cells/s
0.1
0.01
0.001
0.0001
1e-05
1e-06
2%
1e-08
09~
= 0, 0.01, 0.02
O~~~
1e-07
.1e -
increasing linearly in N
\ \
_ _~
Fig. 9.1
_ _~..,--_----.,.~_ _~
400
600
buffer size
Loss rate versus buffer size.
ON-LINE ESTIMATOR OF SPARE CAPACITY
147
the offered traffic, but to pretend that the buffer is only as large as B/k,
for some k> 1. The frequency with which a buffer of this size would overflow
during its busy periods can be estimated by a simple on-line simulation, that
can be implemented in software and which simply counts cells in a small
virtual buffer of size B/k. If, for example, k = 4, then the on-line simulation
would keep track of the contents of a buffer that is only one quarter as large
as it is in reality. If the frequency of buffer overflow in this smaller buffer
is estimated to be 8 x 10- 3, then by equation (9.3) an estimate of the
frequency of buffer overflow in the actual buffer is the fourth power of this
quantity, i.e. 4.096 x 10- 9 Since the frequency of overflow in the small
virtual buffer is relatively large, in this example 8 x 10-3, it should be
possible to get a reasonable estimate of the overflow rate.
Equation (9.3) also suggests a way of estimating whether or not it is
possible to increase the number of calls that are routed through the switch
by some small percentage, say by 100OJo. Again, over some period of time,
we should conduct an on-line simulation to measure the cell-loss rate that
would occur if the buffer were of size B/k and the switch bandwidth were
of size c/(l + ) cells per second. The result of this simulation can be used
to estimate the cell-loss rate in the actual buffer for a bandwidth of c/(l + ).
By equation (9.2), it follows that if the QoS constraint is satisfied under the
reduced bandwidth then 100% more calls can be routed through the switch
when it operates with the true bandwidth, c. Note that the assumption that
p is fixed corresponds to the assumption that the extra N calls will occur
in the same mix of classes as those calls already present at the switch.
The on-line simulator that carries out the two estimation procedures
described above has been called MINOS (monitor for inferring network
overflow statistics), evoking the island of Crete where, at the Computer
Science Institute in 1990, the idea of this simulator was originated by
Courcoubetis, Walrand and Weber [2]; there, also, details of the derivation
of equations (9.1) and (9.3) can be found. A principal advantage of MINOS
is that it does not require any assumptions to be made about the statistical
nature of the traffic. MINOS uses actual observed traffic to make its
inferences; it is adaptive and adjusts its recommendations to changing patterns
and types of calls.
Subsequent research has addressed various issues concerned with practical
implementation; some relevant remarks are given here.
To obtain a better estimate of if!(N,B,c) , it is valuable to estimate
(N,B/ k ,c) at three values of k. This is because a more refined version of
large B asymptotic is:
10gif!(N,B/k ,c) = A
+ ~log(B/ k) - (B/k)H(N, c) + o(B)
148
where A, ~ and H are unknown and can be estimated by evaluations at three

values of k. For example, in one series of experiments with a Markov
modulated fluid model, the values (20,15,19) worked well for B= 1800.
Recently, Courcoubetis and Weber [3] have established a different
asymptotic, in N. Suppose there is a fixed distribution of calls amongst classes,
so that as N increases there are Npj calls in class i, where Ejpj = I. Suppose
that, as N increases, the buffer per source and bandwidth per source is
constant, i.e. B=NBo and c=Nco. Then:
10g4>(N,B,c) =
N/(Boco) + o(N)
... (9.4)
where o(N)IN-O as N-oo. Equation (9.4) can also be used to obtain a

MINOS estimate of buffer overflow frequency. Suppose N sources are routed
through a switch with total bandwidth c and total buffer B. Define Co = clN
and Bo = BIN. Use an on-line simulation to measure the cell-loss rate that
would occur if NIk sources were routed through a switch having total
bandwidth N colk and total buffer N Bolk. Because the asymptotic suggests
that the loss rate should be about exp( - (NI k)/(Bo,co, the actual loss rate
with N sources can be extrapolated to be the kth power of this amount. For
example, suppose the channel is carrying 1000 sources, with a loss rate of
about 10- 10 . An on-line measurement of the loss rate for 200 sources in a
virtual switch, with one-fifth the bandwidth and buffer space, should reveal
a loss rate, say p, of about 10- 2 At this loss rate cell losses will be observed
in a relatively short time and the loss rate can be measured satisfactorily.
We should take p5 as an estimate of the actual cell-loss rate. Of course there
is the opportunity to make five independent estimates of p, using five groups
of 200 sources. Numerical work is under way to investigate the practicality
of this idea. However, it appears that the large N asymptotic often gives a
substantially better estimate of the loss rate than does the large B asymptotic.
Furthermore, the large N asymptotic is more appealing for practical reasons.
Usually, the amount of buffer per source is small. Savings due to statistical
multiplexing occur not because of an averaging of traffic within a large buffer,
but because of averaging of a large number of sources.
The estimates obtained by the MINOS procedure can have large variances.
A better procedure might be to use either the large B or large N MINOS
procedures to estimate an effective bandwidth for each class of call
individually. In other words, when carryng N j calls in a given class i, we
might try to find, for each m class a value of reduced switch capacity OjC
OJ < I, such that when the switch bandwidth is reduced to OjC, the value of
F(Nj,B,Ojc) exactly matches the maximum permitted loss rate. The effective
EFFECTIVE BANDWIDTHS
149
bandwidth of a class i call would be taken as ()jclNj . The idea of effective

bandwidths is pursued in the subsequent section.
9.3
EFFECTIVE BANDWIDTHS
Here we adopt an approach which also has some of the advantages of the
previous section - a simple characterization of source statistics that can be
measured on-line. However, the emphasis is the direct calculation of effective
bandwidths for calls in different classes. Suppose that a switch handles M
classes of traffic and has capacity to handle c cells per second. A number
of authors have described models for which the condition, that the switch
can carry N j sources of class i and the probability of buffer overflow be kept
smaller than some specified amount, can be written:
c~
E Ni(Xi
... (9.5)
i=!
where (Xi is the effective bandwidth of a source in class i, i= 1, ... ,m, e.g. see
Kelly [4], Courcoubetis and Walrand [5]. When a source is bursty, its
effective bandwidth will be somewhat greater than its average rate.
However, because at any given moment some sources are producing cells
above their average rates and others below, there is potential for statistical
multiplexing. This means that each source's effective bandwidth need not
be as great as its peak rate.
Again the analysis is based on the theory of large deviations. We suppose
that time is discrete and each of the N i sources in class i deliver to the buffer
numbers of cells at successive time points that are independently distributed
as the stationary process [X:,t = 1,2, ... As in the previous section, let
ip(N,B,c) be the probability that the buffer overflows during a busy period,
where Npi is the number of calls in class i. Kesidis and Walrand [7] have
shown that equation (9.1) holds, and:
J.
H(N,c) = sup[():c~ E N;i())/()],

1
where:
i()) = lim log
T-CXl
E [ex p ()(
Exi )]
t=!
is the asymptotic logarithmic moment generating function for traffic in class

i. The requirement ip(N,B,c) $, e- IJ can be written:
150
c~
E NNx;(OIB)
i
where 0';(01B) = 4>;(B)/(OIB), and is thus identified as the effective bandwidth of class i. Courcoubetis and Weber have shown that it is possible to
make the expansion:
... (9.6)
where
mi =
EX! is the mean bandwidth and 'Yi is:
Exi)
1'; = lim (l/T)E [(

T-oo
t=
2 ]
This is often called the index of dispersion. It is also 1f times the spectral
density evaluated at 0, i.e.:
00
'Yi = 1fJi(0) = 'Yi(O)
+ 2 E 'Yi(k)
k=\
where 'Yi(k) is the kth order autocovariance of the process (Xli J. The above
converges for well-behaved, purely nondeterministic second-order stationary
processes. In the case that the numbers of cells that a source produces in
successive periods are independent, 'Yi is the variance. In general, I' can be
estimated from the data by spectral estimation techniques (see, for example,
Chatfield [8]). It is attractive that effective bandwidths might be estimated
observed data, since it is unlikely that any theoretical model is rich enough
to adequately model all traffic classes.
It is interesting to observe what happens if the source is pre-smoothed
by linear filtering, say:
where ao + ... -t.... a p = 1 is imposed, so that the mean does not change.
Then, because 1;(0) = IT(0)12Ji(0) and the transfer function, IT(O) I =
I Ei ad = 1, it is found that 'Yi does not change. What happens is that presmoothing, effected by averaging inflows of several periods, decreases the
variance, but simultaneously increases higher-order autocovariances, but the
combined effect is that the effective bandwidth is unchanged. This is not
too surprising, since the effects of pre-smoothing are not really seen by a
very large buffer, and it is large buffers with which these effective bandwidths
are concerned.
CONCLUSIONS
151
Another interesting observation is that the effective bandwidths given

by equation (9.6) are invariant within a constant multiple with respect to the
chosen discretization of time. This follows from the invariance of 'Yi under
pre-smoothing. If the definition of a time period is changed from 1 to 2
seconds, so that the number of cells produced by a class i source is now Ydt
= Xl i_ I + Xli, n = 1,2, ... , then the effect is the same as if the process had
been smoothed with ao=0.5, al =0.5 and then multiplied by 2. Then 'Yi is
multiplied by 2, just as it should, since in the new model c and mi are also
multiplied by 2.
One can find expressions for the effective bandwidth of some theoretical
models of a source. A number of authors have studied the two-state Markov
modulated fluid source whose rate is controlled by a Markov process. The
state of the Markov process at time t is denoted XU); it alternates between
states 1 and 2 and has holding times in these states that are exponentially
distributed with parameters Ai and !J.i' The rate of the fluid source is 0 and
ai as the Markov process is in states 1 and 2 respectively. The process is in
state 2 with probability A/(Ai + !J.i)' The index of dispersion, 'Yi, can be
computed for this source, either by discretizing the process, or by proceeding
directly in continuous time. It is found:
This makes sense in various ways. It has the right dimensionality properties,
scaling correctly in time and in cells. It agrees asymptotically, as O/B-O,
with the effective bandwidths given in de Veciana et al [9].
9.4
CONCLUSIONS
This chapter has explained how insights arising from the theory of large
deviations can be used to make on-line estimates of the cell-loss rates arising
at the buffered switches of an ATM network.
The approach also suggests that the effective bandwidth of a bursty traffic
source can be computed as a function of the mean source rate and its index
of dispersion. These ideas are simple to implement, have worked well in some
simulations, and are presently receiving further development and refinement.
152
REFERENCES
I.
Bucklew J A: 'Large deviation techniques in decision, simulation and estimation',

John Wiley, New York (1990).
2.
Courcoubetis C, Kesidis G, Ridder A, Walrand J and Weber R R: 'Admission

control and routeing in ATM networks using inferences from measured buffer
occupancy', IEEE Trans on Comms, to appear.
3.
Courcoubetis C and Weber R R: 'Buffer overflow asymptotics for a switch

handling many traffic sources, unpublished manuscript (1994).
4.
Kelly F P: 'Effective bandwidths at multi-class queues', Queuing Systems, 9,

pp 5-16 (1991).
-
5.
Courcoubetis C and Walrand J: 'Note on the effective bandwidth of ATM traffic

at a buffer', unpublished manuscript (1991).
6.
Courcoubetis C and Weber R R: 'Effective bandwidths for stationary sources',

to appear in Probability in the Engineering and Informational Sciences (1994).
7.
Kesidis G, Walrand J and Chang C S: 'Effective bandwidths for multiclass

Markov fluids and other ATM sources', IEEE/ACM Transactions on Networking, -.L, pp 424-428 (1992).
8.
Chatfield C: 'The analysis of time series: theory and practice', Chapman and
Hall, London (1975).
9.
de Veciana G, Olivier C and Walrand J: 'Large deviations for birth death Markov
fluids', Probability in the Engineering and Informational Sciences, 7 , pp 237-235
(1993).
-
10
NEURAL NETWORKS
S J Amin, S Olafsson and M A Gel!
10.1
INTRODUCTION
Packet switching structures are envisaged in emergent broadband switching

systems [1], and, although it is generally agreed that such networks will rely
on asynchronous transfer mode (ATM) within the lower layers of a protocol
architecture agreed in CCITT, many challenges exist in realizing the necessary
high-speed packet-switching technologies [2, 3]. Some of the issues, which
impact on service quality control, have been addressed, for example, in terms
of using a back-propagation neural network for achieving ATM network
control [4]. In this work, a neural network approach to an access switching
problem is explored. This problem has attracted interest recently because of
the requirement for adaptive and robust switching elements in packet
switching networks. A review of applications of neural networks to switching
and routeing schemes is given in Brown [5].
In a recent study [6] Ali and Nguygen explored the use of a neural
network (the Hopfield dynamical model) within an access scheme for a highspeed packet switch. Use of a Hopfield neural network for control of a switch
was first proposed in 1989 [7]. Although Ali and Nguygen demonstrated
that the Hopfield dynamical network could be used to obtain high solution
accuracy, long simulation times and choice of optimization parameters are
a problem. This puts severe restrictions on the size of the switching system
that can be studied.
154
NEURAL NETWORKS
In this chapter a detailed analysis of the optimization parameters in the

Hopfield dynamical equation for the access switching problem is presented.
It is shown that imposition of appropriate constraints on the optimization
parameters can lead to improvements in the operation of the Hopfield model.
This provides a means of extending the usefulness of the model, not only
as far as the current switching problem is concerned, but also in terms of
the application of the model to other problem domains.
10.2
HOPFIElD NEURAL NETWORK BASIC MODEl
The neural network used in this study is the Hopfield model [8, 9] . It consists
of a high number of simple processing elements interconnected via the neural
weights. Due to the high number of neural connections, the Hopfield network
provides massive processing capabilities. At any moment in time each neuron
is described by two continuous variables, the neural activity level Xij and the
neural output Yij' These variables are related by the nonlinear monotonically
increasing processing function!:
... (10.1)
In this work! is taken to be the sigmoid function:
!(Xjj) =
1 + exp( _ !3Xjj)
... (10.2)
where !3 is the gain factor, which controls the steepness of the sigmoid
function, as illustrated in Fig. 10.1.
1.0
0.8
0.2
O.O...=::::..L_-=::::.L.._ _L-_---I.._ _....L.._........J

-60
-40
-20
o
20
40
60
x
Fig. 10.1
Sigmoid function for values of {3 of 0.08 and 0.16
HOP FIELD BASIC MODEL
155
The Hopfield equation describing the dynamics of a neuron is given by:

n
- aXjj
+ E
T;j,kl Ykl
... (10.3)
Ij,j
k,l= I
Tjj,kl is the weight matrix which describes the connection strength between
the neurons indexed by (ij) and (kl). I ij describes the external bias which can
be supplied to each neuron. Hopfield has shown that, for the case of
symmetric connections Tjj,kl = Tkl,ij and monotonically increasing
processing function, the dynamical system of equation (10.3) possesses a
Lyapunov (energy) function which decreases on the system's trajectories. The
existence of such a function guarantees that the system converges towards
equilibrium states which define point attractors for the dynamics. The
Hopfield energy function [8] is of the form:
E = -
Tjj,kl Yij Ykl
ij,kl
f(Xij)
Aij
i,j=1
where the
Ajj
I jj Yjj
i,j = I
j-I(Xjj) dXiJ
... (10.4)
are positive constants. It is easily established that with
Ajj
a, for all i,l, the equation of motion (10.3) can be written as:
... (10.5)
where the dot denotes differentiation with respect to time. From this relation
it can be derived that:
n
E = -
i,j = I
-2 E
i,j = I
aE'
aE')
Y . . +_x..
( aYjj
I)
aX;j
I)
~ (~)2
axI)
ayI)
<0
... (10.6)
156
NEURAL NETWORKS
The inequality follows from the fact that the processing function is
monotonically increasing. Without the integral term in equation (10.4) the
time derivative of the energy function becomes:
E ( i ~j.kl hi + J.)
i,j = I
( E
'lij,pq Ypq
+ I ij
Ti-J. kl
p,q= !
( E
k,I=!
Ykl
(d!(X;j))
dXij
IJ
k,l= !
E a--x- ( dd!(Xjj))
i,j=!
IJ
IJ
h)
Xij
... (10.7)
In general, this expression would not satisfy the condition E ~ 0 because

of the second term on the right hand side of equation (10.7), but in the large
gain limit the derivative df!!Jj) becomes a delta function and therefore
df!!Jj) Xi -- O.
dXij
J
dXij
This establishes a result discussed by Hopfield [8]. Another way to make

equation (10.4) a Lyapunov function without the integral term is by making
the decay a zero in the dynamical equation. This approach has been discussed
by Aiyer et al [10]. In this case the dynamic equation becomes:
n
k,!= !
10.3
'lij,kl Ykl
+ I ij
... (10.8)
DESCRIPTION OF THE SWITCHING PROBLEM
A switch can be regarded as a device which takes a set of n input signals

and reproduces them in an order described by the switching mechanism. This
study is concerned only with square switches of size n x n. A request for
a call connection through that switch can be mapped on to an n x n binary
matrix in the following manner [5,6]. Let rij be the number of packets at
input i requesting a connection to the outputj. The status of each input queue
can be represented by the matrix:
Y=
YII,
,Yin)
Ynl:
':Ynn
Yij
= [
o if rij
1 if
rij
... (10.9)
THE SWITCHING PROBLEM
157
In this formulation the rows represent the input lines, and the columns
represent the output lines. The above condition states that there could be
more than one packet in input line i requesting a transmission to the output
line j. Every index pair (ij) defines a connection channel. During each time
slot, only one packet can be permitted per channel. In the general case the
configuration matrix, which sets up the channel connections, can maximally
contain one non-vanishing element in each row and column. If there is a
queue at each input and a request for a connection to each output, then the
request matrix Y = Yij is said to be full. This amounts to every column and
every row containing some non-vanishing entry. Furthermore, if there is more
than one non-vanishing entry in any row or column, then more than one
input is requesting to be connected to the same output. Since only one input
can be connected to one output at any time, the switching mechanism will
have to choose only one request at any time and force the rest not to be
connected. Also, the switching mechanism must not generate non-vanishing
entries in a column or row, which initially had zero entries. In the case of
a full request matrix the optimal switching requires the following mapping:
[YI, ....... , Yn] -
... (10.10)
[ep(ll' ....... , ep(nl]
where Yk = [Ylb ....... , Ynkl T and ep(nl is an n component column vector

with 'one' in the p [n] th position and 'zero' everywhere else. The position
p represents a permutation on the integer set! I, 2, ....... ,n]. If Y is a full
matrix of size n x n there are n! solutions, i.e. configuration matrixes, since
there are n! different permutations on the set [1, 2, ....... , n]. A simple example
demonstrates this for the case n = 3:
input matrix:
(i )
n( n( n
n( n( )
output matrices:
(
(
1 0
0 1
0 0
0 0
1 0
o 1
1 0
0 0
0 1
0 1
1 0
0 0
0 0
0 1
1 0
0 1 0
0 0 1
1 0 0
158
NEURAL NETWORKS
Each of those configuration matrices maximizes the packet flow through

the switch. In the case of y not being full, the number of acceptable solutions
is fewer. An example of this is demonstrated by the following case:
n
n( n
n( n
input matrix:
0 0
I 0
I
output matrices:
(
(
0 0
I 0
0 I
0 0
0 0
I 0
0 0
I 0
0 0
0 0
0 0
0 I
Here it is important to realize that if a row or column contains only zero

entries then the resulting configuration matrix has to contain only zeros in
the same rows and columns. Otherwise connections would be made even
though there was no request for these calls. In this case an attractor has to
be imposed; this is discussed later (see section 10.7.2). The access scheme
is formulated with multiple queues at each input and a separate output for
each input [6]. For a switch of size n x n, there will be n queues at each
input giving a total of n 2 input queues. During each time slot, packets are
transferred from the input to the output in such a way that the throughput
of the switch is maximized. Maximization of the throughput requires
continuous optimization, and so it is for this purpose that the neural network
is used.
10.4
THE ENERGY FUNCTION
From the analysis of the configuration matrices one can construct an energy
function for the switching problem. This function can then be compared with
the Hopfield energy function to find the resulting weight connection matrix
and the external biases. From the description in the previous section it is easily
established that an energy function for the switching problem [6] is given by:
ESTABLISHING OPTIMIZATION PARAMETERS
E= -
E YiJYil+2 i,j,I=1
2
I r! j
+ C(
2
i,j= I
159
E Yij Yjk
i,j,k= I
k r! i
... (10.1 I)
n- Yij )
This energy function takes on zero values only for configuration matrices
that are solutions for a full request matrix Y (i.e. for an input matrix which
has at least one non-vanishing element in each row and one non-vanishing
element in each column). The last term on the right hand side of equation
(10.11) takes on positive values if the request matrix contains one or more
zero rows or columns. A comparison with equation (10.4) without the integral
term gives the following expressions for the weights and biases:
Tij,kl =
- AO ik (I
C
2
... (10.12)
where oij is the Kronecker delta. Substituting this back into the dynamical
equation gives:
dXjj
dt
-axij-A
'-'
I r! j
Yil- B
'-'
Yk"+k r! i
J
2
... (10.13)
which is the desired differential equation for the switching problem. Under
this dynamical equation the neural activities are a dissipative process. They
will develop towards neural configurations which minimize the energy
function (equation (10.11 and are therefore solutions to the switching
problem.
10.5
ESTABLISHING THE OPTIMIZATION PARAMETERS
In order to gain some information about the optimization para,meters, it is

useful to consider the dynamical equation under conditions of equilibrium,
i.e. when:
... (10.14)
160
NEURAL NETWORKS
Then, from equation (10.13), it is found that:

n
XO,ij = - A
I~j
f(XO,il) - B
f(xo
k~i'
kj)
c
2
... (10.15)
To be able to put some further restrictions on the parameters, the neural

positions with 'one' and 'zero' must be considered independently. Given that
a correct solution to the switching problem has been found, this means that
the final solution of any input matrix would have at most one active unit
per row and per column. First (i,j) is assumed to be a zero position (i.e.
f(xij) = 0). Then, it is straightforward to establish that the equilibrium
condition reads:
L.= -A-B+
"'\1,1)
C
2
... (10.16)
where the superscript 1 denotes the first equilibrium solution. Because it is

at equilibrium, the associated y value must therefore be close to zero as x
tends to minus infinity. Accordingly, equation (10.16) can be rewritten .as
the following inequality:
- A - B+ 2
~O
... (10.17)
In general there will be n 2 - n positions in the network satisfying the

conditionf(xij) = O. This solution may be referred to as the negative attractor
solution, as it is the equilibrium solution obtained as x tends to minus infinity.
If it is assumed that the position (i,j) represents a location which is tending
towards the value one, i.e. f(Xij) = 1, in equilibrium, then the equilibrium
condition becomes:
2
Xo,ij =
2"
... (10.18)
where superscript 2 represents the second equilibrium solution. It is easily

seen that y tends to one as x tends to infinity, and accordingly equation (10.18)
can be rewritten as the inequality:
>0
... (10.19)
In general there will be n positions satisfying the conditionf(xij) = O. The

equilibrium condition means that n neurons in the network have converged
to one of the two attractors and (n 2 - n) neurons have converged to the other
attractor. From these conditions it is found that the parameters have to satisfy
the following inequalities:
SIMULATION RESULTS
... (10.20)
0<C<2(A +B)
10.6
161
THE CONNECTION MATRIX
Tij,kl -
SPECIAL CASE
In this section some general properties of the connection matrix are discussed,
such as the degree of connectivity it establishes in the network. A symmetric
connection matrix can be achieved by putting A = B, in which case the
condition in equation (10.20) reads:
... (10.21)
0<C<4A
A network connected by Tjj,kl is a sparsely connected network. This is

easily seen, as the first term on the right hand side of equation (10.12) is
non-zero only if (i = k) and (j ~ k). This gives n 2(n - I) connections. The
second term equally has many non-vanishing contributions. The connection
matrix in equation (10.12) therefore defines 2n 2(n-l) connections. This
number has to be compared with the maximum possible number of connections a network of n 2 neurons can have, i.e n4 - n 2
There is an n! redundancy in the solutions of the switching problem in
the sense that each initial state can lead to n! different solutions. Let ek be
an n component vector with one in the kth position and zero everywhere
else; in component notation ek,j = Okj. Then every initial neural configuration
is to be mapped into the matrix array [el' e2'
,en]' or permutations
thereof [ep (l),ep (2), ..... 'ep (n)]. This condition is restricted in the following
sense. If a row or a column in an incoming neural matrix contains only zero
positions as outputs, i.e. negative neural activations, then these are to be
kept at their zero values. The simulation of equation (10.13) is therefore only
to be applied to the rows and columns which initially contain entries with
positive outputs, i.e. call requests. All other rows and columns are kept at
their zero output values. It is apparent from this argument that an input
matrix with many zero elements will require less time to converge to solution.
If one does not proceed in this manner the network is more likely to converge
to an invalid solution. The benefits of this approach have been demonstrated
in a number of simulations.
10.7
SIMULATION RESULTS
The main application of the Hopfield dynamical model, as studied here, is

to offer a potential solution to real time switching optimization problems.
This could be achieved provided an appropriate set of synaptic weights can
162
NEURAL NETWORKS
be found. In any system in which the equations of motion are determined

by a continually reducing energy function, there is a risk that the system may
converge but become trapped at a non-valid solution. Therefore, in the current
problem, noise is introduced into the system to avoid convergences to spurious
states. This is achieved by randomly varying the value of (3 in equation (10.2).
Throughout these simulations the sigmoid function in equation (10.2) is used
with (3 changing randomly between 0.08 and 0.16 (see Fig. 10.1). This is to
allow the network to escape from trapping minima and to converge to a valid
solution. At any fixed time all the neurons will change according to one
particular sigmoid function.
10.7.1
The role of the optimization parameters
In the previous sections, it was shown what relations the parameters A, B

and C have to satisfy in equilibrium. In this section the results of simulations
are presented and an assessment is made of the stability properties of the
network as a function of A, Band C. The condition given by equation (10.21)
is a soft constraint; to be more realistic the hard constraint must be considered
as it is also supported by extensive simulations, i.e. at the limit, the magnitudes
of kO,ij and XO,ij are equal. After putting lO,ij= into equation (10.16),
equations (10.20) and (10.21) lead to:
o <C<2A
... (10.22)
In the simulation of the Hopfield model for the switching problem, it

is found that any values of A and C which satisfy equation (10.22) give valid
results provided that (3 and of are correctly chosen. This has been confirmed
for a large number of input matrices and different matrix sizes.
Consider the full 8 x 8 input request matrix; in order to understand the
behaviour of the network as a function of the A, Band C parameters, a
complete request matrix will be studied. The 3-D graphs of the neural velocity
dXij and neural output YiJ" as a function of time are shown in Figs.
dt
10.2-10.5 for two different sets of A, Band C.

The function in Fig. 10.2 is very unstable. Although some neurons are
stable in Fig. 10.3, there are a few which are not stable and no solution is
achieved. In constrast, the function in Fig. 10.4 attains stability after only
40 time intervals and the system converges to an acceptable solution
(Fig. 10.5). This example demonstrates clearly how sensitive the network
dynamics are to the values of the parameters A, Band C. For some values
no equilibrium condition is achieved mainly because, as there are no longer
two separate attractors, only one attractor will exist at any time. But by
choosing A p C, the existence of two attractors throughout the time
SIMULATION RESULTS
163
5.0e+03
S -5.0e+03
~
~ -1.0e+04
"0
-1.5e+04
-2.0e+04
100
80
60
40
number of iterations
Fig. 10.2
20
20
00
40
60
80
neuron index
dx for simulations of 8 x 8 input matrix with A = B = 1250 and C = 700.

dt
_ 1.0e+00r
6.0e-01l
2.0e-01
100
80
number of 60
iterations
40
80
20
00
Fig. 10.3
f(x) for simulations of 8 x 8 input matrix with A
= B = 1250 and C = 700.
2.0e+03r
I
O.Oe+OO
I
.
: -4.0e+03;
:!:!. - 6.0e+03:
.
-8.0e+03r
x
-2.0e+03r
-1.0e+04 i
100
80
00
Fig. 10.4
dx for simulations of 8 x 8 input matrix with A = B = 1250 and C = 100.

dt
164
NEURAL NETWORKS
1.0e+00;
8.0e-01;
:g6.0e-Ol '
- 4,Oe-Ol
2.0e-Ol
O.Oe + 00
100
80
60
number of
iterations 40
20
Fig. 10.5
f(x) for simulations of 8 x 8 input matrix with A = B = 1250 and C= 100.
evolution of the dynamical equation can be ensured. Hence, oscillations and

unstable behaviour of the network can be avoided.
10.7.2
Imposed attractor
As discussed in section 10.3, incomplete requests have to be dealt with

carefully in order to make sure that connections are not generated where there
are no requests. The energy function in equation (10.11) and the connection
matrix (equation (10.12 do not guarantee this to be the case. This follows
from the fact that the energy function (equation (10.11 does not take on
minima if one or more rows or columns have only vanishing entries, i.e. that
configuration is energetically unfavourable. Dissipative dynamics therefore
generate non-zero entries in rows and/or columns which initially have none.
To avoid this occurrence, the null columns and rows are decoupled from
the Hopfield dynamics. For these special cases, the dynamics have to be
modified. This is done in the following manner.
All non-zero input neurons will be allowed to change as dictated by the
dynamics, whilst all the others will be forced into the negative attractor (the
imposed attractor), and disconnected from all other neural positions in the
network. Similarly all the external biases of the zero neurons are set to zero.
This arrangement guarantees that these zero neurons stay at their zero output
values as required. Without disconnecting the neurons with zero requests in
the manner discussed above, the network will connect them in order to
minimize the energy function.
SIMULATION RESULTS
165
This can be demonstrated by an 8 x 8 input request matrix. This matrix

is taken to have non-vanishing elements in its first and last row, while all
other elements contain zero requests. This type of input will only require
two connections.
Initially, all the neurons are deemed to be able to change freely according
to the dynamical equation. The result of the neural output is shown in
Fig. 10.6. This shows that n = 8 neurons are connected, which means that
the network artificially connected calls which are not required by the initial
input matrix. However, if the zero-request neurons are disconnected from
the dynamics of the system, as described in the earlier sections, then the
network will converge to a valid solution as can be seen in Fig. 10.7.
1.0e+00.
8.Oe- 01 r
6.0e-01 t
4.0e-01f
2.0e-01 r
O.Oe+OO
100
80
60
40
number of iterations
20
00
Fig. 10.6
neuron index
Result for simulations of 8 x 8 input matrix with A = B= 1250 and C= 100; all
neurons are changing according to the dynamics of the network.
1.0e+00
8.Oe-Ol
6.0e-Ol
4.0e-01
2.0e - 01
O.Oe+OO
100
80
60
80
40
number of iterations 20
00
Fig. 10.7
neuron index
Result for simulations of 8 x 8 input matrix with A = B = 1250 and C = 100; nonrequested neurons are forced towards the imposed attractor.
166
NEURAL NETWORKS
It is possible to study the optimization process for the entire ensemble

n2
of 2 input request matrices. In the case of n = 4, for example, there are
65 536 possible request matrices.
Using parameter settings A = 1250, B = 1250 and C = 100, simulations of
the neural network have been performed for each of the ensembles of input
request matrices for the cases of n = 2,3 and 4. On average, the computation
reaches convergence in less than 20 time intervals for all requested input
matrices. More importantly, the neural network finds valid solutions in all
cases. In order to test the performance of the neural network for larger
networks (n = 5 and 6), 100 000 input matrices were randomly chosen from
n2
any of the 2 possible matrices. Results of the simulations were
encouraging with convergence being achieved in typically less than 30 time
intervals and with all solutions being valid.
10.8
DISCUSSION AND CONCLUSIONS
In this chapter a study of the stability and sensitivity properties of the Hopfield
dynamical model applied to a crossbar switch has been presented. The
dynamical equation (fot the switching energy function) has been assessed
with respect to the setting of internal optimization parameters and
information about network stability and performance obtained. Various
bounds on the values of the optimization parameters were studied and
extensive simulations have been performed to verify these bounds. The role
of optimization parameters (A, B, C) and the role of an additional imposed
attractor (x 3 = - 2A) have also been demonstrated. Also, the use of the
random gain parameter {3, as given by equation (10.2), is crucial to push the
network out of local minima in which the system may get trapped. The
approach established here for the application of the Hopfield dynamical
model to crossbar switching is relevant to other problems, e.g. resource
allocation, in which an optimization process represents a key step in reaching
a solution. Such examples are ubiquitous in telecommunications and
computational systems.
The simple approach to the determination of convergent optimization
parameters in the Hopfield dynamical model presented here has shown how
the introduction of an imposed attractor into the network enables the model
to be computationally operational for large random input matrices. This
capability has been exploited in the study of a switching problem, which was
previously handicapped by slow convergence of the network. It has also been
shown how improved computational speeds can be obtained.
REFERENCES
167
REFERENCES
I.
Pattavini A: 'Broadband switching systems: first generation', European

Transactions on Telecommunications, 2, pp 75-89 (1991).
2.
Listanti M and Roveri A: 'Integrated services digital networks: broadband

networks', European Transactions on Telecommunications, 2, p 59 (1991).
3.
CCITT, Draft Recommendation 1.150: 'B-ISDN ATM Aspects', Geneva (January

1990).
4.
Hiramatsu A: 'ATM communications network control by neural networks', IEEE

Transactions on Neural networks, -.L, p 122 (1990).
5.
Brown T X: 'Neural networks for switching', in Posner EC (Ed): 'Special Issue

on Neural Networks in Communications', IEEE Communications Magazine,
p 72 (November 1989).
6.
Ali M M and Nguyen H T: 'A neural network implementation of an input access

scheme in a high-speed packet switch', Proceedings of IEEE Globecom,
pp 1192-1196 (1989).
7.
Marrakchi A and Troudet T: 'A neural network arbitrator for large crossbar
packet switches', IEEE Transactions on Circuits and Systems, 36, No 7,
p 1039 (1989).
-
8.
Hopfield J J: 'Neurons with graded response have collective computational

properties like those of two-state neurons', Proc Nat! Sci, Biophysics, 81 , pp
3088-3092, USA (1984).
-
9.
Hertz J, Krogh A and Palmer R G: 'Introduction to the theory of neural

computation', Addison-Wesley, Redwood City (1991).
10. Aiyer S V, Niranjan M and Fallside F: 'A theoretical investigation into the
performance of the Hopfield model', lEE Transactions on Neural Network, I ,
No 2 (June 1990).
-
11
SYSTEM AND NETWORK

RELIABILITY
P Cochrane and D J T Heatley
11.1
INTRODUCTION
Optical fibre transmission systems have now largely replaced their copper
forebears. This has been achieved by overlaying the copper pair and coaxial
systems with optical fibre, thereby realizing vastly increased repeater spacing,
smaller cable size, increased capacity and reliability, and orders of magnitude
in reduction of costs. Despite these radical changes, the approach to network
design, reliability and performance assessment has seen little change, merely
a scaling of the established copper techniques, figures and assumptions, with
minor modifications to accommodate the new family of optoelectronic
components. The validity of this is questionable since the move from copper
to glass has eradicated key failure mechanisms such as moisture and corrosion,
while at the same time the considerable increase in repeater spacings has
removed the need for power feeding and so has moved the reliability risks
towards surface switching stations. Present system and network models do
not reflect the full impact of these improvements or attach sufficient
importance to the novel features of this new technology.
The established approach to network management, maintenance, repair
and restoration is perhaps best described as curative, i.e. preventative or preemptive measures to improve overall performance are generally not adopted.
However, there is an increasing body of evidence that suggests such an
approach is possible. This is increasingly so with the performance enhancements realized by fibre systems that, in turn, allow the detection of fibre
RELIABILITY
169
disturbance prior to a breakage, plus the location of such breaks. In addition,

the ambient bit error ratio (BER) of systems has also been reduced to the
point where abnormal behaviour can be detected and analysed as a precursor
to failure (see Chapter 12).
The disparities between copper and fibre are likely to be further
compounded by optical amplifiers opening up the full fibre bandwidth. These
devices will also render networks insensitive to 'lossy' components and clear
the way for the eradication of much of the concatenated electronics currently
necessary in transmission. Indeed, opportunities for radically new forms of
network will arise that will challenge most of the established wisdoms in
system, network, management and operational design. For example, the
ability to transport a number of wavelengths on a single fibre will see higher
concentrations of traffic than hitherto. Paradoxically, a single fibre cable
with a very high concentration of traffic will realize a higher level of
operational reliability than distributing the traffic over hundreds of fibres
in parallel because of the much faster repair time. There is also likely to be
a collapse of hierarchical networks, from the old international, trunk, junction
and local loop heritage, back to a single local loop structure, i.e. a flat
monostructure of stunning simplicity and elegance.
This chapter examines the established reliability models, techniques and
practices, and then goes on to consider the potential gains and novel properties
brought into the arena by optical fibre technology. The material is presented
chronologically and is supported by data from operational networks,
laboratory and field trials, plus theoretical studies and computer models of
systems past, present and future. Firstly, long-distance point-to-point links,
both terrestrial and undersea, are considered, and then the local loop. Finally
the overall picture is considered and a number of radical changes are
recommended performance enhancements for the future.
11.2
11.2.1
RELIABILITY
General
Transmission technology has always been a rapidly changing field with new
generations of system evolving on an ever-shortening time scale. Within a
ten-year period two generations may be realized and deployed in very large
numbers, with an expected service life of 7-10 years. Even undersea cables,
traditionally a techno-cautions field, are now being upgraded with higher
170 SYSTEM AND NETWORK RELIABILITY
capacity (5-10 times) systems in the same time frame. Not surprisingly,
therefore, the reliability equation is constructed with data of an immature
and often dubious pedigree. Never before has there been a time when the
statistical reliability data and evidence being accumulated for a system under
study has been overtaken by a new generation containing, in part at least,
some radically new technology - and this is likely to become the norm.
It is also worth noting that even when a technology pedigree has been
established over a long period of time, a reliability model is still likely to
succumb to relatively large prediction errors in either direction. Even with
the utmost of care, a system containing thousands of components may
experience the odd rogue that slipped through, or suspect batches installed
in error - stipulated storage and operating conditions exceeded for individual
components or sub-modules and complete systems, human interdiction
introducing a progressive weakening of sub-systems and systems, the impact
of static discharge, electromagnetic radiation by man, acts-of-God, etc. The
task ahead is therefore difficult and complex and it is necessary to make a
number of fundamental and simplifying assumptions. Whilst the absolute
nature of the results presented here can be challenged, their relative accuracy
is sufficient for the purpose of comparison. Moreover, the results fall in line
with operational experience where available.
In this study six fundamental assumptions are made:
a Markov model is sufficient to describe the 'working-failed' status of

all elements;
failure events are statistically independent;
the distribution of life and repair times is a negative exponential;
the failure rate of all components is constant;
the repair rate of all components is constant;
the repair rate is significantly greater than the failure rate.
Whilst these assumptions are not strictly true they are sufficient to
construct a meaningful comparative model. It is also useful to make the
following additional assumptions based upon practical experience with real
systems:
all components are tested and proven to be within specification, and have
verified characteristics;
all infant mortalities have been filtered out;
all system components are stable in the long term;
RELIABILITY
171
all system components can be allocated a failure rate in terms of operating

time normalized to 109 hours - this is failures in ten (to the power 9)
hours (FITs);
all elements can be allocated a mean time between failure (MTBF) that
is either computed or estimated based on field experience;
a mean time to repair and/or mean time to replace (MTTR) can be

assigned to all elements including duplicated, hot stand-by, and switched
replacements;
the system availability is defined as:

Av =
MTBF
(MTTR + MTBF)
the system unavailability is defined as:

MTTR
(MTTR
MTBF)
the reliability of individual components, modules, sub-modules, cables

and plant can be defined on an average basis using the above definitions
and assumptions.
In order to establish an expectation against which individual technologies,

configurations, systems and networks can be compared, the typical reliability
figures used in this study have been drawn from a large number of products,
manufacturers, systems and networks world-wide in order to give the broadest
perspective.
11.2.2
System and network considerations
Since 1969 the progress of integrated circuit technology has seen a doubling
of electronic circuit density each year. Whilst data/clock rates, power
consumption, and performance have improved at a more modest rate, they
have generally been sufficient to place transmission system development at
the leading edge. The key reason for this is that the high-speed elements of
transmission systems have required only a modest degree of integration,
whereas the lower-speed elements have taken advantage of significant
integration. By and large the field is dominated by silicon technology with
current bit rates reaching beyond 10 Gbit/s. The use of GaAs, although
capable of significantly higher speeds, is still relatively rare and may be
confined to optical devices and specialist amplifiers and logic elements. The
172
SYSTEM AND NETWORK RELIABILITY
bulk of the electronic terminal and repeater technology therefore has a long
and traceable heritage through its silicon basis, affording some confidence
in device and component performance prediction. Furthermore, there is now
a reasonably solid body of evidence and data to be found in publicly available
handbooks of reliability data (e.g. HRD 4).
For the purposes of historical completeness this chapter presents reliability
analyses on copper systems (twisted pair and coaxial), multi-mode and single
mode fibre systems. From the data available, mean reliability figures have
been derived across European and North American systems for all system
types, including the now dominant 565/1800 Mbit/s plesiochronous digital
hierarchy (PDH) systems. The numbers of these systems deployed can be
counted in their thousands and thereby provide a good foundation for
extrapolations into higher order and future systems. For the synchronous
digital hierarchy (SDH) systems under development, a scaling from the
existing PDH systems has been combined with manufacturers' data and trial
results. In the case of optically amplified systems and the use of wavelengthdivision multiplexing (WDM) and wavelength-division hierarchies (WDH),
the best available data from reported system experiments and trials has been
assumed. In all cases, the objective has been to be balanced and consistent
in order to benchmark comparisons.
11.2.3
Cable and line plant
Transmission cables and line plant vary within and between countries and
comprise:
exposed overhead cables;
direct bury;
plastic, metal and earthenware ducts buried to depths of 1-2 m;
river, lake, sea and ocean crossings at depths of less than 1 m to greater
than 3 km, with or without armour.
More recently the process of deregulation has encouraged new network

operators to adopt novel solutions, such as deploying fibre cables in disused
canals and waterways, in storm, clean and foul water drains, and using
lightweight fibre structures bonded to the earth wire of power distribution
lines. Although the impact of these established and new cabling techniques
on the reliability equation varies greatly, they are all subject to the following
law - it is fundamentally impossible to improve upon the reliability dictated
by cable damage or failure on an individual link. Alternative routeing (i.e.
RELIABILITY
173
redundancy) within the network is frequently used to improve end-to-end

reliability but once again this improvement is fundamentally constrained.
The bottom line on transmission system reliability is the cable.
The transition from copper to glass has brought about significant
improvements through the unique properties of glass - for example:
fibres do not corrode and require far fewer joints because they can be
installed in longer continuous lengths;
fibre cables are not affected by moisture;
the increased repeater spacing owing to the low loss of fibre has eradicated
the need for power feed conductors and buried/surface repeaters in
terrestrial systems, and is now only necessary on undersea systems that
exceed 150-250 km in length;
the eradication of buried/surface repeaters in terrestrial systems has also

reduced the risks of damage due to human interdiction/interference;
electro-magnetic compatibility (EMC) has been dramatically improved

with greater resilience to r.f. radiation, power surges and lightning;
despite early fears, fibre technology has turned out to be more resilient
and easier to handle than its copper forebears;
fibre and copper, however, are equally susceptible to the spade and back
hoe digger.
It is important to differentiate between two kinds of link availability one is the link availability experienced by the customer, the other by the
operator. The two are different due to the nature of individual cable failures
and the use of N+ I stand-by, or network protection/diverse routeing. To
understand this, consider these two extreme failure scenarios.
Only a single fibre between two end points is broken - to effect a repair
the operator has to take the entire cable out of service and insert an extra
length; so effectively a total outage is seen across the whole cable without any circuit or network protection, this is what the customer sees,
too. However, if automatic protection is provided, the customer will only
experience a momentary break in transmission. The worst experience the
customer is subject to is when manual re-routeing is used and the MTTR
can extend to several minutes or even hours for remote locations.
174
All the fibres between two end points are simultaneously broken without circuit or network protection the customer sees the same outage
time as the operator. The last customer is restored when the last fibre
is spliced. If network protection is provided it is only the operator who
is inconvenienced.
In a modern network, circuit and/or network protection is provided as

customers now demand a level of service that cannot be delivered by any
other means. This chapter therefore focuses on the outages experienced by
the operator and assumes that all cable failures necessitate a complete repair
to all fibres, Le. the cable has to be taken out of service to have a new section
inserted, requiring two splices per fibre. It is also assumed that the operator
undertakes to dispatch an adequately equipped repair team to minimize down
time - a reasonable assumption in today's competitive climate. In short,
the worst case is assumed for the operator and the best case for the customer.
Precise statistics on line failure rates and their causes are rare as the process
is a stochastic one that changes with technology, climate and the activities
of society. The typical MTBFs and MTTRs upon which this study has been
based are given in Tables ll.l and 11.2 for 1000 km (long-distance) and
20 km (local-loop) point-to-point spans of copper and fibre cables.
Table 11.1
Practice
Cable failure and repair data for 1000 km long-distance link.
Failure mechanisms
MTBF
(years)
Fibre
Copper
MTTR
(days)
buried
earthenware
ducts
damage by other utilities

back hoe diggers
water ingress
own working parties
ageing
10
<I
direct bury
as above
<1
overhead on poles
as above + windage
snow and ice
high loads and cranes
tree falls
road accidents
buckshot
<1
undersea
shallow
depth
trawlers' fishing nets

tidal erosion
water ingress
corrosion
>15
15
<7
non-armour
deep lay
water ingress
corrosion
sharks
>40
40
<20
RELIABILITY
Table 11.2
Practice
175
Cable failure and repair data for 20 km local-loop link.
Failure mechanisms
MTBF
(years)
Copper
Fibre
MTTR
(days)
buried
earthenware
ducts
damage by other utilities

back hoe diggers
water ingress
own working parties
ageing
160
100
<1
direct bury
as above
40
25
<1
overhead on poles
as above + windage
snow and ice
high loads and cranes
tree falls
road accidents
buckshot
2.5
<1
11.2.4
Repeaters
When optical fibre systems were first being developed, pulse code modulation
(peM) and coaxial digital systems had already matured and were in
widespread use in the network. Repeaters were placed at regular intervals
along the line to reshape, regenerate and retime (i.e. the 3R process) the signal,
thus ensuring a consistent performance over long lines that would otherwise
be impossible. A migration of that same technology into the optical regime
was the obvious next step. The first optoelectronic repeaters were realized
by merely introducing lasers and photodetectors into standard electronic
circuitry. The fact that fibre does not introduce the same degree of signal
distortion as experienced on copper meant that optoelectronic repeaters were
devoid of sophisticated equalization circuitry. With modern day optoelectronic repeaters it is possible to achieve repeater spacings well in excess of
100 km, although in practice spacings tend to be a more modest 30-50 km.
This contrasts significantly with the 1-2 km for coaxial systems using electronic
repeaters.
With the recent development of optical amplifiers, particularly erbiumdoped fibre amplifiers (EDFA), repeaters will be all-optical in the signal path,
with a small amount of electronics retained purely for monitoring and
management functions. This technology will open up the full bandwidth of
fibre routes, effectively transforming them into transparent optical pipes with
almost unlimited bandwidth. This, in turn, will see the introduction of
wavelength-division multiplexing (WDM) with signal formats resembling
those of the frequency-division multiplexing (FDM) copper systems, albeit
in the optical regime.
176
Table 11.3 lists the FIT figures for the individual components in electronic,
optoelectronic and all-optical repeaters. These figures are shown through eight
eras of digital transmission technology, spanning the introduction of simple
PCM in the 1960s through to the optically amplified systems of the 1990s
and beyond. In each case the total FIT figure is converted into an MTBF
from which an unavailability figure is computed using the assumption that
the MTTR is constant at 5 hours. From Fig. 11.1 it is clear that the
introduction of power-supply duplication to offset the high FIT of DC/DC
converters, which were introduced on a large scale during the 1980s, has a
significant bearing on the unavailability of terrestrial systems. Undersea
systems benefit from a further improvement in unavailability by using power
feeding along cable conductors and doing away with individual DC/DC
converters.
Full duplication of repeaters is seldom undertaken in practice as the overall
improvement in system reliability is negligible when compared with the risk
posed by the cable. It would also involve the added complexity of hot standby switching, sensing and control, which is not trivial.
11.2.5
Terminal stations
Terminal stations are the network nodes at each end of a long transmission
link and are effectively the interface point (Le. switching centre, central office,
etc) to the local loop. Essentially they comprise repeaters (electronic or
optical), multiplexers (MUX) and switches, together with management and
control elements. Table 11.4 lists the FIT figures for the key components
across eight eras of digital transmission technology. In each case the total
FIT figure is converted into an MTBF and unavailability, again assuming
a constant MTTR of 5 hours. Figure 11.2 shows the unavailability is again
dominated by the risk associated with the power supply DC/DC converter.
Duplicating the power supply reduces the risk to well below that imposed
by the station battery and power system for remote locations. Again, full
duplication of terminal equipment in terrestrial systems is seldom undertaken
as the overall system advantage is negligible when compared with the risk
posed by the cable and repeaters. Furthermore, it involves the added
complexity of hot stand-by switching, sensing and control, which again is
not trivial but is less of a problem to deal with than in the repeater case.
Full terminal duplication is most commonly adopted on undersea systems,
since the highest achievable availability is warranted on such critical
international routes.
Cin;uil t'oard
I
2
I
0
0
0
0
26
l
6
0
"
2
2
0
0
0
0
632
J2
120
4
4
4l
7
0
3
12
3
2
8
3
2
0
0
0
0
0
0
Table 11.3
I
0
0
,
3691.25
30.9J
0.00189
1000
'00
150
0
0
Coaxial
90/140 Mbills
PDH
TOlal
Count
FITS
164
32.8
1.8
6
3l
7
18.6
62
12
12
0
0
3
4.l
Il
6
3
60
2
10
10
15
5
4
60
150
3
0
0
0
0
0
0
0
0
0
0
56
l60
3
110
410
3
0
0
0
0
0
0
0
0
850
'l
4l
6.75
0
0
0
0
10.8
16
6
60
4
4
0
0
I
2
I
0
0
l
I
2
24
106
4
4
38
7
0
2
6
I
I
6
2
I
I
0
I
I
0
0
2.
2
I
0
2
0
0
\70
20
0
2
1980s
3276.8
34.84
000168
1000
800
150
0
0
7
0
6
1.8
2
30
6
6
15
50
0
200
10
0
0
290
100
IlO
0
80
0
0
57
3
0
I
7.2
lO
I
200
11.4
FITS
21.2
1.2
20
4.9E-05
92.152
1239
4.7E-OS
95.48
100.424
1137
5.IE-05
1196
42171.7
2.71
0.02162
42253
2.70
0.02166
1000
0
0
40000
8
42335.1
2.70
0.0217
I
0
0
2
4
2171.:'
52.5(,
0.00111
1000
0
0
40000
8
2253
50.67
0.00115
I
0
0
2
4
0.6
l
6.6
4
0
0
0.6
2
30
4
6
Il
0
200
0
0
10
100
60
50
IlO
20
40
60
200
32
1.8
0
I
3
40
I
200
10.4
Single Mode
2.4/10 Gbil/s
SDH
Count
TOlal
FITS
60
12
3
0 .
I
5
26
7.8
4
4
0
0
0
0
I
0.3
0
0
I
30
2
2
2
6
I
15
0
0
I
200
0
0
0
0
I
10
0
0
4
40
I
lO
I
IlO
I
20
0
0
2
120
6
240
340
34
6
0 .
0
0
2
I
6
1.8
2
20
I
I
2
200
1990,
2335.1
48.89
0.0012
1000
0
0
40000
l2
2
I
22
4
0
0
2
I
I
4
2
I
0
I
0
0
I
I
6
I
I
I
I
I
l
320
12
0
2
10
4
I
2
Single Mode
0.56/1.8 Obills
PDH
Count
Total
Reliability figures and calculations for a single I-way repeater.
Total FIT 1"01 undersea repeaters

MTBF (year~)
0/0 Unavailability
I
0
0
2
4
Single Mode
90/140 Mbil/s
PDH
TOlal
Count
FITS
13.6
68
3
0 .
2
10
28
8A
6
6
0
0
3
4
1.2
I
2
I
30
6
6
2
6
Il
I
I
lO
I
200
0
0
10
I
0
0
100
I
100
10
I
lO
IlO
I
I
20
2
80
0
0
4
160
490
4.
2.4
16
0
0
2
I
3.6
12
lO
l
I
I
2
200
FITS
1980/90s
RESHAPE REGENERATE & RETIME
1970/BOs
Multi-Mode
6/8 Mbit/s
PDH
Count
lOial
JR SYSTEMS -
19705
T01al FIT withOUl durlicated DCfDC convener

MTOF (yearq
01. Unayailability
2992.5
38.15
lI.t1015J
1000
'00
ISO
0
0
30
0
0
0
0
0
0
320
100
300
0
0
0
0
63.2
4.2
0
0
7.8
lO
6
0
3.6
6
60
8
24
1.2
20
U.S
7
0
FITS
1.60s
Copper Pair
1.5/2 Mbit/s
peM
Count
TOlal
TOlal FIT, dur'cd DCfDC l'(\l1\'cncr

MTRF (yearq
'0 Unavailability
Coaxial
Pair
Fibre
Powrr sllpplirs
Station ballery
Po.....er feed unit
Po.....er pid off
DC fDC o.:onvertcr
DC/DC du Iil:atcd
COMfIl1OrS
Inl~raled eels TTL

MDS
ECl
JoinlS Solder...d
Crilllped
Crimped (fibrc)
S lieI'd fibre
FET~
Po.... ~r
Dri\'~r~
Laser
LED
PIN
"PD
Peller coolers
Transislors Signal
VaraCIOT
Oiodf's Signal
Vollage regulator
Po.....er
Tramforrncrs
Filters
Wound irems IndUl."1or<;
Variable
Tamalum
Electrolytic
C8parilOf'S CeramiC/poly
1000
400
150
20000
0.2
0.3
l
0.3
I
5
3
0.3
2
30
I
3
15
50
200
200
10
10
100
10
50
150
20
40
60
40
0.1
0.15
I
0.5
03
10
I
100
RniSlon Lo.... po....er

High power
Pre-sels
FITS
COMPONENTS
SYSTEMS
ERA
I
0
0
I
I
21322
5.35
56.832
2009
2.9E05
O.OI()9)
.2.1786.9
S.24
0.01117
7S.15(,
1519
3.9E05
1000
0
0
20000
2
1322
86.35
0.00068
I
0
0
I
I
1786.9
63.88
0.00092
1000
0
0
20000
2
2000 +
1990>
OPTICALLY MPLIFIED
DISTRIBUTED
LUMPED
Single Mode
Single Mode
10+ Gbil/s
2.5110 Gbil/s
WDH
SDH
Coun!
TOlal
Count
TOlal
FITS
FITS
Il
3
6
1.2
0
0
0
0
I
0
0
l
2A
4
1.2
8
I
I
2
2
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
I
3
0
I
Il
0
0
0
0
0
0
I
200
I
200
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
2
20
0
0
0
I
lO
0
I
IlO
0
0
0
0
0
0
0
0
0
0
I
I
60
60
40
I
40
I
14.2
6.8
142
68
0
0
0
0
0
0
0
0
O.l
2
I
I
0.3
I
0.3
I
2
20
I
10
I
I
I
I
2
200
0
0
Table 11.4
1110 Unavailability
MTBF (years)
3
6
280
136
48
21000
320
0
280
0
0
260
32
16
260
8
32
560
136
72
2100
70
0
2300
30
120
Count
168
27591
4.14
45523.2
2.51
0.02276
5523.2
20.67
0.00276
3
6
30
14
22
7.2
120
24
1000
40000
8
2700
50
0
42
10
28
MTBF (years)
07, Unavailabilil)'
22.30
5119.03
I.IE-05
66783.5
1.71
0.03339
6783.5
16.83
0.00339
1000
60000
18
9
140
22
270
7.5
0
1680
600
1120
120
500
600
14
24
15
14
8
I
12
10
4
5.4
244
180
J5
20
12
10
2400
10
0
22
30
10.44
10936.7
5.2E06
45689.9
2.50
0.02284
5689.9
20.06
0.00284
1000
40000
8
6
120
10
1.5
240
1200
900
880
100
500
300
14
24
15
14
8
I
10
10
2
4.2
20
180
63
16
0
14
10
6
210
16
0
78
58
10.2
20
Total
FITS
PDH
Coun!
290
34
4
22
0
PDH
0.56/1.8 Gbil/s
Single Mode
1980/905
72
12.6
30
18
122
6
260
22
0
360
42
6
390
9
0
1920
0
0
240
600
600
20
24
90
6
16
150
75
30
0
88
6
100
UnderS1:a systems - duplicated MUX
87591
1.30
0.0438
0.0138
2
4
24
24
12
84
1360
48
48
0
0
24
12
4
20
8
6
20
8
5
250
30
0
440
20
20
Coun!
Total
FITS
FITS
PDH
Total
Coun!
PDH
(OI(l8x
Single Mode
1.512 Ml:it!s
90/140 Mbit/s
3900
60
0
1000
60000
18
19805
Mulli-Mode
4x 1.5/2 Mbit/s
6/8 Mbil/s
2100
48
0
11200
0
0
2600
1600
2400
260
24
480
2160
212
630
70
0
460
9
600
TOlal
FITS
I970/80s
MULTIPLEX TERMINAL
SDH
Single Mode
''''''''
I
2
4
11655.1
4.9E-06
44259.4
2.58
0.02213
9.79
4259.4
26.80
0.00213
1000
40000
8
5.4
100
6
18
10
6
210
0
0
400
720
800
300
300
60
J5
18
10
2.4
18
150
15
0
51
54
6.6
10
Total
FITS
2100
0
0
10
12
20
6
6
2
10
6
I
'8
9
5
170
15
0
270
22
2
Coun!
SDH
2.4/10 Obil/s
Reliability figures and calculations for a single 2-way MUX terminal.
79155
1.44
0.03958
1000
60000
18
No duplicated DCfDC converter

MTBF (years)
0'10 Unavailability
3
6
19155
5.%
0.00958
1000
20000
2
720
220
2400
195
0
0
0
0
2380
3600
]450
390
108
570
72
136
2160
405
84
0
486
15
680
Total
FITS
With duplicated DC/DC converter
DC/DC Convener
DC/DC Duplicated
100
1000
220
72
220
0.3
10
Coaxial
Pair
Fibre
WDM MUX
POWN Suppltes
Station Baltery
Circuit Board
24000
1300
0
0.1
0,1'5
0.5
Joints
Soldered
Crimped
Spliced (fibre)
ConnKlors
0
0
0
40
60
40
238
72
23
390
36
38
I
3
15
10
50
150
240
68
72
0.3
2
30
1350
84
0
2430
50
136
Count
PDH
Coaxial
60/68>: 1.512 Mbil/s
90/140 Mbil/s
Copper Pair
24132>:64 kbit/s
1.5/2 Mbitls
peM
19705
1960>
TTL
MOS
ECL
Signal
Drivers
Power
Inlqr8Ced Cels
Tnnslslors
Voltage Regulator
Power
Filters
Dlodts
Signal
InduClOrs
Transformers
Wound Ilems
Tantalum
0.2
0.3
5
0.5
0.3
FITS
Ceramic/Poly
Electrolytic
CapI('ilors
Pre-sc~ls
Resislors
Low Power
High Power
COMPONENTS
SYSTEMS
ERA
SDH
I
I
2
I
2
I
1400
0
0
5
5
J2
6
6
5
180
12
0
150
10
0
2.51
45477.2
1.3E-06
0.0112
22406.1
5.09
2406.1
47.44
0.0012
1000
20000
2
0.3
20
I
140
0
0
200
300
480
1.8
12
150
54
12
0
30
3
0
FITS
TOlal
2.5/10 Gbit/s
SDH
Single Mode
1990,
0
0
0
10
I
0.02
IE-QS
5685000
2004
56.96
2004
56.%
O,(XH
0
0
0
1000
1000
FITS
Total
WDH
Count
10+ Obit/s
WDM
Single Mode
2000 +
OPTICALL Y AMPLIFIED
LUMPED
DISTRIBUTED
Counl
END-TO-END RELIABILITY OF LONG LINES
0.1
terrestrial (no duplicated power supply)
.~
15
~
0.01
co
C
::J
.-------.
terrestrial (with duplicated power supply)
iii
>
~
0
179
0.001
undersea (full duplication)
0.0001
._--.
0.00001 + - - - - + - - - - + - - - - + - - - - I - - - t - - - + - - - - I
19605
19705
1970/805 19805
1980/905 19905
19905
2000+
technology era
Fig. 11.1
0.1
0.01
.----'-------.----------.,---
terrestrial (no duplicated power supply)
0.001
15
"",,trial 1_ d,pll",I'" powe'
iii
>
0.0001
C
::J
~
0
0.00001
'"
Unavailability of a single 2-way repeater.
~---+--
undersea (full duplication)
0.000001
0.0000001
0.00000001 +----+----+----+----1,..---+---+-----'+
19605
19705 1970/805 19805 1980/905 19905
19905
2000+
technology era
Fig. 11.2
11.3
Unavailability of a single 2-way terminal MUX.
END-TO-END RELIABILITY OF LONG-LINE SYSTEMS
In order to model circumstances in national and international routes, endto-end system lengths of 100 km, 1000 km and 10 000 km are assumed. For
each system length the model computes the correct number of line repeaters
for the technology era. For example, repeater spacings in the copper eras
were constrained to 2 km, whereas the later optical fibre eras readily
accommodate 50 km. In the case of a 100 km system length, the model
therefore invokes 49 repeaters for the copper eras, reducing to I for the fibre
eras. For 100 km and 10000 km systems these figures scale linearly. This
is implicit from this point on.
11.3.1
Terrestrial systems
100 km terrestrial systems can now be repeaterless, although in practice one

or two will usually be present as part of flexibility nodes. From Fig. 11.3
it can be seen that the reliability is dominated by the terminal MUX and
repeaters throughout the technology eras, if power supply duplication is
absent - cable reliability has little influence. However, when power supply
duplication is used, the reliabilities of all three elements become broadly
similar with time. The reducing number of repeaters with time accounts for
the observed improvement in their cascaded reliability.
repeaters (duplicated PS)
f 01~
~
>'<
~d"PIi"'<,"1
~~~
~--c
MUX (no PS duplication)
O.Ol
.----~--MUX
(duplicated PS)
line (duct)
O.OOlt-----+-----f---+----+------,r----+--==.....l}
19605
19705
1970/805
19805
1980/905
19905
19905
2000+
technology era
Fig. 11.3
Repeater, MUX and line unavailability for 100 km terrestrial systems.
Increasing system length to 1000 km of course requires more repeaters

(19 in the fibre eras increasing to 499 in the copper eras), and this is reflected
in Fig. 11.4 where it can be seen that the reliability of the cascaded repeaters
now dominates through the technology eras if power supply duplication is
absent. Cable and MUX reliability have little influence. However, when power
10
----
181
repeaters (duplicated PSi
:0
.!!1
.~
repeaters (no PS duplication)
0.1
Cll
::J
0'<
0.01
line (duct)
MUX (duplicated PSi

0.001+----f---.......-1f----+----+---/-----F'==--4l
2000+
1970/80s
1980s
1980/90s
1990s
1970s
1990s
19605
technology era
Fig. 11.4
supply duplication is used, the repeaters and cable bcome equally dominant
with time. The MUX plays little part in the equation. The reducing number
of repeaters with time again accounts for the observed improvement in their
cascaded reliability.
As might be predicted, the reliability of 10 000 km systems is broadly
similar to that observed for the 1000 km case because the length-dependent
elements (i.e. cable and repeaters) dominate the reliability equation. So, once
again, Fig. 11.5 shows that the reliability of the cascaded repeaters dominates
throughout the technology eras if power supply duplication is absent, with
cable and MUX reliability having little influence. If, however, power-supply
duplication is used, the repeaters and cable become equally dominant with
time. Duplicating the power supply of the terminal MUX now has such a
marginal impact that it is arguably not worthwhile.
11.3.2
Undersea systems
Optical fibre undersea systems have been in service since the mid-late 1980s,
and this is reflected in the graphs presented in this section.
For 100 km undersea systems Fig. 11.6 shows very clearly that cable
unavailability always dominates, and this continues to be so for 1000 km

100
repeaters (duplicated PS)
repeaters (no PS duplication)
10-~
~
ii
.!Jl
iii
line (duct)
~---~
~-
.. ---....,.r----
>
<tl
c
-..------
_______
--~r----
---~:i__=-==_..
0.1

0.01
0.001
+-
M-j-U_X_(:....d-.:up_li_ca-j-t_ed_PS-.:)_+--_ _-+
1960s
1970s
1970/80s
1980s
1980/90s
+--_ _-=::f:==~
1990s
1990s
2000+
technology era
Fig. 11.5

1
0.1
0.01
0.001
>
0.0001
ii
.!Jl
'iii
<tl
line (shallow water, armoured cable)

~===!~==:::::t:!==~====!
line (deep water, non-armoured cable)
repeaters -------.,
::l
?f-
-------------..
0.00001
0.000001
MUX (full duplication)
0.0000001
0.00000001 +----+----t----+------i---t-----+----<;J
1960s
1970s
1970/80s 1980s 1980/90s 1990s
1990s
2060+
technology era
Fig. 11.6
Repeater, MUX and line unavailability for 100 km undersea systems.
(Fig. 11.7) and 10 000 km (Fig. 11.8) system lengths. The observation is not
a reflection on the quality of undersea cables, quite the contrary, it is a direct
consequence of the unavoidably high MTTR. Clearly the time taken to
dispatch a cable ship plus crew, locate the fault, recover the cable, effect
the repair and replace the cable is significantly more than the corresponding
time for terrestrial systems (compare Table 11.1).
0.1
0.01
E
:0
0.001
183

~====t

repeaters
.----.---0-. _---.
'm>
ell
0.0001
::J
0.00001
0.000001
0.0000001
0.00000001+---+----+---l------1f----i---+----l;J
1960s
1970s 1970/80s 1980s 1980/90s 1990s
1990s
2000+
technology era
Fig. 11.7
Repeater, MUX and line unavailability for 1000 km undersea systems.

~==~~==~!\====;~
0.1
0.01
:0
~
0.0001
::J
0.00001
if.
.------------.
0.001
'iii
>
ell
c
repeaters
0.000001
0.0000001
0.00000001+---+-----+---l-----If----i---+----l;J
1960s
1970s 1970/80s 1980s 1980/90s 1990s
1990s
2000+
technology era
Fig. 11.8
Repeater, MUX and line unavailability for 10 000 km undersea systems.
As before, the small observed improvement in cascaded repeater reliability

with time, for all three system lengths, is due to the reducing repeater count.
It is apparent from Figs. 11.6-11.8 that the need for MUX duplication
recedes with system length; however, there is no compelling argument on
the grounds of cost, etc, for doing away with this duplication and so it is
generally provided as standard on important international routes.
184
11.3.3
N + 1 stand-by and diverse routeing
N + 1 stand-by is a well-established and straightforward practice that improves

link reliability by providing one 'hot stand-by circuit' for every 'N operational
circuits'. When a failure occurs the affected traffic is automatically switched
to the stand-by circuit. The scheme originated out of necessity in the days
of twisted pair and coaxial line plant, when repeaters represented the
dominant failure mechanism. Its main appeal is the simplicity of the switching
equipment involved, whereas the down side is that it provides little or no
protection when the cable is the dominant risk, which is nearly always the
case. A further limitation of the technique is its limited ability at coping with
multiple circuit failures on the same line, due for example to power supply
outages, cable breaks, human interdiction, adverse environment, etc. Multiple
failures arise through these mechanisms because the stand-by circuits are
generally fed from the same power bus, or are in the same cable, or in the
same duct. Clearly a better arrangement would be for the stand-by circuits
to reside in independent cables that follow different geographical paths
between the same two end points. This in fact is the basis of diverse routeing
(often referred to as network protection) where every effort is made to avoid
any form of sharing. The geographic separation between the main and diverse
paths can range from a few kilometres within cities to hundreds of kilometres
within countries, and ultimately whole countries and continents for
international links. Diverse routeing is generally limited to long-distance, highcapacity links because of its high implementation and operating cost.
Figures 11.9, 11.10 and 11.11 show the impact that N + 1 stand-by has
on the reliability of 100 km, 1000 km and 10 000 km terrestrial systems. N + 1
1
NO N+1 stand-by, WITH duplicated power supply
~O'1.~
:0
.!!1
'iii
10c
N=10
N=1
WITH N+1 stand-by, NO duplicated power supply
O,001+-----+---+---+---+---+---+--~
1960s
1970s
1970/80s 1980s
1980/90s
1990s
1990s
2000+
technology era
Fig. 11.9
Unavailability of 100 km systems with/without N + I stand-by,
185
10
:0
~
ro>
etl
::J
eft 0.1

001 f - - - - + - - - - + - - - - + - - - - + - - - t - - - - + - - - - - - - l
1970s
1990s
1960s
1970/80s
1980s 1980/90s
1990s
2000+
technology era
Unavailability of 1000 km systems with/withoiut N + I stand-by.
0,1+----+---1-------+----+---+------+--_ _
1960s
1970s
1970/80s
1980s
1980/90s
1990s
1990s
2000+
technology era
Fig. 11.11
Unavailability of 10 000 km systems with/without N+ I stand-by.
stand-by may have been the right solution in the early days of peM when
other critical factors were at play, but today, and more so in the future, it
is clear that this approach wins no meaningful avantage over the significantly
simpler use of power supply duplication. At a time when telcos are striving
to increase network utilization N + I stand-by is also now expensive in the
broad sense. Realistically a worthwhile advantage can only be achieved by
introducing network protection (diverse routeing), in which case power supply
duplication could also be dispensed with.
11.3.4
Add-drops
Long-distance terrestrial transmission systems incorporate add-drops at

selected points along the cable to extract/insert traffic in support of the
network routeing/switching functions. For operational convenience these are
co-located with repeaters. In this study it is assumed that each add-drop
requires a pair of MUX units back-to-back in the case of PDH, and a single
MUX unit for the SDH, with a pair of 2-way repeaters in each case. For
WDM operation a pair of MUX units are required, but these are passive
(Le. all optical) and hence are highly reliabile compared with their electronic
counterparts (at least ten times better).
Figures 11.12, 11.13 and 11.14 show that the impact of add-drops on endto-end system reliability is a strong inverse function of system length, so much
so that their application to 100 km systems should ideally be avoided (or
used minimally at least). This is because the additional hardware in the adddrops represents a significant proportion of the total system. For the same
number of add-drops this proportion, of course, reduces as system length
increases, dwindling to insignificance for 10 000 km systems.
:0
E1
.~
0.1
::J
oR
0.01
duplicated power supply
o to 10 add drops
0.001+----+----+------1---+-----+-----+------1
1970s 1970/80s 1980s 1980/90s
1990s
1990s
2000+
1960s
technology era
Fig. 11.12
11.4
Unavailability of 100 km systems with add drops.
SYSTEM RELIABILITY IN THE lOCAL lOOP
Telecommunications systems are at their most vulnerable in the local loop,

for example, owing to exposed cables, shallow burial with or without ducts,
SYSTEM RELIABILITY IN THE LOCAL LOOP
187
100
10 add drops
no duplicated power supply
~:---
o add drops
0.1 + - - - - + - - - + - - - - - + - - - - - - 1 - - - - + - - - - - + - - _
1960s
1970s
1970/80s
1980s
1980/90s
1990s
1990s
2000+
technology era
Fig. 11.13
Unavailability of 1000 km systems with add drops.
10 add drops
15
.!!1
.~
C
::::l
it-
0.1
o add drops
0.01+----+---+-----+------1----+-----+--_
1960s
1970s
1970/80s
1980s
1980/90s
1990s
1990s
technology era
Fig. 11.14
Unavailability of 10 000 km systems with add drops.
poles and wire drops. All of these are good targets for other utilities and
accidents generally. As a result the risks encountered are orders of magnitude
greater per unit length compared with the long-lines environment, but
fortunately the distances are short and so the risks are manageable.
188
In this analysis of reliability in the local loop, the same overall approach
has been adopted as was used thus far for long-distance systems, the primary
difference being the significantly shorter line lengths involved and the higher
risk of failure or damage to line plant. Not surprisingly it is found that the
overall balance between the various MTBFs and unavailabilities for the local
loop is dramatically shifted.
11.4.1
Route configurations
The limited transmission and signalling abilities of our copper heritage

dictated a local loop with a tree and branch distribution, with switching and
flexibility nodes sited at population centres. Figure 11.15 shows the evolution
of node, cable and line plant configurations with time and technology.
Although simplistic in content, these configurations are entirely valid for these
purposes. For historical completeness and in order that this study be as
broadly applicable as possible, the configurations incorporate a mixture of
copper systems (twisted pair and coaxial) and multimode and single-mode
fibre systems, with each configuration pertaining to a particular technology
era. Included is the concatenation of different line plants that might arise
along a typical route, e.g. duct followed by direct bury, then overhead.
Optical fibres raise the possibility of eradicating all of the flexibility points
in the local loop, and so removing the risk of failures caused by craft
interdiction. Interestingly some 500;0 of the faults experienced by telcos today
are in the local loop, around half of which are in some way attributable to
their own craft people during installation and maintenance. A further 10%
approximately are related to corrosion, with another 10% approximately to
windage, and the rest attributed to other utilities inflicting accidental damage.
In the choice of route configurations here, particularly those involving
fibres, contentious issues, such as centralized power feeding, have been purposely discounted as this is something of a red herring. Telcos are not
power/electricity companies and in the history of telecommunications the
'central battery' policy is a recent and retrograde solution that jeopardizes
future developments. All modern telecommunications equipment comes
complete with a power cord and/or an internal battery. The notion that the
advantages of fibre and an all-optical local loop should be jeopardized merely
to satisfy yesterday's power-feed technology solutions and limitations is a
nonsense, and as such is neglected here. The notion of one fibre per customer
all the way back to the central office is equally moribund. Sharing the
- 5000 GHz bandwidth of a fibre amongst several users by using passive
Iocalona
189
19501605 : twtsled pair
(MU
~I ott..:. 19701805 : coax ...
tw!st8dpU"
ovlothtldor
dlJllClburyor
DurJM dUd
10<&1
otI,,'l8O/9O$ :MM ,.... + lWISUtll po.

"'nolo
(MUX)
.-
twtsled pair
'emoto
(M\)
poIIIOP
~ntralor
~
......
...--.,...... ..
18
OOP
cable In
~dUd
....,=:-l~
I,em coax or
... ......
lWtstedPWIll
'km twIl1ld
...Iod_
p.. .,dirld
tlUryOf
poIIIOP
cotlC4onuttOli
~
eaOIe"
.....
lekln MM
Ill,. W'l blliotd

dUd
..~
I akin tw'IIttd
2'OOm
ptIIr In cited
.. Id Pll, In
bury or
ovWhtId 0(
bul'1ed ckId
d1.ect bU'Y Of
tM,lrIM dUCt
twistld~J,1n
dit~buryor
1990/2000$ : SM ,.... 10 Iho kOlb
199012000s : SM r....
......
to th. homo
I,ekm SM fbi.
buy 01'
bur. dUel
in
"'1
199012000. : PON wilh TOM
_.........
2000s : PON WIth WDM (all opIiCallocalloopj
tSJun SM
"'-'<I
.........
18kmSM
,.".
Fig. 11.15
.......-
1.lkm SMtibtt
In d1recl tuyOf
Local loop configurations through the eras.
overhtlOOI
eked buryOIl'
190
splitters at strategic locations offers a far more economic solution, and so

for these future-looking scenarios a passive optical network (PON)
configuration is assumed.
Clearly the opportunities for reliability improvements in the local loop
through the deployment of fibres are significant. The challenge is to realize
the obvious benefits while consolidating the network into a less hierarchical
(i.e. flatter) structure with minimal switching. In short, the future telecommunications network becomes a vastly extended local loop.
The prize is not just improved reliability and performance, but greatly
reduced operating costs, manning levels, improved flexibility and service
offerings.
11.4.2
Reliability
From Fig. 11.16 it is clear that overall unavailability in the local loop has
remained relatively constant through the eras. At first sight this would appear
inconsistent with the fact that reliability improvements are constantly being
realized and deployed. However, it must be remembered that, as the new
technologies are introduced, bringing the benefit of, for example, increased
capacity and new facilities, they also come with a price - an initial reduction
in effective reliability until the new technology matures. The early days of
fibre is a classic example of this. The ensemble effect of this is a broadly
constant reliability over the eras, but it is important to note that this is in
concert with significant increases in capacity, system reach and performance,
and equally significant reductions in equipment and operating costs. Some
of these trends are illustrated in Fig. 11.17.
Comparing the overall reliabilities of the local loop and 100 km systems
(see Fig. 11.9) yields similar figures. The reliability of customer-to-customer
links (i.e. local loop + long line + local loop) of this approximate length
is therefore evenly distributed between the local loop and long line, which
is an optimum situation. However, for links in excess of 100 km it is found
that their reliability is dominated by the long-line portion. For example, these
results show that a 10 000 km route (compare Fig. 11.11) has a failure risk
today more than one hundred times that of the local loop. Does this mean
there can be complacency about the local loop in the knowledge that for many
traffic connections the reliability problems arise elsewhere? Most definitely
not, because it is vital that telecommunications is viewed as an end-to-end
191
0.1
:c
.!J!
.~
Ctl
0.01
~~
::J
._~
CO <duct> LO <direct bury> <O/H> customer
CO <duct> LO <direct bury> customer
~
CO <duct> LO <duct> customer~
CO <duct> customer (no LO)
0.001+----+----+----+---1----+----+----1
1950/60s 1970/80s 1980/90s 1980/90s 1990/2000s 1990/2000s 1990/2000s 1990/2000s
technology era
Fig. 11.16
Overall unavailability of local lines.
0.1
cost per channel
O.OOll--.:-+-----4I----+-----+~~~:::::~~~
19501605 1970/805 1980/905
technology era
Fig. 11.17
Key trends in relation to unavailability in the local loop.
global process. It is desirable on operational and economic grounds to

engineer a balance throughout a network with an equitable distribution of
failure risk.
192
11.5
11.5.1
CABLE REPAIR
All your eggs in one basket
Throughout the history of cable transmission the philosophy has been to avoid
putting all your eggs in one basket, i.e. to distribute circuits across several
pairs, coaxial tubes and fibres. This approach was encouraged by technologies
that could support only a limited amount of traffic on individual bearers.
With the advent of the optical amplifier, this philosophy is about to be
changed in a most dramatic way. The ability to place all circuits on one
amplified optical fibre using WDM actually improves the overall circuit
availability. Consider the rationale behind this rather surprising statement.
Suppose a cable contains a number of parallel fibres each of which is carrying
an equal share of the total traffic in the cable. When the cable is hit, how
long does it take to repair? A fixed time to detect, locate, dispatch, and
prepare the cable for fibre splicing might be supposed - say 24 hours (it
would probably be considerably less, but this scenario is purposely being
pessimistic). Then the repair crew start to splice the individual fibres - say
15-30 minutes per fibre. The MTTRs and unavailabilities that follow from
this are listed in Table 11.5 as a function of the number of fibres in the cable,
and these are summarized in Fig. 11.18. For even a modest number of fibres,
say 200, the MTTR and unavailability increase rapidly. From the point of
view of the last fibre to be repaired, which could be your fibre, this is
unacceptable. Now consider that same cable breakage but with all the traffic
carried by the previous 200 fibres now on a single fibre using WDM. Now,
the MTTR and unavailability are only marginally down on the best-case
figures governed by the static time to detect, locate, dispatch and prepare
the cable. To the customer as well as the operator this represents a significantly
better availability.
11.5.2
Detecting and locating fibre breaks
In the case of duplex (bi-directional) and diplex (WDM) transmission over

a single fibre, there is an inherent ability to detect the location of a fibre
break in real time. It is only necessary to detect an abrupt change in the signal
CABLE REPAIR
Table 11.5
Number
of fibres
1
2
10
100
200
500
1000
193
Cable MTTR and unavailability figures for a given amount of traffic concentrated
on N fibres.
Splice time per fibre = 30 minutes

Repair
Repair
Av
Uv
(%)
(%)
time
time
(days)
(hours)
24.5
25
29
74
124
274
524
1.02
1.04
1.21
3.08
5.17
11.42
21.83
0.028
0.029
0.031
0.085
0.142
0.313
0.598
99.97
99.97
99.97
99.92
99.86
99.69
99.40
Splice time per fibre = 15 minutes

Repair
Repair
Uv
Av
time
time
(%)
(%)
(hours)
(days)
24.25
24.5
26.5
49
74
149
274
1.01
1.02
I.I
2.04
3.08
6.21
11.42
0.028
0.028
0.030
0.056
0.085
0.170
0.313
99.97
99.97
99.97
99.94
99.92
99.83
99.69
25.00
0.6
2000
0.5 .
:0
15.00
0.4
.!!!
.~ 0.3
ctl
c
iO'
'0
30 minute repair time per fibre
10.00
::>
(j)
cr:
I:~
rf 0.2
5.00
0.1
0 + - - - - - + - - - - - - - + - - - - + - - - - - + - - - - - + - - - - - - + 0.00
1
2
10
100
200
500
1000
number of fibres
Fig. 11.18
Unavailability and MTTR experienced by the last fibre in a cable failure.
level or error event activity, and then use the phase or path delay as a distance
calibration, to be able to accurately position the fault. It is also conceivable
that network protection switching could be invoked before the fibre break
interrupts service to the customer.
11.6
FURTHER RELIABILITY CONSIDERATIONS IN NETWORKS

11.6.1
Lightning and EMC
Longitudinal currents induced in copper lines by lightning strikes can couple

across into the static-sensitive electrooptic and MUX equipments in fibre
systems causing failures, incipient faults and burst errors. Wherever possible
copper and fibre plant should be divorced and steps taken to deflect such
induced currents away from fibre plant. A similar situation can also be
presented by power grid-distribution systems. It should not be assumed that
lightning and power transients will not affect fibre systems - they generally
do unless some protective action is taken.
11.6.2
Failure prediction
Recent studies have shown that it is possible to differentiate between the cause
of error bursts on the basis of the error-pattern statistics (see Chapter 12).
Further development of these techniques might enable discrimination between
events and alarms to the point where maintenance action can be focused and
explanations furnished automatically. Failure type, time and location
forecasting is an area now needing attention in order that network reliability
and performance be further enhanced.
11.6.3
Network management
Monitoring systems and operations, extracting meaningful information and

taking appropriate action to maintain a given grade of service is becoming
increasingly complex and expensive. The level of complexity increases in
proportion (at least) to the amount of data to be handled, much of which
is redundant or unusable. For a fully interconnected network of N switching
nodes, a single fault results in one failure report plus error reports from the
N - I other nodes. It can thus be shown that:
mean number of reports per day : : : ;
MTBF in days
For example, a network of 500 000 switching nodes each with an MTBF
of 10 years will suffer an average of 137 node failures and will generate an
average of 68.5 million reports per day. The assumption in the above formula
FURTHER NETWORK RELIABILITY CONSIDERATIONS
195
that each node is communicating with all the others is, of course, somewhat
extreme. At the opposite extreme there is the least connected case, which
leads to:
mean number of reports per day : : : ;
MTBF in days
which predicts that the reports still count in the millions. Whilst there are
certain network configurations and modes of operation that realize fault
report rates proportional to N, the nature of telecommunications networks
to date tends to dictate a - N 2 growth. Indeed, a large national network
with thousands of switching nodes can generate information at rates of
- 2 Gbyte/day under normal operating conditions. Maximizing the MTBF
and minimizing N must clearly be key design objectives.
11.6.4
Software
Today's networks rely heavily on software for their management and service
provision operations, while the software itself is becoming more complex,
involving in some instances millions of lines of code (Fig. 11.19). Coupled
telecommunications
ATM
SOH
control centre
System-X
exchange
other examples
miles of code
20
0
0
0
SOl
0
0
Encyclopaedia
Britannica
nuclear reactor
space shuttle
complete works
of Shakespeare
1 human's limit
10
small exchange
0.2
TXE-10 switch
(1 mile of code = 400,000 lines of code)
Fig.ll.19
Foster's metric - measure of software size/complexity.
196
with this is the fact that even minor errors in software, either in the base
code or in its implementation, pose a considerable risk to network operation,
as is evidenced in recent outages which may be quantified using the modified
Richter scale in Fig. 11.20.
If the present trajectory in software development is maintained, the
magnitude of the risk will grow exponentially into the future. In contrast,
the reliability of hardware is improving rapidly whilst that of software is
reducing, so much so that sub-optimal system and network solutions are being
seen. From any engineering perspective this growing imbalance needs to be
addressed. If it is not, an increasing number of ever more dramatic failures
can be expected. A number of technological developments hold promise that
this trend can be checked - for example, optical transparency within the
network which utilizes very simple switching and control methodologies, and
distributed intelligence whereby network control becomes less centralized and
hence less critical.
Richter scale
failure types:
range of effects:
major economic
disruption
international
network control
e[I,21
e [3,4)
national network
control
7
e[51
national press
reports
switch node
failure
local press
reports
local network
failure
local line
fault
government
concern & action
individual
complaints
within normal
contract
e =magnitude of the outage
Fig. 11.20
'Richter' outage disaster scale for networks.
QUANTUM EFFECTS
11.7
197
PEOPLE
The reliability of operational systems is generally less than that predicted at

the design stage. Moreover, there are often significant differences between
the measured performance of individual units and large numbers co-located
in single installations. In most cases this sub-optimal behaviour can be traced
back to human intervention causing imperfect repair, interference or incidental damage/weakening. Indeed, evidence suggests that - 50010 of the
faults experienced by telcos today are in the local loop, around half of which
are in some way attributable to their own craft people carrying out their
everyday installation and maintenance activities. The figure for long lines,
although lower, is nevertheless significant. The replacement of manual distribution points and writing frames with software-based routeing on a PON
represents a significant gain. When combined with gains arising from the
eradication of corrosion, the overall saving can be as high as 40% of the
operating total. Similar savings are also possible for repeater stations and
switches. A network based solely on fibre with < 100 switches would have
a fault rate of only - 20% of today's network for fibre to the home (FTTH).
If, however, the fibre only extends to the kerb (FTTK) with copper or radio
providing the final drop to the customer, the above suggested gains will not
be realized and the overall running costs will remain high. FTTK also
introduces a finite bandwidth limitation that prevents future capacity
upgrades.
11.8
QUANTUM EFFECTS AND NODE REDUCTION
All experience of systems and networks to date, coupled with the general
development of photonics and electronics, points towards networks of fewer
and fewer nodes, vastly reduced hardware content, with potentially limitless
bandwidth through transparency. With networks of thousands of nodes,
failures tend to be localized and isolated - barring software related events.
The impact of single or multiple failures is then effectively contained by the
'law of large numbers', with individual customers experiencing reasonably
uniform grade of service. However, as the number of nodes is reduced the
potential for catastrophic failures increases, with the grade of service
experienced at the periphery becoming extremely variable. The point at which
such effects become apparent depends on the precise network configuration,
control and operation, but, as a general rule, networks with less than 50 nodes
require careful design to avoid quantum effects occurring under certain
operational modes, i.e. a failure of a node or link today for a given network
198
configuration and traffic pattern may affect only a few customers and go
by almost unnoticed; the same failure tomorrow could affect large numbers
of customers and be catastrophic purely due to a different configuration and
traffic pattern in existence at the time. Caution should be exercised when
moving towards networks with fewer nodes, while at the same time increasing
the extent of mobile communications at their periphery.
11.9
THE TOTAL PICTURE
Our future vision is of a global network in which plant complexity is vastly

reduced, transmission and switching is transparent, and all reliability risks
are optimally distributed across the planet. To get there we must first
overcome certain obstacles. For example, a reduction in the number of
switches of one hundredfold (or more) is feasible, leading to a reduction by
more than a thousandfold in the scale of network management. Routeing
and control software could undergo even greater reductions with millions
of lines of structured code in hierarchical operating systems being replaced
by a few hundred lines operating a self-organizing, intelligent system. The
resulting reliability benefits are currently impossible to quantify with
precision, but clearly they are significant and will ultimately converge on the
fundamental limits to network reliability. This migration would be in concert
with the migration of intelligence to the periphery of networks in the form
of increasingly sophisticated customer terminals, such as computers.
Telecommunications is already the nervous system of our planet - the
peripheral computing power and mobile users are the neurons.
A future based on distributed computing and mobile information-intensive
services necessitates that a high end-to-end circuit availability be realized
everywhere.
In support of this objective, this study has shown that the local-loop and
long-distance systems up to 100 km already have the potential of achieving
a suitably low unavailability, - 0.01 070 in each case. On international links,
however, some form of diverse routeing is necessary to achieve a similar
figure. In its simplest form this could be achieved by setting up a diverse
route to each switch. Such a strategy could improve the - I % unavailability
computed earlier for a to 000 km link to a level on a par with the local loop,
which would represent an optimum balance. In either example, the end-toend (Le. local loop + long line + local loop) unavailability would be a very
acceptable - 0.03% for any link between any two global locations. Improving
this figure would involve considerable expense since it would require total
interconnectivity between all switches, the introduction of route diversity into
the local loop, and the duplication of all switches, flexibility points, etc.
CONCLUSIONS
199
However, there is more to reliability than simply ensuring that the

telecommunications links needed by customers are available on demand and
do not fail during use. Customer terminals are likely to evolve that assume
a level of network and interface control which will give rise to complex
interactivities that, in themselves, must be reliable. Multimedia and homeworking will add to this. In overall terms it is thus imperative that future
networks be configured to employ a minimum of concatenated plant, have
direct customer-to-customer capabilities, are lean, clean, inherently reliable
and transparent.
The reader might like to contemplate the prospect of networks devoid
of any hierarchy (Le. flat), single switched on a national, international and
even global basis, with no more than two to four concatenated switching
centres on anyone communications channel.
11.10
CONCLUSIONS
The key results presented in this chapter can be summarized as follows:
for terrestrial fibre systems the repeater and cable risks are generally
dominant, whereas only cable risks dominate in undersea systems;
duplicating power supplies realizes the lowest cost solution towards

improving end-to-end reliability;
concentrating traffic on a single fibre realizes a higher reliability than

dispersing across several fibres once optical amplifiers are introduced;
the reliability of long lines can be on a par with the future local loop
when diverse routeing is introduced;
minimizing the number of switch nodes in the network must be a prime

objective.
Without the advent of optical fibre technology it is doubtful if the human

race could realize effective global telecommunications. Optical technology
is unique in combining an inherent ability for high reliability at very low cost,
whilst also providing an almost limitless bandwidth for future exploitation,
the lowest power and material usage, the smallest physical size, and a repeater
spacing that seemed pure fiction only a decade ago. Looking to the future
there is every reason to anticipate that advances in this technology will
continue and become even more dominant in all sectors of telecommunications. From a reliability perspective it is clear that it can deliver more
than any other technology. However, regularly revisiting this technology to
200
check the validity of these assumptions and solutions would be well advised.
What was effective and appropriate yesterday may not remain so tomorrow.
This chapter has examined the reliability of optical transmission systems
and networks against a perspective of past, present and future technologies.
The challenge ahead is the realization of transparent optical networks
requiring a minimum of software and human interdiction. Such networks
will ultimately be required to satisfy the future demands of mobile computing
and communications on a global scale. Looking beyond this, new forms of
fibre (e.g. those that give even lower loss than silica, and/or contain
programmable structures to realize integrated signal processing) and networks
must be found to create even more reliable solutions.
REFERENCES
1. Mason C: 'Software problem cripples AT&T long-distance network', Telephony, 218,
No 4, p 10 (January 1990).
2. Neumann P G: 'Some reflections on a telephone switching problem', Commun of ACM,
33, No 7, p 154 (July 1990).
3. 'SS7 errors torpedo networks in DC, LA', Telephony, 221/1 (I July 1991).
4. 'DSC admits software bug led to outages', Telephony, 221/3 , pp 8-9 (15 July 1991).
5. Davenport P: 'Scarborough returns to electronic dark ages', The Times, Issue 63864
(15 November 1990).
BIBLIOGRAPHY
Cochrane P, Heckingbottom R and Heatley D J T: 'The hidden benefits of optical
transparency', Optical Fiber Communication Conference (OFC'94), USA (February 1994).
Cochrane P, Heatley D J T et al: 'Optical communications - future prospects', lEE
Electronics and Communication Engineering Journal, 2., No 4, pp 221-232 (August 1993).
Cochrane P and Heatley D J T: 'Optical fibre systems and networks in the 21st century',
Interlink 2000 Journal, pp 150-154 (February 1992).
Cochrane P and Heatley D J T: 'Optical fibres - the BT experience', Conference
on Fiber-optic Markets, Newport, USA (21-23 October 1991).
Cochrane P, Heatley D J T and Todd C J: 'Towards the transparent optical network',
6th World Telecommunication Forum, Geneva (7-15 October 1991).
Heatley D J T and Cochrane P: 'Future directions in long haul optical fibre
transmission systems', 3rd lEE Conference on Telecommunications, Edinburgh,
pp 157-164 (17-20 March 1991).
Hill A M: 'Network implications of optical amplifiers', Optical Fiber Communication
Conference (OFC'92), San Jose USA, paper WF5, p 1218 (2-7 February 1992).
Butler R A and Cochrane P: 'Correlation of interference and bit error activity
in a digital transmission system', lEE Electronics Letters, 26, No 6, p 363 (March
I~~.
-
12
PRE-EMPTIVE NETWORK
MANAGEMENT
R A Butler and P Cochrane
12.1
INTRODUCTION
Instead of waiting for systems to fail before taking remedial action, it might
be possible to remove systems from service by diverting traffic prior to failure.
Also, the likely failure mechanism might be identified from the transient
pattern of error events, facilitating rapid repair and restoration. If these objectives were achieved they would lead to extensive cost savings and network
performance enhancement. The basis for this alternative monitoring strategy
has been formulated from practical system experience and predicated by the
total lack of adequate burst error models. To date it has been assumed that
randomly generated errors have a negative exponential arrival statistic, which
can be demonstrated to be true. All other errors have been assumed to have
some form of compound Poisson arrival [1]. Whilst this might be true, no
one has been successful in formulating a general mathematical model that
fits anything but a small selection of the recorded error events from practical
networks. There are two possible explanations for this apparent difficulty.
First the statistics of individual bursts may be different depending on their
origin, such as power transients, lightning, capacitor breakdown, human
intervention, radio interference, etc. Secondly, the reported models generally
try to fit statistical distributions to the error signals produced after the line
decoding operation. It is contended here that decoding and retiming circuits
significantly distort the burst error statistics and further complicate the model.
202
PRE-EMPTIVE NETWORK MANAGEMENT
Existing recommendations for the performance of digital transmission

systems are based on a number of questionable suppositions rather than
confirmed theoretical and practical results [2]. However, they do reflect
international concern that whilst the present performance of digital networks
is acceptable for telephony, it is not the case for the rapidly increasing
amounts of data traffic. Extensive measurement programmes have thus been
undertaken by numerous operators to measure the performance of transmission systems.
The present recommendations only relate to the number of errors
occurring per minute, second or decisecond, and no mention is made of the
statistical distribution of the errors. An alternative approach is therefore
presented here that relates a pattern of burst errors back through a complete
transmission system to its source - the reverse process. The ultimate objective
is to include error burst monitoring in transmission system terminals that
could analyse the arrival statistics of the errors detected and convey the results
to a network management facility for appropriate action.
12.2
IMPLICATIONS OF ERRORS ON SYSTEMS AND NETWORKS
At the time of writing there is no mechanism by which the monitoring of

the digital transmission systems can detect or predict impending failure. The
monitoring, at present, is simply of a 'has just failed' or 'just too late' nature
[3-5] . The objective is thus to investigate the possibility of providing information about impending failures so that planned preventative maintenance
can be performed. This ambition now poses the question - 'What and how
should it be monitored?'
CCITT) study group XVIII developed performance objectives and
standards for the integrated services digital network (ISDN) at Nx 64 kbit/s
and other higher bit rates. These are outlined in Recommendation G. 821 :
'Error performance of an international digital connection forming part of
an integrated services digital network' [6, 7] . This Recommendation is firmly
based on the concept of long-term bit error ratio (BER) [8].
The error performance parameters chosen by the CCITT relate to large
quantities of data and must therefore be viewed as long-term parameters.
The use of error-free seconds (EFS) is a step in the right direction, but the
use of sample periods containing at least 2 Mbitis means that it is still only
the long-term BER that is being considered. So far the distribution of errors
remains neglected.
I CCITT has now become ITU/T - the Telecommunications standardization sector of the ITU.
EXISTING MODELS
12.3
203
EXISTING MODElS
At the current time there is no formalized international definition of what

constitutes an error burst. This makes the task of identifying individual bursts
and collecting statistics difficult. A very popular definition employs a guard
space:
'Error bursts are separated by gaps containing at least g error-free bits.
The intervals between errors within a burst are less than g bits.'
However, there is no commonly agreed value for g and as a result various
models have been developed.
Berger and Mandelbrot [9] observed the clustering of error events and
noted their appearance to be characteristic of a process governed by a
Pareto distribution. They claim that the interval between successive errors
is statistically independent of earlier activity and attempted to fit this
model to measurements made on the German telephone network. The
model is reasonable for the data used, but considerable complications
need to be added to fully explain the data obtained by other researchers.
This has led to the rejection of this model [10, 11].
Bond [12] describes error bursts in terms of the gap distribution. Gaps
are counted from the start of one error to the start of the next. The
minimum value of g is therefore 1 bit. Successive gaps form a sequence
of not necessarily independent random variables.
Gilbert [13] considered a good state with zero probability of an error

occurring and a bad state with a large probability of error, with transition
probabilities between the two. The bad state gives rise to bursts of errors,
the transition probabilities then being chosen to account for the
occurrence of these bursts. Many complex extensions of this model exist
[14-16] with multiple states introduced in an attempt to follow the
complex structures found in practice.
Pullum [17] assumed that the occurrence of bursts can be described with
a Poisson distribution using a parameter ml and that the occurrence of
errors within a burst can be described by a second Poisson distribution
using a parameter m2. These are then combined into a single distribution, first described in 1939 [18], and known as Neyman's Type A
(NTA) contagious distribution. The model attempts to fit a distribution
to the errors emerging from measurements [1], but no account is taken
of the order in which the error events arrive and vital information is lost
in respect of the physical cause of a burst.
204
12.4
MATHEMATICAL MODEL
The established models reported in the literature [9, 12, 13, 17] for burst
effects attempt to define a statistical distribution for the occurrence of errors.
This has limitations as the distribution is dependent on the cause of the error
burst and has, so far, defied all attempts at generalization. Many texts
concerned with random processes in communications stress the importance
of autocorrelation in distinguishing signals from noise. In terms of this work
the signal is the cause of the error burst and the autocorrelation function
of a burst is given by:
R(T) =
T=O,I,2,3 ...N
r,f(t)f(t+T)
... (12.1)
t=O
where T, t and T are quantized in terms of bit periods. In the general case
the power spectral density of an error burst is given by:
S(w) = Tsinc 2(wTI2) [R o+
r,
j=
2Rjcos(iwT)]
... (12.2)
where:
the sinc 2 term gives the power spectral density of a burst its characteristic shape;
the R o term adds a frequency independent offset;
the cosine series introduces maxima and minima unique to each error
burst.
In order to study the variation of the power spectral density of error bursts
with system and interference wave form parameters, power spectral densities
need to be compared. To reduce the storage and computation necessary,
algorithmic compressions are required to produce metrics that adequately
describe the structure. Such metrics are relatively simple to compute and it
has been found that a suitable mechanism for this comparison can be based
on peak amplitude comparison of the terms in the cosine series. Standard
statistical metrics have been used to characterize the cosine term peak
amplitudes in both measured and simulated results:
N
r, RJ
mean, JJ-
j=
r,
i=l
... (12.3)
Rj
MATHEMATICAL MODEL
205
E Rj(i- fJ-)2
standard deviation, a
i=l
... (12.4)
ER j (i-fJ-)3
skewness
j=
... (12.5)
E R j(i_fJ-)4
kurtosis
i=l
... (12.6)
Examples of these have been calculated for sample error bursts, as shown
in Fig. 12.1.
These metrics have been found to be sufficient in practice as they produce
a unique set of values [19-21]. It must be remembered that a time-reversed
error burst will produce the same autocorrelation function and every error
burst has a palindrome. In practice, the likelihood of this occurring is
sufficiently small to be of no real significance and is merely recorded here
as a matter of completeness. Even if by some unlikely cause palindrome effects
turn out to be significant the necessary algorithm for their detection and
differentiation is trivial. Experience gained through simulation and
measurement has shown that the mean metric is best suited for comparative
purposes [22]. The variance, skewness and kurtosis may have some value
in separating bursts with similar mean metrics.
Consider a very specific error burst containing N errors, each separated
by an interval of / bits. Then:
mean =
(N+ 1)/
--3-
... (12.7)
CCITT Recommendation G.821 [7] defines system failure when the BER
exceeds 10- 3 for ten consecutive seconds. Assuming the errors occur at
exactly WOO-bit intervals for a system operating at a bit rate of R bits per
second, then:
lOR
1000
mean ::::: -3- + -3-
... (12.8)
206

metrics
error burst
e=error
autocorrelation function
(ACF)
mean
(bits)
stand. dev.
(bits)
skewness
kurtosis
eeeeeeeeee
~~
3.67
2.21
0.57
2.36
~Lulill
3.00
1.73
0.58
2.33
2.33
1.25
0.59
2.27
~ i.. . . .,'":-:-~:---:-::-- __
1.67
0.75
0.63
2.04
~l
1.00
NC
NC
NC
4.00
2.00
0.60
2.20
4.47
2.50
0.28
2.08
4.13
2.47
-0.07
1.47
delay time, bits
eeeeeeee
delay time, bits
eeeeee
1
9
---1......I-...I~I...I__~
delay time, bits
eeee
delay time, bits
ee
e-e-e-e-e
ee--ee--ee
~I
()___
--.,
delay time, bits
I
I
-L.:-:-~...a...~--II-
delay time, bits
___ 1
11.
9
.........---JL..I........._
I I I
......
I
delay time, bits
eee---eee
~~II
el,
delay time, bits
Fig. 12.1
Metrics for example error bursts.
This example serves to illustrate three important aspects of the use of

the mean metric:
for systems operating at very low error rates, large mean metric values
result;
the mean metric is related to the structure of the error burst;
mean metric values can be indicative of system failure.
SIMULAnON
12.5
207
SIMULATION
It was considered necessary to model a physical transmission system, which

introduced considerable complexity to the simulation design that is difficult
to validate directly. Each simulation calculated the error probability per
transmitted bit, taking pattern, interference wave-form, and decision-point
offsets into account [19-21]. The error probability string was autocorrelated
and metrics calculated. These are compared with those obtained from physical
measurements taken on a laboratory system described in section 6. This
produced a good agreement, but has a number of limitations in respect of
the physical system. Specifically, the physical process in the receiver for an
individual error burst was not modelled. The simulation yields an ensemble
average of the metrics for all possible error bursts given the particular scenario
of pattern, interference wave-form and offsets. The comparison from a
measured set is reasonable if the sample is large (say > 50). Also, the
simulation does not model decoder action, giving bit and code error
indications.
To overcome these limitations a Monte Carlo simulation was devised,
the process being as follows:
generate a random data stream;
line code the data stream;
transmit the line coded data stream, introducing Gaussian noise,

interference, offsets, etc;
decode the received line-coded data and detect code errors;
detect bit errors;
calculate metrics for bit and code errors.
This was repeated 400 times for a given set of parameters. Allowing for
the published constraints of Monte Carlo simulation [23, 24] that involve
1000-10 000 runs, this number was shown to give acceptable confidence
intervals. An element of batch processing was incorporated into the simulation
with a series of parameter sets specified and simulated to illustrate the
variability of metrics with time. Each set of parameters has been termed a
'scene'. This visualization is based on a statistical analysis of the simulated
results, yielding an average result and a 95010 confidence interval for each
scene.
208
12.6
MEASUREMENT SYSTEM
This laboratory system featured commercial HDB3 [25] optical modems

[26] configured as shown in Fig. 12.2. Interference wave-forms are downloaded to a D/A converter that forms part of the light emitting diode (LED)
drive circuit of the optical transmitter. Values are injected at the line data
rate of 2.048 Mbit/s, synchronous with the data. The technique is similar
to that used by Kawashima [27]. As the interference wave-form is injected,
four signals are recorded:
code errors from the optical receiver;
AIS (alarm insert signal) from the optical receiver;
recovered clock from the optical receiver;
bit errors from the error detector.
interference
waveform
code
errors
AI S
recovered
clock
clock
bit
errors
interference generation and burst error acquisition card
file selVer (for

sottware
development)
local disc drive

(for logging
results)
Fig. 12.2
IBM PC compatible
Schematic diagram of the measurement system.
The same interferer can be injected for a preselected number of times.

The results from each measurement run are analysed to produce mean and
95070 confidence intervals. If an AIS indication occurs or clock cycles are
lost during a measurement, this is also recorded and the run repeated. To
further validate this work 30 error interval loggers [28] were installed on
a variety of systems in the BT network.
RESULTS FOR HDB3 CODING
12.7
209
INTERFERERS
On the basis of work by Udren [29] and others [9-17] a series of

exponentially decaying interference wave-forms was adopted to simulate
system transients experienced in practice. These included just two generalized
sets:
a decaying exponential pulse

V(t)
A t exp( - tIT)
... (12.9)
a decaying sinusoidal pulse

V(t) = A exp( - tIT) sin(21l"7/64)
12.8
12.8.1
... (12.10)
Identification of interference parameter variations
The results from the measurements and bit-by-bit simulation are illustrated
in Fig. 12.3. As the decay time constant increases from 100 bits, more cycles
of the interferer contribute to the error activity. This increase in error activity
leads to an increase in the value of the measured mean metrics. Once the
decay time constant reaches 600 bits all the cycles in the 320-bit duration
interferer are contributing to the error activity. Consequently, increasing the
decay time constant does not yield any significant increase in the value of
the mean metrics. The trends of the measured bit and code error mean metrics
are similar [21]. The bit-by-bit simulation result is similar, but by no means
a close match, to the measured results. The measured and Monte Carlo
simulation results for bit errors are compared in Fig. 12.4 and show good
agreement with Fig. 12.3.
An important feature of the measured and Monte Carlo simulation results
is the similarity between the bit and code error mean metric values for each
duration. In practice only code error metrics are available to predict the bit
error performance the customer is receiving. On the basis of these results
the code error mean metric appears to be a good indicator of the bit error
mean metric.
210
120
t::.:.:-:=:.:::-.:.::::.:::-.:.~: .... --:~:::
100
::'
80
2
:c
c5
'C
l/
a;
E
co
40
(])
E
c:
20
0
0
/'
.
.I
,"
..
60
.I
200
,"
"
,
,,
,
,
,,
,
,
,,
,
,,
{.
,,-,,
",
"
"
,,''
~""
"
"
----- simulation
........... measured bit errors
- . - measured code errors
400
600
800
1000
decay time constant, bits
Fig. 12.3
Metrics for HDB3 coding relative to decay time constant.
120
100
80
2
:c
<.>
'C
60
a;
E
c:
co
(])
40
20
0
0
,,
,,"
,
,
,,
,
,,
,,,
,,
,,
,,'
, , ~"
" "
....... >
--measured
- - - - - Monte Carlo
,,
,
,,
200
400
600
800
1000
Fig. 12.4
211
Identification of system parameter variations
12.8.2
When the signal-to-noise ratio (SNR) is low the mean metrics have a high
value and are rather unfocused, giving wider confidence intervals as the low
SNR produces many errors. For larger values the mean metrics begin to focus
towards particular values as the number of errors reduces. The trends of the
bit and code error mean metrics in Fig. 12.5 are again similar, with the code
error mean metric having slightly lower values.
90
:0
c5
.~
a;
E
c::
co
Ql
50
E
40
- - bil errors
-----code errors
\
\,
30
,,
,,
,~
~~
~ ~~ ~ -~=-===-===-=-==~-
20 L . - - - I _ - - L _ - J . . _........_
200
2
2.5
3
3.5
......._ L . - - - I ' - - - - L _......
4.5
5.5
signal-Io-noise ralio
Fig. 12.5
Metrics for HDB3 coding relative to signal-to-noise ratio.
Extreme values of the decision threshold offset produce large amounts

of error activity, and consequently large mean metric values, with wide
confidence intervals (Fig. 12.6). Low values of decision threshold offset
produce little, if any, error activity, and consequently low mean metric values
with narrow confidence intervals. A minimum mean metric value would thus
be expected at zero decision threshold offset. In practice, this is not the case
as these simulations also incorporate an interferer. With zero decision
threshold offset, the interferer is responsible for the error activity and can
bias the mean metric values. Figure 12.7 shows the measured and bit-by-bit
simulated variation of the mean metrics with interferer duration. In this case
the LED in the optical transmitter was operated in a linear region of its
characteristic. Figure 12.8 illustrates the variation of the mean metrics with
212
90
80
- - bit errors
-----code errors
70
60
.i1
:c
rS
';:
Qi
III
Q)
10
100
80
60
40
20
-20
-40
-60 -80 -100
decision threshold offset, %
Fig. 12.6
Metrics for HDB3 coding relative to decision threshold.
duration for a decaying sine wave interferer with the transmitter operating
in a nonlinear region of its characteristic [21].
Having been able to identify changes in the parameters of a decaying sine
wave interferer and the system, attempts were made to identify changes in
the parameters of a second type of interferer. Unfortunately, there is clearly
no correlation (Fig. 12.9) between the measured and simulated results. The
Monte Carlo simulation also reveals the same discrepancy. This problem is
compounded by the good correlation between the measured and simulated
results for a decaying sine wave interferer with HDB3 coded data, as shown
in Figs. 12.3 and 12.4. In order to identify the cause of this discrepancy the
measured results were studied in detail. It emerged that the error density
within the burst caused by the decaying sine wave interferer is typically 21.5070
for bit errors and 16.5% for code errors. The error density for the peak and
decay interferer is 40.7% for bit errors and 18.9% for code errors. Bearing
in mind that the nature of the interferers is such that this error density will
affect both polarities of marks for the decaying sine wave interferer, but only
one polarity of marks for the peak and decay interferer, then the peak and
decay interferer is subjecting one polarity of marks to a far greater error
density than the decaying sine wave interferer.
213
120
100
- - - - - simulation
- . - measured code errors
80
.l!l
:0
e5
.;::
Qi
60
c:
III
OJ
40
20
duration, bits
Fig. 12.7
Metrics for HDB3 coding relative to interferer duration for a linear transmitter.
The behaviour of an HDB3 decoder, under high error-density conditions

is not well documented. An assumption, based on the above, is that the HDB3
decoder is functioning correctly for the decaying sine wave interferer. The
peak and decay interferer is introducing too many errors for the decoder to
function correctly. This observation is supported by the work of Becam [301
where HDB3 error detection was also studied under high error-density
conditions. In the bit-by-bit simulation the error probability of each
transmitted mark is used to calculate metrics. This simulation does not include
the decoder action.
The Monte Carlo simulation does include an HDB3 decoding process to
obtain both bit and code error results. In the original version of the simulation
the digital sum variation (DSV) was monitored to detect code errors. When
this sum went out of bounds, due to errors, the sum was reset to its boundary
condition. It is thought that this resetting process may not be occurring in
practice. To investigate this the Monte Carlo simulator was modified to only
give code error indications when the DSV boundary is crossed. The DSV
is no longer reset by the boundary violation and is free to take the value
214
100
- - measured code errors
90
80
70
.l!l
:0
60
(,)
";::
Qi
E
c:
ctl
Ql
50
40
30
20
10
0
0
32
64
96
128 160 192 224 256 288 320

duration, bits
Fig. 12.8
Metrics for HDB3 coding relative to interferer duration for a nonlinear transmitter.
60
........... bit errors
- - code errors
50
.l!l
:0
cS
.;::
Qi
E
c:
40
30
ctl
Ql
,,
20
,
,,
,,
,
,,
,,
,
,,
,,
,
,,
,,
,
,,
,,
,
,,
,,
:;;,,~~.:~~~. ::>;.::<..,..
10
"
,,
'\......
".
,'.....,.
"
0
0
20
40
",
''''''''''''''-',
v"' :::::::..~.:.::::
60
Fig. 12.9
215
dictated by the received marks. When the DSV is out of bounds the decoder
is assumed to be functioning incorrectly. An equivalent of AIS is injected,
causing the suspension of bit and code error recording. During periods of
AIS the bit and code error interval counters continue to increment, awaiting
the recovery of the decoder.
Results from this modified Monte Carlo simulation are presented in
Figs. 12.10 and 12.11. For the decaying sine wave interferer (Fig. 12.10),
the results follow a similar trend to the measurements and simulation. In
the peak and decay case (Fig. 12.11) the modified Monte Carlo simulation
is now reacting to the high error density by introducing AIS, and the mean
metric value is consequently reduced and now resembles the measured results.
Decoder behaviour has thus been identified as the major cause of discrepancy.
The Monte Carlo simulator was also modified so it could react in a similar
way to a practical decoder. Fine tuning of this modification to include decoder
delays was attempted but with little effect on the metric values.
The original simulation, based on mark error probability is a record of
what actually happened, and what could be achieved by way of metric
variation if the decoding process could cope with the error burst densities
involved.
120
....
100
I'.,::.:,,:::-:.:-:.:-.:.-=-:.:-:.:-:.:-:;-:,;-:.:-~-_
80
_._
:c
<.S
0;::
a;
c:
III
CD
60
40
i /'. . . . . .. //
......
."
/ / ../
20
,..:::<
0
20
/' I
,.......................
.......
::::.::::.:::" measured bit errors
.....
...,....
- - - - - measured code errors

........... mOdified Monte Carlo bit errors
- - modified Monte Carlo code errors
0
Fig. 12.10
216 PRE-EMPTIVE NETWORK MANAGEMENT
25
-----...........
- -
20
.l!l
:0
u
.;:
Qi
E
measured bit errors

measured code errors
modified Monte Carlo bit errors
modified Monte Carlo code errors
15
10
CIl
CIl
0
-5
20
40
60
80
10
Fig. 12.11
12.9
RESULTS FOR 5868 CODING
The 5B6B block code [31] is well documented and, in contrast to the HDB3
case, the decoder action is precisely defined. Consequently a good model of
decoder action is possible. For all the simulations the SNR = 6 (i.e. the
background bit error ratio = 10 x 10- 9), and the decision threshold was set
to its nominal position. A set of 100 simulation runs is presented for each
scene. It is assumed that the decoder remains in alignment during the error
burst which is reasonable for the interferer durations studied here. Griffiths
[31] quotes the mean total realignment time of a 5B6B decoder to be 750 bits.
12.9.1
Identification of interference parameter variations
The results for the peak and decay interferer with varying decay time constant
are illustrated in Fig. 12.12 and show a mean metric that follows the now
familiar trend with a tight 95010 confidence interval, even for this limited
number of runs. It should be noted that the code error indications are based
RESULTS FOR 5868 CODING
217
70
- - bit errors
- - - - - code errors
........... scaled code errors
60
50
.2l
:0
c5
.;:
40
Qi
c:
ell
Q)
30
20
10
0
0
--- --- ---
.......... --- ------........ ---
---
Fig. 12.12
Metrics for 5868 coding for a peak and decay interferer with varying decay
time constant.
on words. The code error mean metric axis requires a scaling factor of 5 to
be applied for comparison with the bit error mean metric axis. When this
scaling is applied, the mean metric is found to have similar values for both
bit and code errors. It is therefore used in all further results for 5B6B coding.
As the decaying sine wave interferer varying decay time constant increases,
more cycles of the interferer contribute to the error activity, until all the cycles
in the 320-bit duration are contributing to the error activity, at which point
no further increase in the mean metric is possible. The mean metrics have
a tight 95010 confidence interval even for this limited number of runs
(Fig. 12.13).
As the decaying sine wave interferer with varying duration increases, more
cycles of the interferer contribute to the error activity. The mean metrics have
a tight 95% confidence interval (Fig. 12.14).
12.9.2
Identification of system parameter variations
The results from the Monte Carlo simulation of the peak and decay interferer
with varying decision threshold offset are illustrated in Fig 12.15. As the
218
--------------80
70
.l!l
:c
u 60
- - bit errors
----- code errors
';:
Qi
E
as
Q)
E
60
40
30
20
10
100
Fig. 12.13
Metrics for 5868 coding for a decaying sine wave interferer with varying decay
time constant.
90
80
- - bit errors
----- code errors
70
60
,,
,
,,
,,
,
,,
,,
,
,,
,,
,
,,
,,
,
,,
,,/
,,
,,
,
,,
,,
,
,, "
,
,,
,
,,
,
,
20
10
0l-......L_......._I--L.,............._L...-......L_........_I-.....I
32
96
224 256 288 320
duration, bits
Fig. 12.14
Metrics for 5868 coding for a decaying sine wave interferer with varying duration,
RESULTS FOR 5B6B CODING 219
80
---,,
,
,,,
,,
r"""
,,
,
,
70
60
.l!l
- - bit errors
- - - - - code errors
I
\
,
,,,
,
,,,
,,
,,,
,,
,,
,,
,,
,
:0
<.J
';::
Q)
,,
,
ttl
Q)
,
,,,
,
,,,
,
,,,
,
,,
10
0
-100
-50
50
100
Fig. 12.15
Metrics for 5868 coding for a peak and decay interferer with varying decision
threshold offset.
decision threshold offset sweeps from - 100070 to + 100070 the mean metrics
focus towards a particular value and then defocus.
The most focused set of results coincide with the minimum values for
the mean metrics for an offset of 0070. In this region little, if any, error activity
is produced.
As the decision threshold offset sweeps from - 100070 to + 100070 the mean
metrics focus towards a particular value and then defocus (Fig. 12.16). The
most focused set of results coincide with the minimum values for the mean
metrics for an offset of + 40070. In this region the effect of the interferer
is counteracted by the decision threshold offset and little error activity results.
Here the scaling of the code error axis by a factor of 5 does not give such
a convincing correlation between bit and code error mean metrics at the
extremes of the decision threshold offset. This can be explained by the wide
spread of metric values that are produced by the high number of background
errors introduced by the larger decision threshold.
220
90
I
80
- - bit errors
__ - - - code errors
.l!l
:0
<5
'':::
Qi
,'",,
,,
,,
,,
,,
c:
<ll
Q)
30
20
10
0
-100
-50
0
Fig. 12.16
Metrics for 5B6B coding for a peak and decay interferer with varying decision
threshold offset.
12.10
CONCLUSIONS
A mechanism has been devised which can detect changes in interferer and
system parameters from the error activity they induce in a transmission
system. In particular variations in interference parameters such as duration
and decay time constant can be detected. Variations in system parameters
such as SNR, decision threshold offset and linearity can also be detected.
The metrics produced behave in similar, almost identical ways for both bit
and code error activity. This enables the use of code error metrics as 'inservice' indicators of bit error performance. Decoder actions have also been
shown to be critical, and under high error-density conditions can inhibit true
error detection and metric determination. An 'in-service' detection of system
ailments and changes from code-error activity has therefore been demonstrated, and this may allow the cause of impending failures to be predicted
from the transient patterns.
Identifying the cause of an error burst only relies on the decay and
duration times plus the form for all transient interferers. It is certain that
REFERENCES
221
the difference between a lightning strike, electrostatic discharge from the

human hand and power surges can be detected and discriminated. Fidelity
beyond this crude level has yet to be established for operational environments.
This requires a further detailed measurements programme to establish
viability. Alternatively, the accurate recording of future events may prove
sufficient as the sophistication of management systems advances. All that
can be guaranteed at present is that the potential looks promising.
REFERENCES
I.
Jones W T and Pullum 0 0: 'Error performance objectives for integrated services

digital networks exhibiting error clustering', Second International Conference
on Telecommunication Transmission, lEE Conference Publication No 193,
London (March 1981).
2.
McLintock R Wand Kearsey B N: 'Error performance objectives for digital

networks', British Telecommunications Engineering J, 1- (July 1984).
3.
Schickner M J and Coleman C S: 'Requirements for the design of an automatically

switched 140 Mbit/s digital service protection network', Third International
Conference on Telecommunication Transmission, London (March 1985).
4.
Dwyer J M and McCarthy D: 'Network performance features of a digital

transmission network management system', International Conference on
Measurements for Telecommunications Transmission Systems, lEE Conference
Publication No 256, London (November 1985).
5.
Tremel J Y and Thebault R: 'Digital transmission network centralised monitoring

and supervising related to CCITT Recommendation 0.821', International
Conference on Measurements for Telecommunications Transmission Systems,
lEE Conference Publication No 256, London (November 1985).
6.
Recommendation 0.821: 'Error performance of an international digital

connection forming part of an integrated services digital network', CCITT Red
Book applicable after the eighth plenary assembly (1984).
7.
Recommendation 0.821: 'Error performance of an international digital

connection forming part of an integrated services digital network', CCITT Blue
Book applicable after the ninth plenary assembly (1988).
8.
McLintock R W: 'The international standardisation of error performance

criteria', First lEE Conference on UK Telecommunications Networks present
and future, lEE Conference Publication No 278, London (June 1987).
9.
Berger J M and Mandelbrot B: 'A new model for error clustering in telephone
circuits', IBM Journal of Research and Development, 7, Part 3, pp 224-236
(1936).
-
222
10. Muntner M and Wolf J K: 'Predicted performance of error control techniques

over real channels', IEEE Transactions on Information Theory, IT-14 , No 5,
pp 640-650 (September 1968).
-II. Lewis PAW and Cox D R: 'A statistical analysis of telephone circuit error data' ,
IEEE Transactions on Communication Technology, COM-14 , No 4, pp 382-389
(August 1966).
12. Bond D J: 'A theoretical study of burst noise', BT Technol J, 5 , No 4, pp 51-60
(October 1987).
13. Gilbert E N: 'Capacity of a burst noise channel', Bell System Technical Journal,
pp 1253-1265 (September 1960).
14. Adoul J A: 'Error intervals and cluster density in channel modelling', IEEE
Transactions on Information Theory, pp 125-129 (January 1974).
15. Kanal L N and Sastry ARK: 'Models for channels with memory and their
applications to error control', IEEE Proceedings, 66, No 7, pp 724-744
(July 1978).
16. Ephremides A and Snyder R 0: 'Modelling of high error rate binary
communication channels', IEEE Transactions on Information Theory, IT-28 ,
No 3, pp 549-556 (May 1982).
-17. Pullum G G: 'Modelling burst errors in digital transmission', Problems of
Randomness in Communication Engineering, Pentech Press, pp 134-143 (1984).
18. Neyman J: 'On a new class of contagious distributions, applicable in entomology
and bacteriology', Annals of Mathematical Statistics, 10, No 35 (1939).
19. Butler R A and Cochrane P: 'Correlation of interference and bit error activity
in a digital transmission system', lEE Electronics Letters, 26, No 6, pp 363-364
~~1~.
20. Butler R A and Cochrane P: 'The use of correlation of error activity as a
diagnostic tool', lEE 3rd Bangor Symposium on Communications, UCNW,
Bangor, North Wales, pp 214-218 (May 1991).
21. Butler R A and Cochrane P: 'Failure type forecasting via burst errors', lEE 5th
Bangor Symposium on Communications, UCNW, Bangor, North Wales, pp
215-218 (June 1993).
22. Strang C: 'Definition of objectives for error burst power spectral density metrics',
Thesis for the Degree of Master of Science in Information Systems, The Robert
Gordon University (September 1992).
23. Bouton P M and Ma F: 'On the Monte Carlo simulation of dynamic systems',
SimUlation, pp 267-273 (June 1990).
24. Jeruchim M C: 'Techniques for estimating the bit error rate in the simulation
of digital communication systems', IEEE Journal on Selected Areas in
Communications, SAC-2, No 1, pp 153-170 (January 1984).
REFERENCES
223
25. Recommendation G.703: 'Physical/electrical characteristics of hierarchical digital

interfaces', CCITT Red Book applicable after the eighth plenary assembly (1984).
26. Cochrane P, Hall R D, Moss J P, Betts R A and Bickers L: 'Local line single
mode optics - viable options for today and tomorrow', IEEE Journal SAC-4/9
Special Issue on Fiber Optic Systems for Terrestrial Applications, pp 1396-1403
(December 1986).
27. Kawashima M: 'A new measurement method for eye opening estimation on
equalised digital signal', Measurements for Telecommunications Transmission
Systems, MITTS 85, lEE Conference Publication No 256, pp 14-18, London,
UK (November 1985).
28. May P J: 'Error statistics and measurement for digital transmission systems',
Measurements for Telecommunication Transmission Systems MITTS 85, lEE
Conference Publication No 256, pp 24-27, London, UK (November 1985).
29. Udren E A: 'Man-made interference in digital communications systems',
Dissertation for the Certificate of Post Graduate Study in Engineering, University
of Cambridge (June 1978).
30. Becam D, Codet A and Ruhault F: 'Error detection in operating digital systems
- influence of error distribution', lEE Proceedings Part F, 134, No 5, pp
459-463 (August 1987).
31. Griffiths J M: 'Binary code suitable for line transmission', Electronics Letters,
~, No 4 pp 79-81 (February 1969).
13
EVOLVING SOFTWARE
C S Winter, P W A Mcilroy and
J L Fermindez-Villacanas Martin
13.1
INTRODUCTION
A key feature of the industrial revolution was the replacement of hand-crafted

manufacture with mass-production techniques. In the early period of
industrialization, craftsmen used machines as tools of manufacture; later,
the machines completely supplanted the craftsman. Computers have reached
a similar stage in their development - the computer is a tool used by a skilled
programmer to produce an end-product - a program. Software engineering
is still a craftsman's industry, awaiting the development of mass production.
This chapter describes how the computer could replace the craftsman
programmer by using techniques borrowed from biological evolution to
generate new programs without any need for human instruction.
Many of the ideas described in this chapter have emerged from a new
field of study called' Artificial Life'. The field is very diverse but it is unified
by the belief that complex biological phenomena can be modelled using simple
systems. The simulations typically consist of a number of simple building
blocks that interact to form complex patterns - a principle known as
'emergent behaviour'. This approach to modelling biology should be
contrasted with those that insist that complex phenomena must be modelled
using complex formalizations. Evolutionary modelling is a key tool in the
'emergent behaviour' approach as described below.
BIOLOGICAL EVOLUTION
13.2
225
BIOLOGICAL EVOLUTION
In 1859 Darwin first described how 'evolution' could lead to the formation
of new species of animals [1]. Darwin's model of evolution required a diverse
population of organisms that competed to obtain sufficient resources to
reproduce. Those organisms that obtained the necessary resource had to be
able to pass on to their offspring information on the strategy that they had
used. Evolution has come to be associated with the phrase 'survival of the
fittest', from the struggle for resource. This conceals, however, the importance
of several other elements of the evolutionary model. The four key elements
are - competition, diversity, selection and reproduction.
Competition - the resources must be limited and essential for

reproduction if the competition is to be meaningful. In nature the resource
can be food, space, mates, etc. In computer simulations there is normally
a limit on either the number of organisms 'alive' at anyone time or on
the available CPU time or memory space.
Diversity - there must be a diverse range of strategies in the population

for obtaining and using the resources, otherwise there can be no selection.
Selection - clearly, to select the fittest it is necessary that there be some

discrimination between the competing entities. If the discrimination is
purely based on fortuitous changes in the environment rather than the
strategies of the individuals, then there can be no progressive
improvement in performance.
Reproduction - it is the repeated cycle of reproduction and selection

which drives evolution. For example, a creature with an eye would need
to have an enormous advantage over one without in order to become
the dominant species in just one generation. However, a creature with
just a small advantage (perhaps having just a single photocell) could come
to dominate the population in time since advantage accumulates rather
like compound interest. With an advantage of one per cent after 1000
generations the accumulated advantage is 1.01 1000 = 20 959. Thus,
creatures with one photocell would come to dominate those without, to
be beaten in turn by those with two, four, eight, etc, number of
226
EVOLVING SOFTWARE
photocells, leading by steps to the formation of an eye. This is the way

in which small advantages can generate better and better solutions to
the problem of competition.
Successful strategies must be inaccurately passed on to the next generation.
The best strategy in the first generation will eventually displace all the others
and the evolutionary process will cease if the strategies are passed on
unaltered. At the opposite extreme, where there is no correlation between
the strategies of the parents and their offspring, evolution degenerates into
a random search. Evolution only works when the current best strategies are
starting points for better strategies.
In biology there are two processes that ensure that offspring differ from,
but are similar to, their parents - crossover and mutation. Crossover, the
main sexual operator, mixes together two existing strategies, whereas mutation
produces new strategies. However, the passing on of the genetic information
must be in a 'discrete' manner, not as a blending of characteristics. It was
quickly proved in Darwin's day that blending of parents would rapidly lead
to the removal of all diversity from the population. When combined, these
two operators form a powerful tool to search large problem domains for
solutions in an efficient manner.
In biology, the strategies are encoded as 'genes' in the genetic material
(DNA [2]). This level of representation is termed the 'genotype'. However,
the strategies are implemented by converting the DNA code into proteins.
These proteins form the physical, operational body called the 'phenotype'.
Selection operates on the phenotype; but reproduction occurs by the mixing
and/or mutating of the genotype. The mapping between the two is very
complex.
The mapping from the genotype to the phenotype is generally much
simpler in computer simulations than in biology. This simple coupling causes
current computer evolution to proceed in a slower and less powerful way
than biological evolution. As the mapping of genotype to phenotype
approaches the complexity of that in biology, it may offer many more exciting
and powerful options [3].
13.3
CURRENT COMPUTER MODELS
Computer models based on evolutionary principles have tackled real problems

with varying degrees of success. The three main forms of computer simulation
are genetic algorithms (GA) [4, 5], genetic programming (GP) [6] and
artificial life (AL) [7, 8].
13.3.1
227
Genetic algorithms
Genetic algorithms were first described by Holland [4] in his seminal work
in the field. A genetic algorithm consists of a linear string of symbols. The
length of the string is fixed. The symbols represent possible functions or states
of the system. In the simplest form the string consists of binary symbols and
each position in the string represents one of two possible functions or values
that the system may possess. The string thus represents the 'genotype'.
Each string is evaluated to see how effective the particular combinations
of values or functions it represents are at tackling the given problem. The
success of the string at tackling the problem is expressed by a numerical value
- its 'fitness'. Conversion of the string symbols to their respective functions
or values is equivalent to the conversion of genotype to phenotype in biology.
The fitness function then represents success in competition and reproduction
- a high fitness implies more progeny in the next generation. An example
that has been tackled in this way is the travelling salesman problem. The
string might encode the order of the cities to be visited and the fitness would
be related to the distance travelled on that particular journey.
After the assignment of fitness two strings are selected for replication.
The chances of selection are related to the string's 'fitness' although the precise
selection mechanism differs between the various implementations of genetic
algorithms. The two parent strings produce a child by means of the 'crossover'
operator. Figure 13.1 shows how the simplest crossover operator works. First,
a point is chosen on one parent string - all the string values before this point
are passed to the offspring, and all the string values after the equivalent point
on the other parent are passed on. More sophisticated crossover operators,
such as two-point selection, have also been studied. Mutation is typically a
secondary, low-frequency event which inverts the binary value at a single
point on the string. This process of selection, crossover and mutation is
repeated until there are the same number of children as parents, at which
point all the parents are killed.
parent 2
parent 1
~ x~
crossover here
~
child
Fig. 13.1
Genetic algorithm crossover operation.
228
EVOLVING SOFTWARE
The great strength of the genetic algorithm lies with this crossover
operator. Holland showed that the search for possible solutions actually
proceeds much faster than the number of individuals in the population might
indicate. This is because each string represents not just itself, but a whole
family of related strings where each binary position can be replaced by a
'#' or 'don't care' symbol. Thus the string' 1100' also represents the solutions
'#100', '#1##', etc. The fitness of any string is effectively the sum of the
fitness of each of these representations or 'schemata'. The crossover operator
works by maximizing the frequency of the best schemata in the population.
Since each string represents many schemata (for a 4-bit string, 24 = 16
schemata), the search proceeds much faster than might first appear. This
'schema' theory has explained why such evolutionary processes are more
efficient than a random search. An excellent introduction and explanation
of it is given in Jones [9] and a detailed mathematical analysis in Holland
[4]. Unfortunately schema theory can only be applied to strings of fixed
length. Clearly for some problems the form of the solution is known but
not its exact parameters (in the travelling salesman problem, for instance,
the number of cities is known but not the order). Genetic algorithms work
well on these problems. However, for many problems neither the form nor
the parameters of the solution are known. For these problems genetic
algorithms are largely useless. Despite this limitation genetic algorithms have
been successfully used in an enormous range of optimization problems and
are a well-proven technique, demonstrating that evolutionary approaches can
be used to tackle problems where other heuristic techniques fail.
13.3.2
Genetic programming
Conventional genetic algorithms are incapable of representing computer

programs because they use a fixed-length, linear-string representation. Genetic
programming was developed to overcome these limitations; it does not include
any further biological features (indeed, in some ways, it is less realistic than
genetic algorithms), but rather it expresses the evolutionary paradigm in a
more 'programmer-friendly' manner [6].
In genetic programming the program is represented as a parse tree. Figure 13.2 shows the tree form. The LISP representation actually manipulated
by the program would be (+ (- T1( + T2 T1 (IFLTE( + T3 T2) TI T3 T2.
There are two components to the tree - nodes and leaves. The nodes are
a predefined set of functions (say +, -, <, etc) and the leaves a set of
terminals (variables, constants, input values, random numbers, etc). The
components of the tree are limited to the members of the terminal and
function sets defined at the start of any given simulation. Normally each
member can recur as many times as required in any given tree. The set of
229
= (T1 - (T2 + T1)) + (If ((T3 + T2) < = T1) T3 else T2)
Fig. 13.2
Genetic program tree using the function set! + , - , if-Iess-than-or-equal-to I and

the terminal set [Tl, n, T3J.
functions used in a simulatio'l is fairly small, typically 3-8 members. The

first implementations of genetic programming were in LISP [6], which is
still the favoured language, although implementations in C, C+ + ,
Mathematica, Scheme and XLisp now exist.
The evolutionary process begins by generating a random set of trees. These
are evaluated against the test problem and assigned a fitness (similar to the
genetic algorithm approach). Two of the trees are then selected at random,
with a probability that is related to their fitness, and a child produced using
the crossover operator shown in Fig. 13.3. This crossover mechanism operates
by cutting one sub-tree from one parent and replacing it with a sub-tree from
the other parent, much as an apple tree can be grafted on to a pear tree stem.
Although two different children are possible, only one child is normally
generated. It should be noticed that, in contrast to genetic algorithms, the
tree size is not preserved. In this way new programs are built up using the
best of the building blocks developing in the population. Since the crossover
operator can produce new trees of greater or lesser length than the parents
there is no strict size limit to the programs that can be produced. However,
the time required to evaluate the fitness function normally limits the
complexity of the program that can be evolved.
Genetic programming has been shown to rapidly produce short, compact
programs to tackle a wide range of problems such as Boolean multiplexers,
nonlinear control strategies, curve fitting, pattern recognition and game
strategies. A long list of example problems and solutions are given in Koza
[6] , where it is conclusively demonstrated that genetic programming is vastly
more effective than a random search for solutions. A later section discusses
why genetic programming works at all - given that the crossover operator
230
EVOLVING SOFTWARE
parent 1 x parent 2
child
Fig. 13.3
Genetic program crossover operator. The short arrows mark the place the crossover
operator breaks the chains into two sub-trees.
breaks most of the requirements of the schema theory (fixed length, positional
relationship, etc).
13.3.3
Artificial life programs
Genetic algorithms are not creative - they can only produce solutions that
lie in the region bounded by the largest and smallest numbers the strings can
represent. They do not generate new functions. Genetic programs can generate
new functions - by combining elements from the original set to produce
functional sub-trees, although an inappropriate selection of functions at the
start can inhibit or even stop the evolutionary process. Neither genetic
algorithms nor genetic programs can be used if the desired outcome is a group
of programs that co-operate with each other to tackle the task. A radically
different approach to evolving systems has been pioneered by Ray [7] and
Skipper [8]. They attempted to model biology directly rather than the
abstractions used in genetic algorithms and programs. Despite this, they offer
insights into how evolving software might develop and techniques that might
be included into the more conventional approaches. Here the Tierra model
231
of Ray [7] is described - the C-zoo program developed by Skipper is similar

in concept but different in implementation [8].
The Tierra program is currently run on a serial computer (typically a
workstation or powerful PC). The program emulates a virtual multipleinstruction multiple-data (MIMD) parallel machine. This virtual computer
environment was chosen to minimize the risk that a virus could evolve and
escape from the program into the operating system. However, the program
is clearly more suited to a parallel processing machine than to a serial one.
The Tierran world is a one-dimensional string of memory cells. Each cell
can contain one instruction from the virtual pseudo-assembler language that
defines operations in the Tierran world. Each Tierran creature consists of
a contiguous block of these cells or instructions. Normally a creature only
executes its own code; however, as will be described below, access/reference
can be made to any no-operation instructions (nop-O, nop-I) in other
programs in the Tierran soup.
The programs compete both for CPU time to execute their code and for
memory space. The Tierra program controls CPU allocation and the lifetime
of each creature by two mechanisms - the 'slicer' and 'reaper' queues. Each
creature is assigned a place in both these queues. When an organism has risen
to the top of the slicer queue it is allocated a slice of CPU time, and can
execute a number of its instructions. When the time slice is finished the
organism leaves a pointer to the last instruction it completed and gives way
to the next program in the queue. If an organism makes an error whilst trying
to execute an instruction its error counter is incremented and the program
moves on to the next instruction. The error counter controls the movement
of the organism in the reaper queue. At periodic intervals, the organism at
the top of the reaper queue is killed and its memory released back into the
central pool.
The Tierran pseudo-assembler language consists of a set of 32 instructions.
Each instruction is coded for by five bits, including its operands. The set
has been kept small because it was designed to imitate the 20 amino-acids
biology codes for using DNA. There are good reasons for believing that a
small instruction set makes evolution easier. Tierra avoids the use of numerical
operands to help keep the effective instruction set small. In Tierra addressing
is always by 'template' rather than by location, as in a normal computer
program. The template idea comes from biology again, where proteins
identify their intended 'address' by matching a chemical 'template' on their
surface with the surrounding molecules. In Tierra, each jump Ump] instruction
is followed by a sequence of no-operation [nop] instructions of which there
are two kinds, nop-O and nop-I. The system will search outward in both
directions from thejmp instruction looking for the complementary pattern.
If the pattern is found the instruction pointer moves to the end of the pattern
232
EVOLVING SOFTWARE
and resumes execution. If it is not found an error condition is returned and

thejmp instruction ignored. Full details of the CPU implementation can be
found in Ray [7]. A mutation in one of the templates can cause the program
to start executing the wrong part of its code; if the mutation happens in the
template that marks the beginning or end of the offspring, the new program
will thereby be enlarged or shortened. Equally it is possible to jump into
another organism and then use part of its code in the copy procedure, thereby
permitting a variety of parasitic and symbiotic relationships to develop.
Initially the soup is seeded with a single hand-written program that
contains a routine which, on being executed, copies the program to another
location in memory. However, the execution of instructions is flawed at a
low rate (equivalent to mutation), so that they can execute slightly incorrectly.
These flaws can either affect the function or the copying of the 'program',
so that the offspring are not always identical to the parents. Programs are
killed when, through these mistakes and poor code in their structure, they
make errors.
Every generation the programs with the highest accumulated number of
errors are removed - freeing memory for which the survivors compete as
a location for their offspring. Fitness is defined simply by a creature's ability
to utilize memory and CPU more efficiently than its neighbours. A given
organism's fitness is a local function, depending on its interactions with its
co-evolving competitors, since each can utilize, to some extent, the code of
other programs present in the memory soup. This leads to a wide range of
interactive behaviours such as parasites, predators, viruses, colonies and social
behaviour. Replication is implicit in the system and the genetic representation
is open-ended. In these respects AL systems are more realistic models of
evolution than genetic algorithms but correspondingly more difficult to
constrain.
13.3.4
Differences between the approaches
There is an enormous difference in the approach between genetic algorithms

or genetic programming and artificial life simulations. The former two use
'task-oriented' evolution, the latter uses 'environment-oriented' evolution.
The difference this causes in both the use of these techniques and the answers
they return can best be understood in the following manner.
Imagine the scenario where a fast race horse is required - there are two
approaches that could be taken:
select a herd of horses, race them, discard the losers and breed from the
winners, and then repeat this process until one horse runs the race
sufficiently quickly for the original purpose (= selective breeding);
233
select a nice open grassy plain like the Serengheti (food but no cover),
populate it with lions, release a bunch of assorted animals and wait (a
few million years), and then come back and select those animals that
have evolved to escape from the lions (= open-ended evolution).
The first method is how genetic algorithms work. You know what you
want, you formulate the problem precisely and you already know the general
form of the solution. It is efficient but unoriginal, and requires considerable
pre-simulation knowledge. The second method is how artificial life techniques
work. You create a general environment that represents the nature of the
problem you want to solve. Then you sit back and see what comes out. It
might be a fast horse; it might be a rhinoceros; both solve the problem as
expressed! The artificial life approach is less controlled but correspondingly
more creative. It can produce solutions of a form never thought of or
anticipated. Such simulations may be a first step to producing originality
and innovation with a computer. The next section describes some of the
frustrating creativity that artificial life programs can demonstrate.
13.4
The chapter has concentrated on how genetic programming and artificial life
techniques work. These both seem more promising than genetic algorithms
for program writing because they are designed from the beginning to evolve
programs of variable length. Genetic algorithms generally use fixed-length
strings. Although variable string length implementations exist [10, 11] , their
theoretical basis has yet to be demonstrated. Such implementations fit uneasily
into the schema theory, thus largely negating the advantages of the genetic
algorithm approach.
13.4.1
Genetic programs
Koza has used genetic programming to tackle a range of problems [6] . Koza's
method has been applied to a number of problems using a C implementation
of this technique [12] and a series of trials of the techniques has begun. The
problems tackled fall under the heading of 'pattern recognition'. They consist
of trying to find patterns in an apparently chaotic or very noisy data set and
trying to identify generalized features in a pattern. Curve fitting, trends in
noisy data and feature recognition all represent complex problems that might
be solved using fairly short evolved sequences of code. Tackett [12] has
already shown that genetic programming can produce efficient pattern
recognizers - evolving a program that identified tank targets from noisy
234
EVOLVING SOFTWARE
IR detector data. Whilst similar work can be done with neural networks,
Tackett [13] found that the genetic programs outperformed the neural nets.
The genetic programs were found to be remarkably good at curve fitting,
quickly deriving a formula very close to that originally used to generate a
test set of data points. Similarly, faced with an experimental time series, the
genetic program evolved an algorithm that predicted the subsequent behaviour
of the data series better than that obtained through simple linear regression
analysis. Now, programs are being evolved that, hopefully, will recognize
facial features by deriving a suitable algorithm from a set of test faces.
Problems like pattern recognition, where it is difficult for a human
programmer to know where to start, are particularly appealing to evolving
software practitioners. Although it is important to choose a suitable initial
function set (Le. it is necessary to specify the set of operators [ +, -, *, /,
IF, LOG, etc] available to the GP), it is not necessary to specify in advance
how the GP will combine the operators it uses. The evolution time required
for the first two problems has been of the order 1-10 hours and thus within
the power of current machines. The resulting programs have possessed less
than 100 nodes and leaves.
There is a well-developed theory to explain why the crossover operator
in genetic algorithms works efficiently (see the schema theory in section 13.3).
Koza shows that crossover, rather than mutation or random search, is also
the key to the speed of genetic programming [6]. However it is not clear
why the genetic program crossover operator should be an efficient search
tool. At a first examination a randomly selected sub-tree is unlikely to improve
another tree when it is grafted on. Although the sub-tree is physically
unchanged, one would expect its change in location to change its meaning.
So, although the new tree is likely to be syntactically valid, one would not
expect it to produce a meaningful result. The explanation lies both in the
nature of the evolutionary process and in the concept that the meaning of
a sub-tree depends little upon when it is moved.
Examining an evolved program it is quickly noticed that many branches
of the tree are similar. Structures have developed that provide general
problem-solving tools that are hierarchically arranged. This can be pictured
for pattern recognition as the development of crude classifiers that operate
by progressively filtering their output upwards through a tree of similar crude
classifiers. The tree is then self-similar. At whatever level the tree is viewed,
the same classifier structure exists. This has great evolutionary benefits, since
there is only one basic structure that evolves in a variety of patterns and is
modified slightly to fine-tune the structure. Now any sub-tree has a similar
meaning when it is grafted on to another tree. The evolutionary process drives
the trees in this direction because such 'self-similar' trees will always have
output reasonably similar to their parents. Trees whose offspring are unpre-
235
dictable are likely to produce a high ratio of failed children which will lead
in the long run to their elimination from the pool of possible solutions. Thus,
a 'self-similar' solution offers many evolutionary advantages and will tend
to arise regardless of the function set put in initially. It is interesting to
disassemble such programs and observe what basic functional units have
evolved.
The great advantage of genetic programming over other techniques is that
the evolved parse trees can often be simplified and presented in a manner
that a human programmer can understand. This 'understandability' and ease
of relationship to, in particular, standard programs makes the technique more
user-friendly and easier to integrate into standard programming environments. Thus, genetic programming might soon become a standard tool
for developing small, but complex, pieces of code in normal development
procedures.
13.4.2
Tierra and (-zoo
Tierran simulations have shown that the programs can increase the speed
of their self-copy algorithms by a factor of six when the simulator is run
overnight. They have achieved these improvements by significant alterations
to their original programs, such as 'unrolling the loop' [7]. Thus, Tierra
might be a good tool for the optimization of application programs, and may
be particularly useful for the programming of massively parallel machines.
However, the only problem tackled by the Tierran organisms described in
the literature is that of efficient self-reproduction. So the question remains
- can Tierran programs do something else apart from reproducing?
Tierra has been altered so that the creatures can be forced to solve practical
problems rather than just reproduce. The original instruction set was modified
so that programs could read and write from I/O buffers. These buffers can
be used by the programs to communicate directly with other creatures or
with the user. The user places the problem in the input buffer and the creature
replies via the output buffers. The slicer queue was modified so that creatures
communicating to the user were rewarded with extra CPU time according
to how well they performed the task.
The first problem fed to the Tierran creatures was a simple maze-running
problem, the intention being to reward those creatures that learnt to follow
the maze quickly. The second problem was to evolve a general algorithm
for converting any input 32-bit integer so that all the bit positions were unity.
The results were both disappointing and surprising. In both cases the
organisms found efficient ways to 'cheat'. In the maze-running problem, the
programs ignored the problem and indeed evolved maze-running software
out of their system altogether. Then, instead, they optimized their reproduc-
236
EVOLVING SOFTWARE
tion loop using the minimal CPU time permitted to organisms which refused
to co-operate. It was necessary to 'feed' the bugs a minimal time, even if they
were not tackling the problem, in order to let the process of evolution start.
Viewed from the organism's perspective the problem is how to use the CPU
resource available to reproduce efficiently. Perhaps the environment failed
to provide sufficient reward and they found a creative solution to reproducing
efficiently in the minimal time available. Clearly the bugs found that evolving
a good maze-running algorithm in order to gain more resource was not a
satisfactory evolutionary strategy in this environment. The slicer implementation was altered further and the second problem fed to the simulator. This
time the programs kept intact the low-quality algorithm initially embedded
in their code but evolved their reproductive code much more rapidly than
their bit-manipulation code. These programs were clearly more concerned
about reproduction than task solving. From the human perspective they
'cheated' and exploited loopholes in the environment to maximize their reproductive success. Although artificial life programs are particularly good at
this, any evolutionary system, be it genetic algorithm/programming or Tierra,
will exploit any available loopholes in the fitness functions where it enhances
its reproductive success. It is very easy to anthropomorphize when describing
the apparent behaviour of these programs and speak in general terms of their
'behaviour' .
Apart from going their own way, Tierran programs have a further
disadvantage over genetic programming - it can be extremely difficult to
disassemble a creature to work out how it is functioning and even more
difficult to analyse how the whole environment is operating.
Despite these difficulties some progress has been made with C-zoo.
Evidence of programs co-operating to tackle a task has been observed by
altering the interactions controlling the creatures in the C-zoo. Co-operation
between programs is difficult to evolve in genetic algorithms or genetic
programming because the evolving programs do not directly interact with
each other, and the interactions are limited to a rather remote selection
procedure. In C-zoo and Tierra, interactions through templates left in the
memory allow a wide range of competitive and co-operative behaviour. It
would be interesting to maintain this feature whilst making the system more
task-oriented.
13.4.3
Hermes
Hermes is a simulator developed at BT Laboratories to incorporate more

artificial life features into a controlled task-oriented environment. Specifically,
it was felt that fitness functions allowed more control of task-orientation
than CPU and memory utilization, but that inter-program communication
237
could lead to interesting colonies of heterogeneous programs that co-operated

to perform a task. The Hermes model is based on the observation that
most communication in biology is via messages that have a meaning only
to the receptor. In Hermes each creature consists of a set of condition-action
rules that form its sensors and a further set of such rules that form its
effectors. The sensors recognize messages either from the environment or
from the creature's internal register. If the sensor is triggered by a message
of sufficient strength, then it releases a message to the environment or to
the creature's own registers. If an effector recognizes a message from the
internal registers and is triggered, then the organism carries out the effector's
action. Messages placed by the Hermean creatures have an initial strength
and a characteristic decay time. The condition, action, strength and decay
rates of a message are all evolvable. All rules are operated simultaneously
in any given cycle of evaluation.
An example world might have messages placed in the environment
identifying the location of food. The creature's sensors would need to
recognize these messages and post internal messages that then activated the
effectors to move towards the food and eat it.
Similar message systems - or classifiers - have been proposed before
and used both with and without the evolutionary aspect. Holland [14]
invented a classifier system where the creatures were single rules. Credit was
assigned to successful rules (those that participated in solving the problem)
by a non-biological, non-genetic bucket brigade algorithm. The work was
taken further by making each organism a collection of rules [15]. However,
these creatures only posted messages to themselves - not to each other via
the environment. In all the classifier systems the lifetime of a message was
only one update cycle. In biology many different self-organizing effects occur
through the interplay of the duration, distance and strength of a message.
For instance, ants communicate using touch (short range, short time), sound
(long range, short time) and smell (long range, long time). Hermes thus adds
to traditional classifier models the ability to communicate with other
organisms in the evolving world and the ability to place messages with
different ranges, decay times and strengths. The firing of a rule then becomes
dependent on whether the sensor detects messages of sufficient strength in
the environment.
The nature of the messages and the sensors are illustrated in Fig. 13.4.
The recognition of messages by sensors is 'sloppy'. Sensors identify messages
that match a pattern of Is, Os or #s (= don't care). Thus each sensor can
recognize a range of messages. Sensors can evolve to be highly specific (e.g.
recognize '11111111' messages) or to be very general (e.g. recognize
'#######0' - any message ending in a '0'). The rules can mutate on
reproduction and offspring inherit half of each parent's rule set (crossover).
238
EVOLVING SOFTWARE
environment message list
11001100
11110010
11110000
10101010
10000001
sensor
condition
effector
action
condition
action
internal message list
Fig. 13.4
Hermes classifier system. A sensor can detect messages either from the environment
or its own internal list and place them back into either register.
The mutation operator can cause both bit mutations within a rule and rule
duplication or deletion. This means the complexity and number of rules in
a creature can increase or decrease with time under evolutionary pressure.
Hermes has been used to evolve solutions to a variant of the travelling
salesman problem. In this variant there are a number of tasks located on
a grid and a number of 'salesmen' who must visit these tasks and complete
them in the minimum time. Each salesman has a particular skill set and is
rewarded for completing jobs according to how well their skill set matches
that job. However, during the day more tasks appear - how do you best
utilize your salesmen to complete the maximum number of jobs? This is a
'multiple travelling salesmen in a fog' problem. The Hermean representation
is very computationally expensive so that evolution is relatively slow compared
with genetic programming. However, evolution of the programs has been
observed, but, to date, no examples of inter-program co-operation have been
clearly identified. Like Tierra, disassembling Hermean programs to find out
how and why they work is time consuming. Generally, phenomena like self-
FUTURE DEVELOPMENTS
239
organization can only be identified by the pattern of behaviour seen, rather

than by direct reading of the code.
Hermes includes features of both the artificial life and genetic programming approaches. The programs can interact with each other (a feature of
AL, but not GP IGA), it has an open-ended genetic representation (a feature
of AL/GP, but not GA), and the evolutionary process is task-oriented and
driven by a clearly defined fitness function (a feature of GA/GP, but not
AL). There are clearly many other hybrid combinations that can be explored
in an attempt to find the most efficient synthesis of biological techniques
and computer simulation.
13.5
FUTURE DEVELOPMENTS
Two major obstacles to the widespread acceptability and use of evolutionary

programming techniques need to be discussed.
When will computers of sufficient power be available to produce code

as economically as human programmers?
Evolved programs are likely to be radically different in structure to

written ones - can we trust them and test them?
The following sections briefly discuss these issues.
13.5.1
Time scales
A good programmer writes 20-30 lines of debugged code per day - roughly
300 bytes of object code. How quickly do programs such as Tierra or Koza's
genetic programming evolve an equivalent amount of code? The comparison
depends on a number of assumptions.
The human programmer's code is written as part of a large project,

including testing and requirements capture. The computer generated code
is based on the evolution time which implicitly includes testing in the
form of evaluation of the fitness function. It does not allow for
requirements capture, which is equivalent to setting up the appropriate
fitness function.
Only the evolution of small programs has been studied. The scaling to
large problems is unknown. Section 13.3 briefly discussed this issue. Until
workstations of sufficient power are available the techniques will be
difficult to test on long programs.
240
EVOLVING SOFTWARE
It is assumed that current developments in the cost/price relationship

for workstations will continue. At the current time workstation
performance is improving tenfold every five years at constant real prices.
No allowance is made for the advantages that parallel processing can

give. The evolutionary programs described above lend themselves
naturally to parallel processing. Indeed such advanced features as local
populations (called demes) were first studied using transputer-based
parallel machines and connection machines.
The data shown in Table 13.1 is based on speed comparisons between

the Mac, Sun and HP workstations using Tierra, and on running a C version
of Koza's genetic program on the HP. The latter experiment is used to
calculate how long it would take a genetic program to evolve thirty lines of
code. Column 2 of Table 13.1 shows the assumed processing speed in MIPS.
It should be noted that the Cray 2 has a processing power of about 3 x 103
MIPS.
Table 13.1
Computer speeds and availability.
Computer model
Processing
speed (MIPS)
Year
available
Time for 300 bytes

of object code
Mae Quadra
10
30
100
1992
1992
1992
1992
2000
2010
2020
100 days
30 days
Sun Spare II
HP Apollo 720
Human
3 x 103
3 x 105
Workstation
Workstation
Workstation
3 x 107
10 days
I day
I day
2 hours
3 mins
The predicted year that such a serial computer will be available for about
25 000 at 1992 prices is indicated in column 3. Column 4 shows the calculated
and observed times for 'producing' thirty lines of code. This data is presented
graphically in Fig. 13.5. Thus, if genetic programming can be shown to scale,
it is predicted that around the turn of the century such techniques will compete
with humans at writing code. From then on the power of the machines will
rapidly overtake the human programmer.
13.5.2
Testing and reliability
Testing is an implicit part of the evolutionary software generation process.

Each generation is constantly exposed to a set of problems and measured
by how it performs at tackling these tasks. However, this seldom equates with
FUTURE DEVELOPMENTS
100
241
,...,...-
10
,...-
,...-
0.1
0.01
0.001
,...-
,...-
~ ~
0.
c.~
00
~g
QlC\l
"O~
ec.
workstation
Fig. 13.5
Time taken by evolving software to produce 300 bytes of object code on various
current and anticipated workstations.
complete testing. Clearly, it would be computationally expensive to test each

organism,every generation, with the complete input set. As a consequence,
evolving software programs are only incompletely tested in the simulator.
Because each generation only faces a subset of the possible input set, the
programs may well evolve specialized algorithms that can deal correctly with
this subset of data, but fail to deal with the full set. This process is described
as 'overfitting'. One of the most important research topics in evolving
software currently is how to evolve general, rather than specific, solutions
to a given problem set.
Some attempts have been made to tackle this problem, the most important
of which is by Hillis [16]. Hillis creates two populations, one of candidatesolutions and one of candidate-problems. Both are able to evolve. The
problem-creature's fitness is defined by the number of solution-creatures that
fail to correctly answer their problem. In this manner the problem space is
always evolving, within the limits of the input space, to select those test cases
that the candidate solutions find most difficult. In the long term, the candidate
solution that is capable of answering all possible input problems is the one
most likely to succeed. This technique - called the 'co-evolving parasite
242
EVOLVING SOFTWARE
model' - has been shown to be considerably faster at evolving than the static
test case.
Programs developed using rigorous mathematical models should, in
theory, be provable as error-free. Evolving software will only be statistically
error-free depending on the number of test cases to which it has been exposed
unless formal methods can be developed for verifying the generated code.
Programs that are shown to possess errors would probably have to be evolved
further. However, evolving software tends to develop programs that are fault
and error tolerant. This is because of the nature of the evolutionary process,
particularly with the co-evolving parasite model. A program that evolves to
cope well with a subset of the data and poorly with the rest can be viewed
as 'brittle' and error-sensitive. Changes in the test set will rapidly kill these
programs. So programs that return reasonably close, but not fully
satisfactory, answers to a fuller range of the test set of problems are more
likely to survive. A good example of this process can be seen in Tierra. The
low flaw rate of execution of instructions drives evolution towards programs
that can tolerate small errors in their execution. A program that requires a
thousand instructions to be executed exactly as they are written is likely to
rise quickly to the top of the reaper queue and to extermination. Tierran
programs can be hard to disassemble precisely because, for them, it is more
important to evolve error tolerance than readability! Until larger programs
can be evolved it will be difficult to judge the error rate in an evolved, in
preference to a hand-written, program.
13.5.3
Scaling and decomposition
The most striking examples of evolved software come from the field of genetic
programming. However, the problems tackled, whilst interesting and
complex, only require relatively short lengths of code. Indeed, to prevent
the simulator grinding to a halt the maximum depth of the parse tree is
deliberately limited to, typically, fifteen layers. The question then becomes:
'How does genetic programming scale with the length of program to be
evolved?' The simple answer is nobody knows. Holland discusses the
consequences of gene string length and the number of alleles (independent
symbols that can occupy a single site on the gene string - for binary
representations this is two [0, I]) on the size of the genetic search space and
thus on the search time (see in particular Jones [9] for a clear discussion).
The search space for a genetic algorithm and the number of schemata in the
population both grow as power functions. Unfortunately Holland's analysis
does not apply to open-ended systems. Genetic programs that seem to evolve
similar structures may actually search a smaller space than first appears and
thus may also show a less than exponential growth in difficulty with the size
CONCLUSIONS
243
of the problem. However, the extrapolations above assume a linear

dependence of evolutionary time against program length which is extremely
doubtful, even with new developments in genetic programming, such as
automatic function definition [6].
So it seems that evolving software techniques will be used to best effect
evolving small pieces of code (component parts) rather than whole programs.
This requires large tasks being broken down into sub-components. Although
tentative suggestions have been made that the decomposition of a task could
be part of the evolutionary process, detailed models for doing this do not
yet exist. Furthermore, decomposition into fragments that can be evolved
quickly may not be the same process as decomposition into fragments that
can be written quickly. The key step in the long-term future of evolving
software is tackling these decomposition/scaling problems. Unfortunately
they require rather more powerful machines so that real problems can be
evolved in reasonable times and the limitations of today's techniques can
be explored.
13.6
CONCLUSIONS
Evolving software is a technique that may alter the whole approach to

software writing. In the future the program's required function could be
specified using a formal 'fitness specification language', and then the
computer would evolve the program to solve the problem as specified. Such
programs are unlikely to have a form that is intelligible to human
programmers - indeed artificial life programs already produce code that
is extremely difficult to interpret. They may produce creative solutions to
problems that were not anticipated at the time of setting the specifications,
for, unlike automatic code generating techniques, they are creative. In the
event that errors occur or specifications change, the code could be fed back
in the simulator to evolve further to cope with the new problem or the change.
Large problems are unlikely to be tackled by a single piece of evolved code,
but rather the problem would be decomposed into smaller portions for which
suitable fitness function expressions could be written. The way a problem
is decomposed for an evolving system and for normal code writing could
be very different. Such evolved code is likely to be robust and error-tolerant
because the evolutionary simulation places a premium on such code to survive
the evolutionary process.
The year-on-year increase in the power of computers suggests that these
techniques could be available within a decade. Genetic algorithms already
form a useful tool in control optimization problems, and perhaps genetic
programming will be enhancing the programmers' tool kit before the end
244
EVOLVING SOFTWARE
of the century, allowing the human to concentrate on the program's

architecture and specification, whilst the computer designs and writes the
code itself.
REFERENCES
I.
Darwin C R: 'On the origin of species', John Murray, London (1859).
2.
Stryer L: 'Biochemistry', 3rd edition, W H Freeman, New York (1988).
3.
Conrad M: 'Structuring adaptive surfaces for effective evolution', Proc 2nd Ann
Conf Evol Prog, p I (1993).
4.
Holland J H: 'Adaptation in natural and artificial systems', University of

Michigan Press, Ann Arbor (1975).
5.
Holland J H: 'Genetic algorithms', Sci Am, p 44 (1992).
6.
Koza J R: 'Genetic programming', MIT Press, Cambridge (1992).
7.
Ray T: 'An approach to the synthesis of life', Artificial Life II, p 371 (1992).
8.
Skipper J: 'The computer zoo - evolution in a box', Proc 1st Eur Conf on Art
Life, p 355 (1992).
9.
Jones A J: 'Genetic algorithms and their applications to the design of neural

networks', Neural Comput & Applic, -.1, pp 32-45 (1993).
10. Harvey I: 'Species adaptation genetic algorithms: a basis for a continuing SAGA',
Proc 1st Eur Conf on Art Life, p 346 (1992).
II. Goldberg D E, Deb K and Korb B: 'An investigation of messy genetic algorithms',
Technical Report TCGA-90005, TCGA, University of Alabama (1990).
12. Tackett W A and Carmi A: 'SGPC: simple genetic programming in C', University
of Southern California, Dept of EE Systems and Hughes Missile Systems Co.
13. Tackett W A: 'Genetic programming for feature discovery and image
discrimination', Proc of the Fifth International Conference on Genetic
Algorithms, p 303 (1993).
14. Holland J H: 'Escaping brittleness: the possibilities of general purpose learning
algorithms applied to parallel rule-based systems', in Michalski R S, Carbonell
J G and Mitchell T M (Eds): 'Machine Learning: An Artificial Intelligence
Approach, Vol II', Morgan Kaufmann, Los Altos, pp 593-623 (1986).
15. Smith S F: 'Flexible learning of problem solving heuristics through adaptive
search', Proc 8th Int Conf on Art Int (1983).
16. Hillis W D: 'Co-evolving parasites improve simulated evolution as an optimisation
procedure', Artificial Life II, p 313 (1992).
14
SOFTWARE AGENTS FOR

CONTROL
S Appleby and S Steward
14.1
INTRODUCTION
The control of a distributed system, such as a telecommunications network,

presents unique problems to which traditional programming languages and
operating systems are not well suited. The problem centres around the fact
that the distributed system is a single resource which needs to be managed
as such. Typically a distributed system may either be controlled by a single
central controller, or by multiple distributed controllers. Systems with central
control suffer from poor scaling with system size due to the increase in
communication and processing and are potentially vulnerable to controller
failure. An additional problem is that the system may be administered by
multiple authorities, making a single controller unacceptable.
In distributed control, different controlling processes execute on different
processors. If the system were to be managed as a single entity, then the
processes would need to co-ordinate their activities. This is not at all
straightforward to achieve. Distributed processes need to communicate with
one another in order to identify that an event requiring controlling action
has occurred. Deciding what controlling action to take and implementing
that action will be highly dependent on the processes' ability to co-operate.
246 SOFTWARE AGENTS
This chapter introduces the concept of mobile agents as an alternative

method of structuring software to control distributed systems. It is argued
that the use of mobile agents, in conjunction with an appropriate set of
programming guidelines, will result in distributed control software which:
is intrinsically robust to system and program failures;
is intuitive to write in a well-structured manner;
is self-regulating;
requires little direct co-operation between processes.
A mobile agent may be considered to be a natural extension of the object

oriented programming philosophy to include features which are suited to
distributed control [1-3]. The extension is that a collection of objects may
be grouped together into an entity called an agent and that this agent should
have the power to move from node to node around the network used to
control the distributed system.
In general, as programming languages develop, they allow the program
as entered by the human programmer to be abstracted from the details of
the target system. One of the benefits of the right kind of abstraction is that
it allows the programmer to structure programs in a way which makes them
easier to write, maintain and reuse. A well-structured program is made from
units that are easy to visualize and manipulate mentally. The concept of a
mobile agent is proposed as a way of taking the abstraction one step further
than in current object oriented programming, to make it suitable for
structuring distributed control programs.
14.2
RULES FOR ROBUSTNESS
If a programming discipline is imposed on the way that mobile agents are

used, then well-structured code can be achieved that is intrinsically robust
to the failure of an agent or the failure of a component in the distributed
system. In particular there are three properties that agents should have in
order to be intrinsically robust:
there should be no direct inter-agent communication;
the agents should be present in reasonably large numbers;
the agents should be able to dynamically alter their task allocations and
number.
RULES FOR ROBUSTNESS
247
If there is no direct inter-agent communication an agent would not send

a message whose destination is a specific instance of an agent. Nor should
an agent rely on receiving a message from a specific instance of an agent.
This gives mobile agents autonomy and independence which is a very
important feature for robustness.
There must, however, be some mechanism by which the agents can pass
the results of their activities on to other agents. The way that the agents should
do this is through modification of the state of the system that they control
or by leaving data in the nodes of the controlling network. For network
control, one type of agent could determine the cost of a resource based on,
for example, the pattern of usage of the network. A second type of agent
would use the costs to find the minimum cost-routeing table. In this way
one class of agent is passing on a collective result to another class of agent,
but no specific instance of an agent communicates with another specific
instance of an agent. This kind of communication is only possible if the agents
are mobile.
The second property that the agents should have is a presence in large
enough numbers so that no single agent carries a significant proportion of
the control responsibility. If this is coupled with a mechanism for dynamically
assigning tasks to the agents then the ensemble of agents will be intrinsically
robust to the failure of any individual agent. Of course, the failure of an
agent must cause some degradation to the system but it would be gradual
rather than catastrophic. Gradual degradation of performance is one of the
most significant advantages of using mobile agents.
The third property implies that the number of agents should be
proportional to the workload and that failed agents should be detected and
replaced. A failed agent should be detected by the ensemble of agents which
could compensate for the failure by increasing the number of agents to the
level before the failure. To do this the ensemble of agents needs to have the
ability to:
detect that an agent has been lost;
produce a replacement agent once the loss has been detected;
organise itself in a way that redistributes the workload.
The mechanism which is proposed as a means of achieving this is to

include a time stamp on the data elements that are modified by the agents
as part of their normal function when they visit a node. This is a very simple
device but has tremendous potential for allowing the ensemble of agents to
self-organize.
248 SOFTWARE AGENTS
So far it has been argued that the use of mobile agents in conjunction
with indirect inter-agent communication can lead to intrinsically robust, wellstructured code. This potential benefit remains academic until it can be
demonstrated that such agents can be programmed to carry out useful tasks
in a way which includes this benefit.
In an extreme case a single mobile agent can do any task that a central
controller could do. The mobile agent has access to the same information,
since it can move to any part of the system and read any data that would
be accessible to a central controller. It can then be said that there is no
restriction to the tasks that non-intercommunicating mobile agents can do
compared to a central controller. This means that the issue is not whether
a mobile agent can carry out useful tasks but whether mobile agents can be
used to carry out tasks in a way which offers benefits over central or fully
distributed control.
14.3
PROGRAMMING A MOBILE AGENT
In a complex distributed control system many different kinds of control will

be required. At one extreme, software will be used to carry out highly
algorithmic control. At the other extreme, human beings will be used to
provide very subjective control, perhaps based on general guidelines. Within
this spectrum there is a requirement for a control system which is a mixture
of heuristic and algorithmic control. This is the intended application area
of mobile agents.
Traditionally this area of application is in the domain of distributed
artificial intelligence (DAI). DAI techniques generally employ agents which
co-operate (and sometimes compete) in order that the ensemble of agents
works towards a common goal [4]. Co-operation between these types of
agent is typically achieved through a fairly complex process of inter-agent
communication which results in agents being able to convey relevant aspects
of their internal states and their world views to one another.
As has already been discussed, there are good reasons for requiring that
there be little or no direct inter-agent communication. Therefore inspiration
is not taken from the field of distributed artificial intelligence, but, instead,
a behavioural programming method is preferred. This kind of programming
has been used with some success in the field of robot control as an alternative
to symbolic artificial intelligence. One example of a behavioural programming
method is the Subsumption Architecture [5].
TESTING MOBILE AGENTS
249
Designing a control system for a robot using the Subsumption Architecture

involves choosing sets of task-achieving behaviours. These sets of behaviours
are called competences. Competences are arranged in levels so that any level
of competence is dependent on the correct functioning of lower levels and
independent of higher levels. Higher competence levels will generally have
priority over lower levels in the sense that they can suspend lower levels of
behaviour when they become active. Such systems can display apparently
complex, intelligent behaviour. It is argued that such behaviour appears not
as the action of a complex, intelligent control system, but as the interaction
of a simple control system with a complex environment.
A Subsumption Architecture agent would identify patterns in the system
being controlled and implement the controlling actions appropriate for those
patterns. An ensemble that consisted of different species of agent could
function by each agent being designed to recognize and respond to a different
set of patterns.
14.4
TESTING MOBILE AGENTS
Exhaustive testing of a complex program is not generally feasible - there

are too many possible permutations of input and internal states. As a result,
a mixture of partial testing and analysis of the program has to be relied upon.
The degree to which a program can be tested depends on the number of
internal and input states. Although the potential number of input states to
an agent could be very large, an agent based on the Subsumption Architecture
would classify these states into a relatively small number of categories. This
effective reduction in the number of input states would make individual agents
straightforward to test and analyse.
The method of separating competences in the Subsumption Architecture
means that they can be tested in an incremental manner. An agent would
be developed by adding competences that make use of existing competences.
At each stage the agent should be able to carry out the tasks for which the
new competences were designed. The ease with which the Subsumption
Architecture agents can be decomposed will significantly facilitate debugging.
The dynamics of an ensemble of agents may be very complex in a way
that is difficult to predict from analysis and testing of a single agent. Some
work has been done to develop an algebra [6] which can be used to analyse
distributed systems. Currently this algebra is only applicable to synchronous
systems although the unit of time can be made arbitrarily small to simulate
asynchronous processes.
250 SOFTWARE AGENTS
14.5
AN EXAMPLE APPLICATION
An application has been devised which is capable of showing an ensemble

of mobile agents controlling congestion in a circuit-switched telecommunications network, and which illustrates the points discussed above. In this
application there are two types of mobile agent which provide different layers
of control in the system.
Each node in the network is represented by a routeing table storing the
neighbouring node to which traffic should be routed in order for that traffic
to reach any particular destination node. Agents control congestion by making
alterations to these routeing tables in order to route new traffic away from
congested nodes.
Traffic flow in the network is simulated by a traffic generator module.
A traffic profile determines the source and destination nodes, the start time,
duration and bandwidth for calls which will be added to the network. To
place calls on the network the traffic generator first adds the call's bandwidth
to the source node, thus increasing its activity. The traffic generator checks
the routeing table at this node to find the next node en route to the destination
node. The call is then added to this node. This process continues along the
route until the call is finally added to the destination node.
A user interface allows any network and associated traffic profile to be
tested together and the results displayed. Traffic is shown in the network
by changing the nodes' colour depending on their utilization (node
activity Inode capacity).
14.5.1
The load management agent
The load management agent (load agent) provides the lowest level of control
in the system. It is designed to manage the network by distributing traffic
evenly. When a load agent is launched on a node in the network it updates
the routes from that node to all other nodes by modifying appropriate
routeing tables throughout the network. An agent primarily acts on behalf
of the node where it was launched but can have a beneficial effect elsewhere
in the network - when a routeing table at a node is changed it will re-route
traffic from all nodes via the new route, not just that sourced from the agent's
own node. When an agent has completed a single update for its node it
terminates.
The algorithm for constructing routes is based on a well-known optimal
route algorithm by Dijkstra [7]. In this example the algorithm is used to
EXAMPLE APPLICAnON
251
find the path from the source node (s-node) to each other node in the network
that has the maximum amount of spare capacity available. The spare capacity
of a route is defined to be equal to the smallest spare capacity of any
component on the route.
The internal state of the agent consists of a list of records each with the
following fields:
node identifier;
largest currently known spare capacity to the node;
the neighbour of the above node that contributed to the calculation of

the largest known spare capacity;
permanent or temporary flag.
There is one record for each node that the agent knows about. Initially,
the agent only knows that the s-node exists, so there is only one record. The
s-node record is permanent (as indicated by the flag) and its spare capacity
is set to infinity. The contributing neighbour field is not used for the s-node.
The agent interrogates the s-node to find out who its neighbours are and
what the spare capacity to each neighbour is. A temporary record is then
created for each neighbour which indicates the s-node as the neighbour that
contributed to the calculation of the spare capacity to that neighbour.
The temporary record with the largest spare capacity is made permanent.
This will be called the newly promoted node. Figure 14.1 shows the way that
the algorithm 'grows' the network of permanently labelled nodes by
promoting temporarily labelled nodes at the periphery. At this point the agent
knows the route from the s-node to the newly promoted node. In this case
it is rather trivial; traffic for the newly promoted node should simply be passed
directly to that node from the s-node.
The agent now makes its first move to the newly promoted node so that
this node becomes the current node. The agent creates new temporary labels
for each of those neighbours of the current node that it does not already
have records for. If the agent already has a record for a neighbour it will
check to see if the route via the current node has a higher spare capacity
than that indicated in the record. If this is the case then the record will be
altered to indicate the new, higher capacity and show the contributing
neighbour as the current node. Once again the temporary record with the
largest spare capacity is made permanent. It is now known that the route
to the newly promoted node should be the same as the route to the neighbour
that contributed to the calculation of the spare capacity to the newly promoted
252
SOFTWARE AGENTS
II
Fig. 14.1
s-node
permanently labelled node
temporarily labelled node
Dijkstra's algorithm 'growing' the network of labelled nodes.
node. To update the routeing tables accordingly, the agent needs to visit every
node between the contributing neighbour and the s-node.
The order in which the agent visits the nodes to update the routeing tables
is very important. If the agent passed from the s-node to the newly promoted
node then there is a possibility that, at some stage, the routeing tables would
contain invalid routes that were either circular, or terminated before reaching
the destination. If, instead, the agent moves from the destination back to
the s-node then no route which was previously valid would be invalidated
by the changes that the agent makes to the routeing tables.
To see how updating a route from destination to source does not leave
an invalid route, consider the example shown in Fig. 14.2. Suppose an agent
is in the process of modifying a path which was originally valid. Now trace
the path from the source node to the destination node with the routeing tables
in this intermediate state. At some point on the path a node may be
encountered whose routeing table has been modified by the agent. If no such
EXAMPLE APPLICAnON
-"
,,
,,
,,
,,
,",
,,
,,
253
,
,,
current agent location
,,
,,
" . starting node
- - - - - old route
- - new route being created
by agent
Fig. 14.2
Intermediate routes need to be valid.
node is found then, by assumption, the route to the destination node is valid.
If a modified node is encountered, then the path segment between the source
node and the first modified node must be valid (since the path has been traced
that far). The path segment between the first modified node and the
destination node must be valid since it is being written by the agent. This
argument is independent of the choice of source node and so will be valid
for all source nodes in the network.
14.5.2
The parent agent
Parent agents provide the second level of control in the system. These agents
are responsible for managing the population level and task allocation of load
agents. The parent agent detects nodes with high utilization values and then
launches load agents on nodes that are sourcing the most traffic. In this way
the parent agent provides a link between the nodes that are most likely to
become overloaded and the nodes that are most likely to cause the overload.
A parent agent steps randomly around the network gathering information
for its internal records from each node that it visits. From each node the
agent records the traffic sourcing rate and current utilization. Traffic sourcing
rate values are used to build a 'sourcing-rate history' for each node in the
network that the agent knows about. Utilization values are used to form a
'utilization history', which is an average of the utilization values of the last
254
SOFTWARE AGENTS
n nodes that the agent has visited. Using this information, a parent agent
can determine the level of management required in the network at any time
by analysing how evenly traffic is distributed. The parent agent uses techniques similar to those used in economics to analyse income distributions [8].
The parent agent updates an internal record with the following fields for
each node visited in the network:
node identifier;
the node's rank in terms of traffic sourcing rate;
the sourcing-rate history - a rolling average of the node's sourcing rate.

Also, the agent stores the following general information within itself:
the number of nodes encountered so far;
the utilization history value - a rolling average of node utilizations.
The agent has two main modes of behaviour - a 'gather-information'

mode and a 'look-for-overload' mode. In the first mode the agent takes
random steps about the network updating its information about the nodes
and the network. In the second mode the agent takes a small number of steps
to look for nodes that have utilizations higher than its utilization history value.
In the ideal case, all nodes should be loaded to the same utilization level and
therefore no node would have a value significantly higher than the agent's
utilization history value. If the agent finds a node with a utilization
significantly higher than this value, it can assume that traffic management
is needed in the network at this location.
Comparing a sample of node utilizations with the agent's utilization
history value has a desirable effect when traffic builds up or drops in the
network. As this value is an average of values the agent has stored, it will
track changes in the network but with a time lag. Therefore, when traffic
is building up in the network, a parent agent is more likely to find nodes
with higher utilizations than its stored value, so load agents are launched
to manage the surge of new traffic. When traffic is removed from the network
the converse is true and the utilization history is likely to be higher than the
utilizations found by the agent; so as traffic is removed, no load agents are
added.
When nodes with utilizations greater than the agent's utilization history
are found in the 'look-for-overload' mode, the agent enters a load agent
launch cycle. The nodes are ranked according to their sourcing rates. The
agent moves to the node with the highest sourcing rate in its ranking table
and tries to launch a load agent. If it is not desirable to launch a load agent
EXAMPLE APPLICAnON
255
on this node (perhaps there are already too many processes on this node)
the parent agent moves to the next node in its ranking table until it successfully
launches an agent. The agent will move to the next ranked node if:
it estimates that the node still has an agent working for it;
it believes that an agent has crashed - in this case the parent agent will
clear the records of the crashed agent, thus allowing a load agent to be
launched on this node next time.
Parent agents must be able to detect load agents that have crashed and
replace them quickly. The parent agent does this by using load agent timestamps to calculate average load agent lifetimes (the time taken for the load
agent to do a route update for an s-node and then terminate). When a load
agent is launched it registers its start time at the s-node and when it finishes
it posts its start time again in another field at the s-node. By comparing these
two fields a parent agent can determine if the last load agent to be launched
has finished successfully. If this is the case, it will know that it can launch
another load agent. If a load agent crashes or is delayed due to an unusually
heavy work-load, the parent agent will become aware when the time elapsed
from the last posted start time becomes larger than the average life-time.
To cope with variations in the lifetimes, the parent agent has a safety margin
to allow agents to overrun under heavy traffic conditions. When this time
has elapsed the parent agent will assume the load agent has crashed and will
clear its records from the s-node to allow another load agent to be launched
in future. If that scheme fails for any reason such as the system suddenly
slowing down and hence making load agent lifetimes much longer, it will
not matter, as a node can have a number of load agents working for it at
any time. The agent lifetime average is calculated so that the last lifetime
is weighted heavily. This is essentially a last lifetime value with a small
historical content.
It is crucial that parent agents regulate themselves from information
gathered from the environment in order that an ensemble of parent agents
can self-organize with no direct inter-agent communication. The parent agent
must know not only that there is a need for more load agents, but also when
there are enough load agents and, therefore, adding more would not be useful.
The parent agent will stop the launch cycle and return to the 'gatherinformation' mode if:
it successfully launches an agent;
the agent's ranking table is empty;
256 SOFTWARE AGENTS
the traffic sourced by the ranked node will not significantly increase load
on the network;
there are enough load agents already present in the network.
The rules above represent a minimum set in order to allow a group of

parent agents to control the population of load agents successfully. There
are potentially many more rules the parent agent could use. The first rule
stops the parent agent going on to launch more agents if it successfully
launches an agent so that the load agent population will increase gradually.
The next rule simply prevents load agents being launched when there are no
traffic sourcing nodes for them to manage. The third rule allows the parent
agent to judge whether it is worthwhile to manage a particular traffic sourcing
node. The agent considers a node's traffic to be significant if all the
traffic, when added to the highest utilized node in the agent's records, causes
that node to overload.
Various improvements to this scheme are possible. For instance, to ensure
that load agents are introduced quickly when traffic is building up in the
network and the node utilization values are low, the traffic from the node
is weighted in order to make it more significant. This weighting is reduced
when the parent agent knows that load agents are present in the network
(i.e. as the node moves down the agent's ranking table) and hence it is
increasingly likely to give up launching load agents as the load agent
population increases. The final rule implies an upper limit on the number
of load agents in the network. This is predetermined by a maximum
proportion of nodes in the network that can be s-nodes. Using this value
the parent agent can calculate how many load agents are needed. If this is
m, then the agent will exit the launch cycle after it has tried to launch on
the mth node in its ranking table. Of course, the maximum number of load
agents will be able to exceed the ideal value as different parent agents may
have different nodes in their ranking tables and one s-node may have more
than one agent working for it.
Many copies of the parent agent can be present in the network at once.
This is essential for robustness. Each parent agent will gather network data
and make its own decisions about where to launch load agents. A group of
parent agents, however, will be able to self-organize via the load agent start
time and lifetime time-stamps. Each of the parent agents present in the
network is able to tell whether a load agent is running, so multiple launches
at a single node can be avoided. When parent agents decide to launch a
load agent at a node, they check that they have not been beaten by another
parent agent. If this happens they give up on the node and move down the
ranking table.
EXAMPLE APPLICATION
14.5.3
257
Parent agent population management
The population of parent agents must be managed in order to replace crashed

agents and to build up the level of parent agents when the network is
initialized. This is done by static processes called 'parent monitors'. Parent
monitors reside on nodes in the network and these nodes have fields for the
parent agent and parent monitor to swap information, rather like a
blackboard messaging system. Parent monitors are initialized with a value
that determines the number of parent agents that they will try to maintain
in the network. When a parent agent visits a node with a parent monitor
present, it will register itself by posting the following data:
its process ID number;
its start time;
the current time.
The entire control system is started by simply launching the parent

monitors. A parent monitor checks the parent agents' fields and if the number
of agents is less than the desired level, the parent monitor will launch a parent
agent on its own node. The parent agent will immediately register itself at
that node. If there are many parent monitors then there is a chance that their
records may differ. This may result in extra parent agents being launched
and so parent agents need the ability to terminate themselves if there are too
many present in the network. If a parent agent finds that there are too many
parent agents, then it will terminate only if it is the youngest agent, i.e. it
has the latest start time. The youngest agent has the least historical data stored
and is therefore the least useful. The parent monitor uses a similar technique
to that described before, based on the average agent lifetime, to decide if
a parent agent has crashed. Every time the parent monitor checks the data
fields it will calculate the time elapsed since the parent agent's last visit. An
average of these times, again with some safety margin, is used to determine
if an agent has crashed. If a parent monitor decides an agent has crashed
it will remove that parent agent's records which will allow another parent
agent to be launched.
This system is very robust and can handle many process crashes and other
errors. There can be many parent monitors running at a node and many nodes
with parent monitors running on them. Therefore, if a parent monitor crashes,
another at that node will carryon. Also, if a parent monitor's node is isolated
from the rest of the network, parent monitors at other nodes will provide
parent agent population control. If the parent monitor's records become
corrupted and it repeatedly launches parent agents, they will terminate as
258
SOFTWARE AGENTS
soon as they read another parent monitor's records. Crashed parent agents
will be replaced after only slightly longer than their typical cycle time;
meanwhile load agent management will not suffer as other parent agents can
cope equally as well. The preferred number of parent agents can be changed
when the system is running and the population will adjust accordingly.
14.6
TESTING THE SYSTEM
14.6.1
The network
To test the agents' ability to spread traffic evenly, a network was needed
that had a number of alternative routes from node to node. This network
must not be fully interconnected as there would always be a direct link to
all other nodes and the network would be limited only by the capacity of
its links. Also the network cannot be a tree network as there will be no
routeing decisions for the agents to make since there is only ever one route
from node to node. A proposed UK network was chosen as the test network
(see Fig. 14.3). This network is sufficiently interconnected to give the agents
routeing decisions to make and it also has the benefit of being a realistic
network topology.
A network description file was created to set the positions of the 30 nodes
in the network and the links connecting them. Links are defined as
unidirectional and so nodes were connected with a link in each direction to
represent a circuit. In this example all nodes in the network were given the
same capacity.
14.6.2
The traffic profile
Traffic in the network was represented by blocks of calls that were 2.5010
of each node's total capacity. The source and destination nodes were chosen
so that there was a range of call distances, where the distance is the number
of nodes through which a call travels. As the initialized routes for the network
were known, it was possible to deliberately overload particular nodes by
organizing many calls to cross at one node. This was done for three nodes
in the network.
The call blocks were scheduled to start at regular staggered intervals and
the durations were arranged so that they all finished approximately together.
The traffic generator module is able to apply a traffic profile to the network
at any specified rate.
TESTING THE SYSTEM
Fig. 14.3
14.6.3
259
The network used to test the agents.
Measuring agent performance
The simplest experiment to evaluate the performance of a group of agents

in a network is to compare the result of the same traffic profile with and
without agents. This will show the advantage of the agents over a network
with a fixed routeing table network.
The network was initialized with default routes. When the traffic profile
was applied to the network three nodes overloaded, as expected. The peak
traffic level in the network reached 115 0/0. This gave an example load
260 SOFTWARE AGENTS
distribution as shown in Fig. 14.4(a). It should be noted that nine nodes are
completely unused.
To test the system with agents present, the routes were initialized as before,
and a parent monitor was started on a node. The parent monitor then
launched the required number of parent agents. For a network of this size,
two parent agents are sufficient and this value had been pre-set. The traffic
generator was then started with the same traffic profile as before. As there
is a random element in the behaviour of the system, the agents were tested
with the traffic profile a number of times.
As soon as the parent agents found traffic in the network, they calculated
where load agents would be needed. As the amount of traffic in the network
increased, so did the number of load agents. As traffic was added, the
maximum spare capacity routes could differ from the initial default routes.
The load agents changed routes in response to this and moved call blocks
away from congested nodes. In the tests, the number of load agents typically
peaked at about seven. When the traffic stabilized and then was removed
from the network no more load agents were launched. Figure 14.4(b) shows
a typical result, with the network at the point of peak traffic.
0> 100%
(a)
50-75%
8
0
0 -25 %
(!)
not used
25 - 50%
(b)
without agents
maximum node utilisation 115 DAI

three overloaded nodes
nine unused nodes
Fig. 14.4
75-100%
with agents
maximum node utilisation 85%

no overloaded nodes
no unused nodes
Network traffic distribution with and without mobile agents.
CONTROL BY MOBILE AGENTS
261
Comparison of the two results in Fig. 14.4 shows the benefit of the agents.
The agent-managed network had a maximum node utilization of 85070 - no
nodes are over-loaded. This was achieved as the agents spread the traffic
more evenly to use all the nodes, no nodes remaining unused.
14.7
THE NATURE OF THE CONTROL PROVIDED BY MOBILE

AGENTS
The use of mobile agents combines some of the benefits of central control
and distributed control. At each stage of the algorithm the load agent learns
about the network incrementally, by interrogating a node to discover its status
and the status of the links to its neighbours; no central store of information
is required although non-local information is held within the agents.
Individual agents can fail whilst the overall control system continues to
function. Dijkstra's algorithm is very difficult to implement in a fully
distributed manner and yet it has been implemented here in a very simple
way that retains a number of the benefits of distributed control.
The agents exhibit both algorithmic and heuristic behaviours. They exhibit
algorithmic behaviour when calculating routeing tables but heuristic behaviour
when managing their workload by choosing whether to launch agents,
terminate themselves or move to another node. This is a powerful way of
introducing a degree of apparent intelligence into an otherwise purely
algorithmic system.
Distributed control has advantages and disadvantages compared to central
control of a distributed resource. Primarily, distributed control can be more
robust and faster reacting than central control. The robustness occurs since
there is no single controller whose failure would cause the whole control
system to fail. On the other hand, central control is potentially able to produce
a result that is closer to optimum since a central controller will have global
information available to it.
Mobile agents appear to be distributed in the sense that there is no central
controller. However, the agents have a view of the distributed system which
is not local and, indeed, may be global since, in principle, a mobile agent
could visit every part of the distributed system to gather data.
The concept of mobile agents seems to embody the advantages of
distributed control in that they are very robust and yet they can have a view
of the system which includes many nodes.
As well as providing some of the benefits of central control, the mobile
agents can inherit some of the disadvantages of central control. In central
control, decisions may be based on data that is quite old, since it will take
time to poll all parts of the system. Mobile agents also obtain the data that
262 SOFTWARE AGENTS
they require in a serial manner and so could also suffer from the problem
of carrying out actions based on old data.
The age of the data that an agent uses can be controlled by restricting
the number of nodes that an agent visits to gather data before it implements
a control action. If the number of nodes visited by an agent is too small,
the agent may suffer from the problem that its control is far from globally
optimal. However, the use of mobile agents in this way allows one to obtain
the desired balance between the characteristics of distributed control and those
of central control.
14.8
CONCLUSIONS
The main conclusion from this work is that the use of mobile agents offers
a radically different way to control a distributed system such as a
communications network. The use of mobile agents extends the philosophy
behind object oriented programming to make it more amenable to
programming distributed control algorithms.
The use of mobile agents provides a means of taking advantage of work
in robotics where a novel architecture, called the Subsumption Architecture,
has been shown to be superior to traditional artificial intelligence in terms
of the complexity of the environment it can cope with and the simplicity of
the resulting control structure.
Distributed resource controllers based on mobile agents would have quite
different characteristics from those based on a more traditional approach
to control problems. There would be an automatic increase in robustness,
the penalty for which would be the inability to formally predict the
performance of the resource being controlled by the mobile agents. This may
not be a disadvantage for certain aspects of control since in most control
problems some kind of heuristic algorithm will be needed (even if it is usually
provided by a human controller).
Mobile agents would be of most benefit in the kinds of application where
robustness and the ability to self-regulate are more important than the speed
of response. They are likely to be suitable for carrying out background
optimization tasks by adjusting parameters to keep the system within
operating limits. This attribute would allow agent control systems to carry
out some of the mundane tasks currently undertaken by human supervisors.
REFERENCES
263
There are still a large number of questions to be answered regarding the

use of mobile agents. Experience will provide a better feeling for the kinds
of problem that would benefit from their use. The potential benefits of using
mobile agents are sufficiently great that they certainly warrant further
investigation.
REFERENCES
1.
Tenma T, Yakote Y and Tokoro M: 'Implementing persistent objects in the

Apertos operating system', Internal Report, Sony Computer Science Laboratory
(1992).
2.
Casais E: 'An object oriented system implementing KNOs', Sigois Bulletin, 9,

No 2-3, pp 284-290 (1988).
-
3.
Stephani J: 'Spiders, structure and communications shceme for mobile objects',

Decentralized Systems, Proceedings of the IFIP WG 10.3 Working Conference,
pp 207-218 (1989).
4.
Chaib-Draa B, Moulin B, Mandiau Rand Millot P: 'Trends in distributed

artificial intelligence', Artificial Intelligence Review, -i, pp 35-66 (1992).
5.
Brooks R: 'A robust layered control system architecture', IEEE Journal of Robot
Automation, RA-2, No 1 (1986).
6.
Tofts C: 'Describing social insect behaviour using process algebra', University

College, Swansea, Report (1992).
7.
Dijkstra E W: 'A note on two problems in connexion with graphs', Numerische

Math, -.L (1959).
8.
Arnold B C: 'Pareto Distributions', International Co-operative Publishing House,

Maryland, USA (1983).
15
EVOLUTION OF STRATEGIES
S Olafsson
15.1
INTRODUCTION
The applications of conventional game theory to economics and decision

theory have not been as successful as some of the field's initiators had hoped
for. It was indeed noted by von Neuman and Morgenstern [1] that
some of game theory's most serious limitations were due to its lacking
dynamics. For this reason economists and strategists have watched with
some interest the emergence of dynamic evolutionary game theory [2,3].
The main subject of evolutionary game theory is the study of the emergence
of strong strategies in a competitive environment. In his seminal work [4]
Maynard Smith has demonstrated how evolutionary game theory can be
applied to the study of animal contests in their struggle for territory and
mating rights.
Modern market players, in particular those in the fields of information
technology and telecommunications, face two major strategic problems. These
are the regulator and the competition, both national and international. In
a sense, playing the regulator amounts to the situation where a competitor
struggles against the hazards of the environment. Playing the competition
on the other hand is most naturally analysed in terms of pairwise contests
between opponents in the struggle for markets and customers (territorial
claims and dominance rights). As it turns out, these two different aspects
of the competitive game can be accommodated within a single structure [5] .
The regulator and the competition define an aggregate behaviour of
INTRODUCTION
265
competitors, against which a strategy has to be constructed. Put in a different

way, the regulator puts some (environmental) constraints on the strategies
available to the players of the markets.
Maynard Smith and Price [5] introduced the concept of the evolutionarily
stable strategy as an uninvadable strategy. They demonstrated that, in a
number of games, the strategy selection eventually evolved towards an
evolutionarily stable strategy. This concept has been immensely popular
amongst evolutionary biologists. It has also been successful at explaining some
evolutionarily stable conditions such as the sex ratio in many species [6].
It is likely that the concept could offer some explanation for an efficient owner
structure in different industries.
In spite of its interesting features evolutionary game theory is not without
its problems. Only two of these problems will be mentioned here: firstly,
what is the meaning of an evolutionarily stable strategy in competitive market
systems, and, secondly, what decision-making processes in human, market
or social conflicts are responsible for the strategy distribution moving towards
it? Or, in more basic terms, under what general conditions can one expect
market competitors to play the evolutionarily stable strategy?
Before this criticism is considered, one has to be aware that the concept
of equilibrium (evolutionarily stable or not) is, even in evolutionary theory,
a difficult one as evolution demonstrates a process of continuous change.
It does, nevertheless, supply the evolutionary biologist with valuable tools
for the study of animal-to-animal or animal-to-environment conflict
situations. Even though the system trajectories are generally far more difficult
to analyse than are the equilibria, they contain important information on
the dynamics of the learning process. Some studies on the learning process
leading to evolutinarily stable strategies have been undertaken [7] in recent
years. The learning process originates from the fact that the strategies are
selected and updated on the basis of the increase in fitness they bring to their
user.
The work described in this chapter originated from the author's attempts
to apply evolutionary game theory to the analysis of some dynamic aspects
of competitive markets [8]. The opportunities seem abundant, but the main
difficulty is the formulation of the problems and the mapping of some of
the biological concepts on to those of economy and market competition. Some
of these tasks are straightforward. For example, the struggle for dominance
and territorial rights can be identified with the struggle for markets and new
customers. Successful marketing is essentially a learning process, where a
number of different strategies are tried out. The evolution of the strategies
is determined by the success they bring their user.
The work is theoretical and develops the concepts needed for the intended
applications. Some concrete studies have already been undertaken.
266
Applications of dynamic game theory to task allocation on distributed processor systems have been discussed in Olafsson [9]. Also, the use of some
evolutionary models [10] has been applied to the analysis of the market
diffusion of different, and competing, network technologies [II]. Further
applications of the results will be presented elsewhere.
The chapter is organized as follows. Section 15.2 reviews some of the
basic concepts of dynamic evolutionary game theory (hereafter mostly called
dynamic game theory), as introduced by Maynard Smith [4], Taylor and
Jonker [12], and Zeeman [13]. Section 15.3 gives a formal definition of
the evolutionarily stable strategy [5] and discusses some of its general
properties. Here it is proved that the equilibrium states for the game can
be derived analytically, i.e. without simulating the game dynamics. In section
15.4 some stability properties of the equilibrium strategies are analysed. It
is shown how the analysis of a linear system reveals the number of pure
strategies which contribute to an evolutionarily stable strategy. Furthermore,
it is proved in this section that the fitness of an equilibrium strategy can be
evaluated in terms of the eigenvalue spectrum of the game's stability matrix.
In the final section a few examples are also discussed and results of analytical
calculations and simulations presented.
15.2
THE FUNDAMENTALS OF DYNAMIC GAMES
This section introduces the basic concepts of dynamic game theory. Let
S=(Sl>S2,.",Sn) define the finite set of strategies available to a population
of competitors. The vector P = (PI>P2,'" ,Pn) describes the probabilities with
which the strategies are used, i.e. Pi = P(Si) is the probability that a
competitor uses strategy Sj. The pay-offs associated with the various
strategies are presented in the form of a gain matrix G = Gij , I::; i,j::; n. The
precise meaning of the matrix elements is as follows: G jj is the pay-off to
an individual applying strategy Si against an individual using strategy Sj' The
components of the vector 1= (fl ,12,." ,In) present the fitness values assigned
to the various strategies. In general, the probabilities, the pay-offs and the
fitness values are functions of time. The triplet r = (G,s,p) will be taken to
define a game.
As a result of a contest between two opponents applying anyone of the
available strategies, their respective fitness values do, in general, change. For
the evaluation of the fitness a rule discussed by Maynard Smith [4] is
adopted. It gives the fitness of the strategies by the following expression,
I(p) =10 + Gp. The component.fi(p) =lo,i + (Gp)j gives the fitness of a player
using strategy Sj, when contesting a population using a strategy defined by
the probability distribution p = (PI>'" ,Pn)' Similarly, if a population of
competitors plays the available strategies with the probability distribution
FUNDAMENTALS OF DYNAMIC GAMES
p, then the mean fitness of that population is given by
267
I p =10 + p + Gp.
Generally, the mean fitness of the population serves, at any moment in time,
as a benchmark against which the fitness values of the pure strategies are
to be compared.
One would, in general, expect the dynamic to move the game towards
probability distributions which favour strategies with high fitness. In a
biological context, the interpretation of this fact can be twofold. Either the
present population modifies its strategy towards a probability distribution
which improves the average fitness of the community, or those members of
the population which are already applying the high fitness strategies are
rewarded by a higher number of descendants, which consequently inherit
the strategies of their parents. The effect of this is also that more individuals
will be playing the successful strategy. From a mathematical point of view
both interpretations are identical.
Many workers have considered it to be a disadvatage that evolutionary
game theory treats the growth of strategies as an asexual process. In fact,
this feature of the theory is a very desirable one from the market strategist's
point of view. Here, the success of a strategy has in general two very different
effects. Firstly, it implies a market expansion for its user, and, secondly, the
strategy multiplies in the sense that it gets used by any number of competitors
which become aware of its success. Both cases can easily be captured by the
elements of dynamic game theory [9], if the gain matrix elements are made
dependent on the strategy probability distribution.
This work considers mainly the following updating rule:
... (15.1)
where the dot means derivative with respect to time. The reasons for
considering this system of equations rather than those studied in Taylor and
Jonker [12] and Zeeman [13] have mainly to do with the context in which
this study arose, i.e. using dynamic game theory for the analysis of
competitive markets. Realistically, market-like systems are in general not
globally stable, as the position of the equilibria will depend on the system's
initial configuration. Furthermore, for most markets it is likely that a number
of gain matrix elements are negative, as one can expect losses when applying
some of the available strategies. Unlike in the case studied in Taylor and
Jonker [12] and Zeeman [13] the effects of these negative values cannot
be removed by just adding a constant vector to each column of the gain
matrix. They are real in the sense that they affect the attractor behaviour
of the dynamic system [8].
The main questions addressed in this chapter are the following.
268
Is there a single strategy or a combination of strategies which lead to

maximum fitness?
Does the system in equation (15.1) find these strategies (if they exist)?
Are maximum fitness strategies stable with respect to perturbations in

the probability distribution?
What is the connection between the stability of equilibrium states and

their fitness?
Simulations of the system in equation (15.1) reveal that it usually settles

in equilibrium states most of which are stable with respect to small
perturbations. The important question is whether the stable probability
distribution in which the system settles is one which leads to high (the highest)
fitness values or whether it is possible that the system gets trapped in stable
states giving rise to low fitness values. What would be the biological,
commercial or social implication of that type of state? It is demonstrated
that whereas evolutionarily stable strategies are attractors of the dynamics
of equation (15.1) the reverse is not, in general, true. It should be noted that
some systems do not settle in stationary point attractors at all, but rather
oscillate in a cyclic-type of attractor. How this comes about can be explained
in terms of the eigenvalues of the gain matrix. An example will be discussed
later.
15.3
STRONG STRATEGIES
Before tackling the problems related to the questions stated in the previous
section a more precise definition of what is meant by a winning strategy is
needed. Let II be the state space of all possible probability distributions, i.e:
n
II
[Pi~OI.E
Pi =
1=1
1]
Furthermore, p + Gq = fp(q) means the fitness of strategy p when it contests

strategy q.
Definition 15.1 - let p,qEll define two mixed strategies. The strategy p is
said to be stronger than q if for all rEll the following inequality holds:
STRONG STRATEGIES
269
One therefore expects a higher fitness increase by strategy P when

contesting any other, mixed or pure, strategy that can be constructed on II.
The special case where r is either P or q will be discussed later.
It has been shown [8] that the concept of strongest strategy has only
a limited usefulness. This is mainly due to the fact that the strongest strategy
in the game r == (G,p,s) is always a pure strategy. This has interesting
implications for the situation when every member of the population only plays
the strongest strategy all the time. In Olafsson [8] it was demonstrated that
if all the members of a population play only the strongest strategy, Sj, then
the average fitness of the population is given by the expression Ii == G ji .
Clearly, the strongest strategy only gives the population the highest possible
average fitness if:
Gjj == maxG ij
I :SJ:sn,
i.e. if the highest element in the row vector Gj==(Gjl> ... ,G jn ) is on the
diagonal of the gain matrix G. The following example demonstrates that some
games have no strongest strategy.
Example -
let the gain matrix be given by the expression:
Assume that pEIl is the strongest strategy in the game r == (G, p, s). Then
p+Gr>q+Gr, vrEII, which implies p+G>q+G. It is straightforward to
establish that this inequality leads to the two incompatible conditions
- PI> - ql and PI> q, where PI and ql are the first components of the
probability vectors p and q. This game has therefore no strategy which is
stronger than all other strategies.
As mentioned in the introduction Maynard Smith and Price [5]
introduced the concept of the evolutionarily stable strategy which is a highly
biologically motivated concept. The definition given here is taken from Taylor
and Jonker [12] and Zeeman [13]. It generalizes the definition given in
Maynard Smith [4].
Definition 15.2 - a strategy pEII is called an evolutionarily stable strategy
(ESS) if for all strategies qEII - [PJ one or the other of the two conditions
holds:
... (I5.2a)
270 EVOLUTION OF STRATEGIES
... (l5.2b)
The ESS gives a formalized definition of the best strategy in an
evolutionary context [4]. In particular the definition implies the ability to
resist the invasion of new strategies, possibly generated through mutations.
This point will be discussed later. Before discussing in what type of situation
one or the other condition, stated in the definition of the ESS, is satisfied,
it is necessary to make some statements about a system being in ESS. Here,
a simple proof of a theorem first proved by Bishop and Cannings [14] will
be given.
let p be an ESS. Then the fitness of p is equal to that of
all the pure sub-strategies Si contribution to p, i.e. fp = ii
Theorem 15.1 -
Proof - let W = [1:5 i:5 n IPi ~ OJ. The ESS is given by the linear combination p = E PiSi' Assume fp> ii, for some iE W. Then:
iEW
fp
E Pkfk
kEW
E Pkfk + Piii < E
kEW - fiJ
kEW - fiJ
Pkfk + pJp
... (15.3)
Pk
. Obviously qEII. If one defines q = E

qdk then
I-Pi
kEW-fiJ
it follows that fq > fp which contradicts the assumption that p is an ESS. By
assuming the inequality fp <ii for some iE W one will arrive at the same
contradiction again. The conclusion is therefore that if P is a mixed ESS,
then the following relations must hold:
Set qk =
--
ii = fj = f p ,
Vi,jE W
which proves the theorem.

One can easily show that, if p 0 is an ESS to the game
... (15.4)
r = (G, p, s), then:

... (15.5)
The result stated in Thereom 15.1 is intuitively clear. If the fitness of the
various contributing strategies was not equal, the probabilities would be
shifted so as to achieve that condition. A brief inspection of the equation
for the probability evolution makes this clear.
Maynard Smith [4] defined the ESS to be the ability of a population
to resist mutant strategies. A brief description of his original arguments and
how they relate to the formal definition in equation (15.2) will now be given.
Let pEII be an equilibrium state for a population, i.e p describes how the
STRONG STRATEGIES 271
average member of a population selects their strategy combination. If qEII

defines some other state the population can be in, one introduces
r=(1-)p+q with -O but ~O. r defines a state where most of the
population plays P but only a small fraction plays q. The important question
asked by Maynard Smith was: 'What is the fate of those individuals who
decide to play q?' A straightforward calculation gives:
fp(r) = (1- )fp(p) + fp(q)
... (15.6a)
fq(r) = (1- }fq(p) + fq(q)
... (15.6b)
If P defines a strategy which resists the mutation defined by q, Le. if P

is an ESS, then the following relation must be satisfied: fp(r) >fq(r). As -O
this inequality can be satisfied by the condition fp(p) > fq(p) which is just
the definition in equation (15.2a). If on the other hand Wq ~ Wp , then from
Theorem 15.1 one finds that fp(p) = fq(p) and P can only be an ESS if the
inequality fp(q) > fq(q) holds. This situation is described in the definition in
equation (15.2b).
Example - now a simple example of two strategies will be discussed in some
detail. For the sake of clarity, let the angular bracket ( ) indicate the mean
values in this example. The gain matrix is given by:
G=(-~ ~).
This is the so-called Hawk-Dove game which has been discussed in detail
by Maynard Smith [4] and others (see for example Zeeman [13]). Here it
will be demonstrated that with respect to the dynamic (equation (15.1 this
simple game has two equilibria, only one of which is an ESS. Assume that
the system has settled in an equilibrium state for which one writes
P = P1SI + P2S2. Assuming that both strategies contribute with a nonvanishing probability, Le. PI ~ 0 and P2 ~ 0, then fl = f2 = (1). These
conditions lead to a matrix equation of the form shown in equation (15.5).
The probabilities solving the fitness-constraint conditions can therefore be
found as solutions to this matrix equation. As the matrix equations can be
scaled by an arbitrary factor, one can write the linear system as Gp = h,
where /2 is a two-component unity vector, h = (1, l)T. The normalized
solutions to Gp = /2 will represent the equilibrium state p. They are, as one
would expect, PI =P2 = 112. Later, this result will be generalized to include
multi-strategy games. It should be noted that the fitness values associated
with this probability distribution are fl = f2 = 0.5 and the average fitness is
the same, (1) =0.5.
272
It can be proved that the equilibrium state p = (l12, 1/2) is stable

(see next section). Figure 15.1 demonstrates that this state is not globally
stable, i.e. only a subset of all possible initial conditions converges towards
p = (l12, 112). The following simple analysis shows how this comes about.
One calculates the expected fitness in the general state p = (PI ,pz) to be
(1) = 1 - 2pf. The zero position is given by PI,o = [172 . For all PI > PI,o one
finds that (f1(PI) - (f(PI) >)/(f(PI > 0, and therefore PI will increase until
it reaches the value PI = 1. A straightforward analysis reveals that, as long
as PI is initialized in the open interval 1=(0, JII2 ), PI will converge towards
the stable equilibrium p = (112,112). If on the other hand PI is initialized in
the interval J = ([f72, 1) the system will converge towards the stable state
q=(l,O). This is demonstrated in Fig. 15.1. The dynamic has therefore two
stable states p = (l12, 112) and q = (1,0) surrounded by the basins of attraction
1= (0, [f72) and J = ([f72, 1) respectively. It is straightforward to calculate
thatfp(p)=fq(p) = 112, but -112=fp(qfq(q)= -1 which establishes that
only the attractor p = (l12, 112) is an ESS.
Most games considered here do have ESS which are mixed. Nevertheless,
an ESS can be pure. One can prove [8] that if Sb 1 :$ k:$ n, is a pure ESS,
then the gain matrix elements in the kth row satisfy the condition
Gkk>GibVi~k.
.....
1.0
_-:-:::;;Ir;":'I
.. -=.~_.-.-
0.8
0..
ii
til
.c
e0..
0.6
0.4
0.3
400
0
time
Fig. 15.1
Simulation of the system
Pi
= (
(A )(J, - (f. All probability distributions
initialized in the interval (0, Jf72) evolve towards the attract or p = (1/2, 1/2). When the system
is initialized in the interval (jf7i) it converges towards the attractor p = (1,0).
STABILITY AND FITNESS OF AN EQUILIBRIUM STRATEGY
273
The following theorem shows how the strategy distribution can be

calculated from a set of linear equations defined by the gain matrix. For
proofs see Olafsson [8].
let P = (PI ,P2,'" ,Pn) be the positive solutions to the linear
Theorem 15.2 -
system:
.E
GijPj
JEW
Then qi
=Ilwl ' viE W, W= [I :5i:5nlpi>O]
~ defines
. I.J
Pj
.. , (15.7)
a set of equilibrium states for the game:
JEW
r = (G,p,s)
Theorem 15.3 demonstrates how all strategies of one particular game can
be found by extending the results of Theorem 15.2 to include all possible
sub-matrices of the gain matrix.
Theorem 15.3 -let G be a real n x n gain matrix. For every system of indices
(Giii;.. . .
l:5i l :5 ... :5ik :5n, k:5n denote by

i~) the sub-matrix of the gain
matrix G found by removing simultaneously the rows and the columns
numbered by (il> ... ,ik ). If the solutions to the system:
... (15.8)
. . .I.e.
are pOSItIve,
vector:
q(k)
= ( q(k),l>'"
P(k)i>
,
0 ,1=
' I , ... ,m=ran k(Gili2
,q(k),m ) , q(k),i
ik) ,
'1'2'k
P(k),i
-m--
t h en t h e norma I'Ize d
Tb '
represents an eqUl I num state
E P(k),j
for the game
15.4
r = (G,p,s).
j= I
This section discusses in some detail the connection between the equilibrium
states for rule (15.1) and the ESS. From previous analysis it is clear that ESS
defines an equilibrium, but it is not clear whether all equilibrium states also
define an ESS. Furthermore, it will be discussed whether and then under what
conditions the equilibrium states are unique. The following two Lemmas can
be proved by using elementary linear algebra.
Lemma 15.1 - let G be the n x n gain matrix for the game r = (G,p,s) where
every component of p has a non-zero value, then this equilibrium is unique
only if rank(G) = n. This does not exclude the game having a number of
different equilibria each one with less than n non-vanishing components.
From this it follows that if the equilibrium states with n non-vanishing
components are not ESS, then the game has no n component ESS. The
following Lemma states that the n component equilibrium states are in fact
ESSs.
Lemma 15.2 - let G be an n x n gain matrix and det(G) ;:0. Then any n
component stable equilibrium state of the game r = (G,p,s) is also an ESS.
As discussed, an ESS defines an equilibrium state for the dynamic
equations. It is important to understand the stability of the ESS. The precise
meaning of this statement is the following: 'Does a permutation in the
probability state vector pEII lead to a new equilibrium or does the new
(permutated) strategy lose and the system fall back to its previous
equilibrium?' The question has to be approached in a dynamic context.
Assume that Po = (Po I>Po 2,." ,Po n) is an equilibrium state for rule (15.1).
Linearising rule (15.1) in t'his stat~ gives an equation of the form q = t:. (p o)q
with:
n
t:.(Po) ij = Po,i ( G ij -
k=!
(Gjk + GA )PO,k)
... (15.9)
Using a well-known theorem from dynamic systems theory [15] the

following is true.
the equilibrium state qo = (qo,! ,qo,2"" ,qo.n) is stable if the
real parts of the eigenvalues of t:.(qo) are negative.
Theorem 15.4 -
The following theorem states some results which make it possible to

evaluate the fitness of an equilibrium state p in terms of the stability matrix
t:.(p).
let p be an equilibrium state for the game r = (G,p,s). Then

the average fitness of a population in this stage is given by the expression:
Theorem 15.5 -
fp(p) =
2:
[tr(A) - tr(t:.(p]
... (15.10)
where the components of the matrix

means the trace of the matrix.
are defined by
A ij = GijPj
275
and tr(A)
Proof - see Olafsson [8).

It is interesting that the fitness of a population in a state p can be expressed
in terms of the eigenvalues of the stability matrix t.(p). From theorem 15.5
the following can easily be proved.
Lemma 15.3 - let AJ,A2, ... ,An be the eigenvalues of the stability matrix
t.(p). Then the fitness of a population in a state p is given by the expression:
... (15. II)
A possible interpretation of this result is that adaptive strategies are more
importantly characterized by stability rather than optimality. Later on, it
will be demonstrated how a competitive system can evolve towards
equilibrium states with high fitness values but poor stability. They therefore
are not ESS.
This chapter will not discuss in general the case where the eigenvalue
spectrum of the stability matrix is complex, but only demonstrate by one
example how this leads to oscillations in both probability distribution and
fitness.
Example -
consider a system with the gain matrix:
231)
1 2 3
3 1 2
Applying theorem 15.3 one finds the equilibrium states qo=(I/3,l/3,l/3).

Inserting this into the stability matrix gives the following eigenvalues:
(AI ,A2,A3) =(- 2,O.58i, - O.58i). The two complex eigenvalues cause the
system to oscillate around the equilibrium state q O' This is demonstrated in
Fig. 15.2.
One can prove the following [8].
Theorem 15.6 -
w=
consider the game
ilai = max (Gp)k]

kE[I, ... ,n]
r = (G,p,s).
Define:
... (15.12)
276
0.8
0.7
0.6
N
a.
0.5
:.g
0.4
~
D
ea.
0.3
0.2
0.1
O'--_~
_ _...L.-_........L_ _- ' -_ _.L-_""""__...L.-_~
0.1
probability, P1
Fig. 15.2
23 I )
The oscillating solutions to a game with the gain matrix, G = ( I 2 3

3 I 2
then I(p) = E
pjSj
is an ESS if the real parts of the eigenvalues of
iEW
the matrix /i(p) are negative.
15.5
EXAMPLES
In this section the results from previous sections will be applied to some
concrete cases. It will be shown how the number of contributing strategies
can be derived from the analysis of linear systems of the type in equation
(15.5). Furthermore, it will be emphasized that it is possible for the evolving
system to arrive at non-stable equilibria with high fitness values. These states
are characterized by their dependence on the system's initial strategy state.
Example 1 - in this example the methods developed so far are used to analyse
the four-strategy game defined by the gain matrix:
EXAMPLES
277
The normalized solution to the equation Gp = 1(4) is P = (0.43, 0.35, 0.12,

0.10). P therefore defines an equilibrium state for rule (15.1). That this is
a stable equilibrium is clear by the eigenvalues to the stability matrix,

evaluated inp. They are A= (- 2.88, -1.7, -0.79, -0.13). Figures 15.3(a)
and (b) show the evolution of the four strategies for two different initial values
for the probabilities. Both initial states evolve towards the equilibrium state p.
Now consider the games defined by some sub-matrices of G. Let Ai ;
i = 1,2,3,4 be the four pay-off matrices given by the expressions:
AI=
A 3=
0 ~) (
C D;
5
-1
3
7
2
2
; A2=
A4 =
35)
-1
4
4
-1 2
3 1
-1
3
4
2 5
3 -1
73)
All except AI have positive solutions to the linear equation A i Pi=1(3).

They are, after normalization:
P2
P3
P4
(0.39, 0.12, 0.49);

(0.48, 0.28, 0.24);
(0.45, 0.41, 0.14).
These are the equilibrium states for the subgames defined by A 2 , A 3 and
A 4 . The eigenvalues of the stability matrix evaluated in these equilibrium
states are:
A2
( - 1.9, -2.5, -0.5);
A3
( -2.7, - 2.1, -0.3);
A4
(-2.9, -1.6, -1.0).
The equilibrium states PI> P2, P3 are therefore stable. Looking at these
vectors as defining strategies where one of the strategies has not been selected,
P2 would be taken to mean ih = (0.39, 0.0, 0.12, 0.49) when viewed as a
strategy within the initial game. Similarly one can define the following two
states ih = (0.48, 0.28, 0.0, 0.24) and P4 = (0.45, 0.41, 0.14, 0.0). All the
states P2,P3,P4define stable equilibrium states for the dynamics of the initial
4 x 4 game. For example, if the system is initialized as P2 = (ql ,0,Q3,Q4) it
eventually converges towards P2' The same is true for the other states P3 ,P-4'
278
0.6
~ 0.4
----------------
:0
--------------------
ell
.0
a.
.......
'" '"
............................................................
_._._._._._._._._._.-._._.
__ ._._._.-._._.-
0'--
--1-
--10.
.&--
200
.&--
400
700
.....
1000
time
(a)
0.6
l/J
:0
ell
.0
ea.
"","'---- -
--- ------- -----_. --- ---- ------------
/.-.-.-.-
0.1
/.......... .-.-.-.- .-.-.-._._ .-'-'-._.

.......
..-
O'--
--I-
200
...L..
-'-'-'-'-'-'-'-._._.-..
400
600
---'
800
1000
time
(b)
Fig. 15.3
Solution trajectories for the four-strategy game with the gain matrix in
Example I. The two different initial conditions lead to the same equilibrium strategy.
EXAMPLES
279
But, a minor perturbation in the vanishing component, ie. replacing it by

a small non-vanishing value, for each state pj:i = 2,3,4 will drive the system
towards the global equilibrium state of the 4 x 4 game, i.e. p =
(0.43, 0.35, 0.12, 0.10). This is demonstrated in Fig. 15.4.
--
0.5
0.4
~
;E
0.3
.,'-',
'./
,r'o
".
! '.
ii
""
~,
0.2
a..
.~
'
'',...(''
" "
,/
,/.
,, ,
0.1
", *" '

....
,,~-'-
........
'_
---------~------------_..-----
.-.-.-.-._.- '-'-.-
._ .... _...
.....................................................................
,,
.."
OL...._---L_
_......L_ _...L._ _....-_---JL.-_--L_ _- L - _
200
400
600
800
1200
1400
time
P2
Fig. 15.4
The gain matrix is the same as in Fig. 15.3. The probability vector = (0.39,0.0,
0.12, 0.49) defines a non-stable equilibrium. By perturbating its second component, the system
evolves towards its global equilibrium at p=(0.43, 0.35, 0.12, 0.10).
From the above it is clear that the equilibrium states can be analysed in
terms of the algebraic properties of the gain matrix. Given the gain matrix
one would in general not have to simulate the system in rule (15.1) to find
the equilibrium states. These can be found by solving the linear system in
Theorem 15.2.
Example 2 - In the following, an example is considered which has been
discussed by Maynard Smith [4] and Zeeman [13]. It is a four-strategy game,
the so called 'Hawk-Dove-Bully-Retaliator' (HDBR) game. The gain matrix
is given by the following expression:
D
B
R
1
3
3
1
6
2
6
2
6
0
3
6
0
4
2
4
The letters for the individual strategies have been included to indicate
the expected benefits when one strategy is played against another. First one
has to solve the linear equation Gp = /(4) just to find that the P vector
contains two negative components. One concludes that there is no equilibrium
state containing some contribution from all the pure strategies. Furthermore,
the fact that there are two negative components in the solutions to Gp = /(4)
shows that there are no equilibrium states containing more than two contributing strategies. This can be demonstrated by considering the eigenvalues of
the matrices found by removing some of the rows and the columns. First
consider the following sub-matrices:
=0 }A =(;
6
3
6
A, =(: : }A' =0
6
2
6
Al
3
6
4
2
4
6
2
2
)
2
4
where the matrix Ai ; i = 1,2,3,4 is found by removing the ith row together
with the ith column. Only in the case of the three first matrices do the
equations AiP = /(3) have solutions with non-zero components. The normalized solutions are:
PI
(1/3, 0, 213); P2
(3/5, 2/5, 0); P3
(0, 112, 1/2)
Each of these matrices defines a three-strategy subgame of the initial game

but as mentioned above each state contains only simultaneous contributions
from two of the four available strategies. Firstly, consider the game defined
by the matrix AI' The eigenvalues of the stability matrix (equation (15.9
in PI are A = (0, - 3.33, 0). Figure 15.5 plots the probabilities P2 as a
function of the probabilities P3 which have been initialized at six different
values. The probabilities P2 have in all cases been initialized at the value
P2 = 0.01. Because A2 <0 the system returns to P2 = but at different values
for PI and P3'
The game defined by the matrix A 3 is particularly interesting as it has
a continuous sequence of equilibrium points of the form P = (O,q, 1- q), i.e.
all these states are solutions to the linear equation A 3P = / (3)' The fitness of
the two contributing strategies is easily calculated to be Iq(p) =II _ q (p)
= 4 - 2q. Figure 15.6 shows the probability distribution P2 =pipI) for vari-
EXAMPLES
281
0.7
0.6
N
a.
,,,,-- ..................
O.5
I
::? 0.4
,,
I
:0
ell
-g
a.
0.3
0.2
0.1
..
....
..
"" ....
..
I
,,,
,,
.... '"
,.'
".
------
OL'~_~=--_:"'-'_~:--.....c.:':"'_-_#;=::=:=--~~:""':::"':::~u
0.6
probability, P3
Fig. 15.5
Probability trajectories for a three-component subgame, A I of the Hawk-DoveBully-Retaliator game. The component pz is plotted as a function of the component P3 for six
different initial conditions.
0.6
0.1~_---;~_---;~--*,---::-L:--_*,........_""'=""::--_~
Fig. 15.6
Probability trajectories for a three-component subgame, A 3 of the Hawk-DoveBully-Retaliator game. The component pz is plotted as a function of the component PI for seven
different initial conditions. The component P3 was initialized at the value 0.5. Four of the initial
states settle in the ESS state at p = (2/3, 1/3, 0). The remaining initial states evolve towards
non-stable equilibrium states.
282
ous initial values for Pl' By initializing the system in P = (r, 0.5, 0.5 - r),
rE [0.27,0.32] one finds the trajectories shown in Fig. 15.6. For r=0.285
the system converges in the point P = (2/3,1/3,0) which has the average fitness
value of If(p)) = 2.66. If on the other hand r5 0.285 the system converges
towards a state of the general form p (q) = (O,q, 1- q) with the above
mentioned q-dependent fitness. The evolution of the fitness for the various
initial conditions is shown in Fig. 15.7.
The states p (q) = (O,q,l- q) are non-stable equilibrium states as each one
of them is arrived at through one particular initial condition. For q5 0.67
these states have fitness which is higher than that of the attractor state
p = (2/3,1/3,0), i.e. 2.66 but they are not stable with respect to perturbations.
They are therefore not ESS.
As mentioned at the beginning of this example two of the components
of the solution of Gp =/(4) are negative. By removing the corresponding
rows and columns and solving the reduced linear equation A 2,4 P = /(2) with
A 2 ,4 =
(~
:)
one finds p = (3/5,2/5). This state is stable and the
eigenvalues for the stability matrix are I
= (-
0.02, - 2.98).
...................................................................
3.2
,.,-------------------------------------:,~.--"'"-------------------
3.0
III
III
.... .'
2.8
CD
E 2.6
....
'.
2.4
2.2
0
time
Fig. 15.7
The fitness values for the probability trajectories of Fig. 15.6.
REFERENCES
15.6
283
DISCUSSION AND CONCLUSIONS
Some of the limitations of conventional game theory are due to its lacking
of dynamics. In their seminal work von Neumann and Morgenstern [I]
expressed some regret at the static nature of game theory. These limitations
were partly overcome with the development of methods to find equilibria
in zero-sum games [16]. Since then, most of game theory has made strong
usage of the concept of strategic equilibrium. It is probably the most
frequently used game theoretic concept in applications to market analysis
and strategic games. Some of the economic applications are market
equilibrium, co-operation, bargaining and public goods, just to mention a
few.
Contrary to the static game theory, evolutionary game theory analyses
the temporal evolution of strategies. A fundamental concept in evolutionary
game theory is that of an evolutionarily stable strategy. Whether the strategies
of the game evolve towards an evolutionarily stable strategy or not depends
often on the initial strategy applied by the population. The various trajectories
represent the learning processes initiated under different conditions, but
motivated only by maximizing returns. This fact is considered to be of an
essential importance in applications to economics and market analysis.
Contrary to the conditions amongst animals, market operators have more
freedon in constructing the initial strategy distribution.
This chapter has discussed in some detail the mathematical structure of
evolutionary game theory. It has been demonstrated that the evolutionarily
stable strategy introduced in Maynard Smith and Price [5] can be found
analytically by solving a set of linear equations. Furthermore, it has been
shown how an algebraic analysis of the gain matrix allows one to find the
number of pure strategies which contribute to the evolutionarily stable
strategy.
It has also been demonstrated that adaptive strategies are better
characterized by stability rather than optimality. Simulations have shown
that some unstable states, reached through specific initial conditions, can
have higher fitness values than evolutionarily stable strategies for the same
game. This is best demonstrated by the fact that the fitness of a population
can be expressed in terms of the sum over the eigenvalue spectrum of the
game's stability matrix.
REFERENCES
1.
Von Neumann J and Morgenstern 0: 'Theory of games and economic behaviour',

Princeton University Press (1953).
284
2.
Lewontin R C: 'Evolution and the theory of games', J Theor Bioi, I ,pp 382-403
(1961).
-
3.
Slobodkin L B and Rapoport A: 'An optimal strategy of evolution', Q Rev Bioi,

49, pp 181-200 (1974).
4.
Maynard Smith J: 'Evolution and the theory of games', Cambridge University

Press (1989).
5.
Maynard Smith J and Price G R: 'The logic of animal conflict', Nature, 246,
pp 15-18 (1973).
-
6.
Hamilton W D: 'Extraordinary sex ratios', Science, 156, pp 477-488 (1967).
7.
Harley C B: 'Learning the evolutionary stable strategy', J Theor Bioi, 89, pp

611-633 (1967).
-
8.
Olafsson S: 'On the stability of strategies in competitive systems', International

Journal of Systems Science (July/August 1995).
9.
Olafsson S: 'A general model for task distribution on an open heterogeneous

processor system', IEEE Trans on Systems, Man and Cybernetics, 25, No I,
pp 43-58 (1995).
-
10. Eigen M: 'Self-organization of matter and the evolution of biological

macromolecules', Naturwissenschaften, 58, pp 465-526 (1971).
11. Olafsson Sand Gell M A: 'Application of an evolutionary model to
telecommunication services', European Transactions on Telecommunications,
1-, pp 69-75 (1993).
12. Taylor P D and Jonker L B: 'Evolutionary stable strategies and game dynamics',
Math Biosc, 40, pp 145-156 (1978).
13. Zeeman E C: 'Dynamics of the evolution of animal conflicts' , J Theor Bioi, 89,
pp 249-270 (1981).
14. Bishop D T and Cannings C: 'A generalized war of attrition', J Theor Bioi, 70,
pp 85-124 (1978).
15. Jordon D Wand Smith P: 'Nonlinear ordinary differential equations', Clarendon
Press, Oxford (1989).
16. Brown G: 'Iterative solutions of games by fictitious play', in Koopmans T C
(Ed): 'Activity analysis of production and allocation', pp 374-376, Wiley, New
York (1951).
16
DYNAMIC TASK ALLOCATION

S Olafsson
16.1
INTRODUCTION
The recent development of open computer systems has led to new challenges
for the management of computer resources and their efficient utilization
[ I, 2]. Interconnected heterogeneous computer systems, operating in a highly
parallel manner, are very different in their behaviour to those conventional
computational resources operating in an isolated and non-interconnected
manner. The management of these systems requires a new approach aimed
at an efficient distribution of computational requirements in an environment
subject to continuous change and evolution.
First attempts to apply market principles to the allocation of computer
time were reported in Sutherland [3], where the price of computer time was
allowed to depend on general demand and the relative priority of users, so
that the more important users had easier access to computer resources.
However, even the most impoverished users could be allocated some computer
time not needed by anyone else. By applying this auction principle it was
found that the computer utilization was very high.
In recent years, applications of market-like and game-theoretic principles
to resource allocation on open heterogeneous computer systems have been
studied more rigorously and some extensive theoretical frameworks for this
approach have now been worked out [4, 5] . In the course of these studies
the view has emerged that the market approach offers considerable benefits
compared with centrally controlled and synchronized networks. In the last
ten years, some researchers [6-8] have reported on implementations of
286
market-based open computational systems. All these systems have been

designed to allow customers to utilize idle processors in distributed
heterogeneous computer networks. Nevertheless, the utilization mechanisms
of these systems are different in their degree of sophistication and flexibility
in the management of the networks' resources.
An essential aspect of market-like systems [9] is the flexibility of the way
in which jobs are allocated for execution. Here, a job is taken to be some
type of computational or processing requirement which can be addressed by
some members of the heterogeneous processor system. In general each job
can be split into a number of sub-jobs, called tasks. As it is, in fact, the task
which is allocated to a processor for execution, this chapter refers to task
allocations rather than job allocations. In principle each task can be dealt
with by more than one processor. Which one actually does the processing
is a matter of decision, to be made by the task.
When a task enters the network, its presence, together with (at least some
of) its requirements, is announced to some or all processors on the network.
They then submit bids indicating their interest and capabilities (resources)
to do the execution of the required tasks. After the bids have been received
and evaluated the task is in a position to 'write down' a utility function which
quantifies the perceived benefits of using anyone of the processors which
have submitted a bid. On the basis of these perceived benefits, the task selects
one processor for the execution of its requirements.
The utility functions represent the 'pay-off matrix' which lays out the
strategies followed by the various tasks (customers):
0 1,1
=
...... OI.N
... (16.1)
OM,I ...... 0M,N
N is the number of processors and M the number of tasks. Each element

in the utility matrix, say 0k,i, describes the perceived benefits task k expects
to gain by using processor i. This utility matrix, together with additional
uncertainties, is mainly responsible for the actual distribution of tasks on
the available processors.
Limited information is an important aspect of the 'computational
ecosystem philosophy' [4]. It is not that complete information cannot, in
principle, be achieved, but it is rather a question of striking the balance
between the costs of getting that information and the possible penalties one
has to pay for making decisions when equipped with only limited information.
As the processor system is an open one, new processors are constantly being
put into service or removed from the system. Each task has only limited
knowledge of these new services and the jobs for which they are particularly
INTRODUCTION 287
well suited. Because of the limited information, the dynamics of the task
allocation process is a probabilistic one.
In this work a new model for task distribution on an open multiprocessor
system is introduced [10, 11]. It is more general than the one described in
Kephart et al [4] as it can, without any modification, be applied to an
arbitrary number of tasks and processors. The model is dynamic and the
basic set of equations describes the time evolution of a matrix quantity,
Pm,n, which gives the probability that task m is dealt with by processor n.
The limited information available to the tasks, and therefore the degree of
uncertainty in the system dynamics, is reflected in the time evolution of
Pm,n' For example, a minimum knowledge of the potential benefits of
choosing one or another processor results in no preference at all, leading
to a near equal probability distribution of tasks on the available processors.
On the other hand, a reliable knowledge of the benefits that a task can expect,
by using one processor rather than another, will manifest itself in a preference
for few processors and possibly in a structured, i.e. uneven, task distribution.
As in statistical systems in general, the structure inherent in the probability
distribution can be described in terms of the entropy function (see, for
example, Chandler [12]). In the model developed here an entropy function
is introduced for each task. The task entropies give some information on
the utilization of the processor system. Each task entropy provides
information on how that task distributes itself, in a probabilistic manner,
on the available processors. Low values for any of the task entropies
demonstrate a preference for some processors over others.
In section 16.2 the basic concepts of the model are introduced. In addition
to the above mentioned probabilities Pm,n and the utility functions Gk,i'
section 16.2 introduces a gain parameter {3 and the so-called transfer function
W(kl,m,n which measures the transition rate for task k moving from processor
n to processor m. The transfer function is responsible for the continuous
redistribution of the tasks, i.e. the time evolution of Pm,no In this notation
W(kl,m,ndt measures the probability of moving task k from n to m in the time
interval dt. The system evolution can also be described in terms of a quantity
in which gives the expected value for the fraction of the total number of
tasks being dealt with by the nth processor.
The task entropies are introduced in section 16.3. The special distributions
leading to maximum and minimum values for the task entropies are discussed,
as well as their importance as a 'watchdog' for the system's utilization. The
section also discusses the use of task entropies as a metric for the processor
system's suitability for dealing with the incoming computational requirements.
A number of different choices for the utility functions are discussed in
section 16.4 and the dynamical results of each particular choice are described.
288
This section contains a number of graphs describing the percentile occupation

of the processors and the entropy evolution of the associated task
distributions. In section 16.5, a second look is taken at the entropy properties
of the system. This time a new entropy function is defined in terms of the
total task-distribution on the available processors. Section 16.6 introduces
some concepts from general equilibrium theory and discusses the possibility
of constructing dynamical task allocation processes which lead to competitive
equilibria of Pareto type. In section 16.7 some analysis of the results is
offered, and the conclusions of the work together with some general remarks
are presented in the final section.
16.2
BASIC FORMALISM
The system considered is a network of heterogeneous processors, each one

capable of performing various tasks. The input to the network is a number
of tasks to be distributed on the network's processors. The tasks can be
allocated to anyone of the processors, but the decision as to which one is
not predetermined nor centrally programmed into the system. The actual task
allocation is done on the basis of the perceived benefits each task sees in using
a certain processor. These benefits are represented by the utility function
which measures the potential benefits of choosing one or a few of the available
processors. The nature of these benefits, will, in general, relate to cost, speed
and reliability of service, etc.
Let M be the number of tasks and N the number of available processors.
Then P=Pm,n I smsM,1 snsN is a matrix which describes how the M
tasks are distributed on the N processors. Each element P m,n of the P-matrix
describes the probability of task m being allocated to processor n. The task
distribution is not a stationary one, but changes with time. Consequently,
Pm,n is a dynamic MxN matrix which describes the time evolution of the
task allocation.
Because of the probabilistic interpretation of P m,n it has to satisfy the
normalization condition:
N
E P mn
n= 1
'
1, "1m,
... (16.2)
which just means that every task m is certainly executed.

The fraction of tasks using one particular processor is given by the
expression:
BASIC FORMALISM
289
E P mn
fn
m=\
... (16.3)
'
Because of the probabilistic nature of P m,n the variable fn is indeed the

expected fractional load on the nth processor.
Let w(k),m,n be the transition rate for task k moving from processor n to
processor m. The tasks get moved from one processor to another on the basis
of the tasks' preferences (see Figs. 16.1 and 16.2). This preference can depend
on a number of different factors, but, in general, it does depend on the present
distribution of tasks on the processors. Various options will be discussed in
section 16.4. The utility function Ok m describes the benefits task k sees in
using processor m for the execution 'of its requirements. Defining r~)n =
Ok,m - 0k,n and assuming that the transition rate for task transfer is given
by the normal distribution with variance:
a(
~t3
).
and zero mean, then:

(k)
rm,n
W(k),m,n
.\
e-(t!Xm,n)
-00
Wo (
k )
21-erf(-t3fm,n)
... (16.4)
where Wo is a constant with the dimension of a rate. Therefore w(k),m,n

represents the transition rate for task k moving from processor n to processor
m given that it perceives its gain in using these processors to be 0k,n and
0k,m respectively. The form of this transfer function is given in Fig. 16.3.
From Fig. 16.3 it is clear that, if 0k,m = 0k,n' then the probability that
the task k is transferred is 0.5. If, on the other hand, 0k,m>Ok,n, then the
probability that k moves from n to m is larger than 0.5 and it increases and
approaches 1 as the difference between 0k,m and 0k,n grows larger. The rate
at which the probability increases depends on the 'gain parameter' t3 which
is inversely proportional to the variance of the normal distribution (equation
(16.4. It is steeper the larger t3 is. For t3-00, the transfer rate approaches
the step function, so that W(k),m,n = 1 if Ok,m > Ok,n and 0 otherwise. In this
case, task k stays with processor m as long as the condition 0k,m > 0k,n is
satisfied. This makes the dynamics wholly deterministic.
290 DYNAMIC TASK ALLOCAnON
Fig. 16.1
tasks
processors
A schematic presentation of the probabilistic assignment of tasks to available

processors.
--...
Ul(k),1,N
y-1
\.....
~
Ul(k),n,m
Fig. 16.2
How the transfer function is responsible for the transfer of jobs from one processor
to another.
It has been argued [11] that the evolution of the probability matrix
Pm,n(t) can be described by the master equation:
BASIC FORMALISM
291
1.0
0.8
0.6
~
3'
0.4
The transfer function w(k).i.i for

three different gain parameters
{3 = 1.0, {3 = 2.0, {3 = 10.0
0.2
0.0
-6
Fig. 16.3
-4
-2
The shape of the transfer function for three different gain parameters.
- E w(m),I,n (t)Pm,n (t)

1=\
... (16.5)
Before continuing, it is in order to discuss one simple special case. When

the rate of transition does not depend on the task index, that is:
W(m),n,1
Wn,1
Vm
... (16.6)
then equation (16.5) can be written in the simple form:

.
!n
E
1=\
wn,IJi -
E WI,n!n
... (16.7)
1= 1
where the dot means the time derivative, and!n is the expected value for the
fraction of tasks using the nth processor. The assumption in equation (16.6)
is a rather unrealistic one as it does not consider any task-dependent
requirements but assumes that all the tasks perceive the processor system in
an identical manner. The network can consist of a high variety of service
providers offering services as diverse as digitized voice, electronic mail,
interactive videotex services and facilities which perform lengthy numerical
calculations. The tasks submitted to the network are of equally diverse nature.
292
It can be shown [II] that, in the case of only two processors, equation
(16.7) reduces to the following expression:
... (16.8)
This equation is simply the one studied by Kephart et al [4] and
Hubermann and Hogg [5] in the case of a two-processor or strategy system.
It is interesting that even this simple system can display immensely
complicated behaviour, including chaotic phenomena which arise when time
delays are introduced [4]. It is concluded that the equations studied by these
authors [4, 5] constitute a special subset of the general case of equation
(16.5).
16.3
THE ENTROPY OF THE TASK DISTRIBUTION
Instead of monitoring the values Pm,n themselves, an entropy function for

each task k was defined:
N
Sk =
-E PknlnPkn , k=I" .. ,M
n=I'
... (16.9)
Sk is a measure for the probabilistic distribution of task k on the

processor system. Sk has the maximum value of Sk,max = In(N) if the task is
evenly distributed over all the processors (i.e. Pk,n = 1/N for all n) and Sk
has the minimum value of Sk,min = 0 if the task is likely to be dealt with by
only one processor. In this latter case the distribution is given by Pk,n = 0m,n
for all n and one (the favourite) processor m.
Low values for the task entropy Sk can mean one of two things. Firstly,
it is possible that the set of processors considered is generally not very suitable
for the requirements defined by the tasks. In other words, only a small part
of the whole spectrum of computational and processing capabilities offered
by the processor system is attractive to the task so that it only distributes
itself (in a probabilistic manner) on a small subset of the whole processor
system. Secondly, low task-entropy values can be caused by the task only
receiving bids from few processors excluding most of the other processors
from consideration. In general, this second situation would arise if only a
small number of processors are aware of the task's requirements and
consequently most processors are not in a position to submit bids.
Comparing the components of the entropy vector, S = (SI,S2"",SM) says
whether the tasks perceive the processor system in a similar manner or not.
SIMULAnONS
293
If all the components are close in value the various tasks have a similar
distribution on the processor system. Given that the computational and
processing capabilities of the processors are in general very different and
together span a large spectrum of processing facilities, similar values for the
S components imply that the processor system is equally suitable for all of
the tasks. For a multitude of different tasks this is generally not the case.
16.4
SIMULATIONS
As stated by equation (16.5) the time evolution of the probability distribution

w(k),m,n- These in turn depend
on the utility matrix elements Gk,n as given by equation (16.4). In the
simulations described in this section the procedure was as follows. Firstly,
the system is initialized with some arbitrary probability task distribution. In
all the simulations the initial distribution is random, Le. each job is randomly assigned to one of the available processors. To enable a comparison between different choices for Gk,i and other relevant parameters, all initial
probability distributions were randomized in the same manner. Subsequent
redistribution takes place if the initial distribution is not compatible with the
perceived preference that the tasks show for the different processors. There
are a number of possible ways to set the preference matrix, but only a few
of them will be discussed here.
One possible utility function has been studied in Kephart et al [4] for
the case of a two-processor system. The utility functions have been set as:
P m,n(t) is determined by the transition rates
... (16.10)
where aj and bi> i = 1,2 are positive or negative constants. Here, the utilities
are expressed in terms of the fractions of jobs presently distributed on the
two processors.
Because of the more general nature of the model introduced in this work,
it is not sufficient to express the Nx M elements of the utility matrix only
in terms of the N expected fractions of jobs fl> ... ,fN on the processor
system. As it is the time evolution of Pm,n that is of interest, the aim is to
express the utility matrix elements in terms of these probabilities, or functions
thereof. The elements of the probability matrix P m,n relate to the fractional
averages as expressed in equation (16.3). Further motivations for this choice
will be discussed later in this section.
In all the simulations conducted the number of tasks and processors was
kept constant - 25 processors and 20 tasks.
294
16.4.1
Arbitrary preference
In the first experiments (Fig. 16.4) it is assumed that the utility matrix elements
are functions with arbitrary values in some interval [O,A] ,A> 0. The smaller
A is, the closer in value are the elements of the utility matrix. This, on the
other hand, means that the tasks perceive the processors as being similar with
respect to the benefits of using them. Under these circumstances one would
expect the initial probability distribution to stay fairly even, because tasks
are not particularly encouraged to use one processor rather than another.
If, however, the values of A are increased, a preference is likely to arise and
the initially even distribution may generate structures which reflect the
different values of the matrix elements Gk,i' This evolution is reflected in
changes of the entropy. Figures 16.4(a) and (c) show the time evolution of
the expected fractional task distribution on the system's processors for two
different values of A. Figures 16.4(b) and (d) show the corresponding task
entropy evolution. Both Gk,i and {3 are kept at fixed values during the
simulation.
(a)
32
30
28
26
22
2._
A_ 100
~.'O
f28
i28
A.100
6\0
2_
22
20
..".
(b)
(d)
Fig. 16.4 (a) and (c) represent the fractional distribution of tasks on the processors for two
different values of A:A = 1.0 and 10.0; (b) and (d) show the evolution of the entropy functions
associated with the tasks.
SIMULAnONS
16.4.2
295
The self-confident choice
Next, the assumption that Gk,i = Pk,i' i.e. task k believes its benefits of
using processor i are directly proportional to the probability that it is using
that processor already, makes equation (16.5) nonlinear in Pk,i' and implies
that, as certain tasks increase their usage of particular processors, the more
likely they are to use them in the future. In this case, redundancy
(unemployment) for a number of processors would be expected, at least for
sufficiently high values of (3. A few results using this choice are presented
in Figs. 16.5(a)-(d).
In this case, it is assumed that the tasks are programmed in such a manner
that they respond to the present probability distribution of tasks on the
processor system. A knowledge of this probability can be achieved if the tasks
record the past pattern of probability distribution, i.e. an estimate for the
present probabilities is achieved by examining the past usage of the various
.. 6
p.200
I.
l5.
(a)
(c)
3.5
~ 25
i
9
20
, 5
Fig. 16.5
~ 2.5
i
~.200
1Ii
20
J!l
' 5
Fractional task distribution, (a) and (c), and task entropy evolution, (b) and (d),
when the gain matrix G k i is equal to the probability distribution Pk i. The graphs show the
,
two cases (3 = 20.0 and 50.0.
.
296
processes. In an economic analogy it is rather like customers responding to

their expectation of prices rather than the prices themselves. An estimate of
prices can be achieved from the recordings of previous price patterns. In real
life market economies, this is a very realistic situation. Most people are more
likely to visit one particular service provider with their specific service
requirements. This choice is based on their direct or indirect experience with
the price and quality of the service provided in the past. In many cases choice
is based on likelihood rather than certainty.
16.4.3
The limited self-confident choice
The self-confident choice represents a special case which has some

shortcomings. The main one is the fact that after a while, given that the gain
parameter is sufficiently high, the majority of tasks will be given to only a
few processors which are kept busy whilst many of the processors run at low
activity. This cannot be a satisfactory situation as it is unlikely to take
advantage of the network's actual potential. Furthermore, it assumes that
if a particular task is, with a high likelihood, being dealt with by one particular
processor, then that is a good arrangement only to be enhanced by further
usage. All future decisions will only improve on the initial decision,
irrespective of that decision possibly being incorrect. In real life, this situation
can arise when our likely selection of certain service providers is never critically
reviewed nor revalued in the light of new competition that has emerged more
recently. This could be called a 'once good, always good' situation.
It is possible to introduce some feedback effects into the utility function
which renders the processors less attractive when their probability of being
used exceeds certain critical values. This can be achieved by the following
form for the utility matrix:
... (16.11)
By making this choice, it is guaranteed that every processor becomes less
attractive if its probability of occupation exceeds a certain critical limit. The
value of b, which could be made dependent on the processor index, would
take into account the limited resources available to the processor. The critical
limit depends on a and b as follows:
pc
k,i
a
2b
Figure 16.6 shows the form of
... (16.12)
Gk,i
for a= 1.0 and a few values of b.
SIMULATIONS
1.0
297
the gain function Gk.i as a function of the

probability distribution, Gk.i = Pk.i - bPk.i 2
0.8
b=0.5
b=0.75
0.2
probability distribution, Pk.i
Fig. 16.6
The gain functions as they depend on the probability distribution and four different
values of b.
A number of simulations have been run for different values of b. The

parameter a has been kept at the value a = 1.0. To assess the effect of different
values for the parameter b the experiments have been run for fixed (3 values
and for different b values. Figures 16.7(a)-(d) represent the expected fractional
task distribution for (3 = 20 and two different values of b, b = 0.5 and 1.0.
The essential advantage of the matrix in equation (16.11) is the fact that
it makes the situation less likely that anyone processor is overloaded with
tasks at the expense of other processors. The feedback effects due to the
quadratic term in equation (16.11) lead to a more even distribution on the
processor system which is more likely to make a better utilization of its
resources.
16.4.4
Do as the others do
As an alternative to previous choices,

Gm,i =
(it) E
nr!m
Pn,i'
Gm,i
is now set as follows:

... (16.13)
i.e. processor i becomes more attractive to task m the more it is used by other
tasks. Two instances of this choice are represented in Figs. 16.8(a)-(d) for
two different values of the gain parameter, {3 = 20.0 and 50.0.
298
DYNAMIC TASK ALLOCAnON
,....
(a)
3'
~ 30
f30
i
~~
28
~
b-0.5
28
.
26
2i':x,~~200
"
20
..".
40
20
(b)
00
10
-.
(d)
Fig. 16.7
The gain matrix is given in section 16.4.3. Graphs (a) and (c) give the fractional
task distribution on the processors and graphs (b) and (d) represent the associated task entropies.
It is noticeable that the task distribution becomes more, even with increasing values for b. This
is reflected in overall increased values for the task entropies.
It is obvious from equation (16.13) that the value of the utility matrix
element Gm,i is close to the average probability Pi of the tasks using
processor i. Indeed equation (16.13) can be rewritten in terms of this average
probability as follows:
Gm,i = Pi -
(~) Pm,i
... (16.14)
Due to M being a large number in general, in this case M = 20, it is found

that Pi "",~, Vi, j, i.e. the perceived utilities, in using anyone of the
available processors, are very similar. Therefore, as long as the gain factor
(3 is not too large, the condition in equation (16.13) leads to a fairly even
distribution as demonstrated in Figs. 16.8(a) and (b). It is only for very large
gain factors, like {3 = 50.0, that an uneven distribution is achieved.
PROCESSOR ENTROPY
~r
299
p.200
. . .0
... 0
j
13~
20
0.
""'"
IlfCCOSOO"
(a)
(e)
~2~
-eI 20
Il
11~~20
80 60
1~
IlfM
(d)
(b)
10
00
.0
tasks
Fig. 16.8 The fractional distribution of tasks and the associated task entropies where the
gain functions are chosen as in section 16.4.4 Results for two different gain parameter values
are displayed.
16.5
THE PROCESSOR ENTROPY
In section 16.3 the concept of a task entropy was introduced. For each task
one entropy function was defined. It was seen that the time evolution of the
task entropies gives some information on the utilization of the processor
system. However, the limitations of the task entropies as a watchdog for the
utilization of the whole processor system were also discussed, and it was
pointed out that the only real information they give is how the individual
tasks are distributed on the available processors. For example, a high value
for the entropy of the kth task only means that this particular task has a
close-to-even probability distribution over the system's processors. The task
therefore does not express any real preference for anyone processor.
300
If all the task entropies are high, none of the tasks has a preference for
anyone of the available processors. Under these circumstances, the processor
system would be well utilized, with the tasks evenly distributed over the
processor system. If, on the other hand, each of the task entropies is very
low, one knows that all the tasks have strong preference for only one or a
few of the processors. However, by considering the task entropies alone, one
cannot decide whether all the tasks have preference for the same few or
different few processors. One cannot therefore reliably assess the utilization
of the whole processor system. This point is demonstrated by analysing the
results of some of the simulations discussed in previous sections.
Figure l6.4(b) only demonstrates that each task has a similar entropy
value. This does not mean that all the tasks are evenly distributed over all
the processors as is clearly shown in Fig. l6.4(a). The graph in Fig. l6.4(b)
says only that the tasks have a similar distribution. This fact is demonstrated
in Fig. l6.9(a), from where it can be seen that the probability distributions
for the tasks are similar, but that they are not necessarily even, which explains
the distribution as represented in Fig. l6.4(a). On the other hand, the fact
that the task entropies in Fig. l6.4(b) are relatively high means that the task
distribution is fairly even. This results in reasonable system utilization. The
distribution in Fig. l6.9(b) explains in the same way the results demonstrated
by the graphs in Figs. l6.4(c) and (d). Here, the individual task entropies
are lower than in Fig. l6.4(b), resulting in poorer system utilization. A look
at Fig. l6.4(a) shows that no single processor receives more than 5010 of the
total amount of the work-load and no processor has less than I %. The
situation is completely different in the case demonstrated in Fig. 16.4(c), when
one processor received more than 25010 of the total work-load and about 15
processors are almost idle.
It is concluded that the task entropies alone are not, in general, a reliable
measure for the utilization of the processor system. Their main value lies
in the fact that they measure the suitability of the whole processor system
for a given task. The processor system is particularly well suited for the
execution of a task if, as a result of the bidding process, it is likely to be
given to anyone of a large number of processors.
A quantity better suited for monitoring the total distribution of tasks on
the whole processor system is the expected fractional task distribution
introduced in section 16.2. In terms of this distribution function the 'processor
entropy' is defined as follows:
N
S = - E !nln!n
n=1
... (16.15)
PROCESSOR ENTROPY
~
';::
A:: 1.0
0.8
p:: 1.0
iii
'90. 6
.>t:
'"
.?' 0.4
:5
jg 0.2
c.
(ij
20
'""
10
processors
20
25
(a)
A:: 1.0
/3:: 1.0
5
(b)
Fig. 16.9
on available processors
The probability distribution of tasks
(with reference to Fig. 16.4).
301
302
where in is defined as in equation (16.3). This function gives a better

measure for the utilization of the processor system. In particular the time
evolution of S gives information on how the redistribution of tasks is taking
place and whether it reflects a strong preference for one or few processors
or not.
Together the two entropies - the task entropy and the processor entropy
- give a good picture of the suitability and the utilization of the processor
system. The processor entropy measures the global utilization of the processor
system, i.e. how the total distribution of the task deviates from an even
distribution. In general, the objective would be for the processor entropy
to take on maximum values, reflecting an even distribution, but, because
of the heterogeneity of the processor system and the diverse nature of the
incoming tasks, this aim cannot usually be achieved. A distribution taking
into consideration the different resources available to the processors will be
discussed briefly in the following section.
16.6
CONDITIONS FOR COMPETITIVE EQUILIBRIUM
This section briefly discusses how some general problems from the theory
of economic equilibrium relate to the model developed in this work. It is
demonstrated how a task distribution, which maximally utilizes the systems
resources, can be constructed by maximizing the task entropies subject to
constraints given by the available processor resources.
It is tempting to take the approach to the task allocation problem in which
the processors are looked at as being consumers and the tasks as commodities.
Each consumer has limited resources and can therefore only consume a subset
of all available commodities. The aim of an effective allocation procedure
is to distribute the commodities (tasks) on the consumers (processors) such
that the total consumption exhausts the total resources. This point will now
be discussed in terms of the equilibrium states of the dynamical allocation
equation (16.5), i.e. states which satisfy the stationary condition:
... (16.16)
Let R m n be the resources available to processor n in dealing with tasks
of type m'. Then the allocation, described by the distribution p~ n' is
optimal only if:
N
r;
n=1
p(mlpO
n
fi,O
= r;
n= 1
R m n' 'rim
'
... (16.17)
CONDITIONS FOR COMPETITIVE EQUILIBRIUM
303
where p~m) is the resource expenditure of processor n when dealing with

task m.
A distribution is said to be Pareto optimal if no processor can be made
better off without making other processors worse off, under the constraints
of equation (16.17). When a distribution which satisfies this condition is
realized, no processor can improve its position without making some other
processor worse off in terms of tasks. It is a problem of essential importance
to identify the conditions under which the allocation dynamics (equation
(16.5 lead to and sustain the Pareto optimality. For interesting comments
on this problem, see Wallich and Corcoran [13].
There follows a description of how a stationary task distribution satisfying
the condition of equation (16.17) can be constructed by using the principle
of maximum lack of knowledge [14], with the variables p~m), 15,n5,N5,
m 5, M as a set of M functions with different values for anyone of the N
processors. Then, the expression:
N
(p(m) = 1: p~m) P~,n

n=l
... (16.18)
has the meaning of an expectation value for p(m) in the equilibrium

distribution P~ n' The normalization equations (16.2) and (16.18) are treated
as constraint conditions which have to be satisfied by the stationary
probability distribution P ~ n' The task entropy is maximized subject to the
constraints of equations (16.2) and (16.18). This optimization problem can
be solved by using the method of Lagrangian parameters. The details of the
calculation will not be given here but the result for the task distribution [11]
is:
... (16.19)
The "'O,k parameters can be found from the normalization condition in

equation (16.3) to satisfy:
"'O,k
1n
(m~ fXp( - "'kPg )

"'k
... (16.20)
The parameters can be computed from the normalization constraints

in equation (16.18).
The expression given by equation (16.19) leads to an optimal distribution
on the processor system subject to the constraints in equations (16.2) and
(16.18). Similar considerations can be applied to the maximization of the
processor entropy. The details of that calculation will not be given here. In
the allocation of tasks on the processor system, as described by the dynamical
304
set of equation (16.5), this optimal distribution can serve as a benchmark

against which the actual distribution is compared.
16.7
DISCUSSION OF RESULTS AND IMPLICATIONS
A new model for the distribution of tasks on an open heterogeneous

multiprocessor system has been introduced and results of some simulations
represented. The model offers an alternative to and a generalization of the
model developed by Kephart et al [4] and Hubermann and Hogg [5]. The
first two sections of the work developed the basic concepts of the model and
established the dynamical equations which describe the continuous
redistribution of tasks on the processor system. The process of redistribution
is a probabilistic one controlled by the momentary benefits the tasks see in
using one or another processor. Because each task is equipped with only a
limited information on the availability and capability of the processors, its
knowledge of the benefits is incomplete. The decision to opt for one particular
processor is therefore not necessarily the best possible option - it is only
the best one subject to the available information.
An important quantity in the model is that of the task entropy. Its values
measure the probability distribution of the task on the processors. In
particular the task entropies tell how one task has been distributed on the
processors, i.e. evenly or with some structure. The task entropy gives a general
measure for the suitability of the available processors in dealing with the tasks
entering the system. A sharp probability distribution leading to low values
for the task entropies implies that the tasks consider only a few processors
for the execution of their requirements. This situation can have two different
causes. One is that the specification of the tasks in need of execution has
only been made available to a small number of processors, so that processors
potentially in a position to deal with the tasks have not had the opportunity
to put in bids for the tasks. The other possible reason for the narrow
distribution of tasks is that just a small number of processors are capable
of dealing with the tasks and consequently they only consider these few
processors in the allocation process. In this second case, the competition
between the processors bidding for the tasks is not an effective one as most
of the bids are simply neglected. This might indicate the need to add new
processors to the open processor system.
Section 16.4 described the results of simulations with four different choices
for the utility function and for a number of different values for the gain
parameter. Initially the utility matrix elements are randomized within intervals
of three different sizes. For the smaller intervals the distribution of tasks
is relatively even and this is reflected in high values for the task entropies.
RESULTS AND IMPLICAnONS
305
Rescaling all the utility matrix elements by the same amount increases the
variance in the benefits as perceived by the tasks. This results in an increasing
workload for some of the processors and less work for others. The results
for this type of utility matrix were presented in Figs. 16.5(a)-(d).
By putting the utility matrix elements equal to the momentary probability
distribution, Gk,i = Pk,i (the self-confident choice), the perceived utility
changes as the system evolves. As the initial values of Pk,i are not perfectly
even, this kind of choice will eventually lead to a preferential distribution
of the tasks, at least for sufficiently high gain parameters. This was
demonstrated in Figs. 16.5(a)-(d) for two different gain parameters.
The self-confident choice has some shortcomings as discussed in section
16.4.3. These can be rectified by introducing a feed-back effect into the utility
function (the limited self-confident choice). This guarantees that a processor
becomes less attractive if its usage exceeds a certain critical limit. The effects
of this choice on the expected fractional task distribution and the entropy
evolution can be seen in Figs. 16.7(a)-(d). The threshold defined by the critical
limit (equation (16.12 relates to the resources available to the processor.
When these are fully stretched the processor becomes less attractive and tasks
are allocated to alternative processors.
Finally the option is considered of making a processor more attractive
to task m the more it is used by other tasks. As demonstrated in section 16.4.4
this choice leads to a very small variance for the values of the utility matrix
and consequently a fairly even task distribution (Figs. 16.8(a) and (b. A
sharp deviation from an even distribution can only be observed for large gain
parameters, {3 = 50.0 (Figs. I6.8(c) and (d.
Earlier sections have discussed at some length how the task entropy gives
valuable information on the probability distribution of the allocation of tasks
to the processor system. It supplies a metric for the suitability of the processor
system for the execution of the incoming tasks. However, it does not, in
general, present a reliable metric for the actual utilization of the whole system.
To supplement the utilization metric a scalar quantity, called processor
entropy, is introduced. The processor entropy gives information on how the
totality of tasks has been distributed on the processors. Figures 16.10-16.13
plot the processor entropies of all the different utility function choices
discussed in section 16.4.
In general, the different resources available to the various processors will
put constraints on the probability distributions under which the entropy
functions are to be maximized. This important point is discussed in section
16.6. It is demonstrated how this optimal task distribution can be found by
applying the principle of maximum lack of knowledge. As the simulations
of section 16.4 have clearly demonstrated, the choice of the utility function
fixes the distribution of tasks on the processor system. Bearing in mind that
306
DYNAMIC TASK ALLOCAnON
3.5
A= 1.0, 13 = 1.0
<Il
Cll
~3.0
C
Cll
o
<Il
<Il
Cll
K2.5
()
2.0!------:~--l:__---=~===-~-___J
Fig. 16.10
The graph shows the processor entropies for the case represented in Figs. 16.4(a)(d) but now for three different values of A.
3.
13 = 20.0
<Il
.~ 3.0
cCll
o
<Il
<Il
Cll
g 2.5
a.
2.0't,-_ _+
__--J..,......-_ _
~--~--~
time
Fig. 16.11
Processor entropies for two different gain parameters and a gain function
set equal to the probability distribution. The associated task distributions are given in
Figs. 16.5 (a) and (c).
maximum utilization of the processor system is to be attained, the major

problem to be solved is to find a utility matrix which simultaneously represents
correctly the computational expertise available on the system and optimal
distribution of tasks. The nature of this problem has been discussed in
section 16.6.
RESULTS AND IMPLICATIONS
307
3.250
en
Ql
'0.
-E
(3 = 20.0
3.240'
3.230'
Ql
oen
~
b= 1.0
3.220
3.210
3.200L-_ _....L._ _---J
Fig. 16.12
..L..-_ _....,.a-_ _~
time
The processor entropies for the case presented in Figs. l6.7(a)-(d) for three different
values of b.
3.5
3.0
en
Ql
'0.
gc 2.5
Ql
oen
2.0
i5..
1,5
_ _~:--_ _-':-_ _--:':-_ _--:':
1.0~
Fig. 16.13
time
The two processor entropies for the task distributions represented in

Figs. l6.8(a)-(d).
The practical implication of introducing the task and processor entropies

is the fact that they provide, in terms of only M + I scalar variables (M is
the number of tasks), a reliable metric for the utilization and the suitability
of the entire processor system for dealing with the incoming tasks. The
advantage of this measure as compared with the probability matrix elements
P m,n or the expected fractional loads in is that the constraints imposed by
308
the limited resources of the processors can be incorporated into optimal

entropy functions (section 16.6) which can serve as benchmarks for optimal
utilization and suitability.
16.8
CONCLUSIONS
Over the last 20 years or so a number of schemes have been developed to

provide efficient distribution of tasks (work-load) on heterogeneous multiprocessor systems. Most of the schemes assume that on arrival a task is
allocated to a processor on the basis of a predetermined distribution
algorithm. More recently some evidence has emerged [4, 5] to suggest that
centrally regulated multiprocessor systems which operate on the basis of
predetermined task allocation are not very efficient as they do not use the
processing power of the multiprocessor system optimally.
The introduction of economical and ecological principles for the
management of computational systems offers new means to more efficient
and cost-effective management [4,5, 11] . Some computer networks charge
flat rates for users entering the network, which does not appear to be sensible
taking account of all the services on offer. Furthermore, the 'flat-rate'
approach invites customers to overuse the system. Consequently it appears
natural to charge for these services on the basis of both the costs of providing
them and the customers' need for them. It is important that network operators
realize these differences and offer the services to their customers on the basis
of this understanding. The quantitative model, developed as part of this work,
lays out means for achieving this [11].
The market laws defined by the utility matrix in equation (16.1) determine
the task distributions arrived at when the processor system reaches
equilibrium. The analysis of these equilibrium states is important for a proper
understanding of the system efficiency. As the processor system is an open
one, new services are constantly being added and these temporarily shift the
equilibrium of the system. Continuous monitoring of the new equilibria is
essential for efficient system utilization. Ideally the system should monitor
itself and reconfigure the task distribution so as to reach a highly efficient
quasi-equilibrium. This process of self-regulation is achieved by continuous
modification of processor bids and consequent changes of the values of the
utility matrix elements.
If a job can be subdivided into a number of smaller tasks, they may well
be given to several different processors. In this case, procedures for the
CONCLUSIONS
309
Fig. 16.14 The formal representation of the processing of jobs on a heterogeneous network.
The incoming job is split into different tasks in a suitable way and the network's processors
are informed of the presence of these outstanding jobs. They are invited to send in bids which
will be evaluated by the job provider. On the basis of that evaluation, the job provider writes
down a gain function which quantifies the perceived benefits of using anyone of the processors
which have submitted a bid. The task solutions are returned by the individual processors to
be joined into a final solution.
decomposition of a job and the subsequent synthesis of the derived solutions

are required (Fig. 16.14). For the approach followed here, it is not important
whether the task being processed is part of a larger job or not, since the
principles of processor selection are the same in both cases.
The formalism developed in this work is a general one, i.e. no particular
structure has been imposed on the incoming tasks nor on the special
capabilities of the processors in the heterogeneous system. A structure has
to be imposed when concrete applications are to be considered. The potential
applications of this model are not necessarily restricted to those of task
distribution on a system of processors. It is easy to envisage applications to
fields such as a fault repair viability analysis where the utility function would
capture the perceived benefits of a repair as compared with the consequences
of taking no corrective action.
310
REFERENCES
1.
Hewitt C: 'The challenge of open systems', Byte, 10, pp 223-242 (April 1985).
2.
Huberman B A (Ed): 'The ecology of computation', Elsevier Science Publishers

BV, North Holland (1988).
3.
Sutherland I E: 'A future market in computer time', Communications of the

ACM, .!.!.' No 6, pp 449-451 (June 1968).
4.
Kephart J 0, Hogg T and Huberman B A: 'Dynamics of computational

ecosystems', Physical Review, A40, pp 404-421 (1989).
5.
Huberman B A and Hogg T: 'The behaviour of computational ecologies', in

Huberman B A (Ed): 'The Ecology of Computation' , Elsevier Science Publishers
BV, North Holland (1988).
6.
Shoch J F and Hupp J A: 'The 'work' programs - early experience with a

distributed computation', Communications of the ACM, 25, No 63, pp
172-180 (March 1982).
-
7.
Litzkow M J, Levy M and Mutka M W: 'Condor - a hunter of idle

workstations', Proceedings of IEEE International Conference on Distributed
Computer Systems, pp 104-111 (1988).
8.
Waldspurger C A, Hogg T, Huberman B A, Kephart J 0 and Stornetta W S:

'Spawn: a distributed computational economy' , IEEE Transactions on Software
Engineering, .!.., No 2, pp 103-117 (February 1992).
9.
Miller M S and Drexler K E: 'Markets and computation: agoric open systems',

in Huberman B A (Ed): 'The Ecology of Computation', Elsevier Science
Publishers BV, pp 133-176, North Holland (1988).
10. Olafsson S: 'A model for task allocation' , Internal BT Report (September 1993).
11. Olafsson S: 'A general model for task distribution of an open heterogeneous
processor system', IEEE Transactions on Systems, Cybernetics and Man, 24,
Pt II (1994).
12. Chandler D: 'Introduction to modern statistical mechanics', Oxford University
Press (1987).
13. Wallich P and Corcoran E: 'Games that networks play', Scientific American,
p 92 (July 1991).
14. Jaynes E T: 'Information theory and statistical mechanics', Phys Rev, 106,
pp 620-630 (1957).
-
17
COMPLEX BEHAVIOUR IN
NONLINEAR SYSTEMS
C T Pointon, R A Carrasco and M A Gell
17.1
INTRODUCTION
As the competitive information age evolves, communications systems will

become increasingly complex as networks of networks emerge and competing
operators strive constantly for competitive edge [1, 2]. Under conditions
of intense competition, global communications systems will exhibit increasing
levels of diversity, distribution and decentralization as new networks, systems
and services are developed, installed and brought on-line. As different services
may be offered by different networks, networks will not only compete against
each other but also co-operate by sharing resources. Different networks may
have different levels of reliability and may be managed to provide various
qualities of service. The operation of communications systems in the emerging
competitive industry paradigm will be radically different from operation in
the public utility paradigm. Increasing diversity of services will be brought
about by numerous operators competing for custom, using diverse technologies. Increasing distribution of communications systems will be brought about
by globalization of markets with its inherent redistribution of centres of
information production and consumption [1]. Increasing decentralization
will arise as a result of increasing functionality and distribution in systems,
increasing diversity of users' requirements and proliferation of types of
information networks operated under various regimes of ownership.
312
BEHAVIOUR IN NONLINEAR SYSTEMS
An example of the requirement for decentralized control in communication systems is within the context of emerging communications free-trade
zones (CFTZ), in which numerous communications service providers will be
operating different networks with differing characteristics [3]. Differing
services may be offered by different networks; networks will be competing
against each other and using resources in other networks; different networks
may have different levels of reliability and may be managed to give various
qualities of service. In such zones, networks will have to interconnect and
interoperate as both users and network operators draw upon resources
scattered within the disordered communications conglomeration.
The communications conglomeration in a CFTZ will consist of large open
collections of locally controlled, asynchronous and concurrent processes
interacting with an unpredictable environment. Decisions made at any point
in the system will be based upon local, imperfect, delayed and conflicting
information. Such systems will have operational characteristics which are
likely to be very different from those of the homogeneous network in the
public utility paradigm dealing with one type of information (e.g. voice)
controlled by a central office. Decentralized control of communications and
computational structures will become an overriding prerequisite for the
integrity and security of the highly complex communications systems which
will emerge [3, 4] .
With increasing network complexity and correspondingly increasing distribution, it will become essential to develop decentralized control and coordination mechanisms: it will become increasingly difficult to control global
networks with centralized control systems. Globally distributed networks will
raise many new problems, particularly in the areas of signalling and signal
processing as many networks evolve and operate asynchronously.
New ways of engineering communications and processing systems will
be required to take account of the high levels of diversity, distribution and
decentralization. This will lead to fundamental changes in terms of the ways
in which command, co-ordination and control functions are perceived; interworking will raise many issues which go far beyond basic issues of protocols
for interconnection. Since the complete information exchange required for
a centralized decision-making process may not be feasible, particularly as
many information and communications systems are by nature decentralized,
practical constraints may make distributed decentralized control mandatory,
especially for extended multi-commodity service and network systems with
rapidly varying service configurations and user demands. The co-ordination
of decentralized decision makers is, however, a formidable problem.
Decentralization by its very nature introduces uncertainty into the decision
process - remote components of the same system can only have limited
information about each other and the overall system. Hence, decisions must
INTRODUCTION
313
be made by individual controllers that have access only to partial information

regarding the state of the system. This leads to inevitable inconsistencies
between local and global optimization and contention may arise among
decision makers, particularly when faulty or overloaded processors are
operating.
In a complex communications system containing many decentralized
controllers, there may be large uncertainties regarding system parameters,
actions and the frequency and nature of external events. As telecommunications and computer-based systems become more complex and demands for
ever increasing adaptability with improved reliability and security are made,
the nature of the information and service kinetics will become critical.
Instabilities, perhaps within processor scheduling, resource allocation,
message transfer and evolving information retention and manipulation
systems, will arise and cannot be avoided. Thus, there is a general need to
view communications systems as dynamical systems whose time behaviour
can substantially influence performance and may even lead to degradation,
irregular operation or failure of a system. Such irregularities and failures
are already frequently experienced in communications and computation
systems.
Field data taken from a telephone system has highlighted the occasional
proliferation of unusually large delays in the processing of the underlying teletraffic [5, 6]. The unusual behaviour is related to contentions within the
data switching systems which give rise to traffic synchronization - under
certain circumstances, the system may begin to batch its workload and degrade
into chaotic behaviour. The behaviour of the system is very sensitive to
switching conditions and under certain circumstances may slip into states
of high degradation, which give low quality of service. These effects have
been studied numerically using a simplified model for a switching exchange
[5, 7]. The studies have revealed a wide range of dynamical behaviours.
Knowledge of such behaviours is important in the operation of systems with
guaranteed levels of performance. Although a system may be well-designed
at the outset, in-service operation, involving, for example, interconnection
with other networks, consolidation or failure of certain nodes or service
centres, may lead to qualitatively different behaviour. The unusual and often
persistent degenerate behaviours which have been observed in real telephone
systems and data processing systems will occur also in more sophisticated
multi-commodity service systems. Thus, interworking between competing
networks will bring new problems of control and co-ordination which will
need to be addressed in real and live communications systems.
In order to provide a platform for exploring some of the key issues
associated with co-ordination and control in such complex service systems,
a novel signal processing system has been designed and implemented. The
314
system incorporates a Texas Instruments TMS320C25 digital signal processor

(DSP) which is interfaced to a reconfigurable array of nonlinear data
processing and service provision elements [8, 9]. The array is based on field
programmable gate arrays in very large scale integration (VLSI) and allows
a wide range of network configurations, network hybrids and conditions to
be explored with different types and numbers of processing elements.
The novel architecture for studying complex behaviour in nonlinear processing systems can be configured in real time to set up traffic patterns in
the system, and perform real time data acquisition. The system provides a
platform for the exploration and analysis of signalling in heterogeneous
environments and to begin exploring large-scale fault-tolerant systems. The
system that has been devised will enable new techniques for signal processing
in heterogeneous networks to be explored. Issues of chaos and instability
in communications networks are beginning to be recognized as highly significant to the integrity of future communications systems. As such, the work
reported here raises many important issues concerning the maintenance of network integrity under conditions of competitive interworking of communications service systems and is part of a newly emerging research field which
is rapidly gaining momentum.
17.2
NONLINEAR DATA PROCESSOR SYSTEM MODEL
Service processing elements in telecommunications networks can be visualized as processing units in a multi-process environment which handle a certain
number of tasks including service originations, collection and processing of
service requests, disconnects and overhead activities including operating
system overheads and network management audits. Figure 17.1 shows a
simplified processor schedule for the processing unit in a typical switching
system.
The system operates as follows. A timer of duration T is initiated by the
overhead routines at the start of a processor cycle. The processor completes
all outstanding jobs in the higher priority data process queue before proceeding to a lower priority data process queue. Following the expiration of
the timer, the processor suspends work on any outstanding jobs and initiates
a new cycle. Note that the timer value T effectively determines the nature
of the processor schedule. If T is small, the schedule resembles a non-preemptive priority schedule, whilst a large T has the effect of assigning equal
priority to each of the queues in a schedule that is indicative of polling. The
application of this model can be viewed within the following contexts:
NONLINEAR DATA PROCESSOR SYSTEM MODEL
315
t=O
overhead
activities
decreasing
priority
collection and processing of service

requests data process queue
service originations
data process queue
t<=T
Fig. 17.1
Idealized data processor schedule.
the processing time for the low-priority service originations is much

smaller than the processing time for the high-priority collection and
processing of service requests' tasks; this corresponds to the practical
situation where the origination task involves the connection of the
originator to a receiver in order to collect service addresses - in contrast,
much of the work on a service request occurs with the arrival of the last
digit of the service address, such as the translation of the address and
connection of the required service to the caller (Le. client/server
connection);
the interval of time (signalling duration) of the service originations and

the reception of the last service address digits, known as the 'fixed
feedback delay', is constant.
Figure 17.2 shows a block diagram of the switching system under consideration. The operation of the switching system is as follows. Inbound data
I[k] arrives at data process 1 (QI [k]). This process introduces a fixed time
delay d into the data stream. Once served (at the rate R t ), the data 0 1 [k]
is forwarded to data process 2 (Q2 [k]) where it is processed at the rate R 2
and subsequently passes out of the system (given by O2 [k]). Prioritization
is introduced into the system by serving data process 2 before data process
1. Operating on a fixed time cycle, the processor decides which data process
to perform based upon the quantity of data in each data process QI [k] and
Q2 [k]. If data is present at the second data process queue, then it will be
processed first. Within any given cycle, data in data process 1 will be served
only if data process 2 has been emptied. If both data processes are emptied,
the system waits until the beginning of the next time cycle. This represents
a non-pre-emptive schedule with data process 2 having the highest priority.
The data flow through the system may be studied using discrete time analysis
316
data process 2
queue
O,lk)
delay
d
outbound
data
inbound
data
Ilk)
data process 1
queue
Q,lk)
Fig. 17.2
Block diagram of the nonlinear data processor system.
(see Appendix A) and the following four coupled equations describe

completely the evolution of the system.
... (17.1)
min (Q2[kj +Odk-dJ, R 2)
dkj
min ( Qdkj +I[kJ,R, ( 1 -
02[k j ))
... (17.2)
R(
Qdk+l]=f ( Qdkj+I[kj + - (Q2[kj-Q2[k+lj
R2
+ Qdk-dj-Qdk-d+ I] +I[k-dj-R 2)
...
(17.3)
Q2[k+lj = f(Q2[kj +Qdk-dj-Qdk-d+lj +I[k-dj-R 2)

... (17.4)
where the function min(i ,J) is the minimum of the two real arguments i and
j and, for some argument x, the function f is:
x+lxl
f(x) = - - - which is an irreversible nonlinear function.
Equations (17.1) and (17.2) express the contention and priority of the
two data process queues whilst equations (17.3) and (17.4) are conservation
rules. Since equations (17.3) and (17.4) describe a nonlinear system with
feedback, an analytic treatment of the flow equations is not feasible.
However, it is possible to perform a certain amount of analysis by generating
the system's attractors for different service loading conditions using functional
iteration. For telecommunications applications, an understanding of the
NONLINEAR PROCESSING SYSTEM ARCHITECTURE
317
system's reaction to an input data stream that exhibits occasional peak values
is essential. In order to simplify the problem, a digital switching function
is considered with a peak value, which occurs for the duration of one
processor time cycle with amplitude I p at several instances of time. The
switching function can be described as:
f=Inorm+I[k-To] +~ I[k-Til - ~ I[k-Tj]
J
for i
2,4, 6, ... ,N and j
where I[k-T] =
Inorm
[ I peak > I max
1,3, 5, ... ,N-I
... (17.5)
k< T
... (17.6)
k?:.
and N is the total number of sample intervals, k is the time step (sample),
and Tj are the switching time variables, and I max is the maximum
steady-state capacity of the data processor.
Numerical simulations of this nonlinear data processor system model have
revealed four basic operational modes in which the system either returns to
its original state after a peak loading, enters into long-lived oscillatory
behaviour, degrades into chaotic states or results in unbounded (overloaded)
behaviour.
TO, Tj
17.3
The block diagram of Fig. 17.3 shows the architecture of the signal processing
system for studying the complex behaviour of the nonlinear array. The
personal computer (PC) provides the man-machine interface to the processing
system and executes a suite of host software which allows:
a given set of user parameter values to be downloaded to the DSP target

module, via a standard RS232 serial communications link;
data obtained from the array and processed by the DSP to be uploaded
to the PC for display in commercial graphics software packages.
In realizing the data acquisition and signal conditioning demands of an array

consisting of potentially thousands of elements, the processing system has
scope for the interconnection of several DSP modules in parallel, which can
collate in real time, large amounts of data emanating from the array and
initiate signal transformation and processing techniques using digital methods.
318
PC host computer
: serial communications
RS232 interface
output data rate
Texas Instruments
TMS320C25
data process 1 length
digital signal
data process 2
processor
length
target system
address bus
control bus
data bus
, ,
data
acquisition
interface
data
generator
process
rate 1
process rate 2
input data
data processor clock
Fig. 17.3
array of nonlinear
data processing
elements
System for studying complex behaviour in nonlinear processing systems.
The DSP target module provides a convenient interface for both the
generation and acquisition of data to and from the array respectively. The
user-defined array parameter values are assembled by the DSP for onward
transmission to the internal registers of the data generator hardware.
Conversely, data obtained from the array by the data acquisition interface
are received by the DSP for digital signal processing operations and returned
to the PC host. Signal transformation processing techniques (e.g. phase space
decomposition, Fast Fourier Transform) are used to analyse real time dynamical behaviour of the heterogeneous array.
The array has scope to emulate global information networks consisting
of thousands of elements. First results from the array, programmed initially
as cascades of double-process elements with simple embedded nonlinearity
[5-8] , have reproduced traffic phenomena which have recently been revealed
in teletraffic studies in real telecommunications networks. The array provides
a rapid and flexible method of exploring phenomena in diverse communica-
319
tions and computation systems. In particular, the reconfigurable array allows

the study of fault tolerance in the co-ordination and control of asynchronous
disordered networks of heterogeneous processors.
17.3.1
Processing system implementation
The processor system was realized in VLSI using field programmable gate
array devices with two intensive shift register fabrics representing the two
data process queues connected in cascade. The data processes provided storage
for the two data streams contending for execution by the data processor.
The flow of data between the data processes was controlled by a data direction
control logic unit in the form of a Moore algorithmic state machine.
A block diagram of the data processor element is shown in Fig. 17.4.
The data processor element represents the nonlinear data processor system
and consists of two data processes that are connected serially and their
operation controlled by the data processor controller. A peripheral device
which provides the source data to the system is also shown.
The data processor element operates as follows. When the Write] line
is asserted high, by the input data generator, data available on the Data line
is systematically written to the first data process. Both data processes are
identical in structure but differ in their interface connections. Whereas the
Data and Write] inputs on data process 1 are sourced by the data generator,
in the case of the second data process, these are provided by the Output line
of data process 1 and the Write2 line of the data processor controller.
The input clock to each of the data processes is provided by the data processor controller which takes one of two possible nominal rates. The first
rate Rload is used for loading data into the particular data process at a rate
determined by the master clock. The second rate R9I ff1oad for iE(1,2), is a
system parameter that determines the frequency at which data is removed
from each of the data processes. Furthermore, as the two data processes are
connected in series, a transfer of information between the first and second
data processes requires that the output rate of the first data process be
synchronized to the input data rate of the second data process. In either case,
such transfers are initiated by the data processor controller by means of the
Read], Read2 and Write2 signals, based upon the status of the output ready
lines - OPRdy] and OPRdy2 of the respective data processes.
17.3.1.1
Data processor controller
The data processor controller functions according to the data processor in

the nonlinear data processor system model and has two associated timers.
320
data processor
controller
Clk1
r--- read 1
r--- OPRdy1
Clk2
write 2
read2
OPRdy2
A-
master
clock
Y;
>.
:;;: 11l "0
<3
11l""
-0
~~ f-- data
-Ql
::Ie f-- write 1
e.G)
.!:
Ol
a:
0..
a:
0..
output
data process 1
Fig. 17.4
N N
>-"0
"0 11l
input
Ql
""
Ny
Ql N
=E=
;: ()
output 1--'output
data
data process 2
Data processor system implementation block diagram.
The first is responsible for the generation of the data processor fixed schedule
cycle and is self-sufficient, i.e. it generates the active low terminal count status
signal AckTimer at the appropriate moment and consequently asserts the
reset mechanism RstDe/ay.
The second timer is responsible for the generation of the status signal
AckDe/ay following a given number of elapsed processor time cycles. The
value of the counter variable can be changed as necessary to explore a wide
range of system performances.
Finite state machine description of the data processor controller

The functional specification of the data processing element is described as
a finite state machine using the state transition diagram given in Fig. 17.5.
At the beginning of a processor time cycle, given by StateO, the data processor
controller inspects the lengths of the two data processes. The higher priority
data process 2 is served initially by entering Statel, should it contain any
data. Furthermore, if data process 2 is serviced within a single processor cycle,
which is defined by the path StateO-Statel-StateO-State3, then data process
1 begins service within the same processor cycle. However, if data process
321
2 is not serviced within a single processor cycle, then it will continue to be

serviced immediately following the start of the next and subsequent (if
necessary) processor cycle(s), repetitively following the path StateO-Statel-
StateO.
OPRdy2
OPRdy1
Fig. 17.5
State transition diagram of the data processor controller.
If data process 2 is empty at the start of a processor cycle, but data process
1 contains data, then data process 1 will be served. If neither data process
contains data at the beginning of a processor cycle, then the system remains
idle until the next processor cycle, with the exception of inbound data entering
data process 1. This is defined by the path StateO-State2-StateO.
Data contained within data process 1 are processed by means of the data
processor controller initiating a fixed delay of duration d processor cycles
322
before the transfer of data from the first data process to the second data
process at a rate determined by the parameter R?ffload. Upon the expiration
of this delay the system remains idle until the end of the present processor
cycle, when a new cycle is initiated. This is achieved by traversing the
transitional path State3-State4-State5. Data contained within data process
2 at the beginning of a processor cycle are processed by removal at a rate
governed by Rfffload. As previously discussed, if this operation is performed
within a given processor cycle, then data process I resumes service. Otherwise
data process 2 continues to be served over a number of processor cycles if
necessary, until the entire contents of the data process have been removed.
Data processor controller equations
The state machine description is synthesised into a definition of the sequential

and combinatorial logic required to implement the transition and output
functions. Gray state assignment encoding, using the state variables p, q and
r as the present state variables for each of the six states (StateO to State5)
in the state transition diagram, was used to complete the algorithmic
description of the controller. The minimization of the resulting expressions
emanating from the associated state transition table, which are given by
Table Bl in Appendix B, lead to the following set of data processor controller
equations:
Read2 = Sel2 = P/\( - q)/\r
... (17.7)
Read] = Sell = Write2 = (- p)/\q/\( - r)
... (17.8)
Nextp = [( - AckTimer)/\p/\( - q)/\( - r)]

V [( -
OPRdy 1)/\( - p)/\( - q)/\( - r)]
[OPRdy2/\p/\( - q)/\r]
[OPRdy2/\( - p)/\( - q)/\( - r)]
... (17.9)
Nextq = [OPRdy] /\( - p)/\q] V [( - p)/\r]
... (17.10)
Nextr = [OPRdy2/\p/\( -q)/\r]

V [( V
AckDelay)/\( - p)/\r]
[OPRdy2/\( - p)/\( - q)/\( - r)]
V [OPRdy] /\( - p)/\( - q)]
v [( -p)/\( -q)/\r]
RstDelay = [( - p)/\( - q)/\r]
... (17.11)
... (17.12)
NONLINEAR PROCESSING SYSTEM ARCHITECTURE 323
17.3.1.2
The data process
The data processor model previously described is representative of a timeshared multi-process class of system that adopts a first-come first-served nonpre-emptive discipline, and therefore requires intermediary storage for the
processes that await the attention of the processor. The use of two intensive
shift register components naturally preserves the sequence of the arriving data
and permits the storage of any outstanding data that awaits execution by
the data processor.
The structure of each of the data processes is such that, at each data bit
position x, there exists a data register R x and an associated controller Con x '
Each register has three operating modes hold, load and shift with the
exception of the first stage which has only two (load and hold):
the hold mode maintains the contents of the associated data register bit
position for the duration of the current clock cycle;
the load mode is responsible for loading the associated data register bit
with the data value available at the immediate left data register output;
the shift mode is responsible for invoking a right shift operation on the
corresponding data register bit position.
These modes are derived for all stages (apart from the first stage) from the
corresponding controller outputs ~ and Ml (where x represents the
particular stage), which form the control inputs to a 3-to-l multiplexer at
the data register input. In the case of the first stage, a single multiplexer
control input M n_ 1 is used to define the two operating states load and hold.
Each multiplexer has an additional output enable En control input to ensure
data integrity.
Each controller utilises two states full and empty, which indicate the status
of the corresponding data register. There are three stages of controller element
- last, ith and first. The ith stage controller is a generic stage, and the
remaining two stages are its derivatives, with the constraint that the last stage
(controller) requires the shift output function, whereas the first stage does not.
Table 17.1 shows the input, output and register operating mode
combinations that apply to the first, ith and last stage controllers respectively Conn _ 1, Coni and Cono
A state transition diagram for the generic ith stage controller is shown
in Fig. 17.6, from which the set of state transition tables (given in Appendix
B) and resulting implementable set of equations relating to the first and last
stage controllers can be deduced.
324
Table 17.1
Data process controller input, output and register operating mode combinations.
First stage
controller
(Conn_i)
Inputs
Read, Write, R n -
ith stage
controller
(Coni)
2
Read, Write, R j _
Rj + 1
Last stage
controller
(Cono)
I'
Read, Write, R 1
Outputs
Rn_ 1
Rj
Output ready, R o
Register
operating
modes
Hold, Load
Hold, Load, Shift
Hold, Load, Shift
write' + write read +

write read' R'i-1
write read' i_1/1oad
read R'i+1 write'
read'/hold + read Ri+1/shift +

read R'i+1 write/load
Fig. 17.6
State transition diagram of the ith stage data process controller.
Data process controller equations

The state machine descriptions of the first, ith, and last stage data process
controllers are subsequently synthesized into a definition of the sequential
and combinatorial logic required to implement the transition and output
functions. Reduction of the resulting expressions leads to the following set
of data process controller equations.
The equations, derived from Table B2 (see Appendix B), for the ith stage
of the data process controller are:
NextR j = [Write/\( - Read)/\R j _ 1] v [ Write/\R j ] v [( - Read)/\R j ]

v[Rj+I/\R;]
... (17.13)
325
M? = Read/\R j + ,
... (17.14)
MI
= (- Read)/\R j
... (17.15)
Enj
[(-
Write)/\(-R j)] v [Read/\(-R j)] v [(-Rj_,)/\(-R j ) ]

Write)/\Read/\( - R j + ,)]
... (17.16)
v [( -
The equations, derived from Table B3 (see Appendix B), for the/irst stage
of the data process controller are:
NextR n _ 1 = [Write/\( - Read)/\R n _ 2 ] v [Write/\R n _ 1 ] v [( - Read)
/\R n _ d
... (17.17)
M~_I
( - Read)/\R n -
... (17.18)
[( - Write)/\( - R n _ I)] V [Read/\( - R n _ I)] v

[( - R n - 2)/\( - R n - ,)] v [( - Write)/\Read]
... (17.19)
The equations, derived from Table B4 (see Appendix B), for the last stage
data process controller are:
NextR o = [Write/\( - Read)]
V[R1/\R o]
M8
v [ Write/\R o]v [( -
Read)/\R o]
... (17.20)
Readl\R(
... (17.21)
MJ = [( - Read)/\R o]
... (17.22)
Eno
[( - Write)/\( - R o)] V [Read/\( - R o)] V [( - Write)
I\Read/\( -R 1)]
...
(17.23)
Finally, the equation for output ready is:

... (17.24)
OPRdy = R o
17.4
It has been shown that the data processor system will return to a transparent
state following a momentary overload of arbitrary size in the rate of the input
data stream, as long as the normal input data rate is less than half of the
frequency of the second data process rate (i.e. I norm < ~) [7]. An input
326
data rate below this critical value leads to a stable mode of operation, whereas
a rate which exceeds the critical value results in an unstable mode of operation
which can lead to chaotic and unbounded behaviour. Through the use of
four differing sets of input stimuli, a single data processor was driven into
stable, unstable, chaotic and unbounded modes of operation. The first set
of input data satisfied the critical value condition and enabled the processor
to return to its transparent state following a series of excessive input
perturbations. However, the second, third and fourth sets of input stimuli
were chosen so that they exceeded the critical value and, following a series
of momentary overloads in the input data rate, resulted in sustained
oscillations and chaotic and boundless behaviours respectively in the length
of the low priority data process. The high priority data process on the other
hand, exhibited long-lived oscillations and chaotic states in response to these
stimuli, whilst the output data rate developed 'hard-clipped' oscillations with
a variable degree of regularity.
17.4.1
Numerical study of the nonlinear system behaviours
The switching time variables 7j in the switching function given by equation

(17.5) were 70= 10,71 = 11,72=20,73=21,74=30 and 75=31. Thus, the
input remains constant for 10 processor cycles, which is given by the value
of [norm' peaks for the entire duration of the proceeding cycle, given by the
value of [peak when it assumes the normal input rate for a further 9 cycles
prior to a single peak, after which it assumes the normal input rate for another
9 cycles and a final peak value in the subsequent cycle prior to resuming the
normal input rate for the duration of the simulation. Data process rates of
R 1= I and R 2= 0.5, and a fixed delay d = 3 were used for all cases studied
in this chapter (examples using different parameters have been described
elsewhere [7, 8]).
Figure 17.7 shows the numerical simulated phase spatial evolution of the
low priority data process queue length when the rates [norm E[0.125,0.25,
0.332,0.5) are successively used in conjunction with the above input data rate
profile to stimulate the data processor. It is shown that the application of
an input profile adopting the rate [norm = 0.125 results in a rapidly
diminishing transient response, for which the data process queue length is
empty after approximately 45 schedule cycles. This is indicated by the
smoothness of the base of the phase portrait, i.e. in the region near the origin
in Fig. 17.7(a) (the construction of the phase portrait is described in Appendix
C). In contrast, a normal input of [norm = 0.25 results in an initial transient,
which after approximately 90 cycles leads to long-lived oscillatory behaviour
which is denoted by the oscillatory nature of the base of the phase portrait
in Fig. 17.7(b). This behaviour is superseded by the onset of chaotic states
z,., z,.." z,..2 =Q,[9i+i+6)
z,., z,.." z,..2 =Q,[9i+j+6)
1.2
1.0
0.8
1.2
1.0
0.8
0.6
0.6
0.4
0.4
0.2
1.2
O'%~~mlm
1.0
~~~~~~~:0.60.8
I
y , y .,' y
327
.2
= Q,[9i+i+3 )
0.2
x , X,." )(,..2 =Q,[9i+iJ
0.4
(a)
2.5
':
."
..
2.0
2.5
....:......
- .... ": ..... '
""";
1.5
. . . . .: ....
-:,
... '.
1.0
2.0
....
"
1.5
....
.......
1.0
0.5
0.5
o o
(b)
Fig. 17.7
Graphs showing the calculated phase space surface trajectory of the low priority
data process queue length for the cases of (a) [norm = 0.125 and [peak = 2.0, (b) [norm = 0.25 and
[peak = 2.0, (c) [norm = 0.332 and [peak = 2.0, and (d) [norm = 0.5 and [peak = 2.0. For clarity,
representative axes are labelled in (a) only. In (a) the system returns to a steady or transparent
state following the initial transient behaviour. The transient behaviour is indicated by the eight
peaks and the steady-state behaviour corresponds to the remaining flat portion of the phase
portrait. In (b) the system does not return to a transparent state, but exhibits long-lived oscillations.
In (c) the length of the data process queue varies in an apparently random fashion and in (d)
the data process grows without bound. The construction of the phase portraits is described
in Appendix C.
328
o 0
( )
1.80E3
1.60E3
1.40E3
1.20E3
1.00E3
8.00E2
6.00E2
4.00E2
2.00E2
. . ....
. '.
,"
..
'.',
1.20E3
1.00E3
8.00E2
6.00E2
4.00E2
2.00E2
-,
O.OOEO
O.OOEO
1.40E3
1.40E3
1.00E3
1.00E3
6.00E2
6.00E2
2.00E2
2.00E2
2.00E2
(d)
Fig. 17.7
(Contd).
in which data is queued in an apparently random fashion when the system

is driven with an input data rate of I norm = 0.332. This is illustrated by the
ad hoc structure of the centre portion of the phase portrait in Fig. I7.7(c).
Further driving the system with a rate of [norm = 0.5 results in the unbounded
329
growth of the process queue length which is clearly visible in Fig. 17.7(d),
in that the phase portrait grows away from the origin.
Figure 17.8 shows the numerically simulated phase spatial evolution of
the high priority data process queue length when the rates [norm E[0.125,0.25,
0.332,0.5J are successively used in conjunction with the above input data
rate profile to stimulate the data processor. It is shown that the application
of an input profile adopting the rate [norm = 0.125 results in a rapidly
diminishing transient response, for which the data process queue length is
empty (indicated by the smoothness of the base of the phase portrait in
Fig. 17.8(a)) after approximately 42 schedule cycles. In contrast, a normal
input of [norm = 0.25 results in an initial transient, which after approximately
100 cycles leads to long-lived oscillatory behaviour which is reflected by the
irregular structure in the phase portrait in Fig. 17 .8(b). This behaviour is
superseded as the data process degrades into chaotic states in which the queue
lengths vary in an irregular manner when the system is driven with an input
data rate of [norm = 0.332. This is illustrated by the rich dynamical structure
of the centre forward portion of the phase portrait in Fig. 17.8(c). As the
frequency of the service requests increases still further, the high priority
process is granted the processor schedule; meanwhile the low priority process
queue continuously grows. Once the low priority process regains an execution
slot, the effect of the fixed delay is such that no data is transferred between
processes and the high priority process becomes idle. This effect is
compounded with time and explains why under conditions of input
information overload, the low priority process queue grows without bound,
whilst the high priority process contains little data. Assuming an input data
rate of [norm = 0.5, Fig. 17.8(d) shows that the high priority data process
queue remains relatively inactive when the rate of data arrivals exceeds a
critical value; the size of the fixed delay ensures that few jobs accumulate
in the higher priority queue.
Figure 17.9 shows how the output data rate varies against elapsed
processor cycles. In each case, the output rate is equal to the input rate prior
to the cycle in which the system was perturbed by the first single cycle of
peak rate in the input data stream. The output data rate subsequently begins
to oscillate between the assigned value of the high priority data process rate
and zero. However, whereas the system that was perturbed with the 'undercapacity' data stream quickly returns to stability (since the data processor
is able to satisfy the computational demands of the input data stream), the
'overloading' data stream, that violates the critical value of input data rate,
results in continued oscillatory behaviour. Driving the system further into
either the chaotic regime or unbounded mode results in similar sustained
oscillatory behavour.
330
.... -: ..
.......
:'
.. "
'
-'
0.25
. ,.
-.- ..
....
'
(a)
. -: ... " ....
0.50
0.45
0.40
0.35
0.30
0.25
0.20
0.15
0.10
0.05
.. ,-
..... >: : . : : : .
..
.....
,.
":
.:j~ :;::~: ::;: :.~
0.50
0.45
0.40
0.35
0.30
0.25
0.20
0.15
0.10
0.05
,.... '.
...... '
"
'
'
1.4
0.8
1.2
1.0
0.6
0.8
(b)
Fig. 17.8 Graphs showing the calculated phase space surface trajectory of the high-priority
data process queue length for the cases of (a) lnorm=0.125 and lpeak=2.0, (b) lnorm=0.25 and
lpeak = 2.0, (c) [norm = 0.332 and [peak = 2.0 and (d) [norm = 0.5 and [peak = 2.0. The representative
labelling of the axes is given in Fig. 17.7(a) and described in Appendix C. In (a) the system
returns to a steady or transparent state following the initial transient behaviour. The transient
behaviour is indicated by the two peaks and the steady-state behaviour corresponds to the
remaining flat portion of the phase portrait. In (b) the system does not return to a transparent
state, but exhibits long-lived oscillations. In (c) the length of the data process queue varies in
an apparently random manner and in (d) the data process remains virtually empty throughout
the system's execution.
RESULTS
AND DISC
USSION
0
. .... .-,......-::.........,... ..... .
1.4
1.2
1.0
0.8
0.6
0.4
0.2
........ "" ....
... ::: ::. ::'~ : ~ ::~

:
....
1.4~~~~3
1.2
1.0
0.8
..
:::'
.....
.0
1.4
1.2
1.0
0.8
0 .6
0.4
~~Ng2
1.4
1.2
1.!l
0 .6
0 .6
0 .4
0.2
c)
0.5
d)
fi g . 11.8
lco ntd).
0.8
331
332
0.55
0.50
0.45
0.40
Q)
0.35
(tj
0.30
L-
:J 0.25
0..
:J 0.20
0
L-
0.15
L-
0.10
L-
Cll
"0
0.05 L0
I,
20
40
60
80
100
data processor cycle number
(a)
0.550.50 L0.45 L0.40 IQ)
~ 0.35 LCll
(tj 0.30 L-
"0
:J 0.25 I - 0..
:J 0.20 I0
0.15 L0.10~
0.05 L0
20
40
60
80
100
120
140
160
180
200
data processor cycle number
(b)
Fig. 17.9
Graphs showing the calculated variation of the output data rates against discrete
processor cycles for the cases of (a) lnorm=0.125 and lpeak=2.0 and (b) lnorm=0.25 and
lpeak = 2.0. In (a) the system returns to a steady or transparent state following the initial transient
behaviour. In (b) the system does not return to a transparent state, but exhibits long-lived
oscillations.
17.4.2
333
Nonlinear data processor system behaviours
The first test application of the array has been the study of the processor
batching behaviours which frequently emerge in teletraffic systems. The
nonlinear processing system was driven into the stable, unstable and chaotic
modes of operation using different sets of input stimuli, each having a profile
which remained constant with a magnitude [norm prior to a peak value of
magnitude [peak which occurred for one processor cycle, after which a
constant rate [norm was established for the remaining cycles. The DSP was
used to calculate the phase portraits (see Appendix C) and frequency spectra
of the dynamical first data process queue, in real time. The phase portraits
represent the system dynamics projected on to a two-dimensional plane where
the first point of the original time series is represented by the phase space
ordinate and the third point of the time series, the phase space abscissa. The
second point of the time series is represented by the ordinate of the second
point in phase space. The connection of the points results in a trajectory which
enables the state dynamics of the nonlinear processing system to be visualized.
Figure 17.10 shows the resulting phase portrait when the system is operated
in a stable mode. A specific set of system parameters for the stable mode of
operation were chosen: R?ffload = 1, R~ff1oad = 0.5, [norm = 0.125, [peak = 2.0
3.3
2.8
2.3
N
+
'+
ci
i.
II
1.8
1.3
:oE 0.8
~
0.3
-0.2
-0.2
0.3
0.8
1.3
1.8
2.3
2.8
3.3
X"X,., = Q,[4i+j]
Fig. 17.10
Phase portrait of the nonlinear data processor system when operated in a stable
mode.
334
and d = 3. The reader is referred to Burton and Gell [7) where more detailed
discussion of the system behaviours is given. From the transparent state in
which the length of the first data process queue length is zero (given by the
point (0,0 the trajectory visits a series of points in phase space prior to
entering the basin of attraction of the fixed point (0,0). Thus, the system
returns to its transparent state following the transient response to an input
perturbation.
Figure 17.11 shows the phase portrait of the data processor system when
operated in an unstable mode and the following parameters were assumed:
Rfmoad= I, Rf moad =0.5, I norm =0.25, I peak =2.0 and d=3. Starting from the
transparent state given by the point (0,0), the system exhibits transient
behaviour about the input perturbation and subsequently adopts a limit cycle
bounded by the points (2,1)-(1,1)-(0,1)-(1,1)-(1,2). Figure 17.12 shows the
phase portrait of the data processor system when operated in the chaotic
mode. Starting from the transparent state given by the point (0,0), the system
quickly adopts the form of a strange attractor - a complicated geometric
form [10, II).
Figure 17.13 shows the power spectrum of the low-priority data process
queue length Q, [k) when the nonlinear data processor system was operated
3.3
2.8
N
+
2.3
'+
i. 1.8
cf
II
. 1.3
0.8
0.3
-0.2
-0.2
Fig. 17.11
0.3
0.8
1.3
1.8
2.3
2.8
3.3
Phase portrait of the nonlinear data processor system when operated

in an unstable mode.
335
2.5
2.0
C\j'
'+
i.
ci
1.5
1.0
II
,;
>: 0.5
0
-0.5
-0.5
0.5
1.0
X"X,+,
Fig. 17.12
1.5
2.0
2.5
= Q,[4i+i]
Phase portrait of the nonlinear data processor system when operated

in a chaotic mode.
in an unstable mode. This power spectrum shows the frequencies at which

different queue lengths are observed. The system exhibits strong peaks at
the frequencies DC, 0.12, 0.25, 0.37 and 0.5 Hz; there are no peaks at higher
104
103
ci
102
r:ii
.Q
10'
10
10-' L..-_.L-_..L-_-'-_....L.._--L_......L_--1_---JL......_.L-----I
o 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50
frequency, f
Fig. 17.13
Power spectrum of the nonlinear data processor system when operated in an

unstable mode.
336
frequencies. The DC component (at 0 Hz) corresponds to an empty queue

which is the most frequently observed state of this system given the parameters
with which Fig. 17.13 was obtained. The form of this spectrum indicates that
the queue length adopts a limit cycle of period-5, in that the power spectrum
is dominated by five frequencies. Figure 17.14 shows the power spectrum
of the nonlinear data processor system when operated in the chaotic mode.
The spectrum is broadband and contains substantial power at low frequencies.
In spite of chaotic behaviour the power spectrum contains residue of the
dominant frequencies governing the unstable mode. This residue is sensitive
to system parameters.
10'
cr
o
Ol
.Q
102
10'
10
0.05
0.10 0.15
0.20
0.25 0.30
0.35 0.40 0.45
0.50
frequency, f
Fig. 17.14
Power spectrum of the nonlinear processor system when operated in a chaotic

mode.
17.5
CONCLUSIONS
The implementation of a novel architecture for studying the complex

behaviour in a nonlinear array of service processing elements which are based
on a mathematical model of a generalized switching system has been
described. The architecture provides an inexpensive platform for investigating
CONCLUSIONS
337
signalling in heterogeneous environments and has been used in the empirical

validation of the model of a nonlinear service processing system.
The nonlinear processing system confirmed the existence of four distinct
behavioural modes - stable, unstable, chaotic and unbounded. In the stable
mode the length of the first data process rate returned to normal values
following an initial period of transient behaviour, in response to a
perturbation in the input data stream. In the unstable mode the length of
the first data process queue underwent a sustained oscillation; cycling through
a fixed set of queue lengths. In the chaotic mode the length of the first data
process queue varied in an irregular manner and in the unbounded mode
the first data process queue length grew without bound at an exponential
rate. All these behaviours bring significant degradation in the overall
performance of the processing system and reduce the system's capability to
batch its workload.
The results of a series of numerical studies [7] and the empirical results
obtained from the nonlinear processing system were complementary showing
good qualitative agreement in the phase portraits and power spectrum of the
low priority data process queue length and temporal evolution of the output
data rate and data process queueing delay profiles. The simple serviceprocessing element was shown to exhibit a remarkably complicated range
of dynamics, illustrative of the behaviours expected in more complex
communications service systems.
These results have important implications for interworking between more
complex communications service systems, a topic which wili be studied further
using developments of the array. Further studies will focus particularly on
integration issues, stability criteria and fault tolerance in co-ordination and
control of asynchronous heterogeneous networks. Such issues will become
increasingly relevant as communications systems become more prone to
outage as a result of increasing complexity. The availability of this new
research tool will enable experimental simulation and exploration [12] of
a wide range of communications systems to be performed without having
to build large suites of software to assess (restrictive aspects of) system
performance, much of which is not tractable analytically. In addition to
allowing complexity and network outage problems to be explored in realistically massive networks, the array also provides a stimulus to developing
and constructing visualization tools for probing, analysing and managing
the complex dynamics.
338
APPENDIX A
Nonlinear data processor system discrete time analysis
With reference to Fig. 17.2, the data flow through the system may be studied
using discrete time analysis. The data process queue lengths at a sampling
interval k+ 1 are given by the queue length in the previous interval k plus the
net flow in that interval:
Qdk+l]
= Qdk]
+ I[k] - 0dk]
... (AI7.!)
and
... (AI7.2)
The output of the second process can be one of two possible values. The
maximum rate at which the second data process queue can be served (R 2)
or if there is a smaller amount of data on the second data process queue then
the output of the second data process will be the contents of the process during
the previous processor time cycle, Q2[k] +0 1 [k-d]. The output of the first
process is similarly constrained with the additional provision that only the
residual capacity after the second data process is served is available to serve
the first data process. That is, the maximum amount of data that can be dealt
with by the first process is the rate of the first process, R 1, multiplied by the
fraction of the time cycle remaining once the second process has finished.
The time taken by the second process is given by:
2[k]
... (AI7.3)
The maximum output from the first process is therefore:
R,
2[k] )
1---
R2
... (Al7.4)
If there is less than this amount of data in the first process queue then
the output of the first process will be the contents of the process during the
previous time cycle, QI [k] +/[k]:
min (Q2 [k] + 0, [k-d], R 2)
... (AI7.5)
min ( Qdk] +/[kJ, R 1 ( 1- 02R[2k] ) )
... (AI7.6)
APPENDIX A
339
The number of equations may be reduced by elimination of variables.

Equation (Al7.1) may be written in terms of 0 1 [k]:
0dk]
= Qdk] - Qdk+l] + I[k]
... (AI7.7)
and substituting for k = k - d results in:

Odk-d]
Qdk-d] - Qdk-d+l] + I[k-d]
... (AI7.8)
Similarly equation (AI7.2) may be written in terms of O2[k] :

... (AI7.9)
Replacing the 0 1 [k-d] term in equation (AI7.5) by equation (AI7.8) gives:

02[k] =min (Q2[k] +Qdk-d]-Qdk-d+l] +I[k-dJ, R 2)
... (AI7.10)
and noting that:
R1
1-
2[k]
) is equivalent to
term in equation (AI7.6) may be replaced by equation (AI7.10) to give:

R
0 1 [k] = min (Qdk] +I[kJ, ~ (R 2 -min(Q2[k] +Qdk-d]
R2
- Qdk-d+l] +I[k-dJ,R 2
... (AI7.11)
This expression may be simplified using the following irreversible nonlinear

function:
x+lxl
j(x) = - - and the identity j(a-b)=a-min(a,b).
2
In this particular case, a=R 2, and b=Q2[k] +Qdk-d]-Qdk-d+l]

+I[k-d] and so:
0dk] =min( Qdk] +I[k], R
R2
j(R 2-Q2[k]-Qdk-d]
+ Qdk-d+l]-I[k-d]))
... (AI7.12)
340
Substituting equation (AI7.6) for the 0, [k) term in equation (Al7.1) results
in:
Qdk+l] =Qdk] +I[k)-min ( Qdk] +I[kJ,R,( 1-
2[k)
R;- ))
... (AI7.13)
Qdk+l]=Qdk)+I[k]-min (Qdk)+I[kJ, R: (R 2-02 [k]))

... (AI7.14)
Furthermore, substituting equation (AI7.8) for the 0 1 [k-d) term in
equation (AI7.9) gives:
02[k] =Q2 [k] -Q2[k+ 1) +QI [k-d] -QI [k-d+ 1] +l[k-d]
... (AI7.15)
and substituting the above O2 [k] expression for the same term in equation
(AI7.14) yields:
Qdk+l]=
Q.[k] +l[k]
-min
QI [k] +I[kJ, -
R.
(R 2-(Q2[k]
R2
-Q2[k+l) +Qdk-d] -Qdk-d+l] +I[k-d))) )

... (AI7.16)
Finally, using the identity f(a-b)=a-min(a,b), where a=QI [k] +l[k) and
R.
R2
b = - (R 2-(Q2[k] -Q2[k+l] +Qdk-d] -Qdk-d+l] +l[k-d)))

gives:
Qdk+l] =f (QI [k] +I[k] +2(Q2[k] -Q2[k+l] +QI [k-d]
R2
-Qdk-d+l] +I[k-d] -R 2
... (AI7.17)
Substituting equation (AI7.8) for the 0 1 [k-d] term in equation (Al7.2)

results in:
APPENDIX A
341
Q2[k+l] =Q2[k] +QI [k-d] -QI [k-d+l] +/[k-d] -02[k] ... (AI7.18)
and substitution of equation (AI7.1O) for the O 2 [k] term in equation

(AI7.18) above gives:
Q2[k] +QI [k-d] -QI [k-d+l] +/[k-d]
-min (Q2[k] +QI [k-d] -QI [k-d+l] +/[k-d] ,R 2)

... (AI7.19)
which is of the form !(a-b)=a-min(a,b), where:
a=Q2[k] +Q, [k-d] -QI [k-d+l] +/[k-d]
and:
Hence:
Q2[k+l] =!(Q2[k] +QI [k-d] -Q, [k-d+l] +/[k-d] -R 2 )
...
(A17.20)
342
APPENDIX B
State transition tables
The state transition diagrams for both the data processor controller and the
data processes can be synthesized into an intermediate tabular form, known
as a state transition table, and used to determine the governing equations.
Table 81
State transition table for the data processor controller showing the inputs OPRdyJ,
OPRdy2, AckTimer and AckDelay, the present state variables p, q and r, the next state variables
Nextp, Nextq and Nextr, and the outputs Read], Read2, Sell, Se12, Write2 and RstDelay.
OPRdyl OPRdy2 AckTimer AckDelay P
0
0
0
-
0
I
0
0
0
0
0
0
0
0
I
I
I
I
0
0
0
0
0
0
0
0
0
0
0
I
I
I
I
NeXlp Nexlq Nextr Readl
I
I
0
0
1
1
I
0
0
0
0
0
0
0
0
0
I
I
0
I
0
I
0
0
0
0
0
0
I
I
I
I
0
0
0
0
0
0
0
0
0
0
0
I
I
0
0
0
1
1
0
0
0
1
I
Read2 Sell Sel2 Wrile2 RS1Deiay

0
0
0
I
I
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
I
I
0
0
0
0
0
0
0
1
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
I
0
0
0
0
I
I
Table 82
State transition table for the ith stage data process controller showing the inputs
Write, Read, R j _ 1 and R j + I , present and next states R j and NextR j , and the output control words
Enj, Mp and Mjl which correspond respectively to the three operating modes
Hold, Load, and Shift.
Write
Read
R j _1
Ri + 1
Rj
NextR i
En j
MDI
M;'
X
X
X
X
X
0
0
0
0
0
0
0
I
I
I
X
X
0
I
I
0
0
I
I
I
I
0
0
0
0
X
X
I
0
0
0
I
I
I
X
X
X
X
0
0
I
I
I
I
Table 83
0
I
0
0
State transition table for the first stage data process controller showing the inputs
Write, Read and R n - 2, present and next states R n _ 1 and NextR n _ I , and the output control words
Enn_1 and M~_I which correspond respectively to the two operating modes Hold and Load.
Write
Read
Rn- 2
I
I
I
X
X
0
X
I
0
0
0
I
I
X
X
X
R n _ 1 NextR n _ 1 En n _ 1 MR-I
0
0
0
0
I
I
I
0
0
0
I
I
I
I
I
I
0
0
0
APPENDIX C
343
Table 84
State transition table for the last stage data process controller showing the inputs
Write, Read and R ,. present and next states R o and NextR o, and the output control words Eno.
M8 and M6 which correspond respectively to the three operating modes Hold, Load. and Shift.
Write
Read
R,
Ro
NextR o
Eno
0
I
I
X
X
X
I
0
0
I
X
X
X
X
0
0
0
I
I
1
1
0
0
I
I
I
1
0
I
I
0
0
0
0
1
0
0
MoO MOl
-
0
0
0
0
APPENDIX C
Phase-space representation of time series
The two-dimensional phase portraits presented in this chapter are projected
on to a two-dimensional plane according to the following rule
Xk>Xk+1 =QI [4i+j]
... (CI7.l)
Yk>Yk+1 =QI [4i+ )+2]

N
for i=0,1,2, ... ,N, )=0,1 and k=0,2,4, ... , -2 -1 where x" 1
y. and x1+'
I y. 1 are
,+
the phase-space co-ordinates, QI [.] is the time series describing the length
of the low-priority data process queue, and N is the length of the time series.
The three-dimensional phase portraits show the phase trajectory plotted as
a surface and projected on to a two-dimensional plane according to the
following rule:
Xk>Xk+I>Xk+2 = QI [9i+)]
Yk>Yk+I>Yk+2=QI [9i+)+3]
Zk>Zk+I>Zk+2 = QI [9i+ )+6]
where the nomenclature of the two-dimensional case applies.
... (CI7.2)
344
REFERENCES
I.
Gell M A: 'Self-organisation and market-based systems in telecommunications',

in Docampo D and Figuerras Vidal A (Eds): 'Proceedings of the COST 229
workshop on adaptive systems in telecommunications', Vigo-Bayonne, Spain (June
1993).
2.
Olafsson Sand Gell M A: 'Application of an evolutionary model to telecommunication services', European Transactions on Telecommunications, 4 , No I,
pp 69-75 (1993).
3.
Lindstrom A H: 'Free trade symposium lights competitive fires', Telephony (8

April 1992).
4.

Telematics and Informatics, 10, No 2, pp 131-139 (1993).
5.
Erramilli A and Forys L J: 'Oscillations and chaos in a flow model of a switching

system', IEEE Journal on Sel Areas in Communications, 9, No 2,
pp 171-178 (1991).
-
6.
Erramilli A and Forys L J: 'Traffic synchronization effects in teletraffic systems',

Proceedings of 13th lTC, pp 201-206, Copenhagen, Denmark (1991).
7.
Burton F and Gell M A: 'Data flow through a double process system', European
Transactions on Telecommunications, -.!, No 2, pp 221-230 (1993).
8.
Pointon C T, Carrasco R A and Gell M A: 'Novel architecture for studying complex

behaviour in nonlinear processing systems', Proc of EUSIPCO-94, 3,
pp 1305-1308, Edinburgh, Scotland (1994).
-
9.
Pointon C T, Carrasco R A and Gell M A: 'The implementation of a large array

of nonlinear data processor elements using FPGAs', Proc of ASIC-94, pp 265-268,
New York, USA (1994).
10. Cvitanovic P: 'Universality in chaos', second edition, Adam Hilger, Bristol (1989).
II. Bai-Lin H: 'Elementary symbolic dynamics and chaos in dissipative systems',
World Scientific Publishing Co, Singapore (1989).
12. Mees A and Sparrow C: 'Some tools for analyzing chaos', Proc of the IEEE,
75 , No 8, pp 1058-1070 (1987).
Index
Abscissa, phase space 333
Access, see Network, Switching
Activation function 69, 72
Add-drop 185, 186
multiplexer, see ADM
Adjali I 45
ADM 96
Agent 61
autonomy 247
behaviour 261
box 53
central control 261
co-operation 58
distributed control 261
distribution 62
inter-agent
comunication 248
load management 250, 254-256,
258, 260
mobile 246, 248-249, 262-263
parent 253, 255-258, 260
performance 259
randomly distributed 53
robustness 256-257, 261
self-organizing 247, 255, 262
uniformly distributed 56, 59
see also Software
AI 9
distributed 248
AIS 208, 215
Alarm 125
Alarm insert signal, see AIS
Algorithm
Bellman-Ford 116,117
Benders' decomposition 97
bucket brigade 237
complexity 117
compression 204
convergence 118
Dijkstra's 116
distributed circuit
assignment 139
distributed control 262
distributed network 129
distributed restoration, see DRA
dynamic programming 117
embodied 117
Floyd-Warshall 116
general 235
genetic 227-228, 230, 232-233,
242-243
graph theoretical 109, 113
greedy 115, 116
heuristic 130, 137
k-means 38-39, 41
Kruskal 115
maximal flow 109, 117
maze-running 236
message flow 109
minimum cost-flow 117
Moore state machine 319
Munkres 67, 80
non-simplex 115
optimal route 251
optimization 121
parallel distributed 118
346
INDEX
Prim 117
recursion 117
restoration 137
route-finding 138, 139
self-copy 235
shortest-path 117
simplex 113, 121
span failure 135
stochastic decomposition 97
Amin S J 153
Amino-acid 51, 231
Amplification
optical 9,172,175,176,177,199
Analysis
equilibrium 48
market-oriented 48
marginal 48
performance 86
probabilistic 104
sensitivity 23
stochastic 144
Annealing
hysteretic 67
simulated 137
ANT 5
Anthropomorphize 236
Appleby S 22, 245
Approximation
distribution 57
heavy-traffic diffusion 89
linear noise 50
macro fluid 90
meso diffusion 90
time-dependent 63
time-independent 63
Van Kampen 54, 56, 62
APS 125
Arc 103-104
incoming 104, 107
minimal total 112
outgoing 104, 107
Architecture
meshed 137
network 138, 336
ring 120
seven-level OSI 86
subsumption 248-249, 262
switching 1, 107
time and space bottlenecks
in 107
Arpanet 112, 118
Array
nonlinear 317, 336
see also VLSI
Artificial intelligence, see AI
'Artificial Life' 224,233, 236, 239
simulation 232
systems 232
Assignment
capacity 112
capacity and flow 112
flow 112
probabilistic 290
see also Resource
Asynchronous transfer mode,
see ATM
ATM 4,49,97, 124, 144, 151, 153
network control 153
photonic 4
Autocorrelation 204, 205
Autocovariance 150
Automatic protection switching,
see APS
Autonomous network telepher,
see ANT
Backplane 4
Bandwidth 1,8,9,96,98,144,147
effective 149-151
limitation 197
limitless 197, 199
linear programme 100
management 145
transparency 197
unlimited 175
INDEX
Battery
central 190
Beggs S L 124
Behaviour 236
antiparasitism 53
chaotic 326, 336
colony 232
co-operative 236
emergent 225
flock 53
formation 51
oscillatory 317, 329
parasitism 51, 232
predator 232
social 53, 232
task-oriented 236
tit-for-tat 62, 63
unbounded 326
virsus 232
see also Market, Competence
Bellman-Ford, see Algorithm
Bell System 95
Benchmarking 172
BER 169, 201, 216
long-term 201
Billing 3, 9
Binary symbol 227
Biological
ants 237
competition 62
crossover 226-229, 237
evolution 224
message 237
mutation 226, 227
pay-off model 52-53
phenomena 224-225
sex ratio 265
stratification 51
systems 12, 51, 53, 61
techniques 239
Bit error analysis 200
Bit error ratio, see BER
347
Blocking
burst 99
call 99
Erlang 99
Botham C P 124
Breakage, see Maintenance
Breeding 232
Bridge 108
Broadband, see ISDN
Broadcast 3, 120
Brown G N 124
Brownian motion 25, 87, 94
local time process 91
reflected 89, 92
Buffer 87, 144, 146, 147
finite node 100
non-blocking 120
overflow 145, 146, 147, 149
queues 109
storage memory 119
Building block 224
Burst error 194
Business systems
automated 48
Butler R A 200
Bytes
frame overhead 126, 136
Cable
break 124, 199
coaxial 168, 172, 183, 188,
190, 192
direct bury 172, 188
failure
accidental 188
corrosion 168, L73, 188, 197
digger 173
dominant mechanism 183
moisture 168, 173
multiple 132
spade 173
statistical independence 170
windage 188
348
INDEX
fibre 169
outage 173-174
overhead 172, 188
repair, see Maintenance
risk 199
ship 182
size 168
undersea 169, 172, 182
Call
drop-out threshold 131
fax 145
file transfer 145
loss rate 148
telephone 124
video 124
Capital
initial 52
Cardinality 105
order 105
size 105
Carrasco R A 311
CCITT 153
Cell 51
header 97
loss rate 144, 146, 147, 148, 151
production rate 149
route 145
packetized 97, 124
Cellular 2, 3, 16, 20
'Central Office' 46, 48, 120, 176,
245, 312
Channel
allocation, virtual 97, 100
capacity 49
connection 157
identifier, virtual 97
Chaotic
attractor 25
phenomena 292
regime 329
state 49, 58, 63, 317, 326, 329
Characteristic
length 26
statistical 145
Children 235
CIP 46
Circuit
assignment 138, 139, 140
bi-directional 133
electronic 171
equalization 175
hot standby 183
integrated 171
multiple failure 184
protction 174
standby 184
Circulant, see Network topology
Classifier 234, 237
Clock
recovered 208
Cluster 32
Coaxial, see Cable
Cochrane PI, 168, 200
Code
evolved 243
generation 243
Coding
5B6B 216-219
debugged 239
error axis 219
HDB3 208-216
Coefficient
cubic 51
linear 51
quadratic 51
Communications, see Telecommunications
Competence 249
Competition 45, 53, 61, 63,
225, 311
co-evolving 232
sensitivity 61
INDEX
Compound interest 225

Computational ecosystem philosophy 286
Computing
availability 240
connection 240
cost-effective management 308
distributed 65, 198
innovation 233
model
macro 84
meso 84
micro 84
originality 233
parallel 235, 240
power 9
scheduling 313
non-pre-emptive 315
science 120
simulation 239
speed 240
system 53, 224
Concave function 94
Concentrator assignment,
see Resource
Conductor
power-feed 173
Connectivity
depth first search 109
edge 108
graph 109
multimedia 87
multi-point 87
multi-rate 87
vertex 108
Constant
decay time 220
Constraint
capacity 97,113,117
chance 100
connection (minimum/maximum)
107
349
demand 97
linear 112
probability 98
Converter
DC/DC 176
Copper 1, 3, 168-169
drop 197
twisted pair 172, 177, 183, 188
systems 172, 174, 175, 188
Correlation 212, 219, 225
length 32
Cosine series 204
Cost 52
distance-related
lowest 199
minimal total 115
negative 113
operating 190
primal improvement 113
reduction 168
running 197
transmission 2
Counting 118
CPU 53, 225, 231-232, 235-236
Craftsman 224
Crossbar 107
Crossconnect 131, 141
Crossover
operator 234
see also Biological
Curve
fitting 12, 229, 233-234
Koch 25
Customer
chance 12
control 9
expectation 9
Damage, see Maintenance, Cable
DAR 110
Darwin C R 225, 226
Data
census 37
350 INDEX
chaotic 233
clock rate 171
noisy 233
random stream 207
reliability, see Reliability
Database 49, 139, 140
access time 136
DBM 26
DeS 96, 137
computer-controlled 125
Debugging 249
Decision-making 57
optimal sequence 115, 116
Principle of Optimality 116
routeing 117
stages 115
stepwise 115, 116
Decision threshold 211, 216,
217-219
Decoder 215
Decomposition 242
phase space 318
Delay
fixed feedback 315
Delta
function 156
Kronecker 159
Dempster M A H 84
Dendritic structure 26
see a/so Morphology
Density
power spectral 150, 204
Dielectric breakdown model,
see DBM
Diffusion 95
Diffusion limited aggregation,
see DLA
see a/so Equation
Digital crossconnect system, see DeS
Digital system processor, see DSP
Digital sum variation, see DSV
Dijkstra, see Algorithm
Discretization of time 151, 315, 338

Dispersion
index 150-151
Distribution restoration, see Network
Distribution
bias 60
critical exponents 32
feed-back effects on 297
equilibrium 62
Gaussian 50, 94
limiting 69
network 22
non-stationary network 82
normal 289
NTA contagious 203
occupancy probability 26
Pareto 203, 303
Poisson 201, 203
population 23, 26
power-law 31, 32
probability 54, 57, 59, 61, 272,
287, 300, 305
scale-invariant 23
spatial 22, 23, 42
stationary state 95
tree-and-branch 188
Divider
dimension 24
gap 23
DLA 26
DNA 51, 231
DRA 124, 125-128, 139, 141
DSP 314, 317-318, 333
module 317
DSV 213, 215
boundary condition 213
Duct 172, 188
Dynamic
conservation 53
nonlinear 49
phase space 59
programming 115, 116, 118
INDEX 351
systems
25, 58
see also Equation, Algorithm

Economy
decentralizing global 47, 49
framework for 48
EDFA 3, 175
Edge
incident 104
matching 107
set 104, 107
EFS 201
Eigen-dimension 33
Eigenvalue 35, 266, 275, 277,
280, 281
Eigenvector 35
Electromagnetism 170
see also EMC
Electrostatic discharge 220
E-mail 2
EMC 173, 179
Entropy
function 287, 288, 292, 294
generalized 41
maximum 35
processor 300, 305-306
task 287,292,298,300,302-304
Enumeration 116
Environment 237
change 225
distributed multiservice 81
economic 11
heterogeneous 336
regulatory II
Enzymes 51
Equation
deterministic 60, 61-62
diffusion 27
dynamical 72, 73, 74, 80, 154
fluctuation 61
Fokker-Planck 50, 54, 62
governing 342
Hopfield 155
irreversible nonlinear 339
Laplace's 26
macroscopic 59
master 50
non-vanishing contribution 161
reliability 170, 172, 181
stochastic differential 95
traffic 88
Equilibrium 19,20,56,61,89,
159-160, 162, 268, 270-279,
308
queue length process 88
regulator 89
see also Analysis
strategy 273, 278, 282
Erbium-doped fibre amplifier,
see EDFA
Error
activity 211
bit 207, 208
burst 201, 202, 205, 212,
215, 220
palindrome effect 205
code 207, 208, 213, 215, 220
density 212, 213, 215, 220
detection 220
free seconds, see EFS
interval logger 208
probability 213
randomly generated 200
statistics 200
transient pattern 200
see also BER
ESS 265-266, 269, 281, 282, 283
Evolution 45, 53, 60, 61, 225-226,
228, 229, 234-236, 238, 243
'environment-oriented' 232
model 232
open-ended 233
strategy 226, 236
352
INDEX
'task-oriented' 232
Evolutionarily stable strategy,
see ESS
Execution time 66, 247
Expansion, large system-size 50
Exponents 41
Facsimile 17, 20
Failures in ten, see FIT
Fast Fourier Transform, see FFT
FDM 175
Fernandez-Villacanas Martin J L
45, 224
FFT 318
Fibre 1-3, 137, 169, 192
low-loss 173
multimode 172, 188
one per customer 175
reliability 168-200
signal distortion 175
single mode 172, 188
splice 74
system 174, 175
technology 169
terrestrial 199
to the home, see FTTH
to the kerb, see FTTK
undersea 181, 199
Filtering
linear 150
see also Kalman filtering
FIT 171, 176
Fitness 227, 229, 232, 236, 239, 276
Flow
averaging 150
capacitated network 113, 118
commodity 111, 114
conservation 112-114
basic feasible solution 113
control 86
deterministic fluid 87, 89
maximum 112-114
minimum cost 113
multicommodity 114, 117, 118

multiterminal maximum 114
pattern 111
random cell 98
single commodity 112
stochastic multicommodity
network 96, 99
vector 113
wavelength assignment 100, 107
Floyd-Warshall, see Algorithm
Fokker-Planck, see Equation
Formalism 309
Foster's metric 195
Fractal
descriptor 25
dimension 42
geography 25-28
geometry 23-25
graph 31-32
network 22
structure 42
Frequency allocation 49
Frequency-division multiplexing,
see FDM
FTTH 197
FTTK 197
Gain 66, 154
continuous variable 66
Gaussian process 95
see also Distribution, Noise
Gell M A 45, 153,311
Genetic
information 226
Koza's 240
programming 229-230, 233-36,
239
see also Algorithm, Program
Genotype 227
Gradient 40
Grade of service 97, 99, 96, 197,
311, 313
Graph
INDEX
acyclic 105
bipartite 107, 113
complete 105
connected 105
dimensions 105
directed 96, 104
disconnected 108
distinct points 109
model 104
network-directed 109
nondirected 115
planar 109
redundancy 106
regular 105, 107
representation 103
switching 107
theory 103-104
problems 103-104, 109
vocabulary 103
trivial 108
see a/so Fractal, Tree
Greed 52, 60
see a/so Algorithm
Hardware
monitoring overhead 7
unreliability 5
Hausdorff dimension 37
Hawk-Dove game 271
Hawk-Dove-Bully-Retaliator
game 279, 281
Hawker I 124
Heatley 0 J T 168
Heuristic 114, 118, 228
Holding time
negative exponential 92
Holland J H 227
Homeworking 199
Hopfield net 65-67, 68-70, 71-77,
80-82, 153-154, 162
attractor 73-74, 160, 162
imposed 164
negative 160, 164
353
convergence 72, 74, 76

dynamical model 161, 164
energy function 71, 73, 74,
155-156, 158-159, 164
non-zero entries 164
matrix 76, 98
optimization 72, 74-75, 78
overload 78
performance 75-76
processing capability 154
sensitivity 72
see a/so Neural
Horse 232
Human
geography 25
interdiction 170, 173, 184, 188,
181, 200
see a/so Cable failure
Increment
stationary independent 94
Industrialization 224
Infant mortalities 170
Information
integrity 125
mobility 198
overload 329
superhighway 49
system 49
Information technology, see IT
Input
equilibrium cumulative 89
perturbation 326
with zero elements 161
Installation 188
Interferer 208-209, 211-212, 215,
216, 219, 220
decaying sine wave . 217
peak and decay 216, 220
Integer 51, 112, 114
Integrated services digital network,
see ISDN
Interconnection 12, 15, 17, 103, 313
354 INDEX
multistage 107
spanning tree 32, 105
Interface 9
humanized I
International Standards Organization, see ISO
Invariance 151
ISDN 19, 201
broadband 97
ISO 126, 141
IT 45
Ito integral 91
Jackson, see Theory
Johnson D 124
Joint 173
Kalman filtering 97
Kephart J 0 58, 62
Kleinrock independence
assumption 87, 110
Koza, see Genetic
Kruskal, see Algorithm
Kurtosis 205
LAN 103
Landsat images 25
Language
C 229, 233, 240
c++ 229
fitness specification 243
LISP 228-229
Mathematica 229
Scheme 229
XLisp 229
Laplace, see Equation
Law of large numbers 8, 90, 93, 197
Leaky feeder 3
LED 208,211
Light emitting diode, see LED
Lightning 194, 220
Linear
dependence 243
programming 113, 114, 137
regression 234
string 227, 228

Line
3R process 175
code 207
decoding 200
failure rate 174
gradient 24
Link
availability 173
cost 137
customer-to-customer 190
failed 129
high-capacity 184
international 198
long-distance 169, 184, 186
point-to-point 169
protection 125, 130
terrestrial 184
undersea 169
working 130
unavailability 198
List matching, see Resource
Localloop 3, 169, 176, 187, 198
all-optical 190
configuration 189
flexibility points 188
reliability 186-176
unavailability 190, 191, 198
vulnerability 171
Look-up table 132
Looping 118
Lyapunov function 155-156
Lyons M H 11
Macro-structure 51
Magnetization, spontaneous 59
Maintenance 7, 168, 169, 177, 182,
188, 197
MAN 103,107
Management science 84
Mandelbrot B B 23
Market
competitive 48, 265
INDEX 355
conditions 45
discontinuities 47
disequilibrium 48
dynamics 52
emergent behaviour 49
environment 54, 58
evolution 47
forces 45
global 51, 311
pluralism 47
principles 285
process
agent/resource 49
auctioning 49
bartering 49
bidding 49
share 15-16, 19, 20, 59, 60, 62
strategy 53, 62, 63
Markov 54, 62-63, 95, 170
modulated fluid 87, 92, 97, 99,
146, 148, 151
piecewise deterministic 92
process 151
holding times 151
Matrix
adjacency cost 116
configuration 158
connection 161, 164
symmetric 161
gain 266-267,271,273-276,279,
283, 298
incidence 104
interscale transfer 34-35
Leontief 92
pay-off 286
preference 293
probability 290
routeing transition 88
stability 266, 274-275, 277,
280-281, 283
utility 286, 298, 305, 308
vertex adjacency 104
weight 155
Maze running 235
software 235
McIllroy P W A 224
Mean time before failure, see MTBF
Mean time to repair, see MTTR
Measure
conventional information 24
performance 7, 9, 109, 110
see a/so Metric
Medova E A 103
Megastream 139
Memory 225, 231, 236
Menger, see Theory
Message passing
fast 125
Method, see Algorithm, Heuristic
Metric 139, 209
confidence 207, 208, 211, 217
decision-point 206
mean 205-206, 211, 215, 216,
217, 219
pattern 206
Microwave 2
MIMD 231
MINOS 147, 148
MIPS 240
MMP 28
Mobility 3, 198
Mode
all-nodes 129
all-spans 129
behavioural 336
chaotic 326, 333, 337
free-run 129
interactive 129
operational 197
simulation 129
stable 326
unbounded 326
unstable 326, 333, 334, 335, 337
Model
356
INDEX
biological 230
burst error 200
closed network 87, 105
connectivity 109
Byzantine general problem
109
evolutionary 225
hierarchical 84
multicommodity flow 117
multinomial logic 28
network flow 109, 110
open network 87, 105
inter-arrival process 88
service-time process 88
optimization 110
probabilistic 125
reliability 169, 170
three-level stochastic
optimization 85, 95
Moment generating function 149
Monitor for inferring network
overflow statistics, see MINOS
Monopoly 46
Morphology 25
dendritic town 26
urban 26
MTBF 7, 125, 129, 171, 174, 176,
187, 194, 195
MTTR 129, 130, 171, 173, 174,
176, 182, 192, 193
Multilayer framework 141
Multimedia 199
Multiple-instruction multi-data,
see MIMD
Multiplexing 97, 148, 176, 183
Boolean 229
control 323
duplication 182
statistical 149
terminal 180
Multiplicative multinomial process,
see MMP
Mutation 51,53,61,63
bit 238
operator 238
pay-off 53
search 234
Netput process 89
Network 1, 46-47, 311
access 5, 15
audit 138
balanced 113
capacity 2, 125, 127
availability 140
bounded link 114
upgrade 197
utilization 125, 287
circuit-switched 103, 250
classification 103
communications 105, 109
complexity 6, 312
congestion 250
control 80, 124
distributed 138
control software 9
cost optimization 137, 138
customer-to-customer 199
data 109
design 85, 114, 168
digitalization 5
disjoint path 106, 108, 109
down time, see MTBF
element manager 136
end-to-end view 9
extension 140
failure 1, 6, 8, 124, 125, 132,
139, 170, 171, 173, 185, 194
location forecasting 194
statistics 174-175, 194
see a/so Cable failure
failure-resilient 120
flexibility 124, 190
future 121
heterogeneous 337
INDEX
hierarchy 69, 175, 199

high-speed 121
homogeneous 312
integrated broadband 103
Jacksonian 87
life cycle 87
lightwave 100
local area, see LAN
management 124, 136, 138, 168,
176, 194, 200
of mass-produced
components 23
metropolitan area, see MAN
model 8, 121, 168
neural, see Neural
non-disjoint path 106
Omega 106
optical transparency 196
packet-switched 87, 118
passive optical, see PON
performance 9, 109, 168-172
protection 124-125, 137, 139,
173-174, 184
public switched telephone,
see PSTN
quantum effects 8
redundancy 138, 173
reliability 311
repair 170, 174
resource 139, 247
restoration 5, 124, 137, 138, 168
centralized 135, 137,
140-141
distributed 125, 126, 135,
137, 138, 141
make-good paths 125, 127,
137
plans 136
pre-planned 132, 141
prioritizing 137
real time 132, 136, 141
simulated 130
357
sub-second 140
time 136
see also DRA
ring 137
self-routeing 107
software 124
sparsely connected 161
star 35, 100
switching 105,107,109,120,144
telegraph 103
test 140
throughput 109
topology 86,100,107,109,118,
129, 130
irregular mesh 107
connected mesh 107, 112
ring (circulant) 107
traffic rebalancing 100
transparency 2, 8, 9
unstable behaviour 164, 313
utilization 184
vulnerability 108
wide area, see WAN
see also Hopfield, ATM
Neural
activity level 154
negative 161
McCulloch and Pitts model 68
network 65-66, 68, 76, 77, 80,
153, 158, 233
back-propagation 153
velocity 162
weights 154
synaptic 161
Neuron 71, 72-74, 155, 160, 161,
164
Neuroprocessor 76
Node
balanced bottleneck 88
bottleneck 88
in chip 118
chooser 127
358
INDEX
destination 110
failure 124, 125, 132, 134, 137
reduction 8
flexibility 180
geographical coverage 103, 110
identity (NID)
intermediate 133
interrogation 261
message sink 109
message source 109
non-bottleneck 88
occupancies 88
ordered pair 104
paths 103
protection 127
reduction 197
restoration 134
simultaneous failure 7
strict bottleneck 88
switch 195, 199
tandem 127
technical definition of 103
termination 35
tree 228
unordered pair 104
see a/so Origin-destination,
Vertices
Noise 61, 76
Gaussian 56, 207
nonlinear 63
see a/so Approximation
Nonlinear control strategy 229
NP-hard 118
Object oriented programming 110,
262
00 96, 100, 110
multiple-pair flow 114
pair revenues 99
Offspring 225
Olafsson S 153, 264, 285
Open systems interconnection,
see OSI
Optical
free-space 2, 3
HOB3 modem, see Coding
network
design 114
multiwavelength 120
transparency 200
wavelength 117, 169
receiver 208
technology 199
transmitter 208, 211
transmitter-receiver pair 100,
120
see also Fibre, Network
Optimization 85, 112, 113, 153,
158, 228
combinatorial 107
control 243
criterion 115
deterministic 110
linear 112
nonlinear 112, 118
objective function 110, 121
parameter 154, 159-161,
162-164
stochastic 86, 110
transportation 113
Optoelectronic
component 168
Order complementarity 89, 100
Organic molecule 51
Organism 51, 225, 236
Origin-destination, see 00
Oscillation
'hard-clipped' 326
persistent 58, 63, 326, 329
see also Behaviour
OSI 125
Output
equilibrium lost 89
potential 89
Overload 66
INDEX
avoidance 138
Packet 120, 156-158
delay 110
transmission 109
Paradigm
competitive industry, see CIP
public utility, see PUP
Parallelism 80, 121
Parameter
control 59
interference 209, 216, 220
preference 12
system 63, 217
uncertainty 58-59, 63, 286
variation 211
Parent 226-227
monitor 257, 260
Path
connecting 109
cyclic 105
delay 3
directed 110
geographical 184
length 127
restoration 133
end-to-end 134-135
see also ORA
shortest 116
simple 105
single 107
virtual 97, 100
see also Vertex
Pattern 224, 229
recognition 233, 234
Pay-off 58
changing 60, 61
cubic 59
linear 60
nonlinear 63
perceptions 62
stochastic effects 62
see also Biological, Mutation
359
PC 231, 239, 317

HP 240
Mac 140
Sun 240
PCM 175, 176, 184
POH 172, 185
Percolation 32-33
bond 32
site 32
theory 32
Performance
degradation 247
measure 7, 9, 110
Personal computer, see PC
Phase
portrait 326, 329, 334, 335
space representation 343
transition 59
Phenotype 227
Photocell 225
Photodetector 175
Pico-cellular 2
Pipe
transparent optical 175
Plant
cable & line 172
concatenated 199
Plesiochronous digital hierarchy,
see POH
Pointon C T 311
Poisson process 94, 99, 200, 203
Polyhedral combinatorics 118
Polynomial 51
solution 118
paN 175, 197
Population 42, 225, 228, 242, 253
diversity 225
management 257
mobility 9
multi-foci 32
POTS, see Service telephony
Power
360 INDEX
consumption 171
feeding 168, 176, 188
grid distribution 194
outage 184
spectrum 335, 336
supply duplication 176,180-181,
184, 199
surge 221
transient 194
Pre-smoothing 151
Pricing
real time strategies 48, 63
setting 63
Prim, see Algorithm
Probability 13, 24, 51
cell-loss 99
critical 33
distribution 266, 275, 280
factors
advertising 51
dealing 51
special offers 51
trust 51
momentary 305
system 58
transition 56
see also Distribution
Profit 52
maximization 112
Program 224, 229
application 235
'brittle' 242
co-operation 238
C-zoo 231, 235-236
error-free 242
error-sensitive 242
error-tolerant 243
evolved 239
genetic 242-243
Hermes 236-239
heterogeneous 237, 285, 286
length 243
template 231-232, 236

mutation 232
Tierra 230-231, 235-236, 239,
242
understandability 235
Propagation delay 110
Proteins 51
Protocol 1, 9
interconnection 312
network layer 117
PSTN 15, 16
Public switched telephone network,
see PSTN
Pulse code modulation, see PCM
PUP 46, 311, 312
Quality 51, 110, 137, 144-145, 153
see also Grade of service
Quantization 204
Quantum effect 197
Queuing
contention 316
delay 337
equal priority 314
input 156-158
length 326, 329, 335-338
multiple 158
network 87
process 337
'reaper' 231
'slicer' 231, 235, 236
theory 87, 145
Radio 1, 3, 137
drop 197
frequency, see Lf.
Radius
spectral 88
Random search 226, 228, 234
Reachability 103
Real number 51
Real time 114, 121
Regular, see Graph
Regulatory aspects 9, 11
INDEX 361
Re-investment 52
Relationship
cost/price 240
parasitic 232
power-law 23-24
weighted sum 41
predator/prey 12
symbiotic 232
Relaxation 53, 60, 61
Reliability 2, 7, 125, 172, 179, 180,
186, 190, 199
end-to-end 173, 179, 186, 199
hardware 196
long line systems 179, 197
operational 197
optical 199-200
statistical 170, 172
see also Fibre optic, Local
loop
Renormalization 32, 33, 41
Renyi A 24
Repair, see Maintenance
Repeater 175-176, 183, 186, 199
buried/surface 173
cascaded 180-182
duplication 176
line 179
optoelectronic 175, 176
reliability 180-197
spacing 168, 173, 175, 199
technology 172
Repeater stations 2, 197
Replication 232
Reproduction 225, 227, 235
Requirements capture 239
Re-routeing 173
Resilience 86, 125, 141, 173
Resolution 24
logarithm 24
Resource 225
allocation 65-67, 313
linear 67
struggle 225
Revenue 12
Reward mechanism 51
Lf. 173
Richter scale 6, 196
Robot control 248
Routeing 86, 103, 109, 111-112,
117, 119, 120, 153, 185, 198
alternative 125, 129, 172, 258
availability 176
configuration 173
cost 247
in data 117
diverse 173, 184, 198-199
duplication 198
dynamic alternative, see OAR
hierarchical adaptive 120
multiple diverse 127, 139
optimal 110-111, 116
protocol 144
restoration 127, 129-130, 132
selection 120, 258
table 250
Rule
complexity 238
condition-action 237
deletion 238
duplication 238
Sampling interval 77, 78
adaptive 78
Satellite 3
geostationary 3
link 3
low earth orbit 3
mobile 2
Schema 228, 230, 233, 234, 242
SOH 124, 125-126, 130-131, 139,
172
restoration routes in 136
see also Network
Search space 242
Selection 225, 227
362 INDEX
Serengheti 223
Service 8, 9
address 315
translation 315
availability 9
competition 11-12, 15, 20
customer 125
development 311
diversity 9
expected life 169
modelling interactions 11-20,49
new 18
origination 315
provider 45
telephony, see Telephony
uninterrupted 131
Shannon information 24
Sierplnski triangle 35-37, 41
Sigmoid function 69, 154
Signalling 3, 9, 136
common channel 49
duration 315
overhead 136
Signal-to-noise ratio, see SNR
Signature
acknowledgement 127
index numbers 127
Silicon technology 171, 172, 200
Simplex, see Algorithm
Simulated annealing method 35
Simulation
availability 129
computer 225
Monte Carlo 207,209,212,215,
217
on-line 127, 131
speed 129
Skewness 205
SNR 211, 216, 220
Software 1, 5, 9, 125, 195, 200
agent 209
decomposition 243
engineering 224
evolving 242, 243
practitioner 234
robustness 246, 247
scaling 242, 243
self-regulating 110, 208
SONET 107
survivability 107
Span
failure 127, 130, 131, 133, 137
restoration 125
pre-planned 135
real time 135
Spatial distribution, see Distribution
Spectrum 2
photonic 9
telephone 103
State transition
diagram 342
table 342
Static discharge 170
Stationary process 149
nondeterministic 150
Steward S 245
Strategy
evolutionarily stable, see ESS
integrated restoration 135
mutant 270
strong 268-273
winning 268
Subdivision
recursive 28
Subgraph, see Graph
'Survival of the fittest' 225
Switching 9,87,144,149,168,176,
185, 198, 313
access 153-154
adaptive 153
broadband 153
computer terminal 103
crosspoint node 107
failure 107
INDEX
fraction 88
hot standby 176
interconnectivity 198
mechanism 157
packet 153
protection, see APS
redundancy 161
robust 153
station 168
Symmetry 59
Synchronicity 208
Synchronized optical network,
see SONET
Synchronous digital hierarchy,
see SDH
System
agent/resource 50-51, 59, 63
attractor 70, 155, 316, 334
availability 171
base 51
linear 60
constant 60
random 60
relaxation 60
greed 60
behaviour 58
bistable 59
brittle 49
client/server 49
complexity 49, 54, 58, 313
crash 257
decision-support 85
development 311
distributed 245, 246, 262
failure 313
fault-tolerant 314
ferromagnetic 59
fluctuations 49
internal 57
nonlinear 50, 54, 57, 58
heterogeneous 314
hierarchical planning 85, 86
363
intelligent 198
irregular operation 313
long-distance 198
market-like 286
N+ 1 standby 125,173,183-184
nonlinear processing 314, 316,
319, 333, 335, 338
time cycle 320
open 49
open-ended 242
operating 245
performance 320
reliability 176
repeaterless 180
self-organizing 49, 198
statistical 287
teletraffic 333
terrestrial 173,176,180,182,184
unavailability 186,187,191,198
undersea 173, 176, 181, 199
utilization 308
see also ATM, Biological,
Parameter, Telecommunications
Systolic chip 188
Tariff 9
Task allocation 70-73,80,253,305
arbitrary preference 294
controller 70, 71
'do as the others do' 297
dynamic 285, 288
predetermined 308
self-confident choice 295-297,
305
Telecommunications
complexity105, 118
reduced 198
convergence with computing/
media 45
decentralization 311, 312
design 112
distribution 311
364
INDEX
diversity 311
engineer 104
evolutinary process 135, 243
global 199
infrastructure 8, 45
market 45, 47, 51
mobile 200
operator 22, 46, 311
routeing, see Routeing
UK 45,48
USA 45
Telephony 15, 20
see also Cellular
Telex 17,20
Temperature
Curie 59
TENDRA 124, 129, 130, 140
Terminal
duplication 176
station 176
Testing 239
Theory
central limit 92-94
central-place 25
deterministic 49, 100
dynamic systems 274
game 264, 282
dynamic 266-268
evolutionary 264-266,
282-283
principles 285
zero-sum 282
Jackson's 110
large deviation 94, 145
mean-field 62
Menger's 108
stochastic 49, 87, 94
see also Graph, Traffic,
Percolation
Throughput
maximization 158
Tools 224
Topology
multiple multi-butterfly 120
multi-ring 120
see also Network
Traffic
approximation 87-91
average 110
bursty 87, 144-146, 149
busy period 149
class 144, 149, 150
congestion 109, 313
diversion 200
Erlang theory 104
future 20
generator 250
input arrival rate III
intensity 88, 144, 260
management 86, 253
modelling 8
modes 8
offered 127
pattern 8, 258
profile 250, 258
queue length 109
stationarity assumption 110
studies 318
synchronization 313
waiting time 109
Transfer function 287, 291
Transmission 7, 49, 171, 198
cable 192
capacity 110
diplex, see WDM
duplex 193
length 115
reliability 173
technology 137, 169, 176
link 103, 104, 115, 118
see also Edge
Trans-shipment 112, 113
Travelling salesman problem 66,
73, 120, 227, 238
INDEX
algebraic property 120

Tree 105, 229
functional sub- 230
minimum spanning 115
parse 228, 235, 242
shortest spanning 117
spanning 105, 113, 115
Steiner 117
Tymnet 117
Utilization 77, 253-254
computer 285, 300, 302
maximum link 110
metric 305
Van Kampen N G, see
Approximation
Variable 112
random 110
Variance 150, 205
Vector
in-flow 88
Very large-scale integration, see VLSI
Vertex 103-104
adjacency 104
cut- 108
degree of 105
disjoint 109
365
incidence 104
path 105
set 107
see also Matrix
Videotelephony 18, 19, 20
quality 19
Virtual circuit 117
VLSI 25, 67, 319
field programmable gate
array 314, 319
WAN 103
Wavelength division hierarchy,
see WDH
Wavelength division multiplexing,
see WDM
WDH 172
WDM 2, 100, 120, 172, 175,
192, 193
soliton 4
Weber R 144
Wiener process 94
Winners 232
Winter C S 224
Workstation, see PC
Zone, communications free-trade
312

Modelling Future Telecom Systems

Enviado por

Dados do documento

Direitos autorais

Formatos disponíveis

Compartilhar este documento

Compartilhar ou incorporar documento

Opções de compartilhamento

Você considera este documento útil?

Este conteúdo é inapropriado?

Direitos autorais:

Formatos disponíveis

Modelling Future Telecom Systems

Enviado por

Direitos autorais:

Formatos disponíveis

MODELLING FUTURE

The BT Telecommunications Series covers the broad spectrum of

SPRINGER-SCIENCE+BUSINESS MEDIA, B.V .

First edition 1996

(oo) Printed on permanent acid-free text paper, manufactured in accordance with

Modelling interactions between new services

Evaluation of Hopfield service assignment

Graph-theoretical optimization methods

System and network reliability

Pre-emptive network management

R A Butler and P Cochrane

C S Winter, P W A McIIlroy and

Software agents for control

S Appleby and S Steward

Dynamic task allocation

Complex behaviour in nonlinear systems

C T Pointon, R A Carrasco and M A Gell

Systems Research, BT Laboratories

Systems Research, BT Laboratories

Systems Research, BT Laboratories

Network Software Applications, BT

Network Modelling, BT Laboratories

Network Modelling, BT Laboratories

Faculty of Science and Technology, The

Advanced Applications and Technologies,

Department of Mathematics, University of

Systems Research, BT Laboratories

Multi-Business Zones Research, Ipswich

Network Modelling, BT Laboratories

Advanced Mobile Media, BT Laboratories

Network Modelling, BT Laboratories

Applications Research, BT Laboratories

Systems Research, BT Laboratories

Department of Electronic Systems

Systems Research, BT Laboratories

Distributed Systems, BT Laboratories

Management Studies Group,

Intelligent Systems Research,

Most telecommunications networks are still designed as if the constraints of

reduced component count;

reduced power and raw material usage;

increased capacity and utility.

A further logical (and revolutionary) development will see the

SWITCHING AND TRANSMISSION

badges/communicators and inter-desk/computer links. Applications in the

SWITCHING AND TRANSMISSION

Only a decade ago, a typical UK repeater station had to accommodate

Asynchronous transfer mode (ATM) is often quoted as the ultimate answer

THE ECONOMICS OF ANALOGUE AND DIGITAL

During the 1970s many administrations completed studies that established

NETWORK DISASTER SCALE

A number of network failures on a scale not previously experienced have

where N = number of customer circuits affected,

earthquakes in excess of 7.0 magnitude are definitely considered major

Monitoring systems and operations, extracting meaningful information, and

PEOPLE AND RELIABILITY

For a fully interconnected network of N nodes, this results in one failure

PEOPLE AND RELIABILITY

The reliability of operational systems is commonly less than that predicted

level of weakening is also supported by practical experience across a broad

QUANTUM EFFECTS AND NODE REDUCTION

The modelling of telecommunications networks has traditionally seen isolated

The increasing speed at which consumer and office electronics can be

a rapid migration of increased computing power and artificial intelligence