EMBEDDED
SYSTEMS
© 2006 by Taylor & Francis Group, LLC
Published Books
Industrial Communication Technology Handbook
Edited by Richard Zurawski
Embedded Systems Handbook
Edited by Richard Zurawski
Forthcoming Books
Electronic Design Automation for Integrated Circuits Handbook
Luciano Lavagno, Grant Martin, and Lou Scheffer
Series Editor
RICHARD ZURAWSKI
INDUSTRIAL INFORMATION TECHNOLOGY SERIES
EMBEDDED SYSTEMS HANDBOOK
Edited by
RICHARD ZURAWSKI
A CRC title, part of the Taylor & Francis imprint, a member of the
Taylor & Francis Group, the academic division of T&F Informa plc.
Boca Raton London New York
To my wife, Celine
International Advisory Board
Alberto Sangiovanni-Vincentelli, University of California, Berkeley, U.S. (Chair)
Giovanni De Micheli, Stanford University, U.S.
Stephen A. Edwards, Columbia University, U.S.
Aarti Gupta, NEC Laboratories, Princeton, U.S.
Rajesh Gupta, University of California, San Diego, U.S.
Axel Jantsch, Royal Institute of Technology, Sweden
Wido Kruijtzer, Philips Research, The Netherlands
Luciano Lavagno, Cadence Berkeley Laboratories, Berkeley, U.S., and Politecnico di Torino, Italy
Robert de Simone, INRIA, France
Grant Martin, Tensilica, U.S.
Pierre G. Paulin, ST Microelectronics, Canada
Antal Rajnák, Volcano AG, Switzerland
Françoise Simonot-Lion, LORIA, France
Thomas Weigert, Motorola, U.S.
Reinhard Wilhelm, University of Saarland, Germany
Lothar Thiele, Swiss Federal Institute of Technology, Switzerland
Preface
Introduction
The purpose of the Embedded Systems Handbook is to provide a reference useful to a broad range of
professionals and researchers from industry and academia involved in the evolution of concepts and
technologies, as well as development and use of embedded systems and related technologies.
The book provides a comprehensive overview of the field of embedded systems and applications. The
emphasis is on advanced material to cover recent significant research results and technology evolution and
developments. It is primarily aimed at experienced professionals from industry and academia, but will
also be useful to novices with some university background in embedded systems and related areas. Some
of the topics presented in the book have received limited coverage in other publications either owing to
the fast evolution of the technologies involved, or material confidentiality, or limited circulation in the
case of industry-driven developments.
The book covers extensively the design and validation of real-time embedded systems, design and
verification languages, operating systems and scheduling, timing and performance analysis, power aware
computing, security in embedded systems, the design of application-specific instruction-set processors
(ASIPs), system-on-chip (SoC) and network-on-chip (NoC), testing of core-based ICs, networked embedded
systems and sensor networks, and embedded applications to include in-car embedded electronic systems,
intelligent sensors, and embedded web servers for industrial automation.
The book contains 46 contributions, written by leading experts from industry and academia directly
involved in the creation and evolution of the ideas and technologies treated in the book.
Many of the contributions are from industry and industrial research establishments at the forefront of
the developments shaping the field of embedded systems: Cadence Systems and Cadence Berkeley Labs
(USA), CoWare (USA), Microsoft (USA), Motorola (USA), NEC Laboratories (USA), Philips Research
(The Netherlands), ST Microelectronics (Canada), Tensilica (USA), Volcano (Switzerland), etc.
The contributions from academia and governmental research organizations are represented by some
of the most renowned institutions such as Columbia University, Duke University, Georgia Institute of
Technology, Princeton University, Stanford University, University of California at Berkeley/Riverside/
San Diego/Santa Barbara, University of Texas at Austin/Dallas, Virginia Tech, Washington University
from the United States; Delft University of Technology (Netherlands), IMAG (France), INRIA/IRISA
(France), LORIA-INPL (France), Mälardalen University (Sweden), Politecnico di Torino (Italy), Royal
Institute of Technology KTH (Sweden), Swiss Federal Institute of Technology ETHZ (Switzerland),
Technical University of Berlin (Germany), Twente University (The Netherlands), Universidad Politécnica
de Madrid (Spain), University of Bologna (Italy), University of Nice Sophia Antipolis (France), University
of Oslo (Norway), University of Pavia (Italy), University of Saarbrücken (Germany), University of Toronto
(Canada), and many others.
The material presented is in the form of tutorials, surveys, and technology overviews. The contributions
are grouped into sections for cohesive and comprehensive presentation of the treated areas. The reports
on recent technology developments, deployments, and trends frequently cover material released to the
profession for the first time.
The book can be used as a reference (or prescribed text) for university (post)graduate courses: Section I
(Embedded Systems) provides core material on embedded systems. Selected illustrations of actual
applications are presented in Section VI (Embedded Applications). Sections II and III (System-on-Chip
Design, and Testing of Embedded Core-Based Integrated Circuits) offer material on recent advances in
system-on-chip design and testing of core-based ICs. Sections IV and V (Networked Embedded Systems,
and Sensor Networks) are suitable for a course on sensor networks.
The handbook is designed to cover a wide range of topics that comprise the field of embedded sys-
tems and applications. The material covered in this volume will be of interest to a wide spectrum of
professionals and researchers from industry and academia, as well as graduate students, from the fields of
electrical and computer engineering, computer science and software engineering, as well as mechatronic
engineering.
It is an indispensable companion for those who seek to learn more about embedded systems and
applications, and those who want to stay up to date with recent technical developments in the field. It is
also a comprehensive reference for university or professional development courses on embedded systems.
Organization
Embedded systems is a vast field encompassing numerous disciplines. Not every topic, however important,
can be covered in a book of reasonable volume without superficial treatment. Choices need to be made
with respect to the topics covered, balance between research material and reports on novel industrial
developments and technologies, balance between so-called core topics and new trends, and other aspects.
The time-to-market is another important factor in making those decisions, along with the availability
of qualified authors to cover the topics.
One of the main objectives of any handbook is to give a well-structured and cohesive description of
fundamentals of the area under treatment. It is hoped that the section Embedded Systems has achieved this
objective. Every effort was made to make sure that each contribution in this section contains introductory
material to assist beginners with the navigation through more advanced issues. This section does not
strive to replicate or replace university level material, but, rather, tries to address more advanced issues,
and recent research and technology developments.
To make this book timely and relevant to a broad range of professionals and researchers, the book
includes material reflecting state-of-the-art trends to cover topics such as design of ASIPs, SoC com-
munication architectures including NoC, design of heterogeneous SoC, as well as testing of core-based
integrated circuits. This material reports on new approaches, methods, technologies, and actual sys-
tems. The contributions come from the industry driving those developments, industry-afliated research
institutions, and academic establishments participating in major research initiatives.
Application domains have had a considerable impact on the evolution of embedded systems, in terms
of required methodologies and supporting tools, and resulting technologies. A good example is the accel-
erated evolution of the SoC design to meet demands for computing power posed by DSP, network and
multimedia processors. SoCs are slowly making inroads into the area of industrial automation to imple-
ment complex field-area intelligent devices which integrate the intelligent sensor/actuator functionality by
providing on-chip signal conversion, data and signal processing, and communication functions. There is
a growing tendency to network field-area intelligent devices around industrial communication networks.
Similar trends appear in the automotive electronic systems where the Electronic Control Units (ECUs)
are networked by means of safety-critical communication protocols such as FlexRay, for instance, for
the purpose of controlling vehicle functions such as electronic engine control, anti-lock braking system,
active suspension, etc. The design of this kind of networked embedded system (this also includes hard
real-time industrial control systems) is a challenge in itself due to the distributed nature of the processing
elements sharing a common communication medium, and the safety-critical requirements. With the auto-
motive industry increasingly keen on adopting mechatronic solutions, it was felt that exploring, in detail,
the design of in-vehicle electronic embedded systems would be of interest to the readers of this book.
The applications part of the book also touches the area of industrial automation (networked control
systems) where the issues are similar. In this case, the focus is on the design of web servers embedded in
the intelligent field-area devices, and the security issues arising from internetworking.
Sensor networks are another example of networked embedded systems, although the embedding
factor is not so evident as in other applications, particularly for wireless and self-organizing networks where
the nodes may be embedded in an ecosystem, battlefield, or chemical plant, for instance. The area of
wireless sensor networks has now reached relative maturity. Owing to its novelty and growing import-
ance, the area has been included in the book to give a comprehensive overview, and to present new
research results which are likely to have a tangible impact on further developments and technology.
The specifics of the design automation of integrated circuits have been deliberately omitted in this book
to keep the volume at a reasonable size and in view of the publication of another handbook which covers
these aspects in a comprehensive way: The Electronic Design Automation for Integrated Circuits Handbook,
CRC Press, FL, 2005, Editors: Luciano Lavagno, Grant Martin, and Lou Scheffer.
The aim of the Organization section is to provide highlights of the contents of the individual chapters
to assist readers with identifying material of interest, and to put topics discussed in a broader context.
Where appropriate, a brief explanation of the topic under treatment is provided, particularly for chapters
describing novel trends, and with novices in mind. The book is organized into six sections: Embed-
ded Systems, System-on-Chip Design, Testing of Embedded Core-Based Integrated Circuits, Networked
Embedded Systems, Sensor Networks, and Embedded Applications.
I Embedded Systems
This section provides a broad introduction to embedded systems. The presented material offers a com-
bination of fundamental and advanced topics, as well as novel results and approaches, to cover the area
fairly comprehensively. The presented topics include issues in real-time and embedded systems, design
and validation, design and verication languages, operating systems, timing and performance analysis,
power aware computing, and security.
Real-Time and Embedded Systems
This subsection provides a context for the material covered in the book. It gives an overview of real-time
and embedded systems and their networking to include issues, methods, trends, applications, etc.
The focus of the chapter Embedded Systems: Toward Networking of Embedded Systems is on network-
ing of embedded systems. It briefly discusses the rationale for the emergence of these kinds of systems,
their benets, types of systems, diversity of application domains and requirements arising from that, as
well as security issues. Subsequently, the chapter discusses the design methods for networked embedded
systems, which fall into the general category of system-level design. The methods overviewed focus on
two separate aspects, namely the network architecture design and the system-on-chip design. The design
issues and practices are illustrated by examples from the automotive application domain. After that, the
chapter introduces selected application domains for networked embedded systems, namely: industrial
and building automation control, and automotive control applications. The focus of the discussion is on
the networking aspects. The chapter gives an overview of the networks used in industrial applications,
including the industrial Ethernet and its standardization process; building automation control; and net-
works for automotive control and other applications from the automotive domain, but the emphasis
is on networks for safety critical solutions. Finally, general aspects of wireless sensor/actuator networks
are presented, and illustrated by an actual industrial implementation of the concept. At the end of the
chapter, a few paragraphs are dedicated to the security issues for networked embedded systems.
An authoritative introduction to real-time systems is provided in Real-Time in Embedded Systems. The
chapter covers extensively the areas of design and analysis, with some examples of analysis, as well as
tools; operating systems (an in-depth discussion of real-time embedded operating systems is presented in
the chapter Real-Time Embedded Operating Systems: Standards and Perspectives); scheduling (the chapter
Real-Time Embedded Operating Systems: The Scheduling and Resource Management Aspects presents an
authoritative description and analysis of real-time scheduling); communications to include descriptions of
selected fieldbus technologies and Ethernet for real-time communications; and component based design,
as well as testing and debugging. This is essential reading for anyone interested in the area of real-time
systems.
Design and Validation of Embedded Systems
The subsection Design and Validation of Embedded Systems contains material presenting design methodo-
logy for embedded systems and supporting tools, as well as selected software and hardware implementation
aspects. Models of Computation (MoC), which are essentially abstract representations of computing
systems, are used throughout to facilitate the design and validation stages of systems development, and
approaches to validation as well as available methods and tools are discussed. The verification methods,
together with an overview of verification languages, are presented in the subsection Design and Verification
Languages. In addition, the subsection presents novel research material including a framework used to
introduce different models of computation particularly suited to the design of heterogeneous multi-
processor SoC, and a mathematical model of embedded systems based on the theory of agents and
interactions.
A comprehensive introduction to the design methodology for embedded systems is presented in the
chapter Design of Embedded Systems. It gives an overview of the design issues and stages. Then, the
chapter presents, in quite some detail, the functional design, function/architecture and hardware/software
codesign, and hardware/software coverification and hardware simulation. Subsequently, the chapter dis-
cusses selected software and hardware implementation issues. While discussing different design stages and
approaches, the chapter also introduces and evaluates supporting tools.
An excellent introduction to the topic of models of computation, particularly for embedded systems, is
presented in the chapter Models of Embedded Computation. The chapter introduces the origin of MoC, and
the evolution from models of sequential and parallel computation to attempts to model heterogeneous
architectures. In the process, the chapter discusses, in relative detail, selected nonfunctional properties
such as power consumption, component interaction in heterogeneous systems, and time. It also presents a
new framework used to introduce four different models of computation, and shows how different time
abstractions can serve different purposes and needs. The framework is subsequently used to study the
coexistence of different computational models; specically the interfaces between two different MoCs and
the refinement of one MoC into another. This part of the chapter is particularly relevant to the material
on the design of heterogeneous multiprocessor SoC presented in the section System-on-Chip Design.
A comprehensive survey of selected models of computation is presented in the chapter Modeling
Formalisms for Embedded System Design. The surveyed formalisms include Finite State Machines (FSM),
Finite State Machines with Datapath (FSMD), Moore machine, Mealy machine, Codesign Finite State
Machines (CFSM), Program State Machines (PSM), Specification and Description Language (SDL),
Message Sequence Charts (MSC), Statecharts, Petri nets, synchronous/reactive models, discrete event
systems, dataflow models, etc. The presentation of individual models is augmented by numerous
examples.
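To make the contrast between two of the surveyed formalisms concrete, the sketch below uses a hypothetical edge-detector example (not taken from the chapter): a Mealy machine, whose output is a function of state and input, reacts at the instant of the 0-to-1 transition, while a Moore machine, whose output is a function of state alone, reports it one instant later.

```python
# Hypothetical edge-detector, sketched two ways (illustrative only).

def run_mealy(bits):
    """Mealy machine: output depends on the current state AND the input."""
    state, outputs = 0, []          # state = last bit seen
    for b in bits:
        outputs.append(1 if (state == 0 and b == 1) else 0)
        state = b
    return outputs

def run_moore(bits):
    """Moore machine: output depends on the current state only, so the
    0 -> 1 edge is reported one instant later than in the Mealy version."""
    state, outputs = "idle", []     # states: idle (last 0), edge, high
    for b in bits:
        outputs.append(1 if state == "edge" else 0)
        if state == "idle":
            state = "edge" if b == 1 else "idle"
        else:
            state = "high" if b == 1 else "idle"
    return outputs

if __name__ == "__main__":
    stream = [0, 0, 1, 1, 0, 1]
    print(run_mealy(stream))  # [0, 0, 1, 0, 0, 1]
    print(run_moore(stream))  # [0, 0, 0, 1, 0, 0]
```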
The chapter System Validation briefly discusses approaches to requirements capture, analysis and
validation, and surveys available methods and tools to include: descriptive formal methods such as
VDM, Z, B, RAISE (Rigorous Approach to Industrial Software Engineering), CASL (Common Algebraic
Specification Language), SCR (Software Cost Reduction), and EVES; deductive verifiers: HOL, Isabelle,
PVS, Larch, Nqthm, and Nuprl; and state exploration tools: SMV (Symbolic Model Verifier), Spin, COSPAN
(COordination SPecification Analysis), MEIJE, CADP, and Murphi. It also presents a mathematical model
of embedded systems based on the theory of agents and interactions. To underline the novelty of this form-
alism, classical theories of concurrency are surveyed to include process algebras, temporal logic, timed
automata, (Gurevich's) ASM (Abstract State Machine), and rewriting logic. As an illustration, the chapter
presents a specification of a simple scheduler.
Design and Verification Languages
This section gives a comprehensive overview of languages used to specify, model, verify, and program
embedded systems. Some of those languages embody different models of computation discussed in
the previous section. A brief overview of Architecture Description Languages (ADL) is presented in
Embedded Applications (Automotive Networks); the use of this class of languages, in the context of
describing in-car embedded electronic systems, is illustrated through the EAST-ADL language.
An authoritative introduction to a broad range of languages used in embedded systems is presen-
ted in the chapter Languages for Embedded Systems. The chapter surveys some of the most representative
and widely used languages. Software languages: assembly languages for complex instruction set computers
(CISC), reduced instruction set computers (RISC), digital signal processors (DSPs) and very-long instruc-
tion word processors (VLIWs), and for small (4- and 8-bit) microcontrollers; the C and C++ languages;
Java; and real-time operating systems. Hardware languages: Verilog and VHDL. Dataflow languages: Kahn
Process Networks and Synchronous Dataflow (SDF). Hybrid languages: Esterel, SDL, and SystemC. Each
group of languages is characterized for their specific application domains and illustrated with ample code
examples.
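The dataflow languages listed above lend themselves to static analysis; as a hedged sketch (the two-actor graph and its token rates are hypothetical, not from the chapter), in Synchronous Dataflow each actor produces and consumes a fixed number of tokens per firing, so the per-iteration firing counts and buffer bounds can be computed before run time:

```python
from fractions import Fraction
from math import lcm

# Minimal SDF sketch: actor A produces 2 tokens per firing on a channel
# from which actor B consumes 3. The balance equation 2*q_A == 3*q_B
# fixes the smallest integer repetition vector (q_A, q_B).

def repetition_vector(produce, consume):
    """Smallest integer firing counts (q_src, q_dst) balancing one channel."""
    q_src = Fraction(1)
    q_dst = q_src * produce / consume       # from produce*q_src == consume*q_dst
    scale = lcm(q_src.denominator, q_dst.denominator)
    return int(q_src * scale), int(q_dst * scale)

def simulate(produce, consume, q_src, q_dst):
    """Fire actors in a simple demand-driven order; return the tokens left
    after one iteration (should be 0) and the peak buffer fill."""
    tokens, peak = 0, 0
    fired_src = fired_dst = 0
    while fired_src < q_src or fired_dst < q_dst:
        if tokens >= consume and fired_dst < q_dst:
            tokens -= consume               # destination actor fires
            fired_dst += 1
        else:
            tokens += produce               # source actor fires
            fired_src += 1
            peak = max(peak, tokens)
    return tokens, peak

if __name__ == "__main__":
    qa, qb = repetition_vector(2, 3)
    print(qa, qb)                 # 3 firings of A, 2 of B per iteration
    print(simulate(2, 3, qa, qb)) # buffer returns to 0; peak fill is 4
```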
An in-depth introduction to synchronous languages is presented in The Synchronous Hypothesis and
Synchronous Languages. Before introducing the synchronous languages, the chapter discusses the concept
of synchronous hypothesis: the basic notion, mathematical models, and implementation issues. Sub-
sequently, it overviews the structural languages used for modeling and programming synchronous
applications. Imperative languages, Esterel and SyncCharts, provide constructs to deal with control-
dominated programs. Declarative languages, Lustre and Signal, are particularly suited for applications
based on intensive data computation and dataow organization. Future trends are also covered.
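The synchronous hypothesis can be given a small hedged illustration (a Lustre-flavored counter node recast in Python; the node and its streams are hypothetical, not taken from the chapter): at each logical instant the program reads all its inputs and computes all its outputs "instantaneously", and only then does logical time advance.

```python
# Illustrative sketch of a Lustre-style node:
#   count = 0 -> if reset then 0 else pre(count) + (1 if tick else 0)
# One output value per logical instant; pre(count) is the previous value.

def counter_node(ticks, resets):
    out, pre = [], 0
    for tick, reset in zip(ticks, resets):
        cur = 0 if reset else pre + (1 if tick else 0)
        out.append(cur)   # output of the current instant
        pre = cur         # becomes pre(count) at the next instant
    return out

if __name__ == "__main__":
    print(counter_node([1, 1, 0, 1, 1], [0, 0, 0, 1, 0]))  # [1, 2, 2, 0, 1]
```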
The chapter Introduction to UML and the Modeling of Embedded Systems gives an overview of the
use of UML (Unified Modeling Language) for modeling embedded systems. The chapter presents a
brief overview of UML and discusses UML features suited to represent the characteristics of embedded
systems. The UML constructs, the language use, and other issues are introduced through an example
of an automatic teller machine. The chapter also briefly discusses a standardized UML profile (a spe-
cification language instantiated from the UML language family) suitable for modeling of embedded
systems.
A comprehensive survey and overview of verification languages is presented in the chapter Verification
Languages. It describes languages for verification of hardware, software, and embedded systems. The focus
is on the support that a verification language provides for dynamic verification based on simulation,
as well as static verification based on formal techniques. Before discussing the languages, the chapter
provides some background on verification methods. This part introduces basics of simulation-based
verification, formal verification, and assertion-based verification. It also discusses selected logics that
form the basis of languages described in the chapter: propositional logic, first-order predicate logic,
temporal logics, and regular and ω-regular languages. The hardware verification languages (HVLs) covered
include: e, OpenVera, Sugar/PSL, and ForSpec. The languages for software verification overviewed include
programming languages: C/C++, and Java; and modeling languages: UML, SDL, and Alloy. Languages
for SoCs and embedded systems verification include system-level modeling languages: SystemC, SpecC,
and SystemVerilog. The chapter also surveys domain-specific verification efforts, such as those based on
Esterel and hybrid systems.
Operating Systems and Quasi-Static Scheduling
This subsection offers a comprehensive introduction to real-time and embedded operating systems to cover
fundamentals and selected advanced issues. To complement this material with new developments, it gives
an overview of the operating system interfaces specied by the POSIX 1003.1 international standard and
related to real-time programming and introduces a class of operating systems based on virtual machines.
The subsection also includes research material on quasi-static scheduling.
The chapter Real-Time Embedded Operating Systems: Standards and Perspectives provides a compre-
hensive introduction to the main features of real-time embedded operating systems. It overviews some
of the main design and architectural issues of operating systems: system architectures, process and
thread model, processor scheduling, interprocess synchronization and communication, and network sup-
port. The chapter presents a comprehensive overview of the operating system interfaces specied by
the POSIX 1003.1 international standard and related to real-time programming. It also gives a short
description of selected open-source real-time operating systems to include eCos, µClinux, RT-Linux and
RTAI, and RTEMS. The chapter also presents a fairly comprehensive introduction to a class of operating
systems based on virtual machines.
Task scheduling algorithms and resource management policies, put in the context of real-time
systems, are the main focus of the chapter Real-Time Embedded Operating Systems: The Schedul-
ing and Resource Management Aspects. The chapter discusses in detail periodic task handling to
include Timeline Scheduling (TS), Rate-Monotonic (RM) scheduling, the Earliest Deadline First (EDF)
algorithm, and approaches to handling tasks with deadlines less than their periods; and aperi-
odic task handling. Protocols for accessing shared resources discussed include the Priority Inherit-
ance Protocol (PIP) and the Priority Ceiling Protocol (PCP). Novel approaches for handling transient
overloads and execution overruns in soft real-time systems working in dynamic environments, which
provide efficient support for real-time multimedia systems, are also mentioned in the chapter.
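The EDF algorithm mentioned above can be sketched with a unit-time simulation (the task sets below are hypothetical, not drawn from the chapter); for periodic tasks with deadlines equal to periods, preemptive EDF meets all deadlines exactly when the total utilization sum(C_i/T_i) does not exceed 1:

```python
# Hedged sketch of preemptive EDF scheduling in discrete time.

def edf_schedule(tasks, horizon):
    """tasks: list of (wcet, period) with deadline == period.
    Simulate unit-time preemptive EDF; return True iff no deadline is missed."""
    jobs = []  # each job: [absolute_deadline, remaining_execution]
    for t in range(horizon):
        for wcet, period in tasks:
            if t % period == 0:
                jobs.append([t + period, wcet])      # job release
        ready = [j for j in jobs if j[1] > 0]
        if ready:
            min(ready, key=lambda j: j[0])[1] -= 1   # earliest deadline runs
        if any(j[0] == t + 1 and j[1] > 0 for j in jobs):
            return False                             # deadline miss
    return True

if __name__ == "__main__":
    print(edf_schedule([(1, 4), (2, 6), (3, 8)], 48))  # U ~ 0.96 -> schedulable
    print(edf_schedule([(2, 4), (2, 6), (3, 8)], 48))  # U ~ 1.21 -> miss
```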
The chapter Quasi-Static Scheduling of Concurrent Specifications presents methods for efficient
synthesis of uniprocessor software, with the aim of improving the speed of the scheduled design. The proposed
approach starts from a specification represented in terms of concurrent communicating processes, derives
an intermediate representation based on Petri nets or Boolean Dataflow Graphs, and finally attempts
to obtain a sequential schedule to be implemented on a processor. The potential benefits result from
the replacement of explicit communication among processes by data assignment, and from the reduced
number of context switches owing to the reduction in the number of processes.
Timing and Performance Analysis
Many embedded systems, particularly hard real-time systems, impose strict restrictions on the execution
time of tasks which are required to be completed within certain time bounds. For this class of systems,
schedulability analysis requires the upper bounds for the execution times of all tasks to be known in
order to verify whether the system meets its timing requirements. The chapter Determining Bounds on
Execution Times presents the architecture of the aiT timing-analysis tool and the approach to timing analysis
implemented in the tool. In the process, the chapter discusses cache-behavior prediction, pipeline analysis,
path analysis using integer linear programming, and other issues. The use of this approach is put in the
context of upper bounds determination. In addition, the chapter gives a brief overview of other approaches
to timing analysis.
The validation of nonfunctional requirements of selected implementation aspects such as deadlines,
throughputs, buffer space, power consumption, etc., comes under performance analysis. The chapter
Performance Analysis of Distributed Embedded Systems discusses issues behind performance analysis and its
role in the design process. It also surveys a few selected approaches to performance analysis for distributed
embedded systems to include simulation-based methods, holistic scheduling analysis, and compositional
methods. Subsequently, the chapter introduces the performance network approach, which is, as stated by
the authors, influenced by the worst-case analysis of communication networks. The presented approach allows one to
obtain upper and lower bounds on quantities such as end-to-end delay and buffer space; it also covers
all possible corner cases independent of their probability.
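Since the performance network approach is, as the authors state, influenced by the worst-case analysis of communication networks, the flavor of such bounds can be sketched with a classical network-calculus result (an illustrative computation, not the chapter's actual framework): for token-bucket arrivals alpha(t) = b + r*t served with a rate-latency curve beta(t) = R*max(0, t - T), the delay bound is T + b/R and the backlog bound is b + r*T, provided r <= R.

```python
# Hedged illustration of classical network-calculus bounds (not the
# chapter's framework): token-bucket arrival curve vs. rate-latency server.
#   delay   <= T + b/R   (maximum horizontal deviation between the curves)
#   backlog <= b + r*T   (maximum vertical deviation), for r <= R.

def delay_bound(b, r, R, T):
    assert r <= R, "stability requires arrival rate <= service rate"
    return T + b / R

def backlog_bound(b, r, R, T):
    assert r <= R, "stability requires arrival rate <= service rate"
    return b + r * T

if __name__ == "__main__":
    # Hypothetical flow: burst 4 kbit, rate 1 kbit/ms, server 2 kbit/ms, latency 3 ms
    print(delay_bound(4, 1, 2, 3))    # 5.0 ms end-to-end delay bound
    print(backlog_bound(4, 1, 2, 3))  # 7 kbit buffer bound
```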
Power Aware Computing
Embedded nodes, or devices, are frequently battery powered. The growing power dissipation, with
the increase in density of integrated circuits and clock frequency, has a direct impact on the cost of
packaging and cooling, as well as reliability and lifetime. These and other factors make the design
for low power consumption a high priority for embedded systems. The chapter Power Aware Embed-
ded Computing presents a survey of design techniques and methodologies aimed at reducing static and
dynamic power dissipation. The chapter discusses energy and power modeling to include instruction
level and function level power models, micro-architectural power models, memory and bus models, and
battery models. Subsequently, the chapter discusses system/application level optimizations which explore
different task implementations exhibiting different power/energy versus quality-of-service characterist-
ics. Energy-efficient processing subsystems: voltage and frequency scaling, dynamic resource scaling, and
processor core selection, are also overviewed in the chapter. Finally, the chapter discusses energy-efficient
memory subsystems: cache hierarchy tuning, novel horizontal and vertical cache partitioning schemes,
dynamic scaling of memory elements, software-controlled memories, scratch-pad memories, improving
access patterns to on-chip memory, special-purpose memory subsystems for media streaming, code
compression, and interconnect optimizations.
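The rationale behind voltage and frequency scaling can be sketched with the textbook CMOS dynamic-power model (the capacitance, voltages, and cycle counts below are illustrative assumptions, not the chapter's data): dynamic power grows as C_eff * V^2 * f, while a task of N cycles runs for N/f seconds, so scaling V and f together cuts task energy roughly quadratically in V at the price of a linear slowdown.

```python
# Hedged sketch of why voltage/frequency scaling saves energy
# (textbook CMOS model, hypothetical numbers).

def dynamic_power(c_eff, volt, freq):
    """Dynamic power in watts: P ~ C_eff * V^2 * f."""
    return c_eff * volt ** 2 * freq

def task_energy(c_eff, volt, freq, cycles):
    """Energy for a task of `cycles` cycles: power times execution time N/f."""
    return dynamic_power(c_eff, volt, freq) * (cycles / freq)

if __name__ == "__main__":
    C, N = 1e-9, 2e8                    # hypothetical: 1 nF switched, 200M cycles
    full = task_energy(C, 1.2, 1e9, N)  # 1.2 V at 1 GHz
    half = task_energy(C, 0.6, 5e8, N)  # half voltage, half frequency
    print(full, half, full / half)      # energy drops ~4x; runtime doubles
```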
Security in Embedded Systems
There is a growing trend for networking of embedded systems. Representative examples of such systems
can be found in the automotive, train, and industrial automation domains. Many of those systems are required
to be connected to other networks to include LAN, WAN, and the Internet. For instance, there is a
growing demand for remote access to process data at the factory floor. This, however, exposes systems
to potential security attacks, which may compromise their integrity and cause damage. The limited
resources of embedded systems pose a considerable challenge for the implementation of effective security
policies which, in general, are resource demanding.
embedded systems is presented in the chapter Design Issues in Secure Embedded Systems. The chapter
outlines security requirements in computing systems, classifies abilities of attackers, and discusses security
implementation levels. Security constraints discussed in the context of embedded systems design include energy
considerations, processing power limitations, flexibility and availability requirements, and cost of imple-
mentation. Subsequently, the chapter presents the main issues in the design of secure embedded systems.
It also covers, in detail, attacks and countermeasures of cryptographic algorithm implementations in
embedded systems.
II System-on-Chip Design
Multi-Processor Systems-on-Chip (MPSoC), which combine the advantages of parallel processing with
the high integration levels of SoCs, emerged as a viable solution to meet the demand for computational
power required by applications such as network and media processors. The design of MPSoCs typically
involves integration of heterogeneous hardware and software IP components. However, the support for
reuse of hardware and software IP components is limited, thus potentially making the design process
labor-intensive, error-prone, and expensive. Selected component-based design methodologies for the
integration of heterogeneous hardware and software IP components are presented in this section together
with other issues such as design of ASIPs, communication architectures to include NoC, and platform-
based design, to mention a few. Those topics are presented in eight chapters introducing the SoC concept
and design issues; design of ASIPs; SoC communication architectures; principles and guidelines for
the NoC design; platform-based design principles; converter synthesis for incompatible protocols; a
component-based design automation approach for multiprocessor SoC platforms; an interface-centric
approach to the design and programming of embedded multiprocessors; and an STMicroelectronics-
developed exploration multiprocessor SoC platform.
A comprehensive introduction to the SoC concept, in general, and design issues is provided in the
chapter System-on-Chip and Network-on-Chip Design. The chapter discusses the basics of SoCs, IP cores, and virtual components; introduces the concept of architectural platforms and surveys selected industry offerings; and provides a comprehensive overview of the SoC design process.
A retargetable framework for ASIP design is presented in A Novel Methodology for the Design of
Application-Specific Instruction-Set Processors. The framework, which is based on machine descriptions in the LISA language, allows for automatic generation of software development tools, including an HLL C-compiler, assembler, linker, simulator, and graphical debugger frontend. In addition, synthesizable
hardware description language code can be derived for architecture implementation. The chapter also gives an overview of various machine description languages in the context of their suitability for the design of ASIPs; discusses the ASIP design flow; and describes the LISA language.
On-chip communication architectures are presented in the chapter State-of-the-Art SoC Communication Architectures. The chapter offers an in-depth description and analysis of the three most relevant architectures, from both industrial and research viewpoints: the ARM-developed AMBA (Advanced Microcontroller Bus Architecture) and its new interconnect schemes, namely Multi-Layer AHB and AMBA AXI; the IBM-developed CoreConnect; and the STMicroelectronics-developed STBus. In addition, the chapter surveys other architectures such as Wishbone, Sonics SiliconBackplane Micronetwork, Peripheral Interconnect Bus (PI-Bus), Avalon, and CoreFrame. The chapter also offers an analysis of selected architectures and extends the discussion of on-chip interconnects to NoC.
Basic principles and guidelines for NoC design are introduced in Network-on-Chip Design for Gigascale Systems-on-Chip. The chapter discusses the rationale for the design paradigm shift of SoC communication architectures from shared busses to NoCs, and briefly surveys related work. Subsequently, it presents details of NoC building blocks, including the switch, the network interface, and switch-to-switch links. In discussing design guidelines, the chapter uses a case study of a real NoC architecture (Xpipes), which employs some of the most advanced concepts in NoC design. It also discusses the issue of heterogeneous NoC design, and the effects of mapping the communication requirements of an application onto a domain-specific NoC.
An authoritative discussion of the platform-based design (PBD) concept is provided in the chapter
Platform-Based Design for Embedded Systems. The chapter introduces PBD principles and outlines the interplay between micro-architecture platforms and the Application Program Interface (API), or programmer model, which is a unique abstract representation of the architecture platform via the software layer. The chapter also introduces three applications of PBD: network platforms for communication protocol design, fault-tolerant platforms for the design of safety-critical applications, and analog platforms for mixed-signal integrated circuit design.
An approach to the synthesis of interface converters for incompatible protocols in component-based design automation is presented in Interface Specification and Converter Synthesis. The chapter surveys several approaches for synthesizing converters, illustrated by simple examples. It also introduces more advanced frameworks based on abstract algebraic solutions that guarantee converter correctness.
The chapter Hardware/Software Interface Design for SoC presents a component-based design automation approach for MPSoC platforms. It briefly surveys basic concepts of MPSoC design and discusses some related platform- and component-based approaches. It provides a comprehensive overview of hardware/software IP integration issues, including bus-based and core-based approaches, integrating software IP, communication synthesis (the concept is presented in detail in Interface Specification and Converter Synthesis), and IP derivation. The focal point of the chapter is a new component-based design methodology and design environment for the integration of heterogeneous hardware and software IP components. The presented methodology, which adopts the automatic communication synthesis approach and uses a high-level API, generates both hardware and software wrappers, as well as a dedicated operating system for programmable components. The IP integration capabilities of the approach and the accompanying software tools are illustrated by redesigning a part of a VDSL modem.
The chapter Design and Programming of Embedded Multiprocessors: An Interface-Centric Approach
presents a design methodology for implementing media processing applications as MPSoCs centered
around the Task Transaction Level (TTL) interface. The TTL interface can be used to build
executable specifications; it also provides a platform interface for implementing applications as communicating hardware and software tasks on a platform infrastructure. The chapter introduces the TTL interface in the context of its requirements, and discusses the mapping technology that supports structured design and programming of embedded multiprocessor systems. The chapter also presents two case studies of implementations of the TTL interface on different architectures: a multi-DSP
architecture, using an MP3 decoder application to evaluate this implementation; and a smart-imaging
multiprocessor.
The STMicroelectronics-developed StepNP™ flexible MPSoC platform and its key architectural components are described in A MultiProcessor SoC Platform and Tools for Communications Applications. The
platform was developed with the aim of exploring tool and architectural issues in a range of high-speed communications applications, particularly packet processing applications used in network infrastructure SoCs. Subsequently, the chapter reviews the MultiFlex modeling and analysis tools developed to support the StepNP platform. The MultiFlex environment supports two parallel programming models: a distributed system object component (DSOC) message-passing model and a symmetrical multiprocessing (SMP) model using shared memory. It maps these models onto the StepNP MPSoC platform. The use of the platform and supporting environment is illustrated by two examples mapping IPv4 packet forwarding and traffic management applications onto the StepNP platform. Detailed results are presented and discussed for a range of architectural parameters.
III Testing of Embedded Core-Based Integrated Circuits
The ever-increasing circuit densities and operating frequencies, as well as the use of SoC designs, have resulted in an enormous test data volume for today's embedded core-based integrated circuits. According to the Semiconductor Industry Association, in the International Technology Roadmap for Semiconductors (ITRS), 2001 Edition, the density of ICs can reach 2 billion transistors per square centimeter, and 16 billion transistors per chip are likely by 2014. On that basis, according to some estimates (A. Khoche and J. Rivoir, I/O bandwidth bottleneck for test: is it real? Test Resource Partitioning Workshop, 2002), the test data volume for ICs in 2014 is likely to increase 150-fold relative to 1999. Other problems
include the growing disparity between the performance of the design and that of the automatic test equipment, which makes at-speed testing, particularly of high-speed circuits, a challenge and results in increasing yield loss; the high cost of manually developed functional tests; and the growing cost of high-speed and high-pin-count testers. This section contains two chapters introducing new techniques addressing some of the issues
indicated above.
The chapter Modular Testing and Built-In Self-Test of Embedded Cores in System-on-Chip Integrated
Circuits presents a survey of techniques that have been proposed in the literature for reducing test time
and test data volume. The techniques surveyed rely on modular testing of embedded cores and built-in
self-test (BIST). The material on modular testing of embedded cores in a system-on-a-chip describes wrapper design and optimization, test access mechanism (TAM) design and optimization, test scheduling, integrated TAM optimization and test scheduling, and modular testing of mixed-signal SoCs. In addition, the chapter reviews a recent deterministic BIST approach in which a reconfigurable interconnection network (RIN) is placed between the outputs of the linear-feedback shift register (LFSR) and the inputs of the scan chains in the circuit under test. The RIN, which consists only of multiplexer switches, replaces the phase shifter that is typically used in pseudo-random BIST to reduce correlation between the test data bits that are fed into the scan chains. The proposed approach does not require any circuit redesign and has minimal impact on circuit performance.
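The LFSR-based pattern source described above is easiest to picture with a small model. The following sketch (a generic Fibonacci LFSR; the 4-bit width and tap positions are illustrative and not taken from the chapter) shows how such a register cycles through pseudo-random test states:

```python
def lfsr_sequence(seed, taps, width, n):
    """Generate n successive states of a Fibonacci LFSR.

    seed  -- nonzero initial register state
    taps  -- bit positions XORed together to form the feedback bit
    width -- register width in bits
    n     -- number of states to emit
    """
    state = seed
    out = []
    for _ in range(n):
        out.append(state)
        # Feedback bit: XOR of the tapped bit positions.
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1
        # Shift left, inject feedback at bit 0, truncate to width.
        state = ((state << 1) | fb) & ((1 << width) - 1)
    return out

# A maximal-length 4-bit LFSR (taps at bits 3 and 2) visits all
# 15 nonzero states before repeating.
states = lfsr_sequence(seed=0b1000, taps=(3, 2), width=4, n=16)
```

A maximal-length configuration visits every nonzero state exactly once per period, which is what makes the LFSR a compact on-chip source of pseudo-random scan-chain stimuli.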
Hardware-based self-testing (BIST) techniques have limitations due to performance, area, and design time overhead, as well as problems caused by the application of nonfunctional patterns (which may result in higher power consumption during testing, over-testing, yield loss problems, etc.). Embedded software-based self-testing has the potential to alleviate the problems caused by using external testers, as well as structural BIST problems. It utilizes on-chip programmable resources (such as embedded microprocessors and DSPs) for on-chip test generation, test delivery, signal acquisition, response analysis, and even diagnosis. The chapter Embedded Software-Based Self-Testing for SoC Design discusses processor self-test methods targeting stuck-at faults and delay faults; presents a brief description of a processor self-diagnosis method; presents methods for self-testing of buses and global
interconnects as well as other nonprogrammable IP cores on a SoC; describes instruction-level design-for-testability (DfT) methods based on the insertion of test instructions to increase fault coverage and reduce test application time and test program size; and outlines DSP-based self-test for analog/mixed-signal components.
IV Networked Embedded Systems
Networked embedded systems (NES) are essentially spatially distributed embedded nodes (implemented on a board or, in the future, on a single chip) interconnected by means of a wireline and/or wireless communication infrastructure and protocols, interacting with the environment (via sensor/actuator elements) and with each other, and possibly with a master node performing control and coordination functions in order to achieve certain goal(s). An example of a networked embedded system is an in-vehicle embedded network comprising a collection of ECUs networked by means of safety-critical communication protocols, such as FlexRay or TTP/C, for the purpose of controlling vehicle functions such as electronic engine control, anti-lock braking, active suspension, etc. (for details of automotive applications see the last section of the book).
An excellent introduction to NES is presented in the chapter Design Issues in Networked Embedded Systems. This chapter outlines some of the most representative characteristics of NES, and surveys potential
applications. It also explains design issues for large-scale distributed NES, such as environment interaction, life expectancy of nodes, communication protocols, reconfigurability, security, energy constraints, operating systems, etc. Design methodologies and tools are discussed as well.
The topic of middleware for NES is addressed in Middleware Design and Implementation for Networked
Embedded Systems. This chapter discusses the role of middleware in NES and the challenges in its design and implementation, such as remote communication, location independence, reuse of existing infrastructure, providing real-time assurances, providing a robust DOC middleware, reducing the middleware footprint, and support for simulation environments. The focal points of the chapter are the sections describing the design and implementation of nORB (a small-footprint real-time object request broker tailored to specific embedded sensor/actuator applications), and the rationale behind the adopted approach to addressing the NES design and implementation challenges.
V Sensor Networks
Distributed (wireless) sensor networks are a relatively new and exciting proposition for collecting sensory data in a variety of environments. The design of this kind of network poses particular challenges due to limited computational power and memory size, bandwidth restrictions, power consumption restrictions if battery powered, communication requirements, and unattended modes of operation in the case of inaccessible and/or hostile environments, to mention a few. This section provides a fairly comprehensive discussion of the design issues related to, in particular, self-organizing wireless networks. It introduces fundamental concepts behind sensor networks; discusses architectures, energy-efficient Medium Access Control (MAC), time synchronization, distributed localization, routing, distributed signal processing, and security; and surveys selected software solutions.
A general introduction to the area of wireless sensor networks is provided in Introduction to Wireless
Sensor Networks. A comprehensive overview of the topic is provided in Issues and Solutions in Wireless
Sensor Networks, which introduces fundamental concepts, selected application areas, design challenges,
and other relevant issues.
The chapter Architectures for Wireless Sensor Networks provides an excellent introduction to various
aspects of the architecture of wireless sensor networks. It includes the description of a sensor node
architecture and its elements: sensor platform, processing unit, communication interface, and power
source. In addition, it presents a mathematical model of power consumption by a node, to account for
energy consumption by radio, processor, and sensor elements. The chapter also discusses architectures
of wireless sensor networks developed following the protocol stack approach and the EYES project approach. In the context of the EYES project approach, which consists of only two key system abstraction layers, namely the sensor and networking layer and the distributed services layer, the chapter discusses the distributed services required to support applications for wireless sensor networks and the approaches adopted by various projects.
Energy efficiency is one of the main issues in developing MAC protocols for wireless sensor networks. This is largely due to unattended operation and battery-based power supply, and the need for collaboration that results from the limited capabilities of individual nodes. Energy-Efficient Medium Access Control offers a comprehensive overview of the issues involved in the design of MAC protocols. It contains a discussion of MAC requirements for wireless sensor networks, such as the hardware characteristics of the node, communication patterns, and others. It surveys 20 medium access protocols specially designed for sensor networks and optimized for energy efficiency. It also discusses the qualitative merits of different organizations: contention-based, slotted, and TDMA-based protocols. In addition, the chapter provides a simulation-based comparison of the performance and energy efficiency of four MAC protocols: Low Power Listening, S-MAC, T-MAC, and L-MAC.
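As a rough illustration of why radio duty cycling dominates these protocol designs, consider a back-of-the-envelope average-power model (the power figures below are illustrative assumptions, not numbers from the chapter):

```python
def average_power(duty_cycle, p_active_mw=20.0, p_sleep_mw=0.02):
    """Average radio power draw for a given listen duty cycle.

    duty_cycle  -- fraction of time the radio is awake (0..1)
    p_active_mw -- draw while listening/receiving (assumed figure)
    p_sleep_mw  -- draw while sleeping (assumed figure)
    """
    return duty_cycle * p_active_mw + (1.0 - duty_cycle) * p_sleep_mw

always_on = average_power(1.0)     # 20.0 mW
duty_cycled = average_power(0.01)  # 0.2198 mW, roughly a 90x saving
```

Under these assumed figures, listening only 1% of the time cuts average power by nearly two orders of magnitude, which is the basic trade all the surveyed protocols manage against latency and throughput.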
Knowledge of time at a sensor node may be essential for the correct operation of the system. The Time Division Multiple Access (TDMA) scheme (adopted in the TTP/C and FlexRay protocols, for instance; see the section on automotive applications) requires the nodes to be synchronized. Time synchronization issues in sensor networks are discussed in Overview of Time Synchronization Issues in Sensor Networks. The chapter introduces the basics of time synchronization for sensor networks. It also describes design challenges and requirements in developing time synchronization protocols, such as the need to be robust, energy aware, able to operate correctly in the absence of time servers (server-less), and light-weight, and to offer a tunable service. The chapter also overviews factors influencing time synchronization, such as temperature, phase noise, frequency noise, asymmetric delays, and clock glitches. Subsequently, different types of timing techniques are discussed: the Network Time Protocol (NTP), the Timing-sync Protocol for Sensor Networks (TPSN), Reference-Broadcast Synchronization (RBS), and the Time-Diffusion Synchronization Protocol (TDP).
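The effect of asymmetric delays mentioned above is easiest to see in the classic two-way time-transfer calculation on which NTP- and TPSN-style protocols are built. A minimal sketch (the tick values are hypothetical; the symmetric-delay assumption is exactly what asymmetric links violate):

```python
def estimate_offset_delay(t1, t2, t3, t4):
    """Two-way time transfer, as used by NTP- and TPSN-style protocols.

    t1 -- request sent (client clock)
    t2 -- request received (server clock)
    t3 -- reply sent (server clock)
    t4 -- reply received (client clock)

    Assumes symmetric propagation delay; an asymmetric link biases
    the offset estimate by half the delay difference.
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2.0
    delay = ((t2 - t1) + (t4 - t3)) / 2.0
    return offset, delay

# Server clock 5 ticks ahead of the client, one-way delay 2 ticks.
offset, delay = estimate_offset_delay(t1=10, t2=17, t3=18, t4=15)
```

With the example values the client recovers an offset of 5 ticks and a round-trip delay of 4 ticks (2 each way), but only because the two directions were equally slow.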
Knowledge of the location of nodes is essential for the base station to process information from sensors and to arrive at valid and meaningful results. Localization issues in ad hoc wireless sensor networks are discussed in Distributed Localization Algorithms. The focus of the presentation is on three distributed localization algorithms for large-scale ad hoc sensor networks that meet the basic requirements of self-organization, robustness, and energy efficiency: ad hoc positioning by Niculescu and Nath, N-hop multilateration by Savvides et al., and robust positioning by Savarese et al. The selected algorithms are evaluated by simulation.
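The flavor of these algorithms can be conveyed by the basic multilateration step they build on: estimating a node's position from ranges to anchors with known positions. A minimal least-squares sketch (2-D, noise-free ranges; the function name and setup are illustrative, not from the chapter):

```python
import math

def multilaterate(anchors, distances):
    """Least-squares 2-D position from ranges to >= 3 known anchors.

    Linearizes the range equations by subtracting the last anchor's
    equation from the others, then solves the 2x2 normal equations
    with Cramer's rule (no matrix library needed on a small node).
    """
    (xn, yn), dn = anchors[-1], distances[-1]
    rows = []
    for (x, y), d in zip(anchors[:-1], distances[:-1]):
        a = 2.0 * (xn - x)
        b = 2.0 * (yn - y)
        c = d * d - dn * dn - x * x + xn * xn - y * y + yn * yn
        rows.append((a, b, c))
    # Normal equations (A^T A) p = A^T c for the unknown position p.
    saa = sum(a * a for a, b, c in rows)
    sab = sum(a * b for a, b, c in rows)
    sbb = sum(b * b for a, b, c in rows)
    sac = sum(a * c for a, b, c in rows)
    sbc = sum(b * c for a, b, c in rows)
    det = saa * sbb - sab * sab
    return ((sac * sbb - sbc * sab) / det,
            (saa * sbc - sab * sac) / det)

# Node at (3, 4), three anchors, exact ranges.
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
ranges = [math.hypot(3.0 - x, 4.0 - y) for x, y in anchors]
position = multilaterate(anchors, ranges)
```

Real deployments add noisy range estimates and anchors whose own positions are only estimated, which is where the robustness and iterative refinement of the three surveyed algorithms come in.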
In order to forward information from a sensor node to the base station or to another node for processing, the node requires routing information. The chapter Routing in Sensor Networks provides a comprehensive survey of routing protocols used in sensor networks. The presentation is divided into flat routing protocols: Sequential Assignment Routing (SAR), directed diffusion, the minimum cost forwarding approach, the Integer Linear Program (ILP)-based routing approach, Sensor Protocols for Information via Negotiation (SPIN), geographic routing protocols, the parametric probabilistic routing protocol, and Min-MinMax; and cluster-based routing protocols: Low Energy Adaptive Clustering Hierarchy (LEACH), Threshold sensitive Energy Efficient sensor Network protocol (TEEN), and a two-level clustering algorithm.
Due to their limited resources, sensor nodes frequently provide incomplete information on the objects of their observation. Thus, the complete information has to be reconstructed from data obtained from many nodes, which frequently provide redundant data. Distributed data fusion is one of the major challenges in sensor networks. The chapter Distributed Signal Processing in Sensor Networks introduces a novel mathematical model for distributed information fusion, which focuses on solving a benchmark signal processing problem (spectrum estimation) using sensor networks.
With the deployment of sensor networks in areas such as the battlefield or the factory floor, security becomes of paramount importance, and a challenge. Existing solutions are impractical due to the limited capabilities (processing power, available memory, and available energy) of sensor nodes. The chapter
Sensor Network Security gives an introduction to selected security challenges specific to wireless sensor networks: denial of service and routing security, energy-efficient confidentiality and integrity, authenticated broadcast, alternative approaches to key management, and secure data aggregation. Subsequently, it discusses in detail some of the proposed approaches and solutions: the SNEP and µTESLA protocols for confidentiality and integrity of data, and the LEAP protocol and probabilistic key management for key management, to mention a few.
The chapter Software Development for Large-Scale Wireless Sensor Networks presents basic concepts related to software development for wireless sensor networks, as well as selected software solutions. The solutions include: TinyOS, a component-based operating system, and related software packages; Maté, a byte-code interpreter; and TinyDB, a query processing system for extracting information from a network of TinyOS sensor nodes. SensorWare, a software framework for wireless sensor networks, provides querying, dissemination, and fusion of sensor data, as well as coordination of actuators. MiLAN (Middleware Linking Applications and Networks), a middleware concept, aims to exploit the information redundancy provided by sensor nodes. EnviroTrack, a TinyOS-based application, provides a convenient way to program sensor network applications that track activities in their physical environment. SeNeTs, a middleware architecture for wireless sensor networks, is designed to support the pre-deployment phase. The chapter also discusses software solutions for the simulation, emulation, and testing of large-scale sensor networks: TinyOS SIMulator (TOSSIM), a simulator based on the TinyOS framework; EmStar, a software environment for developing and deploying applications for sensor networks consisting of 32-bit embedded Microserver platforms; and SeNeTs, a test and validation environment.
VI Embedded Applications
The last section in the book, Embedded Applications, focuses on selected applications of embedded systems. It covers the automotive field, industrial automation, and intelligent sensors. The aim of this section is to introduce examples of actual embedded applications in fast-evolving areas which, for various reasons, have not received proper coverage in other publications, particularly in the automotive area.
Automotive Networks
The automotive industry is aggressively adopting mechatronic solutions to replace or duplicate existing
mechanical/hydraulic systems. The embedded electronic systems together with dedicated communication
networks and protocols play pivotal roles in this transition. This subsection contains three chapters that offer a comprehensive overview of the area by presenting topics such as networks and protocols, operating systems and other middleware, scheduling, safety and fault tolerance, and actual development tools used by the automotive industry.
This section begins with a contribution entitled Design and Validation Process of In-Vehicle Embedded Electronic Systems, which provides a comprehensive introduction to the use of embedded systems in automobiles, their design and validation methods, and tools. The chapter identifies and describes a number of specific application domains for in-vehicle embedded systems, such as power train, chassis, body, and telematics and HMI. It then outlines some of the main standards used in the automotive industry to ensure interoperability between components developed by different vendors; this includes networks and protocols, as well as operating systems. The surveyed networks and protocols include (for details of networks and protocols see The Industrial Communication Technology Handbook, CRC Press, 2005, Richard Zurawski, editor) Controller Area Network (CAN), Vehicle Area Network (VAN), J1850, TTP/C (Time-Triggered Protocol), FlexRay, Local Interconnect Network (LIN), Media Oriented System Transport (MOST), and IDB-1394. This material is followed by a brief introduction to OSEK/VDX (Offene Systeme und deren Schnittstellen für die Elektronik im Kraftfahrzeug), a multitasking operating system that has become a standard for automotive applications in Europe. The chapter introduces a new language, EAST-ADL, which offers support for an unambiguous description of in-vehicle embedded electronic
systems at each level of their development. The discussion of the design and validation process and related issues is facilitated by a comprehensive case study drawn from an actual PSA Peugeot-Citroën application. This case study is essential reading for those interested in the development of this kind of embedded system.
The planned adoption of X-by-wire technologies in automotive applications has pushed the automotive industry into the realm of safety-critical systems. There is a substantial body of literature on safety-critical issues and fault tolerance, particularly as applied to components and systems. Less has been published on safety-relevant communication services and fault-tolerant communication systems as mandated by X-by-wire technologies in automotive applications. This is largely due to the novelty of fast-evolving concepts and solutions, pursued mostly by industrial consortia. These two topics are presented in detail in Fault-Tolerant Services for Safe In-Car Embedded Systems. The material on safety-relevant
communication services discusses some of the main services and functionalities that the communication system should provide to facilitate the design of fault-tolerant automotive applications. This includes services supporting reliable communication, such as robustness against electromagnetic interference (EMI), time-triggered transmission, global time, atomic broadcast, and the avoidance of "babbling idiots." Also discussed are higher-level services that provide fault-tolerant mechanisms belonging conceptually to layers above the MAC layer in the OSI reference model, namely the group membership service, management of node redundancy, support for functioning modes, etc. The chapter also discusses fault-tolerant communication protocols, including TTP/C, FlexRay, and variants of CAN (TTCAN, RedCAN, and CANcentrate).
The Volcano concept for the design and implementation of in-vehicle networks using the standardized CAN and LIN communication protocols is presented in the chapter Volcano: Enabling Correctness by Design. This chapter provides an in-depth description of the Volcano approach and a suite of software tools, developed by Volcano Communications Technologies AG, which supports requirements capture, model-based design, automatic code generation, and system-level validation capabilities. This is an example of an actual development environment widely used by the automotive industry.
Industrial Automation
The current trend toward flexible and distributed control and automation has accelerated the migration of intelligence and control functions to the field devices, particularly sensors and actuators. The increased processing capabilities of those devices were instrumental in the emergence of a trend toward networking of field devices around industrial data networks, thus making access to any device from any place in the plant, or even globally, technically feasible. The benefits are numerous, including increased flexibility, improved system performance, and ease of system installation, upgrade, and maintenance. Embedded web servers are increasingly used in industrial automation to provide a Human-Machine Interface (HMI), which allows for web-based configuration, control, and monitoring of devices and industrial processes.
An introduction to the design of embedded web servers is presented in the chapter Embedded Web Servers in Distributed Control Systems. The focus of this chapter is on Field Device Web Servers (FDWS). The chapter provides a comprehensive overview of the context in which embedded web servers are usually implemented, as well as the structure of an FDWS application, with a presentation of its component packages and the mutual relationship between the content of the packages and the architecture of a typical embedded site. All this is discussed in the context of an actual FDWS implementation and application deployed at one of the Alstom (France) sites.
Remote access to field devices may lead to many security challenges. Embedded web servers are typically run on processors with limited memory and processing power. These restrictions necessitate the deployment of lightweight security mechanisms. Vendor-tailored versions of standard security protocol suites such as Secure Sockets Layer (SSL) and IP Security Protocol (IPSec) may still not be suitable due to their excessive demand for resources. In applications restricted to the Hypertext Transfer Protocol (HTTP), Digest Access Authentication (DAA), a security extension to HTTP, offers an alternative and viable solution. These issues are discussed in the chapter HTTP Digest Authentication for Embedded Web
Servers. This chapter overviews the mechanisms and services, as well as potential applications, of HTTP Digest Authentication. It also surveys selected embedded web server implementations for their support of DAA. These include Apache 2.0.42, Allegro RomPager 4.05, and GoAhead 2.1.2.
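The DAA mechanism referred to above boils down to a challenge-response computation over MD5 hashes, as specified in RFC 2617. A minimal sketch of the response computation (the basic variant without the qop extension; the credentials and nonce below are hypothetical):

```python
import hashlib

def digest_response(username, realm, password, method, uri, nonce):
    """RFC 2617 Digest response, basic variant (MD5, no qop/cnonce).

    HA1      = MD5(username ":" realm ":" password)
    HA2      = MD5(method ":" uri)
    response = MD5(HA1 ":" nonce ":" HA2)
    """
    def md5_hex(s):
        return hashlib.md5(s.encode("utf-8")).hexdigest()

    ha1 = md5_hex(f"{username}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{ha2}")

# Hypothetical field-device credentials (not from the chapter).
resp = digest_response("device-admin", "fdws", "secret",
                       "GET", "/status", "0123456789abcdef")
```

Because only hashes cross the wire, the password itself is never transmitted, and the computation needs nothing beyond an MD5 routine, which is what makes DAA attractive on resource-constrained web servers.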
Intelligent Sensors
Advances in the design of embedded systems, the availability of tools, and falling fabrication costs have allowed for cost-effective migration of intelligence and control functions to the field devices, particularly sensors and actuators. Intelligent sensors combine computing, communication, and sensing functions. The trend toward increased functional complexity of those devices necessitates the use of formal description techniques and supporting tools throughout the design and implementation process. The chapter Intelligent Sensors: Analysis and Design tackles some of those issues. It reviews some of the main characteristics of a generic intelligent sensor formal model; subsequently, it discusses an implementation of the model using the CAP language, which was developed specifically for the design of intelligent sensors. A brief introduction to the language is also provided. The whole development process is illustrated using the example of a simple distance-measuring system comprising an ultrasonic transmitter and two receivers.
Locating Topics
To assist readers with locating material, a complete table of contents is presented at the front of the book.
Each chapter begins with its own table of contents. Two indexes are provided at the end of the book: the
index of authors contributing to the book, together with the titles of their contributions, and a detailed
subject index.
Richard Zurawski
Acknowledgments
My gratitude goes to Luciano Lavagno, Grant Martin, and Alberto Sangiovanni-Vincentelli who have
provided advice and support while preparing this book. This book would never have had a chance to
take off without their assistance. Andreas Willig helped with identifying some authors for the section on
Sensor Networks. Also, I would like to thank the members of the International Advisory Board for their
help with the organization of the book and selection of authors. I have received tremendous cooperation
from all contributing authors. I would like to thank all of them for that. I would like to express gratitude
to my publisher Nora Konopka, and other Taylor and Francis staff involved in the book production,
particularly Jessica Vakili, Elizabeth Spangenberger, and Gail Renard. My love goes to my wife who
tolerated the countless hours I spent on preparing this book.
About the Editor
Dr. Richard Zurawski is president of ISA Group, San Francisco and Santa Clara, CA, involved in providing
solutions to Fortune 1000 companies. Prior to that, he held various executive positions with San Francisco
Bay area based companies. Dr. Zurawski is a cofounder of the Institute for Societal Automation, Santa
Clara, a research and consulting organization.
Dr. Zurawski has close to thirty years of academic and industrial experience, including a regular
professorial appointment at the Institute of Industrial Sciences, University of Tokyo, and a full-time
R&D advisor position with Kawasaki Electric Corp., Tokyo. He provided consulting services to the Kawasaki Electric, Ricoh, and Toshiba Corporations, Japan, and participated in the 1990s in a number of Japanese Intelligent Manufacturing Systems programs.
Dr. Zurawski has served as editor at large for IEEE Transactions on Industrial Informatics, and associate
editor for IEEE Transactions on Industrial Electronics; he also served as associate editor for Real-Time
Systems: The International Journal of Time-Critical Computing Systems, Kluwer Academic Publishers. He
was a guest editor of four special sections in IEEE Transactions on Industrial Electronics and a guest editor
of a special issue of the Proceedings of the IEEE dedicated to industrial communication systems. In 1998,
he was invited by IEEE Spectrum to contribute material on Java technology to Technology 1999: Analysis
and Forecast Issues. Dr. Zurawski is series editor for The Industrial Information Technology Series, Taylor
and Francis Group, Boca Raton, FL.
Dr. Zurawski has served as a vice president of the Institute of Electrical and Electronics Engineers
(IEEE) Industrial Electronics Society (IES), and was on the steering committee of the ASME/IEEE Journal
of Microelectromechanical Systems. In 1996, he received the Anthony J. Hornfeck Service Award from the
IEEE Industrial Electronics Society.
Dr. Zurawski has served as a general, program, and track chair for a number of IEEE conferences and
workshops, and has published extensively on various aspects of formal methods in the design of real-time,
embedded, and industrial systems, MEMS, parallel and distributed programming and systems, as well as
control and robotics. He is the editor of The Industrial Information Technology Handbook (2004), and The
Industrial Communication Technology Handbook (2005), both published by Taylor and Francis Group.
Dr. Richard Zurawski received his M.Sc. in informatics and automation from the University of Mining and
Metallurgy, Krakow, Poland, and his Ph.D. in computer science from La Trobe University, Melbourne, Australia.
Contributors
Parham Aarabi
Department of Electrical and
Computer Engineering
University of Toronto
Ontario, Canada
José L. Ayala
Dpto. Ingenieria Electronica
E.T.S.I. Telecomunicacion
Ciudad Universitaria s/n
Madrid, Spain
João Paulo Barros
Universidade Nova de Lisboa
Faculdade de Ciências e
Tecnologia
Dep. Eng. Electrotécnica
Caparica, Portugal
Ali Alphan Bayazit
Princeton University
Princeton, New Jersey
Luca Benini
Dipartimento Elettronica
Informatica Sistemistica
University of Bologna
Bologna, Italy
Essaid Bensoudane
Advanced System Technology
STMicroelectronics
Ontario, Canada
Ivan Cibrario Bertolotti
IEIIT National Research
Council
Turin, Italy
Davide Bertozzi
Dipartimento Elettronica
Informatica Sistemistica
University of Bologna
Bologna, Italy
Jan Blumenthal
Institute of Applied
Microelectronics and
Computer Science
Dept. of Electrical
Engineering and
Information
Technology
University of Rostock
Rostock, Germany
Gunnar Braun
CoWare Inc.
Aachen, Germany
Giorgio C. Buttazzo
Dip. di Informatica e
Sistemistica
University of Pavia
Pavia, Italy
Luca P. Carloni
EECS Department
University of California at
Berkeley
Berkeley, California
Wander O. Cesário
SLS Group
TIMA Laboratory
Grenoble, France
Krishnendu Chakrabarty
Department of Electrical and
Computer Engineering
Duke University
Durham, North Carolina
S. Chatterjea
Faculty of Electrical Engineering,
Mathematics, and Computer
Science
University of Twente
Enschede
The Netherlands
Kwang-Ting (Tim) Cheng
Department of Electrical and
Computer Engineering
University of California
Santa Barbara, California
Anikó Costa
Universidade Nova de Lisboa,
Faculdade de Ciências e
Tecnologia
Dep. Eng. Electrotécnica
Caparica, Portugal
Mario Crevatin
Corporate Research
ABB Switzerland Ltd
Baden-Dättwil, Switzerland
Fernando De Bernardinis
EECS Department
University of California at
Berkeley
Berkeley, California
Erwin de Kock
Philips Research
Eindhoven, The Netherlands
Giovanni De Micheli
Gates Computer Science
Stanford University
Stanford, California
Robert de Simone
INRIA
Sophia-Antipolis, France
Eric Dekneuvel
University of Nice Sophia
Antipolis
Biot, France
S. Dulman
Faculty of Electrical Engineering,
Mathematics, and Computer
Science
University of Twente
Enschede
The Netherlands
Stephen A. Edwards
Department of Computer Science
Columbia University
New York, New York
Gerben Essink
Philips Research
Eindhoven, The Netherlands
A. G. Fragopoulos
Department of Electrical and
Computer Engineering
University of Patras
Patras, Greece
Shashidhar Gandham
The Department of Computer
Science
The University of Texas at Dallas
Richardson, Texas
Christopher Gill
Department of Computer Science
and Engineering
Washington University
St. Louis, Missouri
Frank Golatowski
Institute of Applied
Microelectronics and
Computer Science
Dept. of Electrical Engineering
and Information Technology
University of Rostock
Rostock, Germany
Luís Gomes
Universidade Nova de Lisboa
Faculdade de Ciências e
Tecnologia
Dep. Eng. Electrotécnica
Caparica, Portugal
Aarti Gupta
NEC Laboratories America
Princeton, New Jersey
Rajesh Gupta
Department of Computer Science
and Engineering
University of California at
San Diego
San Diego, California
Sumit Gupta
Tallwood Venture Capital
Palo Alto, California
Marc Haase
Institute of Applied
Microelectronics and
Computer Science
Dept. of Electrical Engineering
and Information Technology
University of Rostock
Rostock, Germany
Gertjan Halkes
Faculty of Electrical Engineering,
Mathematics, and Computer
Science
Delft University of Technology
Delft, The Netherlands
Matthias Handy
Institute of Applied
Microelectronics and
Computer Science
Dept. of Electrical Engineering
and Information
Technology
University of Rostock
Rostock, Germany
Hans Hansson
Department of Computer Science
and Engineering
Mälardalen University
Västerås, Sweden
P. Havinga
Faculty of Electrical Engineering,
Mathematics, and Computer
Science
University of Twente
Enschede
The Netherlands
Øystein Haugen
Department of Informatics
University of Oslo
Oslo, Norway
Tomas Henriksson
Philips Research
Eindhoven, The Netherlands
Andreas Hoffmann
CoWare Inc.
Aachen, Germany
T. Hoffmeijer
Faculty of Electrical Engineering,
Mathematics, and Computer
Science
University of Twente
Enschede
The Netherlands
J. Hurink
Faculty of Electrical Engineering,
Mathematics, and Computer
Science
University of Twente
Enschede
The Netherlands
Margarida F. Jacome
Department of Electrical and
Computer Engineering
University of Texas at Austin
Austin, Texas
Omid S. Jahromi
Bioscrypt Inc.
Markham, Ontario, Canada
Axel Jantsch
Department for Microelectronics
and Information Technology
Royal Institute of Technology
Kista, Sweden
A. A. Jerraya
SLS Group
TIMA Laboratory
Grenoble, France
J. V. Kapitonova
Glushkov Institute of Cybernetics
National Academy of Science of
Ukraine
Kiev, Ukraine
Alex Kondratyev
Cadence Berkeley Labs
Berkeley, California
Wido Kruijtzer
Philips Research
Eindhoven, The Netherlands
Koen Langendoen
Faculty of Electrical Engineering,
Mathematics, and Computer
Science
Delft University of Technology
Delft, The Netherlands
Michel Langevin
Advanced System Technology
STMicroelectronics
Ontario, Canada
Luciano Lavagno
Cadence Berkeley Laboratories
Berkeley, California; and
Dipartimento di Elettronica
Politecnico di Torino, Italy
A. A. Letichevsky
Glushkov Institute of Cybernetics
National Academy of Science
of Ukraine
Kiev, Ukraine
Marisa López-Vallejo
Dpto. Ingenieria Electronica
E.T.S.I. Telecomunicacion
Ciudad Universitaria s/n
Madrid, Spain
Damien Lyonnard
Advanced System Technology
STMicroelectronics
Ontario, Canada
Yogesh Mahajan
Princeton University
Princeton, New Jersey
Grant Martin
Tensilica Inc.
Santa Clara, California
Birger Møller-Pedersen
Department of Informatics
University of Oslo
Oslo, Norway
Ravi Musunuri
The Department of Computer
Science
The University of Texas at Dallas
Richardson, Texas
Nicolas Navet
Institut National Polytechnique
de Lorraine
Nancy, France
Gabriela Nicolescu
Ecole Polytechnique
de Montreal
Montreal, Quebec
Canada
Achim Nohl
CoWare Inc.
Aachen, Germany
Mikael Nolin
Department of Computer Science
and Engineering
Mälardalen University
Västerås, Sweden
Thomas Nolte
Department of Computer Science
and Engineering
Mälardalen University
Västerås, Sweden
Claudio Passerone
Dipartimento di Elettronica
Politecnico di Torino
Turin, Italy
Roberto Passerone
Cadence Design Systems, Inc.
Berkeley Cadence Labs
Berkeley, California
Hiren D. Patel
Electrical and Computer
Engineering
Virginia Tech
Blacksburg, Virginia
Maulin D. Patel
The Department of Computer
Science
The University of Texas at Dallas
Richardson, Texas
Pierre G. Paulin
Advanced System Technology
STMicroelectronics
Ontario, Canada
Chuck Pilkington
Advanced System Technology
STMicroelectronics
Ontario, Canada
Claudio Pinello
EECS Department
University of California at
Berkeley
Berkeley, California
Dumitru Potop-Butucaru
IRISA
Rennes, France
Antal Rajnák
Advanced Engineering Labs
Volcano Communications
Technologies AG
Tägerwilen, Switzerland
Anand Ramachandran
Department of Electrical and
Computer Engineering
University of Texas at Austin
Austin, Texas
Niels Reijers
Faculty of Electrical Engineering,
Mathematics, and Computer
Science
Delft University of Technology
Delft, The Netherlands
Alberto L.
Sangiovanni-Vincentelli
EECS Department
University of California at
Berkeley
Berkeley, California
Udit Saxena
Microsoft Corporation
Seattle, Washington
Guenter Schaefer
Institute of Telecommunication
Systems
Technische Universität Berlin
Berlin, Germany
D. N. Serpanos
Department of Electrical and
Computer Engineering
University of Patras
Patras, Greece
Marco Sgroi
EECS Department
University of California at
Berkeley
Berkeley, California
Sandeep K. Shukla
Electrical and Computer
Engineering
Virginia Tech
Blacksburg, Virginia
Françoise Simonot-Lion
Institut National Polytechnique
de Lorraine
Nancy, France
YeQiong Song
Université Henri Poincaré
Nancy, France
Weilian Su
Broadband and Wireless
Networking Laboratory
School of Electrical and Computer
Engineering
Georgia Institute of Technology
Atlanta, Georgia
Venkita Subramonian
Department of Computer Science
and Engineering
Washington University
St. Louis, Missouri
Jacek Szymanski
ALSTOM Transport
Centre Meudon La Forêt
Meudon La Forêt, France
Jean-Pierre Talpin
IRISA
Rennes, France
Lothar Thiele
Department Information
Technology and Electrical
Engineering
Computer Engineering and
Networks Laboratory
Swiss Federal Institute of
Technology
Zurich, Switzerland
Pieter van der Wolf
Philips Research
Eindhoven, The Netherlands
V. A. Volkov
Glushkov Institute of
Cybernetics
National Academy of Science
of Ukraine
Kiev, Ukraine
Thomas P. von Hoff
ABB Switzerland Ltd
Corporate Research
Baden-Dättwil, Switzerland
A. G. Voyiatzis
Department of Electrical and
Computer Engineering
University of Patras
Patras, Greece
Flávio R. Wagner
UFRGS Instituto de
Informática
Porto Alegre, Brazil
Ernesto Wandeler
Department Information
Technology and Electrical
Engineering
Computer Engineering and
Networks Laboratory
Swiss Federal Institute of
Technology
Zurich, Switzerland
Yosinori Watanabe
Cadence Berkeley Labs
Berkeley, California
Thomas Weigert
Global Software Group
Motorola
Schaumburg, Illinois
Reinhard Wilhelm
University of Saarland
Saarbruecken, Germany
Richard Zurawski
ISA Group
San Francisco, California
Contents
SECTION I Embedded Systems
Real-Time and Embedded Systems
1 Embedded Systems: Toward Networking of Embedded Systems
Luciano Lavagno and Richard Zurawski . . . . . . . . . . . . . 1-1
2 Real-Time in Embedded Systems Hans Hansson, Mikael Nolin, and
Thomas Nolte . . . . . . . . . . . . . . . . . . . . . . . . 2-1
Design and Validation of Embedded Systems
3 Design of Embedded Systems Luciano Lavagno and
Claudio Passerone . . . . . . . . . . . . . . . . . . . . . . 3-1
4 Models of Embedded Computation Axel Jantsch . . . . . . . . . 4-1
5 Modeling Formalisms for Embedded System Design Luís Gomes, João
Paulo Barros, and Anikó Costa . . . . . . . . . . . . . . . . . 5-1
6 System Validation J.V. Kapitonova, A.A. Letichevsky, V.A. Volkov,
and Thomas Weigert . . . . . . . . . . . . . . . . . . . . . 6-1
Design and Verification Languages
7 Languages for Embedded Systems Stephen A. Edwards . . . . . . 7-1
8 The Synchronous Hypothesis and Synchronous Languages
Dumitru Potop-Butucaru, Robert de Simone, and Jean-Pierre Talpin . 8-1
9 Introduction to UML and the Modeling of Embedded Systems
Øystein Haugen, Birger Møller-Pedersen, and Thomas Weigert . . . 9-1
10 Verification Languages Aarti Gupta, Ali Alphan Bayazit, and
Yogesh Mahajan . . . . . . . . . . . . . . . . . . . . . . . 10-1
Operating Systems and Quasi-Static Scheduling
11 Real-Time Embedded Operating Systems: Standards and Perspectives
Ivan Cibrario Bertolotti . . . . . . . . . . . . . . . . . . . . 11-1
12 Real-Time Operating Systems: The Scheduling and Resource
Management Aspects Giorgio C. Buttazzo . . . . . . . . . . . 12-1
13 Quasi-Static Scheduling of Concurrent Specifications
Alex Kondratyev, Luciano Lavagno, Claudio Passerone, and
Yosinori Watanabe . . . . . . . . . . . . . . . . . . . . . . 13-1
Timing and Performance Analysis
14 Determining Bounds on Execution Times Reinhard Wilhelm . . . 14-1
15 Performance Analysis of Distributed Embedded Systems
Lothar Thiele and Ernesto Wandeler . . . . . . . . . . . . . . . 15-1
Power Aware Computing
16 Power Aware Embedded Computing Margarida F. Jacome and
Anand Ramachandran . . . . . . . . . . . . . . . . . . . . . 16-1
Security in Embedded Systems
17 Design Issues in Secure Embedded Systems A.G. Voyiatzis,
A.G. Fragopoulos, and D.N. Serpanos . . . . . . . . . . . . . . 17-1
SECTION II System-on-Chip Design
18 System-on-Chip and Network-on-Chip Design Grant Martin . . . 18-1
19 A Novel Methodology for the Design of Application-Specific
Instruction-Set Processors Andreas Hoffmann, Achim Nohl, and
Gunnar Braun . . . . . . . . . . . . . . . . . . . . . . . . 19-1
20 State-of-the-Art SoC Communication Architectures José L. Ayala,
Marisa López-Vallejo, Davide Bertozzi, and Luca Benini . . . . . . 20-1
21 Network-on-Chip Design for Gigascale Systems-on-Chip
Davide Bertozzi, Luca Benini, and Giovanni De Micheli . . . . . . 21-1
22 Platform-Based Design for Embedded Systems Luca P. Carloni,
Fernando De Bernardinis, Claudio Pinello,
Alberto L. Sangiovanni-Vincentelli, and Marco Sgroi . . . . . . . 22-1
23 Interface Specication and Converter Synthesis Roberto Passerone . 23-1
24 Hardware/Software Interface Design for SoC Wander O. Cesário,
Flávio R. Wagner, and A.A. Jerraya . . . . . . . . . . . . . . . 24-1
25 Design and Programming of Embedded Multiprocessors: An
Interface-Centric Approach Pieter van der Wolf, Erwin de Kock,
Tomas Henriksson, Wido Kruijtzer, and Gerben Essink . . . . . . 25-1
26 A Multiprocessor SoC Platform and Tools for Communications
Applications Pierre G. Paulin, Chuck Pilkington, Michel Langevin,
Essaid Bensoudane, Damien Lyonnard, and Gabriela Nicolescu . . . 26-1
SECTION III Testing of Embedded Core-Based Integrated
Circuits
27 Modular Testing and Built-In Self-Test of Embedded Cores in
System-on-Chip Integrated Circuits Krishnendu Chakrabarty . . . 27-1
28 Embedded Software-Based Self-Testing for SoC Design
Kwang-Ting (Tim) Cheng . . . . . . . . . . . . . . . . . . . 28-1
SECTION IV Networked Embedded Systems
29 Design Issues for Networked Embedded Systems Sumit Gupta,
Hiren D. Patel, Sandeep K. Shukla, and Rajesh Gupta . . . . . . . 29-1
30 Middleware Design and Implementation for Networked Embedded
Systems Venkita Subramonian and Christopher Gill . . . . . . . 30-1
SECTION V Sensor Networks
31 Introduction to Wireless Sensor Networks S. Dulman, S. Chatterjea,
and P. Havinga . . . . . . . . . . . . . . . . . . . . . . . . 31-1
32 Issues and Solutions in Wireless Sensor Networks Ravi Musunuri,
Shashidhar Gandham, and Maulin D. Patel . . . . . . . . . . . 32-1
33 Architectures for Wireless Sensor Networks S. Dulman,
S. Chatterjea, T. Hoffmeijer, P. Havinga, and J. Hurink . . . . . . 33-1
34 Energy-Efficient Medium Access Control Koen Langendoen and
Gertjan Halkes . . . . . . . . . . . . . . . . . . . . . . . . 34-1
35 Overview of Time Synchronization Issues in Sensor Networks
Weilian Su . . . . . . . . . . . . . . . . . . . . . . . . . . 35-1
36 Distributed Localization Algorithms Koen Langendoen and
Niels Reijers . . . . . . . . . . . . . . . . . . . . . . . . . 36-1
37 Routing in Sensor Networks Shashidhar Gandham, Ravi Musunuri,
and Udit Saxena . . . . . . . . . . . . . . . . . . . . . . . 37-1
38 Distributed Signal Processing in Sensor Networks Omid S. Jahromi
and Parham Aarabi . . . . . . . . . . . . . . . . . . . . . . 38-1
39 Sensor Network Security Guenter Schaefer . . . . . . . . . . . 39-1
40 Software Development for Large-Scale Wireless Sensor Networks
Jan Blumenthal, Frank Golatowski, Marc Haase, and
Matthias Handy . . . . . . . . . . . . . . . . . . . . . . . 40-1
SECTION VI Embedded Applications
Automotive Networks
41 Design and Validation Process of In-Vehicle Embedded Electronic
Systems Françoise Simonot-Lion and YeQiong Song . . . . . . . 41-1
42 Fault-Tolerant Services for Safe In-Car Embedded Systems
Nicolas Navet and Françoise Simonot-Lion . . . . . . . . . . . . 42-1
43 Volcano – Enabling Correctness by Design Antal Rajnák . . . . . 43-1
Industrial Automation
44 Embedded Web Servers in Distributed Control Systems
Jacek Szymanski . . . . . . . . . . . . . . . . . . . . . . . 44-1
45 HTTP Digest Authentication for Embedded Web Servers
Mario Crevatin and Thomas P. von Hoff . . . . . . . . . . . . . 45-1
Intelligent Sensors
46 Intelligent Sensors: Analysis and Design Eric Dekneuvel . . . . . 46-1
I
Embedded Systems
Real-Time and
Embedded Systems
1 Embedded Systems: Toward Networking of Embedded Systems
Luciano Lavagno and Richard Zurawski
2 Real-Time in Embedded Systems
Hans Hansson, Mikael Nolin, and Thomas Nolte
1
Embedded Systems:
Toward Networking
of Embedded Systems
Luciano Lavagno
Cadence Berkeley Laboratories and
Politecnico di Torino
Richard Zurawski
ISA Group
1.1 Networking of Embedded Systems . . . . . . . . . . . . . . . . . . . . . 1-1
1.2 Design Methods for Networked Embedded
Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-3
1.3 Networked Embedded Systems . . . . . . . . . . . . . . . . . . . . . . . . . 1-5
Networked Embedded Systems in Industrial Automation •
Networked Embedded Systems in Building Automation •
Automotive Networked Embedded Systems • Sensor Networks
1.4 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-14
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-14
1.1 Networking of Embedded Systems
The last two decades have witnessed a remarkable evolution of embedded systems: from systems assembled
from discrete components on printed circuit boards (as many still are) to systems assembled from
Intellectual Property (IP) components dropped onto the silicon of a system on a chip. Systems on
a chip offer the potential to embed complex functionalities and to meet the demanding performance
requirements of applications such as DSPs, network processors, and multimedia processors. Another phase in this
evolution, already in progress, is the emergence of distributed embedded systems, frequently termed
networked embedded systems, where the word "networked" signifies the importance of the networking
infrastructure and communication protocol. A networked embedded system is a collection of spatially and
functionally distributed embedded nodes, interconnected by means of a wireline or wireless communication
infrastructure and protocols, interacting with the environment (via sensor/actuator elements) and with each
other, and possibly including a master node that performs control and coordination functions, so as to coordinate
computing and communication in order to achieve certain goal(s). Networked embedded systems
appear in a variety of application domains, such as automotive, train, aircraft, office building, and
industrial environments, primarily for monitoring and control, as well as for environment monitoring and, in the future,
environment control.
There have been various reasons for the emergence of networked embedded systems, influenced largely
by their application domains. The benefits of using distributed systems, and the evolutionary need to replace
point-to-point wiring in these systems with a single bus, are among the most important.
Advances in embedded system design, tool availability, and the falling fabrication costs of
semiconductor devices and systems have allowed intelligence to be infused into field devices such as
sensors and actuators. The controllers used with these devices typically provide on-chip signal conversion,
data processing, and communication functions. The increased functionality, processing, and communication
capabilities of controllers have been largely instrumental in the emergence of a widespread trend toward
networking of field devices around specialized networks, frequently referred to as field area networks.
Field area networks, or fieldbuses [1] (a fieldbus is, in general, a digital, two-way, multidrop
communication link), are networks connecting field devices such as sensors
and actuators with field controllers (for instance, Programmable Logic Controllers [PLCs] in industrial
automation, or Electronic Control Units [ECUs] in automotive applications), as well as man-machine
interfaces, for instance, dashboard displays in cars.
In general, the benefits of using these specialized networks are numerous, including increased flexibility
attained through the combination of embedded hardware and software, improved system performance, and
ease of system installation, upgrade, and maintenance. In automotive and aircraft applications,
for instance, they allow mechanical, hydraulic, and pneumatic systems to be replaced by mechatronic
systems, in which mechanical or hydraulic components are typically confined to the end-effectors, to
mention just two of their application areas.
Unlike Local Area Networks (LANs), due to the nature of the communication requirements imposed by
their applications, field area networks tend to have low data rates and small data packets, and
typically require real-time capabilities that mandate determinism of data transfer. However, data rates
above 10 Mbit/sec, typical of LANs, have already become commonplace in field area networks.
The specialized networks tend to support various communication media such as twisted-pair cables,
fiber optic channels, power line communication, radio frequency channels, and infrared connections.
Based on the physical media employed, they can, in general, be divided into three main
groups: wireline-based networks using media such as twisted-pair cables, fiber optic channels
(in hazardous environments like chemical and petrochemical plants), and power lines (in building
automation); wireless networks supporting radio frequency channels and infrared connections; and
hybrid networks composed of wireline and wireless segments.
Although the use of wireline-based field area networks is dominant, wireless technology offers a
range of incentives in a number of application areas. In industrial automation, for instance, wireless device
(sensor/actuator) networks can support the mobile operation required by mobile robots, and the
monitoring and control of equipment in hazardous and difficult-to-access environments. In a wireless
sensor/actuator network, stations may interact with each other on a peer-to-peer basis and with a base
station. The base station may have its transceiver attached to a cable of a (wireline) field area network,
giving rise to a hybrid wireless-wireline system [2]. A separate category is wireless sensor networks,
mainly envisaged for monitoring purposes, which are discussed in detail in this book.
The variety of application domains imposes different functional and nonfunctional requirements on the
operation of networked embedded systems. Most of them are required to operate in a reactive way; for
instance, systems used for control purposes. With that comes the requirement for real-time operation, in
which systems must respond within a predefined period of time, mandated by the dynamics of
the process under control. A response, in general, may be periodic, to control a specific physical quantity by
regulating dedicated end-effector(s); aperiodic, arising from unscheduled events such as the out-of-bounds
state of a physical parameter or any other kind of abnormal condition; or sporadic, with no period
but with a known minimum time between consecutive occurrences. Broadly speaking, systems that can
tolerate a delay in response are called soft real-time systems; in contrast, hard real-time systems require
deterministic responses to avoid changes in the system dynamics that could adversely affect the process
under control and, as a result, lead to economic losses or cause injury to human
operators. Representative examples of systems imposing hard real-time requirements on their operation
are fly-by-wire in aircraft control and steer-by-wire in automotive applications, to mention a few.
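These activation patterns can be made concrete with a small sketch. The following is an illustrative aside (not code from the handbook): a sporadic task is characterized by a minimum inter-arrival time that a runtime guard can enforce, while a periodic task is released at fixed multiples of its period.

```python
def accept_sporadic_arrivals(arrival_times, min_interarrival):
    """Filter event timestamps so that accepted activations of a
    sporadic task are separated by at least min_interarrival."""
    accepted, last = [], None
    for t in sorted(arrival_times):
        if last is None or t - last >= min_interarrival:
            accepted.append(t)
            last = t
    return accepted

def periodic_releases(period, horizon):
    """Release instants of a periodic task up to (and including) horizon."""
    return [k * period for k in range(int(horizon // period) + 1)]

# Events at t = 3 and t = 4 arrive too soon after t = 0 and are rejected:
print(accept_sporadic_arrivals([0, 3, 4, 10], min_interarrival=5))  # [0, 10]
print(periodic_releases(period=10, horizon=30))  # [0, 10, 20, 30]
```

The sporadic model matters precisely because the known minimum inter-arrival time is what makes worst-case analysis of otherwise unpredictable events possible.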
The need to guarantee a deterministic response mandates using appropriate scheduling schemes, which
are frequently implemented in application-domain-specific real-time operating systems or custom-designed
bare-bones real-time executives. Most of those issues (real-time scheduling and real-time operating
systems) are discussed in this book in a number of chapters.
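As an example of such a scheme, the classic Liu-Layland utilization bound for rate-monotonic scheduling gives a quick sufficient (but not necessary) test that a set of independent periodic tasks will always meet its deadlines. The sketch below is a generic illustration, not taken from the handbook:

```python
def rm_schedulable(tasks):
    """Sufficient rate-monotonic schedulability test.
    tasks: list of (worst_case_execution_time, period) pairs,
    with deadlines assumed equal to periods."""
    n = len(tasks)
    utilization = sum(c / t for c, t in tasks)
    # Liu-Layland bound: n * (2^(1/n) - 1), approaching ln 2 ~ 0.693
    return utilization <= n * (2 ** (1.0 / n) - 1)

print(rm_schedulable([(1, 4), (1, 8)]))  # True  (U = 0.375 <= ~0.828)
print(rm_schedulable([(3, 4), (2, 8)]))  # False (U = 1.0 exceeds the bound)
```

A task set that fails this test is not necessarily unschedulable; an exact verdict requires a response-time analysis of each task.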
The networked embedded systems used in safety-critical applications such as fly-by-wire and steer-by-
wire require a high level of dependability to ensure that a system failure does not lead to a state in which
human life, property, or the environment are endangered. The dependability issue is critical for technology
deployment; various solutions are discussed in this chapter in the context of automotive applications. One
of the main bottlenecks in the development of safety-critical systems is the software development process.
This issue is briefly discussed in this chapter in the context of the automotive application domain.
As opposed to applications mandating hard real-time operation, such as the majority of industrial
automation controls or safety-critical automotive control applications, building automation control
systems, for instance, seldom need hard real-time communication; their timing requirements are
much more relaxed. Building automation systems tend to have a hierarchical network structure and
typically implement all seven layers of the ISO/OSI reference model [3]. In the field area networks
employed in industrial automation, by contrast, there is little need for routing functionality and
end-to-end control. Therefore, typically only layers 1 (physical layer), 2 (data link layer, implicitly
including the medium access control layer), and 7 (application layer, which also covers the user layer) are used
in those networks.
This diversity of requirements imposed by different application domains (soft/hard real-time, safety
criticality, network topology, etc.) has necessitated different solutions, using different protocols based on
different operation principles. This has resulted in a plethora of networks developed for different application
domains. Some of those networks are overviewed in one of the subsequent sections.
With the growing trend toward networking of embedded systems and their internetworking with LANs,
Wide Area Networks (WANs), and the Internet (for instance, there is a growing demand for remote access to
process data on the factory floor), many of those systems may become exposed to potential security attacks,
which may compromise their integrity and cause damage as a result. The limited resources of embedded
nodes pose a considerable challenge for the implementation of effective security policies, which, in general,
are resource demanding. These restrictions necessitate the deployment of lightweight security mechanisms.
Vendor-tailored versions of standard security protocol suites, such as Secure Sockets Layer (SSL) and
IP Security Protocol (IPSec), may still not be suitable due to their excessive demand for resources. Potential
security solutions for these kinds of systems depend heavily on the specific device or system protected, the
application domain, and the extent of internetworking and its architecture. (The details of potential security
measures are presented in this book in two separate chapters.)
1.2 Design Methods for Networked Embedded Systems
Design methods for networked embedded systems fall into the general category of system-level design.
They involve two separate aspects, which will be discussed briefly. The first aspect is network architecture
design, in which communication protocols, interfaces, drivers, and computation nodes are selected and
assembled. The second aspect is system-on-chip design, in which the best hardware/software partition
is selected and an existing platform is customized, or a new chip is created, for the implementation
of a computation or communication node. Both aspects share several similarities, but so far they have
generally been addressed using ad hoc methodologies and tools, since attempts to create a unified electronic
system-level design methodology have so far failed.
When one considers the complete networked system, including several digital and analog parts, many
more trade-offs can be made at the global level. However, it also means that the interaction between the
digital portion of the design activity and the rest is much more complicated, especially in terms of the tools,
formats, and standards with which one must interoperate and interface.
In the case of network architecture design, tools such as OpNet and NS are used to identify communication
bottlenecks, investigate the effect of parameters such as the channel bit error rate, and analyze the
impact of the choice of coding, medium access, and error correction mechanisms on the overall system performance.
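A flavor of the kind of calculation such tools automate: with an independent bit error rate p, the probability that an n-bit frame arrives intact is (1 - p)^n, which already shows why long frames suffer disproportionately on noisy channels. The sketch below is an illustrative aside, not tied to any particular simulator:

```python
def frame_success_probability(bit_error_rate, frame_bits):
    """Probability a frame is delivered without any bit error,
    assuming independent (memoryless) bit errors."""
    return (1.0 - bit_error_rate) ** frame_bits

# At a bit error rate of 1e-3, a 1000-bit frame gets through only ~37%
# of the time, while a 100-bit frame survives ~90% of the time:
print(round(frame_success_probability(1e-3, 1000), 3))  # 0.368
print(round(frame_success_probability(1e-3, 100), 3))   # 0.905
```

Real channel models add correlated (bursty) errors, coding gain, and retransmissions, which is precisely what the simulators above are used to explore.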
For wireless networks, tools such as Matlab and Simulink are also used, in order to analyze the impact of
detailed channel models, thanks to their ability to model both digital and analog components, as well as
physical elements, at a high level of abstraction. In all cases, the analysis is essentially functional; that is, it
takes into account only in a very limited manner effects such as power consumption, computation time,
and cost. This is the main limitation that will need to be addressed in the future if one wants to model and
design, in an optimal manner, low-power networked embedded systems such as those envisioned
for wireless sensor network applications.
At the system-on-chip architecture level, the first decision to be made is whether to use a platform
instance or to design an Application-Specific Integrated Circuit (ASIC) from scratch. The first option builds
on the availability of large libraries of IP, in the form of processors, memories, and peripherals, from
major silicon vendors. These IP libraries are guaranteed to work together, and hence constitute what is
termed a platform. A platform is a set of components, together with usage rules that ensure their correct
and seamless interoperation. Platforms are used to speed up time-to-market by ensuring rapid implementation
of complex architectures. Processors (and the software executing on them) provide the flexibility to adapt to
different applications and customizations (e.g., localization and adherence to regional standards), while
hardware IPs provide efficient implementations of commonly used functions. Configurable processors can
be adapted to the requirements of specific applications and, via instruction extensions, offer considerable
performance and power advantages over fixed instruction-set architectures.
Thus, a platform is a single abstract model that hides the details of a set of different possible implementations
as clusters of lower-level components. The platform, for example, a family of microprocessors,
peripherals, and bus protocols, allows developers of application designs to operate without detailed knowledge
of the implementation (e.g., the pipelining of the processor or the internal implementation of the
UART). At the same time, it allows platform implementors to share design and fabrication costs among a
broad range of potential users, broader than if each design were a one-of-a-kind type.
Design methods that exploit the notion of platform generally start from a functional specification, which is then mapped onto an architecture (a platform instance) in order to derive performance information and explore the design space. Full exploitation of the notion of platform results in better reuse, by decoupling independent aspects that would otherwise tie, for example, a given functional specification to low-level implementation details. The guiding principle of separation of concerns distinguishes between:
1. Computation and communication. This separation is important because refinement of computation is generally done by hand, or by compilation and scheduling, while communication makes use of patterns.
2. Application and platform implementation, because they are often defined and designed independently by different groups or companies.
3. Behavior and performance, which should be kept separate because performance information can represent either nonfunctional requirements (e.g., the maximum response time of an embedded controller) or the result of an implementation choice (e.g., the worst-case execution time of a task). Nonfunctional constraint verification can be performed traditionally, by simulation and prototyping, or with static formal checks, such as schedulability analysis.
Tool support for system-on-chip architectural design is, so far, mostly limited to simulation and interface generation. The first category includes tools such as NC-SystemC from Cadence, ConvergenSC from CoWare, and SystemStudio from Synopsys. Simulators at the system-on-chip level provide abstractions for the main architectural components (processors, memories, busses, and hardware blocks) and permit quick instantiation of complete platform instances from template skeletons. Interface synthesis can take various forms, from the automated instantiation of templates offered by N2C from CoWare, to the automated generation of consistent files for software and hardware offered by Beach Solutions.
A key aspect of design problems in this space is compatibility with respect to specifications at the interface level (bus and networking standards), the instruction-set architecture level, and the Application Programming Interface (API) level. Assertion-based verification techniques can be used to ease the problem of verifying compliance with a digital protocol standard (e.g., for a bus).
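As an illustration of the assertion style, the following C sketch monitors a hypothetical two-signal request/acknowledge bus handshake at runtime; the monitor, the signal names, and the two rules are invented for illustration and do not correspond to any particular bus standard.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical handshake rules: (1) an acknowledge is only legal while a
 * request is outstanding; (2) a request must not be withdrawn before it
 * has been acknowledged. One call per sampled bus cycle. */
typedef struct {
    bool req_pending;   /* a request has been issued and not yet acked */
} HandshakeMonitor;

void monitor_init(HandshakeMonitor *m) { m->req_pending = false; }

void monitor_step(HandshakeMonitor *m, bool req, bool ack)
{
    /* Rule 1: ack without an outstanding request is a violation. */
    if (ack)
        assert(m->req_pending && "protocol violation: ack without request");

    if (!m->req_pending && req && !ack)
        m->req_pending = true;           /* new request issued */
    else if (m->req_pending && ack)
        m->req_pending = false;          /* transaction completed */
    else if (m->req_pending && !req)
        assert(!"protocol violation: request dropped before ack"); /* Rule 2 */
    /* Simplification: req held high after an ack counts as a new request. */
}
```

In a simulation flow, a monitor like this would be stepped alongside the bus model; a failed assertion pinpoints the first cycle at which the trace deviates from the protocol.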
Let us consider an example of a design flow in the automotive domain, which can be regarded as a paradigm for any networked embedded system. Automotive electronic design starts, usually 5 to 10 years before the actual introduction of a product, when a car manufacturer defines the specifications for its future line of vehicles.
It is now accepted practice to use the notion of platform in this domain as well, so that the electronic portion (as well as the mechanical one, which is outside the scope of this discussion) is modularized and componentized, enabling sharing across different models. An ECU generally includes a microcontroller (8, 16, or 32 bit), memory (SRAM, DRAM, and Flash), some ASIC or FPGA for interfacing, one or more in-vehicle network interfaces (e.g., CAN [Controller Area Network] or FlexRay), and several sensor and actuator interfaces (analog/digital and digital/analog converters, pulse-width modulators, power transistors, display drivers, and so on).
The system-level design activity is performed by a relatively small team of architects, who know the domain well (mechanics, electronics, and business), define the specifications for the electronic component suppliers, and interface with the teams that specify the mechanical portions (body and engine). These teams essentially rely on past experience to perform their job, and currently have serious problems forecasting the state of electronics ten years in advance.
Control algorithms are defined in the next design phase, when the first engine models (generally described using Simulink, Matlab, and StateFlow) become available as a specification for both the electronic design and the engine design. An important aspect of the overall flow is that these models are not frozen until much later, and hence both algorithm design and (often) ECU software design must cope with their changes. Another characteristic is that they are parametric models, sometimes reused across multiple engine generations and classes, whose exact parameter values will be determined only when prototypes or actual products become available. Thus, control algorithms must consider both the allowable ranges and combinations of values for these parameters, and the capability to measure their values, directly or indirectly, from the behavior of the engine and vehicle. Finally, algorithms are often distributed over a network of cooperating ECUs, thus deadlines and constraints generally span a number of electronic modules.
While control design progresses, ECU hardware design can start, because rough computational and memory requirements, as well as interfacing standards, sensors, and actuators, are already known. At the end of both control design and hardware design, software implementation can start. As mentioned earlier, most of the software running on modern ECUs is automatically generated (model-based design).
In the hardware implementation phase, the electronic subsystem supplier can use off-the-shelf components (such as memories), Application-Specific Standard Products (ASSPs) (such as microcontrollers and standard bus interfaces), and even ASICs and FPGAs (typically for sensor and actuator signal conditioning and conversion).
The final phase, called system integration, is generally performed by the car manufacturer again. It can be an extremely lengthy and expensive phase, because it requires the use of expensive detailed models of the controlled system (e.g., the engine, modeled with DSP-based multiprocessors) or even of actual car prototypes. The goal of integration is to ensure smooth subsystem communication (e.g., checking that there are no duplicate module identifiers and that there is enough bandwidth on every in-vehicle bus). Simulation support in this domain is provided by companies such as Vast and Axys (now part of ARM), who sell both fast instruction-set simulators for the most commonly used processors in the networked embedded system domain, and network simulation models exploiting either proprietary simulation engines, for example, in the case of Virtio, or standard simulators (HDL [Hardware Description Language] or SystemC).
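The two integration checks just mentioned — unique module identifiers and sufficient bus bandwidth — can be sketched in a few lines of C. The message description, field names, and the numbers used below are illustrative assumptions, not taken from any real bus database.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one periodic message on an in-vehicle bus. */
typedef struct {
    unsigned id;        /* message identifier (must be unique on the bus) */
    unsigned bits;      /* frame size in bits, including overhead */
    double   period_s;  /* transmission period in seconds */
} BusMessage;

/* Integration check 1: no two messages share an identifier. */
bool ids_unique(const BusMessage *msgs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (msgs[i].id == msgs[j].id)
                return false;
    return true;
}

/* Integration check 2: fraction of the bus bandwidth the set consumes. */
double bus_utilization(const BusMessage *msgs, size_t n, double bitrate)
{
    double u = 0.0;
    for (size_t i = 0; i < n; i++)
        u += (double)msgs[i].bits / msgs[i].period_s / bitrate;
    return u;
}
```

A real integration flow would of course add schedulability analysis on top of the raw utilization figure; this sketch only captures the two sanity checks named in the text.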
1.3 Networked Embedded Systems
1.3.1 Networked Embedded Systems in Industrial Automation
Although the origins of field area networks can be traced back as far as the end of the 1960s in the nuclear instrumentation domain (the CAMAC network [4]) and the beginning of the 1970s in avionics and aerospace
applications (the MIL-STD-1553 bus [5]), it was the industrial automation area that brought the main thrust of developments. The need for integration of heterogeneous systems, difficult at that time due to the lack of standards, resulted in two major initiatives which have had a lasting impact on the integration concepts and protocol stack architecture of field area networks. These initiatives were the TOP (Technical and Office Protocol) [6] and MAP (Manufacturing Automation Protocol) [7] projects. The two projects exposed some pitfalls of full seven-layer stack implementations (ISO/OSI model [3]). As a result, typically only layer 1 (physical layer), layer 2 (data link layer, implicitly including the medium access control layer), and layer 7 (application layer, which also covers the user layer) are used in field area networks [8]; this is also prescribed by the international fieldbus standard, IEC 61158 [9]. In IEC 61158, the functions of layers 3 and 4 are recommended to be placed in either layer 2 or layer 7: the network and transport layers are not required in a single-segment network typical of process and industrial automation (the situation is different in building automation, for instance, where routing functionality and end-to-end control may be needed owing to a hierarchical network structure); the functions of layers 5 and 6 are always covered in layer 7.
The evolution of fieldbus technology, which began well over two decades ago, has resulted in a multitude of solutions reflecting the competing commercial interests of their developers and standardization bodies, both national and international: IEC [10], ISO [11], ISA [12], CENELEC [13], and CEN [14]. This is also reflected in IEC 61158 (adopted in 2000), which accommodates all national standards and user-organization-championed fieldbus systems. Subsequently, implementation guidelines were compiled into communication profiles, IEC 61784-1 [15]. Those communication profiles identify seven main systems (or communication profile families), known by brand names: Foundation Fieldbus (H1, HSE, H2), used in process and factory automation; ControlNet and EtherNet/IP, both used in factory automation; PROFIBUS (DP, PA), used in factory and process automation, respectively; PROFInet, used in factory automation; P-Net (RS 485, RS 232), used in factory automation and shipbuilding; WorldFIP, used in factory automation; INTERBUS, INTERBUS TCP/IP, and INTERBUS Subset, used in factory automation; and Swiftnet transport and Swiftnet full stack, used by aircraft manufacturers. The listed application areas are the dominant ones.
Ethernet, the backbone technology for office networks, is increasingly being adopted for communication in factories and plants at the fieldbus level. The random, native CSMA/CD arbitration mechanism is being replaced by other solutions allowing for the deterministic behavior required in real-time communication to support soft and hard real-time deadlines, time synchronization of activities (required to control drives, for instance), and the exchange of small data records characteristic of monitoring and control actions. The emerging Real-Time Ethernet (RTE), Ethernet augmented with real-time extensions and under standardization by the IEC/SC65C committee, is a fieldbus technology which incorporates Ethernet for the lower two layers of the OSI model. There are already a number of implementations, which use one of three different approaches to meet real-time requirements. The first approach retains the TCP/UDP/IP protocol suite unchanged (subject to nondeterministic delays); all real-time modifications
are enforced in the top layer. Implementations in this category include Modbus/TCP [16] (defined by Schneider Electric and supported by Modbus-IDA [17]), EtherNet/IP [18] (defined by Rockwell and supported by the Open DeviceNet Vendor Association (ODVA) [19] and ControlNet International [20]), P-Net (on IP) [21] (proposed by the Danish P-Net national committee), and Vnet/IP [22] (developed by Yokogawa, Japan). In the second approach, the TCP/UDP/IP protocol suite is bypassed and the Ethernet functionality is accessed directly: RTE protocols use their own protocol stack in addition to the standard IP protocol stack. The implementations in this category include Ethernet Powerlink (EPL) [23] (defined by Bernecker + Rainer [B&R], and now supported by the Ethernet Powerlink Standardisation Group [24]), TCnet (Time-critical Control Network) [25] (a proposal from Toshiba), EPA (Ethernet for Plant Automation) [26] (a Chinese proposal), and PROFINET CBA (Component-Based Automation) [27] (defined by several manufacturers including Siemens, and supported by PROFIBUS International [28]). Finally, in the third approach, the Ethernet mechanism and infrastructure themselves are modified. The implementations include SERCOS III [29] (under development by SERCOS), EtherCAT [30] (defined by Beckhoff and supported by the EtherCAT Technology Group [31]), and PROFINET IO [32] (defined by several manufacturers including Siemens, and supported by PROFIBUS International).
The use of standard components such as protocol stacks, Ethernet controllers, bridges, etc., helps mitigate ownership and maintenance costs. The direct support for Internet technologies allows for vertical integration of the various levels of the industrial enterprise hierarchy, including seamless integration between the automation and business logistic levels to exchange jobs and production (process) data; transparent data interfaces for all stages of the plant life cycle; Internet- and web-enabled remote diagnostics and maintenance; and electronic orders and transactions. In the case of industrial automation, the advent and use of networking has allowed for horizontal and vertical integration of industrial enterprises.
1.3.2 Networked Embedded Systems in Building Automation
Another fast-growing application area for networked embedded systems is building automation [33]. Building automation systems aim at the control of the internal environment, as well as the immediate external environment, of a building or building complex. At present, the focus of research and technology development is on commercial buildings (office buildings, exhibition centers, shopping complexes, etc.). In the future, this will also include industrial buildings, which pose substantial challenges to the development of effective monitoring and control solutions. The main services offered by building automation systems typically include: climate control, including heating, ventilation, and air conditioning; visual comfort, covering artificial lighting and control of daylight; safety services such as fire alarm and emergency sound systems; security protection; control of utilities such as power, gas, and water supply; and internal transportation systems such as lifts and escalators.
In terms of the quality-of-service requirements imposed on the field area networks, building automation systems differ considerably from their counterparts in industrial automation. There is seldom a need for hard real-time communication; the timing requirements are much more relaxed. Traffic volume in normal operation is low. Typical traffic is event driven, and mostly uses a peer-to-peer communication paradigm. Fault tolerance and network management are important aspects. As with industrial fieldbus systems, there are a number of bodies involved in the standardization of technologies for building automation, including the field area networks.
The communication architecture supporting automation systems embedded in buildings typically has three levels: the field, control, and management levels. The field level involves the operation of elements such as switches, motors, lighting cells, dry cells, etc. Peer-to-peer communication is perhaps most evident at that level; toggling a switch should activate one or more lighting cells, for instance. The control level is typically used to evaluate new control strategies for the lower level in response to changes in the environment: a reduction in daylight intensity, an external temperature change, etc. LonWorks [34], BACnet [35], and EIB/KNX [36-39] are open system networks which can be used at more than one level of the communication architecture. A roundup of LonWorks is provided in the following, as a representative example of the specialized field area networks used in building automation.
LonWorks (EIA-709), a trademark of Echelon Corp. [40], employs the LonTalk protocol, which implements all seven layers of the ISO/OSI reference model. The LonTalk protocol was published as a formal standard [41], and revised in 2002 [42].
In EIA-709, layer 2 supports various communication media such as twisted-pair cables (78 Kbit/sec [EIA-709.3] or 1.25 Mbit/sec), power line communication (4 Kbit/sec, EIA-709.2), radio frequency channels, infrared connections, and fiber optic channels (1.25 Mbit/sec), as well as IP connections based on the EIA-852 protocol standard [43] in order to tunnel EIA-709 data packets through IP (intranet, Internet) networks. A p-persistent CSMA bus arbitration scheme is used on twisted-pair cables. For other communication media, the EIA-709 protocol stack uses the arbitration scheme defined for that particular medium.
The EIA-709 layer 3 supports a variety of different addressing schemes and advanced routing capabilities. The entire routable address space of a LonTalk network is referred to as the domain (Figure 1.1). A domain is restricted to 255 subnets; a subnet allows for up to 127 nodes. The total number of addressable nodes in a domain can thus reach 32,385; up to 2^48 domains can be addressed. Domain gateways can be built between logical domains in order to allow for communication across domain boundaries. Groups can be formed in order to send a single data packet to a group of nodes using a multicast addressed message.
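The addressing arithmetic above (255 subnets of up to 127 nodes each) can be checked with a few lines of C. The packed 15-bit subnet/node layout used here is an illustrative assumption for working with logical addresses in software, not the LonTalk wire format.

```c
#include <assert.h>
#include <stdint.h>

/* Address-space limits as described for EIA-709 (LonTalk). */
enum {
    EIA709_MAX_SUBNETS_PER_DOMAIN = 255,
    EIA709_MAX_NODES_PER_SUBNET   = 127
};

/* Maximum number of addressable nodes in a single domain: 255 * 127. */
static inline uint32_t eia709_max_nodes_per_domain(void)
{
    return (uint32_t)EIA709_MAX_SUBNETS_PER_DOMAIN
         * (uint32_t)EIA709_MAX_NODES_PER_SUBNET;
}

/* Illustrative packing of a logical subnet/node address into 15 bits:
 * 8-bit subnet (1..255) and 7-bit node (1..127); 0 is treated as
 * reserved here. NOT the on-the-wire encoding. */
static inline uint16_t eia709_pack_addr(uint8_t subnet, uint8_t node)
{
    assert(subnet >= 1 && node >= 1 && node <= EIA709_MAX_NODES_PER_SUBNET);
    return (uint16_t)(((uint16_t)subnet << 7) | node);
}
```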
FIGURE 1.1 Addressing elements in EIA-709 networks: domains contain subnets, subnets contain nodes, and routers and domain gateways connect them. (From D. Loy, Fundamentals of LonWorks/EIA-709 networks: ANSI/EIA-709 protocol standard (LonTalk). In The Industrial Communication Technology Handbook, Zurawski, R. (Ed.), CRC Press, Boca Raton, FL, 2005. With permission.)
Routing is performed between different subnets only. An EIA-709 node can send a unicast addressed message to exactly one node using either the unique 48-bit node identification (Node ID) address or the logical subnet/node address. A multicast addressed message can be sent to a group of nodes (group address), to all nodes in a subnet, or to all nodes in the entire domain (broadcast address).
The EIA-709 layer 4 supports four types of services. The unacknowledged service transmits the data packet from the sender to the receiver. The unacknowledged repeated service transmits the same data packet a number of times; the number of repeats is programmable. The acknowledged service transmits the data packet and waits for an acknowledgment from the receiver; if the acknowledgment is not received by the transmitter, the same data packet is sent again, with a programmable number of retries. The request/response service sends a request message to the receiver; the receiver must respond with a response message, for instance, with statistics information. There is a provision for authentication of acknowledged transmissions, although it is not very efficient.
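The retry behavior of the acknowledged service can be sketched in plain C. The hook types (`send_fn`, `wait_ack_fn`) and the demo stubs stand in for a real transport layer and are hypothetical; the only point illustrated is the programmable-retry loop described above.

```c
#include <stdbool.h>

/* Hypothetical transport hooks; a real protocol stack supplies these. */
typedef bool (*send_fn)(const void *pkt, unsigned len);
typedef bool (*wait_ack_fn)(unsigned timeout_ms);

/* Acknowledged service sketch: send, wait for the acknowledgment, and
 * retransmit up to a programmable number of retries. */
bool acked_send(send_fn send, wait_ack_fn wait_ack,
                const void *pkt, unsigned len,
                unsigned retries, unsigned timeout_ms)
{
    for (unsigned attempt = 0; attempt <= retries; attempt++) {
        if (send(pkt, len) && wait_ack(timeout_ms))
            return true;        /* acknowledgment received */
    }
    return false;               /* all attempts exhausted */
}

/* Demo stubs: transmission always succeeds, the ack arrives on the
 * third attempt. Purely for illustration. */
static unsigned demo_attempts;
static bool demo_send(const void *p, unsigned l)
{
    (void)p; (void)l;
    demo_attempts++;
    return true;
}
static bool demo_ack(unsigned timeout_ms)
{
    (void)timeout_ms;
    return demo_attempts >= 3;
}
```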
Network nodes (which typically include a Neuron chip, RAM/Flash, a power source, a clock, a network transceiver, and an input/output interface connecting to sensors and actuators) can be based on Echelon's Neuron chip series, manufactured by Motorola, Toshiba, and Cypress; recently they can also be based on other platform-independent implementations such as the LoyTec LC3020 controller. Neuron chip-based controllers are programmed with Echelon's Neuron C language, which is a derivative of ANSI C. Other controllers such as the LC3020 are programmed with standard ANSI C. The basic element of Neuron C is the Network Variable (NV), which can be propagated over the network. For instance, the SNVT_temp variable represents temperature in degrees Celsius; SNVT stands for Standard Network Variable Type. Network nodes communicate with each other by exchanging NVs. Another way to communicate between nodes is by using explicit messages. Neuron C programs are used to schedule application events and to react to incoming data packets (receiving NVs) from the network interface. Depending on the network media and the network transceivers, a variety of network topologies are possible with LonWorks nodes, including bus, ring, star, and free topology.
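The propagate-on-change behavior of a network variable can be modeled in plain C (deliberately not Neuron C, which has dedicated keywords for this). The `propagate_fn` hook and the fixed-point temperature scaling are illustrative assumptions, not the SNVT_temp definition.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical send hook: called whenever the local NV value changes,
 * standing in for the stack's NV propagation over the network. */
typedef void (*propagate_fn)(short value);

typedef struct {
    short value;             /* e.g., a temperature in some fixed-point scale */
    propagate_fn propagate;  /* invoked on every change */
} NetworkVariable;

/* Update the NV; propagate only if the value actually changed.
 * Returns true when a propagation was triggered. */
bool nv_update(NetworkVariable *nv, short new_value)
{
    if (nv->value == new_value)
        return false;        /* unchanged value is not re-sent */
    nv->value = new_value;
    if (nv->propagate)
        nv->propagate(new_value);
    return true;
}

/* Demo hook recording the last propagated value, for illustration. */
static short demo_last;
static unsigned demo_count;
static void demo_propagate(short v) { demo_last = v; demo_count++; }
```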
As interoperability on all seven OSI layers does not guarantee interworkable products, the LonMark organization [44] has published interoperability guidelines for nodes that use the LonTalk protocol. A number of task groups within LonMark define functional profiles (subsets of all the possible protocol features) for analog input, analog output, temperature sensors, etc. The task groups focus on various types of applications such as home/utility, HVAC, lighting, etc.
LonBuilder and NodeBuilder are development and integration tools offered by Echelon. Both tools allow one to write Neuron C programs, compile and link them, and download the final application into the target node hardware. NodeBuilder supports debugging of one node at a time. LonBuilder, which supports simultaneous debugging of multiple nodes, has a built-in protocol analyzer and a network binder to create communication relationships between network nodes. Echelon's LNS (network operating system) provides tools that allow one to install, monitor, control, manage, and maintain control devices, and to transparently perform these services over any IP-based network, including the Internet.
1.3.3 Automotive Networked Embedded Systems
Similar trends appear in automotive electronic systems, where the ECUs are networked by means of one of the automotive-specific communication protocols for the purpose of controlling one of the vehicle functions: for instance, electronic engine control, antilock braking, active suspension, and telematics, to mention a few. In Reference 45, a number of functional domains have been identified for the deployment of automotive networked embedded systems. They include the powertrain domain, involving, in general, control of the engine and transmission; the chassis domain, involving control of suspension, steering, braking, etc.; the body domain, involving control of wipers, lights, doors, windows, seats, mirrors, etc.; the telematics domain, involving mostly the integration of wireless communications, vehicle monitoring systems, and vehicle location systems; and the multimedia and Human-Machine Interface (HMI) domains. The different domains impose varying constraints on the networked embedded systems in terms of performance, safety requirements, and Quality of Service (QoS). For instance, the powertrain and chassis domains mandate real-time control; typically, bounded delay is required, as well as fault-tolerant services.
There are a number of reasons for the interest of the automotive industry in adopting mechatronic solutions, known by the generic name x-by-wire, which aim to replace mechanical, hydraulic, and pneumatic systems with electrical/electronic systems. The main factors seem to be economic in nature, the improved reliability of components, and the increased functionality that can be achieved with a combination of embedded hardware and software. Steer-by-wire, brake-by-wire, and throttle-by-wire systems are representative examples. It seems, however, that certain safety-critical systems such as steer-by-wire and brake-by-wire will be complemented with traditional mechanical/hydraulic backups, for safety reasons.
The dependability of x-by-wire systems is one of the main requirements for, as well as constraints on, the adoption of systems of this kind. In this context, a safety-critical x-by-wire system has to ensure that a system failure does not lead to a state in which human life, property, or the environment are endangered, and that a single failure of one component does not lead to a failure of the whole x-by-wire system [46]. On the Safety Integrity Level (SIL) scale, x-by-wire systems are required to keep the probability of a failure of a safety-critical system below 10^-9 per hour per system. This figure corresponds to the SIL4 level. Another equally important requirement for x-by-wire systems is to observe the hard real-time constraints imposed by the system dynamics; the end-to-end response times must be bounded for safety-critical systems. A violation of this requirement may lead to performance degradation of the control system, and other consequences as a result.
instance, system(s) to control seats, door locks, internal lights, etc., are not. Different performance, safety,
and QoS requirements dictated by various in-car application domains necessitate adoption of different
solutions, which, in turn, gave rise to a signicant number of communication protocols for automotive
applications. Time-triggered protocols based on TDMA (Time Division Multiple Access) medium access
control technology are particularly well suitedfor the safety-critical solutions, as they provide deterministic
access to the medium. In this category, there are two protocols, which, in principle, meet the requirements
of x-by-wire applications, namely TTP/C [47] and FlexRay [48] (FlexRay can support a combination of both time-triggered and event-triggered transmissions). The following discussion focuses mostly on TTP/C and FlexRay.
TTP/C (the Time-Triggered Protocol) is a fault-tolerant time-triggered protocol, one of the two protocols in the Time-Triggered Architecture (TTA) [49]; the other is the low-cost fieldbus protocol TTP/A [50]. In TTA, the nodes are connected by two replicated communication channels, forming a cluster. A TTA network may have two different interconnection topologies, namely bus and star. In the bus configuration, each node is connected to two replicated passive buses via bus guardians. The bus guardians are independent units that prevent the associated nodes from transmitting outside predetermined time slots by blocking the transmission path; a good example is the case of a controller with a faulty clock oscillator which attempts to transmit continuously. In the star topology, the guardians are integrated into two replicated central star couplers. The guardians are required to be equipped with their own clocks, a distributed clock synchronization mechanism, and their own power supply. In addition, they should be located at a distance from the protected node to increase immunity to spatial proximity faults. To cope with internal physical faults, TTA partitions nodes into so-called Fault-Tolerant Units (FTUs), each of which is a collection of several stations performing the same computational functions. As each node is (statically) allocated a transmission slot in a TDMA round, the failure of any node, or the corruption of a frame, will not cause degradation of the service. In addition, data redundancy allows the correct data value to be ascertained by a voting process.
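A 2-out-of-3 majority voter of the kind such a voting process relies on can be sketched in a few lines of C. The triplex arrangement is an assumption chosen for illustration; TTP/C does not prescribe this exact function or FTU size.

```c
#include <stdbool.h>

/* 2-out-of-3 majority vote over replica values from an FTU: a single
 * faulty value is masked; with no majority, the fault hypothesis is
 * considered violated and the caller must handle it. */
bool vote3(int a, int b, int c, int *out)
{
    if (a == b || a == c) { *out = a; return true; }
    if (b == c)           { *out = b; return true; }
    return false;   /* no two replicas agree */
}
```

In a replicated design, the voted value would feed the application while the disagreeing replica is flagged for the membership/diagnosis machinery.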
TTP/C employs a synchronous TDMA medium access control scheme on replicated channels, which ensures fault-tolerant transmission with known delay and bounded jitter between the nodes of a cluster. The use of replicated channels and redundant transmission allows for the masking of a temporary fault on one of the channels. The payload section of the message frame contains up to 240 bytes of data protected by a 24-bit CRC checksum. In TTP/C, communication is organized into rounds. In a round, different slot sizes may be allocated to different stations; however, slots belonging to the same station are of the same size in successive rounds. Every node must send a message in every round. Another feature of TTP/C is fault-tolerant clock synchronization, which establishes a global time base without the need for a central time provider. Each node in the cluster contains the message schedule. Based on that information, a node computes the difference between the predetermined and actual arrival times of a correct message. These differences are averaged by a fault-tolerant algorithm, which allows for the adjustment of the local clock to keep it in synchrony with the clocks of the other nodes in the cluster. TTP/C provides a so-called membership service to inform every node about the state of every other node in the cluster; it is also used to implement the fault-tolerant clock synchronization mechanism. This service is based on a distributed agreement mechanism which identifies nodes with failed links. A node with a transmission fault is excluded from the membership until restarted with a proper state of the protocol. Another important feature of TTP/C is a clique avoidance algorithm to detect and eliminate the formation of cliques in case the fault hypothesis is violated. In general, fault-tolerant operation based on FTUs cannot be maintained if the fault hypothesis is violated. In such a situation, TTA activates a Never-Give-Up (NGU) strategy [46]. The NGU strategy, specific to the application, is initiated by TTP/C in combination with the application, with the aim of continuing operation in a degraded mode.
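The flavor of fault-tolerant averaging used for clock synchronization can be illustrated in C by discarding the single largest and smallest measured deviations (which may originate from faulty clocks) before averaging the rest. This is a sketch of the general technique, not TTP/C's exact algorithm or parameters.

```c
#include <stddef.h>

/* Fault-tolerant average over measured clock deviations: drop the one
 * extreme value at each end, average the remainder. The result is the
 * correction to apply to the local clock. */
double ft_average(const double *dev, size_t n)
{
    if (n <= 2)
        return 0.0;                      /* too few values to filter */
    double lo = dev[0], hi = dev[0], sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (dev[i] < lo) lo = dev[i];
        if (dev[i] > hi) hi = dev[i];
        sum += dev[i];
    }
    return (sum - lo - hi) / (double)(n - 2);
}
```

The point of discarding the extremes is that a single faulty clock can push its deviation arbitrarily far without dragging the computed correction with it.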
The TTA infrastructure and the TTP/A and TTP/C protocols have a long history, dating back to 1979 when the Maintainable Architecture for Real-Time Systems (MARS) project started at the Technical University of Berlin. Subsequently, the work was carried out at the Vienna University of Technology. The TTP/C protocol has been experimented with and considered for deployment for quite some time. However, to date, there have been no actual implementations of that protocol involving safety-critical systems in commercial automobiles or trucks. In 1995, a proof of concept, organized jointly by the Vienna University of Technology and DaimlerChrysler, demonstrated a car equipped with a brake-by-wire system based on the time-triggered protocol. The TTA design methodology, which distinguishes between node design and architecture design, is supported by a comprehensive set of integrated tools from TTTech. A range of development and prototyping hardware is available from TTTech as well. Austriamicrosystems offers an automotive-certified TTP-C2 communication controller (AS8202NF).
FIGURE 1.2 FlexRay communication cycle: the network communication time (a static segment of static slots, an optional dynamic segment of mini-slots, and an optional symbol window) followed by the network idle time. (From D. Millinger and R. Nossal, FlexRay Communication Technology. In The Industrial Communication Technology Handbook, Zurawski, R. (Ed.), CRC Press, Boca Raton, FL, 2005. With permission.)
FlexRay, which appears to be the frontrunner for future automotive safety-critical control applications, employs a modified TDMA medium access control scheme on a single or replicated channel. The payload section of a frame contains up to 254 bytes of data protected by a 24-bit CRC checksum. To cope with transient faults, FlexRay also allows for redundant data transmission over the same channel(s) with a time delay between transmissions. The FlexRay communication cycle comprises a network communication time and a network idle time (Figure 1.2). Two or more communication cycles can form an application cycle. The network communication time is a sequence of a static segment, a dynamic segment, and a symbol window. The static segment uses a TDMA MAC protocol and comprises static slots of fixed duration. Unlike in TTP/C, the static allocation of slots to a node (communication controller) applies to one channel only; the same slot may be used by another node on the other channel. Also, a node may possess several slots in a static segment. The dynamic segment uses an FTDMA (Flexible Time Division Multiple Access) MAC protocol, which allows for a priority- and demand-driven access pattern.
The dynamic segment comprises of so-called mini-slots with each node allocated a certain number of
mini-slots, whichdo not have to be consecutive. The mini-slots are of a xedlength, andmuchshorter than
static slots. As the length of a mini-slot is not sufcient to accommodate a frame (a mini-slot only denes
a potential start time of a transmission in the dynamic segment), it has to be enlarged to accommodate
transmission of a frame. This in turn reduces the number of mini-slots in the reminder of the dynamic
segment. A mini-slot remains silent if there is nothing to transmit. The nodes allocated mini-slots toward
the end of the dynamic segment are less likely to get transmission time. This in turn enforces a priority
scheme. The symbol window is a time slot of xed duration used for network management purposes. The
networkidle time is a protocol specic time window, inwhichnotrafc is scheduledonthe communication
channel. It is used by the communication controllers for the clock synchronization activity; in principle,
similar to the one described for TTP/C. If the dynamic segment and idle window are optional, the idle
time, and minimal static segment are mandatory parts of a communication cycle; minimum two static
slots (degraded static segment), or four static slots for fault-tolerant clock synchronization are required.
With all that, FlexRay allows for three congurations: pure static; mixed, with both static and dynamic
bandwidth ratio depends on the application; and pure dynamic, where all bandwidth is allocated to the
dynamic communication.
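The mini-slot mechanism of the dynamic segment can be illustrated with a small simulation. This is a conceptual sketch only: the slot counts and frame lengths below are invented for illustration and are not FlexRay parameters, and the model deliberately ignores details such as channel replication and the exact counter semantics of the protocol.

```python
# Illustrative sketch of FTDMA arbitration in a FlexRay-style dynamic
# segment. Each requesting node owns one mini-slot position (its
# priority); a frame transmission consumes extra mini-slots, shrinking
# the remainder of the segment, so nodes late in the order may be
# squeezed out. All numbers are hypothetical.

def dynamic_segment(requests, total_minislots):
    """requests: {minislot_index: frame_length_in_minislots} for nodes
    that want to transmit. Returns the mini-slot indices that actually
    got to send a frame within the segment."""
    sent = []
    slot = 0          # current mini-slot index (the priority order)
    consumed = 0      # mini-slots of segment time used up so far
    while consumed < total_minislots:
        length = requests.get(slot, 0)
        if length and consumed + length <= total_minislots:
            sent.append(slot)
            consumed += length    # the frame enlarges this slot
        else:
            consumed += 1         # silent (or too-late) mini-slot
        slot += 1
    return sent

# Nodes 0 and 2 send long frames; node 5, late in the order, is
# squeezed out of the 8-mini-slot segment: prints [0, 2]
print(dynamic_segment({0: 4, 2: 3, 5: 2}, total_minislots=8))
```

With an empty segment ahead of it, the same node 5 would transmit, which is exactly the demand-driven, priority-ordered behavior described above.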
FlexRay supports a range of network topologies, offering a maximum of scalability and considerable flexibility in the arrangement of embedded electronic architectures in automotive applications. The supported configurations include bus, active star, active cascaded stars, and active stars with bus extension. FlexRay also uses bus guardians in the same way as TTP/C.
Existing FlexRay communication controllers support communication bit rates of up to 10 Mbit/sec on two channels. The transceiver component of the communication controller also provides a set
of automotive network-specific services. Two major services are alarm handling and wakeup control. In addition to the alarm information received in a frame, an ECU also receives the alarm symbol from the communication controller. This redundancy can be used to validate critical signals; for instance, an air bag fire command. The wakeup service is required where electronic components have a sleep mode to reduce power consumption.
FlexRay is a joint effort of a consortium involving some of the leading car makers and technology providers, among them BMW, Bosch, DaimlerChrysler, General Motors, Motorola, Philips, and Volkswagen, as well as Hyundai Kia Motors as a premium associate member with voting rights. DECOMSYS offers Designer Pro, a comprehensive set of tools to support the development process of FlexRay-based applications. The FlexRay protocol specification version 2.0 was released in 2004. Controllers are currently available from Freescale, and in the future from NEC. The latest controller version, MFR4200, implements the protocol specification versions 1.0 and 1.1. Austriamicrosystems offers a high-speed automotive bus transceiver for FlexRay (AS8221). A special physical layer for FlexRay is provided by Philips; it supports the topologies described above and a data rate of 10 Mbit/sec on one channel. Two versions of the bus driver will be available.
Time-Triggered Controller Area Network (TTCAN) [51], which can support a combination of both time-triggered and event-triggered transmissions, utilizes the physical and data-link layers of the CAN protocol. Since this protocol, as standardized, does not provide the necessary dependability services, it is unlikely to play any role in fault-tolerant communication in automotive applications.
The TTP/C and FlexRay protocols belong to class D networks in the classification published by the Society of Automotive Engineers [52, 53]. Although the classification dates back to 1994, it is still a reasonable guideline for distinguishing protocols based on data transmission speed and the functions distributed over the network. The classification comprises four classes. Class A includes networks with a data rate less than 10 Kbit/sec. Some representative protocols are Local Interconnect Network (LIN) [54] and TTP/A [50]. Class A networks are employed largely to implement body domain functions. Class B networks operate within the range of 10 Kbit/sec to 125 Kbit/sec. Some representative protocols are J1850 [55], low-speed CAN [56], and VAN (Vehicle Area Network) [57]. Class C networks operate within the range of 125 Kbit/sec to 1 Mbit/sec. Examples of networks in this class are high-speed CAN [58] and J1939 [59]. Networks in this class are used for the control of the powertrain and chassis domains. High-speed CAN, although used in the control of the powertrain and chassis domains, is not suitable for safety-critical applications as it lacks the necessary fault-tolerant services. Class D networks (not formally defined as yet) include networks with a data rate over 1 Mbit/sec. Networks to support x-by-wire solutions fall into this class, including TTP/C and FlexRay. Also, MOST (Media Oriented System Transport) [60] and IDB-1394 [61], both for multimedia applications, belong to this class.
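The speed-based classification above maps directly onto a small helper function. The treatment of the exact boundary rates (e.g., whether 125 Kbit/sec counts as class B or C) is an assumption here, as the text gives only the ranges:

```python
def sae_network_class(bit_rate):
    """Map a network data rate (bit/sec) to its SAE class, following
    the 1994 classification described in the text:
    A < 10 Kbit/s, B = 10-125 Kbit/s, C = 125 Kbit/s - 1 Mbit/s,
    D > 1 Mbit/s. Boundary handling is our assumption."""
    if bit_rate < 10_000:
        return "A"       # e.g., LIN, TTP/A (body domain)
    if bit_rate <= 125_000:
        return "B"       # e.g., J1850, low-speed CAN, VAN
    if bit_rate <= 1_000_000:
        return "C"       # e.g., high-speed CAN, J1939 (powertrain/chassis)
    return "D"           # e.g., TTP/C, FlexRay, MOST, IDB-1394

print(sae_network_class(20_000))      # prints B
print(sae_network_class(10_000_000))  # prints D (FlexRay at 10 Mbit/sec)
```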
The cooperative development process of networked embedded automotive applications brings with it heterogeneity of software and hardware components. Even with the inevitable standardization of those components, interfaces, and even complete system architectures, the support for reuse of hardware and software components is limited, potentially making the design of networked embedded automotive applications labor-intensive, error-prone, and expensive. This necessitates the development of component-based design integration methodologies. An interesting approach is based on platform-based design [62], discussed in this book with a view to automotive applications. Some industry standardization initiatives include: OSEK/VDX with its OSEKTime OS (OSEK/VDX Time-Triggered Operating Systems) [63]; OSEK/VDX Communication [64], which specifies a communication layer that defines common software interfaces and common behavior for internal and external communications among application processes; OSEK/VDX FTCom (Fault-Tolerant Communication) [65], a proposal for a software layer to provide services that facilitate the development of fault-tolerant applications on top of time-triggered networks; HIS (Herstellerinitiative Software) [66], with a broad range of goals including standardization of software modules, specification of process maturity levels, development of software tests, development of software tools, etc.; and ASAM (Association for Standardization of Automation and Measuring Systems) [67], which develops, amongst other projects, a standardized XML-based format for data exchange between tools from different vendors.
One of the main bottlenecks in the development of safety-critical systems is the software development process. The automotive industry clearly needs a software development process model and supporting tools suitable for the development of safety-critical software. At present, there are two potential candidates. MISRA (Motor Industry Software Reliability Association) [68] has published recommended practices for safe automotive software; these practices, although automotive-specific, do not support x-by-wire. IEC 61508 [69] is an international standard for electrical, electronic, and programmable electronic safety-related systems; it is not automotive-specific, but is broadly accepted in other industries.
1.3.4 Sensor Networks
Another trend in the networking of field devices has emerged recently: sensor networks, which are another example of networked embedded systems. Here, the embedding factor is not as evident as in other applications; this is particularly true for wireless and self-organizing networks, where the nodes may be embedded in an ecosystem or a battlefield, for example.
Although potential applications in the projected areas are still under discussion, wireless sensor/actuator networks are in the deployment stage in the manufacturing industry. The use of wireless links with field devices, such as sensors and actuators, allows for flexible installation and maintenance, supports the mobile operation required in the case of mobile robots, and alleviates problems with cabling. To operate effectively in the industrial/factory floor environment, a wireless communication system has to guarantee high reliability, low and predictable delay of data transfer (typically, less than 10 msec for real-time applications), support for a high number of sensors/actuators (over 100 in a cell of a few meters radius), and low power consumption, among other requirements. In industrial environments, the characteristic wireless channel degradation artifacts can be compounded by the presence of electric motors or a variety of equipment causing electric discharge, which contributes to even greater levels of bit errors and packet losses. The problem can be partially alleviated either by designing robust and loss-tolerant applications and control algorithms, or by trying to improve the channel quality; both are subjects of extensive research and development.
In a wireless sensor/actuator network, stations may interact with each other on a peer-to-peer basis, and with the base station. To leverage low cost, small size, and low power consumption, standard Bluetooth (IEEE 802.15.1) 2.4 GHz radio transceivers [70, 71] may be used as the sensor/actuator communication hardware. To meet the requirements for high reliability, low and predictable delay of data transfer, and support for a high number of sensors/actuators, custom optimized communication protocols may be required, as commercially available solutions such as IEEE 802.15.1, IEEE 802.15.4 [72], and the IEEE 802.11 [73-75] variants may not fulfill all the requirements. The base station may have its transceiver attached to a cable of a fieldbus, giving rise to a hybrid wireless-wireline fieldbus system [2].
A representative example of this kind of system is a wireless sensor/actuator network developed by ABB and deployed in a manufacturing environment [76]. The system, known as WISA (wireless sensor/actuator), has been implemented in a manufacturing cell to network proximity switches, which are among the most widely used position sensors in automated factories, controlling the positions of a variety of equipment, including robotic arms, for instance. The sensor/actuator communication hardware is based on a standard Bluetooth 2.4 GHz radio transceiver and low-power electronics that handle the wireless communication link. The sensors communicate with a wireless base station via antennas mounted in the cell. For the base station, a specialized RF front end was developed to provide collision-free air access by allocating a fixed TDMA time slot to each sensor/actuator. Frequency Hopping (FH) was employed to counter both frequency-selective fading and interference effects, and operates in combination with Automatic Retransmission Requests (ARQs). The parameters of this TDMA/FH scheme were chosen to satisfy the requirements of up to 120 sensors/actuators per base station. Each wireless node has a response or cycle time of 2 msec, to make full use of the available radio band of 80 MHz width. The FH sequences are cell-specific and were chosen to have low cross-correlations to permit parallel operation of many cells on the same factory floor with low self-interference. The base station can handle up to 120 wireless
sensors/actuators and is connected to the control system via a (wireline) fieldbus. To increase capacity, a number of base stations can operate in the same area. WISA provides a wireless power supply to the sensors, based on magnetic coupling [77].
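The TDMA dimensioning quoted above (up to 120 devices per base station, each with a 2 msec cycle) implies a per-device slot on the order of tens of microseconds. The arithmetic below is an inference from the quoted figures, not a value taken from the WISA documentation:

```python
# Back-of-the-envelope check of the WISA TDMA dimensioning quoted in
# the text: up to 120 sensors/actuators per base station, each with a
# 2 msec response/cycle time. The derived per-device slot length is an
# inference, not a specification value.

SENSORS_PER_BASE = 120
CYCLE_TIME_S = 2e-3            # 2 msec cycle time per wireless node

slot_s = CYCLE_TIME_S / SENSORS_PER_BASE
print(f"per-device TDMA slot: {slot_s * 1e6:.1f} usec")  # ~16.7 usec
```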
1.4 Concluding Remarks
This chapter has presented an overview of trends in the networking of embedded systems, their design, and selected application domain-specific network technologies. Networked embedded systems appear in a variety of application domains, including automotive, train, aircraft, office building, and industrial automation. With the exception of building automation, the systems discussed in this chapter tend to be confined to a relatively small covered area and a limited number of nodes, as in the case of an industrial process, an automobile, or a truck. In building automation controls, networked embedded systems may take on truly large proportions in terms of area covered and number of nodes. For instance, in a LonTalk network, the total number of addressable nodes in a domain can reach 32,385, and up to 2^48 domains can be addressed.
Wireless sensor/actuator networks, as well as wireless-wireline hybrid networks, have started evolving from concept to actual implementation, and are poised to have a major impact on industrial, home, and building automation, for a start.
Networked embedded systems pose a multitude of challenges in their design, particularly for safety-critical applications, deployment, and maintenance. The majority of development environments and tools for specific networking technologies do not have firm foundations in computer science or software engineering models and practices, making the development process labor-intensive, error-prone, and expensive.
References
[1] R. Zurawski (Ed.), The Industrial Communication Systems, Special Issue. Proceedings of the IEEE, 93, June 2005.
[2] J.-D. Decotignie, P. Dallemagne, and A. El-Hoiydi, Architectures for the Interconnection of Wireless and Wireline Fieldbusses. In Proceedings of the 4th IFAC Conference on Fieldbus Systems and Their Applications 2001 (FET 2001), Nancy, France, 2001.
[3] H. Zimmermann, OSI Reference Model: The ISO Model of Architecture for Open System Interconnection. IEEE Transactions on Communications, 28, 425-432, 1980.
[4] Costrell, CAMAC Instrumentation System: Introduction and General Description. IEEE Transactions on Nuclear Science, 18, 3-8, 1971.
[5] C.-A. Gifford, A Military Standard for Multiplex Data Bus. In Proceedings of the IEEE 1974 National Aerospace and Electronics Conference, May 13-15, 1974, Dayton, OH, USA, pp. 85-88.
[6] N. Collins, Boeing Architecture and TOP (Technical and Office Protocol). In Networking: A Large Organization Perspective, April 1986, Melbourne, FL, USA, pp. 49-54.
[7] H.A. Schutz, The Role of MAP in Factory Integration. IEEE Transactions on Industrial Electronics, 35, 6-12, 1988.
[8] P. Pleinevaux and J.-D. Decotignie, Time Critical Communication Networks: Field Buses. IEEE Network, 2, 55-63, 1988.
[9] International Electrotechnical Commission, Digital data communications for measurement and control - Fieldbus for use in industrial control systems, Part 1: Introduction. IEC 61158-1, IEC, 2003.
[10] International Electrotechnical Commission (IEC). www.iec.ch.
[11] International Organization for Standardization (ISO). www.iso.org.
[12] Instrumentation Society of America (ISA). www.isa.org.
[13] Comité Européen de Normalisation Électrotechnique (CENELEC). www.cenelec.org.
[14] European Committee for Standardization (CEN). www.cenorm.be.
[15] International Electrotechnical Commission, Digital data communications for measurement and control - Part 1: Profile sets for continuous and discrete manufacturing relative to fieldbus use in industrial control systems, IEC 61784-1, IEC, 2003.
[16] International Electrotechnical Commission, Real Time Ethernet: Modbus-RTPS, Proposal for a Publicly Available Specification for Real-Time Ethernet, document IEC 65C/341/NP, 2004.
[17] www.modbus-ida.org.
[18] International Electrotechnical Commission, Real Time Ethernet: EtherNet/IP with Time Synchronization, Proposal for a Publicly Available Specification for Real-Time Ethernet, document IEC 65C/361/NP, IEC, 2004.
[19] www.odva.org.
[20] www.controlnet.org.
[21] International Electrotechnical Commission, Real Time Ethernet: P-NET on IP, Proposal for a Publicly Available Specification for Real-Time Ethernet, document IEC 65C/360/NP, IEC, 2004.
[22] International Electrotechnical Commission, Real Time Ethernet: Vnet/IP, Proposal for a Publicly Available Specification for Real-Time Ethernet, document IEC 65C/352/NP, IEC, 2004.
[23] International Electrotechnical Commission, Real Time Ethernet: EPL (ETHERNET Powerlink), Proposal for a Publicly Available Specification for Real-Time Ethernet, document IEC 65C/356a/NP, IEC, 2004.
[24] www.ethernet-powerlink.org.
[25] International Electrotechnical Commission, Real Time Ethernet: TCnet (Time-Critical Control Network), Proposal for a Publicly Available Specification for Real-Time Ethernet, document IEC 65C/353/NP, IEC, 2004.
[26] International Electrotechnical Commission, Real Time Ethernet: EPA (Ethernet for Plant Automation), Proposal for a Publicly Available Specification for Real-Time Ethernet, document IEC 65C/357/NP, IEC, 2004.
[27] J. Feld, PROFINET: Scalable Factory Communication for all Applications. In Proceedings of the 2004 IEEE International Workshop on Factory Communication Systems, September 22-24, 2004, Vienna, Austria, pp. 33-38.
[28] www.profibus.org.
[29] International Electrotechnical Commission, Real Time Ethernet: SERCOS III, Proposal for a Publicly Available Specification for Real-Time Ethernet, document IEC 65C/358/NP, IEC, 2004.
[30] International Electrotechnical Commission, Real Time Ethernet: Control Automation Technology (ETHERCAT), Proposal for a Publicly Available Specification for Real-Time Ethernet, document IEC 65C/355/NP, IEC, 2004.
[31] www.ethercat.org.
[32] International Electrotechnical Commission, Real-Time Ethernet: PROFINET IO, Proposal for a Publicly Available Specification for Real-Time Ethernet, document IEC 65C/359/NP, IEC, 2004.
[33] Deborah Snoonian, Smart Buildings. IEEE Spectrum, 40, 18-23, 2003.
[34] D. Loy, D. Dietrich, and H. Schweinzer, Open Control Networks, Kluwer, Dordrecht, 2004.
[35] Steven T. Bushby, BACnet: A Standard Communication Infrastructure for Intelligent Buildings. Automation in Construction, 6, 529-540, 1997.
[36] ENV 13154-2, Data Communication for HVAC Applications - Field Net - Part 2: Protocols, 1998.
[37] EIA/CEA 776.5, CEBus-EIB Router Communications Protocol - The EIB Communications Protocol, 1999.
[38] EN 50090-X, Home and Building Electronic Systems (HBES), 1994-2004.
[39] Konnex Association, Diegem, Belgium. KNX Specications, V. 1.1, 2004.
[40] www.echelon.com.
[41] Control Network Protocol Specication, ANSI/EIA/CEA-709.1-A, 1999.
[42] Control Network Protocol Specication, EIA/CEA Std. 709.1, Rev. B, 2002.
[43] Tunneling Component Network Protocols Over Internet Protocol Channels, ANSI/EIA/CEA 852,
2002.
[44] www.lonmark.org.
[45] F. Simonot-Lion, In-Car Embedded Electronic Architectures: How to Ensure Their Safety. In Proceedings of the 5th IFAC International Conference on Fieldbus Systems and their Applications (FeT 2003), July 2003, Aveiro, Portugal.
[46] X-by-Wire Project, Brite-EuRam III Program, X-By-Wire: Safety Related Fault Tolerant Systems in Vehicles, Final report, 1998.
[47] TTTech Computertechnik GmbH, Time-Triggered Protocol TTP/C, High-Level Specification Document, Protocol Version 1.1, November 2003. www.tttech.com.
[48] FlexRay Consortium, FlexRay Communication System, Protocol Specification, Version 2.0, June 2004. www.flexray.com.
[49] H. Kopetz and G. Bauer, The Time Triggered Architecture. Proceedings of the IEEE, 91, 112-126, 2003.
[50] H. Kopetz et al., Specification of the TTP/A Protocol, University of Technology, Vienna, 2002.
[51] International Organization for Standardization, ISO 11898-4, Road Vehicles - Controller Area Network (CAN) - Part 4: Time-Triggered Communication, ISO, 2000.
[52] Society of Automotive Engineers, J2056/1 Class C Application Requirement Classifications. In SAE Handbook, SAE, 1994.
[53] Society of Automotive Engineers, J2056/2 Survey of Known Protocols. In SAE Handbook, Vol. 2, SAE, 1994.
[54] Antal Rajnak, The LIN Standard. In The Industrial Communication Technology Handbook, CRC Press, Boca Raton, FL, 2005.
[55] Society of Automotive Engineers, Class B Data Communications Network Interface - SAE J1850 Standard, rev. Nov. 1996, 1996.
[56] International Organization for Standardization, ISO 11519-2, Road Vehicles - Low Speed Serial Data Communication - Part 2: Low Speed Controller Area Network, ISO, 1994.
[57] International Organization for Standardization, ISO 11519-3, Road Vehicles - Low Speed Serial Data Communication - Part 3: Vehicle Area Network (VAN), ISO, 1994.
[58] International Organization for Standardization, ISO 11898, Road Vehicles - Interchange of Digital Information - Controller Area Network for High-Speed Communication, ISO, 1994.
[59] SAE J1939 Standards Collection. www.sae.org.
[60] MOST Cooperation, MOST Specification Revision 2.3, August 2004. www.mostnet.de.
[61] www.idbforum.org.
[62] K. Keutzer, S. Malik, A.R. Newton, J. Rabaey, and A. Sangiovanni-Vincentelli, System Level Design: Orthogonalization of Concerns and Platform-Based Design. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 19(12), 1523-1543, 2000.
[63] OSEK Consortium, OSEK/VDX Operating System, Version 2.2.2, July 2004. www.osek-vdx.org.
[64] OSEK Consortium, OSEK/VDX Communication, Version 3.0.3, July 2004. www.osek-vdx.org.
[65] OSEK Consortium, OSEK/VDX Fault-Tolerant Communication, Version 1.0, July 2001.
www.osek-vdx.org.
[66] www.automotive-his.de.
[67] www.asam.de.
[68] www.misra.org.uk.
[69] International Electrotechnical Commission, IEC 61508:2000, Parts 1-7, Functional Safety of Electrical/Electronic/Programmable Electronic Safety-Related Systems, 2000.
[70] Bluetooth Consortium, Specication of the Bluetooth System, 1999. www.bluetooth.org.
[71] Bluetooth Special Interest Group, Specication of the Bluetooth System, Version 1.1, December 1999.
[72] LAN/MAN Standards Committee, IEEE Standard for Information Technology - Telecommunications and Information Exchange between Systems - Local and Metropolitan Area Networks - Specific Requirements - Part 15.4: Wireless Medium Access Control (MAC) and Physical Layer
(PHY) Specifications for Low Rate Wireless Personal Area Networks (LR-WPANs), IEEE Computer Society, Washington, 2003.
[73] LAN/MAN Standards Committee of the IEEE Computer Society, IEEE Standard for Information Technology - Telecommunications and Information Exchange between Systems - Local and Metropolitan Networks - Specific Requirements - Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications: Higher Speed Physical Layer (PHY) Extension in the 2.4 GHz Band, 1999.
[74] LAN/MAN Standards Committee of the IEEE Computer Society, Information Technology - Telecommunications and Information Exchange between Systems - Local and Metropolitan Area Networks - Specific Requirements - Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, 1999.
[75] Institute of Electrical and Electronics Engineers, Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, Amendment 4: Further Higher Data Rate Extension in the 2.4 GHz Band, ANSI/IEEE Std 802.11, June 2003.
[76] Christoffer Apneseth, Dacfey Dzung, Snorre Kjesbu, Guntram Scheible, and Wolfgang Zimmermann, Introducing Wireless Proximity Switches. ABB Review, 4, 42-49, 2002. www.abb.com/review.
[77] Dacfey Dzung, Christoffer Apneseth, and Jan Endresen, A Wireless Sensor/Actuator Communication System for Real-Time Factory Applications. IEEE Transactions on Industrial Electronics (submitted).
2 Real-Time in Embedded Systems
Hans Hansson,
Mikael Nolin,
and Thomas Nolte
Mälardalen University
2.1 Introduction
2.2 Design of RTSs
    Reference Architecture • Models of Interaction • Execution Strategies • Component-Based Design • Tools for Design of RTSs
2.3 Real-Time Operating Systems
    Typical Properties of RTOSs • Mechanisms for Real-Time • Commercial RTOSs
2.4 Real-Time Scheduling
    Introduction to Scheduling • Offline Schedulers • Online Schedulers
2.5 Real-Time Communications
    Communication Techniques • Fieldbuses • Ethernet for Real-Time Communication • Wireless Communication
2.6 Analysis of RTSs
    Timing Properties • Methods for Timing Analysis • Example of Analysis • Trends and Tools
2.7 Component-Based Design of RTS
    Timing Properties and CBD • Real-Time Operating Systems • Real-Time Scheduling
2.8 Testing and Debugging of RTSs
2.9 Summary
References
In this chapter we will provide an introduction to issues, techniques, and trends in real-time systems (RTSs). We will specifically discuss the design of RTSs, real-time operating systems (RTOSs), real-time scheduling, real-time communication, real-time analysis, as well as testing and debugging of RTSs. For each of these areas, state-of-the-art tools and standards are presented.
2.1 Introduction
Consider the airbag in the steering wheel of your car. After the detection of a crash (and only then), it should inflate just in time to softly catch your head and prevent it from hitting the steering wheel; not too early, since this would make the airbag deflate before it can catch you; nor too late, since the exploding
airbag could then injure you by blowing up in your face and/or catch you too late to prevent your head
from banging into the steering wheel.
The computer-controlled airbag system is an example of a RTS. But RTSs come in many different flavors, including vehicles, telecommunication systems, industrial automation systems, household appliances, etc. There is no commonly agreed upon definition of what a RTS is, but the following characterization is (almost) universally accepted:

• RTSs are computer systems that physically interact with the real world.
• RTSs have requirements on the timing of these interactions.

Typically, the real-world interactions are via sensors and actuators, rather than the keyboard and screen of your standard PC.
Real-time requirements typically express that an interaction should occur within a specified timing bound. It should be noted that this is quite different from requiring the interaction to be as fast as possible.
Essentially all RTSs are embedded in products, and the vast majority of embedded computer systems are RTSs. RTSs are the dominating application of computer technology, as more than 99% of the manufactured processors (more than 8 billion in 2000 [1]) are used in embedded systems.
Returning to the airbag system, we note that, in addition to being a RTS, it is a safety-critical system, that is, a system that, owing to severe risks of damage, has strict Quality of Service (QoS) requirements, including requirements on the functional behavior, robustness, reliability, and timeliness.
A typical strict timing property could be that a certain response to an interaction must always occur within some prescribed time; for example, the charge in the airbag must detonate between 10 and 20 msec from the detection of a crash. Violating this must be avoided at any cost, since it would lead to something unacceptable, such as having to spend a couple of months in hospital. A system that is designed to meet strict timing requirements is often referred to as a hard RTS. In contrast, systems for which occasional timing failures are acceptable, possibly because they will not lead to anything terrible, are termed soft RTSs.
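The 10 to 20 msec airbag window is a simple example of a hard timing constraint. A monitor for it is essentially a one-liner; the function name and the millisecond interface below are ours, purely for illustration:

```python
def airbag_timing_ok(detonation_delay_ms):
    """Check the hard real-time window from the text: the airbag
    charge must detonate between 10 and 20 msec after crash
    detection. Outside this window the requirement is violated."""
    return 10.0 <= detonation_delay_ms <= 20.0

print(airbag_timing_ok(15.0))   # prints True: inside the window
print(airbag_timing_ok(25.0))   # prints False: too late, a hard deadline miss
```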
An illustrative comparison between hard and soft RTSs that highlights the difference between the extremes is shown in Table 2.1. A typical hard RTS could in this context be an engine control system, which must operate with μsec precision, and which will severely damage the engine if timing requirements fail by more than a few msec. A typical soft RTS could be a banking system, for which timing is important, but where there are no strict deadlines and some variations in timing are acceptable.
Unfortunately, it is impossible to build real systems that satisfy hard real-time requirements, since, owing to the imperfection of hardware (and designers), any system may break. The best that can be achieved is a system that, with very high probability, provides the intended behavior during a finite interval of time. However, on the conceptual level hard real-time makes sense, since it implies a certain amount of rigor in the way the system is designed; for example, it implies an obligation to prove that the strict timing requirements are met.
TABLE 2.1 Typical Characteristics of Hard- and Soft-RTSs [2]
Characteristic Hard real-time Soft real-time
Timing requirements Hard Soft
Pacing Environment Computer
Peak-load performance Predictable Degraded
Error detection System User
Safety Critical Noncritical
Redundancy Active Standby
Time granularity Millisecond Second
Data files Small Large
Data integrity Short term Long term
Since the early 1980s a substantial research effort has provided a sound theoretical foundation (e.g., [3, 4]) and many practically useful results for the design of hard RTSs. Most notably, hard RTS scheduling has evolved into a mature discipline, using abstract, but realistic, models of tasks executing on single-CPU, multiprocessor, or distributed computer systems, together with associated methods for timing analysis. Such schedulability analyses, for example, the well-known rate-monotonic analysis [5-7], have also found significant use in some industrial segments.
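A classic ingredient of rate-monotonic analysis is the Liu and Layland utilization bound, U <= n(2^(1/n) - 1), a sufficient (but not necessary) schedulability test. The task set below is illustrative, not taken from the text:

```python
# Sufficient (not necessary) rate-monotonic schedulability test using
# the Liu-Layland utilization bound U <= n * (2^(1/n) - 1). The task
# parameters are invented for illustration.

def rm_utilization_test(tasks):
    """tasks: list of (worst_case_execution_time, period) pairs with
    deadlines equal to periods. Returns (utilization, bound,
    schedulable_by_bound)."""
    n = len(tasks)
    u = sum(c / t for c, t in tasks)          # total CPU utilization
    bound = n * (2 ** (1 / n) - 1)            # Liu-Layland bound
    return u, bound, u <= bound

tasks = [(1, 4), (1, 5), (2, 10)]             # (C_i, T_i) in msec
u, bound, ok = rm_utilization_test(tasks)
print(f"U = {u:.3f}, bound = {bound:.3f}, schedulable: {ok}")
```

Here U = 0.65 falls below the n = 3 bound of about 0.780, so the set is schedulable under rate-monotonic priorities; a set that fails the bound may still be schedulable, but needs exact response-time analysis.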
However, hard real-time scheduling is not the cure for all RTSs. Its main weakness is that it is based on analysis of the worst possible scenario. For safety-critical systems this is of course a must, but for other systems, where general customer satisfaction is the main criterion, it may be too costly to design the system for a worst-case scenario that may not occur during the system's lifetime.
If we look at the other end of the spectrum, we find the best-effort approach, which is still the dominating approach in industry. The essence of this approach is to implement the system using some best practice, and then use measurements, testing, and tuning to make sure that the system is of sufficient quality. On the one hand, such a system will hopefully satisfy some soft real-time requirements; the weakness being that we do not know which. On the other hand, compared with the hard real-time approach, the system can be better optimized for the available resources. A further difference is that hard RTS methods are essentially applicable to static configurations only, whereas it is less problematic to handle dynamic task creation, etc., in best-effort systems.
Having identified the weaknesses of the hard real-time and best-effort approaches, major efforts are
now being put into more flexible techniques for soft RTSs. These techniques provide analyzability (like hard
real-time), together with flexibility and resource efficiency (like best-effort). The basis for the flexible
techniques is often quantified QoS characteristics. These are typically related to nonfunctional aspects,
such as timeliness, robustness, dependability, and performance. To provide a specified QoS, some sort of
resource management is needed. Such QoS management is handled either by the application, by the
operating system (OS), by some middleware, or by a mix of the above. The QoS management is often a
flexible online mechanism that dynamically adapts the resource allocation to balance between conflicting
QoS requirements.
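As a rough illustration of such online QoS management, the sketch below adapts a task's period when measured CPU utilization drifts away from its budget. The structure, field names, thresholds, and the doubling/halving policy are illustrative assumptions, not taken from any particular middleware or OS:

```c
/* Hypothetical online QoS adaptation: when measured utilization exceeds
 * the budget, degrade a task's rate (longer period); when there is ample
 * slack, restore it. All names and thresholds are illustrative. */
typedef struct {
    unsigned period_ms;      /* current period */
    unsigned min_period_ms;  /* best QoS */
    unsigned max_period_ms;  /* worst acceptable QoS */
} qos_task;

void qos_adapt(qos_task *t, double measured_util, double budget)
{
    if (measured_util > budget && t->period_ms < t->max_period_ms)
        t->period_ms *= 2;   /* degrade: halve the rate */
    else if (measured_util < 0.5 * budget && t->period_ms > t->min_period_ms)
        t->period_ms /= 2;   /* restore: double the rate */
}
```

A real QoS manager would of course balance several tasks against each other rather than adapt each in isolation.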
2.2 Design of RTSs
The main issue in designing RTSs is timeliness, that is, ensuring that the system performs its operations at the
proper points in time. Not considering timeliness at the design phase will make it virtually impossible to analyze
and predict the timing behavior of the RTS. This section presents some important architectural issues for
embedded RTSs, together with some supporting commercial tools.
2.2.1 Reference Architecture
A generic system architecture for a RTS is depicted in Figure 2.1. This architecture is a model of any
computer-based system interacting with an external environment via sensors and actuators.
Since our focus is on the RTS we will look more into different organizations of that part of the generic
architecture in Figure 2.1. The simplest RTS is a single processor, but in many cases the RTS is a distributed
computer system consisting of a set of processors interconnected by a communications network. There
could be several reasons for making an RTS distributed, including:
The physical distribution of the application.
The computational requirements that may not be conveniently provided by a single CPU.
The need for redundancy to meet availability, reliability, or other safety requirements.
To reduce the cabling in the system.
Figure 2.2 shows an example of a distributed RTS. In a modern car, like the one depicted in the figure,
there are some 20 to 100 computer nodes (which in the automotive industry are called Electronic Control
FIGURE 2.1 A generic RTS architecture.
FIGURE 2.2 Network infrastructure of Volvo XC90.
Units [ECUs]) interconnected with one or more communication networks. The initial motivation for
this type of electronic architecture in cars was the need to reduce the amount of cabling. However, the
electronic architecture has also led to other significant improvements, including substantial pollution
reduction and new safety mechanisms, such as computer controlled Electronic Stabilization Programs
(ESPs). The current development is toward making the most safety-critical vehicle functions, such as
braking and steering, completely computer controlled. This is done by removing the mechanical connections
(e.g., between the steering wheel and front wheels, and between the brake pedal and brakes), replacing them
with computers and computer networks. Meeting the stringent safety requirements for such functions
will require careful introduction of redundancy mechanisms in hardware and communication, as well
as software, that is, a safety-critical system architecture is needed (an example of such an architecture is
TTA [8]).
2.2.2 Models of Interaction
In Section 2.2.1 we presented the physical organization of a RTS, but for an application programmer this
is not the most important aspect of the system architecture. Actually, from an application programmer's
perspective the system architecture is given more by the execution paradigm (execution strategy) and the
interaction model used in the system. In this section we describe what an interaction model is and how it
affects the real-time properties of a system, and in Section 2.2.3 we discuss the execution strategies used
in RTSs.
A model of interaction describes the rules by which components interact with each other (in this
section we will use the term component to denote any type of software unit, such as a task or a
module). The interaction model can govern both control flow and data flow between system components.
One of the most important design decisions, for all types of systems, is which interaction
models to use (sadly, however, this decision is often implicit and hidden in the system's architectural
description).
When designing RTSs, attention should be paid to the timing properties of the interaction models
chosen. Some models have a more predictable and robust behavior with respect to timing than others.
Examples of the more predictable models that are commonly used in RTS design are
pipes-and-filters, publisher-subscriber, and blackboard.
At the other end of the spectrum of interaction models, there are models that increase the (timing)
unpredictability of the system. These models should, if possible, be avoided when designing RTSs. The
two most notable, and commonly used, are client-server and message boxes.
2.2.2.1 Pipes-and-Filters
In this model, both data and control ow is specied using input and output ports of components.
A component becomes eligible for execution when data has arrived on its input ports and when the
component nishes execution it produces output on its output ports.
This model ts well for many types of control programs, and control laws are easily mapped to this
interaction model. Hence, it has gained widespread use in the real-time community. The real-time
properties of this model are also quite nice. Since both data and control ows unidirectionally through a
series of components, the order of execution and end-to-end timing delay usually becomes predictable.
The model alsoprovides a highdegree of decoupling intime; that is, components canoftenexecute without
having to worry about timing delays caused by other components. Hence, it is usually straightforward to
specify the compound timing behavior of set of components.
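A minimal sketch of the pipes-and-filters idea, with illustrative port and filter names: a stage runs only when fresh data is present on its input port, and its output port feeds the next stage in the chain.

```c
#include <stdbool.h>

/* A port carries one value plus a freshness flag that drives control flow. */
typedef struct { int value; bool fresh; } port;

static void write_port(port *p, int v) { p->value = v; p->fresh = true; }

/* One filter stage: consumes its input, produces output on its output
 * port, and returns true only if it actually executed. */
static bool scale_filter(port *in, port *out, int gain)
{
    if (!in->fresh)
        return false;              /* not eligible: no new input */
    in->fresh = false;             /* consume the input */
    write_port(out, in->value * gain);
    return true;
}
```

Chaining such stages (sensor port into filter, filter into actuator port) makes the end-to-end delay the sum of the stage delays, which is what gives the model its predictable timing.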
2.2.2.2 Publisher-Subscriber
The publisher-subscriber model is similar to the pipes-and-filters model, but it usually decouples data
and control flow. That is, a subscriber can usually choose different forms of triggering for its execution.
If the subscriber chooses to be triggered on each new published value, the publisher-subscriber model
takes on the form of the pipes-and-filters model. However, a subscriber could instead choose to
ignore the timing of the published values and simply use the latest published value. Also, in the
publisher-subscriber model, the publisher is not necessarily aware of the identity, or even the existence,
of its subscribers. This provides a higher degree of decoupling of components.
Similar to the pipes-and-filters model, the publisher-subscriber model provides good timing properties.
However, a prerequisite for analysis of systems using this model is that subscriber components make
explicit the values they subscribe to (this is not mandated by the model itself). When using
the publisher-subscriber model for embedded systems, it is the norm that subscription information is
available (this information is used, for instance, to decide the values that are to be published over a
communications network, and to decide the receiving nodes of those values).
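The decoupling of data and control flow can be sketched as follows (topic and subscriber names are illustrative): an event-triggered subscriber is invoked on every publication, while a sampling subscriber simply reads the latest value whenever it happens to run.

```c
#include <stddef.h>

typedef void (*subscriber_cb)(int value);

typedef struct {
    int latest;                /* data flow: last published value */
    subscriber_cb on_publish;  /* control flow: NULL if nobody is triggered */
} topic;

static int last_seen;                        /* demo subscriber state */
static void log_subscriber(int v) { last_seen = v; }

void publish(topic *t, int v)
{
    t->latest = v;             /* always retain the latest value */
    if (t->on_publish)
        t->on_publish(v);      /* trigger an event-driven subscriber */
}

/* A sampling subscriber ignores publication timing and reads on demand. */
int sample_latest(const topic *t) { return t->latest; }
```

Note that the publisher calls neither subscriber by name; it only touches the topic, which is what gives the model its decoupling.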
2.2.2.3 Blackboard
The blackboard model allows variables to be published on a globally available blackboard area.
Thus, it resembles the use of global variables. The model allows any component to read or write
values to variables in the blackboard. Hence, the software engineering qualities of the blackboard
model are questionable. Nevertheless, it is a model that is commonly used, and in some situations it
provides a pragmatic solution to problems that are difficult to address with more stringent interaction
models.
Software engineering aspects aside, the blackboard model does not introduce any extra elements of
unpredictable timing. On the other hand, the flexibility of the model does not help engineers to achieve
predictable systems. Since the model does not address control flow, components can execute relatively
undisturbed and decoupled from other components.
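A blackboard can be sketched as a global table of named entries that any component may read or write; the entry names, table size, and API below are illustrative.

```c
#include <string.h>

#define BB_ENTRIES 8

/* Globally shared blackboard: any component may read or write any entry. */
static struct { char name[16]; double value; int used; } blackboard[BB_ENTRIES];

int bb_write(const char *name, double value)
{
    for (int i = 0; i < BB_ENTRIES; i++) {        /* update existing entry */
        if (blackboard[i].used && strcmp(blackboard[i].name, name) == 0) {
            blackboard[i].value = value;
            return 0;
        }
    }
    for (int i = 0; i < BB_ENTRIES; i++) {        /* or claim a free slot */
        if (!blackboard[i].used) {
            strncpy(blackboard[i].name, name, sizeof blackboard[i].name - 1);
            blackboard[i].used = 1;
            blackboard[i].value = value;
            return 0;
        }
    }
    return -1;  /* blackboard full */
}

int bb_read(const char *name, double *out)
{
    for (int i = 0; i < BB_ENTRIES; i++) {
        if (blackboard[i].used && strcmp(blackboard[i].name, name) == 0) {
            *out = blackboard[i].value;
            return 0;
        }
    }
    return -1;  /* no such entry */
}
```

The sketch deliberately imposes no control flow: nothing is triggered by a write, which is exactly why the model neither adds timing unpredictability nor helps enforce predictability.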
2.2.2.4 Client-Server
In the client-server model, a client asynchronously invokes a service of a server. The service invocation
passes the control flow (plus any input data) to the server, and control stays at the server until it has
completed the service. When the server is done, the control flow (and any return data) is returned to the
client, which in turn resumes execution.
The client-server model has inherently unpredictable timing. Since services are invoked asynchronously,
it is very difficult to a priori assess the load on the server for a certain service invocation. Thus, it is difficult
to estimate the delay of the service invocation and, in turn, difficult to estimate the response time of
the client. This matter is further complicated by the fact that most components often behave both
as clients and as servers (a server often uses other servers to implement its own services), leading to very
complex and unanalyzable control flow paths.
2.2.2.5 Message Boxes
A component can have a set of message boxes, and components communicate by posting messages in each
other's message boxes. Messages are typically handled in First In First Out (FIFO) order, or in priority
order (where the sender specifies a priority). Message passing does not change the flow of control for
the sender. A component that tries to receive a message from an empty message box, however, blocks on
that message box until a message arrives (often the receiver can specify a timeout to prevent indefinite
blocking).
From a sender's point of view, the message-box model has problems similar to those of the client-server model.
The data sent by the sender (and the action that the sender expects the receiver to perform) may be delayed
in an unpredictable way when the receiver is highly loaded. Also, the asynchronous nature of the message
passing makes it difficult to foresee the load of a receiver at any particular moment.
Furthermore, from the receiver's point of view, the reading of message boxes is unpredictable in the
sense that the receiver may or may not block on the message box. Also, since message boxes often are of
limited size, there is a risk that a highly loaded receiver loses some messages. Lost messages are another
source of unpredictability.
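The mechanics can be sketched as a fixed-capacity FIFO box per component. To keep the sketch single-threaded, receiving from an empty box returns an error instead of blocking; in a real RTOS the receiver would block, possibly with a timeout. Capacity and names are illustrative.

```c
#define BOX_CAP 4

/* A fixed-capacity FIFO message box (ring buffer). */
typedef struct {
    int buf[BOX_CAP];
    int head, count;
} msgbox;

int box_send(msgbox *b, int msg)
{
    if (b->count == BOX_CAP)
        return -1;                         /* full: the message is lost */
    b->buf[(b->head + b->count) % BOX_CAP] = msg;
    b->count++;
    return 0;                              /* sender never blocks */
}

int box_recv(msgbox *b, int *msg)
{
    if (b->count == 0)
        return -1;                         /* a real receiver would block here */
    *msg = b->buf[b->head];
    b->head = (b->head + 1) % BOX_CAP;
    b->count--;
    return 0;
}
```

The two error paths correspond directly to the unpredictability discussed above: the full-box path loses messages under receiver overload, and the empty-box path is where a real receiver blocks for an unknown time.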
2.2.3 Execution Strategies
There are two main execution paradigms for RTSs: time-triggered and event-triggered. On one hand, when
using time-triggered execution, activities occur at predefined instances of time; for example, a specific
sensor value is read exactly every 10 msec and, 2 msec later, the corresponding actuator receives an
updated control parameter. In an event-triggered execution, on the other hand, actions are triggered by
event occurrences; for example, when the toxic fluid in a tank reaches a certain level, an alarm will go
off. It should be noted that the same functionality can typically be implemented in both paradigms;
for example, a time-triggered implementation of the above alarm would be to periodically read the level-
measuring sensor and activate the alarm when the read level exceeds the maximum allowed. If alarms
are rare, the time-triggered version will have a much higher computational overhead than the event-
triggered one. On the other hand, the periodic sensor readings will facilitate detection of a malfunctioning
sensor.
Time-triggered executions are used in many safety-critical systems with high dependability require-
ments (such as avionic control systems), whereas the majority of other systems are event-triggered.
Dependability can also be guaranteed in the event-triggered paradigm, but owing to the observability
provided by the exact timing of time-triggered executions, most experts argue for using the time-triggered
paradigm in ultra-dependable systems. The main argument against time-triggered execution is its lack of flexibility and the
requirement of pre-runtime schedule generation (which is a nontrivial and possibly time-consuming task).
Time-triggered systems are mostly implemented by simple proprietary table-driven dispatchers [9]
(see Section 2.4.2 for a discussion on table-driven execution), but complete commercial systems including
design tools are also available [10, 11]. For the event-triggered paradigm a large number of commercial
tools and OSs are available (examples are given in Section 2.3.3). There are also examples of systems
integrating the two execution paradigms, thereby aiming at getting the best of both worlds: time-triggered
dependability and event-triggered exibility. One example is the Basement system [12] and its associated
real-time kernel Rubus [13].
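A table-driven dispatcher of the kind mentioned above can be sketched as follows: an offline-generated table lists which task runs in each minor slot of the schedule, and the runtime dispatcher merely indexes the table. The tasks, slot length, and table contents are illustrative.

```c
typedef void (*task_fn)(void);

static void sample_sensor(void)  { /* read inputs */ }
static void run_control(void)    { /* compute control law */ }
static void drive_actuator(void) { /* write outputs */ }

/* One hyperperiod of four slots (e.g., 10 msec each), generated offline
 * by a configuration tool so that all timing requirements are met. */
static task_fn schedule_table[] = {
    sample_sensor, run_control, drive_actuator, run_control
};
#define SLOTS (sizeof schedule_table / sizeof schedule_table[0])

/* Called from a periodic timer interrupt, once per slot. */
task_fn dispatch(unsigned tick)
{
    return schedule_table[tick % SLOTS];
}
```

All arbitration intelligence lives in the offline tool that filled the table; the runtime part is trivially simple and therefore very predictable.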
Since computations in time-triggered systems are statically allocated both in space (to a specific
processor) and in time, some sort of configuration tool is often used. This tool assumes that the
computations are packaged into schedulable units (corresponding to tasks or threads in an event-triggered
system). Typically, for example, in Basement, computations are control-flow based, in the sense that
they are defined by sequences of schedulable units, each unit performing a computation based on its
inputs and producing outputs to the next unit in the sequence. The system is configured by defining the
sequences and their timing requirements. The configuration tool will then automatically (if possible)
generate a schedule that guarantees that all timing requirements are met.¹
Event-triggered systems typically have richer and more complex Application Programming Interfaces
(APIs), defined by the OS and middleware used; these will be elaborated on in Section 2.3.
2.2.4 Component-Based Design
Component-Based Design (CBD) of software systems is an interesting approach for software engineering
in general, and for engineering of RTSs in particular. In CBD, a software component is used to encapsulate
some functionality. That functionality is only accessed through the interface of the component. A system
is composed by assembling a set of components and connecting their interfaces.
The reason CBD could prove extra useful for RTSs is the possibility of extending components with introspective
interfaces. An introspective interface does not provide any functionality per se; rather, the interface
can be used to retrieve information about extra-functional properties of the component. Extra-functional
properties can include attributes such as memory consumption, execution times, task periods, etc. For
RTSs, timing properties are of course of particular interest.
Unlike the functional interfaces of components, the introspective interfaces can be available offline,
that is, during the component assembly phase. This way, the timing attributes of the system components
can be obtained at design time, and tools to analyze the timing behavior of the system can be used. If the
introspective interfaces are also available online, they could be used in, for instance, admission control
algorithms. An admission controller could query new components for their timing behavior and resource
consumption before deciding to accept a new component into the system.
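The admission-control use of introspective interfaces can be sketched as follows: each component exposes its timing attributes, and the controller admits a new component only if the resulting total utilization stays within a bound. The property record, fixed-point arithmetic, and bound are illustrative assumptions.

```c
/* What a component's introspective interface might report. */
typedef struct {
    unsigned wcet_us;    /* worst-case execution time */
    unsigned period_us;  /* activation period */
} rt_properties;

/* Admission test: accept the candidate only if the total utilization of
 * existing components plus the candidate stays at or below the bound
 * (expressed in per mille to avoid floating point). */
int admit(const rt_properties *existing, int n,
          const rt_properties *candidate, unsigned bound_permille)
{
    unsigned long u = 0;
    for (int i = 0; i < n; i++)
        u += 1000UL * existing[i].wcet_us / existing[i].period_us;
    u += 1000UL * candidate->wcet_us / candidate->period_us;
    return u <= bound_permille;
}
```

At design time the same attributes would instead feed an offline schedulability analysis of the whole assembly.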
Unfortunately, many industry-standard software techniques are based on the client-server or the
message-box models of interaction, which we deemed, in Section 2.2.2, unfit for RTSs. This is especially
true for the most commonly used component models. For instance, the Corba Component Model
(CCM) [14], Microsoft's COM [15] and .NET [16] models, and Java Beans [17] all have the client-server
model as their core model. Also, none of these component technologies allows the specification of extra-
functional properties through introspective interfaces. Hence, from the real-time perspective, the biggest
advantage of CBD is void for these technologies.
However, there are numerous research projects addressing CBD for real-time and embedded systems
(e.g., [18–21]). These projects address the issues left behind by the existing commercial technologies,
such as timing predictability (using suitable computational models), support for offline analysis of
¹ This scheduling problem is theoretically intractable, so the configuration tool has to rely on heuristics that work
well in practice, but that are not guaranteed to find a solution in all cases where one exists.
component assemblies, and better support for resource-constrained systems. Often, these projects strive
to remove the considerable runtime flexibility provided by existing technologies. This runtime flexibility
is judged to be the foremost contributor to unpredictability (the flexibility also adds to the runtime
complexity and prevents the use of CBD in resource-constrained systems).
2.2.5 Tools for Design of RTSs
In industry, the term real-time system is highly overloaded, and can mean anything from interactive
systems to superfast systems or embedded systems. Consequently, it is not easy to judge which tools are
suitable for developing RTSs (as we define real-time in this chapter).
For instance, UML [22] is commonly used for software design. However, UML's focus is mainly
on client-server solutions, and it has proven inapt for RTS design. As a consequence, UML-based
tools that extend UML with constructs suitable for real-time programs have emerged. The two best-
known products are Rational's Rose RealTime [23] and i-Logix's Rhapsody [24]. These tools provide
UML support with the extension of real-time profiles. While giving real-time engineers access to suitable
abstractions and computational models, these tools do not provide means to describe timing
properties or requirements in a formal way; thus they do not allow automatic verification of timing
requirements.
TeleLogic provides programming and design support using the language SDL [25]. SDL was originally
developed as a specification language for the telecom industry, and is as such highly suitable for describing
complex reactive systems. However, its fundamental model of computation is the message-box model,
which has an inherently unpredictable timing behavior. Nevertheless, for soft embedded RTSs, SDL can give
very time- and space-efficient implementations.
For more resource-constrained hard RTSs, design tools are provided by, for example, Arcticus
Systems [13], TTTech [10], and Vector [26]. These tools are instrumental during both system design
and implementation, and also provide some timing analysis techniques that allow timing verification of
the system (or parts of the system). However, these tools are based on proprietary formats and processes,
and have as such reached a limited customer base (mainly within the automotive industry).
In the near future, UML2 will become an adopted standard [27]. UML2 has support for computational
models suitable for RTSs. This support comes mainly in the form of ports that can have protocols
associated with them. Ports are either provided or required, hence allowing type-matching of connections
between components. UML2 also includes many of the concepts from Rose RealTime, Rhapsody, and
SDL. Other future design techniques that are expected to have an impact on the design of RTSs include
the EAST/EEA Architecture Description Language (EAST-ADL) [28]. The EAST-ADL is developed by the
automotive industry and is a description language that will cover the complete development cycle of
distributed, resource-constrained, safety-critical RTSs. Tools to support development with EAST-ADL
(which is a UML2-compliant language) are expected to be provided by automotive tool vendors such as
ETAS [29], Vector [30], and Siemens [31].
2.3 Real-Time Operating Systems
An RTOS provides services for resource access and resource sharing, very similar to a general-purpose
OS. An RTOS, however, provides additional services suited for real-time development and also supports
the development process for embedded systems. Using a general-purpose OS when developing RTSs has
several drawbacks:
High resource utilization, for example, large RAM and ROM footprints and high internal
CPU demand.
Difficult access to hardware and devices in a timely manner, for example, no application-level
control over interrupts.
Lack of services to allow timing-sensitive interactions between different processes.
2.3.1 Typical Properties of RTOSs
The state of practice in RTOSs is reflected in Reference 32. Not all OSs are RTOSs. An RTOS is typically multi-
threaded and preemptible, there has to be a notion of thread priority, predictable thread synchronization
has to be supported, priority inheritance should be supported, and the OS behavior should be known [33].
This means that the interrupt latency, the worst-case execution time (WCET) of system calls, and the maximum
time during which interrupts are masked must be known. A commercial RTOS is usually marketed as the
runtime component of an embedded development platform.
As a general rule of thumb one can say that RTOSs are:
Suitable for resource-constrained environments. RTSs typically operate in such environments. Most
RTOSs can be configured pre-runtime (e.g., at compile time) to include only a subset of the total
functionality. Thus, the application developer can choose to leave out unused portions of the RTOS
in order to save resources. RTOSs typically store much of their configuration in ROM. This is done
mainly for two purposes: (1) to minimize the use of expensive RAM memory and (2) to minimize the
risk that critical data is overwritten by an erroneous application.
Giving the application programmer easy access to hardware features. These include interrupts and
devices. Most often, RTOSs give the application programmer means to install Interrupt Service
Routines at compile time and/or at runtime. This means that the RTOS leaves all
interrupt handling to the application programmer, allowing fast, efficient, and predictable handling
of interrupts. In general-purpose OSs, memory-mapped devices are usually protected from direct
access using the MMU (Memory Management Unit) of the CPU, hence forcing all device accesses
to go through the OS. RTOSs typically do not protect such devices, but allow the application to
directly manipulate them. This gives faster and more efficient access to the devices. (However,
this efficiency comes at the price of an increased risk of erroneous use of the device.)
Providing services that allow implementation of timing-sensitive code. An RTOS typically has many
mechanisms to control the relative timing between different processes in the system. Most notably,
an RTOS has a real-time process scheduler whose function is to make sure that the processes
execute in the way the application programmer intended them to. We will elaborate more on the
issues of scheduling in Section 2.4. An RTOS also provides mechanisms to control the processes'
relative performance when accessing shared resources. This can, for instance, be done by priority
queues instead of the plain FIFO queues used in general-purpose OSs. Typically, an RTOS supports
one or more real-time resource locking protocols, such as priority inheritance or priority ceiling
(Section 2.3.2 discusses resource locking protocols further).
Tailored to fit the embedded systems development process. RTSs are usually constructed in a host
environment that is different from the target environment, so-called cross-platform development.
Also, it is typical that the whole memory image, including both the RTOS and one or more applications,
is created on the host platform and downloaded to the target platform. Hence, most RTOSs
are delivered as source code modules or precompiled libraries that are statically linked with the
applications at compile time.
2.3.2 Mechanisms for Real-Time
One of the most important functions of an RTOS is to arbitrate access to shared resources in such a way
that the timing behavior of the system becomes predictable. The two most obvious resources that the RTOS
manages access to are:
The CPU, that is, the RTOS should allow processes to execute in a predictable manner.
Shared memory areas, that is, the RTOS should resolve contention to shared memory in a way
that gives predictable timing.
The CPU access is arbitrated with a real-time scheduling policy. Section 2.4 will, in more depth,
describe real-time scheduling policies. Examples of scheduling policies that can be used in RTSs
are priority scheduling, deadline scheduling, or rate scheduling. Some of these policies directly use
timing attributes (like deadline) of the tasks to perform scheduling decisions, whereas other policies
use scheduling parameters (like priority, rate, or bandwidth) that indirectly affect the timing of the
tasks.
A special form of scheduling, which is also very useful for RTSs, is table-driven (static) scheduling.
Table-driven scheduling is described further in Section 2.4.2. To summarize, in table-driven scheduling
all arbitration decisions have been made offline, and the RTOS scheduler just follows a simple table. This
gives very good timing predictability, albeit at the expense of system flexibility.
The most important aspect of a real-time scheduling policy is that it should provide means to a priori
analyze the timing behavior of the system, hence giving a predictable timing behavior of the system.
Scheduling in general-purpose OSs normally emphasizes properties such as fairness, throughput, and
guaranteed progress; these properties may be adequate in their own respect; however, they are usually in
conflict with the requirement that an RTOS should provide timing predictability.
Shared resources (such as memory areas, semaphores, and mutexes) are also arbitrated by the RTOS.
When a task locks a shared resource, it will block all other tasks that subsequently try to lock the resource.
In order to achieve predictable blocking times, special real-time resource locking protocols have been
proposed ([34, 35] provide more details about the protocols).
2.3.2.1 Priority Inheritance Protocol
The priority inheritance protocol (PIP) makes a low-priority task inherit the priority of any higher-priority
task that becomes blocked on a resource locked by the lower-priority task.
This is a simple and straightforward method to lower the blocking time. However, it is computationally
intractable to calculate the worst-case blocking time (which may be infinite, since the protocol does not prevent
deadlocks). Hence, for hard RTSs, or when timing performance needs to be calculated a priori, the PIP is
not adequate.
2.3.2.2 Priority Ceiling Inheritance Protocol
The priority ceiling protocol (PCP) associates, to each resource, a ceiling value that is equal to the highest
priority of any task that may lock the resource. By clever use of the ceiling values of each resource, the
RTOS scheduler will manipulate task priorities to avoid the problems of PIP.
PCP guarantees freedom from deadlocks, and the worst-case blocking is relatively easy to calculate.
However, the computational complexity of keeping track of ceiling values and task priorities gives PCP
high runtime overhead.
2.3.2.3 Immediate Ceiling Priority Inheritance Protocol
The immediate inheritance protocol (IIP) also associates, to each resource, a ceiling value that is equal to
the highest priority of any task that may lock the resource. However, unlike PCP, in IIP a task is
immediately assigned the ceiling priority of the resource it is locking.
IIP has the same real-time properties as PCP (including the same worst-case blocking time).² However,
IIP is significantly easier to implement. It is, in fact, for single-node systems, easier to implement than
any other resource locking protocol (including non-real-time protocols). In IIP no actual locks need to be
implemented; it is enough for the RTOS to adjust the priority of the task that locks or releases a resource.
IIP has other operational benefits; notably, it paves the way for letting multiple tasks use the same stack
area. OSs based on IIP can be used to build systems with extremely small footprints [36, 37].
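The simplicity of IIP on a single CPU can be sketched in a few lines: "locking" is nothing more than raising the current task's priority to the resource ceiling, and "unlocking" restores the saved priority. Task and resource representations are illustrative.

```c
typedef struct { int priority; } task;
typedef struct { int ceiling; } resource;  /* highest priority of any user */

/* Lock under IIP: raise the task to the resource ceiling immediately.
 * Returns the priority to restore on unlock. Higher number = higher
 * priority in this sketch. No lock structure or wait queue is needed. */
int iip_lock(task *t, const resource *r)
{
    int saved = t->priority;
    if (r->ceiling > t->priority)
        t->priority = r->ceiling;
    return saved;
}

void iip_unlock(task *t, int saved_priority)
{
    t->priority = saved_priority;
}
```

Because a task holding a resource runs at the ceiling priority, no task that could also lock the resource can even start executing meanwhile, which is why no explicit lock is required on a single processor.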
2.3.3 Commercial RTOSs
There is an abundance of commercial RTOSs. Most of them provide adequate mechanisms to enable
development of RTSs. Some examples are Tornado/VxWorks [38], LYNX [39], OSE [40], QNX [41],
RT-Linux [42], and ThreadX [43]. However, the major problem with these OSs is the rich set of
² The average blocking time will, however, be higher in IIP than in PCP.
primitives provided. These systems provide both primitives that are suitable for RTSs and primitives
that are unfit for RTSs (or that should be used with great care). For instance, they usually provide
multiple resource locking protocols, some of which are suitable and some of which are not suitable for
real-time use.
This richness becomes a problem when these OSs are used by inexperienced engineers, when projects
are large, or when project management does not provide clear design guidelines or rules.
In these situations, it is very easy to use primitives that will contribute to the timing unpredictability of the
developed system. Rather, an RTOS should help engineers and project managers by providing only
mechanisms that help in designing predictable systems. However, there is an obvious conflict between
the desire/need of RTOS manufacturers to provide rich interfaces and the stringency needed by designers
of RTSs.
There is a smaller set of RTOSs that have been designed to resolve these problems, and at the same time
allow extremely lightweight implementations of predictable RTSs. The driving idea is to provide a small
set of primitives that guides the engineers toward a good design of their system. Typical examples are the
research RTOS Asterix [36] and the commercial RTOS SSX5 [37]. These systems provide a simplified task
model, in which tasks cannot suspend themselves (e.g., there is no sleep() primitive) and tasks are restarted
from their entry point on each invocation. The only resource locking protocol supported is IIP, and
the scheduling policy is fixed-priority scheduling. These limitations make it possible to build an RTOS
that is able to run, for example, ten tasks using less than 200 bytes of RAM, while at the same time giving
predictable timing behavior [44]. Other commercial systems that follow a similar principle of reducing
the degrees of freedom, and hence promote stringent design of predictable RTSs, include Arcticus Systems'
Rubus OS [13].
Many of the commercial RTOSs provide standard APIs. The most important RTOS standards are
RT-POSIX [45], OSEK [46], and APEX [47]. Here we will only deal with POSIX since it is the most widely
adopted RTOS standard, but those interested in automotive and avionic systems should take a closer look
at OSEK and APEX, respectively.
The POSIX standard is based on Unix, and its goal is portability of applications at the source code
level. The basic POSIX services include task and thread management, file system management, input
and output, and event notification via signals. The POSIX real-time interface defines services facilitating
concurrent programming and providing predictable timing behavior. Concurrent programming is
supported by synchronization and communication mechanisms that allow predictability. Predictable
timing behavior is supported by preemptive fixed-priority scheduling, time management with high
resolution, and virtual memory management. Several restricted subsets of the standard, intended for
different types of systems, have been defined, as well as specific language bindings, for example, for
Ada [48].
2.4 Real-Time Scheduling
Traditionally, real-time schedulers are divided into offline and online schedulers. Offline schedulers make
all scheduling decisions before the system is executed. At runtime a simple dispatcher is used to activate
tasks according to the offline-generated schedule. Online schedulers, on the other hand, decide during
execution, based on various parameters, which task should execute at any given time.
As a multitude of different schedulers have been developed in the research community, in this section we
focus on highlighting the main categories of schedulers that are readily available in existing RTOSs.
2.4.1 Introduction to Scheduling
An RTS consists of a set of real-time programs, each of which in turn consists of a set of tasks. These tasks are
sequential pieces of code, executing on a platform with limited resources. The tasks have different timing
2006 by Taylor & Francis Group, LLC
2-12 Embedded Systems Handbook
properties, for example, execution times, periods, and deadlines. Several tasks can be allocated to a single
processor. The scheduler decides, at each moment, which task to execute.
An RTS can be preemptive or nonpreemptive. In a preemptive system, tasks can preempt each other,
letting the task with the highest priority execute. In a nonpreemptive system, a task that has been allowed
to start will execute until its completion.
Tasks can be categorized as periodic, sporadic, or aperiodic. Periodic tasks execute with a
specified time (the period) between task releases. Aperiodic tasks have no information saying when the task is
to be released; usually, aperiodic tasks are triggered by interrupts. Similarly, sporadic tasks have no period, but
in contrast with aperiodic tasks, sporadic tasks have a known minimum time between releases. Typically, tasks
that perform measurements are periodic, collecting some value(s) every nth time unit. A sporadic task
typically reacts to an event/interrupt that is known to have a minimum interarrival time, for example,
an alarm or the emergency shutdown of a production robot. The minimum interarrival time can be
constrained by physical laws, or it can be enforced by some hardware mechanism. If we do not know the
minimum time between two consecutive events, we must classify the event-handling task as aperiodic.
A real-time scheduler schedules the real-time tasks sharing the same resource (e.g., a CPU or a network
link). The goal of the scheduler is to make sure that the timing requirements of these tasks are satisfied. The
scheduler decides, based on the task timing properties, which task is to execute or to use the resource.
2.4.2 Offline Schedulers
Offline schedulers, or table-driven schedulers, work as follows: the scheduler creates a schedule (the table)
before the system is started (offline). At runtime, a dispatcher follows the schedule, and makes sure that
tasks only execute in their predetermined time slots (according to the schedule). Offline schedules
are commonly used to implement the time-triggered execution paradigm (described in Section 2.2.3).
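The runtime half of this scheme is little more than a table lookup. The sketch below illustrates the idea; the schedule contents, slot boundaries, and task names are all hypothetical, and a real dispatcher would be driven by a timer interrupt rather than called with a tick count:

```python
# Minimal table-driven dispatcher sketch (hypothetical schedule).
# The offline tool emits (start_tick, task) pairs for one major cycle;
# at runtime the dispatcher only looks up which slot the clock is in.

SCHEDULE = [(0, "sample"), (2, "control"), (5, "actuate"), (8, "log")]
MAJOR_CYCLE = 10  # length of the table in ticks; the table then repeats

def task_for_tick(tick):
    """Return the task whose slot covers the given clock tick."""
    t = tick % MAJOR_CYCLE        # the schedule repeats every major cycle
    current = None
    for start, task in SCHEDULE:  # entries are sorted by start tick
        if start <= t:
            current = task        # last slot started before (or at) t
    return current
```

Because the table is fixed, the dispatcher's behavior is fully determined by the clock, which is exactly the determinism the text attributes to table-driven scheduling.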
By creating a schedule offline, complex timing constraints can be handled in a way that would be
difficult to do online. The schedule that is created will be used at runtime; therefore, the online behavior
of table-driven schedulers is very deterministic. Because of this determinism, table-driven schedulers are
the more common choice in applications with very high safety-critical demands. However, since the
schedule is created offline, flexibility is very limited, in the sense that as soon as the system changes
(owing to, e.g., added functionality or a change of hardware), a new schedule has to be created and given
to the dispatcher. Creating new schedules is nontrivial and sometimes very time consuming.
There also exist combinations of the predictable table-driven schedulers and the more flexible priority-based
schedulers, and there exist methods to convert one policy to another [13, 49, 50].
2.4.3 Online Schedulers
Scheduling policies that make their scheduling decisions during runtime are classified as online schedulers.
These schedulers make their scheduling decisions based on some task properties, for example, task priority.
Schedulers that base their scheduling decisions on task priorities are also called priority-based schedulers.
2.4.3.1 Priority-Based Schedulers
Using priority-based schedulers, flexibility is increased (compared with table-driven schedulers), since
the schedule is created online, based on the currently active tasks' constraints. Hence, priority-based
schedulers can cope with changes in workload and added functions, as long as the schedulability of the
task set is not violated. However, the exact behavior of priority-based schedulers is harder to predict.
Therefore, these schedulers are seldom used in the most safety-critical applications.
Two common priority-based scheduling policies are Fixed-Priority Scheduling (FPS) and Earliest
Deadline First (EDF). The difference between these scheduling policies is whether the priorities of the
real-time tasks are fixed or if they can change during execution (i.e., they are dynamic).
In FPS, priorities are assigned to the tasks before execution (offline). The task with the highest priority
among all tasks that are available for execution is scheduled for execution. It can be proven that some
priority assignments are better than others. For instance, for a simple task model with strictly periodic,
noninterfering tasks with deadlines equal to the period of the task, a Rate Monotonic (RM) priority
assignment has been shown by Liu and Layland [5] to be optimal. In RM, the priority is assigned based
on the period of the task: the shorter the period, the higher the assigned priority.
Using EDF, the task with the nearest (earliest) deadline among all available tasks is selected for execution.
Therefore, the priority is not fixed; it changes with time. It has been shown that for simple task models
EDF is an optimal dynamic priority scheme [5].
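For this simple task model (independent periodic tasks with deadlines equal to periods), both policies come with well-known utilization-based schedulability tests: Liu and Layland's sufficient bound U <= n(2^(1/n) - 1) for RM, and the exact bound U <= 1 for EDF on one processor. A sketch, with hypothetical task sets given as (WCET, period) pairs:

```python
# Utilization-based schedulability tests for the simple periodic task
# model (independent tasks, deadline = period). Task sets are
# hypothetical; each task is a (wcet, period) pair.

def utilization(tasks):
    return sum(c / t for c, t in tasks)

def rm_schedulable(tasks):
    """Sufficient (not necessary) Liu-Layland bound for rate monotonic."""
    n = len(tasks)
    return utilization(tasks) <= n * (2 ** (1 / n) - 1)

def edf_schedulable(tasks):
    """Exact uniprocessor test for EDF: total utilization at most 1."""
    return utilization(tasks) <= 1.0

tasks = [(1, 4), (1, 5), (2, 10)]   # U = 0.65, below the n = 3 bound (~0.78)
```

Note that a task set may fail the RM bound yet still be schedulable under RM; the bound is only sufficient, whereas the EDF test is exact for this model.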
2.4.3.2 Scheduling with Aperiodics
In order for the priority-based schedulers to cope with aperiodic tasks, different service methods have been
presented. The objective of these service methods is to give a good average response time for aperiodic
requests, while preserving the timing properties of periodic and sporadic tasks. These services are implemented
using special server tasks. In the scheduling literature many types of servers are described. Using
FPS, for instance, the Sporadic Server (SS) was presented by Sprunt et al. [51]. The SS has a fixed priority
chosen according to the RM policy. Using EDF, the Dynamic Sporadic Server (DSS) [52, 53] extends SS. Other
EDF-based schedulers are the Constant Bandwidth Server (CBS), presented by Abeni and Buttazzo [54],
and the Total Bandwidth Server (TBS) by Spuri and Buttazzo [52, 55]. Each server is characterized partly
by its unique mechanism for assigning deadlines, and partly by a set of variables used to configure the
server. Examples of such variables are bandwidth, period, and capacity.
In Section 2.6 we give examples of how timing properties of FPS can be calculated.
2.5 Real-Time Communications
Real-time communication aims at providing timely and deterministic communication of data between
distributed devices. In many cases, there are requirements to provide guarantees of the real-time properties
of these transmissions. There are real-time communication networks of different types, ranging from small
fieldbus-based control systems to large Ethernet/Internet distributed applications. There is also a growing
interest in wireless solutions.
In this section we give a brief introduction to communications in general and real-time communications
in particular. We then provide an overview of the currently most popular real-time communication systems
and protocols, both in industry and in academia.
2.5.1 Communication Techniques
Common access mechanisms used in communication networks are CSMA/CD (Carrier Sense Multiple
Access/Collision Detection), CSMA/CA (Carrier Sense Multiple Access/Collision Avoidance), TDMA
(Time Division Multiple Access), Tokens, Central Master, and Mini Slotting. These techniques are used
in both real-time and non-real-time communication, and each technique has different timing
characteristics.
In CSMA/CD, collisions between messages are detected, causing the messages involved in the collision
to be retransmitted. CSMA/CD is used, for example, in Ethernet. CSMA/CA, on the other hand,
avoids collisions and is therefore more deterministic in its behavior compared with CSMA/CD. Hence,
CSMA/CA is more suitable for hard real-time guarantees, whereas CSMA/CD can provide soft real-time
guarantees. Examples of networks that implement CSMA/CA are the Controller Area Network (CAN) and
ARINC 629.
TDMA uses time to achieve exclusive usage of the network. Messages are sent at predetermined
instants in time. Hence, the behavior of TDMA-based networks is very deterministic, making them very suitable
for providing real-time guarantees. One example of a TDMA-based real-time network is TTP.
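The core of the TDMA idea can be sketched in a few lines: each node owns a fixed slot in a cyclically repeating round, so the sender at any instant follows directly from the clock. Slot length and node names below are made up:

```python
# TDMA bus-access sketch: time is divided into fixed-length slots, and
# slots are assigned to nodes in a repeating round. The slot length and
# the node list are hypothetical.

SLOT_TICKS = 10
NODES = ["brake", "engine", "gearbox", "dashboard"]  # slot owners, in order

def sender_at(tick):
    """Return the only node allowed to transmit at the given clock tick."""
    slot = (tick // SLOT_TICKS) % len(NODES)
    return NODES[slot]
```

Since bus access is a pure function of time, no arbitration (and hence no collision) can occur, which is what makes the timing analysis of TDMA networks straightforward.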
An alternative way of eliminating collisions on the network is to use tokens. In token-based networks
only the owner of the (unique within the network) token is allowed to send messages on the network.
Once the token holder is done, or has used its allotted time, the token is passed to another node. Tokens
are used in, for example, Profibus.
It is also possible to eliminate collisions by letting one node in the network be the master node. The
master node controls the traffic on the network, and it decides which messages are allowed
to be sent and when. This approach is used in, for example, LIN and TTP/A.
Finally, mini-slotting can also be used to eliminate collisions. When using mini-slotting, as soon as
the network is idle and some node would like to transmit a message, the node has to wait for a unique
(for each node) time before sending any messages. If there are several competing nodes wanting to send
messages, a node with a longer waiting time will see that another node has already started
its transmission of a message. In such a situation the node has to wait until the network becomes idle
again. Hence, collisions are avoided. Mini-slotting can be found in, for example, FlexRay and ARINC 629.
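In effect, mini-slotting arbitration picks, among the nodes that contend when the bus goes idle, the one with the shortest unique waiting time. A deliberately simplified sketch (node names and waiting times are hypothetical):

```python
# Mini-slotting arbitration sketch. Each node has a unique waiting time
# (its minislot); when the bus becomes idle, the contender with the
# shortest wait starts transmitting first and the others back off.
# Node names and waiting times are hypothetical.

WAIT = {"node_a": 2, "node_b": 5, "node_c": 3}  # unique per-node minislots

def arbitrate(contenders):
    """Return the contender that wins the bus, or None if nobody competes."""
    if not contenders:
        return None
    return min(contenders, key=lambda n: WAIT[n])
```

Because the waiting times are unique, the winner is always well defined, so the worst-case delay for any node can be bounded analytically.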
2.5.2 Fieldbuses
Fieldbuses are a family of factory communication networks that have evolved as a response to the demand
to reduce cabling costs in factory automation systems. By moving from a situation in which every controller
has its own cables connecting the sensors to the controller (parallel interface), to a system with a set of
controllers sharing a bus (serial interface), costs could be cut and flexibility could be increased. This evolution
of technology was pushed both by the fact that the number of cables in the system increased as the
number of sensors and actuators grew, and by controllers moving from being specialized, with their
own microchip, to sharing a microprocessor with other controllers. Fieldbuses were soon ready to handle
the most demanding applications on the factory floor.
Several fieldbus technologies, usually very specialized, were developed by different companies to meet
the demands of their applications. Fieldbuses used in the automotive industry are, for example, CAN,
TT-CAN, TTP, LIN, and FlexRay. In avionics, ARINC 629 is one of the frequently used communication
standards. Profibus is widely used in automation and robotics, while in trains TCN and WorldFIP are very
popular communication technologies. We will now present each of these fieldbuses in some more detail,
outlining key features and specific properties.
2.5.2.1 Controller Area Network
The Controller Area Network (CAN) [56] was standardized by the International Organization for
Standardization (ISO) [57] in 1993. Today CAN is a widely used fieldbus, mainly in automotive systems
but also in other real-time applications, for example, medical equipment. CAN is an event-triggered
broadcast bus designed to operate at speeds of up to 1 Mbps. CAN uses a fixed-priority-based arbitration
mechanism that can provide timing guarantees using FPS-type analysis [58, 59]. An example of
this analysis will be provided in Section 2.6.3.
CAN is a collision-avoidance broadcast bus, using deterministic collision resolution to control access
to the bus (so-called CSMA/CA). The basis for the access mechanism is the electrical characteristics of a
CAN bus, allowing sending nodes to detect collisions in a nondestructive way. By monitoring the resulting
bus value during message arbitration, a node detects if there are higher-priority messages competing for
access to the bus. If this is the case, the node will stop the message transmission, and try to retransmit
the message as soon as the bus becomes idle again. Hence, the bus behaves like a priority-based queue.
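Because a dominant bit (0) electrically overwrites a recessive bit (1), bitwise arbitration always selects the numerically lowest identifier among the contenders. The following simulation sketch assumes standard 11-bit identifiers:

```python
# CAN bitwise arbitration sketch: identifiers are transmitted most-
# significant bit first; a node sending recessive (1) while reading
# dominant (0) on the bus loses and backs off. The net effect is that
# the lowest identifier wins arbitration.

ID_BITS = 11  # standard (base) CAN frame identifier length

def arbitrate(ids):
    """Simulate bitwise arbitration among competing 11-bit identifiers."""
    contenders = set(ids)
    for bit in reversed(range(ID_BITS)):               # MSB first
        bus = min((i >> bit) & 1 for i in contenders)  # dominant 0 wins the wire
        contenders = {i for i in contenders if (i >> bit) & 1 == bus}
    assert len(contenders) == 1                        # identifiers are unique
    return contenders.pop()
```

This is why assigning CAN identifiers is effectively assigning fixed priorities, and why FPS-style response-time analysis applies to the bus.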
2.5.2.2 Time-Triggered CAN
Time-triggered communication on CAN (TT-CAN) [60] is a standardized session-layer extension to the
original CAN. In TT-CAN, the exchange of messages is controlled by the temporal progression of time, and
all nodes follow a predefined static schedule. It is also possible to support original event-triggered
CAN traffic together with the time-triggered traffic. This traffic is sent in dedicated arbitration windows,
using the same arbitration mechanism as native CAN.
The static schedule is based on a time division (TDMA) scheme, where message exchanges may only
occur during specic time slots or in time windows. Synchronization of the nodes is done using either a
clock synchronization algorithm, or by periodic messages from a master node. In the latter case, all nodes
in the system synchronize with this message, which gives a reference point in the temporal domain
for the static schedule of the message transactions; that is, the master's view of time is referred to as the
network's global time.
TT-CAN adds a set of new features to the original CAN, and since it is standardized, several
semiconductor vendors manufacture TT-CAN-compliant devices.
2.5.2.3 Flexible Time-Triggered CAN
Flexible time-triggered communication on CAN (FTT-CAN) [61, 62] provides a way to schedule CAN in
a time-triggered fashion with support for event-triggered traffic as well. In FTT-CAN, time is partitioned
into Elementary Cycles (ECs), which are initiated by a special message, the Trigger Message (TM). This
message triggers the start of the EC and contains the schedule for the time-triggered traffic that shall
be sent within this EC. The schedule is calculated and sent by a master node. FTT-CAN supports both
periodic and aperiodic traffic by dividing the EC into two parts. In the first part, the asynchronous
window, the aperiodic messages are sent, and in the second part, the synchronous window, traffic is sent
in a time-triggered fashion according to the schedule delivered by the TM. FTT-CAN is still mainly an
academic communication protocol.
2.5.2.4 Time-Triggered Protocol
The Time-Triggered Protocol class C, TTP/C [10, 63], is a TDMA-based communication network
intended for truly hard real-time communication. TTP/C is available for network speeds of up to 25 Mbps.
TTP/C is part of the Time-Triggered Architecture (TTA) by Kopetz [10, 64], which is designed for safety-critical
applications. TTP/C has support for fault tolerance, clock synchronization, membership services,
fast error detection, and consistency checks. Several major automotive companies are supporting this
protocol.
For the less hard RTSs (e.g., soft RTSs), there exists a scaled-down version of TTP/C called TTP/A [10].
2.5.2.5 Local Interconnect Network
The Local Interconnect Network (LIN) [65] was developed by the LIN Consortium (including Audi, BMW,
DaimlerChrysler, Motorola, Volvo, and VW) as a low-cost alternative for small networks. LIN is cheaper
than, for example, CAN. LIN uses the UART/SCI interface hardware, and transmission speeds of
up to 20 Kbps are possible. Among the nodes in the network, one node is the master node, responsible for
synchronization of the bus. The traffic is sent in a time-triggered fashion.
2.5.2.6 FlexRay
FlexRay [66] was proposed in 1999 by several major automotive manufacturers, for example, DaimlerChrysler
and BMW, as a competitive next-generation fieldbus replacing CAN. FlexRay is a real-time
communication network that provides both synchronous and asynchronous transmissions with network
speeds of up to 10 Mbps. For the synchronous traffic FlexRay uses TDMA, providing deterministic data
transmissions with a bounded delay. For the asynchronous traffic, mini-slotting is used. Compared with
CAN, FlexRay is more suitable for the dependable application domain, as it includes support for redundant
transmission channels, bus guardians, and fast error detection and signaling.
2.5.2.7 ARINC 629
For avionic and aerospace communication systems, the ARINC 429 [67] standard and its newer ARINC 629
[67] successor are the most commonly used communication systems today. ARINC 629 supports both
periodic and sporadic communication. The bus is scheduled in bus cycles, which in turn are divided into
two parts. In the first part the periodic traffic is sent, and in the second part the sporadic traffic is sent. The
arbitration of messages is based on collision avoidance (i.e., CSMA/CA) using mini-slotting. Network
speeds are as high as 2 Mbps.
2.5.2.8 Profibus
Profibus [68] is used in process automation and robotics. There are three different versions of
Profibus: (1) Profibus-DP is optimized for speed and low cost, (2) Profibus-PA is designed for process
automation, and (3) Profibus-FMS is a general-purpose version of Profibus. Profibus provides
master/slave communication together with token mechanisms. Profibus is available with data rates up
to 12 Mbps.
2.5.2.9 Train Communication Network
The Train Communication Network (TCN) [69] is widely used in trains, and implements the IEC 61375
standard as well as the IEEE 1473 standard. TCN is composed of two networks: the Wire Train Bus (WTB)
and the Multifunction Vehicle Bus (MVB). The WTB is the network used to connect the whole train, that
is, all vehicles of the train. The network data rate is up to 1 Mbps. The MVB is the network used within one
vehicle. Here the maximum data rate is 1.5 Mbps.
Both the WTB and the MVB are scheduled in cycles called basic periods. Each basic period consists of
a periodic phase and a sporadic phase. Hence, there is support for both periodic and sporadic types of
traffic. The difference between the WTB and the MVB (apart from the data rate) is the length of the basic
periods (1 or 2 msec for the MVB and 25 msec for the WTB).
2.5.2.10 WorldFIP
WorldFIP [70] is a very popular communication network in train control systems. WorldFIP is
based on the Producer–Distributor–Consumers (PDC) communication model. Currently, network speeds
are as high as 5 Mbps. The WorldFIP protocol defines an application layer that includes PDC and
messaging services.
2.5.3 Ethernet for Real-Time Communication
In parallel with the search for the holy grail of real-time communication, Ethernet has established itself as
the de facto standard for non-real-time communication. Comparing networking solutions for automation
networks and office networks, fieldbuses were the choice for the former. At the same time, Ethernet
developed as the standard for office automation, and owing to its popularity, prices on networking
solutions dropped. Ethernet was not originally developed for real-time communication, since the original
intention with Ethernet was to maximize throughput (bandwidth). Nowadays, however, a big effort is being
made to provide real-time communication using Ethernet. The biggest challenge is to provide
real-time guarantees using standard Ethernet components.
The reason why Ethernet is not very suitable for real-time communication is its handling of collisions
on the network. Several approaches to minimize or eliminate the occurrence of collisions on Ethernet
have been proposed. The following sections present some of these proposals.
2.5.3.1 TDMA
A simple solution would be to eliminate the occurrence of collisions on the network. This has been
explored by, for example, Kopetz et al. [71], using a TDMA protocol on top of Ethernet.
2.5.3.2 Usage of Tokens
Another solution to eliminate the occurrence of collisions is the usage of tokens. Token-based solutions
[72, 73] on Ethernet also eliminate collisions, but are not compatible with standard hardware.
A token-based communication protocol is a way to provide real-time guarantees on most types
of networks. This is because such protocols are deterministic in their behavior, although a dedicated
network is required; that is, all nodes sharing the network must obey the token protocol. Examples
of token-based protocols are the Timed Token Protocol (TTP) [74] and the IEEE 802.5 Token Ring
Protocol.
2.5.3.3 Modified Collision Resolution Algorithm
A different approach is to modify the collision resolution algorithm [75, 76]. Using standard Ethernet
controllers, the modified collision resolution algorithm is nondeterministic. In order to make a deterministic
modified collision resolution algorithm, a major modification of the Ethernet controllers is
required [77].
2.5.3.4 Virtual Time and Window Protocols
Another solution for real-time communication using Ethernet is the usage of the Virtual Time CSMA
(VTCSMA) [78–80] protocol, where packets are delayed in a deterministic way in order to eliminate the
occurrence of collisions. Moreover, Window Protocols [81] use a global window (a synchronized time
interval) that also eliminates collisions. The window protocol is more dynamic and somewhat more efficient
in its behavior compared with the VTCSMA approach.
2.5.3.5 Master/Slave
A fairly straightforward way of providing real-time traffic on Ethernet is by using a master/slave
approach. As a part of the FTT framework [82], FTT Ethernet [83] is proposed as a master/multislave
protocol. At the cost of some computational overhead at each node in the system, timely delivery of
messages on Ethernet is provided.
2.5.3.6 Traffic Smoothing
The most recent work, without modifications to the hardware or networking topology (infrastructure),
is the usage of traffic smoothing. Traffic smoothing can be used to eliminate bursts of traffic [84, 85] that
have severe impact on the timely delivery of message packets on Ethernet. By keeping the network
load below a given threshold, a probabilistic guarantee of message delivery can be provided. Hence, traffic
smoothing could be a solution for soft RTSs.
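One common way to realize a traffic smoother is a token (credit) bucket in front of each station's output queue; the rate and bucket depth below are hypothetical parameters, not values from the cited work:

```python
# Traffic-smoothing sketch using a token bucket. Credits refill at a
# fixed rate and each transmitted packet consumes one credit, so a
# station's burst length is capped at the bucket depth. The rate and
# depth values are hypothetical.

class Smoother:
    def __init__(self, rate, depth):
        self.rate = rate       # credits added per tick (long-run send rate)
        self.depth = depth     # maximum stored credits (burst cap)
        self.credits = depth

    def tick(self):
        """Advance time by one tick, refilling credits up to the cap."""
        self.credits = min(self.depth, self.credits + self.rate)

    def try_send(self):
        """Return True if a packet may leave now, consuming one credit."""
        if self.credits >= 1:
            self.credits -= 1
            return True
        return False
```

Keeping every station's rate parameter below its share of the threshold load bounds the aggregate network load, which is the basis for the probabilistic delivery guarantee mentioned above.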
2.5.3.7 Black Bursts
Black burst [86] implements a collision-avoidance protocol on Ethernet. When a station wants to
submit a message, the station waits until the network is idle, i.e., no traffic is being transmitted. Then,
to avoid collisions, the transmitting station starts jamming the network. Several transmitting stations
might start jamming the network at the same time. However, each station uses a jamming signal of unique
length, always allowing a unique station to win. Winning means that once the jamming signal is over, the
network should be idle, i.e., no other stations are jamming the network. If this is the case, the message is
transmitted. Otherwise, a losing station will wait until the network is idle again, and the mechanism
starts over. Hence, no message collisions will occur on the network.
2.5.3.8 Switches
Finally, a completely different approach to achieving real-time communication using Ethernet is to change
the infrastructure. One way of doing this is to construct the Ethernet using switches to separate collision
domains. By using these switches, a collision-free network is provided. However, this requires new hardware
supporting the IEEE 802.1p standard. Therefore, it is not as attractive a solution for existing networks as,
for example, traffic smoothing.
2.5.4 Wireless Communication
There are no commercially available wireless communication protocols providing real-time guarantees.³
Two of the more commonly used wireless protocols today are IEEE 802.11 (WLAN) and Bluetooth.
However, these protocols do not provide the temporal guarantees needed for hard real-time communication.
Today, a big effort is being made (as with Ethernet) to provide real-time guarantees for wireless
communication, possibly by using either WLAN or Bluetooth.
³Bluetooth provides real-time guarantees limited to streaming voice traffic.
2.6 Analysis of RTSs
The most important property to analyze in an RTS is its temporal behavior, that is, the timeliness of the
system. The analysis should provide strong evidence that the system performs as intended at the correct
time. This section gives an overview of the basic properties that are analyzed in an RTS, and
concludes with a presentation of trends and tools in the area of RTS analysis.
2.6.1 Timing Properties
Timing analysis is a complex problem. Not only are the techniques used sometimes complicated, but
the problem itself is also elusive; for instance, what is the meaning of the term "program execution time"?
Is it the average time to execute the program, or the worst possible time, or does it mean some form of
"normal" execution time? Under what conditions does a statement regarding program execution times
apply? Is the program delayed by interrupts or higher-priority tasks? Does the time include waiting for
shared resources? And so on.
To straighten out some of these questions, and to be able to study some existing techniques for timing
analysis, we categorize timing analysis into three major types. Each type has its own purpose, benets, and
limitations. The types are listed below.
2.6.1.1 Execution Time
This refers to the execution time of a single task (or program, or function, or any other unit of single-threaded
sequential code). The result of an execution-time analysis is the time (i.e., the number of clock
cycles) the task takes to execute when executing undisturbed on a single CPU; that is, the result should
not account for interrupts, preemption, background DMA transfers, DRAM refresh delays, or any other
type of interfering background activity.
At first glance, leaving out all types of interference from the execution-time analysis would seem to give
us unrealistic results. However, the purpose of the execution-time analysis is not to deliver estimates of
real-world timing when executing the task. Instead, its role is to find out how much computing resource
is needed to execute the task. (Hence, background activities that are not related to the task should not be
accounted for.)
There are some different types of execution times that can be of interest:
Worst-case execution time (WCET). This is the worst possible execution time a task could exhibit,
or equivalently, the maximum amount of computing resources required to execute the task. The
WCET should include any possible atypical task execution such as exception handling or clean up
after abnormal task termination.
Best-case execution time (BCET). During some types of real-time analysis, not only the WCET is
used but, as we will describe later, knowledge about the BCET of tasks is also useful.
Average execution time (AET). The AET can be useful in calculating throughput figures for a
system. However, for most RTS analyses the AET is of less importance, simply because a reasonable
approximation of the average case is easy to obtain during testing (where, typically, the average
system behavior is studied). Also, knowing only the average, without any other statistical
parameters such as the standard deviation or distribution function, makes statistical analysis difficult.
For analysis purposes a more pessimistic metric, such as the 95% quantile, would be more useful.
However, analytical techniques using statistical metrics of execution time are scarce and not very
well developed.
2.6.1.2 Response Time
The response time of a task is the time it takes from the invocation to the completion of the task. In other
words, it is the time from when the task is first placed in the OS's ready queue to the time when it is removed
from the running state and placed in the idle or sleeping state.
Typically, for analysis purposes it is assumed that a task does not voluntarily suspend itself during
its execution. That is, the task may not call primitives such as sleep() or delay(). However,
involuntary suspension, such as blocking on shared resources, is allowed. That is, primitives such as
get_semaphore() and lock_database_tuple() are allowed. When a program voluntarily
suspends itself, that program should be broken down into two (or more) analysis tasks.
The response time is typically a system-level property, in that it includes interference from other, unrelated
tasks and parts of the system. The response time also includes delays caused by contention on
shared resources. Hence, the response time is only meaningful when considering a complete system, or, in
distributed systems, a complete node.
2.6.1.3 End-to-End Delay
The execution time and response time described above are useful concepts, since they are relatively easy to
understand and have well-defined scopes. However, when trying to establish the temporal correctness
of a system, knowing the WCET and/or the response times of tasks is often not enough. Typically, the
correctness criterion is stated using end-to-end latency timing requirements, for instance, an upper bound
on the delay between the input of a signal and the output of a response.
In a given implementation there may be a chain of events taking place between the input of a signal and
the output of a response. For instance, one task may be in charge of reading the input and another task
of generating the response, and the two tasks may have to exchange messages on a communications link
before the response can be generated. The end-to-end timing denotes the timing of externally visible events.
2.6.1.4 Jitter
The term jitter is used as a metric for variability in time. For instance, the jitter in execution time of a task
is the difference between the task's BCET and WCET. Similarly, the response-time jitter of a task is the
difference between its best-case response time and its worst-case response time. Often, control algorithms
have requirements that the jitter of the output should be limited. Hence, jitter is sometimes a metric
equally as important as the end-to-end delay.
Input to the system can also have jitter. For instance, an interrupt which is expected to be periodic may
have jitter (owing to some imperfection in the process generating the interrupt). In this case the jitter
value is used as a bound on the maximum deviation from the ideal period of the interrupt. Figure 2.3
illustrates the relation between the period and the jitter for this example.
Note that jitter should not accumulate over time. For our example, even though two successive interrupts
could arrive closer together than one period, in the long run the average interrupt interarrival time will be
that of the period.
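These two uses of jitter can be written down directly; all numeric parameters in this sketch are hypothetical:

```python
# Jitter sketches (hypothetical numbers, e.g., in microseconds).
# Execution-time jitter is WCET - BCET; response-time jitter is the
# worst-case minus the best-case response time.

def jitter(best, worst):
    assert worst >= best
    return worst - best

# For a periodic input with release jitter J and period T, the nth
# arrival (counting from 0) lies in [n*T, n*T + J] relative to the
# first ideal release; the deviation is bounded and does not accumulate.
def arrival_window(n, period, release_jitter):
    return n * period, n * period + release_jitter

execution_jitter = jitter(120, 200)   # BCET = 120, WCET = 200 -> 80
```

Note that in arrival_window the lower bound grows exactly with the period, which captures the non-accumulation property: every arrival is within J of its ideal instant, no matter how many periods have elapsed.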
In the above list of types of time, we only mentioned the time to execute programs. However, in many
RTSs other timing properties may also exist. For example, delays on communication networks and on other
resources, such as hard disk drives, may need to be analyzed. The times introduced above
can all be mapped onto different types of resources; for instance, the WCET of a task corresponds to
the maximum size of a message to be transmitted, and the response time of a message is defined analogously
to the response time of a task.
FIGURE 2.3 Jitter used as a bound on variability in periodicity.
2.6.2 Methods for Timing Analysis
When analyzing hard RTSs it is essential that the estimates obtained during timing analysis are safe.
An estimate is considered safe if it is guaranteed not to be an underestimation of the actual worst-case
time. It is also important that the estimate is tight, meaning that the estimated time is close to the actual
worst-case time.
For the previously defined types of timings (Section 2.6.1), the available analysis methods are presented
in the following sections.
2.6.2.1 Execution-Time Estimation
For real-time tasks the WCET is the most important execution-time measure to obtain. Sadly, however, it
is also often the most difficult measure to obtain.
Methods to obtain the WCET of a task can be divided into two categories: (1) static analysis and
(2) dynamic analysis. Dynamic analysis is essentially equivalent to testing (i.e., executing the task on the
target hardware) and has all the drawbacks/problems that testing exhibits (such as being tedious and error
prone). One major problem with dynamic analysis is that it does not produce safe results. In fact, the
result can never exceed the true WCET, and it is very difficult to make sure that the estimated WCET
really is the true WCET.
Static analysis, on the other hand, can give guaranteed safe results. Static analysis is performed by
analyzing the code (source and/or object code is used) and basically counting the number of clock
cycles that the task may use to execute (in the worst possible case). Static analysis uses models of the
hardware to predict the execution time of each instruction. Hence, for modern hardware it may be very
difficult to produce static analyzers that give good results. One source of pessimism in the analysis
(i.e., overestimation) is hardware caches: whenever an instruction or data item cannot be guaranteed to
reside in the cache, a static analyzer must assume a cache miss. And since modeling the exact state of
caches (sometimes of multiple levels), branch predictors, etc. is very difficult and time consuming, few
tools exist that give adequate results for advanced architectures. Also, performing a program-flow and data
analysis that exactly calculates, for example, the number of times a loop iterates or the input parameters
of procedures is difficult.
Methods for good hardware and software modeling do exist in the research community; however,
combining these methods into good-quality tools has proven tedious.
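As an illustration of the counting a static analyzer performs, here is a toy WCET estimator over a block-structured program; the timing model (cycle counts, miss penalty) and program representation are entirely hypothetical:

```python
# Toy static WCET estimator. The instruction timing model is a made-up
# example: any memory access that cannot be proven to hit the cache is
# pessimistically charged the miss cost.
CYCLES = {"alu": 1, "load_hit": 1, "load_unknown": 10}  # assumed model

def wcet(block):
    """block: ('instr', kind) | ('seq', [blocks]) | ('loop', bound, body)"""
    kind = block[0]
    if kind == "instr":
        return CYCLES[block[1]]
    if kind == "seq":
        return sum(wcet(b) for b in block[1])
    if kind == "loop":          # the bound must come from flow analysis
        return block[1] * wcet(block[2])
    raise ValueError(kind)

# A loop of 100 iterations: one ALU op plus one load whose cache state is
# unknown, so the analyzer must assume a miss on every iteration.
prog = ("seq", [("instr", "alu"),
                ("loop", 100, ("seq", [("instr", "alu"),
                                       ("instr", "load_unknown")]))])
print(wcet(prog))  # 1 + 100 * (1 + 10) = 1101
```

The pessimism discussed above shows up directly: if the load could be proven to hit the cache, the bound would drop from 1101 to 201 cycles.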
2.6.2.2 Schedulability Analysis
The goal of schedulability analysis is to determine whether or not a system is schedulable. A system is deemed
schedulable if it is guaranteed that all task deadlines will always be met. For statically scheduled (table-driven)
systems, response times are trivially given by the static schedule. However, for
dynamically scheduled systems (such as fixed-priority or deadline scheduling) more advanced techniques
have to be used.
There are two main classes of schedulability analysis techniques: (1) response-time analysis and
(2) utilization analysis. As the name suggests, a response-time analysis calculates a (safe) estimate of
the worst-case response time of a task. That estimate can then be compared with the deadline of the task,
and if it does not exceed the deadline the task is schedulable. Utilization analysis, in contrast, does not
directly derive the response times of tasks; rather, it gives a boolean result for each task telling whether
or not the task is schedulable. This result is based on the fraction of CPU utilization of a relevant
subset of the tasks, hence the term utilization analysis.
Both analyses are based on similar types of task models. However, the task models used
for analysis are typically not the task models provided by commercial RTOSs. This problem can be resolved by
mapping one or more OS tasks onto one or more analysis tasks. However, this mapping has to be performed
manually and requires an understanding of the limitations of the analysis task model and the analysis
technique used.
2.6.2.3 End-to-End Delay Estimation
The typical way to obtain end-to-end delay estimations is to calculate the response time for each
task/message in the end-to-end chain and to sum these response times to obtain an end-to-end
estimate. When using a utilization-based analysis technique (in which no response times are calculated),
one has to resort to using the task/message deadlines as safe upper bounds on the response times.
However, when analyzing distributed RTSs, it may not be possible to calculate all response times in one
pass. The reason for this is that delays on one node will lead to jitter on another node, and this jitter
may in turn affect the response times on that node. Since jitter can propagate in several steps between
nodes, in both directions, there may not exist a right order in which to analyze the nodes. (If A sends a message
to B, and B sends a message to A, which node should one analyze first?) Solutions to this type of problem
are called holistic schedulability analysis methods (since they consider the whole system). The standard
method for holistic response-time analysis is to repeatedly calculate response times for each node (and
update the jitter values in the nodes affected by the node just analyzed) until the response times do not change
(i.e., a fixed point is reached).
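The fixed-point structure of such a holistic analysis can be sketched as follows; the two-node model is deliberately toy-sized, and the per-node "analysis" is a hypothetical stand-in for a real jitter-aware response-time analysis:

```python
import math

# Toy holistic analysis: two nodes exchange messages, so the response
# time on each node depends on jitter induced by the other node. The
# per-node "analysis" is a hypothetical stand-in: a local execution cost,
# 3 units of blocking, and one extra 2-unit preemption per 10 units of
# inherited release jitter.
C = {"A": 5, "B": 7}

def response_time(node, release_jitter):
    return C[node] + 3 + 2 * math.ceil(release_jitter / 10)

jitter = {"A": 0, "B": 0}
while True:
    resp = {n: response_time(n, jitter[n]) for n in ("A", "B")}
    # Delay variability on one node becomes release jitter on the other.
    new_jitter = {"A": resp["B"] - C["B"], "B": resp["A"] - C["A"]}
    if new_jitter == jitter:   # fixed point: response times are stable
        break
    jitter = new_jitter

print(resp)
```

The loop repeats the per-node analysis and propagates the resulting jitter until nothing changes, exactly the iterate-until-fixed-point scheme described above.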
2.6.2.4 Jitter Estimation
To calculate the jitter one must not only perform a worst-case analysis (of, for instance, response time or
end-to-end delay) but also a best-case analysis.
However, even though best-case analysis techniques are often conceptually similar to worst-case analysis
techniques, little attention has been paid to best-case analysis. One reason for not spending too much
time on best-case analysis is that it is quite easy to make a conservative estimate of the best case: the best-case
time is never less than zero (0). Hence, many tools simply assume that the BCET (for instance)
is zero, whereas great efforts are spent analyzing the WCET.
However, it is important to have tight estimates of the jitter, and to keep the jitter as low as possible.
It has been shown that the number of execution paths a multitasking RTS can take increases dramatically
if jitter increases [87]. Unless the number of possible execution paths is kept as low as possible, it becomes
very difficult to achieve good coverage during testing.
2.6.3 Example of Analysis
In this section we give simple examples of schedulability analysis. We show a very simple example of how
a set of tasks running on a single CPU can be analyzed, and we also give an example of how the response
times for a set of messages sent on a CAN bus can be calculated.
2.6.3.1 Analysis of Tasks
This example is based on some 30-year-old task models and is intended to give the reader a feeling for how
these types of analysis work. Today's methods allow for far richer and more realistic task models, with a
resulting increase in the complexity of the equations used (hence they are not suitable for our example).
In the first example we will analyze the small task set described in Table 2.2, where T, C, and D denote
the task's period, WCET, and deadline, respectively. In this example T = D for all tasks, and priorities have
been assigned in RM order, that is, the highest rate gives the highest priority.
TABLE 2.2 Example Task Set for Analysis

Task    T     C     D     Prio
X       30    10    30    High
Y       40    10    40    Medium
Z       52    10    52    Low
For the task set in Table 2.2 the original analysis techniques of Liu and Layland [5] and of Joseph and Pandya
[88] are applicable, and we can perform both utilization-based and response-time based schedulability
analysis.
We start with the utilization-based analysis; for this task model, Liu and Layland's result is that a task set
of n tasks is schedulable if its total utilization, U_tot, is bounded by the following equation:

    U_tot ≤ n(2^(1/n) − 1)
Table 2.3 shows the utilization calculations performed for the schedulability analysis. For our example
task set, n = 3 and the bound is approximately 0.78. However, the utilization (U_tot = Σ_{i=1}^{n} C_i/T_i) for
our task set is 0.81, which exceeds the bound. Hence, the task set fails the RM test and cannot be deemed
schedulable.
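Liu and Layland's bound test is easy to mechanize; the sketch below applies it to a hypothetical task set (not the one from Table 2.2):

```python
# Rate-monotonic utilization test (Liu and Layland). tasks is a list of
# (T, C) pairs; returns the total utilization, the bound, and the verdict.
def rm_utilization_test(tasks):
    n = len(tasks)
    u_tot = sum(c / t for t, c in tasks)
    bound = n * (2 ** (1 / n) - 1)   # n(2^(1/n) - 1)
    return u_tot, bound, u_tot <= bound

# Hypothetical task set, chosen to pass the test.
u, bound, ok = rm_utilization_test([(20, 5), (40, 10), (80, 10)])
print(round(u, 3), round(bound, 3), ok)
```

Note that the test is sufficient but not necessary: a task set whose utilization exceeds the bound may still be schedulable, which is exactly the situation the response-time analysis of the next paragraphs resolves.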
Joseph and Pandya's response-time analysis allows us to calculate the worst-case response time, R_i, for each
task i in our example (Table 2.2). This is done using the following formula:

    R_i = C_i + Σ_{j ∈ hp(i)} ⌈R_i / T_j⌉ C_j        (2.1)

where hp(i) denotes the set of tasks with priority higher than that of task i.
The observant reader may have noticed that equation 2.1 is not in closed form, in that R_i is not
isolated on the left-hand side of the equality. As a matter of fact, R_i cannot be isolated on the left-hand
side of the equality; instead equation 2.1 has to be solved using fixed-point iteration. This is done with the
recursive formula in equation 2.2, starting with R_i^0 = 0 and terminating when a fixed point has been reached
(i.e., when R_i^(m+1) = R_i^(m)):

    R_i^(m+1) = C_i + Σ_{j ∈ hp(i)} ⌈R_i^(m) / T_j⌉ C_j        (2.2)
For our example task set, Table 2.4 shows the results of calculating equation 2.1. From the table we can
conclude that no deadlines will be missed and that the system is schedulable.
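The fixed-point iteration of equation 2.2 can be written as a short loop; the sketch below uses a hypothetical task set (not Table 2.2), listed highest priority first:

```python
import math

# Fixed-point response-time iteration (equation 2.2). tasks is a list of
# (T, C) pairs ordered highest priority first; D = T is assumed.
def response_times(tasks):
    results = []
    for i, (t_i, c_i) in enumerate(tasks):
        r, prev = c_i, 0
        while r != prev:                 # iterate until the fixed point
            prev = r
            r = c_i + sum(math.ceil(prev / t_j) * c_j
                          for t_j, c_j in tasks[:i])  # hp(i) interference
        results.append(r)
    return results

# Hypothetical task set with rate-monotonic priorities.
print(response_times([(10, 2), (15, 3), (35, 10)]))  # [2, 5, 20]
```

A production implementation would also abort the iteration once r exceeds the task's deadline, since the recurrence need not converge for overloaded systems.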
Remarks: As we could see for our example task set in Table 2.2, the utilization-based test could not deem
the task set schedulable, whereas the response-time based test could. This situation is symptomatic of the
relation between utilization-based and response-time based schedulability tests. That is, the response-time
based tests find more task sets schedulable than the utilization-based tests.
TABLE 2.3 Result of RM Test

Task     T     C     D     Prio      U
X        30    10    30    High      0.33
Y        40    10    40    Medium    0.25
Z        52    10    52    Low       0.23
Total                                0.81
Bound                                0.78
TABLE 2.4 Result of Response-Time Analysis for Tasks

Task     T     C     D     Prio      R     R ≤ D
X        30    10    30    High      10    Yes
Y        40    10    40    Medium    20    Yes
Z        52    10    52    Low       52    Yes
TABLE 2.5 Example CAN-Message Set

Message    T       S    D      Id
X          350     8    300    00010
Y          500     6    400    00100
Z          1000    5    800    00110
TABLE 2.6 Result of Response-Time Analysis for CAN

Message    T       S    D      Id       Prio      C      w      R      R ≤ D
X          350     8    300    00010    High      130    130    260    Yes
Y          500     6    400    00100    Medium    111    260    371    Yes
Z          1000    5    800    00110    Low       102    612    714    Yes
However, as also shown by the example, the response-time based test needs to perform more calculations
than the utilization-based test. For this simple example the extra computational complexity of the
response-time test is insignificant. However, when using modern task models (that are capable of modeling
realistic systems), the computational complexity of response-time based tests is significant. Unfortunately,
for these advanced models, utilization-based tests are not always available.
2.6.3.2 Analysis of Messages
In our second example we show how to calculate the worst-case response times for a set of periodic
messages sent over the CAN bus (CAN is described in Section 2.5.2). We use a response-time analysis
technique similar to the one we used when we analyzed the task set in Table 2.2. In this example our
message set is given in Table 2.5, where T, S, D, and Id denote the message's period, data size (in bytes),
deadline, and CAN identifier, respectively. (The time unit used in this example is the bit-time, that is, the
time it takes to send one bit. For a 1 Mbit/sec CAN bus this means that 1 time unit is 10^-6 sec.)
Before we attack the problem of calculating response times, we extend Table 2.5 with two columns.
First, we need the priority of each message; in CAN this is given by the identifier: the lower the numerical
value, the higher the priority. Second, we need to know the worst-case transmission time of each message.
The transmission time is given partly by the message data size, but we also need to add time for the frame
header and for any stuff bits.⁴ The formula to calculate the transmission time, C_i, for a message i containing
S_i bytes of payload data is given below:

    C_i = 8S_i + 47 + ⌊(34 + 8S_i - 1) / 4⌋
In Table 2.6 the two columns Prio and C show the priority assignment and the transmission times for
our example message set.
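As a quick sanity check, the transmission-time formula can be transcribed directly into code (a sketch; integer division implements the floor):

```python
# Worst-case CAN frame transmission time in bit-times: 8 bits per payload
# byte, 47 bits of frame overhead, plus the worst-case number of stuff
# bits, floor((34 + 8*S - 1) / 4), per the formula above.
def can_c(s_bytes):
    return 8 * s_bytes + 47 + (34 + 8 * s_bytes - 1) // 4

print(can_c(0), can_c(8))  # minimal (0-byte) and maximal (8-byte) frames
```

The value for a full 8-byte payload, 135 bit-times, is the largest possible frame and reappears below as the bound on the blocking time B_i.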
Now we have all the data needed to perform the response-time analysis. However, since CAN is a
nonpreemptive resource, the structure of the equation is slightly different from that of equation 2.1, which we
used for the analysis of tasks. The response-time equation for CAN is given in equation 2.3.
    R_i = w_i + C_i

    w_i = B_i + Σ_{j ∈ hp(i)} ⌈(w_i + 1) / T_j⌉ C_j        (2.3)
⁴ CAN adds stuff bits, if necessary, to avoid the two reserved bit patterns 000000 and 111111. These stuff bits are
never seen by the CAN user but have to be accounted for in the timing analysis.
In equation 2.3, B_i denotes the blocking time originating from a lower-priority message already in transmission
when message i enters arbitration (B_i ≤ 135, which is the transmission time of the largest possible message), and hp(i) denotes the
set of messages with higher priority than message i. Note that (similar to equation 2.1) w_i is not isolated
on the left-hand side of the equation, and its value has to be calculated using fixed-point iteration (compare
with equation 2.2).
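Equation 2.3 can be turned into a small solver; the sketch below is a minimal illustration applied to a hypothetical message set. Here the blocking term is taken as the longest lower-priority frame within the set, capped at the 135-bit maximum; a more conservative analysis could instead charge every message the full 135 bits:

```python
import math

# Nonpreemptive response-time iteration for CAN (equation 2.3).
# messages: list of (T, C) in bit-times, highest priority first.
def can_response_times(messages, max_frame=135):
    results = []
    for i, (t_i, c_i) in enumerate(messages):
        # Blocking: longest lower-priority frame, bounded by max_frame.
        lower = [c for _, c in messages[i + 1:]]
        b = min(max(lower, default=0), max_frame)
        w, prev = b, None
        while w != prev:                   # fixed-point iteration for w_i
            prev = w
            w = b + sum(math.ceil((prev + 1) / t_j) * c_j
                        for t_j, c_j in messages[:i])
        results.append(w + c_i)            # R_i = w_i + C_i
    return results

# Hypothetical message set (periods and transmission times in bit-times).
print(can_response_times([(400, 135), (600, 115), (1200, 105)]))
```

Note the (w_i + 1) in the interference term: unlike a preemptable task, a frame that has started transmission cannot be preempted, so only higher-priority frames queued strictly before it starts can delay it.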
Applying equation 2.3 we can now calculate the worst-case response time for our example messages.
In Table 2.6 the two columns w and R show the results of the calculations, and the final column shows
the schedulability verdict for each message.
As we can see from Table 2.6, our example message set is schedulable, meaning that the messages will
always be transmitted before their deadlines. Note that this analysis was made assuming that there will
not be any retransmissions of broken messages. Normally, CAN automatically retransmits any message
that has been broken owing to interference on the bus. To account for such automatic retransmissions, an
error model needs to be adopted and the response-time equation adjusted accordingly; see, for example,
Reference 59.
2.6.4 Trends and Tools
As discussed earlier, and as also illustrated by our example in Table 2.2, there is a mismatch between the
analytical task models and the task models provided by commonly used RTOSs. One of the basic problems
is that there is no one-to-one mapping between analysis tasks and RTOS tasks. In fact, for many systems
there is an N-to-N mapping between the task types. For instance, an interrupt handler may have to be
modeled as several different analysis tasks (one analysis task for each type of interrupt it handles), and
one OS task may have to be modeled as several analysis tasks (for instance, one analysis task per call to
the sleep() primitive).
Also, current schedulability analysis techniques cannot adequately model types of task synchronization
other than locking/blocking on shared resources. Abstractions such as message queues are difficult to
include in the schedulability analysis.⁵ Furthermore, tools to estimate the WCET are also scarce. Currently
only two tools that give safe WCET estimates are commercially available [90, 91].
These problems have led to a low penetration of schedulability analysis in industrial software-
development processes. However, in isolated domains, such as real-time networks, some commercial tools
based on real-time analysis do exist. For instance, Volcano [92, 93] provides tools for the CAN
bus that allow system designers to specify signals on an abstract level (giving signal attributes such as size,
period, and deadline) and automatically derive a mapping of signals to CAN messages in which all deadlines
are guaranteed to be met.
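To illustrate the idea of deriving frames from abstract signal specifications, here is a first-fit packing sketch; this is NOT Volcano's actual algorithm, only a simplified illustration with invented signal names:

```python
# First-fit sketch: signals with the same period are packed together into
# shared CAN frames of at most 8 data bytes. Real tools also consider
# deadlines, bus load, and priority assignment; this ignores all of that.
def pack_signals(signals):
    """signals: list of (name, size_bytes, period). Returns frame dicts."""
    frames = []
    for name, size, period in sorted(signals, key=lambda s: s[2]):
        for f in frames:
            if f["period"] == period and f["used"] + size <= 8:
                f["signals"].append(name)
                f["used"] += size
                break
        else:   # no existing frame fits: open a new one
            frames.append({"period": period, "signals": [name], "used": size})
    return frames

signals = [("speed", 2, 10), ("rpm", 2, 10), ("temp", 1, 100),
           ("torque", 4, 10), ("oil", 1, 100), ("fuel", 4, 10)]
for f in pack_signals(signals):
    print(f["period"], f["signals"], f["used"])
```

Packing related signals into shared frames reduces the number of messages, and hence the frame-overhead bits, competing for the bus.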
On the software side, tools provided by, for instance, TimeSys [94], Arcticus Systems [13], and TTTech
[10] can provide system development environments with timing analysis as an integrated part of the tool
suite. However, all these tools require that the software-development process be under the complete control
of the respective tool. This requirement has limited the use of these tools.
The widespread use of UML [22] in software design has led to some specialized UML products for
real-time engineering [23, 24]. However, these products, as of today, do not support timing analysis of the
designed systems. There is, however, recent work within the OMG that specifies a profile, Schedulability,
Performance, and Time (SPT) [95], which allows the specification of both timing properties and requirements
in a standardized way. This will in turn lead to products that can analyze UML models conforming to the
SPT profile.
The SPT profile has, however, not been received without criticism. Critique has mainly come from
researchers active in the timing-analysis field, who claim both that the profile is not precise enough and that
some important concepts are missing. For instance, the Universidad de Cantabria has instead developed
⁵ Techniques to handle more advanced models include timed logic and model checking. However, the computational
and conceptual complexity of these techniques has limited their industrial impact, although there are examples of
commercial tools for this type of verification, for example, Reference 89.
the MAST UML profile and an associated MAST tool for analyzing MAST UML models [96, 97].
MAST allows the modeling of advanced timing properties and requirements, and the tool also provides
state-of-the-art timing analysis techniques.
2.7 Component-Based Design of RTS
Component-Based Design (CBD) is a current trend in software engineering. In the desktop area, component
technologies like COM [15], .NET [16], and Java Beans [17] have gained widespread use. These technologies
give substantial benefits, in terms of reduced development time and software complexity, when designing
complex and/or distributed systems. However, for RTSs these, and other, desktop-oriented component
technologies do not suffice.
As stated before, the main challenge of designing RTSs is the need to consider issues that do not typically
apply to general-purpose computing systems. These issues include:
Constraints on extra-functional properties, such as timing, QoS, and dependability.
The need to statically predict (and verify) these extra-functional properties.
Scarce resources, including processing power, memory, and communication bandwidth.
In the commercially available component technologies of today, there is little or no support for these
issues. On the academic scene, too, there are no readily available solutions that satisfactorily handle all of these
issues.
In the remainder of this chapter we will discuss how these issues can be addressed in the context of
CBD. In doing so, we also highlight the challenges in designing a CBD process and component technology
for the development of RTSs.
2.7.1 Timing Properties and CBD
In general, for systems where timing is crucial there will necessarily be at least some global timing
requirements that have to be met. If the system is built from components, this will imply the need for
timing parameters/properties of the components and some proof that the global timing requirements
are met.
In Section 2.6 we introduced the following four types of timing properties:
execution time
response time
end-to-end delay
jitter.
So, how are these related to the use of a CBD methodology?
2.7.1.1 Execution Time
For a component used in a real-time context, an execution-time measure will have to be derived. This
is, as discussed in Section 2.6, not an easy or satisfactorily solved problem. Furthermore, since execution
time is inherently dependent on the target hardware, and since reuse is the primary motivation for CBD,
it is highly desirable that execution times for several targets be available. (Alternatively, that the
execution time for new hardware platforms be automatically derivable.)
The nature of the applied component model may also make execution-time estimation more or less
complex. Consider, for instance, a client-server oriented component model with a server component that
provides services of different types, as illustrated in Figure 2.4(a). What does execution time mean for
such a component? Clearly, a single execution time is not appropriate; rather, the analysis will require a set
of execution times related to servicing different requests. On the other hand, for a simple port-based-object
component model [21] in which components are connected in sequence to form periodically executing
transactions (illustrated in Figure 2.4[b]), it could be possible to use a single execution-time measure,
FIGURE 2.4 (a) A complex server component, providing multiple services to multiple users, and (b) a simple chain
of components implementing a single thread of control.
FIGURE 2.5 Tasks and components: (a) one-to-one correspondence, (b) one-to-many correspondence, (c) many-
to-one correspondence, (b + c) many-to-many correspondence, and (d) irregular correspondence.
corresponding to the execution time required for reading the values at the input ports, performing the
computation, and writing values to the output ports.
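The contrast can be sketched in code; the pipeline below is a hypothetical port-based-object style chain (all names invented for illustration), where each component carries one WCET and the transaction's WCET is simply the sum:

```python
# Sketch of a port-based-object style transaction: each component reads
# its input port, computes, and writes its output port, so a single WCET
# per component suffices. All names and numbers are hypothetical.
class Component:
    def __init__(self, name, wcet, step):
        self.name, self.wcet, self.step = name, wcet, step

    def run(self, value):
        return self.step(value)

def chain_wcet(chain):
    # WCET of the whole transaction: components execute in sequence.
    return sum(c.wcet for c in chain)

def run_transaction(chain, sample):
    for c in chain:
        sample = c.run(sample)
    return sample

sensor_filter = Component("filter", 20, lambda v: v * 0.9)
controller    = Component("ctrl",   50, lambda v: 100 - v)
actuator_out  = Component("out",    10, lambda v: max(0, min(100, v)))

pipeline = [sensor_filter, controller, actuator_out]
print(chain_wcet(pipeline))            # usable as C in response-time analysis
print(run_transaction(pipeline, 50.0))
```

The single summed WCET is exactly the C value a schedulability analysis such as equation 2.1 expects, which is why this style of model composes so well with timing analysis.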
2.7.1.2 Response Time
Response times denote the time from invocation to completion of tasks, and response-time analysis is the
activity of statically deriving response-time estimates.
The first question to ask from a CBD perspective is: what is the relation between a task and a
component?
This is obviously highly dependent on the component model used. As illustrated in Figure 2.5(a), there
could be a one-to-one mapping between components and tasks, but in general several components could
be implemented in one task (Figure 2.5[b]) or one component could be implemented by several tasks
(Figure 2.5[c]); hence there is a many-to-many relation between components and tasks. In principle,
there could even be a more irregular correspondence between components and tasks, as illustrated in
Figure 2.5(d). Furthermore, in a distributed system there could be a many-to-many relation between
components and processing nodes, making the situation even more complicated.
Once we have sorted out the relation between tasks and components, we can calculate the response times
of tasks, given that we have an appropriate analysis method for the execution paradigm used and that
relevant execution-time measures are available. However, relating these response times to components
and to application-level timing requirements may not be straightforward; this is an issue for the
subsequent end-to-end analysis.
Another issue with respect to response times is how to handle communication delays in distributed
systems. In essence there are two ways to model the communication, as depicted in Figure 2.6.
In Figure 2.6(a) the network is abstracted away and the intercomponent communication is handled by
the framework. In this case, response-time analysis is made more complicated, since it must account for
different delays in intercomponent communication depending on the physical location of components.
FIGURE 2.6 Components and communication delays: (a) communication delays can be part of the intercomponent
communication properties, and (b) communication delays can be timing properties of components.
In Figure 2.6(b), on the other hand, the network is modeled as a component itself, and network delays can
be modeled as delays in any other component (and intercomponent communication can be considered
instantaneous).
However, the choice of how to model network delays also has an impact on the software-engineering
aspects of the component model. In Figure 2.6(a), the communication is completely hidden from
the components (and the software engineers), giving optimizing tools many degrees of freedom
with respect to component allocation, signal mapping, and scheduling-parameter selection. In
Figure 2.6(b), on the other hand, the communication is explicitly visible to the components (and the software engineers),
putting a larger burden on the software engineers to manually optimize the system.
2.7.1.3 End-to-End Delay
End-to-end delays are application-level timing requirements relating the occurrence in time of one event
to the occurrence of another event. As pointed out earlier, how to relate such requirements to the lower-level
timing properties of components is highly dependent on both the component model and the timing-analysis
model.
When designing RTSs using CBD, the component structure gives excellent information about the points
of interaction between the RTS and its environment. Since end-to-end delays concern timing estimates
and timing requirements on such interactions, CBD gives a natural way of stating timing requirements in
terms of signals received or generated. (In traditional RTS development, the reception and generation of
signals are embedded into the code of tasks and are not externally visible, making it difficult to relate
response times of tasks to end-to-end requirements.)
2.7.1.4 Jitter
Jitter is an important timing parameter that is related to execution time and that will affect response times
and end-to-end delays. There may also be specific jitter requirements. Jitter has the same relation to CBD
as does end-to-end delay.
2.7.1.5 Summary of Timing and CBD
As described earlier, there is no single solution for how to apply CBD to RTSs. In some cases, timing
analysis is made more complicated when using CBD, for example, when using client-server oriented
component models, whereas in other cases CBD actually helps timing analysis, for example, by facilitating
the identification of interfaces/events associated with end-to-end requirements.
Further, the characteristics of the component model have great impact on the analyzability of component-based
RTSs. For instance, interaction patterns such as client-server do not map well to established analysis
methods and make analysis difficult, whereas pipes-and-filter based patterns (such as the port-based-objects
component model [21]) map very well to existing analysis methods and allow for tight analysis of
timing behavior. Also, the execution semantics of the component model has an impact on the analyzability.
The execution semantics restrict how components can be mapped to tasks; for example, in the Corba
Component Model [14] each component is assumed to have its own thread of execution, making it difficult
to map multiple components to a single thread. On the other hand, the simple execution semantics of
pipes-and-lter based models allow for automatic mapping of multiple components to a single task,
simplifying timing analysis and making better use of system resources.
2.7.2 Real-Time Operating Systems
There are two important aspects regarding CBD and RTOSs: (1) the RTOS may itself be component based,
and (2) the RTOS may support or provide a framework for CBD.
2.7.2.1 Component-Based RTOSs
Most RTOSs allow for offline configuration, where the engineer can choose to include or exclude large
parts of the functionality. For instance, which communication protocols to include is typically configurable.
However, this type of configurability is not the same as the RTOS being component based (even though
the unit of configuration is often referred to as a component in marketing material). For an RTOS to be
component based, the components are required to conform to a component model, which is typically
not the case in most configurable RTOSs.
There has been some research on component-based RTOSs, for instance the research RTOS VEST
[18]. In VEST, schedulers, queue managers, and memory management are built up out of components.
Furthermore, special emphasis has been put on predictability and analyzability. However, VEST is currently
still at the research stage and has not been released to the public. Publicly available, however, is the eCos
RTOS [98, 99], which provides a component-based configuration tool. Using eCos components the RTOS
can be configured by the user, and third-party extensions can be provided.
2.7.2.2 RTOSs that Support CBD
Looking at component models in general, and those intended for embedded systems in particular,
we observe that they are all supported by some runtime executive or simple RTOS. Many component
technologies provide frameworks that are independent of the underlying RTOS; hence, any RTOS can be
used to support CBD via such an RTOS-independent framework. Examples include Corba's ORB [100]
and the framework for PECOS [20, 101].
Other component technologies have a tighter coupling between the RTOS and the component framework,
in that the RTOS explicitly supports the component model by providing the framework (or part of it).
Such technologies include:
Koala [19] is a component model and architectural description language from Philips. Koala
provides high-level APIs to the computing and audio/video hardware. The computing layer
provides a simple proprietary real-time kernel with priority-driven preemptive scheduling. Special
techniques for thread sharing are used to limit the number of concurrent threads.
The Chimera RTOS provides an execution framework for the port-based-object component model
[21], intended for the development of sensor-based control systems, specifically reconfigurable robotics
applications. Chimera has multiprocessor support, and handles both static and dynamic scheduling,
the latter EDF based.
Rubus is an RTOS that supports a component model in which behaviors are defined by
sequences of port-based objects [13]. The Rubus kernel supports predictable execution of statically
scheduled periodic tasks (termed red tasks in Rubus) and dynamically scheduled, fixed-priority preemptive
tasks (termed blue tasks). In addition, support for handling interrupts is provided. In Rubus,
support is provided for transforming sets of components into sequential chains of executable code.
Each such chain is implemented as a single task. Support is also provided for analysis of response
times and end-to-end deadlines, based on execution-time measures that have to be provided; that
is, execution-time analysis is not provided by the framework.
The Time-Triggered Operating System (TTOS) is an adapted and extended version of the MARS
OS [71]. Task scheduling in TTOS is based on an offline generated scheduling table and relies on
the global time base provided by the TTP/C communication system. All synchronization is handled
by the offline scheduling. TTOS, and in general the entire TTA, is (just as IEC 61131-3) well suited
for the synchronous execution paradigm.
In a synchronous execution the system is considered sequential, computing in each step (or cycle) a
global output based on a global input. The effect of each step is defined by a set of transformation rules.
Scheduling is done statically by compiling the set of rules into a sequential program that implements these
rules and executes them in some statically defined order. A uniform timing bound for the execution of
global steps is assumed. In this context, a component is a design-level entity.
TTA defines a protocol for extending the synchronous-language paradigm to distributed platforms,
allowing distributed components to interoperate as long as they conform to the imposed timing requirements.
2.7.3 Real-Time Scheduling
Ideally, from a CBD perspective, the response time of a component should be independent of the
environment in which it is executing (since this would facilitate reuse of the component). However, this is in
most cases highly unrealistic, since:
1. The execution time of the task will be different in different target environments.
2. The response time additionally depends on the other tasks competing for the same resources
(CPU, etc.) and on the scheduling method used to resolve the resource contention.
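The interference described in point 2 is what classical fixed-priority response-time analysis captures (see Joseph and Pandya [88]). A minimal sketch of the standard fixed-point iteration, with a hypothetical task set:

```python
import math

# Classical fixed-priority response-time analysis: the worst-case response
# time R_i of task i satisfies R_i = C_i + sum over higher-priority tasks j
# of ceil(R_i / T_j) * C_j, solved by fixed-point iteration.
# tasks: list of (wcet, period), highest priority first.
def response_time(tasks, i):
    c_i = tasks[i][0]
    r = c_i
    while True:
        r_next = c_i + sum(math.ceil(r / t_j) * c_j for c_j, t_j in tasks[:i])
        if r_next == r:
            return r            # converged: worst-case response time
        if r_next > tasks[i][1]:
            return None         # exceeds the period: deadline may be missed
        r = r_next

tasks = [(1, 4), (2, 6), (3, 12)]       # hypothetical task set
print([response_time(tasks, i) for i in range(3)])  # -> [1, 3, 10]
```

Note how the lowest-priority task's response time (10) includes preemptions by both higher-priority tasks, illustrating why a component's response time cannot be stated independently of its environment.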
Rather than aiming for the nonachievable ideal, a realistic ambition could be to have a component model
and framework which allow for analysis of response times based on abstract models of components and
their compositions. Time-triggered systems go one step toward the ideal solution, in that components
can be temporally isolated from each other. While not having a major impact on the component model,
time-triggered systems simplify implementation of the component framework, since all synchronization
between components is resolved offline. Also, from a safety perspective, the time-triggered paradigm
gives benefits, in that it reduces the number of possible execution scenarios (owing to the static order of
execution of components and to the lack of preemption).
Also, in time-triggered component models it is possible to use the structure given by the component
composition to synthesize scheduling parameters. For instance, in Rubus [13] and TTA [8] this is already
done by generating the static schedule using the components as schedulable entities.
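The idea of using components as the schedulable entities of a static schedule can be sketched as follows. This is an illustrative sketch, not the Rubus or TTA tool chain: components are hypothetical (name, wcet, period) triples, time is divided into unit slots, and each released instance is placed earliest-deadline-first into the table.

```python
import math

# Illustrative sketch: generate a static cyclic schedule over the
# hyperperiod, using the components themselves as schedulable entities.
def static_schedule(components):
    """components: list of (name, wcet, period) with unit-length slots."""
    hyperperiod = math.lcm(*(p for _, _, p in components))
    table = [None] * hyperperiod
    # One unit job per WCET slot of every instance, sorted earliest deadline first.
    jobs = sorted((r + p, r, name)                  # (deadline, release, name)
                  for name, c, p in components
                  for r in range(0, hyperperiod, p)
                  for _ in range(c))
    for deadline, release, name in jobs:
        # Place the job in the first free slot of its release-deadline window.
        slot = next(t for t in range(release, deadline) if table[t] is None)
        table[slot] = name
    return table

print(static_schedule([("ctrl", 1, 2), ("log", 1, 4)]))
# -> ['ctrl', 'log', 'ctrl', None]
```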
In theory, a similar approach could also be used for dynamically scheduled systems, using a scheduler/
task configuration tool to automatically derive mappings of components to tasks and scheduling param-
eters (such as priorities or deadlines) for the tasks. However, this approach is still at the research stage.
2.8 Testing and Debugging of RTSs
According to a recent study by NIST [102], up to 80% of the life-cycle cost of software is spent on testing
and debugging. Despite this importance, there are few results on testing and debugging of RTSs.
The main reason for this is that it is actually quite difficult to test and debug RTSs. Remember that
RTSs are timing critical and that they interact with the real world. Since testing and debugging typically
involve some instrumentation of the code, the timing behavior of the system will be different during
testing/debugging compared with the deployed system. Hence, test cases that were
passed during testing may lead to failures in the deployed system, and tests that failed may not cause
any problem at all in the deployed system. For debugging the situation is possibly even worse, since
in addition to a similar effect when running the system in a debugger, entering a breakpoint will stop
the execution for an unspecified time. The problem with this is that the controlled external process will
continue to evolve (e.g., a car will not momentarily stop just because the execution of the controlling
software is stopped). The result is a behavior of the debugged system which will not be possible in
the real system. Also, it is often the case that the external process cannot be completely controlled, which
means that we cannot reproduce the observed behavior, making it difficult to use (cyclic)
debugging to track down the error that caused a failure.
The following are two possible solutions to the presented problems:

• To build a simulator that faithfully captures the functional as well as the timing behavior of both the
RTS and the environment which it is controlling. Since this is both time consuming and costly, this
approach is only feasible in very special situations. Since such situations are rare, we will not further
consider this alternative here.

• To record the RTS's behavior during testing or execution, and then, if a failure is detected, replay
the execution in a controlled way. For this to work it is essential that the timing behavior is the
same during testing as in the deployed system. This can be achieved either by using nonintrusive
hardware recorders, or by leaving the software used for instrumentation in the deployed system.
The latter comes at a cost in memory space and execution time, but gives the additional benefit
that it becomes possible to debug the deployed system as well in case of a failure [103].
An additional problem for most RTSs is that the system consists of several concurrently executing
threads. This is also the case for the majority of non-RTSs. This concurrency will in itself lead to problematic
nondeterminism: owing to race conditions caused by slight variations in execution time, the exact
preemption points will vary, causing unpredictability both in the number of possible scenarios and in
the ability to predict which scenario will actually be executed in a specific situation.
In conclusion, we note that testing and debugging of RTSs are difficult and challenging tasks.
The following is a brief account of some of the few results on testing of RTSs reported in the literature:

• Thane and Hansson [87] proposed a method for deterministic testing of distributed RTSs. The
key element here is to identify the different execution orderings (serializations of the concurrent
system) and treat each of these orderings as a sequential program. The main weakness of this
approach is the potentially exponential blow-up of the number of execution orderings.

• For testing of temporal correctness, Tsai et al. [104] provide a monitoring technique that records
runtime information. This information is then used to analyze whether the temporal constraints are
violated.

• Schütz [105] has proposed a strategy for testing distributed RTSs. The strategy is tailored to the
time-triggered MARS system [71].

• Zhu et al. [106] have proposed a framework for regression testing of real-time software in
distributed systems. The framework is based on Onoma's [107] regression testing process.
When it comes to RTS debugging, the most promising approach is record/replay [108–112], as men-
tioned earlier. Using record/replay, first a reference execution of the system is executed and observed;
second, a replay execution is performed based on the observations made during the reference execution.
The observations are made by instrumenting the system, in order to extract information about the
execution.
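A minimal sketch of the idea, with the recorded events reduced to input readings (a real recorder would also log preemption points and interrupt arrivals; all names are illustrative):

```python
import random

# Record/replay sketch: during the reference execution the instrumentation
# logs each nondeterministic observation; the replay execution feeds the
# log back in, so the same data (and hence behavior) is reproduced.
def reference_run(read_input, log):
    total = 0
    for _ in range(3):
        value = read_input()
        log.append(value)       # instrumentation: record the observation
        total += value
    return total

def replay_run(log):
    events = iter(log)
    # Re-execute, but read recorded values instead of the live environment.
    return reference_run(lambda: next(events), [])

log = []
first = reference_run(lambda: random.randint(0, 9), log)  # nondeterministic run
assert replay_run(log) == first   # the replay reproduces the reference
```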
In industrial practice, testing and debugging of multitasking RTSs is a time-consuming activity. At
best, hardware emulators, for example, Reference 113, are used to get some level of observability without
interfering with the observed system. More often, it is an ad hoc activity, using intrusive instrumentation
of the code to observe test results or to track down intricate timing errors. However, some tools using
the above record/replay method are now emerging on the market, for example, Reference 114.
2.9 Summary
This chapter has presented the most important issues, methods, and trends in the area of embedded RTSs.
A wide range of topics has been covered, from the initial design of embedded RTSs to analysis and testing.
Important issues discussed and presented are design tools, OSs, and major underlying mechanisms such
as architectures, models of interaction, real-time mechanisms, execution strategies, and scheduling.
Moreover, communication, analysis, and testing techniques are presented.
Over the years, academia has put considerable effort into advancing the various techniques used to compose
and design complex embedded RTSs. Standards bodies and industry are following at a slower pace, while also
adopting and developing area-specific techniques. Today, we can see diverse techniques used in different
application domains, such as automotive, aerospace, and trains. In the area of communications, an effort is
being made in academia, and also in some parts of industry, toward using Ethernet. This is a step toward a
common technique for several application domains.
Different real-time demands have led to domain-specific OSs, architectures, and models of interaction.
As many of these have several commonalities, there is a potential for standardization across several
domains. However, as this takes time, we will most certainly stay with application-specific techniques for
a while, and for specific domains with extreme demands on safety or low cost, specialized solutions will
most likely be used in the future as well. Therefore, knowledge of the techniques used in and suitable for the
various domains will remain important.
References
[1] Tom R. Halfhill. Embedded Market Breaks New Ground. Microprocessor Report, 17, 2000.
[2] H. Kopetz. Introduction. In Real-Time Systems: Introduction and Overview. Part XVIII of Lecture
Notes from ESSES 2003 – European Summer School on Embedded Systems. Ylva Boivie, Hans
Hansson, and Sang Lyul Min, Eds., Västerås, Sweden, September 2003.
[3] IEEE Computer Society. Technical Committee on Real-Time Systems Home Page. http://www.cs.
bu.edu/pub/ieee-rts/.
[4] Kluwer. Real-Time Systems (Journal). http://www.wkap.nl/kapis/CGI-BIN/WORLD/
journalhome.htm?0922-6443.
[5] C. Liu and J. Layland. Scheduling Algorithms for Multiprogramming in a Hard-Real-Time
Environment. Journal of the ACM, 20:46–61, 1973.
[6] M.H. Klein, T. Ralya, B. Pollak, R. Obenza, and M.G. Harbour. A Practitioner's Handbook for
Rate-Monotonic Analysis. Kluwer, Dordrecht, 1998.
[7] N.C. Audsley, A. Burns, R.I. Davis, K. Tindell, and A.J. Wellings. Fixed Priority Pre-Emptive
Scheduling: An Historical Perspective. Real-Time Systems, 8:129–154, 1995.
[8] Hermann Kopetz and Günther Bauer. The Time-Triggered Architecture. Proceedings of the IEEE,
Special Issue on Modeling and Design of Embedded Software, 91:112–126, 2003.
[9] J. Xu and D.L. Parnas. Scheduling Processes with Release Times, Deadlines, Precedence, and
Exclusion Relations. IEEE Transactions on Software Engineering, 16:360–369, 1990.
[10] Time Triggered Technologies. http://www.tttech.com.
[11] H. Kopetz and G. Grünsteidl. TTP – A Protocol for Fault-Tolerant Real-Time Systems. IEEE
Computer, 27(1):14–23, 1994.
[12] H. Hansson, H. Lawson, and M. Strömberg. BASEMENT – A Distributed Real-Time Architecture
for Vehicle Applications. Real-Time Systems, 3:223–244, 1996.
[13] Arcticus Systems. The Rubus Operating System. http://www.arcticus.se.
[14] OMG. CORBA Component Model 3.0, June 2002. http://www.omg.org/technology/documents/
formal/components.htm.
[15] Microsoft. Microsoft .COM Technologies. http://www.microsoft.com/com/.
[16] Microsoft. .NET Home Page. http://www.microsoft.com/net/.
[17] SUN Microsystems. Introducing Java Beans. http://developer.java.sun.com/developer/online/
Training/Beans/ Beans1/index.html.
[18] John A. Stankovic. VEST – A Toolset for Constructing and Analyzing Component-Based
Embedded Systems. Lecture Notes in Computer Science, 2211:390–402, 2001.
[19] Rob van Ommering. The Koala Component Model. In Building Reliable Component-Based
Software Systems. Artech House Publishers, July 2002, pp. 223–236.
[20] P.O. Müller, C.M. Stich, and C. Zeidler. Component-Based Embedded Systems. In Building
Reliable Component-Based Software Systems. Artech House Publishers, 2002, pp. 303–323.
[21] D.B. Stewart, R.A. Volpe, and P.K. Khosla. Design of Dynamically Reconfigurable Real-Time
Software Using Port-Based Objects. IEEE Transactions on Software Engineering, 23(12):759–776,
1997.
[22] OMG. Unified Modeling Language (UML), Version 1.5, 2003. http://www.omg.org/technology/
documents/formal/uml.htm.
[23] Rational. Rational Rose RealTime. http://www.rational.com/products/rosert.
[24] I-Logix. Rhapsody. http://www.ilogix.com/products/rhapsody.
[25] TeleLogic. Telelogic tau. http://www.telelogic.com/products/tau.
[26] Vector. DaVinci Tool Suite. http://www.vector-informatik.de/.
[27] OMG. Unified Modeling Language (UML), Version 2.0 (draft). OMG document ptc/03-09-15,
September 2003.
[28] ITEA. EAST/EEA Project Site. http://www.east-eea.net.
[29] ETAS. http://en.etasgroup.com.
[30] Vector. http://www.vector-informatik.com.
[31] Siemens. http://www.siemensvdo.com.
[32] Comp.realtime FAQ. Available at http://www.faqs.org/faqs/realtime-computing/faq/.
[33] Roadmap – Adaptive Real-Time Systems for Quality of Service Management. ARTIST Project
IST-2001-34820, May 2003. http://www.artist-embedded.org/Roadmaps/.
[34] G.C. Buttazzo. Hard Real-Time Computing Systems. Kluwer Academic Publishers, Dordrecht,
1997.
[35] A. Burns and A. Wellings. Real-Time Systems and Programming Languages, 2nd ed. Addison-
Wesley, Reading, MA, 1996.
[36] The Asterix Real-Time Kernel. http://www.mrtc.mdh.se/projects/asterix/.
[37] LiveDevices. Realogy Real-Time Architect, SSX5 Operating System, 1999. http://www.livedevices.
com/realtime.shtml.
[38] Wind River Systems Inc. VxWorks Programmer's Guide. http://www.windriver.com/.
[39] Lynuxworks. http://www.lynuxworks.com.
[40] Enea OSE Systems. Ose. http://www.ose.com.
[41] QNX Software Systems. QNX Realtime OS. http://www.qnx.com.
[42] List of Real-Time Linux Variants. http://www.realtimelinuxfoundation.org/variants/
variants.html.
[43] Express Logic. Threadx. http://www.expresslogic.com.
[44] Northern Real-Time Applications. Total Time Predictability. Whitepaper on SSX5, 1998.
[45] IEEE. Standard for Information Technology – Standardized Application Environment Profile –
POSIX Realtime Application Support (AEP). IEEE Standard P1003.13-1998, 1998.
[46] OSEK Group. OSEK/VDX Operating System Specification 2.2.1. http://www.osek-vdx.org/.
[47] Airlines Electronic Engineering Committee (AEEC). ARINC 653: Avionics Application Software
Standard Interface (Draft 15), June 1996.
[48] ISO. Ada95 Reference Manual. ISO/IEC 8652:1995(E), 1995.
[49] G. Fohler, T. Lennvall, and R. Dobrin. A Component Based Real-Time Scheduling Archi-
tecture. In Architecting Dependable Systems, Vol. LNCS-2677. R. de Lemos, C. Gacek, and
A. Romanovsky, Eds., Springer-Verlag, Heidelberg, 2003.
[50] J. Mäki-Turja and M. Sjödin. Combining Dynamic and Static Scheduling in Hard Real-Time
Systems. Technical report MRTC no. 71, Mälardalen Real-Time Research Centre (MRTC),
October 2002.
[51] B. Sprunt, L. Sha, and J.P. Lehoczky. Aperiodic Task Scheduling for Hard Real-Time Systems.
Real-Time Systems, 1:27–60, 1989.
[52] M. Spuri and G.C. Buttazzo. Efficient Aperiodic Service under Earliest Deadline Scheduling.
In Proceedings of the 15th IEEE Real-Time Systems Symposium (RTSS'94). IEEE Computer Society,
San Juan, Puerto Rico, December 1994, pp. 2–11.
[53] M. Spuri and G.C. Buttazzo. Scheduling Aperiodic Tasks in Dynamic Priority Systems. Real-Time
Systems, 10:179–210, 1996.
[54] L. Abeni and G. Buttazzo. Integrating Multimedia Applications in Hard Real-Time Systems.
In Proceedings of the 19th IEEE Real-Time Systems Symposium (RTSS'98). IEEE Computer Society,
Madrid, Spain, December 1998, pp. 4–13.
[55] M. Spuri, G.C. Buttazzo, and F. Sensini. Robust Aperiodic Scheduling under Dynamic Priority
Systems. In Proceedings of the 16th IEEE Real-Time Systems Symposium (RTSS'95). IEEE Computer
Society, Pisa, Italy, December 1995, pp. 210–219.
[56] CAN Specification 2.0, Part-A and Part-B. CAN in Automation (CiA), Am Weichselgarten 26,
D-91058 Erlangen, 2002. http://www.can-cia.de.
[57] Road Vehicles – Interchange of Digital Information – Controller Area Network (CAN) for
High Speed Communications, ISO/DIS 11898, February 1992.
[58] K.W. Tindell, A. Burns, and A.J. Wellings. Calculating Controller Area Network (CAN) Message
Response Times. Control Engineering Practice, 3:1163–1169, 1995.
[59] K. Tindell, H. Hansson, and A. Wellings. Analysing Real-Time Communications: Controller Area
Network (CAN). In Proceedings of the 15th IEEE Real-Time Systems Symposium (RTSS). IEEE
Computer Society Press, December 1994, pp. 259–263.
[60] Road Vehicles – Controller Area Network (CAN) – Part 4: Time-Triggered Communication.
ISO/CD 11898-4.
[61] L. Almeida, J.A. Fonseca, and P. Fonseca. Flexible Time-Triggered Communication on a Con-
troller Area Network. In Proceedings of the Work-In-Progress Session of the 19th IEEE Real-Time
Systems Symposium (RTSS'98). IEEE Computer Society, Madrid, Spain, December 1998.
[62] L. Almeida, J.A. Fonseca, and P. Fonseca. A Flexible Time-Triggered Communication
System Based on the Controller Area Network: Experimental Results. In Proceedings of the IFAC
International Conference on Fieldbus Technology (FeT). Springer, 1999, pp. 342–350.
[63] TTTech Computertechnik AG. Specification of the TTP/C Protocol v0.5, July 1999.
[64] H. Kopetz. The Time-Triggered Model of Computation. In Proceedings of the 19th IEEE Real-
Time Systems Symposium (RTSS'98). IEEE Computer Society, Madrid, Spain, December 1998,
pp. 168–177.
[65] LIN. Local Interconnect Network. http://www.lin-subbus.de.
[66] R. Belschner, J. Berwanger, C. Ebner, H. Eisele, S. Fluhrer, T. Forest, T. Führer, F. Hartwich,
B. Hedenetz, R. Hugel, A. Knapp, J. Krammer, A. Millsap, B. Müller, M. Peller, and A. Schedl.
FlexRay Requirements Specification, April 2002. http://www.flexray-group.com.
[67] ARINC/RTCA-SC-182/EUROCAE-WG-48. Minimal Operational Performance Standard for
Avionics Computer Resources, 1999.
[68] PROFIBUS. PROFIBUS International. http://www.profibus.com.
[69] H. Kirrmann and P.A. Zuber. The IEC/IEEE Train Communication Network. IEEE Micro,
21:81–92, 2001.
[70] WorldFIP. WorldFIP Fieldbus. http://www.worldfip.org.
[71] H. Kopetz, A. Damm, C. Koza, and M. Mulazzani. Distributed Fault-Tolerant Real-Time Systems:
The MARS Approach. IEEE Micro, 9(1):25–40, 1989.
[72] C. Venkatramani and T. Chiueh. Supporting Real-Time Traffic on Ethernet. In Proceedings of the
15th IEEE Real-Time Systems Symposium (RTSS'94). IEEE Computer Society, San Juan, Puerto
Rico, December 1994, pp. 282–286.
[73] D.W. Pritty, J.R. Malone, S.K. Banerjee, and N.L. Lawrie. A Real-Time Upgrade for Ethernet
Based Factory Networking. In Proceedings of IECON'95. IEEE Industrial Electronics Society,
1995, pp. 1631–1637.
[74] N. Malcolm and W. Zhao. The Timed Token Protocol for Real-Time Communication. IEEE
Computer, 27:35–41, 1994.
[75] K.K. Ramakrishnan and H. Yang. The Ethernet Capture Effect: Analysis and Solution. In
Proceedings of the 19th IEEE Local Computer Networks Conference (LCNC'94), October 1994,
pp. 228–240.
[76] M. Molle. A New Binary Logarithmic Arbitration Method for Ethernet. Technical report, TR
CSRI-298, CRI, University of Toronto, Canada, 1994.
[77] G. Le Lann and N. Riviere. Real-Time Communications over Broadcast Networks: The CSMA/DCR
and the DOD-CSMA/CD Protocols. Technical report, TR 1863, INRIA, 1993.
[78] M. Molle and L. Kleinrock. Virtual Time CSMA: Why Two Clocks are Better than One. IEEE
Transactions on Communications, 33:919–933, 1985.
[79] W. Zhao and K. Ramamritham. A Virtual Time CSMA/CD Protocol for Hard Real-Time Commu-
nication. In Proceedings of the 7th IEEE Real-Time Systems Symposium (RTSS'86). IEEE Computer
Society, New Orleans, LA, December 1986, pp. 120–127.
[80] M. El-Derini and M. El-Sakka. A Novel Protocol Under a Priority Time Constraint for Real-Time
Communication Systems. In Proceedings of the 2nd IEEE Workshop on Future Trends of Distrib-
uted Computing Systems (FTDCS'90). IEEE Computer Society, Cairo, Egypt, September 1990,
pp. 128–134.
[81] W. Zhao, J.A. Stankovic, and K. Ramamritham. A Window Protocol for Transmission of Time-
Constrained Messages. IEEE Transactions on Computers, 39:1186–1203, 1990.
[82] L. Almeida, P. Pedreiras, and J.A. Fonseca. The FTT-CAN Protocol: Why and How? IEEE
Transactions on Industrial Electronics, 49(6):1189–1201, 2002.
[83] P. Pedreiras, L. Almeida, and P. Gai. The FTT-Ethernet Protocol: Merging Flexibility, Timeliness
and Efficiency. In Proceedings of the 14th Euromicro Conference on Real-Time Systems (ECRTS'02).
IEEE Computer Society, Vienna, Austria, June 2002, pp. 152–160.
[84] S.K. Kweon, K.G. Shin, and G. Workman. Achieving Real-Time Communication over Ethernet
with Adaptive Traffic Smoothing. In Proceedings of the Sixth IEEE Real-Time Technology and
Applications Symposium (RTAS'00). IEEE Computer Society, Washington, DC, June 2000,
pp. 90–100.
[85] A. Carpenzano, R. Caponetto, L. Lo Bello, and O. Mirabella. Fuzzy Traffic Smoothing: An
Approach for Real-Time Communication over Ethernet Networks. In Proceedings of the Fourth
IEEE International Workshop on Factory Communication Systems (WFCS'02). IEEE Industrial
Electronics Society, Västerås, Sweden, August 2002, pp. 241–248.
[86] J.L. Sobrinho and A.S. Krishnakumar. EQuB – Ethernet Quality of Service Using Black Bursts.
In Proceedings of the 23rd IEEE Annual Conference on Local Computer Networks (LCN'98). IEEE
Computer Society, Lowell, MA, October 1998, pp. 286–296.
[87] H. Thane and H. Hansson. Towards Systematic Testing of Distributed Real-Time Systems.
In Proceedings of the 20th IEEE Real-Time Systems Symposium (RTSS). December 1999,
pp. 360–369.
[88] M. Joseph and P. Pandya. Finding Response Times in a Real-Time System. Computer Journal,
29:390–395, 1986.
[89] The Times Tool. http://www.docs.uu.se/docs/rtmv/times.
[90] AbsInt. http://www.absint.com.
[91] Bound-T Execution Time Analyzer. http://www.bound-t.com.
[92] L. Casparsson, A. Rajnak, K. Tindell, and P. Malmberg. Volcano – A Revolution in On-Board
Communications. Volvo Technology Report, 1:9–19, 1998.
[93] Volcano Automotive Group. http://www.volcanoautomotive.com.
[94] TimeSys. TimeWiz – A Modeling and Simulation Tool. http://www.timesys.com/.
[95] OMG. UML Profile for Schedulability, Performance, and Time Specification. OMG document
formal/2003-09-01, September 2003.
[96] J.L. Medina, M. González Harbour, and J.M. Drake. MAST Real-Time View: A Graphic UML
Tool for Modeling Object-Oriented Real-Time Systems. In Proceedings of the 22nd IEEE Real-Time
Systems Symposium (RTSS). IEEE Computer Society, December 2001, pp. 245–256.
[97] MAST home page. http://mast.unican.es/.
[98] A. Massa. Embedded Software Development with eCos. Prentice Hall, New York, November 2002,
ISBN 0130354732.
[99] eCos Home Page. http://sources.redhat.com/ecos.
[100] OMG. CORBA Home Page. http://www.omg.org/corba/.
[101] PECOS Project Web Site. http://www.pecos-project.org.
[102] U.S. Department of Commerce. The Economic Impacts of Inadequate Infrastructure for Software
Testing. NIST report, May 2002.
[103] M. Ronsse, K. De Bosschere, M. Christiaens, J. Chassin de Kergommeaux, and D. Kranzlmüller.
Record/Replay for Nondeterministic Program Executions. Communications of the ACM, 46:62–67,
2003.
[104] J.J.P. Tsai, K.Y. Fang, and Y.D. Bi. On Real-Time Software Testing and Debugging. In Proceedings
of the 14th Annual International Computer Software and Applications Conference. IEEE Computer
Society, November 1990, pp. 512–518.
[105] W. Schütz. Fundamental Issues in Testing Distributed Real-Time Systems. Real-Time Systems,
7:129–157, 1994.
[106] H. Zhu, P. Hall, and J. May. Software Unit Test Coverage and Adequacy. ACM Computing Surveys,
29(4):366–427, 1997.
[107] K. Onoma, W.-T. Tsai, M. Poonawala, and H. Suganuma. Regression Testing in an Industrial
Environment. Communications of the ACM, 41:81–86, 1998.
[108] J.D. Choi, B. Alpern, T. Ngo, M. Sridharan, and J. Vlissides. A Perturbation-Free Replay Platform
for Cross-Optimized Multithreaded Applications. In Proceedings of the 15th International Parallel
and Distributed Processing Symposium. IEEE Computer Society Press, Washington, April 2001.
[109] J. Mellor-Crummey and T. LeBlanc. A Software Instruction Counter. In Proceedings of the Third
International Conference on Architectural Support for Programming Languages and Operating
Systems. ACM, April 1989, pp. 78–86.
[110] K.C. Tai, R. Carver, and E. Obaid. Debugging Concurrent ADA Programs by Deterministic
Execution. IEEE Transactions on Software Engineering, 17:280–287, 1991.
[111] H. Thane and H. Hansson. Using Deterministic Replay for Debugging of Distributed Real-Time
Systems. In Proceedings of the 12th Euromicro Conference on Real-Time Systems. IEEE Computer
Society Press, Washington, June 2000, pp. 265–272.
[112] F. Zambonelli and R. Netzer. An Efficient Logging Algorithm for Incremental Replay of Message-
Passing Applications. In Proceedings of the 13th International and 10th Symposium on Parallel and
Distributed Processing. IEEE, April 1999, pp. 392–398.
[113] Lauterbach. http://www.lauterbach.com.
[114] ZealCore. ZealCore Embedded Solutions AB. http://www.zealcore.com.
Design and Validation
of Embedded Systems
3 Design of Embedded Systems
Luciano Lavagno and Claudio Passerone
4 Models of Embedded Computation
Axel Jantsch
5 Modeling Formalisms for Embedded System Design
Luís Gomes, João Paulo Barros, and Anikó Costa
6 System Validation
J.V. Kapitonova, A.A. Letichevsky, V.A. Volkov, and Thomas Weigert
3
Design of Embedded
Systems
Luciano Lavagno
Cadence Berkeley Laboratories
and Politecnico di Torino
Claudio Passerone
Politecnico di Torino
3.1 The Embedded System Revolution . . . . . . . . . . . . . . . . . . . . . 3-1
3.2 Design of Embedded Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-2
3.3 Functional Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-6
3.4 Function–Architecture and Hardware–Software
Codesign. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-7
3.5 Hardware–Software Coverification and Hardware
Simulation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-11
3.6 Software Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-12
Compilation, Debugging, and Memory Model • Real-Time
Scheduling
3.7 Hardware Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-16
Logic Synthesis and Equivalence Checking • Placement,
Routing, and Extraction • Simulation, Formal Verification,
and Test Pattern Generation
3.8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-22
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-22
3.1 The Embedded System Revolution
The world of electronics has witnessed a dramatic growth of its applications in the last few decades. From
telecommunications to entertainment, from automotive to banking, almost every aspect of our everyday
life employs some kind of electronic component. In most cases, these components are computer-based
systems, which are not, however, used or perceived as computers. For instance, they often do not have
a keyboard or a display to interact with the user, and they do not run standard operating systems and
applications. Sometimes, these systems constitute a self-contained product themselves (e.g., a mobile
phone), but they are frequently embedded inside another system, for which they provide better
functionality and performance (e.g., the engine control unit of a motor vehicle). We call these computer-based
systems embedded systems.
The huge success of embedded electronics has several causes. The main one, in our opinion, is that
embedded systems bring the advantages of Moore's law into everyday life, that is, an exponential increase
in performance and functionality at an ever decreasing cost. This is possible because of the capabilities
of integrated circuit technology and manufacturing, which allow one to build more and more complex
devices, and because of the development of new design methodologies, which allow one to efficiently and
cleverly use those devices. Traditional steel-based mechanical development, on the other hand, has reached
a plateau near the middle of the twentieth century, and thus it is no longer a significant source of innovation,
unless coupled to electronic manufacturing technologies (microelectromechanical systems,
MEMS) or embedded systems, as argued above.
There are many examples of embedded systems in the real world. For instance, a modern car contains
tens of electronic components (control units, sensors, and actuators) that perform very different tasks. The
first embedded systems that appeared in a car were related to the control of mechanical aspects, such as
the control of the engine, the antilock brake system, and the control of suspension and transmission.
However, nowadays cars also have a number of components that are not directly related to mechanical
aspects, but are mostly related to the use of the car as a vehicle for moving around, or the communication
needs of the passengers: navigation systems, digital audio and video players, and phones are just a few
examples. Moreover, many of these embedded systems are connected together using a network, because
they need to share information regarding the state of the car.
Other examples come from the communication industry: a cellular phone is an embedded system whose
environment is the mobile network. These are very sophisticated computers whose main task is to send
and receive voice, but are also currently used as personal digital assistants, for games, to send and receive
images and multimedia messages, and to wirelessly browse the Internet. They have been so successful
and pervasive that in just a decade they became essential in our life. Other kinds of embedded systems
significantly changed our life as well: for instance, ATM and Point-of-Sale (POS) machines modified the
way we do payments, and multimedia digital players changed how we listen to music and watch videos.
We are just at the beginning of a revolution that will have an impact on every other industrial sector.
Special purpose embedded systems will proliferate and will be found in almost any object that we use.
They will be optimized for the application and show a natural user interface. They will be flexible, in order
to adapt to a changing environment. Most of them will also be wireless, in order to follow us wherever
we go and keep us constantly connected with the information we need and the people we care about. Even the
role of computers will have to be reconsidered, as many of the applications for which they are used today
will be performed by specially designed embedded systems.
What are the consequences of this revolution in the industry? Modern car manufacturers today need
to acquire a significant amount of skills in hardware and software design, in addition to the mechanical
skills that they already had in-house, or they should outsource the requirements they have to an external
supplier. In either case, a broad variety of skills needs to be mastered, from the design of software
architectures for implementing the functionality, to being able to model the performance, because
real-time aspects are extremely important in embedded systems, especially those related to safety-critical
applications. Embedded system designers must also be able to architect and analyze the performance of
networks, as well as validate the functionality that has been implemented over a particular architecture
and the communication protocols that are used.
A similar revolution has happened or is about to happen to other industrial and socioeconomical areas
as well, such as entertainment, tourism, education, agriculture, government, and so on. It is therefore clear
that new, more efficient and easy-to-use embedded electronics design methodologies need to be developed,
in order to enable the industry to make use of the available technology.
3.2 Design of Embedded Systems
Embedded systems are informally defined as a collection of programmable parts surrounded by Application-Specific
Integrated Circuits (ASICs) and other standard components (Application-Specific Standard Parts,
ASSPs) that interact continuously with an environment through sensors and actuators. The collection
can be physically a set of chips on a board, or a set of modules on an integrated circuit. Software is
used for features and flexibility, while dedicated hardware is used for increased performance and reduced
power consumption. An example of an embedded system architecture is shown in Figure 3.1.
The main programmable components are microprocessors and Digital Signal Processors (DSPs), which
implement the software partition of the system. One can view reconfigurable components, especially
[Figure 3.1 shows programmable cores (a µP/µC with coprocessor, a DSP), an FPGA, an IP block, memories (including a dual-port memory), and a peripheral connected through buses and a bridge.]
FIGURE 3.1 A reactive real-time embedded system architecture.
if they can be reconfigured at runtime, as programmable components in this respect. They exhibit
area, cost, performance, and power characteristics that are intermediate between dedicated hardware and
processors. Custom and programmable hardware components, on the other hand, implement application-specific
blocks and peripherals. All components are connected through standard and dedicated buses and
networks, and data is stored in a set of memories. Often several smaller subsystems are networked together
to control, for example, an entire car, or to constitute a cellular or wireless network.
We can identify a set of typical characteristics that are commonly found in embedded systems. For
instance, they are usually not very flexible and are designed to always perform the same task: if you
buy an engine control embedded system, you cannot use it to control the brakes of your car, or to play
games. A PC, on the other hand, is much more flexible because it can perform several very different tasks.
An embedded system is often part of a larger controlled system. Moreover, cost, reliability, and safety
are often more important criteria than performance, because the customer may not even be aware of the
presence of the embedded system, and so looks at other characteristics, such as the cost, the ease of use,
or the lifetime of the product.
Another common characteristic of many embedded systems is that they need to be designed in an
extremely short time to meet their time to market. Only a few months should elapse from the conception
of a consumer product to the first working prototypes. If these deadlines are not met, the result is both
an increase in design costs and a decrease in profits, because fewer items will be sold. So delays
in the design cycle may make the difference between a successful product and an unsuccessful one.
In the current state of the art, embedded systems are designed with an ad hoc approach that is heavily
based on earlier experience with similar products and on manual design. Often the design process requires
several iterations to obtain convergence, because the system is not specified in a rigorous and unambiguous
fashion, and the level of abstraction, details, and design style in various parts are likely to differ. But
as the complexity of embedded systems scales up, this approach is showing its limits, especially regarding
design and verification time.
New methodologies are being developed to cope with the increased complexity and enhance designers'
productivity. In the past, a sequence of two steps has always been used to reach this goal: abstraction and
clustering. Abstraction means describing an object (e.g., a logic gate made of metal oxide semiconductor
[MOS] transistors) using a model where some of the low-level details are ignored (e.g., the Boolean
expression representing that logic gate). Clustering means connecting a set of models at the same level of
abstraction, to get a new object, which usually shows new properties that are not part of the isolated models
that constitute it. By successively applying these two steps, digital electronic design went from drawing
layouts, to transistor schematics, to logic gate netlists, to register transfer level (RTL) descriptions, as shown
in Figure 3.2.
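The two steps can be sketched in a few lines of code. This is only an illustration of the idea, not any real EDA tool: `nand` abstracts away the transistor-level detail of a CMOS gate, and each further function clusters lower-level models into a new object with properties the parts lack in isolation.

```python
# Abstraction: a 4-transistor CMOS NAND reduced to its Boolean behavior;
# layout, timing, and electrical detail are ignored.
def nand(a: bool, b: bool) -> bool:
    return not (a and b)

# Clustering: four NAND models wired together yield XOR, a property
# not present in any single NAND.
def xor(a: bool, b: bool) -> bool:
    t = nand(a, b)
    return nand(nand(a, t), nand(b, t))

# One more clustering step: gates become an arithmetic block.
def half_adder(a: bool, b: bool) -> tuple[bool, bool]:
    return xor(a, b), (a and b)   # (sum, carry)

assert half_adder(True, True) == (False, True)
```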
[Figure 3.2 depicts four levels (transistor model, 1970s; gate-level model, 1980s; register transfer level, 1990s; system level with hardware and software clusters, 2000+), each reached from the one below by alternating abstraction and clustering steps.]
FIGURE 3.2 Abstraction and clustering levels in hardware design.
The notion of platform is key to the efficient use of abstraction and clustering. A platform is a single
abstract model that hides the details of a set of different possible implementations as clusters of lower-level
components. The platform, for example, a family of microprocessors, peripherals, and bus protocols,
allows developers of designs at the higher level (generically called applications in the following) to
operate without detailed knowledge of the implementation (e.g., the pipelining of the processor or the
internal implementation of the Universal Asynchronous Receiver/Transmitter [UART]). At the same time,
it allows platform implementors to share design and fabrication costs among a broad range of potential
users, broader than if each design were one of a kind.
Today we are witnessing the appearance of a new, higher level of abstraction, as a response to the
growing complexity of integrated circuits. Objects can be functional descriptions of complex behaviors,
or architectural specifications of complete hardware platforms. They make use of formal high-level models
that can be used to perform an early and fast validation of the final system implementation, although with
reduced detail with respect to a lower-level description.
The relationship between an application and the elements of a platform is called a mapping. Such a relationship exists,
for example, between logic gates and the geometric patterns of a layout, as well as between RTL statements and
gates. At the system level, the mapping is between functional objects with their communication links,
and platform elements with their communication paths. Mapping at the system level means associating
a functional behavior (e.g., an FFT [fast Fourier transform] or a filter) with an architectural element that
can implement that behavior (e.g., a CPU, DSP, or piece of dedicated hardware). It can also associate
a communication link (e.g., an abstract FIFO [first in, first out]) with some communication service available
in the architecture (e.g., a driver, a bus, and some interfaces). The mapping step may also need to specify
parameters for these associations (e.g., the priority of a software task or the size of a FIFO), in order to
describe it completely. The object that we obtain after mapping shows properties that were not directly
exposed in the separate descriptions, such as the performance of the selected system implementation.
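A system-level mapping of this kind can be sketched as a simple data structure. Every name below (the blocks, the platform elements, the parameters) is invented for illustration; the point is that each binding of a behavior or link to a platform element carries its own parameters, and that the mapped object exposes properties visible in neither description alone.

```python
# Functional side: behaviors and the abstract links between them.
functional_model = {
    "blocks": ["fft", "filter", "control"],
    "links":  [("fft", "filter"), ("filter", "control")],
}

# Architectural side: platform elements offering computation and
# communication services (all names hypothetical).
platform = {"dsp": "DSP core", "cpu": "RISC core", "bus": "system bus"}

mapping = {
    # behavior -> (architectural element, mapping parameters)
    "fft":     ("dsp", {"priority": 1}),
    "filter":  ("dsp", {"priority": 2}),
    "control": ("cpu", {"priority": 5}),
    # abstract FIFO -> communication service and its sizing
    ("fft", "filter"):     ("dsp_local_memory", {"fifo_depth": 64}),
    ("filter", "control"): ("bus",              {"fifo_depth": 8}),
}

# A property of the mapped object, not of either description alone:
# only one link loads the shared bus.
bus_links = [l for l in functional_model["links"] if mapping[l][0] == "bus"]
assert bus_links == [("filter", "control")]
```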
Performance is not just timing, but any other quantity that can be defined to characterize an embedded
system, either physical (area, power consumption, etc.) or logical (quality of service [QoS], fault
tolerance, etc.).
Since the system-level mapping operates on heterogeneous objects, it also allows one to cleanly separate
different and orthogonal aspects, such as:
1. Computation and communication. This separation is important because refinement of computation
is generally done by hand, or by compilation and scheduling, while communication makes use
of patterns.
2. Application and platform implementation (also called functionality and architecture, e.g., in
Reference 1), because they are often defined and designed independently by different groups or
companies.
3. Behavior and performance, which should be kept separate because performance information can
either represent nonfunctional requirements (e.g., the maximum response time of an embedded controller),
or the result of an implementation choice (e.g., the worst-case execution time [WCET] of
a task). Nonfunctional constraint verification can be performed traditionally, by simulation and
prototyping, or with static formal checks, such as schedulability analysis.
All these separations result in better reuse, because they decouple independent aspects that would otherwise
tie, for example, a given functional specification to low-level implementation details by modeling it
as assembler or Verilog code. This in turn allows one to reduce design time, by increasing productivity
and decreasing the time needed to verify the system.
A schematic representation of a methodology that can be derived from these abstraction and clustering
steps is shown in Figure 3.3. At the functional level, a behavior for the system to be implemented
is specified, designed, and analyzed, either through simulation or by proving that certain properties are
satisfied (the algorithm always terminates, the computation performed satisfies a set of specifications, the
complexity of the algorithm is polynomial, etc.). In parallel, a set of architectures is composed from
a clustering of platform elements, and selected as candidates for the implementation of the behavior.
These components may come from an existing library or may be specifications of components that will
be designed later.
Next, functional operations are assigned to the various architecture components, and patterns provided
by the architecture are selected for the defined communications. At this level we are now able to verify
the performance of the selected implementation, with much richer detail than at the pure functional
[Figure 3.3 shows three levels: at the functional level, a function is specified from behavioral libraries and verified; at the mapping level, the function is mapped onto an architecture composed from architecture libraries, and both the architecture and the performance of the mapped design are verified; refinement then leads to the implementation level, where the refinements themselves are verified.]
FIGURE 3.3 Design methodology for embedded systems.
level. Different mappings to the same architecture, or mappings to different architectures, allow one to
explore the design space to find the best solutions to important design challenges. These kinds of analysis
let the designer identify and correct possible problems early in the design cycle, thus drastically reducing
the time to explore the design space and weed out potentially catastrophic mistakes and bugs. At this
stage it is also very important to define the organization of the data storage units for the system. The various
kinds of memories (e.g., ROM, SRAM, DRAM, Flash, etc.) have different performance and data persistency
characteristics, and must be used judiciously to balance cost and performance. Mapping data structures to
different memories, and even changing the organization and layout of arrays, can have a dramatic impact
on meeting a given latency in the execution of an algorithm, for example. In particular, a System-on-Chip
designer can afford to do very fine tuning of the number and sizes of embedded memories
(especially SRAM, but now also Flash) to be connected to processors and dedicated hardware [2].
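The impact of data placement can be made concrete with a toy cost model. The per-access latencies below are illustrative round numbers, not figures from this chapter, but they show why moving a hot array from off-chip DRAM into on-chip SRAM can dominate the latency of a loop.

```python
# Assumed read latencies in cycles (illustrative, not measured values).
LATENCY = {"sram": 1, "dram": 10, "flash": 50}

def loop_cycles(accesses: int, memory: str) -> int:
    """Cycles a loop spends on memory traffic for one array placement."""
    return accesses * LATENCY[memory]

# A 1024-tap filter kernel reading its coefficient array once per sample:
in_sram = loop_cycles(1024, "sram")
in_dram = loop_cycles(1024, "dram")
assert in_dram == 10 * in_sram   # same code, 10x the memory stall time
```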
Finally, at the implementation level, the reverse transformation of abstraction and clustering occurs,
that is, a lower-level specification of the embedded system is generated. This is obtained through a series
of manual or automatic refinements and modifications that successively add more detail, while checking
their compliance with the higher-level requirements. This step does not need to generate a
manufacturable final implementation directly, but rather produces a new description that in turn constitutes the
input for another (recursive) application of the same overall methodology at a lower level of abstraction
(e.g., synthesis, placement, and routing for hardware, and compilation and linking for software). Moreover,
the results obtained by these refinements can be back-annotated to the higher level, to perform better
and more accurate verification.
3.3 Functional Design
As discussed in the previous section, system-level design of embedded electronics requires two distinct
phases. In the first phase, functional and nonfunctional constraints are the key aspects. In the second
phase, the available architectural platforms are taken into account, and detailed implementation can
proceed after a mapping phase that defines the architectural component on which every functional model
is implemented. This second phase requires a careful analysis of the trade-offs between algorithmic
complexity, functional flexibility, and implementation costs.
In this section we describe some of the tools that are used for requirements capture, focusing especially
on those that permit executable specification. Such tools generally belong to two broad classes.
The first class is represented, for example, by Simulink [3], MATRIXx [4], Ascet-SD [5], SPW [6],
SCADE [7], and System Studio [8]. It includes block-level editors and libraries with which the designer
composes data-dominated digital signal processing and embedded control systems. The libraries include
simple blocks, such as multiplication, addition, and multiplexing, as well as more complex ones, such as
FIR filters, FFTs, and so on.
The second class is represented by tools such as Tau [9], StateMate [10], Esterel Studio [7], and StateFlow [3].
It is oriented to control-dominated embedded systems. In this case, the emphasis is placed on the decisions
that must be taken by the embedded system in response to environment and user inputs, rather than on
numerical computations. The notation is generally some form of Harel's Statecharts [11].
The Unified Modeling Language (UML), as standardized by the Object Management Group [12], is in
a class by itself, since it has historically focused more on general-purpose software (e.g., enterprise
and commercial software) than on embedded real-time software. Only recently have some
embedded aspects, such as performance and time, been incorporated in UML 2.0 [12,13], and emphasis
has been placed on model-based software generation. However, tool support for UML 2.0 is still limited
(Tau [9], Real Time Studio [14], and Rose RealTime [15] provide some), and UML-based hardware
design is still in its infancy. Furthermore, UML is a collection of notations, some of which (especially
Statecharts) are supported by several of the tools listed above in the control-dominated class.
Simulink and its related tools and toolboxes, both from The MathWorks and from third parties such as
dSPACE [16], are the workhorse of modern model-based embedded system design. In model-based design,
a functional executable model is used for algorithm development. This is made easier in the case of
Simulink by its tight integration with MATLAB, the standard tool in DSP algorithm development. The same
functional model, with added annotations such as bit widths and execution priorities, is then used for
algorithmic refinements, such as floating-point to fixed-point conversion and real-time task generation.
Then automated software generators, such as Real-Time Workshop, Embedded Coder [3], and
TargetLink [16], are used to generate task code and sometimes to customize a real-time operating system
(RTOS) on which the tasks will run. Ascet-SD, for example, automatically generates a customization of
the OSEK automotive RTOS [17] for the tasks that are generated from a functional model. In all these
cases, a task is typically generated from a set of blocks that are executed at the same rate or triggered by
the same event in the functional model.
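The floating- to fixed-point refinement mentioned above can be sketched as follows. This is a minimal illustration, not the algorithm of any tool named here: coefficients are quantized to the Q15 format (1 sign bit, 15 fractional bits) common on 16-bit DSPs, and the coefficient values are invented.

```python
Q = 15  # number of fractional bits in Q15

def to_q15(x: float) -> int:
    """Quantize a float in [-1, 1) to a saturated 16-bit Q15 integer."""
    v = round(x * (1 << Q))
    return max(-(1 << 15), min((1 << 15) - 1, v))  # saturate to int16 range

def from_q15(v: int) -> float:
    """Recover the real value represented by a Q15 integer."""
    return v / (1 << Q)

coeffs = [0.5, -0.25, 0.125]          # illustrative filter coefficients
fixed = [to_q15(c) for c in coeffs]
assert fixed == [16384, -8192, 4096]

# Quantization error stays below one LSB = 2**-15:
assert all(abs(from_q15(v) - c) < 2**-15 for v, c in zip(fixed, coeffs))
```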
Task formation algorithms can use either direct user input (e.g., the execution rate of each block in
discrete-time portions of a Simulink or Ascet-SD design), or static scheduling algorithms for dataflow
models (e.g., based on relative block-to-block rate specifications in SPW or System Studio [18,19]).
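The static scheduling idea for dataflow models can be illustrated on a single edge of a synchronous dataflow graph. The balance equations determine how many times each block must fire per period so that tokens do not accumulate; those firings can then be grouped into one generated task. The two-block example and its rates are invented for illustration.

```python
from math import gcd

def repetitions(produce: int, consume: int) -> tuple[int, int]:
    """Smallest firing counts (r_src, r_dst) satisfying the balance
    equation produce * r_src == consume * r_dst on one dataflow edge."""
    g = gcd(produce, consume)
    return consume // g, produce // g

# A rate converter: the upstream block emits 3 samples per firing,
# the downstream block consumes 2 per firing.
r_src, r_dst = repetitions(3, 2)
assert (r_src, r_dst) == (2, 3)
assert 3 * r_src == 2 * r_dst   # balanced: one task body fires src twice, dst three times
```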
Simulink is also tightly integrated with StateFlow, a design tool for control-dominated applications, in
order to ease the integration of decision-making and computation code. It also allows one to smoothly generate
both hardware and software from the very same specification. This capability, as well as integration
with some sort of Statechart-based finite state machine (FSM) editor, is available in most tools in the first
class above. The difference in market share can be attributed to the availability of Simulink toolboxes for
numerous embedded system design tasks (from fixed-point optimization to FPGA [Field Programmable
Gate Array]-based implementation) and its widespread adoption in undergraduate university courses,
which makes it well known to most of today's engineers.
The second class of tools either plays an ancillary role in the design of embedded control systems (e.g.,
StateFlow and Esterel Studio), or is devoted to inherently control-dominated application areas, such
as telecommunication protocols. In the latter market the clear dominator today is Tau. The underlying
languages, such as the Specification and Description Language (SDL) and Message Sequence Charts, are
standardized by the International Telecommunication Union (ITU). They are commonly used to describe
protocol standards in a tool-independent way; thus modeling in SDL is quite natural in this application
domain, since validation and refinement can proceed formally within a unified environment. Tau also has
code generation capabilities for both application code and customization of the real-time kernels on which
the FSM-generated code will run. The use of Tau for embedded code generation (model-based design)
significantly predates that of Simulink-based code generators, mostly due to the highly complex nature of
telecom protocols and the less demanding memory and computing power constraints of switches and
other networking equipment.
Tau has links to the requirements capture tool Doors [9], also from Telelogic, which allows one to
trace dependencies between multiple requirements written in English, and to connect them to the aspects of the
embedded system design files that implement these requirements. The state of the art of such requirement
tracing, however, is far from satisfactory, since there is no formal means in Doors to automatically check
for violations. Similar capabilities are provided by Reqtify [20].
Techniques for automated functional constraint validation, starting from formal languages, are
described in several books, for example, References 21 and 22. Deadline, latency, and throughput constraints
are special kinds of nonfunctional requirements that have received extensive treatment in the
real-time scheduling community. They are also covered in several books, for example, References 23 to 25.
While model-based functional verification is quite attractive, due to its high abstraction level, it ignores
the cost and performance implications of algorithmic decisions. These are taken into account by the tools
described in the next section.
3.4 Function–Architecture and Hardware–Software Codesign
In this section, we describe some of the tools that are available to help embedded system designers to
optimally architect the implementation of the system, and choose the best solution for each functional
component. After these decisions have been made, detailed design can proceed using the languages, tools,
and methods described in the following chapters in this book.
This step of the design process, whose general structure was outlined in Section 3.2 using
the platform-based design paradigm, has received various names in the past. Early work [26,27] called
it hardware–software codesign (or cosynthesis), because one of the key decisions at this level is which
functionality has to be implemented in software and which in dedicated hardware, and how the two partitions
of the design interact with minimum cost and maximum performance.
Later on, people came to realize that hardware–software was too coarse a granularity, and that more
implementation choices had to be taken into account. For example, one could trade off single versus multiple
processors, general-purpose CPUs versus specialized DSPs and Application-Specific Instruction-set
Processors (ASIPs), dedicated ASICs versus ASSPs (e.g., an MPEG coprocessor or an Ethernet Medium
Access Controller), and standard cells versus FPGAs. Thus the term function–architecture codesign was
coined [1], to refer to the more complex problem of partitioning a given functionality onto a heterogeneous
architecture such as the one in Figure 3.1.
The term "system-level design" has also had some popularity in the industry [6,28], to indicate the level
of design above Register Transfer, at which software and hardware interact. Other terms, such as "timed
functional model," have also been used [29].
The key problems that are tackled by tools acting as a bridge between the system-level application and
the architectural platform are:
1. How to model the performance impact of mapping decisions from a virtually
implementation-independent functional specification to an architectural model.
2. How to efficiently drive downstream code generation, synthesis, and validation tools, to avoid
redoing the modeling effort from scratch at the RTL, C, or assembly code levels, respectively. The
notion of automated implementation generation from a high-level functional model is called
model-based design in the software world.
In both cases, the notion of an implementation-independent functional specification, which
can be retargeted indifferently to hardware and software implementations, must be carefully evaluated and
considered. Taken in its most literal terms, this idea has often been dismissed as a myth. However, current
practice shows that it is already a reality, at least for some application domains (automotive electronics
and telecommunication protocols). It is intuitively very appealing, since it can be considered a high-level
application of the platform-based design principle, using a formal system-level platform. Such
a platform, embodied in one of the several models of computation that are used in embedded system
design, is a perfect candidate to maximize design reuse, and to optimally exploit different implementation
options.
In particular, several of the tools mentioned in the previous section (e.g., Simulink,
TargetLink, StateFlow, SPW, System Studio, Tau, Ascet-SD, StateMate, Esterel Studio) have code generation
capabilities that are considered good enough for implementation, and not just for rapid prototyping
and simulation acceleration. Moreover, several of them (e.g., Simulink, StateFlow, SPW, System Studio,
StateMate, Esterel Studio) can generate both C for software implementation and synthesizable
VHDL or Verilog for hardware implementation. Unfortunately, these code generation capabilities often
require the laborious creation of implementation models for each target platform (e.g., software in C or
assembler for a given DSP, synthesizable VHDL or a macroblock netlist for ASIC or FPGA, etc.). However,
since these careful implementations are instances of the system-level platform mentioned above, their
development cost can be shared among a multitude of designs performed using the tool.
Most block diagram or Statechart-based code generators work in a syntax-directed fashion. A piece
of C or synthesizable VHDL code is generated for each block and connection, or for each hierarchical
state and transition. Thus the designer has tight control over the complexity of the generated software or
hardware. While this is a convenient means to bring manual optimization capabilities within the model-based
design flow, it has a potentially significant disadvantage in terms of cost and performance (comparable
to disabling optimizations in the case of a C compiler). On the other hand, more recent tools, such as
Esterel Studio and System Studio, take a more radical approach to code generation, based on aggressive
optimizations [30]. These optimizations, based on logic synthesis techniques also in the case of software
implementation, destroy the original model structure, and thus make debugging and maintenance much
harder. However, they can result in an order of magnitude improvement in terms of cost (memory size)
and performance (execution speed) with respect to their syntax-directed counterparts [31].
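The syntax-directed style described above can be sketched in a few lines: one C statement is emitted per block and one variable per connection, with no optimization across blocks. The tiny netlist format and the generator are invented for illustration and are not the code generator of any tool named here.

```python
# A hypothetical block netlist: (output variable, operator, input variables).
netlist = [
    ("t0", "*", ["x", "k"]),
    ("t1", "+", ["t0", "bias"]),
    ("y",  "*", ["t1", "gain"]),
]

def emit_c(blocks) -> str:
    """Syntax-directed generation: one C statement per block, in order.
    The structure of the model is preserved one-to-one in the code."""
    body = "\n".join(
        f"    double {out} = {f' {op} '.join(ins)};" for out, op, ins in blocks
    )
    return ("double step(double x, double k, double bias, double gain) {\n"
            + body + "\n    return y;\n}")

code = emit_c(netlist)
assert "double t0 = x * k;" in code
```

A logic-synthesis-style generator would instead collapse the three statements into one expression, losing the block structure (and with it easy debugging) in exchange for smaller, faster code.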
Assuming that good automated code generation, or manual design, is available for each block in the
functional model of the application, we are now faced with the function–architecture codesign problem.
This essentially means tuning the functional decomposition, as well as the algorithms employed by the
overall functional model and each block within it, to the available architecture, and vice versa.
Several design environments help the designer in this task, for example:
• POLIS [1], COSYMA [26], Vulcan [27], COSMOS [32], and Roses [33] in the academic world
• Real Time Studio [14], Foresight [34], and CARDtools [35] in the commercial world
They all rely, in some form, on the notion of independence between the functional specification on one
side, and hardware–software partitioning or architecture mapping choices on the other.
The step of performance evaluation is performed in an abstract, approximate manner by the tools
listed above. Some of them use estimators to evaluate the cost and performance of mapping a functional
block to an architectural block. Others (e.g., POLIS) rely on cycle-approximate simulation to perform the
same task in a manner that better reflects real-life effects, such as burstiness of resource occupation and
so on. Techniques for deriving both abstract static performance models (e.g., the WCET of a software
task) and performance simulation models are discussed below.
In all cases, both the cost of computation and that of communication must be taken into account.
This is because the best implementation, especially in the case of multimedia systems that manipulate
large amounts of image and sound data, is often one that reduces the amount of data transferred between
multiple memory locations, rather than one that finds the absolute best trade-off between software
flexibility and hardware efficiency. In this area, the Atomium project at IMEC [2,36] has focused on finding
the best memory architecture and schedule of memory transfers for data-dominated applications on mixed
hardware–software platforms. By exploiting array access models based on polyhedra, they identify the best
reorganization of the inner loops of DSP kernels and the best embedded memory architecture. The goal is to
reduce memory traffic due to register spills, and to maximize overall performance by accessing several
memories in parallel (many DSPs offer this opportunity even in the embedded software domain). A very
interesting aspect of Atomium, which distinguishes it from most other optimization tools for embedded
systems, is its ability to return a set of Pareto-optimal solutions (i.e., solutions that are not strictly
better than one another in at least one aspect of the cost function), rather than a single solution. This
allows the designer to pick the best point based on the various aspects of cost and performance (e.g.,
silicon area versus power and performance), rather than forcing the abstraction of optimality into a single
number.
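The Pareto-filtering idea can be sketched directly. The candidate cost tuples (say, area, power, latency, all to be minimized) are invented; the filter keeps every point not dominated, that is, not worse or equal in all dimensions compared with some other point. This illustrates the concept only, not Atomium's actual algorithm.

```python
def pareto(points):
    """Return the Pareto-optimal subset of cost tuples (lower is better)."""
    def dominated(p, q):
        # q dominates p if q is no worse in every dimension and differs.
        return all(b <= a for a, b in zip(p, q)) and q != p
    return [p for p in points if not any(dominated(p, q) for q in points)]

# Hypothetical (area, power, latency) tuples for four candidate designs.
candidates = [(4, 10, 7), (5, 9, 7), (6, 12, 9), (4, 11, 6)]
front = pareto(candidates)
assert (6, 12, 9) not in front        # strictly worse than (4, 10, 7)
assert (5, 9, 7) in front             # best power: kept despite larger area
```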
Performance analysis can be based on simulation, as mentioned above, or can rely on automatically
constructed models that reflect the WCET of pieces of software (e.g., RTOS tasks) running on an embedded
processor. Such models, which must be both provably conservative and reasonably accurate, can be
constructed by using an execution model called abstract interpretation [37]. This technique traverses
the software code while building a symbolic model, often in the form of linear inequalities [38,39],
which represents the requests that the software makes to the underlying hardware (e.g., code fetches,
data loads and stores, code execution). A solution to those inequalities then represents the total cost
of one execution of the given task. It can then be combined with processor, bus, cache, and main
memory models that in turn compute the cost of each of these requests in terms of time (clock cycles) or
energy. This finally results in a complete model for the cost of mapping that task to those architectural
resources.
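The core of the WCET computation can be shown on a loop-free control-flow graph. Real tools solve the path inequalities as an integer linear program; for a DAG the bound reduces to a longest-path search, which is enough to show the principle. The per-block cycle costs and the CFG are invented.

```python
# Per-basic-block costs in cycles, as a processor model might provide
# them (illustrative numbers), and a hypothetical if-then-else CFG.
cost = {"entry": 2, "then": 8, "else": 3, "exit": 1}
succ = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"], "exit": []}

def wcet(node: str) -> int:
    """Conservative bound: longest-path cost from node to program exit."""
    return cost[node] + max((wcet(s) for s in succ[node]), default=0)

# The 'then' branch dominates: 2 (entry) + 8 (then) + 1 (exit) cycles.
assert wcet("entry") == 11
```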
Another technique for software performance analysis, which does not require detailed models of the
hardware, uses an approximate compilation step from the functional model to an executable model
(rather than a set of inequalities as above) annotated with the same set of fetch, load, store, and execute
requests. Then simulation is used, in a more traditional setting, to analyze the cost of implementing
that functionality on a given processor, bus, cache, and memory configuration. Simulation is more
effective than WCET analysis in handling multiprocessor implementations, in which bus conflicts and
cache pollution can be difficult, if not utterly impossible, to predict statically in a manner that is not
too conservative. However, its success in identifying the true worst case depends on the designer's ability
to provide the appropriate simulation scenarios. Coverage enhancement techniques from the hardware
verification world [40,41] can be extended to help in this case as well.
Similar abstract models can be constructed in the case of implementation as dedicated hardware,
by using high-level synthesis techniques. Such techniques are not yet good enough to generate production-
quality RTL code, but can be considered as a reasonable estimator of area, timing, and energy costs for
both ASIC and FPGA implementations [4244].
SystemC [29] and SpecC [45,46], on the other hand, are more traditional modeling and simulation
languages, for which the design flow is based on successive refinement rather than on codesign or mapping.
Finally, OPNET [47] and NS [48] are simulators with rich modeling libraries specialized for wireline and
wireless networking applications. They help the designer in the more abstract task of generic performance
analysis, without the notion of function–architecture separation and codesign.
Communication performance analysis, on the other hand, is generally not done using approximate compilation
or WCET analysis techniques like those outlined above. Communication is generally implemented
not by synthesis but by refinement using patterns and recipes, such as interrupt-based or DMA-based transfers.
Thus several design environments and languages at the function–architecture level, such as POLIS,
COSMOS, Roses, SystemC, and SpecC, as well as N2C [6], provide mechanisms to replace abstract communication,
for example, FIFO-based or discrete-event-based, with detailed protocol stacks using buses,
interrupt controllers, memories, drivers, and so on. These refinements can then be estimated either using
a library-based approach (they are generally part of a library of implementation choices anyway), or
sometimes using the approaches described above for computation. Their cost and performance can thus
be combined in an overall system-level performance analysis.
However, approximate performance analysis is often not good enough, and a more detailed
simulation step is required. This can be achieved by using tools, such as Seamless [49], CoMET [50],
MaxSim [51], and N2C [6]. They work at a lower abstraction level, by cosimulating software running on
Instruction Set Simulators (ISSs) and hardware running in a Verilog or VHDL simulator. While the simulation is often slower than with more abstract models, and dramatically slower than with static estimators,
the precision can now be at the cycle level. Thus it permits close investigation of detailed communication
aspects, such as interrupt handling and cache behavior. These approaches are further discussed in the next
section.
The key advantage of using the mapping-based approach over the traditional design–evaluate–redesign one is the speed with which design space exploration can be performed. This is done by setting up experiments that change either mapping choices or parameters of the architecture (e.g., cache size, processor speed, or bus bandwidth). Key decisions, such as the number of processors and the organization of the bus hierarchy, can thus be based on quantitative application-dependent data, rather than on past experience. If mapping can then be used to drive synthesis, in addition to simulation and formal verification, advantages in terms of time-to-market and reduction of design effort are even more significant. Model-based code generation, as we mentioned in the previous section, is reasonably mature, especially for embedded software in application areas such as avionics, automotive electronics, and telecommunications. In these areas, considerations other than absolute minimum memory footprint and execution time, for example, safety, sheer complexity, and time-to-market, dominate the design criteria.
At the very least, if some form of automated model-based synthesis is available, it can be used to rapidly generate FPGA- and processor-based prototypes of the embedded system. This significantly speeds up verification with respect to workstation-based simulation. It even permits some hardware-in-the-loop validation for cases (e.g., the notion of driveability of a car) in which no formalization or simulation is possible, but a real physical experiment is required.
2006 by Taylor & Francis Group, LLC
ZURA: 2824_C003 2005/6/21 20:01 page 11 #13
Design of Embedded Systems 3-11
3.5 Hardware–Software Coverification and Hardware Simulation
Traditionally, the term hardware–software codesign has been identified with the ability to execute a simulation of the hardware and the software at the same time. We prefer to use the term hardware–software coverification for this task, and leave codesign for the synthesis- and mapping-oriented approaches outlined in the previous section. In the form of simultaneously running an ISS and a Hardware Description Language (HDL) simulator, while keeping the timing of the two synchronized, the area is not new [52]. In recent years, however, we have seen a number of approaches to speeding up the task, in order to tackle platforms with several processors, and the need, for example, to boot an operating system in order to coverify a platform with a processor and its peripherals.
Recent techniques have been devoted to the three main ways in which cosimulation speed can be
increased:
Accelerate the hardware simulator. Coverification generally works at the clock-cycle-accurate level, meaning that both the hardware simulator and the ISS view time as a sequence of discrete clock cycles, ignoring finer aspects of timing (sometimes clock phases are considered, e.g., for DSP systems, in which different memory banks are accessed in different phases of the same cycle). This allows one to speed up simulation with respect to traditional event-driven logic simulation, and yet retain enough precision to identify bottlenecks such as interrupt service latency or bus arbitration overhead.
Native-code hardware simulation (e.g., NCSim [28]) and emulation (e.g., QuickTurn [28] and Mentor Emulation [49]) can be used to further speed up hardware simulation, at the expense of longer compilation times and much higher costs, respectively.
Accelerate the ISS. Compiled-code simulation has been a popular topic in this area as well [53]. The technique compiles a piece of assembler or C code for a target processor into object code that can be run on a host workstation. This code generally also contains annotations counting clock cycles by modeling the processor pipeline. The speed-up that can be achieved with this technique over a traditional ISS, which fetches, decodes, and executes each target instruction individually, is significant (at least one order of magnitude). Unfortunately, this technique is not suitable for self-modifying code, such as that of an RTOS. This means that it is difficult to adapt to modern embedded software, which almost invariably runs under RTOS control, rather than on the bare CPU. However, hybrid techniques involving partial compilation on the fly are reportedly used by companies selling fast ISSs [50,51].
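As a hedged illustration of compiled-code simulation, the sketch below (in C, with an invented three-instruction basic block and assumed per-instruction cycle costs) shows the essential idea: the translated block executes natively on the host, while a single statically computed annotation updates the cycle counter, instead of fetching and decoding each target instruction.

```c
#include <stdint.h>

/* Hypothetical compiled-code ISS fragment: one target basic block
 * (add, load, branch) translated to host C code. The cycle costs
 * (1 for ALU, 3 for load, 2 for branch) are assumptions made for
 * illustration, not data for any real processor. */
typedef struct {
    uint32_t r[4];       /* target register file (subset)        */
    uint32_t mem[256];   /* target data memory (word-addressed)  */
    uint64_t cycles;     /* cycle counter updated by annotations */
} cpu_t;

/* Translated basic block: executes the instructions natively and
 * adds the statically known cycle cost of the whole block. */
void bb_translated(cpu_t *c)
{
    c->r[1] = c->r[1] + c->r[2];          /* add  r1, r1, r2 : 1 cycle  */
    c->r[3] = c->mem[c->r[1] & 0xff];     /* load r3, [r1]   : 3 cycles */
    /* branch handling omitted              branch           : 2 cycles */
    c->cycles += 1 + 3 + 2;               /* one annotation per block   */
}
```

A traditional ISS would pay the fetch–decode–dispatch overhead on every instruction; here the whole block costs one function call plus one addition to the cycle counter.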
Accelerate the interface between the two simulators. This is the area where the earliest work was performed. For example, Seamless [49] uses sophisticated filters to avoid sending requests for memory accesses over the CPU bus. This allows the bus to be used only for peripheral access, while memory data are provided to the processor directly by a memory server, which is a simulation filter sitting between the ISS and the HDL simulator. The filter reduces stimulation of the HDL simulator, and thus can result in speed-ups of one or more orders of magnitude when most of the bus traffic consists of filtered memory accesses. Of course, the precision of analysis also drops: for example, it becomes harder to identify an overload on the processor bus due to a combination of memory and peripheral accesses, since no simulator component sees both.
In the HDL domain, as mentioned above, progress in performance has been achieved essentially by raising the level of abstraction. A cycle-based simulator, that is, one that ignores the timing information within a clock cycle, can be dramatically faster than one that requires the use of a timing queue to manage time-tagged events. This is mainly due to two reasons. The first is that now most of the simulation can be executed always, at every simulation clock cycle. This means that it is much more parallelizable, while event-driven simulators do not fit well on a parallel machine due to the presence of the centralized timing queue. Of course, there is a penalty if most of the hardware is generally idle, since it has to be evaluated anyway, but clock gating techniques developed for low power consumption can obviously be applied here. The second is that the overhead of managing the time queue, which often accounts for 50% to 90% of the event-driven simulation time, can now be completely eliminated.
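The contrast with event-driven simulation can be illustrated by a minimal cycle-based kernel: the whole design, here a toy 8-bit counter with a synchronous enable (an assumption purely for illustration), is evaluated at every clock, with no time-tagged event queue at all.

```c
#include <stdint.h>

/* Minimal cycle-based simulation kernel: every clock cycle the whole
 * combinational function is evaluated and the registers are latched.
 * No event queue and no intra-cycle timing are modeled. */
typedef struct { uint8_t count; } state_t;

/* Combinational logic: next state from current state and inputs */
static state_t next_state(state_t s, int enable)
{
    state_t n = s;
    if (enable)
        n.count = (uint8_t)(s.count + 1);
    return n;
}

/* Run n clock cycles; all logic is evaluated in every cycle,
 * which is what makes the loop trivially parallelizable across
 * independent portions of a large netlist. */
state_t run_cycles(state_t s, int enable, int n)
{
    for (int i = 0; i < n; i++)
        s = next_state(s, enable);   /* evaluate, then latch */
    return s;
}
```

Note the penalty mentioned above: `next_state` is evaluated even in cycles where `enable` is 0 and nothing changes, which an event-driven simulator would skip.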
Modern HDLs either are totally cycle-based (e.g., SystemC 1.0 [29]) or have a synthesizable subset,
which is fully synchronous and thus fully compilable to cycle-based simulation. The same synthesizable
subset, by the way, is also supported by hardware emulation techniques, for obvious reasons.
Another interesting area of cosimulation in embedded system design is analog–digital cosimulation. This is because such systems quite often include analog components (amplifiers, filters, A/D and D/A converters, demodulators, oscillators, phase-locked loops [PLLs], etc.), and models of the environment quite often involve only continuous variables (distance, time, voltage, etc.). Simulink includes a component for simulating continuous-time models, employing a variety of numerical integration methods, which can be freely mixed with discrete-time sampled-data subsystems. This is very useful when modeling and simulating, for example, a control algorithm for automotive electronics, in which the engine dynamics are modeled with differential equations, while the controller is described as a set of blocks implementing a sampled-time subsystem.
Simulink is still mostly used to drive software design, despite good toolkits implementing it in reconfigurable hardware [54,55]. Simulators in the hardware design domain, on the other hand, generally use HDLs as their input languages. Analog extensions of both VHDL [56] and Verilog [57] are available. In both cases, one can represent quantities that satisfy either of Kirchhoff's laws (i.e., conserved over cycles or nodes). Thus one can easily build netlists of analog components interfacing with the digital portion, modeled using traditional Boolean or multivalued signals. The simulation environment will then take care of synchronizing the event-driven portion and the continuous-time portion. A key problem here is to avoid causality errors, when an event that happens later in host workstation time (because the simulator takes care of it later) has an effect on events that preceded it in simulated time. In this case, one of the simulators has to roll back in time, undoing any potential changes in the state of the simulation, and restart with the new information that something has happened in the past (generally the analog simulator does it, since it is easier to reverse time in that case).
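A minimal sketch of this checkpoint-and-rollback scheme, assuming a forward-Euler analog solver on the toy equation dx/dt = −x (the step size and all values below are illustrative): the analog side runs ahead of the digital side, and when a digital event arrives with a timestamp in its past, it restores the last checkpoint and re-integrates up to the event time.

```c
/* Sketch of causality repair in analog-digital cosimulation.
 * The analog solver integrates dx/dt = -x with forward Euler;
 * on a late-arriving digital event it rolls back to a saved
 * checkpoint and replays up to the event time. */
typedef struct { double t, x; } analog_state_t;

static void euler_step(analog_state_t *s, double h)
{
    s->x += h * (-s->x);   /* dx/dt = -x, explicit Euler */
    s->t += h;
}

/* Advance the analog state to event_t with step h; if the event
 * lies in the simulated past, roll back to the checkpoint first.
 * Returns x at the event time and refreshes the checkpoint. */
double cosim_advance(analog_state_t *s, analog_state_t *checkpoint,
                     double event_t, double h)
{
    if (event_t < s->t)       /* causality error: event in the past */
        *s = *checkpoint;     /* roll back: easy for the analog side */
    while (s->t + h <= event_t)
        euler_step(s, h);
    *checkpoint = *s;         /* new checkpoint at the event time */
    return s->x;
}
```

Re-integration is cheap for the analog solver because its whole state is a small vector, which is exactly why the analog simulator, rather than the event-driven one, usually performs the rollback.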
Also in this case, as we have seen for hardware–software cosimulation, execution is much slower than in the pure event-driven or cycle-based case, due to the need to take small simulation steps in the analog part. There is only one case in which the performance of the interface between the two domains, or of the continuous-time simulator, is not problematic: when the continuous-time part is much slower in reality than the digital part. A classical example is automotive electronics, in which mechanical time constants are larger by several orders of magnitude than the clock period of a modern integrated circuit. Thus the performance of continuous-time electronics and mechanical cosimulation may not be the bottleneck, except in the case of extremely complex environment models with huge systems of differential equations (e.g., accurate combustion engine models). In that case, hardware emulation of the differential equation solver is the only option (e.g., see Reference 16).
3.6 Software Implementation
The next two sections provide an overview of traditional design flows for embedded hardware and software. They are meant to be used as a general introduction to the topics described in the rest of the book, and as a source of references to standard design practice.
The software components of an embedded system are generally implemented using the traditional design–code–test–debug cycle, which is often represented using a V-shaped diagram to illustrate the fact that every implementation level of a complex software system must have a corresponding verification level (Figure 3.4). The parts of the V-cycle that relate to system design and partitioning have been described in the previous sections. Here we outline the tools that are available to the embedded software developer.
3.6.1 Compilation, Debugging, and Memory Model
Compilation of mathematical formulas into binary machine-executable code followed almost immediately the invention of the electronic computer. The first Fortran compiler dates back to 1954, and subroutines
FIGURE 3.4 V-cycle for software implementation. [Figure: a V-shaped diagram whose descending branch runs from requirements through system design and partitioning and SW design specification down to implementation, and whose ascending branch runs back up through SW integration, subsystem and communication testing, and function and system analysis to system validation and the final product.]
were introduced in 1958, resulting in the creation of the Fortran II language. Since then, languages have
evolved a little, more structured programming methodologies have been developed, and compilers have
improved quite a bit, but the basic method has remained the same. In particular, the C language, originally designed by Kernighan and Ritchie [58] between 1969 and 1972, and used extensively for programming the Unix operating system, is now dominant in the embedded system world, having almost replaced the more flexible but much more cumbersome and less portable assembler. Its descendants Java and C++ are beginning to make some inroads, but are still viewed as requiring too much memory and computing power for widespread embedded use. Java, although originally designed for embedded applications [59,60], has a memory model based on garbage collection that still defies effective embedded real-time implementation [61].
The first compilation step from a high-level language is the conversion of the human-written or machine-generated code into an internal format, called an Abstract Syntax Tree [62], which is then translated into a representation that is closer to the final output (generally assembler code) and is suitable for a host of optimizations. This representation can take the form of a control/dataflow graph or a sequence of register transfers. The internal format is then mapped, generally via a graph-matching algorithm, to the set of available machine instructions, and written out to a file. A set of assembler files, in which references to data variables and to subroutine names are still based on symbolic labels, is then converted to an absolute binary file, in which all addresses are explicit. This phase is called assembly and loading. Relocatable code generation techniques, which basically permit code and its data to be placed anywhere in memory without requiring recompilation, are now being used also in the embedded system domain, thanks to the availability of index registers and relative addressing modes in modern microprocessors.
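The label-resolution part of assembly and loading can be sketched as a classic two-pass algorithm. The one-word instruction record below is invented for illustration: pass 1 records the address of every label, pass 2 replaces each symbolic branch target with its absolute address.

```c
#include <string.h>

/* Toy two-pass assembler for the symbolic-label resolution step. */
#define MAXSYM 16

typedef struct { const char *name; int addr; } sym_t;

typedef struct {
    const char *label;    /* label defined at this address, or NULL */
    const char *target;   /* symbolic branch target, or NULL        */
    int resolved;         /* absolute target address after pass 2   */
} insn_t;

/* Returns the number of symbols collected. */
int assemble(insn_t *prog, int n)
{
    sym_t tab[MAXSYM];
    int nsym = 0;

    for (int pc = 0; pc < n; pc++) {        /* pass 1: collect labels */
        if (prog[pc].label && nsym < MAXSYM) {
            tab[nsym].name = prog[pc].label;
            tab[nsym].addr = pc;
            nsym++;
        }
    }
    for (int pc = 0; pc < n; pc++)          /* pass 2: patch references */
        if (prog[pc].target)
            for (int s = 0; s < nsym; s++)
                if (strcmp(tab[s].name, prog[pc].target) == 0)
                    prog[pc].resolved = tab[s].addr;
    return nsym;
}
```

A relocating loader performs essentially the same patching a second time, adding the base address at which the code is finally placed.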
Debuggers for modern embedded systems are much more vital than for general-purpose programming, due to the more limited accessibility of the embedded CPU (often no file system, limited display and keyboard, etc.). They must be able to show several concurrent threads of control, as they interact with each other and with the underlying hardware. They must also be able to do so while minimally disrupting the normal operation of the system, since it often has to work in real time, interacting with its environment. Both hardware and operating system support are essential, and the main RTOS vendors, such as Wind River, all provide powerful interactive multitask debuggers. Hardware support takes the form of breakpoint and watchpoint registers, which can be set to interrupt the CPU when a given address is used for fetching or
data load/store, without requiring one to change the code (which may be in ROM) or to continuously
monitor data accesses, which would dramatically slow down execution.
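The mechanism can be sketched as follows. The register layout of the debug unit below is entirely hypothetical (real debug units, such as ARM's watchpoint hardware, differ in detail); it serves only to show that watching an address is a matter of programming two registers, with no change to the code under test and no runtime slowdown.

```c
#include <stdint.h>

/* Hypothetical memory-mapped debug unit: the hardware compares every
 * data-access address against wp_addr and interrupts the CPU on a
 * match. Layout and bit assignments are invented for illustration. */
typedef struct {
    volatile uint32_t wp_addr;   /* address to watch                 */
    volatile uint32_t wp_ctrl;   /* bit 0: enable, bit 1: watch writes */
} dbg_unit_t;

#define WP_ENABLE  (1u << 0)
#define WP_WRITES  (1u << 1)

/* Arm a watchpoint that fires on any write to addr. */
void set_write_watchpoint(dbg_unit_t *dbg, uint32_t addr)
{
    dbg->wp_addr = addr;
    dbg->wp_ctrl = WP_ENABLE | WP_WRITES;
}
```

On a real target, `dbg` would be a fixed MMIO address taken from the device's memory map, and the match would raise a debug exception handled by the debugger stub.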
A key difference between most embedded software and most general-purpose software is the memory model. In the latter case, memory is viewed as an essentially infinite uniform linear array, and the compiler provides a thin layer of abstraction on top of it, by means of arrays, pointers, and records (or structs). The operating system generally provides virtual memory capabilities, in the form of user functions to allocate and deallocate memory, and by swapping less frequently used pages of main memory to disk. This provides the illusion of a memory as large as the disk area allocated to paging, but with the same direct addressability characteristics as main memory. In embedded systems, however, memory is an expensive resource, both in terms of size and speed. Cost, power, and physical size constraints generally forbid the use of virtual memory, and performance constraints force the designer to always carefully lay out data in memory, and match the characteristics of each kind of memory (SRAM, DRAM, Flash, ROM) to those of the data and code.
Scratchpads [63], that is, manually managed areas of small and fast memory, often on-chip SRAM, are still dominant in the embedded world. Caches are frowned upon in the real-time application domain, since the time at which a computation is performed often matters much more than the accuracy of its result. This is because, despite a large body of research devoted to timing analysis of software code in the presence of caches (e.g., see References 64 and 65), their performance must still be assumed to be worst-case, rather than average-case as in general-purpose and scientific computing, thus leading to poor performance at a high cost (large and power-hungry tag arrays).
However, compilers that traditionally focused on code optimizations for various underlying architectural features of the processor [66] now offer more and more support for memory-oriented optimizations, in terms of scheduling data transfers, sizing memories of various types, and allocating data to memory, sometimes moving it back and forth between fast-and-expensive and slow-and-cheap storage¹ [2,63].
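A sketch of such explicit placement and staging is shown below, assuming a GCC-style toolchain and a linker script that maps the hypothetical section names onto the corresponding physical memories; both the section names and the sizes are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Explicit, software-managed memory layout: the programmer, not a
 * cache, decides what lives in fast on-chip SRAM. The section names
 * below are hypothetical and must be placed by the linker script. */
__attribute__((section(".scratchpad")))
int16_t window[64];                     /* hot working set: fast SRAM  */

__attribute__((section(".ext_dram")))
int16_t samples[4096];                  /* bulk data: slow, cheap DRAM */

/* Explicitly stage one window of samples into the scratchpad: the
 * software-managed analogue of a cache fill, but fully predictable. */
void stage_window(size_t start)
{
    for (size_t i = 0; i < 64; i++)
        window[i] = samples[start + i];
}
```

Unlike a cache, this transfer happens exactly when and where the code says it does, which is what keeps cost, power, and worst-case timing under tight control.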
3.6.2 Real-Time Scheduling
Another key difference with respect to general-purpose software is the real-time nature of most embedded software, due to its continual interaction with an environment that seldom can wait. In hard real-time applications, results produced after the deadline are totally useless. On the other hand, in soft real-time applications a merit function measures Quality of Service (QoS), allowing one to evaluate trade-offs between missing various deadlines and degrading the precision or resolution with which computations are performed. While the former is often associated with safety-critical (e.g., automotive or avionics) applications and the latter with multimedia and telecommunication applications, algorithm design can make a difference even within the very same domain. Consider, for example, a frame decoding algorithm that generates its result at the end of each execution, and that is scheduled to be executed in real time every 50th of a second. If the CPU load does not allow it to complete each execution before the deadline, the algorithm will not produce any results, and thus behaves as a hard real-time application, without being life-threatening. On the other hand, a smarter algorithm or a smarter scheduler would just reduce the frame size or the frame rate whenever the CPU load due to other tasks increases, and thus produce a result that has lower quality, but is still viewable.
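The "smarter algorithm" above can be sketched as a decoder that degrades its frame size instead of missing its deadline outright; the quadratic cost model and all numbers below are assumptions made purely for illustration.

```c
/* Soft real-time adaptation sketch: shrink the frame until the
 * (assumed) decode cost fits the time budget, down to a floor. */
typedef struct {
    int frame_size;      /* current resolution (pixels per side) */
    int min_size;        /* lowest acceptable quality            */
} decoder_t;

/* budget_us: time available for this frame. Cost model: decode time
 * grows with frame area (illustrative). Returns the size produced. */
int decode_adaptive(decoder_t *d, long budget_us)
{
    long cost_us = (long)d->frame_size * d->frame_size / 10;
    while (cost_us > budget_us && d->frame_size > d->min_size) {
        d->frame_size /= 2;              /* degrade gracefully */
        cost_us = (long)d->frame_size * d->frame_size / 10;
    }
    return d->frame_size;
}
```

A hard real-time decoder would simply miss the deadline and emit nothing; this one always produces some viewable frame.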
A huge amount of research, summarized in excellent books such as References 23–25, has been devoted to solving the problems introduced by real-time constraints on embedded software. Most of this work models the system (application, environment, and platform) in very abstract terms, as a set of tasks, each with a release time (when the task becomes ready), a deadline (by which the task must complete), and a WCET. In most cases tasks are periodic, that is, release times and deadlines of multiple instances of the same task are separated by a fixed period. The job of the scheduler is to find an execution order, if one exists, such that each task can complete by its deadline. The scheduler may or may not, depending on the underlying hardware and software platform (CPU, peripherals, and RTOS), be able to preempt an executing
¹ While this may seem similar to virtual memory techniques, it is generally done explicitly, always keeping cost, power, and performance under tight control.
task in order to execute another one. Generally the scheduler bases its preemption decision, and the choice of which task must be run next, on an integer rank assigned to each task, called its priority. Priorities may be assigned statically, at compile time, or dynamically, at runtime. The trade-off is between the usage of precious CPU resources for runtime (also called online) priority assignment, based on an observation of the current execution conditions, versus the waste of resources inherent in the a priori definition of a priority assignment. A scheduling algorithm is also generally expected to be able to tell conservatively if a set of tasks is unschedulable on a given platform, given a set of modeling assumptions (e.g., availability of preemption, fixed or stochastic execution time, and so on). Unschedulability may occur, for example, because the CPU is not powerful enough and the WCETs are too long to satisfy some deadline. In this case the remedy could be the choice of a faster clock frequency, a change of CPU, the transfer of some functionality to a hardware coprocessor, or the relaxation of some of the constraints (periods, deadlines, etc.).
A key distinction in this domain is between time-triggered and event-triggered scheduling [67]. The former (also called Time-Division Multiple Access in telecommunications) relies on the fact that the start, preemption (if applicable), and end times of all instances of all tasks are decided a priori, based on worst-case analysis. The resulting system implementation is very predictable, easy to debug, and allows one to guarantee some service even under fault hypotheses [68]. The latter decides start and preemption times based on the actual time of occurrence of the release events, and possibly on the actual execution time (shorter than worst-case). It is more efficient than time-triggering in terms of CPU utilization, especially when release and execution times are not known precisely but are subject to jitter. It is, however, more difficult to use in practice, because it requires some form of conservative schedulability analysis a priori, and the dynamic nature of event arrival makes troubleshooting much harder.
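A time-triggered dispatcher can be sketched as a static table indexed by the timer tick; the 4-slot schedule and the three tasks below are illustrative assumptions.

```c
/* Time-triggered dispatcher: the start time of every task instance
 * within the hyperperiod is fixed offline in a dispatch table, and
 * the periodic timer interrupt merely indexes it. */
#define SLOTS 4

typedef void (*task_fn)(void);

int runs[3];                               /* per-task invocation counts */
static void sensor(void)  { runs[0]++; }
static void control(void) { runs[1]++; }
static void actuate(void) { runs[2]++; }

/* Static schedule for one hyperperiod of 4 slots, decided a priori
 * from worst-case analysis: sensor runs twice per hyperperiod. */
static const task_fn schedule[SLOTS] = { sensor, control, sensor, actuate };

/* Called from the periodic timer interrupt. */
void tick(unsigned long tick_count)
{
    schedule[tick_count % SLOTS]();
}
```

The resulting timeline is identical on every run, which is precisely what makes time-triggered systems predictable, easy to debug, and certifiable.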
Some models and languages listed above, such as synchronous languages and dataflow networks, lend themselves well to time-triggered implementations. Some form of time-triggered scheduling is being, or will most likely be, used for both CPUs and communication resources in safety-critical applications. This is already state of the art in avionics (fly-by-wire, as used, e.g., in the Boeing 777 and in all Airbus models), and it is being seriously considered for automotive applications (X-by-wire, where X can stand for brake, drive, or steer). Coupled with certified high-level language compilers and standardized code review and testing processes, it is considered to be the only mechanism able to comply with the rules imposed by various governmental certification agencies. Moving such control functions to embedded hardware and software, thus replacing older mechanical parts, is considered essential in order to both reduce costs and improve safety. Embedded electronic systems can continuously analyze possible wear and faults in the sensors and the actuators, and thus warn drivers or maintenance teams.
The simple task-based model outlined above can also be modified in various ways in order to take into account:

• The cost of various housekeeping operations, such as recomputing priorities, swapping tasks in and out (also called context switching), accessing memory, and so on.
• The availability of multiple resources (processors).
• The fact that a task may need more than one resource (e.g., the CPU, a peripheral, a lock on a given part of memory), and possibly may have different priorities and different preemptability characteristics on each such resource (e.g., CPU access may be preemptable, while disk or serial line access may not).
• Data or control dependencies between tasks.
Most of these refinements of the initial model can be taken into account by appropriately modifying the basic parameters of a task set (release time, execution time, priority, and so on). The only exception is the extension to multiple concurrent CPUs, which makes the problem substantially more complex. We refer the interested reader to References 23–25 for more information about this subject. This sort of real-time schedulability analysis is currently replacing manual trial-and-error and extensive simulation as a means to ensure satisfaction of deadlines or of a given QoS requirement.
3.7 Hardware Implementation
The modern hardware implementation process [69,70] in most cases starts from the so-called RTL. At this
level of abstraction the required functionality of the circuit is modeled with the accuracy of a clock cycle,
that is, it is known in which clock cycle each operation, such as addition or data transfer, occurs, but the
actual delay of each operation, and hence the stabilization time of data on the inputs of the registers, is not
known. At this level the number of registers and their bitwidths are also precisely known. The designer
usually writes the model using an HDL, such as Verilog or VHDL, in which registers are represented using
special kinds of clock-triggered assignments, and combinational logic operations are represented using
the standard arithmetic, relational, and Boolean operators that are familiar to software programmers
using high-level languages.
The target implementation generally is not in terms of individual transistors and wires, but uses the Boolean gate abstraction as a convenient hand-off point between logic designer and technology specialist. Such abstraction can take the form of a standard cell, that is, an interconnection of transistors realized and well characterized on silicon, which implements a given Boolean function, and exhibits a specific propagation delay from inputs to outputs, under given supply, temperature, and load conditions. It can also be a Combinational Logic Block (CLB) in an FPGA. The former, which is the basis of the modern ASIC design flow, is much more efficient than the latter;² however, it requires a very significant investment in terms of EDA³ tools, mask production costs, and engineer training.
The advantage of ASICs over FPGAs in terms of area, power, and performance efficiency comes from two main factors. The first is the broader choice of basic gates: an average standard cell library includes about 100 to 500 gates, with both different logic functions and different drive strengths, while a given FPGA contains only one type of CLB. The second is the use of static interconnection techniques, that is, wires and contact vias, versus the transistor-based dynamic interconnects of FPGAs.
The much higher nonrecurrent engineering cost of ASICs comes first of all from the need to create at least one set of masks for each design (assuming it is correct the first time, that is, there is no need to respin), which can be up to about $1 million for current technologies and is growing very fast, and from the long fabrication times, which can be up to several weeks. Design costs are also higher, again in the million-dollar range, both due to the much greater flexibility, requiring skilled personnel and sophisticated implementation tools, and due to the very high cost of design failure, requiring sophisticated verification tools. Thus ASIC designs are the most economically viable solution only for very high volumes. The rising mask costs and manufacturing risks are making the FPGA option viable for larger and larger production counts as technology evolves. A third alternative, structured ASICs, has been proposed recently. It features fixed layout schemes, similar to FPGAs, but implements interconnect using contact vias. A comparison of the alternatives, for a given design complexity and varying production volumes, is shown in Figure 3.5 (the exact points at which each alternative is best are still subject to debate, and they are moving to the right over time).
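The trade-off of Figure 3.5 reduces to comparing a fixed nonrecurrent (NRE) cost plus a per-unit cost for each option. The sketch below uses invented cost figures purely to illustrate how the cheapest option shifts with volume; it is not real cost data.

```c
/* Total-cost model behind the ASIC/FPGA/structured-ASIC comparison:
 * total = NRE + unit_cost * volume. All figures are hypothetical. */
typedef struct { double nre; double unit; } process_t;

double total_cost(process_t p, long volume)
{
    return p.nre + p.unit * (double)volume;
}

/* Given the three options in the order FPGA (0), structured ASIC (1),
 * standard-cell ASIC (2), return the index of the cheapest at the
 * given production volume. */
int cheapest(const process_t opt[3], long volume)
{
    int best = 0;
    for (int i = 1; i < 3; i++)
        if (total_cost(opt[i], volume) < total_cost(opt[best], volume))
            best = i;
    return best;
}
```

With high-NRE/low-unit-cost ASICs at one extreme and zero-NRE/high-unit-cost FPGAs at the other, the winner changes twice as volume grows, reproducing the crossovers of Figure 3.5.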
3.7.1 Logic Synthesis and Equivalence Checking
The semantics of HDLs and of languages such as C or Java are very different from each other. HDLs were born in the 1970s in order to model highly concurrent hardware systems, built using registers and Boolean gates. They, and the associated simulators that allow one to analyze the behavior of the modeled design in detail, are very efficient in handling fine-grained concurrency and synchronization, which is necessary when simulating huge Boolean netlists. However, they often lack constructs found in modern programming languages, such as recursive functions and complex data types (only recently introduced in Verilog), or objects, methods, and interfaces. An HDL model is essentially meant to be simulated under
² The difference is about one order of magnitude in terms of area, power, and performance for the current fabrication technology, and the ratio is expected to remain constant over future technology generations.
³ The term EDA, which stands for Electronic Design Automation, is often used to distinguish this class of tools from the CAD tools used for mechanical and civil engineering design.
FIGURE 3.5 Comparison between ASIC, FPGA, and Structured ASIC production costs. [Figure: total cost versus production volume for FPGA, structured ASIC (SA), and standard cell implementations; the cheapest option changes first from FPGA to structured ASIC, and then to standard cells, at crossover volumes roughly in the 10,000 to 100,000 range and above.]
a variety of timing models (generally at the register transfer or gate level, even though cosimulation with analog components or continuous-time models is also supported, that is, in Verilog-AMS and AHDL).
Synthesis from an HDL into an interconnection of registers and gates normally consists of two substeps. The first, called RTL synthesis and module generation, transforms high-level operators, such as adders, multiplexers, and so on, into Boolean gates using an appropriate architecture (e.g., ripple carry or carry lookahead). The second, called logic synthesis, optimizes the combinational logic resulting from the above step, under a variety of cost and performance constraints [71,72].
It is well known that, given a function to be implemented (e.g., 32-bit two's-complement addition), one can use the properties of Boolean algebra in order to find alternative implementations with different characteristics in terms of:

1. Area, for example, estimated as the number of gates, or as the number of gate inputs, or as the number of literals in the Boolean expression representing each gate function, or using a specific value for each gate selected from the standard cell library, or even considering an estimate of interconnect area. This sequence of cost functions increases in estimation precision, but is more and more expensive to compute.
2. Delay, for example, estimated as the number of levels, or more precisely as a combination of levels and fanout of each gate, or even more precisely as a table that takes into account gate type, transistor size, input transition slope, output capacitance, and so on.
3. Power, for example, estimated as transition activity times capacitance times voltage squared, using the well-known equation valid for Complementary MOS (CMOS) transistors.

It is also well known that Pareto-optimal solutions to this problem generally exhibit an area-delay product that is approximately constant for a given function.
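The CMOS dynamic power estimate mentioned in item 3 is commonly written P = α · C · V² · f, with switching activity α, switched capacitance C, supply voltage V, and clock frequency f. A one-line sketch, with purely illustrative parameter values used in the usage note:

```c
/* Dynamic power of CMOS logic: P = alpha * C * V^2 * f.
 * alpha: average switching activity (transitions per cycle),
 * cap_f: total switched capacitance in farads,
 * vdd:   supply voltage in volts,
 * freq_hz: clock frequency in hertz. Returns watts. */
double cmos_dynamic_power(double alpha, double cap_f, double vdd, double freq_hz)
{
    return alpha * cap_f * vdd * vdd * freq_hz;
}
```

For example, α = 0.2, C = 1 nF, V = 1.2 V, f = 100 MHz gives about 29 mW; the quadratic dependence on V is why supply-voltage scaling is the most effective power lever.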
Modern EDA tools, such as Design Compiler from Synopsys [8], RTL Compiler from Cadence [28], Leonardo Spectrum from Mentor Graphics [49], Synplify from Synplicity [73], Blast Create from Magma Design Automation [74], and others, perform this task efficiently for designs that today may include a few million gates. Their widespread adoption has enabled designers to tackle huge designs in a matter of months, which would have been unthinkable or extremely inefficient using either manual or purely block-based design techniques. Such logic synthesis systems take into account the required functionality, the target clock cycle, and the set of physical gates that are available for implementation (the standard-cell library or the CLB characteristics, e.g., number of inputs), as well as some estimates of capacitance and resistance of interconnection wires,⁴ and generate efficient netlists of Boolean gates, which can be passed on to the following design steps.
⁴ Some such tools also include rough placement and routing steps, which will be described below, in order to increase the precision of such interconnect estimates for current deep submicron (DSM) technologies.
2006 by Taylor & Francis Group, LLC
ZURA: 2824_C003 2005/6/21 20:01 page 18 #20
3-18 Embedded Systems Handbook
While synthesis is performed using precise algebraic identities, bugs can creep into any program.
Thus, in order to avoid extremely costly respins due to an EDA tool bug, it is essential to verify that the
functionality of the synthesized gate netlist is the same as that of the original RTL model. This verification
step was traditionally performed using a multilevel HDL simulator, comparing responses to designer-written
stimuli in both representations. However, multimillion-gate circuits would require too many very
slow simulation steps (a large circuit today can be simulated at a speed of only a handful of clock cycles
per second). Formal verification is thus used to prove, using algorithms that are based on the same laws
as the synthesis techniques, but which have been written by different people and thus hopefully have different
bugs, that the responses of the two circuit models are indeed identical under all legal input sequences.
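The idea of checking that two combinational circuit models respond identically can be sketched, for purely illustrative purposes, as brute-force enumeration over all input vectors; production equivalence checkers use BDD- or SAT-based algorithms instead, and the full-adder "spec" and XOR-chain "netlist" below are invented examples.

```python
from itertools import product

def equivalent(f, g, n_inputs):
    """Exhaustively compare two combinational functions over all 2^n inputs.
    Real equivalence checkers use BDDs or SAT instead of enumeration."""
    return all(f(*v) == g(*v) for v in product([0, 1], repeat=n_inputs))

# RTL "spec": the sum bit of a 1-bit full adder;
# synthesized "netlist": a chain of XOR gates.
spec = lambda a, b, cin: (a + b + cin) % 2
netlist = lambda a, b, cin: a ^ b ^ cin
print(equivalent(spec, netlist, 3))  # True
```

The enumeration is exponential in the number of inputs, which is precisely why symbolic techniques are indispensable for real circuits.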
This verification, however, solves only half of the problem. One must also check that all combinational
logic computations complete within the required clock cycle. This second check can be performed using
timing simulators; complexity considerations, however, suggest the use of a more static approach.
Static Timing Analysis, based on worst-case longest-path search within combinational logic, is today
a workhorse of any logic synthesis and verification framework. It can be based on purely topological
information, or consider only so-called true paths along which a transition can propagate [75], or even
include the effects of crosstalk on path delay. Crosstalk may alter the delay of a victim wire, due to
simultaneous transitions of temporally and spatially close aggressor wires, as analyzed by tools such
as PrimeTime from Synopsys [8] and CeltIc from Cadence [28]. This kind of coupling between timing and
geometry makes crosstalk-aware timing analysis very hard, and contributes substantially to the breaking of
traditional boundaries between synthesis, placement, and routing.
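The core of static timing analysis, worst-case longest-path search over a combinational DAG, can be sketched as follows; the three-gate netlist and its delay values are hypothetical, and real tools additionally model gate type, fanout, input slopes, and interconnect, as described above.

```python
def longest_path_delay(gates, delays):
    """Static timing analysis as a longest-path search over a combinational
    DAG. `gates` maps each gate to its fanin list; primary inputs have no
    entry. `delays` gives each gate's propagation delay (arbitrary units)."""
    arrival = {}

    def arr(g):
        # Arrival time at a gate output: its own delay plus the latest fanin.
        if g not in arrival:
            fanin = gates.get(g, [])
            arrival[g] = delays.get(g, 0.0) + max((arr(f) for f in fanin), default=0.0)
        return arrival[g]

    return max(arr(g) for g in gates)

# Tiny example: inputs a, b feed g1, g2, which converge on g3.
netlist = {"g1": ["a"], "g2": ["b"], "g3": ["g1", "g2"]}
delays = {"g1": 1.0, "g2": 2.5, "g3": 1.5}
print(longest_path_delay(netlist, delays))  # 4.0 (b -> g2 -> g3)
```

Comparing the reported worst-case arrival time against the clock period is exactly the check described in the text.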
Tools performing these tasks are available from all major EDA vendors (e.g., Synopsys, Cadence)
as well as from a host of startups. Synthesis has become more or less a commodity technology, while
formal verification, even in its simplest form of equivalence checking, as well as in other emerging forms,
such as property checking, which is described below, is still an emerging technology, for which disruptive
innovation occurs mostly in smaller companies.
3.7.2 Placement, Routing, and Extraction
After synthesis (and sometimes during synthesis) gates are placed on silicon, either at fixed locations (the
positions of CLBs) for FPGAs and Structured ASICs, or with a row-based organization for standard cell
ASICs. Placement must avoid overlaps between cells, while at the same time satisfying clock cycle time
constraints by avoiding excessively long wires on critical paths.⁵
Placement, especially for multimillion-gate circuits, is an extremely difficult problem, which requires
complex constrained combinatorial optimization. Modern algorithms [76] drastically simplify the model
in order to ensure reasonable runtimes. For example, the quadratic placement model used in several
modern EDA tools minimizes the sum of squares of net lengths. This permits very efficient derivation
of the cost function and fast identification of a minimum-cost solution. However, this quadratic cost
only approximately correlates with the true objective, which is the minimization of the clock period, due
to parasitic capacitance. The true cost first of all depends also on the actual interconnect, which is designed
only later by the routing step, and second depends on the maximum among a set of sums (one for each
register-to-register path), rather than on the sum over all gate-to-gate interconnects. For this reason,
modern placers interleave steps solved using fast but approximate algorithms with more precise analysis
phases, often involving actual routing, in order to recompute the actual cost function at each step.
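The quadratic placement idea can be illustrated in one dimension: minimizing the sum of squared two-pin net lengths places each movable cell at the average of its neighbors' positions, which simple Gauss–Seidel iteration finds. The two-cell chain between fixed pads below is an invented example; real placers work in two dimensions, with overlap removal and timing-driven net weights.

```python
def quadratic_place_1d(nets, fixed, movable, iters=200):
    """One-dimensional quadratic placement sketch. `nets` is a list of
    two-pin nets (a, b); `fixed` maps pad names to coordinates; each
    movable cell is iteratively moved to the average of its neighbors,
    the minimum of the sum-of-squared-lengths cost."""
    pos = dict(fixed)
    pos.update({c: 0.0 for c in movable})
    adj = {c: [] for c in movable}
    for a, b in nets:
        if a in adj:
            adj[a].append(b)
        if b in adj:
            adj[b].append(a)
    for _ in range(iters):
        for c in movable:
            pos[c] = sum(pos[n] for n in adj[c]) / len(adj[c])
    return pos

# Pads at x=0 and x=10; chain p0 - u - v - p1.
nets = [("p0", "u"), ("u", "v"), ("v", "p1")]
pos = quadratic_place_1d(nets, {"p0": 0.0, "p1": 10.0}, ["u", "v"])
print(round(pos["u"], 3), round(pos["v"], 3))  # 3.333 6.667
```

The closed-form optimum spreads the chain evenly, which also shows the model's weakness noted in the text: it optimizes total squared wirelength, not the critical register-to-register path.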
Routing is the next step, and involves generating (or selecting from the available prelaid-out tracks
in FPGAs) the metal and via geometries that will interconnect the placed cells. It is also extremely difficult
in modern submicron technologies, not only due to the huge number of geometries involved (10 million
gates can easily involve a billion wire segments and contacts), but also due to the complexity of modern
⁵Power density has recently become a prime concern for placement as well, implying the need to avoid hot spots
of very active cells, where power dissipation through the silicon substrate would be too difficult to manage.
Design of Embedded Systems 3-19
interconnect modeling. A wire used to be modeled, in CMOS technology, essentially as a parasitic capacitance.
This (or minor variations also considering resistance) is still the model used by several commercial
logic synthesis tools. However, nowadays a realistic model of a wire, to be used when estimating the cost
of a placement or of a routing solution, must take into account:

Realistic resistance and capacitance, for example, using the Elmore model [77], considering each
wire segment separately, due to the very different resistance and capacitance characteristics of
different metal layers.⁶

Crosstalk noise due to capacitive coupling.⁷
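The Elmore model cited above can be sketched for a simple RC ladder: each segment's resistance multiplies all capacitance downstream of it. The per-segment resistance and capacitance values below are hypothetical.

```python
def elmore_delay(segments):
    """Elmore delay of an RC ladder: each segment i contributes
    R_i times the total capacitance downstream of (and including) itself.
    `segments` is a list of (R_ohm, C_farad) pairs from driver to load."""
    delay = 0.0
    for i, (r, _) in enumerate(segments):
        downstream_c = sum(c for _, c in segments[i:])
        delay += r * downstream_c
    return delay

# Hypothetical 3-segment wire: 100 ohm and 10 fF per segment.
print(elmore_delay([(100, 10e-15)] * 3))  # about 6 ps for this wire
```

Because resistance near the driver sees all downstream capacitance, the model captures why long wires on resistive lower metal layers are so much slower than the same wires on upper layers.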
This means that, exactly as in placement (and sometimes during placement), one needs to alternate
between fast routing using approximate cost functions and detailed analysis steps that refine the value of the
cost function. Again, all major EDA vendors offer solutions to the routing problem, which are generally
tightly integrated with the placement tool, even though in principle the two perform separate functions.
The reason for the tight coupling lies in the above-mentioned need for the placer to accurately estimate
the detailed route taken by a given interconnect, rather than just approximating it with the square of the distance
between its terminals.
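A classical grid-routing method, Lee's breadth-first wavefront algorithm, gives a flavor of what the routing step computes; the 3×3 grid with one blocked row below is an invented example, and production routers must additionally handle multiple metal layers, vias, and design rules.

```python
from collections import deque

def lee_route(grid, src, dst):
    """Lee's maze-routing algorithm: expand a BFS wave from the source
    until the target is reached, then trace back the shortest wire path.
    `grid` cells equal to 1 are blocked."""
    rows, cols = len(grid), len(grid[0])
    prev = {src: None}
    q = deque([src])
    while q:
        r, c = q.popleft()
        if (r, c) == dst:  # backtrace from target to source
            path = []
            while (r, c) != src:
                path.append((r, c))
                r, c = prev[(r, c)]
            return [src] + path[::-1]
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 \
                    and (nr, nc) not in prev:
                prev[(nr, nc)] = (r, c)
                q.append((nr, nc))
    return None  # unroutable

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(lee_route(grid, (0, 0), (2, 0)))
# [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0)]
```

The wave expansion guarantees a shortest path if one exists, which is why Lee's algorithm, despite its age, still underlies detailed-routing engines.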
Exactly as in the case of synthesis, a verification step must be performed after placement and routing.
This is required in order to verify that:

All design rules are satisfied by the final layout.

All and only the desired interconnects have been realized by placement and routing.

This step is done by extracting electrical and logic models from the layout masks, and comparing these models
with the input netlist (already verified for equivalence with the RTL). Note that within each standard
cell, design rules are verified independently, since the ASIC designer, for reasons of intellectual property
protection, generally does not see the actual layout of the standard cells, but only an external envelope
of active (transistor) and interconnect areas, which is sufficient to perform this kind of verification. The
layout of each cell is known and used only at the foundry, when masks are finally produced.
3.7.3 Simulation, Formal Verification, and Test Pattern Generation
The steps mentioned above create a layout implementation from the RTL, while checking simultaneously that
no errors are introduced, either due to programming errors or due to manual modifications, and that
performance and power constraints are satisfied. However, they ensure neither that the original RTL
model satisfies the customer-defined requirements, nor that the manufactured circuit is free of
flaws compromising either its functionality or its performance.

The former problem is tackled by simulation, prototyping, and formal verification. None of these
techniques is sufficient to ensure that an ill-defined problem has a solution: customer needs are inherently
nonformalizable.⁸ However, they help build up confidence that the final product will
satisfy the requirements. Simulation and prototyping are both trial-and-error procedures, similar to the
compile–debug cycle used for software. Simulation is generally cheaper, since it only requires a general-purpose
workstation (nowadays often a PC running Linux), while prototyping is faster (it is based on
synthesizing the RTL model into one or several FPGAs). Cost and performance of these options differ by
⁶Layers that are farther away from silicon are best for long-distance wires, due to the smaller substrate and mutual
capacitance, as well as the smaller sheet resistance [78].

⁷Inductance fortunately does not yet play a significant role, and many doubt that it ever will for digital integrated
circuits.

⁸For example, what is the definition of a correct phone call? Does this refer to not dropping the communication?
To transferring exactly a certain number of voice samples per second? To setting up a communication path quickly?
Since all these desirable characteristics have a cost, what is the maximum price various classes of customers are willing
to pay for them, and what is the maximum degree of violation that can be admitted by each class?
several orders of magnitude. Prototyping on multi-FPGA platforms, such as those offered by Quickturn,
is thus limited to the most expensive designs, such as microprocessors.⁹
Unfortunately, both simulation and prototyping suffer from a basic capacity problem. It is true that
cost decreases exponentially and performance increases exponentially over technology generations for the
simulation and prototyping platforms (CPUs and FPGAs). However, the complexity of the verification
problem grows as a double or even triple exponential (approximately) with technology. The reason is that
the number of potential states of a digital design grows exponentially with the number of memory-holding
components (flip-flops and latches), and the complexity of the verification problem for a sequential entity
(e.g., an FSM) grows even more than exponentially with its state space. For this reason, the number
of input patterns required to prove, up to a given level of confidence, that a design is
correct grows triply exponentially with each technology generation, while capacity and performance grow
only as a single exponential. This is clearly an untenable situation, given that the number of engineers is
finite, and the size of verification teams is already much larger than that of design teams.
Formal verification, defined as proving semiautomatically that, under a set of assumptions, a given
property holds for a design, is a means of alleviating at least the human aspect of this verification
complexity explosion. Formal verification allows one to state a property, such as, for example,
"this protocol never deadlocks" or "the value of this register is never overwritten before being read,"
using relatively simple mathematical formulas. Then one can automatically check that the property holds
over all possible input sequences. The problem, unfortunately, is inherently extremely complex (the triple
exponential mentioned above affects this formulation as well). However, the complexity is now relegated to
the automated portion of the flow. Thus manual generation and checking of individual pattern sequences
is no longer required. Several EDA companies on the market, such as Cadence, Mentor Graphics, and Synopsys,
as well as several silicon vendors, such as Intel and IBM, currently offer or internally develop and use such
tools. The key barriers to adoption are twofold:
1. The complexity of the task, as mentioned above, is just shifted. While a workstation costs much
less than an engineer, exponential growth is never tenable in the long term, regardless of the
constant factors. This means that significant human intervention is still required in order to keep
the time required to check each individual property within acceptable limits. This involves both
breaking properties into simpler subproperties and abstracting away aspects of the system that
are not relevant for the property at hand. Abstraction, however, hides aspects of the real design
from the automated prover, and thus implies the risk of false positive results, that is, of declaring
a system correct even when it is not.

2. Specification of properties is much more difficult than identification of input patterns. A property
must encompass a variety of possible scenarios and state explicitly all assumptions made (e.g.,
there is no deadlock in the bus access protocol only if no master makes requests at every clock
cycle). The language in which properties are specified is often a form of mathematical logic, and
thus is even less familiar than software languages to a typical design engineer.
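The flavor of such property checking can be conveyed by an explicit-state sketch that explores every reachable state and tests a safety property such as mutual exclusion. The strict-alternation protocol below is an invented toy; real tools use symbolic (BDD- or SAT-based) representations precisely because explicit enumeration hits the exponential wall discussed above.

```python
from collections import deque

def check_invariant(init, step, inputs, prop, max_states=10_000):
    """Explicit-state model checking sketch: explore all states reachable
    under every input choice and check a safety property in each one."""
    seen, q = {init}, deque([init])
    while q:
        s = q.popleft()
        if not prop(s):
            return False  # property violated in a reachable state
        for i in inputs:
            t = step(s, i)
            if t not in seen:
                if len(seen) >= max_states:
                    raise RuntimeError("state space too large for explicit search")
                seen.add(t)
                q.append(t)
    return True

# Toy mutual-exclusion protocol (strict alternation): each process cycles
# idle -> wait -> crit, entering crit only when it holds the turn.
def step(state, pid):
    pcs, turn = list(state[0]), state[1]
    if pcs[pid] == "idle":
        pcs[pid] = "wait"
    elif pcs[pid] == "wait" and turn == pid:
        pcs[pid] = "crit"
    elif pcs[pid] == "crit":
        pcs[pid] = "idle"
        turn = 1 - pid
    return (tuple(pcs), turn)

init = (("idle", "idle"), 0)
mutex = lambda s: not (s[0][0] == "crit" and s[0][1] == "crit")
print(check_invariant(init, step, [0, 1], mutex))  # True
```

A property like "never deadlocks" would be checked the same way, by testing each reachable state for the existence of an enabled transition.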
However, significant progress is being made in this area every year by researchers, and adoption of such
automated formal verification techniques in the specification verification domain is growing.
Testing a manufactured circuit to verify that it operates correctly according to the RTL model is a
closely related problem. In principle, one would need to prove equivalent behavior under all possible
input–output sequences, which is clearly impossible. In practice, test engineers either use a naturally
orthogonal architecture, such as that of a microprocessor, in order to functionally test small sequences of
instructions, or they decompose testing into that of combinational and sequential logic. Combinational
logic testing is a relatively easy task, as compared to the formal verification described above. If one
considers only Boolean functionality (i.e., delay is not tested), its complexity (assuming that no polynomial
⁹Nowadays even microprocessors are mostly designed using a modified ASIC-like flow, except for memories, register
files, and sometimes portions of the ALU, which are still designed by hand down to the polygon level, at least for leading-edge
CPUs.
algorithm exists for NP-complete problems) is just a single exponential in the number of combinational
circuit inputs.
While a priori there is no reason why testing only Boolean equivalence between the specification and the
manufactured circuit should be enough to ensure correct functionality, empirically there is a significant
amount of evidence that fully testing for a relatively small class of Boolean manufacturing faults, namely
stuck-at faults, is sufficient to ensure satisfactory actual yield for ASICs. The stuck-at-fault model assumes
that the only problem that can occur during manufacturing is that some gate inputs are fixed
at logical 0 or 1. This may have been a physically realistic model in the early days of bipolar-based
Transistor–Transistor Logic. However, in DSM CMOS a host of physical defects may short wires together,
increase or decrease their resistance and capacitance, short a transistor gate to its source or drain, and so
on. At the logic level, a combinational function may become sequential (or, even worse, may exhibit dynamic
behavior, that is, slowly change output values over time without changing inputs), or it may become faster
or slower. Still, full checking for stuck-at faults is excellent at ensuring that none of these complex physical
problems has occurred or will affect the operation of the circuit.
For this reason, today testing is mostly accomplished by first of all reducing sequential testing to
combinational testing, using special memory elements, the so-called scan flip-flops and latches. Second,
combinational test pattern generation is performed only at the Boolean level, using the above-mentioned
stuck-at model. Test pattern generation is similar to equivalence checking, because it amounts to proving
that two copies of the same circuit, one with and one without a given fault, are indeed not equivalent. The
witness to this nonequivalence is the pattern to be applied to the circuit inputs to identify the fault.
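The "two copies of the circuit" view of test pattern generation can be sketched by enumeration: search for an input vector on which the fault-free netlist and a copy with one node stuck at a fixed value disagree. The two-gate netlist below is invented, and real ATPG tools use the D-algorithm or SAT-based search rather than enumeration.

```python
from itertools import product

def stuck_at_pattern(circuit, fault_node, stuck_value, n_inputs):
    """Test-pattern generation sketch: find an input vector on which the
    good circuit and a copy with `fault_node` stuck at `stuck_value`
    disagree. `circuit(inputs, fault)` evaluates the netlist, forcing
    the faulty node's value when `fault` is given."""
    for vec in product([0, 1], repeat=n_inputs):
        if circuit(vec, None) != circuit(vec, (fault_node, stuck_value)):
            return vec  # witness pattern that detects the fault
    return None  # fault is undetectable (redundant logic)

# Hypothetical netlist: n1 = a AND b; out = n1 OR c.
def circuit(inputs, fault):
    a, b, c = inputs
    nodes = {"n1": a & b}
    if fault and fault[0] in nodes:
        nodes[fault[0]] = fault[1]  # inject the stuck-at fault
    return nodes["n1"] | c

print(stuck_at_pattern(circuit, "n1", 0, 3))  # (1, 1, 0)
```

The returned vector both *activates* the fault (drives n1 to the opposite of its stuck value) and *propagates* the difference to an observable output, which is exactly the nonequivalence witness described above.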
The problem of actually applying the pattern to the physical fragment of combinational logic, and then
observing its outputs to verify whether the fault is present, is solved by converting all or most of the registers
of the sequential circuit into one (or a handful of) giant shift registers, each including several hundred
thousand bits. The pattern (and several others, used to test several CLBs in parallel) is first loaded serially
through the shift register. Then a multiplexer at the input of each flip-flop is switched, transforming the
serial loading mode into parallel loading mode, using the outputs of each CLB as register inputs. Finally,
serial conversion is performed again, and the outputs of the logic are checked for correctness by the test
equipment. Figure 3.6 shows an example of this sort of arrangement, in which the flip-flop clock is also
switched from normal operation (in which it can be gated) to test mode. The only drawback of this elegant
solution, devised by IBM engineers in the 1970s, is the additional time that the circuit needs to spend on
very expensive testing machines, in order to shift patterns in and out through very long flip-flop chains.
Test pattern generation for combinational circuits is a very well-established area of research, and again the
reader is referred to one of the many books in the area for a more extensive description [79].
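The scan-chain mechanism just described (serial load, one parallel capture cycle, serial unload) can be mimicked in a few lines of Python; the bitwise inverter standing in for the combinational logic under test is an invented example.

```python
def scan_test(pattern, logic):
    """Scan-chain sketch: shift a test pattern serially into the flip-flops,
    switch to parallel mode for one capture cycle so the combinational
    logic's outputs are latched, then shift the response back out."""
    n = len(pattern)
    chain = [0] * n
    # Serial load: one bit enters per test clock, the rest shift along.
    for bit in pattern:
        chain = [bit] + chain[:-1]
    # The pattern sits reversed relative to shift order; reorder it.
    chain = chain[::-1]
    # Capture cycle: flip-flops latch the combinational outputs in parallel.
    chain = logic(chain)
    # Serial unload: shift the captured response out bit by bit.
    response = []
    for _ in range(n):
        response.append(chain[-1])
        chain = [0] + chain[:-1]
    return response[::-1]

# Toy combinational block under test: a bitwise inverter.
print(scan_test([1, 0, 1, 1], lambda bits: [b ^ 1 for b in bits]))
# [0, 1, 0, 0]
```

The 2n shift cycles surrounding the single capture cycle are exactly the tester-time overhead lamented in the text.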
Note that memories are not tested using this mechanism, both because it would be too expensive to
convert each cell into a scan register, and because the stuck-at-fault model does not apply to this kind
of circuit. Memories are tested using appropriate input–output pattern sequences, which are generated,
applied, and verified on-chip, using either self-test software running on the embedded processor, or some
FIGURE 3.6 Two scan flip-flops with combinational logic. (Signals shown: Test_Data, Test_Mode, Test_Clk, User_Clk, Q, Sout.)
form of Built-In Self-Test (BIST) logic circuitry. Modern RAM generators, which directly produce the layout
in a given process based on the requested number of rows and columns, often directly produce the
BIST circuitry as well.
3.8 Conclusions
This chapter discussed several aspects of embedded system design, including both methodologies that
allow one to make judicious algorithmic and architectural decisions, and tools supporting various
steps of these methodologies. One must not forget, however, that embedded systems are often complex
compositions of parts that have been implemented by various parties, and thus the task of physical board
or chip integration can be as difficult as, and much more expensive than, the initial architectural decisions.
In order to support the integration and system testing tasks one must use formal models throughout
the design process, and if possible perform early evaluation of the difficulties of integration, by virtual
integration and rapid prototyping techniques. These allow one to find, or avoid completely, subtle bugs
and inconsistencies earlier in the design cycle, and thus reduce overall design time and cost.
Thus the flow and tools that we described in this chapter help not only with the initial design, but
also with the final integration. This is because they are based on executable specifications of the whole
system (including models of its environment), early virtual integration, and systematic (often automated)
refinement toward implementation.
The last part of the chapter summarized the main characteristics of the current hardware and software
implementation flows. While complete coverage of this huge topic is beyond its scope, a lightweight introduction
can hopefully serve to direct the interested reader, who has only a general electrical engineering
or computer science background, toward the most appropriate sources of information.
References
[1] F. Balarin, E. Sentovich, M. Chiodo, P. Giusto, H. Hsieh, B. Tabbara, A. Jurecska, L. Lavagno,
C. Passerone, K. Suzuki, and A. Sangiovanni-Vincentelli. Hardware–Software Co-design of
Embedded Systems: The POLIS Approach. Kluwer Academic Publishers, Dordrecht, 1997.
[2] F. Catthoor, S. Wuytack, E. De Greef, F. Balasa, L. Nachtergaele, and A. Vandecapelle. Custom
Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia
System Design. Kluwer Academic Publishers, Dordrecht, 1998.
[3] The Mathworks Simulink and StateFlow. http://www.mathworks.com.
[4] National Instruments MATRIXx. http://www.ni.com/matrixx/.
[5] ETAS Ascet-SD. http://www.etas.de.
[6] CoWare N2C, SPW, and LISATek. http://www.coware.com.
[7] Esterel Technologies Esterel Studio. http://www.esterel-technologies.com.
[8] Synopsys Design Compiler, System Studio, and PrimeTime. http://www.synopsys.com.
[9] Telelogic Tau and Doors. http://www.telelogic.com.
[10] I-Logix Statemate and Rhapsody. http://www.ilogix.com.
[11] D. Harel, H. Lachover, A. Naamad, A. Pnueli, M. Politi, R. Sherman, A. Shtull-Trauring, and
M.B. Trakhtenbrot. STATEMATE: a working environment for the development of complex reactive
systems. IEEE Transactions on Software Engineering, 16:403–414, 1990.
[12] The Object Management Group UML. http://www.omg.org/uml/.
[13] L. Lavagno, G. Martin, and B. Selic, Eds. UML for Real: Design of Embedded Real-Time Systems.
Kluwer Academic Publishers, Dordrecht, 2003.
[14] Artisan Software Real Time Studio. http://www.artisansw.com/.
[15] IBM Rational Rose RealTime. http://www.rational.com/products/rosert/.
[16] dSPACE TargetLink and Prototyper. http://www.dspace.de.
[17] OSEK/VDX. http://www.osek-vdx.org/.
[18] E.A. Lee and D.G. Messerschmitt. Synchronous data flow. Proceedings of the IEEE, 75(9):1235–1245,
1987.
[19] J. Buck and R. Vaidyanathan. Heterogeneous modeling and simulation of embedded systems in
El Greco. In Proceedings of the International Conference on Hardware Software Codesign, May 2000.
[20] TNI Valiosys Reqtify. http://www.tni-valiosys.com.
[21] R.P. Kurshan. Automata-Theoretic Verification of Coordinating Processes. Princeton University Press,
Princeton, NJ, 1994.
[22] K. McMillan. Symbolic Model Checking. Kluwer Academic Publishers, Dordrecht, 1993.
[23] G. Buttazzo. Hard Real-Time Computing Systems: Predictable Scheduling Algorithms and Applica-
tions. Kluwer Academic Publishers, Dordrecht, 1997.
[24] H. Gomaa. Software Design Methods for Concurrent and Real-Time Systems. Addison-Wesley,
Reading, MA, 1993.
[25] W.A. Halang and A.D. Stoyenko. Constructing Predictable Real Time Systems. Kluwer Academic
Publishers, Dordrecht, 1991.
[26] R. Ernst, J. Henkel, and T. Benner. Hardware–software codesign for micro-controllers. IEEE Design
and Test of Computers, 10:64–75, 1993.
[27] R.K. Gupta and G. De Micheli. Hardware–software cosynthesis for digital systems. IEEE Design
and Test of Computers, 10:29–41, 1993.
[28] Cadence Design Systems CeltIc, RTL Compiler, and Quickturn. http://www.cadence.com.
[29] Open SystemC Initiative. http://www.systemc.org.
[30] G. Berry. The foundations of Esterel. In Plotkin, Stirling, and Tofte, Eds., Proof, Language and
Interaction: Essays in Honour of Robin Milner. MIT Press, Cambridge, MA, 2000.
[31] S.A. Edwards. Compiling Esterel into sequential code. In International Workshop on
Hardware/Software Codesign. ACM Press, May 1999.
[32] T.B. Ismail, M. Abid, and A.A. Jerraya. COSMOS: a codesign approach for communicating systems.
In International Workshop on Hardware/Software Codesign. ACM Press, 1994.
[33] W. Cesario, A. Baghdadi, L. Gauthier, D. Lyonnard, G. Nicolescu, Y. Paviot, S. Yoo, A.A. Jerraya,
and M. Diaz-Nava. Component-based design approach for multicore SoCs. In Proceedings of the
Design Automation Conference, June 2002.
[34] Foresight Systems. http://www.foresight-systems.com.
[35] CARDtools. http://www.cardtools.com.
[36] IMEC ATOMIUM. http://www.imec.be/design/atomium/.
[37] P. Cousot and R. Cousot. Abstract interpretation: a unified lattice model for static analysis of
programs by construction or approximation of fixpoints. In Proceedings of the ACM Symposium
on Principles of Programming Languages. ACM Press, 1977.
[38] AbsInt Worst-Case Execution Time Analyzers. http://www.absint.com.
[39] Y.T.S. Li and S. Malik. Performance analysis of embedded software using implicit path
enumeration. In Proceedings of the Design Automation Conference, June 1995.
[40] 0-In Design Automation. http://www.0-in.com/.
[41] C. Norris Ip. Simulation coverage enhancement using test stimulus transformation. In Proceedings
of the International Conference on Computer Aided Design, November 2000.
[42] Forte Design Systems Cynthesizer. http://www.forteds.com.
[43] Celoxica DK Design suite. http://www.celoxica.com.
[44] K. Wakabayashi. Cyber: high level synthesis system from software into ASIC. In R. Camposano
and W. Wolf, Eds., High Level VLSI Synthesis. Kluwer Academic Publishers, Dordrecht, 1991.
[45] D. Gajski, J. Zhu, and R. Domer. The SpecC Language. Kluwer Academic Publishers, Dordrecht,
1997.
[46] D. Gajski, J. Zhu, R. Domer, A. Gerstlauer, and S. Zhao. SpecC: Specification Language and
Methodology. Kluwer Academic Publishers, Dordrecht, 2000.
[47] OPNET. http://www.opnet.com.
[48] Network Simulator NS-2. http://www.isi.edu/nsnam/ns/.
[49] Mentor Graphics Seamless and Emulation. http://www.mentor.com.
[50] VAST Systems CoMET. http://www.vastsystems.com/.
[51] Axys Design Automation MaxSim and MaxCore. http://www.axysdesign.com/.
[52] J. Rowson. Hardware/software co-simulation. In Proceedings of the Design Automation Conference,
1994, pp. 439–440.
[53] V. Zivojnovic and H. Meyr. Compiled HW/SW co-simulation. In Proceedings of the Design
Automation Conference, 1996.
[54] Altera DSP Builder. http://www.altera.com.
[55] Xilinx System Generator. http://www.xilinx.com.
[56] IEEE. Standard 1076.1, VHDL-AMS. http://www.eda.org/vhdl-ams.
[57] OVI. Verilog-A standard. http://www.ovi.org.
[58] B. Kernighan and D. Ritchie. The C Programming Language. Prentice-Hall, New York, 1988.
[59] K. Arnold and J. Gosling. The Java Programming Language. Addison-Wesley, Reading, MA, 1996.
[60] Sun Microsystems, Inc. Embedded Java Specification. Available at http://java.sun.com, 1998.
[61] Real-Time for Java Expert Group. The Real-Time Specification for Java. Available at http://
rtsj.dev.java.net/, 1998.
[62] A.V. Aho, J.E. Hopcroft, and J.D. Ullman. The Design and Analysis of Computer Algorithms.
Addison-Wesley, Reading, MA, 1974.
[63] P. Panda, N. Dutt, and A. Nicolau. Efficient utilization of scratch-pad memory in embedded
processor applications. In Proceedings of Design Automation and Test in Europe (DATE), February
1997.
[64] Y.T.S. Li, S. Malik, and A. Wolfe. Performance estimation of embedded software with instruc-
tion cache modeling. In Proceedings of the International Conference on Computer-Aided Design,
November 1995.
[65] F. Mueller and D.
…may mean that data is sent from CPU1 to CPU2 with a higher
frequency, at least for a limited amount of time. This means that the bus is more loaded by this traffic,
which may slow down the communication from CPU3 to CPU4. If this communication performance has
a direct influence on the system performance, we will see a decreased overall system performance.
Over-synchronization. Assume that the upper and lower branches in Figure 4.1 have no mutual functional
dependence, as the dataflow arrows indicate. Assume further that process B is blocked when it tries to send
data to C1 or D1 but the receiver is not ready to accept the data. Then, a delay or deadlock in branch D
will propagate back through process B to both A and the entire C branch.
These examples are not limited to situations where different MoCs interact. They show that, when
separate, seemingly unrelated subsystems interact via a nonobvious mechanism, which is often a shared
resource, the effects can be hard to analyze. When the different subsystems are modeled in different
MoCs the problem is even more pronounced, due to different communication semantics, synchronization
mechanisms, and time representations.
4.1.5 Time
The treatment of time will serve as the most important dimension along which we distinguish MoCs. We can
identify at least four levels of accuracy: continuous time, discrete time, clocked time, and
causality. In the sequel, we cover only the last three levels.

When time is not modeled explicitly, events are only partially ordered with respect to their causal
dependences. In one approach, taken for instance in deterministic dataflow networks [14, 15], the system
FIGURE 4.1 Over-synchronization between functionally independent subsystems. (Processes shown: A, B, C1–C3, D1–D3.)
Models of Embedded Computation 4-7
behavior is independent of the delays and timing behavior of computation elements and communication
channels. These models are robust with respect to time variations, in that any implementation, no matter
how slow or fast, will exhibit the same behavior as the model. Alternatively, different delays may affect
the system's behavior, and we obtain an inherently nondeterministic model, since timing behavior that is
not modeled explicitly is allowed to influence the observable behavior. This approach has been taken both
in the context of dataflow models [16–19] and process algebras [20, 21]. In this chapter we follow the
deterministic approach, which can be generalized to approximate nondeterministic behavior by means of
stochastic processes, as shown in Reference 22.
To exploit the very regular timing of some applications, the synchronous dataflow (SDF) model [23] has been
developed. Every process consumes and emits a statically fixed number of events in each evaluation cycle.
The evaluation cycle is the reference time. The regularity of the application is translated into a restriction
of the model, which in turn allows efficient analysis and synthesis techniques that are not applicable to
more general models. Scheduling, buffer size optimization, and synthesis techniques have been successfully
developed for SDF.
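The restriction that every SDF process produces and consumes statically fixed token counts is what makes these analyses possible: the so-called balance equations can be solved for the smallest integer repetition vector, as the following sketch shows. The two-actor graph is an invented example.

```python
from fractions import Fraction
from math import lcm

def repetition_vector(arcs, actors):
    """SDF balance equations sketch: for each arc (src, produced, dst,
    consumed), a consistent periodic schedule needs
    r[src] * produced == r[dst] * consumed. Propagate rational firing
    rates from an arbitrary root actor, then scale to the smallest
    integer vector."""
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:
        changed = False
        for src, p, dst, c in arcs:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * p / c
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * c / p
                changed = True
    scale = lcm(*(r.denominator for r in rate.values()))
    return {a: int(rate[a] * scale) for a in actors}

# A produces 2 tokens per firing, B consumes 3: fire A 3 times per 2 of B.
print(repetition_vector([("A", 2, "B", 3)], ["A", "B"]))  # {'A': 3, 'B': 2}
```

From the repetition vector, a static schedule and exact buffer bounds follow, which is exactly what compile-time SDF tools exploit.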
One facet related to the representation of time is the dichotomy between dataflow-dominated and control-flow-dominated
applications. Dataflow-dominated applications tend to have events that occur at very regular
intervals. Thus, explicit representation of time is not necessary, and is in fact often inefficient. In contrast,
control-dominated applications deal with events occurring at very irregular time instants. Consequently,
explicit representation of time is a necessity, because the timing of events cannot be inferred. Difficulties
arise in systems that contain both elements. Unfortunately, these kinds of systems are becoming more common
as the average system complexity steadily increases. As a consequence, several attempts to integrate
dataflow and control-dominated modeling concepts have emerged.
In the synchronous piggybacked dataflow model [24], control events are transported on dataflow
streams to represent a global state without breaking the locality principle of dataflow models.
The composite signal flow [25] distinguishes between control and dataflow processes and puts significant
effort into maintaining the frame-oriented processing that is so common in dataflow and signal
processing applications for efficiency reasons. However, conflicts occur when irregular control events
must be synchronized with dataflow events inside frames. The composite signal flow addresses this problem
by allowing an approximation of the synchronization, and defines conditions under which approximations
are safe and do not lead to erroneous behavior.
Time is divided into time slots or clock cycles by the various synchronous models. According to the
perfect synchrony assumption [26, 27], neither communication nor computation takes any noticeable
time, and the time slots or evaluation cycles are completely determined by the arrival of input events. This
assumption is useful because designers and tools can concentrate solely on the functionality of the system
without mixing this activity with timing considerations. Optimization of performance can be done in
a separate step by means of static timing analysis and local retiming techniques. Even though timing
does not appear explicitly in synchronous models, the behavior is not independent of time. The model
constrains all implementations such that they must be fast enough to process input events properly and
to complete an evaluation cycle before the next events arrive. When no events occur in an evaluation
cycle, a special token called the absent event is used to communicate the advance of time. In our framework
we use the same technique in Sections 4.2.4 and 4.2.5 for both the synchronous MoC and the fully
timed MoC.
Discrete timed models use a discrete set, usually the integers or natural numbers, to assign a time stamp to each event. Many discrete event models fall into this category [28–30], as do most popular hardware description languages, such as VHDL and Verilog. Timing behavior can be modeled most accurately, which makes this the most general model we consider here and makes it applicable to problems such as detailed performance simulation where synchronous and untimed models cannot be used. The price for this is the intimate dependence of functional behavior on timing details and significantly higher computation costs for analysis, simulation, and synthesis problems. Discrete timed models may be nondeterministic, as mainly used in performance analysis and simulation (see, e.g., Reference 30), or deterministic, as is more desirable for hardware description languages such as VHDL.
2006 by Taylor & Francis Group, LLC
4-8 Embedded Systems Handbook
The integration of these different timing models into a single framework is a difficult task. Many attempts have been made on a practical level with a concrete design task, mostly simulation, in mind [31–35]. On a conceptual level Lee and Sangiovanni-Vincentelli [36] have proposed a tagged time model in which every event is assigned a time tag. Depending on the tag domain we obtain different MoCs. If the tag domain is a partially ordered set, it results in an untimed model according to our definition. Discrete, totally ordered sets lead to timed MoCs, and continuous sets result in continuous time MoCs. There are two main differences between the tagged time model and our proposed framework. First, in the tagged time model processes do not know how much time has progressed when no events are received, since global time is only communicated via the time stamps of ordinary events. For instance, a process cannot trigger a time-out if it has not received events for a particular amount of time. Our timed model in Section 4.2.5 does not use time tags but absent events to globally order events. Since absent events are communicated between processes whenever no other event occurs, processes are always informed about the advance of global time. We chose this approach because it better resembles the situation in design languages, such as VHDL, C, or SDL (Specification and Description Language), where processes can always experience time-outs. Second, one of our main motivations was the separation of communication and synchronization issues from the computation part of processes. Hence, we strictly distinguish between process interfaces and process functionality. Only the interfaces determine to which MoC a process belongs, while the core functionality is independent of the MoC. This feature is absent from the tagged time model. This separation of concerns has been inspired by the concept of firing cycles in dataflow process networks [37]. Our mechanism for consuming and emitting events based on signal partitionings, as described in Sections 4.2.2 and 4.2.3.1, is only slightly more general than the firing rules described by Lee [37], but it allows a useful definition of process signatures based on the way processes consume and emit events.
4.1.6 The Purpose of an MoC
As mentioned several times, the purpose of a computational model determines how it is designed, what properties it exposes, and what properties it suppresses.
We argue that MoCs for embedded systems should not address principal questions of computability or feasibility, but should rather aid the design and validation of concrete systems. How this is best accomplished remains a subject of debate, but for this chapter we assume that an MoC should support the following properties:
Implementation independence. An abstract model should not expose too many details of a possible implementation, for example, which kind of processor is used, how many parallel resources are available, what kind of hardware implementation technology is used, details of the memory architecture, etc. Since an MoC is a machine abstraction, it should, by definition, avoid unnecessary machine details. Practically speaking, the benefits of an abstract model include that analysis and processing are faster and more efficient, that analysis results are relevant for a larger set of implementations, and that the same abstract model can be directed to different architectures and implementations. On the downside we note diminished analysis accuracy and a lack of knowledge of the target architecture that could be exploited for modeling and design. Hence, the right abstraction level is a fine line that is also changing over time. While many embedded system designers could for long safely assume a purely sequential implementation, current and future computational models should avoid such an assumption. Resource sharing and scheduling strategies are becoming more complex, and an MoC should thus either allow the explicit modeling of such a strategy or restrict the implementations to follow a particular, well-defined strategy.
Composability. Since many parts and components are typically developed independently and integrated into a system, it is important to avoid unexpected interferences. Thus, some kind of composability property [38] is desirable. One step in this direction is to have a deterministic computational model, such as Kahn process networks, that guarantees a particular behavior independent of the timing of individual activities and independent of the amount of available resources in general.
Models of Embedded Computation 4-9
This is of course only a first step since, as argued earlier, time behavior is often an integral part of the functional behavior. Thus, resource sharing strategies, which greatly influence timing, will still have a major impact on the system behavior even for fully deterministic models. We can reconcile good system composability with shared resources by allocating a minimum but guaranteed amount of resources for each subsystem or task. For instance, two tasks get a fixed share of the communication bandwidth of a bus. This approach allows for ideal composability but has to be based on worst-case behavior. It is very conservative and hence does not utilize resources efficiently.
We can relax this approach by allocating abstract resource budgets as part of the computational model. We then require the implementation to provide the requested resources and, at the same time, to minimize the abstract budgets and thus the required resources. As an example, consider two tasks that have a particular communication need per abstract time slot, where the communication need may be different for different slots. The implementation has to fulfill the communication requirements of all tasks by providing the necessary bandwidth in each time slot, tuning the length of the individual time slots, or by moving communication from one slot to another. These optimizations will also have to consider global timing and resource constraints. In any case, in the abstract model we can deal with abstract budgets and assume that they will be provided by any valid implementation.
Analyzability. A general tradeoff exists between the expressiveness of a model and its analyzability. By restricting models in clever ways, one can apply powerful and efficient analysis and synthesis methods. For instance, the SDF model allows each actor only a constant number of input and output tokens in each activation cycle. While this restricts the expressiveness of the model, it allows static schedules to be computed efficiently when they exist. For general dataflow graphs this may not be possible because it could be impossible to ensure that the number of input and output tokens is always constant for all actors, even if it is in a particular case. Since SDF covers a fairly large and important application domain, it has become a very useful MoC. The key is to understand what the important properties are (finding static schedules, finding memory bounds, finding maximum delays, etc.) and to devise an MoC that allows these properties to be handled efficiently and does not restrict the modeling power too much.
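To make the SDF case concrete, the following Python sketch solves the SDF balance equations (tokens produced per firing of the source actor times its repetition count must equal tokens consumed times the destination's repetition count) for the smallest positive integer repetition vector, from which a static schedule can be derived. The edge and actor encoding is our own illustration, not notation from this chapter, and the graph is assumed to be connected.

```python
from fractions import Fraction
from math import gcd, lcm

def repetition_vector(edges, actors):
    """Solve the SDF balance equations r[src]*prod = r[dst]*cons for the
    smallest positive integer repetition vector (assumes a connected graph)."""
    rates = {a: None for a in actors}
    rates[actors[0]] = Fraction(1)
    # Propagate relative firing rates along edges until all actors are assigned.
    changed = True
    while changed:
        changed = False
        for src, dst, prod, cons in edges:
            if rates[src] is not None and rates[dst] is None:
                rates[dst] = rates[src] * prod / cons
                changed = True
            elif rates[dst] is not None and rates[src] is None:
                rates[src] = rates[dst] * cons / prod
                changed = True
    # Consistency check: every edge must balance, otherwise no static schedule.
    for src, dst, prod, cons in edges:
        if rates[src] * prod != rates[dst] * cons:
            raise ValueError("inconsistent SDF graph: no static schedule")
    # Scale the rational rates to the smallest integer vector.
    denom = lcm(*(r.denominator for r in rates.values()))
    ints = {a: int(rates[a] * denom) for a in actors}
    g = 0
    for v in ints.values():
        g = gcd(g, v)
    return {a: v // g for a, v in ints.items()}

# Actor A produces 2 tokens per firing; B consumes 3 per firing.
print(repetition_vector([("A", "B", 2, 3)], ["A", "B"]))  # {'A': 3, 'B': 2}
```

For a general dataflow graph, where token rates may vary from firing to firing, the consistency check above is exactly what cannot be carried out statically.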
In the following sections we discuss a framework to study different MoCs. The idea is to use different types of process constructors to instantiate processes of different MoCs. Thus, one type of process constructor would yield only untimed processes, while another type results in timed processes. The elements for process construction are simple functions and are in principle independent of a particular MoC. However, the independence is not complete since some MoCs put specific constraints on the functions. But still the separation of the process interfaces from the internal process behavior is fairly far reaching. The interfaces determine the time representation, synchronization, and communication, and hence the MoC.
In this chapter we will not elaborate all interesting and desirable properties of computational models. Rather, we will use the framework to introduce four different MoCs that differ only in their timing abstraction. Since time plays a very prominent role in embedded systems, we focus on this aspect and show how different time abstractions can serve different purposes and needs. Another defining aspect of embedded systems is heterogeneity, which we address by allowing different MoCs to coexist in a model. The common framework makes this integration semantically clean and simple. We study two particular aspects of this coexistence, namely the interfaces between two different MoCs and the refinement of one MoC into another.
Other central issues of embedded systems, such as power consumption and global analysis and optimization, are not covered, mostly because they are not very well understood in this context and few advanced proposals exist on how to deal with them from an MoC perspective.
4.2 The MoC Framework
In the remainder of this chapter we discuss a framework that accommodates MoCs with different timing abstractions. It is based on process constructors, a mechanism to instantiate processes. A process constructor takes one or more pure functions as arguments and creates a process. The functions represent
the process behavior and have no notion of time or concurrency. They simply take arguments and produce results. The process constructor is responsible for establishing communication with other processes. It defines the time representation, the communication, and the synchronization semantics. A set of process constructors determines a particular MoC. This leads to a systematic and clean separation of computation and communication. A function that defines the computation of a process can in principle be used to instantiate processes in different computational models. However, a computational model may put constraints on functions. For instance, the synchronous MoC requires a function to take exactly one event on each input and produce exactly one event for each output. The untimed MoC does not have a similar requirement.
After some preliminary definitions in this section, we introduce the untimed processes, give a formal definition of an MoC, and define the untimed MoC (Section 4.2.3), the perfectly synchronous and the clocked synchronous MoC (Section 4.2.4), and the discrete time MoC (Section 4.2.5). Based on this we introduce interfaces between MoCs and present an interface refinement procedure in the next section. Furthermore, we discuss the refinement from an untimed MoC to a synchronous MoC and to a timed MoC.
4.2.1 Processes and Signals
Processes communicate with each other by writing to and reading from signals. Given is a set of values V, which represents the data communicated over the signals. Events, which are the basic elements of signals, are or contain values. We distinguish among three different kinds of events. Untimed events Ė are just values without further information, Ė = V. Synchronous events Ē include a pseudo-value ⊥ in addition to the normal values, hence Ē = V ∪ {⊥}. Timed events Ê are identical to synchronous events, Ê = Ē. However, since it is often useful to distinguish them, we use different symbols. Intuitively, timed events occur at a much finer granularity than synchronous events and they would usually represent physical time units, such as a nanosecond. In contrast, synchronous events represent abstract time slots or clock cycles. This model of events and time can only accommodate discrete time models. Continuous time would require a different representation of time and events. We use the symbols ė, ē, and ê to denote individual untimed, synchronous, and timed events, respectively. We use E = Ė ∪ Ē ∪ Ê and e ∈ E to denote any kind of event.
Signals are sequences of events. Sequences are ordered and we use subscripts, as in e_i, to denote the ith event in a signal. For example, a signal may be written as ⟨e_0, e_1, e_2⟩. In general, signals can be finite or infinite sequences of events, and S is the set of all signals. We also distinguish among three kinds of signals: Ṡ, S̄, and Ŝ denote the untimed, synchronous, and timed signal sets, respectively, and ṡ, s̄, and ŝ designate individual untimed, synchronous, and timed signals.
⟨⟩ is the empty signal and ⊕ concatenates two signals. Concatenation is associative and has the empty signal as its neutral element: s_1 ⊕ (s_2 ⊕ s_3) = (s_1 ⊕ s_2) ⊕ s_3, ⟨⟩ ⊕ s = s ⊕ ⟨⟩ = s. To keep the notation simple we often treat individual events as one-event sequences; for example, we may write e ⊕ s to denote ⟨e⟩ ⊕ s.
We use angle brackets ⟨ and ⟩ not only to denote ordered sets or sequences of events, but also to denote sequences of signals if we impose an order on a set of signals.
#s gives the length of signal s. Infinite signals have infinite length and #⟨⟩ = 0.
[ ] is an index operation to extract the event at a particular position from a signal. For example, s[2] = e_2 if s = ⟨e_1, e_2, e_3⟩.
Processes are defined as functions on signals
p : S → S.
Processes are functions in the sense that for a given input signal we always get the same output signal, that is, s = s′ ⇒ p(s) = p(s′). Note that this still allows processes to have an internal state. Thus, a process does not necessarily react identically to the same event applied at different times. But it will
[Figure 4.2 shows a process p with input signal s = ⟨r_0, r_1, ...⟩ = ⟨⟨e_0, e_1, e_2⟩, ⟨e_3, e_4, e_5⟩, ...⟩, partitioned with ν(i) = 3 for all i, and output signal s′ = ⟨r′_0, r′_1, ...⟩ = ⟨⟨e′_0, e′_1⟩, ⟨e′_2, e′_3⟩, ...⟩, partitioned with ν′(i) = 2 for all i.]
FIGURE 4.2 The input signal of process p is partitioned into an infinite sequence of subsignals each of which contains three events, while the output signal is partitioned into subsignals of length 2.
produce the same, possibly infinite, output signal when confronted with identical, possibly infinite, input signals, provided it starts with the same initial state.
4.2.2 Signal Partitioning
We shall use the partitioning of signals into subsequences to define the portions of a signal that are consumed or emitted by a process in each evaluation cycle.
A partition π(ν, s) of a signal s defines an ordered set of signals, ⟨r_i⟩, which, when concatenated together, form almost the original signal s. The function ν : N_0 → N_0 defines the lengths of all elements in the partition. ν(0) = #r_0 gives the length of the first element in the partition, ν(1) = #r_1 gives the length of the second element, etc.
Example 4.1 Let s_1 = ⟨1, 2, 3, 4, 5, 6, 7, 8, 9, 10⟩ and ν_1(0) = ν_1(1) = 3, ν_1(2) = 4. Then we get the partition π(ν_1, s_1) = ⟨⟨1, 2, 3⟩, ⟨4, 5, 6⟩, ⟨7, 8, 9, 10⟩⟩.
Let s_2 = ⟨1, 2, 3, ...⟩ be the infinite signal with ascending integers. Let ν_2(i) = 2 for all i ≥ 0. The resulting partition is infinite: π(ν_2, s_2) = ⟨⟨1, 2⟩, ⟨3, 4⟩, ...⟩.
The function ν(i) defines the length of the subsignals r_i. If it is constant for all i we usually omit the argument and write ν. Figure 4.2 illustrates a process with an input signal s and an output signal s′; s is partitioned into subsignals of length 3 and s′ into subsignals of length 2.
Definition 4.2 Let p_1, p_2 : S → S be two processes with one input and one output each, and let s_1, s_2 ∈ S be two signals. Their parallel composition, denoted as p_1 ∥ p_2, is defined as follows.
(p_1 ∥ p_2)(⟨s_1, s_2⟩) = ⟨p_1(s_1), p_2(s_2)⟩.
Since processes are functions we can easily define sequential composition in terms of functional composition.
Definition 4.3 Let again p_1, p_2 : S → S be two processes and let s ∈ S be a signal. The sequential composition, denoted as p_1 ∘ p_2, is defined as follows.
(p_2 ∘ p_1)(s) = p_2(p_1(s)).
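Modeling processes as plain functions on signals, the two composition operators can be sketched in a few lines of Python; the encoding of signals as lists is our own illustration:

```python
def par(p1, p2):
    """Parallel composition p1 || p2: apply p1 and p2 to a pair of signals."""
    return lambda s1, s2: (p1(s1), p2(s2))

def seq(p1, p2):
    """Sequential composition: feed p1's output signal into p2."""
    return lambda s: p2(p1(s))

double = lambda s: [2 * e for e in s]   # an illustrative process
inc = lambda s: [e + 1 for e in s]      # another illustrative process

print(par(double, inc)([1, 2], [3, 4]))  # ([2, 4], [4, 5])
print(seq(double, inc)([1, 2]))          # [3, 5]
```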
Definition 4.4 Given a process p : (S × S) → (S × S) with two input signals and two output signals, we define the process μp : S → S by the equation
(μp)(s_1) = s_2 where p(s_1, s_3) = (s_2, s_3).
The behavior of the process μp is defined by the least fixed point semantics based on the prefix order of signals.
The μ operator gives feedback loops (Figure 4.3) a well-defined semantics. Moreover, the value of the feedback signal can be constructed by repeatedly simulating the process network, starting with the empty signal, until the values on all feedback signals stabilize and do not change any more [39].
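The iterative construction of the feedback signal can be sketched as follows. This Python illustration is our own encoding (signals as finite lists); the example process computes a running sum whose definition refers to its own output through the feedback loop:

```python
def mu(p):
    """Fixed point of a feedback loop: simulate repeatedly, starting
    from the empty feedback signal, until it stabilizes."""
    def looped(s1):
        fb = []
        while True:
            out, new_fb = p(s1, fb)
            if new_fb == fb:        # feedback signal has stabilized
                return out
            fb = new_fb
    return looped

def acc(s1, s3):
    """Running-sum process: out[i] = s1[i] + out[i-1], with the second
    output fed back as s3. zip truncates to the shorter signal, so each
    simulation round extends the output by at most one event."""
    out = [a + b for a, b in zip(s1, [0] + s3)]
    return out, out

print(mu(acc)([1, 2, 3]))  # [1, 3, 6]
```

Each round of the while loop corresponds to one simulation of the process network; for the three-event input the feedback signal grows ⟨⟩, ⟨1⟩, ⟨1, 3⟩, ⟨1, 3, 6⟩ and then stabilizes.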
Now we are in a position to define precisely what we mean by an MoC.
Definition 4.5 An MoC is a 2-tuple MoC = (C, O), where C is a set of process constructors, each of which, when given constructor-specific parameters, instantiates a process. O is a set of process composition operators, each of which, when given processes as arguments, instantiates a new process.
[Figure 4.3 shows a process p with external input s_1 and output s_2; its second output s_3 is fed back to its second input, forming the process μp.]
FIGURE 4.3 Feedback composition of a process.
Definition 4.6 The untimed MoC is defined as untimed MoC = (C, O), where
C = {mealyU, zipU, unzipU}
O = {∥, ∘, μ}.
In other words, a process or a process network belongs to the untimed MoC domain iff all its processes and process compositions are constructed either by one of the named process constructors or by one of the composition operators. We call such processes U-MoC processes.
Because the process interface is separated from the functionality of the process, interesting transformations can be done. For instance, a process can be mechanically transformed into a process that consumes and produces a multiple of the number of events of the original process. Processes can be easily merged into more complex processes. Moreover, there may be the opportunity to move functionality from one process to another. For more details on this kind of transformation see Reference 39.
4.2.4 The Synchronous MoC
The synchronous languages StateCharts [40], Esterel [41], Signal [42], Argos, Lustre [43], and some others have been developed on the basis of the perfect synchrony assumption.
Perfect synchrony hypothesis. Neither computation nor communication takes time.
Timing is entirely determined by the arrival of input events because the system processes input samples in zero time and then waits until the next input arrives. If the implementation of the system is fast enough to process all inputs before the next sample arrives, it will behave exactly as the specification in the synchronous language.
4.2.4.1 Process Constructors
Formally, we develop synchronous processes as a special case of untimed processes. This will allow us later to easily connect different domains.
Synchronous processes have two specific characteristics. First, all synchronous processes consume and produce exactly one event on each input or output in each evaluation cycle, that is, the signature is always ⟨1, . . .⟩, ⟨1, . . .⟩. Second, in addition to the value set V, events can carry the special value ⊥, which denotes the absence of an event; this is the way we defined synchronous events Ē and signals S̄ in Section 4.2.1. Both the processes and their contained functions must be able to deal with these events. All synchronous process constructors and processes operate exclusively on synchronous signals.
Definition 4.7 Let V be an arbitrary set of values, Ē = V ∪ {⊥}, let g, f : (Ē × S̄) → S̄, and let w_0 ∈ V be an initial state. mealyS is a process constructor which, given f, g, and w_0 as arguments, instantiates a process p : S̄ → S̄. p repeatedly applies g on the current state and the input event to compute the next state. Further it
applies f repeatedly on the current state and the input event to compute the output event. p consumes exactly one input event in each evaluation cycle and emits exactly one output event.
We only require that g and f are defined for absent input events and that the output signal partitioning is the constant 1.
When we merge two signals into one, we have to decide how to represent the absence of an event in one input signal in the compound signal. We choose to use the symbol ⊥ for this purpose also, which has the consequence that ⊥ also appears in tuples together with normal values. Thus, it is essentially used for two different purposes. Having clarified this, the definition of zipS and unzipS is straightforward. zipS-based processes pack two events from the two inputs into an event pair at the output, while unzipS performs the inverse operation.
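A minimal sketch of the mealyS constructor in Python may clarify the role of the absent event. We model ⊥ as None; the counter example below, and the whole encoding, are our own illustration rather than the chapter's formal definition:

```python
ABSENT = None   # stands for the absent event (bottom)

def mealy_s(g, f, w0):
    """mealyS sketch: one event consumed and one event emitted per
    evaluation cycle. g computes the next state, f the output event;
    both must also be defined for the absent event (None)."""
    def process(signal):
        state, out = w0, []
        for e in signal:
            out.append(f(state, e))
            state = g(state, e)
        return out
    return process

# Count nonabsent events; emit the current count, or absent on absent input.
g = lambda w, e: w if e is ABSENT else w + 1
f = lambda w, e: ABSENT if e is ABSENT else w + 1
counter = mealy_s(g, f, 0)
print(counter([7, ABSENT, 9, 4]))  # [1, None, 2, 3]
```

Note that the output has exactly as many slots as the input: even when nothing happens, the absent event keeps the time slots of both signals aligned.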
4.2.4.2 The Perfectly Synchronous MoC
Again, we can now make precise what we mean by the synchronous MoC.
Definition 4.8 The synchronous MoC is defined as synchronous MoC = (C, O), where
C = {mealyS, zipS, unzipS}
O = {∥, ∘, μS}.
In other words, a process or a process network belongs to the synchronous MoC domain iff all its processes and process compositions are constructed either by one of the named process constructors or by one of the composition operators. We call such processes S-MoC processes.
Note that we do not use the same feedback operator for the synchronous MoC. μS defines the semantics of the feedback loop based on the Scott order of the values in Ē. It is also based on a fixed point semantics, but it is resolved for each event and not over a complete signal. We have adopted μS to be consistent with the zero-delay feedback loop semantics of most synchronous languages. For our purpose here this is not significant and we do not need to go into more detail. For precise definitions and a thorough motivation the reader is referred to Reference 39.
Merging of processes and other related transformations are very simple in the synchronous MoC because all processes have essentially identical interfaces. For instance, the merge of two mealyS-based processes can be formulated as follows.
mealyS(g_1, f_1, v_0) ∘ mealyS(g_2, f_2, w_0) = mealyS(g, f, (v_0, w_0))
where g((v, w), e) = (g_1(v, f_2(w, e)), g_2(w, e))
f((v, w), e) = f_1(v, f_2(w, e)).
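The merge equation can be checked operationally. The sketch below uses an illustrative Python encoding of mealyS (our own, with example functions that are likewise hypothetical) and verifies that the merged process and the composition produce the same output signal:

```python
def mealy_s(g, f, w0):
    """Minimal mealyS sketch: one event in, one event out per cycle."""
    def process(signal):
        state, out = w0, []
        for e in signal:
            out.append(f(state, e))
            state = g(state, e)
        return out
    return process

# Two simple synchronous processes (illustrative functions, no absent events).
g1 = lambda v, e: e          # next state := last input (a unit delay)
f1 = lambda v, e: v          # output := previous input
g2 = lambda w, e: w + e      # accumulate
f2 = lambda w, e: w + e      # output the running sum

p1 = mealy_s(g1, f1, 0)
p2 = mealy_s(g2, f2, 0)

# Merged process, following the formula above (p2 is applied first).
g = lambda vw, e: (g1(vw[0], f2(vw[1], e)), g2(vw[1], e))
f = lambda vw, e: f1(vw[0], f2(vw[1], e))
merged = mealy_s(g, f, (0, 0))

s = [1, 2, 3, 4]
print(p1(p2(s)))  # [0, 1, 3, 6]
print(merged(s))  # [0, 1, 3, 6]
```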
4.2.4.3 The Clocked Synchronous MoC
It is useful to define a variant of the perfectly synchronous MoC, the clocked synchronous MoC, which is based on the following hypothesis.
Clocked synchronous hypothesis. There is a global clock signal controlling the start of each computation in the system. Communication takes no time and computation takes one clock cycle.
First, we define a delay process Δ that delays all inputs by one evaluation cycle.
Δ = mealyS(g, f, ⊥)
where g(w, e) = e, f(w, e) = w.
Based on this delay process we define the constructors for the clocked synchronous model.
Definition 4.9
mealyCS(g, f, w_0) = mealyS(g, f, w_0) ∘ Δ
zipCS()(s̄_1, s̄_2) = zipS()(Δ(s̄_1), Δ(s̄_2))
unzipCS() = unzipS() ∘ Δ.
(4.1)
Thus, elementary processes are composed of a combinatorial function and a delay function that essentially
represents a latch at the inputs.
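The delay process Δ and the mealyCS constructor can be sketched in Python; the list-based encoding and the example functions are our own illustration, with ⊥ modeled as None:

```python
ABSENT = None  # the absent event; also the initial latch content

def mealy_s(g, f, w0):
    """Minimal mealyS sketch (one event in, one event out per cycle)."""
    def process(signal):
        state, out = w0, []
        for e in signal:
            out.append(f(state, e))
            state = g(state, e)
        return out
    return process

# Delay process: next state is the input event, output is the stored state.
delta = mealy_s(lambda w, e: e, lambda w, e: w, ABSENT)

def mealy_cs(g, f, w0):
    """mealyCS sketch: a mealyS process preceded by the delay (a latch
    at the input, as stated in the text)."""
    p = mealy_s(g, f, w0)
    return lambda s: p(delta(s))

print(delta([1, 2, 3]))  # [None, 1, 2]
```

Every input thus reaches the combinatorial function one clock cycle late, which is exactly the clocked synchronous hypothesis of one-cycle computation.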
Definition 4.10 The clocked synchronous MoC is defined as clocked synchronous MoC = (C, O), where
C = {mealyCS, zipCS, unzipCS}
O = {∥, ∘, μ}.
In other words, a process or a process network belongs to the clocked synchronous MoC domain iff all its processes and process compositions are constructed either by one of the named process constructors or by one of the composition operators. We call such processes CS-MoC processes.
4.2.5 Discrete Timed MoCs
Timed processes are a blend of untimed and synchronous processes in that they can consume and produce more than one event per cycle and they also deal with absent events. In addition, they have to comply with the constraint that output events cannot occur before the input events of the same evaluation cycle. This is achieved by enforcing an equal number of input and output events for each evaluation cycle, and by prepending an initial sequence of absent events. Since the signals also represent the progression of time, the prefix of absent events at the outputs corresponds to an initial delay of the process in reacting to the inputs. Moreover, the partitioning of input and output signals corresponds to the duration of each evaluation cycle.
Definition 4.11 mealyT is a process constructor which, given γ, f, g, and w_0 as arguments, instantiates a process p : Ŝ → Ŝ. Again, γ is a function of the current state and determines the number of input events consumed in a particular evaluation cycle. Function g computes the next state and f computes the output events with the constraint that the output events do not occur earlier than the input events on which they depend.
This constraint is necessary because in the timed MoC each event corresponds to a time stamp and we have a globally total order of time, relating all events in all signals to each other. To avoid causality flaws every process has to abide by this constraint.
Similarly, zipT-based processes consume events from their two inputs and pack them into tuples of events emitted at the output. unzipT performs the inverse operation. Both also have to comply with the causality constraint.
Again, we can now make precise what we mean by the timed MoC.
Definition 4.12 The timed MoC is defined as timed MoC = (C, O), where
C = {mealyT, zipT, unzipT}
O = {∥, ∘, μ}.
In other words, a process or a process network belongs to the timed MoC domain iff all its processes and process compositions are constructed either by one of the named process constructors or by one of the composition operators. We call such processes T-MoC processes.
Merging, other transformations, as well as analysis of timed process networks are more complicated than for synchronous or untimed MoCs, because the timing may interfere with the pure functional behavior. However, we can further restrict the functions used in constructing the processes, to more or less separate behavior from timing also in the timed MoC. To illustrate this we discuss a few variants of the Mealy process constructor.
mealyPT. In mealyPT(γ, f, g, w_0) based processes the functions f and g are not exposed to absent events and they are only defined on untimed sequences. The interface of the process strips off all absent events of the input signal, hands over the result to f and g, and inserts absent events at the output as appropriate to provide proper timing for the output signal. The function γ, which may depend on the process state as usual, defines how many events are consumed. Essentially, it represents a timer and determines when the input should be checked the next time.
mealyST. In mealyST(γ, f, g, w_0) based processes γ determines the number of nonabsent events that should be handed over to f and g for processing. Again, f and g never see or produce absent events, and the process interface is responsible for providing them with the appropriate input data and for synchronization and timing issues on inputs and outputs. Unlike mealyPT processes, functions f and g in mealyST processes have no influence on when they are invoked. They only control how many nonabsent events must have appeared before their invocation. f and g in mealyPT processes, on the other hand, determine the time instant of their next invocation independent of the number of nonabsent events.
mealyTT. A combination of these two process constructors is mealyTT, which allows control over both the number of nonabsent input events and a maximum time period, after which the process is activated in any case, independent of the number of nonabsent input events received. This allows us to model processes that wait for input events but can set internal timers to provide time-outs.
These examples illustrate that process constructors and MoCs can be defined which allow us to define precisely to which extent communication issues are separated from the purely functional behavior of the processes. Obviously, a stricter separation greatly facilitates verification and synthesis but may restrict expressiveness.
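As an illustration of the mealyST idea, the following Python sketch is our own simplified encoding (one output slot per input event, γ(state) ≥ 1 assumed, ⊥ modeled as None): the interface collects nonabsent events and activates f and g only after γ(state) of them have arrived, so f and g never see absent events.

```python
ABSENT = None  # the absent event

def mealy_st(gamma, f, g, w0):
    """mealyST sketch: gamma(state) gives the number of nonabsent events
    per evaluation cycle; f and g operate on the stripped (untimed)
    sequence. Simplification: one output slot per input slot, with the
    result emitted in the slot where the cycle completes."""
    def process(signal):
        state, buf, out = w0, [], []
        for e in signal:
            if e is not ABSENT:
                buf.append(e)
            if len(buf) == gamma(state):
                out.append(f(state, buf))   # f never sees absent events
                state = g(state, buf)
                buf = []
            else:
                out.append(ABSENT)
        return out
    return process

# Sum every two nonabsent events, whenever they happen to arrive.
pairsum = mealy_st(lambda w: 2, lambda w, evs: sum(evs), lambda w, evs: w, 0)
print(pairsum([1, ABSENT, 2, 3, ABSENT, 4]))  # [None, None, 3, None, None, 7]
```

The invocation instants depend on when nonabsent events arrive, which is precisely what distinguishes mealyST from the timer-driven mealyPT.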
4.3 Integration of MoCs
4.3.1 MoC Interfaces
Interfaces between different MoCs determine the relation of the time structure in the different domains and they influence the way a domain is triggered to evaluate inputs and produce outputs. If an MoC domain is time triggered, the time signal is made available through the interface. Other domains are triggered when input data is available. Again, the input data appears through the interfaces.
We introduce a few simple interfaces for the MoCs of the previous sections, in order to be able to discuss concrete examples.
Definition 4.13 A stripS2U process constructor takes no arguments and instantiates a process p : S̄ → Ṡ, which takes a synchronous signal as input and generates an untimed signal as output. It reproduces all data from the input in the output in the same order, with the exception of the absent event, which is translated into the value 0.
Definition 4.14 An insertU2S process constructor takes no arguments and instantiates a process p : Ṡ → S̄, which takes an untimed signal as input and generates a synchronous signal as output. It reproduces all data from the input in the output in the same order without any change.
These interface processes between the synchronous and the untimed MoCs are very simple. However, they establish a strict and explicit time relation between two connected domains.
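Both interface processes are simple enough to sketch directly in Python; the list-based encoding is our own illustration, with ⊥ modeled as None:

```python
ABSENT = None  # the absent event in the synchronous domain

def strip_s2u(s):
    """stripS2U sketch: synchronous -> untimed. The absent event is
    translated into the value 0; all other data pass through in order."""
    return [0 if e is ABSENT else e for e in s]

def insert_u2s(s):
    """insertU2S sketch: untimed -> synchronous. Data pass through
    unchanged, one event per synchronous time slot."""
    return list(s)

print(strip_s2u([5, ABSENT, 7]))  # [5, 0, 7]
print(insert_u2s([1, 2, 3]))      # [1, 2, 3]
```

Although the data transformations are near-trivial, each event of the untimed signal is now pinned to one synchronous time slot, which is how the interfaces impose the time structure of one domain on the other.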
Connecting processes from different MoCs also requires a proper semantic basis, which we provide by defining a hierarchical MoC.
Definition 4.15 A hierarchical model of computation (HMoC) is a 3-tuple HMoC = (M, C, O), where M is a set of HMoCs or simple MoCs, each capable of instantiating processes or process networks; C is a set of process constructors; O is a set of process composition operators that governs the process composition at the highest hierarchy level but not inside process networks instantiated by any of the HMoCs of M.
In the following examples and discussion we will use a specific but rather simple HMoC.
Definition 4.16 H = (M, C, O) with
M = {U-MoC, S-MoC}
C = {stripS2U, insertU2S}
O = {∥, ∘, μ}.
Example 4.2 As an example, consider the equalizer system of Figure 4.4 [39]. The control part consists of two synchronous MoC processes, while the dataflow part, modeled as untimed MoC processes, filters and analyzes an audio stream. Depending on the analysis results of the Analyzer process, the Distortion control will modify the filter parameters. The Button control also takes user input into account to steer the filter. The purpose of Analyzer and Distortion control is to avoid dangerously strong signals that could jeopardize the loudspeakers.
Control and dataflow parts are connected via two interface processes. The dataflow processes can be developed and verified separately in the untimed MoC domain, but as soon as they are connected to the synchronous MoC control part, the time structure of the synchronous MoC domain gets imposed on all the untimed MoC processes. With the simple interfaces of Figure 4.4, the Filter process consumes 4096 data tokens from the primary input and 1 token from the stripS2U process, and it emits 4096 tokens in every synchronous MoC time slot. Similarly, the activity of the Analyzer is precisely defined for every synchronous MoC time slot. Also, the activities of the two control processes are related precisely to the activities of the dataflow processes in every time slot. Moreover, the timing of the two primary inputs and the primary outputs are now related timewise. Their timing must be consistent because the timing of the primary input data determines the timing of the entire system. For example, if the input signal to
FIGURE 4.4 A digital equalizer consisting of a dataflow part and control. The numbers annotating process inputs and outputs denote the number of tokens consumed and produced in each evaluation cycle. (From A. Jantsch. Modeling Embedded Systems and SoCs. Morgan Kaufmann Publishers, San Francisco, CA, 2004. With permission.) (Figure content: the Filter and Analyzer processes form the untimed U-MoC domain; the Button control and Distortion control processes form the synchronous S-MoC domain; stripS2U and insertU2S connect the two domains. The Filter port from the primary input and its output carry 4096 tokens per cycle; all other ports carry 1 token per cycle.)
the Button control process assumes that each time slot has the same duration, the 4096 data samples of the Filter input in each evaluation cycle must correspond to the same constant time period. It is the responsibility of the domain interfaces to relate the timing of the different domains to each other correctly. The time relations established by all interfaces must be consistent with each other and with the timing of the primary inputs. For instance, if stripS2U takes 1 token as input and emits 1 token as output in each evaluation cycle, the insertU2S process cannot take 1 token as input and produce 2 tokens as output.
The interfaces in Figure 4.4 are very simple and lead to a strict coupling between the two MoC domains. Could more sophisticated or nondeterministic interfaces avoid this coupling effect? The answer is no, because even if the input and output tokens of the interfaces vary from evaluation cycle to evaluation cycle in complex or nondeterministic ways, we still have a very precise timing relation in each and every time slot. Since in every evaluation cycle all interface processes must consume and produce a particular number of tokens, this determines the time relation in that particular cycle. Even though this relation may vary from cycle to cycle, it is still well defined for all cycles and hence for the entire execution of the system.
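The consistency requirement can be made concrete with a small sketch: each interface's token counts per evaluation cycle imply a time relation between the two domains, and all interfaces between the same pair of domains must imply the same relation. The helper names below are illustrative, not from the book.

```python
from fractions import Fraction

def time_relation(tokens_in, tokens_out):
    """Time relation implied by one interface process: tokens consumed
    from the source domain per token delivered to the target domain,
    in one evaluation cycle."""
    return Fraction(tokens_in, tokens_out)

def consistent(interfaces):
    """All interfaces between the same two domains must imply one and
    the same time relation."""
    relations = {time_relation(i, o) for i, o in interfaces}
    return len(relations) == 1

# The equalizer's stripS2U and insertU2S both move 1 token per slot:
equalizer_ok = consistent([(1, 1), (1, 1)])      # True
# An insertU2S producing 2 tokens per input token would clash:
broken = consistent([(1, 1), (1, 2)])            # False
```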
The possibly nondeterministic communication delay between MoC domains, as well as between any
other processes, can be modeled, but this should not be confused with establishing a time relation between
two MoC domains.
4.3.2 Interface Refinement
To show this difference and to illustrate how abstract interfaces can be gradually refined to accommodate channel delay information and detailed protocols, we propose the following interface refinement procedure:
1. Add a time interface. When we connect two different MoC domains, we always have to define the time relation between the two. This is the case even if the two domains are of the same type, for example, both synchronous MoC domains, because the basic time unit may or may not be identical in the two domains.
In our MoC framework, the occurrence of events also represents time in both the synchronous MoC and timed MoC domains. Thus, setting the time relation means determining the number of events in one domain that correspond to one event in the other domain. For example, in Figure 4.4 the interfaces establish a one-to-one relation, while the interface in Figure 4.5 represents a 3/2 relation.
FIGURE 4.5 Determining the time relation between two MoC domains. (From A. Jantsch. Modeling Embedded Systems and SoCs. Morgan Kaufmann Publishers, San Francisco, CA, 2004. With permission.) (Figure content: processes P and Q in domains MoC A and MoC B, first connected directly, then via an interface I1 that relates 3 events in MoC A to 2 events in MoC B.)
In other frameworks, establishing a time relation will take a different form. For instance, if languages such as SystemC or VHDL are used, the times of the different domains have to be related to the common time base of the simulator.
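A time interface with an m:n relation can be sketched as a process that consumes m events from one domain and emits n events to the other in each evaluation cycle. How the n outputs are derived from the m inputs is application specific; keeping the first n of each group, as below, is only an assumption for illustration.

```python
def rate_interface(m, n):
    """Sketch of a time interface fixing an m:n time relation: each
    evaluation cycle consumes m events from domain A and emits n
    events to domain B. Keeping the first n of every group of m is
    an illustrative policy, not the book's definition."""
    def process(signal_a):
        out = []
        for i in range(0, len(signal_a) - m + 1, m):
            out.extend(signal_a[i:i + m][:n])
        return out
    return process

i1 = rate_interface(3, 2)   # the 3/2 relation of Figure 4.5
```

Here i1([1, 2, 3, 4, 5, 6]) delivers [1, 2, 4, 5]: two MoC B events for every three MoC A events.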
2. Refine the protocol. Once the time relation between the two domains is established, we have to provide a protocol that is able to communicate over the final interface at that point. The two domains may represent different clocking regimes on the same chip, or one may end up as software while the other is implemented as hardware, or both may be implemented as software on different chips or cores, etc. Depending on the final implementations, we have to develop a protocol fulfilling the requirements of the interface, such as buffering and error control.
In our example in Figure 4.6 we have selected a simple handshake protocol with limited buffering capability. Note, however, that this assumes that for every three events arriving from MoC A there are only two useful events to be delivered to MoC B. The interface processes I1 and I2, and the protocol processes P1, P2, Q1, and Q2, must be designed carefully to avoid both losing data and deadlock.
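The essence of such a handshake can be illustrated by collapsing the protocol into a single loop: the sender may only issue a request when the bounded buffer has room, and each item is acknowledged before the next request. This abstracts the processes I1, I2, P1, P2, Q1, and Q2 into one function and is not the book's construction.

```python
def handshake_channel(items, capacity=1):
    """Minimal handshake sketch: a request places an item into a
    bounded buffer, the receiver consumes it and acknowledges, and
    only then may the next request be issued. With the capacity check
    respected, no data can be lost."""
    in_flight, delivered, trace = [], [], []
    for item in items:
        # sender side: must not overrun the bounded buffer
        assert len(in_flight) < capacity
        in_flight.append(item)
        trace.append(('req', item))
        # receiver side: consume the item and acknowledge it
        delivered.append(in_flight.pop(0))
        trace.append(('ack', item))
    return delivered, trace
```

Running handshake_channel(['a', 'b']) delivers ['a', 'b'] with the trace req a, ack a, req b, ack b; any attempt to send without an acknowledgment would trip the capacity assertion, which is where a real protocol must block instead.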
3. Model the channel delay. To obtain realistic channel behavior, the delay can be modeled deterministically or stochastically. In Figure 4.7 we have added a stochastic delay varying between 2 and 5 MoC B cycles. The protocol will require more buffering to accommodate the varying delays. To dimension the buffers correctly, we have to identify the average and the worst-case behavior that we should be able to handle.
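A stochastic delay process like the D[2,5] of Figure 4.7 can be sketched as follows, with events represented as (cycle, value) pairs and an order-preserving channel assumed; both the representation and the uniform distribution are illustrative choices.

```python
import random

def stochastic_delay(events, lo=2, hi=5, seed=0):
    """Each event is delayed by a number of MoC B cycles drawn
    uniformly from [lo, hi]. Delivery times are made monotone, since
    the channel is assumed to preserve event order."""
    rng = random.Random(seed)
    delivered, earliest = [], 0
    for cycle, value in events:
        t = max(cycle + rng.randint(lo, hi), earliest)  # keep ordering
        delivered.append((t, value))
        earliest = t
    return delivered
```

Feeding it worst-case traffic and inspecting the spread of delivery times is one simple way to estimate the extra buffering the protocol needs.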
The refinement procedure proposed here is consistent with, and complementary to, other techniques proposed, for example, in the context of SystemC [44]. We only want to emphasize here that establishing the time relation between domains has to be separated from channel delay modeling and protocol design. Often these issues
FIGURE 4.6 A simple handshake protocol. (From A. Jantsch. Modeling Embedded Systems and SoCs. Morgan Kaufmann Publishers, San Francisco, CA, 2004. With permission.) (Figure content: interface processes I1 and I2 between MoC A and MoC B, with handshake processes P1 and P2 on one side and Q1 and Q2 on the other.)
FIGURE 4.7 The channel delay can vary between 2 and 5 cycles measured in MoC B cycles. (From A. Jantsch. Modeling Embedded Systems and SoCs. Morgan Kaufmann Publishers, San Francisco, CA, 2004. With permission.) (Figure content: as Figure 4.6, with delay processes D[2,5] inserted into both directions of the channel between the interface processes.)
are not separated clearly, making interface design more complicated than necessary. More details about this procedure and the example can be found in Reference 39.
4.3.3 MoC Refinement
The three introduced MoCs represent three time abstractions, and, naturally, design often starts at higher time abstractions and gradually moves to lower ones. It is not always appropriate to start with an untimed MoC: when timing properties are an inherent and crucial part of the functionality, a synchronous model is a more appropriate starting point. But if we start with an untimed model, we need to map it onto an architecture with concrete timing properties. Frequently, resource sharing makes the consideration of time functionally relevant, because of deadlock problems and complicated interaction patterns. All three phenomena discussed in Section 4.1.4, priority inversion, performance inversion, and over-synchronization, emerge due to resource sharing.
Example 4.3 We therefore discuss an example of MoC refinement from the untimed through the synchronous to the timed MoC, driven by resource sharing. In Figure 4.8 we have two untimed MoC process pairs, which are functionally independent of each other. At this level, under the assumption of infinite buffers and unlimited resources, we can analyze and develop the core functionality embodied by the process-internal functions f and g.
In the first refinement step, shown in Figure 4.9, we introduce finite buffers between the processes. B_n,2 and B_m,2 represent buffers of size n and m, respectively. Since the untimed MoC implicitly assumes infinite buffers between two communicating processes, there is no point in modeling finite buffers in the untimed MoC domain; we just would not see any effect. In the synchronous MoC domain, however, we can analyze
FIGURE 4.8 Two independent process pairs. (Figure content: process pairs P1, Q1 and R1, S1, with
P1 = mealyU(1, f_P1, g_P1, w_P1), Q1 = mealyU(1, f_Q1, g_Q1, w_Q1),
R1 = mealyU(1, f_R1, g_R1, w_R1), S1 = mealyU(1, f_S1, g_S1, w_S1).)
FIGURE 4.9 Two independent process pairs with explicit buffers. (Figure content: refined processes and buffers, with
P2 = mealyS:2:1(f_P2, g_P2, w_P2), Q2 = mealyS(f_Q2, g_Q2, w_Q2), B_n,2 = mealyS(f_Bn,2, g_Bn,2, w_Bn,2),
R2 = mealyS:2:1(f_R2, g_R2, w_R2), S2 = mealyS(f_S2, g_S2, w_S2), B_m,2 = mealyS(f_Bm,2, g_Bm,2, w_Bm,2).)
the consequences of finite buffers. The processes need to be refined. Processes P2 and R2 have to be able to handle full buffers, while processes Q2 and S2 have to handle empty buffers. In the untimed MoC, processes always block on empty input buffers. This behavior can also be modeled easily in synchronous MoC processes. In addition, more complicated behavior, such as time-outs, can be modeled and analyzed. Finding the minimum buffer sizes while avoiding deadlock and preserving the original system behavior is by itself a challenging task. Basten and Hoogerbrugge [45] propose a technique to address this. More frequently, the buffer minimization problem is formulated as part of the process scheduling problem [46, 47].
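The behavior the refined processes must cope with can be sketched as a synchronous bounded-buffer process: per time slot it accepts at most one token (rejecting input when full, which the producer side must handle) and offers at most one token (delivering nothing when empty, which the consumer side must handle). A sketch under these assumptions, with None standing for the absent event:

```python
def bounded_buffer(n):
    """Sketch of a finite buffer B_n as a synchronous process. Each
    time slot: accept one input token unless full, and emit one
    output token if the consumer is ready and the buffer is nonempty.
    Returns (accepted, token_out) per slot."""
    store = []
    def step(token_in, consumer_ready):
        accepted = False
        if token_in is not None and len(store) < n:
            store.append(token_in)
            accepted = True
        token_out = store.pop(0) if consumer_ready and store else None
        return accepted, token_out
    return step
```

With n = 1, a second producer token in a row is rejected until the consumer drains the buffer, which is exactly the condition the refined P2 and R2 must detect and react to.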
The communication infrastructure is typically shared among many communicating actors.
In Figure 4.10 we map the communication links onto one bus, represented as process I3. It contains an arbiter that resolves conflicts when both processes B_n,3 and B_m,3 try to access the bus at the same time. It also implements a bus access protocol that has to be followed by the connecting processes. The synchronous MoC model in Figure 4.10 is cycle true, and the effect of bus sharing on system behavior and performance can be analyzed. A model checker can prove the soundness and fairness of the arbitration algorithm, and performance requirements on the individual processes can be derived to achieve a desirable system performance.
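The arbiter inside I3 can be sketched cycle by cycle; granting pending requests in arrival order, as below, is an assumption for illustration, since the book does not fix the arbitration policy.

```python
def bus_arbiter(requests_per_cycle):
    """Sketch of a bus arbiter: in each cycle at most one requester is
    granted the bus; ungranted requests stay pending and are served
    in arrival order in later cycles."""
    grants, pending = [], []
    for new_requests in requests_per_cycle:
        pending.extend(new_requests)
        grants.append(pending.pop(0) if pending else None)
    return grants
```

If B_n,3 and B_m,3 request in the same cycle, one is granted immediately and the other in the following cycle, which is precisely the contention a cycle-true model makes visible.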
Sometimes it is a feasible option to synthesize the model of Figure 4.10 directly into a hardware or software implementation, provided we can use standard templates for the process interfaces. Alternatively, we can refine the model into a fully timed model. However, we still have various options depending on what exactly we would like to model and analyze. For each process we can decide how much of the timing and synchronization details should be handled explicitly by the process and how much can be handled implicitly by the process interfaces. For instance, in Section 4.2.5 we introduced the constructors mealyST and mealyPT. The first provides a process interface that strips off all absent events at the input and inserts absent events at the output as needed. The internal functions only deal with the functional events and have no access to timing information. This means that an untimed mealyU process can be directly refined into a timed mealyST process with exactly the same functions f and g. Alternatively, the constructor mealyPT provides an interface that invokes the internal functions at regular time intervals. If this interval corresponds to a synchronous time slot, a synchronous MoC process can easily be mapped onto a mealyPT-type process, with the only difference that the functions in a mealyPT process may receive several nonabsent events in each cycle. In both cases the processes experience a notion of time based on cycles.
In Figure 4.11 we have chosen to refine processes P, Q, R, and S into mealyST-based processes to keep them as similar as possible to the original untimed processes. Thus, the original f and g functions can be used without major modification. The process interfaces are responsible for collecting the inputs, presenting them to the f and g functions, and emitting properly synchronized output.
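The effect of a mealyST-style interface can be sketched as a wrapper that strips absent events before invoking the untimed next-state function f and output function g, and re-inserts absent events in the output so the timed signal keeps its length. The real constructor also carries a partitioning argument (the 1 in mealyST(1, ...)); one event per invocation is assumed here, and the names are illustrative.

```python
ABSENT = None

def mealy_st(w0, f, g):
    """Sketch of a mealyST-style process: the interface hides absent
    events from the untimed functions f (next state) and g (output),
    so they can be reused unchanged from the untimed model."""
    def process(timed_signal):
        state, out = w0, []
        for e in timed_signal:
            if e is ABSENT:
                out.append(ABSENT)        # interface handles timing
            else:
                out.append(g(state, e))   # untimed functions see only
                state = f(state, e)       # functional events
        return out
    return process
```

For example, a running-sum process built from untimed f and g passes absent events straight through: mealy_st(0, lambda s, e: s + e, lambda s, e: s + e)([1, ABSENT, 2, ABSENT]) gives [1, ABSENT, 3, ABSENT].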
The buffer and the bus processes, however, have been mapped onto mealyPT processes. The constants λ and λ/2 represent the cycle times for the processes. Process B_m,4 operates with half the cycle time of the other processes, which illustrates that the modeling accuracy can be selected arbitrarily. We can also choose other process constructors, and hence interfaces, where desirable. For instance, some processes can be mapped onto mealyT-type processes in a further refinement step to expose them to even more timing information.
FIGURE 4.10 Two independent process pairs with explicit buffers. (Figure content: processes P3, Q3, R3, S3, buffers B_n,3 and B_m,3, and the shared bus process I3, with
I3 = mealyS:4:2(f_I3, g_I3, w_I3), P3 = mealyS(f_P3, g_P3, w_P3), Q3 = mealyS(f_Q3, g_Q3, w_Q3),
B_n,3 = mealyS:2:1(f_Bn,3, g_Bn,3, w_Bn,3), R3 = mealyS(f_R3, g_R3, w_R3), S3 = mealyS(f_S3, g_S3, w_S3),
B_m,3 = mealyS:2:1(f_Bm,3, g_Bm,3, w_Bm,3).)
FIGURE 4.11 All processes are refined into the timed MoC but with different synchronization interfaces. (Figure content: processes P4, Q4, R4, S4, buffers B_n,4 and B_m,4, and bus process I4, with
I4 = mealyPT:4:2(λ, f_I4, g_I4, w_I4), P4 = mealyST(1, f_P4, g_P4, w_P4), Q4 = mealyST(1, f_Q4, g_Q4, w_Q4),
B_n,4 = mealyPT:2:1(λ, f_Bn,4, g_Bn,4, w_Bn,4), R4 = mealyST(1, f_R4, g_R4, w_R4), S4 = mealyST(1, f_S4, g_S4, w_S4),
B_m,4 = mealyPT:2:1(λ/2, f_Bm,4, g_Bm,4, w_Bm,4).)
4.4 Conclusion
We have tried to motivate why MoCs for embedded systems should be different from the many computational models developed in the past. The purpose of a model of embedded computation should be to support the analysis and design of concrete systems. Thus, it needs to deal with the salient and critical features of embedded systems in a systematic way. These features include real-time requirements, power consumption, architecture heterogeneity, application heterogeneity, and real-world interaction.
We have proposed a framework to study different MoCs that allows us to appropriately capture some, but unfortunately not all, of these features. In particular, power consumption and other nonfunctional properties are not covered. Time is the central focus of the framework, but continuous time models are not included, in spite of their relevance for the sensors and actuators in embedded systems.
Despite the deficiencies of this framework, we hope that we were able to argue well for a few important points:
• Different computational models should and will continue to coexist, for a variety of technical and nontechnical reasons.
• Using the right computational model in a design, and for a particular design task, can greatly facilitate the design process and the quality of the result. What the right model is depends on the purpose and objectives of a design task.
• Time is of central importance, and computational models with different timing abstractions should be used during system development.
From an MoC perspective, several important issues are open research topics and should be addressed urgently to improve the design process for embedded systems:
• We need to identify efficient ways to capture a few important nonfunctional properties in MoCs. At least power and energy consumption, and perhaps signal noise issues, should be attended to.
• The effective integration of different MoCs will require (1) the systematic manipulation and refinement of MoC interfaces and interdomain protocols; (2) the cross-domain analysis of functionality, performance, and power consumption; and (3) global optimization and synthesis, including the migration of tasks and processes across MoC domain boundaries.
• To make the benefits and the potential of well-defined MoCs available in practical design work, we need to project MoCs into design languages, such as VHDL, Verilog, SystemC, C++, etc. This should be done by properly subsetting a language and by developing pragmatics to restrict the use of the language. If accompanied by tools that enforce the restrictions and exploit the properties of the underlying MoC, this will be accepted quickly by designers.
In the future we foresee a continuous and steady further development of MoCs to match future
theoretical objectives and practical design purposes. But we also hope that they become better accepted
as practically useful devices for supporting the design process just like design languages, tools, and
methodologies.
References
[1] Ralph Gregory Taylor. Models of Computation and Formal Language. Oxford University Press, New York, 1998.
[2] Peter van Emde Boas. Machine models and simulation. In J. van Leeuwen, Ed., Handbook of Theoretical Computer Science, Vol. A: Algorithms and Complexity. Elsevier Science Publishers B.V., Amsterdam, 1990, chap. 1, pp. 1–66.
[3] S. Cook and R. Reckhow. Time bounded random access machines. Journal of Computer and System Sciences, 7:354–375, 1973.
[4] B.M. Maggs, L.R. Matheson, and R.E. Tarjan. Models of parallel computation: a survey and synthesis. In Proceedings of the 28th Hawaii International Conference on System Sciences (HICSS), Vol. 2, 1995, pp. 61–70.
[5] S. Fortune and J. Wyllie. Parallelism in random access machines. In Proceedings of the 10th Annual Symposium on Theory of Computing, San Diego, CA, 1978.
[6] Alok Aggarwal, Ashok K. Chandra, and Marc Snir. Communication complexity of PRAMs. Theoretical Computer Science, 71:3–28, 1990.
[7] Phillip B. Gibbons, Yossi Matias, and Vijaya Ramachandran. The QRQW PRAM: accounting for contention in parallel algorithms. In Proceedings of the 5th Annual ACM-SIAM Symposium on Discrete Algorithms, Arlington, VA, January 1994, pp. 638–648.
[8] Eli Upfal. Efficient schemes for parallel communication. Journal of the ACM, 31:507–517, 1984.
[9] A. Aggarwal, B. Alpern, A.K. Chandra, and M. Snir. A model for hierarchical memory. In Proceedings of the 19th Annual ACM Symposium on Theory of Computing, May 1987, pp. 305–314.
[10] Bowen Alpern, Larry Carter, Ephraim Feig, and Ted Selker. The uniform memory hierarchy model of computation. Algorithmica, 12:72–109, 1994.
[11] Thomas Lengauer. VLSI theory. In J. van Leeuwen, Ed., Handbook of Theoretical Computer Science, Vol. A: Algorithms and Complexity, 2nd ed., Elsevier Science Publishers, Amsterdam, 1990, chap. 16, pp. 835–868.
[12] Johan Eker, Jörn W. Janneck, Edward A. Lee, Jie Liu, Xiaojun Liu, Jozsef Ludvig, Stephen Neuendorffer, Sonia Sachs, and Yuhong Xiong. Taming heterogeneity: the Ptolemy approach. Proceedings of the IEEE, 91:127–144, 2003.
[13] Rolf Ernst. MPSOC Performance Modeling and Analysis. Paper presented at the 3rd International Seminar on Application-Specific Multi-Processor SoC, Chamonix, France, 2003.
[14] Gilles Kahn. The semantics of a simple language for parallel programming. In Proceedings of the IFIP Congress 74. North-Holland, Amsterdam, 1974.
[15] Edward A. Lee and T.M. Parks. Dataflow process networks. Proceedings of the IEEE, 83:773–801, 1995.
[16] Jarvis Dean Brock. A Formal Model for Non-Deterministic Dataflow Computation. Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MA, 1983.
[17] J. Dean Brock and William B. Ackerman. Scenarios: a model of nondeterminate computation. In J. Diaz and I. Ramos, Eds., Formalization of Programming Concepts, Vol. 107 of Lecture Notes in Computer Science. Springer-Verlag, Heidelberg, 1981, pp. 252–259.
[18] Paul R. Kosinski. A straightforward denotational semantics for nondeterminate data flow programs. In Proceedings of the 5th ACM Symposium on Principles of Programming Languages, 1978, pp. 214–219.
[19] David Park. The fairness problem and nondeterministic computing networks. In J.W. de Bakker and J. van Leeuwen, Eds., Foundations of Computer Science IV, Part 2: Semantics and Logic. Mathematical Centre Tracts, Amsterdam, The Netherlands, 1983, Vol. 159, pp. 133–161.
[20] Robin Milner. Communication and Concurrency. International Series in Computer Science. Prentice Hall, New York, 1989.
[21] C.A.R. Hoare. Communicating sequential processes. Communications of the ACM, 21:666–677, 1978.
[22] Axel Jantsch, Ingo Sander, and Wenbiao Wu. The usage of stochastic processes in embedded system specifications. In Proceedings of the Ninth International Symposium on Hardware/Software Codesign, April 2001.
[23] Edward Ashford Lee and David G. Messerschmitt. Static scheduling of synchronous data flow programs for digital signal processing. IEEE Transactions on Computers, C-36:24–35, 1987.
[24] Chanik Park, Jaewoong Jung, and Soonhoi Ha. Extended synchronous dataflow for efficient DSP system prototyping. Design Automation for Embedded Systems, 6:295–322, 2002.
[25] Axel Jantsch and Per Bjuréus. Composite signal flow: a computational model combining events, sampled streams, and vectors. In Proceedings of the Design and Test Europe Conference (DATE), 2000.
[26] Nicolas Halbwachs. Synchronous programming of reactive systems. In Proceedings of Computer Aided Verification (CAV), 2000.
[27] Albert Benveniste and Gérard Berry. The synchronous approach to reactive and real-time systems. Proceedings of the IEEE, 79:1270–1282, 1991.
[28] Frank L. Severance. System Modeling and Simulation. John Wiley & Sons, New York, 2001.
[29] Averill M. Law and W. David Kelton. Simulation Modeling and Analysis, 3rd ed., Industrial Engineering Series. McGraw-Hill, New York, 2000.
[30] Christos G. Cassandras. Discrete Event Systems. Aksen Associates, Boston, MA, 1993.
[31] Per Bjuréus and Axel Jantsch. Modeling of mixed control and dataflow systems in MASCOT. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 9:690–704, 2001.
[32] Peeter Ellervee, Shashi Kumar, Axel Jantsch, Bengt Svantesson, Thomas Meincke, and Ahmed Hemani. IRSYD: an internal representation for heterogeneous embedded systems. In Proceedings of the 16th NORCHIP Conference, 1998.
[33] P. Le Marrec, C.A. Valderrama, F. Hessel, A.A. Jerraya, M. Attia, and O. Cayrol. Hardware, software and mechanical cosimulation for automotive applications. In Proceedings of the Ninth International Workshop on Rapid System Prototyping, 1998, pp. 202–206.
[34] Ahmed A. Jerraya and K. O'Brien. Solar: an intermediate format for system-level modeling and synthesis. In Jerzy Rozenblit and Klaus Buchenrieder, Eds., Codesign: Computer-Aided Software/Hardware Engineering. IEEE Press, Piscataway, NJ, 1995, chap. 7, pp. 145–175.
[35] Edward A. Lee and David G. Messerschmitt. An Overview of the Ptolemy Project. Report from the Department of Electrical Engineering and Computer Science, University of California, Berkeley, January 1993.
[36] Edward A. Lee and Alberto Sangiovanni-Vincentelli. A framework for comparing models of computation. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 17:1217–1229, 1998.
[37] Edward A. Lee. A Denotational Semantics for Dataflow with Firing. Technical report UCB/ERL M97/3, Department of Electrical Engineering and Computer Science, University of California, Berkeley, January 1997.
[38] Axel Jantsch and Hannu Tenhunen. Will networks on chip close the productivity gap? In Axel Jantsch and Hannu Tenhunen, Eds., Networks on Chip. Kluwer Academic Publishers, Dordrecht, 2003, chap. 1, pp. 3–18.
[39] Axel Jantsch. Modeling Embedded Systems and SoCs: Concurrency and Time in Models of Computation. Systems on Silicon. Morgan Kaufmann Publishers, San Francisco, CA, 2003.
[40] D. Harel. Statecharts: a visual formalism for complex systems. Science of Computer Programming, 8:231–274, 1987.
[41] G. Berry, P. Couronné, and G. Gonthier. Synchronous programming of reactive systems: an introduction to Esterel. In Kazuhiro Fuchi and M. Nivat, Eds., Programming of Future Generation Computers. Elsevier, New York, 1988, pp. 35–55.
[42] Paul le Guernic, Thierry Gautier, Michel le Borgne, and Claude le Maire. Programming real-time applications with SIGNAL. Proceedings of the IEEE, 79:1321–1336, 1991.
[43] N. Halbwachs, P. Caspi, P. Raymond, and D. Pilaud. The synchronous data flow programming language LUSTRE. Proceedings of the IEEE, 79:1305–1320, 1991.
[44] Thorsten Grötker, Stan Liao, Grant Martin, and Stuart Swan. System Design with SystemC. Kluwer Academic Publishers, Dordrecht, 2002.
[45] Twan Basten and Jan Hoogerbrugge. Efficient execution of process networks. In Alan Chalmers, Majid Mirmehdi, and Henk Muller, Eds., Communicating Process Architectures. IOS Press, Amsterdam, 2001.
[46] Sundararajan Sriram and Shuvra S. Bhattacharyya. Embedded Multiprocessors: Scheduling and Synchronization. Marcel Dekker, New York, 2000.
[47] Shuvra S. Bhattacharyya, Praveen K. Murthy, and Edward A. Lee. Software Synthesis from Dataflow Graphs. Kluwer Academic Publishers, Dordrecht, 1996.
5
Modeling Formalisms for Embedded System Design
Luís Gomes
Universidade Nova de Lisboa and UNINOVA
João Paulo Barros
Instituto Politécnico de Beja and UNINOVA
Anikó Costa
Universidade Nova de Lisboa and UNINOVA
5.1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-1
5.2 Notions of Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-3
5.3 Communication Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-4
5.4 Common Modeling Formalisms. . . . . . . . . . . . . . . . . . . . . . . . 5-5
Finite State Machines • Finite State Machines with Datapath • Statecharts and Hierarchical/Concurrent Finite State Machines • Program-State Machines • Codesign Finite State Machines • Specification and Description Language • Message Sequence Charts • Petri Nets • Discrete Event • Synchronous/Reactive Models • Dataflow Models
5.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-31
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-32
5.1 Introduction
The importance of the system specification phase is directly proportional to the respective system complexity. Embedded systems have become more and more complex, not only due to increasing system dimensions, but also due to the interactions among the different system design aspects. These include, among others, correctness, platform heterogeneity, performance, power consumption, cost, and time-to-market.
Therefore, a multitude of modeling formalisms have been applied to embedded system design. Typically, these formalisms strive for a maximum of preciseness, as they rely on a mathematical (formal) model.
Modeling formalisms are often referred to as models of computation (MoC) [1–5]. An MoC is composed of a notation and the rules for computing the behavior: the notation constitutes the syntax of the model, while the rules define the model semantics.
Usage of formal models in embedded system design allows (at least) one of the following [2]:
• Unambiguously capturing the required system functionality.
• Verification of the functional specification's correctness with respect to its desired properties.
• Support for synthesis onto specific architecture and communication resources.
• Use of different tools based on the same model (supporting communication among the teams involved in designing, producing, and maintaining the system).
It has to be stressed that model-based verification of properties is a subject of major importance in embedded system design (and in system design in general), as it allows one to verify model correctness even if the system does not (physically) exist, or if it is difficult, dangerous, or costly to analyze the system directly. The construction of a system model brings several advantages, as it forces a more complete comprehension of the system and allows the comparison of distinct approaches. Hence, it becomes easier to identify desired and undesired system properties, as the requirements become more precise and complete.
Most modeling formalisms for embedded system design are based on a particular diagrammatic (or graphical) language. Despite known arguments against diagrammatic languages (e.g., Reference 6), they are presently widely acknowledged as extremely useful and popular for software development, and also for embedded system development in general. The history of the Unified Modeling Language (UML), the Specification and Description Language (SDL), and Message Sequence Charts (MSCs) certainly proves it. Even though diagrammatic languages are often seen as inherently less precise than textual languages, this is certainly not true (see, e.g., References 7 and 8).
These diagrammatic representations are usually graph-based. Finite state machines (FSMs), in their different forms (Moore, Mealy) and extensions (hierarchical and concurrent, Statecharts, etc.), are a well-known example. The same is true for dataflows and Petri nets. These formalisms offer a variety of semantics for the modeling of time, communication, and concurrency.
Besides distinct graphical syntaxes and semantics, different formalisms also have different analysis and verification capabilities.
The plethora of MoCs ready to be used by embedded system designers means that choosing the "best" formalism is a very difficult task for the modeler. Different embedded systems can, and often do, emphasize different aspects, namely the reactive nature associated with their behavior, real-time constraints, or data processing capabilities. The same happens with the available MoCs. For example, some MoCs for embedded systems are control dominated (data processing and computation are minimal), emphasizing the reactiveness of system behavior. Others emphasize data processing, containing complex data transformations, normally described by dataflows. Reactive control systems are in the first group, and digital signal-processing applications are in the second. For example, digital signal-processing applications emphasize the usage of dataflow models, whereas FSMs explicitly emphasize reactiveness. Unfortunately, other aspects are also important to consider when producing the model for the system; for example, the need to model specific notions of time or different modes of communication among components may further complicate the search for the right MoC.
So, in some embedded system designs, heterogeneity in terms of the implementation platforms has to be faced, and it is not possible to find a unique formalism to model the whole system. In those situations, the goal is to decompose the system's model into submodels and to pick the right formalism for each of the different submodels; although, at the end, the designer has to be able to integrate all those models in a coherent way [9]. Several formalisms allow the modeler to partition the system's model and describe it as a collection of communicating modules (components). In this sense, behavior modeling and communication among components are often interdependent. Yet, separating behavior and communication is a sound attitude, as it allows handling system design complexity and the reusability of components. In fact, it is very difficult to reuse components if behavior and communication are intertwined, as the behavior is then dependent on the communication mechanisms with the other components of the system design [2].
Modeling formalisms for embedded system design have been widely studied, and several reviews and textbooks about MoCs can be found in the literature [1–5]. This chapter surveys several modeling formalisms for embedded system design, taking Reference 5 as the main reference and expanding it to encompass a set of additional modeling formalisms widely used by embedded system designers in several application areas.
The following sections address aspects of time representation and communication support. Afterwards, several selected modeling formalisms are presented.
2006 by Taylor & Francis Group, LLC
Modeling Formalisms 5-3
5.2 Notions of Time
Embedded systems are often characterized as real-time systems. Thus the notion of time is extremely
important in many of the modeling formalisms for embedded system design.
Generally speaking, we may identify three approaches to time modeling:
1. Continuous time and differential equations.
2. Discrete time and difference equations.
3. Discrete events.
The first approach (see Figure 5.1[a]) uses differential equations to model continuous time functions.
This approach is mostly used for the modeling of specific interface components, where the continuous
nature of signal evolution is present, such as analog circuit modeling and physical system modeling in
a broad sense.
In the second approach (Figure 5.1[b]), it is assumed that time is discrete; in this sense, difference
equations replace differential equations. A global clock (the tick) defines the specific points in time
where signals have values. For some applications involving heterogeneous components, it is also useful
to consider multirate difference equations (which means that several clock signals are available). Digital
signal processing is one of the main application areas.
In the third approach, a signal is seen as a sequence of events (see Figure 5.1[c]). This concept of
events can be associated with physical signal evolution, as presented in Figure 5.2 for a Boolean signal; there,
event a is associated with the rising edge of signal x, while event ā is generated at all falling edges
of signal x. Extension to other useful types of signals is straightforward, namely for signals that can hold
multivalued, enumerated, or integer values. Each event has a value and a time tag. The events are processed
in chronological order, based on a predefined precedence.
If the time tags are totally ordered [10], we are in the presence of a timed system: for any distinct t1 and t2,
either t1 < t2 or t2 < t1 (this is called a total order). It is possible to define an associated metric, for instance
f(t1, t2) = |t1 − t2|. If the metric is a continuum we have a continuous time system. A discrete-event system
is a timed system where the time tags are totally ordered.
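The discrete-event view can be sketched in code. The following Python fragment (all names are illustrative, not from the chapter) derives rising- and falling-edge events from a sampled Boolean signal, as in Figure 5.2, and checks that the resulting time tags are totally ordered:

```python
# Sketch: discrete-event signals as chronologically ordered (tag, value) events.
# All names are illustrative; the chapter itself defines no concrete API.

def events_from_boolean(samples):
    """Derive rising/falling-edge events from (time, level) samples of a
    Boolean signal, in the spirit of Figure 5.2."""
    events = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if v0 == 0 and v1 == 1:
            events.append((t1, "rising"))
        elif v0 == 1 and v1 == 0:
            events.append((t1, "falling"))
    return events

def is_totally_ordered(tags):
    """Time tags are totally ordered iff all tags are distinct: any two
    distinct tags then compare with < one way or the other."""
    return len(tags) == len(set(tags))

samples = [(0, 0), (1, 1), (2, 1), (3, 0), (4, 1)]
evs = events_from_boolean(samples)
print(evs)                      # [(1, 'rising'), (3, 'falling'), (4, 'rising')]
tags = [t for t, _ in evs]
print(is_totally_ordered(tags))  # True
print(abs(tags[0] - tags[1]))    # metric f(t1, t2) = |t1 - t2| -> 2
```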
FIGURE 5.1 Time representations. (From Luís Gomes and João Paulo Barros, Models of Computation for Embedded
Systems. In The Industrial Information Technology Handbook, Richard Zurawski, Ed., Section VI Real Time and
Embedded Systems, chapter 83, CRC Press, Boca Raton, FL, 2005. With permission.)
FIGURE 5.2 From signals to events and conditions.
Two events are synchronous if they have the same time tag attached (they occur simultaneously).
Similarly, two signals are synchronous if for each event in one signal there is a synchronous event in the
other signal, and vice versa. A system is synchronous if every signal in the system is synchronous with every
other signal in the system. In this sense, a discrete-time system is a synchronous discrete-event system.
Totally ordered events are used with digital hardware simulators, namely the ones associated with the
VHDL and Verilog hardware description languages. Any two events are either simultaneous, which means
that they have the same time tag, or one of them precedes the other. Events can be considered
partially ordered in the sense that the order does not include all the events in the system. When tags are
partially ordered instead of totally ordered, the system is untimed. This means that we can build several
event sequences that do not contain all the system events. These missing events are included in other
completely ordered event sequences. It is known [11] that a total order of events cannot be maintained
in distributed systems, where a partial order is sufficient to analyze system behavior. Partial orders have
also been used to analyze Petri nets [12].
An asynchronous system is a system in which no two events can have the same tag [1]. The system
is asynchronous interleaved if tags are totally ordered, and asynchronous concurrent if tags are partially
ordered.
As time is intrinsically continuous, real systems are asynchronous by nature. Yet, synchronicity is a very
convenient abstraction, allowing efficient and robust implementations through the use of a reference
clock signal.
5.3 Communication Support
The complexity of embedded systems usually motivates their decomposition into several interacting
components. These can be more or less independent; for example, they can be executed in true concurrency
or in an interleaved way, but probably all will have to communicate with some other components.
Therefore, communication is of topmost importance. It can be classified as implicit or explicit [3]:
Implicit communication generally requires totally ordered tag events, normally associated with
physical time. In order to support this form of communication it is necessary to have a physically
shared signal (for instance, a clock signal), whose availability may be difficult or infeasible in a large
number of embedded system applications.
Explicit communication imposes an order on the events: the sender process will guarantee that all
the receiver processes are informed about some part of its internal state.
The following models of communication are normally considered:
Handshake: using a synchronization mechanism; all intervening components are blocked, waiting
for conclusion.
Message passing: using a send-receive pattern where the receiver will wait for the message.
Shared variables: the blocking is decided by the control part of the memory where the shared
variable is stored.
The referred communication modes are supported by a set of communication primitives (or by some
combination), namely [3]:
Unsynchronized: producer and consumer(s) are not synchronized; there are no guarantees that
the producer does not overwrite previously produced data, or that the consumer(s) will get all
produced data.
Read-modify-write: this is the common way to get access to shared data structures from different
processes in software; access to the data structure is locked during a data access (either read-write
or read-modify-write); it is an atomic action (indivisible, and thus uninterruptible).
Unbounded FIFO (first in, first out) buffered: the producer generates a sequence of data tokens and the
consumer will get those tokens using a FIFO discipline.
Bounded FIFO buffered: as in the latter, but the buffer size is limited, so the difference between
writes and reads will be bounded by some value. This means that writes can be blocked if
the buffer is full.
Petri net places: producers generate sequences of data tokens and consumers will read those tokens.
Rendezvous: the writing process and the reading process must simultaneously be at the point where
the write and the read occur.
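As one concrete illustration of these primitives, the bounded-FIFO mode can be sketched with Python's standard queue module, whose Queue(maxsize=n) blocks put() exactly when the buffer is full; the producer/consumer structure here is illustrative:

```python
# Sketch: bounded-FIFO communication between a producer and a consumer thread.
# queue.Queue(maxsize=2) blocks put() when the buffer already holds 2 tokens,
# which is exactly the blocking-on-full behavior of a bounded FIFO.
import queue
import threading

buf = queue.Queue(maxsize=2)
received = []

def producer():
    for token in range(5):
        buf.put(token)        # blocks while the FIFO is full
    buf.put(None)             # sentinel: end of stream

def consumer():
    while True:
        token = buf.get()     # blocks while the FIFO is empty
        if token is None:
            break
        received.append(token)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
print(received)               # [0, 1, 2, 3, 4] -- FIFO order preserved
```

An unbounded FIFO is obtained by dropping maxsize; a rendezvous would additionally require the producer to block until the consumer has taken each token.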
5.4 Common Modeling Formalisms
Most modeling formalisms are control dominated or data dominated. However, as already referred,
embedded systems are composed of a mixture of reactive behavior, control functions, and data processing,
especially those targeted at networking and multimedia applications. In the following sections, a set of
selected formalisms is presented, taking FSMs as the starting point, since they have proved adequate for low-
to medium-complexity control-dominated system modeling.
We can find in the literature numerous proposals extending FSMs in several directions. Each extension
tries to overcome one or more intrinsic FSM shortcomings, from the inability to model concurrency and the
associated state-space explosion problem, to data processing modeling and the absence of hierarchical
structuring mechanisms (supporting specification at different levels of abstraction).
After control-dominated formalisms (emphasizing the reactive nature of embedded systems),
dataflow-dominated formalisms will be presented.
5.4.1 Finite State Machines
Finite state machines are common computational models that have been used by system designers for
decades. It is common to represent FSMs in different ways: from graphical representations (like
state diagrams and flowcharts) to textual representations. In this chapter, state diagrams are used.
The modeling attitude is based on the characterization of the system in terms of the global states that the
system can exhibit, and also in terms of the conditions that can cause a change in those states (transitions
between states). A basic FSM consists of a finite set of states S (with a specified initial state, s_is), a set of input
signals I, a set of output signals O, an output function f, and a next-state function h. The next-state and output
functions map a cross-product of S and I into S and O, respectively (h: S × I → S,
f: S × I → O). Two basic models can be considered for output modeling: the Moore-type machine [13],
also called state-based FSM, where outputs are associated with state activation (and where the output
function f only maps states S into outputs O), and the Mealy-type machine [14], also called transition-based
FSM, where outputs are associated with transitions between states. It is important to note that both models
have the same modeling capabilities. The referred FSM model can be limited or extended to accommodate
different needs (specific modeling capabilities or target architectures), as analyzed in some of the following
sections.
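The definition above translates almost directly into code. A minimal Moore-type sketch follows (illustrative names; h and f are given as lookup tables):

```python
# Sketch: a basic Moore-type FSM as the tuple (S, s_is, I, O, h, f).
# h: S x I -> S is the next-state function; f: S -> O the output function.

class MooreFSM:
    def __init__(self, h, f, initial):
        self.h = h            # next-state function, h[(state, input)] -> state
        self.f = f            # output function, f[state] -> output
        self.state = initial  # specified initial state s_is

    def step(self, i):
        self.state = self.h[(self.state, i)]
        return self.f[self.state]

# Toggle machine: output z is active only while state S1 is active.
h = {("S0", "a"): "S1", ("S1", "a"): "S0"}
f = {"S0": "", "S1": "z"}
fsm = MooreFSM(h, f, "S0")
print([fsm.step("a") for _ in range(4)])   # ['z', '', 'z', '']
```

A Mealy-type machine would instead key the output table on (state, input) pairs, attaching outputs to transitions rather than states.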
Figure 5.3 illustrates a basic notation for a state diagram. Circles or ellipses represent states; transitions
between states use a directed arc. Each arc has an attached expression, potentially containing a reference to
the input event and/or to an external condition that will cause the change of state. Outputs can be modeled
as Moore-type output actions (associated with states, such as z in state S2), or as Mealy-type output events
(associated with transitions, such as x in the presented transition expression).
FSMs are a control-dominated MoC, and so intrinsically adequate to model the reactive
component of an embedded system.
We will introduce a running example, adapted from Reference 15, and we will start using an FSM to
model the system controller. The system to be modeled is the controller of an electric car installed
in an industrial plant. The electric car has to carry goods from one point to another, and come back. The
controller receives commands from the operator, namely actuation on key GO to start the movement
from home position, and actuation on key BACK to force the car to return to home position after
FIGURE 5.3 State diagram basic notation.
FIGURE 5.4 Electric car plant running example.
FIGURE 5.5 State diagram models of an electric car plant controller. (From Luís Gomes and João Paulo
Barros, Models of Computation for Embedded Systems. In The Industrial Information Technology Handbook,
Richard Zurawski, Ed., Section VI Real Time and Embedded Systems, chapter 83, CRC Press, Boca Raton, FL,
2005. With permission.)
the end position is reached. After receiving an order, the car motor is activated accordingly, while the initial, or
the final, position is not yet reached. There are two sensors available for detecting that the home and end positions
have been reached, A and B, respectively. Figure 5.4(a) represents the external view of the controller in terms of
inputs and outputs, and Figure 5.4(b) illustrates the layout of the plant.
Figure 5.5(a) and (b) present two possible (and equivalent) models for the control of the referred
system. The first relies on the evaluation of external conditions (signal values are explicitly checked),
while the second relies on external events (obtained through the preprocessing of external signals). It is
clear that the use of events produces a lighter model, with fewer arcs and inscriptions (it is assumed in this
representation that an event associated with a signal is generated when the signal changes its state from
0 to 1).
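The event-based model of Figure 5.5(b) can be sketched as follows; the encoding is hypothetical, with the Moore outputs M and DIR attached to states and transitions fired by the events GO, B, BACK, and A:

```python
# Sketch of the event-driven controller of Figure 5.5(b): states S0..S3,
# Moore outputs (M, DIR) per state, transitions fired by input events.

TRANSITIONS = {
    ("S0", "GO"): "S1",    # start moving toward the end position
    ("S1", "B"): "S2",     # end position reached, stop
    ("S2", "BACK"): "S3",  # return toward the home position
    ("S3", "A"): "S0",     # home position reached, stop
}
OUTPUTS = {                # (motor on?, direction)
    "S0": (0, None),
    "S1": (1, "right"),
    "S2": (0, None),
    "S3": (1, "left"),
}

def run(events, state="S0"):
    trace = [(state, OUTPUTS[state])]
    for ev in events:
        state = TRANSITIONS.get((state, ev), state)  # ignore irrelevant events
        trace.append((state, OUTPUTS[state]))
    return trace

trace = run(["GO", "B", "BACK", "A"])
print([s for s, _ in trace])   # ['S0', 'S1', 'S2', 'S3', 'S0']
```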
FIGURE 5.6 FSM implementation reference model.
From the point of view of the implementation model, it is common to decompose the system into a set
of functions to compute the next state and the outputs, and a set of state variables, as presented in Figure 5.6.
From the execution semantics point of view, one of two reference approaches can be chosen (which
correspond to different MoCs) [2]:
1. Synchronous FSMs.
2. Asynchronous FSMs.
In synchronous FSMs, both computation and communication happen instantaneously at discrete-time
instants (under the control of clock ticks). In this sense, from the point of view of active state changes, each
transition arc expression is implicitly ANDed with a rising (or falling) edge event of the clock signal.
Referring to Figure 5.6, the clock signal will be connected to the state variables block (for hardware
implementations, a register will be used to implement this block, while for software implementations the
clock will be used to trigger the execution cycle). One strong argument for the use of synchronous FSMs is their
implementation robustness, especially when using synchronous hardware. However, when heterogeneous
implementations are foreseen, some difficulties or inefficiencies may arise (namely in synchronous clock
signal distribution). For distributed heterogeneous systems, it is also of interest to consider a globally
asynchronous locally synchronous approach (GALS systems), where the interaction between components is
asynchronous, although the implementation of each component is synchronous. So, within a synchronous
implementation island (a component), it is possible to rely on robust compilation techniques, either to
optimally map FSMs into Boolean and sequential circuits (hardware) or into software code (supported
by specific tools).
In asynchronous FSMs, process behavior is similar to that of synchronous FSMs, but without
dependency on a clock tick. An asynchronous system is a system in which two events cannot have the same
time tag. In this sense, two asynchronous FSMs never execute a transition at the same time (asynchronous
interleaving). For heterogeneous architectures or for multirate specifications, implementation can be easier
than in the synchronous case. The difficulties come from the need to synchronize communicating transitions,
and to assure that they occur at the same instant, which is essential for a correct implementation of
rendezvous on a distributed architecture.
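A clock-triggered software execution cycle for synchronous FSMs, in the spirit of Figure 5.6, can be sketched as follows (illustrative names): one next-state/output computation runs per clock tick.

```python
# Sketch: clock-triggered execution cycle of a synchronous FSM.
# Each loop iteration plays the role of the clock edge that loads the
# state register, then the Moore output is computed from the new state.

def synchronous_run(next_state, output, state, input_stream):
    outputs = []
    for i in input_stream:            # one iteration per clock tick
        state = next_state(state, i)  # next-state function h: S x I -> S
        outputs.append(output(state)) # Moore output function f: S -> O
    return outputs

# A 2-bit counter that resets on input "r"; h and f given as plain functions.
h = lambda s, i: 0 if i == "r" else (s + 1) % 4
f = lambda s: s
print(synchronous_run(h, f, 0, ["t", "t", "t", "r", "t"]))  # [1, 2, 3, 0, 1]
```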
FSMs have well-known strengths and weaknesses. Among the strengths, we should mention that they
are simple and intuitive to understand, and that they benefit from the availability of robust compilation
tools. These are some of the reasons why designers have extensively used them in the past, and continue
to use them. Unfortunately, several weaknesses prevent their usage for complex systems modeling. Namely,
FSMs do not provide data processing capabilities, support for concurrency modeling, (practical) support
FIGURE 5.7 Control and datapath decomposition.
for data memory, or hierarchical constructs. Several of the modeling formalisms to be presented try to
overcome one, some, or all of the referred weaknesses.
5.4.2 Finite State Machines with Datapath
One common extension to FSMs, trying to cope with the lack of support for data memory and data
processing capabilities, is the Finite State Machine with Datapath (FSMD) [16].
For instance, to model an 8-bit variable with 256 possible values through an FSM, it is necessary to use
256 states; the model loses its expressiveness and the designer cannot manage the specification.
An FSMD adds to a basic FSM a set of variables and redefines the next-state and output functions.
So, an FSMD consists of a finite set of states S (with a specified initial state, s_is), a set of input signals I,
a set of output signals O, a set of variables V, an output function f, and a next-state function h. The
next-state function h maps a cross-product of S, I, and V into S (h: S × I × V → S). The output function f
maps current states to outputs and variables (f: S → O + V). As defined, the output function f only supports
Moore-type outputs; it can also be easily extended to accommodate Mealy-type outputs.
From the implementation point of view, an FSMD model is decomposed as presented in Figure 5.7,
where the control part can be represented by a simple FSM model, and the datapath part can be characterized
through a register transfer architecture. So, the datapath is decomposed into a set of variables to
store operands and results, and a set of processing blocks to perform computation on those values. It has
to be stressed that this is the common reference architecture for single-purpose processor and simple
microprocessor designs.
As a simple example, Figure 5.8 presents the decomposition associated with the modeling of a multiplier
of two numbers, A and B, producing result C through successive additions. Figure 5.8(a) presents the top-level
decomposition and interconnections of control and data blocks, while Figure 5.8(c) presents a simple FSM
to model the control part and Figure 5.8(d) shows the register transfer architecture to support the required
computations (the left-hand side is responsible for counting B times, while the right-hand side is responsible for
the successive additions of A into C).
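A sketch of the Figure 5.8 multiplier as an FSMD follows. The control part steps through states that select datapath operations on the variables RA, RB, CB, and RC; the coding is illustrative and folds the figure's counting/adding states into a single loop state:

```python
# Sketch: FSMD multiplier C = A * B by successive additions.
# Control part: FSM states S0..S4; datapath: variables RA, RB, CB, RC.

def multiply(A, B):
    state = "S0"
    RA = RB = CB = RC = 0
    while True:
        if state == "S0":            # wait for GO (assumed already given)
            state = "S1"
        elif state == "S1":          # LOAD_A, LOAD_B, CLEAR_B, CLEAR_C
            RA, RB, CB, RC = A, B, 0, 0
            state = "S2"
        elif state == "S2":          # STOP when CB == RB, else add and count
            if CB == RB:
                state = "S4"
            else:
                RC += RA             # LOAD_C: successive addition of A into C
                CB += 1              # INC_B: count B times
        elif state == "S4":          # done, assert OK
            return RC

print(multiply(7, 6))   # 42
print(multiply(5, 0))   # 0
```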
5.4.3 Statecharts and Hierarchical/Concurrent Finite State Machines
A second common extension to FSMs tries to cope with the lack of support for concurrency and
hierarchical structuring mechanisms in the model (still emphasizing the reactive part of the model).
FIGURE 5.8 Decomposition of a multiplier into control and datapath.
Several formalisms can be included in the group of hierarchical/concurrent finite state machines
(HCFSMs), all of them including mechanisms for concurrency and hierarchy support, but having different
execution semantics. Among them, Statecharts [7,17] are the most well-known modeling formalism
providing a MoC to specify complex reactive systems. One main advantage of Statecharts over FSMs is the
structuring of the specification, improving readability and easing system maintenance. Those
characteristics were key points that supported their adoption as one of the specification formalisms within
the UML [18-20].
Statecharts are based on state diagrams, plus the notions of hierarchy, parallelism, and communication
between parallel components. Statecharts were informally defined in [17] as Statecharts =
state-diagrams + depth + orthogonality + broadcast-communication.
The depth concept encapsulates the multilevel hierarchical structuring mechanism and is supported by the
XOR refinement mechanism, while the orthogonality concept allows concurrency modeling and is supported
by the AND refinement mechanism. Unfortunately, the semantics of the broadcast-communication mechanism
is not the same in all Statecharts variants, as it was defined in different ways by several authors. This fact had
a strong impact on possible Statecharts operational semantics, as discussed later in this section.
Statecharts define three types of state instances: the set (implementing the AND refinement mechanism),
the cluster (implementing the XOR refinement mechanism), and the simple state. The cluster supports the
hierarchy concept, through encapsulation of state machines. The set supports the concurrency concept,
through parallel execution of clusters.
Figure 5.9 illustrates the usage of the cluster mechanism, adopting a bottom-up approach. Starting with
the SYS_C model, the state diagram composed of states C and D, and associated arcs, can be encapsulated
by the state A, as represented in the SYS_B model. This provides us with a top-level view of the model
composed only of the states A and B, complemented by the inner level if one wants to get further details
about the system behavior, as represented in the SYS_A model. In this sense, the designer has the possibility
to describe the system at different levels of abstraction. The designer is free to follow a top-down or
FIGURE 5.9 Usage of XOR refinement in Statecharts basic models.
FIGURE 5.10 Usage of AND refinement in Statecharts basic models. (From Luís Gomes and João
Paulo Barros, Models of Computation for Embedded Systems. In The Industrial Information Technology Handbook,
Richard Zurawski, Ed., Section VI Real Time and Embedded Systems, chapter 83, CRC Press, Boca Raton, FL, 2005.
With permission.)
a bottom-up approach while producing the system's model by applying the hierarchical decomposition
constructs available through the XOR refinement mechanism.
Figure 5.10 presents a simple model containing a set A composed of three AND components
(B, C, and D); whenever A is activated/deactivated, the associated components B, C, and D will also
be activated/deactivated.
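The XOR refinement of Figure 5.9 can be sketched by letting a cluster own an inner machine, so that the active configuration is a path of nested states (an illustrative encoding, not a full Statecharts semantics):

```python
# Sketch: XOR refinement as nesting. A cluster state "A" encapsulates an
# inner machine over {C, D}; the active configuration is a (top, leaf) path.

CHILDREN = {"A": {"default": "C"}}           # A is a cluster; C its default state
INNER = {("C", "x"): "D", ("D", "y"): "C"}   # transitions inside cluster A
OUTER = {("A", "z"): "B", ("B", "w"): "A"}   # top-level transitions

def step(config, event):
    top, leaf = config
    if top == "A" and (leaf, event) in INNER:
        return (top, INNER[(leaf, event)])   # inner transition, cluster stays
    if (top, event) in OUTER:
        new_top = OUTER[(top, event)]
        # entering a cluster activates its default state (no history here)
        new_leaf = CHILDREN.get(new_top, {}).get("default")
        return (new_top, new_leaf)
    return config                            # event not enabled: no change

cfg = ("B", None)                            # initial state is B (as in SYS_B)
for ev in ["w", "x", "z"]:
    cfg = step(cfg, ev)
    print(cfg)                               # ('A','C'), ('A','D'), ('B',None)
```

Supporting the simple-history property would only require remembering the last active leaf of a cluster instead of re-entering its default state.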
Apart from the referred main characteristics, the Statecharts formalism presents some interesting
features, such as:
The default state, which defines the state that will take control when a transition reaches
a cluster state. In Figure 5.9, SYS_B model, the system initial state is state B, and, after the occurrence
of w, states A and C will become active.
The notion of history, simple or deep, can be associated with cluster state instances. When the system
enters a cluster with the history property, the state that will be active upon entrance will be the one that
was active upon the last exit from that cluster. In the case of the first entrance in the cluster, the
active state will be the default one. This is the case for cluster C of Figure 5.10, which holds
the H attribute inside a circle. The history property can also be deep history, meaning that all
the clusters inside the cluster with the deep-history property also have that property; this is the
case for cluster B in Figure 5.10, which holds the H* attribute.
while performing the action a. (Sometimes the term event is used instead of action.)
An automaton is a more special case, where the set of actions is the set of input/output values. Continuity
of time, if necessary, can be introduced by supplying actions with a duration, that is, by considering
complex actions (a, t), where a is a discrete component of an action (its content) and t is a real number
representing the duration of a. In timed automata, duration is defined nondeterministically and intervals
for possible durations are used instead of specific moments in time.
Transition systems separate the observable part of a system, which is represented by actions, from
the hidden part, which is represented by states. Actions performed by a system are observable by an
external observer and by other systems, which can communicate with the given system, synchronizing their
actions and combining their behaviors. The internal states of a system are not observable; they are
hidden. Therefore, the representation of states can be ignored when considering the external behavior of
a system.
The activity of a system can be described by its history, which is a sequence of transitions beginning
from an initial state:
s0 -a1-> s1 -a2-> ... -> sn -a(n+1)-> s(n+1) -> ...
A history can be finite or infinite. Each history has an observable part (a sequence of actions
a1, a2, ..., an, ...) and a hidden part (a sequence of states). The former is called a trace generated by
the initial state s0 (in Reference 5, the term behavior is used instead of trace). Two states are said to be
trace-equivalent if the sets of all traces generated by these states coincide.
A final history cannot be continued: either it is infinite, or for the last state sn in the sequence there are
no transitions sn -a-> s(n+1) from this state; such a state is called a final state. We distinguish a final state
representing successful termination from deadlock states (states where one part of a system is waiting for an
event caused by another part and the latter is waiting for an event caused by the former) and divergent or
undefined states. Such states can be defined later or constitute livelocks (states that contain hidden infinite
loops or infinite recursive unfolding without observable actions).
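A labeled transition system and the traces it generates can be sketched as follows (a toy encoding; the chapter prescribes no concrete representation):

```python
# Sketch: a labeled transition system as a dict state -> {action -> set(states)},
# with bounded trace generation from an initial state.

LTS = {
    "s0": {"a": {"s1", "s2"}},       # nondeterministic on action a
    "s1": {"b": {"s3"}},
    "s2": {"c": {"s3"}},
    "s3": {},                        # no outgoing transitions: a final state
}

def traces(lts, state, depth):
    """All action sequences of length <= depth generated from `state`."""
    result = {()}
    if depth == 0:
        return result
    for action, targets in lts[state].items():
        for t in targets:
            result |= {(action,) + tr for tr in traces(lts, t, depth - 1)}
    return result

print(sorted(traces(LTS, "s0", 2)))
# [(), ('a',), ('a', 'b'), ('a', 'c')]
```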
Transition systems can be nondeterministic, in which case a system can move from a given state s into
different states performing the same action a. A labeled transition system (without hidden transitions) is
deterministic if for arbitrary transitions s -a-> s' and s -a-> s'', it follows that s' = s''.
6.2.2.1 Behaviors
Agents with the same behavior (i.e., agents which cannot be distinguished by observing their interaction
with other agents and environments) are considered equivalent. We characterize the equivalence of agents
in terms of the complete continuous algebra of behaviors F(A). This algebra has two sorts of elements:
behaviors u ∈ F(A), represented as finite or infinite trees, and actions a ∈ A, and two operations:
prefixing and nondeterministic choice. If a is an action and u is a behavior, prefixing results in a new
behavior denoted as a.u. Nondeterministic choice is an associative, commutative, and idempotent binary
operation over behaviors denoted as u + v, where u, v ∈ F(A). The neutral element of nondeterministic
choice is the deadlock element (impossible behavior) 0. The empty behavior Δ performs no actions and
denotes the successful termination of an agent. The generating relations for the algebra of behaviors are
as follows:
u + v = v + u
(u + v) + w = u + (v + w)
u + u = u
u + 0 = u
δ.u = 0
where δ is the impossible action.
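A finite fragment of this algebra can be sketched by normalizing behaviors into sets of summands, so that commutativity, associativity, and idempotence of + hold by construction (an illustrative encoding; 0 is the empty set of summands):

```python
# Sketch: finite behaviors as frozensets of summands. A summand is either a
# termination constant ("DELTA" for successful termination) or a pair
# (action, behavior) representing the prefixing a.u. Choice u + v is set
# union, so u + v = v + u, (u+v)+w = u+(v+w), u + u = u, and u + 0 = u
# all hold automatically.

ZERO = frozenset()                 # deadlock 0: no summands
DELTA = frozenset({"DELTA"})       # successful termination

def prefix(action, u):
    return frozenset({(action, u)})

def choice(u, v):
    return u | v

u = choice(prefix("a", DELTA), prefix("b", ZERO))
assert choice(u, u) == u           # idempotence: u + u = u
assert choice(u, ZERO) == u        # neutral element: u + 0 = u
assert choice(u, DELTA) != u       # adding termination does change u
print(len(u))                      # 2
```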
Both operations are continuous functions on the set of all behaviors over A. The approximation relation
⊑ is a partial order with minimal element ⊥. Both prefixing and nondeterministic choice are monotonic
with respect to this approximation:
⊥ ⊑ u
u ⊑ v ⇒ u + w ⊑ v + w
u ⊑ v ⇒ a.u ⊑ a.v
The algebra F(A) is constructed so that prefixing and nondeterministic choice are also continuous with
respect to the approximation, and it is closed relative to the limits (least upper bounds) of the directed sets of
finite behaviors. Thus, we can use the fixed point theorem to give a recursive definition of behaviors starting
System Validation 6-5
from the given behaviors. Finite elements are generated by three termination constants: Δ (successful
termination), ⊥ (the minimal element of the approximation relation), and 0 (deadlock).
F(A) can be considered as a transition system with the transition relation defined by u -a-> v if u can be
represented in the form u = a.v + u'. The terminal states are those that can be represented in the form
u + Δ; divergent states are those which can be represented in the form u + ⊥. In algebraic terms, we can
say that u is terminal (divergent) iff u = u + Δ (u = u + ⊥), which follows from the idempotence of
nondeterministic choice. Thus, behaviors can be considered as states of a transition system. Let beh(s)
denote the behavior of an agent in a state s; then the behavior of an agent in state s can be represented as
the solution u_s ∈ F(A) of the system

u_s = Σ_{s -a-> t} a.u_t + ε_s    (6.1)
where ε_s = 0 if s is neither terminal nor divergent, ε_s = Δ if s is terminal but not divergent, ε_s = ⊥ for
divergent but not terminal states, and ε_s = Δ + ⊥ for states which are both terminal and divergent. If all
summands in the representation (6.1) are different, then this representation is unique up to associativity
and commutativity of nondeterministic choice.
As an example, consider the behavior u defined as u = tick.u. This behavior models a clock that
never terminates. It can be represented by a transition system with only one state u, which generates the
infinite history

u -tick-> u -tick-> ...

The infinite tree with only one path representing this behavior can be obtained as the limit of the sequence
of finite approximations u^(0) = ⊥, u^(1) = tick.⊥, u^(2) = tick.tick.⊥, .... Now consider,
u = tick.u + stop.Δ

This is a model of a clock which can terminate by performing the action stop, but where the number of steps
to be done before terminating is not known in advance. The transition system representing this clock
has two states, one of which is a terminal state. The first two approximations of this behavior are

u^(1) = tick.⊥ + stop.Δ
u^(2) = tick.(tick.⊥ + stop.Δ) + stop.Δ

Note that the second approximation cannot be written in the form tick.tick.⊥ + tick.stop.Δ + stop.Δ,
because distributivity of choice does not hold in behavior algebra.

u = tick.u + tick.0

describes a similar behavior, but one that is terminated by deadlock rather than successfully.
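The finite approximations of the recursive clock u = tick.u + stop.Δ can be computed by iterated unfolding. The sketch below uses a hypothetical frozenset encoding, with BOT standing for ⊥ and DELTA for Δ:

```python
# Sketch: finite approximations u^(n) of u = tick.u + stop.DELTA, obtained by
# unfolding the recursion n times starting from BOT (the minimal element).

BOT = frozenset({"BOT"})           # bottom: the undefined behavior
DELTA = frozenset({"DELTA"})       # successful termination

def clock(u):
    """One unfolding step: F(u) = tick.u + stop.DELTA."""
    return frozenset({("tick", u), ("stop", DELTA)})

def approximation(n):
    u = BOT                        # u^(0) = BOT
    for _ in range(n):
        u = clock(u)
    return u

u1 = approximation(1)              # tick.BOT + stop.DELTA
u2 = approximation(2)              # tick.(tick.BOT + stop.DELTA) + stop.DELTA
assert ("stop", DELTA) in u1 and ("tick", BOT) in u1
assert ("tick", u1) in u2          # built by nesting, never by distributing
print(len(u1), len(u2))            # 2 2
```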
6.2.2.2 Bisimilarity
Trace equivalence is too weak to capture the notion of the behavior of a transition system. Consider the
systems shown in Figure 6.1.
Both systems in Figure 6.1 start by performing the action a. But the system on the left-hand side has
a choice at the second step to perform either action b or c. The system on the right can only perform
an action b and can never perform c, or it can only perform c and never perform b, depending on what
decision was made at the first step. The notion of bisimilarity [7] captures the difference between these
two systems.
FIGURE 6.1 Two systems which are trace equivalent but have different behaviors.
A binary relation R ⊆ S × S on the set of states S of a transition system without terminal and divergent
states is called a bisimulation if for each s and t such that (s, t) ∈ R and for each a ∈ A:
1. If s -a-> s', then there exists t' ∈ S such that t -a-> t' and (s', t') ∈ R.
2. If t -a-> t', then there exists s' ∈ S such that s -a-> s' and (s', t') ∈ R.
Two states s and t are called bisimilar if there exists a bisimulation relation R such that (s, t) ∈ R.
Bisimilarity is an equivalence relation whose definition is easily extended to the case when R is defined as
a relation between the states of two different systems, by considering the disjoint union of their sets of states.
Two transition systems are bisimilar if each state of one of them is bisimilar to some state of the other.
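On a finite system, the largest bisimulation can be computed as a greatest fixed point: start from the full relation and delete pairs violating the two transfer conditions until the relation stabilizes. The sketch below is deliberately naive (not an efficient partition-refinement algorithm) and applies it to the two systems of Figure 6.1:

```python
# Sketch: naive greatest-fixpoint computation of the largest bisimulation on a
# finite labeled transition system (dict: state -> {action -> set of states}).

def bisimulation(lts):
    states = list(lts)
    R = {(s, t) for s in states for t in states}
    changed = True
    while changed:
        changed = False
        for (s, t) in list(R):
            ok = all(
                any((s2, t2) in R for t2 in lts[t].get(a, ()))
                for a, targets in lts[s].items() for s2 in targets
            ) and all(
                any((s2, t2) in R for s2 in lts[s].get(a, ()))
                for a, targets in lts[t].items() for t2 in targets
            )
            if not ok:
                R.discard((s, t))
                changed = True
    return R

# The two systems of Figure 6.1, taken as one disjoint system:
lts = {
    "p0": {"a": {"p1"}}, "p1": {"b": {"p2"}, "c": {"p3"}}, "p2": {}, "p3": {},
    "q0": {"a": {"q1", "q2"}}, "q1": {"b": {"q3"}}, "q2": {"c": {"q4"}},
    "q3": {}, "q4": {},
}
R = bisimulation(lts)
print(("p0", "q0") in R)   # False: trace-equivalent but not bisimilar
print(("p2", "q3") in R)   # True: both final, with no transitions
```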
For systems with nontrivial sets of terminal states S_Δ, partial bisimulation is
considered instead of bisimulation. A binary relation R ⊆ S × S is a partial bisimulation if for all s and t
such that (s, t) ∈ R and for all a ∈ A:
1. If s ∈ S_Δ then t ∈ S_Δ, and if s ∉ S_Δ then t ∉ S_Δ.
2. If s -a-> s', then there exists t' such that t -a-> t' and (s', t') ∈ R.
3. If t -a-> t', then there exists s' such that s -a-> s' and (s', t') ∈ R.
A state s of a transition system S is called a bisimilar approximation of t, denoted by s ⊑_B t, if there
exists a partial bisimulation R such that (s, t) ∈ R. Bisimilarity s ~_B t can then be introduced as the
relation s ⊑_B t ∧ t ⊑_B s. For attributed transition systems, the additional requirement is that if (s, t) ∈ R,
then s and t have the same attributes.
A divergent state without transitions approximates arbitrary other states that are not terminal. If s
approximates t and s is convergent (not divergent), then t is also convergent, s and t have transitions for
the same sets of actions, and satisfy the same conditions as for bisimulation without divergence. Otherwise,
if s is divergent, the set of actions for which s has transitions is only included in the set of actions for
which t has transitions, that is, s is less defined than t. For the states of a transition system it can be
proved that

s ⊑_B t ⇔ beh(s) ⊑ beh(t)
s ~_B t ⇔ beh(s) = beh(t)

and, therefore, the states of an agent considered up to bisimilarity can be identified with the corresponding
behaviors. If S is the set of states of an agent, then U = {beh(s) | s ∈ S} is the set of all its behaviors.
This set is transition closed, which means that u ∈ U and u -a-> v implies v ∈ U. Therefore, U is also a
transition system equivalent to S and can be used as a standard behavior representation of an agent.
For many applications, a weaker equivalence, such as weak bisimilarity introduced by Milner [8] or
insertion equivalence as discussed in Section 6.2.3, has been considered. Note that, for deterministic
systems, if two systems are trace equivalent, they are also bisimilar.
6.2.2.3 Composition of Behaviors
Composition of behaviors is defined as an operation over agents and is expected to preserve equivalence;
it can, therefore, also be defined as an operation on behaviors.
The sequential composition of behaviors u and v is a new behavior denoted as (u; v) and defined by
means of the following inference rules and equations:
u -a-> u'  ⟹  (u; v) -a-> (u'; v)    (6.2)

((u + Δ); v) = (u; v) + v    (6.3)
((u + ⊥); v) = (u; v) + ⊥    (6.4)
(0; u) = 0    (6.5)

We consider a transition system with states built from arbitrary behaviors over the set of actions A by
means of the operations of the behavior algebra F(A) and a new operation denoted as (u; v). Expressions
are considered up to the equivalence defined by the above equations (thus, the extension of a behavior
algebra by this operation is conservative). The inference rule (6.2) defines a transition relation on the set of
equivalence classes.

From rule (6.2) and equations (6.3) and (6.4) it follows that (Δ; v) = v and (⊥; v) = ⊥. One can prove that
(u; Δ) = u and that sequential composition is associative and distributes to the left:

((u + v); w) = (u; w) + (v; w)

Sequential composition can also be defined explicitly by the following recursive definition:

(u; v) = Σ_{u -a-> u'} a.(u'; v) + Σ_{u = u + ε} (ε; v)

where the second sum is over the termination constants ε that are summands of u.
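For finite behaviors, this recursive definition translates directly into code. In the sketch below (Python; the representation of a behavior as a pair of a transition set and a set of termination constants is our own encoding, not the handbook's), seq implements equations (6.2) through (6.5):

```python
# Our own encoding of finite behaviors as (transitions, terminations):
# transitions is a frozenset of (action, behavior) pairs; terminations is
# a frozenset drawn from {"delta", "bot"}. Deadlock 0 has neither.
def beh(trans=(), term=()):
    return (frozenset(trans), frozenset(term))

DELTA = beh(term=["delta"])     # successful termination
BOT = beh(term=["bot"])         # divergence
ZERO = beh()                    # deadlock

def plus(u, v):                 # nondeterministic choice u + v
    return beh(u[0] | v[0], u[1] | v[1])

def seq(u, v):
    """Sequential composition (u; v) following equations (6.2)-(6.5)."""
    trans, term = u
    out_trans = {(a, seq(u1, v)) for a, u1 in trans}
    out_term = set()
    if "delta" in term:         # ((u + delta); v) = (u; v) + v
        out_trans |= v[0]
        out_term |= v[1]
    if "bot" in term:           # ((u + bot); v) = (u; v) + bot
        out_term.add("bot")
    return beh(out_trans, out_term)

a_beh = beh([("a", DELTA)])
b_beh = beh([("b", DELTA)])
```

On these small examples the stated laws, such as associativity and left distributivity, can be checked by structural equality.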
6.2.2.3.1 Parallel Composition of Behaviors
We define an algebraic structure on the set of actions A by introducing the combination a × b of actions
a and b. This operation is commutative and associative with the impossible action ∅ as the zero element
(a × ∅ = ∅). As ∅.u = 0, there are no transitions labeled ∅. The inference rules and equations defining
the parallel composition u ∥ v of behaviors u and v are:

u -a-> u', v -b-> v', a × b ≠ ∅
--------------------------------
u ∥ v -(a × b)-> u' ∥ v'

u -a-> u'
------------------
u ∥ v -a-> u' ∥ v

v -b-> v'
------------------
u ∥ v -b-> u ∥ v'

(u + Δ) ∥ (v + Δ) = (u + Δ) ∥ (v + Δ) + Δ
(u + ⊥) ∥ v = (u + ⊥) ∥ v + ⊥
u ∥ (v + ⊥) = u ∥ (v + ⊥) + ⊥

The following equations for termination constants are direct consequences of these definitions:

Δ ∥ Δ = Δ,  ⊥ ∥ Δ = Δ ∥ ⊥ = ⊥ ∥ ⊥ = ⊥
0 ∥ ε = ε ∥ 0 = 0  if ε ≠ ε + ⊥
0 ∥ ε = ε ∥ 0 = ⊥  if ε = ε + ⊥
Parallel composition is commutative and associative.
Parallel composition is the primary means for describing the interaction of agents. The simplest interaction
is interleaving, which trivially defines the combination as a × b = ∅ for arbitrary actions. Agents
in a parallel composition interact with each other and can synchronize via combined actions. Parallel
composition can also be defined explicitly by the following recursive definition:
composition can also be dened explicitly by the following recursive denition:
(u v ) =
u
a
u
v
b
v
(a b ) (u
) +
u
a
u
a (u
v ) +
v
b
v
b (u v
) +
u
v
where
u
is a termination constant in the equational representation of behavior u.
6.2.3 Environments
An environment E is an agent over an action algebra C with an insertion function. All states of the
environment are initial states. The insertion function, denoted by e[u], takes the behavior e of an
environment and the behavior u of an agent over an action algebra A (the action algebra
of agents may be a parameter of the environment) and yields a new behavior of the same environment.
The insertion function is continuous in both of its arguments.
We consider agents up to a weaker equivalence than bisimilarity. Consider the example in Figure 6.2.
Clearly, these systems are not bisimilar. However, if a represents the transmission of a message, and b
represents the reception of that message, the second trace of the system on the left-hand side of the figure
would not be possible within an environment that supports asynchronous message passing. Consequently,
both systems would always behave the same. Insertion equivalence captures this situation: the environment
can impose constraints on the inserted agent, such as disallowing the behavior b.a in this example. In
such an environment, both behaviors shown in Figure 6.2 are considered equivalent.
Insertion equivalence depends on the environment and its insertion function. Two agents u and v are
insertion equivalent with respect to an environment E, written as u ~_E v, if for all e ∈ E, e[u] = e[v].
Each agent u defines a transformation on the set of environment states; two agents are equivalent with
respect to a given environment if they define the same transformation of the environment.
FIGURE 6.2 Two systems which are not bisimilar, but may be insertion equivalent.
FIGURE 6.3 Agents in environment.
After insertion of an agent into an environment, the new environment is ready to accept new agents to
be inserted. Since insertion of several agents is a common operation, we shall use the notation

e[u₁, …, uₙ] = e[u₁] ⋯ [uₙ]

as a convenient shortcut for insertion of several agents.
In this expression, u₁, …, uₙ are agents inserted into the environment simultaneously, but the order of
insertion may be essential for some environments. If we want an agent v to be inserted after an agent u,
we must find some transition e[u] -a-> s and consider the expression s[v]. Some environments can move
independently, suspending the actions of an agent inserted into them. In this case, if e[u] -a-> e'[u], then
e'[u, v] describes the simultaneous insertion of u and v into the environment in state e' as well as the
insertion of u when the environment is in state e, followed by the insertion of v.
An agent can be inserted into the environment e[u₁, u₂, …, uₙ], or that environment can itself be
considered as an agent which can be inserted into a new external environment e'.

An insertion function can be defined by means of rewriting rules of the form

F(x)[G(y)] → F'(z)[G'(z)]

where x = (x₁, …, xₙ), y = (y₁, …, yₙ), z = (x₁, x₂, …, y₁, y₂, …); x₁, x₂, …, y₁, y₂, … are action or
behavior variables, and F, G, F', G' are expressions in the behavior algebra, that is, expressions built by
nondeterministic choice and prefixing. More complex rules allow arbitrary expressions on the right-hand
side in the behavior algebra extended by insertion as a two-sorted operation. The first type of rule defines
observable transitions

F(x)[G(y)] -d-> F'(z)[G'(z)]
The second type of rule defines unlabeled transitions, which can be used as auxiliary rules. They are not
observable outside the environment and can be reduced by the rule

e[u] →* e'[u'],  e'[u'] -d-> e''[u'']
--------------------------------------
e[u] -d-> e''[u'']

where →* means the transitive closure of unlabeled transitions. Special rules or equations must be added
for termination constants. Rewriting rules must be left linear with respect to the behavior variables, that is,
none of the behavior variables can occur more than once in the left-hand side. Additional completeness
conditions must be present to ensure all possible states of the environment are covered by the left-hand
sides of the rules. Under these conditions, the insertion function will be continuous even if there
are infinitely many rules. This is because, to compute the function e[u], one needs to know only some
finite approximations of e and u. If e and u are defined by means of a system of fixed point equations,
these approximations can be easily constructed by unfolding these equations sufficiently many times.
Insertion functions that are defined by means of rewriting rules can be classified on the basis of the
height of the terms F(x) and G(y) in the left-hand sides of the rules. The simplest case is when this height
is no more than 1, that is, terms are sums of variables and expressions of the form c.z, where c is
an action and z is a variable. Such insertion functions are called one-step insertions; other important
classes are head insertion and look-ahead insertion functions. For head insertion, the restriction that the
height should not exceed 1 refers only to the agent behavior term G(y). The term F(x) can be
of arbitrary height. Head insertion can be reduced to one-step insertion by changing the structure of
the environment while preserving the insertion equivalence of agents. In head insertion, the interaction
between the environment and agent is similar to the interaction between a server and a client: a server
has information only about the next step in the behavior of the client but knows everything about its own
behavior. In a look-ahead insertion environment, the behavior of an agent can be analyzed for arbitrarily
long (but finite) future steps. We can liken such an environment to the interaction between an interpreter
and a program.
We consider one-step insertion, which applies in many practical cases, restricting ourselves to
purely additive insertion functions, that is, those satisfying the conditions

(Σᵢ eᵢ)[u] = Σᵢ eᵢ[u],   e[Σᵢ uᵢ] = Σᵢ e[uᵢ]

Given two functions D₁ : A × C → 2^C and D₂ : C → 2^C, the transition rules for insertion functions are

u -a-> u', e -c-> e', d ∈ D₁(a, c)
-----------------------------------
e[u] -d-> e'[u']

e -c-> e', d ∈ D₂(c)
---------------------
e[u] -d-> e'[u]
We refer to D₁ and D₂ as residual functions. The first rule (interaction rule) defines the interaction between
the agent and the environment, which consists of choosing a matching pair of actions a ∈ A and c ∈ C.
Note that the environment and the agent move independently. If the choice of action is made first by
the environment, then the choice of action c by the environment defines a set of actions that the agent
may take: a can be chosen only so that D₁(a, c) ≠ ∅. The observable action d must be selected from the
set D₁(a, c). This selection can be restricted by the external environment, if e[u] considered as an agent
is inserted into another environment, or by other agents inserted into environment e[u] after u. This rule can be
combined with rules for unobservable transitions if some action, say τ (as in Milner's CCS), is selected in C
to hide the transition. For this case we formulate the interaction rule to account for hidden interactions:
u -a-> u', e -c-> e', τ ∈ D₁(a, c)
-----------------------------------
e[u] → e'[u']
The second rule (environment move rule) describes the case when the environment transitions independently
of the inserted agent, and the agent waits until the environment allows it to move.
Unobservable transitions can also be combined with environment moves. Some equations should be
added for the case when e or u are termination constants. We shall assume that ⊥[u] = ⊥, 0[u] = 0,
e[Δ] = e, e[⊥] = ⊥, and e[0] = 0. There are no specific assumptions about Δ[u], but usually neither ⊥
nor 0 belongs to E. Note that, in the case when Δ ∈ E and Δ[u] = u, insertion equivalence coincides
with bisimulation. The definition of the insertion function for one-step insertion discussed earlier will be
complete if we assume that there are no transitions other than those defined by the rules.
The definition above can be expressed in the form of rewriting rules as follows:

d ∈ D₁(a, c)  ⟹  (c.x)[a.y] → d.x[y]
d ∈ D₂(c)  ⟹  (c.x)[y] → d.x[y]

and in the form of an explicit recursive definition as

e[u] = Σ_{e -c-> e'} Σ_{u -a-> u'} Σ_{d ∈ D₁(a, c)} d.e'[u'] + Σ_{e -c-> e'} Σ_{d ∈ D₂(c)} d.e'[u] + ε_e[u]
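These rules are directly executable for finite environments and agents. In the sketch below (Python), the LTS dictionaries, the state names, and the concrete residual functions are all invented for illustration; only the two rule shapes come from the text:

```python
# One-step insertion: compute the transitions of e[u] from the
# transitions of e and u and the residual functions d1, d2.
def insertion_transitions(env_trans, agent_trans, e, u, d1, d2):
    """d1(a, c) and d2(c) each return a set of observable actions."""
    result = set()
    for c, e2 in env_trans.get(e, ()):
        # interaction rule: agent and environment move together
        for a, u2 in agent_trans.get(u, ()):
            for d in d1(a, c):
                result.add((d, (e2, u2)))
        # environment move rule: the agent waits
        for d in d2(c):
            result.add((d, (e2, u)))
    return result

# Invented example: an environment action is a pair
# (expected agent action, residual action offered to the outside).
env = {"e0": {(("a", "d"), "e1")}}
agent = {"u0": {("a", "u1")}}
D1 = lambda a, c: {c[1]} if c[0] == a else set()
D2 = lambda c: set()
```

Here the environment in state e0 offers to synchronize on a and to emit the residual d, so e0[u0] has exactly one transition, labeled d, into e1[u1].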
To compute transitions for the multiagent environment e[u₁, u₂, …, uₙ], we recursively compute
transitions for e[u₁], then for e[u₁, u₂] = (e[u₁])[u₂], and eventually for e[u₁, u₂, …, uₙ] =
(e[u₁, u₂, …, u_{n-1}])[uₙ].
Important special cases of one-step insertion functions are parallel and sequential insertion. An
insertion function is called a parallel insertion if

e[u, v] = e[u ∥ v]

This means that the subsequent insertion of two agents can be replaced by the insertion of their parallel
composition. The simplest example of a parallel insertion is defined as e[u] = e ∥ u. This special case
holds when the sets of actions of environment and agents are the same (A = C), b ∈ D₁(a, a × b), and
D₂(a) = A. In the case when Δ ∈ E, this environment is a set of all other agents interacting with a given
agent in parallel, and insertion equivalence coincides with bisimilarity. Sequential insertion is introduced
in a similar way:

e[u, v] = e[u; v]

This situation holds, for example, when D₁(a, c) = ∅, D₂(c) = C, and Δ[u] = u.
6.2.3.2 Example: Agents over a Shared and Distributed Store
As an example, consider a store, which generalizes the notions of memory, databases, and other information
environments used by programs and agents to hold data. An abstract store environment E is an
environment over an action algebra C, which contains the set of actions A used by agents inserted into this
environment. We shall distinguish between local and shared store environments. The former can interact
with an agent inserted into it while this agent is not in a final state; if another agent is inserted into this
environment, the activity of the latter is suspended until the former completes its work. A shared store
admits interleaving of the activity of the agents inserted into it, and they can interact concurrently through
this shared store.

6.2.3.2.1 Local and Shared Store
The residual functions for a local store are defined as

D₁(a, c) = {d | c = a × d}, where a × d ≠ ∅ for d ∈ C\A and a × d = ∅ otherwise, and D₂(c) = C

and for a shared store as

D₁(a, c) = {d | c = a × d}, where a × d ≠ ∅ for d ∈ C, and D₂(c) = C.
It can be proved that the one-step insertion function for a local store is a sequential insertion and that
one-step insertion for a shared store is a parallel insertion. In other words,

e[u₁, u₂, …] = e[u₁; u₂; …]

for a local store, and

e[u₁, u₂, …] = e[u₁ ∥ u₂ ∥ …]

for a shared store. The interaction move for the local store is defined as

u -a-> u', e -(a × d)-> e'
---------------------------
e[u] -d-> e'[u']
When the store moves according to this rule, an agent inserted into it plays the role of control for this
store. A store in a state e[u] can only perform actions which are allowed by the agent u. The action a can be
combined only with an action d which is not from the action set A and therefore cannot be used by another agent
in a transition. The actions returned by the residual function are external actions and can be observed and
used only from outside the store environment.
Unlike in a local store, in a shared store environment several agents can perform their actions in
parallel, according to the rule

u₁ -a₁-> u₁', …, uₙ -aₙ-> uₙ', e -(a₁ × ⋯ × aₙ × d)-> e'
---------------------------------------------------------
e[u₁ ∥ ⋯ ∥ uₙ ∥ v] -d-> e'[u₁' ∥ ⋯ ∥ uₙ' ∥ v]
An important special case of the store environment E is a memory over a set of names R and a data
domain D. The memory can be represented by an attributed transition system with attributes R and states
e : R → D. Agent actions are assignments and conditions, and their combinations are possible if they can
be performed simultaneously. If a is a set of assignments, then in a transition e -a-> e' the state e' results
from applying a to e. A conjunction of conditions c enables a transition e -(c × a)-> e' if c is valid on e and
e -a-> e'.
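A memory state is simply a function from names to values, and a combined action is a guard plus a set of simultaneous assignments. A minimal sketch (Python; the dictionary and lambda encoding is our own, not the handbook's):

```python
# Memory environment step: a state is a dict from names to values; an
# action is a list of condition predicates combined with a dict of
# simultaneous assignments (name -> function of the old state).
def step(e, conditions, assignments):
    """Apply the combined action c x a to memory state e; returns the
    new state, or None when some condition is not valid on e."""
    if not all(cond(e) for cond in conditions):
        return None
    # every right-hand side is evaluated on the old state e, so the
    # assignments are performed simultaneously
    return {**e, **{name: rhs(e) for name, rhs in assignments.items()}}

state = {"x": 1, "y": 2}
swap = {"x": lambda e: e["y"], "y": lambda e: e["x"]}
```

Because the right-hand sides are all evaluated on the old state, the swap assignment really exchanges the two values instead of copying one of them twice.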
6.2.3.2.2 Multilevel Store
For a shared memory store, the residual action d in the transition

e[u₁ ∥ ⋯ ∥ uₙ ∥ v] -d-> e'[u₁' ∥ ⋯ ∥ uₙ' ∥ v]

is intended to be used by external agents inserted later, but in a multilevel store it is convenient to restrict
the interaction with the environment to a given set of agents which have already been inserted. For this
purpose, a shared memory can be inserted into a higher level closure environment with an insertion
function defined by the equation

g[e[u]][v] = g[e[u ∥ v]]

where g is a state of this environment, e is a shared memory environment, and only the following two
rules are used for transitions in the closure environment:

e[u] -c-> e'[u'], c ∈ C_ext, φ_ext(c, e) ≠ ∅
---------------------------------------------
g[e[u]] -φ_ext(c, e)-> g[e'[u']]

e[u] → e'[u']
----------------------
g[e[u]] → g[e'[u']]
Here C_ext is a distinguished set of external actions. Some external actions can contain occurrences
of names from e. The function φ_ext substitutes the values of these names in c and performs other
transformations to make an action observable for the external environment.
Two-level insertion can be described in the following way. Let R = R₁ ∪ R₂ be divided into two
nonintersecting parts: the external and the internal memory. Let A₁ be the set of actions which change only
the values of R₁ but can use the values of R₂ (external output actions), let A₂ be the set of actions which
change only the values of R₂ but can use the values of R₁ (external input actions), and let A₃ be the set
of actions which change and use only the values of R₂ (internal actions). These sets are assumed to be
defined on the syntactical level. Redefine the residual function D₁ and the transitions of E: let a ∈ A and split
a into a combination of actions α₁(a) × α₂(a) × α₃(a) so that α₁(a) ∈ A₁, α₂(a) ∈ A₂, and α₃(a) ∈ A₃
(some of these actions may be absent). Define the interaction rule in the following way:

u -a-> u', e -(θ(α₂(a)) × α₃(a))-> e'
--------------------------------------
e[u] -(c_θ × α₁(a))-> e'[u']

where θ is an arbitrary substitution of names used in conditions and in the right-hand sides of assignments
of α₂(a) by their values, θb is the application of the substitution θ to b, and c_θ is the substitution
written in the form of the condition r₁ = θ(r₁) ∧ r₂ = θ(r₂) ∧ ⋯. Define φ_ext(b, e) = b·e, that is,
the substitution of the values of R₂ into b.
Consider a two-level structure of a store state

t[g[e₁[u₁]] ∥ g[e₂[u₂]] ∥ ⋯]

where t ∈ D^{R₁} is a shared store and e₁, e₂, … ∈ D^{R₂} represent the distributed store (memory). When a
component g[eᵢ[uᵢ]] performs internal actions, these are hidden and do not affect the shared memory.
Performing external output actions changes the values of names of the shared memory, and external input actions
receive values from the shared memory to change components of the distributed memory. This construction is
easily iterated, as the components of a distributed memory can themselves have a multilevel structure.
6.2.3.2.3 Message Passing
Distributed components can interact via shared memory. We now introduce direct interaction via message
passing. Synchronous communication can be organized by extending the set of actions with a combination
of actions in parallel composition, independently of the insertion function. To describe synchronous data
exchange in the most general abstract schema, let

u = Σ_{d ∈ D} a(d).F(d),   u' = Σ_{d' ∈ D} a'(d').F'(d')

be two agents which use a data domain D for the exchange of information, where the functions a and a'
represent parameterized actions. Then

u ∥ u' = Σ_{a(d) × a'(d') ≠ ∅} (a(d) × a'(d')).(F(d) ∥ F'(d')) + Σ_{d ∈ D} a(d).(F(d) ∥ u') + Σ_{d' ∈ D} a'(d').(F'(d') ∥ u)

(note that ε_u = ε_{u'} = 0, i.e., this is a special case of parallel composition where there are no termination
constants). The first summand corresponds to the interaction of the two agents. The other two summands
reflect the possibility of interleaving. The interaction can be deterministic even if u and u' are
nondeterministic, if a(d) × a'(d') ≠ ∅ for only one pair of data values. The interleaving summands can
be eliminated when the composition is embedded into another parallel composition. They can also be hidden by a closure
environment (similar to restriction in the Calculus of Communicating Systems, CCS).
The exchange of information through combination is bidirectional. An important special case
of information exchange is the use of send/receive pairs. For example, consider the following
combination rule:

send(addr, d) × receive(addr', d') = exch(addr), if addr = addr' and d = d', and ∅ otherwise

In the latter case, if

u = send(addr, d).v
and

u' = Σ_{d' ∈ D} receive(addr, d').F(d')

then the interaction summand of the parallel composition will be exch(addr).(v ∥ F(d)).
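The combination rule itself is a small partial function on actions. A sketch in Python (the tuple encoding of send, receive, and exch actions is our own; None stands for the impossible action ∅):

```python
# send(addr, d) x receive(addr', d') = exch(addr) when the addresses and
# the data match; otherwise the combination is impossible (None).
def comb(act1, act2):
    if act1[0] == "receive" and act2[0] == "send":
        act1, act2 = act2, act1          # the combination is commutative
    if act1[0] == "send" and act2[0] == "receive":
        _, addr, d = act1
        _, addr2, d2 = act2
        if addr == addr2 and d == d2:
            return ("exch", addr)
    return None
```

Any pair of actions that is not a matching send/receive pair, including two sends or mismatched data, combines to the impossible action and therefore contributes no synchronization summand.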
Asynchronous message passing via channels can be described by introducing a special communication
environment. The attributes of this environment are channels, and their values are sequences (queues)
of stored messages. It is organized similarly to the memory environment, but queue operations are used
instead of storing, and send and receive actions are separated in time. This environment is a
special case of a store environment and can be combined with a store environment, keeping separate the
different types of attributes and actions.
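Such a communication environment keeps one FIFO queue per channel: send enqueues and returns immediately, while receive dequeues the oldest message. A sketch (Python; the class and method names are invented for illustration):

```python
from collections import deque

class ChannelEnv:
    """Communication environment: attributes are channels, values are
    queues of messages; send and receive are separated in time."""
    def __init__(self):
        self.channels = {}      # channel name -> queue of messages

    def send(self, channel, message):
        """send enqueues; it never blocks."""
        self.channels.setdefault(channel, deque()).append(message)

    def receive(self, channel):
        """receive dequeues the oldest message, or returns None when
        the receiver must wait on an empty queue."""
        q = self.channels.get(channel)
        return q.popleft() if q else None

env = ChannelEnv()
env.send("c", "m1")
env.send("c", "m2")
```

The queue preserves message order, which is exactly the asynchronous-passing constraint discussed for Figure 6.2: a reception can only follow the corresponding transmission.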
6.2.4 Classical Theories of Concurrency
The theory of interaction of agents and environments [9-11] focuses on the description of multiagent
systems comprised of agents cooperatively working within a distributed information environment.
Other mathematical models for the specification of dynamic and real-time systems interacting with
environments have been developed based on process algebras (CSP, CCS, ACP, etc.), automata models (timed
Büchi and Muller automata, abstract state machines [ASM]), and temporal logics (LPTL, LTL, CTL, CTL*).
New models are being developed to support different peculiarities of application areas, such as Milner's
π-calculus [12] for mobility and its recent extension to object-oriented descriptions.
The environment may change the predefined behavior of an agent. For example, it may contain
some other agents designed independently and intended to interact and communicate with the agent
during its execution. The classical theories of communication consider this interaction as part of the
parallel composition of agents. The influence of the environment can be expressed as an explicit language
operation such as restriction (CCS) or hiding (CSP).

In contrast to the classical theories of interaction, which are based on an implicit and hence not
formalized notion of an environment, the theory of interaction of agents and environments studies them
as objects of different types. In our approach the environment is considered as a semantic notion and is
not explicitly included in the agent. Instead, the meaning of an agent is defined as a transformation of an
environment which corresponds to inserting the agent into its environment. When the agent is inserted
into the environment, the environment changes, and this change is considered to be a property of the agent
described.
6.2.4.1 Process Algebras
An algebraic theory of concurrency and communication that deals with the occurrence of events rather
than with updates of stored values is called a process algebra. The main variants of process algebra are
generally known by their acronyms: CCS [8], the Calculus of Communicating Systems developed by Milner;
CSP [13], Hoare's Communicating Sequential Processes; and ACP, the Algebra of Communicating
Processes of Bergstra and Klop [14]. These theories are based on transition systems and bisimulation
and consider the interaction of composed agents. They employ nondeterministic choice as well as parallel
and sequential composition as primitive constructs. The influence of the environment on the system
may be expressed as an explicit language operation, such as restriction in CCS or hiding in CSP. These
theories consider communicating agents as objects of the same type (this type may be parameterized by
the alphabets of events or actions) and define operations on these types.

The CCS model specifies sets of states of systems (processes) and transitions between these states.
The states of a process are terms, and the transitions are defined by the operational semantics of the
computation, which indicates how and under which conditions a term transforms itself into another
term. Processes are represented by the synchronization tree (or process graph). Two processes are identified
through bisimulation.
CCS introduces a special action τ, called the silent action, which represents an internal and invisible
transition within a process. Other actions are split into two classes: output actions, which are indicated by
an overbar, and input actions, which are not decorated. Synchronization only takes place between a single
input and a single output, and the result is always the silent action τ. Thus, a × ā = τ for all actions a.
Consequently, communication serves only as synchronization; its result is not visible.
The π-calculus [12] is an enhancement of CCS and models concurrent computation by processes
that exchange messages over named channels. A distributed interpretation of the π-calculus provides for
synchronous message passing and nondeterministic choice. The π-calculus focuses on the specification
of the behavior of mobile concurrent processes, where mobility refers to variable communication via
named channels, which are the main entities in the π-calculus. Synchronization takes place only between
two channel agents when they are available for interchange (a named output channel is indicated by an
overbar, while an input channel with the same name is not decorated). The influence of the environment
in the π-calculus is expressed as an explicit operation of the language (hiding). As a result of this operation,
a channel is declared inaccessible to the environment.
CSP explicitly differentiates the set of atomic actions that are allowed in each of the parallel processes.
The parallel combinator is indexed by these sets: in the composition (P _{A}∥_{B} Q), P engages only in events from the set
A, and Q only in events from the set B. Each event in the intersection of A and B requires the synchronous
participation of both processes, whereas other events require only the participation of the relevant single
process. As a result, a × a = a for all actions a. The associative and commutative binary combination operator
describes how the output data supplied by two processes are combined before transmission to their common
environment.
In CSP, a process is considered to run in an environment which can veto the performance of certain
atomic actions. If, at some moment during the execution, no action in which the process is prepared to
engage is allowed by the environment, then a deadlock occurs, which is considered to be observable.
Since in CSP a process is fully determined by the observations obtainable from all possible finite
interactions, a process is represented by its failure set. To define the meaning of a CSP program, we determine
the set of states corresponding to normal termination of the program and the set of states corresponding
to its failures. Thus, the CSP semantics is presented in model-theoretic terms: two CSP processes are
identified if they have the same failure set (failure equivalence).
The main operations of ACP are prefixing and nondeterministic choice. This algebra allows an event
to occur with the participation of only a subset of the concurrently active processes, perhaps omitting
any that are not ready. As a result, the parallel composition of processes is a mixture of synchronization
and interleaving, where each of the processes either proceeds independently or is combined by × with a
corresponding event of another process. The merge operator is defined as

Merge(a, b) = (a × b) + (a; b) + (b; a)

ACP defines its semantics algebraically; processes are identified through bisimulation.

Most differences between CCS, ACP, and CSP can be attributed to differences in the chosen style of
presentation of the semantics: the CSP theory provides a model, illustrated with algebraic laws. CCS is
a calculus, but the rules and axioms in this calculus are presented as laws valid in a given model. ACP is a
calculus that forms the core of a family of axiomatic systems, each describing some features of concurrency.
6.2.4.2 Temporal Logic
Temporal logic is a formal specification language for the description of various properties of systems.
A temporal logic is a logic augmented with temporal modalities to allow the specification of the order of
events in time, without introducing time explicitly as a concept. Whereas traditional logics can specify
properties relating to the initial and final states of terminating systems, a temporal logic is better suited to
describe the ongoing behavior of nonterminating and interacting (reactive) systems.

As an example, Lamport's TLA (Temporal Logic of Actions) [5,15] is based on Pnueli's temporal logic
[16] with assignment and an enriched signature. It supports syntactic elements taken from programming
languages to ease the maintenance of large specifications. TLA uses formulae on behaviors, where a behavior is
considered as a sequence of states. States in TLA are assignments of values to variables. A system satisfies
a formula iff that formula is true in all behaviors of this system. Formulae whose arguments are only
the old and the new states are called actions.
Here, we distinguish between linear and branching temporal logics. In a linear temporal logic, each
moment of time has a unique possible future, while in a branching temporal logic, each moment of time
may have several possible futures. On the one hand, linear temporal logic formulae are interpreted over linear
sequences of points in time and specify the behavior of a single computation of a system. Formulae of
a branching temporal logic, on the other hand, are interpreted over tree-like structures, each describing
the behavior of the possible computations of a nondeterministic system.
Many temporal logics are decidable, and corresponding decision procedures exist for linear and branching
time logics [17], propositional modal logic [18], and some variants of CTL*.

A rewriting rule t → t' has two readings. Computationally, it means that a fragment of a system state that is an instance of
the pattern t can change to the corresponding instance of t'.
Statements which do not explicitly mention state or time are considered as referring to an arbitrary
current state or an arbitrary current moment of time. Statements with Kleene operations refer to discrete
time and are reduced to logical statements as follows:

(P₁ · P₂ · ⋯ · Pₙ) time t = (P₁ time (t − n + 1)) ∧ (P₂ time (t − n + 2)) ∧ ⋯ ∧ (Pₙ time t)

It(P) time t = ∃(s ≤ t) ∀(s')((s ≤ s' ∧ s' ≤ t) → P time s')
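Over an explicit discrete timeline, these reductions can be evaluated directly. A sketch (Python; modeling a predicate as a function from a time point to a truth value is our own encoding):

```python
# Discrete-time evaluation of the Kleene-operation reductions above.
def kleene_seq(preds, t):
    """(P1 . P2 . ... . Pn) time t: each Pi holds at time t - n + i."""
    n = len(preds)
    return all(p(t - n + 1 + i) for i, p in enumerate(preds))

def it(pred, t, start=0):
    """It(P) time t: P holds over some interval [s, t] with s <= t."""
    return any(all(pred(s2) for s2 in range(s, t + 1))
               for s in range(start, t + 1))

after2 = lambda t: t >= 2     # an example predicate: true from time 2 on
```

For instance, a sequence of three copies of after2 holds at time 4 (it covers times 2, 3, 4) but not at time 3 (it would need time 1).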
6.4.4 Example: Railroad Crossing Problem
The railroad crossing problem is a well-known benchmark to assess the expressiveness of development
techniques for interactive systems. We illustrate the description of a synchronous system (in discrete time)
relying on duration functionals. The problem statement is to develop a control device for a railroad
crossing so that safety and liveness conditions are satisfied. This system has three components, as shown
in Figure 6.4.
The n-track railroad has the following observable attributes: InCr is a Boolean variable equal to 1 if
a train is at the crossing; Cmg(i) is a Boolean variable equal to 1 if a train is coming on track number i.
At the moment this attribute becomes equal to 1, the time left until the train reaches the crossing is not
less than d_min, and it remains 1 until the train reaches the crossing. Cmg(i) is an input signal to the
controller, which has a single output signal DirOp. When DirOp equals 1, the gate starts opening, and
when it becomes 0, the gate starts closing. The attribute gate shows the position of the gate. It is equal
to opened when the gate is completely open and closed when it is completely closed. The time taken for
the gate to open is d_open; the time taken to close is d_close. The requirements text below omits the
straightforward static requirements. The dynamic properties of the system are safety and liveness. Safety
means that when the train is at the crossing, the gate is closed. Liveness means that the gate will open
when the train is at a safe distance (Code 6.1).
FIGURE 6.4 Railroad crossing problem: an n-track railroad (signals InCr, Cmg), a controller (output DirOp), and a gate (attribute Gate).
Code 6.1
parameters(
d_min,
d_close,
d_open,
WT );
attributes(n:int)(
InCr:bool,
Cmg(n):bool,
DirOp:bool,
gate );
let C1:(d_min>d_close);
let C2:(d_close>0);
let Duration Theorem: Forall(x,d)(
always(dur Cmg(x) > d -> (¬DirOp))->
always(dur (¬DirOp) > dur Cmg(x)+(-1)*(d+1)) );
/* ------------- Environment spec ------------------------ */
let CrCm: always(InCr->Exist x (dur Cmg(x) > d_min));
let OpnOpnd:always( dur DirOp >d_open ->(gate=opened));
let ClsClsd:always( dur (¬DirOp)>d_close->(gate=closed));
/* ------------ Controller spec ------------------------ */
let Contr1: always(Exist x (dur Cmg(x) > WT ) -> (¬DirOp));
let Contr2: always(Forall x (WT >= dur Cmg(x)) -> DirOp );
/* ------------- Safety and Liveness --------------------- */
let(WT=d_min+(-1)*d_close);
prove Safety: always(InCr->(gate=closed));
prove Liveness: always(
Forall x ((WT > dur Cmg(x)) -> (gate=opened)));
System Validation 6-33
Note the assumption of the Duration Theorem in the requirements to shorten the proofs of safety and
liveness.
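Under the reading of Code 6.1 above (DirOp = 0 commands closing, WT = d_min − d_close), the safety property can be exercised by a minimal discrete-time simulation. The step model, the constants, and the single-track, one-train-per-run setup below are illustrative assumptions, not the handbook's semantics:

```python
# Sketch (not the handbook's semantics): a discrete-time run of the
# crossing with one track and one train; constants are illustrative.
d_min, d_close, d_open = 10, 3, 4
WT = d_min - d_close                      # let(WT = d_min + (-1)*d_close)

def run(arrival):
    """The train starts coming at t = 0 and is in the crossing at
    t = arrival; by CrCm, dur Cmg > d_min there, so arrival > d_min.
    Returns True iff the gate is closed at arrival (Safety)."""
    dur_cmg = dur_not_dirop = 0
    for t in range(arrival + 1):
        dir_op = not (dur_cmg > WT)       # controller: Contr1/Contr2
        dur_not_dirop = 0 if dir_op else dur_not_dirop + 1
        gate_closed = dur_not_dirop > d_close   # gate: ClsClsd
        if t == arrival:
            return gate_closed
        dur_cmg += 1                      # the train is still coming

print(all(run(a) for a in range(d_min + 1, 3 * d_min)))  # Safety holds
```

A train that arrived earlier than d_min would reach the crossing before the gate had been closing for d_close steps, which is exactly the situation the environment assumption CrCm rules out.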
6.4.5 Requirement Specifications
The example in Section 6.4.4 is rather simple in the number of requirements. Requirement specifications
used in practice to describe embedded systems are typically much more complex. Requirement specifications
may consist of hundreds or thousands of static requirements and a large domain description
through attributes and parameters. Each requirement is usually simple, but taken together the resultant
behavior may be complex and contain inconsistencies or be incomplete.
We use attributed transition systems to describe the requirements for embedded systems. The formal
specification of requirements consists of the environment description, the description of common system
properties in the form of axioms, the insertion function defined by static requirements, and intended
properties of the system as a whole defined as dynamic requirements.
A typed list of system parameters and a typed list of system attributes are used to describe the structure
of the environment. The parameters of the system are variables which have influence on the behavior of
the environment; they can change their values from one configuration of the system to another, but they
never change their value during the execution of the system. Examples of system parameters are the set of
tasks for an embedded operating system, the bus threshold for a device controller, etc. System attributes
are variables that differ between the observable states of the environment. Attributes may change their
values during runtime. Examples of attributes are the queue of tasks which are ready to be executed by
the operating system, or the current data packet for a device controller.
As an example, we consider (in simplified form) several fragments of the formalized requirements for
an embedded operating system for automotive electronics, OSEK [103]. A typed list of system parameters
and a typed list of system attributes describe the structure of the environment (Code 6.2).
Code 6.2
parameters (
tasks: Set of name,
resources: Set of name
);
attributes (
suspended: Set of name,
ready: Set of name,
running: name
);
Parameters of the system are variables which have influence on the behavior of the environment and
can change their values from one configuration of the system to another, but never change their value
during the execution of the system.
The operating system (environment) and executing tasks (agents) interact via service calls. The list of
actions contains the names of the services provided by the system, including service parameters,
if any (Code 6.3).
Common system properties are defined as propositions in first-order logic extended with temporal
modalities. For example, consider the following requirement: the length of the queue of suspended
tasks can never be greater than the number of defined tasks. We formalize this requirement as follows
(Code 6.4).
To define the transitions of the system when processing a request for a service, we use Hoare-style
triple notation, as defined above (Code 6.5).
Code 6.3
actions (a: name) (
Activate a,
Terminate,
Schedule );
Code 6.4
Let SuspendedLengthReq:
Always ((length(suspended)<|tasks|) \/ (length(suspended) = |tasks|));
Code 6.5
req Activate1: Forall (a:name, s: Set of name, r: Set of name) (
( (suspended = s) & (ready = r) & (a in s) )
-> after (Activate a)
( (suspended = (s setminus a)) & (ready = (r union a)) ));
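Read operationally, the Activate1 triple is a guarded state transformer: if the precondition holds, the postcondition fixes the new attribute values. A minimal sketch of that reading; the dictionary encoding of the environment state is an assumption, not the tool's semantics:

```python
# Sketch: the Activate1 triple read as a guarded state transformer.
# The dictionary encoding of the environment state is an assumption.

def activate(state, a):
    """Move task a from suspended to ready (precondition: a in suspended)."""
    s, r = state["suspended"], state["ready"]
    if a not in s:                        # precondition fails: rule not applicable
        raise ValueError("precondition of Activate1 not satisfied")
    return {"suspended": s - {a},         # suspended = s setminus a
            "ready": r | {a},             # ready = r union a
            "running": state["running"]}

st = {"suspended": {"t1", "t2"}, "ready": set(), "running": None}
print(activate(st, "t1")["ready"])        # {'t1'}
```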
The insertion function expressed by this rule is sequential, in that only one running task can be
performed at a time; all others are in the suspended or ready state. A task becomes running as a result of
performing a schedule action. It is selected from a queue of ready tasks ordered by priorities. Agents can
change the behavior of the environment by service requests. The interaction between the environment
and the agents is defined by an insertion function, which computes the new behavior of the environment
with inserted agents.
The part of the description of requirements specific to sequential environments is the definition of the
interaction of agents and environments, where this interaction is described by the insertion function. The
most straightforward way to define this function is through interactive requirements: an action is allowed
to be processed if and only if the current state of the environment matches one of the preconditions for
service requests. This is denoted as E -(act)-> E′, intuitively meaning that the environment E allows
the action act and, if it is processed, the environment will be equal to E′.
The agent (the composition of all agents) interacting with the environment requests the service act if and
only if it transits from its current state u into a state u′ by performing act; preconditions of different rules
for the same action cannot be true simultaneously.
Static requirements for synchronous systems can use Kleene expressions over conditions and duration
functions with numeric inequalities in preconditions. These requirements are converted into standard
form with logic statements relating to adjacent time intervals.
FIGURE 6.6 Sample wave diagram: signals S1, S2, S3 over time points t1, t2, t3.
6.4.6 Reasoning about Embedded Systems
The theory of agents and environments has been implemented in the system 3CR [104]. The kernel of
our system [105] consists of a simulator for a generic Action Language (AL) [10,11] for the description of
system behaviors, of services for automatic exploration of the behavior tree of a system, and of a theorem
prover for first-order predicate logic, enriched with a theory of linear equations and inequalities. It provides
the following technologies supporting the development, verification, and validation of requirements for
embedded systems:
Prove the internal consistency and completeness of static requirements of a system.
Prove dynamic properties of the system defined by static requirements, including safety, liveness,
and integrity conditions.
Translate systems described in standard engineering languages (e.g., MSC, SDL, or wave
diagrams) into the first-order format described earlier and simulate these models in user-defined
environments.
Generate test suites for a system defined by verified requirements specifications and validate the
implementations of the system against these test cases.
These facilities can be used in automated as well as in interactive mode. To determine consistency and
completeness of requirements for interactive systems we rely on the theory of interaction of agents and
environments as the underlying formal machinery.
6.4.6.1 Algebraic Programming
The mathematical models described in Section 6.2 can be made more concrete by imposing structure on
the state space of transition systems. A universal approach is to consider an algebraic structure on the set
of states of a system. Then states are represented by algebraic expressions, and transitions can conveniently
be defined by (conditional) rewriting rules. A combination of conditional rewriting rules with congruence
on the set of algebraic expressions can be defined in terms of rewriting logic [32].
Most modern rewriting techniques are considered primarily in the context of equational theories
but could also be applied to first-order or higher-order clausal or nonclausal theorem proving. The
main disadvantage of computations with such systems is their relatively weak performance. For instance,
rewriting modulo associativity and commutativity (AC-matching) is NP-complete. Consequently, these
systems are usually not powerful enough when real-life problems are considered.
Our environment [105] supports reasoning in noncanonical rewriting systems. It is possible to combine
arbitrary systems of rewriting rules with different rewrite strategies. The equivalence relation (basic
congruence) on a set of algebraic expressions is introduced by means of interpreters for operations
which define a canonical form. The primary strategy of rewriting is a one-step syntactic rewriting with
postcanonization by means of reducing the rewritten node to this canonical form. All other strategies are
combinations of the primary strategy with different traversals of the tree representing a term structure.
Rewrite strategies can be chosen from the library of strategies or written as procedures or functions.
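The primary strategy can be sketched concretely: one syntactic rewrite step, followed by reduction of the rewritten node to canonical form. In the sketch below the term representation, the single rule, and the constant-folding canonizer are all illustrative assumptions:

```python
# Sketch of the primary strategy: one syntactic rewrite step followed by
# postcanonization of the rewritten node. Terms are nested tuples; the
# single rule and the constant-folding canonizer are illustrative.

def canon(t):
    """Canonical form: fold additions of integer constants."""
    if isinstance(t, tuple) and t[0] == "+":
        a, b = canon(t[1]), canon(t[2])
        if isinstance(a, int) and isinstance(b, int):
            return a + b
        return ("+", a, b)
    return t

def rewrite_once(t, rules):
    """Apply the first matching rule (root first, then left-to-right),
    canonizing the rewritten node."""
    for matches, rhs in rules:
        if matches(t):
            return canon(rhs(t))
    if isinstance(t, tuple):
        for i in range(1, len(t)):
            r = rewrite_once(t[i], rules)
            if r != t[i]:
                return canon(t[:i] + (r,) + t[i + 1:])
    return t

# One rewriting rule: double(x) -> x + x.
rules = [(lambda t: isinstance(t, tuple) and t[0] == "double",
          lambda t: ("+", t[1], t[1]))]

print(rewrite_once(("double", 3), rules))            # 6
print(rewrite_once(("+", ("double", 2), 5), rules))  # 9
```

Other strategies would vary only the traversal order around the same one-step core.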
The generic AL [11,106] is used for the syntactical representation of agents as programs and is based
on the behavior algebra defined in Section 6.2. The main syntactic constructs of AL are prefixing,
nondeterministic choice, sequential composition, and parallel composition. Actions and procedure calls are
primitive statements. It provides the standard termination constants (successful termination, divergence,
deadlock). The semantics of this language is parameterized by an intensional semantics, defined through
an unfolding function for procedure calls, and an interaction semantics, defined by the insertion function
of an environment into which the program will be inserted. The intensional semantics and the interaction
semantics are defined as systems of rewriting rules.
The intensional semantics of an AL program is an agent which is obtained by unfolding procedure calls
in the program and defining transitions on a set of program states. It is defined independently of the
environment by means of rewriting rules for the unfolding function (unfolding rules), up to bisimulation.
The left-hand side of an unfolding rule is an expression representing a procedure call. The right-hand side
of an unfolding rule is an AL program which may be unfolded further, generating more and more exact
approximations of the behavior under recursive computation.
The only built-in compositions of AL are prefixing and nondeterministic choice. The unfoldings of
parallel and sequential composition are flexible and can be adjusted by the user. Alternatives for parallel
composition are defined by the choice of the combination operator. For example, when the combination
of arbitrary actions is the impossible action, parallel composition is reduced to interleaving. On the other
hand, exclusion of interleaving from the unfolding rules defines parallel composition as synchronization
at each step (similar to handshaking in Milner's π-calculus).
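The role of the combination operator can be shown on finite action sequences. In this sketch the agent encoding and the two combination operators are assumptions chosen for illustration:

```python
# Sketch: unfolding parallel composition with a pluggable combination
# operator. Agents are finite action sequences; names are illustrative.

def par_steps(p, q, combine):
    """First-step alternatives of p || q as (action, (p', q')) pairs."""
    steps = []
    if p:
        steps.append((p[0], (p[1:], q)))     # interleaving: p moves alone
    if q:
        steps.append((q[0], (p, q[1:])))     # interleaving: q moves alone
    if p and q:
        c = combine(p[0], q[0])
        if c is not None:                    # combined (synchronous) step
            steps.append((c, (p[1:], q[1:])))
    return steps

interleave = lambda a, b: None               # combination is impossible
sync = lambda a, b: (a, b)                   # actions always combine

print([a for a, _ in par_steps(["a1"], ["b1"], interleave)])  # ['a1', 'b1']
print([a for a, _ in par_steps(["a1"], ["b1"], sync)])
```

Dropping the two interleaving alternatives while keeping only the combined step corresponds to synchronization at each step, as described above.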
The interaction semantics of AL programs is defined through the insertion function. Programs are
considered up to insertion equivalence. Rewriting rules which define the insertion function (insertion
rules) have the following structure: the left-hand side of an insertion rule is the state or behavior of the
environment with a sequence of agents inserted into this environment (represented as AL programs). The
right-hand side is a program in AL augmented by calls to the insertion function, denoted as env(E, u),
where E is an environment state expression and u is an AL program. To compute the interaction semantics
of an AL program, one uses both the unfolding rules for procedure calls and the insertion rules to unfold calls
to the insertion function.
In this approach, the environment is considered as a semantic notion and is not explicitly included
in the agent. Instead, the meaning of an agent is defined as a transformation of an environment which
corresponds to inserting the agent into its environment. When the agent is inserted into the environment,
the environment changes, and this change is considered to be a property of the agent described.
6.4.6.2 Simulating Transition Systems
The AL has been implemented by means of a simulator [10,106,107], an interactive program which
generates all histories of an environment with inserted agents and which can explore the behavior of
this environment step-by-step, starting from any possible initial state, with branching at nondeterministic
points and backtracking to previous states. The simulator permits forward and backward moves
along histories; in automatic mode it can search for states satisfying predefined properties (deadlock,
successful termination, etc.) or properties defined by the user. The generation of histories may be user
guided and thus permits examination of different histories. The user can retrieve information about the
current state of a system and change this state by means of inserting new agents using different insertion
functions.
Arbitrary data structures can be used for the representation of the states of an environment and the
environment actions. The set of states of an environment is closed under the insertion function e[u],
which is denoted in the simulator as env(e, u). The agent u is represented by an AL expression. Arbitrary
algebraic data structures can be used for the representation of agent actions and procedure calls.
The core of the simulator is specified as a nondeterministic transition system that functions as an
environment for the system model. Actions of the simulating environment are expressed by means of calls
for services of the simulator. Local services define one-step transitions of the simulated system. Global
services permit the user to compute different properties of the behavior of a simulated system. The user
can formulate a property of a state by means of a rewriting rule system or some other predicate function,
and the simulator will search for the existence of a state satisfying the property among the states reachable
from the current state. Examples of such properties are deadlock, successful termination, undefined states,
and so on.
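The global search service can be sketched as a breadth-first exploration of reachable states. The toy step function and deadlock predicate below are illustrative stand-ins for a real environment:

```python
# Sketch of the simulator's global search service: breadth-first
# exploration of reachable states for one satisfying a user predicate.
from collections import deque

def search(initial, step, prop):
    """Return the first reachable state satisfying prop, else None."""
    seen, frontier = {initial}, deque([initial])
    while frontier:
        state = frontier.popleft()
        if prop(state):
            return state
        for nxt in step(state):               # one-step transitions
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return None

# Toy environment: a counter that may do +1 or *2; no moves at >= 20.
step = lambda n: [] if n >= 20 else [n + 1, 2 * n]
deadlock = lambda n: step(n) == []            # no outgoing transitions

print(search(1, step, deadlock))
```

Backtracking exploration of individual histories, as the simulator offers interactively, would replace the queue with a stack and record the path.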
6.4.6.3 Theorem Proving
The proof system [108] is based on the interactive evidence algorithm [109–111], a Gentzen-style
calculus with unification used for first-order reasoning.
The Interactive Evidence Algorithm is a sequent calculus and relies on the construction of an auxiliary
goal as the main inference step which allows easy control of the direction of the search for proofs at each
step through the choice of auxiliary goals. This algorithm can be represented as a combination of two
calculi: inference in the calculus of auxiliary goals is used as a single-step inference in the calculus of
conditional sequents. In a sense, the interactive evidence algorithm generalizes logic programming in that
for the latter, auxiliary goals are extracted from Horn disjuncts while in the interactive evidence algorithm
they are extracted from arbitrary formulae with quantiers (which need not be skolemized).
The interactive evidence algorithm is implemented as a nondeterministic algebraic program extracted
from the calculus based on the simulator for AL. This program is inserted as an agent into a control
environment which searches for a proof, organizes interaction with the user and the knowledge bases,
and implements strategies and heuristics to speed up the proof search. The control environment contains
the assumptions of a conditional sequent, and so the local information can be combined with other
information taken from knowledge base agents and used in search strategies.
The prover is invoked by the function prove, implemented as a simple recursive procedure with
backtracking, which takes an initial conditional sequent as argument and searches for a path from the
initial statement to axioms; this path is then converted to a proof. The inference search is nondeterministic
owing to disjunction rules.
Predicates are considered up to the equivalence defined by means of all Boolean equations except
distributivity. A function Can, defined by means of a system of rewriting rules, defines the reduction of
predicate formulae as well as propositional formulae to a normal form. Predicate formulae are considered
up to renaming of bound variables and the equations ¬(∀x)p = (∃x)¬p and ¬(∃x)p = (∀x)¬p. Associativity,
commutativity, and idempotence of conjunction and disjunction, as well as the laws of contradiction,
excluded middle, and the laws for propositional constants, are used implicitly in these equations.
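A Can-like canonizer for the propositional part can be sketched with flattening, sets, and sorting standing in for associativity, idempotence, and commutativity. The tuple encoding of formulas is an assumption made for illustration:

```python
# Sketch: a Can-like canonizer for propositional formulas, using the
# implicit equations above (associativity, commutativity, idempotence
# of "and"/"or", laws for constants, double negation). Formulas are
# ("and", ...), ("or", ...), ("not", f), atoms, or True/False.

def can(f):
    if not isinstance(f, tuple):
        return f
    if f[0] == "not":
        g = can(f[1])
        if isinstance(g, tuple) and g[0] == "not":
            return g[1]                      # double negation
        if g in (True, False):
            return not g                     # constant laws
        return ("not", g)
    op = f[0]                                # "and" or "or"
    unit = (op == "and")                     # True is the unit of "and"
    args = set()
    for g in map(can, f[1:]):
        if isinstance(g, tuple) and g[0] == op:
            args.update(g[1:])               # associativity: flatten
        elif g == (not unit):
            return not unit                  # absorbing constant
        elif g != unit:
            args.add(g)                      # idempotence via set
    if not args:
        return unit
    if len(args) == 1:
        return args.pop()
    return (op, *sorted(args, key=repr))     # commutativity: sort

print(can(("or", "p", ("or", "q", "p"))))    # ('or', 'p', 'q')
print(can(("and", "p", True)))               # p
```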
6.4.7 Consistency and Completeness
The notion of consistency of requirements in general is equivalent to the existence of an implementation
or model of a system that satisfies these requirements. Completeness means that this model is unique
up to some predefined equivalence. The traditional way of proving consistency is to develop a model
coded in some programming or simulation language and to prove that this code is correct with respect to
the requirements. However, direct proving of correctness is difficult because it demands computing the
necessary invariant conditions for the states of a program. Another method is generating the space of all
possible states of a system reachable from the initial states and checking whether the dynamic requirements
are satisfied in each state. This approach is known as model checking, and many systems which support
model checking have been developed. Unfortunately, model checking is realistic only if the state space is
finite and all reachable states can be generated in a reasonable amount of time.
Our approach proves consistency and completeness of requirements directly, without developing a
model or implementation of the system. We prove that the static requirements define the system
completely and that dynamic properties of consistent requirements are all the logical consequences of static
requirements. Based on this assumption, one can define an executable specification using only static
requirements and then execute it using a simulator.
We distinguish between the consistency and completeness of static requirements and dynamic
consistency. The first is defined in terms of static requirements only and reflects the property of a system
to respond deterministically to actions of the environment. For example, a query from a client to a server, as the
action of an inserted agent, can be selected nondeterministically, but the response must be defined by static
requirements selected in a deterministic manner. When all dynamic requirements are consequences
of static requirements, we say the system is dynamically consistent.
Sufficient conditions for the consistency of static requirements depend on subject domains and implicit
assumptions about the change of observable attributes. For example, for the classes of asynchronous
systems considered previously, the condition for internal consistency is simply that the conjunction of two
preconditions corresponding to different rules with the same action is not satisfiable. Completeness means
that the disjunction of all preconditions for all rules corresponding to the same action is generally valid.
For synchronous systems, on the other hand, it is the nonsatisfiability of two preconditions corresponding
to rules which define conflicting changes to the same (usually binary) attribute. The incompleteness of
static requirements is usually not harmful; it merely postpones design decisions to the implementation
stage. However, it is harmful if there exists an implementation which meets the static requirements but
does not meet the dynamic requirements.
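For Boolean attributes, both sufficient conditions can be checked by brute force over valuations. The following sketch, with hypothetical attribute names and rules, stands in for the prover:

```python
# Sketch: brute-force versions of the asynchronous-system conditions
# above, over Boolean attributes (a tiny stand-in for the prover).
from itertools import product, combinations

ATTRS = ("ready", "busy")

def valuations():
    return (dict(zip(ATTRS, bits))
            for bits in product([False, True], repeat=len(ATTRS)))

def consistent(pres):
    """No two preconditions of the same action hold simultaneously."""
    return all(not (p(v) and q(v))
               for p, q in combinations(pres, 2) for v in valuations())

def complete(pres):
    """The disjunction of all preconditions is generally valid."""
    return all(any(p(v) for p in pres) for v in valuations())

# Two rules for one action, guarded by "busy" and "not busy".
pres = [lambda v: v["busy"], lambda v: not v["busy"]]
print(consistent(pres), complete(pres))  # True True
```

In the tool this check is a proof obligation submitted to the prover rather than an enumeration, which is what makes non-finite attribute domains tractable.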
Dynamic consistency of requirements (the invariance of dynamic conditions expressed using the temporal
modality always) can be proven inductively using the structure of static requirements. Consistency
checking proceeds by formulating and proving consistency conditions for every pair of static requirements
with the same starting event. Every such pair of requirements must satisfy the condition that, for arbitrary
values of attributes, at least one of the two requirements has a false precondition or
the postconditions are equivalent.
Completeness of requirements means that there exists exactly one model for the requirements up to
some equivalence. We distinguish two main cases depending on the focus of the requirements specification.
If the specification defines the environment, the equivalence of environments needs to be considered.
Otherwise, if an agent is defined by the requirements, the equivalence of agents needs to be examined.
Let e and e′ be two environment states (of the same or different environments). We say that e and e′
are equivalent if, for arbitrary bisimilar agents u and u′, e[u] and e′[u′] are also
bisimilar. If there are restrictions on possible behaviors of the agents, we consider admissible agents
rather than arbitrary agents.
Let E and E′ be two environments (each being a set of environment states and an insertion function).
These environments are equivalent if each state of one of the environments is equivalent to some state of
the other.
If the set of requirements defines an agent for a given environment E, logical completeness (with
respect to the agent definition) means that all agents satisfying these requirements are insertion equivalent
with respect to the environment E; that is, if u and u′ satisfy the requirements, then e[u] is bisimilar to e[u′].
We check completeness for the set of all static requirements that refer to the same starting event. Every
such set of requirements must satisfy the condition that for arbitrary values of attributes there must be at
least one among the requirements that is applicable with a true precondition.
6.5 Examples and Results
Figure 6.7 exhibits a design process using the 3CR [104] tool set. The requirements for a system are
represented as input text written in the formal requirements language or translated from engineering
notations, such as SDL or MSC. Static requirements are sent to the checker which establishes their
consistency and completeness. The checker analyzes a requirement statement and generates a logical
FIGURE 6.7 Design process: static and dynamic requirements pass through the checker and prover; structure, behavior, and environment models feed the simulator, which is used to generate an executable spec, generate tests, and validate.
statement expressing the consistency of the given requirement with other requirements already accepted,
as well as a statement expressing completeness after all static requirements have been accepted. Then this
statement is submitted to the prover in order to search for a proof. The prover may return one of three
answers: proved, not proved, or unknown. In the case where consistency could not be proven, one of the
following types of inconsistencies is considered.
Inconsistent formalization. This type of inconsistency can be eliminated through improved
formalization, if the postconditions are consistent for the states where all preconditions are true. Splitting
the requirements can help.
Inconsistency resulting from incompleteness. This is the case when two requirements are consistent,
but the nonintersection of preconditions cannot be proved because complete knowledge of the
subject domain is not available. A discussion with experts or the authors of the requirements is
recommended.
Inconsistency. Preconditions intersect, but postconditions are inconsistent after performing
an action. This is a sign of a possible error, which can be corrected only by changing the requirements.
If the intersection is not reachable, the inconsistency will not actually arise; in this case,
a dynamic property can be formulated and proven.
Dynamic properties are checked after accepting all static requirements. These are logical statements
expressing properties of a system in terms of first-order predicate calculus, extended by temporal
modalities as well as higher-order functions and types. If an inductive proof is needed, all static requirements
are used for generating lemmas to prove the inductive step.
After checking the consistency and completeness of static requirements, the requirements are used for
the automatic generation of an executable specification of a system satisfying the static requirements. At
this point, the dynamic requirements have already been proven to be consequences of static requirements,
so the system also satisfies the dynamic requirements. The next step of system design would be the use of
the obtained information in the next stages of development. For example, executable specifications can
be used for generating complete test cases for system test.
6.5.1 Example: Embedded Operating System
In this section, we shall describe a general model which could be used for developing formal requirements
for embedded operating systems such as OSEK [103].
The requirements for the OSEK operating system can serve as an example of the application of the
general methodology of checking consistency. These requirements comprise two documents: OSEK Concept
and OSEK API. The first document contains an informal description of conformance classes (BCC1,
BCC2, ECC1, ECC2, ECC3) and requirements on the main services of the system. The second document
refines the requirements in terms of C function headers and types of service calls.
Two kinds of requirements can be distinguished in these documents. Static requirements define
permanent properties of the operating system, which must be true for arbitrary states and any single-step
transition. These requirements refer to the structure of operating system states and their changes in
response to the performance of services. Dynamic requirements state global system properties, such as the
absence of deadlocks or priority inversions.
Using the theory of interaction of agents and environments as the formalism for the description
of OSEK, an environment consists of a processor (or processor network), an operating system, and
the external world, which interacts with the environment via some kind of communication network;
agents are tasks interacting with the operating system and communication network via services. We use
nondeterministic agents over a set of actions representing operating system services as models of tasks. The
states of the environment are characterized by a set of observable attributes, with actions corresponding
to the actions of task agents.
Each attribute defines a partial function from the set E of environment states to the set of values D.
E is considered as an algebra with a set of (internal or external) operations defined on it.
The domain D should be defined as abstractly as possible, for example, by means of set-theoretic
constructions (functions, relations, powersets) over abstract data types represented as initial algebras, in order to
be as independent as possible of the details of implementation when formulating the requirements
specifications.
In monoprocessor systems only one agent is in the active state, that is, capturing the processor resource.
If e is a state of the environment containing no active agents, then in the representation e[u] of the environment
the state u is the state of the active agent. All other agents are in nonactive states (suspended and ready states
for OSEK) and are included in the state e as parts of the values of attributes.
The properties of an environment can be divided into static and dynamic properties. Static properties
define one-step transitions of a system; dynamic properties define the properties of the total system. The
general form of a rule for transitions is:

e --c--> e′,  u --a--> u′
⟹  e[u] --d--> e′[u′]

In this rule d, e′, and u′ are determined by c, a, e, and u. The transitions of the environment state are
defined attribute-wise: let the transitions p_i --c--> v_i be defined for all i ∈ I ⊆ [1 : n], where I is the set of all
indices for which such transitions are defined. Then e′.p_i = v_i for i ∈ I and e′.p_i = e.p_i for i ∉ I. From
this definition it follows that if I = ∅ and e --c--> e′, then e′.p_i = e.p_i for all i ∈ [1 : n].
In the case when two states of the environment with equal values of all attributes are bisimilar, this rule
is sufficient to define the transitions of the environment. Otherwise we can introduce a hidden part of the
environment state and consider transitions of attributes jointly with this hidden component.
For space considerations, in Section 6.5.1.1 we show only the example of a simple scheduler applicable
to this class of operating systems.
6.5.1.1 Requirements Specication for a Simple Scheduler
This example of a simplified operating system providing initial loading and scheduling for tasks and
interrupt processing is used as a benchmark to demonstrate the approach for formalizing and checking
the consistency of requirements. We use the terminology of OSEK [103].
The attributes of the scheduler are:
Active, a name
Priority, a partial function from names to natural numbers
Ready, a list of name/agent pairs
Call, a partial function from names to agents
The empty list and the everywhere-undefined function are denoted as Nil. These attributes are defined
only for nonterminal and deterministic states. The actions of task agents are calls for services:
new_task (a, i), a is a name of an agent, i is an integer
activate a, a is a name
terminate
schedule
In the following requirements we assume that the current state of the environment is e[u] and that
u --c--> u′ for a given service c. The values of attributes are their values in the state e. We define the
transitions e[u] --d--> e′[u′].
The actions of the environment include all task actions and, in addition, the following actions, which are
specific to the environment and are addressed to an external observer of scheduler activity:
loaded a, a is a name
activated a, a is a name
activate_error
schedule_error
terminated a, a is a name
schedule u, u is an agent
scheduled a, a is a name
wait
start_interrupt
end_interrupt
6.5.1.1.1 Requirements for new_task
This action replaces the old task with the same name if it was previously defined in the scheduler, or
otherwise adds the task to the environment as a new task. Transitions for the attributes:

priority : f  --new_task(a:v,i)-->  priority : f[a := i]
We use the following notation for the redefinition of functions: if f : X → Y and x ∈ X, then f[x := y] is
a new function g such that g(x) = y and g(x′) = f(x′) for x′ ≠ x. The rule for the environment is:

u --new_task(a:v,i)--> u′
⟹  e[u] --loaded a--> e′[u′]
6.5.1.1.2 Requirements for Activate
We use the following notation: if p is an attribute, its value is a function, and x is in the domain of this
function, then p(x) denotes the current value of this function on x.

call a = v
ready : r  --activate a-->  ready : ord(a : v, r)

The function ord is defined on the set of lists of pairs (a : u), where a is a name and u is an agent, and this
function must satisfy the following system of axioms, where all parameters are assumed to be universally
quantified:

ord(a : , r) = r
priority b ≥ priority a  ⟹  ord(a : u, (b : v, r)) = (b : v, ord(a : u, r))

Hence ready is a queue of task agents ordered by priorities, and adding a pair (a : u) puts this pair last
among all pairs of the same priority as a. The rules are:
e --activate a--> e′,  u --activate a--> u′,  a ∈ Dom(call)
⟹  e[u] --activated a--> e′[u′]

u --activate a--> u′,  a ∉ Dom(call)
⟹  e[u] --activate_error-->
An undefined state of the environment only means that a decision about the behavior of the environment
in this case is left for the implementation stage. For instance, the definition can be extended so that the
environment sends an error message and calls error-processing programs, or continues functioning,
ignoring the incorrect action.
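The ord axioms above amount to a priority-ordered insertion that skips past every pair of lower-or-equal priority, so a new task lands behind its priority peers. A direct transcription in Python; the function name `ord_insert`, the numeric-priority encoding, and the use of `None` for the terminated agent Δ are our own choices:

```python
def ord_insert(pair, queue, priority):
    """Insert a (name, agent) pair into queue following the ord axioms:
    skip heads while priority(head) <= priority(pair), then place the pair."""
    name, agent = pair
    if agent is None:                 # ord(a : Δ, r) = r — a terminated agent is dropped
        return list(queue)
    out, rest = [], list(queue)
    while rest and priority(rest[0][0]) <= priority(name):
        out.append(rest.pop(0))       # (b : v) stays ahead of (a : u)
    return out + [pair] + rest

prio = {"a": 1, "b": 1, "c": 2}.__getitem__
ready = [("a", "ua"), ("c", "uc")]
ready = ord_insert(("b", "ub"), ready, prio)   # lands after its peer "a", before "c"
```

Since scheduling takes the head of ready, the axioms alone fix only the relative order; in this numeric reading a smaller value is scheduled first, and equal-priority tasks keep FIFO order.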
6.5.1.1.3 Requirements for Terminate

    u --terminate--> u'
    ---------------------------------------------
    e[u] --terminated (e.active)--> e[schedule.Δ]
6.5.1.1.4 Requirements for Schedule
Let P(u, b, v, s) = P1 ∧ P2, where

    P1 = (e.active ≠ Nil  →  ord(e.active : u, e.ready) = (b : v, s))
    P2 = (e.active = Nil  →  u = Δ ∧ e.ready = (b : v, s))
Let r = e.ready and a = e.active; then the rules for attributes are:

    P(u, b, v, s)
    -----------------------------------
    ready : r --schedule u--> ready : s

    P(u, b, v, s)
    -------------------------------------
    active : a --schedule u--> active : b
Note that the transitions for attributes, and therefore for the environment, are highly nondeterministic,
because the parameter u is an arbitrary agent behavior. But this nondeterminism disappears in the rule for
scheduling, which restricts the possible values of u to at most one. The rules are:

    P(u', b, v, s),  e --schedule u'--> e',  u --schedule--> u'
    -----------------------------------------------------------
    e[u] --scheduled b--> e'[v]
    u --schedule--> u',  e.active = Nil,  u' ≠ Δ
    --------------------------------------------
    e[u] --schedule_error--> ⊥

    u --schedule--> Δ,  e.ready = Nil
    ---------------------------------
    e[u] --wait--> e[Δ]
Therefore, if a task has no name (which can happen if a task is initially inserted into an environment), it
can use scheduling only as its last action; otherwise it is an error. And if there is nothing to schedule, the
scheduling action is ignored.
6.5.1.1.5 Interrupts
The simplest way to introduce interrupts into our model is to hide the occurrence of interrupts and the
choice of the start of interrupt processing. Only the actions which show the start and the end of interrupt
processing are observable. The rules are:

    e --start_interrupt--> e'[v]
    -------------------------------------------------
    e[u] --start_interrupt--> e'[v; end_interrupt; u]

We have no transitions for attributes labeled by the interrupt action, so in this transition e and e' have the
same values for all attributes. The program v is an interrupt-processing routine.
    u --end_interrupt--> u'
    -----------------------------
    e[u] --end_interrupt--> e[u']
Nesting of interrupts can be of arbitrary depth. The action end_interrupt is an environment action,
but it is used by the inserted agent after an interrupt has started, to show the end of interrupt processing.
Therefore, the set of actions for an inserted agent is extended, but end_interrupt is still not an action
of an agent before its insertion into the environment.
6.5.1.1.6 Termination
When all tasks are successfully terminated, the scheduler reaches the waiting state:

    active : a --wait--> active : Nil

    e.ready = Nil,  e --wait--> e'
    ------------------------------
    e[Δ] --wait--> e'[Δ]
6.5.1.1.7 Dynamic Requirements
A state e of an environment is called initial if e.ready = e.active = Nil and the domains of the
functions e.priority and e.call are empty. Let E_0 be the set of all states reachable from the initial
states. Define E_{n+1}, n = 0, 1, ..., as the set of all states reachable from the states e[u], where e ∈ E_n and
u is an arbitrary task agent. The set E of admissible states is defined as the union E = E_0 ∪ E_1 ∪ .... Multiple
insertion rules show that the insertion function is sequential. Dynamic requirements for environment
states are as follows:
E does not contain the deadlock state 0.
There are no undefined states in E except for those which result from error actions.
Tasks of the same priority are scheduled in FIFO order, tasks of a higher priority are scheduled
first, and interrupt actions are nested like brackets.
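The scheduling discipline stated in the last requirement can be exercised on a toy model. The sketch below is our own simplification (not the semantics of the tool described in the text): it drives a ready queue with the ord-style insert and shows same-priority tasks leaving in FIFO order while higher-priority ones overtake:

```python
def insert(pair, queue, prio):
    # place pair after all entries of lower-or-equal priority (the ord axioms)
    i = 0
    while i < len(queue) and prio[queue[i][0]] <= prio[pair[0]]:
        i += 1
    return queue[:i] + [pair] + queue[i:]

def run(tasks):
    """tasks: list of (name, priority) in activation order; returns scheduling order."""
    prio = dict(tasks)
    ready, order = [], []
    for name, _ in tasks:                 # activate a: enqueue via the ord insert
        ready = insert((name, None), ready, prio)
    while ready:                          # schedule: take the head, then terminate
        (name, _), ready = ready[0], ready[1:]
        order.append(name)
    return order

# same priority -> FIFO; in this reading of the axioms a lower value is scheduled first
print(run([("t1", 2), ("t2", 1), ("t3", 2)]))  # ['t2', 't1', 't3']
```

Interrupt nesting is omitted here; in the model it would correspond to splicing v; end_interrupt; u in front of the running agent, which unwinds like matched brackets.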
6.5.1.1.8 Consistency
The only nonconstructive transition in the requirements specification of the simple scheduler is the inser-
tion of an arbitrary agent as an interrupt-processing routine. If we restrict the corresponding transitions
to a selection from some finite set (even a nondeterministic one), the requirements become executable.
To prove dynamic properties, some invariant properties of E (always statements) must first be proved.
Then, after their formalization, dynamic properties are inferred from these invariants:

    Dom(e.priority) = Dom(e.call)
    (a : u) ∈ e.ready  →  a ∈ Dom(e.priority)
    e.active ≠ Nil  →  e.active ∈ Dom(e.priority)
    e.ready is ordered by priority

In the invariants formulated above, e is assumed to be nonterminal.
6.5.1.2 Input Text to the Consistency Checker
The consistency checker accepts static requirements represented in the form of Hoare-style triples and
dynamic requirements in the form of logical formulae. Requirements include the description of typed
attributes and actions. The following input text is obtained from the description of the simple scheduler
considered above. It is statically consistent and can be used for proving dynamic properties of the scheduler.
Each requirement describes the change of a state of the environment with the inserted agent represented
as the value of the attribute active_task. The value of this attribute is the behavior of a previously
inserted agent which is currently active. The predicate active_task --> a.u is used to represent the
transition active_task --a--> u. The action axiom is needed to prove consistency for the action wait
(Code 6.7).
Code 6.7
attributes(
active: name,
priority: name -> Nat,
ready: list of (name:agent),
call: name -> agent,
active_task: agent
);
actions(a:name,u:agent,i:int)(
new_task(a:u,i),
activate a,
terminate,
schedule,
loaded a,
activated a,
activate_error,
schedule_error,
terminated a,
schedule u,
scheduled a,
wait,
start_interrupt,
end_interrupt
);
Let action axiom: Forall x((x.Delta = Delta));
Let ord Delta: Forall(a,r)(ord(a:Delta,r) = r);
Let ord: Forall(a,b,u,v,r)(
(priority b <= priority a) & (a = Delta)
-> (ord(a:u,b:v,r) = (b:v,ord(a:u,r))));
/* ------------ new_task ------------------------------ */
req new_task: Forall(a:name, (u,v):agent, i:int)(
(active_task --> new_task(a:v,i).u)
-> after(loaded a)
((active_task = u) & (priority a = i) & (call a = v)));
/* ------------ activate ------------------------------ */
req activate success: Forall(a:name,(u,v):agent, r:list of(name:agent))(
((active_task --> activate a.u) & (ready = r) &(call a = v)
& (v = Nil))
-> after(activated a)
(active_task = u & ready = ord(a:v,r)));
req activate error: Forall(a:name,u:agent)(
((active_task --> activate a.u) & (call a = Nil))
-> after activate_error
bot);
/* ------------ terminate ----------------------------- */
req terminate: Forall(a:name, u:agent)(
((active_task --> terminate.u) & (active = a))
-> after(terminated a)
(active_task = schedule));
/* ------------ schedule ------------------------------ */
req schedule success active:
Forall((u,v):agent, a:name,s:list of(name:agent))(
((active_task --> schedule.u) & (active = Nil) &
(ord(active:u,ready) = (a:v,s)))
-> after(scheduled a)
((active_task =u) & (active = a) & (ready = s)));
req schedule success not active:
Forall(v:agent, a:name,s:list of(name:agent))(
( (active_task = schedule) & (active = Nil) & (ready = (a:v,s)))
-> after(scheduled a)
((active_task =v) & (active = a) & (ready = s)));
req schedule error: Forall(u:agent)(
((active_task --> schedule.u) & (active = Nil) & (u = Delta))
-> after schedule_error
bot);
req schedule final: Forall(v:agent, b:name,s:list of(name:agent))(
((active_task --> schedule.Delta) & (ready = Nil))
-> after wait
(active_task = Delta));
/* ------------ interrupt ------------------------------ */
req start interrupt: Forall((u,v):agent)(
((active_task = u) & (interrup_process = v))
-> after start_interrupt
(active_task = (v;end_interrupt;u)));
req end interrupt: Forall(u:agent)(
(active_task --> end_interrupt.u)
-> after end_interrupt
(active_task = u));
/* ------------ termination --------------------------- */
req termination: Forall(u:agent)(
(active_task = Delta) & (ready = Nil)
-> after wait
(active_task = Delta))
/* ------------ dynamic properties -------------------- */
prove always Forall(a:name)(a in_set Dom(priority) <=> a in_set Dom(call));
prove always Forall(a:name,u:agent)(
(a:u)in_list(ready)-> a in_set Dom(priority));
prove always (active = Nil)-> active in_set Dom(priority);
prove always is_ord ready
6.5.2 Experimental Results in Various Domains
We have developed specializations for the following subject domains: sequential asynchronous environ-
ments, parallel asynchronous environments, and sequential synchronous agents. We have conducted a
number of projects in each domain to determine the effectiveness of formal requirements verification.
Figure 6.8 exhibits the performance of our provers. We show the measurements in terms of MSC
diagrams, a familiar engineering notation often used to describe embedded systems. The chart on the left
shows performance in terms of arrows, that is, communications between instances on an MSC diagram.
We can see that the performance is roughly linear in the number of arrows, up to roughly 800 arrows per
diagram. Note that a typical diagram has far fewer arrows, no more than a hundred in most cases. The
chart on the right shows that performance is linear in the number of MSC diagrams (of typical size).
Jointly, these charts indicate that the system is scalable to realistically sized applications.
6.5.2.1 OSEK
OSEK [103] is a representative example of an asynchronous sequential environment. The OSEK standard
defines an open embedded operating system for automotive electronics.
The OSEK formal model has been described as an environment for application tasks of different types,
considered as agents inserted into this environment. The actions common for agents and environment
are the services of the operating system. The system is multitasking but has only one processor and only
one task is running at any given moment and, therefore, the system is considered to be sequential. The
FIGURE 6.8 Performance of the prover in terms of MSC diagrams. (Left: proving time per arrow, in sec, versus the number of arrows, 0 to 1200; right: total proving time, in sec, versus the number of MSC diagrams, 25 to 150.)
system is asynchronous because all actions performed by tasks independently of the operating system are
not observable and so the time between two services cannot be taken into account. Static requirements
are represented by transition rules with preconditions and postconditions. The reachable states for OSEK
can be characterized by integrity conditions.
After developing the formal requirements for OSEK, the proof system was used to prove static consistency
and completeness of the requirements. Several interesting dynamic properties of the requirements
were also proven. The formalization of OSEK requirements led to the discovery of 12 errors in the
nonformal OSEK standard. For example, Section 6.7.5 of the OSEK/VDX specification [103] defines a
transition related to the current priority of a task in the case when it has a priority less than the ceiling
priority of the resource; however, no transition is defined in the case when the current priority of the task
is equal to the ceiling priority.
All these errors were documented, and the corrections have been integrated into the OSEK standard. In
the formal specification, we have covered 10 services defined by the OSEK standard and have proven the
consistency and completeness of this specification. This covers approximately 40% of the complete OSEK
standard. Moreover, we have found a number of mistakes in the other parts of the OSEK standard, which
prevented formalization of the rest of the standard's document.
Consistency and completeness of the covered parts of the standard (49 requirements) were proven
after correcting the above-mentioned defects. The proof of consistency took approximately
7 min on a Pentium III computer with 256 MB of RAM running the Red Hat Linux operating system.
6.5.2.2 RIO
The RapidIO Interconnect Protocol [112] is an example of a parallel asynchronous environment. It is
a protocol that lets a set of processor elements communicate with one another. Three layers of
abstraction are developed: the logical, transport, and physical layers.
The static requirements for RIO are standard (pre- and postconditions referring to adjacent
moments of time). But while in OSEK an action is uniquely defined by the running task, in RIO it is
generated by a nondeterministic choice of one of the processor elements, which produces an observable
action.
The formal requirements description of RIO for the logical layer (14 requirements) and the transport layer
(6 requirements) was obtained from the documentation and proved to be consistent and complete (46 sec);
the 46 requirements for the physical layer have been proven consistent in 8.5 min.
6.5.2.3 Vger
The formal requirements for the protocol used by the SC-Vger processor [113] for communicating with
other processor elements of a system via the MAXbus bus device were extracted from the documentation
of the MAXbus and from discussions with experts. Vger is a representative example of a synchronous
sequential agent inserted into a parallel environment. Vger is a deterministic automaton with binary
input/output signals and shared data available from the bus. The attributes of the system are its
input/output signals and its shared data. Originally there are no actions, and we can consider the clock signal
synchronizing the system as the only observable action. Static requirements are written using asser-
tion/deassertion conditions for output signals. Each requirement is a rule for setting a signal to a given
value (0 or 1). The precondition is a history of conditions represented in a Kleene-like algebra with time.
Several rules can be applied at the same moment. For the static consistency conditions, the preconditions
of two rules which set the same attribute to different values must never be true in the same clock interval.
There are no static completeness conditions, because we define the semantics of the requirements text so
that if there are no rules to change an output value, it remains in the same state as at the previous moment
of time. We use binary attribute symbols as predicates, and as long as there are no other predicate symbols
the system represents a propositional calculus.
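The persistence semantics just described (an output keeps its value unless some rule sets it) and the pairwise static consistency check can be sketched in a few lines. The rule set, signal names, and encoding below are our own illustration, with plain state predicates standing in for the Kleene-algebra histories:

```python
# Each rule: (signal, target value, precondition on the current state dict).
rules = [
    ("ack", 1, lambda s: s["req"] == 1),
    ("ack", 0, lambda s: s["req"] == 0),
]

def step(state, inputs):
    """One clock tick: outputs keep their value unless some rule fires."""
    s = {**state, **inputs}
    nxt = dict(state)                 # default: everything persists
    for sig, val, pre in rules:
        if pre(s):
            nxt[sig] = val
    nxt.update(inputs)
    return nxt

def statically_consistent(states):
    """Two rules setting the same signal to different values must never fire together."""
    for s in states:
        for i, (sig1, v1, p1) in enumerate(rules):
            for sig2, v2, p2 in rules[i + 1:]:
                if sig1 == sig2 and v1 != v2 and p1(s) and p2(s):
                    return False
    return True
```

Here `statically_consistent` enumerates states explicitly; the prover described in the text instead establishes the same mutual-exclusion condition symbolically, without enumeration.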
To prove statements with Kleene algebra expressions, these must first be reduced to first-order logic, that
is, to requirements with preconditions referring to one moment of time (without histories). A converter
has been developed for the automatic translation of subject domains relying on Kleene algebra and the
interval calculus notation.
The set of reachable states of Vger is not definable in first-order logic, and the proof of the consistency
condition is only a sufficient condition for consistency. A more powerful yet still sufficient condition is
the provability of consistency conditions by standard induction from static requirements. There exists a
sequence of increasingly powerful conditions which converge to the results obtained by model checking.
All 26 Vger requirements have been proven consistent (192 sec).
6.6 Conclusions and Perspectives
In this chapter, we reviewed tools and methods to ensure that the right system is developed, by which
we mean a system that matches what the customer really wants. Systems that do not match customer
requirements result in cost overruns owing to late changes of the system at best and, in the worst case,
may never be deployed. Based on the mathematical model of the theory of agents and interactions, we
developed a set of tools capable of establishing the consistency and completeness of system requirements.
Roughly speaking, if the requirements are consistent, an implementation which meets the requirements is
possible; if the requirements are complete, this implementation is defined uniquely by the requirements.
We discussed how to represent requirements specifications for formal validation and exhibited experimental
results of deploying these tools to establish the correctness of embedded software systems. This chapter
also reviewed other models of system behavior and other tools for system validation and verification.
Our experience has shown that dramatic quality improvements are possible through formal valida-
tion and verification of systems under development. In practice, deployment of these techniques
requires increased upstream development effort: thorough analysis of requirements and their capture in
specification languages results in a longer design phase. In addition, significant training and experience
are needed before substantial benefits can be achieved. Nevertheless, the improvements in quality and
the reduction of effort in later development phases warrant this investment, as application of these methods
in pilot projects has demonstrated.
References
[1] D. Harel and A. Pnueli. On the development of reactive systems. In K. Apt, Ed., Logics and Models of Concurrent Systems. NATO ASI Series, vol. 13. Springer-Verlag, pp. 477–498.
[2] Z. Manna and A. Pnueli. The Temporal Logic of Reactive and Concurrent Systems. Springer-Verlag, Heidelberg, 1992.
[3] Z. Manna and A. Pnueli. Temporal Verification of Reactive Systems: Safety. Springer-Verlag, Heidelberg, 1995.
[4] F.P. Brooks. The Mythical Man-Month: Essays on Software Engineering. Addison-Wesley, Reading,
MA, 1995.
[5] L. Lamport. Introduction to TLA. SRC Technical note 1994-001, 1994.
[6] R.J. van Glabbeek. Notes on the methodology of CCS and CSP. Theoretical Computer Science,
177: 329349, 1997.
[7] D.M.R. Park. Concurrency and automata on innite sequences. In Proceedings of the 5th GI
Conference. Lecture Notes in Computer Science, vol. 104. Springer-Verlag, Heidelberg, 1981.
[8] R. Milner. Communication and Concurrency. Prentice Hall, New York, 1989.
[9] J.V. Kapitonova and A.A. Letichevsky. On constructive mathematical descriptions of subject
domains. Cybernetics, 4: 408418, 1988.
[10] A.A. Letichevsky and D.R. Gilbert. Towards an implementation theory of nondeterministic con-
current languages. Second Workshop of the INTAS-93-1702 Project: Efcient Symbolic Computing,
St Petersburg, October 1996.
[11] A.A. Letichevsky and D.R. Gilbert. A general theory of action languages. Cybernetics and System
Analysis, 1: 1231, 1998.
[12] R. Milner. The polyadic -calculus: a tutorial. In F.L. Bauer, W. Brauer, and H. Schwichtenberg,
Eds., Logic and Algebra of Specication. Springer-Verlag, Heidelberg, 1993, pp. 203246.
[13] C.A.R. Hoare. Communicating Sequential Processes. Prentice Hall, New York, 1985.
[14] J.A. Bergstra and J.W. Klop. Process algebra for synchronous communication. Information and
Control, 60: 109137, 1984.
[15] L. Lamport. The temporal logic of actions. ACM Transactions on Programming Languages and
Systems, 16(3): 872923, 1994.
[16] A. Pnueli. The temporal logic of programs. In Proceedings of the 18th Annual Symposium on the
Foundations of Computer Science, November 1977, pp. 4652.
[17] E. Emerson and J. Halpern. Decision procedures and expressiveness in the temporal logic of
branching time. Journal of Computer and System Science, 30: 124, 1985.
[18] M.J. Fisher and R.E. Ladner. Propositional modal logic of programs. In Proceedings of the 9th
ACM Annual Symposium on Theory of Computing, pp. 286294.
[19] E. Emerson. Temporal and modal logic. InJ. vanLeeuwen, Ed., Handbook of Theoretical Computer
Science. MIT Press, Cambridge, MA, 1991, pp. 9971072.
[20] R. Bryant. Graph-based algorithms for Boolean function manipulation. IEEE Transactions on
Computers, 35: 677691.
[21] J. Burch, E. Clarke, K. McMillan, D. Dill, and L. Hwang. Symbolic model checking: 1020 states
and beyond. Information and Computation, 98: 142170, 1992.
[22] E. Clarke and E. Emerson. Synthesis of synchronization skeletons for branching time temporal
logic. InThe Workshop on Logic of Programs. Lecture Notes in Computer Science, vol. 131. Springer-
Verlag, Heidelberg, 1981, pp. 128143.
[23] J. Quielle and J. Sifakis. Specication and verication of concurrent systems in CESAR.
In Proceedings of the 5th International Symposium on Programming, pp. 142158.
[24] L. Lamport. What goodis temporal logic? InR. Mason, Ed., InformationProcessing-83: Proceedings
of the 9th IFIP World Computer Congress, Elsevier, 1983, pp. 657668.
[25] M. Abadi and L. Lamport. Composing specications. ACM Transactions on Programming
Languages and Systems, 15: 73132, 1993.
[26] W. Thomas. Automata on innite objects. In J. van Leeuwen, Ed., Handbook of Theoretical
Computer Science. MIT Press, Cambridge, MA, 1991, pp. 131191.
[27] A.P. Sistla, M. Vardi, and P. Wolper. The complementation problem for Bchi automata with
application to temporal logic. Theoretical Computer Science, 49: 217237, 1987.
[28] M. Vardi and P. Wolper. An automata-theoretic approach to automatic program verication.
In Proceedings of the 1st IEEE Symposium on Logic in Computer Science, pp. 332344.
[29] H. Rodgers. Theory of Recursive Functions and Effective Computability. McGraw-Hill, New York,
1967.
[30] Y. Gurevich. Evolving algebras: an attempt to discover semantics. In G. Rozenberg and A. Salomaa, Eds., Current Trends in Theoretical Computer Science. World Scientific, River Edge, NJ, 1993, pp. 266–292.
[31] Y. Gurevich. Evolving algebras 1993: Lipari guide. In E. Börger, Ed., Specification and Validation Methods. Oxford University Press, 1995, pp. 9–36.
[32] J. Meseguer. Conditional rewriting logic as a unified model of concurrency. Theoretical Computer Science, 96: 73–155, 1992.
[33] P. Lincoln, N. Martí-Oliet, and J. Meseguer. Specification, transformation and programming of concurrent systems in rewriting logic. In G. Blelloch et al., Eds., Proceedings of the DIMACS Workshop on Specification of Parallel Algorithms. American Mathematical Society, Providence, 1994.
[34] M. Clavel. Reflection in General Logics and Rewriting Logic with Application to the Maude Language. Ph.D. thesis, University of Navarra, 1998.
[35] M. Clavel and J. Meseguer. Axiomatizing reflective logics and languages. In G. Kiczales, Ed., Reflection '96, 1996, pp. 263–288.
[36] M. Clavel, F. Durán, S. Eker, P. Lincoln, N. Martí-Oliet, J. Meseguer, and J. Quesada. Towards Maude 2.0. In K. Futatsugi, Ed., Proceedings of the 3rd International Workshop on Rewriting Logic and its Applications. Electronic Notes in Theoretical Computer Science, vol. 36. Elsevier, 2000.
[37] J. Meseguer and P. Lincoln. Introduction in Maude. Technical report, SRI International, 1998.
[38] J. Brackett. Software Requirements. Technical report SEI-CM-19-1.2, Software Engineering Institute, 1990.
[39] B. Boehm. Industrial software metrics top 10 list. IEEE Software, 4: 84–85, 1987.
[40] B. Boehm. Software Engineering Economics. Prentice Hall, New York, 1981.
[41] J.C. Kelly, S.S. Joseph, and H. Jonathan. An analysis of defect densities found during software inspections. Journal of Systems Software, 17: 111–117, 1992.
[42] R. Lutz. Analyzing requirements errors in safety-critical embedded systems. In IEEE International Symposium on Requirements Engineering, San Diego, 1993, pp. 126–133.
[43] T. DeMarco. Structured Analysis and System Specification. Yourdon Press, New York, 1979.
[44] C.V. Ramamoorthy, A. Prakash, W. Tsai, and Y. Usuda. Software engineering: problems and perspectives. Computer, 17: 191–209, 1984.
[45] M.E. Fagan. Design and code inspections to reduce errors in program development. IBM Systems Journal, 15: 182–211, 1976.
[46] M.E. Fagan. Advances in software inspection. IEEE Transactions on Software Engineering, 12: 744–751, 1986.
[47] J. Rushby. Formal Methods and their Role in the Certification of Critical Systems. Technical report CSL-95-1, March 1995.
[48] C.B. Jones. Systematic Software Development Using VDM. Prentice Hall, New York, 1990.
[49] J.M. Spivey. Understanding Z: A Specification Language and its Formal Semantics. Cambridge University Press, London, 1988.
[50] J.-R. Abrial. The B-Book: Assigning Programs to Meanings. Cambridge University Press, London, 1996.
[51] International Organization for Standardization, Information Processing Systems, Open Systems Interconnection. LOTOS: A Formal Description Technique Based on the Temporal Ordering of Observational Behavior. ISO Standard 8807. Geneva, 1988.
[52] R.S. Boyer and J.S. Moore. A Computational Logic Handbook. Academic Press, New York, 1988.
[53] M.J.C. Gordon and T.F. Melham, Eds., Introduction to HOL. Cambridge University Press, London, 1993.
[54] D. Craigen, S. Kromodimoeljo, I. Meisels, B. Pase, and M. Saaltink. EVES: an overview. In VDM '91: Formal Software Development Methods. Lecture Notes in Computer Science, vol. 551. Springer-Verlag, Heidelberg, 1991, pp. 389–405.
[55] M. Saaltink, S. Kromodimoeljo, B. Pase, D. Craigen, and I. Meisels. Data abstraction in EVES. In Formal Methods Europe '93, Odense, April 1993.
[56] S. Owre, N. Shankar, and J.M. Rushby. User Guide for the PVS Specification and Verification System. Technical report, SRI International, 1996.
[57] E. Clarke, O. Grumberg, and D. Peled. Model Checking. MIT Press, Cambridge, MA, 2000.
[58] P. Godefroid. VeriSoft: a tool for the automatic analysis of concurrent reactive software. In Proceedings of the 9th Conference on Computer Aided Verification. Lecture Notes in Computer Science, vol. 1254. Springer-Verlag, Heidelberg, 1997, pp. 476–479.
[59] J. Burch, E. Clarke, D. Long, K. McMillan, and D. Dill. Symbolic model checking for sequential circuit verification. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 13(4): 401–424, 1994.
[60] G. Holzmann. The SPIN Model Checker, Primer and Reference Manual. Addison-Wesley, Reading, MA, 2004.
[61] S.J. Garland and J.V. Guttag. A Guide to LP, the Larch Prover. Technical report, DEC Systems Research Center Report 82, 1991.
[62] J. Crow, S. Owre, J. Rushby, N. Shankar, and M. Srivas. A tutorial introduction to PVS. In WIFT '95: Workshop on Industrial-Strength Formal Specification Techniques, Boca Raton, FL, April 1995.
[63] S. Rajan, N. Shankar, and M. Srivas. An integration of model checking with automated proof checking. In Proceedings of the 7th International Conference on Computer Aided Verification, CAV '95. Lecture Notes in Computer Science, vol. 939. Springer-Verlag, Heidelberg, 1995, pp. 84–97.
[64] B. Berard, Ed., Systems and Software Verification: Model-Checking Techniques and Tools. Springer-Verlag, Heidelberg, 2001.
[65] International Telecommunications Union. Recommendation Z.120: Message Sequence Charts. Geneva, 2000.
[66] Object Management Group. Unified Modeling Language Specification, 2.0. 2003.
[67] J. Hooman. Towards formal support for UML-based development of embedded systems. In Proceedings of the 3rd PROGRESS Workshop on Embedded Systems, 2002, pp. 71–76.
[68] M. Bozga, J. Fernandez, L. Ghirvu, S. Graf, J.P. Krimm, L. Mounier, and J. Sifakis. IF: an intermediate representation for SDL and its applications. In Proceedings of the 9th SDL Forum, Montreal, June 1999.
[69] F. Regensburger and A. Barnard. Formal verification of SDL systems at the Siemens mobile phone department. In Tools and Algorithms for the Construction and Analysis of Systems, TACAS '98. Lecture Notes in Computer Science, vol. 1384. Springer-Verlag, Heidelberg, 1998, pp. 439–455.
[70] O. Shumsky and L.J. Henschen. Developing a framework for verification, simulation and testing of SDL specifications. In M. Kaufmann and J.S. Moore, Eds., Proceedings of the ACL2 Workshop 2000, Austin, 2000.
[71] P. Baker, P. Bristow, C. Jervis, D. King, and B. Mitchell. Automatic generation of conformance tests from message sequence charts. In Proceedings of the 3rd SAM (SDL And MSC) Workshop, Telecommunication and Beyond, Aberystwyth. Lecture Notes in Computer Science, vol. 2599, 2003.
[72] B. Mitchell, R. Thomson, and C. Jervis. Phase automaton for requirements scenarios. In Proceedings of Feature Interactions in Telecommunications and Software Systems, vol. VII, 2003, pp. 77–87.
[73] L. Philipson and L. Hogskola. Survey compares formal verification tools. EETimes, 2001. http://www.eetimes.com/story/OEG20011128S0037
[74] S. Yovine. Kronos: a verification tool for real-time systems. International Journal of Software Tools for Technology Transfer, 1: 123–133, 1997.
[75] P. Pettersson and K. Larsen. UPPAAL2k. Bulletin of the European Association for Theoretical Computer Science, 70: 40–44, 2000.
[76] D. Bjorner and C.B. Jones, Eds., The Vienna development method: the meta-language. Lecture Notes in Computer Science, vol. 61. Springer-Verlag, Heidelberg, 1978.
[77] Y. Ledru and P.-Y. Schobbens. Applying VDM to large developments. ACM SIGSOFT Software Engineering Notes, 15: 55–58, 1990.
[78] A. Puccetti and J.Y. Tixadou. Application of VDM-SL to the development of the SPOT4 programming messages generator. FM '99: World Congress on Formal Methods, VDM Workshop, Toulouse, 1999.
[79] J.C. Bicarregui and B. Ritchie. Reasoning about VDM developments using the VDM support tool in Mural. In VDM '91: Formal Software Development Methods. Lecture Notes in Computer Science, vol. 551. Springer-Verlag, Heidelberg, 1991, pp. 371–388.
[80] A. Diller. Z: An Introduction to Formal Methods. John Wiley & Sons, New York, 1990.
[81] W. Grieskamp, M. Heisel, and H. Dorr. Specifying embedded systems with statecharts and Z: an agenda for cyclic software components. In Proceedings of Formal Aspects of Software Engineering, FASE '98. Lecture Notes in Computer Science, vol. 1382. Springer-Verlag, Heidelberg, 1998.
[82] D. Bert, S. Boulmé, M.-L. Potet, A. Requet, and L. Voisin. Adaptable translator of B specifications to embedded C programs. In Formal Methods 2003. Lecture Notes in Computer Science, vol. 2805. Springer-Verlag, Heidelberg, 2003, pp. 94–113.
[83] R. Milne. The Semantic Foundations of the RAISE Specification Language. RAISE report REM/11, STC Technology, 1990.
[84] M. Nielsen, K. Havelund, K. Wagner, and C. George. The RAISE language, methods, and tools. Formal Aspects of Computing, 1: 85–114, 1989.
[85] T. Mossakowski, Kolyang, and B. Krieg-Brückner. Static semantic analysis and theorem proving for CASL. In F. Parisi Presicce, Ed., Proceedings of the 12th Workshop on Algebraic Development Techniques. Lecture Notes in Computer Science, vol. 1376. Springer-Verlag, Heidelberg, 1998, pp. 333–348.
[86] P.D. Mosses. CoFI: the common framework initiative for algebraic specification and development. In TAPSOFT '97: Theory and Practice of Software Development. Lecture Notes in Computer Science, vol. 1214. Springer-Verlag, Heidelberg, 1997, pp. 115–137.
[87] B. Krieg-Brückner, J. Peleska, E. Olderog, and A. Baer. The UniForM workbench, a universal development environment for formal methods. In J. Wing, J. Woodcock, and J. Davies, Eds., FM '99: Formal Methods. Lecture Notes in Computer Science, vol. 1709. Springer-Verlag, Heidelberg, 1999, pp. 1186–1205.
[88] C.L. Heitmeyer, J. Kirby, and B. Labaw. Tools for formal specification, verification and validation of requirements. In Proceedings of the 12th Annual Conference on Computer Assurance, Gaithersburg, June 1997.
[89] S. Easterbrook, R. Lutz, R. Covington, Y. Ampo, and D. Hamilton. Experiences using lightweight formal methods for requirements modeling. IEEE Transactions on Software Engineering, 24: 4–14, 1998.
[90] L.C. Paulson. Isabelle: A Generic Theorem Prover. Lecture Notes in Computer Science, vol. 828. Springer-Verlag, Heidelberg, 1994.
[91] B.J. Krämer and N. Völker. A highly dependable computer architecture for safety-critical control applications. Real-Time Systems Journal, 13: 237–251, 1997.
[92] D. Muthiayen. Real-Time Reactive System Development: A Formal Approach Based on UML and PVS. Technical report, Concordia University, 2000.
[93] P.B. Jackson. The Nuprl Proof Development System, Reference Manual and User Guide. Cornell University, Ithaca, NY, 1994.
[94] L. Cortes, P. Eles, and Z. Peng. Formal coverification of embedded systems using model checking. In Proceedings of the 26th EUROMICRO Conference, Maastricht, September 2000, pp. 106–113.
[95] G. Holzmann. Design and Validation of Computer Protocols. Prentice Hall, New York, 1991.
[96] G. Holzmann. The model checker SPIN. IEEE Transactions on Software Engineering, 23: 279–295, 1997.
[97] R. Kurshan. Automata-Theoretic Verification of Coordinating Processes. Princeton University Press, Princeton, NJ, 1993.
[98] R. de Simone and M. Lara de Souza. Using partial-order methods for the verification of behavioural equivalences. In G. von Bochmann, R. Dssouli, and O. Rafiq, Eds., Formal Description Techniques VIII, 1995.
2006 by Taylor & Francis Group, LLC
System Validation 6-55
[99] J. Fernandez, H. Garavel, A. Kerbrat, R. Mateescu, L. Mounier, and M. Sighireanu. CADP:
a protocol validation and verication toolbox. In Proceedings of the 8th Conference on Computer-
Aided Verication. New Brunswick, August 1996, pp. 437440.
[100] D. Dill, A. Drexler, A. Hu, and C. Yang. Protocol verication as a hardware design aid. In IEEE
International Conference on Computer Design: VLSI in Computers and Processors. October 1992,
pp. 522525.
[101] E. Astegiano and G. Reggio. Formalism and method. Theoretical Computer Science, 236:
334, 2000.
[102] Z. Chaochen, C.A.R. Hoare, and A.P. Ravn. Acalculus of durations. Information Processing Letter,
40: 269276, 1991.
[103] OSEK Group. OSEK/VDX. Operating System.Version 2.1. May 2000.
[104] S.N. Baranov, V. Kotlyarov, J. Kapitonova, A. Letichevsky, and V. Volkov. Requirement capturing
and 3CR approach. In Proceedings of the 26th International Computer Software and Applications
Conference, Oxford, 2002, pp. 279283.
[105] J.V. Kapitonova, A.A. Letichevsky, and S.V. Konozenko. Computations in APS. Theoretical
Computer Science, 119: 145171, 1993.
[106] D.R. Gilbert and A.A. Letichevsky. A universal interpreter for nondeterministic concurrent pro-
gramming languages. In M. Gabbrielli, Ed., Fifth Compulog Network Area Meeting on Language
Design and Semantic Analysis Methods, September 1996.
[107] T. Valkevych, D.R. Gilbert, and A.A. Letichevsky. A generic workbench for modelling the
behaviour of concurrent and probabilistic systems. In Workshop on Tool Support for System
Specication, Development and Verication, TOOLS98, Malente, June 1998.
[108] A.A. Letichevsky, J.V. Kapitonova, and V.A. Volkov. Deductive tools in algebraic programming
system. Cybernetics and System Analysis, 1: 1227, 2000.
[109] A. Degtyarev, A. Lyaletski, andM. Morokhovets. Evidence algorithmandsequent logical inference
search. In H. Ganzinger, D. McAllester, and A. Voronkov, Eds., Logic for Programming and
Automated Reasoning (LPAR99). Lecture Notes in Computer Science, vol. 1705. Springer-Verlag,
1999, pp. 4461.
[110] V.M. Glushkov, J.V. Kapitonova, A.A. Letichevsky, K.P. Vershinin, and N.P. Malevanyi. Con-
struction of a practical formal language for mathematical theories. Cybernetics, 5: 730739,
1972.
[111] V.M. Glushkov. On problems of automata theory and articial intelligence. Cybernetics, 5: 313,
1970.
[112] Motorola. RIO Interconnect Globally Shared Memory Logical Specication. Motorola, 1999.
[113] Motorola. SC-Vger Microprocessor Implementation Denition. Motorola, 1997.
[114] S. Abramsky. A domain equation for bisimulation. Information and Computation, 92:
161218, 1991.
[115] R. Alur and D. Dill. A theory of timed automata. Theoretical Computer Science, 126:
183235, 1994.
[116] S.N. Baranov, C. Jervis, V. Kotlyarov, A. Letichevsky, and T. Weigert. Leveraging UML to deliver
correct telecom applications. In L. Lavagno, G. Martin, and B. Selic, Eds., UML for Real: Design
of Embedded Real-Time Systems. Kluwer Academic Publishers, Amsterdam, 2003.
[117] J. Bicarregui, T. Dimitrakos, B. Matthews, T. Maibaum, K. Lano, and B. Ritchie. The VDM+B
project: objectives and progress. In World Congress on Formal Methods in the Development of
Computing Systems. Toulouse, September 1999.
[118] G. Booch, J. Rumbaugh, and I. Jacobson. Unied Modeling Language User Guide. Addison-Wesley,
Reading, MA, 1997.
[119] S. Chandra, P. Godefroid, and C. Palm. Software model checking in practice: an industrial
case study. In Proceedings of the International Conference on Software Engineering, Orlando,
May 2002.
2006 by Taylor & Francis Group, LLC
6-56 Embedded Systems Handbook
[120] E. Clarke, I. Draghicescu, and R. Kurshan. A Unied Approach for Showing Language Con-
tainment and Equivalence between Various Types of Omega-Automata. Technical report,
Carnegie-Mellon University, 1989.
[121] F. VanDewerker and S. Booth. Requirements Consistency ABasis for DesignQuality. Technical
report, Ascent Logic, 1998.
[122] E. Felt, G. York, R. Brayton, and A. Vincentelli. Dynamic variable reordering for BDD
minimization. In Proceedings of the EuroDAC, 1993, pp. 130135.
[123] M. Fitting. A Kripke-Kleene semantics for logic programs. Journal of Logic Programming,
2: 295312, 1985.
[124] I. Graham. Migrating to Object Technology. Addison-Wesley, Reading, MA, 1995.
[125] Green Mountain Computing Systems. Green Mountain VHDL Tutorial, 1995.
[126] International Telecommunications Union. RecommendationZ.100 SpecicationandDescrip-
tion Language. Geneva, 1999.
[127] B. Jacobs. Objects and classes, coalgebraically. In B. Freitag, C.B. Jones, C. Lengauer,
and H.-J. Schek, Eds., Object-Orientation with Parallelism and Persistence. Kluwer Academic
Publishers, 1996, pp. 83101.
[128] I. Jacobson. Object-Oriented Software Engineering, A Use Case Driven Approach. Addison-Wesley,
Reading, MA, 1992.
[129] N.D. Jones, C. Gomard, and P. Sestoft. Partial Evaluation and Automatic Program Generation.
Prentice Hall, New York, 1993.
[130] J.V. Kapitonova, T.P. Marianovich, and A.A. Mishchenko. Automated design and simulation of
computer systems components. Cybernetics and System Analysis, 6: 828840, 1997.
[131] M. Kaufmann and J.S. Moore. ACL2: an industrial strength version of NQTHM. In Proceedings
of the 11th Annual Conference on Computer Assurance (COMPASS96), June 1996, pp. 2334.
[132] S. Kripke. Semantical considerations on modal logic. Acta Philosophica Fennica, 16: 8394, 1963.
[133] J. van Leeuwen, Ed., Handbook of Theoretical Computer Science. MIT Press, Cambridge,
MA, 1991.
[134] A.A. Letichevsky, and J.V. Kapitonova. Mathematical information environment. In Proceedings
of the 2nd International THEOREMA Workshop, Linz, June 1998, pp. 151157.
[135] A.A. Letichevsky andD.R. Gilbert. Agents andenvironments. InProceedings of the 1st International
Scientic and Practical Conference on Programming, Kiev, 1998.
[136] A.A. Letichevsky and D.R. Gilbert. A model for interaction of agents and environments. In
Selected Papers from the 14th International Workshop on Recent Trends in Algebraic Development
Techniques. Lecture Notes in Computer Science. vol. 1827, 2004, pp. 311328.
[137] P. Lindsay. On transferring VDM verication techniques to Z. In Proceedings of Formal Methods
Europe FME94, Barcelona, October 1994.
[138] W. McCune. Otter 3.0 Reference Manual and Guide. Technical report, Argonne National
Laboratory Report ANL-94, 1994.
[139] K. McMillan. Symbolic Model Checking. Kluwer Academic Publishers, Dordrecht, 1993.
[140] M. Morockovets and A. Luzhnykh. Representing mathematical texts in a formalized natural
like language. In Proceedings of the 2nd International THEOREMA Workshop, Linz, June 1998,
pp. 157160.
[141] T. Nipkow, L. Paulson, and Markus Wenzel. Isabelle/HOL A Proof Assistant for Higher-Order
Logic. Lecture Notes in Computer Science, vol. 2283. Springer-Verlag, Heidelberg, 2002.
[142] S. Owre, J.M. Rushby, and N. Shankar. A prototype verication system. In D. Kapur, Ed., Pro-
ceedings of the 11th International Conference on Automated Deduction (CADE). Lecture Notes in
Articial Intelligence, vol. 601. Springer-Verlag, Heidelberg, 1992, pp. 748752.
[143] G. Plotkin. A Structured Approach to Operational Semantics. Technical report, DAIMI FN-19,
Aarhus University, 1981.
[144] K.S. Rubin and A. Goldberg. Object behavior analysis. Communications of the ACM, 35:
4862, 1992.
2006 by Taylor & Francis Group, LLC
System Validation 6-57
[145] R. Rudell. Dynamic variable reordering for ordered binary decision diagrams. In Proceedings of
the IEEE/ACM ICCAD93, 1993, pp. 4247.
[146] J. Rushby. Mechanized formal methods: where next? In J. Wing and J. Woodcock, Eds., FM99: The
World Congress in Formal Methods. Lecture Notes in Computer Science, vol. 1708. Springer-Verlag,
Heiderberg, 1999, pp. 4851.
[147] J. Rushby, S. Owre, and N. Shankar. Subtypes for specications: predicate subtypes in PVS. IEEE
Transactions on Software Engineering, 24: 709720, 1998.
[148] M. Saeki, H. Horai, and H. Enomoto. Software development process from natural language
specication. In International Conference on Software Engineering. Pittsburgh, March 1989,
pp. 6473.
[149] J. Tsai and T. Weigert. Knowledge-Based Software Development for Real-Time Distributed Systems.
World Scientic Publishers, Singapore, 1993.
[150] M. Vardi. Verication of concurrent programs the automata-theoretic framework. In Proceed-
ings of the 2nd IEEE Symposium on Logic in Computer Science, pp. 167176.
[151] T. Weigert and J. Tsai. A logic-based requirements language for the specication and analysis of
real-time systems. In Proceedings of the 2nd Conference on Object-Oriented Real-Time Dependable
Systems, Laguna Beach, 1996, pp. 816.
2006 by Taylor & Francis Group, LLC
Design and Verification
Languages
7 Languages for Embedded Systems
Stephen A. Edwards
8 The Synchronous Hypothesis and Synchronous Languages
Dumitru Potop-Butucaru, Robert de Simone, and Jean-Pierre Talpin
9 Introduction to UML and the Modeling of Embedded Systems
Øystein Haugen, Birger Møller-Pedersen, and Thomas Weigert
10 Verification Languages
Aarti Gupta, Ali Alphan Bayazit, and Yogesh Mahajan
7
Languages for
Embedded Systems
Stephen A. Edwards
Columbia University
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-1
7.2 Software Languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-2
Assembly Languages • The C Language • C++ • Java •
Real-Time Operating Systems
7.3 Hardware Languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-8
Verilog • VHDL
7.4 Dataflow Languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-12
Kahn Process Networks • Synchronous Dataflow
7.5 Hybrid Languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-14
Esterel • SDL • SystemC
7.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-18
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-18
7.1 Introduction
An embedded system is a computer masquerading as a non-computer that must perform a small set of
tasks cheaply and efficiently. A typical system might have communication, signal processing, and user
interface tasks to perform.
Because the tasks must solve diverse problems, a language general-purpose enough to solve them all
would be difficult to write, analyze, and compile. Instead, a variety of languages have evolved, each best
suited to a particular problem domain. The most obvious divide is between languages for software and
hardware, but there are others. For example, a language for signal processing is often more convenient for
a particular problem than, say, assembly, but might be poor for control-dominated behavior.
This chapter describes popular hardware, software, dataflow, and hybrid languages, each of which excels
at certain problems. Dataflow languages are good for signal processing, and hybrid languages combine ideas
from the other three classes.
Due to space limitations, this chapter describes only the main features of each language. The author's book on the
subject [1] provides many more details on all of these languages.
Some of this chapter originally appeared in the Online Symposium for Electrical Engineers (OSEE).
7.2 Software Languages
Software languages describe sequences of instructions for a processor to execute (Table 7.1). As such, most
consist of sequences of imperative instructions that communicate through memory: an array of numbers
that hold their values until changed.
Each machine instruction typically does little more than, say, add two numbers, so high-level languages
aim to specify many instructions concisely and intuitively. Arithmetic expressions are typical: coding an
expression such as ax^2 + bx + c in machine code is straightforward, tedious, and best done by a compiler.
The C language provides such expressions, control-flow constructs such as loops and conditionals, and
recursive functions. The C++ language adds classes as a way to build new data types, templates for
polymorphic code, exceptions for error handling, and a standard library of common data structures.
Java is a still higher-level language that provides automatic garbage collection, threads, and monitors to
synchronize them.
7.2.1 Assembly Languages
An assembly language program (Figure 7.1) is a list of processor instructions written in a symbolic, human-
readable form. Each instruction consists of an operation such as addition along with some operands. For
example, add r5,r2,r4 might add the contents of registers r2 and r4 and write the result to r5. Such
arithmetic instructions are executed in order, but branch instructions can perform conditionals and loops
by changing the processor's program counter, the address of the instruction being executed.
A processor's assembly language is defined by its opcodes, addressing modes, registers, and memories.
The opcode distinguishes, say, addition from conditional branch, and an addressing mode defines how and
where data is gathered and stored (e.g., from a register or from a particular memory location). Registers
can be thought of as small, fast, easy-to-access pieces of memory.
There are roughly four categories of modern assembly languages (Table 7.2). The oldest are those for the
so-called complex instruction set computers, or CISC. These are characterized by a rich set of instructions
and addressing modes. For example, a single instruction in Intel's x86 family, a typical CISC processor,
can add the contents of a register to a memory location whose address is the sum of two other registers
and a constant offset. Such instruction sets are usually convenient for human programmers, who are
generally fairly skilled at using a heterogeneous set of tools, and the code itself is usually quite compact.
Figure 7.1(a) illustrates a small program in x86 assembly.
By contrast, reduced instruction set computers (RISC) tend to have fewer instructions and much
simpler addressing modes. The philosophy is that while you generally need more RISC instructions to
accomplish something, it is easier for a processor to execute them because it does not need to deal with
the complex cases, and easier for a compiler to produce them because they are simpler and more uniform.
Figure 7.1(b) illustrates a small program in SPARC assembly.
TABLE 7.1 Software Language Features Compared
                          C     C++   Java
Expressions               •     •     •
Control-flow              •     •     •
Recursive functions       •     •     •
Exceptions                      •     •
Classes and inheritance         •     •
Templates                       •
Namespaces                      •     •
Multiple inheritance            •     ◦
Threads and locks                     •
Garbage collection                    •
Note: •, full support; ◦, partial support.
jmp L2
L1:
movl %ebx, %eax
movl %ecx, %ebx
L2:
xorl %edx, %edx
divl %ebx
movl %edx, %ecx
testl %ecx, %ecx
jne L1
(a) mov %i0, %o1
b .LL3
mov %i1, %i0
.LL5:
mov %o0, %i0
.LL3:
mov %o1, %o0
call .rem, 0
mov %i0, %o1
cmp %o0, 0
bne .LL5
mov %i0, %o1
(b)
FIGURE 7.1 Euclid's algorithm in (a) i386 assembly (CISC) and (b) SPARC assembly (RISC). SPARC has more registers
and must call a routine to compute the remainder (the i386 has a division instruction). The complex addressing modes
of the i386 are not shown in this example.
TABLE 7.2 Typical Modern Processor Architectures
CISC RISC DSP Microcontroller
x86 SPARC TMS320 8051
68000 MIPS DSP56000 PIC
ARM ASDSP-21xx AVR
move #samples, r0
move #coeffs, r4
move #n-1, m0
move m0, m4
movep y:input, x:(r0)
clr a x:(r0)+, x0 y:(r4)+, y0
rep #n-1
mac x0,y0,a x:(r0)+, x0 y:(r4)+, y0
macr x0,y0,a (r0)-
movep a, y:output
(a)
START:
MOV SP, #030H
ACALL INITIALIZE
ORL P1,#0FFH
SETB P3.5
LOOP:
CLR P3.4
SETB P3.3
SETB P3.4
WAIT:
JB P3.5, WAIT
CLR P3.3
MOV A,P1
ACALL SEND
SETB P3.3
AJMP LOOP
(b)
FIGURE 7.2 (a) A finite impulse response filter in DSP56001 assembly. The mac instruction (multiply and accumu-
late) does most of the work, multiplying registers X0 and Y0, adding the result to accumulator A, fetching the next
sample and coefficient from memory, and updating circular buffer pointers R0 and R4. The rep instruction repeats the
mac instruction in a zero-overhead loop. (b) Writing to a parallel port in 8051 microcontroller assembly. This code
takes advantage of the 8051's ability to operate on single bits.
The third category of assembly languages arises from more specialized processor architectures such as
digital signal processors (DSPs) and very long instruction word processors (VLIWs). The operations in
these instruction sets are simple like those in RISC processors (e.g., add two registers), but they tend to
be very irregular (only certain registers may be used with certain operations) and support a much higher
degree of instruction-level parallelism. For example, Motorola's DSP56001 can, in a single instruction,
multiply two registers, add the result to a third, load two registers from memory, and increase two circular
buffer pointers. However, the instruction severely limits which registers (and even which memory) it may
use. Figure 7.2(a) shows a filter implemented in 56001 assembly.
The fourth category includes instruction sets on small (4- and 8-bit) microcontrollers. In some sense,
these combine the worst of all worlds: there are few instructions and each cannot do much, much like
a RISC processor, and there are also significant restrictions on which registers can be used when, much
like a CISC processor. The main advantage of such instruction sets is that they can be implemented very
cheaply. Figure 7.2(b) shows a routine that writes to a parallel port in 8051 assembly.
7.2.2 The C Language
C is currently the most popular language for embedded system programming. C compilers exist for
virtually every general-purpose processor, from the lowliest 4-bit microcontroller to the most powerful
64-bit processor for compute servers.
C was originally designed by Dennis Ritchie [2] as an implementation language for the Unix operating
system being developed at Bell Labs for a 24K DEC PDP-11. Because the language was designed for
systems programming, it provides very direct access to the processor through such constructs as untyped
pointers and bit-manipulation operators, things appreciated today by embedded systems programmers.
Unfortunately, the language also has many awkward aspects, such as the need to define everything before
it is used, that are holdovers from the cramped execution environment in which it was first implemented.
A C program (Figure 7.3) contains functions built from arithmetic expressions structured with loops
and conditionals. Instructions in a C program run sequentially, but control-flow constructs such as loops
and conditionals can affect the order in which instructions execute. When control reaches a function call in
an expression, control is passed to the called function, which runs until it produces a result, and control
returns to continue evaluating the expression that called the function.
C derives its types from those a processor manipulates directly: signed and unsigned integers ranging
from bytes to words, floating-point numbers, and pointers. These can be further aggregated into arrays
and structures, groups of named fields.
C programs use three types of memory. Space for global data is allocated when the program is compiled,
the stack stores automatic variables allocated and released when their function is called and returns, and
the heap supplies arbitrarily-sized regions of memory that can be deallocated in any order.
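As a sketch, the three kinds of memory look like the following C-style fragment (the names here are illustrative, not taken from the text):

```cpp
#include <cstdlib>

int error_count = 0;          // global: space allocated when the program is compiled/loaded

int square(int x) {           // x and y are automatic variables on the stack,
    int y = x * x;            // allocated when square is called
    return y;                 // and released when it returns
}

int *make_buffer(int n) {     // heap: an arbitrarily-sized region that survives
    return (int *)std::malloc(n * sizeof(int));   // until explicitly deallocated
}
```

A caller must eventually release the heap block with std::free; global and stack storage need no explicit management.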
The C language is an ISO standard, but most people consult the book by Kernighan and Ritchie [3].
C succeeds because it can be compiled into very efficient code and because it allows the programmer
almost arbitrarily low-level access to the processor when necessary. As a result, virtually every function can
be written in C (exceptions include those that must manipulate specific processor registers) and can be
expected to be fairly efficient. C's simple execution model also makes it fairly easy to estimate the efficiency
of a piece of code and improve it if necessary.
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    char *c;
    while (++argv, --argc > 0) {
        c = argv[0] + strlen(argv[0]);
        while (--c >= argv[0])
            putchar(*c);
        putchar('\n');
    }
    return 0;
}
FIGURE 7.3 A C program that prints each of its arguments backwards. The outermost while loop iterates through
the arguments (count in argc, array of strings in argv), while the inner loop starts a pointer at the end of the current
argument and walks it backwards, printing each character along the way. The ++ and -- prefixes increment or
decrement the variable they are attached to before returning its value.
While C compilers for workstation-class machines usually conform closely to the ANSI/ISO C standard,
C compilers for microcontrollers are often much less standard. For example, they often omit support for
floating-point arithmetic and certain library functions. Many also provide language extensions that, while
often very convenient for the hardware for which they were designed, can make porting the code to a
different environment very difficult.
7.2.3 C++
C++ (Figure 7.4) [4] extends C with structuring mechanisms for big programs: user-defined data types, a
way to reuse code with different types, namespaces to group objects and avoid accidental name collisions
when program pieces are assembled, and exceptions to handle errors. The C++ standard library includes a
collection of efficient polymorphic data types such as arrays, trees, and strings, for which the compiler generates
custom implementations.
A class defines a new data type by specifying its representation and the operations that may access
and modify it. Classes may be defined by inheritance, which extends and modifies existing classes. For
example, a rectangle class might add length and width fields and an area method to a shape class.
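A minimal sketch of that example (class and member names are illustrative):

```cpp
// Base class: every shape can report its area.
class Shape {
public:
    virtual double area() const = 0;   // dynamic dispatch picks the derived version
    virtual ~Shape() {}
};

// Derived by inheritance: adds length and width fields and an area method.
class Rectangle : public Shape {
    double length, width;
public:
    Rectangle(double l, double w) : length(l), width(w) {}
    double area() const { return length * width; }
};
```

Code written against Shape works with any derived class, rectangles included, without knowing its concrete type.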
A template is a function or class that can work with multiple types. The compiler generates custom
code for each different use of the template. For example, the same min template could be used for both
integers and floating-point numbers.
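Such a template can be sketched as follows (named min_of here to avoid colliding with the standard library's std::min, which works the same way):

```cpp
// The compiler generates custom code for each type T this is used with.
template <typename T>
T min_of(T a, T b) {
    return a < b ? a : b;
}
```

Calling min_of(3, 5) instantiates an int version; min_of(2.5, 1.5) instantiates a separate double version.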
C++ also provides exceptions, a mechanism intended for error recovery. Normally, each method
or function can only return directly to its immediate caller. Throwing an exception, however, allows
control to return to an arbitrary caller, usually an error-handling mechanism in the main function
or similar. Exceptions can be used, for example, to gracefully recover from out-of-memory conditions no
matter where they occur, without the tedium of having to check whether every function encountered an
out-of-memory condition.
Memory consumption is a disadvantage of C++'s exception mechanism. While most C++ compilers
do not generate slower code when exceptions are enabled, they do generate larger executables by including
tables that record the location of the nearest exception handler. For this reason, many compilers, such as
GNU's gcc, have a flag that completely disables exceptions.
#include <cmath>
#include <iostream>
using namespace std;

class Cplx {
    double re, im;
public:
    Cplx(double v) : re(v), im(0) {}
    Cplx(double r, double i)
        : re(r), im(i) {}
    double abs() const {
        return sqrt(re*re + im*im);
    }
    void operator+= (const Cplx& a) {
        re += a.re; im += a.im;
    }
};

int main() {
    Cplx a(5), b(3,4);
    b += a;
    cout << b.abs() << "\n";
    return 0;
}
FIGURE 7.4 A C++ fragment illustrating a partial complex number type and how it can be used (the C++ library
has a complete version). This class defines how to create a new complex number from either a scalar or the real
and imaginary components, how to compute the absolute value of a complex number, and how to add a
complex number to an existing one.
C++ is being used more and more within embedded systems, but it is sometimes a less suitable choice
than C for a number of reasons. First, C++ is a much more complicated language that demands a much
larger compiler, so C++ has been ported to fewer architectures than C. Second, certain language features
such as dynamic dispatch (virtual function calls) and exceptions can be too costly to implement in very
small embedded systems. Third, it is a more difficult language to learn and use properly, meaning there may be
fewer qualified C++ programmers. Finally, it is often more difficult to estimate the cost of a certain construct
in C++ because the object-oriented programming style encourages many more function calls than the
procedural style of C, and the cost of these is harder to estimate.
7.2.4 Java
Sun's Java language [5–7] resembles C++ but is not a superset. Like C++, Java is object-oriented,
providing classes and inheritance. It is a higher-level language than C++ since it uses object references,
arrays, and strings instead of pointers. Java's automatic garbage collection frees the programmer from
memory management.
Java omits a number of C++'s more complicated features. Templates are absent, although there
are plans to include them in a future release of the language because they make it possible to write
type-safe container classes. Java also omits operator overloading, which can be a boon to readability
(e.g., when performing operations on complex numbers) or a powerful obfuscating force. Java also does
not support C++'s complex multiple inheritance mechanism completely, but it does provide the notion
of an interface, a set of methods provided by a class, which is equivalent to one of the most common
uses of multiple inheritance.
Java provides concurrent threads (Figure 7.5). Creating a thread involves extending the Thread class,
creating instances of these objects, and calling their start methods to start a new thread of control that
executes the object's run method.
Synchronizing a method or block uses a per-object lock to resolve contention when two or more threads
attempt to access the same object simultaneously. A thread that attempts to gain a lock owned by another
thread will block until the lock is released, which can be used to grant a thread exclusive access to a
particular object.
For embedded systems, Java holds promise but also many caveats. On the positive side, it is a simple,
powerful language that provides the programmer a convenient set of abstractions. For example, unlike C,
Java provides true strings and variable-sized arrays. On the negative side, Java is a heavyweight language,
even more so than C++. Its runtime system is large, consisting of either a bytecode interpreter, a just-in-
time compiler, or perhaps both, and its libraries are absolutely vast. While work has been done on paring
down these things, Java still requires a much larger footprint than C.
Unpredictable runtimes are a more serious problem for Java. For time-critical embedded systems,
Java's automatic garbage collector, bytecode interpreter, or just-in-time compiler make runtimes both
unpredictable and variable, making it difficult to assess efficiency both beforehand and in simulation.
The real-time Java specification [8] attempts to address many of these concerns. It introduces mech-
anisms for more precise control over the scheduling policy for concurrent threads (the standard Java
specification is deliberately vague on this point to improve portability), memory regions for which auto-
matic garbage collection can be disabled, synchronization mechanisms for avoiding priority inversion,
and various other real-time features such as timers. It remains to be seen, however, whether this specifica-
tion addresses enough real-time concerns and is sufficiently efficient to be practical. For example, a naive
implementation of the memory management policies would be very inefficient.
7.2.5 Real-Time Operating Systems
Many embedded systems use a real-time operating system (RTOS) to simulate concurrency on a single
processor. An RTOS manages multiple running processes, each written in a sequential language such
as C. The processes perform the system's computation and the RTOS schedules them, attempting to
import java.io.*;
class Counter {
int value = 0;
boolean present = false;
public synchronized void count() {
try { while (present) wait(); }
catch (InterruptedException e) {}
value++; present = true; notifyAll();
}
public synchronized int read() {
try { while (!present) wait(); }
catch (InterruptedException e) {}
present = false; notifyAll();
return value;
}
}
class Count extends Thread {
Counter cnt;
public Count(Counter c) { cnt = c; start(); }
public void run() { for (;;) cnt.count(); }
}
class Mod5 {
public static void main(String args[]) {
Counter c = new Counter();
Count count = new Count(c);
int v;
for (;;) if ( (v = c.read()) % 5 == 0 )
System.out.println(v);
}
}
FIGURE 7.5 A contrived Java program that spawns a counting thread to print all numbers divisible by 5. The main
method in the Mod5 class creates a new Counter, then a new Count object. The Count class extends the Thread class
and spawns a new thread in its constructor by executing start. This invokes its run method, which calls the method
count. Both count and read are synchronized, meaning at most one may run on a particular Counter object at once,
here guaranteeing the counter is either counting or waiting for its value to be read.
[Figure timeline: A preempts B; A completes, allowing B to resume; B completes, allowing C to run; C completes, and A takes priority over B; A completes, allowing B to run; B completes.]
FIGURE 7.6 The behavior of an RTOS with fixed-priority preemptive scheduling. Rate-monotonic analysis gives
process A the highest priority since it has the shortest period; C has the lowest.
meet deadlines by deciding which process runs when. Labrosse [9] describes the implementation of a
particular RTOS.
Most RTOSes use fixed-priority preemptive scheduling, in which each process is given a particular
priority (a small integer) when the system is designed (Figure 7.6). At any time, the RTOS runs the
highest-priority runnable process, which is expected to run for a short period of time before suspending
itself to wait for more data. Priorities are usually assigned using rate-monotonic analysis [10] (due to Liu
and Layland [11]), which assigns higher priorities to processes that must meet more frequent deadlines.
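Liu and Layland's result also gives a simple sufficient schedulability test: n periodic processes, each needing C_i time every period T_i, all meet their deadlines under rate-monotonic priorities if the total utilization does not exceed n(2^(1/n) - 1). A sketch (structure and function names are illustrative):

```cpp
#include <cmath>
#include <vector>

struct Task {
    double compute_time;   // Ci: worst-case execution time per period
    double period;         // Ti: a deadline arrives every Ti time units
};

// Sufficient (not necessary) test for rate-monotonic fixed-priority
// scheduling: total utilization <= n * (2^(1/n) - 1).
bool rm_schedulable(const std::vector<Task> &tasks) {
    double utilization = 0.0;
    for (const Task &t : tasks)
        utilization += t.compute_time / t.period;
    double n = static_cast<double>(tasks.size());
    return utilization <= n * (std::pow(2.0, 1.0 / n) - 1.0);
}
```

For two tasks the bound is about 0.828, so tasks {C=1, T=4} and {C=1, T=8} (utilization 0.375) pass, while {C=3, T=4} and {C=2, T=8} (utilization 1.0) do not.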
[Figure timeline: L begins running and acquires a lock on the resource; M preempts L; H preempts M; H blocks waiting for the lock, so M runs; M delays the execution of L; H misses its deadline.]
FIGURE 7.7 Priority inversion illustrated. When low-priority process L acquires a lock on a resource needed by
process H, it effectively blocks process H, but then intermediate-priority process M preempts L, preventing it from
running and releasing the resource needed by H. Priority inheritance, the common solution, temporarily raises the
priority of L to that of H when H requests the resource held by L.
Priority inversion is a fundamental problem in fixed-priority preemptive scheduling that can lead to
missed deadlines by enabling a lower-priority process to delay indefinitely the execution of a higher-
priority one. Figure 7.7 illustrates the typical scenario: a low-priority process L runs and acquires a
resource. Shortly thereafter, a high-priority process H preempts L, attempts to acquire the same resource,
and blocks waiting for L to release it. This can cause H to miss its deadline even though it is at a higher
priority than L. Even worse, if a process M with priority between L and H now starts, it can delay the
execution of H indefinitely. Process M does not allow L to run since M is at a higher priority, so L cannot
execute and release the lock, and H will continue to block.
Priority inversion is usually solved with priority inheritance. When a process L acquires a lock, its
priority is temporarily raised to a level where it will not be preempted by any other process that will also
attempt to acquire the lock. Many RTOSes provide a mechanism for doing this automatically.
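The scenario of Figure 7.7 and its fix can be sketched with a toy tick-based scheduler (plain Python; the task scripts, release times, and run lengths are invented for this example and are not taken from any real RTOS):

```python
# Toy fixed-priority preemptive scheduler with one shared lock, contrasting
# plain locking with priority inheritance. All numbers are invented.
def simulate(inherit):
    # Each script is a list of ("run", ticks), ("lock",), or ("unlock",).
    tasks = {
        "L": {"prio": 1, "release": 0,
              "script": [("run", 1), ("lock",), ("run", 4), ("unlock",), ("run", 1)]},
        "M": {"prio": 2, "release": 2, "script": [("run", 4)]},
        "H": {"prio": 3, "release": 3,
              "script": [("run", 1), ("lock",), ("run", 1), ("unlock",)]},
    }
    for t in tasks.values():
        t.update(pc=0, left=None, done=None, boost=0)
    holder, time = None, 0
    while any(t["done"] is None for t in tasks.values()):
        if inherit and holder:
            # The lock holder inherits the priority of anyone blocked on it.
            blocked = [t["prio"] for n, t in tasks.items()
                       if t["done"] is None and t["release"] <= time
                       and t["script"][t["pc"]][0] == "lock" and n != holder]
            tasks[holder]["boost"] = max(blocked, default=0)
        ready = [n for n, t in tasks.items()
                 if t["done"] is None and t["release"] <= time
                 and not (t["script"][t["pc"]][0] == "lock"
                          and holder not in (None, n))]
        if not ready:
            time += 1
            continue
        # Run the ready task with the highest effective priority.
        name = max(ready, key=lambda n: max(tasks[n]["prio"], tasks[n]["boost"]))
        t = tasks[name]
        op = t["script"][t["pc"]]
        if op[0] == "lock":
            holder, t["pc"] = name, t["pc"] + 1
        elif op[0] == "unlock":
            holder, t["boost"], t["pc"] = None, 0, t["pc"] + 1
        else:                        # ("run", n): consume one tick of CPU
            t["left"] = op[1] if t["left"] is None else t["left"]
            t["left"] -= 1
            time += 1
            if t["left"] == 0:
                t["left"], t["pc"] = None, t["pc"] + 1
        if t["pc"] == len(t["script"]):
            t["done"] = time         # record the finishing time
    return {n: t["done"] for n, t in tasks.items()}

print(simulate(inherit=False)["H"])  # H is delayed by M while L holds the lock
print(simulate(inherit=True)["H"])   # boosting L lets H finish sooner
```

Without inheritance, M runs to completion before L can release the lock, so H finishes only after M; with inheritance, L briefly runs at H's priority, releases the lock, and H finishes well before M.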
7.3 Hardware Languages
Concurrency and the notion of control are the fundamental differences between hardware and software. In
hardware, every part of the program is always running, but in software, exactly one part of the program
is running at any one time. Software languages naturally focus on sequential algorithms while hardware
languages enable concurrent function evaluation and speculation.
Ironically, efficient simulation in software is a main focus of the hardware languages presented here,
so their discrete-event semantics are a compromise between what would be ideal for hardware and what
simulates efficiently.
Verilog [12,13] and VHDL [14–17] are the most popular languages for hardware description and
modeling (Figure 7.8 and Figure 7.9). Both model systems with discrete-event semantics that ignore idle
portions of the design for efficient simulation. Both describe systems with structural hierarchy: a system
consists of blocks that contain instances of primitives, other blocks, or concurrent processes. Connections
are listed explicitly.
Verilog provides more primitives geared specifically toward hardware simulation. VHDL's primitives are
assignments such as a = b + c or procedural code. Verilog adds transistor and logic gate primitives, and
allows new ones to be defined with truth tables.
Both languages allow concurrent processes to be described procedurally. Such processes sleep until
awakened by an event that causes them to run, read and write variables, and suspend. Processes may wait
for a period of time (e.g., #10 in Verilog, wait for 10 ns in VHDL), a value change (@(a or b),
wait on a, b), or an event (@(posedge clk), wait on clk until clk = '1').
Languages for Embedded Systems 7-9
VHDL communication is more disciplined and flexible. Verilog communicates through wires or regs:
shared memory locations that can cause race conditions. VHDL's signals behave like wires but the
resolution function may be user-defined. VHDL's variables are local to a single process unless declared
shared.
Verilog's type system models hardware with four-valued bit vectors and arrays for modeling memory.
VHDL does not include four-valued vectors, but its type system allows them to be added. Furthermore,
composite types such as C structs can be defined.
Overall, Verilog is the leaner language more directly geared toward simulating digital integrated circuits.
VHDL is a much larger, more verbose language capable of handling a wider class of simulation and
modeling tasks.
7.3.1 Verilog
Verilog was first devised in 1984 as an input language for a discrete-event simulator for digital hardware
design. It was one of the first hardware description languages able to specify both the circuit and a test
bench in the same language, which remains one of its strengths.
Verilog has since been pressed into use as both a modeling language and a specication language.
Although Verilog is still simulated frequently, it is also frequently fed to a logic synthesis system that
translates it into an actual circuit. This is a technically challenging process and not all Verilog constructs
can be translated into hardware since Verilog's semantics are nondeterministic and effectively defined by
the behavior of an event-driven simulator.
Verilog provides both structural and behavioral modeling styles, and allows them to be combined at
will. Consider the simple multiplexer circuit shown in Figure 7.8(a). It can be modeled in Verilog as a
schematic composed of logic gates (Figure 7.8[b]), with a continuous assignment statement that represents
logic using an expression (Figure 7.8[c]), with a truth table as a user-defined primitive (Figure 7.8[d]),
or with imperative, event-driven code (Figure 7.8[e]).
The imperative modeling style is particularly useful for creating testbenches: models of an environment
that stimulate a particular circuit and check its behavior. Figure 7.8(f) illustrates such a testbench, which
instantiates a multiplexer (the instance is called dut, for "device under test") and starts a simple process
(the initial block) to apply inputs and monitor outputs. Running Figure 7.8(f) in a Verilog simulator
gives a partial truth table for the multiplexer.
As these examples illustrate, a Verilog program is composed of modules. Each module has an interface
with named input and output ports and contains one or more instances of other modules, continuous
assignments, and imperative code in initial and always blocks. Modules perform the same information-
hiding function as functions in imperative languages: a module's contents are not visible from outside,
and names for instances, wires, and the like inside a module do not have to differ from those in other modules.
Verilog programs manipulate four-valued bit vectors intended to model digital hardware. Each bit
is 0, 1, X (representing an unknown value), or Z (representing an undriven tri-state bus). While such vectors are
very convenient for modeling circuitry, one of Verilog's shortcomings is the lack of a more sophisticated
type system. It does provide arrays of bit vectors but no other aggregate types.
The plumbing within a module comes in two varieties, one for structural modeling, the other for
behavioral. Structural components, such as instances of primitive logic gates and other modules,
communicate through wires, each of which may be connected to drivers such as gates or continuous assignments.
Conceptually, the value of a wire is computed constantly from whatever drives it. Practically, the simulator
evaluates the expression in a continuous assignment whenever any of its inputs changes.
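This evaluation rule can be mimicked in a few lines of plain Python (an illustration of the idea only; the Wire class and the reaction mechanism are invented for this sketch, and real simulators use event queues and delta cycles):

```python
# Sketch of event-driven evaluation of a continuous assignment:
# f = sel ? a : b is re-evaluated only when one of its inputs changes.
class Wire:
    def __init__(self, val=0):
        self.val, self.fanout = val, []

    def set(self, v):
        if v != self.val:            # only an actual change is an event
            self.val = v
            for react in self.fanout:
                react()              # wake everything driven by this wire

a, b, sel, f = Wire(), Wire(), Wire(), Wire()

def assign():                        # the continuous assignment's expression
    f.set(a.val if sel.val else b.val)

for w in (a, b, sel):                # the assignment is sensitive to its inputs
    w.fanout.append(assign)
assign()                             # initial evaluation

a.set(1)
print(f.val)                         # sel is 0, so f still follows b: 0
sel.set(1)
print(f.val)                         # now f follows a: 1
```

The key point the sketch shares with a real simulator is that idle logic costs nothing: setting a when sel selects b recomputes f once, finds no change, and propagates no further events.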
Behavioral components communicate through regs, which behave like memory in traditional programming
languages. The value of a reg is set by an assignment statement executed within an initial or always
block, and that value persists until the next time the reg is assigned. While a reg can be used to model
a state-holding element such as a latch or flip-flop, it is important to remember that regs are really just
memory. Figure 7.8(e) illustrates this: a reg is used to store the output of the mux, even though the mux is not a
state-holding element. This is because imperative code can only change the value of regs, not wires.
[Figure 7.8(a): multiplexer schematic built from AND gates g1 and g2, OR gate g3, and inverter g4, with internal signals nsel, f1, and f2.]
module mux(f,a,b,sel);
output f;
input a, b, sel;
and g1(f1, a, sel),
g2(f2, b, nsel);
or g3(f, f1, f2);
not g4(nsel, sel);
endmodule
module mux(f,a,b,sel);
output f;
input a, b, sel;
assign f = sel ? a : b;
endmodule
primitive
mux(f,a,b,sel);
output f;
input a, b, sel;
table
1?1 : 1;
0?1 : 0;
?10 : 1;
?00 : 0;
11? : 1;
00? : 0;
endtable
endprimitive
module mux(f,a,b,sel);
output f;
input a, b, sel;
reg f;
always @(a or b or sel)
if (sel) f = a;
else f = b;
endmodule
module testbench;
reg a, b, sel;
wire f;
mux dut(f, a, b, sel);
initial begin
$display("a,b,sel -> f");
$monitor($time,,
"%b%b%b -> %b",
a, b, sel, f);
a = 0; b = 0 ; sel = 0;
#10 a = 1;
#10 sel = 1;
#10 b = 1;
#10 sel = 0;
end
endmodule
FIGURE 7.8 Verilog examples. (a) A multiplexer circuit, (b) the multiplexer described as a Verilog structural model,
(c) the multiplexer described using a continuous assignment, (d) a user-defined primitive for the multiplexer, (e) the
multiplexer described with imperative code, (f) a testbench for the multiplexer.
Verilog is a large language that contains many now-little-used features such as switch-level transistor
models, pure event handling, and complicated delay specifications, all remnants of previous design
methodologies. Today, switch-level modeling is rarely used because Verilog's precision is too low for circuits
that take advantage of this behavior (a continuous simulator such as SPICE is preferred). Delays are rarely
used because static timing analysis has replaced event-driven simulation as the timing analysis method
of choice because of its speed and precision. Nevertheless, Verilog remains one of the most commonly used
languages for hardware design.
SystemVerilog, a recently-introduced standard (2002), is an extension to the Verilog language designed
to aid in the creation of large specications. It adds a richer set of datatypes, including C-like structures,
unions, and multidimensional arrays, a richer set of processes (e.g., an always_comb block has an
implied sensitivity to all variables it references), the concept of an interface to encapsulate communication
and function between blocks, and many other features. Whether SystemVerilog supplants Verilog as a
standard language for hardware specification remains to be seen, but it does have the advantage of being
an obvious evolutionary improvement over previous versions of Verilog.
7.3.2 VHDL
The VHDL language (VHDL is a two-level acronym, standing for VHSIC [Very High Speed Integrated
Circuit] Hardware Description Language) was designed to be a flexible modeling language for digital
systems. It lacks built-in features such as Verilog's four-valued bit vectors and gate- and transistor-level
models. Instead, it has very flexible type and package systems that allow such things to be specified in the
language.
Unlike Verilog, VHDL draws a strong distinction between the interface to a hierarchical object and its
implementation. VHDL interfaces are called entities and their implementations are called architectures.
Figure 7.9 illustrates how these are used in a simple model: the entities are essentially named lists of ports
and the architectures consist of named lists of component instances. While this increases the verbosity
of the language, it makes it possible to use different implementations, perhaps at differing levels of
abstraction.
Like Verilog, VHDL supports structural, dataflow, and behavioral modeling styles, illustrated in
Figure 7.9. As in Verilog, they can be mixed. In the three styles, an architecture is specified by listing
components and their connections (structural), as a series of equations (dataflow, like Verilog's assign
statements), or as a sequence of imperative instructions (behavioral, like Verilog's always blocks).
In general, a process runs until it reaches a wait statement. This suspends the process until a particular
event occurs, which may be an event on a signal, a condition on a signal, a timeout, or any combination
of these. By itself, wait suspends a process forever, effectively terminating it. At the other extreme, wait on Clk until Clk =
'1' for 5 ns; waits for the clock to rise or for 5 ns, whichever comes first.
Combinational processes, which always run in response to a change on any of their inputs, are
common enough to warrant a shorthand. Thus, process(A, B, C) effectively executes a wait on
A, B, C statement at the end.
VHDL's type system is much more elaborate than Verilog's. It provides integers, floating-point numbers,
enumerations, and physical quantities. Integers and floating-point numbers include a range specification.
For example, a 16-bit integer might be declared as
type address is range 16#0000# to 16#FFFF#;
Enumerated literals may be single characters or identiers. Identiers are useful for FSM states and
single characters are useful for Boolean wire values. Typical declarations:
type Bit is ('0', '1');
type FourV is ('0', '1', 'X', 'Z');
type State is (Reset, Running, Halted);
Objects in VHDL, such as types, variables, and signals, have attributes such as size, base, and range.
Such information can be useful for, say, iterating over all elements in an array. For example, if type
Index is range 31 downto 0, then Index'LOW is 0. Access to information about signals can
be used for collecting simulation statistics. For example, if Count is a signal, then Count'EVENT is true
when there is an event on the signal.
VHDL has a powerful library and package facility for encapsulating and reusing definitions. For
example, the standard logic library for VHDL includes types for representing wire states and standard
functions such as AND and OR that operate on these types. Verilog has such facilities built in, but is
not powerful enough to allow such functionality to be written as a library.
entity mux2 is
port (a, b, c: in Bit; d: out Bit);
end mux2;
architecture arch1 of mux2 is
signal cc, ai, bi : Bit; -- internal signals
component Inverter -- component interface
port (a:in Bit; y: out Bit);
end component;
component AndGate
port (a1, a2:in Bit; y: out Bit);
end component;
component OrGate
port (a1, a2:in Bit; y: out Bit);
end component;
begin
I1: Inverter port map(a => c, y => cc); -- by name
A1: AndGate port map(a, c, ai); -- by position
A2: AndGate port map(a1 => b, a2 => cc, y => bi);
O1: OrGate port map(a1 => ai, a2 => bi, y => d);
end;
architecture arch2 of mux2 is
signal cc, ai, bi : Bit;
begin
cc <= not c;
ai <= a and c;
bi <= b and cc;
d <= ai or bi;
end;
architecture arch3 of mux2 is
begin
process(a, b, c) -- sensitivity list
begin
if c = '1' then
d <= a;
else
d <= b;
end if;
end process;
end;
FIGURE 7.9 VHDL examples. Compare with Figure 7.8. (a) The entity declaration for the multiplexer, which defines
its interface, (b) a structural description of the multiplexer from Figure 7.8(a), (c) a dataflow description with one
equation per gate, (d) an imperative behavioral description.
7.4 Dataflow Languages
The hardware and software languages described earlier have semantics very close to those of their
implementations (e.g., as instructions on a sequential processor or as digital logic gates), which makes for
efficient realizations, but some problems are better described using different models of computation.
Many embedded systems perform signal processing tasks such as reconstructing a compressed audio
signal. While such tasks can be described and implemented using the hardware and software languages
described earlier, signal processing tasks are more conveniently represented with systems of processes
that communicate through queues. Although clumsy for general applications, dataflow languages are a
perfect fit for signal-processing algorithms, which use vast quantities of arithmetic derived from linear
system theory to decode, compress, or filter data streams that represent periodic samples of continuously
changing values such as sound or video. Dataflow semantics are natural for expressing the block diagrams
typically used to describe signal-processing algorithms, and their regularity makes dataflow implementations
very efficient because otherwise costly run-time scheduling decisions can be made at compile time,
even in systems containing multiple sampling rates.
7.4.1 Kahn Process Networks
Kahn Process Networks [18] form a formal basis for dataflow computation. Kahn's systems consist
of processes that communicate exclusively through unbounded point-to-point first-in, first-out queues
(Figure 7.10). Reading from a port makes a process wait until data is available, but writing to a port always
completes immediately.
Deterministic behavior is the most distinctive aspect of Kahn's networks. Processes' blocking read behavior
guarantees the overall system behavior (specifically, the sequence of data tokens that flow through each
queue) is the same regardless of the relative execution rates of the processes, that is, regardless of the
scheduling policy. This is generally a very desirable property because it provides a guarantee about the
behavior of the system, ensures that simulation and reality will match, and greatly simplifies the design
task since a designer is not obligated to ensure this herself.
Balancing processes' relative execution rates to avoid an unbounded accumulation of tokens is the
challenge in scheduling a Kahn network. One general approach, proposed in Parks' thesis [19], places
process f(in int u, in int v, out int w)
{
int i; bool b = true;
for (;;) {
i = b ? wait(u) : wait(v);
printf("%i\n", i);
send(i, w);
b = !b;
}
}
process g(in int u, out int v, out int w)
{
for (;;) {
send(wait(u), v); send(wait(u), w);
}
}
process h(in int u, out int v, int init)
{
send(init, v);
for(;;)
send(wait(u), v);
}
channel int X, Y, Z, T1, T2;
f(Y, Z, X);
g(X, T1, T2);
h(T1, Y, 0);
h(T2, Z, 1);
FIGURE 7.10 A Kahn Process Network written in a C-like dialect. Here, processes are functions that run continuously,
may be attached to communication channels, and may call wait to wait for data on a particular port and send to write
data to a particular port. The f process alternately copies from its u and v ports to its w port; the g process does the
opposite, copying its u port to alternately v and w; and h simply copies its input to its output.
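The network of Figure 7.10 can be sketched with ordinary threads and unbounded queues (a Python rendering, with queue.Queue standing in for the channels; stopping f after six tokens is an addition so the sketch terminates):

```python
import queue
import threading

def f(u, v, w, out, n):              # alternately copy u and v to w
    b = True
    for _ in range(n):
        i = u.get() if b else v.get()   # blocking read, as Kahn requires
        out.append(i)                   # stand-in for the printf
        w.put(i)                        # writes always complete immediately
        b = not b

def g(u, v, w):                      # copy u alternately to v and w
    while True:
        v.put(u.get())
        w.put(u.get())

def h(u, v, init):                   # emit an initial token, then copy u to v
    v.put(init)
    while True:
        v.put(u.get())

X, Y, Z, T1, T2 = (queue.Queue() for _ in range(5))
out = []
main = threading.Thread(target=f, args=(Y, Z, X, out, 6))
for target, args in ((g, (X, T1, T2)), (h, (T1, Y, 0)), (h, (T2, Z, 1))):
    threading.Thread(target=target, args=args, daemon=True).start()
main.start()
main.join()
print(out)   # the same token sequence regardless of thread scheduling
```

Because every read blocks until a token arrives, the recorded sequence is the same no matter how the operating system interleaves the four threads, which is exactly Kahn's determinism guarantee.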
[Figure 7.11: SDF graph of a modem built from blocks In, Filt, Hil, Eq, Mul, Conj, Fork, sc, Add, Biq, Deci, Deco, and Out; the numbers on each arc give the tokens produced or consumed per firing, with rates such as 1, 2, 4, and 8 mixed in the same graph.]
FIGURE 7.11 A modem in SDF. Each node represents a process. The labels on each arc indicate the number of
tokens sent or received by a process each time it fires.
artificial limits on the size of each buffer. Any process that writes to a full buffer blocks until space is
available, but if the system deadlocks because all buffers are full, the scheduler increases the capacity of
the smallest buffer.
In practice, Kahn networks are rarely used in their pure form since they are fairly costly to schedule
and their completely deterministic behavior is sometimes overly restrictive since they cannot easily
handle sporadic events (e.g., an occasional change of volume level in a digital volume control) or server-
like behavior where the environment may make requests in an unpredictable order. Nevertheless, Kahn's
model still has useful properties and forms a starting point for other dataflow models.
7.4.2 Synchronous Dataow
Lee and Messerschmitt's [20] Synchronous Dataflow (SDF) fixes the communication patterns of the blocks
in a Kahn network (Figure 7.11 is an example after Bhattacharyya et al. [21]). Each time a block runs,
it consumes and produces a fixed number of data tokens on each of its ports. Although more restrictive
than Kahn networks, SDF's predictability allows it to be scheduled completely at compile time, producing
very efficient code.
Scheduling operates in two steps. First, the rate at which each block fires is established by considering
the production and consumption rates of each block at the source and sink of each queue. For example,
the arc between the Hil and Eq nodes in Figure 7.11 implies Hil runs twice as frequently. Once the rates
are established, any algorithm that simulates the execution of the network without buffer underflow will
produce a correct schedule if one exists. However, more sophisticated techniques reduce generated code
and buffer sizes by better ordering the execution of the blocks (see Bhattacharyya et al. [22]).
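The first step, solving the balance equations for the firing rates, fits in a few lines (a Python sketch over an invented three-block graph, not the modem of Figure 7.11):

```python
from fractions import Fraction
from functools import reduce
from math import lcm

# Arcs of a small SDF graph: (src, dst, tokens produced, tokens consumed).
arcs = [("A", "B", 2, 3), ("B", "C", 1, 2)]

# Balance equation for each arc: rate[src] * produced == rate[dst] * consumed.
rates = {"A": Fraction(1)}           # pick an arbitrary reference block
changed = True
while changed:                       # propagate rates along the arcs
    changed = False
    for s, d, p, c in arcs:
        if s in rates and d not in rates:
            rates[d] = rates[s] * p / c
            changed = True
        elif d in rates and s not in rates:
            rates[s] = rates[d] * c / p
            changed = True

# Scale to the smallest integer firing counts for one schedule period.
denom = reduce(lcm, (r.denominator for r in rates.values()))
firings = {name: int(r * denom) for name, r in rates.items()}
print(firings)   # {'A': 3, 'B': 2, 'C': 1}
```

Here A must fire three times for every two firings of B and one of C: three firings of A produce six tokens, exactly what two firings of B consume, and so on down the chain. An inconsistent graph would yield no such integer solution.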
Synchronous dataflow specifications are built by assembling blocks typically written in an imperative
language such as C. The SDF block interface is specific enough to make it easy to create libraries of
general-purpose blocks such as adders, multipliers, and even FIR filters.
While SDF is often used as a simulation language, it is also well-suited to code generation. It enables a
practical technique for generating code for digital signal processors, for which C compilers often cannot
generate efficient code. Assembly code is handcrafted for each block in a library, and code synthesis
consists of assembling these handwritten blocks, sometimes generating extra code that handles the inter-
block buffers. For large, specialized blocks such as fast Fourier transforms, this can be very effective because
most of the generated code was carefully optimized by hand.
7.5 Hybrid Languages
The languages in this section use even more novel models of computation than the hardware, software, or
dataflow languages presented earlier (Table 7.3). While such languages are more restrictive than general-
purpose ones, they are much better suited for certain applications. Esterel excels at discrete control by
blending software-like control flow with the synchrony and concurrency of hardware. Communication
TABLE 7.3 Hybrid Language Features Compared
The table compares Esterel, SDL, and SystemC on the following features, marking each as fully or partially supported: concurrency, hierarchy, preemption, determinism, synchronous communication, buffered communication, FIFO communication, procedural code, finite-state machines, dataflow, multi-rate dataflow, software implementation, and hardware implementation.
protocols are SDL's forte; it uses extended finite-state machines with single input queues. SystemC provides
a very flexible discrete-event simulation environment built on C++.
7.5.1 Esterel
Intended for specifying control-dominated reactive systems, Esterel [23] combines the control constructs
of an imperative software language with concurrency, preemption, and a synchronous model of time
like that used in synchronous digital circuits. In each clock cycle, the program awakens, reads its inputs,
produces outputs, and suspends.
An Esterel program communicates through signals that are either present or absent each cycle. In each
cycle, each signal is absent unless an emit statement for the signal runs and makes the signal present for
that cycle only. Esterel guarantees determinism by requiring each emitter of a signal to run before any
statement that tests the signal.
Esterel is strongest at specifying hierarchical state machines. In addition to sequentially composing
statements (separated by a semicolon), it has the ability to compose arbitrary blocks of code in parallel
(the double vertical bars) and abort or suspend a block of code when a condition is true. For example, the
every-do construct in Figure 7.12 effectively wraps a reset statement around two state machines running
in parallel.
7.5.2 SDL
SDL is a graphical specification language, defined by the ITU [24], for describing telecommunication
protocols (Ellsberger [25] is more readable). A system consists of concurrently running FSMs,
each with a single input queue, connected by channels that define which messages they carry. Each FSM
consumes the message at the head of its queue, reacts to it by changing internal state or sending messages
to other FSMs, changes to its next state, and repeats the process. Each FSM is deterministic, but because
messages from other FSMs may arrive in any order because of varying execution speed and communication
delays, an SDL system may behave nondeterministically.
In addition to a fairly standard textual format, SDL has a formalized graphical notation. There are three
types of diagrams. Flowcharts define the behavior of state machines at the lowest level (Figure 7.13). Block
diagrams illustrating the communication among state machines local to a single processor are at the next
level up. Each communication channel is labeled with the set of messages that it conveys. The top level is
another block diagram that depicts the communication among processors. The communication channels
module Example:
input S, I;
output O;
signal R, A in
every S do
await I;
weak abort
sustain R
when immediate A;
emit O
||
loop
pause; pause;
present R then emit A end;
end
end
end
end module
FIGURE 7.12 An Esterel program modeling a shared resource. This implements two parallel threads (separated
by ||), one that waits for an I signal, then asserts R until it receives an A from the other thread and emits an O.
Meanwhile, the second thread emits an A in response to an R in alternate cycles.
[Figure 7.13: flowchart fragment with states Estab, wait1, and Closed; received signals Close and Packet; actions such as Seqn := Seq, Seq := Seq + 1, Fin := 1, Ackn := Seqn + 1, Ack := Ack + 1; decisions Fin?, Rst?, and Size?; and emitted signals Packet and Len(9).]
FIGURE 7.13 A fragment of an SDL flowchart specification for a TCP protocol. The rounded boxes denote states
(Estab, wait1, and Closed). Immediately below Estab are inward-pointing boxes that receive signals (Close, Packet).
The square and diamond boxes below these are actions and decisions. The outward-pointing boxes (e.g., Packet) emit
signals.
in these diagrams are also labeled with the signals they convey, but are assumed to have significant delay,
unlike the channels among FSMs in a single processor.
The behavior of an SDL state machine is straightforward. At the beginning of each cycle, it gets the next
signal in its input queue and sees if there is a receive block for that signal off the current state. If there is,
the associated code is executed, possibly emitting other signals and moving to a next state. Otherwise, the
signal is simply discarded and the cycle repeats from the same state. By itself, such semantics have a hard
time dealing with signals that arrive out of order, but SDL has an additional construct for handling this
condition. The save construct is like the receive construct, appearing immediately below a state and matching a
signal, but when it matches it stores the signal in a buffer that holds it until another state has a matching rule.
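These semantics, save included, can be sketched in a few lines (plain Python; the states and signal names are invented, and replaying saved signals on every transition is a simplification of SDL's actual queue-scanning rule):

```python
from collections import deque

# Toy SDL-style process: one input queue, discard-by-default, plus save.
class SdlProcess:
    def __init__(self, transitions, saves, start):
        self.q = deque()                 # the single input queue
        self.saved = deque()             # signals deferred by save
        self.transitions = transitions   # (state, signal) -> (action, next state)
        self.saves = saves               # (state, signal) pairs to defer
        self.state = start
        self.log = []

    def send(self, sig):
        self.q.append(sig)

    def step(self):
        if not self.q:
            return False
        sig = self.q.popleft()
        key = (self.state, sig)
        if key in self.transitions:
            action, self.state = self.transitions[key]
            self.log.append(action)
            self.q.extendleft(reversed(self.saved))  # replay saved signals
            self.saved.clear()
        elif key in self.saves:
            self.saved.append(sig)       # hold the signal for a later state
        # otherwise the signal is simply discarded
        return True

p = SdlProcess(
    transitions={("idle", "conn"): ("ack", "open"),
                 ("open", "data"): ("deliver", "open")},
    saves={("idle", "data")},            # data arriving before conn is saved
    start="idle")
for sig in ["data", "conn", "data"]:     # "data" arrives out of order
    p.send(sig)
while p.step():
    pass
print(p.log)   # ['ack', 'deliver', 'deliver']
```

Without the save entry, the early data signal would be silently discarded in the idle state and only one deliver would appear in the log.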
7.5.3 SystemC
The SystemC language (Figure 7.14) models systems in C++. A SystemC specification is
simulated by compiling it with a standard C++ compiler and linking in freely distributed class libraries
from www.systemc.org.
The SystemC language builds systems from Verilog- and VHDL-like modules. Each has a collection of
I/O ports and may contain instances of other modules or processes defined by a block of C++ code.
SystemC uses a discrete-event simulation model. The SystemC scheduler executes the code in a process
in response to an event such as a clock signal, or after a delay. This model resembles that used in Verilog and
VHDL, but has the flexibility of operating within a general-purpose programming language.
SystemC began life aiming to replace Verilog or VHDL as a hardware description language (it did
not offer designers a sufficiently compelling reason to switch), but has since moved beyond that. Very
often in system design, it is desirable to run simulations to estimate such high-level behavior as bus
activity or memory accesses. Historically, designers wrote custom simulators in a general-purpose
language such as C, but this was time-consuming because of the need to write a new simulation kernel
(i.e., something that provided concurrency) for each new simulator.
SystemC is emerging as a standard for writing system-level simulations. While not perfect, it works
well enough and makes it fairly easy to glue large pieces of existing software together. Although Verilog
has a PLI (programming language interface) that allows arbitrary C/C++ code to be linked and run
simultaneously with a simulation, the tighter integration of the SystemC approach is more efficient.
#include "systemc.h"
struct complex_mult : sc_module {
sc_in<int> a, b;
sc_in<int> c, d;
sc_out<int> x, y;
sc_in_clk clock;
void do_mult() {
for (;;) {
x = a * c - b * d;
wait();
y = a * d + b * c;
wait();
}
}
SC_CTOR(complex_mult) {
SC_CTHREAD(do_mult, clock.pos());
}
};
FIGURE 7.14 A SystemC model for a complex multiplier.
SystemC supports transaction-level modeling, in which bus transactions, rather than being modeled
on a per-cycle basis as would be done in a language such as Verilog, are modeled as function calls. For
example, a burst-mode bus transfer would be modeled with a function that marks the bus as in use,
advances simulation time according to the number of bytes to be transferred, actually copies the data in
the simulator, and marks the bus as unused. Nowhere in the simulation would the actual sequence of
signals and bits transferred over the bus appear.
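The burst-transfer example reads roughly like this (plain Python rather than SystemC, with an invented cost of one time unit per byte; the point is only that no per-cycle bus signals appear anywhere):

```python
# Transaction-level sketch of a burst-mode bus write: one function call
# replaces the cycle-by-cycle bus protocol.
class Bus:
    def __init__(self):
        self.now = 0                 # simulated time
        self.busy = False

    def burst_write(self, mem, addr, data):
        assert not self.busy, "bus already in use"
        self.busy = True             # mark the bus as in use
        self.now += len(data)        # advance time for the whole burst at once
        mem[addr:addr + len(data)] = data   # copy the data in one step
        self.busy = False            # mark the bus as free again

mem = bytearray(16)
bus = Bus()
bus.burst_write(mem, 4, b"\x01\x02\x03")
print(bus.now, mem[4:7])   # 3 bytearray(b'\x01\x02\x03')
```

Trading away the signal-level detail is what makes such models orders of magnitude faster to simulate than the equivalent register-transfer-level description.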
7.6 Summary
Currently, most embedded systems are programmed using C for software and Verilog, or possibly VHDL,
for hardware components suchas FPGAs or ASICs, but this will probably change. The increasedcomplexity
of such designs makes a compelling case for different, higher-level languages. Years ago, designers made
the jump fromassembly to C, and the higher-level constructs of Java are growing more attractive despite
its performance loss.
Domain-specific languages, especially for signal-processing problems, already have a significant
beachhead, and will continue to make inroads. Most signal processing algorithms are already prototyped using
a higher-level language (Matlab), but it remains to be seen whether synthesis from Matlab will ever be
practical.
For hardware, the direction is less clear. While modeling languages such as SystemC will continue to
grow in importance, there is currently no clear winner for the successor to VHDL and Verilog. Roughly
a decade ago, a different, high-level subset of VHDL and Verilog was proposed as the new behavioral
synthesis subset, but did not catch on because it was too limiting, largely because of restrictions placed on
it by the synthesis algorithms. Additions such as SystemVerilog are incremental, if helpful, improvements,
but will not provide the quantum leap forward that synthesis from the RTL (register-transfer level)
subsets of Verilog and VHDL provided. Perhaps future hardware languages may contain constructs such
as Esterel's.
References
[1] Stephen A. Edwards. Languages for Digital Embedded Systems. Kluwer, Boston, MA, September
2000.
[2] Dennis M. Ritchie. The Development of the C Language. In History of Programming Languages II.
Thomas J. Bergin, Jr. and Richard G. Gibson, Jr., Eds. ACM Press, New York and Addison-Wesley,
Reading, MA, 1996.
[3] Brian W. Kernighan and Dennis M. Ritchie. The C Programming Language, 2nd ed. Prentice Hall,
Upper Saddle River, NJ, 1988.
[4] Bjarne Stroustrup. The C++ Programming Language, 3rd ed. Addison-Wesley, Reading, MA, 1997.
[5] Ken Arnold, James Gosling, and David Holmes. The Java Programming Language, 3rd ed.
Addison-Wesley, Reading, MA, 2000.
[6] James Gosling, Bill Joy, Guy Steele, and Gilad Bracha. The Java Language Specication, 2nd ed.
Addison-Wesley, Reading, MA, 2000.
[7] Tim Lindholm and Frank Yellin. The Java Virtual Machine Specification. Addison-Wesley, Reading,
MA, 1999.
[8] Greg Bollella, Ben Brosgol, Peter Dibble, Steve Furr, James Gosling, David Hardin, Mark Turnbull,
Rudy Belliardi, Doug Locke, Scott Robbins, Pratik Solanki, and Dionisio de Niz. The Real-Time
Specication for Java. Addison-Wesley, Reading, MA, 2000.
[9] Jean Labrosse. MicroC/OS-II. CMP Books, Lawrence, Kansas, 1998.
[10] Loic P. Briand and Daniel M. Roy. Meeting Deadlines in Hard Real-Time Systems: The Rate
Monotonic Approach. IEEE Computer Society Press, New York, 1999.
[11] C. L. Liu and James W. Layland. Scheduling Algorithms for Multiprogramming in a Hard Real-
Time Environment. Journal of the Association for Computing Machinery, 20: 4661, 1973.
[12] IEEE Computer Society. IEEE Standard Hardware Description Language Based on the Verilog
Hardware Description Language (13641995). IEEE Computer Society Press, NewYork, 1996.
[13] Donald E. Thomas and Philip R. Moorby. The Verilog Hardware Description Language, 4th ed.
Kluwer, Boston, MA, 1998.
[14] IEEE Computer Society. IEEE Standard VHDL Language Reference Manual (1076–1993). IEEE
Computer Society Press, New York, 1994.
[15] Douglas L. Perry. VHDL, 3rd ed. McGraw-Hill, New York, 1998.
[16] Ben Cohen. VHDL Coding Styles and Methodologies, 2nd ed. Kluwer, Boston, MA, 1999.
[17] Peter J. Ashenden. The Designer's Guide to VHDL. Morgan Kaufmann, San Francisco, CA, 1996.
[18] Gilles Kahn. The Semantics of a Simple Language for Parallel Programming. In Information
Processing 74: Proceedings of IFIP Congress 74. North-Holland, Stockholm, Sweden, August 1974,
pp. 471–475.
[19] Thomas M. Parks. Bounded Scheduling of Process Networks. PhD thesis, University of California,
Berkeley, 1995. Available as UCB/ERL M95/105.
[20] Edward A. Lee and David G. Messerschmitt. Synchronous Data Flow. Proceedings of the IEEE,
75: 1235–1245, 1987.
[21] Shuvra S. Bhattacharyya, Praveen K. Murthy, and Edward A. Lee. Synthesis of Embedded Software
from Synchronous Dataflow Specifications. Journal of VLSI Signal Processing Systems, 21: 151–166,
1999.
[22] Shuvra S. Bhattacharyya, Rainer Leupers, and Peter Marwedel. Software Synthesis and Code
Generation for Signal Processing Systems. IEEE Transactions on Circuits and Systems II: Analog
and Digital Signal Processing, 47: 849–875, 2000.
[23] Gérard Berry and Georges Gonthier. The Esterel Synchronous Programming Language: Design,
Semantics, Implementation. Science of Computer Programming, 19: 87–152, 1992.
[24] International Telecommunication Union. ITU-T Recommendation Z.100: Specification and
Description Language. International Telecommunication Union, Geneva, 1999.
[25] Jan Ellsberger, Dieter Hogrefe, and Amardeo Sarma. SDL: Formal Object-Oriented Language for
Communicating Systems, 2nd ed. Prentice Hall, Upper Saddle River, NJ, 1997.
8
The Synchronous
Hypothesis and
Synchronous
Languages
Dumitru Potop-Butucaru
IRISA
Robert de Simone
INRIA
Jean-Pierre Talpin
IRISA
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-1
8.2 The Synchronous Hypothesis . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-2
    What For? • Basic Notions • Mathematical Models •
    Implementation Issues
8.3 Imperative Style: Esterel and SyncCharts . . . . . . . . . . . . . . 8-5
    Syntax and Structure • Semantics • Compilation and
    Compilers • Analysis/Verification/Test Generation: Benefits
    from Formal Approaches
8.4 The Declarative Style: Lustre and Signal . . . . . . . . . . . . . . . 8-11
    A Synchronous Model of Computation • Declarative Design
    Languages • Compilation of Declarative Formalisms
8.5 Success Stories: A Viable Approach for System Design . . . 8-18
8.6 Into the Future: Perspectives and Extensions . . . . . . . . . . 8-18
    Asynchronous Implementation of Synchronous Specifications
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-20
8.1 Introduction
Electronic Embedded Systems are not new, but their pervasive introduction in ordinary-life objects (cars,
phones, home appliances) brought a new focus onto design methods for such systems. New development
techniques are needed to meet the challenges of productivity in a competitive environment. This handbook
reports on a number of such innovative approaches to the matter. Here, we shall concentrate on
Synchronous Reactive (S/R) languages [1–4].
S/R languages rely on the synchronous hypothesis, which divides computations and behaviors into a
discrete sequence of computation steps that are equivalently called reactions or execution instants. In itself,
Partly supported by the ARTIST IST European project.
this assumption is rather common in practical embedded system design. But the synchronous hypothesis
adds the fact that, inside each instant, the behavioral propagation is well-behaved (causal), so that the status
of every signal or variable is established and defined prior to being tested or used. This criterion, which
may be seen at first as an isolated technical requirement, is in fact the key point of the approach. It ensures
strong semantic soundness by allowing universally recognized mathematical models, such as Mealy
machines and digital circuits, to be used as supporting foundations. In turn, these models give access to a
large corpus of efficient optimization, compilation, and formal verification techniques. The synchronous
hypothesis also guarantees full equivalence between various levels of representation, thereby avoiding
altogether the pitfalls of nonsynthesizability of other similar formalisms. In that sense, the synchronous
hypothesis is, in our view, a major contribution to the goal of model-based design of embedded systems.
Structured languages have been introduced for the modeling and programming of S/R applications.
They are roughly classied into two families:
• Imperative languages, such as Esterel [5–7] and SyncCharts [8], provide constructs to shape control-
dominated programs as hierarchical synchronous automata, in the wake of the StateCharts
formalism, but with a full-fledged treatment of simultaneity, priority, and absence notification
of signals in a given reaction. Thanks to this, signals assume a consistent status for all parallel
components in the system at any given instant.
• Declarative languages, such as Lustre [9] and Signal [10], shape applications based on intensive data
computation and data-flow organization, with the control flow part operating under the form of
(internally generated) activation clocks. These clocks prescribe which data computation blocks
are to be performed as part of the current reaction. Here again, the semantics of the languages
deal with the issue of behavior consistency, so that every value needed in a computation is indeed
available at that instant.
Here, we shall describe the synchronous hypothesis and its mathematical background, together with
a range of design techniques empowered by the approach and a short comparison with neighboring
formalisms; then, we introduce both classes of S/R languages, with their special features and a couple
of programming examples; finally, we comment on the benefits and shortcomings of S/R modeling,
concluding with a look at future perspectives and extensions.
8.2 The Synchronous Hypothesis
8.2.1 What For?
Program correctness (the process performs as intended) and program efciency (it performs as fast as
possible) are major concerns in computer science, but they are even more stringent in the embedded area,
as no online debugging is feasible, and time budgets are often imperative (for instance in multimedia
applications).
Program correctness is sought by introducing appropriate syntactic constructs and dedicated languages,
making programs more easily understandable by humans, as well as allowing high-level modeling and
associated verification techniques. Provided semantic preservation is ensured down to actual implementation
code, this provides reasonable guarantees on functional correctness. However, while this might
sound obvious for traditional software compilation schemes, the hardware synthesis process is often not
seamless, as it includes manual rewriting.
Program efficiency is traditionally handled in the software world by algorithmic complexity analysis,
expressed in terms of individual operations. But in modern systems, owing to a number of phenomena,
this high-level complexity reflects rather imperfectly the low-level complexity in numbers
of clock cycles spent. In the hardware domain, one considers various levels of modeling, corresponding
to more abstract (or conversely more precise) timing accounts: transaction level, cycle accurate, time
accurate.
One possible way (amongst many) to view synchronous languages is to take up the analogy of
cycle-accurate programming in a more general setting, including (reactive) software as well. This analogy
is supported by the fact that simulation environments in many domains (from scientific engineering
to Hardware Description Language [HDL] simulators) often use lockstep computation paradigms, very
close to the synchronous cycle-based computation. In these settings, cycles represent logical steps, not
physical time. Of course timing analysis is still possible afterwards, and in fact often simplied by the
previous division into cycles.
The focus of synchronous languages is thus to allow modeling and programming of systems where cycle
(computation step) precision is needed. The objective is to provide domain-specific structured languages for
their description, and to study matching techniques for efficient design, including compilation/synthesis,
optimization, and analysis/verification. The strong condition ensuring the feasibility of these design
activities is the synchronous hypothesis, described in Section 8.2.2.
8.2.2 Basic Notions
What has come to be known as the synchronous hypothesis, laying foundations for S/R systems, is really
a collection of assumptions of a common nature, sometimes adapted to the framework considered. We
shall avoid heavy mathematical formalization in this presentation, and refer the interested reader to the
existing literature, such as References 3 and 4. The basics are:
Instants and reactions. Behavioral activities are divided according to (logical, abstract) discrete time. In
other words, computations are divided according to a succession of nonoverlapping execution instants.
In each instant, input signals possibly occur (for instance, by being sampled), internal computations take
place, and control and data are propagated until output values are computed and a new global system
state is reached. This execution cycle is called the reaction of the system to the input signals. Although
we used the word "time" just before, there is no real physical time involved, and instant durations need
not be uniform (or even considered!). All that is required is that reactions converge and computations are
entirely performed before the current execution instant ends and a new one begins. This empowers the
obvious conceptual abstraction that computations are infinitely fast ("instantaneous," "zero-time") and
take place only at discrete points in (physical) time, with no duration. When presented without sufficient
explanations, this strong formulation of the synchronous hypothesis is often discarded by newcomers as
"unrealistic" (while, again, it is only an abstraction, amply used in other domains where "all-or-nothing"
transaction operations take place).
Signals. Broadcast signals are used to propagate information. At each execution instant, a signal can
either be present or absent. If present, it also carries some value of a prescribed type (pure signals exist
as well, which carry only their presence status). The key rule is that a signal must be consistent (same
present/absent status, same data) for all read operations during any given instant. In particular, reads
from parallel components must be consistent, meaning that signals act as controlled shared variables.
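To fix ideas, this signal discipline can be mimicked in a few lines of Python (our own toy sketch, not from the chapter; the signal names are invented): a reaction receives the set of signals present at an instant, every test within the instant sees the same status, and absence is as testable as presence.

```python
def reaction(inputs):
    """One execution instant: `inputs` is the set of signals present now.
    Every presence test within the instant sees the same status, and the
    outputs belong to the same instant (conceptually zero-time)."""
    outputs = set()
    if "I" in inputs:        # presence can be tested...
        outputs.add("O")
    if "I" not in inputs:    # ...and so can absence (synchronous hypothesis)
        outputs.add("IDLE")
    return outputs

# A run is a sequence of instants, each mapping an input valuation to outputs.
trace = [reaction(s) for s in [{"I"}, set(), {"I", "J"}]]
```

Note that the output of an instant depends only on that instant's input valuation; no physical durations appear anywhere.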
Causality. The crucial task of deciding when a signal can be declared absent is of the utmost importance
in the theory of S/R systems, and an important part of the theoretical body behind the synchronous
hypothesis. This is of course especially true for local signals, which are both generated and tested inside the
system. The fundamental rule is that the present status and value of a signal should be defined before they
are read (and tested). This requirement takes various practical forms depending on the actual language
or formalism considered, and we shall come back to this later. Here, note that "before" refers to causal
dependency in the computation of the instant, and not to physical or even logical time between successive
instants [11]. The synchronous hypothesis ensures that all possible schedules of operations amount to the
same result (convergence); it also leads to the definition of correct programs, as opposed to ill-behaved
ones where no causal scheduling can be found.
Activation conditions and clocks. Each signal can be seen as defining (or generating) a new clock, ticking
when it occurs; in hardware design, this is called "gated clocks." Clocks and sub-clocks, either externally or
internally generated, can be used as control entities to activate (or not) component blocks of the system.
We shall also call them activation conditions.
8.2.3 Mathematical Models
If one temporarily forgets about data values, and one accepts the duality of present/absent signals mapped
onto true/false values, then there is a natural interpretation of synchronous formalisms as synchronous
digital circuits at schematic gate level, or netlists (roughly RTL level with only Boolean variables and
registers). In turn, such circuits have a straightforward behavioral expansion into Mealy Finite State
Machines (FSMs).
The two slight restrictions given here are not essential: the adjunction of types and values into digital
circuit models has been successfully attempted in a number of contexts, and S/R systems can also be seen
as contributing to this goal. Meanwhile, the introduction of clocks and present/absent signal status in
S/R languages departs drastically from the prominent notion of sensitivity list generally used to define the
simulation semantics of HDLs.
We now comment on the opportunities made available through the interpretation of S/R systems into
Mealy machines or netlists:
Netlists. Here, we consider netlists in a simple form, as Boolean equation systems defining the values
of wires and Boolean registers as Boolean functions of other wires and previous register values. Some
wires represent input and output signals (with the value true indicating signal presence); others are internal
variables. This type of representation is of special interest because it can provide exact dependency
relations between variables, and thus a good representation level at which to study causality issues with accurate
analysis. Notions of constructive causality have been the subject of much attention here. They attempt
to refine the usual crude criterion for synthesizability, which forbids cyclic dependencies between nonregister
variables (so that a variable seems to depend upon itself in the same instant), but neither takes into
account the Boolean interpretation, nor the potentially reachable configurations. Consider the equation
x = y ∨ z, while it has been established that y is the constant true. Then x does not really depend on z,
since its (constant) value is forced by y's. Constructive causality seeks the best possible faithful notion
of true combinatorial dependency, taking the Boolean interpretation of functions into account. For details,
see Reference 12.
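The distinction can be checked mechanically. The snippet below (our own illustration, not from the text) compares the z-cofactors of the equation x = y ∨ z: structurally x reads z, but once y is established to be the constant true, the two cofactors coincide and the dependency on z vanishes.

```python
# The netlist equation x = y OR z, as a Boolean function.
def x(y, z):
    return y or z

# Crude structural causality: x depends on z if some y makes the
# two z-cofactors of x differ.
structural_dep = any(x(y0, False) != x(y0, True) for y0 in (False, True))

# With the fact that y is constantly true, the z-cofactors agree,
# so x carries no true combinatorial dependency on z.
true_dep_given_y = x(True, False) != x(True, True)
```

The same cofactor test is what a constructive analysis applies, restricted to the configurations that are actually reachable.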
Another equally important aspect of the mathematical model is that a number of combinatorial and
sequential optimization techniques have been developed over the years in the context of hardware
synthesis approaches. The main ones are now embedded in the SIS and MVSIS optimization suites from UC
Berkeley [13, 14]. They come as a great help in allowing programs written in high-level S/R formalisms to
compile into efficient code, either software or hardware targeted [15].
Mealy machines. Mealy machines are finite-state automata corresponding strictly to the synchronous
assumption. In a given state, given a certain input valuation (a subset of present signals), the machine
reacts by immediately producing a set of output signals before entering a new state.
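Such a machine can be sketched as a transition table; the example below is our own (states and signal names are invented): one reaction maps the current state and input valuation to the outputs and the next state, in a single step.

```python
# Transition table: (state, frozenset of present inputs) -> (outputs, next state).
# A tiny two-state machine: emits O on the first A, then ignores A until reset by R.
TRANS = {
    ("idle", frozenset({"A"})): ({"O"}, "done"),
    ("idle", frozenset()):      (set(), "idle"),
    ("done", frozenset({"R"})): (set(), "idle"),
    ("done", frozenset()):      (set(), "done"),
}

def react(state, inputs):
    """One synchronous reaction: outputs are produced in the same instant
    as the inputs, before the new state is entered."""
    return TRANS[(state, frozenset(inputs))]
```

For instance, `react("idle", {"A"})` yields the outputs `{"O"}` together with the next state `"done"`.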
Mealy machines can be generated from netlists (and, by extension, from any S/R system). The Mealy
machine construction can then be seen as a symbolic expansion of all possible behaviors, computing
the space of reachable states (RSS) on the way. But while the precise RSS is gained, the precise causal
dependency relations are lost, which is why both the Mealy FSM and netlist models are useful in the course
of S/R design [16].
When the RSS is extracted, often in symbolic Binary Decision Diagram (BDD) form, it can be used in a
number of ways: we already mentioned that constructive causality only considers dependencies inside the
RSS; similarly, all activities of model-checking formal verification and test coverage analysis are strongly
linked to the RSS construction [17–20].
The modeling style of netlists can be extrapolated to block-diagram networks, often used in multimedia
digital signal processing, by adding more types and arithmetic operators, as well as activation conditions
to introduce some amount of control flow. The declarative synchronous languages can be seen as
attempts to provide structured programming to compose large systems modularly in this class of applications,
as described in Section 8.4. Similarly, imperative languages provide ways to program, in a structured
way, hierarchical systems of interacting Mealy FSMs, as described in Section 8.3.
reaction() {
    decode state; read input;
    compute;
    write output; encode state;
}
FIGURE 8.1 The reaction function is called at each instant to perform the computation of the current step.
8.2.3.1 Synchronous Hypothesis versus Neighboring Models
Many quasi-synchronous formalisms exist in the fields of embedded system (co)simulation: the simulation
semantics of SystemC and regular HDLs at RTL level, the discrete-step Simulink/Stateflow simulation,
or the official StateCharts semantics, for instance. Such formalisms generally employ a notion of physical
time in order to establish when to start the next execution instant. Inside the current execution instant,
however, delta-cycles allow zero-delay activity propagation, and potentially complex behaviors occur inside
a given single reaction. The main difference here is that no causality analysis (based on the synchronous
hypothesis) is performed at compilation time, so that an efficient ordering/scheduling cannot be precomputed
before simulation. Instead, each variable change recursively triggers further recomputations of all
depending variables in the same reaction.
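This delta-cycle style can be sketched as a run-to-fixpoint loop: all equations are re-evaluated until nothing changes, instead of being executed once in a schedule fixed at compile time. The code below is our own illustration with made-up equations; note that the equation list is deliberately out of causal order.

```python
def delta_cycle_reaction(inputs):
    """HDL-style simulation of one instant: re-evaluate all equations
    until a fixpoint is reached, rather than once in a precomputed
    causal order."""
    env = dict(inputs)
    env.setdefault("b", False)
    env.setdefault("c", False)
    equations = [
        ("c", lambda e: e["b"]),   # c = b, deliberately listed before its driver
        ("b", lambda e: e["a"]),   # b = a
    ]
    changed = True
    while changed:                 # each pass is one delta cycle
        changed = False
        for name, f in equations:
            v = f(env)
            if env[name] != v:
                env[name] = v
                changed = True
        # loop again: the change to b may have invalidated c
    return env
```

A compiler applying causal analysis would instead order b before c once and evaluate each equation exactly once per instant.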
8.2.4 Implementation Issues
The problem of implementing a synchronous specification mainly consists in defining the "step" reaction
function that will implement the behavior of an instant, as shown in Figure 8.1. The global behavior
is then computed by iterating this function for successive instants and successive input signal valuations.
Following the basic mathematical interpretations, the compilation of an S/R program may either consist in
the expansion into a flat Mealy FSM, or in the translation into a flat netlist (with more types and arithmetic
operators, but without activation conditions). Here, the runtime implementation consists in the execution
of the resulting Mealy machine or netlist. In the first case, the automaton structure is implemented as a
big top-level switch between states. In the second case, the netlist is totally ordered in a way compatible
with causality, and all the equations in the ordered list are evaluated at each execution instant. These basic
techniques are at the heart of the first compilers, and of some industrial ones.
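The automaton-style scheme can be sketched as follows (our own sketch; the state encoding and signals are invented): the step function is a top-level switch on the current state, and the global behavior is obtained by iterating it over successive instants.

```python
def step(state, inputs):
    """Compiled Mealy automaton: a top-level switch between states."""
    if state == 0:                     # state 0: waiting for I
        if "I" in inputs:
            return {"O"}, 1
        return set(), 0
    else:                              # state 1: waiting for J
        if "J" in inputs:
            return set(), 0
        return set(), 1

def run(input_trace):
    """Iterate the reaction function over successive input valuations."""
    state, out_trace = 0, []
    for inputs in input_trace:
        outs, state = step(state, inputs)
        out_trace.append(outs)
    return out_trace
```

The netlist scheme would replace the switch by a causally ordered list of equations, all evaluated at every instant.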
In the last decade, fancier implementation schemes have been sought, relying on the use of activation
conditions: during each reaction, execution starts by identifying the truly useful program blocks that are
marked as active. Then only the actual execution of the active blocks is scheduled (a bit more dynamically)
and performed, in an order that respects the causality of the program. In the case of declarative
languages, the activation conditions come in the form of a hierarchy of clock undersamplings (the "clock
tree"), obtained through a "clock calculus" computation performed at compile time (see Section 8.4.3).
In the case of imperative formalisms, activation conditions are based on the halting points (where
the control flow can stop between execution instants) and on the signal-generated (sub-)clocks (see
Section 8.3.3).
8.3 Imperative Style: Esterel and SyncCharts
For control-dominated systems, comprising a fair number of (sub-)modes and macro-states with activity
swapping between them, it is natural to employ a description style that is algorithmic and imperative,
describing the changes and progression of control in an explicit flow. In essence, one seeks to represent
hierarchical (Mealy) FSMs, but with some data computation and communication treatment performed
inside states and transitions. Esterel provides this in a textual fashion, while SyncCharts proposes a graphical
counterpart, with visual macro-states. It should be noted that systems here remain finite-state (at least in
their control structure).
8.3.1 Syntax and Structure
Esterel introduces a specific pause construct, used to divide behaviors into successive instants (reactions).
Except at pause statements, control flows through sequential, parallel, and if-then-else constructs,
performing data operations and interprocess signaling. But it stops at pause, memorizing the activity of
that location point for the next execution instant. This provides the needed atomicity mechanism, since
the instant is over when all currently active parallel components reach a pause statement.
The full Esterel language contains a large number of constructs that facilitate modeling, but there exists
a reduced kernel of primitive statements (corresponding to the natural structuring paradigms) from which
all the other constructs can be derived. This is of special interest for model-based approaches, because
only the primitives need to be assigned semantics as transformations in the model space. The semantics of
the primitives are then combined to obtain the semantics of composed statements. Figure 8.2 provides
the list of primitive operators for the data-less subset of Esterel (also called Pure Esterel). A few comments
are in order here:
• In p; q, the reaction where p terminates is the same as the reaction where q starts (control can be
split into reactions only by pause statements inside p or q).
• The loop constructs do not terminate, unless aborted from above. This abortion can be owing to
an external signal received by an abort statement, or to an internal exception raised through the
trap/exit mechanism, or to any of the two (as for the weak abort statement). The body
of a loop statement should not instantly terminate, or else the loop would unroll endlessly in the
same instant, leading to divergence. This is checked by static analysis techniques. Finally, loops are
the only means of defining iterating behaviors (there is no general recursion), so that the system
remains finite-state.
• The present signal-testing primitive allows an else part. This is essential to the expressive power
of the language, and has strong semantic implications pertaining to the synchronous hypothesis.
It is enough to note here that, according to the synchronous hypothesis, signal absence can effectively be
asserted.
• The difference between abort p when S and weak abort p when S is that in the first
case signal S can only come from outside p, and its occurrence prevents p from executing during
the execution instant where S arrives. In the second case, S can also be emitted by p, and the
preemption occurs only after p has completed its execution for the instant.
[p]                              Enforces precedence by parenthesis
pause                            Suspends the execution until next instant
p; q                             Executes p, then q as soon as p terminates
loop p end                       Iterates p forever in sequence
[p || q]                         Executes p and q in parallel, synchronously
signal S in p end                Declares local signal S in p
emit S                           Emits signal S
present S then p else q end      Executes p or q upon S being present or absent
abort p when S                   Executes p until S occurs (exclusive)
weak abort p when S              Executes p until S occurs (inclusive)
suspend p when S                 Executes p unless S occurs
trap T in p end                  Declares/catches exception T in p
exit T                           Raises exception T
FIGURE 8.2 Pure Esterel statements.
Technically speaking, the trap/exit mechanism can emulate the abort statements. But we feel
that the ease of understanding makes the latter worth including in the set of primitives. Similarly,
we shall sometimes use await S as a shorthand for abort loop pause end when S, and
sustain S for loop emit S end.
Most of the data-handling part of the language is deferred to a general-purpose host language (C, C++,
Java, ...). Esterel only declares type names, variable types, and function signatures (which are used as mere
abstract instructions). The actual type specifications and function implementations must be provided and
linked at a later compilation stage.
In addition to the structuring primitives of Figure 8.2, the language contains (and requires) interface
declarations (for signals, most notably), and modular division with submodule invocation. Submodule
instantiation allows signal renaming, that is, transforming virtual name parameters into actual ones (again,
mostly for signals). Rather than providing a full user manual for the language, we shall illustrate most of
these features on an example.
The small example of Figure 8.3 has four input signals and one output signal. Meant to model a cyclic
computation like a communication protocol, the core of our example is the loop that awaits the input I,
emits O, and then awaits J before instantly restarting. The local signal END signals the completion of loop
cycles. When started, the await statement waits for the next clock cycle where its signal is present. The
computation of all the other statements present in our example is performed during a single clock cycle,
so that the await statements are the only places where control can be suspended between reactions (they
preserve the state of the program between cycles). A direct consequence is that the signals I and J must
come in different clock cycles in order not to be discarded.
The loop is preempted by the exception handling statement trap when exit T is executed. In
this case, trap instantly terminates, control is given in sequence, and the program terminates. The
preemption protocol is triggered by the input signal KILL, but the exception T is raised only when END is
emitted. The program is suspended (no computation is performed and the state is kept unchanged) in
clock cycles where the SUSP signal is received. A possible execution trace for our program is given
in Figure 8.4.
module Example:
input I, J, KILL, SUSP;
output O;
suspend
  trap T in                          %exception handler, performs the preemption
    signal END in
      loop                           %basic computation loop
        await I; emit O; await J; emit END
      end
      ||
      %preemption protocol, triggered by KILL
      await KILL; await END; exit T
    end
  end;
when SUSP                            %suspend signal
end module
FIGURE 8.3 A simple Esterel program modeling a cyclic computation (such as a communication protocol) that can
be interrupted between cycles and which can be suspended.
2006 by Taylor & Francis Group, LLC
8-8 Embedded Systems Handbook
Clock   Inputs    Outputs   Comments
0       (any)               All inputs discarded
1       I         O
2       KILL                Preemption protocol triggered
3                           Nothing happens
4       J, SUSP             Suspend, J discarded
5       J                   END emitted, T raised, program terminates
FIGURE 8.4 A possible execution trace for our example.
8.3.2 Semantics
Esterel enjoys a full-fledged formal semantics, in the form of Structural Operational Semantics (SOS)
rules [12]. In fact, there are two main levels of such rules, the coarser describing all potential,
logically consistent behaviors, while the more precise one only selects those that can be obtained in a
constructive way (thereby discarding some programs as unnatural in this respect). This issue can be
introduced with two small examples:
present S then emit S end          present S else emit S end
In the first case, the signal S can logically be assumed to be either present or absent: if assumed present, it will
be emitted, so it will become present; if assumed absent, it will not be emitted. In the second case, following
a similar reasoning, the signal can be neither present nor absent. In both cases, anyhow, the analysis is
done by "guessing" before branching to the potentially validating emissions. While more complex causality
paradoxes can be built using the full language, these two examples already show that the problem stems
from the existence of causality dependencies inside a reaction, prompted by instantaneous sequential
control propagation and signal exchanges. The so-called constructive causality semantics of Esterel checks
precisely that control and signal propagation are well-behaved, so that no guess is required. Programs
that pass this requirement are deemed correct, and they provide deterministic behaviors for whatever
input is presented to the program (which is a desirable feature in embedded system design).
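The "no guessing" discipline can be mimicked with a three-valued status (present, absent, unknown). In the toy checker below (our own, much simplified; `guard` abstracts the condition under which S's only emission fires), a status is accepted only when it is forced by already established facts, and both paradoxical one-liners above are rejected.

```python
UNKNOWN, PRESENT, ABSENT = "unknown", "present", "absent"

def constructive_status(guard, max_rounds=4):
    """Status of a local signal S whose only emission is controlled by
    `guard`, a function from S's current status to True / False / UNKNOWN.
    Returns the constructively established status, or None when the
    program must be rejected (no status can be reached without guessing)."""
    s = UNKNOWN
    for _ in range(max_rounds):
        g = guard(s)
        if g is True:
            new = PRESENT        # the emission must fire: S is present
        elif g is False:
            new = ABSENT         # no emission can fire any more: S is absent
        else:
            return None          # stuck at unknown: not constructive
        if new == s:
            return s
        s = new
    return None                  # no stable status found

# present S then emit S end : S is emitted iff S is already present
paradox1 = constructive_status(
    lambda s: True if s == PRESENT else (False if s == ABSENT else UNKNOWN))
# present S else emit S end : S is emitted iff S is absent
paradox2 = constructive_status(
    lambda s: True if s == ABSENT else (False if s == PRESENT else UNKNOWN))
# emit S : an unconditional emission is perfectly constructive
ok = constructive_status(lambda s: True)
```

Both paradoxes get stuck at the unknown status, while the unconditional emission is resolved to present without any guess.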
8.3.3 Compilation and Compilers
Following the pattern presented in Section 8.2.4, the first compilers for Esterel were based on the
translation of the source into (Mealy) finite automata or into digital synchronous circuits at netlist level.
The generated sequential code was then a compiled automaton or netlist simulator. The automata-based
compilation [7] was used in the first Esterel compilers (known as Esterel V3). Automaton generation
was done here by exhaustive expansion of all reachable states using symbolic execution (all data is kept
uninterpreted). Execution time was then theoretically optimal, but code size could blow up (with the number
of states), and huge code duplication was mandatory for actions that were performed in several different
states. The netlist-based compilation (Esterel V5) is based on a quasi-linear, structural Esterel-to-circuits
translation scheme [21] that ensures the tractability of compilation even for the largest examples. The
drawback of the method is the reaction time (the simulation time for the generated netlist), which increases
linearly with the size of the program.
Apart from these two compilation schemes, which have matured into full industrial-strength compilers,
several attempts have been made to develop a more efficient, basically event-based type of compilation that
follows more readily the naive execution path and control propagation inside each reaction, and in
particular executes as much as possible only the truly active parts of the program.¹ Here, we mention three
¹Recall that this is a real issue in Esterel, since programs may contain reactions to the absence of signals, and determining
this absence may require checking that no emission remains possible in the potential behaviors, whatever feasible test
[Figure: (a) hierarchical state representation, with Loop, Seq, Par, Signal, Trap, and Suspend nodes; (b) concurrent control-flow graph, from program activation and program start.]
FIGURE 8.5 GRC intermediate representation for our Esterel example.
such approaches: the Saxo compiler of Closse and co-workers [22], the EC compiler of Edwards [23],
and the GRC2C compiler of Potop-Butucaru and de Simone [24]. All of them are structured around
flowgraph-based intermediate representations that are easily translated into well-structured sequential
code. The different intermediate representations also account for the differences between approaches, by determining
which Esterel programs can be represented, and what optimization and code generation techniques
can be applied.
We exemplify with the GRC2C compiler [24], which is structured around the GRC intermediate form.
The GRC representation of our example, given in Figure 8.5, uses two graph-based structures, a
hierarchical state representation (HSR) and a concurrent control-flow graph (CCFG), to preserve most of
the structural information of the Esterel program while making the control flow explicit with a few graph-
building primitive nodes. The HSR is an abstraction of the syntax tree of the initial Esterel program.
It can be seen as a structured data memory that preserves state information across reactions. During
each instant, a set of activation conditions (clocks) is computed from this memory state, to drive the
execution toward active instructions. The CCFG represents, in an operational fashion, the computation of
an instant (the transition function). During each reaction, the dynamic CCFG operates on the static HSR
by marking/unmarking component nodes (subtrees) with "active" tags as they are activated or deactivated
by the semantics.
For instance, when we start our small example (Figure 8.3 and Figure 8.5), the program start (1) and program (0) HSR nodes are active, while all the statements of the program (and the associated HSR nodes) are not. Like in any instant, control enters the CCFG by the topmost node and uses the first state decoding node (labeled 0) to read the state of the HSR and branch to the start behavior, which sets the program start (1) indicator to inactive (with exit 1), and activates await I and await KILL (with enter 8 and enter 11).
branches could be taken. To achieve this goal at a reasonable computational price, current compilers require, in fact, additional restrictions, in essence the acyclicity of the dependency/causality graph at some representation level. Acyclicity ensures constructiveness, because any topological order of the operations in the graph gives an execution order which is correct for all instants.
The HSR also serves as a repository for tags, which record redundancies between various activation clocks and are used by the optimization and code generation algorithms. One such tag is #, which tells that at most one child of the tagged node can retain control between reactions at a time (the activation clocks of the branches are exclusive). Other tags (not figured here) are computed through complex static analysis of both the HSR and CCFG. The tags allow efficient optimization and sequential code generation.
The CCFG is obtained by making the control flow of the Esterel program explicit (a structural, quasi-linear translation process).² Usually, it can be highly optimized using classical compiler techniques and some methods derived from circuit optimization, both driven by the HSR tags computed by static analysis. Code generation from a GRC representation is done by encoding the state on sequential variables, and by scheduling the CCFG operators using classical compilation techniques [25].
The Saxo compiler of Closse and co-workers [22] uses a discrete-event interpretation of Esterel to generate a compiled event-driven simulator. The compiler flow is similar to that of VeriSUIF [26], but Esterel's synchronous semantics are used to greatly simplify the approach. An event graph intermediate representation is used here to split the program into a list of guarded procedures. The guards intuitively correspond to events that trigger computation. At each clock cycle, the simulation engine traverses the list once, from the beginning to the end, and executes the procedures with an active guard. The execution of a procedure may modify the guards for the current cycle and for the next cycle. The resulting code is slower than its GRC2C-generated counterpart for two reasons: first, it does not exploit the hierarchy of exclusion relations determined by switching statements like the tests. Second, optimization is less effective because the program hierarchy is lost when the state is (very redundantly) encoded using guards.
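The guarded-procedure scheme can be sketched in a few lines of Python. This is a minimal illustration of the traversal described above, not actual Saxo output: the procedure names and the two-procedure toy program are invented for the example.

```python
# A minimal sketch of a Saxo-style compiled event-driven simulator: the
# program is split into a list of guarded procedures traversed once per
# cycle; a body runs only when its guard is active, and it may activate
# guards for the current cycle or for the next one.

class GuardedProc:
    def __init__(self, name, guard, body):
        self.name = name
        self.guard = guard          # guard that must be active to run the body
        self.body = body            # body(now, nxt): may activate guards in the
                                    # current cycle (now) or the next one (nxt)

def reaction(procs, now):
    """One clock cycle: traverse the list once, from beginning to end."""
    nxt = set()
    for p in procs:
        if p.guard in now:          # execute only procedures with an active guard
            p.body(now, nxt)
    return nxt                      # guards activated for the next cycle

# Toy program: 'start' arms a one-cycle delay, which then emits 'O'.
procs = [
    GuardedProc("p_start", "start", lambda now, nxt: nxt.add("delay_done")),
    GuardedProc("p_emit", "delay_done", lambda now, nxt: now.add("O")),
]

cycle1 = {"start"}
cycle2 = reaction(procs, cycle1)    # cycle 1 arms 'delay_done' for cycle 2
reaction(procs, cycle2)             # cycle 2 adds the emitted 'O' to cycle2
```

Note how the fixed start-to-end traversal order stands in for a static schedule of the guarded procedures, which is exactly what makes the state encoding redundant compared with GRC.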
The EC compiler of Edwards [23] treats Esterel as having control-flow semantics (in the spirit of [25,27]) in order to take advantage of the initial program hierarchy and produce efficient, well-structured C code. The Esterel program is first translated into a CCFG representing the computation of a reaction. The translation makes the control flow explicit and encodes the state access operations using tests and assignments of integer variables. Its static scheduling algorithm takes advantage of the mutual exclusions between parts of the program and generates code that uses program counter variables instead of simple Boolean guards. The result is therefore faster than its Saxo-generated counterpart. However, it is usually slower than the GRC2C-generated code because the GRC representation preserves the state structure of the initial Esterel program and uses static analysis techniques to determine redundancies in the activation pattern. Thus, it is able to better simplify the final state representation and the CCFG.
8.3.4 Analysis/Verification/Test Generation: Benefits from Formal Approaches
We claimed that the introduction of well-chosen structuring primitives, endowed with formal mathematical semantics and interpretations as well-defined transformations in the realms of Mealy machines and synchronous circuits, was instrumental in allowing powerful analysis and synthesis techniques as part of the design of synchronous programs. What are they, and how do they appear in practice to enhance the confidence in the correctness of safety-critical embedded applications?
Maybe the most obvious is that synchronous formalisms can fully benefit from the model-checking and automatic verification usually associated with the netlist and Mealy machine representations, and now widely popular in the hardware design community with the PSL/SuGaR and assertion-based design approaches. Symbolic BDD- and SAT-based model-checking techniques are thus available on all S/R systems. Moreover, the structured syntax allows in many cases the introduction of modular approaches, or can guide abstraction techniques with the goal of reducing the complexity of analysis.
Formal methods akin to model-checking can also be used to automatically produce test sequences that seek to reach the best possible coverage in terms of visited states or exercised transitions. Here again, specific techniques were developed to match the S/R models.
² Such a process is necessary, because most Esterel statements pack together two distinct, and often disjoint, behaviors: one for the execution instants where they are started, and one for instants where control is resumed from inside.
Also, symbolic representations of the reachable state spaces (or abstracted over-approximations), which can effectively be produced and certified correct thanks to the formal semantics, can be used in the course of compilation and optimization. In particular for Esterel, the RSS computation allows more programs to be accepted as correct with respect to constructiveness: indeed, causal dependencies may vary in direction depending on the state. If all dependencies are put together regardless of the states, then a causality cycle may appear, while not all components of the cycle may be active at the same instant, so that no real cycle exists (but it takes a dynamic analysis to establish this). Similarly, the RSS may exhibit combinatorial relations between registers encoding the local states, so that register elimination is possible to further simplify the state space structure.
Finally, the domain-specific structuring primitives empowering dedicated programming can also be seen as an important criterion. Readable, easily understandable programs are a big step toward correct programs. And when issues of correctness are not so plain and easy, as for instance when regarding the proper scheduling of behaviors inside a reaction to respect causal effects, then powerful abstract hypotheses are defined in the S/R domain that define admissible orderings (and build them for correct programs).
A graphical version of Esterel, named SyncCharts for synchronous StateCharts, has been defined to provide a visual formalism with a truly synchronous semantics.
8.4 The Declarative Style: Lustre and Signal
Declarative formalisms implementing the synchronous hypothesis as defined in Section 8.2 can be cast into a model of computation (proposed in Reference 28) consisting of a domain of traces/behaviors and of a semilattice structure that renders the synchronous hypothesis using a timing equivalence relation: clock equivalence. Asynchrony can be superimposed on this model by considering a flow equivalence relation. Heterogeneous systems [29] can also be modeled by parameterizing the composition operator using arbitrary timing relations.
8.4.1 A Synchronous Model of Computation
We consider a partially ordered set of tags t to denote instants (which are seen, in the sense of Section 8.2.2, as symbolic periods in time during which one reaction takes place). The relation t1 ≤ t2 says that t1 occurs before t2. A minimum tag exists, denoted by 0. A totally ordered set of tags C is called a chain and denotes the sampling of a possibly continuous or dense signal over a countable series of causally related tags.
Events, signals, behaviors, and processes are defined as follows:
- An event e is a pair consisting of a value v and a tag t.
- A signal s is a function from a chain of tags to a set of values.
- A behavior b is a function from a set of names x to signals.
- A process p is a set of behaviors that have the same domain.
In the remainder, we write tags(s) for the tags of a signal s, vars(b) for the domain of b, b|X for the projection of a behavior b on a set of names X, and b/X for its complementary. Figure 8.6 depicts a behavior b over three signals named x, y, and z. Two frames depict timing domains formalized by chains of tags. Signals x and y belong to the same timing domain: x is a down-sampling of y. Its events are synchronous to odd occurrences of events along y and share the same tags, for example, t1. Even tags of y, for example, t2, are ordered along its chain, for example, t1 < t2, but absent from x. Signal z belongs to a different timing domain. Its tags, for example, t3, are not ordered with respect to the chain of y, for example, t1 ≰ t3 and t3 ≰ t1.
The synchronous composition of the processes p and q is denoted by p || q. It is defined by the union b ∪ c of all behaviors b (from p) and c (from q) that hold the same values at the same tags, b|I = c|I, for each signal x ∈ I = vars(b) ∩ vars(c) they share. Figure 8.7 depicts the synchronous composition, right, of the behavior b, left, and the behavior c, middle. The signal y, shared by b and c, carries the same tags and the same values in both b and c. Hence, b ∪ c defines the synchronous composition of b and c.
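The composability check above can be sketched concretely. In this minimal Python model (an illustration of the tagged model, not an implementation from the chapter), a signal is a dict from tags to values and a behavior a dict from names to signals; the concrete tags and values are invented for the example.

```python
# A minimal sketch of synchronous composition in the tagged model: b || c
# is defined only when b and c coincide (same tags, same values) on their
# shared signal names, in which case it is the union of the two behaviors.

def compose(b, c):
    """Return the union behavior, or None when b|I != c|I on shared names."""
    shared = set(b) & set(c)
    for x in shared:
        if b[x] != c[x]:            # same tags and same values required
            return None
    return {**b, **c}

b = {"x": {1: 0}, "y": {1: 0, 2: 1}}           # x down-samples y
c = {"y": {1: 0, 2: 1}, "z": {1: 5, 2: 6}}     # shares y with b
bc = compose(b, c)                              # union of b and c
bad = compose(b, {"y": {1: 9}})                 # disagrees on y: undefined
```

The `None` result plays the role of an empty composition: behaviors that disagree on a shared signal contribute nothing to p || q.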
FIGURE 8.6 A behavior (named b) over three signals (x, y, and z) belonging to two clock domains.
FIGURE 8.7 Synchronous composition of b ∈ p and c ∈ q.
FIGURE 8.8 Scheduling relations between simultaneous events.
FIGURE 8.9 Relating synchronous behaviors by stretching.
A scheduling structure is defined to schedule the occurrence of events along signals during an instant t. A scheduling is a preorder relation between dates x_t, where t represents the time and x the location of the event. Figure 8.8 depicts such a relation, superimposed on the signals x and y of Figure 8.6. The relation y_t1 → x_t1, for instance, requires y to be calculated before x at the instant t1. Naturally, scheduling is contained in time: if t < t′ then x_t →_b x_t′ for any x and b, and if x_t →_b x_t′ then t′ ≮ t.
A synchronous structure is defined by a semilattice structure to denote behaviors that have the same timing structure. The intuition behind this relation (depicted in Figure 8.9) is to consider a signal as an elastic with ordered marks on it (tags). If the elastic is stretched, marks remain in the same relative and partial order but have more space (time) between each other. The same holds for a set of elastics: a behavior. If elastics are equally stretched, the order between marks is unchanged. In Figure 8.9, the timescale of x and y changes but the partial timing and scheduling relations are preserved. Stretching is a partial-order relation which defines clock equivalence. Formally, a behavior c is a stretching of b of the same domain, written b ≤ c, if there exists an increasing bijection on tags f that preserves the timing and scheduling relations. If so, c is the image of b by f. Last, the behaviors b and c are called clock-equivalent, written b ~ c, iff there exists a behavior d such that d ≤ b and d ≤ c.
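The stretching relation can be made concrete under a simplifying assumption. The Python sketch below assumes numeric, totally ordered tags (a restriction of the general model, made for illustration): sorting all tags of b and of c and pairing them up yields the only candidate increasing bijection f, which must map every signal of b, value for value, onto the same signal of c.

```python
# A sketch of the stretching check b <= c, assuming numeric tags: build the
# candidate increasing bijection f from sorted tags of b onto sorted tags
# of c, then verify it carries each signal of b onto the same signal of c.

def is_stretching(b, c):
    if set(b) != set(c):                          # same domain of names
        return False
    tags_b = sorted({t for s in b.values() for t in s})
    tags_c = sorted({t for s in c.values() for t in s})
    if len(tags_b) != len(tags_c):
        return False
    f = dict(zip(tags_b, tags_c))                 # increasing bijection on tags
    return all({f[t]: v for t, v in b[x].items()} == c[x] for x in b)

b = {"x": {1: 0, 3: 1}, "y": {1: 0, 2: 1, 3: 2}}
c = {"x": {10: 0, 30: 1}, "y": {10: 0, 25: 1, 30: 2}}   # b, "stretched"
```

Here c keeps the marks of b in the same relative order, only with more space between them, so b ≤ c holds; permuting any values breaks the relation.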
8.4.2 Declarative Design Languages
The declarative design languages Lustre [9] and Signal [10] share the core syntax of Figure 8.10 and
can both be expressed within the synchronous model of computation of Section 8.4.1. In both languages,
FIGURE 8.10 A common syntactic core for Lustre and Signal.
FIGURE 8.11 The if-then-else condition in Lustre.
node counter (tick, reset: bool) returns (count: int);
let
  count = if true -> reset
          then 0
          else if tick then pre count + 1
          else pre count;
tel
FIGURE 8.12 A resettable counter in Lustre.
a process P is an infinite loop that consists of the synchronous composition P || Q of simultaneous equations x = y f z over signals named x, y, and z. Both Lustre and Signal support the restriction of a signal name x to a process P, noted P/x. The analogy stops here, as Lustre and Signal differ in fundamental ways. Lustre is a single-clocked programming language, while Signal is a multi-clocked (polychronous) specification formalism. This difference originates in the choice of different primitive combinators (named f in Figure 8.10) and results in orthogonal system design methodologies.
8.4.2.1 Combinators for Lustre
In a Lustre process, each equation processes the nth event of each input signal during the nth reaction (to possibly produce an output event). As it synchronizes upon availability of all inputs, the timing structure of a Lustre program is easily captured within a single clock domain: all input events are related to a master clock and the clock of the output signals is defined by sampling the master. There are three fundamental combinators in Lustre:
Delay. x = pre y initially leaves x undefined and then defines it by the previous value of y.
Followed-by. x = v -> z initially defines x by the value v, and then by z. The pre and -> operators are usually used together, as in x = v -> pre(y), to define a signal x initialized to v and defined by the previous value of y. Scade, the commercial version of Lustre, uses a one-bit analysis to check that each signal defined by a pre is effectively initialized by an ->.
Conditional. x = if b then y else z defines x by y if b is true and by z if b is false. It can be used without an alternative, x = if b then y, to sample y at the clock b, as shown in Figure 8.11.
Lustre programs are structured as data-flow functions, also called nodes. A node takes a number of input signals and defines a number of output signals upon the presence of an activation condition. If that condition matches an edge of the input signal clock, then the node is activated and possibly produces output. Otherwise, outputs are undetermined or defaulted. As an example, Figure 8.12 defines a resettable counter. It takes an input signal tick and returns the count of its occurrences. A boolean reset signal can be triggered to reset the count to 0. We observe that the boolean input signals tick and reset are synchronous to the output signal count and define a data-flow function.
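The single-clocked semantics of the counter can be illustrated by a stream-level simulation. The following Python sketch (an illustration of the semantics, not generated Lustre code) assumes boolean input streams of equal length: "true -> reset" is true at the first instant and equals reset afterward, and "pre count" is the value of count at the previous instant.

```python
# Stream-level sketch of the resettable counter of Figure 8.12: one output
# value is computed per reaction, from the current inputs and "pre count".

def counter(ticks, resets):
    out, pre_count = [], None              # "pre count" is undefined initially
    for n, (tick, reset) in enumerate(zip(ticks, resets)):
        if n == 0 or reset:                # true -> reset
            count = 0
        elif tick:
            count = pre_count + 1
        else:
            count = pre_count
        out.append(count)
        pre_count = count
    return out

trace = counter([True, True, False, True, True],
                [False, False, False, True, False])
# the count restarts at the fourth instant, where reset holds
```

Because every equation consumes one event of every input per reaction, the whole node runs on a single clock, exactly as described above.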
FIGURE 8.13 The delay operator in Signal.
FIGURE 8.14 The merge operator in Signal.
process counter = (? event tick, reset ! integer value)
(| value := (0 when reset)
default ((value$ init 0 + 1) when tick)
default (value$ init 0)
|);
FIGURE 8.15 A resettable counter in Signal.
8.4.2.2 Combinators for Signal
As opposed to nodes in Lustre, equations x := y f z in Signal more generally denote processes that define timing relations between input and output signals. There are three primitive combinators in Signal:
Delay. x := y$1 init v initially defines the signal x by the value v and then by the previous value of the signal y. The signal y and its delayed copy x := y$1 init v are synchronous: they share the same set of tags t1, t2, .... Initially (at t1), the signal x takes the declared value v. At tag tn, x takes the value of y at tag tn−1. This is displayed in Figure 8.13.
Sampling. x := y when z defines x by y when z is true (and both y and z are present); x is present with the value v2 at t2 only if y is present with v2 at t2 and if z is present at t2 with the value true. When this is the case, one needs to schedule the calculation of y and z before x, as depicted by y_t2 → x_t2 ← z_t2.
Merge. x := y default z defines x by y when y is present and by z otherwise. If y is absent and z present with v1 at t1 then x holds (t1, v1). If y is present (at t2 or t3) then x holds its value whether z is present (at t2) or not (at t3). This is depicted in Figure 8.14.
The structuring element of a Signal specification is a process. A process accepts input signals originating from possibly different clock domains to produce output signals when needed. Recalling the example of the resettable counter (Figure 8.12), this allows, for instance, to specify a counter (pictured in Figure 8.15) where the inputs tick and reset and the output value have independent clocks. The body of counter consists of one equation that defines the output signal value. Upon the event reset, it sets the count to 0. Otherwise, upon a tick event, it increments the count by referring to the previous value of value and adding 1 to it. Otherwise, if the count is solicited in the context of the counter process (meaning that its clock is active), the counter just returns the previous count without having to obtain a value from the tick and reset signals.
A Signal process is a structuring element akin to a hierarchical block diagram. A process may structurally contain sub-processes. A process is a generic structuring element that can be specialized to the timing context of its call. For instance, a definition of the Lustre counter (Figure 8.12) starting from the specification of Figure 8.15 consists of the refinement depicted in Figure 8.16. The input tick and reset clocks expected by the process counter are sampled from the boolean input signals tick and reset
process synccounter = (? boolean tick, reset ! integer value)
  (| value := counter (when tick, when reset)
   | reset ^= tick ^= value
   |);
FIGURE 8.16 Synchronization of the counter interface.
FIGURE 8.17 The syntax of clock expressions and clock relations (equations).
FIGURE 8.18 The clock inference system of Signal.
by using the when tick and when reset expressions. The count is then synchronized to the inputs by the equation reset ^= tick ^= count.
8.4.3 Compilation of Declarative Formalisms
The analysis and code generation techniques of Lustre and Signal are necessarily different, tailored to handle the specific challenges posed by their different models of computation and programming paradigms.
8.4.3.1 Compilation of Signal
Sequential code generation starting from a Signal specification begins with an analysis of its implicit synchronization and scheduling relations. This analysis yields the control- and data-flow graphs that define the class of sequentially executable specifications and allow code to be generated.
8.4.3.1.1 Synchronization and Scheduling Analysis
In Signal, the clock ^x of a signal x denotes the set of instants at which the signal x is present. It is represented by a signal that is true when x is present and that is absent otherwise. Clock expressions (see Figure 8.17) represent control. The clock when x (respectively when not x) represents the time tags at which a boolean signal x is present and true (respectively false). The empty clock is denoted by 0. Clock expressions are obtained using conjunction, disjunction, and symmetric difference over other clocks. Clock equations (also called clock relations) are Signal processes: the equation e ^= e′ synchronizes the clock expressions e and e′. An ω-regular expression of the form U V^ω denotes an infinite repetition of the expression V after a finite prefix U. These are used for specifying properties of systems that do not terminate.
10.3 Languages for Hardware Verification
In this section, we focus on verification languages for hardware RTL designs. (Languages for system-level hardware designs are covered in Section 10.5.)
10.3.1 HDLs and Interfaces to Programming Languages
Register Transfer Level (RTL) designs are typically implemented in standard HDLs, such as Verilog and VHDL. However, it is not practical to implement simulation testbenches in HDLs alone. Indeed, the testbench can be purely behavioral, that is, it need not be synthesizable into hardware. Furthermore, it can be implemented at higher levels than RTL. Historically, a popular testbench development approach has been to implement some of its parts in a software programming language, such as C/C++ or Perl. These parts are integrated with HDL simulators through standard programming language interfaces. Unfortunately, this approach not only slows down the simulator, but it also requires significant development effort, such as defining new data types (e.g., a 128-bit bus), handling concurrency, dynamic memory objects, etc. [7].
For property specification, VHDL has some support for static assertions, with different severity levels. Although Verilog lacks such explicit constructs, it is straightforward to use the if and $display constructs to implement a similar effect.
Example 10.1 Suppose we need to check that two signals A and B cannot be high at the same time. The fragment below shows an assertion template and an instance in VHDL. The keyword assert specifies a property that must hold during simulation.

[label] assert expression
  [report message]
  [severity level]

assert (A nand B)
  report "error: A & B cannot both be 1"
  severity error;

A similar effect can be achieved by using Verilog's if/$display combination, which specifies the undesired situation, as shown below:

always @(A or B) begin
  if (A & B) begin
    $display("error: A = B = 1");
    $finish; // end simulation
  end
end
Verification Languages 10-7
10.3.2 Open Verification Library
Though simple assertions are quite useful, they do not provide a practical way for specifying properties. Temporal properties can be specified using checkers or monitors in HDLs, but this involves a significant development effort. Therefore, there has been a great deal of interest in developing a library of reusable monitors. An example is the Open Verification Library (OVL), available from Accellera [3]. It is not a standalone language, but a set of modules that can be used to check common temporal specifications within Verilog or VHDL design descriptions.
Example 10.2 As an example [17], consider the following PCI Local Bus Specification requirement: to prevent AD, C/BE#, and PAR signals from floating during reset, the central resource may drive these lines during reset (bus parking) but only to a logic low level; they may not be driven high.
Suppose ad, cbe_, par, rst_ are the Verilog signal names corresponding to AD, C/BE#, PAR, and RST#, respectively.²
The given property can be specified within a Verilog implementation of OVL as follows:

assert_always master_reset (clk, !rst_, !(|{ad, cbe_, par}));

Here, assert_always is the name of the library monitor, and master_reset is an assertion instance. On every rising edge of the clk signal, whenever !rst_ is high, the monitor asserts that the last parameter should evaluate to true. Here the last parameter is the negation (!) of the bitwise-or (|) over the given bits. Note that this is a simple safety property, which is expected to always hold during simulation.
Example 10.3 Consider another example [17] from the PCI Local Bus Specification: the assertion of IRDY# and deassertion of FRAME# should occur as soon as possible after STOP# is asserted, preferably within one to three cycles.
For simplicity, consider the FRAME# and STOP# signals only, that is, check whether frame_ will be de-asserted within one to three clock cycles after stop_ is asserted, as shown below:

assert_frame #(1, 1, 3) check_frame_da (clk, true, !stop_, frame_);

Again, check_frame_da is an instance of the assert_frame module defined in the library. Three optional parameters #(1, 1, 3) are used, corresponding to the severity level, minimum number of clocks, and maximum number of clocks, respectively. The monitor will check whether frame_ goes high within one to three clock cycles after !stop_. Here, the severity level is set to 1 to continue the simulation even if the assertion is violated, and the reset parameter is set to true, as an example of where it is not needed.
The Open Verification Library has many advantages. First, it can be used with any Verilog, VHDL, or mixed simulator, with no need for additional verification tools. Second, it is open, that is, the library can be modified easily, for example, for assessing functional coverage [17]. Another useful feature of OVL is that it does not slow down simulation, primarily because it is hard to specify very complex assertions. Unfortunately, OVL is used mainly for checking safety properties during simulation, and is not very useful for checking liveness or for formal verification. In some sense, OVL provides a transition from a traditional simulation-based methodology to an assertion-based methodology.
10.3.3 Temporal e
Verisity's e language is an advanced verification language that is intended to cover many verification aspects. It has recently been chosen by the IEEE DASC [4] for standardization as a verification language. Like many high-level languages, it has constructs for Object-Oriented Programming, such as class definitions, inheritance, and polymorphism [7]. It also provides elements of Aspect-Oriented Programming (AOP). AOP allows modifying the functionality of the environment without duplicating or modifying the original code, in a manner more advanced than simple inheritance (see References 7 and 24 for more details).
² In the PCI Local Bus Specification, signal names ending with # indicate that the signal is active low. In the examples, we use _ for the same purpose.
As a testbench language, e provides many constructs related to stimuli generation, such as specification of input constraints and facilities for data packing, as well as for assessing simulation coverage. It also provides support for property specification, and has been used widely both in simulation-based and formal verification.
Example 10.4 As an example of stimuli generation, suppose we have a struct type³ frame for modeling an Ethernet frame, with one of the data fields defined as %payload. The type of payload can be defined in e as follows [24]:

struct payload {
  %id : byte;
  %data : list of byte;
  keep soft data.size() in [45..1499];
}
In this example, the % character in front of the field name means that the corresponding field is physical and represents data to be sent to the DUV. The keep soft keywords are used for bounding the values of the variable, the size of the data field in this case. It also allows specification of weighted ranges or constraints. In the example, the size will be varied automatically within the given range. (Using the ! character along with % would have indicated that the field would not be generated automatically.)
Typically, a user-defined function is used for driving stimuli to the DUV. For example, suppose my_frame is an instance of the struct frame. The following e code can be used to input the frame data serially into the DUV:
Example 10.5

var bitList: list of bit;
bitList = pack(packing.low, my_frame);
for each (b) in bitList {
  testbench.duv.transmit_stream0 = b;
  wait cycle;
};

In this example, the keyword pack provides the mechanism to pack all data fields into a single list of bits, which is then fed serially to the Verilog signal testbench.duv.transmit_stream0. After each bit transfer, the function waits for one clock, denoted by the wait cycle keywords.
Support for the specification of temporal properties is provided in e through the use of Temporal Expressions (TEs). A TE is defined as a combination of events and temporal operators. The language also supports the keyword sync, which is used as a point of synchronization for TEs.
Example 10.6 Returning to the PCI specifications, consider the following requirement, and its corresponding specification in e: Once a master has asserted IRDY#, it cannot change IRDY# or FRAME# until the current data phase completes regardless of the state of TRDY#.
expect @new_irdy => {
  [..]*((not @irdy_rise) and (not change(frame_)));
  @data_phase_complete} @sys.pci_clk;
else
  dut_error("Error, IRDY# or FRAME# changed",
            "before current data phase completed.");
Here, suppose that the events (shown as @event) have been defined already. The shown expression specifies that whenever IRDY# is asserted (@new_irdy), de-assertion of IRDY# (@irdy_rise) or a change in FRAME# should not occur, until the data phase completes (@data_phase_complete). The use of
³ A struct type basically corresponds to a class type in C++, that is, it allows method definitions along with data definitions. Since it is conceptually similar to other object-oriented languages, we omit the actual syntax.
@sys.pci_clk denotes that the event pci_clk is used for sampling signals in evaluating the given TE. This
feature is also useful for verifying multiple clocked designs.
10.3.4 OpenVera and OVA
OpenVera from Synopsys is another testbench language, similar to e in terms of functionality and similar to C++ in terms of syntax. Since conceptually OpenVera is very similar to e, we do not include testbench examples for OpenVera here. It has similar constructs for coverage, random stimuli generation, data packing, etc.
OpenVera Assertions (OVA) is a standalone language, which is also part of the OpenVera suite [25]. OpenVera comes with a checker library (OVA IP), which is similar to OVL. OVA and OpenVera also have event definitions, repetition operators (
∑_{i=1}^{n} C_i / T_i ≤ n(2^{1/n} − 1)        (12.1)

where C_i and T_i represent the worst-case computation time and the period of task i, respectively.
The quantity

U = ∑_{i=1}^{n} C_i / T_i

represents the processor utilization factor and denotes the fraction of time used by the processor to execute the entire task set. Table 12.1 shows the values of n(2^{1/n} − 1) for n from 1 to 10. As can be seen, the factor decreases with n and, for large n, it tends to the following limit value:

lim_{n→∞} n(2^{1/n} − 1) = ln 2 ≈ 0.69
TABLE 12.1 Maximum Processor Utilization for the RM Algorithm

 n    U_lub
 1    1.000
 2    0.828
 3    0.780
 4    0.757
 5    0.743
 6    0.735
 7    0.729
 8    0.724
 9    0.721
10    0.718
Real-Time Operating Systems 12-5
We note that the test by Liu and Layland only gives a sufficient condition for guaranteeing a feasible
schedule under the RM algorithm. Hence, a task set can be schedulable by RM even though the utilization
condition is not satisfied. Nevertheless, we can certainly state that a periodic task set cannot be feasibly
scheduled by any algorithm if U > 1. A statistical study carried out by Lehoczky et al. [6] on randomly
generated task sets showed that the utilization bound of the RM algorithm has an average value of 0.88,
and becomes 1 for periodic tasks with harmonic period relations. Necessary and sufficient schedulability
tests for RM have been proposed [6,10,11,29], but they have pseudo-polynomial complexity. Recently,
Bini and Buttazzo derived a sufficient polynomial-time test, the Hyperbolic Bound [28], capable of
accepting more tasks than the Liu and Layland test. In spite of the limitation on the schedulability bound,
which in most cases prevents full processor utilization, the RM algorithm is widely used in real-time
applications, mainly for its simplicity. At the same time, being a static scheduling algorithm, it can be
easily implemented on top of commercial operating systems, using a set of fixed priority levels. Moreover,
in overload conditions, the highest-priority tasks are less prone to missing their deadlines. For all these
reasons, the Software Engineering Institute of Pittsburgh has prepared a user guide for the design
and analysis of real-time systems based on the RM algorithm [7]. Since the RM algorithm is optimal
among all fixed-priority assignments, the schedulability bound can only be improved through a dynamic
priority assignment.
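The two sufficient tests discussed above can be sketched in a few lines (an illustrative reimplementation, not code from the handbook; each task set is a list of (C_i, T_i) pairs):

```python
def liu_layland_test(tasks):
    """Sufficient RM test (Eq. 12.1): sum of C_i/T_i <= n(2^(1/n) - 1)."""
    n = len(tasks)
    utilization = sum(c / t for c, t in tasks)
    return utilization <= n * (2 ** (1 / n) - 1)

def hyperbolic_bound_test(tasks):
    """Bini-Buttazzo sufficient RM test [28]: product of (C_i/T_i + 1) <= 2."""
    product = 1.0
    for c, t in tasks:
        product *= c / t + 1
    return product <= 2.0

# Both tests accept this set (U = 0.75 <= 0.828):
print(liu_layland_test([(1, 4), (3, 6)]))       # True
print(hyperbolic_bound_test([(1, 4), (3, 6)]))  # True
# The hyperbolic bound accepts sets that the utilization bound rejects:
print(liu_layland_test([(1, 2), (1, 3)]))       # False (U = 0.833 > 0.828)
print(hyperbolic_bound_test([(1, 2), (1, 3)]))  # True  (1.5 * 4/3 = 2)
```

The last task set illustrates why the Hyperbolic Bound is less pessimistic: its utilization exceeds the Liu and Layland bound for n = 2, yet the product form still certifies it.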
12.2.3 Earliest Deadline First
The earliest deadline first (EDF) algorithm entails selecting (among the ready tasks) the task with the
earliest absolute deadline. The EDF algorithm is typically preemptive, in the sense that a newly arrived
task can preempt the running task if its absolute deadline is earlier. If the operating system does not
support explicit timing constraints, EDF (like RM) can be implemented on a priority-based kernel, where
priorities are dynamically assigned to tasks. A task receives the highest priority if its deadline is the
earliest among those of the ready tasks, and the lowest priority if its deadline is the latest one; that is,
a task gets a priority that is inversely proportional to its absolute deadline. The EDF algorithm is
more general than RM, since it can be used to schedule both periodic and aperiodic task sets, because
the selection of a task is based on the value of its absolute deadline, which can be defined for both types
of tasks. Typically, a periodic task that has completed its execution is suspended by the kernel until its next
release, coincident with the end of the current period. Dertouzos [8] showed that EDF is optimal among
all online algorithms, while Liu and Layland [5] proved that a set Γ = {τ_1, . . . , τ_n} of n periodic tasks is
schedulable by EDF if and only if

\sum_{i=1}^{n} \frac{C_i}{T_i} \le 1

It is worth noting that the EDF schedulability condition is necessary and sufficient to guarantee
a feasible schedule. This means that, if it is not satisfied, no algorithm is able to produce a feasible
schedule for that task set.
The dynamic priority assignment allows EDF to fully exploit the processor, reaching up to 100%
utilization. When the utilization factor is less than one, the residual fraction of time can be efficiently
used to handle aperiodic requests activated by external events. In addition, compared with RM, EDF
generates a lower number of context switches, thus causing less runtime overhead. On the other hand,
RM is simpler to implement on a fixed-priority kernel and is more predictable in overload situations,
because higher-priority tasks are less likely to miss their deadlines.
12.2.4 Tasks with Deadlines Less than Periods
Using RM or EDF, a periodic task can be executed at any time during its period. The only guarantee
provided by the schedulability test is that each task will be able to complete its execution before the next
release time. In some real-time applications, however, some periodic tasks need to complete
within an interval shorter than their period. The deadline monotonic (DM) algorithm, proposed by Leung and
Whitehead [9], extends RM to handle tasks with a relative deadline less than or equal to their period.
According to DM, at each instant the processor is assigned to the task with the shortest relative deadline.
In priority-based kernels, this is equivalent to assigning each task a priority P_i inversely proportional
to its relative deadline D_i. With D_i fixed for each task, DM is classified as a static scheduling algorithm.
In recent years, several authors [6,10,11] independently proposed a necessary and sufficient test to
verify the schedulability of a periodic task set. For example, the method proposed by Audsley et al. [10]
involves computing the worst-case response time R_i of each periodic task. It is derived by summing its
computation time and the interference caused by tasks with higher priority:

R_i = C_i + \sum_{k \in hp(i)} \left\lceil \frac{R_i}{T_k} \right\rceil C_k \qquad (12.2)

where hp(i) denotes the set of tasks having priority higher than task i and \lceil x \rceil denotes the ceiling of
a rational number, that is, the smallest integer greater than or equal to x. The equation above can be solved
by an iterative approach, starting with R_i(0) = C_i and terminating when R_i(s) = R_i(s-1). If R_i(s) > D_i
for some task, then the task set cannot be feasibly scheduled by DM.

Under EDF, the schedulability analysis for periodic task sets with deadlines less than periods is based
on the processor demand criterion, proposed by Baruah et al. [12]. According to this method, a task set
is schedulable by EDF if and only if, in every interval of length L (starting at time 0), the overall
computational demand is no greater than the available processing time, that is, if and only if

\forall L > 0, \quad \sum_{i=1}^{n} \left\lfloor \frac{L + T_i - D_i}{T_i} \right\rfloor C_i \le L \qquad (12.3)

This test is feasible, because L needs to be checked only for values equal to task deadlines no larger than
the least common multiple of the periods. A detailed analysis of EDF has been presented by Stankovic,
Ramamritham, Spuri, and Buttazzo [30] under several workload conditions.
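Both tests can be sketched compactly (an illustrative reimplementation, not code from the chapter; tasks are (C_i, T_i, D_i) triples with integer parameters and D_i <= T_i):

```python
import math

def response_time(tasks, i, limit):
    """Iterate Eq. (12.2): R_i = C_i + sum over hp(i) of ceil(R_i/T_k)*C_k.
    tasks must be sorted by decreasing priority; returns None past limit."""
    c_i = tasks[i][0]
    r = c_i
    while True:
        r_next = c_i + sum(math.ceil(r / t) * c for c, t, _ in tasks[:i])
        if r_next == r:
            return r if r <= limit else None  # fixed point reached
        if r_next > limit:
            return None                       # response time exceeds deadline
        r = r_next

def dm_schedulable(tasks):
    """DM: priorities ordered by relative deadline; require R_i <= D_i."""
    tasks = sorted(tasks, key=lambda task: task[2])
    return all(response_time(tasks, i, d) is not None
               for i, (_, _, d) in enumerate(tasks))

def edf_demand_schedulable(tasks):
    """Processor demand criterion, Eq. (12.3): check every absolute deadline
    up to the hyperperiod (valid when total utilization is at most one)."""
    if sum(c / t for c, t, _ in tasks) > 1:
        return False
    hyper = 1
    for _, t, _ in tasks:
        hyper = hyper * t // math.gcd(hyper, t)   # lcm of the periods
    deadlines = sorted({d + k * t for _, t, d in tasks for k in range(hyper // t)})
    return all(sum((length + t - d) // t * c for c, t, d in tasks) <= length
               for length in deadlines)

print(dm_schedulable([(1, 4, 3), (2, 6, 5)]))          # True
print(edf_demand_schedulable([(1, 4, 3), (2, 6, 5)]))  # True
```

For the example set, the DM iteration converges to R_1 = 1 and R_2 = 3, both within their deadlines, and the demand at every absolute deadline in [0, 12] stays below the interval length.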
12.3 Aperiodic Task Handling
Although in a real-time system most acquisition and control tasks are periodic, there exist computational
activities that must be executed only at the occurrence of external events (typically signaled through
interrupts), which may arrive at irregular times. When the system must handle aperiodic requests of
computation, we have to balance two conflicting interests: on the one hand, we would like to serve an event
as soon as possible to improve system responsiveness; on the other hand, we do not want to jeopardize the
schedulability of periodic tasks. If aperiodic activities are less critical than periodic tasks, then the objective
of a scheduling algorithm should be to minimize their response time, while guaranteeing that all periodic
tasks (although being delayed by the aperiodic service) complete their executions within their deadlines.
If some aperiodic task has a hard deadline, we should try to guarantee its timely completion offline. Such a
guarantee can be given only by assuming that aperiodic requests, although arriving at irregular intervals,
do not exceed a maximum given frequency, that is, they are separated by a minimum interarrival time.
An aperiodic task characterized by a minimum interarrival time is called a sporadic task. Let us consider
an example in which an aperiodic job J_a of 3 units of time must be scheduled by RM along with two
periodic tasks, having computation times C_1 = 1, C_2 = 3 and periods T_1 = 4, T_2 = 6, respectively.
As shown in Figure 12.2, if the aperiodic request is serviced immediately (i.e., with a priority higher than
that assigned to the periodic tasks), then task τ_2 will miss its deadline.
The simplest technique for managing aperiodic activities while preserving the guarantee for periodic
tasks is to schedule them in background. This means that an aperiodic task executes only when the
[Figure: timelines of τ_1, τ_2, and J_a over the interval [0, 12]; the deadline miss of τ_2 is marked.]
FIGURE 12.2 Immediate service of an aperiodic task. Periodic tasks are scheduled by RM.
[Figure: timelines of τ_1, τ_2, and J_a over the interval [0, 12]; J_a executes in the idle slots.]
FIGURE 12.3 Background service of an aperiodic task. Periodic tasks are scheduled by RM.
processor is not busy with periodic tasks. The disadvantage of this solution is that, if the computational
load due to periodic tasks is high, the residual time left for aperiodic execution can be insufficient for
satisfying their deadlines. Considering the same task set as before, Figure 12.3 illustrates how job J_a is
handled by a background service.
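The two figures can be reproduced with a small discrete-time simulation (an illustrative sketch, not from the handbook; it assumes J_a arrives at time 0, unit-length time slots, and deadlines equal to periods):

```python
def simulate(horizon, tasks, job_units, immediate):
    """tasks: (C, T) pairs in RM order (shorter period = higher priority).
    immediate=True serves the aperiodic job J_a at the highest priority;
    immediate=False serves it in background (idle slots only).
    Returns (deadline_missed, completion_time_of_J_a or None)."""
    remaining = [0] * len(tasks)
    job_left, finish, missed = job_units, None, False
    for t in range(horizon):
        for i, (c, period) in enumerate(tasks):
            if t % period == 0:           # new periodic release
                if remaining[i] > 0:      # previous instance still unfinished
                    missed = True
                remaining[i] = c
        if immediate and job_left > 0:    # J_a preempts everything
            job_left -= 1
        else:
            for i, _ in enumerate(tasks):
                if remaining[i] > 0:      # run highest-priority pending task
                    remaining[i] -= 1
                    break
            else:
                if job_left > 0:          # idle slot: background service
                    job_left -= 1
        if job_left == 0 and finish is None:
            finish = t + 1
    return missed, finish

# Figure 12.2: immediate service makes tau_2 miss its deadline at t = 6.
print(simulate(12, [(1, 4), (3, 6)], 3, immediate=True))    # (True, 3)
# Figure 12.3: background service preserves all periodic deadlines,
# but J_a completes only at t = 12.
print(simulate(12, [(1, 4), (3, 6)], 3, immediate=False))   # (False, 12)
```

The simulation makes the trade-off concrete: immediate service minimizes the response time of J_a at the cost of a periodic deadline miss, while background service protects the periodic tasks but quadruples the aperiodic response time.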
The response time of aperiodic tasks can be improved by handling them through a periodic server
dedicated to their execution. Like any other periodic task, a server is characterized by a period T_s and
an execution time C_s, called the server capacity (or budget). In general, the server is scheduled using the
algorithm adopted for the periodic tasks and, once activated, it starts serving the pending aperiodic requests
within the limit of its current capacity. The order of service of the aperiodic requests is independent
of the scheduling algorithm used for the periodic tasks, and it can be a function of the arrival time,
computation time, or deadline. Over the years, several aperiodic service algorithms have been
proposed in the real-time literature, differing in performance and complexity. Among the fixed-priority
algorithms we mention the Polling Server, the Deferrable Server [13,14], the Sporadic Server [15], and the
Slack Stealer [16]. Among the servers using dynamic priorities (which are more efficient on average),
we recall the Dynamic Sporadic Server [17,18], the Total Bandwidth Server [19], the Tunable Bandwidth
Server [20], and the Constant Bandwidth Server [21]. In order to clarify the idea behind an aperiodic
server, Figure 12.4 illustrates the schedule produced, under EDF, by a Dynamic Deferrable Server with
capacity C_s = 1 and period T_s = 4. We note that, when the absolute deadline of the server is equal to the
[Figure: timelines of τ_1, τ_2, J_a, and the server budget C_s over the interval [0, 12].]
FIGURE 12.4 Aperiodic service performed by a Dynamic Deferrable Server. Periodic tasks, including the server, are scheduled by EDF. C_s is the remaining budget available for J_a.
[Figure: timelines of τ_1, τ_2, and J_a over the interval [0, 12].]
FIGURE 12.5 Optimal aperiodic service under EDF.
one of a periodic task, priority is given to the server in order to enhance aperiodic responsiveness. We also
observe that the same task set would not be schedulable under a fixed-priority system.
Although the response time achieved by a server is less than that achieved through the background
service, it is not the minimum possible. The minimum response time can be obtained with an optimal
server (TB*) that assigns each aperiodic request the earliest possible deadline that still produces a feasible
EDF schedule [20]. The schedule generated by the optimal TB* server is illustrated in Figure 12.5.

\forall i, \ 1 \le i \le n: \quad \sum_{k \in hp(i)} \frac{C_k}{T_k} + \frac{C_i + B_i}{T_i} \le i(2^{1/i} - 1) \qquad (12.4)
where hp(i) denotes the set of tasks with priority higher than τ_i. The same test is valid for both the protocols
described above, the only difference being the amount of blocking that each task may experience.
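The test can be sketched as follows (an illustration, not code from the chapter; it assumes the standard form in which the blocking term B_i is added to task i's own utilization, as in Sha, Rajkumar, and Lehoczky [23]; tasks are (C_i, T_i, B_i) triples):

```python
def rm_blocking_test(tasks):
    """Sufficient fixed-priority test with blocking (Eq. 12.4):
    for every task i (in RM order, shortest period first),
    sum_{k in hp(i)} C_k/T_k + (C_i + B_i)/T_i <= i * (2^(1/i) - 1)."""
    tasks = sorted(tasks, key=lambda task: task[1])   # RM priority order
    for i, (c, t, b) in enumerate(tasks, start=1):
        u_hp = sum(ck / tk for ck, tk, _ in tasks[:i - 1])
        if u_hp + (c + b) / t > i * (2 ** (1 / i) - 1):
            return False
    return True

# One unit of blocking on the highest-priority task is tolerated here:
print(rm_blocking_test([(1, 4, 1), (3, 6, 0)]))   # True
# Without blocking, utilization 0.5 + 0.5 already exceeds the bound 0.828:
print(rm_blocking_test([(2, 4, 0), (3, 6, 0)]))   # False
```

Note that the bound on the right-hand side is the per-level Liu and Layland factor i(2^{1/i} - 1), so the test reduces to Eq. (12.1) when all blocking terms are zero.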
12.5 New Applications and Trends
In recent years, real-time system technology has been applied to several application domains where
computational activities have less stringent timing constraints and occasional deadline misses are typically
tolerated. Examples of such systems include monitoring, multimedia systems, flight simulators, and, in
general, virtual reality games. In such applications, missing a deadline does not cause catastrophic effects
on the system, but just a performance degradation. Hence, instead of requiring an absolute guarantee for
the feasibility of the schedule, such systems demand an acceptable quality of service (QoS). It is worth
observing that, since some timing constraints need to be handled anyway (although not critical), a non-real-
time operating system, such as Linux or Windows, is not appropriate: first of all, such systems do not provide
temporal isolation among tasks, so a sporadic peak load on a task may negatively affect the execution
of other tasks in the system. Furthermore, the lack of concurrency control mechanisms that prevent
priority inversion makes these systems unsuitable for guaranteeing a desired QoS level. On the other
hand, a hard real-time approach is also not well suited for supporting such applications, because resources
would be wasted due to static allocation mechanisms and pessimistic design assumptions. Moreover, in
many multimedia applications, tasks are characterized by highly variable execution times (consider, for
instance, an MPEG player), so providing precise estimates of task computation times is practically
impossible, unless one uses overly pessimistic figures. In order to provide efficient as well as predictable
support for this type of real-time application, several new approaches and scheduling methodologies
have been proposed. They increase the flexibility and the adaptability of a system to online variations.
For example, temporal protection mechanisms have been proposed to isolate task overruns and reduce
reciprocal task interference [21,24]. Statistical analysis techniques have been introduced to provide a
probabilistic guarantee aimed at improving system efficiency [21]. Other techniques have been devised to
handle transient and permanent overload conditions in a controlled fashion, thus increasing the average
computational load in the system. One method absorbs the overload by regularly aborting some jobs
of a periodic task, without exceeding a maximum limit specified by the user through a QoS parameter
describing the minimum number of jobs between two consecutive abortions [25,26]. Another technique
handles overloads through a suitable variation of periods, managed so as to decrease the processor utilization
to a desired level [27].
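As an illustration of the last approach, the simplest variant of period adaptation rescales all periods by a common factor until the utilization drops to the desired level (a toy sketch of ours; the elastic model of [27] instead distributes the compression according to per-task elastic coefficients):

```python
def compress_periods(tasks, u_desired):
    """Enlarge periods uniformly so total utilization becomes u_desired.
    tasks: (C_i, T_i) pairs; returns a new list of (C_i, T_i') pairs."""
    u = sum(c / t for c, t in tasks)
    if u <= u_desired:
        return list(tasks)          # no overload: leave periods unchanged
    scale = u / u_desired           # > 1, stretches every period
    return [(c, t * scale) for c, t in tasks]

# An overloaded set (U = 0.5 + 0.6 = 1.1) compressed to U = 0.8:
new_tasks = compress_periods([(2, 4), (3, 5)], 0.8)
print(sum(c / t for c, t in new_tasks))   # 0.8, up to rounding
```

Because each new utilization is C_i / (T_i * U / U_d) = (C_i / T_i) * U_d / U, the relative weights of the tasks are preserved and the total drops exactly to the desired value.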
12.6 Conclusions
This chapter surveyed some kernel methodologies aimed at enhancing the efficiency and the predictability
of real-time control applications. In particular, it presented some scheduling algorithms
and analysis techniques for periodic and aperiodic task sets. Two concurrency control protocols have
been described for accessing shared resources in mutual exclusion while avoiding the priority inversion
phenomenon. Each technique has the property of being analyzable, so that an offline guarantee can be
provided for the feasibility of the schedule within the timing constraints imposed by the application. For
soft real-time systems, such as multimedia systems or simulators, the hard real-time approach can be too
rigid and inefficient, especially when the application tasks have highly variable computation times. In
these cases, novel methodologies have been introduced to improve average resource exploitation. They
are also able to guarantee a desired QoS level and to control performance degradation during overload
conditions. In addition to research efforts aimed at providing solutions to more complex problems,
a concrete increase in the reliability of future real-time systems can only be achieved if the mature
methodologies are actually integrated into next-generation operating systems and languages, defining new
standards for the development of real-time applications. At the same time, programmers and software
engineers need to be educated about the appropriate use of the available technologies.
References
[1] J. Stankovic, Misconceptions About Real-Time Computing: A Serious Problem for Next-Generation Systems. IEEE Computer, 21(10), 10–19, 1988.
[2] J. Stankovic and K. Ramamritham, Tutorial on Hard Real-Time Systems, IEEE Computer Society
Press, Washington, 1988.
[3] G.C. Buttazzo, Hard Real-Time Computing Systems: Predictable Scheduling Algorithms and
Applications, Kluwer Academic Publishers, Boston, MA, 1997.
[4] J. Stankovic, M. Spuri, M. Di Natale, and G. Buttazzo, Implications of Classical Scheduling Results
for Real-Time Systems. IEEE Computer, 28, 16–25, 1995.
[5] C.L. Liu and J.W. Layland, Scheduling Algorithms for Multiprogramming in a Hard Real-Time
Environment. Journal of the ACM, 20, 46–61, 1973.
[6] J.P. Lehoczky, L. Sha, and Y. Ding, The Rate-Monotonic Scheduling Algorithm: Exact Character-
ization and Average Case Behaviour. In Proceedings of the IEEE Real-Time Systems Symposium,
pp. 166–171, 1989.
[7] M.H. Klein et al., A Practitioner's Handbook for Real-Time Analysis: Guide to Rate Monotonic
Analysis for Real-Time Systems, Kluwer Academic Publishers, Boston, MA, 1993.
[8] M.L. Dertouzos, Control Robotics: The Procedural Control of Physical Processes. Information
Processing, Vol. 74, North-Holland, Amsterdam, 1974.
[9] J. Leung and J. Whitehead, On the Complexity of Fixed Priority Scheduling of Periodic Real-Time
Tasks. Performance Evaluation, 2, 237–250, 1982.
[10] N.C. Audsley, A. Burns, M. Richardson, K. Tindell, and A. Wellings, Applying New Scheduling
Theory to Static Priority Preemptive Scheduling. Software Engineering Journal, 8, 284–292, 1993.
[11] M. Joseph and P. Pandya, Finding Response Times in a Real-Time System. The Computer Journal,
29, 390–395, 1986.
[12] S.K. Baruah, R.R. Howell, and L.E. Rosier, Algorithms and Complexity Concerning the Preemptive
Scheduling of Periodic Real-Time Tasks on One Processor. Real-Time Systems, 2, 301–324, 1990.
[13] J.P. Lehoczky, L. Sha, and J.K. Strosnider, Enhanced Aperiodic Responsiveness in Hard Real-Time
Environments. In Proceedings of the IEEE Real-Time Systems Symposium, pp. 261–270, 1987.
[14] J.K. Strosnider, J.P. Lehoczky, and L. Sha, The Deferrable Server Algorithm for Enhanced Aperiodic
Responsiveness in Hard Real-Time Environments. IEEE Transactions on Computers, 44, 1995.
[15] B. Sprunt, L. Sha, and J. Lehoczky, Aperiodic Task Scheduling for Hard Real-Time Systems. Journal
of Real-Time Systems, 1, 27–60, 1989.
[16] J.P. Lehoczky and S. Ramos-Thuel, An Optimal Algorithm for Scheduling Soft-Aperiodic
Tasks in Fixed-Priority Preemptive Systems. In Proceedings of the IEEE Real-Time Systems
Symposium, 1992.
[17] T.M. Ghazalie and T.P. Baker, Aperiodic Servers in a Deadline Scheduling Environment. Real-Time
Systems, 9(1), 31–67, 1995.
[18] M. Spuri and G.C. Buttazzo, Efficient Aperiodic Service under Earliest Deadline Scheduling.
In Proceedings of the IEEE Real-Time Systems Symposium, San Juan, PR, December 1994.
[19] M. Spuri and G. Buttazzo, Scheduling Aperiodic Tasks in Dynamic Priority Systems. Real-Time
Systems, 10(2), 179–210, 1996.
[20] G. Buttazzo and F. Sensini, Optimal Deadline Assignment for Scheduling Soft Aperiodic Tasks in
Hard Real-Time Environments. IEEE Transactions on Computers, 48(10), 1035–1052, 1999.
[21] L. Abeni and G. Buttazzo, Integrating Multimedia Applications in Hard Real-Time Systems.
In Proceedings of the IEEE Real-Time Systems Symposium, Madrid, Spain, December 1998.
[22] R. Rajkumar, Synchronization in Real-Time Systems: A Priority Inheritance Approach, Kluwer Academic Publishers,
Boston, MA, 1991.
[23] L. Sha, R. Rajkumar, and J.P. Lehoczky, Priority Inheritance Protocols: An Approach to Real-Time
Synchronization. IEEE Transactions on Computers, 39, 1175–1185, 1990.
[24] I. Stoica, H. Abdel-Wahab, K. Jeffay, S. Baruah, J.E. Gehrke, and G.C. Plaxton, A Proportional
Share Resource Allocation Algorithm for Real-Time Timeshared Systems. In Proceedings of the IEEE
Real-Time Systems Symposium, December 1996.
[25] G. Buttazzo and M. Caccamo, Minimizing Aperiodic Response Times in a Firm Real-Time
Environment. IEEE Transactions on Software Engineering, 25, 22–32, 1999.
[26] G. Koren and D. Shasha, Skip-Over: Algorithms and Complexity for Overloaded Systems that
Allow Skips. In Proceedings of the IEEE Real-Time Systems Symposium, 1995.
[27] G. Buttazzo, G. Lipari, M. Caccamo, and L. Abeni, Elastic Scheduling for Flexible Workload
Management. IEEE Transactions on Computers, 51, 289–302, 2002.
[28] E. Bini, G.C. Buttazzo, and G.M. Buttazzo, A Hyperbolic Bound for the Rate Monotonic Algorithm.
In Proceedings of the 13th Euromicro Conference on Real-Time Systems, Delft, The Netherlands,
pp. 59–66, June 2001.
[29] E. Bini and G.C. Buttazzo, The Space of Rate Monotonic Schedulability. In Proceedings of the 23rd
IEEE Real-Time Systems Symposium, Austin, TX, December 2002.
[30] J. Stankovic, K. Ramamritham, M. Spuri, and G. Buttazzo, Deadline Scheduling for Real-Time
Systems, Kluwer Academic Publishers, Boston, MA, 1998.
13
Quasi-Static Scheduling of Concurrent Specifications

Alex Kondratyev
Cadence Berkeley Laboratories and Politecnico di Torino

Luciano Lavagno
Politecnico di Torino

Claudio Passerone
Politecnico di Torino

Yosinori Watanabe
Cadence Berkeley Laboratories

13.1 Introduction ............................................. 13-1
     Quasi-Static Scheduling • A Simple Example
13.2 Overview of Related Work ............................... 13-4
13.3 QSS for PNs ............................................. 13-5
     Definitions • Specification Model • Schedulability Analysis • Algorithmic Implementation
13.4 QSS for Boolean Dataflow ............................... 13-10
     Definitions • Schedulability Analysis • Comparison to PN Model
13.5 Conclusions ............................................. 13-14
References .................................................. 13-14
13.1 Introduction
13.1.1 Quasi-Static Scheduling
The phenomenal growth in complexity and breadth of use of embedded systems can be managed only by
raising the level of abstraction at which design activities start and most design space exploration occurs.
This enables greater reuse potential, but requires significant tool support for efficient analysis, mapping,
and synthesis. In this chapter we deal with methods aimed at providing designers with efficient techniques for
uniprocessor software synthesis, from formal models that explicitly represent the available concurrency.
These methods can be extended to multiprocessor support and to hardware synthesis; however, these
advanced topics are outside the scope of this chapter.
Concurrent specifications, such as dataflow networks [1], Kahn process networks [2], Communicating
Sequential Processes [3], synchronous languages [4], and graphical state machines [5], are interesting
because they expose the inherent parallelism in the application, which is much harder to recover
a posteriori by optimizing compilers. In such a specification, the application is described as a set of
processes that sequentially execute operations and communicate with each other. In considering an
implementation of the application, it is often necessary to analyze how these processes interact with each
other. This analysis is used for evaluating how often a process will be invoked during an execution of
the system, or how much memory will be required for implementing the communication between the
processes.
Quasi-Static Scheduling (QSS) is a technique for finding sequences of operations to be executed across
the processes that constitute a concurrent specification of the application. Several approaches have been
proposed [6–11], which use certain mathematical models to abstract the specification and aim
to compute graphs of finite size such that the sequences are given by traversing the graphs. We call
the sequences of operations, or the graph which represents them, a schedule of the specification. The
schedule is static in the sense that it statically commits to a particular execution order of operations of
the processes. In general, there exists more than one possible order of operations to be executed, with
a different implementation cost for each. On the other hand, by committing to a particular sequence,
a static schedule allows a more rigorous analysis of the interaction among the processes than dynamic
schedules, because one can precisely observe how the operations from different processes are interleaved
to constitute the system execution.
The reason to start from a concurrent specification is twofold. First of all, coarse-grained parallelism is
very difficult to recover from a sequential specification, except in relatively simple cases (e.g., nested loops
with affine memory accesses [12]). Second, parallel specifications offer a good model for performing system-
level partitioning experiments, aimed at finding the best mixed hardware/software implementation on a
complex SOC platform. The reason to look for a sequencing of the originally concurrent operations is
that in this chapter we are considering embedded software implementations, for which the context switching
implied by a multithreaded concurrent implementation would be very expensive, whenever concurrency
can be resolved at compile time.

This resolution is especially difficult if the specification involves data-dependent conditional constructs,
such as an if-then-else with a data-dependent condition, because different sets of operations may be executed
depending upon how the constructs are resolved. For such a specification, static scheduling produces
in principle a sequence of operations for each possible way of resolving the constructs (in practice, these
multiple sequences are collapsed as much as possible, in order to reduce code size). Note that these
constructs are resolved based on the data values, and therefore some of the resolutions of the constructs
may not happen at runtime in a particular execution of the system. The information about data
values is not available to the static scheduling algorithm, because the latter runs statically at compile time.
In this sense, scheduling for a specification with such constructs is called quasi-static. It is responsible for
providing a sequence of operations to be executed for each possible runtime resolution of data-dependent
choices.
After a simple motivating example, we present an overview of some approaches proposed in the
literature. In Section 13.2, we consider two questions that one is concerned with in QSS, and briefly
describe how these questions are addressed in two different models that have been proposed in the
literature. One of the models is Petri nets (PNs), and the other is Boolean Dataflow (BDF) Graphs. They
model a given concurrent specification in different ways, and thus the expressiveness of the models
and the issues that need to be accounted for to solve the scheduling problem are different. These two
models and the issues in their scheduling approaches are presented in more detail in Sections 13.3 and 13.4,
respectively.
13.1.2 A Simple Example
Figure 13.1 illustrates how QSS works. In Figure 13.1(a), there are two processes, each with an associated
sequential program. The one shown on the left reads a value from port DATA into variable d, computes
a value for the variable D and writes it to the port PORT, and then goes back to the beginning. The other
process reads a value for variable N from port START, and then iterates the body of the for-loop N times.
For each iteration, it reads two values from port IN, and sets them to x[0] and x[1], respectively. Here,
the third argument of the read function designates the number of data items to be read at a time. Since
(a) Initial specification. Left process (port DATA in, port PORT out):

while (1) {
    read(DATA, d, 1);
    D = d*d;
    write(PORT, D, 1);
}

Right process (ports START and IN in, port OUT out; IN is connected to PORT):

while (1) {
    read(START, N, 1);
    for (i=0, y=0; i<N; i++) {
        read(IN, x, 2);
        y = y + x[0] + 2*x[1];
    }
    write(OUT, y, 1);
}

(b) Result of the schedule (ports DATA and START in, port OUT out):

Start: read(START, N, 1); i=0; y=0;
DE:    if (i<N) {
           read(DATA, d, 1); D = d*d; x[0] = D;
           read(DATA, d, 1); D = d*d; x[1] = D;
           y = y + x[0] + 2*x[1]; i++;
           goto DE;
       } else {
           write(OUT, y, 1);
           goto Start;
       }
FIGURE 13.1 A simple example: (a) initial specification, (b) result of the schedule.
IN is connected with PORT, which means that the process on the left needs to produce the necessary data.
However, it writes only one data item at a time to PORT, and therefore it needs to iterate the body
of the while-loop twice in order to provide enough data items for this read statement. Once
the values of x have been set, a value for variable y is computed. At the end of the for-loop, the result
assigned to y is written to the port OUT, and this whole process is repeated. Throughout this chapter,
we assume that the communication between processes is made through a point-to-point finite buffer
with first-in first-out (FIFO) semantics. Therefore, a read operation can be completed only when the
requested number of data items is available in the corresponding buffer. Similarly, a write operation can
be completed only if the number of data items does not exceed the predefined bound of the buffer after
writing.

A result of scheduling this system is shown in Figure 13.1(b). It is a single process that interleaves the
statements of the original two processes. Note that the resulting process does not have ports for PORT and
IN, which were originally used for connecting the two processes, because the read and write operations
on these ports are replaced by assignments of the variable D to x[0] and x[1]. In this way, scheduling
uses data assignments to realize the communication between the original processes, which is often more
efficient to implement. Further, it repeats the same set of operations given by read; D = d*d; x[i] = D;,
making explicit the fact that one of the original processes needs to iterate twice for each iteration of the
for-loop of the other process. Such a repetition could be effectively exploited in general to realize an
efficient implementation, but it can be identified only by analyzing how the original processes interact
with each other, and therefore is not taken into account when implementing each process individually. The
effectiveness of this kind of scheduling is shown by case studies such as [13], where QSS was applied
to a part of the MPEG video decoding algorithm and the speed of the scheduled design was improved
by 45%. The improvement was mainly due to the replacement of the explicit communication among the
processes by data assignments, and also due to a reduction of the number of processes, which in turn
reduced the amount of context switching.
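The equivalence between Figure 13.1(a) and (b) can be checked executably. The sketch below (ours, not from the chapter) models the two-process network with an explicit FIFO for the PORT-to-IN channel, and the scheduled process with that channel replaced by plain assignments:

```python
from collections import deque

def network(start_vals, data_vals):
    """Two-process version: the producer squares DATA items into PORT;
    the consumer reads N from START and sums N pairs from IN (= PORT)."""
    data, port, out = deque(data_vals), deque(), []
    for n in start_vals:                    # read(START, N, 1)
        y = 0
        for _ in range(n):
            while len(port) < 2:            # read(IN, x, 2) blocks, so the
                d = data.popleft()          # producer must iterate twice
                port.append(d * d)
            x0, x1 = port.popleft(), port.popleft()
            y += x0 + 2 * x1
        out.append(y)                       # write(OUT, y, 1)
    return out

def scheduled(start_vals, data_vals):
    """Quasi-statically scheduled version: one process, no FIFO."""
    data, out = deque(data_vals), []
    for n in start_vals:                    # Start: read(START, N, 1)
        y = 0
        for _ in range(n):                  # DE: if (i < N)
            d = data.popleft(); x0 = d * d  # read(DATA); D = d*d; x[0] = D
            d = data.popleft(); x1 = d * d  # read(DATA); D = d*d; x[1] = D
            y += x0 + 2 * x1
        out.append(y)
    return out

inputs = ([2], [1, 2, 3, 4])
print(network(*inputs), scheduled(*inputs))   # [50] [50]
```

Both versions compute (1 + 2*4) + (9 + 2*16) = 50 for this input, illustrating that the schedule preserves the I/O behavior while eliminating the internal channel.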
13.2 Overview of Related Work
When solving the scheduling problem, two main questions are usually of interest:

1. Does the specification have a bounded-length cyclic schedule? By length, we mean the number
of steps in a schedule required to return the specification to its initial state. This question is
important if the specification is to be scheduled with a hard real-time constraint.
2. Can the specification be scheduled within bounded memory? This means that at every state
of a schedule one can compute and move to the next step using a finite amount of memory
for communication buffers, and eventually return to the original state. A bounded-length cyclic
schedule implies bounded memory but not vice versa, as will be discussed in more detail in
Section 13.4.
Depending on the descriptive power of a model used to represent the specification, these questions
have different answers. One such model is dataflow graphs, which are commonly used for digital signal
processing applications. In Static Dataflow (SDF) Graphs [1], the number of tokens produced by a
process^1 on an output port, or consumed by a process from an input port, is fixed and known statically,
that is, at compile time. Computationally efficient algorithms exist to answer questions 1 and 2 for any SDF
specification [1]. Furthermore, all schedulable graphs have bounded-length schedules and the required
memory is bounded.
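The consistency check underlying these algorithms amounts to solving the balance equations r_src * produced = r_dst * consumed over the edges of the graph. The following sketch (ours, assuming a connected graph; Python 3.9+ for math.lcm) computes the repetitions vector, or reports inconsistency:

```python
from fractions import Fraction
from math import lcm

def repetitions(actors, edges):
    """edges: (src, dst, produced, consumed) token rates.
    Returns integer firing counts balancing every edge, or None if the
    graph is inconsistent (no bounded-memory periodic schedule exists)."""
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:                      # propagate relative firing rates
        changed = False
        for s, d, p, c in edges:
            if s in rate and d not in rate:
                rate[d] = rate[s] * p / c
                changed = True
            elif d in rate and s not in rate:
                rate[s] = rate[d] * c / p
                changed = True
    for s, d, p, c in edges:            # every edge must balance exactly
        if rate[s] * p != rate[d] * c:
            return None
    denom = lcm(*(r.denominator for r in rate.values()))
    return {a: int(r * denom) for a, r in rate.items()}

# As in Figure 13.1: the producer writes 1 token, the consumer reads 2 at
# a time, so the producer must fire twice per consumer firing:
print(repetitions(['P', 'C'], [('P', 'C', 1, 2)]))   # {'P': 2, 'C': 1}
# Mismatched rates on parallel edges make the graph inconsistent:
print(repetitions(['A', 'B'], [('A', 'B', 1, 2), ('A', 'B', 1, 1)]))  # None
```

A consistent rate solution is necessary but not sufficient for schedulability; a full SDF scheduler would additionally simulate one iteration to rule out deadlock caused by insufficient initial tokens.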
When the specification contains data-dependent conditional constructs, SDF graphs are insufficient to
model it. An extension of SDF to handle such constructs can be made in different ways: (1) by associating data
values with token flows, or (2) by introducing nondeterminism structurally (see Figure 13.2).
Examples of the first modeling approach can be found in a rich body of research on BDF Graphs and
their derivatives/extensions [6,7,9,11]. A similar modeling mechanism is also exploited in scheduling
approaches starting from networks of finite state machine-like models [10,14].

Interestingly, answering the question about the existence of bounded-length schedules for an arbitrary
BDF graph can be done nearly as simply as for SDF. However, the status of the bounded-memory problem
in BDF is very different. Annotating tokens with values makes the BDF model Turing complete, and the
problem of finding a bounded-memory schedule becomes undecidable [6]. For this reason, papers devoted
to BDF scheduling propose heuristic approaches for schedulability analysis.

An example of this is given in Reference 9. The proposed method initially sets a bound on the size of
each buffer based on the structural information of the specification, and tries to find a schedule within
the bound. If a schedule is not found, the procedure heuristically increases the sizes of some buffers, and
repeats the search. In order to claim the absence of a schedule within a given bound, the reachability space
of the system defined for the bound is completely analyzed.

Other heuristics exploit clustering algorithms, which in case of success derive a bounded-memory
schedule, while in case of failure leave the question open [6].
The work given in Reference 8 employs a PN as the underlying model for the system specification and
searches for a schedule in its reachability space. It abstracts data-dependent conditional constructs of the
specification using nondeterministic choices (see Figure 13.2). This abstraction in general helps improve
the efficiency of the scheduling procedure, while it makes the approach conservative. The PN model is
not Turing complete, and there are only a few problems that are undecidable for PNs [15]. Nevertheless,
decidability of the problem of finding bounded memory schedules for a PN has not been proven or
disproven. However, for the important subclass of equal-choice PNs (an extension of free-choice PNs),
bounded memory schedules are found efficiently (if they exist).
A list of modeling approaches and the complexity of their schedulability problems is shown in Table 13.1,
where O(|Cycle_seq|) denotes that the problem is decidable and its computational complexity is linear
¹ In the terminology of dataflow graphs, a process is often called an actor, and we may use these terms interchangeably
in this chapter.
QSS of Concurrent Specifications 13-5
[Figure: a data-dependent fragment (if (x > 0) A: y=x; else B: y=x*x;) modeled by value representation (BDF, with a SWITCH actor controlled by x > 0) versus structural representation (PN, with a nondeterministic choice between A and B); under static scheduling, SDF = Marked graphs.]
FIGURE 13.2 The PN and BDF modeling approaches.
TABLE 13.1 Models for Embedded Systems and the Complexity of Scheduling Problems

                            SDF graph        BDF graph        PN: Equal-choice   PN: General
Modeling data dependence    No               Yes              Yes                Yes
Bounded length schedule     O(|Cycle_seq|)   O(|Cycle_seq|)   O(|Cycle_seq|)     O(|Cycle_seq|)
Bounded memory schedule     O(|Cycle_seq|)   Undecidable      O(|Cycle_seq|)     Unknown
in the length of the sequence that brings the specification back to its initial state (called a cyclic sequence).
Note, however, that the size of this cyclic sequence can be exponential in the size of the SDF graph.
We will review scheduling approaches based on PNs and on BDF in more detail in Sections 13.3 and 13.4,
respectively.
13.3 QSS for PNs
13.3.1 Definitions
A PN is defined by a tuple (P, T, F, M0), where P and T are the sets of places and transitions, respectively.
F is a function from (P × T) ∪ (T × P) to the nonnegative integers. A marking M is another function from
P to the nonnegative integers, where M[p] denotes the number of tokens at p in M. M0 is the initial marking.
A PN can be represented by a directed bipartite graph, where an edge [u, v] exists if F(u, v) is positive,
which is called the weight of the edge. A transition t is enabled at a marking M if M[p] ≥ F(p, t) for
all p of P. In this case, one may fire the transition at the marking, which yields a marking M′ given by
M′[p] = M[p] − F(p, t) + F(t, p) for every place p.
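The enabling and firing rules just defined can be sketched as follows; the dictionary encoding of F and M and the tiny example net are assumptions made for illustration:

```python
def enabled(F, M, t, places):
    """Transition t is enabled at marking M iff M[p] >= F(p, t) for all p."""
    return all(M.get(p, 0) >= F.get((p, t), 0) for p in places)

def fire(F, M, t, places):
    """Fire t at M, yielding M'[p] = M[p] - F(p, t) + F(t, p)."""
    assert enabled(F, M, t, places)
    return {p: M.get(p, 0) - F.get((p, t), 0) + F.get((t, p), 0)
            for p in places}

# A two-place net where transition t moves one token from p1 to p2:
F = {("p1", "t"): 1, ("t", "p2"): 1}
M0 = {"p1": 1, "p2": 0}
print(fire(F, M0, "t", ["p1", "p2"]))  # -> {'p1': 0, 'p2': 1}
```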
[Figure: a system specification whose process body loops with a counter j up to N, issuing READ(DATA,d,1), READ(COEF,c,1), and WRITE(OUT,...,1) operations, together with the corresponding PN built from places p1–p7 and transitions t1–t10, with ports IN, DATA, COEF, and OUT.]
FIGURE 13.4 System specification and corresponding PN.
Definition 13.1 (Sequential schedule). Given a Petri net N = (P, T, F, M0), a sequential schedule of N is
a transition system Sch = (S, T, →, s0) with the following properties:
1. S is finite and there is a mapping µ: S → R(M0), with µ(s0) = M0.³
2. If transition t is fireable in state s, with s →t s′, then µ(s) →t µ(s′) in N.
3. If t1 is fireable in s, then t2 is fireable in s if and only if t2 ∈ ECS(t1).
4. For each state s ∈ S, there is a path s ⇒ s′ →t for each uncontrollable source transition t of N.
Property 2 implies trace containment between Sch and N (any feasible trace in the schedule is feasible
in the original PN). Property 3 indicates that one ECS is scheduled at each state. Finally, the existence of
the path in property 4 ensures that any input event from the environment will eventually be served.
³ This mapping is required in order to enable the same state to be visited multiple times with different termination
criteria.
Intuitively, scheduling can be viewed as a game between the scheduler and the environment. The rules
of the game are the following:
The environment makes a move by firing any of the uncontrollable source transitions.
The scheduler picks any of the enabled transitions to fire, with two exceptions:
(a) It has no control over choosing which one of the uncontrollable source transitions to fire.
(b) It cannot resolve choice for data-dependent constructs, which are described by equal-choice
places.
In cases (a) and (b) the scheduler must explore all possible branches during the traversal of the
reachability space, that is, fire all the transitions from the same ECS. However, it can decide the
moment for serving the source transitions or for resolving an equal choice, because it can finitely
postpone these by choosing some other enabled transitions to fire.
The goal of the game is to process any input from the environment while keeping the traversed space
(and hence the amount of memory required to implement the communication buffers) finite. In case of
success, the result is to both classify the original PN as schedulable and derive the set of states (schedule)
that the scheduler can visit while serving an arbitrary mix of source transitions.
Under the assumption that the environment is sufficiently slow to allow the scheduler to fire all
nonsource transitions, the schedule is an upper approximation of the set of states visited during the
real-time execution of the specification. This is due to the fact that the scheduler is constructed taking
into account the worst possible conditions, since it has no knowledge about the environment behavior
and data dependencies.
13.3.4 Algorithmic Implementation
In this section, we describe an algorithm for finding a schedule for each uncontrollable source transition a
of a given PN. The algorithm, which is fully described in Reference 8, gradually creates a rooted tree, and
a postprocessing step creates a cycle for each leaf to generate a schedule.
The algorithm initially creates a root node corresponding to the initial marking of the PN, and fires
the source transition a, generating a new marking. From here, it tries to create a tree by firing enabled
transitions. For each node that is added to the tree, it checks whether a termination condition is satisfied,
or if an ancestor with the same marking exists. In the latter case, the search along the path is stopped and
the branch is closed into a loop with the ancestor node. To avoid exploring the possibly infinite reachability
space of the PN, the algorithm uses a heuristic to identify a boundary of that space so that it would not
need to search beyond it [8].
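A much-simplified sketch of this tree construction is shown below. It treats every enabled transition as a branch that must be covered (as for an ECS) and uses a plain depth limit in place of the heuristic boundary of Reference 8, so it only illustrates the control structure, not the actual algorithm:

```python
def schedulable(successors, m0, depth_limit=50):
    """successors(m) returns [(t, m2), ...] for transitions enabled at m.
    A branch is closed into a loop when an ancestor repeats the marking;
    exceeding depth_limit stands in for crossing the heuristic boundary."""
    def dfs(m, ancestors):
        if m in ancestors:
            return True                       # close the branch into a loop
        if len(ancestors) >= depth_limit:
            return False                      # gave up at the boundary
        succ = successors(m)
        # every branch (e.g., every member of an ECS) must be coverable
        return bool(succ) and all(dfs(m2, ancestors | {m})
                                  for _t, m2 in succ)
    return dfs(m0, frozenset())

# A two-marking ping-pong net closes its loop immediately ...
ping = {(1, 0): [("t1", (0, 1))], (0, 1): [("t2", (1, 0))]}
print(schedulable(lambda m: ping[m], (1, 0)))              # True
# ... while an unbounded producer never revisits a marking:
print(schedulable(lambda m: [("p", (m[0] + 1,))], (1,)))   # False
```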
If a schedule is found, the corresponding code that implements the schedule must be generated.
Although a direct translation of the schedule into code is possible, it usually increases the code size, since
different paths of the schedule may be associated with the same sequence of transitions. Optimizations
are thus required to reduce the code size. Also, ports that originally belong to different processes might
become part of the same final task, and therefore do not require any communication primitive, but
rather are implemented using assignments or circular buffers, whose size can be statically determined by
analyzing the schedule.
As an example, let us consider the system illustrated in Figure 13.1(a). The PN model for the two
processes is shown in Figure 13.3, where the source port START is uncontrollable, while the source port
DATA is controllable. The ports PORT and IN are connected through place p4. In the initial marking,
places p2 and p6 have a token each.
After creating the root node of the tree, the algorithm to find the schedule fires the only uncontrollable
source transition START, generating a new node in the schedule with marking p2 p5 p6. Then, either transition
C or DATA is enabled, and we may decide to fire C. In the newly created node with marking p2 p7,
transitions D and E are both enabled, and they constitute an equal-choice set. Therefore, the algorithm
explores the two branches, until it can close loops for both of them. The final schedule is shown in
Figure 13.5.
[Figure: the schedule as a graph whose nodes are markings over places p1–p9 (e.g., p2 p6, p2 p5 p6, p2 p7) and whose arcs are labeled with the transitions START, C, D, E, F, DATA, A, B, and OUT.]
FIGURE 13.5 Schedule for the PN of Figure 13.3.
The last step is to generate the code, already shown in Figure 13.1(b). A node in the schedule with multiple
outgoing arcs corresponds to an if-then-else, and loops are implemented using the goto statement.
Note that in this example no optimization has been performed to reduce code size. On the other hand,
the communication between the two processes has been implemented using assignments in the single task
that is generated.
13.4 QSS for Boolean Dataflow
13.4.1 Definitions
An SDF graph [1] is a directed graph D = (A, E) with actors A represented by nodes and arcs E representing
connections between the actors. These connections convey values between nodes, similar to the tokens in
PNs. Values arrive at actors respecting FIFO ordering.
Two mapping functions I and O are defined from E to the nonnegative integers. They define the consumption
and production rates of values for the connections between nodes, that is, for a connection e = (a, b)
from an actor a to an actor b, O(e) (respectively, I(e)) shows how many tokens are produced at (consumed
from) e when the actor a (respectively, b) fires.
The initial marking M0 tells how many tokens reside on the arcs E before the SDF graph starts an execution.
An actor a fires if every input arc e carries at least I(e) tokens. Firing an actor consumes I(e) tokens
from each input arc and produces O(e) tokens on every output arc. A connection of an actor to its input
(or output) is denoted as an input (or output) port.
A simple example of an SDF graph is shown in Figure 13.6. In its initial marking only actor a is enabled.
Firing a produces a token on each output port (arcs (a, c) and (a, b)). Actor a needs to fire twice to enable c
because I(a, c) = 2. The feasible sequence of actor firings a, a, c, b returns the graph to the original
marking.
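This firing sequence can be replayed mechanically; the rates below are those read off Figure 13.6 in the text (both b and c consume two of the single tokens a produces per firing), so they are an assumption about the exact figure:

```python
# (produced, consumed) rates per arc
arcs = {("a", "c"): (1, 2), ("a", "b"): (1, 2)}

def fire(marking, actor):
    m = dict(marking)
    for (src, dst), (o, i) in arcs.items():
        if dst == actor:                  # consume from every input arc
            assert m[(src, dst)] >= i, f"{actor} is not enabled"
            m[(src, dst)] -= i
        if src == actor:                  # produce on every output arc
            m[(src, dst)] += o
    return m

m0 = {e: 0 for e in arcs}                 # only actor a is enabled initially
m = m0
for actor in ["a", "a", "c", "b"]:
    m = fire(m, actor)
print(m == m0)  # True: the sequence returns to the original marking
```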
An extension of SDF graphs to capture specifications with data dependency results in adding to the
model dynamic actors [6] that satisfy the following properties:
1. An input port may be a conditional port, where the number of tokens consumed by the port
is given by a two-valued integer function of the value of a Boolean token received at the
[Figure: actors a, b, and c connected by arcs with production/consumption rates 1 and 2.]
FIGURE 13.6 SDF graph.
[Figure: (a) the program fragment A; if (b) { C; } else { D; } E;, (b) the corresponding BDF graph with actors A, B, C, D, E and SWITCH/SELECT actors controlled by the Boolean stream b over arcs e1–e8, and (c) the incidence matrix with symbolic entries p and 1 − p.]
FIGURE 13.7 If-then-else BDF graph.
special input port (the control port) of the same actor. One of the two values of the function
is zero.
2. Control ports are never conditional ports, and always transfer exactly one token per execution.
The canonical examples of this type of actor are SWITCH and SELECT⁴ (e.g., see Figure 13.7[b]).
The SWITCH actor consumes an input token and a control token. If the control token is TRUE, the input
token is copied to the output labeled T; otherwise it is copied to the output labeled F. The SELECT actor
performs the inverse operation, reading a token from the T input if the control token is TRUE, otherwise
reading from the F input, and copying the token to the output.
Figure 13.7(b) shows an example of a BDF graph that uses SWITCH and SELECT actors to model the piece
of program in Figure 13.7(a).
13.4.2 Schedulability Analysis
13.4.2.1 Bounded Length Schedules
Deriving a bounded length schedule in a BDF graph reduces to the following two steps:
1. Finding a sequence of actors (cyclic sequence) that returns the graph to the initial marking.
2. Simulating the firing of a cyclic sequence to make sure that it is fireable under the given initial
marking.
The first task can be done by solving the so-called system of balance equations. This requires constructing
the incidence matrix of the BDF graph, which contains the integer O(e_i) in position (j, i)
if the ith actor produces O(e_i) tokens on the jth arc, and −I(e_i) if the ith actor consumes I(e_i) tokens from
the jth arc (self-loop arcs are ignored, since their consistency checking is trivial). For dynamic actors the
number of produced and consumed tokens depends on control ports. This is represented in the incidence
matrix by using symbolic variables p_i (one for each Boolean stream) that are interpreted as ratios of TRUE
⁴ Note that this is different from the select operation introduced in Section 13.3.1, because it is a deterministic
operation, depending not on the number of available input tokens, which in turn may depend on the scheduling order,
but rather on the value of the control port, which is guaranteed to be independent of the scheduling order.
values out of all values present in the stream (this ratio is [1 − p_i] for FALSE values). Then the system of
equations to be solved is:

Γ · r = 0

where Γ is the incidence matrix, 0 is a vector with all entries equal to 0, and r is the repetition vector with
one entry, called r_i, for each actor, representing how many times actor i fires in order to bring the BDF
graph to the original marking. If a nonzero solution for this system exists, then the repetition vector shows
how many times each actor must fire to return the graph to the initial marking.
Applying the above procedure to the incidence matrix in Figure 13.7(c), corresponding to the BDF
graph of Figure 13.7(b), one can find the repetition vector r = [1 1 (1 − p) p 1 1 1]. Note that the
existence of a solution cannot depend on the value of p, since the values of the Boolean stream b are arbitrary.
By simulating the firing of actors according to the values of r for both p = 0 and p = 1, one can see
that the repetition vector indeed describes a fireable sequence of actors, and the existence of a bounded
length schedule for the BDF graph of Figure 13.7(b) is proved. This procedure is effective for any arbitrary
BDF graph [6].
13.4.2.2 Bounded Memory Schedule
If a bounded length schedule is found, then it obviously provides a bounded memory schedule as well.
However, the converse is not true. There are specifications that do not have a bounded length schedule, but
are perfectly schedulable with bounded memory. A common example is given by a loop with an unknown
bound on the number of iterations (e.g., see Figure 13.1). For such specifications the length of the cyclic
sequence is unbounded, because it depends on the number of loop iterations.
The problem of finding bounded memory schedules in BDF graphs is undecidable [6]; hence conservative
heuristic techniques, which may not find a solution even if one exists (exactly like the algorithm of
Section 13.3.4), must be used. We describe two of them: clustering and state enumeration.
Clustering. The goal of the clustering algorithm is to map a BDF graph into the traditional control
structures used by high-level languages, such as if-then-else and do-while, whenever possible. The
subgraphs corresponding to these structures can then be treated as atomic actors.
At first, adjacent actors with the same producing/consuming rates are merged into a single cluster, where
possible. Actors may not be merged if this would create deadlock, or if the resulting cluster would not be
a BDF actor (e.g., it may depend on a control arc that is hidden by the merge operation). Then clusters
are enclosed into conditional or loop constructs, as required in order to match the token production and
consumption rates of their neighbors. The procedure terminates when no more changes in the BDF graph
are possible. At this point, if the interior of each cluster has a schedule of bounded length, and the top-level
cluster does as well, then the entire graph can be scheduled with bounded memory.
State Enumeration. One can enumerate the states that the system can reach by simulating the execution
of the BDF graph, similar to the scheduling approach described in Section 13.3.4. If the graph cannot
be scheduled in bounded memory, however, a straightforward state enumeration procedure will not
terminate. One possible solution is to impose an upper bound on the number of tokens that may appear
on each arc, according to some heuristic, and to assume that there is a problem if this bound is exceeded.
A technique similar to this is used in Ptolemy's dynamic dataflow scheduler [18].
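Such a bounded enumeration can be sketched as a breadth-first search over markings. The tuple encoding of markings and the toy successor functions are assumptions made for illustration:

```python
from collections import deque

def enumerate_states(successors, m0, bound):
    """Explore all reachable markings; return the set of visited states,
    or None if some arc ever exceeds `bound` tokens (assumed problem)."""
    seen, frontier = {m0}, deque([m0])
    while frontier:
        m = frontier.popleft()
        for m2 in successors(m):
            if any(tokens > bound for tokens in m2):
                return None               # heuristic bound exceeded: give up
            if m2 not in seen:
                seen.add(m2)
                frontier.append(m2)
    return seen

# A self-draining arc stays within any bound >= 1 ...
loop = {(1,): [(0,)], (0,): [(1,)]}
print(sorted(enumerate_states(lambda m: loop[m], (1,), bound=1)))  # [(0,), (1,)]
# ... while an ever-filling arc eventually trips the bound:
print(enumerate_states(lambda m: [(m[0] + 1,)], (0,), bound=8))    # None
```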
13.4.3 Comparison to PN Model
Boolean Dataflow graphs, being Turing complete, provide a very powerful specification model. It is remarkable
that, in spite of that, some important scheduling problems (like the bounded length schedule) have
efficient and simple solutions. When a designer seeks schedules with that kind of properties,
BDF graphs are an excellent choice. An attractive feature of BDF modeling is that keeping track of
the consistency of decisions made by different actors consuming the same Boolean stream is easy. This is
automatically ensured through the use of FIFO semantics in storing Boolean values at actor ports.
In PN modeling, on the other hand, data dependencies are fully abstracted as nondeterministic choices.
This makes the designer responsible for ensuring that different choices are resolved consistently when they
stem from the same data.
Undecidability in the BDF case comes from the fact that establishing the equivalence of Boolean streams
is also undecidable. Therefore, ensuring the consistency of choices made by dynamic actors is possible only
when the stream is exactly the same (as for the SWITCH and SELECT actors in Figure 13.7[b]), and hence
when a single p variable can be used to represent both.
Note that in such cases, that is, with syntactic equivalence, an improved version of the tool generating
a PN model from a C model could annotate the PN so as to make this equivalence explicit. However,
no such capability is available from the PN-based tools. They resort to the simple, but cumbersome,
techniques described in Reference 13. Hence the BDF scheduling implementation in Ptolemy [18] is more
user-friendly in this respect.
The abstraction of data by nondeterministic choices in PNs is, however, of great importance when
solving more difficult scheduling problems. Applications very commonly contain computations with an
unknown number of iterations. For them the most interesting scheduling problem is finding a bounded
memory schedule. Here the power of the BDF model becomes a burden, and makes it very difficult to devise
efficient heuristics to solve the problem.
To illustrate this, let us look at Figure 13.8, which represents a BDF graph corresponding to the example
specification of Figure 13.1, where diamonds labeled with F denote the initial marking of the corresponding
arcs with the Boolean value False.
Applying clustering to this example does not provide a conclusive decision about its schedulability,
because it cannot prove that the Boolean value False would ever be produced at the input arcs of the
SELECT actors, which is needed in order to return the network to the initial marking. This is an inherent
problem of the clustering approach: it is not clear how often it succeeds in doing scheduling analysis,
unless the specification was already devised so as to use only recognizable actor patterns.
To find a schedule for the example in Figure 13.8 one must use the state enumeration approach.
However, contrary to the PN case, the state of a BDF graph must also include the values stored in the
FIFO queues for all Boolean streams of dynamic actors. This leads to significant memory penalties when
[Figure: Process 1 and Process 2 modeled as a BDF graph with SWITCH and SELECT actors, a comparison >0?, constants 0 and 1, an increment, and ports START, DATA, and OUT; diamonds labeled F mark arcs initially carrying the Boolean value False.]
FIGURE 13.8 BDF graph for the example of Figure 13.1.
storing the state graph. Even worse, it also significantly reduces the capabilities of pruning the explored
reachability space based on different termination conditions. These conditions impose a partial order
between states and avoid generation of the reachability space beyond ordered states [6, 8]. For PNs the
partial order is established purely by markings, while for BDF graphs, in addition to markings, it also
requires considering the values of Boolean streams. Due to this, state graphs of BDF have sparser ordering
relations and are significantly larger.
Hence we feel that for bounded memory quasi-static schedulability analysis, the PN approach is simpler
and more suitable, especially if the limitations of current translators from C to PNs are addressed.
13.5 Conclusions
This chapter described modeling methods and scheduling algorithms that bridge the gap between
specification and implementation of reactive systems. From a specification given in terms of concurrent
communicating processes, and by deriving intermediate representations based on PNs and dataflow
graphs, one can (unfortunately not always) obtain a sequential schedule that can be efficiently implemented
on a processor.
Future work should consider better heuristics to find such schedules, since the problem is undecidable
in general once data-dependent choices come into play. Furthermore, it would be interesting to extend it
by considering sequential and concurrent implementations on several resources (e.g., CPUs and custom
datapaths) [19]. Another body of future research concerns the extension of the notion of schedule into
the time domain, in order to cope with performance constraints, while all the approaches considered in
this chapter assume infinite processing speed with respect to the speed of the environment. For real-time
applications one would need to extend the scheduling frameworks by explicit annotation of system events
with delays, and by using timing-driven algorithms for schedule construction.
References
[1] E.A. Lee and D.G. Messerschmitt. Static scheduling of synchronous data flow graphs for digital
signal processing. IEEE Transactions on Computers, C-36(1), 24–35, 1987.
[2] G. Kahn. The semantics of a simple language for parallel programming. In Proceedings of IFIP
Congress, August 1974.
[3] C.A.R. Hoare. Communicating Sequential Processes. International Series in Computer Science.
Prentice Hall, Hertfordshire, 1985.
[4] N. Halbwachs. Synchronous Programming of Reactive Systems. Kluwer Academic Publishers,
Boston, MA, 1993.
[5] D. Harel, H. Lachover, A. Naamad, A. Pnueli, M. Politi, R. Sherman, A. Shtull-Trauring, and
M. Trakhtenbrot. STATEMATE: a working environment for the development of complex reactive
systems. IEEE Transactions on Software Engineering, 16(4), 403–414, 1990.
[6] J. Buck. Scheduling Dynamic Dataflow Graphs with Bounded Memory Using the Token Flow
Model. Ph.D. thesis, University of California, Berkeley, 1993.
[7] J.T. Buck. Static scheduling and code generation from dynamic dataflow graphs with integer valued
control streams. In Proceedings of the 28th Asilomar Conference on Signals, Systems, and Computers,
October 1994.
[8] J. Cortadella, A. Kondratyev, L. Lavagno, C. Passerone, and Y. Watanabe. Quasi-static scheduling
of independent tasks for reactive systems. IEEE Transactions on Computer-Aided Design, 24(9),
2004.
[9] T.M. Parks. Bounded Scheduling of Process Networks. Ph.D. thesis, Department of EECS,
University of California, Berkeley, 1995. Technical report UCB/ERL 95/105.
[10] K. Strehl, L. Thiele, D. Ziegenbein, R. Ernst, et al. Scheduling hardware/software systems using
symbolic techniques. In International Workshop on Hardware/Software Codesign, 1999.
[11] P. Wauters, M. Engels, R. Lauwereins, and J.A. Peperstraete. Cyclo-dynamic dataflow. In Proceedings
of the 4th EUROMICRO Workshop on Parallel and Distributed Processing, January 1996.
[12] T. Stefanov, C. Zissulescu, A. Turjan, B. Kienhuis, and E. Deprettere. System design using Kahn
process networks: the Compaan/Laura approach. In Proceedings of the Design Automation and Test
in Europe Conference, February 2004.
[13] G. Arrigoni, L. Duchini, L. Lavagno, C. Passerone, and Y. Watanabe. False path elimination in
quasi-static scheduling. In Proceedings of the Design Automation and Test in Europe Conference,
March 2002.
[14] F. Thoen, M. Cornero, G. Goossens, and H. De Man. Real-time multi-tasking in software synthesis
for information processing systems. In Proceedings of the International System Synthesis Symposium,
1995.
[15] J. Esparza. Decidability and complexity of Petri net problems: an introduction. In Lectures
on Petri Nets I: Basic Models, Advances in Petri Nets, Lecture Notes in Computer Science, vol. 1491,
1996, pp. 374–428.
[16] T. Murata. Petri nets: properties, analysis, and applications. Proceedings of the IEEE, 77(4), 541–580,
1989.
[17] E.A. de Kock, G. Essink, W.J.M. Smits, P. van der Wolf, J.-Y. Brunel, W.M. Kruijtzer, P. Lieverse,
and K.A. Vissers. YAPI: application modeling for signal processing systems. In Proceedings of the
37th Design Automation Conference, June 2000.
[18] J. Buck, S. Ha, E.A. Lee, and D.G. Messerschmitt. Ptolemy: a framework for
simulating and prototyping heterogeneous systems. International Journal in Computer Simulation,
4(2), 1994.
[19] J. Cortadella, A. Kondratyev, L. Lavagno, A. Taubin, and Y. Watanabe. Quasi-static scheduling for
concurrent architectures. Fundamenta Informaticae, 62, 171–196, 2004.
Timing and
Performance
Analysis
14 Determining Bounds on Execution Times
Reinhard Wilhelm
15 Performance Analysis of Distributed Embedded Systems
Lothar Thiele and Ernesto Wandeler
14
Determining Bounds
on Execution Times
Reinhard Wilhelm
Universität des Saarlandes
14.1 Introduction ............................................. 14-2
     Tool Architecture and Algorithm • Timing Anomalies • Contexts
14.2 Cache-Behavior Prediction ................................ 14-6
     Cache Memories • Cache Semantics • Abstract Semantics
14.3 Pipeline Analysis ........................................ 14-10
     Simple Architectures without Timing Anomalies • Processors with Timing Anomalies • Algorithm Pipeline-Analysis • Pipeline Modeling • Formal Models of Abstract Pipelines • Pipeline States
14.4 Path Analysis Using Integer Linear Programming .......... 14-17
14.5 Other Ingredients ........................................ 14-18
     Value Analysis • Control Flow Specification and Analysis • Frontends for Executables
14.6 Related Work ............................................. 14-19
     A (Partly) Dynamic Method • Purely Static Methods
14.7 State of the Art and Future Extensions .................. 14-20
Acknowledgments .............................................. 14-21
References ................................................... 14-21
Run-time guarantees play an important role in the area of embedded systems and especially hard real-time
systems. These systems are typically subject to stringent timing constraints, which often result from
the interaction with the surrounding physical environment. It is essential that the computations are
completed within their associated time bounds; otherwise severe damages may result, or the system may
be unusable. Therefore, a schedulability analysis has to be performed which guarantees that all timing
constraints will be met. Schedulability analyses require upper bounds for the execution times of all
tasks in the system to be known. These bounds must be safe, that is, they may never underestimate the
real execution time. Furthermore, they should be tight, that is, the overestimation should be as small
as possible.
In modern microprocessor architectures, caches, pipelines, and all kinds of speculation are key features
for improving (average-case) performance. Unfortunately, they make the analysis of the timing behavior
of instructions very difcult, since the execution time of an instruction depends on the execution history.
A lack of precision in the predicted timing behavior may lead to a waste of hardware resources, which
would have to be invested in order to meet the requirements. For products which are manufactured
in high quantities, for example, in the automobile or telecommunications markets, this would result in
intolerable expenses.
The subject of this chapter is one particular approach and the subtasks involved in computing safe and
precise bounds on the execution times for real-time systems.
14.1 Introduction
Hard real-time systems are subject to stringent timing constraints which are dictated by the surrounding
physical environment. We assume that a real-time system consists of a number of tasks, which realize
the required functionality. A schedulability analysis for this set of tasks and a given hardware has to
be performed in order to guarantee that all the timing constraints of these tasks will be met (timing
validation). Existing techniques for schedulability analysis require upper bounds for the execution times
of all the system's tasks to be known. These upper bounds are commonly called the worst-case execution
times (WCETs), a misnomer that causes a lot of confusion and will therefore not be adopted in this
presentation. In analogy, lower bounds on the execution time have been named best-case execution times
(BCETs). These upper bounds (and lower bounds) have to be safe, that is, they must never underestimate
(overestimate) the real execution time. Furthermore, they should be tight, that is, the overestimation
(underestimation) should be as small as possible.
Figure 14.1 depicts the most important concepts of our domain. The system shows a certain variation
of execution times depending on the input data or different behavior of the environment. In general,
the state space is too large to exhaustively explore all possible executions and so determine the exact
worst-case and best-case execution times, WCET and BCET, respectively. Some abstraction of the system
is necessary to make a timing analysis of the system feasible. These abstractions lose information, and
thus are responsible for the distance between WCETs and upper bounds, and between BCETs and lower
bounds.
How much is lost depends both on the methods used for timing analysis and on system properties,
such as the hardware architecture and the cleanness of the software. So, the two distances mentioned above,
termed upper predictability and lower predictability, can be seen as a measure for the timing predictability
of the system. Experience has shown that the two predictabilities can be quite different, cf. Reference 1.
The methods used to determine upper bounds and lower bounds are the same. We will concentrate on
the determination of upper bounds unless otherwise stated.
Methods to compute sharp bounds for processors with fixed execution times for each instruction have
long been established [2,3]. However, in modern microprocessor architectures, caches, pipelines, and all
kinds of speculation are key features for improving (average-case) performance. Caches are used to bridge
the gap between processor speed and the access time of main memory. Pipelines enable acceleration
by overlapping the executions of different instructions. The consequence is that the execution time of
individual instructions, and thus the contribution of one execution of an instruction to the program's
[Figure: a time axis starting at 0 showing lower bound, best case, worst case, and upper bound in that order; the span between best and worst case is the variation of execution time, the worst case marks the w.c. performance, the upper bound the w.c. guarantee, and the distance between them the predictability.]
FIGURE 14.1 Basic notions concerning timing analysis of systems.
2006 by Taylor & Francis Group, LLC
Determining Bounds on Execution Times 14-3
[Figure: paths through the stages Fetch, Issue, Execute, Retire, with branches on "ICache miss?" (30 cycles on a miss), "Unit occupied?" (up to 19 cycles), "Multicycle?" (3 cycles instead of 1), and "Pending instructions?" (up to 6 cycles); the fastest path takes 4 cycles, the slowest 41.]
FIGURE 14.2 Different paths through the execution of a multiply instruction. Unlabeled transitions take 1 cycle.
execution time can vary widely. The interval of execution times for one instruction is bounded by the
execution times of the following two cases:
The instruction goes smoothly through the pipeline; all loads hit the cache, no pipeline hazard
happens, that is, all operands are ready, no resource conflicts with other currently executing
instructions exist.
Everything goes wrong, that is, instruction and/or operand fetches miss the cache, resources
needed by the instruction are occupied, etc.
Figure 14.2 shows the different paths through a multiply instruction of a PowerPC processor. The
instruction-fetch phase may find the instruction in the cache (cache hit), in which case it takes 1 cycle to
load it. In the case of a cache miss, it may take something like 30 cycles to load the memory block con-
taining the instruction into the cache. The instruction needs an arithmetic unit, which may be occupied
by a preceding instruction. Waiting for the unit to become free may take up to 19 cycles. This latency
would not occur if the instruction fetch had missed the cache, because the cache-miss penalty of 30 cycles
has allowed any preceding instruction to terminate its arithmetic operation. The time it takes to multiply
two operands depends on the size of the operands; for small operands, 1 cycle is enough, for larger ones, three
are needed. When the operation has finished, it has to be retired in the order it appeared in the instruction
stream. The processor keeps a queue for instructions waiting to be retired. Waiting for a place in this queue
may take up to 6 cycles. On the dashed path, where execution always takes the fast way, the overall
execution time is 4 cycles. However, on the dotted path, where it always takes the slowest way, the overall
execution time is 41 cycles.
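The two extreme paths can be checked by summing the stage latencies quoted above. The exact stage-by-stage split used below is an assumption reconstructed from the description; only the individual penalties and the totals of 4 and 41 cycles are given in the text.

```python
# Sketch: cycle counts along the fastest and slowest paths through the
# multiply instruction of Figure 14.2. The per-stage decomposition is an
# assumption; the text gives the penalties and the two totals.

FAST_PATH = {
    "fetch (cache hit)": 1,
    "issue": 1,
    "execute (small operands)": 1,
    "retire": 1,
}

SLOW_PATH = {
    "fetch (cache miss)": 30,
    "issue": 1,
    "execute (multicycle)": 3,
    "wait for retire queue": 6,
    "retire": 1,
}

def path_cycles(stages):
    """Total execution time is the sum of the stage latencies."""
    return sum(stages.values())

print(path_cycles(FAST_PATH))  # 4
print(path_cycles(SLOW_PATH))  # 41
```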
We will call any increase in execution time during an instruction's execution a timing accident and
the number of cycles by which it increases the timing penalty of this accident. Timing penalties for an
instruction can add up to several hundred processor cycles. Whether the execution of an instruction
encounters a timing accident depends on the execution state, for example, the contents of the cache(s), the
occupancy of other resources, and thus on the execution history. It is therefore obvious that the attempt
to predict or exclude timing accidents needs information about the execution history.
For certain classes of architectures, namely those without timing anomalies (cf. Section 14.1.2), excluding
timing accidents means decreasing the upper bounds. However, for those with timing anomalies this
assumption is not true.
14.1.1 Tool Architecture and Algorithm
A more or less standard architecture for timing-analysis tools has emerged [4-6]. Figure 14.3 shows
one instance of this architecture. A first phase, depicted on the left, predicts the behavior of processor
[Figure: the tool's component pipeline, beginning with a CFG builder.]
FIGURE 14.3 The architecture of the aiT timing-analysis tool.
components for the instructions of the program. It usually consists of a sequence of static analyses
of the program. Together they allow the derivation of safe upper bounds for the execution times of
basic blocks. A second phase, the column on the right, computes an upper bound on the execution
times over all possible paths of the program. This is realized by mapping the control flow of the pro-
gram to an Integer Linear Program and solving it by appropriate methods. This architecture has been
successfully used to determine precise upper bounds on the execution times of real-time programs run-
ning on processors used in embedded systems [1,7-10]. A commercially available tool, aiT by AbsInt,
cf. http://www.absint.de/wcet.htm, was implemented and is used in the aeronautics and
automotive industries.
The structure of the first phase, processor-behavior prediction, often called microarchitecture analysis,
may vary depending on the complexity of the processor architecture. A first, modular approach would be
the following:
1. Cache-behavior prediction determines statically and approximately the contents of caches at each
program point. For each access to a memory block, it is checked whether the analysis can safely
predict a cache hit.
Information about cache contents can be forgotten after the cache analysis. Only the miss/hit
information is needed by the pipeline analysis.
2. Pipeline-behavior prediction analyzes how instructions pass through the pipeline taking cache-hit
or miss information into account. The cache-miss penalty is assumed for all cases where a cache
hit cannot be guaranteed.
At the end of simulating one instruction, the pipeline analysis continues with only those states
that show the locally maximal execution times. All others can be forgotten.
14.1.2 Timing Anomalies
Unfortunately, this approach is not safe for many processor architectures. Most powerful microprocessors
have so-called timing anomalies. Timing anomalies are counterintuitive influences of the (local) execution
time of one instruction on the (global) execution time of the whole program [11]. Several processor
features can interact in such a way that a locally faster execution of an instruction can
lead to a globally longer execution time of the whole program.
For example, a cache miss contributes the cache-miss penalty to the execution time of a program. It was,
however, observed for the MCF 5307 [12], that a cache miss may actually speed up program execution.
Since the MCF 5307 has a unified cache and the fetch and execute pipelines are independent, the following
can happen: a data access that is a cache hit is served directly from the cache. At the same time, the
fetch pipeline fetches another instruction block from main memory, performing branch prediction and
replacing two lines of data in the cache. These may be reused later on and cause two misses. If the data
access was a cache miss, the instruction fetch pipeline may not have fetched those two lines, because the
execution pipeline may have resolved a misprediction before those lines were fetched.
The general case of a timing anomaly is the following. Different assumptions about the processor's
execution state, for example, the fact that the instruction is or is not in the instruction cache, will result in
a difference ΔT_local of the execution time of the instruction between these two cases. Either assumption
may lead to a difference ΔT of the global execution time compared to the other one. We say that a timing
anomaly occurs if either

ΔT_local < 0, that is, the instruction executes faster, and
  ΔT < ΔT_local, that is, the overall execution is accelerated by more than the acceleration of the
  instruction, or
  ΔT > 0, that is, the program runs longer than before;

ΔT_local > 0, that is, the instruction takes longer to execute, and
  ΔT > ΔT_local, that is, the overall execution is extended by more than the delay of the
  instruction, or
  ΔT < 0, that is, the overall execution of the program takes less time than before.

The case ΔT_local < 0 ∧ ΔT > 0 is a critical case for our timing analysis. It makes it impossible to use
local worst cases for the calculation of the program's execution time. The analysis has to follow all possible
paths as will be explained in Section 14.3.
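The case distinction above can be condensed into a small predicate over the local and global execution-time differences; the following is an illustrative sketch, not part of any tool.

```python
def is_timing_anomaly(dT_local, dT):
    """A timing anomaly occurs when the global effect dT is not bounded
    by the local effect dT_local in the intuitive way."""
    if dT_local < 0:  # instruction executes faster locally
        # anomaly: global speedup exceeds the local one, or program gets slower
        return dT < dT_local or dT > 0
    if dT_local > 0:  # instruction takes longer locally
        # anomaly: global slowdown exceeds the local one, or program gets faster
        return dT > dT_local or dT < 0
    return False

# The critical case for timing analysis: a local speedup (e.g., a cache
# hit instead of a miss) leads to a globally longer execution.
print(is_timing_anomaly(-10, 5))  # True  (dT_local < 0 and dT > 0)
print(is_timing_anomaly(10, 7))   # False (delay absorbed: 0 <= dT <= dT_local)
```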
14.1.3 Contexts
The contribution of an individual instruction to the total execution time of a program may vary widely
depending on the execution history. For example, the first iteration of a loop typically loads the caches, and
later iterations profit from the loaded memory blocks being in the caches. In this case, the execution of an
instruction in a first iteration encounters one or more cache misses and pays with the cache-miss penalty.
Later executions, however, will execute much faster because they hit the cache. A similar observation holds
for dynamic branch predictors. They may need a few iterations until they stabilize and predict correctly.
Therefore, precision is increased if instructions are considered in their control-flow context, that is,
the way control reached them. Contexts are associated with basic blocks, that is, maximally long straight-
line code sequences that can only be entered at the first instruction and left at the last. They indicate
through which sequence of function calls and loop iterations control arrived at the basic block. Thus,
when analyzing the cache behavior of a loop, precision can be increased by regarding the first iteration of
the loop and all other iterations separately; more precisely, by unrolling the loop once and then analyzing the
resulting code.¹
Definition 14.1 Let p be a program with set of functions P = {p_1, p_2, ..., p_n} and set of loops
L = {l_1, l_2, ..., l_n}. A word c over the alphabet P ∪ L ∪ IN is called a context for a basic block b, if b
can be reached by calling the functions and iterating through the loops in the order given in c.

¹Actually, this unrolling transformation need not really be performed, but can be incorporated into the iteration
strategy of the analyzer. So, we talk of virtually unrolling the loops.
Even if all loops have static loop bounds and recursion is also bounded, there are in general too many
contexts to consider them exhaustively. A heuristic is used to keep relevant contexts apart and summarize
the rest conservatively, if their influence on the behavior of instructions does not significantly differ.
Experience has shown [10] that a few first iterations and recursive calls are sufficient to stabilize the
behavior information, as the above example indicates, and that the right differentiation of contexts is
decisive for the precision of the prediction [13].
A particular choice of contexts transforms the call graph and the control-flow graph into a context-extended
control-flow graph by virtually unrolling the loops and virtually inlining the functions as indicated by the
contexts. The formal treatment of this concept is quite involved and shall not be given here. It can be
found in Reference 14.
14.2 Cache-Behavior Prediction
Abstract Interpretation [15] is used to compute invariants about cache contents. How the behavior of
programs on processor pipelines is predicted follows in Section 14.3.
14.2.1 Cache Memories
A cache can be characterized by three major parameters:
Capacity is the number of bytes it may contain.
Line size (also called block size) is the number of contiguous bytes that are transferred from memory
on a cache miss. The cache can hold at most n = capacity/line size blocks.
Associativity is the number of cache locations where a particular block may reside.
n/associativity is the number of sets of a cache.
If a block can reside in any cache location, then the cache is called fully associative. If a block can reside in
exactly one location, then it is called direct mapped. If a block can reside in exactly A locations, then the
cache is called A-way set associative. The fully associative and the direct mapped caches are special cases
of the A-way set associative cache where A = n and A = 1, respectively.
In the case of an associative cache, a cache line has to be selected for replacement when the cache is
full and the processor requests further data. This is done according to a replacement strategy. Common
strategies are LRU (Least Recently Used), FIFO (First In First Out), and random.
The set where a memory block may reside in the cache is uniquely determined by the address of the
memory block, that is, the behavior of the sets is independent of each other. The behavior of an A-way set
associative cache is completely described by the behavior of its n/A fully associative sets. This holds also
for direct mapped caches where A = 1.
For the sake of space, we restrict our description to the semantics of fully associative caches with LRU
replacement strategy. More complete descriptions that explicitly describe direct mapped and A-way set
associative caches can be found in References 8 and 16.
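The parameters above determine the cache geometry directly. A small sketch, assuming the common scheme in which the set index is the block address modulo the number of sets (the text only states that the set is uniquely determined by the address):

```python
def cache_geometry(capacity, line_size, associativity):
    """Number of blocks n and number of sets of a cache."""
    n = capacity // line_size      # total number of cache lines
    sets = n // associativity      # each set behaves independently
    return n, sets

def set_index(address, line_size, sets):
    """Set a memory block maps to, under the assumed block-address
    modulo mapping."""
    block = address // line_size   # memory block number
    return block % sets

# Example: a 4-way set associative cache with 8 KB capacity, 16-byte lines.
n, sets = cache_geometry(8 * 1024, 16, 4)
print(n, sets)                     # 512 128
print(set_index(0x1234, 16, sets))  # block 0x123 maps to set 35
```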
14.2.2 Cache Semantics
In the following, we consider a (fully associative) cache as a set of cache lines L = {l_1, ..., l_n} and the
store as a set of memory blocks S = {s_1, ..., s_m}.
To indicate the absence of any memory block in a cache line, we introduce a new element I; S' = S ∪ {I}.

Definition 14.2 (concrete cache state) A (concrete) cache state is a function c : L → S'.

C_c denotes the set of all concrete cache states. The initial cache state c_I maps all cache lines to I.
If c(l_i) = s_y for a concrete cache state c, then i is the relative age of the memory block according to the
LRU replacement strategy and not necessarily the physical position in the cache hardware.
[Figure: referencing a block s moves s into the youngest position of the cache (ages ordered from young to old); the blocks younger than its previous position age by one, and on a miss the oldest block is evicted.]
FIGURE 14.4 Update of a concrete fully associative (sub-) cache.
The update function describes the effect on the cache of referencing a block in memory. The referenced
memory block s_x moves into l_1 if it was in the cache already. All memory blocks in the cache that had
been used more recently than s_x increase their relative age by one, that is, they are shifted by one position
to the next cache line. If the referenced memory block was not yet in the cache, it is loaded into l_1 after all
memory blocks in the cache have been shifted and the oldest, that is, least recently used, memory block
has been removed from the cache if the cache was full.

Definition 14.3 (cache update) A cache update function U : C_c × S → C_c determines the new cache state
for a given cache state and a referenced memory block.
Updates of fully associative caches with LRU replacement strategy are pictured as in Figure 14.4.
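A minimal executable sketch of this LRU update, representing a cache state as a list ordered by relative age (index 0 is l_1, the youngest line) and using None in the role of the element I:

```python
# Sketch of the concrete LRU update function U of Definition 14.3 for a
# fully associative cache. Index 0 of the list is l_1 (youngest line),
# the last index the oldest; None stands for an empty line (element I).

def update(cache, s):
    """Return the new cache state after referencing memory block s."""
    new = list(cache)
    if s in new:      # hit: s moves to l_1, younger blocks age by one
        new.remove(s)
    else:             # miss: the oldest entry is shifted out (evicted)
        new.pop()
    return [s] + new

c = ["z", "y", "x", "t"]    # fully associative cache with 4 lines
c = update(c, "s")          # miss: t is evicted
print(c)                    # ['s', 'z', 'y', 'x']
c = update(c, "y")          # hit: y becomes the youngest block
print(c)                    # ['y', 's', 'z', 'x']
```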
14.2.2.1 Control Flow Representation
We represent programs by control-flow graphs consisting of nodes and typed edges. The nodes represent
basic blocks. A basic block is a sequence (of fragments) of instructions in which control flow enters
at the beginning and leaves at the end without halt or possibility of branching except at the end. For
cache analysis, it is most convenient to have one memory reference per control-flow node. Therefore,
our nodes may represent the different fragments of machine instructions that access memory. For data
references whose addresses are not precisely determined, one can use a set of possibly referenced memory blocks.
We assume that for each basic block, the sequence of references to memory is known (this is appropriate
for instruction caches but can be too restrictive for data caches and combined caches; see References 7
and 16 for weaker restrictions), that is, there exists a mapping from control-flow nodes to sequences of
memory blocks: L : V → S*.
We can describe the effect of such a sequence on a cache with the help of the update function U. There-
fore, we extend U to sequences of memory references by sequential composition: U(c, ⟨s_x1, ..., s_xy⟩) =
U(...(U(c, s_x1)), ..., s_xy).
The cache state for a path (k_1, ..., k_p) in the control-flow graph is given by applying U to the
initial cache state c_I and the concatenation of all sequences of memory references along the path:
U(c_I, L(k_1), ..., L(k_p)).
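The sequential composition above is simply a left fold of the update function over the reference sequence. A sketch building on the list-based LRU update; the mapping L used in the example is hypothetical:

```python
from functools import reduce

def update(cache, s):
    """Concrete LRU update of a fully associative cache (list ordered by
    age, youngest first; None stands for an empty line)."""
    new = list(cache)
    if s in new:
        new.remove(s)
    else:
        new.pop()
    return [s] + new

def update_seq(cache, refs):
    """U extended to a sequence of memory references by composition."""
    return reduce(update, refs, cache)

def path_state(initial, path, L):
    """Cache state after a path (k_1, ..., k_p): apply U to the
    concatenation of the reference sequences L(k) along the path."""
    return update_seq(initial, [s for k in path for s in L(k)])

c_I = [None, None]                        # initial cache: all lines empty
L = {"k1": ["a", "b"], "k2": ["a"]}.get   # hypothetical mapping L : V -> S*
print(path_state(c_I, ["k1", "k2"], L))   # ['a', 'b']
```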
The Collecting Semantics of a program gathers at each program point the set of all execution states,
which the program may encounter at this point during some execution. A semantics on which to base
a cache analysis has to model cache contents as part of the execution state. One could thus compute the
collecting semantics and project the execution states onto their cache components to obtain the set of
all possible cache contents for a given program point. However, the collecting semantics is in general
not computable.
Instead, one restricts the standard semantics to only those program constructs that involve the
cache, that is, memory references. Only they have an effect on the cache, modelled by the cache update
function U. This coarser semantics may execute program paths that are not executable in the standard
semantics. Therefore, the Collecting Cache Semantics of a program computes a superset of the set of all
concrete cache states occurring at each program point.
Definition 14.4 (Collecting Cache Semantics) The Collecting Cache Semantics of a program is

C_coll(p) = {U(c_I, L(k_1), ..., L(k_n)) | (k_1, ..., k_n) path in the CFG leading to p}
This collecting semantics would be computable, although often of enormous size. Therefore, another
step abstracts it into a compact representation, so-called abstract cache states. Note that all information
drawn from the abstract cache states allows one to safely deduce information about sets of concrete cache
states, that is, only precision may be reduced in this two-step process. Correctness is guaranteed.
14.2.3 Abstract Semantics
The specification of a program analysis consists of the specification of an abstract domain and of the
abstract semantic functions, mostly called transfer functions. The least upper bound operator of the
domain combines information when control flow merges.
We present two analyses. The must analysis determines a set of memory blocks that are in the cache at
a given program point whenever execution reaches this point. The may analysis determines all memory
blocks that may be in the cache at a given program point. The latter analysis is used to determine the
absence of a memory block in the cache.
The analyses are used to compute a categorization for each memory reference describing its cache
behavior. The categories are described in Table 14.1.
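The join operators that combine must and may information at control-flow merges, and the resulting categorization, can be sketched as follows. The age-based formulation (must join keeps the older age and intersects, may join keeps the younger age and unites) is the classic one for LRU caches and is assumed here, not taken verbatim from the definitions in this section:

```python
# Sketch of the join (least upper bound) operators of the must and may
# cache analyses and of the categorization of Table 14.1. An abstract
# cache state is a list of sets of blocks, index = relative age.

def join_must(a, b):
    """Blocks guaranteed in the cache on both paths; keep the OLDER age."""
    ages_a = {s: i for i, line in enumerate(a) for s in line}
    ages_b = {s: i for i, line in enumerate(b) for s in line}
    out = [set() for _ in range(len(a))]
    for s in ages_a.keys() & ages_b.keys():
        out[max(ages_a[s], ages_b[s])].add(s)
    return out

def join_may(a, b):
    """Blocks possibly in the cache on some path; keep the YOUNGER age."""
    ages = {}
    for cache in (a, b):
        for i, line in enumerate(cache):
            for s in line:
                ages[s] = min(i, ages.get(s, i))
    out = [set() for _ in range(len(a))]
    for s, i in ages.items():
        out[i].add(s)
    return out

def classify(s, must, may):
    """Categorize a reference to block s as ah, am, or nc (Table 14.1)."""
    if any(s in line for line in must):
        return "ah"   # always hit: guaranteed to be in the cache
    if not any(s in line for line in may):
        return "am"   # always miss: cannot be in the cache
    return "nc"       # not classified

must = join_must([{"a"}, {"b"}], [{"b"}, {"a"}])
may = join_may([{"a"}, {"b"}], [{"c"}, set()])
print(classify("a", must, may))  # 'ah': in the must cache
print(classify("d", must, may))  # 'am': not even in the may cache
print(classify("c", must, may))  # 'nc': only in the may cache
```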
The domains for our abstract interpretations consist of abstract cache states.
Definition 14.5 (abstract cache state) An abstract cache state ĉ : L → 2^S maps cache lines to sets of
memory blocks.
Ĉ denotes the set of all abstract cache states.
The position of a line in an abstract cache will, as in the case of concrete caches, denote the relative age
of the corresponding memory blocks. Note, however, that the domains of abstract cache states will have
different partial orders and that the interpretation of abstract cache states will be different in the different
analyses.
The following functions relate concrete and abstract domains. An extraction function, extr, maps a
concrete cache state to an abstract cache state. The abstraction function, abstr, maps sets of concrete cache
states to their best representation in the domain of abstract cache states. It is induced by the extraction
function. The concretization function, concr, maps an abstract cache state to the set of all concrete cache
states represented by it. It allows to interpret abstract cache states. It is often induced by the abstraction
function, cf. Reference 17.
Definition 14.6 (extraction, abstraction, concretization functions) The extraction function extr :
C_c → Ĉ forms singleton sets from the images of the concrete cache states it is applied to, that is,
extr(c)(l_i) = {s_x} if c(l_i) = s_x.
The abstraction function abstr : 2^(C_c) → Ĉ is defined by abstr(C) = ⊔{extr(c) | c ∈ C}.
The concretization function concr : Ĉ → 2^(C_c) is defined by concr(ĉ) = {c | extr(c) ⊑ ĉ}.
TABLE 14.1 Categorizations of Memory References and Memory Blocks
Category Abbreviation Meaning
Always hit ah The memory reference will always result in a cache hit.
Always miss am The memory reference will always result in a cache miss.
Not classified nc The memory reference could be classified as neither ah nor am.
So much for the commonalities of all the domains to be designed. Note that all the constructions are
parameterized in ⊔ and ⊑.
The transfer functions, the abstract cache update functions, all denoted Û, describe the effect of a
memory reference on an abstract cache state. The set of analysis states is the powerset of the set C of
pipeline states:

A = 2^C (14.1)
The analysis of a basic block in some context c yields a set of traces t_1, t_2, ..., t_m through the pipeline.
max({|t_1|, |t_2|, ..., |t_m|}) is the bound for this basic block in this context.
The set of output states {last(t_1), last(t_2), ..., last(t_m)} will be passed on to the successor block(s) in
context c as initial states.
Basic blocks (in some context) having more than one predecessor receive the union of the set of output
states as initial states.
The abstraction we use as analysis states is a set of abstract pipeline states, since the number of possible
pipeline states for one instruction is not too big. Hence, our abstraction computes an upper bound to
the collecting semantics. The abstract update for an analysis state a is thus the application of the concrete
update on each abstract pipeline state in a extended with the possibility of multiple successor states in
case of uncertainties.
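The lifting of the concrete update to sets of pipeline states can be sketched directly; the toy pipeline state and its successor function below are hypothetical stand-ins for the real ones:

```python
# Sketch of the abstract update on analysis states: an analysis state is
# a set of pipeline states, and the update applies the (possibly
# non-deterministic) concrete update to each of them.

def concrete_update(state):
    """Cycle-wise successors of one pipeline state. Uncertainty (here: an
    unknown cache outcome) yields several successor states."""
    stage, cycles = state
    if stage == "fetch":   # neither a hit nor a miss can be excluded
        return {("execute", cycles + 1), ("execute", cycles + 30)}
    return {("done", cycles + 1)}

def abstract_update(analysis_state):
    """Apply the concrete update to every pipeline state in the set and
    take the union of all successor sets."""
    return set().union(*(concrete_update(s) for s in analysis_state))

a = {("fetch", 0)}
a = abstract_update(a)
print(sorted(a))   # [('execute', 1), ('execute', 30)]
a = abstract_update(a)
print(sorted(a))   # [('done', 2), ('done', 31)]
```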
Figure 14.9 shows the possible pipeline states for a basic block in this example. Such pictures are shown
by the aiT tool upon special demand. The large dark grey boxes correspond to the instructions of the basic
block, and the smaller rectangles in them stand for individual pipeline states. Their cyclewise evolution is
indicated by the strokes connecting them. Each layer in the trees corresponds to one CPU cycle. Branches
in the trees are caused by conditions that could not be statically evaluated, for example, a memory access
with unknown address in presence of memory areas with different access times. On the other hand,
two pipeline states fall together when details they differ in leave the pipeline. This happened, for instance,
at the end of the second instruction, reducing the number of states from four to three.
The update function belonging to an edge (v, v′) of the control-flow graph updates each abstract
pipeline state separately. When the bus unit is updated, the pipeline state may split into several successor
states with different cache states. The initial analysis state is a set of empty pipeline states plus a cache
that represents a cache with unknown content. There can be multiple concrete pipeline states in the initial
that represents a cache with unknown content. There can be multiple concrete pipeline states in the initial
states, since the adjustment of internal to external clock of the processor is not known in the beginning
and every possibility (aligned, one cycle apart, etc.) has to be considered. Thus prefetching must start from
FIGURE 14.9 Possible pipeline states in a basic block.
scratch, but pending bus requests are ignored. To obtain correct results, they must be taken into account
by adding a fixed penalty to the calculated upper bounds.
14.3.4 Pipeline Modeling
The basis for pipeline analysis is a model of an abstract version of the processor pipeline, which is
conservative with respect to the timing behavior, that is, times predicted by the abstract pipeline must
never be lower than those observed in concrete executions. Some terminology is needed to avoid confusion.
Processors have concrete pipelines, which may be described in some formal language, for example, VHDL.
If this is the case, there exists a formal model of the pipeline. Our abstraction step, by which we eliminate
many components of a concrete pipeline that are not relevant for the timing behavior, leads us to an abstract
pipeline. This may again be described in a formal language, for example, VHDL, and thus have a formal
model. Deriving an abstract pipeline is a complex task. It is demonstrated for the Motorola ColdFire
processor, a processor quite popular in the aeronautics and the submarine industry. The presentation
follows closely that of Reference 18.²
14.3.4.1 The ColdFire MCF 5307 Pipeline
The pipeline of the ColdFire MCF 5307 consists of a fetch pipeline that fetches instructions from memory
(or the cache), and an execution pipeline that executes instructions, cf. Figure 14.10. Fetch and execution
pipelines are connected and, as far as speed is concerned, decoupled by a FIFO instruction buffer that can
hold at most 8 instructions.
The MCF 5307 accesses memory through a bus hierarchy. The fast pipelined K-bus connects the cache
and an internal 4KB SRAM area to the pipeline. Accesses to this bus are performed by the IC1/IC2 and the
AGEX and DSOC stages of the pipeline. On the next level, the M-Bus connects the K-Bus to the internal
peripherals. This bus runs at the external bus frequency, while the K-Bus is clocked with the faster internal
core clock. The M-Bus connects to the external bus, which accesses off-chip peripherals and memory.
The fetch pipeline performs branch prediction in the IED stage, redirecting fetching long before the
branch reaches the execution stages. The fetch pipeline is stalled if the instruction buffer is full, or if
the execution pipeline needs the bus for a memory access. All these stalls cause the pipeline to wait for
one cycle. After that, the stall condition is checked again.
The fetch pipeline is also stalled if the memory block to be fetched is not in the cache (cache miss). The
pipeline must wait until the memory block is loaded into the cache and forwarded to the pipeline. The
instructions that are already in the later stages of the fetch pipeline are forwarded to the instruction buffer.
²The model of the abstract pipeline of the MCF 5307 has been derived by hand. A computer-supported derivation
would have been preferable. Ways to develop this are the subject of current research.
[Figure: the instruction fetch pipeline (IFP) comprises the stages IAG (instruction address generation), IC1 and IC2 (instruction fetch cycles 1 and 2), and IED (instruction early decode), feeding a FIFO instruction buffer (IB); the operand execution pipeline (OEP) comprises the stages DSOC (decode & select, operand fetch) and AGEX (address generation, execute), connected to Address[31:0] and Data[31:0].]
FIGURE 14.10 The pipeline of the Motorola ColdFire 5307 processor.
The execution pipeline finishes the decoding of instructions, evaluates their operands, and executes
the instructions. Each kind of operation follows a fixed schedule. This schedule determines how many
cycles the operation needs and in which cycles memory is accessed.³ The execution time varies between
2 cycles and several dozen cycles.
Pipelining admits a maximum overlap of 1 cycle between consecutive
instructions: the last cycle of each instruction may overlap with the first of the next one. In this first cycle,
no memory access and no control-flow alteration happen. Thus, cache and pipeline cannot be affected by
two different instructions in the same cycle. The execution of an instruction is delayed if memory accesses
lead to cache misses. Misaligned accesses lead to small time penalties of 1 to 3 cycles. Store operations
are delayed if the distance to the previous store operation is less than 2 cycles. (This does not hold if the
previous store operation was issued by a MOVEM instruction.) The start of the next instruction is delayed
if the instruction buffer is empty.
14.3.5 Formal Models of Abstract Pipelines
An abstract pipeline can be seen as a big finite state machine, which makes a transition on every clock cycle.
The states of the abstract pipeline, although greatly simplified, still contain all timing-relevant information
³In fact, there are some instructions, like MOVEM, whose execution schedule depends on the value of an argument
given as immediate constant. These instructions can be taken into account by special means.
of the processor. The number of transitions it takes from the beginning of the execution of an instruction
until its end gives the execution time of that instruction.
The abstract pipeline, although greatly reduced by leaving out irrelevant components, still is a really big
finite state machine, but it has structure. Its states can be naturally decomposed into components according
to the architecture. This makes it easier to specify, verify, and implement a model of an abstract pipeline.
In the formal approach presented here, an abstract pipeline state consists of several units with inner states
that communicate with one another and the memory via signals, and evolve cycle-wise according to their
inner state and the signals received. Thus, the means of decomposition are units and signals.
Signals may be instantaneous, meaning that they are received in the same cycle as they are sent, or delayed,
meaning that they are received one cycle after they have been sent. Signals may carry data, for example,
a fetch address. Note that these signals are only part of the formal pipeline model. They may or may
not correspond to real hardware signals. The instantaneous signals between units are used to transport
information between the units. The state transitions are coded in the evolution rules local to each unit.
Figure 14.11 shows the formal pipeline model for the ColdFire MCF 5307. It consists of the following
units: IAG (instruction address generation), IC1 (instruction fetch cycle 1), IC2 (instruction fetch cycle 2),
IED (instruction early decode), IB (instruction buffer), EX (execution unit), and SST (store stall timer).
In addition, there is a bus unit modeling the buses that connect the CPU, the static RAM, the cache, and
[Figure: the units IAG, IC1, IC2, IED, IB, EX, and SST next to a bus unit; arrows carry signals such as set(a)/stop, addr(a), fetch(a), code(a), await(a), put(a), instr, start, store, next, cancel, hold, wait, read(A)/write(A), and data/hold.]
FIGURE 14.11 Abstract model of the Motorola ColdFire 5307 processor.
the main memory. The signals between these units are shown as arrows. Most units directly correspond
to a stage in the real pipeline. However, the SST unit is used to model the fact that two stores must be
separated by at least two clock cycles. It is implemented as a (virtual) counter. The two stages of the
execution pipeline are modeled by a single stage, EX, because instructions can only overlap by one cycle.
The inner states and emitted signals of the units evolve in each cycle. The complexity of this state
update varies from unit to unit. It can be as simple as a small table, mapping pending signals and inner
state to a new state and signals to be emitted, for example, for the IAG unit and the IC1 unit. It can be
much more complicated, if multiple dependencies have to be considered, for example, the instruction
reconstruction and branch prediction in the IED stage. In this case, the evolution is formulated in pseudo
code. Full details on the model can be found in Reference 19.
14.3.6 Pipeline States
Abstract Pipeline States are formed by combining the inner states of IAG, IC1, IC2, IED, IB, EX, SST,
and bus unit plus additional entries for pending signals into one overall state. This overall state evolves
from one cycle to the next. Practically, the evolution of the overall pipeline state can be implemented by
updating the functional units one by one in an order that respects the dependencies introduced by input
signals and the generation of these signals.
14.3.6.1 Update Function for Pipeline States
For pipeline modeling, one needs a function that describes the evolution of the concrete pipeline state while traveling along an edge (v, v') of the control-flow graph. This function can be obtained by iterating the cycle-wise update function of the previous paragraph.
An initial concrete pipeline state at v has an empty execution unit EX. It is updated until an instruction is sent from IB to EX. Updating of the concrete pipeline state continues, using the knowledge that the successor instruction is v', until EX has become empty again. The number of cycles needed from the beginning until this point can be taken as the time needed for the transition from v to v' for this concrete pipeline state.
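The iteration just described can be sketched as follows. The two-field state and the latency numbers are invented and drastically simpler than the real unit states:

```python
def transition_time(latencies):
    """Iterate a toy cycle-wise update, starting from a state whose EX
    unit is empty, until EX has drained again; the cycle count is the
    time attributed to the control-flow edge. `latencies` holds the
    EX-cycles of the instructions waiting in the instruction buffer IB
    (invented numbers, far simpler than the real unit states)."""
    assert latencies, "need at least one instruction in IB"
    state = {"ib": list(latencies), "ex": 0}   # ex = cycles left in EX
    cycles = 0
    entered = False
    while not (entered and state["ex"] == 0):
        if state["ex"] == 0 and state["ib"]:
            state["ex"] = state["ib"].pop(0)   # issue: IB -> EX
            entered = True
        else:
            state["ex"] -= 1                   # EX progresses one cycle
        cycles += 1
    return cycles

print(transition_time([3]))   # 4: one issue cycle plus three EX cycles
```

In the abstract model, the same iteration runs on sets of such states, and the maximum cycle count over the set gives the bound for the edge.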
14.4 Path Analysis Using Integer Linear Programming
The structure of a program and the set of program paths can be mapped to an ILP in a very natural way. A set of constraints describes the control flow of the program. Solving these constraints yields very precise results [5]. However, requirements for precision of the results demand analyzing basic blocks in different contexts, that is, distinguished by the ways in which control reached them. This makes the control flow quite complex, so that the mapping to an ILP may be very complex [14].
A problem formulated as an ILP consists of two parts: the cost function and constraints on the variables used in the cost function. Our cost function represents the number of CPU cycles. Correspondingly, it has to be maximized. Each variable in the cost function represents the execution count of one basic block of the program and is weighted by the execution time of that basic block. Additionally, variables are used corresponding to the traversal counts of the edges in the control flow graph, see Figure 14.12.
The integer constraints describing how often basic blocks are executed relative to each other can be automatically generated from the control flow graph (Figure 14.13). However, additional information about the program provided by the user is usually needed, as the problem of finding the worst-case program path is unsolvable in the general case. Loop and recursion bounds cannot always be inferred automatically and must therefore be provided by the user.
The ILP approach for program path analysis has the advantage that users are able to describe in precise terms virtually anything they know about the program by adding integer constraints. The system first generates the obvious constraints automatically and then adds user-supplied constraints to tighten the WCET bounds.
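This style of formulation can be illustrated on an if-then-else inside a loop, in the spirit of Figure 14.12. The block times and the loop bound below are invented, and a brute-force search over the integer variables stands in for a real ILP solver:

```python
from itertools import product

# Invented basic-block times in cycles for an if-then-else inside a loop:
# v1 = test, v2 = then-branch, v3 = else-branch, v4 = join.
TIME = {"v1": 2, "v2": 5, "v3": 3, "v4": 1}
LOOP_BOUND = 10   # user-supplied: the construct executes at most 10 times

def wcet_bound():
    best = 0
    # trav_then / trav_else are the edge-traversal variables of the split.
    for trav_then, trav_else in product(range(LOOP_BOUND + 1), repeat=2):
        cnt_v1 = trav_then + trav_else   # flow preservation at the split
        cnt_v4 = trav_then + trav_else   # flow preservation at the join
        if cnt_v1 > LOOP_BOUND:          # the user-supplied loop bound
            continue
        # Cost function: execution counts weighted by block times.
        cost = (TIME["v1"] * cnt_v1 + TIME["v2"] * trav_then +
                TIME["v3"] * trav_else + TIME["v4"] * cnt_v4)
        best = max(best, cost)
    return best

print(wcet_bound())   # 80: all 10 iterations take the longer then-branch
```

A real system hands the same cost function and constraints to an ILP solver, for which maximization over thousands of variables remains tractable, unlike this exhaustive search.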
2006 by Taylor & Francis Group, LLC
14-18 Embedded Systems Handbook
FIGURE 14.12 A program snippet (if v1 then ... else ... fi), the corresponding control flow graph with nodes v1, v2, v3, v4 and edges e1 to e6, and the ILP variables generated: cnt(v1), ..., cnt(v4) for the nodes and trav(e1), ..., trav(e6) for the edges.
FIGURE 14.13 Control flow joins and splits and flow-preservation laws: for a node v with incoming edges e1, ..., en and outgoing edges e'1, ..., e'm, the traversal counts satisfy trav(e1) + ... + trav(en) = cnt(v) = trav(e'1) + ... + trav(e'm).
14.5 Other Ingredients
14.5.1 Value Analysis
A static method for data-cache behavior prediction needs to know effective memory addresses of data,
in order to determine where a memory access goes. However, effective addresses are only available at run
time. Interval analysis as described by Cousot and Halbwachs [20] can help here. It can compute intervals
for address-valued objects like registers and variables. An interval computed for such an object at some
program point bounds the set of potential values the object may have when program execution reaches
this program point. Such an analysis, called value analysis in aiT, has been shown to be able to determine many effective addresses in disciplined code statically [10].
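The underlying interval arithmetic can be sketched as follows. This is a generic textbook interval domain, not aiT's implementation, and the base address, stride, and index bounds are assumed values:

```python
# Minimal interval domain for address-valued objects.
class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        # [a, b] + [c, d] = [a + c, b + d]
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def scale(self, k):
        # Multiply by a non-negative constant stride k.
        return Interval(self.lo * k, self.hi * k)

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

# An access a[i] with element size 4 at base 0x1000, where the analysis
# has already bounded the loop counter i to [0, 99]:
base = Interval(0x1000, 0x1000)
i = Interval(0, 99)
addr = base + i.scale(4)
print(addr)   # [4096, 4492]
```

Every concrete address of the access lies in the computed interval, which is enough for cache analysis to decide which memory area, and which cache sets, the access can touch.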
14.5.2 Control Flow Specification and Analysis
Any information about the possible flow of control of the program may increase the precision of the subsequent analyses. Control flow analysis may attempt to exclude infeasible paths, determine execution frequencies of paths or the relation between execution frequencies of different paths or subpaths, etc.
The purpose of control flow analysis is to determine the dynamic behavior of the program. This includes information about what functions are called and with which arguments, how many times loops iterate, if there are dependencies between successive if-statements, etc. The main focus of flow analysis has been the determination of loop bounds, since the bounding of loops is a necessary step in order to find an execution time bound for a program.
Control-flow analysis can be performed manually or automatically. Automatic analyses have been
based on various techniques, like symbolic execution, abstract interpretation, and pattern recognition
on parse trees. The best precision is achieved by using interprocedural analysis techniques, but this has
to be traded off with the extra computation time and memory required. All automatic techniques allow
a user to complement the results and guide the analysis using manual annotations, since this is sometimes
necessary in order to obtain reasonable results.
Since the flow analysis in general is performed separately from the path analysis, it does not know the
execution times of individual program statements, and must thus generate a safe (over)approximation
including all possible program executions. The path analysis will later select the path from the set of
possible program paths that corresponds to the upper bound using the time information computed by
processor behavior prediction.
Determining Bounds on Execution Times 14-19
Control flow specification is preferably done on the source level. Concepts based on source-level constructs are used in References 6 and 21.
14.5.3 Frontends for Executables
Any reasonably precise timing analysis takes fully linked executable programs as input. Source programs do not contain information about program and data allocation, which is essential for the described methods to predict the cache behavior.
Executables must be analyzed to reconstruct the original control flow of the program. This may be a difficult task depending on the instruction set of the processor and the code generation of the used
compiler. A generic approach to this problem is described in References 14, 22, and 23.
14.6 Related Work
It is not possible in general to obtain upper bounds on running times for programs. Otherwise, one could
solve the halting problem. However, real-time systems only use a restricted form of programming, which
guarantees that programs always terminate. That is, recursion is not allowed (or explicitly bounded) and
the maximal iteration counts of loops are known in advance.
A worst-case running time of a program could easily be determined if the worst-case input for the
program were known. This is in general not the case. The alternative, to execute the program with all
possible inputs, is often prohibitively expensive. As a consequence, approximations for the worst-case
execution time are determined. Two classes of methods to obtain bounds can be distinguished:
Dynamic methods employ real program executions to obtain approximations. These approxima-
tions are unsafe as they only compute the maximum of a subset of all executions.
Static methods only need the program itself, maybe extended with some additional information
(like loop bounds).
14.6.1 A (Partly) Dynamic Method
A traditional method, still used in industry, combines measuring and static methods. Here, small snippets
of code are measured for their execution time, then a safety margin is applied and the results for code
pieces are combined according to the structure of the whole task. For example, if a task first executes a snippet A and then a snippet B, the resulting time is that measured for A, t_A, added to that measured for B, t_B: t = t_A + t_B. This reduces the amount of measurements that have to be made, as code snippets tend to
be reused a lot in control software and only the different snippets need to be measured. It adds, however,
the need for an argumentation about the correctness of the composition step of the measured snippet
times. This typically relies on certain implicit assumptions about the worst-case initial execution state
for these measurements. For example, the snippets are measured with an empty cache at the beginning of
the measurement under the assumption that this is the worst-case cache state. In Reference 19 it is shown
that this assumption can be wrong. The problem of unknown worst-case input exists for this method
as well, and it is still infeasible to measure execution times for all input values.
14.6.2 Purely Static Methods
14.6.2.1 The Timing-Schema Approach
In the timing-schemata approach [24], bounds for the execution times of a composed statement are computed from the bounds of its constituents. One timing schema is given for each type of statement. The basis is formed by known times for the atomic statements. These are assumed to be constant and available from a manual, or are assumed to be computed in a preceding phase. A bound for the whole program is obtained by combining results according to the structure of the program.
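The compositional computation performed by timing schemata can be sketched as follows. The statement encoding and all cycle counts are invented for illustration:

```python
# One bound-combination rule per statement type; times of atomic
# statements are assumed constants (invented values below).
def bound(stmt):
    kind = stmt[0]
    if kind == "atomic":             # ("atomic", cycles)
        return stmt[1]
    if kind == "seq":                # ("seq", s1, s2): add the bounds
        return bound(stmt[1]) + bound(stmt[2])
    if kind == "if":                 # ("if", cond, then, else)
        return bound(stmt[1]) + max(bound(stmt[2]), bound(stmt[3]))
    if kind == "loop":               # ("loop", max_iter, cond, body)
        n = stmt[1]                  # the test runs once more than the body
        return (n + 1) * bound(stmt[2]) + n * bound(stmt[3])
    raise ValueError(kind)

prog = ("seq",
        ("if", ("atomic", 1), ("atomic", 5), ("atomic", 3)),
        ("loop", 10, ("atomic", 1), ("atomic", 4)))
print(bound(prog))   # (1 + 5) + (11*1 + 10*4) = 57
```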
The precision can be very bad because of some implicit assumptions underlying this method. Timing schemata assume compositionality of bounds for execution times, that is, they compute bounds for execution times of composed constructs from already computed bounds of the constituents. However, as we have seen, the execution times of the constituents depend heavily on the execution history.
14.6.2.2 Symbolic Simulation
Another static method simulates the execution of the program on an abstract model of the processor. The
simulation is performed without input; the simulator thus has to be capable of dealing with partly unknown
execution states. This method combines ow analysis, processor-behavior prediction, and path analysis
in one integrated phase [25,26]. One problem with this approach is that analysis time is proportional to
the actual execution time of the program with a usually large factor for doing a simulation.
14.6.2.3 WCET Determination by ILP
Li, Malik, and Wolfe proposed an ILP-based approach to WCET determination [27-30]. Cache and pipeline behavior prediction are formulated as a single linear program. The i960KB is investigated, a 32-bit microprocessor with a 512-byte direct-mapped instruction cache and a fairly simple pipeline. Only structural hazards need to be modeled, thus keeping the complexity of the integer linear program moderate compared to the expected complexity of a model for a modern microprocessor. Variable execution times, branch prediction, and instruction prefetching are not considered at all. Using this approach for superscalar pipelines does not seem very promising, considering the analysis times reported in one of the articles.
One of the severe problems is the exponential increase of the size of the ILP in the number of competing l-blocks. l-blocks are maximally long contiguous sequences of instructions in a basic block mapped to the same cache set. Two l-blocks mapped to the same cache set compete if they do not have the same address tag. For a fixed cache architecture, the number of competing l-blocks grows linearly with the size of the program. Differentiation by contexts, absolutely necessary to achieve precision, increases this number additionally. Thus, the size of the ILP is exponential in the size of the program. Even though the problem is claimed to be a network-flow problem, the size of the ILP is killing the approach. Growing associativity of the cache increases the number of competing l-blocks. Thus, increasing cache-architecture complexity also plays against this approach.
Nonetheless, their method of modeling the control flow as an ILP, the so-called Implicit Path Enumeration, is elegant and can be efficient if the size of the ILP is kept small. It has been adopted by many groups working in this area.
14.6.2.4 Timing Analysis by Static Program Analysis
The method described in this chapter uses a sequence of static program analyses for determining the program's control flow and its data accesses, and for predicting the processor's behavior for the given program.
An early approach to timing analysis using data-flow analysis methods can be found in References 31
and 32. Jakob Engblom showed how to precompute parts of a timing analyzer to speed up the actual
timing analysis for architectures without timing anomalies [33].
Reference 34 gives an overview of existing tools for timing analysis, both commercially available tools
and academic prototypes.
14.7 State of the Art and Future Extensions
The timing-analysis technology described in this chapter is realized in the aiT tool and is used in the
aeronautics and automotive industries. Several benchmarks have shown that the precision of the predicted upper bounds is on the order of 10% [10]. Obtaining such precision, however, requires competent users, since the available knowledge about the program's control flow may be difficult to specify.
The computational effort is high, but acceptable. Future optimizations will reduce this effort. As often
in static program analysis, there is a trade-off between precision and effort. Precision can be reduced if the
effort is intolerable.
The only real drawback of the described technology is the huge effort for producing abstract processor
models. Work is under way to support this activity through transformations on the VHDL level.
Acknowledgments
Many former students have worked on different parts of the method presented in this chapter and
have together built a timing-analysis tool satisfying industrial requirements. Christian Ferdinand studied cache analysis and showed that precise information about cache contents can be obtained. Stephan Thesing together with Reinhold Heckmann and Marc Langenbach developed methods to model abstract processors. Stephan went through the pains of implementing several abstract models for real-life processors such as the ColdFire MCF 5307 and the PPC 755. I owe him my thanks for help with the presentation of pipeline analysis. Henrik Theiling contributed the preprocessor technology for the analysis of executables and the translation of complex control flow to integer linear programs. Many thanks to him for his contribution to the path analysis section. Michael Schmidt implemented powerful versions of value analysis. Reinhold Heckmann managed to model even very complex cache architectures.
Florian Martin implemented the program-analysis generator, PAG, which is the basis for many of the
program analyses.
References
[1] Reinhold Heckmann, Marc Langenbach, Stephan Thesing, and Reinhard Wilhelm. The influence of processor architecture on the design and the results of WCET tools. IEEE Proceedings on Real-Time Systems, 91: 1038-1054, 2003.
[2] P. Puschner and Ch. Koza. Calculating the maximum execution time of real-time programs. Real-Time Systems, 1: 159-176, 1989.
[3] Chang Yun Park and Alan C. Shaw. Experiments with a program timing tool based on source-level timing schema. IEEE Computer, 24: 48-57, 1991.
[4] Christopher A. Healy, David B. Whalley, and Marion G. Harmon. Integrating the timing analysis of pipelining and instruction caching. In Proceedings of the IEEE Real-Time Systems Symposium, December 1995, pp. 288-297.
[5] Henrik Theiling, Christian Ferdinand, and Reinhard Wilhelm. Fast and precise WCET prediction by separated cache and path analyses. Real-Time Systems, 18: 157-179, 2000.
[6] Andreas Ermedahl. A Modular Tool Architecture for Worst-Case Execution Time Analysis. Ph.D. thesis, Uppsala University, Uppsala, Sweden, 2003.
[7] Martin Alt, Christian Ferdinand, Florian Martin, and Reinhard Wilhelm. Cache behavior prediction by abstract interpretation. In Proceedings of SAS '96, Static Analysis Symposium, Vol. 1145 of Lecture Notes in Computer Science, Springer-Verlag, Heidelberg, 1996, pp. 52-66.
[8] Christian Ferdinand, Florian Martin, and Reinhard Wilhelm. Cache behavior prediction by abstract interpretation. Science of Computer Programming, 35: 163-189, 1999.
[9] C. Ferdinand, R. Heckmann, M. Langenbach, F. Martin, M. Schmidt, H. Theiling, S. Thesing, and R. Wilhelm. Reliable and precise WCET determination for a real-life processor. In Proceedings of the First International Workshop on Embedded Software, Vol. 2211 of Lecture Notes in Computer Science, Springer-Verlag, London, 2001, pp. 469-485.
[10] Stephan Thesing, Jean Souyris, Reinhold Heckmann, Famantanantsoa Randimbivololona, Marc Langenbach, Reinhard Wilhelm, and Christian Ferdinand. An abstract interpretation-based timing validation of hard real-time avionics software systems. In Proceedings of the 2003 International Conference on Dependable Systems and Networks (DSN 2003), IEEE Computer Society, Washington, 2003, pp. 625-632.
[11] Thomas Lundqvist and Per Stenström. Timing Anomalies in Dynamically Scheduled Microprocessors. In Proceedings of the 20th IEEE Real-Time Systems Symposium, December 1999, pp. 12-21.
[12] T. Reps, M. Sagiv, and R. Wilhelm. Shape analysis and applications. In Y.N. Srikant and Priti Shankar, Eds., The Compiler Design Handbook: Optimizations and Machine Code Generation, CRC Press, Boca Raton, FL, 2002, pp. 175-217.
[13] Florian Martin, Martin Alt, Reinhard Wilhelm, and Christian Ferdinand. Analysis of loops. In Proceedings of the International Conference on Compiler Construction (CC '98), Vol. 1383 of Lecture Notes in Computer Science, Springer-Verlag, Heidelberg, 1998, pp. 80-94.
[14] Henrik Theiling. Control Flow Graphs for Real-Time Systems Analysis. Ph.D. thesis, Universität des Saarlandes, Saarbrücken, Germany, 2002.
[15] Patrick Cousot and Radhia Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Proceedings of the 4th ACM Symposium on Principles of Programming Languages, Los Angeles, CA, 1977, pp. 238-252.
[16] Christian Ferdinand. Cache Behavior Prediction for Real-Time Systems. Ph.D. thesis, Universität des Saarlandes, Saarbrücken, 1997.
[17] Flemming Nielson, Hanne Riis Nielson, and Chris Hankin. Principles of Program Analysis. Springer-Verlag, Heidelberg, 1999.
[18] Marc Langenbach, Stephan Thesing, and Reinhold Heckmann. Pipeline modelling for timing analysis. In Manuel V. Hermenegildo and German Puebla, Eds., Static Analysis Symposium SAS 2002, Vol. 2477 of Lecture Notes in Computer Science, Springer-Verlag, Heidelberg, 2002, pp. 294-309.
[19] Stephan Thesing. Safe and Precise WCET Determination by Abstract Interpretation of Pipeline Models. Ph.D. thesis, Saarland University, Saarbrücken, 2004.
[20] Patrick Cousot and Nicolas Halbwachs. Automatic discovery of linear restraints among variables of a program. In Proceedings of the 5th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Tucson, AZ, ACM Press, New York, 1978, pp. 84-96.
[21] Andreas Ermedahl and Jan Gustafsson. Deriving annotations for tight calculation of execution time. In Proceedings of Euro-Par, 1997, pp. 1298-1307.
[22] Henrik Theiling. Extracting safe and precise control flow from binaries. In Proceedings of the Seventh International Conference on Real-Time Systems and Applications, IEEE Computer Society, 2000, pp. 23-30.
[23] Henrik Theiling. Generating decision trees for decoding binaries. In ACM SIGPLAN 2001 Workshop on Languages, Compilers, and Tools for Embedded Systems, 2001, pp. 112-120.
[24] Alan C. Shaw. Reasoning about time in higher-level language software. IEEE Transactions on Software Engineering, 15: 875-889, 1989.
[25] Thomas Lundqvist and Per Stenström. An integrated path and timing analysis method based on cycle-level symbolic execution. Real-Time Systems, 17: 183-207, 1999.
[26] Thomas Lundqvist. A WCET Analysis Method for Pipelined Microprocessors with Cache Memories. Ph.D. thesis, Department of Computer Engineering, Chalmers University of Technology, Sweden, 2002.
[27] Yau-Tsun Steven Li and Sharad Malik. Performance analysis of embedded software using implicit path enumeration. In Proceedings of the 32nd ACM/IEEE Design Automation Conference, June 1995, pp. 456-461.
[28] Yau-Tsun Steven Li, Sharad Malik, and Andrew Wolfe. Efficient microarchitecture modeling and path analysis for real-time software. In Proceedings of the IEEE Real-Time Systems Symposium, December 1995, pp. 298-307.
[29] Yau-Tsun Steven Li, Sharad Malik, and Andrew Wolfe. Performance estimation of embedded software with instruction cache modeling. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, November 1995, pp. 380-387.
[30] Yau-Tsun Steven Li, Sharad Malik, and Andrew Wolfe. Cache modeling for real-time software: beyond direct mapped instruction caches. In Proceedings of the IEEE Real-Time Systems Symposium, December 1996.
[31] R. Arnold, F. Mueller, D. Whalley, and M. Harmon. Bounding worst-case instruction cache performance. In Proceedings of the IEEE Real-Time Systems Symposium, Puerto Rico, December 1994, pp. 172-181.
[32] Frank Mueller, David B. Whalley, and Marion Harmon. Predicting instruction cache behavior. In Proceedings of the ACM SIGPLAN Workshop on Language, Compiler and Tool Support for Real-Time Systems, 1994.
[33] Jakob Engblom. Processor Pipelines and Static Worst-Case Execution Time Analysis. Ph.D. thesis, Uppsala University, Uppsala, Sweden, 2002.
[34] Reinhard Wilhelm, Jakob Engblom, Stephan Thesing, and David Whalley. The determination of worst-case execution times: introduction and survey of available tools, 2004 (submitted).
15
Performance Analysis
of Distributed
Embedded Systems
Lothar Thiele and
Ernesto Wandeler
Swiss Federal Institute of
Technology
15.1 Performance Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-1
Distributed Embedded Systems • Basic Terms • Role in the Design Process • Requirements
15.2 Approaches to Performance Analysis . . . . . . . . . . . . . . . . . . . 15-6
Simulation-Based Methods • Holistic Scheduling Analysis • Compositional Methods
15.3 The Performance Network Approach . . . . . . . . . . . . . . . . . . 15-11
Performance Network • Variability Characterization • Resource Sharing and Analysis • Concluding Remarks
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-17
15.1 Performance Analysis
15.1.1 Distributed Embedded Systems
An embedded system is a special-purpose information processing system that is closely integrated into
its environment. It is usually dedicated to a certain application domain and knowledge about the system
behavior at design time can be used to minimize resources while maximizing predictability.
The embedding into a technical environment and the constraints imposed by a particular application
domain very often lead to heterogeneous and distributed implementations. In this case, systems are
composed of hardware components that communicate via some interconnection network. The functional
and nonfunctional properties of the whole system not only depend on the computations inside the various
nodes but also on the interaction of the various data streams on the common communication media.
In contrast to multiprocessor or parallel computing platforms, the individual computing nodes have
a high degree of independence and usually communicate via message passing. It is particularly difficult to
maintain global state and workload information as the local processing nodes usually make independent
scheduling and resource access decisions.
In addition, the dedication to an application domain very often leads to heterogeneous distributed
implementations, where each node is specialized to its local environment and/or its functionality. For
example, in an automotive application one may find nodes (usually called embedded control units) that contain a communication controller, a CPU, memory, and I/O interfaces. But depending on the particular
task of a node, it may contain additional digital signal processors (DSP), different kinds of CPUs and
interfaces, and different memory capacities.
The same observation holds for the interconnection networks also. They may be composed of several
interconnected smaller sub-networks, each one with its own communication protocol and topology.
For example, in automotive applications we may find Controller Area Networks (CAN), time-triggered
protocols (TTP) like in TTCAN, or hybrid protocols like in FlexRay. The complexity of a design is
particularly high if the computation nodes responsible for a single application are distributed across
several networks. In this case, critical information may flow through several sub-networks and connecting
gateways before it reaches its destination.
Recently, we see that the architectural concepts of heterogeneity, distributivity, and parallelism described earlier can be seen on several layers of granularity. The term system-on-a-chip refers to the implementation of sub-systems on a single device that contains a collection of (digital or analogue) interfaces, busses, memory, and heterogeneous computing resources such as FPGAs, CPUs, controllers, and DSPs.
These individual components are connected using networks-on-chip that can be regarded as dedicated
interconnection networks involving adapted protocols, bridges, or gateways.
Based on the assessment given, it becomes obvious that heterogeneous and distributed embedded
systems are inherently difficult to design and to analyze. In many cases, not only the availability, the safety,
and the correctness of the computations of the whole embedded system are of major concern, but also
the timeliness of the results.
One cause for end-to-end timing constraints is the fact that embedded systems are frequently
connected to a physical environment through sensors and actuators. Typically, embedded systems are
reactive systems that are in continuous interaction with their environment and they must execute at a
pace determined by that environment. Examples are automatic control tasks, manufacturing systems,
mechatronic systems, automotive/air/space applications, radio receivers and transmitters, and signal processing tasks in general. Also in the case of multimedia and content production, missing audio or
video samples need to be avoided under all circumstances. As a result, many embedded systems must meet
real-time constraints, that is, they must react to stimuli within the time interval dictated by the environ-
ment. A real-time constraint is called hard, if not meeting that constraint could result in a catastrophic
failure of the system, and it is called soft otherwise. As a consequence, time-predictability in the strong
sense cannot be guaranteed using statistical arguments.
Finally, let us give an example that shows part of the complexity in the performance and timing analysis
of distributed embedded systems. The example, adapted from Reference 1, is particularly simple in order to point out one source of difficulties, namely the interaction of event streams on a communication resource (Figure 15.1).
FIGURE 15.1 Interference of two applications on a shared communication resource. (The figure shows application A1, consisting of a sensor, CPU, memory, and I/O device, and application A2, consisting of an Input interface, DSP, and playout buffer, with their tasks P1 through P6; both applications share a common bus, and an inset sketches the bus load over time, annotated with BCET and WCET.)
Analysis of Distributed Embedded Systems 15-3
The application A1 consists of a sensor that periodically sends bursts of data to the CPU, which stores them in the memory using a task P1. These data are processed by the CPU using a task P2, with a worst-case execution time (WCET) and a best-case execution time (BCET). The processed data are transmitted via the shared bus to a hardware input/output device that is running task P3. We suppose that the CPU uses a preemptive fixed-priority scheduling policy, where P1 has the highest priority. The maximal workload on the CPU is obtained when P2 continuously uses the WCET and when the sensor simultaneously submits data. There is a second streaming application A2 that receives real-time data in equidistant packets via the Input interface. The Input interface is running task P4 to send the data to a DSP for processing with task P5. The processed packets are then transferred to a playout buffer, and task P6 periodically removes packets from the buffer, for example, for playback. We suppose that the bus uses an FCFS (first come, first served) scheme for arbitration. As the bus transactions from the applications A1 and A2 interfere on the common bus, there will be jitter in the packet stream received by the DSP that eventually may lead to an undesirable buffer overflow or underflow. It is now interesting to note that the worst-case situation in terms of jitter occurs if the processing in A1 uses its BCET, as this leads to a blocking of the bus for a long time period. Therefore, the worst-case situation for the CPU load leads to a best case for the bus, and vice versa.
In case of more realistic situations, there will be simultaneous resource sharing on the computing and
communication resources, there may be different protocols and scheduling policies on these resources,
there may be a distributed architecture using interconnected sub-networks, and there may be additional
nondeterminism caused by unknown input patterns and data. It is the purpose of performance analysis
to determine the timing and memory properties of such systems.
15.1.2 Basic Terms
As a starting point for the analysis of timing and performance of embedded systems, it is very useful to clarify a few basic terms. Very often, the timing behavior of an embedded system can be described by the time interval between a specified pair of events. For example, the instantiation of a task, the occurrence of a sensor input, or the arrival of a packet could be a start event. Such events will be denoted as arrival events. Similarly, the finishing of an application or a part of it can again be modeled as an event, denoted as a finishing event. In case of a distributed system, the physical location of the finishing event may not be equal to that of the corresponding arrival event, and the processing may require the processing of a sequence or set of tasks, and the use of distributed computing and communication resources. In this case, we talk about end-to-end timing constraints. Note that not all pairs of events in a system are necessarily critical, that is, have deadline requirements.
An embedded system processes the data associated with arrival events. The timing of computations and communications within the embedded system may depend on the input data (because of data-dependent behavior of tasks) and on the arrival pattern. In case of a conservative resource sharing strategy, such as the time-triggered architecture (TTA), the interference between these tasks is removed by applying a static sharing strategy. If the use of shared resources is controlled by dynamic policies, all activities may interact with each other and the timing properties influence each other. As shown in Section 15.1.1, it is necessary to distinguish between the following terms:
Worst case and best case. The worst case and the best case are the maximal and minimal time intervals between the arrival and finishing events under all admissible system and environment states. The execution time may vary largely, owing to different input data and interference between concurrent system activities.
Upper and lower bounds. Upper and lower bounds are quantities that bound the worst- and best-case behavior. These quantities are usually computed offline, that is, not during the runtime of the system.
Statistical measures. Instead of computing bounds on the worst- and best-case behavior, one may also determine a statistical characterization of the runtime behavior of the system, for example, expected values, variances, and quantiles.
2006 by Taylor & Francis Group, LLC
15-4 Embedded Systems Handbook
In the case of real-time systems, we are particularly interested in upper and lower bounds. They are
used to verify statically whether the system meets its timing requirements, for example, deadlines.
In contrast to the end-to-end timing properties, the term performance is less well defined. Usually,
it refers to a mixture of the achievable deadline, the delay of events or packets, and the number of events
that can be processed per time unit (throughput). There is a close relation between the delay of individual
packets or events, the necessary memory in the embedded system, and the throughput: the required
memory is proportional to the product of throughput and delay. Therefore, we will concentrate on the
delay and memory properties in this chapter.
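The proportionality between memory, throughput, and delay can be illustrated with a small back-of-the-envelope calculation in the spirit of Little's law. The function name and all numbers below are our own illustrative assumptions, not values from the chapter:

```python
import math

def buffer_estimate(throughput_pkts_per_s: float, worst_case_delay_s: float,
                    pkt_size_bytes: int) -> int:
    """Memory estimate: packets in flight (throughput x delay) times packet size."""
    pkts_in_flight = math.ceil(throughput_pkts_per_s * worst_case_delay_s)
    return pkts_in_flight * pkt_size_bytes

# 10,000 packets/s with a 2 ms worst-case delay and 64-byte packets:
# 10,000 * 0.002 = 20 packets in flight -> 20 * 64 = 1280 bytes of buffer
mem = buffer_estimate(10_000, 0.002, 64)
```

The estimate is conservative in the sense that it uses the worst-case delay; halving the delay through a faster arbitration scheme would halve the required buffer memory at the same throughput.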
Several methods exist, such as analysis, simulation, emulation, and implementation, to
determine or approximate the above quantities. Besides analytic methods based on formal models, one may
also consider simulation, emulation, or implementation. All the latter possibilities should be used with
care, as only a finite set of initial states, environment behaviors, and execution traces can be considered.
As is well known, the corner cases that lead to the worst-case execution time (WCET) or best-case
execution time (BCET) are usually not known, and thus incorrect results may be obtained. The huge
state space of realistic system architectures makes it highly improbable that the critical instances of the
execution can be determined without the help of analytical methods.
In order to understand the requirements for performance analysis methods in distributed embedded
systems, we will classify possible causes for a large difference between the worst case and best case or
between the upper and lower bounds:
Nondeterminism and interference. Let us suppose that there is only limited knowledge about the
environment of the embedded system, for example, about the time when external events arrive
or about their input data. In addition, there is interference of computation and communication
on shared resources such as CPU, memory, bus, or network. Then, we will say that the timing
properties are nondeterministic with respect to the available information. Therefore, there will be
a difference between the worst-case and the best-case behavior as well as between the associated
bounds. An example may be that the execution time of a task may depend on its input data. Another
example is the communication of data packets on a bus in case of an unknown interference.
Limited analyzability. If there is complete knowledge about the whole system, then the behavior of
the system is determined. Nevertheless, it may be that because of the system complexity, there is
no feasible way of determining close upper and lower bounds on the worst- and best-case timing,
respectively.
As a result of this discussion, we understand that methods to analyze the performance of distributed
embedded systems must be (1) correct in that they determine valid upper and lower bounds and (2) accurate
in that the determined bounds are close to the actual worst case and best case.
In contrast to other chapters of the handbook, we will concentrate on the interaction between the
task level of an embedded system and the distributed operation. We suppose that the whole application
is partitioned into tasks and threads. Therefore, the task level refers to operating system issues such as
scheduling, memory management, and arbitration of shared resources. In addition, we are faced with
applications that run on distributed resources. The corresponding layer contains methods of distributed
scheduling and networking. On this level of abstraction we are interested in end-to-end timing and
performance properties.
15.1.3 Role in the Design Process
One of the major challenges in the design process of embedded systems is to estimate essential character-
istics of the nal implementation early in the design. This can help in making important design decisions
before investing too much time in detailed implementations. Typical questions faced by a designer during
a system-level design process are: which functions should be implemented in hardware and which
in software (partitioning)? Which hardware components should be chosen (allocation)? How should the
different functions be mapped onto the chosen hardware (binding)? Do the system-level timing properties
meet the design requirements? What are the different bus utilizations and which bus or processor acts
[Figure 15.2 (diagram): the application specification and the execution platform feed into a mapping/scheduling/arbitration step, which is coupled in a loop with design space exploration and performance analysis.]
FIGURE 15.2 Relation between design space exploration and performance analysis.
as a bottleneck? Then there are also questions related to the on-chip memory requirements and off-chip
memory bandwidth.
Typically, the performance analysis or estimation is part of the design space exploration, where different
implementation choices are investigated in order to determine the appropriate design trade-offs between
the different conflicting objectives; for an overview see Reference 2. Following Figure 15.2, the estimation
of system properties in an early design phase is an essential part of the design space exploration. Different
choices of the underlying system architecture, the mapping of the applications onto this architecture, and
the chosen scheduling and arbitration schemes will need to be evaluated in terms of the different quality
criteria.
In order to achieve acceptable design times though, there is a need for automatic or semiautomatic
(interactive) exploration methods. As a result, there are additional requirements for performance analysis
if used for design space exploration, namely (1) simple reconfigurability with respect to architecture,
mapping, and resource sharing policies, (2) a short analysis time in order to be able to test many different
choices in a reasonable time frame, and (3) the possibility to cope with incomplete design information,
as typically the lower layers are not designed or implemented yet.
Even if the design space exploration as described is not a part of the chosen design methodology, the
performance analysis is often part of the development process of software and hardware. In embedded
system design, the functional correctness is validated after each major design step using simulation or
formal methods. If there are nonfunctional constraints such as deadline or throughput requirements, they
need to be validated as well, and all aspects of the design representation related to performance become
first-class citizens.
Finally, performance analysis of the whole embedded system may be done after completion of the
design, in particular if the system is operated under hard real-time conditions where timing failures lead
to a catastrophic situation. As has been mentioned earlier, performance simulation is not appropriate
in this case because the critical instances and test patterns are not known in general.
15.1.4 Requirements
Based on the discussion, one can list some of the requirements that a methodology for performance
analysis of distributed embedded systems must satisfy:
Correctness. The results of the analysis should be correct, that is, there exist no reachable system
states and feasible reactions of the system environment such that the calculated bounds are violated.
Accuracy. The lower and upper bounds determined by the performance analysis should be close to
the actual worst- and best-case timing properties.
Embedding into the design process. The underlying performance model should be sufficiently
general to allow the representation of the application (which possibly uses different specification
mechanisms), of the environment (periodic, aperiodic, bursty, different event types), of the
mapping including the resource sharing strategies (preemption, priorities, time triggered), and of
the hardware platform. The method should seamlessly integrate into the functional specification
and design methodology.
Short analysis time. Especially if the performance analysis is part of a design space exploration,
a short analysis time is important. In addition, the underlying model should allow for
reconfigurability in terms of application, mapping, and hardware platform.
As distributed systems are heterogeneous in terms of the underlying execution platform, the diverse
concurrently running applications, and the different scheduling and arbitration policies used, modularity
is a key requirement for any performance analysis method. We can distinguish between several composition
properties:
Process composition. Often, events need to be processed by several consecutive application tasks.
In this case, the performance analysis method should be modular in terms of this functional
composition.
Scheduling composition. Within one implementation, different scheduling methods can be com-
bined, even within one computing resource (hierarchical scheduling); the same property holds for
the scheduling and arbitration of communication resources.
Resource composition. A system implementation can consist of different heterogeneous computing
and communication resources. It should be possible to compose them in a similar way as processes
and scheduling methods.
Building components. Processes, associated scheduling methods, and architecture
elements should be combinable into larger components. This way, one could associate a performance
component with a combined hardware/operating system/software module of the implementation
that exposes the performance requirements but hides internal implementation details.
It should be mentioned that none of the approaches known to date are able to satisfy all of the
above mentioned criteria. On the other hand, depending on the application domain and the chosen
design approach, not all of the requirements are equally important. Section 15.2 summarizes some of the
available methods and in Section 15.3 one available method is described in more detail.
15.2 Approaches to Performance Analysis
In this survey, we select just a few representative and promising approaches that have been proposed for
the performance analysis of distributed embedded systems.
15.2.1 Simulation-Based Methods
Currently, the performance estimation of embedded systems is mainly done using simulation or trace-
based simulation. Examples of available approaches and software support are provided by the SystemC
initiative, see for example, References 3 and 4, which is supported by tools from companies such as Cadence
(nc-systemc) and Synopsys (System Studio). In simulation-based methods, many dynamic and com-
plex interactions can be taken into account, whereas analytic methods usually have to stick to a restrictive
underlying model and suffer from limited scope. In addition, there is the possibility to match the level
of abstraction in the representation of time to the required degree of accuracy. Examples of these differ-
ent layers range from cycle-accurate models, for example, those used in the simulation of processors [5], up to
networks of discrete-event components that can be modeled in SystemC.
In order to determine timing properties of an embedded system, a simulation framework not only
has to consider the functional behavior but also requires a concept of time and a way of taking into
[Figure 15.3 (diagram): input stimuli drive a cosimulation (based on an abstract architecture), which produces an abstract trace and an initial CAG; the communication topology, mapping, and arbitration protocols are then used to refine the CAG, from which the performance estimation is obtained. The cosimulation stage belongs to simulation, the remaining stages to analysis.]
FIGURE 15.3 A hybrid method for performance estimation, based on simulation and analytic methods.
account properties of the execution platform, of the mapping between functional computation and
communication processes and elements of the underlying hardware, and of resource sharing policies
(as usually implemented in the operating system or directly in hardware). This additional complexity
leads to higher computation times, and performance estimation quickly becomes a bottleneck in the
design. Besides, there is a substantial set-up effort necessary if the mapping of the application to the
underlying hardware platform changes, for example, in order to perform a design space exploration.
The fundamental problem of simulation-based approaches to performance estimation is the
insufficient corner case coverage. As shown in the example in Figure 15.1, the sub-system corner case (high
computation time of A1) does not lead to the system corner case (small computation time of A1). Designers
must provide a set of appropriate simulation stimuli in order to cover all the corner cases that exist in the
distributed embedded system. Failures of embedded systems very often relate to timing anomalies that
happen infrequently and are therefore almost impossible to discover by simulation. In general, simulation
provides estimates of the average system performance but does not yield worst-case results and cannot
determine whether the system satisfies required timing constraints.
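The coverage problem can be made concrete with a toy experiment. The data-dependent task model below is entirely our own construction (a single rare input triggers the slow path); it only illustrates why random stimuli tend to miss the worst case:

```python
import random

def exec_time(x: int) -> float:
    # Hypothetical data-dependent task: one rare input triggers a slow path.
    return 10.0 if x == 0 else 1.0

random.seed(42)
observed = [exec_time(random.randrange(10_000)) for _ in range(1_000)]
simulated_wcet = max(observed)  # usually 1.0: the corner case x == 0 is rarely drawn
true_wcet = 10.0                # known here only because we wrote the model

# Simulation can only ever underestimate the worst case:
assert simulated_wcet <= true_wcet
```

With these numbers, a thousand random runs hit the slow path with probability below 10%, so the simulated maximum typically reports a value an order of magnitude below the true WCET.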
The approach taken by Lahiri et al. [6] combines performance simulation and analysis in a hybrid
trace-based methodology. It is intended to fill the gap between pure simulation, which may be too slow to be
used in a design space exploration cycle, and analytic methods, which are often too restricted in scope and
not accurate enough. The approach as described concentrates on communication aspects of a distributed
embedded system. The performance estimation is partitioned into several stages, see Figure 15.3:
Stage 1. An initial cosimulation of the whole distributed system is performed. The simulation not
only covers functional aspects (processing of data) but also captures the communication in an
abstract manner, that is, in the form of events, tokens, or abstract data transfers. The resulting set
of traces covers essential characteristics of computation and communication but no longer contains
data values. Here, resource sharing, such as different arbitration schemes and access conflicts,
is not yet taken into account. The output of this step is a timing-inaccurate system
execution trace.
Stage 2. The traces from stage 1 are transformed into an initial Communication Analysis
Graph (CAG). One can omit unnecessary details (of the values of the data communicated, only the
size might be important here, etc.), and bursts of computation/communication events might be
clustered by identifying only the start and end times of these bursts.
Stage 3. A communication topology is chosen, the mapping of the abstract communications to
paths in the communication architecture (network, bus, point-to-point links) is specified, and
finally, the corresponding arbitration protocols are chosen.
Stage 4. In the analytic part of the whole methodology, the CAG from stage 2 is transformed
and refined using the information from stage 3. It now captures the computation, communication,
and synchronization as seen on the target system. To this end, the initial CAG is augmented to
incorporate the various latencies and additional computations introduced by moving from an
abstract communication model to an actual one.
The resulting CAG can then be analyzed in order to estimate the system performance, determine critical
paths, and collect various statistics about the computation and communication components.
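The critical-path part of such an analysis can be pictured as a longest-path computation over an acyclic graph of computation and communication nodes. The toy graph and its node names below are invented for illustration; they do not reproduce the CAG structure of Reference 6:

```python
# Toy stand-in for a refined CAG: node -> (cost, list of successors).
cag = {
    "compute_A": (3.0, ["send_bus"]),
    "send_bus":  (1.5, ["compute_B"]),
    "compute_C": (4.0, ["compute_B"]),
    "compute_B": (2.0, []),
}

def longest_path(node: str) -> float:
    """Cost of the most expensive path starting at node (graph must be acyclic)."""
    cost, succs = cag[node]
    return cost + max((longest_path(s) for s in succs), default=0.0)

# The critical path dominates the estimated end-to-end latency:
critical = max(longest_path(n) for n in cag)  # compute_A -> send_bus -> compute_B
```

On a real CAG one would additionally attach arbitration delays to the communication nodes before searching for the critical path.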
The above approach still suffers from several disadvantages. All traces are the result of a simulation, and
the coverage of corner cases is still limited. The underlying representation is a complete execution of the
application in the form of a graph that may be of prohibitive size. The effects of the transformations applied
in order to (1) reduce the size of the CAG and (2) incorporate the concrete communication architecture
are not formally specified. Therefore, it is not clear what the final analysis results represent. Finally,
because of the separation between the functional simulation and the nonfunctional analysis, no feedback
is possible. For example, a buffer overflow caused by a sporadic communication overload situation may
lead to a difference in the functional behavior. Nevertheless, the described approach blends two important
approaches to performance estimation, namely simulation and analytic methods, and makes use of the
best properties of both worlds.
15.2.2 Holistic Scheduling Analysis
There is a large body of formal methods available for scheduling of shared computing resources,
for example, fixed-priority, rate-monotonic, and earliest-deadline-first scheduling, time-triggered policies like
TDMA or round-robin, and static cyclic scheduling. From the WCET of individual tasks, the arrival pattern
of activation, and the particular scheduling strategy, one can in many cases analyze the schedulability and
worst-case response times, see for example, Reference 7. Many different application models and event
patterns have been investigated, such as sporadic, periodic, jitter, and bursts. A large number
of commercial tools exist that allow, for this one-model approach, the analysis of quantities such as resource
load and response times. In a similar way, network protocols are increasingly supported by analysis and
optimization tools.
The classical scheduling theory has been extended toward distributed systems where the application
is executed on several computing nodes and the timing properties of the communication between these
nodes cannot be neglected. The seminal work of Tindell and Clark [8] combined fixed-priority preemptive
scheduling at computation nodes with TDMA scheduling on the interconnecting bus. These results
are based on two major achievements:
The communication system (in this case, the bus) was handled in a similar way to the computing
nodes. Because of this integration of process and communication scheduling, the method was called
a holistic approach to the performance analysis of distributed real-time systems.
The second contribution was the analysis of the influence of the release jitter on the response time,
where the release jitter denotes the worst-case time difference between the arrival (or activation)
of a process and its release (making it available to the processor). Finally, the release jitter has been
linked to the message delay induced by the communication system.
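The flavor of this analysis can be sketched with the classical response-time recurrence for fixed-priority preemptive scheduling extended by release jitter. This is a simplified rendering in the spirit of Tindell and Clark; the exact formulation in Reference 8 is more general, and the task-set numbers are illustrative:

```python
import math

def response_time(C, T, J, i, max_iter=1000):
    """Worst-case response time of task i (tasks 0..i-1 have higher priority).
    C: worst-case execution times, T: periods, J: release jitters.
    Iterates w = C[i] + sum_j ceil((w + J[j]) / T[j]) * C[j] to a fixed point;
    the result is measured from arrival, hence the added J[i]."""
    w = C[i]
    for _ in range(max_iter):
        w_next = C[i] + sum(math.ceil((w + J[j]) / T[j]) * C[j] for j in range(i))
        if w_next == w:
            return w + J[i]
        w = w_next
    raise RuntimeError("no fixed point: task may be unschedulable")

# Illustrative two-task set, highest priority first:
C, T, J = [1, 5], [4, 20], [0, 2]
r = response_time(C, T, J, 1)  # jitter J[1] = 2 adds directly to the response time
```

In the holistic setting, the jitter term of a task is then set to the worst-case delay of the message that triggers it, which couples the bus analysis with the node-level analysis.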
This work was improved in terms of accuracy by Yen and Wolf [9] by taking into account correlations
between arrivals of triggering events. In the meantime, many extensions and applications have been pub-
lished based on the same line of thought. Other combinations of scheduling and arbitration policies have
been investigated, such as CAN [10] and, more recently, the FlexRay protocol [11]. The latter extension
opens the holistic scheduling methodology to mixed event-triggered and time-triggered systems where the
processing and communication are driven by the occurrence of events or the advance of time, respectively.
Nevertheless, it must be noted that the holistic approach does not scale to general distributed architec-
tures in that for every new kind of application structure, sharing of resources, and combination thereof,
a new analysis needs to be developed. In general, the model complexity grows with the size of the system
and the number of different scheduling techniques. In addition, the method is restricted to the classical
models of task arrival patterns such as periodic, or periodic with jitter.
15.2.3 Compositional Methods
Three main problems arise in the case of complex distributed embedded systems. First, the architecture
of such systems, as already mentioned, is highly heterogeneous: the different architectural components
are designed assuming different input event models and use different arbitration and resource sharing
strategies. This makes any kind of compositional performance analysis difficult. Second, applications very
often rely on a high degree of concurrency. Therefore, there are multiple control threads, which additionally
complicate timing analysis. And third, we cannot expect that an embedded system only needs to process
periodic events with a fixed number of bytes associated with each event. If, for example, the event stream
represents a sampled voice signal, then after several coding, processing, and communication steps, the
amount of data per event as well as the timing may have changed substantially. In addition, stream-based
systems often also have to process other event streams that are sporadic or bursty, for example, they have
to react to external events or deal with best-effort traffic for coding, transcription, or encryption. There
are only a few approaches available that can handle such complex interactions.
One approach is based on a unifying model of different event patterns in the form of arrival curves
as known from the networking domain, see References 12 and 13. The proposed real-time calculus (RTC)
represents the resources and their processing or communication capabilities in a compatible manner and
therefore, allows for a modular hierarchical scheduling and arbitration for distributed embedded systems.
The approach will be explained in Section 15.3 in some more detail.
Richter et al. propose in References 1, 14, and 15 a method that is based on classical real-time schedul-
ing results. They combine different well-known abstractions of event and task arrival patterns and provide
additional interfaces between them. The approach is based on the following principles:
The main goal is to make use of the very successful results in real-time scheduling, in particular for
sharing a single processor or a single communication link, see for example, References 7 and 16. For
a large class of scheduling and arbitration policies and a set of arrival patterns (periodic, periodic
with jitter, sporadic, and bursty), upper and lower bounds on the response time can be determined,
that is, the time difference between the arrival of a task and its finishing time. Therefore, the
abstraction of a task of the application consists of a triggering event stream with a certain arrival
pattern and the WCET and BCET on the resource. Several tasks can be mapped onto a single resource.
Together with the scheduling policy, one can obtain for each task the associated lower and upper
bounds of the response time. In a similar way, communication and shared buses can be handled.
The application model is a simple concatenation of several tasks. The end-to-end delay can now
be obtained by adding the individual contributions of the tasks; the necessary buffer memory can
simply be computed taking into account the initial arrival pattern.
Obviously, the approach is feasible only if the arrival patterns fit the few basic models for which
results on computing bounds on the response time are available. In order to overcome this
limitation, two types of interfaces are defined:
(a) EMIF. Event Model Interfaces are used in the performance analysis only. They perform
a type conversion between certain arrival patterns, that is, they change the mathematical
representation of event streams.
(b) EAF. Event Adaptation Functions need to be used in cases where no EMIF exists. In this
case, the hardware/software implementation must be changed in order to make the system
analyzable, for example, by adding playout buffers at appropriate locations.
In addition, a new set of six arrival patterns was defined in Reference 1 which is more suitable for the
proposed type conversion using EMIF and EAF, see Figure 15.4.
In Figure 15.5, the example of Figure 15.1 is extended by adding to the tasks P1 to P6 appropriate
arrival patterns (event stream abstractions) and EMIF/EAF interfaces. For example, we suppose that there
is an analysis method for the bus arbitration scheme available that requires periodic with jitter as the
input model. As the transformation from periodic with burst requires an EAF, the implementation must
be changed to accommodate a buffer that smoothens the bursts. From periodic to periodic with jitter,
[Figure 15.4 (diagram): some of the arrival patterns. Periodic: t_{i+1} − t_i = T. Periodic w/jitter: t_i = i·T + w_i + φ_0 with 0 ≤ w_i ≤ J, shown for J < T; periodic w/burst is the same model with J > T. Sporadic events satisfy t_{i+1} − t_i ≥ d. Shaded regions mark the admissible occurrences of events.]
FIGURE 15.4 Some arrival patterns of tasks that can be used to characterize properties of event streams in Reference 1. T, J, and d denote the period, jitter, and minimal interarrival time, respectively; φ_0 denotes a constant phase shift.
[Figure 15.5 (diagram): tasks P1 to P6 with the Sensor, Memory, and Buffer of Figure 15.1; the event streams are annotated with arrival patterns (periodic, periodic w/jitter, periodic w/burst, sporadic), and EAF/EMIF interfaces are inserted around the bus communications C1 and C2 and before task P3.]
FIGURE 15.5 Example of event stream interfaces for the example in Figure 15.1.
one can construct a lossless EMIF simply by setting the jitter J = 0. There is another interface between
communication C1 and task P3 that converts the bursty output of the bus to a sporadic model. Now,
one can apply performance analysis methods to all of the components. As a result, one may determine the
minimal buffer size and an appropriate scheduling policy for the DSP such that no overflow or underflow
occurs.
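The two EMIF conversions used in this example can be sketched as pure type conversions on the model parameters. The function names are ours, and the sporadic conversion rests on the periodic-with-jitter model t_i = i·T + w_i with 0 ≤ w_i ≤ J; it is a sketch, not the EMIF catalog of Reference 1:

```python
def periodic_to_jitter(T: float) -> tuple[float, float]:
    """Lossless EMIF: a periodic stream is periodic-with-jitter with J = 0."""
    return (T, 0.0)

def jitter_to_sporadic(T: float, J: float) -> float:
    """Conservative EMIF: with t_i = i*T + w_i and 0 <= w_i <= J, consecutive
    events are at least T - J apart, so d = T - J is a safe minimal
    interarrival time."""
    assert J < T, "for J >= T events may cluster; an EAF (buffer) is needed"
    return T - J

assert periodic_to_jitter(10.0) == (10.0, 0.0)
assert jitter_to_sporadic(10.0, 2.0) == 8.0
```

The guard in `jitter_to_sporadic` mirrors the distinction in the text: once the jitter exceeds the period (the bursty case), no EMIF exists and an EAF, that is, an implementation change, becomes necessary.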
Several extensions have been worked out, for example, in order to deal with cyclic nonfunctional
dependencies and to generalize the application model. Nevertheless, when compared against the requirements
for a modular performance analysis, the approach has some inherent drawbacks. EAFs are a consequence of the
limited class of supported event models and the available analysis methods: the analysis method enforces
a change in the implementation. Furthermore, the approach is not modular in terms of the resources,
as their service is not modeled explicitly. For example, if several scheduling policies need to be combined
in one resource (hierarchical scheduling), then for each new combination an appropriate analysis method
must be developed. In this way, the approach suffers from the same problem as the holistic approach
described earlier. In addition, one is bound to the classical arrival patterns, which are not sufficient in the case
of stream-processing applications. Other event models need to be converted with loss in accuracy (EMIF)
or the implementation must be changed (EAF).
15.3 The Performance Network Approach
This section describes an approach to the performance analysis of embedded systems that is influenced by
the worst-case analysis of communication networks. The network calculus as described in Reference 17
is based on Reference 18 and uses (max,+)-algebra to formulate the necessary operations. The network
calculus is a promising analysis methodology as it is designed to be modular in various respects and as the
representation of event (or packet) streams is not restricted to the few classes mentioned in Section 15.2.3.
In References 12 and 19, the method has been extended to the RTC in order to deal with distributed
embedded systems by combining computation and communication. Because of the detailed modeling of
the capability of the shared computing and communication resources as well as the event streams, a high
accuracy can be achieved, see Reference 20. The following sections serve to explain the basic approach.
In addition, the main performance analysis method is not bound to the use of the RTC. Instead,
any suitable abstraction of event streams and resource characterization is possible. Only the actual
computations that are done within the components of the performance network need to be changed
appropriately.
15.3.1 Performance Network
In functional specification and verification, the given application is usually decomposed into components
that communicate via event interfaces. The properties of the whole system are investigated by
combining the behavior of the components. This kind of representation is common in the design of
complex embedded systems and is supported by many tools and standards, for example, UML. It would be
highly desirable if the performance analysis followed the same line of thinking, as it could then be integrated into
the usual design methodology easily. Considering the discussion given earlier, we can identify two major
additions that are necessary:
Abstraction. Performance analysis is interested in making statements about the timing behavior
not just for one specific input characterization but for a larger class of possible environments.
Therefore, the concrete event streams that flow between the components must be represented in
an abstract way. As an example, we have seen their characterization as periodic or sporadic with
jitter. In the same way, the nonfunctional properties of the application and the resource sharing
mechanisms must be modeled appropriately.
Resource modeling. In comparison to functional validation, we need to model the resource capabil-
ities and how they are changed by the workload of tasks or communication. Therefore, in contrast
to the approaches described before, we will model the resources explicitly as first-class citizens of
the approach.
As an example of a performance network, let us look again at the simple example from Figure 15.1 and
Figure 15.5. In Figure 15.6, we see a corresponding performance network. Because of the simplicity of the
example, not all the modeling possibilities can be shown.
On the left-hand side, you see the abstract inputs which model the sources of the event streams that
trigger the tasks of the applications: Timer represents the periodic instantiation of the task that reads
out the buffer for playback, Sensor models the periodic bursty events from the sensor, and RT data
denotes the real-time data arriving in equidistant packets via the Input interface. The associated abstract event
streams are transformed by the performance components. At the top, you can see the resource modules that
model the service of the shared resources, for example, the Input, CPU, Bus, and I/O components. The
abstract resource streams (vertical direction) interact with the event streams on the performance modules
and performance components. The resource interfaces at the bottom represent the remaining resource
service that is available to other applications that may run on the execution platform.
The performance components represent (1) the way in which the timing properties of input event streams
are transformed into timing properties of output event streams and (2) the transformation of the resources.
Of course, these components can be hierarchically grouped into larger components. The way in which the
[Figure 15.6 (diagram): a performance network with resource modules Input, CPU, Bus, and DSP along the top; abstract inputs Sensor, RT data, and Timer on the left; performance components P1 to P6 and C1, C2 connected by horizontal abstract event streams and vertical abstract resource streams; resource interfaces, including I/O, at the bottom.]
FIGURE 15.6 A simple performance network related to the example in Figure 15.1.
performance components are grouped and their transfer functions reflect the resource sharing strategy.
For example, P1 and P2 are connected serially in terms of the resource stream and therefore model
a fixed-priority scheme with the high priority assigned to task P1. If the bus implements an FCFS strategy
or a time-triggered protocol (TTP), the transfer function of C1/C2 needs to be determined such that the abstract
representations of the event and resource streams are correctly transformed.
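The serial connection in terms of the resource stream can be sketched on a discrete grid of interval lengths: the service left over by the high-priority component is handed down to the next one. The sup-based formula and the curve values below are our own simplification of the RTC composition, not the full calculus:

```python
def remaining_service(beta_l, alpha_u):
    """Lower remaining service after a greedy high-priority component:
    out[k] = max over 0 <= m <= k of (beta_l[m] - alpha_u[m]), clamped at 0.
    beta_l[k] / alpha_u[k] are lower service / upper arrival for interval length k."""
    out, best = [], 0.0
    for b, a in zip(beta_l, alpha_u):
        best = max(best, b - a)
        out.append(max(0.0, best))
    return out

beta_l  = [0, 1, 2, 3, 4, 5]   # full processor: one unit of service per time slot
alpha_u = [1, 1, 2, 2, 3, 3]   # demand of the high-priority task P1
beta_p2 = remaining_service(beta_l, alpha_u)  # service curve handed down to P2
```

Chaining calls of this kind, one per priority level, is exactly the vertical resource-stream direction in Figure 15.6: the output service curve of one performance component is the input service curve of the next.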
15.3.2 Variability Characterization
The timing characterization of event and resource streams is based on Variability Characterization Curves
(VCCs), which substantially generalize classical representations such as the sporadic or periodic models. As the
event streams propagate through the distributed architecture, their timing properties become increasingly
complex and the standard patterns cannot model them with adequate accuracy.
The event streams are described using arrival curves $\bar\alpha^u(\Delta), \bar\alpha^l(\Delta): \mathbb{R}_{\ge 0} \to \mathbb{R}_{\ge 0}$, which
provide upper and lower bounds on the number of events in any time interval of length $\Delta$. In particular,
there are at most $\bar\alpha^u(\Delta)$ and at least $\bar\alpha^l(\Delta)$ events within the time interval $[t, t+\Delta)$ for all $t \ge 0$.
In a similar way, the resource streams are characterized using service functions $\beta^u(\Delta), \beta^l(\Delta): \mathbb{R}_{\ge 0} \to \mathbb{R}_{\ge 0}$,
which provide upper and lower bounds on the available service in any time interval of length $\Delta$.
The unit of service depends on the kind of shared resource, for example, instructions (computation)
or bytes (communication).
Note that, as defined above, the VCCs $\bar\alpha^u(\Delta)$ and $\bar\alpha^l(\Delta)$ are expressed in terms of events (this is marked
by a bar on their symbol), while the VCCs $\beta^u(\Delta)$ and $\beta^l(\Delta)$ are expressed in terms of workload/service.
A method to transform event-based VCCs into workload/resource-based VCCs and vice versa is presented
later in this chapter. All calculations and transformations presented here are valid both with only event-based
and with only workload/resource-based VCCs, but in this chapter mainly the event-based formulation
is used.
Figure 15.7 shows arrival curves that specify the basic classical models shown in Figure 15.4. Note that
in the case of sporadic patterns, the lower arrival curves are 0. In a similar way, Figure 15.8 shows a service
curve of a simple TDMA bus access with period T, bandwidth b, and slot length τ.
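To make the construction concrete, the closed forms for the classical patterns and for TDMA can be evaluated directly. The following sketch is our own illustration, not code from this chapter (the function names are invented; the slot parameter corresponds to the TDMA description above):

```python
import math

def arrival_curves_pjd(delta, T, J=0.0, d=0.0):
    """Upper/lower arrival curves (events in any interval of length delta)
    for the periodic (T) / jitter (J) / burst-distance (d) model;
    J = d = 0 gives the strictly periodic case."""
    if delta <= 0:
        return 0, 0
    upper = math.ceil((delta + J) / T)
    if d > 0:
        # bursts cannot be closer together than the minimum distance d
        upper = min(upper, math.ceil(delta / d))
    lower = max(0, math.floor((delta - J) / T))
    return upper, lower

def tdma_service_lower(delta, T, slot, b):
    """Lower service curve of a TDMA resource: one slot of length `slot`
    per period T, bandwidth b; in the worst case the interval starts
    just after a slot has ended."""
    full, rem = divmod(delta, T)
    return b * (full * slot + max(0.0, rem - (T - slot)))
```

For instance, a strictly periodic stream with T = 5 yields two events in any interval of length 10, while a TDMA slot of length 2 in a period of 10 guarantees no service at all over intervals shorter than a full period.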
Analysis of Distributed Embedded Systems 15-13
[Figure 15.7 has three panels, Periodic, Periodic w/jitter, and Periodic w/bursts, each plotting $\bar\alpha^u$ and $\bar\alpha^l$ against the interval length $\Delta$ in terms of the period $T$, the jitter $J$, and the minimum inter-arrival distance $d$.]
FIGURE 15.7 Basic arrival functions related to the patterns described in Figure 15.4.
[Figure 15.8 plots the TDMA slot allocation over time together with the resulting service curves $\beta^u$ and $\beta^l$; their slope within a slot is given by the bandwidth $b$.]
FIGURE 15.8 Example of a service curve that describes a simple TDMA protocol.
Note that arrival curves can be approximated by piecewise linear functions. Moreover, there are of course
finite representations of the arrival and service curves, for example, obtained by decomposing them into
an irregular initial part and a periodic part.
Where do we get the arrival and service functions from, for example, those characterizing a processor
(CPU in Figure 15.6) or an abstract input (Sensor in Figure 15.6)?
Pattern. In some cases, the pattern of the event or resource stream is known, for example,
bursty, periodic, sporadic, or TDMA. In this case, the functions can be constructed analytically,
see, for example, Figure 15.7 and Figure 15.8.
Trace. In the case of unknown arrival or service patterns, one may use a set of traces and compute
their envelope. This can be done easily by sliding a window of size Δ over a trace and determining the
maximum and minimum number of events (or service) within the window.
Data sheets. In other cases, one can derive the curves by deriving bounds from the characteristics
of the generating device (in the case of an arrival curve) or the hardware component (in the case of
a service curve).
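The trace-based item above can be sketched as follows. This is an illustrative implementation of our own, not from the chapter; it assumes a half-open window $[t, t+\Delta)$ as in the arrival-curve definition and only checks window positions where the event count can actually change:

```python
def arrival_envelope_from_trace(timestamps, deltas, eps=1e-9):
    """Empirical arrival-curve envelope of an event trace: for each
    window length delta, slide [t, t + delta) over the trace and record
    the maximum/minimum number of events inside.  Window starts are
    placed at t = 0, at each event, and just after each event (the only
    points where the count can change)."""
    ts = sorted(timestamps)
    end = ts[-1]                      # observation horizon of the trace
    upper, lower = {}, {}
    for delta in deltas:
        starts = [0.0] + ts + [x + eps for x in ts]
        hi, lo = 0, None
        for t in starts:
            n = sum(1 for x in ts if t <= x < t + delta)
            hi = max(hi, n)
            # only windows fully inside the trace count for the minimum
            if t + delta <= end + eps:
                lo = n if lo is None else min(lo, n)
        upper[delta] = hi
        lower[delta] = 0 if lo is None else lo
    return upper, lower
```

For a strictly periodic trace with period 5 the empirical envelope reproduces the analytical curves, for example, at most two events and at least one event in any window of length 6.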
The performance components transform abstract event and resource streams. So far, however, the arrival
curve is defined in terms of events per time interval whereas the service curve is given in terms of service per
time interval. One possibility to bridge this gap is to define the concept of workload curves that relate
the number of successive events in an event stream to the maximal or minimal associated workload.
They capture the variability in execution demands.
The upper and lower workload curves $\gamma^u(e), \gamma^l(e): \mathbb{N}_{\ge 0} \to \mathbb{R}_{\ge 0}$ denote the maximal and minimal workload
on a specific resource for any sequence of $e$ consecutive events. If these curves are available, then we
can easily determine upper and lower bounds on the workload that an event stream imposes in any time
interval of length $\Delta$ on a resource as $\alpha^u(\Delta) = \gamma^u(\bar\alpha^u(\Delta))$ and $\alpha^l(\Delta) = \gamma^l(\bar\alpha^l(\Delta))$, respectively, and
analogously, $\bar\beta^u(\Delta) = (\gamma^l)^{-1}(\beta^u(\Delta))$ and $\bar\beta^l(\Delta) = (\gamma^u)^{-1}(\beta^l(\Delta))$. As in the case of the arrival and service
curves, the question arises as to where the workload curves can come from. A selection of possibilities
is given below:
WCET and BCET. The simplest possibility is to (1) assume that each event of an event stream
triggers the same task and (2) assume that this task has a given WCET (worst-case execution time)
and BCET (best-case execution time) determined by other
[Figure 15.9, left panel: $\gamma^u$ and $\gamma^l$ for a task with WCET = 4 and BCET = 3. Right panel: $\gamma^u$ and $\gamma^l$ for a task refined into subtasks with individual workloads.]
FIGURE 15.9 Two examples of modeling the relation between incoming events and the associated workload on
a resource. The left-hand side shows a simple modeling in terms of the WCET and BCET of the task triggered by an
event. The right-hand side models the workload generated by a task through a finite state machine. The workload
curves can be constructed by considering the maximum or minimum weight paths with e transitions.
methods. An example of an associated workload curve is given in Figure 15.9. The same holds for
communication events as well.
Application modeling. The above method models the fact that not all events lead to the same
execution load (or number of bits) by simply using upper and lower bounds on the execution time.
The accuracy of this approach can be substantially improved if characteristics of the application are
taken into account, for example, (1) distinguishing between different event types, each triggering
a different task, and (2) modeling that it is not possible for many consecutive events to all have the
WCET (or BCET). This way, one can model correlations in event streams, see Reference 21.
The right-hand side of Figure 15.9 shows a simple example where a task is refined into a set
of subtasks. At each incoming event, a subtask generates the associated workload and the program
branches to one of its successors.
Trace. As in the case of arrival curves, we can use a given trace and record the workload associated
with each event, for example, by simulation. Based on this information, we can easily compute the
upper and lower envelopes.
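For the WCET/BCET item above, the workload curves reduce to linear scaling and the pseudo-inverse to flooring. A minimal sketch, using the WCET = 4 and BCET = 3 values of Figure 15.9 (the helper names and the example arrival and service curves are our own assumptions):

```python
import math

# Simplest workload model (left of Figure 15.9): every event triggers the
# same task, so gamma_u(e) = e * WCET and gamma_l(e) = e * BCET.
WCET, BCET = 4, 3
gamma_u = lambda e: e * WCET
gamma_l = lambda e: e * BCET

def events_to_workload(bar_alpha_u, delta):
    """alpha_u(delta) = gamma_u(bar_alpha_u(delta)): worst-case workload
    demanded on the resource in any interval of length delta."""
    return gamma_u(bar_alpha_u(delta))

def service_to_events(beta_l, delta):
    """bar_beta_l(delta) = gamma_u^{-1}(beta_l(delta)): the number of
    events guaranteed to be served when each may cost up to WCET."""
    return math.floor(beta_l(delta) / WCET)

bar_alpha_u = lambda d: math.ceil(d / 10)   # periodic events, T = 10
beta_l = lambda d: 0.5 * d                  # half a service unit per time
```

With these example curves, an interval of length 25 contains at most 3 events, hence at most 12 units of demanded workload, while 40 time units of the slow service guarantee that 5 events are processed.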
A more fine-grained modeling of an application is also possible, for example, by taking into account
different event types in event streams, see Reference 22. By the same approach, it is also possible to model
more complex task models, for example, a task with different production and consumption rates of events
or tasks with several event inputs, see Reference 23. Moreover, the same modeling holds for the load on
the communication links of the execution platform.
In order to construct a scheduling network according to Figure 15.6, we still need to take into account
the resource sharing strategy.
15.3.3 Resource Sharing and Analysis
In Figure 15.6, we see, for example, that the performance modules associated with tasks P1 and P2 are
connected serially. This way, we can model a preemptive fixed-priority resource sharing strategy, as P2
only gets the CPU resource that is left after the workload of P1 has been served. Other resource sharing
strategies can be modeled as well, see, for example, Figure 15.10, where in addition a proportional share
policy is modeled on the left. In this case, a fixed portion of the available resource (computation or
communication) is associated with each task. Other sharing strategies are possible as well, such as FCFS [17].
In the same Figure 15.10, we also see how the workload characterization described in the last section
is used to transform the incoming arrival curve into a representation in terms of the workload for
a resource. After the transformation of the incoming stream by a block called RTC (real-time calculus), the
inverse workload transformation may be applied in order to characterize the stream again by means of
events per time interval. This way, the performance modules can be freely combined, as their input and
output representations match.
FIGURE 15.10 Two examples of resource sharing strategies and their model in the RTC.
FIGURE 15.11 Functional model of resource sharing on computation and communication resources.
We still need to describe how a single workload stream and a resource stream interact on a resource. The
underlying model and analysis depend very much on the underlying execution platform. As a common
example, we suppose that the events (or data packets) corresponding to a single stream are stored in
a queue before being processed, see Figure 15.11. The same model is used for computation as well as for
communication resources. It matches well the common structure of operating systems, where ready tasks
are lined up until the processor is assigned to one of them. Events belonging to one stream are processed
in an FCFS manner, whereas the order between different streams depends on the particular resource sharing
strategy.
Following this model, one can derive the equations that describe the transformation of arrival and
service curves by an RTC module according to Figure 15.10, see, for example, Reference 13:

$$\bar\alpha^{\prime u} = \left[\left(\bar\alpha^u \otimes \bar\beta^u\right) \oslash \bar\beta^l\right] \wedge \bar\beta^u$$
$$\bar\alpha^{\prime l} = \left(\bar\alpha^l \oslash \bar\beta^u\right) \otimes \bar\beta^l$$
$$\bar\beta^{\prime u} = \left(\bar\beta^u - \bar\alpha^l\right) \mathbin{\bar\oslash} 0$$
$$\bar\beta^{\prime l} = \left(\bar\beta^l - \bar\alpha^u\right) \mathbin{\bar\otimes} 0$$
Following Reference 24, the operators used are called min-plus/max-plus convolutions,
$$(f \otimes g)(t) = \inf_{0 \le u \le t} \{ f(t-u) + g(u) \}$$
$$(f \mathbin{\bar\otimes} g)(t) = \sup_{0 \le u \le t} \{ f(t-u) + g(u) \}$$
FIGURE 15.12 Representation of the delay and accumulated buffer space computation in a performance network.
and min-plus/max-plus deconvolutions,
$$(f \oslash g)(t) = \sup_{u \ge 0} \{ f(t+u) - g(u) \}$$
$$(f \mathbin{\bar\oslash} g)(t) = \inf_{u \ge 0} \{ f(t+u) - g(u) \}$$
The pseudo-inverse of the workload curves, the accumulated service, and the resulting bounds are given by
$$(\gamma_i^u)^{-1}(W) = \sup\{ e \ge 0 : \gamma_i^u(e) \le W \}, \quad 1 \le i \le N$$
$$\bar\beta_i^l(\Delta) = (\gamma_i^u)^{-1}\left(\beta_i^l(\Delta)\right), \quad 1 \le i \le N$$
$$\bar\beta^l(\Delta) = \bar\beta_1^l(\Delta) \otimes \bar\beta_2^l(\Delta) \otimes \cdots \otimes \bar\beta_N^l(\Delta)$$
$$\mathrm{delay} \le \sup_{\Delta \ge 0}\, \inf\{ \tau \ge 0 : \bar\alpha^u(\Delta) \le \bar\beta^l(\Delta + \tau) \}$$
$$\mathrm{backlog} \le \sup_{\Delta \ge 0} \{ \bar\alpha^u(\Delta) - \bar\beta^l(\Delta) \}$$
The curve $(\gamma^u)^{-1}(W)$ denotes the pseudo-inverse of a workload curve, that is, it yields the minimum
number of events that can be processed if the service $W$ is available. Therefore, $\bar\beta_i^l(\Delta)$ is the minimal
available service in terms of events per time interval. It has been shown in Reference 17 that the delay
and backlog are determined by the accumulated service $\bar\beta^l(\Delta)$, which can be obtained using the convolution
of all individual services. The delay and backlog can now be interpreted as the maximal horizontal and
vertical distance, respectively, between the arrival and accumulated service curves, see Figure 15.12.
All of the above computations can be implemented efficiently if appropriate representations of the
variability characterization curves are used, for example, piecewise linear, discrete points, or periodic
representations.
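On curves sampled at integer interval lengths 0, 1, ..., H, the operators and bounds above can be sketched as follows. This discrete-point version is our own simplification (practical implementations work on piecewise linear or periodic representations), and the horizon H must be chosen long enough for the service to catch up with the arrivals:

```python
def conv(f, g):
    """Min-plus convolution (f (x) g)(t) = min over 0<=u<=t of f(t-u) + g(u)."""
    return [min(f[t - u] + g[u] for u in range(t + 1)) for t in range(len(f))]

def deconv(f, g):
    """Min-plus deconvolution (f (/) g)(t) = sup over u>=0 of f(t+u) - g(u),
    truncated to the finite horizon."""
    H = len(f) - 1
    return [max(f[t + u] - g[u] for u in range(H - t + 1)) for t in range(H + 1)]

def remaining_service(bl, au):
    """beta'_l = (beta_l - alpha_u) max-plus-convolved with the zero
    function: a running maximum of bl - au, clipped at zero since the
    remaining service cannot be negative."""
    out, best = [], 0.0
    for d in range(len(bl)):
        best = max(best, bl[d] - au[d])
        out.append(max(best, 0.0))
    return out

def delay_backlog(au, bl):
    """Maximal horizontal / vertical distance between the arrival curve au
    and the (accumulated) lower service curve bl (cf. Figure 15.12)."""
    H = len(au) - 1
    backlog = max(au[d] - bl[d] for d in range(H + 1))
    delay = 0
    for d in range(H + 1):
        tau = 0
        while d + tau <= H and bl[d + tau] < au[d]:
            tau += 1
        delay = max(delay, tau)
    return delay, backlog
```

In a fixed-priority chain such as P1/P2 in Figure 15.6, the output of `remaining_service` for P1 would be fed to P2 as its lower service curve, and the accumulated service of several components is obtained with `conv`.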
15.3.4 Concluding Remarks
Because of the modularity of the performance network, one can easily analyze a large number of different
mapping and resource sharing strategies for design space exploration. Applications can be extended by
adding tasks and performance modules. Moreover, different subsystems can use different kinds of resource
sharing without sacrificing the performance analysis.
Of particular interest is the possibility to build a performance component for a combined hardware/software
system that describes the performance properties of a whole subsystem. This way, a subcontractor
can deliver a hardware/software/operating system module that already contains part of the application.
The system house can then integrate the performance components of the subsystems in order to validate
the performance of the whole system. To this end, it does not need to know the details of the subsystem
implementations. In addition, a system house can also add an application to the subsystems. Using
the resource interfaces that characterize the remaining service available from the subsystems, its timing
correctness can easily be verified.
On the one hand, the performance network approach is correct in the sense that it yields upper and lower
bounds on quantities such as end-to-end delay and buffer space. On the other hand, it is a worst-case
approach that covers all possible corner cases independent of their probability. Even though the deviations
from simulation results can be small, see, for example, Reference 25, in many cases one is also interested in
the average-case behavior of distributed embedded systems. Therefore, performance analysis methods such as
those described in this chapter can be considered complementary to existing simulation-based
validation methods.
Furthermore, any automated or semiautomated exploration of different design alternatives (design
space exploration) could be separated into multiple stages, each having a different level of abstraction.
It would then be appropriate to use an analytical performance evaluation framework, such as those
described in this chapter, during the initial stages and resort to simulation only when a relatively small set
of potential architectures has been identified.
References
[1] K. Richter, D. Ziegenbein, M. Jersak, and R. Ernst. Model composition for scheduling analysis
in platform design. In Proceedings of the 39th Design Automation Conference (DAC). ACM Press,
New Orleans, LA, June 2002.
[2] Lothar Thiele, Simon Künzli, and Eckart Zitzler. A modular design space exploration framework
for embedded systems. IEE Proceedings: Computers and Digital Techniques, Special Issue on
Embedded Microelectronic Systems, 2004.
[3] SystemC homepage. http://www.systemc.org.
[4] T. Grötker, S. Liao, G. Martin, and S. Swan. System Design with SystemC. Kluwer Academic
Publishers, Boston, May 2002.
[5] Doug Burger and Todd M. Austin. The SimpleScalar tool set, version 2.0. SIGARCH Computer
Architecture News, 25, 1997, 13–25.
[6] K. Lahiri, A. Raghunathan, and S. Dey. System-level performance analysis for designing on-chip
communication architectures. IEEE Transactions on Computer-Aided Design of Integrated Circuits
and Systems, 20, 2001, 768–783.
[7] G.C. Buttazzo. Hard Real-Time Computing Systems: Predictable Scheduling Algorithms and
Applications. Kluwer Academic Publishers, Boston, 1997.
[8] K. Tindell and J. Clark. Holistic schedulability analysis for distributed hard real-time systems.
Microprocessing and Microprogramming (Euromicro Journal), Special Issue on Parallel Embedded
Real-Time Systems, 40, 1994, 117–134.
[9] T. Yen and W. Wolf. Performance estimation for real-time distributed embedded systems. IEEE
Transactions on Parallel and Distributed Systems, 9, 1998, 1125–1136.
[10] K. Tindell, A. Burns, and A.J. Wellings. Calculating controller area network (CAN) message
response times. Control Engineering Practice, 3, 1995, 1163–1169.
[11] T. Pop, P. Eles, and Z. Peng. Holistic scheduling and analysis of mixed time/event triggered distributed
embedded systems. In Proceedings of the International Symposium on Hardware/Software
Codesign (CODES). ACM Press, May 2002, pp. 187–192.
2006 by Taylor & Francis Group, LLC
15-18 Embedded Systems Handbook
[12] L. Thiele, S. Chakraborty, M. Gries, A. Maxiaguine, and J. Greutert. Embedded software in
network processors: models and algorithms. In Proceedings of the 1st Workshop on Embedded
Software (EMSOFT), Lake Tahoe, CA, USA, Vol. 2211 of Lecture Notes in Computer Science.
Springer-Verlag, Heidelberg, 2001, pp. 416–434.
[13] L. Thiele, S. Chakraborty, M. Gries, and S. Künzli. A framework for evaluating design tradeoffs in
packet processing architectures. In Proceedings of the 39th Design Automation Conference (DAC).
ACM Press, New Orleans, LA, June 2002, pp. 880–885.
[14] Kai Richter, Marek Jersak, and Rolf Ernst. A formal approach to MPSoC performance verification.
IEEE Computer, 36, 2003, 60–67.
[15] K. Richter and R. Ernst. Model interfaces for heterogeneous system analysis. In Proceedings of
the 6th Design, Automation and Test in Europe (DATE). IEEE, Munich, Germany, March 2002,
pp. 506–513.
[16] J.A. Stankovic, M. Spuri, K. Ramamritham, and G.C. Buttazzo. Deadline scheduling for real-time
systems: EDF and related algorithms. In Kluwer International Series in Engineering and Computer
Science, Vol. 460. Kluwer Academic Publishers, Dordrecht, 1998.
[17] J.-Y. Le Boudec and P. Thiran. Network Calculus: A Theory of Deterministic Queuing Systems for
the Internet, Vol. 2050 of Lecture Notes in Computer Science. Springer-Verlag, Heidelberg, 2001.
[18] R.L. Cruz. A calculus for network delay, part I: network elements in isolation. IEEE Transactions
on Information Theory, 37, 1991, 114–131.
[19] L. Thiele, S. Chakraborty, and M. Naedele. Real-time calculus for scheduling hard real-time
systems. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS),
Vol. 4. IEEE, 2000, pp. 101–104.
[20] S. Chakraborty, S. Künzli, L. Thiele, A. Herkersdorf, and P. Sagmeister. Performance evaluation
of network processor architectures: combining simulation with analytical estimation. Computer
Networks, 41, 2003, 641–665.
[21] Alexander Maxiaguine, Simon Künzli, and Lothar Thiele. Workload characterization model for
tasks with variable execution demand. In Proceedings of Design Automation and Test in Europe
(DATE). IEEE Press, Paris, France, February 2004, pp. 1040–1045.
[22] Ernesto Wandeler, Alexander Maxiaguine, and Lothar Thiele. Quantitative characterization of
event streams in analysis of hard real-time applications. In Proceedings of the 10th IEEE Real-Time
and Embedded Technology and Applications Symposium (RTAS). IEEE Computer Society, May 2004,
pp. 450–459.
[23] Ernesto Wandeler and Lothar Thiele. Abstracting functionality for modular performance analysis
of hard real-time systems. In Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE,
January 2005.
[24] F. Baccelli, G. Cohen, G. Olsder, and J.-P. Quadrat. Synchronization and Linearity. John Wiley &
Sons, New York, 1992.
[25] S. Chakraborty, S. Künzli, and L. Thiele. A general framework for analysing system properties in
platform-based embedded system designs. In Proceedings of the 6th Design, Automation and Test
in Europe (DATE). Munich, Germany, March 2003.
Power Aware Computing

16 Power Aware Embedded Computing
Margarida F. Jacome and Anand Ramachandran
16
Power Aware Embedded Computing

Margarida F. Jacome and Anand Ramachandran
University of Texas
16.1 Introduction 16-1
16.2 Energy and Power Modeling 16-3
    Instruction- and Function-Level Models • Micro-Architectural Models • Memory and Bus Models • Battery Models
16.3 System/Application Level Optimizations 16-7
16.4 Energy Efficient Processing Subsystems 16-8
    Voltage and Frequency Scaling • Dynamic Resource Scaling • Processor Core Selection
16.5 Energy Efficient Memory Subsystems 16-11
    Cache Hierarchy Tuning • Novel Horizontal and Vertical Cache Partitioning Schemes • Dynamic Scaling of Memory Elements • Software-Controlled Memories, Scratch-Pad Memories • Improving Access Patterns to Off-Chip Memory • Special Purpose Memory Subsystems for Media Streaming • Code Compression • Interconnect Optimizations
16.6 Summary 16-17
References 16-17
16.1 Introduction
Embedded systems are pervasive in modern life. State-of-the-art embedded technology drives the ongoing
revolution in consumer and communication electronics, and is at the basis of substantial innovation
in many other domains, including medical instrumentation, process control, etc. [1]. The impact of
embedded systems in well-established traditional industrial sectors, for example, the automotive industry,
is also increasing at a fast pace [1,2].
Unfortunately, as Complementary Metal-Oxide Semiconductor (CMOS) technology rapidly scales,
enabling the fabrication of ever faster and denser Integrated Circuits (ICs), the challenges that must be
overcome to deliver each new generation of electronic products multiply. In the last few years, power
dissipation has emerged as a major concern. In fact, projections on power density increases owing to
CMOS scaling clearly indicate that this is one of the fundamental problems that will ultimately preclude
further scaling [3,4]. Although the power challenge is indeed considerable, much can be done to mitigate
the deleterious effects of power dissipation, thus enabling performance and device density to be taken to
truly unprecedented levels by the semiconductor industry throughout the next 10 to 15 years.
Power density has a direct impact on packaging and cooling costs, and can also affect system reliability,
owing to electromigration and hot-electron degradation effects. Thus, the ability to decrease power
density, while offering similar performance and functionality, critically enhances the competitiveness of
a product. Moreover, for battery operated portable systems, maximizing battery lifetime translates into
maximizing duration of service, an objective of paramount importance for this class of products. Power
is thus a primary figure of merit in contemporary embedded system design.
Digital CMOS circuits have two main types of power dissipation: dynamic and static. Dynamic power
is dissipated when the circuit performs the function(s) it was designed for, for example, logic and arithmetic
operations (computation), data retrieval, storage, transport, etc. Ultimately, all of this activity
translates into switching of the logic states held on circuit nodes. Dynamic power dissipation is thus
proportional to $C \cdot V_{DD}^2 \cdot f \cdot r$, where $C$ denotes the total circuit capacitance, $V_{DD}$ and $f$ denote the
circuit supply voltage and clock frequency, respectively, and $r$ denotes the fraction of transistors expected
to switch at each clock cycle [5,6]. In other words, dynamic power dissipation is impacted to first order
by circuit size/complexity, speed/rate, and switching activity. In contrast, static power dissipation is associated
with preserving the logic state of circuit nodes between such switching activity, and is caused by
subthreshold leakage mechanisms. Unfortunately, as device sizes shrink, the severity of leakage power is
increasing at an alarming pace [3].
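To illustrate the first-order model, the following sketch (with invented, merely illustrative parameter values) shows why combined voltage and frequency scaling is so attractive: voltage enters quadratically and frequency linearly, so halving both cuts dynamic power by roughly a factor of eight.

```python
def dynamic_power(c, vdd, f, r):
    """First-order dynamic power: P ~ C * Vdd^2 * f * r, with total
    capacitance, supply voltage, clock frequency, and activity factor."""
    return c * vdd ** 2 * f * r

# Two hypothetical operating points of a made-up core:
p_fast = dynamic_power(c=1e-9, vdd=1.2, f=600e6, r=0.2)   # about 0.17 W
p_slow = dynamic_power(c=1e-9, vdd=0.6, f=300e6, r=0.2)   # about 0.02 W
# p_fast / p_slow is roughly 8: a cubic payoff for scaling Vdd and f together.
```

Note that the energy per operation falls only quadratically (the slower run takes twice as long), which is exactly the trade-off exploited by the voltage and frequency scaling techniques discussed later in this chapter.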
Clearly, the power problem must be addressed at all levels of the design hierarchy, from system to
circuit, as well as through innovations in CMOS device technology [5,7,8]. In this survey we provide
a snapshot of the state of the art in system- and architecture-level design techniques and methodologies
aimed at reducing both static and dynamic power dissipation. Since such techniques focus on the
highest level of the design hierarchy, their potential benefits are immense. In particular, at this high level
of abstraction, the specifics of each particular class of embedded applications can be considered as a
whole and, as will be shown in our survey, such an ability is critical to designing power/energy efficient
systems, that is, systems that spend energy strictly when and where it is needed. Broadly speaking, this
requires a proper design and allocation of system resources, geared toward addressing critical performance
bottlenecks in a power efficient way. Substantial power/energy savings can also be achieved through
the implementation of adequate dynamic power management policies, for example, tracking instantaneous
workloads (or levels of resource utilization) and shutting down idling/unused resources, so as to
reduce leakage power, or slowing down under-utilized resources, so as to decrease dynamic power dissipation.
These are clearly system level decisions/policies, in that their implementation typically impacts
several architectural subsystems. Moreover, different decisions/policies may interfere or conflict with
each other and, thus, assessing their overall effectiveness requires a system level (i.e., global) view of the
problem.
A typical embedded system architecture consists of a processing subsystem (including one or more processor
cores, hardware accelerators, etc.), a memory subsystem, peripherals, and global and local interconnect
structures (buses, bridges, crossbars, etc.). Figure 16.1 shows an abstract view of two such architecture
instances. Broadly speaking, system level design consists of defining the specific embedded system architecture
to be used for a particular product, as well as defining how the target embedded application
(implementing the required functionality/services) is to be mapped onto that architecture.
Embedded systems come in many varieties and with many distinct design optimization goals and
requirements. Even when two products provide the same basic functionality, say, video encoding/decoding,
they may have fundamentally different characteristics, namely, different performance and quality-of-service
requirements, one may be battery operated and the other not, etc. The implications of such product
differentiation are of paramount importance when power/energy are considered. Clearly, the higher
the system's required performance/speed (defined by metrics such as throughput, latency, bandwidth,
response time, etc.), the higher will be its power dissipation. The key objective is thus to minimize
the power dissipated to deliver the required level of performance [5,9]. The trade-offs, techniques, and
optimizations required to develop such power aware or power efficient designs vary widely across the vast
spectrum of embedded systems available in today's market, encompassing many complex decisions driven
by system requirements as well as intrinsic characteristics of the target applications [10].
[Figure 16.1 depicts two architectures built from components such as a DSP core, a master control unit, an ASIP core with memory controller, a sound codec, a modem, A/D and D/A converters, I/O ports, RAM and Flash/ROM, a host interface, VLIW cores, an embedded processor core, and a hardware accelerator (FFT, DCT, ...).]
FIGURE 16.1 Illustrative examples of a simple and a more complex embedded system architecture.
Consider, for example, the design task of deciding on the number and type of processing elements
to be instantiated on an embedded architecture, that is, defining its processing subsystem. Power, performance,
cost, and time-to-market considerations dictate whether one should rely entirely on readily available
processors (i.e., off-the-shelf microcontrollers, Digital Signal Processors (DSPs), and/or general-purpose
RISC cores), or should also consider custom execution engines, namely, Application Specific Instruction
set Processors (ASIPs), possibly reconfigurable, and/or hardware accelerators (see Figure 16.1).
Hardware/software partitioning is a critical step in this process [1]. It consists of deciding which of
an application's segments/functions should be implemented in software (i.e., run on a processor core) and
which (if any) should be implemented in hardware (i.e., execute on high performance, highly power efficient
custom hardware accelerators). Naturally, hardware/software partitioning decisions should reflect
the power/performance criticality of each such segment/function. Clearly, this is a complex multiobjective
optimization problem defined on a huge design space that encompasses both hardware and software
related decisions. To compound the problem, the performance and energy efficiency of an architecture's
processing subsystem cannot be evaluated in isolation, since its effectiveness can be substantially impacted
by the memory subsystem (i.e., the adopted memory hierarchy/organization) and the interconnect structures
supporting communication/data transfers between processing components and to/from the environment
in which the system is embedded. Thus, decisions with respect to these other subsystems and components
must be made concurrently and assessed jointly.
Targeting up front a specific embedded system platform, that is, an architectural subspace relevant to
a particular class of products/applications, can considerably reduce the design effort [11,12]. Still, the
design space typically remains so complex that a substantial design space exploration may be needed
in order to identify power/energy efficient solutions for the specified performance levels. Since time to
market is critical, methodologies to efficiently drive such an exploration, as well as fast simulators and
low complexity (yet good fidelity) performance, power, and energy estimation models, are critical to
aggressively exploiting effective power/energy driven optimizations within a reasonable time frame.
Our survey starts by providing an overview of state-of-the-art models and tools used to evaluate
the goodness of individual system design points. We then discuss power management techniques and
optimizations aimed at aggressively improving the power/energy efficiency of the various subsystems of
an embedded system.
16.2 Energy and Power Modeling
This section discusses high-level modeling and power estimation techniques aimed at assisting system-
and architecture-level design. It would be unrealistic to expect a high degree of accuracy in power estimates
produced during such an early design phase, since accurate power modeling requires detailed physical
level information that may not yet be available. Moreover, highly accurate estimation tools (working with
detailed circuit/layout-level information) would be too time consuming to allow for any reasonable degree
of design space exploration [1,5,13].
Thus, practically speaking, power estimation during early design space exploration should aim at
ensuring a high degree of fidelity rather than necessarily accuracy. Specifically, the primary objective
during this critical exploration phase is to assess the relative power efficiency of different candidate system
architectures (populated with different hardware and/or software components), the relative effectiveness
of alternative software implementations (of the same functionality), the relative effectiveness of different
power management techniques, etc. Estimates that correctly expose such relative power trends across
the design space region being explored provide the designer with the necessary information to guide the
exploration process.
16.2.1 Instruction- and Function-Level Models
Instruction-level power models are used to assess the relative power/energy efficiency of different processors
executing a given target embedded application, possibly with alternative memory subsystem configurations.
Such models are thus instrumental during the definition of the main subsystems of an embedded
architecture, as well as during hardware/software partitioning. Moreover, instruction-level power models
can also be used to evaluate the relative effectiveness of different software implementations of the same
embedded application in the context of a specific embedded architecture/platform.
In their most basic form, instruction-level power models simply assign a power cost to each assembly
instruction (or class of assembly instructions) of a programmable processor. The overall energy consumed
by a program running on the target processor is estimated by summing up the instruction costs for
a dynamic execution trace that is representative of the application [14–17].
Instruction-level power models were first developed by experimentally measuring the current drawn by
a processor while executing different instruction sequences [14]. During this first modeling effort, it was
observed that the power cost of an instruction may actually depend on previous instructions. Accordingly,
the instruction-level power models developed in Reference 14 include several inter-instruction effects.
Later studies observed that, for certain processors, the power dissipation incurred by the hardware responsible
for fetching, decoding, analyzing, and issuing instructions, and then routing and reordering results,
was so high that a simpler model that only differentiates between instructions that access on-chip resources
and those that go off-chip would suffice for such processors [16].
Unfortunately, power estimation based on instruction-level models can still be prohibitively time-consuming
during early design space exploration, since it requires collecting and analyzing large instruction
traces and, for many processors, considering a quadratically large number of inter-instruction effects.
In order to accelerate estimation, coarser, processor-specific function-level power models were later
developed [18]. Such approaches are faster because they rely on the use of macromodels characterizing
the average energy consumption of a library of functions/subroutines executing on a target processor [18].
The key challenge in this case is to devise macromodels that can properly quantify the power consumed by
each subroutine of interest, as a function of easily observable parameters. Thus, for example, a quadratic
power model of the form an² + bn + c could first be tentatively selected for an insertion sort routine, where
n denotes the number of elements to be sorted. Actual power dissipation then needs to be measured for
a large number of experiments, run with different values of n. Finally, the values of the macromodel's
coefficients a, b, and c are derived using regression analysis, and the overall accuracy of the resulting
macromodel is assessed [18].
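The regression step can be sketched as below: the routine fits the coefficients a, b, and c of the quadratic macromodel to measured (n, energy) pairs by ordinary least squares, solving the 3×3 normal equations directly. The data points passed in would come from the measurement experiments described above; here they are synthetic:

```python
def fit_quadratic(ns, energies):
    """Least-squares fit of E(n) = a*n**2 + b*n + c, returning (a, b, c)."""
    # Power sums S0..S4 and moment sums for the normal equations.
    s = [sum(n ** k for n in ns) for k in range(5)]
    t = [sum(e * n ** k for n, e in zip(ns, energies)) for k in range(3)]
    A = [[s[4], s[3], s[2]],
         [s[3], s[2], s[1]],
         [s[2], s[1], s[0]]]
    r = [t[2], t[1], t[0]]
    # Gaussian elimination with partial pivoting on the 3x3 system.
    for i in range(3):
        p = max(range(i, 3), key=lambda k: abs(A[k][i]))
        A[i], A[p] = A[p], A[i]
        r[i], r[p] = r[p], r[i]
        for k in range(i + 1, 3):
            f = A[k][i] / A[i][i]
            for j in range(i, 3):
                A[k][j] -= f * A[i][j]
            r[k] -= f * r[i]
    x = [0.0, 0.0, 0.0]
    for i in range(2, -1, -1):
        x[i] = (r[i] - sum(A[i][j] * x[j] for j in range(i + 1, 3))) / A[i][i]
    return tuple(x)
```

Fitting data generated from 2n² + 3n + 1 recovers a ≈ 2, b ≈ 3, c ≈ 1, as expected.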
The high-level instruction- and function-level power models discussed so far allow designers to
quickly assess a large number of candidate system architectures and alternative software implementations,
so as to narrow the design space to a few promising alternatives. Once this initial broad exploration is
concluded, power models for each of the architecture's main subsystems and components are needed, in
order to support the detailed architectural design phase that follows.
2006 by Taylor & Francis Group, LLC
Power Aware Embedded Computing 16-5
16.2.2 Micro-Architectural Models
Micro-architectural power models are critical to evaluating the impact of different processing subsystem
choices on power consumption, as well as the effectiveness of different (micro-architecture level)
power management techniques implemented on the various subsystems.
In the late 1980s and early 1990s, cycle-accurate (or, more precisely, cycle-by-cycle) simulators, such
as Simplescalar [19], were developed to study the effect of architectural choices on the performance of
general-purpose processors. Such simulators are in general very flexible, allowing designers/architects
to explore the complex design space of contemporaneous processors. Namely, they include built-in
parameters that can be used to specify the number and mix of functional units to be instantiated in the
processor's datapath, the issue width of the machine, the size and associativity of the L1 and L2 caches, etc.
By varying such parameters, designers can study the performance of different machine configurations for
representative applications/benchmarks. As power consumption became more important, simulators to
estimate dynamic power dissipation (e.g., Wattch [20], the Cai-Lim model [21], and SimplePower [22])
were later incorporated into these existing frameworks. Such an integration was performed seamlessly,
by directly augmenting the cycle-oriented performance models for the various micro-architectural
components with corresponding power models.
Naturally, the overall accuracy of these simulation-based power estimation techniques is determined
by the level of detail of the power models used for the micro-architecture's constituent components. For
out-of-order RISC cores, for example, the power consumed in finding independent instructions to issue
is a function of the number of instructions currently in the instruction queue and of the actual dependen-
cies between such instructions. Unfortunately, the use of detailed power models accurately capturing the
impact of input and state data on the power dissipated by each component would prohibitively increase the
already long micro-architectural simulation runtimes. Thus, most state-of-the-art simulators use very
simple/straightforward empirical power models for datapath and control logic, and slightly more soph-
isticated models for regular structures such as caches [20]. In their simplest form, such models capture
typical or average power dissipation for each individual micro-architectural component. Specifically,
each time a given component is accessed/used during a simulation run, it is assumed that it dissipates
its corresponding average power. Slightly more sophisticated power macromodels for datapath com-
ponents have been proposed in References 2328, and shown to improve accuracy with a relatively small
impact on simulation time.
So far, we have discussed power modeling of micro-architectural components, yet a substantial percent-
age of the overall power budget of a processor is actually spent on the global clock (up to 40 to 45% [5]).
Thus, global clock power models must also be incorporated in these frameworks. The power dissipated
on global clock distribution is impacted to first order by the number of pipeline registers (and thus by
a processor's pipeline depth) and by global and local wiring capacitances (and thus by a processor's core
area) [5]. Accordingly, different processor cores and/or different configurations of the same core may
dissipate substantially different clock distribution power. Power estimates incorporating such numbers
are thus critical during processor core selection and conguration.
The component-level and clock distribution models discussed so far are used to estimate the dynamic
power dissipation of a target micro-architecture. Yet, as mentioned earlier, static/leakage power dissipation
is becoming a major concern, and thus, micro-architectural techniques aimed at reducing leakage power
are increasingly relevant. Models to support early estimation of static power dissipation emerged along the
same lines as those used for dynamic power dissipation. The Butts-Sohi model, which is one of the most
influential static power models developed so far, quantifies static energy in CMOS circuits/components
using a lumped parameter model that maps technology and design effects into corresponding characterizing
parameters [29]. Specifically, static power dissipation is modeled as V_DD · N · k_design · I_leak,
where V_DD is the supply voltage and N denotes the number of transistors in the circuit. k_design is the
design-dependent parameter: it captures circuit-style-related characteristics of a component, including
average transistor aspect ratio, average number of transistors switched off during normal/typical
component operation, etc. Finally, I_leak is the technology-dependent parameter. It accounts for the impact
of threshold voltage, temperature, and other key parameters, on leakage current, for a specic fabrication
process.
From a system designer's perspective, static power can be reduced by lowering supply voltage (V_DD),
and/or by power supply gating or V_DD gating (as opposed to clock gating) unused/idling devices (N).
Integrating models for estimating static power dissipation into cycle-by-cycle simulators thus enables
embedded system designers to analyze critical static power versus performance trade-offs enabled by
power aware features available in contemporaneous processors, such as dynamic voltage scaling and
selective datapath (re)configuration. An improved version of the Butts-Sohi model, providing the
ability to dynamically recalculate leakage currents (as temperature and voltage change owing to operating
conditions and/or dynamic voltage scaling), has been integrated into the Simplescalar simulation
framework as HotLeakage, enabling such high-level trade-offs to be explored by embedded system
designers [30].
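The lumped Butts-Sohi estimate reduces to a one-line computation. The second routine below is only a hypothetical exponential temperature dependence, with an arbitrary coefficient, added to illustrate why a HotLeakage-style framework must recalculate I_leak at runtime; it is not the model's actual equation:

```python
import math

def static_power(vdd, n_transistors, k_design, i_leak):
    """Butts-Sohi lumped model: P_static = V_DD * N * k_design * I_leak [29]."""
    return vdd * n_transistors * k_design * i_leak

def i_leak_scaled(i_leak_ref, temp_k, temp_ref_k=353.0, beta=0.02):
    """Hypothetical exponential temperature scaling of the leakage
    parameter (leakage grows roughly exponentially with temperature)."""
    return i_leak_ref * math.exp(beta * (temp_k - temp_ref_k))
```

For example, a 50-million-transistor component at V_DD = 1.2 V with k_design = 0.1 and I_leak = 20 nA per transistor dissipates 0.12 W of static power under this model.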
16.2.3 Memory and Bus Models
Storage elements such as caches, register files, queues, buffers, and tables constitute a substantial part of
the power budget of contemporaneous embedded systems [31]. Fortunately, the high regularity of some
such memory structures (e.g., caches) permits the use of simple, yet reasonably accurate power estimation
techniques, relying on automatically synthesized structural designs for such components.
The Cache Access and Cycle Time (CACTI) framework implements this synthesis-driven power estimation
paradigm. Specifically, given a specific cache hierarchy configuration (defined by parameters such
as cache size, associativity, and line size), as well as information on the minimum feature size of the target
technology [32], it internally generates a coarse structural design for such a cache configuration. It then
derives delay and power estimates for that particular design, using parameterized built-in C models for
the various constituent elements, namely, SRAM cells, row and column decoders, word and bit lines,
precharge circuitry, etc. [33,34].
CACTI's synthesis algorithms used to generate the structural design of the memory hierarchy (which
include defining the aspect ratio of memory blocks, the number of instantiated sub-banks, etc.) have
been shown to consistently deliver reasonably good designs across a large range of cache hierarchy
parameters [34]. CACTI can thus be used to quickly generate power estimates (starting from high-level
architectural parameters) exhibiting reasonably good fidelity over a large region of the design
space. During design space exploration, the designer may thus consider a number of alternative L1 and
L2 cache configurations and use CACTI to obtain access-based power dissipation estimates for each
such configuration with good fidelity. Naturally, the memory access traces used by CACTI should be
generated by a micro-architecture simulator (e.g., Simplescalar) working with a memory simulator (e.g.,
Dinero [35]), so that they reflect the bandwidth requirements of the embedded application of interest.
Buses are also a significant contributor to dynamic power dissipation [5,36]. The dynamic power
dissipation on a bus is proportional to C · V_DD² · fa, where C denotes the total capacitance of the bus
(including metal wires and buffers), V_DD denotes the supply voltage, and fa denotes the average switching
frequency of the bus [36]. In this high-level model, the average switching frequency of the bus (fa) is
defined by the product of two terms, namely, the average number of bus transitions per word and the
bus frequency (given in bus words per second). The average number of bus transitions per word can be
estimated by simulating sample programs and collecting the corresponding transition traces. Although
this model is coarse, it may suffice during the early design phases under consideration.
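This bus power model can be sketched directly: count bit transitions over a word trace, then apply C · V_DD² · fa. The trace values and electrical parameters used in the test are made up for illustration:

```python
def avg_transitions_per_word(trace):
    """Average number of bit flips per transmitted word, over a bus trace
    of successive word values."""
    flips = sum(bin(prev ^ cur).count("1")
                for prev, cur in zip(trace, trace[1:]))
    return flips / (len(trace) - 1)

def bus_power(c_total, vdd, words_per_sec, trace):
    """P = C * V_DD^2 * fa, where fa = (avg transitions/word) * (words/sec)."""
    fa = avg_transitions_per_word(trace) * words_per_sec
    return c_total * vdd ** 2 * fa
```

For instance, a trace alternating between 0b0000 and 0b1111 toggles four lines per word; at 10 pF total capacitance, 1.0 V, and one million words per second, the model gives 40 microwatts.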
16.2.4 Battery Models
The capacity of a battery is a nonlinear function of the current drawn from it; that is, if one increases
the average current drawn from a battery by a factor of two, the remaining deliverable battery capacity,
and thus its lifetime, decreases by more than half. Peukert's formula models such nonlinear behavior by
defining the capacity of a battery as k/I^α, where I is the average discharge current and k and α are
experimentally determined, battery-specific constants.
[Figure: y-axis shows energy-delay product, in mJ · cycles (×100,000).]
FIGURE 16.3 Design space exploration: energy-delay product for various L1 and L2 D-cache configurations for a
JPEG application running on an XScale-like processor core.6
6 Specifically, the energy term accounts for accesses to the L1 and L2 on-chip D-caches, and to main memory.
a high degree of associativity, clearly indicating that no substantial performance gains would be achieved
(for this particular media application) by using such aggressively dimensioned memory subsystems.
For many embedded systems/applications, the power efficiency of the memory subsystems can be
improved even more aggressively, yet this requires the use of novel (nonstandard) memory system designs,
as discussed in the sections that follow.
16.5.2 Novel Horizontal and Vertical Cache Partitioning Schemes
In recent years, several novel cache designs have been proposed to aggressively reduce the average dynamic
energy consumption incurred by memory accesses. Energy efficiency is improved in these designs by
taking direct advantage of specific characteristics of target classes of applications. The memory footprint
of instructions and data in media applications, for example, tends to be very small, thus creating unique
opportunities for energy savings [75]. Since streaming media applications are pervasive in today's portable
electronics market, they have been a preferred application domain for validating the effectiveness of such
novel cache designs.
Vertical partition schemes [80–83], as the name suggests, introduce additional small buffers/caches
before the first level of the traditional memory hierarchy. For applications with small working sets, this
strategy can lead to considerable dynamic power savings.
A concrete example of a vertical partition scheme is the filter cache [84], which is a very small cache
placed in front of the standard L1 data cache. If the filter cache is properly dimensioned, dynamic energy
consumption in the memory hierarchy can be substantially reduced, not only by accessing most of the
data from the filter cache, but also by powering down (clock gating) the L1 D-cache to a STANDBY
mode during periods of inactivity [84]. Although switching the L1 D-cache to STANDBY mode results in
delay/energy penalties when there is a miss in the filter cache, it was observed that for media applications,
the energy-delay product did improve quite significantly when the two techniques were combined.
Predecoded instruction buffers [85] and loop buffers [86] are variants of the vertical partitioning scheme
discussed above, yet applied to instruction caches (I-caches). The key idea of the first partitioning
scheme mentioned is to store recently used instructions in an instruction buffer, in a decoded form,
so as to reduce the average dynamic power spent on fetching and decoding instructions. The second
partitioning scheme allows one to hold time-critical loop bodies (identified a priori by the compiler or by
the programmer) in small, and thus energy-efficient, dedicated loop buffers.
Horizontal partition schemes refer to the placement of additional (small) buffers or caches at the
same level as the L1 cache. For each memory reference, the appropriate (level one) cache to be accessed
is determined by dedicated decoding circuitry residing between the processor core and the memory
hierarchy. Naturally, the method used to partition data across the set of rst level caches should ensure
that the cache selection logic is simple, and thus cache access times are not signicantly affected.
Region-based caches implement one such horizontal partitioning scheme, by adding two small 2 KB L1
D-caches to the first level of the memory hierarchy, one for stack and one for global data. This arrangement
has also been shown to achieve substantial gains in dynamic energy consumption for streaming media
applications with a negligible impact on performance [87].
16.5.3 Dynamic Scaling of Memory Elements
With an increasing number of on-chip transistors being devoted to storage elements in modern processors,
of which only a very small set is active at any point in time, static power dissipation is expected to soon
become a key contributor to a processor's power budget. State-of-the-art techniques to reduce static
power consumption in on-chip memories are based on the simple observation that, in general, data or
instructions fetched into a given cache line receive an immediate flurry of accesses during a small interval
of time, followed by a relatively long period of time during which they are not used, before eventually being
evicted to make way for new data/instructions [88,89]. If one can guess when that period starts, it is
possible to switch off (i.e., V_DD gate) the corresponding cache lines without introducing extra cache
misses, thereby saving static energy consumption with no impact on performance [90,91].
Cache Decay was one of the earliest attempts to exploit such a "generational" memory usage behavior to
decrease leakage power [90]. The original Cache Decay implementation used a simple policy that turned
off cache lines after a fixed number of cycles (the decay interval) since the last access. Note that if the selected
decay interval happens to be too small, cache lines are switched off prematurely, causing extra cache misses,
and if it is too large, opportunities for saving leakage energy are missed. Thus, when such a simple scheme
is used, it is critical to tune the fixed decay interval very carefully, so that it adequately matches the access
patterns of the embedded application of interest.
so as to dynamically adjust it to the changing access patterns, have been proposed more recently, so as to
enable the use of the cache decay principle across a wider range of applications [90,91]. Similar leakage
energy reduction techniques have also been proposed for issue queues [59,60,92] and branch prediction
tables [93].
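A minimal sketch of the fixed-interval decay policy described above: the bookkeeping only tracks which lines would be gated off and which accesses would incur decay-induced misses, and is not tied to any particular cache organization:

```python
class DecayCache:
    """Fixed decay-interval policy (in the spirit of Cache Decay [90]):
    a line idle for more than decay_interval cycles is assumed to be
    VDD-gated, i.e., switched off to save leakage energy."""

    def __init__(self, decay_interval):
        self.decay_interval = decay_interval
        self.last_access = {}   # line index -> cycle of last access
        self.decay_misses = 0   # accesses that found their line gated off

    def access(self, line, cycle):
        t = self.last_access.get(line)
        if t is not None and cycle - t > self.decay_interval:
            self.decay_misses += 1  # line had decayed: extra miss incurred
        self.last_access[line] = cycle

    def lines_off(self, cycle):
        """Number of lines currently gated off (saving leakage)."""
        return sum(1 for t in self.last_access.values()
                   if cycle - t > self.decay_interval)
```

Running a trace against two different decay intervals makes the tuning trade-off concrete: a short interval inflates decay_misses, while a long one keeps lines_off near zero.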
Naturally, leakage energy reduction techniques for instruction/program caches are also very critical [94].
A technique has recently been proposed that monitors the performance of the instruction cache over time,
and dynamically scales (via V_DD gating) its size, so as to closely match the size of the working set of the
application [94].
16.5.4 Software-Controlled Memories, Scratch-Pad Memories
Most of the novel designs and/or techniques discussed so far require an application-driven tuning of several
architecturally visible parameters. However, similar to more traditional cache hierarchies, the memory
subsystem interface implemented on these novel designs still exposes a flat view of the memory hierarchy to
the compiler/software. That is, the underlying details of the memory subsystem architecture are essentially
transparent to both.
Dynamic power dissipation incurred by accesses to basic memory modules occurs owing to switching
activity in bit lines, word lines, and input and output lines. Traditional caches have additional switching
overheads, owing to the circuitry (comparators, multiplexers, tags, etc.) needed to provide the flat
memory interface alluded to above. Since the hardware assists necessary to support such a transparent view
of the memory hierarchy are quite power hungry, additional energy saving opportunities can be created
by relying more on the compiler (and less on dedicated hardware) to manage the memory subsystem.
The use of software-controlled (rather than hardware-controlled) memory components is thus becoming
increasingly prevalent in power aware embedded system design.
Scratch-Pads are an example of such novel, software-controlled memories [95–100]. Scratch-Pads
are essentially on-chip partitions of main memory directly managed by the compiler. Namely, decisions
concerning data/instruction placement in on-chip Scratch-Pads are made statically by the compiler, rather
than dynamically using dedicated hardware circuitry. Therefore, these memories are much less complex,
and thus less power hungry, than traditional caches. As one would expect, the ability to aggressively improve
energy-delay efficiency through the use of Scratch-Pads is predicated on the quality of the decisions
made by the compiler on the subset of data/instructions that are to be assigned to that limited memory
space [95,101]. Several compiler-driven techniques have been proposed to identify the data/instructions
that can be most profitably assigned to the Scratch-Pad, with frequency of use being one of the key
selection criteria [102–104].
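As an illustration of the frequency-of-use criterion, the sketch below greedily places the objects with the highest access count per byte into a fixed-capacity Scratch-Pad. This is only a simple heuristic; the published techniques [102–104] use more sophisticated (e.g., knapsack-based) formulations:

```python
def allocate_scratchpad(objects, capacity_bytes):
    """Greedy compile-time Scratch-Pad allocation.

    objects: list of (name, size_bytes, access_count) tuples, e.g., arrays
    or basic blocks profiled by the compiler. Objects are ranked by
    accesses per byte; those that fit within the capacity are selected.
    """
    chosen, used = [], 0
    for name, size, accesses in sorted(
            objects, key=lambda o: o[2] / o[1], reverse=True):
        if used + size <= capacity_bytes:
            chosen.append(name)
            used += size
    return chosen
```

Every access redirected from a cache to the Scratch-Pad avoids the tag-compare and multiplexing energy of a cache lookup, which is where the savings quantified in [95,101] come from.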
The Cool Cache architecture [105], also proposed for media applications, is a good example of a novel,
power aware memory subsystem that relies on the use of software-controlled memories. It uses a small
Scratch-Pad and a software-controlled cache, each of which is implemented on a different on-chip
SRAM. The program's scalars are mapped to the small (2 KB) Scratch-Pad [100].7 Nonscalar data is
mapped to the software-controlled cache, and the compiler is responsible for translating virtual addresses
to SRAM lines, using a small register lookup area. Even though cache misses are handled in software,
7 This size was found to be sufficient for most media applications.
thereby incurring substantial latency/energy penalties, the overall architecture has been shown to yield
substantial energy-delay product improvements for media applications, when compared to traditional
cache hierarchies [105].
The effectiveness of techniques such as the above is so pronounced that several embedded processors
currently offer a variety of software-controlled memory blocks, including configurable Scratch-Pads (TI's
320C6x [106]), lockable caches (Intel's XScale [49] and Trimedia [107]), and stream buffers (Intel's
StrongARM [49]).
16.5.5 Improving Access Patterns to Off-Chip Memory
During the last few decades, there has been a substantial effort in the compiler domain aimed at min-
imizing the number of off-chip memory accesses incurred by optimized code, as well as enabling the
implementation of aggressive prefetching strategies. This includes devising compiler techniques to restructure,
reorganize, and lay out data in off-chip memory, as well as techniques to properly reorder a program's
memory access patterns [108–112].
Prefetching techniques have received considerable attention lately, particularly in the domain of embedded
streaming media applications. Instruction and data prefetching techniques can be hardware- or
software-driven [113–120]. Hardware-based data prefetching techniques try to dynamically predict when
a given piece of data will be needed, so as to load it into cache (or into some dedicated on-chip buffer)
before it is actually referenced by the application (i.e., explicitly required by a demand access) [114–116].
In contrast, software-based data prefetching techniques work by inserting prefetch instructions for selected
data references at carefully chosen points in the program; such explicit prefetch instructions are
executed by the processor to move data into cache [117–120].
It has been extensively demonstrated that, when properly used, prefetching techniques can substantially
improve average memory access latencies [113–120]. Moreover, techniques that prefetch substantial
chunks of data (rather than, say, a single cache line), possibly to a dedicated buffer, can also simultaneously
decrease dynamic power dissipation [121]. Namely, when data is brought from off-chip memory
in large bursts, energy-efficient burst/page access modes can be more effectively exploited. Moreover,
by prefetching large quantities of instructions/data, the average length of DRAM idle times is expected
to increase, thus creating more profitable opportunities for the DRAM to be switched to a lower power
mode [122–124]. Naturally, it is important to ensure that the overhead associated with the prefetching
mechanism itself, as well as potential increases in static energy consumption owing to additional storage
requirements, do not outweigh the benefits achieved from enabling more energy-efficient off-chip
accesses [124].
16.5.6 Special Purpose Memory Subsystems for Media Streaming
As alluded to before, streaming media applications have been a preferred application domain for validating
the effectiveness of many novel, power aware memory designs. Although the compiler is consistently
given a more preeminent role in the management of these novel memory subsystems, they require no
fundamental changes to the adopted programming paradigm. Additional opportunities for energy savings
can be unlocked by adopting a programming paradigm that directly exposes those elements of an application
that should be considered by an optimizing compiler during performance versus power trade-off
exploration. The two special purpose memory subsystems discussed below do precisely that, in the context
of streaming media applications.
Xtream-Fit is a special purpose data memory subsystem targeted to generic uni-processor embedded
system platforms executing media applications [124]. Xtream-Fit's on-chip memory consists of a Scratch-Pad,
to hold constants and scalars, and a novel software-controlled streaming memory, partitioned into
regions, each of which holds one of the input or output streams used/produced by the target application.
The use of software-controlled memories by Xtream-Fit ensures that dynamic energy consumption is low,
while the region-based organization of the streaming memory enables the implementation of very simple
and yet effective shutdown policies to turn off different memory regions as the data they hold becomes
dead. Xtream-Fit's programming model is actually quite simple, requiring only a minor reprogramming
effort. It simply requires organizing/partitioning the application code into a small set of processing
and data transfer tasks. Data transfer tasks prefetch streaming media data (the amount required by the
next set of processing tasks) into the streaming memory. The amount of prefetched data is explicitly
exposed via a single customization parameter. By varying this single customization parameter, the compiler
can thus aggressively minimize the energy-delay product, by considering both dynamic and leakage
power dissipated in on-chip and in off-chip memories [124].
While Xtream-Fit provides sufficient memory bandwidth for generic uni-processor embedded media
architectures, it cannot support the very high bandwidth requirements of high-performance media accelerators.
For example, Imagine, the multicluster media accelerator alluded to previously, uses its own specialized
memory hierarchy, consisting of a streaming memory, a 128 KB stream register file, and stream buffers
and register files local to each of its eight clusters. Imagine's memory subsystem delivers a very high bandwidth
(2.1 GB/sec) with very high energy efficiency, yet it requires the use of a specialized programming
paradigm. Namely, data transfers to/from the host are controlled by a stream controller, and between the
stream register file and the functional units by a microcontroller, both of which have to be programmed
separately, using Imagine's own stream-oriented programming style [125].
Systems that demand still higher performance and/or energy efficiency may require memory architectures
fully customized to the target application. Comprehensive methodologies for designing
high-performance memory architectures for custom hardware accelerators are discussed in detail in
References 36 and 126.
16.5.7 Code Compression
Code size affects both program storage requirements and off-chip memory bandwidth requirements,
and can thus have a first-order impact on the overall power consumed by an embedded system. Instruction
compression schemes decrease both such requirements by storing frequently fetched/executed instruction
sequences in main memory (i.e., off-chip) in an encoded/compressed form [127–129]. Naturally,
when one such scheme is adopted, it is important to factor in the overhead incurred by the on-chip
decoding circuitry, so that it does not outweigh the gains achieved on storage and interconnect elements.
Furthermore, different approaches have been considered for storing such selected instruction sequences
on-chip, in either compressed or decompressed form. On-chip storage of instructions in compressed form
saves on-chip storage, yet instructions must be decoded every time they are executed, adding
latency/power overheads.
Instruction subsetting is an alternative instruction compression scheme, in which instructions that are not
commonly used are discarded from the instruction set, thus enabling the reduced instruction set to be
encoded using fewer bits [130]. The Thumb instruction set is a classic example of a compressed instruction
set, featuring the most commonly used 32-bit ARM instructions compressed into a 16-bit-wide format.
Thumb instructions are decompressed transparently into full 32-bit ARM instructions in real time, with
no performance loss.
16.5.8 Interconnect Optimizations
Power dissipation in on- and off-chip interconnect structures is also a significant contributor to an
embedded system's power budget [131]. A shared bus is a commonly used interconnect structure, as
it offers a good trade-off between generality/simplicity and performance. Power consumption on the
bus can be reduced by decreasing its supply voltage, capacitance, and/or switching activity. Bus splitting,
for example, reduces bus capacitance by splitting long bus lines into smaller sections, with one section
relaying the data to the next [132]. Power consumption in this approach is reduced at the expense of
a small penalty in latency, incurred at each relay point. Bus switching activity, and thus dynamic power
dissipation, can also be substantially reduced by using an appropriate bus encoding scheme [133–138].
Bus invert coding [133], for example, is a simple, yet widely used coding scheme. The first step of
bus invert coding is to compute the Hamming distance between the current bus value and the previous
bus value. If this distance is greater than half the total number of bits, the data value is transmitted in
inverted form, with an additional invert bit used to interpret the data at the other end. Several other encoding
schemes have been proposed, achieving lower switching activity at the expense of higher encoding and
decoding complexity [134–138].
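The encode/decode pair for bus invert coding follows directly from this description; bus width is a parameter here (8 bits in the examples):

```python
def bus_invert_encode(value, prev_bus, width):
    """Transmit the value inverted (with the invert bit set) whenever
    more than half of the bus lines would otherwise toggle [133]."""
    mask = (1 << width) - 1
    if bin((value ^ prev_bus) & mask).count("1") > width // 2:
        return (~value) & mask, 1
    return value & mask, 0

def bus_invert_decode(bus_value, invert_bit, width):
    """Recover the original value at the receiving end of the bus."""
    mask = (1 << width) - 1
    return (~bus_value) & mask if invert_bit else bus_value & mask
```

With this scheme, at most width/2 data lines (plus the invert line) ever toggle per transfer, which bounds the worst-case switching activity at roughly half that of the unencoded bus.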
With the increasing adoption of System-on-Chip design methodologies for embedded systems, devising
energy-delay-efficient interconnect architectures for such large-scale systems is becoming increasingly
critical and is still undergoing intensive research [5].
16.6 Summary
Design methodologies for today's embedded systems must necessarily treat power consumption as a
primary figure of merit. At the system and architecture levels of design abstraction, power aware embedded
system design requires the availability of high-fidelity power estimation and simulation frameworks. Such
frameworks are essential to enable designers to explore and evaluate, in reasonable time, the complex
energy-delay trade-offs realized by different candidate architectures, subsystem realizations, and power
management techniques, and thus quickly identify promising solutions for the target application of
interest. The detailed system and architecture level design phases that follow should adequately combine
coarse, system level dynamic power management strategies with fine-grained, self-monitoring techniques,
exploiting voltage and frequency scaling, as well as advanced dynamic resource scaling and power-driven
reconfiguration techniques.
References
[1] G. De Micheli, R. Ernst, and W. Wolf, Eds. Readings in Hardware/Software Co-Design. Morgan
Kaufmann Publishers, Norwell, MA, 2002.
[2] F. Balarin, P. Giusto, A. Jurecska, C. Passerone, E. Sentovich, B. Tabbara, M. Chiodo, H. Hsieh,
L. Lavagno, A.L. Sangiovanni-Vincentelli, and K. Suzuki. Hardware-Software Co-Design of
Embedded Systems: The POLIS Approach. Kluwer Academic Publishers, Dordrecht, 1997.
[3] S. Borkar. Design Challenges of Technology Scaling. IEEE Micro, 19: 2329, 1999.
[4] http://public.itrs.net/
[5] M. Pedram and J.M. Rabaey. Power Aware Design Methodologies. Kluwer Academic Publishers,
Dordrecht, 2002.
[6] R. Gonzalez, B. Gordon, and M. Horowitz. Supply and Threshold Voltage Scaling for Low Power
CMOS. IEEE Journal of Solid-State Circuits, 32(8): 12101216, 1997.
[7] A.P. Chandrakasan, S. Sheng, and R. W. Brodersen. Low-Power CMOS Digital Design. IEEE
Journal of Solid-State Circuits, 27(4): 473484, 1992.
[8] A.A. Jerraya, S. Yoo, N. Wehn, and D. Verkest, Eds. Embedded Software for SoC. Kluwer Academic
Publishers, Dordrecht, 2003.
[9] A.P. Chandrakasan and R.W. Brodersen. Low Power Digital CMOS Design. Kluwer Academic
Publishers, Dordrecht, 1995.
[10] T.L. Martin, D.P. Siewiorek, A. Smailagic, M. Bosworth, M. Ettus, and J. Warren. A Case Study of a
System-Level Approach to Power-Aware Computing. ACM Transactions on Embedded Computing
Systems, Special Issue on Power-Aware Embedded Computing, 2(3): 255276, 2003.
[11] A.S. Vincentelli and G. Martin. A Vision for Embedded Systems: Platform-Based Design and
Software Methodology. IEEE Design and Test of Computers, 18(6): 2333, 2001.
[12] J.M. Rabaey and A.S. Vincentelli. System-on-a-Chip A Platform Perspective. In Keynote
Presentation, Korean Semiconductor Conference, 2002. Available at http://bwrc.eecs.berkeley.edu/
People/Faculty/jan/presentations/platformdesign.pdf
[13] J.T. Buck, S. Ha, E.A. Lee, and D.G. Messerschmitt. Ptolemy: A Framework for Simulating and
Prototyping Heterogeneous Systems. International Journal of Computer Simulation, Special Issue
on Simulation Software Development, 4: 155–182, 1994.
[14] V. Tiwari, S. Malik, and A. Wolfe. Power Analysis of Embedded Software: A First Step Towards
Software Power Minimization. IEEE Transactions on Very Large Scale Integration Systems, 2(4):
437–445, 1994.
[15] P.M. Chau and S.R. Powell. Power Dissipation of VLSI Array Processing Systems. Journal of VLSI
Signal Processing, 4(2–3): 199–212, 1992.
[16] J. Russell and M. Jacome. Software Power Estimation and Optimization for High-Performance
32-bit Embedded Processors. In Proceedings of the International Conference on Computer Design,
1998, pp. 328–333.
[17] C. Brandolese, W. Fornaciari, F. Salice, and D. Sciuto. An Instruction-Level Functionality-Based
Energy Estimation Model for 32-bits Microprocessors. In Proceedings of the Design Automation
Conference, 2000, pp. 346–351.
[18] G. Qu, N. Kawabe, K. Usami, and M. Potkonjak. Function-Level Power Estimation Methodology
for Microprocessors. In Proceedings of the Design Automation Conference, 2000, pp. 810–813.
[19] D.C. Burger and T.M. Austin. The SimpleScalar Tool Set, Version 2.0. Computer Architecture
News, 25(3): 13–25, 1997.
[20] D. Brooks, V. Tiwari, and M. Martonosi. Wattch: A Framework for Architectural Level Power
Analysis and Optimizations. In Proceedings of the International Symposium on Computer
Architecture, 2000, pp. 83–94.
[21] G. Cai and C.H. Lim. Architectural Level Power/Performance Optimization and Dynamic Power
Estimation. In Cool Chips Tutorial, International Symposium on Microarchitecture, 1999.
[22] W. Ye, N. Vijaykrishnan, M. Kandemir, and M.J. Irwin. The Design and Use of SimplePower:
A Cycle-Accurate Energy Estimation Tool. In Proceedings of the Design Automation Conference,
2000, pp. 340–345.
[23] G. Jochens, L. Kruse, E. Schmidt, and W. Nebel. A New Parameterizable Power Macro-Model for
Datapath Components. In Proceedings of the Design Automation and Test in Europe, 1999.
[24] A. Bogliolo, L. Benini, and G.D. Micheli. Regression-Based RTL Power Modeling. ACM
Transactions on Design Automation of Electronic Systems, 5(3): 337–372, 2000.
[25] S.A. Theoharis, C.E. Goutis, G. Theodoridis, and D. Soudris. Accurate Data Path Models for
RT-Level Power Estimation. In Proceedings of the International Workshop on Power and Timing
Modeling, Optimization and Simulation, 1998, pp. 213–222.
[26] M. Khellah and M.I. Elmasry. Effective Capacitance Macro-Modelling for Architectural-Level
Power Estimation. In Proceedings of the Eighth Great Lakes Symposium on VLSI, 1998, pp. 414–419.
[27] Z. Chen, K. Roy, and E.K. Chong. Estimation of Power Dissipation Using a Novel Power
Macromodeling Technique. IEEE Transactions on Computer Aided Design of Integrated Circuits
and Systems, 19(11): 1363–1369, 2000.
[28] R. Melhem and R. Graybill, Eds. Challenges for Architectural Level Power Modeling. In Power
Aware Computing. Kluwer Academic Publishers, Dordrecht, 2001.
[29] J.A. Butts and G.S. Sohi. A Static Power Model for Architects. In Proceedings of the International
Symposium on Microarchitecture, 2000, pp. 191–201.
[30] Y. Zhang, D. Parikh, K. Sankaranarayanan, K. Skadron, and M. Stan. HotLeakage: A Temperature-
Aware Model of Subthreshold and Gate Leakage for Architects. Technical report, Department of
Computer Science, University of Virginia, 2003.
[31] D. Patterson, T. Anderson, N. Cardwell, R. Fromm, K. Keeton, C. Kozyrakis, R. Thomas, and
K. Yelick. A Case for Intelligent RAM. IEEE Micro, 17(2): 33–44, 1997.
[32] S.J.E. Wilton and N.M. Jouppi. CACTI: An Enhanced Cache Access and Cycle Time Model.
Technical report, Digital Equipment Corporation, Western Research Lab, 1996.
[33] M. Kamble and K. Ghose. Analytical Energy Dissipation Models for Low Power Caches.
In Proceedings of the International Symposium on Low Power Electronics and Design, 1997,
pp. 143–148.
Power Aware Embedded Computing 16-19
[34] G. Reinman and N.M. Jouppi. CACTI 2.0: An Integrated Cache Timing and Power Model.
Technical report, Compaq Computer Corporation, Western Research Lab, 2001.
[35] J. Edler and M.D. Hill. Dinero IV Trace-Driven Uniprocessor Cache Simulator, 1998.
http://www.cs.wisc.edu/~markhill/DineroIV/
[36] F. Catthoor, S. Wuytack, E. DeGreef, F. Balasa, L. Nachtergaele, and A. Vandecappelle.
Custom Memory Management Methodology: Exploration of Memory Organization for Embedded
Multimedia System Design. Kluwer Academic Publishers, Dordrecht, 1998.
[37] T. Martin and D. Siewiorek. A Power Metric for Mobile Systems. In International Symposium on
Low Power Electronics and Design, 1996, pp. 37–42.
[38] M. Pedram and Q. Wu. Battery-Powered Digital CMOS Design. IEEE Transactions on Very Large
Scale Integration Systems, 10: 601–607, 2002.
[39] P. Rong and M. Pedram. An Analytical Model for Predicting the Remaining Battery Capacity
of Lithium-Ion Batteries. In Proceedings of the Design Automation and Test in Europe, 2003,
pp. 11148–11149.
[40] M. Srivastava, A. Chandrakasan, and R. Brodersen. Predictive System Shutdown and Other Archi-
tectural Techniques for Energy Efficient Programmable Computation. IEEE Transactions on Very
Large Scale Integration Systems, 4(1): 42–55, 1996.
[41] Q. Qiu and M. Pedram. Dynamic Power Management Based on Continuous-Time Markov
Decision Processes. In Proceedings of the Design Automation Conference, 1999, pp. 555–561.
[42] T. Simunic, L. Benini, P. Glynn, and G. De Micheli. Dynamic Power Management of Portable
Systems. In Proceedings of the International Conference on Mobile Computing and Networking,
2000, pp. 11–19.
[43] J. Liu, P. Chou, N. Bagherzadeh, and F. Kurdahi. A Constraint-Based Application Model and
Scheduling Techniques for Power-Aware Systems. In Proceedings of the International Conference
on Hardware/Software Codesign, 2001, pp. 153–158.
[44] http://www.acpi.info/
[45] T. Simunic, L. Benini, A. Acquaviva, P. Glynn, and G. De Micheli. Dynamic Voltage Scaling for
Portable Systems. In Proceedings of the Design Automation Conference, 2001, pp. 524–529.
[46] R. Gonzalez and M. Horowitz. Energy Dissipation in General Purpose Microprocessors. IEEE
Journal of Solid-State Circuits, 31(9): 1277–1284, 1996.
[47] T.D. Burd and R.W. Brodersen. Processor Design for Portable Systems. Journal of VLSI Signal
Processing, 13(2–3): 203–221, 1996.
[48] T. Ishihara and H. Yasuura. Voltage Scheduling Problem for Dynamically Variable Voltage
Processors. In Proceedings of the International Symposium on Low Power Electronics and
Design, 1998, pp. 197–202.
[49] http://www.intel.com/
[50] http://www.ibm.com/
[51] http://www.transmeta.com/
[52] M. Weiser, B. Welch, A.J. Demers, and S. Shenker. Scheduling for Reduced CPU Energy.
In Proceedings of the Symposium on Operating Systems Design and Implementation, 1994,
pp. 13–23.
[53] K. Govil, E. Chan, and H. Wasserman. Comparing Algorithms for Dynamic Speed-Setting of
a Low-Power CPU. In Proceedings of the International Conference on Mobile Computing and
Networking, 1995, pp. 13–25.
[54] T. Pering, T. Burd, and R. Brodersen. The Simulation and Evaluation of Dynamic Voltage Scal-
ing Algorithms. In Proceedings of the International Symposium on Low Power Electronics and
Design, 1998, pp. 76–81.
[55] T. Pering, T. Burd, and R. Brodersen. Voltage Scheduling in the lpARM Microprocessor
System. In Proceedings of the International Symposium on Low Power Electronics and Design,
2000, pp. 96–101.
[56] K. Flautner, S. Reinhardt, and T. Mudge. Automatic Performance Setting for Dynamic Voltage
Scaling. ACM Journal of Wireless Networks, 8(5): 507–520, 2002.
[57] D. Brooks and M. Martonosi. Value-Based Clock Gating and Operation Packing: Dynamic
Strategies for Improving Processor Power and Performance. ACM Transactions on Computer
Systems, 18(2): 89–126, 2000.
[58] S. Dropsho, V. Kursun, D.H. Albonesi, S. Dwarkadas, and E.G. Friedman. Managing Static Leakage
Energy in Microprocessor Functional Units. In Proceedings of the International Symposium on
Microarchitecture, 2002, pp. 321–332.
[59] D. Ponomarev, G. Kucuk, and K. Ghose. Reducing Power Requirements of Instruction Scheduling
Through Dynamic Allocation of Multiple Datapath Resources. In Proceedings of the International
Symposium on Microarchitecture, 2001, pp. 90–101.
[60] A. Buyuktosunoglu, D. Albonesi, P. Bose, P. Cook, and S. Schuster. Tradeoffs in Power-Efficient
Issue Queue Design. In Proceedings of the International Symposium on Low Power Electronics and
Design, 2002, pp. 184–189.
[61] C.J. Hughes, J. Srinivasan, and S.V. Adve. Saving Energy with Architectural and Frequency
Adaptations for Multimedia Applications. In Proceedings of the International Symposium on
Microarchitecture, 2001, pp. 250–261.
[62] M.F. Jacome and G. de Veciana. Design Challenges for New Application Specific Processors. IEEE
Design and Test of Computers, Special Issue on System Design of Embedded Systems, 17(2): 50–60,
2000.
[63] R.P. Colwell, R.P. Nix, J.J. O'Donnell, D.B. Papworth, and P.K. Rodman. A VLIW
Architecture for a Trace Scheduling Compiler. IEEE Transactions on Computers, 37(8): 967–979,
1988.
[64] G.R. Beck, D.W.L. Yen, and T.L. Anderson. The Cydra 5 Mini-Supercomputer: Architecture and
Implementation. The Journal of Supercomputing, 7(1/2): 143–180, 1993.
[65] M.S. Schlansker and B.R. Rau. EPIC: An Architecture for Instruction-Level Parallel Processors.
Technical report HPL-99-111, Hewlett-Packard Laboratories, 2000.
[66] W.W. Hwu, R.E. Hank, D.M. Gallagher, S.A. Mahlke, D.M. Lavery, G.E. Haab, J.C. Gyllenhaal,
and D.I. August. Compiler Technology for Future Microprocessors. Proceedings of the IEEE,
83(12): 1625–1640, 1995.
[67] J.R. Ellis. Bulldog: A Compiler for VLIW Architectures. MIT Press, Cambridge, MA, 1985.
[68] J. Dehnert and R. Towle. Compiling for the Cydra-5. Journal of Supercomputing, 7(1/2): 181–227,
1993.
[69] C. Dulong, R. Krishnaiyer, D. Kulkarni, D. Lavery, W. Li, J. Ng, and D. Sehr. An Overview of the
Intel IA-64 Compiler. Intel Technology Journal, Q4, 1999, pp. 1–15.
[70] M.F. Jacome, G. de Veciana, and V. Lapinskii. Exploring Performance Tradeoffs for Clustered
VLIW ASIPs. In Proceedings of the International Conference on Computer-Aided Design, 2000,
pp. 504–510.
[71] V. Lapinskii, M.F. Jacome, and G. de Veciana. Application-Specific Clustered VLIW Datapaths:
Early Exploration on a Parameterized Design Space. IEEE Transactions on Computer Aided Design
of Integrated Circuits and Systems, 21(8): 889–903, 2002.
[72] S. Pillai and M.F. Jacome. Compiler-Directed ILP Extraction for Clustered VLIW/EPIC Machines:
Predication, Speculation and Modulo Scheduling. In Proceedings of the Design Automation and
Test in Europe, 2003, p. 10422.
[73] P. Marwedel and G. Goossens, Eds. Code Generation for Embedded Processors. Kluwer Academic
Publishers, Dordrecht, 1995.
[74] C. Liem. Retargetable Compilers for Embedded Core Processors. Kluwer Academic Publishers,
Dordrecht, 1997.
[75] J. Fritts, W. Wolf, and B. Liu. Understanding Multimedia Application Characteristics for
Designing Programmable Media Processors. In SPIE Photonics West, Media Processors, 1999,
pp. 2–13.
[76] B. Khailany, W.J. Dally, S. Rixner, U.J. Kapasi, P. Mattson, J. Namkoong, J.D. Owens, B. Towles,
and A. Chang. Imagine: Media Processing with Streams. IEEE Micro, 21: 35–46, 2001.
[77] J. Rabaey, H. De Man, J. Vanhoof, G. Goossens, and F. Catthoor. CATHEDRAL-II: A Syn-
thesis System for Multiprocessor DSP Systems. In Silicon Compilation. Addison-Wesley, Reading,
MA, 1987.
[78] J. Montanaro, R.T. Witek, K. Anne, A.J. Black, E.M. Cooper, D.W. Dobberpuhl, P.M. Donahue,
J. Eno, A. Farell, G.W. Hoeppner, D. Kruckemyer, T.H. Lee, P. Lin, L. Madden,
D. Murray, M. Pearce, S. Santhanam, K.J. Snyder, R. Stephany, and S.C. Thierauf.
A 160 MHz 32b 0.5 W CMOS RISC Microprocessor. In Proceedings of the Interna-
tional Solid-State Circuits Conference, Digest of Technical Papers, 31(11): 1703–1714,
1996.
[79] P. Hicks, M. Walnock, and R.M. Owens. Analysis of Power Consumption in Memory Hierarch-
ies. In Proceedings of the International Symposium on Low Power Electronics and Design, 1997,
pp. 239–242.
[80] K. Ghose and M.B. Kamble. Reducing Power in Superscalar Processor Caches Using Subbanking,
Multiple Line Buffers and Bit-Line Segmentation. In Proceedings of the International Symposium
on Low Power Electronics and Design, 1999, pp. 70–75.
[81] C.-L. Su and A.M. Despain. Cache Design Trade-Offs for Power and Performance Optimization:
A Case Study. In Proceedings of the International Symposium on Low Power Electronics and Design,
1995, pp. 63–68.
[82] J. Kin, M. Gupta, and W.H. Mangione-Smith. Filtering Memory References to Increase Energy
Efficiency. IEEE Transactions on Computers, 49(1): 1–15, 2000.
[83] A.H. Farrahi, G.E. Téllez, and M. Sarrafzadeh. Memory Segmentation to Exploit Sleep Mode
Operation. In Proceedings of the Design Automation Conference, 1995, pp. 36–41.
[84] J. Kin, M. Gupta, and W.H. Mangione-Smith. The Filter Cache: An Energy Efficient
Memory Structure. In Proceedings of the International Symposium on Microarchitecture, 1997,
pp. 184–193.
[85] R.S. Bajwa, M. Hiraki, H. Kojima, D.J. Gorny, K. Nitta, A. Shridhar, K. Seki, and K. Sasaki.
Instruction Buffering to Reduce Power in Processors for Signal Processing. IEEE Transactions on
Very Large Scale Integration Systems, 5(4): 417–424, 1997.
[86] L. Lee, B. Moyer, and J. Arends. Instruction Fetch Energy Reduction Using Loop Caches for
Embedded Applications with Small Tight Loops. In Proceedings of the International Symposium
on Low Power Electronics and Design, 1999, pp. 267–269.
[87] H.-H. Lee and G. Tyson. Region-Based Caching: An Energy-Delay Efficient Memory Archi-
tecture for Embedded Processors. In Proceedings of the International Conference on Compilers,
Architectures and Synthesis for Embedded Systems, 2000, pp. 120–127.
[88] D.A. Wood, M.D. Hill, and R.E. Kessler. A Model for Estimating Trace-Sample Miss Ratios. In Pro-
ceedings of the SIGMETRICS Conference on Measurement and Modeling of Computer Systems, 1991,
pp. 79–89.
[89] D.C. Burger, J.R. Goodman, and A. Kagi. The Declining Effectiveness of Dynamic Caching
for General-Purpose Microprocessors. University of Wisconsin-Madison Computer Sciences
Technical report 1261, 1995.
[90] S. Kaxiras, Z. Hu, and M. Martonosi. Cache Decay: Exploiting Generational Behavior to
Reduce Cache Leakage Power. In Proceedings of the International Symposium on Computer
Architecture, 2001, pp. 240–251.
[91] H. Zhou, M.C. Toburen, E. Rotenberg, and T.M. Conte. Adaptive Mode Control: A Static-Power-
Efficient Cache Design. In Proceedings of the International Conference on Parallel Architectures and
Compilation Techniques, 2001, pp. 61–72.
[92] D. Folegnani and A. Gonzalez. Energy-Effective Issue Logic. In Proceedings of the International
Symposium on Computer Architecture, 2001, pp. 230–239.
[93] Z. Hu, P. Juang, K. Skadron, D. Clark, and M. Martonosi. Applying Decay Strategies to Branch
Predictors for Leakage Energy Savings. In Proceedings of the International Conference on Computer
Design, 2002, pp. 442–445.
[94] S.-H. Yang, M.D. Powell, B. Falsafi, K. Roy, and T.N. Vijaykumar. An Integrated Circuit/
Architecture Approach to Reducing Leakage in Deep-Submicron High-Performance I-Caches.
In Proceedings of the High-Performance Computer Architecture, 2001, pp. 147–158.
[95] P.R. Panda, N.D. Dutt, and A. Nicolau. Efficient Utilization of Scratch-Pad Memory in Embedded
Processor Applications. In Proceedings of the European Design and Test Conference, 1997,
pp. 7–11.
[96] D. Chiou, P. Jain, S. Devadas, and L. Rudolph. Application-Specific Memory Management for
Embedded Systems Using Software-Controlled Caches. In Proceedings of the Design Automation
Conference, 2000, pp. 416–419.
[97] L. Benini, A. Macii, and M. Poncino. A Recursive Algorithm for Low-Power Memory Partition-
ing. In Proceedings of the International Symposium on Low Power Electronics and Design, 2000,
pp. 78–83.
[98] R. Banakar, S. Steinke, B.-S. Lee, M. Balakrishnan, and P. Marwedel. Scratchpad Memory:
A Design Alternative for Cache On-Chip Memory in Embedded Systems. In Proceedings of the
International Workshop on Hardware/Software Codesign, 2002, pp. 73–78.
[99] M. Kandemir, J. Ramanujam, M. Irwin, N. Vijaykrishnan, I. Kadayif, and A. Parikh.
Dynamic Management of Scratch-Pad Memory Space. In Proceedings of the Design Automation
Conference, 2001, pp. 690–695.
[100] O.S. Unsal, Z. Wang, I. Koren, C.M. Krishna, and C.A. Moritz. On Memory Behavior of Scalars
in Embedded Multimedia Systems. In Proceedings of the Workshop on Memory Performance Issues,
Göteborg, Sweden, 2001.
[101] P.R. Panda, F. Catthoor, N.D. Dutt, K. Danckaert, E. Brockmeyer, C. Kulkarni, A. Vandercappelle,
and P.G. Kjeldsberg. Data and Memory Optimization Techniques for Embedded Systems. ACM
Transactions on Design Automation of Electronic Systems, 6(2): 149–206, 2001.
[102] J. Sjödin, B. Fröderberg, and T. Lindgren. Allocation of Global Data Objects in On-Chip RAM.
In Proceedings of the Workshop on Compiler and Architectural Support for Embedded Computer
Systems, Washington DC, USA, 1998.
[103] T. Ishihara and H. Yasuura. A Power Reduction Technique with Object Code Merging for Applic-
ation Specific Embedded Processors. In Proceedings of the Design, Automation and Test in Europe,
2000, pp. 617–623.
[104] S. Steinke, L. Wehmeyer, B.-S. Lee, and P. Marwedel. Assigning Program and Data
Objects to Scratchpad for Energy Reduction. In Proceedings of the Design Automation and Test in
Europe, 2002, pp. 409–417.
[105] O.S. Unsal, R. Ashok, I. Koren, C.M. Krishna, and C.A. Moritz. Cool Cache: A Compiler-Enabled
Energy Efficient Data Caching Framework for Embedded/Multimedia Processors. ACM Transac-
tions on Embedded Computing Systems, Special Issue on Power-Aware Embedded Computing, 2(3):
373–392, 2003.
[106] http://www.ti.com/
[107] http://www.trimedia.com/
[108] M.E. Wolf and M. Lam. A Data Locality Optimizing Algorithm. In Proceedings of the Conference
on Programming Language Design and Implementation, 1991, pp. 30–44.
[109] S. Carr, K.S. McKinley, and C. Tseng. Compiler Optimizations for Improving Data Locality.
In Proceedings of the International Conference on Architectural Support for Programming Languages
and Operating Systems, 1994, pp. 252–262.
[110] S. Coleman and K.S. McKinley. Tile Size Selection Using Cache Organization and Data Layout.
In Proceedings of the Conference on Programming Language Design and Implementation, 1995.
[111] M.J. Wolfe. High Performance Compilers for Parallel Computing. Addison-Wesley Publishers,
Reading, MA, 1995, pp. 279–290.
[112] M. Kandemir, J. Ramanujam, A. Choudhary, and P. Banerjee. A Layout-Conscious Iteration Space
Transformation Technique. IEEE Transactions on Computers, 50(12): 1321–1335, 2001.
[113] N.P. Jouppi. Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-
Associative Cache and Prefetch Buffers. In Proceedings of the International Symposium on
Computer Architecture, 1990, pp. 364–373.
[114] T.F. Chen and J.L. Baer. Effective Hardware-Based Data Prefetching for High Performance
Processors. IEEE Transactions on Computers, 44(5): 609–623, 1995.
[115] J.W.C. Fu, J.H. Patel, and B.L. Janssens. Stride Directed Prefetching in Scalar Processors. In
Proceedings of the International Symposium on Microarchitecture, 1992, pp. 102–110.
[116] S.S. Pinter and A. Yoaz. A Hardware-Based Data Prefetching Technique for Superscalar Processors.
In Proceedings of the International Symposium on Microarchitecture, 1996, pp. 214–225.
[117] D. Callahan, K. Kennedy, and A. Porterfield. Software Prefetching. In Proceedings of the
International Conference on Architectural Support for Programming Languages and Operating
Systems, 1991, pp. 40–52.
[118] A.C. Klaiber and H.M. Levy. An Architecture for Software Controlled Data Prefetching. In
Proceedings of the International Symposium on Computer Architecture, 1991, pp. 43–53.
[119] T.C. Mowry, M.S. Lam, and A. Gupta. Design and Evaluation of a Compiler Algorithm
for Prefetching. In Proceedings of the International Conference on Architectural Support for
Programming Languages and Operating Systems, 1992, pp. 62–73.
[120] D.F. Zucker, R.B. Lee, and M.J. Flynn. Hardware and Software Cache Prefetching Techniques
for MPEG Benchmarks. IEEE Transactions on Circuits and Systems for Video Technology, 10(5):
782–796, 2000.
[121] Y. Choi and T. Kim. Memory Layout Technique for Variables Utilizing Efficient DRAM Access
Modes in Embedded System Design. In Proceedings of the Design Automation Conference, 2003,
pp. 881–886.
[122] X. Fan, C.S. Ellis, and A.R. Lebeck. Memory Controller Policies for DRAM Power Manage-
ment. In Proceedings of the International Symposium on Low Power Electronics and Design, 2001,
pp. 129–134.
[123] V. Delaluz, A. Sivasubramaniam, M. Kandemir, N. Vijaykrishnan, and M.J. Irwin. Scheduler-
Based DRAM Energy Management. In Proceedings of the Design Automation Conference, 2002,
pp. 697–702.
[124] A. Ramachandran and M. Jacome. Xtream-Fit: An Energy-Delay Efficient Data Memory Subsys-
tem for Embedded Media Processing. In Proceedings of the Design Automation Conference, 2003,
pp. 137–142.
[125] P. Mattson. A Programming System for the Imagine Media Processor. PhD thesis, Stanford
University, 2001.
[126] P. Grun, N. Dutt, and A. Nicolau. Memory Architecture Exploration for Programmable Embedded
Systems. Kluwer Academic Publishers, Dordrecht, 2003.
[127] A. Wolfe and A. Chanin. Executing Compressed Programs on an Embedded RISC Architecture.
In Proceedings of the International Symposium on Microarchitecture, 1992, pp. 81–91.
[128] C. Lefurgy, P. Bird, I-C. Chen, and T. Mudge. Improving Code Density Using Compres-
sion Techniques. In Proceedings of the International Symposium on Microarchitecture, 1997,
pp. 194–203.
[129] H. Lekatsas and W. Wolf. SAMC: A Code Compression Algorithm for Embedded Processors. IEEE
Transactions on Computer Aided Design of Integrated Circuits and Systems, 18(12): 1689–1701,
1999.
[130] W.E. Dougherty, D.J. Pursley, and D.E. Thomas. Instruction Subsetting: Trading Power
for Programmability. In Proceedings of the International Workshop on Hardware/Software
Codesign, 1998.
[131] D. Sylvester and K. Keutzer. A Global Wiring Paradigm for Deep Submicron Design. IEEE
Transactions on Computer Aided Design of Integrated Circuits and Systems, 19(2): 242–252, 2000.
[132] C.-T. Hsieh and M. Pedram. Architectural Power Optimization by Bus Splitting. In Proceedings
of the Conference on Design, Automation and Test in Europe, 2000, pp. 612–616.
[133] M.R. Stan and W.P. Burleson. Bus-Invert Coding for Low-Power I/O. IEEE Transactions on Very
Large Scale Integration Systems, 3(1): 49–58, 1995.
[134] H. Mehta, R.M. Owens, and M.J. Irwin. Some Issues in Gray Code Addressing. In Proceedings of
the Sixth Great Lakes Symposium on VLSI, 1996, pp. 178–181.
[135] L. Benini, G. De Micheli, E. Macii, M. Poncino, and S. Quez. System-Level Power Optimiza-
tion of Special Purpose Applications: The Beach Solution. In Proceedings of the International
Symposium on Low Power Electronics and Design, 1997, pp. 24–29.
[136] L. Benini, G. De Micheli, E. Macii, D. Sciuto, and C. Silvano. Address Bus Encoding Techniques
for System-Level Power Optimization. In Proceedings of the Design, Automation and Test in
Europe, 1998, pp. 861–867.
[137] P.R. Panda and N.D. Dutt. Low-Power Memory Mapping Through Reducing Address Bus Activity.
IEEE Transactions on Very Large Scale Integration Systems, 7(3): 309–320, 1999.
[138] N. Chang, K. Kim, and J. Cho. Bus Encoding for Low-Power High-Performance Memory Systems.
In Proceedings of the Design Automation Conference, 2000, pp. 800–805.
Security in Embedded
Systems
17 Design Issues in Secure Embedded Systems
A.G. Voyiatzis, A.G. Fragopoulos, and D.N. Serpanos
17
Design Issues in
Secure Embedded
Systems
A.G. Voyiatzis,
A.G. Fragopoulos, and
D.N. Serpanos
University of Patras
17.1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-1
17.2 Security Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-2
Abilities of Attackers • Security Implementation Levels •
Implementation Technology and Operational Environment
17.3 Security Constraints in Embedded Systems Design . . . 17-4
Energy Considerations • Processing Power Limitations •
Flexibility and Availability Requirements • Cost of Implementation
17.4 Design of Secure Embedded Systems. . . . . . . . . . . . . . . . . . . 17-7
System Design Issues • Application Design Issues
17.5 Cryptography and Embedded Systems . . . . . . . . . . . . . . . . . 17-10
Physical Security • Side-Channel Cryptanalysis • Side-
Channel Implementations • Fault-Based Cryptanalysis •
Passive Side-Channel Cryptanalysis Countermeasures
17.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-20
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-20
17.1 Introduction
A computing system is typically considered an embedded system when it is a programmable device with
limited resources (energy, memory, computation power, etc.) that serves one or a few applications and is
embedded in a larger system. Their limited resources make embedded systems ineffective as general-purpose
computing systems. However, they usually have to meet hard requirements, such as time deadlines and
other real-time processing requirements.
Embedded systems can be classified in two general categories: (1) standalone embedded systems, where
all hardware and software components of the system are physically close and incorporated into a single
device, for example, a Personal Digital Assistant (PDA) or a system in a washing machine or a fax, and
there is no attachment to a network, and (2) distributed (networked) embedded systems, where several
autonomous components (each one a standalone embedded system) communicate with each other
over a network in order to deliver services or support an application. Several architectural and design
parameters have led to the development of distributed embedded applications, such as the placement of
processing power at the physical point where an event takes place, data reduction, etc. [1].
The increasing capabilities of embedded systems combined with their decreasing cost have enabled their
adoption in a wide range of applications and services, from financial and personalized entertainment
services to automotive and military applications in the field. Importantly, in addition to the typical
requirements for responsiveness, reliability, availability, robustness, and extensibility, many conven-
tional embedded systems and applications have significant security requirements. However, security
is a resource-demanding function that needs special attention in embedded computing. Furthermore, the
wide deployment of small devices in critical applications has triggered the development
of new, strong attacks that exploit more systemic characteristics, in contrast to traditional attacks that
focused on algorithmic characteristics owing to the inability of attackers to experiment with the physical
devices used in secure applications. Thus, the design of secure embedded systems requires special attention.
In this chapter we provide an overview of security issues in embedded systems. Section 17.2 presents
the parameters of security systems, while Section 17.3 describes the effect of security in the resource-
constrained environment of embedded systems. Section 17.4 presents the main issues in the design of
secure embedded systems. Finally, Section 17.5 covers in detail attacks on, and countermeasures for, crypto-
graphic algorithm implementations in embedded systems, considering the critical role of cryptography
and the novel systemic attacks developed due to the wide availability of embedded computing systems.
17.2 Security Parameters
Security is a generic term used to indicate several different requirements in computing systems. Depending
on the system and its use, several security properties may be satisfied in each system and in each operational
environment. Overall, secure systems need to meet all or a subset of the following requirements [2,3]:
1. Confidentiality. Data stored in the system or transmitted from the system have to be protected from
disclosure; this is usually achieved through data encryption.
2. Integrity. A mechanism to ensure that data received in a data communication was indeed the data
transmitted.
3. Nonrepudiation. A mechanism to ensure that all entities (systems or applications) participating
in a transaction cannot deny their actions in the transaction.
4. Availability. The system's ability to perform its primary functions and serve its legitimate users
without any disruption, under all conditions, including possible malicious attacks that aim to disrupt
service, such as the well-known Denial of Service (DoS) attacks.
5. Authentication. The ability of the receiver of a message to identify the message sender.
6. Access control. The ability to ensure that only legal users may take part in a transaction and have access
to system resources. To be effective, access control is typically used in conjunction with authentication.
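As a minimal illustration of the integrity and authentication requirements listed above, a receiver can check that a message was neither altered in transit nor forged by a party without the shared key, using a message authentication code. The sketch below uses Python's standard hmac module; the function names and the hard-coded key handling are simplifications for illustration, not a recommended key-management scheme:

```python
import hashlib
import hmac


def tag_message(key: bytes, message: bytes) -> bytes:
    # Sender: attach an HMAC-SHA256 tag computed over the payload.
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_message(key: bytes, message: bytes, tag: bytes) -> bool:
    # Receiver: recompute the tag and compare in constant time,
    # accepting only unmodified messages from a holder of the key.
    return hmac.compare_digest(tag_message(key, message), tag)
```

Note that a MAC addresses integrity and authentication only: confidentiality would still require encryption, and nonrepudiation would require a digital signature, since either party holding the shared MAC key could have produced the tag.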
These requirements are placed by different parties involved in the development and use of computing
systems, for example, vendors, application providers, and users. For example, vendors need to ensure the
protection of their Intellectual Property (IP) that is embedded in the system, while end users want to be
certain that the systemwill provide secure user identication (only authorized users may access the system
and its applications, even if the system gets in the hands of malicious users) and will have high availability,
that is, the system will be available under all circumstances; also, content providers are concerned for
the protection of their IP, for example, that the data delivered through an application are not copied.
Ravi et al. [3,4] have identified the parties participating in system and application development and use,
as well as their security requirements. This classification enables us to identify several possible malicious
users, depending on a party's view; for example, for the hardware manufacturer, even a legal end user
of a portable device (e.g., a PDA or a mobile phone) can be a possible malicious user.
Considering the security requirements and the interested parties above, the design of a secure system
requires identification and definition of the following parameters: (1) the abilities of the attackers, (2) the
level at which security should be implemented, and (3) the implementation technology and operational
environment.
2006 by Taylor & Francis Group, LLC
Design Issues in Secure Embedded Systems 17-3
17.2.1 Abilities of Attackers
Malicious users can be classified in several categories, depending on their knowledge, equipment, etc.
Abraham et al. [5] propose a classification into three categories, based on their knowledge, their
hardware and software equipment, and their funds:
1. Class I: clever outsiders. Very intelligent attackers, not well funded and with no sophisticated equipment.
They do not have specific knowledge of the attacked system; basically, they try to exploit
hardware vulnerabilities and software glitches.
2. Class II: knowledgeable insiders. Attackers with an outstanding technical background and education,
using highly sophisticated equipment and, often, with inside information about the system under attack;
such attackers include former employees who participated in the development cycle of the system.
3. Class III: funded organizations. Attackers who mostly work in teams and have excellent
technical skills and theoretical background. They are well funded, have access to very advanced tools,
and have the capability to analyze the system technically and theoretically, developing highly
sophisticated attacks. Such organizations could be well-organized education foundations, government
institutions, etc.
17.2.2 Security Implementation Levels
Security can be implemented at various system levels, ranging from protection of the physical system itself
to application and network security. Clearly, different mechanisms and implementation technologies
have to be used to implement security at different levels. In general, four levels of security are considered:
(1) physical, (2) hardware, (3) software, and (4) network and protocol security.
Physical security mechanisms aim to protect systems from unauthorized physical access to the system
itself. Protecting systems physically ensures data privacy and data and application integrity. According to
US Federal Standard 1027, physical security mechanisms are considered successful when they ensure that
a possible attack will have a low probability of success and a high probability of tracing the malicious attacker,
in reasonable time. The wide adoption of embedded computing systems in a variety of devices, such as
smartcards, mobile devices, and sensor networks, as well as the ability to network them, for example,
through the Internet or VPNs, has led to revision and reconsideration of physical security. Weingart [6]
surveys possible attacks and countermeasures concerning physical security issues, concluding that physical
security needs continuous improvement and revision in order to remain at the leading edge.
Hardware security may be considered a subset of physical security, referring to security issues concerning
the hardware parts of a computer system. Hardware-level attacks exploit circuit and technological
vulnerabilities and take advantage of possible hardware defects. These attacks do not necessarily require
very sophisticated and expensive equipment. Anderson and Kuhn [7] describe several ways to attack
smartcards and microcontrollers, through the use of unusual voltages and temperatures that affect the
behavior of specific hardware parts, or through microprobing a smartcard chip, such as the Subscriber
Identity Module (SIM) chip found in cellular phones. Reverse engineering attack techniques are equally
successful, as Blythe et al. [8] reported for a wide range of microprocessors. Their work concluded
that special hardware protection mechanisms are necessary to avoid these types of attacks; such
mechanisms include silicon coatings of the chip, increased complexity in the chip layout, etc.
One of the major goals in the design of secure systems is the development of secure software, which is
free of flaws and security vulnerabilities that may appear under certain conditions. Numerous software
security flaws have been identified in real systems, for example, by Landwehr et al. [9], and there have been
several cases where malicious intruders hacked into systems through exploitation of software defects [10].
Some methods for the prevention of such problems have been proposed by Tevis and Hamilton [11].
The use of the Internet, which is an unsafe interconnection for information transfer, as a backbone
network for communicating entities, and the wide deployment of wireless networks, demonstrate that
improvements have to be made in existing protocol architectures in order to provide new, secure
protocols [12,13]. Such protocols will ensure authentication between communicating entities, integrity of
communicated data, protection of the communicating parties, and nonrepudiation (the inability of an
entity to deny its participation in a communication transaction). Furthermore, special attention has to
be paid to the design of secure protocols for embedded systems, due to their physical constraints, that is,
limited battery power and limited processing and memory resources, as well as their cost and communication
requirements.
17.2.3 Implementation Technology and Operational Environment
In regard to implementation technology, systems can be classified by static versus programmable technology
and fixed versus extensible architecture. When static technology is used, the hardware-implemented
functions are fixed and inflexible, but they offer higher performance and can reduce cost. However, static
systems can be more vulnerable to attacks because, once a flaw is identified (for example, in the design
of the system), it is impossible to patch already deployed systems, especially in the case of large installations,
such as SIM cards for cellular telephony or pay-per-view TV. Static systems should be implemented
only once and correctly, which is an unattainable expectation in computing.
In contrast, programmable systems are not limited as static ones are, but they can prove flexible in the
hands of an attacker as well; system flexibility may allow an attacker to manipulate the system in ways not
expected or defined by the designer. Programmability is typically achieved through the use of specialized
software over a general-purpose processor or hardware.
Fixed architectures are composed of specific hardware components that cannot be altered. Typically,
it is almost impossible to add functionality at later stages, but they have a lower cost of implementation and
are, in general, less vulnerable because they offer limited choices to attackers. An extensible architecture
is like a general-purpose processor, capable of interfacing with several peripherals through standardized
connections. Peripherals can be changed or upgraded easily to increase security or to provide new
functionality. However, an attacker can connect malicious peripherals or interface with the system in untested
or unexpected ways. As testing is more difficult than for static systems, one cannot be too confident that
the system operates correctly under every possible input.
Field Programmable Gate Arrays (FPGAs) combine benefits of all types of systems and architectures,
because they combine hardware implementation performance and programmability, enabling system
reconfiguration. They are widely used to implement cryptographic primitives in various systems. Thus,
significant attention has to be paid to the security of FPGAs as independent systems. There exist research
efforts addressing this issue, where systematic approaches are developed and open problems in FPGA
security are addressed; for example, Wollinger et al. [14] provide such an approach and address several
open problems, including resistance to physical attacks.
17.3 Security Constraints in Embedded Systems Design
The design of secure systems requires special considerations, because security functions are resource
demanding, especially in terms of processing power and energy consumption. The limited resources
of embedded systems require novel design approaches in order to deal with trade-offs between
efficiency (speed and cost) and effectiveness (satisfaction of the functional and operational
requirements).
17.3.1 Energy Considerations
Embedded systems are often battery powered, that is, they are power constrained. Battery capacity constitutes
a major bottleneck to processing for security on embedded systems. Unfortunately, improvements in
battery capacity do not follow the improvements in performance, complexity, and functionality
of the systems they power. Gunther et al. [15], Buchmann [16], and Lahiri et al. [17] report the widening
battery gap, due to the exponential growth of power requirements and the linear growth in energy
density. Thus, the power subsystem of embedded systems is a weak point of system security. A malicious
attacker, for example, may mount a DoS attack by draining the system's battery more quickly than usual.
Martin et al. [18] describe three ways in which such an attack may take place: (1) service request power
attacks, (2) benign power attacks, and (3) malignant power attacks. In service request attacks, a malicious
user may repeatedly request the device to serve a power-hungry application, even if the application is
not supported by the device. In benign power attacks, the legitimate user is forced to execute an application
with high power requirements, while in malignant power attacks malicious users modify the executable
code of an existing application, in order to drain as much battery power as possible without changing the
application's functionality. They conclude that such attacks may reduce battery life by one to two orders of
magnitude.
Inclusion of security functions in an embedded system places extra requirements on power consumption
due to: (1) the extra processing power necessary to perform various security functions, such as
authentication, encryption, decryption, signing, and data verification, (2) the transmission of security-related
data between various entities, if the system is distributed, for example, a wireless sensor network, and (3) the
energy required to store security-related parameters.
Embedded systems are often used to deploy performance-critical functions, which require significant processing
power. Inclusion of the cryptographic algorithms that are used as building blocks in secure embedded
design may lead to heavy consumption of the system battery. The energy consumption of the cryptographic
algorithms used in security protocols has been analyzed well, for example, by Potlapally et al. [19]. They
present a general framework that shows asymmetric algorithms having the highest energy cost, symmetric
algorithms as the next power-hungry category, and hash algorithms at the bottom. The power required by
cryptographic algorithms is significant, as measurements indicate [20]. Importantly, in many applications
the power consumed by security functions is larger than that used for the applications themselves. For
example, Raghunathan et al. [21] present the battery gap for a sensor node with an embedded processor,
calculating the number of transactions that the node can serve working in secure or insecure mode until
the system battery runs out. Their results show that working in secure mode drains the battery in less than
half the time of working in insecure mode.
Many applications that involve embedded systems are implemented on distributed, networked
platforms, resulting in a power overhead due to communication between the various nodes of the
system [1]. Considering a wireless sensor network, which is a typical distributed embedded system,
one can easily see that significant energy is consumed in communication between the various nodes. Factors
such as modulation type, data rate, transmit power, and security overhead affect power consumption
significantly [22]. Savvides et al. [23] showed that radio communication between nodes consumes
most of the power, that is, 50 to 60% of the total power, when using the WINS (Wireless Integrated
Network Sensor) platform [24]. Furthermore, in a wireless sensor network, the security functions
consume energy due to extra internode exchange of cryptographic information (key exchange, authentication
information) and per-message security overhead, which is a function of both the number
and the size of messages [20]. It is important to identify the energy consumption of alternative security
mechanisms. Hodjat and Verbauwhede [25], for example, have measured the energy consumption
of two widely used algorithms for the exchange of key information between entities in a distributed
environment: (1) the Diffie-Hellman protocol [26] and (2) the basic Kerberos protocol [26]. Their results show
that Diffie-Hellman, implemented using elliptic curve public key cryptography, consumes 1213.7 mJ,
4296 mJ, and 9378.3 mJ for 128-bit, 192-bit, and 256-bit keys, respectively, while the Kerberos key
exchange protocol using symmetric cryptography consumes 139.62 mJ; this indicates that the Kerberos
protocol configuration consumes significantly less energy.
17.3.2 Processing Power Limitations
Security processing places significant additional requirements on the processing power of embedded
systems, since conventional architectures are quite limited. The term security processing is used to indicate
the portion of the system's computational effort that is dedicated to the implementation of the security
requirements. Since embedded systems have limited processing power, they cannot cope efficiently with
the execution of complex cryptographic algorithms, which are used in the secure design of an embedded
system. For example, the generation of a 512-bit key for the RSA public key algorithm requires 3.4 min
on the Palm IIIx PDA, while encryption using DES takes only 4.9 msec per block, leading to an encryption
rate of 13 Kbps [27]. The adoption of modern embedded systems in high-end systems (servers, firewalls,
and routers), with increasing data transmission rates and complex security protocols, such as SSL, makes
the security processing gap wider and demonstrates that existing embedded architectures need to be
improved, in order to keep up with the increasing computational requirements placed by security
processing.
The wide processing gap has been exposed by measurements, such as those by Ravi et al. [4], who measured
the security processing gap in the client-server model using the SSL protocol for various embedded
microprocessors. Specifically, considering a StrongARM (206 MHz SA-1110) processor, which may be
used in a low-end system such as a PDA or a mobile device, dedicating 100% of the processing power to
SSL processing can achieve data rates of up to 1.8 Mbps, while a 2.8 GHz Xeon achieves data rates of up to
29 Mbps. Considering that the data rates of low-end systems range between 128 Kbps and 2 Mbps, while
data rates of high-end systems range between 2 and 100 Mbps, it is clear that the processors mentioned
above cannot achieve data rates higher than their maximum, leading to a security processing gap.
17.3.3 Flexibility and Availability Requirements
The design and implementation of security in an embedded system does not mean that the system will
not change its operational security characteristics over time. Considering that security requirements
evolve and security protocols are continuously strengthened, embedded systems need to be flexible and
adaptable to changes in security requirements, without losing their performance and availability goals as
well as their primary security objectives.
Modern embedded systems are characterized by their ability to operate in different environments, under
various conditions. Such an embedded system must be able to achieve different security objectives in every
environment; thus, the system must be characterized by significant flexibility and efficient adaptation.
For example, consider a PDA with mobile telecommunication capabilities that may operate in a wireless
environment [28-30] or provide 3G cellular services [31]; different security objectives must be satisfied in
each case. Another issue that must be addressed is the implementation of different security requirements
at different layers of the protocol architecture. Consider, for example, a mobile PDA that must be able to
execute several security protocols, such as IPSec [13], SSL [12], and WEP [32], depending on its specific
application.
Importantly, availability is a significant requirement that needs special support, considering that it
should be provided in a world of evolving functionality and increasing system complexity.
Conventional embedded systems should aim to provide high availability not only in their
expected, attack-free environment but in an emerging hostile environment as well.
17.3.4 Cost of Implementation
Inclusion of security in embedded system design can increase system cost dramatically. The problem
originates from the strong resource limitations of embedded systems, under which the system is required
to exhibit great performance as well as a high level of security while retaining a low cost of implementation.
It is necessary to perform a careful, in-depth analysis of the designed system, in terms of the abilities of
possible adversaries, the environmental conditions under which the system will operate, etc., in order
to estimate cost realistically. Consider, for example, the incorporation of a tamper-resistant cryptographic
module in an embedded system. As described by Ravi et al. [4], according to the Federal Information
Processing Standard [33], a designer can distinguish four levels of security requirements for cryptographic
modules. The choice of security level influences design and implementation cost significantly; so, the
manufacturer faces a trade-off between the security requirements that will be implemented and the cost
of manufacturing.
17.4 Design of Secure Embedded Systems
Secure embedded systems must provide basic security properties, such as data integrity, as well as mechanisms
and support for more complex security functions, such as authentication and confidentiality.
Furthermore, they have to support the security requirements of applications, which are implemented,
in turn, using the security mechanisms offered by the system. In this section, we describe the main design
issues at both the system and the application level.
17.4.1 System Design Issues
Design of secure embedded systems needs to address several issues and parameters, ranging from the
employed hardware technology to software development methodologies. Although several techniques
used in general-purpose systems can be used effectively in embedded system development as well, there
are specific design issues that need to be addressed separately, because they are unique to, or weaker in,
embedded systems, due to the high volume of available low-cost systems that can be used by malicious
users to develop attacks. The major design issues are tamper-resistance properties, memory
protection, IP protection, management of processing power, communication security, and embedded
software design. These issues are covered in the following paragraphs.
Modern secure embedded systems must be able to operate in various environmental conditions, without
loss of performance or deviation from their primary goals. In many cases they must survive various
physical attacks, and thus must have tamper-resistance mechanisms. Tamper resistance is the property that enables
systems to prevent the distortion of their physical parts. In addition to tamper-resistance mechanisms, there
exist tamper-evidence mechanisms, which allow users or technical staff to identify tampering attacks
and take countermeasures. Computer systems are vulnerable to tampering attacks, where malicious users
intervene in hardware system parts and compromise them, in order to take advantage of them. The security of
many critical systems relies on the tamper resistance of smartcards and other embedded processors. Anderson
and Kuhn [7] describe various techniques and methods to attack tamper-resistant systems, concluding
that tamper-resistance mechanisms need to be extended or reevaluated.
Memory technology may be an additional weakness in system implementation. Typical embedded systems
have ROM, RAM, and EEPROM memory to store data. EEPROM memory constitutes the vulnerable
spot of such systems, because it can be erased by malicious users with the use of appropriate electrical
signaling [7].
Intellectual Property (IP) protection of manufacturers is an important issue addressed in secure embedded
systems. Complicated systems tend to be partitioned into smaller independent modules, leading to
module reusability and cost reduction. These modules embody the IP of the manufacturers, which needs
to be protected from third-party users, who might claim and use these modules. The illegal users of an
IP block do not necessarily need to have full, detailed knowledge of the IP component, since IP blocks are
independent modules which can very easily be incorporated and integrated with the rest of the system
components. Lach et al. [34] propose a fingerprinting technique for IP blocks implemented using FPGAs,
through a unique marker embedded in the IP hardware that identifies both the origin and the recipient
of the IP block. They also state that the removal of such a mark is extremely difficult, with a
probability of less than one in a million.
Implementation of security techniques for tamper resistance, tamper prevention, and IP protection
may require additional processing power, which is limited in embedded systems. The processing gap
between the computational requirements of security and the available processing power of embedded
processors requires special consideration. A variety of architectures and enhancements in security protocols
have been proposed in order to bridge that gap. Burke et al. [35] propose enhancements to
the Instruction Set Architecture (ISA) of embedded processors, in order to efficiently calculate various
cryptographic primitives, such as permutations, bit rotations, fast substitutions, and modular arithmetic.
Another approach is to build dedicated cryptographic embedded coprocessors with their own
ISA; the CryptoManiac coprocessor [36] is an example of this approach. Several vendors, for example,
Infineon [37] and ARM [38], have manufactured microcontrollers with embedded coprocessors
dedicated to serving cryptographic functions. Intel [39] announced a new generation of 64-bit embedded
processors with features that can speed up processing-hungry algorithms, such as cryptographic
ones; these features include larger register sets, parallel execution of computations, improvements in
large-integer multiplication, etc. A third approach exploits software optimizations. Potlapally
et al. [40] have conducted extensive research into the improvement of public-key algorithms, studying
various algorithmic optimizations and identifying an algorithm design space where performance is improved
significantly. Also, SmartMIPS [41] provides system flexibility and adaptation to changes in security
requirements through high-performance software-based enhancements of its cryptographic modules,
while it supports various cryptographic algorithms.
Even if the processing gap is bridged and security functions are provided, embedded systems are
required to support secure communications as well, considering that embedded applications are often
implemented in a distributed environment where communicating systems may exchange (possibly) sensitive
data over an untrusted network (wired, wireless, or mobile), such as the Internet, a Virtual Private Network,
the public telephone network, etc. In order to fulfill the basic security requirements for secure communications,
embedded systems must be able to use strong cryptographic algorithms and to support various
protocols. One of the fundamental requirements for secure protocols is interoperability, leading to
the requirement for system flexibility and adaptability. Since an embedded system can operate in several
environments, for example, a mobile phone may provide 3G cellular services or connect to a wireless
LAN, it is necessary for the system to operate securely in all environments without loss of performance.
Furthermore, as security protocols are developed for various layers of the OSI reference model, embedded
systems must be adaptable to different security requirements at each layer of the architecture. Finally,
the continuous evolution of security protocols requires system flexibility, as new standards are developed,
requirements are reevaluated, and new cryptographic techniques are added to the overall architecture.
A comprehensive presentation of the evolution of security protocols in wireless communications, such as
WTLS [42], MET [43], and IPSec [13], is provided by Raghunathan et al. [21]. An important consideration
in the development of flexible secure communication subsystems for embedded systems is the limitation
of energy, processing, and memory resources. The performance/cost trade-off leads to special attention
to the placement of protocol functions in hardware (for high performance) or software (for
cost reduction).
Embedded software, such as the operating system or application-specific code, constitutes a crucial
factor in secure embedded system design. Kocher and co-workers [3] identify three basic factors that make
embedded software development a challenging area of security: (1) complexity of the system, (2) system
extensibility, and (3) connectivity. Embedded systems serve critical, complex, and hard-to-implement
applications, with many parameters that need to be considered, which, in turn, leads to buggy and
vulnerable software. Furthermore, the required extensibility of conventional embedded systems makes
the exploitation of vulnerabilities relatively easy. Finally, as modern embedded systems are designed with
network connectivity, the higher the connectivity degree of the system, the higher the risk for a software
breach to expand as time goes by. Many attacks can be implemented by malicious users who exploit software
glitches and lead to system unavailability, which can have a disastrous impact, for example, a DoS attack
on a military embedded system. Landwehr et al. [9] present a survey of common software security faults,
helping designers to learn from past faults. Tevis and Hamilton [11] propose some methods to detect and
prevent software vulnerabilities, focusing on weaknesses that have to be avoided in order to prevent
buffer overflow attacks, heap overflow attacks, array indexing attacks, etc. They also provide some coded
security programs that help designers analyze the security of their software. Buffer overflow attacks
constitute the most widely used type of attack leading to unavailability of the attacked system; with
these attacks, malicious users exploit system vulnerabilities and are able to execute malicious code, which
can cause several problems, such as a system crash preventing legitimate users from using the system,
loss of sensitive data, etc. Shao et al. [44] propose a technique, called Hardware/Software Defender, which
aims to protect an embedded system from buffer overflow attacks; their proposal is to design a secure
instruction set, extending the instruction set of existing microprocessors, and to require outside
software developers to call secure functions from that set. The limited memory resources of embedded
systems, specifically the lack of disk space and virtual memory, make the system vulnerable to
memory-hungry applications: applications that require an excessive amount of memory do not have a swap
file to grow into and can very easily cause an out-of-memory unavailability of the system. Given the significance
of this potential problem and attack, Biswas et al. [45] propose mechanisms to protect an embedded system
from such a memory overflow, thus providing reliability and availability of the system: (1) use of software
runtime checks, in order to detect possible out-of-memory conditions, (2) allowing out-of-memory data
segments to be placed in free system space, and (3) compressing already used and unnecessary data.
17.4.2 Application Design Issues
Embedded system applications present significant challenges to system designers aiming to achieve
efficient and secure systems.
A key issue in secure embedded design is user identification and access control. User identification
includes the necessary mechanisms that guarantee that only legitimate users have access to system resources,
and that can also verify, whenever requested, the identity of the user who has access to the system. The explosive
growth of mobile devices and their use in critical, sensitive transactions, such as bank transactions,
e-commerce, etc., demands secure systems with high performance and low cost. This demand has become
urgent and crucial considering the successful attacks on these systems, such as the recent hardware hacking
attacks on PIN (Personal Identification Number)-based bank ATMs (Automatic Teller Machines), which have
led to significant loss of money and decreased the credibility of financial organizations in the eyes of the public.
A solution to this problem may come from an emerging new technology for user identification that is
based on biometric recognition, for both user identification and verification. Biometrics are based on
pattern recognition in biological data acquired from a user who wants to gain access to a system,
that is, palm prints [46], fingerprints [47], iris scans, etc., and on comparing them with the data stored
in databases identifying the legitimate users of the system [48]. Moon et al. [49] propose
a secure smartcard that uses biometric capabilities, claiming that such a system is
less vulnerable to attacks when compared with software-based solutions, and that the combination of
smartcard and fingerprint recognition is much more robust than PIN-based identification. Implementation
of such systems is realistic, as Tang et al. [50] illustrated with the implementation of a fingerprint
recognition system with high reliability and high speed; they achieved an average computational time
per fingerprint image of less than 1 sec, using a fixed-point arithmetic StrongARM 200 MHz embedded
processor.
As mentioned previously, an embedded system must store information that enables it to identify and
validate the users that have access to the system. But how does an embedded system store this information?
Embedded systems use several types of memory to store different types of data: (1) ROM/EPROM to store
programming data used to serve generic applications, (2) RAM to store temporary data, and (3) EEPROM
and FLASH memories to store mobile downloadable code [20]. In an embedded device such as a PDA
or a mobile phone, several pieces of sensitive information, such as PINs, credit card numbers, personal
data, keys, and certificates for authorization purposes, may be permanently stored in secondary storage
media. The requirement to protect this information, as well as the rapid growth of the communications
capabilities of embedded devices, for example, mobile Internet access, which makes embedded systems
vulnerable to network attacks as well, leads to increasing demands for secure storage space. The use
of hard cryptographic algorithms to ensure data integrity and condentiality is not feasible in most
embedded systems, mainly due to their limited computational resources. Benini et al. [51] present a
survey of architectures and techniques used to implement memory for embedded systems, taking into
consideration energy limitations of embedded systems. Rosenthal [52] presents an effective way to ensure
that data cannot be erased or destroyed by hiding memory from the processor through use of a serial
EEPROM, which is the same as standard EEPROM with the only difference that a serial link binds the
memory with the processor reading/writing data, using a strict protocol. Actel [53] describes security
issues and design considerations for the implementation of embedded memories using FPGAs claiming
2006 by Taylor & Francis Group, LLC
17-10 Embedded Systems Handbook
that SRAM FPGAs are vulnerable to Level I attacks [5], while it is preferable to use nonvolatile Flash-
and antifuse-based FPGA memories, which provide higher levels of security relative to SRAM FPGAs.
Another key issue in secure embedded system design is to ensure that any digital content already stored
in or downloaded to the embedded system will be used according to the terms and conditions the content
provider has set and in accordance with the agreements between user and provider; such content includes
software for a specific application or a hardware component embedded in the system by a third-party
vendor. It is essential that conventional embedded devices, mobile or not, be enhanced with Digital Rights
Management (DRM) mechanisms, in order to protect the digital IP of manufacturers and vendors. Trusted
computing platforms constitute one approach to resolving this problem. Such platforms are significant, in
general, as indicated by the Trusted Computing Platform Alliance (TCPA) [54], which tries to standardize
the methods to build trusted platforms. For embedded systems, IP protection can be implemented in
various ways. A method to produce a trusted computing platform based on a trusted, secure hardware
component, called a spy, can lead to systems executing one or more applications securely [55,56]. Ways to
transform a 3G mobile device into a trusted one, capable of protecting content against analysis and
probing of the various components in a trusted system, have been investigated by Messerges and Dabbish [57];
for example, the operating system of the embedded system is enhanced with a DRM security
hardware/software component, which transforms the system into a trusted one. Alternatively, Thekkath et al. [58]
propose a method to prevent unauthorized reading, modification, and copying of proprietary software
code, using an eXecute Only Memory (XOM) system that permits only code execution. The concept is that
code stored in a device can be marked as execute-only, and content-sensitive applications can be stored
in independent compartments [59]. If an application tries to access data outside its compartment, it
is stopped.
Significant attention has to be paid to protection against possible attacks through malicious downloadable
software, such as viruses, Trojans, logic bombs, etc. [60]. The wide deployment of distributed embedded
systems and the Internet has created the requirement that portable embedded systems, for
example, mobile phones and PDAs, be able to download and execute various software applications. This ability
may be new to the world of portable, highly constrained embedded systems, but it is not new in the world of
general-purpose systems, which have long been able to download and execute Java applets and executable
files from the Internet or from other network resources. One major problem with this service
is that users cannot be sure about the content of the software that is downloaded and executed on their
system(s), who its creator is, and what its origin is. Kingpin and Mudge [61] provide a comprehensive
presentation of security issues in personal digital assistants, analyzing in detail what malicious software
is (i.e., viruses, Trojans, backdoors, etc.), where it resides, and how it spreads, giving future users
of such devices a deeper understanding of the extra security risks that arise with the use of mobile
downloadable code. An additional important consideration is the robustness of the downloadable code:
once the mobile code is considered secure, downloaded, and executed, it must not affect preinstalled
system software. Various techniques have been proposed to protect remote hosts from malicious mobile
code. The sandbox technique, proposed by Rubin and Geer [62], is based on the idea that the mobile code
cannot execute system functions, that is, it cannot affect the file system or open network connections.
Instead of disabling mobile code from execution, one can empower it under enhanced security policies, as
Venkatakrishnan et al. [63] propose. Necula [64] suggests the use of proof-carrying code: the producer
of the mobile code, a possibly untrusted source, must embed some type of proof that can be tested by the
remote host in order to establish the validity of the mobile code.
17.5 Cryptography and Embedded Systems
Secure embedded systems should support the basic security functions of (1) confidentiality, (2) integrity,
and (3) authentication. Cryptography provides mechanisms that ensure these three requirements
are met. However, implementation of cryptography in embedded systems can be a challenging
task. The requirement of high performance has to be achieved in a resource-limited environment; this
Design Issues in Secure Embedded Systems 17-11
task is even more challenging when low-power constraints exist. Performance usually dictates an increased
cost, which is not always desirable or possible. Cryptography can protect digital assets provided that the
secret keys of the algorithms are stored and accessed in a secure manner. For this reason, the use of specialized
hardware devices to store the secret keys and to implement cryptographic algorithms is preferred over the
use of general-purpose computers. However, this also increases the implementation cost and results in
reduced flexibility. On the other hand, flexibility is required, because modern cryptographic protocols do
not rely on a specific cryptographic algorithm but rather allow the use of a wide range of algorithms for
increased security and adaptability to advances in cryptanalysis. For example, both the SSL and IPSec network
protocols support numerous cryptographic algorithms that perform the same function, for example,
encryption. The protocol enables negotiation of the algorithms to be used, in order to ensure that both
parties use the desired level of protection dictated by their security policies.
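Such negotiation can be sketched as follows. The algorithm names, the preference ordering, and the `negotiate` helper are illustrative assumptions, not the actual SSL/IPSec handshake logic; the point is only the first-mutually-acceptable-choice pattern.

```python
# Hypothetical sketch of cipher-suite negotiation in the spirit of
# SSL/TLS or IPSec: the client offers its supported algorithms in
# preference order and the server picks the first mutually acceptable one.

def negotiate(client_offer, server_supported):
    """Return the first client-preferred algorithm the server accepts."""
    for algorithm in client_offer:
        if algorithm in server_supported:
            return algorithm
    raise ValueError("no common cipher suite; connection must be refused")

client_offer = ["AES-256-GCM", "AES-128-CBC", "3DES-CBC"]
server_supported = {"AES-128-CBC", "3DES-CBC"}

print(negotiate(client_offer, server_supported))  # AES-128-CBC
```

If the intersection is empty, the connection is refused rather than silently downgraded, which is what the security policies of both parties require.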
Apart from the performance issue, a correct cryptographic implementation requires expertise that is
not always available or affordable during the lifecycle of a system. Insecure implementations of theoretically
secure algorithms have made their way to headline news quite often in the past. An excellent
survey of cryptography implementation faults is provided in [65], while Anderson [66] focuses on the
causes of cryptographic system failures in banking applications. A common misunderstanding is the use
of random numbers. Pure Linear Feedback Shift Registers (LFSRs) and other pseudorandom number
generators produce random-looking sequences that may be sufficient for scientific experiments but can
be disastrous for cryptographic algorithms, which require unpredictable random input. On the other
hand, the cryptographic community has focused on proving the theoretical security of various cryptographic
algorithms and has paid little attention to actual implementations on specific hardware platforms.
In fact, many algorithms are designed with portability in mind, and efficient implementation on a specific
platform meeting specific requirements can be quite tricky. This communication gap between vendors
and cryptographers intensifies in the case of embedded systems, which can have many design choices and
constraints that are not easily comprehensible.
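The LFSR danger can be demonstrated concretely. The sketch below uses the classic 16-bit maximal-length register (taps chosen for the polynomial x^16 + x^14 + x^13 + x^11 + 1); the seed value and tap choice are the standard textbook example, and the point is that 16 observed output bits fully determine the internal state, so every later bit is predictable.

```python
# A 16-bit Fibonacci LFSR. Its output looks random, but an eavesdropper
# who sees 16 consecutive output bits can clone the register and predict
# every future bit - fatal for cryptographic use.

def step(state):
    # feedback taps at bit positions 0, 2, 3, 5
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    out = state & 1                       # bit shifted out this cycle
    return (state >> 1) | (bit << 15), out

def stream(state, n):
    """Return n output bits of the LFSR for a nonzero 16-bit seed."""
    bits = []
    for _ in range(n):
        state, out = step(state)
        bits.append(out)
    return bits

observed = stream(0xACE1, 64)
# The first 16 output bits are exactly the initial register contents
# (least significant bit first), so the "secret" seed is recovered and
# the remaining stream is reproduced perfectly.
recovered = sum(b << i for i, b in enumerate(observed[:16]))
assert recovered == 0xACE1
assert stream(recovered, 64) == observed
```

A cryptographically secure generator must not allow state recovery from its output; pseudorandomness that merely passes statistical tests is not enough.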
In the late 1990s, Side-Channel Attacks (SCAs) were introduced. SCAs are a method of cryptanalysis that
focuses on the implementation characteristics of a cryptographic algorithm in order to derive its secret
keys. This advancement bridged the gap between embedded systems, a common target of such attacks, and
cryptographers. Vendors became aware of and concerned by this new form of attack, while cryptographers
focused on the specifics of the implementations, in order to advance their cryptanalysis techniques.
In this section, we present side-channel cryptanalysis. First, we introduce the concept of tamper
resistance, the realization of side channels, and the information leakage through them from otherwise
secure devices; then, we demonstrate how this information can be exploited to recover the secret keys of a
cryptographic algorithm, presenting case studies of attacks on the RSA algorithm.
17.5.1 Physical Security
Secrecy is always a desirable property. In the case of cryptographic algorithms, the secret keys of the
algorithm must be stored, accessed, used, and destroyed in a secure manner, in order to provide the
required security functions. This statement is often overlooked, and design or implementation flaws result
in insecure cryptographic implementations. It is well known that general-purpose computing systems and
operating systems cannot provide sufficient protection mechanisms for cryptographic keys. For example,
SSL certificates for web servers are stored unprotected on servers' disks and rely on file system permissions
for protection. This is necessary because web servers must offer secure services unattended. The alternative,
a human providing the password to access the certificate for each connection, would not be an
efficient choice in the era of e-commerce, where thousands of transactions are made every day. On the
other hand, any software bug in the operating system, in a high-privileged application, or in the web server
software itself may expose this certificate to malicious users.
Embedded systems are commonly used for implementing security functions. Since they are complete
systems, they can perform the necessary cryptographic operations in a sealed and controlled environment
[67-69]. Tamper resistance refers to the ability of a system to resist tampering attacks, that is,
attempts to bypass its attack-prevention mechanisms. The IBM PCI Cryptographic Coprocessor [70] is
such a system, having achieved FIPS 140-2 Level 4 certification [33]. The advancement of DRM technology into
consumer devices and general-purpose computers drives the use of embedded systems for cryptographic
protection of IP. Smartcards are a well-known example of tamper-resistant embedded systems that are
used for financial transactions and subscription-based service provision.
In many cases, embedded systems used for security-critical operations do not implement any tamper-resistance
mechanisms. Rather, a thin layer of obscurity is preferred, for reasons of both simplicity and performance.
However, as users become more interested in bypassing the security mechanisms of the system,
the thin layer of obscurity is easily broken and the cryptographic keys are publicly exposed. The Adobe
eBook software encryption [71], the Microsoft Xbox case [72], USB hardware token devices [73], and
the DVD CSS copy protection scheme [74] are examples of systems that implemented security by
obscurity and were easily broken.
Finally, an often neglected issue is lifecycle-wide management of cryptographic systems. While a
device may be withdrawn from operation, the data it has stored or processed over time may still need to
be protected. Key security that relies on the fact that only authorized personnel have access to the
system may not be sufficient for a recycled device. Garfinkel and Shelat [75], Skorobogatov [76], and
Gutmann [77] present methods for recovering data from devices using noninvasive techniques.
17.5.2 Side-Channel Cryptanalysis
Until the mid-1990s, academic research on cryptography focused on the mathematical properties of
cryptographic algorithms. Paul Kocher was the first to present cryptanalysis attacks on implementations
of cryptographic algorithms, attacks based on the implementation properties of a system.
Kocher observed that a cryptographic implementation of the RSA algorithm required varying amounts
of time to encrypt a block of data depending on the secret key used. Careful analysis of the timing
differences allowed him to derive the secret key, and he extended this method to other algorithms as
well [78]. This result came as a surprise, since the RSA algorithm had withstood years of mathematical
cryptanalysis and was considered secure [79]. A short time later, Boneh et al. presented theoretical attacks
showing how to derive the secret keys from implementations of the RSA algorithm and the Fiat-Shamir and
Schnorr identification schemes [80], revised in Reference 81, while similar results were presented by
Bao et al. [82].
These findings revealed a new class of attacks on cryptographic algorithms. The term side-channel
attacks (SCAs), which first appeared in Reference 83, has been widely used to refer to this type of cryptanalysis,
while the terms fault-based cryptanalysis, implementation cryptanalysis, active/passive hardware attacks,
leakage attacks, and others have also been used. Cryptographic algorithms acquired a new security
dimension: that of their exact implementation. Cryptographers had previously focused on understanding the
underlying mathematical problems, to prove or conjecture the security of a cryptographic algorithm
in terms of abstract mathematical symbols. Now, regardless of the hardness of the underlying mathematical
problems, an implementation may be vulnerable and allow the extraction of secret keys or other sensitive
material. Implementation vulnerabilities are of course not a new security concept. In the previous
section, we presented some impressive attacks on security that were based on implementation faults. The
new insight of SCA is that even cryptographic algorithms that are otherwise considered secure can be
vulnerable to such faults. This observation is of significant importance, since cryptography is widely
used as a major building block for security; if the cryptographic algorithms can be rendered insecure, the whole
construction collapses.
Embedded systems, and especially smartcards, are a popular target for SCA. To understand this, recall that
such systems are usually owned by a service provider, such as a mobile phone operator, a TV broadcaster,
or a bank, and possessed by service clients. The service provider relies on the security of the embedded
system in order to prove service usage by the clients, such as phone calls, movie viewing, or a purchase, and
to charge the client accordingly. On the other hand, consumers have an incentive to bypass these mechanisms
in order to enjoy free services. Given that SCAs are implementation specific and rely, as we will present later,
on the ability to interfere, passively or actively, with the device implementing a cryptographic algorithm,
embedded systems are a further attractive target, given their resource limitations, which make attack
efforts easier.
In the following, we present the classes of SCA and the countermeasures that have been developed. The
technical field remains highly active, since ingenious channels continuously appear in the literature.
Embedded system vendors must study the attacks carefully, evaluate the associated risks for their environment,
and ensure that appropriate countermeasures are implemented in their systems; furthermore, they
must be prepared to adapt promptly to new techniques for deriving secrets from their systems.
17.5.3 Side-Channel Implementations
A side channel is any physical channel that can carry information from the operation of a device while
it implements a cryptographic operation; such channels are not captured by the existing abstract mathematical
models. The definition is quite broad, and the inventiveness of attackers is remarkable. Timing
differences, power consumption, electromagnetic emissions, acoustic noise, and faults have all been
exploited for leaking information out of cryptographic systems.
The channel realizations can be categorized in three broad classes: physical or probing attacks, fault-induction
or glitch attacks, and emission attacks, such as TEMPEST. We briefly review the first two classes;
readers interested in TEMPEST attacks are referred to Reference 84.
The side channels may seem unavoidable and a frightening threat. However, it should be strongly
emphasized that in most cases, reported attacks, both theoretical and practical, rely for their success on
detailed knowledge of the platform under attack and of the specific implementation of the cryptographic
algorithm. For example, power analysis is successful in most cases because cryptographic algorithms
tend to use only a small subset of a processor's instruction set, and especially simple instructions, such
as LOAD, STORE, XOR, AND, and SHIFT, in order to yield elegant, portable, and high-performance
implementations. This allows an attacker to minimize the power profiles he or she must construct
and simplifies the distinction between the different instructions being executed.
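Why data-dependent power draw helps the attacker can be illustrated with a toy Hamming-weight power model, a standard first-order assumption in the power analysis literature. The trace values below are simulated, not measured, and the helper names are illustrative.

```python
# Illustrative first-order power model (a common assumption, not a
# measurement): in CMOS devices, instantaneous power draw often
# correlates with the Hamming weight of the data being manipulated.
# A toy "trace" for XORing a secret key byte into data makes the
# key dependence visible.

def hamming_weight(x):
    return bin(x).count("1")

def leaky_xor_trace(data_bytes, key_byte):
    """Return (results, simulated power samples) for data XOR key."""
    out, samples = [], []
    for d in data_bytes:
        v = d ^ key_byte
        out.append(v)
        samples.append(hamming_weight(v))  # leakage: HW of the intermediate
    return out, samples

_, trace_a = leaky_xor_trace([0x00, 0x01, 0x03], key_byte=0x3C)
_, trace_b = leaky_xor_trace([0x00, 0x01, 0x03], key_byte=0xFF)
assert trace_a == [4, 5, 6] and trace_b == [8, 7, 6]  # distinguishable
```

An attacker who can choose or observe the data and record many such traces can test key-byte hypotheses by correlation, which is exactly the principle behind differential power analysis.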
17.5.3.1 Fault-Induction Techniques
Devices are always susceptible to erroneous computations or other kinds of faults, for several reasons.
Faulty computations are a known issue in space systems because, in deep space, devices are exposed to
radiation that can cause temporary or permanent bit flips, gate destruction, or other problems. Incomplete
testing during manufacturing may allow imperfect designs to reach the market, as in the case of the
Intel Pentium FDIV bug [85], or devices may be operated in conditions outside their specifications [86].
Careful manipulation of the power supply or the clock oscillator can also cause glitches in code
execution by tricking the processor, for example, into executing unknown instructions or bypassing a control
statement [87].
Some researchers have questioned the feasibility of fault-injection attacks on real systems [88]. While
fault injection may seem an approach that requires expensive and specialized equipment, there have
been reports that fault injection can be achieved with low-cost and readily available equipment. Anderson
and Kuhn [89] and Anderson [66] present low-cost attacks on tamper-resistant devices, which achieve
extraction of secret information from smartcards and similar devices. Kömmerling and Kuhn [87] present
noninvasive fault-injection techniques, for example, manipulation of the power supply. Anderson [90] supports
the view that the underground community has been using such techniques for quite a long time to
break the security of the smartcards of pay-TV systems. Furthermore, Weingart [6] and Aumüller et al. [91]
present attacks performed in a controlled lab environment, proving that fault-injection attacks are feasible.
Skorobogatov and Anderson [140] introduce low-cost light flashes, such as a camera flash, as a means
to introduce errors, while eddy-current attacks are introduced in Reference 92. A complete presentation
of fault-injection methods is given in Reference 93, along with experimental evidence of the
applicability of the methods to industrial systems and anecdotal information.
The combined time/space isolation problem [94] is of significant importance in fault-induction attacks.
The space isolation problem refers to isolating the appropriate space (area) of the chip in which to
introduce the fault. The space isolation problem has four parameters:
1. Macroscopic. The part of the chip where the fault can be injected. Possible answers can be one or
more of the following: main memory, address bus, system bus, register file.
2. Bandwidth. The number of bits that can be affected. It may be possible to change just one bit or
multiple bits at once. The exact number of changed bits can be controllable (e.g., one) or follow a random
distribution.
3. Granularity. The area where the error can occur. The attacker may direct the fault-injection position
at bit level or at a wider area, such as a byte or a multibyte area. The fault-injected area can be covered by
a single error or by multiple errors. How are these errors distributed with respect to the area? They may
cluster around the target or be evenly distributed.
4. Lifetime. The time duration of the fault. It may be a transient fault or a permanent fault. For
example, a power glitch may cause a transient fault at a memory location, since the next time the location
is written, a new value will be correctly stored. In contrast, cell or gate destruction will result in
a permanent error, since the output bit will be stuck at 0 or 1, independently of the input.
The time isolation problem refers to the time at which a fault is injected. An attacker may be able to
synchronize exactly with the clock of the chip or may introduce the error in a random fashion. This
granularity is the only parameter of the time isolation problem. Clearly, the ability to inject a fault with
clock-period granularity is desirable, but impractical in real-world applications.
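The lifetime parameter above can be pictured with a toy software model (an assumed illustration, not a description of real hardware): a transient fault flips bits once and disappears on the next write, while a permanent stuck-at fault survives rewrites.

```python
# Toy model of the fault-lifetime parameter: a transient bit flip is
# erased by the next write, whereas a permanent stuck-at-1 fault
# corrupts every subsequent write.

class FaultyCell:
    def __init__(self, value):
        self.value = value
        self.stuck_mask = 0          # bits permanently stuck at 1

    def write(self, value):
        self.value = value | self.stuck_mask

    def transient_flip(self, mask):
        self.value ^= mask           # one-shot bit flip

    def permanent_stuck_at_one(self, mask):
        self.stuck_mask |= mask
        self.value |= mask

cell = FaultyCell(0b1010)
cell.transient_flip(0b0001)
assert cell.value == 0b1011          # fault visible now...
cell.write(0b1010)
assert cell.value == 0b1010          # ...but erased by the rewrite
cell.permanent_stuck_at_one(0b0100)
cell.write(0b0000)
assert cell.value == 0b0100          # stuck bit survives writing zero
```

The bandwidth and granularity parameters correspond to how wide and how precisely placed the `mask` can be made by the attacker.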
17.5.3.2 Passive Side Channels
Passive side channels are not a new concept in cryptography and security. The information available from
the now partially declassified TEMPEST project reveals helpful insights into how electromagnetic emissions
occur and can be used to reconstruct signals for surveillance purposes. A good review of the subject
is provided in chapter 15 of Reference 90. Kuhn [84,95,96] presents innovative use of electromagnetic
emissions to reconstruct information from CRT and LCD displays, while Loughry and Umphress [97]
reconstruct information flowing through network devices using the emissions of their LEDs.
The new concept in this area is the fact that such emissions can also be used to derive secret information
from an otherwise secure device. Probably the first such attack took place in 1956 [98]. MI5, the British
intelligence service, used a microphone to capture the sound of the rotor clicks of a Hagelin machine in order
to deduce the core position of some of its rotors. This reduced the problem of calculating
the initial setup of the machine to within the range of their then-available resources, allowing them to eavesdrop
on the encrypted communications for quite a long time. While this so-called acoustic cryptanalysis may seem
outdated, researchers have recently provided a fresh look at the topic by monitoring low-frequency (kHz)
sounds and correlating them with operations performed by a high-frequency (GHz) processor [99].
Researchers have been quite creative and have used many types of emissions or other physical interactions
of the device with the environment in which it operates. Kocher [78] introduced the idea of monitoring the
execution time of a cryptographic algorithm in order to identify the secret keys used. The key concept
in this approach is that an implementation of an algorithm may contain branches and other conditional
execution, or the implementation may follow different execution paths. If these variances depend on the
bit values of a secret key, then statistical analysis can reveal the secret key bit by bit. Coron et al. [100]
explain the power dissipation sources and causes, while Kocher et al. [101] present how power consumption
can also be correlated with key bits. Rao and Rohatgi [102] and Quisquater and Samyde [86]
introduce electromagnetic analysis. Probing attacks can also be applied to reveal the Hamming weight of
data transferred across a bus or stored in memory; this approach is also heavily dependent on the exact
hardware platform [103,104].
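The timing dependence Kocher exploited is visible in textbook left-to-right square-and-multiply exponentiation. In the sketch below, operation counts stand in for wall-clock time; the toy modulus and exponents are illustrative only.

```python
# Textbook left-to-right square-and-multiply for m^d mod n. Each 1 bit
# in the exponent costs an extra multiplication, so total running time
# leaks the Hamming weight (and, with finer analysis, individual bits)
# of the secret exponent d.

def modexp_counting(m, d, n):
    """Return (m**d % n, number of modular multiplications performed)."""
    result, ops = 1, 0
    for bit in bin(d)[2:]:
        result = (result * result) % n    # square: always performed
        ops += 1
        if bit == "1":
            result = (result * m) % n     # multiply: only for 1 bits
            ops += 1
    return result, ops

r1, ops_low = modexp_counting(5, 0b1000001, 2357)   # few 1 bits: cheap
r2, ops_high = modexp_counting(5, 0b1111111, 2357)  # many 1 bits: costly
assert ops_low < ops_high
assert r1 == pow(5, 0b1000001, 2357)                # result is correct
```

Constant-time implementations defeat this particular leak by performing the same operations regardless of key bits, for example with a Montgomery ladder.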
While passive side channels are usually considered in the context of embedded systems and other resource-limited
environments, complex computing systems may also have passive side channels. Page [105]
explores the theoretical use of timing variations due to the processor cache in order to extract secret keys.
Song et al. [106] take advantage of a timing channel in the secure communication protocol SSH to recover
user passwords, while Felten and Schneider [107] present timing attacks on web privacy. A malicious web
server can inject client-side code that fetches specific pages transparently on behalf of the user; the
server would like to know if the user has visited these pages before. The time difference between fetching
a web page from the remote server and accessing it from the user's cache is sufficient to identify whether the
user has visited the page before. A more impressive result, directly related to cryptography, is presented in
Reference 108, where remote timing attacks on web servers implementing the SSL protocol are shown to
be practical and the malicious user can extract the server's certificate private key by measuring its response
times.
17.5.4 Fault-Based Cryptanalysis
The first theoretical active attacks were presented in References 80 and 82. The attacks in the former paper
focused on RSA, when implemented with the Chinese Remainder Theorem (CRT) and the Montgomery
multiplication method, and on the Fiat-Shamir and Schnorr identification schemes. The latter work
focuses on cryptosystems whose security is based on the Discrete Logarithm Problem and presents attacks
on the ElGamal signature scheme, the Schnorr signature scheme, and DSA. The attack on the Schnorr
signature scheme is extended, with some modification, to the identification scheme as well. Furthermore,
the second paper independently reports an attack on RSA with Montgomery multiplication. Since then, this area has been
quite active, both in developing attacks based on fault induction and in developing countermeasures. The attacks have
succeeded against most of the popular and widely used algorithms. In the following, we give a brief review of
the literature.
The attacks on RSA with Montgomery multiplication have been extended by attacking the signing key instead of
the message [109]. Furthermore, similar attacks are presented for the LUC and KMOV (elliptic-curve-based)
cryptosystems. In Reference 110, the attacks are generalized to any RSA-type cryptosystem, with
the LUC and Demytko cryptosystems as examples. Faults can be used to expose the private key of the RSA-KEM
scheme [111], and transient faults can be used to derive the RSA and DSA secret keys from applications
compatible with the OpenPGP format [112]. The Bellcore attack on the Fiat-Shamir scheme is shown to be
incomplete in Reference 94; the Precautious Fiat-Shamir scheme is introduced there, which defends against it.
A new attack that succeeds against both the classical and the Precautious Fiat-Shamir schemes is presented
in Reference 113.
Beginning with Biham and Shamir [114], fault-based cryptanalysis has also focused on symmetric-key cryptosystems.
DES is shown to be vulnerable to so-called Differential Fault Analysis (DFA), using only 50 to 200
faulty ciphertexts. The method also extends to unknown cryptosystems, and an example of an attack on
the once-classified algorithm SkipJack is presented. Another variant of the attack on DES takes advantage
of permanent instead of transient faults. The same ideas are also explored and extended for completely
unknown cryptosystems in Reference 115, while Jacob et al. [116] use faults to attack obfuscated ciphers
in software and extract secret material while avoiding de-obfuscation of the code.
For some time it was believed that fault-induction attacks could only succeed against cryptographic schemes
based on algebraic hard mathematical problems, such as number factoring and discrete logarithm
computation. Elliptic Curve Cryptosystems (ECCs) are a preferable choice for implementing cryptography,
since they offer security equivalent to that of algebraic public key algorithms while requiring only about
a tenth of the key bits. Biehl et al. [117] extend DFA to ECC and, especially, to schemes whose
security is based on the discrete logarithm problem over elliptic curve fields. Furthermore, Zheng and
Matsumoto [118] use transient and permanent faults to attack random number generators, a crucial
building block for cryptographic protocols, and the ElGamal signature scheme.
Rijndael [119] was nominated as the AES algorithm [120], the replacement of DES. The case of the
AES algorithm is quite interesting, considering that it was submitted after the introduction of SCA; thus,
its authors took all the appropriate countermeasures to ensure that the algorithm resisted all known
cryptanalysis techniques applicable to its design. The original proposal [119] even noted timing attacks
and how they could be prevented. Koeune and Quisquater [121] describe how a careless implementation
of the AES algorithm can admit a timing attack that derives the secret key used. The experiments carried out
show that the key can be derived from 3000 samples per key byte, with minimal cost and high probability.
The proposal of the algorithm is aware of this issue and immune against such a simple attack. However,
DFA has proved successful against AES. Although DFA was designed for attacking algorithms with a
Feistel structure, such as DES, Dusart et al. [122] show that it can be applied to AES, which does not have
such a structure. Four different fault-injection models are presented, and the attacks succeed for all key
sizes (128, 192, and 256 bits). Their experiments show that with ten pairs of faulty/correct messages in
hand, a 128-bit AES key can be extracted in a few minutes. Blömer et al. [123] present additional fault-based
attacks on AES. The attack assumes multiple kinds of fault models. The strictest model, requiring
exact synchronization in space and time for the error injection, succeeds in deriving a 128-bit secret key
after collecting 128 faulty ciphertexts, while the least strict model derives the 128-bit key after collecting
256 faulty ciphertexts.
17.5.4.1 Case Study: RSA-Chinese Remainder Theorem
The RSA cryptosystem remains a viable and preferable public key cryptosystem, having withstood years of
cryptanalysis [79]. The security of the RSA public key algorithm relies on the hardness of the problem of
factoring large numbers into prime factors. The elements of the algorithm are N = pq, the product of two
large prime numbers; e and d, the public and secret exponents, respectively; and the modular exponentiation
operation m^k mod N. To sign a message m, the sender computes s = m^d mod N, using his or her private
key. The receiver computes m = s^e mod N to verify the signature of the received message.
The modular exponentiation operation is computationally intensive for large primes, and it is the major
computational bottleneck in an RSA implementation. The CRT allows fast modular exponentiation. Using
RSA with CRT, the sender computes s1 = m^d mod p and s2 = m^d mod q and combines the two results, based
on the CRT, to compute S = a*s1 + b*s2 mod N for some predefined values a and b. The CRT method
is quite popular, especially for embedded systems, since it allows four times faster execution and smaller
memory storage for intermediate results (for this, observe that typically p and q have half the size of N).
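With toy parameters (far too small for real use), the CRT recombination can be checked against direct exponentiation. In practice the two half-size exponentiations also reduce the exponent to d mod (p - 1) and d mod (q - 1), which is where the fourfold speedup comes from; the sketch below assumes Python 3.8+ for the modular-inverse form of `pow`.

```python
# Toy-parameter check of RSA signing with the CRT speedup described
# above (p and q are illustrative only). The constants a and b depend
# only on p and q and are precomputed once.

p, q = 61, 53
N = p * q                            # 3233
e = 17
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent

a = q * pow(q, -1, p)                # a = 1 (mod p), a = 0 (mod q)
b = p * pow(p, -1, q)                # b = 0 (mod p), b = 1 (mod q)

def sign_crt(m):
    s1 = pow(m, d % (p - 1), p)      # two half-size exponentiations
    s2 = pow(m, d % (q - 1), q)
    return (a * s1 + b * s2) % N

m = 65
assert sign_crt(m) == pow(m, d, N)   # CRT result matches direct signing
assert pow(sign_crt(m), e, N) == m   # and verifies with the public key
```

The recombination works because a is congruent to 1 mod p and 0 mod q (and b the reverse), so S agrees with s1 modulo p and with s2 modulo q, which by the CRT pins down S modulo N.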
The Bellcore attack [80,81], as it is commonly referred to, is quite simple and powerful against RSA
with CRT. It suffices to have one correct signature S for a message m and one faulty signature S', which is
caused by an incorrect computation of one of the two intermediate results s1 and s2. It does not matter
whether the error occurred in the first or the second intermediate result, or how many bits were affected
by the error. Assuming that an error indeed occurred, it suffices to compute gcd(S - S', N) to reveal one of the two prime factors.
Boneh et al. [80] propose double computations as a means to detect such erroneous computations.
However, this is not always efficient, especially in the case of resource-limited environments or where
performance is an important issue. Also, this approach is of no help in the case a permanent error has occurred.
Kaliski and Robshaw [125] propose signature verification, by checking the equality S^e mod N = m. Since
the public exponent may be quite large, this check can be rather time consuming for a resource-limited
system.
Shamir [126] describes a software method for protecting RSA with CRT from fault and timing attacks.
The idea is to use a random integer t and perform a blinded CRT by computing S_pt = m^d mod p*t
and S_qt = m^d mod q*t. If the equality S_pt = S_qt mod t holds, then the initial computation is considered
error-free and the result of the CRT can be released from the device. Yen et al. [127] further improve this
countermeasure for efficient implementation without performance penalties, but Blömer et al. [128] show
that this improvement in fact renders RSA with CRT totally insecure. Aumüller et al. [91] provide another
software implementation countermeasure for faulty RSA CRT computations. However, Yen et al. [129],
using a weak fault model, show that both these countermeasures [91,126] are still vulnerable, if the attacker
focuses on the modular reduction operation s_p = s_p mod p of the countermeasures. The attacks are valid
for both transient and permanent errors and again, appropriate countermeasures are proposed.
© 2006 by Taylor & Francis Group, LLC
Design Issues in Secure Embedded Systems 17-17
As we now show, the implementation of error-checking functions using the final or intermediate results of
RSA computations can create an additional side meta-channel, even though faulty computations never leave
a sealed device. Assume that an attacker knows that a bit in a register holding part of the key was invasively
set to zero during the computation and that the device checks the correctness of the output by double
computation. If the device outputs a signed message, then no error was detected and thus, the respective
bit of the key is zero. If the device does not output a signed message or outputs an error message, then
the respective bit of the key is one. Such a safe-error attack is presented in Reference 130, focusing on
RSA when implemented with Montgomery multiplication. Yen et al. [131] extend the idea of safe-error
attacks from memory faults to computational faults and present such an attack on RSA with Montgomery
multiplication, which can also be applied to scalar multiplication on elliptic curves.
An even simpler attack is to target both an intermediate computation and the condition check.
A condition check can be a single point of failure and an attacker can easily mount an attack against it,
provided that he or she has the means to introduce errors in computations [128]. Indeed, in most cases,
a condition check is implemented as a bit comparison with a zero flag. Blömer et al. [128] extend the ideas
of checking vulnerable points of computation by exhaustively testing every computation performed for
an RSA CRT signing, including the CRT combination. The proposed solution seems the most promising
at the moment, allowing only attacks by powerful adversaries that can solve precisely the time-space
isolation problem. However, it should already be clear that advancements in this area of cryptanalysis are
continuous, and designers should always be prepared to adapt to new attacks.
17.5.5 Passive Side-Channel Cryptanalysis
Passive side-channel cryptanalysis has received a lot of attention since its introduction in 1996 by Paul
Kocher [78]. Passive attacks are considered harder to defend against, and many people are concerned due
to their noninvasive nature. Fault-induction attacks require some form of manipulating the device and
thus, sensors or other similar means can be used to detect such actions and shut down or even zero out
the device. In the case of passive attacks, the physical characteristics of the device are just monitored,
usually with readily available probes and other hardware. So, it is not an easy task to detect the presence of
a malicious user, especially in the case where only a few measurements are required or abnormal operation
(such as continuous requests for encryptions/decryptions) cannot be identified.
The first results are by Kocher [78]. Timing variations in the execution of a cryptographic algorithm
such as Diffie-Hellman key exchange, RSA, and DSS are used to derive bit-by-bit the secret keys of these
algorithms. Although mentioned before, we should emphasize that timing attacks and other forms of
passive SCA require knowledge of the exact implementation of the cryptographic algorithm under attack.
Dhem et al. [132] describe a timing attack against the RSA signature algorithm. The attack derives a 512-bit
secret key with 200,000 to 300,000 timing measurements. Schindler et al. [133] improve the timing attacks
on RSA modular exponentiation by a factor of 50, allowing extraction of a 512-bit key using as few as 5,000
timing measurements. The approach uses an error-correction (estimator) function, which can detect
erroneous bit detections as the key extraction process evolves. Hevia and Kiwi [134] introduce a timing attack
against DES, which reveals the Hamming weight of the key, by exploiting the fact that a conditional bit
wrap-around function results in variable execution time of the software implementing the algorithm.
They succeed in recovering the Hamming weight of the key and 3.95 key bits (out of a 56-bit key). The
most threatening issue is that keys with low or high Hamming weight are sparse; so, if the attack reveals that
the key has such a weight, the key space that must be searched reduces dramatically. The RC5 algorithm
has also been subjected to timing attacks, due to conditional statement execution in its code [135].
Kocher et al. [101] extend the attacker's arsenal further by introducing the vulnerability of DES to power
analysis attacks, more specifically to Differential Power Analysis (DPA), a technique that combines
differential cryptanalysis and careful engineering, and to Simple Power Analysis (SPA). SPA refers to
power analysis attacks that can be performed by monitoring only a single or a few power traces, probably
with the same encryption key. SPA succeeds in revealing the operations performed by the device, such as
permutations, comparisons, and multiplications. Practically, any algorithm implementation that executes
INPUT: M, N, d = (d_{n-1} d_{n-2} ... d_1 d_0)_2
OUTPUT: S = M^d mod N

S = 1;
for (i = n-1; i >= 0; i--) {
    S = S * S mod N;
    if (d_i == 1) {
        S = S * M mod N;
    }
}
return S;

FIGURE 17.1 Left-to-right repeated square-and-multiply algorithm.
some statements conditionally, based on data or key material, is at least susceptible to power analysis
attacks. This holds for public key, secret key, and ECC algorithms. DPA has been successfully applied at least to block
ciphers, such as IDEA, RC5, and DES [83].
Electromagnetic Attacks (EMAs) have contributed some impressive results on what information can
be reconstructed. Gandolfi et al. [136] report results from cryptanalysis of real-world cryptosystems, such
as DES and RSA. Furthermore, they demonstrate that electromagnetic emissions may be preferable to
power analysis, in the sense that fewer traces are needed to mount an attack and these traces carry richer
information to derive the secret keys. However, the full power of EMA attacks has not been utilized yet
and we should expect more results on real-world cryptanalysis of popular algorithms.
17.5.5.1 Case Study: RSA with Montgomery Multiplication
Previously, we explained the importance of a fast modular exponentiation primitive for the RSA
cryptosystem. Montgomery multiplication is a fast implementation of this primitive function [137].
The left-to-right repeated square-and-multiply method is depicted in Figure 17.1, in C pseudocode.
The timing attack of Kocher [78] exploits the timing variation caused by the conditional statement
in the loop. If the respective bit of the secret exponent is 1, then both a square and a multiply
operation are executed, while if the bit is 0 only a square operation is performed. In summary, the exact
time of executing the loop n times depends only on the exact values of the bits of the secret exponent.
An attacker proceeds as follows. Assume that the first m bits of the secret exponent are known. The
attacker has a device identical to the one containing the secret exponent and can control the key used for
each encryption. The attacker collects from the attacked device the total execution time T_1, T_2, ..., T_k of
each signature operation on some known messages, M_1, M_2, ..., M_k. He also performs the same operation
on the controlled device, so as to collect another set of measurements, t_1, t_2, ..., t_k, where he fixes the m first
bits of the key, targeting the (m+1)th bit. Kocher's key observation is that, if the unknown bit d_{m+1} = 1,
then the two sets of measurements are correlated. If d_{m+1} = 0, then the two sets behave like independent
random variables. This differentiation allows the attacker to extract the secret exponent bit-by-bit.
Depending on the implementation, a simpler form of the attack can be mounted. SPA does
not require lengthy statistical computations but rather relies on power traces of execution profiles of
a cryptographic algorithm. For this example, Schindler et al. [133] explain how the power profiles can
be used. When a key bit is 1, the above code performs an additional multiplication. Even if the spikes
in power consumption of the squaring and multiplication operations are indistinguishable, the multiplication
requires additional load operations and thus, power spikes will be wider than in the case where
only squaring is performed.
17.5.6 Countermeasures
In the previous sections we provided a review of SCA, both fault-based and passive. In this section, we
review the countermeasures that have been proposed. The list is not exhaustive, since new results appear
continuously and countermeasures are steadily improving.
The proposed countermeasures can be classified into two main classes: hardware protection mechanisms
and mathematical protection mechanisms. A first layer of protection against SCA consists of hardware
protection layers, such as passivation layers that do not allow direct access between a (malicious) user and the system
implementing the cryptographic algorithm, or memory address bus obfuscation. Various sensors can also
be embodied in the device, in order to detect and react to abnormal environmental conditions, such as
extreme temperatures, power, and clock variations. Such mechanisms are widely employed in smartcards
for financial transactions and other high-risk applications. Such protection layers can be effective against
fault-injection attacks, since they shield the device against external manipulation. However, they cannot
protect the device from attacks based on external observation, such as power analysis techniques.
The previous countermeasures do not alter the current designs of the circuits, but rather add protection
layers on top of them. A second approach is the design of a new generation of chips to implement
cryptographic algorithms and to process sensitive information. Such circuits have asynchronous/
self-clocking/dual-rail logic; each part of the circuit may be clocked independently [138]. Fault attacks
that rely on external clock manipulation (such as glitch attacks) are not feasible in this case. Furthermore,
timing or power analysis attacks become harder for the attacker, since there is no global clock that correlates
the input data and the emitted power. Such countermeasures have the potential to become common
practice. Their application, however, must be carefully evaluated, since they may occupy a large area of the
circuit; such expansions are usually justified by manufacturers in order to increase the system's available
memory and not to implement another security feature. Furthermore, such mechanisms require changes
in the production line, which is not always feasible.
A third approach aims to implement the cryptographic algorithms so that no key information leaks.
Proposed approaches include modifying the algorithm to run in constant time, adding random delays
to the execution of the algorithm, randomizing the exact sequence of operations without affecting the
final result, and adding dummy operations to the execution of the algorithm. These countermeasures
can defeat timing attacks, but careful design must be employed to defeat power analysis attacks too. For
example, dummy operations or random delays are easily distinguishable in a power trace, since they tend
to consume less power than ordinary cryptographic operations. Furthermore, differences in power traces
between profiles of known operations can also reveal permutation of operations. For example, a modular
multiplication is known to consume more power than a simple addition, so if the execution order is
interchanged, the operations will still be identifiable.
In more resource-rich systems, where high-level programming languages are used, compiler or human
optimizations can remove these artifacts from the program or change the implementation, resulting in
vulnerability to SCA. The same holds if memory caches are used and the algorithm is implemented
so that the latency between cache and main memory can be detected, either by timing or power traces.
Insertion of random delays or other forms of noise should also be considered carefully, because a large
mean delay translates directly to reduced performance, which is not always acceptable.
The second class of countermeasures focuses on the mathematical strengthening of the algorithms
against such attacks. The RSA blinding technique by Shamir [126] is such an example; the proposed
method guards the system from leaking meaningful information, because the leaked information is
related to the random number used for blinding instead of the key; thus, even if the attacker manages
to reveal a number, this will be the random number and not the key. It should be noted, however, that
a different random number is used for each signing or encryption operation. Thus, the faults injected
into the system will be applied to a different, random number every time and the collected information is
useless.
At a crossline between mathematical and implementation protection, it has been proposed to check
cryptographic operations for correctness, in case of fault-injection attacks. However, these checks can also be
exploited as side channels of information or can degrade performance significantly. For example, double
computations and comparison of the results halve the throughput an implementation can achieve;
furthermore, in the absence of other countermeasures, the comparison function can be bypassed (e.g., by
a clock glitch or a fault injection in the comparison function) or used as a side channel as well. If multiple
checks are employed, measuring the rejection time can reveal in what stage of the algorithm the error
occurred; if the checks are independent, this can be utilized to extract the secret key, even when the
implementation does not output the faulty computation [111,139].
17.6 Conclusions
Security constitutes a significant requirement in modern embedded computing systems. Their widespread
use in services that involve sensitive information, in conjunction with their resource limitations, has led to
a significant number of innovative attacks that exploit system characteristics and result in loss of critical
information. Development of secure embedded systems is an emerging field in computer engineering
requiring skills from cryptography, communications, hardware, and software.
In this chapter, we surveyed the security requirements of embedded computing systems and described
the technologies that are more critical to them, relative to general-purpose computing systems. Considering
the innovative system (side-channel) attacks that were developed with the motivation to break secure
embedded systems, we presented in detail the known SCA and described the technologies for countermeasures
against the known attacks. Clearly, the technical area of secure embedded systems is far from
mature. Innovative attacks and successful countermeasures are continuously emerging, promising an
attractive and rich technical area for research and development.
References
[1] W. Wolf, Computers as Components: Principles of Embedded Computing Systems Design. Elsevier,
Amsterdam, 2000.
[2] W. Freeman and E. Miller, An experimental analysis of cryptographic overhead in
performance critical systems. In Proceedings of the Seventh International Symposium on
Modeling, Analysis and Simulation of Computer and Telecommunication Systems, 1999, p. 348.
[3] S. Ravi, P. Kocher, R. Lee, G. McGraw, and A. Raghunathan, Security as a new dimension in
embedded system design. In Proceedings of the 41st Annual Conference on Design Automation,
2004, pp. 753-760.
[4] S. Ravi, A. Raghunathan, P. Kocher, and S. Hattangady, Security in embedded systems: design
challenges. Transactions on Embedded Computing Systems, 3, 461-491, 2004.
[5] D.G. Abraham, G.M. Dolan, G.P. Double, and J.V. Stevens, Transaction security system. IBM
Systems Journal, 30, 206-229, 1991.
[6] S.H. Weingart, Physical security devices for computer subsystems: a survey of attacks and defenses.
In Cryptographic Hardware and Embedded Systems - CHES 2000: Second International Workshop,
2000, p. 302.
[7] R. Anderson and M. Kuhn, Tamper resistance - a cautionary note. In Proceedings of the Second
Usenix Workshop on Electronic Commerce, 1996, pp. 1-11.
[8] S. Blythe, B. Fraboni, S. Lall, H. Ahmed, and U. de Riu, Layout reconstruction of complex silicon
chips. IEEE Journal of Solid-State Circuits, 28, 138-145, 1993.
[9] C.E. Landwehr, A.R. Bull, J.P. McDermott, and W.S. Choi, A taxonomy of computer program
security flaws. ACM Computing Surveys, 26, 211-254, 1994.
[10] G. Hoglund and G. McGraw, Exploiting Software: How to Break Code. Addison-Wesley Professional,
Reading, MA, 2004.
[11] J.J. Tevis and J.A. Hamilton, Methods for the prevention, detection and removal of software
security vulnerabilities. In Proceedings of the 42nd Annual Southeast Regional Conference, 2004,
pp. 197-202.
[12] P. Kocher, SSL 3.0 specification. http://wp.netscape.com/eng/ssl3/
[13] IETF, IPSec working group. http://www.ietf.org/html.charters/ipsec-charter.html
[14] T. Wollinger, J. Guajardo, and C. Paar, Security on FPGAs: state-of-the-art implementations and
attacks. Transactions on Embedded Computing Systems, 3, 534-574, 2004.
[15] S.H. Gunther, F. Binns, D.M. Carmean, and J.C. Hall, Managing the impact of
increasing microprocessor power consumption. Intel Technology Journal, Q1: 9, 2001.
http://developer.intel.com/technology/itj/q12001/articles/art_4.htm
[16] I. Buchmann, Batteries in a Portable World, 2nd ed. Cadex Electronics Inc, May 2001.
[17] K. Lahiri, S. Dey, D. Panigrahi, and A. Raghunathan, Battery-driven system design: a new frontier
in low power design. In Proceedings of the 2002 Conference on Asia South Pacific Design
Automation/VLSI Design, 2002, p. 261.
[18] T. Martin, M. Hsiao, D. Ha, and J. Krishnaswami, Denial-of-service attacks on battery-powered
mobile computers. In Proceedings of the Second IEEE International Conference on Pervasive
Computing and Communications (PerCom'04), 2004, p. 309.
[19] N.R. Potlapally, S. Ravi, A. Raghunathan, and N.K. Jha, Analyzing the energy consumption of
security protocols. In Proceedings of the 2003 International Symposium on Low Power Electronics
and Design, 2003, pp. 30-35.
[20] D.W. Carman, P.S. Kruus, and B.J. Matt, Constraints and approaches for distributed
sensor network security. NAI Labs, Technical report 00-110, 2000. Available
at: http://www.cs.umbc.edu/courses/graduate/CMSC691A/Spring04/papers/nailabs_report_00-
010_final.pdf
[21] A. Raghunathan, S. Ravi, S. Hattangady, and J. Quisquater, Securing mobile appliances: new
challenges for the system designer. In Design, Automation and Test in Europe Conference and
Exhibition (DATE'03). IEEE, 2003, p. 10176.
[22] V. Raghunathan, C. Schurgers, S. Park, and M. Srivastava, Energy aware wireless microsensor
networks. IEEE Signal Processing Magazine, 19, 40-50, 2002.
[23] A. Savvides, S. Park, and M.B. Srivastava, On modeling networks of wireless microsensors.
In Proceedings of the 2001 ACM SIGMETRICS International Conference on Measurement and
Modeling of Computer Systems, 2001, pp. 318-319.
[24] Rockwell Scientific, Wireless integrated networks systems. http://wins.rsc.rockwell.com
[25] A. Hodjat and I. Verbauwhede, The energy cost of secrets in ad-hoc networks (Short paper).
http://citeseer.ist.psu.edu/hodjat02energy.html
[26] B. Schneier, Applied Cryptography: Protocols, Algorithms, and Source Code in C. John Wiley &
Sons, New York, 1995.
[27] N. Daswani and D. Boneh, Experimenting with electronic commerce on the PalmPilot.
In Proceedings of the Third International Conference on Financial Cryptography, 1999,
pp. 1-16.
[28] A. Perrig, J. Stankovic, and D. Wagner, Security in wireless sensor networks. Communications of
the ACM, 47, 53-57, 2004.
[29] S. Ravi, A. Raghunathan, and N. Potlapally, Securing wireless data: system architecture challenges.
In Proceedings of the 15th International Symposium on System Synthesis, 2002, pp. 195-200.
[30] IEEE 802.11 Working Group, IEEE 802.11 wireless LAN standards. http://grouper.ieee.org/
groups/802/11/
[31] 3GPP, 3G Security; Security Architecture. 3GPP Organization, TS 33.102, 30-09-2003,
Rel-6, 2003.
[32] Intel Corporation, VPN and WEP, wireless 802.11b security in a corporate environment.
http://www.intel.com/business/bss/infrastructure/security/vpn_wep.htm
[33] NIST, FIPS PUB 140-2 security requirements for cryptographic modules. Available at
http://csrc.nist.gov/cryptval/140-2.htm
[34] J. Lach, W.H. Mangione-Smith, and M. Potkonjak, Fingerprinting digital circuits on programmable
hardware. In Information Hiding: Second International Workshop, IH'98, Vol. 1525 of
Lecture Notes in Computer Science, Springer-Verlag, 1998, pp. 16-31.
[35] J. Burke, J. McDonald, and T. Austin, Architectural support for fast symmetric-key cryptography.
In Proceedings of the Ninth International Conference on Architectural Support for Programming
Languages and Operating Systems, 2000, pp. 178-189.
[36] L. Wu, C. Weaver, and T. Austin, CryptoManiac: a fast flexible architecture for secure communication.
In Proceedings of the 28th Annual International Symposium on Computer Architecture,
2001, pp. 110-119.
[37] Infineon, SLE 88 Family Products. http://www.infineon.com/
[38] ARM, ARM SecurCore Family, Vol. 2004. http://www.arm.com/products/CPUs/securcore.html
[39] S. Moore, Enhancing Security Performance Through IA-64 Architecture, 2000. Intel Corp.,
http://www.intel.com/cd/ids/developer/asmo-na/eng/microprocessors/itanium/index.htm
[40] N. Potlapally, S. Ravi, A. Raghunathan, and G. Lakshminarayana, Optimizing public-key encryp-
tion for wireless clients. In Proceedings of the IEEE International Conference on Communications,
May 2002.
[41] MIPS Inc., SmartMIPS Architecture, Vol. 2004. http://www.mips.com/ProductCatalog/
P_SmartMIPSASE/productBrief
[42] Open Mobile Alliance, http://www.wapforum.org/what/technical.htm
[43] Mobile Electronic Transactions, http://www.mobiletransaction.org/
[44] Z. Shao, C. Xue, Q. Zhuge, E.H. Sha, and B. Xiao, Security protection and checking in embedded
system integration against buffer overflow attacks. In Proceedings of the International Conference
on Information Technology: Coding and Computing (ITCC'04), Vol. 2, 2004, p. 409.
[45] S. Biswas, M. Simpson, and R. Barua, Memory overflow protection for embedded systems using
run-time checks, reuse and compression. In Proceedings of the 2004 International Conference on
Compilers, Architecture, and Synthesis for Embedded Systems, 2004, pp. 280-291.
[46] J. You, Wai-Kin Kong, D. Zhang, and King Hong Cheung, On hierarchical palmprint coding with
multiple features for personal identification in large databases. IEEE Transactions on Circuits and
Systems for Video Technology, 14, 234-243, 2004.
[47] K.C. Chan, Y.S. Moon, and P.S. Cheng, Fast fingerprint verification using subregions of
fingerprint images. IEEE Transactions on Circuits and Systems for Video Technology, 14,
95-101, 2004.
[48] A.K. Jain, A. Ross, and S. Prabhakar, An introduction to biometric recognition. IEEE Transactions
on Circuits and Systems for Video Technology, 14, 4-20, 2004.
[49] Y.S. Moon, H.C. Ho, and K.L. Ng, A secure smart card system with biometrics capability.
In Proceedings of the IEEE 1999 Canadian Conference on Electrical and Computer Engineering,
1999, pp. 261-266.
[50] T.Y. Tang, Y.S. Moon, and K.C. Chan, Efficient implementation of fingerprint verification
for mobile embedded systems using fixed-point arithmetic. In Proceedings of the 2004 ACM
Symposium on Applied Computing, 2004, pp. 821-825.
[51] L. Benini, A. Macii, and M. Poncino, Energy-aware design of embedded memories: a survey of
technologies, architectures, and optimization techniques. Transactions on Embedded Computing
Systems, 2, 5-32, 2003.
[52] Scott Rosenthal, Serial EEPROMs provide secure data storage for embedded systems. SLTF
Consulting, http://www.sltf.com/articles/pein/pein9101.htm
[53] Actel Corporation, Design security in nonvolatile flash and antifuse FPGAs. Technical report
5172163-0/11.01, 2001.
[54] Trusted Computing Group: Home. TCG , https://www.trustedcomputinggroup.org/home
[55] D.N. Serpanos and R.J. Lipton, Defense against man-in-the-middle attack in client-server systems
with secure servers. In Proceedings of IEEE ISCC 2001. Hammammet, Tunisia, July 3-5, 2001,
pp. 9-14.
[56] R.J. Lipton, S. Rajagopalan, and D.N. Serpanos, Spy: a method to secure clients for network
services. Proceedings of the 22nd International Conference on Distributed Computing Systems
Workshops (Workshop ADSN 2002). Vienna, Austria, July 2-5, 2002, pp. 23-28.
[57] T.S. Messerges and E.A. Dabbish, Digital rights management in a 3G mobile phone and
beyond. In Proceedings of the 2003 ACM Workshop on Digital Rights Management, 2003,
pp. 27-38.
[58] D.L.C. Thekkath, M. Mitchell, P. Lincoln, D. Boneh, J. Mitchell, and M. Horowitz, Architectural
support for copy and tamper resistant software. In Proceedings of the Ninth International
Conference on Architectural Support for Programming Languages and Operating Systems, 2000,
pp. 168-177.
[59] J.H. Saltzer and M.D. Schroeder, The protection of information in computer systems.
Proceedings of the IEEE, 63, 1278-1308, 1975.
[60] T. King, Security+ Training Guide. Que Certification, 2003.
[61] Kingpin and Mudge, Security analysis of the Palm operating system and its weaknesses
against malicious code threats. In Proceedings of the 10th Usenix Security Symposium, 2001,
pp. 135-152.
[62] A.D. Rubin and D.E. Geer Jr., Mobile code security. IEEE Internet Computing, 2,
30-34, 1998.
[63] V.N. Venkatakrishnan, R. Peri, and R. Sekar, Empowering mobile code using expressive security
policies. In Proceedings of the 2002 Workshop on New Security Paradigms, 2002, pp. 61-68.
[64] G.C. Necula, Proof-carrying code. In Proceedings of the 24th ACM SIGPLAN-SIGACT Symposium
on Principles of Programming Languages (POPL '97), 1997, pp. 106-119.
[65] P. Gutmann, Lessons learned in implementing and deploying crypto software. In Proceedings of
the 11th USENIX Security Symposium, 2002, pp. 315-325.
[66] R.J. Anderson, Why cryptosystems fail. In Proceedings of ACM CCS '93, ACM Press, pp. 215-217,
November 1993.
[67] Andrew J. Clark, Physical protection of cryptographic devices. In Proceedings of Eurocrypt '87,
1987, pp. 83-93.
[68] D. Chaum, Design concepts for tamper-responding systems. In Advances in Cryptology: Proceedings
of Crypto 83, 1983, pp. 387-392.
[69] S.H. Weingart, S.R. White, W.C. Arnold, and G.P. Double, An evaluation system for the physical
security of computing systems. In Proceedings of the Sixth Annual Computer Security Applications
Conference, 1990, pp. 232-243.
[70] IBM Corporation, IBM PCI Cryptographic Coprocessor, September, 2004. Available at http://
www-3.ibm.com/security/cryptocards/html/pcicc.shtml
[71] EFF, U.S. v. ElcomSoft and Sklyarov FAQ, September, 2004. Available at http://www.eff.org/IP/
DMCA/US_v_Elcomsoft/us_v_sklyarov_faq.html
[72] A. Huang, Keeping secrets in hardware: the Microsoft Xbox case study. In Revised Papers from
the Fourth International Workshop on Cryptographic Hardware and Embedded Systems, 2003,
pp. 213-227.
[73] Kingpin, Attacks on and countermeasures for USB hardware token devices. In Proceedings of the
Fifth Nordic Workshop on Secure IT Systems Encouraging Co-operation, 2000, pp. 135-151.
[74] D.S. Touretzky, Gallery of CSS Descramblers, September 2004. Available at http://www.cs.
cmu.edu/dst/DeCSS/Gallery
[75] S.L. Garfinkel and A. Shelat, Remembrance of data passed: a study of disk sanitization practices.
IEEE Security and Privacy Magazine, 1, 17-27, 2003.
[76] S. Skorobogatov, Low temperature data remanence in static RAM. Technical report UCAM-CL-
TR-536, University of Cambridge, 2002.
[77] P. Gutmann, Data remanence in semiconductor devices. In Proceedings of the 10th USENIX Security
Symposium, 2001.
[78] P.C. Kocher, Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems.
In Proceedings of CRYPTO '96, Lecture Notes in Computer Science, 1996, pp. 104-113.
[79] D. Boneh, Twenty years of attacks on the RSA cryptosystem. Notices of the American Mathematical
Society (AMS), 46, 203-213, 1999.
[80] Dan Boneh, Richard A. DeMillo, and Richard J. Lipton, On the importance of checking cryptographic
protocols for faults. In Proceedings of Eurocrypt '97, Vol. 1233 of Lecture Notes in
Computer Science, 1997, pp. 37-51.
[81] Dan Boneh, Richard A. DeMillo, and Richard J. Lipton, On the importance of eliminating errors
in cryptographic computations. Journal of Cryptology: The Journal of the International Association
for Cryptologic Research, 14, 101-119, 2001.
[82] F. Bao, R.H. Deng, Y. Han, A.B. Jeng, A.D. Narasimhalu, and T. Ngair, Breaking public key
cryptosystems on tamper resistant devices in the presence of transient faults. In Proceedings of
the Fifth International Workshop on Security Protocols, 1998, pp. 115-124.
[83] John Kelsey, Bruce Schneier, David Wagner, and Chris Hall, Side channel cryptanalysis of product
ciphers. In Proceedings of ESORICS 1998, 1998, pp. 97-110.
[84] Markus G. Kuhn, Compromising emanations: eavesdropping risks of computer displays.
Technical report UCAM-CL-TR-577, University of Cambridge, December 2003.
[85] Intel Corporation, Analysis of the floating point flaw in the Pentium processor.
November 1994. Available at http://support.intel.com/support/processors/pentium/fdiv/wp/
(September 2004).
[86] Jean-Jacques Quisquater and David Samyde, ElectroMagnetic analysis (EMA): measures and
countermeasures for smart cards. In Proceedings of the International Conference on Research in
Smart Cards, E-Smart 2001, Lecture Notes in Computer Science, 2001, pp. 200-210.
[87] Oliver Kmmerling and Markus G. Kuhn, Design principles for tamper-resistant smartcard
processors. In Proceedings of the USENIX Workshop on Smartcard Technology (Smartcard 99).
USENIX Association, Chicago, IL, May 1011, 1999, pp. 920.
[88] D.P. Maher, Fault induction attacks, tamper resistance, and hostile reverse engineering in per-
spective. In Proceedings of the First International Conference on Financial Cryptography, 1997,
pp. 109122.
[89] Ross J. Anderson and Markus G. Kuhn, Low cost attacks on tamper resistant devices. InProceedings
of the Fifth International Security Protocols Conference, Vol. 1361 of Lecture Notes on Computer
Science. M. Lomas et al. Ed. Springer-Verlag, Paris, France, April 79, 1997, pp. 125136.
[90] Ross J. Anderson, Security Engineering: A Guide to Building Dependable Distributed Systems. John
Wiley & Sons, New York, 2001.
[91] C. Aumller, P. Bier, W. Fischer, P. Hofreiter, and J. Seifert, Fault attacks on RSA with CRT:
concrete results and practical countermeasures. In Revised Papers from the Fourth International
Workshop on Cryptographic Hardware and Embedded Systems, Springer-Verlag, 2003, pp. 260275.
[92] David Samyde, Sergei Skorobogatov, Ross Anderson, and Jean-Jacques Quisquater, On a new
way to read data from memory. In Proceedings of CHES 2002, Lecture Notes in Computer
Science, 2003.
[93] Hagai Bar-El, Hamid Choukri, David Naccache, Michael Tunstall, and Claire Whelan, The
Sorcerers apprentice guide to fault attacks. In Workshop on Fault Diagnosis and Tolerance in
Cryptography, 2004.
[94] Artemios G. Voyiatzis and Dimitrios N. Serpanos, Active hardware attacks and proactive
countermeasures. In Proceedings of IEEE ISCC 2002, 2002.
[95] Markus G. Kuhn, Optical time-domain eavesdropping risks of CRT displays. In Proceedings of
the IEEE Symposium on Security and Privacy, 2002, pp. 318.
[96] Markus G. Kuhn, Electromagnetic eavesdropping risks of at-panel displays. Presented at the
Fourth Workshop on Privacy Enhancing Technologies, May 2628, 2004, Toronto, Canada.
[97] J. Loughry and D.A. Umphress, Information leakage from optical emanations. ACM Transactions
on Information and SystemSecurity, 5, 262289, 2002.
[98] P. Wright, Spycatcher: The Candid Autobiography of a Senior Intelligence Ofcer. Viking, NY, 1987.
[99] Adi Shamir and Eran Tromer, Acoustic cryptanalysis on noisy people and noisy
machines. In Eurocrypt 2004 Rump Session Presentation, September, 2004. Available at
http://www.wisdom.weizmann.ac.il/tromer/acoustic/
[100] J. Coron, D. Naccache, and P. Kocher, Statistics and secret leakage. ACMTransactions on Embedded
Computing Systems, 3, 492508, 2004.
2006 by Taylor & Francis Group, LLC
Design Issues in Secure Embedded Systems 17-25
[101] P. Kocher, J. Jaffe, and B. Jun, Differential power analysis. In Proceedings of the CRYPTO 99, IACR,
1999, pp. 388397.
[102] Josyula R. Rao, and Pankaj Rohatgi, Empowering side-channel attacks. IACR Crypto-
graphy ePrint Archive: report 2001/037, September, 2004. Available at http://eprint.iacr.org/
2001/037/
[103] Mehdi-Laurent Akkar, Rgis Bevan, Paul Dischamp, and Didier Moyar, Power analysis, what is
now possible. In Advances in Cryptology ASIACRYPT 2000: 6th International, Springer-Verlag,
2000, pp. 489502.
[104] Thomas S. Messerges, Ezzy A. Dabbish, and Robert H. Sloan, Investigation of power ana-
lysis attacks on smartcards. In Proceedings of USENIX Workshop on Electronic Commerce, 1999,
pp. 151161.
[105] D. Page, Theoretical use of cache memory as a cryptanalytic side-channel. Technical report
CSTR-02-003, Computer Science Department, University of Bristol, Bristol, 2002.
[106] Dawn Xiaodong Song, David Wagner, and Xuqing Tian, Timing analysis of keystrokes and timing
attacks on SSH. In Proceedings of the 10th USENIX Security Symposium, USENIX Association,
2001.
[107] E.W. Felten and M.A. Schneider, Timing attacks on web privacy. In Proceedings of the Seventh
ACM Conference on Computer and Communications Security, ACM Press, 2000, pp. 2532.
[108] David Brumley and Dan Boneh, Remote timing attacks are practical. In Proceedings of the 12th
USENIX Security Symposium, 2003.
[109] J. Marc and Q. Jean-Jacques, Faulty RSA encryption. Technical report CG-1997/8, UCL Crypto
Group, 1997.
[110] Marc Joye and Jean-Jacques Quisquater, Attacks on systems using Chinese remaindering.
Technical report CG1996/9, UCL Crypto Group, Belgium, 1996.
[111] Vlastimil Klma and Tom Rosa, Further results and considerations on side channel attacks
on RSA. IACR Cryptography ePrint Archive: report 2002/071, September 2004. Available at
http://eprint.iacr.org/2002/071/
[112] Vlastimil and Tom Rosa, Attack on private signature keys of the OpenPGP format, PGP(TM)
programs and other applications compatible with OpenPGP. IACR Cryptology ePrint Archive
report 2002/073, IACR, September 2004. Available at http://eprint.iacr.org/2002/076.pdf
[113] A.G. Voyiatzis and D.N. Serpanos, A fault-injection attack on Fiat-Shamir cryptosystems.
In Proceedings of the 24th International Conference on Distributed Computing Systems Workshops
(ICDCS 2004 Workshops), 2004, pp. 618621.
[114] Eli Biham and Adi Shamir, Differential fault analysis of secret key cryptosystems. Lecture Notes
in Computer Science. Springer-Verlag, 1294, 513525, 1997.
[115] P. Paillier, Evaluating differential fault analysis of unknown cryptosystems. In Proceedings of
the Second International Workshop on Practice and Theory in Public Key Cryptography, 1999,
pp. 235244.
[116] M. Jacob, D. Boneh, and E. Felten, Attacking an obfuscated cipher by injecting faults. In
Proceedings of the 2002 ACMWorkshop on Digital Rights Management, 2002.
[117] Ingrid Biehl, Bernd Meyer, andVoker Mller, Differential fault attacks on elliptic curve cryptosys-
tems. In Proceedings of CRYPTO 2000, Vol. 1880 of Lecture Notes in Computer Science, 2000,
pp. 131146.
[118] Y. Zheng and T. Matsumoto, Breaking real-world implementations of cryptosystems by manipu-
lating their random number generation. In Proceedings of the 1997 Symposium on Cryptography
and Information Security, 1997.
[119] Joan Daemen and Vincent Rijmen, The block cipher Rijndael. In Proceedings of Smart Card
Research and Applications 2000, Lecture Notes in Computer Science, 2000, pp. 288296.
[120] NIST, NIST, Advanced Encryption Standard (AES), Federal Information Processing Standards
Publication 1997, November 26, 2001.
2006 by Taylor & Francis Group, LLC
17-26 Embedded Systems Handbook
[121] Franois Koeune and Jean-Jacques Quisquater, A timing attack against Rijndael. Technical report
CG-1999/1, Universite Catholique de Louvain, 1999.
[122] P. Dusart, L. Letourneux, and O. Vivolo, Differential fault analysis on AES. In Proceedings of
the International Conference on Applied Cryptography and Network Security, Lecture Notes in
Computer Science, 2003, pp. 293306.
[123] Johaness Blmer and Jean-Pierre Seifert, Fault-based cryptanalysis of the advanced encryption
standard (AES). In Financial Cryptography 2003, Vol. 2742 of Lecture Notes in Computer Science,
2003, pp. 162181.
[124] Arjen Lenstra, Memo on RSA signature generation in the presence of faults. September 28, 1996.
(Manuscript, available from the author.)
[125] B. Kaliski and M.J.B. Robshaw, Comments on some new attacks on cryptographic devices.
RSA Laboratories Bulletin, 5 July, 1997.
[126] Adi Shamir, Method and apparatus for protecting public key schemes from timing and fault
attacks. US Patent No. 5,991,415, United States Patent and Trademark Ofce, November 23, 1999.
[127] S. Yen, S. Kim, S. Lim, and S. Moon, RSA speedup with residue number system immune against
hardware fault cryptanalysis. In Proceedings of the Fourth International Conference on Information
Security and Cryptology, Seoul, 2002, pp. 397413.
[128] J. Blmer, M. Otto, and J. Seifert, A new CRT-RSA algorithm secure against bellcore attacks.
In Proceedings of the 10th ACM Conference on Computer and Communication Security, 2003,
pp. 311320.
[129] Sung-Ming Yen, Sangjae Moon, and Jae-Cheol Ha, Hardware fault attack on RSA with
CRT revisited. In Proceedings of ICISC 2002, Lecture Notes in Computer Science, 2003,
pp. 374388.
[130] S. Yen and M. Joye, Checking before output may not be enough against fault-based cryptanalysis.
IEEE Transactions on Computers, 49, 967970, 2000.
[131] S. Yen, S. Kim, S. Lim, and S. Moon, A countermeasure against one physical cryptanalysis
may benet another attack. In Proceedings of the Fourth International Conference on Information
Security and Cryptology, Seoul, 2002, pp. 414427.
[132] J. Dhem, F. Koeune, P. Leroux, P. Mestr, J. Quisquater, and J. Willems, Apractical implementation
of the timing attack. In Proceedings of the International Conference on Smart Card Research and
Applications, 1998, pp. 167182.
[133] Werner Schindler, Franois Koeune, and Jean-Jacques Quisquater, Unleashing the full power of
timing attack. UCL Crypto Group Technical report CG-2001/3, Universite Catholique de Louvain
2001.
[134] A. Hevia and M. Kiwi, Strength of two data encryption standard implementations under timing
attacks. ACMTransactions on Information and SystemSecurity, 2, 416437, 1999.
[135] Helena Handschuh and Heys Howard, A timing attack on RC5. In Proceedings of the Fifth Annual
International Workshop on Selected Areas in Cryptography, SAC98, 1998.
[136] K. Gandol, C. Mourtel, and F. Olivier, Electromagnetic analysis: concrete results. In Proceedings
of the Third International Workshop on Cryptographic Hardware and Embedded Systems, 2001,
pp. 251261.
[137] K. Ko, T. Acar, and B.S. Kaliski Jr., Analyzing and comparing montgomery multiplication
algorithms. IEEE Micro, 16, 2633, 1996.
[138] Simon Moore, Ross Anderson, Paul Cunningham, Robert Mullins, and George Taylor, Improving
smart card security using self-timed circuits. In Proceedings of the Eighth International Symposium
on Advanced Research in Asynchronous Circuits and Systems, 2002.
[139] Kouichi Sakurai and Tsuyoshi Takagi, A reject timing attack on an IND-CCA2 public-key
cryptosystem. In Proceedings of ICISC 2002, Lecture Notes in Computer Science, 2003.
[140] S.P. Skorobogatov and R.J. Anderson, Optical fault induction attacks. In Revised Papers from
the Fourth International Workshop on Cryptographic Hardware and Embedded Systems, 2003,
pp. 212.
2006 by Taylor & Francis Group, LLC
II
System-on-Chip Design
18 System-on-Chip and Network-on-Chip Design
Grant Martin
19 A Novel Methodology for the Design of Application-Specific
Instruction-Set Processors
Andreas Hoffmann, Achim Nohl, and Gunnar Braun
20 State-of-the-Art SoC Communication Architectures
José L. Ayala, Marisa López-Vallejo, Davide Bertozzi, and Luca Benini
21 Network-on-Chip Design for Gigascale Systems-on-Chip
Davide Bertozzi, Luca Benini, and Giovanni De Micheli
22 Platform-Based Design for Embedded Systems
Luca P. Carloni, Fernando De Bernardinis, Claudio Pinello,
Alberto L. Sangiovanni-Vincentelli, and Marco Sgroi
23 Interface Specification and Converter Synthesis
Roberto Passerone
24 Hardware/Software Interface Design for SoC
Wander O. Cesário, Flávio R. Wagner, and A.A. Jerraya
25 Design and Programming of Embedded Multiprocessors:
An Interface-Centric Approach
Pieter van der Wolf, Erwin de Kock, Tomas Henriksson, Wido Kruijtzer, and Gerben Essink
26 A Multiprocessor SoC Platform and Tools for Communications Applications
Pierre G. Paulin, Chuck Pilkington, Michel Langevin, Essaid Bensoudane, Damien Lyonnard,
and Gabriela Nicolescu
© 2006 by Taylor & Francis Group, LLC
18
System-on-Chip and
Network-on-Chip
Design
Grant Martin
Tensilica Inc.
18.1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-1
18.2 System-on-a-Chip . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-2
18.3 System-on-a-Programmable-Chip . . . . . . . . . . . . . . . . . . . . . 18-2
18.4 IP Cores. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-4
18.5 Virtual Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-5
18.6 Platforms and Programmable Platforms . . . . . . . . . . . . . . . 18-5
18.7 Integration Platforms and SoC Design. . . . . . . . . . . . . . . . . 18-6
18.8 Overview of the SoC Design Process . . . . . . . . . . . . . . . . . . . 18-7
18.9 System-Level Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-10
18.10 Interconnection and Communication Architectures
for SoC. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-11
18.11 Computation and Memory Architectures for SoC . . . . 18-11
18.12 IP Integration Quality and Certification Methods
and Standards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-12
18.13 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-12
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-13
18.1 Introduction
System-on-Chip (SoC) is a phrase that has been much talked about in recent years [1]. It is more than
a design style, more than an approach to the design of Application-Specific Integrated Circuits (ASICs),
more than a methodology. Rather, SoC represents a major revolution in IC design: a revolution enabled
by advances in process technology that allow the integration of all or most of the major components and
subsystems of an electronic product onto a single chip, or integrated chipset [2]. This revolution in design
has been embraced by many designers of complex chips, as the performance, power consumption, cost,
and size advantages of using the highest available level of integration have proven to be extremely
important for many designs. In fact, the design and use of SoCs is arguably one of the key problems in
designing real-time embedded systems.
The move to SoC began sometime in the mid-1990s. At this point, the leading CMOS-based semiconductor
process technologies of 0.35 and 0.25 μm were sufficiently capable of allowing the integration of
many of the major components of a second-generation wireless handset or a digital set-top box onto a
single chip. The digital baseband functions of a cell phone, namely a Digital Signal Processor (DSP),
hardware (HW) support for voice encoding and decoding, and a RISC processor, could all be placed onto a
single die. Although such a baseband SoC was far from the complete cell phone electronics (major
components such as the RF transceiver, the analog power control, the analog baseband, and passives were
not integrated), the evolutionary path, to integrate more and more onto a single die with each new
process generation, was clear. Today's chipset would become tomorrow's chip. The problems of integrating
the hybrid technologies involved in making up a complete electronic system would be solved. Thus, eventually,
SoC could encompass design components drawn from the standard and more adventurous domains of
digital, analog, RF, reconfigurable logic, sensors, actuators, optical, chemical, microelectromechanical
systems, and even biological and nanotechnology.
With this viewpoint of continued process evolution leading to ever-increasing levels of integration in
ever-more-complex SoC devices, the issue of a SoC being a single chip at any particular point in time
is somewhat moot. Rather, the word "system" in System-on-Chip is more important than "chip". What
is most important about a SoC, whether packaged as a single chip, an integrated chipset, a System-in-
Package (SiP), or a System-on-Package (SoP), is that it is designed as an integrated system, making design
trade-offs across the processing domains and across the individual chip and package boundaries.
18.2 System-on-a-Chip
Let us define a SoC as a complex integrated circuit, or integrated chipset, which combines the major
functional elements or subsystems of a complete end product into a single entity. These days, all interesting
SoC designs include at least one programmable processor, and very often a combination of at least
one RISC control processor and one DSP. They also include on-chip communications structures: processor
bus(es), peripheral bus(es), and perhaps a high-speed system bus. A hierarchy of on-chip memory
units, and links to off-chip memory, are important, especially for SoC processors (caches and main memories;
very often separate instruction and data caches are included). For most signal-processing applications,
some degree of HW-based acceleration is provided by functional units offering higher performance and
lower energy consumption. For interfacing to the external, real world, SoCs include a number of peripheral
processing blocks, and owing to the analog nature of the real world, these may include analog
components as well as digital interfaces (e.g., to system buses at a higher packaging level). Although
there is much interesting research on incorporating MEMS-based sensors and actuators, and on SoC
applications incorporating chemical processing (lab-on-a-chip), these are, with rare exceptions, research
topics only. However, future SoCs of a commercial nature may include such subsystems as well as optical
communications interfaces.
Figure 18.1 illustrates what a typical SoC might contain for consumer applications.
One key point about SoCs that is often forgotten by those approaching them from a HW-oriented
perspective is that all interesting SoC designs encompass both hardware (HW) and software (SW) components:
that is, programmable processors, Real-Time Operating Systems (RTOSs), and other aspects
of HW-dependent SW, such as peripheral device drivers, as well as middleware stacks for particular
application domains, and possibly optimized assembly code for DSPs. Thus, the design and use of SoCs
cannot remain a HW-only concern; it involves aspects of system-level design and engineering, HW/SW
trade-off and partitioning decisions, and SW architecture, design, and implementation.
18.3 System-on-a-Programmable-Chip
Recently, attention in the SoC world has begun to expand from SoC implementations using custom,
ASIC, or Application-Specific Standard Part (ASSP) design approaches to include the design and use of
complex reconfigurable logic parts with embedded processors and other application-oriented blocks of
intellectual property. These complex FPGAs (Field-Programmable Gate Arrays) are offered by several
vendors, including Xilinx (Virtex-II PRO Platform FPGA) and Altera (SOPC), but are referred to by
FIGURE 18.1 A typical SoC device for consumer applications: a microprocessor and a DSP, each with instruction and data caches, together with RAM, flash, DMA, and external memory access on a system bus; a bus bridge to a peripheral bus serving MPEG decode, video interface, audio codec, PLL, test, PCI, USB, disk controller, and 100Base-T blocks.
several names: highly programmable SoCs, system-on-a-programmable-chip, and embedded FPGAs. The key
idea behind this approach to SoC is to combine large amounts of reconfigurable logic with embedded
RISC processors (either custom laid-out, hardened blocks, or synthesizable processor cores), in order
to allow very flexible and tailorable combinations of HW and SW processing to be applied to a particular
design problem. Algorithms that consist of significant amounts of control logic, plus significant quantities
of dataflow processing, can be partitioned into the control RISC processor (e.g., in the Xilinx Virtex-II PRO, a
PowerPC processor) and reconfigurable logic offering HW acceleration. Although the resulting combination
does not offer the highest performance, lowest energy consumption, or lowest cost in comparison
with custom IC or ASIC/ASSP implementations of the same functionality, it does offer tremendous flexibility
in modifying the design in the field, and it avoids expensive Non-Recurring Engineering (NRE)
charges in the design. Thus, new applications, interfaces, and improved algorithms can be downloaded to
products working in the field using this approach.
Products in this area also include other processing and interface cores, such as Multiply-Accumulate
(MAC) blocks, which are specifically aimed at DSP-type dataflow signal and image processing applications,
and high-speed serial interfaces for wired communications, such as SERDES (serializer/deserializer)
blocks. In this sense, system-on-a-programmable-chip SoCs are not exactly application-specific, but not
completely generic either.
It remains to be seen whether system-on-a-programmable-chip SoCs will become a successful way of
delivering high-volume consumer applications, or will end up restricted to the two main applications for
high-end FPGAs: rapid prototyping of designs which will be re-targeted to ASIC or ASSP implementations,
and use in high-end, relatively expensive parts of the communications infrastructure that require in-field
flexibility and can tolerate the trade-offs in cost, energy consumption, and performance. Certainly, the
use of synthesizable processors on more moderate FPGAs to realize SoC-style designs is one alternative to
the cost issue. Intermediate forms, such as the use of metal-programmable gate-array style logic fabrics
together with hard-core processor subsystems and other cores, as offered in the "Structured
ASIC" products of LSI Logic (RapidChip) and NEC (Instant Silicon Solutions Platform), represent an
intermediate form of SoC between the full-mask ASIC and ASSP approach and the field-programmable
gate array approach. Here the trade-offs are much slower design creation (a few weeks rather than a day or
so), higher NRE than an FPGA (but much lower than a full set of masks), and better cost, performance, and
energy consumption than an FPGA (though perhaps 15 to 30% worse than an ASIC approach). Further interesting
compromise or hybrid approaches, such as an ASIC/ASSP with on-chip FPGA regions, are also emerging
to give design teams more choices.
18.4 IP Cores
The design of SoCs would not be possible if every design started from scratch. In fact, the design of SoCs
depends heavily on the reuse of Intellectual Property blocks, what are called "IP cores". IP reuse has
emerged as a strong trend over the last 8 to 9 years [3] and has been one key element in closing what the International
Technology Roadmap for Semiconductors [4] calls the "design productivity gap": the difference
between the rate of increase in complexity offered by advancing semiconductor process technology and
the rate of increase in designer productivity offered by advances in design tools and methodologies.
But reuse is not just important as a way of enhancing designer productivity, although it has
dramatic impacts on that. It also provides a mechanism for design teams to create SoC products that
span multiple design disciplines and domains. The availability of both hard (laid-out and characterized)
and soft (synthesizable) processor cores from a number of processor IP vendors allows design teams who
would not be able to design their own processor from scratch to drop them into their designs, and thus add
RISC control and DSP functionality to an integrated SoC without having to master the art of processor
design within the team. In this sense, the advantages of IP reuse go beyond productivity: it offers both a
large reduction in design risk and a way for SoC designs to be done that would otherwise be infeasible
owing to the length of time it would take to acquire expertise and design IP from scratch.
This ability, when acquiring and reusing IP cores, to obtain in prepackaged form design-domain
expertise outside one's own design team's set of core competencies is a key requirement for the evolution
of SoC design going forward. SoC design up to this point has concentrated to a large part on integrating digital
components together, perhaps with some analog interface blocks treated as black boxes. The
hybrid SoCs of the future, incorporating domains unfamiliar to the integration team, such as RF or
MEMS, require the concept of drop-in IP to be extended to these new domains. We are not yet at that
state: considerable evolution in the IP business and in the methodologies of IP creation, qualification,
evaluation, integration, and verification is required before we will be able to easily specify and integrate
truly heterogeneous sets of disparate IP blocks into a complete hybrid SoC.
However, the same issues existed at the beginning of the SoC revolution in the digital domain. They
have been solved to a large extent through the creation of standards for IP creation, evaluation, exchange,
and integration, primarily for digital IP blocks but extending also to Analog/Mixed-Signal (AMS) cores.
Among the leading organizations in the identification and creation of such standards has been the Virtual
Socket Interface Alliance (VSIA) [5], formed in 1996 and having, at its peak, more than 200 IP,
systems, semiconductor, and Electronic Design Automation (EDA) corporate members. Although often
criticized over the years for a lack of formal and acknowledged adoption of its IP standards, VSIA has
had a more subtle influence on the electronics industry: many companies instituting reuse programs
internally, many IP, systems, and semiconductor companies engaging in IP creation and exchange, and
many design groups have used VSIA IP standards as a key starting point for developing their own standards
and methods for IP-based design. In this sense, use of VSIA outputs has enabled a kind of IP reuse in the
IP business.
VSIA, for example, in its early architectural documents of 1996 to 1997, helped define the strong
industry-adopted understanding of what it means for an IP block to be considered in "hard" or
"soft" form. Other important contributions to design included the widely read system-level design model
taxonomy created by one of its working groups. Its standards, specifications, and documents thus represent
a very useful resource for the industry [6].
Other important issues for the rise of IP-based design and the emergence of a third-party industry
in this area (which has taken much longer to emerge than originally hoped in the mid-1990s) are the
business issues surrounding IP evaluation, purchase, delivery, and use. Organizations such as the Virtual
Component Exchange (VCX) [7] emerged to look at these issues and provide solutions. Although the VCX is still in
existence, it is clear that the vast majority of IP business relationships between firms occur within a more
ad hoc supplier-to-customer business framework.
18.5 Virtual Components
The VSIA has had a strong influence on the nomenclature of the SoC- and IP-based design industry. The
concept of the "virtual socket", a description of all the design interfaces which an IP core must satisfy,
and of the design models and integration information which must be provided with the IP core to
allow it to be more easily integrated or dropped into an SoC design, comes from the world of
Printed Circuit Board (PCB) design, where components are sourced and purchased in prepackaged form
and can be dropped into a board design in a standardized way.
The dual of the virtual socket then becomes the "virtual component". Not only in the VSIA context,
but also more generally in the industry, an IP core represents a design block which might be reusable.
A virtual component represents a design block that is intended for reuse, and which has been developed
and qualified to be highly reusable. The things that separate IP cores from virtual components are, in
general:
• Virtual components conform in their development and verification processes to well-established
design processes and quality standards.
• Virtual components come with design data, models, associated design files, scripts, characterization
information, and other deliverables which conform to one or another well-accepted standard for IP
reuse, for example, the VSIA deliverables, or another internal or external set of standards.
• Virtual components in general should have been fabricated at least once, and characterized
postfabrication, to ensure that their claims are validated.
• Virtual components should have been reused at least once by an external design team, and usage
reports and feedback should be available.
• Virtual components should have been rated for quality using an industry-standard quality metric
such as OpenMORE (originated by Synopsys and Mentor Graphics) or the VSI Quality standard
(which has OpenMORE as one of its inputs).
To a large extent, the developments of the last decade in IP reuse have been focused on defining the
standards and processes needed to turn the ad hoc reuse of IP cores into a well-understood and reliable process
for acquiring and reusing virtual components, thus enhancing the analogy with PCB design.
18.6 Platforms and Programmable Platforms
The emphasis in the preceding sections has been on IP (or virtual component) reuse on a somewhat
ad hoc, block-by-block basis in SoC design. Over the past several years, however, there has arisen a more
integrated approach to the design of complex SoCs and the reuse of virtual components: what has been
called platform-based design. This is dealt with at much greater length in another chapter in this
book, and much more information is available in References 8 to 11. Suffice it here to define platform-based
design in the SoC context from one perspective.
We can define platform-based design as a planned design methodology which reduces the time, effort,
and risk involved in designing and verifying a complex SoC. This is accomplished by
extensive reuse of combinations of HW and SW IP. As an alternative to IP reuse in a block-by-block
manner, platform-based design assembles groups of components into a reusable platform architecture. This
reusable architecture, together with libraries of preverified and precharacterized, application-oriented HW
and SW virtual components, is a SoC integration platform.
There are several reasons for the growing popularity of the platform approach in industrial design. These
include the increase in design productivity, the reduction in risk, the ability to utilize preintegrated virtual
components from other design domains more easily, and the ability to reuse SoC architectures created by
experts. Industrial platforms include full application platforms, reconfigurable platforms, and processor-centric
platforms [12]. Full application platforms, such as Philips Nexperia and TI OMAP, provide a
complete implementation vehicle for specific product domains [13]. Processor-centric platforms, such as
ARM PrimeXsys, concentrate on the processor, its required bus architecture, and basic sets of peripherals,
along with an RTOS and basic SW drivers. Reconfigurable or highly programmable platforms, such as the
Xilinx Platform FPGA and Altera's SOPC, deliver hard-core processors plus reconfigurable logic, along with
associated IP libraries and design tool flows.
18.7 Integration Platforms and SoC Design
The use of SoC integration platforms changes the SoC design process in two fundamental ways:
1. The basic platform must be designed, using whatever ad hoc or formalized design process for SoC
the platform creators decide on. Section 18.8 outlines some of the basic steps required to build a SoC,
whether building a platform or using a more ad hoc, block-based integration process. However, when
constructing a SoC platform for reuse in derivative design, it is important to remember that it may not be
necessary to take the whole platform and its associated HW and SW component libraries through complete
implementation. Enough implementation must be done to allow the platform and its constituent libraries
to be fully characterized and modeled for reuse. It is also essential that the platform creation phase produce,
in an archivable and retrievable form, all the design files required for the platform and its libraries to be
reused in a derivative design process. This must also include the setup of the appropriate configuration
programs or scripts to allow automatic creation of a configured platform during derivative design.
2. A design process must be created and qualified for all the derivative designs that will be created
based on the SoC integration platform. This must include processes for retrieving the platform from its
archive, for entering the derivative design configuration into a platform configurator, the generation of
the design files for the derivative, the generation of the appropriate verification environment(s) for the
derivative, the ability for derivative design teams to select components from libraries, to modify these
components and validate them within the overall platform context, and, to the extent supported by the
platform, to create new components for their particular application.
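The derivative-design flow in step 2 can be illustrated with a small configurator sketch. Everything here is hypothetical for illustration: the platform library, block names, area figures, and the idea of an area budget are invented, not taken from any real platform offering.

```python
# Minimal sketch of a derivative-design platform configurator.
# All platform/component names and area numbers are hypothetical.

PLATFORM_LIBRARY = {
    "risc_core": {"area_kgates": 150, "optional": False},
    "sys_bus":   {"area_kgates": 30,  "optional": False},
    "uart":      {"area_kgates": 5,   "optional": True},
    "dma":       {"area_kgates": 20,  "optional": True},
}

def configure_derivative(selected_optional_blocks, area_budget_kgates):
    """Build a derivative configuration from the platform library.

    Mandatory blocks (processor, bus) are always included; optional
    blocks are pulled from the library on request. Unknown blocks and
    over-budget configurations are rejected early, mirroring the
    qualified derivative-design process described in the text.
    """
    config = {n: c for n, c in PLATFORM_LIBRARY.items() if not c["optional"]}
    for name in selected_optional_blocks:
        if name not in PLATFORM_LIBRARY:
            raise ValueError(f"block {name!r} not in platform library")
        config[name] = PLATFORM_LIBRARY[name]
    total = sum(c["area_kgates"] for c in config.values())
    if total > area_budget_kgates:
        raise ValueError(f"needs {total} kgates, budget is {area_budget_kgates}")
    return sorted(config), total

blocks, area = configure_derivative(["uart", "dma"], area_budget_kgates=250)
```

In a real flow the configurator would emit design files and verification environments rather than a block list, but the shape of the process, selection within a constrained scope plus early validity checks, is the same.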
Reconfigurable or highly programmable platforms introduce an interesting addition to the platform-
based SoC design process [14]. Platform FPGAs and SOPC devices can be thought of as a "meta-platform":
a platform for creating platforms. Design teams can obtain these devices from companies such as Xilinx
and Altera, containing a basic set of more generic capabilities and IP: embedded processors, on-chip buses,
special IP blocks such as MACs and SERDES, and a variety of other prequalified IP blocks. They can then
customize the meta-platform to their own application space by adding application domain-specific IP
libraries. Finally, the combined platform can be provided to derivative design teams, who can select the
basic meta-platform and configure it within the scope intended by the intermediate platform creation
team, selecting the IP blocks needed for their exact derivative application. More on platform-based design
will be found in another chapter in this book.
2006 by Taylor & Francis Group, LLC
System-on-Chip and Network-on-Chip Design 18-7
18.8 Overview of the SoC Design Process
The most important thing to remember about SoC design is that it is a multi-disciplinary design process,
which needs to exercise design processes from across the spectrum of electronics. Design teams must gain
some fluency with all these multiple disciplines, but the integrative and reuse nature of SoC design means
that they may not need to become deep experts in all of them. Indeed, avoiding the need for designers to
understand all methodologies, flows, and domain-specific design techniques is one of the key reasons for
reuse and enablers of productivity. Nevertheless, from Design-for-Test (DFT) through digital and analog
HW design, from verification through system-level design, from embedded SW through IP procurement
and integration, from SoC architecture through IC analysis, a wide variety of knowledge is required by
the team, if not every designer.
Figure 18.2 illustrates some of the basic constituents of the SoC design process.
We will now define each of these steps as illustrated:
FIGURE 18.2 Steps in the SoC design process: SoC requirements analysis; SoC architecture (communications
architecture, choice of processor(s)); system-level design (HW-SW partitioning, system modeling, performance
analysis); building the transaction-level golden testbench; acquisition of HW and SW IP; definition of the SW
architecture; configuration and floorplanning of the SoC HW microarchitecture; DFT architecture and
implementation; AMS HW implementation; HW IP assembly and implementation; SW assembly and
implementation; HW and HW-SW verification; final SoC HW assembly and verification; and fabrication,
testing, packaging, and lab verification with SW.

SoC requirements analysis. This is the basic step for defining and specifying a complex SoC, based on the
needs of the end product into which it will be integrated. The primary input into this step is the marketing
definition of the end product and the resulting characteristics of what the SoC should be: both functional
and nonfunctional (e.g., cost, size, energy consumption, performance: latency and throughput, package
selection). This process of requirements analysis must ultimately answer the question: is the product
feasible? Is the desired SoC feasible to design, and with what effort and in what timeframe? How much
reuse will be possible? Is the SoC design based on legacy designs of previous-generation products (or, in
the case of platform-based design, to be built based on an existing platform offering)?
SoC architecture. In this phase, the basic structure of the desired SoC is defined. Vitally important
is to decide on the communications architecture that will be used as the backbone of the SoC on-chip
communications network. An inadequate communications architecture will cripple the SoC and have as
big an impact as the use of an inappropriate processor subsystem. Of course, the choice of communications
architecture is impossible to divorce from making the basic processor(s) choice: for example, do I use a
RISC control processor? Do I have an on-board DSP? How many of each? What are the processing demands
of my SoC application? Do I integrate the bare processor core, or use a whole processor subsystem
provided by an IP company (most processor IP companies have moved from offering just processor
cores to whole processor subsystems, including hierarchical bus fabrics tuned to their particular processor
needs)? Do I have some ideas, based on legacy SoC design in this space, as to how SW and HW should be
partitioned? What memory hierarchy is appropriate? What are the sizes, levels, performance requirements,
and configurations of the embedded memories most appropriate to the application domain for the SoC?
System-level design. This is an important phase of the SoC process, but one that is often done in
a relatively ad hoc way. The whiteboard and the spreadsheet are as much used by the SoC architects
as more capable toolsets. However, there has long been use of ad hoc C/C++ based models for the
system design phase to validate basic architectural choices. And designers of complex signal processing
algorithms for voice and image processing have long adopted dataflow models and associated tools to
define their algorithms, define optimal bit-widths, and validate performance, whether destined for HW or
SW implementation. A flurry of activity in the last few years on different C/C++ modeling standards for
system architects has consolidated on SystemC [15]. The system nature of SoC demands a growing use
of system-level design modeling and analysis, as these devices grow more complex. The basic processes
carried out in this phase include HW-SW partitioning (the allocation of functions to be implemented in
dedicated HW blocks, in SW on processors [and the decision of RISC versus DSP], or a combination of
both, together with decisions on the communications mechanisms to be used to interface HW and SW, or
HW-HW and SW-SW). In addition, the construction of system-level models, and the analysis of correct
functioning, performance, and other nonfunctional attributes of the intended SoC through simulation
and other analytical tools, is necessary. Finally, all additional IP blocks required, which can be sourced
outside or reused from the design group's legacy, must be identified, both HW and SW. The remaining
new functions will need to be implemented as part of the overall SoC design process.
IP acquisition. After system-level design and the identification of the processors and communications
architecture, and other HW or SW IP required for the design, the group must undertake an IP acquisition
stage. This can, to a large extent, be done at least in part in parallel with other work such as system-level
design (assuming early identification of major external IP is made) or building golden transaction-level
testbench models. Fortunate design groups will be working in companies with a large legacy of existing
well-crafted IP (rather, virtual components) organized in databases that can be easily searched; or
those with access via supplier agreements to large external IP libraries; or at least those with experience
at IP search, evaluation, purchase, and integration. For these lucky groups, the problems at this stage are
greatly ameliorated. Others with less experience or infrastructure will need to explore these processes for
the first time, hopefully making use of IP suppliers' experience with the legal and other processes required.
Here the external standards bodies such as VSIA and VCX have done much useful work that will smooth
the path, at least a little. One key issue in IP acquisition is to conduct rigorous and thorough incoming
inspection of IP to ensure its completeness and correctness to the greatest extent possible prior to use, and
to resolve any quality problems early with suppliers, long before SoC integration. Every hour spent
at this stage will pay back in avoiding much longer schedule slips later. The IP quality guidelines
discussed earlier are a foundation level for a quality process at this point.
Build a transaction-level golden testbench. The system model built up during the system-level design
stage can form the basis for a more elaborated design model, using transaction-level abstractions [16],
which represents the underlying HW-SW architecture and components in more detail: sufficient detail
to act as a functional virtual prototype for the SoC design. This golden model can be used at this stage to
verify the microarchitecture of the design and to verify detailed design models for HW IP at the Hardware
Description Language (HDL) level within the overall system context. It thus can be reused all the way
down the SoC design and implementation cycle.
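The golden-testbench idea can be sketched compactly: replay the same stream of transactions on an untimed reference model and on a more detailed model, and flag any behavioral mismatch. In practice this is done in SystemC/TLM against HDL models; the plain-Python memory models below are invented stand-ins that show only the comparison mechanism.

```python
# Sketch of golden-model checking at the transaction level: a reference
# model and a detailed model are driven with identical read/write
# transactions and their responses are compared.

class GoldenMemory:
    """Untimed reference ('golden') model: behavior only, no timing."""
    def __init__(self):
        self.mem = {}
    def transact(self, op, addr, data=None):
        if op == "write":
            self.mem[addr] = data
            return None
        return self.mem.get(addr, 0)

class PipelinedMemory(GoldenMemory):
    """More detailed model: same behavior plus a cycle count, standing
    in for a cycle-accurate implementation model."""
    def __init__(self):
        super().__init__()
        self.cycles = 0
    def transact(self, op, addr, data=None):
        self.cycles += 2 if op == "write" else 1
        return super().transact(op, addr, data)

def check_against_golden(golden, dut, transactions):
    """Replay each transaction on both models; any mismatch is a bug."""
    for op, addr, data in transactions:
        if golden.transact(op, addr, data) != dut.transact(op, addr, data):
            return False
    return True

txns = [("write", 0x10, 42), ("read", 0x10, None), ("read", 0x20, None)]
ok = check_against_golden(GoldenMemory(), PipelinedMemory(), txns)
```

Because the comparison is purely transaction-by-transaction, the same golden model can be reused unchanged as the detailed side is refined from abstract bus models down to HDL, which is exactly why the text calls it reusable "all the way down" the flow.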
Define the SoC SW architecture. SoC is of course not just about HW [17]. As well as often defining the
right on-chip communications architecture, the choice of processor(s) and the nature of the application
domain have a very heavy influence on the SW architecture. For example, RTOS choice is limited by the
processor ports that have been done and by the application domain (OSEK is an RTOS for automotive
systems; Symbian OS for portable wireless devices; PalmOS for Personal Digital Assistants, etc.). As well
as the basic RTOS, every SoC peripheral device will need a device driver, hopefully based on reuse
and configuration of templates; various middleware application stacks (e.g., telephony, multimedia image
processing) are important parts of the SW architecture; voice and image encoding and decoding on
portable devices is often based on assembly code IP for DSPs. There is thus a strong need, in defining the
SoC, to fully elaborate the SW architecture to allow reuse, easy customization, and effective verification of
the overall HW-SW device.
Configure and floorplan SoC microarchitecture. At this point we are beginning to deal with the SoC
on a more physical and detailed logical basis. Of course, during high-level architecture and system-level
design, the team has been looking at physical implementation issues (although our design process diagram
shows everything as a waterfall kind of flow, in reality SoC design, like all electronics design, is more of
an iterative, incremental process; that is, more akin to the famous spiral model for SW). But before
beginning the detailed HW design and integration, it is important that there is agreement among the
team on the basic physical floorplan; that all the IP blocks are properly and fully configured; that the
basic microarchitectures (test, power, clocking, bus, timing) have been fully defined and configured; and
that HW implementation can proceed. In addition, this process should also generate the downstream
verification environments that will be used throughout the implementation processes, whether SW-
simulation based, emulation based, using rapid prototypes, or other hybrid verification approaches.
DFT architecture and implementation. The test architecture is only one of the key microarchitectures
that must be implemented; it is complicated by IP legacy and the fact that it is often impossible to
impose one DFT style (such as BIST or SCAN) on all IP blocks. Rather, wrappers or adaptations of
standard test interfaces (such as JTAG ports) may be necessary to fit all IP blocks together into a coherent
test architecture and plan.
AMS HW implementation. Most SoCs incorporating AMS blocks use them to interface to the external
world. VSIA, among other groups, has done considerable work in defining how AMS IP blocks should be
created to allow them to be more easily integrated into mainly digital SoCs (the "Big D/little a" SoC),
along with guidelines and rules for such integration. Experiences with these rules, guidelines, and extra
deliverables have been, on the whole, promising, but they have more impact between internal design
groups today than on the industry as a whole. The "Big A/Big D" mixed-signal SoC is still relatively rare.
HW IP assembly and integration. This design step is in many ways the most traditional. Many design
groups have experience in assembling design blocks done by various designers or subgroups, in an
incremental fashion, into the agreed-on architectures for communications, bussing, clocking, power, etc.
The main difference with SoC is that many of the design blocks may be externally sourced IP. To avoid
difficulties at this stage, the importance of rigorous qualification of incoming IP and the early definition of
the SoC microarchitecture, to which all blocks must conform, cannot be overstated.
SW assembly and implementation. Just as with HW, the SW IP, together with new or modied SW tasks
created for the particular SoC under design, must be assembled together and validated as to conformance
to interfaces and expected operational quality. It is important to verify as much of the SW in its normal
system operating context as possible.
HW and HW-SW verification. Although represented as a single box on the diagram, this is perhaps one
of the largest consumers of design time and effort and the major determinant of final SoC quality. Vital to
effective verification is the setup of a targeted SoC verification environment, reusing the golden testbench
models created at higher levels of the design process. In addition, highly capable, multi-language, mixed
simulation environments are important (e.g., SystemC models and HDL implementation models need to
be mixed in the verification process, and effective links between them are crucial). There are a large number
of different verification tools and techniques [18], ranging from SW-based simulation environments to
HW emulators, HW accelerators, and FPGA and bonded-core-based rapid prototyping approaches. In
addition, formal techniques such as equivalence checking and model/property checking have enjoyed
some successful usage in verifying parts of SoC designs, or the design at multiple stages in the process.
Mixed approaches to HW-SW verification range from incorporating Instruction Set Simulators (ISSs) of
processors in SW-based simulation, to linking HW emulation of the HW blocks (compiled from the HDL
code) to SW running natively on a host workstation, linked in an ad hoc fashion by design teams or using
a commercial mixed verification environment. Alternatively, HDL models of new HW blocks running in
a SW simulator can be linked to emulation of the rest of the system running in HW: a mix of emulation
and use of bonded-out processor cores for executing SW. It is important that as much of the system SW
as possible be exercised in the context of the whole system, using the most appropriate verification tech-
nology that can get the design team close to real-time execution speed (no more than 100× slower is the
minimum needed to run significant amounts of SW). The trend to transaction-based modeling of systems,
where transactions range in abstraction from untimed functional communications via message calls,
through abstract bus communications models, through cycle-accurate bus functional models, and finally
to cycle- and pin-accurate transformations of transactions to the fully detailed interfaces, allows verification
to occur at several levels or with mixed levels of design description. Finally, a new trend in verification is
assertion-based verification, using a variety of input languages (PSL/Sugar, e, Vera, or regular Verilog and
VHDL) to model design properties, which can then be monitored during simulation, to ensure that either
certain properties will be satisfied or certain error conditions never occur. Combinations of formal property
checking and simulation-based assertion checking have been created, viz. semiformal verification. The
most important thing to remember about verification is that, armed with a host of techniques and tools,
it is essential for design teams to craft a well-ordered verification process that allows them to definitively
answer the question "how do we know that verification is done?" and thus allows the SoC to be fabricated.
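The assertion-based monitoring described above can be sketched in miniature. Real flows express properties in PSL/SVA over HDL signals; here a Python checker replays a recorded signal trace and evaluates an invented example property, a one-hot bus-grant invariant, on every cycle.

```python
# Sketch of assertion-based verification: properties are monitored on
# every simulated cycle, and any cycle that violates one is flagged.
# The one-hot grant property and the traces are invented examples.

def at_most_one_grant(grants):
    """Safety property: no more than one bus grant active per cycle."""
    return sum(grants) <= 1

def run_with_assertions(trace, properties):
    """Replay a recorded per-cycle signal trace, checking every property
    each cycle; returns the (cycle, property_name) violations found."""
    violations = []
    for cycle, grants in enumerate(trace):
        for prop in properties:
            if not prop(grants):
                violations.append((cycle, prop.__name__))
    return violations

good_trace = [(0, 0), (1, 0), (0, 1), (0, 0)]   # grants per cycle
bad_trace = [(0, 0), (1, 1), (0, 1)]            # two grants at cycle 1
violations = run_with_assertions(bad_trace, [at_most_one_grant])
```

Because the monitors are separate from the stimulus, the same properties can be reused across simulation, emulation, and, for the formally checkable subset, property checking, which is the point of the semiformal combinations mentioned in the text.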
Final SoC HW assembly and verification. Often done in parallel with, or overlapping, those final few
simulation runs in the verification stage, the final SoC HW assembly and verification phase includes final
place and route of the chip, any hand-modifications required, and final physical verification (using design
rule checking and layout-versus-schematic [netlist] tools), as well as important analysis steps for issues that
occur in advanced semiconductor processes, such as IR drop, signal integrity, and power network integrity,
as well as satisfaction of, and design transformation for, manufacturability (OPC, etc.).
Fabrication, testing, packaging, and lab verification. When a SoC has been shipped to fabrication, it
would seem time for the design team to relax. Instead, this is an opportunity for additional verification to
be carried out, especially more verification of system SW running in the context of the HW design, and
for fixes, either of SW or of the SoC HW, on hopefully no more than one expensive iteration of the
design, to be determined and planned. When the tested packaged parts arrive back for verification in
the lab, the ideal scenario is to load the SW into the system and have the SoC and its system booted up and
running SW within a few hours. Interestingly, the most advanced SoC design teams, with well-ordered
design methodologies and processes, are able to achieve this quite regularly.
18.9 System-Level Design
As discussed earlier, when describing the overall SoC design flow, system-level design and SoC are
essentially made for each other. A key aim of IP reuse and of SoC techniques such as platform-based design
is to make the back-end (RTL to GDS II) design implementation processes easier, fast, and low-risk, and
to shift the major design phase for SoC up in time and in abstraction level to the system level. This
also means that the back-end tools and flows for SoC designs do not necessarily differ from those used for
complex ASIC, ASSP, and custom IC design; it is the methodology of how they are used, and how blocks
are sourced and integrated, overlaying the underlying design tools and flows, that may differ for SoC.
However, the fundamental nature of IP-based design of SoC has a stronger influence on the system level.
It is at the system level that the vital tasks of deciding on and validating the basic system architecture and
choice of IP blocks are carried out. In general, this is known as design space exploration (DSE). As part
of this exploration, SoC platform customization for a particular derivative is carried out, should the SoC
platform approach be used. Essentially one can think of platform DSE as being a similar task to general
DSE, except that the scope and boundaries of the exploration are much more tightly constrained: the
basic communications architecture and platform processor choices may be fixed, and the design team may
be restricted to choosing certain customization parameters and choosing optional IP from a library. Other
tasks include HW-SW partitioning, usually restricted to decisions about key processing tasks that might
be mapped onto either HW or SW form and that have a big impact on system performance, energy
consumption, on-chip communications bandwidth consumption, or other key attributes. Of course,
in multiprocessor systems, there are SW-SW partitioning or codesign issues as well, deciding on the
assignment of SW tasks to various processor options. Again, perhaps 80 to 95% of these decisions can be,
or are, made a priori, especially if a SoC is based on either a platform or an evolution of an existing system;
such codesign decisions are usually made on a small number of functions that have critical impact.
Because partitioning, codesign, and DSE tasks at the system level involve much more than HW-SW
issues, a more appropriate term for this is "function-architecture codesign" [19,20]. In this codesign model,
systems are described on two equivalent levels:

The functional intent of the system: for example, a network of applications, decomposed into
individual sets of functional tasks, which may be modeled using a variety of models of computation
such as discrete event, finite state machine, or dataflow.

The architectural structure of the system: the communications architecture, major IP blocks
such as processor(s), memory(ies), and HW blocks, captured or modeled, for example, using some
kind of IP or platform configurator.
The methodology implied in this approach is then to build explicit mappings between the functional
view of the system and the architectural view, which carry within them the implicit partitioning that is
made for both computation and communications. This hybrid model can then be simulated, the results
analyzed, and a variety of ancillary models (e.g., cost, power, performance, communications bandwidth
consumption, etc.) can be utilized in order to examine the suitability of the system architecture as a vehicle
for realizing or implementing the end product functionality.
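The mapping step can be made concrete with a toy model. All task names, resource names, and throughput numbers below are invented; the point is only the structure: a functional view (tasks with abstract workloads), an architectural view (resources with abstract throughputs), and an explicit mapping between them that can be evaluated.

```python
# Sketch of function-architecture codesign: functional tasks are
# explicitly mapped onto architectural resources, and each candidate
# mapping is evaluated (here, for per-resource execution time).
# All names and numbers are hypothetical.

TASKS = {              # functional view: task -> workload (abstract ops)
    "fft": 800,
    "control": 50,
    "packetizer": 200,
}
RESOURCES = {          # architectural view: resource -> throughput (ops per time unit)
    "dsp": 4.0,
    "risc": 1.0,
    "hw_accel": 10.0,
}

def evaluate_mapping(mapping):
    """Given task -> resource, return the estimated execution time
    accumulated on each resource (workload / throughput)."""
    load = {r: 0.0 for r in RESOURCES}
    for task, resource in mapping.items():
        load[resource] += TASKS[task] / RESOURCES[resource]
    return load

# One candidate: signal processing on the DSP, control in SW on the
# RISC, the packetizer in dedicated HW.
load = evaluate_mapping({"fft": "dsp", "control": "risc", "packetizer": "hw_accel"})
```

Swapping a task's target (e.g., moving the packetizer from `hw_accel` to `risc`) and re-evaluating is exactly the DSE loop the text describes, with the ancillary models (cost, power, bandwidth) being further functions of the same mapping.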
The function-architecture codesign approach has been implemented and used in both research and
commercial tools [21] and forms the foundation of many system-level codesign approaches going
forward. In addition, it has been found extremely suitable as the best system-level design approach for
platform-based design of SoC [22].
18.10 Interconnection and Communication Architectures
for SoC
This topic is dealt with in more detail in other chapters in this book. Suffice it to say here that current
SoC architectures deal in fairly traditional hierarchies of standard on-chip buses: for example, processor-
specific buses, high-speed system buses, and lower-speed peripheral buses, using standards such as ARM's
AMBA and IBM's CoreConnect [13], and traditional master-slave bus approaches. Recently, there has
been a lot of interest in Network-on-Chip (NoC) communications architectures, based on packet switching,
and a number of approaches have been reported in the literature; but this remains primarily a research
topic, both in universities and industrial research labs [23].
18.11 Computation and Memory Architectures for SoC
The primary processors used in SoC are embedded RISCs such as ARM processors, PowerPCs, MIPS
architecture processors, and some of the configurable processors designed specifically for SoC, such as
Tensilica and ARC. In addition, embedded DSPs from traditional suppliers such as TI, Motorola,
ParthusCeva, and others are also quite common in many consumer applications, for embedded signal
processing for voice and image data. Research groups have looked at compiling or synthesizing
application-specific processors or coprocessors [24,25], and these have interesting potential in future SoCs,
which may incorporate networks of heterogeneous configurable processors collaborating to offer large
amounts of computational parallelism. This is an especially interesting prospect given wider use of
reconfigurable logic, which opens up the prospect of dynamic adaptation of SoC to application needs.
However, most multiprocessor SoCs today involve at most 2 to 4 processors of conventional design; the
larger networks are more often found today in the industrial or university lab.
Although several years ago most embedded processors in early SoCs did not use cache memory-based
hierarchies, this has changed significantly over the years, and most RISC and DSP processors now involve
significant amounts of Level 1 cache memory, as well as higher-level memory units both on- and off-chip
(off-chip flash memory is often used for embedded SW tasks that may be only infrequently required).
System design tasks and tools must consider the structure, size, and configuration of the memory hierarchy
as one of the key SoC configuration decisions that must be made.
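One standard way to quantify such memory-hierarchy decisions is the average memory access time (AMAT) metric, AMAT = hit time + miss rate × miss penalty, applied recursively down the hierarchy. The sketch below uses this well-known formula; the specific cycle counts and miss rates are invented, illustrative numbers, not data from any particular SoC.

```python
# Sketch of memory-hierarchy exploration using average memory access
# time (AMAT): AMAT = hit_time + miss_rate * miss_penalty, folded
# recursively over the cache levels. All numbers are illustrative.

def amat(levels, memory_latency):
    """levels: list of (hit_time_cycles, miss_rate) from L1 downward;
    memory_latency: cycles to reach off-chip memory on a full miss."""
    penalty = memory_latency
    # Fold the hierarchy from the last cache level back up to L1.
    for hit_time, miss_rate in reversed(levels):
        penalty = hit_time + miss_rate * penalty
    return penalty

# Compare a single-level against a two-level on-chip hierarchy.
one_level = amat([(1, 0.05)], memory_latency=100)
two_level = amat([(1, 0.05), (6, 0.30)], memory_latency=100)
```

With these numbers the added L2 cuts the average access time by more than half, which is the kind of trade-off (area and power for the extra level versus latency) that the system-level configuration decision weighs.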
18.12 IP Integration Quality and Certification Methods
and Standards
We have emphasized the design reuse aspects of SoC and the need for reuse of both internally and
externally sourced IP blocks by design teams creating SoCs. In the discussion of the design process above,
we mentioned issues such as IP quality standards and the need for incoming inspection and qualification
of IP. The issue of IP quality remains one of the biggest impediments to the use of IP-based design for
SoC [26]. The quality standards and metrics available from VSIA and OpenMORE, and their further
enhancement, help, but only to a limited extent. The industry could clearly use a formal certification body
or lab for IP quality that would ensure conformance to IP transfer requirements and the integration quality
of the blocks. Such a certification process would of necessity be quite complex, owing to the large number
of configurations possible for many IP blocks and the almost infinite variety of SoC contexts into which
they might be integrated. Certified IP would begin to deliver the "virtual components" of the VSIA vision.
In the absence of formal external certification (and such third-party labs seem a long way off, if they ever
emerge), design groups must provide their own certification processes and real reuse quality metrics, based
on their internal design experiences. Platform-based design methods help, owing to the advantages of
prequalifying and characterizing groups of IP blocks and libraries of compatible domain-specific
components. Short of independent evaluation and qualification, this is the best that design groups can do
currently.
One key issue to remember is that IP not created for reuse, with all the deliverables created and
validated according to a well-defined set of standards, is inherently not reusable. The effort required to
make a reusable IP block has been estimated to be 50 to 200% more than that required to make it for
one-time use; however, even assuming the most conservative (highest) extra cost, this implies positive
payback with three uses of the IP block. Planned and systematic IP reuse, and investment in those blocks
with greatest SoC use potential, gives a high chance of achieving significant productivity soon after starting
a reuse programme. But ad hoc attempts to reuse existing design blocks not designed to reuse standards
have failed in the past and are unlikely to provide the quality and productivity desired.
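The payback arithmetic quoted above works out as follows: in the most conservative case (200% extra effort), the reusable block costs 3× a single-use block, so its cost is amortized by the third use. The per-use integration cost below is an invented assumption (the text does not give one); any nonzero value pushes break-even slightly later.

```python
# Worked example of the IP-reuse payback estimate: making a block
# reusable costs 50-200% extra effort relative to a one-time design.
# per_use_cost (integration effort per deployment, in units of one
# single-use design) is an assumed, illustrative parameter.

def total_effort(uses, extra_reuse_cost, per_use_cost):
    """Effort to build the block once for reuse, then deploy it
    'uses' times (units: one single-use design = 1.0)."""
    return (1.0 + extra_reuse_cost) + uses * per_use_cost

def breakeven_uses(extra_reuse_cost, per_use_cost=0.0, max_uses=1000):
    """Smallest number of uses at which reuse is no more expensive
    than rebuilding the block from scratch each time."""
    n = 1
    while n < max_uses and total_effort(n, extra_reuse_cost, per_use_cost) > n * 1.0:
        n += 1
    return n

# Most conservative case from the text: 200% extra effort (3x cost).
n = breakeven_uses(extra_reuse_cost=2.0)
```

With the text's lower bound of 50% extra effort the break-even drops to two uses, which is why planned reuse of the highest-potential blocks pays off so quickly.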
18.13 Summary
In this chapter, we have defined SoC and surveyed a large number of the issues involved in its design. An
outline of the important methods and processes involved in SoC design defines a methodology which can
be adopted by design groups and adapted to their specific requirements. Productivity in SoC design
demands high levels of design reuse, and the existence of third-party and internal IP groups and the chance
to create a library of reusable IP blocks (true virtual components) are all possible for most design groups
today.
The wide variety of design disciplines involved in SoC means that unprecedented collaboration between
designers of all backgrounds, from systems experts through embedded SW designers and architects to
HW designers, is required. But the rewards of SoC justify the effort required to succeed.
References
[1] Merrill Hunt and Jim Rowson, Blocking in a system on a chip. IEEE Spectrum, 33(11), 35–41,
November 1996.
[2] Rochit Rajsuman. System-on-a-Chip Design and Test. Artech House, Norwood, Massachusetts,
2000.
[3] Michael Keating and Pierre Bricaud, Reuse Methodology Manual for System-on-a-Chip Designs.
Kluwer Academic Publishers, Dordrecht, 1998 (1st ed.), 1999 (2nd ed.), 2002 (3rd ed.).
[4] International Technology Roadmap for Semiconductors (ITRS), 2001 edn. http://public.itrs.net/.
[5] Virtual Socket Interface Alliance, on the web at URL: http://www.vsia.org. This includes access to
its various public documents, including the original Reuse Architecture document of 1997, as well
as more recent documents supporting IP reuse released to the public domain.
[6] B. Bailey, G. Martin, and T. Anderson (eds.). Taxonomies for the Development and Verification of
Digital Systems. Springer, New York, 2005.
[7] The Virtual Component Exchange (VCX). Available at http://www.thevcx.com/.
[8] Henry Chang, Larry Cooke, Merrill Hunt, Grant Martin, Andrew McNelly, and Lee Todd,
Surviving the SOC Revolution: A Guide to Platform-Based Design. Kluwer Academic Publishers,
Dordrecht, 1999.
[9] K. Keutzer, S. Malik, A.R. Newton, J. Rabaey, and A. Sangiovanni-Vincentelli, System-Level Design:
Orthogonalization of Concerns and Platform-Based Design. IEEE Transactions on CAD of ICs and
Systems, 19, 1523–1543, 2000.
[10] Alberto Sangiovanni-Vincentelli and Grant Martin, Platform-Based Design and Software Design
Methodology for Embedded Systems. IEEE Design and Test of Computers, 18, 23–33, 2001.
[11] IEEE Design and Test of Computers Special Issue on Platform-Based Design of SoCs, 19, 463, 2002.
[12] G. Martin and F. Schirrmeister, A Design Chain for Embedded Systems. IEEE Computer, Embedded
Systems Column, 35(3), 100–103, March 2002.
[13] Grant Martin and Henry Chang, Eds., Winning the SOC Revolution: Experiences in Real Design.
Kluwer Academic Publishers, Dordrecht, May 2003.
[14] Patrick Lysaght, FPGAs as Meta-Platforms for Embedded Systems. In Proceedings of the IEEE
Conference on Field Programmable Technology. Hong Kong, December 2002.
[15] Thorsten Groetker, Stan Liao, Grant Martin, and Stuart Swan, System Design with SystemC. Kluwer
Academic Publishers, Dordrecht, May 2002.
[16] Janick Bergeron, Writing Testbenches, 3rd ed. Kluwer Academic Publishers, Dordrecht, 2003.
[17] G. Martin and C. Lennard, Improving Embedded SW Design and Integration for SOCs. Invited
Custom Integrated Circuits Conference Paper, May 2000, pp. 101108.
[18] Prakash Rashinkar, Peter Paterson, and Leena Singh, System-on-a-Chip Verication: Methodology
and Techniques. Kluwer Academic Publishers, Dordrecht, 2001.
[19] F. Balarin, M. Chiodo, P. Giusto, H. Hsieh, A. Jurecska, L. Lavagno, C. Passerone, A. Sangiovanni-
Vincentelli, E. Sentovich, K. Suzuki, and B. Tabbara, Hardware-Software Co-Design of Embedded
Systems: The POLIS Approach. Kluwer Academic Publishers, Dordrecht, 1997.
[20] S. Krolikoski, F. Schirrmeister, B. Salefski, J. Rowson, and G. Martin, Methodology and Technology
for Virtual Component Driven Hardware/Software Co-Design on the System Level. Paper 94.1,
ISCAS 99, Orlando, FL, May 30–June 2, 1999.
[21] G. Martin and B. Salefski, System Level Design for SOCs: A Progress Report, Two Years On.
In System-on-Chip Methodologies and Design Languages, Jean Mermet, Ed. Kluwer Academic
Publishers, Dordrecht, 2001, pp. 297–306.
[22] G. Martin, Productivity in VC Reuse: Linking SOC Platforms to Abstract Systems Design
Methodology. In Virtual Component Design and Reuse, Ralf Seepold and Natividad Martinez
Madrid, Eds. Kluwer Academic Publishers, Dordrecht, 2001, pp. 33–46.
[23] Axel Jantsch and Hannu Tenhunen, Eds., Networks on Chip. Kluwer Academic Publishers,
Dordrecht, 2003.
[24] Vinod Kathail, Shail Aditya, Robert Schreiber, B. Ramakrishna Rau, Darren C. Cronquist, and
Mukund Sivaraman, PICO: Automatically Designing Custom Computers. IEEE Computer, 35,
39–47, 2002.
[25] T.J. Callahan, J.R. Hauser, and J. Wawrzynek, The Garp Architecture and C Compiler. IEEE
Computer, 33, 62–69, 2000.
[26] DATE 2002 Proceedings, Session 1A: How to Choose Semiconductor IP?: Embedded Processors,
Memory, Software, Hardware. In Proceedings of DATE 2002. Paris, March 2002, pp. 14–17.
19
A Novel Methodology
for the Design of
Application-Specific
Instruction-Set
Processors
Andreas Hoffmann,
Achim Nohl, and
Gunnar Braun
CoWare Inc.
19.1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-1
19.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-3
19.3 ASIP Design Flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-4
Architecture Exploration • LISA Language
19.4 LISA Processor Design Platform. . . . . . . . . . . . . . . . . . . . . . . . 19-10
Hardware Designer Platform For Exploration and Processor
Generation • Software Designer Platform For Software
Application Design • System Integrator Platform For System
Integration and Verification
19.5 SW Development Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-12
Assembler and Linker • Simulator
19.6 Architecture Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-17
LISA Language Elements for HDL Synthesis • Implementation
Results
19.7 Tools for Application Development . . . . . . . . . . . . . . . . . . . . 19-22
Examined Architectures • Efficiency of the Generated Tools
19.8 Requirements and Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . 19-25
LISA Language • HLL C-compiler • HDL Generator
19.9 Conclusion and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-26
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-26
19.1 Introduction
In consumer electronics and telecommunications, high product volumes increasingly go along with
short life-times. Driven by the advances in semiconductor technology combined with the need for new
From Andreas Hoffmann, Tim Kogel, Achim Nohl, Gunnar Braun, Oliver Schliebusch, Oliver Wahlen, Andreas
Wieferink, and Heinrich Meyr. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 20,
2001. With permission.
19-1
applications like digital TV and wireless broadband communications, the amount of system functionality
realized on a single chip is growing enormously. Higher integration and thus increasing miniaturization
have led to a shift from using distributed hardware components towards heterogeneous system-on-chip
(SOC) designs [1]. Due to the complexity introduced by such SOC designs and time-to-market constraints,
the designers' productivity has become the vital factor for successful products. For this reason a growing
amount of system functions and signal processing algorithms is implemented in software rather than in
hardware by employing embedded processor cores.
In the current technical environment, embedded processors and the necessary development tools are
designed manually, with very little automation. This is because the design and implementation of an
embedded processor, such as a DSP device embedded in a cellular phone, is a highly complex process
composed of the following phases: architecture exploration, architecture implementation, application
software design, and system integration and verification.
During the architecture exploration phase, software development tools (i.e., HLL compiler, assembler,
linker, and cycle-accurate simulator) are required to profile and benchmark the target application on
different architectural alternatives. This process is usually an iterative one that is repeated until a best
fit between selected architecture and target application is obtained. Every change to the architecture
specification requires a complete new set of software development tools. As these changes to the tools
are carried out mostly by hand, the result is a long, tedious, and extremely error-prone process.
Furthermore, the lack of automation makes it very difficult to match the profiling tools to an abstract
specification of the target architecture. In the architecture implementation phase, the specified processor
has to be converted into a synthesizable HDL model. With this additional manual transformation it is
quite obvious that considerable consistency problems arise between the architecture specification, the
software development tools, and the hardware implementation. During the software application design
phase, software designers need a set of production-quality software development tools. Since the software
application designer and the hardware processor designer place different requirements on software
development tools, new tools are required. For example, the processor designer needs a
cycle/phase-accurate simulator for hardware/software partitioning and profiling, which is very accurate
but inevitably slow, whereas the application designer demands more simulation speed than accuracy. At
this point, the complete software development tool-suite is usually re-implemented by hand; consistency
problems are self-evident. In the system integration and verification phase, co-simulation interfaces must
be developed to integrate the software simulator for the chosen architecture into a system simulation
environment. These interfaces vary with the architecture that is currently under test. Again, manual
modification of the interfaces is required with each change of the architecture.
The effort of designing a new architecture can be reduced significantly by using a retargetable approach
based on a machine description. The Language for Instruction Set Architectures (LISA) [2,3] was
developed for the automatic generation of consistent software development tools and synthesizable HDL
code. A LISA processor description covers the instruction-set, the behavioral, and the timing model of
the underlying hardware, thus providing all essential information for the generation of a complete set
of development tools including compiler, assembler, linker, and simulator. Moreover, it contains enough
micro-architectural detail to generate synthesizable HDL code of the modelled architecture. Changes to
the architecture are easily transferred to the LISA model and are applied automatically to the generated
tools and hardware implementation. In addition, the speed and functionality of the generated tools allow
their use even after product development has finished; consequently, there is no need to rewrite the
tools to bring them up to production quality. As an unambiguous abstraction of the real hardware, a LISA
model description bridges the gap between hardware and software design. It provides the software
developer with all required information and enables the hardware designer to synthesize the architecture
from the same specification the software tools are based on.
The chapter is organized as follows: Section 19.2 reviews existing approaches to machine description
languages and discusses their applicability to the design of application-specific instruction-set processors.
Section 19.3 presents an overview of a typical ASIP design flow using LISA: from specification to imple-
mentation. Moreover, different processor models are worked out which contain the required information
the tools need for their retargeting. In addition, sample LISA code segments are presented showing how the
different models are expressed in the LISA language. Section 19.4 introduces the LISA processor design
platform. Following that, the different areas of application are illuminated in more detail. In Section 19.5
the generated software development tools are presented with a focus on the different simulation
techniques that are applicable. Section 19.6 shows the path to implementation and gives results for a case
study that was carried out using the presented methodology. To demonstrate the quality of the generated
software development tools, Section 19.7 presents simulation benchmark results for modelled state-of-the-
art processors. In Section 19.8, requirements and limitations of the presented approach are explained.
Section 19.9 summarizes the chapter and gives an outlook on future research topics.
19.2 Related Work
Hardware description languages (HDLs) like VHDL or Verilog are widely used to model and simulate
processors, but mainly with the goal of developing hardware. Using these models for architecture exploration
and for generating production-quality software development tools has a number of disadvantages, especially
for cycle-based or instruction-level processor simulation. They cover a huge amount of hardware imple-
mentation detail which is not needed for performance evaluation, cycle-based simulation, and software
verification. Moreover, the description of detailed hardware structures has a significant impact on simula-
tion speed [4,5]. Another problem is that the extraction of the instruction set is a highly complex, manual
task, and some instruction-set information, for example, assembly syntax, cannot be obtained from HDL
descriptions at all.
There are many publications on machine description languages providing instruction-set models.
Most approaches using such models address retargetable code generation [6–9]. Other approaches
address retargetable code generation and simulation. The approaches of Maril [10], as part of the Marion
environment, and a system for VLIW compilation [11] both use latency annotation and reservation
tables for code generation. But models based on operation latencies are too coarse for cycle-accurate
simulation or even generation of synthesizable HDL code. The language nML was developed at TU
Berlin [12,13] and adopted in several projects [14–17]. However, the underlying instruction sequencer does
not allow describing the mechanisms of pipelining as required for cycle-based models. Processors with
more complex execution schemes and instruction-level parallelism like the Texas Instruments TMS320C6x
cannot be described, even at the instruction-set level, because of the numerous combinations of instruc-
tions. The same restriction applies to ISDL [18], which is very similar to nML. The language ISDL is an
enhanced version of the nML formalism and allows the generation of a complete tool-suite consisting
of HLL compiler, assembler, linker, and simulator. Even the possibility of generating synthesizable HDL
code is reported, but no results on the efficiency of the generated tools or on the generated HDL code are
given. The EXPRESSION language [19] allows cycle-accurate processor description based on a mixed
behavioral/structural approach. However, no results on simulation speed have been published, nor is it
clear whether it is feasible to generate synthesizable HDL code automatically. The FlexWare2 environment [20]
is capable of generating assembler, linker, simulator, and debugger from the Insulin formalism. A link to
implementation is non-existent, but test vectors can be extracted from the Insulin description to verify the
HDL model. The HLL compiler is derived from a separate description targeting the CoSy [21] framework.
Recently, various ASIP development systems have been introduced [22–24] for systematic co-design
of instruction-set and micro-architecture implementation using a given set of application benchmarks.
The PEAS-III system [25] is an ASIP development environment based on a micro-operation description
of instructions that allows the generation of a complete tool-suite consisting of HLL compiler, assembler,
linker, and simulator, including HDL code. However, no further information is given about the formalism
that parameterizes the tool generators, nor have any results been published on the efficiency of the generated
tools. The MetaCore system [26] is a benchmark-driven ASIP development system based on a formal
representation language. The system accepts a set of benchmark programs and estimates the hardware
cost and performance for the configuration under test. Following that, software development tools and
synthesizable HDL code are generated automatically. As the formal specification of the ISA is similar to
the ISPS formalism [27], complex pipeline operations such as flushes and stalls can hardly be modelled. In
addition, flexibility in designing the instruction-set is limited to a predefined set of instructions. Tensilica
Inc. customizes a RISC processor within the Xtensa system [28]. As the system is based on an architecture
template comprising quite a number of base instructions, it is far too powerful and thus not suitable for
highly application-specific processors, which in many cases employ only very few instructions.
Our interest in a complete retargetable tool-suite for architecture exploration, production-quality
software development, architecture implementation, and system integration for a wide range of embedded
processor architectures motivated the introduction of the LISA language used in our approach. In many
aspects, LISA incorporates ideas similar to nML. However, our experience with different DSP architectures
showed that significant limitations of existing machine description languages must be overcome to allow
the description of modern commercial embedded processors. For this reason, LISA includes improvements
in the following areas:
• Capability to provide cycle-accurate processor models, including constructs to specify pipelines
and their mechanisms such as stalls, flushes, operation injection, etc.
• Extension of the target class of processors to include SIMD, VLIW, and superscalar real-world
processor architectures.
• Explicit language statements addressing compiled simulation techniques.
• Distinction between the detailed bit-true description of operation behavior, including side-effects,
for simulation and implementation on the one hand, and the assignment to arithmetical func-
tions for the instruction-selection task of the compiler on the other hand, which allows the
abstraction level of the behavioral part of the processor model to be chosen freely.
• Strong orientation towards the programming languages C/C++; LISA is a framework which encloses
pure C/C++ behavioral operation descriptions.
• Support for instruction aliasing and complex instruction coding schemes.
19.3 ASIP Design Flow
Powerful application-specific programmable architectures are increasingly required in the DSP, multimedia,
and networking application domains in order to meet demanding cost and performance requirements.
The complexity of algorithms and architectures in these application domains prohibits an ad hoc imple-
mentation and calls for an elaborate design methodology with efficient tool support. In this
section, a seamless ASIP design methodology based on LISA is introduced. Moreover, it is
demonstrated how the outlined concepts are captured by the LISA language elements. The expressiveness
of the LISA formalism, providing high flexibility with respect to abstraction level and architecture category,
is especially valuable for the design of high-performance processors.
19.3.1 Architecture Exploration
The LISA-based methodology sets in after the algorithms intended for execution on the
programmable platform have been selected. The algorithm design is beyond the scope of LISA and is typically
performed in an application-specific system-level design environment, such as, for example, COSSAP
[29] for wireless communications or OPNET [30] for networking. The outcome of the algorithmic
exploration is a pure functional specification, usually represented by means of an executable prototype
written in a high-level language (HLL) like C, together with a requirements document specifying cost and
performance parameters. In the following, the steps of our proposed design flow depicted in Figure 19.1
are described, in which the ASIP designer successively refines the application jointly with the LISA model of
the programmable target architecture.
First the performance-critical algorithmic kernels of the functional specification have to be identified.
This task can easily be performed with a standard profiling tool that instruments the application
[Figure 19.1 sketches the four-phase flow: (1) the algorithmic kernel of the application is translated into an assembly program and profiled ISA-accurately (data) against a data-path model; (2) the full assembly program is profiled ISA-accurately (data + control) against an instruction model; (3) the revised assembly program is profiled cycle-accurately (data + control) against a cycle-true model; (4) an RTL model delivers hardware cost and timing as the exploration result.]
FIGURE 19.1 LISA based ASIP development flow.
code in order to generate HLL execution statistics during the simulation of the functional prototype. Thus
the designer becomes aware of the performance-critical parts of the application and is therefore prepared
to define the data path of the programmable architecture at the assembly instruction level. Starting
from a LISA processor model which implements an arbitrary basic instruction set, the LISA model can
be enhanced with parallel resources, special-purpose instructions, and registers in order to improve the
performance of the considered application. At the same time, the algorithmic kernel of the application
code is translated into assembly, making use of the specified special-purpose instructions. By employing
the assembler, linker, and processor simulator derived from the LISA model (cf. Section 19.5), the designer
can iteratively profile and modify the programmable architecture in cadence with the application until
both fulfill the performance requirements.
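The kernel-identification step described above can be sketched in plain C++. This is an illustrative assumption of how a functional prototype might be instrumented for HLL execution statistics; the PROFILE macro, the counter map, and the toy FIR kernel are inventions for this sketch, not part of any LISA tooling:

```cpp
#include <map>
#include <string>

// Hypothetical instrumentation: every entry into a function bumps a named
// counter, so after a simulation run the hot kernels dominate the statistics.
static std::map<std::string, long> g_exec_counts;
#define PROFILE(name) (++g_exec_counts[name])

// Toy FIR filter standing in for a performance-critical DSP algorithm.
int fir_tap(int x, int c) { PROFILE("fir_tap"); return x * c; }

int fir(const int* x, const int* c, int n) {
    PROFILE("fir");
    int acc = 0;
    for (int i = 0; i < n; ++i) acc += fir_tap(x[i], c[i]);
    return acc;  // g_exec_counts now exposes how often each kernel ran
}
```

Inspecting g_exec_counts after a run on representative input data plays the role of the HLL execution statistics that guide the definition of the data path.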
After the processing-intensive algorithmic kernels have been considered and optimized, the instruction set
needs to be completed. This is accomplished by adding instructions to the LISA model which are dedicated
to the low-speed control and configuration parts of the application. While these parts usually
represent major portions of the application in terms of code size, they have only negligible influence
on the overall performance. Therefore it is very often feasible to employ the HLL C-compiler derived
from the LISA model and accept suboptimal assembly code quality in return for a significant cut in
design time.
So far, the optimization has only been performed with respect to the software-related aspects, while
neglecting the influence of the micro-architecture. For this purpose the LISA language provides capabilities
to model the cycle-accurate behavior of pipelined architectures. The LISA model is supplemented by the
instruction pipeline, and the execution of all instructions is assigned to the respective pipeline stages. If the
architecture does not provide automatic interlocking mechanisms, the application code has to be revised
to take pipeline effects into account. Now the designer is able to verify that the cycle-true processor model
still satisfies the performance requirements.
At the last stage of the design flow, the HDL generator (see Section 19.6) can be employed to generate
synthesizable HDL code for the base structure and the control path of the architecture. After implementing
the dedicated execution units of the data path, reliable numbers on hardware cost and performance
parameters (e.g., design size, power consumption, clock frequency) can be derived by running the HDL
processor model through the standard synthesis flow. At this level of detail the designer can tweak
the computational efficiency of the architecture by applying different implementations of the data path
execution units.
19.3.2 LISA Language
The LISA language [2,3] aims at the formalized description of programmable architectures, their
peripherals, and their interfaces. LISA closes the gap between purely structure-oriented languages (VHDL,
Verilog) and instruction-set languages.
[Figure 19.2 maps the six components of a LISA model (memory model, resource model, behavioral model, instruction set model, timing model, and micro-architecture model) to the requirements of the generated tools: the HLL compiler (instruction selection, instruction scheduling, register and memory allocation), assembler and linker (instruction translation), simulator and debugger (operation simulation, simulation of storage, decoder/disassembler, profiling, display configuration), and HDL generator (basic structure, operation scheduling, write-conflict resolution, instruction decoder).]
FIGURE 19.2 Model requirements for ASIP design.
LISA descriptions are composed of resources and operations. The declared resources represent the storage
objects of the hardware architecture (e.g., registers, memories, pipelines) which capture the state of the
system. Operations are the basic objects in LISA. They represent the designer's view of the behavior,
the structure, and the instruction set of the programmable architecture. A detailed reference of the LISA
language can be found in Reference 31.
The process of generating software development tools and synthesizing the architecture requires inform-
ation on architectural properties and the instruction set definition, as depicted in Figure 19.2. These
requirements can be grouped into different architectural models; the entirety of these models consti-
tutes the abstract model of the target architecture. The LISA machine description provides information
consisting of the following model components:
The memory model. This lists the registers and memories of the system with their respective bit widths,
ranges, and aliasing. The compiler gets information on available registers and memory spaces. The
memory configuration is provided to perform object code linking. During simulation, the entirety of
storage elements represents the state of the processor, which can be displayed in the debugger. The HDL
code generator derives the basic architecture structure.
In LISA, the resource section lists the definitions of all objects which are required to build the memory
model. A sample resource section of the ICORE architecture described in Reference 32 is shown in
Figure 19.3. The resource section begins with the keyword RESOURCE followed by (curly) braces enclosing
all object definitions. The definitions are made in C-style and can be attributed with keywords like, for
example, REGISTER, PROGRAM_COUNTER, etc. These keywords are not mandatory, but they are used
to classify the definitions in order to configure the debugger display. The resource section in Figure 19.3
shows the declaration of the program counter, register file, memories, the four-stage instruction pipeline,
and the pipeline registers.
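For illustration, the state declared in such a resource section could map to a plain C++ structure inside a generated simulator. The sketch below mirrors the ICORE declarations of Figure 19.3; the C++ layout itself is an assumption made for this example, not generated code:

```cpp
#include <array>
#include <cstdint>

// One pipeline register instance sits between each pair of adjacent stages.
struct PipelineRegister {
    uint8_t opcode;    // bit[6] Opcode
    int16_t operandA;  // short operandA
    int16_t operandB;  // short operandB
};

// The entirety of these storage elements is the processor state that the
// debugger can display, as described in the memory model above.
struct ProcessorState {
    int32_t PC = 0;                          // PROGRAM_COUNTER int PC
    std::array<int32_t, 8> R{};              // REGISTER signed int R[0..7]
    std::array<int32_t, 256> RAM{};          // DATA_MEMORY signed int RAM[0..255]
    std::array<uint32_t, 256> ROM{};         // PROGRAM_MEMORY unsigned int ROM[0..255]
    std::array<PipelineRegister, 3> pipe{};  // FI/ID, ID/EX, EX/WB boundaries
};
```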
The resource model. This describes the available hardware resources and the resource requirements of
operations. Resources reflect properties of hardware structures which can be accessed exclusively by one
operation at a time. The instruction scheduling of the compiler depends on this information. The HDL
code generator uses this information for resource conflict resolution.
Besides the definition of all objects, the resource section in a LISA processor description provides
information about the availability of hardware resources. By this, the property of several ports, for
example, to a register bank or a memory is reflected. Moreover, the behavior section within LISA operations
announces the use of processor resources. This takes place in the section header using the keyword USES in
conjunction with the resource name and the information whether the used resource is read, written, or both (IN,
OUT, or INOUT, respectively).
RESOURCE
{
PROGRAM_COUNTER int PC;
REGISTER signed int R[0..7];
DATA_MEMORY signed int RAM[0..255];
PROGRAM_MEMORY unsigned int ROM[0..255];
PIPELINE ppu_pipe = { FI; ID; EX; WB };
PIPELINE_REGISTER IN ppu_pipe
{
bit[6] Opcode;
short operandA;
short operandB;
};
}
FIGURE 19.3 Specification of the memory model.
RESOURCE
{
REGISTER unsigned int R([0..7])6;
DATA_MEMORY signed int RAM([0..15]);
}
OPERATION NEG_RM {
BEHAVIOR
USES (IN R[];
OUT RAM[];)
{
/* C-code */
RAM[address] = (-1) * R[index];
}
}
FIGURE 19.4 Specification of the resource model.
For illustration purposes, a sample LISA code excerpt taken from the ICORE architecture is shown in Figure 19.4.
The definition of the availability of resources is carried out by enclosing the C-style resource definition
with round braces followed by the number of simultaneously allowed accesses. If the number is omitted,
one allowed access is assumed. The figure shows the declaration of a register bank and a memory with six
and one ports, respectively. Furthermore, the behavior section of the operation announces the use of these
hardware resources for read and write.
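The port counts of Figure 19.4 translate into a simple per-cycle booking scheme of the kind a scheduler could apply; the class below and its interface are our own illustration of that idea, not part of the LISA tool flow:

```cpp
#include <map>
#include <string>

// Register bank R allows six simultaneous accesses per cycle, the data
// memory RAM only one (as declared in Figure 19.4). A second RAM access in
// the same cycle is a port conflict and must be moved to the next cycle.
struct ResourceModel {
    std::map<std::string, int> ports{{"R", 6}, {"RAM", 1}};
    std::map<std::string, int> used;  // accesses booked in the current cycle

    bool try_book(const std::string& res) {
        if (used[res] >= ports[res]) return false;  // port conflict
        ++used[res];
        return true;
    }
    void next_cycle() { used.clear(); }  // all ports become free again
};
```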
The instruction set model. This identifies valid combinations of hardware operations and admissible
operands. It is expressed by the assembly syntax, the instruction word coding, and the specification of legal
operands and addressing modes for each instruction. Compilers and assemblers can identify instructions
based on this model. The same information is used in the reverse process of decoding and disassembling.
In LISA, the instruction set model is captured within operations. Operation definitions collect the
description of different properties of the instruction set model, which are defined in several sections:
• The CODING section describes the binary image of the instruction word.
• The SYNTAX section describes the assembly syntax of instructions, operands, and execution
modes.
• The SEMANTICS section specifies the transition function of the instruction.
OPERATION COMPARE_IMM {
DECLARE {
LABEL index;
GROUP src1, dest = { register };
}
CODING { 0b10011 index=0bx[5] src1 dest }
SYNTAX { "CMP" src1 "," index "," dest }
SEMANTICS { CMP (dest,src1,index) }
}
FIGURE 19.5 Specification of the instruction set model.
OPERATION register
{
DECLARE { LABEL index; }
CODING { index=0bx[4] }
EXPRESSION { R[index] }
}
OPERATION ADD {
DECLARE { GROUP src1,src2,dest = { register }; }
CODING { 0b010010 src1 src2 dest }
BEHAVIOR
{
/* C-code */
dest = src1 + src2;
saturate(&dest);
}
}
FIGURE 19.6 Specification of the behavioral model.
Figure 19.5 shows an excerpt of the ICORE LISA model contributing information on the compare-
immediate instruction to the instruction set model. The DECLARE section contains local declarations
of identifiers and admissible operands. Operation register is not shown in the figure but comprises the
definition of the valid coding and syntax for src1 and dest, respectively.
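Conceptually, a decoder derived from the CODING section of Figure 19.5 matches the fixed opcode bits and extracts the operand fields. The following C++ sketch assumes, purely for illustration, an 18-bit word with a 5-bit opcode, a 5-bit index, and 4-bit register fields; the actual ICORE field widths are not stated in the chapter:

```cpp
#include <cstdint>

// Assumed layout of COMPARE_IMM: bits [17:13] = opcode 0b10011,
// [12:8] = index, [7:4] = src1, [3:0] = dest. Illustrative only.
struct DecodedCmpImm {
    unsigned index, src1, dest;
};

// Returns true and fills `out` if `word` matches the COMPARE_IMM coding.
bool decode_compare_imm(uint32_t word, DecodedCmpImm* out) {
    if (((word >> 13) & 0x1F) != 0b10011) return false;  // fixed opcode bits
    out->index = (word >> 8) & 0x1F;  // 5-bit immediate label
    out->src1  = (word >> 4) & 0xF;   // register operand
    out->dest  = word & 0xF;          // register operand
    return true;
}
```

A generated disassembler would use the same field information in reverse, printing the SYNTAX template "CMP src1, index, dest" from the extracted fields.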
The behavioral model. This abstracts the activities of hardware structures to operations changing the
state of the processor for simulation purposes. The abstraction level of this model can range widely
between the hardware implementation level and the level of HLL statements.
The BEHAVIOR and EXPRESSION sections within LISA operations describe components of the beha-
vioral model. Here, the behavior section contains pure C-code that is executed during simulation, whereas
the expression section defines the operands and execution modes used in the context of operations. An
excerpt of the ICORE LISA model is shown in Figure 19.6. Depending on the coding of the src1, src2, and
dest fields, the behavior code of operation ADD works with the respective registers of register bank R. As
arbitrary C-code is allowed, function calls can be made to libraries which are later linked to the executable
software simulator.
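In plain C++, the BEHAVIOR section of the ADD operation in Figure 19.6 amounts to an add followed by saturation. The chapter does not define the saturate() helper, so the version below, which clamps a widened intermediate to the 32-bit range, is an assumption made for this sketch:

```cpp
#include <cstdint>
#include <limits>

// Assumed saturation helper: clamp a 64-bit intermediate to int32_t range.
// The real ICORE helper is not shown in the chapter.
void saturate(int64_t* v) {
    if (*v > std::numeric_limits<int32_t>::max()) *v = std::numeric_limits<int32_t>::max();
    if (*v < std::numeric_limits<int32_t>::min()) *v = std::numeric_limits<int32_t>::min();
}

// Mirrors "dest = src1 + src2; saturate(&dest);" from the BEHAVIOR section.
int32_t add_saturating(int32_t src1, int32_t src2) {
    int64_t dest = static_cast<int64_t>(src1) + src2;  // widen to catch overflow
    saturate(&dest);
    return static_cast<int32_t>(dest);
}
```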
The timing model. This specifies the activation sequence of hardware operations and units. The instruc-
tion latency information lets the compiler find an appropriate schedule and provides timing relations
between operations for simulation and implementation.
Several parts within a LISA model contribute to the timing model. First, there is the declaration of pipelines in
the resource section. The declaration starts with the keyword PIPELINE, followed by an identifying name
and the list of stages. Second, operations are assigned to pipeline stages by using the keyword IN and
providing the name of the pipeline and the identifier of the respective stage, such as:
OPERATION name_of_operation IN ppu_pipe.EX (19.1)
RESOURCE
{
PIPELINE ppu_pipe = { FI; ID; EX; WB };
}
OPERATION CORDIC IN ppu_pipe.EX
{
ACTIVATION { WriteBack }
BEHAVIOR {
PIPELINE_REGISTER(ppu_pipe, EX/WB).ResultE = cordic();
}
}
OPERATION WriteBack IN ppu_pipe.WB {
BEHAVIOR {
R[value] = PIPELINE_REGISTER(ppu_pipe, EX/WB).ResultE;
}
}
FIGURE 19.7 Specification of the timing model.
Third, the ACTIVATION section in the operation description is used to activate other operations in the
context of the current instruction. The activated operations are launched as soon as the instruction enters
the pipeline stage the activated operation is assigned to. Non-assigned operations are launched in the
pipeline stage of their activation.
To exemplify this, Figure 19.7 shows sample LISA code taken from the ICORE architecture. Oper-
ations CORDIC and WriteBack are assigned to stages EX and WB of pipeline ppu_pipe, respectively.
Here, operation CORDIC activates operation WriteBack, which will be launched in the following cycle
(in correspondence with the spatial ordering of the pipeline stages) in case of an undisturbed flow of the pipeline.
Moreover, in the ACTIVATION section, pipelines are controlled by means of the predefined functions stall,
shift, flush, insert, and execute, which are automatically provided by the LISA environment for each pipeline
declared in the resource section. All these pipeline control functions can be applied to single stages as well
as to whole pipelines, for example:
PIPELINE(ppu_pipe,EX/WB).stall(); (19.2)
Using this very flexible mechanism, arbitrary pipelines, hazards, and mechanisms like forwarding can be
modelled in LISA.
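The activation timing of Figure 19.7 can be sketched as a minimal two-stage simulation: an operation in EX deposits its result in the EX/WB pipeline register, and the activated WriteBack operation consumes it one cycle later. The cordic() stand-in, the valid flag, and the choice of R[0] as destination are assumptions made for this sketch:

```cpp
#include <array>

struct ExWbReg {            // models PIPELINE_REGISTER(ppu_pipe, EX/WB)
    int resultE = 0;
    bool valid = false;     // set when EX has produced a result this cycle
};

int cordic() { return 42; }  // placeholder for the real CORDIC datapath

// One simulated cycle: WB runs first on last cycle's pipeline register
// contents, then EX (if active) refills the register and "activates" WB.
void cycle(ExWbReg& ex_wb, std::array<int, 8>& R, bool run_ex) {
    if (ex_wb.valid) {            // WriteBack in WB stage
        R[0] = ex_wb.resultE;
        ex_wb.valid = false;
    }
    if (run_ex) {                 // CORDIC in EX stage
        ex_wb.resultE = cordic();
        ex_wb.valid = true;       // WriteBack launches next cycle
    }
}
```

Running EX in one cycle and observing the register file in the next reproduces the one-cycle offset between activation and launch described above.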
The micro-architecture model. This allows grouping of hardware operations into functional units and
contains the exact micro-architectural implementation of structural components such as adders, multi-
pliers, etc. This enables the HDL generator to generate the appropriate HDL code from a more abstract
specification.
In analogy to the syntax of the VHDL language, the grouping of operations into functional units is formalized
using the keyword ENTITY in the resource section of the LISA model, for example:
ENTITY Alu
{
Add, Sub
}
(19.3)
Here, LISA operations Add and Sub are assigned to the functional unit Alu. Information on the exact
micro-architectural implementation of structural components can be included in the LISA model,
for example, by calling DesignWare components [33] from within the behavior section or by inlining
HDL code.
19.4 LISA Processor Design Platform
The LISA processor design platform ( LPDP) is an environment that allows the automatic generation of
software development tools for architecture exploration, hardware implementation, software development
tools for application design, and hardwaresoftware co-simulation interfaces from one sole specication of
the target architecture in the LISA language. Figure 19.8 shows the components of the LPDP environment.
19.4.1 Hardware Designer Platform For Exploration and Processor
Generation
As indicated in Section 19.3, architecture design requires the designer to work in two fields (see Figure 19.9):
on the one hand, the development of the software part including compiler, assembler, linker, and simulator,
and on the other hand, the development of the target architecture itself.
The software simulator produces profiling data and thus may answer questions concerning the instruc-
tion set, the performance of an algorithm, and the required size of memory and registers. The required
silicon area or power consumption can only be determined in conjunction with a synthesizable HDL
model. To accommodate these requirements, the LISA hardware designer platform can generate the
following tools:
LISA language debugger for debugging the instruction set with a graphical debugger frontend.
Exploration C-compiler for the non-critical parts of the application.
Exploration assembler, which translates text-based instructions into object code for the respective
programmable architecture.
Exploration linker, which is controlled by a dedicated linker command file.
Instruction-set architecture (ISA) simulator providing extensive profiling capabilities, such as
instruction execution statistics and resource utilization.
Besides the ability to generate a set of software development tools, synthesizable HDL code (both VHDL
and Verilog) for the processor's control path and instruction decoder can be generated automatically from
the LISA processor description. This also comprises the pipeline and the pipeline controller, including complex
[Figure: the hardware designer (architecture exploration, architecture implementation), software designer (software application design), and system integrator (integration and verification) all work from one central LISA architecture specification, from which the C-compiler, assembler/linker, and simulator/debugger are generated for the application and the system on chip.]
FIGURE 19.8 LISA processor design environment.
A Novel Methodology for the Design of ASIPs 19-11
[Figure: exploration loop, in which the LISA description of the target architecture is processed by the language compiler into the LISA C-compiler, assembler, linker, and simulator, yielding evaluation results such as profiling data and execution speed; implementation loop, in which the generated HDL description is run through synthesis tools to a gate-level model, yielding evaluation results such as chip size, clock speed, and power consumption.]
FIGURE 19.9 Exploration and implementation.
interlocking mechanisms, forwarding, etc. For the data path, hand-optimized HDL code has to be inserted
manually into the generated model. This approach has been chosen because the data path typically represents
the critical part of the architecture in terms of power consumption and speed (critical path).
It is obvious that deriving both the software tools and the hardware implementation model from one single
specification of the architecture in the LISA language has significant advantages: only one model needs
to be maintained, changes to the architecture are applied automatically to the software tools and the
implementation model, and the consistency problem among the software tools and between software tools
and implementation model is reduced significantly.
19.4.2 Software Designer Platform For Software Application Design
To cope with the requirements of functionality and speed in the software design phase, the tools generated
for this purpose are an enhanced version of the tools generated during the architecture exploration phase. The
generated simulation tools are enhanced in speed by applying the compiled simulation principle [34]
where applicable and are faster by one to two orders of magnitude than the tools currently provided by
architecture vendors. The compiled simulation principle requires that the content of the program memory
not be changed during the simulation run, which holds true for most DSPs. However, for architectures
running the program from external memory or working with operating systems that load/unload
applications to/from internal program memory, this simulation technique is not suitable. For this purpose,
an interpretive simulator is also provided.
19.4.3 System Integrator Platform For System Integration and
Verification
Once the processor software simulator is available, it must be integrated and verified in the context of the
whole system (SOC), which can include a mixture of different processors, memories, and interconnect
components. In order to support system integration and verification, the LPDP system integrator
platform provides a well-defined application programmer interface (API) to interconnect the instruction-set
simulator generated from the LISA specification with other simulators. The API allows the
simulator to be controlled by stepping, running, and setting breakpoints in the application code, and it
provides access to the processor resources.
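Such a control API can be pictured roughly as follows. All names and the toy step semantics here are hypothetical, chosen only to illustrate stepping, running to a breakpoint, and resource access; they are not the actual LISA API:

```c
#include <stdbool.h>

#define MAX_BREAKPOINTS 16
#define NUM_REGS 8

typedef struct {
    unsigned pc;
    int      regs[NUM_REGS];
    unsigned breakpoints[MAX_BREAKPOINTS];
    int      num_breakpoints;
} Simulator;

static void sim_set_breakpoint(Simulator *s, unsigned addr) {
    if (s->num_breakpoints < MAX_BREAKPOINTS)
        s->breakpoints[s->num_breakpoints++] = addr;
}

static bool sim_hit_breakpoint(const Simulator *s) {
    for (int i = 0; i < s->num_breakpoints; i++)
        if (s->breakpoints[i] == s->pc) return true;
    return false;
}

/* Execute a single instruction (here just advancing the pc). */
static void sim_step(Simulator *s) { s->pc++; }

/* Run until a breakpoint is reached (or a safety limit expires). */
static unsigned sim_run(Simulator *s) {
    for (int guard = 0; guard < 100000; guard++) {
        sim_step(s);
        if (sim_hit_breakpoint(s)) break;
    }
    return s->pc;
}

/* Resource access, e.g., for a system-level debugger. */
static int sim_read_reg(const Simulator *s, int r) { return s->regs[r]; }
```

A system simulator would call such functions cycle by cycle or instruction by instruction, interleaving the processor model with the other simulators in the SOC.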
The following sections will present the different areas addressed by the LISA processor design platform
in more detail: software development tools and HDL code generation. Additionally, Section 19.7 will
demonstrate the high quality of the generated software development tools by comparing them with those
shipped by the processor vendors.
19.5 SW Development Tools
The ability to automatically generate HLL C-compilers, assemblers, linkers, and ISA simulators from
LISA processor models enables the designer to explore the design space rapidly. In this section, the specifics
and requirements of these tools are discussed, with particular focus on the different simulation techniques.
19.5.1 Assembler and Linker
The LISA assembler processes textual assembly source code and transforms it into linkable object code for
the target architecture. The transformation is characterized by the instruction-set information defined
in a LISA processor description. Besides the processor-specific instruction set, the generated assembler
provides a set of pseudo-instructions (directives) to control the assembling process and initialize data.
Section directives enable the grouping of assembled code into sections, which can be positioned separately
in memory by the linker. Symbolic identifiers for numeric values and addresses are standard assembler
features and are supported as well. Moreover, besides mnemonic-based instruction formats, C-like
algebraic assembly syntax can be processed by the LISA assembler.
The linking process is controlled by a linker command file, which keeps a detailed model of the target
memory environment and an assignment table of the module sections to their respective target memories.
Moreover, it is possible to provide the linker with an additional memory model which is separated from
the memory configuration in the LISA description and which allows linking code into external memories
that are outside the architecture model.
19.5.2 Simulator
Due to the large variety of architectures and the ability to develop models at different levels of abstraction
in the domains of time and architecture (see Section 19.3), the LISA software simulator incorporates several
simulation techniques, ranging from the most flexible interpretive simulation to more application- and
architecture-specific compiled simulation techniques.
Compiled simulators offer a significant increase in instruction (cycle) throughput; however, the
compiled simulation technique is not applicable in every case. To cope with this problem, the most appropriate
simulation technique for the desired purpose (debugging, profiling, verification), architecture
(instruction-accurate, cycle-accurate), and application (DSP kernel, operating system) can be chosen
before the simulation is run. An overview of the simulation techniques available in the generated LISA
simulator is given in the following:
The interpretive simulation technique is employed in most commercially available instruction set
simulators. In general, interpretive simulators run significantly slower than compiled simulators;
however, unlike compiled simulation, this simulation technique can be applied to any LISA model
and application.
Dynamically scheduled, compiled simulation reduces simulation time by performing the steps of instruction
decoding and operation sequencing prior to simulation. This technique cannot be applied to
models using external memories or applications consisting of self-modifying program code.
Besides the compilation steps performed in dynamic scheduling, static scheduling and code translation
additionally implement operation instantiation. While the latter technique is used for instruction-accurate
models, the former is suitable for cycle-accurate models including instruction pipelines.
Beyond that, the same restrictions apply as for dynamically scheduled simulation.
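For contrast, a minimal interpretive loop might look as follows in C. The instruction encoding and mnemonics are invented for illustration; the point is that decoding happens again on every execution of an instruction word, even inside loops:

```c
enum { OP_ADD = 0, OP_SUB = 1, OP_HALT = 2 };

typedef struct { int acc; unsigned pc; } Cpu;

/* Each 16-bit word: opcode in the high byte, immediate in the low byte. */
static int interp_run(Cpu *c, const unsigned short *prog) {
    for (;;) {
        unsigned short word = prog[c->pc++];
        int opcode = word >> 8;   /* decoded anew on every visit */
        int imm    = word & 0xff;
        switch (opcode) {
        case OP_ADD:  c->acc += imm; break;
        case OP_SUB:  c->acc -= imm; break;
        case OP_HALT: return c->acc;
        }
    }
}
```

The compiled techniques described below remove exactly this per-visit decode work from the simulation loop.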
A detailed discussion of the different compiled simulation techniques is given in the following sections,
while performance results are given in Section 19.7. The interpretive simulator is not discussed.
19.5.2.1 Compiled Simulation
The objective of compiled simulation is to reduce the simulation time. Considering instruction set simulation,
efficient run-time reduction can be achieved by performing repeatedly executed operations only
once, before the actual simulation is run, thus inserting an additional translation step between application
load and simulation. The preprocessing of the application code can be split into three major steps [35]:
1. Within the step of instruction decoding, instructions, operands, and modes are determined for
each instruction word found in the executable object file. In compiled simulation, the instruction
decoding is only performed once for each instruction, whereas interpretive simulators decode the
same instruction multiple times, for example, if it is part of a loop. This way, the instruction
decoding is completely omitted at run-time, thus reducing simulation time significantly.
2. Operation sequencing is the process of determining all operations to be executed for the accomplishment
of each instruction found in the application program. During this step, the program
is translated into a table-like structure indexed by the instruction addresses. The table lines contain
pointers to functions representing the behavioral code of the respective LISA operations.
Although all involved operations are identified during this step, their temporal execution order is
still unknown.
3. The determination of the operation timing (scheduling) is performed within the step of operation
instantiation and simulation loop unfolding. Here, the behavior code of the operations is instantiated
by generating the respective function calls for each instruction in the application program, thus
unfolding the simulation loop that drives the simulation into the next state.
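The first two steps can be sketched in C as follows. The encoding and helper names are invented, but the structure — a one-time decode pass filling a table of function pointers, followed by a decode-free run loop — mirrors the description above:

```c
typedef struct { int acc; } Cpu;
typedef void (*Behavior)(Cpu *c, int imm);

/* Behavioral code of the (toy) operations. */
static void op_add(Cpu *c, int imm) { c->acc += imm; }
static void op_sub(Cpu *c, int imm) { c->acc -= imm; }

typedef struct { Behavior fn; int imm; } DecodedInsn;

/* Steps 1 and 2: instruction decoding and operation sequencing,
 * performed exactly once per instruction before the simulation run. */
static void precompile(const unsigned short *prog, int n, DecodedInsn *out) {
    for (int i = 0; i < n; i++) {
        out[i].fn  = (prog[i] >> 8) ? op_sub : op_add;
        out[i].imm = prog[i] & 0xff;
    }
}

/* Simulation run: no decoding happens here any more. */
static int run(Cpu *c, const DecodedInsn *tab, int n) {
    for (int i = 0; i < n; i++) tab[i].fn(c, tab[i].imm);
    return c->acc;
}
```

Step 3, operation instantiation, would go one step further and replace the table walk by directly generated function calls, as discussed below.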
Besides fully compiled simulation, which incorporates all of the above steps, partial implementations
of the compiled principle are possible by performing only some of these steps. Each additional step
gives a further run-time reduction, but also requires a non-negligible amount of
compilation time. The trade-off between compilation time and simulation time is (qualitatively) shown
in Figure 19.10.
Two levels of compiled simulation are of particular interest: dynamic scheduling, and static
scheduling or code translation. In the case of dynamic scheduling, the task of selecting operations
from overlapping instructions in the pipeline is performed at run-time of the simulation. Static
scheduling already schedules the operations at compile-time.
[Figure: qualitative trade-off between compilation time and simulation time, from fully interpretive through compile-time decoding, dynamic scheduling (adding operation sequencing), and static scheduling/code translation (adding operation instantiation) to fully compiled.]
FIGURE 19.10 Levels of compiled simulation.
19.5.2.2 Dynamic Scheduling
As shown in Figure 19.10, dynamic scheduling performs instruction decoding and operation sequencing
at compile-time. However, the temporal execution order of LISA operations is determined at simulator
run-time. While the operation scheduling is rather simple for instruction-accurate models, it becomes a
complex task for models with instruction pipelines.
In order to reflect the instruction timing exactly and to consider all possibly occurring pipeline effects
like flushes and stalls, a generic pipeline model is employed, simulating the instruction pipeline at run-time.
The pipeline model is parameterized by the LISA model description and can be controlled via predefined
LISA operations. These operations include:
Insertion of operations into the pipeline (stages)
Execution of all operations residing in the pipeline
Pipeline shift
Removal of operations (flush)
Halt of the entire pipeline or particular stages (stall)
Unlike in statically scheduled simulation, operations are inserted into and removed from the pipeline
dynamically, that is, each operation injects further operations upon its execution. The information
about operation timing is provided in the LISA description, that is, by the activation section as well as the
assignment of operations to pipeline stages (see Section 19.3.2, timing model).
It is obvious that the maintenance of the pipeline model at simulation time is expensive. Execution
profiling on the generated simulators for the Texas Instruments TMS320C62xx [36] and TMS320C54x [37]
revealed that more than fifty percent of the simulator's run-time is consumed by the simulation of the
pipeline.
The situation could be improved by implementing the step of operation instantiation, consequently
superseding the need for pipeline simulation. This, in turn, implies static scheduling, in other words,
the determination of the operation timing due to overlapping instructions in the pipeline taking place at
compile-time.
Although there is no pipeline model in instruction-accurate processor models, it will be shown that
operation instantiation also gives a significant performance increase for these models. Beyond that, operation
instantiation is relatively easy to implement for instruction-accurate models (in contrast to pipelined
models).
19.5.2.3 Static Scheduling
Generally, operation instantiation can be described as the generation of an individual piece of (behavioral)
simulator code for each instruction found in the application program. While this is straightforward
for instruction-accurate processor models, cycle-true, pipelined models require a more sophisticated
approach.
Considering instruction-accurate models, the shortest temporal unit that can be executed is an instruction.
That means the actions to be performed for the execution of an individual instruction are determined
by the instruction alone. In the simulation of pipelined models, the granularity is defined by cycles. However,
since several instructions might be active at the same time due to overlapping execution, the actions
performed during a single cycle are determined by the respective state of the instruction pipeline. As a
consequence, instead of instantiating operations for each single instruction of the application program,
behavioral code for each occurring pipeline state has to be generated. Several such pipeline states might
exist for each instruction, depending on the execution context of the instruction, that is, the instructions
executed in the preceding and following cycles.
As pointed out previously, the principle of compiled simulation relies on an additional translation step
taking place before the simulation is run. This step is performed by a so-called simulation compiler, which
implements the three steps presented in Section 19.5.2.1. Obviously, the simulation compiler is a highly
architecture-specific tool, which is therefore retargeted from the LISA model description.
19.5.2.3.1 Operation Instantiation
The objective of static scheduling is the determination of all possible pipeline states according to the
instructions found in the application program. For purely sequential pipeline flow, that is, in the case that
no control hazards occur, the determination of the pipeline states can be achieved simply by overlapping
consecutive instructions subject to the structure of the pipeline. In order to store the generated pipeline
states, pipeline state tables are used, providing an intuitive representation of the instruction flow in the
pipeline. Inserting instructions into pipeline state tables is referred to as scheduling in the following.
A pipeline state table is a two-dimensional array storing pointers to LISA operations. One dimension
represents the location within the application, the other the location within the pipeline, that is, the stage in
which the operation is executed. When a new instruction has to be inserted into the state table, both intra-instruction
and inter-instruction precedence must be considered to determine the table elements in which
the corresponding operations will be entered. Consequently, the actual time at which an operation is executed
depends on the scheduling of the preceding instruction as well as the scheduling of the operation(s)
assigned to the preceding pipeline stage within the current instruction. Furthermore, control hazards
causing pipeline stalls and/or flushes influence the scheduling of the instructions following the occurrence
of the hazard.
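A simplified sketch of such a pipeline state table and of scheduling an instruction into it might look as follows in C. Stage names, table sizes, and the scheduling helper are illustrative only; stall and flush handling are omitted:

```c
#include <stddef.h>

#define STAGES 4 /* FE, DC, EX, WB */
#define ROWS 16

typedef void (*Op)(void);

static void nop(void) {} /* stand-in for a LISA operation's behavior */

/* Rows are cycles, columns are pipeline stages. */
static Op table[ROWS][STAGES];
static int next_row[STAGES]; /* first free row per stage */

/* Schedule one instruction given one operation per stage (NULL = none).
 * Each operation lands one row below its predecessor stage's operation
 * (intra-instruction precedence) and below the previous instruction's
 * entry in the same stage (inter-instruction precedence). Bounds
 * checking against ROWS is omitted for brevity. */
static void schedule(const Op ops[STAGES]) {
    int row = next_row[0];
    for (int s = 0; s < STAGES; s++) {
        if (!ops[s]) continue;
        if (row < next_row[s]) row = next_row[s]; /* inter-instruction */
        table[row][s] = ops[s];
        next_row[s] = row + 1;
        row++; /* intra-instruction: next stage runs one cycle later */
    }
}
```

Scheduling two consecutive instructions this way produces the familiar diagonal overlap: the second instruction's fetch sits one row below the first's, its decode one row below that, and so on.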
A simplified illustration of the scheduling process is given in Figure 19.11. Figure 19.11(a) shows the
pipeline state table after a branch instruction has been inserted, composed of the operations fetch, decode,
branch, and update_pc as well as a stall operation. The table columns represent the pipeline stages,
the rows represent consecutive cycles (with earlier cycles in upper rows). The arrows indicate activation
chains.
The scheduling of a new instruction always follows the intra-instruction precedence, that is, fetch
is scheduled before decode, decode before branch, and so on. The appropriate array element for fetch is
determined by its assigned pipeline stage (FE) and according to inter-instruction precedences. Since the
branch instruction follows the add instruction (which has already been scheduled), the fetch operation is
inserted below the first operation of add (not shown in Figure 19.11[a]). The other operations are inserted
according to their precedences.
The stall of pipeline stage FE, which is issued from the decode operation of branch, is processed by
tagging the respective table element as stalled. When the next instruction is scheduled, the stall is accounted
for by moving the decode operation to the next table row, that is, the next cycle (see Figure 19.11[b]). Pipeline
flushes are handled in a similar manner: if a selected table element is marked as flushed, the scheduling of
the current instruction is abandoned.
Assuming purely sequential instruction flow, the task of establishing a pipeline state table for the entire
application program is very straightforward. However, every (sensible) application contains a certain
amount of control flow (e.g., loops) interrupting this sequential execution. The occurrence of such control
flow instructions makes the scheduling process extremely difficult or, in a few cases, even impossible.
[Figure: two pipeline state tables over the stages FE, DC, EX, WB, showing the operations fetch, decode, branch, upd_pc, add, sub, incr, and write_r; panel (a) after the branch instruction has been inserted, with the FE entry tagged as stalled, and panel (b) after the stall has moved the following instruction's decode operation to the next cycle.]
FIGURE 19.11 Inserting instructions into pipeline state table.
[Figure: cycle-by-cycle pipeline states (stages PF, FE, DC, AC, RD, EX) for a conditional branch "BC addr" as it moves through the pipeline together with the instructions i1, i4..i10 of the fall-through path and k1..k5 of the taken path; both prescheduled paths are shown up to the cycle in which the condition is evaluated, with the address/instruction mapping a1: i1, a2-a3: BC addr, a4: i4, a5: i5, ..., b1: k1.]
FIGURE 19.12 Pipeline behavior for a conditional branch.
Generally, all instructions modifying the program counter cause interruptions in the control flow. Furthermore,
only instructions providing an immediate target address, that is, branches and calls whose target
address is known at compile-time, can be scheduled statically. If indirect branches or calls occur, it is
inevitable to switch back to dynamic scheduling at run-time.
Fortunately, most control flow instructions can be scheduled statically. Figure 19.12 shows, as an example,
the pipeline states for a conditional branch instruction as found in the TMS320C54x's instruction set.
Since the respective condition cannot be evaluated until the instruction is executed, scheduling has to
be performed for both eventualities (condition true or false, respectively), splitting the program into alternative
execution paths. The selection of the appropriate block of prescheduled pipeline states is performed by
switching among different state tables at simulator run-time. In order to prevent doubling the entire
pipeline state table each time a conditional branch occurs, alternative execution paths are left as soon as
an already generated state has been reached. Unless several conditional instructions reside in the pipeline
at the same time, these paths usually have a length of only a few rows.
19.5.2.3.2 Simulator Instantiation
After all instructions of the application program have been processed, and thus the entire operation
schedule has been established, the simulator code can be instantiated. The simulation compiler backend
thereby generates either C code or an operation table with the respective function pointers, both describing
alternative representations of the application program. Figure 19.13 shows a simplified excerpt of
the generated C code for a branch instruction. Cases represent instructions, while a new line starts a new
cycle.
switch (pc) {
case 0x1584: fetch(); decode(); sub(); write_registers();
case 0x1585: fetch(); decode(); test_condition(); add();
case 0x1586: branch(); write_registers();
fetch(); update_pc();
fetch(); decode();
fetch(); decode(); load(); goto_0x1400_;
}
FIGURE 19.13 Generated simulator code.
19.5.2.4 Instruction-Based Code Translation
The need for a scheduling mechanism arises from the presence of an instruction pipeline in the LISA
model. However, even instruction-accurate processor models without a pipeline benefit from the step of
operation instantiation. The technique applied here is called instruction-based code translation. Due to
the absence of instruction overlap, simulator code can be instantiated for each instruction independently,
thus simplifying simulator generation to the concatenation of the respective behavioral code specified
in the LISA description.
In contrast to direct binary-to-binary translation techniques [38], the translation of target-specific into
host-specific machine code uses C source code as an intermediate format. This keeps the simulator portable,
and thus independent of the simulation host.
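A toy version of this translation step might look as follows. The instruction encoding and the emitted behavioral code are invented, but the principle — emitting a piece of C source text per target instruction and concatenating the results — is the one described above:

```c
#include <stdio.h>
#include <string.h>

/* Translate one invented instruction word into a line of C source.
 * High byte: opcode (0 = add, otherwise sub); low byte: immediate. */
static void translate_insn(unsigned short word, char *out, size_t n) {
    int opcode = word >> 8, imm = word & 0xff;
    if (opcode == 0) snprintf(out, n, "acc += %d;", imm);
    else             snprintf(out, n, "acc -= %d;", imm);
}

/* Concatenate the behavioral code of a whole (tiny) program, yielding
 * the C intermediate that would be compiled for the simulation host. */
static void translate_prog(const unsigned short *prog, int cnt,
                           char *out, size_t n) {
    out[0] = '\0';
    for (int i = 0; i < cnt; i++) {
        char line[64];
        translate_insn(prog[i], line, sizeof line);
        strncat(out, line, n - strlen(out) - 1);
        strncat(out, "\n", n - strlen(out) - 1);
    }
}
```

The generated C text is then compiled by the host compiler, which is precisely what keeps the approach portable across simulation hosts.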
Since instruction-based code translation generates program code that increases linearly in size with
the number of instructions in the application, the use of this simulation technique is restricted to small
and medium-sized applications (less than 10k instructions, depending on model complexity). For large
applications, the resulting worse cache utilization on the simulation host reduces the performance of the
simulator significantly.
19.6 Architecture Implementation
As we are targeting the development of application-specific instruction set processors (ASIPs), which are
highly optimized for one specific application domain, the HDL code generated from a LISA processor
description has to fulfill tight constraints to be an acceptable replacement for HDL code handwritten by
experienced designers. In particular, power consumption, chip area, and execution speed are critical points
for this class of architectures. For this reason, the LPDP platform does not claim to be able to efficiently
synthesize the complete HDL code of the target architecture. The data path of an architecture in particular is
highly critical and must in most cases be optimized manually. Frequently, full-custom design techniques
must be used to meet power consumption and clock speed constraints. For this reason, the generated
HDL code is limited to the following parts of the architecture:
Coarse processor structure, such as the register set, pipeline, pipeline registers, and test interface.
Instruction decoder, setting the data and control signals which are carried through the pipeline and
activate the respective functional units executed in the context of the decoded instruction.
Pipeline controller, handling the different pipeline interlocks and pipeline register flushes and supporting
mechanisms such as data forwarding.
Additionally, hardware operations as described in the LISA model can be grouped into functional
units (see Section 19.3.2, micro-architecture model). These functional units are generated as wrappers,
that is, the ports of the functional units as well as the interconnects to the pipeline registers and other
functional units are generated automatically, while the content needs to be filled in manually with code.
Driver conflicts emerging in the context of the interconnects are resolved automatically by the insertion of
multiplexers.
The disadvantage of writing the data path in the HDL description by hand is that the behavior of
hardware operations within those functional units has to be described and maintained twice: on the one
hand in the LISA model and on the other hand in the HDL model of the target architecture. Consequently,
verification is a problem here, which will be addressed in future research.
19.6.1 LISA Language Elements for HDL Synthesis
The following sections will show in detail how the different parts of the LISA model contribute to the generated
HDL model of the target architecture.
19.6.1.1 The Resource Section
The resource section provides general information about the structure of the architecture (e.g., registers,
memories, and pipelines; see Section 19.3.2, resource/memory model). Based on this information, the
coarse structure of the architecture can be generated automatically. Figure 19.14 shows an excerpt of the resource
declaration of the LISA model of the ICORE architecture [32], which was used in our case study.
The ICORE architecture has two different register sets: one for general purpose use, named R,
consisting of eight separate 32-bit registers, and one for the address registers, named AR,
consisting of four elements of eleven bits each. The round brackets indicate the maximum number of
simultaneous accesses allowed for the respective register bank: six for the general purpose register set R
and one for the address register set. From that, the respective number of access ports to the register banks
can be generated automatically. With this information (bit-true widths, ranges, and access ports), the
register banks can easily be synthesized. Moreover, a data and a program memory resource are declared,
both 32 bits wide and with just one allowed access per cycle. Since the various memory types are
generally very technology dependent, but cannot be further specified in the LISA model, wrappers
are generated with the appropriate number of access ports. Before synthesis, the wrappers need to be filled
manually with code for the respective technology. The resources labelled as PORT are accessible from
outside the model and can be attached to a testbench; in the ICORE these are the RESET pin and the STATE_BUS.
Besides the processor resources such as memories, ports, and registers, pipelines and pipeline
registers are also declared. The ICORE architecture contains a four-stage instruction pipeline consisting of the
stages FI (instruction fetch), ID (instruction decode), EX (instruction execution), and WB (write-back
to registers). Between those pipeline stages, pipeline registers are located which forward information
about the instruction, such as the instruction opcode, operand registers, etc. The declared pipeline registers
are instantiated multiple times between the stages and are completely generated from the LISA model. For the
pipeline and the stages, entities are created which, in a subsequent phase of the HDL generator run, are
filled with code for the functional units, instruction decoder, pipeline controller, etc.
RESOURCE
{
REGISTER S32 R([0..7])6; /* GP Registers */
REGISTER bit[11] AR([0..3]); /* Address Registers */
DATA_MEMORY S32 RAM([0..255]); /* Memory Space */
PROGRAM_MEMORY U32 ROM([0..255]);/* Instruction ROM */
PORT bit[1] RESET; /* Reset pin */
PORT bit[32] STATE_BUS; /* Processor state bus */
PIPELINE ppu_pipe = { FI; ID; EX; WB };
PIPELINE_REGISTER IN ppu_pipe {
bit[6] Opcode;
...
};
}
FIGURE 19.14 Resource declaration in the LISA model of the ICORE architecture.
[Figure: entity hierarchy in three layers — the base structure, with the architecture entity containing register, memory, and pipeline entities; the pipeline structure, with stage entities (e.g., FE, DC, EX) separated by pipeline registers (FE/DC, DC/EX); and the stage structure, with LISA entities such as Branch, ALU, and Shifter residing in their assigned stages.]
FIGURE 19.15 Entity hierarchy in generated HDL model.
19.6.1.2 Grouping Operations to Functional Units
The LISA language describes the target architecture's behavior and timing at the granularity of hardware
operations, whereas synthesis requires the grouping of hardware operations into functional units that
can then be filled with hand-optimized HDL code for the data path. A well-known construct from the
VHDL language was adopted for this purpose: the ENTITY (see Section 19.3.2, micro-architecture
model). Using the ENTITY to group hardware operations into a functional unit provides essential
information not only for the HDL code generator but also for retargeting the HLL C-compiler, which requires
information about the availability of hardware resources to schedule instructions.
As indicated in Section 19.6.1.1, the HDL code derived from the LISA resource section already comprises
a pipeline entity including further entities for each pipeline stage and the respective pipeline registers. The
entities defined in the LISA model now become part of the respective pipeline stages, as shown in Figure 19.15.
Here, a Branch entity is placed into the entity of the decode stage. Moreover, the EX stage contains an
ALU and a Shifter entity. As it is possible in LISA to assign hardware operations to pipeline stages, this
information is sufficient to locate the functional units within the pipeline stages they are assigned to.
As already pointed out, the entities of the functional units are wrappers which need to be filled with
HDL code by hand. Nevertheless, Section 19.6.2.1 will show that by far the largest part of the target
architecture can be generated automatically from a LISA model.
19.6.1.3 Generation of the Instruction Decoder
The generated HDL decoder is derived from information in the LISA model on the coding of instructions
(see Section 19.3.2, instruction-set model). Depending on the structuring of the LISA architecture
description, decoder processes are generated in several pipeline stages. The specified signal paths within
the target architecture can be divided into data signals and control signals. The control signals are a
straightforward derivation of the operation activation tree, which is part of the LISA timing model (see
Section 19.3.2, timing model). The data signals are modelled explicitly by the designer, by writing values
into pipeline registers, and fixed implicitly by the declaration of the used resources in the behavior sections of
LISA operations.
19.6.2 Implementation Results
The ICORE, which was used in our case study, is a low-power application-specific instruction set processor
(ASIP) for DVB-T acquisition and tracking algorithms. It has been developed in cooperation with Infineon
Technologies. The primary tasks of this architecture are FFT window positioning, sampling-clock synchronization
for interpolation/decimation, and carrier frequency offset estimation. In a previous project
this architecture was completely designed by hand using semi-custom design. Thereby, a large amount of
effort was spent in optimizing the architecture towards extremely low power consumption while keeping
the clock frequency up at 120 MHz. At that time, a LISA model had already been realized for architecture
exploration purposes and for verifying the model against the handwritten HDL implementation.
[Figure: the ICORE architecture. An instruction-fetch unit feeds a four-stage pipeline (FI, ID, EX, WB) with pipeline control, decoders, and write-back logic; the functional units include DAG, ZOLP, Branch, Addsub, Bitmanip, IIC, MOVE, ALU, Minmax, Mult, and Shifter, together with registers, memory, and I/O control. The diagram distinguishes automatically generated processes/entities from manual entities, and the data path from the control path.]
FIGURE 19.16 The complete generated HDL model.
Except for the data path within the functional units, the HDL code of the architecture has been generated
completely. Figure 19.16 shows the composition of the model.
The dark boxes have been filled manually with HDL code, whereas the light boxes and interconnects
are the result of the generation process.
19.6.2.1 Comparison of Development Time
The LISA model of the ICORE as well as the original handwritten HDL model of the ICORE architecture
were developed by one designer. The initial manual realization of the HDL model (without the time
needed for architecture exploration) took approx. three months. As already indicated, a LISA model was
built in this first realization of the ICORE for architecture exploration and verification purposes. It took
the designer approx. one month to learn the LISA language and to create a cycle-accurate LISA model.
After completion of the HDL generator, it took another two days to refine the LISA model to
RTL accuracy. The handwritten functional units (data path), which were added manually to the generated
HDL model, could be completed in less than a week.
This comparison clearly indicates that the time-expensive work in realizing the HDL model was to create
the structure, controller, and decoder of the architecture. In addition, a major decrease in total architecture
design time can be seen, as the LISA model results from the design exploration phase.
19.6.2.2 Gate Level Synthesis
To verify the feasibility of automatically generating HDL code from LISA architecture descriptions in terms
of power consumption, clock speed, and chip area, a gate-level synthesis was carried out. The model has
not been changed (i.e., manually optimized) to enhance the results.
A Novel Methodology for the Design of ASIPs 19-21
19.6.2.2.1 Timing and Size Comparison
The results of the gate-level synthesis regarding timing and area optimization were compared to the
handwritten ICORE model, which comprised the same architectural features. Moreover, the same synthesis
scripts were used for both models. It shall be emphasized that the performance values are nearly the
same for both models. Furthermore, it is interesting that the same critical paths were found in both
the handwritten and the generated model. The critical paths occur exclusively in the data path, which
confirms the presumption that the data path is the most critical part of the architecture and should thus
not be generated automatically from an abstract processor model.
19.6.2.2.2 Critical Path
The synthesis has been performed with a clock period of 8 nsec, which equals a frequency of 125 MHz. The critical
path, running from a pipeline register through the shifter unit and a multiplexer to the next pipeline register,
violates this timing constraint by 0.36 nsec. This matches the handwritten ICORE model, which was
manually improved at this point at the gate level.
The longest combinatorial path of the ID stage runs through the decoder and the DAG entity and amounts to
3.7 nsec. Therefore, the generated decoder does not affect the critical path in any way.
19.6.2.2.3 Area
The synthesized area was a minor criterion, due to the fact that the constraints for the handwritten
ICORE model are not area sensitive. The total area of the generated ICORE model is 59,009 gates. The
combinational area takes 57% of the total area. The handwritten ICORE model takes a total area of 58,473
gates.
The most complex part of the generated ICORE is the decoder. The area of the automatically generated
decoder in the ID stage is 4693 gates, whereas the area of the handwritten equivalent is 5500 gates. This
result must be considered carefully, as the control logic varies in some implemented features; for example,
the handwritten decoder and program flow controller support an idle and a suspended state of the core.
19.6.2.2.4 Power Consumption Comparison
Figure 19.17 shows the comparison of power consumption of the handwritten versus the generated ICORE
realization.
The handwritten model consumes 12.64 mW, whereas the implementation generated from a LISA
model consumes 14.51 mW. The slightly worse power-consumption numbers of the generated model
versus the handwritten one are due to the early version of the LISA HDL generator, which in its
current state allows access to all registers and memories within the model via the test interface. Without
this unnecessary overhead, the same results as for the hand-optimized model are achievable.
[Bar chart: power consumption in mW of the handwritten ICORE (12.64 mW) versus the generated ICORE (14.51 mW).]
FIGURE 19.17 Power consumption of different ICORE realizations.
FIGURE 19.18 Graphical debugger frontend.
To summarize, this chapter has shown that it is feasible to generate efficient HDL code from
architecture descriptions in the LISA language.
19.7 Tools for Application Development
The LPDP application software development tool-suite includes an HLL C-compiler, assembler, linker,
and simulator, as well as a graphical debugger frontend. With these tools, a complete software devel-
opment environment is available, ranging from the C/assembly source file up to simulation within a
comfortable graphical debugger frontend.
The tools are an enhanced version of those used for architecture exploration. For the software simulator,
the enhancements concern the ability to graphically visualize the debugging process of the applic-
ation under test. The LISA debugger frontend ldb is a generic GUI for the generated LISA simulator (see
Figure 19.18). It visualizes the internal state of the simulation process. Both the C source code and the
disassembly of the application, as well as all configured memories and (pipeline) registers, are displayed.
All contents can be changed in the frontend at run-time of the application. The progress of the simulator
can be controlled by stepping and running through the application and setting breakpoints.
The code generation tools (assembler and linker) are enhanced in functionality as well. The assembler
supports more than 30 common assembler directives, labels, and symbols, named user sections, and the generation
of a source listing and symbol table, and provides detailed error reporting and debugging facilities. The
linker is driven by a powerful linker command file with the ability to link sections into different address
spaces, paging support, and the possibility to define user-specific memory models.
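The idea of a linker command file that maps named user sections into different address spaces can be sketched as follows. The region names, base addresses, and section names are hypothetical, not taken from the actual LPDP linker syntax.

```python
# Sketch: placing named sections into separate address spaces, as a linker
# command file might direct. Regions and addresses are invented.

MEMORY = {
    "P_MEM": {"base": 0x0000, "size": 0x2000},  # program address space
    "D_MEM": {"base": 0x8000, "size": 0x1000},  # data address space
}

# user sections -> target memory region
PLACEMENT = {".text": "P_MEM", ".data": "D_MEM", ".bss": "D_MEM"}

def link(sections):
    """sections: {name: size_in_words}; returns {name: start_address}."""
    cursor = {region: cfg["base"] for region, cfg in MEMORY.items()}
    layout = {}
    for name, size in sections.items():
        region = PLACEMENT[name]
        start = cursor[region]
        if start + size > MEMORY[region]["base"] + MEMORY[region]["size"]:
            raise MemoryError(f"{name} overflows {region}")
        layout[name] = start            # assign section to the region
        cursor[region] = start + size   # advance fill pointer in that region
    return layout

layout = link({".text": 0x400, ".data": 0x80, ".bss": 0x40})
```

Sections bound to the same region are packed consecutively, while regions themselves may sit in disjoint address spaces, which is the essence of the memory-model facility described above.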
19.7.1 Examined Architectures
To examine the quality of the generated software development tools, four different architectures have been
considered. The architectures were carefully chosen to cover a broad range of architectural characteristics
and are widely used in the field of digital signal processing (DSP) and microcontrollers (µC). Moreover,
the abstraction level of the models ranges from phase accuracy (TMS320C62x) to instruction-set accuracy
(ARM7):
ARM7. The ARM7 core is a 32 bit microcontroller of Advanced RISC Machines Ltd [39]. The realization
of a LISA model of the ARM7 µC at instruction-set accuracy took approx. two weeks.
ADSP2101. The Analog Devices ADSP2101 is a 16 bit fixed-point DSP with a 20 bit instruction-word
width [40]. The realization of the LISA model of the ADSP2101 at cycle accuracy took approx. 3 weeks.
TMS320C54x. The Texas Instruments TMS320C54x is a high-performance 16 bit fixed-point DSP with
a six-stage instruction pipeline [37]. The realization of the model at cycle accuracy (including pipeline
behavior) took approx. 8 weeks.
TMS320C62x. The Texas Instruments TMS320C62x is a general-purpose fixed-point DSP based on a
very long instruction word (VLIW) architecture containing an eleven-stage pipeline [36]. The realization
of the model at phase accuracy (including pipeline behavior) took approx. 6 weeks.
These architectures were modelled at the respective abstraction level with LISA, and software devel-
opment tools were generated successfully. The speed of the generated tools was then compared with the
tools shipped by the respective architecture vendor. Of course, the LISA tools work
on the same level of accuracy as the vendor tools. The vendor tools exclusively use the interpretive
simulation technique.
19.7.2 Efficiency of the Generated Tools
Measurements took place on an AMD Athlon system with a clock frequency of 800 MHz. The system is
equipped with 256 MB of RAM and is part of the networking system. It runs under the Linux operating
system, kernel version 2.2.14. Tool compilation was performed with GNU GCC, version 2.92.
The generation of the complete tool-suite (HLL C-compiler, simulator, assembler, linker, and debugger
frontend) takes, depending on the complexity of the considered model, between 12 sec (ARM7 µC,
instruction-set accurate) and 67 sec (C6x DSP, phase accurate). Due to the early stage of research on the
retargetable compiler (see Section 19.8), no results on code quality are presented.
19.7.2.1 Performance of the Simulator
Figures 19.19 to 19.22 show the speed of the generated simulators in instructions per second and cycles
per second, respectively. Simulation speed was quantified by running an application on the respective
simulator and counting the number of processed instructions/cycles.
The set of applications simulated on the architectures comprises a simple 20-tap FIR filter, an ADPCM
G.721 (Adaptive Differential Pulse Code Modulation) coder/decoder, and a GSM speech codec. For the
ARM7, an ATM-QFC protocol application was additionally run, which is responsible for flow control and
configuration in an ATM port processor chip.
As expected, the compiled simulation technique applied by the generated LISA simulators outperforms
the vendor simulators by one to two orders of magnitude.
[Bar chart: simulation speed in mega-instructions per second for the FIR, ADPCM, and ATM-QFC benchmarks, comparing LISA compiled simulation (code translation), LISA compiled simulation (dynamic scheduling), the interpretive ARMulator, and the real ARM7 hardware running at 25 MHz.]
FIGURE 19.19 Speed of the ARM7 µC at instruction-accuracy.
[Bar chart: simulation speed in megacycles per second for the FIR, ADPCM, and GSM benchmarks, comparing LISA compiled simulation (code translation), LISA compiled simulation (dynamic scheduling), and the interpretive Analog Devices xsim 2101 simulator (0.01 megacycles/sec on all three benchmarks).]
FIGURE 19.20 Speed of the ADSP2101 DSP at cycle-accuracy.
[Bar chart: simulation speed in megacycles per second for the FIR, ADPCM, and GSM benchmarks, comparing LISA compiled simulation (static scheduling), LISA compiled simulation (dynamic scheduling), and the interpretive Texas Instruments sim54x simulator (0.075 megacycles/sec on all three benchmarks).]
FIGURE 19.21 Speed of C54x DSP at cycle-accuracy.
[Bar chart: simulation speed in kilocycles per second for the FIR, ADPCM, and GSM benchmarks, comparing the interpretive Texas Instruments simulator (15 kilocycles/sec on all three benchmarks) with LISA compiled simulation (dynamic scheduling).]
FIGURE 19.22 Speed of the C6x DSP at cycle-accuracy.
As both the ARM7 and the ADSP2101 LISA models contain no instruction pipeline, two different flavors of
compiled simulation are applied in the benchmarks: instruction-based code translation and dynamic
scheduling (see Section 19.5.2.4). It shows that the highest possible degree of simulation compilation
offers an additional speed-up of a factor of 2 to 7 compared to dynamically scheduled compiled simulation.
As explained in Section 19.5.2.4, the speed-up decreases with bigger applications due to cache misses on
the simulating host. It is interesting to see that, considering an ARM7 µC running at a frequency of
25 MHz, the software simulator running at 31 MIPS even outperforms the real hardware. This enables
application development before the actual silicon is at hand.
The LISA model of the C54x DSP is cycle-accurate and contains an instruction pipeline. Therefore, com-
piled simulation with static scheduling is applied (see Section 19.5.2.3). This pays off with an additional
speed-up of a factor of 5 compared to a dynamically scheduled compiled simulator.
Due to the superscalar instruction dispatching mechanism used in the C62x architecture, which is highly
run-time dependent, the LISA simulator for the C62x DSP uses only compiled simulation with dynamic
scheduling. However, the dynamically scheduled compiled simulator still offers a significant speed-up of a
factor of 65 compared to the native TI simulator.
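The difference between interpretive and compiled simulation that underlies these speed-ups can be sketched on a toy instruction set (the opcodes and behaviors below are invented): an interpretive simulator repeats the decode step on every executed instruction, whereas a compiled simulator hoists decoding to load time and runs a pre-decoded behavior list.

```python
# Sketch contrasting interpretive simulation (decode on every execution)
# with compiled simulation (decode once, ahead of time). Toy ISA, invented.

PROGRAM = [0x10, 0x11, 0x10, 0x12]  # opcode stream

def decode(opcode):
    # stand-in for the expensive decode step of a real simulator
    return {0x10: lambda s: s + 1,
            0x11: lambda s: s * 2,
            0x12: lambda s: s - 3}[opcode]

def run_interpretive(program, state=0):
    for op in program:
        state = decode(op)(state)   # decode repeated for every instruction
    return state

def run_compiled(program, state=0):
    behaviors = [decode(op) for op in program]  # decode hoisted to load time
    for behavior in behaviors:
        state = behavior(state)     # inner loop runs pre-decoded behaviors
    return state
```

Both routines compute the same result; the compiled variant merely pays the decode cost once per instruction in the program image instead of once per executed instruction, which is where the order-of-magnitude gains of the generated LISA simulators come from.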
19.7.2.2 Performance of Assembler and Linker
The generated assembler and linker are not as time critical as the simulator. It shall be mentioned, though,
that the performance (i.e., the number of assembled/linked instructions per second) of the automatically
generated tools is comparable to that of the vendor tools.
19.8 Requirements and Limitations
In this section the requirements and current limitations of different aspects of processor design using
the LISA language are discussed. These affect the modelling capabilities of the language itself as well as the
generated tools.
19.8.1 LISA Language
Common to all models described in LISA is the underlying zero-delay model. This means that all transitions
are provided correctly at each control step. Control steps may be clock phases, clock cycles, instruction
cycles, or even higher levels. Events between these control steps are not regarded. However, this property
meets the requirements that current co-simulation environments [41–43] place on processor simulators to be used
for HW/SW co-design [44,45]. Besides, the LISA language currently contains no formalism to describe
memory hierarchies such as multi-level caches. However, existing C/C++ models of memory hierarchies
can easily be integrated into the LISA architecture model.
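As a sketch of such an integration, a simple cache model can be wrapped around the simulator's memory access routines. The class and method names below are hypothetical, not part of the LISA API.

```python
# Sketch: plugging an external cache model behind a simulator's memory
# interface, as suggested above for existing C/C++ hierarchy models.
# All names are invented for illustration.

class Memory:
    """Flat backing memory the architecture model would normally access."""
    def __init__(self):
        self.cells = {}
    def read(self, addr):
        return self.cells.get(addr, 0)
    def write(self, addr, value):
        self.cells[addr] = value

class DirectMappedCache:
    """Minimal direct-mapped cache exposing the same read/write interface,
    so it can be dropped in front of the backing memory transparently."""
    def __init__(self, backing, lines=4):
        self.backing, self.lines = backing, lines
        self.tags = [None] * lines
        self.data = [0] * lines
        self.hits = self.misses = 0
    def read(self, addr):
        index, tag = addr % self.lines, addr // self.lines
        if self.tags[index] == tag:
            self.hits += 1
        else:
            self.misses += 1
            self.tags[index] = tag
            self.data[index] = self.backing.read(addr)  # fill on miss
        return self.data[index]
    def write(self, addr, value):  # write-through for simplicity
        index, tag = addr % self.lines, addr // self.lines
        self.tags[index], self.data[index] = tag, value
        self.backing.write(addr, value)

mem = DirectMappedCache(Memory())
mem.write(0x20, 7)
```

Because the cache exposes the same read/write interface as the flat memory, the rest of the architecture model needs no change, which is the point made in the text.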
19.8.2 HLL C-compiler
Due to the early stage of research, no further details on the retargetable compiler are presented within
the scope of this chapter. At its current status, the quality of the generated code is only fair. However,
it is evident that the proposed new ASIP design methodology can only be carried out efficiently in the
presence of an efficient retargetable compiler. In our case study presented in Section 19.6, major parts of
the application were realized in assembly code.
19.8.3 HDL Generator
As LISA allows modelling the architecture using a combination of both LISA language elements and pure
C/C++ code, certain coding guidelines need to be obeyed in order to generate synthesizable HDL code for
the target architecture. First, only the LISA language elements are considered; thus the usage of C code
in the model needs to be limited to the description of the data path, which is not taken into account for
HDL code generation anyway. Second, architectural properties which can be modelled in LISA but are
not synthesizable include pipelined functional units and multiple instruction-word decoders.
19.9 Conclusion and Future Work
In this chapter we presented the LISA processor design platform LPDP, a novel framework for the
design of application-specific integrated processors. The LPDP platform helps the architecture designer
in different domains: architecture exploration, implementation, application software design, and system
integration/verification.
In a case study it was shown that an ASIP, the ICORE architecture, was completely realized using this
novel design methodology, from exploration to implementation. The implementation results concern-
ing maximum frequency, area, and power consumption were comparable to those of the hand-optimized
version of the same architecture realized in a previous project.
Moreover, the quality of the generated software development tools was compared to those of the
semiconductor vendors. LISA models were realized and tools successfully generated for the ARM7 µC, the
Analog Devices ADSP2101, the Texas Instruments C62x, and the Texas Instruments C54x at instruction-
set, cycle, and phase accuracy, respectively. Due to the use of the compiled simulation principle, the generated
simulators run one to two orders of magnitude faster than the vendor simulators. In addition, the
generated assembler and linker can compete well in speed with the vendor tools.
Our future work will focus on modelling further real-world processor architectures and improving
the quality of our retargetable C-compiler. In addition, formal ways to model memory hierarchies will
be addressed. For the HDL generator, data path synthesis will be examined in the context of the SystemC
modelling language.
References
[1] M. Birnbaum and H. Sachs, How VSIA answers the SOC dilemma. IEEE Computer, 32,
42–50, 1999.
[2] S. Pees, A. Hoffmann, V. Zivojnovic, and H. Meyr, LISA – machine description language for cycle-
accurate models of programmable DSP architectures. In Proceedings of the Design Automation
Conference (DAC). New Orleans, June 1999.
[3] V. Živojnović, S. Pees, and H. Meyr, LISA – machine description language and generic machine
model for HW/SW co-design. In Proceedings of the IEEE Workshop on VLSI Signal Processing.
San Francisco, October 1996.
[4] K. Olukotun, M. Heinrich, and D. Ofelt, Digital system simulation: methodologies and examples.
In Proceedings of the Design Automation Conference (DAC), June 1998.
[5] J. Rowson, Hardware/software co-simulation. In Proceedings of the Design Automation Conference
(DAC), 1994.
[6] R. Stallman, Using and Porting the GNU Compiler Collection, gcc-2.95 ed. Free Software
Foundation, Boston, MA, 1999.
[7] G. Araujo, A. Sudarsanam, and S. Malik, Instruction set design and optimization for address com-
putation in DSP architectures. In Proceedings of the International Symposium on System Synthesis
(ISSS), 1996.
[8] C. Liem et al., Industrial experience using rule-driven retargetable code generation for multi-
media applications. In Proceedings of the International Symposium on System Synthesis (ISSS),
September 1995.
[9] D. Engler, VCODE: a retargetable, extensible, very fast dynamic code generation system.
In Proceedings of the International Conference on Programming Language Design and Implementation
(PLDI), May 1996.
[10] D. Bradlee, R. Henry, and S. Eggers, The Marion system for retargetable instruction schedul-
ing. In Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and
Implementation. Toronto, Canada, 1991, pp. 229–240.
[11] B. Rau, VLIW compilation driven by a machine description database. In Proceedings of the 2nd
Code Generation Workshop. Leuven, Belgium, 1996.
[12] M. Freericks, The nML machine description formalism. Technical Report 1991/15, Technische
Universität Berlin, Fachbereich Informatik, Berlin, 1991.
[13] A. Fauth, J. Van Praet, and M. Freericks, Describing instruction set processors using nML.
In Proceedings of the European Design and Test Conference. Paris, March 1995.
[14] M. Hartoog et al., Generation of software tools from processor descriptions for hardware/software
codesign. In Proceedings of the Design Automation Conference (DAC), June 1997.
[15] W. Geurts et al., Design of DSP systems with chess/checkers. In Proceedings of the 2nd International
Workshop on Code Generation for Embedded Processors. Leuven, March 1996.
[16] J. Van Praet et al., A graph based processor model for retargetable code generation. In Proceedings
of the European Design and Test Conference (ED&TC), March 1996.
[17] V. Rajesh and R. Moona, Processor modeling for hardware software codesign. In Proceedings of the
International Conference on VLSI Design. Goa, India, January 1999.
[18] G. Hadjiyiannis, S. Hanono, and S. Devadas, ISDL: an instruction set description language for
retargetability. In Proceedings of the Design Automation Conference (DAC), June 1997.
[19] A. Halambi et al., EXPRESSION: a language for architecture exploration through com-
piler/simulator retargetability. In Proceedings of the Conference on Design, Automation & Test
in Europe (DATE), March 1999.
[20] P. Paulin, Design automation challenges for application-specic architecture platforms. In
Proceedings of the SCOPES 2001 Workshop on Software and Compilers for Embedded Systems,
March 2001.
[21] ACE Associated Compiler Experts, The COSY Compilation System, 2001. http://www.ace.nl/
products/cosy.html
[22] T. Morimoto, K. Saito, H. Nakamura, T. Boku, and K. Nakazawa, Advanced processor design using
hardware description language AIDL. In Proceedings of the Asia South Pacic Design Automation
Conference (ASPDAC), March 1997.
[23] I. Huang, B. Holmer, and A. Despain, ASIA: automatic synthesis of instruction-set architectures.
In Proceedings of the SASIMI Workshop, October 1993.
[24] M. Gschwind, Instruction set selection for ASIP design. In Proceedings of the International Workshop
on Hardware/Software Codesign, May 1999.
[25] S. Kobayashi et al., Compiler generation in PEAS-III: an ASIP development system. In Pro-
ceedings of the SCOPES 2001 Workshop on Software and Compilers for Embedded Systems,
March 2001.
[26] C.-M. Kyung, Metacore: an application specic DSP development system. In Proceedings of the
Design Automation Conference (DAC), June 1998.
[27] M. Barbacci, Instruction set processor specifications (ISPS): the notation and its application. IEEE
Transactions on Computers, C-30, 24–40, 1981.
[28] R. Gonzales, Xtensa: a congurable and extensible processor. IEEE Micro, 20, 2000.
[29] Synopsys, COSSAP. http://www.synopsys.com
[30] OPNET, http://www.opnet.com
[31] LISA Homepage, ISS, RWTH Aachen, 2001, http://www.iss.rwth-aachen.de/lisa
[32] T. Gloekler, S. Bitterlich, and H. Meyr, Increasing the power efciency of application-specic
instruction set processors using datapath optimization. In Proceedings of the IEEE Workshop on
Signal Processing Systems (SIPS). Lafayette, October 2001.
[33] Synopsys, DesignWare Components, 1999. http://www.synopsys.com/products/designware/
designware.html
[34] A. Hoffmann, A. Nohl, G. Braun, and H. Meyr, Generating production quality software devel-
opment tools using a machine description language. In Proceedings of the Conference on Design,
Automation & Test in Europe (DATE), March 2001.
[35] S. Pees, A. Hoffmann, and H. Meyr, Retargeting of compiled simulators for digital signal processors
using a machine description language. In Proceedings of the Conference on Design, Automation &
Test in Europe (DATE). Paris, March 2000.
[36] Texas Instruments, TMS320C62x/C67x CPU and Instruction Set Reference Guide, March 1998.
[37] Texas Instruments, TMS320C54x CPU and Instruction Set Reference Guide, October 1996.
[38] R. Sites et al., Binary translation. Communications of the ACM, 36, 69–81, 1993.
[39] Advanced RISC Machines Ltd., ARM7 Data Sheet, December 1994.
[40] Analog Devices, ADSP2101 User's Manual, September 1993.
[41] Synopsys, Eaglei, 1999. http://www.synopsys.com/products/hwsw
[42] Cadence, Cierto, 1999. http://www.cadence.com/technology/hwsw
[43] Mentor Graphics, Seamless, 1999. http://www.mentor.com/seamless
[44] L. Guerra et al., Cycle and phase accurate DSP modeling and integration for HW/SW
co-verication. In Proceedings of the Design Automation Conference (DAC), June 1999.
[45] R. Earnshaw, L. Smith, and K. Welton, Challenges in cross-development. IEEE Micro, 17,
28–36, 1997.
20
State-of-the-Art SoC
Communication
Architectures
José L. Ayala and
Marisa López-Vallejo
Universidad Politécnica de Madrid
Davide Bertozzi and
Luca Benini
University of Bologna
20.1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20-1
20.2 AMBA Bus. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20-2
AMBA System Bus • AMBA AHB Basic Operation •
Advanced Peripheral Bus • Advanced AMBA Evolutions
20.3 CoreConnect Bus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20-7
Processor Local Bus • On-Chip Peripheral Bus • Device
Control Register Bus
20.4 STBus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20-10
Bus Topologies
20.5 Wishbone . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20-11
The Wishbone Bus Transactions
20.6 SiliconBackplane MicroNetwork . . . . . . . . . . . . . . . . . . . . . . . 20-12
System Interconnect Bandwidth Configuration Resources
20.7 Other On-Chip Interconnects . . . . . . . . . . . . . . . . . . . . . . . . . . 20-14
Peripheral Interconnect Bus • Avalon • CoreFrame
20.8 Analysis of Communication Architectures. . . . . . . . . . . . . 20-15
Scalability Analysis
20.9 Packet-Switched Interconnection Networks . . . . . . . . . . . 20-20
20.10 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20-21
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20-21
20.1 Introduction
The current high levels of on-chip integration allow for the implementation of increasingly complex
Systems-on-Chip (SoC), consisting of heterogeneous components such as general-purpose processors,
Digital Signal Processors (DSPs), coprocessors, memories, I/O units, and dedicated hardware accelerators.
In this context, MultiProcessor Systems-on-Chip (MPSoC) are emerging as an effective solution to
meet the demand for computational power posed by application domains such as network processors
and parallel media processors. MPSoCs combine the advantages of parallel processing with the high
integration levels of SoCs.
It is expected that future MPSoCs will integrate hundreds of processing units and storage elements,
and their performance will be increasingly interconnect dominated [1]. Interconnect technology and
architecture will become the limiting factor for achieving operational goals, and the efficient design
of low-power, high-performance on-chip communication architectures will pose novel challenges. The
main issue regards the scalability of system interconnects, since the trend toward system integration is expected to
continue. State-of-the-art on-chip buses rely on shared communication resources and on an arbitration
mechanism that is in charge of serializing bus access requests. This widely adopted solution unfortunately
suffers from power and performance scalability limitations; therefore, a lot of effort is being devoted to
the development of advanced bus topologies (e.g., partial or full crossbars, bridged buses) and protocols,
some of which are already implemented in commercially available products. In the long run, a more
aggressive approach will be needed, and a design paradigm shift will most probably lead to packetized
on-chip communication based on micronetworks of interconnects, or Networks-on-Chip (NoC) [2,3].
This chapter focuses on state-of-the-art SoC communication architectures, providing an overview of
the most relevant ones from an industrial and research viewpoint. Beyond describing the distinctive
features of each of them, the chapter sketches the main evolution guidelines for these architectures by
means of a protocol and topology analysis framework. Finally, some basic concepts on packet-switched
interconnection networks will be put forward. Open bus specifications such as the Advanced Microcontrol-
ler Bus Architecture (AMBA) and CoreConnect will obviously be described in more detail, providing the
background needed to understand the more general description of proprietary industrial bus
architectures, while at the same time being able to assess their contribution to the advance of the field.
20.2 AMBA Bus
AMBA is a bus standard originally conceived by ARM to support communication among ARM
processor cores. However, nowadays AMBA is one of the leading on-chip bus systems because it is
licensed and deployed for use with third-party Intellectual Property (IP) cores [4]. Designed for custom
silicon, the AMBA specification provides standard bus protocols for connecting on-chip components,
custom logic, and specialized functions. These bus protocols are independent of the ARM processor and
generalized for different SoC structures.
AMBA defines a segmented bus architecture, wherein two bus segments are connected with each other
via a bridge that buffers data and operations between them. A system bus is defined, which provides a high-
speed, high-bandwidth communication channel between embedded processors and high-performance
peripherals. Two system buses are actually specified: the AMBA High-Speed Bus (AHB) and the Advanced
System Bus (ASB).
Moreover, a low-performance and low-power peripheral bus (called the Advanced Peripheral Bus, APB) is
specified, which accommodates communication with general-purpose peripherals and is connected to the
system bus via a bridge, acting as the only APB master. The overall AMBA architecture is illustrated in
Figure 20.1.
20.2.1 AMBA System Bus
ASB is the first generation of AMBA system bus, and sits above APB in that it implements the features
required for high-performance systems, including burst transfers, pipelined transfer operation, and mul-
tiple bus masters. AHB is a later generation of AMBA bus which is intended to address the requirements of
high-performance, high-clock-frequency synthesizable designs. ASB is used for simpler, more cost-effective designs,
whereas more sophisticated designs call for the employment of AHB. For this reason, a detailed
description of AHB follows.
The main features of AMBA AHB can be summarized as follows:
Multiple bus masters. Optimized system performance is obtained by sharing resources among different
bus masters. A simple request-grant mechanism is implemented between the arbiter and each bus master.
In this way, the arbiter ensures that only one bus master is active on the bus, and also that when no masters
are requesting the bus a default master is granted.
[Figure: block diagram of the AMBA architecture. A bridge joins the high-speed AMBA AHB system bus to the low-power AMBA APB peripheral bus; the blocks include an ARM CPU, SDRAM controller, external memory, SRAM, color LCD controller, smart card I/F, UART, synchronous serial port, audio codec I/F, and test I/F controller.]
FIGURE 20.1 Schematic architecture of AMBA bus.
Pipelined and burst transfers. Address and data phases of a transfer occur during different clock periods.
In fact, the address phase of any transfer occurs during the data phase of the previous transfer. This
overlapping of address and data is fundamental to the pipelined nature of the bus and allows for high-
performance operation, while still providing adequate time for a slave to provide the response to a transfer.
This also implies that ownership of the data bus is delayed with respect to ownership of the address bus.
Moreover, support for burst transfers allows for efficient use of memory interfaces by providing transfer
information in advance.
Split transactions. They maximize the use of bus bandwidth by enabling high-latency slaves to release
the system bus during the dead time in which they complete processing of their access requests.
Wide data bus configurations. Support for high-bandwidth, data-intensive applications is provided using
wide on-chip memories. System buses support 32-, 64-, and 128-bit data bus implementations with a
32-bit address bus, as well as smaller byte and half-word designs.
Nontristate implementation. AMBA AHB implements separate read and write data buses in order to
avoid the use of tristate drivers. In particular, master and slave signals are multiplexed onto the shared
communication resources (read and write data buses, address bus, control signals).
A typical AMBA AHB system contains the following components:
AHB master. Only one bus master at a time is allowed to initiate and complete read and write trans-
actions. Bus masters drive out the address and control signals and the arbiter determines which master
has its signals routed to all of the slaves. A central decoder controls the read data and response signal
multiplexor, which selects the appropriate signals from the slave that has been addressed.
AHB slave. It signals back to the active master the status of the pending transaction. It can indicate
that the transfer completed successfully, that there was an error, that the master should retry the transfer,
or that a split transaction is beginning.
AHB arbiter. The bus arbiter serializes bus access requests. The arbitration algorithm is not specified
by the standard and its selection is left as a design parameter (fixed priority, round-robin, latency-driven,
etc.), although the request-grant based arbitration protocol has to be kept fixed.
AHB decoder. This is used for address decoding and provides the select signal to the intended slave.
20-4 Embedded Systems Handbook
20.2.2 AMBA AHB Basic Operation
In a normal bus transaction, the arbiter grants the bus to the master until the transfer completes and the
bus can then be handed over to another master. However, in order to avoid excessive arbitration latencies,
the arbiter can break up a burst. In that case, the master must rearbitrate for the bus in order to complete
the remaining data transfers.
A basic AHB transfer consists of four clock cycles. During the first one, the request signal is asserted,
and in the best case at the end of the second cycle a grant signal from the arbiter can be sampled by the
master. Then, address and control signals are asserted for slave sampling on the next rising edge, and
during the last cycle the data phase is carried out (read data bus-driven or information on the write data
bus sampled). A slave may insert wait states into any transfer, thus extending the data phase, and a ready
signal is available for this purpose.
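The best-case timing described above can be sketched as a simple cycle timeline; the signal names (HBUSREQ, HGRANT, HREADY) follow AHB convention, but the model itself is purely illustrative:

```python
def ahb_transfer(wait_states=0):
    """Return the sequence of cycles of a basic AHB transfer: request,
    grant, address phase, then a data phase that a slave may stretch
    by deasserting the ready signal (wait states)."""
    timeline = ["HBUSREQ asserted",        # cycle 1: master requests the bus
                "HGRANT sampled",          # cycle 2: grant sampled (best case)
                "address/control driven"]  # cycle 3: address phase
    timeline += ["wait state (HREADY low)"] * wait_states
    timeline.append("data phase (HREADY high)")  # final cycle: data moves
    return timeline

print(len(ahb_transfer()))              # 4 cycles in the best case
print(len(ahb_transfer(wait_states=2))) # wait states extend the data phase
```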
Four-, eight-, and sixteen-beat bursts are defined in the AMBA AHB protocol, as well as undefined-
length bursts. During a burst transfer, the arbiter rearbitrates the bus when the penultimate address has
been sampled, so that the asserted grant signal can be sampled by the corresponding master at the same point
where the last address of the burst is sampled. This makes bus master handover at the end of a burst
transfer very efficient.
For long transactions, the slave can decide to split the operation, warning the arbiter that the master
should not be granted access to the bus until the slave indicates it is ready to complete the transfer. This
transfer-splitting mechanism is supported by all advanced on-chip interconnects, since it prevents high-
latency slaves from keeping the bus busy without performing any actual transfer of data.
Indeed, split transfers can significantly improve bus efficiency, that is, reduce the number of
bus busy cycles used just for control (e.g., protocol handshake) rather than for actual data transfers. Advanced
arbitration features are required in order to support split transfers, as well as more complex master and
slave interfaces.
20.2.3 Advanced Peripheral Bus
The AMBA APB is intended for general-purpose low-speed low-power peripheral devices. It enables the
connection to the main system bus via a bridge. All bus devices are slaves, the bridge being the only
peripheral bus master.
This is a static bus that provides simple addressing, with latched addresses and control signals for easy
interfacing. ARM recommends a dual read and write bus implementation, but APB can be implemented
with a single tristated data bus.
The main features of this bus are the following:
Unpipelined architecture
Low-gate count
Low-power operation
(a) Reduced loading of the main system bus is obtained by isolating the peripherals behind the
bridge.
(b) Peripheral bus signals are only active during low-bandwidth peripheral transfers.
AMBA APB operation can be abstracted as a state machine with three states. The default state for the
peripheral bus is IDLE, which switches to the SETUP state when a transfer is required. The SETUP state lasts just
one cycle, during which the peripheral select signal is asserted. The bus then moves to the ENABLE state,
which also lasts only one cycle and which requires the address, control, and data signals to remain stable.
Then, if other transfers are to take place, the bus goes back to the SETUP state, otherwise to IDLE. As can be
observed, AMBA APB should be used to interface to any peripherals which are low bandwidth and do not
require the high performance of a pipelined bus interface.
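The three-state behavior described above can be sketched as a tiny next-state function (state names are from the text; the transfer_pending input is an illustrative abstraction of the bridge's request logic):

```python
# Minimal model of the three-state AMBA APB controller described above.
def apb_next_state(state, transfer_pending):
    if state == "IDLE":
        return "SETUP" if transfer_pending else "IDLE"
    if state == "SETUP":        # lasts exactly one cycle, select asserted
        return "ENABLE"
    if state == "ENABLE":       # lasts exactly one cycle, signals stable
        return "SETUP" if transfer_pending else "IDLE"
    raise ValueError(state)

# Back-to-back transfers alternate SETUP/ENABLE without revisiting IDLE:
state = "IDLE"
trace = []
for pending in (True, True, True, False):
    state = apb_next_state(state, pending)
    trace.append(state)
print(trace)
```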
20.2.4 Advanced AMBA Evolutions
Recently, some advanced specications of AMBA bus have appeared, featuring increased performance
and better link utilization. In particular, the Multi-Layer AHB and the AMBA AXI interconnect schemes
will be briefly addressed in the following subsections.
It should be observed that interconnect performance improvement can be achieved by adopting new
topologies and by choosing new protocols, at the expense of silicon area. The former strategy leads
from shared buses to bridged clusters, partial or full crossbars, and eventually to NoCs, in an attempt to
increase available bandwidth and to reduce local contention. The latter strategy instead tries to maximize
link utilization by adopting more sophisticated control schemes and thus permitting a better sharing of
existing resources.
Multi-Layer AHB can be seen as an evolution of bus topology while keeping the AHB protocol
unchanged. On the contrary, AMBA AXI represents an advanced interconnect fabric protocol.
20.2.4.1 Multi-Layer AHB
The Multi-Layer AHB specification emerges with the aim of increasing the overall bus bandwidth and
providing a more flexible interconnect architecture with respect to AMBA AHB. This is achieved by using
a more complex interconnection matrix which enables parallel access paths between multiple masters and
slaves in a system [5].
Therefore, the multi-layer bus architecture allows the interconnection of unmodified standard AHB
master and slave modules with an increased available bus bandwidth. The resulting architecture is
very simple and flexible: each AHB layer has only one master, so no arbitration or master-to-slave
multiplexing is needed. Moreover, the interconnect protocol implemented in these layers can be very simple: it
does not have to support request and grant, nor retry or split transactions.
The additional hardware needed for this architecture with respect to AHB is a multiplexer to connect
the multiple masters to the peripherals; some point arbitration is also required when more than one
master wants to access the same slave simultaneously.
Figure 20.2 shows a schematic view of the multi-layer concept. The interconnect matrix contains
a decode stage for every layer in order to determine which slave is required during the transfer. The
multiplexer is used to route the request from the specic layer to the desired slave.
The arbitration protocol decides the sequence of accesses of layers to slaves based on a priority assign-
ment. The layer with lowest priority has to wait for the slave to be freed. Different arbitration schemes can
be used, and every slave port has its own arbitration. Input layers can be served in a round-robin fashion,
changing every transfer or every burst transaction, or based on a fixed priority scheme.
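A minimal sketch of one such per-slave-port arbitration scheme, assuming round-robin service among the layers requesting a given slave (the class and its interface are illustrative, not part of the specification):

```python
class SlavePortArbiter:
    """Per-slave-port arbiter for a multi-layer AHB sketch: each slave
    port independently picks one requesting layer, round-robin."""

    def __init__(self, num_layers):
        self.num_layers = num_layers
        self.last = num_layers - 1  # so layer 0 gets the first turn

    def arbitrate(self, requesting_layers):
        """requesting_layers: set of layer IDs requesting this slave.
        Returns the winning layer, or None if the slave is idle."""
        if not requesting_layers:
            return None
        # scan layers starting just after the last winner (round-robin)
        for offset in range(1, self.num_layers + 1):
            layer = (self.last + offset) % self.num_layers
            if layer in requesting_layers:
                self.last = layer
                return layer

arb = SlavePortArbiter(num_layers=3)
print(arb.arbitrate({0, 2}))  # layer 0 served first
print(arb.arbitrate({0, 2}))  # then layer 2; layer 0 waits its turn
```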
The number of input/output ports on the interconnect matrix is completely flexible and can be adapted
to suit system requirements. As the number of masters and slaves implemented in the system increases,
the complexity of the interconnection matrix can become significant and some optimization techniques
have to be used: defining multiple masters on a single layer, making multiple slaves appear as a single slave to
the interconnect matrix, and defining local slaves to a particular layer.
Finally, it is interesting to outline the capability of this topology to support multi-port slaves. Some
devices, such as SDRAM controllers, work much more efficiently when processing transfers from different
layers in parallel.
20.2.4.2 AMBA AXI Protocol
AXI is the latest generation AMBA interface. It is designed to be used as a high-speed submicron inter-
connect, and also includes optional extensions for low-power operation [6]. This high-performance
protocol provides flexibility in the implementation of interconnect architectures while still keeping
backward compatibility with existing AHB and APB interfaces.
AMBA AXI builds upon the concept of point-to-point connection. AMBA AXI does not provide
masters and slaves with visibility of the underlying interconnect, instead featuring the concept of master
interfaces and symmetric slave interfaces. This approach, besides allowing seamless topology scaling, has
[Figure: two master layers, each with its own decode stage, connected through multiplexers to multiple shared slaves.]
FIGURE 20.2 Schematic view of the multi-layer AHB interconnect.
the advantage of simplifying the handshake logic of attached devices, which only need to manage a
point-to-point link.
To provide high scalability and parallelism, four different logical unidirectional channels are provided
in AXI interfaces: an address channel, a read channel, a write channel, and a write response channel.
Activity on different channels is mostly asynchronous (e.g., data for a write can be pushed to the write
channel before or after the write address is issued to the address channel), and can be parallelized, allowing
multiple outstanding read and write requests.
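The decoupled-channel idea can be sketched by modeling the four logical channels as independent queues (a behavioral illustration only; the addresses and the queue-based slave loop are assumptions, not AXI signaling):

```python
from collections import deque

# Each of the four logical channels named above becomes an independent
# queue; activity is decoupled, so write data may be queued before the
# matching address, and several requests may be outstanding at once.
channels = {name: deque() for name in
            ("addr", "read_data", "write_data", "write_resp")}

channels["write_data"].append(0xCAFE)                     # data pushed first...
channels["addr"].append(("W", 0x3000))                    # ...address follows
channels["addr"].extend([("R", 0x1000), ("R", 0x2000)])   # two outstanding reads

# Slave side: serve reads on the read channel; acknowledge the write
# on the separate write response channel.
while channels["addr"]:
    kind, addr = channels["addr"].popleft()
    if kind == "R":
        channels["read_data"].append(("data @", addr))
    else:
        channels["write_data"].popleft()
        channels["write_resp"].append("OKAY")

print(len(channels["read_data"]), channels["write_resp"][0])
```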
Figure 20.3(a) shows how a read transaction uses the read address and read data channels. The write
operation over the write address and write data channels is presented in Figure 20.3(b).
As can be observed, the data is transferred from the master to the slave using a write data channel, and
it is transferred from the slave to the master using a read data channel. In write transactions, in which all
the data flows from the master to the slave, the AXI protocol has an additional write response channel to
allow the slave to signal to the master the completion of the write transaction.
However, the AXI protocol is a master/slave-to-interconnect interface definition, and this enables a
variety of different interconnect implementations. Therefore, the mapping of channels, as visible by the
interfaces, to actual internal communication lanes is decided by the interconnect designer; single resources
might be shared by all channels of a certain type in the system, or a variable amount of dedicated signals
might be available, up to a full crossbar scheme. The rationale of this split-channel implementation is
based upon the observation that usually the required bandwidth for addresses is much lower than that
for data (e.g., a burst requires a single address but maybe four or eight data transfers). Availability of
independently scalable resources might, for example, lead to medium complexity designs sharing a single
internal address channel while providing multiple data read and write channels.
Finally, some of the key incremental features of the AXI protocol can be listed as follows:
Support for out-of-order completion of transactions.
Easy addition of register stages to provide timing closure.
Support for multiple address issuing.
Separate read and write data channels to enable low-cost Direct Memory Access (DMA).
Support for unaligned data transfers.
[Figure: (a) the master interface issues address and control on the read address channel and receives read data on the read data channel; (b) it issues address and control on the write address channel, sends write data on the write data channel, and receives completion on the write response channel.]
FIGURE 20.3 Architecture of transfers: (a) read operation, (b) write operation.
20.3 CoreConnect Bus
CoreConnect is an IBM-developed on-chip bus that eases the integration and reuse of processor, subsystem
and peripheral cores within standard product platform designs. It is a complete and versatile architecture
clearly targeting high-performance systems, and many of its features might be overkill in simple embedded
applications [7].
The CoreConnect bus architecture serves as the foundation of IBM Blue Logic or other non-IBM
devices. The Blue Logic ASIC/SoC design methodology is the approach proposed by IBM [8] to extend
conventional ASIC design ows to current design needs: low-power and multiple-voltage products,
recongurable logic, custom design capability, and analog/mixed-signal designs. Each of these offer-
ings requires a well-balanced coupling of technology capabilities and design methodology. The use of this
bus architecture allows the hierarchical design of SoCs.
As can be seen in Figure 20.4, the IBM CoreConnect architecture provides three buses for inter-
connecting cores, library macros, and custom logic:
Processor Local Bus (PLB)
On-Chip Peripheral Bus (OPB)
Device Control Register (DCR) Bus
The PLB bus connects the processor to high-performance peripherals, such as memories, DMA con-
trollers, and fast devices. Bridged to the PLB, the OPB supports slower-speed peripherals. Finally, the DCR
bus is a separate control bus that connects all devices, controllers, and bridges and provides a separate
[Figure: processor core, auxiliary processor, on-chip memory, and system cores on the arbitrated processor local bus (PLB); a bus bridge to the arbitrated on-chip peripheral bus (OPB) hosting peripheral cores; a separate DCR bus chaining the cores.]
FIGURE 20.4 Schematic structure of the CoreConnect bus.
path to set and monitor the individual control registers. It is designed to transfer data between the CPU's
general-purpose registers and the slave logic's device control registers. It removes configuration registers
from the memory address map, which reduces loading and improves bandwidth of the PLB.
This architecture shares many high-performance features with the AMBA bus specication. On
one hand, both architectures allow split, pipelined, and burst transfers, multiple bus masters, and 32-,
64-, or 128-bit architectures. On the other hand, CoreConnect also supports multiple masters in the
peripheral bus.
Please note that design toolkits are available for the CoreConnect bus and include functional models,
monitors, and a bus functional language to drive the models. These toolkits provide an advanced validation
environment for engineers designing macros to attach to the PLB, OPB, and DCR buses.
20.3.1 Processor Local Bus
The PLB is the main system bus targeting high-performance and low-latency on-chip communication.
More specically, PLB is a synchronous, multi-master, arbitrated bus. It supports concurrent read and
write transfers, thus yielding a maximum bus utilization of two data transfers per clock cycle. Moreover,
PLB implements address pipelining, which reduces bus latency by overlapping a new write request with an
ongoing write transfer and up to three read requests with an ongoing read transfer [9].
Access to PLB is granted through a central arbitration mechanism that allows masters to compete
for bus ownership. This arbitration mechanism is flexible enough to provide for the implementation of
various priority schemes. In fact, four levels of request priority for each master allow PLB implementation
with various arbitration priority schemes. Additionally, an arbitration locking mechanism is provided to
support master-driven atomic operations. PLB also exhibits the ability to overlap the bus request/grant
protocol with an ongoing transfer.
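A sketch of such a priority-based arbitration decision, assuming the four request priority levels mentioned above and an illustrative lowest-ID tie-break (the specification leaves the actual priority scheme to the implementer):

```python
def plb_arbitrate(requests):
    """Sketch of PLB arbitration: each master requests with one of four
    priority levels (0 = lowest, 3 = highest). The highest priority
    wins; the tie-break among equal priorities is a design parameter
    (here lowest master ID, purely for illustration).
    requests: dict mapping master_id -> priority level."""
    if not requests:
        return None
    best = max(requests.values())
    return min(m for m, p in requests.items() if p == best)

# Masters 1 and 2 tie at priority 3; the tie-break picks master 1:
print(plb_arbitrate({0: 1, 1: 3, 2: 3}))
```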
The PLB specication describes a system architecture along with a detailed description of the signals
and transactions. PLB-based custom logic systems require the use of a PLB macro to interconnect the
various master and slave macros.
The PLB macro is the key component of PLB architecture, and consists of a bus arbitration control unit
and the control logic required to manage the address and data flow through the PLB. Each PLB master is
attached to the PLB through separate address, read data, and write data buses and a plurality of transfer
qualifier signals, while PLB slaves are attached through shared, but decoupled, address, read data, and write
data buses (each one with its own transfer control and status signals). The separate address and data buses
from the masters allow simultaneous transfer requests. The PLB macro arbitrates among them and sends
the address, data, and control signals from the granted master to the slave bus. The slave response is then
routed back to the appropriate master. Up to 16 masters can be supported by the arbitration unit, while
there are no restrictions in the number of slave devices.
20.3.2 On-Chip Peripheral Bus
Frequently, the OPB architecture connects low-bandwidth devices such as serial and parallel ports, UARTs,
timers, etc. and represents a separate, independent level of bus hierarchy. It is implemented as a multi-
master, arbitrated bus. It is a fully synchronous interconnect with a common clock, but its devices can run
with slower clocks, as long as all of the clocks are synchronized with the rising edge of the main clock.
This bus uses a distributed multiplexer attachment implementation instead of tristate drivers. The
OPB supports multiple masters and slaves by implementing the address and data buses as a distributed
multiplexer. This type of structure is suitable for the less data intensive OPB bus and allows adding
peripherals to a custom core logic design without changing the I/O on either the OPB arbiter or existing
peripherals. All of the masters are capable of providing an address to the slaves, whereas both masters and
slaves are capable of driving and receiving the distributed data bus.
PLB masters gain access to the peripherals on the OPB bus through the OPB bridge macro. The OPB
bridge acts as a slave device on the PLB and a master on the OPB. It supports word (32-bit), half-word
(16-bit), and byte read and write transfers on the 32-bit OPB data bus, as well as bursts, and has the capability
to perform target-word-first line read accesses. The OPB bridge performs dynamic bus sizing, allowing
devices with different data widths to communicate efficiently. When the OPB bridge master performs an
operation wider than the selected OPB slave can support, the bridge splits the operation into two or more
smaller transfers.
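The splitting performed by dynamic bus sizing can be sketched as follows (a behavioral illustration; the function name and beat layout are assumptions, not taken from the OPB specification):

```python
def split_transfer(address, width_bytes, slave_width_bytes):
    """Sketch of OPB-bridge dynamic bus sizing: a transfer wider than
    the selected slave supports is split into two or more narrower
    transfers. Returns the (address, size) beats seen by the slave."""
    if width_bytes <= slave_width_bytes:
        return [(address, width_bytes)]   # no splitting needed
    beats = []
    for offset in range(0, width_bytes, slave_width_bytes):
        beats.append((address + offset, slave_width_bytes))
    return beats

# A 32-bit (4-byte) write to a byte-wide slave becomes four byte beats:
print(split_transfer(0x100, 4, 1))
# A half-word transfer to a 32-bit slave passes through unchanged:
print(split_transfer(0x200, 2, 4))
```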
Some of the main features of the OPB specication are:
Fully synchronous
Dynamic bus sizing: byte, half-word, full-word, and double-word transfers
Separate address and data buses
Support for multiple OPB bus masters
Single cycle transfer of data between OPB bus master and OPB slaves
Sequential address (burst) protocol
16-cycle fixed bus timeout provided by the OPB arbiter
Bus arbitration overlapped with last cycle of bus transfers
Optional OPB DMA transfers
20.3.3 Device Control Register Bus
The DCR bus provides an alternative path to the system for setting the individual device control registers.
The latter are on-chip registers that are implemented outside the processor core, from an architectural
viewpoint. Through the DCR bus, the host CPU can set up the device-control-register sets without
loading down the main PLB. This bus has a single master, the CPU interface, which can read or write
to the individual device control registers. The DCR bus architecture allows data transfers among OPB
peripherals to occur independently from, and concurrently with data transfers between processor and
memory, or among other PLB devices. The DCR bus architecture is based on a ring topology to connect
the CPU interface to all devices. The DCR bus is typically implemented as a distributed multiplexer across
the chip such that each subunit not only has a path to place its own DCRs on the CPU read path, but
also has a path which bypasses its DCRs and places another unit's DCRs on the CPU read path. The DCR bus
consists of a 10-bit address bus and a 32-bit data bus.
This is a synchronous bus, wherein slaves may be clocked either faster or slower than the master,
although a synchronization of clock signals with the DCR bus clock is required.
Finally, bursts are not supported by this bus, and read or write transfers take a minimum of two cycles.
Optionally, they can be extended by the slaves or by the single master.
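The distributed-multiplexer read path described above can be sketched as a walk around the ring, where each subunit either drives its own register value onto the path or bypasses it unchanged (register addresses and values are illustrative):

```python
def dcr_read(units, address):
    """Sketch of the DCR daisy-chain read path. units: list of dicts
    (each subunit's DCR address -> value), chained in ring order.
    The unit that decodes the address drives the read path; every
    other unit takes the bypass path and forwards the value as-is."""
    value = 0  # value entering the ring from the CPU interface
    for regs in units:
        if address in regs:
            value = regs[address]   # this unit's DCR joins the read path
        # else: bypass path, incoming value forwarded unchanged
    return value

ring = [{0x001: 0xAA}, {0x002: 0xBB}, {0x003: 0xCC}]
print(hex(dcr_read(ring, 0x002)))
```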
20.4 STBus
STBus is an STMicroelectronics proprietary on-chip bus protocol. STBus is dedicated to SoCs designed for
high-bandwidth applications such as audio/video processing [10]. The STBus interfaces and protocols are
closely related to the industry-standard VCI (Virtual Component Interface). The components interconnected
by an STBus are either initiators (which initiate transactions on the bus by sending requests), or targets
(which respond to requests). The bus architecture is decomposed into nodes (sub-buses in which initiators
and targets can communicate directly), and the internode communications are performed through First
In First Out (FIFO) buffers. Figure 20.5 shows a schematic view of the STBus interconnect.
STBus implements three different protocols that can be selected by the designer in order to meet the
complexity, cost, and performance constraints. From lower to higher, they can be listed as follows:
Type 1: Peripheral protocol. This type is the low-cost implementation for low/medium-performance.
Its simple design allows a synchronous handshake protocol and provides a limited transaction set. The
peripheral STBus is targeted at modules that require a low complexity medium data rate communication
path with the rest of the system. This typically includes standalone modules such as general-purpose
input/output or modules which require independent control interfaces in addition to their main memory
interface.
Type 2: Basic protocol. In this case, the limited operation set of the peripheral interface is extended to
a full operation set, including compound operations, source labeling and some priority and transaction
labeling. Moreover, this implementation supports split and pipelined accesses, and is aimed at devices
which need high performance but do not require the additional system efficiency associated with shaped
request/response packets or the ability to reorder outstanding operations.
Type 3: Advanced protocol. The most advanced implementation upgrades previous interfaces with
support for out-of-order execution and shaped packets, and is equivalent to the advanced VCI protocol.
Split and pipelined accesses are supported. It allows the improvement of performance either by allowing
more operations to occur concurrently, or by rescheduling operations more efficiently.
A type 2 protocol preserves the order of requests and responses. One constraint is that, when commu-
nicating with a given target, an initiator cannot send a request to a new target until it has received all the
responses from the current target. Requests still awaiting responses are called pending, and a pending request
controller manages them. A given type 2 target is assumed to send the responses in the same order as the
request arrival order. In type 3 protocol, the order of responses may not be guaranteed, and an initiator
can communicate with any target, even if it has not received all responses from a previous one.
[Figure: initiator IPs (masters) and targets (slaves) attached to the STBus through Type 1, Type 2, and Type 3 interfaces; IPs with other bus interfaces connect through STBus interface converters.]
FIGURE 20.5 Schematic view of the STBus interconnect.
Associated with these protocols, hardware components have been designed in order to build complete
reconfigurable interconnections between initiators and targets. A toolkit with a graphical interface has been
developed around STBus to automatically generate the top-level backbone, cycle-accurate high-level models,
a path to implementation, bus analysis (latencies, bandwidth), and bus verification (protocol and behavior).
An STBus system includes three generic architectural components. The node arbitrates and routes the
requests and optionally the responses. The converter is in charge of converting the requests from one
protocol to another (for instance, from basic to advanced). Finally, the size converter is used between two
buses of the same type but of different widths. It includes buffering capability.
The STBus can implement various arbitration strategies and allows them to be changed dynamically. In a
simplified single-node system example, a communication between one initiator and a target is performed
in several steps:
A request/grant step between the initiator and the node takes place, corresponding to an atomic
rendezvous operation of the system.
The request is transferred from the node to the target.
A response-request/grant step is carried out between the target and the node.
The response-request is transferred from the node to the initiator.
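The four steps above can be sketched as a sequence of events through a single node (the component names and event strings are illustrative, not STBus signal names):

```python
def stbus_transaction(initiator, target):
    """Return the ordered events of a single-node STBus communication,
    following the four steps listed in the text."""
    return [
        f"{initiator} -> node : request/grant (rendezvous)",  # step 1
        f"node -> {target} : request transferred",            # step 2
        f"{target} -> node : response-request/grant",         # step 3
        f"node -> {initiator} : response transferred",        # step 4
    ]

for event in stbus_transaction("CPU", "SDRAM"):
    print(event)
```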
20.4.1 Bus Topologies
STBus can instantiate different bus topologies, trading off communication parallelism against architectural
complexity. In particular, system interconnects with different scalability properties can be instantiated
such as:
Single shared bus: suitable for simple low-performance implementations. It features minimum
wiring area but limited scalability.
Full crossbar: targets complex high-performance implementations. Large wiring area overhead.
Partial crossbar: intermediate solution, medium performance, implementation complexity, and
wiring overhead.
It is worth observing that STBus allows for the instantiation of complex bus systems such as hetero-
geneous multi-node buses (thanks to size or type converters) and facilitates bridging with different bus
architectures, provided proper protocol converters are made available (e.g., STBus and AMBA).
20.5 Wishbone
The Wishbone SoC interconnect [11] defines two types of interfaces, called master and slave. Master
interfaces are cores that are capable of generating bus cycles, while slave interfaces are capable of receiving
bus cycles. Some relevant Wishbone features that are worth mentioning are the multi-master capability
which enables multiprocessing, the arbitration methodology defined by end users according to their
needs, and the scalable data bus widths and operand sizes. Moreover, the hardware implementation of
bus interfaces is simple and compact, and the hierarchical view of the Wishbone architecture supports
structured design methodologies [12].
The hardware implementation supports various IP core interconnection schemes, including: point-to-
point connection, shared bus, crossbar switch implementation, data flow interconnection, and off-chip
interconnection. The crossbar switch interconnection is usually used when connecting two or more
masters together so that every one can access two or more slaves. In this scheme, the master initiates an
addressable bus cycle to a target slave. The crossbar switch interconnection allows more than one master
to use the bus provided they do not access the same slave. In this way, the master requests a channel on
the switch and, once this is established, data is transferred in a point-to-point way.
On one hand the overall data transfer rate of the crossbar switch is higher than shared bus mechan-
isms, and can be expanded to support extremely high data transfer rates. On the other hand, the main
disadvantage is a more complex interconnection logic and routing resources.
20.5.1 The Wishbone Bus Transactions
The Wishbone architecture defines different transaction cycles according to the action performed (read
or write) and the blocking/nonblocking access. For instance, single read/write transfers are carried out as
follows. The master requests the operation and places the slave address onto the bus. Then the slave places
data onto the data bus and asserts an acknowledge signal. The master monitors this signal and releases the
request signals when data have been latched. Two or more back-to-back read/write transfers can also be
strung together. In this case, the starting and stopping points of the transfers are identified by the assertion
and negation of a specific signal [13].
A Read-Modify-Write (RMW) transfer is also specified, which can be used in multiprocessor and
multitasking systems in order to allow multiple software processes to share common resources by using
semaphores. This is commonly done on interfaces for disk controllers, serial ports, and memory. The
RMW transfer reads and writes data to a memory location in a single bus cycle. For the correct imple-
mentation of this bus transaction, shared bus interconnects have to be designed in such a way that
once the arbiter grants the bus to a master, it will not rearbitrate the bus until the current master
gives it up. Also, it is important to note that a master device must support the RMW transfer in
order to be effective, and this is generally done by means of special instructions forcing RMW bus
transactions.
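A sketch of why the RMW cycle must be atomic with respect to arbitration, using a test-and-set semaphore (the class and method names are illustrative; Wishbone itself defines signals, not this API):

```python
class WishboneSharedBus:
    """Behavioral sketch of an RMW semaphore access on a shared bus:
    the arbiter must not rearbitrate between the read and the write
    halves of the cycle, or two masters could both 'acquire' it."""

    def __init__(self):
        self.memory = {0x10: 0}   # 0 = semaphore free, 1 = taken
        self.locked_by = None

    def rmw_test_and_set(self, master, addr):
        """Read-modify-write in one locked bus cycle; returns True if
        this master acquired the semaphore."""
        assert self.locked_by in (None, master), "bus locked by another master"
        self.locked_by = master           # arbiter will not rearbitrate
        old = self.memory[addr]           # read phase
        self.memory[addr] = 1             # write phase, same bus cycle
        self.locked_by = None             # cycle ends, bus released
        return old == 0

bus = WishboneSharedBus()
print(bus.rmw_test_and_set("cpu0", 0x10))  # acquires the semaphore
print(bus.rmw_test_and_set("cpu1", 0x10))  # semaphore already taken
```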
20.6 SiliconBackplane MicroNetwork
SiliconBackplane MicroNetwork is a family of innovative communication architectures licensed by Sonics
for use in SoC design. The Sonics architecture provides CPU independence, true mix-and-match of IP
cores, a unified communication medium, and a structure that makes an SoC design simpler to partition,
analyze, design, verify, and test [14].
The SiliconBackplane MicroNetwork allows high-speed pipelined transactions (the data bandwidth of the
interconnect scales from 50 Mbyte/sec to 4.8 Gbyte/sec) where the real-time Quality of Service (QoS) of
multiple simultaneous data flows is guaranteed. A network utilization of up to 90% can be achieved.
The SiliconBackplane relies on the SonicsStudio development environment for architectural explor-
ation, and the availability of pre-characterization results enables reliable performance analysis and
reduction of interconnect timing closure uncertainties. The ultimate goal is to avoid over-designing
interconnects.
The architecture can be described as a distributed communication infrastructure (thus facilitating place-
and-route) which can be extended hierarchically in the form of Tiles (collection of functions requiring
minimal assistance from the rest of the die) in an easy way. Among other features, the SiliconBackplane
MicroNetwork provides advanced error handling in hardware (features for SoC-wide error detection
and support mechanisms for software clean-up and recovery of unresponsive cores), runtime
reconfiguration to meet changing application demands, and data multicast.
The SiliconBackplane system consists of a physical interconnect bus configured with a combination
of agents. Each IP core communicates with an attached agent through ports implementing
the Open Core Protocol (OCP) standard interface. The agents then communicate with each other
using a network of interconnects based on the SiliconBackplane protocol. This latter includes paten-
ted transfer mechanisms aiming at maximizing interconnect bandwidth utilization and optimized for
streaming multimedia applications [15]. Figure 20.6 shows a schematic view of the SiliconBackplane
system.
2006 by Taylor & Francis Group, LLC
SoC Communication Architectures 20-13
FIGURE 20.6 Schematic view of the SiliconBackplane system.
A few specific components can be identified in an agent architecture:
Initiators, which implement the interface between the bus and a master core (CPU, DSP, DMA, etc.).
The initiator receives requests from the OCP, then transmits the requests according to the SiliconBackplane
standard, and finally processes the responses from the target.
Targets, which implement the interface between the physical bus and a slave device (memories, UARTs,
etc.). This module serves as the bridge between the system and the OCP.
Service agents, which are enhanced initiators providing additional capabilities such as debug and test.
20.6.1 System Interconnect Bandwidth
One of the most interesting features of the SiliconBackplane network is the possibility of allocating
bandwidth based on a two-level arbitration policy. The system designer can preallocate bandwidth to
high-priority initiators by means of Time-Division Multiple Access (TDMA). An initiator
agent with a preassigned time slot has first rights over that slot. If the owner does not need it, the slot is
reallocated in a round-robin fashion to one of the system devices; this represents the second level of
the arbitration policy.
The TDMA approach provides fast access to variable-latency subsystems and is a simple mechanism
to guarantee QoS. The TDMA bandwidth allocation tables are stored in a configuration register at every
initiator, and can be dynamically overwritten to fit the system needs. On the other hand, the fair round-robin
allocation scheme can be used to guarantee bandwidth availability to initiators with less predictable
access patterns, since some or many of the TDMA slots may turn out to be left unallocated. A round-robin
arbitration policy is particularly suitable for best-effort traffic.
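The two-level policy described above can be sketched in a few lines of Python. This is an illustrative model only; the table layout and names are assumptions, not Sonics' actual implementation:

```python
# Sketch of SiliconBackplane-style two-level arbitration: a TDMA wheel
# grants preallocated slots; an unclaimed slot falls back to round-robin
# among the requesting initiators. Names are illustrative.

def make_arbiter(tdma_table, initiators):
    """tdma_table[i] = initiator owning slot i, or None if unallocated."""
    rr_next = 0  # round-robin pointer (second arbitration level)

    def grant(slot, requests):
        """Return the initiator granted the bus for this slot."""
        nonlocal rr_next
        owner = tdma_table[slot % len(tdma_table)]
        if owner is not None and owner in requests:
            return owner          # first level: the slot owner has priority
        # second level: reallocate the slot round-robin
        for k in range(len(initiators)):
            cand = initiators[(rr_next + k) % len(initiators)]
            if cand in requests:
                rr_next = (initiators.index(cand) + 1) % len(initiators)
                return cand
        return None               # idle slot

    return grant

grant = make_arbiter(["DSP", None, "CPU", None], ["CPU", "DSP", "DMA"])
print(grant(0, {"DSP", "DMA"}))  # slot 0 is owned by DSP -> DSP
print(grant(1, {"CPU", "DMA"}))  # unallocated slot -> round-robin pick
```

Note how an owned but unclaimed slot (the owner is not requesting) degrades gracefully into a best-effort round-robin grant, which is the behavior the text describes.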
20.6.2 Configuration Resources
All the configurable IP cores implemented in the SiliconBackplane system can be configured either at
compile time or dynamically by means of specific configuration registers. These configuration devices are
accessible by the operating system.
Configuration registers are individually set for each agent, depending upon the services provided to the
attached cores. The types of configuration registers are:
Unbuffered registers hold configuration values for the agent or its subsystem core.
Buffered registers hold configuration values that must be simultaneously updated in all agents.
Broadcast configuration registers hold values that must remain identical in multiple agents.
20-14 Embedded Systems Handbook
20.7 Other On-Chip Interconnects
20.7.1 Peripheral Interconnect Bus
The PI Bus was developed by several European semiconductor companies (Advanced RISC Machines,
Philips Semiconductors, SGS-THOMSON Microelectronics, Siemens, TEMIC/MATRA MHS) within the
framework of a European project (OMI, the Open Microprocessor Initiative).1 Philips has since developed an
extended, backward-compatible PI Bus protocol standard that is frequently used in many hardware
systems [16].
The high bandwidth and low overhead of the PI Bus provide a comfortable environment for connecting
processor cores, memories, coprocessors, I/O controllers, and other functional blocks in high-performance
chips for time-critical applications.
The PI Bus functional modules are arranged in macrocells, and a wide range of functions are provided.
Macrocells with a PI Bus interface can be easily integrated into a chip layout even if they are designed by
different manufacturers.
The potential bus agents require only a PI Bus interface of low complexity. Since there is no concrete
implementation specied, PI Bus can be adapted to the individual requirements of the target chip design.
For instance, the widths of the address and data bus may be varied. The main features of this bus are:
Processor independent implementation and design
Demultiplexed operation
Clock synchronous
Peak transfer rate of 200 MB/sec (50 MHz bus clock)
Address and data bus scalable (up to 32 bits)
8-, 16-, 32-bit data access
Broad range of transfer types from single to multiple data transfers
Multi-master capability
The PI Bus does not provide cache coherency support, broadcasts, dynamic bus sizing, and unaligned
data access. Finally, the University of Sussex has developed a VHDL toolkit to meet the needs of embedded
system designers using the PI bus. Macrocell testing for PI bus compliance is also possible using the
framework available in the toolkit [17].
20.7.2 Avalon
Avalon is Altera's parameterized interface bus used by the Nios embedded processor. The Avalon switch
fabric has a set of predefined signal types with which a user can connect one or more IP blocks. It can
only be implemented on Altera devices using SOPC Builder, a system development tool that automatically
generates the Avalon switch fabric logic [18].
The Avalon switch fabric enables simultaneous multi-master operation for maximum system performance
by using a technique called slave-side arbitration. It determines which master gains access to a certain
slave in the event that multiple masters attempt to access the same slave at the same time. Therefore,
simultaneous transactions for all bus masters are supported, and arbitration for peripherals or memory
interfaces that are shared among masters is automatically included.
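A toy model can make slave-side arbitration concrete. The master names and the fixed-priority scheme below are illustrative assumptions, not Avalon's actual arbitration rules:

```python
# Illustrative model of slave-side arbitration: each slave port has its
# own arbiter, so masters addressing different slaves are all granted in
# the same cycle; arbitration happens only when two masters contend for
# the same slave.

def slave_side_arbitrate(requests, priority):
    """requests: {master: slave}; priority: master list, highest first.
    Returns {slave: granted_master} for one cycle."""
    grants = {}
    for master in priority:            # per-slave fixed-priority arbiter
        slave = requests.get(master)
        if slave is not None and slave not in grants:
            grants[slave] = master     # highest-priority requester wins
    return grants

# Three masters; two of them contend for the same memory:
cycle = slave_side_arbitrate(
    {"cpu": "sdram", "dma": "sdram", "dsp": "uart"},
    priority=["dma", "cpu", "dsp"],
)
print(cycle)  # sdram goes to dma, uart to dsp; cpu waits one cycle
```

The key point matches the text: the dsp/uart transaction proceeds in parallel with the dma/sdram one, instead of being serialized behind a single bus-wide arbiter.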
The Avalon interconnect includes chip-select signals for all peripherals, even user-defined peripherals,
to simplify the design of the embedded system. Separate, dedicated address and data paths provide an easy
interface to on-chip user logic. User-defined peripherals are not required to decode data and address bus
cycles. Dynamic bus sizing allows developers to use low-cost, narrow memory devices that do not match
the native bus size of their CPU. The switch fabric supports each type of transfer supported by the Avalon
interface. Each peripheral port into the switch is generated with a reduced amount of logic to meet the
requirements of the peripheral, including wait-state logic, data width matching, and passing of wait signals.
1 The PI Bus has been incorporated as OMI Standard OMI 324.3D.
Read and write operations with latency can be performed. Latent transfers are useful to masters wanting
to issue multiple sequential read or write requests to a slave, which may require multiple cycles for the first
transfer but fewer cycles for subsequent sequential transfers. This can be beneficial for instruction-fetch
operations and DMA transfers to or from SDRAM. In these cases, the CPU or DMA master may prefetch
(post) multiple requests prior to completion of the first transfer and thereby reduce overall access latency.
Interestingly, the Avalon interface includes signals for streaming data between master/slave pairs. These
signals indicate the peripheral's capacity to provide or accept data. A master does not have to access
status registers in the slave peripheral to determine whether the slave can send or receive data. Streaming
transactions maximize throughput between master-slave pairs, while avoiding data overflow or underflow
on the slave peripherals. This is especially useful for DMA transfers [19].
20.7.3 CoreFrame
The CoreFrame architecture has been developed by Palmchip Corporation and relies on point-to-point
signals and multiplexing instead of shared tristate lines. It aims at delivering high performance while
simultaneously reducing design and verification time. The distinctive features of CoreFrame are [20]:
400 MB/sec bandwidth at 100 MHz (bus speed is scalable to technology and design requirements)
Unidirectional buses only
Central, shared memory controller
Single clock cycle data transfers
Zero wait state register accesses
Separate peripheral I/O and DMA buses
Simple protocol for reduced gate count
Low-capacitive loading for high-frequency operation
Hidden arbitration for DMA bus masters
Application-specific memory map and peripherals
The most distinctive feature of CoreFrame is the separation of I/O and memory transfers onto different
buses. The PalmBus provides the I/O backplane and allows the processor to configure and control
peripheral blocks, while the MBus provides a DMA connection from peripherals to main memory, allowing
direct data transfer without processor intervention.
Other on-chip interconnects are not described here owing to lack of space: IPBus from IDT [21], IP
Interface from Motorola [22], MARBLE asynchronous bus from University of Manchester [23], Atlantic
from Altera [24], ClearConnect from ClearSpeed Techn. [25], and FISPbus from Mentor Graphics [26].
20.8 Analysis of Communication Architectures
Traditional SoC interconnects, as exemplified by AMBA AHB, are based upon low-complexity shared
buses, in an attempt to minimize area overhead. Such architectures, however, are not adequate to support
the trend for SoC integration, motivating the need for more scalable designs. Interconnect performance
improvement can be achieved by adopting new topologies and by choosing new protocols, at the expense
of silicon area. The former strategy leads from shared buses to bridged clusters, partial or full crossbars,
and eventually to NoC, in an attempt to increase available bandwidth and to reduce local contention. The
latter strategy instead tries to maximize link utilization by adopting more sophisticated control schemes,
thus permitting better sharing of existing resources. While both approaches can be followed at the
same time, we analyze them separately for the sake of clarity.
First, the scalability of evolving interconnect fabric protocols is assessed. Three state-of-the-art shared
buses are stressed under an increasing traffic load: a traditional AMBA AHB link, and the more advanced,
but also more expensive, evolutionary solutions offered by STBus (Type 3) and AMBA AXI (based upon
a Synopsys implementation).
These system interconnects were selected for analysis because of their distinctive features, which make it
possible to sketch the evolution of shared-bus-based communication architectures. AMBA AHB makes two data
links (one for read, one for write) available, but only one of them can be active at any time. Only one bus
master can own the data wires at any time, preventing the multiplexing of requests and responses on the
interconnect signals. Transaction pipelining (i.e., split ownership of data and address lines) is provided,
but not as a means of allowing multiple outstanding requests, since address sampling is only allowed at
the end of the previous data transfer. Bursts are supported, but only as a way to cut down on rearbitration
times, and AHB slaves do not have a native burst notion. Overall, AMBA AHB is designed for a low silicon
area footprint.
The STBus interconnect (with a shared bus topology) implements split request and response channels.
This means that, while a system initiator is receiving data from an STBus target, another one can issue
a second request to a different target. As soon as the response channel frees up, the second request can
immediately be serviced, thus hiding target wait states behind those of the first transfer. The number of
saved wait states depends on the depth of the prefetch FIFO buffers on the slave side. Additionally, the
split-channel feature allows for multiple outstanding requests by masters, with support for out-of-order
retirement. An additional relevant feature of STBus is its low-latency arbitration, which is performed in a
single cycle.
Finally, AMBA AXI builds upon the concept of point-to-point connection and exhibits complex features,
such as multiple outstanding transaction support (with out-of-order or in-order delivery selectable by
means of transaction IDs) and time interleaving of traffic toward different masters on internal data lanes.
Four different logical monodirectional channels are provided in AXI interfaces, and activity on them can
be parallelized, allowing multiple outstanding read and write requests. In our protocol exploration, to
provide a fair comparison, a shared bus topology is assumed, which comprises a single internal lane
for each of the four AXI channels.
Figure 20.7 shows an example of the efficiency improvements made possible by advanced interconnects
in the test case of slave devices having two wait states, with three system processors and four-beat burst
FIGURE 20.7 Concept waveforms showing burst interleaving for the three interconnects. (a) AMBA AHB, (b) STBus
(with minimal buffering), (c) STBus (with more buffering), and (d) AMBA AXI.
transfers. AMBA AHB has to pay two cycles of penalty per transferred datum. STBus is able to hide the
latencies of subsequent transfers behind those of the first one, with an effectiveness which is a function of
the available buffering. AMBA AXI is capable of interleaving transfers by sharing data channel ownership
in time. Under conditions of peak load, when transactions always overlap, AMBA AHB is limited to
a 33% efficiency (transferred words over elapsed clock cycles), while both STBus and AMBA AXI can
theoretically reach 100% throughput.
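These figures follow from simple cycle counting; the helper below (an illustrative back-of-the-envelope model, not a simulator) reproduces them for four-beat bursts with two wait states:

```python
# Back-of-the-envelope check of the efficiency figures above. AMBA AHB
# pays the wait states on every datum; a split/interleaved protocol
# (STBus, AXI) can fill one transfer's wait cycles with other masters'
# data beats.

def ahb_efficiency(beats, wait_states):
    """Words transferred over elapsed cycles when every datum costs
    (1 + wait_states) cycles and nothing overlaps."""
    return beats / (beats * (1 + wait_states))

def interleaved_efficiency(wait_states, masters):
    """With enough overlapping masters, the data bus never idles."""
    if masters >= 1 + wait_states:
        return 1.0                      # fully interleaved: 100%
    return masters / (1 + wait_states)  # partially filled wait cycles

print(ahb_efficiency(4, 2))             # 4 words in 12 cycles ~ 0.33
print(interleaved_efficiency(2, 3))     # 1.0: STBus/AXI at peak load
```

With two wait states per datum, three overlapping processors are exactly enough to keep the data lines busy every cycle, which is why the test case above can saturate STBus and AXI at 100%.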
20.8.1 Scalability Analysis
SystemC models of AMBA AHB, AMBA AXI (provided within the Synopsys CoCentric/DesignWare
suites [27]), and STBus are used within the framework of the MPARM simulation platform [28–30]. For the
STBus model, the depth of the FIFOs instantiated by the target side of the interconnect is a configurable
parameter; their impact can be noticed in the concept waveforms of Figure 20.7. One-stage (STBus hereafter)
and four-stage (STBus [B]) FIFOs were benchmarked.
The simulated on-chip multiprocessor consists of a configurable number of ARM cores attached to the
system interconnect. Traffic workload and pattern can easily be tuned by running different benchmark
code on the cores, by scaling the number of system processors, or by changing the amount of processor
cache, which leads to different amounts of cache refills. Slave devices are assumed to introduce one wait
state before responses.
To assess interconnect scalability, a benchmark independently but concurrently runs on every system
processor, performing accesses to its private slave (involving bus transactions). This means that, while
producing real functional traffic patterns, the test setup was not constrained by bottlenecks owing to
shared slave devices.
Scalability properties of the system interconnects can be observed in Figure 20.8, reporting the execution
time variation when attaching an increasing number of system cores to a single shared interconnect under
heavy traffic load. Core caches are kept very small (256 bytes) in order to cause many cache misses
and therefore significant levels of interconnect congestion. Execution times are normalized against those
for a two-processor system, trying to isolate the scalability factor alone. The heavy bus congestion case is
considered here because the same analysis performed under light traffic conditions (e.g., with 1 kB caches)
shows that all of the interconnects perform very well (they all always stay close to 100%), with only AHB
showing a moderate performance decrease of 6% when moving from two to eight running processors.
With 256-byte caches, the resulting execution times, as Figure 20.8 shows, get 77% worse for AMBA
AHB when moving from two to eight cores, while AXI and STBus manage to stay within 12% and 15%. The
impact of FIFOs in STBus is noticeable, since the interconnect with minimal buffering shows execution
times 36% worse than in the two-core setup. The reason behind the behavior pointed out in Figure 20.8 is
that, under heavy traffic load and with many processors, interconnect saturation takes place. This is clearly
indicated in Figure 20.9, which reports the fraction of cycles during which some transaction was pending
on the bus with respect to total execution time.
In such a congested environment, as Figure 20.10 shows, AMBA AXI and STBus (with four-stage FIFOs) are
able to achieve transfer efficiencies (defined as data actually moved over bus contention time) of up to 81%
and 83%, respectively, while AMBA AHB reaches only 47%, near its maximum theoretical efficiency
of 50% (one wait state per data word). These plots stress the impact that comparatively low-area-overhead
optimizations can sometimes have in complex systems.
According to simulation results, some of the advanced features in AMBA AXI provide highly scalable
bandwidth, but at the price of latency in low-contention setups. Figure 20.11 shows the minimum and
average number of cycles required to complete a single write and a burst read transaction in STBus
and AMBA AXI. STBus has a minimal overhead for transaction initiation, as low as a single cycle if
communication resources are free. This is confirmed by figures showing a best-case three-cycle latency for
single accesses (initiation, wait state, data transfer) and a nine-cycle latency for four-beat bursts. AMBA
AXI, owing to its complex channel management and arbitration, requires more time to initiate and close
a transaction: recorded minimum completion times are 6 and 11 cycles for single writes and burst reads,
FIGURE 20.8 Execution times with 256-byte caches.
FIGURE 20.9 Bus busy time with 256-byte caches.
respectively. As bus traffic increases, the completion latencies of AMBA AXI and STBus become more and
more similar, because the bulk of transaction latency is spent in contention.
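The quoted best-case STBus figures fit a simple linear model. The model below is a hypothetical reconstruction from the reported numbers, not a formula from either specification:

```python
# Latency model inferred from the figures above: one initiation phase
# followed by (wait state + data) cycles per beat, with slaves inserting
# one wait state per beat.

def best_case_latency(init_cycles, beats, wait_states=1):
    """Uncontended completion time of a transaction, in cycles."""
    return init_cycles + beats * (wait_states + 1)

print(best_case_latency(1, 1))  # 3: STBus single access
print(best_case_latency(1, 4))  # 9: STBus four-beat burst
print(best_case_latency(4, 1))  # 6: AXI single write, inferred overhead
```

Note that the recorded 11-cycle AXI burst read does not fit the same constant-overhead model (it would predict 12 cycles), suggesting that part of AXI's channel-management cost is amortized over the burst.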
It must be pointed out, however, that protocol improvements alone cannot overcome the intrinsic
performance bound owing to the shared nature of the interconnect resources. While protocol features can
push the saturation boundary further and get near to 100% efficiency, traffic loads taking advantage of
more parallel topologies will always exist. The charts reported here already show some traces of saturation
even for the most advanced interconnects. However, the improved performance achieved by more parallel
topologies strongly depends on the kind of bus traffic. In fact, if the traffic is dominated by accesses to
FIGURE 20.10 Bus usage efficiency with 256-byte caches.
FIGURE 20.11 Transaction completion latency with 256-byte caches.
shared devices (shared memory, semaphores, interrupt module), these accesses have to be serialized anyway, thus
reducing the effectiveness of area-hungry parallel topologies. It is therefore evident that crossbars behave
best when data accesses are local and no destination conflicts arise.
This is reflected in Figure 20.12, showing average completion latencies of read accesses for different bus
topologies: shared buses (AMBA AHB and STBus), partial crossbars (STBus-32 and STBus-54), and full
crossbars (STBus-FC). Four benchmarks are considered, consisting of matrix multiplications performed
independently by each processor or in a pipeline, with or without an underlying OS (Operating System)
FIGURE 20.12 Average read latency.
(OS-IND, OS-PIP, ASM-IND, and ASM-PIP, respectively). IND benchmarks do not give rise to interprocessor
communication, which is instead at the core of the PIP benchmarks. Communication goes through
the shared memory. Moreover, OS-assisted code implicitly uses both semaphores and interrupts, while
standalone ASM applications rely on an explicit semaphore polling mechanism for synchronization purposes.
Crossbars show a substantial advantage in the OS-IND and ASM-IND benchmarks, wherein processors
only access private memories: this operation is obviously suitable for parallelization. Both ST-FC and ST-54
achieve the minimum theoretical latency, since no conflict on private memories ever arises. ST-32 trails
immediately behind ST-FC and ST-54, with rare conflicts which do not occur systematically because execution
times shift among conflicting processors. OS-PIP still shows significant improvement for crossbar
designs. ASM-PIP, in contrast, puts ST-BUS at the same level as crossbars, and sometimes the shared bus
even proves slightly faster. This can be explained by the continuous semaphore polling performed by
this (and only this) benchmark; while crossbars may have an advantage in private memory accesses, the
resulting speedup only gives processors more opportunities to poll the semaphore device, which becomes
a bottleneck. The unpredictability of conflict patterns can then explain why a simple shared bus can sometimes
slightly outperform crossbars; therefore, the selection of bus topology should carefully match the target
communication pattern.
20.9 Packet-Switched Interconnection Networks
Previous sections have illustrated on-chip interconnection schemes based on shared buses and on
evolutionary communication architectures. This section introduces a more revolutionary approach to on-chip
communication, known as Network-on-Chip [2,3].
The NoC architecture consists of a packet-switched interconnection network integrated onto a single
chip, and it is likely to better support the trend for SoC integration. The basic idea is borrowed from
the wide-area networks domain, and envisions router (or switch)-based networks of interconnects on
which on-chip packetized communication takes place. Cores access the network by means of proper
interfaces, and have their packets forwarded to destination through a certain number of hops. SoCs
differ from wide area networks in their local proximity and because they exhibit less nondeterminism.
Local, high-performance networks such as those developed for large-scale multiprocessors have
similar requirements and constraints. However, some distinctive features, such as energy constraints and
design-time specialization, are unique to SoC networks.
Topology selection for NoCs is a critical design issue. It is determined by how efficiently the communication
requirements of an application can be mapped onto a certain topology, and by physical-level considerations.
In fact, regular topologies can be designed with better control of electrical parameters, and therefore
of communication noise sources (such as crosstalk), although they might result in link under-utilization
or localized congestion from an application viewpoint. On the contrary, irregular topologies have to deal
with more complex physical design issues, but are more suitable for implementing customized, domain-specific
communication architectures. Two-dimensional mesh networks are a reference solution for regular NoC
topologies.
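As a concrete illustration of packet forwarding on such a regular topology, the sketch below implements dimension-ordered (XY) routing, a common deterministic scheme for 2D meshes; it is illustrative and not tied to any specific NoC discussed here:

```python
# Dimension-ordered (XY) routing on a 2D mesh: a packet first travels
# along the X axis, then along Y, visiting one switch per hop. The hop
# count therefore equals the Manhattan distance between the endpoints.

def xy_route(src, dst):
    """Return the list of (x, y) switches a packet traverses."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:                 # X dimension first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:                 # then the Y dimension
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

route = xy_route((0, 0), (2, 1))
print(route)             # [(0, 0), (1, 0), (2, 0), (2, 1)]
print(len(route) - 1)    # 3 hops = |dx| + |dy|
```

XY routing is popular in regular meshes precisely because it is simple, deadlock-free, and needs no routing tables in the switches, at the cost of ignoring congestion along the fixed path.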
The scalable and modular nature of NoCs and their support for efficient on-chip communication
potentially lead to NoC-based multiprocessor systems characterized by high structural complexity and
functional diversity. On one hand, these features need to be properly addressed by means of new design
methodologies; on the other hand, more effort has to be devoted to modeling on-chip communication
architectures and integrating them into a single modeling and simulation environment combining
both processing elements and communication architectures. The development of NoC architectures and
their integration into a complete MPSoC design flow is the main focus of an ongoing worldwide research
effort [30–33].
20.10 Conclusions
This chapter addresses the critical issue of on-chip communication for gigascale MPSoCs. An overview of
the most widely used on-chip communication architectures is provided, and evolution guidelines aiming
at overcoming scalability limitations are sketched. Advances concern both communication protocols and
topologies, although it is becoming clear that, in the long term, more aggressive approaches, namely
packet-switched interconnection networks, will be required to sustain system performance.
References
[1] R. Ho, K.W. Mai, and M.A. Horowitz. The future of wires. Proceedings of the IEEE, 89:
490–504, 2001.
[2] L. Benini and G. De Micheli. Networks on chips: a new SoC paradigm. IEEE Computer, 35:
70–78, 2002.
[3] J. Henkel, W. Wolf, and S. Chakradhar. On-chip networks: a scalable, communication-centric
embedded system design paradigm. In Proceedings of the International Conference on VLSI Design,
January 2004, pp. 845–851.
[4] ARM. AMBA Specication v2.0, 1999.
[5] ARM. AMBA Multi-Layer AHB Overview, 2001.
[6] ARM. AMBA AXI Protocol Specication, 2003.
[7] IBM Microelectronics. CoreConnect Bus Architecture Overview, 1999.
[8] G.W. Doerre and D.E. Lackey. The IBM ASIC/SoC methodology. A recipe for first-time success.
IBM Journal of Research & Development, 46: 649–660, 2002.
[9] IBM Microelectronics. The CoreConnect Bus Architecture White Paper, 1999.
[10] P. Wodey, G. Camarroque, F. Barray, R. Hersemeule, and J.P. Cousin. LOTOS code generation for
model checking of STBus based SoC: the STBus interconnection. In Proceedings of ACM and IEEE
International Conference on Formal Methods and Models for Co-Design, June 2003, pp. 204–213.
[11] Richard Herveille. Combining WISHBONE Interface Signals, Application Note, April 2001.
[12] Richard Herveille. WISHBONE System-on-Chip (SoC) Interconnection Architecture for Portable IP
Cores. Specication, 2002.
[13] Rudolf Usselmann. OpenCores SoC Bus Review, 2001.
[14] Sonics Inc. μNetworks. Technical Overview, 2002.
[15] Sonics Inc. SiliconBackplane III MicroNetwork IP. Product Brief, 2002.
[16] Philip de Nier. Property checking of PI-bus modules. In Proceedings of the Workshop on Circuits,
Systems and Signal Processing (ProRISC99), J.P. Veen, Ed. STW, Technology Foundation, Mierlo,
The Netherlands, 1999, pp. 343–354.
[17] ESPRIT, 1996, http://www.cordis.lu/esprit/src/results/res_area/omi/omi10.htm
[18] Altera. AHB to Avalon & Avalon to AHB Bridges, 2003.
[19] Altera. Avalon Bus Specication, 2003.
[20] Palmchip. Overview of the CoreFrame Architecture, 2001.
[21] IDT. IDT Peripheral Bus (IPBus). Intermodule Connection Technology Enables Broad Range of
System-Level Integration, 2002.
[22] Motorola. IP Interface. Semiconductor Reuse Standard, 2001.
[23] W.J. Bainbridge and S.B. Furber. MARBLE: an asynchronous on-chip macrocell bus. Microprocessors
and Microsystems, 24: 213–222, 2000.
[24] Altera. Atlantic Interface. Functional Specication, 2002.
[25] ClearSpeed. ClearConnect Bus. Scalable High Performance On-Chip Interconnect, 2003.
[26] Summary of SoC Interconnection Buses, 2004, http://www.silicore.net/uCbusum.htm
[27] Synopsys CoCentric, 2004, http://www.synopsys.com
[28] L. Benini, D. Bertozzi, D. Bruni, N. Drago, F. Fummi, and M. Poncino. SystemC cosimulation and
emulation of multiprocessor SoC designs. IEEE Computer, 36: 53–59, 2003.
[29] F. Poletti, D. Bertozzi, A. Bogliolo, and L. Benini. Performance analysis of arbitration policies
for SoC communication architectures. Journal of Design Automation for Embedded Systems,
8: 189–210, 2003.
[30] M. Loghi, F. Angiolini, D. Bertozzi, L. Benini, and R. Zafalon. Analyzing on-chip communication in
an MPSoC environment. In Proceedings of the IEEE Design Automation and Test in Europe Conference
(DATE04), February 2004, pp. 752–757.
[31] E. Rijpkema, K. Goossens, and A. Radulescu. Trade-offs in the design of a router with both
guaranteed and best-effort services for networks on chip. In Proceedings of Design Automation and
Test in Europe, March 2003, pp. 350–355.
[32] K. Lee et al. A 51 mW 1.6 GHz on-chip network for low-power heterogeneous SoC platform.
In ISSCC Digest of Technical Papers, 2004, pp. 152–154.
[33] E. Bolotin, I. Cidon, R. Ginosar, and A. Kolodny. QNoC: QoS architecture and design process for
network on chip. The Journal of Systems Architecture, Special Issue on Networks on Chip, 50(2–3):
105–128, February 2004.
21 Network-on-Chip Design for Gigascale Systems-on-Chip
Davide Bertozzi and
Luca Benini
University of Bologna
Giovanni De Micheli
Stanford University
21.1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-1
21.2 Design Challenges for On-Chip Communication
Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-3
21.3 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-4
21.4 NoC Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-5
Network Link • Switch • Network Interface
21.5 NoC Topology. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-13
Domain-Specic NoC Synthesis Flow
21.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-16
Acknowledgment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-17
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-17
21.1 Introduction
The increasing integration densities made available by shrinking device geometries will have to be
exploited to meet the computational requirements of parallel applications, such as multimedia processing,
automotive, multiwindow TV, ambient intelligence, etc.
As an example, systems designed for ambient intelligence will be based on high-speed digital signal
processing, with computational loads ranging from 10 MOPS for lightweight audio processing, through 3 GOPS
for video processing and 20 GOPS for multilingual conversation interfaces, up to 1 TOPS for synthetic
video generation. This computational challenge will have to be addressed at manageable power levels and
affordable costs [1].
Such performance cannot be provided by a single processor, but requires a heterogeneous on-chip
multiprocessor system containing a mix of general-purpose programmable cores, application-specific
processors, and dedicated hardware accelerators.
In this context, performance of gigascale Systems-on-Chip (SoC) will be communication dominated,
and only an interconnect-centric system architecture will be able to cope with this problem. Current
on-chip interconnects consist of low-cost shared arbitrated buses, based on the serialization of bus access
requests; only one master at a time can be granted access to the bus. The main drawback of this solution
is its lack of scalability, which will result in unacceptable performance degradation for complex SoCs
21-1
FIGURE 21.1 Example of NoC architecture.
(more than a dozen integrated cores). Moreover, the connection of new blocks to a shared bus increases
its associated load capacitance, resulting in more energy-consuming bus transactions.
A scalable communication infrastructure that better supports the trend of SoC integration consists
of an on-chip micronetwork of interconnects, generally known as a Network-on-Chip (NoC) architecture
[2-4]. The basic idea is borrowed from the wide-area networks domain, and envisions router (or switch)-
based networks on which on-chip packetized communication takes place, as depicted in Figure 21.1. Cores
access the network by means of proper interfaces, and have their packets forwarded to destination through
a certain number of hops.
The scalable and modular nature of NoCs and their support for efficient on-chip communication
potentially leads to NoC-based multiprocessor systems characterized by high structural complexity and
functional diversity. On one hand, these features need to be properly addressed by means of new design
methodologies [5], while on the other hand more efforts have to be devoted to modeling on-chip
communication architectures and integrating them into a single modeling and simulation environment
combining both processing elements and communication infrastructures [6-8]. These efforts are needed
to include the on-chip communication architecture in any quantitative evaluation of system design during
design space exploration [9,10], so as to be able to assess the impact of the interconnect on achieving a
target system performance.
An important design decision for NoCs regards the choice of topology. Several researchers [4,5,11,12]
envision NoCs as regular tile-based topologies (such as mesh networks and fat trees), which are suitable
for interconnecting homogeneous cores in a chip multiprocessor. However, SoC component specialization
(used by designers to optimize performance at low power consumption and competitive cost) leads to
the on-chip integration of heterogeneous cores having varied functionality, size, and communication
requirements. If a regular interconnect is designed to match the requirements of a few communication-hungry
components, it is bound to be largely overdesigned with respect to the needs of the remaining
components. This is the main reason why most current SoCs use irregular topologies, such as
bridged buses and dedicated point-to-point links [13].
This chapter introduces basic principles and guidelines for NoC design. At first, the motivation for
the design paradigm shift of SoC communication architectures from shared buses to NoCs is examined.
Then, the chapter goes into the details of NoC building blocks (switch, network interface, and switch-to-switch
links), discussing the design guidelines and presenting a case study where some of the most advanced
concepts in NoC design have been applied to a real NoC architecture (called Xpipes and developed at the
University of Bologna [14]).
Finally, the challenging issue of heterogeneous NoC design will be addressed, and the effects of mapping
the communication requirements of an application onto a domain-specific NoC, instead of a network
with regular topology, will be detailed by means of an illustrative example.
21.2 Design Challenges for On-Chip Communication
Architectures
SoC design challenges that are driving the evolution of traditional bus architectures toward NoCs can be
outlined as follows:
Technology issues. While gate delays scale down with technology, global wire delays typically increase or
remain constant as repeaters are inserted. It is estimated that in 50 nm technology, at a clock frequency
of 10 GHz, a global wire delay might range from 6 to 10 clock cycles [2]. Therefore, limiting the on-chip
distance traveled by critical signals will be key to guaranteeing the performance of the overall system, and will
be a common design guideline for all kinds of system interconnects. In contrast, other challenges
posed by deep submicron technologies are leading to a paradigm shift in the design of SoC communication
architectures. For instance, global synchronization of cores on future SoCs will be unfeasible due to deep
submicron effects (clock skew, power associated with the clock distribution tree, etc.), and an alternative
scenario consists of self-synchronous cores that communicate with one another through a network-centric
architecture [15]. Finally, signal integrity issues (crosstalk, power supply noise, soft errors, etc.) will lead to
more transient and permanent failures of signals, logic values, devices, and interconnects, thus raising the
reliability concern for on-chip communication [16]. In many cases, on-chip networks can be designed as
regular structures, allowing electrical parameters of wires to be optimized and well controlled. This leads
to lower communication failure probabilities, thus enabling the use of low-swing signaling techniques [17],
and to the capability of exploiting performance optimization techniques, such as wavefront pipelining [18].
Performance issues. In traditional buses, all communication actors share the same bandwidth. As a
consequence, performance does not scale with the level of system integration, but degrades significantly.
On the other hand, once the bus is granted to a master, access occurs with no additional delay. NoCs, on
the contrary, can provide much better performance scalability. No delays are experienced in accessing the
communication infrastructure, since multiple outstanding transactions originated by multiple cores can
be handled at the same time, resulting in more efficient utilization of network resources. However, given a
certain network dimension (e.g., number of instantiated switches), large latency fluctuations for packet
delivery could be experienced as a consequence of network congestion. This is unacceptable when hard
real-time constraints of an application have to be met, and two solutions are viable: network overdimensioning
(for NoCs designed to support Best Effort [BE] traffic only) or implementation of dedicated mechanisms
to provide guarantees for timing-constrained traffic (e.g., loss-less data transport, minimal bandwidth,
bounded latency, minimal throughput, etc.) [19].
Design productivity issues. It is well known that synthesis and compiler technology development do not
keep up with IC manufacturing technology development [20]. Moreover, time-to-market needs to be kept
as low as possible. Reuse of complex preverified design blocks is an efficient means to increase productivity,
and regards both computation resources and the communication infrastructure [21]. It would be highly
desirable to have processing elements that could be employed in different platforms by means of a plug-and-play
design style. To this purpose, a scalable and modular on-chip network represents a more efficient
communication infrastructure compared with shared-bus-based architectures. However, the reuse of
processing elements is facilitated by the definition of standard network interfaces, which also make the
modularity property of the NoC effective. The Virtual Socket Interface Alliance (VSIA) has attempted to set
the characteristics of this interface industry-wide [22]. Open Core Protocol (OCP) [23] is another example
of a standard interface socket for cores. It is worth remarking that such network interfaces also decouple the
development of new cores from the evolution of new communication architectures. The core developer will
not have to make assumptions about the system into which the core will be plugged. Similarly, designers of
new on-chip interconnects will not be constrained by the knowledge of detailed interfacing requirements
for particular legacy SoC components. Finally, let us observe that NoC components (e.g., switches or
interfaces) can be instantiated multiple times in the same design (as opposed to the arbiter of traditional
shared buses, which is instance-specific) and reused in a large number of products targeting a specific
application domain.
The development of NoC architectures and protocols is fueled by the aforementioned arguments,
in spite of the challenges represented by the need for new design methodologies and an increased
complexity of system design.
21.3 Related Work
The need to progressively replace on-chip buses with micronetworks was extensively discussed in [2,4].
A number of NoC architectures have been proposed in the literature so far.
Sonics MicroNetwork [24] is an on-chip network making use of communication architecture-independent
interface sockets. The MicroNetwork is an example of the evolutionary solutions [25], which
start from a physical implementation as a shared bus and propose generalizations to support higher
bandwidth (such as partial and full crossbars).
The STBUS interconnect from STMicroelectronics is another example of an evolutionary architecture; it
provides designers with the capability to instantiate either shared bus or partial or full crossbar interconnect
configurations.
Even though these architectures provide higher bandwidth than simple buses, addressing the wiring
delay and scalability challenge in the long term requires more radical solutions.
One of the earliest contributions in this area is the Maia heterogeneous signal processing architecture,
proposed by Zhang et al. [26], based on a hierarchical mesh network. Unfortunately, Maia's interconnect
is fully instance-specific. Furthermore, routing is static at configuration time: network switches are
programmed once and for all for a given application (as in a Field Programmable Gate Array [FPGA]). Thus,
communication is based on circuit switching, as opposed to packet switching.
In this direction, Dally and Lacy [27] sketch the architecture of a VLSI multicomputer using 2009 technology.
A chip with 64 processor-memory tiles is envisioned. Communication is based on packet switching.
This seminal work draws upon past experiences in designing parallel computers and reconfigurable
architectures (FPGAs and their evolutions) [28-30].
Most proposed NoC platforms are packet switched and exhibit regular structure. An example is a
mesh interconnection, which can rely on a simple layout and on the independence of the switch design
from the network size. The NOSTRUM network described in Reference 5 takes this approach: the platform includes both
a mesh architecture and the design methodology. The Scalable Programmable Integrated Network (SPIN)
described in Reference 31 is another regular, fat-tree-based network architecture. It adopts cut-through
switching to minimize message latency and storage requirements in the design of network switches. The
Linköping SoCBUS [32] is a two-dimensional mesh network that uses a packet connected circuit (PCC)
to set up routes through the network: a packet is switched through the network, locking the circuit as it
goes. This notion of virtual circuit leads to deterministic communication behavior, but restricts routing
flexibility for the rest of the communication traffic.
The need to map communication requirements of heterogeneous cores may lead to the adoption of
irregular topologies. The motivation for such architectures lies in the fact that each block can be optimized
for a specific application (e.g., video or audio processing), and link characteristics can be adapted
to the communication requirements of the interconnected cores. Supporting heterogeneous architectures
requires a major design effort and leads to coarser-granularity control of physical parameters. Many recent
heterogeneous SoC implementations are still based on shared buses (such as the single-chip MPEG-2 codec
reported in Reference 33), but the growing complexity of customizable media embedded processor architectures
for digital media processing will soon require NoC-based communication architectures and proper
hardware/software development tools. The Aethereal NoC design framework presented in Reference 34
aims at providing a complete infrastructure for developing heterogeneous NoCs with end-to-end quality
of service guarantees. The network supports guaranteed throughput (GT) for real-time applications and
BE traffic for timing-unconstrained applications.
Support for heterogeneous architectures requires highly configurable network building blocks, customizable
at instantiation time for a specific application domain. For instance, the Proteo NoC [35] consists of
a small library of predefined, parameterized components that allow the implementation of a large range
of different topologies, protocols, and configurations.
The Xpipes interconnect [14] and its synthesizer XpipesCompiler [36] push this approach to the limit by
instantiating an application-specific NoC from a library of composable soft macros (network interface,
link, and switch). The components are highly parameterizable and provide reliable and latency-insensitive
operation.
21.4 NoC Architecture
Most of the terminology for on-chip packet-switched communication is adapted from the computer network
and multiprocessor domains. Messages that have to be transmitted across the network are usually
partitioned into fixed-length packets. Packets in turn are often broken into message flow control units called
flits. In the presence of channel width constraints, multiple physical channel cycles can be used to transfer
a single flit. A phit is the unit of information that can be transferred across a physical channel in a single
step. Flits represent logical units of information, as opposed to phits, which correspond to physical quantities.
In many implementations, a flit is set to be equal to a phit. The basic building blocks for packet-switched
communication across NoCs are:
1. Network link
2. Switch
3. Network interface
and will be described hereafter.
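The message/packet/flit/phit hierarchy above can be illustrated with a small sketch; the sizes used below are arbitrary choices for the example, not parameters prescribed by any particular NoC.

```python
# Sketch of the message -> packet -> flit -> phit decomposition.
# All sizes (bytes) are illustrative assumptions.

def packetize(message: bytes, packet_size: int, flit_size: int, phit_size: int):
    """Split a message into fixed-length packets, packets into flits,
    and report how many physical-channel cycles (phits) one flit needs."""
    packets = [message[i:i + packet_size]
               for i in range(0, len(message), packet_size)]
    flits = [[p[i:i + flit_size] for i in range(0, len(p), flit_size)]
             for p in packets]
    # Each flit needs ceil(flit_size / phit_size) channel cycles.
    cycles_per_flit = -(-flit_size // phit_size)
    return packets, flits, cycles_per_flit

packets, flits, cycles = packetize(b"x" * 64, packet_size=32,
                                   flit_size=8, phit_size=4)
```

With a channel half as wide as a flit, each flit takes two phit transfers; when the channel is as wide as a flit, a flit equals a phit and `cycles` would be 1.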
21.4.1 Network Link
The performance of interconnect is a major concern in scaled technologies. As geometries shrink, gate
delay improves much faster than the delay in long wires. Therefore, the long wires increasingly determine
the maximum clock rate, and hence performance, of the entire design. The problem becomes particularly
serious for domain-specic heterogeneous SoCs, where the wire structure is highly irregular and may
include both short and extremely long switch-to-switch links. Moreover, it has been estimated that only a
fraction of the chip area (between 0.4 and 1.4%) will be reachable in one clock cycle [37].
A solution to overcome the interconnect-delay problem consists of pipelining interconnects [38,39].
Wires can be partitioned into segments by means of relay stations (which have a function similar to that of
the latches on a pipelined data path) whose length satisfies predefined timing requirements (e.g., the desired
clock speed of the design). In this way, link delay is turned into latency, and the data introduction rate is no
longer bounded by the link delay. Now, the latency of a channel connecting two modules may end up
being more than one clock cycle. Therefore, if the functionality of the design is based on the sequencing of
the signals and not on their exact timing, then link pipelining does not change the functional correctness
of the design. This requires the system to be made of modules whose behavior does not depend on the
latency of the communication channels (latency-insensitive operation). As a consequence, the use of
interconnect pipelining can be seen as part of a new and more general methodology for deep submicron
(DSM) designs, which can be envisioned as synchronous distributed systems composed of functional
modules that exchange data on communication channels according to a latency-insensitive protocol. This
protocol ensures that functionally correct modules behave correctly independently of the channel latencies
[38]. The effectiveness of the latency-insensitive design methodology is strongly related to the ability of
maintaining a sufficient communication throughput in the presence of increased channel latencies.
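As a toy illustration of how link delay becomes latency without limiting the data introduction rate, a pipelined link can be modeled as a shift register; the class name and stage count below are invented for the example.

```python
# Minimal model of a pipelined link: a shift register of N repeater stages.
# Latency grows with N, but one flit can still be injected every cycle,
# i.e., the data introduction rate is decoupled from the wire delay.

class PipelinedLink:
    def __init__(self, stages: int):
        self.stages = [None] * stages  # one register per wire segment

    def cycle(self, flit=None):
        """Advance one clock: shift every stage; return the flit (if any)
        that reaches the far end of the link."""
        out = self.stages[-1]
        self.stages = [flit] + self.stages[:-1]
        return out

link = PipelinedLink(stages=3)
received = [link.cycle(f) for f in ["A", "B", "C", "D", "E", None, None, None]]
# The first flit arrives after 3 cycles; thereafter one flit per cycle.
```

A longer link only lengthens the initial run of empty cycles; the steady-state throughput of one flit per cycle is unchanged, which is the essence of trading delay for latency.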
The International Technology Roadmap for Semiconductors (ITRS) 2001 [15] assumes that interconnect
pipelining is the strategy of choice in its estimates of achievable clock speeds for MPUs. Some industrial
designs already make use of interconnect pipelining. For instance, the NetBurst microarchitecture of the
Pentium 4 contains instances of a stage dedicated exclusively to handling wire delays: in fact, a so-called drive
stage is used only to move signals across the chip without performing any computation and, therefore,
can be seen as a physical implementation of a relay station [40].
The Xpipes interconnect makes use of pipelined links and of latency-insensitive operation in the implementation
of its building blocks. Switch-to-switch links are subdivided into basic segments whose length
guarantees that the desired clock frequency (i.e., the maximum speed provided by a certain technology)
can be used. In this way, the system operating frequency is not bound by the delay of long links. According
to the link length, a certain number of clock cycles is needed by a flit to cross the interconnect. If network
switches are designed in such a way that their functional correctness depends on the flit arrival order and
not on flit timing, input links of the switches can be different and of any length. These design choices
are at the basis of the latency-insensitive operation of the NoC and allow the construction of an arbitrary
network topology, and hence support for heterogeneous architectures.
Figure 21.2 illustrates the link model, which is equivalent to a pipelined shift register. Pipelining has
been used both for data and control lines. The figure also illustrates how pipelined links are used to
support latency-insensitive link-level error control, ensuring robustness against communication errors.
The retransmission of a corrupted flit between two successive switches is represented. Multiple outstanding
flits propagate across the link during the same clock cycle. When flits are correctly received at the
destination switch, an ACK is propagated back to the source, and after N clock cycles (where N is the
length of the link expressed in number of repeater stages) the flit will be discarded from the buffer of
the source switch. On the contrary, a corrupted flit is NACKed and will be retransmitted in due time.
The implemented retransmission policy is go-back-N, to keep the switch complexity as low as possible.

FIGURE 21.2 Pipelined link model and latency-insensitive link-level error control.
21.4.2 Switch
The task of the switch is to carry packets injected into the network to their final destination, following a
statically defined or dynamically determined routing path. The switch transfers packets from one of its
input ports to one or more of its output ports.
Switch design is usually characterized by a power-performance trade-off: supporting high-performance
on-chip communication may require power-hungry switch memory resources. A specific
switch design may include both input and output buffers, or only one type of buffer. Input queuing uses
fewer buffers, but suffers from head-of-line blocking. Virtual output queuing has higher performance,
but at the cost of more buffers.
Network flow control (or routing mode) specifically addresses the limited amount of buffering resources
in switches. Three policies are feasible in this context [41].
In store-and-forward routing, an entire packet is received and entirely stored before being forwarded
to the next switch. This is the most demanding approach in terms of memory requirements and switch
latency. Virtual cut-through routing also requires buffer space for an entire packet, but allows lower latency
communication, in that a packet is forwarded as soon as the next switch guarantees that the complete
packet will be accepted. If this is not the case, the current router must be able to store the whole packet.
Finally, a wormhole routing scheme can be employed to reduce switch memory requirements and to
permit low-latency communication. The first flit of a packet contains routing information, and header
flit decoding enables the switches to establish the path; subsequent flits simply follow this path in a
pipelined fashion by means of switch output port reservation. A flit is passed to the next switch as soon as
enough space is available to store it, even though there is not enough space to store the whole packet. If a
certain flit faces a busy channel, subsequent flits have to wait at their current locations and are therefore
spread over multiple switches, thus blocking the intermediate links. This scheme avoids buffering the full
packet at one switch and keeps end-to-end latency low, although it is more sensitive to deadlock and may
result in low link utilization.
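The latency difference between the two extremes can be quantified with the standard zero-load formulas from the interconnection-network literature (no contention, one flit transferred per cycle per hop); the hop count and packet length below are example values.

```python
# Zero-load latency comparison: store-and-forward vs. wormhole switching.
# Assumes unit-width channels, no contention, one flit per cycle per hop.

def store_and_forward_latency(hops: int, packet_flits: int) -> int:
    # Each switch receives and stores the whole packet before forwarding.
    return hops * packet_flits

def wormhole_latency(hops: int, packet_flits: int) -> int:
    # The header flit pays one cycle per hop; body flits follow in pipeline.
    return hops + (packet_flits - 1)

sf = store_and_forward_latency(hops=5, packet_flits=16)
wh = wormhole_latency(hops=5, packet_flits=16)
```

For a 16-flit packet crossing 5 switches, store-and-forward costs 80 cycles against 20 for wormhole, which is why wormhole is attractive despite its blocking behavior under contention.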
Guaranteeing quality of service in switch operation is another important design issue, which needs
to be addressed when time-constrained (hard or soft real-time) traffic is to be supported. Throughput
guarantees or latency bounds are examples of time-related guarantees.
Contention-related delays are responsible for large fluctuations of performance metrics, and a fully
predictable system can be obtained only by means of contention-free routing schemes. With circuit
switching, a connection is set up over which all subsequent data is transported. Therefore, contention
resolution takes place at setup time at the granularity of connections, and time-related guarantees can be
given during data transport. In time-division circuit switching (see Reference 24 for an example), bandwidth is
shared by time-division multiplexing connections over circuits.
In packet switching, contention is unavoidable, since packet arrival cannot be predicted. Therefore,
arbitration mechanisms and buffering resources must be implemented at each switch, thus delaying data
in an unpredictable manner and making it difficult to provide guarantees. BE NoC architectures can
mainly rely on network overdimensioning to bound fluctuations of performance metrics.
The Aethereal NoC architecture makes use of a router that tries to combine GT and BE services [34].
The GT router subsystem is based on a time-division multiplexed circuit switching approach. A router
uses a slot table to (1) avoid contention on a link, (2) divide up bandwidth per link between connections,
and (3) switch data to the correct output. Every slot table T has S time slots (rows) and N router outputs
(columns). There is a logical notion of synchronicity: all routers in the network are in the same fixed-duration
slot. In a slot s, at most one block of data can be read/written per input/output port. In the next
slot, the read blocks are written to their appropriate output ports. Blocks thus propagate in a store-and-forward
fashion. The latency a block incurs per router is equal to the duration of a slot, and bandwidth
is guaranteed in multiples of the block size per S slots. The BE router uses packet switching, and it has been
shown that both input queuing with wormhole routing or virtual cut-through routing, and virtual output
queuing with wormhole routing, are feasible in terms of buffering cost. The BE and GT router subsystems
are combined in the Aethereal router architecture of Figure 21.3. The GT router offers a fixed end-to-end
latency for its traffic, which is given the highest priority by the arbiter. The BE router uses all the bandwidth
(slots) that has not been reserved or used by GT traffic. GT router slot tables are programmed by means
of BE packets (see the arrow labeled program in Figure 21.3). Negotiations, resulting in slot allocation, can be
done at compile time and configured deterministically at runtime. Alternatively, negotiations can be
done at runtime.
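A minimal sketch of such a slot table (S slots by N outputs) shows why the scheme is contention-free by construction; the connection names and table dimensions are invented for illustration and do not reproduce the actual Aethereal data structures.

```python
# Sketch of a GT slot table in the spirit of a TDM circuit-switched router:
# S time slots (rows) by N outputs (columns). At most one connection may
# own a given (slot, output) entry, so a link can never be contended.

S, N = 4, 3
table = [[None] * N for _ in range(S)]

def reserve(conn: str, slot: int, output: int) -> None:
    """Claim one (slot, output) entry for a connection, or fail loudly."""
    if table[slot][output] is not None:
        raise ValueError("slot already reserved: would cause contention")
    table[slot][output] = conn

def bandwidth_share(conn: str) -> float:
    # A connection's guaranteed bandwidth is its share of the S slots.
    return sum(row.count(conn) for row in table) / S

reserve("conn_a", slot=0, output=1)
reserve("conn_b", slot=1, output=1)   # same output, different slot: fine
# reserve("conn_c", slot=0, output=1) would raise, since that entry is taken.
```

Because reservations are resolved when connections are set up (at compile time or runtime), data transport itself never arbitrates, which is what makes the latency and bandwidth guarantees possible.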
A different perspective has been taken in the design of the switch for the BE Xpipes NoC. Figure 21.4
shows an example configuration with four inputs, four outputs, and two virtual channels multiplexed
across the same physical output link. A physical link is assigned to different virtual channels on a flit-by-flit
basis, thereby improving network throughput. Switch operation is latency-insensitive, in that
correct operation is guaranteed for arbitrary link pipeline depth. In fact, as explained above, network
links in the Xpipes interconnect are pipelined with a flexible number of stages, thereby decoupling the link
data introduction rate from its physical length.

FIGURE 21.3 A combined GT-BE router: (a) conceptual view; (b) hardware view.

FIGURE 21.4 Example of switch configuration with two virtual channels.
For latency-insensitive operation, the switch has virtual channel registers to store 2N + M flits, where
N is the link length (expressed as number of basic repeater stages) and M is a switch architecture-related
contribution (12 cycles in this design). The reason is that each transmitted flit has to be acknowledged
before being discarded from the buffer. Before an ACK is received, the flit has to travel across the link
(N cycles), an ACK/NACK decision has to be taken at the destination switch (a portion of the M cycles), the
ACK/NACK signal has to be propagated back (N cycles) and recognized by the source switch (the remaining
portion of the M cycles). During this time, another 2N + M flits are transmitted but not yet ACKed.
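The round-trip argument can be written out as a one-line calculation; the link length below is an example value, while M = 12 matches the figure quoted in the text.

```python
# Round-trip accounting behind the 2N + M virtual-channel buffer size:
# every flit sent before the first ACK returns must be retained for
# possible retransmission.

def required_buffer(link_stages: int, switch_cycles: int) -> int:
    """Flits in flight before an ACK can return: N cycles to cross the
    link, M cycles of ACK/NACK decision and recognition at the two
    switches, and N cycles for the ACK/NACK to propagate back."""
    n, m = link_stages, switch_cycles
    return n + m + n  # = 2N + M

buffer_flits = required_buffer(link_stages=4, switch_cycles=12)
```

For a 4-stage link this gives 20 flit registers per virtual channel; note how the cost grows linearly with link length, which is one price of latency-insensitive operation.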
Output buffering was chosen for the Xpipes switches, and the resulting architecture is reported in
Figure 21.5. It consists of a replication of the same output module, accepting all input ports as its
own inputs. Flow control signals generated by each output block are directed to a centralized module that
takes care of generating proper ACKs or NACKs for the incoming flits from the different input ports.
Each output module is deeply pipelined (seven pipeline stages) so as to maximize the operating clock
frequency of the switch. Architectural details on the pipelined output module are illustrated in Figure 21.6.
Forward flow control is used, and a flit is transmitted to the next switch only when adequate storage is
available. The CRC decoders for error detection work in parallel with switch operation, thereby
hiding their impact on switch latency.
FIGURE 21.5 Architecture of the output-buffered Xpipes switch.
FIGURE 21.6 Architecture of an Xpipes switch output module.
The first pipeline stage checks the header of incoming packets on the different input ports to determine
whether those packets have to be routed through the output port under consideration. Only matching
packets are forwarded to the second stage, which resolves contention based on a round-robin policy.
Arbitration is carried out upon receipt of the tail flits of preceding packets, so that all other flits of a
packet can be propagated without contention resolution at this stage. A NACK is generated for flits of
nonselected packets. The third stage is just a multiplexer, which selects the prioritized input port. The
following arbitration stage keeps the status of the virtual channel registers and decides whether the flits can
be stored into the registers or not. A header flit is sent to the register with more free locations, followed
by successive flits of the same packet. The fifth stage is the actual buffering stage, and the ACK/NACK
message at this stage indicates whether a flit has been successfully stored or not. The following stage takes
care of forward flow control, and finally a last arbitration stage multiplexes the virtual channels on the
physical output link.
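The round-robin contention resolution used in the second stage can be sketched behaviorally as follows; the port count and request pattern are example values, not the actual Xpipes configuration.

```python
# Minimal round-robin arbiter: the port after the previous winner gets
# the highest priority, so every persistent requester is eventually served.

class RoundRobinArbiter:
    def __init__(self, ports: int):
        self.ports = ports
        self.last = ports - 1  # so that port 0 has highest priority first

    def grant(self, requests):
        """Grant one requesting port, scanning from the port just after
        the previous winner; return None if nobody requests."""
        for offset in range(1, self.ports + 1):
            port = (self.last + offset) % self.ports
            if requests[port]:
                self.last = port
                return port
        return None

arb = RoundRobinArbiter(4)
# Ports 0, 2, and 3 keep requesting; port 1 is idle.
winners = [arb.grant([True, False, True, True]) for _ in range(3)]
```

Successive grants rotate over the active requesters (0, then 2, then 3 here), which is the fairness property that prevents any input port from being starved.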
Finally, the switch is highly parameterizable. Design parameters are: number of I/O ports, flit width,
number of virtual channels, length of switch-to-switch links, and size of output registers.
21.4.3 Network Interface
The most relevant tasks of the network interface are: (1) hiding the details of the network communication
protocol from the cores, so that they can be developed independently of the communication
infrastructure, (2) communication protocol conversion (from end-to-end to network protocol), and (3) data
packetization (packet assembly, delivery, and disassembly).
The first objective can be achieved by means of standard interfaces. For instance, the VSIA vision
[22] is to specify open standards and specifications that facilitate the integration of software and hardware
virtual components from multiple sources. Interfaces of different complexity are described in the standard,
from the Peripheral Virtual Component Interface (VCI) to the Basic VCI and the Advanced VCI.
Another example of a standard socket interfacing cores to networks is the Open Core Protocol
(OCP) [23]. Its main characteristics are a high degree of configurability, to adapt to the core's functionality,
and the independence of request and response phases, thus supporting multiple outstanding requests and
pipelining of transfers.
Data packetization is a critical task for the network interface, and has an impact on the communication
latency, besides the latency of the communication channel. The packet-preparation process consists of
building the packet header, the payload, and the packet tail. The header contains the necessary routing and network
control information (e.g., source and destination address). When source routing is used, the destination
address is ignored and replaced with a route field that specifies the route to the destination. This overhead
in terms of packet header is counterbalanced by simpler routing logic at the network switches: they
simply have to look at the route field and route the packet over the specified switch output port. The packet
tail indicates the end of a packet and usually contains parity bits for error-detecting or error-correcting
codes.
An insight into the Xpipes network interface implementation will provide an example of these concepts.
It provides a standardized OCP-based interface to network nodes. The network interface for cores that
initiate communication (initiators) needs to turn OCP-compliant transactions into packets to be transmitted
across the network. It represents the slave side of an OCP end-to-end connection, and is therefore
referred to as the network interface slave (NIS). Its architecture is shown in Figure 21.7.
FIGURE 21.7 Architecture of the Xpipes NIS.
The NIS has to build the packet header, which has to be spread over a variable number of flits depending
on the length of the path to the destination node. In fact, Xpipes relies on a static routing algorithm called
street-sign routing. Routes are derived by the network interface by accessing a look-up table based on the
destination address. The routing information consists of direction bits, read by each switch, indicating the
output port of the switch to which the flits belonging to a certain packet have to be directed.
The look-up table is accessed by the STATIC_PACKETING block, a finite state machine that forwards
the routing information numSB (number of hops to destination) and lutword (word read from the look-
up table), as well as the request-related information datastream from the initiator core, to the DP_FAST
block, provided the enable signal busy_dpfast is not asserted.
Based on the input data, module DP_FAST has the task of building the flits to be transmitted via the
output buffer BUFFER_OUT, according to the mechanism illustrated in Figure 21.8. Let us assume that a
packet requires numSB = 5 hops to get to destination, and that the direction to be taken at each switch
is expressed by DIR. Module DP_FAST builds the first flit by concatenating the flit type field with path
information. If there is some space left in the flit, it is filled with header information derived from the input
datastream. The unused part of the datastream is stored in a regpark register, so that a new datastream
can be read from the STATIC_PACKETING block. The following header and payload flits will be formed
by combining data stored in regpark and reg_datastream. No partially filled flits are transmitted, to make
transmission more efficient. Finally, module BUFFER_OUT stores flits to be sent across the network, and
allows the NIS to keep preparing successive flits when the network is congested. The size of this buffer is a
design parameter.
The response phase is carried out by means of two modules. SYNCHRO receives incoming flits and reads
out only the useful information (e.g., it discards route fields). At the same time, it contains buffering resources
FIGURE 21.8 Mechanism for building header flits.
to synchronize the network's requests to transmit the remaining packet flits with the core's consuming rate.
The RECEIVE_RESPONSE module translates the useful header and payload information into OCP-compliant
response fields.
When a read transaction is initiated by the master core, the STATIC_PACKETING block asserts a
start_receive_response signal that triggers the waiting phase of the RECEIVE_RESPONSE module for the
requested data. As a consequence, the NIS supports only one outstanding read operation, to keep interface
complexity low. Although no read-after-read transaction can be initiated until the previous one has
completed, an indefinite number of write transactions can be carried out after an outstanding read has
been initiated.
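A minimal sketch of this accept/reject policy, with an invented interface that only captures the rule (one read in flight, writes never blocked):

```cpp
// Models the outstanding-transaction rule described above: a second read is
// rejected until the first completes, while writes are always accepted.
class TransactionGate {
    bool read_outstanding_ = false;
public:
    bool try_read() {
        if (read_outstanding_) return false; // read-after-read not allowed
        read_outstanding_ = true;
        return true;
    }
    bool try_write() { return true; }        // writes are never blocked
    void read_completed() { read_outstanding_ = false; }
};
```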
The architecture of a network interface master is similar to the one just described, and is not reported
here for lack of space. At instantiation time, the main network-interface-related parameters to be set are:
total number of core blocks, flit width, and maximum number of hops across the network.
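In C++ terms, such design-time parameters resemble the template arguments of a soft macro. The class below is purely illustrative, a sketch of design-time configurability rather than the actual Xpipes code:

```cpp
#include <cstddef>

// Hypothetical parameterized network interface: the parameters named in the
// text (number of cores, flit width, maximum hop count) become compile-time
// template arguments, fixed when the soft macro is instantiated.
template <std::size_t FlitWidth, std::size_t MaxHops, std::size_t NumCores>
struct NetworkInterfaceSlave {
    static_assert(FlitWidth > 0 && MaxHops > 0, "parameters must be nonzero");
    static constexpr std::size_t flit_width = FlitWidth;
    static constexpr std::size_t max_hops = MaxHops;
    static constexpr std::size_t routing_table_entries = NumCores; // one route per core
};

// One concrete instantiation, with made-up values.
using SmallNIS = NetworkInterfaceSlave<32, 8, 12>;
```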
21.5 NoC Topology
The individual components of SoCs are inherently heterogeneous, with widely varying functionality and
communication requirements. The communication infrastructure should optimally match the communication
patterns among these components, accounting for the individual component needs.
As an example, consider the implementation of an MPEG4 decoder [42], depicted in Figure 21.9(b),
where blocks are drawn roughly to scale and links represent interblock communication. First, the embedded
memory (SDRAM) is much larger than all the other cores, and it is a critical communication bottleneck.
Block sizes are highly nonuniform, and the floorplan does not match the regular, tile-based floorplan
shown in Figure 21.9(a). Second, the total communication bandwidth to/from the embedded SDRAM
is much larger than that required for communication among the other cores. Third, many neighboring
blocks do not need to communicate. Even though it may be possible to map MPEG4 onto a homogeneous
fabric, there is a significant risk of either underutilizing many tiles and links or, at the opposite
extreme, of achieving poor performance because of localized congestion. These factors motivate the use
of an application-specific on-chip network [26].
With an application-specific network, the designer is faced with the additional task of designing network
components (e.g., switches) with different configurations (e.g., different I/Os, virtual channels, buffers)
and of interconnecting them with links of uneven length. These steps require significant design time, and
the network components and their communications must be verified for every design.
The library-based nature of network building blocks seems the most appropriate solution to support
domain-specific custom NoCs. Two relevant examples have been reported in the open literature: the Proteo
and Xpipes interconnects. Proteo consists of a fully reusable and scalable component library where the
FIGURE 21.9 Homogeneous versus heterogeneous architectural template: (a) tile-based on-chip multiprocessor; (b) MPEG4 SoC.
components can be used to implement networks ranging from very simple bus-emulation structures to complex
packet networks. It uses a standardized VCI interface between the functional cores and the communication
network. Proteo is described in synthesizable VHDL and relies on an interconnect node architecture
that targets flexible on-chip communication. It is used as a testing platform when the efficiency of network
topologies and routing schemes is investigated for on-chip environments. The node is constructed from
a collection of parameterized and reusable hardware blocks, including components such as FIFO (first in,
first out) buffers, routing controllers, and standardized interface wrappers. A node can be tuned to fulfill
the desired communication characteristics by properly selecting the internal architecture of the node
itself.
The Xpipes NoC takes a similar approach. As described throughout this chapter, its network building blocks
have been designed as highly configurable, design-time composable soft macros described in SystemC
at the cycle-accurate level.
An optimal system solution also requires an efficient mapping of high-level abstractions onto
the underlying platform. This mapping procedure involves optimizations and trade-offs among many
complex constraints, including quality of service, real-time response, power consumption, and area. Tools
are urgently needed to explore this mapping process, and to assist and automate optimization where possible.
The first challenge for these tools is to bridge the gap in building custom NoCs that optimally match
the communication requirements of the system. The network components they build should be highly
optimized for that particular NoC design, providing large savings in area, power, and latency with respect
to standard NoCs based on regular structures.
In the following section, an example of a design methodology for heterogeneous SoCs is briefly illustrated.
It concerns the Xpipes interconnect and relies on a tool, called XpipesCompiler [36], that automatically
instantiates an application-specific NoC for heterogeneous on-chip multiprocessors.
21.5.1 Domain-Specific NoC Synthesis Flow
The complete XpipesCompiler-based NoC design flow is depicted in Figure 21.10. From the specification
of an application, the designer (or a high-level analysis and exploration tool) creates a high-level view of
the SoC floorplan, including nodes (with their network interfaces), links, and switches. Based on the clock
FIGURE 21.10 NoC synthesis flow with XpipesCompiler.
FIGURE 21.11 Core graph representation of an example MPEG4 design with annotated average communication requirements.
speed target and link routing, the number of pipeline stages for each link is also specified. The information
on the network architecture is specified in an input file for the XpipesCompiler. Routing tables for the
network interfaces are also specified. The tool takes as additional input the SystemC library of soft network
components. The output is a SystemC hierarchical description, which includes all switches, links, network
nodes, and interfaces, and specifies their topological connectivity. The final description can then be
compiled and simulated at the cycle- and signal-accurate level. At this point, the description can
be fed to back-end register transfer level (RTL) synthesis tools for silicon implementation.
In a nutshell, the XpipesCompiler generates a set of network component instances that are custom-
tailored to the specification contained in its input network description file. This tool allows a very
instructive comparison of the effects (in terms of area, power, and performance) of mapping applications
on customized domain-specific NoCs and on regular mesh NoCs.
Let us focus on the MPEG4 decoder already introduced in this chapter. Its core graph representation,
together with its communication requirements, is reported in Figure 21.11. The edges are annotated
with the average bandwidth requirements of the cores in MB/sec. Customized application-specific NoCs
that closely match the application's communication characteristics have been manually developed and
compared to a regular mesh topology. The different NoC configurations are reported in Figure 21.12.
In the MPEG4 design considered, many of the cores communicate with each other through the shared
SDRAM, so a large switch is used for connecting the SDRAM with the other cores (Figure 21.12[b]), while
smaller switches are used for the other cores. An alternative custom NoC is also considered (Figure 21.12[c]): it
is an optimized mesh network, with superfluous switches and switch I/Os removed.
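A bandwidth-annotated core graph of this kind is easy to represent directly. The sketch below uses made-up edge weights in MB/sec (not the values of Figure 21.11) and computes the aggregate traffic per core, which is the quantity that singles out the SDRAM as the place for a single large switch:

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical core-graph edge: average bandwidth demand between two cores,
// in MB/sec, as in Figure 21.11 (the concrete values below are invented).
struct Edge {
    std::string src, dst;
    double mbps;
};

// Sum the traffic entering and leaving each core. Cores with the largest
// totals are candidates for attachment to a large switch.
std::map<std::string, double> traffic_per_core(const std::vector<Edge>& graph) {
    std::map<std::string, double> total;
    for (const Edge& e : graph) {
        total[e.src] += e.mbps;
        total[e.dst] += e.mbps;
    }
    return total;
}
```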
Area (in 0.1 μm technology) and power estimates for the different NoC configurations are reported in
Table 21.1. Since all cores communicate with many other cores, many switches are needed, and therefore
the area savings of the custom NoCs are not extremely significant.
Based on the average traffic through each network component, the power dissipation for each NoC
design has been calculated. Power savings for the custom solutions are not very significant either, as most of the
traffic traverses the larger switches connected to the memories. As power dissipation in a switch increases
nonlinearly with switch size, there is more power dissipation in the switches of custom NoC1
(which has an 8 × 8 switch) than in the mesh NoC. However, most of the traffic traverses short links in this
custom NoC, thereby giving marginal power savings for the whole design.
FIGURE 21.12 NoC configurations for MPEG4 decoder: (a) mesh NoC, (b) application-specific NoC1, and (c) application-specific NoC2.
TABLE 21.1 Area and Power Estimates for the MPEG4-Related NoC Configurations

NoC configuration   Area (mm²)   Ratio mesh/cust   Power (mW)   Ratio mesh/cust
Mesh                1.31         --                114.36       --
Custom 1            0.86         1.52              110.66       1.03
Custom 2            0.71         1.85              93.66        1.22
Figure 21.13 reports the variation of the average packet latency (for 64-byte packets and 32-bit flits) with the link
bandwidth. Custom NoCs, as synthesized by the XpipesCompiler, have lower packet latencies, as the average
number of switch and link traversals is lower. At the minimum plotted bandwidth value, almost a 10%
latency saving is achieved. Moreover, the latency increases more rapidly with the mesh NoC as the link
bandwidth decreases. Custom NoCs also have better link utilization: around 1.5 times the link utilization
of a mesh topology.
Area, power, and performance optimizations by means of custom NoCs turn out to be more difficult
for MPEG4 than for other applications, such as Video Object Plane Decoders and MultiWindow
Displayers [36].
21.6 Conclusions
This chapter has described the motivation for packet-switched networks as a communication paradigm for
deep submicron SoCs. After an overview of NoC proposals from the open literature, the chapter has gone
into the details of the NoC architectural components (switch, network interface, and point-to-point links),
illustrating the Xpipes library of composable soft macros as a case study. Finally, the challenging issue of
heterogeneous NoC design has been addressed, showing an example NoC synthesis flow and detailing
area, power, and performance metrics of customized application-specific NoC architectures with respect
to regular mesh topologies. The chapter has aimed at highlighting the main guidelines and open issues for NoC
design on gigascale SoCs.
FIGURE 21.13 Average packet latency as a function of the link bandwidth.
Acknowledgment
This work was supported in part by MARCO/DARPA Gigascale Silicon Research Center.
References
[1] F. Boekhorst. Ambient Intelligence, the Next Paradigm for Consumer Electronics: How will it Affect Silicon? In ISSCC 2002, Vol. 1, February 2002, pp. 28–31.
[2] L. Benini and G. De Micheli. Networks on Chips: a New SoC Paradigm. IEEE Computer, 35, 2002, 70–78.
[3] P. Wielage and K. Goossens. Networks on Silicon: Blessing or Nightmare? In Proceedings of the Euromicro Symposium on Digital System Design DSD'02, September 2002, pp. 196–200.
[4] W.J. Dally and B. Towles. Route Packets, not Wires: On-Chip Interconnection Networks. In Proceedings of the Design Automation Conference DAC'01, June 2001, pp. 684–689.
[5] S. Kumar, A. Jantsch, J.P. Soininen, M. Forsell, M. Millberg, J. Oeberg, K. Tiensyrja, and A. Hemani. A Network on Chip Architecture and Design Methodology. In IEEE Symposium on VLSI ISVLSI'02, April 2002, pp. 105–112.
[6] L. Benini, D. Bertozzi, D. Bruni, N. Drago, F. Fummi, and M. Poncino. SystemC Cosimulation and Emulation of Multiprocessor SoC Designs. IEEE Computer, 36, 2003, 53–59.
[7] S. Nugent, D.S. Wills, and J.D. Meindl. A Hierarchical Block-Based Modeling Methodology for SOC in GENESYS. In Proceedings of the IEEE ASIC/SOC Conference, September 2002, pp. 239–243.
[8] P. Gerin, S. Yoo, G. Nicolescu, and A.A. Jerraya. Scalable and Flexible Cosimulation of SoC Designs with Heterogeneous Multi-Processor Target Architecture. In Proceedings of the ASP-DAC 2001, January/February 2001, pp. 63–68.
[9] H. Blume, H. Huebert, H.T. Feldkaemper, and T.G. Noll. Model-Based Exploration of the Design Space for Heterogeneous Systems on Chip. In Proceedings of the IEEE Conference on Application-Specific Systems, Architectures and Processors ASAP'02, 2002.
[10] P.G. Paulin, C. Pilkington, and E. Bensoudane. StepNP: a System-Level Exploration Platform for Network Processors. IEEE Design and Test of Computers, November–December 2002, pp. 17–26.
[11] P. Guerrier and A. Greiner. A Generic Architecture for On-Chip Packet Switched Interconnections. In Proceedings of Design, Automation and Test in Europe DATE'00, March 2000, pp. 250–256.
[12] S.J. Lee et al. An 800 MHz Star-Connected On-Chip Network for Application to Systems on a Chip. In ISSCC'03, February 2003.
[13] H. Yamauchi et al. A 0.8 W HDTV Video Processor with Simultaneous Decoding of Two MPEG2 MP@HL Streams and Capable of 30 Frames/s Reverse Playback. In ISSCC'02, Vol. 1, February 2002, pp. 473–474.
[14] M. Dall'Osso, G. Biccari, L. Giovannini, D. Bertozzi, and L. Benini. Xpipes: a Latency Insensitive Parameterized Network-on-Chip Architecture for Multi-Processor SoCs. In ICCD'03, October 2003.
[15] ITRS. 2001, http://public.itrs.net/Files/2001ITRS/Home.htm.
[16] D. Bertozzi, L. Benini, and G. De Micheli. Energy-Reliability Trade-Off for NoCs. In Networks on Chip, A. Jantsch and H. Tenhunen, Eds., Kluwer Academic Press, Boston, MA, 2003, pp. 107–129.
[17] H. Zhang, V. George, and J.M. Rabaey. Low-Swing On-Chip Signaling Techniques: Effectiveness and Robustness. IEEE Transactions on VLSI Systems, 8, 2000, 264–272.
[18] J. Xu and W. Wolf. Wave Pipelining for Application-Specific Networks-on-Chips. In CASES'02, October 2002, pp. 198–201.
[19] K. Goossens, J. Dielissen, J. van Meerbergen, P. Poplavko, A. Radulescu, E. Rijpkema, E. Waterlander, and P. Wielage. Guaranteeing the Quality of Services in Networks on Chip. In Networks on Chip, A. Jantsch and H. Tenhunen, Eds., Kluwer Academic Press, Boston, MA, 2003, pp. 61–82.
[20] ITRS, 1999, http://public.itrs.net/files/1999_SIA_Roadmap/.
[21] A. Jantsch and H. Tenhunen. Will Networks on Chip Close the Productivity Gap? In Networks on Chip, A. Jantsch and H. Tenhunen, Eds., Kluwer Academic Press, Boston, MA, 2003, pp. 3–18.
[22] VSI Alliance. Virtual Component Interface Standard, 2000.
[23] OCP International Partnership. Open Core Protocol Specification, 2001.
[24] D. Wingard. MicroNetwork-Based Integration for SoCs. In Design Automation Conference DAC'01, June 2001, pp. 673–677.
[25] D. Flynn. AMBA: Enabling Reusable On-Chip Designs. IEEE Micro, 17, 1997, 20–27.
[26] H. Zhang et al. A 1V Heterogeneous Reconfigurable DSP IC for Wireless Baseband Digital Signal Processing. IEEE Journal of Solid-State Circuits, 35, 2000, 1697–1704.
[27] W.J. Dally and S. Lacy. VLSI Architecture: Past, Present and Future. In Conference on Advanced Research in VLSI, 1999, pp. 232–241.
[28] D. Culler, J.P. Singh, and A. Gupta. Parallel Computer Architecture: A Hardware/Software Approach. Morgan Kaufmann, San Francisco, CA, 1999.
[29] K. Compton and S. Hauck. Reconfigurable Computing: a Survey of Systems and Software. ACM Computing Surveys, 34, 2002, 171–210.
[30] R. Tessier and W. Burleson. Reconfigurable Computing and Digital Signal Processing: a Survey. Journal of VLSI Signal Processing, 28, 2001, 7–27.
[31] J. Walrand and P. Varaiya. High Performance Communication Networks. Morgan Kaufmann, San Francisco, CA, 2000.
[32] D. Liu et al. SoCBUS: The Solution of High Communication Bandwidth on Chip and Short TTM. Invited paper in Real Time and Embedded Computing Conference, September 2002.
[33] S. Ishiwata et al. A Single Chip MPEG-2 Codec Based on Customizable Media Embedded Processor. IEEE Journal of Solid-State Circuits, 38, 2003, 530–540.
[34] E. Rijpkema, K. Goossens, A. Radulescu, J. van Meerbergen, P. Wielage, and E. Waterlander. Trade-Offs in the Design of a Router with both Guaranteed and Best-Effort Services for Networks on Chip. In Design, Automation and Test in Europe DATE'03, March 2003, pp. 350–355.
[35] I. Saastamoinen, D. Siguenza-Tortosa, and J. Nurmi. Interconnect IP Node for Future Systems-on-Chip Designs. In IEEE Workshop on Electronic Design, Test and Applications, January 2002, pp. 116–120.
[36] A. Jalabert, S. Murali, L. Benini, and G. De Micheli. XpipesCompiler: a Tool for Instantiating Application Specific Networks-on-Chip. In DATE 2004, pp. 884–889.
[37] V. Agarwal, M.S. Hrishikesh, S.W. Keckler, and D. Burger. Clock Rate Versus IPC: The End of the Road for Conventional Microarchitectures. In Proceedings of the 27th Annual International Symposium on Computer Architecture, June 2000, pp. 248–250.
[38] L.P. Carloni, K.L. McMillan, and A.L. Sangiovanni-Vincentelli. Theory of Latency-Insensitive Design. IEEE Transactions on CAD of ICs and Systems, 20, 2001, 1059–1076.
[39] L. Scheffer. Methodologies and Tools for Pipelined On-Chip Interconnects. In International Conference on Computer Design, 2002, pp. 152–157.
[40] P. Glaskowsky. Pentium 4 (Partially) Previewed. Microprocessor Report, 14, 2000, 10–13.
[41] J. Duato, S. Yalamanchili, and L. Ni. Interconnection Networks: an Engineering Approach. IEEE Computer Society Press, Washington, 1997.
[42] E.B. Van der Tol and E.G.T. Jaspers. Mapping of MPEG4 Decoding on a Flexible Architecture Platform. In SPIE 2002, January 2002, pp. 1–13.
22
Platform-Based Design
for Embedded Systems
Luca P. Carloni,
Fernando De Bernardinis,
Claudio Pinello,
Alberto L. Sangiovanni-Vincentelli,
and Marco Sgroi
University of California at Berkeley
22.1 Introduction 22-1
22.2 Platform-Based Design 22-3
22.3 Platforms at the Articulation Points of the Design Process 22-4
     (Micro-)Architecture Platforms • API Platform • System Platform Stack
22.4 Network Platforms 22-7
     Definitions • Quality of Service • Design of Network Platforms
22.5 Fault-Tolerant Platforms 22-9
     Types of Faults and Platform Redundancy • Fault-Tolerant Design Methodology • The API Platform (FTDF Primitives) • Fault-Tolerant Deployment • Replica Determinism
22.6 Analog Platforms 22-15
     Definitions • Building Performance Models • Mixed-Signal Design Flow with Platforms • Reconfigurable Platforms
22.7 Concluding Remarks 22-22
Acknowledgments 22-24
References 22-24
22.1 Introduction
Platform-Based Design (PBD) [1,2] has emerged as an important design style as the electronics industry
faced serious difficulties owing to three major factors:
1. The disaggregation (or horizontalization) of the electronics industry began about a decade ago
and has affected the structure of the industry, favoring the move from a vertically oriented business
model to a horizontally oriented one. In the past, electronic system companies used to maintain full
control of the product development cycle, from product definition to final manufacturing. Today,
the identification of a new market opportunity, the definition of the detailed system specifications,
the development and assembly of the components, and the manufacturing of the final product
are tasks performed more and more frequently by distinct organizations. In fact, the complexity
of electronic designs and the number of technologies that must be mastered to bring winning
products to market have forced electronic companies to focus on their core competence. In this
scenario, the integration of the design chain becomes a serious problem at the hand-off points from
one company to another.
2. The pressure for reducing the time-to-market of electronic products in the presence of exponentially
increasing complexity has forced designers to adopt methods that favor component reuse at all
levels of abstraction. Furthermore, each organization that contributes a component to the final
product naturally strives for the flexibility in its design approach that allows it to make continuous
adjustments and accommodate last-minute engineering changes.
3. The dramatic increase in Non-Recurring Engineering (NRE) costs owing to mask making at the
Integrated Circuit (IC) implementation level (a set of masks for the 90 nm technology node costs
more than two million US dollars), development of production plants (a new fab costs more
than two billion US dollars), and design cost (a new-generation microprocessor design requires
more than 500 designers, with all the associated costs in tools and infrastructure!) has created,
on the one hand, the necessity of correct-the-first-time designs and, on the other hand, the push for
consolidation of efforts in manufacturing.¹
The combination of these factors has caused several system companies to substantially reduce their
ASIC (Application-Specific Integrated Circuit) design efforts. Traditional paradigms in electronic system
and IC design have to be revisited and readjusted, or altogether abandoned. Along the same line of
reasoning, IC manufacturers are moving toward the development of parts that have guaranteed high-
volume production from a single mask set (or that are likely to have high-volume production, if successful),
thus moving differentiation and optimization to reconfigurability and programmability.
Platform-Based Design has emerged over the years as a way of coping with the problems listed earlier.
The term "platform" has been used in several domains: from service providers to system companies, from
tier-one suppliers to IC companies. In particular, IC companies have lately been very active in espousing
platforms. The TI OMAP platform for cellular phones, the Philips Viper and Nexperia platforms for
consumer electronics, and the Intel Centrino platform for laptops are a few examples. Recently, Intel has been
characterized by its CEO Otellini as a "platform company."
As is often the case for fairly radical new approaches, the methodology emerged as a sequence of
empirical rules and concepts, but we have reached a point where a rigorous design process is needed,
together with supporting EDA environments and tools. PBD:
• Sets the foundation for developing economically feasible design flows, because it is a structured
methodology that theoretically limits the space of exploration, yet still achieves superior results within the
fixed time constraints of the design.
• Provides a formal mechanism for identifying the most critical hand-off points in the design chain.
The hand-off point between system companies and IC design companies, and the one between
IC design companies (or divisions) and IC manufacturing companies (or divisions), represent the
articulation points of the overall design process.
• Eliminates expensive design iterations, because it fosters design reuse at all abstraction levels, thus
enabling the design of an electronic product by assembling and configuring platform components
in a rapid and reliable fashion.
• Provides an intellectual framework for the complete electronic design process.
This chapter presents the foundations of this discipline and outlines a variety of domains where the
PBD principles can be applied. In particular, Section 22.2 defines the main principles of PBD. Our goal
is to provide a precise reference that may be used as the basis for reaching a common understanding
in the electronic system and circuit design community. Then, we present the platforms that define the
articulation points between system definition and implementation (Section 22.3).

¹The cost of fabs has changed the landscape of IC manufacturing in a substantial way, forcing companies to team up
for developing new technology nodes (see, e.g., the recent agreement among Motorola, Philips, and STMicroelectronics,
and the creation of Renesas in Japan).

In the following sections
we show that the PBD paradigm can be applied to all levels of design: from very high levels of abstraction,
such as communication networks (Section 22.4) and fault-tolerant platforms for the design of safety-
critical feedback-control systems (Section 22.5), to low levels, such as analog parts (Section 22.6), where
performance is the main focus.
22.2 Platform-Based Design
The basic tenets of PBD are:
• The identification of design as a meeting-in-the-middle process, where successive refinements of
specifications meet with abstractions of potential implementations.
• The identification of precisely defined layers where the refinement and abstraction processes take
place. Each layer supports a design stage that provides an opaque abstraction of the lower layers and
allows accurate performance estimations. This information is incorporated in appropriate parameters
that annotate design choices at the present layer of abstraction. These layers of abstraction are
called platforms, to stress their role in the design process and their solidity.
A platform is a library of components that can be assembled to generate a design at that level of abstraction.
This library contains not only computational blocks that carry out the appropriate computation but also
communication components that are used to interconnect the functional components. Each element of the
library has a characterization in terms of performance parameters, together with the functionality it can
support. For every platform level, there is a set of methods used to map the upper layers of abstraction into
the platform, and a set of methods used to estimate the performance of lower-level abstractions. As illustrated
in Figure 22.1, the meeting-in-the-middle process is the combination of two efforts:
• Top-down: map an instance of the top platform into an instance of the lower platform and propagate
constraints.
• Bottom-up: build a platform by defining the library that characterizes it and a performance abstraction
(e.g., number of literals for technology-independent optimization; area and propagation delay
for a cell in a standard cell library).
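A toy rendition of these two efforts, with invented components and numbers: the library plays the role of the platform, the performance abstraction is an (area, delay) pair per component, and top-down mapping picks the smallest instance that meets a propagated delay constraint.

```cpp
#include <string>
#include <vector>

// Illustrative platform library element: a component annotated with the
// performance parameters of its characterization (area, propagation delay).
struct Component {
    std::string name;
    double area;
    double delay;
};

// Top-down mapping under a propagated constraint: among the library elements
// that meet the delay constraint, choose the one of minimum area.
// Returns nullptr if no platform instance satisfies the constraint.
const Component* map_to_platform(const std::vector<Component>& library,
                                 double delay_constraint) {
    const Component* best = nullptr;
    for (const Component& c : library)
        if (c.delay <= delay_constraint &&       // constraint propagation
            (!best || c.area < best->area))      // optimization at this layer
            best = &c;
    return best;
}
```

A nullptr result is the bottom-up feedback of the methodology: the performance abstraction reports that the current platform cannot meet the constraints, so the refinement must be revisited.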
A platform instance is a set of architecture components that are selected from the library and whose
parameters are set. Often the combination of two consecutive layers and their "filling" can be interpreted
as a unique abstraction layer with an upper view, the top abstraction layer, and a lower view, the
bottom layer. A platform stack is a pair of platforms, along with the tools and methods that are used to
map the upper layer of abstraction onto the lower layer. Note that we can allow a platform stack to include
several sub-stacks if we wish to span a large number of abstractions.
FIGURE 22.1 Interactions between abstraction layers.
Platforms should be defined to eliminate large loop iterations for affordable designs: they should restrict
the design space via new forms of regularity and structure that surrender some design potential for lower
cost and first-pass success. The library of function and communication components is the design space
that we can explore at the appropriate level of abstraction.
Establishing the number, location, and components of intermediate platforms is the essence of PBD.
In fact, designs with different requirements and specifications may use different intermediate platforms,
hence different layers of regularity and design-space constraints. A critical step of the PBD process
is the definition of intermediate platforms to support predictability, which enables the abstraction of
implementation detail to facilitate higher-level optimization, and verifiability, that is, the ability to formally
ensure correctness.
The trade-offs involved in the selection of the number and characteristics of platforms relate to the size of
the design space to be explored and to the accuracy of the estimation of the characteristics of the solution
adopted. Naturally, the larger the step across platforms, the more difficult it is to predict performance,
to optimize at the higher levels of abstraction, and to provide a tight lower bound. In fact, the design space
for this approach may actually be smaller than the one obtained with smaller steps because it becomes
harder to explore meaningful design alternatives and the restriction on search impedes complete design-space
exploration. Ultimately, predictions/abstractions may be so inaccurate that design optimizations
are misguided and the lower bounds are incorrect.
It is important to emphasize that the PBD paradigm applies to all levels of design. While it is rather easy
to grasp the notion of a programmable hardware platform, the concept is completely general and should
be exploited through the entire design flow to solve the design problem. In the following sections, we will
show that platforms can be applied to low levels of abstraction such as analog components, where flexibility
is minimal and performance is the main focus, as well as to very high levels of abstraction such as networks,
where platforms have to provide connectivity and services. In the former case platforms abstract hardware
to provide (physical) implementation, while in the latter communication services abstract software layers
(protocols) to provide global connectivity.
22.3 Platforms at the Articulation Points of the Design Process
As we mentioned in Section 22.2, the key to the application of the design principle is the careful definition of
the platform layers. Platforms can be defined at several points of the design process. Some levels of abstraction
are more important than others in the overall design trade-off space. In particular, the articulation
point between system definition and implementation is a critical one for design quality and time. Indeed,
the very notion of PBD originated at this point (see [1,3-5]). In References 1, 2, and 5, we have discovered
that at this level there are indeed two distinct platforms forming a system platform stack. These need to be
defined together with the methods and the tools necessary to link them: a (micro-)architecture platform
and an Application Programming Interface (API) platform. The API platform allows system designers to
use the services that a (micro-)architecture offers. In the world of Personal Computers (PCs), this concept
is well known and is the key to the development of application software on different hardware that shares
some commonality, allowing the definition of a unique API.
22.3.1 (Micro-)Architecture Platforms
Integrated circuits used for embedded systems will most likely be developed as an instance of a particular
(micro-)architecture platform. That is, rather than being assembled from a collection of independently
developed blocks of silicon functionalities, they will be derived from a specific family of micro-architectures,
possibly oriented toward a particular class of problems, that can be extended or reduced by the system
developer. The elements of this family are a sort of hardware denominator that could be shared across
multiple applications. Hence, an architecture platform is a family of micro-architectures that share some
commonality, the library of components that are used to define the micro-architecture. Every element
of the family can be obtained quickly through the personalization of an appropriate set of parameters
controlling the micro-architecture. Often, the family may have additional constraints on the components
of the library that can or should be used. For example, a particular micro-architecture platform may
be characterized by the same programmable processor and the same interconnection scheme, while
the peripherals and the memories of a specific implementation may be selected from the predesigned
library of components depending on the given application. Depending on the implementation platform
that is chosen, each element of the family may still need to go through the standard manufacturing
process, including mask making. This approach thus combines the need to save design time with the
optimization of each element of the family for the application at hand. Although it does not solve the mask
cost issue directly, it should be noted that the mask cost problem is primarily owing to the generation of
multiple mask sets for multiple design spins, which is addressed by the architecture platform methodology.
The less constrained the platform, the more freedom a designer has in selecting an instance and the
more potential there is for optimization, if time permits. However, more constraints mean stronger
standards and easier addition of components to the library that defines the architecture platform (as with
PC platforms). Note that the basic concept is similar to the cell-based design layout style, where regularity
and the reuse of library elements allow faster design time at the expense of some optimality. The trade-off
between design time and design quality needs to be kept in mind. The economics of the design
problem must dictate the choice of the design style. The higher the granularity of the library, the more
leverage we have in shortening the design time. Given that the elements of the library are reused, there is a
strong incentive to optimize them. In fact, we argue that the macro-cells should be designed with great
care and attention given to area and performance. It also makes sense to offer a variation of cells with
the same functionality but with implementations that differ in performance, area, and power dissipation.
Architecture platforms are, in general, characterized by (but not limited to) the presence of programmable
components. Then, each of the platform instances that can be derived from the architecture platform
maintains enough flexibility to support an application space that guarantees the production volumes
required for economically viable manufacturing.
The library that defines the architecture platform may also contain reconfigurable components, which
come in two flavors. With runtime reconfigurability, FPGA (Field-Programmable Gate Array) blocks
can be customized by the user without the need of changing the mask set, thus saving both design cost and
fabrication cost. With design-time reconfigurability, the silicon is still application specific, and only
design time is reduced.
An architecture platform instance is derived from an architecture platform by choosing a set of components
from its library and by setting the parameters of the reconfigurable components of the library. The flexibility,
or the capability of supporting different applications, of a platform instance is guaranteed by programmable
components. Programmability will ultimately be of various forms. One is software programmability,
to indicate the presence of a microprocessor, Digital Signal Processor (DSP), or any other software-programmable
component. Another is hardware programmability, to indicate the presence of reconfigurable
logic blocks such as FPGAs, whereby the logic function can be changed by software tools without requiring
a custom set of masks. Some of the new architecture and/or implementation platforms being offered in
the market mix the two types into a single chip. For example, Triscend, Altera, and Xilinx are offering
FPGA fabrics with embedded hard processors. Software programmability yields a more flexible solution,
since modifying software is, in general, faster and cheaper than modifying FPGA personalities. On the
other hand, logic functions mapped on FPGAs execute orders of magnitude faster and with much less
power than the corresponding implementation as a software program. Thus, the trade-off here is between
flexibility and performance.
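The derivation of a platform instance described above can be sketched as follows. The component names, parameter ranges, and base configuration are all invented for illustration; the point is only that a family member is obtained by selecting library components and setting the parameters of the reconfigurable ones, subject to the constraints the family imposes:

```python
# Hypothetical architecture platform: a fixed base shared by the whole
# family, plus a library of optional components, some reconfigurable.
BASE = {"processor": "risc32", "interconnect": "shared-bus"}
LIBRARY = {
    "uart": {"reconfigurable": False},
    "fpga_fabric": {"reconfigurable": True, "params": {"luts": (1000, 8000)}},
    "sram": {"reconfigurable": True, "params": {"kib": (32, 512)}},
}

def derive_instance(choices, params):
    """Build one member of the micro-architecture family."""
    inst = dict(BASE)
    for name in choices:
        blk = LIBRARY[name]
        inst[name] = {}
        if blk["reconfigurable"]:
            for p, (lo, hi) in blk["params"].items():
                v = params[name][p]
                if not lo <= v <= hi:  # constraint imposed by the family
                    raise ValueError(f"{name}.{p}={v} outside [{lo}, {hi}]")
                inst[name][p] = v
    return inst

inst = derive_instance(["uart", "sram"], {"sram": {"kib": 128}})
print(inst["sram"])  # {'kib': 128}
```

Requesting a parameter outside the allowed range raises an error, mirroring the constraints a platform places on its instances.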
22.3.2 API Platform
The concept of architecture platform by itself is not enough to achieve the level of application software
reuse we require. The architecture platform has to be abstracted at a level where the application
software sees a high-level interface with the hardware that we call the API or Programmer's Model. A software
layer is used to perform this abstraction. This layer wraps the essential parts of the architecture
platform:
The programmable cores and the memory subsystem, via a Real-Time Operating System (RTOS).
The I/O subsystem, via the device drivers.
The network connection, via the network communication subsystem.
In our framework, the API is a unique abstract representation of the architecture platform via the software
layer. Therefore, the application software can be reused for every platform instance. Indeed, the API is a
platform itself that we can call the API platform. Of course, the higher the abstraction level at which a
platform is defined, the more instances it contains. For example, to share the source code, we need to have
the same operating system but not necessarily the same instruction set, while to share the binary code, we
need to add the architectural constraints that force us to use the same ISA (Instruction Set Architecture),
thus greatly restricting the range of architectural choices.
The RTOS is responsible for the scheduling of the available computing resources and of the
communication between them and the memory subsystem. Note that, in several embedded system
applications, the available computing resources consist of a single microprocessor. In others, such as
wireless handsets, the combination of a Reduced Instruction Set Computer (RISC) microprocessor or
controller and DSP has been used widely in 2G, and now for 2.5G and 3G, and beyond. In set-top boxes,
a RISC for control and a media processor have also been used. In general, we can imagine a multiple core
architecture platform where the RTOS schedules software processes across different computing engines.
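The reuse enabled by the API platform can be sketched in a few lines. The interface and the control law below are toy inventions, not the chapter's tooling; they show only the structural idea that application software is written against an abstract service layer and runs unchanged on any platform instance implementing it:

```python
# Sketch of an API platform: application code depends only on the
# abstract services, never on a concrete platform instance.
from abc import ABC, abstractmethod

class PlatformAPI(ABC):
    """Abstract services wrapped by the software layer
    (RTOS scheduling, device drivers, network stack)."""
    @abstractmethod
    def read_sensor(self) -> float: ...
    @abstractmethod
    def actuate(self, value: float) -> None: ...

class SingleCPUInstance(PlatformAPI):
    """One (hypothetical) platform instance implementing the API."""
    def __init__(self):
        self.log = []
    def read_sensor(self) -> float:
        return 1.0
    def actuate(self, value: float) -> None:
        self.log.append(value)

def control_app(api: PlatformAPI):
    """Application software written only against the API platform."""
    x = api.read_sensor()
    api.actuate(-0.5 * x)  # toy control law

hw = SingleCPUInstance()
control_app(hw)  # the same source runs on every instance of the API
print(hw.log)    # [-0.5]
```

A multi-core instance with an RTOS scheduling processes across engines would implement the same interface, leaving `control_app` untouched.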
22.3.3 System Platform Stack
The basic idea of the system platform stack is captured in Figure 22.2. The vertex of the two cones represents
the combination of the API and the architecture platform. A system designer maps an application
onto the abstract representation that includes a family of architectures that can be chosen to optimize
cost, efficiency, energy consumption, and flexibility. The mapping of the application onto the actual
architecture in the family specified by the API can be carried out, at least in part, automatically if a set of
appropriate software tools (e.g., software synthesis, RTOS synthesis, device-driver synthesis) is available.
It is clear that the synthesis tools have to be aware of the architecture features as well as of the API. This
set of tools makes use of the software layer to go from the API platform to the architecture platform. Note
that the system platform effectively decouples the application development process (the upper triangle)
from the architecture implementation process (the lower triangle). Note also that, once we use the abstract
definition of the API as described earlier, we may obtain extreme cases such as traditional PC platforms on
[Figure: two cones meeting at the system platform vertex; application instances in the application space are mapped down (platform mapping), while the architectural space below exports platform instances upward (platform design-space export).]
FIGURE 22.2 System platform stack.
one side and full hardware implementation on the other. Of course, the programmer's model for a full
custom hardware solution is trivial since there is a one-to-one map between the functions to be implemented
and the physical blocks that implement them. In the latter case, PBD amounts to adding some higher
levels of abstraction to traditional design methodologies.
22.4 Network Platforms
In distributed systems, the design of the protocols and channels that support the communication among
the system components is a difficult task owing to the tight constraints on performance and cost. To make
the communication design problem more manageable, designers usually decompose the communication
function into distinct protocol layers and design each layer separately. According to this approach, of
which the Open Systems Interconnection (OSI) Reference Model is a particular instance, each protocol
layer together with the lower layers defines a platform that provides communication services (CSs) to the
upper layers and to the application-level components. Identifying the most effective layered architecture
for a given application requires one to solve a trade-off between performance, which increases by minimizing
the number of layers, and design manageability, which improves with the number of intermediate steps. Present
embedded system applications, owing to their tight constraints, increasingly demand the codesign of
protocol functions that in less-constrained applications are assigned to different layers and considered
separately (e.g., cross-layer design of MAC and routing protocols in sensor networks). The
definition of an optimal layered architecture, the design of the correct functionality for each protocol
layer, and the design-space exploration for the choice of the physical implementation must be supported
by tools and methodologies that allow designers to evaluate performance and guarantee the satisfaction of the
constraints after each step. For these reasons, we believe that the PBD principles and methodology provide
the right framework to design communication networks. In this section, first, we formalize the concept of
a Network Platform (NP). Then, we outline a methodology for selecting, composing, and refining NPs [6].
22.4.1 Definitions
A Network Platform is a library of resources that can be selected and composed together to form a Network
Platform Instance (NPI) and support the interaction among a group of interacting components.
The structure of a NPI is defined by abstracting computation resources as nodes and communication
resources as links. Ports interface nodes with links or with the environment of the NPI. The structure of
a node or a link is defined by its input and output ports; the structure of a NPI is defined by a set of nodes
and the links connecting them.
The behaviors and the performance of a NPI are defined in terms of the type and the quality of the
CSs it offers. We formalize the behaviors of a NPI using the Tagged Signal Model [7]. NPI components
are modeled as processes, and events model the instances of the send and receive actions of the processes.
An event is associated with a message that has a type and a value, and with tags that specify attributes of
the corresponding action instance (e.g., when it occurs in time). The set of behaviors of a NPI is defined
by the intersection of the behaviors of the component processes.
A NPI is defined as a tuple, NPI = (L, N, P, S), where:
L = {L_1, L_2, ..., L_Nl} is a set of directed links.
N = {N_1, N_2, ..., N_Nn} is a set of nodes.
P = {P_1, P_2, ..., P_Np} is a set of ports. A port P_i is a triple (N_i, L_i, d), where N_i ∈ N is a node,
L_i ∈ L ∪ {Env} is a link or the NPI environment, and d = in if it is an input port, d = out
if it is an output port. The ports that interface the NPI with the environment define the sets
P_in = {(N_i, Env, in)} ⊆ P and P_out = {(N_i, Env, out)} ⊆ P.
S = ∩_{i=1..Nn+Nl} R_i is the set of behaviors, where R_i indicates the set of behaviors of a resource that
can be a link in L or a node in N.
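The structural part of this definition is straightforward to encode. As a sketch (the node and link names are invented, and the environment is represented by a sentinel value), the nodes, directed links, and port triples can be modeled as:

```python
# Sketch of the NPI structure: nodes, directed links, and ports as
# triples (node, link-or-environment, direction).
from dataclasses import dataclass
from typing import Optional

ENV = None  # stands for the NPI environment

@dataclass(frozen=True)
class Port:
    node: str
    link: Optional[str]  # a link name, or ENV for the environment
    direction: str       # "in" or "out"

@dataclass
class NPI:
    nodes: set
    links: dict          # link name -> (source node, destination node)
    ports: set

    def boundary_ports(self, direction):
        """P_in / P_out: ports interfacing the NPI with its environment."""
        return {p for p in self.ports
                if p.link is ENV and p.direction == direction}

npi = NPI(
    nodes={"N1", "N2"},
    links={"L1": ("N1", "N2")},
    ports={Port("N1", ENV, "in"), Port("N1", "L1", "out"),
           Port("N2", "L1", "in"), Port("N2", ENV, "out")},
)
print(len(npi.boundary_ports("in")), len(npi.boundary_ports("out")))  # 1 1
```

The behavior set S is omitted here; in the formalization it would be the intersection of the behaviors of the node and link processes.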
The basic services provided by a NPI are called Communication Services (CSs). A CS consists of a sequence
of message exchanges through the NPI from its input to its output ports. A CS can be accessed by NPI
users through the invocation of send and receive primitives whose instances are modeled as events. A NPI
API consists of the set of methods that are invoked by the NPI users to access the CSs. For the definition
of a NPI API it is essential to specify not only the service primitives but also the type of CS they provide
access to (e.g., reliable send, out-of-order delivery, etc.). Formally, a CS is a tuple (P'_in, P'_out, M, E, h, g, <_t),
where P'_in ⊆ P_in is a nonempty set of NPI input ports, P'_out ⊆ P_out is a nonempty set of NPI output ports,
M is a nonempty set of messages, E is a nonempty set of events, h is a mapping h : E → (P'_in ∪ P'_out) that
associates each event with a port, g is a mapping g : E → M associating each event with a message, and <_t is
a total order on the events in E.
A CS is defined in terms of: the number of ports, which determines, for example, whether it is a unicast, multicast,
or broadcast CS; the set M of messages representing the exchanged information; and the set E including the
events that are associated with the messages in M and that model the instances of the send and receive method
invocations. The CS concept is useful to express the correlation among events, and to make explicit, for example,
whether two events are from the same source or are associated with the same message.
22.4.2 Quality of Service
NPIs can be classified according to the number, the type, the quality, and the cost of the CSs they offer.
Rather than in terms of event sequences, a CS is more conveniently described using Quality of Service
(QoS) parameters such as error rate, latency, throughput, and jitter, and cost parameters such as the consumed
power and manufacturing cost of the NPI components. QoS parameters can be simply defined by using
annotation functions that associate individual events with quantities, such as the time when an event
occurs and the power consumed by an action. Hence, one can compare the values of pairs of input and
output events associated with the same message to quantify the error rate, or compare the timestamps of
events observed at the same port to compute the jitter. The most relevant QoS parameters are defined
using a notation where e_{i,j} ∈ E indicates an event carrying the i-th message and observed at
the j-th port, and v(e) and t(e) represent, respectively, the value of the message carried by event e and the
timestamp of the action modeled by event e.
Delay. The communication delay of a message is given by the difference between the timestamps of
the input and output events carrying that message. Assuming that the i-th message is transferred from
input port j_1 to output port j_2, the delay Δ_i of the i-th message, the average delay Δ_Av, and the peak delay
Δ_Peak are defined, respectively, as

Δ_i = t(e_{i,j_2}) − t(e_{i,j_1}),
Δ_Av = (Σ_{i=1}^{|M|} (t(e_{i,j_2}) − t(e_{i,j_1}))) / |M|,
Δ_Peak = max_i {t(e_{i,j_2}) − t(e_{i,j_1})}.
Throughput. The throughput is given by the number of output events in an interval (t_0, t_1), that is, the
cardinality of the set Θ = {e_i ∈ E | h(e_i) ∈ P'_out, t(e_i) ∈ (t_0, t_1)}.
Error rate. The Message Error Rate (MER) is given by the ratio between the number of lost or corrupted
output events and the total number of input events. Given LostM = {e_i ∈ E | h(e_i) ∈ P'_in, ∄ e_j ∈ E
s.t. h(e_j) ∈ P'_out ∧ g(e_j) = g(e_i)}, CorrM = {e_i ∈ E | h(e_i) ∈ P'_in, ∃ e_j ∈ E s.t. h(e_j) ∈ P'_out, g(e_j) =
g(e_i), v(e_j) ≠ v(e_i)}, and InM = {e_i ∈ E | h(e_i) ∈ P'_in}, the MER = (|LostM| + |CorrM|)/|InM|. Using
information on the message encoding, the MER can be converted to packet and bit error rates.
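The QoS parameters defined above are easy to compute from an event trace. In this sketch an event is a tuple (message id, port, value, timestamp), and ports whose names start with "in"/"out" stand for the input and output port sets (the trace and naming convention are illustrative):

```python
# Compute delay, average/peak delay, and MER from a toy event trace.
def delays(events):
    """Per-message delay: t(output event) - t(input event)."""
    t_in, t_out = {}, {}
    for msg, port, _val, t in events:
        (t_in if port.startswith("in") else t_out)[msg] = t
    return {m: t_out[m] - t_in[m] for m in t_out if m in t_in}

def peak_and_average_delay(events):
    d = list(delays(events).values())
    return max(d), sum(d) / len(d)

def message_error_rate(events):
    """MER = (|LostM| + |CorrM|) / |InM|."""
    vin = {m: v for m, p, v, _ in events if p.startswith("in")}
    vout = {m: v for m, p, v, _ in events if p.startswith("out")}
    lost = [m for m in vin if m not in vout]                    # LostM
    corr = [m for m in vin if m in vout and vout[m] != vin[m]]  # CorrM
    return (len(lost) + len(corr)) / len(vin)

trace = [(1, "in0", "a", 0.0), (1, "out0", "a", 2.0),  # delivered
         (2, "in0", "b", 1.0), (2, "out0", "x", 3.5),  # corrupted
         (3, "in0", "c", 2.0)]                         # lost
peak, avg = peak_and_average_delay(trace)
print(peak, avg, message_error_rate(trace))  # peak 2.5, avg 2.25, MER 2/3
```

Jitter could be computed analogously by comparing timestamps of successive events at the same port.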
The number of CSs that a NPI can offer is large, so the concept of Classes of Communication Services
(CCSs) is introduced to simplify the description of a NPI. A CCS defines a new abstraction (and therefore
a platform) that groups together CSs of similar type and quality. For example, a CCS may include all the
CSs that transfer a periodic stream of messages with no errors; another CCS may include all the CSs that
transfer a stream of input messages arriving at a bursty rate with a 1% error rate. CCSs can be identified
based on the type of messages (e.g., packets, audio samples, video pixels, etc.), the input arrival pattern
(e.g., periodic, bursty, etc.), and the range of QoS parameters. For each NPI supporting multiple CSs, there
are several ways to group them into CCSs. It is the task of the NPI designer to identify the CCSs and provide
the proper abstractions to facilitate the use of the NPI.
Platform-Based Design for Embedded Systems 22-9
22.4.3 Design of Network Platforms
The design methodology for NPs derives a NPI implementation by successive refinement from the specification
of the behaviors of the interacting components and the declaration of the constraints that a NPI
implementation must satisfy. The most abstract NPI is defined by a set of end-to-end direct logical links
connecting pairs of interacting components. Communication refinement of a NPI defines at each step a
more detailed NPI' by replacing one or multiple links in the original NPI with a set of components or
NPIs. During this process, another NPI can be used as a resource to build other NPIs. A correct refinement
procedure generates a NPI' that provides CSs equivalent to those offered by the original NPI with respect
to the constraints defined at the upper level. A typical communication refinement step requires one to define
both the structure of the refined NPI', that is, its components and topology, and the behavior of these
components, that is, the protocols deployed at each node. One or more NP components (or predefined
NPIs) are selected from a library and composed to create CSs of better quality. Two types of composition
are possible. One consists of choosing a NPI and extending it with a protocol layer to create CSs at a higher
level of abstraction (vertical composition). The other is based on the concatenation of NPIs using an
intermediate component called an adapter (or gateway) that maps sequences of events between the ports
being connected (horizontal composition).
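The two composition types can be sketched as follows. The lossy link, the retransmission layer, and the port-renaming adapter are all toy stand-ins (not the chapter's components), chosen only to show how vertical composition builds a better CS on top of an NPI while horizontal composition glues two NPIs through an adapter:

```python
# Vertical composition: wrap an NPI with a protocol layer that
# improves its CS. Horizontal composition: map events across NPIs.
def unreliable_npi(msgs):
    """A lossy link (toy behavior): drops every third message."""
    return [m for i, m in enumerate(msgs) if i % 3 != 2]

def with_retransmission(npi, msgs, retries=2):
    """Vertical composition: a retransmission layer on top of `npi`
    creates a more reliable CS at a higher level of abstraction."""
    delivered = set(npi(msgs))
    for _ in range(retries):
        missing = [m for m in msgs if m not in delivered]
        delivered |= set(npi(missing))
    return [m for m in msgs if m in delivered]

def adapter(event):
    """Horizontal composition: an adapter (gateway) maps events on the
    output ports of NPI A onto the input ports of NPI B."""
    msg, port = event
    return (msg, port.replace("A.out", "B.in"))

print(with_retransmission(unreliable_npi, list(range(6))))
print(adapter((7, "A.out0")))  # (7, 'B.in0')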
22.5 Fault-Tolerant Platforms
The increasing role of embedded software in real-time feedback-control systems drives the demand for
fault-tolerant design methodologies [8]. The aerospace and automotive industries offer many examples of
systems whose failure may have unacceptable costs (financial, human, or both). Designing cost-sensitive
real-time control systems for safety-critical applications requires a careful analysis of the cost/coverage
trade-offs of fault-tolerant solutions. This further complicates the task of deploying the embedded software
that implements the control algorithms on the execution platform. The latter is often distributed
around the plant, as is typical, for instance, in automotive applications. In this section, we present a
synthesis-based design methodology that relieves the designers from the burden of specifying detailed
mechanisms for addressing the execution platform faults, while involving them in the definition of the
overall fault-tolerance strategy. Thus, they can focus on addressing plant faults within their control
algorithms, selecting the best components for the execution platform, and defining an accurate fault
model. Our approach is centered on a new model of computation, Fault-Tolerant Data Flows (FTDF),
that enables the integration of formal validation techniques.
22.5.1 Types of Faults and Platform Redundancy
In a real-time feedback-control system, like the one in Figure 22.3, the controller interacts with the plant by
means of sensors and actuators. A controller is a hardware-software system where the software algorithms
that implement the control law run on an execution platform. An execution platform is a distributed
system that is typically made of a software layer (RTOS, middleware services, . . .) and a hardware layer
(a set of processing elements, called Electronic Control Units or ECUs, connected via communication
channels such as buses, crossbars, or rings). The design of these heterogeneous reactive distributed systems
is made even more challenging by the requirement of making them resilient to faults. Technically, a fault
is the cause of an error, an error is the part of the system state that may cause a failure, and a failure is
the deviation of the system from the specification [9]. A deviation from the specification may be owing to
the designers' mistakes (bugs) or to accidents occurring while the system is operating. The latter can
be classified into two categories that are relevant for feedback-control systems: plant faults and execution
platform faults. Theoretically, all bugs can be eliminated before the system is deployed. In practice, they are
minimized by using design environments that are based on precise Models of Computation (MoCs), whose
well-defined semantics enable formal validation techniques [10-12] (e.g., synchronous languages [13]).
[Figure: the controller consists of embedded software (control law algorithms over an RTOS and middleware) running on a hardware architecture of interconnected ECUs with sensor and actuator drivers; it closes the loop with the plant through sensors and actuators.]
FIGURE 22.3 A real-time control system.
Instead, plant faults and execution platform faults must be dealt with online. Hence, they must be included
in the specification of the system to be designed.
Plant faults, including faults in sensors and actuators, must be handled at the algorithmic level using
estimation techniques and adaptive control methods. For instance, a drive-by-wire system [14,15] might
need to handle properly a tire puncture or the loss of one of the four brakes. Faults in the execution
platform affect the computation, storage, and communication elements. For instance, a loss of power may
turn off an ECU, momentarily or forever. System operation can be preserved in spite of platform faults if
alternative resources supplying the essential functionality of the faulty one are available. Hence, the process
of making the platform fault-tolerant usually involves the introduction of redundancy, with an obvious impact
on the final cost. While the replication of a bus or the choice of a faster microprocessor may not sensibly
affect the overall cost of a new airplane, their impact is quite significant for high-volume products
like the ones of the automotive industry. The analysis of the trade-offs between higher redundancy and
lower costs is a challenging hardware-software codesign task that designers of fault-tolerant systems for
cost-sensitive applications must face, in addition to the following two: (1) how to introduce redundancy,
and (2) how to deploy the redundant design on a distributed execution platform. Since these activities
are both tedious and error prone, designers often rely on off-the-shelf solutions to address fault tolerance,
such as the Time-Triggered Architecture (TTA) [16]. One of the main advantages of off-the-shelf solutions
is that the application does not need to be aware of the fault-tolerance mechanisms that are transparently
provided by the architecture to cover the execution platform faults. Instead, designers may focus their
attention on avoiding design bugs and tuning the control algorithms to address the plant faults. However,
the rigidity of off-the-shelf solutions may lead to suboptimal results from a design cost viewpoint.
22.5.2 Fault-Tolerant Design Methodology
We present an interactive design methodology that involves designers in the exploration of the
redundancy/cost trade-off [17]. To do so efficiently, we need automatic tools to bridge the different
platforms in the system platform stack. In particular, we introduce automatic synthesis techniques that
process simultaneously the algorithm specification, the characteristics of the chosen execution platform,
and the corresponding fault model. Using this methodology, the designers focus on the control algorithms
and the selection of the components and architecture for the execution platform. In particular, they also
specify the relative criticality of each algorithm process. Based on a statistical analysis of the failure rates,
which should be part of the characterization of the execution platform's library, designers specify the
expected set of platform faults, that is, the fault model. Then, we use this information to (1) automatically
deduce the necessary software process replication, (2) distribute each process on the execution platform,
and (3) derive an optimal scheduling of the processes on each ECU to satisfy the overall timing constraints.
Together, the three steps (replication, mapping, and scheduling) result in the automatic deployment of the
embedded software on the distributed execution platform. Platforms export performance estimates, and
we can determine for each control process its worst-case execution time (WCET) on a given component.²
Then, we can use a set of verification tools to assess the quality of the deployment; most notably, we have
a static timing analysis tool to predict the worst-case latency from sensors to actuators. When the final
results do not satisfy the timing constraints for the control application, precise guidelines are returned
to the designers, who may use them to refine the control algorithms, modify the execution platform, and
revisit the fault model. While being centered on a synthesis step, our approach does not exclude the use of
predesigned components, such as TTA modules, communication protocols such as TTP [19], and fault-tolerant
operating systems. These components can be part of a library of building blocks that the designer
uses to further explore the fault-coverage/cost trade-off. Finally, the proposed methodology is founded
on a new MoC, FTDF, thus making it amenable to the integration of formal validation techniques. The
corresponding API platform consists primarily of the FTDF MoC.
22.5.2.1 Fault Model
For the sake of simplicity we assume fail silence: components either provide correct results or do not
provide any result at all. Recent work shows that fail-silent platforms can be realized with limited
area overhead and virtually no performance penalty [20]. The fail-silence assumption can be relaxed
if invalid results are detected otherwise, as in the case of CRC-protected communication and voted
computation [21]. However, it is important to note that the proposed API platform (FTDF) is fault-model
independent. For instance, the presence of value errors, where majority voting is needed, can be accounted
for in the implementation of the FTDF communication media (see Section 22.5.3). The same is true for
Byzantine failures, where components can have any behavior, including malicious ones like coordinating
to bring the system down to a failure [22]. In addition to the type of faults, a fault model also specifies the
number (or even the mix) of faults to be tolerated [23]. A statistical analysis of the various components'
MTBFs (Mean Time Between Faults), their interactions, and MTBRs (Mean Time Between Repairs) should
determine which subsystems have a compound MTBF that is so short as to be of concern, and this analysis should be
part of the platform component characterization. The use of failure patterns to capture these
interactions effectively was proposed in Reference 24, which is the basis of our approach [17].
22.5.2.2 Setup
Consider the feedback-control system in Figure 22.3. The control system repeats the following sequence at
each period Tmax: (1) sensors are sampled, (2) software routines are executed, and (3) actuators are updated
with the newly processed data. The actuator updates are applied to the plant at the end of the period to
help minimize jitter, a well-known technique in the real-time control community [25,26]. In order to
guarantee correct operation, the WCET among all possible iterations, that is, the worst-case latency from
sensors to actuators, must be smaller than the given period Tmax (the real-time constraint), which is
determined by the designers of the controller based on the characteristics of the application. Moreover,
the critical subset of the control algorithms must be executed in spite of the specified platform faults.
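The sample-compute-actuate pattern above can be sketched in code. This is a minimal illustration with invented names (`run_iteration`, simulated timing), not the chapter's runtime: the key point it shows is that actuator updates are packaged into a deferred step applied only at the period boundary, and that the real-time constraint WCET <= Tmax is checked up front.

```python
# Sketch (hypothetical names): one iteration of the sample-compute-actuate
# pattern, with actuator updates deferred to the period boundary to
# minimize jitter. Timing is simulated rather than real.
def run_iteration(sensors, routines, actuators, t_max, wcet):
    """Execute one control period; the real-time constraint is wcet <= t_max."""
    assert wcet <= t_max, "WCET exceeds the period: real-time constraint violated"
    samples = [s() for s in sensors]          # (1) sample sensors
    outputs = samples
    for routine in routines:                  # (2) run software routines
        outputs = routine(outputs)
    # (3) return a closure to be invoked at the end of the period T_max
    return lambda: [a(v) for a, v in zip(actuators, outputs)]

# Usage: the returned closure is invoked only at the period boundary.
sensed = lambda: 1.0
double = lambda xs: [2 * x for x in xs]
log = []
apply_at_boundary = run_iteration([sensed], [double], [log.append],
                                  t_max=0.010, wcet=0.004)
apply_at_boundary()
print(log)  # [2.0]
```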
22.5.2.3 Example
Figure 22.4 illustrates an FTDF graph for a paradigmatic feedback-control application, the inverted
pendulum control system. The controller is described as a bipartite directed graph G where the
vertices, called actors and communication media, represent software processes and data communication.
²See Reference 18 for some issues and techniques to estimate WCETs.
FIGURE 22.4 Controlling an inverted pendulum. (Three sensor actors feed an input actor; coarse and fine control tasks feed an arbiter; an output actor drives two actuators acting on the plant, with communication media m between all stages.)
FIGURE 22.5 A simple platform graph. (Three ECUs, ECU0 to ECU2, attached to two channels, CH0 and CH1.)
Figure 22.5 illustrates a possible platform graph (PG), where vertices represent ECUs and communication
channels and edges describe their interconnections.
22.5.2.4 Platform Characteristics
Each vertex of PG is characterized by its failure rate and by its timing performance. A failure pattern is
a subset of vertices of PG that may fail together during the same iteration, with a probability high enough
to be of concern. A set of failure patterns identifies the fault scenarios to be tolerated. Based on the timing
performance, we can determine the WCET of actors on the different ECUs and the worst-case transmission
time of data on channels. Graphs G and PG are related in two ways:
Fault-tolerance binding: for each failure pattern, the execution of a corresponding subset of the
actors of G must be guaranteed. This subset is identified a priori based on the relative criticality
assignment.
Functional binding: a set of mapping constraints and performance estimates indicates where on PG
each vertex of G may be mapped and the corresponding WCET.
These bindings are the basis for deriving a fault-tolerant deployment of G on PG. We use software replication
to achieve fault tolerance: critical routines are replicated statically (at compile time) and executed on
separate ECUs, and the processed data are routed over multiple communication paths to withstand channel
failures. In particular, for a deployment to be correct in the absence of faults, all actors and
data communications must be mapped onto ECUs and channels in PG. Then, for a correct fault-tolerant
deployment, critical elements of G must be mapped onto additional PG vertices to guarantee their correct
and timely execution under any possible failure pattern in the fault model.
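The fault-tolerance requirement on a deployment can be checked mechanically. The sketch below is illustrative (the function name and data layout are invented, and it ignores channel failures, which the full binding would also cover): for every failure pattern, each critical actor must retain at least one replica on a surviving ECU.

```python
# Sketch (illustrative names, not the chapter's tool): checking the
# fault-tolerance binding of a deployment -- for every failure pattern,
# each critical actor must keep at least one replica on a surviving ECU.
# Channel failures are ignored here for brevity.
def deployment_tolerates(mapping, critical_actors, failure_patterns):
    """mapping: actor -> set of ECUs hosting its replicas.
    Returns (ok, offending_actor, offending_pattern)."""
    for pattern in failure_patterns:
        for actor in critical_actors:
            if mapping[actor] <= pattern:   # all replicas lost together
                return False, actor, pattern
    return True, None, None

# coarse_ctrl is replicated on ECU0 and ECU1, so it survives any
# single-ECU failure; fine_ctrl (non-critical) is not replicated.
mapping = {"coarse_ctrl": {"ECU0", "ECU1"}, "fine_ctrl": {"ECU2"}}
ok, actor, pattern = deployment_tolerates(
    mapping, ["coarse_ctrl"], [{"ECU0"}, {"ECU1"}, {"ECU2"}])
print(ok)  # True
```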
22.5.2.5 Design Flow
Using the interactive design flow of Figure 22.6, designers:
Specify the controller (the top-left FTDF graph)
Assemble the execution platform (the top-right PG)
Specify a set of failure patterns (subsets of PG)
Specify the fault-tolerance binding (fault behavior)
Specify the functional binding
All this information contributes to specifying what the system should do and how it should be
implemented. A synthesis tool automatically:
Introduces redundancy in the FTDF graph
Maps actors and their replicas onto PG
Schedules their execution
Finally, a verification tool checks whether the fault-tolerant behavior and the timing constraints are met.
If no solution is found, the tool returns a violation witness that can be used to revisit the specification and
to provide hints to the synthesis tool.
FIGURE 22.6 Interactive design flow. (The fault behavior and mapping relate the FTDF graph of sensors, input, coarse/fine control tasks, arbiter, output, and actuators to the platform graph of ECU0 to ECU2 and channels CH0 and CH1; the lower graph shows the controller after replication of the critical actors.)
22.5.3 The API Platform (FTDF Primitives)
In this section we present the structure and general semantics of the FTDF MoC. The basic building
blocks are actors and communication media. FTDF actors exchange data tokens at each iteration with
synchronous semantics [13].
An actor belongs to one of six possible classes: sensors, actuators, inputs, outputs, tasks, or arbiters.
Sensor and actuator actors read and update, respectively, the sensor and actuator devices interacting
with the plant. Input actors perform sensor fusion, output actors are used to balance the load on the
actuators, while task actors are responsible for the computation workload. Arbiter actors mix values
that come from actors of different criticality and reach the same output actor (e.g., the braking command
and the Antilock Braking System [ABS]).³ Finally, state memories are connected to actors and operate as
one-iteration delays. With a slight abuse of terminology, the terms state memory and memory actor are
used interchangeably in this section.
22.5.3.1 Tokens
Each token consists of two fields: Data, the actual data being communicated, and Valid, a boolean flag
indicating the outcome of fault detection on this token. When Valid is false, either no data is available
for this iteration or the available data is not correct. In both cases the Data field should be ignored. The
Valid flag is just an abstraction of more concrete and robust fault detection implementations.
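The two-field token can be sketched directly. The class below is illustrative, not FTDF's actual API; it only mirrors the Data/Valid structure described above.

```python
# Sketch of the token abstraction described above (field names follow the
# text; the class itself is illustrative, not FTDF's actual API).
from dataclasses import dataclass
from typing import Any

@dataclass
class Token:
    data: Any = None
    valid: bool = False   # outcome of fault detection for this iteration

INVALID = Token()         # no data available, or data detected as incorrect

t = Token(data=3.14, valid=True)
print(t.valid, INVALID.valid)  # True False
```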
22.5.3.2 Communication Media
Communication occurs via unidirectional (possibly many-to-many) communication media. All replicas
of the same source actor write to the same medium, and all destination actors read from it. Media act
as both mergers and repeaters, sending the single merged result to all destinations. More formally, the
medium provides the correct merged result, or an invalid token if no correct result is determined.
Assuming fail silence, merging amounts to selecting any of the valid results; assuming value errors,
majority voting is necessary; assuming Byzantine faults requires rounds of voting (see the consensus
problem [27]). Communication media must be distributed to withstand platform faults. Typically, this
means having a repeater on each source ECU and a merger on each destination ECU (broadcasting
communication channels greatly help reduce message traffic). Using communication media, actors
always receive exactly one token per input, and the application behavior is independent of the type of platform
faults. The transmission of tokens is initiated by the active elements: regular actors and memory actors.
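The two simpler merge policies named above can be sketched as follows (Byzantine voting rounds are omitted). This is an illustrative fragment with invented helper names, not the chapter's implementation: under fail silence any valid token may be selected, while under value errors a strict majority over valid tokens is required.

```python
# Sketch (illustrative): the two merge policies mentioned above. Under
# fail silence any valid token can be selected; under value errors a
# strict majority vote over the valid tokens is required.
from collections import Counter

def merge_fail_silent(tokens):
    for t in tokens:
        if t["valid"]:
            return t                             # any valid result will do
    return {"valid": False, "data": None}

def merge_majority(tokens):
    votes = Counter(t["data"] for t in tokens if t["valid"])
    if votes:
        data, count = votes.most_common(1)[0]
        if count > sum(votes.values()) // 2:     # strict majority
            return {"valid": True, "data": data}
    return {"valid": False, "data": None}        # no majority: invalid token

replicas = [{"valid": True, "data": 7}, {"valid": True, "data": 7},
            {"valid": True, "data": 9}]          # one value error
print(merge_majority(replicas))  # {'valid': True, 'data': 7}
```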
22.5.3.2.1 Regular Actors
When an actor fires, its sequential code is executed. This code is: stateless (state must be stored in memory
actors), deterministic (identical inputs generate identical outputs), nonblocking (once fired, it does not
wait for further tokens, data, or signals from other actors), and terminating (bounded WCET). The firing
rule specifies which subsets of input tokens must be valid to fire the actor, typically all of them (AND
firing rule). However, the designer may need to specify partial firing rules for input and arbiter actors. For
example, an input actor reading data from three sensors may produce a valid result even when one of the
sensors cannot deliver data (e.g., when the ECU where the sensor is mapped is faulty).
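The AND rule and a partial firing rule can be contrasted in two lines each. This is a minimal sketch (names are illustrative): the partial rule here is "at least k of n valid inputs", matching the three-sensor example above.

```python
# Sketch: an AND firing rule versus a partial (k-of-n) firing rule, as
# described for input/arbiter actors. Names are illustrative.
def and_rule(valids):
    return all(valids)

def k_of_n_rule(valids, k):
    return sum(valids) >= k

sensor_valid = [True, True, False]   # one sensor's ECU is faulty
print(and_rule(sensor_valid), k_of_n_rule(sensor_valid, 2))  # False True
```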
22.5.3.2.2 Memory Actors (State Memories)
A memory provides its state at the beginning of an iteration and has a source actor, possibly replicated,
that updates its state at every iteration. State memories are analogous to latches in a sequential digital
circuit: they store the results produced during the current iteration for use in the next one.
Finally, FTDF graphs can express redundancy, that is, one or more actors may be replicated. All the
replicas of an actor v ∈ A are denoted by R(v) ⊆ A. Note that any two actors in R(v) are of the same type
and must compute the same function. This basic condition is motivated in Section 22.5.5, where replica
³We advocate running non-safety-critical tasks, for example, door controllers, on separate hardware. However, some
performance-enhancement tasks, for example, side-wind compensation, may share sensors and actuators with critical
tasks (steer-by-wire). It may be profitable to have them share the execution platform as well.
determinism is discussed. Note that the replication of sensors and actuators is not performed automatically
because it may have a major impact on cost; we discuss the implications of this choice in Reference 17.
22.5.4 Fault-Tolerant Deployment
The result of the synthesis is a redundant mapping L, that is, an association of elements of the FTDF
network with multiple elements of the execution platform, and, for each element in the execution platform,
a schedule S, that is, a total order in which actors should be executed and data should be transmitted.
A pair (L, S) is called a deployment. To avoid deadlocks, the total orders defined by S must be compatible
with the partial order in L, which in turn derives directly from the partial order in which the FTDF actors
in the application must be executed. To avoid causality problems, memory actors are scheduled before
any other actor, thus using the results of the previous iteration. Schedules based on total orders are called
static: there are no runtime decisions to make; each ECU and each channel controller simply follows the
schedule. However, in the context of a faulty execution platform, an actor may not receive enough valid
inputs to fire, and this may lead to starvation. This problem is solved by skipping an actor if it cannot fire
and by skipping a communication if no data is available [24].
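The skip-if-cannot-fire policy can be sketched for a single ECU's static schedule. This is an illustrative fragment (invented names, AND firing rule assumed): each actor either fires, producing a valid token, or is skipped, producing an invalid one, so the schedule always runs to completion with no runtime decisions beyond the validity test.

```python
# Sketch (illustrative, AND firing rule): executing a static schedule in
# which an actor that cannot fire is skipped rather than awaited, so a
# faulty platform cannot starve the schedule.
def run_schedule(schedule, inputs, valid):
    """schedule: total order of actors; inputs: actor -> input token names;
    valid: token name -> bool. Each actor produces a token named after itself."""
    for actor in schedule:
        if all(valid.get(tok, False) for tok in inputs[actor]):
            valid[actor] = True     # actor fires, producing a valid token
        else:
            valid[actor] = False    # skip: not enough valid inputs
    return valid

valid = {"s0": True, "s1": False}   # sensor s1 lost to an ECU fault
inputs = {"input": ["s0"], "ctrl": ["input"], "out": ["ctrl", "s1"]}
result = run_schedule(["input", "ctrl", "out"], inputs, valid)
print(result)  # "out" is skipped because its "s1" input is invalid
```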
22.5.5 Replica Determinism
Given a mapping L, it is important to preserve replica determinism: if two replicas of the same actor fire,
they produce identical results. For general MoCs, the order of arrival of results must also be the same for all
replicas; the synchrony of FTDF makes this check unnecessary. Clearly, the execution platform must contain
the implementation of a synchronization algorithm [28].
Replica determinism in FTDF can be achieved by enforcing two conditions: (1) all replicas compute the
same function, and (2) for any failure pattern, if two replicas get a firing subset of inputs, they get the
same subset of inputs. Condition (1) is enforced by construction by allowing only identical replicas.
Condition (2) amounts to a consensus problem, and it can either be checked at runtime (as for Byzantine
agreement rounds of voting) or analyzed statically at compile time (if the fault model is milder).
Our interest in detectably faulty execution platforms makes the latter approach appear more promising
and economical. Condition (2) is trivially true for all actors with the AND firing rule. For input and
arbiter actors the condition must be checked and enforced [17].
22.6 Analog Platforms
Emerging applications such as multimedia devices (video cell phones, digital cameras, and wireless PDAs, to
mention but a few) are driving the SoC market towards the integration of analog components in almost
every system. Today, system-level analog design is a process dominated by heuristics. Given a set of
specifications/requirements that describes the system to be realized, the selection of a feasible (let alone
optimal) implementation architecture comes mainly out of experience. Usually, what is achieved is just
a feasible point at the system level, while optimality is sought locally at the circuit level. This practice is
caused by the number of second-order effects that are very hard to deal with at a high level without actually
designing the circuit. Platform-based design can provide the necessary insight to develop a methodology
for analog components that takes into consideration system-level specifications and can choose among a
set of possible solutions, including digital approaches wherever it is feasible to do so. If the productivity
gap between analog and digital components is not overcome, the time-to-market and design quality of SoCs
will be seriously affected by the small analog sections required to interface with the real world. Moreover,
SoC designs will expose system-level explorations that would be severely limited if the analog section is
not provided with a proper abstraction level that allows system performance estimation in an efficient way
and across the analog/digital boundary. Therefore, there is a strong need to develop more abstract design
techniques that can encapsulate analog design into a methodology that could shorten design time without
compromising the quality of the solutions, leading to a hardware/software/analog co-design paradigm for
embedded systems.
22.6.1 Definitions
The platform abstraction process can be extended to analog components in a very natural way. Deriving
behavioral and performance models, however, is more involved due to the tight dependency of analog
components on device physics, which requires the use of continuous mathematics for modeling the relations
among design variables. Formally, an Analog Platform (AP) consists of a set of components, each
decorated with:
a set of input variables u ∈ U, a set of output (performance) variables y ∈ Y, a set of internal
variables (including state variables) x ∈ X, and a set of configuration parameters κ ∈ K; some parameters
take values in a continuous space, some take values in a discrete set, for example, when they encode
the selection of a particular alternative.
a behavioral model that expresses the behavior of the component, represented implicitly as
F(u, y, x, κ) = 0, where F(·) may include integro-differential components; in general, this set
determines uniquely x and y given u and κ. Note that the variables considered here can be functions
of time and that the functional F includes constraints on the set of variables (for example, the
initial conditions on the state variables).
a feasible performance model. Let φ_y(u, κ) denote the map that computes the performance y
corresponding to a particular value of u and κ by solving the behavioral model. The set of feasible
analog performances (such as gain, distortion, power) is the set described by the relation
P(y(u, κ)) = 1, where y(u, κ) = φ_y(κ, u).
validity laws L(u, y, x, κ) ≤ 0, i.e., constraints (or assumptions) on the variables and parameters
of the component that define the range of the variables for which the behavioral and performance
models are valid.
Note that there is no real need to define the feasible performance model, since the necessary information
is all contained in the behavioral model. We prefer to keep them separate because of the use we make of
them in explaining our approach.
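The four ingredients of an AP component can be sketched as a data structure. This is a hypothetical structure (the class and the toy amplifier are invented for illustration, with κ written as `k`): the behavioral model is an implicit residual F(u, y, x, κ), the performance model P a predicate on y, and the validity laws L(u, y, x, κ) ≤ 0 a signed constraint function.

```python
# Sketch (hypothetical structure): an Analog Platform component as the
# ingredients listed above -- behavioral model F, performance model P,
# and validity laws L, over variables (u, y, x) and parameters kappa (k).
from dataclasses import dataclass
from typing import Callable

@dataclass
class APComponent:
    behavioral: Callable   # F(u, y, x, k): residual, implicit form F(...) == 0
    performance: Callable  # P(y) -> bool: is y a feasible performance point?
    validity: Callable     # L(u, y, x, k): models valid where L(...) <= 0

# Toy linear amplifier: y = k * u, feasible gain in [1, 10], valid |u| < 0.1.
amp = APComponent(
    behavioral=lambda u, y, x, k: y - k * u,
    performance=lambda y: 1.0 <= y <= 10.0,
    validity=lambda u, y, x, k: abs(u) - 0.1,
)
print(amp.behavioral(0.05, 0.25, None, 5.0) == 0.0,  # model satisfied
      amp.validity(0.05, 0.25, None, 5.0) <= 0.0)    # inside validity range
```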
At the circuit level of abstraction, the behavioral models are the circuit equations, with x being the
voltages, currents, and charges, and y being a subset of x and/or a function of x and κ when they express
performance figures such as power or gain. To compute performance models, we need to solve the
behavioral models, which implies solving ordinary differential equations, a time-consuming task. In the past,
methods were proposed to approximate the relation between y and κ (the design variables) with an explicit
function. In general, to compute this approximation, a number of evaluations of the behavioral model
for a number of parameter values is performed (by simulation, for example) and then an interpolation or
approximation scheme is used to derive the approximation to the map φ_y. We see in Section 22.6.2 how
to compute an approximation to the feasible performance set directly.
Example 22.1 Considering an OTA for an arbitrary application, we can start building a platform from the
circuit level by defining:
U as the set of all possible input voltages V_in(t) s.t. |V_in| < 100 mV and bandwidth of V_in < 3 MHz;
Y as the space of vectors {V_out(t), gain, IIP3, r_out} (IIP3 is the third-order intermodulation intercept
point referred to the input, r_out is the output resistance), X the set of all internal currents and voltages,
and K the set of transistor sizings.
for a transistor-level component, the behavioral model F consists of the solution of the circuit
equations, e.g., through a circuit simulator.
φ_y(u, κ) as the set of all possible y.
validity laws L are obtained from Kirchhoff's laws when composing individual transistors and other
constraints, e.g., maximum power ratings or breakdown voltages.
We can build a higher-level (level 1) OpAmp platform where:
U¹ is the same; Y¹ is the output voltage of the OpAmp; X is empty; K¹ consists of the possible {gain,
IIP3, r_out} triples (thus it is a projection of Y⁰);
F¹ can be expressed in explicit form:
y_1(t) = h(t) ∗ (a_1 u(t) + a_3 u(t)³) + noise,  y_2 = a_1,  y_3 = √((4/3)|a_1/a_3|)
φ_y is the set of possible y;
there are no validity constraints; L < 0 always.
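The explicit level-1 model can be exercised numerically. A caution on the code below: the IIP3 expression is reconstructed here as y_3 = √((4/3)|a_1/a_3|), the standard input-referred intercept amplitude for a memoryless cubic nonlinearity, since the printed formula in this edition is garbled; the helper name is invented.

```python
# Sketch: evaluating the level-1 OpAmp performance figures from the
# polynomial coefficients a1 (linear gain) and a3 (cubic term).
# y3 = sqrt((4/3) * |a1/a3|) is the usual IIP3 amplitude for a memoryless
# cubic nonlinearity; the chapter's formula is assumed equivalent.
import math

def level1_performance(a1: float, a3: float):
    gain = a1                                     # y2
    iip3 = math.sqrt((4.0 / 3.0) * abs(a1 / a3))  # y3 (input amplitude)
    return gain, iip3

gain, iip3 = level1_performance(a1=10.0, a3=-1.0)
print(gain, round(iip3, 3))  # 10.0 3.651
```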
When a platform instance is considered, we have to compose the models of the components to obtain
the corresponding models for the instance. The platform instance is then characterized by
a set of internal variables of the platform ξ = [ξ_1, ξ_2, ..., ξ_n],
a set of inputs of the platform, h ∈ H,
a set of performances π ∈ Π,
a set of parameters ζ ∈ Z.
The variable names are different from the names used to denote the variables of the components to
stress that there may be situations where some of the component variables change roles (for example, an
input variable of one component may become an internal variable; a new parameter can be identified in the
platform instance that is not visible or useful at the component level). To compose the models, we have to
include in the platform the composition rules. The legal compositions are characterized by the interconnect
equations, which specify which variables are shared when composing components, and by constraints that
define when the composition is indeed possible. These constraints may involve ranges of variables as well
as nonlinear relations among variables. Formally, a connection establishes a pairwise equality between
internal variables (for example, ξ_i = ξ_j), inputs, and performances; we denote the set of interconnect relations
by c(h, ξ, π, ζ) = 0, which are in general a set of linear equalities. The composition constraints are denoted
by L(h, ξ, π, ζ) ≤ 0, which are, in general, nonlinear inequalities. Note that in the platform instance all
internal variables of the components are present, as well as all input variables. In addition, there is no
internal or input variable of the platform instance that is not an internal or input variable of one of the
components. The behavioral model of the platform instance is the union of all behavioral models of
the components conjoined with the interconnect relations. The validity laws are the conjunction of the
validity laws of the components and of the composition constraints. The feasible performance model may
be defined anew on the platform instance, but it may also be obtained by composition of the performance
models of the components. There is an important and interesting case when the composition may be
done considering only the feasible performance models of the components, obtained by appropriate
approximation techniques. In this case, the composition constraints assume the semantics of defining
when the performance models may be composed. For example, if we indicate with λ the parameters
related to internal nodes that characterize the interface in Figure 22.7(a) (e.g., input/output impedance
in the linear case), then matching between the λs has to be enforced during composition. In fact, both P_A
and P_B were characterized with specific λs (Figure 22.7[b]), so L has to constrain the A-B composition
consistently with the performance models. In this case, an architectural exploration step, consisting of forming
different platform instances out of the component library and evaluating them, can be performed very
quickly, albeit possibly with restrictions on the space of the considered instances caused by the composition
constraints.
FIGURE 22.7 Interface parameter λ during composition A-B and characterization of A and B. (Panel [a]: platform composition, A driving B through the interface parameter λ; panel [b]: characterization setup, where each platform is characterized against an equivalent model of the other through source and load parameters λ_S and λ_L.)
Example 22.2 We can build a level 2 platform consisting of an OpAmp (OA) and a unity-gain buffer
following it (UB; the reader can easily find a proper definition for it), and then define a higher-level
OpAmp platform component so that:
ξ_1 = V_in^OA, ξ_2 = V_out^OA, ξ_3 = V_in^UB, ξ_4 = V_out^UB, connected in series by specifying ξ_2 = ξ_3;
h, connected to ξ_1, is the set of input voltages V_in(t);
Π is the space of π_1(t), the cascade response in time, π_2 = gain, and π_3 = IIP3. In this case π_2
immediately equals y_2^OA, while π_3 is a nonlinear function of y^OA and y^UB;
Z consists of all parameters specifying a platform instance; in this case we may have Z = Y^OA × Y^UB.
a platform-instance composability law L requires that the load impedance Z_L > 100 r_out both at
the output of the OpAmp and of the unity buffer.
22.6.2 Building Performance Models
An important part of the methodology is obtaining performance models. We already mentioned that we
need to approximate the set Ȳ explicitly, eliminating the dependence on the internal variables x. To do so,
a simulation-based approach is proposed.
22.6.2.1 Performance Model Approximation
In general terms, simulation maps a configuration set (typically connected) K into a performance set in
Y, thus establishing a relation among points belonging to the mapped set. Classic regression schemes
provide an efficient approximation to the mapping function φ(·); however, our approach requires dealing
with performance data in two different ways. The first one, referred to as the performance model P, allows
discriminating between points in Ȳ and points in Y\Ȳ. The second one, φ⁻¹(·), implements the
inverse mapping from Ȳ into K, used to map down from a higher-level platform layer to a lower one.
However, fundamental issues (i.e., φ(·) being an invertible function) and accuracy issues (a regression from
R^m into R^n) suggest a table-lookup implementation for φ⁻¹(·), possibly followed by a local optimization
phase to improve the mapping. Therefore, we will mainly focus on basic performance models P.
The set Ȳ ⊆ Y defines a relation in Y denoted by P. We use Support Vector Machines (SVMs) as
a way of approximating the performance relation P [29]. SVMs provide approximating functions of the
form

f(x) = sign( Σ_i α_i e^(−γ|x−x_i|²) − ρ )    (22.1)
where x is the vector to be classified, the x_i are observed vectors, the α_i are weighting multipliers, ρ is a biasing
constant, and γ is a parameter controlling the fit of the approximation. More specifically, SVMs exploit
mappings to Hilbert spaces so that hyperplanes can be used to perform classification. Mapping to
high-dimensional spaces is achieved through kernel functions, so that a kernel k(·, ·) is associated with each
point. Since the only general assumption we can make on φ(·) is continuity, and on K connectivity⁴,
we can only deduce that Ȳ is connected as well. Therefore, the radial basis function (Gaussian) kernel is
chosen, k(x, x′) = e^(−γ|x−x′|²), where γ is a parameter of the kernel and controls the width of the kernel
function around each point. We resort to a particular formulation of SVMs known as one-class SVM, where an
optimal hyperplane is determined to separate the data from the origin. The optimal hyperplane can be computed
very efficiently through a quadratic program, as detailed in [30].
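A decision function of the form of Equation (22.1) can be evaluated directly. The sketch below is illustrative: in a real flow the α_i and ρ would come from one-class SVM training (e.g., scikit-learn's `OneClassSVM`), whereas here uniform weights and a hand-picked ρ are assumed just to show how the classifier separates points inside the feasible performance set from points far outside it.

```python
# Sketch (uniform weights, hand-picked rho -- not a trained model):
# evaluating a classifier of the form of Equation (22.1),
#   f(x) = sign( sum_i alpha_i * exp(-gamma * |x - x_i|^2) - rho ).
import math

def decision(x, support, alphas, rho, gamma):
    s = sum(a * math.exp(-gamma * sum((xi - si) ** 2 for xi, si in zip(x, sv)))
            for a, sv in zip(alphas, support))
    return 1 if s - rho >= 0 else -1

# Observed feasible performance points (e.g., gain, power) and two queries.
support = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1)]
alphas = [1.0 / len(support)] * len(support)
print(decision((1.0, 1.0), support, alphas, rho=0.5, gamma=1.0),  # inside: 1
      decision((5.0, 5.0), support, alphas, rho=0.5, gamma=1.0))  # outside: -1
```

With the Gaussian kernel, points near the observed samples accumulate weight above ρ and classify as feasible; distant points decay to zero and classify as infeasible, which is exactly the Ȳ versus Y\Ȳ discrimination required of P.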
22.6.2.2 Optimizing the Approximation Process
Sampling schemes for approximating unknown functions are exponentially dependent on the size of the
function support. In the case of circuits, none but the very simplest could realistically be characterized
in this way. Fortunately, there is no need to sample the entire space K, since we can use additional
information obtained from design considerations to exclude parts of the parameter space. The set of
interesting parameters is delimited by a set of constraints of three types:
topological constraints derived from the use of particular circuit structures, such as two stacked
transistors sharing the same current or a set of V_DS summing to zero;
physical constraints induced by device physics, such as the V_GS-V_DS relation that enforces saturation,
or g_m-I_D relations;
performance constraints on circuit performances, such as the minimum gain or minimum phase
margin that can be achieved.
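Constraint-driven pruning of the configuration space can be sketched as a rejection sampler. This is a toy illustration (invented names and invented numeric bounds, with 0.4 V standing in for a threshold voltage): one physical constraint (saturation) and one performance constraint (a current budget) discard uninteresting configurations before any expensive simulation is run.

```python
# Sketch (toy bounds, illustrative names): rejection sampling of a circuit
# configuration space, keeping only configurations that satisfy a physical
# constraint (saturation: Vds >= Vgs - Vth, with Vth = 0.4 V assumed) and
# a performance constraint (drain current within a power budget).
import random

def sample_configs(n, seed=0):
    rng = random.Random(seed)
    kept = []
    while len(kept) < n:
        vgs = rng.uniform(0.0, 1.2)
        vds = rng.uniform(0.0, 1.2)
        i_d = rng.uniform(0.0, 1e-3)
        if vds < vgs - 0.4:        # physical: device out of saturation
            continue
        if i_d > 5e-4:             # performance: exceeds current budget
            continue
        kept.append((vgs, vds, i_d))
    return kept

configs = sample_configs(100)
print(len(configs))  # 100
```

Only the kept configurations would then be simulated to build the performance model, which is how the constraints shrink the effective sampling space K.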
Additional constraints can be added as the designer's understanding of the circuit improves. The more constraints
we add, the smaller the interesting configuration space K. However, if a constraint is tight, i.e., it either
defines lower-dimensional manifolds (for example, when the constraint is an equality) or the measure of
the manifold is small, it is more likely to introduce some bias in the sampling mechanism because of
the difficulty of selecting points in these manifolds. To eliminate this ill-conditioning effect, we relax
these constraints to include a larger set of interesting parameters. We adopt a statistical means of relaxing
constraints by introducing random errors with the aim of dithering systematic errors and recovering
accuracy in a statistical sense. Given an equality constraint f(κ) = 0 and its approximation
f̂(κ) = 0, we derive a relaxation [...] Ȳ_top that defines the set
of achievable performances at the top level. The intersection of the two sets defines the feasible set for the
optimization process. The result of the process is a y_opt^top. Then the process maps the selected
point back to the lower levels of the hierarchy. If the abstractions are conservative, the top-down process is
straightforward. Otherwise, at each level of the hierarchy, we have to verify it using the performance models,
the behavioral models, and the validity laws. In some cases, a better design may be obtained by introducing
in the top-down phase cost functions and constraints that are defined only at a particular abstraction
level. In this case, the space of achievable performances intersected with this new set of constraints defines
the search space for the optimization process. At times, it is more convenient to project the cost
function and the constraints of the higher-level abstraction down to the next level. In this case, the
search space is the result of the intersection of three sets in the performance space, and the cost function is
a combination of the projected cost function and the one defined at this level. A flow chart summarizing
the top-down flow with platforms is shown in Figure 22.10. Figure 22.11 reports the set of configurations
evaluated during an optimization run for the UMTS front-end in [32], visualizing how multiple
topologies are exploited in selecting optimal points.
FIGURE 22.8 Bottom-up phase for generating an AP. (For each topology: derive the ACG and a nominal configuration, generate P, define the behavioral model and the performance model P, then select a new topology and repeat.)
FIGURE 22.9 Sample model hierarchy for an LNA platform. The root node provides performance constraints for
a generic LNA, which is then refined by more detailed P for specific classes of LNAs. (Root: P(G, NF, P, IP3); refinements: tuned P_T(G, NF, P, IP3, f_0, Q), np-input P_np(G, NF, P, IP3, f_0, Q, IP2), active-L P_L(G, NF, P, IP3, f_0, Q), wideband P_W(G, NF, P, IP3, f_3dB).)
The peculiarity of a platform approach to mixed-signal design resides in the accurate performance model
constraints P that propagate to the top-level, architecture-related constraints. For example, a platform
stack can be built where multiple analog implementation architectures are presented at a common level
of abstraction together with digital enhancement platforms (possibly including several algorithms and
hardware architectures), each component being annotated with feasible performance spaces. Solving the
system design problem at the top level, where the platforms contain both analog and digital components,
allows selecting optimal platform instances in terms of analog and digital solutions, comparing how
different digital solutions interact with different analog topologies, and finally selecting the best tradeoff.
The final verification step is also greatly simplified by the platform approach since, in the end, the models
and performances used in the top-down phase were obtained with a bottom-up scheme. Therefore, a
consistency check of models, performances, and composition effects is all that is required at a hierarchical
FIGURE 22.10 Top-down phase for analog design-space exploration. (Build the system with APs; define a formal set of feasibility conditions; define an objective function for optimization; optimize the system constraining behavioral models to their P; refine/add platforms as needed; return optimal performances and candidate solutions.)
level, followed by more costly, low-level simulations that check for possibly important effects that were
neglected when characterizing the platform.
22.6.4 Reconfigurable Platforms
Analog platforms can also be used to model programmable fabrics. In the digital implementation platform
domain, FPGAs provide a very intuitive example of a platform, for example, when including microprocessors on
chip. The appearance of Field Programmable Analog Arrays (FPAAs) [33] constitutes a new attempt to build
reconfigurable Analog Platforms. A platform stack can be built by exploiting the software tools that allow
mapping complex functionalities (filters, amplifiers, triggers, and so on) directly onto the array. The top-level
platform, then, provides an API to map and configure analog functionalities, exposing analog hardware
at the software level. By exploiting this abstraction, not only is design exploration greatly simplified,
but new synergies between higher layers and analog components can be leveraged to further increase
flexibility/reconfigurability and optimize the system. From this abstraction level, implementing a
functionality with digital signal processing (FPGA) or analog processing (FPAA) becomes subject to
system-level optimization while exposing the same abstract interface. Moreover, very interesting tradeoffs
can be explored by exploiting different partitionings between analog and digital components and leveraging
the reconfigurability of the FPAA. For example, limited analog performance can be mitigated by proper
reconfiguration of the FPAA, so that a tight interaction between analog and digital subsystems can provide
a new optimum from the system-level perspective.
22.7 Concluding Remarks
We defined PBD as an all-encompassing intellectual framework in which scientific research, design tool
development, and design practices can be embedded and justified. In our definition, a platform is simply
FIGURE 22.11 Example of architecture selection during the top-down phase (optimization trace; axes: NF versus Pd). In the picture, an LNA is being selected.
Circles correspond to architecture 1 instances, crosses to architecture 2 instances. The black circle is the optimal LNA
configuration. It can be inferred that, after an initial exploration phase alternating both topologies, simulated annealing
finally focuses on architecture 1 to converge.
an abstraction layer that hides the details of the several possible implementation refinements of the
underlying layer. PBD allows designers to trade off the various components of manufacturing, NRE, and
design costs, while sacrificing as little potential design performance as possible. We presented examples
of these concepts at different key articulation points of the design process, including system platforms as
composed of two platforms (micro-architecture and API), NPs, and APs.
This concept can be used to interpret traditional design steps in ASIC development such as synthesis
and layout. In fact, logic synthesis takes a level of abstraction consisting of an HDL representation (the HDL
platform) and maps it onto a set of gates that are defined in a library. The library itself is the gate-level
platform. The logic synthesis tools are the mapping methods that select a platform instance (a particular
netlist of gates that implements the functionality described at the HDL platform level) according to a
cost function defined on the parameters that characterize the quality of the elements of the library in
view of the overall design goals. The present difficulties in achieving timing closure in this flow indicate
the need for a different set of characterization parameters for the implementation platform. In fact, in
the gate-level platform the cost associated with the selection of a particular interconnection among gates is
not reflected, a major problem since the performance of the final implementation depends critically on
this. The present solution of making a larger step across platforms by mixing mapping tools such as logic
synthesis, placement, and routing may not be the right one. Instead, a larger pay-off could be had by
changing levels of abstraction and including better parametrization of the implementation platform.
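The cost-driven selection of a platform instance can be illustrated with a small sketch (Python; the library cells, their area and delay figures, and the cost weights are invented for the example and are not library data from the chapter). It also shows how shifting the weight toward delay changes which instance is chosen, which is exactly where a poor parametrization of the implementation platform hurts timing closure.

```python
# Hypothetical sketch of platform mapping as cost-driven selection: each
# function node at the HDL level is mapped to a gate-level platform
# instance (a library cell) minimizing a cost that mixes area and delay.
# All names and numbers below are illustrative, not taken from the text.

LIBRARY = {  # gate-level platform: cell -> (area, delay in ns)
    "NAND2_X1": (1.0, 0.30),
    "NAND2_X4": (2.5, 0.12),
    "NOR2_X1":  (1.1, 0.35),
}

def map_node(candidates, area_weight=1.0, delay_weight=5.0):
    """Select the platform instance (cell) minimizing the cost function."""
    def cost(cell):
        area, delay = LIBRARY[cell]
        return area_weight * area + delay_weight * delay
    return min(candidates, key=cost)

# A NAND function can be implemented by either NAND2 variant; the choice
# flips once the cost function weighs delay heavily enough.
best = map_node(["NAND2_X1", "NAND2_X4"])
```

With the default weights the smaller cell wins; raising `delay_weight` makes the faster cell win, mirroring how the characterization parameters exposed by the platform drive the mapping outcome.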
We argued in this chapter that the value of PBD can be multiplied by providing an appropriate set of
tools and a general framework where platforms can be formally defined in terms of rigorous semantics,
manipulated by appropriate synthesis and optimization tools, and verified. Examples of platforms have
been given using the concepts that we have developed. We conclude by mentioning that the Metropolis
design environment [34], a federation of integrated analysis, verification, and synthesis tools supported
by a rigorous mathematical theory of meta-models and agents, has been designed to provide a general
open-domain PBD framework.
Acknowledgments
We gratefully acknowledge the support of the Gigascale Silicon Research Center (GSRC), the Center for
Hybrid and Embedded Software Systems (CHESS) supported by an NSF ITR grant, the Columbus Project
of the European Community, and the Network of Excellence ARTIST. Alberto Sangiovanni-Vincentelli
would like to thank Alberto Ferrari, Luciano Lavagno, Richard Newton, Jan Rabaey, and Grant Martin for
their continuous support in this research.
We also thank the members of the DOP Center of the University of California at Berkeley for their
support and for the atmosphere they created for our work. The Berkeley Wireless Research Center
and our industrial partners (in particular, Cadence, Cypress Semiconductors, General Motors, Intel,
Xilinx, and ST Microelectronics) have contributed designs and continuous feedback to make this
approach more solid. Felice Balarin, Jerry Burch, Roberto Passerone, Yoshi Watanabe, and the Cadence
Berkeley Labs team have been invaluable in contributing to the theory of meta-models and the Metropolis
framework.
References
[1] K. Keutzer, S. Malik, A.R. Newton, J. Rabaey, and A. Sangiovanni-Vincentelli. System level design:
orthogonalization of concerns and platform-based design. IEEE Transactions on Computer-Aided
Design of Integrated Circuits and Systems, 19(12), 2000.
[2] A.L. Sangiovanni-Vincentelli. Defining platform-based design. In EEDesign, February 2002.
Available at www.eedesign.com/story/OEG20020204S0062.
[3] Felice Balarin, Massimiliano Chiodo, Paolo Giusto, Harry Hsieh, Attila Jurecska, Luciano Lavagno,
Claudio Passerone, Alberto Sangiovanni-Vincentelli, Ellen Sentovich, Kei Suzuki, and Bassam
Tabbara. Hardware–Software Co-Design of Embedded Systems: The POLIS Approach. Kluwer
Academic Publishers, Boston/Dordrecht/London, 1997.
[4] Henry Chang, Larry Cooke, Merrill Hunt, Grant Martin, Andrew McNelly, and Lee Todd.
Surviving the SOC Revolution: A Guide to Platform Based Design. Kluwer Academic Publishers,
Boston/Dordrecht/London, 1999.
[5] A. Ferrari and A.L. Sangiovanni-Vincentelli. System design: traditional concepts and new
paradigms. In Proceedings of the International Conference on Computer Design, October 1999,
pp. 1–12.
[6] Marco Sgroi. Platform-based design methodologies for communication networks. PhD thesis,
Electronics Research Laboratory, University of California, Berkeley, CA, December 2002.
[7] E.A. Lee and A. Sangiovanni-Vincentelli. A framework for comparing models of computation.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 17: 1217–1229,
1998.
[8] E.A. Lee. What's ahead for embedded software? Computer, 33: 18–26, 2000.
[9] J.C. Laprie, (Ed.). Dependability: Basic Concepts and Terminology in English, French, German,
Italian and Japanese, Vol. 5 Series Title: Dependable Computing and Fault-Tolerant Systems.
Springer-Verlag, New York, 1992.
[10] R. Alur, T. Dang, J. Esposito, Y. Hur, F. Ivancic, V. Kumar, I. Lee, P. Mishra, G.J. Pappas, and
O. Sokolsky. Hierarchical modeling and analysis of embedded systems. Proceedings of the IEEE,
91: 11–28, 2003.
[11] S. Edwards, L. Lavagno, E. Lee, and A.L. Sangiovanni-Vincentelli. Design of embedded systems:
formal methods, validation and synthesis. Proceedings of the IEEE, 85: 266–290, 1997.
[12] J. Eker, J.W. Janneck, E.A. Lee, J. Liu, J. Ludwig, S. Neuendorffer, S. Sachs, and Y. Xiong. Taming
heterogeneity — the Ptolemy approach. Proceedings of the IEEE, 91: 127–144, 2003.
[13] A. Benveniste, P. Caspi, S. Edwards, N. Halbwachs, P. Le Guernic, and R. de Simone. The
synchronous languages 12 years later. Proceedings of the IEEE, 91: 64–83, 2003.
[14] R. Bannatyne. Time triggered protocol — fault tolerant serial communications for real-time
embedded systems. In Wescon/98 Conference Proceedings, 1998.
[15] R. Schwarz and P. Rieth. Global chassis control — integration of chassis systems.
Automatisierungstechnik, 51: 300–312, 2003.
[16] H. Kopetz and D. Millinger. The transparent implementation of fault tolerance in the time-
triggered architecture. In Dependable Computing for Critical Applications. San Jose, CA, 1999.
[17] C. Pinello, L.P. Carloni, and A.L. Sangiovanni-Vincentelli. Fault-tolerant deployment of embedded
software for cost-sensitive real-time feedback-control applications. In Proceedings of the European
Design and Test Conference. ACM Press, 2004.
[18] C. Ferdinand, R. Heckmann, M. Langenbach, F. Martin, M. Schmidt, H. Theiling, S. Thesing, and
R. Wilhelm. Reliable and precise WCET determination for a real-life processor. Lecture Notes in
Computer Science, 2211: 469–485, 2001.
[19] H. Kopetz and G. Grünsteidl. TTP — a protocol for fault-tolerant real-time systems. IEEE
Computer, 27: 14–23, 1994.
[20] M. Baleani, A. Ferrari, L. Mangeruca, A. Sangiovanni-Vincentelli, M. Peri, and S. Pezzini. Fault-
tolerant platforms for automotive safety-critical applications. In Proceedings of the International
Conference on Compilers, Architectures and Synthesis for Embedded Systems. ACM Press, 2003,
pp. 170–177.
[21] F.V. Brasileiro, P.D. Ezhilchelvan, S.K. Shrivastava, N.A. Speirs, and S. Tao. Implementing fail-silent
nodes for distributed systems. IEEE Transactions on Computers, 45: 1226–1238, 1996.
[22] L. Lamport, R. Shostak, and M. Pease. The Byzantine generals problem. ACM Transactions on
Programming Languages and Systems, 4: 382–401, 1982.
[23] H.S. Siu, Y.H. Chin, and W.P. Yang. Reaching strong consensus in the presence of mixed failure
types. IEEE Transactions on Parallel and Distributed Systems, 9, 1998.
[24] C. Dima, A. Girault, C. Lavarenne, and Y. Sorel. Off-line real-time fault-tolerant scheduling.
In Proceedings of the Euromicro 2001, Mantova, Italy, February 2001.
[25] T.A. Henzinger, B. Horowitz, and C.M. Kirsch. Embedded control systems development with
Giotto. In Proceedings of the Languages, Compilers, and Tools for Embedded Systems. ACM Press,
2001, pp. 64–72.
[26] A.J. Wellings, L. Beus-Dukic, and D. Powell. Real-time scheduling in a generic fault-tolerant
architecture. In Proceedings of the RTSS'98. Madrid, Spain, December 1998.
[27] M. Barborak, M. Malek, and A. Dahbura. The consensus problem in fault-tolerant computing.
ACM Computing Surveys, 25: 171–220, 1993.
[28] L. Lamport and P. Melliar-Smith. Byzantine clock synchronization. In Proceedings of the Third
ACM Symposium on Principles of Distributed Computing. ACM Press, New York, 1984, pp. 68–74.
[29] F. De Bernardinis, M.I. Jordan, and A.L. Sangiovanni-Vincentelli. Support vector machines for
analog circuit performance representation. In Proceedings of the Design Automation Conference,
June 2003.
[30] J. Platt. Sequential minimal optimization: a fast algorithm for training support vector machines.
Microsoft Research, MSR-TR-98-14, 1998.
[31] P. Bunus and P. Fritzson. A debugging scheme for declarative equation-based modeling languages.
In Practical Aspects of Declarative Languages: 4th International Symposium, p. 280, 2002.
[32] F. De Bernardinis, S. Gambini, F. Vinci, F. Svelto, R. Castello, and A. Sangiovanni-Vincentelli.
Design space exploration for a UMTS front-end exploiting analog platforms. In Proceedings of the
International Conference on Computer-Aided Design, 2004.
[33] I. Macbeth. Programmable analog systems: the missing link. In EDA Vision (www.edavision.com),
July 2001.
[34] F. Balarin, Y. Watanabe, H. Hsieh, L. Lavagno, C. Passerone, and A. Sangiovanni-Vincentelli.
Metropolis: an integrated electronic system design environment. IEEE Computer, 36: 45–52,
2003.
23
Interface Specification
and Converter
Synthesis
Roberto Passerone
Cadence Design Systems, Inc.
23.1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-1
23.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-2
23.3 Automata-Based Converter Synthesis . . . . . . . . . . . . . . . . . . 23-4
Interface Specification • Requirements Specification • Synthesis
23.4 Algebraic Formulation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-10
Trace-Based Solution • End-to-End Specification
23.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-18
Acknowledgments. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-19
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-19
23.1 Introduction
Reuse is an established technique in modern design methodologies to reduce the complexity of designing
a system. Design reuse complements a design methodology by providing precharacterized components
that can be put together to perform the desired function. Together with abstraction and refinement
techniques, design reuse is at the basis of such methodologies as platform-based design [1–3]. A platform
consists of a set of library elements, or resources, that can be assembled and interconnected according
to predetermined rules to form a platform instance. One step in a platform-based design flow involves
mapping a function or a specification onto different platform instances, and evaluating its performance.
By employing existing components and interconnection structures, reuse in a platform-based design flow
shifts the functional verification problem from the verification of the individual elements to the verification
of their interaction [4, 5]. This technique can be used at all levels of abstraction in a design in order to
come to a complete implementation.
A design process can therefore be simplified by using a methodology that promotes the reuse of existing
components, also known as intellectual property, or IPs.¹ However, despite the advantages of precharacterization,
the correct deployment of these blocks when the IPs have been developed by different groups
inside the same company, or by different companies, is notoriously difficult. Unforeseen interactions may

¹The term intellectual property is used to highlight the intangible nature of virtual components, which essentially
consist of a set of property rights that are licensed, rather than of a physical entity that is sold.
often make the behavior of the resulting design unpredictable. Design rules have been proposed that
try to alleviate the problem by forcing the designers to be precise about the behavior of the individual
components and to verify this behavior under a number of assumptions about the environment in which
they have to operate. While this is certainly a step in the right direction, it is by no means sufficient to
guarantee correctness: extensive simulation and prototyping are still needed on the compositions. Several
methods have been proposed for hardware and software components that encapsulate the IPs so that their
behavior is protected from the interaction with other components. Interfaces are then used to ensure the
compatibility between components. Roughly speaking, two interfaces are compatible if they fit together
as they are.
Simple interfaces, typically specified in the type system of a system description language, may describe
the types of values that are exchanged between the components. This is the case, for example, of high-
level programming languages and hardware description languages. More expressive interfaces, typically
specified informally in design documents, may describe the protocol for the component interaction [6–11].
Several formal methodologies have been proposed for specifying the protocol aspects of interfaces in a way
that supports automatic compatibility checks [7, 8, 12]. The key elements of these approaches are the
interpretation of an interface in the context of its environment, a model-independent formalism, and the
use of automata and game-theoretic algorithms for compatibility checking. With these approaches, given
interfaces for different IPs, one can check whether these IPs can be composed.
When components are taken from legacy systems or from third-party vendors, interface protocols are
unlikely to be compatible. However, this does not necessarily mean that components cannot be combined
together: approaches have been proposed that adapt the components by constructing a converter among
the incompatible communication protocols [10, 13]. We refer to these techniques collectively as interface
synthesis or converter synthesis. Thus, informally, two interfaces are adaptable if they fit together by
communicating through a third component, the adapter. If interfaces specify only value types, then
adapters are simply type converters. However, if interfaces specify interaction protocols, then adapters
are protocol converters. For instance, a protocol may be defined as a formal language (a set of strings
from an alphabet) and can be finitely represented using automata [10]. The problem of converting one
protocol into another can then be addressed by considering their conjunction in terms of the product of
the corresponding automata and by removing the states and transitions that lead to a violation of one
of the two protocols. The converter uses state information to rearrange the communication between the
original interfaces, in order to ensure compatibility. A specification in the form of a third component can
be used to define which rearrangements are appropriate in a given communication context. For instance,
it is possible to specify that the converter can change the timing of messages, but not their order, using an
n-bounded buffer, or that some messages may or may not be duplicated. In this work we initially review
this methodology, and then introduce a mathematically sound interpretation and generalization that can
be applied in several different contexts.
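The product-and-prune scheme just outlined can be sketched as follows (a simplified Python rendering, not the chapter's actual algorithm: both protocols are deterministic automata over one shared alphabet, each given as a transition dictionary, and a "violation" is modeled as reaching a state from which no good state remains reachable).

```python
# Simplified sketch of converter-skeleton construction: the product runs
# the two protocols in lockstep on each symbol; pruning then removes
# states from which no "good" (e.g., quiescent) state is reachable,
# i.e., states that would force a violation of one of the protocols.

def automaton_product(delta1, delta2, init1, init2):
    """Reachable product automaton; delta: (state, symbol) -> next state."""
    states, trans = {(init1, init2)}, {}
    frontier = [(init1, init2)]
    while frontier:
        s1, s2 = frontier.pop()
        for (q, sym), nxt1 in delta1.items():
            if q != s1 or (s2, sym) not in delta2:
                continue  # symbol not accepted by both protocols here
            nxt = (nxt1, delta2[(s2, sym)])
            trans[((s1, s2), sym)] = nxt
            if nxt not in states:
                states.add(nxt)
                frontier.append(nxt)
    return states, trans

def prune(states, trans, good):
    """Keep only states from which some good state is still reachable."""
    keep = set(good) & set(states)
    changed = True
    while changed:
        changed = False
        for (s, _), t in trans.items():
            if t in keep and s not in keep:
                keep.add(s)
                changed = True
    live = {k: v for k, v in trans.items() if k[0] in keep and v in keep}
    return keep, live
```

For two toy protocols that both cycle 0 –a→ 1 –b→ 0 but also share an escape transition 1 –a→ 2 into a dead state, pruning with {(0, 0)} as the good set keeps only the states and transitions on the a·b cycle.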
This chapter is organized as follows. First we review some related work in Section 23.2. Then, in
Section 23.3, we illustrate with an example the automata-based approach to the synthesis of protocol
converters. We then introduce more general frameworks in Section 23.4 and discuss the solution of the
protocol conversion problem in Section 23.4.1.
23.2 Related Work
One of the first approaches to interface synthesis was proposed by Borriello [14, 15], who introduces the
event graph to establish correct synchronization of the operations and to determine the data sequencing.
The event graph is constructed at a very low level of abstraction (waveforms), and can be derived from
the timing diagrams of a protocol. In this approach, the two protocols should be made compatible by
manually assigning labels to the data on both sides in order to establish the correct correspondence.
Because the specification is expressed in terms of the real timing of the signals, this approach can handle
both synchronous and asynchronous protocols. Sun and Brodersen [16] extend the approach by providing
a library of components that frees the user from considering lower-level details, without, however, lifting
the requirement of manually identifying the data correspondence.
Another approach is that of Akella and McMillan [17]: the protocols are described as two finite state
machines, while a third finite state machine represents the valid transfer of data. The correspondence
between the protocols is therefore embedded in this last specification. The synthesis procedure consists
of taking the product machine of the two protocols, which is then pruned of the invalid/useless states,
according to the specification. In the form proposed by the authors, the procedure, however, does not
account for data explicitly, so that the converter is unable to handle data width mismatches in the protocols.
A different approach is that taken by Narayan and Gajski [18]: first, the protocol specification is reduced
to the combination of five basic operations (data read/write, control read/write, time delay); the protocol
description is then broken into blocks (called relations) whose execution is guarded by a condition on
one of the control wires or by a time delay; finally, the relations of the two protocols are matched into
sets that transfer the same amount of data. Because the data is broken up into sets, this algorithm is
able to account for data width mismatch between the communicating parties. However, the procedural
specification of the protocols makes it difficult to adapt different sequencing (order) of the data, so that
only the synchronization problem is solved.
Some of the limitations above are addressed by the procedure proposed by Passerone et al. [10].
The specification is simplified by describing the protocols as regular expressions, which more closely
match the structure of a protocol, rather than as finite state machines (of course, the two formalisms
carry the same expressive power). In addition, typing information is used to automatically deduce the
correspondence of data between the communicating parties, so that a third specification for the valid
transfers is not necessary. The synthesis procedure then follows the approach proposed by Akella by
first translating the regular expressions into automata, then constructing a product machine, and finally
pruning it of the illegal states. This approach was then extended to also include a specification of the valid
transactions, and was cast in the framework of game theory to account for more complex properties, such
as liveness [13].
Recently, Siegmund and Müller [19] have proposed a similar approach where the regular expressions
are embedded in the description language, in this case SystemC, through the use of appropriate supporting
classes. The advantage is that the interface description can be simulated directly with the existing application.
However, in this approach the user is required to describe the converter itself, instead of having it
be generated automatically from a description of the communicating protocols. In other words, issues of
synchronization and data sequencing must be solved upfront. Register transfer level code for the interface
can then be generated automatically from the SystemC specification.
More recent work has focused on studying the above issues in a more general setting, generalizing
the approach to modeling interfaces and to synthesis by abstracting away from the particular model of
computation. De Alfaro and Henzinger propose to use block algebras to describe the relation between
components and interfaces [8]. Block algebras are mathematical structures that are used to model a system
as a hierarchical interconnection of blocks. Blocks are further classified as components and interfaces.
Informally, components are descriptions of blocks that say what the block does. Conversely, interfaces
are descriptions of blocks that state the expectations that the block has with respect to its environment.
This distinction is based upon the observation that physical components do something in any possible
environment, whether they behave well or misbehave. In contrast, interfaces describe for each block the
environments that can correctly work with the block. Several different kinds of block algebras have been
developed for synchronous models, real-time models, and resource models, each carrying a particular
notion of compatibility [7, 20–22]. The authors, however, limit their study to questions of compatibility,
and do not address the problem of synthesizing adapters.
The solution to the problem of protocol synthesis in an abstract setting will be discussed in more
detail in Section 23.4, along with the presentation of the relevant related work. Informally, the problem
is formulated as an equation of the form P1 | C | P2 ⊑ G, where P1 and P2 are the incompatible
protocols, C the protocol converter, and G a global specification that defines the terms of the transactions.
The operator | represents the operation of composition, while the relation ⊑ expresses the notion of
conformance to the specification. This problem was first addressed by Larsen and Xinxin in the framework
of process algebra [23]. The solution is derived constructively by building a special form of transition
system. More recently, Yevtushenko et al. [24] presented a formulation of the problem in terms of languages
(sets of sequences of actions) under various kinds of composition operators. By working directly with
languages, the solution can then be specialized to different specific representations, including automata
and finite state machines. Finally, Passerone generalizes the solution by representing the models as abstract
algebras, and derives the conditions that guarantee the existence of a solution [12].
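On finite trace sets this equation admits a toy reading (our illustration only: composition is taken as plain trace intersection over one shared alphabet and conformance as containment, whereas the formulations cited above use richer composition operators).

```python
# Toy reading of the equation P1 | C | P2 ⊑ G over finite sets of traces
# (tuples of actions). With a single shared alphabet, composition
# degenerates to set intersection and conformance to containment; both
# choices are simplifications made for this illustration.

def conforms(p1, c, p2, g):
    """Check P1 | C | P2 ⊑ G: every composed trace must lie in G."""
    return (p1 & c & p2) <= g

def largest_converter(p1, p2, g, universe):
    """Largest C solving the equation: keep every trace except those that,
    composed with P1 and P2, would fall outside G."""
    return {t for t in universe if t not in (p1 & p2) or t in g}
```

Any subset of `largest_converter(...)` also conforms, mirroring the fact that the language-level solution is the largest one.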
23.3 Automata-Based Converter Synthesis
We introduce the problem of interface specification and protocol conversion by way of an example. We first
set up the conversion problem for send–receive protocols, where the sender and the receiver are specified as
automata. A third automaton, the requirement, is also introduced to specify constraints on the converter,
such as buffer size and the possibility of message loss. We then solve the protocol conversion by manually
(of course, the procedure is easy to automate!) deriving an adapter that conforms to both the protocols
and the requirements. Section 23.4.1 will discuss an algebraic solution to the same problem.
23.3.1 Interface Specification
A producer and a consumer component wish to communicate some complex data across a communication
channel. They both partition the data into two parts. The interface of the producer is defined so that it
can wait an unbounded amount of time between the two parts. Because the sender has only outputs, this
is equivalent to saying that the interface does not guarantee to its environment that the second part will
follow the first within a fixed finite time. On the other hand, the interface of the consumer is defined so that
it requires that once the first part has been received, the second is also received during the state transition
that immediately follows the first. Because the receiver has only inputs, this specification corresponds to
an assumption that the receiver makes on the set of possible environments that it can work with. Clearly,
the two protocols are incompatible. In fact, the sender may elect to send the first part of the data and then
wait for some time before sending the second part. Upon receiving the first part, the receiver will, however,
assume that the second part will be delivered right away. Since this is not the case, a protocol violation
will occur. In other words, the guarantees of the sender are not sufficient to prove that the assumptions
of the receiver are always satisfied. Thus a direct composition would result in a possible violation of the
protocols. Because no external environment can prevent this violation (the system has no inputs after the
composition), an intermediate converter must be inserted to make the communication possible. Below,
we illustrate how to synthesize a converter that enables sender and receiver to communicate correctly.
The two protocols can be represented by the automata shown in Figure 23.1. There, the symbols a and b
(and their primed counterparts) are used to denote the first and the second part of the data, respectively.
The symbol ⊥ denotes instead the absence or irrelevance of the data. In other words, it acts as a don't care.
Figure 23.1(a) shows the producer protocol. The self-loop in state 1 indicates that the transmission of
a can be followed by any number of cycles before b is also transmitted. We call this protocol handshake
because it could negotiate when to send the second part of the data. After b is transmitted, the protocol
returns to its initial state, and is ready for a new transaction.
Figure 23.1(b) shows the receiver protocol. Here state 1 does not have a self-loop. Hence, once a has
been received, the protocol assumes that b is transmitted in the cycle that immediately follows. This
protocol is called serial because it requires a and b to be transferred back-to-back. Similarly to the sender
protocol, once b is received the automaton returns to its initial state, ready for a new transaction.
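The two protocols can be transcribed directly as transition tables. In the Python sketch below, '-' stands in for the don't-care symbol, and the tables are our reading of Figure 23.1 (in particular, we also let both automata idle in their initial state); a word with a wait cycle between a and b then separates the two behaviors.

```python
# Handshake (sender) and serial (receiver) protocols of Figure 23.1 as
# deterministic automata: dict (state, symbol) -> next state, with '-'
# as the don't-care symbol. The tables are our transcription of the
# figure; we also allow both automata to idle in state 0.

HANDSHAKE = {(0, '-'): 0, (0, 'a'): 1, (1, '-'): 1, (1, 'b'): 0}
SERIAL    = {(0, '-'): 0, (0, 'a'): 1, (1, 'b'): 0}  # no self-loop in state 1

def accepts(delta, word, state=0):
    """Run the automaton on the word; reject on a missing transition."""
    for sym in word:
        if (state, sym) not in delta:
            return False
        state = delta[(state, sym)]
    return True

# The handshake sender may insert a wait cycle between a and b; the serial
# receiver has no transition for that cycle, so a direct connection fails.
```

Concretely, `accepts(HANDSHAKE, ['a', '-', 'b'])` holds while `accepts(SERIAL, ['a', '-', 'b'])` does not, which is exactly the incompatibility the converter must resolve.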
We have used nonprimed and primed versions of the symbols in the alphabet of the automata to
emphasize that the two sets of signals are different and should be connected through a converter. It is
the specification (below) that defines the exact relationships that must hold between the elements of
the two alphabets. Note that in the definition of the two protocols nothing relates the quantities of one
[Figure 23.1 diagrams: two two-state automata (states 0 and 1) over the symbols a and b.]
FIGURE 23.1 (a) Handshake and (b) serial protocols. (From Roberto Passerone, Luca de Alfaro, Thomas A.
Henzinger, and Alberto L. Sangiovanni-Vincentelli. In Proceedings of the IEEE/ACM International Conference on
Computer-Aided Design (ICCAD'02), November 2002. With permission. Copyright 2002 IEEE.)
(a and b) to those of the other (a′ and b′). The symbol a could represent the toggling of a signal, or
could symbolically represent the value of, for instance, an 8-bit variable. It is only in the interpretation of
the designer that a and a′ actually hold the same value. The specification that we are about to describe
does not enforce this interpretation, but merely defines the (partial) order in which the symbols can be
presented to and produced by the converter. It is possible to explicitly represent the values passed; this is
necessary when the behavior of the protocols depends on the data, or when the data values provided by
one protocol must be modified (translated) before being forwarded to the other protocol. The synthesis of
a protocol converter would then yield a converter capable of both translating data values, and of modifying
their timing and order. However, the price to pay for the ability to synthesize data translators is the state
explosion in the automata that describe the interfaces and the specification.
Observe also that if a and b are symbolic representations of data, some other means must be available
in the implementation to distinguish when the actual data corresponds to a or to b. At this level of the
description we do not need to be specific: we simply assume that the sender has a way to distinguish
whether the symbol a or the symbol b is being produced, and the receiver has a way to distinguish whether
a′ or b′ is being provided. Examples of methods include toggling bits, or using data fields to specify
message types. However, we do not want to be tied to any particular method at this time.
23.3.2 Requirements Specification
What constitutes a correct transaction? Or, in other words, what properties do we want the communication
to have? In the context of this particular example the answer seems straightforward. Nonetheless,
different criteria could be enforced depending on the application. Each criterion is embodied by a different
specification.
One example of a specification is shown in Figure 23.2. The alphabet of the automaton is derived from
the Cartesian product of the alphabets of the two protocols for which we want to build a converter. This
specification states that no symbols should be discarded or duplicated by the converter, and symbols must
be delivered in the same order in which they were received; moreover, the converter can store at most one
undelivered symbol at any time. The three states in the specification correspond to three distinct cases:
State 0 denotes the case in which all received symbols have been delivered (or that no symbol has
been received, yet).
[Figure 23.2 diagram: three-state automaton (states 0, a, b) over pairs of sent/delivered symbols.]
FIGURE 23.2 Specification automaton. (From Roberto Passerone, Luca de Alfaro, Thomas A. Henzinger, and
Alberto L. Sangiovanni-Vincentelli. In Proceedings of the IEEE/ACM International Conference on Computer-Aided
Design (ICCAD'02), November 2002. With permission. Copyright 2002 IEEE.)
State a denotes the case in which symbol a has been received, but it has not been output yet.
Similarly, state b denotes the case in which symbol b has been received, but not yet output.
Note that this specification is not concerned with the particular form of the protocols being considered
(or else it would itself function as the converter); for example, it does not require that the symbols a or b
be received in any particular order (other than the one in which they are sent). On the other hand, the
specification makes precise what the converter can, and cannot, do, ruling out, for instance, converters
that simply discard all input symbols from one protocol, never producing any output for the destination
protocol. In fact, the specification admits the case in which a and b are transferred in the reversed order.
It also does not enforce that a and b always occur in pairs, and admits a sequence of a's without intervening
b's (or vice versa). The specification merely asserts that a′ should occur no earlier than a (an ordering
relation), and that a′ must occur whenever a new a or b occurs. In fact, we can view the specification as
an observer that specifies what can happen (a transition on some symbol is available) and what should not
happen (a transition on some symbol is not available). As such, it is possible to decompose the specification
into several automata, each one of which specifies a particular property that the synthesized converter
should exhibit. This is similar to the monitor-based property specification proposed by Shimizu et al. [11]
for the verification of communication protocols. In our work, however, we use the monitors to drive the
synthesis so that the converter is guaranteed to exhibit the desired properties (correct-by-construction).
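Rather than transcribing the automaton of Figure 23.2 edge by edge, the sketch below (Python; the pair encoding and the same-cycle pass-through convention are our assumptions) implements the property the specification encodes: the converter must behave as a one-place FIFO. Each step of a trace is a pair (received, delivered), with '-' meaning no symbol.

```python
# Observer for the one-place-buffer specification: the trace is rejected
# if the converter would drop, duplicate, reorder, or over-buffer symbols.
# Convention (our assumption): deliveries empty the buffer slot before the
# symbol received in the same cycle occupies it, and a symbol may pass
# straight through when the buffer is empty.

def observe(trace):
    """Return True iff the trace satisfies the one-place-buffer spec."""
    buf = None  # the single undelivered symbol, if any
    for received, delivered in trace:
        if delivered != '-':
            if buf == delivered:
                buf = None            # deliver the buffered symbol
            elif buf is None and received == delivered:
                received = '-'        # same-cycle pass-through
            else:
                return False          # wrong order, duplicate, or nothing to send
        if received != '-':
            if buf is not None:
                return False          # would need more than one buffer slot
            buf = received
    return True
```

For instance, receiving a and delivering it a cycle later is accepted, as is delivering a while b arrives in the same cycle; receiving b while a is still undelivered, or delivering a symbol that was never received, is rejected.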
A high-level view of the relationship between the protocols and the specification is presented in
Figure 23.3. The protocol handshake produces outputs a and b; the protocol serial accepts inputs a′ and b′.
The specification accepts inputs a, b, a′, b′, and acts as a global observer that states what properties the
converter should have. Once we compose the two protocols and the specification, we obtain a system
with outputs a, b, and inputs a′, b′ (Figure 23.3). The converter will have inputs and outputs exchanged:
a and b are the converter inputs, and a′, b′ its outputs.
23.3.3 Synthesis
The synthesis of the converter begins with the composition (product machine) of the two protocols, shown in Figure 23.4. Here the direction of the signals is reversed: the inputs to the protocols become the outputs
FIGURE 23.3 Inputs and outputs of protocols, specification, and converter. (From Roberto Passerone, Luca de Alfaro, Thomas A. Henzinger, and Alberto L. Sangiovanni-Vincentelli. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD'02), November 2002. With permission. © 2002 IEEE.)
FIGURE 23.4 Composition between handshake and serial. (From Roberto Passerone, Luca de Alfaro, Thomas A. Henzinger, and Alberto L. Sangiovanni-Vincentelli. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD'02), November 2002. With permission. © 2002 IEEE.)
of the converter, and vice versa. This composition is also a specification for the converter, since on both sides the converter must comply with the protocols that are being interfaced. However, this specification does not have the notion of synchronization (partial order, or causality constraint) that the specification discussed above dictates.

We can ensure that the converter satisfies both specifications by taking the converter to be the composition of the product machine with the specification, and by removing transitions that violate either protocol or the correctness specification. Figure 23.5 through Figure 23.7 explicitly show the steps that we go through to compute this product. The position of the state reflects the position of the corresponding
FIGURE 23.5 Converter computation, phase 1. (From Roberto Passerone, Luca de Alfaro, Thomas A. Henzinger, and Alberto L. Sangiovanni-Vincentelli. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD'02), November 2002. With permission. © 2002 IEEE.)
state in the protocol composition, while the label inside the state represents the corresponding state in the specification. Observe that the bottom-right state is reached when the specification goes back to state 0. This procedure corresponds to the synthesis algorithm proposed in Reference 10. The approach here is, however, fundamentally different: the illegal states are defined by the specification, and not by the particular algorithm employed.
The initial step is shown in Figure 23.5. The composition with the specification makes the transitions depicted in dotted lines illegal (if taken, the specification would be violated). However, transitions can be removed from the composition only if doing so does not result in an assumption on the behavior of the sender. In Figure 23.5, the transition labeled τ/a′ leaving state 0 can be removed because the machine can still respond to a τ input by taking the self-loop, which is legal. The same applies to the transition labeled b/b′ leaving state a, which is replaced by the transition labeled b/a′. However, removing the transition labeled τ/b′ leaving the bottom-right state would make the machine unreceptive to input τ. Equivalently, the converter is imposing an assumption on the producer that τ will not occur in that state. Because this assumption is not verified, and because we cannot change the producer, we can only avoid the problem by making the bottom-right state unreachable, and removing it from the composition.
The result is shown in Figure 23.6. The transitions that are left dangling because of the removal of the
state should also be removed, and are now shown in dotted lines. The same reasoning as before applies,
and we can only remove transitions that can be replaced by others with the same input symbol. In this
case, all illegal transitions can be safely removed.
The resulting machine shown in Figure 23.7 now has no illegal transitions. This machine complies both with the specification and with the two protocols, and thus represents the correct conversion (correct relative to the specification). Notice how the machine at first stores the symbol a without sending it (transition a/τ). Then, when b is received, the machine sends a′, immediately followed in the next cycle by b′, as required by the serial protocol.
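The pruning loop described above can be sketched in a few lines of Python. The tuple encoding of states and transitions is an assumption of this sketch, not the chapter's data structures: an illegal transition is dropped only when another transition on the same input survives; otherwise the source state is made unreachable, and transitions left dangling by a state removal are handled the same way on the next pass.

```python
def prune(states, transitions, illegal):
    """states: set of names; transitions: set of (src, inp, out, dst) tuples;
    illegal: the transitions that violate the correctness specification."""
    states, trans, illegal = set(states), set(transitions), set(illegal)
    changed = True
    while changed:
        changed = False
        for t in sorted(trans & illegal):
            src, inp, out, dst = t
            # is there a surviving legal transition from src on the same input?
            has_alternative = any(
                s == src and i == inp and (s, i, o, d) not in illegal
                for (s, i, o, d) in trans)
            if has_alternative:
                trans.discard(t)
                changed = True
            elif src in states:
                # removing t would make src unreceptive on inp:
                # make the state unreachable instead
                states.discard(src)
                changed = True
        # transitions left dangling by a state removal must also go,
        # subject to the same receptiveness check on the next pass
        dangling = {t for t in trans
                    if t[0] not in states or t[3] not in states}
        if dangling - illegal:
            illegal |= dangling
            changed = True
    return states, trans - illegal

# toy example: state B's only transition on input x is illegal, so B is dropped
states, trans = prune(
    {"A", "B"},
    {("A", "x", "1", "A"), ("A", "x", "2", "B"), ("B", "x", "1", "B")},
    {("A", "x", "2", "B"), ("B", "x", "1", "B")})
assert states == {"A"}
assert trans == {("A", "x", "1", "A")}
```

The fixed-point iteration mirrors the figures: phase 1 removes replaceable illegal transitions and unreceptive states, phase 2 cleans up the dangling transitions, and phase 3 is the result.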
FIGURE 23.6 Converter computation, phase 2. (From Roberto Passerone, Luca de Alfaro, Thomas A. Henzinger, and Alberto L. Sangiovanni-Vincentelli. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD'02), November 2002. With permission. © 2002 IEEE.)
FIGURE 23.7 Converter computation, phase 3. (From Roberto Passerone, Luca de Alfaro, Thomas A. Henzinger, and Alberto L. Sangiovanni-Vincentelli. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD'02), November 2002. With permission. © 2002 IEEE.)
23.4 Algebraic Formulation
The problem of converter synthesis can be seen as a special case of the more general problem of the synthesis of a local specification, shown in Figure 23.8 (also known as the unknown component problem). Here, we are given a global specification G and a partial implementation, called a context, which consists of the composition of several modules, such as P1 and P2. The implementation is only partially specified, and is completed by inserting an additional module X to be composed with the rest of the context. The problem consists of finding a local specification L for X, such that if X implements L, then the full implementation I implements the global specification G. If we denote with ⊑ the implementation relation, then the local specification synthesis problem can be expressed as solving the following inequality for the variable X:

P1 ∥ X ∥ P2 ⊑ G
The problem of local specification synthesis is very general and can be applied to a variety of situations. One area of application is, for example, that of supervisory control synthesis [25]. Here a plant is used as the context, and a control relation as the global specification. The problem consists of deriving the appropriate control law to be applied in order for the plant to follow the specification. Engineering changes is another area, where modifications must be applied to part of a system in order for the entire system to satisfy a new specification. This procedure is also known as rectification. Note that the same rectification procedure could be used to optimize a design. Here, however, the global specification is unchanged, while the local specification represents all the possible admissible implementations of an individual component of the system, thus exposing its full flexibility [26].

In the case of converter synthesis, the context consists of the protocols that must be connected, while the specification may simply insist that data be passed from one side to the other within a set of requirements. In this case the local specification describes the additional element in the implementation required to make the communication possible, that is, the converter.

The literature on techniques to solve the local specification synthesis problem is vast. Here we focus on three of the proposed techniques and highlight in particular their differences in scope and aim.

Larsen and Xinxin [23] solve the problem of synthesizing the local specification for a system of equations in a process algebra. In order to represent the flexibility in the implementation, the authors introduce the Disjunctive Modal Transition System (DMTS). Unlike traditional labeled transition systems, the DMTS model includes two kinds of transitions: transitions that may exist and transitions that must exist. The transitions that must exist are grouped into sets, of which only one is required in the implementation. In other words, the DMTS is a transition system that admits several possible implementations in terms of traditional transition systems.
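The may/must reading of a DMTS can be sketched as follows. This is a deliberately simplified, global interpretation of the conditions (actual DMTS refinement is defined state by state over reachable states); the names and encodings are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class DMTS:
    may: set    # allowed transitions: {(state, label, state'), ...}
    must: list  # disjunctive requirements: [(state, {(label, state'), ...}), ...]

def implements(lts, d):
    """lts: a plain labeled transition system as a set of (state, label, state').
    Every transition taken must be allowed, and at least one option out of
    every disjunctive must-set has to be present."""
    return (lts <= d.may and
            all(any((s, l, t) in lts for (l, t) in options)
                for (s, options) in d.must))

d = DMTS(may={("p", "a", "q"), ("p", "b", "q"), ("q", "a", "p")},
         must=[("p", {("a", "q"), ("b", "q")})])

assert implements({("p", "a", "q")}, d)      # one option of the must-set is taken
assert not implements({("q", "a", "p")}, d)  # the must-set at p is left unsatisfied
```

The disjunctive must-set is what distinguishes the DMTS from an ordinary modal transition system: the implementation must provide at least one, but not necessarily all, of the listed transitions.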
FIGURE 23.8 Local specification synthesis.
The system is solved constructively. Given a context and a specification, the authors construct a DMTS whose implementations include all and only the solutions to the equation. To do so, the context is first translated from its original equational form into an operational form where a transition includes both the consumption of an event from the unknown component, and the production of an event. The transitions of the context and of the specification are then considered in pairs to deduce whether the implementation may or may not take certain actions. A transition is possible, but not required, in the solution whenever the context does not activate such a transition. In that case, the behavior of the solution may be arbitrary afterwards. A transition is required whenever the context activates the transition, and the transition is used to match a corresponding transition in the specification. A transition is not allowed in the solution (thus it is neither possible, nor required) whenever the context activates it, and the transition is contrary to the specification.
The solution proposed by Larsen and Xinxin has the advantage that it provides a direct way of computing the set of possible implementations. On the other hand, it is specific to one model of computation (transition systems). Yevtushenko et al. [24] present a more general solution where the local specification is obtained by solving abstract equations over languages under various kinds of composition operators. By working directly with languages, the solution can then be specialized to different kinds of representations, including automata and finite state machines.

In the formalism introduced by Yevtushenko et al., a language is a set of finite strings over a fixed alphabet. The particular notion of refinement (or implementation) proposed in this work corresponds to language containment: a language P refines a language Q if and only if P ⊆ Q. If we denote with ¬P the complementation of the language P (i.e., ¬P is the language that includes all the finite strings over the alphabet that are not in P), then the most general solution to the equation in the variable X

A • X ⊆ C

is given by the formula

S = ¬(A • ¬C)

The language S is called the most general solution because a language P is a solution of the equation if and only if P ⊆ S. In the formulas above, the operator • can be replaced by different flavors of parallel composition, including synchronous and asynchronous composition. These operators are both constructed as a series of an expansion of the alphabet of the languages, followed by a restriction. For the synchronous composition, the expansion and the restriction do not alter the length of the strings of the languages to which they are applied. Conversely, expansion in the asynchronous composition inserts arbitrary substrings of additional symbols, thus increasing the length of the sequence, while the restriction discards the unwanted symbols while shrinking the string.
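The formula can be exercised on a toy instance in which composition is plain intersection over a shared alphabet (the simplest synchronous case); the languages chosen here are arbitrary illustrations, not from the chapter.

```python
from itertools import product

def universe(alphabet, n):
    """All strings over `alphabet` of length at most n (including the empty one)."""
    return {"".join(w) for k in range(n + 1)
            for w in product(alphabet, repeat=k)}

def most_general_solution(A, C, U):
    """S = not(A . not(C)), with composition taken to be set intersection."""
    complement = lambda L: U - L
    return complement(A & complement(C))

U = universe("ab", 3)
A = {w for w in U if w.startswith("a")}        # the context
C = {w for w in U if w.count("b") <= 1}        # the specification
S = most_general_solution(A, C, U)

assert A & S <= C                              # S is a solution
for X in (set(), C, U - A, (U - A) | C):       # and it contains every solution
    if A & X <= C:
        assert X <= S
```

With intersection as composition the formula collapses to S = ¬A ∪ C: a string is admissible in the unknown component either because the context rules it out anyway, or because it already satisfies the specification.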
The language equations are then specialized to various classes of automata, including finite automata and finite state machines. This provides an algorithmic way of solving the equation for restricted classes of languages (i.e., those that can be represented by the automaton). The problem in this case consists of proving certain closure properties that ensure that the solution can be expressed in the same finite representation as the elements of the equation. In particular, the authors consider the problem of receptiveness (there called I-progression) and prefix closure.
A similar solution is proposed in the framework of Agent Algebra by Passerone et al. [12, 27]. The
approach is, however, more general, and does not make any particular assumption about the form that
the protocols or the specification can take. In other words, the solution is not limited to protocols represented as languages over an alphabet, or as transition systems. This is similar to the block algebras proposed by de Alfaro and Henzinger (see Section 23.2). There is, however, a fundamental difference in the
way interfaces and components interact. In de Alfaro and Henzinger, the distinction between interfaces
and components seems to ultimately arise from the fact that components, by making no assumptions,
are unable to constrain their environment. For this reason, components are often called input-enabled,
or receptive. Interfaces, on the other hand, constrain the environment by failing to respond to some
of their possible inputs. Receptiveness and environment constraints are not, however, mutually exclusive. The two notions coexist, and are particularly well behaved, in the so-called trace-based models such as Dill's trace structures [9] and Negulescu's Process Spaces [28, 29]. We refer to these models as two-set trace models. In two-set trace models, traces, which are individual executions of a component, are classified as either successes or failures. In order for a system to be failure-free, the environment of each component must not exercise the failure traces. Failure traces therefore represent the assumptions that a component makes relative to its environment. However, the combination of failure and success traces makes the component receptive. Agent Algebras generalize these concepts by shifting the notion of compatibility from the individual executions to the components themselves. The interface models proposed by de Alfaro and Henzinger can easily be seen in these terms. For example, interface automata [7] can be explained almost exactly in terms of the prefix closed trace structures of Dill [9]. In particular, the composition operator in interface automata is an implementation of Dill's autofailure manifestation and failure exclusion. Therefore, Agent Algebras do not distinguish between the notion of an interface and a component. Or, to be more precise, the distinction between a component and its interface has only to do with a difference in the level of abstraction, rather than with a difference in their nature.
In Agent Algebra, the problem of local specification synthesis, and therefore of protocol conversion, is set up as usual as the equation

proj(A)(P1 ∥ P2 ∥ X) ⊑ G

Note that here the operation of restriction on the alphabet is not part of the composition and is made explicit by the operator proj(A), whose effect is to retain only the elements of the alphabet that are contained in the set A. The solution to the equation is expressed in the form

C ⊑ mirror(proj(A)(P1 ∥ P2 ∥ mirror(G)))

where mirror is a generalized complementation operation whose form depends on the particular model of computation and on its notion of compatibility. The details of the derivation of this solution are outside the scope of this chapter [12]. Instead, we only concentrate on protocols represented as two-set trace structures.
23.4.1 Trace-Based Solution
Two-set trace structures are particularly well suited to modeling behavioral interfaces and protocols. The set of failure traces, in fact, states the conditions of correct operation of a component. They can therefore be interpreted as assumptions that components make relative to their environment. Two components are compatible whenever they respect those assumptions, that is, they do not engage in behaviors that make the other component fail. Interface protocols can often be described in this way. The transactions that do not comply with the protocol specification are considered illegal, and therefore result in an incorrect operation of the component that implements the protocol. The solution to the protocol conversion problem described in Section 23.3 requires that we develop a trace-based model of a synchronous system. The model that we have in mind is essentially identical to the synchronous models proposed by Burch [30] and Wolf [31]. For our simple case, an individual execution of a component (a trace) is a sequence of actions from the alphabet A = {τ, a, b, a′, b′}, where τ denotes the absence of an action. Each component T consists of two sets of traces S and F, corresponding to the successes and the failures, respectively. A projection, or hiding of signals, in a trace can be obtained by replacing everywhere in the trace the actions to be hidden by the special value τ, denoting the absence of any action. In this way, while we abstract away the information about the signal, we do retain the cycle count, ensuring that the model is synchronous. For instance,

proj({a})(⟨a, b, a, τ, b, a, b, b, a, …⟩) = ⟨a, τ, a, τ, τ, a, τ, τ, a, …⟩
where the argument of the projection lists the signals that must be retained. The operation of projection
is applied to all success and all failure traces of a component.
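Such a projection amounts to a per-position substitution, which can be sketched in a few lines (the tuple encoding of traces is an assumption of this sketch):

```python
TAU = "τ"  # the silent action, standing in for the absence of an action

def proj(keep, trace):
    """Retain the actions in `keep`; hide everything else as the silent action."""
    return tuple(x if x in keep else TAU for x in trace)

# mirrors the example above: hiding b preserves the cycle count
assert proj({"a"}, ("a", "b", "a", TAU, "b", "a")) == \
       ("a", TAU, "a", TAU, TAU, "a")
```

Because every hidden action is replaced rather than deleted, the projected trace has the same length as the original, which is exactly what keeps the model synchronous.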
Parallel composition is more complex. A trace is a possible execution of a component whether it is a success or a failure. It is not a possible execution if it is neither a success nor a failure. If T1 and T2 are two components, then their parallel composition should contain all and only those traces that are possible executions of both T1 and T2. One such trace will be a success of the composition if it is a success of both T1 and T2. However, a trace is a failure of the composite if it is a possible trace of one component, and it is also a failure of the other component. Note that if a trace is a failure of one component, but it is not a possible trace of the other component (i.e., it is neither a success nor a failure of the other component), then the trace does not appear as a failure of the composite (in fact, it is not a trace of the composition at all). This is because, in the interaction, the particular behavior that results in that failure will never be exercised, as it is ruled out by the other component. Formally, if T1 = (S1, F1) and T2 = (S2, F2), then the parallel composition T = T1 ∥ T2 is given by

T = (S1 ∩ S2, (F1 ∩ (S2 ∪ F2)) ∪ (F2 ∩ (S1 ∪ F1)))

If the two components do not share the same alphabet, parallel composition must also include an intermediate step of projection or inverse projection to equalize the signals. Because the length of the sequence is retained during a projection, parallel composition results in a lock step execution of the components.
Because components consist of two sets of executions, the relation of implementation cannot be reduced to a simple set containment. Instead, a component T implements another component T′ if all the possible behaviors of T are also possible behaviors of T′, and if T fails less often than T′. This ensures that replacing T for T′ does not produce any additional failure in the system. Formally, T ⊑ T′ whenever

S ∪ F ⊆ S′ ∪ F′ and F ⊆ F′

The operation of complementation, or mirroring, must also take successes and failures into account. The complement of T is defined as the most general component that can be composed with T without generating any failure. Given the definitions of composition and the implementation relation, the mirror of T is defined as

mirror(T) = (S ∩ ¬F, ¬(S ∪ F))

In other words, the possible behaviors of mirror(T) include all behaviors that are not failures of T. Of those, the successes of T are also successes of its mirror. It is easy therefore to verify that the composition of a component with its complement always has an empty set of failures.
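Over a fixed finite universe of candidate traces, these operations can be sketched directly on pairs of sets. The encoding and the example traces are hypothetical illustrations, not taken from the chapter.

```python
def compose(t1, t2):
    """Parallel composition of two-set trace structures (S, F)."""
    (s1, f1), (s2, f2) = t1, t2
    successes = s1 & s2
    failures = (f1 & (s2 | f2)) | (f2 & (s1 | f1))
    return (successes, failures)

def refines(t, tp):
    """T implements T': no new possible behaviors, no new failures."""
    (s, f), (sp, fp) = t, tp
    return (s | f) <= (sp | fp) and f <= fp

def mirror(t, universe):
    """The most general component composable with T without failures."""
    s, f = t
    return (s - f, universe - (s | f))

U = {"x", "y", "z", "w"}
T = ({"x"}, {"y"})          # x succeeds, y fails, z and w are not possible
M = mirror(T, U)
assert compose(T, M)[1] == set()   # composing with the mirror never fails
assert refines(T, T)
```

The failure set of the mirror is exactly the set of traces that are not possible for T, so any behavior the mirror assumes away is one T could never exercise.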
The two protocols and the correctness specification of the example of Section 23.3 are easily represented as two-set models. In fact, sets of traces can be represented using automata as recognizers. However, for each component, we must represent two sets. This can be accomplished in the automaton by adding failure states that accept the failure traces. For the particular example presented in Section 23.3, we can still use the automata shown in Figure 23.1 and Figure 23.2. Note that we do not need to add failures to either the sender protocol or to the specification, since they have only outputs and therefore do not constrain the environment in any way. The receiver, on the other hand, must be augmented with a state representing the failure traces. A transition to this additional state is taken from each state on all the inputs for which an action is not already present. In this case, if P1 is the sender protocol, P2 the receiver, C the converter, and G the specification, we may compute the converter by setting up the following local specification synthesis problem:
P1 ∥ P2 ∥ C ⊑ G

The solution is therefore

C ⊑ mirror(P1 ∥ P2 ∥ mirror(G))
Note that projections are not needed in this case, since the alphabet is always A = {τ, a, b, a′, b′}, which is also the alphabet of C. The solution to the problem thus consists of taking the complement of the global specification, composing it with the context (i.e., the two protocols), and complementing the result. After taking the complementation, the resulting component may not be receptive. This can be avoided by applying the operations of autofailure manifestation and failure exclusion, similarly to the synchronous trace structure algebra of Wolf [31], before computing the mirror. A state is an autofailure if all its outgoing transitions are failures. In that case, the state can be bypassed by directing its incoming transitions to the outgoing failure state. Failure exclusion, instead, results in the removal of successful transitions whenever they are matched by a corresponding failure transition on the same input in the same state. The complementation can then be most easily done by first making the automaton deterministic (note, however, that this is a potentially expensive computation). For a deterministic and receptive automaton the mirror can be computed by removing the existing outgoing failure transitions of each state and by adding transitions to a new failure state for each of the input actions that does not already result in a success. When doing so in the example above, we obtain exactly the result depicted in Figure 23.7, with additional failure transitions that stand to represent the flexibility in the implementation. In particular, the state labeled 0 in Figure 23.7 has failure transitions on input b, the state labeled 1 on input a, and the state labeled 2 on input b. This procedure is explained in more detail below.
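For a deterministic and receptive automaton, the two rewrites of the mirror step can be sketched as below. The dictionary encoding of the transition relation and the state names are assumptions of this sketch.

```python
def mirror_automaton(delta, inputs, fail_states):
    """delta: dict mapping (state, input) -> next_state.
    Drop the transitions that enter a failure state, then restore
    receptiveness by sending every uncovered input to a fresh failure state."""
    fresh = "F'"
    mirrored = {k: v for k, v in delta.items() if v not in fail_states}
    live = {s for (s, _) in delta if s not in fail_states}
    for s in live | {fresh}:
        for i in inputs:
            mirrored.setdefault((s, i), fresh)
    return mirrored, {fresh}

# toy two-state protocol: failures on b in state 0 and on a in state 1
delta = {("0", "a"): "1", ("0", "b"): "F",
         ("1", "a"): "F", ("1", "b"): "0"}
m, fails = mirror_automaton(delta, {"a", "b"}, {"F"})
assert m[("0", "b")] == "F'" and m[("1", "a")] == "F'"
assert m[("0", "a")] == "1" and m[("1", "b")] == "0"
```

The swap captures the change of perspective: what were failure traces of the component (assumptions on its environment) become inputs that the mirror itself must be prepared to reject.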
23.4.2 End-to-End Specication
A potentially better approach to protocol conversion consists of changing the topology of the local specification problem, by providing a global specification that extends end to end from the sender to the receiver, as shown in Figure 23.9. The global specification in this case may be limited to talking about the behavior of the communication channel as a whole, and would be independent of the particular signals employed internally by each protocol. In addition, in a scenario where the sender and the receiver function as layers of two communicating protocol stacks, the end-to-end behavior is likely to be more abstract, and therefore simpler to specify, than the inner information exchange.

We illustrate this case by modifying the previous example. In order to change the topology, the sender and receiver protocols must be modified to include inputs from (for the sender) and outputs to (for the receiver) the environment. This is necessary to let the protocols receive and deliver the data transmitted over the communication channel, and to make it possible to specify a global behavior. In addition to adding connections to the environment, in this example we also explicitly model the data. Thus, unlike the previous example where the specification only required that a certain ordering relationship on the data be satisfied, we can here express true correctness by specifying that if a value is input to the system, the same value is output by the system at the end of the transaction. Since the size of the state space of the automata increases exponentially with the size of the data, we will limit the example to the communication of a two-bit integer value. Abstraction techniques must be used to handle larger problems. To make the example more interesting, we modify the protocols so that the sender serializes the least significant bit
FIGURE 23.9 End-to-end specification.
FIGURE 23.10 The sender protocol.
first, while the receiver expects the most significant bit first. In this case, the converter will also need to reorder the sequence of the bits received from the sender.

All signals in the system are binary valued. The protocols are simple variations of the ones depicted in Figure 23.1. The inputs to the sender protocol include a signal ft that is set to 1 when data is available, and two additional signals that encode the two-bit integer to be transmitted. The outputs also include a signal st that clocks the serial delivery of the data, and one signal sd for the data itself. The sender protocol is depicted in Figure 23.10. We adopt the convention that a signal is true in the label of a transition when it appears with its original name, and it is false when its name is preceded by an n. Hence, for example, ft implies that ft = 1, and nft that ft = 0. The shaded state labeled F in the automaton accepts the failure traces, while the rest of the states accept the successful traces. Note that the protocol assumes that the environment refrains from sending new data while in the middle of a transfer. In addition, the protocol may wish to delay the transmission of the second bit of the data for as many cycles as desired.
Similarly, the receiver protocol has inputs rt and rd, where rt is used to synchronize the start of the serial transfer with the other protocol; the output tt finally informs the environment when new data is available. The receiver protocol is depicted in Figure 23.11. The receiver fails if the second bit of the data is not received within the clock cycle that follows the delivery of the first bit.

The automaton for the global specification is shown in Figure 23.12. The global specification has the same inputs as the sender protocol, and the same outputs as the receiver protocol. A trace is successful if a certain value is received on the sender side, and the same value is emitted immediately or after an arbitrary delay on the receiver side. Analogously to the sender protocol, the specification fails if a new data value is received while the old value has not been delivered yet.
Following the same notation as the previous example, the solution to the conversion problem can be stated as

C ⊑ mirror(proj({st, sd, rt, rd})(P1 ∥ P2 ∥ mirror(G)))

The projection is now essential to scope down the solution to only the signals that concern the conversion algorithm. The components must again be receptive, therefore similar considerations as those expressed before for the computation of the mirror apply. In particular, autofailure manifestation and failure exclusion are applied before computing the mirror. The automaton is also made deterministic if necessary.
FIGURE 23.11 The receiver protocol.
FIGURE 23.12 The global specification.
FIGURE 23.13 The local converter specification.
The result of the computation is shown in Figure 23.13, where, for readability, the transitions that lead to
the failure states have been displayed in dotted lines. The form of the result is essentially identical to that of
Figure 23.7. Note how the converter switches the position of the most and the least signicant bit of the data
during the transfer. In this way the converter makes sure that the correct data is transferred from one end
to the other. Note, however, that the new global specication (Figure 23.12) had no knowledge whatsoever
of how the protocols were supposed to exchange data. Failure traces again express the exibility in the
implementation, and at the same time represent assumptions on the environment. These assumption
are guaranteed to be satised (modulo a failure in the global specication), since the environment is
composed of the sender and the receiver protocol, which are known variables in the system.
The solution excludes certain states that lead to a deadlock situation. This is in fact an important side
effect of our specic choice of synchronous model, and has to do with the possibility of combinational
loops that may arise as a result of a parallel composition. When this is the case, the mirror of an otherwise
receptive component may not be receptive. This is because it is perfectly admissible in the model to avoid
a failure by withholding an input, that is, by constraining the environment not to generate an input. But
since the environment is not constrained, this can only be achieved by stopping time before reaching
the deadlock state. Since this would be infeasible in any reasonable physical model, we consider deadlock
states tantamount to an autofailure, and remove them from the nal result. This problem can be solved
by employing a synchronous model that deals with combinational loops directly. This is an aspect of
the implementation that has been extensively studied by Wolf [31], who proposes to use a three-valued
model that includes the usual binary values 0 and 1, and one additional value to represent the oscillating,
or unknown, behavior that results from the combinational loops. Exploring the use of this model in the
context of protocol specication and converter synthesis is part of our future work.
A similar condition may occur when a component tries to guess the future, by speculating on the sequence
of inputs that will be received in the following steps. If the sequence is not received, the component will
find itself in a deadlock situation, unable to roll back to a consistent state. This is again admissible in
2006 by Taylor & Francis Group, LLC
23-18 Embedded Systems Handbook
FIGURE 23.14 The optimized converter (state labels 0, 1, *0, *1, and F; transition labels not reproduced).
our model, but would be ruled out if the right notion of receptiveness were adopted. These states and
transitions are also pruned as autofailures.
The procedure outlined above has been implemented in a prototype application of approximately
2400 lines of C++ code. In the code, we explicitly represent the states and their transitions, while the
formulas in the transitions are represented implicitly using BDDs (obtained from a separate package).
This representation obviously suffers from the problem of state explosion. This is particularly true when
the value of the data is explicitly handled by the protocols and the specification, as already discussed.
A better solution can be achieved if the state space and the transition relation are also represented implicitly
using BDDs. Note, in fact, that most of the time the data is simply stored and passed on by a protocol
specification and is therefore not involved in deciding its control flow. The symmetries that result can
therefore likely be exploited to simplify the problem and make the computation of the solution more
efficient.
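The contrast between the explicit representation used in the prototype and a fully implicit one can be sketched in a few lines. The n-bit "buffered word" protocol below is hypothetical, and a plain Python predicate stands in for a BDD; the point is only that the explicit state set doubles with every data bit, while the implicit description does not grow at all.

```python
from itertools import product

def explicit_states(n_bits):
    """Explicit representation: one state per concrete value of the
    buffered n-bit data word, as in the prototype described above."""
    return [bits for bits in product((0, 1), repeat=n_bits)]

def symbolic_states(n_bits):
    """Implicit representation: the same state set as a characteristic
    predicate (a stand-in for a BDD). Its description is constant-size
    no matter how wide the data word is."""
    return lambda bits: len(bits) == n_bits and all(b in (0, 1) for b in bits)

# Eight data bits already mean 256 explicit states...
assert len(explicit_states(8)) == 256
# ...while the predicate answers membership without enumerating them.
member = symbolic_states(8)
assert member((0,) * 8) and not member((0,) * 7)
```

Because the data never influences the control flow, all 2^n data-carrying states behave identically, which is precisely the symmetry a BDD-based encoding of the state space and transition relation can exploit.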
Note that the converter that we obtain is nondeterministic and could take paths that are slower than
one would expect. This is evident in particular for the states labeled 0 and 1, which can react
to the arrival of the second piece of data by doing nothing, or by transitioning directly to the states *0
and *1, respectively, while delivering the first part of the data. This is because our procedure derives the
full flexibility of the implementation, and the specification depicted in Figure 23.12 does not mandate
that the data be transferred as soon as possible. A faster implementation can be obtained by selecting
the appropriate paths whenever a choice is available, as shown in Figure 23.14. In this case, the converter
starts the transfer in the same clock cycle in which the last bit from the sender protocol is received. Other
choices are also possible. In general, a fully deterministic converter can be obtained by optimizing certain
parameters, such as the number of states or the latency of the computation. More sophisticated techniques
might also try to enforce properties that were not already included in the global specification.
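One simple way to realize this path selection is to resolve each nondeterministic choice in favor of a successor lying on a shortest path to the state where the data has been delivered. The automaton below is a hypothetical miniature with inputs abstracted away, not the converter of Figure 23.14, and all state and action names are invented.

```python
from collections import deque

# Toy nondeterministic converter: in states '0' and '1' it may either
# wait (self-loop) or immediately start the transfer toward '*0'/'*1'.
ndet = {
    'idle': {('idle', 'no-op'), ('0', 'latch-0'), ('1', 'latch-1')},
    '0':    {('0', 'wait'), ('*0', 'start-transfer')},
    '1':    {('1', 'wait'), ('*1', 'start-transfer')},
    '*0':   {('done', 'deliver')},
    '*1':   {('done', 'deliver')},
    'done': {('done', 'no-op')},
}

def distances_to(goal, rel):
    """Backward BFS: minimum number of steps from each state to `goal`."""
    dist = {goal: 0}
    frontier = deque([goal])
    while frontier:
        s = frontier.popleft()
        for p, moves in rel.items():
            if p not in dist and any(q == s for q, _ in moves):
                dist[p] = dist[s] + 1
                frontier.append(p)
    return dist

def determinize_fastest(rel, goal='done'):
    """Resolve each nondeterministic choice by taking a successor on a
    shortest path to `goal`, i.e., optimizing for latency."""
    dist = distances_to(goal, rel)
    return {s: min(moves, key=lambda m: dist.get(m[0], float('inf')))
            for s, moves in rel.items()}

det = determinize_fastest(ndet)
assert det['0'] == ('*0', 'start-transfer')   # never dawdles in state 0
assert det['1'] == ('*1', 'start-transfer')
```

Swapping the cost function (number of states, energy, and so on) in place of the BFS distance yields the other deterministic variants mentioned above.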
23.5 Conclusions
Emerging new design methodologies promote reuse of intellectual property as one of the basic tech-
niques to handle complexity in the design process. In a methodology based on reuse, the components
are predesigned and precharacterized, and are assembled in the system to perform the desired function.
Interface Specification and Converter Synthesis 23-19
System verification thus reduces to the verification of the interaction of the components used in the
system. In this chapter, we have reviewed and explored techniques that are useful to define the interface that
components expose to their environment. These interfaces include not only the basic typing information,
typical of today's programming and hardware description languages, but also the sequencing and behavioral
information that is necessary to verify correct synchronization. The interface specifications of the
components are then used to automatically construct adapters if the components do not already satisfy
each other's requirements. This technique was first presented in the context of automata theory. Later, we
presented similar, but stronger, results in the context of language theory and algebraic specifications.
A simple example was used to illustrate a possible implementation of a converter synthesis algorithm.
Acknowledgments
Several people collaborated on the work described in this chapter, including Jerry Burch,
Alberto Sangiovanni-Vincentelli, Luca de Alfaro, Thomas Henzinger, and James Rowson. The author
would like to acknowledge their contributions.
References
[1] Henry Chang, Larry Cooke, Merrill Hunt, Grant Martin, Andrew J. McNelly, and Lee Todd.
Surviving the SOC Revolution: A Guide to Platform-Based Design. Kluwer Academic Publishers,
Norwell, MA, 1999.
[2] Alberto Ferrari and Alberto L. Sangiovanni-Vincentelli. System design: traditional concepts and
new paradigms. In Proceedings of the International Conference on Computer Design, ICCD 1999,
October 1999, pp. 2–12.
[3] Alberto L. Sangiovanni-Vincentelli. Defining platform-based design. EEdesign, February 2002.
[4] James A. Rowson and Alberto L. Sangiovanni-Vincentelli. Interface-based design. In Proceedings
of the 34th Design Automation Conference, DAC 1997, Anaheim, CA, June 9–13, 1997, pp. 178–183.
[5] Marco Sgroi, Michael Sheets, Andrew Mihal, Kurt Keutzer, Sharad Malik, Jan Rabaey,
and Alberto Sangiovanni-Vincentelli. Addressing system-on-a-chip interconnect woes through
communication-based design. In Proceedings of the 38th Design Automation Conference, DAC
2001, Las Vegas, NV, June 2001, pp. 667–672.
[6] S. Chaki, S.K. Rajamani, and J. Rehof. Types as models: model checking message-passing programs.
In Proceedings of the 29th ACM Symposium on Principles of Programming Languages, 2002.
[7] Luca de Alfaro and Thomas A. Henzinger. Interface automata. In Proceedings of the Ninth Annual
Symposium on Foundations of Software Engineering, ACM Press, Vienna, Austria, 2001, pp. 109–120.
[8] Luca de Alfaro and Thomas A. Henzinger. Interface theories for component-based design.
In Thomas A. Henzinger and Christoph M. Kirsch, Eds., Embedded Software, Vol. 2211 of Lecture
Notes in Computer Science. Springer-Verlag, Heidelberg, 2001, pp. 148–165.
[9] David L. Dill. Trace Theory for Automatic Hierarchical Verification of Speed-Independent Circuits.
ACM Distinguished Dissertations. MIT Press, Cambridge, MA, 1989.
[10] Roberto Passerone, James A. Rowson, and Alberto L. Sangiovanni-Vincentelli. Automatic synthesis
of interfaces between incompatible protocols. In Proceedings of the 35th Design Automation
Conference, San Francisco, CA, June 1998.
[11] Kanna Shimizu, David L. Dill, and Alan J. Hu. Monitor-based formal specification of PCI.
In Proceedings of the Third International Conference on Formal Methods in Computer-Aided Design,
Austin, TX, November 2000.
[12] Roberto Passerone. Semantic Foundations for Heterogeneous Systems. Ph.D. thesis, Department of
EECS, University of California, Berkeley, CA, May 2004.
[13] Roberto Passerone, Luca de Alfaro, Thomas A. Henzinger, and Alberto L. Sangiovanni-Vincentelli.
Convertibility verification and converter synthesis: two faces of the same coin. In Proceedings of
the IEEE/ACM International Conference on Computer-Aided Design (ICCAD'02), November 2002.
[14] G. Borriello. A New Interface Specification Methodology and its Applications to Transducer Synthesis.
Ph.D. thesis, University of California at Berkeley, Berkeley, CA, 1988.
[15] G. Borriello and R.H. Katz. Synthesis and optimization of interface transducer logic. In Proceedings
of the International Conference on Computer Aided Design, November 1987.
[16] J.S. Sun and R.W. Brodersen. Design of system interface modules. In Proceedings of the International
Conference on Computer Aided Design, 1992, pp. 478–481.
[17] J. Akella and K. McMillan. Synthesizing converters between finite state protocols. In Proceedings
of the International Conference on Computer Design, Cambridge, MA, October 14–15, 1991,
pp. 410–413.
[18] S. Narayan and D.D. Gajski. Interfacing incompatible protocols using interface process generation.
In Proceedings of the 32nd Design Automation Conference, San Francisco, CA, June 12–16, 1995,
pp. 468–473.
[19] Robert Siegmund and Dietmar Müller. A novel synthesis technique for communication controller
hardware from declarative data communication protocol specifications. In Proceedings of the 39th
Conference on Design Automation, New Orleans, LA, 2002, pp. 602–607.
[20] Arindam Chakrabarti, Luca de Alfaro, Thomas A. Henzinger, and Freddy Y.C. Mang. Synchronous
and bidirectional component interfaces. In Proceedings of the 14th International Conference on
Computer-Aided Verification (CAV), Vol. 2404 of Lecture Notes in Computer Science. Springer-
Verlag, Heidelberg, 2002, pp. 414–427.
[21] Arindam Chakrabarti, Luca de Alfaro, Thomas A. Henzinger, and Marielle Stoelinga. Resource
interfaces. In Proceedings of the Third International Conference on Embedded Software (EMSOFT),
Vol. 2855 of Lecture Notes in Computer Science. Springer-Verlag, Heidelberg, 2003.
[22] Luca de Alfaro, Thomas A. Henzinger, and Marielle Stoelinga. Timed interfaces. In Proceedings of
the Second International Workshop on Embedded Software (EMSOFT), Vol. 2491 of Lecture Notes
in Computer Science. Springer-Verlag, Heidelberg, 2002, pp. 108–122.
[23] Kim G. Larsen and Liu Xinxin. Equation solving using modal transition systems. In Proceedings
of the Fifth Annual IEEE Symposium on Logic in Computer Science (LICS '90), June 4–7, 1990,
pp. 108–117.
[24] Nina Yevtushenko, Tiziano Villa, Robert K. Brayton, Alex Petrenko, and Alberto L. Sangiovanni-
Vincentelli. Sequential synthesis by language equation solving. Memorandum No. UCB/ERL
M03/9, Electronic Research Laboratory, University of California at Berkeley, Berkeley, CA, 2003.
[25] Adnan Aziz, Felice Balarin, Robert K. Brayton, Maria D. Di Benedetto, Alex Saldanha, and
Alberto L. Sangiovanni-Vincentelli. Supervisory control of finite state machines. In Pierre
Wolper, Ed., Proceedings of Computer Aided Verification: Seventh International Conference,
CAV '95, Vol. 939 of Lecture Notes in Computer Science, Liège, Belgium, July 1995. Springer,
Heidelberg, 1995.
[26] Jerry R. Burch, David L. Dill, Elizabeth S. Wolf, and Giovanni De Micheli. Modeling hierarchical
combinational circuits. In Proceedings of the IEEE/ACM International Conference on Computer-
Aided Design (ICCAD'93), November 1993, pp. 612–617.
[27] Jerry R. Burch, Roberto Passerone, and Alberto L. Sangiovanni-Vincentelli. Notes on agent
algebras. Technical Memorandum UCB/ERL M03/38, University of California, Berkeley, CA,
November 2003.
[28] Radu Negulescu. Process Spaces and the Formal Verification of Asynchronous Circuits. Ph.D. thesis,
University of Waterloo, Canada, 1998.
[29] Radu Negulescu. Process spaces. In C. Palamidessi, Ed., CONCUR, Vol. 1877 of Lecture Notes in
Computer Science. Springer-Verlag, Heidelberg, 2000.
[30] Jerry R. Burch. Trace Algebra for Automatic Verification of Real-Time Concurrent Systems.
Ph.D. thesis, School of Computer Science, Carnegie Mellon University, August 1992.
[31] Elizabeth S. Wolf. Hierarchical Models of Synchronous Circuits for Formal Verification and
Substitution. Ph.D. thesis, Department of Computer Science, Stanford University, October 1995.
24
Hardware/Software Interface Design for SoC
Wander O. Cesário
TIMA Laboratory
Flávio R. Wagner
UFRGS Instituto de Informática
A.A. Jerraya
TIMA Laboratory
24.1 Introduction ............................................................. 24-1
24.2 SoC Design .............................................................. 24-3
     System-Level Design Flow • SoC Design Automation: An Overview
24.3 HW/SW IP Integration .................................................... 24-5
     Introduction to IP Integration • Bus-Based and Core-Based Approaches • Integrating Software IP • Communication Synthesis • IP Derivation
24.4 Component-Based SoC Design ............................................. 24-8
     Design Methodology Principles • Virtual Architecture • Target MPSoC Architecture Model • HW/SW Wrapper Architecture • Design Tools • Defining IP-Component Interfaces
24.5 Component-Based Design of a VDSL Application .......................... 24-14
     Specification • DFU Abstract Architecture • MPSoC RTL Architecture • Results Evaluation
24.6 Conclusions ............................................................ 24-19
References .................................................................. 24-19
24.1 Introduction
Modern system-on-chip (SoC) design shows a clear trend toward integration of multiple processor cores.
The SoCsystemdriver sectionof theInternational Technology Roadmapfor Semiconductors [1]predicts
that the number of processor cores will increase fourfold per technology node in order to match the
processing demands of the corresponding applications. Typical multiprocessor SoC(MPSoC) applications
such as network processors, multimedia hubs, and base-band telecom circuits have particularly tight
time-to-market and performance constraints that require a very efcient design cycle.
Our conceptual model of the MPSoCplatformis composedof four kinds of components: software tasks,
processor and intellectual property (IP) cores, and a global on-chip interconnect IP (see Figure 24.1[a]).
Moreover, to complete the MPSoC platform we must also include hardware/software (HW/SW)
elements that adapt platform components to each other. MPSoC platforms are quite different from
single-master processor SoCs (SMSoCs). For instance, their implementation of system communication is
FIGURE 24.1 (a) MPSoC platform, (b) software stack, and (c) concurrent development environment.
more complicated since heterogeneous processors may be involved and complex communication protocols
and topologies may be used. The hardware adaptation layer must deal with some specific issues:
1. In SMSoC platforms, most peripherals (excluding DMA controllers) operate as slaves with respect
to the shared communication interconnect. MPSoC platforms may use many different types of processor
cores; in this case, sophisticated synchronization is needed to control shared communication
between several heterogeneous masters.
2. While SMSoC platforms use simple master/slave shared-bus interconnections, MPSoC platforms
often use several complex system buses or micronetworks as global interconnect. In MPSoC platforms,
we can separate computation and communication design by using communication coprocessors
and profiting from the multimaster architecture. Communication coprocessors/controllers
(masters) implement high-level communication protocols in hardware and execute them in parallel
with the computation executed on processor cores.
Application software is generally organized as a stack of layers that runs on each processor core (see
Figure 24.1[b]). The lowest layer contains drivers and low-level routines to control/configure the platform.
For the middle layer we can use any commercial embedded operating system (OS) and configure it
according to the application. The upper layer is an application-programming interface (API) that provides
some predefined routines to access the platform. All these layers correspond to the software adaptation
layer in Figure 24.1(a); coding application software can then be isolated from the design of the SoC platform
(software coding is not the topic of this chapter and will be omitted). One of the main contributions of
this work is to apply this layered approach to the dedicated software (often called firmware) as well.
Firmware is the software that controls the platform and, in some cases, executes some non-performance-critical
application functions. In this case, it is not realistic to use a generic OS as the middle layer,
for code size and performance reasons. A lightweight custom OS supporting an application-specific and
platform-specific API is required.
Software and hardware adaptation layers isolate platform components, enabling concurrent development,
as shown in Figure 24.1(c). With this scheme, the software design team uses APIs for both application
and dedicated software development. The hardware design team uses the abstract interfaces provided by
communication coprocessors/controllers. The SoC design team can concentrate on implementing HW/SW
abstraction layers for the selected communication interconnect IP. Designing these HW/SW abstraction
layers represents a major effort, and design tools are lacking. Established EDA tools are not well adapted to
this new MPSoC design scenario, and consequently many challenges are emerging; some major issues are:
1. A higher abstraction level is needed: the register-transfer level (RTL) is very time consuming for
modeling and verifying the interconnection between multiple processor cores.
2. Higher-level programming is needed: MPSoCs will include hundreds of thousands of lines of dedicated
software (firmware). This software cannot be programmed at the assembly level, as is done today.
3. Efficient HW/SW interfaces are required: microprocessor interfaces, register banks, shared
memories, software drivers, and OSs must be optimized for each application.
This chapter presents a component-based design automation approach for MPSoC platforms.
Section 24.2 introduces the basic concepts for MPSoC design and discusses some related platform- and
component-based approaches. Section 24.3 details IP-based methodologies for HW/SW IP integration.
Section 24.4 details our specification model and design flow. Section 24.5 presents the application of this
flow to the design of a VDSL circuit and the analysis of the results.
24.2 SoC Design
24.2.1 System-Level Design Flow
This section gives an overview of current SoC design methodologies using a template design flow (see
Figure 24.2). The basic theory behind this flow is the separation between communication and computation
refinement for platform- and component-based design [2,3]; it has five main design steps:
1. System specification: system designers and the end-customer must agree on an informal model
containing all of the application's functionality and requirements. Based on this model, system designers
build a more formal specification that can be validated by the end-customer.
2. Architecture exploration: system designers build an executable model of the specification and iterate
through a performance analysis loop to decide the HW/SW partitioning for the SoC architecture.
This executable specification uses an abstract platform composed of abstract models for HW/SW
components. For instance, an abstract software model can concentrate on I/O execution profiles,
most frequent use cases, or worst-case scheduling. Abstract hardware can be described using
FIGURE 24.2 System-level design flow for SoC.
transaction-level models or behavioral models. This step produces the golden architecture model,
that is, the customized SoC platform or a new architecture created by system designers after selecting
the processors, the global communication interconnect, and other IP components. Once the HW/SW
partitioning is decided, software and hardware development can proceed concurrently.
3. Software design: since the final hardware platform will not be available during software development,
some kind of hardware abstraction layer (HAL) or API must be provided to the software
design team.
4. Hardware design: hardware IP designers implement the functionality described by the abstract
hardware models at the RTL. Hardware IPs can use specific interfaces for a given platform or
standard interfaces as defined by the Virtual Socket Interface Alliance (VSIA) [4].
5. HW/SW IP integration: SoC designers create HW/SW interfaces to the global communication
interconnect. The golden architecture model must specify performance constraints to ensure
good HW/SW integration. SW/HW communication interfaces are designed to conform to these
constraints.
24.2.2 SoC Design Automation: An Overview
Many academic and industrial works propose tools for SoC design automation covering many, but not all,
of the design steps presented above. Most approaches can be classified into three groups: system-level synthesis,
platform-based design, and component-based design.
System-level synthesis methodologies are top-down approaches in which the SoC architecture and software
models are produced by synthesis algorithms from a system-level specification. COSY [5] proposes
a HW/SW communication refinement process that starts with an extended Kahn Process Network model
in design step (1), uses virtual channel connection (VCC) [6] for step (2), callback signals over a standard
real-time operating system (RTOS) for the API in step (3), and VSIA interfaces for steps (4) and (5).
SpecC [7] starts with an untimed functional specification model written in extended C in design step (1),
uses performance estimation on a structural architecture model for step (2), HW/SW interface synthesis
based on a timed bus-functional communication model for step (5), synthesized C code for step (3), and
behavioral synthesis for step (4).
Platform-based design is a meet-in-the-middle approach that starts with a functional system specification
and a predesigned SoC platform. Performance estimation models are used to try different mappings
between the set of the application's functional modules and the set of platform components. During these
iterations, designers can try different platform customizations and functional optimizations. VCC [6] can
produce a performance model using a functional description of the application and a structural description
of the SoC platform for design steps (1) and (2). CoWare N2C [8] is a good complement to VCC
for design steps (4) and (5). Still, the API for software components and many architecture details must be
implemented manually.
Section 24.3 discusses HW/SW IP integration in the context of current IP-based design approaches.
Most IP-based design approaches build SoC architectures from the bottom up, using predesigned components
with standard interfaces and a standard bus. For instance, IBM defined a standard bus called
CoreConnect [9], Sonics proposes a standard on-chip network called the Silicon Backplane Network [10],
and VSIA defined a standard component protocol called VCI. When needed, wrappers adapt incompatible
buses and component interfaces. Frequently, internally developed components are tied to in-house
(nonpublic) standards; in this case, adopting public standards requires a substantial effort to redesign interfaces
or wrappers for old components.
Section 24.4 introduces a higher-level IP-based design methodology for HW/SW interface design called
component-based design. This methodology defines a virtual architecture model composed of HW/SW
components and uses this model to automate design step (5) by providing automatic generation of
hardware interfaces (4), device drivers, OSs, and APIs (3). Even if this approach does not provide much
help in automating design steps (1) and (2), it provides a considerable reduction of design time for design
steps (3), (4), and (5) and facilitates component reuse. The key improvements over other state-of-the-art
platform- and component-based design approaches are:
1. Strong support for software design and integration: the generated API completely abstracts the
hardware platform and OS services. Software development can be concurrent with, and independent
of, platform customization.
2. Higher-level abstractions: the use of a virtual architecture model allows designers to deal with
HW/SW interfaces at a high abstraction level. Behavior and communication are separated in the
system specification; thus, they can be refined independently.
3. Flexible HW/SW communication: automatic HW/SW interface generation is based on the composition
of library elements. It can be used with a variety of IP interconnect components by adding
the necessary supporting library.
24.3 HW/SW IP Integration
There are two major approaches to the integration of HW/SW IP components into a given design. In the
first, component interfaces follow a given standard (such as a bus or core interface, for hardware
components, or a set of high-level communication primitives, for software components) and can thus be
directly connected to each other. In the second approach, components are heterogeneous in nature and
their integration requires the generation of HW/SW wrappers. In both cases, an RTOS must be used to
provide the services that are needed for the application software to fit into the SoC architecture. This
section describes different solutions to the integration of HW/SW IP components.
24.3.1 Introduction to IP Integration
The design of an embedded SoC starts with a high-level functional specification, which can be validated.
This specification must already follow a clear separation between computation and communication [11],
in order to allow their concurrent evolution and design. An abstract architecture is then used to evaluate
this functionality, based on a mapping that assigns functional blocks to architectural ones. This high-level
architectural model abstracts away all low-level implementation details. A performance evaluation
of the system is then performed, using estimates of the computation and communication costs.
Communication refinement is now possible, with the selection of particular communication mechanisms
and a more precise performance evaluation.
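Such an early performance evaluation can be sketched as a search over mappings, scoring each candidate with estimated computation and communication costs. Every number, block, and component name below is made up for illustration; real flows use calibrated estimation models rather than constants.

```python
from itertools import product

# Hypothetical cost model: computation cost depends on which component
# a functional block is mapped to; a communication cost is paid whenever
# two connected blocks land on different components.
comp_cost = {
    ('filter', 'cpu'): 90, ('filter', 'dsp'): 30,
    ('ctrl',   'cpu'): 10, ('ctrl',   'dsp'): 40,
}
edges = [('filter', 'ctrl')]   # connections in the functional model
comm_cost = 25                 # estimated cost per cross-component link

def cost(mapping):
    """Estimated cost of one functional-to-architectural mapping."""
    c = sum(comp_cost[(blk, tgt)] for blk, tgt in mapping.items())
    c += sum(comm_cost for a, b in edges if mapping[a] != mapping[b])
    return c

blocks, targets = ['filter', 'ctrl'], ['cpu', 'dsp']
best = min(
    (dict(zip(blocks, assign)) for assign in product(targets, repeat=len(blocks))),
    key=cost,
)
assert best == {'filter': 'dsp', 'ctrl': 'cpu'}   # cost 30 + 10 + 25 = 65
```

Exhaustive search is only viable for toy sizes; the same cost function would drive a heuristic (simulated annealing, ILP, and so on) in a realistic architecture-exploration loop.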
According to the platform-based design approach [2], the abstract architecture follows an architectural
template that is usually domain specific. This template includes both a hardware platform, consisting
of a given communication structure and given types of components (processors, memories, hardware
blocks), and a software platform, in the form of a high-level API. The target embedded SoC will be
designed as a derivative of this template, where the communication structure, the components, and
the software platform are all tailored to fit the particular application's needs.
The IP-based design approach follows the idea that the architectural template may be implemented
by assembling reusable HW/SW IP components, possibly even delivered by third-party companies.
The IP integration step comprises a set of tasks that are needed to assemble predesigned components
in order to fulfill system requirements. As shown in Figure 24.3, it takes as inputs the abstract architecture
and a set of HW/SW IP components that have been selected to implement the architectural blocks.
Its output is a microarchitecture where hardware components are described at the RTL with all the cycle- and
pin-accurate details that are needed for further automatic synthesis. Software components are described
in an appropriate programming language, such as C, and can be directly compiled to the target processors
of the architecture.
In an ideal situation, IP components would fit directly together (or to the communication structure)
and exactly match the desired SoC functionality. In a more general situation, the designer may need
to adapt each component's functionality (a step called IP derivation) and synthesize HW/SW wrappers
FIGURE 24.3 The HW/SW IP integration design step.
to interconnect them. For programmable components, although adaptation may be easily performed
by programming the desired functionality, the designer may still need to develop software wrappers
(usually device and bus drivers) to match the application software to the communication infrastructure.
The generation of HW/SW wrappers is usually known as interface or communication synthesis.
Besides this, the application software may also need to be retargeted to the processors and OS of the chosen
architecture.
In the following subsections, different approaches to IP integration are introduced and their impact on
the possible integration subtasks is analyzed.
24.3.2 Bus-Based and Core-Based Approaches
In the bus-based design approach [9,12,13], IP components communicate through one or more buses
(interconnected by bus bridges). Since the bus specification can be standardized, libraries of components
whose interfaces directly match this specification can be developed. Even if components follow
the bus standard, very simple bus interface adapters may still be needed [14]. For components that
do not directly match the specification, wrappers have to be built. Companies offer very rich component
libraries and specialized development and simulation environments for designing systems around
their buses.
A somewhat different approach is core-based design, as proposed by the VSIA VCI standard [4]
and by the OCP-IP organization [15]. In this case, IP components are compliant with a bus-independent,
standardized interface and can thus be directly connected to each other. Although the standard
may support a wide range of functionality, each component may have an interface containing only the
functions that are relevant to it. These components may also be interconnected through a bus, in which
case standard wrappers can adapt the component interface to the bus. Sonics [13] follows this approach,
proposing wrappers to adapt the bus-independent OCP socket to the MicroNetwork bus.
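The role of such a standard wrapper can be sketched as a small adapter between a bus-independent request/response interface and a bus-specific multi-phase protocol. Both interfaces below are invented for illustration and are far simpler than OCP or VCI; only the shape of the adaptation is meant to carry over.

```python
class CoreIP:
    """Component side: issues abstract, bus-independent read requests."""
    def read(self, addr):
        return ('REQ_READ', addr)

class SharedBus:
    """Bus side: a hypothetical bus expecting separate address and
    data phases (a stand-in for a real bus protocol)."""
    def __init__(self, memory):
        self.memory = memory
        self._addr = None
    def address_phase(self, addr):
        self._addr = addr
    def data_phase(self):
        return self.memory[self._addr]

class BusWrapper:
    """The wrapper: translates the core's one-shot request into the
    bus's two-phase signalling, so neither side changes."""
    def __init__(self, bus):
        self.bus = bus
    def submit(self, request):
        kind, addr = request
        assert kind == 'REQ_READ'      # only reads in this sketch
        self.bus.address_phase(addr)
        return self.bus.data_phase()

bus = SharedBus(memory={0x10: 0xAB})
wrapper = BusWrapper(bus)
assert wrapper.submit(CoreIP().read(0x10)) == 0xAB
```

Because the core never sees the bus protocol, retargeting it to a different interconnect means writing a new wrapper, not redesigning the component.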
For particular needs, the SoC may be built around a sophisticated and dedicated network-on-chip (NoC)
[16] that may deliver very high performance for connecting a large number of components. Even in this
case, a bus- or core-based approach may be adopted to connect the components to the network.
Bus-based and core-based design methodologies are integration approaches that depend on standardized
component or bus interfaces. They allow the integration of homogeneous IP components that
follow these standards and can be directly connected to each other, without requiring the development of
complex wrappers. The problem we face is that many de facto standards exist, coming from different
companies or organizations, thus preventing a real interchange of libraries of IP components developed
for the different substandards.
24.3.3 Integrating Software IP
Programmable components are important in a reusable architectural platform, since it is very cost-effective
to tailor a platform to different applications by simply adapting the low-level software and maybe only
conguring certain hardware parameters, such as memory sizes and peripherals.
As illustrated in Figure 24.3, the software view of an embedded system shows three different layers:
1. The bottom layer is composed of services directly provided by hardware components (processor
and peripherals) such as instruction sets, memory and peripheral accesses, and timers.
2. The top layer is the application software, which should remain completely independent from the
underlying hardware platform.
3. The middle layer is composed of three different sublayers, as seen from bottom to top:
(a) Hardware-dependent software (HdS), consisting for instance of device drivers, boot code, parts
of an RTOS (such as context switching code and configuration code to access the memory
management unit [MMU]), and even some domain-oriented algorithms that directly interact
with the hardware.
(b) Hardware-independent software, typically high-level RTOS services, such as task scheduling
and high-level communication primitives.
(c) The API, which defines a system platform that isolates the application software from the
hardware platform and from all basic software layers and enables their concurrent design.
The standardization of this API, which can be seen as a collection of services usually offered by an
OS, is essential for software reuse above and below it. At the application software level, libraries of
reusable software IP components can implement a large number of functions that are necessary for
developing systems for given application domains. If, however, one tries to develop a system by integrating
application software components that do not directly match a given API, software retargeting to the new
platform will be necessary. This can be a very tedious and error-prone manual process, which is a candidate
for an automatic software synthesis technique.
Nevertheless, reuse can also be obtained below the API. Software components implementing the
hardware-independent parts of the RTOS can be more easily reused, especially if the interface between
this layer and the HdS layer is standardized. Although the development of reusable HdS may be harder to
accomplish, because of the diversity of hardware platforms, it can at least be achieved for platforms aimed
at specific application domains.
There are many academic and industrial alternatives providing RTOS services. The problem with most
approaches, however, is that they do not consider specific requirements for SoC, such as minimizing
memory usage and power consumption. Recent research efforts propose the development of application-
specific RTOSs containing only the minimal set of functions needed for a given application [17,18] or
including dynamic power management techniques [19]. IP integration methodologies should thus con-
sider the generation of application-specific RTOSs that are compliant with a standard API and optimized for
given system requirements.
In recent years, many standardization efforts aimed at hardware IP reuse have been developed. Similar
efforts for software IP reuse are now needed. VSIA [4] has recently created working groups to deal with
HdS and platform-based design.
24.3.4 Communication Synthesis
Solutions for the automatic synthesis of communication wrappers to connect hardware IP components
that have incompatible interfaces have been already proposed. In the PIG tool [20], component interfaces
are specified as protocols described as regular expressions, and a finite state machine (FSM) interface for
connecting two arbitrary protocols is automatically generated. The Polaris tool [21] generates adapters
based on state machines for converting component protocols into a standard internal protocol, together
with send and receive buffers and an arbiter.
These approaches, however, do not address the integration of software IP components. The TEReCS
tool [18] synthesizes communication software to connect software IP components, given a specification
of the communication architecture and a binding of IP components to processors. In the IPChinook
environment [22], abstract communication protocols are synthesized into low-level bus protocols according
to the target architecture. While the IPChinook environment also generates a scheduler for a given
partitioning of processes onto processors, the TEReCS approach is associated with the automatic synthesis of
a minimal OS, assembled from a general-purpose library of reusable objects that are configured according
to application demands and the underlying hardware.
Recent solutions uniformly handle HW/SW interfaces between IP components. In the COSY approach
[5], design is performed by an explicit separation between function and architecture. Functions are
then mapped to architectural components. Interactions between functions are modeled by high-level
transactions and then mapped to HW/SW communication schemes. A library provides a fixed set of
wrapper IPs, containing HW/SW implementations for given communication schemes.
24.3.5 IP Derivation
Hardware IP components may come in several forms [23]. They may be hard, when all gates and
interconnects are placed and routed; soft, with only an RTL representation; or firm, with an RTL descrip-
tion together with some physical floorplanning or placement. The integration of hard IP components
cannot be performed by adapting their internal behavior and structure. While they have the advantage of
more predictable performance, they are less flexible and therefore less reusable than adaptable
components.
Several approaches for enhancing reusability are based on adaptable components. Although one can
think of very simple component configurations (for instance, by selecting a bit width), a higher degree
of reusability can be achieved by components whose behavior can be more freely modied. Object
orientation is a natural vehicle for high-level modeling and adaptation of reusable components [24,25].
This approach, which can be better classified as IP derivation, is adequate not only for firm and soft
hardware IP components, but also for software IP [26]. Although component reusability is enhanced by
this approach, the system integrator faces a greater design effort, and it becomes more difficult to predict
IP performance.
Intellectual property derivation and communication synthesis are different approaches to solving the same
problem of integration between heterogeneous IP components, which do not follow standards (or the
same substandards). IP derivation is a solution usually based on object-oriented concepts coming from
the software community. It can be applied to the integration of application software components and of
soft and firm hardware components, but it cannot be used for hard IP components. Communication syn-
thesis, on the other hand, follows the path of the hardware community in automatic logic and high-level
synthesis. It is the only solution to the integration of heterogeneous hard IP components, although it can
also be used for integrating software IP and soft and firm hardware IP. While IP derivation is essentially a
user-guided manual process, communication synthesis is an automatic process, with no user intervention.
24.4 Component-Based SoC Design
This section introduces the component-based design methodology, a high-level IP-based methodology
aimed at the integration of heterogeneous HW/SW IP components. It follows an automatic communica-
tion synthesis approach, generating both hardware and software wrappers. It also generates a minimal and dedicated
OS for programmable components. It uses a high-level API, which isolates the application software from
the implementation of a HW/SW solution for the system platform, such that software retargeting is not
necessary.
This approach enables the automatic integration of heterogeneous IP components (which do not follow
a given bus or core standard) and hard IP components (whose internal behavior or structure is not known).
However, the approach is also very well suited to the integration of homogeneous and soft IP components.
The methodology has been conceived to fit any communication structure, such as an NoC [16] or a bus.
The component-based methodology is based on a clear definition of three abstraction levels that are also
adopted by other current approaches: system (pure functional), macroarchitecture, and microarchitecture
(RTL). These levels constitute clear interfaces between design steps, promoting reuse of both components
and tools for design tasks at each of these levels.
24.4.1 Design Methodology Principles
The design flow starts with a virtual architecture model that corresponds to the golden architecture in
Figure 24.2 and allows automatic generation of wrappers, device drivers, OSs, and APIs. The goal is to
produce a synthesizable RTL model of the MPSoC platform that is composed of processor cores, IP cores,
the communication interconnect IP, and HW/SW wrappers. The latter are automatically generated from
the interfaces of virtual components (as indicated by the arrows in Figure 24.4). Software written for the
virtual architecture specification runs without modification on the implementation because the same APIs
are provided by the generated custom OSs.
The input abstract architecture (see Figure 24.4[a]) is composed of virtual modules (VM), correspond-
ing to processing and memory IPs, connected by any communication structure, also encapsulated within
a VM. This abstract architecture model clearly separates computation from communication, allowing
independent and concurrent implementation paths for components and for communication. VMs that
FIGURE 24.4 MPSoC design flow: (a) virtual architecture and (b) target MPSoC platform.
correspond to processors may be hierarchically decomposed into submodules containing software tasks
assigned to this processor. VMs communicate through virtual ports, which are sets of hierarchical internal
and external ports through which services are requested and provided. The separation between internal and
external ports makes possible the connection of modules described at different abstraction levels.
24.4.2 Virtual Architecture
The virtual architecture represents a system as an abstract netlist of virtual components (see
Figure 24.4[a]). It is described in VADeL, a SystemC [27] extension that includes a platform-independent
API offering high-level communication primitives. This API abstracts the underlying hardware platform,
thus enabling the free development of reusable components. In the abstract architecture model, the
interfaces of software tasks are the same for SW/SW and SW/HW connections, even if the software tasks
are executed by different processors. Different HW/SW realizations of this API are possible. Architec-
tural design space exploration can thus be achieved without influencing the functional description of the
application.
Virtual components use wrappers to adapt accesses from the internal component (a set of software
tasks or a hardware function) to the external channels. The wrapper is modeled as a set of virtual ports
that contain internal and external ports that can differ in terms of: (1) communication protocol,
(2) abstraction level, and (3) specification language. This model is not directly synthesizable or executable
because the wrappers' behavior is not described. These wrappers can be generated automatically, in order
to produce a detailed architecture that can be both synthesized and simulated.
The required SystemC extensions implemented in VADeL are:
1. Virtual module: consists of a module and its wrapper.
2. Virtual port: groups some internal and external ports that have a conversion relationship. The
wrapper is the set of virtual ports for a given VM.
3. Virtual channel: groups several channels having a logical relationship (e.g., multiple channels
belonging to the same communication protocol).
4. Parameters: used to customize hardware interfaces (e.g., buffer size and physical addresses of ports),
OSs, and drivers.
In VADeL, there are also predefined ports with special semantics called SAPs (Service Access Ports). They
can be used to access some services that are implemented by hardware or software wrapper components.
For instance, the timer SAP can be used to request an interrupt from a hardware timer after a given delay.
24.4.3 Target MPSoC Architecture Model
We use a generic MPSoC architecture where processors and other IP cores are connected to a global
communication interconnect IP via wrappers (see Figure 24.4[b]). In fact, processors are separated from
the physical communication IP by wrappers that act as communication coprocessors or bridges, freeing
processors from communication management and enabling parallel execution of computation tasks and
communication protocols. Software tasks also need to be isolated from hardware through an OS that
plays the role of a software wrapper. When defining this model, our goal was to have a generic model where
both computation and communication may be customized to fit the specific needs of the application.
For computation, we may change the number and kind of components and, for communication, we can
select a specific communication IP and protocols. This architecture model is suitable for a wide domain of
applications; more details can be found in Reference 28.
24.4.4 HW/SW Wrapper Architecture
Wrappers are automatically generated as point-to-point adapters between each VM and the communica-
tion structure, as shown in Figure 24.4(b) [28]. This approach allows the connection of components to
standard buses as well as point-to-point connections between cores.
FIGURE 24.5 HW/SW wrapper architecture: (a) software wrapper; (b) hardware wrapper.
Wrappers may have HW/SW parts. The internal architecture of a wrapper on the hardware side is shown
in Figure 24.5(b). It consists of a processor adapter, one or more channel adapters, and an internal bus. The
number of channel adapters depends on the number of channels that are connected to the corresponding
VM. This architecture allows the easy generation of multipoint, multiprotocol wrappers. The wrapper
dissociates communication from computation, since it can be considered as a communication coprocessor
that operates concurrently with other processing functions.
On the software side [17], as shown in Figure 24.5(a), wrappers provide the implementation of the
high-level communication primitives (available through the API) used in the system specification and
drivers to control the hardware. If required, the wrapper will also provide sophisticated OS services such
as task scheduling and interrupt management minimally tailored for the particular application.
The synthesis of wrappers is based on libraries of basic modules from which hardware wrappers and
dedicated OSs are assembled. These libraries may be easily extended with modules that are needed to build
wrappers for processors, memories, and other components that follow various bus and core standards.
24.4.5 Design Tools
Figure 24.6 shows an overall view of our design environment, which is called ROSES. The input model may
be imported from a specification analysis tool (e.g., Reference 6) or manually coded using our extended
SystemC library. All design tools use a unified design model that contains an abstract HW/SW netlist
annotated with parameters (Colif [29]). Hardware wrapper generation [28] transforms the input model
into a synthesizable architecture. The software wrapper generator [17] produces a custom OS for each
processor on the target platform. For validation, we use the cosimulation wrapper generator [30] to
produce simulation models. Details about these tools can be found in the references; only their principles
will be discussed here.
Hardware wrapper generation assembles library components using the architecture template presented
before (Figure 24.5[b]) to produce the RTL architecture. This library contains generalized descriptions
of hardware components in a macrolanguage (m4-like); it has two parts: the processor library and the
protocol library. The former contains local template architectures for processors with four types of elements:
processor cores, local buses, local IP components (e.g., local memory, address decoder, coprocessors, etc.),
and processor adapters. The latter consists of a list of channel adapters. Each channel adapter has simula-
tion, estimation, and synthesis models that are parameterized (by the channel parameters, e.g., direction,
storage size, and data type) as the elements in the processor library.
The software wrapper generator produces OSs streamlined and preconfigured for the software
module(s) that run(s) on each target processor. It uses a library organized in three parts: APIs,
communication/system services, and device drivers. Each part contains elements that will be used in
a given software layer in the generated OS. The generated OS provides services: communication services
(e.g., FIFO [first in, first out] communication), I/O services (e.g., AMBA bus drivers), memory services
(e.g., cache or virtual memory usage), etc. Services have dependencies between them; for instance, com-
munication services depend on I/O services. Elements of the OS library also have dependency
FIGURE 24.6 Design automation tools for MPSoC.
information. This mechanism is used to keep the size of the generated OS at a minimum; the elements
that provide unnecessary services are not included.
There are two types of service code: reusable (or existing) code and expandable code. As an example
of existing code, AMBA bus-master service code can exist in the OS library in the form of C language.
As an example of expandable code, OS kernel functions can exist in the OS library in the form of macro-
code (m4-like). Several preemptive schedulers are available in the OS library, such as a round-robin
scheduler, a priority-based scheduler, etc. In the case of the round-robin scheduler, time slicing (i.e.,
assigning different CPU loads to tasks) is supported. To keep the OS kernel very small and flexible, (1) the task
scheduler can be selected from the requirements of the application code and (2) a minimal amount (less
than 10% of the kernel code size) of processor-specific assembly code is used (for context switching and
interrupt service routines).
The cosimulation wrapper generator [30] produces an executable model composed of a SystemC simu-
lator that acts as a master for other simulators. A variety of simulators can participate in this cosimulation:
SystemC, VHDL, Verilog, and Instruction-set simulators. Cosimulation wrappers have the same structure
as that of hardware wrappers (see Figure 24.5[b]), with simulation adapters in the place of processor
adapters and simulation models in the place of channel adapters. In the cosimulation wrapper library,
there are simulation adapters for the different simulators supported and channel adapters that implement
all supported protocols in different languages.
In terms of functionality, the cosimulation wrapper transforms channel access(es) via internal port(s)
to channel access(es) via external port(s) using the following functional chain: channel interface, channel
resolution, data conversion, and module communication behavior. Internal ports use channel functions
(e.g., FIFO available and FIFO write) to exchange data. The channel interface provides the implementation of these
channel functions. Channel resolution maps the N-to-M correspondence between internal and external ports.
Data conversion is required since different abstraction levels can use different data types to represent the
same data. Module communication behavior is required to exchange data via external port(s), that is, to
call port functions of external ports.
24.4.6 Defining IP-Component Interfaces
Hardware and software component interfaces must be composed using basic elements of the hardware
wrapper and software wrapper generators' libraries, respectively. Table 24.1 lists some API functions
available for different kinds of software task interfaces and some services provided by channel adapters
available for use in hardware component interfaces.
Software tasks must communicate through API functions provided by the software wrapper generator
library. For instance, the shared memory (SHM) API provides read/write functions for intertask
communication. The guarded shared memory (GSHM) API adds semaphore services to the SHM API
by providing lock/unlock functions.
Hardware IP components must communicate through communication primitives provided by the
channel adapters of the hardware wrapper generator library. For instance, FIFO channel adapters (sender
and receiver) implement a buffered two-phase handshake protocol (put/get) and provide full/empty func-
tions for accessing the state of the buffer. ASFIFO channel adapters instead use a single-phase handshake
protocol and can generate an interrupt for signaling the full and empty state of the buffer.
A recurrent problem in library-based approaches is library size explosion. In ROSES, this problem is
minimized by the use of layered library structures where a service is factorized so that its implementation
uses elements of different layers. This scheme increases reuse of library elements since the elements of the
upper layers must use the services provided by the elements in the immediate lower layer.
Designers are able to extend ROSES libraries since they are implemented in an open format. This is
an important feature since it enables the support of different standards while reusing most of the basic
elements in the libraries.
Table 24.2 shows some of the existing HW/SW components in the current ROSES IP library and gives
the type of communication they use in their interfaces.
TABLE 24.1 HW/SW Communication APIs

     Basic component interfaces    API functions
SW   Register                      Put/get
     Signal                        Sleep/wakeup
     FIFO                          Put/get
     SHM                           Read/write
     GSHM                          Lock/unlock/read/write
HW   Register                      Put/get
     FIFO                          Put/get/full/empty
     ASFIFO                        Put/get/IT(full/empty)
     Buffer                        BPut/BGet
     Event                         Send/IT(receiver)
     AHB master/slave              Read/write
     Timer                         Set/wait
TABLE 24.2 Sample IP Library

     IP          Description                        Interfaces
SW   host-if     Host PC interface                  Register/signal
     Rand        Random number generator            Signal/FIFO
     mult-tx     Multipoint FIFO data transmission  FIFO
     reg-config  Register configuration             Register/FIFO/SHM
     shm-sync    SHM synchronization                SHM/signal
     stream      FIFO data streaming                GSHM/FIFO/signal
HW   ARM7        Processor core                     ARM7 pins
     TX_Framer   Data package framing               17 registers, 1 FIFO
Stream
 1  void stream::stream_beh()
 2  {
 3    long int *P;
 4    ...
 5    for(;;)
 6    {...
 7      P = (long int*)P1.Lock();
 8      P2.Sleep();
 9      for (int i = 0; i < 8; i++)
10      {
11        long int val = P3.Get();
12        P4.Put(*(P+i+8));
13        ...};...
14      P1.Unlock();
15    }
16    ...
17  }
FIGURE 24.7 (a) The stream software IP and (b) the TX_Framer hardware IP.
Figure 24.7(a) shows the stream software IP and part of its code to demonstrate the utilization of the
communication APIs. Its interface is composed of four ports: two for the FIFO API (P3 and P4), one
for the signal API (P2), and one for the GSHM API (P1). In line 7 of Figure 24.7(a), the stream IP uses
P1 to lock the access to the SHM that contains the data that will be streamed. P2 is used to suspend the
task that fills up the SHM (line 8). Then, some header information is read from the input FIFO using P3
(line 11) and streamed to the output FIFO using P4 (line 12). When streaming is finished, P1 is used to
unlock the access to the SHM (line 14).
Figure 24.7(b) shows the TX_Framer hardware IP, which is part of a VDSL modem and responsible
for packaging data into ATM-network-compatible frames. Its interface is composed of 17 configuration
registers (P1–P17) and one single-handshake input FIFO (P18–P20). The registers are used to configure the
IP functionality and have bit sizes varying from 2 to 11, while the FIFO is used to store data packets that will
be inserted into specific places in the output ATM frames. These ports are driven directly by the compatible
outputs from the register and ASFIFO channel adapters that are generated by the hardware wrapper generator.
24.5 Component-Based Design of a VDSL Application
24.5.1 Specification
The design presented in this section illustrates the IP integration capabilities of ROSES. We redesigned
part of a VDSL modem that was prototyped by Reference 31 using discrete components (the shaded part
in Figure 24.8[a]). The block diagram for the modem subset used in the rest of this chapter is shown in
Figure 24.8(b). It corresponds to a deframing/framing unit (DFU), composed of two ARM7 processors
and the TX_Framer. The TX_Framer is part of the VDSL Protocol Processor. In this experiment, it is
used as a hard IP component described at the RTL. The partition of processors/tasks was suggested by the
design team of the VDSL-modem prototype.
Processors exchange data using three asynchronous FIFO buffers. The TX_Framer IP has some config-
uration registers and inputs a data stream through a synchronous FIFO buffer. Tasks use a variety of control
and data-transmission protocols to communicate. For instance, a task can block/unblock the execution
of other tasks by sending them an OS signal. For data transmission, tasks use: a FIFO memory buffer,
two shared memories (with or without semaphores), and direct register access. Despite representing only
FIGURE 24.8 (a) VDSL modem prototype and (b) DFU block diagram.
a subset of the VDSL modem, the design of the DFU remains quite challenging. In fact, it uses two
processors executing parallel tasks. The control over the three modules of the specification is fully distributed.
All three modules act as masters when interacting with their environment. Additionally, the application
includes multipoint communication channels requiring sophisticated OS services.
24.5.2 DFU Abstract Architecture
Figure 24.9 shows the abstract architecture model that captures the DFU specification with point-to-point
communications between the three main IP cores. VM1 and VM2 are two virtual processors, and VM3
corresponds to the TX_Framer function. VM1 and VM2 include several submodules corresponding to
software tasks T1 through T9 assigned to these processors. This abstract model can be mapped onto
different concrete microarchitectures depending on the selected IP components and on desired performance,
area, and power constraints. For instance, the three point-to-point connections (VC1, VC2, and VC3)
between VM1 and VM2 can be mapped onto a bus or onto an SHM.
FIGURE 24.9 DFU abstract architecture specification.
TABLE 24.3 HW/SW IP Utilization

     IP          Description                        Use
SW   host-if     Host PC interface                  T1
     Rand        Random number generator            T2, T3
     mult-tx     Multipoint FIFO data transmission  T4, T8
     reg-config  Register configuration             T5
     shm-sync    SHM synchronization                T6, T9
     Stream      FIFO data streaming                T7
HW   ARM7        Processor core                     VM1, VM2
     TX_Framer   Data package framing               VM3
24.5.3 MPSoC RTL Architecture
For the implementation of the DFU virtual architecture, two hardware IP cores have been selected: an
ARM7 processor and the TX_Framer. The application software has been built by reusing several available
software IP components for implementing tasks T1 to T9. Table 24.3 lists the selected IP components and
indicates their correspondence to the VMs and submodules in the DFU virtual architecture. The interfaces
of the selected software IP components in Table 24.3 (see Table 24.2) match the communication type of
the software tasks of the virtual architecture in Figure 24.8(b).
Figure 24.10 shows the RTL microarchitecture obtained after HW/SW wrapper generation. It is
important to notice that, from an abstract target architecture containing an abstract ARM7 processor, ROSES
automatically generates a concrete ARM7 local architecture containing additional IP components, which
implement local memory, local bus, and address decoder.
Each software wrapper (custom OS) is customized to the set of software IPs corresponding to the
tasks executed by the processor core. For example, software IPs running on VM2 access the custom OS
using communication primitives available through the API: register is used to write/read to/from the
configuration/status registers inside the TX_Framer block, while SHM and GSHM are used to manage
shared-memory communication. Each OS contains a round-robin scheduler (Sched) and resource
management services (Sync, IT). The driver layer contains low-level code to access the channel adapters within
FIGURE 24.10 Generated MPSoC Architecture.
TABLE 24.4 Results for OS Generation

OS results   Number of     Number of lines   Code size   Data size
             lines in C    in assembly       (bytes)     (bytes)
VM1          968           281               3829        500
VM2          1872          281               6684        1020

Context switch (cycles)                     36
Latency for interrupt treatment (cycles)    59 (OS) + 28 (ARM7)
System call latency (cycles)                50
Resume of task execution (cycles)           26
the hardware wrappers (e.g., Pipe LReg for the HNDSHK channel adapter), and some low-level kernel
routines.
24.5.4 Results
The manual design of a full VDSL modem requires several person-years; the presented DFU was estimated
as a more than five person-years effort. Using the ROSES IP integration capabilities, the overall
experiment took one person only 4 months, including all validation and verification time (but
not counting the effort to develop library components and to debug design tools). This corresponds to
a 15-fold reduction in design effort (a more detailed presentation can be found in Reference 32).
Application code and the generated OS are compiled and linked together to be executed on each ARM7
processor. The hardware wrapper can be synthesized using RTL synthesis. As can be seen in Table 24.4,
most OS code is generated in C; only a small part of it is in assembly and includes some low-level
routines (e.g., context switching and processor boot) that are specific to each processor. If we compare
the numbers presented in Table 24.4 with commercial embedded OSs, the results are still very good. The
minimum size for such OSs is around 4 KB, but with this size, few of them could provide the required
functionality. Table 24.5 shows the numbers obtained after RTL synthesis of the hardware wrappers
using a CMOS (complementary metal oxide semiconductor) 0.35 µm technology. These are good results
because the wrappers account for less than 5% of the ARM7 core's area and have a critical path that
TABLE 24.5 Results for Hardware Wrapper Generation

HW interfaces   Number of gates   Critical path delay (nsec)   Maximum frequency (MHz)
VM1             3284              5.95                         168
VM2             3795              6.16                         162

Latency for read operation (clock cycles)    6
Latency for write operation (clock cycles)   2
Number of code lines (RTL VHDL)              2168
corresponds to less than 15% of the clock cycle for the 25 MHz ARM7 processors used in this case
study.
24.5.5 Evaluation
Results show that the component-based approach can generate HW/SW interfaces and OSs that are
as efficient as the manually coded/configured ones. The HW/SW frontier in wrapper implementation
can easily be shifted by changing some library components. This choice is transparent to the final
user, since everything that implements the interconnect API is generated automatically (the API does not
change, only its implementation does). Furthermore, correctness and coherence can be verified inside
tools and libraries against the API semantics without having to impose fixed boundaries on the HW/SW
frontier (in contrast to standardized component interfaces or buses).
The utilization of layered library components provides considerable flexibility; the design environment
can be easily adapted to accommodate different languages to describe system behavior, different task
scheduling and resource management policies, different global communication interconnect topologies
and protocols, a diversity of processor cores and IP cores, and different memory architectures. In most
cases, inserting a new design element in this environment only requires adding the appropriate library
components. Layered library components are at the root of the methodology; the principle followed
is that each component contains a unique functionality and respects well-defined interfaces that enable
easy composition. This layered structure prevents library size explosion, since composition is used to
implement complex functionality and to increase component reuse.
As explained in this chapter and illustrated by the design case study, ROSES uses a component-based
methodology that presents a unique combination of features:
It implements a general approach for the automatic integration of heterogeneous and hard
IP components, although it easily accommodates the integration of homogeneous and soft IP
components.
It offers an architecture-independent API, integrated into SystemC, containing high-level
communication primitives and enhancing the free development of reusable components. Application
software accessing this API does not need to be retargeted to each system implementation.
It adopts a library-based approach to wrapper generation. As long as components communicate
through a known protocol, communication synthesis can be done automatically, without any
additional design effort. In a formal-based approach, instead, the designer must describe the component
interface by some appropriate formalism, such as an FSM [21] or a regular expression [20].
It uniformly addresses the generation of HW/SW parts of communication wrappers for programmable
components. While some approaches consider only hardware [20,21] or software [18,22]
wrappers, others also consider HW/SW parts but are restricted to predefined wrapper libraries
for given communication schemes [5]. The library-based approach of ROSES, in turn, allows the
synthesis of software interfaces for various communication schemes.
It can be used with any architectural template and communication structure, such as a bus, an NoC,
or point-to-point connections between components. It is also configurable to synthesize wrappers
for any bus or core standard.
24.6 Conclusions
Reuse of IP components is a major requirement for the design of complex embedded SoCs. However, reuse
is a complex process that involves many steps, requires support from specialized tools and methodologies,
and influences current design practices. The integration of IP components into a particular design is perhaps
the most complex step of the reuse process. Many design approaches, such as bus- and core-based design
and platform-based design, are aimed at an easier IP integration. Nevertheless, many problems are still
open, in particular the automatic synthesis of HW/SW wrappers between heterogeneous and hard IP
components.
The chapter has shown that the component-based design methodology provides a complete, generic,
and efficient solution to the HW/SW interface design problem. Starting from a high-level functional
specification and an abstract architecture, design tools can automatically generate the HW/SW wrappers
that are necessary to integrate heterogeneous IP components that have been selected to implement the
application. Designers do not need to design any low-level interfacing details manually. The chapter
has also shown how HW/SW component interfaces can be decomposed and easily adapted to different
communication structures and bus and core standards.
References
[1] ITRS. Available at http://public.itrs.net/.
[2] K. Keutzer, A.R. Newton, J.M. Rabaey, and A. Sangiovanni-Vincentelli. System-Level Design:
Orthogonalization of Concerns and Platform-based Design. IEEE Transactions on Computer-Aided
Design of Integrated Circuits and Systems, 19: 1523–1543, 2000.
[3] M. Sgroi, M. Sheets, A. Mihal, K. Keutzer, S. Malik, J. Rabaey, and A. Sangiovanni-Vincentelli.
Addressing the System-on-Chip Interconnect Woes through Communication-Based Design.
In Proceedings of the 38th Design Automation Conference, Las Vegas, NV, June 2001.
[4] VSIA. http://www.vsi.org.
[5] J.-Y. Brunel, W.M. Kruijtzer, H.J.H.N. Kenter, F. Pétrot, L. Pasquier, E.A. de Kock, and W.J.M. Smits.
COSY Communication IPs. In Proceedings of the 37th Design Automation Conference, Los Angeles,
CA, June 2000.
[6] Cadence Design Systems, Inc. Virtual Component Co-design. http://www.cadence.com/products/
vcc.html.
[7] D. Gajski, J. Zhu, R. Dömer, A. Gerstlauer, and S. Zhao. SpecC Specification Language and
Methodology. Kluwer Academic Publishers, Dordrecht, 2000.
[8] CoWare. http://www.coware.com.
[9] IBM CoreConnect Bus Architecture. http://www.chips.ibm.com/bluelogic.
[10] D. Wingard. MicroNetwork-Based Integration for SOCs. In Proceedings of the 38th Design
Automation Conference, Las Vegas, NV, June 2001.
[11] J. Rowson and A. Sangiovanni-Vincentelli. Interface-Based Design. In Proceedings of the 34th
Design Automation Conference, 1997.
[12] ARM AMBA. http://www.arm.com.
[13] Sonics SiliconBackplane MicroNetwork. http://www.sonicsinc.com.
[14] R.A. Bergamaschi and W.R. Lee. Designing Systems-on-Chip Using Cores. In Proceedings of the
37th Design Automation Conference, 2000.
[15] Open Core Protocol. http://www.ocpip.org.
[16] G. de Micheli and L. Benini. Networks-on-Chip: A New Paradigm for Systems-on-Chip Design.
In Proceedings of the Design, Automation and Test in Europe Conference, 2002.
[17] L. Gauthier, S. Yoo, and A.A. Jerraya. Automatic Generation and Targeting of Application-Specific
Operating Systems and Embedded Systems Software. IEEE Transactions on Computer-Aided Design
of Integrated Circuits and Systems, 20(11): 1293–1301, 2001.
[18] C. Böke. Combining Two Customization Approaches: Extending the Customization Tool TEReCS
for Software Synthesis of Real-Time Execution Platforms. In Proceedings of the Workshop on
Architectures of Embedded Systems (AES2000), Karlsruhe, Germany, January 2000.
[19] L. Benini, A. Bogliolo, and G. De Micheli. Dynamic Power Management of Electronic Systems.
In Proceedings of International Conference on Computer Aided Design, 1998.
[20] R. Passerone, J.A. Rowson, and A. Sangiovanni-Vincentelli. Automatic Synthesis of Interfaces
between Incompatible Protocols. In Proceedings of the 35th Design Automation Conference, 1998.
[21] J. Smith and G. De Micheli. Automated Composition of Hardware Components. In Proceedings of
the 35th Design Automation Conference, 1998.
[22] P. Chou et al. IPChinook: An Integrated IP-Based Design Framework for Distributed Embedded
Systems. In Proceedings of the 36th Design Automation Conference, 1999.
[23] M. Birnbaum and H. Sachs. How VSIA Answers the SoC Dilemma. IEEE Computer, 32: 42–50,
June 1999.
[24] C. Barna and W. Rosenstiel. Object-Oriented Reuse Methodology for VHDL. In Proceedings of the
Design, Automation and Test in Europe Conference, 1999.
[25] P. Schaumont et al., Hardware Reuse at the Behavioral Level, In Proceedings of the 36th Design
Automation Conference, 1999.
[26] F.J. Rammig, Web-based System Design with Components off The Shelf (COTS). In Proceedings
of the Forum on Design Languages, Tuebingen, 2000.
[27] SystemC, http://www.systemc.org.
[28] D. Lyonnard, S. Yoo, A. Baghdadi, and A.A. Jerraya, Automatic Generation of Application-Specific
Architectures for Heterogeneous Multiprocessor System-on-Chip. In Proceedings of the 38th Design
Automation Conference, Las Vegas, NV, June 2001.
[29] W.O. Cesário, G. Nicolescu, L. Gauthier, D. Lyonnard, and A.A. Jerraya. Colif: A Design
Representation for Application-Specific Multiprocessor SOCs. IEEE Design & Test of Computers,
18: 8–20, 2001.
[30] S. Yoo, G. Nicolescu, D. Lyonnard, A. Baghdadi, and A.A. Jerraya. A Generic Wrapper Architecture
for Multi-Processor SoC Cosimulation and Design. In Proceedings of the International Symposium
on HW/SW Codesign (CODES), 2001.
[31] M. Diaz-Nava and G.S. Okvist. The Zipper Prototype: A Complete and Flexible VDSL Multi-
Carrier Solution. ST Journal Special Issue xDSL, 2(1): 1/321/3, September 2001.
[32] W. Cesário, A. Baghdadi, L. Gauthier, D. Lyonnard, G. Nicolescu, Y. Paviot, S. Yoo, M. Diaz-Nava,
and A.A. Jerraya. Component-Based Design Approach for Multicore SoCs. In Proceedings of the
39th Design Automation Conference, New Orleans, June 2002.
25
Design and Programming of Embedded Multiprocessors: An Interface-Centric Approach
Pieter van der Wolf,
Erwin de Kock,
Tomas Henriksson,
Wido Kruijtzer, and
Gerben Essink
Philips Research
25.1 Introduction
25.2 Related Work
25.3 TTL Interface Requirements
25.4 TTL Interface
Inter-Task Communication • TTL Multi-Tasking Interface • TTL APIs
25.5 Multiprocessor Mapping
Source Code Transformation • Automated Transformation
25.6 TTL on an Embedded Multi-DSP
The Multi-DSP Architecture • TTL Implementation • Implementation Results • Implementation Conclusions
25.7 TTL in a Smart Imaging Core
The Smart Imaging Core • TTL Shells
25.8 Conclusions
Acknowledgments
References
25.1 Introduction
Modern consumer devices need to offer a broad range of functions at low cost and with low energy
consumption. The core of such devices is often a multiprocessor System-on-Chip (MPSoC) that implements
the functions as an integrated hardware/software solution. The integration technology used for
Originally published as: P. van der Wolf, E. de Kock, T. Henriksson, W. Kruijtzer, and G. Essink. Proceedings of
the CODES + ISSS 2004 Conference, Stockholm, September 8–10. ACM Press. Reprinted with permission.
building such MPSoCs from a set of hardware and software modules is typically based on low-level
interfaces for the integration of the modules. For example, the usual way of working is to use bus
interfaces for the integration of hardware devices, with ad-hoc mechanisms based on memory-mapped
registers and interrupts to synchronize hardware and software modules. Further, support for reuse is
typically poor, and a method for exploring trade-offs is often missing. As a consequence, MPSoC
integration is a labor-intensive and error-prone task, and opportunities for reuse of hardware and software
modules are limited.
Integration technology for MPSoCs should be based on an abstract interface for the integration of
hardware and software modules. Such an abstract interface should help to close the gap between the
application models used for specification and the optimized implementation of the application on a
multi-processor architecture. The interface must enable mapping technology that supports systematic
refinement of application models into optimized implementations. Such an interface and mapping
technology will help to structure MPSoC integration, thereby enhancing both the productivity and the
quality of MPSoC design.
We present design technology for MPSoC integration with an emphasis on three contributions:
1. We present TTL, a task-level interface that can be used both for developing parallel application
models and as a platform interface for integrating hardware and software tasks on a platform
infrastructure. The TTL interface makes services for inter-task communication and multi-tasking
available to tasks.
2. We show how mapping technology can be based on TTL to support the structured design and
programming of embedded multi-processor systems.
3. We show that the TTL interface can be implemented efficiently on different architectures. We present
both a software and a hardware implementation of the interface.
After discussing related work in Section 25.2, we present the requirements for the TTL interface in
Section 25.3. The TTL interface is presented in Section 25.4. Section 25.5 discusses the mapping technology,
exemplified by several code examples. We illustrate the design technology in Sections 25.6 and 25.7 with
two industrial design cases: a multi-DSP solution and a smart-imaging multi-processor. We present
conclusions in Section 25.8.
25.2 Related Work
Interface-based design has been proposed as a way to separate communication from behavior so that
communication refinement can be applied [1]. Starting from abstract token-passing semantics,
communication mechanisms are incrementally refined down to the level of physical interconnects. In References 2
and 3, a library-based approach is proposed for generating hardware and software wrappers for the
integration of heterogeneous sets of components. The wrappers provide the glue to integrate components
having different (low-level) interfaces. No concrete interface is proposed. In Reference 4, transaction-level
models (TLMs) on the device or component level are discussed.
In contrast, we present an abstract task-level interface, named TTL, which can be implemented as a
platform interface. This interface is the target for the mapping of tasks. Previously, several task-level
interfaces and their implementations have been developed at Philips [5–7]. TTL brings these interfaces
together in a single framework, to unify them as a set of interoperable interface types.
The data transfer and storage exploration (DTSE) method [8] of IMEC focuses on source code
transformation to optimize memory accesses and memory footprint. To our knowledge, the method does not
address the mapping of concurrent applications onto multiprocessor platforms. The Task Concurrency
Management [9] method focuses on run-time scheduling of tasks on multiprocessor platforms to
optimize energy consumption under real-time constraints. The interaction between these tasks is based on
low-level primitives such as mutexes and semaphores. As a result, the tasks are less reusable than TTL
tasks, and the design and transformation of tasks is more difficult and time-consuming.
The Open SystemC Initiative [10] provides a modeling environment to enable system-level design and
IP exchange. Currently, the environment does not standardize the description of tasks at the high level
of abstraction that we aim at. However, TTL can be made available as a class library for SystemC in the
future.
25.3 TTL Interface Requirements
We present a design method for implementing media processing applications as MPSoCs. A key ingredient
of our design method is the Task Transaction Level (TTL) interface. On the one hand, application
developers can use TTL to build executable specifications. On the other hand, TTL provides a platform
interface for implementing applications as communicating hardware and software tasks on a
platform infrastructure. The TTL interface enables mapping technology that automates the refinement
of application models into optimized implementations. Using the TTL interface to go from specification
to implementation allows the mapping process to be an iterative process, where during each step selected
parts of the application model are refined. Figure 25.1 illustrates the basic idea, with the TTL interface
drawn as dashed lines.
For the TTL interface to provide a proper foundation for our design method, it must satisfy a number
of requirements. First, it must offer well-defined semantics for modeling media processing applications.
It must allow parallelism and communication to be made explicit to enable mapping to multi-processor
architectures.
Further, the TTL interface must be an abstract interface. This makes the interface easy to use for
application development because the developer does not have to consider low-level details. An abstract
interface also helps to make tasks reusable, as it hides underlying implementation details. For example, if
a task uses an abstract interface for synchronization with other tasks, it can be unaware and independent
of the implementation of the synchronization with, for example, semaphores or some interrupt-based
scheme.
The platform infrastructure makes services available to tasks via the TTL platform interface. Specifically,
these are services for inter-task communication, multi-tasking, and (re)configuration. Rather than offering
a low-level interface and implementing, for example, synchronization as part of all the tasks, we factor
out such generic services from the tasks to implement them as part of the platform infrastructure. This
implementation is done once for a platform, optimized for the targeted application domain and the
underlying multiprocessor architecture.
An abstract platform interface provides freedom for implementing the platform infrastructure. It must
allow a broad range of platform implementations, including different multiprocessor architectures. For
example, both shared memory and message-passing architectures should be supported. Further, the
[Figure 25.1 artwork not reproduced: tasks connect through the TTL interface (dashed lines) both in the parallel application model and, after mapping, on the platform infrastructure.]
FIGURE 25.1 TTL interface for building parallel application models and implementing them on a platform
infrastructure.
[Figure 25.2 artwork not reproduced: Tasks 1 and 2 run on a CPU over a software shell exposing the TTL API; Task 3 is an ASP integrated via a hardware shell exposing the TTL hardware interface; both shells sit on the interconnect.]
FIGURE 25.2 TTL interface as software API and as hardware interface in example architecture.
abstraction allows critical parts of a platform implementation to be optimized transparently and enables
evolution of a platform implementation as technology evolves. For example, smooth transition from
bus-based interconnects towards the use of network-on-chip technology should be supported.
The TTL interface must allow efficient implementations of the platform infrastructure and the tasks
integrated on top of it. To enable integration of hardware and software tasks, the interface must be available
both as an API and as a hardware interface. An example of how the TTL interface could manifest itself in
a simple multiprocessor architecture is shown in Figure 25.2.
In the left part of Figure 25.2 the TTL interface is implemented as an API of a software shell executing
on a CPU. Software tasks executing on the CPU can access the platform services via the API. In the right
part of Figure 25.2 a task is implemented as an application-specic processor (ASP). The TTL interface for
integrating the ASP is available as a hardware interface. A hardware shell implements the platform services
on top of a lower interconnect. Such interconnect could, for example, have an interface like AXI [11],
OCP [12], or DTL [13].
25.4 TTL Interface
In this section we present the TTL interface. Specifically, we discuss the TTL interface for inter-task
communication and multi-tasking services. We do not discuss reconfiguration. In this chapter all task
graphs are static.
25.4.1 Inter-Task Communication
We define the following terminology and associated logical model for the communication between tasks.
The logical model provides the basis for the definition of the TTL inter-task communication interface.
It identifies the relevant entities and their relationships (see Figure 25.3).
A task is an entity that performs computations and that may communicate with other tasks. Multiple
tasks can execute concurrently to achieve parallelism. The medium through which the data communication
takes place is called a channel. A task is connected to a channel via a port. A channel is used to transfer
values from one task to another. A variable is a logical storage location that can hold a value. A private
variable is a variable that is accessible by one task only. A token is a variable that is used to hold a value
that is communicated from one task to another. A token can be either full or empty. Full tokens are tokens
that contain a value. Empty tokens do not contain a valid value, but merely provide space for a task to put
a value in. We also refer to full and empty tokens as data and room, respectively.
Tasks communicate with other tasks by calling TTL interface functions on their ports. Hence, a task has
to identify a port when calling an interface function. We focus on streaming communication: communicating
tasks exchange sequences of values via channels. A set of communicating tasks is organized as a task
graph.
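The entities of this logical model can be sketched as plain C data structures. This is our own illustrative encoding, not part of TTL (the type and field names are hypothetical); it only makes the task/port/channel/token relationships concrete:

```c
#include <stddef.h>

/* A token: a storage slot in a channel that is either empty (room)
 * or full (holds a communicated value). */
typedef struct {
    int  full;     /* nonzero when the slot holds a valid value */
    long value;    /* token payload; one data type per channel  */
} Token;

/* A channel: the medium through which values travel from one task
 * to another, realized as a bounded set of tokens. */
typedef struct {
    Token *tokens;
    size_t capacity;   /* number of tokens in the channel */
} Channel;

/* A port: a task's connection point to exactly one channel. Tasks
 * name a port in every TTL interface call. */
typedef struct {
    Channel *channel;
    int      is_producer;  /* producers fill tokens, consumers empty them */
} Port;
```

A multi-cast channel, as described in Section 25.4.1.1, would extend this with one producing port and several consuming ports per channel.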
[Figure 25.3 artwork not reproduced: a task with a private variable holding a value connects via a port to a channel containing full and empty tokens.]
FIGURE 25.3 Logical model for inter-task communication.
TABLE 25.1 TTL Interface Types
Acronym Full name
CB Combined blocking
RB Relative blocking
RN Relative non-blocking
DBI Direct blocking in-order
DNI Direct non-blocking in-order
DBO Direct blocking out-of-order
DNO Direct non-blocking out-of-order
25.4.1.1 Interface Types
Considering the varying needs for modeling media processing applications and the great variety in
potential platform implementations, it is not likely that a single narrow interface can satisfy all requirements.
For example, applications may process tokens of different granularities, where streams of tokens may or
may not be processed strictly in order. Platform implementations may have different costs associated with
synchronization between tasks, data transfers, and the use of memory. Certain architectures efficiently
implement message-passing communication, whereas shared memory architectures offer a single address
space for memory-based communication between tasks.
In our view, designers are best served if they are offered a palette of communication styles from which
they can use the most appropriate one for the problem at hand. The TTL interface offers support for
different communication styles by providing a set of different interface types. Each interface type is easy to
use and implement. All interface types are based on the same logical model, which enables interoperability
across interface types. A task designer must select an interface type for each port. Different interface types
can be used in a single model, even in a single task. This allows models to be refined iteratively, where
during each step selected parts of a model are refined.
In defining the interface types, we have to choose which properties to support and which properties
to combine in a particular interface type. Some properties hold for all interface types. Specifically, all
channels are uni-directional and support reliable and ordered communication. TTL supports arbitrary
communication data types, but each individual channel can communicate tokens of a single type only.
Multi-cast is supported, that is, a channel has one producing task but can have multiple consuming tasks.
The TTL interface types are listed in Table 25.1.
25.4.1.2 Interface Type CB
The interface type CB provides two functions for communication between tasks:
write (port, vector, size)
read (port, vector, size)
The write function is used by a producer to write a vector of size values into the channel connected
to port. The read function is used by a consumer to read a vector of size values from the channel
connected to port. The write and read functions are also available as scalar functions that operate
on a single value at a time. The write and read functions are blocking functions, that is, they do not
return until the complete vector has been written or read, respectively. This interface type is based on our
earlier work on YAPI [5].
This interface type is the most abstract TTL interface type. Since it hides low-level details from the
tasks, it is easy to use and supports reuse of tasks. The write and read functions perform both the
synchronization and the data transfer associated with communication. That is, they check for availability
of room/data, copy data to/from the channel, and signal the availability of data/room. The length of the
communicated vectors may exceed the number of tokens in the channel. The platform implementation
may transfer such vectors in smaller chunks, transparent to the communicating tasks [14]. This interface
type is named CB as it combines (C) synchronization and data transfer in a single function with
blocking (B) semantics.
This interface type can be implemented efficiently on message-passing architectures or on shared
memory architectures where the processors have local buffers that can hold the values that are read or
written. However, on shared memory architectures where the processors do not have such local buffers, this
interface type may yield overhead in copying data between private variables, situated in shared memory,
and the channel buffer in shared memory.
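As an illustration, the CB semantics can be sketched in a few lines of C. This is a hypothetical single-threaded model of one channel of int tokens, not the TTL implementation: blocking is reduced to an assertion because there is no scheduler to suspend on, and vectors longer than the channel are not chunked here, but the combined synchronize-and-copy behavior of write and read is the same.

```c
#include <assert.h>
#include <stddef.h>

#define CAPACITY 4   /* number of tokens in the channel */

/* A CB-style channel: a bounded ring buffer of int tokens. */
typedef struct {
    int    buf[CAPACITY];
    size_t head, count;  /* oldest full token; number of full tokens */
} CbChannel;

/* write(port, vector, size): synchronize and copy in one call.
 * In a real platform this blocks until room is available. */
static void cb_write(CbChannel *ch, const int *vec, size_t size) {
    for (size_t i = 0; i < size; i++) {
        assert(ch->count < CAPACITY);             /* "block" until room */
        ch->buf[(ch->head + ch->count) % CAPACITY] = vec[i];
        ch->count++;                              /* signal data        */
    }
}

/* read(port, vector, size): blocks until the whole vector is read. */
static void cb_read(CbChannel *ch, int *vec, size_t size) {
    for (size_t i = 0; i < size; i++) {
        assert(ch->count > 0);                    /* "block" until data */
        vec[i] = ch->buf[ch->head];
        ch->head = (ch->head + 1) % CAPACITY;
        ch->count--;                              /* signal room        */
    }
}
```

After cb_write on the producer's port, cb_read on the consumer's port delivers the same values in order; neither task ever sees buffer indices or token state, which is what makes CB the most abstract and most reusable interface type.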
25.4.1.3 Interface Types RB and RN
To provide more flexibility for making trade-offs upon task implementation, the other TTL interface types
offer separate functions for synchronization and data transfer. The availability of room or data can be
checked explicitly by means of an acquire function and can be signaled by means of a release function.
The acquire function can be blocking or non-blocking. A non-blocking acquire function does not wait
for data or room to be available, but returns immediately to report success or failure. The functions for
the producer are:
reAcquireRoom (port, count)
tryReAcquireRoom (port, count)
store (port, offset, vector, size)
releaseData (port, count)
reAcquireRoom is the blocking acquire function and tryReAcquireRoom is the non-blocking
acquire function. The acquire and release functions synchronize for vectors of count tokens at a time.
The acquire functions are named reacquire since they also acquire tokens that have previously been
acquired and not yet released. That is, they do not change the state of the channel. This helps to reduce
the state saving effort for tasks as the acquire function can simply be issued again upon a next task
invocation. This behavior is similar to GetSpace in Reference 6. Data accesses can be performed on
acquired room with the store function, which copies a vector of size values to the acquired empty
tokens. The store function can perform out-of-order accesses on the acquired empty tokens using a
relative reference offset. An offset of 0 refers to the oldest acquired and not yet released token. The
store function is also available as a scalar function. The releaseData function releases the count
oldest acquired tokens as full tokens on port.
The functions for the consumer are:
reAcquireData (port, count)
tryReAcquireData (port, count)
load (port, offset, vector, size)
releaseRoom (port, count)
These interface types are named RB and RN with the R of relative, B of blocking, and N of non-blocking.
Offering separate functions for synchronization and data transfer allows data transfers to be performed
on a different granularity and rate than the related synchronizations. This may, for example, be used
to reduce the cost of synchronization by performing synchronization at a coarse grain outside a loop,
while performing computations and data transfers at a finer grain inside the loop. This interface type can
be used to avoid the overhead of memory copies on shared memory architectures at a lower cost than
with CB, as coarse-grain synchronization can be combined with small local buffers, for example, registers,
for fine-grain data transfers. Additionally, for some applications the support for out-of-order accesses
helps to reduce the cost of private variables that are needed in a task. Further, with this interface type,
tasks can selectively load only part of the data from the channel, thereby allowing the cost of data transfers
to be reduced. The drawback, compared to CB, is that these interface types are less abstract.
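The producer-side RB functions can be sketched the same way, again as a hypothetical single-threaded C model of one int channel (scalar store variant, blocking modeled by an assertion): reacquire leaves the channel state untouched, store addresses acquired room by a relative offset (0 = oldest acquired, unreleased token), and releaseData is the only call that changes the channel state.

```c
#include <assert.h>
#include <stddef.h>

#define CAP 8   /* number of tokens in the channel */

typedef struct {
    int    buf[CAP];
    size_t head, full;   /* oldest full token; number of full tokens */
} RbChannel;

/* Blocking acquire of 'count' empty tokens. Idempotent: reacquiring
 * does not change the channel state, which reduces the state a task
 * must save between invocations. */
static void reAcquireRoom(RbChannel *ch, size_t count) {
    assert(CAP - ch->full >= count);   /* "block" until room exists */
}

/* Scalar store into acquired room; offset 0 is the oldest acquired,
 * not yet released empty token. Out-of-order stores are allowed. */
static void store(RbChannel *ch, size_t offset, int value) {
    ch->buf[(ch->head + ch->full + offset) % CAP] = value;
}

/* Release the 'count' oldest acquired empty tokens as full tokens. */
static void releaseData(RbChannel *ch, size_t count) {
    assert(ch->full + count <= CAP);
    ch->full += count;
}
```

One reAcquireRoom for, say, four tokens followed by four store calls and a single releaseData shows the intended pattern: coarse-grain synchronization wrapped around fine-grain, possibly out-of-order, data transfers.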
25.4.1.4 Interface Type DBI and DNI
The RB and RN interface types hide the memory addresses of the tokens from the tasks. This supports
reuse of tasks. However, it may also incur inefficiencies upon data transfers, like function call overhead,
accesses to the channel administration, and address calculations. To avoid such inefficiencies, TTL offers
interface types that support direct data accesses. In these interface types the acquire functions return a
reference to the acquired token in the channel. This reference can subsequently be used by the task to
directly access the data/room in the channel without using a TTL interface function. The functions for
the producer are:
acquireRoom (port, &token)
tryAcquireRoom (port, &token)
token->field = value
releaseData (port)
The functions for the consumer are:
acquireData (port, &token)
tryAcquireData (port, &token)
value = token->field
releaseRoom (port)
The acquire and release functions acquire/release a single token at a time. Supporting vector operations for
these interface types would result in a complex interface. For example, it would expose the wrap-around
in the channel buffer or would require a vector of references to be returned. Since tasks must still be
able to acquire more than one token, these acquire functions acquire the first unacquired token and
change the state of the channel, unlike the reacquire functions of RB and RN. The release functions release
the oldest acquired token on port. The interface types are named DBI and DNI, with D for direct,
B for blocking, N for non-blocking, and I for in-order, as tokens are released in the same order as they are
acquired. These interface types can be implemented efficiently on shared memory architectures [7] and
are suited for software tasks that process coarse-grain tokens.
25.4.1.5 Interface Type DBO and DNO
In some cases tasks do not finish the processing of data in the same order as the data was acquired.
In particular when large tokens are used, it should be possible to release a token as soon as a task is finished
with it. For this purpose TTL offers the DBO and DNO interface types (O for out-of-order). The only
difference with the DBI and DNI interface types is in the release functions:
releaseData (port, &token)
releaseRoom (port, &token)
The token reference allows the task to specify which token should be released. The out-of-order release
supports efcient use of memory at the cost of a more complex implementation of the channel.
25.4.2 TTL Multi-Tasking Interface
To support different forms of multi-tasking, TTL offers different ways for tasks to interact with the
scheduler. To this end, TTL supports three task types.
The task type process is for tasks that have their own (virtual) thread of execution and that do not
explicitly interact with the scheduler. This task type is suited for tasks that have their private processing
resource or that rely on the platform infrastructure to perform task switching and state saving implicitly.
For example, this task type is well suited for software tasks executing on an OS.
The task type co-routine is for cooperative tasks that interact explicitly with the scheduler at points
in their execution where task switching is acceptable. For this purpose TTL offers a suspend function.
This task type may be used to reduce the task-switching overhead by allowing the task to suspend itself at
points where only little state needs to be saved.
The task type actor is for fire-exit tasks that perform a finite amount of computations and then return
to the scheduler, similar to a function call. Unless explicitly saved, state is lost upon return. This task type
may be used for a set of tasks that have to be scheduled statically.
25.4.3 TTL APIs
The TTL interface is available both as a C++ and a C API. The use of C++ gives cleaner descriptions of
task interfaces, due to C++ support for templates and function overloading. We use C to link to software
compilers for embedded processors and hardware synthesizers since most of them do not support C++
as input language. For both the C++ API and the C API we offer a generic run-time environment, which
can be used for functional modeling and verification of TTL application models.
25.5 Multiprocessor Mapping
In this section we present a systematic approach to map applications efficiently onto multiprocessors.
The key advantage of TTL is that it provides a smooth transition from application development to
application implementation. In our approach, we rewrite the source code of applications to improve
efficiency. We focus on source code transformations for multiprocessor architectures taking into account
costs of memory usage, synchronization cycles, data transfer cycles, and address generation cycles.
We do not consider algorithmic transformations because these transformations are application-specific.
Typically, application developers perform these transformations. We also do not consider code
transformations for single target processors because these transformations are processor-specific. We assume that
processor-specific compilers and synthesizers support these transformations, although in today's practice
programmers also write processor-specic C.
In the remainder of this section we present methods and tools to transform source code. First we
present source code transformations to illustrate the advantages of using TTL. Next we present tools that
we developed to automate these transformations.
25.5.1 Source Code Transformation
We use a simple example to illustrate the use of TTL. The example consists of an inverse quantization
(IQ) task that produces data for an inverse zigzag (IZZ) task; see Figure 25.4. We focus on the interaction
between these two tasks.
The TTL interface supports different inter-task communication interface types that provide a trade-off
between abstraction and efficiency. We illustrate this by means of code fragments. To save space we indicate
scopes by means of indentation rather than curly braces.
25.5.1.1 Optimization for Single Interface Types
The most abstract and easy-to-use interface type is CB, which combines synchronization and data transfer
in write and read functions. Figure 25.5 shows a fragment of the IQ task that reads input (Line 08),
IQ IZZ
FIGURE 25.4 IQ and IZZ example.
01 void IQ::main()
02 while (true)
03 for(int j=0; j<vi; j++)
04 for(int k=0; k<hi; k++)
05 for(int l=0; l<64; l++)
06 VYApixel Cin;
07 VYApixel Cout;
08 read(CinP, Cin);
09 Cout = QTable[ti][l]*Cin;
10 write(CoutP, Cout);
FIGURE 25.5 IQ using interface type CB.
01 void IZZ::main()
02 while (true)
03 VYApixel Cin[64];
04 VYApixel Cout[64];
05 read(CinP, Cin, 64);
06 for (int i=0; i<64; i++)
07 Cout [zigzag[i]] = Cin[i];
08 write(CoutP, Cout, 64);
FIGURE 25.6 IZZ using interface type CB.
01 void IQ::main()
02 while (true)
03 for(int j=0; j<vi; j++)
04 for(int k=0; k<hi; k++)
05 VYApixel Cout[64];
06 for(int l=0; l<64; l++)
07 VYApixel Cin;
08 read(CinP, Cin);
09 Cout[l] = QTable[ti][l]*Cin;
10 write(CoutP, Cout, 64);
FIGURE 25.7 IQ using interface type CB and vector write.
performs the inverse quantization (Line 09) and writes output using a scalar write operation (Line 10).
The write function terminates when the value of variable Cout has been transferred to the channel. This is
repeated for all 64 values of a block (Line 05) and for all blocks in a minimum coding unit (Lines 03 and 04).
Figure 25.6 shows a fragment of the IZZ task using a vector read function (Line 05). The read function
terminates when 64 values from the channel have been transferred to the variable Cin. Subsequently, these
values are reordered (Lines 06 and 07) and written to the output (Line 08). The channel from the IQ task
to the IZZ task implements the write and read functions that handle both the synchronization and the
data transfer. Note that the length of the communicated vectors is not bounded by the number of tokens
in the channel, which makes tasks independent of their environment.
A potential performance problem of the IQ task in Figure 25.5 is that for each pixel, the output
synchronizes with the input of the IZZ task. In Reference 15 we demonstrated that this is costly in terms
of cycles per pixel if the write function is implemented in software. We can solve this problem by calling
the write function outside the inner loop as shown in Figure 25.7 in Line 10. To this end, we need to
store a block of pixels locally in the IQ task (Line 05). Similar source code transformations to reduce the
synchronization rate are possible for the other TTL interface types.
25.5.1.2 Optimization across Interface Types
The disadvantage of the IQ task of Figure 25.7 is the additional local memory requirement. Interface type
RB splits synchronization and data transfer in separate functions such that the synchronization rate can
be decreased without additional local memory requirements.
01 void IQ::main()
02 while (true)
03 for(int j=0; j<vi; j++)
04 for(int k=0; k<hi; k++)
05 reAcquireRoom(CoutP, 64);
06 for(int l=0; l<64; l++)
07 VYApixel Cin;
08 read(CinP, Cin);
09 store(CoutP, l, QTable[ti][l]*Cin);
10 releaseData(CoutP, 64);
FIGURE 25.8 IQ using interface type RB.
01 void IZZ::main()
02 while (true)
03 VYApixel Cout[64];
04 reAcquireData(CinP, 64);
05 for(int i=0; i<64; i++)
06 VYApixel Cin;
07 load(CinP, i, Cin);
08 Cout[zigzag[i]] = Cin;
09 write(CoutP, Cout, 64);
10 releaseRoom(CinP, 64);
FIGURE 25.9 IZZ using interface type RB.
Figure 25.8 shows how to decrease the synchronization rate from pixel rate to block rate at the output
of the IQ task. Note that here we assume that the channel can store at least 64 pixels, otherwise the
call of the function reAcquireRoom at Line 05 will never terminate. This assumption on the
environment is not needed with interface type CB. Hence, the IQ task of Figure 25.8 puts more constraints on
its use.
Figure 25.9 shows the IZZ task with separate synchronization and data transfer. The IQ task and the IZZ
task do not need to store blocks locally to interact with each other. They share the tokens in the channel.
If the IQ task and the IZZ task need to execute concurrently, then the channel must be able to contain two
blocks, that is, 128 pixels.
The load function (Figure 25.9, Line 07) and the store function (Figure 25.8, Line 09) use relative
addressing. The advantage of this is that the address generation for the FIFO can be implemented in the
load and store functions. Hence, address generation is hidden from the tasks.
Interface type DBI uses direct addressing rather than relative addressing. Direct addressing has advan-
tages if the tokens of a channel and the variables of a task are stored in the same memory. In that case the
tokens and the variables should be mapped onto the same memory locations to avoid in-place copying
in the memory during the transfer of data from and to the tokens. Such copying occurs for instance
in Figure 25.9 at Line 07 where a value from the channel is copied into variable Cin. Furthermore, the
cost of calling the load and store functions can be avoided. The disadvantage of direct addressing is that
the addresses of the tokens are exposed to tasks. To avoid that tasks must take care of wrap-around in
the FIFO, only scalar functions are available. Hence, typically it is more efficient to choose larger tokens if
the synchronization rate has to be low. Figure 25.10 shows the IQ task using direct addressing on its output.
We declare a pointer Cout in Line 04 that is given a value in Line 05. After the room has been acquired,
Cout points to a block of 64 pixels. The channel data type is also block of 64 pixels. The pointer Cout is
used to set the value of the pixels in Line 09 avoiding a call of a store function. Similarly, Figure 25.11
shows the IZZ task using direct addressing on its input avoiding both a call to a load function and a copy
operation from the channel to a variable. Note that the granularity of synchronization between the IQ
output and the IZZ input must be identical, because only scalar functions are available. For this reason,
the IQ task and the IZZ task have become less re-usable.
01 void IQ::main()
02 for(int j=0; j<vi; j++)
03 for(int k=0; k<hi; k++)
04 VYApixel *Cout;
05 acquireRoom(CoutP, Cout);
06 for(int l=0; l<64; l++)
07 VYApixel Cin;
08 read(CinP, Cin);
09 Cout[l] = QTable[ti][l]*Cin;
10 releaseData(CoutP);
FIGURE 25.10 IQ using interface type DBI.
01 void IZZ::main()
02 while (true)
03 VYApixel *Cin;
04 VYApixel Cout[64];
05 acquireData(CinP, Cin);
06 for(int i=0; i<64; i++)
07 Cout[zigzag[i]] = Cin[i];
08 write(CoutP, Cout, 64);
09 releaseRoom(CinP);
FIGURE 25.11 IZZ using interface type DBI.
01 void IZZ::main()
02 if (!tryReAcquireData(CinP, 64)) return;
03 if (!tryReAcquireRoom(CoutP, 64)) return;
04 for (unsigned int i=0; i<64; i++)
05 VYApixel Cin;
06 load(CinP, i, Cin);
07 store(CoutP, zigzag[i], Cin);
08 releaseRoom(CoutP, 64);
09 releaseData(CinP, 64);
FIGURE 25.12 IZZ using interface type RN.
25.5.1.3 Non-Blocking Interface Types
So far, we only discussed interface types that provide blocking synchronization functions. These interfaces
are easy to use because programmers do not have to program what should happen when access to tokens
is denied. Sometimes blocking synchronization is not efficient, for instance, if the state of a task is large
such that it is costly to save it. In that case it may be more efficient to let the programmer decide what
should happen. For this reason, non-blocking synchronization functions are needed. Figure 25.12 shows
how the IZZ task can be modeled as an actor. When the actor is fired, it first checks for available data on its
input (Line 02) and then for available room on its output (Line 03). If the data is available but the room
is not available, then the actor can return without saving its state. In the next firing, it can redo the checks
since the tryReAcquire functions do not modify the state of the channels. If both the data and the room
are available, it is guaranteed that the actor can complete its execution.
25.5.1.4 Channel and Task Merging and Splitting
Channel and task merging and splitting are important for load balancing. In Reference 15 we applied task
merging to reduce the data transfer load, since the cost of data transfer from the IQ task to the IZZ task
is large compared to the amount of computation that the IZZ task performs. Figure 25.13 shows how the
IQ task and the IZZ task can be merged.
The merging of the two tasks is based on the observation that the loop structure of the IZZ task fits in
the loop structure of the IQ task. If one wants to merge two arbitrary tasks, this is not always the case.
01 void IQ_IZZ::main()
02 while (true)
03 for(int j=0; j<vi; j++)
04 for(int k=0; k<hi; k++)
05 VYApixel Cin[64];
06 VYApixel Cout[64];
07 read(CinP, Cin, 64);
08 for(int l=0; l<64; l++)
09 Cout[zigzag[l]]=QTable[ti][l]*Cin[l];
10 write(CoutP, Cout, 64);
FIGURE 25.13 Merged IQ and IZZ task.
01 void IQ_IZZ::main()
02 while (true)
03 VYApixel mcu[vi][hi][64];
04 IQ(mcu);
05 for(int j=0; j<vi; j++)
06 for(int k=0; k<hi; k++)
07 IZZ(mcu[j][k]);
08
09 void IQ_IZZ::IQ(mcu)
10 for(int j=0; j<vi; j++)
11 for(int k=0; k<hi; k++)
12 for(int l=0; l<64; l++)
13 VYApixel Cin;
14 read(CinP, Cin);
15 mcu[j][k][l] = QTable[ti][l]*Cin;
16
17 void IQ_IZZ::IZZ(block)
18 VYApixel Cout[64];
19 for(int i=0; i<64; i++)
20 Cout[zigzag[i]] = block[i];
21 write(CoutP, Cout, 64);
FIGURE 25.14 Statically scheduled IQ and IZZ tasks.
A more generic approach to statically schedule the firings of tasks is exemplified in Figure 25.14. The new
task IQ_IZZ executes an infinite loop from which it calls the IQ and IZZ tasks by means of function calls.
The communication between the IQ function and the IZZ function does not have to be synchronized
explicitly because the calling order of the functions guarantees the availability of data and room. For this
reason, we replace the channel by a variable mcu (minimum coding unit) that is declared in Line 03. The
blocks in the mcu are passed by reference to the IQ function and the IZZ function.
25.5.2 Automated Transformation
We aim to automate the above-mentioned source code transformations to support the proposed method
by tools. It is not the goal to automate the design decision making process, because experiences in high-level
synthesis and compilation tools show that it is hard to automate this while maintaining transparency for
users. Our goal is to automate the rewriting of the source code according to the design decisions of users.
This approach has two advantages. First, design decisions are explicitly formulated rather than implicitly
coded in the source code. Second, the source code can be rewritten automatically such that modifications
and bug fixes in the original specification can be inserted automatically in architecture-specific versions
of the code. In this way a large set of design descriptions can be kept consistent.
25.5.2.1 Parser Generation
The first step in automatic source code transformation is to be able to parse programs and to build data
types that support source code transformation. For this purpose, we use an in-house tool called C3PO
in combination with the publicly available parser generator tool ANTLR [16]. C3PO takes a grammar as
input and synthesizes data types for the non-terminals in the rules of the grammar as well as input for
ANTLR. We use C3PO and ANTLR to generate a C++ parser and a heterogeneous abstract syntax tree
(AST). We use the same tools to generate visitors for the AST that transform the code. After transformation,
we generate new C++ or C code from the AST. The transformations that we target are typically inter-file
transformations. For this reason, we process all source files simultaneously as opposed to the usual
single-file compilation for single processors.
25.5.2.2 Iterative Transformation
Source code transformation typically is an iterative process in which many versions of the same program
are generated. Automatic source code transformation has the advantage that the generated source code is
consistently formatted and that the transformations can be repeated if necessary. This makes it possible to
keep all versions of a program consistent automatically. For version management we have adopted CVS.
Each iteration uses three versions of the source code. The rst version is the result of the previous iteration
or the original code if it is the rst iteration. The second version is manually augmented source code that is
the input for the automatic transformation. The augmentation can contain for instance design constraints
and design decisions. The third version is the code that is automatically generated. If the original code
changes, for instance, due to bug xes or specication changes, then the changes can be automatically
inserted in the second version of the code by the version management tool. The modied second version
of the code is then given as input to the transformation tools in order to produce the third version of the
code that is the starting point for the next iteration.
25.5.2.3 Automatic Interface Type Refinement
We illustrate automatic interface refinement using the example of IQ and IZZ. The original source code of
the tasks is given in Figure 25.5 and Figure 25.6. The resulting code is given in Figure 25.8 and Figure 25.9.
The complete code is distributed over six files: a source file and a header file for the definition of each
of the two tasks, and a source file and a header file for the definition of the task graph that instantiates
and connects the two tasks. All these files require changes if the communication between the two tasks
changes. This has been automated in the following way. We augment the source code of the tasks with
synchronization constraints. In Figure 25.5 between Line 04 and 05 we add the line ENTRY(P) and at
the end of the text we add the line LEAVE(P), both in the scope of the loop in Line 04. This annotation
means that we want to synchronize the output of the IQ task on blocks of 64 pixels. Similarly we add
synchronization constraints ENTRY(C) and LEAVE(C) to the IZZ task in Figure 25.6 between Line 04
and 05 and at the end of the text, respectively, both in the scope of the loop of Line 02. Assuming that
the channel between the two tasks is called iqizzbuf, we provide the transformation tool with the design
information shown in Figure 25.15.
This information means that we want the iqizzbuf channel to have 64 tokens (Line 01). Furthermore,
the channel should be implemented in data type Channel, it should handle tokens of type VYApixel, and
it should connect to interface type RB both for output and for input (Line 02). Line 03 and 04 denote the
synchronization constraints: the amount of consumption should not exceed the amount of production
01 iqizzbuf[64]
02 Channel<VYApixel> USING RbIn, RbOut
03 64*IZZ::C <= 64*IQ::P
04 64*IQ::P <= 64*IZZ::C+64
05 STORAGE IQ
06 Cout-> ../iqizzbuf TRANSFORMATION T1
07 STORAGE IZZ
08 Cin-> ../iqizzbuf TRANSFORMATION T2
09 SYNCHRONIZATION
10 IQ, IZZ -> iqizzbuf
FIGURE 25.15 Design constraints and decisions.
but the difference between the amount of production and consumption may not exceed the buffer capacity
of the channel. Line 06 and 08 denote that the variables Cout and Cin of the IQ task and the IZZ task,
respectively, should be mapped on the iqizzbuf channel using Transformation T1 and T2 that are available
in a library. This introduces the calls to load and store functions. The result of the call to the load function
in the IZZ task is stored in a new variable, also called Cin. Line 10 denotes that the IQ task and the IZZ
task should be synchronized using the iqizzbuf channel. This introduces the calls to acquire and release
functions at the positions indicated by the ENTRY and LEAVE annotations in the augmented source code.
The resulting source code is given in Figure 25.8 and Figure 25.9.
25.5.2.4 Processor and Channel Binding
The last phase of source code transformation is the link to existing compilers and synthesizers in order
to map the individual tasks to hardware and software. To this end, programmers specify a binding of
tasks to processor types and processor instances. From that information the necessary input, that is,
C files and makefiles, for compilation or synthesis to the target processor is generated. Furthermore,
the programmer specifies specific implementations of channels. For instance, the same interface type
can be implemented differently for intra-processor communication and for inter-processor
communication because of efficiency reasons. Each implementation has its own set of names for its interface
functions since function overloading is not available in C. The generated C code contains the data types
and function calls that correspond to the implementations of the channels that the programmer has
chosen.
25.5.2.5 Other Transformations
There are other transformations that are beyond the scope of this paper. We briefly mention them here.
We support structure transformation to change the hierarchy of task graphs. We support instance trans-
formations such that multiple instances of the same task or task graph can be transformed individually.
Finally, we plan to support channel and task merging and splitting [15] by connecting to the Compaan
tool suite [17].
25.6 TTL on an Embedded Multi-DSP
In this section we present the implementation of TTL on a multi-DSP. The objectives are to show (1) how
TTL can be implemented and that a TTL implementation is cheap, (2) trade-offs between the
implementation cost and the abstraction level of the TTL interfaces, and (3) how TTL supports the exploration
of trade-offs between, for example, memory use and execution cycles. The TTL implementation is done
without special hardware support. We first present the multi-DSP architecture. Then we describe how
the implementation of five TTL interface types has been done and we present quantitative results. Finally,
the results for an implementation of an MP3 decoder application are presented.
25.6.1 The Multi-DSP Architecture
The embedded multi-DSP is a template that allows an arbitrary number of DSPs [18]. Each DSP has
its own memory, which in limited ways can be accessed by (some of) the other DSPs. A DSP with
memory and peripherals is called a DSP subsystem (DSS); see Figure 25.16. The DSPs do not have
a shared address space. Communication between the DSSs is done through memory mapped uni-
directional point-to-point links. Thus, two DSPs may refer to a single memory location with different
addresses. Data may be routed from one point-to-point link to another and so on until it reaches its
destination.
In our instance, the DSP Epics7B from Philips Semiconductors was used. The DSP, which is mainly
used for audio applications, has a dual Harvard architecture with a 24-bit wide data path and 12-bit
coefficients.
[Figure: a 4 × 4 array of DSS blocks, connected to external interfaces on one side and, via a microprocessor interface, to a microprocessor on the other.]
FIGURE 25.16 Multi-DSP architecture. Here an instance with 16 DSP subsystems is shown.
25.6.2 TTL Implementation
There are two criteria to decide which TTL interface type to use for a certain application on a certain archi-
tecture. First the interface type must match the application characteristics. Second, the implementation
of the interface type on the target architecture must be efficient.
For audio applications, DBO and DNO are not needed because audio applications do not have large
amounts of data that are produced or consumed out-of-order. Therefore, the other five interface types have
been implemented on the multi-DSP architecture in order to determine the cost of the implementations.
Most of the TTL functions have been implemented in optimized assembly code. It is justifiable to spend
the effort because the TTL functions are implemented only once and used by many applications.
A TTL channel implementation consists of two parts, the channel buffer and the channel administration.
In the multi-DSP architecture, no special-purpose memories exist, so the channel buffer is a circular
buffer in a RAM. This is where the tokens are stored. The channel administration is a structure that holds
information about the state of the channel. In the multi-DSP architecture, the channel buffer has to be
located in the memory of the DSS where the consumer is executed. This is due to the uni-directional
point-to-point links in the architecture.
25.6.2.1 Channel Administration
The channel administration keeps track of how many tokens there are in the channel and how many
of those are full and empty respectively. It also provides a way to get the next full and the next empty
token from the channel. When the channel buffer is implemented as a circular buffer in a RAM, the
channel administration can be implemented in two different ways with two variables to keep track of
the state of the channel. The rst alternative is to use two pointers, one to the start of the empty tokens
and one to the start of the full tokens. The second alternative is to have one pointer and one counter,
for example, a pointer to the start of the full tokens and a counter telling how many full tokens there
are in the channel. This requires atomic increment and decrement operations, which are not supported
on the multi-DSP architecture. Therefore the channel administration is implemented with two pointers.
The producer updates the pointer to the empty tokens (write_pointer) and the consumer updates the
pointer to the full tokens (read_pointer) and thereby no atomic operations are needed [7]. Another
method to avoid the need for atomic updates is to use two counters and two pointers. That method is
explained in Section 25.7.
When the two pointers point to the same memory location, it is not clear if the channel is full or
empty unless wrap-around counters are used. Wrap-around counters imply expensive acquire functions.
To avoid that problem we have implemented a channel administration that does not allow the pointers to
point to the same memory location unless the channel is empty. We thereby have a memory overhead of
the size of one token in the channel buffer. In the indirect interfaces the token size is always one word.
Both the producer and the consumer need to access the channel administration. In the multi-DSP
there are no shared memories, therefore the channel administration has to be duplicated and present in
Producer side: READ_POINTER, WRITE_POINTER, BASE_POINTER, CH_SIZE, BASE_RA, REMOTE_POINTER
Consumer side: WRITE_POINTER, READ_POINTER, BASE_POINTER, CH_SIZE, BASE_RA, REMOTE_POINTER
FIGURE 25.17 Double channel administration for the indirect interface types.
01 Boolean tryReAcquireData(port p, uint n) {
02 uint available_data;
03 available_data = (p->write_pointer - p->read_pointer)
modulo p->ch_size;
04 if (available_data >= n)
05 return true;
06 else
07 return false; }
FIGURE 25.18 Pseudo code for tryReAcquireData (RN).
the two DSSs involved in the communication. The two copies are called the local and remote channel
administration. See Figure 25.17. Since the producer and the consumer refer to the channel buffer with
different addresses, this must be taken into consideration when updating the remote channel adminis-
tration. We keep a pointer to the base address in the local address space (base_pointer) and a pointer
to the base address in the remote address space (base_ra). These two pointers are used to calculate the
pointer value to be stored in the remote channel administration. The channel administrations as well as
the channel buffer must be located in memory areas that are accessible via the point-to-point links.
As an example of the implementation of the TTL functions, pseudo code for the tryReAcquireData
function in RN is shown in Figure 25.18.
25.6.3 Implementation Results
The acquire functions for the RN interface type use 9 instructions. The release functions use 15 instruc-
tions. The vector load and store functions use a loop unrolling of 2 and achieve 2.5 instructions per data
word with an overhead of 24 instructions to set up the data transfer. The scalar load and store functions
are inlined in the task code and each use 10 instructions.
The acquire functions for the direct interface type DNI use between 19 and 33 instructions, dependent
on the state of the channel. The release functions use between 29 and 38 instructions. No data transfer
functions are used. The cost of the data transfers is comparable to the cost of accessing private data
structures in the task.
For the blocking interface types, it is not as easy to determine the cost in terms of instructions for the
individual acquire functions, because they may include task switches. However, an acquire function in
RB that does not trigger a task switch uses 18 instructions. The release functions and the data transfer
functions for RB have the same cost as those for RN. The same applies to the release functions of DBI with
respect to DNI.
In CB, synchronization and data transfer are combined into a single function. The cost of the
implementation is approximately the sum of the costs of the three corresponding functions in RB.
25.6.3.1 Evaluation Application
An MP3 decoder application has been used for the evaluation of the TTL implementations on the multi-
DSP. The MP3 decoder application was available as a sequential C program. The application was converted
TABLE 25.2 Simulation Results for the Whole Application

TTL IF type    #Cycles       Part in TTL (%)    #Memory words
CB             45,579,603    2.9                12493
RB             45,551,243    2.8                12494
RN             45,505,950    2.2                12365
DBI            45,152,454    1.1                9162
DNI            45,108,086    0.5                9041
into a TTL task and additional TTL tasks were added for mimicking the rest of a complete application
and for handling the interaction with the simulation environment.
The application has been implemented with all five interface types. The RN and DNI implementations
use TTL actors and the other types use TTL processes. The application has also been implemented with
four different granularities of the communication for the RN interface type. In the implementations with
the direct interface types, DNI and DBI, the channel between the input task and the MP3 task uses RN
and RB interface types respectively. This is due to the fact that the amount of communicated data on that
channel is data dependent.
25.6.3.2 Simulation Results
Table 25.2 shows the results of the various interface types with frame-based communication. All channel
buffers have been sized so that they can hold one frame. The memory is the total data memory for the
whole application. The number of cycles is the number used by the whole application to decode a test file.
The blocking implementations use somewhat more memory and have some cycle overhead compared
to the non-blocking implementations, when comparing RB to RN and DBI to DNI. This is due to the fact
that the multi-tasking costs both memory for storing register contents and cycles to save and restore the
register contents. The DNI and DBI interface types use considerably less memory and fewer cycles than the
other interface types. This is because the data in the channels is accessed directly, without copying it to
and from private variables. The CB version has performance similar to that of the RB version.
For the DNI implementation, about 0.5% of the cycles are spent in the TTL functions and 99.5%
of the cycles are spent in the tasks. This is of course dependent on the application as well as on the
implementation of the TTL functions.
Figure 25.19 shows the trade-offs that can be made by changing the granularity of the communication.
Here the granularity has been varied by a factor of 36 on the channel between the MP3 task and the output
task. In the MP3 decoder, this is made possible by using a sub-frame decoding method, which allows the
MP3 decoder to output blocks smaller than a frame.
Memory is reduced in the channel buffer, in the MP3 task, and in the output task. The channel
buffer sizes have been adjusted to match the granularity of the communication. The cycle overhead
for the small-granularity communication has two causes: smaller granularity implies more frequent
synchronization calls, and smaller buffers imply more frequent task switching.
The implementation of CB allows channel buffers to be smaller than the vector sizes used by the tasks.
One of the advantages with CB is that the channel buffer size can be reduced to achieve a memory-cycle
trade-off without rewriting the tasks themselves. Results for this are shown in Figure 25.20.
25.6.4 Implementation Conclusions
It has been shown that TTL can be implemented efficiently on a multi-DSP architecture. It has also been
shown that changing the granularity of the communication of the tasks has great impact on the memory-
cycle trade-off. The direct interfaces in TTL provide benefits in both memory usage and cycle overhead.
As expected, the most abstract interface type, CB, is also the most expensive to use. This demonstrates
the value of automating transformations between the various implementation alternatives.
2006 by Taylor & Francis Group, LLC
25-18 Embedded Systems Handbook
[Figure: MEM (#words, from about 5,000 to 13,000) versus #cycles (x 10^7, from about 4.4 to 4.8) for
full-frame, 1/2-frame, 1/4-frame, and 1/36-frame communication granularities.]
FIGURE 25.19 Simulation results for RN, when changing the communication granularity.
[Figure: MEM (#words, from about 10,000 to 13,000) versus #cycles (x 10^7, from about 4.54 to 4.63) for
full-frame down to 1/32-frame channel buffer sizes.]
FIGURE 25.20 Simulation results for CB, when changing the channel buffer size.
25.7 TTL in a Smart Imaging Core
The objective of this section is to show that the implementation of TTL in hardware, software, and
mixed hardware/software is possible with reasonable costs. The implementation allows the buffer size
and the buffer location to be changed and the channel administration to be relocated. This section first
discusses the smart imaging core, followed by a detailed description of the TTL implementation including
performance results.
25.7.1 The Smart Imaging Core
Smart imaging applications combine image and video capturing with the processing and/or interpretation
of the scene contents. An example is a camera that is able to segment a video sequence into objects, track
[Figure: the smart imaging core: an ARM 9xx CPU with I/D cache and a software shell, a video input unit
with a CCIR/camera front-end, a smart imaging coprocessor and a motion estimator coprocessor each
behind a hardware shell with TTL and DTL data interfaces, peripherals (timers, watchdog, interrupt
controller), embedded RAM and (boot) ROM, a memory controller for external Flash and SDRAM, and
off-chip communication, all connected by a communication interconnect.]
FIGURE 25.21 Architecture of the smart imaging core.
some of them, and raise an alarm if some of these objects show an unusual behaviour. The smart imaging
core described here can be embedded in a camera and is suited for automotive and mobile communication
applications. Example applications are pedestrian detection [19], low-speed obstacle detection [20], and
face tracking.
Each of the smart imaging applications uses low-level pixel processing, typically on image segments, for
an abstraction of the scene contents (feature extraction). Furthermore, motion segmentation is used to
help in tracking objects in the scene. The applications are structured such that the more control-oriented
parts are combined in a task that fits well on a CPU. All the low-level pixel processing is combined
together in a pixel processing task, which is mapped onto a smart imaging coprocessor. Likewise, the main
processing part of the motion segmentation is described as an independent task, which is mapped onto a
motion estimator coprocessor. The architecture of the smart imaging core is depicted in Figure 25.21.
More details of the architecture can be found in Reference 21. The architecture globally consists of
an ARM CPU, a video input unit, and two coprocessors: the motion estimator (ME) and the smart
imaging (SI) coprocessor. The tasks on the coprocessors and the ARM communicate with each other
using the TTL interface. By adopting the TTL interface for the coprocessors, we expect that the integration
of these blocks into future systems will be significantly simplified.
25.7.2 TTL Shells
This subsection presents the TTL shells used for the smart imaging core. These are a full hardware shell
for the SI coprocessor and software shells for the ARM and the motion estimator (VLIW) coprocessor.
25.7.2.1 TTL Shell for the SI Coprocessor
The TTL interface type used for the SI is the RB interface type using indirect data access. As already
explained in the Multi-DSP section, a TTL channel implementation consists of two parts, the channel
buffer and the channel administration. In the SI core the channel buffers are always located in main
(on-chip) memory. The channel administration can be placed both in the shell and in main memory.
[Figure: producer-side and consumer-side copies of the channel administration. Each copy holds
BASE_POINTER (bytes), CH_SIZE, TOKEN_SIZE, N_WRITTEN, and N_READ (tokens), plus a
token-aligned WRITE_POINTER (producer side) or READ_POINTER (consumer side) and a
REMOTE_POINTER referring to the other copy.]
FIGURE 25.22 Channel administration.
[Figure: coprocessor-to-shell signal interface. Handshake and control signals, with widths in bits:
request (1), acknowledge (1), port_type (1), prim_req_type (2), is_non_blocking (1), is_granted (1),
port_id (np), offset (32), size(n) (ns); data signals: wr_req (1), wr_data (32), wr_ack (1), rd_req (1),
rd_data (32), rd_ack (1).]
FIGURE 25.23 TTL signal interface.
We also use two copies of the channel administration: one at the producer side and another at the consumer
side. Figure 25.22 depicts the channel administration structure.
To make sure that the channel status is handled correctly by both the producer and consumer,
without the need for atomic access to the variables of the channel administration, n_written,
n_read, read_pointer, and write_pointer are used. Only the producer modifies n_written and
write_pointer. Similarly, only the consumer modifies n_read and read_pointer. The expression
n_written - n_read is used to calculate the amount of data available in the channel, while ch_size -
(n_written - n_read) is used to derive the amount of free room. The hardware implementation includes
correct handling of wraparounds. With this approach both the consumer and producer have a conservative
view on the channel status. The use of two token counters, n_read and n_written, instead of two pointers
as in the Multi-DSP case is due to the variable token_size that can be handled with this implementation.
Using the counters, the implementation of the acquire functions is more efficient because no multiplication
with token_size is needed. The variable remote_pointer is used to reference the remote channel
administration. The variable base_pointer, together with the offset parameter provided through the
TTL load and store calls, is used to calculate the physical address for accessing the channel buffer. The
buffer behaves as a FIFO and is implemented as a fixed-size (ch_size) circular buffer. This results in
the equation address = base_pointer + ((read_pointer + offset) mod ch_size).
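The counter bookkeeping and address calculation above can be sketched as follows; the class and field names are illustrative assumptions, not the actual TTL implementation.

```python
# Sketch of one copy of the channel administration (illustrative names,
# not the actual TTL data structures). Only the producer increments
# n_written; only the consumer increments n_read, so no atomic access
# to shared variables is needed.

class ChannelAdmin:
    def __init__(self, base_pointer, ch_size):
        self.base_pointer = base_pointer  # start address of the buffer
        self.ch_size = ch_size            # fixed circular-buffer size
        self.n_written = 0                # tokens written (producer-owned)
        self.n_read = 0                   # tokens read (consumer-owned)

    def available(self):
        # amount of data the consumer may safely read
        return self.n_written - self.n_read

    def room(self):
        # free space the producer may safely fill
        return self.ch_size - (self.n_written - self.n_read)

    def address(self, pointer, offset):
        # physical address of a buffer access; modulo handles wraparound
        return self.base_pointer + (pointer + offset) % self.ch_size
```

Because each side only ever reads the counter owned by the other side, both obtain a conservative view of the channel status, as described above.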
sum_{i=1}^{g} n_i, where n_i is the number of patterns corresponding to configuration i, 1 <= i <= g. The BIST design
procedure described next is tailored to embed a given set of deterministic test cubes in the sequence of
sum_{i=1}^{g} n_i patterns.
Modular Testing and Built-In Self-Test 27-15
[Figure: (a) the proposed logic BIST architecture: an LFSR drives m scan chains of l bits each through a
reconfigurable interconnection network (RIN) built from multiplexers; a pattern counter, a configuration
counter, stored control bits (C_0 ... C_{d-1}, D_0 ... D_{g-1}), and a decoder control the RIN, and the
scan chain outputs feed a MISR. (b) The RIN for m = 2 and g = 4: two multiplexers select, per
configuration, which LFSR outputs drive scan chains 1 and 2.]
FIGURE 27.9 (a) Proposed logic BIST architecture (b) RIN for m = 2 and g = 4. (From L. Li and K. Chakrabarty.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 23, 1289-1305, 2004. With permission.)
During test application, pseudorandom patterns that do not match any deterministic test cube are also
applied to the CUT. These pseudorandom patterns can potentially detect nonmodeled faults. However,
these patterns increase the testing time. A parameter called MaxSkipPatterns, which is defined as the largest
number of pseudorandom patterns that are allowed between the matching of two deterministic cubes, is
used in the design procedure to limit the testing time. We first need to determine, for each configuration,
the number of patterns as well as the interconnections between the LFSR outputs and the scan chains. We
use the simulation procedure described next to solve this problem.
t : xxx10 xx01x 1xx1x 0xxx1
t1 : 0xxx1
t2 : 1xx1x
t3 : xx01x
t4 : xxx10
FIGURE 27.10 An illustration of converting a test cube to multiple scan chain format (m = 4, l = 5). (From L. Li
and K. Chakrabarty. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 23, 1289-1305,
2004. With permission.)
We start with an LFSR of length L, a predetermined seed, and a known characteristic polynomial. Let
T_D = {c_1, c_2, ..., c_n} be the set of deterministic test cubes that must be applied to the CUT. The set T_D can
either target all the single stuck-at faults in the circuit, or only the hard faults that cannot be detected by a
small number of pseudorandom patterns. As illustrated in Figure 27.10, each deterministic test cube c in
the test set is converted into the multiple scan chain format as a set of m l-bit vectors {t_1, t_2, ..., t_m}, where
m is the number of scan chains and l is the length of each scan chain. The bits in a test cube are ordered
such that the least significant bit is first shifted into the scan chain. We use Conn_j^(i) to denote the set of
LFSR taps that are connected to scan chain j in configuration i, where i = 1, 2, ..., g, j = 1, 2, ..., m.
The steps of the simulation procedure are as follows:

1. Set i = 1.
2. Set Conn_j^(i) = {1, 2, ..., L} for j = 1, 2, ..., m; that is, initially each scan chain can be connected
to any tap of the LFSR.
3. Driving the LFSR for the next l clock cycles, we obtain the output of the LFSR as a set of L l-bit
vectors {O_k | k = 1, 2, ..., L}, where vector O_k is the output stream of the kth flip-flop of the LFSR
for the l clock cycles.
4. Find a test cube c* in T_D that is compatible with the outputs of the LFSR under the current
connection configuration Conn_j^(i); that is, for all j = 1, ..., m, there exists k in Conn_j^(i) such that
t_j is compatible with O_k, where c* has already been reformatted for m scan chains as a set of vectors
{t_1, t_2, ..., t_m}. (A vector u_1, u_2, ..., u_r and a vector v_1, v_2, ..., v_r are mutually compatible if for
any i, 1 <= i <= r, one of the following holds: [i] u_i = v_i if they are both care bits; [ii] u_i is a
don't-care bit; [iii] v_i is a don't-care bit.)
5. If no test cube is found in Step 4, go to Step 6 directly. Otherwise, remove the test cube c* from T_D,
and narrow each Conn_j^(i) down to the taps k in Conn_j^(i) for which t_j is compatible with O_k.
6. If in the previous MaxSkipPatterns + 1 iterations at least one test cube was found in Step 4, go
to Step 3. Otherwise, the simulation for the current configuration is concluded. The patterns that
are applied to the circuit under this configuration are those that were obtained in Step 3.
7. Match the remaining cubes in T_D to the test patterns for the current configuration; that is, if any
test vector in T_D is compatible with any pattern for the current configuration, remove it from T_D.
8. If no pseudorandom pattern for the current configuration is compatible with a test cube, the
procedure fails and exits. Otherwise, increase i by 1, and go to Step 2 to begin the iteration for the
next configuration, until T_D is empty.
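Steps 2 to 5 above can be sketched as follows; the 4-bit LFSR, its feedback tap placement, and the string encoding of cube vectors are illustrative assumptions rather than the authors' implementation.

```python
# Minimal sketch of Steps 2-5 of the simulation procedure.

def lfsr_streams(seed, l):
    """Clock a 4-bit LFSR for l cycles; return the output stream O_k of
    each flip-flop k (feedback placement for x^4 + x + 1 is an assumption)."""
    state = list(seed)
    streams = [[] for _ in range(4)]
    for _ in range(l):
        for k in range(4):
            streams[k].append(state[k])
        state = [state[0] ^ state[3]] + state[:3]  # assumed tap placement
    return streams

def compatible(t, o):
    """A cube vector t (string with 'x' don't cares) is compatible with
    a bit stream o if every care bit agrees."""
    return all(tb == 'x' or tb == str(ob) for tb, ob in zip(t, o))

def try_embed(cube, streams, conn):
    """Steps 4-5: if every scan-chain vector t_j of the cube matches some
    tap in Conn_j, narrow each Conn_j to the matching taps."""
    narrowed = []
    for t, taps in zip(cube, conn):
        ok = [k for k in taps if compatible(t, streams[k])]
        if not ok:
            return False        # cube cannot be embedded in this pattern
        narrowed.append(ok)
    conn[:] = narrowed          # Step 5: narrow down {Conn_j}
    return True
```

In a full procedure, try_embed would be called once per pattern, and MaxSkipPatterns + 1 consecutive failures would close the current configuration, as in Step 6.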
Figure 27.11 shows a flowchart corresponding to the above procedure, where the variable skip_patterns is
used to record the number of consecutive patterns that are not compatible with any deterministic test cube,
and all_randoms is used to indicate whether all the patterns for the current configuration are pseudorandom
patterns.
An example of the simulation procedure is illustrated in Figure 27.12. A 4-bit autonomous LFSR with
characteristic polynomial x^4 + x + 1 is used to generate the pseudorandom patterns. There are four scan
chains and the length of each scan chain is 4 bits. The parameter MaxSkipPatterns is set to 1. The output of
the LFSR is divided into patterns p_i, i = 1, 2, .... Each pattern consists of four 4-bit vectors. The procedure
[Flowchart: start; set i = 1 and skip_patterns = 0; set Conn_j^(i) = {1, 2, ..., L} for j = 1, 2, ..., m and
all_randoms = true; obtain the outputs of the LFSR for the next l clock cycles, {O_k | k = 1, 2, ..., L}; if
there exists a c* in T_D that is compatible with {O_k} under the current Conn_j^(i), set all_randoms =
false and skip_patterns = 0, remove c* from T_D, and narrow down {Conn_j^(i)}; otherwise increment
skip_patterns; when skip_patterns = MaxSkipPatterns + 1, match the remaining cubes in T_D to the test
patterns for the current configuration; end if T_D is empty; fail if all_randoms = true; otherwise set
i = i + 1 and begin the next configuration.]
FIGURE 27.11 Flowchart illustrating the simulation procedure. (From L. Li and K. Chakrabarty. IEEE Transactions
on Computer-Aided Design of Integrated Circuits and Systems, 23, 1289-1305, 2004. With permission.)
that determines the connections is shown as Step (Init) to Step (f). Step (Init) is the initialization step in
which all the connections Conn_j^(1), j = 1, 2, 3, 4 are set to {1, 2, 3, 4}. In Step (a), the first pattern p_1 is
matched with the test cube c_1, and the connections are shown for each scan chain: scan chain 1 can be
connected to x_1 or x_4, both scan chain 2 and scan chain 3 can only be connected to x_2, and scan chain 4
can be connected to x_1, x_2, or x_4. In Step (c), none of the cubes is compatible with p_3. When neither
p_5 nor p_6 matches any cubes in Step (e), the iterations for the current configuration are terminated. The
patterns that are applied to the CUT in this configuration are p_1, p_2, ..., p_6. We then compare the
remaining cube c_4 with the six patterns and find that it is compatible with p_2. So c_4 is also covered by
the test patterns for the current configuration. Thus the connections for this configuration are: scan chain 1
is connected to x_4, both scan chain 2 and scan chain 3 are connected to x_2, and scan chain 4 is connected
to x_1. Since p_5 and p_6 are not compatible with any deterministic cubes, the number of patterns for this
configuration is set to 4. If there are test cubes remaining to be matched, the iteration for the next
configuration starts from p_5.
27.3.1 Declustering the Care Bits
The simulation procedure to determine the number of patterns and the connections for each configuration
can sometimes fail to embed the test cubes in the LFSR sequence. This can happen if MaxSkipPatterns
[Figure: a 4-bit LFSR with flip-flops x_1-x_4 and characteristic polynomial x^4 + x + 1; its output stream
(seeded with 0001) is divided into patterns s_1, s_2, ..., s_6 of four 4-bit vectors each; four test cubes
t_1-t_4 given as 4-bit vectors with don't cares (00xx, 1xx0, 10xx, x0xx, 0xxx, xx1x, 01xx, 11xx, xx11,
x10x, x1x0, 10xx, xx11, 1xxx, 01xx, x001); and the determination of connections, starting from the
initialization Conn_11 = Conn_12 = Conn_13 = Conn_14 = {1, 2, 3, 4} and proceeding through steps
(a) s_1 : t_1, (b) s_2 : t_3, (c) s_3 : none, (d) s_4 : t_2, (e) s_5, s_6 : none, (f) s_2 : t_4, narrowing the
final connections to (4), (2), (2), (1).]
FIGURE 27.12 An illustration of the simulation procedure. (From L. Li and K. Chakrabarty. IEEE Transactions on
Computer-Aided Design of Integrated Circuits and Systems, 23, 1289-1305, 2004. With permission.)
is too small, or the test cubes are hard to match with the outputs of the LFSR. During our experiments,
we found that it was very difficult to embed the test cubes for the s38417 benchmark circuit. On closer
inspection, we found that the care bits in some of the test cubes for s38417 are highly clustered, even
though the percentage of care bits in T_D is small. When these test cubes are converted into a multiple scan
chain format, most of the vectors contain very few care bits but a few vectors contain a large number of
care bits. These vectors with many care bits are hard to embed in the output sequence of the LFSR.
In order to embed test cubes with highly clustered care bits, we propose two declustering strategies. The
first is to reorganize the scan chains such that the care bits can be scattered across many scan chains, and
each scan chain contains only a few care bits. The other strategy is based on the use of additional logic to
interleave the data that are shifted into the different scan chains. The first strategy requires reorganization
of the scan chains, but it does not incur extra hardware overhead. Care needs to be taken in scan chain
redesign to avoid timing closure problems. The interleaving method does not modify the scan chains, but
it requires additional hardware and control mechanisms.
The method of reorganization of scan chains is illustrated in Figure 27.13. As shown in the figure,
before the reorganization, all the care bits of the given test cube are grouped in the second vector, which is
[Figure: a 30-bit test cube x x x x x x | x x x x x x | x x x x x x | 1 0 0 1 1 0 | x x x x x x is reformatted
onto five scan chains (0-4) of six scan cells each (cells 1-30); before reorganization all six care bits
1 0 0 1 1 0 end up in a single scan chain, so one vector carries them all; after reorganization the vertical
vectors are rotated and the care bits are scattered over the vectors.]
FIGURE 27.13 An illustration of the reorganization of scan chains. (From L. Li and K. Chakrabarty. IEEE Transactions
on Computer-Aided Design of Integrated Circuits and Systems, 23, 1289-1305, 2004. With permission.)
hard to match with the output of the LFSR. After the reorganization, the care bits are scattered across all the
vectors, and the largest number of care bits in a vector is only two. This greatly increases the probability
that this vector can be matched to an output pattern of the LFSR. Note that the concept of reorganization
of scan chains is also used in [9]. However, the reorganization used in [9] changes the scan chain structure
and makes it unsuitable for response capture; a separate solution is needed in [9] to circumvent this
problem. In our approach, the basic structure of the scan chains is maintained and the usual scan test
procedure of pattern shift-in, response capture, and shift-out can be used.
The scan cells in the CUT can be indexed as c_{i,j}, i = 0, 1, ..., m - 1, j = 0, 1, ..., l - 1, where m is the
number of scan chains and l is the length of a scan chain. Note that we start the indices from 0 to facilitate
the description of the scan chain reorganization procedure. The ith scan chain consists of the l scan cells
c_{i,j}, j = 0, 1, ..., l - 1. We use c'_{i,j} to denote the reorganized scan cells, in which the ith scan chain
consists of the l scan cells c'_{i,j}, j = 0, 1, ..., l - 1. For each j = 0, 1, ..., l - 1, the m cells c_{0,j}, c_{1,j},
..., c_{m-1,j} constitute a vertical vector. The reorganized scan cell structure is obtained by rotating each
such vertical vector upwards by d positions, where d = j mod m; that is, c'_{i,j} = c_{k,j}, where k is given
by k = (i + d) mod m.
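The rotation c'_{i,j} = c_{(i+d) mod m, j} with d = j mod m can be sketched as follows; the cell labeling is illustrative.

```python
# Sketch of the scan-cell reorganization: each vertical vector of m cells
# at depth j is rotated upward by d = j mod m positions, so that
# c'_{i,j} = c_{(i + j mod m) mod m, j}.

def reorganize(cells):
    """cells[i][j] is the label of scan cell j in chain i; returns the
    reorganized structure."""
    m, l = len(cells), len(cells[0])
    return [[cells[(i + j % m) % m][j] for j in range(l)] for i in range(m)]
```

Since each rotation permutes a vertical vector, every scan cell appears exactly once in the reorganized structure, which is why the usual shift-in, capture, and shift-out procedure still applies.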
An alternative method for declustering, based on the interleaving of the inputs to the scan chains, is
shown in Figure 27.14. We insert an extra stage of multiplexers between the outputs of the RIN and the
inputs of the scan chains. From the perspective of the RIN, the logic that follows it, that is, the combination
of the multiplexers for interleaving and the scan chains, is simply a reorganized scan chain with an
appropriate arrangement of the connections between the two stages of multiplexers. For a CUT with m
scan chains, m multiplexers are used for reconfiguration, and m multiplexers are inserted for interleaving.
Each of the multiplexers used for interleaving has m inputs, which are selected in ascending order during
the shifting in of a test pattern; that is, the first input is selected for the first scan clock cycle, the second
input is selected for the second scan clock cycle, and so on. After the mth input is selected, the procedure
is repeated with the first input. We use A_i to denote the output of the ith multiplexer for reconfiguration
and B_{i,j} to denote the jth input of the ith multiplexer for interleaving, where i, j = 1, 2, ..., m. The
interleaving is carried out by connecting the inputs of the multiplexers for interleaving with the outputs
[Figure: m multiplexers for reconfiguration followed by an extra stage of m multiplexers for interleaving,
driving scan chains 1 through 5; only the connections related to the first reconfiguration multiplexer
are shown.]
FIGURE 27.14 An illustration of interleaving of the inputs of scan chains. (From L. Li and K. Chakrabarty.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 23, 1289-1305, 2004. With
permission.)
of the multiplexers for reconfiguration such that

    B_{i,j} = A_{i-j+1}      if i >= j
    B_{i,j} = A_{i-j+1+m}    if i < j
In order to control the multiplexers for interleaving, an architecture similar to the control logic for the
reconfigurable interconnection network can be used. However, for the interleaving, we need neither the
storage nor the pattern counter. A bit counter counting up to m - 1 (where m is the number of scan
chains) is used to replace the configuration counter. The bit counter is reset to 0 at the start of the shifting
in of each pattern, and it returns to 0 after counting to m - 1.
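The connection rule B_{i,j} = A_{i-j+1} (i >= j) or A_{i-j+1+m} (i < j) can be sketched as follows, using 1-based indices as in the text; the function name is illustrative.

```python
# Sketch of the interleaving connection rule between the outputs A_1..A_m
# of the reconfiguration multiplexers and input j of interleaving
# multiplexer i (1-based indices).

def interleave_source(i, j, m):
    """Index of the reconfiguration-multiplexer output that drives B_{i,j}."""
    return i - j + 1 if i >= j else i - j + 1 + m
```

As the bit counter steps j from 1 to m during shift-in, each interleaving multiplexer therefore cycles through all m reconfiguration outputs exactly once.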
Consider the test cube shown in Figure 27.13. After adding the second stage of multiplexers and connecting
the inputs of the multiplexers for interleaving with the outputs of the multiplexers for reconfiguration,
as shown in Figure 27.14 (only the connections related to the first RIN multiplexer are shown for clarity),
the output of the first multiplexer for reconfiguration should match with x x x x 1 x, the same string as that
in the scan cell reorganization method. Note that the above reorganization and interleaving procedures yield
the same set of test cubes.
Detailed simulation results for benchmark circuits are presented in [83]. Here we discuss the influence
of the initial seed on the effectiveness of test set embedding. Experiments were carried out with 20
randomly selected initial seeds for the test set from [9] targeting all faults, with scan cell reorganization
and 32 scan chains. The statistics on the number of configurations are listed in Table 27.1(A). We also carried
out the same experiments for the test set from [9] targeting random-pattern-resistant faults and list the
results in Table 27.1(B). The results show that the number of configurations depends on the initial seed.
However, the dependency is not very significant, due in part to the reconfigurability of the interconnection
network.
In order to evaluate the effectiveness of the proposed approach for large circuits, we applied the method
to test sets for two production circuits from IBM, namely CKT1 and CKT2. CKT1 is a logic core consisting
of 51,082 gates and its test set provides 99.80% fault coverage. CKT2 is a logic core consisting of 94,340
gates and its test set provides 99.76% fault coverage. The number of scan chains is fixed to 64 and 128
for each of these two circuits. We modified the simulation procedure such that the configuration of the
interconnection network can be changed during the shifting in of a test cube, and we set the parameter
TABLE 27.1 Statistics on the Number of the Configurations with Random Seeds for Test Sets from [9] Targeting
(A) All Faults and (B) Random-Pattern-Resistant Faults, with Scan Chain Reorganization (Assuming 32 Scan
Chains for Each Circuit)

(A) All faults
Circuit   Minimum   Maximum   Mean     Standard deviation
s5378     6         9         7.2      0.83
s9234     29        33        30.5     1.10
s13207    10        14        11.85    1.18
s15850    16        20        17.5     1.24
s38417    180       192       185.9    3.23
s38584    9         12        9.8      0.89

(B) Random-pattern-resistant faults
Circuit   Minimum   Maximum   Mean     Standard deviation
s5378     3         5         3.55     0.60
s9234     32        36        33.7     1.38
s13207    5         8         5.95     0.89
s15850    19        25        21.8     1.51
s38417    118       129       121.45   3.07
s38584    9         12        10.8     0.83

Source: From L. Li and K. Chakrabarty. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems,
23, 1289-1305, 2004. With permission.
TABLE 27.2 Results for Test Cubes for Circuits from IBM

Circuit (test cubes / scan cells)   Scan    Scan chain     No. of    Testing time     Hardware overhead   Storage   Encoding     CPU
                                    chains  length (bits)  configs   (clock cycles)   (GEs, %)            (bits)    efficiency   time
CKT1 (17,176 / 12,256)              64      192            1,792     1,351,104        68,145.5 (8.52%)    21,504    46.79        1 h 37 min
CKT1 (17,176 / 12,256)              128     96             1,079     566,496          75,579.5 (9.45%)    12,948    77.71        1 h 26 min
CKT2 (43,079 / 22,216)              64      348            3,221     4,051,764        124,062.5 (7.26%)   38,652    26.03        6 h 35 min
CKT2 (43,079 / 22,216)              128     174            1,828     2,338,005        128,009.5 (7.49%)   21,936    45.87        6 h 06 min

Source: From L. Li and K. Chakrabarty. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems,
23, 1289-1305, 2004. With permission.
TABLE 27.3 The Number of Reconfigurations Per Pattern for Test Sets from IBM

                                                               No. of reconfigurations per pattern
Circuit   No. of scan chains   Test cube length per scan chain (bits)   Minimum   Maximum   Mean   Standard deviation
CKT1      64                   192                                      0         3         0.11   0.0210
CKT1      128                  96                                       0         3         0.07   0.0354
CKT2      64                   348                                      0         3         0.11   0.0204
CKT2      128                  174                                      0         15        0.06   0.3018

Source: From L. Li and K. Chakrabarty. IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, 23, 1289-1305, 2004. With permission.
MaxSkipPatterns to 0. Accordingly, in the proposed BIST architecture shown in Figure 27.9(a), the stored
control bits give the number of bits per configuration instead of the number of patterns per configuration,
and the pattern counter is replaced by a bit counter that counts the number of bits that have been
shifted into the scan chains. Table 27.2 lists the results for these two industrial circuits. The hardware
overhead is less than 10%, and very high encoding efficiency (up to 77.71) is achieved for both circuits.
As mentioned above, we allow the configuration of the interconnection network to be changed during
the shifting in of a test cube. Table 27.3, Figure 27.15, and Figure 27.16 present the statistics on the
number of reconfigurations per test cube. The number of intrapattern reconfigurations is small for both
circuits.
[Figure: percentage of patterns (up to 95%) versus the number of reconfigurations per pattern (0 to 3),
plotted for 64 and 128 scan chains.]
FIGURE 27.15 The number of patterns versus the number of reconfigurations needed for CKT1. (From L. Li and
K. Chakrabarty. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 23, 1289-1305, 2004.
With permission.)
[Figure: percentage of patterns versus the number of reconfigurations r per pattern, for 64 and 128 scan
chains; (a) 0 <= r <= 1 covers nearly all patterns, (b) 2 <= r <= 15 accounts for at most about 0.03% of
patterns.]
FIGURE 27.16 The number of patterns versus the number of reconfigurations r needed for CKT2. (a) 0 <= r <= 1,
(b) r >= 2. (From L. Li and K. Chakrabarty. IEEE Transactions on Computer-Aided Design of Integrated Circuits and
Systems, 23, 1289-1305, 2004. With permission.)
27.4 Conclusions
Rapid advances in test development techniques are needed to reduce the test cost of million-gate SOC
devices. This survey has described a number of state-of-the-art techniques for reducing test time and
test data volume, thereby decreasing test cost. Modular test techniques for digital, mixed-signal, and
hierarchical SOCs must develop further to keep pace with design complexity and integration density.
The test data bandwidth needs for analog cores are significantly different from those for digital cores;
therefore, unified top-level testing of mixed-signal SOCs remains a major challenge. Most SOCs today
include embedded cores that operate in multiple clock domains. Since the forthcoming P1500 standard
does not address wrapper design for at-speed testing of such cores, research is needed to develop wrapper
design techniques for multifrequency cores. There is also a pressing need for test planning methods that
can efficiently schedule tests for these multifrequency cores. The work reported in [41] is a promising
first step in this direction. In addition, compression techniques for embedded cores also need to be
developed and refined. Of particular interest are techniques that can combine TAM optimization and
test scheduling with test data compression. Some preliminary studies on this problem have been reported
recently [84,85].
We have also reviewed a new approach for deterministic BIST based on the use of a RIN. The RIN
is placed between the outputs of pseudorandom pattern generator, for example, an LFSR, and the scan
inputs of the CUT. It consists only of multiplexer switches and it is designed using a synthesis procedure
that takes as inputs the pseudorandom sequence from the LFSR and the deterministic test cubes for the
CUT. As a nonintrusive BIST solution, the proposed approach does not require any circuit redesign and
it has minimal impact on circuit performance.
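The mechanics behind such a network can be sketched in a few lines of code: each scan chain is driven through a multiplexer that selects one LFSR output, and the selection is changed (a reconfiguration) only when no single output can supply all the care bits of a test cube. The sketch below is purely illustrative; the LFSR, the cube encoding, and the function names are invented here and are not the synthesis procedure of [83].

```python
# Illustrative sketch (not the procedure of [83]): can a deterministic test cube
# be embedded in an LFSR sequence through a multiplexer that routes one LFSR
# output to a scan chain, without a mid-cube reconfiguration?

def lfsr_states(seed, taps, width, length):
    """Return `length` successive states of a Fibonacci LFSR as bit tuples."""
    state, states = seed, []
    for _ in range(length):
        states.append(tuple((state >> i) & 1 for i in range(width)))
        fb = 0
        for t in taps:                       # feedback = XOR of the tap bits
            fb ^= (state >> t) & 1
        state = ((state << 1) & ((1 << width) - 1)) | fb
    return states

def matching_output(states, cube):
    """Find an LFSR output whose bit stream matches every care bit of `cube`
    (a list with 0, 1, or None for don't-care). Returns None when no single
    output works, i.e., a reconfiguration within the cube would be needed."""
    for out in range(len(states[0])):
        if all(c is None or states[cyc][out] == c for cyc, c in enumerate(cube)):
            return out
    return None
```

Fully specified cubes are rarely embeddable through a fixed connection, which is exactly why the multiplexers must be reconfigured between (and occasionally within) patterns, as Figure 27.16 quantifies.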
Acknowledgments
This survey is based on joint work and papers published with several students and colleagues. In particular,
the author acknowledges Anshuman Chandra, Vikram Iyengar, Lei Li, Erik Jan Marinissen, Sule Ozev, and
Anuja Sehgal.
References
[1] M.L. Bushnell and V.D. Agrawal. Essentials of Electronic Testing. Kluwer Academic Publishers, Norwell, MA, 2000.
[2] Semiconductor Industry Association. International Technology Roadmap for Semiconductors, 2001 Edition. http://public.itrs.net/Files/2001ITRS/Home.htm
[3] A. Khoche and J. Rivoir. I/O bandwidth bottleneck for test: is it real? Test Resource Partitioning Workshop, 2002.
[4] G. Hetherington, T. Fryars, N. Tamarapalli, M. Kassab, A. Hassan, and J. Rajski. Logic BIST for large industrial designs: real issues and case studies. In Proceedings of the International Test Conference, pp. 358–367, 1999.
[5] Y. Zorian, E.J. Marinissen, and S. Dey. Testing embedded-core-based system chips. IEEE Computer, 32, 52–60, 1999.
[6] O. Farnsworth. IBM Corp., personal communication, April 2003.
[7] K. Chakrabarty. Test scheduling for core-based systems using mixed-integer linear programming. IEEE Transactions on Computer-Aided Design, 19, 1163–1174, 2000.
[8] M. Sugihara, H. Date, and H. Yasuura. A novel test methodology for core-based system LSIs and a testing time minimization problem. In Proceedings of the International Test Conference, pp. 465–472, 1998.
[9] S. Hellebrand, H.-G. Liang, and H.-J. Wunderlich. A mixed-mode BIST scheme based on reseeding of folding counters. In Proceedings of the International Test Conference, pp. 778–784, 2000.
[10] C.V. Krishna, A. Jas, and N.A. Touba. Test vector encoding using partial LFSR reseeding. In Proceedings of the International Test Conference, pp. 885–893, 2001.
[11] H.-G. Liang, S. Hellebrand, and H.-J. Wunderlich. Two-dimensional test data compression for scan-based deterministic BIST. In Proceedings of the International Test Conference, pp. 894–902, 2001.
[12] J. Rajski, J. Tyszer, and N. Zacharia. Test data decompression for multiple scan designs with boundary scan. IEEE Transactions on Computers, 47, 1188–1200, 1998.
[13] N.A. Touba and E.J. McCluskey. Altering a pseudo-random bit sequence for scan-based BIST. In Proceedings of the International Test Conference, pp. 167–175, 1996.
[14] S. Wang. Low hardware overhead scan based 3-weight weighted random BIST. In Proceedings of the International Test Conference, pp. 868–877, 2001.
[15] H.-J. Wunderlich and G. Kiefer. Bit-flipping BIST. In Proceedings of the International Conference on Computer-Aided Design, pp. 337–343, 1996.
[16] A.A. Al-Yamani and E.J. McCluskey. Built-in reseeding for serial BIST. In Proceedings of the VLSI Test Symposium, pp. 63–68, 2003.
[17] A.A. Al-Yamani and E.J. McCluskey. BIST reseeding with very few seeds. In Proceedings of the VLSI Test Symposium, pp. 69–74, 2003.
[18] S. Chiusano, P. Prinetto, and H.-J. Wunderlich. Non-intrusive BIST for systems-on-a-chip. In Proceedings of the International Test Conference, pp. 644–651, 2000.
[19] S. Hellebrand, S. Tarnick, J. Rajski, and B. Courtois. Generation of vector patterns through reseeding of multiple-polynomial linear feedback shift registers. In Proceedings of the International Test Conference, pp. 120–129, 1992.
[20] M.F. AlShaibi and C.R. Kime. Fixed-biased pseudorandom built-in self-test for random pattern resistant circuits. In Proceedings of the International Test Conference, pp. 929–938, 1994.
[21] M.F. AlShaibi and C.R. Kime. MFBIST: a BIST method for random pattern resistant circuits. In Proceedings of the International Test Conference, pp. 176–185, 1996.
[22] S. Pateras and J. Rajski. Cube-contained random patterns and their application to the complete testing of synthesized multi-level circuits. In Proceedings of the International Test Conference, pp. 473–482, 1991.
[23] N.A. Touba and E.J. McCluskey. Synthesis of mapping logic for generating transformed pseudo-random patterns for BIST. In Proceedings of the International Test Conference, pp. 674–682, 1995.
[24] N.A. Touba and E.J. McCluskey. Transformed pseudo-random patterns for BIST. In Proceedings of the VLSI Test Symposium, pp. 410–416, 1995.
[25] M. Bershteyn. Calculation of multiple sets of weights for weighted random testing. In Proceedings of the International Test Conference, pp. 1031–1040, 1993.
[26] F. Brglez, G. Gloster, and G. Kedem. Built-in self-test with weighted random pattern hardware. In Proceedings of the International Conference on Computer Design, pp. 161–166, 1990.
[27] F. Muradali, V.K. Agarwal, and B. Nadeau-Dostie. A new procedure for weighted random built-in self-test. In Proceedings of the International Test Conference, pp. 660–669, 1990.
[28] I. Pomeranz and S.M. Reddy. 3-weight pseudo-random test generation based on a deterministic test set for combinational and sequential circuits. IEEE Transactions on Computer-Aided Design, 12, 1050–1058, 1993.
[29] A. Jas, C.V. Krishna, and N.A. Touba. Hybrid BIST based on weighted pseudo-random testing: a new test resource partitioning scheme. In Proceedings of the VLSI Test Symposium, pp. 2–8, 2001.
[30] K. Chakrabarty. Optimal test access architectures for system-on-a-chip. ACM Transactions on Design Automation of Electronic Systems, 6, 26–49, 2001.
[31] V. Iyengar, K. Chakrabarty, and E.J. Marinissen. Test wrapper and test access mechanism co-optimization for system-on-chip. Journal of Electronic Testing: Theory and Applications, 18, 213–230, 2002.
[32] E.J. Marinissen, S.K. Goel, and M. Lousberg. Wrapper design for embedded core test. In Proceedings of the International Test Conference, pp. 911–920, 2000.
[33] V. Iyengar and K. Chakrabarty. System-on-a-chip test scheduling with precedence relationships, preemption, and power constraints. IEEE Transactions on Computer-Aided Design of ICs and Systems, 21, 1088–1094, 2002.
[34] E.J. Marinissen and H. Vranken. On the role of DfT in IC-ATE matching. In International Workshop on TRP, 2001.
[35] E. Volkerink et al. Test economics for multi-site test with modern cost reduction techniques. In Proceedings of the VLSI Test Symposium, pp. 411–416, 2002.
[36] M. Abramovici, M.A. Breuer, and A.D. Friedman. Digital Systems Testing and Testable Design. Computer Science Press, New York, 1990.
[37] P. Varma and S. Bhatia. A structured test re-use methodology for core-based system chips. In Proceedings of the International Test Conference, pp. 294–302, 1998.
[38] E.J. Marinissen et al. A structured and scalable mechanism for test access to embedded reusable cores. In Proceedings of the International Test Conference, pp. 284–293, 1998.
[39] T.J. Chakraborty, S. Bhawmik, and C.-H. Chiang. Test access methodology for system-on-chip testing. In Proceedings of the International Workshop on Testing Embedded Core-Based System-Chips, pp. 1.1-1–1.1-7, 2000.
[40] Q. Xu and N. Nicolici. On reducing wrapper boundary register cells in modular SOC testing. In Proceedings of the International Test Conference, pp. 622–631, 2003.
[41] Q. Xu and N. Nicolici. Wrapper design for testing IP cores with multiple clock domains. In Proceedings of the Design, Automation and Test in Europe (DATE) Conference, pp. 416–421, 2004.
[42] V. Immaneni and S. Raman. Direct access test scheme design of block and core cells for embedded ASICs. In Proceedings of the International Test Conference, pp. 488–492, 1990.
[43] P. Harrod. Testing re-usable IP: a case study. In Proceedings of the International Test Conference, pp. 493–498, 1999.
[44] I. Ghosh, S. Dey, and N.K. Jha. A fast and low cost testing technique for core-based system-on-chip. In Proceedings of the Design Automation Conference, pp. 542–547, 1998.
[45] K. Chakrabarty. A synthesis-for-transparency approach for hierarchical and system-on-a-chip test. IEEE Transactions on VLSI Systems, 11, 167–179, 2003.
[46] M. Nourani and C. Papachristou. An ILP formulation to optimize test access mechanism in system-on-chip testing. In Proceedings of the International Test Conference, pp. 902–910, 2000.
[47] L. Whetsel. An IEEE 1149.1 based test access architecture for ICs with embedded cores. In Proceedings of the International Test Conference, pp. 69–78, 1997.
[48] N.A. Touba and B. Pouya. Using partial isolation rings to test core-based designs. IEEE Design and Test of Computers, 14, 52–59, 1997.
[49] J. Aerts and E.J. Marinissen. Scan chain design for test time reduction in core-based ICs. In Proceedings of the International Test Conference, pp. 448–457, 1998.
[50] V. Iyengar, K. Chakrabarty, and E.J. Marinissen. Test access mechanism optimization, test scheduling and tester data volume reduction for system-on-chip. IEEE Transactions on Computers, 52, 1619–1632, 2003.
[51] V. Iyengar, K. Chakrabarty, and E.J. Marinissen. Recent advances in TAM optimization, test scheduling, and test resource management for modular testing of core-based SOCs. In Proceedings of the IEEE Asian Test Symposium, pp. 320–325, 2002.
[52] Z.S. Ebadi and A. Ivanov. Design of an optimal test access architecture using a genetic algorithm. In Proceedings of the Asian Test Symposium, pp. 205–210, 2001.
[53] V. Iyengar and K. Chakrabarty. Test bus sizing for system-on-a-chip. IEEE Transactions on Computers, 51, 449–459, 2002.
[54] Y. Huang et al. Resource allocation and test scheduling for concurrent test of core-based SOC design. In Proceedings of the Asian Test Symposium, pp. 265–270, 2001.
[55] Y. Huang et al. On concurrent test of core-based SOC design. Journal of Electronic Testing: Theory and Applications, 18, 401–414, 2002.
[56] V. Iyengar, K. Chakrabarty, and E.J. Marinissen. Efficient test access mechanism optimization for system-on-chip. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 22, 635–643, 2003.
[57] E.J. Marinissen and S.K. Goel. Analysis of test bandwidth utilization in test bus and TestRail architectures in SOCs. Digest of Papers of DDECS, pp. 52–60, 2002.
[58] P.T. Gonciari, B. Al-Hashimi, and N. Nicolici. Addressing useless test data in core-based system-on-a-chip test. IEEE Transactions on Computer-Aided Design of ICs and Systems, 22, 1568–1590, 2003.
[59] S.K. Goel and E.J. Marinissen. Effective and efficient test architecture design for SOCs. In Proceedings of the International Test Conference, pp. 529–538, 2002.
[60] W. Jiang and B. Vinnakota. Defect-oriented test scheduling. In Proceedings of the VLSI Test Symposium, pp. 433–438, 1999.
[61] E. Larsson, J. Pouget, and Z. Peng. Defect-aware SOC test scheduling. In Proceedings of the VLSI Test Symposium, pp. 359–364, 2004.
[62] F. Beenker, B. Bennetts, and L. Thijssen. Testability Concepts for Digital ICs: The Macro Test Approach. Frontiers in Electronic Testing, Vol. 3. Kluwer Academic Publishers, Boston, MA, 1995.
[63] E.J. Marinissen et al. On IEEE P1500's standard for embedded core test. Journal of Electronic Testing: Theory and Applications, 18, 365–383, 2002.
[64] Y. Zorian. A distributed BIST control scheme for complex VLSI devices. In Proceedings of the VLSI Test Symposium, pp. 6–11, 1993.
[65] R.M. Chou, K.K. Saluja, and V.D. Agrawal. Scheduling tests for VLSI systems under power constraints. IEEE Transactions on VLSI Systems, 5, 175–184, 1997.
[66] V. Muresan, X. Wang, and M. Vladutiu. A comparison of classical scheduling approaches in power-constrained block-test scheduling. In Proceedings of the International Test Conference, pp. 882–891, 2000.
[67] E. Larsson and Z. Peng. Test scheduling and scan-chain division under power constraint. In Proceedings of the Asian Test Symposium, pp. 259–264, 2001.
[68] E. Larsson and Z. Peng. An integrated system-on-chip test framework. In Proceedings of the DATE Conference, pp. 138–144, 2001.
[69] S. Koranne. On test scheduling for core-based SOCs. In Proceedings of the International Conference on VLSI Design, pp. 505–510, 2002.
[70] V. Iyengar, S.K. Goel, E.J. Marinissen, and K. Chakrabarty. Test resource optimization for multi-site testing of SOCs under ATE memory depth constraints. In Proceedings of the International Test Conference, pp. 1159–1168, 2002.
[71] S. Koranne and V. Iyengar. A novel representation of embedded core test schedules. In Proceedings of the International Test Conference, pp. 539–540, 2002.
[72] A. Sehgal, V. Iyengar, M.D. Krasniewski, and K. Chakrabarty. Test cost reduction for SOCs using virtual TAMs and Lagrange multipliers. In Proceedings of the IEEE/ACM Design Automation Conference, pp. 738–743, 2003.
[73] A. Sehgal, V. Iyengar, and K. Chakrabarty. SOC test planning using virtual test access architectures. IEEE Transactions on VLSI Systems, 12, 1263–1276, 2004.
[74] A. Sehgal and K. Chakrabarty. Efficient modular testing of SOCs using dual-speed TAM architectures. In Proceedings of the IEEE/ACM Design, Automation and Test in Europe (DATE) Conference, pp. 422–427, 2004.
[75] Agilent Technologies. Winning in the SOC market, available online at: http://cp.literature.agilent.com/litweb/pdf/5988-7344EN.pdf
[76] Teradyne Technologies. Tiger: advanced digital with silicon germanium technology. http://www.teradyne.com/tiger/digital.html
[77] T. Yamamoto, S.-I. Gotoh, T. Takahashi, K. Irie, K. Ohshima, and N. Mimura. A mixed-signal 0.18-µm CMOS SoC for DVD systems with 432-MSample/s PRML read channel and 16-Mb embedded DRAM. IEEE Journal of Solid-State Circuits, 36, 1785–1794, 2001.
[78] H. Kundert, K. Chang, D. Jefferies, G. Lamant, E. Malavasi, and F. Sendig. Design of mixed-signal systems-on-a-chip. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 19, 1561–1571, 2000.
[79] E. Liu, C. Wong, Q. Shami, S. Mohapatra, R. Landy, P. Sheldon, and G. Woodward. Complete mixed-signal building blocks for single-chip GSM baseband processing. In Proceedings of the IEEE Custom Integrated Circuits Conference, pp. 11–14, 1998.
[80] A. Cron. IEEE P1149.4 almost a standard. In Proceedings of the International Test Conference, pp. 174–182, 1997.
[81] S.K. Sunter. Cost/benefit analysis of the P1149.4 mixed-signal test bus. In IEE Proceedings Circuits, Devices and Systems, 143, 393–398, 1996.
[82] A. Sehgal, S. Ozev, and K. Chakrabarty. TAM optimization for mixed-signal SOCs using test wrappers for analog cores. In Proceedings of the IEEE International Conference on CAD, pp. 95–99, 2003.
[83] L. Li and K. Chakrabarty. Test set embedding for deterministic BIST using a reconfigurable interconnection network. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 23, 1289–1305, 2004.
[84] V. Iyengar, A. Chandra, S. Schweizer, and K. Chakrabarty. A unified approach for SOC testing using test data compression and TAM optimization. In Proceedings of the IEEE/ACM Design, Automation and Test in Europe (DATE) Conference, pp. 1188–1189, 2003.
[85] P.T. Gonciari and B. Al-Hashimi. A compression-driven test access mechanism design approach. In Proceedings of the European Test Symposium, pp. 100–105, 2004.
28
Embedded Software-Based Self-Testing for SoC Design

Kwang-Ting (Tim) Cheng
University of California at Santa Barbara
28.1 Introduction
28.2 Embedded Processor Self-Testing
    Stuck-At Fault Testing
28.3 Test Program Synthesis Using VCCs
28.4 Delay Testing
28.5 Embedded Processor Self-Diagnosis
28.6 Self-Testing of Buses and Global Interconnects
28.7 Self-Testing of Other Nonprogrammable IP Cores
28.8 Instruction-Level DfT/Test Instructions
28.9 Self-Test of On-Chip ADC/DAC and Analog Components Using DSP-Based Approaches
28.10 Conclusions
Acknowledgments
References
The increasing heterogeneity and programmability associated with system-on-chip (SoC) architecture, together with ever-increasing operating frequencies and technology changes, are demanding fundamental changes in integrated circuit (IC) testing. At-speed testing of high-speed circuits with external testers is becoming increasingly difficult owing to the growing gap between design and tester performance, the growing cost of high-performance testers, and the increasing yield loss caused by inherent tester inaccuracy. Therefore, empowering the chip to test itself seems like a sensible solution. Hardware-based self-testing techniques (known as built-in self-test, or BIST) have limitations owing to performance, area, and design time overhead, as well as problems caused by the application of nonfunctional patterns (which may result in higher power consumption during testing, over-testing, yield loss, etc.).
The embedded software-based self-testing technique has recently become the focus of intense research. One guiding principle of this embedded self-test paradigm is to utilize on-chip programmable resources (such as embedded microprocessors and digital signal processors, DSPs) for on-chip test generation, test delivery, signal acquisition, response analysis, and even diagnosis. After the programmable components
have been self-tested, they can be reused for testing on-chip buses, interfaces, and other nonprogrammable components. Embedded test techniques based on this principle reduce the need for dedicated test hardware and enable easier application and more accurate analysis of at-speed test signals on-chip. In this chapter, we survey this emerging embedded software-based self-testing paradigm and outline its roadmap.
28.1 Introduction
System-on-chip has become a widely accepted architecture for highly complex systems on a single chip.
Short time-to-market and rich functionality requirements have driven design houses to adopt the SoC design flow. A SoC contains a large number of complex, heterogeneous components that can include digital, analog, mixed-signal, radio frequency (RF), micromechanical, and other systems on a single piece of silicon. As the lines gradually fade between traditional digital, analog, RF, and mixed-signal devices, as operational frequencies rapidly increase, and as feature sizes shrink, testing faces a whole new set of challenges.
Figure 28.1 shows the cost of silicon manufacturing versus the cost of testing given in the SIA and ITRS roadmaps [1,2]. The top curve shows the reduction in fabrication capital per transistor (Moore's law). The bottom curve shows the test capital per transistor (Moore's law for test). From the ITRS roadmap it is clear that unless fundamental changes to test are made, it may in the future cost more to test a chip than to manufacture it [2]. Figure 28.1 also shows the historical trend in test paradigms. On the one hand, the high cost of manually developed functional tests, and the difficulty of translating embedded component tests to the chip boundary where the automatic test equipment (ATE) interface exists, are making such tests infeasible even for very high-volume products. On the other hand, even when automatically developed structural tests (such as scan tests) are available, applying them with ATEs poses challenges because tester performance is increasing at a slower rate than device speed. This translates into increasing yield loss owing to external testing, since guard-banding to cover tester errors results in the loss of more and more good chips. In addition, high-speed and high-pin-count testers are very costly.
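The guard-banding argument can be quantified with a small calculation: if chip delays are normally distributed and tester timing inaccuracy forces a guard-band of width eps below the specified cycle time, every good chip whose delay falls inside the band is discarded. All numbers below are invented for illustration only.

```python
# Illustration (invented numbers): yield loss caused by tester guard-banding.
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution, via the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def guardband_yield_loss(spec, mu, sigma, eps):
    """Fraction of truly good chips (delay <= spec) that are rejected because
    their delay falls inside the guard-band (spec - eps, spec]."""
    good = normal_cdf(spec, mu, sigma)          # chips that actually meet spec
    passed = normal_cdf(spec - eps, mu, sigma)  # chips that clear the guard-band
    return (good - passed) / good
```

With a 1.0 ns spec, a delay distribution of mean 0.9 ns and sigma 0.05 ns, widening the guard-band from 0.02 ns to 0.05 ns raises the loss of good chips from roughly 3% to roughly 14%, which is why tester inaccuracy translates directly into yield loss as device speeds outpace tester performance.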
Design-for-testability (DfT) and BIST have been regarded as possible solutions for changing the direction of the bottom curve in Figure 28.1. BIST solutions eliminate the need for high-speed testers and
[Figure 28.1: a log-scale plot of cost in cents per transistor from 1982 to 2012, based on 1997 SIA roadmap data and the 1999 ITRS roadmap. The upper curve tracks fab capital per transistor (Moore's law); the lower curve tracks test capital per transistor (Moore's law for test). A timeline of test paradigms is overlaid: functional testing (manual test generation), structural testing (scan, ATPG), built-in self-test (embedded hardware tester), and embedded software-based self-test (embedded software tester).]
FIGURE 28.1 Fab versus test capital.
show greater accuracy in their ability to apply and analyze at-speed test signals on-chip. Existing BIST techniques belong to the class of structural BIST. Structural BIST, such as scan-based BIST techniques [3–5], offers good test quality but requires the addition of dedicated test circuitry (such as full scan, linear-feedback shift registers [LFSRs] for pattern generation, multiple-input signature registers [MISRs] for data analysis, and test controllers). It therefore incurs nontrivial area, performance, and design time overhead. Moreover, structural BIST applies nonfunctional, high-switching random patterns and thus causes much higher power consumption than normal system operation. Also, to apply at-speed tests that detect timing-related faults, existing structural BIST needs to resolve various complex timing issues related to multiple clock domains, multiple frequencies, and test clock skews that are unique to the test mode.
A new embedded software-based self-testing paradigm [6–8] has the potential to alleviate the problems caused by using external testers as well as the structural BIST problems described earlier. In this testing strategy, it is assumed that programmable components of the SoC (such as processor, DSP, and FPGA components) are first self-tested by running an automatically synthesized test program that can achieve high fault coverage. Next, the programmable component is used as a pattern generator and response analyzer to test on-chip buses, interfaces between components, and other components, including digital, mixed-signal, and analog components. This self-test paradigm is sometimes referred to as functional self-testing.
The concept of embedded software-based self-testing is illustrated in Figure 28.2 using a bus-based
SoC. In this illustration, the IP cores in the SoC are connected to a standard bus via the virtual component
interface (VCI) [9]. The VCI acts as a standard communication interface between the IP core and the
on-chip bus. First, the microprocessor tests itself by executing a set of instructions. Next, the processor
can be used for testing the bus as well as other nonprogrammable IP cores in the SoC. In order to support
the self-testing methodology, the IP core is encased in a test wrapper. The test wrapper contains test
support logic needed to control shifting of the scan chain, buffers to store scan data and support at-speed
test, etc. In this example, the on-chip bus is a shared bus, and the arbiter controls access to the bus.
There are several advantages to the embedded software-based self-test approach. First, it allows reuse
of programmable resources on SoCs for test purposes. In other words, this strategy views testing as an
application of the programmable components in the SoC and thus minimizes the need for additional
dedicated test circuitry for self-test or DfT.
Second, in addition to eliminating the need for expensive high-speed testers, it can also reduce the yield
loss owing to tester inaccuracy. Self-testing offers the ability to apply and analyze at-speed test signals
on-chip with accuracy greater than that obtainable with a tester.
[Figure 28.2: a bus-based SoC with a CPU, a DSP, main and system memory, and scannable IP cores attached to a shared on-chip bus through VCI interfaces and bus-interface master/target wrappers; a bus arbiter controls access. Each IP-core wrapper contains bus-interface/VCI glue logic, a scan interface, a data buffer, and test support logic. An external tester loads the test program into main memory; the CPU applies tests and collects response signatures.]
FIGURE 28.2 Embedded software-based self-testing for SoC.
Third, while hardware-based self-test must be applied in the nonfunctional BIST mode, software-based self-test can be applied in the normal operational mode of the design; that is, the tests are applied by executing instruction sequences as in regular system operation. This eliminates the problems created by the application of nonfunctional patterns, which can result in excessive power consumption when hardware BIST is used.
Also, functional self-test can alleviate many of the over-testing and yield loss problems caused by the application of nonfunctional patterns during structural testing for delay faults and cross-talk faults (through at-speed scan or BIST). Experiments have shown that many structurally testable delay faults in microprocessors can never be sensitized in the functional mode of the circuit [7]. This is because no functionally applicable vector sequence can excite these delay faults and propagate the fault effects to destination outputs/flip-flops at-speed. Defects on these faults will not affect circuit performance, and testing them is not necessary. However, if the circuit is tested by applying nonfunctional patterns, these defects could be detected and the chip could be identified as faulty, resulting in yield loss.
Software-based fault localization tools are on the high-priority list according to the ITRS roadmap [2]. In addition to self-testing, functional information can also be used to guide diagnostic self-test program synthesis.
Testing of analog and mixed-signal circuits has been an expensive process because of limited access to the analog parts and the testers required to perform functional testing. The situation has worsened owing to the trend of integrating various digital, mixed-signal, and analog components into the SoC, with the result that testing the analog and mixed-signal parts has become the bottleneck of production testing. Most of these problems can be alleviated by self-testing on-chip ADC/DAC and analog components with DSP-based approaches that utilize on-chip programmable resources.
In the rest of the chapter, we present some representative methods on this subject. We start by discussing
processor self-test methods targeting stuck-at faults and delay faults. We also give a brief description of
a processor self-diagnosis method. Next, we continue with a discussion on methods for self-testing of
buses and global interconnects as well as other nonprogrammable IP cores on SoC. We also describe
instruction-level DfT methods based on insertion of test instructions to increase the fault coverage and
reduce the test application time and test program size. Finally, we summarize DSP-based self-test for
analog/mixed-signal components.
28.2 Embedded Processor Self-Testing
Embedded software-based self-test methods for processors [6–9] consist of two steps: the test preparation step and the self-testing step. The test preparation step involves generation of realizable tests for components of the processor. Realizable tests are those that can be delivered using instructions; therefore, to avoid producing undeliverable test patterns, the tests are generated under the constraints imposed by the processor instruction set. The tests can then be either stored or generated on-chip, depending on which method is more efficient for a particular case. A low-speed tester can be used to load the self-test signatures or the predetermined tests into the processor memory prior to the application of tests. Note that the inability to apply every conceivable input pattern to a microprocessor component does not necessarily translate into low fault coverage. If a fault can be detected only by test patterns outside the allowed input space, then by definition the fault is redundant in the normal operational mode of the processor. Thus, there is no need to test for this type of fault in production testing, even though we may still want to detect and locate it in the debugging and diagnosis phase.
The self-testing step, illustrated in Figure 28.3, involves the application of these tests using a software
tester. The software tester can also compress the responses into self-test signatures that can then be stored
in memory. The signatures can later be unloaded and analyzed by an external tester. Here, the assumption
is that the processor memory has already been tested with standard techniques such as memory BIST
before the application of the test, and so the memory is assumed to be fault-free.
[Figure 28.3: the CPU, connected over the processor bus to instruction and data memory loaded by an external tester, runs an on-chip test application program that applies test data for stimulus application, and a test response analysis program that compacts the test responses into a response signature.]
FIGURE 28.3 Embedded processor self-testing.
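A software tester's response compaction can mimic a hardware MISR: each response word is folded into a running signature that is finally compared against a precomputed golden value unloaded by the external tester. The sketch below is a minimal illustration; the word width, polynomial, and seed are arbitrary choices, not taken from the methods cited here.

```python
# Minimal sketch of software response compaction in an MISR-like style.
# Width, polynomial, and seed are arbitrary illustrative choices.

def misr_update(sig, response_word, width=32, poly=0x04C11DB7):
    """Fold one response word into the signature: shift the state left,
    apply the feedback polynomial when the top bit falls off, then XOR in
    the new response word (the 'multiple-input' part)."""
    mask = (1 << width) - 1
    msb = (sig >> (width - 1)) & 1
    sig = ((sig << 1) & mask) ^ (poly if msb else 0)
    return sig ^ (response_word & mask)

def compact(responses, seed=0xFFFFFFFF):
    """Compact a sequence of response words into a single signature."""
    sig = seed
    for r in responses:
        sig = misr_update(sig, r)
    return sig
```

Because the update is linear, a single-bit error injected into any response word leaves a nonzero state difference that cannot cancel out in later updates, so any one-bit response error is guaranteed to change the final signature; multi-bit errors can alias, as with any hardware MISR.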
In the following, we describe embedded software-based self-test methods for testing stuck-at faults [6,9] and path delay faults [7,8] in microprocessors.
28.2.1 Stuck-At Fault Testing
The method proposed by Chen and Dey [6] targets stuck-at faults in a processor core using a divide-and-
conquer approach. First, it determines the structural test needs for subcomponents in the processor (e.g.,
ALU, program counter) that are much less complex than the full processor, and hence more amenable to
random pattern testing. Next, the component tests are either stored or generated on-chip and then, at the
processor level, delivered to their target components using predetermined instruction sequences. To make
sure that the test patterns generated for a subcomponent under test can be delivered by instructions, the
test preparation step precedes the self-test step.
28.2.1.1 Test Preparation
To derive the realizable component tests (i.e., tests deliverable by instructions), the instruction-imposed constraints must first be derived for each component. These constraints can be divided into input and output constraints. The input constraints define the input space of the component allowed by instructions. They describe the correlation among the inputs to the component and can be expressed in the form of Boolean equations. The output constraints define the subset of component outputs observable by instructions. To obtain a close prediction of fault coverage in component-level fault simulation, errors propagating to component outputs that are unobservable at the processor level are regarded as unobserved.
Also, the constraints imposed by the processor instruction set can be divided into those that can be specified in a single time frame (spatial constraints) and those that span several time frames (temporal constraints). Temporal constraints are used to account for the loss of fault coverage owing to fault aliasing, in cases where the application of one test pattern involves multiple passes through a fault inside the component.
If component tests are generated by automatic test pattern generation (ATPG), the spatial constraints
can be specified during test generation with the aid of the ATPG tool. Alternatively, they can be specified
with virtual constraint circuits (VCCs) as proposed in [10] (details of this alternative will be described
in Section 28.3). Similarly, temporal constraints can be modeled with sequential VCCs. Unlike the case
of ATPG, if random tests are used for components, random patterns can be used only on independent
inputs. Component-level fault simulation is used for evaluating the preliminary fault coverage of these
tests. The final fault coverage can be evaluated with processor-level fault simulation once the entire
self-test program is constructed. Although component tests are generated only for the subset of components
that are easily accessible through instructions (e.g., ALU, program counter, etc.), other components such
as the instruction decoder are expected to be tested extensively during the application of the self-test
program.
28.2.1.2 Self-Test
After the realizable component tests have been derived, the next step is on-chip self-test using an embedded
software tester for the on-chip generation of component test patterns, the delivery of component tests,
and the analysis of their responses. Component tests can either be stored or be generated on-chip. If tests
are generated on-chip, the test needs of each component are characterized by a self-test signature, which
includes the seed, S, and the configuration, C, of a pseudo-random number generator as well as the
number of test patterns to be generated, N. The self-test signatures can be expanded on-chip into test sets
using a pseudo-random number-generation program. Multiple self-test signatures may be used for one
component if necessary. Thus, this self-test methodology will allow incorporation of any deterministic
BIST techniques that encode a deterministic test set as several pseudo-random test sets [11,12].
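To make the on-chip expansion concrete, the sketch below implements a self-test signature (S, C, N) as a software LFSR. The exact generator and the encoding of the configuration C used in [6] are not specified here, so the Fibonacci-LFSR form and the tap-list encoding are illustrative assumptions only:

```python
def expand_signature(seed, taps, count, width=32):
    """Expand a self-test signature (S=seed, C=taps, N=count) into
    `count` pseudo-random test patterns using a Fibonacci LFSR."""
    mask = (1 << width) - 1
    state = seed & mask
    patterns = []
    for _ in range(count):
        patterns.append(state)
        # XOR the tapped state bits to form the feedback bit
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = ((state << 1) | feedback) & mask
    return patterns
```

Because the expansion is deterministic, the same (S, C, N) triple always regenerates the same test set, which is what allows the signature to stand in for the stored patterns.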
Since the component tests are developed under the constraints imposed by the processor instruction
set, it will always be possible to find instructions for applying the component tests. On the output end,
special care must be taken when collecting component test response. Inasmuch as data outputs and status
outputs have different observability, they should be treated differently during response collection. In
general, although there are no instructions for storing the status outputs of a component directly to
memory, an image of the status outputs can be created in memory using conditional instructions. This
technique can be used to observe the status outputs of any component.
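Behaviorally, this conditional-instruction technique amounts to the following Python model of what the self-test program does with branch or conditional-move instructions; the memory layout is hypothetical:

```python
def image_status_outputs(status_bits, memory, base):
    """Create a memory image of status outputs that have no direct
    store instruction: for each flag, a conditional instruction writes
    1 if the flag is set and 0 otherwise (the ternary models, e.g.,
    'branch-if-clear over a store of #1')."""
    for i, bit in enumerate(status_bits):
        memory[base + i] = 1 if bit else 0
    return memory
```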
Using manually extracted constraints, the above scheme has been applied to a simple Parwan pro-
cessor [13]. The generated test program could achieve a high coverage for stuck-at faults in this simple
processor.
28.3 Test Program Synthesis Using VCCs
Tupuri et al. proposed an approach in Reference 10 for generating functional tests for processors by using
a gate-level sequential ATPG tool. It attempts to generate tests for all detectable stuck-at faults under the
functional constraints, and then applies these functional test vectors at the system's operational speed.
The key idea of this approach lies in the synthesized logic embodying the functional constraints, also
known as VCCs. After the functional constraints of an embedded module have been extracted, they are
described in hardware description language (HDL) and synthesized into logic gates. Then a commercial
ATPG is used to generate module-level vectors with such constraint circuitry imposed. These module-level
vectors are translated to processor-level functional vectors and fault simulated to verify the fault coverage.
Figure 28.4 illustrates this hierarchical test generation process using a gate-level test generator for sequential
circuits.
Chen et al. [9] performed module-level test generation for embedded processors using the concept
of VCCs, but with a different utilization such that the generated test vectors can be plugged directly into the
settable fields (e.g., operands, source, and destination registers) in test program templates. This utilization
simplifies the automated generation of test programs for embedded processors. Figure 28.5 shows the
overall test program synthesis process proposed in Reference 9, in which the final self-test program can
be synthesized automatically from (1) a simulatable HDL processor design at the RTL (register transfer
level), and (2) the instruction set architecture (ISA) specification of the embedded processor. The goal and
the process for each step are presented as follows:
Step 1. Partition the processor into a collection of combinational blocks, each a module-under-test (MUT);
the test program for each MUT will be synthesized separately.
Step 2. Systematically construct a comprehensive set of test program templates. Test program
templates can be classified into single-instruction templates and multi-instruction templates.
FIGURE 28.4 Use of VCCs for functional test generation. (From R.S. Tupuri and J.A. Abraham, in Proceedings of the
IEEE International Test Conference (ITC), September 1997. With permission.)
Single-instruction templates are built around one key instruction, whereas multi-instruction templates
include additional supporting instructions, for example, to trigger pipeline forwarding.
Exhausting all possibilities in generating test program templates would be impossible, but generating
a wide variety of templates is necessary in order to achieve high fault coverage.
Step 3. Rank templates based on a controllability-/observability-based testability metric through simu-
lation. Templates at the top of the list T_m have high controllability (meaning it is easy to set specific
values at the inputs of the MUT) and/or high observability (meaning it is easy to propagate the
values at the output of the MUT to data registers or to observation points, which can be mapped
onto and stored in the memory).
Step 4. Derive the input mapping functions for each template t from the program template's settable
fields (which include operands, source registers, and destination registers) to the inputs of the MUT.
Also derive the output mapping functions from the MUT's outputs to the system's observation
points.
The input mapping functions can be derived by simulating a number of instances of template t
to obtain traces, followed by regression analysis to construct the mapping function between settable
fields and inputs of the MUT.
The output mapping functions can be derived by injecting the unknown X value at the outputs
of the MUT for simulation, followed by observing the propagation of the X values to the specified
template's destinations.
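The regression of Step 4 can be sketched as follows for a single settable field and a single MUT input, assuming (purely for illustration) that the mapping is linear and that `simulate_template` stands in for instruction-level simulation of one template instance:

```python
import random

def derive_input_mapping(simulate_template, n_samples=64, field_range=256):
    """Approximate the regression step of Step 4: run template instances
    with random values in one settable field, record the value observed
    at the MUT input, and fit input = a*field + b by least squares."""
    fields = random.sample(range(field_range), n_samples)
    inputs = [simulate_template(f) for f in fields]
    n = len(fields)
    mean_f = sum(fields) / n
    mean_i = sum(inputs) / n
    cov = sum((f - mean_f) * (i - mean_i) for f, i in zip(fields, inputs))
    var = sum((f - mean_f) ** 2 for f in fields)
    a = cov / var
    b = mean_i - a * mean_f
    return a, b
```

In practice each bit of each MUT input would get its own (often Boolean rather than linear) mapping, but the sample-then-fit structure is the same.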
Step 5. Synthesize the mapping functions into VCCs. The utilization of VCCs not only enforces the
instruction-imposed constraints, but also facilitates the translation from module-level test patterns
to instruction-level test programs. First, implement the mapping functions between settable fields
in template t and inputs of MUT m as the input-side VCC, and insert it into MUT m. Similarly,
insert the output-side VCC that embodies the output mapping functions.
FIGURE 28.5 Overview of the scalable software-based self-test methodology. (From L. Chen, S. Ravi,
A. Raghunathan, and S. Dey, in Proceedings of the ACM/IEEE Design Automation Conference (DAC), June 2003.
With permission.)
FIGURE 28.6 Constrained test generation using VCCs. (From L. Chen, S. Ravi, A. Raghunathan, and S. Dey,
in Proceedings of the ACM/IEEE Design Automation Conference (DAC), June 2003. With permission.)
FIGURE 28.7 Example of test program synthesis. (From L. Chen, S. Ravi, A. Raghunathan, and S. Dey, in Proceedings
of the ACM/IEEE Design Automation Conference (DAC), June 2003. With permission.)
Step 6. Generate module-level tests for the composite circuit of the MUT between the input/output virtual
constraint components. During constrained test generation, the test generator sees the circuit
including MUT m and the two VCCs, as shown in Figure 28.6. Note that faults within the VCCs
will be eliminated from the fault list and so will not be considered for test generation. With this
composite model, the pattern generator can generate patterns with values directly specified at the
settable fields in instruction template t.
Step 7. Synthesize the target test program for the patterns generated in Step 6. Note that the generated
test patterns of Step 6 assign values to some of the settable fields of each instruction template t. The
other settable fields without a value assignment in Step 6 are filled with random values. The test
program is then synthesized by converting the values of each settable field into its corresponding
position in instruction template t. Figure 28.7 gives an example of the flow for synthesizing the
target program.
Step 8. Perform fault simulation on the synthesized test program segment to identify the subset of
stuck-at faults detected by the program segment.
Step 9. Update the set of undetected faults and rerank the remaining templates in template list T_m to
prepare for the next iteration of test program generation.
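Steps 3 and 6 to 9 together form a greedy coverage loop, which might be sketched as follows; pattern generation and fault simulation are abstracted into a precomputed `faults_detected_by` map, which is a simplification of the real flow:

```python
def synthesize_test_programs(templates, faults_detected_by, all_faults):
    """Greedy sketch of the per-MUT loop: repeatedly pick the template
    that covers the most still-undetected faults (the 'rerank' step),
    emit its program, and shrink the undetected-fault set."""
    undetected = set(all_faults)
    programs = []
    remaining = list(templates)
    while undetected and remaining:
        # Rerank: score each template by the faults it would newly detect
        best = max(remaining,
                   key=lambda t: len(faults_detected_by[t] & undetected))
        gain = faults_detected_by[best] & undetected
        if not gain:
            break  # no remaining template improves coverage
        programs.append(best)
        undetected -= gain
        remaining.remove(best)
    return programs, undetected
```

The loop terminates either when all faults are covered or when no remaining template detects any new fault, mirroring the "acceptable fault coverage?" exit test in Figure 28.5.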
In Reference 14, the above process is further extended to synthesize test programs for detecting cross-talk
faults. Unlike stuck-at faults, signal integrity problems such as cross-talk need to be tested by
applying a sequence of vectors at operational speed. The requirements for generating multiple specific
vectors, while observing instruction-imposed constraints at the same time, pose challenges in test program
synthesis. The semiautomated test program generation framework proposed in Reference 14 combines
multiple instruction-level constraints (multiple VCCs) with a structural ATPG algorithm to select the
instruction sequences and their corresponding operand values for detecting cross-talk faults. Preliminary
results were demonstrated for an industrial processor, Xtensa, from Tensilica Inc.
28.4 Delay Testing
Ensuring that designs meet performance specifications requires the application of delay tests. These
tests should be applied at-speed and contain two-vector patterns, applied to the combinational portion
of the circuit under test, to activate and propagate the fault effects to registers or other observation
points [33]. A software-based self-test method targeting delay faults in processor cores has been proposed
by Lai et al. [7,8]. As in the case of stuck-at faults, not all delay faults in the microprocessor can be tested
in the functional mode, simply because no instruction sequence can produce the desired test
sequence that sensitizes the path and captures the fault effect into the destination output/flip-flop at-speed.
A fault is said to be functionally testable if there exists a functional test for that fault. Otherwise, the fault
is functionally untestable.
To illustrate functionally untestable faults, consider part of a simple processor's datapath as shown in
Figure 28.8. It contains an 8-bit ALU, an accumulator (AC), and an instruction register (IR). The data
inputs of the ALU, A7-A0, and B7-B0, are connected to the internal data bus and the AC, respectively.
The control inputs of the ALU are S2-S0. The values in S2-S0 instruct the ALU to perform the desired
arithmetic/logic operation. The outputs of the ALU are connected to the inputs of AC and the inputs
of IR. It can be shown that for all possible instruction sequences, whenever a rising transition occurs
on signal S1 at the beginning of a clock cycle, AC and IR can never be enabled at the end of the same
cycle. Therefore, paths that start at S1 and end at the inputs of IR or AC are functionally untestable,
since delay effects on them can never be captured by IR or AC immediately after the vector pair has been
applied. The goal of the test preparation step is to identify functionally testable faults and synthesize tests
for them.
The flow of test program synthesis for the self-test of path delay faults in a microprocessor using its
instructions consists of four major steps:
1. Given the ISA and the micro-architecture of the processor core, the spatial and temporal constraints,
between and at the registers and control signals, are first extracted.
2. A path classication algorithm, extended from [15,16], implicitly enumerates and examines all
paths and path segments with the extracted constraints imposed. If a path cannot be sensitized
with the imposed extracted constraints, the path is functionally untestable and thus, is eliminated
FIGURE 28.8 Datapath example. (From W.-C. Lai, A. Krstic, and K.-T. Cheng, in Proceedings of the IEEE VLSI Test
Symposium (VTS), April 2000. With permission.)
from the fault universe. This helps reduce the computational effort of the subsequent test generation
process. The preliminary experimental results shown in Reference 7 indicate that a nontrivial percentage
of the paths in simple processors (such as the Parwan processor [13] and the DLX processor [17])
are functionally untestable but structurally testable.
3. A subset of long paths among the functionally testable paths is selected as targets for test generation.
A gate-level ATPG for path delay faults is extended to incorporate the extracted constraints
into the test generation process, where it is used to generate test vectors for each target path delay
fault. If the test is successfully generated, it not only sensitizes the path but also meets the extracted
constraints. Therefore, it is most likely to be deliverable by instructions (if the complete set of
constraints has been extracted, the delivery by instructions could be guaranteed).
4. In the test program synthesis process that follows, the test vectors specifying the bit values at internal
flip-flops are first mapped back to word-level values in registers and values at control signals.
These mapped value requirements are then justified at the instruction level. Finally, a predefined
propagating routine is used to propagate the fault effects captured in the registers/flip-flops of the
path delay fault to the memory. This routine compresses the contents of some or all registers in the
processor, generates a signature, and stores it in memory. The procedure is repeated until all target
faults have been processed. The test program, which is generated offline, will be used to test the
microprocessor at-speed.
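The register-compression step of the propagating routine could look like the following software MISR sketch; the word width and feedback polynomial here are arbitrary choices for illustration, not those used in [7,8]:

```python
def compress_registers(regs, width=16, poly=0x1021):
    """Fold a list of register values into one signature word, MISR-style:
    shift the running signature, XOR in the next register value, and
    apply the feedback polynomial whenever a bit falls off the top."""
    mask = (1 << width) - 1
    sig = 0
    for r in regs:
        sig = ((sig << 1) ^ (r & mask)) & ((1 << (width + 1)) - 1)
        if sig >> width:        # overflow bit set => apply feedback taps
            sig = (sig ^ poly) & mask
        else:
            sig &= mask
    return sig
```

Any single-bit corruption of a register value changes the signature, so comparing the stored signature against a golden value reveals whether a fault effect was captured.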
This test synthesis flow has been applied to the Parwan [13] and DLX [17] processors. On average,
5.3 and 5.9 instructions, respectively, were needed to deliver a test vector, and the achieved fault coverage
for testable path delay faults was 99.8% for Parwan and 96.3% for DLX.
28.5 Embedded Processor Self-Diagnosis
In addition to enabling at-speed self-test with low-cost testers, software-based self-test eliminates the use
of scan chains and the associated test overhead, making it an attractive solution for testing high-end
microprocessors. The elimination of scan chains, on the other hand, poses a significant challenge for fault
diagnosis. Though deterministic methods for generating diagnostic tests are available for combinational
circuits [18], sequential circuits are much too complex to be handled by the same approach. Consequently,
there have been several proposals on generating diagnostic tests for sequential circuits by modifying
existing detection tests [19,20]. A prerequisite for these methods is a high-coverage detection test set
for the sequential circuit under test. Thus, the success of these methods depends on the success of the
sequential test generation techniques.
Though current sequential ATPG techniques are not practical enough for handling large sequential
circuits, software-based self-test methods have the ability to successfully generate tests for a particular
type of sequential circuit: microprocessors. If properly modified, these tests might achieve
a high diagnostic capability. In addition, functional information (ISA and micro-architecture) can be
used to guide and facilitate diagnosis.
An initial investigation of the diagnostic potential of software-based self-test was reported in Reference 21,
which attempted to generate test programs geared toward diagnosis. Diagnosis is performed by analyz-
ing the combination of test responses to a large number of small diagnostic test programs. To achieve
high diagnostic resolution, the diagnostic test programs are generated in such a way that each test
program detects as few faults as possible, while the union of all test programs detects as many faults as
possible.
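The diagnostic resolution of such a program set can be quantified by grouping faults that produce identical pass/fail signatures across all programs; the sketch below (with hypothetical program and fault names in the test) is one way to perform that analysis:

```python
def diagnostic_classes(detects):
    """Group faults that are indistinguishable by the test-program set.
    `detects[p]` is the set of faults that program p detects; two faults
    with the same pass/fail signature across all programs fall into the
    same equivalence class (they cannot be told apart)."""
    programs = sorted(detects)
    faults = set().union(*detects.values())
    classes = {}
    for f in faults:
        signature = tuple(f in detects[p] for p in programs)
        classes.setdefault(signature, set()).add(f)
    return list(classes.values())
```

Smaller classes mean finer resolution; the ideal of one fault per class is exactly what generating many small, minimally overlapping test programs tries to approach.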
28.6 Self-Testing of Buses and Global Interconnects
In SoC designs, a large amount of core-to-core communication must be realized with long interconnects.
As we find ways to decrease gate delay, the performance of interconnect is becoming increasingly important
for achieving high overall performance [2]. However, owing to the increase of cross-coupling capacitance
and mutual inductance, signals on neighboring wires may interfere with each other, causing excessive delay
or loss of signal integrity. While many techniques have been proposed to reduce cross-talk, owing to the
limited design margin and unpredictable process variations, cross-talk must also be addressed in
manufacturing testing.
Owing to the nature of its timing, testing for cross-talk effects should be conducted at the operational
speed of the circuit under test. However, at-speed testing of GHz systems requires prohibitively expensive
high-speed testers. Moreover, with external testing, hardware access mechanisms are required for applying
tests to interconnects deeply embedded in the system. This may lead to unacceptable costs in area or
performance overhead.
A BIST technique in which a SoC tests its own interconnects for cross-talk defects using on-chip hard-
ware pattern generators and error detectors has been proposed in Reference 22. Although the amount of
area overhead may be amortized for large systems, for small systems, the amount of relative area overhead
may be unacceptable. Moreover, hardware-based self-test approaches, such as the one in Reference 22,
may cause over-testing and yield loss, as not all test patterns generated in the test mode are valid in the
normal operational mode of the system.
The problem of testing system-level interconnects in embedded processor-based SoCs has been
addressed in References 23 and 24. In such SoCs, most of the system-level interconnects, such as the
on-chip buses, are accessible to the embedded processor core(s). The proposed methodology, being
software-based, enables an embedded processor core in the SoC to test for cross-talk effects in these inter-
connects by executing a software program. The strategy is to let the processor execute a self-test program
with which the test vector pairs can be applied to the appropriate bus in the normal functional mode of
the system. In the presence of cross-talk-induced glitch or delay effects, the second vector in the vector
pair becomes distorted at the receiver end of the bus. The processor, however, can store this error effect to
the memory as a test response, which can be later unloaded by an external tester for off-chip analysis.
The maximum aggressor (MA) fault model proposed in Reference 25 is suitable for modeling cross-talk
defects on interconnects. It abstracts the cross-talk defects on global interconnects by a linear number
of faults. It defines faults based on the resulting cross-talk error effects, including positive glitch (g_p),
negative glitch (g_n), rising delay (d_r), and falling delay (d_f). For a set of N interconnects, the MA fault
model considers the collective aggressor effects on a given victim line Y_i, while all other N - 1 wires act as
aggressors. The required transitions on the aggressor/victim lines to excite the four error types are shown
in Figure 28.9. For example, the test for a positive glitch (g_p) at victim line Y_i, as shown in the first column
of Figure 28.9, requires that line Y_i holds a constant 0 value while the other N - 1 aggressor lines have a
rising transition. Under this pattern, the victim line Y_i would have a positive glitch owing to the cross-talk
effect. If excessive, the glitch would result in errors. These patterns, collectively called MA tests, excite the
worst-case cross-talk effects on the victim line Y_i. For a set of N interconnects, there are 4N MA faults,
requiring 4N MA tests. It has been shown in Reference 25 that these 4N faults cover all cross-talk defects
on any of the N interconnects.
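The 4N MA vector pairs follow mechanically from this definition; a sketch (abstracting each vector to a list of per-line logic values) might read:

```python
def ma_tests(n):
    """Enumerate the 4N maximal-aggressor vector pairs (v1, v2) for an
    n-line bus, following Figure 28.9: for each victim line i,
      gp: victim stays 0, aggressors rise (0 -> 1)
      gn: victim stays 1, aggressors fall (1 -> 0)
      dr: victim rises,  aggressors fall
      df: victim falls,  aggressors rise
    Vectors are bit-lists indexed by line number."""
    victim_bits = {"gp": (0, 0), "gn": (1, 1), "dr": (0, 1), "df": (1, 0)}
    aggressor_bits = {"gp": (0, 1), "gn": (1, 0), "dr": (1, 0), "df": (0, 1)}
    tests = []
    for i in range(n):
        for fault in ("gp", "gn", "dr", "df"):
            a1, a2 = aggressor_bits[fault]
            v1 = [a1] * n
            v2 = [a2] * n
            v1[i], v2[i] = victim_bits[fault]
            tests.append((i, fault, v1, v2))
    return tests
```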
In a core-based SoC, the address, data, and control buses are the main types of global interconnects
with which the embedded processors communicate with memory and other cores of the SoC via
FIGURE 28.9 Maximal aggressor tests for victim Y_i. (From M. Cuviello, S. Dey, X. Bai, and Y. Zhao, in Proceedings
of the IEEE International Conference on Computer-Aided Design (ICCAD), November 1999. With permission.)
FIGURE 28.10 Testing the address bus. (From X. Bai, S. Dey, and J. Rajski, in Proceedings of the ACM/IEEE Design
Automation Conference (DAC), June 2000. With permission.)
memory-mapped I/O. Li et al. [23] concentrate on testing the data and address buses in a processor-based
SoC. The cross-talk effects on the interconnects are modeled using the MA fault model:
Testing data bus. For a bidirectional bus such as a data bus, cross-talk effects vary as the bus is driven
from different directions. Thus cross-talk tests should be conducted in both directions [22]. However, to
apply a pair of vectors (v1, v2) in a particular bus direction, the direction of v1 is irrelevant, as long as
the logic value at the bus is held at v1. Only v2 needs to be applied in the specied bus direction. This is
because the signal transition triggering the cross-talk effect takes place only when v2 is being applied to
the bus.
To apply a test vector pair (v1, v2) for the data bus from a SoC core to the CPU, the CPU first exchanges
data v1 with the core. The direction of data exchange is irrelevant, for example, if the core is a memory,
the CPU may either read v1 from the memory or write v1 to the memory. The CPU then requests data
v2 from the core (a memory-read if the core is memory). Upon the arrival of v2, the CPU writes v2 to
memory for later analysis.
To apply a test vector pair (v1, v2) to the data bus from the CPU to a SoC core, the CPU first exchanges
data v1 with the core. Then, the CPU sends data v2 to the core (a memory-write if the core is memory).
If the core is memory, v2 can be directly stored to an appropriate address for later analysis. Otherwise, the
CPU must execute additional instructions to retrieve v2 from the core and store it to memory.
Testing address bus. To apply a test vector pair (v1, v2) to the address bus, which is a unidirectional bus
from the CPU to a SoC core, the CPU first requests data from two addresses (v1 and v2) in consecutive
cycles. In the case of a nonmemory core, since the CPU addresses the core via memory-mapped I/O,
v2 must be the address corresponding to the core. If v2 is distorted by cross-talk, the CPU would
be receiving data from a wrong address, v2'. As long as different data are stored at v2 and v2',
the CPU is able to observe the error and store it in memory for analysis. Figure 28.10 illustrates this
process. For example, in the case where the CPU is communicating with a memory core, to apply test
(0001, 1110) on the address bus from the CPU to the memory core, the CPU first reads data from address
0001 and then from address 1110. In a system with a faulty address bus, the latter address may become
1111. If different data are stored at addresses 1110 and 1111 (mem[1110] = 0100, mem[1111] = 1001),
the CPU would receive a faulty value from memory (1001 instead of 0100). This error response can later
be stored in memory for analysis.
The feasibility of this method has been demonstrated by applying it to test the interconnects of
a processor-memory system. The defect coverage was evaluated using a system-level cross-talk-defect
simulation method.
Functionally Maximal Aggressor (FMA) tests. Even though the MA tests have been proven to cover all
physical defects related to cross-talk between interconnects, Lai et al. [24] observe that many of them
can never occur during normal system operation owing to constraints imposed by the system. Therefore,
testing buses using MA tests might screen out chips that are functionally correct under any pattern
produced under normal system operation. Instead, functionally maximal aggressor (FMA) tests that meet
the system constraints and can be delivered under the functional mode are proposed in [24]. These tests
provide a complete coverage of all cross-talk-induced logical and delay faults that can cause errors during
the functional mode.
Given the timing diagrams of all bus operations, the spatial and temporal constraints imposed on the
buses can be extracted and FMA tests can be generated. A covering relationship between vectors extracted
from the timing diagrams of the bus commands is used during the FMA test generation process. Since the
resulting FMA tests are highly regular, they can be generated in an algorithmic way. Therefore, the FMA
tests are clustered and fit into a few groups. The tests in each group are highly similar except that the
victim lines are different. Therefore, as with a marching sequence (which is commonly used for testing
memory), the tests in each group can be synthesized by a software routine. The synthesized test program
is highly modularized and very small. Experimental results have shown that a test program as small as
3000 to 5000 bytes can detect all cross-talk defects on the bus from the processor core to the target core.
Next, the synthesized test program is applied to the bus from the processor core, and the input buffers
of the destination core capture the responses at the other end of the bus. Such responses should be read
back by the processor core to determine whether any faults occurred on the bus. However, because the
input buffers of a nonmemory core cannot be read by the processor core, a DfT scheme is suggested to
allow direct observability of the input buffers by the processor core. The DfT circuitry consists of bypass
logic added to each I/O core to improve its testability.
With the DfT support on the target I/O core, the test generation procedure first synthesizes instructions
to set the target core to the bypass mode, and then it continues with synthesizing instructions for the FMA
tests. The test generation procedure does not depend on the functionality of the target core.
28.7 Self-Testing of Other Nonprogrammable IP Cores
Testing nonprogrammable cores on a SoC is a complex problem with many unresolved issues [26].
Industry initiatives such as the IEEE P1500 Working Group [27] provide some solutions for IP core
testing. However, they do not address the requirements of at-speed testing.
A self-testing approach for nonprogrammable cores on a SoC has been proposed in Reference 26.
In this approach, a test program running on the embedded processor delivers test patterns to other IP
cores in the SoC at-speed. The test patterns can be generated on the processor itself or fetched from an
external ATE and stored in on-chip memory. This alleviates the need for dedicated test circuitry for pattern
generation and response analysis. The approach is scalable to large-size IP cores whose structural netlists
are available. Since the pattern delivery is done at the SoC operational speed, it supports delay test. A test
wrapper (shown in Figure 28.11) is placed around each core to support pattern delivery. It contains test
support logic needed to control shifting of the scan chain, buffers to store scan data, buffers to support
at-speed test, etc.
The test flow based on the embedded software self-testing methodology is illustrated in Figure 28.11.
It offers tremendous flexibility in the type of tests that can be applied to the IP cores as well as in the
quality of the test pattern set, without entailing significant hardware overhead. Again, the flow is divided
into a preprocessing phase and a testing phase.
In the preprocessing phase, a test wrapper is automatically inserted around the IP core under test.
The test wrapper is configured to meet the specific testing needs of the IP core. The IP core is then
fault-simulated with different sets of patterns. Weighted random patterns generated with multiple weight
sets are used in Reference 26. In Reference 5, multiple capture cycles are used after each scan sequence. Next, a high-level
test program is generated. This program synchronizes the software pattern generation, the start of the test,
the application of the test, and the analysis of the test response. The program can also synchronize testing multiple
cores in parallel. The test program is then compiled to generate processor-specific binary code.
Embedded Software-Based Self-Testing for SoC Design 28-15
FIGURE 28.11 The test flow for testing nonprogrammable IP cores. (From J.-R. Huang, M.K. Iyer, and K.-T. Cheng,
in Proceedings of the IEEE VLSI Test Symposium (VTS), April 2001. With permission.)
In the test phase, the test program is run on the processor core to test various IP cores. A test packet
is sent to the IP core test wrapper informing it about the test application scheme (single- or multiple-
capture cycle). Data packets are then sent to load the scan buffers and the PI/PO buffers. The test wrapper
applies the required number of scan shifts and captures the test response for the programmed number of
functional cycles. The results of the test are stored in the PI/PO buffers and the scan buffers; from there
they are read out by the processor core.
28.8 Instruction-Level DfT/Test Instructions
Several potential benefits can accrue from self-testing manufacturing defects in a SoC by running test
programs using a programmable core. These include at-speed testing, low DfT overhead (owing to elimination
of dedicated test circuitry), and better power and thermal management during testing. However,
such a self-test strategy might require a lengthy test program and still might not achieve sufficiently high
fault coverage. These problems can be alleviated by applying a DfT methodology based on adding test
instructions to an on-chip programmable core such as a microprocessor core. This methodology is called
instruction-level DfT.
Instruction-level DfT inserts test circuitry in the form of test instructions. It should be less intrusive
than gate-level DfT techniques, which attempt to create a separate test mode somewhat
orthogonal to the functional mode. If the test instructions are carefully designed such that their micro-instructions
reuse the datapath of the functional instructions and do not require any new datapath, then
the overhead, which occurs only in the controller, should be relatively low. This methodology is also more
attractive for applying at-speed tests and for power/thermal management during test, as compared with
existing logic BIST approaches.
Instruction-level DfT methods have been proposed in References 28 and 29. The approach in
Reference 28 adds instructions to control exceptions such as microprocessor interrupts and reset.
With the new instructions, the test program can achieve a fault coverage close to 90% for stuck-at faults.
However, this approach cannot achieve higher coverage because the test program is synthesized using
a random approach and cannot effectively control or observe some internal registers that have low
testability.
The DfT methodology proposed in Reference 29 systematically adds test instructions to an on-chip
processor core to improve the self-testability of the processor core, reduce the size of the self-test program,
and reduce its runtime (i.e., reduce the test application time). To decide which instructions to add, the
testability of the processor is analyzed first. If a register in the processor is identified as hard to access, a test
instruction allowing direct access to the register is added. The testability of a register can be determined
based on the availability of data movement instructions between registers and memory. A register is said to
be fully controllable if there exists a sequence of data movement instructions that can move the desired
data from memory to the register. Similarly, a register is said to be fully observable if there exists a sequence
of data movement instructions to propagate the register data to memory. Given the micro-architecture of
a processor core, it is possible to identify the fully controllable and fully observable registers. For registers
that are not fully controllable/observable, new instructions can be added to improve their accessibility.
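One way to mechanize this testability analysis is to model the data movement instructions as a directed graph over registers and memory, and compute reachability from and to memory. The sketch below is a minimal illustration under that assumption; the register names and the move list are invented, not drawn from the Parwan or DLX instruction sets.

```python
# Classify registers as fully controllable (reachable from memory via data
# movement instructions) and fully observable (memory reachable from them).
from collections import deque

def reachable(edges, start):
    """BFS over a directed graph given as (src, dst) edge pairs."""
    adj = {}
    for s, d in edges:
        adj.setdefault(s, set()).add(d)
    seen = {start}
    q = deque([start])
    while q:
        n = q.popleft()
        for m in adj.get(n, ()):
            if m not in seen:
                seen.add(m)
                q.append(m)
    return seen

def classify(registers, moves):
    """Return {register: (controllable, observable)} relative to 'MEM'."""
    controllable = reachable(moves, 'MEM')
    # Observability is reachability to memory, i.e., from memory in the
    # reversed graph.
    reversed_moves = [(d, s) for s, d in moves]
    observable = reachable(reversed_moves, 'MEM')
    return {r: (r in controllable, r in observable) for r in registers}
```

For instance, with moves `MEM→R1`, `R1→R2`, `R2→MEM`, and `R3→R1`, registers R1 and R2 come out fully controllable and observable, while R3 is observable but not controllable, so a new instruction writing R3 directly would be the candidate addition.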
In addition, test instructions can also be added to optimize the test program size and runtime. This
is based on the observation that in the synthesized self-test program some code segments (called hot
segments) appear repeatedly. Therefore, the addition of a few test instructions can reduce the size of hot
segments. Test instructions can also be added to speed up the preparation of test vectors by the
processor core, the retrieval of responses from the on-chip core under test, and the analysis of the responses
(by the processor core).
When adding new instructions, the existing hardware should be reused as much as possible to reduce
the area overhead. Adding extra buses or registers to implement new instructions should be avoided;
in most cases, a new instruction can be added by introducing new control signals to the
datapath rather than by adding hardware.
Adding test instructions to the programmable core does not improve the testability of other
nonprogrammable cores on the SoC. Therefore, instruction-level DfT cannot increase the fault coverage
of the nonprogrammable cores. However, the programs for testing the nonprogrammable cores can be
optimized by adding new instructions. In other words, the same set of test instructions added for self-testing
the programmable core can be used to reduce the size and runtime of the test programs for
testing other nonprogrammable cores. For pipelined designs, instructions can be added to manage the
difficult-to-control registers buried deep in the pipeline.
Experimental results for two processors (Parwan and DLX) show that test instructions can reduce
the program size and runtime by about 20% at the cost of about a 1.6% increase in area.
28.9 Self-Test of On-Chip ADC/DAC and Analog Components
Using DSP-Based Approaches
For mixed-signal systems that integrate both analog and digital functional blocks onto the same chip,
testing of analog/mixed-signal parts has become the bottleneck during production testing. Because most
analog/mixed-signal circuits are functionally tested, their testing needs expensive ATE for analog stimulus
generation and response acquisition. One promising solution to this problem is BIST that utilizes on-chip
resources (either shared with functional blocks or dedicated BIST circuitry) to perform on-chip stimulus
generation and response acquisition. Under the BIST approach, the demands on the external test equip-
ment are less stringent. Furthermore, stimulus generation and response acquisition is less vulnerable to
environmental noise during the test process.
With the advent of CMOS technology, DSP-based BIST becomes a viable solution for analog/mixed-signal
systems, as the required signal processing to make the pass/fail decision can be realized in the digital
domain with digital resources. In DSP-based BIST schemes [30,31], on-chip DA and AD converters are
used for stimulus generation and response acquisition, and DSP resources (such as CPU or DSP cores)
are used for the required signal synthesis and response analysis. The DSP-based BIST scheme is attractive
because of its flexibility: various tests, such as AC, DC, and transient tests, can be performed by modifying
the software routines without altering the hardware. However, on-chip AD and DA converters are not
always available in mixed-signal SoC devices. In Reference 32, the authors propose to use a one-bit first-order
delta-sigma modulator as dedicated BIST circuitry for on-chip response acquisition, in case an
on-chip AD converter is not available. Owing to its over-sampling nature, the delta-sigma modulator can
tolerate relatively high process variations and matching inaccuracy without causing functional failure, and
is therefore particularly suitable for VLSI implementation. This solution is suitable for low-to-medium
frequency applications (for example, audio signals).
FIGURE 28.12 DSP-based self-test for analog/mixed-signal parts. (From J.L. Huang and K.T. Cheng, in Proceedings
of the Asia and South Pacific Design Automation Conference, January 2000. With permission.)
Figure 28.12 illustrates the overall delta-sigma modulation-based BIST architecture. It employs the
delta-sigma modulation technique for both stimulus generation [33] and response analysis [32].
A software delta-sigma modulator converts the desired signal to a one-bit digital stream. The digital 1s
and 0s are then converted to two discrete analog levels by a one-bit DAC followed by a low-pass filter
that removes the out-of-band high-frequency modulation noise, thus restoring the original waveform. In
practice, we extract a segment from the delta-sigma output bit stream that contains an integer number of
signal periods. The extracted pattern is stored in on-chip memory, and then periodically applied to the
low-resolution DAC and low-pass filter to generate the desired stimulus. Similarly, for response analysis,
a one-bit modulator can be inserted to convert the analog DUT output response into a one-bit
stream, which is then analyzed by DSP operations performed by on-chip DSP/microprocessor cores.
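For intuition, the software delta-sigma modulator can be sketched in a few lines. The following is a generic textbook first-order formulation (integrator plus one-bit quantizer with feedback), not the specific modulator of Reference 32 or 33: for a DC input d in [-1, 1], the density of 1s in the output stream approaches (d + 1)/2, so the low-frequency content of the bit stream tracks the input.

```python
def first_order_sdm(samples):
    """First-order software delta-sigma modulator.

    Maps input samples in [-1, 1] to a one-bit stream (0/1) whose
    low-frequency content tracks the input signal.
    """
    integ = 0.0   # integrator state
    fb = 0.0      # one-bit DAC feedback (+1 or -1 after the first step)
    bits = []
    for x in samples:
        integ += x - fb            # accumulate the quantization error
        bit = 1 if integ >= 0 else 0
        fb = 1.0 if bit else -1.0  # feedback through the one-bit DAC
        bits.append(bit)
    return bits
```

For example, a constant input of 0.5 yields a bit stream whose average density of 1s converges to about 0.75, and an input of 0.0 yields a density of about 0.5; in the BIST scheme, a segment of such a stream covering an integer number of signal periods would be stored in on-chip memory and replayed through the one-bit DAC and low-pass filter.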
Among the one-bit modulation architectures, the first-order configuration is the most stable and
has the maximal input dynamic range. However, it is not practical for high-resolution applications
(as a rather high over-sampling rate would be needed), and it suffers from inter-modulation distortion (IMD).
Compared with the first-order configuration, the second-order configuration has a smaller dynamic range
but is more suitable for high-resolution applications.
Note that the software part of this technique, that is, the software modulator and the response analyzer,
can be executed by on-chip DSP/microprocessor cores, if abundant on-chip digital programmable
resources are available (as indicated in Figure 28.12), or by external digital test equipment.
28.10 Conclusions
Embedded software-based self-testing has the potential to alleviate problems with many of the current
external tester-based and hardware BIST testing techniques for SoCs. In this chapter, we give a summary
of the recently proposed techniques on this subject. One of the main tasks in applying these
techniques is extracting the functional constraints in the process of test program synthesis, that is,
deriving tests that can be delivered by processor instructions. Future research in this area must address the
problem of automating the constraint extraction process in order to make the proposed solutions fully
automatic for general embedded processors. The software-based self-testing paradigm can be further generalized
for analog/mixed-signal components through the integration of DSP-based testing techniques,
modulation principles, and some low-cost analog/mixed-signal DfT.
Acknowledgments
The authors wish to thank L. Chen and T.M. Mak of Intel, Angela Krstic of Cadence, Sujit Dey of UC
San Diego, Larry Lai of Novas, and L.-C. Wang and Charles Wen of UC Santa Barbara for their efforts and
contribution to this chapter.
References
[1] Semiconductor Industry Association, The National Technology Roadmap for Semiconductors, 1997.
[2] Semiconductor Industry Association, The International Technology Roadmap for Semiconductors,
2003.
[3] C.-J. Lin, Y. Zorian, and S. Bhawmik, Integration of Partial Scan and Built-In Self-Test, Journal of
Electronic Testing: Theory and Applications (JETTA), 7(1–2): 125–137, August 1995.
[4] K.-T. Cheng and C.-J. Lin, Timing-Driven Test Point Insertion for Full-Scan and Partial-Scan BIST,
in Proceedings of the IEEE International Test Conference (ITC), Washington D.C., October 1995.
[5] H.-C. Tsai, S. Bhawmik, and K.-T. Cheng, An Almost Full-Scan BIST Solution – Higher Fault Coverage
and Shorter Test Application Time, in Proceedings of the IEEE International Test Conference
(ITC), Washington, D.C., October 1998.
[6] L. Chen and S. Dey, Software-Based Self-Testing Methodology for Processor Cores, IEEE
Transactions on Computer-Aided Design (TCAD), 20(3): 369–380, March 2001.
[7] W.-C. Lai, A. Krstic, and K.-T. Cheng, On Testing the Path Delay Faults of a Microprocessor using
its Instruction Set, in Proceedings of the IEEE VLSI Test Symposium (VTS), Montreal, Canada, April
2000.
[8] W.-C. Lai, A. Krstic, and K.-T. Cheng, Test Program Synthesis for Path Delay Faults in Micro-
processor Cores, in Proceedings of the IEEE International Test Conference (ITC), Washington, D.C.,
October 2000.
[9] L. Chen, S. Ravi, A. Raghunathan, and S. Dey, A Scalable Software-Based Self-Test Methodology for
Programmable Processors, in Proceedings of the ACM/IEEE Design Automation Conference (DAC),
Anaheim, CA, June 2003.
[10] R.S. Tupuri and J.A. Abraham, A Novel Functional Test Generation Method for Processors using
Commercial ATPG, in Proceedings of the IEEE International Test Conference (ITC), Washington
D.C., September 1997.
[11] S. Hellebrand and H.-J. Wunderlich, Mixed-Mode BIST Using Embedded Processors, in Proceed-
ings of the IEEE International Test Conference (ITC), Washington, D.C., October 1996.
[12] R. Dorsch and H.-J. Wunderlich, Accumulator Based Deterministic BIST, in Proceedings of the
IEEE International Test Conference (ITC), Washington, D.C., October 1998.
[13] Z. Navabi, VHDL: Analysis and Modeling of Digital Systems. McGraw-Hill, New York, 1997.
[14] X. Bai, L. Chen, and S. Dey, Software-Based Self-Test Methodology for Crosstalk Faults in Processors,
in Proceedings of the IEEE High-Level Design Validation and Test Workshop, San Francisco,
CA, November 2003, pp. 11–16.
[15] K.-T. Cheng and H.-C. Chen, Classification and Identification of Nonrobustly Untestable Path
Delay Faults, IEEE Transactions on Computer-Aided Design (TCAD), 15(8): 845–853, August 1996.
[16] A. Krstic, S.T. Chakradhar, and K.-T. Cheng, Testable Path Delay Fault Cover for Sequential
Circuits, in Proceedings of the European Design Automation Conference, Geneva, Switzerland,
September 1996.
[17] M. Gumm, VHDL Modeling and Synthesis of the DLXS RISC Processor. VLSI Design Course
Notes, University of Stuttgart, Germany, December 1995.
[18] T. Grüning, U. Mahlstedt, and H. Koopmeiners, DIATEST: A Fast Diagnostic Test Pattern
Generator for Combinational Circuits, in Proceedings of the IEEE International Conference on
Computer-Aided Design (ICCAD), Santa Clara, CA, November 1991.
[19] X. Yu, J. Wu, and E.M. Rudnick, Diagnostic Test Generation for Sequential Circuits, in Proceedings
of the IEEE International Test Conference (ITC), Washington, D.C., October 2000.
[20] I. Pomeranz and S.M. Reddy, A Diagnostic Test Generation Procedure Based on Test Elimination
by Vector Omission for Synchronous Sequential Circuits, IEEE Transactions on Computer-Aided
Design (TCAD), 19(5): 589–600, May 2000.
[21] L. Chen and S. Dey, Software-Based Diagnosis for Processors, in Proceedings of the ACM/IEEE
Design Automation Conference (DAC), New Orleans, LA, June 2002.
[22] X. Bai, S. Dey, and J. Rajski, Self-Test Methodology for At-Speed Test of Crosstalk in Chip Inter-
connects, in Proceedings of the ACM/IEEE Design Automation Conference (DAC), Los Angeles, CA,
June 2000.
[23] L. Chen, X. Bai, and S. Dey, Testing for Interconnect Crosstalk Defects Using On-Chip Embedded
Processor Cores, in Proceedings of the ACM/IEEE Design Automation Conference (DAC), Las Vegas,
NV, June 2001.
[24] W.-C. Lai, J.-R. Huang, and K.-T. Cheng, Embedded-Software-Based Approach to Testing
Crosstalk-Induced Faults at On-Chip Buses, in Proceedings of the IEEE VLSI Test Symposium
(VTS), Marina Del Rey, CA, April 2001.
[25] M. Cuviello, S. Dey, X. Bai, and Y. Zhao, Fault Modeling and Simulation for Crosstalk in System-
on-Chip Interconnects, in Proceedings of the IEEE International Conference on Computer-Aided
Design (ICCAD), San Jose, CA, November 1999.
[26] J.-R. Huang, M.K. Iyer, and K.-T. Cheng, A Self-Test Methodology for IP Cores in Bus-Based
Programmable SoCs, in Proceedings of the IEEE VLSI Test Symposium (VTS), Marina Del Rey, CA,
April 2001.
[27] IEEE P1500 Web Site, http://grouper.ieee.org/groups/1500/
[28] J. Shen and J.A. Abraham, Native Mode Functional Test Generation for Processors with Applica-
tions to Self Test and Design Validation, in Proceedings of the IEEE International Test Conference
(ITC), Washington D.C., October 1998.
[29] W.-C. Lai and K.-T. Cheng, Instruction-Level DFT for Testing Processor and IP Cores in System-
on-a-Chip, in Proceedings of the ACM/IEEE Design Automation Conference (DAC), Las Vegas, NV,
June 2001.
[30] M.F. Toner and G.W. Roberts, A BIST Scheme for a SNR, Gain Tracking, and Frequency Response
Test of a Sigma–Delta ADC, IEEE Transactions on Circuits and Systems-II, 42: 1–15, January 1995.
[31] C.Y. Pan and K.T. Cheng, Pseudo-Random Testing and Signature Analysis for Mixed-Signal Circuits,
in Proceedings of the International Conference on CAD (ICCAD), San Jose, CA, November
1995, pp. 102–107.
[32] J.L. Huang and K.T. Cheng, A Sigma–Delta Modulation Based BIST Scheme for Mixed-Signal
Circuits, in Proceedings of the Asia and South Pacific Design Automation Conference, Yokohama,
Japan, January 2000.
[33] B. Dufort and G.W. Roberts, Signal Generation using Periodic Single and Multi-Bit Sigma–Delta
Modulated Streams, in Proceedings of the IEEE International Test Conference (ITC), Washington,
D.C., October 1997.
IV
Networked Embedded
Systems
29 Design Issues for Networked Embedded Systems
Sumit Gupta, Hiren D. Patel, Sandeep K. Shukla, and Rajesh Gupta
30 Middleware Design and Implementation for Networked Embedded Systems
Venkita Subramonian and Christopher Gill
29
Design Issues for
Networked Embedded
Systems
Sumit Gupta
Tallwood Venture Capital
Hiren D. Patel and
Sandeep K. Shukla
Virginia Tech
Rajesh Gupta
University of California
29.1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29-1
29.2 Characteristics of NES. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29-2
Functionality and Constraints • Distributed Nature • Usability, Dependability, and Availability
29.3 Examples of NES. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29-5
Automobile: Safety-Critical Versus Telematics • Data Acquisition: Precision Agriculture and Habitat Monitoring • Defense Applications: Battle-Space Surveillance • Biomedical Applications • Disaster Management
29.4 Design Considerations for NES. . . . . . . . . . . . . . . . . . . . . . . . . 29-8
29.5 System Engineering and Engineering Trade-Offs
in NES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29-10
Hardware • Software
29.6 Design Methodologies and Tools . . . . . . . . . . . . . . . . . . . . . . . 29-13
29.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29-15
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29-16
29.1 Introduction
Rapid advances in microelectronic technology, coupled with the integration of microelectronic radios on the
same board or even on the same chip, have been a powerful driver of the proliferation of a new breed of
Networked Embedded Systems (NESs) over the last decade. NES are distributed computing devices with
wireline and/or wireless communication interfaces embedded in a myriad of products such as automobiles,
medical components, sensor networks, consumer products, and personal mobile devices. These systems
have been variously referred to as EmNets (Embedded Network Systems), NEST (Networked Embedded
System Technology), and NES [1–3].
NES are often distributed embedded systems that must interact not only with the environment and
the user, but also with each other to coordinate computing and communication. And yet, these devices
must often operate in very constrained environments related to their size, energy availability, network
connectivity, etc. The challenges posed by the design and deployment of NES have captured the imagination
of a large number of researchers and galvanized whole new communities into action. The design of
NES requires multidisciplinary, multilevel cooperation and development to address the diverse hardware
(processor cores, radios, security cores) and software (applications, middleware, operating systems, networking
protocols) needs. In this chapter, we briefly highlight some of the design concerns and challenges
in deploying NES.
Some examples of NES are wireless data acquisition systems such as habitat [4–7], agriculture and
weather monitoring [8], disaster management and civil monitoring, Cooperative Engagement Capability
(CEC) [9] for military use [10], fabric e-textiles [11,12], and consumer products such as cell phones and
Personal Digital Assistants (PDAs). Common to all these systems/applications is their ability to provide
interaction between the environment and humans through a medium of devices, such as sensors for data
collection, processors to perform computation on the data, and remote storage devices to preserve
and collate the information.
A good exposition of the characteristics, parameters, examples, and design challenges of NES is
presented in Reference 1. We draw heavily on this book for material and examples. Similar surveys
and expositions of challenges in the applications, design, and implementation of sensor networks are
given in References 13–17.
The rest of this chapter is organized as follows: in Section 29.2, we describe the characteristics of NES,
followed by some examples of such systems in Section 29.3. Based on these examples and characteristics,
we delve into the design considerations for NES in Section 29.4. In Section 29.5, we explore the engineering
trade-offs in designing and deploying NES. Finally, we discuss the design methodologies and design
tools available for designing NES in Section 29.6 and conclude the chapter with a discussion.
29.2 Characteristics of NES
The realm of possibilities where NES applications can be implemented makes characterizing these systems
an inherently difficult task. However, we attempt to characterize the basic functionality and
constraints, distributed nature, and usability, dependability, and availability of such systems. Then, we
describe NES through some examples.
29.2.1 Functionality and Constraints
Networked embedded systems are typically designed to interact with and react to the environment and
people around them. Thus, NES often have sensors that measure temperature, moisture, movement, light,
and so on. By definition, NES have a communication mechanism, either a wireline connection or a
wireless radio. Also, they typically have computation engines that can perform at least a minimal amount of
computing on the data they acquire.
The environment and user needs place constraints on NES such as small size, low weight, harsh
working conditions, safety and reliability concerns, low cost, and poor resource availability in terms of
low computational ability, and low energy availability (limited battery) [18]. NES devices have to be small
in size so that their deployment does not interfere with the environment; that is, they must function
almost invisibly to the environment. For example, animals must not be aware of the habitat monitoring
sensors that are embedded on them or around them. This example also demonstrates the need for these
systems to be low weight and be able to work under harsh conditions, that is, be tolerant of temperature
changes, physical abuse, vibration, shock, and corrosion. Since NES are frequently deployed in the eld
with little or no access to renewable energy sources, they have to live off a limited energy source or battery.
Owing to real-time and mission-critical requirements, NES have to frequently meet safety and reliability
constraints. For example, the cruise control, antilock braking, and airbag systems in automobiles have to
respond within the given real-time constraints to meet safety requirements. The small form factor and
wide distribution of NES also place a constraint on cost; price fluctuations of even a few cents on each
device have a big impact as the volume of deployed devices increases.
FIGURE 29.1 U.C. Berkeley NES device: the MICA processor and radio board. (From M. Horton, D. Culler,
K. Pister, J. Hill, R. Szewczyk, and Alec Woo. Sensors Magazine, 19, 2002. With permission.)
Figure 29.1 shows the Berkeley NES device that consists of a MICA processor and a radio board [19].
As technology advances, these devices are becoming smaller. In fact, the latest Berkeley mote is as
small as a coin. This has led to the notion of smart dust or a massively distributed sensor network that
is self-contained, networked, and provides multiple sensor and coordinated computational capabilities
[7,20–22].
29.2.2 Distributed Nature
The application spectrum of NES often means that these systems are physically distributed. In fact, the
distributed nature of NES extends to distributed functionality and communication as well. Distributed
in function refers to NES components that perform specific roles and work together with other NES
components to complete the system. Automotive electronics is a good example, where many different
function-specic components work in unison, such as the power control modules, the engine, airbag
deployment, cruise control, suspension, etc. Figure 29.2 shows the components from Kyocera that are
widely used in automotive electronics. Similarly, distributed communication refers to local and global
communication between the embedded systems distributed throughout the system. For example,
automotive systems have local wires from actuators/sensors to ECUs (Electronic Control Units) and global
wires/buses between ECUs [24].
29.2.3 Usability, Dependability, and Availability
Networked embedded systems are becoming an increasingly dominant part of a number of devices and
systems in all aspects of our daily lives, from entertainment, transportation, and personal communications to
biomedical devices.
The pervasiveness of these systems, however, raises concerns about dependability and availability.
Availability generally means access to the system. In scenarios where some of the components of the NES
fail, there must be a mechanism through which the users can interface and interact with the components
to investigate and rectify the problems. Access media can include PDAs, wired serial access points,
infrared technology, etc. Another dimension of availability is the long life expected from NES components.
Often, NES do not have access to a renewable energy source, and sometimes it is not possible to change
the battery either. For example, sensors deployed to measure traffic on roads or sensor tags placed
on animals are inaccessible or difficult to reach after deployment.
Dependability or reliability is also a major concern that goes hand in hand with the availability requirement.
The system must guarantee a certain level of service that the user can depend on. For example,
temperature sensors and smoke detectors are critical to the fire safety requirements of any building.
Availability and dependability characteristics are especially crucial to safety-critical systems such as
avionics and biomedical applications. The sensors used in an airplane to monitor cabin pressure, oxygen
FIGURE 29.2 Automotive electronic components by Kyocera. (From Kyocera Website http://global.kyocera.com/
application/automotive/auto_elec/. With permission.)
levels, elevation, relative speed, etc. are all important to maintain safety. For example, the release of oxygen
into the oxygen masks in airplanes is controlled via sensors that monitor the cabin oxygen levels. For
biomedical applications such as electronic heart pacemakers, the need for dependability is obvious since
malfunctioning components can be life threatening. Military personnel monitoring is another example
of a biomedical application of NES, where devices are used to transmit the location and vital statistics
of the personnel.
Since NES consist of diverse hardware and software components that interact with each other, component
interoperability becomes another important concern [1]. Today, cars have tens or sometimes even
a hundred embedded computing systems that are designed and manufactured by different contractors.
Such complex, distributed, interacting embedded systems raise difficult challenges in system integration,
component interoperability, and system testing and validation.
These NES characteristics present new challenges and constraints for systems engineers that have not
been fully addressed in the design of past networking and distributed systems. Software and hardware
tools and techniques are required to satisfy the need for low cost, low power, Quality-of-Service (QoS)
guarantees, and fast time-to-market. Formalization of methodologies to ensure functional correctness and
efficiency of design is paramount in meeting time-to-market requirements by reducing the number of iterations
in the design process.
In the following sections, we discuss some of the interesting applications of NES and then describe NES
tools and methodologies, such as programming languages, simulation environments, and performance
measurement tools, designed specifically to address the design challenges posed by NES.
29.3 Examples of NES
To demonstrate the characteristics, constraints, and design challenges of NES, we present several examples
from current and future NES. These examples are representative of the diverse application domains where
NES can be found. We start with three examples from Reference 1.
29.3.1 Automobile: Safety-Critical Versus Telematics
Cars today typically have tens to a hundred microprocessors controlling all aspects of the automobile,
from entertainment systems to the emergency airbag release mechanisms. Figure 29.3 shows some of
Communication
Infrared rays
Power electronics
Control tech.
Fuel cell
Super - conduction
Monitoring system
Radar tech.
Optical devices
Semiconductor
Computer and telephone
FIGURE 29.3 Telematics components in an automobile. (From Mitsubishi electric.
http://www.mitsubishielectric.ca/automotive/. With permission.)
2006 by Taylor & Francis Group, LLC
29-6 Embedded Systems Handbook
the telematic components in a Mitsubishi car [25]. The microprocessors in charge of the functionality fre-
quently communicate and interact with other processors. For example, the stereo volume is automatically
reduced when the driver receives (or answers) a call on his or her cell phone.
Thus, a range of devices that perform different tasks are beginning to be organized in sophisticated networks as distributed systems. Broadly speaking, there are two such distributed systems: safety-critical processing systems and telematics systems. Clearly, the safety-critical aspects cannot be sacrificed or compromised in any way. These two systems are an integral part of the design and construction of the automobile and dictate several design parameters. Since automobiles have a time-to-market that can span up to five years from concept to final product, the technology used for the telematics and safety-critical components is frequently already outdated by the time the automobile ships. This is a rising concern, especially for the safety-critical components, because upgrading or swapping out components is generally not performed and usually not even feasible. Note that systems that cannot be upgraded or altered after final production are considered to be closed systems.
Conversely, open systems allow newer components with more capabilities and features to be plugged in, similar to a plug-and-play environment. Thus, to make automobiles open systems, we have to develop technologies that enable automobile designers to construct the safety-critical and telematics systems in an abstracted manner, such that components with a standardized communication protocol and interface can simply be plugged into the final product. This would resolve the disparity between the long design cycles of automobiles and the rapid advances in the NES components used in them.
The increasing popularity of wireless technology has spawned interesting applications for automobiles. For example, the OnStar system from General Motors can monitor the location of a car, and customer service staff can remotely unlock the car, detect when airbags have been deployed, and so on. Wireless communication opens up many possibilities, such as automobile service requests and data collection for automobile users, dealers, and manufacturers.
29.3.2 Data Acquisition: Precision Agriculture and Habitat Monitoring
The use of sensor nodes for data acquisition is becoming a useful tool for agricultural and habitat monitoring. In Reference 7, the authors present a study in which they used wireless sensor networks in a real-world habitat monitoring project. The small footprint and weight of modern sensor nodes make them attractive for habitat and agricultural monitoring, since they cause minimal disturbance to the animals, plant population, and other natural elements of the habitats being monitored. This solution also automates some of the menial tasks, such as data collection, for researchers.
Precision agriculture is an important area of research where NES technology is likely to have a big impact [1,26,27]. Precision agriculture envisages sensor deployment to monitor and manage crop productivity, quality, and growth. Besides increased productivity, better crop quality and crop management are also key benefits of using NES in precision agriculture.
Crop management involves monitoring and adjusting the levels of fertilizer, pesticides, and water for particular areas, resulting in better yields with less pollution, fewer emissions, and lower costs. Automating these functions requires behavior that adapts to changing surroundings, such as water levels when it rains, or pesticide use in seasons when pests are more common. This adaptation is an integral aspect of precision agriculture. While there exist models that dictate the necessary amounts of fertilizer, water, and nutrient combinations, these models are not always accurate for a specific locale. So NES can also perform on-the-side data acquisition for the purpose of reconstructing appropriate models and recalibrating or reconfiguring the sensor metrics accordingly, to better suit the specific climate and locale (Figure 29.4).
Feedback into such systems is crucial to realizing truly automated precision agriculture. Fine-grained tuning of crop management can be done automatically based on these regularly updated models, which can also be monitored by researchers. Manual adjustments, however, require appropriate interfaces between the deployed NES and the end-user attempting to make the change.
Once again, wireless interfaces can be used for such manual fine-tuning.

FIGURE 29.4 System architecture of NES for common data acquisition scenarios. (From David Culler et al. Wireless sensor networks for habitat monitoring. ACM, 2002. With permission.)

Configuration and management of the network can be handled remotely via handheld devices or even desktops. A practical example of
such a deployment is described in Reference 27, and wireless sensor networks for precision agriculture in Reference 26. A similar two-tiered network coupling wireless and wired networks has been proposed for structural monitoring of civil structures that can be affected by natural disasters such as earthquakes [28]. The small size and wireless nature of NES sensor nodes enable field researchers to deploy these sensors in small and sensitive locations.
29.3.3 Defense Applications: Battle-Space Surveillance
Network embedded systems are projected to become crucial for future defense applications, particularly
in battle-space surveillance and condition and location monitoring of vehicles, equipment, and personnel.
A military application called Cooperative Engagement Capability (CEC), developed by Raytheon Systems
[9,29], acts as a force multiplier for naval air and missile defense systems by distributing sensor and
weapons data to multiple CEC ships and airborne units. Data from each unit is distributed to other CEC
units, after which the data is filtered and combined via an identical algorithm in each unit to construct a common radar-like aerial picture for missile engagements.
DARPA has funded research in several areas of defense systems under the aegis of its Future Combat Systems program [30]. Manipulating battle environments and critical threat identification are projected uses of NES in such systems. Manipulating battle environments refers to controlling the opposition by detecting their presence and either altering their route or constraining their advance. Threat identification involves identifying a threat early for force protection. A force-protection scenario involves deploying sensors around a perimeter that requires protection, so that forced entry can be identified and automated responses such as alarms can be triggered by a given event.
System deployment used to be a concern because sensors were bulky and required manual deployment. With advances in technology, however, entire sensor networks can now be deployed by airdrop, by personnel, or even via artillery. The small sizes have also enabled NES to be deployed for monitoring vehicles, much like the automobile example discussed earlier.
A relatively new technology called e-textiles has emerged, whereby sensors or other computation devices are integrated into wearable material [11]. Nakad et al. [11] are investigating the communication requirements between the sensing nodes of an e-textile and the computing elements embedded with them. One key and obvious application of e-textiles is data acquisition for human monitoring, where sensor nodes can be used to track the location and vital statistics of military personnel.
29.3.4 Biomedical Applications
A civilian application of e-textiles is monitoring people's health, particularly that of the elderly. A sensor node embedded in an e-textile worn by patients with heart problems can automatically alert doctors or emergency services when the patient suffers heart failure. We have already seen the value of heart pacemakers in helping millions of people around the world maintain a regular heartbeat. Work is in progress to make sensors small and body-friendly enough that they can either be surgically inserted or swallowed for temporary monitoring. These devices can be used to monitor, diagnose, and even correct anomalies in the health of a patient. Of course, surgical insertion or ingestion of microelectronic devices raises several concerns about safety and the body's ability to adapt to foreign objects, which are active areas of research.
29.3.5 Disaster Management
Scenarios that involve disaster management can be seen as data acquisition applications in which certain information is gathered, based on which a response is computed and performed. A good example of an implemented scenario is provided in Reference 31, where remote villages are monitored by four sensors measuring seismic activity and water levels for earthquakes and floods, respectively. These sensors are connected over wireless links to the nearest emergency rescue stations, signaling emergency events when the thresholds for maximum water level or seismic activity are crossed. As mentioned earlier, Kottapalli et al. [28] propose a sensor network for structural monitoring of civil structures in case of natural disasters such as earthquakes. Other applications of disaster management systems include severe cold (or heat) monitoring, fire monitoring (smoke detectors, heat sensors), volcano monitoring, etc.
29.4 Design Considerations for NES
The examples presented in Section 29.3 give an idea of the breadth of the application domains in which NES can be deployed. By studying these examples, we understand the various requirements, constraints, issues, and concerns involved in developing these kinds of systems. Furthermore, as NES proliferate, their true potential will be realized when they are deployed at a massive scale, on the order of thousands or more components. Such a large-scale deployment, however, raises some problems [13,15–17]:
Deployment. Deployment refers to the physical distribution of the nodes in the NES. The first concerns for deployment are safety, durability, and sturdiness: if devices are dropped from the air, they should not cause damage to other objects (people, animals, plants, or material) while landing, and should not be damaged themselves either. This is clearly important for defense applications, where surveillance sensors may be airdropped into the battle space.
Several deployment strategies are available; they can be classified as either random or strategic deployment. As the name suggests, random deployment refers to deploying NES nodes in an arbitrary fashion in the field. Random deployment is useful when the region being monitored is not accessible for precise placement of sensors [32]. The problem then becomes one of determining region coverage, and possibly redeploying or moving nodes to improve coverage. Strategic deployment refers to placing NES nodes at well-planned points so that coverage is maximized, or placing nodes in a small field of concentration such that they are not easily subject to natural damage (e.g., habitat monitoring).
The number of NES nodes deployed must be factored into the trade-off between cost and the performance/quality of monitoring: some nodes are bound to be destroyed by one means or another, so there should be sufficient reserves, or fault tolerance, in the network for monitoring to continue.
Environment interaction. NES components often need to interact with the environment without human intervention. Thus, a requirement of NES is the ability to work on their own, perhaps with a feedback loop so that nodes can adapt to changes in the environment (failure of nodes, movement of objects) and continue functioning correctly. Systems such as those used in precision agriculture and chemical and hazardous gas monitoring are designed to interact with and react to changes in the system. In agriculture, for example, the release of water can be tied to the moisture content in the air.
Life expectancy of nodes. As discussed earlier, an essential requirement for nodes in a NES is a long life expectancy, because once deployed, it is very difficult to access the nodes and replace their batteries. These nodes must also withstand environmental challenges, such as inclement weather, and the unexpected loss of nodes to animal interaction or component failure. Thus, a whole body of work has gone into identifying node failures and subsequently reconfiguring the network to provide some amount of fault tolerance [33].
Communication protocol between devices. A combination of wired and wireless links can be used to establish a NES. Furthermore, the nodes in the network may be stationary or mobile. Mobile nodes bring in a whole range of issues related to dynamic neighbor discovery, dynamic routing, etc. The NES should also be able to reconfigure itself to tolerate the loss of nodes from a communication point of view: if a node that serves as a relay point fails or dies, the network should be able to use other nodes for relaying instead.
Reconfigurability. In many scenarios it is not possible to physically reach nodes, yet NES frequently require nodes to be reconfigured after deployment. This may be to add, remove, or change functionality, or to adjust its parameters. For example, handheld devices or even desktops may be used to reconfigure nodes to fine-tune certain aspects of the system, such as increasing the water level in precision agriculture when the weather report predicts a sudden heat wave over the following few days [26,27].
Security. NES, particularly those that use wireless communication, are prone to malicious attack [34]. This is most evident in military equipment, where communication has to be secure from enemy eavesdropping. Security in handheld devices is also becoming an increasing concern with their widespread use in office environments for everything from checking email to exchanging sensitive documents and data. Running security protocols is computationally expensive, and hence power hungry, and several researchers are proposing ways to reduce these power requirements for sensor networks and handheld devices [35–37].
Energy constraints. The small form factor, low weight, and the deployment of NES nodes in inaccessible and remote regions imply that these nodes have access only to a limited, nonrenewable energy source. Thus, one major focus of the research community is to develop networking protocols, applications, operating
systems, etc. (besides devices) that are energy efficient and utilize robust, high-throughput, but low power communication schemes [13,17].
Operating system. Special or optimized operating systems are needed owing to the stringent hardware constraints (small form factor, limited energy source, limited memory space) and strict application requirements (real-time constraints, adaptability). Several Real-Time Operating Systems (RTOSs) have been proposed for embedded devices, such as eCos [38], LynxOS from LynuxWorks [39], the QNX RTOS [40], etc.
Adequate design methodologies. Standard design methodologies and design flows have to be modified, or new ones created, to address the special needs of NES. For example, there is a need for design methodologies for low power system-on-a-chip implementations, to enable integration of the large number of diverse components that form a NES device [41].
29.5 System Engineering and Engineering Trade-Offs in NES
The design considerations presented in Section 29.4 raise opportunities for interesting trade-offs between the hardware and software components of NES. Whereas area, power, and weight constraints limit the amount of hardware that can be put into a NES node, integration, debugging, and complexity issues hinder increased dependence on software.
29.5.1 Hardware
Rapid advances in silicon technology are ushering in an era of widespread use of smart dust, or very small sensor nodes with reasonably complex computational and communication abilities [20–22]. Besides their small size, these nodes are low power and carry a variety of actuators and sensors, along with radio/wireless communication devices and processors for computation. This enables the nodes to move beyond being mere data acquisition sensors that send their data to a central server. They can now also act as computation points that first collate and process the data before sending it to a server, or even coordinate computation among themselves, independent of a central server.
The power and area constraints on NES nodes mean that general-purpose microprocessors cannot be used in them. However, low power Application Specific Instruction Processors (ASIPs) augmented with Application Specific Integrated Circuits (ASICs) can provide the necessary computational ability at relatively low power. Whereas the ASIPs are easily programmable, the ASICs can be used to execute computationally expensive and/or time sensitive portions of applications. For example, target identification in defense systems and airbag release mechanisms in cars require ASICs to meet their timing and computational needs.
In fact, Henkel and Li [42] and Brodersen et al. [1] have shown that custom-made processors consume less power than general-purpose processors. The reason is that with custom chips, parallelism can be exploited effectively to reduce power consumption. Also, hardwiring the execution of each function eliminates the need for instruction storage and decoding, reducing power further.
On the other hand, applications such as habitat monitoring and precision agriculture do not have high timing or computational requirements, so generic microprocessors or ASIPs can be used. The compromise is speed and computational ability versus programmability. ASICs have high design and manufacturing costs and are inflexible compared with programmable processors: a change in applications or protocols leads to a large redesign effort. Programmable processors, on the other hand, can be reused for several generations of an application (provided computational requirements do not increase).
Reconfigurable hardware such as Field Programmable Gate Arrays (FPGAs) provides a middle path between programmable processors and hardwired ASICs. As the name suggests, FPGAs can be reprogrammed after being deployed in the field and hence provide the flexibility of microprocessors along with much of the hardwired speed of ASICs. In fact, FPGAs can be configured at runtime, as suggested by Nitsch and Kebschull [43]. They propose storing the functional behavior and structure of applications in an XML format; when a client wants to execute an application, the XML is analyzed and the appropriate mapping onto the FPGA is performed. The drawbacks of FPGAs are that they require large chip area and have a low clock frequency.
29.5.2 Software
Small memory devices and low computational power limit the size and complexity of the software that can run on NES nodes. Porting commonly used operating systems and applications to NES is difficult because of the limitations posed by the hardware. Hence, software development for NES nodes is another challenge that embedded systems designers have to overcome.
The Tiny microthreading operating system (TinyOS) [44] has been proposed to address the unique characteristics of NES nodes. TinyOS is a component-based, highly configurable embedded operating system with a small footprint. It has a highly efficient multithreading engine and maintains a two-level First In First Out (FIFO) scheduler. TinyOS consists of a set of interconnected modular components, each with tasks, events, and command handlers associated with it (Figure 29.5). Tasks are the processing units of components; they can signal events, issue commands, and execute other tasks. Each component is allocated a static area of memory to hold the state information of the thread associated with it. TinyOS does not provide dynamic memory allocation, owing to the restrictions imposed by the hardware. The component-based structure allows TinyOS to be a highly application-specific operating system that can be configured by altering configuration files (.comp and .desc files).
Volgyesi and Ledeczi [2] provide a model-based approach to the development of applications based on TinyOS. They present a graphical environment, called GRATIS, through which the application and operating system components are automatically glued together to produce an application. GRATIS
provides automatic code generation capability and a graphical user interface to construct the .comp and .desc files automatically, thus simplifying the task of component description and wiring for building TinyOS-based applications [2]. This increases design productivity and reconfigurability.

FIGURE 29.5 Events and commands in TinyOS. (From P. Volgyesi and A. Ledeczi. Component-based development of networked embedded applications, Vanderbilt Publication, 2002. With permission.)
Another effort to provide operating system functionality for sensor nodes is the development of a programming language framework called nesC [41]. nesC provides a programming paradigm based on event-driven execution, flexible concurrency, and component-based design. TinyOS is the prime example of the language in use: it has been employed to develop this commonly used operating system for sensor networks. The language successfully integrates concurrency, reactivity to the environment, and communication.
The distributed nature of NES means that these systems are inherently concurrent. For example, data processing and event arrival are two processes that need to execute concurrently on a NES node, and concurrency management has to ensure that race conditions do not occur. In the emergency airbag release system, for instance, the sensor needs to sense the impact while also reacting to it based on processing of the collected data. These types of real-time demands, along with the small size and low cost of NES nodes, make concurrency management a challenging task.
nesC addresses these issues by drawing upon several existing language concepts. Its three main contributions are: the definition of a component model, an expressive concurrency model, and program analysis to improve reliability and reduce code size. The component model supports event-driven systems such as sensor nodes, with bidirectional interfaces to ease event communication. It also provides flexible hardware/software boundaries and avoids dynamic component instantiation and the use of virtual functions. The concurrency model is tied to compile-time analysis, yielding data-race detection at compile time while still allowing comprehensive concurrent behaviors on NES nodes. Reducing code size and improving reliability are natural goals for any programming language.
TinyOS influenced the design of nesC through the specific features of the operating system. First, TinyOS provides a collection of reusable system components, well suited to component-based architectures. The channel interface connecting components is called the wiring specification, and it is independent of the specific implementation of each component. Tasks and events are inherent in TinyOS: tasks are nonpreemptive computation mechanisms, and events are similar to tasks except that they can preempt another task or event. This event/task concurrency scheme in TinyOS is closely mirrored in nesC's event-driven, expressive concurrency model.
Components in nesC are of either module or configuration type: the former consist of application code, and the latter provide interfaces for communication between components. Modules are written in C-style code, and a top-level configuration is used to wire components together. This resembles the VHDL component/architecture scheme, in which components are defined and the architecture is the top-level model that connects signals between them. The component-based architecture brings flexibility to application implementations and allows users to write highly concurrent programs for very small platforms with limited physical resources. With the aid of nesC and graphical configuration tools such as GRATIS, the construction of dedicated operating systems based on TinyOS is gradually becoming easier [2]. These tools allow designers to build their own operating system with relative ease, but application-specific functionality still requires implementation at the programming level.
Another area of NES software that has received considerable attention is network protocols [14,17]. Power and energy constraints in NES nodes necessitate efficient network protocols for the transmission of sensed data and intermediary communication. Sensor networks fall into two broad classes: proactive and reactive. A proactive network, as the word suggests, periodically sends the sensed attribute to the data collection location, or base station. The period is known a priori, allowing the sensors to migrate to their idle, sleep, or off modes to conserve energy. Applications that require periodic monitoring are best suited to this type of sensor network. Low Energy Adaptive Clustering Hierarchy (LEACH) [45] is one of the many proposed proactive protocols.
Reactive networks, on the other hand, continuously sense the environment but transmit data to the base station only upon sensing that the attribute has exceeded a specified threshold. This type of network is useful for time critical data, since the user or base station receives the sensitive information immediately. One such time sensitive protocol for reactive systems, proposed recently, is the Threshold-sensitive Energy Efficient sensor Network (TEEN) protocol [46].
Hybrid networks constitute a third type of sensor network: a combination of proactive and reactive networks that attempts to overcome the drawbacks of the other two [47]. In hybrid networks, sensor nodes send sensed data at periodic intervals and also whenever the sensed data exceeds the set threshold. The periodic interval is generally longer than those found in proactive networks, so that the functionality of both network types can be incorporated in one. Furthermore, hybrid systems can also be made to work in proactive-only or reactive-only modes.
29.6 Design Methodologies and Tools
Deploying large-scale distributed NES is inherently a complex and error-prone task. Designers of such systems rely on system-level modeling and simulation tools during the initial architectural definition phase: first for design space exploration, to come up with candidate architectures that satisfy the constraints and requirements, and then to verify the functionality of the system.
Design verification, from the highest levels of abstraction down to the final implementation, is an important concern with any complex system. With distributed systems, this need becomes even more acute owing to the inability of a system designer to foresee all possible events and sequences of events that may occur.
Several tools have been developed for simulating NES at the highest level of abstraction, as communicating networks built from basic network models. Network Simulator 2 (NS-2), OPNET, SensorSim, and NESLsim are popular network simulation tools widely used in the community [48–51]. SensorSim and NESLsim are simulation frameworks designed specifically for sensor networks.
SensorSim closely ties sensor networks to their power considerations by taking a two-pronged approach to model construction. The first prong is a sensor functional model that represents the software functions of the sensor, consisting of the network protocol stack, middleware, user applications, and the sensor protocol stack. The second prong is a power model that simulates hardware abstractions such as the CPU and radio module, on which the sensor functional model executes. The architecture of the SensorSim simulator is shown in Figure 29.6.
FIGURE 29.6 SensorSim architecture. (From S. Park, A. Savvides, and M. Srivastava. Sensorsim: a simulation
framework for sensor networks, ACM, 2000. With permission.)
In this two-pronged model, the sensor functional model dictates the execution of tasks to the power model, and the two models work in parallel with each other. An added feature is the sensor channel, which allows sensing devices to detect events. The sensor channel exposes external signals to the sensor modules, such as microphones, infrared detectors, etc. The signals transmitted through this channel can take any available form, such as infrared light, or sound waves for microphones. Every type of signal has different characteristics depending on the medium through which it travels, and the primary goal of the sensor channel is to simulate these characteristics accurately and to detect and monitor the events in a sensor network.
The use of a power model in SensorSim follows from the importance placed on designing low power NES devices [52]. Efficient power control is a basic requirement for the longevity of these devices. The basis of the power model is that there is a single power supplier, the battery, and all other components or models are energy consumers (as shown in Figure 29.6). Consumers such as the CPU model and radio model drain energy from the battery through events.
An attractive feature of SensorSim is its capability to perform hybrid simulations: SensorSim can behave as a network emulator and interact with real external components such as network nodes and user applications. However, network emulation for sensor networks differs from traditional network emulation. The large number and speed of input/output events in sensor networks mandates readjusting the real-time delays for events and reordering them, making the implementation of an emulator for such networks a much more difficult task.
SensorSim enables reprogramming the sensor channel to monitor external inputs, thus using real inputs instead of models for these channels. For example, instead of modeling waves traveling through a wired (e.g., coaxial) cable, a microphone can be connected through a sensor channel, sending waveforms over a wired link to the simulator.
NESLsim is another modeling framework for sensor networks, based on the PARSEC (Parallel
Simulation Environment for Complex Systems) simulator [51]. NESLsim abstracts a sensor node into
two entities: the node entity and the radio entity. The node entity is responsible for computation tasks
such as scheduling, traffic monitoring, and congestion control, whereas the radio entity maintains
communication between the sensor nodes in the NES. A third entity that is not part of the sensor node
is the channel entity, which models the wireless medium through which communication takes place.
NS-2 is an open-source, C++-based discrete event simulator developed by the Virtual InterNetwork
Testbed (VINT) collaborative research project at the University of Southern California and the
University of California, Berkeley. It provides substantial support for simulating routing and multicast
protocols in the TCP/IP networking stack (IP, TCP, UDP, etc.) over both wired and wireless channels.
OPNET performs similar tasks but is proprietary software developed by OPNET Technologies.
SystemC [53,54] is a system-level description language developed to model both the hardware and soft-
ware of a behavioral specification. Drago et al. [55] developed a methodology that combines the simulation
environments of NS-2 and SystemC to simulate and test the functionality of NES. They promote the use
of NS-2 for modeling the network topology and communication infrastructure, and SystemC for repres-
enting and simulating the hardware/software components of the embedded system. Using NS-2 relieves
the designer of writing detailed high-level network protocols that already exist and are available for
simulation in the network simulator, whereas SystemC allows modeling and simulation of embedded
system implementations. In unison, these simulation frameworks preserve simulation integrity and reduce
the modeling effort with an admissible degradation in simulation performance.
Integration of simulators is regarded as a valuable resource for system designers. However, to perform
such a link between simulators, the underlying development platforms must be similar. For example,
both NS-2 and SystemC are built on an underlying C++ framework. The basic simulation paradigm in
NS-2 and SystemC is also similar. NS-2 is a discrete event-driven simulator whose scheduler runs by
selecting the next event, executing it to completion, and looping back to execute the next event.
Similarly, the SystemC simulator also has a discrete-event kernel in which
Design Issues for Networked Embedded Systems 29-15
processes are executed and signals are updated at clocked transitions, following the evaluate-update
paradigm. Drago et al. [55] use a shared memory queue to pass tokens and packets for communication
between the NS-2 kernel and the SystemC kernel.
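The run-to-completion scheduling loop common to both kernels can be sketched as follows (illustrative Python; the real NS-2 and SystemC kernels are C++ and considerably more involved, and the names here are our own):

```python
import heapq

class EventScheduler:
    """Minimal discrete-event kernel: pop the earliest event,
    run its handler to completion, then loop back for the next event."""
    def __init__(self):
        self.now = 0.0
        self._queue = []   # (timestamp, sequence, handler)
        self._seq = 0      # tie-breaker for same-time events

    def schedule(self, delay, handler):
        heapq.heappush(self._queue, (self.now + delay, self._seq, handler))
        self._seq += 1

    def run(self):
        while self._queue:
            time, _, handler = heapq.heappop(self._queue)
            self.now = time      # advance simulated time to the event
            handler(self)        # execute the event to completion


log = []
sched = EventScheduler()

def send_packet(s):
    log.append(("send", s.now))
    s.schedule(2.5, recv_packet)   # packet arrives 2.5 time units later

def recv_packet(s):
    log.append(("recv", s.now))

sched.schedule(1.0, send_packet)
sched.run()
```

Because both kernels advance time only between events, coupling them through a shared queue amounts to agreeing on whose pending event is earliest before either side runs it.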
System-level design methodologies for embedded systems are based on a hardware–software codesign
approach [56–59]. Hardware–software codesign is a methodology in which the hardware and software are
designed and developed concurrently and in collaboration, leading to a more efficient and optimized
implementation of applications.
Ramanathan et al. [3,60] present a timing-driven design methodology for NES and explore the need for
temporal correctness when designing these systems. Determining the temporal correctness of system
models is difficult because these models are not cycle accurate and usually have no notion of the hardware
and/or software implementation. Determining the correctness of timing constraints after the hardware
has been manufactured naturally leads to costly redesign iterations for both the hardware and software
subsystems. Instead, the authors propose a solution whereby they specify, explore, and exploit temporal
information in high-level network models, bridging the gap between requirement analysis and system
design. At each stage of design refinement, timing information modeled in the higher-level network
models trickles down to finer, lower-level models. The authors used NS-2 as their modeling framework.
This timing-driven design methodology uses high-level network models generated using a rate deriv-
ation technique called RADHA-RATAN [60,61]. RADHA-RATAN works on generalized task graphs
whose nodes represent functionality or tasks and whose edges are asynchronous, unidirectional com-
munication channels between producers and consumers. RADHA-RATAN is a collection of algorithms
that generate timing budgets for the task graph nodes based on their preset execution (firing) and data
production and consumption rates.
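The rate-derivation idea can be illustrated with a toy sketch (Python; the graph representation and the propagation rule are our own simplification, not RADHA-RATAN's actual algorithms): given a source task's firing rate and per-edge token production/consumption counts, the firing rates of downstream tasks follow by propagation.

```python
def derive_rates(edges, source, source_rate):
    """Propagate firing rates through an acyclic task graph.
    edges: list of (producer, consumer, tokens_produced_per_firing,
                    tokens_consumed_per_firing).
    A consumer fires at rate = producer_rate * produced / consumed."""
    rates = {source: source_rate}
    changed = True
    while changed:                 # simple fixpoint over the DAG
        changed = False
        for prod, cons, p, c in edges:
            if prod in rates and cons not in rates:
                rates[cons] = rates[prod] * p / c
                changed = True
    return rates

# Invented example: a sensor task fires 100 times/s and produces one
# token per firing; a filter consumes 4 tokens per firing; a logger
# consumes 5 filter tokens per firing.
edges = [
    ("sensor", "filter", 1, 4),
    ("filter", "logger", 1, 5),
]
rates = derive_rates(edges, "sensor", 100.0)
```

Here the filter must fire 25 times/s and the logger 5 times/s; inverting such derived rates gives each task a per-firing timing budget.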
Along with RADHA-RATAN, network-level models at the highest level are used to represent function-
alities such as routing, congestion, and QoS. The designer can specify distributions, protocols, and similar
settings to generate the network graph in NS-2. The nodes of the network graph simulate the transfer of
packets and tokens among themselves, which enables testing of the protocols' functionality. Further
refinement of this high-level network graph by an experienced designer results in network subsystems
that capture timing requirements. In a process known as timing-driven task structuring, the designer
can then mutate the task graph until the desired timing behavior is achieved, after which partitioning
of hardware and software can be performed. The disconnect between requirement analysis and system
design is circumvented by this mutation, allowing the NS-2 modeling paradigm to be combined with
formal timing analysis techniques to provide a methodology whereby timing requirements seep from
high-level network models to low-level synthesis models.
Simulation and timing analysis are one part of the puzzle in hardware–software codesign. Automated
hardware synthesis and software synthesis and compilation techniques are the next step in generating
implementations from the system-level models. To this end, Gupta et al. [62,63] have proposed the SPARK
parallelizing high-level synthesis framework, which performs automated synthesis of behavioral descriptions
specified in C into synthesizable register-transfer-level VHDL. This framework can then be used in a system-
level codesign methodology to implement an application on a core-based platform target [64].
Such hardware–software codesign methodologies are crucial for the design and development of NES.
Automated or semiautomated methodologies are less error-prone, lead to faster time-to-market, and can
help realize hardware–software trade-offs and design exploration that may not be obvious in large systems.
29.7 Conclusions
The increasing interest in NES is quite timely, as evidenced by the several important application areas where
such systems can be used. The development process for these applications and systems remains ad hoc.
We need methodologies and tools that provide system designers with the flexibility and capability
to quickly construct efficient and optimized system designs. In this chapter, we have examined the range
of design and verification challenges faced by NES designers. Among existing solutions, we have
presented techniques that promise to increase the efficiency of the design process by raising the level of
design abstraction and by enhancing the scope of system models. These include design tools such as GRATIS
and nesC for operating system configuration, and NS-2, SystemC, SPARK, NESLsim, and SensorSim for
simulation, synthesis, and codesign of embedded systems. Several open research problems remain.
Further reducing device size and power and increasing device speed remain important objectives.
There is a need for distributed applications, along with middleware and operating system support and
support for network protocols for distributed, coordinated collaboration. Continued progress in these
technologies will fulfill the promise of NES as ubiquitous computing systems.
References
[1] R.W. Brodersen, A.P. Chandrakasan, and S. Cheng. Low-power CMOS digital design. IEEE
Journal of Solid-State Circuits, 27(4): 473–484, 1992.
[2] P. Volgyesi and A. Ledeczi. Component-based development of networked embedded applications.
In Proceedings of EuroMicro, 2002.
[3] D. Ramanathan, R. Jejurikar, and R. Gupta. Timing-driven co-design of networked embedded
systems. In Proceedings of ASPDAC, 2000, pp. 117–122.
[4] H. Wang, J. Elson, L. Girod, D. Estrin, and K. Yao. Target classification and localization in
habitat monitoring. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal
Processing (ICASSP 2003), 2003.
[5] S. Simic and S. Sastry. Distributed environmental monitoring using random sensor networks.
In Proceedings of the Second International Workshop, IPSN 2003, 2003.
[6] B. West, P. Flikkema, T. Sisk, and G. Koch. Wireless sensor networks for dense spatio-temporal
monitoring of the environment: a case for integrated circuit, system and network design.
In Proceedings of the IEEE CAS Workshop on Wireless Communications and Networking, 2001.
[7] A. Mainwaring, J. Polastre, R. Szewczyk, D. Culler, and J. Anderson. Wireless sensor networks for
habitat monitoring. In Proceedings of WSNA'02, 2002.
[8] P. Flikkema and B. West. Wireless sensor networks: from the laboratory to the field. In National
Conference for Digital Government Research, 2002.
[9] Cooperative engagement capability. http://www.fas.org/man/dod-101/sys/ship/weaps/cec.htm
[10] T. He, B.M. Blum, J.A. Stankovic, and T.F. Abdelzaher. AIDA: adaptive application-independent
data aggregation in wireless sensor networks. ACM Transactions on Embedded Computing
Systems (TECS), Special Issue on Dynamically Adaptable Embedded Systems, 3(2): 426–457, 2004.
[11] Z. Nakad, M. Jones, and T. Martin. Communications in electronic textile systems. In Proceedings
of the International Conference on Communications in Computing (CIC), 2003.
[12] D. Meoli and T.M. Plumlee. Interactive electronic textile. Journal of Textile and Apparel, Technology
and Management, 2: 1–12, 2002.
[13] D. Estrin, R. Govindan, J. Heidemann, and S. Kumar. Next century challenges: scalable
coordination in sensor networks. In Proceedings of the International Conference on Mobile
Computing and Networking (MobiCom), 1999.
[14] D. Estrin, A. Sayeed, and M. Srivastava. Wireless sensor networks. In Proceedings of the International
Conference on Mobile Computing and Networking (MobiCom), 2002.
[15] J. Kahn, R. Katz, and K. Pister. Emerging challenges: mobile networking for smart dust. Journal
of Communication Networks, 2: 188–196, 2000.
[16] I.F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci. A survey of wireless sensor networks.
IEEE Communications Magazine, 38(4): 393–422, August 2002.
[17] C.E. Jones, K.M. Sivalingam, P. Agrawal, and J.C. Chen. A survey of energy-efficient network
protocols for wireless networks. Wireless Networks, 7: 343–358, 2001.
[18] P. Koopman. Embedded system design issues (the rest of the story). In Proceedings of the
International Conference on Computer Design, 1996.
[19] M. Horton, D. Culler, K. Pister, J. Hill, R. Szewczyk, and A. Woo. Mica: the commercialization
of microsensor motes. Sensors Magazine, 19(4): 40–48, 2002.
[20] K.S.J. Pister, J.M. Kahn, and B.E. Boser. Smart dust: wireless networks of millimeter-scale sensor
nodes. Technical report, Highlight Article in 1999 Electronics Research Laboratory Research
Summary, 1999.
[21] J.M. Kahn, R.H. Katz, and K.S.J. Pister. Mobile networking for smart dust. In Proceedings of the
ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom), 1999.
[22] Crossbow: smarter sensors in silicon. http://www.xbow.com/
[23] Kyocera. http://global.kyocera.com/application/automotive/auto_elec/
[24] G. Leen and D. Heffernan. Expanding automotive electronic systems. IEEE Computer, 35:
88–93, 2002.
[25] Mitsubishi Electric. http://www.mitsubishielectric.ca/automotive/
[26] Y. Li and R. Wang. Precision agriculture: smart farm stations. IEEE 802 Plenary Meeting Tutorials.
[27] Board on Agriculture and Natural Resources. Precision Agriculture in the 21st Century: Geospatial
and Information Technologies in Crop Management. National Academy Press, Washington, 1998.
[28] V. Kottapalli, A. Kiremidjian, J. Lynch, E. Carryer, T. Kenny, K. Law, and Y. Lei. Two-tiered wireless
sensor network architecture for structural health monitoring. In Proceedings of the SPIE's 10th
Annual International Symposium on Smart Structures and Materials, 2003.
[29] Raytheon Systems Co. http://www.raytheon.com/
[30] DARPA. http://www.darpa.mil/fcs/index.html
[31] N. Sarwabhotla and S. Seetharamaiah. Intelligent disaster management system for remote villages
in India. In Development by Design, Bangalore, India, 2002.
[32] T. Clouqueur, V. Phipatanasuphorn, P. Ramanathan, and K. Saluja. Sensor deployment strategy
for target detection. In Proceedings of WSNA 02, 2002.
[33] F. Koushanfar, M. Potkonjak, and A. Sangiovanni-Vincentelli. Fault tolerance in wireless ad hoc
sensor networks. In Proceedings of the IEEE International Conference on Sensors, 2002.
[34] C. Karlof and D. Wagner. Secure routing in wireless sensor networks: attacks and counter-
measures. In Proceedings of the IEEE International Workshop on Sensor Network Protocols and
Applications, 2003.
[35] A. Perrig, R. Szewczyk, J.D. Tygar, V. Wen, and D.E. Culler. SPINS: security protocols for sensor
networks. Wireless Networks, 8(5): 521–534, 2002.
[36] H. Cam, S. Ozdemir, D. Muthuavinashiappan, and P. Nair. Energy-efficient security
protocol for wireless sensor networks. In Proceedings of the IEEE VTC Fall 2003 Conference, 2003.
[37] N.R. Potlapally, S. Ravi, A. Raghunathan, and N.K. Jha. Analyzing the energy consumption of
security protocols. In Proceedings of the International Symposium on Low Power Electronics and
Design, 2003.
[38] eCos: open-source real-time operating system for embedded systems. http://sources.redhat.com/ecos/
[39] LynxOS real-time operating system for embedded systems. http://www.lynuxworks.com/
[40] QNX real-time operating system for embedded systems. http://www.qnx.com/
[41] D. Gay, P. Levis, R. von Behren, M. Welsh, E. Brewer, and D. Culler. The nesC language: a holistic
approach to networked embedded systems. In Proceedings of the ACM SIGPLAN 2003 Conference
on Programming Language Design and Implementation, 2003.
[42] J. Henkel and Y. Li. Energy-conscious HW/SW partitioning of embedded systems: a case study
on an MPEG-2 encoder. In Proceedings of the International Workshop on Hardware/Software
Codesign, 1998.
[43] C. Nitsch and U. Kebschull. The use of runtime configuration capabilities for network embedded
systems. In Proceedings of Design, Automation and Test in Europe Conference and Exhibition, 2002.
[44] J. Hill, R. Szewczyk, A. Woo, S. Hollar, D. Culler, and K. Pister. System architecture directions
for networked sensors. In Proceedings of the International Conference on Architectural Support for
Programming Languages and Operating Systems, 2000.
[45] W. Ye, J. Heidemann, and D. Estrin. An energy-efficient MAC protocol for wireless sensor networks.
In Proceedings of INFOCOM 2002: The 21st Annual Joint Conference of the IEEE Computer and
Communications Societies, 2002.
[46] A. Manjeshwar and D.P. Agrawal. TEEN: a protocol for enhanced efficiency in wireless sensor
networks. In Proceedings of the International Workshop on Parallel and Distributed Computing
Issues in Wireless Networks and Mobile Computing, 2001.
[47] A. Manjeshwar and D.P. Agrawal. APTEEN: a hybrid protocol for efficient routing and comprehensive
information retrieval in wireless sensor networks. In Proceedings of the International Workshop on
Parallel and Distributed Computing Issues in Wireless Networks and Mobile Computing, 2002.
[48] OPNET. http://www.opnet.com/
[49] Network simulator 2. http://www.isi.edu/nsnam/ns/
[50] SensorSim. http://nesl.ee.ucla.edu/projects/sensorsim/
[51] NESLsim. http://www.ee.ucla.edu/saurabh/NESLsim/
[52] S. Park, A. Savvides, and M. Srivastava. SensorSim: a simulation framework for sensor networks.
In Proceedings of MSWiM 2000, 2000.
[53] R.K. Gupta and S.Y. Liao. Using a programming language for digital system design. IEEE Design
and Test of Computers, 14(2): 72–80, April 1997.
[54] SystemC. http://www.systemc.org
[55] N. Drago, F. Fummi, and M. Poncino. Modeling network embedded systems with NS-2 and
SystemC. In Proceedings of ICCSC: Circuits and Systems for Communication, 2002, pp. 240–245.
[56] R.K. Gupta and G. De Micheli. Hardware–software cosynthesis for digital systems. IEEE Design
and Test of Computers, 10(3): 29–41, July 1993.
[57] G. De Micheli and R.K. Gupta. Hardware/software co-design. Proceedings of the IEEE, 85: 349–365, 1997.
[58] R. Ernst and J. Henkel. Hardware–software codesign of embedded controllers based on hardware
extraction. In Proceedings of the International Workshop on Hardware/Software Codesign, 1992.
[59] J. Henkel and R. Ernst. A hardware–software partitioner using a dynamically determined
granularity. In Proceedings of the Design Automation Conference, 1997.
[60] A. Dasdan, D. Ramanathan, and R.K. Gupta. A timing-driven design and validation methodology
for embedded real-time systems. ACM Transactions on Design Automation of Electronic Systems,
3: 533–553, 1998.
[61] A. Dasdan, D. Ramanathan, and R.K. Gupta. Rate derivation and its applications to reactive,
real-time embedded systems. In Proceedings of the Design Automation Conference, 1998.
[62] S. Gupta, R.K. Gupta, N.D. Dutt, and A. Nicolau. SPARK: A Parallelizing Approach to the High-Level
Synthesis of Digital Circuits. Kluwer Academic Publishers, Dordrecht, 2004.
[63] S. Gupta, N.D. Dutt, R.K. Gupta, and A. Nicolau. SPARK: a high-level synthesis framework for
applying parallelizing compiler transformations. In Proceedings of the International Conference on
VLSI Design, 2003.
[64] M. Luthra, S. Gupta, N.D. Dutt, R.K. Gupta, and A. Nicolau. Interface synthesis using memory
mapping for an FPGA platform. In Proceedings of the International Conference on Computer Design,
October 2003.
30
Middleware Design and Implementation for Networked Embedded Systems
Venkita Subramonian
and Christopher Gill
Washington University
30.1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-1
Multiple Design Dimensions • Networked Embedded Systems
Middleware • Example Application: Ping-Node Scheduling
for Active Damage Detection • Engineering Life-Cycle •
Middleware Design and Implementation Challenges
30.2 Middleware Solution Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-5
30.3 ORB Middleware for Networked Embedded
Systems – A Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-7
Message Formats • Object Adapter • Message Flow
Architecture • Time-Triggered Dispatching • Priority
Propagation • Simulation Support
30.4 Design Recommendations and Trade-Offs . . . . . . . . . . . . 30-12
30.5 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-13
30.6 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-13
Acknowledgments. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-14
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-14
30.1 Introduction
Networked embedded systems support a wide variety of applications, ranging from temperature
monitoring to battlefield strategy planning [1]. Systems in this domain are characterized by the following
properties:
1. Highly connected networks.
2. Numerous memory-constrained end-systems.
3. Stringent timeliness requirements.
4. Adaptive online reconfiguration of computation and communication policies and mechanisms.
This work was supported in part by the DARPA NEST (contract F33615-01-C-1898) and PCES (contract
F33615-03-C-4111) programs.
Networked embedded systems challenge assumptions about resource availability and scale made by
classical approaches to distributed computing, and thus represent an active research area with many open
questions. For example, advances in Micro-Electro-Mechanical Systems (MEMS) hardware technology
have made it possible to move software closer to physical sensors and actuators to make more intelligent use
of their capabilities. To realize this possibility, however, new networked embedded systems technologies
are needed. For example, hardware infrastructure for such systems may consist of a network of hundreds
or even thousands of small microcontrollers, each closely associated with local sensors and actuators.
30.1.1 Multiple Design Dimensions
The following four dimensions drive the design choices for development of many networked embedded
systems:
1. Temporal predictability
2. Distribution
3. Feature richness
4. Memory constraints
There is often a contravariant relationship between some of these design forces. For example, the left
side of Figure 30.1 illustrates that feature richness may suffer when footprint is reduced. Similarly, a real-
time embedded system's temporal performance must be maintained even when more or fewer features
are supported, as illustrated by the right side of Figure 30.1.
Significant research has gone into each of these individual design dimensions and has resulted in a
wide range of products and technologies. Research on the Embedded Machine [2] and Kokyu [3] mainly
addresses the real-time dimension. The CORBA Event service [4], Real-time Publish/Subscribe [5], and
Distributable Threads [6] provide alternative programming models that support both one-to-many and
one-to-one communication and hence address the distribution dimension. Small-footprint middleware
is the main focus of eORB [7] and UCI-Core [8]. TAO [9] and ORBexpress RT [10] are general-purpose
CORBA implementations that provide real-time and distribution features for a wide variety of application
domains.
30.1.2 Networked Embedded Systems Middleware
General-purpose middleware is increasingly taking the role that operating systems held three decades ago.
Middleware based on standards such as CORBA [11], EJB [12], COM [13], and Java RMI [14] now caters to
the requirements of a broad range of distributed applications such as banking transactions [15,16], online
stock trading [17], and avionics mission computing [18]. Different kinds of general-purpose middleware
have thus become key enabling technologies for a variety of distributed applications.
FIGURE 30.1 Features, footprint, and performance.
To meet the needs of diverse applications, general-purpose middleware solutions have tended to support
a breadth of features. In large-scale applications, layers of middleware have been added to provide different
kinds of services [18].
However, simply adding features breaks down for certain kinds of applications. In particular, features are
rarely innocuous in applications with requirements for real-time performance or small memory footprint.
Instead, every feature of an application and its supporting middleware is likely either to contribute to or
detract from the application in those dimensions. Therefore, careful selection of features is crucial for
memory-constrained and real-time networked embedded systems.
As middleware is applied to a wider range of networked embedded systems, a fundamental tension
between breadth of applicability and customization to the needs of each application becomes increasingly
apparent. To resolve this tension, special-purpose middleware must be designed to address the following
two design forces:
1. The middleware should provide common abstractions that can be reused across different
applications in the same domain.
2. It should then be possible to make fine-grained modifications to tailor the middleware to the
requirements of each specific application.
In the following section, we describe a motivating example application and the design constraints it
imposes. In Section 30.1.4, we describe additional design constraints imposed by the engineering life-cycle
for this application.
30.1.3 Example Application: Ping-Node Scheduling for
Active Damage Detection
To illustrate how application domain constraints drive the design of special-purpose middleware, we now
describe a next-generation aerospace application [19], in which a number of MEMS sensor/actuator nodes
are mounted on a surface of a physical structure, such as an aircraft wing. The physical structure may
be damaged during operation, and the goal of this application is to detect such damage when it occurs.
Vibration sensor/actuator nodes are arranged in a mesh with (wired or wireless) network connectivity
to a xed number of neighboring nodes. To detect possible damage, selected actuators called ping nodes
generate vibrations that propagate across the surface of the physical structure. Sensors within a dened
neighborhood can then detect possible damage near their locations by measuring the frequencies and
strengths of these induced vibrations. The sensors convey their data to other nodes in the system, which
aggregate data from multiple sensors, process the data to detect damage, and issue alerts or initiate
mitigating actions accordingly.
Three restrictions on the system make the problem of damage detection difficult. First, the
sensor/actuator nodes are resource-constrained. Second, two vibrations whose strengths are above a
certain threshold at a given sensor location will interfere with each other. Third, sensor/actuator nodes
may malfunction over time. These constraints, therefore, require that the actions of two overlapping ping
nodes be synchronized so that no interfering vibrations will be generated at a sensor location at any time.
This damage detection problem can be captured by a constraint model. Scheduling the activities of
the ping nodes can be formulated as a distributed graph coloring problem. A color in the graph coloring
problem corresponds to a specic time slot in which a ping node vibrates. Thus two adjacent nodes in the
graph, each representing an actuator, cannot have the same color since the vibrations from these actuators
would then interfere with each other. The number of colors is therefore the length (in distinct time slots)
of a schedule. The problem is to find a shortest schedule such that the ping nodes do not interfere with
one another, in order to minimize damage detection and response times. Distributed algorithms [20] have
been shown to be effective for solving the distributed constraint satisfaction problem in such large-scale
and dynamic¹ networks.
¹For example, with occasional reconfiguration due to sensor/actuator failures online.
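A centralized greedy coloring gives the flavor of the problem (illustrative Python; the actual system uses distributed constraint-satisfaction algorithms [20], and the mesh below is invented): adjacent ping nodes, whose vibrations would interfere, must receive different time slots, and the number of distinct slots used is the schedule length.

```python
def greedy_color(adjacency):
    """Assign each ping node the smallest time slot (color) not used
    by any already-colored neighbor. adjacency: node -> set of nodes."""
    slots = {}
    for node in sorted(adjacency):        # deterministic visiting order
        taken = {slots[n] for n in adjacency[node] if n in slots}
        slot = 0
        while slot in taken:
            slot += 1
        slots[node] = slot
    return slots

# Invented 2x2 mesh of ping nodes; edges connect interfering neighbors.
mesh = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A", "D"},
    "D": {"B", "C"},
}
slots = greedy_color(mesh)
schedule_length = max(slots.values()) + 1   # distinct time slots needed
```

For this mesh, the diagonal pairs {A, D} and {B, C} can ping simultaneously, so two time slots suffice; minimizing the slot count in general is the (NP-hard) graph coloring problem.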
30.1.4 Engineering Life-Cycle
Large-scale networked embedded systems are often expensive and time consuming to develop, deploy,
and test. Allowing separate development and testing of the middleware and the target system hardware
can reduce development costs and cycle times. However, this separation imposes additional design and
implementation challenges for special-purpose middleware.
For example, to gauge performance of the distributed ping-scheduling algorithm in the actual system,
physical, computational, and communication processes must be simulated for hundreds of nodes at
once. For physical processes, tools such as Matlab or Simulink must be integrated within the simulation
environment. Computation should be performed using the actual software that will be deployed in the
target system. However, that software may be run on signicantly different, and often fewer, actual
end-systems in the simulation environment than in the target system. Similarly, communication in the
simulation environment will often occur over conventional networks, such as switched Ethernet, which
may not be representative of the target system's network.
The following issues must be addressed in the design and implementation of middleware that is suitable
for both the simulation and target system environments:
We need to use as much of the software that will be used in the target system as possible in the
simulation environment. This helps us to obtain relatively faithful metrics about the application
and middleware that will be integrated with the target system.
We need to allow arbitrary configurations for the simulation. The hardware and software configuration
may be different for each machine used to run the simulation, and different kinds and
numbers of target system nodes may be simulated on each machine.
Simple time scaling will not work since it does not guarantee that the nodes are synchronized. First,
it is not practical to require that all the computation and communication times are known a priori,
since one function of the simulation may be to gauge those times. Moreover, even if we could scale
the time to a safe upper bound, the wall-clock time it takes to run the simulation would likely be
prohibitively large.
Because of the heterogeneous configuration of the simulation environment, some simulated nodes
might run faster than others, leading to causal inconsistencies in the simulation [21,22].
Additional infrastructure is thus necessary to encapsulate the heterogeneity of different simulation
environments and to simulate real-time performance on top of general-purpose operating systems
and networks, with simulation of physical processes in the loop.
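One standard remedy for such causal inconsistencies is to order events by logical rather than wall-clock time. A minimal Lamport-clock sketch (Python; illustrative only, not the infrastructure described in this chapter) shows how a message can never be processed "before" it was sent, even when hosts run at different speeds:

```python
class LamportNode:
    """Logical clock per simulated node: ticks on local events, and on
    receive jumps past the sender's timestamp, preserving causal order."""
    def __init__(self, name):
        self.name = name
        self.clock = 0

    def local_event(self):
        self.clock += 1
        return self.clock

    def send(self):
        self.clock += 1
        return self.clock            # timestamp carried by the message

    def receive(self, msg_timestamp):
        # Jump past the sender's timestamp, then tick for the receive.
        self.clock = max(self.clock, msg_timestamp) + 1
        return self.clock


fast = LamportNode("fast")           # node simulated on a fast host
slow = LamportNode("slow")           # node simulated on a slow host

for _ in range(10):                  # the fast host races ahead locally
    fast.local_event()
ts = fast.send()                     # fast sends at logical time 11
recv_time = slow.receive(ts)         # slow's clock jumps to 12
```

The slow node's clock is forced past the sender's timestamp on delivery, so the receive is causally ordered after the send regardless of how far the hosts' physical clocks have drifted apart.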
30.1.5 Middleware Design and Implementation Challenges
To facilitate exchanges of information between nodes as part of the distributed algorithm, a middleware
framework that provides common services, such as remote object method invocation, is needed. Two key
factors that motivate the development of ORB (Object Request Broker)-style middleware for networked
embedded systems are (1) remote communication and (2) location independence.
Remote communication: Even though a fixed physical topology may connect a group of sensor/actuator
components, the logical grouping of these components may not strictly follow the physical grouping.
Location independence: The behavior of communicating components should be independent of their
location to the extent possible. True location independence may not be achievable in all cases,
for example, due to timing constraints or explicit coupling to physical sensors or actuators. However,
the implementation of object functionality should be decoupled from the question of whether it
accesses other objects remotely or locally where appropriate. The programming model provided
to the object developer should thus provide a common programming abstraction for both remote
and local access.
In summary, the key challenges we faced in the design and implementation of special-purpose
middleware to address the application domain constraints described in Sections 30.1.3 and 30.1.4 are to:
Reuse existing infrastructure: We want to avoid developing new middleware from scratch. Rather, we
want to reuse prebuilt infrastructure to the extent possible.
Provide real-time assurances: The performance of middleware itself must be predictable to allow
application-level predictability.
Provide a robust DOC middleware: We chose the DOC communication paradigm since it offers direct
communication among remote and local components, thus increasing location independence.
Reduce middleware footprint: The target for this middleware is memory-constrained embedded
microcontroller nodes.
Support simulation environments: Simulations should be done with the same application software and
middleware intended for deployment on the target. The middleware should also be able to deal
with heterogeneous simulation testbeds, that is, different processor speeds, memory resources, etc.
30.2 Middleware Solution Space
General-purpose CORBA implementations, such as TAO [23], offer feature sets that are determined
a priori. Furthermore, faithful implementation of the entire CORBA standard increases the number of
features supported by ORBs and hence results in increased footprint for the application. In the case of
memory-constrained networked embedded applications, this can become prohibitively expensive.
We instead want to get only the features that we need. The selection of features for our special-purpose
middleware implementation was strictly driven by the unique requirements of the application domain.
Two approaches to developing special-purpose middleware must then be considered:
Top-down: Subdividing existing general-purpose middleware frameworks, for example, TAO [9].
Bottom-up: Composing special-purpose middleware from lower-level infrastructure, for example,
ACE [24].
Both approaches seek to balance reuse of features with customization to application-specific requirements.
The top-down approach is preferred when the number and kinds of features required are close to those
offered by a general-purpose middleware implementation. In this case, the policy and mechanism
options provided by the general-purpose middleware can be adjusted to fit the requirements of the application.
In general, this has been the approach used to create and refine features for real-time performance in TAO.
On the other hand, if the number or kinds of middleware features required differ significantly from
those available in general-purpose middleware, as is the case with many networked embedded systems
applications, then a bottom-up approach is preferable. This preference rests largely on the observation that,
in our experience, lower-level infrastructure abstractions are less interdependent and thus more easily
decoupled than higher-level ones. It is therefore easier to achieve highly customized solutions by composing
middleware from primitive infrastructure elements [25,26] than by trying to extract the appropriate subset
directly from a general-purpose middleware implementation.
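To make the bottom-up approach concrete, consider composing a service from one primitive infrastructure element: a Reactor-style event demultiplexer, the kind of low-level abstraction ACE provides. Because the primitive carries no CORBA baggage, only the features actually needed end up in the composition. The sketch below is illustrative (Python for brevity, not ACE's real C++ API):

```python
import select
import socket

class Reactor:
    """Demultiplexes socket-readiness events to registered handlers."""
    def __init__(self):
        self._handlers = {}   # fd -> (socket, callback)

    def register(self, sock, callback):
        self._handlers[sock.fileno()] = (sock, callback)

    def handle_events(self, timeout=0.1):
        socks = [s for s, _ in self._handlers.values()]
        readable, _, _ = select.select(socks, [], [], timeout)
        for sock in readable:
            _, callback = self._handlers[sock.fileno()]
            callback(sock)

def make_echo_pair():
    """Compose a tiny echo service from the reactor primitive alone."""
    a, b = socket.socketpair()
    reactor = Reactor()
    reactor.register(b, lambda s: s.sendall(s.recv(1024)))
    return a, reactor
```

A custom ORB core would register its transport endpoints with such a reactor and layer request demarshaling and dispatching on top, pulling in only the mechanisms the application domain demands.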
Modern software development relies heavily on reuse. Given a problem and a space of possible solutions,
we first try to see whether the problem can be solved directly from an existing solution to a similar problem.
Taking this view, we compared the challenges described in Section 30.1.5 to existing middleware solutions,
as shown in Table 30.1.
TAO [9,23] and eORB [27,28] appeared to be the most suitable candidate solutions based on the
requirements of our target application described in Section 30.1.3. TAO is a widely used standards-compliant
ORB built using the Adaptive Communication Environment (ACE) framework [24,29]. In addition
to a predictable and optimized [30,31] ORB core [32], protocols [33,34], and dispatching [35,36]
infrastructure, TAO offers a variety of higher-level services [37,38]. eORB [7] is a commercial CORBA
ORB developed for embedded systems, especially in the telecommunications domain.
[TABLE 30.1 Mapping challenges to candidate middleware solutions; only partially recoverable here. Legible rows include: reduced middleware footprint — UCI-Core, eORB; simulated real-time behavior — TAO?, Kokyu?]
[FIGURE 30.2 Reuse from existing frameworks: nORB builds on ACE (network programming primitives, patterns, portability), Kokyu (dispatching model, real-time QoS assurance, priority lanes), TAO (IDL compilation strategies, ORB concurrency patterns, ORB core mechanisms), and UCI-Core (minimum ORB feature set).]
The Time-Triggered Architecture (TTA) [54] is designed for fault-tolerant distributed real-time systems.
Within the TTA, all system activities are initiated by the progression of a globally synchronized time base.
This stands in contrast to event-driven systems, in which system activity is triggered by events. The Time-
Triggered Message-Triggered Object (TMO) [55,56] architecture facilitates the design and development of
real-time systems with syntactically simple but semantically powerful extensions of conventional object-
oriented real-time approaches.
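The distinction between time-triggered and event-driven activation can be illustrated with a toy dispatch table: in a time-triggered design, which task runs is a pure function of the (globally synchronized) time base, never of event arrival order. The task names below are hypothetical:

```python
# A static cyclic schedule: slot index -> task, repeated forever.
# In a real TTA system the slot boundaries come from the globally
# synchronized time base; here a tick counter stands in for it.
SCHEDULE = ["read_sensors", "run_control", "send_frame", "idle"]

def task_for_tick(tick: int) -> str:
    """Purely a function of time: no queues, no event arrival order."""
    return SCHEDULE[tick % len(SCHEDULE)]

def trace(n_ticks: int):
    """The activation sequence for the first n_ticks slots."""
    return [task_for_tick(t) for t in range(n_ticks)]
```

Because every node derives the same schedule from the same time base, communication and computation are fully deterministic, which is the basis of the TTA's fault-tolerance arguments.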
30.6 Concluding Remarks
We have described how meeting the constraints of networked embedded systems requires careful analysis
of a representative application, as an essential tool for the development of the special-purpose middleware
itself. In addition, discovering which settings and features are best for an application requires careful
design a priori. It is therefore important to adopt an iterative approach to middleware development
that starts with specic application requirements and takes simulation and experimentation results into
consideration.
By integrating both real-time middleware dispatching and a virtual clock mechanism used for simulation
environments with distribution middleware features, we have shown how to develop special-purpose
middleware solutions that address multiple stages of a networked embedded systems engineering lifecycle.
We have also empirically verified [57] that with nORB the footprint of a statically linked executable
memory image for the ping-node-scheduling application was 30% of the footprint for the same application
built with TAO, while still retaining real-time performance similar to TAO's.
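A virtual clock of the kind mentioned above can be sketched as a logical time source that middleware timers consult instead of the hardware clock; in simulation, time is advanced explicitly, so heterogeneous testbed machines replay an identical timeline regardless of their real speed. This is an illustrative sketch (Python for brevity), not nORB's actual implementation:

```python
import heapq

class VirtualClock:
    """Logical time source shared by middleware timers in simulation."""
    def __init__(self):
        self.now = 0.0
        self._timers = []      # min-heap of (deadline, seq, callback)
        self._seq = 0          # tie-breaker so callbacks never compare

    def schedule(self, delay, callback):
        """Arm a timer relative to the current logical time."""
        heapq.heappush(self._timers, (self.now + delay, self._seq, callback))
        self._seq += 1

    def advance(self, dt):
        """Advance logical time, firing expired timers in deadline order."""
        end = self.now + dt
        while self._timers and self._timers[0][0] <= end:
            deadline, _, callback = heapq.heappop(self._timers)
            self.now = deadline
            callback()
        self.now = end
```

The same application and middleware code runs unchanged on the target, where the clock object simply delegates to the hardware timer.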
Acknowledgments
We gratefully acknowledge the support and guidance of the Boeing NEST OEP Principal Investigator
Dr. Kirby Keller and Boeing Middleware Principal Investigator Dr. Doug Stuart. We also wish to
thank Dr. Weixiong Zhang at Washington University in St. Louis for providing the initial algorithm
implementation used in ping scheduling.
References
[1] D. Estrin, D. Culler, K. Pister, and G. Sukhatme. Connecting the physical world with pervasive
networks. IEEE Pervasive Computing, 1: 59–69, 2002.
[2] T. Henzinger, C. Kirsch, R. Majumdar, and S. Matic. Time safety checking for embedded programs.
In Proceedings of the Second International Workshop on Embedded Software (EMSOFT). LNCS,
Springer-Verlag, Heidelberg, 2002.
[3] C.D. Gill, R. Cytron, and D.C. Schmidt. Middleware scheduling optimization techniques for
distributed real-time and embedded systems. In Proceedings of the Seventh Workshop on Object-
Oriented Real-Time Dependable Systems. IEEE, San Diego, CA, January 2002.
[4] T.H. Harrison, D.L. Levine, and D.C. Schmidt. The design and performance of a real-time CORBA
event service. In Proceedings of OOPSLA '97. ACM, Atlanta, GA, October 1997, pp. 184–199.
[5] D.C. Schmidt and C. O'Ryan. Patterns and performance of real-time publisher/subscriber
architectures. Journal of Systems and Software, Special Issue on Software Architecture Engineering
Quality Attributes, 66(3): 213–223, 2002.
[6] Y. Krishnamurthy, C. Gill, D.C. Schmidt, I. Pyarali, L.M.Y. Zhang, and S. Torri. The design and
implementation of real-time CORBA 2.0: dynamic scheduling in TAO. In Proceedings of the 10th
Real-Time and Embedded Technology and Applications Symposium (RTAS '04). IEEE, Toronto, Canada, May 2004.
[7] PrismTech. eORB. URL: http://www.prismtechnologies.com/English/Products/CORBA/eORB/
[8] Manuel Roman. Ubicore: Universally Interoperable Core. www.ubi-core.com/Documentation/
Universally_Interoperable_Core/universally_interoperable_core.html
[9] Institute for Software Integrated Systems. The ACE ORB (TAO), Vanderbilt University.
www.dre.vanderbilt.edu/TAO/
[10] O. Interface. ORBExpress, 2002. www.ois.com
[11] Object Management Group. The Common Object Request Broker: Architecture and Specification,
3.0.2 ed. December 2002. http://www.omg.org/technology/documents/formal/corba_iiop.htm
[12] Sun Microsystems. Enterprise JavaBeans Specification, August 2001. java.sun.com/products/ejb/
docs.html
[13] D. Rogerson. Inside COM. Microsoft Press, Redmond, WA, 1997.
[14] Sun Microsystems, Inc. Java Remote Method Invocation Specification (RMI), October 1998.
http://java.sun.com//j2se/1.3/docs/guide/rmi/spec/rmi-title.html
[15] L.R. David. Online banking and electronic bill presentment payment are cost effective. Published
online by Online Financial Innovations at www.onlinebankreport.com
[16] K. Kang, S. Son, and J. Stankovic. Star: secure real-time transaction processing with timeliness
guarantees. 23rd IEEE Real-Time Systems Symposium, Austin, Texas, 2002, pp. 3–12.
[17] X. Defago, K. Mazouni, and A. Schiper. Highly available trading system: experiments with CORBA.
IFIP International Conference on Distributed Systems Platforms and Open Distributed Processing
(Middleware '98), The Lake District, England, September 15–18, 1998.
[18] D. Corman. WSOA-Weapon systems open architecture demonstration using emerging open
system architecture standards to enable innovative techniques for time critical target (TCT)
prosecution. In Proceedings of the 20th IEEE/AIAA Digital Avionics Systems Conference (DASC),
October 2001.
[19] C. Gill, V. Subramonian, J. Parsons, H.-M. Huang, S. Torri, D. Niehaus, and D. Stuart. ORB
middleware evolution for networked embedded systems. In Proceedings of the Eighth International
Workshop on Object-Oriented Real-time Dependable Systems (WORDS '03). Guadalajara, Mexico,
January 2003.
[20] W. Zhang, G. Wang, and L. Wittenburg. Distributed stochastic search for constraint satisfaction and
optimization: parallelism, phase transitions and performance. In Proceedings of AAAI Workshop
on Probabilistic Approaches in Search, 2002.
[21] L. Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of
the ACM, 21(7): 558–565, 1978.
[22] Nancy A. Lynch. Distributed Algorithms. Morgan Kaufmann Publishers, Inc., San Mateo, California,
1996.
[23] D.C. Schmidt, D.L. Levine and S. Mungee. The Design of the TAO Real-Time Object Request
Broker. Computer Communications, 21(4): 294–324, 1998.
[24] Institute for Software Integrated Systems. The ADAPTIVE Communication Environment (ACE),
Vanderbilt University. www.dre.vanderbilt.edu/ACE/
[25] F. Hunleth, R. Cytron, and C. Gill. Building customizable middleware using aspect oriented
programming. In The OOPSLA 2001 Workshop on Advanced Separation of Concerns in Object-
Oriented Systems. ACM, Tampa Bay, FL, October 2001. www.cs.ubc.ca/kdvolder/Workshops/
OOPSLA2001/ASoC.html
[26] F. Hunleth and R.K. Cytron. Footprint and feature management using aspect-oriented
programming techniques. In Proceedings of the Joint Conference on Languages, Compilers and Tools for
Embedded Systems. ACM Press, 2002, pp. 38–45.
[27] S. Aslam-Mir. Experiences with real-time embedded CORBA in Telecom. In Proceedings of
the OMG's First Workshop on Real-time and Embedded Distributed Object Computing. Object
Management Group, Falls Church, VA, July 2000.
[28] J. Garon. Meeting performance and QoS requirements with embedded CORBA. In Proceedings
of the OMG's First Workshop on Embedded Object-based Systems. Object Management Group,
Santa Clara, CA, January 2001.
[29] D.C. Schmidt. ACE: an object-oriented framework for developing distributed applications. In
Proceedings of the USENIX C++ Technical Conference. USENIX Association, Cambridge, MA,
April 1994.
[30] I. Pyarali, C. O'Ryan, D.C. Schmidt, N. Wang, V. Kachroo, and A. Gokhale. Applying optimization
patterns to the design of real-time ORBs. In Proceedings of the Fifth Conference on Object-Oriented
Technologies and Systems. USENIX, San Diego, CA, May 1999, pp. 145–159.
[31] N. Wang, D.C. Schmidt, and S. Vinoski. Collocation optimizations for CORBA. C++ Report, 11,
47–52, 1999.
[32] D.C. Schmidt, S. Mungee, S. Flores-Gaitan, and A. Gokhale. Alleviating priority inversion and
non-determinism in real-time CORBA ORB core architectures. In Proceedings of the Fourth IEEE
Real-Time Technology and Applications Symposium. IEEE, Denver, CO, June 1998.
[33] A. Gokhale and D.C. Schmidt. Principles for optimizing CORBA internet inter-ORB protocol
performance. In Proceedings of the Hawaiian International Conference on System Sciences. Hawaii,
USA, January 1998.
[34] A. Gokhale and D.C. Schmidt. Optimizing a CORBA IIOP protocol engine for minimal footprint
multimedia systems. Journal on Selected Areas in Communications, Special Issue on Service Enabling
Platforms for Networked Multimedia Systems, 17: 1673–1699, 1999.
[35] A. Gokhale and D.C. Schmidt. Evaluating the performance of demultiplexing strategies for real-
time CORBA. In Proceedings of GLOBECOM '97. IEEE, Phoenix, AZ, November 1997.
[36] I. Pyarali, C. O'Ryan, and D.C. Schmidt. A pattern language for efficient, predictable, scalable,
and flexible dispatching mechanisms for distributed object computing middleware. In Proceedings
of the International Symposium on Object-Oriented Real-Time Distributed Computing (ISORC).
IEEE/IFIP, Newport Beach, CA, March 2000.
[37] T.H. Harrison, C. O'Ryan, D.L. Levine, and D.C. Schmidt. The design and performance of a
real-time CORBA event service. In Proceedings of the 12th ACM SIGPLAN Conference on Object-
Oriented Programming Systems, Languages, and Applications (OOPSLA '97), October 5–9, 1997,
Atlanta, Georgia.
[38] C.D. Gill, D.L. Levine, and D.C. Schmidt. The design and performance of a real-time CORBA
scheduling service. Real-Time Systems, The International Journal of Time-Critical Computing
Systems, Special Issue on Real-Time Middleware, 20: 117–154, 2001.
[39] C. Gill, D.C. Schmidt, and R. Cytron. Multi-paradigm scheduling for distributed real-time
embedded computing. Proceedings of the IEEE, Special Issue on Modeling and Design of Embedded Software, 91:
183–197, 2003.
[40] I. Pyarali and D.C. Schmidt. An overview of the CORBA portable object adapter. ACM
StandardView, 6: 30–43, 1998.
[41] M. Henning and S. Vinoski. Advanced CORBA Programming with C++. Addison-Wesley, Reading,
MA, 1999.
[42] D.C. Schmidt and C. Cleeland. Applying a pattern language to develop extensible ORB
middleware. In Design Patterns in Communications, L. Rising, Ed. Cambridge University Press, London,
2000.
[43] D.C. Schmidt, D.L. Levine, and C. Cleeland. Architectures and patterns for developing high-
performance, real-time ORB endsystems. In Advances in Computers, M. Zelkovitz, Ed., Academic
Press, New York, 1999.
[44] D.C. Schmidt and C.D. Cranor. Half-sync/half-async: an architectural pattern for efficient and
well-structured concurrent I/O. In Proceedings of the Second Annual Conference on the Pattern
Languages of Programs. Monticello, IL, September 1995, pp. 1–10.
[45] D.C. Schmidt, M. Stal, H. Rohnert, and F. Buschmann. Pattern-Oriented Software Architecture:
Patterns for Concurrent and Networked Objects, Vol. 2. John Wiley & Sons, New York, 2000.
[46] C. Liu and J. Layland. Scheduling algorithms for multiprogramming in a hard-real-time
environment. Journal of the ACM, 20, 46–61, 1973.
[47] D.B. Stewart and P.K. Khosla. Real-time scheduling of sensor-based control systems. In Real-
Time Programming, W. Halang and K. Ramamritham, Eds. Pergamon Press, Tarrytown,
NY, 1992.
[48] D.C. Schmidt, S. Mungee, S. Flores-Gaitan, and A. Gokhale. Software architectures for reducing
priority inversion and nondeterminism in real-time object request brokers. Journal of Real-
Time Systems, Special Issue on Real-Time Computing in the Age of the Web and the Internet,
21: 77–125, 2001.
[49] K.M. Chandy and L. Lamport. Distributed snapshots: determining global states of distributed
systems. ACM Transactions on Computer Systems, 3, 63–75, 1985.
[50] I. Pyarali, C. O'Ryan, D.C. Schmidt, N. Wang, V. Kachroo, and A. Gokhale. Using principle patterns
to optimize real-time ORBs. IEEE Concurrency Magazine, 8: 16–25, 2000.
[51] V. Subramonian and C. Gill. A generative programming framework for adaptive middleware.
In Proceedings of the Hawaii International Conference on System Sciences, Software Technology
Track, Adaptive and Evolvable Software Systems Minitrack, HICSS 2003. HICSS, Honolulu, HI,
January 2003.
[52] D. McKinnon, D. Bakken et al. A configurable middleware framework with multiple quality of
service properties for small embedded systems. In Proceedings of the Second IEEE International
Symposium on Network Computing and Applications. IEEE, April 2003.
[53] M. Roman, R.H. Campbell, and F. Kon. Reflective middleware: from your desk to your
hand. IEEE Distributed Systems Online, 2, 2001. http://csdl.computer.org/comp/megs/ds/2001/05/
o5001abs.htm
[54] H. Kopetz. Real-Time Systems: Design Principles for Distributed Embedded Applications. Kluwer
Academic Publishers, Norwell, MA, 1997.
[55] K. Kim. APIs enabling high-level real-time distributed object programming. IEEE Computer
Magazine, Special Issue on Object-Oriented Real-time Computing, 33(6), June 2000.
[56] K. Kim. Object structures for real-time systems and simulators. IEEE Computer Magazine, 30(8),
August 1997.
[57] V. Subramonian, G. Xing, C. Gill, C. Lu, and R. Cytron. Middleware specialization for memory-
constrained networked embedded systems. In Proceedings of the 10th IEEE Real-Time and
Embedded Technology and Applications Symposium (RTAS), 2004.
V
Sensor Networks
31 Introduction to Wireless Sensor Networks
S. Dulman, S. Chatterjea, and P. Havinga
32 Issues and Solutions in Wireless Sensor Networks
Ravi Musunuri, Shashidhar Gandham, and Maulin D. Patel
33 Architectures for Wireless Sensor Networks
S. Dulman, S. Chatterjea, T. Hoffmeijer, P. Havinga, and J. Hurink
34 Energy-Efcient Medium Access Control
Koen Langendoen and Gertjan Halkes
35 Overview of Time Synchronization Issues in Sensor Networks
Weilian Su
36 Distributed Localization Algorithms
Koen Langendoen and Niels Reijers
37 Routing in Sensor Networks
Shashidhar Gandham, Ravi Musunuri, and Udit Saxena
38 Distributed Signal Processing in Sensor Networks
Omid S. Jahromi and Parham Aarabi
39 Sensor Network Security
Guenter Schaefer
40 Software Development for Large-Scale Wireless Sensor Networks
Jan Blumenthal, Frank Golatowski, Marc Haase, and Matthias Handy
31
Introduction to
Wireless Sensor
Networks
S. Dulman,
S. Chatterjea, and
P. Havinga
University of Twente
31.1 The Third Era of Computing 31-1
31.2 What Are Wireless Sensor Networks? 31-2
31.3 Typical Scenarios and Applications 31-3
31.4 Design Challenges 31-5
Locally Available Resources • Diversity and Dynamics • Needed Algorithms • Dependability
31.5 Conclusions 31-9
References 31-9
Wireless Sensor Networks have gained a lot of attention lately. Due to technological advances, building
small-sized, energy-efficient, reliable devices, capable of communicating with each other and organizing
themselves in ad hoc networks, has become possible. These devices have brought a new perspective to
the world of computers as we know it: they can be embedded into the environment in such a way that the
user is unaware of them. There is no need for reconfiguration and maintenance, as the network organizes
itself to inform the users of the most relevant events detected or to assist them in their activity.
This chapter will give a brief overview of the whole area by introducing wireless sensor network
concepts to the reader. Then, a number of applications as well as possible typical scenarios will be presented
in order to better understand the field of application of this new emerging technology. Up to this moment,
several main areas of application have been identified. New areas of application are still to be discovered
as the research and products grow more mature.
Wireless sensor networks bring many challenges and often contradictory demands from the design
point of view. The last part of the chapter will be dedicated to highlighting the main directions of research
involved in this field. It will serve as a brief introduction to the problems to be described in the following
chapters of the book.
31.1 The Third Era of Computing
Things are changing continuously in the world of computers. Everything started with the mainframe
era: some 30 years ago, these huge devices were widely deployed, for example, within universities.
Lots of users made use of a single mainframe computer which they had to share among themselves. The
computation power came together with a high cost and a huge machine requiring a lot of maintenance.
Technology advanced as predicted by Moore's Law and we stepped into the second era of
computers. It is a period that is still present today, but which is slowly approaching its final part. It is the
era of the personal computers, cheaper and smaller, and increasingly affordable. Quite often, the average
user has access to and makes use of more than one computer, these machines being present now in almost
any home and workplace.
But in this familiar environment, things are starting to change and the third era of computing gains
more and more terrain each day. Let us take a look at the main trends today. The technology advancements
cause the personal computers to become smaller and smaller. The desktop computers tend to be replaced
by laptops and other portable devices.
The main factor influencing the new transition is the availability of wireless communication
technology. People are rapidly getting used to wireless communicating devices due to their independence
from fixed machines. The success and availability of the Internet brought even more independence to the
user: the data could now be available regardless of the physical location of its owner.
The advancements in technology did not stop here: the processors became small and cheap enough to
be found now in almost any familiar device around us, starting with an everyday watch and ending with
(almost) any home appliance we own. The new efforts nowadays are to make these devices talk to each
other and organize themselves into ad hoc networks to accomplish their design goal as fast and reliably as
possible.
This is, in fact, the third computer age envisioned two decades ago by Mark Weiser [1]. Several names,
such as ubiquitous computing, pervasive computing, ambient intelligence, invisible computing,
disappearing computer, etc., were created to indicate different aspects of the new computing age (Mark Weiser
himself defined it as the "calm technology" that recedes into the background of our lives).
The ubiquitous computing world brings a reversed view on the usage of computing power: instead of
having lots of users gathered around the mainframe computer, now, each user will be using the services of
several embedded networks. The user will be in the middle of the whole system, surrounded by an invisible
intelligent infrastructure. The original functionality of the objects and applications will be enhanced, and
a continuous interaction will be present in a large variety of areas of daily life.
31.2 What Are Wireless Sensor Networks?
So what are wireless sensor networks and where is their place in this new environment that starts growing
around us?
Wireless sensor networks is the generic name under which a broad range of devices hide. Basically, any
collection of devices equipped with a processor, having sensing and communication capabilities and being
able to organize themselves into a network created in an ad hoc manner falls into this category.
The addition of the wireless communication capabilities to sensors increased their functionality
dramatically. Wireless sensor networks bring monitoring capabilities that will forever change the way
in which data is collected from the ambient environment. Let us take, for example, the traditional
monitoring approach of a remote location for a given phenomenon, such as recording the geological
activity, monitoring the chemical or biological properties of a region, or even monitoring the weather at
a certain place.
The old approach was the following: rather big and robust devices needed to be built. They should
have contained, besides the sensor pack itself, a big power supply and local data storage capabilities.
A team of scientists would have to travel together to the destination to be monitored, place these
expensive devices at predened positions and calibrate all the sensors. Then, they would come back
after a certain amount of time in order to collect the sensed data. If, by misfortune, some hardware
failed, nothing could be done about it, and the information about the phenomenon itself would
be lost.
The new approach is to construct inexpensive, small-sized, energy-efficient sensing devices. As hundreds,
thousands, or even more of these devices will be deployed, the reliability constraints on each of them are
relaxed. No local data storage is needed anymore, as the nodes process data locally and then transmit
wirelessly the observed characteristics of the phenomenon to one or more access points connected to
a computer network. Individual calibration of each sensor node is no longer needed, as it can be performed
by localized algorithms [2]. Deployment also becomes easier, by randomly placing the nodes (e.g., simply
throwing them from a plane) onto the monitored region.
Having this example in mind, we can give a general description of a sensor node. The name sensor node
will be used to describe a tiny device that has short-range wireless communication capability, a small
processor, and several sensors attached to it. It may be powered by batteries, and its main function is to collect
data from a phenomenon, collaborate with its neighbors, and forward its observations (a preprocessed
version of the data, or even decisions) to the endpoint if requested. This is possible because its processor
additionally contains the code that enables internode communication and the setting up, maintenance, and
reconfiguration of the wireless network. When referring to wireless communication, we have in mind
mainly radio communication (other means such as ultrasound, visible or infrared light, etc., are also
being used [3]). A sensor network is a network made up of large numbers of sensor nodes. By a large
number we understand at this moment hundreds or thousands of nodes, but there are no exact limits for
the upper bound of the number of sensors deployed.
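The "preprocessed version of the data" forwarded by a node is typically an in-network aggregate: each node merges its own samples with the summaries received from neighbors and forwards only a compact summary toward the access point. A toy illustration (the summary format here is an assumption made for the example):

```python
def aggregate(own_readings, neighbor_summaries):
    """Combine local samples with neighbor summaries into one summary.

    Each summary is a (count, total) pair: the mean is recoverable at
    the sink, yet only two numbers travel per hop instead of every
    raw sample, saving radio energy.
    """
    count = len(own_readings)
    total = sum(own_readings)
    for n_count, n_total in neighbor_summaries:
        count += n_count
        total += n_total
    return count, total

def mean_at_sink(summary):
    """Recover the network-wide mean from the final summary."""
    count, total = summary
    return total / count
```

Richer aggregates (min/max, histograms, event detections) follow the same pattern: the node transmits a digest whose size is independent of the number of contributing sensors.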
Wireless sensor networks are one of the most important tools of the third era of computing. They are the
simplest intelligent devices around, their main purpose being monitoring the environment surrounding
us and alerting us of the main events happening. Based on the observations reported by these instruments,
humans and machines can make decisions and act on them.
31.3 Typical Scenarios and Applications
At this moment, a large variety of sensors exist. Sensors have been developed to monitor almost every aspect
of the ambient world: lighting conditions, temperature, humidity, pressure, the presence or absence of
various chemical or biological products, detection of presence and movement, etc. By networking large
numbers of sensors and deploying them inside the phenomenon to be studied, we obtain a sensing tool
far more powerful than any single sensor, one able to sense at a superior level.
A first classification of wireless sensor networks can be made based on the complexity of the networks
involved [4]:
Intelligent warehouse. Each item contained inside the warehouse will have a tag attached that will be
monitored by the sensor nodes embedded into the walls and shelves. Based on the data read, knowledge
of the spatial positioning of the sensors, and time information, the sensor network will offer information
about the traffic of goods inside the building, create automatic inventories, and even perform long-term
correlations on the data read. The need for manual product scanning will thus disappear.
In this category we can include the scenario of the modern supermarket, where the selected products of
the customers will automatically be identified at the exit of the supermarket. This scenario also has the
minimum complexity. The sensor nodes are placed at fixed positions, in a more or less random manner.
The deployment area is easily accessible and some infrastructure (e.g., power supplies and computers)
already exists. At the same time, the nodes are operating in a safe environment, meaning that there are
no major external factors that can influence or destroy them.
Environmental monitoring. This is the widest area of application envisioned up to now. A particular
application in this category is disaster monitoring. The sensor nodes deployed in the affected areas will
help humans estimate the effects of the disaster, build maps of the safe areas, and direct the human actions
toward the affected regions. A large number of applications in this category address monitoring of the
wild life. This scenario has an increased complexity. The area of deployment is no longer accessible in an
easy manner and no longer safe for the sensor nodes. There is hardly any infrastructure present, nodes
have to be scattered around in a random manner and the network might contain moving nodes. Also
a larger number of nodes will have to be deployed.
Very-large-scale sensor network applications. Consider the scenario of a large city where all the cars have
integrated sensors. These sensor nodes will communicate with each other, collecting information about the traffic,
routes, and special traffic conditions. On the one hand, new information will be available to the driver of each
car. On the other hand, a global view of the whole picture will also be available. The two main constraints
that characterize this scenario are the large number of nodes and their high mobility. The algorithms
employed will have to scale well and deal with a network with a continuously changing topology.
On the other hand, the authors of Reference 5 present a classication of sensor networks based on their
area of application. It takes into consideration only the military, environment, health, home, and other
commercial areas and can be extended with additional categories, such as space exploration, chemical
processing, and disaster relief.
Military applications. Factors such as rapid deployment, self-organization, and increased fault
tolerance make wireless sensor networks a very good candidate for use in the military field. They
are suited for deployment in battlefield scenarios due to the large size of the network and the
automatic self-reconfiguration at the moment of destruction/unavailability of some sensor nodes [6]. Typical
applications are: the monitoring of friendly forces, equipment, and ammunition; battlefield surveillance;
reconnaissance of opposing forces and terrain; targeting and battle damage assessment; and nuclear,
biological, and chemical attack detection and reconnaissance. A large number of projects have already
been sponsored by the Defense Advanced Research Projects Agency (DARPA) [7].
Environmental applications. Several aspects of wildlife are being studied with the help of sensor
networks. Existing applications include the following: monitoring the presence and the movement of
birds, animals, and even insects; agriculture-related projects observing the conditions of crops and
livestock; environmental monitoring of soil, water, and atmosphere, and pollution studies; etc.
Other particular examples include forest fire monitoring, biocomplexity mapping of the environment,
and flood detection. Ongoing projects at this moment include the monitoring of birds on Great Duck
Island [8], the zebras in Kenya [9], and the redwoods in California [10]. The number of these applications is
continuously increasing as the first deployed sensor networks show the benefits of easy remote monitoring.
Healthcare applications. An increasing interest is being shown in the elderly population [11]. Sensor
networks can help in several areas of the healthcare field. The monitoring can take place both at home and
in hospitals. At home, patients can be under permanent monitoring, and the sensor networks will trigger
alerts whenever there is a change in the patient's state. Systems that can detect patients' movement behavior at
home, detect any fall, or remind them to take their prescriptions are being studied. Inside hospitals,
sensor networks can be used to track the position of doctors and patients (their status or even
errors in the medication), expensive hardware, etc. [12].
Home applications. The home is the perfect application domain for the pervasive computing field.
Imagine all the electronic appliances forming a network and cooperating to fulfill the needs of
the inhabitants [13]. They will have to identify each user correctly, remember their preferences and their
habits, and at the same time, monitor the entire house for unexpected events. The sensor networks also
have an important role here, being the eyes and the ears that will trigger the actuator systems.
Other commercial applications. This category includes all the other commercial applications, envisioned
or already built, that do not fit in the previous categories. Basically, they range from simple systems, such as
environmental monitoring within an office, to more complex applications, such as managing inventory
control and vehicle tracking and detection. Other examples include incorporating sensors into toys and
thus detecting the position of the children in smart kindergartens [14]; monitoring the material fatigue
and the tensions inside the walls of a building; etc.
The number of research projects dedicated to wireless sensor networks has increased dramatically over
the last few years. A great deal of effort has been invested in studying all possible aspects of wireless sensor networks.
2006 by Taylor & Francis Group, LLC
Introduction to Wireless Sensor Networks 31-5
TABLE 31.1 List of Sensor Networks Related Research Projects
Project name Research area
CoSense [15] Collaborative sensemaking (target recognition, condition monitoring)
EYES [16] Self-organizing, energy-efficient sensor networks
PicoRadio [17] Low-cost, energy-efficient transceivers
SensoNet [18] Protocols for sensor networks
Smart Dust [19] Cubic-millimeter sensor nodes
TinyDB [20] Query processing system
WINS [21] Distributed network access to sensors, controls, and processors
TABLE 31.2 Current Sensor Networks Companies List
Company name Headquarters location HTTP address
Ambient Systems The Netherlands http://www.ambient-systems.net
Crossbow San Jose, CA http://www.xbow.com
Dust Networks Berkeley, CA http://dust-inc.com
Ember Boston, MA http://www.ember.com
Millennial Net Cambridge, MA http://www.millennial.net
Sensoria Corporation San Diego, CA http://www.sensoria.com
Xsilogy San Diego, CA http://www.xsilogy.com
Please refer to Table 31.1 for a few examples. Also, a number of companies have been created, most of them
start-ups from universities that perform research in the field. Some of the names in the field, valid at
the time of writing, are listed in Table 31.2.
31.4 Design Challenges
When designing a wireless sensor network, one faces, on the one hand, the simplicity of the underlying
hardware and, on the other hand, the requirements that have to be met. In order to satisfy them, new
strategies and new sets of protocols have to be developed [22-24]. In the following paragraphs we will
address the main challenges present in the wireless sensor network field. The research directions
involved and the open questions that still need to be answered will be presented as well.
To begin with, a high-level description of the current goals for sensor networks can be
synthesized as follows:
Long life. The sensor node should be able to live as long as possible using its own batteries. This
constraint can be translated to a power consumption below 100 µW. The condition arises from the assumption
that the sensor nodes will be deployed in a harsh environment where maintenance is either impossible or
has a prohibitively high price. It makes sense to maximize the battery lifetime (unless the sensor nodes
use some form of energy scavenging). The targeted lifetime of a node powered by two AA batteries is
a couple of years. This goal can be achieved only by applying a strict energy policy that makes use of
power-saving modes and dynamic voltage scaling techniques.
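As a sanity check on the power budget above, a back-of-the-envelope sketch; the cell capacity and count are assumptions (typical AA alkaline figures), not values given in the text:

```python
# Ideal lifetime of a node drawing 100 uW average from two AA cells.
capacity_mAh = 2500          # assumed per-cell capacity
cells = 2
voltage = 1.5                # nominal cell voltage
energy_J = cells * capacity_mAh / 1000 * 3600 * voltage  # mAh -> C -> J
power_W = 100e-6             # the 100 uW budget from the text
lifetime_years = energy_J / power_W / (365 * 24 * 3600)
print(f"{lifetime_years:.1f} years")  # ~8.6 years, ignoring overheads and self-discharge
```

Real deployments fall well short of this ideal, which is consistent with the "couple of years" target stated above.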
Small size. The size of the device should be below 1 mm³. This constraint gave the sensor nodes the name of
smart dust, a name that gives a very intuitive idea about the final design. Recently, the processor and the radio
were integrated in a chip having a size of 1 mm³. What is left is the antenna, the sensors themselves,
and the battery. Advances are required in each of these three fields in order to be able to meet this design
constraint.
Inexpensive. The third high-level design constraint concerns the price of these devices. In order to
encourage large-scale deployment, this technology must be very cheap, meaning that the targeted prices
are in the range of a couple of cents per node.
31.4.1 Locally Available Resources
Wireless sensor networks consist of thousands of devices working together. Their small size also comes with
the disadvantage of very limited resource availability (limited processing power, low-rate unreliable wireless
communication, small memory footprint, and low energy). This raises the issue of designing a new
set of protocols across the whole system.
Energy is of special importance and can by far be considered the most important design constraint.
The sensor nodes will be mainly powered by batteries. In most of the scenarios, due to the environment
where they will be deployed, it will be impossible to have a human change their batteries. In some designs,
energy-scavenging techniques will also be employed. Still, the amount of energy available to the nodes
can be considered limited, and this is why the nodes will have to employ energy-efficient algorithms to
maximize their lifetime.
By taking a look at the characteristics of the sensor nodes, we notice that energy is spent on three
main functions: environment sensing, wireless communication, and local processing. Each of these three
components will have to be optimized in order to obtain minimum energy consumption. For the
environment-sensing component, the most energy-efficient available sensors have to be used. From this
point of view, we can regard this component as a function of a specific application and a given sensor
technology.
The energy needed for transmitting data over the wireless channel dominates by far the energy
consumption inside a sensor node. Moreover, it was previously shown that it is more efficient to use
a short-range multihop transmission scheme than to send data over large distances [5]. A new strategy,
based on a trade-off between the last two components, was developed and is, in fact, one of the
defining characteristics of sensor networks (see, e.g., the techniques developed in
References 25 and 26). Instead of blindly routing packets through the network, the sensor nodes will act
based on the content of the packet [27].
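The short-range multihop claim above can be illustrated with the widely used first-order radio model; the constants `e_elec` and `e_amp`, the path-loss exponent, and the distances below are illustrative assumptions, not figures from the text:

```python
# Energy for one long hop vs. several short hops under a d**alpha path-loss model.
def tx_energy(d, bits=1000, e_elec=50e-9, e_amp=100e-12, alpha=2):
    """Transmit energy (J) for `bits` over distance d meters:
    per-bit electronics cost plus amplifier cost growing as d**alpha."""
    return bits * (e_elec + e_amp * d**alpha)

direct = tx_energy(100)        # one 100 m hop
multihop = 4 * tx_energy(25)   # four 25 m hops (relay receive cost ignored here)
print(direct, multihop)
```

With alpha = 2 the four short hops already win; the advantage grows quickly for alpha = 3 or 4, although the receive cost at each relay narrows the gap in practice.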
Let us suppose that a certain event took place. All nodes that sensed it will characterize the event with
some piece of data that needs to be sent to the interested nodes. There will be many similar data packets,
or at least, some redundancy will exist in the packets to be forwarded. In order to reduce the traffic,
each node on the communication path will examine the contents of the packets it has to forward. It will
then aggregate all the data related to a particular event into one single packet, eliminating the redundant
information. The reduction of traffic by using this mechanism is substantial. Another consequence of
this mechanism is that the user will not receive any raw data, but only high-level characterizations of the
events. This makes us think of the sensor network as a self-contained tool, a distributed network that
collects and processes information.
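The aggregation step described above can be sketched as follows; the (event id, value) packet format and the averaging rule are hypothetical simplifications of what a real protocol such as directed diffusion [27] does:

```python
from collections import defaultdict

def aggregate(packets):
    """Merge reports about the same event into one summary packet.
    packets: list of (event_id, value) tuples received from downstream nodes.
    Returns one (event_id, mean_value, report_count) tuple per event."""
    groups = defaultdict(list)
    for event_id, value in packets:
        groups[event_id].append(value)
    # Forward one packet per event instead of one per reporting node.
    return [(eid, sum(vs) / len(vs), len(vs)) for eid, vs in groups.items()]

print(aggregate([("fire", 80.0), ("fire", 84.0), ("door", 1.0)]))
```

Three inbound packets become two outbound ones here; in a dense network where many neighbors sense the same event, the traffic reduction is far larger.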
From an algorithmic point of view, the local strategies employed by sensor nodes have as a global goal
to extend the overall lifetime of the network. The notion of network lifetime usually hides one
of the following interpretations: the time elapsed between power-on and a particular
event, such as the energy depletion of the first node or of 30% of the nodes, or even the moment when
the network is split into several subnetworks. No matter which of these concepts is used, the nodes
will choose to participate in the collaborative protocols following a strategy that maximizes the overall
network lifetime.
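The first two lifetime interpretations can be written directly as functions of per-node battery-depletion times; the input list below is made up for illustration (the partition-based definition needs topology information and is omitted):

```python
def first_node_death(death_times):
    """Lifetime = time at which the first node depletes its battery."""
    return min(death_times)

def fraction_dead(death_times, fraction=0.3):
    """Lifetime = time at which `fraction` of the nodes have died."""
    k = max(1, int(len(death_times) * fraction))
    return sorted(death_times)[k - 1]

times = [120, 300, 310, 450, 500, 520, 800, 900, 950, 1000]  # hypothetical hours
print(first_node_death(times), fraction_dead(times))  # 120 310
```

The two metrics can differ enormously, which is why a protocol tuned for one definition may look poor under another.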
To be able to meet the goal of prolonged lifetime, each sensor node should:
Spend all the idle time in a deep power-down mode, thus using an insignificant amount of energy.
When active, employ scheduling schemes that take into consideration voltage and frequency scaling.
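The first point can be quantified with a simple duty-cycling model; the active and sleep power draws below are assumed ballpark figures for early mote-class hardware, not values from the text:

```python
# Average power of a node that sleeps most of the time.
def avg_power_W(active_fraction, p_active_W=30e-3, p_sleep_W=3e-6):
    """Time-weighted average of active and deep-sleep power draw."""
    return active_fraction * p_active_W + (1 - active_fraction) * p_sleep_W

for duty in (1.0, 0.01, 0.001):
    print(f"{duty:>6.1%}: {avg_power_W(duty) * 1e6:.0f} uW")
# Only sub-1% duty cycles reach the <100 uW regime targeted earlier.
```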
It is interesting to note, at the same time, the contradiction between the wireless industry trends and the
requirements of wireless sensor nodes. The industry currently focuses on achieving more bits/sec/Hz, while
sensor nodes need more bits per euro and per nJ. From the transmission-range point of view, sensor nodes
need only a limited transmission range in order to operate at an optimally calculated energy consumption, while
the industry is interested in delivering higher transmission ranges for the radios. Similarly, the radios
designed nowadays tend to be as reliable as possible, while a wireless sensor network is based on the
assumption that failures are regular events.
Energy is not the only resource the sensor nodes have to worry about. Processing power and
memory are also limited. Large local data stores cannot be employed, so strategies need to be developed
to store the most important data in a distributed fashion and to report the important events
to the outside world. A feature that helps in dealing with these issues is the heterogeneity of the network.
There might be several types of devices deployed. Resource-poor nodes will be able to ask more powerful
nodes to perform complicated computations. At the same time, several nodes could associate themselves
in order to perform the computations in a distributed fashion.
Bandwidth is also a constraint when dealing with sensor networks. The low-power communication
devices used (most of the time, radio transceivers) can only work in simplex mode. They offer low data rates,
due also to the fact that they operate in the free unlicensed bands, where traffic is strictly regulated.
31.4.2 Diversity and Dynamics
As we already suggested, there may be several kinds of sensor nodes present inside a single sensor network.
We can talk of heterogeneous sensor nodes from the points of view of both hardware and software. From the
hardware point of view, it seems reasonable to assume that the number of devices of a certain kind will
be inversely proportional to the capabilities they offer. We may see a tiered architecture
design, where the resource-poor nodes ask more powerful or specialized nodes to make more accurate
measurements of a certain detected phenomenon, to perform resource-intensive operations, or even to
help in transmitting data over a longer distance.
Diversity can also refer to sensing several parameters and then combining them into a single decision,
or, in other words, performing data fusion. This means assembling information from
different kinds of sensors, such as light, temperature, sound, smoke, etc., to detect, for example, whether a fire
has started.
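A minimal sketch of such a fusion rule; the thresholds, the weighting of the smoke cue, and the two-vote decision rule are entirely illustrative assumptions:

```python
# Combine heterogeneous readings into one fire/no-fire decision.
def fire_detected(temp_c, smoke_ppm, light_lux):
    """Weighted vote over three sensing modalities (thresholds are made up)."""
    votes = 0
    votes += 1 if temp_c > 60 else 0
    votes += 2 if smoke_ppm > 300 else 0   # smoke is treated as the strongest cue
    votes += 1 if light_lux > 2000 else 0  # brightness as a crude flame proxy
    return votes >= 2                      # require agreement of at least two cues

print(fire_detected(72.0, 450.0, 150.0))  # True
print(fire_detected(72.0, 40.0, 150.0))   # False: temperature alone is not enough
```

Requiring agreement between modalities is exactly what makes the fused decision more robust than any single sensor reading.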
Sensor nodes will be deployed in the real world, most probably in harsh environments. This puts them
in contact with an environment that is dynamic in many senses and has a big influence on the algorithms
that the sensor nodes should execute. First of all, the nodes will be deployed in a random fashion in the
environment and, in some cases, some of them will be mobile. Second, the nodes will be subject to failures
at random times, and they will also be allowed to change their transmission range to better suit their energy
budget. This leads to the full picture of a network topology in continuous change. One characteristic of the
algorithms for wireless sensor networks is that they do not require a predefined,
well-known topology.
One more consequence of the real-world deployment is that there will be many factors influencing the
sensors in contact with the phenomenon. Individual calibration of each sensor node will not be feasible,
and probably would not help much, as the external conditions will be in continuous change. The sensor
network will calibrate itself in response to changes in the environmental conditions. More than that, the
network will be capable of self-configuration and self-maintenance.
Another issue we need to address is the dynamic nature of the wireless communication medium.
Wireless links between nodes can periodically appear or disappear due to the particular position of each
node. Bidirectional links will coexist with unidirectional ones, a fact that the algorithms for
wireless sensor networks need to take into account.
31.4.3 Needed Algorithms
For a sensor network to work as a whole, some building blocks need to be developed and deployed in
the vast majority of applications. Basically, they are: a localization mechanism, a time synchronization
mechanism, and some form of distributed signal processing. A simple justification is that data hardly
have any meaning if position and time values are not available with them. Full, complex signal processing
done separately at each node will not be feasible due to the resource constraints.
The self-localization of sensor nodes has gained a lot of attention lately [28-31]. This came as a response to the
fact that global positioning systems are not a solution, due to their high cost (in terms of money and resources),
and because they are unavailable, or provide imprecise positioning information, in special environments such as
indoors. Information such as connectivity, distance estimation based on radio signal strength, sound
intensity, time of flight, angle of arrival, etc., has been used successfully to determine the position of each
node within degrees of accuracy using only localized computation.
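As one example of the radio-signal-strength approach, a log-distance path-loss model can be inverted to estimate range from a received signal strength indicator (RSSI); the reference power, reference distance, and path-loss exponent below are assumptions chosen for illustration:

```python
# Invert P(d) = P(d0) - 10*n*log10(d/d0) to estimate distance from RSSI.
def rssi_to_distance(rssi_dbm, rssi_d0=-40.0, d0=1.0, n=2.7):
    """rssi_d0: assumed received power (dBm) at reference distance d0 (m);
    n: assumed path-loss exponent (~2 free space, 2.7-4 indoors)."""
    return d0 * 10 ** ((rssi_d0 - rssi_dbm) / (10 * n))

print(round(rssi_to_distance(-67.0), 1))  # 10.0 (meters)
```

Such ranges are noisy in practice, which is why localization algorithms combine many pairwise estimates rather than trusting any single one.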
The position information, once obtained, was used not only for characterizing the data, but also in
designing the networking protocols, for example, leading to more efficient routing schemes based on the
estimated positions of the nodes [32].
The second important building block is the timing and synchronization block. Nodes will be allowed
to function in a sleep mode for long periods of time, so periodic wake-up intervals need to be computed
with a certain precision. However, a notion of local time and synchronization with the neighbors is
needed for the communication protocols to perform well. Lightweight algorithms have been developed
that allow fast synchronization between neighboring nodes using a limited number of messages. Loose
synchronization will be used, meaning that each pair of neighboring nodes is synchronized within a certain
bound, while nodes situated multiple hops away might not be synchronized at all.
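One common pairwise scheme estimates clock offset and propagation delay from a single two-way message exchange (the formula used in TPSN-style protocols); the timestamps below are made-up values:

```python
# Two-way exchange: A sends at t1, B receives at t2, B replies at t3,
# A receives at t4. t1/t4 are on A's clock, t2/t3 on B's clock.
def estimate_offset(t1, t2, t3, t4):
    """Return (offset of B's clock relative to A's, one-way delay)."""
    offset = ((t2 - t1) - (t4 - t3)) / 2
    delay = ((t2 - t1) + (t4 - t3)) / 2
    return offset, delay

print(estimate_offset(100.0, 152.0, 153.0, 105.0))  # (50.0, 2.0)
```

A single exchange of two short messages suffices, which is what makes this approach attractive under the energy constraints discussed earlier.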
A global notion of time might not be needed at all in most applications. Because many
applications measure natural phenomena, such as temperature, where delays on the order of seconds
can be tolerated, trading latency for energy is preferred.
The last important block is the signal processing unit. A new class of algorithms has to be developed
due to the distributed nature of wireless sensor networks. In their vast majority, existing signal processing
algorithms are centralized algorithms that require large computation power and the availability of all
the data at the same time. Transmitting all the recorded data to all nodes is impossible in a dense network
even from a theoretical point of view, not to mention the energy needed for such an operation. The new
distributed signal processing algorithms have to take into account the distributed nature of the network,
the possible unavailability of data from certain regions due to failures, and the time delays that might be
involved.
31.4.4 Dependability
More than any other sort of computer network, wireless sensor networks are subject to failures.
Unavailability of services will be considered a feature of these networks, a regular event rather than
a sporadic and highly improbable one. The probability of something going wrong is at least several
orders of magnitude higher than in all other computer networks.
All the algorithms have to employ some form of robustness against the failures that might affect
them. On the other hand, robustness comes at the cost of energy, memory, and computation power, so it has to
be kept at a minimum. An interesting issue is that of the system architecture from the protocols' point
of view. In traditional computer networks, each protocol stack is designed for the worst-case scenario.
This scenario hardly ever happens simultaneously for all the layers, and a combination of lower-layer
protocols could eliminate such a scenario. This leads to a lot of redundancy in the sensor node, redundancy
that costs important resources. The preferred approach is that of cross-layer design, studying the
sensor node as a whole rather than as separate building blocks. This opens a discussion on the
topic of what the right architecture for all sensor networks is, and whether a solution that fits all scenarios
makes sense at all.
Let us summarize the sources of errors the designer will be facing: nodes will stop functioning, starting
even with the (rough) deployment phase. The harsh environment will continuously degrade the
performance of the nodes, making them unavailable as time passes. The wireless communication
medium will be an important factor disturbing message communication and affecting the links and,
implicitly, the network topology. Even in a perfect environment, collisions will occur due to imprecise
local time estimates and lack of synchronization. Furthermore, probabilistic scheduling policies and
protocol implementations can themselves be considered sources of errors.
Another issue that can be addressed as a dependability attribute is security. The communication
channel is open and cannot be protected. This means that others are able to intercept and disrupt
the transmissions, or even to transmit their own data. In addition to accessing private information, a third
party could also act as an attacker who wants to disrupt the correct functioning of the network.
Security in a sensor network is a hard problem that still needs to be solved. Like almost any other protocol
in this sort of network, it has contradictory requirements: the schemes employed should be as lightweight as
possible while achieving the best results. The usual protection schemes require too much memory and
computation power to be employed (the keys themselves are sometimes too big to fit into the limited
available memory).
A real problem is how to control the sensor network itself. The sensor nodes will be too many to
be individually accessible to a single user and might also be deployed in an inaccessible environment.
By control we mean issues such as deployment and installation, configuration, calibration and
tuning, maintenance, discovery, and reconfiguration. Debugging the code running in the network is
completely infeasible, as at any point inside, the user has access only to high-level aggregated results.
The only real debugging and testing can be done with simulators, which prove to be invaluable resources in
the design and analysis of sensor networks.
31.5 Conclusions
This chapter was a brief introduction to the new field of wireless sensor networks. It provided a short
overview of the main characteristics of this new set of tools that will soon enhance our perception
capabilities regarding the ambient world.
The major challenges have been identified, some initial steps have been taken, and early prototypes are
already working. The following chapters of the book will focus on particular issues, giving more insight
into the current state of the art in the field. The research in this area will certainly continue, and there may
come a time when sensor networks will be deployed all around us and will become regular instruments
available to everyone.
References
[1] Weiser, M. The computer for the 21st century. Scientific American, 265, 66-75, 1991.
[2] Whitehouse, K. and Culler, D. Calibration as parameter estimation in sensor networks.
In Proceedings of the ACM International Workshop on Wireless Sensor Networks and Applications
(WSNA'02). Atlanta, GA, 2002.
[3] Want, R., Hopper, A., Falcao, V., and Gibbons, J. The active badge location system. ACM
Transactions on Information Systems, 10, 91-102, 1992.
[4] Estrin, D., Govindan, R., Heidemann, J., and Kumar, S. Next century challenges: scalable
coordination in sensor networks. In Proceedings of the International Conference on Mobile Computing and
Networking. ACM/IEEE, Seattle, WA, 1999, pp. 263-270.
[5] Akyildiz, I., Su, W., Sankarasubramaniam, Y., and Cayirci, E. Wireless sensor networks: a survey.
Computer Networks Journal, 38, 393-422, 2002.
[6] Brooks, R.R., Ramanathan, P., and Sayeed, A.M. Distributed target classification and tracking in
sensor networks. Proceedings of the IEEE, 91, 1163-1171, 2003.
[7] DARPA. http://www.darpa.mil/body/off_programs.html.
[8] Polastre, J., Szewczyk, R., and Culler, D. Analysis of wireless sensor networks for habitat monitoring.
In Wireless Sensor Networks, C.S. Ragavendra, K.M. Sivalingam, and T. Znati, Eds. Kluwer Academic
Publishers, Dordrecht, 2004.
[9] Juang, P., Oki, H., Wang, Y., Martonosi, M., Peh, L., and Rubenstein, D. Energy-efficient computing
for wildlife tracking: design tradeoffs and early experiences with ZebraNet. In Proceedings of the
Tenth International Conference on Architectural Support for Programming Languages and Operating
Systems (ASPLOS-X). San Jose, CA, 2002.
[10] Yang, S. Redwoods go high-tech: researchers use wireless sensors to study California's state tree.
UC Berkeley News, 2003.
[11] IEEE Computer Society. Pervasive Computing, 3, Successful Aging, 2004.
[12] Baldus, H., Klabunde, K., and Muesch, G. Reliable set-up of medical body-sensor networks.
In Proceedings of the First European Workshop on Wireless Sensor Networks (EWSN 2004). Berlin,
Germany, 2004.
[13] Basten, T., Geilen, M., and Groot, H. Omnia fieri possent. In Ambient Intelligence: Impact on
Embedded System Design. Kluwer Academic Publishers, Dordrecht, 2003, pp. 1-8.
[14] Srivastava, M., Muntz, R., and Potkonjak, M. Smart kindergarten: sensor-based wireless networks
for smart developmental problem-solving environments (challenge paper). In Proceedings of the
Seventh Annual International Conference on Mobile Computing and Networking. ACM, Rome, Italy,
2001, pp. 132-138.
[15] CoSense. http://www2.parc.com/spl/projects/ecca.
[16] EYES. http://eyes.eu.org.
[17] PicoRadio. http://bwrc.eecs.berkeley.edu/research/pico_radio.
[18] SensoNet. http://users.ece.gatech.edu/ weilian/sensor/index.html.
[19] Smart Dust. http://robotics.eecs.berkeley.edu/pister/smartdust.
[20] TinyDB. http://telegraph.cs.berkeley.edu/tinydb.
[21] WINS. http://www.janet.ucla.edu/wins.
[22] Estrin, D., Culler, D., Pister, K., and Sukhatme, G. Connecting the physical world with pervasive
networks. IEEE Pervasive Computing, 1, 59-69, 2002.
[23] Akyildiz, I., Su, W., Sankarasubramaniam, Y., and Cayirci, E. A survey on sensor networks. IEEE
Communications Magazine, 40, 102-114, 2002.
[24] Pottie, G.J. and Kaiser, W.J. Wireless integrated network sensors. Communications of the ACM, 43,
51-58, 2000.
[25] Chlamtac, I., Petrioli, C., and Redi, J. Energy-conserving access protocols for identification
networks. IEEE/ACM Transactions on Networking, 7, 51-59, 1999.
[26] Schurgers, C., Raghunathan, V., and Srivastava, M.B. Power management for energy-aware
communication systems. ACM Transactions on Embedded Computing Systems, 2, 431-447, 2003.
[27] Intanagonwiwat, C., Govindan, R., Estrin, D., Heidemann, J., and Silva, F. Directed diffusion for
wireless sensor networks. IEEE/ACM Transactions on Networking, 11, 2003.
[28] Bulusu, N., Heidemann, J., and Estrin, D. GPS-less low-cost outdoor localization for very small
devices. IEEE Personal Communications, 2000, pp. 28-34.
[29] Doherty, L., Pister, K., and Ghaoui, L. Convex position estimation in wireless sensor networks.
In IEEE INFOCOM. Anchorage, AK, 2001.
[30] Langendoen, K. and Reijers, N. Distributed localization in wireless sensor networks: a quantitative
comparison. Computer Networks, Special Issue on Wireless Sensor Networks, 2003.
[31] Evers, L., Dulman, S., and Havinga, P. A distributed precision-based localization algorithm for
ad hoc networks. In Proceedings of Pervasive Computing (PERVASIVE 2004), 2004.
[32] Zorzi, M. and Rao, R. Geographic random forwarding (GeRaF) for ad hoc and sensor networks:
energy and latency performance. IEEE Transactions on Mobile Computing, 2(4), 337-348, 2003.
32
Issues and Solutions in Wireless Sensor Networks
Ravi Musunuri,
Shashidhar Gandham,
and Maulin D. Patel
University of Texas at Dallas
32.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32-1
Sensor Networks versus Mobile ad hoc Networks
32.2 System Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32-3
Operational Model • Radio Propagation Model
32.3 Design Issues in Sensor Networks . . . . . . . . . . . . . . . . . . . . . . 32-4
32.4 MAC Layer Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32-5
32.5 Routing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32-5
Flat Routing Protocols • Cluster-Based Routing Protocols
32.6 Other Important Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32-7
Security • Location Determination • Lifetime Analysis •
Power Management • Clock Synchronization • Reliability •
Sensor Placement and Organization for Coverage and
Connectivity • Topology Control
32.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32-13
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32-14
32.1 Introduction
Due to advances in integrated circuit (IC) fabrication technology and microelectromechanical systems
(MEMS) [1, 2], it is now commercially feasible to manufacture ICs with sensing, signal processing,
memory, and other relevant components built into them. Such ICs, enabled with RF communication,
bring forth a new kind of network, which is self-organizing and application specific. These networks are
referred to as wireless sensor networks.
A sensor network is a static ad hoc network consisting of hundreds of sensor nodes deployed on the
fly for unattended operation. Each node consists of [3-5] sensors, a processor, memory, a radio, a limited
power battery, and software components, such as an operating system and protocols. The architecture of
a sensor node is completely dependent on the purpose of the deployment, but we can generalize the
architecture [2] as shown in Figure 32.1.
FIGURE 32.1 Sensor node architecture. [Block diagram: a processor block (CPU, memory, operating system and other software), sensors, a radio, and a battery with an AC/DC convertor.]
Sensor nodes are expected to monitor some surrounding environmental phenomena, process the data
obtained, and forward this data toward a base station located on the periphery of the sensor network.
Wireless sensor networks have numerous applications in fields such as surveillance, security,
environmental monitoring, habitat monitoring, smart spaces, precision agriculture, inventory tracking, and
healthcare [4].
The main advantage of sensor networks is their ability to be deployed in almost any kind of remote
terrain. Their unattended mode of operation makes them a preferable choice over ground-based radar
systems [5]. The spatial distribution of sensor nodes ensures a greater signal-to-noise ratio (SNR), obtained by
combining signals from various sensors. Furthermore, the higher level of redundancy allows greater fault
tolerance. As sensor nodes are expected to be manufactured at a very low price, they can be deployed in large
numbers. As a result, sensor networks can provide a large coverage area through the union of the individual
nodes' coverage areas. Since sensor nodes are expected to be deployed close to the object of interest,
obstruction of the line of sight for sensing activity is ruled out.
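The SNR claim can be checked numerically: averaging N independent noisy readings of the same signal reduces the noise standard deviation by roughly the square root of N. A small simulation, where the signal value, noise level, and sample counts are arbitrary choices:

```python
import random
import statistics

random.seed(1)
signal = 5.0

def reading():
    """One sensor reading: true signal plus unit-variance Gaussian noise."""
    return signal + random.gauss(0, 1.0)

# Error spread of a single sensor vs. the average of 16 sensors.
single_err = statistics.pstdev([reading() - signal for _ in range(10000)])
avg16_err = statistics.pstdev(
    [statistics.fmean(reading() for _ in range(16)) - signal
     for _ in range(10000)])
print(round(single_err, 2), round(avg16_err, 2))  # noise std shrinks roughly 4x
```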
To illustrate the above-mentioned advantages, consider an example of seismic detection [4]. The earth
generates seismic noise, which becomes attenuated and distorted with distance. Hence, to increase the
probability of detection, it is advisable to have sensors closer to the source. To accomplish this, we would need
to know the exact location and time of the seismic activity in advance, which happens to be the goal of deploying
the sensors in the first place. If a distributed network of sensors were deployed across the entire geographical area of interest,
then there would be no requirement to pinpoint the locations where sensors need to be deployed.
32.1.1 Sensor Networks versus Mobile ad hoc Networks
Wireless sensor networks are significantly different from mobile ad hoc networks (MANETs) [6] for the
following reasons:
Mode of communication. In MANETs, potentially any node can send data to any other node. But in
sensor networks, the mode of communication is restricted. In general, the base station broadcasts commands
to all sensor nodes in its network, and the sensor nodes send sensed data back to the base station. Sometimes,
sensor nodes may need to forward sensed data to other sensor nodes if the base station is not reachable
directly. Depending on the application, some sensor networks will employ data aggregation at designated
nodes to reduce the bandwidth usage. Most sensor network messages are routed to base stations;
hence, sensor nodes need not maintain explicit routing tables.
Node mobility. In MANETs, every node can move. In general, sensor nodes are static, although some
architectures have mobile base stations [7].
Energy. Nodes in MANETs have a rechargeable source of energy; thus, energy conservation is of
secondary importance. However, sensor networks consist of several hundreds of nodes, which need to
operate in remote terrain. Hence, battery replacement is not possible, which makes energy efficiency
critical for sensor networks.
Apart from the above mentioned differences, sensor nodes have low computational power, less cost as
compared to MANETS nodes. Protocols designed for sensor networks should be more scalable. Since they
are expected to be deployed in hundreds.
The remainder of this chapter is organized as follows. In Section 32.2, we describe system
models used in the literature. Section 32.3 presents design issues in sensor networks. In Section 32.4,
medium access layer issues and a few solutions proposed in the literature are described. In Section 32.5, we
move on to flat routing protocols and hierarchical routing protocols. We then describe other important
issues, such as security, location determination, lifetime analysis, power management, and clock
synchronization.
32.2 System Models
Various system models proposed in the literature can be classified based on the following factors:
Mobility of base stations
Number of base stations
Method of organization (hierarchical/flat)
System models considered by researchers until now consist of static sensor nodes randomly deployed
in a geographical area of interest. This geographical area of interest is often referred to as the sensor field.
Most of the models considered have a single, static base station [6, 8–11]. In Reference 12, the author
evaluates the best position at which to locate a base station and proposes to split large sensor networks into
small squares and move the base station to the center of each square to collect the data. In Reference 7,
the authors propose to deploy multiple, intermittently mobile base stations to increase the lifetime of the
sensor network.
32.2.1 Operational Model
Research on sensor networks has so far considered various operational models for the sensor nodes.
These models can be broadly classified as follows:
Active. In active sensor networks [6, 8–11, 13], each sensor node senses its environment continuously.
Based on how frequently the sensed data is forwarded toward the base station, such sensor networks can
be further classified as
Periodic: Based on the application for which the sensor network is deployed, it might be required to
gather data from every sensor node periodically [8, 10].
Event driven: Sensor networks that are deployed for monitoring specific events gather data only
when the event of interest occurs [11, 13, 14]. For example, sensor nodes deployed to monitor
seismic activity in a region need to route data only when they detect seismic currents in their
proximity.
Passive. In the case of passive sensor networks, data forwarding is triggered by a query from the
base station. Passive sensor networks can be further classified as follows:
Energized on query: In this mode of operation, sensor nodes switch off their sensors most of the time.
Only when a query for data is generated does a sensor node switch on its sensor and record
the data to be forwarded.
Always sensing: Sensor nodes in this category keep their sensors running all the time. As soon
as a query for data arrives, a sensor node generates a data packet based on the observations made
so far and forwards it.
32.2.2 Radio Propagation Model
Most researchers have assumed that the energy spent in transmission over the wireless medium follows
the first-order radio model [8, 11]. In this model, the energy required to transmit a signal has a fixed part
and a variable part; the variable part is directly proportional to the square of the distance. A constant
amount of energy is required by the receiving antenna to receive a signal.
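The first-order radio model can be sketched in a few lines. The constants below are illustrative values of the kind used in the literature, not figures given in this chapter.

```python
# First-order radio model: energy to send or receive k bits over distance d.
# The constants are illustrative values of the kind used in the literature.
E_ELEC = 50e-9      # J/bit: electronics energy (the fixed part, paid by TX and RX)
EPS_AMP = 100e-12   # J/bit/m^2: amplifier energy (the distance-dependent part)

def tx_energy(k_bits, d_meters):
    """Transmit energy: fixed part plus a part proportional to d squared."""
    return E_ELEC * k_bits + EPS_AMP * k_bits * d_meters ** 2

def rx_energy(k_bits):
    """Receive energy: fixed part only."""
    return E_ELEC * k_bits

# Doubling the distance quadruples the amplifier (variable) part of the energy.
```

Because the variable part grows with the square of the distance, several short hops can cost less amplifier energy than one long hop, which motivates multihop forwarding.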
32.3 Design Issues in Sensor Networks
Most sensor networks encounter operational challenges [15], such as ad hoc deployment, limited energy
supply, dynamic environmental conditions, and unattended mode of operation. Any solution proposed
for sensor networks should consider the following design issues:
Energy. Each sensor node is equipped with a limited supply of battery energy. Sensor nodes spend more
energy on communication than on local computation.¹ As sensor nodes are deployed in large numbers,
it is not feasible to manually recharge their batteries. Thus, sensor nodes should conserve energy by
minimizing the number of messages that are transmitted. Based on the energy source, sensor nodes can
be classified as follows:
Rechargeable: Sensor nodes equipped with solar cells can recharge their batteries when sunlight is
available. For such sensor nodes, the main design criterion would be to maximize the number of
nodes operational during the periods when no sunlight is available.
Nonrechargeable: Sensor nodes equipped with nonrechargeable batteries cease to operate once
they drain their energy. Thus, the main design issue in such sensor networks would be to maximize
the operational time of every sensor node.
Bandwidth. Sensor nodes need to communicate over the ISM (industrial, scientific, and medical)
band. When many nodes attempt to use the same communication frequency, the available bandwidth
must be used optimally.
Limited computation power and memory. As the processing power at each sensor node is limited,
proposed solutions for sensor networks should not expect sensor nodes to carry out computationally
intensive tasks.
Unpredictable reliability, failure models. Sensor networks are expected to be deployed in inaccessible and
hostile environments. As a result, it is possible for sensor nodes to crash or malfunction due to external
environmental factors. The proposed solutions should be based on failure models that account for such
possibilities. Furthermore, the failure of a few nodes should not bring down the network.
Scalability. Sensor nodes are expected to be deployed in the thousands. As a result, scalability is a critical
issue in the design of sensor networks. Any solution proposed should scale to large sensor networks.
Timeliness of action (latency). Latency is an important issue in sensor networks deployed for critical
applications, such as security and surveillance. Hence, the time elapsed between when an event is
detected and when it is reported at the base station must be minimized.
To address these design challenges, several strategies, such as cooperative signal processing, exploiting
redundancy, adaptive signal processing, and hierarchical architectures, are going to be key building blocks
for sensor networks [3].
We believe that in the near future sensor networks will find acceptance in day-to-day activities as wide
as that of computers. To attain such wide-scale acceptance, sensor nodes should be affordable, easily
available, easily configurable (plug and play), and easily deployable. To accomplish these objectives we need
to come up with suitable Medium Access Control (MAC) layer protocols, routing protocols, location
discovery algorithms, power-management strategies, and solutions to other relevant problems. Some of
these design problems have been well studied by researchers. In the next section, we present a brief overview
of existing solutions for each of these design problems.
¹ To take an example for ground-to-ground communication [6]: it takes 3 J of energy to transmit 1 Kb of data over
a distance of 100 m. A general-purpose processor with a processing capability of 100 million instructions per second
would execute 300 million instructions for the same amount of energy.
32.4 MAC Layer Protocols
The Medium Access Control (MAC) layer provides topology information and channel allocation to the
higher layers in the protocol stack. Channel allocation is critical for energy-efficient functioning of the link
layer. Energy efficiency and scalability [16] are the main issues in developing MAC protocols for sensor
networks. Fairness, latency, and throughput are also important performance measures for channel allocation
algorithms. A channel could be a time slot in Time Division Multiple Access (TDMA), a frequency band
in Frequency Division Multiple Access (FDMA), or a code in Code Division Multiple Access (CDMA).
Channel allocation algorithms should try to avoid energy wastage through:
Collisions: when two or more nodes within direct transmission range of each other transmit
packets in the same channel.
Overhearing: when nodes receive data destined for other nodes.
Idle listening: unnecessarily listening to the channel when there are no packets to be received.
Control packet overhead: bandwidth wasted through the exchange of too many control packets.
The existing solutions to channel allocation in ad hoc networks can be divided into two categories:
contention-based and contention-free methods. In contention-based solutions, the sender continuously
senses the medium. IEEE 802.11 Distributed Coordination Function (DCF), MACAW [17], and
PAMAS [18] are examples of contention-based protocols. Contention-based schemes are not suitable
for sensor networks because of the energy wasted in collisions and idle listening [19].
Sensor networks should instead use organized methods for channel allocation. Organized methods of
channel allocation determine the network topology first and then assign channels to the links. A channel
assignment should avoid co-channel interference, that is, it should avoid assigning two consecutive links to
the same channel. A sensor network channel allocation algorithm should be distributed, because network-wide
synchronization for the calculation of a schedule would be an energy-intensive procedure. Another reason for
distributed algorithms is that they scale well with increasing network size and are robust to
network partitions and node failures.
In Reference 6, the authors proposed the Self-organizing MAC for Sensor networks (SMACS) protocol.
SMACS is a distributed protocol that enables nodes to discover their neighbors and build a network
topology for communication. SMACS builds a flat topology, that is, there are no clusters or cluster heads.
In SMACS, each node allocates channels to the links between itself and its neighbors within a TDMA frame
referred to as a super frame. In a given time slot, every node communicates with only one neighbor to avoid
interference. Nodes communicate intermittently and hence can power themselves off when they have
no data to send. The super frame schedule is divided into two periods. In the first, bootup period, nodes try
to discover neighbors and rebuild severed links. The second period is reserved for communication
between nodes. The authors of Reference 6 also proposed an Eavesdrop and Register (EAR) protocol to handle
channel allocation with moving base stations. In Piconet [20], the authors used a periodic sleep cycle to save
energy; here, if a node wants to communicate with a neighbor, it has to wait until it receives a broadcast
message from that neighbor. Wei et al. [16] proposed an energy-efficient MAC protocol known as S-MAC.
S-MAC saves energy by avoiding collisions, overhearing, and idle listening, at the cost of increased latency.
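The energy/latency trade-off behind such periodic listen/sleep schemes can be made concrete with some duty-cycle arithmetic. This is an illustrative sketch, not taken from Reference 16; the power figures and frame length are hypothetical.

```python
# Duty-cycle arithmetic behind S-MAC-style periodic listen/sleep.
# Power figures and frame length are hypothetical, for illustration only.
P_LISTEN = 15e-3   # W with the radio on
P_SLEEP = 15e-6    # W with the radio off

def avg_power(duty_cycle):
    """Average power when a node listens for a fraction `duty_cycle` of each frame."""
    return duty_cycle * P_LISTEN + (1 - duty_cycle) * P_SLEEP

def worst_case_wait(frame_s, duty_cycle):
    """A sender may wait almost a whole sleep period for the receiver to wake."""
    return frame_s * (1 - duty_cycle)

# A 10% duty cycle cuts average power by roughly 10x, but with a 1 s frame
# a message can wait up to ~0.9 s per hop: energy savings are bought with latency.
```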
32.5 Routing
As stated earlier, each sensor node is expected to monitor some environmental phenomenon and forward
the corresponding data toward the base station. To forward the data packets, each node needs routing
information. Note that the flow of packets is mostly directed from sensor nodes toward the base station;
as a result, sensor nodes need not maintain explicit routing tables. Routing protocols can in general be
divided into flat routing and cluster-based routing protocols.
32.5.1 Flat Routing Protocols
In flat routing protocols the nodes in the network are considered homogeneous. Each node in
the network participates in route discovery, route maintenance, and forwarding of data packets. Here,
we describe a few existing flat routing protocols for sensor networks.
Sequential Assignment Routing (SAR) [6] takes into consideration the energy and Quality of Service
(QoS) of each path, and the priority level of each packet, when making routing decisions. Every node
maintains multiple paths to the sink to avoid the overhead of route recomputation after a node or link failure.
Estrin et al. [21] proposed a diffusion-based scheme for routing queries from the base station to sensor
nodes and forwarding the corresponding replies. In directed diffusion, attribute-based naming is used
by the sensor nodes: each sensor names the data it generates using one or more attributes. A sink may
query for data by disseminating interests, which intermediate nodes propagate. Interests establish
gradients that draw data toward the sink that expressed the interest.
The minimum cost forwarding approach proposed by Ye et al. [9] exploits the fact that the data flow in
sensor networks is in a single direction, always toward the fixed base station. Their method requires sensor
nodes neither to have unique identities nor to maintain routing tables in order to forward messages. Each
node maintains the least-cost estimate from itself to the base station, and each message to be forwarded is
broadcast by the node. On receiving a message, a node checks whether it is on the least-cost path between
the source sensor node and the base station; if so, it forwards the message by broadcasting it.
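The forwarding test just described can be sketched as follows; the function and variable names are our own illustration, not code from Reference 9.

```python
# Sketch of a minimum-cost forwarding check in the spirit of Ye et al. [9].
# Each node knows only its own least-cost estimate to the base station; a
# message carries the source's optimal cost and the cost consumed so far.

def on_least_cost_path(source_cost, consumed_cost, my_cost, tol=1e-9):
    """A node rebroadcasts a message only if the cost already consumed plus its
    own remaining cost to the base station equals the source's optimal cost."""
    return abs(consumed_cost + my_cost - source_cost) <= tol

# Example: the source's least cost is 5; the message arrives having consumed 2.
assert on_least_cost_path(5.0, 2.0, 3.0)      # we lie on a least-cost path
assert not on_least_cost_path(5.0, 2.0, 4.0)  # detour: drop the message
```

Note how the check needs no node identities and no routing table, only the node's own cost estimate, which is the point of the scheme.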
In Reference 7, the authors model the sensor network as a flow network and propose an
ILP (Integer Linear Program)-based routing method. The objective of this ILP-based method is to
minimize the maximum energy spent by any sensor node during a period of time. Through simulation
results the authors have shown that their ILP-based routing heuristic increases the lifetime of the sensor
network significantly.
Kulik and coworkers [22] proposed a set of protocols to disseminate sensed data from a sensor to
the other sensor nodes. Sensor Protocols for Information via Negotiation (SPIN) overcome information
implosion and overlap by using negotiation and information descriptors (metadata). The authors proposed
different protocols for both point-to-point and broadcast channels.
32.5.2 Cluster-Based Routing Protocols
In cluster-based routing protocols, special nodes referred to as cluster heads discover and maintain routes,
and noncluster-head nodes join one of the clusters. All data packets originating in a cluster are
forwarded to the cluster head, which in turn forwards them toward the destination using its routing
information. Here, we describe some cluster-based routing protocols from the literature.
Chandrakasan et al. [23] proposed Low-Energy Adaptive Clustering Hierarchy (LEACH) as an energy-
efficient communication protocol for wireless sensor networks. In LEACH, self-elected cluster heads
collect data from all the sensor nodes in their cluster, aggregate the collected data using data fusion methods,
and transmit the result directly to the base station.
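The cluster-head self-election step in LEACH can be sketched with the published threshold rule; the code below is an illustrative rendering with our own names, not the authors' implementation.

```python
import random

# LEACH-style randomized cluster-head self-election. P is the desired fraction
# of cluster heads per round; a node that has already served as cluster head in
# the current epoch of 1/P rounds sits out until the epoch ends. The threshold
# rule follows the published formula; the variable names are our own.

def leach_threshold(P, r, was_head_this_epoch):
    """Probability threshold T(n) for round r."""
    if was_head_this_epoch:
        return 0.0
    return P / (1 - P * (r % round(1 / P)))

def elects_itself(P, r, was_head_this_epoch):
    """Each node draws independently; no coordination is needed."""
    return random.random() < leach_threshold(P, r, was_head_this_epoch)

# The threshold rises as the epoch progresses, reaching 1 in the final round,
# so cluster-head duty rotates over every node in the network.
```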
In Reference 11, the authors classified sensor networks into proactive and reactive networks.
Nodes in proactive networks continuously monitor the environment and thus have data to send at a
constant rate; LEACH suits such networks for transmitting data efficiently to the base station. In reactive
sensor networks, nodes need to transmit data only when an event of interest occurs, so the nodes do not all
have equal amounts of data to transmit. Manjeshwar et al. proposed the Threshold-sensitive
Energy-Efficient sensor Network (TEEN) protocol [11] for routing in reactive sensor networks.
Estrin et al. [21] proposed a two-level clustering algorithm that can be extended to build a cluster
hierarchy.
32.6 Other Important Issues
In this section we discuss other important issues: security, location determination, lifetime
analysis, power management, and clock synchronization. We describe why these issues are paramount for
the functioning of sensor networks and outline some solutions proposed in the literature.
32.6.1 Security
Security is a very critical issue for the envisioned mass deployment of sensor networks. In particular,
a strong security framework is a must in battlefield and border monitoring applications. The security
framework in sensor networks should meet the following objectives:
Authentication/nonrepudiation: Each sensor should be able to identify the sender of a message
correctly, and no node should be able to deny its previous actions.
Integrity: Messages sent over the wireless medium should not be altered by unauthorized entities.
Confidentiality: Messages should be kept secret from unauthorized entities.
Freshness: Messages received by sensors should be current.
32.6.1.1 Sensor Networks versus ad hoc Networks: Security Perspective
Sensor networks share some similarities with ad hoc networks, but security in sensor networks differs
from that in ad hoc networks for the following reasons:
Node power. Sensor nodes have a limited power supply and low computational capabilities compared to
ad hoc nodes. Asymmetric key encryption [24] schemes require far more computational power than
symmetric key encryption [24]; thus, sensor networks can only use symmetric key encryption. To use
symmetric key encryption mechanisms we need to address the key distribution problem.
Mode of communication. As stated earlier, most of the communication in sensor networks is from the
sensor nodes to the base station; at times, the base station issues commands to the sensor nodes. In this
mode of communication, every node may not need to share keys with every other node in its network.
Moreover, it is not practical for every node to store a key shared with every other node.
Node mobility. In ad hoc networks every node can move. In general, sensor nodes are static, although
some architectures have mobile base stations, as in Reference 7.
These differences make the security protocols of ad hoc networks, or of any other traditional networks,
impractical for sensor networks.
32.6.1.2 Proposed Security Protocols
Recently, there has been some work on sensor network security. Perrig et al. [25] proposed
SPINS: Security Protocols for Sensor Networks. The SPINS framework consists of two protocols that
together satisfy the security objectives: the Secure Network Encryption Protocol (SNEP) provides data
integrity, two-party authentication, and data freshness, and the micro Timed Efficient Streaming
Loss-tolerant Authentication protocol (µTESLA) provides authenticated broadcast. In SNEP, each sensor
node and the base station share a unique key, which is bootstrapped. This shared key and an incremental
message counter, maintained at both the sensor node and the base station, are used to derive new keys
using the RC5 [24] algorithm. In µTESLA, the sender generates a chain of keys using a one-way function
such as MD5 [24]. The important property of the key chain is that if the sender authenticates the initial
key, then the other keys in the chain are self-authenticating. The sender divides time into equal intervals
and assigns each interval a key from the chain. Sender and receiver agree upon a key disclosure schedule.
The first key from the chain is authenticated using unicast authentication. Thereafter, the receiver
authenticates packets after receiving the corresponding symmetric key from the sender as per the disclosure
schedule. Thus, µTESLA employs delayed disclosure of symmetric keys to authenticate packets once one
key in the chain has been authenticated.
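The self-authenticating key chain at the heart of µTESLA can be sketched as follows. The chapter mentions MD5 as the one-way function; this sketch uses SHA-256 purely for illustration, and all names are our own.

```python
import hashlib

# One-way key chain of the kind used by µTESLA. The sender generates the chain
# backwards with a one-way function F and discloses keys in forward order, one
# per time interval; a receiver holding an authenticated K_i can verify any
# later-disclosed K_j by hashing it forward j - i times.

def F(key):
    return hashlib.sha256(key).digest()

def make_chain(seed, n):
    """Return [K_0, ..., K_n] with K_i = F(K_{i+1}); the random seed is K_n."""
    chain = [seed]
    for _ in range(n):
        chain.append(F(chain[-1]))
    chain.reverse()
    return chain

def verify(disclosed_key, j, trusted_key, i):
    """Self-authentication: F applied (j - i) times to K_j must yield K_i."""
    key = disclosed_key
    for _ in range(j - i):
        key = F(key)
    return key == trusted_key

chain = make_chain(b"random-seed", 10)
# K_0 is authenticated once (by unicast); every later key then self-authenticates.
```

Because F is one-way, an attacker who sees the disclosed keys cannot compute a key for a future interval, which is what makes delayed disclosure safe.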
In Reference 26, the authors proposed a security framework based on broadcast with end-to-end
encryption of the data. This scheme avoids traffic analysis and also removes compromised and dead nodes
from the network. Sasha et al. [27] divided the messages in sensor networks into three classes depending
on the security required; each class of messages is encrypted using a different encryption key. They showed
that this multilevel scheme saves resources at the nodes.
In general, the base stations broadcast commands to all the sensor nodes; hence, secure broadcast is
a very important part of the security framework. In µTESLA, the authentication of the first key in the chain
is done using a unicast mechanism, and this unicast authentication mechanism has a scalability problem.
The authors of Reference 28 replaced the unicast-based mechanism with a broadcast-based mechanism that
avoids denial-of-service [24] attacks. In Reference 29, the authors proposed a routing-aware broadcast key
distribution algorithm. Karlof and Wagner [30] described possible attacks on different routing protocols
in the literature and suggested countermeasures.
The asymmetric key mechanism requires large computational power, bandwidth, and memory.
Therefore, sensor networks employ symmetric key encryption to satisfy the security objectives. Key
distribution [31–35] in symmetric key encryption mechanisms is another important issue in sensor
networks. Eschenauer and Gligor [31] proposed a probabilistic key-predistribution scheme. In this
scheme, every sensor node is given a small set of m keys out of a large set of available keys such
that every two sensor nodes share one common key with a given probability p. This scheme dramatically
reduces the number of keys stored in each sensor compared with storing a separate key for every
node in the network. In Reference 31, the authors proposed three extensions to this basic key distribution
scheme. In the first, the q-composite keys extension, sensor nodes share q common keys instead of one
key with a given probability p. This extension improves security against small-scale attacks, such as
eavesdropping on one link. The second, the multi-path extension, deals with setting up end-to-end
path keys between two communicating nodes. In this extension, a path key between two nodes is
established by sending random keys through every available path between them; the receiver uses all the
random keys received along all the paths to establish the path key. This improves security against large-scale
attacks, such as eavesdropping on many links. The third, the random pairwise keys scheme, provides
node-to-node authentication. In this scheme unique node identities are generated randomly. Every
node is randomly paired with m other nodes and m corresponding keys. Every node is aware of the
other node's identity in each pair and the corresponding key. This node identity information is used for
node-to-node authentication.
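The probability that two neighbors share at least one key under this kind of random predistribution follows from a simple counting argument. The pool and key-ring sizes below are illustrative, not parameters taken from Reference 31.

```python
import math

# Probability that two nodes share at least one key when each independently
# draws `ring_size` distinct keys from a pool of `pool_size` keys
# (Eschenauer-Gligor-style random predistribution).

def share_probability(pool_size, ring_size):
    """1 - Pr[the two key rings are disjoint]."""
    disjoint = (math.comb(pool_size - ring_size, ring_size)
                / math.comb(pool_size, ring_size))
    return 1 - disjoint

# A ring of just 75 keys from a 10,000-key pool already gives two neighbors
# a better than 40% chance of sharing a key directly.
p = share_probability(10_000, 75)
```

Links without a directly shared key can still be secured through a path of nodes that do share keys, which is why a connection probability well below 1 suffices in practice.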
32.6.2 Location Determination
Sensor nodes monitor surrounding phenomena, such as temperature, light, seismic currents, chemical
leaks, radiation, and other parameters of interest. After detecting an event, sensor nodes forward the
sensed data toward the nearest base station. In order to process any message reported by the sensor
network, the base station needs to know the sender's location. For example, if the sensor network
is deployed to detect forest fires, the base station should know the reporting sensor's location. Hence,
the base station needs to be aware of the location of every sensor node deployed in the network. In this
section we explain different solutions proposed in the literature for location determination in sensor
networks. The performance of locationing algorithms can be measured [36] by the following parameters:
Resolution: The smallest distance between nodes that can be distinguished by the locationing system.
Accuracy: The probability of the locationing system finding the correct location.
Robustness: The ability of the locationing system to find the correct location when subjected to node
failures and link failures.
The Global Positioning System (GPS) [37] has been used to locate outdoor nodes, but due to reflection
and multi-path fading GPS is not a viable option for indoor locationing. Since sensor nodes can be deployed
at indoor locations or on other planets, a GPS-based locationing system is not advisable. Many
non-GPS-based locationing solutions have been proposed by the research community. Most of these solutions
are either proximity based or beacon based. In proximity-based solutions, some nodes act as special nodes
whose locations are known. We can divide proximity-based solutions into two types. In the first type [38],
beacons are sent by the special nodes, from which the other nodes can approximate their locations. In the
second type [39], beacons are sent by the nonspecial nodes, from which the special nodes can approximate
the locations of the nonspecial nodes. Cricket [38] uses the difference in arrival times from
known beacons as the basis for finding the location. In RADAR [40], the authors used SNR as the basis for
finding the location of nodes. The SpotON [41] system finds the location of nodes in three-dimensional
space. These solutions can be adopted for location detection in sensor networks. In Reference 42,
the authors proposed a location detection scheme consisting of local positioning and global positioning.
In local positioning, nodes approximate their relative locations from anchor nodes, whose
locations are assumed known, using a triangulation method. Global positioning finds the global location
using a cooperative ranging approach, in which nodes iteratively converge to their global positions by
interacting with each other.
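The triangulation step in local positioning can be sketched as 2-D trilateration from three anchors. The linearization below is a standard textbook construction, not the specific algorithm of Reference 42, and the coordinates are illustrative.

```python
# Minimal 2-D trilateration: estimate a node's position from distance
# measurements to three anchors with known positions. Subtracting the circle
# equation of the first anchor from the other two yields a 2x2 linear system,
# solved here by Cramer's rule.

def trilaterate(anchors, dists):
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = dists
    # Linear system A [x, y]^T = b, from (x - xi)^2 + (y - yi)^2 = di^2 pairs.
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21   # zero iff the three anchors are collinear
    return ((b1 * a22 - b2 * a12) / det, (a11 * b2 - a21 * b1) / det)

# A node at (3, 4), measured from anchors at (0,0), (10,0), and (0,10):
x, y = trilaterate([(0, 0), (10, 0), (0, 10)], [5.0, 65 ** 0.5, 45 ** 0.5])
```

With noisy range measurements one would use more than three anchors and a least-squares fit, but the linearization is the same.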
Saikat et al. [36] proposed a robust location detection algorithm for emergency applications; none
of the solutions explained above dealt with robustness. Robustness is an important issue in emergency
scenarios, such as building collapses. Their location detection scheme improves robustness by finding
identifying codes. Estrin et al. [21] gave an interesting application of their clustering algorithm:
pinpointing the location of an illegitimate object. This algorithm is robust to link or node failures, and its
overhead is proportional to the local population density and a sublinear function of the total number of nodes.
32.6.3 Lifetime Analysis
Lifetime refers to the time period during which a sensor network is capable of sensing and transmitting the
sensed data to the base station(s). In sensor networks, thousands of nodes are powered by a very limited
supply of battery power. As a result, lifetime analysis becomes an important tool for using the
available energy efficiently. In sensor networks using rechargeable energy sources, such as solar energy,
lifetime analysis helps the nodes use their energy efficiently between recharges. Lifetime analysis may include
an upper bound on the lifetime and the factors influencing this upper bound.
A theoretical upper bound on the lifetime of a sensor network helps in understanding the efficiency of
other protocols. Bhardwaj et al. [14] proposed a theoretical upper bound on the lifetime of a sensor
network deployed for tracking the movement of external objects. In Reference 43, the authors found the
lifetime of a sensor network with hybrid automata modeling; hybrid automata are a mathematical method
for analyzing systems with both discrete and continuous behaviors. The authors used trace data to
analyze the power consumption and to estimate the lifetime of a sensor network.
32.6.4 Power Management
Sensor networks should operate with the minimum possible energy to increase the lifetime of the sensor
nodes. This requires power-aware computation/communication component technology, low-energy
signaling and networking, and a power-aware software infrastructure.
Design challenges encountered in building wireless sensor networks can be broadly classified
into hardware, wireless networking, and OS/applications. All three categories should minimize power
usage to increase the life of a sensor node. Hardware includes the design activities related to all the hardware
platforms that make up sensor networks; MEMS, digital circuit design, system integration, and RF are
important categories in hardware design. The second aspect covers the design of power-efficient
algorithms and protocols. In previous sections, we described a few energy-efficient protocols for MAC and
routing. Next, we present a few OS/application-level strategies related to power management in sensor
nodes.
Once the system is designed, additional power savings can be obtained by using Dynamic Power
Management (DPM) [44]. The basic idea behind DPM is to shut down devices (sleep mode) when they are
not needed and bring them back when required. This needs an embedded operating system [45] that is able
to support DPM. Switching a node from the sleep state to the active state takes finite time
and resources. Each sensor node could be equipped with multiple devices. The number of devices switched
off determines the level of the sleep state. Each sleep state is characterized by its latency and power
consumption: the deeper the sleep state, the lower the power consumption and the higher the latency. This
requires careful use of DPM to maximize the life of a sensor node. In many cases, however, it is not known
beforehand when a particular device will be required; hence, stochastic analysis should be applied to predict
future events.
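The sleep-state selection problem can be sketched as a break-even computation: a deeper state only pays off if its lower power offsets the extra wake-up cost over the predicted idle time. The state table below is hypothetical, not drawn from Reference 44.

```python
# Break-even reasoning behind DPM sleep-state selection.
# States, powers, and wake-up costs are hypothetical, for illustration only.
SLEEP_STATES = [
    # (name, power_W, wakeup_energy_J, wakeup_latency_s)
    ("idle",       5e-3, 0.0,   0.0),
    ("doze",       1e-3, 2e-3,  0.005),
    ("deep_sleep", 1e-4, 20e-3, 0.05),
]

def best_state(predicted_idle_s, max_latency_s):
    """Pick the state with the lowest total energy over the predicted idle
    period, among those whose wake-up latency is tolerable."""
    best, best_energy = None, float("inf")
    for name, power, wake_e, wake_lat in SLEEP_STATES:
        if wake_lat > max_latency_s:
            continue                      # waking would miss the deadline
        energy = power * predicted_idle_s + wake_e
        if energy < best_energy:
            best, best_energy = name, energy
    return best
```

Since the idle time is not known in advance, a real DPM policy feeds a predicted idle time (from the stochastic analysis mentioned above) into such a selection rule.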
Energy can also be conserved by using Dynamic Voltage Scaling (DVS) [44, 46]. DVS minimizes idle
processor cycles by using a feedback control system. Energy savings can be obtained by optimizing the
sensor node's performance in the active state, and DVS is an effective tool for achieving this goal. The main
idea behind DVS is to adjust the supply voltage to match the workload. This requires tuning the processor to
deliver the required throughput while avoiding idle cycles. The crux of the problem lies in the fact that future
workloads are nondeterministic, so the efficiency depends on predicting the future workload.
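The energy argument for DVS can be sketched numerically, under the first-order assumption that the supply voltage scales linearly with clock frequency; all constants are illustrative.

```python
# Back-of-the-envelope DVS arithmetic: switching energy per cycle scales with
# V^2, and (to first order) supply voltage scales with clock frequency, so
# stretching a task to its deadline beats racing at full speed and idling.
F_MAX = 100e6   # Hz, maximum clock frequency (hypothetical)
V_MAX = 3.3     # V, supply voltage at F_MAX (hypothetical)
C_EFF = 1e-9    # F, effective switched capacitance per cycle (hypothetical)

def energy(cycles, freq):
    v = V_MAX * freq / F_MAX          # first-order voltage/frequency scaling
    return C_EFF * v * v * cycles     # E ~ C * V^2 per cycle

def dvs_energy(cycles, deadline_s):
    freq = min(F_MAX, cycles / deadline_s)  # just fast enough to finish on time
    return energy(cycles, freq)

full = energy(1e6, F_MAX)        # race at full speed, then sit idle
scaled = dvs_energy(1e6, 0.02)   # 1e6 cycles in 20 ms needs only 50 MHz
# Halving the frequency (and hence the voltage) quarters the switching energy.
```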
Efficient link layer strategies can be used to conserve energy at each sensor node. In Reference 47, the
authors propose to conserve energy by compromising on the quality of the established link. This is
possible by maintaining the bit error rate (BER) just below the user requirements. Different error control
algorithms, such as Bose-Chaudhuri-Hocquenghem (BCH) coding, convolutional coding, and turbo coding,
can be employed for error control; the algorithm with the lowest power consumption that supports the
predetermined BER and latency should be chosen.
Local computation and processing [23, 45] of sensor data in wireless networks can be made highly energy
efficient. Partitioning the computation among multiple sensor nodes and performing it in parallel
permits greater control over latency and conserves energy through frequency scaling and voltage scaling.
Biomedical wireless sensor networks can use power-efficient topologies [48] to save the energy spent
in communication. Biomedical sensor nodes include monitors and implantable devices intended for
long-term placement in the human body, and the topology is predetermined in these sensor networks.
Ayad et al. proposed the Directional Source-Aware routing Protocol (DSAP) for this class of sensor
networks. DSAP incorporates power considerations into routing tables, and the authors explored various
topologies to determine the most energy-efficient topology for biomedical sensor networks.
32.6.5 Clock Synchronization
Some of the communication algorithms for wireless sensor networks proposed in the literature make an inherent assumption that there exists some mechanism through which the local clocks of all the sensor nodes are synchronized. For this assumption to hold, an explicit way of synchronizing the local clocks of all sensor nodes is needed. Apart from the implementation of the communication algorithms, clock synchronization is required for accurate time stamps in cryptographic schemes, for recognizing duplicate detections of the same event from different sensor nodes, for data aggregation algorithms such as beam forming, for the ordering of logged events, and for many other similar applications. In this section, a post facto clock synchronization algorithm proposed by Elson and Estrin [49] is described.
The post facto clock synchronization algorithm discussed here is suitable for applications such as beam forming, duplicate event detection, and other similar localized methods. This algorithm is expected to be implemented on systems similar to the WINS (Wireless Integrated Network Sensors) platform, where a processor has various sleep modes and is capable of powering down high-energy peripherals. Because the sensor node processor can power down a device and power it up only when there is a requirement to sense and transmit data, existing clock synchronization methods for distributed systems are not applicable.
The basic idea behind the post facto clock synchronization algorithm is that for certain applications, such as data fusion and beam forming, it is sufficient to order the events in a localized fashion. In this scheme, the nodes' clocks are normally unsynchronized. When a stimulus arrives (time to sense and transmit data), each node records the stimulus with respect to its local clock. Immediately following this event, a third party broadcasts a synchronization pulse. Every node receiving this pulse normalizes its stimulus time stamp with respect to the broadcast synchronization pulse. It is essential to note that the time elapsed
2006 by Taylor & Francis Group, LLC
Issues and Solutions in Wireless Sensor Networks 32-11
between recording the stimulus and the arrival of the synchronization pulse needs to be measured accurately. For this reason, the algorithm is inappropriate for systems that need to communicate a time stamp over a long distance.
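The ordering step can be sketched as follows; the clock readings are hypothetical, and the sketch ignores propagation delay and receive-time uncertainty, which the full algorithm must bound:

```python
# Minimal sketch of post facto synchronization. Each node keeps an
# unsynchronized local clock; what is compared across nodes is only the
# interval between the stimulus and the shared synchronization pulse.

def normalize(stimulus_local, pulse_local):
    """Time elapsed between stimulus and sync pulse, in the node's own clock."""
    return pulse_local - stimulus_local

# Node A's clock reads 1000.0 at the stimulus; node B's reads 5230.0.
# The absolute readings are meaningless because the clocks are unsynchronized.
elapsed_a = normalize(stimulus_local=1000.0, pulse_local=1012.5)
elapsed_b = normalize(stimulus_local=5230.0, pulse_local=5240.0)

# The node with the larger elapsed time observed the stimulus earlier
# relative to the shared pulse, which is enough to order the two events.
first = "A" if elapsed_a > elapsed_b else "B"
print(first)  # -> A
```

This is why only the interval, not the absolute time stamp, has to be measured accurately at each node.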
32.6.6 Reliability
Reliable transfer of critical sensed data is a very important issue in sensor networks. Reliability can be achieved at the MAC layer, the transport layer, or the application layer. In Reference 50, the authors concluded that reliability at both the MAC and transport layers is important.
In sensor networks, the base station uses the data sensed by different sensors to infer the occurrence of events. Hence, reliable delivery of data from the sensors to the base station is critical. In ESRT [51], the sink maintains an application-specific target reliability value, which depends on the reporting frequency of the sensor nodes. The ESRT protocol adaptively adjusts the reporting frequency of the sensors based on the required reliability.
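A minimal sketch of this adaptive loop follows; the function name and the simple proportional update rule are illustrative, not ESRT's exact state machine, which distinguishes several congestion/reliability regimes:

```python
# Hedged sketch of ESRT-style reporting-frequency adaptation.
# `observed` and `target` are reliability values (fraction of required
# event reports actually received at the sink).
def update_reporting_freq(freq, observed, target, congested):
    if congested:
        return freq * 0.8  # back off to relieve congestion
    if observed < target:
        # under-reliable: ask sensors to report proportionally more often
        return freq * target / max(observed, 1e-9)
    return freq            # target met: leave the frequency alone

f = update_reporting_freq(freq=10.0, observed=0.5, target=1.0, congested=False)
print(f)  # -> 20.0
```

The sink computes the new frequency and broadcasts it to the sensors, closing the feedback loop each reporting interval.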
32.6.7 Sensor Placement and Organization for Coverage and Connectivity
Sensor networks are deployed to perform sensing, monitoring, surveillance, and detection tasks. The area
in which a sensor node can perform its tasks with reasonable accuracy (i.e., the sensor readings have at
least a threshold level of sensing/detection probabilities within that area) is also known as the coverage
area. The union of coverage areas of individual sensor nodes is the coverage area of a sensor network.
The coverage area can be modeled as a circular disk (similar to a sphere in 3D) surrounding a sensor
at the center. The coverage areas can be irregular and can be location dependent due to the obstructions
in the terrain, for example, sensor nodes deployed for indoor applications, in urban and hilly areas [52].
The coverage area may also depend on the target, for example, a bigger target can be detected at a longer
distance than a smaller target [53].
The degree of sensing coverage is a measure of the sensing quality provided by the sensor network in a designated area. The coverage requirement depends on the application. For some applications, covering every location with at least a single sensor node might be sufficient, while other applications might need a higher degree of coverage [54]; for example, to pinpoint the exact location of a target, it might be necessary that every location be monitored by multiple sensor nodes [55]. Covering every location with multiple sensors can provide robustness. Some applications may require preferential coverage of critical points; for example, sensitive areas in the sensor field may require more surveillance/monitoring and should be covered by more sensors than other areas [52]. The coverage requirements can also change with time due to changes in environmental conditions; for example, visibility can vary due to fog or smoke. A low degree of coverage might be sufficient in normal circumstances, but when a critical event is sensed, a high degree of coverage may be desired [54].
It is desirable to achieve the required degree of coverage and robustness with the minimum number
of active sensors so as to minimize the interference and the information redundancy [54, 56]. However,
due to the limited range of the wireless communication, the minimum number of sensors required for
the coverage may not guarantee the connectivity of the resulting sensor network. The network is said to
be connected if any sensor node can communicate with any other sensor node (possibly using other sensor
nodes as intermediate nodes). In some cases, the physical proximity of sensor nodes may neither guarantee
connectivity nor coverage due to the obstacles, such as buildings, walls, and trees. The connectivity of the
sensor nodes also depends on the physical-layer technology used for communication. Some technologies require the transmitter and the receiver to be in line-of-sight, for example, infrared and ultrasound [57].
Maintaining greater connectivity is desirable for good throughput and to avoid network partitioning due
to node failures [54].
The sensor nodes can be deployed randomly or deterministically in the sensor field. Next, we discuss the issues and proposed strategies for the placement and organization of sensor nodes.
32.6.7.1 Sensor Placement for Connectivity and Coverage
When the sensor nodes are deployed deterministically, a good placement strategy can minimize the cost and the energy consumption, thereby increasing the lifetimes of sensor nodes, while guaranteeing the desired level of coverage, connectivity, and robustness [55].
Chakrabarty et al. [55] and Ray et al. [57] have used a framework of identifying codes to determine sensor placements for target location detection. The identifying code problem, in an undirected graph, finds an optimal covering of vertices such that any vertex in the graph can be uniquely identified by the subset of vertices that cover it. If each location in the sensor field is covered by a unique subset of sensors, then the position of a target can be determined from the subset of sensors that observe the target. However, determining the minimum number of sensors that must be deployed for uniquely identifying each position of the target is equivalent to constructing an optimal identifying code, which is an NP-complete problem [57]. Ray et al. [57] have proposed a polynomial-time algorithm to compute irreducible identifying codes such that the resulting codes can tolerate up to a given number of errors in the received identifying code packets while still providing position information.
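The localization idea behind identifying codes can be sketched as a lookup from covering subsets to locations; the three-location layout below is hypothetical:

```python
# Sketch of localization with an identifying code: if each location is
# covered by a distinct subset of sensors, the set of sensors that report
# a target uniquely identifies its location. The layout is illustrative.
coverage = {                       # location -> sensors covering it
    "L1": frozenset({"s1"}),
    "L2": frozenset({"s1", "s2"}),
    "L3": frozenset({"s2", "s3"}),
}

# Valid identifying code: all covering subsets must be distinct.
assert len(set(coverage.values())) == len(coverage)

lookup = {sensors: loc for loc, sensors in coverage.items()}

def locate(reporting_sensors):
    """Map the set of sensors that observed the target to a location."""
    return lookup.get(frozenset(reporting_sensors))

print(locate({"s1", "s2"}))  # -> L2
```

The error tolerance in Reference 57 amounts to making these covering subsets far apart in Hamming distance, so that a few flipped sensor reports still map to the correct location.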
Zou and Chakrabarty [58] have proposed a virtual force algorithm to improve the coverage after an initial random deployment of sensor nodes. Initially, the sensors are deployed randomly in the sensor field. It is assumed that if two sensor nodes are very close to each other (closer than a predefined threshold), they exert (virtual) repulsive forces on each other. If two sensor nodes are very far apart (farther than a predefined threshold), they exert (virtual) attractive forces on each other. Obstacles exert repulsive forces, and areas of preferential coverage exert attractive forces, on a sensor node. The objective is to move sensor nodes from densely concentrated regions to sparsely concentrated regions so as to achieve uniform placement. The sensor nodes do not physically move during the execution of the virtual force algorithm; instead, a sequence of virtual motion paths is determined. After the new positions of the sensors are identified, a one-time movement is carried out to redeploy the sensors at their new positions.
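One iteration of the virtual force computation might be sketched as follows; the thresholds, unit force magnitudes, and step size are illustrative choices, not the parameters used in Reference 58, and obstacle/preferential-area forces are omitted:

```python
import math

# Hedged sketch of the virtual force idea: pairwise repulsive forces when
# nodes are closer than CLOSE, attractive forces when farther than FAR.
CLOSE, FAR, STEP = 1.0, 3.0, 0.1

def virtual_step(nodes):
    """One iteration: return new *virtual* positions (no physical movement)."""
    new = []
    for i, (xi, yi) in enumerate(nodes):
        fx = fy = 0.0
        for j, (xj, yj) in enumerate(nodes):
            if i == j:
                continue
            dx, dy = xi - xj, yi - yj
            d = math.hypot(dx, dy) or 1e-9
            if d < CLOSE:       # too close: unit repulsive force
                fx, fy = fx + dx / d, fy + dy / d
            elif d > FAR:       # too far: unit attractive force
                fx, fy = fx - dx / d, fy - dy / d
        new.append((xi + STEP * fx, yi + STEP * fy))
    return new

nodes = [(0.0, 0.0), (0.5, 0.0), (5.0, 0.0)]
nodes = virtual_step(nodes)  # the clustered pair spreads out and the
                             # distant node is pulled toward the others
```

Iterating `virtual_step` until the forces are small yields the virtual motion path; only then is the one-time physical redeployment performed.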
32.6.7.2 Sensor Organization for Connectivity and Coverage
Sensor networks deployed in enemy territories, inhospitable areas, or disaster-struck areas preclude deterministic placement of sensor nodes [53]. Dispersing a large number of sensor nodes over a sensor field from an airplane is one way to deploy sensor networks in those areas. Since the sensor nodes may be scattered arbitrarily, a very large number of sensor nodes are deployed compared with the number of sensor nodes that would have been deployed if deterministic placement were possible. Therefore, it is advantageous to operate the minimum number of sensor nodes required for sensing coverage and connectivity in the active mode and the remaining nodes in the passive (sleep) mode. The passive nodes can be made active as and when neighboring active nodes deplete their energy or fail, so as to increase the lifetime of the sensor network. When the sensor nodes are deployed randomly, the main challenge is to develop an efficient distributed localized strategy for sensor organization that maximizes the lifetime of the network while guaranteeing the coverage and connectivity of active nodes [54, 56].
Wang et al. [54] have proposed a Coverage Configuration Protocol (CCP), which minimizes the number of active nodes required for coverage and connectivity. CCP assumes that the sensing areas and the transmission areas are circular and obstacle-free. The authors have shown that the set of sensor nodes that covers a convex region is connected if the transmission radius is at least twice the sensing radius. In CCP, each node determines whether it is eligible to become active based on the coverage provided by its active neighbors. It is shown that a set of sensors in a convex region provides the required degree of coverage if (1) all the intersection points between any sensing circles have the required degree of coverage and (2) all the intersection points between any sensing circle and the region's boundary have the required degree of coverage. A sensor node discovers other active sensor nodes, and their locations, within a distance of twice the sensing radius through HELLO messages. It then finds the coverage degree of all the intersection points within its coverage area. A sensor node is not eligible to become active if all the intersection points within its coverage area have the required degree of coverage. If there are no intersection points within its coverage area, then it is ineligible if the required number of active sensors are located at the same position as itself. Each node periodically checks its eligibility, and only eligible
nodes remain active, sense the environment, and communicate with other active nodes. As active nodes deplete their energy, nonactive nodes become eligible and become active to maintain the required degree of coverage.
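The eligibility test can be sketched for the interior case (ignoring the region's boundary, which CCP also checks); the radius, coordinates, and tolerance below are illustrative:

```python
import math

R = 1.0  # sensing radius, assumed equal for all nodes (as in CCP)

def intersections(a, b):
    """Intersection points of the equal-radius sensing circles of a and b."""
    (x1, y1), (x2, y2) = a, b
    d = math.hypot(x2 - x1, y2 - y1)
    if d == 0 or d > 2 * R:
        return []
    h = math.sqrt(R * R - (d / 2) ** 2)      # half-chord length
    mx, my = (x1 + x2) / 2, (y1 + y2) / 2
    ux, uy = (y2 - y1) / d, -(x2 - x1) / d   # unit normal to the segment
    return [(mx + h * ux, my + h * uy), (mx - h * ux, my - h * uy)]

def coverage_degree(p, active):
    """Number of active sensors whose sensing disk contains point p."""
    return sum(math.hypot(p[0] - x, p[1] - y) <= R + 1e-9 for (x, y) in active)

def eligible(node, active, ks=1):
    """Node becomes active iff some intersection point inside its sensing
    area is covered by fewer than ks active sensors."""
    pts = [p for i, a in enumerate(active) for b in active[i + 1:]
           for p in intersections(a, b)
           if math.hypot(p[0] - node[0], p[1] - node[1]) <= R]
    return any(coverage_degree(p, active) < ks for p in pts)

# Two active nodes already 1-cover the area around (0.75, 0): ineligible
# for ks=1, but eligible when the application demands 3-coverage.
print(eligible((0.75, 0.0), [(0.0, 0.0), (1.5, 0.0)], ks=1))  # -> False
```

Raising the required degree `ks` at runtime is how CCP supports the application-dependent coverage requirements discussed above.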
Zhang and Hou [56] have proposed an Optimal Geographical Density Control (OGDC) algorithm, which maintains coverage and connectivity by keeping the minimum number of sensors in the active mode. The idea behind OGDC is similar to that of CCP, that is, if all the intersection points are covered by active sensor nodes, then the entire area is covered. OGDC minimizes the number of active nodes by selecting nodes such that the overlap of the sensing areas of active nodes is minimal. It is shown that to minimize the overlap, the intersection point of two circles should be covered by a third circle such that the centers of the three circles form an equilateral triangle with a side length of √3 r, where r is the sensing radius. To cover the intersection point of two circles, a third node is therefore selected whose position is closest to this optimal position. This process continues until the entire area is covered. All the selected nodes become active, and the nodes not selected go to sleep mode.
32.6.8 Topology Control
The topology of the sensor network is induced by the wireless links connecting the sensor nodes. The wireless connectivity of the nodes depends on many parameters, such as the physical layer technology, propagation conditions, terrain, noise, antenna characteristics, and the transmit power [59]. The topology of the network can be controlled by adjusting the tunable parameters, such as the power levels of the transmitters [59-63]. The topology of the network affects its performance in many ways. A sparse topology can increase the chances of network partitioning due to node failures and can increase the delay. On the other hand, a dense topology can limit the capacity due to limited spatial reuse and can increase the interference and the energy consumption [59]. A distributed localized topology control algorithm that adjusts the tunable parameters to achieve the desired level of performance while minimizing the energy consumption is highly desirable.
Wattenhofer et al. [62, 63] have proposed a two-phase distributed Cone-Based Topology Control (CBTC) algorithm. In the first phase, each node broadcasts a neighbor-discovery message with a small radius and records all the acknowledgments and the directions from which the acknowledgments came. The node continues its neighbor-discovery process by increasing its transmission power (radius) until either it finds at least one neighbor in every cone of angle α centered on that node or it reaches its maximum transmission power. The authors have proved that for α ≤ 5π/6, the algorithm guarantees that the resulting network topology is connected. In the second phase, the algorithm eliminates redundant edges without affecting the minimum-power routes of the network.
Li et al. [60] have proposed an MST (Minimum Spanning Tree)-based topology control algorithm, called Local Minimum Spanning Tree (LMST). In the information exchange phase, each node collects the node ids and positions of all the nodes within its maximum transmission range using HELLO messages. In the topology construction phase, each node independently constructs its local MST using Prim's algorithm. The transmission power needed to reach a node is taken as the cost of an edge to that node. The final topology of the network is derived from all the local MSTs by keeping as neighbors only on-tree nodes that are one hop away. To retain only bidirectional links, either all the unidirectional links are converted into bidirectional links or the unidirectional links are deleted. The authors have proved that the resulting topology preserves the network connectivity and that the node degree of any node is bounded by 6.
32.7 Conclusions
In this chapter, we presented an overview of wireless sensor networks and described some design issues. We discussed various solutions proposed to prolong the lifetime of sensor networks.
Proposed solutions to issues such as MAC-layer design, routing data from the sensor nodes to the base station, power management, location determination, and clock synchronization were discussed.
References
[1] Sohrabi, K. and Pottie, G.J. Performance of a novel self-organization protocol for wireless ad-hoc sensor networks. In Proceedings of the IEEE Vehicular Technology Conference, vol. 2, 1999, pp. 1222-1226.
[2] Min, R., Bhardwaj, M., Cho, Seong-Hwan, Shih, E., Sinha, A., Wang, A., and Chandrakasan, A. Low-power wireless sensor networks. In Proceedings of the 14th International Conference on VLSI Design, 2001, pp. 205-210.
[3] Estrin, D., Girod, L., Pottie, G., and Srivastava, M. Instrumenting the world with wireless sensor networks. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2001, pp. 2033-2036.
[4] Pottie, G.J. Wireless sensor networks. In Information Theory Workshop, 1998, pp. 139-140.
[5] Agre, J. and Clare, L. An integrated architecture for cooperative sensing networks. Computer, 33, 106-108, 2000.
[6] Sohrabi, K., Gao, J., Ailawadhi, V., and Pottie, G.J. Protocols for self-organization of a wireless sensor network. IEEE Personal Communications, 7, 16-27, 2000.
[7] Gandham, Shashidhar Rao, Dawande, Milind, Prakash, Ravi, and Venkatesan, S. Energy efficient schemes for wireless sensor networks with multiple mobile stations. IEEE Globecom, 1, 377-381, 2003.
[8] Heinzelman, W., Kulik, J., and Balakrishnan, H. Adaptive protocols for information dissemination in wireless sensor networks. In Proceedings of the Fifth Annual ACM/IEEE International Conference on Mobile Computing and Networking, 1999, pp. 174-185.
[9] Ye, F., Chen, A., Liu, S., and Zhang, L. A scalable solution to minimum cost forwarding in large sensor networks. In Proceedings of the Tenth International Conference on Computer Communications and Networks, 2001, pp. 304-309.
[10] Lindsey, S. and Raghavendra, C.S. PEGASIS: power-efficient gathering in sensor information systems. In Proceedings of the International Conference on Communications, 2001.
[11] Manjeshwar, A. and Agrawal, D.P. TEEN: a routing protocol for enhanced efficiency in wireless sensor networks. In Proceedings of the 15th International Parallel and Distributed Processing Symposium, 2001, pp. 2009-2015.
[12] Gao, J. Analysis of energy consumption for ad hoc wireless sensor networks using the watts-per-meter metric. IPN Progress Report, 42-150, 2002.
[13] Youssef, M.A., Younis, M.F., and Arisha, K.A. A constrained shortest-path energy-aware routing algorithm for wireless sensor networks. In Proceedings of the Wireless Communications and Networking Conference, vol. 2, 2002, pp. 794-799.
[14] Bhardwaj, M., Chandrakasan, A., and Garnett, T. Upper bounds on the lifetime of sensor networks. In Proceedings of the IEEE International Conference on Communications, 2001, pp. 785-790.
[15] Elson, J. and Estrin, D. Time synchronization for wireless sensor networks. In Proceedings of the 15th International Parallel and Distributed Processing Symposium, 2001, pp. 1965-1970.
[16] Ye, Wei, Heidemann, John, and Estrin, Deborah. An energy-efficient MAC protocol for wireless sensor networks. In Proceedings of the IEEE INFOCOM, 2002.
[17] Bharghavan, V., Demers, A., Shenker, S., and Zhang, L. MACAW: a media access protocol for wireless LANs. In Proceedings of the ACM SIGCOMM Conference, 1994.
[18] Singh, S. and Raghavendra, C.S. PAMAS: power aware multi-access protocol with signalling for ad-hoc networks. ACM Computer Communication Review, 28, 5-26, 1998.
[19] Tanenbaum, Andrew S. Computer Networks, 3rd ed., Prentice-Hall Inc., New York, 1996.
[20] Bennett, Frazer, Clarke, David, Evans, Joseph B., Hopper, Andy, Jones, Alan, and Leask, David. Piconet: embedded mobile networking. IEEE Personal Communications, 4, 8-15, 1997.
[21] Estrin, D., Govindan, R., Heidemann, J., and Kumar, S. Next century challenges: scalable coordination in sensor networks. In Proceedings of the Fifth Annual ACM/IEEE International Conference on Mobile Computing and Networking, 1999, pp. 263-270.
[22] Heinzelman, W., Kulik, J., and Balakrishnan, H. Negotiation-based protocols for disseminating information in wireless sensor networks. In Proceedings of the Fifth Annual ACM/IEEE International Conference on Mobile Computing and Networking, 1999.
[23] Heinzelman, W.R., Chandrakasan, A., and Balakrishnan, H. Energy-efficient communication protocol for wireless micro sensor networks. In Proceedings of the 33rd Annual Hawaii International Conference on System Sciences, 2000, pp. 3005-3014.
[24] Menezes, Alfred J., van Oorschot, Paul C., and Vanstone, Scott A. Handbook of Applied Cryptography. CRC Press, Boca Raton, FL, October 1996.
[25] Perrig, Adrian, Szewczyk, Robert, Wen, Victor, Culler, David, and Tygar, J.D. SPINS: security protocols for sensor networks. Wireless Networks Journal, 8, 521-534, 2002.
[26] Undercoffer, Jeffery, Avancha, Sasikanth, Joshi, Anupam, and Pinkston, John. Security for sensor networks. In CADIP Research Symposium, 2002.
[27] Slijepcevic, Sasa, Potkonjak, Miodrag, Tsiatsis, Vlasios, Zimbeck, Scott, and Srivastava, Mani B. On communication security in wireless ad-hoc sensor networks. In Proceedings of the 11th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises, Pittsburgh, PA, June 10-12, 2002.
[28] Liu, Donggang and Ning, Peng. Efficient distribution of key chain commitments for broadcast authentication in distributed sensor networks. In Proceedings of the 10th Annual Network and Distributed System Security Symposium, San Diego, CA, February 2003.
[29] Lazos, Loukas and Poovendran, Radha. Secure broadcast in energy-aware wireless sensor networks. In Proceedings of the IEEE International Symposium on Advances in Wireless Communications, Victoria, BC, Canada, September 23-24, 2002.
[30] Karlof, Chris and Wagner, David. Secure routing in wireless sensor networks: attacks and countermeasures. In Proceedings of the First IEEE International Workshop on Sensor Network Protocols and Applications, May 2003.
[31] Eschenauer, Laurent and Gligor, Virgil D. A key-management scheme for distributed sensor networks. In Proceedings of the ACM Conference on Computer and Communications Security, Washington, DC, 2002, pp. 41-47.
[32] Chan, Haowen, Perrig, Adrian, and Song, Dawn. Random key predistribution schemes for sensor networks. In Proceedings of the 2003 IEEE Symposium on Research in Security and Privacy, 2003.
[33] Carman, D.W., Matt, B.J., and Cirincione, G.H. Energy-efficient and low-latency key management for sensor networks. In Proceedings of the 23rd Army Science Conference, Orlando, FL, December 2-5, 2002.
[34] Law, Yee Wei, Etalle, Sandro, and Hartel, Pieter H. Key management with group-wise pre-deployed keying and secret sharing pre-deployed keying. Centre for Telematics and Information Technology, University of Twente, The Netherlands, Technical report TR-CTIT-02-25, July 2002.
[35] Law, Yee Wei, Corin, Ricardo, Etalle, Sandro, and Hartel, Pieter H. A formally verified decentralized key management architecture for wireless sensor networks. In Proceedings of the 4th IFIP TC6/WG6.8 International Conference on Personal Wireless Communications (PWC), LNCS 2775, Venice, Italy, September 2003, pp. 27-39.
[36] Ray, Saikat, Ungrangsi, Rachanee, De Pellegrini, Francesco, Trachtenberg, Ari, and Starobinski, David. Robust location detection in emergency sensor networks. In Proceedings of the IEEE INFOCOM, 2003.
[37] Hofmann-Wellenhof, B., Lichtenegger, H., and Collins, J. Global Positioning System: Theory and Practice, 4th ed., Springer-Verlag, Heidelberg, 1997.
[38] Priyantha, Nissanka B., Chakraborty, Anit, and Balakrishnan, Hari. The cricket location-support system. In Proceedings of the ACM MOBICOM Conference, Boston, MA, 2000.
[39] Want, Roy, Hopper, Andy, Falcao, Veronica, and Gibbons, Jon. The active badge location system. ACM Transactions on Information Systems, 10, 91-102, 1992.
[40] Bahl, Paramvir and Padmanabhan, Venkata N. RADAR: an in-building RF-based user location and tracking system. In Proceedings of the IEEE INFOCOM Conference, Tel Aviv, Israel, 2000.
[41] Hightower, Jeffrey, Borriello, Gaetano, and Want, Roy. SpotON: an indoor 3D location sensing technology based on RF signal strength. Technical report 2000-020-02, University of Washington, February 2000.
[42] Savarese, C. and Rabaey, J. Locationing in distributed ad-hoc wireless sensor networks. In IEEE Proceedings on Acoustics, Speech, and Signal Processing, 2001, pp. 2037-2040.
[43] Coleri, Sinem, Ergen, Mustafa, and Koo, T. John. Lifetime analysis of a sensor network with hybrid automata modelling. In Proceedings of the ACM WSNA Conference, Atlanta, GA, September 2002.
[44] Sinha, A. and Chandrakasan, A. Dynamic power management in wireless sensor networks. IEEE Design and Test of Computers, 18, 62-74, 2001.
[45] Wang, A. and Chandrakasan, A. Energy efficient system partitioning for distributed wireless sensor networks. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, 2001, pp. 905-908.
[46] Im, C., Kim, Huiseok, and Ha, Soonhoi. Dynamic voltage scheduling technique for low-power multimedia applications using buffers. In Proceedings of the International Symposium on Low Power Electronics and Design, 2001, pp. 34-39.
[47] Shih, E., Calhoun, B.H., Cho, Seong Hwan, and Chandrakasan, A.P. Energy-efficient link layer for wireless micro sensor networks. In Proceedings of the IEEE Computer Society Workshop on VLSI, 2001, pp. 16-21.
[48] Salhieh, A., Weinmann, J., Kochhal, M., and Schwiebert, L. Power efficient topologies for wireless sensor networks. In Proceedings of the International Conference on Parallel Processing, 2001, pp. 156-163.
[49] Elson, J. and Estrin, D. Time synchronization for wireless sensor networks. In Proceedings of the 15th International Parallel and Distributed Processing Symposium, 2001, pp. 1965-1970.
[50] Stann, Fred and Heidemann, John. RMST: reliable data transport in sensor networks. In Proceedings of the IEEE International Workshop on Sensor Net Protocols and Applications, May 2003.
[51] Sankarasubramaniam, Yogesh, Akan, Ozgur B., and Akyildiz, Ian F. ESRT: event-to-sink reliable transport in wireless sensor networks. In ACM MobiHoc, June 2003.
[52] Dhillon, Santpal Singh and Chakrabarty, Krishnendu. Sensor placement for effective coverage and surveillance in distributed sensor networks. In Proceedings of the IEEE Wireless Communications and Networking Conference, March 2003.
[53] Slijepcevic, Sasa and Potkonjak, Miodrag. Power efficient organization of wireless sensor networks. In Proceedings of the IEEE International Conference on Communications, June 2001.
[54] Wang, X., Xing, G., Zhang, Y., Lu, C., Pless, R., and Gill, C. Integrated coverage and connectivity configuration in wireless sensor networks. In Proceedings of the ACM SenSys 2003, November 2003.
[55] Chakrabarty, K., Iyengar, S.S., Qi, H., and Cho, E. Grid coverage for surveillance and target location in distributed sensor networks. IEEE Transactions on Computers, 51(12), 1448-1453, December 2002.
[56] Zhang, Honghai and Hou, Jennifer C. Maintaining sensing coverage and connectivity in large sensor networks. Technical report UIUCDCS-R-2003-2351, University of Illinois at Urbana-Champaign, June 2003.
[57] Ray, Saikat, Ungrangsi, Rachanee, De Pellegrini, Francesco, Trachtenberg, Ari, and Starobinski, David. Robust location detection in emergency sensor networks. In Proceedings of the IEEE INFOCOM, April 2003.
[58] Zou, Yi and Chakrabarty, Krishnendu. Sensor deployment and target localization based on virtual forces. In Proceedings of the IEEE INFOCOM, April 2003.
[59] Ramanathan, Ram and Rosales-Hain, Regina. Topology control of multihop wireless networks using transmit power adjustment. In Proceedings of the IEEE INFOCOM, March 2000.
[60] Li, Ning, Hou, Jennifer C., and Sha, Lui. Design and analysis of an MST-based topology control algorithm. In Proceedings of the IEEE INFOCOM, April 2003.
[61] Liu, Jilei and Li, Baochun. Distributed topology control in wireless sensor networks with asymmetric links. In Proceedings of the IEEE GLOBECOM, December 2003.
[62] Wattenhofer, Roger, Li, Li, Bahl, Paramvir, and Wang, Yi-Min. Distributed topology control for power efficient operation in multihop wireless ad hoc networks. In Proceedings of the IEEE INFOCOM, April 2001.
[63] Li, Li, Halpern, Joseph Y., Bahl, Paramvir, Wang, Yi-Min, and Wattenhofer, Roger. Analysis of a cone-based distributed topology control algorithm for wireless multi-hop networks. In Proceedings of the ACM Symposium on Principles of Distributed Computing, August 2001.
33
Architectures for
Wireless Sensor
Networks
S. Dulman,
S. Chatterjea,
T. Hoffmeijer,
P. Havinga,
and J. Hurink
University of Twente
33.1 Sensor Node Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33-2
Mathematical Energy Consumption Model of a Node
33.2 Wireless Sensor Network Architectures . . . . . . . . . . . . . . . . 33-5
Protocol Stack Approach • EYES Project Approach
33.3 Data-Centric Architecture. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33-17
Motivation • Architecture Description
33.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33-21
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33-21
The vision of ubiquitous computing requires the development of devices and technologies that can be
pervasive without being intrusive. The basic component of such a smart environment will be a small
node with sensing and wireless communications capabilities, able to organize itself flexibly into a network for data collection and delivery. Building such a sensor network presents many significant challenges,
especially at the architectural, protocol, and operating system level.
Although sensor nodes might be equipped with a power supply or energy scavenging means and an
embedded processor that makes them autonomous and self-aware, their functionality and capabilities
will be very limited. Therefore, collaboration between nodes is essential to deliver smart services in
a ubiquitous setting. New algorithms for networking and distributed collaboration need to be developed.
These algorithms will be the key for building self-organizing and collaborative sensor networks that show
emergent behavior and can operate in a challenging environment where nodes move, fail, and energy is
a scarce resource.
The question that arises is how to organize the internal software and hardware components in a manner that will allow them to work properly and be able to adapt dynamically to new environments, requirements, and applications. At the same time, the solution should be general enough to be suited to as many applications as possible. Architecture definition also includes, at the higher level, a global view of the whole network: the topology, the placement of base stations, beacons, etc. are also of interest.
In this chapter, we will present and analyze some of the characteristics of architectures for wireless sensor networks. Then, we will propose a new dataflow-based architecture that allows, as a new feature, the dynamic reconfiguration of the sensor node's software at runtime.
33.1 Sensor Node Architecture
Existing technology already allows the integration of functionality for information gathering, processing, and communication in a tight package or even in a single chip (e.g., Figure 33.1 presents the EYES sensor node [1]). The four basic blocks needed to construct a sensor node are (see Figure 33.2):
Sensor platform. The sensors are the interfaces to the real world. They collect the necessary information
and have to be monitored by the central processing unit (CPU). The platforms may be built in a modular
way such that a variety of sensors can be used in the same network. The utilization of a very wide range of
sensors (monitoring characteristics of the environment, such as light, temperature, air pollution, pressure,
etc.) is envisioned. The sensing unit can also be extended to contain one or more actuation units (e.g., to
give the node the possibility of repositioning itself).
Processing unit. This is the intelligence of the sensor node: it will not only collect the information detected by the sensors but will also communicate with the rest of the network. The level of intelligence in the sensor node will strongly depend on the type of information that is gathered by its sensors and on the way in which the network operates. The sensed information will be preprocessed to reduce the amount of data to be transmitted via the wireless interface. The processing unit will also have to execute networking protocols in order to forward the results of the sensing operation through the network to the requesting user.
Communication interface. This is the link of each node to the sensor network itself. The focus here is on
wireless communication links, in particular radio communication, although visible or infrared
light, ultrasound, and other means of communication have already been used [2]. The radio transceivers
used can usually function in simplex mode only, and can be turned off completely in order to save
energy.
Power source. Owing to the application areas of the sensor networks, autonomy is an important
issue. Sensor nodes are usually equipped with a power supply in the form of one or more batteries.
Current studies focus on reducing the energy consumption by using low-power hardware components
FIGURE 33.1 EYES sensor node. (From EYES. Eyes European project, http://eyes.eu.org. With permission.)
Architectures for Wireless Sensor Networks 33-3
FIGURE 33.2 Sensor node components: sensor platform, processing unit, communication interface, and power source.
and advanced networking and data management algorithms. The use of energy scavenging techniques
might even make it possible for sensor nodes to be self-powered. No matter which
form of power source is used, energy remains a scarce resource, and a series of trade-offs will be employed
during the design phase to minimize its usage.
Sensor networks will be heterogeneous from the point of view of the types of nodes deployed. Moreover,
whether or not any specific sensor node can be considered part of the network depends only on
its correct usage of and participation in the sensor network's suite of protocols, and not on the node's specific
way of implementing software or hardware. An intuitive description given in Reference 3 envisions a sea of
sensor nodes, some of them mobile and some static, occasionally containing tiny isles
of relatively resource-rich devices. Some nodes in the system may execute autonomously (e.g., forming
the backbone of the network by executing network and system services, controlling various information
retrieval and dissemination functions, etc.), while others will have less functionality (e.g., just gathering
data and relaying it to a more powerful node). Thus, from the sensor node architecture point of view,
we can distinguish between several kinds of sensor nodes. A simple approach, yet sufficient in the majority
of cases, is to have two kinds of nodes: high-end sensor nodes (nodes that have plenty of
resources or superior capabilities; the best candidate for such a node would probably be a fully equipped
PDA or even a laptop) and low-end nodes (nodes that have only the basic functionality of the
system and very limited processing capabilities).
The architecture of a sensor node involves two main steps: defining precisely which
functionalities are needed, and deciding how to join them into a coherent sensor node. In other words, sensor node
architecture means defining the exact way in which the selected hardware components connect to each
other, how they communicate, how they interact with the CPU, etc.
A large variety of sensor node architectures have been built up to this moment. As a general design
rule, all of them have targeted the following three objectives: energy efficiency, small size, and low cost.
Energy efficiency is by far the most important design constraint, because the lifetime of a sensor node
depends on its energy consumption. As the typical deployment scenario for sensor networks assumes that
the power supplies of the nodes are limited and not rechargeable, a series of trade-offs need to be made
to decrease the amount of energy consumed. The small size of the nodes makes it possible to deploy
many of them to study a certain phenomenon; the ideal size is suggested by the name of one of the first
research projects in the area: SmartDust [4]. Very cheap sensor nodes will lead to rapid deployment of
such networks and large-scale usage.
33.1.1 Mathematical Energy Consumption Model of a Node
In this section, we present a basic version of an energy model for a sensor node. The aim of the model is
to predict the current energy state of the battery of a sensor node, based on historical data on the use of
the sensor node and an earlier known energy state of the battery.
In general, a sensor node may consist of several components. The main components are: a radio,
a processor, a sensor, a battery, external memory, and periphery (e.g., a voltage regulator, debugging
equipment, or periphery to drive an actuator). In the presented model we consider only the first four
components. The external memory is neglected at this stage of the research, since its use of energy is rather
complex and needs an energy model of its own if the memory is a relevant part of the functional behavior of
the sensor node and not just used for storage. The periphery can be quite different from node to node and,
thus, cannot be integrated in an energy model of a sensor node in a uniform way.
For the battery we assume that the usage of energy by the other components is independent of the
current energy state of the battery. This implies that the reduction of the energy state of the battery
depends only on the actions of the different components. Furthermore, we do not consider a reactivation
of the battery by time or external circumstances. Based on these assumptions, it remains to give models
for the energy consumption of the three components radio, processor, and sensor.
The basis of the model for the energy consumption of a component is the definition of a set S of
possible states s_1, ..., s_k for the component. These states are defined such that the energy consumption
of the component is given by the sum of the energy consumptions within the states s_1, ..., s_k
plus the energy needed to switch between the different states. We assume that the energy consumption
within a state s_j can be measured using a simple index t_j (e.g., execution time or number of instructions)
and that the energy needed to switch between the different states can be calculated on the basis
of a state transition matrix st, where st_ij denotes the number of times the component has switched from
state s_i to state s_j. If now P_j denotes the power needed in state s_j and E_ij denotes the energy
consumption of switching once from state s_i to state s_j, the total energy consumption of the component is
given by

E_consumed = sum_{j=1..k} t_j P_j + sum_{i,j=1..k, i != j} st_ij E_ij    (33.1)
In the following, we describe the state sets S and the indices to measure the energy consumption within
the states for the radio, processor, and sensor:
Radio. For the energy consumption of a radio, four different states need to be distinguished: off, sleep,
receiving, and transmitting. For all four states the energy consumption depends on the time the
radio has been in the state. Thus, for the radio we need to memorize the times the radio has been in the
four states and the 4 × 4 state transition matrix representing the number of times the radio has switched
between the four states.
Processor. In general, for a processor four main states can be identified: off, sleep, idle, and active.
In sleep mode the CPU and most internal peripherals are turned off. It can be awakened only by an external
event (interrupt), upon which the idle state is entered. In idle mode the CPU is still inactive, but now
some peripherals are active, such as the internal clock or timers. Within the active state the CPU and
all peripherals are active; multiple sub-states might be identified here based on clock speeds and
voltages. We assume that the energy consumption depends on the time the processor has been in a certain
state.
Sensor. For a (simple) sensor we assume that only the two states on and off are given and that the energy
consumption within both states can be measured by time. However, if more powerful sensors are used,
it may be necessary to work with more states (similar to the processor or the radio).
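The bookkeeping behind Equation (33.1) can be sketched in a few lines of code. The state names and all numeric values below (times, powers, switching costs) are illustrative assumptions, not measurements from a real transceiver.

```python
# Sketch of Equation (33.1): energy within states plus switching energy.
# t[j]: index (e.g., time) spent in state j; P[j]: power drawn in state j;
# st[i][j]: number of i->j switches; E_switch[i][j]: energy per switch.

def component_energy(t, P, st, E_switch):
    k = len(t)
    within_states = sum(t[j] * P[j] for j in range(k))
    switching = sum(st[i][j] * E_switch[i][j]
                    for i in range(k) for j in range(k) if i != j)
    return within_states + switching

# Example: a radio with states (off, sleep, receive, transmit)
t = [50.0, 30.0, 15.0, 5.0]            # seconds spent in each state
P = [0.0, 2e-5, 1.2e-2, 1.5e-2]        # watts (assumed values)
st = [[0, 3, 0, 0],
      [3, 0, 8, 4],
      [0, 8, 0, 2],
      [0, 4, 2, 0]]                    # switch counts between states
E_switch = [[1e-5 if i != j else 0.0 for j in range(4)] for i in range(4)]
print(component_energy(t, P, st, E_switch))
```

The same function serves for the processor and the sensor; only the number of states and the indices change.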
The energy model for the complete sensor node now consists of the energy models for the three
components radio, processor, and sensor, plus two extra indicators for the battery:
For the battery, only the energy state E_old at a time t_old in the past is given.
For each component, the indices I_j characterizing the energy consumption in state s_j since time t_old,
and the state transition matrix st indicating the transitions since time t_old, are specified.
Based on this information, an estimate of the current energy state of the battery can be calculated by
subtracting from E_old the sum of the consumed energy for each component, estimated on the basis of
Equation (33.1).
Since the energy model gives only an estimate of the remaining energy of the battery, in practice it may
be a good approach to use the energy model only for limited time intervals. If the difference between the
current time and t_old gets larger than a certain threshold, the current energy state of the battery should be
estimated on the basis of measurements or other information available on the energy state, and E_old and
t_old should be replaced by this new estimate and the current time. Furthermore, the indices characterizing
the states and the state transition matrix are reset for all the components of the sensor node.
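The battery-state estimate and the staleness threshold described above can be sketched as follows. The threshold value and the measurement hook are illustrative assumptions.

```python
# Sketch of the battery-state estimate: subtract the per-component
# consumption (from Equation 33.1) from E_old, unless t_old is too far
# in the past, in which case fall back to a fresh measurement and
# restart the bookkeeping from a new reference point.

STALENESS_THRESHOLD = 3600.0   # seconds; illustrative value

def battery_estimate(E_old, t_old, component_energies, now, measure=None):
    """component_energies: E_consumed of each component since t_old."""
    if now - t_old > STALENESS_THRESHOLD and measure is not None:
        return measure(), now          # new reference point (E_old, t_old)
    return E_old - sum(component_energies), t_old

# within the validity window: plain subtraction
E, t0 = battery_estimate(10.0, 0.0, [0.5, 0.2, 0.1], now=100.0)
print(E, t0)
```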
33.2 Wireless Sensor Network Architectures
A sensor network is a very powerful tool when compared to a single sensing device. It consists of a large
number of nodes, equipped with a variety of sensors that are able to monitor different characteristics of
a phenomenon. A dense network of such small devices gives the researcher the opportunity to have
a spatial view over the phenomenon and, at the same time, produces results based on a combination
of various sorts of sensed data.
Each sensor node has two basic operation modes: an initialization phase and an operation phase. But
the network as a whole will function in a smooth way, with the majority of the nodes in the operation
mode and only a subset of nodes in the initialization phase. The two modes of operation for the sensor
nodes have the following characteristics:
Initialization mode. A node can be considered in initialization mode if it tries to integrate itself into the
network and is not performing its routine function. A node can be in initialization mode, for example, at
power-on or when it detects a change in the environment and needs to configure itself. During initialization,
the node can pass through different phases, such as detecting its neighbors and the network topology,
synchronizing with its neighbors, determining its own position, or even performing configuration operations
on its own hardware and software. At a higher abstraction level, a node can be considered in initialization
mode if it tries to determine which services are already present in the network and which services it needs to
provide or can use.
Operation mode. After the initialization phase the node enters a stable state, the regular operation state.
It functions based on the conditions determined in the initialization phase. The node can exit the
operation mode and pass through an initialization mode again if the physical conditions around it, or
the conditions related to the network or to itself, have changed. The operation mode is characterized by
small bursts of node activity (such as reading sensor values, performing computations, or participating
in networking protocols) and periods spent in an energy-saving low-power mode.
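The two-mode behavior described above can be sketched as a small state machine. The class, method names, and triggers are illustrative, not taken from any particular sensor platform.

```python
# Minimal sketch of the two node modes: a node starts in the
# initialization mode, settles into operation, and re-enters
# initialization when its environment changes.

class SensorNode:
    def __init__(self):
        self.mode = "initialization"
        self.neighbors = []

    def discover_neighbors(self):
        return []                  # placeholder: neighbor discovery protocol

    def initialize(self):
        # detect neighbors/topology, synchronize, determine position,
        # configure hardware and software, then enter regular operation
        self.neighbors = self.discover_neighbors()
        self.mode = "operation"

    def operate(self):
        pass                       # short activity burst, then low-power sleep

    def step(self, environment_changed=False):
        if environment_changed:
            self.mode = "initialization"
        if self.mode == "initialization":
            self.initialize()
        else:
            self.operate()

node = SensorNode()
node.step()
print(node.mode)                   # "operation"
```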
33.2.1 Protocol Stack Approach
A first approach to building a wireless sensor network is to use a layered protocol stack as a starting
point, as in the case of traditional computer networks. The main difference between the two kinds of
networks is that some of the blocks needed to build the sensor network span multiple
layers, while others depend on several protocol layers. This characteristic of sensor networks comes from
the fact that they have to provide functionalities that are not present in traditional networks. Figure 33.3
presents an approximate mapping of the main blocks onto the traditional OSI protocol layers.
The authors of Reference 5 propose an architecture based on the five OSI layers together with three
management planes that cut across the whole protocol stack (see Figure 33.4). A brief description
of the layers: (1) the physical layer addresses mainly the hardware details of the wireless
communication mechanism, such as the modulation type, the transmission and receiving techniques, etc.;
(2) the data-link layer is concerned with the Media Access Control (MAC) protocol that manages
communication over the noisy shared channel; (3) the network layer manages routing the data between the
nodes, while the transport layer helps to maintain the data flow; (4) finally, the application layer contains
(very often) only one single user application.
[Figure 33.3: the building blocks (aggregation, clustering, localization, lookup, timing, addressing, security, routing, and collaboration) mapped onto the physical, link, network, transport, and application layers.]
FIGURE 33.3 Relationship between building blocks and OSI layers.
[Figure 33.4: the five protocol layers (application, transport, network, data link, physical) crossed by the power management, mobility management, and task management planes.]
FIGURE 33.4 Protocol stack representation of the architecture. (From Akyildiz, I., Su, W., Sankarasubramaniam, Y., and Cayirci, E. IEEE Communication Magazine, 40, 102-114, 2002. With permission.)
In addition to the five network layers, the three management planes have the following functionality. The
power management plane coordinates the energy consumption inside the sensor node; it can, for example,
based on the available amount of energy, allow the node to take part in certain distributed algorithms
or control the amount of traffic it wants to forward. The mobility management plane manages all
the information regarding the physical neighbors and their movement patterns, as well as the node's own
movement pattern. The task management plane coordinates sensing in a certain region based on the number of nodes
and their placement (in very densely deployed sensor networks, energy might be saved by turning certain
sensors off to reduce the amount of redundant information sensed).
In the following, we give a description of the main building blocks needed to set up a sensor network.
The description follows the OSI model. This should not imply that this is the right structure for these
networks; it is taken only as a reference point:
Physical layer. The physical layer is responsible for the management of the wireless interface. For a given
communication task, it defines a series of characteristics such as: operating frequency, modulation type, data
coding, interface between hardware and software, etc.
The large majority of already-built sensor network prototypes and most of the envisioned application
scenarios assume the use of a radio transceiver as the means of communication. The unlicensed industrial,
scientific, and medical (ISM) band is preferred because it is a free band designed for short-range devices
using low-power radios and requiring low data-transmission rates. The modulation scheme used is another
important parameter to decide upon. Complex modulation schemes might not be preferred because they
require important resources (in the form of energy, memory, and computation power).
In the future, advancements in integrated circuit technology (e.g., ASIC, FPGA) will allow the
use of modulation techniques such as ultrawide band (UWB) or impulse radio (IR), while if the sensor
node is built using off-the-shelf components the choice comes down mainly to schemes such as amplitude
shift keying (ASK) or frequency shift keying (FSK). Based on the modulation type and on the hardware
used, a specific data encoding scheme will be chosen to assure both the synchronization required by
the hardware component and a first level of error correction. At the same time, the data frame will also
include some carefully chosen initial bytes needed for the conditioning of the receiver circuitry and clock
recovery.
It is worth mentioning that the minimum output power required to transmit a radio signal over
a certain distance is directly proportional to the distance raised to a power between two and four (the
coefficient depends on the type of antenna used and its placement relative to the ground, indoor/outdoor
deployment, etc.). Under these conditions, it is more efficient to transmit a signal using a multihop network
composed of short-range radios rather than using a (power-consuming) long-range link [5].
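The path-loss argument above can be checked numerically. With an assumed exponent of three, relaying over n hops of length d/n radiates n^2 times less total power than a single hop of length d; the constant and exponent below are illustrative.

```python
# Sketch of the path-loss argument: required transmit power grows
# roughly as distance**alpha, with alpha between 2 and 4 depending on
# the antenna and the environment.

def single_hop_power(d, alpha=3.0, c=1.0):
    """Minimum output power to cover distance d in one hop."""
    return c * d ** alpha

def multihop_power(d, n_hops, alpha=3.0, c=1.0):
    """Total output power when the same distance is split into n hops."""
    return n_hops * c * (d / n_hops) ** alpha

d = 100.0
print(single_hop_power(d))                         # one long hop
print(multihop_power(d, 4))                        # four short hops
print(single_hop_power(d) / multihop_power(d, 4))  # 16.0 for alpha=3
```

In general the saving factor is n_hops**(alpha - 1), which is why multihop forwarding pays off whenever alpha exceeds one.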
The communication subsystem usually needs a controller hierarchy to create the abstraction for the
other layers in the protocol stack (we are referring to the device hardware characteristics and the strict
timing requirements). If a simple transceiver is used, some of these capabilities will need to be provided
by the main processing unit of the sensor node (this can require a substantial amount of resources for
exact timing, execution synchronization, cross-layer distribution of the received data, etc.). The use of more
advanced specialized communication controllers is not preferred, as they hide important low-level
details of the information.
Data-link layer. The data-link layer is responsible for managing most of the communication tasks within
one hop (both point-to-point and multicasting communication patterns). The main research issues here
are the MAC protocols, the error control strategies, and the power consumption control.
The MAC protocols make communication between several devices over a shared channel possible
by coordinating the sending/receiving actions as a function of time or frequency. Several strategies have
already been studied and implemented for mobile telephony networks and for mobile ad hoc
networks but, unfortunately, none of them is directly applicable. Still, ideas can be borrowed from the
existing standards and applications and new MAC protocols can be derived; this is proven by the
large number of new schemes that target specifically the wireless sensor networks.
As the radio component is probably the main energy consumer in each sensor node, the MAC protocol
must be very efficient. To achieve this, the protocol must, first of all, make use of the power-down state
of the transceiver (turning the radio off) as much as possible, because the energy consumption is negligible
in this state. The most important problem comes from the scheduling of the sleep, receive, and transmit
states. The transitions among these states also need to be taken into account, as they consume energy
and sometimes take large time intervals. Message collisions, overhearing, and idle listening are direct
implications of the scheduling used inside the MAC protocol which, in addition, influences the bandwidth
lost due to the control packet overheads.
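The importance of the power-down state can be illustrated with a small average-power calculation. The power figures below are assumed, order-of-magnitude values for a short-range transceiver, not taken from a specific datasheet.

```python
# Sketch of why duty cycling the radio matters: the radio's average
# power is dominated by the fraction of time it stays out of the
# power-down state.

def average_power(duty_cycle, p_active, p_sleep):
    """Average power for a radio awake a given fraction of the time."""
    return duty_cycle * p_active + (1.0 - duty_cycle) * p_sleep

p_rx, p_off = 1.2e-2, 2e-6      # watts: receiving vs. powered down
always_on = average_power(1.0, p_rx, p_off)
duty_1pct = average_power(0.01, p_rx, p_off)
print(always_on / duty_1pct)    # roughly two orders of magnitude saved
```

The ratio shows why MAC schedules that keep the transceiver asleep most of the time dominate node lifetime, and why the (energy-costly) transitions between states must still be counted.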
A second function of the data-link layer is to perform error control of the received data packets. The
existing techniques include automatic repeat-request (ARQ) and forward error correction (FEC) codes.
The choice of a specific technique comes down to the trade-off between the energy consumed to transmit
redundant information over the channel and the energy and high computation power needed at both the
coder and decoder sides.
Additional functions of the data-link layer are: creating and maintaining a list of the neighbor nodes
(all nodes situated within the direct transmission range of the node in question); extracting and advertising
the source and destination as well as the data content of overheard packets; and supplying information
related to the amount of energy spent on transmitting, receiving, coding, and decoding the packets, the
number of errors detected, the status of the channel, etc.
Network layer. The network layer is responsible for routing the packets inside the sensor network.
It is one of the most studied topics in the area of wireless sensor networks and has received a lot of
attention lately. The main design constraint for this layer is, as in all the previous cases, energy
efficiency.
The main function of wireless sensor networks is to deliver sensed data (or data aggregates) to the base
stations requesting it. The concept of data-centric routing has been used to address this problem in an
energy-efficient manner, minimizing the amount of traffic in the network. In data-centric routing, each
node is assigned a specific task based on the interests of the base stations. In the second phase of the
algorithm, the collected data is sent back to the requesting nodes. Interest dissemination can be done in
two different ways, depending on the expected amount of traffic and the level of events in the sensor network:
the base stations can broadcast the interest to the whole network, or the sensor nodes themselves can
advertise their capabilities and the base stations subscribe to them.
Based on the previous considerations, the network layer needs to be optimized mainly for two operations:
spreading the user queries, generated at one or more base stations, across the whole network, and
then retrieving the sensed data to the requesting node. Individual addressing of each sensor node is not
important in the majority of the applications.
Due to the high density of the sensor nodes, a lot of redundant information will be available inside
the sensor network. Retrieving all this information to a certain base station might easily exceed the
available bandwidth, making the sensor network unusable. The solution to this problem is the data
aggregation technique, which requires each sensor node to inspect the content of the packets it has to
route and to aggregate the contained information, reducing the high redundancy of the multiple sensed
data. This technique has been proven to substantially reduce the overall traffic and to make the sensor
network behave as an instrument for analyzing data rather than just a transport infrastructure for raw
data [6].
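In-network aggregation as described above can be sketched as follows. Using the mean as the aggregate is an illustrative choice (min, max, or counts are equally plausible), and the field names are assumptions.

```python
# Sketch of in-network data aggregation: a relay node combines
# redundant readings from its children into one summary packet instead
# of forwarding each packet separately.

def aggregate(readings):
    """Combine several similar sensed values into a single packet."""
    return {
        "region": readings[0]["region"],
        "count": len(readings),
        "mean": sum(r["value"] for r in readings) / len(readings),
    }

child_packets = [
    {"region": "D", "value": 21.4},
    {"region": "D", "value": 21.6},
    {"region": "D", "value": 21.5},
]
print(aggregate(child_packets))   # one packet forwarded instead of three
```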
Transport layer. This layer arises from the need to connect the wireless sensor network to an external
network, such as the Internet, in order to disseminate its data readings to a larger community [7]. Usually
the protocols needed for such interconnections require significant resources, and they will not be present
in all the sensor nodes. The envisioned scenario is to allow a small subset of nodes to behave as gateways
between the sensor network and some external networks. These nodes will be equipped with superior
resources and computation capabilities, and will be able to run the protocols needed to interconnect the
networks.
Application layer. The application layer usually links the users' applications with the underlying layers
in the protocol stack. Sensor networks are designed to fulfill a single application scenario in each
particular case. The whole protocol stack is designed for a special application, and the whole network is
seen as an instrument. This makes the application layer distributed along the whole protocol stack
rather than appearing explicitly. Still, for the sake of classification, we can consider an explicit application layer
that could have one of the following functionalities [5]: sensor management protocol, task management
and data advertisement protocol, and sensor query and data dissemination protocol.
33.2.2 EYES Project Approach
The approach taken in the EYES project [1] consists of only two key system abstraction layers: the sensor
and networking layer and the distributed services layer (see Figure 33.5). Each layer provides services that
may be spontaneously specified and reconfigured:
1. The sensor and networking layer contains the sensor nodes (the physical sensor and wireless
transmission modules) and the network protocols. Ad hoc routing protocols allow messages to be
forwarded through multiple sensor nodes, taking into account the mobility of nodes and the
dynamic change of topology. Communication protocols must be energy efficient, since sensor
nodes have very limited energy supplies. To provide more efficient dissemination of data, some
sensors may process data streams, and provide replication and caching.
[Figure 33.5: applications built on top of the distributed services layer (information service, lookup service), which rests on the sensors and networking layer.]
FIGURE 33.5 EYES project architecture description.
2. The distributed services layer contains distributed services for supporting mobile sensor applications.
Distributed services coordinate with each other to perform decentralized services. These
distributed servers may be replicated for higher availability, efficiency, and robustness. We have
identified two major services. The lookup service supports mobility, instantiation, and reconfiguration.
The information service deals with aspects of collecting data. This service allows vast quantities
of data to be easily and reliably accessed, manipulated, disseminated, and used in a customized
fashion by applications.
On top of this architecture, applications can be built using the sensor network and distributed services.
Communication in a sensor network is data centric, since the identity of the numerous sensor nodes is not
important; only the sensed data, together with time and location information, counts. The three main
functions of the nodes within a sensor network are directly related to this:
Data discovery. Several classes of sensors will be present in the network: specialized sensors can
monitor climatic parameters (humidity, temperature, etc.), detect motion, provide vision, and so on.
A first step of data preprocessing can also be included in this task.
Data processing and aggregation. This task is directly related to performing distributed computations
on the sensed data and also to aggregating several observations into a single one. The goal of this operation
is the reduction of energy consumption. Data processing contributes to this because, in current
architectures, the transmission of one (raw sensed) data packet costs the equivalent of many thousands of
computation cycles. Data aggregation keeps the overall traffic low by inspecting the contents of the routed
packets and, in general, reducing the redundancy of the data in traffic by combining several similar
packets into a single one.
Data dissemination. This task includes the networking functionality comprising routing, multicasting,
broadcasting, addressing, etc.
The existing network scenarios contain both static and mobile nodes. In some cases, the static nodes can
be considered to form a backbone of the network and are more likely to be preferred in certain distributed
protocols. Both mobile and static nodes will have to perform data dissemination, so the protocols should
be designed to be invariant to node mobility. The particular hardware capabilities of each kind of sensor
node will determine how the previously described tasks will be mapped onto them (in principle, all the
nodes could provide all the previous functionalities). During the initialization phase of the network, the
functionality of every node will be decided based on both the hardware configurations and the particular
environmental conditions.
For a large sensor network to be able to function correctly, a tiered architecture is needed. This means
that nodes will have to organize themselves into clusters based on certain conditions. The nodes in each
cluster will elect a leader, the node best fitted to perform coordination inside the cluster (this can
be, e.g., the node with the highest amount of energy, the node with the most advanced hardware
architecture, or just a random node). The cluster leader will be responsible for scheduling the node
operations, managing the resources and the cluster structure, and maintaining communication with the
other clusters.
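The leader election described above, using the residual-energy criterion, can be sketched as follows. The tie-breaking rule on node IDs is an added assumption to make the election deterministic.

```python
# Sketch of cluster-leader election: pick the node with the highest
# residual energy; on a tie, the higher node id wins.

def elect_leader(cluster):
    """cluster: list of (node_id, residual_energy) pairs."""
    return max(cluster, key=lambda node: (node[1], node[0]))[0]

cluster = [(7, 0.42), (3, 0.91), (12, 0.91), (5, 0.10)]
print(elect_leader(cluster))   # 12: highest energy, tie broken by id
```

In a real network each node would run this comparison over values advertised by its neighbors rather than over a central list, but the criterion is the same.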
We can talk about several types of clusters that can coexist in a single network:
Geographical clustering. The basic mode of organizing the sensor network. The clusters are built based
on geographical proximity: neighboring nodes (nodes that are within transmission range of each other)
organize themselves into groups. This operation can be handled in a completely distributed manner
and is a necessity for the networking protocols to keep working as the network scales up.
Information clustering. The sensor nodes can be grouped into information clusters based on the services
they can provide. This clustering structure belongs to the distributed services layer and is built on top of
the geographical clustering. Nodes using this clustering scheme need not be direct neighbors from the
physical point of view.
Security clustering. An even higher hierarchy appears if security is taken into consideration. Nodes can
be grouped based on their trust levels, or based on the actions they are allowed to perform or the resources
they are allowed to use in the network.
Besides offering increased capabilities to the sensor network, clustering is considered one of the
principal building blocks for sensor networks also from the point of view of energy consumption. The
overhead of the energy spent on creating and organizing the cluster structure is easily recovered in
the long term due to the reduced traffic it leads to.
33.2.2.1 Distributed Services Layer Examples
This section focuses on the distributed services that are required to support applications for wireless sensor
networks. We discuss the requirements of the foundation necessary to run these distributed services and
describe how various research projects approach this problem area from a multitude of perspectives.
A comparison of the projects is also carried out.
One of the primary issues of concern in wireless sensor networks is to ensure that every node in
the network is able to utilize energy in a highly efficient manner, so as to extend the total network
lifetime to a maximum [5, 8, 9]. As such, researchers have been looking at ways to minimize energy
usage at every layer of the network stack, from the physical layer right up to the application
layer.
While there is a wide range of methods that can be employed to reduce energy consumption, architectures
designed for distributed services generally focus on one primary area: how to reduce the amount
of communication required and yet get the main job done without any significant negative impact, by
observing and manipulating the data that flows through the network [6, 10, 11]. This leads us to look at
the problem at hand from a data-centric perspective.
In conventional IP-style communication networks, such as the Internet, nodes are
identified by their end-points and internode communication is layered on an end-to-end delivery service
that is provided within the network. At the communication level, the main focus is to get connected
to a particular node within the network; thus, the addresses of the source and destination nodes are
of paramount importance [12]. The precise data that actually flows through the network is irrelevant
to IP.
Sensor networks, however, have a fundamental difference compared to the conventional communication
networks described above, as they are application-specific networks. Thus, instead of concentrating on
which particular node a certain data message originates from, a greater interest lies in the data message
itself: what is the data in the data message, and what can be done with it? This is where the concept of
a data-centric network architecture comes into play.
As sensor nodes are envisioned to be deployed by the hundreds and potentially even thousands [8],
specific sensor nodes are not usually of any interest (unless of course a particular sensor needs to have
its software patched or a failure needs to be corrected). This means that instead of a sensor network
application asking for the temperature of a particular node with ID 0315, it might pose a query asking,
What is the temperature in sector D of the forest?
Such a framework ensures that the acquired results are not just dependent on a single sensor. Thus
other nodes in sector D can respond to the query even if the node with ID 0315 dies. The outcome is
not only a more robust network but, due to the high density of nodes [13], the user of the network is
also able to obtain results of a higher fidelity (or resolution). Additionally, as nodes within the network
are able to comprehend the meaning of the data passing through them, it is possible for them to carry
out application-specic processing within the network thus resulting in the reduction of data that needs to
be transmitted [14]. In-network processing is particularly important as local computation is significantly
cheaper than radio communication [15].
33.2.2.1.1 Directed Diffusion
Directed Diffusion is one of the pioneering data-centric communication paradigms developed specifically
for wireless sensor networks [6]. Diffusion is based on a publish/subscribe API (application programming
interface), where the details of how published data is delivered to subscribers are hidden from the data
producers (sources) and consumers (sinks). The transmission and arrival of events (interest or data
messages) occur asynchronously. Interests describe tasks that are expressed using a list of attribute-value
pairs as shown below:
// detect location of seagull
type = seagull
// send back results every 20ms
interval = 20ms
// for the next 15 seconds
duration = 15s
// from sensors within rectangle
rect = [-100,100,200,400]
A node that receives a data message sends it to its Filter API, which subsequently performs a matching
operation according to a list of attributes and their corresponding values. If a match is established between
the received data message and the filter residing on the node, the diffusion substrate passes the event to the
appropriate application module. Thus the Filter API is able to influence the data which propagates through
the network from the source to the sink node, as an application module may perform some application-specific
processing on the received event; for example, it may decide to aggregate the data. For instance,
consider a scenario in an environmental monitoring project where the user needs to be notified when the
light intensity in a certain area goes beyond a specified threshold. As the density of deployed nodes may
be very high, it is likely that a large number of sensors would respond to an increase in light intensity
simultaneously. Instead of having every sensor relaying this notification to the user, intermediate nodes in
the region could aggregate the readings from their neighboring nodes and return only the Boolean result,
thus greatly reducing the number of radio transmissions.
Apart from aggregating data by simply suppressing duplicate messages, application-specific filters can
also take advantage of named data to decide how to relay data messages back toward the sink node and
what data to cache in order to route future interest messages in a more intelligent and energy-saving
manner. Filters also help save energy by ensuring that nodes react to incoming events only
if the attribute matching process succeeds.
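The attribute matching described above can be sketched in a few lines. This is an illustrative model, not the actual Filter API of the Directed Diffusion implementation; the attribute names follow the seagull interest shown earlier.

```python
# Sketch of attribute-value matching as a Diffusion-style filter might
# perform it: a message matches a filter if every attribute-value pair
# in the filter appears with the same value in the message.

def matches(filter_attrs: dict, message_attrs: dict) -> bool:
    return all(message_attrs.get(k) == v for k, v in filter_attrs.items())

seagull_filter = {"type": "seagull", "rect": (-100, 100, 200, 400)}
event = {"type": "seagull", "rect": (-100, 100, 200, 400), "intensity": 0.7}

if matches(seagull_filter, event):
    # here the diffusion substrate would hand the event to the
    # application module, which may aggregate or suppress it
    pass
```

Only when the match succeeds does the node spend any further energy processing the event; otherwise the message is simply relayed or dropped.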
Diffusion also supports a more complex form of in-network aggregation. Filters allow nested queries
such that one sensor is able to trigger other sensors in its vicinity if the attribute-value matching operation
is successful. It is not necessary for a user to directly query all the relevant sensors. Instead the user only
queries a certain sensor which in turn eventually queries the other relevant sensors around it if certain
conditions are met. In this case, energy savings are obtained from two aspects. First, since the user may be
geographically distant from the observed phenomenon, the energy spent transmitting data can be reduced
drastically using a triggering sensor. Second, if sampling the triggered (or secondary) sensor consumes
a lot more energy than the triggering (initial) sensor, then energy consumption can be reduced greatly by
reducing the duty cycle of the secondary sensor to only periods when certain conditions are met at the
initial sensor.
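The nested-query triggering described above can be sketched as follows. All names and values here are illustrative; they are not taken from the Diffusion codebase.

```python
# Sketch of a nested query: a cheap triggering sensor is sampled all the
# time, while the costly secondary sensor is powered up only when the
# trigger condition is met, reducing the secondary sensor's duty cycle.

def nested_query(trigger_sample, condition, secondary_sample):
    """Sample the cheap trigger; wake the expensive secondary sensor
    only if the condition holds. Returns the secondary reading, or
    None if the secondary sensor was left asleep."""
    if condition(trigger_sample()):
        return secondary_sample()
    return None

# e.g., a cheap light sensor triggering an expensive acoustic sensor
reading = nested_query(lambda: 850,              # light reading (lux)
                       lambda lux: lux > 500,    # trigger condition
                       lambda: "acoustic data")  # costly sensor, woken now
```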
33.2.2.1.2 COUGAR
Building on the same concept, that processing data within the network results in significant
energy savings, but deviating from the library-based, lower-level approach used by Directed
Diffusion, the COUGAR [10, 16] project envisions the sensor network as an extension of a conventional
database, thus viewing it as a device database system. It makes the usage of the network more user-friendly
by suggesting the use of a high-level declarative language similar to SQL. Using a declarative language
ensures that queries are formulated independently of the physical structure and organization of the sensor
network.
Conventional database systems use a warehousing approach [17] where every sensor that gathers data
from an environment subsequently relays that data back to a central site where this data is then logged for
future processing. While this framework is suitable for historical queries and snapshot queries, it cannot
be used to service long-running queries [17]. For instance, consider the following query:
Retrieve the rainfall level for all sensors in sector A every 30 sec if it is greater than 60 mm.
Using the warehousing approach, every sensor would relay its reading back to a central database every
30 sec regardless of whether it is in sector A or whether its rainfall level reading is greater than 60 mm. Upon
receiving all the readings from the sensors, the database would then carry out the required processing
to extract all the relevant data. The primary problem in this approach is that excessive resources are
consumed at each and every sensor node as large amounts of raw data need to be transmitted through the
network.
As the COUGAR approach is modeled around the concept of a database, the system generally proceeds
as follows. It accepts a query from the user, produces a query execution plan (which contains detailed
instructions of how exactly a query needs to be serviced), executes this plan against the device database
system, and produces the answer. The query optimizer generates a number of query execution plans and
selects the plan that minimizes a given cost function. The cost function is based on two metrics, namely
resource usage (expressed in Joules) and reaction time.
In this case, the COUGAR approach selects the most appropriate query execution plan, which pushes the
selection (rainfall level > 60 mm) onto the sensor nodes. Only the nodes that meet this condition send
their readings back to the central node. Thus just like in Directed Diffusion, the key idea here is to transfer
part of the processing to the nodes themselves, which in turn would reduce the amount of data that needs
to be transmitted.
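The effect of pushing the selection down to the node can be sketched as below. This is an illustrative model of the idea, not actual COUGAR code; the predicate is the rainfall query from the example above.

```python
# Sketch of a pushed-down selection: the node evaluates the predicate
# locally and transmits a reading only when the predicate holds, instead
# of relaying every raw reading to the central database (warehousing).

def node_report(sector: str, rainfall_mm: float):
    """Return the tuple to transmit, or None to stay silent and
    save a radio transmission."""
    if sector == "A" and rainfall_mm > 60:
        return {"sector": sector, "rainfall_mm": rainfall_mm}
    return None  # predicate failed locally: nothing is sent
```

Under the warehousing approach every call would return a tuple to transmit; here the radio is used only for the readings the query actually asks for.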
33.2.2.1.3 TinyDB
Following in the steps of Directed Diffusion and the COUGAR project, TinyDB [11] also advocates the
use of some form of in-network processing to increase the efficiency of the network and thus improve
network lifetime. However, while TinyDB views the sensor network from the database perspective just
like COUGAR, it goes a step further by pushing not only selection operations to the sensor nodes but
also basic aggregation operations that are common in databases, such as MIN, MAX, SUM, COUNT, and
AVERAGE.
Figure 33.6 illustrates the obvious advantage that performing such in-network aggregation operations
has over transmitting just raw data. Without aggregation, every node in the network needs to
transmit not only its own reading but also those of all its children. This not only causes a bottleneck close
to the root node but also results in unequal consumption of energy; that is, the closer a node is to the root
node, the larger the number of messages it needs to transmit, which naturally results in higher energy
consumption. Thus nodes closer to the root node die earlier. Losing nodes closer to the root node can
have disastrous consequences on the network due to network partitioning. Using in-network aggregation,
however, every intermediate node aggregates its own reading with that of its children and eventually
transmits only one combined result.

FIGURE 33.6 The effect of using in-network aggregation (data transmission without versus with in-network aggregation).
Additionally, TinyDB has numerous other features, such as communication scheduling, hypothesis
testing, and acquisitional query processing, which make it one of the most feature-rich distributed query
processing frameworks for wireless sensor networks at the moment.
TinyDB requires users to specify queries injected into the sensor network using an SQL-like language.
This language describes what data needs to be collected and how it should be processed upon collection as
it propagates through the network toward the sink node. The language used by TinyDB differs from
traditional SQL in the sense that its semantics supports queries that are continuous and periodic. For example,
a query could state: Return the temperature reading of all the sensors on Level 4 of the building every
5 min over a period of 10 h. The period of time between every successive sample is known as an epoch
(in this example it is 5 min).
Just like in SQL, TinyDB queries follow the SELECT - FROM - WHERE - GROUPBY - HAVING
format that supports selection, join, projection, aggregation, and grouping. Just like in COUGAR, sensor
data is viewed as a single virtual table with one column per sensor type. Tuples are appended to the table at
every epoch. Epochs also allow computation to be scheduled such that power is minimized. For example,
the following query specifies that each sensor should report its own identifier and temperature readings
once every 60 sec for a duration of 300 sec:
SELECT nodeid, temp
FROM sensors
SAMPLE PERIOD 60s FOR 300s
The virtual table sensors is conceptually an unbounded, continuous data stream of values that contains
one column for every attribute and one row for every possible instant in time. The table is not actually
stored in any device, that is, it is not materialized; sensor nodes only generate the attributes and rows
that are referenced in active queries. Apart from the standard query shown above, TinyDB also supports
event-based queries and lifetime queries [18]. Event-based queries reduce energy consumption by allowing
nodes to remain dormant until some triggering event is detected. Lifetime queries are useful when users
are not particularly interested in the specific rate of incoming readings but more in the required lifetime
of the network. The basic idea is to send out a query saying that sensor readings are required for, say,
60 days. The nodes then decide on the best possible rate at which readings can be sent given the specified
network lifetime.
Queries are disseminated into the network via a routing tree rooted at the base station that is formed
as nodes forward the received query to other nodes in the network. Every parent node can have multiple
child nodes but every child node can only have a single parent node. Every node also keeps track of its
distance from the root node in terms of the number of hops. This form of communication topology is
commonly known as tree-based routing.
Upon receiving a query, each node begins processing it. A special acquisition operator at each node
acquires readings from sensors corresponding to the fields or attributes referenced in the query. Similar to
the concept of nested queries in Directed Diffusion, where sensors with a low sampling cost are sampled
first, TinyDB orders sampling operations and predicates by cost. Consider the following query as an
example, where a user wishes to obtain readings from an accelerometer and a magnetometer provided
certain conditions are met:
SELECT accel, mag
FROM sensors
WHERE accel > c1
AND mag > c2
SAMPLE INTERVAL 1s FOR 60s
Depending on the cost of sampling the accelerometer and the magnetometer sensors, the optimizer
will first sample the cheaper sensor to see if its condition is met. It will only proceed to the more costly
second sensor if the first condition has been met.
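This cost-ordered acquisition can be sketched as follows. The costs, field names, and sample functions are hypothetical; TinyDB's actual optimizer works on query plans, not Python callables.

```python
# Sketch of cost-ordered acquisition: sample the cheaper sensor first
# and short-circuit if its predicate fails, so the expensive sensor is
# never powered up unnecessarily.

def acquire(sensors, predicates):
    """sensors maps a field name to (sampling_cost, sample_fn);
    predicates maps a field name to a boolean test. Returns the
    readings if every predicate passes, else None."""
    readings = {}
    for name in sorted(sensors, key=lambda n: sensors[n][0]):
        value = sensors[name][1]()          # sample, cheapest first
        if not predicates[name](value):
            return None                     # skip the costlier sensors
        readings[name] = value
    return readings

sensors = {"accel": (1.0, lambda: 5.2),     # cheap to sample
           "mag":   (9.0, lambda: 0.8)}     # expensive to sample
predicates = {"accel": lambda v: v > 1.0,   # accel > c1
              "mag":   lambda v: v > 0.5}   # mag > c2
```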
Next we describe how the sampled data is processed within the nodes and is subsequently propagated
up the network toward the root node. Consider the following query:
Report the average temperature of the fourth floor of the building every 30 sec.
To service the above query, the query plan has three operators: a data acquisition operator, a select
operator that checks if the value of floor equals 4, and the aggregate operator that computes the average
temperature from not only the current node but also its children located on the fourth floor. Each sensor
node applies the plan once per epoch, and the data stream produced at the root node is the answer to the
query. The partial computation of averages is represented as {sum, count} pairs, which are merged at each
intermediate node in the query plan to compute a running average as data flows up the tree.
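The {sum, count} representation described above can be sketched in a few lines (the tree shape below is invented for illustration):

```python
# Each intermediate node merges its children's partial state with its
# own reading; only the root divides to obtain the average.

def merge(a, b):
    """Combine two (sum, count) partial aggregates."""
    return (a[0] + b[0], a[1] + b[1])

def leaf(reading):
    return (reading, 1)

# a node with its own reading of 20.0 and two children: one leaf child
# (18.0) and one child that has already merged two readings into (41.0, 2)
partial = merge(merge(leaf(20.0), leaf(18.0)), (41.0, 2))
average_at_root = partial[0] / partial[1]   # divided once, at the root
```

The node transmits the single pair rather than the individual readings, which is what makes the aggregation cheap in radio terms.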
TinyDB uses a slotted scheduling protocol to collect data, in which parent and child nodes receive and send
(respectively) data in the tree-based communication protocol. Each node is assumed to produce exactly
one result per epoch, which must be forwarded all the way to the base station. Every epoch is divided into
a number of fixed-length intervals, the number of which depends on the depth of the tree. The intervals are numbered
in reverse order such that interval 1 is the last interval in the epoch. Every node in the network is assigned
to a specific interval that corresponds to its depth in the routing tree. Thus, for instance, if a particular
node is two hops away from the root node, it is assigned the second interval. During its own interval,
a node performs the necessary computation, transmits its result, and goes back to sleep. In the interval
preceding its own, a node sets its radio to listen mode, collecting results from its child nodes. Thus data
flows up the tree in a staggered manner, eventually reaching the root node during interval 1, as shown in
Figure 33.7.
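The interval assignment just described reduces to a one-line rule, sketched below (a simplified reading of the scheme, ignoring clock synchronization):

```python
# Intervals are numbered in reverse, so a node d hops from the root
# transmits during interval d and listens during interval d + 1 (the
# interval that precedes its own in time); it sleeps otherwise.

def node_schedule(depth_hops: int) -> dict:
    return {"transmit": depth_hops, "listen": depth_hops + 1}
```

For example, a node two hops from the root listens for its children during interval 3 and transmits its merged result during interval 2.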
33.2.2.1.4 Discussion
In this section we compare the various projects described above and highlight some of their
drawbacks. We also mention other work in the literature that has contributed further improvements
to some of these existing projects. Table 33.1 compares some of the key features of the various
projects.
As mentioned earlier, Directed Diffusion was a pioneering project in the sense that it introduced the
fundamental concept of improving network efficiency by processing data within the sensor network.
However, unlike COUGAR and TinyDB, it does not offer a particularly simple interface, flexible naming
system, or any generic aggregation and join operators. Such operators are considered application-specific
operators and must always be coded in a low-level language. A drawback of this approach is that
FIGURE 33.7 Communication scheduling in TinyDB using the slotted approach [19].
TABLE 33.1 Comparison of Data Management Strategies

Feature                  | Directed Diffusion             | COUGAR      | TinyDB
Type                     | Non-database                   | Database    | Database
Platform                 | iPAQ class (Mote class for micro-diffusion) | iPAQ class | Mote class
Query language           | Application specific, dependent on Filter API | SQL-based | SQL-based
Type of in-network aggregation | Suppression of identical data messages from different sources | Selection operators | Selection, aggregation operators and limited optimization
Crosslayer features      | Routing integrated with in-network aggregation | None | Routing integrated with in-network aggregation; communication scheduling also decreases burden on the MAC layer
Caching of data for future routing | Yes | No | Yes
Power saving mechanism while sampling sensors | Yes (nested queries) | None | Yes (acquisitional query processing)
Type of optimization     | None | Centralized | Mostly centralized (metadata is occasionally copied to catalogue)
query optimizers are unable to deal with such user-defined operators, as there are no fixed semantics.
This is because query optimizers are unable to make the necessary cost comparisons between various user-defined
operators. A direct consequence of this is that since the system is not able to handle optimization
tasks autonomously, the arduous responsibility of placement and ordering of operators is placed on the
user. This naturally would be a great hindrance to users of the system (e.g., environmentalists) who are
only concerned with injecting queries into the network and obtaining the results, not figuring out
the intricacies of energy-efficient mechanisms to extend network lifetime!
While the COUGAR project specifically claims to target wireless sensor networks [20, 21], apart from
the feature of pushing down selection operations into the device network, it does not demonstrate any
other novel design characteristics that would allow it to run on sensor networks. In fact, the COUGAR
project has simulations and implementations using Linux-based iPAQ class hardware, which has led its
designers to take certain design decisions that would be unsuitable for sensor networks. For instance, unlike Directed
Diffusion [14] and TinyDB [18], COUGAR does not take the cost incurred by sampling sensors into
consideration during the generation of query execution plans. It also does not take advantage of certain
inherent properties of radio communication, for example, snooping, and also fails to suggest any methods
that could link queries to communication scheduling. Additionally, the usage of XML to encode messages
and tuples makes it inappropriate for sensor networks given their limited bandwidth and high cost of
transmission per bit.
Among the various query processing systems currently described in the literature, TinyDB seems to be the
one that is the most feature-packed. The TinyDB software has been deployed using Mica2 motes in the
Berkeley Botanical Garden to monitor the microclimate in the garden's redwood grove [22]. However,
the initial deployment only relays raw readings and does not currently make use of any of the aggregation
techniques introduced in the TinyDB literature. While it may have approached the problem of improving
energy efficiency from several angles, it does have a number of inherent drawbacks, the most significant
being the lack of adaptability. First, the communication scheduling mentioned above is highly dependent
on the depth of the network, which is assumed to be fixed. This makes it unable to react on the fly to
changes in the network topology, which could easily happen if new nodes are added or certain nodes
die. Second, the communication scheduling is also directly dependent on the epoch that is specified in
every query injected into the network. With networks expected to span hundreds or even thousands
of nodes, it is unlikely that environmentalists using a particular network would only inject one query
into the network at any one time. Imagine if the Internet were designed in a way such that only one person
was allowed to use it at any instant! Thus methods need to be devised to enable multiple queries to run
simultaneously in a sensor network.
Although TinyDB greatly reduces the number of transmissions by carrying out in-network aggregation
for every long-running query, it keeps on transmitting data during the entire duration of the active query,
disregarding the temporal correlation in a sequence of sensor readings. Reference 23 takes advantage of
this property and ensures that nodes only transmit data when there is a significant enough change between
successive readings. In other words, sensors may refrain from transmitting data if the readings remain
constant.
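The suppression idea can be sketched as below. This is an illustrative model of the general technique, not the specific algorithm of Reference 23; the threshold value is arbitrary.

```python
# Exploit temporal correlation: transmit a reading only if it differs
# from the last transmitted value by more than a threshold, so constant
# readings generate no radio traffic.

class ChangeFilter:
    def __init__(self, threshold: float):
        self.threshold = threshold
        self.last_sent = None

    def should_transmit(self, reading: float) -> bool:
        if self.last_sent is None or abs(reading - self.last_sent) > self.threshold:
            self.last_sent = reading   # remember what the sink last saw
            return True
        return False
```

A sink that assumes the last received value when no update arrives reconstructs the series within the threshold error.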
Another area related to the lack of adaptability affecting both COUGAR and TinyDB has to do with
the generation of query execution plans. In both projects the systems assume a global view of the network
when it comes to query optimization. Thus network metadata is periodically copied from every node
within the network to the root node. This information is subsequently used to work out the best possible
query optimization plan. Obviously, the cost of extracting network metadata from every node is highly
prohibitive. Also query execution plans generated centrally may be outdated by the time they reach the
designated nodes as conditions in a sensor network can be highly volatile, for example, the node delegated
to carry out a certain task may have run out of power and died by the time instructions arrive from the
root node. In this regard, it is necessary to investigate methods where query optimizations are carried
out using only local information. While such plans may not be as optimal as those generated from global
network metadata, they will result in significant savings in terms of the number of radio transmissions.
Reference 24 looks into creating an adaptive and decentralized algorithm that places operators optimally
within a sensor network. However, the preliminary simulation results are questionable since the overhead
incurred during the neighbor exploration phase is not considered. Also, there is no mention of how fast the
algorithm responds to changes in network dynamics.
33.3 Data-Centric Architecture
As we previously stated, the layered protocol stack description of the system architecture for a sensing
node cannot cover all the aspects involved (such as crosslayer communication, dynamic update, etc.).
In this section we address the problem of describing the system architecture in a more suitable way and
its implications for the application design.
33.3.1 Motivation
Sensor networks are dynamic from many points of view. Continuously changing behaviors can be
noticed in several aspects of sensor networks, some of them being:
Sensing process. The natural environment is dynamic by all means (the basic purpose of sensor networks
is to detect, measure, and alert the user of changes in its parameters). The sensor modules themselves
can become less accurate, need calibration, or even break down.
Network topology. One of the features of sensor networks is their continuously changing topology.
There are many factors contributing to this, such as failures of nodes, the unreliable communication
channel, mobility of the nodes, variations of the transmission ranges, cluster reconfiguration, and
addition/removal of sensor nodes. Related to this aspect, the algorithms designed for sensor networks
need to have two main characteristics: they need to be independent of the network topology and need to
scale well with the network size.
Available services. Mobility of nodes, failures, or availability of certain kinds of nodes might trigger
reconfigurations inside the sensor network. The functionality of nodes may depend on services existing
at certain moments; when they are no longer available, the nodes will either reconfigure themselves or
try to provide those services themselves.
Network structure. New kinds of nodes may be added to the network. Their different and increased
capabilities will bring changes to the regular way in which the network functions. Software modules
might be improved or completely new software functionality might be implemented and deployed in the
sensor nodes.
Most wireless sensor network architectures currently use a fixed layered structure for the protocol stack in
each node. This approach has certain disadvantages for wireless sensor networks. Some of them are:
Dynamic environment. Sensor nodes address a dynamic environment where nodes have to reconfigure
themselves to adapt to the changes. Since resources are very limited, reconfiguration is also needed
in order to establish an efficient system (a totally new functionality might have to be used if energy
levels drop under certain values). The network can adapt its functionality to a new situation, in order
to lower the use of the scarce energy and memory resources, while maintaining the integrity of its
operation.
Error control. Error control normally resides in all protocol layers, so that the worst-case scenario is
covered at every layer. For a wireless sensor network this redundancy might be too expensive. Adopting a central view
of how error control is performed, together with crosslayer design, will reduce the resources spent on error control.
Power control. It is traditionally done only at the physical layer, but since energy consumption in sensor
nodes is a major design constraint, it is found in all layers (physical, data-link, network, transport, and
application layers).
Protocol place in the sensor node architecture. An issue arises when trying to place certain layers in
the protocol stack. Examples may include timing and synchronization, localization, and calibration.
These protocols might shift their place in the protocol stack as soon as their transient phase is over. The
data produced by some of these algorithms might make a different protocol stack more suited for the sensor
node (e.g., a localization algorithm for static sensor networks might enable a better routing algorithm that
uses information about the location of the routed data's destination).
Protocol availability. New protocols might become available after the network deployment. At certain
moments, in specific conditions, some of the sensor nodes might use a different protocol stack that better
suits their goal and the environment.
It is clear from these examples that dynamic reconfiguration of each protocol, as well as dynamic
reconfiguration of the active protocol stack, is needed.
33.3.2 Architecture Description
The system we are trying to model is an event-driven system, meaning that it reacts to and processes
incoming events and afterwards, in the absence of these stimuli, it spends its time in the sleep state (the
software components running inside the sensor node are not allowed to perform blocking waits).
Let us name a higher level of abstraction for the event class data. Data may encapsulate the information
provided by one or more events, have a unique name, and contain additional information such as deadlines,
identity of producer, etc. Data will be the means used by the internal mechanisms of the architecture to
exchange information between components.
In the following we will refer to any protocol or algorithm that can run inside a sensor node with the
term entity (see Figure 33.8). An entity is a software component that will be triggered by the availability
of one or more data types. While running, each entity is allowed to read available data types (but not wait
for additional data types to become available). As a result of its processing, each software component can
produce one or more types of data (usually on exit).
An entity is also characterized by some functionality, meaning the sort of operation it can perform
on the input data. Based on their functionality, the entities can be classified as being part of a certain
protocol layer as in the previous description. For one given functionality, several entities might exist inside
a sensor node; to discern among them, one should take into consideration their capabilities. By capability
FIGURE 33.8 Entity description.
we understand a high-level description containing the cost for a specific entity to perform its functionality
(in terms of energy, resources, time, etc.) and some characteristics indicating the estimated performance and
quality of the algorithm.
In order for a set of components to work together, the way in which they have to be interconnected
should be specified. The architectures existing up to this moment in the wireless sensor network field
assume a fixed way in which these components can be connected, which is defined at compile time
(except for the architectures that, e.g., allow execution of agents). To change the protocol stack in such an
architecture, the user should download the whole compiled code into the sensor node (via the wireless
interface) and then make use of some boot code to replace the old running code. In the proposed
architecture we allow this interconnection to be changed at runtime, thus making possible online updates
of the code, the selection of a more suitable entity to perform some functionality based on
changes in the environment, etc. (in one word, allowing the architecture to become dynamically
reconfigurable).
To make this mechanism work, a new entity needs to be implemented; let us call it the data manager.
The data manager will monitor the different kinds of data available and will coordinate the dataflow
inside the sensor node. At the same time it will select the most fitting entities to perform the work, and it
will even be allowed to change the whole functionality of the sensor node based on the available entities
and the external environment (see Figure 33.9).
The implementation of these concepts cannot ignore the small amount of resources
each sensor node has (in terms of energy, memory, computation power, etc.). Going down from the abstraction
level to the point where the device is actually working, a compulsory step is implementing the envisioned
architecture in a particular operating system (in this case a better term may be system software).
A large range of operating systems exist for embedded systems in general [25, 26]. Scaled-down versions
with simple schedulers and limited functionality have been developed especially for wireless sensor
networks [27].
Usually, the issues of system architecture and operating system are treated separately, both of them trying
to be as general as possible and to cover all the possible application cases. A simplistic view of a running
operating system is a scheduler that manages the available resources and coordinates the execution of a
set of tasks. This operation is centralized from the point of view of the scheduler, which is allowed to take
all the decisions. Our architecture can also be regarded as a centralized system, with the data manager
coordinating the dataflow of the other entities. To obtain the smallest overhead possible, there should
be a correlation between the function of the central nucleus of our architecture and the function of
the scheduler of the operating system. This is why we propose a close relationship between the two
concepts, extending the functionality of the scheduler with the functionality of the data manager.
The main challenges that arise are keeping the code size and the context-switching time low.
33.3.2.1 Requirements
As mentioned earlier, the general concept of data is used rather than that of events. For decisions
based on data to work, some additional requirements must be met.
First of all, every module needs to declare the name of the data that will trigger its action, the names of
the data it will read during its action (this can generically incorporate all the shared resources
in the system), and the names of the data it will produce. The scheduler needs all this information to
make its decisions.
From the point of view of the operating system, a new component that takes care of all the data
exchange needs to be implemented. This would in fact be an extended message-passing mechanism, with
the added feature of notifying the scheduler when new data types become available. In the architecture,
this module maps onto the constraint that protocols send and receive data through, for example,
a publish/subscribe mechanism to the central scheduler.
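As an illustration, the interplay between module data declarations, the extended message-passing component, and the data-centric scheduler might be sketched as follows. All class and field names here are hypothetical assumptions, not part of the concrete design discussed in the chapter:

```python
# Hypothetical sketch of a data-centric scheduler; the classes, fields,
# and publish() API are illustrative assumptions.

class Module:
    def __init__(self, name, triggers, reads, produces, action):
        self.name = name
        self.triggers = set(triggers)   # data names that activate the module
        self.reads = set(reads)         # data (shared resources) read during the action
        self.produces = set(produces)   # data names the module may output
        self.action = action            # callable(store) -> dict of produced data

class DataScheduler:
    """Scheduler extended with the data-manager role: it tracks which
    data types are available and dispatches the modules they trigger."""

    def __init__(self):
        self.modules = []
        self.store = {}   # data name -> latest published value
        self.ready = []   # modules whose trigger data has arrived

    def register(self, module):
        # Modules declare their trigger/read/produce data up front, so
        # the scheduler has the information it needs to take decisions.
        self.modules.append(module)

    def publish(self, name, value):
        # Extended message passing: store the data and notify the
        # scheduler that a new data item of this type is available.
        self.store[name] = value
        for m in self.modules:
            if name in m.triggers and m not in self.ready:
                self.ready.append(m)

    def run(self):
        # Dispatch triggered modules; anything they produce is published
        # and may in turn trigger further modules.
        log = []
        while self.ready:
            m = self.ready.pop(0)
            log.append(m.name)
            for out_name, out_value in m.action(self.store).items():
                self.publish(out_name, out_value)
        return log
```

For instance, a filter module triggered by raw sensor readings can publish filtered values that in turn trigger a reporting module; the scheduler, not the modules, decides the order of execution, which is what lets it also swap in a different entity for the same data type.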
An efficient naming system for the entities and the data is needed. Downloading new entities to a sensor
node involves issues similar to service discovery. Several entities with the same functionality but with
2006 by Taylor & Francis Group, LLC
FIGURE 33.9 Architecture transitions.
Architectures for Wireless Sensor Networks 33-21
different requirements and capabilities might coexist. The data-centric scheduler has to decide
which one is best.
33.3.2.2 Extension of the Architecture
The architecture presented earlier can be extended to groups of sensor nodes. Several data-centric
schedulers, together with a small, fixed number of protocols, can communicate with each other and form
a virtual backbone of the network.
Entities running inside sensor nodes can be activated by data types that become available at other
sensor nodes (e.g., imagine one node using its neighbor's routing entity because it needs the memory to
process some other data).
Of course, this approach raises new challenges. A naming system for the functionalities and data
types, and the reliability of the system (with respect to factors such as mobility, communication failures,
node failures, and security attacks) are just a few examples. Related work on these topics already exists
(e.g., References 28 and 29).
33.4 Conclusion
In this chapter, we have outlined the characteristics of wireless sensor networks from an architectural
point of view. As sensor networks are designed for specific applications, there is no precise architecture
that fits them all, but rather a common set of characteristics that can be taken as a starting
point.
The combination of the data-centric features of sensor networks and the need for a dynamically
reconfigurable structure has led to a new architecture that provides enhanced capabilities compared with
existing ones. The characteristics and implementation issues of the new architecture have been discussed,
laying the foundations for future work.
This area of research is currently in its infancy, and major steps are required in the fields of
communication protocols, data processing, and application support to make the vision of Mark Weiser
a reality.
References
[1] EYES. The EYES European project, http://eyes.eu.org.
[2] Chu, P., Lo, N.R., Berg, E., and Pister, K.S.J. Optical communication using micro corner cube
reflectors. In Proceedings of MEMS'97. IEEE, Nagoya, Japan, 1997, pp. 350–355.
[3] Havinga, P. et al. EYES deliverable 1.1: system architecture specification.
[4] SmartDust. http://robotics.eecs.berkeley.edu/~pister/SmartDust.
[5] Akyildiz, I., Su, W., Sankarasubramaniam, Y., and Cayirci, E. A survey on sensor networks. IEEE
Communications Magazine, 40(8), 102–114, 2002.
[6] Intanagonwiwat, C., Govindan, R., Estrin, D., Heidemann, J., and Silva, F. Directed diffusion for
wireless sensor networks. IEEE/ACM Transactions on Networking, 11(1), 2–16, 2003.
[7] Pottie, G.J. and Kaiser, W.J. Embedding the internet: wireless integrated network sensors.
Communications of the ACM, 43(5), 51–58, 2000.
[8] Ganesan, D., Cerpa, A., Ye, W., Yu, Y., Zhao, J., and Estrin, D. Networking issues in wireless sensor
networks. Journal of Parallel and Distributed Computing, Special Issue on Frontiers in Distributed
Sensor Networks, 64(7), 799–814, 2004.
[9] Estrin, D., Govindan, R., Heidemann, J.S., and Kumar, S. Next century challenges: scalable
coordination in sensor networks. Mobile Computing and Networking. IEEE, Seattle, Washington, USA,
1999, pp. 263–270.
[10] Bonnet, P., Gehrke, J., and Seshadri, P. Towards sensor database systems. In Proceedings of the
Second International Conference on Mobile Data Management. Springer-Verlag, Heidelberg, 2001,
pp. 3–14.
[11] Madden, S., Szewczyk, R., Franklin, M., and Culler, D. Supporting aggregate queries over ad-hoc
wireless sensor networks. In Proceedings of the Fourth IEEE Workshop on Mobile Computing
Systems and Applications. IEEE, 2002.
[12] Postel, J. Internet Protocol, RFC 791, 1981.
[13] Estrin, D., Girod, L., Pottie, G., and Srivastava, M. Instrumenting the world with wireless sensor
networks. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing.
IEEE, Salt Lake City, Utah, 2001.
[14] Heidemann, J.S., Silva, F., Intanagonwiwat, C., Govindan, R., Estrin, D., and Ganesan, D. Building
efficient wireless sensor networks with low-level naming. In Symposium on Operating Systems
Principles. ACM, 2001, pp. 146–159.
[15] Pottie, G.J. and Kaiser, W.J. Wireless integrated network sensors. Communications of the ACM,
43(5), 51–58, 2000.
[16] Bonnet, P. and Seshadri, P. Device database systems. In Proceedings of the International Conference
on Data Engineering. IEEE, San Diego, CA, 2000.
[17] Bonnet, P., Gehrke, J., and Seshadri, P. Querying the physical world. IEEE Personal Communications,
7, 10–15, 2000.
[18] Madden, S., Franklin, M.J., Hellerstein, J.M., and Hong, W. The design of an acquisitional query
processor for sensor networks. Proceedings of the 2003 ACM SIGMOD International Conference on
Management of Data. ACM Press, San Diego, CA, 2003, pp. 491–502.
[19] Madden, S. The design and evaluation of a query processing architecture for sensor networks.
PhD thesis, University of California, Berkeley, 2003.
[20] Yao, Y. and Gehrke, J. The Cougar approach to in-network query processing in sensor networks.
SIGMOD Record, 31(3), 2002.
[21] Yao, Y. and Gehrke, J. Query processing for sensor networks. In Proceedings of the Conference on
Innovative Data Systems Research. Asilomar, CA, 2003.
[22] Gehrke, J. and Madden, S. Query processing in sensor networks. Pervasive Computing. IEEE, 2004,
pp. 46–55.
[23] Beaver, J., Sharaf, M.A., Labrinidis, A., and Chrysanthis, P.K. Power aware in-network
query processing for sensor data. In Proceedings of the Second Hellenic Data Management
Symposium. Athens, Greece, 2003.
[24] Bonfils, B.J. and Bonnet, P. Adaptive and decentralized operator placement for in-network query
processing. In Proceedings of the Second International Workshop on Information Processing in
Sensor Networks (IPSN), Vol. 2634 of Lecture Notes in Computer Science. Springer-Verlag, Berlin,
Heidelberg, 2003, pp. 47–62.
[25] VxWorks. Wind River, http://www.windriver.com.
[26] Salvo. Pumpkin Incorporated, http://www.pumpkininc.com.
[27] Hill, J., Szewczyk, R., Woo, A., Hollar, S., Culler, D.E., and Pister, K.S.J. System architecture
directions for networked sensors. In Architectural Support for Programming Languages and Operating
Systems, 2000, pp. 93–104.
[28] Verissimo, P. and Casimiro, A. Event-driven support of real-time sentient objects. In Proceedings of
the Eighth IEEE International Workshop on Object-Oriented Real-Time Dependable Systems. IEEE,
Guadalajara, Mexico, 2003.
[29] Cheong, E., Liebman, J., Liu, J., and Zhao, F. TinyGALS: a programming model for event-driven
embedded systems. In Proceedings of the 2003 ACM Symposium on Applied Computing. ACM Press,
Melbourne, Florida, 2003, pp. 698–704.
34
Energy-Efficient Medium Access Control
Koen Langendoen and Gertjan Halkes
Delft University of Technology
34.1 Introduction .............................................. 34-1
      Contention-Based Medium Access • Schedule-Based Medium Access
34.2 Requirements for Sensor Networks .......................... 34-5
      Hardware Characteristics • Communication Patterns •
      Miscellaneous Services
34.3 Energy Efficiency ......................................... 34-7
      Sources of Overhead • Trade-Offs
34.4 Contention-Based Protocols ............................... 34-11
      IEEE 802.11 • LPL and Preamble Sampling • WiseMAC
34.5 Slotted Protocols ........................................ 34-12
      Sensor-MAC • Timeout-MAC • Data-Gathering MAC
34.6 TDMA-Based Protocols ..................................... 34-14
      Lightweight Medium Access
34.7 Comparison ............................................... 34-17
      Simulation Framework • Micro-Benchmarks • Homogeneous
      Unicast and Broadcast • Local Gossip • Convergecast • Discussion
34.8 Conclusions .............................................. 34-27
Acknowledgments .............................................. 34-27
References ................................................... 34-28
34.1 Introduction
Managing wireless communication will be the key to the effective deployment of large-scale sensor networks
that need to operate for years. On the one hand, wireless communication is essential (1) to foster
collaboration between neighboring sensor nodes to help overcome the inherent limitations of their cheap, and
hence inaccurate, sensors observing physical events, and (2) to report those events back to a sink node
connected to the wired world. On the other hand, wireless communication consumes a lot of energy, is
error prone, and has limited range, forcing many nodes to participate in relaying information, all of which
severely limits the lifetime of the (unattended) sensor network. In typical sensor nodes, such as the Mica2
mote, communicating one bit of information consumes as much energy as executing several hundred
FIGURE 34.1 Network protocol stack: the MAC protocol is part of the data link layer (layer 2), which sits between the network layer (layer 3) and the physical layer (layer 1).
instructions. Therefore, one should think twice before actually transmitting a message. Nevertheless,
whenever a message must be sent, the protocol stack should operate as efficiently as possible. In this chapter,
we will study the medium access layer, which is part of the data link layer (layer 2 of the OSI model) and
sits directly on top of the physical layer (layer 1) (see Figure 34.1). Since the medium access layer controls
the radio, it has a large impact on the overall energy consumption and, hence, the lifetime of a node.
A Medium Access Control (MAC) protocol decides when competing nodes may access the shared
medium, that is, the radio channel, and tries to ensure that no two nodes interfere with each other's
transmissions. In the unfortunate event of a collision, a MAC protocol may deal with it through some
contention-resolution algorithm, for example, by resending the message later at a randomly selected
time. Alternatively, the MAC protocol may simply discard the message and leave the retransmission,
if any, up to the higher layers in the protocol stack. MAC protocols for wireless networks have been
studied since the 1970s, but the successful introduction of wireless LANs (WLANs) in the late 1990s has
accelerated the pace of development; the recent survey by Jurdak et al. [1] reports an exponential growth
in new MAC protocols. We will now provide a brief historical perspective on the evolution of MAC, and
describe the two major approaches, contention-based and schedule-based, regularly used in wireless
communication systems. Readers familiar with medium access in wireless networks may proceed directly to
Section 34.2.
34.1.1 Contention-Based Medium Access
In the classic (pure) ALOHA protocol [2], developed for packet radio networks in the 1970s, a node simply
transmits a packet as soon as it is generated. If no other node is sending at the same time, the data transmission
succeeds and the receiver responds with an acknowledgment. In the case of a collision, no acknowledgment
will be generated, and the sender retries after a random period. The price to be paid for ALOHA's
simplicity is its poor use of the channel capacity; the maximum throughput of the ALOHA protocol is
only 18% [2]. However, a minor modification to ALOHA can increase the channel utilization considerably.
In slotted ALOHA, time is divided into slots, and nodes may only start transmitting at the beginning of a slot. This
organization halves the probability of a collision and raises the channel utilization to around 35% [3].
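The quoted utilization figures follow from the classic throughput analysis, where the throughput S is a function of the offered load G: S = G·e^(−2G) for pure ALOHA and S = G·e^(−G) for slotted ALOHA. A small sketch of this arithmetic:

```python
import math

def pure_aloha_throughput(g):
    # Pure ALOHA: a frame succeeds only if no other frame starts within
    # a two-frame vulnerability window, giving S = G * exp(-2G).
    return g * math.exp(-2 * g)

def slotted_aloha_throughput(g):
    # Slotted ALOHA: transmissions are aligned to slot boundaries, which
    # halves the vulnerability window, giving S = G * exp(-G).
    return g * math.exp(-g)

# Pure ALOHA peaks at G = 0.5 with S = 1/(2e), about 18%; slotted ALOHA
# peaks at G = 1.0 with S = 1/e, about 37% (the "around 35%" above).
peak_pure = pure_aloha_throughput(0.5)
peak_slotted = slotted_aloha_throughput(1.0)
```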
34.1.1.1 Carrier Sense Multiple Access
Instead of curing the effects (retransmissions) after the fact, it is often much better to remove the root of
the problem (collisions). The Carrier Sense Multiple Access (CSMA) protocol [4], originally introduced by
Kleinrock and Tobagi in 1975, tries to do just that. Before transmitting a packet, a node first listens to the
channel for a small period of time. If it does not sense any traffic, it assumes that the channel is clear and
starts transmitting the packet. Since it takes some time to switch the radio from receive mode to transmit
mode, the CSMA method is not bulletproof and collisions can still occur. In practice, however, CSMA-style
MAC protocols can achieve a maximum channel utilization on the order of 50 to 80%, depending on the
exact access policy [4].
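A minimal sketch of the listen-before-transmit rule, including the receive-to-transmit turnaround loophole just mentioned. The toy channel model and time units are assumptions for illustration only:

```python
# Toy model of carrier sensing; a real MAC driver would work against
# radio hardware, not this list of intervals.

class ToyChannel:
    def __init__(self):
        self.transmissions = []   # (start, end) intervals of frames on the air

    def busy(self, t):
        # Carrier sense: is any transmission on the air at time t?
        return any(start <= t < end for start, end in self.transmissions)

    def start_transmission(self, t, duration):
        self.transmissions.append((t, t + duration))

def csma_send(channel, t, duration, turnaround=0.0):
    """Sense the channel before sending.  Because switching the radio
    from receive to transmit takes `turnaround` time, another node may
    still start transmitting in that gap, so collisions remain possible."""
    if channel.busy(t):
        return False   # defer; a real MAC would retry after a random backoff
    channel.start_transmission(t + turnaround, duration)
    return True
```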
34.1.1.2 Carrier Sense Multiple Access with Collision Avoidance
When all nodes can sense each other's transmissions, CSMA performs just fine. It took until 1990 before
a significant new development in MAC was recorded. The Medium Access with Collision Avoidance
(MACA) protocol [5] addresses the so-called hidden terminal problem that occurs in ad hoc (sensor)
networks, where the radio range is not large enough to allow communication between arbitrary nodes, and
two (or more) nodes may share a common neighbor while being out of each other's reach. Consider the
situation in Figure 34.2, where nodes A and C both want to transmit a packet to their common neighbor B.
Both nodes sense an idle channel and start to transmit their packets, resulting in a collision at B. Note that
since node A is hidden from C, any packet sent by C will disrupt an ongoing transmission from A to B, so
this type of collision is quite common in ad hoc networks.
The MACA protocol introduces a three-way handshake to make hidden nodes aware of upcoming
transmissions, so collisions at common neighbors can be avoided. The sender (node A in Figure 34.2)
initiates the handshake by transmitting a short Request-To-Send (RTS) control packet announcing its
intended data transmission. The receiver (B) responds with a Clear-To-Send (CTS) packet, which informs
all neighbors of the receiver (including hidden nodes like C) of the upcoming transfer. The final DATA
transfer (from A to B) is now guaranteed to be collision-free. When two RTS packets collide, which is
technically still possible, the intended receiver does not respond with a CTS, and both senders back off for
some random time. To account for the unreliability of the radio channel, MACA Wireless (MACAW [6])
adds a fourth packet to the control sequence to guarantee delivery. When the data is received correctly,
an explicit ACKnowledgment is sent back to the sender. If the sender does not receive the ACK in due
time, it initiates a retransmission sequence to account for the corrupted or lost data.
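As a sketch, the four-way exchange and its silencing effect on the receiver's neighbors can be modeled as follows. The node names mirror Figure 34.2; the trace format and the helper function are illustrative assumptions (real protocols additionally carry duration fields and use timers and backoff):

```python
# Sketch of the MACAW four-way handshake (RTS/CTS/DATA/ACK).

def macaw_exchange(sender, receiver, neighbors_of_receiver, payload):
    """Returns the packet sequence of one exchange and the set of nodes
    silenced by overhearing the receiver's CTS."""
    trace = [("RTS", sender, receiver)]
    # Every neighbor of the receiver hears the CTS, including nodes
    # hidden from the sender; they must stay silent for the transfer.
    silenced = {n for n in neighbors_of_receiver if n != sender}
    trace.append(("CTS", receiver, sender))
    trace.append(("DATA", sender, receiver, payload))
    trace.append(("ACK", receiver, sender))   # MACAW's added fourth packet
    return trace, silenced
```

With sender A, receiver B, and hidden node C as in Figure 34.2, C ends up in the silenced set, which is exactly how the handshake avoids the hidden-terminal collision.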
The collision avoidance protocol in MACA (and derivatives) is widely used and is generally known as
CSMA/CA (Carrier Sense Multiple Access with Collision Avoidance). It has proved to be very effective
in eliminating collisions. In fact, CSMA/CA is too good at it and also silences nodes whose transmissions
would not interfere with the data transfer between the sender-receiver pair. This so-called exposed terminal
problem is illustrated in Figure 34.3. In principle, the data transmissions B→A and C→D can take place
concurrently, since the signals from B cannot disturb the reception at D, and similarly C's signals cannot
collide at A. However, since B must be able to receive the CTS from A, all nodes that can hear B's RTS packet
must remain silent, even if they are outside the reach of the receiver (A). Node C is thus exposed to B's
transmission (and vice versa). Since exposed nodes are prohibited from sending, aggregate throughput
may be reduced.
FIGURE 34.2 (a) The hidden terminal problem, resolved through (b) Request-To-Send/Clear-To-Send signaling.
FIGURE 34.3 The exposed terminal problem: (a) concurrent transfers are (b) synchronized.
34.1.1.3 IEEE 802.11
In 1999, the IEEE Computer Society published the 802.11 WLAN standard [7], specifying the PHYsical
and MAC layers. IEEE 802.11-compliant equipment, usually PC cards operating in the 2.4 or 5 GHz band,
can operate in infrastructure mode as well as in ad hoc mode. In both cases, 802.11 implements carrier
sense and collision avoidance to reduce collisions (see Section 34.4.1 for details). To preserve the energy of
mobile nodes, the 802.11 standard includes a power-saving mechanism that allows nodes to go into sleep
mode (i.e., disable their radios) for long periods of time. This mode of operation requires the presence of
an access point that records the status of each node and buffers any data addressed to a sleeping node. The
access point regularly broadcasts beacon packets indicating for which nodes it has buffered packets. These
nodes may then send a poll request to the access point to retrieve the buffered data (or switch back from
sleep to active mode). Krashinsky and Balakrishnan report up to 90% energy savings for web browsing
applications, but at the expense of considerable delays [8]. Currently, power saving in 802.11's ad hoc
mode is only supported when all nodes are within each other's reach, so a simple, distributed scheme can
be used to coordinate actions; the standard does not include a provision for power saving in multihop
networks.
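The buffer/beacon/poll interaction can be sketched with a toy model. The class and method names are simplified assumptions; the real standard's Traffic Indication Map encoding and timing rules are considerably more involved:

```python
# Toy model of 802.11-style power saving at the access point.

class AccessPoint:
    def __init__(self):
        self.buffered = {}   # sleeping node -> queued frames

    def deliver(self, node, frame):
        # Frames addressed to sleeping nodes are buffered at the AP.
        self.buffered.setdefault(node, []).append(frame)

    def beacon(self):
        # The periodic beacon advertises which nodes have buffered
        # frames (the Traffic Indication Map in the real standard).
        return set(self.buffered)

    def poll(self, node):
        # A node that finds itself in the beacon sends a poll request,
        # retrieves its frames, and can then return to sleep.
        return self.buffered.pop(node, [])
```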
34.1.2 Schedule-Based Medium Access
The MAC protocols discussed so far are based on autonomous nodes contending for the channel.
A completely different approach is to have a central authority (access point) regulate access to the medium by
broadcasting a schedule that specifies when, and for how long, each controlled node may transmit over the
shared channel. The lack of contention overhead guarantees that this approach does not collapse under
high loads. Furthermore, with the proper scheduling policy, nodes get deterministic access to the medium
and can provide delay-bounded services such as voice and multimedia streaming. Schedule-based medium
access is, therefore, the preferred choice for cellular phone systems (e.g., GSM) and wireless networks
supporting a mix of data and real-time traffic (e.g., Bluetooth).
34.1.2.1 Time-Division Multiple Access
Time-Division Multiple Access (TDMA) is an important schedule-based approach that controls the access
to a single channel (techniques for handling multiple channels will be discussed in Section 34.3.2.1).
In TDMA systems, the channel is divided into slots, which are grouped into frames (see Figure 34.4). The
access point decides (schedules) which slot is to be used by which node. This decision can be made on a
per-frame basis, or it can span several frames, in which case the schedule is repeated.
In typical WLAN setups, most traffic is exchanged between the access point and the individual nodes.
In particular, communication between nodes rarely occurs. By limiting communication to up- and downlinks
only, the scheduling problem is greatly simplified. Figure 34.4 shows a typical frame layout. The first
slot in the frame is used by the access point to broadcast traffic control information to all nodes in its cell.
This information includes a schedule that specifies when each node must be ready to receive a
packet (in the downlink section) and when it may send a packet (in the uplink section). The frame ends
with a contention period in which new nodes can register themselves with the access point, so they can be
included in future schedules.
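The frame layout just described can be sketched as a simple schedule builder. The slot accounting and field names are illustrative assumptions:

```python
# Toy schedule builder for a TDMA frame: slot 0 carries the
# traffic-control broadcast; downlink slots, uplink slots, and a
# contention period follow.

def build_frame(downlink_targets, uplink_grants, contention_slots=2):
    schedule = {"traffic_control": 0, "downlink": {}, "uplink": {}}
    slot = 1
    for dest in downlink_targets:   # nodes that must wake up to receive
        schedule["downlink"][dest] = slot
        slot += 1
    for src in uplink_grants:       # nodes granted a slot to send
        schedule["uplink"][src] = slot
        slot += 1
    # Remaining slots let unregistered nodes contend for registration.
    schedule["contention"] = list(range(slot, slot + contention_slots))
    return schedule
```

A node only needs to listen during slot 0, wake up for the slots the schedule assigns to it, and may sleep through everything else; that property is what makes the energy savings of the next paragraph possible.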
FIGURE 34.4 TDMA frame structure: traffic control, downlink, uplink, and contention period, repeated every frame.
TDMA systems provide a natural way to conserve energy. A node can turn off its radio during all
slots in a frame in which it is not engaged in communication to/from the access point. It does require,
however, accurate time synchronization between the access point and the individual nodes to ensure that
a node can wake up exactly at the start of its slots. In a sensor network, where activity is usually low,
a node is then on average only awake for one slot each frame to receive the traffic control information.
Enlarging the frame size reduces the energy consumption, but also increases the latency, since a node has
to wait longer before its slot turns up. This fundamental energy/latency trade-off is further explored in
Section 34.3.2.
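A back-of-the-envelope sketch of this trade-off, assuming a node is awake for a single (traffic-control) slot per frame and a queued message waits at most one full frame for its slot; the slot counts and slot time are illustrative numbers:

```python
# Energy/latency trade-off in a TDMA frame: the duty cycle shrinks with
# the frame size, while the worst-case latency grows with it.

def tdma_tradeoff(slots_per_frame, slot_time_s, awake_slots=1):
    duty_cycle = awake_slots / slots_per_frame        # fraction of time awake
    worst_case_latency = slots_per_frame * slot_time_s
    return duty_cycle, worst_case_latency

# Doubling the frame size halves the duty cycle but doubles the latency.
dc32, lat32 = tdma_tradeoff(32, 0.01)
dc64, lat64 = tdma_tradeoff(64, 0.01)
```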
34.2 Requirements for Sensor Networks
The vast majority of MAC protocols described in the literature so far were designed, and optimized,
for scenarios involving satellite links (early work) and WLANs (recent developments). The deployment
scenarios for wireless sensor networks differ considerably, leading to a different set of requirements.
In particular, the unattended operation of sensor networks stresses the importance of energy efficiency and
reduces the significance of performance considerations such as low latency, high throughput, and fairness.
Nevertheless, there are lessons to be learned from MAC protocols developed for wireless communication
systems, especially those targeting ad hoc networks of mobile nodes. The interested reader is referred to
a number of recent surveys in this area [1, 9, 10].
The task of the MAC layer in the context of sensor networks is to use the radio, with its limited
resources, as efficiently as possible to send and receive data generated by the upper layers in the protocol
stack. It should take into account that data is often routed across multiple hops, and it should be able to handle
large-scale networks with hundreds, or even thousands, of (mobile) nodes. To understand the design
trade-offs involved, we will discuss the hardware characteristics of prototype sensor nodes in use today, as
well as common traffic patterns that have emerged in preliminary experience with applications.
34.2.1 Hardware Characteristics
The current generation of sensor nodes, some of which are commercially available, is made up of
off-the-shelf components mounted on a small printed circuit board. In the future, we expect single-chip
solutions with some of the protocol layers implemented in hardware. At the moment, however, MAC
protocols run on the main processor, which drives a separate chip that takes care of converting
(modulating) bits to/from radio waves. The interface between the processor and the radio chip is at the
level of exchanging individual bits or bytes. The advantage of this low-level interface is that the MAC
designer has absolute control, which contrasts sharply with 802.11 WLAN equipment, where the MAC
is usually included as part of the chipset on the PC card.
Popular processors include the 8-bit Atmel ATmega128L CPU used on the Mica motes, the 16-bit Texas
Instruments MSP430 used on the EYES nodes, and the PIC-16 from Microchip. The exact specifications
vary, but the processors typically run at a frequency in the 1–10 MHz range and are equipped with 2–4 KB
of RAM. The processing capabilities provide ample headroom to drive the radio, but the limited amount
of storage space for local data puts a strong constraint on the memory footprint of the MAC protocol.
Since the focus of sensor node development is on energy consumption and form factor, we do anticipate
that future generations will still be quite limited in their processing and memory resources.
Table 34.1 provides details on the characteristics of two low-power radios employed in various state-
of-the-art sensor nodes. For reference, the specifications of a typical 802.11 PC card are included. Several
important observations can be made. First, the energy consumed when sending or receiving data is two
to three orders of magnitude more than that for keeping the radio in a low-power standby state. Thus, the key to
effective energy management lies in switching the radio off and on. Second, the time needed to switch
from standby to active mode is considerable (518 µsec to 2.0 msec), and the time needed to switch the
radio between transmit and receive modes is also nonnegligible. Therefore, the number of mode switches
should be kept to a minimum. Finally, the WaveLAN card (including the MAC) outperforms the other
TABLE 34.1 Characteristics of Typical Radios in State-of-the-Art Sensor Nodes

                          RFM TR1001 [11]    CC1000 [12]        Lucent WaveLAN PC Silver card [13]
Operating frequency       868 MHz            868 MHz (a)        2.4 GHz
Modulation scheme         ASK                FSK                DSSS
Bit rate                  115.2 kbps         76.8 kbps          11 Mbps
Energy consumption
  Transmit                12 mA (1.5 dBm)    8.6 mA (−20 dBm)   284 mA
                                             25.4 mA (5 dBm)
  Receive                 3.8 mA             11.8 mA            190 mA
  Standby                 0.7 µA             30 µA              10 mA
Switch times
  Standby-to-transmit     16 µsec            2.0 msec
  Receive-to-transmit     12 µsec            270 µsec
  Standby-to-receive      518 µsec (b)       2.0 msec
  Transmit-to-receive     12 µsec            250 µsec
  Transmit-to-standby     10 µsec
  Receive-to-standby      10 µsec

(a) The CC1000 radio supports any frequency in the 300 to 1000 MHz range; the quoted numbers are
for 868 MHz.
(b) Time needed to fully initialize the receive circuitry; a simple carrier sense can be performed in 30 µsec.
radios in terms of energy per bit (77 versus 312 nJ/bit); future nodes should include radios with higher
frequencies and more complex modulation schemes.
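The energy-per-bit comparison can be reproduced from the transmit currents and bit rates in Table 34.1. The supply voltage is not listed in the table; 3 V is assumed here:

```python
# Energy per transmitted bit from Table 34.1, assuming a 3 V supply
# (an assumption; the table lists currents, not voltages).

def energy_per_bit_nj(current_ma, supply_v, bitrate_bps):
    # E/bit = P / R = (I * V) / bitrate, converted to nanojoules.
    return current_ma * 1e-3 * supply_v / bitrate_bps * 1e9

tr1001_tx = energy_per_bit_nj(12, 3.0, 115_200)        # roughly 312 nJ/bit
wavelan_tx = energy_per_bit_nj(284, 3.0, 11_000_000)   # roughly 77 nJ/bit
```

The comparison favors the WaveLAN card because its much higher bit rate amortizes the (far larger) transmit power over many more bits.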
34.2.2 Communication Patterns
In the rapidly emerging field of wireless sensor networks there is little experience with realistic, long-
running applications. This is unfortunate, since a good characterization of the workload (in terms
of network traffic) is mandatory for designing a robust and efficient MAC protocol, or any other part
of the network stack for that matter. It is, however, clear that the nature of the traffic in sensor networks
has a few remarkable characteristics that set it apart from average WLAN traffic. From the various
proposed deployment scenarios, usually in the area of remote monitoring, and the limited data from
preliminary studies, such as the Great Duck Island [14] and vehicle tracking [15] systems, it becomes clear
that data rates are very low: typically on the order of 1–200 bytes per second, with message payload sizes
around 20–25 bytes. Furthermore, two distinct communication patterns (named convergecast and local
gossip in Reference 16) appear to be responsible for generating the majority of network traffic:
Convergecast. In many monitoring applications, information needs to be transmitted periodically to
a sink node, so it can be processed at a central location or simply stored in a database for future use. Since
these individual reports are often quite small and need to travel across the whole network, the overhead
is quite large. Aggregating messages along the spanning tree to the sink node therefore pays off. At the very
least, two (or more) packets can be coalesced to share a common header. At the very best, two (or more)
messages can be combined into one, for example, when reporting the maximum room temperature.
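In-network aggregation along the spanning tree can be sketched as follows. The tree encoding and the max() aggregate (mirroring the room-temperature example) are illustrative assumptions:

```python
# Sketch of convergecast with in-network aggregation: each node merges
# its own reading with its children's partial aggregates, so a single
# small message per link travels toward the sink instead of one message
# per source node.

def aggregate(node, children, readings):
    partials = [aggregate(child, children, readings)
                for child in children.get(node, [])]
    return max([readings[node]] + partials)
```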
Local gossip. When a sensor node observes a physical event, so do its neighbors, since the node density
in a sensor network is expected to be high. This allows a node to check with the nodes in its vicinity whether
they observed the same event, and if they did not, to conclude that its own sensor is probably malfunctioning.
If its neighbors do observe the same event (e.g., a moving target), they can collaborate to obtain a better
estimate of the event (location and speed) and report that back to the sink. Besides improving the quality
of the reported information, the collaboration also avoids n duplicate messages traveling all the way back
to the sink. Depending on the situation, neighbors may be addressed individually (unicast) or collectively
(broadcast). In any case, by sharing (gossiping) their sensor readings (rumors), nodes can reduce the
likelihood of false positives and efficiently report significant events.
The important implication of these two communication patterns is that traffic is not distributed evenly
over the network. The amount of data varies both in space and in time. Nodes in the vicinity of the sink
relay much more traffic than nodes at the edges of the network, due to the convergecast pattern. The
fluctuation in time is caused by physical events triggering outbursts of local gossip. In the extreme case
of a forest fire detection system, nodes may be dormant for years before finally reporting an event. MAC
protocols should be able to handle these kinds of fluctuations.
34.2.3 Miscellaneous Services
Often the MAC layer is expected to provide some network-related services not directly associated with
data transfer. Localization and time-synchronization algorithms often need precise information about
the moment of the physical transmission of a packet to factor out any time spent by the MAC layer
in contention resolution. The routing layer needs to be informed of any local changes in network topology;
for example, it needs to know when mobile nodes move in and out of radio range. Since the MAC layer
sits directly on top of the radio, it can perform these services at no extra cost. Neighborhood discovery,
for example, must be carried out to ensure the proper operation of TDMA-based MAC protocols. We will
not consider these miscellaneous requirements in the remainder of this chapter, but concentrate on the
MAC protocol's ability to transfer data as efficiently as possible.
34.3 Energy Efficiency
The biggest challenge for designers of sensor networks is to develop systems that will run unattended for
years. This calls for robust hardware and software, but most of all for careful energy management, since that
is and will continue to be a limited resource. The current generation of sensor nodes is battery powered, so
lifetime is a major constraint; future generations powered by ambient energy sources (sunlight, vibrations,
etc.) will provide very low currents, so energy consumption is heavily constrained.
It is important to realize that the failure of individual nodes may not harm the overall functioning of
a sensor network, since neighboring nodes can take over provided that the node density is high enough
(which can be guaranteed at roll out). Therefore, the key parameter to optimize is network lifetime,
that is, the time until the network gets partitioned. The MAC layer operates on a local scale (all nodes
within reach) and lacks the global information to optimize for network lifetime. This is therefore best
accomplished at the upper layers of the protocol stack, in particular the routing and transport (data
aggregation) layers, which do have a global overview. This works most effectively when the MAC layer
ensures that the energy it spends is directly related to the amount of traffic that it handles. Thus, the MAC
layer should optimize for energy efficiency.
In contrast to typical WLAN protocols, MAC protocols designed for sensor networks usually trade off
performance (latency, throughput, fairness) for cost (energy efficiency, reduced algorithmic complexity).
It is, however, not clear-cut what the best trade-off is, and various designs differ significantly, as will
become apparent in Section 34.3.2, where we review the basic design choices made by 20 WSN-specific
MAC protocols. Before that, we will consider the major sources of overhead that render WLAN-style
(contention-based) MAC protocols ineffective in the context of sensor networks.
34.3.1 Sources of Overhead
When running a contention-based MAC protocol on an ad hoc network with little traffic, much energy
is wasted due to the following sources of overhead:
Idle listening. Since a node does not know when it will be the receiver of a message from one of its
neighbors, it must keep its radio in receive mode at all times. This is the major source of overhead, since
typical radios consume two orders of magnitude more energy in receive mode (even when no data is
arriving) than in standby mode (cf. Table 34.1).
34-8 Embedded Systems Handbook
TABLE 34.2 Impact of Overhead on Contention-Based Protocols (C) and
Schedule-Based Protocols (S)

Source                  Performance (latency,     Cost (energy
                        throughput, fairness)     efficiency)
Collisions              C                         C
Protocol overhead       C, S                      C, S
Idle listening                                    C
Overhearing                                       C
Traffic fluctuations    C, S                      C, S
Scalability/mobility    S                         S
Collisions. If two nodes transmit at the same time and interfere with each other's transmission, packets
are corrupted. Hence, the energy used during transmission and reception is wasted. The RTS/CTS
handshake effectively resolves the collisions for unicast messages, but at the expense of protocol overhead.
Overhearing. Since the radio channel is a shared medium, a node may receive packets that are not
destined for it; it would have been more efficient to have turned off its radio.
Protocol overhead. The MAC headers and control packets used for signaling (ACK/RTS/CTS) do not
contain application data and are therefore considered overhead; these overheads can be significant since
many applications only send a few bytes of data per message.
Traffic fluctuations. A sudden peak in activity raises the probability of a collision; hence, much time
and energy are spent waiting in the random backoff procedure. When the load approaches the channel
capacity, the performance can collapse with little or no traffic being delivered while the radio, sensing for
a clear channel, is consuming a lot of energy.
Switching to a schedule-based protocol (i.e., TDMA) has the great advantage of avoiding all energy waste
due to collisions, idle listening, and overhearing, since TDMA is inherently collision free and the schedule
notifies each node when it should be active and, more importantly, when not. The price to be paid is in
fixed costs (i.e., broadcasting traffic schedules) and reduced flexibility to handle traffic fluctuations and
mobile nodes. The usual solution is to resort to some form of overprovisioning, choosing a frame size
that is large enough to handle peak loads. Dynamically adapting the frame size is another approach, but
this largely increases the complexity of the protocol and, hence, is considered to be an unattractive option
for resource-limited sensor nodes. Table 34.2 compares the impact of the various sources of overhead on
the performance and cost (energy efficiency) of contention-based and schedule-based MAC protocols.
34.3.2 Trade-Offs
Different MAC protocols make different choices regarding the performance-energy trade-off, and also
between sources of overhead (e.g., signaling versus collisions). A survey of 20 medium access protocols
specially designed for sensor networks, and hence optimized for energy efficiency, revealed that they can
be classified according to three important design decisions:
1. The number (and nature) of the physical channels used.
2. The degree of organization (or independence) between nodes.
3. The way in which a node is notied of an incoming message.
Table 34.3 provides a comprehensive protocol classification based on these three issues. Given that the
protocols are listed chronologically based on their publication date, we observe that there is no clear trend
indicating that medium access for wireless sensor networks is converging toward a unique, best solution.
On the contrary, new combinations are still being invented, showing that additional information (from
simulations and practical experience) is needed to decide on the best approach. Section 34.7 provides a
simulation-based head-to-head comparison of four protocols representing very distinctive choices in the
design space. We will not discuss all individual MAC protocols listed in Table 34.3 in detail, but rather
TABLE 34.3 Protocol Classification

Protocol                   Published  Channels        Organization       Notification
SMACS [17]                 2000       FDMA            Frames             Schedule
PACT [18]                  2001       Single          Frames             Schedule
PicoRadio [19]             2001       CDMA + tone     Random             Wake-up
STEM [20]                  2002       Data + control  Random             Wake-up
Preamble sampling [21]     2002       Single          Random             Listening
Arisha [22]                2002       Single          Frames             Schedule
S-MAC [23]                 2002       Single          Slots              Listening
PCM [24]                   2002       Single          Random             Listening
Low Power Listening [25]   2002       Single          Random             Listening
Sift [26]                  2003       Single          Random             Listening
EMACs [27]                 2003       Single          Frames             Schedule (per node)
T-MAC [28]                 2003       Single          Slots              Listening
TRAMA [29]                 2003       Single          Frames             Schedule (per node)
WiseMAC [30]               2003       Single          Random             Listening
B-MAC [31]                 2003       Single          Random             Listening
BMA [32]                   2004       Single          Frames             Schedule
Miller [33]                2004       Data + tone     Random             Wake-up + listening
DMAC [34]                  2004       Single          Slots (per level)  Listening
SS-TDMA [16]               2004       Single          Frames             Schedule
LMAC [35]                  2004       Single          Frames             Listening
review three fundamental design choices that MAC designers will encounter while crafting a protocol best
matching their envisioned deployment scenario.
34.3.2.1 Use Multiple Channels, or Not?
(Design choice 1, channels: single; multiple (FDMA, CDMA); double (data + control, data + tone).)
The first design choice that we discuss is whether or not the radio should be capable of dividing the avail-
able bandwidth into multiple channels. Two common techniques for doing so are Frequency-Division
Multiple Access (FDMA) and Code-Division Multiple Access (CDMA). FDMA partitions the total band-
width of the channel into a number of small frequency bands, called subcarriers, on which multiple nodes
can transmit simultaneously without collision. CDMA, on the other hand, uses a single carrier in com-
bination with a set of orthogonal codes. Data packets are XOR-ed with a specific code by the sender
before transmission, and then XOR-ed again by the receiver with the same code to retrieve the original
data. Receivers using another code perceive the transmission as (pseudo) random noise. This allows the
simultaneous and collision-free transmission of multiple messages.
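The XOR spreading and despreading described above can be sketched in a few lines (a toy illustration with 4-chip Walsh codes; the code values, lengths, and function names are ours, not taken from any protocol discussed in this chapter):

```python
# Toy CDMA sketch: each data bit is XOR-ed with every chip of a spreading
# code by the sender; the receiver XOR-s with the same code and majority-
# votes each chip group to recover the bit. A receiver using a different
# (orthogonal) code sees a balanced mix of 0s and 1s, i.e., noise.

# Two orthogonal Walsh codes of length 4
CODE_A = [0, 1, 0, 1]
CODE_B = [0, 0, 1, 1]

def spread(bits, code):
    """XOR every data bit with each chip of the spreading code."""
    return [b ^ c for b in bits for c in code]

def despread(chips, code):
    """XOR with the same code, then majority-vote each chip group."""
    n = len(code)
    bits = []
    for i in range(0, len(chips), n):
        group = [chips[i + j] ^ code[j] for j in range(n)]
        bits.append(1 if sum(group) > n // 2 else 0)
    return bits

data = [1, 0, 1, 1]
tx = spread(data, CODE_A)
assert despread(tx, CODE_A) == data  # intended receiver recovers the data
```

With CODE_B, every despread chip group contains as many ones as zeros, which is why a receiver on another code perceives the transmission as pseudo-random noise.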
The absence of collisions in a multiple-channel system is attractive, hence its popularity in early pro-
posals, such as SMACS (FDMA) and PicoRadio (CDMA). It requires, however, a rather complicated
radio consuming considerable amounts of energy, so most MAC protocols are designed for a simple radio
providing just a single channel. An interesting alternative is to use a second, extremely low-power radio
for signaling an intended receiver to wake up and turn on its primary radio to receive a
data packet. In the simplest, most energy-efficient case, the second radio is only capable of emitting
a fixed tone, waking up all neighboring nodes (including the intended receiver). Miller and Vaidya [33]
discuss several policies to minimize the number of false wake-ups by overhearing nodes. STEM uses a
full-blown second radio to control exactly which node responds on the primary channel.
34.3.2.2 Get Organized, or Not?
(Design choice 2, organization: random; slots; frames.)
The second design choice that we discuss is if, and how much, the nodes in the network should be organized
to act together at the MAC layer. The CSMA and TDMA protocols discussed before represent the two
extremes in the degree of organization: from completely random to frame-based access. The advantages of
contention-based protocols (random access) are the low implementation complexity, the ad hoc nature,
and the flexibility to accommodate mobile nodes and traffic fluctuations. The major advantage of frame-
based TDMA protocols is the inherent energy efficiency due to the lack of collisions, overhearing, and
idle-listening overheads.
Since the advantages of random access are the drawbacks of frame-based access, and vice versa, some
MAC protocols have chosen to strike a middle ground between these two extremes and organize the sensor
nodes in a slotted system (much like slotted ALOHA). The Sensor-MAC (S-MAC) protocol was the first to
propose that nodes agree on a common slot structure, allowing them to implement an efficient duty-cycle
regime; nodes are awake in the first part of each slot and go to sleep in the second part, which significantly
reduces the energy waste due to idle listening.
The protocol classification in Table 34.3 shows that the research community is divided on what degree
of organization to apply: we find nine contention-based, three slotted, and eight TDMA-based protocols.
Since we view the organizational design decision as the most critical, we will detail the main protocols
from each class in Sections 34.4 to 34.6.
34.3.2.3 Get Notified, or Not?
(Design choice 3, notification: listening; schedule; wake-up.)
The third and final design issue is how the intended receiver of a message transfer will get
notified. In schedule-based protocols, the actual data transfers are scheduled ahead of time, so receiving
nodes know exactly when to turn on the radio. Such knowledge is not available in contention-based
protocols, so receiving nodes must be prepared to handle an incoming transfer at any moment. Without
further assistance from the sender, the receiver has no other option than to listen continuously. To eliminate
the resulting idle-listening overhead completely, senders may actively send a wake-up signal (tone) over
a second, very low-power radio. Although the wake-up model matches well with the low packet rates
of sensor network applications, all contention-based protocols except PicoRadio, STEM, and Miller's
proposal are designed for nodes with a single radio. The general approach to reduce the inherent idle
listening in these nodes is to enforce some kind of duty cycle by periodically switching the radio on for
a short time. This can be arranged individually per node (Low-Power Listening [LPL] and preamble
sampling, Section 34.4.2) or collectively per slot (S-MAC, Section 34.5.1). An alternative is to circumvent
the idle-listening problem, as the Sift protocol does, by restricting the network to a cellular topology where
access points collect data from nearby sensor nodes.
We would like to point out that the choice of a particular notification policy is largely dependent on the
available hardware channels and the organizational model discussed before. Schedule-based notification
matches with TDMA frames; wake-up is only possible on dual-channel nodes. The Lightweight Medium
ACcess (LMAC) protocol (Section 34.6.1), however, is the exception to the rule and combines TDMA
frames with listening, striking a different balance between flexibility and energy efficiency.
34.4 Contention-Based Protocols
We now proceed with describing in detail some of the medium access protocols developed for sensor
networks according to their particular choice of organizational model (see Table 34.3). In this section, we
review contention-based protocols in which nodes can start a transmission at any random moment and
must contend for the channel. The main challenge with contention-based protocols is to reduce the energy
consumption caused by collisions, overhearing, and idle listening. CSMA/CA protocols effectively deal
with collisions and can be easily adapted to avoid a lot of overhearing overhead (i.e., switch off the radio
for the duration of another transmission's sequence). We also discuss the familiar IEEE 802.11 protocol,
even though it was not developed specifically for sensor networks. It does, however, form the basis of the
energy-efficient derivatives discussed in this section (LPL and WiseMAC), as well as the slotted protocols
(S-MAC and Timeout-MAC [T-MAC]) discussed in the next section.
34.4.1 IEEE 802.11
The MAC in the IEEE 802.11 standard [7] is based on carrier sensing (CSMA) and collision detection
(through acknowledgments). A node wanting to transmit a packet must first test the radio channel to
check if it is free for a specified time called the Distributed Inter Frame Space (DIFS). If so, a DATA packet¹
is transmitted, and the receiver waits a Short Inter Frame Space (SIFS) before acknowledging the reception
of the data by sending an ACK packet. Since the SIFS interval is set shorter than the DIFS interval, the
receiver takes precedence over any other node attempting to send a packet. If the sender does not receive
the acknowledgment, it assumes that the data was lost due to a collision at the receiver and enters a binary
exponential backoff procedure. At each retransmission attempt, the length of the contention window (CW)
is doubled. Since contending nodes randomly select a time from their CW, the probability of a subsequent
collision is reduced by half. To bound access latency somewhat, the CW is not doubled once a certain
maximum (CWmax) has been reached.
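The backoff rule just described can be sketched as follows (a minimal illustration; the window values are typical 802.11 parameters, and the helper names are ours):

```python
# Sketch of 802.11-style binary exponential backoff: the contention window
# doubles after every failed attempt, capped at CW_MAX.
import random

CW_MIN = 15    # initial contention window, in slots (typical 802.11 value)
CW_MAX = 1023  # cap: the window is no longer doubled past this

def contention_window(retry):
    """CW after `retry` failed transmission attempts, capped at CW_MAX."""
    return min((CW_MIN + 1) * (2 ** retry) - 1, CW_MAX)

def backoff_slots(retry):
    """A contender picks a uniformly random slot from its current window."""
    return random.randint(0, contention_window(retry))

# Doubling the window halves the chance that two contenders pick the same
# slot again after a collision.
print([contention_window(r) for r in range(7)])
# [15, 31, 63, 127, 255, 511, 1023]
```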
To account for the hidden terminal problem in ad hoc networks, the 802.11 standard defines a virtual
carrier sense mechanism based on the collision avoidance handshake of the MACA protocol. The RTS/CTS
control packets include a time field in their header that specifies the duration of the upcoming DATA/ACK
sequence. This allows neighboring nodes overhearing the control packets to set their network allocation
vector (NAV) and defer transmission until it expires (see Figure 34.5). To save energy, the radio can be
switched off for the duration of the NAV. Thus CSMA/CA effectively eliminates collisions and overhearing
overhead for unicast packets. Broadcast and multicast packets are always transmitted without an RTS/CTS
reservation sequence (and without an ACK), so they are susceptible to collisions.
34.4.2 LPL and Preamble Sampling
The major disadvantage of CSMA/CA is the energy wasted by idle-listening. Both Hill and Culler [25],
and El-Hoiydi [21] independently developed a low-level carrier sense technique that effectively duty cycles
FIGURE 34.5 IEEE 802.11 access control. (Timing diagram: the sender transmits RTS and DATA, the
receiver answers with CTS and ACK after SIFS intervals; other nodes overhearing RTS or CTS set their
NAV(RTS)/NAV(CTS) and defer, contending in the CW after a DIFS.)
¹The 802.11 standard defines the transmission protocol in terms of frames, but we use the term packet instead to
avoid confusion with the framing structure of TDMA protocols.
FIGURE 34.6 LPL: a long preamble allows periodic sampling at the receiver.
the radio, that is, turns it off repeatedly, without losing any incoming data. This technique operates at the
physical layer and concerns the layout of the PHY header prepended to each radio packet. This header
starts off with a preamble that is used to notify receivers of the upcoming transfer and allows them to
adjust (train) their circuitry to the current channel conditions; next follows the start byte, signaling the
true beginning of the data transfer. The basic idea behind the efficient carrier-sense technique is to shift
the cost from the receiver (the frequent case) to the transmitter (the rarer case) by increasing the length
of the preamble. This allows the receiver to periodically turn on the radio to sample for incoming data,
and detect if a preamble is present or not. If it detects a preamble, it will continue listening until the start
symbol arrives and the message can be properly received (see Figure 34.6). If no preamble is detected, the
radio is turned off again until the next sample.
This efficient carrier-sense method can be applied to any contention-based MAC protocol. El-Hoiydi
combined it with ALOHA and named it preamble sampling [21]. Hill and Culler combined it with CSMA
and named it Low-Power Listening [25]. Neither implementation includes collision avoidance, to save on
protocol overhead. The energy savings depend on the duty cycle, which in turn depends on the switching
times of the radio. LPL, for example, was implemented as part of TinyOS running on Mica motes equipped
with an RFM 1000 radio capable of performing a carrier sense in just 30 μsec (cf. Table 34.1). The carrier
is sensed every 300 μsec, yielding a duty cycle of 10%, effectively reducing the idle-listening overhead by a
factor of ten. The energy savings come at a slight increase in latency (the length of the preamble is doubled
to 647 μsec) and a minor reduction in throughput. In the recently proposed B-MAC implementation (part
of TinyOS 1.1.3) the preamble length is provided as a parameter to the upper layers, so they can select the
optimal trade-off between energy savings and performance [31].
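The quoted figures combine into a simple back-of-the-envelope calculation (a sketch; only the sample time and sampling period come from the text, the variable names are ours):

```python
# LPL duty-cycle arithmetic for the Mica/RFM 1000 numbers quoted above.
t_sample = 30   # usec: one carrier-sense sample
t_period = 300  # usec: interval between successive samples

duty_cycle = t_sample / t_period
print(f"duty cycle: {duty_cycle:.0%}")  # 10%

# Idle-listening energy scales linearly with the duty cycle, so sampling
# instead of listening continuously cuts that overhead by 1/duty_cycle.
print(f"idle-listening reduction: {1 / duty_cycle:.0f}x")  # 10x
```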
34.4.3 WiseMAC
El-Hoiydi has refined his preamble sampling one step further, by realizing that long preambles are not
necessary when the sender knows the sampling schedule of the intended receiver. The sender can then
simply wait until the moment the receiver is about to sample the channel, and send a packet with an
ordinary preamble. This not only saves energy at the sender, who waits instead of emitting an extended
preamble, but also at the receiver, since the time until the start symbol occurs is reduced
considerably. In WiseMAC [30] nodes maintain the schedule offsets of their neighbors through information
piggybacked on the ACKnowledgments of the underlying CSMA protocol. Whenever a node needs
to send a message to a specific neighbor n, it uses n's offset to determine when to start transmitting the
preamble; to account for any clock drift, the preamble is extended with a time proportional to the length of
the interval since the last message exchange. The overall effect of these measures is that WiseMAC adapts
automatically to traffic fluctuations. Under low load, WiseMAC uses long preambles and consumes low
power (receiver costs dominate); under high load, WiseMAC uses short preambles and operates energy
efficiently (overheads are minimized). Finally, note that WiseMAC's preamble length optimization is not
very effective for broadcast messages, since the preamble must span the sampling points of all neighbors
and account for drift, so it is quite often stretched to full length.
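The drift-compensated preamble sizing can be sketched as a small function (a sketch following the min(4θL, T_W) rule reported in the WiseMAC paper; treat the exact form, the parameter values, and the names as assumptions):

```python
# Sketch of WiseMAC-style preamble sizing: the preamble grows with the
# clock drift accumulated since the last exchange, capped at a full
# sampling period (the long-preamble fallback).

def preamble_length(theta, since_last, t_sample_period):
    """
    theta            : clock drift rate (e.g., 30 ppm = 30e-6)
    since_last       : time since the last exchange with this neighbor (sec)
    t_sample_period  : sampling period = full-length preamble fallback (sec)
    """
    # The two clocks may drift apart by up to theta*since_last in either
    # direction, so the preamble must cover that uncertainty window on
    # both sides of the predicted sampling point.
    return min(4 * theta * since_last, t_sample_period)

# Fresh schedule info -> short preamble; stale info -> full-length preamble.
print(f"{preamble_length(30e-6, 1.0, 0.1) * 1e3:.2f} msec")     # short
print(f"{preamble_length(30e-6, 3600.0, 0.1) * 1e3:.2f} msec")  # full length
```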
34.5 Slotted Protocols
The three slotted protocols (S-MAC, T-MAC, and Data-gathering MAC [DMAC]) listed in Table 34.3 are
all derived from classical contention-based protocols. They address the inherent idle-listening overhead
FIGURE 34.7 Slot structure of S-MAC with built-in duty cycle. (Each slot consists of a SYNC phase, an
active period, and a sleep period.)
by synchronizing the nodes and implementing a duty cycle within each slot. At the beginning of a slot,
all nodes wake up, and any node wishing to transmit a message must contend for the channel. This
synchronized behavior increases the probability of collision in comparison to the random organization of
the energy-efficient CSMA protocols discussed in the previous section. To mitigate the increased collision
overheads, S-MAC and T-MAC include an RTS/CTS handshake, but DMAC does without to save on
protocol overhead. The three slotted protocols also differ in their way of deciding when and how to switch
back from active to sleep mode, as will become apparent in the following discussions.
34.5.1 Sensor-MAC
The S-MAC protocol developed by Ye et al. [23] introduces a technique called virtual clustering to allow
nodes to synchronize on a common slot² structure (Figure 34.7). To this end, nodes regularly broadcast
SYNC packets at the beginning of a slot, so other nodes receiving these packets can adjust their clocks
to compensate for drift. The SYNC packets also allow new (mobile) nodes to join the ad hoc network.
In principle, the whole network runs the same schedule, but due to mobility and bootstrapping a network
may comprise several virtual clusters. For the details of the synchronization procedure that resolves the
rare occasion of two clusters meeting each other, please refer to Reference 36.
An S-MAC slot starts off with a small synchronization phase, followed by a fixed-length active period,
and ends with a sleep period in which nodes turn off their radio. Slots are rather large, typically on the
order of 500 msec to 1 sec. The energy savings of S-MAC's built-in duty cycle are under control of the
application: the active part is fixed³ at 300 msec, while the slot length can be set to any value. Besides
addressing the idle-listening overhead, S-MAC includes collision avoidance (RTS/CTS handshake) and
overhearing avoidance. Finally, S-MAC includes message-passing support to reduce protocol overhead
when streaming a sequence of message fragments.
The application's explicit control over the idle-listening overhead is a mixed blessing. On the one hand,
the application is in control of the energy-performance trade-off, which is good. On the other hand, the
duty cycle must be decided upon before starting S-MAC, which is bad since the optimal setting depends
on many factors, including the expected occurrence rate of events observed after the deployment of the
nodes, and may even change over time.
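The knob the application controls can be made concrete with a small sketch (the 300 msec active part is from the text; the slot lengths are illustrative):

```python
# Sketch of S-MAC's energy-performance knob: the active part is fixed and
# the application chooses the slot length, which sets the duty cycle.

ACTIVE = 0.3  # sec: fixed active period of an S-MAC slot

def duty_cycle(slot_length):
    """Fraction of time the radio is on; chosen once, before starting S-MAC."""
    return ACTIVE / slot_length

# Longer slots save energy but increase latency: a message arriving during
# the sleep period must wait for the next active part.
for slot in (0.5, 1.0, 3.0):
    print(f"slot {slot:.1f} sec -> duty cycle {duty_cycle(slot):.0%}")
```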
34.5.2 Timeout-MAC
The T-MAC protocol by van Dam and Langendoen [28] introduces an adaptive duty cycle to improve on
S-MAC on two accounts. First, T-MAC frees the application from the burden of selecting an appropriate
duty cycle. Second, T-MAC automatically adapts to the traffic fluctuations inherent to the local gossip and
convergecast patterns, while S-MAC's slot length must be chosen conservatively to handle worst-case
traffic.
T-MAC borrows the virtual clustering method of S-MAC to synchronize nodes. In contrast to S-MAC,
it operates with fixed-length slots (615 msec) and uses a timeout mechanism to dynamically determine
the end of the active period. The timeout value (15 msec) is set to span a small contention period and an
RTS/CTS exchange. If a node does not detect any activity (an incoming message or a collision) within the
²The S-MAC protocol is defined in terms of frames, but we use the term slot instead to avoid confusion with the
framing structure of TDMA protocols.
³A recent enhancement of S-MAC, which is called adaptive listening, includes a variable-length active part to reduce
multihop latency [36]. Since the timeout policy of the T-MAC protocol behaves similarly and was designed to handle
traffic fluctuations as well, we do not discuss adaptive listening further.
FIGURE 34.8 Convergecast tree with matching, staggered DMAC slots. (Each node cycles through receive,
send, and sleep slots; the schedules are staggered per tree level so that a node's send slot coincides with its
parent's receive slot, up to the sink.)
timeout interval, it can safely assume that no neighbor wants to communicate with it and goes to sleep.
On the other hand, if the node engages in or overhears a communication, it simply starts a new timeout
after that communication finishes. To save energy, a node turns off its radio while waiting for other
communications to finish (overhearing avoidance).
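The timeout rule can be sketched as a small function (a sketch; the 15 msec value is from the text, while the event times and names are hypothetical):

```python
# Sketch of T-MAC's adaptive active period: any observed activity restarts
# the timeout; once TA elapses with no activity, the node goes to sleep.

TA = 15  # msec: timeout spanning a contention period and an RTS/CTS exchange

def sleep_time(activity_times, slot_start=0):
    """Time (msec) at which the node powers down, given observed activity."""
    deadline = slot_start + TA
    for t in sorted(activity_times):
        if t > deadline:
            break              # TA of silence already passed: node is asleep
        deadline = t + TA      # activity restarts the timeout
    return deadline

print(sleep_time([10, 20]))  # 35: bursts keep extending the active period
print(sleep_time([]))        # 15: silent slot, sleep after one timeout
```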
The adaptive duty cycle allows T-MAC to automatically adjust to fluctuations in network traffic. The
downside of T-MAC's rather aggressive power-down policy, however, is that nodes often go to sleep too
early: when a node s wants to send a message to r, but loses contention to a third node n that is not
a common neighbor, s must remain silent and r goes to sleep. After n's transmission finishes, s will send
out an RTS to the sleeping r and receive no matching CTS; hence, s must wait until the next frame to try
again. T-MAC includes two measures to alleviate this so-called early-sleeping problem (for details refer
to Reference 28), but the results in Section 34.7 show that it strongly favors energy savings over performance
(latency/throughput).
34.5.3 Data-Gathering MAC
The DMAC protocol by Lu et al. [34] is the third slotted protocol that we discuss. For energy efficiency and
ease of use, DMAC includes an adaptive duty cycle like T-MAC. In addition, it provides low node-to-sink
latency, which is achieved by supporting one communication paradigm only: convergecast.
DMAC divides time into rather short slots (around 10 msec) and runs CSMA (with acknowledgments)
within each slot to send or receive at most one message. Each node repeatedly executes a basic sequence
of one receive, one send, and n sleep slots. At setup, DMAC ensures that the sequences are staggered to match
the structure of the convergecast tree rooted at the sink node (see Figure 34.8). This arrangement allows
a single message from a node at depth d in the tree to arrive at the sink with a latency of just d slot times,
which is typically on the order of tens of milliseconds. DMAC includes an overflow mechanism to handle
multiple messages in the tree. In essence, a node will stay awake for one more slot after relaying a message,
so in the case of two children contending for their parent's receive slot, the one losing will get a second
chance. To account for interference, the overflow slot is not scheduled back to back with the send slot;
instead, receive slots are scheduled five slots apart. The overflow policy automatically takes care of
adapting to the traffic load, much like T-MAC's extension of the active period.
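The staggered relay can be sketched as follows (a sketch of the latency argument only; the 10 msec slot length is from the text, the function name is ours):

```python
# Sketch of DMAC's staggered relay: in every slot a message moves one level
# up the convergecast tree, so a packet from depth d reaches the sink after
# d slot times.

SLOT_MSEC = 10  # approximate DMAC slot length

def relay_to_sink(depth):
    """Trace the (time, tree level) hops of one message from `depth`."""
    return [(hop * SLOT_MSEC, depth - hop) for hop in range(depth + 1)]

trace = relay_to_sink(4)
print(trace)  # [(0, 4), (10, 3), (20, 2), (30, 1), (40, 0)]
print(f"{trace[-1][0]} msec to reach the sink")  # 40 msec for depth 4
```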
The results reported in Reference 34 show that DMAC outperforms S-MAC in terms of latency
(due to the staggered schedules), throughput, and energy efficiency (due to the adaptivity).
It remains to be seen if DMAC can be enhanced to support communication patterns other than convergecast
equally well.
34.6 TDMA-Based Protocols
The major attractions of a schedule-based MAC protocol are that it is inherently collision free and that idle
listening can be ruled out, since nodes know beforehand when to expect incoming data. The challenge is to
adapt TDMA-based protocols to operate efficiently in ad hoc sensor networks without any infrastructure
(i.e., access points). We will now briefly discuss the different approaches taken by the frame-based protocols
listed in Table 34.3:
Sink-based scheduling. The approach taken by Arisha et al. [22] is to partition the network into large
clusters, in which multihop traffic is possible. The traffic within each cluster is scheduled by a sink
node that is connected to the wired backbone network, and hence equipped with increased resources.
The goal is to optimize network lifetime, and the sink therefore takes the energy levels of each node into
account when deciding (scheduling) which nodes will sense, which nodes will relay, and which nodes
may sleep. The TDMA schedule is periodically refreshed to adapt to changes. It is required that all nodes
can directly communicate with the sink node (at maximum transmit power), which clearly limits the
scalability. Furthermore, the TDMA frame is of fixed length, so the maximum number of nodes must be
known before deployment.
Static scheduling. The Self-Stabilizing TDMA (SS-TDMA) protocol by Kulkarni and Arumugam [16] uses a
fixed schedule throughout the lifetime of the network, which completely removes the need for a centralized
(or distributed) scheduler. SS-TDMA operates on regular topologies, such as square and hexagonal grids,
and synchronizes traffic network-wide in rounds: all even rows transmit a north-bound message, all odd
rows transmit a south-bound message, and so on. The authors show that such static schedules can result in
acceptable performance for typical communication patterns (broadcast, convergecast, and local gossip),
but their constraints on the location of the nodes render the protocol impractical in many deployment scenarios.
Rotating duties. When the node density is high, the costs of serving as an access point may be amortized
over multiple nodes by rotating duties among them. The PACT [18] protocol uses passive clustering to
organize the network into a number of clusters connected by gateway nodes; the rotation of the cluster
heads and gateways is based on information piggybacked on the control messages exchanged during the
traffic control phase of the TDMA schedule. The BMA protocol [32] uses the LEACH approach [37]
to manage cluster formation and rotation. At the start of a TDMA frame, each node broadcasts one
bit of information to its cluster head, stating whether or not the node has data to send. Based on this
information, the cluster head determines the number of data slots needed, computes the slot assignment,
and broadcasts that to all nodes under its control. Note that the bit-level traffic announcements require
very tight time synchronization between the nodes in the cluster.
Partitioned scheduling. In the EMACs protocol by van Hoesel et al. [27], the scheduling duties are
partitioned according to slot number. Each slot serves as a mini-TDMA frame and consists of a contention
phase, a traffic control section, and a data section. An active node that owns a slot always transmits in
its own slot. Therefore, a node n must listen to the traffic control sections of all its neighbors, since n
may be the intended receiver of any of them. The contention phase is included to serve passive nodes that
do not own a slot, the idea being that only some nodes need to be active to form a backbone network
ready to be used by passive nodes when they detect an event. In many scenarios, events occur rarely, so
the energy spent in listening for requests forms a major source of overhead. The LMAC protocol by the
same authors therefore simply does without a contention interval. This improved protocol is discussed in
detail below. In comparison to other TDMA-based protocols, both EMACs and LMAC have the advantage
of supporting node mobility, which significantly increases their scope of deployment. The results in
Section 34.7 show that, performance-wise, partitioned scheduling is also an attractive option.
Replicated scheduling. The approach taken by Rajendran et al. [29] in the TRAMA protocol is to replicate
the scheduling process over all nodes within the network. Nodes regularly broadcast information about
(long-running) traffic flows routed through them and the identities of their one-hop neighbors. This
results in each node being informed about the demands of its one-hop neighbors and the identity of
its two-hop neighbors. This information is sufficient to determine a collision-free slot assignment by
means of a distributed hash function that computes the winner (i.e., sender) of each slot based on the
node identities and slot number. During execution the schedule may be adapted to match actual traffic
conditions; nodes with little traffic may release their slot for the remainder of the frame for use by other
(overloaded) nodes. Although TRAMA achieves high channel utilization, it does so at the expense of
considerable latency and high algorithmic complexity.
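The winner election at the heart of this scheme can be sketched as follows. TRAMA defines its own pseudo-random priority function; SHA-256 here is merely a stand-in, and the function names are ours:

```python
import hashlib

def priority(node_id, slot):
    # Pseudo-random priority derived from node identity and slot number;
    # every node evaluates the same function, so no messages are needed.
    digest = hashlib.sha256(f"{node_id}:{slot}".encode()).digest()
    return int.from_bytes(digest[:4], "big")

def slot_winner(two_hop_ids, slot):
    # The contender with the highest priority is the sender for this slot
    # (ties broken by node id); all nodes reach the same verdict locally.
    return max(two_hop_ids, key=lambda n: (priority(n, slot), n))
```

Because each node knows its two-hop neighborhood, every node computes the same winner for every slot without further communication, and the winner rotates pseudo-randomly across slots.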
2006 by Taylor & Francis Group, LLC
34-16 Embedded Systems Handbook
From the discussion above, it becomes apparent that distributing TDMA out into ad hoc networks is rather
complicated and requires major compromises on deployment scenario (SS-TDMA and Arisha's protocol),
algorithmic complexity (TRAMA), flexibility/adaptivity (EMACs and LMAC), and latency (all protocols).
Although TDMA is inherently free of collision and idle-listening overheads, PACT and BMA rely on the
higher layers to amortize the overheads of the TDMA scheduler over rotating cluster heads.
Note that the partitioned and replicated scheduling approaches are most similar to contention-based
and slotted protocols in the sense that nodes operate autonomously, making them easy to install and
operate, and robust to node failures. The algorithmic complexity of the TRAMA protocol (replication)
is beyond the scope of this chapter, so we will only detail the LMAC protocol (partitioning).
34.6.1 Lightweight Medium Access
With the LMAC protocol [35], nodes organize time into slots, grouped into fixed-length frames. A slot
consists of a traffic control section (12 bytes) and a fixed-length data section. The scheduling discipline is
extremely simple: each active node is in control of a slot. When a node wants to send a packet, it waits until
its time-slot comes around, broadcasts a message header in the control section detailing the destination
and length, and then immediately proceeds with transmitting the data. Nodes listening to the control
header turn off their radio during the data part if they are not an intended receiver of the broadcast
or unicast message. In contrast to all other MAC protocols, the receiver of a unicast message does not
acknowledge the correct reception of the data; LMAC leaves the issue of reliability to the upper layers.
The LMAC protocol ensures collision-free transmission by having nodes select a slot number that is not
in use within a two-hop neighborhood (much like frequency reuse in cellular communication networks).
To this end, the information broadcast in the control section includes a bit set detailing which slots
are occupied by the one-hop neighbors of the sending node (i.e., the slot owner). New nodes joining the
network listen for a complete frame to all traffic control sections. By OR-ing the occupancy bit sets, they
can determine which slots are still free (Figure 34.9). The new node randomly selects a slot and claims
it by transmitting control information in that slot. Collisions in slot-selection result in garbled control
sections. A node observing such a collision broadcasts the involved slot number in its control section,
which will be overheard by the unfortunate new nodes, who will then back off and repeat the selection
process.
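The slot-selection step can be sketched in a few lines; the bit-set width and function names below are illustrative, not taken from the LMAC specification:

```python
import random

def free_slots(overheard_bitsets, num_slots=32):
    # OR together the occupancy bit sets overheard during one full frame;
    # a zero bit then marks a slot unused within two hops.
    occupied = 0
    for bits in overheard_bitsets:
        occupied |= bits
    return [s for s in range(num_slots) if not occupied & (1 << s)]

def pick_slot(overheard_bitsets, num_slots=32):
    # A joining node claims a random free slot; simultaneous claims are
    # resolved by the back-off-and-retry procedure described above.
    candidates = free_slots(overheard_bitsets, num_slots)
    return random.choice(candidates) if candidates else None
```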
The drawback of LMAC's contention-based slot-selection mechanism is that nodes must always listen
to the control sections of all slots in a frame, even the unused ones, since other nodes may join the
network at arbitrary moments. The resulting idle-listening overhead is minimized by taking one sample
of the carrier in an unused slot to sense any activity (cf. preamble sampling in Section 34.4.2). If there was
activity, the slot is included in the occupancy bit set and listened to completely in the next frame. The end
result is that LMAC combines a frame-based organization with notification by listening.
FIGURE 34.9 Slot-selection by LMAC. Nodes are marked with slot number and occupancy bit set.
Energy-Efficient Medium Access Control 34-17
34.7 Comparison
In the previous sections we reviewed 20 energy-efficient MAC protocols especially developed for sensor
networks. We discussed the qualitative merits of the different organizations: contention-based, slotted,
and TDMA-based protocols. When available, we reported quantitative results published by the designers
of the protocol at hand. Unfortunately, results from different publications are difficult to compare due to
the lack of a standard benchmark, making it hard to draw any final conclusions. This section addresses
the need for a quantitative comparison by presenting the results from a study into the performance
and energy efficiency of four MAC protocols (LPL, S-MAC, T-MAC, and LMAC) on top of a common
simulation platform. For reference we also report on the classic IEEE 802.11 protocol (in ad hoc mode).
The workload used to evaluate the protocols ranges from standard micro-benchmarks (latency and
throughput tests) to communication patterns specific to sensor networks (local gossip and convergecast).
34.7.1 Simulation Framework
The discrete-event simulator developed at Delft University of Technology includes a detailed model of the
popular RFM TR1001 low-power radio (discussed in Section 34.2.1) taking turnaround and wake-up
times (12 and 518 µsec, respectively) into account. Energy consumption is based on the amount of
energy the radio uses; we do not take protocol processing costs on a CPU driving the radio into account.
The simulator records the amount of time spent in various states (standby, transmit, and receive/idle);
transitions between states are modeled as time spent in the most energy-consuming state. At the end of
a run the simulator computes the average energy consumed for each node in the network using the current
drawn by the radio in each state (Table 34.1) and an input voltage of 3 V.
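This energy accounting reduces to a small computation, which can be sketched as follows. The current values in the example are illustrative placeholders only, since Table 34.1 is not reproduced here:

```python
def average_power_mw(state_times_s, state_currents_ma, voltage_v=3.0):
    # Energy per state (mJ) = time (s) * current (mA) * voltage (V);
    # dividing the total energy by the run length yields average power (mW).
    total_time = sum(state_times_s.values())
    energy_mj = sum(state_times_s[s] * state_currents_ma[s] * voltage_v
                    for s in state_times_s)
    return energy_mj / total_time

# Illustrative currents (mA), not the actual Table 34.1 values.
currents = {"standby": 0.0007, "receive": 3.8, "transmit": 12.0}
times = {"standby": 9.0, "receive": 0.9, "transmit": 0.1}  # seconds
print(f"{average_power_mw(times, currents):.2f} mW")  # prints 1.39 mW
```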
The five MAC protocols under study are implemented as a class hierarchy on top of the physical
layer, which is a thin layer encapsulating the RFM radio model. The physical layer takes care of low-level
synchronization (preambles, start/stop bits) and proper channel coding. We now briefly discuss the
implementation details of the five MAC protocols:
802.11. The IEEE 802.11 (CSMA/CA) protocol was implemented using an 8 byte header encoding the
message type (RTS/CTS/DATA/ACK), source and destination ID (2 bytes each), sequence number, data
length, and CRC. The payload of the DATA packet can be up to 250 bytes. The sequence number serves
to detect duplicate packets; retransmissions are triggered upon detection of a missing CTS or ACK
packet.
LPL. The LPL protocol (CSMA with acknowledgments) was implemented with the DATA and ACK
packets from the 802.11 implementation. LPL was set to sample the radio with a 10% duty cycle: 30 µsec
carrier sense, 300 µsec sample period. The preamble was stretched with one sample period to 647 µsec.
Since hidden nodes make CSMA susceptible to collisions, LPL's initial contend time is set somewhat larger
than for 802.11 (9.15 versus 3.05 msec).
S-MAC. The implementation of the S-MAC protocol extends the 802.11 model with SYNC packets
(8 byte header + 2 byte timestamp) to divide time into slots of 610 msec (20,000 ticks of a 32 kHz crystal).
Like LPL, S-MAC is set to operate with a 10% duty cycle; hence, the active period is set to 61 msec. This
is different from the original implementation to account for the different radio bitrate in our simulator
and to bring the frame length in line with T-MAC. Since traffic is grouped into bursts in the active
period, S-MAC deviates from the 802.11 backoff scheme and uses a fixed contend time of 9.15 msec.
To reduce idle-listening overhead we chose to remove the synchronization section from the original
S-MAC protocol; SYNC packets are transmitted in the active period of a slot. To reduce interference with
other packets, a node transmits a SYNC packet only once every 90 sec on average. In our grid topology
with eight neighbors within radio range, that amounts to receiving a SYNC message every 11 sec.
T-MAC. The implementation of the T-MAC protocol enhances the S-MAC model with a variable-length
active period controlled by a 15 msec timeout value, which is set to span the contention period
(9.15 msec), an RTS (1.83 msec), the radio turnaround period (12 µsec), and the start of a CTS. This timeout
value causes T-MAC to operate with a 2.5% duty cycle in an empty network. In a loaded network the duty
TABLE 34.4 Implementation Details of the Simulator

PHYsical layer
  Channel coding      8-to-16 bit coding
  Effective bit rate  46 kbps
  Prelude             433 µsec (347 µsec preamble + startbyte)
  Carrier sense       30 µsec
802.11 [extends PHY]
  Control packets     8 bytes
  DATA packets        8 byte header and 0–250 byte payload
  Contend time        3.05–305 msec
LPL [extends 802.11]
  Sample period       300 µsec
  Contend time        9.15–305 msec
S-MAC [extends 802.11]
  SYNC packets        10 bytes
  Slot time           610 msec
  Active period       61 msec
  Contend time        9.15 msec
T-MAC [extends S-MAC]
  Activity timeout    15 msec
LMAC [extends PHY]
  Slot time           14.3 msec (76 bytes)
  Frame time          456 msec (32 slots)
cycle will increase as the active period is adaptively extended. All options for mitigating the early-sleeping
problem are included; see Reference 28 for details.
LMAC. The LMAC protocol was implemented from scratch on top of the physical layer. It was set to
operate with the maximum of 32 slots per frame to ensure that all nodes within a two-hop neighborhood
can own a slot for typical node densities (up to ten neighbors). The slot size was set to 76 bytes (12 byte
header + 63 byte data section + 1 byte CRC) to support a reasonable range of application-dependent
message sizes. We short-circuited LMAC's collision-based registration procedure by randomly selecting
a slot number for each node at the start of a simulation run. A node listens to the 12 byte control
sections of all slots owned by its one-hop neighbors; it polls the other slots in the frame with the short,
30 µsec carrier sense function to detect new nodes joining the network (which never happens during the
experiments).
For convenience, Table 34.4 lists the key parameters of the MAC protocols used in our comparison. Note
that the LMAC implementation includes a certain overprovisioning, since the experiments involve just
24 two-hop neighbors (<32 slots) and messages with a 25 byte payload (<63 bytes). This is the price to
be paid for LMAC's simplicity; other protocols, however, pay in terms of overhead (RTS/CTS signaling).
Another important characteristic of LMAC is that it does not try to correct any transmission errors, while
the others automatically do so through their retransmission policy for handling collisions. This difference
also shows up in the estimated memory footprint (i.e., RAM usage) and code complexity of the MAC
protocols listed in Table 34.5. All protocols except LMAC maintain information about the last sequence
number seen from each neighbor to filter out duplicates.
Our experiments use a static network with a grid topology. The radio range was set so that the nonedge
nodes all have eight neighbors. Concurrent transmissions are modeled to cause collisions if the radio
ranges (circles) of the senders intersect; nodes in the intersection receive a garbled packet with a failing
CRC check.
The application is modeled by a traffic generator at every node. The generator is parameterized to
send messages with a 25 byte payload either to direct neighbors (i.e., nodes within the radio range of the
sender), or to the sink node, which is located in the bottom-left corner of the grid. To route the latter
TABLE 34.5 Code Complexity and Memory Usage

                          802.11   LPL   S-MAC   T-MAC   LMAC^a
Code complexity (lines)      400   325     625     825     250
RAM usage (bytes)             51    49      78      80      15

^a The LMAC protocol leaves acknowledgments and retransmissions to the
higher layers, adding about 75 lines of code and 40 bytes of RAM to those
layers.
TABLE 34.6 Base Performance with an Empty Network

                           802.11    LPL   S-MAC   T-MAC   LMAC
Energy consumption (mW)      11.4   1.14    1.21    0.37   0.75
Effective duty cycle (%)      100     10      11     3.2    6.6
messages to the sink, we use a randomized shortest-path routing method; for each message, the possible
next hops are enumerated. Next hops are eligible if they have a shorter path to the final destination than
the sending node. From these next hops, a random one is chosen. Thus messages flow in the correct
direction, but do not use the same path every time. No control messages are exchanged for this routing
scheme: nodes automatically determine the next hop. By varying the message interarrival times, we can
study how the protocols perform under different loads.
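The next-hop selection can be sketched as follows, assuming (x, y) grid coordinates and the eight-neighbor (Chebyshev) hop metric implied by the topology; the function name is ours:

```python
import random

def next_hop(node, sink, neighbors):
    # With eight neighbors in radio range, the hop count to the sink is
    # the Chebyshev distance on the grid. Any neighbor strictly closer to
    # the sink is eligible; one is picked at random, so successive messages
    # flow toward the sink over varying paths.
    def hops(p):
        return max(abs(p[0] - sink[0]), abs(p[1] - sink[1]))
    eligible = [n for n in neighbors if hops(n) < hops(node)]
    return random.choice(eligible) if eligible else None
```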
34.7.2 Micro-Benchmarks
To determine the organizational overhead associated with each protocol we ran the simulator with an
empty workload. The resulting energy consumption is shown in Table 34.6. This table also shows the
effective duty cycle relative to the performance of the 802.11 protocol, which keeps all nodes listening all
the time.
The contention-based LPL protocol wastes no energy on organizing nodes, and achieves its target duty
cycle of 10%. The slotted protocols (S-MAC and T-MAC) spend some energy on sending and receiving
SYNC packets, but the impact is limited as the effective duty cycles only marginally exceed the built-in
active/sleep ratios (10 and 2.5%). Finally, note that the overhead of the TDMA-based LMAC protocol is
remarkably low (6.6%), which is largely due to the efficient carrier sense at the physical layer. If the nodes
were to listen to all traffic control sections completely, the overhead would grow to about 16% (12 control
bytes per 76 byte slot).
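The 16% figure follows directly from the slot layout:

```python
# Worst case: every 12-byte control section of every 76-byte slot is
# received in full.
control_bytes, slot_bytes = 12, 76
print(f"{control_bytes / slot_bytes:.0%}")  # prints 16%
```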
Our second experiment measured the multihop latency in an empty network, which we expect to be
significant for slotted and schedule-based protocols. The results in Figure 34.10 confirm this: S-MAC,
T-MAC, and LMAC show end-to-end latencies that are much higher than those obtained by 802.11 and
LPL. In the case of LMAC a node that wants to send or relay a packet must wait until its slot turns up.
On average, this means that packets are delayed by half the length of a frame, or 236 msec, which is an
order of magnitude more than the one-hop latency under 802.11 (13.2 msec). With T-MAC and S-MAC
the source node must wait for the next active period to show up before it can transfer the message with
an RTS/CTS/DATA/ACK sequence. This accounts for the initial offset of 263 msec. Then, in the case of
T-MAC, the second node may immediately relay the message since the third node is again awake due
to overhearing the first CTS packet. The fourth node, however, did not receive that same CTS and for
lack of activity went to sleep. Therefore, the third node's attempt to relay will fail, and it has to wait
until the start of the next slot. This accounts for T-MAC's staircase pattern in Figure 34.10. S-MAC is
less aggressive in putting nodes to sleep, and messages can travel about 3 to 4 hops during one active
period. The exact number depends on the random numbers selected from the contention interval prior
to each RTS, and may be different for each data packet and active period. These numbers get averaged
over multiple messages, which explains the erosion of the staircase pattern when traveling more hops.
FIGURE 34.10 Multihop latency in an empty network. [Line plot: latency (msec, 0–4500) versus number of hops (0–16) for 802.11, LPL, S-MAC, T-MAC, and LMAC.]
FIGURE 34.11 Throughput in a 3 × 3 grid. [Line plot: packets received/sec (0–70) versus generated packets/sec (0–100) for 802.11, LPL, S-MAC, T-MAC, and LMAC.]
Finally, observe that LPL outperforms 802.11 because it does not include an RTS/CTS handshake, but
sends the DATA immediately.
The third experiment that we carried out measured the maximum throughput that a single node can
handle (channel utilization). We selected a 3 × 3 section of the network grid, and arranged the 8 edge
nodes to repeatedly send a message (25 byte payload) to the central node. By increasing the sending
rate we were able to determine the maximum throughput each MAC protocol can handle,
and whether or not it collapses under high loads. Figure 34.11 shows the results of this stress test. LPL
performs very poorly because of the collisions generated by hidden nodes; in the 3 × 3 configuration each
sending node senses only the communications by its direct neighbors on the edge, but the other nodes
are hidden from it. The repeated retransmissions issued to resolve the collisions cause the internal queues
to overflow, and packets to be dropped. The RTS/CTS handshake eliminates most collisions and 802.11
achieves a maximum throughput of around 70 packets per second, which is about 30% of the effective
bitrate (46 kbps, or 230 packets/sec) offered by the physical layer. The signaling overhead (33 bytes MAC
control + physical layer headers + radio turnaround times) already reduces this capacity to 85 packets/sec;
the remaining loss is caused by the contention period prior to each RTS. S-MAC runs at a 10% duty cycle,
and its throughput is therefore reduced by a factor of 10. T-MAC, on the other hand, adapts its duty cycle
and is able to follow the 802.11 curve at much higher loads than the other protocols. It flattens off abruptly
(around 45 packets/sec) due to its fixed contention window (9.15 msec), which is much shorter than the
maximum length of 802.11's binary backoff period (305 msec). The throughput of LMAC is limited by
two factors: (1) only 8 out of 32 slots in each frame are used, and (2) only 25 bytes out of each 76 byte slot
are used. Consequently, LMAC's throughput is maximized at 8% of the channel capacity.
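That ceiling is simply the product of the two factors:

```python
slot_share = 8 / 32       # 8 of the 32 slots per frame carry traffic
payload_share = 25 / 76   # 25 payload bytes per 76-byte slot
print(f"{slot_share * payload_share:.1%}")  # prints 8.2%
```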
34.7.3 Homogeneous Unicast and Broadcast
The micro-benchmarks discussed in the previous section studied the behavior of the MAC protocols
in isolation. In this section, we report on experiments involving all nodes in the network. The results
in this section provide a stepping stone for understanding the performance of the complex local gossip
and convergecast patterns common to sensor network applications.
In our first network-wide experiment, we had all 100 nodes in a 10 × 10 grid repeatedly send a message
(25 byte payload) to a randomly selected neighbor. The intensity of this homogeneous load on the network
was controlled by adjusting the sending rate of the nodes. The topmost graph in Figure 34.12 shows the
delivery ratio with increasing load. It reveals that S-MAC, T-MAC, and LPL collapse at some point, while
the performance of the LMAC and 802.11 protocols degrades gracefully. When comparing the order in
which the protocols break down (S-MAC, T-MAC, LPL, LMAC, 802.11) with that of the corresponding
throughput benchmark (LPL, S-MAC, LMAC, T-MAC, 802.11) we see some striking differences. First,
LPL does much better, because nodes are throttled back by eight neighbors instead of just a few, reducing
the probability of a collision with a hidden node. Second, T-MAC does much worse, because the RTS/CTS
signaling in combination with T-MAC's power-down policy silences nodes too early. Third, the gap
between LMAC and 802.11 for high loads has shrunk considerably, which is mainly caused by 802.11 now
suffering from exposed nodes not present in the micro-benchmark.
The middle graph in Figure 34.12 plots the energy consumption of each MAC protocol when intensifying
the homogeneous load. Again we observe a few remarkable facts. First, the energy consumption of
the 802.11 protocol decreases for higher loads. This is caused by the overhearing avoidance mechanism
that shuts down the radio during communications in which a node is not directly involved. Second, the
energy consumption of T-MAC and LPL initially increases linearly, then jumps to 11 mW. The jumps
correspond with the breakdowns of the message delivery rates, showing that the most energy is spent
on retransmissions due to collisions. The difference in gradient is caused by T-MAC spending additional
energy on the RTS/CTS handshake and the early-sleeping problem. Third, the energy consumption of
LMAC and S-MAC cross at about 50 bytes/node/sec, but while LMAC still delivers more than 97% of
the messages, S-MAC's delivery rate is down to just 10%. This significant difference in price/performance
ratio is shown in the bottom graph of Figure 34.12, which plots the energy spent per data bit delivered.
These energy-efficiency curves clearly show the collapse of the (slotted) contention-based protocols.
In our second network-wide experiment we had all 100 nodes repeatedly send a broadcast message
(25 byte payload) to their neighbors. Figure 34.13 shows the delivery rates, energy consumption, and
energy-efficiency metrics. When comparing these results with those for unicast (Figure 34.12) some
interesting differences and similarities emerge. Consider the LMAC protocol first. For broadcast it achieves
the same delivery rate as for unicast, which is no surprise given that LMAC guarantees collision-free
communications. The energy consumption to handle broadcast traffic, on the other hand, is about
twice the amount needed for unicast under high loads. This is a consequence of each node processing
more incoming data; instead of one neighbor with its radio set to listen for unicast, all neighbors have to
listen for a broadcast packet. This effect also explains why the energy per received bit of information is reduced
by a factor of about six for light loads: all neighbors (6.84 on average) receive useful data at little extra
cost, and the energy is calculated per received bit.
FIGURE 34.12 Performance under homogeneous unicast traffic: delivery rate (top), energy consumption (middle), and energy efficiency (bottom). [Three line plots versus payload (bytes/node/sec, 0–120) for 802.11, LPL, S-MAC, T-MAC, and LMAC: delivery ratio (0–1), energy consumed (avg. mW/node, 0–12), and energy per bit (mJ, 0–140).]
FIGURE 34.13 Performance under homogeneous broadcast traffic: (a) delivery rate, (b) energy consumption, and (c) energy efficiency. [Three line plots versus payload (bytes/node/sec, 0–120) for 802.11, LPL, S-MAC, T-MAC, and LMAC: delivery ratio (0–1), energy consumption (avg. mW/node, 0–14), and energy per bit (mJ, 0–30).]
When considering the other protocols we find that the delivery rates degrade for light loads with
respect to the unicast experiments, but improve dramatically for high loads. In particular, we find no
breakdown points as with unicast traffic. The reason is twofold: (1) there are no retransmissions that
clog up the network, and (2) even when collisions occur, some of the neighbors still receive data. Note
that the delivery ratio should be interpreted as the fraction of neighbors receiving a broadcast message,
not as the probability that the message is received by all neighbors. The slotted protocols (S-MAC and
T-MAC) perform considerably worse than the contention-based protocols (802.11 and LPL). The reason
for this is that by grouping all traffic into a rather short active period, the probability of a collision is
increased considerably. The reason that 802.11 outperforms LPL is that the latter uses a longer preamble,
and although this increases the length of the DATA only by about 5%, the probability of a collision is
raised enough to make a difference in delivery rate.
The energy-efficiency curves show that all protocols except S-MAC spend less energy per bit when the
intensity of the broadcast traffic increases. In particular, the contention-based protocols do not suffer from
a collapse as with unicast. The reason that the energy spent per bit increases with S-MAC is threefold:
(1) it suffers from considerably more collisions in its small active period, (2) the fraction of time spent
in transmitting steadily increases, especially since no time is spent waiting during a handshake as for
unicast, and (3) overhearing avoidance is no longer applicable, forcing the radio to be on all the time
during S-MAC's active period. The latter reason also explains why 802.11's energy consumption does not
go down with increasing load as it did for unicast traffic.
34.7.4 Local Gossip
The first communication pattern specific to sensor network applications that we studied was local gossip.
We designated a 5 × 5 area in the middle of the grid as the event region in which nodes would repeatedly
send a message (25 byte payload) to a randomly selected neighbor. In essence local gossip is a mixture of
75% empty workload (Table 34.6) and 25% homogeneous workload (Figure 34.12). The delivery rates
associated with local gossip, as shown in Figure 34.14, are completely determined by the homogeneous
unicast component of the workload, and therefore resemble the curves in Figure 34.12 to a large extent.
The LMAC curve is identical; the others are shifted to the right because collisions occur less frequently
due to a relatively large number of edge nodes with inactive neighbors (16/25 versus 36/100). The energy
consumption numbers, which are averages over the whole network, are diluted by the empty workload
component (cf. Figure 34.12 and Figure 34.14). In contrast, the energy-efficiency numbers, not shown
for brevity, are raised since the energy spent by passive nodes (idle listening) is amortized over the limited
traffic in the 5 × 5 region.
34.7.5 Convergecast
In our final experiment we studied the convergecast communication pattern. All 100 nodes in the network
periodically send a message (25 byte payload) to the sink in the bottom-left corner of the grid. To maximize
the load on the MAC protocols, messages are not aggregated at intermediate nodes. Figure 34.15 shows
the delivery rates and energy efficiencies for the convergecast pattern. The shapes of these curves show
a large similarity with the homogeneous unicast pattern. Note that the generated load that can be handled
is much lower than with homogeneous unicast, since each injected message needs to travel 6.15 hops
on average. The performance results, however, do not simply scale with the path-length factor. The
breakdown points on the delivery curves for convergecast are shifted far more to the left than a factor
of six, and the order in which the protocols break down also changes significantly. In particular, the
LMAC protocol cannot handle the heavy loads around the sink since each node can only use the capacity
of one slot, as demonstrated by the throughput micro-benchmark. T-MAC and LPL handle the high
loads around the sink much better than LMAC, with LPL being slightly more efficient. Both suffer from
a collapse, however, when the load is increased, causing the energy consumed per bit to suddenly rocket
upwards. Furthermore, note that energy efficiency degrades by more than a factor of six compared with that
FIGURE 34.14 Performance under local gossip: delivery rate (top) and energy consumption (bottom). [Two line plots versus payload (bytes/node/sec, 0–120) for 802.11, LPL, S-MAC, T-MAC, and LMAC: delivery ratio (0–1) and energy consumed (avg. mW/node, 0–12).]
for unicast under comparable load. Apparently, even the adaptive T-MAC protocol finds it impossible to
select the right duty cycle for each node.
34.7.6 Discussion
When reviewing the simulation results we find that no MAC protocol outperforms the others in all
experiments. Each protocol has its strong and weak points, which reflects the particular choice of how to
trade off performance (latency, throughput) for cost (energy consumption). Some general observations,
however, can be made:
Communication grouping considered harmful. The slotted protocols (S-MAC and T-MAC) organize
nodes to communicate during small periods of activity. The advantage is that very low duty cycles can
be obtained, but at the expense of high latency and a collapse under high loads. T-MAC's automatic
adaptation of the duty cycle allows it to handle higher loads; S-MAC's fixed duty cycle bounds the energy
consumption under a collapse.
2006 by Taylor & Francis Group, LLC
34-26 Embedded Systems Handbook
FIGURE 34.15 Performance under convergecast: delivery rate (top) and energy efficiency (bottom). [Two line plots versus payload (bytes/node/sec, 0–20) for 802.11, LPL, S-MAC, T-MAC, and LMAC: delivery ratio (0–1) and energy per bit (mJ, 0–900).]
The TDMA-based LMAC protocol also limits the moments at which nodes may communicate and
therefore incurs high latencies in general, and reduced throughput under high load. In contrast to T-MAC,
its energy consumption does not deteriorate; LMAC is rather robust and performance degrades gracefully
under higher loads.
The LPL protocol is most flexible since it puts only minor restrictions on when nodes can communicate
(i.e., once every 300 µsec). Its sampling approach, however, critically depends on the radio's ability to
switch on quickly. This is the case for the RFM radio at hand, but preliminary experiments with the
Chipcon radio show that LPL's advantage weakens when operating with a corresponding 2 out of 20 msec
duty cycle.
Collision avoidance considered prohibitive. On the one hand, the RTS/CTS handshake prevents collisions
due to hidden nodes, which is good. On the other hand, the RTS/CTS handshake reduces the effective
channel capacity since a communication takes more time (11.68 versus 8.31 msec), which lowers the
packet transfer rate at which the network collapses. Given that typical messages in sensor
networks are small, the overheads associated with collision avoidance prove to be prohibitive, especially
in combination with communication grouping.
Adaptivity considered essential. The results for the local gossip and convergecast communication patterns
show that MAC protocols must be able to adapt to local traffic demands. Static protocols either consume
too much energy under low loads (e.g., S-MAC), or throttle throughput too much under high loads
(e.g., LMAC). The current generation of adaptive protocols (e.g., T-MAC and LPL), however, is not the
final answer since they suffer from contention collapse, forcing applications to be aware of that and take
precautions.
34.8 Conclusions
Medium access protocols for wireless sensor networks trade off performance (latency, throughput, and
fairness) for cost (energy consumption). They do so by turning off the radio for significant amounts
of time, reducing the energy wasted by idle listening, which dominates the cost of typical WLAN-based
MAC protocols. Other sources of overhead include collisions, overhearing, protocol overhead, and traffic
fluctuations. Different protocols take different approaches to reduce (some of) these overheads. They can
be classified according to three important design decisions: (1) the number of channels used (single, double,
or multiple), (2) the way in which nodes are organized (random, slotted, frames), and (3) the notification
method used (listening, wake-up, schedule). Given that the current generation of sensor nodes is equipped
with one radio, most protocols use a single channel. The organizational choice, however, is not so easily
decided on, since it reflects the fundamental trade-off between flexibility and energy efficiency.
Contention-based protocols like CSMA are extremely flexible regarding the time, location, and amount
of data transferred by individual nodes. This gives them the advantage of handling the traffic fluctuations
present in typical monitoring applications running on wireless sensor networks. Contention-based protocols
can be made energy efficient by implementing a duty cycle at the physical level, provided that the radio
can be switched on and off rapidly. The idea is to stretch the preamble, which allows potential receivers to
sample the carrier at a low rate.
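The preamble-stretching arithmetic can be sketched directly: the preamble must span at least one full carrier-sampling interval so that every receiver's periodic sample lands inside it, and the receiver's cost is simply the sampled fraction of time. A minimal sketch (the function names are illustrative; the example reuses the 2 out of 20 msec duty cycle mentioned above, not figures from a real radio driver):

```python
def min_preamble_s(check_interval_s: float) -> float:
    """Shortest preamble that guarantees every receiver, sampling the
    carrier once per check_interval_s, overlaps it at least once."""
    return check_interval_s

def rx_duty_cycle(sample_time_s: float, check_interval_s: float) -> float:
    """Fraction of time a receiver keeps its radio on just to sample."""
    return sample_time_s / check_interval_s

# 2 msec of carrier sampling out of every 20 msec, as for the Chipcon case:
dc = rx_duty_cycle(2e-3, 20e-3)      # a 10% receive duty cycle
preamble = min_preamble_s(20e-3)     # the sender must stretch its preamble to 20 msec
```

The trade-off is visible in the two numbers: a longer check interval lowers the receiver's duty cycle but forces every sender to transmit a proportionally longer preamble.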
Slotted protocols organize nodes to synchronize on a common slot structure. They reduce idle listening
by implementing a duty cycle within each slot. This duty cycle need not be fixed, and can be adapted
automatically to match demands.
TDMA-based protocols have the advantage of being inherently free of idle listening, since nodes are
informed up front, by means of a schedule, when to expect incoming traffic. To control the overheads
associated with computing the schedule and its distribution through the network, TDMA-based protocols
must either limit the deployment scenario (e.g., single hop) or hard-code some parameters (e.g., maximum
number of two-hop neighbors), compromising on flexibility.
A head-to-head comparison of sample protocols from each class revealed that there is no single, best
MAC protocol that outperforms all others. What did become apparent, however, is that adaptivity is
mandatory to handle the generic local gossip and convergecast communication patterns displaying traffic
fluctuations both in time and space. Considering the speed at which protocols have been developed so far,
we expect a number of new protocols to appear that will strike yet another balance between flexibility and
energy efficiency. Other future developments may include cross-layer optimizations with routing and data
aggregation protocols, and an increased level of robustness to handle practical issues, such as asymmetric
links and node failures.
Acknowledgments
We thank Tijs van Dam for his initial efforts in designing the T-MAC protocol and putting the issue of
energy-efficient MAC protocols on the Delft research agenda. We thank Ivaylo Haratcherev, Tom Parker,
and Niels Reijers for proofreading this chapter, correcting numerous mistakes, filtering out jargon, and
rearranging material, all of which greatly enhanced the readability of the text.
References
[1] R. Jurdak, C. Lopes, and P. Baldi. A survey, classification and comparative analysis of medium access control protocols for ad hoc networks. IEEE Communications Surveys and Tutorials, 6, 2–16, 2004.
[2] N. Abramson. The ALOHA system: another alternative for computer communications. In Proceedings of the Fall Joint Computer Conference, Vol. 37. Montvale, NJ, 1970, pp. 281–285.
[3] L. Roberts. ALOHA packet system with and without slots and capture. ACM SIGCOMM Computer Communications Review, 5, 28–42, 1975.
[4] L. Kleinrock and F. Tobagi. Packet switching in radio channels: part I. Carrier sense multiple-access modes and their throughput-delay characteristics. IEEE Transactions on Communications, 23, 1400–1416, 1975.
[5] P. Karn. MACA: a new channel access method for packet radio. In Proceedings of the 9th ARRL Computer Networking Conference, September 1990, pp. 134–140.
[6] V. Bharghavan, A. Demers, S. Shenker, and L. Zhang. MACAW: a media access protocol for wireless LANs. In Proceedings of the Conference on Communications Architectures, Protocols and Applications. London, August 1994, pp. 212–225.
[7] IEEE standard 802.11. Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, 1999.
[8] R. Krashinsky and H. Balakrishnan. Minimizing energy for wireless web access with bounded slowdown. In Proceedings of the 8th ACM International Conference on Mobile Computing and Networking (MobiCom'02). Atlanta, GA, September 2002, pp. 119–130.
[9] A. Gummalla and J. Limb. Wireless medium access control protocols. IEEE Communications Surveys and Tutorials, 3, 2–15, 2000.
[10] F. Liu, K. Xing, X. Cheng, and S. Rotenstrech. Energy-efficient MAC layer protocols in ad hoc networks. In Resource Management in Wireless Networking, M. Cardei, I. Cardei, and D.-Z. Du, Eds. Kluwer Academic Publishers, Dordrecht, 2004.
[11] RF Monolithics. TR1001 868.35 MHz Hybrid Transceiver.
[12] Chipcon Corporation. CC1000 Low Power FSK Transceiver.
[13] L. Feeney and M. Nilsson. Investigating the energy consumption of a wireless network interface in an ad hoc networking environment. In Proceedings of the IEEE INFOCOM, IEEE, Anchorage, Alaska, 2001, pp. 1548–1557.
[14] R. Szewczyk, J. Polastre, A. Mainwaring, and D. Culler. Lessons from a sensor network expedition. In Proceedings of the 1st European Workshop on Wireless Sensor Networks (EWSN '04). Berlin, Germany, January 2004.
[15] T. He, S. Krishnamurthy, J. Stankovic, T. Abdelzaher, L. Luo, R. Stoleru, T. Yan, L. Gu, J. Hui, and B. Krogh. An energy-efficient surveillance system using wireless sensor networks. In Proceedings of the 2nd International Conference on Mobile Systems, Applications, and Services (MobiSys'04). Boston, MA, June 2004, pp. 270–283.
[16] S. Kulkarni and M. Arumugam. TDMA service for sensor networks. In Proceedings of the 24th International Conference on Distributed Computing Systems (ICDCS'04), ADSN Workshop. Tokyo, Japan, March 2004, pp. 604–609.
[17] K. Sohrabi, J. Gao, V. Ailawadhi, and G. Pottie. Protocols for self-organization of a wireless sensor network. IEEE Personal Communications, 7, 16–27, 2000.
[18] G. Pei and C. Chien. Low power TDMA in large wireless sensor networks. In Proceedings of the Military Communications Conference (MILCOM 2001), Vol. 1. Vienna, VA, October 2001, pp. 347–351.
[19] C. Guo, L. Zhong, and J. Rabaey. Low power distributed MAC for ad hoc sensor networks. In Proceedings of the IEEE GlobeCom. San Antonio, TX, November 2001.
[20] C. Schurgers, V. Tsiatsis, S. Ganeriwal, and M. Srivastava. Optimizing sensor networks in the energy-latency-density design space. IEEE Transactions on Mobile Computing, 1, 70–80, 2002.
[21] A. El-Hoiydi. ALOHA with preamble sampling for sporadic traffic in ad hoc wireless sensor networks. In Proceedings of the IEEE International Conference on Communications (ICC). New York, April 2002.
[22] K. Arisha, M. Youssef, and M. Younis. Energy-aware TDMA-based MAC for sensor networks. In Proceedings of the IEEE Workshop on Integrated Management of Power Aware Communications, Computing and NeTworking (IMPACCT 2002). New York City, NY, May 2002.
[23] W. Ye, J. Heidemann, and D. Estrin. An energy-efficient MAC protocol for wireless sensor networks. In Proceedings of the 21st Conference of the IEEE Computer and Communications Societies (INFOCOM), Vol. 3. June 2002, pp. 1567–1576.
[24] E.-S. Jung and N. Vaidya. A power control MAC protocol for ad hoc networks. In Proceedings of the 8th ACM International Conference on Mobile Computing and Networking (MobiCom'02). Atlanta, GA, September 2002, pp. 36–47.
[25] J. Hill and D. Culler. Mica: a wireless platform for deeply embedded networks. IEEE Micro, 22, 12–24, 2002.
[26] K. Jamieson, H. Balakrishnan, and Y. Tay. Sift: a MAC protocol for event-driven wireless sensor networks. Technical report LCS-TR-894, MIT, May 2003.
[27] L. van Hoesel, T. Nieberg, H. Kip, and P. Havinga. Advantages of a TDMA based, energy-efficient, self-organizing MAC protocol for WSNs. In Proceedings of the IEEE VTC 2004 Spring. Milan, Italy, May 2004.
[28] T. van Dam and K. Langendoen. An adaptive energy-efficient MAC protocol for wireless sensor networks. In Proceedings of the 1st ACM Conference on Embedded Networked Sensor Systems (SenSys 2003). Los Angeles, CA, November 2003, pp. 171–180.
[29] V. Rajendran, K. Obraczka, and J. Garcia-Luna-Aceves. Energy-efficient, collision-free medium access control for wireless sensor networks. In Proceedings of the 1st ACM Conference on Embedded Networked Sensor Systems (SenSys 2003). Los Angeles, CA, November 2003, pp. 181–192.
[30] A. El-Hoiydi, J.-D. Decotignie, C. Enz, and E. Le Roux. Poster abstract: WiseMAC, an ultra low power MAC protocol for the WiseNET wireless sensor network. In Proceedings of the 1st ACM Conference on Embedded Networked Sensor Systems (SenSys 2003). Los Angeles, CA, November 2003.
[31] J. Polastre and D. Culler. B-MAC: an adaptive CSMA layer for low-power operation. Technical report cs294-f03/bmac, UC Berkeley, December 2003.
[32] J. Li and G. Lazarou. A bit-map-assisted energy-efficient MAC scheme for wireless sensor networks. In Proceedings of the 3rd International Symposium on Information Processing in Sensor Networks (IPSN'04). Berkeley, CA, April 2004, pp. 55–60.
[33] M. Miller and N. Vaidya. Minimizing energy consumption in sensor networks using a wakeup radio. In Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC'04). Atlanta, GA, March 2004.
[34] G. Lu, B. Krishnamachari, and C. Raghavendra. An adaptive energy-efficient and low-latency MAC for data gathering in sensor networks. In Proceedings of the International Workshop on Algorithms for Wireless, Mobile, Ad Hoc and Sensor Networks (WMAN). Santa Fe, NM, April 2004.
[35] L. van Hoesel and P. Havinga. A lightweight medium access protocol (LMAC) for wireless sensor networks. In Proceedings of the 1st International Workshop on Networked Sensing Systems (INSS 2004). Tokyo, Japan, June 2004.
[36] W. Ye, J. Heidemann, and D. Estrin. Medium access control with coordinated, adaptive sleeping for wireless sensor networks. Technical report ISI-TR-567, USC/Information Sciences Institute, January 2003 (accepted for publication in IEEE/ACM Transactions on Networking).
[37] W. Heinzelman, A. Chandrakasan, and H. Balakrishnan. Energy-efficient communication protocol for wireless microsensor networks. In Proceedings of the 33rd Hawaii International Conference on System Sciences. January 2000.
35
Overview of Time Synchronization Issues in Sensor Networks
Weilian Su
Naval Postgraduate School
35.1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35-1
35.2 Design Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35-2
35.3 Factors Influencing Time Synchronization . . . . . . . . . . . . 35-3
35.4 Basics of Time Synchronization . . . . . . . . . . . . . . . . . . . . . . . . 35-3
35.5 Time Synchronization Protocols for
Sensor Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35-6
35.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35-9
Acknowledgment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35-9
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35-9
35.1 Introduction
In the near future, small intelligent devices will be deployed in homes, plantations, oceans, rivers, streets,
and highways to monitor the environment [1]. Events, such as target tracking, speed estimating, and
ocean current monitoring, require the knowledge of time between sensor nodes that detect the events.
In addition, sensor nodes may have to time-stamp data packets for security reasons. With a common view
of time, voice and video data from different sensor nodes can be fused and displayed in a meaningful
way at the sink. Also, medium access schemes such as Time Division Multiple Access (TDMA) require the
nodes to be synchronized, so the nodes can be turned off to save energy.
The purpose of any time synchronization technique is to maintain a similar time within a certain
tolerance throughout the lifetime of the network, or among a specific set of nodes in the network.
Combined with the criteria that sensor nodes have to be energy efficient, low-cost, and small in a
multihop environment, this requirement makes time synchronization a challenging problem to solve. In addition, the sensor
nodes may be left unattended for a long period of time, for example, in deep space or on an ocean floor.
When messages are exchanged using short-distance multihop broadcast, the software and medium access
time and the variation of the access time may contribute the most to time fluctuations and differences in
the path delays. Also, the time difference between sensor nodes may become significant over time due to the
drifting effect of the local clocks.
In this chapter, the background of time synchronization is provided to enable new developments or
enhancements of timing techniques for sensor networks. The design challenges and factors influencing
time synchronization are described in Sections 35.2 and 35.3, respectively. In addition, the basics of time
synchronization for sensor networks are explained in Section 35.4. Afterwards, different types of timing
techniques are discussed in Section 35.5. Last, the chapter is concluded in Section 35.6.
35.2 Design Challenges
In the future, many low-end sensor nodes will be deployed to minimize the cost of sensor networks.
These nodes may work collaboratively to provide time synchronization for the whole sensor
network. The precision of the synchronized clocks depends on the needs of the applications. For example,
a sensor network requiring TDMA service may require microsecond-level differences among neighbor
nodes, while a data-gathering application for sensor networks requires only milliseconds of precision.
As sensor networks are application driven, the design challenges of a time synchronization protocol
are also dictated by the application. These challenges provide an overall guideline and requirement
when considering the features of a time synchronization protocol for sensor networks; they are robustness,
energy awareness, server-less operation, light weight, and tunable service:
Robust. Sensor nodes may fail, and the failures should not have a significant effect on the time synchronization
error. If sensor nodes depend on a specific master to synchronize their clocks, a failure or anomaly
of the master's clock may create a cascade effect in which nodes in the network become unsynchronized.
So, a time synchronization protocol has to handle the unexpected or periodic failures of the sensor nodes.
If failures do occur, the errors caused by these failures should not be propagated throughout the network.
Energy aware. Since each node has limited battery power, the use of resources should be evenly spread and
controlled. A time synchronization protocol should use the minimum number of messages to synchronize
the nodes in the shortest time. In addition, the load for time synchronization should be shared, so some
nodes in the network do not fail earlier than others. If some parts of the network fail earlier than others,
the partitioned networks may drift apart from each other and become unsynchronized.
Server-less. A precise time server may not be available. In addition, time servers may fail when placed
in the sensor field. As a result, sensor nodes should be able to synchronize to a common time without
precise time servers. When precise time servers are available, the quality of the synchronized clocks
as well as the time to synchronize the clocks of the network should be much better. This server-less feature
also helps to address the robustness challenge stated earlier.
Light-weight. The complexity of the time synchronization protocol has to be low in order to be
programmed into the sensor nodes. Besides being energy limited, the sensor nodes are memory limited
as well. The synchronization protocol may be programmed into a field-programmable gate array (FPGA)
or designed into an ASIC. By having the time synchronization protocol tightly integrated with the
hardware, the delay and variation of the processing may be smaller. With the increase of precision,
however, the cost of a sensor node is higher.
Tunable service. Some services, such as medium access, may require time synchronization to be always
on, while others only need it when there is an event. Since time synchronization can consume a lot of
energy, a tunable time synchronization service is applicable for some applications. Nevertheless, there are
needs for both types of synchronization protocols.
The above challenges provide a guideline for developing the various types of time synchronization protocols
that are applicable to sensor networks. A time synchronization protocol may have a mixture of
these design features. In addition, some applications in the sensor networks may not require the time
synchronization protocol to meet all these requirements. For example, a data-gathering application may
require the tunable service and light-weight features more than the server-less capability. The tunable
service and light-weight features allow the application to gather precise data when the users require it.
In addition, the nodes that are not part of this data-gathering process may not have to be synchronized.
Also, the precision of the time does not need to be high, because the users may only need millisecond
precision to satisfy their needs.
As these design challenges are important for guiding the development of a time synchronization
protocol, the influencing factors that affect the quality of the synchronized clocks have to be discussed.
Although the influencing factors are similar to those in existing distributed computer systems, they are at different
extreme levels. These influencing factors are discussed in Section 35.3.
35.3 Factors Influencing Time Synchronization
Regardless of the design challenges that a time synchronization protocol wants to address, the protocol
still needs to address the inherent problems of time synchronization. In addition, small and low-end
sensor nodes may exhibit device behaviors that may be much worse than those of large systems such as personal
computers (PCs). As a result, time synchronization with these nodes presents a different set of problems.
Some of the factors influencing time synchronization in large systems also apply to sensor networks [2];
they are temperature, phase noise, frequency noise, asymmetric delays, and clock glitches:
Temperature. Since sensor nodes are deployed in various places, the temperature variation throughout
the day may cause the clock to speed up or slow down. For a typical PC, the clock drifts a few parts per
million (ppm) during the day [3]. For low-end sensor nodes, the drifting may be even worse.
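To see what a few ppm means in practice, the worst-case error accumulated over a day is a one-line computation. A minimal sketch (the 50 ppm figure below is an illustrative value for a cheap crystal, not a number from the text):

```python
def drift_per_day_s(drift_ppm: float) -> float:
    """Worst-case clock error accumulated over 24 hours at a given drift rate."""
    return drift_ppm * 1e-6 * 24 * 3600

# A hypothetical 50 ppm crystal may gain or lose on the order of
# 4.3 seconds per day if never corrected.
error = drift_per_day_s(50)
```

This is why unsynchronized low-end nodes drift apart quickly, and why even a data-gathering application that tolerates millisecond precision must resynchronize periodically.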
Phase noise. Some of the causes of phase noise are access fluctuation at the hardware interface,
response variation of the operating system to interrupts, and jitter in the network delay. The jitter in the
network delay may be due to medium access and queueing delays.
Frequency noise. The frequency noise is due to the instability of the clock crystal [4]. A low-end
crystal may experience large frequency fluctuation, because the frequency spectrum of the crystal has
large sidebands on adjacent frequencies. The drift rate values for quartz oscillators are between 10^-4
and 10^-6 [5].
Asymmetric delay. Since sensor nodes communicate with each other through the wireless medium,
the delay of the path from one node to another may be different from that of the return path. As a result, an
asymmetric delay may cause an offset to the clock that cannot be detected by a variance-type method [2].
If the asymmetric delay is static, the time offset between any two nodes is also static. The asymmetric delay
is bounded by one-half of the round-trip time between the two nodes [2].
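The half-round-trip bound is easy to state in code. A small sketch (the delay values are hypothetical):

```python
def asymmetry_offset_s(forward_delay_s: float, return_delay_s: float) -> float:
    """Clock-offset error introduced when a protocol assumes symmetric
    delays but the forward and return paths actually differ."""
    return (forward_delay_s - return_delay_s) / 2.0

def offset_bound_s(round_trip_s: float) -> float:
    """The asymmetry-induced offset is bounded by half the round-trip time [2]."""
    return round_trip_s / 2.0

fwd, ret = 0.007, 0.003                  # 7 msec out, 3 msec back (hypothetical)
err = asymmetry_offset_s(fwd, ret)       # a 2 msec offset, invisible to the protocol
assert abs(err) <= offset_bound_s(fwd + ret)
```

Because the protocol only observes the sum of the two one-way delays, any split between them that preserves that sum is indistinguishable, which is exactly why the offset error cannot exceed half the round trip.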
Clock glitches. Clock glitches are sudden jumps in time. They may be caused by hardware or software
anomalies such as frequency and time steps.
Since sensor nodes are randomly deployed and their broadcast ranges are small, the influencing factors
may shape the design of the time synchronization protocol. In addition, the links between the sensor
nodes may not be reliable. As a result, the influencing factors may have to be addressed differently. In the
following section, the basics of time synchronization for sensor networks are discussed.
35.4 Basics of Time Synchronization
As the factors described in Section 35.3 influence the error budget of the synchronized clocks, the purpose
of a time synchronization protocol is to minimize the effects of these factors. Before developing a solution
to address these factors, some basics of time synchronization for sensor networks need to be discussed.
These basics provide the fundamentals for designing a time synchronization protocol.
If a better clock crystal is used, the drift rate may be much smaller. Usually, the hardware clock time
H(t) at real-time t is within a linear envelope of the real-time, as illustrated in Figure 35.1. Since the clock
drifts away from real-time, the time difference between two events measured with the same hardware clock
may have a maximum error of ρ(b − a) [5], where a and b are the times of occurrence of the first and second
events, respectively. For modern computers, the clock granularity may be negligible, but it may contribute
a significant portion to the error budget if the clock of a sensor node is really coarse, running at kHz range
instead of MHz. In certain applications, a sensor node may have a volume on the order of cubic centimeters [6], so a fast oscillator
may not be possible or suitable for such a size.

FIGURE 35.1 Drifting of hardware clock time. [Figure: hardware clock time H(t) plotted against real-time t; H(t) stays within the linear envelope between lines of slope 1 − ρ and 1 + ρ around the ideal time line of slope 1.]
Regardless of the clock granularity, the hardware clock time H(t) is usually translated into a virtual
clock time by adding an adjustment constant to it. Normally, it is the virtual clock time that we read from
a computer. Hence, a time synchronization protocol may adjust the virtual clock time or discipline the
hardware clock to compensate for the time difference between the clocks of the nodes. Either approach
has to deal with the factors influencing time synchronization as described earlier.
When an application issues a request to obtain the time, the time is returned after a certain delay. This
software access delay may fluctuate according to the loading of the system. This type of fluctuation is
nondeterministic and may be lessened if a real-time operating system and hardware architecture are used.
For low-end sensor nodes, the software access time may be on the order of a few hundred microseconds.
For example, a Mica mote running at 4 MHz [7] has a clock granularity of 0.25 μsec. If the node
is 80% loaded and it takes 100 cycles to obtain the time, the software access time is around 125 μsec.
In addition to the software access time, the medium access time also contributes to the nondeterministic
delay that a message experiences. If carrier-sense multiple access (CSMA) is used, the back-off window
size as well as the traffic load affect the medium access time [8–10]. Once the sensor node obtains the
channel, the transmission and propagation times are fairly deterministic, and they can be estimated from
the packet size, transmission rate, and speed of light.
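The 125 μsec figure for the Mica example follows from scaling the 100 cycles by the fraction of CPU time actually available. A sketch of that arithmetic (the function is purely illustrative, not part of any mote API):

```python
def software_access_time_s(cpu_hz: float, cycles: int, load: float) -> float:
    """Time to execute `cycles` instructions when the CPU is already
    `load` busy, so only (1 - load) of its throughput is available."""
    return cycles / (cpu_hz * (1.0 - load))

# Mica mote: 4 MHz clock, 80% loaded, 100 cycles to read the time.
t = software_access_time_s(4e6, 100, 0.80)   # about 125e-6 seconds
```

At 4 MHz a cycle takes 0.25 μsec, but with only 20% of the CPU available the 100 cycles effectively stretch fivefold, giving the 125 μsec in the text.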
In summary, the delays experienced when sending a message at real-time t1 and receiving an acknowledgment
(ACK) at real-time t4 are shown in Figure 35.2. The message from node A incurs the software
access, medium access, transmission, and propagation times. These times are represented by S1, M1, T1,
and P12. Once the message is received by node B at t2, it incurs extra delays through receiving and
processing. After the message is processed, an ACK is sent to node A at t3.

FIGURE 35.2 Round-trip time. [Figure: node A sends a message at t1 (delays S1, M1, T1, P12); node B receives it at t2 (delays R2, S2) and sends an ACK at t3 (delays S3, M3, T3, P34); node A receives the ACK at t4 (delays R4, S4). S = software access time, M = medium access time, T = transmission time, P = propagation time, R = reception time.]

The total delay at node B is the
summation of R2, S2, (1 − ρB)(t3 − t2), S3, M3, and T3, where ρB is the drift rate at node B and the difference
(t3 − t2) accounts for the waiting time before an ACK is sent to node A by node B. After node B sends
the ACK, the ACK propagates through the wireless medium and arrives at node A. Afterwards, node A
processes the ACK. The path delays for sending and receiving the ACK from node B to A are P34, R4,
and S4. The round-trip time in real-time t for sending a message and receiving an ACK is calculated by

t4 − t1 = S1 + M1 + T1 + P12 + R2 + S2 + (1 − ρB)(t3 − t2) + S3 + M3 + T3 + P34 + R4 + S4    (35.1)

where S, M, T, P, and R are the software access, medium access, transmission, propagation, and reception
times, respectively.
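Equation (35.1) is a straight sum of per-stage delays, which makes it easy to transcribe. A sketch with hypothetical values (every delay below is made up for illustration; only the structure of the sum comes from the text):

```python
def round_trip_s(S1, M1, T1, P12, R2, S2, rho_B, wait_s,
                 S3, M3, T3, P34, R4, S4):
    """Round-trip time t4 - t1 of Equation (35.1): the forward-path
    delays, node B's drift-adjusted waiting time wait_s = t3 - t2,
    and the return-path delays."""
    return (S1 + M1 + T1 + P12 + R2 + S2
            + (1.0 - rho_B) * wait_s
            + S3 + M3 + T3 + P34 + R4 + S4)

# Hypothetical delays in seconds; 50 ppm drift rate at node B,
# 3 msec processing wait at node B before the ACK.
rtt = round_trip_s(125e-6, 2e-3, 1e-3, 1e-6, 1e-3, 125e-6,
                   50e-6, 3e-3,
                   125e-6, 2e-3, 1e-3, 1e-6, 1e-3, 125e-6)
```

Note how small the drift correction is relative to the medium-access terms: for a 3 msec wait, 50 ppm changes the sum by only 0.15 μsec, while the nondeterministic S and M terms dominate the variation.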
If the round-trip time is measured using the hardware clock of node A, it has to be adjusted by the
drift rate ρA of node A. If the granularity ε of the hardware clock is coarse, the error contributed by the
granularity should be accounted for. As a result, the round-trip time measured with the hardware clock is
bounded by an error associated with the clock drift and granularity as determined by

(1 − ρA)(t4 − t1) − ε ≤ H(t4) − H(t1) < (1 + ρA)(t4 − t1) + ε    (35.2)
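Equation (35.2) can be sanity-checked numerically. A sketch using a hypothetical 10 msec round trip, a 50 ppm drift rate, and the 0.25 μsec granularity of the Mica example (all three are illustrative inputs, not values prescribed by the text):

```python
def hw_rtt_bounds_s(true_rtt_s: float, rho_A: float, granularity_s: float):
    """Bounds of Equation (35.2) on the round trip as read from node A's
    hardware clock, i.e., on H(t4) - H(t1)."""
    lo = (1.0 - rho_A) * true_rtt_s - granularity_s
    hi = (1.0 + rho_A) * true_rtt_s + granularity_s
    return lo, hi

lo, hi = hw_rtt_bounds_s(10e-3, 50e-6, 0.25e-6)
assert lo <= 10e-3 < hi   # the true round trip lies within the measured bounds
```

For these numbers the window is under 2 μsec wide, which shows that for short hop-by-hop exchanges the drift and granularity terms are small compared with the fluctuating access delays discussed next.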
The bound for the round-trip time fluctuates with respect to time, since the software and medium access
times fluctuate according to the load at the node and in the channel. Although the transmission, propagation,
and reception times may be deterministic, they may contribute to the asymmetric delay that can cause a
time offset between nodes A and B.
In the following section, different types of time synchronization protocols are described. Each of them
tries to minimize the effect of the nondeterministic and asymmetric delays. For sensor networks, it is best
to minimize the propagation delay variation. For example, the delays and jitters between two nodes may be
different in the forward and return paths. In addition, the jitters may vary significantly due to frequent node
failures, since the messages are relayed hop-by-hop between the two nodes. The synchronization protocols
in the following section focus on synchronizing nodes hop-by-hop, so the propagation time and variation
do not have too much effect on the error of the synchronized clocks. Although the sensor nodes are densely
deployed and can take advantage of the close distance, the medium and software access times may
contribute the most to the nondeterminism of the path delay during a one-hop synchronization. The
way to provide time synchronization for sensor networks may differ between applications. The
current timing techniques that are available for different applications are described in the following section.
35.5 Time Synchronization Protocols for Sensor Networks
There are three types of timing techniques, as shown in Table 35.1, and each of these types has to
address the design challenges and factors affecting time synchronization mentioned in Sections 35.2
and 35.3, respectively. In addition, the timing techniques have to address the mapping between the
sensor network time and the Internet time, for example, coordinated universal time (UTC). In the following
paragraphs, examples of these types of timing techniques are described, namely the Network Time
Protocol (NTP) [11], the Timing-sync Protocol for Sensor Networks (TPSN) [12], Reference-Broadcast
Synchronization (RBS) [13], and the Time-Diffusion Synchronization Protocol (TDP) [14].
In the Internet, the NTP is used to discipline the frequency of each node's oscillator. The accuracy of
NTP synchronization is on the order of milliseconds [15]. It may be useful to use NTP to discipline the
oscillators of the sensor nodes, but the connection to the time servers may not be possible because of
frequent sensor node failures. In addition, disciplining all the sensor nodes in the sensor field may be
a problem due to interference from the environment and large variation of delay between different parts
of the sensor field. The interference can temporarily partition the sensor field into multiple smaller fields,
causing undisciplined clocks among these smaller fields. The NTP protocol may be considered as type (1)
of the timing techniques. In addition, it has to be refined in order to address the design challenges presented
by the sensor networks.
As of now, the NTP is very computationally intensive and requires a precise time server to synchronize the
nodes in the network. In addition, it does not take into account the energy consumption required for time
synchronization. As a result, the NTP does not satisfy the energy-aware, server-less, and light-weight design
challenges of the sensor networks. Although the NTP can be robust, it may suffer large propagation delays
when sending timing messages to the time servers. In addition, the nodes are synchronized in a hierarchical
manner, and some time servers in the middle of the hierarchy may fail, causing unsynchronized nodes in
the network. Once these nodes fail, it is hard to reconfigure the network since the hierarchy is manually
configured.
Another time synchronization technique that adopts some concepts from NTP is TPSN. The TPSN
requires the root node to synchronize all or part of the nodes in the sensor field. The root node synchronizes
the nodes in a hierarchical way. Before synchronization, the root node constructs the hierarchy by
broadcasting a level_discovery packet. The first level of the hierarchy is level 0, which is where the root
node resides. The nodes receiving the level_discovery packet from the root node are the nodes belonging
to level 1. Afterwards, the nodes in level 1 broadcast their level_discovery packet, and neighbor nodes
receiving the level_discovery packet for the first time are the level 2 nodes. This process continues until all
the nodes in the sensor field have a level number.

TABLE 35.1 Three Types of Timing Techniques
(1) Relies on fixed time servers to synchronize the network: The nodes are synchronized to time servers that are readily available. These time servers are expected to be robust and highly precise.
(2) Translates time throughout the network: The time is translated hop-by-hop from the source to the sink. In essence, it is a time translation service.
(3) Self-organizes to synchronize the network: The protocol does not depend on specialized time servers. It automatically organizes and determines the master nodes as the temporary time servers.

FIGURE 35.3 Two-way message handshake. [Figure: node A sends a synchronization pulse at g1; node B receives it at g2 and returns an acknowledgment at g3, which node A receives at g4.]
The root node sends a time_sync packet to initialize the time synchronization process. Afterwards, the
nodes in level 1 synchronize to level 0 by performing the two-way handshake as shown in Figure 35.3. This
type of handshake is used by NTP to synchronize the clocks of distributed computer systems. At the
end of the handshake at time g4, node A obtains the times g1, g2, and g3 from the acknowledgment packet.
The times g2 and g3 are obtained from the clock of sensor node B, while g1 and g4 are from node A.
After processing the acknowledgment packet, node A readjusts its clock by the clock drift value Δ, where
Δ = ((g2 − g1) − (g4 − g3))/2. At the same time, the level 2 nodes overhear this message handshake and
wait for a random time before synchronizing with level 1 nodes. This synchronization process continues
until all the nodes in the network are synchronized. Since TPSN enables time synchronization from one
root node, it is type (1) of the timing techniques.
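The drift calculation from the handshake timestamps can be sketched as follows (a minimal illustration; the function name and the numbers in the example are ours, not from the source):

```python
def tpsn_clock_drift(g1, g2, g3, g4):
    """Clock drift of node A relative to node B, per the TPSN two-way
    handshake: A sends at g1 (A's clock), B receives at g2 and replies
    at g3 (B's clock), and A receives the reply at g4 (A's clock)."""
    return ((g2 - g1) - (g4 - g3)) / 2.0

# Example: B's clock is 5 time units ahead of A's, one-way delay is 2.
# A sends at g1=10; B receives at g2=17 and replies at g3=17 (B's clock);
# A receives at g4=14 (A's clock).
drift = tpsn_clock_drift(10, 17, 17, 14)  # -> 5.0
```

Node A then adds the drift to its clock; the symmetric-delay assumption cancels the propagation time out of the estimate.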
TPSN is based on a sender-receiver synchronization model, where the receiver synchronizes with
the time of the sender according to the two-way message handshake shown in Figure 35.3. It aims
to provide a light-weight and tunable time synchronization service. On the other hand, it requires a time
server and does not address the robust and energy-aware design goals. Since the design of TPSN is based
on a hierarchical methodology similar to NTP, nodes within the hierarchy may fail and cause other nodes to
become unsynchronized. In addition, node movements may render the hierarchy useless, because nodes may
move out of their levels. When this happens, nodes at level i cannot synchronize with nodes at level i − 1,
and synchronization may fail throughout the network.
As for type (2) of the timing techniques, the RBS provides an instantaneous time synchronization among
a set of receivers that are within the reference broadcast range of the transmitter. The transmitter broadcasts
m reference packets. Each of the receivers within the broadcast range records the time-of-arrival
of the reference packets. Afterwards, the receivers communicate with each other to determine their mutual offsets.
To provide multihop synchronization, it is proposed to use nodes that receive two or more reference
broadcasts from different transmitters as translation nodes. These translation nodes are used to translate
the time between different broadcast domains.
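The receiver-to-receiver offset determination can be sketched as follows. The source only says the receivers "determine the offsets"; taking the mean of the pairwise differences of the recorded arrival times is our assumption, and the names and numbers are illustrative:

```python
def rbs_offset(arrivals_i, arrivals_j):
    """Estimate the clock offset between receivers i and j from the
    local arrival times each recorded for the same m reference packets."""
    m = len(arrivals_i)
    return sum(ti - tj for ti, tj in zip(arrivals_i, arrivals_j)) / m

# Receiver j's clock runs 3 units behind receiver i's for all m = 3 packets:
offset = rbs_offset([100.0, 110.0, 120.0], [97.0, 107.0, 117.0])  # -> 3.0
```

Because both receivers hear the very same broadcast, the sender-side nondeterminism cancels and only receiver-side differences remain in the estimate.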
As shown in Figure 35.4, nodes A, B, and C are transmitter, receiver, and translation nodes,
respectively. The transmitter nodes broadcast their timing messages, and the receiver nodes receive these
messages. Afterwards, the receiver nodes synchronize with each other. The sensor nodes that are within
the broadcast regions of both transmitter nodes are the translation nodes. When an event occurs,
a message describing the event with a time stamp is translated by the translation nodes as the message
is routed back to the sink. Although this time synchronization service is tunable and light-weight, there
may not be translation nodes on the path over which a message is relayed. As a result, the service may not
be available on some routes. In addition, this protocol is not suitable for medium access schemes, such as
TDMA, since the clocks of all the nodes in the network are not adjusted to a common time.
FIGURE 35.4 Illustration of the RBS. (The figure shows transmitter, receiver, and translation nodes; node C sits in the overlap of two broadcast regions and acts as a translation node.)
FIGURE 35.5 TDP concept. (Master nodes C and G diffuse the time over 3 hops via diffused leader nodes, among them M, N, D, E, and F.)
Another emerging timing technique is the TDP. The TDP is used to maintain a common time throughout
the network within a certain tolerance. The tolerance level can be adjusted based on the purpose
of the sensor network. The TDP automatically self-configures by electing master nodes to synchronize
the sensor network. In addition, the election process is sensitive to energy requirements as well as to
the quality of the clocks. Even when the sensor network is deployed in unattended areas, the TDP still
synchronizes the unattended network to a common time. It is considered type (3) of the timing
techniques.
The TDP concept is illustrated in Figure 35.5. The elected master nodes are nodes C and G. First, the
master nodes send a message to their neighbors to measure the round-trip times. Once the neighbors
receive the message, they self-determine whether they should become diffused leader nodes. The ones elected to
become diffused leader nodes reply to the master nodes and start sending a message to measure the round-trip
times to their own neighbors. As shown in Figure 35.5, nodes M, N, and D are the diffused leader nodes of node C.
Once the replies are received by the master nodes, the round-trip time and the standard deviation of the
round-trip time are calculated. The one-way delay from the master nodes to the neighbor nodes is half of
the measured round-trip time. Afterwards, the master nodes send a time-stamped message containing the
standard deviation to the neighbor nodes. The time in the time-stamped message is adjusted with the
one-way delay. Once the diffused leader nodes receive the time-stamped message, they broadcast the time-stamped
message after adjusting the time in the message with their own measured one-way delay and
inserting their own standard deviation of the round-trip time. This diffusion process continues for n times,
where n is the number of hops from the master nodes. In Figure 35.5, the time is diffused 3 hops
from the master nodes C and G. The nodes D, E, and F are the diffused leader nodes that diffuse the
time-stamped messages originated from the master nodes.
Nodes that have received more than one time-stamped message originated from different
master nodes use the standard deviations carried in the time-stamped messages as weights for the
contribution of each time to their new time. In essence, the nodes weight the times diffused by the master
nodes to obtain a new time for themselves. This process provides a smooth time variation between the
nodes in the network. The smooth transition is important for some applications, such as target tracking
and speed estimation.
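The weighting step can be sketched as follows. The source says only that the standard deviations serve as weights, so the inverse-deviation weighting below (smaller deviation, larger weight) is our assumption, and the names and numbers are illustrative:

```python
def tdp_weighted_time(times, stddevs):
    """Combine the times diffused by different master nodes, weighting
    each by the inverse of the accumulated round-trip standard deviation
    carried in its time-stamped message."""
    weights = [1.0 / s for s in stddevs]
    total = sum(weights)
    return sum(w * t for w, t in zip(weights, times)) / total

# Two diffused times; the first has half the deviation, so it counts double:
t = tdp_weighted_time([100.0, 103.0], [0.5, 1.0])  # -> 101.0
```

The weighted blend is what yields the smooth time variation between neighboring nodes that the diffusion aims for.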
Since the master nodes are autonomously elected, the network is robust to failures. Although some of
the nodes may die, other nodes in the network can still self-determine to become master
nodes. This feature also enables the network to become server-less if necessary and to reach an equilibrium
time. In addition, the master and diffused leader nodes are self-determined based on their own energy
levels. The TDP is light-weight, but it may not be as tunable as the RBS.
In summary, these timing techniques may be used for different types of applications; each of them
has its own benefits. All of these techniques try to address the factors influencing time synchronization
while designing according to the challenges described in Section 35.2. Depending on the types of
services required by the applications or the hardware limitations of the sensor nodes, some of these timing
techniques may be applied.
35.6 Conclusions
The design challenges and factors influencing time synchronization for sensor networks are described in
Sections 35.2 and 35.3, respectively. They provide guidelines for developing time synchronization
protocols. The requirements of sensor networks are different from those of traditional distributed computer
systems. As a result, new types of timing techniques are required to address the specific needs of the
applications. These techniques are described in Section 35.5. Since the range of applications of sensor
networks is wide, new timing techniques are encouraged for different types of applications. This is to
provide optimized schemes tailored for unique environments and purposes.
Acknowledgment
The author wishes to thank Dr. Ian F. Akyildiz for his encouragement and support.
References
[1] Akyildiz, I.F. et al., Wireless Sensor Networks: A Survey. Computer Networks Journal, 393–422, 2002.
[2] Levine, J., Time Synchronization Over the Internet Using an Adaptive Frequency-Locked Loop. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, 46, 888–896, 1999.
[3] Mills, D.L., Adaptive Hybrid Clock Discipline Algorithm for the Network Time Protocol. IEEE/ACM Transactions on Networking, 6, 505–514, 1998.
[4] Allan, D., Time and Frequency (Time-Domain) Characterization, Estimation, and Prediction of Precision Clocks and Oscillators. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, 34, 647–654, 1987.
[5] Cristian, F. and Fetzer, C., Probabilistic Internal Clock Synchronization. In Proceedings of the Thirteenth Symposium on Reliable Distributed Systems. Dana Point, CA, October 1994, pp. 22–31.
[6] Pottie, G.J. and Kaiser, W.J., Wireless Integrated Network Sensors. Communications of the ACM, 43, 551–558, 2000.
[7] MICA Motes and Sensors, http://www.xbow.com.
[8] Bianchi, G., Performance Analysis of the IEEE 802.11 Distributed Coordination Function. IEEE Journal on Selected Areas in Communications, 18, 535–547, 2000.
[9] Crow, B.P. et al., Investigation of the IEEE 802.11 Medium Access Control (MAC) Sublayer Functions. In Proceedings of IEEE INFOCOM '97. Kobe, Japan, April 1997, pp. 126–133.
[10] Tay, Y.C. and Chua, K.C., A Capacity Analysis for the IEEE 802.11 MAC Protocol. ACM Wireless Networks Journal, 7, 159–171, 2001.
[11] Mills, D.L., Internet Time Synchronization: The Network Time Protocol. In Global States and Time in Distributed Systems. IEEE Computer Society Press, Washington, 1994.
[12] Ganeriwal, S., Kumar, R., and Srivastava, M.B., Timing-Sync Protocol for Sensor Networks. In ACM SenSys 2003. Los Angeles, CA, November 2003 (to appear).
[13] Elson, J., Girod, L., and Estrin, D., Fine-Grained Network Time Synchronization Using Reference Broadcasts. In Proceedings of the Fifth Symposium on Operating Systems Design and Implementation (OSDI 2002). Boston, MA, December 2002.
[14] Su, W. and Akyildiz, I.F., Time-Diffusion Synchronization Protocol for Wireless Sensor Networks. IEEE/ACM Transactions on Networking, 13(2), April 2005.
[15] IEEE 1588, Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems, 2002.
36
Distributed Localization Algorithms
Koen Langendoen and Niels Reijers
Delft University of Technology
36.1 Introduction
36.2 Localization Algorithms
     Generic Approach · Phase 1: Distance to Anchors · Phase 2: Node Position · Phase 3: Refinement
36.3 Simulation Environment
     Standard Scenario
36.4 Results
     Phase 1: Distance to Anchors · Phase 2: Node Position
36.5 Discussion
     Phases 1 and 2 Combined · Phase 3: Refinement · Communication Cost · Recommendations
36.6 Conclusions
     Future Work
Acknowledgments
References
36.1 Introduction
New technology offers new opportunities, but it also introduces new problems. This is particularly true for
sensor networks, where the capabilities of individual nodes are very limited. Hence, collaboration between
nodes is required, but energy conservation is a major concern, which implies that communication should
be minimized. These conflicting objectives require unorthodox solutions for many situations.
A recent survey by Akyildiz et al. [1] discusses a long list of open research issues that must be addressed
before sensor networks can become widely deployed. The problems range from the physical layer (low-power
sensing, processing, and communication hardware) all the way up to the application layer (query
and data dissemination protocols). In this chapter we address the issue of localization in ad-hoc sensor
networks. That is, we want to determine the location of individual sensor nodes without relying on
external infrastructure (base stations, satellites, etc.).
The localization problem has received considerable attention in the past, as many applications need to
know where objects or persons are, and hence various location services have been created. Undoubtedly,
Reprinted from K. Langendoen and N. Reijers. Elsevier Computer Networks, 43: 499–518, 2003. With permission.
the Global Positioning System (GPS) is the most well-known location service in use today. The approach
taken by GPS, however, is unsuitable for low-cost, ad-hoc sensor networks, since GPS is based on extensive
infrastructure (i.e., satellites). Likewise, solutions developed in the area of robotics [2–4] and ubiquitous
computing [5] are generally not applicable to sensor networks, as they require too much processing power
and energy.
Recently a number of localization systems have been proposed specifically for sensor networks [6–14].
We are interested in truly distributed algorithms that can be employed on large-scale ad-hoc sensor
networks (100+ nodes). Such algorithms should be:
1. Self-organizing (i.e., do not depend on global infrastructure).
2. Robust (i.e., be tolerant to node failures and range errors).
3. Energy efficient (i.e., require little computation and, especially, communication).
These requirements immediately rule out some of the proposed localization algorithms for sensor networks.
In this chapter, we carry out a thorough sensitivity analysis on three algorithms that do meet the
above requirements to determine how well they perform under various conditions. In particular, we study
the impact of the following parameters: range errors, connectivity (density), and anchor fraction. The
algorithms differ in their position accuracy, network coverage, induced network traffic, and processor
load. Given the (slightly) different design objectives of the three algorithms, it is no surprise that each
algorithm outperforms the others under a specific set of conditions. Under each condition, however, even
the best algorithm leaves much room for improving accuracy and/or increasing coverage.
In this chapter we will:
1. Identify a common, 3-phase structure in the selected distributed localization algorithms.
2. Identify a generic optimization applicable to all algorithms.
3. Provide a detailed comparison on a single (simulation) platform.
4. Show that no algorithm performs best in all cases, and that there is room for improvement in
most cases.
Section 36.2 discusses the selection, generic structure, and operation of three distributed localization
algorithms for large-scale ad-hoc sensor networks. These algorithms are compared on a simulation
platform, which is described in Section 36.3. Section 36.4 presents intermediate results for the individual
phases, while Section 36.5 provides a detailed overall comparison and an in-depth sensitivity analysis.
Finally, we give conclusions in Section 36.6.
36.2 Localization Algorithms
Before discussing distributed localization in detail, we first outline the context in which these algorithms
have to operate. A first consideration is that the requirement for sensor networks to be self-organizing
implies that there is no fine control over the placement of the sensor nodes when the network is installed
(e.g., when nodes are dropped from an airplane). Consequently, we assume that nodes are randomly
distributed across the environment. For simplicity and ease of presentation we limit the environment to
2 dimensions, but all algorithms are capable of operating in 3D. Figure 36.1 shows an example network
with 25 nodes; pairs of nodes that can communicate directly are connected by an edge. The connectivity
of the nodes in the network (i.e., the average number of neighbors) is an important parameter that has a
strong impact on the accuracy of most localization algorithms (see Sections 36.4 and 36.5). It is initially
determined by the node density and radio range, and in some cases it can be adjusted dynamically by
changing the transmit power of the RF radio.

In some application scenarios, nodes may be mobile. In this chapter, however, we focus on static networks,
where nodes do not move, since this is already a challenging condition for distributed localization.
We assume that some anchor nodes have a priori knowledge of their own position with respect to some
FIGURE 36.1 Example network topology (anchor and unknown nodes).
global coordinate system. Note that anchor nodes have the same capabilities (processing, communication,
energy consumption, etc.) as all other sensor nodes with unknown positions; we do not consider
approaches based on an external infrastructure with specialized beacon nodes (access points) as used in,
for example, the GPS-less [6], Cricket [15], and RADAR [16] location systems. Ideally the fraction of
anchor nodes should be as low as possible to minimize the installation costs. Our simulation results show
that, fortunately, most algorithms are rather insensitive to the number of anchors in the network.

The final element that defines the context of distributed localization is the capability to measure the
distance between directly connected nodes in the network. From a cost perspective it is attractive to
use the RF radio for measuring the range between nodes, for example, by observing the signal strength.
Experience has shown, however, that this approach yields poor distance estimates [17–19]. Much better
results are obtained by time-of-flight measurements, particularly when acoustic and RF signals are combined
[14,19,20]; accuracies of a few percent of the transmission range are reported. However, this requires
extra hardware on the sensor boards.

Several different ways of dealing with the problem of inaccurate distance information have been proposed.
The APIT [10] algorithm by He et al. only needs distance information accurate enough for two
nodes to determine which of them is closest to an anchor. GPS-less [6] by Bulusu et al. and DV-hop [11] by
Niculescu and Nath do not use distance information at all, and are based on topology information only.
Ramadurai and Sichitiu [12] propose a probabilistic approach to the localization problem. Not only the
measured distance, but also the confidence in the measurement is used.

It is important to realize that the three main context parameters (connectivity, anchor fraction, and
range errors) are dependent. Poor range measurements can be compensated for by using many anchors
and/or a high connectivity. This chapter provides insight into the complex relation between connectivity,
anchor fraction, and range errors for a number of distributed localization algorithms.
36.2.1 Generic Approach
From the known localization algorithms specifically proposed for sensor networks, we selected the three
approaches that meet the basic requirements of self-organization, robustness, and energy efficiency:
1. Ad-hoc positioning by Niculescu and Nath [11]
2. N-hop multilateration by Savvides et al. [14]
3. Robust positioning by Savarese et al. [13]
The other approaches often include a central processing element (e.g., convex optimization by Doherty
et al. [9]), rely on an external infrastructure (e.g., GPS-less by Bulusu et al. [6]), or induce too much
TABLE 36.1 Algorithm Classification

Phase           Ad-hoc positioning [11]   Robust positioning [13]   N-hop multilateration [14]
1. Distance     Euclidean                 DV-hop                    Sum-dist
2. Position     Lateration                Lateration                Min-max
3. Refinement   No                        Yes                       Yes
communication (e.g., GPS-free by Capkun et al. [7]). The three selected algorithms are fully distributed
and use local broadcast for communication with immediate neighbors. This last feature allows them to
be executed before any multi-hop routing is in place; hence, they can support efficient location-based
routing schemes like GAF [21].
Although the three algorithms were developed independently, we found that they share a common
structure. We were able to identify the following generic, 3-phase approach¹ for determining the individual
node positions:
1. Determine the distances between unknowns and anchor nodes.
2. Derive for each node a position from its anchor distances.
3. Refine the node positions using information about the range (distance) to, and positions of,
neighboring nodes.
The original descriptions of the algorithms present the first two phases as a single entity, but we found
that separating them provides two advantages. First, we obtain a better understanding of the combined
behavior by studying intermediate results. Second, it becomes possible to mix-and-match alternatives
for both phases to tailor the localization algorithm to the external conditions. The refinement phase is
optional and may be included to obtain more accurate locations.

In the remainder of this section we describe the three phases (distance, position, and refinement)
in detail. For each phase we enumerate the alternatives as found in the original descriptions. Table 36.1
gives the breakdown into phases of the three approaches. Where applicable, we also discuss (minor) adjustments
to (parts of) the individual algorithms that were needed to ensure compatibility with the alternatives.
During our simulations we observed that we occasionally operated (parts of) the algorithms outside their
intended scenarios, which deteriorated their performance. Often, small improvements brought their
performance back in line with the alternatives.
36.2.2 Phase 1: Distance to Anchors
In this phase, nodes share information to collectively determine the distances between individual nodes
and the anchors, so that an (initial) position can be calculated in Phase 2. None of the Phase 1 alternatives
engages in complicated calculations, so this phase is communication bound. Although the three
distributed localization algorithms each use a different approach, they share a common communication
pattern: information is flooded into the network, starting at the anchor nodes. A network-wide flood by
some anchor A is expensive, since each node must forward A's information to its (potentially) unaware
neighbors. This implies a scaling problem: flooding information from all anchors to all nodes becomes
much too expensive for large networks, even with low anchor fractions. Fortunately, a good position can
be derived in Phase 2 with knowledge (position and distance) of a limited number of anchors. Therefore
nodes can simply stop forwarding information when enough anchors have been located. This simple
optimization, presented in the Robust positioning approach, proved to be highly effective in controlling
the amount of communication (see Section 36.5.3). We modified the other two approaches to include a
flood limit as well.
¹Our three phases do not correspond to the three of Savvides et al. [14]; our structure allows for an easier comparison of all algorithms.
36.2.2.1 Sum-Dist
The simplest solution for determining the distance to the anchors is to add the ranges
encountered at each hop during the network flood. This is the approach taken by the N-hop multilateration
approach, but it remained nameless in the original description [14]; we name it Sum-dist in this
chapter. Sum-dist starts at the anchors, which send a message including their identity, position, and a path
length set to 0. Each receiving node adds the measured range to the path length and forwards (broadcasts)
the message if the flood limit allows it to do so. Another constraint is that when the node has received
information about the particular anchor before, it is only allowed to forward the message if the current
path length is less than the previous one. The end result is that each node will have stored the position
and minimum path length to at least flood-limit anchors.
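Sum-dist can be sketched as a single-process flood simulation (names are ours; a real deployment would run this as message passing on each node, but the per-anchor minimum-path-length bookkeeping and the flood-limit cutoff are the same):

```python
import collections

def sum_dist(anchors, ranges, flood_limit=3):
    """anchors: {node: (x, y)}; ranges: {node: {neighbor: measured_range}}.
    Returns {node: {anchor: (anchor_position, min_path_length)}}."""
    table = collections.defaultdict(dict)
    # Each anchor starts a flood with path length 0.
    queue = collections.deque((a, a, 0.0) for a in anchors)
    while queue:
        anchor, node, length = queue.popleft()
        known = table[node].get(anchor)
        if known is not None and known[1] <= length:
            continue  # no improvement over the stored path: do not forward
        if anchor not in table[node] and len(table[node]) >= flood_limit:
            continue  # enough anchors located: stop forwarding new ones
        table[node][anchor] = (anchors[anchor], length)
        for neighbor, r in ranges[node].items():
            queue.append((anchor, neighbor, length + r))  # add hop range
    return dict(table)

# Chain A - n1 - n2 with measured ranges 10 and 12; A is the only anchor:
ranges = {'A': {'n1': 10.0}, 'n1': {'A': 10.0, 'n2': 12.0}, 'n2': {'n1': 12.0}}
t = sum_dist({'A': (0.0, 0.0)}, ranges)
# t['n2']['A'] -> ((0.0, 0.0), 22.0)
```

Note how n2's stored path length (22) is the sum of the per-hop ranges, so any range error on either hop propagates into the anchor distance, which motivates DV-hop below.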
36.2.2.2 DV-Hop
A drawback of Sum-dist is that range errors accumulate when distance information is propagated over
multiple hops. This cumulative error becomes significant for large networks with few anchors (long
paths) and/or poor ranging hardware. A robust alternative is to use topological information by counting
the number of hops instead of summing the (erroneous) ranges. This approach was named DV-hop by
Niculescu and Nath [11], and Hop-TERRAIN by Savarese et al. [13]. Since the results of DV-hop were
published first, we use this name.

DV-hop essentially consists of two flood waves. After the first wave, which is similar to Sum-dist, nodes
have obtained the position and minimum hop count to at least flood-limit anchors. The second, calibration,
wave is needed to convert hop counts into distances such that Phase 2 can compute a position. This
conversion consists of multiplying the hop count by an average hop distance. Whenever an anchor a1
infers the position of another anchor a2 during the first wave, it computes the distance between them and
divides that by the number of hops to derive the average hop distance between a1 and a2. When calibrating,
an anchor takes into account all remote anchors it is aware of. When information on extra
anchors is received later, the calibration procedure is repeated. Nodes forward (broadcast) calibration messages
only from the first anchor that calibrates them, which reduces the total number of messages in the
network.
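The calibration step can be sketched as follows (a simplification, and ours rather than the chapter's code: it pools all remote anchors an anchor heard from into a single sum-of-distances over sum-of-hops ratio):

```python
import math

def average_hop_distance(anchor_positions, hop_counts):
    """DV-hop calibration as seen by one anchor: anchor_positions[0] is
    the calibrating anchor itself, the rest are the remote anchors it
    heard from; hop_counts[i] is the hop count to anchor_positions[i+1]."""
    (x0, y0) = anchor_positions[0]
    total_dist = sum(math.hypot(x - x0, y - y0)
                     for x, y in anchor_positions[1:])
    total_hops = sum(hop_counts)
    return total_dist / total_hops

# Anchor at (0, 0) heard two anchors: (30, 0) in 3 hops and (0, 40) in 5 hops.
hop = average_hop_distance([(0, 0), (30, 0), (0, 40)], [3, 5])  # -> 8.75
# An unknown node 4 hops away then estimates its anchor distance as 4 * 8.75.
```

Because only hop counts travel through the network, individual range errors never accumulate; the price is that the estimate degrades when actual hop lengths vary widely, which is exactly the weakness Euclidean addresses next.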
36.2.2.3 Euclidean
A drawback of DV-hop is that it fails for highly irregular network topologies, where the variance in actual
hop distances is very large. Niculescu and Nath have proposed another method, named Euclidean, which is
based on the local geometry of the nodes around an anchor. Again anchors initiate a flood, but forwarding
the distance is more complicated than in the previous cases. When a node has received messages from two
neighbors that know their distance to the anchor, and to each other, it can calculate the distance to the
anchor.
FIGURE 36.2 Determining distance using Euclidean. (Self, its neighbors n1 and n2, and the anchor; ranges a, b, c, d, e and the two candidate distances r1, r2.)

Figure 36.2 shows a node (Self) that has two neighbors n1 and n2 with distance estimates (a and b) to
an anchor. Together with the known ranges c, d, and e, Euclidean arrives at two possible values (r1 and r2)
for the distance of the node to the anchor. Niculescu describes two methods to decide which, if any,
distance to use. The neighbor vote method can be applied if there exists a third neighbor (n3) that has a
distance estimate to the anchor and that is connected to either n1 or n2. Replacing n2 (or n1) by n3 will
again yield a pair of distance estimates. The correct distance is part of both pairs, and is selected by a
simple voting. Of course, more neighbors can be included to make the selection more accurate.
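The two candidate distances r1 and r2 can be computed from the five ranges by placing n1 and n2 in a local coordinate system (a sketch of the underlying geometry, not Niculescu's code; names are ours):

```python
import math

def euclidean_candidates(a, b, c, d, e):
    """Candidate distances from Self to the anchor. a, b: n1/n2 to the
    anchor; c: n1 to n2; d: Self to n1; e: Self to n2.
    Places n1 at the origin and n2 at (c, 0)."""
    xa = (a * a + c * c - b * b) / (2 * c)      # anchor x (law of cosines)
    ya = math.sqrt(max(a * a - xa * xa, 0.0))   # anchor y, pick + side
    xs = (d * d + c * c - e * e) / (2 * c)      # Self x
    ys = math.sqrt(max(d * d - xs * xs, 0.0))   # Self y, sign is ambiguous
    r1 = math.hypot(xs - xa, ys - ya)    # Self on the same side as the anchor
    r2 = math.hypot(xs - xa, -ys - ya)   # Self mirrored across line n1-n2
    return r1, r2

# n1=(0,0), n2=(4,0), anchor=(2,3), Self=(2,1): a=b=sqrt(13), c=4,
# d=e=sqrt(5). The true distance (2) and its mirror-image value (4) come out:
r1, r2 = euclidean_candidates(13**0.5, 13**0.5, 4.0, 5**0.5, 5**0.5)
```

The ambiguity between r1 and r2 is exactly what the neighbor vote and common neighbor methods resolve.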
The second selection method is called common neighbor and can be applied if node n3 is connected
to both n1 and n2. Basic geometric reasoning determines whether the anchor and n3 are on
the same or opposite side of the mirroring line through n1 and n2, and similarly whether or not Self and n3
are on the same side. From this it follows whether or not Self and the anchor lie on the same side.
To handle the uncertainty introduced by range errors, Niculescu implements a safety mechanism that
rejects ill-formed (flat) triangles, which can easily derail the selection process by neighbor vote and
common neighbor. This check verifies that the sum of the two smallest sides exceeds the largest side
multiplied by a threshold, which is set to two times the range variance. For example, the triangle Self-n1-n2
in Figure 36.2 is accepted when c + d > (1 + 2 RangeVar) e. Note that the safety check becomes more strict
as the range variance increases. This leads to a lower coverage, defined as the percentage of non-anchor
nodes for which a position was determined.
We now describe some modifications to Niculescu's neighbor vote method that remedy the poor
selection of the location for Self in important corner cases. The first problem occurs when the two votes
are identical because, for instance, the three neighbors (n1, n2, and n3) are collinear. In these cases it is
hard to select the right alternative. Our solution is to leave equal-vote cases unsolved, instead of picking
an alternative and propagating an error with 50% chance. We filter out all indecisive cases by adding the
requirement that the standard deviation of the votes for the selected distance must be at most one third of the
standard deviation of the other distance. The second problem that we address is that of a bad neighbor
with inaccurate information spoiling the selection process by voting for two wrong distances. This case is
filtered out by requiring that the standard deviation of the selected distance is at most 5% of that distance.

To achieve good coverage, we use both the neighbor vote and common neighbor methods. If both
produce a result, we use the result from the modified neighbor vote, because we found it to be the more
accurate of the two. If both fail, the flooding process stops, leading to the situation where certain nodes
are not able to establish the distance to enough anchor nodes. Sum-dist and DV-hop, on the other hand,
never fail to propagate the distance and hop count, respectively.
36.2.3 Phase 2: Node Position
In the second phase, nodes determine their position based on the distance estimates to a number of anchors
provided by one of the three Phase 1 alternatives (Sum-dist, DV-hop, or Euclidean). The Ad-hoc positioning
and Robust positioning approaches use lateration for this purpose. N-hop multilateration, on the
other hand, uses a much simpler method, which we named Min-max. In both cases the determination of
the node positions does not involve additional communication.
36.2.3.1 Lateration
The most common method for deriving a position is lateration, which is a form of triangulation. From
the estimated distances (d_i) and known positions (x_i, y_i) of the anchors we derive the following system of
equations:

    (x_1 − x)² + (y_1 − y)² = d_1²
    ...
    (x_n − x)² + (y_n − y)² = d_n²

where the unknown position is denoted by (x, y). The system can be linearized by subtracting the last
equation from the first n − 1 equations:

    x_1² − x_n² − 2(x_1 − x_n)x + y_1² − y_n² − 2(y_1 − y_n)y = d_1² − d_n²
    ...
    x_{n−1}² − x_n² − 2(x_{n−1} − x_n)x + y_{n−1}² − y_n² − 2(y_{n−1} − y_n)y = d_{n−1}² − d_n²

Reordering the terms gives a proper system of linear equations in the form Ax = b, where

    A = [ 2(x_1 − x_n)        2(y_1 − y_n)     ]
        [ ...                 ...              ]
        [ 2(x_{n−1} − x_n)    2(y_{n−1} − y_n) ]

    b = [ x_1² − x_n² + y_1² − y_n² + d_n² − d_1²             ]
        [ ...                                                 ]
        [ x_{n−1}² − x_n² + y_{n−1}² − y_n² + d_n² − d_{n−1}² ]

The system is solved using a standard least-squares method, yielding the position estimate (x̂, ŷ). The
residue of this estimate is

    (1/n) Σ_{i=1..n} ( sqrt((x_i − x̂)² + (y_i − ŷ)²) − d_i )

A large residue signals an inconsistent set of equations; we reject the location when the length of the
residue exceeds the radio range.
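A compact sketch of the lateration procedure, solving the linearized system through its 2×2 normal equations and computing the residue (names are ours; a production version would use a library least-squares solver):

```python
import math

def laterate(anchors, dists):
    """anchors: list of (x_i, y_i); dists: list of d_i (same length >= 3).
    Returns the least-squares position estimate and the average residue."""
    xn, yn = anchors[-1]
    dn = dists[-1]
    rows, rhs = [], []
    for (xi, yi), di in zip(anchors[:-1], dists[:-1]):
        rows.append((2 * (xi - xn), 2 * (yi - yn)))
        rhs.append(xi**2 - xn**2 + yi**2 - yn**2 + dn**2 - di**2)
    # Normal equations: (A^T A) p = A^T b, solved in closed form for 2 unknowns.
    a11 = sum(r[0] * r[0] for r in rows)
    a12 = sum(r[0] * r[1] for r in rows)
    a22 = sum(r[1] * r[1] for r in rows)
    b1 = sum(r[0] * v for r, v in zip(rows, rhs))
    b2 = sum(r[1] * v for r, v in zip(rows, rhs))
    det = a11 * a22 - a12 * a12
    x = (a22 * b1 - a12 * b2) / det
    y = (a11 * b2 - a12 * b1) / det
    residue = sum(math.hypot(xi - x, yi - y) - di
                  for (xi, yi), di in zip(anchors, dists)) / len(anchors)
    return (x, y), residue

# Exact distances to three anchors recover the node at (1, 2), residue ~ 0:
pos, res = laterate([(0, 0), (4, 0), (0, 4)],
                    [math.hypot(1, 2), math.hypot(3, 2), math.hypot(1, 2)])
```

With noisy distance estimates the same code still returns the best-fit position, and the residue grows with the inconsistency, which is what the radio-range rejection test above keys on.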
36.2.3.2 Min-Max
Lateration is quite expensive in the number of oating point operations that is required. A much simpler
method is presented by Savvides et al. as part of the N-hop multilateration approach. The main idea is to
construct a bounding box for each anchor using its position and distance estimate, and then to determine
the intersection of these boxes. The position of the node is set to the center of the intersection box.
Figure 36.3 illustrates the Min-max method for a node with distance estimates to three anchors. Note that
the estimated position by Min-max is close to the true position computed through Lateration (i.e., the
intersection of the three circles).
The bounding box of anchor a is created by adding and subtracting the estimated distance (d_a) from
the anchor position (x_a, y_a):

    [x_a - d_a, y_a - d_a] × [x_a + d_a, y_a + d_a]
The intersection of the bounding boxes is computed by taking the maximum of all coordinate minimums
and the minimum of all maximums:

    [max(x_i - d_i), max(y_i - d_i)] × [min(x_i + d_i), min(y_i + d_i)]
The final position is set to the average of both corner coordinates. As for Lateration, we only accept the
final position if the residue is small.
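The whole Min-max computation comes down to a handful of comparisons. A minimal sketch (our own illustration; the function name is hypothetical and the residue-based acceptance test is omitted for brevity):

```python
def min_max(anchors, dists):
    """Min-max position estimate: intersect the per-anchor bounding boxes
    [x_a - d_a, y_a - d_a] x [x_a + d_a, y_a + d_a] and return the center
    of the intersection box."""
    # Maximum of all coordinate minimums ...
    x_lo = max(x - d for (x, _), d in zip(anchors, dists))
    y_lo = max(y - d for (_, y), d in zip(anchors, dists))
    # ... and minimum of all coordinate maximums.
    x_hi = min(x + d for (x, _), d in zip(anchors, dists))
    y_hi = min(y + d for (_, y), d in zip(anchors, dists))
    # Final position: average of both corner coordinates.
    return ((x_lo + x_hi) / 2, (y_lo + y_hi) / 2)
```

Note that, unlike Lateration, this works with fewer than three anchors; the intersection box is then simply larger.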
[Figure: the intersection of the bounding boxes of Anchor1, Anchor2, and Anchor3, with the node's estimated and true positions.]
FIGURE 36.3 Determining position using Min-max.
36.2.4 Phase 3: Refinement
The objective of the third phase is to refine the (initial) node positions computed during Phase 2. These
positions are not very accurate, even under good conditions (high connectivity, small range errors),
because not all available information is used in the first two phases. In particular, most ranges between
neighboring nodes are neglected when the node-anchor distances are determined. The iterative Refinement
procedure proposed by Savarese et al. [13] does take into account all inter-node ranges, when nodes
update their positions in a small number of steps. At the beginning of each step a node broadcasts its position
estimate, receives the positions and corresponding range estimates from its neighbors, and performs
the Lateration procedure of Phase 2 to determine its new position. In many cases the constraints imposed
by the distances to the neighboring locations will force the new position towards the true position of
the node. When, after a number of iterations, the position update becomes small, Refinement stops and
reports the final position.
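The basic loop can be sketched as follows. This is our own serialized, single-machine illustration, without the confidence weights discussed below, and with an assumed update-size threshold as the stopping criterion; the real algorithm runs distributed and concurrently:

```python
import math

def laterate(refs, dists):
    """Linearized 2-D least-squares lateration (see Section 36.2.3.1)."""
    (xn, yn), dn = refs[-1], dists[-1]
    A = [(2 * (x - xn), 2 * (y - yn)) for x, y in refs[:-1]]
    b = [x * x - xn * xn + y * y - yn * yn + dn * dn - d * d
         for (x, y), d in zip(refs[:-1], dists[:-1])]
    s = [[sum(r[i] * r[j] for r in A) for j in (0, 1)] for i in (0, 1)]
    t = [sum(r[i] * bi for r, bi in zip(A, b)) for i in (0, 1)]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    if abs(det) < 1e-12:
        return None  # reference points (nearly) collinear
    return ((t[0] * s[1][1] - t[1] * s[0][1]) / det,
            (t[1] * s[0][0] - t[0] * s[1][0]) / det)

def refine(positions, anchors, ranges, tol=1e-3, max_steps=50):
    """Refinement sketch: every non-anchor node repeatedly re-laterates
    against the current position estimates of its neighbors, until the
    largest position update becomes small.

    positions: dict node -> (x, y) initial estimates (anchors included)
    anchors:   set of node ids with known, fixed positions
    ranges:    dict node -> list of (neighbor, measured_distance)
    """
    for _ in range(max_steps):
        max_update = 0.0
        new_pos = dict(positions)
        for node, nbrs in ranges.items():
            if node in anchors or len(nbrs) < 3:
                continue  # anchors stay fixed; lateration needs >= 3 neighbors
            est = laterate([positions[m] for m, _ in nbrs],
                           [d for _, d in nbrs])
            if est is None:
                continue
            max_update = max(max_update, math.dist(est, positions[node]))
            new_pos[node] = est
        positions = new_pos  # all nodes update "simultaneously"
        if max_update < tol:  # position updates became small: stop
            break
    return positions
```

Updating all nodes from the previous round's positions mimics the broadcast-then-laterate step of the distributed procedure.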
The basic iterative refinement procedure outlined above proved to be too simple to be used in practice.
The main problem is that errors propagate quickly through the network; a single error introduced by
some node needs only d iterations to affect all nodes, where d is the network diameter. This effect was
countered by (1) clipping undetermined nodes with non-overlapping paths to less than three anchors,
(2) filtering out difficult symmetric topologies, and (3) associating a confidence metric with each node and
using them in a weighted least-squares solution (wAx = wb). The details (see Reference 17) are beyond
the scope of this chapter, but the adjustments considerably improved the performance of the Refinement
procedure. This is largely due to the confidence metric, which allows filtering of bad nodes, thus increasing
the (average) accuracy at the expense of coverage.
The N-hop multilateration approach by Savvides et al. [14] also includes an iterative refinement procedure,
but it is less sophisticated than the Refinement discussed above. In particular, they do not use
weights, but simply group nodes into so-called computation subtrees (over-constrained configurations)
and enforce nodes within a subtree to execute their position refinement in turn in a fixed sequence to
enhance convergence to a pre-specified tolerance. In the remainder of this chapter we will only consider
the more advanced Refinement procedure of Savarese et al.
36.3 Simulation Environment
To compare the three original distributed localization algorithms (Ad-hoc positioning, Robust positioning,
and N-hop multilateration) and to try out new combinations of phase 1, 2, and 3 alternatives, we extended
the simulator developed by Savarese et al. [13]. The underlying OMNeT++ discrete event simulator [22]
takes care of the semi-concurrent execution of the specific localization algorithm. Each sensor node
runs the same C++ code, which is parameterized to select a particular combination of phase 1, 2, and 3
alternatives.
Our network layer supports localized broadcast only, and messages are simply delivered at the neighbors
within a fixed radio range (circle) from the sending node; a more accurate model should take radio
propagation effects into account (see future work). Concurrent transmissions are allowed if the transmission
areas (circles) do not overlap. If a node wants to broadcast a message while another message in its area
is in progress, it must wait until that transmission (and possibly other queued messages) has completed.
In effect we employ a CSMA policy. Furthermore, we do not consider message corruption, so all messages
sent during our simulation are delivered (after some delay).
At the start of a simulation experiment we generate a random network topology according to some
parameters (#nodes, #anchors, etc.). The nodes are randomly placed, with a uniform distribution, within
a square area. Next we select which nodes will serve as anchors. To this end we superimpose a grid on top
of the square, and designate the node closest to each grid point as an anchor. The size of the grid is chosen
as the maximal number s that satisfies s × s ≤ #anchors; any remaining anchors are selected randomly. The
reason for carefully selecting the anchor positions is that most localization algorithms are quite sensitive
to the presence, or absence, of anchors at the edges of the network. (Locating unknowns at the edges of
the network is more difficult because nodes at the edge are less well connected, and positioning techniques
like lateration perform best when anchors surround the unknown.) Although anchor placement may not
be feasible in practice, the majority of the nodes in large-scale networks (1000+ nodes) will generally be
surrounded by anchors. By placing anchors we can study the localization performance in large networks
with simulations involving only a modest number of nodes.
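The grid-based anchor selection can be sketched as follows (our reading of the procedure; placing the grid points at the centers of an s × s subdivision of the square is an assumption, and `choose_anchors` is a hypothetical name):

```python
import math
import random

def choose_anchors(nodes, num_anchors, side):
    """Select anchors by superimposing an s x s grid (s maximal with
    s * s <= num_anchors) on the side x side square and taking the node
    closest to each grid point; remaining anchors are picked randomly."""
    s = math.isqrt(num_anchors)
    anchors = []
    for i in range(s):
        for j in range(s):
            grid_point = ((i + 0.5) * side / s, (j + 0.5) * side / s)
            # Closest not-yet-chosen node becomes an anchor.
            closest = min((n for n in nodes if n not in anchors),
                          key=lambda n: math.dist(n, grid_point))
            anchors.append(closest)
    # Any remaining anchors (e.g., 11 - 9 = 2 in the standard scenario)
    # are selected randomly from the other nodes.
    rest = [n for n in nodes if n not in anchors]
    anchors += random.sample(rest, num_anchors - s * s)
    return anchors
```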
The measured range between connected nodes is blurred by drawing a random value from a normal
distribution having a parameterized standard deviation and having the true range as the mean. We
selected this error model based on the work of Whitehouse and Culler [23], which shows that, although
individual distance measurements tend to overshoot the real distance, a proper calibration procedure yields
distance estimates with a symmetric error distribution. The connectivity (average number of neighbors)
is controlled by specifying the radio range.
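A sketch of this measurement model (our own; expressing the standard deviation as a fraction of the radio range follows the standard scenario of Section 36.3.1):

```python
import random

def measure_range(true_dist, stddev_frac, radio_range):
    """Blur the true range with Gaussian noise: the mean is the true
    range and the standard deviation is a fraction of the radio range
    (10% in the standard scenario)."""
    return random.gauss(true_dist, stddev_frac * radio_range)
```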
At the end of a run the simulator outputs a large number of statistics per node: position information,
elapsed time, message counts (broken down per type), etc. These individual node statistics are combined
and presented as averages (or distributions), for example, as an average position error. Nodes that do not
produce a position are excluded from such averaged metrics. To account for the randomness in generating
topologies and range errors we repeated each experiment 100 times with a different seed, and report the
averaged results. To allow for easy comparison between different scenarios, range errors as well as errors
on position estimates are normalized to the radio range (i.e., a 50% position error means a distance of half
the range of the radio between the real and estimated positions).
36.3.1 Standard Scenario
The experiments described in the subsequent sections share a standard scenario, in which certain parameters
are varied: radio range (connectivity), anchor fraction, and range errors. The standard scenario
consists of a network of 225 nodes placed in a square with sides of 100 units. The radio range is set to 14,
resulting in an average connectivity of about 12. We use an anchor fraction of 5%, hence, 11 anchors in
total, of which 9 (3 × 3) are placed in a grid-like position. The standard deviation of the range error is set
to 10% of the radio range. The default flood limit for Phase 1 is set to 4 (Lateration requires a minimum
of 3 anchors). Unless specified otherwise, all data will be based on this standard scenario.
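Collected as a single parameter set, the standard scenario reads (a sketch; the key names are ours, the values are from the text):

```python
# Standard scenario of Section 36.3.1; key names are our own.
STANDARD_SCENARIO = {
    "num_nodes": 225,
    "area_side": 100,            # square area of 100 x 100 units
    "radio_range": 14,           # gives an average connectivity of about 12
    "anchor_fraction": 0.05,     # 11 anchors, 9 of them placed 3 x 3 grid-like
    "range_error_stddev": 0.10,  # fraction of the radio range
    "flood_limit": 4,            # Phase 1 default
}
```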
36.4 Results
In this section we present results for the first two phases (anchor distances and node positions). We study
each phase separately and show how alternatives respond to different parameters. These intermediate
results will be used in Section 36.5, where we will discuss the overall performance, and compare complete
localization algorithms. Throughout this section we will vary one parameter in the standard scenario
(radio range, anchor fraction, range error) at a time to study the sensitivity of the algorithms. The reader,
however, should be aware that the three parameters are not orthogonal.
36.4.1 Phase 1: Distance to Anchors
Figure 36.4 shows the performance of the Phase 1 alternatives for computing the distances between
nodes and anchors under various conditions. There are two metrics of interest: first, the bias in the
estimate, measured here using the mean of the distance errors, and second, the precision of the estimated
distances, measured here using the standard deviation of the distance errors. Therefore, Figure 36.4 plots
both the average error, relative to the true distance, and the standard deviation of that relative error. We
will now discuss the sensitivity of each alternative: Sum-dist, DV-hop, and Euclidean.
36.4.1.1 Sum-Dist
Sum-dist is the cheapest of the three methods, both with respect to computation and communication
costs. Nevertheless it performs quite satisfactorily, except for large range errors (≥0.1). There are two
opposite tendencies affecting the bias of Sum-dist. First, without range errors, the sum of the ranges along
a multi-hop path will always be larger than the actual distance, leading to an overestimation of the distance.
Second, the algorithm searches for the shortest path, forcing it to select links that underestimate the actual
distance when range errors are present. The combined effect shows non-intuitive results. A small range
error reduces the bias of Sum-dist. Initially, the detour effect leads to an overshoot, but the shortest-path
effect takes over when the range errors increase, leading to a large undershoot.
When the radio range (connectivity) is increased, more nodes can be reached in a single hop. This
leads to straighter paths (less overshoot), and provides more options for selecting an (incorrect) shortest
path (higher undershoot). Consequently, increasing the connectivity is not necessarily a good thing for
Sum-dist.
36.4.1.2 DV-Hop
The DV-hop method is a stable and predictable method. Since it does not use range measurements, it is
completely insensitive to this source of errors. The low relative error (5%) shows that the calibration wave
is very effective. DV-hop searches for the path with the minimum number of hops, causing the average
hop distance to be close to the radio range. The last hop on the path from an anchor to a node, however,
is usually shorter than the radio range, which leads to a slight overestimation of the node-anchor distance.
This effect is more pronounced for short paths, hence the increased error for larger radio ranges and
higher anchor fractions (i.e., fewer hops).
36.4.1.3 Euclidean
Euclidean is capable of determining the exact anchor-node distances, but only in the absence of range errors
and in highly connected networks. When these conditions are relaxed, Euclideans performance rapidly
degrades. The curves in Figure 36.4 show that Euclidean tends to underestimate the distances. The reason
is that the selection process is forced to choose between two options that are quite far apart and that in
many cases the shortest distance is incorrect. Consider Figure 36.2 again, where the shortest distance r2
falls within the radio range of the anchor. If r2 would be the correct distance then the node should be in
direct contact with the anchor avoiding the need for a selection. Therefore nodes simply have more chance
to underestimate distances than to overestimate them in the face of (small) range errors. This error can
then propagate to nodes that are multiple hops away from the anchor, causing them to underestimate the
distance to the anchor as well.
We quantied the impact of the selection bias towards short distances. Figure 36.5 shows the distribution
of the errors, relative to the true distance, on the standard scenario for Euclideans selection mechanism
(solid line) and an oracle that always selects the best distance (dashed line). The oracles distribution is
[Figure: three stacked panels plotting the relative distance error (as a fraction of the actual distance, -0.4 to 0.8) of DV-hop, Sum-dist, and Euclidean against range variance (0 to 0.5), radio range with average connectivity (8 (4.2) to 16 (15.5)), and anchor fraction (0 to 0.2); each panel shows the mean and the standard deviation.]
FIGURE 36.4 Sensitivity of Phase 1 methods: distance error (solid lines) and standard deviation (dashed lines).
[Figure: probability density (0 to 5) of the relative range error (-1 to 1) for Euclidean's selection mechanism and for an oracle that always selects the best distance.]
FIGURE 36.5 The impact of incorrect distance selection on Euclidean.
nicely centered around zero (no error) with a sharp peak. Euclidean's distribution, in contrast, is skewed
by a heavy tail at the left, signalling a bias for underestimations.
Euclidean's sensitivity to connectivity is not immediately apparent from the accuracy data in
Figure 36.4. The main effect of reducing the radio range is that Euclidean will not be able to propagate
the anchor distances. Recall that Euclidean's selection methods require at least three neighbors with a
distance estimate to advance the anchor distance one hop. In networks with low connectivity, two parts
connected only by a few links will often not be able to share anchors. This leads to problems in Phase 2,
where fewer node positions can be computed. The effects are quite pronounced, as will become clear in
Section 36.5 (see the coverage curves in Figure 36.10).
36.4.2 Phase 2: Node Position
To obtain insight into the fundamental behavior of the Lateration and Min-max algorithms we now report
on some experiments with controlled distance errors and anchor placement. The impact of actual distance
errors as produced by the Phase 1 methods will be discussed in Section 36.5.
36.4.2.1 Distance Errors
Starting from the standard scenario we select for each node the five nearest anchors, and add some noise
to the real distances. This noise is generated by first taking a sample from a normal distribution with
the actual distance as the mean and a parameterized percentage of the distance as the standard deviation.
The result is then multiplied by a bias factor. The ranges for the standard deviation and bias factor follow
from the Phase 1 measurements.
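A sketch of this controlled error model (our interpretation: we read "multiplied by a bias factor" as scaling by 1 + b, so that b = 0 leaves the estimate unbiased, consistent with the zero-centered bias axis of Figure 36.7):

```python
import random

def biased_estimate(true_dist, stddev_frac, bias):
    """Controlled Phase 2 error model: Gaussian noise whose standard
    deviation is a fraction of the true distance, followed by a
    multiplicative bias (the 1 + bias scaling is our assumption)."""
    sample = random.gauss(true_dist, stddev_frac * true_dist)
    return sample * (1.0 + bias)
```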
Figure 36.6 shows the sensitivity of Lateration and Min-max when the standard deviation percentage
was varied from 0 to 0.25, with the bias factor fixed at zero. Lateration outperforms Min-max for precise
distance estimates, but Min-max takes over for large standard deviations (≥0.15).
Figure 36.7 shows the effect of adding a bias to the distance estimates. The curves show that Lateration is
very sensitive to a bias factor, especially for precise estimates (std. dev. = 0). Min-max is rather insensitive
to bias, because stretching the bounding boxes has little effect on the position of the center. For precise
distance estimates and a small bias factor Lateration outperforms Min-max, but the bottom graph in
Figure 36.7 shows that Min-max is probably the preferred technique when the standard deviation rises
above 10%.
Although Min-max is not very sensitive to bias, we do see that Min-max performs better for a positive
range bias (i.e., an overshoot). This is a consequence of the error introduced by Min-max using a bounding
box instead of a circle around anchors. For simplicity we limit the explanation to the effects on
[Figure: position error (% of radio range, 0 to 100) of Lateration and Min-max against the range error standard deviation (0 to 0.25).]
FIGURE 36.6 Sensitivity of Phase 2 to precision.
[Figure: two panels plotting the position error (% of radio range, 0 to 100) of Lateration and Min-max against the bias factor (-0.2 to 0.2), for std. dev. = 0 (top) and std. dev. = 0.1 (bottom).]
FIGURE 36.7 Sensitivity of Phase 2 to bias.
[Figure: bounds derived from Anchor1 and Anchor2, with ranges r1, r2 and angles a1, a2.]
FIGURE 36.8 Min-max scenario.
the x-coordinate only. Figure 36.8 shows that Anchor1 making a small angle with the x-axis yields a tight
bound (to the right of the true location), and that the large angle of Anchor2 yields a loose bound (to
the left of the true location). The estimated position is off in the direction of the loose bound (to the
left). Adding a positive bias to the range estimates causes the two bounds to shift proportionally. As a
[Figure: node positions estimated by Min-max and by Lateration; anchors, unknowns, and estimated positions are marked, with dashed lines indicating the errors.]
FIGURE 36.9 Node locations computed for network topology of Figure 36.1.
consequence the center of the intersection moves into the direction of the bound with the longest range
(to the right). Consequently the estimated coordinate moves closer to the true coordinate. The opposite
will happen if the anchor with the largest angle has the longest distance. Min-max selects the strongest
bounds, leading to a preference for small angles and small distances, which increases the number of good
cases in which the coordinate moves closer to the true coordinate when a positive range bias is added.
36.4.2.2 Anchor Placement
Min-max has the advantage of being computationally cheap and insensitive to errors, but it requires
a good constellation of anchors; in particular, Savvides et al. recommend placing the anchors at the
edges of the network [14]. If the anchors cannot be placed deliberately and are instead uniformly distributed
across the network, the accuracy of the node positions at the edges is rather poor. Figure 36.9 illustrates this problem
graphically. We applied Min-max and Lateration to the example network presented in Figure 36.1. In the
case of Min-max, all nodes that lie outside the convex envelope of the four anchor nodes are drawn
inwards, yielding considerable errors (indicated by the dashed lines); the nodes within the envelope are
located adequately. Lateration, on the other hand, performs much better. Nodes at the edges are located
less accurately than interior nodes, but the magnitude of and variance in the errors are smaller than for
Min-max.
The differences in sensitivity to anchor placement between Lateration and Min-max can be considerable.
For instance, for DV-hop/Min-max in the standard scenario, the average position accuracy degrades from
43 to 77% when anchors are randomly distributed instead of the grid-based placement. The accuracy of
DV-hop/Lateration also degrades, but only from 42 to 54%.
36.5 Discussion
Now that we know the behavior of the individual phase 1 and 2 components, we can turn to the performance
effects of concatenating both phases, followed by applying Refinement in Phase 3. We will study
the sensitivity of various combinations to connectivity, anchor fraction, and range errors using both the
resulting position error and coverage.
36.5.1 Phases 1 and 2 Combined
Combining the three Phase 1 alternatives (Sum-dist, DV-hop, and Euclidean) with the two Phase 2
alternatives (Lateration and Min-max) yields a total of six possibilities. We will analyze the differences
in terms of coverage (Figure 36.10) and position accuracy (Figure 36.11). When fine-tuning localization
algorithms, the trade-off between accuracy and coverage plays an important role; dropping difficult cases
increases average accuracy at the expense of coverage.
36.5.1.1 Coverage
Figure 36.10 shows the coverage of the six Phase 1/Phase 2 combinations for varying range error (top),
radio range (middle), and anchor fraction (bottom). The solid lines denote the Lateration variants; the
dashed lines denote the Min-max variants. The first observation is that Sum-dist and DV-hop are able to
determine the range to enough anchors to position all the nodes, except in cases when the radio range
is small (≤11), or equivalently when the connectivity is low (≤7.5). In such sparse networks, Lateration
provides a slightly higher coverage than Min-max. This is caused by the sanity check on the residue.
A consistent set of anchor positions and distance estimates leads to a low residue, but the reverse does
not hold. Occasionally, when Lateration is used with an inconsistent set, an outlier with a small residue
is produced, which is then accepted. Min-max does not suffer from this problem because the positions are always
constrained by the bounding boxes and thus such outliers cannot be produced. Lateration's higher coverage
results in higher errors; see the accuracy curves in Figure 36.11.
The second observation is that Euclidean has great difficulty in achieving a reasonable coverage when
conditions are non-ideal. The combination with Min-max gives the highest coverage, but even that
combination only achieves acceptable results under ideal conditions (range variance ≤0.1, connectivity
≥15, anchor fraction ≥0.1). The reason for Euclidean's poor coverage is twofold. First, the triangles used
to propagate anchor distances are checked for validity (see Section 36.2.2); this constraint becomes more
strict as the range variance increases, hence the significant drop in coverage. Second, Euclidean can only
forward anchor distances if enough neighbors are present (see Section 36.4.1), resulting in many nodes
locating only one or two anchors. Lateration requires at least three anchors, but Min-max does not have
this requirement. This explains why the Euclidean/Min-max combination yields a higher coverage. Again,
the price is paid in terms of accuracy (cf. Figure 36.11).
36.5.1.2 Accuracy
Figure 36.11 gives the average position error of the six combinations under the same varying conditions
as for the coverage plots. To ease the interpretation of the accuracies we filtered out anomalous cases
whose coverage is below 50%, which mainly concerns Euclidean's results. The most striking observation is
that the Euclidean/Lateration combination clearly outperforms the others in the absence of range errors:
0% error versus at least 29% (Sum-dist/Min-max). This follows from the good performance of both
Euclidean and Lateration in this case (see Section 36.4). The downside is that both components were also
shown to be very sensitive to range errors. Consequently, the average position error increases rapidly if
noise is added to the range estimates; at just 2% range variance, Euclidean/Lateration loses its advantage
over the Sum-dist/Min-max combination. When the range variance exceeds 10%, DV-hop performs best.
In this scenario DV-hop achieves comparable accuracies for both Lateration and Min-max. Which Phase 2
algorithm is most appropriate depends on anchor placement, and on whether the higher computation cost
of Lateration is important.
Notice that Sum-dist/Lateration actually becomes more accurate when a small amount of range variance
is introduced, while the errors of Sum-dist/Min-max increase. This matches the results found in
Sections 36.4.1 and 36.4.2. Adding a small range error causes Sum-dist to yield more accurate distance
estimates (cf. Figure 36.4). Lateration benefits greatly from a reduced bias, but Min-max is not that
sensitive and even deteriorates slightly (cf. Figure 36.7). The combined effect is that Sum-dist/Lateration
benefits from small range errors; Sum-dist/Min-max does not show this unexpected behavior.
All six combinations are quite sensitive to the radio range (connectivity). A minimum connectivity of
9.0 is required (at radio range 12) for DV-hop and Sum-dist, in which case Sum-dist slightly outperforms
DV-hop and the difference between Lateration and Min-max is negligible. Euclidean does not perform
well because of the 10% range variance in the standard scenario.
[Figure: three stacked panels plotting the coverage (%, 0 to 100) of the six combinations of Phase 1 (DV-hop, Sum-dist, Euclidean) and Phase 2 (Lateration, Min-max) against range variance (0 to 0.5), radio range with average connectivity (8 (4.2) to 16 (15.5)), and anchor fraction (0 to 0.2).]
FIGURE 36.10 Coverage of phase 1/2 combinations.
The sensitivity to the anchor fraction is quite similar for all combinations. More anchors ease the
localization task, especially for Euclidean, but there is no hard threshold like for the sensitivity to
connectivity.
36.5.2 Phase 3: Refinement
For brevity we do not report the effects of refining the initial positions produced by all six phase 1/2
combinations, but limit the results to the three combinations proposed in the original papers: Sum-dist/
Min-max, Euclidean/Lateration, and DV-hop/Lateration (cf. Table 36.1). Figure 36.12 shows the coverage
[Figure: three stacked panels plotting the position error (% of radio range) of the six combinations of Phase 1 (DV-hop, Sum-dist, Euclidean) and Phase 2 (Lateration, Min-max) against range variance (0 to 0.5), radio range with average connectivity (8 (4.2) to 16 (15.5)), and anchor fraction (0 to 0.2).]
FIGURE 36.11 Accuracy of phase 1/2 combinations.
with (solid lines) and without (dashed lines) Refinement for the three selected combinations. Figure 36.13
shows the average position error, but only if the coverage exceeds 50%.
The most important observation is that Refinement dramatically reduces the coverage for all combinations.
For example, in the standard case (10% range variance, radio range 14, and 5% anchors) the
coverage for Sum-dist/Min-max and DV-hop/Lateration drops from 100% to a mere 51%. For the nodes
that are not rejected Refinement results in a better accuracy: the average error decreases from 42 to 23%
for DV-hop, and from 38 to 24% for Sum-dist. Other tests have revealed that Refinement does not only
improve accuracy by merely filtering out bad nodes; the initial positions of good nodes are improved
as well. A second observation is that Refinement equalizes the performance of Sum-dist and DV-hop.
[Figure: two panels plotting the coverage (%, 0 to 100) of DV-hop/Lateration, Sum-dist/Min-max, and Euclidean/Lateration against range variance (0 to 0.5) and radio range with average connectivity (8 (4.2) to 16 (15.5)), both with Refinement and with Phases 1 and 2 only.]
FIGURE 36.12 Coverage after refinement.
[Figure: two panels plotting the position error (% of radio range, 0 to 150) of DV-hop/Lateration, Sum-dist/Min-max, and Euclidean/Lateration against range variance (0 to 0.5) and radio range with average connectivity (8 (4.2) to 16 (15.5)), both with Refinement and with Phases 1 and 2 only.]
FIGURE 36.13 Accuracy after refinement.
As a consequence the simpler Sum-dist is to be preferred in combination with Refinement, to save on
computation and communication.
36.5.3 Communication Cost
The network simulator maintains statistics about the messages sent by each node. Table 36.2 presents a
breakdown per message type of the three original localization combinations (with Refinement) on the
standard scenario.
The number of messages in Phase 1 (Flood + Calibration) is directly controlled by the flood limit
parameter, which is set to 4 by default. Figure 36.14 shows the message counts in Phase 1 for various flood
limits. Note that Sum-dist and DV-hop scale almost linearly; they level off slightly because information on
multiple anchors can be combined in a single message. Euclidean, on the other hand, levels off completely
because of the difficulties in propagating anchor distances, especially along long paths.
For Sum-dist and DV-hop we expect nodes to transmit a message per anchor. Note, however, that for
low flood limits the message count is higher than expected. In the case of DV-hop, the count also includes
the calibration messages. With some fine-tuning the number of calibration messages can be limited to
one, but the current implementation needs about as many messages as the flooding itself. A second factor
that increases the number of messages for DV-hop and Sum-dist is the update information to be sent
TABLE 36.2 Average Number of Messages Per Node

Type          Sum-dist   DV-hop   Euclidean
Flood            4.3       2.2       3.5
Calibration       -        2.6        -
Refinement       32        29        20
[Figure: two panels plotting, against the flood limit (0 to 10), the number of messages per node in Phase 1 (0 to 8, top) and the position error (% of radio range, 0 to 150, bottom) for DV-hop/Lateration, Sum-dist/Min-max, and Euclidean/Lateration.]
FIGURE 36.14 Sensitivity to flood limit.
when a shorter path is detected, which happens quite frequently for Sum-dist. Finally, all three algorithms
are self-organizing, and nodes send an extra message when discovering a new neighbor that needs to be
informed of the current status.
Although the flood limit is essential for crafting scalable algorithms, it affects the accuracy; see the
bottom graph in Figure 36.14. Note that using a higher flood limit does not always improve accuracy. In
the case of Sum-dist, there is a trade-off between using few anchors with accurate distance information,
and using many anchors with less accurate information. With DV-hop, on the other hand, the distance
estimates become more accurate for longer paths (last-hop effect, see Section 36.4.1). Euclidean's error
only increases with higher flood limits because it starts with a low coverage, which also increases with
higher flood limits. DV-hop and Sum-dist reach almost 100% coverage at flood limits of 2 (Min-max)
or 3 (Lateration).
With the flood limit set to 4, the nodes send about 4 messages during Phase 1 (cf. Table 36.2). This
is comparable to the three messages needed by a centralized algorithm: set up a spanning tree, collect
range information, and distribute node positions. Running Refinement in Phase 3, on the other hand,
is extremely expensive, requiring 20 (Euclidean) to 32 messages (Sum-dist). The problem is that Refinement
takes many iterations before local convergence criteria decide to terminate. We added a limit to the
number of Refinement messages a node is allowed to send. The effect of this is shown in Figure 36.15.
A Refinement limit of 0 means that no refinement messages are sent, and Refinement is skipped completely.
The position errors in Figure 36.15 show that most of the effect of Refinement takes place in the first
few iterations, so hard limiting the iteration count is a valid option. For example, the accuracy obtained by
DV-hop without Refinement is 42% and it drops to 28% after two iterations; an additional 4% drop can
be achieved by waiting until Refinement terminates based on the local stopping criteria, but this requires
another 27 messages (29 in total). Thus the communication cost of Refinement can effectively be reduced
to less than the costs for Phase 1. Nevertheless, the poor coverage of Refinement limits its practical use.
36.5.4 Recommendations
From the previous discussion it follows that no single combination of phase 1, 2, and 3 alternatives
performs best under all conditions; each combination has its strengths and weaknesses. The results
[Figure 36.15: two panels plotting coverage [%] and position error [%r] against the Refinement limit (0 to 10) for DV-hop/Lateration, Sum-dist/Min-max, and Euclidean/Lateration, each followed by Refinement.]
FIGURE 36.15 Effect of Refinement limit.
TABLE 36.3 Comparison; Anchor Fraction Fixed at 5%, No Refinement

Range                            Radio range (avg. connectivity)
variance   16 (15.5)             14 (12.1)             12 (9.0)            10 (6.4)            8 (4.2)
0          Euclidean/Lateration  Euclidean/Lateration  Sum-dist/Min-max    Sum-dist/Min-max    DV-hop/Lateration
0.025      Sum-dist/Lateration   Sum-dist/Min-max      Sum-dist/Min-max    Sum-dist/Min-max    DV-hop/Lateration
0.05       Sum-dist/Lateration   Sum-dist/Lateration   Sum-dist/Min-max    Sum-dist/Min-max    DV-hop/Lateration
0.1        Sum-dist/Min-max      Sum-dist/Lateration   Sum-dist/Min-max    Sum-dist/Min-max    DV-hop/Lateration
0.25       DV-hop/Lateration     DV-hop/Min-max        DV-hop/Min-max      DV-hop/Min-max      DV-hop/Lateration
0.5        DV-hop/Lateration     DV-hop/Min-max        DV-hop/Min-max      DV-hop/Min-max      DV-hop/Lateration
presented in Section 36.5 follow from changing one parameter (radio range, range variance, and anchor
fraction) at a time. Since the sensitivity of the localization algorithms may not be orthogonal in the three
parameters, it is difficult to derive general recommendations. Therefore, we conducted an exhaustive
search for the best algorithm in the three-dimensional parameter space. For readability we do not present
the raw outcome, a 6×6×5 cube, but show a two-dimensional slice instead. We found that the localization
algorithms are the least sensitive to the anchor fraction, so Table 36.3 presents the results of varying the
radio range and the range variance, while keeping the anchor fraction fixed at 5%. In each case we list the
algorithm that achieves the best accuracy (i.e., the lowest average position error) under the condition that
its coverage exceeds 50%. Since Refinement often results in very poor coverage, we only examine Phases 1
and 2 here.
The exhaustive parameter search, and basic observations about Refinement, lead to the following
recommendations:
1. Euclidean should always be used in combination with Lateration, but only if distances can be
measured very accurately (range variance <2%) and the network has a high connectivity (≥12).
When the anchor fraction is increased, Euclidean captures some more entries in the left-upper
corner of Table 36.3, and the conditions on range variance and connectivity can be relaxed slightly.
Nevertheless, the window of opportunity for Euclidean/Lateration is rather small.
2. DV-hop should be used when there are no or poor distance estimates, for example, those obtained
from the signal strength (cf. the bottom rows in Table 36.3). Our results show that DV-hop
outperforms the other methods when the range variance is large (>10% in this slice). The presence
of Lateration in the last column, that is, with a very low connectivity, is an artifact caused by
the filtering on coverage. DV-hop/Min-max has a coverage of 49% in this case (versus 56% for
DV-hop/Lateration), but also a much lower error. Regarding the issue of combining DV-hop with
Lateration or Min-max, we observe that overall, Min-max is the preferred choice. Recall, however,
its sensitivity to anchor placement, which leads to large errors at the edges of the network.
Distributed Localization Algorithms 36-21
3. Sum-dist performs best in the majority of cases, especially if the anchor fraction is (slightly)
increased above 5%. Increasing the number of anchors reduces the average path length between
nodes and anchors, limiting the accumulation of range errors along multiple hops. Except for a few
corner cases, Sum-dist performs best in combination with Min-max. In scenarios with very low
connectivity and a low anchor fraction, Sum-dist tends to overestimate the distance significantly.
Therefore DV-hop performs better in the far right column of Table 36.3.
4. Refinement can be used to improve the accuracy of the node positions when the range estimates
between neighboring nodes are quite accurate. The best results are obtained in combination with
DV-hop or Sum-dist, but at a significant (around 50%) drop in coverage. This renders the usage
of Refinement questionable, despite its modest communication overhead.
A final important observation is that the localization problem is still largely unsolved. In ideal conditions
Euclidean/Lateration performs fine, but in all other cases it suffers from severe coverage problems.
Although Refinement uses the extra information of the many neighbor-to-neighbor ranges and reduces
the error, it too suffers from coverage problems. Under most conditions, there is still significant room for
improvement.
36.6 Conclusions
This chapter addressed the issue of localization in ad-hoc wireless sensor networks. From the known
localization algorithms specifically proposed for sensor networks, three approaches were selected that
meet basic requirements of self-organization, robustness, and energy-efficiency: Ad-hoc positioning [11],
Robust positioning [13], and N-hop multilateration [14]. Although these three algorithms were developed
independently, they share a common structure. We were able to identify a generic, 3-phase approach to
determine the individual node positions consisting of the steps below:
1. Determine the distances between unknowns and anchor nodes.
2. Derive for each node a position from its anchor distances.
3. Refine the node positions using information about the range to, and positions of, neighboring
nodes.
We studied three Phase 1 alternatives (Sum-dist, DV-hop, and Euclidean), two Phase 2 alternatives
(Lateration and Min-max), and an optional Refinement procedure for Phase 3. To this end the discrete
event simulator developed by Savarese et al. [13] was extended to allow for the execution of an arbitrary
combination of alternatives.
Section 36.4 dealt with Phase 1 and Phase 2 in isolation. For Phase 1 alternatives, we studied the
sensitivity to range errors, connectivity, and fraction of anchor nodes (with known positions). DV-hop
proved to be stable and predictable; Sum-dist and Euclidean showed tendencies to underestimate the
distances between anchors and unknowns. Euclidean was found to have difficulties in propagating distance
information under non-ideal conditions, leading to low coverage in the majority of cases. The results for
Phase 2 showed that Lateration is capable of obtaining very accurate positions, but also that it is very
sensitive to the accuracy and precision of the distance estimates. Min-max is more robust, but is sensitive
to the placement of anchors, especially at the edges of the network.
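Min-max's simplicity explains its robustness: each anchor constrains the node to a square of side twice the estimated distance, and the center of the intersection of those squares is taken as the position. A minimal sketch (the function name and data layout are ours, not the chapter's):

```python
def min_max(anchors):
    """Min-max (bounding-box) position estimate.
    anchors: list of ((x, y), estimated distance) pairs."""
    # intersect the per-anchor squares [x-d, x+d] x [y-d, y+d]
    x_lo = max(x - d for (x, _), d in anchors)
    x_hi = min(x + d for (x, _), d in anchors)
    y_lo = max(y - d for (_, y), d in anchors)
    y_hi = min(y + d for (_, y), d in anchors)
    # center of the intersection box
    return ((x_lo + x_hi) / 2, (y_lo + y_hi) / 2)
```

With exact distances to anchors at (0,0), (10,0), and (0,10), a node at (2,3) is estimated within about half a unit; range errors shrink or grow the boxes but rarely make the estimate collapse, which is why Min-max degrades gracefully where Lateration does not.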
In Section 36.5 we compared all six Phase 1/2 combinations under different conditions. No single
combination performs best; which algorithm is to be preferred depends on the conditions (range errors,
connectivity, anchor fraction, and placement). The Euclidean/Lateration combination [11] should be used
only in the absence of range errors (variance <2%) and requires a high node connectivity. The DV-hop/
Min-max combination, which is a minor variation on the DV-hop/Lateration approach proposed in [11]
and [13], performs best when there are no or poor distance estimates, for example, those obtained from
the signal strength. The Sum-dist/Min-max combination [14] is to be preferred in the majority of other
conditions. The benefit of running Refinement in Phase 3 is considered to be questionable since in many
cases the coverage dropped by 50%, while the accuracy only improved significantly in the case of small
range errors. The communication overhead of Refinement was shown to be modest (about 2 messages per node)
in comparison to the controlled flooding of Phase 1 (about 4 messages per node).
36.6.1 Future Work
Regarding the future, the ultimate distributed localization algorithm is yet to be devised. Under ideal
circumstances Euclidean/Lateration performs fine, but in all other cases there is significant room for
improvement. Furthermore, additional effort is needed to bridge the gap between simulations and real-
world localization systems. For instance, we need to gather more data on the actual behavior of sensor
nodes, particularly with respect to physical effects like multipath, interference, and obstruction.
Acknowledgments
This work was first published in Elsevier Computer Networks [24]. We thank Elsevier for giving us
permission to reproduce the material. We also thank Andreas Savvides and Dragos Niculescu for their
input and for sharing their code with us.
References
[1] I. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci. A survey on sensor networks. IEEE Communications Magazine, 40: 102–114, 2002.
[2] S. Atiya and G. Hager. Real-time vision-based robot localization. IEEE Transactions on Robotics and Automation, 9: 785–800, 1993.
[3] J. Leonard and H. Durrant-Whyte. Mobile robot localization by tracking geometric beacons. IEEE Transactions on Robotics and Automation, 7: 376–382, 1991.
[4] R. Tinos, L. Navarro-Serment, and C. Paredis. Fault tolerant localization for teams of distributed robots. In IEEE International Conference on Intelligent Robots and Systems, Vol. 2, Maui, HI, October 2001, pp. 1061–1066.
[5] J. Hightower and G. Borriello. Location systems for ubiquitous computing. IEEE Computer, 34: 57–66, 2001.
[6] N. Bulusu, J. Heidemann, and D. Estrin. GPS-less low-cost outdoor localization for very small devices. IEEE Personal Communications, 7: 28–34, 2000.
[7] S. Capkun, M. Hamdi, and J.-P. Hubaux. GPS-free positioning in mobile ad-hoc networks. Cluster Computing, 5: 157–167, 2002.
[8] J. Chen, K. Yao, and R. Hudson. Source localization and beamforming. IEEE Signal Processing Magazine, 19: 30–39, 2002.
[9] L. Doherty, K. Pister, and L. El Ghaoui. Convex position estimation in wireless sensor networks. In IEEE Infocom 2001, Anchorage, AK, April 2001.
[10] T. He, C. Huang, B. M. Blum, J. A. Stankovic, and T. Abdelzaher. Range-free localization schemes for large scale sensor networks. In ACM International Conference on Mobile Computing and Networking (Mobicom), San Diego, CA, September 2003, pp. 81–95.
[11] D. Niculescu and B. Nath. Ad-hoc positioning system. In IEEE GlobeCom, San Antonio, TX, November 2001, pp. 2926–2931.
[12] V. Ramadurai and M. Sichitiu. Localization in wireless sensor networks: a probabilistic approach. In International Conference on Wireless Networks (ICWN), Las Vegas, NV, June 2003, pp. 275–281.
[13] C. Savarese, K. Langendoen, and J. Rabaey. Robust positioning algorithms for distributed ad-hoc wireless sensor networks. In USENIX Technical Annual Conference, Monterey, CA, June 2002, pp. 317–328.
[14] A. Savvides, H. Park, and M. Srivastava. The bits and flops of the n-hop multilateration primitive for node localization problems. In Proceedings of the First ACM International Workshop on Wireless Sensor Networks and Application (WSNA), Atlanta, GA, September 2002, pp. 112–121.
[15] N. Priyantha, A. Chakraborty, and H. Balakrishnan. The cricket location-support system. In Proceedings of the 6th ACM International Conference on Mobile Computing and Networking (Mobicom), Boston, MA, August 2000, pp. 32–43.
[16] P. Bahl and V. Padmanabhan. RADAR: an in-building RF-based user location tracking system. In Infocom, Vol. 2, Tel Aviv, Israel, March 2000, pp. 575–584.
[17] J. Hightower, R. Want, and G. Borriello. SpotON: an indoor 3D location sensing technology based on RF signal strength. UW CSE 00-02-02, University of Washington, Department of Computer Science and Engineering, Seattle, WA, February 2000.
[18] J. Zhao and R. Govindan. Understanding packet delivery performance in dense wireless sensor networks. In Proceedings of the First International Conference on Embedded Networked Sensor Systems (SenSys), Los Angeles, CA, November 2003, pp. 1–13.
[19] A. Savvides, C.-C. Han, and M. Srivastava. Dynamic fine-grained localization in ad-hoc networks of sensors. In Proceedings of the 7th ACM International Conference on Mobile Computing and Networking (Mobicom), Rome, Italy, July 2001, pp. 166–179.
[20] L. Girod and D. Estrin. Robust range estimation using acoustic and multimodal sensing. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Maui, HI, October 2001.
[21] Y. Xu, J. Heidemann, and D. Estrin. Geography-informed energy conservation for ad-hoc routing. In Proceedings of the 7th ACM International Conference on Mobile Computing and Networking (Mobicom), Rome, Italy, 2001, pp. 70–84.
[22] A. Varga. The OMNeT++ discrete event simulation system. In European Simulation Multiconference (ESM2001), Prague, Czech Republic, June 2001.
[23] K. Whitehouse and D. Culler. Calibration as parameter estimation in sensor networks. In Proceedings of the 1st ACM International Workshop on Wireless Sensor Networks and Application (WSNA), Atlanta, GA, September 2002, pp. 59–67.
[24] K. Langendoen and N. Reijers. Distributed localization in wireless sensor networks: a quantitative comparison. Elsevier Computer Networks, 43: 499–518, 2003.
37
Routing in Sensor Networks

Shashidhar Gandham and Ravi Musunuri
University of Texas at Dallas

Udit Saxena
Microsoft Corporation

37.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37-1
37.2 Routing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37-2
      Flat Routing Protocols • Cluster-Based Routing Protocols
37.3 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37-7
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37-7
37.1 Introduction
Sensor networks are expected to be deployed in large numbers for applications such as environmental
monitoring, surveillance, security, and precision agriculture [1–4]. Each sensor node consists of a sensing
device, a processor with limited computational capabilities, memory, and a wireless transceiver. These nodes
are typically deployed in inaccessible terrain in an ad hoc manner. Once deployed, each sensor node
is expected to periodically monitor its surrounding environment and detect the occurrence of some
predetermined events. For example, in a sensor network deployed for monitoring forest fires, any sudden
surge in the temperature of the surrounding area would be an event of interest. Similarly, in a sensor
network deployed for surveillance, any moving object in the surroundings would be an event of interest.
On detecting an event, a sensor node is expected to report the details of the event to a base station associated
with the sensor network. In most cases, the base station might not be in direct reach of the reporting
nodes. Hence, sensor nodes need to form a multihop wireless network to reach the base station. A medium
access control protocol and a routing protocol are essential in setting up such a wireless network. In this
chapter we present an overview of the design challenges associated with routing in sensor networks and some
existing routing protocols.
Each sensor node is powered by limited, battery-supplied energy. Nodes drain their
energy in carrying out local tasks and in communicating with neighboring nodes. The amount of energy spent
in communication is known to be orders of magnitude higher than the amount spent in local tasks [5]. As
stated earlier, sensor nodes are expected to be deployed in inaccessible terrain, and it might not be feasible
to replenish the energy available at each node. Thus, the energy available at sensor nodes is an important
design constraint in routing.
37.2 Routing
Each sensor node is expected to monitor some environmental phenomenon and forward the corresponding
data toward the base station. To forward the data packets, each node needs to have routing
information. Here, we would like to state that the flow of packets is mostly directed from sensor nodes
toward the base station. As a result, each sensor node need not maintain explicit routing tables. Routing
protocols can in general be divided into flat routing and cluster-based routing protocols.
37.2.1 Flat Routing Protocols
In flat routing protocols the nodes in the network are considered to be homogeneous. Each node in
the network participates in route discovery, maintenance, and forwarding of the data packets. Here, we
describe a few existing flat routing protocols for sensor networks.
37.2.1.1 Sequential Assignment Routing
Sequential Assignment Routing (SAR) [5] takes into consideration the energy and Quality of Service
(QoS) of each path, and the priority level of each packet, when making routing decisions. Every node
maintains multiple paths to the sink to avoid the overhead of route recomputation after a node or
link failure. Multiple paths are built by constructing multiple trees, each rooted at a one-hop neighbor of the sink.
Each tree is grown outward from the one-hop sink neighbors by successively adding nodes, while avoiding
nodes with low QoS and energy reserves. Each sensor node can control which of its neighbors may
be used for relaying a message. Each node associates two parameters, an additive QoS metric and an energy
measure, with every path. Energy is measured by estimating the maximum number of packets that can
be routed without energy being depleted if the node uses that path exclusively. SAR then calculates a
weighted QoS metric as the product of the additive QoS metric and a weighted coefficient associated with
the priority level of the packet. The SAR algorithm attempts to minimize the average weighted QoS metric
over the lifetime of the network. A periodic recomputation of paths is triggered by the sink to account for
any changes in the topology. Failure recovery is done by a handshaking procedure between neighbors.
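The path-selection rule above can be sketched as follows. The function and parameter names are ours, and the per-path energy measure is simplified to a remaining-packet count; this is an illustration of the weighted-metric idea, not SAR's actual implementation:

```python
def weighted_metric(additive_qos, priority_coeff):
    # SAR-style weighted QoS metric: additive QoS metric of the path
    # times a coefficient tied to the packet's priority level
    return additive_qos * priority_coeff

def choose_path(paths, priority_coeff):
    """paths: list of (path_id, additive_qos, energy_left_in_packets).
    Pick the path minimizing the weighted metric among paths that can
    still carry at least one packet. (Sketch; names are ours.)"""
    viable = [p for p in paths if p[2] > 0]
    return min(viable, key=lambda p: weighted_metric(p[1], priority_coeff))[0]
```

A high-priority packet (large coefficient) scales all path metrics equally here; in SAR the coefficient interacts with the per-path metrics so that high-priority traffic is steered onto high-QoS paths.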
37.2.1.2 Directed Diffusion
Estrin et al. [2] proposed a diffusion-based scheme for routing queries from the base station to sensor nodes
and forwarding the corresponding replies. In directed diffusion, attribute-based naming is used by the
sensor nodes. Each sensor names the data it generates using one or more attributes. A sink may query for
data by disseminating interests. Intermediate nodes propagate these interests. Interests establish gradients
of data toward the sink that expressed the interest. For example, a seismic sensor may generate a datum:
type = seismic, id = 12, location = NE, time stamp = 01.01.01, footprint = vehicle/wheeled/over 40 tons.
A sink may send an interest of the form: type = seismic, location = NE. The intermediate nodes then
propagate an interest for vehicle data in the NE quadrant toward the approximate direction. The
strength of the gradient may differ toward different neighbors, resulting in different amounts of
information flow.
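The attribute matching underlying interest dissemination can be sketched as a simple subset test: a datum satisfies an interest when every attribute the interest names agrees. The dictionary layout is our illustration, not the actual directed diffusion API:

```python
def matches(interest, datum):
    # an interest matches a datum when every attribute it names
    # has the same value in the datum (the datum may carry more)
    return all(datum.get(k) == v for k, v in interest.items())

datum = {"type": "seismic", "id": 12, "location": "NE",
         "footprint": "vehicle/wheeled/over 40 tons"}
interest = {"type": "seismic", "location": "NE"}
```

Here `matches(interest, datum)` holds, so a node carrying this datum would send it down the gradient toward the sink that expressed the interest.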
37.2.1.3 Minimum Cost Forwarding Algorithm for Large Sensor Networks
The minimum cost forwarding approach proposed by Ye et al. [6] exploits the fact that the data flow in
sensor networks is in a single direction and is always toward the fixed base station. Their method requires
sensor nodes neither to have unique identities nor to maintain routing tables to forward messages. Each
node maintains the least-cost estimate from itself to the base station. Each message to be forwarded is
broadcast by the node. On receiving a message, a node checks if it is on the least-cost path between
the source sensor node and the base station. If so, it forwards the message by broadcasting it in turn.
In principle, the concept behind minimum cost forwarding is similar to the gravity field that drives
water from the top of a mountain to the ground. At each point water flows from a high position to a low
position along the shortest path. For this algorithm to work, each node needs to have the least-cost estimate
from itself to the base station. The base station broadcasts an advertisement message with the cost set to
zero. Every node initially has its estimate set to infinity. On receiving an advertisement message, a node
checks whether the estimate in the message plus the cost of the link on which it was received is less than its
current estimate. If so, both the current estimate and the estimate in the advertisement message are updated. If
the received advertisement message was updated with a new cost estimate, it is forwarded; otherwise it is purged.
As a result of forwarding an advertisement message immediately after updating, the authors noticed that
some nodes get multiple updates and do multiple forwards as lower cost estimates flow in. Furthermore,
the nodes far away from the base station get more updates than those close to the base station. To avoid
this instability during the setup phase, a back-off algorithm was proposed. According to this back-off
algorithm, on updating the current cost estimate, the advertisement message is not forwarded for A · C_node
units of time, where A is a constant determined through simulations and C_node is the cost of the link on
which the advertisement message was received.
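The setup phase with back-off can be sketched as an event-driven simulation. This is a hypothetical sketch under the stated rule (a node that improves its estimate defers its re-broadcast by A times the cost of the link the advertisement arrived on); all names are ours:

```python
import heapq

def setup_costs(links, base, A=1.0):
    """Sketch of the cost-advertisement phase with back-off.
    links: dict node -> list of (neighbor, link_cost)."""
    est = {n: float("inf") for n in links}
    est[base] = 0.0
    events = [(0.0, base)]            # (broadcast time, node)
    while events:
        t, u = heapq.heappop(events)  # earliest pending broadcast
        for v, c in links[u]:
            if est[u] + c < est[v]:   # better estimate: update, then
                est[v] = est[u] + c   # defer re-broadcast by A * c
                heapq.heappush(events, (t + A * c, v))
    return est
```

Because cheap links re-broadcast sooner, good estimates tend to propagate first and the number of redundant re-broadcasts drops, which mirrors the stabilizing effect of the back-off described above.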
37.2.1.4 Flow-Based Routing Protocol
In Reference 7, the authors modeled the sensor network as a flow network and proposed an
Integer Linear Program (ILP) [8] based routing method. The objective of this ILP-based method is to
minimize the maximum energy spent by any sensor node during a period of time. Through simulation
results, the authors showed that their ILP-based routing heuristic increases the lifetime of the sensor
network significantly.
In the above-mentioned study, the authors observed that the sensor nodes that are one hop away from
a base station (sink) drain their energy much earlier than other nodes in the network. As a result, the base
station becomes disconnected from the network. To address this problem, the deployment of multiple, intermittently
mobile base stations was proposed. The operation time of the sensor network was split into equal periods of
time referred to as rounds. Modifications were made to the flow network model and the ILP such that
solving it gives the locations of the base stations in addition to the routing information. The ILP with modifications
is given below:
Minimize E_max, subject to:

  Σ_{j∈N(i)} x_ij − Σ_{k∈N(i)} x_ki = T,   ∀i ∈ V_s   (37.1)
  E_t Σ_{j∈N(i)} x_ij + E_r Σ_{k∈N(i)} x_ki ≤ α·RE_i,   ∀i ∈ V_s   (37.2)
  Σ_{l∈V_f} y_l ≤ K_max   (37.3)
  Σ_{i∈V_s} x_ik ≤ T·|V_s|·y_k,   ∀k ∈ V_f   (37.4)
  E_t Σ_{j∈N(i)} x_ij + E_r Σ_{k∈N(i)} x_ki ≤ E_max,   ∀i ∈ V_s   (37.5)
  x_ij ≥ 0, ∀i ∈ V_s, j ∈ V;   y_k ∈ {0, 1}, ∀k ∈ V_f   (37.6)

In formulating the above ILP, the sensor network is represented as a graph G(V, E) where (1) V = V_s ∪ V_f,
where V_s represents the sensor nodes and V_f represents the feasible base station sites; and (2) E ⊆ V × V
represents the set of wireless links. The 0–1 integer variables y_l are defined such that for each l ∈ V_f,
y_l = 1 if a base station is located at feasible site l, and 0 otherwise. N(i) = {j : (i, j) ∈ E}. K_max and RE_i
represent the maximum number of base stations available and the residual energy of node i, respectively;
x_ij denotes the number of packets forwarded from node i to node j during a round, and E_t and E_r denote
the energy spent in transmitting and receiving one packet, respectively. Given G(V, E), α, and K_max,
the above ILP, denoted by BSL_mm(G, α, K_max), minimizes the maximum energy spent, E_max, by a sensor
node in a round. For a detailed explanation of the ILP, we refer the readers to Reference 7.
Apart from increasing the lifetime of the network, the authors argued that multiple, mobile base stations
would decrease the average hop length traveled by each packet and increase the robustness of the system.
37.2.1.5 Sensor Protocols for Information via Negotiation
Kulik and coworkers [9] proposed a set of protocols to disseminate individual sensor information to all the sensor
nodes. Sensor Protocols for Information via Negotiation (SPIN) overcomes information implosion and
overlap by using negotiation and information descriptors (metadata). Classic flooding suffers from the
problem of implosion in that the information is sent to all nodes regardless of whether they have already
seen that information or not. Another problem is that of overlap of information, where two pieces of
information might have some components in common, so it might be sufficient to just forward the
information after removing the common part. SPIN uses three kinds of messages to communicate:
ADV: when a node has data to send, it advertises the data using this message.
REQ: a node sends this message when it wishes to receive some data.
DATA: a data message contains the data with a metadata header.
The details are as follows:
1. SPIN-PP. This protocol is designed for point-to-point communication, assuming that two nodes can
communicate with each other without interfering with other nodes' communication. This protocol also
assumes that energy is not a constraint and that packets are never lost. The protocol works on a hop-by-hop
basis. A node that has information to send advertises this by sending an ADV to its neighboring nodes.
The nodes that are interested in receiving this information express their interest by sending a REQ. The
originator of the ADV then sends the data to the nodes that sent a REQ. These nodes then send ADV
messages to their neighbors and the process repeats itself.
2. SPIN-EC. This protocol adds an energy heuristic to the previous protocol. A node participates in the
process only if it can complete all the stages in the protocol without going below a low-energy threshold.
3. SPIN-BC. This protocol was defined for broadcast channels. The advantage is that all nodes within
hearing range can hear a broadcast, while the disadvantage is that nodes have to desist from transmitting
if the channel is already in use. Another difference from the previous protocols is that nodes do not
immediately send out REQ messages on hearing an ADV. Each node sets a random timer and, on expiry
of that timer, sends out the REQ message. The other nodes whose timers have not yet expired cancel them on
hearing the request, thus preventing redundant copies of the request being sent.
4. SPIN-RL. This protocol was designed for lossy broadcast channels by incorporating two adjustments.
First, each node keeps track of the advertisements it receives and re-requests data if a response from the
requested node is not received within a specified time interval. Second, nodes limit the frequency with
which they will resend data: every node waits for a predetermined time period before servicing requests
for the same piece of data again.
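The ADV/REQ/DATA handshake of SPIN-PP can be sketched as a small round-based simulation. The simulation, its names, and the message count it reports are our illustration; the key point is that a node whose metadata store already contains the advertised item sends no REQ, which is what suppresses implosion:

```python
def spin_pp(neighbors, source, meta, data):
    """Sketch of the SPIN-PP handshake under ideal (lossless) links.
    neighbors: dict node -> list of neighbor nodes."""
    store = {n: {} for n in neighbors}    # per-node metadata -> data
    store[source][meta] = data
    frontier, messages = [source], 0
    while frontier:
        nxt = []
        for u in frontier:
            for v in neighbors[u]:
                messages += 1             # ADV carries metadata only
                if meta not in store[v]:  # new to v: three-way exchange
                    messages += 2         # REQ + DATA
                    store[v][meta] = data
                    nxt.append(v)
        frontier = nxt                    # freshly served nodes advertise next
    return store, messages
```

On a three-node chain A–B–C this costs two full handshakes plus two suppressed ADVs, whereas classic flooding would deliver redundant full copies.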
Multihop flat routing can also be subdivided according to the signal processing technique used. There
are two types of cooperative signal processing: noncoherent and coherent. For noncoherent
processing, raw data is preprocessed at the node itself before being forwarded to the Central Node (CN)
for further processing. For coherent processing, the data is forwarded to the CN after only minimal
processing at the node, such as time stamping. Thus, for energy efficiency,
algorithmic techniques assume importance for noncoherent processing since the data traffic is low, while path
optimality is important for coherent processing.
37.2.1.6 Geographic Routing Protocols
Geographic routing protocols are based on the assumption that each node is aware of the geographical location
of its neighbors and of the destination node. There are many known location determination algorithms
[10,11] that enable sensor nodes to learn their location once deployed. On determining
its position, each node can inform its neighbors about its location. In addition, the data flow in sensor
networks is mostly directed toward a base station, whose position can be sent to the nodes on deployment.
The basic idea in geographic routing protocols is to forward packets to a neighbor that is closer to the destination.
Every node employs the same forwarding strategy until the packet reaches the destination node.
It is known that this simple packet-forwarding strategy suffers from the local minimum phenomenon [12]:
packets might reach a node whose neighbors are all farther away from the destination. They are then
stuck, with no further node to which they can be forwarded.
Karp and Kung [12] proposed the right-hand rule to overcome the local minimum phenomenon. They
assume that the underlying connectivity graph is planar. When a packet gets stuck at a node, they propose
to forward the packet along the face of the graph in the counterclockwise direction. Face routing is employed
until the packet reaches a node that is closer to the destination. Fang et al. [13] show that the local minimum
phenomenon can be addressed in nonplanar graphs too.
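Greedy geographic forwarding, and the local-minimum test that would trigger face routing, can be sketched as follows (names are ours; face routing itself is omitted):

```python
import math

def greedy_forward(pos, neighbors, src, dst):
    """Greedy geographic forwarding (sketch): hand the packet to the
    neighbor closest to the destination; report a local minimum when
    no neighbor is strictly closer (face routing would take over)."""
    def dist(a, b):
        return math.hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1])
    path, cur = [src], src
    while cur != dst:
        best = min(neighbors[cur], key=lambda n: dist(n, dst))
        if dist(best, dst) >= dist(cur, dst):
            return path, False        # stuck at a local minimum
        cur = best
        path.append(cur)
    return path, True
```

Note that each decision uses only the positions of the current node, its neighbors, and the destination; this locality is what makes geographic routing attractive for large sensor networks.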
37.2.1.7 Parametric Probabilistic Routing
In the parametric probabilistic routing protocol proposed by Barrett et al. [14], each node forwards a packet
based on a probability density function. Barrett et al. proposed two variations of their protocol. In the
first variation, referred to as the Destination Attractor, the probability with which a packet is forwarded
to a neighbor depends on the number of hops the source node is from the destination and the number
of hops the current node is from the destination. The basic idea behind this variation is to increase the
probability of retransmission if the packet is approaching the destination, and to decrease the probability
of retransmission if the packet is moving away from the destination. The second variation, referred to as
Directed Transmission, uses the number of hops already traversed by the packet in addition to the two parameters
used by the Destination Attractor. In directed transmission, nodes on the shortest path to the destination
retransmit with higher probability.
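The Destination Attractor intuition can be illustrated with a toy probability function. This is only an illustrative shape based on hop counts, not the density actually used by Barrett et al.:

```python
def destination_attractor_prob(h_src, h_cur, p_min=0.1, p_max=1.0):
    """Illustrative retransmission probability in the spirit of the
    Destination Attractor (not the authors' exact density).
    h_src, h_cur: hop counts to the destination from the original
    source and from the current node."""
    if h_cur == 0:
        return p_max                  # packet has arrived
    if h_cur < h_src:
        return p_max                  # packet made progress: keep it alive
    if h_cur == h_src:
        return (p_min + p_max) / 2    # no net progress yet
    return p_min                      # packet moved away: likely drop
```

Any monotone function of the progress (h_src − h_cur) would serve the same purpose; the protocol's behavior is tuned through the shape of this density.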
37.2.1.8 MinMinMax, an Energy Aware Routing Protocol
Gandham et al. [15] formulated energy-aware routing during a round as described below. The
sensor network is represented as a graph G(V, E) where
1. V = V_s ∪ V_b, where V_s is the set of sensor nodes and V_b is the set of base station(s).
2. E ⊆ V × V represents the set of wireless links.
A round is assumed to consist of T time frames, and each sensor node generates one packet of data in
every time frame. At the beginning of a round, the residual energy at a sensor node i is represented by RE_i.
During a round, the total energy spent by sensor node i can be at most α·RE_i, where α (0 < α ≤ 1) is
a parameter. The goal is to determine routing information so as to minimize the total energy spent in the
network such that the maximum energy spent by a node in a round is minimized.
It is known that the energy spent by a node is directly proportional to the amount of flow (number
of packets) passing through the node. Thus, minimizing the maximum energy spent by a node is the same
as minimizing the maximum flow through a node. Exploiting this fact, energy-aware routing is cast as a
variant of the maximum flow problem [16]. In the maximum flow problem [16], we are given a directed graph
G(V, E), a supply node S_s, a demand node S_d, and a capacity u_ij for each link (i, j) ∈ E; we must determine
the flow x_ij on each arc (i, j) ∈ E such that the net outflow from the supply node is maximized. We refer
the readers to Reference 15 for the details.
37.2.2 Cluster-Based Routing Protocols
In cluster-based routing protocols, special nodes referred to as cluster heads discover and maintain routes,
and non-cluster-head nodes join one of the clusters. All the data packets originating in a cluster are
forwarded toward the cluster head. The cluster head in turn forwards these packets toward the destination
using the routing information. Here, we describe some cluster-based routing protocols from the literature.
37.2.2.1 Low-Energy Adaptive Clustering Hierarchy
Chandrakasan and coworkers [17] proposed Low-Energy Adaptive Clustering Hierarchy (LEACH) as an energy-efficient communication protocol for wireless sensor networks. The authors of LEACH claim that this protocol will extend the life of a wireless sensor network by a factor of 8 when compared with protocols based on multihop routing and static clustering. LEACH is a cluster-based routing algorithm in which self-elected cluster heads collect data from all the sensor nodes in their cluster, aggregate the
2006 by Taylor & Francis Group, LLC
37-6 Embedded Systems Handbook
collected data by data fusion methods, and transmit the data directly to the base station. These self-elected cluster heads continue to be cluster heads for a period referred to as a round. At the beginning of each round, every node determines whether it will be a cluster head during the current round. If it decides to be a cluster head for the current round, it announces its decision to its neighbors. On listening to these announcements, the other nodes opt to join one of the cluster heads based on predetermined parameters, such as signal-to-noise ratio.
LEACH is proposed for routing data in wireless sensor networks that have a fixed base station to which the recorded data needs to be routed. All the sensor nodes are considered to be static, homogeneous, and energy constrained. The sensor nodes are expected to sense the environment continuously and thus have data to send at a fixed rate. This assumption makes LEACH unsuitable for sensor networks in which a moving source needs to be monitored. Furthermore, radio channels are assumed to be symmetric; symmetric here means that the energy required to transmit a particular message between two nodes is the same in either direction. A first-order radio model is assumed to describe the transmission characteristics of the sensor nodes. In this model the energy required to transmit a signal has a fixed part and a variable part, the variable part being directly proportional to the square of the distance. Some constant energy is required to receive a signal at any receiving antenna. Based on these assumptions, it is clear that routing the data through many intermediate nodes might consume more energy, from a global perspective, than direct transmission to the base station. This argument supports the decision to transmit the aggregated data directly from cluster head to base station.
The key features of LEACH are localized coordination for cluster setup and operation, randomized rotation of cluster heads, and local fusion of data to reduce global communication costs. LEACH is organized into rounds, where each round starts with a setup phase followed by a longer steady-state data transfer phase. Here we describe the various subphases involved in both these phases:
1. Advertisement phase. A predetermined fraction of nodes, say p, elect themselves as cluster heads. The optimum value of p can be found from a plot of normalized energy dissipation against the percentage of nodes acting as cluster heads; for a detailed description of this procedure we refer the reader to Reference 18. The decision to be a cluster head is made by choosing a random number between 0 and 1. If the generated number is less than a threshold T(n), then the node will be a cluster head for the current round. The threshold T(n) is given by the expression p/[1 − p·(r mod (1/p))], where r is the current round. This ensures that every node will be a cluster head once in every 1/p rounds. Once the decision is made, cluster heads advertise their ids using a CSMA MAC protocol.
2. Cluster setup phase. On listening to the advertisements of the previous phase, non-cluster-head nodes determine which cluster head to join by comparing the signal-to-noise ratios of the various cluster heads surrounding them. Each node informs the chosen cluster head of its decision to join, again using a CSMA MAC protocol.
3. Schedule creation. On receiving all these messages, the cluster head creates a TDMA schedule and announces it to all the nodes in its cluster. In order to avoid interference between nodes in adjacent clusters, the cluster head also determines the CDMA code to be used by all the nodes in its cluster. The CDMA code for the current round is transmitted along with the TDMA schedule.
4. Data transmission. Once the schedule is known, each node transmits its data during the time slot allocated to it. When the cluster head has received data from all the nodes in its cluster, it runs data fusion algorithms to aggregate the data. The resulting data is transmitted directly to the base station.
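The election rule of the advertisement phase can be sketched in a few lines of Python. This is an illustrative reading of the threshold formula T(n) = p/[1 − p·(r mod (1/p))], not the authors' code; the function names and the bookkeeping of the eligible set are our own assumptions.

```python
import random

def leach_threshold(p, r):
    """LEACH election threshold T(n) for round r, cluster-head fraction p."""
    return p / (1.0 - p * (r % int(1 / p)))

def elect_cluster_heads(node_ids, p, r, eligible):
    """One advertisement phase.

    Each node still eligible in the current cycle of 1/p rounds draws a
    uniform random number and becomes a cluster head if it falls below
    T(n).  `eligible` is the (mutable) set of nodes that have not yet
    served as head in this cycle."""
    if r % int(1 / p) == 0:          # new cycle: every node eligible again
        eligible |= set(node_ids)
    t = leach_threshold(p, r)
    heads = {n for n in node_ids if n in eligible and random.random() < t}
    eligible -= heads                # heads sit out the rest of the cycle
    return heads
```

Because T(n) grows to 1 by the last round of each cycle, every node serves as cluster head exactly once per 1/p rounds.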
37.2.2.2 Threshold Sensitive Energy-Efficient Sensor Network Protocol
In Reference 19, the authors classify sensor networks into proactive networks and reactive networks. Nodes in proactive networks continuously monitor the environment and thus have data to send at a constant rate. LEACH suits such sensor networks in transmitting data efficiently to the base station. In reactive sensor networks, nodes need to transmit data only when an event of interest occurs. Hence, not all nodes in the network have an equal amount of data to transmit. Manjeshwar
and Agrawal [19] proposed the Threshold Sensitive Energy-Efficient Sensor Network protocol (TEEN) for routing in reactive sensor networks.
TEEN employs the cluster formation strategy of LEACH but adopts a different strategy in the data transmission phase. TEEN makes use of two user-defined parameters, a hard threshold (Ht) and a soft threshold (St), to determine whether a node needs to transmit its currently sensed value. When the monitored value exceeds Ht for the first time, it is stored in a variable and is transmitted during the node's time slot. Subsequently, if the monitored value exceeds the currently stored value by a magnitude of St, the node transmits the data, and the transmitted value is stored for future comparisons.
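The transmit decision of a TEEN node can be sketched as follows. This is an illustrative interpretation (we read "exceeds by a magnitude of St" as an absolute difference of at least St); the function name and return convention are hypothetical.

```python
def teen_should_transmit(sensed, state, ht, st):
    """Decide whether a TEEN node transmits its current reading.

    `state` holds the last transmitted value (None before the hard
    threshold ht has ever been crossed); st is the soft threshold.
    Returns (transmit?, new_state)."""
    if sensed < ht:
        return False, state           # below hard threshold: stay silent
    if state is None or abs(sensed - state) >= st:
        return True, sensed           # first crossing, or changed by >= st
    return False, state               # change too small to be worth sending
```

For example, with Ht = 10 and St = 2, a node first transmits when its reading reaches 12, stays silent at 13, and transmits again at 15.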
37.2.2.3 Two-Level Clustering Algorithm
Estrin et al. [2] proposed a two-level clustering algorithm that can be extended to build a cluster hierarchy. In this algorithm every sensor at a particular level is associated with a radius, the number of hops that its advertisements will reach. Sensors at a higher level are associated with larger radii. All sensors start at level 0. Each sensor sends out periodic advertisements to the other nodes within its radius. The advertisements carry its current level, its parent's identity (if any), and its remaining energy. After transmitting an advertisement, each node waits for a time proportional to its radius to receive advertisements from other nodes. At the end of the wait time, each level 0 node starts a promotion timer whose duration is proportional to its remaining energy reserves and the number of level 0 nodes whose advertisements it received. When the promotion timer expires, the node promotes itself to level 1 and starts sending out periodic advertisements. In these new advertisements it lists its potential children, which are the level 0 nodes that it previously heard. A level 0 node then picks its parent from among the level 1 nodes whose advertisements included its identity. Once a level 0 node picks its parent, it cancels its promotion timer and drops out of the race. At the end, each level 1 node starts a wait timer and waits for acknowledgments from its potential children. If no level 0 node selected it as a parent, or if its energy drops below a certain level, it demotes itself to a level 0 node. All level 0 and level 1 nodes periodically re-enter the wait stage to take into account any change in network conditions, and reclustering takes place.
37.3 Conclusions
In this chapter we presented a brief overview of some known routing algorithms for wireless sensor networks. Both flat and cluster-based routing algorithms were discussed.
References
[1] Estrin, D., Girod, L., Pottie, G., and Srivastava, M. Instrumenting the world with wireless sensor networks. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, IEEE, 2001, pp. 2033–2036.
[2] Estrin, D., Govindan, R., Heidemann, J., and Kumar, S. Next century challenges: scalable coordination in sensor networks. In Proceedings of the 5th Annual ACM/IEEE International Conference on Mobile Computing and Networking, IEEE, 1999, pp. 263–270.
[3] Pottie, G.J. and Kaiser, W.J. Wireless integrated network sensors. Communications of the ACM, 43, 51–58, 2000.
[4] Pottie, G.J. Wireless sensor networks. In Proceedings of the Information Theory Workshop, 1998, pp. 139–140.
[5] Sohrabi, K., Gao, J., Ailawadhi, V., and Pottie, G.J. Protocols for self-organization of a wireless sensor network. IEEE Personal Communications, 7, 16–27, 2000.
[6] Ye, F., Chen, A., Liu, S., and Zhang, L. A scalable solution to minimum cost forwarding in large sensor networks. In Proceedings of the 10th International Conference on Computer Communications and Networks, 2001, pp. 304–309.
[7] Gandham, Shashidhar Rao, Dawande, Milind, Prakash, Ravi, and Venkatesan, S. Energy efficient schemes for wireless sensor networks with multiple mobile base stations. In Proceedings of IEEE Globecom, IEEE, 2003.
[8] Nemhauser, G.L. and Wolsey, L.A. Integer Programming and Combinatorial Optimization. John Wiley & Sons, New York, 1988.
[9] Heinzelman, W., Kulik, J., and Balakrishnan, H. Negotiation-based protocols for disseminating information in wireless sensor networks. In Proceedings of the 5th Annual ACM/IEEE International Conference on Mobile Computing and Networking, IEEE, 1999.
[10] Saikat Ray, Rachanee Ungrangsi, Francesco De Pellegrini, Ari Trachtenberg, and David Starobinski. Robust location detection in emergency sensor networks. In Proceedings of INFOCOM, IEEE, 2003.
[11] Nirupama Bulusu, John Heidemann, and Deborah Estrin. GPS-less low cost outdoor localization for very small devices. Technical report 00-729, USC/ISI, April 2000.
[12] Karp, B. and Kung, H. GPSR: greedy perimeter stateless routing for wireless networks. In Proceedings of MobiCom, ACM, 2000.
[13] Qing Fang, Jie Gao, and Leonidas J. Guibas. Locating and bypassing routing holes in sensor networks. In Proceedings of INFOCOM, IEEE, 2004.
[14] Christopher L. Barrett, Stephan J. Eidenbenz, Lukas Kroc, Madhav Marathe, and James P. Smith. Parametric probabilistic sensor network routing. In Proceedings of WSNA '03, 2003.
[15] Shashidhar Gandham, Milind Dawande, and Ravi Prakash. An integral flow-based energy-efficient routing algorithm for wireless sensor networks. In Proceedings of WCNC, IEEE, 2004.
[16] Ahuja, R.K. and Orlin, J.B. A fast and simple algorithm for the maximum flow problem. Operations Research, 37, 748–759, 1989.
[17] Heinzelman, W.R., Chandrakasan, A., and Balakrishnan, H. Energy-efficient communication protocol for wireless microsensor networks. In Proceedings of the 33rd Annual Hawaii International Conference on System Sciences, 2000, pp. 3005–3014.
[18] Heinzelman, W., Kulik, J., and Balakrishnan, H. Adaptive protocols for information dissemination in wireless sensor networks. In Proceedings of the 5th Annual ACM/IEEE International Conference on Mobile Computing and Networking, IEEE, 1999, pp. 174–185.
[19] Manjeshwar, A. and Agrawal, D.P. TEEN: a routing protocol for enhanced efficiency in wireless sensor networks. In Proceedings of the 15th International Parallel and Distributed Processing Symposium, 2001, pp. 2009–2015.
38
Distributed Signal Processing in Sensor Networks

Omid S. Jahromi, Bioscrypt Inc.
Parham Aarabi, University of Toronto

38.1 Introduction
38.2 Spectrum Estimation Using Sensor Networks
    Background • Mathematical Formulation of the Problem
38.3 Inverse and Ill-Posed Problems
    Ill-Posed Linear Operator Equations • Regularization Methods for Solving Ill-Posed Linear Operator Equations
38.4 Spectrum Estimation Using Generalized Projections
38.5 Distributed Algorithms for Calculating Generalized Projection
    The Ring Algorithm • The Star Algorithm
38.6 Concluding Remark
Acknowledgments
References
38.1 Introduction
Sensors are vital means for scientists and engineers to observe physical phenomena. They are used to measure physical variables such as temperature, pH, velocity, rotational rate, flow rate, pressure, and many others. Most modern sensors output a discrete-time (digitized) signal that is indicative of the physical variable they measure. These signals are often imported into digital signal processing (DSP) hardware, stored in files, or plotted on a computer display for monitoring purposes.
In recent years a number of new sensing concepts have emerged which advocate connecting a large number of inexpensive and small sensors in a sensor network. The trend to network many sensors together has been reinforced by the widespread availability of cheap embedded processors and easily accessible wireless networks. The building blocks of a sensor network, often called Motes, are self-contained, battery-powered computers that measure light, sound, temperature, humidity, and other environmental variables (Figure 38.1).
Motes can be deployed in large numbers, providing enhanced spatio-temporal sensing coverage in ways that are either prohibitively expensive or impossible using conventional sensing assets. For example, they allow land, water, and air resources to be monitored for environmental purposes. They can also be used to monitor borders for safety and security. In defence applications, sensor networks can provide
FIGURE 38.1 A wireless sensor node or Mote made by Crossbow Technology, Inc. in San Jose, CA.
enhanced battlefield situational awareness, which can revolutionize a wide variety of operations from armored assault on open terrain to urban warfare. Sensor networks have many potential applications in biomedicine, factory automation, and control of transportation systems as well.
In principle, a distributed network of sensors can be highly scalable, cost effective, and robust with respect to individual Mote failures. However, there are many technological hurdles that must be overcome for sensor networks to become viable. For instance, Motes are inevitably constrained in processing speed, storage capacity, and communication bandwidth. Additionally, their lifetime is determined by their ability to conserve power. These constraints require new hardware designs and novel network architectures.
Sensor networks raise nontrivial theoretical issues as well. For example, new networking protocols must be devised to allow the sensor nodes to spontaneously create an impromptu network, dynamically adapt to device failure, manage the movement of sensor nodes, and react to changes in task and network requirements.
From a signal processing point of view, the main challenge is the distributed fusion of sensor data across the network. This is because individual sensor nodes are often not able to provide useful or comprehensive information about the quantity under observation. Furthermore, the following constraints must be considered when designing the information fusion algorithm:
1. Each sensor node is likely to have limited power and bandwidth capabilities for communicating with other devices. Therefore, any distributed computation on the sensor network must be very efficient in utilizing the limited power and bandwidth budget of the sensor devices.
2. Owing to the variable environmental conditions in which sensor devices may be deployed, one can expect a fraction of the sensor nodes to be malfunctioning. Therefore, the underlying distributed algorithms must be robust with respect to device failures.
Owing to the large and often ad hoc nature of sensor networks, it would be a formidable challenge to develop distributed information fusion algorithms without first developing a simple, yet rigorous and flexible, mathematical model. The aim of this chapter is to introduce one such model.
We advocate that information fusion in sensor networks should be viewed as a problem of finding a solution point in the intersection of some feasibility sets. The key advantage of this viewpoint is that the solution can be found using a series of projections onto the individual sets. The projections can be computed locally at each sensor node, allowing the fusion process to be done in a parallel and distributed fashion.
To maintain clarity and simplicity, we will focus on solving a benchmark signal processing problem (spectrum estimation) using sensor networks. However, the fusion algorithms that result from our formulations are very general and can be used to solve other sensor network signal processing problems as well.
Notation: Vectors are denoted by capital letters. Boldface capital letters are used for matrices. Elements of a matrix A are referred to as [A]_ij. We denote the set of real M-tuples by R^M and use the notation R^+ for positive real numbers. The expected value of a random variable x is denoted by E{x}. The linear convolution operator is denoted by *. The spaces of Lebesgue-measurable functions are represented by L_1(a, b), L_2(a, b), etc. The end of an example is indicated by the symbol □.
38.2 Spectrum Estimation Using Sensor Networks
38.2.1 Background
Spectrum estimation is concerned with determining the distribution in frequency of the power of a random process. Questions such as "Does most of the power of the signal reside at low or high frequencies?" or "Are there resonance peaks in the spectrum?" are often answered as a result of a spectral analysis. Spectral analysis finds frequent and extensive use in many areas of the physical sciences. Examples abound in oceanography, electrical engineering, geophysics, astronomy, and hydrology.
Let x(n) denote a zero-mean Gaussian wide-sense stationary (WSS) random process. It is well known that a complete statistical description of such a process is provided by its autocorrelation sequence (ACS)

R_x(k) = E{x(n) x(n + k)}

or, equivalently, by its power spectrum, also known as power spectral density (PSD):

P_x(e^{jω}) = Σ_{k=−∞}^{∞} R_x(k) e^{−jωk}
The ACS is a time-domain description of the second-order statistics of a random process. The power spectrum provides a frequency-domain description of the same statistics.
An issue of practical importance is how to estimate the power spectrum of a time series given a finite-length data record. This is not a trivial problem, as reflected in a bewildering array of power spectrum estimation procedures, each claimed to have or show some optimum property.¹ The reader is referred to the excellent texts [3–6] for analysis of empirical spectrum estimation methods.
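The ACS/PSD pair can be checked numerically. The sketch below assumes a first-order autoregressive process, for which both the ACS and the PSD have well-known closed forms, and verifies that the inverse transform of the PSD reproduces the ACS; the pole value and grid size are arbitrary choices for illustration.

```python
import numpy as np

# AR(1) process x(n) = a*x(n-1) + w(n) with unit-variance white noise w.
# Standard closed forms (assumed known) for its ACS and PSD:
a = 0.6
acs = lambda k: a ** abs(k) / (1 - a ** 2)
psd = lambda w: 1.0 / np.abs(1 - a * np.exp(-1j * w)) ** 2

# R_x(k) = (1/2pi) * integral over [-pi, pi) of P_x(e^{jw}) e^{jwk} dw.
# On a uniform grid covering one full period, that integral is just
# 2*pi times the sample mean of the integrand.
w = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
r_from_psd = [np.mean(psd(w) * np.exp(1j * w * k)).real for k in range(4)]
```

Because the integrand is smooth and periodic, the rectangle rule on a uniform grid recovers the autocorrelation coefficients essentially to machine precision.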
Consider the basic scenario where a sound source (a speaker) is monitored by a collection of Motes placed at various known locations in a room (Figure 38.2). Because of reverberation, noise, and other artifacts, the signal arriving at each Mote location is different. The Motes (which constitute the sensor nodes in our network) are equipped with microphones, sampling devices, sufficient signal processing hardware, and some communication means. Each Mote can process its observed data, come up with some statistical inference about it, and share the result with other nodes in the network. However, to save energy and communication bandwidth, the Motes are not allowed to share their raw observed data with each other.
Now, how should the network operate so that an estimate of the power spectrum of the sound source, consistent with the observations made by all Motes, is obtained?² We will provide an answer to this question in the sections that follow.

¹The controversy is rooted in the fact that the power spectrum is a probabilistic quantity and such quantities cannot be constructed using finite-size sample records. Indeed, neither the axiomatic theory [1] nor the frequency theory [2] of probability specifies a constructive way of building probability measures from empirical samples.
²The problem of estimating the power spectrum of a random signal, when the signal itself is not available but some measured signals derived from it are observable, has been studied in Reference 7. The approach developed in Reference 7, however, leads to a centralized fusion algorithm which is not suited to sensor network applications.
FIGURE 38.2 A sensor network monitoring a stationary sound source in a room. [Figure labels: speech source, sensor nodes, microphone, data acquisition and processing module, communications module.]
38.2.2 Mathematical Formulation of the Problem
Let x(n) denote a discrete version of the signal produced by the source and assume that it is a zero-mean Gaussian WSS random process. The sampling frequency f_s associated with x(n) is arbitrary and depends on the frequency resolution desired in the spectrum estimation process.
We denote by v_i(n) the signal produced at the front end of the ith sensor node. We assume that the v_i(n) are related to the original source signal x(n) by the model shown in Figure 38.3. The linear filter H_i(z) in this figure models the combined effect of room reverberations, the microphone's frequency response, and any additional filter which the system designer might want to include. The decimator block that follows the filter represents the (potential) difference between the sampling frequency f_s associated with x(n) and the actual sampling frequency of the Mote's sampling device. Here, it is assumed that the sampling frequency associated with v_i(n) is f_s/N_i, where N_i is a fixed natural number.
It is straightforward to show that the signal v_i(n) in Figure 38.3 is also a WSS process. The autocorrelation coefficients R_{v_i}(k) associated with v_i(n) are given by

R_{v_i}(k) = R_{x_i}(N_i k)    (38.1)

where

R_{x_i}(k) = (h_i(k) * h_i(−k)) * R_x(k)    (38.2)

and h_i(k) denotes the impulse response of H_i(z). We can express R_{v_i}(k) as a function of the source signal's power spectrum as well. To do this, we define G_i(z) = H_i(z) H_i(z^{−1}) and then use it to write (38.2) in
FIGURE 38.3 The relation between the signal v_i(n) produced by the front end of the ith sensor and the original source signal x(n). [Block diagram: speech source x(n) → H_i(z) → x_i(n) → ↓N_i → v_i(n) → processor.]
the frequency domain:

R_{x_i}(k) = (1/2π) ∫_{−π}^{π} P_x(e^{jω}) G_i(e^{jω}) e^{jωk} dω    (38.3)

Combining (38.1) and (38.3), we then get

R_{v_i}(k) = (1/2π) ∫_{−π}^{π} P_x(e^{jω}) G_i(e^{jω}) e^{jωN_i k} dω    (38.4)
The above formula shows that P_x(e^{jω}) uniquely specifies R_{v_i}(k) for all values of k. However, the reverse is not true. That is, in general, knowing R_{v_i}(k) for some or all values of k is not sufficient for characterizing P_x(e^{jω}) uniquely.
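Relations (38.1) and (38.2) are easy to verify by simulation. The sketch below assumes a unit-variance white Gaussian source, so that R_x(k) = δ(k) and R_{v_i}(k) collapses to the deterministic autocorrelation of the filter sampled at lags N_i·k; the impulse response h and the decimation factor are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
h = np.array([1.0, 0.5, 0.25, 0.125])     # stand-in impulse response h_i(k)
Ni = 2                                     # decimation factor N_i

# White Gaussian source => R_x(k) = delta(k).
x = rng.standard_normal(2_000_000)
v = np.convolve(x, h, mode="valid")[::Ni]  # filter by H_i(z), keep every Ni-th

def sample_acs(s, k):
    """Sample autocorrelation estimate of s at lag k."""
    return float(np.mean(s[: len(s) - k] * s[k:])) if k else float(np.mean(s * s))

def theory_acs(k):
    """R_{v_i}(k) = R_{x_i}(N_i k) = sum_n h(n) h(n + N_i k) for white input."""
    m = Ni * k
    return float(np.sum(h[: len(h) - m] * h[m:])) if m < len(h) else 0.0
```

With two million source samples, the empirical lags agree with the closed-form values to a few parts in a thousand.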
Recall that v_i(n) is a WSS signal, so all the statistical information that can be gained about it is confined to its autocorrelation coefficients. One might use the signal processing hardware available at each sensor node to estimate the autocorrelation coefficients R_{v_i}(k) for some k, say 0 ≤ k ≤ L − 1. Now, we may pose the sensor network spectrum estimation problem as follows:
Problem 38.1
Let Q_{i,k} denote the set of all power spectra which are consistent with the kth autocorrelation coefficient R_{v_i}(k) estimated at the ith sensor node. That is, P_x(e^{jω}) ∈ Q_{i,k} if

(1/2π) ∫_{−π}^{π} P_x(e^{jω}) G_i(e^{jω}) e^{jωN_i k} dω = R_{v_i}(k),
P_x(e^{jω}) ≥ 0,
P_x(e^{jω}) = P_x(e^{−jω}),
P_x(e^{jω}) ∈ L_1(−π, π).

Define Q = ∩_{i=1}^{N} ∩_{k=0}^{L−1} Q_{i,k}, where N is the number of nodes in the network and L is the number of autocorrelation coefficients estimated at each node. Find a P_x(e^{jω}) in Q.
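Once the spectrum is sampled on a finite frequency grid, each equality constraint in Problem 38.1 becomes a hyperplane in R^M, and the nonnegativity constraint a clipping step. The Euclidean projection sketch below is a simplified stand-in for the generalized projections developed later in the chapter; the kernel g and the target value r in the usage example are hypothetical.

```python
import numpy as np

def project_onto_constraint(P, g, r, dw):
    """Euclidean projection of a sampled spectrum P (values on a uniform
    frequency grid with spacing dw) onto the hyperplane

        (1/2pi) * sum_j P[j] * g[j] * dw = r,

    a discretised version of one autocorrelation constraint in
    Problem 38.1, followed by clipping to re-impose P >= 0 (the clip can
    perturb the equality; alternating projections handle that)."""
    a = g * dw / (2 * np.pi)             # constraint written as <a, P> = r
    P = P + a * (r - a @ P) / (a @ a)    # closed-form hyperplane projection
    return np.maximum(P, 0.0)

# Tiny illustration with a hypothetical kernel g and target value r.
w = np.linspace(-np.pi, np.pi, 512, endpoint=False)
dw = w[1] - w[0]
P0 = np.ones_like(w)                     # initial guess: white spectrum
g = 1.0 + 0.5 * np.cos(w)               # stand-in for G_i(e^{jw}) at k = 0
P1 = project_onto_constraint(P0, g, r=2.0, dw=dw)
```

Cycling such projections over all sets Q_{i,k} is the basic mechanism by which a point in the intersection Q can be approached.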
If we ignore measurement imperfections and assume that the observed autocorrelation coefficients R_{v_i}(k) are exact, then the sets Q_{i,k} are nonempty and admit a nonempty intersection Q as well. In this case Q contains infinitely many P_x(e^{jω}). When the measurements v_i(n) are contaminated by noise, or the R_{v_i}(k) are estimated from finite-length data records, the intersection set Q might be empty owing to the potential inconsistency of the autocorrelation coefficients estimated by different sensors. Thus, Problem 38.1 has either no solution or infinitely many solutions. Problems which have such undesirable properties are called ill-posed. Ill-posed problems are studied in Section 38.3.
38.3 Inverse and Ill-Posed Problems
The study of inverse problems has been one of the fastest-growing areas of applied mathematics in the last two decades. This growth has largely been driven by the needs of applications in both the natural sciences (e.g., inverse scattering theory, astronomical image restoration, and statistical learning theory) and industry (e.g., computerized tomography, remote sensing). The reader is referred to References 8 to 11 for detailed treatments of the theory of ill-posed problems and to References 12 and 13 for applications in inverse scattering and statistical inference, respectively.
By definition, inverse problems are concerned with determining causes for a desired or an observed effect. Most often, inverse problems are much more difficult to deal with (from a mathematical point of view) than their direct counterparts. This is because they might not have a solution in the strict sense, or their solutions might not be unique or might not depend on the data continuously. Mathematical problems having such undesirable properties are called ill-posed problems and cause severe numerical difficulties (mostly because of the discontinuous dependence of solutions on the data).
Formally, a problem of mathematical physics is called well-posed, or well-posed in the sense of Hadamard, if it fulfills the following conditions:
1. For all admissible data, a solution exists.
2. For all admissible data, the solution is unique.
3. The solution depends continuously on the data.
A problem for which one or more of the above conditions are violated is called ill-posed. Note that the conditions mentioned do not constitute a precise definition of well-posedness. To make a precise definition in a concrete situation, one has to specify the notion of a solution, which data are considered admissible, and which topology is used for measuring continuity.
The study of concrete ill-posed problems often involves the question "how can one enforce uniqueness by additional information or assumptions?" Not much can be said about this in a general context. However, the lack of stability, and its restoration by appropriate methods known as regularization methods, can be treated in sufficient generality. The theory of regularization is well developed for linear inverse problems and will be introduced, very briefly, in Section 38.3.1.
38.3.1 Ill-Posed Linear Operator Equations
Let the linear operator equation

Ax = y    (38.5)

be defined by the continuous operator A that maps elements x of a metric space E_1 onto elements y of the metric space E_2. In the early 1900s, the noted French mathematician Jacques Hadamard observed that under some (very general) circumstances the problem of solving the operator equation (38.5) is ill-posed: even if there exists a unique solution x ∈ E_1 that satisfies the equality (38.5), a small deviation on the right-hand side can cause large deviations in the solution. The following example illustrates this issue.
Example 38.1 Let A denote a Fredholm integral operator of the first kind; that is, define

(Ax)(s) = ∫_a^b K(s, t) x(t) dt    (38.6)

The kernel K(s, t) is continuous on [a, b] × [a, b] and maps a function x(t) continuous on [a, b] to a function y(s) also continuous on [a, b]. We observe that the continuous function

g_ω(s) = ∫_a^b K(s, t) sin(ωt) dt    (38.7)

which is formed by means of the kernel K(s, t), possesses the property

lim_{ω→∞} g_ω(s) = 0    (38.8)

Now consider the perturbed equation

∫_a^b K(s, t) x(t) dt = y(s) + g_ω(s)    (38.9)

where g_ω(s) is defined in (38.7). Since the above equation is linear, it follows using (38.7) that its solution x(t) has the form

x(t) = x*(t) + sin(ωt)    (38.10)

where x*(t) is a solution to the original integral equation Ax = y. For sufficiently large ω, the right-hand side of (38.9) differs from the right-hand side of (38.5) only by the small amount g_ω(s), yet the corresponding solutions differ by sin(ωt), which does not become small. Solving (38.5) directly is therefore unstable with respect to small perturbations of the data. □

A standard approach to solving (38.5) approximately is to seek the element x* ∈ E_1 such that the functional

R(x) = ||Ax − y||_{E_2}    (38.11)

is minimized.³ Note that the minimizing element x* ∈ E_1 always exists even when the original equation (38.5) does not have a solution. In any case, if the right-hand side of (38.5) is not exact, that is, if we replace y by y_δ such that ||y − y_δ||_{E_2} < δ where δ is a small value, a new element x_δ ∈ E_1 will minimize the functional

R_δ(x) = ||Ax − y_δ||_{E_2}    (38.12)

However, the new minimizer x_δ need not be close to x*: ||x_δ − x*||_{E_1} need not tend to 0 as δ → 0 when the operator equation Ax = y is ill-posed.
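The instability described above is easy to observe numerically. The sketch below discretizes a Fredholm operator of the first kind with a hypothetical Gaussian kernel and shows that a data perturbation on the order of 1e-8 ruins the naive least-squares reconstruction, because the discretized operator is severely ill-conditioned.

```python
import numpy as np

# Discretise (38.6) with the smooth, hypothetical kernel
# K(s, t) = exp(-(s - t)^2) on [0, 1] (rectangle rule, n points).
n = 60
t = np.linspace(0.0, 1.0, n)
A = np.exp(-(t[:, None] - t[None, :]) ** 2) / n

x_true = np.sin(2 * np.pi * t)     # the "cause" we try to recover
y = A @ x_true                     # the exactly observed "effect"

# A perturbation far below any realistic measurement noise floor...
y_noisy = y + 1e-8 * np.random.default_rng(1).standard_normal(n)

# ...still destroys the naive least-squares reconstruction.
x_rec = np.linalg.lstsq(A, y_noisy, rcond=None)[0]
amplification = np.linalg.norm(x_rec - x_true) / np.linalg.norm(y_noisy - y)
```

The condition number of the discretized operator, and hence the error amplification factor, grows without bound as the discretization is refined.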
38.3.2 Regularization Methods for Solving Ill-Posed Linear Operator Equations
Hadamard [15] thought that ill-posed problems were a purely mathematical phenomenon and that all real-life problems are well-posed. However, in the second half of the 20th century, a number of very important real-life problems were found to be ill-posed. In particular, as we just discussed, ill-posed problems arise when one tries to reverse cause–effect relations to find unknown causes from known consequences. Even if the cause–effect relationship forms a one-to-one mapping, the problem of inverting it can be

³To save on notation, we write ||a − b||_E to denote the distance between two elements a, b ∈ E whether the metric space E is a normed space or not. If E is a normed space, our notation is self-evident. Otherwise, it should be interpreted only as a symbol for the distance between a and b.
ill-posed. The discovery of various regularization methods by Tikhonov, Ivanov, and Phillips in the early 1960s made it possible to construct a sequence of well-posed solutions that converges to the desired one. Regularization theory was one of the first signs of the existence of intelligent inference: it demonstrated that where the self-evident methods of solving an operator equation might not work, the non-self-evident methods of regularization theory do. The influence of the philosophy created by the theory of regularization is very deep. Both the regularization philosophy and the regularization techniques have become widely disseminated in many areas of science and engineering [10,11].
38.3.2.1 Tikhonov's Method
In the early 1960s, A.N. Tikhonov [16,17] discovered that if, instead of the functional R_δ(x), one minimizes

R_reg(x) = ||Ax − y_δ||_{E_2} + γ(δ) S(x)    (38.13)

where S(x) is a stabilizing functional (belonging to a certain class of functionals) and γ(δ) is an appropriately chosen constant (whose value depends on the noise level δ), then one obtains a sequence of solutions x_δ that converges to the desired one as δ tends to zero. For the above result to be valid, it is required that:
1. The problem of minimizing R_reg(x) be well-posed for fixed values of δ and γ(δ).
2. lim_{δ→0} ||x_δ − x||_{E_1} = 0 when γ(δ) is chosen appropriately, where x is the desired exact solution.
Consider a real-valued lower semicontinuous⁴ functional S(x). We shall call S(x) a stabilizing functional if it possesses the following properties:
1. The solution of the operator equation Ax = y belongs to the domain of definition D(S) of the functional S.
2. S(x) ≥ 0 for all x ∈ D(S).
3. The level sets {x : S(x) ≤ c}, c = const., are all compact.
It turns out that the above conditions are sufficient for the problem of minimizing R_reg(x) to be well-posed [8, page 51]. Now, the important remaining problem is to determine the functional relationship between δ and γ(δ) such that the sequence of solutions obtained by minimizing (38.13) converges to the solution of (38.5) as δ tends to zero. The following theorem establishes sufficient conditions on such a relationship:
Theorem 38.1 [13, page 55] Let E₁ and E₂ be two metric spaces and let A : E₁ → E₂ be a continuous and
one-to-one operator. Suppose that for y ∈ E₂ there exists a solution x ∈ D(S) ⊂ E₁ to the operator equation
Ax = y. Let y_δ be an element in E₂ such that ‖y − y_δ‖_{E₂} ≤ δ. If the parameter γ(δ) is chosen such that:

(i) γ(δ) → 0 when δ → 0,
(ii) lim_{δ→0} δ²/γ(δ) < ∞,

then the elements x_δ minimizing R_reg(x) = ‖Ax − y_δ‖²_{E₂} + γ(δ) S(x)
converge to the exact solution x as δ → 0.
If E₁ is a Hilbert space, the stabilizing functional S(x) may simply be chosen as ‖x‖², which, indeed,
is the original choice made by Tikhonov. In this case, the level sets of S(x) will only be weakly compact.
However, the convergence of the regularized solutions will be a strong one in view of the properties of
Hilbert spaces. The conditions imposed on the parameter γ(δ) are, nevertheless, more stringent than
those stated in the above theorem.⁵

⁴A function f : Rᴺ → [−∞, ∞] is called lower semicontinuous at X ∈ Rᴺ if for any t < f(X) there exists
δ > 0 such that for all y ∈ B(X, δ), t < f(y). The notation B(X, δ) represents a ball with center at X and radius δ. This
definition generalizes to functional spaces by using the appropriate metric in defining B(X, δ).
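The stabilizing effect of the penalty term γ(δ)S(x) can be illustrated with a small numerical sketch. The data below are hypothetical; S(x) = ‖x‖² is taken as in the Hilbert-space case, so the minimizer of (38.13) solves the normal equations (AᵀA + γI)x = Aᵀy:

```python
# Tikhonov regularization with S(x) = ||x||^2: minimize ||Ax - y||^2 + g*||x||^2.
# The minimizer solves the normal equations (A^T A + g I) x = A^T y.
# A is a deliberately ill-conditioned 2x2 matrix; the data are hypothetical.

def solve2(M, b):
    """Solve a 2x2 linear system M x = b by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(b[0] * M[1][1] - b[1] * M[0][1]) / det,
            (M[0][0] * b[1] - M[1][0] * b[0]) / det]

def tikhonov(A, y, g):
    # Form A^T A + g I and A^T y, then solve the 2x2 system.
    AtA = [[sum(A[k][i] * A[k][j] for k in range(2)) + (g if i == j else 0.0)
            for j in range(2)] for i in range(2)]
    Aty = [sum(A[k][i] * y[k] for k in range(2)) for i in range(2)]
    return solve2(AtA, Aty)

A = [[1.0, 1.0], [1.0, 1.0001]]   # nearly singular operator
y_noisy = [2.0, 2.01]             # perturbation of exact data [2.0, 2.0001],
                                  # whose exact solution is x = (1, 1)

x_plain = tikhonov(A, y_noisy, 0.0)    # unregularized: wildly off
x_reg = tikhonov(A, y_noisy, 1e-4)     # regularized: close to (1, 1)
print(x_plain, x_reg)
```

Even a tiny penalty (γ = 10⁻⁴) pulls the solution back near (1, 1), while the unregularized solution is thrown far away by a 0.5% data perturbation.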
38.3.2.2 The Residual Method
The results presented above are fundamental in Tikhonov's theory of regularization. Tikhonov's theory,
however, is only one of several proposed schemes for solving ill-posed problems. An important variation
known as the Residual Method was introduced by Phillips [18]. In Phillips's method one minimizes the
functional

    R_P(x) = S(x)

subject to the constraint

    ‖Ax − y_δ‖_{E₂} ≤ μ

where μ is a fixed constant. The stabilizing functional S(x) is defined as in Section 38.3.2.1.
38.3.2.3 The Quasi-Solution Method
The quasi-solution method was developed by Ivanov [19,20]. In this method, one minimizes the functional

    R_I(x) = ‖Ax − y_δ‖_{E₂}

subject to the constraint

    S(x) ≤ σ

where σ is a fixed constant. Again, the stabilizing functional S(x) is defined as in Tikhonov's method.
Note that the three regularization methods mentioned contain one free parameter (γ in Tikhonov's
method, μ for Phillips's method, and σ in Ivanov's method). It has been shown [21] that these methods
are all equivalent in the sense that if one of the methods (say Phillips's) for a given value of its parameter
(say μ*) produces a solution x*, the other two methods produce the same solution for appropriately
chosen values of their respective parameters.

Example 38.2 The functions

    D₁(P₁, P₂) = ∫_{−π}^{π} (P₁ − P₂)² dω

    D₂(P₁, P₂) = ∫_{−π}^{π} ( P₁ ln(P₁/P₂) + P₂ − P₁ ) dω

    D₃(P₁, P₂) = ∫_{−π}^{π} ( P₁/P₂ − ln(P₁/P₂) − 1 ) dω
can be used to measure the generalized distance between P₁(e^{jω}) and P₂(e^{jω}). These functions are
nonnegative and become zero if and only if P₁ = P₂. Note that D₁ is simply the Euclidean distance between
P₁ and P₂. The functions D₂ and D₃ have roots in information theory and statistics. They are known as
the Kullback-Leibler divergence and the Burg cross entropy, respectively.
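For spectra sampled on a uniform frequency grid, the three distances can be approximated by Riemann sums. The following sketch (the grid size and example spectra are hypothetical) illustrates that each distance vanishes exactly when P₁ = P₂ and is positive otherwise:

```python
import math

# Discretized versions of the three generalized distances of Example 38.2,
# approximating each integral over [-pi, pi) by a Riemann sum on an N-point grid.

def distances(P1, P2, N):
    dw = 2 * math.pi / N    # grid spacing
    d1 = sum((a - b) ** 2 for a, b in zip(P1, P2)) * dw
    d2 = sum(a * math.log(a / b) + b - a for a, b in zip(P1, P2)) * dw  # Kullback-Leibler
    d3 = sum(a / b - math.log(a / b) - 1 for a, b in zip(P1, P2)) * dw  # Burg cross entropy
    return d1, d2, d3

N = 512
w = [-math.pi + 2 * math.pi * k / N for k in range(N)]
P_flat = [1.0] * N                               # flat reference spectrum P0
P_lp = [1.0 + 0.5 * math.cos(wk) for wk in w]    # a smooth low-pass-like spectrum

print(distances(P_flat, P_flat, N))   # all three vanish when P1 == P2
print(distances(P_lp, P_flat, N))     # strictly positive otherwise
```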
By using a suitable generalized distance, we can convert our original sensor network spectrum
estimation problem (Problem 38.1) into the following minimization problem:

Problem 38.2
Let Q be defined as in Problem 38.1. Find P_x(e^{jω}) in Q such that

    P* = arg min_{P ∈ Q} D(P, P₀)    (38.14)

where P₀(e^{jω}) is an arbitrary power spectrum, say P₀(e^{jω}) = 1, −π ≤ ω < π.
When a unique P* exists, it is called the generalized projection of P₀ onto Q [23]. In general, a projection
of a given point onto a convex set is defined as another point which has two properties: first, it belongs to
the set onto which the projection operation is performed and, second, it renders a minimal value to the
distance between the given point and any point of the set (Figure 38.4).
If the Euclidean distance ‖X − Y‖ is used in this context then the projection is called a metric projection.
In some cases, such as the spectrum estimation problem considered here, it turns out to be very useful to
introduce more general means to measure the distance between two vectors. The main reason is that the
functional form of the solution will depend on the choice of the distance measure used in the projection.
Often, a functional form which is easy to manipulate or interpret (for instance, a rational function) cannot
be obtained using the conventional Euclidean metric.
It can be shown that the distances D₁ and D₂ in Example 38.2 lead to well-posed solutions for P*. The
choice D₃ will lead to a unique solution given that certain singular power spectra are excluded from the
space of valid solutions [24]. It is not known whether D₃ will lead to a stable solution. As a result,
the well-posedness of Problem 38.2 when D₃ is used is not yet established.⁶

⁶Well-posedness of the minimization problem (38.14) when D is the Kullback-Leibler divergence D₂ has been
established in several works including References 25 to 29. Well-posedness results exist for certain classes of generalized
distance functions as well [29,30]. Unfortunately, the Burg cross entropy D₃ does not belong to any of these classes.
While the Burg cross entropy lacks theoretical support as a regularizing functional, it has been used successfully to resolve
ill-posed problems in several applications including spectral estimation and image restoration (see, e.g., [31] and
references therein). The desirable feature of the Burg cross entropy in the context of spectrum estimation is that its
minimization (subject to the linear constraints P_x(e^{jω}) ∈ Q) leads to rational power spectra.
FIGURE 38.4 Symbolic depiction of metric projection (a) and generalized projection (b) of a vector Y onto a closed
convex set Q. In (a) the projection X* is selected by minimizing the metric ‖X − Y‖ over all X ∈ Q, while in (b) X*
is found by minimizing the generalized distance D(X, Y) over the same set.
38.5 Distributed Algorithms for Calculating
Generalized Projection

As we mentioned before, a very interesting aspect of the generalized projections formulation is that the
solution can be computed by means of successive generalized projections

    P* = arg min_{P ∈ Q_{i,k}} D_j(P, P₁)    (38.15)

onto the individual constraint sets Q_{i,k}.
Using standard techniques from calculus of variations we can show that the generalized distances D₁, D₂,
and D₃ introduced in Example 38.2 result in projections of the form

    P*_{[P₁→Q_{i,k}; D₁]} = P₁(e^{jω}) − λ G_i(e^{jω}) cos(Mkω)

    P*_{[P₁→Q_{i,k}; D₂]} = P₁(e^{jω}) exp(−β G_i(e^{jω}) cos(Mkω))

    P*_{[P₁→Q_{i,k}; D₃]} = ( P₁(e^{jω})⁻¹ + η G_i(e^{jω}) cos(Mkω) )⁻¹

where λ, β, and η are parameters (Lagrange multipliers). These parameters should be chosen such that in
each case P*_{[P₁→Q_{i,k}; D_j]} ∈ Q_{i,k}. That is,

    (1/2π) ∫_{−π}^{π} P*_{[P₁→Q_{i,k}; D_j]}(e^{jω}) G_i(e^{jω}) e^{jMkω} dω = R_{v_i}(k)    (38.16)
The reader may observe that the above equation leads to a closed-form formula for λ but, in general,
finding β and η requires numerical methods. The projection formulae developed above can be employed
in a variety of iterative algorithms to find a solution in the intersection of the sets Q_{i,k}. We discuss two
example algorithms below.
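The contrast between the closed-form and the numerically determined multipliers can be sketched in a discrete analogue. The vectors p0 and g and the target value r below are hypothetical; the linear constraint sum(g*p) = r stands in for (38.16). Under the Euclidean distance the multiplier has a closed form, while under the Kullback-Leibler divergence it is found here by bisection:

```python
import math

# Projection of a discrete "spectrum" p0 onto the linear constraint
# sum(g[i] * p[i]) == r, under two different distances:
#   Euclidean:         p = p0 + lam * g        (additive form)
#   Kullback-Leibler:  p = p0 * exp(lam * g)   (multiplicative form)

p0 = [1.0, 2.0, 0.5, 1.5]
g = [0.2, 0.4, 0.1, 0.3]
r = 1.0

# Euclidean: the multiplier has a closed form from the constraint.
lam_e = (r - sum(gi * pi for gi, pi in zip(g, p0))) / sum(gi * gi for gi in g)
p_euc = [pi + lam_e * gi for pi, gi in zip(p0, g)]

# KL: solve sum(g * p0 * exp(lam * g)) == r for lam by bisection
# (the left-hand side is strictly increasing in lam since g, p0 > 0).
def constraint(lam):
    return sum(gi * pi * math.exp(lam * gi) for gi, pi in zip(g, p0)) - r

lo, hi = -100.0, 100.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if constraint(mid) > 0:
        hi = mid
    else:
        lo = mid
lam_kl = 0.5 * (lo + hi)
p_kl = [pi * math.exp(lam_kl * gi) for pi, gi in zip(p0, g)]

print(p_euc, p_kl)   # both satisfy the constraint, with different functional forms
```

Note that the KL projection keeps every component strictly positive, one reason the non-Euclidean distances are attractive for power spectra.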
38.5.1 The Ring Algorithm
The Ring Algorithm is a very simple algorithm: it starts with an initial guess P⁽⁰⁾ for P_x(e^{jω}) and then
calculates a series of successive projections onto the constraint sets Q_{i,k}. Then, it takes the last projection,
now called P⁽¹⁾, and projects it back onto the first constraint set. Continuing this process will generate
2006 by Taylor & Francis Group, LLC
38-12 Embedded Systems Handbook
The Ring Algorithm

Input: A distance function D_j(P₁, P₂), an initial power spectrum P₀(e^{jω}), the squared sensor frequency
responses G_i(e^{jω}), and the autocorrelation estimates R_{v_i}(k) for k = 0, 1, ..., L−1 and i = 1, 2, ..., N.

Output: A power spectrum P*(e^{jω}).

Procedure:

1. Let m = 0, i = 1, and P⁽ᵐ⁾ = P₀.
2. Send P⁽ᵐ⁾ to the ith sensor node.
   At the ith sensor:
   (i) Let k = 0 and define P̃₀ = P⁽ᵐ⁾.
   (ii) Calculate P̃ₖ = P*_{[P̃ₖ₋₁→Q_{i,k}; D_j]} for k = 1, 2, ..., L−1.
   (iii) If D(P̃_{L−1}, P̃₀) > ε then let P̃₀ = P̃_{L−1} and go back to item (ii). Otherwise, let i = i + 1
   and go to Step 3.
3. If (i mod N) = 1 then set m = m + 1 and reset i to 1. Otherwise, set P⁽ᵐ⁾ = P̃_{L−1} and go back
   to Step 2.
4. Define P⁽ᵐ⁾ = P̃_{L−1}. If D(P⁽ᵐ⁾, P⁽ᵐ⁻¹⁾) > ε, go back to Step 2. Otherwise output P* = P⁽ᵐ⁾
   and stop.
a sequence of solutions P⁽⁰⁾, P⁽¹⁾, P⁽²⁾, ... which will eventually converge to a solution P* ∈ ∩_{i,k} Q_{i,k} [22].
Steps of the Ring Algorithm are summarized in the text box above. A graphical representation of this
algorithm is shown in Figure 38.5.
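The control flow of the Ring Algorithm can be sketched in a few lines. In the sketch below, simple intervals on the real line stand in for the spectral constraint sets Q_{i,k}, and Euclidean projection replaces the generalized projections; all sets, tolerances, and iteration limits are hypothetical:

```python
# A minimal sketch of the Ring Algorithm's control flow, using Euclidean
# projections onto intervals (1-D convex sets) in place of the constraint
# sets Q_{i,k}. Each "node" owns L intervals; the estimate is passed around
# the ring until successive rounds agree to within eps.

def project(x, interval):
    lo, hi = interval
    return min(max(x, lo), hi)

# Three nodes, each with two constraint intervals; intersection is [0.4, 0.5].
nodes = [[(0.0, 0.5), (-1.0, 2.0)],
         [(0.2, 3.0), (0.4, 0.9)],
         [(-2.0, 0.8), (0.3, 1.7)]]

eps = 1e-9
x = 10.0                      # initial guess P(0)
for _ in range(100):          # rounds m
    x_prev = x
    for sets in nodes:        # visit nodes in ring order
        while True:           # local projection cycle until it stabilizes
            x_old = x
            for s in sets:
                x = project(x, s)
            if abs(x - x_old) <= eps:
                break
    if abs(x - x_prev) <= eps:
        break

print(x)   # a point in the intersection of all constraint sets
```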
Example 38.3 Consider a simple four-sensor network similar to the one shown in Figure 38.5. Assume
that the down-sampling ratio in each Mote is equal to four. Thus, N₀ = N₁ = N₂ = N₃ = 4. Assume,
further, that the transfer functions H₀(z) to H₃(z) which relate the Motes' front-end outputs v_i(n) to the
original source signal x(n) are given as follows:
    H₀(z) = (0.0753 + 0.1656z⁻¹ + 0.2053z⁻² + 0.1659z⁻³ + 0.0751z⁻⁴) / (1.0000 − 0.8877z⁻¹ + 0.6738z⁻² − 0.1206z⁻³ + 0.0225z⁻⁴)

    H₁(z) = (0.4652 − 0.1254z⁻¹ − 0.3151z⁻² + 0.0975z⁻³ − 0.0259z⁻⁴) / (1.0000 − 0.6855z⁻¹ + 0.3297z⁻² − 0.0309z⁻³ + 0.0032z⁻⁴)

    H₂(z) = (0.3732 − 0.8648z⁻¹ + 0.7139z⁻² − 0.1856z⁻³ − 0.0015z⁻⁴) / (1.0000 − 0.5800z⁻¹ + 0.5292z⁻² − 0.0163z⁻³ + 0.0107z⁻⁴)

    H₃(z) = (0.1931 − 0.4226z⁻¹ + 0.3668z⁻² − 0.0974z⁻³ − 0.0405z⁻⁴) / (1.0000 + 0.2814z⁻¹ + 0.3739z⁻² + 0.0345z⁻³ − 0.0196z⁻⁴)
The above transfer functions were chosen to show typical low-pass, band-pass, and high-pass characteristics
(Figure 38.6). They were obtained using standard filter design techniques. The input signal
whose power spectrum is to be estimated was chosen to have a smooth low-pass spectrum. We used the
Ring Algorithm with L = 4 and the Euclidean metric D₁ as the distance function to estimate the input
signal's spectrum. The results are shown in Figure 38.7. As seen in this figure, the algorithm converges to a
solution which is, in this case, almost identical to the actual input spectrum in less than 100 rounds.
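As a quick consistency check of the coefficients above (whose signs are reconstructed here from the low-pass, band-pass, and high-pass reading of Figure 38.6), the magnitude responses can be evaluated directly on the unit circle:

```python
import math

# Evaluate the magnitude responses |H_i(e^{jw})| of Example 38.3 from the
# numerator/denominator coefficients (coefficients of z^0 ... z^-4).

H = [
    ([0.0753, 0.1656, 0.2053, 0.1659, 0.0751],
     [1.0000, -0.8877, 0.6738, -0.1206, 0.0225]),
    ([0.4652, -0.1254, -0.3151, 0.0975, -0.0259],
     [1.0000, -0.6855, 0.3297, -0.0309, 0.0032]),
    ([0.3732, -0.8648, 0.7139, -0.1856, -0.0015],
     [1.0000, -0.5800, 0.5292, -0.0163, 0.0107]),
    ([0.1931, -0.4226, 0.3668, -0.0974, -0.0405],
     [1.0000, 0.2814, 0.3739, 0.0345, -0.0196]),
]

def mag(num, den, w):
    # Evaluate |N(z)/D(z)| at z = e^{jw}.
    z = complex(math.cos(w), math.sin(w))
    n = sum(c * z ** (-k) for k, c in enumerate(num))
    d = sum(c * z ** (-k) for k, c in enumerate(den))
    return abs(n / d)

# H0 is low-pass: close to unity gain at w = 0, small gain at w = pi.
print(mag(*H[0], 0.0), mag(*H[0], math.pi))
```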
38.5.2 The Star Algorithm
The Ring Algorithm is completely decentralized. However, it will not converge to a solution if the feasible
sets Q_{i,k} do not have an intersection (which can happen owing to measurement noise) or one or more
FIGURE 38.5 Graphical depiction of the Ring Algorithm. For illustrative reasons, only three feasible sets Q_{i,k}
are shown in the inside picture. Also, it is shown that the output spectrum P⁽ᵐ⁾(e^{jω}) is obtained from the input
P⁽ᵐ⁾(e^{jω}) only after three projections. In practice, each sensor node has L feasible sets and has to repeat the
sequence of projections many times before it can successfully project the input P⁽ᵐ⁾(e^{jω}) into the intersection of its
feasible sets.
FIGURE 38.6 Frequency response amplitudes of the transfer functions used in Example 38.3. The curves show, from
left to right, |H₀(e^{jω})|, |H₁(e^{jω})|, |H₂(e^{jω})|, and |H₃(e^{jω})|.
sensors in the network are faulty. The Star Algorithm is an alternative distributed algorithm for fusing
the individual sensors' data. It combines successive projections onto Q_{i,k} with a kind of averaging operation to
generate a sequence of solutions P⁽ᵐ⁾. This sequence will eventually converge to a solution P* ∈ ∩_{i,k} Q_{i,k} if
one exists. The Star Algorithm is fully parallel and hence much faster than the Ring Algorithm. It provides
(Panels (a) to (f): m = 0, 1, 4, 10, 20, 100; axes: radian frequency ω versus power spectrum.)

FIGURE 38.7 Ring Algorithm convergence results. In each figure, the dashed curve shows the source signal's actual
power spectrum while the solid curve is the estimate obtained by the Ring Algorithm after m rounds. A round means
projections have been passed through all the nodes in the network.
some degree of robustness to individual node failures as well. However, it includes a centralized step
which needs to be accommodated when the system's network protocol is being designed. Steps of the
Star Algorithm are summarized in the text box below. A graphical representation of this algorithm is
shown in Figure 38.8.
FIGURE 38.8 The Star Algorithm. Again, only three feasible sets Q_{i,k} are shown in the inside picture. In practice,
each sensor node has to repeat the sequence of projections and averaging many times before it can successfully
project the input P⁽ᵐ⁾(e^{jω}) supplied by the central node into the intersection of its feasible sets. The projection
result, called P_i⁽ᵐ⁾(e^{jω}), is sent back to the central node. The central node then averages all the P_i⁽ᵐ⁾(e^{jω}) it
has received to produce P⁽ᵐ⁺¹⁾(e^{jω}). This is sent back to the individual nodes and the process repeats.
The Star Algorithm

Input: A distance function D_j(P₁, P₂), an initial power spectrum P₀(e^{jω}), the squared sensor frequency
responses G_i(e^{jω}), and the autocorrelation estimates R_{v_i}(k).

Output: A power spectrum P*(e^{jω}).

Procedure:

1. Let m = 0 and P⁽⁰⁾ = P₀.
2. Send P⁽ᵐ⁾ to all sensor nodes.
   At the ith sensor:
   (i) Let n = 0 and define P̃⁽ⁿ⁾ = P⁽ᵐ⁾.
   (ii) Calculate P̃ₖ = P*_{[P̃⁽ⁿ⁾→Q_{i,k}; D_j]} for all k.
   (iii) Calculate P̃⁽ⁿ⁺¹⁾ = arg min_P Σₖ D(P, P̃ₖ).
   (iv) If D(P̃⁽ⁿ⁺¹⁾, P̃⁽ⁿ⁾) > ε go to item (ii) and repeat. Otherwise, define P_i⁽ᵐ⁾ = P̃⁽ⁿ⁺¹⁾ and
   send it to the central unit.
3. Receive P_i⁽ᵐ⁾ from all sensors and calculate P⁽ᵐ⁺¹⁾ = arg min_P Σᵢ D(P, P_i⁽ᵐ⁾).
4. If D(P⁽ᵐ⁺¹⁾, P⁽ᵐ⁾) > ε, go to Step 2 and repeat. Otherwise stop and output P* = P⁽ᵐ⁺¹⁾.
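The structure of the Star Algorithm can also be sketched with intervals standing in for the sets Q_{i,k}. Under the Euclidean distance D₁, the central step arg min_P Σᵢ D(P, P_i⁽ᵐ⁾) reduces to plain averaging; the sets and tolerances below are hypothetical:

```python
# A minimal sketch of the Star Algorithm's control flow with Euclidean
# distance: each node projects the current estimate into the intersection of
# its own intervals, and the central unit averages the returned values
# (for the squared Euclidean distance, arg min_P sum_i D(P, P_i) is the mean).

def project(x, interval):
    lo, hi = interval
    return min(max(x, lo), hi)

# Three nodes, each with two constraint intervals; intersection is [0.4, 0.5].
nodes = [[(0.0, 0.5), (-1.0, 2.0)],
         [(0.2, 3.0), (0.4, 0.9)],
         [(-2.0, 0.8), (0.3, 1.7)]]

eps = 1e-9
x = 10.0                          # initial guess P(0) at the central node
for _ in range(1000):             # rounds m
    results = []
    for sets in nodes:            # in a real network these run in parallel
        y = x
        while True:               # local cycle of projections at node i
            y_old = y
            for s in sets:
                y = project(y, s)
            if abs(y - y_old) <= eps:
                break
        results.append(y)
    x_next = sum(results) / len(results)   # central averaging step
    if abs(x_next - x) <= eps:
        x = x_next
        break
    x = x_next

print(x)   # converges into the intersection [0.4, 0.5]
```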
(Panels (a) to (f): m = 0, 1, 4, 10, 20, 100; axes: radian frequency ω versus power spectrum.)

FIGURE 38.9 Star Algorithm results.
Example 38.4 Consider a simple five-sensor network similar to the one shown in Figure 38.8. Assume
that the down-sampling ratio in each Mote is equal to four. Thus, again, N₀ = N₁ = N₂ = N₃ = 4.
Assume, further, that the transfer functions H₀(z) to H₃(z) which relate the Motes' front-end outputs
v_i(n) to the original source signal x(n) are the same as those introduced in Example 38.3. We simulated
the Star Algorithm with L = 4 and the Euclidean metric D₁ as the distance function to estimate
the input signal's spectrum. The results are shown in Figure 38.9. Like the Ring Algorithm, the Star
Algorithm also converges to a solution which is almost identical to the actual input spectrum in less than
100 rounds.
38.6 Concluding Remark

In this chapter we considered the problem of fusing the statistical information gained by a distributed
network of sensors. We provided a rigorous mathematical model for this problem where the solution
is obtained by finding a point in the intersection of finitely many closed convex sets. We investigated
distributed optimization algorithms to solve the problem without exchanging the raw observed data
among the sensors.

The information fusion theory presented in this chapter is by no means complete. Many issues regarding
both the performance and implementation of the two algorithms we introduced need to be investigated.
Other algorithms for solving the problem of finding the solution in the intersection of the feasible sets are
possible as well. We hope that our results point the way toward more complete theories and help to
give shape to the emerging field of signal processing for sensor networks.

MATLAB codes implementing the algorithms mentioned in this chapter are maintained online at
www.multirate.org.
Acknowledgments

The authors would like to thank Mr. Mayukh Roy for his help in drawing some of the figures. They are
also very grateful to the Editor, Dr. Richard Zurawski, for his patience and cooperation during the long
process of writing this chapter.
References
[1] H. Jeffreys, Theory of Probability, 3rd ed., Oxford University Press, London, 1967.
[2] R. von Mises, Mathematical Theory of Probability and Statistics, Academic Press, New York, 1964.
[3] S.M. Kay, Modern Spectrum Estimation: Theory and Applications, Prentice Hall, Upper Saddle
River, NJ, 1988.
[4] D.B. Percival and A.T. Walden, Statistical Digital Signal Processing and Modeling, Cambridge
University Press, London, 1993.
[5] M.H. Hayes, Statistical Signal Processing and Modeling, John Wiley and Sons, New York, 1996.
[6] B. Buttkus, Spectral Analysis and Filter Theory in Applied Geophysics, Springer-Verlag, Berlin,
2000.
[7] O.S. Jahromi, B.A. Francis, and R.H. Kwong, Spectrum estimation using multirate observations,
IEEE Transactions on Signal Processing, 52(7), 1878–1890, July 2004. Preprint available from
www.multirate.org.
[8] A.N. Tikhonov and V.Y. Arsenin, Solutions of Ill-Posed Problems, V.H. Winston & Sons,
Washington, DC, 1977.
[9] V.V. Vasin and A.L. Ageev, Ill-Posed Problems with A Priori Information, VSP, Utrecht,
The Netherlands, 1995.
[10] H.W. Engl, M. Hanke, and A. Neubauer, Regularization of Inverse Problems, Kluwer Academic
Publishers, Dordrecht, The Netherlands, 1996.
[11] A.N. Tikhonov, A.S. Leonov, and A.G. Yagola, Nonlinear Ill-Posed Problems, Chapman & Hall,
London, 1998, 2 vols.
[12] K. Chadan, D. Colton, L. Päivärinta, and W. Rundell, An Introduction to Inverse Scattering and
Inverse Spectral Problems, SIAM, Philadelphia, 1997.
[13] V. Vapnik, Statistical Learning Theory, John Wiley & Sons, New York, 1999.
[14] F. Jones, Lebesgue Integration on Euclidean Space, Jones and Bartlett Publishers, Boston, MA, 1993.
[15] J. Hadamard, Lectures on Cauchy's Problem in Linear Partial Differential Equations, Yale University
Press, New Haven, CT, 1923.
[16] A.N. Tikhonov, On solving ill-posed problems and the method of regularization, Doklady Akademii
Nauk SSSR, 151, 501–504, 1963 (in Russian), English translation in Soviet Math. Dokl.
[17] A.N. Tikhonov, On the regularization of ill-posed problems, Doklady Akademii Nauk SSSR, 153,
49–52, 1963 (in Russian), English translation in Soviet Math. Dokl.
[18] D.L. Phillips, A technique for numerical solution of certain integral equations of the first kind,
Journal of the Association for Computing Machinery, 9, 84–97, 1962.
[19] V.K. Ivanov, Integral equations of the first kind and the approximate solution of an inverse potential
problem, Doklady Akademii Nauk SSSR, 142, 997–1000, 1962 (in Russian), English translation in
Soviet Math. Dokl.
[20] V.K. Ivanov, On linear ill-posed problems, Doklady Akademii Nauk SSSR, 145, 270–272, 1962
(in Russian), English translation in Soviet Math. Dokl.
[21] V.V. Vasin, Relationship of several variational methods for approximate solutions of ill-posed
problems, Mathematical Notes, 7, 161–166, 1970.
[22] Y. Censor and S.A. Zenios, Parallel Optimization: Theory, Algorithms, and Applications, Oxford
University Press, Oxford, 1997.
[23] H.H. Bauschke and J.M. Borwein, On projection algorithms for solving convex feasibility problems,
SIAM Review, 38, 367–426, 1996.
[24] J.M. Borwein and A.S. Lewis, Partially-finite programming in L₁ and the existence of maximum
entropy estimates, SIAM Journal on Optimization, 3, 248–267, 1993.
[25] M. Klaus and R.T. Smith, A Hilbert space approach to maximum entropy regularization,
Mathematical Methods in the Applied Sciences, 10, 397–406, 1988.
[26] U. Amato and W. Hughes, Maximum entropy regularization of Fredholm integral equations of
the first kind, Inverse Problems, 7, 793–808, 1991.
[27] J.M. Borwein and A.S. Lewis, Convergence of best maximum entropy estimates, SIAM Journal on
Optimization, 1, 191–205, 1991.
[28] P.P.B. Eggermont, Maximum entropy regularization for Fredholm integral equations of the first
kind, SIAM Journal on Mathematical Analysis, 24, 1557–1576, 1993.
[29] M. Teboulle and I. Vajda, Convergence of best φ-entropy estimates, IEEE Transactions on
Information Theory, 39(1), 297–301, 1993.
[30] A.S. Leonov, A generalization of the maximal entropy method for solving ill-posed problems,
Siberian Mathematical Journal, 41, 716–724, 2000.
[31] N. Wu, The Maximum Entropy Method, Springer-Verlag, Berlin, 1997.
39
Sensor Network
Security
Guenter Schaefer
Fachgebiet Telematik/Rechnernetze
Technische Universitaet Ilmenau
Berlin
39.1 Introduction and Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . 39-2
39.2 DoS and Routing Security. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39-4
39.3 Energy Efficient Confidentiality and Integrity . . . 39-7
39.4 Authenticated Broadcast . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39-11
39.5 Alternative Approaches to Key Management . . . . . . . . . . 39-13
39.6 Secure Data Aggregation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39-19
39.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39-21
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39-22
This chapter gives an introduction to the specific security challenges in wireless sensor networks and some
of the approaches to overcome them that have been proposed so far. As this area of research is very active
at the time of writing, it is to be expected that more approaches are going to be proposed as the field gets
more mature, so this chapter should be understood as a snapshot rather than a definitive account of the
field.

When thinking of wireless sensor network security, one major question that comes to mind is: what are
the differences between security in sensor networks and general network security? Well, in both cases
one usually aims to ensure certain security objectives (also called security goals). In general, the following
objectives are pursued: authenticity of communicating entities and messages (data integrity), confidentiality,
controlled access, availability of communication services, and nonrepudiation of communication acts [1].
And basically, these are the same objectives that need to be ensured in wireless sensor networks (with
maybe the exception of nonrepudiation, which is of less interest at the level on which sensor networks
operate). Also, in both cases cryptographic algorithms and protocols [2] are the main tools to be deployed
for ensuring these objectives. So, from a high-level point of view, one could come to the conclusion that
sensor network security does not add much to what we already know from network security in general,
and thus the same methods could be applied in sensor networks as in classical fixed or wireless networks.

However, closer consideration reveals various differences that have their origins in specific characteristics
of wireless sensor networks, so that straightforward application of known techniques is not appropriate.
In this chapter we, therefore, first point out these characteristics and give an overview of the specific threats
and security challenges in sensor networks. The remaining sections of the chapter then deal in more detail
with the identified challenges, which are: Denial of Service (DoS) and routing security, energy efficient
confidentiality and integrity, authenticated broadcast, alternative approaches to key management, and
secure data aggregation.
39.1 Introduction and Motivation

The main characteristics of wireless sensor networks can be summarized as follows; they are
envisaged to be:

Formed by tens to thousands of small, inexpensive sensors that communicate over a wireless
interface.
Connected via base stations to traditional networks/hosts running applications interested in the
sensor data.
Using multi-hop communications among sensors in order to bridge the distance between sensors
and base stations.
Considerably resource constrained owing to limited energy availability.

To get an impression of the processing capabilities of a wireless sensor node, one should have the following
example of a sensor node in mind: a node running an 8-bit CPU at 4 MHz clock frequency, with 4 KB
of its 8 KB flash read-only memory free, 512 bytes of SRAM main memory, and a 19.2 Kbit/sec radio interface,
the node being powered by a battery.

Typical applications envisaged for wireless sensor networks are environment monitoring (earthquake or
fire detection, etc.), home monitoring and convenience applications, site surveillance (intruder detection),
logistics and inventory applications (tagging and locating goods, containers, etc.), as well as military
applications (battleground reconnaissance, troop coordination, etc.). The fundamental communication
pattern to be used in such a network consists of an application demanding some named information in a
specific geographical area. Upon this request, one or more base stations broadcast the request, and wireless
sensors relay the request and generate answers to it if they contribute to the requested information. The
answers are then processed and aggregated as they flow through the network toward the base station(s).
Figure 39.1 shows an exemplary sensor network topology as currently designated for such applications.
The sensor network itself consists of one or more base stations that may be able to communicate among
each other by some high-bandwidth link (e.g., IEEE 802.11). The base stations furthermore communicate
with sensor nodes over a low-bandwidth link. As not all sensor nodes can communicate directly with the
base station, multi-hop communication is used in the sensor network to relay queries or commands sent
(Legend: sensor nodes with low-power radio links; base stations with high-bandwidth radio links; the sensor network connected via the Internet to a classical infrastructure.)

FIGURE 39.1 A general sensor network topology example.
by the base station to all sensors, as well as to send back the answers from sensor nodes to the base station.
If multiple sensors contribute to one query, partial results may be aggregated as they flow toward the base
station. In order to communicate results or report events to an application residing outside the sensor
network, one or more base stations may be connected to a classical infrastructure network.
As the above description already points out, there are significant differences between wireless sensor and
so-called ad hoc networks, to which they are often compared. Both types of networks can be differentiated
more specifically by considering the following characteristics [3]:

Sensor networks show distinctive application-specific characteristics; for example, depending on
its application, a sensor network might be very sparse or dense.
The interaction of the network with its environment may cause rather bursty traffic patterns.
Consider, for example, a sensor network deployed for detecting/predicting earthquakes or for fire
detection. Most of the time, there will be little traffic, but if an incident happens the traffic load
will increase heavily.
The scale of sensor networks is expected to vary between tens and thousands of sensors.
Energy is even more scarce than in ad hoc networks, as sensors will be either battery powered or
powered by environmental phenomena (e.g., vibration).
Self-configurability will be an important feature of sensor networks. While this requirement also
exists for ad hoc networks, its importance is even higher in sensor networks, as, for example, human
interaction during configuration might be prohibitive, the geographic position of sensor nodes has
to be learnt, etc.
Regarding dependability and Quality-of-Service (QoS), classical QoS notions such as throughput,
jitter, etc., are of little interest in sensor networks, as the main requirement in such networks is the
plain delivery of requested information, and most envisaged applications only pose low-bandwidth
requirements.
As sensor networks follow a data-centric model, sensor identities are of little interest, and new
addressing schemes, for example, based on semantics or geography, are more appealing.
The required simplicity of sensor nodes in terms of operating system, networking software, memory
footprint, etc., is much more constraining than in ad hoc networks.
So far, we have mainly described sensor networks according to their intrinsic characteristics, and regarding
their security, we have only stated that principally the same security objectives need to be met as in other
types of networks. This leads to the question: what makes security in sensor networks a genuine area of
network security research?

To give a short answer, there are three main reasons for this. First, sensor nodes are deployed under
particularly harsh conditions from a security point of view, as there will often be a high number of
nodes distributed in a (potentially hostile) geographical area, so that it has to be assumed that at least
some nodes may get captured and compromised by an attacker. Second, the severe resource constraints
of sensor nodes in terms of computation time, memory, and energy consumption demand very
optimized implementations of security services, and also lead to a very unfair power balance between a
potential attacker (e.g., equipped with a notebook) and the defender (a cheap sensor node). Third, the specific
property of sensor networks to aggregate (partial) answers to a request as the information flows from the
sensors toward the base station calls for new approaches to ensuring the authenticity of sensor query
results, as established end-to-end security approaches are not appropriate for this.

Consequently, the following security objectives prove to be challenging in wireless sensor networks:

Avoiding and coping with sensor node compromise. This includes measures to partially hide the location
of sensor nodes at least on the network layer, so that an attacker should ideally not be able to use network
layer information in order to locate specific sensor nodes. Furthermore, sensor nodes should as far as
possible be protected from compromise through tamper-proofing measures, where this is economically
feasible. Finally, as node compromise cannot be ultimately prevented, other sensor network security
mechanisms should degrade gracefully in case of single node compromises.
Maintaining availability of sensor network services. This requires a certain level of robustness against
so-called DoS attacks, protection of sensor nodes from malicious energy draining, and ensuring the correct
functioning of message routing.
Ensuring confidentiality and integrity of data. Data retrieved from sensor networks should be protected
from eavesdropping and malicious manipulation. In order to attain these goals in sensor networks, both
efficient cryptographic algorithms and protocols as well as an appropriate key management are required,
and furthermore the specific communication pattern of sensor networks (including data aggregation) has
to be taken into account.
In the following sections, we will discuss these challenges in more detail and present first approaches that
have been proposed to meet them.
39.2 DoS and Routing Security
Denial of Service (DoS) attacks aim at denying or degrading a legitimate user's access to a service or network
resource, or at bringing down the servers offering such services themselves.
From a high-level point of view, DoS attacks can be classified into the two categories resource destruction
and resource allocation. In a more detailed examination, the following DoS attacking techniques can be
identified:
1. Disabling services by:
   - Breaking into systems (hacking)
   - Making use of implementation weaknesses such as buffer overruns
   - Deviation from proper protocol execution
2. Resource depletion by causing:
   - Expensive computations
   - Storage of state information
   - Resource reservations (e.g., bandwidth)
   - High traffic load (requires high overall bandwidth from the attacker)
Generally speaking, these attacking techniques can be applied to protocol processing functions at different
layers of the protocol architecture of communication systems. While some of the attacking techniques can
be defended against by a combination of established means of good system management, software engineering,
monitoring, and intrusion detection, the attacking techniques of protocol deviation and resource depletion
require dedicated analysis for specific communication protocols.
In sensor networks, two aspects raise specific DoS concerns: first, breaking into sensor nodes is facilitated
by the fact that it might be relatively easy for an attacker to physically capture and manipulate some of the
sensor nodes distributed in an area, and second, energy is a very scarce resource in sensor nodes, so any
opportunity for an attacker to cause a sensor node to wake up and perform some processing functions is
a potential DoS vulnerability.
In 2002, Wood and Stankovic [4] published an article on DoS threats in sensor networks in which they
mainly concentrated on protocol functions of the first four Open System Interconnection (OSI) layers.
Table 39.1 gives an overview of their findings and the potential countermeasures proposed.
TABLE 39.1 DoS Threats in Wireless Sensor Networks [4]

Network layer  Attacks             Countermeasures
Physical       Tampering           Tamper-proofing, hiding
               Jamming             Spread-spectrum, priority messages,
                                   lower duty cycle, region mapping,
                                   mode change
Link           Collision           Error-correcting code
               Exhaustion          Rate limitation
               Unfairness          Small frames
Network        Neglect and greed   Redundancy, probing
               Homing              Encryption (only partial protection)
               Misdirection        Egress filtering, authorization, monitoring
               Black holes         Authorization, monitoring, redundancy
Transport      Flooding            Client puzzles
               Desynchronization   Data origin authentication

On the physical layer, jamming of the wireless communication channel represents the principal attacking
technique. Spread-spectrum techniques are by nature more resistant against this kind of attack, but
nevertheless cannot guarantee the availability of physical layer services. In case the bandwidth available
in an area is reduced by a DoS attack, giving priority to more important messages could help to maintain
at least basic operations of a sensor network. While jamming mainly disturbs the ability of sensor
nodes to communicate, it has a second DoS-relevant side effect: as a consequence of worse channel
conditions, sensor nodes need more energy to exchange messages. Depending on the protocol implementation,
this could even lead to energy exhaustion of some nodes, if they tirelessly tried to send their messages
instead of waiting for better channel conditions. Therefore, from a DoS avoidance point of view, lower
duty cycles could be a beneficial protocol reaction to bad channel conditions. Furthermore, the routing
protocol (see also later) should avoid directing messages into jammed areas, and ideally, cooperating sensor
nodes located at the edge of a jammed area could collaborate to map jamming reports and reroute traffic
around this area. If sensor nodes possess multiple modes of communication (e.g., wireless and infrared
communications), changing the mode is also a potential countermeasure. Finally, even if not directly
related to communications, capturing and tampering with sensor nodes can also be classified as a physical
layer threat. Tamper-proofing of nodes is one obvious measure to avoid further damage resulting from
misuse of captured sensor nodes. A traditional preventive measure to at least render capturing of nodes
more difficult is to hide them.
On the link layer, Wood and Stankovic identify (malicious) collisions and unfairness as potential threats
and propose as classical measures the use of error-correcting codes and small frames. While one could argue
that both threats (and respective countermeasures) are not actually security specific but also known as
conventional problems (and strategies for overcoming them), their deliberate exploitation for DoS attacks
could nevertheless lead to temporary unavailability of communication services, and ultimately to exhaustion
of sensor nodes. For the latter threat, the authors propose rate limitation as a potential countermeasure
(basically the same idea as the lower duty cycle mentioned in the physical layer discussion).
Considering the network layer, threats can be further subdivided into forwarding- and routing-related
threats. Regarding forwarding, the main threats are neglect and greed, that is, sensor nodes that might only
be interested in getting their own packets transferred in the network without correctly participating in the
forwarding of other nodes' packets. Such behavior could potentially be detected by the use of probing
packets and circumvented by using redundant communication paths. However, both measures increase
the network overhead and thus do not come for free. If packets contain the geographical position of nodes
in cleartext, this could be exploited by an attacker for homing in on (locating) specific sensor nodes in order to
physically capture and compromise them. As a countermeasure against this threat, Wood and Stankovic
propose encryption of message headers and content between neighboring nodes. Regarding routing-related
threats, deliberate misdirection of traffic could lead to a higher traffic load, as a consequence to
higher energy consumption in a sensor network, and potentially also to unreachability of certain network
parts. Potential countermeasures against this threat are egress filtering, that is, checking the direction
in which messages will be routed, authorization verification of routing-related messages, monitoring of
the routing and forwarding behavior of nodes by neighboring nodes, and redundant routing of messages over
multiple paths that in the ideal case do not share common intermediate nodes. The same countermeasures
can also be applied in order to defend against so-called black hole attacks, in which one node or part of the
network attracts a high amount of traffic (e.g., by announcing short routes to the base station) but does
not forward this traffic.
On the transport layer, the threats of flooding with connection requests and desynchronization of sequence
numbers are identified in Reference 4. Both attack techniques are known from classical Internet
communications and might also be applied to sensor networks, in case such networks are going to
make use of transport layer connections. Established countermeasures to defend against them are so-called client
puzzles [5] and authentication of communication partners.
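The text only names client puzzles as a countermeasure; a common instantiation (assumed here for illustration, not taken from Reference 5) is a hash-preimage puzzle: before allocating connection state, the server sends a nonce and a difficulty d, and the client must find a value x such that H(nonce | x) starts with d zero bits. Verification costs the server a single hash, while solving costs the client roughly 2^d hashes on average:

```python
import hashlib
import itertools

def leading_zero_bits(digest: bytes) -> int:
    """Count the number of leading zero bits in a byte string."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
        else:
            return bits + (8 - byte.bit_length())
    return bits

def solve_puzzle(nonce: bytes, d: int) -> int:
    """Client side: brute-force an x so that H(nonce || x) has d leading zero bits."""
    for x in itertools.count():
        h = hashlib.sha256(nonce + x.to_bytes(8, 'big')).digest()
        if leading_zero_bits(h) >= d:
            return x

def check_puzzle(nonce: bytes, d: int, x: int) -> bool:
    """Server side: a single hash suffices to verify the solution."""
    h = hashlib.sha256(nonce + x.to_bytes(8, 'big')).digest()
    return leading_zero_bits(h) >= d
```

The asymmetry (cheap verification, tunable solving cost) is what makes the construction attractive against flooding: the server commits no state until the client has demonstrably spent work.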
Recapitulating the given discussion, it can be seen that especially the network layer exhibits severe DoS
vulnerabilities and proves to be the most interesting layer for potential attackers interested in degrading
the availability of sensor network services. This is mostly owing to the fact that the essential
forwarding and routing functionality is realized in this layer, so that an attacker can cause significant damage with
rather moderate means (e.g., in comparison to jamming a large area). In the following, we will, therefore,
further elaborate on this layer and at the same time extend our discussion to general threats on forwarding
and routing functions, including attacks beyond pure DoS interests.
In Reference 6, Karlof and Wagner give an overview of attacks and countermeasures regarding secure
routing in wireless sensor networks. From a high-level point of view, they identify the following threats:
- Insertion of spoofed, altered, or replayed routing information with the aim of constructing loops,
attracting or repelling traffic, etc.
- Forging of acknowledgments, which may trick other nodes into believing that a link or node is either
dead or alive when in fact it is not.
- Selective forwarding, which may be realized either "in path" or "beneath path" by deliberate jamming,
and which allows an attacker to control what information is forwarded and what information is suppressed.
- Creation of so-called sinkholes, that is, attracting traffic to a specific node, for example, to prepare
selective forwarding.
- Simulating multiple identities (Sybil attacks), which allows an attacker to reduce the effectiveness of
fault-tolerant schemes like multi-path routing.
- Creation of so-called wormholes by tunneling messages over alternative low-latency links, for
example, to confuse the routing protocol, create sinkholes, etc.
- Sending of so-called hello floods (more precisely: hello shouting), in which an attacker sends or
replays a routing protocol's hello packets with more energy in order to trick other nodes into the
belief that they are neighbors of the sender of the received messages.
In order to give an example of such attacks, Figure 39.2 [7] illustrates the construction of a breadth-first
spanning tree, and Figure 39.3 [6] shows the effect of two attacks on routing schemes that use the
breadth-first search tree idea to construct their forwarding tables.
One example of a sensor network operating system that builds a breadth-first spanning tree rooted
at the base station is TinyOS. In such networks, an attacker disposing of one or two laptops can either
send out forged routing information or launch a wormhole attack. As can be seen in Figure 39.3, both
attacks lead to entirely different routing trees and can be used to prepare further attacks such as selective
forwarding.
In order to defend against the abovementioned threats, Karlof and Wagner discuss various methods.
Regarding forging of routing information or acknowledgments, data origin authentication and confidentiality
of link layer PDUs (Protocol Data Units) can serve as an effective countermeasure. While the first
naive approach of using a single group key for this purpose exhibits the rather obvious vulnerability that
a single node compromise would result in complete failure of the security, a better, still straightforward
approach is to let each node share a secret key with a base station and to have base stations act as trusted
third parties in key negotiation (e.g., using the Otway-Rees protocol [8]).
Combined with an appropriate key management, the abovementioned link layer security measures
could also limit the threat potential of the attack of simulating multiple identities: by reducing the
number of neighbors a node is allowed to have, for example, through enforcement during key distribution,
authentic sensor nodes could be protected from accepting too many neighborhood relations. Additionally,
by keeping track of authentic identities and associated keys, the ability of potentially compromised nodes
to simulate multiple identities could be restricted. However, the latter idea requires some kind of global
knowledge that can often only be realized efficiently by a centralized scheme which actively involves a
base station in the key distribution protocol.

FIGURE 39.2 Breadth-first search: (a) BS sends beacon, (b) first answers to beacon, (c) answers to first answers, and
(d) resulting routing tree.

FIGURE 39.3 Attacks on breadth-first search routing: example routing tree, forging routing updates, wormhole attack.
When it comes to hello shouting and wormhole/sinkhole attacks, however, pure link layer security
measures cannot provide sufficient protection, as they cannot completely protect against replay attacks.
Links should, therefore, be checked in both directions before making routing decisions in order to defend
against simple hello shouting attacks. Detection of wormholes actually proves to be difficult, and a first
approach to this problem requires rather tight clock synchronization [9]. Sinkholes might be avoided by
deploying routing schemes like geographical routing that do not rely on constructing forwarding tables
according to the distance to the destination measured in hops. Selective forwarding attacks might be countered
with multi-path routing. However, this requires redundancy in the network and results in higher network
overhead.
39.3 Energy Efficient Confidentiality and Integrity
The preceding discussion of potential countermeasures against DoS attacks and general attacks on routing
in wireless sensor networks has shown that the security services confidentiality and integrity prove to be
valuable mechanisms against various attacks. Obviously, they are also effective measures to protect application
data (e.g., commands and sensor readings) against unauthorized eavesdropping and manipulation,
respectively. In this section, we will therefore examine their efficient implementation in resource-restricted
sensor networks.
In their paper, SPINS: Security Protocols for Sensor Networks, Perrig et al. [10] discuss the requirements
and propose a set of protocols for realizing efficient security services for sensor networks. The main challenges
in the design of such protocols arise out of tight implementation constraints in terms of instruction
set, memory, CPU speed, a very small energy budget in low-powered devices, and the fact that some
nodes might get compromised. These constraints rule out some well-established alternatives: asymmetric
cryptography [11-13] is generally considered to be too expensive, as it results in high computational cost
and long ciphertexts and signatures (sending and receiving is very expensive!). Especially, public key
management based on certificates exceeds the sensor nodes' energy budget, and key revocation is almost
impossible to realize under the restricted conditions in sensor networks. Even symmetric cryptography
implementation turns out to be nonstraightforward owing to architectural limitations and energy constraints.
Furthermore, the key management for authenticating broadcast-like communications calls for
new approaches, as simple distribution of one symmetric group key among all receivers would not make
it possible to cope with compromised sensor nodes.
Perrig et al. therefore propose two main security protocols:
- The Sensor Network Encryption Protocol (SNEP) for realizing efficient end-to-end security between
nodes and base stations.
- A variant of the Timed Efficient Stream Loss-Tolerant Authentication Protocol (TESLA), called
µTESLA, for authenticating broadcast communications, which will be further discussed in
Section 39.4.
The main goal in the development of SNEP was the efficient realization of end-to-end security services for
two-party communication. SNEP provides the security services data confidentiality, data origin authentication,
and replay protection. The considered communication patterns are node to base station (e.g., sensor
readings) and base station to individual nodes (e.g., specific requests). Securing messages from a base
station to all nodes (e.g., routing beacons, queries, reprogramming of the entire network) is the task of
the µTESLA protocol to be discussed in Section 39.4. The main design decisions in the development of
SNEP were to avoid the use of asymmetric cryptography, to construct all cryptographic primitives out
of a single block cipher, and to exploit common state in order to reduce communication overhead where
this is possible.
SNEP's basic trust model assumes that two communicating entities A and B share a common master
key K_{A,B}. Initially, the base station shares a master key with all nodes, and node-to-node master keys can
be negotiated with the help of the base station (see later). From such a master key, two confidentiality keys
CK_{A,B}, CK_{B,A} (one per direction), two integrity keys IK_{A,B}, IK_{B,A}, and a random seed RK_{A,B} are derived
according to the following equations:

    CK_{A,B} = F_{K_{A,B}}(1)    (39.1)
    CK_{B,A} = F_{K_{A,B}}(2)    (39.2)
    IK_{A,B} = F_{K_{A,B}}(3)    (39.3)
    IK_{B,A} = F_{K_{A,B}}(4)    (39.4)
    RK_{A,B} = F_{K_{A,B}}(5)    (39.5)

where F is a key derivation function built from RC5-CBC, as described later in this section, applied to
distinct constant inputs so that each derived key is independent.
The principal cryptographic primitive of SNEP is the RC5 algorithm [14]. Three parameters of this
algorithm can be configured: the word length w [bit], the number of rounds r, and the key size b [byte];
the resulting instantiation of the algorithm is denoted as RC5-w/r/b. What makes RC5 specifically
suitable for implementation in sensor nodes is the fact that it can be programmed with a few lines of code
and that the main algorithm only makes use of three simple and efficient instructions: two's complement
addition + of words (mod 2^w), bit-wise XOR ⊕ of words, and cyclic rotation <<<. Figure 39.4 illustrates
the encryption function. The corresponding decryption function can be easily obtained by basically
reading the code in reverse. Prior to en- or decryption with RC5, an array S[0..2r+1] has to be filled by
a key preparation routine that is somewhat more involved, but also uses only simple instructions.

// Algorithm: RC5 Encryption
// Input:  A, B = plaintext stored in two words
//         S[0..2r+1] = an array filled by a key setup procedure
// Output: A, B = ciphertext stored in two words
A := A + S[0];
B := B + S[1];
for i := 1 to r
    A := ((A ⊕ B) <<< B) + S[2i];
    B := ((B ⊕ A) <<< A) + S[2i+1];

FIGURE 39.4 The RC5 encryption algorithm.

TABLE 39.2 Plaintext Requirements for Differential Attacks on RC5

Number of rounds                          4     6     8     10    12    14    16
Differential attack (chosen plaintext)    2^7   2^16  2^28  2^36  2^44  2^52  2^61
Differential attack (known plaintext)     2^36  2^41  2^47  2^51  2^55  2^59  2^63
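As a concrete companion to Figure 39.4, the following is a minimal pure-Python sketch of RC5 with w = 32, including the key setup routine mentioned in the text and the decryption loop obtained by reading the rounds in reverse. It is meant for illustration only, not as a vetted cryptographic implementation.

```python
W = 32                      # word size w in bits
MASK = (1 << W) - 1         # word arithmetic is mod 2^w
P32, Q32 = 0xB7E15163, 0x9E3779B9  # RC5 "magic" constants for w = 32

def _rotl(x, n):
    n %= W
    return ((x << n) | (x >> (W - n))) & MASK

def _rotr(x, n):
    n %= W
    return ((x >> n) | (x << (W - n))) & MASK

def key_setup(key: bytes, r: int):
    """Expand a b-byte key into the array S[0 .. 2r+1]."""
    c = max(1, (len(key) + 3) // 4)
    L = [0] * c
    for i, byte in enumerate(key):          # load key bytes little-endian
        L[i // 4] |= byte << (8 * (i % 4))
    S = [(P32 + i * Q32) & MASK for i in range(2 * r + 2)]
    A = B = i = j = 0
    for _ in range(3 * max(len(S), c)):     # mix key material into S
        A = S[i] = _rotl((S[i] + A + B) & MASK, 3)
        B = L[j] = _rotl((L[j] + A + B) & MASK, (A + B) & MASK)
        i, j = (i + 1) % len(S), (j + 1) % c
    return S

def encrypt(pt, S, r):
    """One RC5 block: pt is a pair of words, exactly as in Figure 39.4."""
    A = (pt[0] + S[0]) & MASK
    B = (pt[1] + S[1]) & MASK
    for i in range(1, r + 1):
        A = (_rotl(A ^ B, B) + S[2 * i]) & MASK
        B = (_rotl(B ^ A, A) + S[2 * i + 1]) & MASK
    return A, B

def decrypt(ct, S, r):
    """Inverse of encrypt: undo the rounds in reverse order."""
    A, B = ct
    for i in range(r, 0, -1):
        B = _rotr((B - S[2 * i + 1]) & MASK, A) ^ A
        A = _rotr((A - S[2 * i]) & MASK, B) ^ B
    return ((A - S[0]) & MASK, (B - S[1]) & MASK)
```

A round trip with RC5-32/12/16, i.e., `decrypt(encrypt(pt, S, 12), S, 12)`, returns the original plaintext pair.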
Regarding the security of the RC5 algorithm, Kaliski and Yin [15] reported in 1998 that the best known
attacks against RC5 with a blocklength of 64 bit have plaintext requirements as listed in Table 39.2.
According to the information given in Reference 10 (RAM requirements, etc.), Perrig et al. seem to
plan for RC5 with 8 rounds and 32-bit words (leading to a blocklength of 64 bit), so that a differential
cryptanalysis attack would require about 2^28 chosen plaintexts or about 2^47 known plaintexts, and CPU
effort in the same order of magnitude. Taking into account progress in PC technology, this should be
considered on the edge of being secure (if an attacker can collect that many plaintexts). Nevertheless, by
increasing the number of rounds, the required effort could be raised to 2^61 or 2^63, respectively. Even higher
security requirements can in principle only be ensured by using a block cipher with a larger block size.
In SNEP, encryption of messages is performed by using the RC5 algorithm in an operational mode
called counter mode, which XORs the plaintext with a pseudo-random bit sequence that is generated by
encrypting increasing counter values (see also Figure 39.5). The encryption of message Msg with key K
and counter value Counter is denoted as: {Msg}_{K, Counter}.
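The counter-mode construction can be sketched as follows. Since a full RC5 is not needed to show the mode itself, the block encryption below is a hash-based stand-in (an assumption for illustration; SNEP uses RC5 here). Note that decryption is the same operation as encryption, because XORing with the keystream twice cancels out:

```python
import hashlib

BLOCK = 8  # keystream block size in bytes

def block_encrypt(key: bytes, counter: int) -> bytes:
    # Stand-in for RC5 encryption of the counter value, used only so that
    # this sketch is self-contained.
    return hashlib.sha256(key + counter.to_bytes(8, 'big')).digest()[:BLOCK]

def ctr_crypt(key: bytes, counter: int, data: bytes) -> bytes:
    """XOR data with the keystream E_K(Counter), E_K(Counter+1), ...
    The same call encrypts and decrypts."""
    out = bytearray()
    for off in range(0, len(data), BLOCK):
        keystream = block_encrypt(key, counter)
        chunk = data[off:off + BLOCK]
        out += bytes(a ^ b for a, b in zip(chunk, keystream))
        counter += 1                        # next counter, next keystream block
    return bytes(out)
```

A practical property visible in the sketch: the ciphertext has exactly the length of the plaintext, which matters when every transmitted byte costs energy.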
For computing Message Authentication Codes (MACs), SNEP uses the well-established Cipher Block
Chaining Message Authentication Code (CBC-MAC) construction. This mode encrypts each plaintext
block P_1, ..., P_n with an integrity key IK, XORing the ciphertext of the last encryption result C_{i-1} with
the plaintext block P_i prior to the encryption step. The result of the last encryption step is then taken as
the message authentication code (see also Figure 39.6).
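The chaining described above can be sketched in a few lines. Again, a hash-based stand-in replaces the RC5 block encryption, and the zero-padding is a naive illustrative choice (plain CBC-MAC is only secure for fixed-length messages, which is acceptable in SNEP's fixed-format setting):

```python
import hashlib

BLOCK = 8  # block size in bytes

def block_encrypt(ik: bytes, block: bytes) -> bytes:
    # Stand-in for RC5 encryption of one block under the integrity key IK.
    return hashlib.sha256(ik + block).digest()[:BLOCK]

def cbc_mac(ik: bytes, msg: bytes) -> bytes:
    """CBC-MAC: chain each block into the next encryption; the last
    ciphertext block is the MAC."""
    if len(msg) % BLOCK:                          # naive zero padding
        msg += b'\x00' * (BLOCK - len(msg) % BLOCK)
    c = b'\x00' * BLOCK                           # initial chaining value
    for off in range(0, len(msg), BLOCK):
        p_i = msg[off:off + BLOCK]
        x = bytes(a ^ b for a, b in zip(c, p_i))  # C_{i-1} XOR P_i
        c = block_encrypt(ik, x)                  # C_i
    return c                                      # MAC = last block
```

Because every block feeds into the next, flipping any message bit changes the final block, so the MAC detects manipulation anywhere in the message.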
Depending on whether encryption of message data is required or not, SNEP offers two message
formats:

1. The first format appends an RC5-CBC-MAC computed with the integrity key IK_{A,B} over the
message data:

       A → B: Msg | RC5-CBC(IK_{A,B}, Msg)
FIGURE 39.5 Encryption in counter mode.
FIGURE 39.6 Computing a MAC in cipher block chaining mode.
2. The second format encrypts the message and appends a MAC in whose computation the counter
value is also included:

       A → B: {Msg}_{CK_{A,B}, Counter} | RC5-CBC(IK_{A,B}, Counter, {Msg}_{CK_{A,B}, Counter})

Please note that the counter value itself is not transmitted in the message, so that common state
between sender and receiver is exploited in order to save transmission energy and bandwidth.
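The flow of the second message format can be sketched end-to-end. The hash-based `prf` below stands in for both the RC5-based encryption and the RC5-CBC MAC (an assumption for illustration, not SNEP's actual primitives); what the sketch does show faithfully is that the MAC covers the counter and the ciphertext, that the counter itself is never transmitted, and that a receiver whose counter is out of step rejects the message:

```python
import hashlib

def prf(key: bytes, data: bytes) -> bytes:
    # Illustrative stand-in for the RC5-based primitives.
    return hashlib.sha256(key + data).digest()[:8]

def ctr_crypt(ck: bytes, counter: int, data: bytes) -> bytes:
    out = bytearray()
    for off in range(0, len(data), 8):
        ks = prf(ck, counter.to_bytes(8, 'big'))
        out += bytes(a ^ b for a, b in zip(data[off:off + 8], ks))
        counter += 1
    return bytes(out)

def mac(ik: bytes, counter: int, ct: bytes) -> bytes:
    # The MAC covers the counter and the ciphertext, as in format 2.
    return prf(ik, counter.to_bytes(8, 'big') + ct)

def snep_send(ck, ik, counter, msg):
    ct = ctr_crypt(ck, counter, msg)
    return ct, mac(ik, counter, ct)          # the counter is NOT transmitted

def snep_recv(ck, ik, counter, ct, tag):
    if mac(ik, counter, ct) != tag:          # verify before decrypting
        raise ValueError("MAC mismatch (wrong counter or tampering)")
    return ctr_crypt(ck, counter, ct)
```

Rejecting a message whose implied counter does not match is exactly the property that gives SNEP its (partial) replay protection, discussed next.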
Furthermore, random numbers are generated by encrypting a (different) counter, and the RC5-CBC
construction is also used for key derivation, as the key deriving function mentioned above is realized as:

    F_{X_{A,B}}(n) := RC5-CBC(X_{A,B}, n)
In order to be able to successfully decrypt a message, the receiver's decryption counter needs to be
synchronized with the sender's encryption counter. An initial counter synchronization can be achieved
by the following protocol, in which the two entities A and B communicate their individual encryption
counter values C_A and C_B to the other party, and authenticate both values by exchanging two MACs
computed with their integrity keys IK_{A,B} and IK_{B,A}, respectively:

    A → B: C_A
    B → A: C_B | RC5-CBC(IK_{B,A}, C_A, C_B)
    A → B: RC5-CBC(IK_{A,B}, C_A, C_B)
In case of a message loss, counters get out of synchronization. By trying out a couple of different counter
values, a few message losses can be tolerated. However, as this consumes energy, after trying out a couple
of succeeding values, an explicit resynchronization dialog is initiated by the receiver A of a message. The
dialog consists of sending a freshly generated random number N_A to B, who answers with his current
counter C_B and a MAC computed with his integrity key over both the random number and the counter
value:

    A → B: N_A
    B → A: C_B | RC5-CBC(IK_{B,A}, N_A, C_B)
As encrypted messages are only accepted by a receiver if the counter value used in their MAC computation
is higher than the last accepted value, the implementation of the confidentiality service in SNEP to a certain
degree also provides replay protection. If for a specific request Req an even tighter time synchronization is
needed, the request can also contain a freshly generated random number N_A that will be included in the
computation of the MAC of the answer message containing the response Rsp:

    A → B: N_A, Req
    B → A: {Rsp}_{CK_{B,A}, C_B} | RC5-CBC(IK_{B,A}, N_A, C_B, {Rsp}_{CK_{B,A}, C_B})
In order to establish a shared secret SK_{A,B} between two sensor nodes A and B with the help of the base
station BS, SNEP provides the following protocol:

    A → B:  N_A | A
    B → BS: N_A | N_B | A | B | RC5-CBC(IK_{B,BS}, N_A | N_B | A | B)
    BS → A: {SK_{A,B}}_{K_{BS,A}} | RC5-CBC(IK_{BS,A}, N_A | B | {SK_{A,B}}_{K_{BS,A}})
    BS → B: {SK_{A,B}}_{K_{BS,B}} | RC5-CBC(IK_{BS,B}, N_B | A | {SK_{A,B}}_{K_{BS,B}})
In this protocol, A first sends a random number N_A and his name to B, who in turn sends both values
together with his own random number N_B and name B to the base station. The base station then generates
a session key SK_{A,B} and sends it to both sensor nodes in two separate messages, which are encrypted with
the respective key the base station shares with each node. The random numbers N_A and N_B allow both
sensor nodes to verify the freshness of the returned message and the key contained in it.
Regarding the security properties of this protocol, however, it has to be remarked that in a strict sense the
protocol as formulated in Reference 10 allows neither A nor B to perform concurrent key negotiations
with multiple entities, as in such a case they would not be able to securely relate the answers to the correct
protocol run (please note that the name of the peer entity is not transmitted in the returned message
but only included in the MAC computation). Furthermore, neither A nor B knows if the other party
received the key and trusts in its suitability, which is commonly regarded as an important objective of a
key management protocol [16]. Finally, the base station cannot deduce anything about the freshness of
messages and can therefore not differentiate between fresh and replayed requests for a session key.
39.4 Authenticated Broadcast
Authenticated broadcast is required if one message needs to be sent to all (or many) nodes in a sensor
network, and the sensor nodes have to be able to verify the authenticity of the message. Examples for
this communication pattern are authenticated query messages, routing beacon messages, or commands
to reprogram an entire network. As it has to be ensured that recipients of such a message are not
able to make use of their verifying key for forging authenticated messages, an asymmetric mechanism has
to be deployed. Classical asymmetric cryptography, however, is considered to be too expensive in terms of
computation, storage, and communication requirements for sensor nodes.
One basic idea for obtaining asymmetry while at the same time deploying a symmetric cryptographic
algorithm is to send a message that has been authenticated with a key K_i and to disclose this key at a later
point in time, so that the authenticity of the message can be verified. Of course, from the moment in
which the key disclosure message has been sent, a potential attacker could use this key to create MACs for
forged messages. Therefore, it is important that all receivers have at least loosely synchronized clocks and
only use a key K_i to verify messages that have been received before the key disclosure message was sent.
However, it must also be ensured that a potential attacker cannot succeed in tricking genuine nodes into
accepting bogus authentication keys generated by himself. One elegant way to achieve this is the inverse
use of a chain of hash codes for obtaining integrity keys, basically a variation of the so-called one-time
password idea [17].
The TESLA protocol uses a reversed chain of hash values to authenticate broadcast data streams [18].
The µTESLA protocol proposed to be used in sensor networks is a minor variation of the TESLA protocol,
with the basic difference being the cryptographic scheme used to authenticate the initial key. While TESLA
uses asymmetric cryptography for this, µTESLA deploys the SNEP protocol, so that the base station
calculates for each sensor node one individual MAC that authenticates the initial key K_0. Furthermore,
while TESLA discloses the key in every packet, µTESLA discloses the key only once per time interval in
order to reduce protocol overhead, and only base stations authenticate broadcast packets because sensor
nodes are not capable of storing entire key chains.
In order to set up a sender, first the length n of the key chain to be computed is chosen and the last key
of the key chain, K_n, is randomly generated. Second, the entire hash key chain is computed according to
the equation K_{i-1} := H(K_i), stored at the sender, and the key K_0 is communicated and authenticated
to all participating sensor nodes. For this, each sensor node A sends a random number N_A to the base
station, and the base station answers with a message containing its current time T_BS, the currently disclosed
key K_i (in the initial case: i = 0), the time period T_i in which K_i was valid for authenticating messages, the
interval length T_Int, the number of intervals δ the base station waits before disclosing a key, and a MAC
computed with the integrity key IK_{BS,A} over these values:

    A → BS: N_A | A
    BS → A: T_BS | K_i | T_i | T_Int | δ | RC5-CBC(IK_{BS,A}, N_A | T_BS | K_i | T_i | T_Int | δ)
After this preparatory phase, broadcasting authenticated packets is realized as follows:
- Time is divided into uniform-length intervals T_i, and all sensor nodes are loosely synchronized to
the clock of the base station.
- In time interval T_i, the sender authenticates packets with key K_i.
- The key K_i is disclosed in time interval i + δ (e.g., δ = 2).
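The reversed hash-chain mechanism underlying these steps can be sketched in a few lines (SHA-256 and the chain length are illustrative assumptions; the text leaves H unspecified). Note the asymmetry: generating the chain requires the secret K_n, but verifying a disclosed key requires only the authenticated K_0:

```python
import hashlib

def make_key_chain(k_n: bytes, n: int):
    """Compute the chain K_n, K_{n-1} = H(K_n), ..., K_0.
    Returns a list indexed so that chain[i] == K_i."""
    chain = [k_n]
    for _ in range(n):
        chain.append(hashlib.sha256(chain[-1]).digest())
    chain.reverse()                       # chain[0] = K_0, chain[n] = K_n
    return chain

def verify_disclosed_key(k0: bytes, k_i: bytes, i: int) -> bool:
    """A node holding the authenticated K_0 verifies a disclosed K_i by
    hashing it back down the chain: H^i(K_i) must equal K_0."""
    x = k_i
    for _ in range(i):
        x = hashlib.sha256(x).digest()
    return x == k0
```

Because H is one-way, an attacker who observes the disclosed keys K_0, ..., K_i still cannot compute the not-yet-disclosed K_{i+1}, which is what prevents forged disclosure messages.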
Figure 39.7 illustrates this reverse use of the chain of hash values for authenticating packets. In order to
check the authenticity of a received packet, a sensor node first has to store the packet together with T_i and
wait until the respective key has been disclosed by the base station. Upon disclosure of the appropriate
key K_i, the authenticity of the packet can be checked.
Of course, for this scheme to be secure, it is crucial to discard all packets that have been authenticated
with an already disclosed key. This requires at least a loose time synchronization with an appropriate value
of δ that needs to be selected in accordance with the maximum clock drift. However, as nodes cannot
store many packets, key disclosure cannot be postponed for a long time, so the maximum clock drift
should not be too big.

FIGURE 39.7 An example of µTESLA operation.
If a sensor node needs to send a broadcast packet, it sends a SNEP-protected packet to the
base station, which in turn sends an authenticated broadcast packet. The main reason for this is
that sensor nodes do not have enough memory for storing key chains and can, therefore, not authenticate
broadcast packets on their own.
39.5 Alternative Approaches to Key Management
Key management is often said to be the hardest part of implementing secure communications, as on the one
hand legitimate entities need to hold or be able to agree on the required keys, and on the other hand,
a suite of security protocols cannot offer any protection if the keys fall into the hands of an attacker. The
SNEP protocol suite as described in Section 39.3 includes a simple and rather traditional key management
protocol that enables two sensor nodes to obtain a shared secret key with the help of a base station. In this
section, we will treat the subject of key management in more depth and review alternative approaches to it.
Key management comprises of the following tasks [1]:
Key generation is the creation of the keys that are used. This process must be executed in a random
or at least pseudo-random-controlled way, because hackers will otherwise be able to execute the
process themselves and, in a relatively short time, discover the key that was used for security.
Pseudo-random-controlled key generation means that keys are created according to a deterministic
approach, but each possible key has the same probability of being created by the method. Pseudo-random
generators must be initialized with a real random value so that they do not always produce
the same keys. If the process of key generation is not reproducible, it is referred to as really random
key generation.
The task of key distribution consists of deploying generated keys at the places in a system where they
are needed. In simple scenarios the keys can be distributed through direct (e.g., personal) contact.
If larger distances are involved and symmetric encryption algorithms are used, the communication
channel again has to be protected through encryption. Therefore, a key is needed for distributing
keys. This necessity motivates the introduction of what are called key hierarchies.
When keys are stored, measures are needed to make sure that they cannot be read by unauthorized
users. One way to address this requirement is to ensure that the key is regenerated from an easy-to-remember
but sufficiently long password (usually an entire sentence) before each use, and
therefore is only stored in the memory of the respective user. Another possibility for storage is
manipulation-safe crypto-modules, which are available on the market in the form of processor chip
cards at a reasonable price.
Key recovery is the reconstruction of keys that have been lost. The simplest approach is to keep
a copy of all keys in a secure place. However, this creates a possible security problem, because an
absolute guarantee is needed that the copies of the keys will not be tampered with. The alternative
is to distribute the storage of the copies to different locations, which minimizes the risk of fraudulent
use as long as there is an assurance that all parts of the copies are required to reconstruct
the keys.
Key invalidation is an important task of key management, particularly with asymmetric cryptographic
methods. If a private key becomes known, then the corresponding public key needs to be identified
as invalid. In sensor networks, key invalidation is expected to be a quite likely operation, as sensor
nodes may be relatively easy to capture and compromise.
The destruction of no longer required keys is aimed at ensuring that messages ciphered with them
cannot be decrypted by unauthorized persons in the future. It is important to make sure
that all copies of the keys have really been destroyed. In modern operating systems this is not a
trivial task, since storage content is regularly transferred to hard disk through automatic storage
management, and deletion in memory gives no assurance that copies of the keys no longer
exist. In the case of magnetic disk storage devices and so-called EEPROMs (Electrically Erasable
Programmable Read-Only Memory), these have to be overwritten or destroyed more than once to
guarantee that the keys stored on them can no longer be read, even with sophisticated technical
schemes.
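The distinction drawn above between pseudo-random-controlled and really random key generation can be sketched in a few lines. This is an illustration only; the helper names are hypothetical, and HMAC-SHA-256 is merely one possible deterministic expansion function:

```python
import hmac
import secrets
import hashlib

def generate_seed(nbytes: int = 32) -> bytes:
    """Really random generation: draw from the OS entropy pool."""
    return secrets.token_bytes(nbytes)

def expand_keys(seed: bytes, count: int, nbytes: int = 16) -> list:
    """Pseudo-random-controlled generation: deterministic expansion of one
    real random seed; every possible key is equally likely, but the process
    is reproducible from the seed alone."""
    return [hmac.new(seed, i.to_bytes(4, "big"), hashlib.sha256).digest()[:nbytes]
            for i in range(count)]

seed = generate_seed()        # must come from a real random source
keys = expand_keys(seed, 4)   # four distinct 128-bit keys
```

Note that anyone holding the seed can regenerate all derived keys, which is exactly why the initialization value must itself be truly random and kept secret.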
Of the listed tasks, most key management protocols address key distribution and sometimes
also key generation. Approaches to distributing keys in traditional networks, however, do not
work well in wireless sensor networks. Methods based on asymmetric cryptography require very resource-intensive
computations and are, therefore, often judged as not being appropriate for sensor networks.
Arbitrated key management, such as the key management protocol of SNEP,
on the other hand, assumes predetermined keys at least between the base station and the sensor nodes. This
requires predistribution of these keys before deployment of the sensor network and also has some security
implications in case of node compromise.
There are a couple of particular requirements on key management schemes for sensor networks resulting
from their specific characteristics [19]:
Vulnerability of nodes to physical capture and node compromise. Sensor nodes may be deployed in difficult-to-protect
or hostile environments and can therefore fall into the hands of an attacker. Because of
tight cost constraints, nodes will often not be tamper-proof, so that cryptographic keys might be captured
by an attacker. This leads to the requirement that compromise of some nodes and keys should not
compromise the overall network's security (graceful degradation).
Lack of a priori knowledge of deployment configuration. In some applications, sensor networks will be
installed via random scattering (e.g., from an airplane), so that neighborhood relations are not known
a priori. Even with manual installation, preconfiguration of sensor nodes would be expensive in large networks.
This leads to the requirement that sensor network key management should support automatic
configuration after installation.
Resource restrictions. As mentioned earlier, nodes of a sensor network only possess limited memory
and computing resources, as well as very limited bandwidth and transmission power. This puts tight
constraints on the design of key management procedures.
In-network processing. Over-reliance on a base station as the source of trust may result in inefficient communication
patterns (cf. data aggregation in Section 39.6). It also turns base stations into attractive
targets (which they are in any case!). Therefore, centralized approaches like the key management protocol
of SNEP should be avoided.
Need for later addition of sensor nodes. Compromise, energy exhaustion, or limited material/calibration
lifetime may make it necessary to add new sensors to an existing network. However, legitimate nodes that
have been added to a sensor network should be able to establish secure relationships with existing nodes.
Erasure of master keys after initial installation (cf. the LEAP approach described later) does not allow this.
In the following, we will describe two new alternatives to traditional key management approaches that
have been proposed for sensor networks: the neighborhood-based initial key exchange protocol Localized
Encryption and Authentication Protocol (LEAP), and the approach of probabilistic key distribution.
LEAP [20] enables the automatic and efficient establishment of security relationships in an initialization
phase after installation of the nodes. It supports key establishment for various trust relationships
between:
Base station and sensor with so-called individual keys
Sensors that are direct neighbors with pairwise shared keys
Sensors that form a cluster with cluster keys
All sensors of a network with a group key
In order to establish individual keys prior to deployment, every sensor node u is preloaded with an
individual key K^m_u known only to the node and the base station. The base station s generates these keys
from a master key K^m_s and the node identity u according to the equation K^m_u := f(K^m_s, u). Generating
all node keys from one master key is supposed to save memory at the base station, as the individual keys
need not be stored at the base station but can be generated on-the-fly when they are needed.
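The on-the-fly derivation of individual keys can be sketched as follows. LEAP only requires f to be a pseudo-random function; instantiating it with HMAC-SHA-256 is an assumption of this sketch:

```python
import hmac
import hashlib

def f(key: bytes, data: bytes) -> bytes:
    """Pseudo-random function f, instantiated here as HMAC-SHA-256."""
    return hmac.new(key, data, hashlib.sha256).digest()

master_key = bytes(32)      # K^m_s, held only by the base station (placeholder value)

def individual_key(node_id: bytes) -> bytes:
    """K^m_u := f(K^m_s, u) -- regenerated whenever needed, never stored."""
    return f(master_key, node_id)

# The base station recomputes the same key each time node u communicates:
k1 = individual_key(b"node-42")
k2 = individual_key(b"node-42")
```

Because the derivation is deterministic, the base station trades a small amount of computation per message for not having to store one key per node.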
In scenarios in which pairwise shared keys cannot be preloaded into sensor nodes because of installation
by random scattering, but neighboring relationships remain static after installation, LEAP provides for a
simple key establishment procedure for neighboring nodes. For this, it is assumed that there is a minimum
time interval T_min during which a node can resist attacks. After being scattered in the field, sensor
nodes establish neighboring relations during this time interval based on an initial group key K_I that has
been preconfigured into all sensor nodes before deployment. First, every node u computes its master key
K_u = f(K_I, u). Then, every node discovers its neighbors by sending a message with its identity u and a
random number r_u and collecting the answers:

u → *: u | r_u
v → u: v | MAC(K_v, r_u | v)

As u can also compute K_v, it can directly check this MAC, and both nodes compute the common shared
secret K_u,v := f(K_v, u). After expiration of the timer T_min, all nodes erase the initial group key K_I and all
computed master keys, so that only the pairwise shared keys are kept. This scheme can be augmented with
all nodes also forwarding the identities of their neighbors, enabling a node to compute pairwise shared
keys with nodes that are one hop away.
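The discovery handshake and pairwise key derivation can be sketched as follows, with both f and the MAC instantiated as HMAC-SHA-256 (an assumption of this sketch; LEAP only requires a pseudo-random function and a secure MAC):

```python
import hmac
import hashlib
import secrets

def f(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()

K_I = secrets.token_bytes(16)          # initial group key, preconfigured in all nodes

def master_key(node_id: bytes) -> bytes:
    return f(K_I, node_id)             # K_v = f(K_I, v)

# u broadcasts (u, r_u); neighbor v answers (v, MAC(K_v, r_u | v)):
u, v = b"u", b"v"
r_u = secrets.token_bytes(8)
answer_mac = f(master_key(v), r_u + v)

# u recomputes K_v from K_I and verifies the answer ...
assert hmac.compare_digest(answer_mac, f(master_key(v), r_u + v))

# ... then both sides derive the pairwise key K_u,v := f(K_v, u):
K_uv_at_u = f(master_key(v), u)        # u computes K_v itself from K_I
K_uv_at_v = f(master_key(v), u)        # v uses its own master key directly
# After T_min, both nodes erase K_I and all master keys, keeping only K_u,v.
```

The sketch also makes the criticism raised later in this section concrete: any node still holding K_I can derive every master key, so the scheme stands or falls with the erasure of K_I after T_min.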
In order to establish a cluster key with all its immediate neighbors, a node u randomly generates a cluster
key K^c_u and sends it individually encrypted to all neighbors v_1, v_2, ...:

u → v_i: E(K_u,v_i, K^c_u)

All nodes v_i decrypt this message with their pairwise shared key K_u,v_i and store the obtained cluster key.
When a node is revoked, a new cluster key is distributed to all remaining nodes.
If a node u wants to establish a pairwise shared key with a node c that is multiple hops away, it can
do so by using other nodes it knows as proxies. In order to detect suitable proxy nodes v_i, u broadcasts a
query message with its own node id and that of c. Nodes v_i knowing both nodes u and c will answer
this message:

u → *: u | c
v_i → u: v_i

Assuming that node u has received m answers, it then generates m shares sk_1, ..., sk_m of the secret key
K_u,c to be established with c and sends them individually over the appropriate nodes v_i:

u → v_i: E(K_u,v_i, sk_i) | f(sk_i, 0)
v_i → c: E(K_v_i,c, sk_i) | f(sk_i, 0)

The value f(sk_i, 0) allows the nodes v_i and c to verify whether the creator of such a message actually knew the
key share sk_i, as otherwise it would not have been able to compute this value (the function f needs to
be a one-way function for this to be secure). After receiving all values sk_i, node c computes K_u,c :=
sk_1 ⊕ ... ⊕ sk_m.
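The splitting of K_u,c into m shares whose XOR recombines to the key is standard XOR-based secret splitting; a minimal sketch (helper names are illustrative, not from Reference 20):

```python
import secrets
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def split(secret: bytes, m: int) -> list:
    """Split a key into m shares sk_1..sk_m with sk_1 ^ ... ^ sk_m == secret."""
    shares = [secrets.token_bytes(len(secret)) for _ in range(m - 1)]
    shares.append(reduce(xor, shares, secret))   # last share fixes the XOR sum
    return shares

K_uc = secrets.token_bytes(16)   # key u wants to establish with c
shares = split(K_uc, 4)          # one share per proxy v_i
recombined = reduce(xor, shares) # what c computes after receiving all shares
```

Since the first m − 1 shares are uniformly random and independent of the secret, an attacker who eavesdrops on any m − 1 of the m proxy paths learns nothing about K_u,c; only compromising all paths reveals the key.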
In order to establish a new group key K_g, the base station s randomly generates a new key and sends it
encrypted with its own cluster key to its neighbors:

s → v_i: E(K^c_s, K_g)

All nodes receiving such a message forward the new group key encrypted with their own cluster key to
their neighbors.
Node revocation is performed by the base station and uses µTESLA. All nodes, therefore, have to be
preloaded with an authentic initial key K_0, and loose time synchronization is needed in the sensor network.
In order to revoke a node u, the base station s broadcasts the following message in time interval T_i using
the µTESLA key K_i valid for that interval:

s → *: u | f(K_g, 0) | MAC(K_i, u | f(K_g, 0))

The value f(K_g, 0) later on allows all nodes to verify the authenticity of a newly distributed group key K_g.
This revocation becomes valid after disclosure of the µTESLA key K_i.
A couple of remarks on some security aspects of LEAP have to be mentioned at this point:
As every node u knowing K_I may compute the master key K_v of every other node v, there is little
additional security to be expected from distinguishing between these different master keys. Especially,
all nodes need to hold K_I during the discovery phase in order to be able to compute the master keys
of answering nodes. The authors of Reference 20 give no reasoning as to why they think that this differentiation
of master keys should attain any additional security. As any MAC construction that deserves its
name should not leak information about K_I in a message authentication code MAC(K_I, r_u | v), it is hard
to see any benefit in this (is it crypto snake oil?).
The synchronization of the time interval for pairwise key negotiation is critical. However, the authors of
Reference 20 give no hint on how the nodes should know when this time interval starts; should there
be a signal, and if so, what should a node do if it misses this signal or sleeps during the interval? It is
clear that if any node is compromised before erasure of K_I, the approach fails to provide protection against
disclosure of pairwise shared keys.
It does not become clear what the purpose of the random value (nonce) in the pairwise shared
key establishment dialog is. Pairwise shared keys are only established during T_min, and most probably, all
neighbors will answer the first message anyway (including the same nonce from this message). This
random value is not even included in the computation of K_u,v, so the only thing that can be defended
against is an attacker that sends replayed replies during T_min; but these would not result in additional
storage of keys K_u,v, or in anything other than having to parse and discard these replays.
The cluster key establishment protocol does not allow a node to check the authenticity of the received
key, as every attacker could send some binary data that is decrypted to something. This would overwrite
an existing cluster key K^c_u with garbage, leading to a DoS vulnerability. By appending a MAC this could
be avoided. However, an additional replay protection would be required in this case in order to avoid
overwriting with old keys.
Furthermore, after expiration of the initial time interval T_min, it is no longer possible to establish pairwise
shared keys among neighbors, so that the LEAP approach does not support later addition or exchange of
sensor nodes.
In 2002, Eschenauer and Gligor [21] proposed a probabilistic key management scheme that is based on
the simple observation that, on the one hand, sharing one key K_G among all sensors leads to weak security,
while on the other hand, sharing individual keys K_i,j among all nodes i, j requires too many keys in large
sensor networks (n^2 − n keys for n nodes). The basic idea of probabilistic key management is to randomly
give each node a so-called key ring containing a relatively small number of keys from a large key pool,
and to let neighboring nodes discover the keys they share with each other. By properly adjusting the size
of the key pool and the key rings, a sufficient degree of shared-key connectivity for a given network size
can be attained.
The basic scheme published in Reference 21 consists of three phases:
Key predistribution
Shared key discovery
Path key establishment
The key predistribution consists of five steps that are processed offline. First, a large key pool P with
about 2^17 to 2^20 keys and accompanying key identifiers is generated. Then, for each sensor, k keys are
randomly selected out of P without replacement, in order to establish the sensor's key ring. Every sensor
is loaded with its key ring comprising the selected keys and their identifiers. Furthermore, all sensor
identifiers and the key identifiers of their key rings are loaded into a controller node. Finally, a shared
key for secure communication with each sensor s is loaded into the controller node ci, according to the
following rule: if K_1, ..., K_k denote the keys on the key ring of sensor s, the shared key K_ci,s is computed
as K_ci,s := E(K_1 ⊕ ... ⊕ K_k, ci).
The main purpose of the key predistribution is to enable any two sensor nodes to identify a common
key with a certain probability. This probability, that two key rings KR1 and KR2 share at least one common
key, can be computed as follows:

Pr(KR1 & KR2 share at least one key) = 1 − Pr(KR1 & KR2 share no key)

The number of possible key rings is:

C(P, k) = P! / (k! (P − k)!)

The number of possible key rings after k keys have been drawn from the key pool without replacement is:

C(P − k, k) = (P − k)! / (k! (P − 2k)!)

Thus, the probability that no key is shared is the ratio of the number of key rings without a match to
the total number of key rings. Hence, the probability of at least one common key is:

Pr(at least one common key) = 1 − ((P − k)! (P − k)!) / (P! (P − 2k)!)
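The probability that two key rings share at least one key, 1 − ((P − k)!)^2 / (P! (P − 2k)!), can be evaluated numerically with log-factorials to avoid overflow. For the parameters quoted later in this section (75 keys out of a pool of 10,000), this sketch yields a value close to one half:

```python
from math import lgamma, exp

def ln_fact(n: int) -> float:
    """Natural log of n!, via the log-gamma function."""
    return lgamma(n + 1)

def p_share(P: int, k: int) -> float:
    """Pr(two random key rings of size k from a pool of P share >= 1 key)."""
    ln_no_share = 2 * ln_fact(P - k) - ln_fact(P) - ln_fact(P - 2 * k)
    return 1.0 - exp(ln_no_share)

p = p_share(10_000, 75)   # close to the p = 0.5 quoted in the text
```

Increasing the ring size k (or shrinking the pool P) raises the connectivity probability, which is exactly the tuning knob the scheme exposes.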
After being installed, all sensor nodes start discovering their neighbors within wireless communication
range, and any two nodes wishing to find out whether they share a key simply exchange the lists of key ids
on their key rings. Alternatively, each node s could broadcast a challenge α together with that challenge
encrypted under each of its keys:

s → *: α | E(K_1, α) | ... | E(K_k, α)

A node receiving such a list would then have to try all its keys, in order to find out matching keys (with
a high probability). This would hide from an attacker which node holds which key ids, but requires more
computational overhead from each sensor node. The shared key discovery establishes a (random graph)
topology in which links exist between nodes that share at least one key. It might happen that one key is
used by more than one pair of nodes.
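The hidden-identifier variant of shared key discovery can be sketched as follows. As an assumption of this sketch, the transformation E(K_i, α) is modeled by a keyed MAC over the challenge, which preserves the property being illustrated: the receiver learns which keys match by trial computation, while key identifiers are never transmitted:

```python
import hmac
import hashlib
import secrets

def tag(key: bytes, challenge: bytes) -> bytes:
    # Stand-in for E(K_i, alpha); any keyed one-way function works here.
    return hmac.new(key, challenge, hashlib.sha256).digest()

pool = [secrets.token_bytes(16) for _ in range(100)]   # (tiny) key pool
ring_s = pool[10:20]                                   # key ring of sender s
ring_r = pool[15:25]                                   # key ring of a receiver

alpha = secrets.token_bytes(8)                         # fresh challenge
broadcast = {tag(k, alpha) for k in ring_s}            # s -> *: alpha | tags

# The receiver tries every key on its own ring against the broadcast list:
shared = [k for k in ring_r if tag(k, alpha) in broadcast]
```

Here the two rings overlap in five pool keys, and the receiver finds exactly those, at the cost of k MAC computations per heard broadcast.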
In the path key establishment phase, path keys are assigned to pairs of nodes (s_1, s_n) that do not share
a key but are connected by two or more links, so that there is a sequence of nodes which share keys and
connect s_1 to s_n. The article [21], however, does not contain any clear information on how path keys
are computed or distributed. It only states that they do not need to be generated by the sensor nodes.
Furthermore, it is mentioned that the design of the DSN ensures that, after the shared key discovery phase
is finished, a number of keys on any ring are left unassigned to any link. However, it does not become
clear from Reference 21 how two nodes can make use of these unused keys for establishing a path key.
If a node is detected to be compromised, all keys on its ring need to be revoked. For this, the controller
node generates a signature key K_e and sends it individually to every sensor node si, encrypted with the
key K_ci,si:

ci → si: E(K_ci,si, K_e)

Afterwards, it broadcasts a signed list of all identifiers of keys that have to be revoked:

ci → *: id_1 | id_2 | ... | id_k | MAC(K_e, id_1 | id_2 | ... | id_k)

Every node receiving this list has to delete all listed keys from its key ring. This removes all links to the
compromised node, plus some more links, from the random graph. Every node that had to remove some
of its links tries to reestablish as many of them as possible by starting a shared key discovery and a path
key establishment phase.
Chan et al. [19] proposed a modification to the basic random predistribution scheme described so
far by requiring the combination of multiple shared keys. In this variant, two nodes are required to share at least
q keys on their rings in order to establish a link. So, if K_1, ..., K_q are the common keys of nodes u and v
(with q ≤ k), the link key is computed as a hash over all q shared keys. A further technique described in
Reference 19, called multi-path key reinforcement, strengthens an already established link key K_u,v by
sending j random values v_1, ..., v_j over disjoint paths and computing the reinforced key as:

K'_u,v = K_u,v ⊕ v_1 ⊕ ... ⊕ v_j
Clearly, the more paths are used, the harder it gets for an attacker to eavesdrop on all of them. However,
the probability of an attacker being able to eavesdrop on a path increases with the length of the path,
so that utilizing more but longer paths does not necessarily increase the overall security attained
by the scheme. In Reference 19, the special case of 2-hop multi-path key reinforcement is analyzed
probabilistically. Furthermore, the paper also describes a third approach, called the random pairwise key
scheme, that hands out keys to pairs of nodes which also store the identity of the respective peer node
holding the same key. The motivation behind this approach is to allow for node-to-node authentication
(see Reference 19 for details).
Concerning security, the following remarks on probabilistic key management should be noted. The
nice property of having a rather high probability that any two given nodes share at least one key (e.g.,
p = 0.5 if 75 keys out of 10,000 keys are given to every node) also plays into the hands of an attacker
who compromises a node, and an attacker that has compromised more than one node has an even
higher probability of holding at least one key in common with any given node. This problem also exists with the
q-composite scheme, as the key pool size is reduced in order to ensure a high enough probability that any
two nodes share at least q keys. This especially concerns the attacker's ability to perform active attacks;
eavesdropping attacks are less probable, because the probability that the attacker holds exactly the key that
two other nodes are using is rather small (and even a lot smaller in the q-composite scheme). Furthermore,
keys of compromised nodes are supposed to be revoked, but as how to detect compromised nodes is
still an open question, how is one to know in a sensor network which nodes and keys should be revoked? Finally,
the presented probabilistic schemes do not support node-to-node authentication (with the exception of
the random pairwise key scheme).

FIGURE 39.8 Aggregating data in a sensor network.
39.6 Secure Data Aggregation
As already mentioned in the introduction, data from different sensors is supposed to be aggregated on its
way toward the base station (see also Figure 39.8). This raises the question of how to ensure authenticity and
integrity of aggregated data. If every sensor added a MAC to its answer in order to ensure data origin
authentication, all (answer, MAC) tuples would have to be sent to the base station in order to enable
checking of their authenticity. This shows that individual MACs are not suitable for data aggregation.
However, if only the aggregating node added one MAC, a subverted node could send arbitrary data
regardless of the data sent by the sensors.
At GlobeCom 2003, Du et al. [22] proposed a scheme that allows a base station to check the integrity
of an aggregated value based on endorsements provided by so-called witness nodes. The basic idea of
this scheme is that multiple nodes perform data aggregation and compute a MAC over their result. This
requires individual keys shared between each node and the base station. In order to allow for aggregated sending
of data, some nodes act as so-called data fusion nodes, aggregating sensor data and sending it toward the
base station. As a data fusion node could be a subverted or malicious node, its result needs to be endorsed
by witness nodes. For this, neighboring nodes receiving the same sensor readings compute their own
aggregated result, compute a MAC over this result, and send it to the data fusion node. The data fusion
node computes a MAC over its own result and sends it together with all received MACs to the base station.
Figure 39.9 illustrates this approach.
In more detail, the scheme described in Reference 22 is as follows:
1. The sensor nodes S_1, S_2, ..., S_n collect data from their environment and make binary decisions
b_1, b_2, ..., b_n (e.g., fire detected) based on some detection rules.
2. Every sensor node sends its decision to the data fusion node F, which computes an aggregated
decision SF.
3. Neighboring witness nodes w_1, w_2, ..., w_m also receive the sensor readings and compute their own
fusion results s_1, s_2, ..., s_m. Every w_i computes a message authentication code MAC_i with the key k_i it
shares with the base station, MAC_i := h(s_i, w_i, k_i), and sends it to the data fusion node.
FIGURE 39.9 Overview of the witness-based approach [22].
4. Concerning the verification at the base station, Du et al. proposed two variants. The first one is an
m + 1 out of m + 1 voting scheme and works as follows:
The data fusion node F computes its message authentication code:

MAC_F := h(SF, F, k_F, MAC_1 ⊕ MAC_2 ⊕ ... ⊕ MAC_m)

F sends to the base station: (SF, F, w_1, ..., w_m, MAC_F).
The base station computes all MAC'_i = h(SF, w_i, k_i) and the authentication code to be expected
from F:

MAC'_F := h(SF, F, k_F, MAC'_1 ⊕ MAC'_2 ⊕ ... ⊕ MAC'_m)

The base station then checks if MAC_F = MAC'_F, and otherwise discards the message.
If the set (w_1, ..., w_m) remains unchanged, the identifiers of the w_i need only be transmitted with
the first MAC_F in order to save transmission bandwidth. There is, however, one major drawback
with this scheme: if one witness deliberately sends a wrong MAC_i, the aggregated data gets refused
by the base station (representing a DoS vulnerability).
5. In order to overcome the DoS vulnerability of the first scheme, Du et al. [22] also proposed an n
out of m + 1 voting scheme:
F sends to the base station: (SF, F, MAC_F, w_1, MAC_1, ..., w_m, MAC_m).
The base station checks if at least n out of m + 1 MACs match, that is, at least n − 1 MAC_i
match MAC_F.
This scheme is more robust against erroneous or malicious witness nodes, but requires a higher
communication overhead, as m MACs must be sent to the base station.
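The m + 1 out of m + 1 verification can be sketched as follows. As assumptions of this sketch, h is instantiated as HMAC-SHA-256 over the concatenated fields, and the witness MACs are chained into the fusion node's MAC by XOR:

```python
import hmac
import hashlib

def h(key: bytes, *fields: bytes) -> bytes:
    return hmac.new(key, b"|".join(fields), hashlib.sha256).digest()

def xor_all(macs) -> bytes:
    out = bytes(32)
    for m in macs:
        out = bytes(a ^ b for a, b in zip(out, m))
    return out

# Keys each node shares with the base station (placeholder values):
keys = {b"F": b"kF", b"w1": b"k1", b"w2": b"k2"}
SF = b"1"   # aggregated binary decision ("fire detected")

# Witnesses endorse their own fusion results s_i (here all equal to SF):
witness_macs = [h(keys[w], SF, w) for w in (b"w1", b"w2")]
mac_F = h(keys[b"F"], SF, b"F", xor_all(witness_macs))

# The base station recomputes everything from SF and the witness ids:
expected = h(keys[b"F"], SF, b"F",
             xor_all(h(keys[w], SF, w) for w in (b"w1", b"w2")))
accepted = hmac.compare_digest(mac_F, expected)
```

Note how the all-or-nothing character of the XOR chain makes the DoS weakness visible: flipping a single witness MAC changes mac_F entirely, so one misbehaving witness suffices to get the result refused.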
Du et al. [22] analyzed the minimum length of the MACs in order to ensure a certain tolerance probability
2^−δ that an invalid result is accepted by the base station. For this, they assume that each MAC has the length
k, there are m witnesses, no witness colludes with F, and F needs to guess the endorsements MAC_i for at
least n − 1 witnesses. As the probability of correctly guessing one MAC_i is p = 1/2^k, the authors compute
the chance of correctly guessing at least n − 1 values as:

P_S = Σ_{i=n−1..m} C(m, i) p^i (1 − p)^(m−i)

After some computation they yield:

δ ≤ m(k/2 − 1)

From this, Du et al. conclude that it is sufficient if mk ≥ 2(δ + m), and give an example of how to apply this.
If δ = 10, so that the probability of accepting an invalid result is 1/1024, and there are m = 4 witnesses,
k should be chosen so that k ≥ 7. This observation is supposed to enable economizing on transmission
effort.
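The bound mk ≥ 2(δ + m) directly yields the minimum MAC length for given parameters; a small sketch (the helper name is illustrative) reproduces the example from the text:

```python
from math import ceil

def min_mac_length(delta: int, m: int) -> int:
    """Smallest k satisfying m*k >= 2*(delta + m)."""
    return ceil(2 * (delta + m) / m)

# delta = 10 (acceptance probability 2**-10 = 1/1024) and m = 4 witnesses:
k4 = min_mac_length(10, 4)   # 7-bit MACs suffice under the bound
k2 = min_mac_length(10, 2)   # fewer witnesses force longer MACs
```

The second call illustrates the trade-off: halving the number of witnesses roughly doubles the MAC length required for the same tolerance.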
In case a data fusion node is corrupted, Du et al. propose to obtain a result as follows: if the verification
at the base station fails, the base station is supposed to poll witness nodes as data fusion nodes, and to
continue trying until the n out of m + 1 scheme described above succeeds. Furthermore, the expected
number of polling messages T(m + 1, n) to be transmitted before the base station receives a valid result
is computed.
Regarding the security of the proposed scheme, however, it has to be considered whether an attacker actually
needs to guess MACs at all in order to send an invalid result. As all messages are transmitted in the clear, an
eavesdropper E could easily obtain valid message authentication codes MAC_i = h(s_i, w_i, k_i). If E later on
wants to act as a bogus data fusion node sending an (at this time) incorrect result s_i, he can replay MAC_i to
support this value. As Reference 22 assumes a binary decision result, an attacker only needs to eavesdrop
until he has received enough MAC_i supporting either value of s_i. Thus, the scheme completely fails to
provide adequate protection against attackers forging witness endorsements.
The main reason for this vulnerability is the missing verification of the freshness of a MAC_i at the base
station. One could imagine, as a quick fix, letting the base station regularly send out random numbers r_B
that have to be included in the MAC computations. In such a scheme, every r_B should only be accepted
for one result, requiring the generation and exchange of large random numbers. A potential alternative
could make use of timestamps, which would require synchronized clocks.
However, there are more open issues with this scheme. For example, it is not clear what should happen
if some witness nodes do not receive enough readings. Also, it is not clear why the MAC_i are not sent
directly from the witness nodes to the base station; this would at least allow for a direct n out of m + 1
voting scheme, avoiding the polling procedure described earlier in case of a compromised data fusion node.
Furthermore, the suffix-mode MAC construction h(message, key) selected by the authors is considered to
be vulnerable [2, note 9.65].
A further issue is how to defend against an attacker flooding the network with "forged" MAC_i ("forged"
meaning arbitrary garbage that looks like a MAC). This would allow an attacker to launch a DoS attack, as an
honest fusion node could not know which values to choose. One more hot fix for this could be using a local
MAC among neighbors to authenticate the MAC_i. Nevertheless, this would imply further requirements
(e.g., shared keys among neighbors, replay protection), and the improved scheme would still
not appear to be mature enough to rely on it.
Some more general conclusions that can be drawn from this are that, first, optimization (e.g., economizing
on MAC size or message length) can be considered one of an attacker's best friends, and, second,
in security we often learn (more) from failures. Nevertheless, the article of Du et al. allows one to discuss the
need for, and the difficulties of, constructing a secure data aggregation scheme that does not consume too
many resources and is efficient enough to be deployed in sensor networks. As such, it can be considered
a valuable contribution despite its security deficiencies.
39.7 Summary
Wireless sensor networks are an upcoming technology with a wide range of promising applications. As
in other networks, however, security is crucial for any serious application. Prevalent security objectives
in wireless sensor networks are confidentiality and integrity of data, as well as availability of sensor
network services, which is threatened by DoS attacks, attacks on routing, etc. Severe resource constraints
in terms of memory, time, and energy, and an unfair power balance between attackers and sensor
nodes make attaining these security objectives particularly challenging. Approaches proposed for wireless
ad hoc networks which are based on asymmetric cryptography are generally considered to be too
resource consuming. This chapter has reviewed basic considerations on protection against DoS and
attacks on routing, and given an overview of the first approaches proposed so far. For ensuring confidentiality
and integrity of data, the SNEP and µTESLA protocols were discussed, and concerning key
management, the LEAP protocol and probabilistic key management were reviewed. At present there
are only a few works on how to design security functions suitable for the specific communication patterns
in sensor networks (especially with respect to data aggregation). The witness-based approach
described in Reference 22, with its flaws, reveals the difficulties in designing an appropriate protocol
for this.
References
[1] Schäfer, G. Security in Fixed and Wireless Networks. John Wiley & Sons, New York, 2003.
[2] Menezes, A., van Oorschot, P., and Vanstone, S. Handbook of Applied Cryptography. CRC Press
LLC, Boca Raton, FL, 1997.
[3] Karl, H. and Willig, A. A Short Survey of Wireless Sensor Networks. TKN Technical report series,
TKN-03-018, Technical University, Berlin, Germany, 2003.
[4] Wood, A. and Stankovic, J. Denial of Service in Sensor Networks. IEEE Computer, 35, 54–62, 2002.
[5] Aura, T., Nikander, P., and Leiwo, J. DOS-Resistant Authentication with Client Puzzles.
In Proceedings of the Security Protocols Workshop 2000, Vol. 2001 of Lecture Notes in Computer
Science. Springer, Cambridge, UK, April 2000.
[6] Karlof, C. and Wagner, D. Secure Routing in Wireless Sensor Networks: Attacks and Countermeasures.
Ad Hoc Networks Journal, 1, 293–315, 2003.
[7] Wood, A. Security in Sensor Networks. Sensor Networks Seminar, University of Virginia,
USA, 2001.
[8] Otway, D. and Rees, O. Efficient and Timely Mutual Authentication. ACM Operating Systems
Review, 21(1), 8–10, 1987.
[9] Hu, Y., Perrig, A., and Johnson, D. Wormhole Detection in Wireless Ad Hoc Networks. Technical
report TR01-384, Rice University, USA, June 2002.
[10] Perrig, A., Szewczyk, R., Tygar, J., Wen, V., and Culler, D. SPINS: Security Protocols for Sensor
Networks. Wireless Networks, 8, 521–534, 2002.
[11] Diffie, W. and Hellman, M.E. New Directions in Cryptography. IEEE Transactions on Information
Theory, IT-22, 644–654, 1976.
[12] Rivest, R.L., Shamir, A., and Adleman, L.A. A Method for Obtaining Digital Signatures and Public
Key Cryptosystems. Communications of the ACM, 21(2), 120–126, 1978.
[13] ElGamal, T. A Public Key Cryptosystem and a Signature Scheme Based on Discrete Logarithms.
IEEE Transactions on Information Theory, 31, 469–472, 1985.
[14] Baldwin, R. and Rivest, R. The RC5, RC5-CBC, RC5-CBC-Pad, and RC5-CTS Algorithms. RFC
2040, IETF, Status: Informational, October 1996. ftp://ftp.internic.net/rfc/rfc2040.txt
[15] Kaliski, B.S. and Yin, Y.L. On the Security of the RC5 Encryption Algorithm. RSA Laboratories
Technical report, TR-602, Version 1.0, 1998.
[16] Gong, L., Needham, R.M., and Yahalom, R. Reasoning About Belief in Cryptographic Protocols.
In Symposium on Research in Security and Privacy. IEEE Computer Society, IEEE Computer Society
Press, Washington, May 1990, pp. 234248.
[17] Haller, N., Metz, C., Nesser, P., and Straw, M. A One-Time Password System. RFC 2289, IETF,
Status: Draft Standard, February 1998. ftp://ftp.internic.net/rfc/rfc2289.txt
[18] Perrig, A. and Tygar, J.D. Secure Broadcast Communication in Wired and Wireless Networks. Kluwer
Academic Publishers, Dordrecht, 2003.
2006 by Taylor & Francis Group, LLC
Sensor Network Security 39-23
[19] Chan, H., Perrig, A., and Song, D. Random Key Predistribution Schemes for Sensor Networks.
In Proceedings of the IEEE Symposium on Security and Privacy. Berkeley, California, 2003,
pp. 197213.
[20] Zhu, S., Setia, S., and Jajodia, S. LEAP: Effcient Security Mechanisms for Large-Scale Distributed
Sensor Networks. In Proceedings of the 10th ACM Conference on Computer and Communication
Security. Washington, DC, USA, 2003, pp. 6272.
[21] Eschenauer, L. and Gligor, V.D. A Key Management Scheme for Distributed Sensor Networks.
In Proceedings of the 9th ACM Conference on Computer and Communication Security. Washington,
DC, USA, 2002, pp. 4147.
[22] Du, W., Deng, J., Han, Y., and Varshney, P.A. Witness-Based Approach for Data Fusion Assurance
in Wireless Sensor Networks. In Proceedings of the IEEE 2003 Global Communications Conference
(Globecom2003). San Francisco, CA, USA, 2003, pp. 14351439.
2006 by Taylor & Francis Group, LLC
40
Software Development for Large-Scale Wireless Sensor Networks
Jan Blumenthal, Frank Golatowski, Marc Haase, and Matthias Handy
University of Rostock
40.1 Introduction .......................................... 40-1
40.2 Preliminaries ......................................... 40-2
     Architectural Layer Model • Middleware and Services for Sensor Networks • Programming Aspect versus Behavioral Aspect
40.3 Current Software Solutions ............................ 40-5
     TinyOS • Maté • TinyDB • SensorWare • MiLAN • EnviroTrack • SeNeTs
40.4 Simulation, Emulation, and Test of Large-Scale Sensor Networks ... 40-16
     TOSSIM: A TinyOS SIMulator • EmStar • Sensor Network Applications (SNA) Test and Validation Environment
40.5 Summary ............................................... 40-25
References ................................................. 40-25
40.1 Introduction
The increasing miniaturization of electronic components and advances in modern communication technologies enable the development of high-performance, spontaneously networked, and mobile systems. Wireless microsensor networks promise novel applications in several domains. Forest fire detection, battlefield surveillance, or telemonitoring of human physiological data are only the vanguard of the many improvements enabled by the deployment of microsensor networks. Hundreds or thousands of collaborating sensor nodes form a microsensor network. Sensor data is collected from the observed area, locally processed or aggregated, and transmitted to one or more base stations.
Sensor nodes can be spread out in dangerous or remote environments, opening up new application fields. A sensor node combines the abilities to compute, communicate, and sense. Figure 40.1 shows the structure of a typical sensor node, consisting of a processing unit, a communication module (radio interface), and sensing and actuator devices.
Figure 40.2 shows a scenario taken from the environmental application domain: leakage detection of dykes. During floods, sandbags are used to reinforce dykes. Piled along hundreds of kilometers around
FIGURE 40.1 Structure of a sensor node: a central unit (processor, memory) connected to sensors, an actuator, a communication module, and a battery.
FIGURE 40.2 Example of sensor network application: leakage detection (sandbags with sensors along a river reporting to a base station).
lakes or rivers, sandbag dykes keep waters at bay and bring relief to residents. Sandbags are stacked against sluice gates and parts of broken dams to block off the tide. To locate spots of leakage, each sandbag is equipped with a moisture sensor and transmits sensor data to a base station next to the dyke. Thus, leakages can be detected earlier and reinforcement actions can be coordinated more efficiently.
Well-known research activities in the field of sensor networks are UCLA's WINS [1], Berkeley's Smart Dust [2], WEBS [3], and PicoRadio [4]. An example of European research activities is the EYES project [5]. Detailed surveys on sensor networks can be found in [6] and [7]. This chapter focuses on innovative architectures and basic concepts of current software development solutions for wireless sensor networks.
40.2 Preliminaries
The central unit of a sensor node is a low-power microcontroller that controls all functional parts. Software for such a microcontroller has to be resource aware on the one hand. On the other hand, several Quality-of-Service (QoS) aspects have to be met by sensor node software, such as latency, processing time for data fusion or compression, or flexibility regarding routing algorithms or MAC techniques.
Conventional software development for microcontrollers usually covers the hardware abstraction layer (HAL), operating system and protocols, and application layer. Often, software for microcontrollers is limited to an application-specific monolithic software block that is optimized for performance and resource usage. Abstracting layers, such as the HAL or operating system, are often omitted due to resource constraints and low-power aspects.
Microcontrollers are often developed and programmed for a specific, well-defined task. This limitation of the application domain leads to high-performance embedded systems even under strict resource constraints. Development and programming of such systems, however, require considerable effort. Furthermore, an application developed for one microcontroller is in most cases not portable to any other one, so that it has to be reimplemented from scratch. Microcontroller and application form an inseparable unit. If the application domain of an embedded system changes, often the whole microcontroller is replaced instead of writing and downloading a new program.
For sensor nodes, application-specific microcontrollers are preferred over general-purpose microprocessors because of the small size and the low energy consumption of those controllers. However, the requirements concerning a sensor node exceed the main characteristics of a conventional microcontroller
and its software. The main reason for this is the dynamic character of a sensor node's task. Sensor nodes can adopt different tasks, such as sensor data acquisition, data forwarding, or information processing. The task assigned to a node at its deployment is not fixed until the end of its life-cycle. Depending on, for instance, the location, energy level, or neighborhood of a sensor node, a task change can become advantageous or even necessary.
Additionally, software for sensor nodes should be reusable. An application running on a certain sensor node should not be tied to a specific microcontroller but should, to some extent, be portable onto different platforms to enhance the interoperability of sensor nodes with different hardware platforms. Not limited to software development for wireless sensor networks is the general requirement for straightforward programmability and, as a consequence, a short development time.
It is quite hard or even impossible to meet the requirements mentioned above with a monolithic application. Hence, at present there is much research effort in the areas of middleware and service architectures for wireless sensor networks. A middleware for wireless sensor networks should encapsulate required functionality in a layer between operating system and application. Incorporating a middleware layer has the advantage that applications become smaller and are not tied to a specific microcontroller. At the same time, the development effort for sensor node applications (SNAs) is reduced, since a significant part of the functionality moves from the application to the middleware. Another research domain addresses service architectures for wireless sensor networks. A service layer is based on the mechanisms of a middleware layer and makes its functionality more usable.
40.2.1 Architectural Layer Model
As in other networked systems, the architecture of a sensor network can be divided into different layers (see Figure 40.3). The lower layers are the hardware and the HAL. The operating system layer and protocols are above the hardware-related layers. The operating system provides basic primitives, such as multithreading, resource management, and resource allocation, that are needed by higher layers. Access to the radio interface and input/output operations to sensing devices are also supported by basic operating system primitives. Usually, in node-level operating systems these primitives are rudimentary and there is no separation between user and kernel mode. On top of the operating system layer reside the middleware, service, and application layers.
In recent years, much work has been done to develop sensor network node devices (e.g., Berkeley motes [8]), operating systems, and algorithms, for example, for location awareness, power reduction, data aggregation, and routing. Today, researchers are working on extended software solutions including middleware and service issues for sensor networks. The main focus of these activities is to simplify the application development process and to support dynamic programming of sensor networks.
The overall development process of sensor node software usually ends with a manual download of an executable image over a direct wired connection or an over-the-air interface to the target node.
FIGURE 40.3 Layered software model: hardware, hardware abstraction layer, operating systems and protocols, middleware, services, and applications.
After deployment of the nodes, it is nearly impossible to improve programs on or adapt new programs to the target nodes. But this feature is necessary in future wireless sensor networks to adapt the behavior of the sensor network dynamically through newly injected programs or capsules, a possibility that exists in Maté [24].
40.2.2 Middleware and Services for Sensor Networks
In sensor networks, the design and development of solutions for higher-level middleware functionality and the creation of service architectures are open research issues. Middleware for sensor networks has two primary goals:
Support of acceptable middleware application programming interfaces (APIs), which abstract and simplify low-level APIs to ease application software development and to increase portability.
Distributed resource management and allocation.
Besides the native network functions, such as routing and packet forwarding, future software architectures are required to enable the location and utilization of services. A service is a program that can be accessed through standardized functions over a network. Services allow a cascading without previous knowledge of each other, and thus enable the solution of complex tasks. A typical service used during the initialization of a node is the localization of a data sink for sensor data. Gateways or neighboring nodes can provide this service. To find services, nodes use a service discovery protocol.
40.2.3 Programming Aspect versus Behavioral Aspect
Wireless sensor networks do not have to consist of homogeneous nodes. In reality, a network composed of several groups of different sensor nodes is imaginable. This fact changes the software development approach and points out new challenges, as they are well known from the distributed systems domain. In an inhomogeneous wireless sensor network, nodes contain different low-level system APIs, however, with similar functions. From a developer's point of view, it is hard to create programs, since the APIs are mostly incompatible. To overcome the mentioned problems of heterogeneity and complexity, new software programming techniques are required. One attempt to accomplish this is the definition of an additional API or an additional class library on top of each system API. But they are all limited by some means or other, for example, in platform independence, flexibility, quantity, or programming language. All approaches to achieve an identical API on different systems are covered by the programming aspect (Figure 40.4).
FIGURE 40.4 Two aspects of software for wireless sensor networks: the programming aspect (system-wide API, splitting the complexity of APIs, hiding the heterogeneity of distributed systems, separation of interface and implementation, optimization of interfaces) and the behavioral aspect (access to remote resources without previous knowledge, adaptation of software to dynamic changes, task change, evolution of the network over time).
The programming aspect enables the developer to easily create programs on different hardware and software platforms. But an identical API on all platforms does not necessarily take the dynamics of the distributed system into account. Ideally, the application does not notice any dynamic system changes. This decoupling is termed the behavioral aspect and covers:
Access to remote resources without previous knowledge, for example, remote procedure calls (RPCs) and discovered services.
Adaptations within the middleware layer to dynamic changes in the behavior of a distributed system, caused by incoming or leaving resources, mobility of nodes, or changes of the environment.
The ability of the network to evolve over time, including modifications of the system's task, exchange or adaptation of running software parts, and mobile agents.
40.3 Current Software Solutions
This section presents five important software solutions for sensor networks. It starts with the most mature development, TinyOS, and its dependent software packages. It continues with SensorWare, followed by two promising concepts, MiLAN and EnviroTrack. The section concludes with an introduction to SeNeTs, which features interface optimization.
40.3.1 TinyOS
TinyOS is a component-based operating system for sensor networks developed at UC Berkeley. TinyOS can be seen as an advanced software framework [8] that has a large user community due to its open-source character and its promising design. The framework contains numerous prebuilt sensor applications and algorithms, for example, multihop ad hoc routing, and supports different sensor node platforms. Originally, it was developed for Berkeley's Mica motes. Programmers experienced with the C programming language can easily develop TinyOS applications written in a proprietary language called nesC [9].
The design of TinyOS is based on the specific sensor network characteristics: small physical size, low-power consumption, concurrency-intensive operation, multiple flows, limited physical parallelism and controller hierarchy, diversity in design and usage, and robust operation to facilitate the development of reliable distributed applications. The main intention of the TinyOS developers was to respect the energy, computational, and storage constraints of sensor nodes by managing the hardware capabilities effectively, while supporting concurrency-intensive operation in a manner that achieves efficient modularity and robustness [10]. Therefore, TinyOS is optimized in terms of memory usage and energy efficiency. It provides defined interfaces between the components that reside in neighboring layers. A layered model is shown in Figure 40.5.
40.3.1.1 Elemental Properties
TinyOS utilizes an event model instead of a stack-based threaded approach, which would require more stack space and multitasking support for context switching, to handle high levels of concurrency in a very small amount of memory. Event-based approaches are the favored solution to achieve high
FIGURE 40.5 Software architecture of TinyOS: Main (includes scheduler), application (user components), acting, sensing, and communication components, and the hardware abstraction.
performance in concurrency-intensive applications. Additionally, the event-based approach uses CPU resources more efficiently and therefore takes care of the most precious resource, energy.
An event is serviced by an event handler. More complex event handling can be done by a task. The event handler is responsible for posting the task to the task scheduler. Event and task scheduling is performed by a two-level scheduling structure. This kind of scheduling ensures that events, associated with a small amount of processing, can be performed immediately, while longer-running tasks can be interrupted by events. Tasks are handled rapidly; however, no blocking or polling is permitted.
The TinyOS system is designed to scale with technology trends, supporting both smaller designs and the crossover of software components into hardware. The latter provides a straightforward integration of software components into hardware.
40.3.1.2 TinyOS Design
The architecture of a TinyOS system configuration is shown in Figure 40.6. It consists of the tiny scheduler and a graph of components. Components satisfy the demand for modular software architectures. Every component consists of four interrelated parts: a command handler, an event handler, an encapsulated fixed-size and statically allocated frame, and a bundle of simple tasks. The frame represents the internal state of the component. Tasks, commands, and handlers execute in the context of the frame and operate on its state. In addition, the component declares the commands it uses and the events it signals. Through this declaration, modular component graphs can be composed. The composition process creates layers of components. Higher-layer components issue commands to lower-level components, and these signal events to higher-level components. To provide an abstract definition of the interaction of two components via commands and events, the bidirectional interface is introduced in TinyOS.
FIGURE 40.6 TinyOS architecture in detail.
Commands are nonblocking requests made to lower-layer components. A command provides feedback to its caller by returning status information. Typically, the command handler puts the command parameters into the frame and posts a task into the task queue for execution. Whether the command was successful can be signaled by an event. Event handlers are invoked by events of lower-layer components or, when directly connected to the hardware, by interrupts. Similar to commands, the frame will be modified and tasks are posted. Both commands and event handlers perform a small, fixed amount of work, similar to interrupt service routines. Tasks perform the primary work. They are atomic, run to completion, and can only be preempted by events. Tasks are queued in a First In First Out (FIFO) task scheduler, so that event or command handling routines can return immediately. Due to the FIFO scheduling, tasks are executed sequentially and should be short. As an alternative to the FIFO task scheduler, priority-based or deadline-based schedulers can be implemented in the TinyOS framework.
TinyOS distinguishes three categories of components. Hardware abstraction components map physical hardware into the component model. Mostly, these components export commands to the underlying hardware and handle hardware interrupts. Synthetic hardware components extend the functionality of hardware abstraction components by simulating the behavior of advanced hardware functions, for example, bit-to-byte transformation functions. In future hardware releases, these components can be cast directly into hardware. High-level software components perform application-specific tasks, for example, control, routing, data transmission, calculation on data, and data aggregation.
An interesting aspect of the TinyOS framework is the similarity of the component description to the description of hardware modules in hardware description languages, for example, VHDL or Verilog. A hardware module, for example, in VHDL, is defined by an entity with input and output declarations, status registers to hold the internal state, and a finite state machine controlling the behavior of the module. In comparison, a TinyOS component contains commands and events, the frame, and a behavioral description. These similarities simplify the cast of TinyOS components into hardware modules. Future sensor node generations can benefit from this similarity in describing hardware and software components.
40.3.1.3 TinyOS Application
A TinyOS application consists of one or more components. These components are separated into modules and configurations. Modules implement application-specific code, whereas configurations wire different components together. By using a top-level configuration, wired components can be compiled and linked to form an executable. The interfaces between the components declare a set of commands and events, which provide an abstract description of the components. The application developer has to implement the appropriate handling routine in the component.
Figure 40.7 shows the component graph of a simple TinyOS application that turns an LED on and off depending on the clock. The top-level configuration contains the application-specific components (ClockC, LedsC, BlinkM) and an operating-system-specific component providing the tiny task scheduler and initialization functions. The Main component encapsulates the TinyOS-specific components from the application. StdControl, Clock, and Leds are the interfaces used in this application. While BlinkM contains the application code, ClockC and LedsC are again configurations encapsulating further component graphs controlling the hardware clock and the LEDs connected to the controller. TinyOS provides a variety of additional extensions, such as the virtual machine (VM) Maté and the database TinyDB for cooperative data acquisition.
40.3.2 Maté
Maté [24] is a byte-code interpreter for TinyOS. It is a tiny communication-centric VM designed as a component for the system architecture of TinyOS. Maté is located in the component graph on top of several system components, represented by sensor components, a network component, a timer component, and a nonvolatile storage component.
The developers' motivation for Maté was to solve novel problems in sensor network management and programming in response to changing tasks, for example, the exchange of the data aggregation function
BlinkM
ClockC LedsC
Main
StdControl
Clock Leds
Clock LED
Hardware
TinyOS
component graph
FIGURE 40.7 Simple TinyOS application.
or the routing algorithm. However, the associated, inevitable reprogramming of hundreds or thousands of nodes is constrained by the energy and storage resources of the sensor nodes. Furthermore, the network is limited in bandwidth, and network activity is a large energy draw. Maté attempts to overcome these problems by propagating so-called code capsules through the sensor network. The Maté VM provides the possibility to compose a wide range of sensor network applications by the use of a small set of higher-level primitives. In Maté, these primitives are one-byte instructions, and they are stored in capsules of 24 instructions together with identifying and versioning information.
40.3.2.1 Maté Architecture
Maté is a stack-based architecture that allows a concise instruction set. The use of instructions hides the asynchronous character of native TinyOS programming, because instructions are executed successively as several TinyOS tasks.
The Maté VM shown in Figure 40.8 has three execution contexts: Clock, Send, and Receive, which can run concurrently at instruction granularity. Clock corresponds to timer events and Receive to message receive events, signaled from the underlying TinyOS components. Send can only be invoked from the Clock or Receive context. Each context holds an operand stack for handling data and a return stack for subroutine calls. Subroutines allow programs to be more complex than a single capsule can provide. Therefore, Maté has four spaces for subroutine code.
The code for the contexts and the subroutines is installed dynamically at runtime by code capsules. One capsule fits into the code space of a context or subroutine. The capsule installation process supports self-forwarding of capsules to reprogram a whole sensor network with new capsules. It is the task of the sensor network operator to inject code capsules in order to change the behavior of the network.
Program execution in Maté starts with a timer event or a packet receive event. The program counter jumps to the first instruction of the corresponding context (Clock or Receive) and executes until it reaches the Halt instruction. Each context can call subroutines for expanded functionality. The Send context is invoked from the other contexts to send a message in response to a sensor reading or to route an incoming message.
FIGURE 40.8 Maté architecture: three execution contexts (Clock, Send, Receive), each with its own code space, operand stack, return stack, and program counter; four subroutine code spaces; and a single shared variable, all on top of the TinyOS framework (network, timer, sensor, and logger components).
The Maté architecture provides separation of contexts. One context cannot access the state of another context. There is only one single shared variable among the three contexts, which can be accessed by special instructions. The context separation qualifies Maté to fulfill the traditional role of an operating system. Compared to native TinyOS applications, the source code of Maté applications is much shorter.
40.3.3 TinyDB
TinyDB is a query processing system for extracting information from a network of TinyOS sensor nodes [11]. TinyDB provides a simple, SQL-like interface to specify the kind of data to be extracted from the network along with additional parameters, for example, the data refresh rate. The primary goal of TinyDB is to free the user from writing embedded C programs for sensor nodes or composing capsules of instructions as in Maté. The TinyDB framework allows data-driven applications to be developed and deployed much more quickly than developing, compiling, and deploying a TinyOS application.
Given a query specifying the data interests, TinyDB collects the data from sensor nodes in the environment, filters and aggregates the data, and routes it to the user autonomously. The network topology in TinyDB is a routing tree. Query messages flood down the tree, and data messages flow back up the tree, participating in more complex data query processing algorithms.
The TinyDB system is divided into two subsystems: the sensor node software and a Java-based client interface on a PC. The sensor node software is the heart of TinyDB, running on each sensor node. It consists of:
A sensor catalog and schema manager, responsible for tracking the set of attributes, or types of readings, and the properties available on each sensor.
A query processor, utilizing the catalog to fetch the values of local attributes, to receive sensor readings from neighboring nodes, to combine and aggregate the values together, to filter, and to output the values to parents.
A small, handle-based dynamic memory manager.
A network topology manager to deal with the connectivity of nodes and to effectively route data and query subresults through the network.
The sensor node part of TinyDB is installed on top of TinyOS on each sensor node as an application. The Java-based client interface is used to access the network of TinyDB nodes from a PC physically connected to a bridging sensor node. It provides a simple graphical query builder and a result display. The Java API simplifies writing PC applications that query and extract data from the network.
40.3.4 SensorWare
SensorWare is a software framework for wireless sensor networks that provides querying, dissemination, and fusion of sensor data as well as coordination of actuators [12]. A SensorWare platform has less stringent resource restrictions; the initial implementation runs on iPAQ handhelds (1 MB ROM/128 KB RAM). The authors intended to develop a software framework regardless of present sensor node limitations.
SensorWare, developed at the University of California, Los Angeles, aims at the programmability of an existing sensor network after its deployment. The functionality of sensor nodes can be dynamically modified through autonomous mobile agent scripts. SensorWare scripts can be injected into the network nodes as queries and tasks. After injection, scripts can replicate and migrate within the network. The motivation for the SensorWare development was the observation that the distribution of updates and the download of complete images to sensor nodes are impractical for the following reasons. First, in a sensor network, a particular sensor node may not be addressable because of missing node identifiers. Second, the distribution of complete images through a sensor network is highly energy consuming. Besides that, other nodes are affected by a download when multihop connections are necessary.
Updating complete images does not correspond to the low-power requirements of sensor networks. As a consequence, it is more practicable to distribute only small scripts. In the following section, the basic architecture and concepts of SensorWare are described in detail.
40.3.4.1 Basic Architecture and Concepts
SensorWare consists of a scripting language and a runtime environment. The language contains various basic commands that control and execute specific tasks of sensor nodes. These tasks include, for example, communication with other nodes, collaboration on sensor data, sensor data filtering, and moving scripts to other nodes. The language comprises the necessary constructs to generate appropriate control flows.
SensorWare utilizes Tcl as its scripting language. However, SensorWare extends Tcl's core commands. These core extension commands are joined in several API groups, such as the Networking API, Sensor API, and Mobility API (see Figure 40.9).
SensorWare is event based. Events are connected to special event handlers. If an event is signaled, an event handler serves the event according to its inherent state. Furthermore, an event handler is able to generate new events and to alter its current state by itself.
The runtime environment shown in Figure 40.10 contains fixed and platform-specific tasks. Fixed tasks are part of each SensorWare application. It is possible to add platform-specific tasks depending on specific application needs. The script manager task receives new scripts and forwards requests to the admission
FIGURE 40.9 SensorWare scripting language.
FIGURE 40.10 SensorWare runtime environment: the script manager (e.g., state tracking, creating new scripts), admission control and policing of resource usage, and resource handling for radio/networking, CPU and time services, and sensing.
Applications
and
services
Injection of
scripts by user
Message
exchanging
Applications
and
services
HW abstraction layer
RTOS
Scripts Scripts
SensorWare
Code
migration
SensorWare
Hardware
Sensor node1
HW abstraction layer
RTOS
Hardware
Sensor node2
FIGURE 40.11 Sensor node architecture.
control task. The admission control task is responsible for script admission decisions and checks the
overall energy consumption. Resource handlers manage different resources of the network.
Figure 40.11 shows the architecture of sensor nodes with the SensorWare software included. The
SensorWare layer uses operating system functions to provide the runtime environment and to control
scripts. Static node applications coexist with mobile scripts. To realize dynamic programmability of a
deployed sensor network, a transient user can inject scripts into it. After injection, scripts are replicated
within the network and the script code migrates between different nodes. SensorWare ensures that no
script is loaded twice onto a node during the migration process.
40.3.5 MiLAN
Middleware Linking Applications and Networks (MiLAN) is a middleware concept introduced by Mark
Perillo and Wendi B. Heinzelman from the University of Rochester [13,14]. The main idea is to exploit the
redundancy of information provided by sensor nodes. The performance of a cooperative algorithm in a
distributed sensor network application depends on the number of involved nodes. Because of the inherent
redundancy of a sensor network, where several sensor nodes provide similar or even equal information,
evaluating all possible sensor nodes leads to high energy and network costs. Therefore, a sensor network
application has to choose an appropriate set of sensor nodes to fulfill application demands.
Each application should have the ability to adapt its behavior to the available set of components
and bandwidth within the network. This can be achieved by a parameterized sensor node selection
process with different cost values. These cost values are described by the following cost equations:

Application performance: The minimum requirements for network performance are calculated from
the needed reliability of monitored data: F_R = {S_i : ∀j ∈ J, R(S_i, j) ≥ r_j}, where F_R stands for the
allowable set of possible sensor node combinations, S_i represents the available sensor nodes, and
R(S_i, j) is their reliability with respect to the monitored variable j.

Network costs: Defines a subset of sensor nodes that meets the network constraints. The network
feasible set is F_N = {S_i : N(S_i) ≤ n_0}, where N(S_i) represents the total cost and n_0 the maximal data
rate the network can support.

Application performance and network costs are combined into the overall feasible set F = F_R ∩ F_N.

Energy: Describes the energy dissipation of the network: C_P(S_i) = Σ_{s_j ∈ S_i} C_P(s_j), where C_P(s_j) is
the power cost of node s_j.
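A minimal sketch of how such a selection could be computed (a hypothetical C illustration; MiLAN itself hides this process from the application): a candidate node combination is acceptable if it lies in both F_R and F_N, and its energy cost is the sum of the per-node power costs.

```c
/* Hypothetical sketch of MiLAN-style node-set selection (illustration only;
 * MiLAN hides this process from the application). A candidate combination
 * S_i is feasible if it meets the reliability requirement r_j for every
 * monitored variable j (membership in F_R) and its total network cost stays
 * within the supported data rate n_0 (membership in F_N). */
#define NUM_VARS 2

struct candidate {
    double reliability[NUM_VARS]; /* R(S_i, j) for each variable j      */
    double net_cost;              /* N(S_i), data rate the set demands  */
};

/* Membership test for the overall feasible set F = F_R ∩ F_N. */
static int feasible(const struct candidate *c,
                    const double r_min[NUM_VARS], double n0)
{
    for (int j = 0; j < NUM_VARS; j++)
        if (c->reliability[j] < r_min[j])
            return 0;                 /* fails F_R for variable j */
    return c->net_cost <= n0;         /* must also lie in F_N     */
}

/* C_P(S_i): energy dissipation, summed over the nodes s_j in the set. */
static double energy_cost(const double node_cost[], int n)
{
    double sum = 0.0;
    for (int j = 0; j < n; j++)
        sum += node_cost[j];
    return sum;
}
```

An application would weight `feasible` candidates by their energy cost; in MiLAN this trade-off is resolved inside the middleware rather than by the application itself.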
It is up to the application to decide how these equations are weighted. This decision-making process is
completely hidden from the application; thus, the development process is simplified significantly. MiLAN
uses two strategies to balance QoS and energy costs:

Turning off nodes with redundant information
Using energy-efficient routing

The MiLAN middleware is located between the network and application layers. It can interface with a great
variety of underlying network protocols, such as Bluetooth and 802.11. MiLAN uses an API to abstract
from the network layer but gives the application access to low-level network components. A set of
commands identifies and configures the network layer.
40.3.6 EnviroTrack
EnviroTrack is a TinyOS-based application developed at the University of Virginia that solves a
fundamental distributed computing problem: environmental tracking of mobile entities [25].
EnviroTrack provides a convenient way to program sensor network applications that track activities in
their physical environment. The programming model of EnviroTrack integrates objects living in physical
time and space into the computational environment of the application through virtual objects, called
tracking objects. A tracking object is represented by a group of sensor nodes in its vicinity and is addressed
by context labels. If an object moves in the physical environment, the corresponding virtual object moves
too, because it is not bound to a dedicated sensor node. Regarding the tracking of objects, EnviroTrack
does not assume cooperation from the tracked entity.
Before a physical object or phenomenon can be tracked, the programmer has to specify its activities
and corresponding actions. This specification enables the system to discover and tag those activities and
to instantiate tracking objects. For example, to track an object warmer than 100°C, the programmer
specifies a Boolean function, temperature > 100°C, and a critical number or mass of sensor nodes that must
fulfill the Boolean function within a certain time (a requirement often referred to as freshness of information).
These parameters of a tracking object are called its aggregate state. All sensor nodes matching this aggregate
state join a group. The network abstraction layer assigns a context label to this group. Using this label,
different groups can be addressed independently of the set of nodes currently assigned to them. If the tracked
object moves, nodes join or leave the group because of the changed aggregate state, but the label
persists. This group management enables context-specific computation.
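The aggregate-state test from the temperature example above can be sketched as follows (a hypothetical C illustration; EnviroTrack itself is built from TinyOS components, and the names here are invented): a group forms when a critical mass of nodes satisfies the Boolean condition within the freshness window.

```c
/* Hypothetical sketch of an EnviroTrack-style aggregate-state test: a group
 * forms when at least `critical_mass` nodes have reported the sensing
 * condition (here, temperature > 100) within the freshness window. */
struct report {
    double temperature;   /* sensed value from one node */
    long   timestamp_ms;  /* when the node reported it  */
};

static int aggregate_state_met(const struct report *r, int n,
                               long now_ms, long freshness_ms,
                               int critical_mass)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        if (r[i].temperature > 100.0 &&                  /* Boolean condition */
            now_ms - r[i].timestamp_ms <= freshness_ms)  /* fresh enough      */
            count++;
    return count >= critical_mass;                       /* critical mass     */
}
```

When this predicate holds, the matching nodes would join a group and receive a context label from the network abstraction layer.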
The EnviroTrack programming system consists of:
EnviroTrack compiler. In EnviroTrack programs, a list of context declarations is defined. Each definition
includes an activation statement, an aggregate state definition, and a list of objects attached to the
definitions. The EnviroTrack compiler includes C program templates. The whole project is then built using
the TinyOS development tools.
Group management protocol. All sensors associated with a group are maintained by this protocol. A group
leader is selected from the group members when the critical mass of nodes and the freshness of the
approximate aggregate state are reached. The group management protocol ensures that only a single group
leader per group exists. The leader sends a periodic heartbeat to inform its members that it is alive.
Additionally, the heartbeat signal is used to synchronize the nodes and to inform nodes that are not part
of the group but fulfill the sensing condition.
Object naming and directory services. These services maintain all active objects and their locations. The
directory service provides a way to retrieve all objects of a given context type. It also assigns names to
groups so they can be accessed easily. It also handles the dynamic joining and leaving of group members.
Communication and transport services. The Migration Transport Protocol (MTP) is responsible for the
transportation of data packets between nodes. All messages are routed via group leader nodes. Group
leader nodes identify the context group of the target node and the position of its leader using the directory
service. The packet is then forwarded to the leader of the destination group. All leadership information
provided by MTP packets is stored in the leaders on a least recently used basis to keep the leader up-to-date
and to reduce directory lookups.
EnviroTrack enables the construction of an information infrastructure for tracking environmental
conditions. It manages dynamic groups of redundant sensor nodes and attaches computation to external
events in the environment. Furthermore, EnviroTrack implements uninterrupted communication
between dynamically changing physical locales defined by environmental events.
40.3.7 SeNeTs
SeNeTs is a middleware architecture for wireless sensor networks developed at the University of
Rostock [15]. The SeNeTs middleware is primarily designed to support the developer of a wireless sensor
network during the predeployment phase (programming aspect). SeNeTs supports the creation of small
and energy-saving programs for heterogeneous networks. One of the key features of SeNeTs is the
optimization of APIs. The required configuration, optimization, and compilation of software components is
processed by a development environment. Besides the programming aspect, the middleware also supports
the behavioral aspect, such as task changes or evolution over time.
40.3.7.1 SeNeTs Architecture
SeNeTs is based on the software layer model introduced in Chapter 2. To increase flexibility and enhance
scalability of sensor node software, it is separated into small functional blocks as shown in Figure 40.12. In
addition, the operating system layer is separated into a node-specific operating system and a driver layer,
which contains at least one sensor driver and several hardware drivers, such as a timer driver and an RF driver.
The node-specific operating system handles device-specific tasks, for example, boot-up, initialization of
hardware, memory management, and process management as well as scheduling. The host middleware is the
superior software layer. Its main task is to organize the cooperation of distributed nodes in the network.
The middleware core handles four optional components, which can be implemented and exchanged according
to the node's task. Modules are additional components that increase the functionality of the middleware.
Typical modules are routing modules or security modules. Algorithms describe the behavior of modules.
FIGURE 40.12 Structure of a node application.

FIGURE 40.13 Structure of a sensor network.
For example, the behavior of a security module can vary in case the encryption algorithm changes.
The services component contains the required software to perform local and cooperative services. This
component usually cooperates with the service components of other nodes to fulfill its task. VMs enable the
execution of platform-independent programs installed at runtime.
Figure 40.13 shows the expansion of the proposed architecture to a whole sensor network from the
logical point of view. Nodes can only be contacted through services of the middleware layers. The
distributed middleware coordinates the cooperation of services within the network. It is logically located
in the network layer but physically exists in the nodes. All layers together, in conjunction with their
configuration, compose the sensor network application. Thus, nodes do not perform any individual tasks.
The administration terminal is an external entity used to configure the network and evaluate results. It can
be connected to the network at any location.
All functional blocks of the described architecture are represented by components containing real source
code and an XML description of their dependencies, interfaces, and parameters. One functional block can
be realized by alternative components. All components are predefined in libraries.
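As an illustration, such a component description might look as follows (the element and attribute names are hypothetical; the text does not specify the actual SeNeTs XML schema):

```xml
<!-- Hypothetical SeNeTs-style component description; schema names invented -->
<component name="routing_flooding">
  <dependencies>
    <requires component="radio_driver"/>
  </dependencies>
  <interfaces>
    <function name="route_send" params="dest,msg"/>
  </interfaces>
  <parameters>
    <param name="max_hops" type="uint8" default="4"/>
  </parameters>
</component>
```

The development environment would match such descriptions against each other to check dependencies and to configure and optimize the selected components.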
40.3.7.2 Interface Optimization
One of the key features in SeNeTs is interface optimization. Interfaces are the descriptions of functions
between two software parts. As illustrated in Figure 40.14, higher-level applications using services and
middleware technologies require abstract software interfaces. The degree of hardware-dependent
interfaces increases in lower software layers. Hardware-dependent interfaces are characterized by parameters
that configure hardware components directly, in contrast to abstract software interfaces, whose parameters
describe abstractions of the underlying system.

FIGURE 40.14 Interfaces within the software-layer model.

FIGURE 40.15 Interface optimization.
Software components require a static software interface to the application in order to minimize
customization effort for other applications and to support compatibility. The use of identical components
in different applications leads to a higher number of complex interfaces in these components. This is
caused by component programming that aims to support as many use cases of all possible applications
as possible, whereby each application uses only a subset of the functionality of a component. Reducing the
resulting overhead is the objective of generic software and can be done by interface optimization at
compile time.
Interface optimizations result in proprietary interfaces within a node (Figure 40.15). Parts of the
software can then no longer be exchanged without considerable effort. In a sensor node, the software is
mostly static, except for programs running in VMs. Accordingly, static linking is preferred. Statically linked
software in conjunction with interface optimization leads to faster and smaller programs.

In SeNeTs, interfaces are customized to the application, in contrast to common approaches used in
desktop computer systems, which are characterized by huge adaptation layers. The interface optimization
can be propagated through all software layers and, therefore, saves resources.
As an example of an optimization, a function OpenSocket(int name, int mode) identifies the network
interface with its first parameter and the opening mode with its second parameter. However, a node that has
only one interface, opened with a constant mode once or twice, does not need these parameters. Consequently,
knowledge of this information at compile time can be used for optimization, for example, by:

Inlining the function
Eliminating both parameters from the delivery process
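The effect can be sketched in C (only the OpenSocket signature comes from the text; the toy body and the specialized function are invented for illustration):

```c
/* Toy stand-in for the generic interface from the text: every caller
 * delivers both parameters at run time. */
static int open_count[4];

static int OpenSocket(int name, int mode)
{
    open_count[name] = mode;      /* pretend to configure the interface */
    return name;
}

/* After interface optimization for a node that has exactly one interface
 * (name 0) always opened in the same mode (1): both parameters become
 * compile-time constants, their delivery is eliminated, and the specialized
 * function can be inlined at every call site. */
static inline int OpenSocket_node(void)
{
    return OpenSocket(0, 1);
}
```

With the constants visible at compile time, the compiler can fold the specialized call away entirely, which is exactly the saving the two bullet points describe.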
TABLE 40.1 Types of Interface Optimization

Parameter elimination: Parameters that are not used in any of the called subfunctions can be removed.

Static parameters: If a function is always called with the same parameters, these parameters can be
defined as constants or static variables in the global namespace. Thus, the parameter delivery to the
function can be removed.

Parameter ordering: The sequence order of parameters is optimized in order to pass parameters through
cascading functions with the same or similar parameters. This is particularly favorable in systems using
processor registers instead of the system stack to deliver parameters to subfunctions.

Parameter aggregation: In embedded systems, many data types are not byte-aligned, for example, bits
that configure hardware settings. If a function has several non-byte-aligned parameters, these
parameters may be combined.
Another possibility is to change the semantics of data types. A potential use case is the definition of
the accuracy of addresses, which results in changing the width of data types. In SeNeTs, several types of
interface optimization are proposed, as given in Table 40.1.
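Parameter aggregation, for instance, can be sketched in C (the field widths and names are hypothetical): three non-byte-aligned hardware settings are packed into a single byte instead of being passed as three separate parameters.

```c
#include <stdint.h>

/* Hypothetical sketch of parameter aggregation: a 3-bit channel, a 2-bit
 * gain, and a 1-bit power flag are combined into one uint8_t, so a
 * configuration function needs a single parameter instead of three. */
#define PACK_CFG(chan, gain, on) \
    ((uint8_t)(((chan) & 0x7) | (((gain) & 0x3) << 3) | (((on) & 0x1) << 5)))

/* Accessors that unpack the aggregated parameter again. */
static unsigned cfg_channel(uint8_t cfg) { return cfg & 0x7; }
static unsigned cfg_gain(uint8_t cfg)    { return (cfg >> 3) & 0x3; }
static unsigned cfg_on(uint8_t cfg)      { return (cfg >> 5) & 0x1; }
```

On register-based calling conventions this reduces three parameter transfers to one, which is the saving the table entry aims at.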
Some optimizations, such as static parameters, are sometimes counterproductive, in particular if
register-oriented parameter delivery is used. This is caused by the use of offset addresses at parameter
delivery instead of absolute addresses embedded in the optimized function. Consequently, the
introduced optimizations strongly depend on:

Processor and processor architecture
Type of parameter delivery (stack or register oriented)
Memory management (small, huge, size of pointers)
Objective of optimization (memory consumption, energy consumption, compact code, etc.)
Sensor network application
40.3.7.3 Development Process
Figure 40.16 shows the development process of sensor node software in SeNeTs. First, for each functional
block the components have to be identified and included in the project. During the design phase, the chosen
components are interconnected and configured depending on the developer's settings. Then, interface as well
as parameter optimization is performed. The final source code is generated, and logging components
can be included to monitor runtime behavior. The generated source code is compiled and the executable
is linked. During the evaluation phase, the created node application can be downloaded to the node
and executed. Considering the monitoring results, a new design cycle can be started to improve the project
settings. As a result of the design flow, optimized node application software is generated. The node
application now consists only of specially tailored parts needed by the specific application of the node.

Software components in a node can be linked together either statically or dynamically. Static linking
facilitates an optimization of interfaces between several components within a node. A dynamic
link process is used for components exchanged during runtime, for example, algorithms downloaded
from other nodes. This procedure results in system-wide interfaces with significant overhead and prevents
interface optimization.
40.4 Simulation, Emulation, and Test of Large-Scale
Sensor Networks
Applications and protocols for wireless sensor networks require novel programming techniques and new
approaches for validation and test of sensor network software. In practice, sensor nodes have to operate
FIGURE 40.16 Development process of node software.
in an unattended manner. A key factor of this operation is to separate unnecessary information from
important information as early as possible in order to avoid communication overhead. In contrast, during
the implementation and test phases, developers need to obtain as much information as possible from the
network. A test and validation environment for sensor network applications has to ensure this.
Consider a sensor network with thousands of sensor nodes. Furthermore, consider developing a data
fusion and aggregation algorithm that collects sensor information from nodes and transmits it to a few
base stations. During validation and test, developers often have to change application code, recompile,
and upload a new image onto the nodes. These updates often result in flooding of the network over the
wireless channel, which dissipates a lot of time and energy. Moreover, how could we ensure that
every node runs the most recent version of our application?
Pure simulation produces important insights. However, modeling the wireless channel is difficult.
Simulation tools often employ simplified propagation models in order to reduce the computational effort
for large-scale networks. Widely used simulation tools, such as NS2 [16], use simplified network protocol
stacks and do not simulate at bit level. Furthermore, code used in simulations often cannot be reused on
real sensor node hardware; why should developers implement applications and protocols twice?
In contrast to simulation, implementation on a target platform is often complicated. The targeted
hardware itself may still be in the development stage. Perhaps there are a few prototypes, but developers need
hundreds of them for realistic test conditions. Moreover, prototype hardware is very expensive and far
from the targeted 1 cent/node. Consequently, a software environment is required that combines the
scaling power of simulations with real application behavior. Moreover, the administration of the network
must not affect sensor network applications. In the following, three current software approaches are presented.
40.4.1 TOSSIM: A TinyOS Simulator
Fault analysis of distributed sensor networks or their particular components is quite expensive and time
consuming, especially when sensor networks consist of hundreds of nodes. For that purpose, a simulator
providing examination of several layers (e.g., communication layer, routing layer) is an efficient tool for
sensor application development.
TinyOS SIMulator (TOSSIM) is a simulator for wireless sensor networks based on the TinyOS
framework. As described in References 17 and 18, the objectives of TOSSIM are scalability, completeness,
fidelity, and bridging. Scalability means TOSSIM's ability to handle large sensor networks with many
nodes in a wide range of configurations. The reactive nature of sensor networks requires not only the
simulation of algorithms but also the simulation of complete sensor network applications. Therefore,
TOSSIM achieves completeness by covering as many system interactions as possible. TOSSIM is able to
simulate thousands of nodes running entire applications. The simulator's fidelity becomes important for
capturing subtle timing interactions on a sensor node and between nodes. A significant attribute is the
revealing of unanticipated events or interactions. Therefore, TOSSIM simulates the TinyOS network stack
down to bit level. Finally, TOSSIM bridges the gap between an academic algorithm simulation and a real
sensor network implementation by providing testing and verification of application code
that will run on real sensor node hardware. This avoids programming algorithms and applications twice,
once for simulation and once for deployment. The TOSSIM components are integrated into the standard
TinyOS compilation tool chain, which supports the direct compilation of unchanged TinyOS applications
into the TOSSIM framework.
Figure 40.17 shows a TinyOS application divided into hardware-independent and hardware-dependent
components. Depending on the target platform, the appropriate hardware-dependent modules are selected
in the compilation step. This permits an easy extension to new sensor node platforms. At the same
time, this is the interface to the TOSSIM framework. Compared with a native sensor node platform,
TOSSIM is a sensor node emulation platform supporting multiple sensor node instances running on
standard PC hardware. Additionally, the TOSSIM framework includes a discrete event queue, a small
number of reimplemented TinyOS hardware abstraction components, mechanisms for extensible radio
and Analog-to-Digital Converter (ADC) models, and communication services for external programs to
interact with a simulation.
The core of the simulator is the event queue. Because TinyOS utilizes an event-based scheduling
approach, the simulator is event driven too. TOSSIM translates hardware interrupts into discrete simulator
events. The simulator event queue emits all events that drive the execution of a TinyOS application. In
contrast to real hardware interrupts, events cannot be preempted by other events and therefore are not
nested.
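A discrete event queue of this kind can be sketched as follows (a hypothetical C illustration, not TOSSIM's actual implementation): events fire strictly in time order, and each handler runs to completion before the next event is emitted.

```c
/* Hypothetical sketch of a TOSSIM-style discrete event queue: events are
 * emitted in time order and run to completion (nonpreemptive), unlike
 * real hardware interrupts. */
#define MAX_EVENTS 64

struct sim_event {
    long time;                    /* simulated firing time            */
    void (*fire)(void *ctx);      /* translated "interrupt" handler   */
    void *ctx;
};

static struct sim_event queue[MAX_EVENTS];
static int n_events;

/* Sample handler: records the firing order for inspection. */
static long fired[MAX_EVENTS];
static int  n_fired;
static void record_fire(void *ctx) { fired[n_fired++] = *(long *)ctx; }

static void post_event(struct sim_event e)
{
    queue[n_events++] = e;        /* no bounds check in this sketch */
}

static void run_simulation(void)
{
    while (n_events > 0) {
        int next = 0;             /* pick the earliest pending event */
        for (int i = 1; i < n_events; i++)
            if (queue[i].time < queue[next].time)
                next = i;
        struct sim_event e = queue[next];
        queue[next] = queue[--n_events];
        e.fire(e.ctx);            /* runs to completion, never nested */
    }
}
```

A handler may post further events from within `fire`, which is how the simulated application keeps driving itself forward.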
The hardware emulation of sensor node components is performed by replacing a small number of
TinyOS hardware components. These include the ADC, the clock, the transmit strength variable
potentiometer, the EEPROM, the boot sequence component, and several components of the radio stack. This
enables simulations of a large number of sensor node configurations.
The communication services are the interface to PC applications driving, monitoring, and actuating
simulations by communicating with TOSSIM over TCP/IP. The communication protocol was designed
at an abstract level and enables developers to write their own systems that hook into TOSSIM. TinyViz is
an example of a TOSSIM visualization tool that illustrates the possibilities of TOSSIM's communication
services. It is a Java-based graphical user interface providing visual feedback on the simulation state and
control of running simulations, for example, modifying ADC readings and radio loss properties. A plug-in
interface for TinyViz allows developers to implement their own application-specific visualization and
control code.
TOSSIM does not model radio propagation, power draw, or energy consumption. TOSSIM's fidelity is
also limited in that interrupts are timed by the event queue and are nonpreemptive.
In conclusion, TOSSIM is an event-based simulation framework for TinyOS-based sensor networks.
The open-source framework and the communication services permit an easy adaptation or integration of
simulation models and the connection to application-specific simulation tools.
40.4.2 EmStar
EmStar is a software environment for developing and deploying applications for sensor networks consisting
of 32-bit embedded Microserver platforms [19,20]. EmStar consists of libraries, tools, and services.
FIGURE 40.17 Comparison of TinyOS and TOSSIM system architecture.
Libraries implement primitives for interprocess communication. Tools support simulation, emulation,
and visualization of sensor network applications. Services provide network functionality, sensing, and
synchronization. EmStar's target platforms are so-called Microservers, typically iPAQ or Crossbow Stargate
devices. EmStar does not support Berkeley Motes as a platform but can easily interoperate with
Motes. EmStar consists of various components. Table 40.2 gives the name and a short description of
each component. The last row of the table contains hypothetical, application-specific components; all
others are EmStar core components. Figure 40.18 illustrates the cooperation of EmStar components in a
sample application for environmental monitoring. The dark-gray boxes represent EmStar core modules.
Hypothetical application-specific modules are filled light-gray. The sample application collects data from
an audio sensor and tries to detect the position of an animal in collaboration with neighboring
sensor nodes.
40.4.2.1 EmStar Tools and Services
EmStar provides tools for the simulation, emulation, and visualization of a sensor network and its
operation. EmSim runs virtual sensor nodes in a pure simulation environment, modeling both
radio and sensor channels. EmCee runs the EmSim core but uses real radios instead of modeled
channels. Both EmSim and EmCee use the same EmStar source code and associated configuration files as
TABLE 40.2 EmStar Components

Emrun: Management and watchdog process (responsible for start-up, monitoring, and shut-down of
EmStar modules).
Emproxy: Gateway to a debugging and visualization system.
udpd, linkstats, neighbors, MicroDiffusion: Network protocol stack for wireless connections.
timehist, syncd, audiod: Audio sampling service.
FFT, detect, collab_detect: Hypothetical modules, responsible for Fast Fourier Transformation and
(collaborative) event detection.

FIGURE 40.18 EmStar sample application consisting of EmStar core modules (dark-gray) and hypothetical
application-specific modules (light-gray).
a real deployed EmStar system. This eases the development and debugging of sensor network applications.
EmView is a visualization tool for EmStar systems that uses a UDP protocol to request status updates
from sensor nodes. In order to obtain sensor node or network information, EmView queries an EmProxy
server that runs as part of a simulation or on a real node. EmRun starts, stops, and manages an EmStar
system. EmRun supports process respawning, in-memory logging, fast startup, and graceful shutdown.
EmStar services comprise link and neighborhood estimation, time synchronization, and routing. The
Neighborhood service monitors links and maintains lists of active, reliable nodes. EmStar applications can
use these lists to be informed about topology changes. The LinkStats service provides applications with
more detailed information about link reliability than the Neighborhood service, at the cost of more
packet overhead. Multipath-routing algorithms can benefit from the LinkStats service by weighting their
path choices with LinkStats information. The TimeSync service is used to convert timestamps between
different nodes. Additionally, EmStar supports several routing protocols, but allows the integration of
new routing protocols as well.
40.4.2.2 EmStar IPC Mechanism
Communication between EmStar modules is managed by so-called FUSD-driven devices (FUSD:
Framework for User-Space Devices), a microkernel-style extension to Linux. FUSD allows device-file callbacks
to be proxied into user space and implemented by user-space programs instead of kernel code. Besides
intermodule communication, FUSD allows interaction between EmStar modules and users. FUSD drivers are
implemented in user space but can create device files with the same semantics as kernel-implemented
device files. Applications can use FUSD-driven devices to transport data or expose state.
Several device patterns exist for EmStar systems that are frequently needed in sensor network
applications. Example device patterns comprise a status device pattern exposing the current state of a module,
a packet device pattern providing a queued multiclient packet interface, a command device pattern that
modifies configuration files and triggers actions, and a query device pattern implementing a transactional
RPC mechanism.
40.4.3 Sensor Network Application (SNA) Test and Validation Environment
In SeNeTs, sensor network applications (SNAs) run distributed on independent hosts such as PCs, PDAs,
or evaluation boards of embedded devices [21]. The parallel execution decouples the applications from the
simulation environment. The quasi-parallel, sequential processing of concurrently triggered events in
simulations is disadvantageous compared with real-world programs: it results in sequenced execution of
SNAs that actually work in parallel and thus in corrupted simulation output. SeNeTs prevents this effect.
To summarize, realistic simulations of sensor networks are complicated.
40.4.3.1 System Architecture
The development and particularly the validation of distributed applications are hard to realize. In
particular, systems with additional logging and controlling facilities affect the primary behavior of
applications. Suppose a logging message is transmitted; an application message may then be delayed.
Especially in wireless applications with limited channel capacity, the increased communication leads to
a modified timing behavior and, as a consequence, to different results. The channel capacity per node
scales as 1/n, where n is the number of nodes. Due to this degrading channel capacity in large sensor
networks, the transport medium acts as a bottleneck [22]. Thus, in wireless sensor networks with
thousands of nodes, the bottleneck effect becomes dominant.
To eliminate the bottleneck effect, SeNeTs contains two independent communication channels, as
illustrated in Figure 40.19. The primary communication channel is defined by the sensor network application.
FIGURE 40.19 Communication channels in SeNeTs.
2006 by Taylor & Francis Group, LLC
40-22 Embedded Systems Handbook
FIGURE 40.20 SeNeTs components using the secondary transmission channel.
It uses the communication method required by SNAs, for example, Bluetooth or ZigBee. The secondary
communication channel is an administration channel only used by SeNeTs components. This channel
transmits controlling and logging messages. It is independent of the primary communication channel and
uses a different communication method, for example, Ethernet or ultrasound. The separation into two
communication channels simplifies the decoupling of application modules and administration modules
after testing.
The parallel execution of applications on different host systems requires a cascaded infrastructure
to administer the network. Figure 40.20 displays the important modules in SeNeTs: node applications,
application servers (ASs), a network server (NS), and optional evaluation or visualization modules. All of
these modules are connected via the secondary transmission channel.
40.4.3.2 Network Server
The NS administers sensor networks and their associated sensor nodes. The NS starts, stops, or queries SNAs.
In an SeNeTs network, exactly one NS exists; however, this NS is able to manage several sensor networks
simultaneously. Usually, the NS runs as a service of the operating system.
An NS opens additional communication ports. External programs, such as scripts, websites, or telnet
clients, can connect to these ports to send commands. These commands may be addressed and forwarded
to groups or stand-alone components. Furthermore, the NS receives logging messages from applications
containing their current state. Optional components, such as graphical user interfaces, can install callbacks
to receive this information.
40.4.3.3 Application Server
The AS manages instances of node applications on one host (Figure 40.20). It acts as a bridge between node
applications and the NS. Usually, at least one AS exists within the SeNeTs network. Ideally, only one node
application should be installed on an AS to prevent quasi-parallel effects during runtime.
The AS runs independently of the NS. It connects to the NS via a pipe to receive commands. Each
command is multiplexed to one of the connected node applications. Moreover, if the pipe to the NS
breaks, node applications are not affected beyond losing the logging and controlling facilities. Later, the
NS can establish the pipe again.
Generally, an AS starts as a service together with the host's operating system. At startup, it requires
configuration parameters of the node's hardware. With these parameters, the AS assigns hardware to node
applications. Suppose a host system comprises two devices representing sensor nodes, as shown
schematically in Figure 40.20. Then the AS requires the device number, the physical position of the node,
etc., to configure the dynamically installed node applications at runtime.
40.4.3.4 SeNeTs Application
Applications for wireless sensor nodes are usually designed based on a layered software model as depicted
in Figure 40.21(a) [15]. On top of the node's hardware, a specialized operating system is set up, such as
Software Development 40-23
FIGURE 40.21 (a) Software layer model of a sensor node application, (b) software layer model of a SeNeTs
application.
TinyOS [23]. A sensor driver contains software to initialize the measurement process and to obtain sensor
data. Above the operating system and the sensor driver, middleware components are located, containing
services to aggregate data or to determine the node's position. This modular design allows:
Abstraction of hardware, for example, sensors, communication devices, memory, etc.
Adaptation of the node's operating system
Addition of optional components, for example, logging and configuration
The SeNeTs Adaptation is a set of components which are added or exchanged to wrap the SNA.
Figure 40.21(b) represents the SeNeTs Adaptation layer consisting of at least a logging component, a con-
trolling unit, a HAL, and an optional environment encapsulation module. These additional components
provide substantial and realistic test and controlling facilities.
An application composed of an SNA and SeNeTs Adaptation components is called a SeNeTs Application
(SeA). The SNA is not changed by the added components. Generally, it is not necessary to adapt the SNA to
SeNeTs interfaces; however, supplementary macros can be added to interact with the linked components.
An SeA runs as a process of the host system. Because an SNA comes with its own operating system,
the SeA runs autonomously, without interacting with other processes of the host system. At startup,
the SeA opens a pipe to communicate with the AS. After the test phase, all SeNeTs components can be
removed easily by recompiling all node applications: SeNeTs-specific components and logging calls
are automatically deactivated by compiler switches.
40.4.3.5 Environment Management
Sensor network applications require valid environment data, such as temperature or air pressure. Under
laboratory conditions, this information is not available, or only partly so. Therefore, environment data must
be emulated. SeNeTs provides these environment data to the node application through the environment
emulation module (Figure 40.21[b]). All environment emulation modules are controlled by the environment
management of the NS, which contains all predefined or configured data (Figure 40.22). These data comprise
positions of other nodes, distances to neighboring nodes, etc. If required, other data types may be added.
In the AS, the environment data cache module stores all environment information required by each node
application in order to reduce network traffic.
Optionally, position-based filtering is provided by the environment emulation component of SeNeTs.
This filtering approach is essential especially when large topologies of sensor nodes have to be emulated
under small-sized laboratory conditions. If the real and virtual positions of the nodes are known, a mapping
from physical address to virtual address is feasible. A node application then only receives messages from nodes that
FIGURE 40.22 Environment management in SeNeTs.
FIGURE 40.23 (a) Physically arranged sensor nodes (black dots). All nodes are in transmission range (surrounding
circles) of each other. (b) Virtually arranged nodes with appropriate transmission ranges. Nodes are no longer able to
communicate without routing.
are virtually in transmission range. All other messages are rejected by the SeNeTs Adaptation components.
This is accomplished by setting up a filter in the primary communication channel.
One application scenario that illustrates position-based filtering is flood prevention. Here, sensor
nodes are deployed in sandbags piled along a dyke of hundreds of meters or even kilometers. These nodes
measure the humidity and detect potential leakages. Testing this scenario under real-world conditions
is impractical and very expensive. Nevertheless, evaluating the software under realistic conditions with
regard to communication effort, self-organization of the network, routing, and data aggregation is most
important.
Figure 40.23 illustrates the difference between the laboratory and the real world. Figure 40.23(a) represents
laboratory conditions, where all nodes are in transmission range of each other. Figure 40.23(b) sketches the
flood prevention scenario under real conditions. In Figure 40.23(a), the nodes A to D are in transmission
range of each other; therefore, in contrast to the real-world scenario, no routing is required. Furthermore,
data aggregation yields wrong results, because the nodes are not grouped as they would be in reality. Thus,
if the physically arranged nodes in the test environment do not meet the requirements of the real world,
the results are questionable.
Assume node A sends a message to node D; then all nodes receive the message due to the physical
vicinity in the test environment (Figure 40.23[a]). Nodes C and D receive the message, but they are not in
the virtual transmission range of node A, so the environment emulation module rejects these messages.
As a result, SeNeTs prevents a direct transmission from node A to node D; messages can be transmitted
only via the routing nodes B and C (Figure 40.23[b]). In short, the emulation of the sensor network
software becomes more realistic.
40.5 Summary
At the present time, TinyOS is the most mature operating system framework for sensor nodes. The
component-based architecture of TinyOS allows an easy composition of SNAs. New components can be
added easily to TinyOS to support novel sensing or transmission technologies or upcoming sensor node
platforms. Maté addresses the requirement to change a sensor node's behavior at runtime by introducing
a VM on top of TinyOS. By transmitting capsules containing high-level instructions, a wide range of SNAs
can be installed dynamically into a deployed sensor network. TinyDB was developed to simplify data
querying from sensor networks. On top of TinyOS, it provides an easy-to-use SQL interface to express
data queries and addresses users not experienced in writing embedded C code for sensor nodes. TOSSIM
is a simulator for wireless sensor networks based on the TinyOS framework.
EnviroTrack is an object-based programming model to develop sensor network applications for tracking
activities in the physical environment. Its main feature is the dynamic grouping of nodes depending on
environmental changes, described by predefined aggregate functions, critical mass, and freshness horizon.
SensorWare is a software framework for sensor networks employing lightweight and mobile control scripts
that allow the dynamic deployment of distributed algorithms into a sensor network. In comparison with
the Maté framework, the SensorWare runtime environment supports multiple applications running
concurrently on one SensorWare node. The MiLAN middleware provides a framework to optimize network
performance by weighing the needed sensing probability against energy costs on the basis of equations;
it is the programmer's decision how to weight these equations. EmStar is a software environment for
developing and deploying applications for sensor networks consisting of 32-bit embedded Microserver
platforms. SeNeTs is a new approach to optimize the interfaces of sensor network middleware. SeNeTs
aims at the development of energy-saving applications and the resolution of component dependencies at
compile time.
References
[1] G.J. Pottie and W.J. Kaiser, Wireless integrated network sensors, Communications of the ACM, 43,
51–58, 2000.
[2] J.M. Kahn, R.H. Katz, and K.S.J. Pister, Next century challenges: mobile networking for smart dust,
in Proceedings of the ACM MobiCom'99, Washington, USA, 1999, pp. 271–278.
[3] D. Culler, E. Brewer, and D. Wagner, A platform for WEbS (wireless embedded sensor actuator
systems), Technical report, University of California, Berkeley, 2001.
[4] J. Rabaey et al., PicoRadio supports ad hoc ultra-low power wireless networking, IEEE Computer,
33(7), 42–48, 2000.
[5] EYES: Energy-efficient sensor networks, URL: http://eyes.eu.org
[6] I.F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, A survey on sensor networks, IEEE
Communications Magazine, 40(8), 102–114, 2002.
[7] P. Rentala, R. Musunuri, S. Gandham, and U. Saxena, Survey on sensor networks, Technical report
UTDCS-10-03, University of Texas, 2003.
[8] J. Hill et al., System architecture directions for networked sensors, in Proceedings of the Ninth Inter-
national Conference on Architectural Support for Programming Languages and Operating Systems,
Cambridge, MA, USA, November 2000.
[9] D. Gay, P. Levis, R.V. Behren, M. Welsh, E. Brewer, and D. Culler, The nesC language: a holistic
approach to networked embedded systems, in Proceedings of the Conference on Programming
Language Design and Implementation (PLDI), San Diego, CA, June 2003.
[10] D. Culler, TinyOS: a component-based OS for the networked sensor regime, URL:
http://webs.cs.berkeley.edu/tos/, 2003.
[11] S. Madden, J. Hellerstein, and W. Hong, TinyDB: in-network query processing in TinyOS, Intel
Research, IRB-TR-02-014, October 2002.
[12] A. Boulis and M.B. Srivastava, A framework for efficient and programmable sensor networks,
in Proceedings of the Fifth IEEE Conference on Open Architectures and Network Programming
(OPENARCH 2002), New York, June 2002.
[13] A. Murphy and W. Heinzelman, MiLAN: middleware linking applications and networks,
Technical report, University of Rochester, Computer Science Department, URL: http://
hdl.handle.net/1802/305, January 2003.
[14] M. Perillo and W. Heinzelman, Providing application QoS through intelligent sensor management,
in Proceedings of the First IEEE International Workshop on Sensor Network Protocols and Applications
(SNPA'03), Anchorage, AK, USA, May 2003.
[15] J. Blumenthal, M. Handy, F. Golatowski, M. Haase, and D. Timmermann, Wireless sensor
networks: new challenges in software engineering, in Proceedings of the Ninth IEEE Inter-
national Conference on Emerging Technologies and Factory Automation (ETFA), Lisbon, Portugal,
September 2003.
[16] The Network Simulator ns-2, http://www.isi.edu/nsnam/ns
[17] P. Levis et al., TOSSIM: accurate and scalable simulation of entire TinyOS applications, in
Proceedings of the First ACM Conference on Embedded Networked Sensor Systems (SenSys 2003),
Los Angeles, November 2003.
[18] TOSSIM: A Simulator for TinyOS Networks, User's Manual, in TinyOS documentation.
[19] L. Girod, J. Elson, A. Cerpa, T. Stathopoulos, N. Ramanathan, and D. Estrin, EmStar: a software
environment for developing and deploying wireless sensor networks, in Proceedings of USENIX '04,
Boston, June 2004.
[20] EmStar: software for wireless sensor networks, URL: http://cvs.cens.ucla.edu/emstar/, 2004.
[21] J. Blumenthal, M. Handy, and D. Timmermann, SeNeTs: test and validation environment for
applications in large-scale wireless sensor networks, in Proceedings of the Second IEEE International
Conference on Industrial Informatics (INDIN'04), Berlin, June 2004.
[22] J. Li, C. Blake, D.S.J. De Couto, H.I. Lee, and R. Morris, Capacity of ad hoc wireless networks, in
Proceedings of MobiCom, Rome, July 2001.
[23] Berkeley WEBS: TinyOS, http://today.cs.berkeley.edu/tos/, 2004.
[24] P. Levis and D. Culler, Maté: a tiny virtual machine for sensor networks, in Proceedings of the ACM
Conference on Architecture Support for Programming Languages and Operating Systems (ASPLOS),
San Jose, California, USA, October 2002.
[25] T. Abdelzaher, B. Blum et al., EnviroTrack: an environmental programming model for tracking
applications in distributed sensor networks, Technical report CS-2003-02, University of Virginia,
2003.
VI
Embedded Applications
Automotive Networks
41 Design and Validation Process of In-Vehicle Embedded Electronic Systems
Françoise Simonot-Lion and YeQiong Song
42 Fault-Tolerant Services for Safe In-Car Embedded Systems
Nicolas Navet and Françoise Simonot-Lion
43 Volcano: Enabling Correctness by Design
Antal Rajnák
41
Design and Validation
Process of In-Vehicle
Embedded Electronic
Systems
Françoise Simonot-Lion
Institut National Polytechnique de
Lorraine
YeQiong Song
Université Henri Poincaré
41.1 In-Vehicle Embedded Applications: Characteristics
and Specific Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41-1
Economic and Social Context • Several Domains and Specific
Problems • Automotive Technological Standards •
A Cooperative Development Process
41.2 Abstraction Levels for In-Vehicle Embedded System
Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41-8
Architecture Description Languages • EAST-ADL for
In-Vehicle Embedded System Modeling
41.3 Validation and Verification Techniques . . . . . . . . . . . . . . . . 41-10
General View of Validation Techniques • Validation by
Performance Evaluation
41.4 Conclusions and Future Trends . . . . . . . . . . . . . . . . . . . . . . . . . 41-20
41.5 Appendix: In-Vehicle Electronic System
Development Projects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41-21
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41-22
41.1 In-Vehicle Embedded Applications: Characteristics and Specific Constraints
41.1.1 Economic and Social Context
While automobile production is likely to increase only slowly in the coming years (42 million cars produced
in 1999 and 60 million planned for 2010), the share of embedded electronics, and more precisely
embedded software, is growing. The cost of electronic systems was $37 billion in 1995 and $60 billion in
2000, with an annual growth rate of 10%. In 2006, the embedded electronic system will represent at least
25% of the total cost of a car, and more than 35% for a high-end model [1].
The reasons for this evolution are technological as well as economic. On the one hand, the cost of
hardware components is decreasing while their performance and reliability are increasing. The emergence
of automotive embedded networks such as LIN, CAN, TTP/C, FlexRay, MOST, and IDB-1394 leads to
a significant reduction of the wiring cost as well. On the other hand, software technology facilitates the
introduction of new functions whose development would be costly, or even not feasible, using only
mechanical or hydraulic technology, and therefore allows satisfying the end-user requirements in terms
of safety and comfort. Well-known examples are electronic engine control, ABS, ESP, active suspension,
etc. In short, thanks to these technologies, customers can buy a safe, efficient, and personalized
vehicle, while the carmakers are able to master the differentiation of product variants and the innovation
(analysts state that more than 80% of innovation, and therefore of added value, will be obtained thanks
to electronic systems [2]). Another new factor is emerging. A vehicle already includes electronic
equipment such as hands-free phones, audio/radio devices, and navigation systems. For the passengers,
many entertainment devices, such as video equipment, and communication with the outside world will be
available in the very near future. Even if these kinds of applications have little to do with the vehicle's
operation itself, they increase significantly the amount of software embedded in a car.
Who is concerned by this evolution? First, the vehicle customer, whose requirements are, on the one hand,
increased performance, comfort, and assistance for mobility efficiency (navigation) and, on the other hand,
reduced vehicle fuel consumption and cost; furthermore, the customer requires a reliable embedded
electronic system that ensures safety properties. Second, the stakeholders, carmakers, and suppliers,
who are interested in the reduction of time-to-market, development, production, and maintenance costs.
Finally, this evolution has a strong impact on society: legal restrictions on exhaust emissions and
protection of natural resources and the environment.
The electronic systems presented above do not all have to meet the same level of dependability, so their
designs call for different techniques. Nevertheless, common characteristics are their distributed nature
and the fact that they have to provide a level of quality of service fixed by the market and by the safety
and cost requirements. Therefore, their development and production have to be based on a suitable
methodology, including modeling, validation, optimization, and test.
41.1.2 Several Domains and Specific Problems
In-vehicle embedded systems are usually classified into four domains that correspond to different
functionalities, constraints, and models [3,4]. Two of them are concerned specifically with safety: the
power train and the chassis domains. The third one, body, is emerging and presently integrated in a
majority of cars. Finally, the telematic, multimedia, and Human Machine Interface domain benefits from
continuous progress in the fields of multimedia, wireless communications, and the Internet.
41.1.2.1 Power Train
This domain represents the system that controls the motor according to, on the one hand, requests of
the driver, which can be explicit orders (speeding up, slowing down, etc.) or implicit constraints (driving
facilities, driving comfort, fuel consumption, etc.) and, on the other hand, environmental constraints
(exhaust pollution, noise, etc.). Moreover, this control has to take into account requirements from other
parts of the embedded system, such as climate control or ESP (Electronic Stability Program).
In this domain, the main characteristics are:
From a functional point of view: the power train control takes into account different working modes
of the motor (slow running, partial load, full load, etc.); this corresponds to different and complex
control laws (multivariable) with different sampling periods (classical sampling periods for signals
provided by other systems are 1, 2, or 5 msec, while the sampling of signals on the motor itself is in
phase with the motor times).
From a hardware point of view: this domain requires sensors whose specification has to consider
the minimization of the cost/resolution criterion, and microcontrollers providing high computation
power, thanks to their multiprocessor architectures, dedicated coprocessors (floating-point
computations), and high storage capacities.
Design and Validation Process 41-3
From an implementation point of view: the specified functions are implemented as several tasks with
different activation rules according to the sampling rules, stringent time constraints imposed on task
scheduling, and the mastering of safe communications with other systems and with local sensors/actuators.
In this domain, the systems involve continuous, sampled, and discrete subsystems. Traditional tools
for their functional design and modeling are, for example, Matlab/Simulink and Matlab/Stateflow.
Currently, the validation of these systems is mainly done by simulation and, for their integration, by
emulation methods and/or tests. Last, as illustrated above, the power train domain includes hard
real-time systems, so performance evaluation and timing analysis activities have to be carried out on
their implementation models.
41.1.2.2 Chassis
The chassis domain gathers all the systems that control the interaction of the vehicle with the road and the
chassis components (wheels, suspension, etc.) according to the requests of the driver (steering, braking, or
speed-up orders), the road profile, and the environmental conditions (wind, etc.). These systems have to
ensure the comfort of the driver and passengers (suspension) as well as their safety. This domain includes
systems such as ABS (Anti-lock Braking System), ESP (Electronic Stability Program), ASC (Automatic
Stability Control), and 4WD (4 Wheel Drive). Note that chassis is the critical domain contributing to the
safety of the passengers and of the vehicle itself. Furthermore, X-by-Wire technology, currently applied in
avionic systems, is emerging in the automotive industry. X-by-Wire is a generic term used when mechanical
and/or hydraulic systems are replaced by electronic ones (intelligent devices, networks, computers supporting
software components that implement filtering, control, and diagnosis functionalities). Examples are
brake-by-wire and steer-by-wire, which will shortly be integrated in cars for the implementation of critical
and safety-relevant functions. The characteristics of the chassis domain and the underlying models are
similar to those presented for the power train domain, that is, multivariable control laws, different sampling
periods, and stringent time constraints. Compared with the power train domain, systems controlling chassis
components are fully distributed. Therefore, the development of such systems must define a feasible
system, that is, one satisfying performance, dependability, and safety constraints. Conventional mechanical
and hydraulic systems have stood the test of time and have proved to be reliable; the same cannot yet be
said of critical software-based systems. In the aerospace/avionic industries, X-by-Wire technology is
currently employed; but, for ensuring safety properties, specific hardware and software components,
specific fault-tolerant solutions (heavy and costly redundancies of networks, sensors, and computers),
and certified design and validation methods are used. The challenge now is to adapt these solutions to
the automotive industry, which imposes stringent constraints on component cost, electronic architecture
cost (minimization of redundancies), and development time.
41.1.2.3 Body
Wipers, lights, doors, windows, seats, and mirrors are increasingly controlled by software-based systems.
These kinds of functions make up the body domain. They are not subject to stringent performance
constraints but globally involve many communications between them and, consequently, a complex
distributed architecture. The notion of a subsystem or subcluster is emerging, based on low-cost
sensor-actuator level networks such as LIN, which connect modules realized as integrated mechatronic
systems. On the other side, the body domain integrates a central subsystem, termed the central body
electronic, whose main functionality is to ensure message transfers between different systems or domains.
This system is recognized to be a central critical entity.
The body domain mainly involves discrete-event applications. Their design and validation rely on state
transition models (such as SDL, Statecharts, UML state transition diagrams, and synchronous models).
These models allow, mainly by simulation, the validation of a functional specification. Their implementation
implies a distribution over a complex hierarchical hardware architecture. High computation power for the
central body electronic entity, fault tolerance, and reliability properties are imposed on body domain
systems. A challenge in this context is, first, to be able to develop exhaustive analysis of state transition
diagrams and, second, to ensure that the implementation respects the fault tolerance and safety constraints.
The problem here is to achieve a good balance between a time-triggered approach and flexibility.
41.1.2.4 Telematic and Human Machine Interface
The next generation of telematic devices provides new, sophisticated Human Machine Interfaces (HMI)
to the driver and the other occupants of a vehicle. They enable the occupants not only to communicate
with other systems inside the vehicle but also to exchange information with the external world. Such
devices will be upgradeable in the future, and for this domain a plug-and-play approach has to be favored.
These applications have to be portable, and the services furnished by the platform (operating system
and/or middleware) have to offer generic interfaces and downloading facilities. The main challenge here
is to preserve the security of the information from, to, or inside the vehicle. Sizing and validation do not
rely on the same methods as for the other domains: here we shift from considering messages, tasks, and
deadline constraints to fluid data streams, bandwidth sharing, and multimedia quality of service, and
from safety and hard real-time constraints to security of information and soft real-time constraints.
Note that, even if this domain is more related to entertainment activities, some interactions exist with
other domains. For example, the telematic framework offers a support for future remote diagnostic
services. In particular, the standard OBD-3, currently under development, extends OBD-2 (Enhanced
On-Board Diagnosis) by adding telemetry. Like its predecessor, it defines the protocol for collecting
measures on the power train physical equipment and alerting the driver if necessary, as well as a protocol
for the exchanges with a scan tool. Thanks to a technology similar to that already used for automatic
electronic toll collection systems, an OBD-3-equipped vehicle would be able to report the vehicle
identification number and any emission problems directly to a regulatory agency.
41.1.3 Automotive Technological Standards
A way of ensuring some level of interoperability between components developed by different partners is
brought, first, by the standardization of services sharing the hardware resources between the application
processes. For this reason, in the current section, we provide an outline of the main standards used in the
automotive industry, in particular the networks and their protocols and the operating systems. Then, we
introduce some work in progress on the definition of a middleware that will be a solution for portability
and flexibility purposes.
41.1.3.1 Networks and Protocols
Due to the stringent cost, real-time, and reliability constraints, specific communication protocols and
networks have been developed to fulfill the needs of ECU (Electronic Control Unit) multiplexing.
SAE has defined three distinct protocol classes, named class A, B, and C. A class A protocol is defined for
interconnecting actuators and sensors with a low bit rate (about 10 Kbps); an example is LIN. A class B
protocol supports a data rate as high as 100 Kbps and is designed for supporting non-real-time control
and inter-ECU communication; J1850 and low-speed CAN are examples of SAE class B protocols.
A class C protocol is designed for supporting real-time and critical applications; networks like high-speed
CAN and TTP/C belong to class C, and support data rates as high as one or several megabits per second.
This section outlines the best known of them.
41.1.3.1.1 Controller Area Network
Controller Area Network (CAN) [5,6] is without any doubt the most widely used in-vehicle network. CAN
was initially designed by the Robert Bosch company at the beginning of the 1980s for multiplexing the
increasing number of ECUs in a car. It became an ISO standard in 1994 and is now a de facto standard for
data transmission in automotive applications due to its low cost, robustness, and bounded communication
delay. CAN is mainly used in the power train, chassis, and body domains. Further information on
CAN-related protocols and developments, including TTCAN, can be found at http://can-cia.org/.
Controller Area Network is a priority-based bus that provides a bounded communication delay
for each message priority. The MAC (Medium Access Control) protocol of CAN uses CSMA with bit-by-bit
TABLE 41.1 CAN and VAN Frame Format
Bit
SOF
1
ID
11/29 1 2
RTR
(Reserved)
4
DLC
064
Data
16
CRC
2
ACK
7
EOF
3
IFS
10 15 2 2 8 4 Time slot
12 Bit
SOF ID EOD ACK EOF IFG VAN
CAN
(0 28)10
(0 28)8
Data
5
4
Command
15 +3
15
CRC
nondestructive arbitration over the ID field (Identifier). The identifier is coded using 11 bits (CAN 2.0A) or
29 bits (CAN 2.0B), and it also serves as the priority. Up to 8 bytes of data can be carried by one CAN frame,
and a CRC of 16 bits is used for transmission error detection. CAN uses an NRZ bit encoding scheme to make
the bit-by-bit arbitration feasible with a logical AND operation. However, the use of the bit-wise arbitration
scheme intrinsically limits the bit rate of CAN, as the bit time must be long enough to cover the propagation
delay on the whole network. A maximum of 1 Mbps is specified for a CAN bus not exceeding 40 m.
The maximum message transmission time should include the worst-case number of stuff bits
(CAN 2.0A). This length is given by:

C_i = (44 + 8·DLC + ⌊(34 + 8·DLC)/4⌋) · τ_bit    (41.1)

where DLC is the data length in bytes and τ_bit the bit time; the floor term represents the overhead due
to bit stuffing, a technique implemented by CAN for bit synchronization, which consists in inserting
an opposite bit every time five consecutive bits of the same polarity are encountered.
The frame format is given in Table 41.1. We will not detail the field meanings here; note, however, that the
Inter Frame Space (IFS) has to be considered when calculating the bus occupation time of a CAN message.
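As a check on Equation 41.1, the worst-case frame length and the 3-bit IFS that must be added for bus occupation can be sketched as follows (the function names are ours, not part of the CAN specification):

```python
def can_frame_bits(dlc: int) -> int:
    """Worst-case length in bits of a CAN 2.0A frame carrying `dlc` data
    bytes (Equation 41.1): 44 fixed bits, 8*DLC data bits, and up to
    (34 + 8*DLC) // 4 stuff bits."""
    return 44 + 8 * dlc + (34 + 8 * dlc) // 4

def can_bus_occupation_s(dlc: int, bit_rate_bps: float) -> float:
    """Bus occupation time in seconds: the frame itself plus the 3-bit
    Inter Frame Space separating consecutive frames."""
    return (can_frame_bits(dlc) + 3) / bit_rate_bps
```

For an 8-byte frame this gives 132 frame bits, hence 135 bits of bus occupation, that is, 540 µs at the 250 Kbps rate used in the case study of Section 41.3.2.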
41.1.3.1.2 Vehicle Area Network
Vehicle Area Network (VAN) [7,8] is quite similar to CAN. It was used by the French carmaker PSA
Peugeot-Citroën for the body domain. Although VAN has some technical features that are more interesting than
CAN's, it was not widely adopted by the market and has now been abandoned in favor of CAN. Its MAC
protocol is also CSMA with bit-by-bit nondestructive arbitration over the ID field (Identifier), coded with
12 bits. Up to 28 bytes of data can be carried by one VAN frame and a CRC of 15 bits is used. The bit rate
can reach 1 Mbps. One of the main differences between CAN and VAN is that CAN uses NRZ code while
VAN uses a so-called E-Manchester (Enhanced Manchester) code: a binary sequence is divided into blocks
of 4 bits and the first three bits are encoded using NRZ code (whose duration is defined as one Time Slot
per bit) while the fourth one is encoded using Manchester code (two Time Slots per bit). This means that
4 bits of data are encoded using 5 Time Slots (TS). Thanks to E-Manchester coding, VAN, unlike CAN,
does not need bit stuffing for bit synchronization. This coding is sometimes denoted 4B/5B.
The format of a VAN frame is given in Table 41.1. The transmission duration
(or equivalent frame length) of a VAN frame is given by:

C_i = (60 + 10·DLC) · TS    (41.2)
Note, however, that the Inter Frame Gap (IFG), fixed at 4 TS, has to be considered when calculating the
total bus occupation time of a VAN message. Finally, VAN has one feature that is not present in CAN:
the in-frame response capability. The same single frame can include the remote message request of the
consumer (identifier and command fields) and the immediate response of the producer (data and CRC
fields).
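Equation 41.2 and the 4-TS IFG translate directly into code (a sketch with names of our choosing; at the 62.5 kTS/s rate used later in the chapter, one time slot lasts 16 µs):

```python
def van_frame_ts(dlc: int) -> int:
    """Length of a VAN frame in time slots (Equation 41.2): 60 fixed
    time slots plus 10 per data byte (8 data bits occupy 10 TS under the
    E-Manchester 4B/5B coding)."""
    return 60 + 10 * dlc

def van_bus_occupation_s(dlc: int, ts_rate: float) -> float:
    """Total bus occupation in seconds: the frame plus the 4-TS
    Inter Frame Gap."""
    return (van_frame_ts(dlc) + 4) / ts_rate
```

A maximum-length frame (28 bytes) thus occupies 340 TS, or 344 TS once the IFG is counted.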
41.1.3.1.3 J1850
SAE J1850 [9] was developed in North America and has been used by carmakers such as Ford, GM, and
DaimlerChrysler. The MAC protocol follows the same principle as CAN and VAN, that is, it uses CSMA
with bit-by-bit arbitration for collision resolution. J1850 supports two data rates: 41.6 Kbps for PWM
(Pulse Width Modulation) and 10.4 Kbps for VPW (Variable Pulse Width). The maximum data length
is 11 bytes. The typical applications are SAE class B ones such as instrumentation/diagnostics and data
sharing among the engine, transmission, and ABS.
41.1.3.1.4 TTP/C
The Time-Triggered Protocol (TTP/C) [10] has been developed at the Vienna University of Technology.
Hardware implementations of the TTP/C protocol, as well as software tools for the design of
applications, are commercialized by TTTech (www.tttech.com).
At the MAC layer, the TTP/C protocol implements a synchronous TDMA scheme: the stations
(or nodes) access the bus in a strict deterministic sequential order. Each station possesses the bus
for a constant duration, called a slot, during which it has to transmit one frame. The sequence of
slots in which every station accesses the bus once is called a TDMA round.
TTP/C is suitable for SAE class C applications, with a strong emphasis on fault tolerance and deterministic
real-time behavior. It is now one of the two candidates for X-by-Wire applications. The bit rate is not
limited by the TTP/C specification. Today's available controllers (TTP/C C2 chips) support data rates as high
as 5 Mbps in asynchronous mode and 5 to 25 Mbps in synchronous mode.
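The TDMA round described above can be sketched as a static schedule; the station names and slot length below are illustrative, not values from the TTP/C specification:

```python
def tdma_round(stations, slot_ms):
    """Build one TDMA round: each station owns exactly one slot of
    constant length, in a strict deterministic sequential order.
    Returns (slot start time in ms, station) pairs."""
    return [(i * slot_ms, station) for i, station in enumerate(stations)]

# A four-node round with 2 ms slots: the round repeats every 8 ms.
round_ = tdma_round(["A", "B", "C", "D"], slot_ms=2)
round_length_ms = len(round_) * 2
```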
41.1.3.1.5 FlexRay
The FlexRay protocol (www.flexray.com) is currently being developed by a consortium of major companies
from the automotive field. The purpose of FlexRay is, like TTP/C, to provide X-by-Wire applications
with deterministic real-time and reliable communication. The specification of the FlexRay protocol
is, however, neither publicly available nor finalized at the time of writing of this chapter.
The FlexRay network is very flexible with regard to topology and transmission-support redundancy.
It can be configured as a bus, a star, or multiple stars, and it is not mandatory that each station possess
replicated channels, even though this should be the case for X-by-Wire applications.
At the MAC level, FlexRay defines a communication cycle as the concatenation of a time-triggered
(or static) window and an event-triggered (or dynamic) window. A different protocol applies to each
communication window, whose size is set at design time. The communication cycles are executed periodically.
The time-triggered window uses a TDMA protocol. In the event-triggered part of the communication
cycle, the protocol is FTDMA (Flexible Time Division Multiple Access): time is divided into so-called
mini-slots, each station possesses a given number of mini-slots (not necessarily consecutive), and it can
start the transmission of a frame inside each of its own mini-slots. A mini-slot remains idle if the station
has nothing to transmit.
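The FTDMA behavior of the dynamic window can be illustrated with a small sketch: a mini-slot either elapses unused (one mini-slot long) or stretches into a frame transmission. This representation is our simplification, not the FlexRay specification:

```python
def dynamic_window(minislot_owners, pending, frame_minislots):
    """Replay the event-triggered window: walk the mini-slots in their
    fixed order; an owner with a pending frame transmits (its slot
    stretches to the frame length), otherwise the mini-slot stays idle.
    Returns the timeline and the window length in mini-slots."""
    timeline, t = [], 0
    for owner in minislot_owners:
        if owner in pending:
            timeline.append((t, owner, "frame"))
            t += frame_minislots  # a transmission stretches the slot
        else:
            timeline.append((t, owner, "idle"))
            t += 1                # an unused mini-slot stays short
    return timeline, t
```

With owners A, B, and C, only B having a frame ready, and 3-mini-slot frames, the window lasts 5 mini-slots instead of the 9 a purely static scheme would reserve, which is precisely the bandwidth advantage of the dynamic window.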
41.1.3.1.6 Local Interconnect Network
Local Interconnect Network (LIN) (www.lin-subbus.org) is a low-cost serial communication system
intended for SAE class A applications, where the use of other automotive multiplex networks
such as CAN is too expensive. Typical applications are in the body domain, for controlling doors, windows,
seats, the roof, and the climate system.
Besides the cost consideration, LIN is also a subnetwork solution for reducing the total traffic load on the
main network (e.g., CAN) by building a hierarchical multiplex system. For this purpose, many gateways
exist, allowing, for example, a LIN subnet to be interconnected to CAN.
The LIN protocol is based on the master/slave model. A slave node must wait to be polled by the
master before transmitting data. The data length can be 1, 2, 4, or 8 bytes. A master can handle at most 15 slaves
(there are 16 identifiers per class of data length). LIN supports data rates up to 20 Kbps (limited for EMI reasons).
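The master/slave exchange can be sketched as a schedule-table loop; the identifiers, slave names, and table layout below are invented for illustration and are not taken from the LIN specification:

```python
def poll_cycle(schedule, slave_tables):
    """One communication cycle: the master issues a header for each
    identifier in its schedule table; the single slave publishing that
    identifier answers with its data (1, 2, 4, or 8 bytes). An entry
    with responder None models an identifier nobody answers."""
    bus_log = []
    for ident in schedule:
        responder = next(
            (s for s, table in slave_tables.items() if ident in table), None)
        data = slave_tables[responder][ident] if responder else None
        bus_log.append((ident, responder, data))
    return bus_log
```

The key point the sketch captures is that slaves never contend for the bus: all transmissions are triggered by the master's schedule, which is what makes LIN timing trivially predictable.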
41.1.3.1.7 Media Oriented System Transport
Media Oriented System Transport (MOST) (http://mostnet.de/) is a multimedia fiber-optic network
developed in 1998 by the MOST Cooperation (a kind of consortium composed of carmakers, set makers,
system architects, and key component suppliers). The basic application blocks supported by MOST are
audio and video transfer, on top of which end-user applications like radios, GPS navigation, video displays
and amplifiers, and entertainment systems can be built.
The MOST protocol defines data channels and control channels. The control channels are used to
set up the data channels that the sender and receiver use. Once the connection is established, data can
flow continuously, delivering streaming data (audio/video). The MOST network offers a data rate
of 24.8 Mbps.
41.1.3.1.8 IDB-1394
IDB-1394 is an automotive version of IEEE-1394 for in-vehicle multimedia and telematics applications,
jointly developed by the IDB Forum (www.idbforum.org) and the 1394 Trade Association
(www.1394ta.org). IDB-1394 defines a system architecture/topology that permits existing IEEE-1394
consumer electronics devices to interoperate with embedded automotive-grade devices. The system topology
consists of an automotive-grade embedded plastic optical fiber network including cable and connectors,
embedded network devices, one or more consumer convenience port interfaces, and the ability to attach
hot-pluggable portable devices.
The IDB-1394 embedded network supports data rates of 100, 200, and 400 Mbps.
The maximum number of embedded devices is limited to 63 nodes.
From the point of view of both data rate and interoperability with existing IEEE-1394 consumer electronics
devices, IDB-1394 is a serious competitor of the MOST technology.
41.1.3.2 Operating Systems
OSEK/VDX (Offene Systeme und deren Schnittstellen für die Elektronik im Kraftfahrzeug) [11] is a
multitask operating system that has become a standard in the European automotive industry. Two types of tasks are
supported by OSEK/VDX: basic tasks, without blocking points, and extended tasks, which can include blocking
points. This operating system does not allow the dynamic creation/destruction of tasks. It implements a
Fixed Priority (FP) scheduling policy combined with the Priority Ceiling Protocol (PCP) [12] to avoid priority
inversion or deadlock due to exclusive resource access. OSEK/VDX offers a synchronization mechanism
through private events and alarms. A task can be preemptive or nonpreemptive. An implementation of
OSEK/VDX has to be compliant with one of the four conformance classes BCC1, BCC2, ECC1, ECC2,
defined according to the supported tasks (basic only, or basic and extended), the number of tasks on
each priority level (only one, or possibly several), and the limit of the reactivation counter (only one, or
possibly several). The MODISTARC project (Methods and tools for the validation of OSEK/VDX-based
DISTributed ARChitectures) [13] aims to provide the relevant test methods and tools to assess the
conformance of OSEK/VDX implementations. OSEK/VDX Com and OSEK/VDX NM are complementary to
OSEK/VDX for communication and network management services. Furthermore, a language, OSEK/OIL
(OSEK Implementation Language), is a basis both for the configuration of an application and for the tuning
of the required operating system. In order to ensure dependability and fault tolerance for critical applications,
the time-triggered operating system OSEKtime [11] was proposed. It supports static scheduling and
offers interrupt handling, dispatching, system time and clock synchronization, local message handling,
and error detection mechanisms, and it provides predictability and dependability through fault detection and
fault tolerance mechanisms. It is compatible with OSEK/VDX and is complemented by the FTCom (Fault
Tolerant Communication) layer for communication services.
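PCP's key ingredient is the priority ceiling of each shared resource, the priority of the highest-priority task that may lock it; while a task holds the resource it is raised to that ceiling. A sketch of the ceiling computation (the task and resource names are hypothetical, and we follow this chapter's task-table convention that a smaller number denotes a higher priority):

```python
def priority_ceilings(resource_users, priority):
    """Ceiling of a resource = priority of the highest-priority task
    (smallest number here) among the tasks that may lock it. Running a
    lock holder at this ceiling prevents priority inversion and deadlock
    on exclusive resources."""
    return {res: min(priority[t] for t in users)
            for res, users in resource_users.items()}

# Hypothetical configuration: two tasks share an ADC driver, one owns the bus.
ceilings = priority_ceilings(
    {"ADC": ["T_sensor", "T_log"], "BUS": ["T_tx"]},
    {"T_sensor": 1, "T_log": 5, "T_tx": 3})
```

In OSEK/OIL this computation is done offline from the RESOURCE declarations, so the kernel only performs constant-time priority changes at run time.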
Rubus is another operating system tailored for the automotive industry. It is developed by Arcticus
Systems [14], with support from the research community, and is, for example, used by Volvo Construction
Equipment. The Rubus OS consists of three parts: the Red Kernel, which
manages the execution of offline-scheduled time-triggered tasks; the Blue Kernel, dedicated to the execution of
event-triggered tasks; and the Green Kernel, in charge of external interrupts.
These three operating systems are well suited to the power train, chassis, and body domains because the
number of tasks integrated in these applications is known offline. On the other hand, they do not fit
the requirements of telematics applications. For this last domain, Windows CE for
Automotive, for example, is available; it extends the classical operating system Windows CE with telematics-oriented features.
Finally, an important issue for multipartner development and the flexibility requirement is the
portability of software components. For this purpose, several projects aim to specify an embedded
middleware that has to hide the specific communication system (portability) and to support fault
tolerance (see the Titus project [15], the ITEA EAST-EEA project [24], the DECOS project [16], or Volcano [17]).
Note that these projects, as well as the Rubus Concept [14], provide not only a middleware or an operating
system but also a way toward a component-based approach for designing real-time distributed embedded
applications.
41.1.4 A Cooperative Development Process
Strong cooperation between suppliers and carmakers in the design process implies the development of
a specific concurrent engineering approach. For example, in Europe or Japan, carmakers provide the
specification of subsystems to suppliers, which are then in charge of the design and realization of these
subsystems, including the software and hardware components and possibly the mechanical or hydraulic
parts. The results are furnished to the carmakers, who have to integrate them into the car and test them.
The last step consists of calibration activities, that is, tuning certain control and regulation parameters
to meet the required performance of the controlled systems. This activity is closely related to testing
activities. In the United States, the process is slightly different, as the suppliers cannot really be considered
independent of carmakers. Nevertheless, the subsystem integration and calibration activities always
have to be done and, obviously, any error detected during this integration leads to a costly feedback on the
specification or design steps. Therefore, in order to improve the quality of the development process, new
design methodologies are emerging. In particular, the different actors of a system development increasingly
apply methods and techniques ensuring the correctness of subsystems as early as possible in
the design stages, and a new trend is to consider the integration of subsystems at a virtual level [18]. This
means that carmakers as well as suppliers will be able to design, prove, and validate the models of each
subsystem, and of their integration, at each level of the development in a cooperative way. This new practice
will significantly reduce the cost of development and production of new electronic embedded systems
while increasing flexibility for the design of variants.
41.2 Abstraction Levels for In-Vehicle Embedded
System Description
As shown in Section 41.1.4, the way to improve the quality and the flexibility of an embedded electronic
system while decreasing the development and production cost is to design and validate this system at
a virtual level. Therefore, the problem is, first, to identify the abstraction level at which the components
and the whole system are to be represented. In order to apply validation and verification techniques on
the models, the second problem consists in specifying which validation and verification activities have to
be applied and, consequently, which formalisms support the identified models.
41.2.1 Architecture Description Languages
Two main keywords were introduced above: architectures, which refer to the concept of an Architecture
Description Language (ADL), well known in computer science, and components, which lead to modularity
principles and the object approach. An ADL is a formal approach for software and system architecture
specification [19]. In the avionics context, where the development of embedded systems raises the
same problems, MetaH [20] was developed at Honeywell and, in 2001, was chosen as the basis of
a standardization effort aiming to define an Avionics Architecture Description Language (AADL) standard
under the authority of SAE. This language can describe the standard control and data flow mechanisms
used in avionics systems, and important nonfunctional aspects such as timing requirements, fault and error
behaviors, time and space partitioning, and safety and certification properties. In the automotive industry,
some recent efforts brought a solution for mastering the design, modeling, and validation of in-vehicle
electronic embedded systems. The first result was obtained by the French project AEE (Architecture
Embedded Electronic) [21] and more specifically through the definition of AIL_Transport (Architecture
Implementation Language for Transport). This language, based on UML, allows specifying, in the
same framework, electronic embedded architectures from the highest level of abstraction, for the capture
of requirements and the functional views, to the lowest level, for the modeling of an implementation
taking into account the services and performance of hardware supports and the distribution of software
components [22,23].
41.2.2 EAST-ADL for In-Vehicle Embedded System Modeling
Taking AIL_Transport as one of the entry points of the European project ITEA EAST-EEA [24] (July 2001
to June 2004), a new language named EAST-ADL was defined. Like AIL_Transport, EAST-ADL offers
support for the unambiguous description of in-vehicle embedded electronic systems at each level
of their development. It provides a framework for the modeling of such systems through seven views
(see Figure 41.1) [25]:
The vehicle view describes user-visible features such as anti-lock braking or windscreen wipers.
The functional analysis architecture level represents the functions realizing the features, their behavior,
and their exchanges. There is an n-to-n mapping between vehicle view entities and functional
analysis architecture entities, that is, one or several functions may realize one or several features.
The functional design architecture level models a decomposition or refinement of the functions described
at the functional analysis architecture level in order to meet constraints regarding allocation,
efficiency, reuse, supplier concerns, and so on. Again, there is an n-to-n mapping between entities
of the functional design architecture and of the functional analysis architecture.
The logical architecture level is where the class representation of the functional design architecture has
been instantiated into a flat software structure suitable for allocation. This level provides an abstraction
of the software components to be implemented on the hardware architecture. The logical architecture
contains the leaf functions of the functional design architecture. From the logical architecture point
of view, the code could in many cases be generated automatically.
[Figure 41.1 shows the five abstraction layers (vehicle view, functional analysis architecture, functional design architecture, logical architecture, and operational architecture) alongside the hardware architecture and technical architecture views.]
FIGURE 41.1 The abstraction layers of the EAST-ADL.
In parallel to the application functionality, the execution environment is modeled from three views:
1. The hardware architecture level includes the description of the ECUs and, more precisely, of the
microcontroller used, the sensors and actuators, the communication links (serial links, networks),
and their connections.
2. The technical architecture level gives the model of the operating system or middleware API and the
services provided (in particular, the behavior of the middleware services, schedulers, frame packing, and memory
management).
3. The operational architecture models the tasks, managed by the operating systems, and the frames,
managed by the protocols. At this lowest abstraction level, all implementation details are captured.
A system described at the functional analysis level may be loosely coupled to hardware based on intuition,
various known constraints, or as a back annotation from more detailed analysis on lower levels.
Furthermore, the structure of the functional design architecture and of the logical architecture is aware
of the technical architecture. Finally, EAST-ADL provides consistency within and between artifacts
belonging to the different levels, from a syntactic and semantic point of view. This makes an
EAST-ADL-based model a strong and unambiguous support for automatically building models suited to
optimal configuration and/or validation and verification activities. For each of the identified objectives
(simulation or formal analysis at the functional level, optimal distribution, frame packing, round building
for TDMA-based networks, formal test sequence generation, timing analysis, performance evaluation,
dependability evaluation, etc.), a piece of software specific to the activity, to the related formalism, and to
EAST-ADL extracts the relevant data from the EAST-ADL repository and translates it into the adequate
formalism. Then the concerned activity can run, thanks to the adequate tools.
41.3 Validation and Verication Techniques
In this section we briefly introduce, in Section 41.3.1, the validation issues in the automotive industry and
the place of these activities in the development process, and we detail, in Section 41.3.2, a specific validation
technique that aims to prove that an operational architecture meets its performance properties.
41.3.1 General View of Validation Techniques
The validation of an embedded system consists of proving, on the one hand, that this system implements
all the required functionalities and, on the other hand, that it ensures functional and extra-functional
properties such as performance and safety properties. From an industrial point of view, validation and
verification activities address two complementary objectives:
1. Validation and verification of all or parts of a system at a functional level, without taking into account
the implementation characteristics (e.g., hardware performance). For this purpose, simulation or
formal analysis techniques can be used.
2. Verification of properties of all or parts of a system at the operational level. These activities integrate
the performance of both the hardware and technical architectures and the load due to a
given allocation of the logical architecture. This objective can also be reached through simulation
and formal analysis techniques. Furthermore, according to the level of guarantee required for the
system under verification, a designer may need deterministic guarantees or simply probabilistic
ones, involving different approaches.
The expression formal analysis is employed when mathematical techniques can be applied to an abstraction
of the system, while simulation represents the possibility to execute a virtual abstraction of it.
Obviously, formal analysis leads to an exhaustive analysis of the system, or more precisely of the model
that abstracts it. It provides a precise and definitive verdict. Nevertheless, the level of abstraction or the
accuracy of a model is in inverse ratio to its capacity to be treated in a bounded time. So this technique
is generally not suitable for large systems at the fine-grained abstraction level required, for example,
for the verification of performance properties of a widely distributed operational architecture; in
this case, the system is modeled by timed automata or queuing systems whose complexity can
make their analysis impossible. To solve this problem, simulation techniques can be applied. They accept
models at almost any level of detail. However, the drawback is that it is practically impossible to guarantee
that all the feasible executions can be simulated. Therefore, the pertinence of the results is linked to the
scenarios and the simulation duration, and we can only ensure that a system is correct for a
set of scenarios; this does not imply that the system will stay correct for any scenario. In fact, in the
automotive industry, simulation techniques are much more widely used than formal analysis. An exception
can be found in the context of the verification of properties to be respected by frames sharing
a network: a well-known formal approach, usually named timing analysis, is available for this purpose.
Finally, note that some tools are of course of general interest for the design and validation of electronic
embedded systems, for example, Matlab/Simulink and Stateflow [26], Ascet [27], Statemate [28], and
SCADE [29]. In some cases, an interface encapsulates these tools in order to suit them to the
automotive context.
Moreover, these techniques, which work on virtual platforms, are complemented by test techniques in order
to ensure that a realization is correct: tests of software components, of logical architectures, and of an
implemented embedded system. Note that the test activities, like the simulation ones, consist of
providing a scenario of events and/or data that stimulate the system under test or an executable
model of the system; in both techniques we then have to observe which events and/or data are produced by
the system. The input scenario can be built manually or generated formally. In the latter case, the test or
simulation activity is closely linked to a formal analysis technique [30].
Finally, one of the main targets of validation and verification activities is the dependability of
electronic embedded systems. As seen in the first section, some of these systems are said to be
safety-critical. This concern is heightened in the chassis domain by the emergence of X-by-Wire applications.
In this case, a high dependability level is required: the system has to exhibit fewer than 10^-9 failures
per hour (this means that the system has to work about 115,000 years without a failure). For now,
this is a challenge, because it is impossible to ensure this property through the actual reliability of the
electronic devices alone. Moreover, as the application may be sensitive to electromagnetic perturbations, its
behavior cannot be entirely predictable. So the required safety properties can only be reached by introducing
fault-tolerance strategies.
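The 115,000-year figure follows directly from the 10^-9 failures-per-hour bound; a one-line check:

```python
failure_rate_per_hour = 1e-9          # dependability target for X-by-Wire systems
mtbf_hours = 1 / failure_rate_per_hour
mtbf_years = mtbf_hours / (24 * 365)  # roughly 1.14e5 years of failure-free operation
```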
41.3.2 Validation by Performance Evaluation
The validation of a distributed embedded system requires, at least, proving that all the timing properties
are respected. These properties are generally expressed as timing constraints applied to the occurrences
of specific events, for example, a bounded jitter on a frame emission, a deadline on a task, or a bounded
end-to-end response time between two events. The first way of doing this is analytically, but this means
one should be able to establish a model that captures the temporal behavior of the system and that
can be mathematically analyzed. Considering the complexity of an actual electronic embedded system,
such a model has to be strongly simplified and generally provides only oversized solutions. For instance,
the holistic scheduling approach introduced by Tindell and Clark [31] allows only the evaluation of the
worst-case end-to-end response time for the periodic activities of a distributed embedded application.
Using this holistic scheduling approach, Song et al. [32] studied the end-to-end task response times for
an architecture composed of several ECUs interconnected by CAN.
Faced with the complexity of this mathematical approach, the simulation of a distributed application
is a complementary technique. It allows taking into account a more detailed model, as well as
the unavoidable perturbations that may affect the foreseen behavior. For example, a simulation-based
analysis [33] of the system presented in [32] gave more realistic performance measures than those obtained
analytically.
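The core of such analytical approaches is the classical fixed-priority response-time iteration, R_i = C_i + sum over higher-priority tasks j of ceil(R_i/T_j)·C_j, which the holistic method of [31] extends with network delays. A minimal single-processor sketch (assuming processor utilization below 1 so the iteration converges; this is not the full holistic analysis):

```python
import math

def response_times(tasks):
    """tasks: list of (C, T) pairs ordered from highest to lowest
    priority. Returns each task's worst-case response time, found by
    iterating R = C_i + sum_{j in hp(i)} ceil(R / T_j) * C_j until a
    fixed point is reached."""
    results = []
    for i, (c_i, _) in enumerate(tasks):
        r = c_i
        while True:
            interference = sum(math.ceil(r / t_j) * c_j
                               for c_j, t_j in tasks[:i])
            if c_i + interference == r:
                break  # fixed point: no further interference fits before r
            r = c_i + interference
        results.append(r)
    return results
```

For the hypothetical task set (C, T) = (1, 4), (2, 6), (3, 10), the fixed points are 1, 3, and 10 ms.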
[Figure 41.2 shows the Engine controller, AGB, ABS/VDC, Suspension, and WAS/DHC nodes on the CAN bus, the X, Y, and Z nodes on the VAN bus, and the ISU gateway connecting the two.]
FIGURE 41.2 Hardware architecture.
An outline of these two approaches is given in Sections 41.3.2.3 and 41.3.2.2 by means of
a common case study presented in Section 41.3.2.1; then, in Section 41.3.2.4, the respective results
obtained are compared. Finally, we show how a formal architecture description language, as introduced
in Section 41.2, is a strong factor for promoting validation and verification on virtual platforms
in the automotive industry.
41.3.2.1 Case Study
Figure 41.2 shows the electronic embedded system [34] used in the two following sections as a basis for
both the mathematical and the simulation approaches. This system is in fact derived from an actual one presently
embedded in a vehicle manufactured by PSA Peugeot-Citroën [35]. It includes functions
related to the power train, chassis, and body domains.
41.3.2.1.1 Hardware Architecture Level (Figure 41.2)
We consider nine nodes (ECUs) interconnected by means of one CAN and one VAN network.
The names of these nodes recall the global functions that they support: Engine controller, AGB
(Automatic Gear Box), ABS/VDC (Anti-lock Brake System/Vehicle Dynamic Control), WAS/DHC
(Wheel Angle Sensor/Dynamic Headlamp Corrector), and Suspension controller refer to nodes connected
to CAN, while X, Y, and Z (named so for confidentiality reasons) refer to nodes connected to VAN.
Finally, the ISU (Intelligent Service Unit) node ensures the gateway function between CAN and VAN.
The communication is supported by two networks: CAN 2.0A (bit rate equal to 250 Kbps) and
VAN (time slot rate fixed at 62.5 kTS/s).
The different ECUs are connected to these networks by means of network controllers. For this case
study we consider the Intel 82527 CAN network controller (14 transmission buffers), the Philips
PCC1008T VAN network controller (one transmission buffer and one First In First Out (FIFO)
reception queue with two places), and the MHS 29C461 VAN network controller (handling up to
14 messages in parallel).
41.3.2.1.2 Technical Level
The operating system OSEK [11] runs on each ECU. The scheduling policy is the Fixed Priority protocol.
Each OS task is a basic task in the OSEK sense. In the actual embedded system, preemption is not permitted
for tasks. In the study presented here, the analytical method is applied strictly to this system, while simulations
are run for different configurations, two of which accept preemptible tasks.
41.3.2.1.3 Operational Level
The entities considered at this level are tasks and messages (frames); they are summarized in
Figure 41.3, Figure 41.4, Figure 41.5, and Figure 41.7. The mapping of the logical architecture (not
presented here) onto the technical and hardware ones produces 44 OSEK OS tasks (in short, tasks, in the
following) and 19 messages exchanged between these tasks. Furthermore, we assume that a task operating
ECU: Suspension
T_SUS1: P=4, T=20, output M9, D=20
T_SUS2: P=5, input M5, D=20
T_SUS3: P=1, input M1, D=10
T_SUS4: P=2, input M2, D=14
T_SUS5: P=3, input M7, D=15

ECU: Engine_Ctrl
T_Engine1: P=1, T=10, output M1, D=10
T_Engine2: P=4, T=20, output M3, D=20
T_Engine3: P=7, T=20, output M10, D=100
T_Engine4: P=3, input M4, D=15
T_Engine5: P=2, input M2, D=14
T_Engine6: P=6, input M8, D=50
T_Engine7: P=5, input M6, D=40

ECU: AGB
T_AGB1: P=2, T=15, output M4, D=15
T_AGB2: P=3, T=50, output M11, D=50
T_AGB3: P=4, input M8, D=50
T_AGB4: P=1, input M2, D=14

ECU: ABS/VDC
T_ABS1: P=2, T=20, output M5, D=20
T_ABS2: P=5, T=40, output M6, D=40
T_ABS3: P=1, T=15, output M7, D=15
T_ABS4: P=6, T=100, output M12, D=100
T_ABS5: P=3, input M3, D=20
T_ABS6: P=4, input M9, D=20

ECU: WAS/DHC
T_WAS1: P=1, T=14, C=2, output M2, D=14
T_WAS2: P=2, input M9, C=2, D=20

(P_i: priority; T_i: activation period in ms for time-triggered tasks; input: activating message for event-triggered tasks; output: produced message; C_i: WCET in ms; D_i: relative deadline in ms.)

FIGURE 41.3 Operating system tasks on nodes connected to CAN.
ECU: X
T_X1: P=2, T=150, output M16, D=150
T_X2: P=4, T=200, output M17, D=200
T_X3: P=1, input M15, D=50
T_X4: P=3, input M19, D=150

ECU: Z
T_Z1: P=2, T=100, output M18, D=100
T_Z2: P=3, T=150, output M19, D=150
T_Z3: P=4, input M17, D=200
T_Z4: P=1, input M15, D=50

ECU: Y
T_Y1: P=2, T=50, output M15, D=50
T_Y2: P=3, input M13, D=50
T_Y3: P=1, input M14, D=10
T_Y4: P=4, input M18, D=100
T_Y5: P=5, input M16, D=150

FIGURE 41.4 Operating system tasks on nodes connected to VAN.
ECU: ISU
T_ISU1: P=4, T=50, output M8, D=50
T_ISU2: P=5, input M11, output M13, D=50
T_ISU3: P=1, input M1, output M14, D=10
T_ISU4: P=6, input M10, D=100
T_ISU5: P=3, input M6, D=40
T_ISU6: P=2, input M9, D=20
T_ISU7: P=7, input M12, D=100

FIGURE 41.5 Operating system tasks distributed on the gateway ECU.
system consumes (respectively produces) a message, possibly simultaneously with the beginning (respectively the end) of its execution. In the case study, two kinds of tasks can be identified according to their activation law:
• Tasks activated by the occurrence of the event "reception of a message" (event-triggered tasks), as, for example, T_Engine6 and T_ISU2.
• Tasks that are activated periodically (time-triggered tasks), as T_AGB2.
Each task is characterized by its name and, on the ECU named k on which it is mapped, by (see Figure 41.3, Figure 41.4, and Figure 41.5):
• T_i^k: its activation period in ms (for time-triggered tasks) or the name M_n of the message whose reception activates it (for event-triggered tasks).
• C_i^k: its WCET (Worst-Case Execution Time) on this ECU (disregarding possible preemption); in the case study, we assume that this WCET is equal to 2 ms for each task.
• D_i^k: its relative deadline in ms.
• P_i^k: its priority.
• M_i: its possibly produced message (we assume, in this case study, that at most one message is produced by one task; note that the method can be applied even if a task produces more than one message).
41-14 Embedded Systems Handbook
FIGURE 41.6 Task response time: the interval on the time axis between task activation and task completion; preemption may occur within it (for preemptive tasks only).
(a) Messages exchanged over CAN
Message m_i  Producer task  DLC_i (bytes)  Inherited period T_i
M1           T_Engine1       8              10
M2           T_WAS1          3              14
M3           T_Engine2       3              20
M4           T_AGB1          2              15
M5           T_ABS1          5              20
M6           T_ABS2          5              40
M7           T_ABS3          4              15
M8           T_ISU1          5              50
M9           T_SUS1          4              20
M10          T_Engine3       7              100
M11          T_AGB2          5              50
M12          T_ABS4          1              100

(b) Messages exchanged over VAN
Message m_i  Producer task  DLC_i (bytes)  Inherited period T_i
M13          T_ISU2          8              50
M14          T_ISU3         10              10
M15          T_Y1           16              50
M16          T_X1            4              150
M17          T_X2            4              200
M18          T_Z1            2              100
M19          T_Z2           20              150

FIGURE 41.7 Messages exchanged over networks: (a) on CAN, (b) on VAN.
For notation convenience, we assume that, on each ECU named k, priority P_i^k is higher than priority P_{i+1}^k. In the following section, a task is simply denoted τ_i if its priority is P_i^k on an ECU named k.
The task response time is classically defined as the time interval between the activation of a given task and the end of its execution (Figure 41.6). We denote R_i^j the task response time of the instance j of a task τ_i.
Each message (frame) is characterized by its name and (Figure 41.7):
• DLC_i: its size, in bytes.
• C_i: its transmission duration; this duration is computed thanks to the formulae given in (41.1) and (41.2) (see Sections 41.1.3.1.1 and 41.1.3.1.2).
• Task_i: the name of the task that produces it.
• T_i: its inherited period (for time-triggered tasks), assumed in [31] and [32] to be equal to the activation period of its producer task.
• P_i: its priority.
A message will also be denoted by m_i if its priority is P_i.
The message response time is the time interval between the production of a specific message and its reception by a consumer task (Figure 41.8). We denote R_i^j the message response time of the instance j of a message m_i.
Finally, in this system, from Figure 41.3, Figure 41.4, Figure 41.5, and Figure 41.7, we identify some logical chains, that is, causal sequences of tasks and messages. In the case study, the most complex logical chains that can be identified are:
lc1: T_Engine1 - M1 - T_ISU3 - M14 - T_Y3
and
lc2: T_AGB2 - M11 - T_ISU2 - M13 - T_Y2
FIGURE 41.8 Message response time: the interval between message production (= producer task completion) and the end of message transmission (= consumer task activation).
FIGURE 41.9 Example of logical chain (lc2): T_AGB2 activation, T_AGB2 completion (= M11 production), T_ISU2 activation, T_ISU2 completion (= M13 production), T_Y2 activation, and T_Y2 completion; the logical chain response time for lc2 runs from T_AGB2 activation to T_Y2 completion.
Here, the task T_Y3 (respectively T_Y2), running on a VAN-connected node, depends on the message M14 (respectively M13) supported by the VAN bus; M14 (respectively M13) is produced by task T_ISU3 (respectively T_ISU2) running on the ISU node; T_ISU3 (respectively T_ISU2) is activated by the message M1 (respectively M11) that is produced by T_Engine1 (respectively T_AGB2) running on CAN-connected nodes.
The logical chain response time, more generally named End-to-End Response Time, is defined for lc1 (respectively lc2) as the time interval between the activation of T_Engine1 (respectively T_AGB2) and the completion of T_Y3 (respectively T_Y2) (Figure 41.9). We note R_lci^j the logical chain response time of the instance j of the logical chain lci.
41.3.2.1.4 Performance Properties
As presented in Figure 41.3, Figure 41.4, and Figure 41.5, relative deadline constraints are imposed on each task in this application. Furthermore, for the given application, some other performance properties were required. Among these properties, we focus on two specific ones:
1. Property A: No message transmitted on CAN or VAN is lost. This means that no message can be overwritten in network controller buffers or, more formally, that each message is transmitted before its inherited period T_i, considered as the worst case.
2. Property B: This property is expressed on the two logical chains lc1 and lc2 presented above. The logical chain response time for lc1 (respectively lc2) must be as regular as possible for each instance of lc1 (respectively lc2). More formally, if R1 is the set of logical chain response times obtained for each instance j of lc1 (and similarly R2 for lc2), the property is: for all j, |R_lc1^j - E[R1]| ≤ ε, for a given bound ε.
This kind of property is commonly required in embedded automatic control applications where the command elaborated through a logical chain has to be applied to an actuator as regularly as possible.
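On a set of measured logical chain response times, property B amounts to bounding each instance's deviation from the mean of the series; a minimal sketch, in which the tolerance eps and the sample values are hypothetical, not figures from the case study:

```python
def check_regularity(response_times, eps):
    """Return the instances (j, R_j) whose response time deviates from
    the mean of the series by more than eps (property B violations)."""
    mean = sum(response_times) / len(response_times)
    return [(j, r) for j, r in enumerate(response_times) if abs(r - mean) > eps]

# Illustrative series of chain response times in ms
samples = [9.09, 9.20, 9.45, 9.60, 12.62]
print(check_regularity(samples, eps=2.0))   # flags the 12.62 outlier
```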
An embedded system is correct if, at least, it meets the above-mentioned properties. However, the task scheduling policy on each node and the MAC protocols of VAN and CAN unavoidably lead to jitters on task terminations. So, a mathematical approach as well as a simulation approach were applied in order to prove that the proposed operational architecture meets all its constraints. Thanks to the mathematical approach, related to the general techniques named timing analysis, we find, for each entity (task or message) and for each logical chain, lower and upper bounds on their respective response times. These values represent the best and worst cases. In order to handle more detailed and more realistic models, we use a simulation method, which furnishes minimum, maximum, and mean values of the same response times. Furthermore, several simulations, with different parameter configurations, were performed in order to obtain an architecture meeting the constraints. In fact, we use the mathematical approach for validating the results obtained by simulation.
41.3.2.2 Simulation Approach
We model four different configurations of the presented operational architecture according to the formalism supported by the SES Workbench tool. For each configuration, we use this tool in order to run a simulation and obtain different results. Furthermore, as we want to analyze specific response times, we introduce adequate probes in the model. Thanks to this, the log file obtained throughout the simulation process can be easily analyzed by applying an elementary filter that furnishes the results in a readable way.
Three kinds of parameters are considered and can differ from one configuration to another:
• The network controllers, specifically the VAN ones
• The fact that tasks can be preempted or not
• The task priorities
Rather than describing a simulation campaign that would exhaustively cover each possible combination, we prefer to present it by following an intuitive reasoning: starting from a given configuration (configuration 1), modifying one kind of parameter at a time leads successively to better configurations (configuration 2, then configuration 3), finally reaching a correct configuration that verifies the required properties A and B.
41.3.2.2.1 Configuration 1
As a first simulation attempt:
• As given in the description of the actual embedded system (see Section 41.3.2.1), all the tasks are considered as being OSEK basic tasks and are characterized by their local priority. Moreover, their execution is done without preemption.
• We assign the Intel 82527 controller to each node connected to the CAN bus and the Philips PCC1008T controller to those connected to the VAN network. Note that the ISU ECU integrates these two network controllers.
In this case, a probe is introduced in the model; it observes the occurrences of message production and message transmission and detects the fact that a given instance of a message is stored in the buffer of a network controller before the previously produced instance was transmitted through the network. For each of these detected events, it writes a specific string in the log file. The filter then consists in extracting the lines containing this string from the log file. A screenshot is given in Figure 41.10, where it can be seen that some messages are overwritten in the single transmission buffer of the VAN controller that was chosen for this configuration. So, we conclude that property A is not verified.
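The elementary filter mentioned above simply extracts, from the simulation log, the lines carrying the probe's marker string; a sketch in which the marker text and the log lines are hypothetical:

```python
MARKER = "MESSAGE_OVERWRITTEN"   # hypothetical probe string

def filter_log(lines, marker=MARKER):
    """Keep only the lines written by the overwrite-detection probe."""
    return [line for line in lines if marker in line]

log = [
    "t=12.4 M13 produced by T_ISU2",
    "t=12.9 MESSAGE_OVERWRITTEN M13 in VAN controller buffer",
    "t=13.1 M13 transmitted",
]
print(filter_log(log))   # property A is violated if this list is non-empty
```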
FIGURE 41.10 Log file filtered for verification of property A.
Logical chain response times (ms):

                           Simulation results                    Analytic results
Configuration    Chain   minimum   mean   maximum   Std dev.   minimum   maximum
Configuration 2   lc1      9.09   11.03    16.67     1.775      8.992     22.116
                  lc2      9.82   12.82    16.67     1.458      8.576     41.172
Configuration 3   lc1      9.09    9.45    12.62     0.667      8.992     16.116
                  lc2     12.45   14.16    20        2.101      8.576     35.172
Configuration 4   lc1      9.09    9.45    12.62     0.667      8.992     16.116
                  lc2     12.45   12.61    14.11     0.490      8.576     27.172

FIGURE 41.11 Response time evaluation.
41.3.2.2.2 Configuration 2
One of the possible causes for the nonverification of property A by the previous configuration could be that the VAN controller PCC1008T, providing only one single buffer, is not suitable for the required performance property. So, we assign the full VAN controller MHS 29C461 to all nodes transmitting messages on the VAN bus (ISU computer, X, Y, and Z). We modify the SES Workbench model and relaunch the simulation. This time, the probes and filters proposed for configuration 1 provide an empty list. So we can conclude that messages are correctly transmitted and that property A is verified. Furthermore, SES Workbench gives additional results such as the network load; for this configuration, the load of the CAN bus is less than 21.5% and that of the VAN bus less than 41%.
On the same configuration, we study property B. For this purpose, probes are introduced for observing the occurrences of the first task activation (T_Engine1 or T_AGB2) and the occurrences of the last task completion (T_Y3 or T_Y2). A filter is developed for the evaluation of the minimum, mean, and maximum logical chain response times of lc1 and lc2 as well as their standard deviation. The obtained results are given in Figure 41.11. Under this configuration, none of the chains meets the required property.
41.3.2.2.3 Configuration 3
In the two last configurations, preemption was not allowed for any task. We change this characteristic and allow preemption; as T_Engine1 and T_AGB2 have the highest local priority, and considering that they are basic tasks, they will never wait for the processor. Once more, we model the operational architecture by modifying the scheduling policy on the nodes Engine_Ctrl and AGB without changing the other parameters. The same probes and filters are used; the results obtained by simulation of configuration 3 are shown in Figure 41.11. We can conclude that property B is verified only for the logical chain lc1. So, this configuration does not correspond to a correct operational architecture.
41.3.2.2.4 Configuration 4
Further log file analysis points out the problem: the priority of T_ISU2 is probably too low. After modifying the priority of this task (2 in place of 5), still using the same probes and filters and simulating the new model, we obtain the results presented in Figure 41.11. Property B is verified for lc1 and lc2.
41.3.2.3 Deterministic Timing Analysis
In order to validate these results, we apply the analytic formulas of [32] to the case study. The main purpose of this analysis is to obtain the lower (best-case) and the upper (worst-case) bounds on the response times. It is worth noting that in practice neither the best case nor the worst case can necessarily be achieved, but they provide deterministic bounds.
As a time-triggered design approach is adopted, both tasks and messages are expected to be periodic, although in practice jitter exists on events whose occurrences are supposed to be periodic. In the following, a nonpreemptive task τ_i or a message m_i whose priority is P_i can be indifferently characterized by (C_i, T_i) as defined previously.
As introduced earlier, we are interested in evaluating:
• The response time R_i of such a task or message of priority P_i.
• The logical chain response times of lc1 and lc2, obtained by summing these individual response times.
41.3.2.3.1 Best-Case Evaluation
The best case corresponds to the situation where a task τ_i (respectively a message m_i), whose priority is P_i, is executed (respectively transmitted) without any waiting time. In this case,

R_i = C_i    (41.3)

The best case of the logical chain response time is the sum of the best-case response times of all entities (tasks and messages) involved in the chain. Applying this to the two logical chains, we obtain (see Figure 41.11):

R_best_lcx = Σ_{y ∈ lcx} C_y    (41.4)
41.3.2.3.2 Worst-Case Evaluation
We distinguish the evaluation of the worst case for task and message response times.
Messages. For a message m_i of priority P_i, the worst-case response time can be calculated as:

R_i = C_i + I_i    (41.5)

where I_i is the interference period during which the transmission medium is occupied by other higher-priority messages and by one lower-priority message (because of nonpreemption). Take notice of the fact that the message response time is defined here in a way different from that specified by Tindell and Burns in [36]; so, the jitter J_i is not included in formula (41.5).
The following recurrence relation calculates the interference period I_i:

I_i^{n+1} = max_{i+1 ≤ j ≤ N} (C_j) + Σ_{j=1}^{i-1} ⌊(I_i^n + J_j)/T_j + 1⌋ C_j    (41.6)

where N is the number of messages and max_{i+1 ≤ j ≤ N} (C_j) is the blocking factor due to the nonpreemption. A suitable initial value could be I_i^0 = 0. Equation (41.6) converges to a value as long as the transmission medium's utilization is less than or equal to 100%. We also notice that the jitters should be taken into account for the calculation of the worst-case interference period, as the higher-priority messages are considered periodic with jitters.
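The fixed point of equation (41.6) can be computed by direct iteration; in the sketch below, messages are indexed by decreasing priority (index 0 highest) and the numeric values are illustrative, not the case-study figures.

```python
def worst_case_interference(i, C, T, J, max_iter=1000):
    """Fixed point of Eq. (41.6) for message i (index 0 = highest priority).

    Blocking term: the longest lower-priority frame (nonpreemptive bus);
    interference term: higher-priority frames j < i, periodic with jitter J[j].
    """
    blocking = max(C[i + 1:], default=0.0)
    I = 0.0                                   # I_i^0 = 0
    for _ in range(max_iter):
        new_I = blocking + sum(
            (int((I + J[j]) // T[j]) + 1) * C[j] for j in range(i))
        if new_I == I:
            return I
        I = new_I
    raise RuntimeError("no convergence: utilization too high")

def worst_case_response(i, C, T, J):
    """Eq. (41.5): R_i = C_i + I_i."""
    return C[i] + worst_case_interference(i, C, T, J)

# Illustrative three-message set (durations and periods in ms)
C = [0.5, 0.6, 0.4]
T = [10.0, 14.0, 20.0]
J = [0.0, 0.0, 0.0]
print(worst_case_response(1, C, T, J))
```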
Tasks. For a task τ_i whose priority is P_i, the same arguments lead to formulae similar to those obtained for messages. However, we must distinguish two cases for the task response time evaluation. For nonpreemptive fixed-priority scheduling, equations (41.5) and (41.6) are directly applicable while, if the basic tasks are scheduled by a preemptive fixed-priority policy, the factor max_{i+1 ≤ j ≤ N} (C_j) does not have to be considered (the possibility of preemption ensures that a task at a given priority level cannot be blocked by a task at a lower level). Therefore, the following recurrence relation allows one to calculate the response time of a basic preemptive task:

R_i^{n+1} = C_i + Σ_{j<i} ⌈(J_j + R_i^n)/T_j⌉ C_j    (41.7)

Again, equation (41.7) converges to a value as long as the processor utilization is less than or equal to 100%. In addition, a suitable initial value for the computation could be R_i^0 = 0.
Logical chains. Finally, we can apply equation (41.5) for the nonpreemptive case (respectively equation (41.7) for the preemptive case) to calculate the worst-case response time of the two logical chains:

R_worst_lcx = Σ_{y ∈ lcx} R_y    (41.8)

Figure 41.11 presents the bounds (minimum and maximum response times) obtained thanks to this mathematical timing analysis for both logical chains, according to equations (41.4) and (41.8). Note that the maximum response time in Figure 41.11 corresponds to the nonpreemptive case for configuration 2, while the other two configurations are based on the preemptive assumption.
41.3.2.4 Comments on Results
First, we notice that the simulation results remain within the bounds given by the analytical method of Section 41.3.2.3. However, it can be seen that the analytic bounds for the worst case are never reached during simulation. Maximum values obtained by simulation vary from 40 to 70% of the analytically calculated worst cases, while mean values vary from 30 to 60%. The importance of simulation for obtaining more realistic results thus becomes obvious when evaluating the performances of an embedded system. From these tables we can also see that, compared with nonpreemptive scheduling, preemptive scheduling logically results in shorter response times for high-priority tasks and longer response times for low-priority tasks. Note, however, that this fact seems to be in contrast with the analytic method results, where the worst-case bound gets better for preemptive policies than for nonpreemptive ones, irrespective of task priority. This is perfectly normal since the results from the two methods are not to be interpreted in the same way: analytic results can be used as bounds to validate simulation results, but they have different meanings and are rather complementary.
41.3.2.5 Automatic Generation of Models for Simulation Purpose
Usually, the direct use of a general-purpose simulation platform is not judged suitable by in-vehicle embedded system designers, since too much effort must be put into building the simulation model. Thanks to a nonambiguous description of embedded systems, as seen in Section 41.2, it is possible to generate automatically a model that can be run on a specific discrete-event simulation tool. For example, in [34], a modeling methodology is proposed, developed in collaboration with the French carmaker PSA Peugeot-Citroën and based on a component approach. This methodology has been implemented through the development of a simulation tool called Carosse-Perf, based on the SES Workbench simulation platform [37]. It is composed, on the one hand, of a library of prebuilt components modeled in the SES Workbench formalism and, on the other hand, of a constructor that uses these models and the description of the embedded distributed architecture in order to obtain the whole model that will be simulated. The constructor extracts the pertinent information from the static description of the system at the logical architecture level (tasks, data exchanged between tasks, behavior), from the technical and hardware architectures (policies for access to resources, such as the scheduler policy and the network protocols, and performances of the hardware components) and, finally, from the description of how the logical architecture is mapped onto the technical one. Technical and hardware architecture components are modeled once and for all in the SES Workbench formalism. The principle of the model building is presented in Figure 41.12(a).
FIGURE 41.12 Simulator generation and simulation process. (a) A library of predefined hardware component models and the hardware architecture description feed the hardware architecture modeling step; the resulting hardware model in SES Workbench language is compiled into a runnable simulation program. (b) The runnable simulation program, together with information extracted from the constraints description, the logical architecture description (LAI), and the environment scenario description (ESI), drives the simulation, producing a trace whose analysis yields the results.
As, at the simulation step, the behavior of the logical architecture entities (tasks and messages) and the environment signal occurrences animate the simulation, the constructor has to include in the model two generic modules that will be executed by the simulator: a logical architecture interpreter and an environment scenario interpreter (named LAI and ESI, respectively, in Figure 41.12), whose role is to extract, during the simulation, the current event, from the logical architecture entities or from the environment signals, that is to be managed by the discrete event simulator.
This kind of tool allows designers to easily build a simulation model of their new in-vehicle embedded systems (operational architecture) and then to simulate the model. More details about the underlying principles can be found in [34]. Carosse-Perf was used to automatically construct the models corresponding to the four configurations of the case study and to simulate them.
41.4 Conclusions and Future Trends
Embedded electronics, and especially embedded software, take on more and more importance within a car, in terms of both functionality and cost. Due to the cost, real-time, and dependability constraints of the automotive industry, many automotive-specific networks (e.g., CAN, LIN, FlexRay) and operating systems (e.g., OSEK/VDX) have been or are still being developed, most of them within the SAE standardization process.
Today's in-vehicle embedded system is a complex distributed system, mainly composed of four different domains: power train, chassis, body, and telematics. Functions of the different domains are under quite different constraints. SAE has classified the automotive applications into classes A, B, and C with increasing order of criticality on real-time and dependability constraints. For the design and validation of such a complex system, an integrated design methodology as well as validation tools are therefore necessary.
After introducing the specificity of the automotive application requirements in terms of time-to-market, design cost, variant handling, real-time and dependability constraints, and multipartner involvement (carmakers and suppliers) during the development phases, in this chapter we have described the approach proposed by
EAST-ADL, which is a promising design and development framework tailored to fit the specific needs of embedded automotive applications.
Concerning the validation that an implementation of the designed embedded system meets the application constraints, we have reviewed the possible ways and illustrated the use of simulation for validating the real-time performance. This illustration is done through a case study drawn from a PSA Peugeot-Citroën application. The obtained results have shown that the use of a simulation approach, combined with the timing analysis method (especially the holistic scheduling method), permits to efficiently validate the designed embedded architecture.
If we can consider that the power train and body domains begin to achieve maturity, the chassis domain, and especially the X-by-Wire systems, are however still in their early development phase. The finalization of the new protocol FlexRay as well as the development of the 42 V power supply will certainly push forward X-by-Wire system development. The main challenge for X-by-Wire systems is to prove that their dependability is at least as high as that of the traditional mechanical/hydraulic systems.
Portability of embedded software is another main preoccupation of automotive embedded application developers, and constitutes another main challenge. For this purpose, carmakers and suppliers established the AUTOSAR consortium (http://www.autosar.org/) to propose an open standard for automotive embedded electronic architecture. It will serve as a basic infrastructure for the management of functions within both future applications and standard software modules. The goals include the standardization of basic system functions and functional interfaces, the ability to integrate and transfer functions, and to substantially improve software updates and upgrades over the vehicle lifetime.
41.5 Appendix: In-Vehicle Electronic System
Development Projects
System Engineering of Time-Triggered Architectures (SETTA). This project (January 2000 to December 2001) was partly funded by the European Commission under the Information Society Technologies program. The overall goal of the SETTA project was to push time-triggered architecture, an innovative European-funded technology for safety-critical, distributed, real-time applications such as fly-by-wire or drive-by-wire, into future vehicles, aircraft, and train systems. The consortium was led by DaimlerChrysler AG. DaimlerChrysler and the partners Alcatel (A), EADS (D), Renault (F), and Siemens VDO Automotive (D) acted as the application providers and technology validators. The technology providers were Decomsys (A) and TTTech (A). The academic research component was provided by the University of York (GB) and the Vienna University of Technology (A). http://www.setta.org/.
Embedded Electronic Architecture (EAST-EEA), ITEA Project No. 00009. The major goal of EAST-EEA (July 2001 to June 2004) was to enable proper electronic integration through the definition of an open architecture. This would allow hardware and software interoperability and reuse for mostly distributed hardware. The partners were AUDI AG (D), BMW AG (D), DaimlerChrysler AG (D), Centro Ricerche Fiat (I), Opel Powertrain GmbH (D), PSA Peugeot Citroën (F), Renault (F), Volvo Technology AB (S), Finmek Magneti Marelli Sistemi Elettronici (I), Robert Bosch GmbH (D), Siemens VDO Automotive AG (D), Siemens VDO Automotive SAS (F), Valeo (F), ZF Friedrichshafen AG (D), ETAS GmbH (D), Siemens SBS C-LAB (D), VECTOR Informatik (D), CEA-LIST (F), IRCCyN (F), INRIA (F), Linköping University of Technology (S), LORIA (F), Mälardalen University (S), Paderborn University C-LAB (D), Royal Institute of Technology (S), Technical University of Darmstadt (D). www.east-eea.net/docs.
AEE Project (Embedded Electronic Architecture). This project (November 1999 to December 2001) was granted by the French Ministry for Industry. It involved French carmakers (PSA, RENAULT), OEM suppliers (SAGEM, SIEMENS, VALEO), the EADS LV company, and research centers (INRIA, IRCCyN, LORIA). It aimed to specify new solutions for in-vehicle embedded system development. The Architecture Implementation Language (AIL_Transport) had been defined to specify and describe precisely any vehicle electronic architecture. http://aee.inria.fr/en/index.html.
Electronic Architecture and System Engineering for Integrated Safety Systems (EASIS). The goal of the EASIS project (January 2004 to December 2006) is to define and develop a platform for software-based functionality in vehicle electronic systems providing common services upon which future applications can be built; a vehicle on-board electronic hardware infrastructure which supports the requirements of integrated safety systems in a cost-effective manner; a set of methods and techniques for handling critical dependability-related parts of the development lifecycle; and an engineering process enabling the application of integrated safety systems. This project is funded by the European Community (6th FWP). Partners are Kuratorium OFFIS e.V. (G), DAF Trucks N.V. (N), Centro Ricerche Fiat, Societa Consortile per Azioni (I), Universitaet Duisburg-Essen, Standort Essen (G), dSPACE GmbH (G), Valeo Électronique et Systèmes de Liaison (F), Motorola GmbH (G), Peugeot-Citroën Automobiles SA (F), Mira Limited (UK), Philips GmbH Forschungslaboratorien (G), ZF Friedrichshafen AG (G), Adam Opel Aktiengesellschaft (G), ETAS (G), Volvo Technology AB (S), Lear Automotive S.L. (S), Vector Informatik GmbH (G), Continental Teves AG & Co. OHG (G), Decomsys GmbH (A), Regienov (F), Robert Bosch GmbH (G).
Automotive Open System Architecture (AUTOSAR). The objective of the partnership involved in AUTOSAR (May 2003 to August 2006) is the establishment of an open standard for automotive E/E architecture. It will serve as a basic infrastructure for the management of functions within both future applications and standard software modules. The goals include the standardization of basic system functions and functional interfaces, the ability to integrate and transfer functions, and to substantially improve software updates and upgrades over the vehicle lifetime. The AUTOSAR scope includes all vehicle domains. A three-tier structure, proven in similar initiatives, is implemented for the development partnership. Appropriate rights and duties are allocated to the various tiers: Premium Members, Associate Members, Development Members, and Attendees. http://www.autosar.org/.
References
[1] Society of Automotive Engineers, www.sae.org.
[2] G. Leen, D. Heffernan, Expanding automotive electronic systems, Computer, 35, 88–93, 2002.
[3] A. Sangiovanni-Vincentelli, Automotive Electronics: Trends and Challenges, Convergence 2000,
Detroit MI, October 2000.
[4] F. Simonot-Lion, In-car embedded electronic architectures: how to ensure their safety, in Proceedings of the 4th IFAC Conference on Fieldbus Systems and their Applications, FET'03, Aveiro, Portugal, July 2003, pp. 1–8.
[5] ISO, Road Vehicles – Interchange of Digital Information – Controller Area Network for High-Speed Communication, ISO 11898, International Organization for Standardization (ISO), 1994.
[6] ISO, Road Vehicles – Low-Speed Serial Data Communication – Part 2: Low-Speed Controller Area Network, ISO 11519-2, International Organization for Standardization (ISO), 1994.
[7] ISO, Road Vehicles – Low-Speed Serial Data Communication – Part 3: Vehicle Area Network, ISO 11519-3, International Organization for Standardization (ISO), 1994.
[8] B. Abou, J. Malville, Le bus VAN (Vehicle Area Network): fondements du protocole, Dunod, Paris, 1997.
[9] SAE, Class B Data Communications Network Interface, J1850, Society of Automotive Engineers
(SAE), May 2001.
[10] TTTech, Specification of the TTP/C Protocol, Version 0.5, TTTech Computertechnik GmbH, July 1999.
[11] OSEK, OSEK/VDX Operating System, Version 2.2, 2001. http://www.osek-vdx.org.
[12] J.B. Goodenough, L. Sha, The priority ceiling protocol: a method for minimizing the blocking of high priority tasks, in Proceedings of the 2nd International Workshop on Real-Time Ada Issues, Ada Letters, 8, 1988, pp. 20–31.
[13] Modistarc Project, http://www.osek-vdx.org/whats_modistarc.htm.
[14] http://www.arcticus.se/.
2006 by Taylor & Francis Group, LLC
Design and Validation Process 41-23
[15] U. Freund, M. von der Beeck, P. Braun, and M. Rappl, Architecture centric modeling of automotive
control software, SAE Technical paper series 2003-01-0856.
[16] DECOS Project, http://www.decos.at/.
[17] A. Rajnak, K. Tindell, and L. Casparsson, Volcano Communications Concept, Volcano Communica-
tions Technologies AB, Gothenburg, Sweden, 1998. Available at http://www.vct.se.
[18] P. Giusto, J.-Y. Brunel, A. Ferrari, E. Fourgeau, L. Lavagno, and A. Sangiovanni-Vincentelli, Automotive virtual integration platforms: whys, whats, and hows, in Proceedings of the 2002 IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD'02), Freiburg, Germany, 16–18 September, 2002, pp. 370–378.
[19] N. Medvidovic, R.N. Taylor, A framework for classifying and comparing architecture description languages, Technical report, Department of Information and Computer Science, University of California, Irvine, 1997.
[20] S. Vestal, MetaH User's Manual, Honeywell Technology Center, Carnegie-Mellon, 1995. http://www.htc.honeywell.com/metah/uguide.pdf.
[21] AEE, Architecture Electronique Embarque, 1999, http://aee.inria.fr.
[22] J.-P. Elloy, F. Simonot-Lion, An architecture description language for in-vehicle embedded system development, in Proceedings of the 15th IFAC World Congress, IFAC B'02, Barcelona, Spain, 21–26 July, 2002.
[23] J. Migge, J.-P. Elloy, Embedded electronic architecture, in Proceedings of the 3rd International Workshop on Open Systems in Automotive Networks, Bad Homburg, Germany, 2–3 February, 2000.
[24] ITEA EAST EEA Project, www.east-eea.net/docs.
[25] U. Freund, O. Gurrieri, J. Küster, H. Lonn, J. Migge, M.O. Reiser, T. Wierczoch, and M. Weber, An architecture description language for developing automotive ECU-software, in INCOSE 2004, Toulouse, France, 20–24 June, 2004, pp. 101–112.
[26] www.mathworks.com/.
[27] Ascet SupplyChain, www.ascet.com/.
[28] Ilogic Statemate, www.ilogix.com/.
[29] Esterel Technologies SCADE Suite
TM
for Safety-Critical Software, www.esterel-
technologies.com.
[30] C. Jard. Automatic Test Generation Methods for Reactive Systems. CIRM Summer School,
Marseille, 1998.
[31] Tindell Ken and Clark John, Holistic schedulability analysis for distributed hard real-time systems,
Microprocessor and Microprogramming, 40, 117134, 1994.
[32] Y.Q. Song, F. Simonot-Lion, and N. Navet, De lvaluation de performances du systme de
communication la validation de larchitecture oprationnelle cas du systme embarqu
dans lautomobile, Ecole dt temps rel 1999, Poitiers (France), C.N.R.S., Poitiers (France),
Ed. LISI-ENSMA, 1999.
[33] Y.Q. Song, F. Simonot-Lion, and B. Pierre, VACANS A tool for the validation of CAN-based
applications, in Proceedings of WFCS97, Barcelona, Spain, October 1997.
[34] C. Paolo, Y.Q. Song, F. Simonot-Lion, and A. Mondher, Analysis and simulation methods for
performance evaluation of a multiple networked embedded architecture, IEEE Transactions on
Industrial Electronics, 49, 12511264, 2002.
[35] C. Alain, The electrical electronic architecture of PSA Peugeot Citroen vehicles: current situation
and future trends, in Presentation at Networking and Communication in the Automobile, Munich,
Germany, March 2000.
[36] K. Tindell and A. Burns, Guaranteed message latencies on controller area network (CAN),
in Proceedings of the 1st International CAN Conference, ICC94, 1994.
[37] SES Workbench, HyPerformix Inc. http://www.hyperformix.com.
2006 by Taylor & Francis Group, LLC
42
Fault-Tolerant Services for Safe In-Car Embedded Systems
Nicolas Navet and Françoise Simonot-Lion
Institut National Polytechnique de Lorraine
42.1 Introduction
    The Issue of Safety-Critical Systems in the Automotive Industry • Generic Concepts of Dependability
42.2 Safety-Relevant Communication Services
    Reliable Communication • Higher-Level Services
42.3 Fault-Tolerant Communication Systems
    Dependability from Scratch: TTP/C • Scalable Dependability: FlexRay • Adding Missing Features to an Existing Protocol: CAN
42.4 Conclusion
Acknowledgment
References
42.1 Introduction
In the next decade, most features of a car will be supported by electronic embedded systems. This strategy is already used for functions such as light, window, and door management, as well as for the control of traditional functions such as braking and steering. Moreover, the planned deployment of X-by-Wire technologies is leading the automotive industry into the world of safety-critical applications. Such systems must, obviously, respect their functional requirements and meet performance and cost constraints but, furthermore, they must guarantee their dependability despite the faults (physical or design) that may occur. More precisely, the design of such systems must take into account two kinds of dependability requirements: on the one hand, safety, the absence of catastrophic consequences for the driver, the passengers, and the environment, has to be ensured; on the other hand, the system has to provide a reliable service and be available upon the requests of its users. This chapter introduces the emerging standards that are likely to influence the certification process for in-vehicle embedded systems and describes the general concepts of dependability and the means by which dependability can be attained. The communication system is a key point for an application: it is in charge of transmitting critical information or events between functions that are deployed on distant stations (Electronic Control Units, ECUs) and it is a means for the OEMs (car-makers) to integrate functions provided by different suppliers. So, in this chapter, we pay special attention to in-vehicle embedded networks and to the services that enhance the dependability of the exchanges and of the embedded applications. Note that a classical means, sometimes imposed by regulatory policies in domains close to the automotive one, consists of introducing mechanisms that enable a system to tolerate faults. The purpose of Section 42.2 is to present the main services, provided by a protocol, that allow an application to tolerate certain faults. These services generally provide fault detection and, for some of them, are able to mask fault occurrences from the upper layers and to prevent the propagation of faults. In Section 42.3, we compare some classes of protocols with respect to their ability to provide services for increasing the dependability of an application. For each class, we discuss the effort needed at the middleware or application level to reach the same quality of service.
42.1.1 The Issue of Safety-Critical Systems in the Automotive Industry
In some domains recognized as critical (e.g., nuclear plants, railways, avionics), the safety requirements placed on computer-based embedded systems are very rigorous, and the manner of specifying and managing dependability/safety requirements is an important issue. These systems have to obey regulatory policies that require these industries to follow a precise certification process. At the moment, nothing similar exists in the automotive industry for certifying electronic embedded systems. Nevertheless, the problem is crucial for car-makers as well as for suppliers and, so, several proposals are presently under study. Among the existing certification standards [1], RTCA/DO-178B [2], used in avionics, and EN50128 [3], applied in the railway industry, provide stringent guidelines for the development of a safety-critical embedded system. But these standards are hard to transpose to in-vehicle software-based systems because of the practices they require: partitioning of software (critical/noncritical), multiple versions of dissimilar software components, and the use of active redundancy and hardware redundancy. In the automotive sector, the Motor Industry Software Reliability Association (MISRA), a consortium of the major actors of the automotive industry in the UK, proposes a loose model for the safety-directed development of vehicles with on-board software [4]. Finally, the generic standard IEC 61508 [5], applied to Electrical/Electronic/Programmable Electronic systems, is a good candidate for supporting a certification process in the automotive industry. In Europe, in particular in the transport domain, the trend is to move from "rule-based" to "risk-based" regulation [6]. So, the certification process will certainly be based on the definition of safety performance levels that characterize a safety function according to the consequences of its failures, classified as catastrophic, severe, major, minor, or insignificant. The IEC 61508 standard proposes, in addition to other requirements on the design, validation, and testing processes, four integrity levels, termed Safety Integrity Levels (SILs), and a quantitative safety requirement for each (see Table 42.1). The challenge is therefore to prove that each function realized by a computer-based system reaches the requirements imposed by its SIL. "Dependability," "safety," "failure," etc., are terms used in standards documents, so, in the next section, we recall the definitions admitted in the context of dependability.
TABLE 42.1 Relationship between Integrity Levels and Quantitative Requirements for a System in Continuous Operation (IEC 61508)

Integrity level    Probability of dangerous failure occurrence/h
SIL 4              P ≤ 10⁻⁸
SIL 3              10⁻⁸ < P ≤ 10⁻⁷
SIL 2              10⁻⁷ < P ≤ 10⁻⁶
SIL 1              10⁻⁶ < P ≤ 10⁻⁵
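The thresholds of Table 42.1 can be encoded as a small lookup (illustrative only; IEC 61508 also defines a low-demand mode of operation with different target measures, not shown here, and the function name is ours):

```python
def sil_for_failure_rate(p_per_hour):
    """Map a dangerous-failure probability per hour (continuous operation)
    to the highest IEC 61508 Safety Integrity Level it satisfies."""
    if p_per_hour <= 1e-8:
        return 4
    if p_per_hour <= 1e-7:
        return 3
    if p_per_hour <= 1e-6:
        return 2
    if p_per_hour <= 1e-5:
        return 1
    return None  # does not even meet SIL 1

assert sil_for_failure_rate(5e-9) == 4   # within the SIL 4 band
assert sil_for_failure_rate(5e-7) == 2   # between 10^-7 and 10^-6
```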
42.1.2 Generic Concepts of Dependability
Dependability is defined in Reference 7 as the ability of a system to deliver a service that can justifiably be trusted. The service delivered by a system is its behavior as it is perceived by another system (human or physical) interacting with it.

A service can deviate from its desired functionality. The occurrence of such an event is termed a failure. An error is defined as the part of the system state that may cause a failure. A fault is the determined or hypothesized cause of an error. It is active when it produces an error and dormant otherwise. A system fails according to several failure modes. A failure mode characterizes a service that does not fit its desired functionality according to three parameters: the failure domain (value domain or time domain, see Section 42.2.2.2.1), the perception of the failure by the several users of the system (consistent or inconsistent), and the consequences of the failure (from insignificant to catastrophic). As we will see in Section 42.2, at the communication level, services are available to contend with the occurrence of failures in the value or time domain and to preserve the consistency, as well as the possibility, of the perception of a failure by several stations. The consequence of a failure at the communication level is the responsibility of the designer of the embedded system, and its assessment is a difficult issue.

Dependability is a concept that covers, in fact, several attributes. From a quality point of view, reliability, or the continuity of a correct service, and availability, expressing the readiness for a correct service, are important for automotive embedded systems. Note that the online detection of a low level of reliability or availability of a service supported by an embedded system can lead to the "nonavailability" of the vehicle and consequently affect the quality of the vehicle as perceived by the customer.

Safety is the reliability of the system regarding critical failure modes, that is, failure modes leading to catastrophic, severe, or major consequences [8]. This attribute characterizes the ability of a system to avoid the occurrence of catastrophic events that may be very costly in terms of monetary loss and human suffering.

One way to reach the safety objective is, first, to apply a safe development process in order to prevent and remove design faults. As presented in Reference 9, this method has to be complemented, at the design step, with an evaluation of the embedded system's behavior (fault forecasting). This can be achieved through a qualitative analysis (identification of the failure modes, component failures, and environmental conditions leading to a system failure) and a quantitative analysis (probability evaluation applied to some parameters for the verification of dependability properties). The last means for reaching dependability is to apply a fault-tolerant approach. This technique is mandatory for in-car embedded systems because the environment of the system is only partially known and the reliability of the hardware components cannot be fully guaranteed.

Note that the problem in the automotive industry is not only to comply with standards whose purpose mainly concerns the safety of the driver, the passengers, the vehicle, and its environment, but also to ensure a level of performance, comfort and, more generally, quality of the vehicle. The specification, in a quantitative way, of the properties required of an electronic embedded system, and the proof that the system meets these requirements, are the principal challenges in the automotive industry.
42.2 Safety-Relevant Communication Services
In this section, we discuss the main services and functionalities that the communication system should offer to ease the design of fault-tolerant automotive applications. In order to reduce the development time and increase quality through the reuse of validated components, these services should, as much as possible, be implemented in layers below the application-level software. More precisely, some services, such as the global time, are usually provided by the communication controller, while others, such as redundancy management, are implemented in the middleware software layer (e.g., the OSEK fault-tolerant layer [10] or the middleware described in Reference 11). As suggested in Reference 12, solutions where the middleware runs on a dedicated CPU will enhance the predictability of the system by reducing the interactions between the middleware layer and the application-level software. In particular, this will prevent conflicts in accessing the CPU, which may induce temporal faults such as missed deadlines.
42.2.1 Reliable Communication
The purpose of this section is to discuss the main services and features related to data exchange that one can expect for safety-critical automotive applications. On the one hand, some of these services serve to hide the occurrence of faults from higher levels. For example, a shielded transmission support will mask some EMIs (electromagnetic interferences), considered here as faults. On the other hand, other services are intended to detect the occurrence of errors and to avoid their propagation in the system (e.g., a Cyclic Redundancy Check [CRC] will prevent corrupted data from being used by an application process).
42.2.1.1 Robustness against EMIs
Embedded automotive systems suffer from environmental perturbations such as particles, temperature peaks, and EMIs. EMI perturbations have long been identified [13,14] as a serious threat to the correct behavior of an automotive system. EMIs can either be radiated by in-vehicle electrical devices (switches, relays, etc.) or come from a source outside the vehicle (radio, radar, flashes of lightning, etc.). EMIs can affect the correct functioning of all the electronic devices, but the transmission support is a particularly weak link. The whole problem is to ensure that the system will behave according to its specification, whatever the environment.

In general, the same Medium Access Control (MAC) protocol can be implemented on different types of physical layers (e.g., unshielded pair, shielded twisted pair, or plastic optical fiber), which exhibit significantly different behaviors with regard to EMIs (see Reference 15 for more details on the electromagnetic sensitivity of different types of transmission support). Unfortunately, the use of an all-optical network, which offers very high immunity to EMIs, is generally not feasible because of the low-cost requirement imposed by the automotive industry.
Besides using a resilient physical layer, another means to alleviate the EMI problem is to replicate the
transmission channels where each channel transports its own copy of the same frame. Although an EMI
is likely to affect both channels in quite a similar manner, the redundancy provides some resilience to
transmission errors.
The two previous approaches are classical means for hiding, as well as possible, the faults due to EMIs that can occur at the physical layer. Nevertheless, when a frame is corrupted during transmission (i.e., at least one bit has been inverted), it is crucial that the receiver be able to detect it in order to discard the frame. This is the role of the CRC, whose so-called Hamming distance indicates the number of inverted bits below which the CRC will detect the corruption. It is worth noting that, if the Hamming distance of the MAC protocol's CRC is too small with regard to the dependability objectives, a middleware layer can transparently insert an additional CRC in the data field of the MAC-level frame. This reinforces the ability of the system to detect errors occurring during transmission.
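The middleware-level CRC just described can be sketched as follows (a minimal illustration: the CRC-8 polynomial, frame layout, and function names are ours, not those of any particular automotive protocol):

```python
def crc8(data: bytes, poly: int = 0x1D, init: int = 0x00) -> int:
    """Bitwise CRC-8; the polynomial 0x1D is just an illustrative choice."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc

def wrap_payload(signal_bytes: bytes) -> bytes:
    """Middleware (sender side): append an extra CRC inside the MAC data field."""
    return signal_bytes + bytes([crc8(signal_bytes)])

def check_payload(data_field: bytes) -> bool:
    """Middleware (receiver side): verify the extra CRC before delivering data."""
    payload, received = data_field[:-1], data_field[-1]
    return crc8(payload) == received

frame = wrap_payload(b"\x12\x34\x56")
assert check_payload(frame)                       # intact frame accepted
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]  # a single inverted bit
assert not check_payload(corrupted)               # corruption detected
```

Any nonzero CRC polynomial catches all single-bit inversions; the Hamming distance determines how many simultaneous inversions are still guaranteed to be caught.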
42.2.1.2 Time-Triggered Transmissions
One major design issue is to ensure that, at run-time, no errors will jeopardize the requirements imposed on the temporal behavior of the system; for data exchanges, these temporal requirements can be imposed on the response times of frames or on the jitter upon reception. Among communication networks, one distinguishes time-triggered (TT) protocols, where transmissions are driven by the progress of time (i.e., frames are transmitted at predefined points in time), and event-triggered (ET) protocols, where transmissions are driven by the occurrence of events. Major representatives of ET and TT protocols considered for use in safety-critical in-vehicle communications will be discussed in Section 42.3. Both types of communication have advantages and drawbacks, but it is now widely accepted that dependability is much easier to ensure using a TT bus (see, for instance, [9,16–18]), the main reasons being that:

• Access to the medium is deterministic (i.e., the order of the transmissions is defined statically at design time and organized in rounds that repeat in cycles), and thus the frame response times are bounded and there is no jitter at reception.
• It simplifies composability, which is the ability to add new nodes without affecting existing ones (this requires that some bandwidth has been reserved for their transmissions at design time; for instance, in TTP/C, some slots can be left free for future use), as well as partitioning, which is the property that ensures that a failure occurring in one subsystem cannot propagate to others.
• The behavior of a TT communication system is predictable, which makes it easier to understand its behavior and to verify that the temporal constraints are respected.
• Message transmissions can be used as heartbeats, which allows a very prompt detection of station failures.
• Finally, the medium access scheme does not limit the network bandwidth, as is the case with the arbitration on message priority used by Controller Area Network (CAN), and thus large amounts of data can be transferred between nodes.
These reasons explain why, currently, only TT communication systems are being considered for use in safety-critical applications such as steer-by-wire [19,20] or brake-by-wire.
42.2.1.3 Global Time
Some control functions need to know the order of occurrence of a set of events that happened in the system; some functions, such as diagnosis, even need to be able to date them precisely. This can be achieved by forming a global synchronized time base.
The second reason why a global time is needed comes from the TT communication scheme. In TT communications, as time drives the transmissions, all nodes of the network must have a coherent notion of time, and a clock synchronization algorithm is required. This clock synchronization algorithm is, in fact, a service that tolerates the faults that can affect local clocks. Since oscillators are not perfect, the local clocks tend to drift apart, which imposes periodic resynchronization. For instance, in TTP/C, each node periodically adjusts its clock according to the difference between its own clock and the average value of those of the other nodes (the clocks with the highest and lowest values are discarded).
A crucial performance metric for a clock synchronization algorithm is the maximum difference that can be observed among all local clocks. This value directly impacts the network's throughput in TT buses since the length of a transmission window has to include, in addition to the actual transmission time of the frame, some extra time to compensate for the skew between local clocks (i.e., a frame transmitted at the right point in time must not be rejected because the clock of a receiver diverges from the clock of the sender). Other criteria of major interest are the number and types of faults (e.g., wrong clock value or no value received) that can be tolerated by the algorithm. For example, the TTP/C algorithm can tolerate a single fault on a network composed of at least four nodes (see Reference 21 for a detailed analysis).
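The discard-the-extremes averaging scheme used by TTP/C can be sketched as follows (a deliberately simplified, single-correction view; the function name, units, and the list-of-deviations interface are ours, not the protocol's):

```python
def fta_correction(deviations):
    """Fault-tolerant average: discard the largest and smallest measured
    clock deviations (possibly produced by faulty clocks), then average
    the remaining values to obtain the local correction term."""
    if len(deviations) < 4:
        raise ValueError("at least 4 measurements needed to tolerate one fault")
    trimmed = sorted(deviations)[1:-1]  # drop the two extreme values
    return sum(trimmed) / len(trimmed)

# Deviations (in microseconds) of the local clock from frames observed on
# the bus; the +500 value comes from a node with a faulty clock and is
# discarded as an extreme, so it cannot corrupt the correction.
assert fta_correction([2.0, -1.0, 3.0, 500.0]) == 2.5
```

This is why at least four nodes are required to tolerate one clock fault: with fewer measurements, a single erroneous value can survive the trimming step.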
42.2.1.4 Atomic Broadcast and Acknowledgment
At some point in time, it is mandatory that some functions distributed over the network have the same understanding of the state of the system in order to interoperate in a satisfactory manner. This implies that the information on the state of the system must be consistent throughout the whole network (this property is termed "spatial consistency" or "exact agreement"). The requirement of spatial consistency is particularly important for active redundancy (a set of components realizing the same function in parallel, so that the system can continue to operate despite the loss of one or more units; in passive redundancy, additional components are only activated when the primary component fails), which is the basic strategy for ensuring fault tolerance, that is, the capacity of a system to deliver its service even in the presence of faults. To be able to compare the output results, it is crucial that all the replicated components process the same input data, which, in particular, implies that the values obtained from local sensors are exchanged over the network. All nonfaulty nodes must thus receive the messages in the same order and with the same content. This property, which is called "atomic broadcast" or "interactive consistent broadcast" (see References 22 and 16), enables distributed processes to reach common decisions, or "consensus," despite faults, for instance, using majority voting.
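As a toy illustration of the last point: once atomic broadcast guarantees that every nonfaulty node holds the same ordered list of replicated values, a deterministic vote yields the same decision everywhere (the function below and its inputs are illustrative):

```python
from collections import Counter

def majority_vote(replicas):
    """Return the value reported by a strict majority of replicas.
    Because atomic broadcast delivers the same list to every nonfaulty
    node, each of them computes the same decision."""
    value, count = Counter(replicas).most_common(1)[0]
    if count <= len(replicas) // 2:
        raise ValueError("no strict majority: cannot reach a decision")
    return value

# Three replicated wheel-speed samples from an FTU, one of them erroneous.
assert majority_vote([1200, 1200, 1187]) == 1200
```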
In practice, it may happen that all or a subset of the nodes do not receive a message, because of an incorrect signal shape due to EMIs or because some nodes are temporarily faulty. The communication system usually provides, through the use of a CRC for detecting corrupted frames, a weak form of atomic broadcast that ensures that all stations that successfully receive a frame get the same value. This alone is, however, not sufficient for constructing fault-tolerant applications; in addition, at least the acknowledgment of the reception of a message is needed, because the sender, and possibly other nodes, may have to adapt their behavior according to this information (e.g., reschedule the transmission of the information in a subsequent frame). This latter requirement is important, in the automotive context, for distributed functions such as steering, braking, or active suspension.
42.2.1.5 Avoiding Babbling-Idiots
As said before, it is crucial that the system does not deviate from the temporal behavior defined at design time. If a node does not behave in the specified manner, this has to be detected and masked at the communication system level in order to prevent the failure from propagating.

It may happen that a faulty ECU transmits outside its specification; for example, it may send at a wrong point in time or send a frame larger than planned at design time. When communications are multiplexed, this will perturb the correct functioning of the whole network, especially the temporal behavior of the data exchanges. One well-known manifestation is the so-called "babbling idiot" [23,24]: a node that transmits continuously (e.g., due to a defective oscillator). To avoid this situation, a component called the bus guardian restricts the controller's ability to transmit by allowing transmission only when the node exhibits the specified behavior. Ideally, the bus guardian should have its own copy of the communication schedule, should be physically separated from the controller, should possess its own power supply, and should be able to construct the global time itself. Due to the strong pressure from the automotive industry concerning costs, these assumptions are not fulfilled in general, which reduces the efficiency of the bus guardian strategy.
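The core gating rule of a bus guardian reduces to a one-line check (an intentionally minimal model; as noted above, a real guardian also needs an independent schedule copy, power supply, and time base):

```python
def guardian_allows(node_slot, current_slot, tx_request):
    """Enable the transceiver only during the node's own slot of the
    static schedule; a babbling-idiot controller requesting to transmit
    in any other slot is silenced on the spot."""
    return tx_request and current_slot == node_slot

# A controller babbling in every slot only gets through in its own slot 2.
assert guardian_allows(node_slot=2, current_slot=2, tx_request=True)
assert not guardian_allows(node_slot=2, current_slot=5, tx_request=True)
```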
If the network has a star topology, with a central interface called the "star" for interconnection, instead of the classical bus topology, then the star can act as a central bus guardian and protect against errors that cannot be avoided by a local bus guardian. For instance, a star topology is more resilient to spatial-proximity faults (e.g., temperature peaks) and to faults due to the desynchronization of an ECU (i.e., the star can disconnect a desynchronized station). To avoid a single point of failure, a dual-star topology should be used, with the drawback that the length of the wires is significantly increased.
42.2.2 Higher-Level Services
In this section, we identify services that provide fault-tolerant mechanisms belonging conceptually to
layers above the MAC in the OSI reference model.
42.2.2.1 Group Membership Service
As discussed in Section 42.2.1.4, atomic broadcast ensures that all nonfaulty stations possess the same variables describing the state of the system at a particular point in time. Another property that is required for implementing fault tolerance at a high level is that all nonfaulty stations know the set of stations that are operational (or nonfaulty). This service, which is basically a consensus on the set of operational nodes, is provided by the "group membership," and it is generally highly recommended for X-by-Wire applications. A classical example, detailed in Reference 12, is a brake-by-wire system where four ECUs, interconnected by a network, control the brakes located at the four wheels of the car. As soon as a wheel ECU is no longer functioning, the brake force applied to its wheel has to be redistributed among the remaining three wheels in such a way that the car can be safely parked. As pointed out in Reference 12, for a brake-by-wire application, the time interval between the failure of the wheel ECU and the knowledge of this event by all the other stations has an impact on the safety of the application, and thus it has to be bounded and taken into account at design time.
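A toy version of the brake-force redistribution just described (the even split is an invented policy for illustration; a real brake-by-wire system would derive the per-wheel shares from a vehicle-dynamics model):

```python
def redistribute(total_demand, operational_wheels):
    """Split a total brake-force demand over the wheel ECUs currently in
    the membership list; when an ECU drops out, its share is spread over
    the remaining wheels so the car can still be safely stopped."""
    if not operational_wheels:
        raise RuntimeError("no operational brake ECU left")
    share = total_demand / len(operational_wheels)
    return {wheel: share for wheel in operational_wheels}

# All four wheels operational, then the rear-right ECU leaves the membership.
assert redistribute(2000.0, ["FL", "FR", "RL", "RR"])["FL"] == 500.0
assert redistribute(2000.0, ["FL", "FR", "RL"])["FL"] == 2000.0 / 3
```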
A membership service implemented at the communication system level assumes that all nodes that are correctly participating in the communication protocol are nonfaulty. In TT systems, as transmissions are perfectly foreseeable, the decisions regarding membership can be taken at the points in time where frames should have been received. In a very simplified way, a missing or faulty frame indicates to the receivers that the sending node is not functioning properly. In addition, a node that is unable to transmit must consider itself as faulty and stop operating. Since it takes some time to detect faulty nodes, there can be faulty stations in the membership list of a node during some time intervals. The maximum number of such undetected faulty nodes, the maximum duration it takes to discover that a node is faulty, the maximum number of faulty stations, and the types of faults that can be detected are major performance criteria of a membership algorithm. Other criteria include the time needed for a repaired node to rejoin the membership list, how well the different nodes agree on the membership list at any point in time (are "cliques," i.e., sets of stations that disagree on the state of the system, possible, and how long can such cliques coexist?), and the implementation overheads, mainly in terms of CPU load and network bandwidth.
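The detection rule sketched above ("no valid frame in a node's slot implies the sender is suspected faulty") can be illustrated for one TT round as follows (a simplified local view with invented ECU names; real membership algorithms must additionally ensure that all receivers agree on the resulting list):

```python
def update_membership(members, received_ok):
    """One TT round: every member owns a slot; a missing or detectably
    faulty frame in that slot removes the sender from the local
    membership list of this receiver."""
    return {node for node in members if received_ok.get(node, False)}

members = {"ECU_FL", "ECU_FR", "ECU_RL", "ECU_RR"}
# Frames observed this round: the slot of ECU_RR stayed silent.
seen = {"ECU_FL": True, "ECU_FR": True, "ECU_RL": True, "ECU_RR": False}
assert update_membership(members, seen) == {"ECU_FL", "ECU_FR", "ECU_RL"}
```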
Group membership algorithms are complex distributed algorithms and formal methods are of great
help in analyzing and validating them; the reader can refer to [21,22,25,26] as good starting points on this
topic.
42.2.2.2 Management of Nodes Redundancy
A classical way of ensuring fault tolerance is to replicate critical components. We saw, in Section 42.2.1.1, that the redundancy of the bus can hide faults due to EMIs. To achieve fault tolerance, certain nodes are also replicated and clustered into so-called Fault-Tolerant Units (FTUs). An FTU is a set of several stations that perform the same function; each node of an FTU possesses its own slot in the round, so that the failure of one or more stations of the same FTU can be tolerated. Actually, the role of FTUs is twofold. First, they make the system resilient to transmission errors (some frames sent by nodes of the FTU may be correct while others are corrupted). Second, they provide a means to fight against measurement and computation errors occurring before transmission (some nodes may send correct values while others make errors).
42.2.2.2.1 Fail-Silence Property
In the fault-tolerance terminology, a node is said to be fail-silent if (1) it sends frames at the correct points in time (correctness in the time domain), and (2) the correct values are transmitted (correctness in the value domain), or (3) it sends detectably incorrect frames (e.g., with a wrong CRC) in its own slot, or no frame at all. A communication system such as TTP/C provides very good support for requirements (1) and (3) (whose fulfillment provides the so-called fail-silence in the temporal domain), especially through the bus guardian concept (see Section 42.2.1.5), while the value domain is the responsibility of higher-level layers. The use of fail-silent nodes greatly decreases the complexity of designing a critical application, since data produced by fail-silent nodes are always correct and thus can be safely consumed by the receivers. Tolerating one arbitrary failure can be achieved with FTUs made of two nodes, whereas three are necessary if the nodes are not fail-silent. However, in practice, it is difficult to ensure the fail-silence assumption, especially in the value domain. Basically, a fail-silent node has to implement redundancy plus error-detection mechanisms and stop functioning after a failure is detected. Self-checking mechanisms can be implemented in hardware or, more usually, in software on commercial off-the-shelf hardware [27]. An example of such a mechanism is the "double execution" strategy, which consists of running each task twice and comparing the outputs. However, both executions can be affected in the same way by a single error; a solution that provides some protection against such so-called common-mode faults is to perform a third execution with a set of reference input data and to compare its output with a precomputed result that is known to be correct. This strategy is known as "double execution with reference check."
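A minimal sketch of double execution with reference check (the brake-force task, the reference input, and the precomputed reference output below are invented for illustration):

```python
def fail_silent_run(task, inputs, ref_input, ref_output):
    """Run the task twice on the real inputs and compare the outputs;
    then run it once on a reference input whose correct output is known,
    to catch common-mode faults. Return the result, or None to stay silent."""
    first, second = task(inputs), task(inputs)
    if first != second:
        return None                    # divergent executions: stay silent
    if task(ref_input) != ref_output:  # reference check failed: stay silent
        return None
    return first

# Hypothetical task: compute a brake-force command from a pedal position.
def brake_force(pedal_percent):
    return int(pedal_percent * 30)

assert fail_silent_run(brake_force, 50, 100, 3000) == 1500
assert fail_silent_run(brake_force, 50, 100, 9999) is None  # fault ⇒ silence
```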
The reader is referred to References 11, 27, and 28 for good starting points on the problem of
implementing fail-silent nodes.
42.2.2.2.2 Message Agreement
From an implementation point of view, it is usually preferable to present only one copy of data to the
application in order to simplify the application code (considering possible divergences between replicated
message instances is not needed) and to keep it independent from the degree of redundancy (i.e., the
number of nodes composing an FTU).
The algorithm responsible for choosing the value that will be transmitted to the application is
termed the agreement algorithm. Many agreement strategies are possible: pick-any (replicated messages
are coming from an FTU made of fail-silent nodes), average-value, pick-a-particular-one (the selected
value has been produced by the best sensor), majority vote, etc. The OSEK/VDX consortium [10] has proposed
a software layer responsible for implementing the agreement strategy. Two other important services
of the OSEK FTCom (Fault-Tolerant Communication layer) are (1) to manage the packing of signals
(elementary pieces of information such as the speed of the vehicle) into frames according to a precomputed
configuration, which is needed if the use of network bandwidth is to be optimized (see, for instance,
References 29 and 30 for frame-packing algorithms), and (2) to provide message filtering mechanisms for
passing only significant data to the application. Another fault-tolerant layer offering the agreement
service, along with the set of associated tools, is described in Reference 11.
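To make the strategies concrete, the following Python sketch implements a minimal agreement layer. The names are illustrative, not FTCom's actual interface (which is defined by the OSEK/VDX specification [10]); `None` models a replica missing because its node stayed silent.

```python
from collections import Counter

def agree(replicas, strategy="pick_any"):
    """Choose the single value passed to the application from the replicated
    message instances produced by the nodes of one FTU."""
    values = [v for v in replicas if v is not None]  # drop replicas from silent nodes
    if not values:
        return None
    if strategy == "pick_any":
        # valid when the FTU is made of fail-silent nodes: any received copy is correct
        return values[0]
    if strategy == "average":
        # suited to replicated sensor readings subject to small deviations
        return sum(values) / len(values)
    if strategy == "majority":
        # tolerates value faults as long as a strict majority of replicas agrees
        value, count = Counter(values).most_common(1)[0]
        return value if count > len(values) / 2 else None
    raise ValueError("unknown strategy: %s" % strategy)
```

Note that the application code calling `agree` never sees the degree of redundancy: whether the FTU has two or four nodes only changes the length of `replicas`, which is exactly the independence property mentioned above.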
42.2.2.3 Support for Functioning Modes
A functioning mode is a specific operational phase of an application. Typically, several mutually exclusive
functioning modes are defined in a safety-critical application. For a vehicle, possible modes
include factory mode (e.g., download of calibration parameters), prerun mode (after the doors are unlocked
and before the engine is started, preheating is possible for some components), postrun mode (the engine
has been shut off but, e.g., cooling can still be necessary), park mode (most ECUs are powered off), and even
a show-room mode. Besides these normal functioning modes, the occurrence of a failure can trigger the
switch to a particular mode that aims to bring the system back to a safe state.
Particular functions correspond to each functioning mode, which means a different set of tasks and
messages as well as different schedules. While mode changes provide flexibility, great care must be taken
that changes happen at the right points in time and that all nodes agree on the current mode. The
communication system can provide some support in this area by ensuring that mode changes take place
only at predefined points in time, are triggered only by authorized nodes, and that the message schedule is
changed simultaneously for all nodes. For example, TTP/C [31,32] offers services for immediate mode
changes (i.e., the change is performed at the end of the transmission window where it was requested)
as well as deferred mode changes (i.e., the change is performed at the end of the current message schedule,
or cluster cycle in the TTP/C terminology).
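The difference between immediate and deferred mode changes can be pictured with a small Python model. This is a toy sketch, not the TTP/C services themselves: slots and cycles are simplified to counters, and the authorization check on the requesting node is reduced to a comment.

```python
class ModeManager:
    """Toy model of mode changes aligned on communication rounds: an immediate
    change applies at the end of the window where it was requested, a deferred
    one waits for the end of the current cluster cycle."""

    def __init__(self, slots_per_cycle):
        self.mode = "normal"
        self.slots_per_cycle = slots_per_cycle
        self.slot = 0
        self.pending = None  # (new_mode, deferred flag)

    def request(self, new_mode, deferred=False):
        # in a real system, only authorized nodes may issue this request
        self.pending = (new_mode, deferred)

    def end_of_window(self):
        self.slot = (self.slot + 1) % self.slots_per_cycle
        if self.pending is None:
            return
        new_mode, deferred = self.pending
        if not deferred or self.slot == 0:  # slot 0 marks a new cluster cycle
            self.mode, self.pending = new_mode, None
```

With four slots per cycle, an immediate request takes effect at the very next `end_of_window`, while a deferred request issued mid-cycle is only applied once the slot counter wraps around, so all nodes switch schedules on the same cycle boundary.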
42.3 Fault-Tolerant Communication Systems
Among the communication protocols considered for use in safety-critical automotive systems,
one can distinguish three main types:
Protocols that have been designed from scratch to provide all the main fault-tolerant services. The
prominent representative of this class is the TTP/C protocol [47].
Protocols that offer the basic functionalities for fault-tolerant systems, among which are a global
time and bus guardians. The idea is to allow scalable dependability on a per-network or
even per-node basis. Missing features are to be implemented in software layers above
the communication controllers. The representative of this class in the automotive context is
FlexRay [33].
Protocols not initially conceived with the objective of fault tolerance, to which missing features are
added. This is the case with CAN [34], the current de facto standard in production cars, which is being
considered for use in safety-critical applications (see, for instance, Reference 17) on the condition
that additional features are provided.
42.3.1 Dependability from Scratch: TTP/C
The TTP/C protocol, which is specified in Reference 32, was designed and extensively studied at the
Vienna University of Technology. TTP/C is a central part of the Time-Triggered Architecture (TTA; see
Reference 35), which is a complete framework for building fault-tolerant distributed applications according
to the TT paradigm. Hardware implementations of the TTP/C protocol, as well as software tools for the
design of the application, are commercialized by the TTTech company and are available today.
On a TTP/C network, the transmission support is replicated and each channel transports its own copy of the
same message. TTP/C can be implemented with a bus topology or a more resilient single-star or dual-star
topology. At the MAC level, the TTP/C protocol implements a synchronous TDMA scheme: the stations
(or nodes) have access to the bus in a strict deterministic sequential order and each station possesses the
bus for a constant period of time, called a slot, during which it has to transmit one frame. The sequence of
slots in which all stations have accessed the bus once is called a TDMA round. The size of the slot
is not necessarily identical for all stations in the TDMA round, but a slot belonging to one station is the
same size in each round. Consecutive TDMA rounds may differ in the data transmitted during
the slots, and the sequence of all TDMA rounds forms the cluster cycle, which repeats itself indefinitely.
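The TDMA access scheme just described can be illustrated with a short Python helper. The function is hypothetical and assumes a perfectly synchronized global time expressed in abstract time units; it simply shows how slot ownership follows deterministically from the round structure.

```python
def slot_owner(global_time, slot_lengths, owners):
    """Return the station allowed to transmit at `global_time` under a
    TTP/C-like TDMA scheme: slots of fixed (possibly unequal) lengths
    repeat in the same order every round."""
    t = global_time % sum(slot_lengths)  # position within the current TDMA round
    for length, owner in zip(slot_lengths, owners):
        if t < length:
            return owner
        t -= length
```

With `slot_lengths = [2, 3, 1]` and `owners = ["A", "B", "C"]`, the round is 6 time units long: times 0-1 belong to A, 2-4 to B, 5 to C, and time 6 wraps back to A's slot of the next round. Because every node can evaluate this function from the global time alone, no arbitration traffic is needed on the bus.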
TTP/C possesses numerous features and services related to dependability along with TT com-
munication. In particular, TTP/C implements a clique avoidance algorithm (the stations that belong to
a minority in their understanding of the state of the system will eventually be excluded) and a mem-
bership algorithm that also provides data acknowledgment (one knows after a bounded time whether a
station has received a message or not). A bus guardian, a global clock, and support for mode changes are also
part of the specification.
The algorithms used in TTP/C are by themselves intricate and interact in a very complex manner, but
most of them have been formally verified (see [21,25,36]). The fault hypothesis used for the design of
TTP/C is well specified, but also quite restrictive (two successive faults, such as transmission errors, must
occur at least two rounds apart). Situations outside the fault hypothesis are treated using never-give-up
(NGU) strategies that aim to continue operating in a degraded mode. From the point of view of the set of
available services, TTP/C is a mature solution. In our opinion, future research should investigate whether
the fault hypothesis considered in the TTP/C design is pertinent in the context of automotive embedded
systems, where the environment can be very harsh (e.g., bursts of transmission errors may happen). This
could start from measurements taken on board prototypes, which would help to estimate the
relevance of the fault hypothesis. Other research could study the behavior of the communication system
outside the fault hypothesis and its impact on the application; this could be undertaken using fault
injection.
42.3.2 Scalable Dependability: FlexRay
A consortium of major companies from the automotive field is currently developing the FlexRay
protocol. The core members are BMW, Bosch, Daimler-Chrysler, General Motors, Motorola, Philips,
and Volkswagen. The first publicly available specifications of the FlexRay protocol have already been
released [33].
The FlexRay network is very flexible with regard to topology and transmission support redundancy.
It can be configured as a bus, a star, or a multistar, and it is not mandatory that each station possess
replicated channels or a bus guardian, even though this should be the case for critical functions. At the
MAC level, FlexRay defines a communication cycle as the concatenation of a TT (or static) window and
an ET (or dynamic) window. In each communication window, whose size is set statically at design time,
a different protocol is applied. The communication cycles are executed periodically. The TT window uses
a TDMA MAC protocol; the main difference with TTP/C is that a station might possess several slots in
the TT window, but the size of all the slots is identical.
In the ET part of the communication cycle, the protocol is FTDMA (Flexible Time Division Multiple
Access): time is divided into so-called minislots, each station possesses a given number of minislots
FIGURE 42.1 Example of message scheduling in the dynamic segment of the FlexRay communication cycle (slot counter values and frame IDs n through n+7 on channels A and B, with idle minislots interleaved).
(not necessarily consecutive) and it can start the transmission of a frame inside each of its own minislots.
The bus guardian is not used in the dynamic window to control whether transmissions take place as
specified. A minislot remains idle if the station has nothing to transmit. An example of a dynamic window
is shown in Figure 42.1: on channel B, frame n is transmitted starting in minislot n while minislots n + 1
and n + 2 have not been used. It is noteworthy that frame n + 4 is not received simultaneously on channels
A and B since, in the dynamic window, transmissions are independent on the two channels.
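The minislot mechanism can be sketched in a few lines of Python. This is a simplified single-channel model with illustrative names; among other things, the real protocol also bounds frame length so that a transmission cannot overrun the end of the dynamic window.

```python
def dynamic_window(pending, budget):
    """Simulate one FlexRay-like dynamic window on one channel.
    `pending` maps each slot-counter value to the number of minislots its
    frame needs, or None if that station has nothing to send. Returns the
    slot numbers whose frames were actually transmitted within `budget`
    minislots."""
    sent, used = [], 0
    for slot, length in sorted(pending.items()):
        if used >= budget:
            break                      # window exhausted: remaining frames wait
        if length is None:
            used += 1                  # idle minislot: counter advances by one
        elif used + length <= budget:
            used += length             # a transmission spans several minislots
            sent.append(slot)
        else:
            used += 1                  # frame no longer fits in this cycle
    return sent
```

The model makes the priority property of FTDMA visible: an idle minislot costs only one tick, so stations with low slot numbers consume little of the window when silent, while a frame near the end of the window may be postponed to the next communication cycle.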
The FlexRay MAC protocol is more flexible than the TTP/C MAC since, in the static window, nodes are
assigned as many slots as necessary (up to 4095 for each node) and since frames are only transmitted if
necessary in the dynamic part of the communication cycle. As with TTP/C, the structure
of the communication cycle is statically stored in the nodes; however, unlike TTP/C, mode changes with
a different communication schedule for each mode are not possible.
From the dependability point of view, FlexRay specifies solely TT communication with a bus guardian
and a clock synchronization algorithm on dual wires (shielded or unshielded; see Reference 37 for the
specification of the physical layer). Considering the brake-by-wire example of Section 42.2.2.1,
the protocol offers no way for a node to know that one of the wheel ECUs is no longer operational,
which would be needed to take the appropriate decision (e.g., redistribution of the brake force). Features
that can be necessary for implementing fault-tolerant applications, such as membership and acknowledg-
ment services or mode management facilities, will have to be implemented in software or hardware layers
on top of FlexRay, with the drawback that efficient implementations might be more difficult to achieve
above the data-link layer. There are indeed individual solutions in the literature for each of the missing
services, but these protocols might have very complex interactions when used jointly, which requires that
the whole communication profile be carefully validated by tests, simulation, fault injection, and formal
proof under a well-defined fault hypothesis.
In automotive systems, critical and noncritical functions will increasingly coexist and interoperate.
In the FlexRay specification ([33], p. 8), it is argued that the protocol provides scalable dependability, that is,
the ability to operate in configurations that provide various degrees of fault tolerance. Indeed, the
protocol allows for mixing single and dual transmission supports (interconnected through a star) on the
same network, subnetworks of nodes without bus guardians or with different fault-tolerance capabilities
with regard to clock synchronization, nodes that do not send or receive TT messages, etc. This flexibility
can prove to be efficient in the automotive context in terms of cost and reuse of existing components if
missing fault-tolerance features are provided in a middleware layer such as OSEK FTCom (see the introduction
of Section 42.2 and Reference 10) or the one currently under development within the automotive industry
project AUTOSAR (see http://www.autosar.org).
42.3.3 Adding Missing Features to an Existing Protocol: CAN
Controller Area Network has proved to be a very cost- and performance-effective solution for data exchange
in automotive systems during the last 15 years. However, as specified by the ISO standards [34,38],
CAN lacks almost all the features and services identified in Section 42.2 as important for the
implementation of fault-tolerant systems: no redundant medium, no TT communication, no global time,
no atomic broadcast (even in the weak form described in Section 42.2.1.4, due to the well-known
inconsistent message omission [39]), no reliable acknowledgment, no bus guardian, no group membership, no
functioning-mode management services, etc.
Some authors advocate that CAN can be used as a base and that missing facilities can be added as needed
[17] and, over the last few years, there has in fact been a number of studies and proposals aimed at adding
fault-tolerant features to CAN (see, for instance, [9,40-48]). In the rest of this section, we discuss some such
proposals of possible interest for automotive systems.
42.3.3.1 TTCAN: TT Communications on Top of CAN
Two main protocols have been proposed to enable TT transmissions over CAN: TTCAN (Time-Triggered
Controller Area Network; see References 40 and 49) and FTT-CAN (Flexible Time-Triggered CAN; see
Reference 9). In the following, we consider TTCAN, which has received much attention in the automotive
field since it was proposed by Robert Bosch GmbH, a major actor in the automotive industry.
Time-Triggered CAN was developed on the basis of the CAN physical and data-link layers. The bus
topology of the network, the characteristics of the transmission support, the frame format, as well as the
maximum data rate of 1 Mbit/sec are imposed by the CAN protocol [49]. In addition to the standard
CAN features, TTCAN controllers must have the possibility to disable automatic retransmission and to
provide the application with the time at which the first bit of a frame was sent or received [49]. Channel
redundancy is possible, but not standardized, and no bus guardian is implemented in the nodes. The key
idea is to propose, as with FlexRay, a flexible TT/ET protocol. TTCAN defines a basic cycle (the equivalent
of the FlexRay communication cycle) as the concatenation of one or several TT (or exclusive) windows
and one ET (or arbitrating) window. Exclusive windows are devoted to TT transmissions (i.e., periodic
messages) while the arbitrating window is ruled by the standard CAN protocol: transmissions are dynamic
and bus access is granted according to the priority of the frames. Several basic cycles, which differ in their
organization (exclusive and arbitrating windows) and in the messages sent inside the exclusive windows, can
be defined. The list of successive basic cycles is called the system matrix, and the matrix is executed in
loops. Interestingly, the protocol enables the master node, the node that initiates the basic cycle through the
transmission of the reference message, to stop functioning in TTCAN mode and to resume in standard
CAN mode. Later, the master node can switch back to TTCAN mode by sending a reference message.
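A system matrix can be pictured as a small table that the schedule executes in loops. The sketch below uses illustrative message names (not taken from any real configuration); each row is one basic cycle beginning with the reference message, and "ARB" marks an arbitrating window ruled by standard CAN arbitration.

```python
# Each row is one basic cycle; each entry is either a time-triggered message
# name (exclusive window) or "ARB", an arbitrating window in which standard
# priority-based CAN arbitration applies. Column 0 is the reference message.
SYSTEM_MATRIX = [
    ["ref", "engine_speed", "ARB",       "wheel_speed"],
    ["ref", "engine_speed", "brake_cmd", "ARB"        ],
]

def window(cycle, column):
    """Content of the given window of the given basic cycle; the matrix
    wraps around because the list of basic cycles is executed in loops."""
    return SYSTEM_MATRIX[cycle % len(SYSTEM_MATRIX)][column]
```

So the same column can be exclusive in one basic cycle and arbitrating in the next, which is how TTCAN mixes periodic TT traffic with sporadic ET traffic on a single bus.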
42.3.3.2 Improving Error Confinement
The Controller Area Network protocol possesses fault-confinement mechanisms aimed at differentiating
between short disturbances caused by EMI and permanent failures due to hardware malfunction.
The scheme is based on error counters that are increased and decreased according to particular events
(e.g., successful reception of a frame, reception of a corrupted frame, etc.). The relevance of the algorithms
involved is questionable (see Reference 50), but the main drawback is that a node has to diagnose itself,
which can lead to the nondetection of some critical errors, such as the node transmitting a dominant
bit continuously (one manifestation of the babbling-idiot fault known as stuck-at-dominant;
see Section 42.2.1.5 and Reference 46). Furthermore, other faults, such as the partitioning of the network
into several subnetworks, may prevent all nodes from communicating due to bad signal reflection at the
extremities.
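The counter scheme can be sketched as follows. This is a deliberately simplified model: the real ISO rules distinguish many more cases (the increments differ for transmitters and receivers and for specific error conditions), but the state progression it shows is the standard one.

```python
class CanFaultConfinement:
    """Simplified CAN error counters: a node moves from error-active to
    error-passive and finally to bus-off as its own counters grow."""

    def __init__(self):
        self.tec = 0  # transmit error counter
        self.rec = 0  # receive error counter

    def transmit_error(self):
        self.tec += 8                      # typical increment on a transmit error

    def receive_error(self):
        self.rec += 1

    def successful_transmission(self):
        self.tec = max(0, self.tec - 1)    # counters decay on success

    def successful_reception(self):
        self.rec = max(0, self.rec - 1)

    @property
    def state(self):
        if self.tec >= 256:
            return "bus-off"               # node disconnects itself from the bus
        if self.tec >= 128 or self.rec >= 128:
            return "error-passive"         # may only signal passive error flags
        return "error-active"
```

The drawback discussed above is visible in the structure of the code: the counters are maintained by the node about itself, so a node whose transmitter is stuck dominant may never charge its own counters for the damage it does to the bus.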
To address these problems, several solutions have been proposed, among which are the variant of RedCAN
discussed in Reference 47 and CANcentrate, discussed in Reference 46. The latter proposal is an active star
that integrates fault-diagnosis and fault-confinement mechanisms that can, in particular, prevent
stuck-at-dominant behavior. The former proposal relies on a ring architecture where each node is
connected to the bus through a switch that possesses the ability to exclude a faulty node or a faulty
segment from the communication. These two proposals are promising, but developments are still needed
(e.g., test implementations, fault injection, formal proofs) before they can actually be used in safety-critical
applications. Furthermore, some faults, such as a node transmitting correct frames more often than
specified at design time, are not covered by these proposals.
Many other mechanisms have been proposed for increasing the dependability of CAN-based networks
[41-45,48] but, as pointed out in Reference 43, while each proposal solves a particular problem, they were
not designed to be combined. Furthermore, the fault hypotheses used in the designs are not necessarily the
same, and the interactions between the protocols remain to be studied in a formal way.
42.4 Conclusion
In the current state of practice, automotive embedded systems make wide use of fault-prevention
(e.g., shielded ECUs or transmission supports), fault-detection (e.g., a watchdog ECU that monitors the
functioning state of the engine controller, or checks whether data are obsolete or out of range), and fault-
confinement techniques (e.g., missing critical data are reconstituted on the basis of other data and, more
generally, the specification and implementation of several degraded functioning modes). Redundancy is used
at the sensor level (e.g., for the wheel angle) but seldom at the ECU level, because of cost pressure and
because the criticality of the functions does not absolutely impose it. Some future functions, such as
brake- and steer-by-wire, are likely to require active redundancy in order to comply with the acceptable
risk levels and the design guidelines that could be issued by certification bodies.
For critical functions that are distributed and replicated throughout the network, the communication
system will play a central role by providing the services that simplify the implementation of fault-
tolerant applications. The candidate networks are TTP/C, FlexRay, and CAN-based TT solutions.
TTP/C is a mature technology that provides the most important services for supporting fault-tolerant
applications. Moreover, TTP/C was designed under a well-specified fault hypothesis and the correctness
of most of its algorithms has been formally proven. In our opinion, future research should investigate the
relevance of the TTP/C fault hypothesis in the context of automotive embedded systems and the behavior
of the protocol outside the fault hypothesis. At the time of writing, FlexRay, which is developed by the
major actors of the European automotive industry, seems in a strong position to become a standard in
the industry. The main advantage of FlexRay is its flexibility; in particular, it provides both TT and ET
communications, and nodes with different fault-tolerance capabilities can coexist on the same network.
The services provided by FlexRay do not fulfill all the needs for fault tolerance, and higher-level protocols
will have to be developed and validated before FlexRay can be used in very demanding applications. The
major issue is that higher-level implementations tend to be less efficient (e.g., bandwidth overhead for
acknowledgment, maximum time needed for detecting faulty nodes). Finally, the solutions based on the
TTCAN protocol will require additional low-level mechanisms for fault confinement as well as higher-
level services such as atomic broadcast and membership. Many proposals exist for more dependability
on CAN-based networks, but much work remains to be done to come up with a coherent and validated
communication stack that includes all the necessary services.
Acknowledgment
We would like to thank Mr. Christophe Marchand, project leader in the field of diagnosis at PSA Peugeot
Citroën, for helpful comments on an earlier version of this chapter.
References
[1] Y. Papadopoulos and J.A. McDermid. The potential for a generic approach to certification of
safety-critical systems in the transportation sector. Journal of Reliability Engineering and System
Safety, 63: 47-66, 1999.
[2] Radio Technical Commission for Aeronautics. RTCA DO-178B, software considerations in
airborne systems and equipment certification, 1994.
[3] CENELEC. Railway applications: software for railway control and protection systems,
EN 50128, 2001.
[4] P.H. Jesty, K.M. Hobley, R. Evans, and I. Kendall. Safety analysis of vehicle-based systems.
In Proceedings of the 8th Safety-Critical Systems Symposium, Southampton, UK, 2000.
[5] IEC. IEC 61508-1, Functional Safety of Electrical/Electronic/Programmable Electronic Safety-Related
Systems, Part 1: General Requirements, IEC/SC65A, 1998.
[6] J.A. McDermid. Trends in system safety: a European view? In Proceedings of the 7th Australian
Workshop on Safety Critical Systems and Software, North Adelaide, Australia, 2002.
[7] A. Avizienis, J. Laprie, and B. Randell. Fundamental concepts of dependability. In Proceedings of
the 3rd Information Survivability Workshop, Boston, USA, 2000, pp. 7-12.
[8] ARTIST, Project IST-2001-34820. Selected topics in embedded systems design: roadmaps for
research, May 2004. Available at http://www.artist-embedded.org/Roadmaps/
ARTIST_Roadmaps_Y2.pdf.
[9] J. Ferreira, P. Pedreiras, L. Almeida, and J.A. Fonseca. The FTT-CAN protocol for flexibility in
safety-critical systems. IEEE Micro, Special Issue on Critical Embedded Automotive Networks, 22:
46-55, 2002.
[10] OSEK Consortium. OSEK/VDX Fault-Tolerant Communication, Version 1.0, July 2001. Available
at http://www.osek-vdx.org/.
[11] C. Tanzer, S. Poledna, E. Dilger, and T. Fuhrer. A fault-tolerance layer for distributed fault-tolerant
hard real-time systems. In Proceedings of the Annual IEEE Workshop on Fault-Tolerant Parallel and
Distributed Systems, San Juan, Puerto Rico, USA, 1999.
[12] H. Kopetz and G. Bauer. The time-triggered architecture. Proceedings of the IEEE, 91:
112-126, 2003.
[13] I.E. Noble. EMC and the automotive industry. Electronics and Communication Engineering Journal,
4(5): 263-271, 1992.
[14] E. Zanoni and P. Pavan. Improving the reliability and safety of automotive electronics. IEEE Micro,
13: 30-48, 1993.
[15] J. Barrenscheen and G. Otte. Analysis of the physical CAN bus layer. In Proceedings of the 4th
International CAN Conference, ICC'97, Berlin, Germany, October 1997, pp. 06.02-06.08.
[16] J. Rushby. A comparison of bus architectures for safety-critical embedded systems. Technical report,
NASA/CR, March 2003.
[17] L.-B. Fredriksson. CAN for critical embedded automotive networks. IEEE Micro, Special Issue on
Critical Embedded Automotive Networks, 22: 28-35, 2002.
[18] A. Albert. Comparison of event-triggered and time-triggered concepts with regards to distributed
control systems. In Proceedings of Embedded World 2004, Nürnberg, February 2004.
[19] X-by-Wire Project, Brite-EuRam III Program. X-By-Wire: safety related fault tolerant systems
in vehicles, final report, 1998.
[20] C. Wilwert, Y.Q. Song, F. Simonot-Lion, and T. Clément. Evaluating quality of service and
behavioral reliability of steer-by-wire systems. In Proceedings of the 9th IEEE International
Conference on Emerging Technologies and Factory Automation (ETFA), Lisbon, Portugal, 2003.
[21] J. Rushby. An overview of formal verification for the time-triggered architecture. In Proceedings
of Formal Techniques in Real-Time and Fault-Tolerant Systems, Oldenburg, Germany, 2002,
pp. 83-105.
[22] T.D. Chandra and S. Toueg. Unreliable failure detectors for reliable distributed systems. Journal
of the ACM, 43: 225-267, 1996.
[23] K. Tindell and H. Hansson. Babbling idiots, the dual-priority protocol, and smart CAN controllers.
In Proceedings of the 2nd International CAN Conference, London, UK, 1995, pp. 7.22-7.28.
[24] C. Temple. Avoiding the babbling-idiot failure in a time-triggered communication system. In Pro-
ceedings of the 28th International Symposium on Fault-Tolerant Computing, Munich, Germany,
June 1998.
[25] H. Pfeifer. Formal verification of the TTP group membership algorithm. In Proceedings
of FORTE/PSTV 2000, Pisa, Italy, 2000.
[26] H. Pfeifer and F.W. von Henke. Formal analysis for dependability properties: the time-triggered
architecture example. In Proceedings of the 8th IEEE International Conference on Emerging
Technologies and Factory Automation (ETFA 2001), Antibes, France, October 2001, pp. 343-352.
[27] F. Brasileiro, P. Ezhilchelvan, S. Shrivastava, N. Speirs, and S. Tao. Implementing fail-silent nodes
for distributed systems. IEEE Transactions on Computers, 45: 1226-1238, 1996.
[28] M. Hiller. Software fault-tolerance techniques from a real-time systems point of view:
an overview. Technical report, Chalmers University of Technology, Göteborg, Sweden,
November 1998.
[29] R. Santos Marques, N. Navet, and F. Simonot-Lion. Frame packing under real-time constraints.
In Proceedings of the 5th IFAC International Conference on Fieldbus Systems and their Applications,
FeT'2003, Aveiro, Portugal, July 2003, pp. 185-192.
[30] R. Saket and N. Navet. Frame packing algorithms for automotive applications. Technical report
RR-4998, INRIA, 2003. Available at http://www.inria.fr/rrrt/rr-4998.html.
[31] H. Kopetz, R. Nossal, R. Hexel, A. Krüger, D. Millinger, R. Pallierer, C. Temple, and M. Krug. Mode
handling in the time-triggered architecture. Control Engineering Practice, 6: 61-66, 1998.
[32] TTTech Computertechnik GmbH. Time-Triggered Protocol TTP/C, High-Level Specication
Document, Protocol Version 1.1, November 2003. Available at http://www.tttech.com.
[33] FlexRay Consortium. FlexRay Communication System, Protocol Specication, Version 2.0, June 2004.
Available at http://www.flexray.com.
[34] International Standard Organization. ISO 11519-2, Road Vehicles, Low-Speed Serial Data
Communication, Part 2: Low-Speed Controller Area Network, ISO, 1994.
[35] H. Kopetz. Real-Time Systems: Design Principles for Distributed Embedded Applications. Kluwer
Academic Publishers, Dordrecht, 1997.
[36] G. Bauer and M. Paulitsch. An investigation of membership and clique avoidance in TTP/C.
In Proceedings of the 19th IEEE Symposium on Reliable Distributed Systems, Nürnberg, Germany,
2000.
[37] FlexRay Consortium. FlexRay Communication System, Electrical Physical Layer, Version 2.0,
June 2004. Available at http://www.flexray.com.
[38] International Standard Organization. ISO 11898, Road Vehicles, Interchange of Digital
Information, Controller Area Network for High-Speed Communication, ISO, 1994.
[39] J. Rufino, P. Veríssimo, G. Arroz, C. Almeida, and L. Rodrigues. Fault-tolerant broadcasts in CAN.
In Proceedings of the 28th International Symposium on Fault-Tolerant Computing Systems, IEEE,
Munich, Germany, June 1998, pp. 150-159.
[40] International Standard Organization. ISO 11898-4, Road Vehicles, Controller Area Network (CAN),
Part 4: Time-Triggered Communication, ISO, 2000.
[41] G. Lima and A. Burns. Timing-independent safety on top of CAN. In Proceedings of the 1st
International Workshop on Real-Time LANs in the Internet Age, Vienna, Austria, 2002.
[42] G. Lima and A. Burns. A consensus protocol for CAN-based systems. In Proceedings of the 24th
Real-Time Systems Symposium, Cancun, Mexico, 2003, pp. 420-429.
[43] G. Rodriguez-Navas, M. Barranco, and J. Proenza. Harmonizing dependability and real time in
CAN networks. In Proceedings of the 15th Euromicro Conference on Real-Time Systems, Porto,
Portugal, 2003.
[44] J. Ferreira, L. Almeida, J. Fonseca, G. Rodriguez-Navas, and J. Proenza. Enforcing consistency
of communication requirements updates in FTT-CAN. In Proceedings of the 22nd Symposium on
Reliable Distributed Systems, Florence, Italy, 2003.
[45] G. Rodriguez-Navas and J. Proenza. Clock synchronization in CAN distributed embedded systems.
In Proceedings of the 3rd International Workshop on Real-Time Networks, Catania, Italy, 2004.
[46] M. Barranco, G. Rodriguez-Navas, J. Proenza, and L. Almeida. CANcentrate: an active star topology
for CAN networks. In Proceedings of the 5th International Workshop on Factory Communication
Systems, Vienna, Austria, 2004.
[47] H. Sivencrona, T. Olsson, R. Johansson, and J. Torin. RedCAN: simulations of two fault recovery
algorithms for CAN. In Proceedings of the 10th IEEE Pacific Rim International Symposium on
Dependable Computing, Papeete, French Polynesia, 2004, pp. 302-311.
[48] L.M. Pinho and F. Vasques. Reliable real-time communication in CAN networks. IEEE Transactions
on Computers, 52: 1594-1607, 2003.
[49] Robert Bosch GmbH. Time Triggered Communication on CAN. Available at http://www.can.
bosch.com/content/TT_CAN.html, 2004.
[50] B. Gaujal and N. Navet. Fault confinement mechanisms on CAN: analysis and improvements. IEEE
Transactions on Vehicular Technology, 54(5), 2004. Accepted for publication. Preliminary version
available as INRIA Research Report at http://www.inria.fr/rrrt/rr-4603.html.
43
Volcano: Enabling
Correctness by Design
Antal Rajnák
Volcano Communications
Technologies AG
43.1 Introduction .......................................... 43-1
43.2 Volcano Concepts ..................................... 43-3
     Volcano Signals and the Publish/Subscribe Model • Frames •
     Network Interfaces • The Volcano API • Timing Model •
     Capture of Timing Constraints
43.3 Volcano Network Architect ............................ 43-10
     The Car OEM Tool Chain: One Example • VNA Tool Overview
43.4 Volcano Software in an ECU ........................... 43-15
     Volcano Configuration Workflow
Acknowledgments ........................................... 43-18
References ................................................ 43-18
More Information .......................................... 43-18
43.1 Introduction
Volcano is a holistic concept defining a protocol-independent design methodology for distributed real-time
networks in vehicles. The concept deals with both technical and nontechnical entities (i.e., partitioning
of responsibilities into well-defined roles in the development process).
The vision of Volcano is "Enabling Correctness by Design." By taking a strict systems engineering
approach and focusing resources on design, a majority of system-related issues can be identified and
solved early in a project. The quality is designed into the vehicle, not tested out. Minimized cost, increased
quality, and a high degree of configuration/reconfiguration flexibility are the trademarks of the Volcano
concept.
The Volcano approach is particularly beneficial as the complexity of vehicles is increasing very rapidly
and as projects will have to cope with new functions and requirements throughout their lifetime.
A unique feature of the Volcano concept is the solution called post-compile-time reconfiguration
flexibility, where the network configuration containing the signal-to-frame mapping, ID assignment, and
frame periods is located in a configurable flash area of the Electronic Control Unit (ECU), and can
be changed without the need to touch the application software, thus eliminating the need for
re-validation and saving cost and lead time. The origin of the concepts can be traced back to a project at
Volvo Car Corporation during 1994 to 1998, when development of Volvo's new large platform [3]
took place. It reuses solid industrial experience, and takes into account recent findings from
real-time research (Figure 43.1) [2].
FIGURE 43.1 The main networks of the Volvo S80 [4]: ECUs such as the CEM, ABS, TCM, SRS, and DIM interconnected by a high-speed CAN bus (250 kbit/sec) and a low-speed CAN bus (125 kbit/sec).
The concept is characterized by three important features:
Ability to guarantee the real-time performance of the network already at the design stage, thus
signicantly reducing the need for testing.
Built-in exibility enabling the vehicle manufacturer to upgrade the network in the preproduction
phase of a project as well as in the aftermarket.
Efcient use of available resources.
The actual implementation of the concept consists of two major parts:
- The offline tool-set for requirement capturing and automated network design (covering multiple
protocols and gateway configuration). It provides strong administrative functions for variant and
version handling, which are needed during the complete life cycle of a car project.
- The target part, represented by a highly efficient and portable embedded software package. It offers a
signal-based API, handles multiple protocols, integrated gateway functionality, and post-compile-time
reconfiguration capability, together with a PC-based generation tool.
Even though the implementation originally supported the Controller Area Network (CAN) and Volcano
lite¹ protocols, it has successfully been extended to fit other emerging network protocols as well. LIN was
added first, followed by the FlexRay and MOST protocols. The philosophy behind this is that communication
has to be managed in one single development environment, covering all protocols used, in order
to ensure end-to-end timing predictability, while still providing the necessary architectural freedom to
choose the most economic solution for the task.
Over the last 40 years the computing industry has discovered that certain techniques are needed in order
to manage complex software systems. Two of these techniques are abstraction (where unnecessary information is
hidden) and composability (if software components proven to be correct are combined, then the resulting
system will be correct as well). Volcano makes heavy use of both these techniques.
The automotive industry is implementing an increasing number of software functions. Introduction of
protocols, such as MOST for multimedia and FlexRay for active chassis systems, results in highly complex
electrical architectures. Finally, all these complex subnetworks are linked through gateways. The behavior
of the entire car network has a crucial influence upon the car's performance and reliability. Managing
software development that involves many suppliers, hundreds of thousands of lines of code, and thousands of
signals requires a structured systems engineering approach. Inherent in the concept of systems engineering
is a clear partitioning of the architecture, requirements, and responsibilities.
¹A low-speed, SCI-based proprietary master-slave protocol used by Volvo.
A modern vehicle includes a number of microprocessor-based components called Electronic Control
Units (ECUs), provided by a variety of suppliers.
The Controller Area Network (CAN) provides an industry-standard solution for connecting ECUs together using a
single broadcast bus. A shared broadcast bus makes it much easier to add desired functionality: ECUs
can be added easily, and they can communicate data easily and cheaply (adding a function may be just
software). But increased functionality leads to more software and greater complexity. Testing a module
for conformance to timing requirements is the most difficult of the problems. With a shared broadcast
bus, the timing performance of the bus might not be known until all the modules are delivered and the bus
usage of each is known. Only then can testing for timing conformance begin (which is often too far into
the development of a vehicle to find and correct major timing errors). The supplier of a module can only
do limited testing for timing conformance: they do not have a complete picture of the final load placed on
the bus. This is particularly important when dealing with the CAN bus: arrivals of frames from the bus
may cause interrupts on a module wishing to receive the frames, and so the load on the microprocessor
in the ECU is partially dependent on the bus load.
It is often thought that CAN is somehow unpredictable and that the latencies for lower-priority frames in the
network are unbounded. This is untrue; in fact, CAN is a highly predictable communications protocol.
Furthermore, CAN is well suited to handle large amounts of traffic with differing time constraints.
However, with CAN there are a few particular problems:
- The distribution of identifiers. CAN uses identifiers for two purposes: distinguishing different
messages on the bus, and assigning relative priorities to those messages; the latter is often
neglected.
- Limited bandwidth. This is due to the low maximum signaling speed of 1 Mbit/sec, further reduced by
significant protocol overhead.
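To see the scale of that overhead, the sketch below applies the classic worst-case frame-length calculation from the CAN scheduling literature for standard (11-bit identifier) data frames, accounting for the maximum number of stuff bits. This is an illustration, not Volcano code.

```python
def can_frame_bits(payload_bytes: int) -> int:
    """Worst-case length in bits of a standard (11-bit ID) CAN data frame,
    including the maximum number of stuff bits."""
    assert 0 <= payload_bytes <= 8
    fixed = 47              # SOF, ID, control, CRC, ACK, EOF, interframe space
    data = 8 * payload_bytes
    # 34 of the fixed bits plus all data bits are subject to bit stuffing;
    # in the worst case one stuff bit is inserted after every 4 bits.
    stuff = (34 + data - 1) // 4
    return fixed + data + stuff

def worst_case_tx_time_us(payload_bytes: int, bitrate_bps: int) -> float:
    """Worst-case time to clock one frame onto the bus, in microseconds."""
    return can_frame_bits(payload_bytes) * 1e6 / bitrate_bps
```

An 8-byte frame occupies up to 135 bit times, of which only 64 carry application data; at 125 kbit/s (the low-speed bus of Figure 43.1) such a frame blocks the bus for about 1.08 msec.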
Volcano was designed to provide abstraction, composability, and identifier distribution reflecting true
urgencies, while at the same time providing the most efficient utilization of the protocol.
43.2 Volcano Concepts
The Volcano concept is founded on the ability to guarantee the worst-case latencies of all frames sent in a
multiprotocol network system. This is a key step because it gives the following:
- A way of guaranteeing that there are no communications-related timing problems.
- A way of maximizing the amount of information carried on the bus. The latter is important for
reduced production costs.
- The possibility to develop highly automated tools for the design of optimal network configurations.
The timing guarantee for CAN is provided by mathematical analysis developed from academic research [1].
Other protocols, such as FlexRay, are predictable by design. For this reason, some of the subjects discussed
below are CAN specific; others are independent of the protocol used.
The analysis is able to calculate the worst-case latency for each frame sent on the bus. This latency
is the longest time from placing a frame in a CAN controller at the sending side to the time the frame is
correctly received at all receivers. The analysis needs to make several assumptions about how the bus is
used. One of these assumptions is that there is a limited set of frames that can access the bus, and that
time-related attributes of these frames are known (e.g., frame size, frame periodicity, queuing jitter, and
so on).
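As an illustration of the kind of analysis involved, the following sketch implements the classic fixed-point response-time calculation for CAN from the scheduling research that [1] belongs to. It is a deliberate simplification (single bus, no error/retransmission model), not the Volcano tool itself.

```python
from math import ceil

def can_response_times(frames, tau_bit):
    """Worst-case latency (queuing to correct reception) of each frame on one
    CAN bus. `frames` is sorted highest priority first; each entry is
    (name, C, T, J): worst-case transmission time, period, queuing jitter.
    `tau_bit` is one bit time, in the same time unit as C, T, and J."""
    results = {}
    for i, (name, C_i, T_i, J_i) in enumerate(frames):
        # Blocking: a lower-priority frame already on the bus cannot be preempted.
        B = max((C_j for (_, C_j, _, _) in frames[i + 1:]), default=0.0)
        w = B
        while True:
            # Interference from every higher-priority frame released while
            # this frame is still waiting to win arbitration.
            w_next = B + sum(
                ceil((w + J_j + tau_bit) / T_j) * C_j
                for (_, C_j, T_j, J_j) in frames[:i]
            )
            if w_next == w:
                break
            if J_i + w_next + C_i > T_i:
                raise ValueError(f"{name}: deadline (period) exceeded")
            w = w_next
        results[name] = J_i + w + C_i
    return results
```

For example, three 1-msec frames of period 10 msec on one bus yield worst-case latencies of 2, 3, and 3 msec respectively: each frame can be blocked by one lower-priority frame, and lower priorities additionally suffer interference from higher ones.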
Another important assumption is that the CAN hardware can be driven correctly:
- The internal message queue within any CAN controller in the system is organized (or can be used)
such that the highest-priority message will be sent out first if more than one message is ready
to be sent. (Hardware arbitration based on transmit-slot position is acceptable as long as the number of sent
frames is less than the number of transmit slots available in the CAN controller.)
- The CAN controller should be able to send out a stream of scheduled messages without releasing
the bus in the interframe space between two messages. Such devices will arbitrate for the bus right
after sending the previous message and will only release the bus in case of lost arbitration.
A third important assumption is the error model: the analysis can account for retransmissions due to
errors on the bus, but requires a model for the number of errors in a given time interval.
The Volcano software running in each ECU controls the CAN hardware and accesses the bus so that
all these assumptions are met, allowing application software to rely on all communications taking place on
time. This means that integration testing at the automotive manufacturer can concentrate on functional
testing of the application software.
Another important benefit is that a large amount of communications protocol overhead can be avoided.
Examples of how protocol overheads are reduced by obtaining timing guarantees are:
- There is no need to provide frame acknowledgment within the communications layer, dramatically
reducing bus traffic. The only case where an ECU can fail to receive a frame via CAN is if the ECU
is off the bus, a serious fault that is detected and handled by network management and on-board
diagnostics.
- Retransmissions are unnecessary. The system-level timing analysis guarantees that a frame will
arrive on time. Timeouts only happen after a fault, which can be detected and handled by network
management and/or the on-board diagnostics.
A Volcano system never suffers from intermittent overruns during correct operation because of the
timing guarantees, and therefore achieves these efficiency gains.
43.2.1 Volcano Signals and the Publish/Subscribe Model
The Volcano system provides signals as the basic communication object. Signals are small data items
that are sent between ECUs.
The publish/subscribe model is used for defining signaling needs. For a given ECU there is a set of
signals that are published (i.e., made available to the system integrator), and a number of subscribed
signals (i.e., signals that are required as inputs to the ECU).
The signal model is provided directly to the programmer of ECU application software, and the Volcano
software running in each ECU is responsible for translation between signals and CAN frames.
An important design requirement for the Volcano software was that the application-programmer
is unaware of the bus behavior: all the details of the network are hidden and the programmer only deals
with signals through a simple API. This is crucial because a major problem with alternative techniques
is that the application software makes assumptions about the CAN behavior and, therefore, changing the
bus behavior becomes difficult.
In Volcano there are three types of signals:
- Integer signals. These represent unsigned numbers and are of a static size between 1 and 16 bits. So,
for example, a 16-bit signal can store integers in the range 0 to 65,535.
- Boolean signals. These represent truth conditions (true/false). Note that this is not the same as
a 1-bit integer signal (which stores the integer values 0 or 1).
- Byte signals. These represent data with no Volcano-defined structure. A byte signal consists of
a fixed number of bytes, between 1 and 8.
The advantage of Boolean and integer signals is that the values of a signal are independent of processor
architecture (i.e., the values of the signals are consistent regardless of the endian-ness of the
microprocessors in each ECU).
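Endian-independence falls out of defining signal values in terms of bit positions within the frame rather than host words. The sketch below illustrates the idea; the bit-numbering convention (bit 0 = least-significant bit of byte 0) is our assumption for illustration, not Volcano's documented packing rule.

```python
def extract_int_signal(frame: bytes, start_bit: int, length: int) -> int:
    """Extract an unsigned integer signal (1..16 bits) from a frame payload.
    Because the value is assembled bit by bit from the byte stream, the
    result is identical on big- and little-endian ECUs."""
    assert 1 <= length <= 16
    value = 0
    for i in range(length):
        bit = start_bit + i
        byte_index, bit_index = bit // 8, bit % 8
        if frame[byte_index] & (1 << bit_index):
            value |= 1 << i
    return value
```

A 16-bit signal starting at bit 0 of an all-ones frame yields 65,535, matching the stated range of integer signals.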
For published signals, Volcano internally stores the value of these signals and, in the case of periodic signals,
will send them to the network according to a pattern defined offline by the system integrator. The system
integrator also defines the initial value of a signal. The value of a signal persists until updated by the
application program via a write call or until Volcano is reinitialized.
For subscribed signals, Volcano internally stores the current value of each signal. The system integrator
also defines the initial value of a signal. The value of a subscribed signal persists until:
- It is updated by receiving a new value from the network
- Volcano is reinitialized
- A signal refresh timeout occurs and the value is replaced by a substitute value defined by the
application-programmer
In the case where new signal values are received from the network, these values will not be reflected in the
values of subscribed signals until a Volcano input call is made.
A published signal value is updated via a write call. The latest value of a subscribed signal is obtained
via a read call. A write call for a subscribed signal is not permitted.
The last-written value of a published signal may be obtained via a read call.
43.2.1.1 Update Bits
The Volcano concept permits placement of several signals with different update rates into the same frame.
It provides a special mechanism named the update bit to indicate which signals within the frame have
actually been updated: that is, the ECU generating the signal wrote a fresh value of the signal since the last
time the frame was transmitted. The Volcano software on an ECU transmitting a signal automatically clears
the update bit when it has been sent. This ensures that a Volcano-based ECU on the receiving side will
know each time the signal has been updated (the application can see this update bit by using flags tied
to an update bit; see below). Using update bits to their full extent requires that the underlying protocol
is secure (frames cannot be lost without being detected). The CAN protocol is regarded as such, but
not the LIN protocol. Therefore, the update bit mechanism is limited to CAN within Volcano.
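The write-sets/transmit-clears rule can be captured in a small behavioral model. This is our illustration of the mechanism described above, not Volcano source code; class and method names are invented.

```python
class PublishedSignal:
    """Model of a published signal with an update bit: a write sets the bit,
    and the transmitting side clears it once the containing frame is sent."""
    def __init__(self, initial=0):
        self.value = initial
        self.update_bit = False

    def write(self, value):
        """Application write call: store the value and mark it fresh."""
        self.value = value
        self.update_bit = True

class Frame:
    """Container for several signals transmitted together."""
    def __init__(self, signals):
        self.signals = signals

    def transmit(self):
        """Snapshot (value, update_bit) pairs for the bus, then clear the
        update bits, as the transmitting Volcano software does."""
        image = [(s.value, s.update_bit) for s in self.signals]
        for s in self.signals:
            s.update_bit = False
        return image
```

Writing one of two signals in a frame and transmitting twice shows the receiver exactly one fresh value, then none, which is the information the update bit exists to convey.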
43.2.1.2 Flags
A flag is a Volcano object purely local to an ECU. It is bound to one of two things:
- The update bit of a received Volcano signal; the flag is set when the update bit is set.
- The containing frame of a signal; the flag is set when the frame containing the signal is received
(regardless of whether an update bit for the signal is set).
Many flags can be bound to each update bit, or to the reception of a containing frame. Volcano sets all the
flags bound to an object when the occurrence is seen. The flags are cleared explicitly by the application
software.
43.2.1.3 Timeouts
A timeout is, like the flags, a Volcano object purely local to an ECU. The timeout is declared by the
application-programmer and is bound to a subscribed signal. A timeout condition occurs when the
particular signal was not received within the given time limit. In this case, the signal (and/or a number of
other signals) is/are set to a value specified as part of the declaration of the timeout. As with the flags, the
timeout reset mechanism can be bound to either:
- The update bit of a received Volcano signal.
- The frame carrying a specific signal.
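A behavioral sketch of the refresh-timeout rule, counting processing periods since the last reception; the names and the period-counting granularity are our illustrative assumptions, not the Volcano API.

```python
class SubscribedSignal:
    """Subscribed signal with a refresh timeout: if no new value arrives
    within `timeout` processing periods, the value reverts to a substitute
    specified when the timeout is declared."""
    def __init__(self, initial, timeout, substitute):
        self.value = initial
        self.timeout = timeout
        self.substitute = substitute
        self._since_rx = 0

    def on_receive(self, value):
        """New value arrived from the network; reception resets the timeout."""
        self.value = value
        self._since_rx = 0

    def tick(self):
        """Called once per processing period by the input processing."""
        self._since_rx += 1
        if self._since_rx >= self.timeout:
            self.value = self.substitute
```

With a timeout of three periods, the received value survives two silent periods and is replaced by the substitute on the third.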
43.2.2 Frames
A frame is a container capable of carrying a certain amount of data (0 to 8 bytes for CAN and LIN).
Several signals can be packed into the available data space and transmitted together in one frame on the
network. The total size of a frame is determined by the protocol. A frame can be transmitted periodically
or sporadically. Each frame is assigned a unique identifier. The identifier serves two purposes in the
CAN case:
- Identifying and filtering a frame on reception at an ECU.
- Assigning a priority to a frame.
43.2.2.1 Immediate Frames
Volcano normally hides the existence of network frames from the application designer. However, in
certain cases there is a need to send and receive frames with very short processing latencies. In these cases
direct application support is required. Such frames are designated immediate frames.
There are two Volcano calls to handle immediate frames:
- A transmit call, which immediately sends the designated frame to the network.
- A receive call, which immediately processes the designated incoming frame if that frame is
pending.
There is also a read update bit call to test the update bit of a subscribed signal within an immediate
frame.
The signals packed into an immediate frame can be accessed with normal read and write function
calls in the same way as all other normal signals. The application-programmer is responsible for ensuring
that the transmit call is made only when the signal values of published signals are consistent.
43.2.2.2 Frame Modes
Volcano allows different frame modes to be specified for an ECU. A frame mode is a description of
an ECU working mode, in which a set of frames (signals) can be active (input and output). A frame can
be active in one or many frame modes. The timing properties of frames do not have to be the same for
different frame modes supporting the same frame.
43.2.3 Network Interfaces
A network interface is the device used to send and receive frames to and from networks. A network
interface connects a given ECU to the network. In the CAN case, more than one network interface (CAN
controller) on the same ECU may be connected to the same network. Likewise, an ECU may be connected
to more than one network.
The network interfaces in Volcano are protocol specific. The protocols currently supported are CAN and
LIN; FlexRay and MOST are under implementation.
The network interface is managed by a standard set of Volcano calls. These allow the interface to
be initialized or reinitialized, connected to the network (i.e., begin operating the defined protocol), or
disconnected from the network (i.e., take no further part in the defined protocol). There is also a Volcano
call to return the status of the interface.
43.2.4 The Volcano API
The Volcano API provides a set of simple calls to manipulate signals and to control the CAN/LIN controllers.
There are also calls to control Volcano sending to, and receiving from, networks. To manipulate signals
there are read and write calls. A read call returns to the caller the latest value of a signal; a write
call sets the value of a signal. The read and write calls are the same regardless of the underlying
network type.
43.2.4.1 Volcano Thread-of-Control
There are two Volcano calls that must be called at the same fixed rate: v_input() and v_output(). If the
v_gateway() function is used, the same calling rate shall be used as for the v_input() and v_output()
functions. The v_output() call places the frames into the appropriate controllers. The v_input() call takes
received frames and makes the signal values available to read calls. The v_gateway() call copies values
of signals in frames received from the network to values of signals in frames sent to the network. The
v_sb_tick() call handles transmitting and receiving frames for sub-buses.
Volcano also provides a very low latency communication mechanism in the form of the immediate
frame API. This is a view of frames on the network which allows transmission and reception from/to the
Volcano domain without the normal Volcano input/output latencies, or mutual exclusion requirements
with the v_input() and v_output() calls. There are two communication calls in the immediate signal API:
v_imf_rx() and v_imf_tx().
The v_imf_tx() call copies values of immediate signals into a frame and places the frame in the
appropriate CAN controller for transmission. The v_imf_rx() call takes a received frame containing immediate
signals and makes the signal values available to read calls.
A third call, v_imf_queued(), allows the user to see if an immediate frame has really been sent on the
network. The controller calls allow the application to initialize, connect, and disconnect from networks,
and to place the controllers into sleep mode, among others.
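A minimal model of one processing period may help fix the calling pattern. The call names come from the text; their bodies here are placeholder stubs, and the exact ordering (input, application work, gateway, output) is one plausible arrangement consistent with the description, not mandated by it.

```python
trace = []  # records the order of the Volcano processing calls

def v_input():   trace.append("in")    # unpack received frames into signal values
def v_gateway(): trace.append("gw")    # copy gatewayed signals between networks
def v_output():  trace.append("out")   # pack published signals, queue frames

def processing_cycle():
    """One Volcano processing period. On a real ECU this is triggered by a
    periodic timer at the Volcano processing period; all three calls run at
    the same fixed rate, in a single thread of control."""
    v_input()
    # ... application reads subscribed signals and writes published ones ...
    v_gateway()
    v_output()
```

Each cycle produces exactly one input, gateway, and output pass, which is what makes the jitter analysis of Section 43.2.5.1 tractable.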
43.2.4.2 Volcano Resource Information
The ambition of the Volcano concept is to provide a fully predictable communications solution. In order
to achieve this, the resource usage of the Volcano embedded part has to be determined. Resources of
special interest are memory and execution time.
43.2.4.2.1 Execution Time of Volcano Processing Calls
In order to bound processing time, a budget for the v_input() call (that is, the maximum number of
frames that will be processed by a single call to v_input()) has to be established. A corresponding process
for transmitted frames applies as well.
43.2.5 Timing Model
The Volcano timing model covers end-to-end timing (i.e., from button press to activation). A timing model
is used to set in context the signal timing information needed in order to analyze a network
configuration of signals and frames. This section defines the required information that must be provided
by an application-programmer in order to be able to guarantee the end-to-end timing requirements.
A Volcano signal is transported over a network within a frame. Figure 43.2 identifies six time points
between the generation and consumption of a signal value.
FIGURE 43.2 The Volcano timing model. (Timeline of the six time points from notional generation to notional consumption, showing the intervals T_PL, T_BT, T_T, T_AT, and T_SL, and the overall max_age requirement.)
The six time points are:
1. Notional generation: the signal is generated either by hardware (e.g., switch pressed) or software
(e.g., timeout signaled). The user can define this point to best reflect their system.
2. First v_output() (or v_imf_tx() for an immediate frame) at which a new value is available. This is
the first such call after the signal value is written by a write call.
3. The frame containing the signal is first entered for transmission (arbitration on a CAN bus).
4. Transmission of the frame completes successfully (i.e., the subscriber's communication controller
receives the frame from the network).
5. v_input() (or v_imf_rx() for an immediate frame) makes the signal available to the application.
6. Notional consumption: the user application consumes the data. The user can define this point
to best reflect their system.
The max_age of the signal is the maximum age, measured from notional generation, at which it is
acceptable for notional consumption. The max_age is the overall timing requirement on a signal.
T_PL (publish latency) is the time from notional generation to the first v_output() call at which the signal
value is available to Volcano (a write call has been made). It will depend on the properties of the
publishing application. Typical values might be the frame_processing_period (if the signal is written
fresh at every period but this is not synchronized with v_output()), the offset between the write call and
v_output() (if the two are synchronized), or the sum of the frame_processing_period and the period of
some lower-rate activity that generates the value. This value must be given by the application-programmer.
T_SL (subscribe latency) is the time from the first v_input() that makes the new value available to the
application to the time when the value is consumed. The consumption of a signal is a user-defined event
that will depend on the properties of the subscribing function. As an example, it can be a lamp being lit,
or an actuator starting to move. This value must be given by the application-programmer.
The intervals T_BT, T_T, and T_AT are controlled by the Volcano 5 configuration and are dependent upon
the nature of the frame in which the signal is transported.
The value T_BT is the time before transmission (the time from the v_output() call until the frame enters
arbitration on the bus). T_BT is a per-frame value that depends on the type of frame carrying the signal
(see later sections). This time is shared by all signals in the frame, and is common to all subscribers to
those signals.
The value T_AT is the time after transmission (the time from when the frame has been successfully
transmitted on the network until the next v_input() call). T_AT is a per-frame value that may be different for
each subscribing ECU.
The value T_T is the time required to transmit the frame (including the arbitration time) on the
network.
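Assembling the intervals defined above (an inequality implied by those definitions rather than quoted from the chapter), a signal meets its overall timing requirement when the worst-case sum of the five intervals does not exceed its max_age:

```latex
T_{PL} + T_{BT} + T_{T} + T_{AT} + T_{SL} \;\le\; \mathrm{max\_age}
```

Here T_PL and T_SL are supplied by the application-programmer, while T_BT, T_T, and T_AT follow from the network configuration.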
43.2.5.1 Jitter
The application-programmer at the supplier must also provide information about the jitter to the systems
integrator. This information is as follows:
The input_jitter and output_jitter refer to the variability in the time taken to complete the v_input()
and v_output() calls, measured relative to the occurrence of the periodic event causing Volcano processing
to be done (i.e., calls to v_input(), v_gateway(), and v_output() to be made). Figure 43.3 shows how the
output_jitter is measured. In the figure, E marks the earliest completion time of the v_output() call, and
L marks the latest completion time, relative to the start of the cycle. The output_jitter is therefore L - E.
The input_jitter is measured according to the same principles.
If a single-threaded system is used, without interrupts, the calculation of the input_jitter and output_jitter
is straightforward: the earliest time is the best-case execution time of all the calls in the cycle (including
the v_output() call), and the latest time is the worst-case execution time of all the calls. The situation is
more complex if interrupts can occur or the system consists of multiple tasks, since the latest time must
take into account preemption from interrupts and other tasks.
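For the single-threaded, interrupt-free case, the calculation reduces to two sums; the numbers below are purely illustrative.

```python
def output_jitter(best_case_exec, worst_case_exec):
    """Output jitter for a single-threaded cycle without interrupts.
    Inputs are per-call best-case and worst-case execution times for all
    calls in the cycle, with v_output() last (same order in both lists)."""
    E = sum(best_case_exec)   # earliest completion of v_output(), from cycle start
    L = sum(worst_case_exec)  # latest completion
    return L - E              # output_jitter = L - E (Figure 43.3)

# Illustrative numbers (microseconds): three calls ending with v_output(),
# best cases 40 + 10 + 25 and worst cases 60 + 15 + 35 give a jitter of 35.
```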
FIGURE 43.3 Measurement of output jitter. (Within each frame processing period, E and L mark the earliest and latest completion of the v_output() call, relative to the periodic event that initiates the Volcano processing calls.)
43.2.6 Capture of Timing Constraints
The declaration of a signal in a Volcano fixed configuration file provides syntax to capture the following
timing-related information:
- Whether a signal is state or state change (info_type)
- Whether a signal is sporadic or periodic (generation_type)
- The latency
- The min_interval
- The max_interval
- The max_age
The first two (together with whether the signal is published or subscribed to) provide signal properties
that determine the kind of the signal.
A state signal carries a value that completely describes the signaled property (e.g., the current position
of a switch). A subscriber to such a signal need only observe the signal value when the information is
required for the subscriber's purposes (e.g., signal values can be missed without affecting the usefulness
of later values). A state change signal carries a value that must always be observed in order to be meaningful
(e.g., distance traveled since last signal value). A subscriber must observe every signal value.
A sporadic signal is one that is written by the application in response to some event (e.g., a button
press). A periodic signal is one that is written by the application at regular intervals.
The latency of a signal is the time from notional generation to being available to Volcano (for a
published signal), or from being made available to the application by Volcano to notional consumption
(for a subscribed signal). Note that immediate signals (those in immediate frames) include time taken to
move frames to/from the network in these latencies.
The min_interval has different interpretations for published and for subscribed signals. For a published
signal, it is the minimum time between any pair of write calls to the signal (this allows, e.g., the calculation
of the maximum rate at which the signal could cause a sporadic frame carrying it to be transmitted).
For a subscribed signal, it is the minimum acceptable time between arrivals of the signal. This is
optional: it is intended to be used if the processing associated with the signal is triggered by the arrival of
a new value, rather than being periodic. In such a case, it provides a constraint that the signal should not be
connected to a published signal with a faster rate.
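That constraint is easy to check mechanically. The function below is our illustration of the rule, not the actual checker in the Volcano tools; times are in whatever unit the intervals are declared in.

```python
def subscription_compatible(published_min_interval, subscribed_min_interval):
    """True if a subscribed signal may be connected to a published signal.
    The subscriber's min_interval (None if not declared; it is optional)
    is the minimum acceptable time between arrivals. The publisher's
    min_interval bounds how often the signal can be written, and hence
    how often a frame carrying it can be transmitted."""
    if subscribed_min_interval is None:
        return True  # attribute not declared: no constraint to enforce
    # The publisher must not be able to produce values faster than the
    # subscriber accepts them.
    return published_min_interval >= subscribed_min_interval
```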
The max_interval has different interpretations for published and subscribed signals. For a published
signal, the interesting timing information is already captured by min_interval and the publish
latency.
For a subscribed signal, it is the maximum interval between notional consumptions of the signal (i.e., it
can be used to determine that signal values are sampled quickly enough so that none will be missed).
The max_age of a signal is the maximum acceptable age of a signal at notional consumption, measured
from notional generation. This value is meaningful for subscribed signals.
In addition to the signal timing properties described above, the Volcano fixed configuration file provides
syntax to capture the following additional timing-related information:
The Volcano processing period. The Volcano processing period defines the nominal interval between
successive v_input() calls on the ECU, and also between successive v_output() calls (i.e., the rates of the
calls are the same, but v_input() and v_output() are not assumed to become due at the same instant).
For example, if the Volcano processing period is 5 msec then each v_output() call becomes due 5 msec
after the previous one became due.
The Volcano jitter time. The Volcano jitter defines the time by which the actual call may lag behind the
time at which it became due. Note that "becomes due" refers to the start of the call, and jitter refers to
completion of the call.
43.3 Volcano Network Architect
To manage increasing complexity in electrical architectures, a structured development approach is believed
essential to assure correctness by design. Volcano Automotive Group has developed a network design tool,
Volcano Network Architect (VNA), to support a development process based on strict systems engineering
principles. Gatewaying of signals between different networks is automatically handled by the VNA tool and
the accompanying embedded software. The tool supports partitioning of responsibilities into different
roles, such as system integrator and function owner. Third-party tools may be used for functional modeling.
These models can be imported into VNA.
Volcano Network Architect is the top-level tool in the Volcano Automotive Group's tool chain for
designing vehicle network systems. The tool chain supports important aspects of systems engineering
such as:
- Use of functional modeling tools.
- Partitioning of responsibilities.
- Abstracting away from hardware and protocol-specific details by providing a signal-based API for the
application developer.
- Abstracting away from the network topology through automatic gatewaying between different
networks.
- Automatic frame compilation to ensure that all declared requirements are fulfilled (if possible),
that is, delivering correctness by design.
- Reconfiguration flexibility by supporting post-compile-time reconfiguration capability.
The VNA tool supports network design and makes management and maintenance of distributed network
solutions more efficient. The tool supports the capturing of requirements and then takes a user through all
stages of network definition.
43.3.1 The Car OEM Tool Chain: One Example
Increasing competition and complex electrical architectures demand enhanced processes. Function modeling
has proved to be a suitable tool to capture the functional needs in a vehicle. Tools such as Rational
Rose provide a good foundation to capture all the different functions, and other tools, such as Statemate and
Simulink, model them in order to allocate objects and functionality in the vehicle. Networking is essential
since the functionality is distributed among a number of ECUs in the vehicle. Substantial parts of the
outcome from the function modeling are highly suitable to use as input to a network design tool, such
as VNA.
FIGURE 43.4 VNA screen.
The amount of information required to properly define the networks is vast. To support input of data,
VNA provides automated import from third-party tools through an XML-based format.
It is the job of the signal database administrator/system integrator to ensure that all data entered into the
system are valid and internally consistent. VNA supports this task through a built-in multilevel consistency
checker that verifies all data (Figure 43.4).
In this particular approach the network is designed by the system integrator in close contact with
the different function owners in order to capture all necessary signaling requirements, functional and
nonfunctional (including timing). When the requirements are agreed and documented in VNA, the
system integrator uses VNA to pack all signals into frames; this can be done manually or automatically.
The algorithm used by VNA handles gatewaying by partitioning end-to-end timing requirements into
requirements per network segment.
All requirements are captured in the form of a Microsoft Word document called the Software Requirement
Specification (SWRS), which is generated by VNA and sent to the different node owners as a draft copy to
be signed off. When all SWRSs have been signed off, VNA automatically creates all necessary configuration
files used in the vehicle, along with a variety of files for third-party analysis and measurement tools.
The network-level (global) configuration files are used as input to the Volcano Configuration Tool and
Volcano Back-End Tool in order to generate a set of downloadable binary configuration files for each
node. The use of reconfigurable nodes makes the system very flexible, since the Volcano concept separates
application-dependent information and network-dependent information. A change in the network by the
system integrator can easily be applied to a vehicle without having to recompile the application software in
the nodes. The connection between function modeling and VNA provides good support for iterative
design. It verifies network consistency and timing up front, to ensure a predictable and deterministic
network.
43.3.2 VNA Tool Overview
43.3.2.1 Global Objects
The workflow in VNA ensures that all relevant information about the network is captured. Global objects are created first, and then (re)used in several projects. The VNA user works with objects of types such
[Figure 43.5 shows the VNA tool structure: a GUI, frame compiler, consistency check, and configuration generators operating around a central database with a backup database and a local RAM mirror, together with converters and export/import interfaces producing Volcano configuration files, specs/reports/documents (SWRS, ASAP, HTML), LIN description files (.ldf), and generic XML/FIBEX exchange files for customer and third-party formats.]

FIGURE 43.5 The database is a central part of the VNA system. In order to ensure the highest possible performance, each instance of VNA accesses a local mirror of the database that is continuously synchronized with its parent.
as signals, nodes, and interfaces. These objects are used to build up the networks used in a car. Signals are defined by name and type, and can have logical or physical encoding information attached. Interfaces detailing hardware requirements are defined, and from these the actual nodes on a network are described. For each node, receive and transmit signals are defined, and timing requirements are provided for the signals. This information is intended for global use, that is, across car variants, platforms, etc.
43.3.2.2 Project- or Configuration-Related Data (Projects, Configurations, Releases)
When all global data have been collected, the network is designed by connecting the interfaces in a desired configuration. VNA has strong project and variant handling. Different configurations can selectively use or adapt the global objects, for example, by removing a high-end feature from a low-end car model. This means that VNA can manage multiple configurations, designs, and releases, with version and variant handling.
The release handling ensures that all components in a configuration are locked. It is, however, still possible to reuse the components in unchanged form. This makes it possible to go back to any released configuration at any point in time (Figure 43.5).
43.3.2.3 Database
All data objects, both global and configuration-specific, are stored in a common database. The VNA tool was designed to have one common multiuser database per car OEM. To secure the highest possible performance, all complex and time-consuming VNA operations are performed on a local RAM mirror of the database. A specially designed database interface ensures consistency in the local mirror. Operations that are not time critical, such as database management, operate directly on the database.
The built-in multiuser functionality allows multiple users to access all data stored in the database simultaneously. To ensure that a data object is not modified by more than one user, the object must be locked before any modification; read access is of course still allowed for all users while an object is locked for modification.
43.3.2.4 Version and Variant Handling
The VNA database implements functionality for variant and version handling. Most of the global data objects, for example, signals, functions, and nodes, may exist in different versions, but only one version of an object can be used in a specific project/configuration.
The node objects can be seen as the main global objects, since hierarchically they include all other types of global objects. The node objects can exist in different variants, but only one object can be used from a variant folder in a particular project/configuration.
43.3.2.5 Consistency Checking
Extensive functionality for consistency checking is built into the VNA tool. The consistency check can be activated manually when needed, but it also runs continuously to check user input and give immediate feedback on any suspected inconsistency. The consistency check ensures that the network design follows predefined rules and generates errors when appropriate.
43.3.2.6 Timing Analysis/Frame Compilation
The Volcano concept is based on a foundation of guaranteed message latency and a signal-based publish and subscribe model. This provides abstraction by hiding the network and protocol details, allowing the developer to work in the application domain with signals, functions, and related timing information.
Much effort has been spent on developing and refining the timing analysis in VNA. The timing analysis is built upon a scheduling model called DMA (Deadline Monotonic Analysis), and calculates the worst-case latency for each frame among a defined set of frames sent on the bus. Parts of this functionality have been built into the consistency check routine as described above, but the real power of the VNA tool is found in the frame packer/frame compiler functionality.
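The worst-case latency calculation at the heart of such an analysis can be sketched as follows. This is a deliberately simplified illustration in the spirit of the published CAN schedulability analysis (see the references at the end of this chapter), not VNA's actual implementation; the frame parameters are invented. A frame's worst-case queuing delay is the blocking by the longest lower-priority frame already on the bus, plus interference from all higher-priority frames, computed by fixed-point iteration:

```c
/* Illustrative worst-case CAN frame latency sketch (NOT VNA's actual
 * implementation). Frames are indexed by priority (0 = highest); all
 * times are in microseconds. */
typedef struct {
    long C;  /* worst-case transmission time of the frame (us) */
    long T;  /* period / minimum inter-arrival time (us) */
} frame_t;

/* Worst-case latency of frame i = blocking by the longest lower-priority
 * frame + higher-priority interference (fixed point) + own transmission. */
long worst_case_latency(const frame_t *f, int n, int i)
{
    long B = 0;                        /* longest lower-priority frame */
    for (int j = i + 1; j < n; j++)
        if (f[j].C > B)
            B = f[j].C;

    long w = B, prev = -1;
    while (w != prev) {                /* fixed-point iteration */
        prev = w;
        w = B;
        for (int j = 0; j < i; j++)
            /* ceil((prev + 1) / T_j) releases of frame j fit in the
             * window; the +1 counts a frame queued at the same instant. */
            w += ((prev + 1 + f[j].T - 1) / f[j].T) * f[j].C;
    }
    return w + f[i].C;
}
```

A tool would compare each frame's computed latency against its deadline; any frame whose latency exceeds its deadline indicates an infeasible identifier assignment or packing.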
The frame packer/compiler attempts to create an optimal packing of signals into frames, then calculates the proper ID for every frame, ensuring that all the timing requirements captured earlier in the process are fulfilled (if possible). This automatic packing of multiple signals into each frame makes more efficient use of the data bus by amortizing some of the protocol overheads involved, thus lowering bus load. The combined effect of multiple signals per frame and perfect filtering results in a lower interrupt and CPU load, which means that the same performance can be obtained at lower cost. The frame packer can create the most optimal solution if all nodes are reconfigurable. To handle carry-over nodes that are not reconfigurable (ROM-based), these nodes and their associated frames can be classed as fixed. Frame packing can also be performed manually if desired. Should changes to the design be required at a later time, the process allows rapid turnaround of design changes, rerunning the Frame Compiler and regenerating the configuration files.
The VNA tool can be used to design network solutions that are later realized by embedded software from any provider. However, the VNA tool is designed with the Volcano embedded software (VTP) in mind, which implements the expected behavior in the different nodes. To get the full benefits of the tool chain, VNA and VTP should be used together.
43.3.2.7 Volcano Filtering Algorithm
A crucial aspect of network configuration is how to choose identifiers so as to minimize the load on a CPU caused by handling interrupts generated by frames of no interest to the particular node: most CAN controllers have only limited filtering capabilities. The Volcano filtering algorithm is designed to achieve this.
An identifier is split into two parts: the priority bits and the filter bits. All frames on a network must have unique priority bits; for real-time performance, the priority setting of a frame should reflect the relative urgency of the frame. The filter bits are used to determine if a CAN controller should accept or reject a frame. Each ECU that needs to receive frames by interrupts is assigned a single filter bit; the hardware filtering in the CAN controller is set to "must match 1" for that filter bit, and "don't care" for all other bits.
[Figure 43.6 shows a 29-bit extended CAN identifier laid out with 7 priority bits (most significant), 13 filter bits, and the remaining bits unused (0).]

FIGURE 43.6 A CAN identifier on an extended CAN network. The network clause has defined the CAN identifiers to have 7 priority bits and 13 filter bits. The least significant bit of the value corresponds with the bit of the identifier transmitted last. Only legal CAN identifiers can be specified: identifiers with the 7 most significant bits equal to 1 are illegal according to the CAN standard.
The filter bits of a frame are set for each ECU by which the frame needs to be seen. So a frame that is broadcast to all ECUs on the network is assigned filter bits all set to 1. For a frame sent to a single ECU on the network, just one filter bit is set. Figure 43.6 illustrates this; the frame shown is sent to four ECUs.
If an ECU takes an interrupt for just the frames that it needs, then the filtering is said to be perfect. In some systems there may be more ECUs needing to receive frames by interrupt than there are filter bits in the network; in this case, some ECUs will need to share a bit. If this happens, then Volcano will filter the frames in software, using the priority bits to uniquely identify the frame and discarding unwanted frames.
The priority bits are the most significant bits. They indicate priority and uniquely identify a frame. The number of priority bits must be large enough to uniquely identify a frame in a given network configuration. The priority bits for a given frame are set by the relative urgency (or deadline) of the frame. This is derived from how urgently each subscriber of a signal in the frame needs the signal (as described earlier). In most systems 5 to 10 priority bits are sufficient.
The filter bits are the remaining least significant bits and are used to indicate the destination ECUs for a given frame. This is done by treating them as a target mask: each ECU (or group of ECUs) is assigned a single filter bit. The filtering for a CAN controller in the ECU is set up to accept only frames where the corresponding filter bit in the identifier is set. This can give perfect filtering: an interrupt is raised if and only if the frame is needed by the ECU. Perfect filtering can dramatically reduce the CPU load compared with filtering in software. Indeed, perfect filtering is essential if the system integrator needs to connect ECUs with slow 8-bit CPUs to high-speed CAN networks (if filtering were implemented in software, the CPU would spend most of its available processing time handling interrupts and discarding unwanted frames). The filtering scheme also allows broadcast of a frame to an arbitrary set of ECUs. This can reduce the traffic on the bus, since frames do not need to be transmitted several times to different destinations. Because the system integrator is able to define the configuration data, and because that data defines the complete network behavior of an ECU, the in-vehicle networks are under the control of the system integrator.
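The priority/filter split can be sketched in a few lines of C. This matches the 7 priority bits and 13 filter bits of Figure 43.6 (the remaining low-order bits of the 29-bit extended identifier are unused); it illustrates the idea only and is not the actual Volcano implementation.

```c
#include <stdint.h>

/* Illustrative split of a 29-bit extended CAN identifier into priority
 * bits (most significant) and filter bits, per Figure 43.6. This is a
 * sketch of the idea, not the actual Volcano implementation. */
#define ID_BITS       29
#define PRIORITY_BITS  7
#define FILTER_BITS   13
#define UNUSED_BITS   (ID_BITS - PRIORITY_BITS - FILTER_BITS)

/* Build an identifier from a unique priority and a destination mask
 * holding one filter bit per receiving ECU (or ECU group). */
uint32_t make_id(uint32_t priority, uint32_t dest_mask)
{
    return (priority << (FILTER_BITS + UNUSED_BITS))
         | (dest_mask << UNUSED_BITS);
}

/* Hardware-style acceptance test for one ECU: accept the frame only if
 * the ECU's own filter bit is set ("must match 1"; every other identifier
 * bit is "don't care"). */
int ecu_accepts(uint32_t id, unsigned ecu_filter_bit)
{
    return (id & (1u << (ecu_filter_bit + UNUSED_BITS))) != 0;
}
```

For example, a frame built with `make_id(5, (1u << 0) | (1u << 2) | (1u << 7))` is accepted by ECUs 0, 2, and 7 and rejected in hardware by all others, so only those three ECUs ever take an interrupt for it.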
43.3.2.8 Multiprotocol Support
The existing version of VNA supports the complementary, contemporary network protocols CAN and LIN. The next version will also have support for the FlexRay protocol. A prototype version of VNA with
partial MOST support is currently under construction. As network technology continues to advance into
other protocols, VNA will also move to support these advances.
43.3.2.9 Gatewaying
A network normally consists of multiple network segments using different protocols. Signals may be transferred from one segment to another through a gateway node. As implemented throughout the whole tool chain of Volcano Automotive Group, gatewaying of data, even across multiple protocols, is automatically configured in VNA. In this way VNA allows any node to subscribe to any signal generated on any network without needing to know how this signal is gatewayed from the publishing node. Handling of timing requirements over one or more gateways is also handled by VNA. The Volcano solution requires no special gatewaying hardware and therefore provides the most cost-efficient solution to signal gatewaying.
43.3.2.10 Data Export and Import
The VNA tool enables the OEMs to achieve a close integration between VNA and functional modeling tools, and to share data among different OEMs and subcontractors, for example, node developers.
Support for emerging standards, such as FIBEX and XML, will further simplify information sharing and become a basis for configuration of third-party communication layers.
43.4 Volcano Software in an ECU
The Volcano tool chain includes networking software running in each ECU in the system. This software uses the configuration data to control the transmission and reception of frames on one or more buses and to present signals to the application programmer. One view of the Volcano network software is as a communications engine under the control of the system integrator. The view of the application programmer is different: the software is a black box into which published signals are placed, and out of which subscribed signals can be summoned.
The main implementation goals for the Volcano target software are as follows:
• Predictable real-time behavior: no data loss under any circumstances.
• Efficiency: low RAM usage, fast execution time, small code size.
• Portability: low cost of moving to a new platform.
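The black-box view can be modeled with a toy signal table: the application writes published signals and reads subscribed signals, while frame packing and transmission stay hidden behind the configuration data. All names here are invented for illustration and do not reflect the actual Volcano (VTP) API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the "black box" seen by the application programmer.
 * Names are invented; this is not the actual Volcano (VTP) API. */
#define MAX_SIGNALS 16

static struct {
    uint32_t value;
    bool     updated;          /* set on write, cleared on read */
} sig_table[MAX_SIGNALS];

/* Publish: place a signal value into the communications engine. */
void sig_write(unsigned id, uint32_t value)
{
    sig_table[id].value = value;
    sig_table[id].updated = true;
}

/* Subscribe: summon the latest value of a signal; *updated tells the
 * caller whether a new value has arrived since the last read. */
uint32_t sig_read(unsigned id, bool *updated)
{
    if (updated) {
        *updated = sig_table[id].updated;
        sig_table[id].updated = false;
    }
    return sig_table[id].value;
}
```

In such a model, the engine (not the application) decides when and in which frame a written value actually leaves the node, which is what keeps the network under the system integrator's control.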
43.4.1 Volcano Configuration
Building a configuration is a key part of the Volcano concept. A configuration is, as already mentioned, based on details such as how signals are mapped into frames, allocation of identifiers, and processing intervals.
For each ECU, there are two authorities acting in the configuration process: the system integrator and the ECU supplier. The system integrator provides the Volcano configuration for the ECU regarding the network behavior at the system level, and the supplier provides the Volcano configuration data for the ECU in terms of its internal behavior.
43.4.1.1 The Configuration Files
The Volcano configuration data is captured in four different types of files. These are:
• Fixed information, which is agreed between the supplier and the system integrator.
• Private information, which is provided by the ECU supplier. The ECU supplier does not necessarily have to provide this information to the system integrator.
• Network configuration information, which is supplied by the system integrator.
• Target information, which is the supplier's description of the ECU published to the system integrator.
43.4.1.1.1 Fixed Information
The fixed information is the most important in achieving a working system. It consists of a complete description of the dependencies between the ECU and the network. This includes a description of the signals the ECU needs from the network, how often Volcano calls will be executed, and so on. The information also includes a description of the CAN controller(s), and possible limitations regarding reception and transmission boundaries and supported frame modes. The fixed information forms a contract between the supplier and the system integrator: the information should not be changed without both parties being aware of the changes. The fixed information file is referred to as the FIX file.
43.4.1.1.2 Private Information
The private file contains additional information for Volcano that does not affect the network: timeout values associated with signals and which flags are used by the application. The private information file is referred to as the PRI file.
43.4.1.1.3 Network Information
The network information specifies the network configuration of the ECU. The system integrator must define the number of frames sent from and received by the ECU, the frame identifiers and lengths, and details of how the signals in the agreed information are mapped into these frames. Here, the vehicle manufacturer also defines the different frame modes used in the network. The network information file is referred to as the NET file.
43.4.1.1.4 Target Information
The target information contains information about the resources that the supplier has allocated to Volcano in the ECU. It describes the ECU's hardware (e.g., which CAN controllers are used and where they are mapped in memory). The target information file is referred to as the TGT file.
43.4.2 Workflow
The Volcano system identifies two major roles in the development of a network of ECUs: the application designer (which may include the designer of the ECU system or the application programmer) and the system integrator. The application designer is typically located at the organization developing the ECU hardware and application software. The system integrator is typically located at the vehicle manufacturer. The interface between the application designer and the system integrator is carefully controlled, and the information owned by each side is strictly defined. The Volcano tool chain implementation clearly reflects this partitioning of roles.
The Volcano system includes a number of tools to help the system integrator in defining a network configuration. The Network Architect is a high-level design tool, with a database containing all the publish/subscribe information available for each ECU, as described in the previous sections. After mapping the signaling needs onto a particular network architecture, thus defining the connections between the published and subscribed signals, an automatic Frame Compiler is run. The Frame Compiler tool uses the requirements captured earlier to build a configuration that meets those requirements. There are many possibilities to optimize the bus behavior. The frame compiler includes the CAN bus timing analysis and LIN schedule table generation, and will not generate a configuration that violates the timing requirements placed on the system. The frame compiler also uses the analysis to answer "what if?" type questions and guide the user in building a valid and optimized network configuration.
The output of the frame compiler is used to build configuration data specific to each ECU. This is used by the Volcano target software in the ECU to properly configure and use the hardware resources.
The Volcano configuration data generator tool set (V5CFG/V5BND) is used to translate this ASCII text information to executable binary code in the following way:
• When the supplier executes the tool, it reads the FIX, PRI, and TGT files to generate compile-time data files. These data files are compiled and linked with the application program, together with the Volcano library supplied for the specific ECU system.
• When the vehicle manufacturer executes the tool, it reads the FIX, NET, and TGT files to generate the binary data that is to be located in the ECU's Volcano configuration memory (known as the Volcano NVRAM). An ECU is then configured (or reconfigured) by downloading the binary data to the ECU's memory.
Note: It is vital to realize that changes to either the FIX or the TGT file cannot be made without coordination between the system integrator and the ECU supplier. The vehicle manufacturer can, however, change the NET file without informing the ECU supplier. In the same way, the ECU supplier can change the PRI file without informing the system integrator.
Figure 43.7 shows how the Volcano Target Code for an ECU is configured by the supplier and the system integrator.
[Figure 43.7 shows the configuration flow. On the ECU supplier side, the V5CFG configuration tool reads the Fixed, Private, and Target files to produce compile-time data, which is compiled and linked with the application program and the Volcano 5 target library into the program code (ROM/FLASH/EEPROM). On the vehicle manufacturer side, the V5CFG configuration tool reads the Fixed, Network, and Target files, and V5BND target code generation produces the binary data for ECU configuration that is downloaded into the ECU's Volcano 5 NVRAM pool.]

FIGURE 43.7 Volcano Target Code configuration process.
The Volcano concept and related products have been successfully used in production since 1996. Present car OEMs using the entire tool chain are Aston Martin, Jaguar, Land Rover, MG Rover, Volvo Cars, and Volvo Bus Corporation.
Acknowledgments
I wish to acknowledge my colleagues at Volcano Automotive Group, in particular István Horváth, Niklas Amberntsson, and Mats Ramnefors, for their contributions to this chapter.
References
[1] K. Tindell and A. Burns, Guaranteeing Message Latencies on Controller Area Network (CAN), in Proceedings of the 1st International CAN Conference, 1994.
[2] K. Tindell, A. Rajnak, and L. Casparsson, CAN Communications Concept with Guaranteed Message Latencies, SAE paper, 1998.
[3] L. Casparsson, K. Tindell, A. Rajnak, and P. Malmberg, Volcano: A Revolution in On-board Communication, Volvo Technology Report, 1998.
[4] W. Specks and A. Rajnak, The Scaleable Network Architecture of the Volvo S80, in Proceedings of the 8th International Conference on Electronic Systems for Vehicles, Baden-Baden, October 1998, pp. 597–641.
More Information
http://www.VolcanoAutomotive.com
Industrial Automation
44 Embedded Web Servers in Distributed Control Systems
Jacek Szymanski
45 HTTP Digest Authentication for Embedded Web Servers
Mario Crevatin and Thomas P. von Hoff
44
Embedded Web Servers in Distributed Control Systems

Jacek Szymanski
Alstom Transport
44.1 Objective and Contents
44.2 Application Context
44.3 FDWS Technology
    Embedded Server Functions • Embedded Site Structure • Embedded Server Operation • Site Implementation
44.4 Guided Tour to Embedded Server Implementation
    Steps of Embedded Site Implementation Process • Implementation of VFS • Implementation of Look-and-Feel Objects • Implementation of Page Composition Routines • Implementation of Script Activation Routines • Implementation of Application Wrappers • Putting Pieces Together
44.5 Example of Site Implementation in a HART Protocol Gateway
    Structure of the Site Embedded in the Protocol Gateway • Detailed Implementation of Principal Functions • Access to Site Home Page • Access to Parameters of the Gateway • Access to Active Channel List • Access to Channel Parameters • Monitoring of Principal Channel Measure • Access Control and Authentication • Application Wrapper
44.6 Architecture Summary and Test Case Description for the Embedded Server
    Embedded Site Architecture • Test Description • Test Scenarios
44.7 Summing Up
References
44.A1 Appendix: Configuration of VFS
    Programming of VFS Component • BNF of Specification Language • Specification Example
44.1 Objective and Contents
In today's landscape of information technology the World Wide Web (WWW) is omnipresent. Its usage is unavoidable in everyday life and in all domains. The WWW technology has been around for approximately
15 years. From its early days the WWW has passed through many phases, from initial research status through the euphoria of "e-anything" in the late 1990s, to today's mature status in the domains of information broadcast, advertising, and e-commerce.
The main objective of this account is to present the application of web-related technologies in the domain of industrial control, and more specifically in the area of distributed control equipment operating on the shop floor around the fieldbus interconnections. The rationale for using the technology is to provide access to (control) system elements by communication means based on the industrial standards defined around the WWW.
Embedded web servers are now omnipresent within packages proposed by different software editors. The proposed products differ in size, performance, price, architecture, and application area. The objective of this account is not to provide a review of the existing products or a comparison of them. Rather than reviewing the features of ready-made solutions, the account proposes to go through the technological bases on which the construction of these applications relies, using the example of an existing software application called in the sequel the Field Device Web Server (FDWS).
The FDWS was implemented with the objective of enhancing the operational functionality of a large class of distributed control system architectures, and especially the fieldbus-based parts of these architectures, by providing them with the power and the flexibility of internet technology.
The account outlines the design of embedded web servers in two steps. First, it presents the context in which embedded web servers are usually implemented. Second, it sketches the structure of an FDWS application, with a presentation of its component packages and the mutual relationship between the content of the packages and the architecture of a typical embedded site. The main motivation of the account is, however, to show the user an exemplary approach to embedded site design. For this reason, an illustrated real-life example presents the details of the design, implementation, and test trials of an embedded website implemented in an existing field device.
44.2 Application Context
To sketch the impact of the technology on control applications, it is important to identify the location of a field device in a typical architecture of a control system, as shown in Figure 44.1. The field device is part of an automation cell: a collection of cooperating instruments that realize a well-defined automation function. These devices are of different levels of complexity, from simple sensors with extremely limited functions to process computers equipped with powerful processors and large memory banks containing several embedded software programs.
Automation cells of a control system cooperate in order to implement a coherent control application. The cooperation is possible thanks to information exchange via a higher-level network that forms the system backbone. The backbone links the automation cells with the control room supervisory computers that provide the interface between the system and human operators. So the global control applications are structured into two collections of functions:
• Automation functions, implemented in automation cells.
• Supervisory functions, implemented in control room computers.
Information exchange between the two parts is based almost exclusively on the client-server paradigm.
The idea at the base of the development of the FDWS has its origins in the analysis of the structure of the WWW, and in the observation that it makes a perfect example of a successful application of the interoperability principle applied to diverse software products. The interoperability of internet products (clients and servers) is based on universally accepted standards, namely:
• the TCP/IP protocol for reliable data transport;
• the HTTP protocol for application information exchange;
• the XML/HTML format for information presentation and structuring.
[Figure 44.1 shows field devices attached to a fieldbus within an automation cell, linked via an interconnecting network to a control room console and a process computer.]

FIGURE 44.1 Place of field devices in an automation system. (From J. Szymanski, Proceedings of WFCS 2000, September 6–8, ISEP, Porto, Portugal, 2000, pp. 301–308. With permission. Copyright 2000 IEEE.)
[Figure 44.2 shows the client tier, server tier, and application tier of an internet client-server application.]

FIGURE 44.2 Three-tier architecture of an internet client-server application. (From J. Szymanski, Proceedings of WFCS 2000, September 6–8, ISEP, Porto, Portugal, 2000, pp. 301–308. With permission. Copyright 2000 IEEE.)
Another successful application of universally accepted standards concerns the architectural pattern. The internet-based distributed applications follow the principle of multitier architecture (Figure 44.2), which makes use of universal client and server frameworks independent of the nature of the data processed, on the condition that the data exchanged over the network are transported via the HTTP protocol and are structured according to the XML/HTML format.
The multitier architecture standardizes the basic services of the client and server parts of the architecture. In the majority of configurations, the client part is totally independent of the application (the so-called thin client). It is important to state that the properties and advantages of the multitier architecture are independent of the technology used for implementation of its components, and are valid for both embedded and nonembedded implementations.
The universal nature of the client places the burden of application personalization on the server side, which is interfaced directly with the embedded application software. This configuration is at the origin of the internal architecture of the server described in the further sections of the account.
This architecture simplifies the development process of newly created applications. As one can see, the client tier of the architecture is totally standard and as such does not have to be developed. A big part of the server tier is also based on generic modules, such as the TCP/IP and socket modules, which are included in the standard deliveries for the majority of implementation platforms. The FDWS is designed to interface with these modules and provides a large degree of independence from the implementation details.
Not all application-independent modules exist on all implementation platforms, and FDWS provides a collection of software modules that can be included in an application. It is important to state here that the FDWS is only an exemplary presentation of embedded web technology. For other implementations, consider Reference 1.
Figure 44.2 shows the three-tier version of the architecture, where the generic parts described above are completed by the application-dependent part, which is considered monolithic. In the quest for further factorization of the design, this monolithic software block can be split into thinner layers.
The application-dependent part of the server tier is to be developed for each application. Despite the evident advantages of a standard architectural pattern, the design and implementation process for this part of the embedded site is not an easy task. The reason is that it requires technical fluency in four disciplines:
• Comprehension of the basic application implemented by the equipment hosting the site. This is because the operation of the hosting equipment should be enhanced rather than modified: the embedded site is then an extension of the existent application.
• Skill in the creation of HTTP-based websites, implying knowledge of technologies such as CGI scripting, HTML, Java, JavaScript, etc. This is because these techniques are the basis of operation of components executed in the generic client.
• Good knowledge of the constraints of the platform on which the site is to be installed. This is imposed by the principle of minimal interference with the basic application, and should influence the complexity of the site structure as well as the size of the site components.
• Good comprehension of the FDWS technology, at least of the part in charge of data transfer through the server tier toward the application tier.
The guided tour through the development of an application embedded in a field device is described in the sections which follow.
44.3 FDWS Technology
44.3.1 Embedded Server Functions
The place of the server tier in the structure presented in Figure 44.2 shows its role in the architecture of the application. This role has nothing to do with the organic mission of the hosting equipment; that is, the server software will not directly intervene in the execution of control algorithms when installed on a Programmable Logic Controller (PLC), nor will it do any protocol conversion when installed in a protocol gateway such as the HART/FIP converter described below. Rather, the role of the embedded server covers the following:
1. The embedded server can store and serve the complete interface to the application within the equipment.
2. The server can activate routines that are able to extract and interpret orders sent from the client part and modify the application status via an accessible interface.
2006 by Taylor & Francis Group, LLC
Embedded Web Servers in Distributed Control Systems 44-5
3. Server-resident routines can extract the information coming from the application, format it, and wrap it into HTML pages in order to provide them to the client side.
4. Dynamically generated user-interface components can easily manage the evolution of the visual aspect of the user interface as a function of the application status; this can ease operations such as anomaly signaling.
5. Internal server mechanisms provide the possibility of easy implementation of password-based security locks.
In this context, the server role consists in providing a highly flexible and relatively easily implemented interface. This interface provides remote clients with controlled and configurable access mechanisms to the data, structure, status, and processing modes of the (organic) applications embedded in control system devices.
The FDWS software is designed to implement all the required functions of an embedded server. These functions express the requirements from the point of view of the final user and have to be reformulated in terms of communication architecture. From this viewpoint the embedded site takes the shape of an HTTP protocol server operating above the TCP/IP transport. The basic functions of such an entity are:
1. Management of connections coming from distant clients.
2. Analysis of clients' requests, in terms of syntax and semantics.
3. Maintenance of local server objects in view of their access by distant clients.
4. Decision to grant or refuse access to server objects; composition and transmission of responses corresponding to clients' requests.
5. Execution of the processing expressed in clients' requests.
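Function 2 in the list above, request analysis, can be illustrated with a small sketch. The routine below is not part of FDWS; its name and behavior are hypothetical, showing only how the first line of an HTTP request might be split into its syntactic parts in C:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: split the first line of an HTTP request,
 * e.g. "GET /index.html HTTP/1.0", into method, path and version.
 * Returns 0 on success, -1 when the line is not well formed. */
int parse_request_line(const char *line,
                       char *method, size_t mlen,
                       char *path, size_t plen,
                       char *version, size_t vlen)
{
    char m[16], p[128], v[16];

    /* Three whitespace-separated tokens are required. */
    if (sscanf(line, "%15s %127s %15s", m, p, v) != 3)
        return -1;
    /* The third token must name an HTTP version. */
    if (strncmp(v, "HTTP/", 5) != 0)
        return -1;
    snprintf(method, mlen, "%s", m);
    snprintf(path, plen, "%s", p);
    snprintf(version, vlen, "%s", v);
    return 0;
}
```

A real parser would additionally recover the header fields and CGI variables mentioned later in the chapter.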
The FDWS software is structured in five interrelated software packages. Each of the basic server functions is supported by one or more software modules. The roles of the packages are explained in Table 44.1.
The software is organized in five packages for better design, easier deployment, and maintainability. Figure 44.3 shows the mutual interdependence among the packages. In a typical implementation, the modules from all five packages have to be used in embedded site construction.
44.3.2 Embedded Site Structure
The architecture of an embedded server does not differ in principle from the architecture of a regular
(nonembedded) web-enabled application.
TABLE 44.1 Package Functions of FDWS Software
Main server engine: in charge of the connection management process; this package groups the modules that realize the functions of server engine operation, network adaptation, and support of persistence of request data
HTTP request parser: in charge of request analysis; this package implements the parsing of the PDU and the building of the CGI environment
Embedded file system: in charge of controlling access to server objects; this package provides the elements that support the implementation of an embedded equivalent of a disk file system
Dynamic page generator: in charge of on-the-fly generation of server responses (servlets); this package provides the elements that support the implementation of dynamically generated HTML pages
Embedded response composer: in charge of response composition
44-6 Embedded Systems Handbook
[Figure 44.3 diagram: boxes for the HTTP request parser, main server engine, embedded response composer, dynamic page generator, and embedded file system packages, showing their mutual interdependence.]
FIGURE 44.3 General architecture of FieldWebServer software. (From J. Szymanski, Proceedings of WFCS 2000, September 6-8, ISEP, Porto, Portugal, 2000, pp. 301-308. With permission. Copyright 2000 IEEE.)
In the most general terms, every server tier is composed of three basic elements:
1. Generic server body: the principal active component that loops listening for incoming service requests and processes them; request processing consists in:
(a) Parsing the Protocol Data Unit (PDU) syntax.
(b) Recovering environment variables in order to support server operations.
(c) Identifying requested resources together with the operations to be applied to them.
The generic server body is in principle independent of the applications in which it is incorporated. Its basic elements are the server engine, request parser, response composer, and persistence module.
2. Virtual File System (VFS): an embedded object repository organized like the file system of a typical computer. It is an active component implementing the logistics of server page management. This component helps manage the collection of objects in direct contact with the application.
3. Collection of application-specific components: elements that implement both the look-and-feel part of the application (HTML pages, compressed images, Java applets, ActiveX controls) and its dynamics (embedded scripts and servlets). These components, which are managed by the VFS, are designed to convey data between the client part and the essential application. Naturally, these elements are totally dependent on the application.
Analysis of the structure of the embedded server reveals yet another building block of the architecture: the application wrapper. This block is very often introduced into the device structure for convenience. Its role consists in adapting the functional interface of the basic application to the needs of the page composition module. The structure of this block is totally dependent on the basic application. Its construction is not supported by the modules of the FieldWebServer, and for this reason it is left outside of the server-tier structure.
Taking these considerations into account, the whole Internet-based server architecture in the context of a control device can be represented by the schematic in Figure 44.4. The left part of the schematic shows the software architecture at run time, which reveals the mutual relationships among all building block instances. The right part of the schematic shows the organization of the FieldWebServer module library.
44.3.3 Embedded Server Operation
Application of FDWS technology in a device is possible if and only if three conditions are fulfilled:
The execution model of the software embedded in the device is based on the multiprocess paradigm; it is necessary that all server operations be encapsulated in a separate thread of execution.
[Figure 44.4 diagram: two views of the server stack, with blocks for the basic application, application wrapper module, VFS, page composition module, HTML pages, images and applets, CGI scripts, server engine with request analyser, response composer and persistence module, socket presentation layer, TCP/IP, and fieldbus stack (layers 1 and 2).]
FIGURE 44.4 Architecture of an Internet-based application in a fieldbus equipment. (From J. Szymanski, Proceedings of WFCS 2000, September 6-8, ISEP, Porto, Portugal, 2000, pp. 301-308. With permission. Copyright 2000 IEEE.)
The device processing power is sufficient to support the additional computational burden caused by server operation.
The basic (organic) application of the device exposes a well-defined programming interface that provides server routines with the means of accessing the application data.
Putting the server into execution requires two operations, which should be executed in order:
The server data that implement internal server objects have to be configured.
The server engine routine has to be activated by the device monitoring program in an independent execution thread.
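The two startup operations can be sketched as follows. Everything here is illustrative: configure_server_data, server_engine_thread, and start_embedded_server are hypothetical names; a real FDWS site would perform its configuration and then hand control to the engine routine (server_boot) inside the new thread:

```c
#include <pthread.h>
#include <stddef.h>

/* Operation 1: configure the data that implement the internal server
 * objects. A real site would set up the VFS root, the CGI root, and
 * the two callback pointers; the sketch only records the fact. */
static int server_configured = 0;

static void configure_server_data(void)
{
    server_configured = 1;
}

/* Operation 2: the engine routine runs in its own thread. A real
 * implementation would call the FDWS engine (server_boot) and never
 * return; the sketch just verifies that configuration came first. */
static void *server_engine_thread(void *arg)
{
    (void)arg;
    return server_configured ? (void *)1 : NULL;
}

/* Execute the two operations in the required order. */
int start_embedded_server(unsigned short port)
{
    pthread_t tid;
    void *result;
    (void)port;                       /* would be passed to server_boot */

    configure_server_data();          /* operation 1 */
    if (pthread_create(&tid, NULL, server_engine_thread, NULL) != 0)
        return -1;                    /* operation 2 failed */
    pthread_join(tid, &result);       /* a real server would not join */
    return result != NULL ? 0 : -1;
}
```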
44.3.3.1 Minimal Server Interface
Efficient development of embedded servers using the FDWS technology is possible only when the designer understands the basic elements of the server interface and server operation. The structure of the FDWS software allows the user to access the interface of all the modules from which its packages are constructed. This means that an advanced user can call more than a hundred functions and has direct access to many tens of global variables. Doing so is a very complex task and in the majority of cases unnecessary. All that an average FDWS user needs to be aware of is limited to some five modules from three packages. When the user has adequate software configuration tools at hand, his or her knowledge of the large FDWS interface can be limited to three data types, three global variables, and fewer than 10 routine calls.
Normal server operation is composed of two phases:
Initialization and configuration of internal data structures; this phase is executed only at server thread startup.
Activation of the main server loop (the server engine routine), lasting until the server task is destroyed.
Both phases are described below.
44.3.3.2 Init Phase
The server engine routine operates on four global variables, all exposed by the modules of the request parser package and the module of the server engine package. The variables have the following meaning:
Pointer to VFS root. Pointer to the data structure which is the root of the VFS. The VFS represents the store of server-owned objects and is structured like a disk file system. It should contain all the passive objects that the server is supposed to provide: HTML pages, images of all formats, and Java applets.
Pointer to CGI root. Pointer to the data structure which represents the structured store of active server objects, commonly referred to as CGI scripts.
Script exec routine. Pointer to a routine which is in charge of the activation of CGI script routines.
Page compose routine. Pointer to a routine which is in charge of the composition of passive objects stored in the VFS.
Comprehension of the role of these variables is the key to understanding server operation, since the software user must provide adequate values for them. All four variables should be duly initialized prior to activation of the server engine routine, or the server will not be able to provide any object or execute any CGI routine. The initialization code should be written by the application programmer according to the principles described in Section 44.3.3.3.
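A minimal sketch of this interface, with hypothetical type and variable names standing in for the FDWS ones, might look as follows; the guard function illustrates why all four pointers must be set before the engine starts:

```c
#include <stddef.h>

/* Hypothetical types standing in for the FDWS ones. */
struct vfs_node { int unused; };      /* passive-object tree node */
struct cgi_node { int unused; };      /* active-object tree node  */
typedef int (*script_exec_fn)(const char *name);
typedef int (*page_compose_fn)(const char *name);

/* The four globals the engine operates on (names are illustrative). */
struct vfs_node *vfs_root             = NULL;
struct cgi_node *cgi_root             = NULL;
script_exec_fn   script_exec_routine  = NULL;
page_compose_fn  page_compose_routine = NULL;

/* Engine-side guard: a server started before all four variables are
 * initialized could serve no object and execute no CGI routine. */
int server_ready(void)
{
    return vfs_root != NULL && cgi_root != NULL &&
           script_exec_routine != NULL && page_compose_routine != NULL;
}

/* Dummy values used only to illustrate the initialization step. */
static struct vfs_node the_vfs_root;
static struct cgi_node the_cgi_root;
static int demo_exec(const char *name)    { (void)name; return 0; }
static int demo_compose(const char *name) { (void)name; return 0; }

void init_server_globals(void)
{
    vfs_root             = &the_vfs_root;
    cgi_root             = &the_cgi_root;
    script_exec_routine  = demo_exec;
    page_compose_routine = demo_compose;
}
```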
44.3.3.3 Operation Phase
The initialization phase configures vital server resources. All passive and active components of the embedded site are placed within reach of the server engine via the VFS root and CGI root pointers. The engine is also provided with its methods of reaction to client requests via the two user routines pointed to by script exec routine and page compose routine. The operation phase can now be effectively activated. The FDWS software contains a ready-made server engine that implements the operation phase according to a predefined scheme that suits the majority of cases. This standard server engine is implemented by a routine from the server engine package.
The signature of the routine is as follows:
int server_boot (unsigned short service)
where the formal parameter service is the TCP port number on which the server listens for connections from distant clients.
The standard server engine provided within the FDWS software implements the policy of an iterative server; that is, a server which processes the requests received from distant clients sequentially. This means that two transactions are never executed by the server at the same time. If a series of requests arrives at the site, their data are queued in the buffers of the communication software modules.
Server engine operation entails execution of an initialization step followed by a loop over a sequence of five steps. The most important operation of the initialization step is an attempt to create an access point to the network via a passive socket of SOCK_STREAM type, bound to the IP address of the hosting station and to the TCP port number passed as the procedure parameter.
If this operation succeeds, control passes to the loop; otherwise, the routine exits with an error code and an error message that is passed to the system of the embedding device.
Success in the creation of this socket, often referred to as the main socket, enables the program to enter the processing loop. Server operation then proceeds in the following steps:
Step 1. The server waits passively for incoming requests from distant clients. On arrival of a client's request, the server attempts to accept the connection request and open a secondary (stream-type) socket. If this operation fails, the secondary socket is not created and the execution thread returns to listening on the main socket. In the case of success, step 2 is executed.
Step 2. The server reads and parses the request data unit received via the secondary socket. In this second step of the processing loop, the HTTP request coming from a distant client is received and analyzed. This analysis is done by a set of routines grouped in the HTTP parser package.
If the request structure is recognized as conformant with the version of the protocol implemented by the server, the parsing routine extracts all the important data, which allow the server to elaborate the response. These data concern the following parameters contained in the request data unit:
Protocol version
Requested HTTP service (GET or POST)
Full identification of the serviced object (path within the server internal structure, object name, extension)
Object class (HTML page, CGI routine, applet, image, . . .)
Browser options (type, accepted formats, accepted languages, OS type, etc.)
Request parameters (optionally, if included in the request)
CGI variables (optionally, if included in the request)
If the request data are nonconformant, the analysis is abandoned and the parsing routine returns an error status. If an error occurs, loop control is transferred to step 4; otherwise, processing continues with step 3.
Step 3. The server searches for the object identified by the request analysis, then prepares and sends the response data unit. Successful termination of the request analysis provides the server with all the data necessary to elaborate a response matching the received request. The step of response preparation is decomposed into three sequentially executed actions:
Identification of the object class (passive object or CGI script).
Object search within one of the server object repositories. To execute this action, object management routines exploit the data provided by the user in the initialization phase (the server page root and CGI script root pointers to the object repositories). If the requested object is found, the next action is executed; otherwise, a standard "not found" page is sent back to the client.
Object composition. To execute this action, generic server routines call the user-provided routines plugged into the loop thread in the initialization phase via the user-configured pointers script exec routine and page compose routine.
The generic part of these actions is implemented by the routines from the server engine and VFS packages. Execution of this step always transfers execution control to step 5.
Step 4. Error report. This step is the alternative to step 3 and is executed only when the analysis of the received request declares its structure nonconformant with the conventions recognized by the server. In such a case, a standard error-notifying page is sent back to the client. This situation should be considered an implementation of a graceful failure mode.
Step 5. Connection closure. According to the HTTP protocol requirements, the complete transaction between the client and the server should be terminated by a request for connection closure initiated by the server. The routine's operation is limited to closing the secondary socket.
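The control flow of the five steps can be condensed into a small sketch (the step numbering follows the text; the function and enum names are invented for illustration):

```c
/* The five loop steps, numbered as in the text. */
enum step {
    STEP_ACCEPT  = 1,  /* wait for and accept a connection     */
    STEP_PARSE   = 2,  /* read and parse the request data unit */
    STEP_RESPOND = 3,  /* search object, send the response     */
    STEP_ERROR   = 4,  /* report a nonconformant request       */
    STEP_CLOSE   = 5   /* close the secondary socket           */
};

/* Given the current step and whether it succeeded, return the next
 * step, following the transitions described in the text. */
int next_step(int step, int ok)
{
    switch (step) {
    case STEP_ACCEPT:  return ok ? STEP_PARSE : STEP_ACCEPT;
    case STEP_PARSE:   return ok ? STEP_RESPOND : STEP_ERROR;
    case STEP_RESPOND: return STEP_CLOSE;   /* step 3 always goes to 5 */
    case STEP_ERROR:   return STEP_CLOSE;   /* step 4 also ends in closure */
    case STEP_CLOSE:   return STEP_ACCEPT;  /* back to passive waiting */
    default:           return STEP_ACCEPT;
    }
}
```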
Application of the ready-made solution freezes the main features of the embedded site. For example, the request processing policy is imposed to be iterative. The advantage of such a solution for an embedded application is evident: there are no problems with concurrent accesses to server-owned objects. The disadvantage is a loss of efficiency, since some client requests can be rejected by the underlying network software if the processing loop fails to fetch arriving requests at a sufficiently rapid pace.
/* crude server loop routine */
void server_loop(
unsigned short service_port_nr,
tcallback parse_request_routine,
tcallback generate_response_routine,
tcallback error_report_routine,
tcallback closing_routine);
FIGURE 44.5 Code of the server routine body. (From J. Szymanski, Proceedings of WFCS 2000, September 6-8, ISEP, Porto, Portugal, 2000, pp. 301-308. With permission. Copyright 2000 IEEE.)
The advanced user is not obliged to follow the standard engine. The server engine package contains skeletal support for user-configurable engines, implemented by the routine server_loop, which has five parameters as shown in Figure 44.5.
The routine's first parameter is the TCP port number; the four others are pointers to user-provided routines that should implement steps 2, 3, 4, and 5 of the loop described above. If any of these parameters is set to NULL, the corresponding step of the loop is implemented by the default routine provided by the FDWS packages. The application programmer can thus take over control of any of the execution phases by providing a pointer to his or her own routine.
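The override mechanism can be illustrated with a mock. The routine below is not the FDWS server_loop; it merely mimics its NULL-means-default convention to show how many steps remain on the default implementations when the user supplies a single custom routine:

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical callback type in the spirit of Figure 44.5. */
typedef int (*tcallback)(void *request);

/* A user-provided replacement for step 4 (error report). */
int my_error_report(void *request)
{
    (void)request;
    fputs("custom error page served\n", stderr);
    return 0;
}

/* Mock of the NULL-means-default convention: count how many of the
 * four customizable steps fall back to the FDWS default routines. */
int count_default_steps(tcallback parse, tcallback respond,
                        tcallback error, tcallback closing)
{
    int defaults = 0;
    if (parse == NULL)   defaults++;  /* step 2 */
    if (respond == NULL) defaults++;  /* step 3 */
    if (error == NULL)   defaults++;  /* step 4 */
    if (closing == NULL) defaults++;  /* step 5 */
    return defaults;
}
```

With the real routine the corresponding call would be server_loop(80, NULL, NULL, my_error_report, NULL), overriding only the error-report step.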
44.3.4 Site Implementation
The programmer who wishes to implement the embedded site should undertake the following steps:
Create the routines that fill in the appropriate memory regions with embedded HTML objects (pages, page templates, images, applets).
Program the routines that implement the CGI scripts referenced within the embedded objects.
Provide the routines that generate the data structures enabling the management of all repositories included in the VFS. These structures should contain references to the memory regions in which the server-embedded objects are stored, and should also hold the addresses of all routines implementing CGI scripts. These routines should assign the reference of the data structure to the server page root and the script exec routine pointers.
Provide the routines that determine the actions to be undertaken on invocation of each server object and on activation of each CGI script; references to these routines should be assigned to the send_page_routine and send_script_routine pointers (the implementations of script exec routine and page compose routine, respectively).
Write an initialization routine that activates all the necessary configuration actions described above.
Create the server process that calls the initialization routine and activates the server engine routine.
All the steps need to be realized according to certain rules described in the next section.
44.4 Guided Tour to Embedded Server Implementation
44.4.1 Steps of Embedded Site Implementation Process
The application programmer should understand the main features of FDWS software operation. If he or she accepts the predefined operation mode, nothing needs to be modified or extended in the routine implementing the main server loop, described in Section 44.2.4. In case he or she decides to customize one or more phases of the main server loop, the development effort increases. In any case, the entire application-dependent part of the embedded site has to be developed.
The embedded site is placed on the target platform as an executable object. It is of no importance whether it is statically linked and loaded with the main application or dynamically linked when other software processes are already running. This detail depends on the platform.
The scenario described in this section is based on the following assumptions:
The server is activated in a separate process.
The server process is implemented as relocatable object code statically linked with the main application.
The modules implementing reusable server mechanisms are placed in a static library and are linked with the code of the server process.
The presentation below concentrates on the development of the application-dependent software. This software implements the following elements of the embedded site:
The VFS tree, which determines the skeleton of the site structure. The VFS tree holds the references of all objects forming part of the site and provides the mechanism of search for server objects: pages, applets, images, and CGI scripts. The application developer constructs this part of the embedded site using the routines from the VFS package. This work can be tedious and complicated when done manually, but can easily be mechanized by using the configuration tool presented below.
Embedded look-and-feel objects, which are data structures representing embedded passive objects (page frames, applets, images). These data are usually implemented as octet arrays residing in memory regions referenced by the VFS tree nodes. In the usual development process these objects are designed and implemented with tools adapted to the object nature (HTML editors, image editors, Java development environments). The necessary step of transforming their standard representation formats (ASCII files, gif/jpeg files, byte code) into byte arrays loadable into device memories is to be supported by appropriate tools.
Page composition routines, which merge application-dependent data with static page frames in order to form complete HTML pages that incorporate the application status; these routines are to be programmed manually or generated from a user-friendly notation.
Routines representing active server objects (CGI scripts, dynamic pages) executed on requests received from the client. These routines serve to integrate application data into server pages. The routines usually reuse generic functions provided by the FDWS packages; their design is highly dependent on the application and only a manual development process is possible.
Script launching routine.
Application wrapper, which extracts useful information from the basic application of the hosting device.
Initialization routine, which assigns the appropriate values to the four pointer variables of the server interface.
Server process code, which calls the initialization routine and bootstraps the main server loop.
The mutual relationships among these elements as well as the relationship with the library of server
modules are presented in Figure 44.6. Their construction is described in detail, point by point, in the
following sections.
44.4.2 Implementation of VFS
The basis of the VFS is the data structure manipulated by the routines that look for the HTML objects referenced in client requests. The standard implementation of the site architecture assumes that this data structure is composed of two separate lookup trees, called repositories:
Passive object repository, holding the references of all HTML pages, images, and applets; the root of this tree is referenced by the repository root pointer.
Active object repository, holding the references to the routines that implement the CGI scripts; the root of this tree is referenced by the script routine pointer.
[Figure 44.6 diagram: the custom components of the site (script launcher, VFS, page composer, INIT, application wrapper, pages/images/applets, CGI scripts, servlets and dynamic pages) arranged around the application and the FDWS modules.]
FIGURE 44.6 Relationships of elements of embedded site. (From J. Szymanski, Proceedings of WFCS 2000, September 6-8, ISEP, Porto, Portugal, 2000, pp. 301-308. With permission. Copyright 2000 IEEE.)
The tree is built of three types of nodes:
Repository root. This is the unique entry point to each data structure. This object holds the list of references to the other elements of the tree: embedded directories and embedded files. One of the embedded files, called by default server page, plays a special role in the process of response composition. The tree root can also hold a reference to the list of authentication records, whose role and structure are described below.
Embedded directory node type. This type of node plays the role of the root of a subtree within the server structure. It holds the list of references to other embedded directories and/or embedded files. It can also hold a reference to the list of authentication records.
Embedded file node type. This node is a tree leaf that directly holds the reference to the data necessary to compose and send the requested object.
An example of the structure of the repository is presented in Figure 44.7.
The data types of the objects that form the structure of the VFS repositories are defined in the VFS package. Repository creation and tree growth are obtained by successive calls to the package's routines. One of the possible sequences of calls that implements the creation of a page repository like that in Figure 44.7 is as follows:
1. Creation of the tree root
2. Creation of the by-default page tree node
3. Append the default page to the tree root
[Figure 44.7 diagram: a repository root holding the by-default page and the directories public, images, and javadir.]
FIGURE 44.7 Example of VFS repository.
4. Creation of an embedded directory node named public
5. Creation of a series of embedded file nodes and insertion of the nodes into the directory
6. Appending of the directory node to the repository
7. Repetition of steps 4, 5, and 6 for the directories images and javadir
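The seven steps above can be sketched with a simplified node type (the struct and routine names are hypothetical, and the real VFS package routines differ; the by-default page is given an illustrative name):

```c
#include <stdlib.h>

/* Hypothetical node type modelling the VFS tree; the real package
 * distinguishes root, directory, and file nodes with richer data. */
typedef struct enode {
    const char   *name;
    int           is_dir;            /* directory (or root) vs file */
    struct enode *child;             /* first entry of a directory   */
    struct enode *sibling;           /* next entry at the same level */
} enode;

static enode *node_new(const char *name, int is_dir)
{
    enode *n = calloc(1, sizeof *n);
    if (n) {
        n->name = name;
        n->is_dir = is_dir;
    }
    return n;
}

static void node_append(enode *parent, enode *child)
{
    child->sibling = parent->child;  /* prepend, for simplicity */
    parent->child = child;
}

/* Build the repository of Figure 44.7 following steps 1 to 7. */
enode *build_example_repository(void)
{
    enode *root = node_new("/", 1);                   /* step 1 */
    node_append(root, node_new("index.html", 0));     /* steps 2 and 3 */
    const char *dirs[] = { "public", "images", "javadir" };
    for (int i = 0; i < 3; i++) {                     /* steps 4 to 7 */
        enode *dir = node_new(dirs[i], 1);
        node_append(dir, node_new("placeholder", 0)); /* step 5 */
        node_append(root, dir);                       /* step 6 */
    }
    return root;
}
```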
It is important to note that the procedure detailed above constructs only the containers, which must then be filled with references to the data that actually implement the embedded objects. These references should point to directly exploitable data (the byte arrays mentioned in the preceding sections), which are generated as the result of a separate operation, described in the section which follows.
The method of creating the active object repository is nearly identical to the one described above.
44.4.3 Implementation of Look-and-Feel Objects
The data structures corresponding to the repositories of the VFS enable the usual operations of a file management system: file creation, deletion, search, access to the data stored in a file-type node, and activation of the routine referenced by a script-type node. They do not directly contain any data or code; these must be created by separate operations. These data are implemented by the passive server objects contributing to the look-and-feel aspects of the embedded site. The objects are placed inside memory regions accessible to the server routines. They take two different forms:
Static HTML pages and HTML page templates are represented as character strings
Embedded images and embedded Java applets are stored as byte arrays
The difference in storage form is explained by the method of object processing in the response composition phase of server operation. HTML pages are composed of printable characters only and never contain a null character, which is used uniquely as the marker of the page (string) terminator. The same assumption holds neither for images in .gif and .jpeg formats nor for applets. These objects can (and do) contain nonprintable bytes, including the null character, and cannot be stored as character strings. Their storage format should follow the pattern of a byte array of known size.
From the external point of view, embedded pages and embedded images do not differ from regular (nonembedded) ones, and there is no reason for them not to be created with the tools which usually serve to edit them (Microsoft FrontPage, Netscape Composer, Microsoft Image Composer, etc.). Standard tools create standard storage formats, compatible with the file system of the hosting platform. This is the main problem in the creation of embedded sites, since the standard storage formats are not directly useful in the construction of the server custom component. The output files produced by the tools have to be transformed into modules that can be linked (statically or dynamically) with the code of the other server modules. An example of such a module, embedding an image in .gif format, is shown in Figure 44.8.
extern const unsigned char aautobull2_img[];
extern int aautobull2_img_length;
const unsigned char aautobull2_img[] = {
0x47, 0x49, 0x46, 0x38, 0x39, 0x61, 0x0c, 0x00, 0x0c, 0x00, 0xb3, 0xff, 0x00, 0xff, 0xff, 0x66
, 0xff, 0xff, 0x33, 0xff, 0xff, 0x00, 0xcc, 0xff, 0x00, 0xc0, 0xc0, 0xc0, 0x99, 0xff, 0x00, 0x99
, 0xcc, 0x00, 0x99, 0x99, 0x00, 0x99, 0x66, 0x00, 0x66, 0x99, 0x00, 0x66, 0x66, 0x00, 0x33, 0x66
, 0x00, 0x33, 0x33, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x21, 0xf9, 0x04
, 0x01, 0x00, 0x00, 0x04, 0x00, 0x2c, 0x00, 0x00, 0x00, 0x00, 0x0c, 0x00, 0x0c, 0x00, 0x00, 0x04
, 0x42, 0x90, 0xc8, 0x49, 0x6b, 0xbb, 0xb7, 0x92, 0x86, 0x42, 0x10, 0x47, 0x43, 0x35, 0x02, 0xe0
, 0x09, 0x83, 0x21, 0x6e, 0x88, 0x79, 0x0e, 0x83, 0x22, 0x92, 0x81, 0x39, 0x08, 0xc6, 0x91, 0xcc
, 0xde, 0x57, 0x14, 0xba, 0xdd, 0x46, 0x40, 0x84, 0x19, 0x14, 0x0b, 0xd9, 0xe6, 0x00, 0x1b, 0x1c
, 0x16, 0x50, 0xc6, 0xaa, 0x31, 0x28, 0x18, 0x12, 0x48, 0xe9, 0x48, 0xa1, 0x53, 0x68, 0x2d, 0x98
, 0x95, 0x24, 0x02, 0x00, 0x3b};
int aautobull2_img_length = 149;
/* */
FIGURE 44.8 Code snippet implementing an embedded .gif image. (From J. Szymanski, Proceedings of WFCS 2000, September 6-8, ISEP, Porto, Portugal, 2000, pp. 301-308. With permission. Copyright 2000 IEEE.)
extern char* transpassword_str;
extern int transpassword_str_length;
static const unsigned char transpassword_str_array[] = {
0x3c, 0x68, 0x74, 0x6d, 0x6c, 0x3e, 0x0a, 0x0a, 0x3c, 0x68, 0x65, 0x61, 0x64, 0x3e, 0x0a, 0x3c
, 0x74, 0x69, 0x74, 0x6c, 0x65, 0x3e, 0x50, 0x61, 0x73, 0x73, 0x77, 0x6f, 0x72, 0x64, 0x20, 0x45
, 0x2d, 0x2d, 0x3e, 0x3c, 0x2f, 0x66, 0x6f, 0x6e, 0x74, 0x3e, 0x3c, 0x2f, 0x62, 0x6f, 0x64, 0x79
, 0x3e, 0x0a, 0x3c, 0x2f, 0x68, 0x74, 0x6d, 0x6c, 0x3e, 0x0a, 0x00};
char* transpassword_str = (char*) (&transpassword_str_array);
int transpassword_str_length = 1546;
FIGURE 44.9 Code snippet representing an embedded HTML page.
The module contains two variables:
A byte array that holds the data normally placed within a disk file; the reference of this variable should be passed as a parameter in the call to the file creation routine.
An integer variable storing the length of the array; this variable is used by the routines that serve the object data over the network.
Nearly the same format can be used to store HTML pages, with the difference shown in Figure 44.9.
In this second module, the byte array is encapsulated within the module and only its reference, cast to a type compatible with the character string type, is exported. It can also be seen that a final null character is placed at the end of the byte array. This enables the array to be processed in exactly the same way as a character string.
Transformation of the standard storage formats into the modules shown above is done by simple programs
which read disk files and generate the appropriate modules automatically. The principle of their operation is
shown in Figure 44.10.
The memory regions corresponding to the files are reserved and filled in at build time of the server code
by the operation of the compiler and linker producing the object code of the appropriate modules. The same
process leads to the resolution of references to the regions held within the data structures of the repositories of the VFS.
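The converter programs themselves are not listed in this chapter; a minimal sketch of such a file-to-module generator, using illustrative names rather than the actual embedpage/embedbin sources, could look as follows:

```c
/* Sketch of an embedbin-style converter: reads a disk file and emits a
 * compilable C module containing the byte array and its length.
 * All names here are illustrative, not those of the FDWS tools. */
#include <stdio.h>

/* Write a C module for variable `varname` holding the bytes of stream
 * `in`; returns the number of data bytes written. */
long emit_module(FILE *in, FILE *out, const char *varname)
{
    int c;
    long len = 0;

    fprintf(out, "static const unsigned char %s_array[] = {\n", varname);
    while ((c = fgetc(in)) != EOF) {
        fprintf(out, "%s0x%02x", len ? ", " : "", (unsigned char)c);
        if (++len % 16 == 0)
            fprintf(out, "\n");
    }
    /* the terminating null lets HTML modules be handled as C strings */
    fprintf(out, "%s0x00};\n", len ? ", " : "");
    fprintf(out, "char* %s = (char*) %s_array;\n", varname, varname);
    fprintf(out, "int %s_length = %ld;\n", varname, len + 1);
    return len;
}
```

Run at build time over each site file, such a generator yields exactly the kind of module shown in Figure 44.9, ready to be compiled and linked into the server image.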
44.4.4 Implementation of Page Composition Routines
One of the specific features of web servers operating on diskless platforms concerns the method of serving
HTML objects. This problem is not so crucial in the case of servers placed on platforms equipped with
disks, where the basic service consists of copying page contents from a disk file to the communication link
(socket). A similar procedure can be used for the embedded platform, where an octet string or memory
Embedded Web Servers in Distributed Control Systems 44-15
FIGURE 44.10 Principle of transformation of passive objects into embedding modules. (The embedpage processor converts .html files, and the embedbin processor converts .gif, .jpeg, and .class files, into compilable .c modules.)
region takes the role of a mass storage file. However, this procedure is of lesser interest for the majority of
embedded applications. Static HTML pages have no appeal when used as front ends for control applications.
Genuinely useful pages should incorporate information produced by the back-end application.
Two methods are possible to implement such a requirement:
The HTML page is split into two objects: a static frame (page template) and application-dependent data;
in the process of page serving, the frame is merged with the data recovered dynamically from the
application.
The HTML page is generated on the fly by routines that compute page components one by one;
the attributes of the page components are determined by the parameters of the routines, which are
application-dependent data.
The first method is simpler to implement but, in the case of complicated interfaces, requires voluminous
page templates stored in large byte arrays. The second method reduces storage consumption but needs a
supplementary software basis composed of routines implementing page computations. In the case
of FDWS, these routines are provided by a package for online generation of HTML pages. Both methods are
briefly described in the next sections.
44.4.4.1 Template-Based Dynamic Pages
Template-based generation of dynamic HTML involves separating every page into two components: a
static page template and dynamic data. Page generation consists in merging both components before the
page is transmitted to the requesting client.
Page templates closely resemble regular pages and can be constructed using regular tools for HTML
page creation. In the embedded site structure, templates are stored in the same way as the embedded static
pages, that is, as byte arrays or character strings.
A page template contains all constant page elements, such as nonvarying text, images, applets, constant
hyperlinks, constant attributes of HTML tags, etc. Anything varying within the page is to be replaced by
a placeholder, a representative of the dynamic data.
Placeholders can replace virtually any element, be it text, numeric data, or a tag attribute. Their
implementation strongly depends on the method of serving the page. In the case of this software, placeholder
implementation is based on the C language conversion specifications, as used in the format strings of the
C functions of the printf family (printf, sprintf, fprintf, etc.). This means that any numeric integer
data are replaced by the specifications %d, %i, %o, %u, %x, or %X. Floating point numeric data are replaced
by the %f, %e, and %E specifications. Strings are replaced by the %s specification.
Examples
Consider an HTML page of the form shown below.
<HTML>
<HEAD>
<TITLE> Count of visitors </TITLE>
</HEAD>
<BODY>
Page of ALSTOM TECHNOLOGY was seen by 123456 visitors.
</BODY>
</HTML>
If the two highlighted elements are variable data, the page template should have the form shown below.
<HTML>
<HEAD>
<TITLE> Count of visitors </TITLE>
</HEAD>
<BODY>
Page of %s was seen by %d visitors.
</BODY>
</HTML>
The origin of such a representation lies in the implementation of the routine that merges the
template with the variable data. In the FDWS this function is implemented by a routine from the server
engine package. The routine signature is of the same type as that of fprintf or sprintf, namely:
int sockprintf(unsigned short socket_id, char* page_template, ...);
The piece of code that composes the page as above is shown in Figure 44.11. The page template is referenced
by the vis_page_str pointer and ssock is the unique identifier of the server socket. The method employed
char comp_name_str[32];
int vis_nr;
/* ... comp_name_str and vis_nr obtained from the application ... */
sockprintf(ssock, vis_page_str, comp_name_str, vis_nr);
FIGURE 44.11 Code fragment which implements the example of embedded page generation.
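The FDWS implementation of sockprintf is not reproduced in this chapter; a plausible sketch, assuming a fixed-size page buffer and splitting out the formatting step so it can be exercised without a live socket, is the following:

```c
/* Sketch of the template-merging step behind a sockprintf-style
 * routine. This is illustrative code, not the FDWS source: it merges
 * a page template with dynamic data via vsnprintf; sockprintf itself
 * would then write the resulting buffer to the socket. */
#include <stdarg.h>
#include <stdio.h>

/* Format the template and its variable data into buf; returns the
 * number of characters produced, or -1 if the page does not fit. */
int format_page(char *buf, size_t size, const char *page_template, ...)
{
    va_list ap;
    int n;

    va_start(ap, page_template);
    n = vsnprintf(buf, size, page_template, ap);
    va_end(ap);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

sockprintf would call such a formatting step and then write the buffer to the socket identified by socket_id, for example with a POSIX send() on platforms that provide one.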
requires that any dynamic data to be merged with page templates be transformed into a signed decimal,
a signed floating point number, or a string.
The proposed server interface provides a unique entry point to the page composition code of all server
pages via the pointer send_page_routine. It is important to observe that the page composition code sequences
for every page within the server should be accessible via this entry point. This constraint can be fulfilled
only if the customized component of the embedded server contains a routine which intercepts all requests for
HTML object services and dispatches them to specialized pieces of code. The recommended solution is
presented in the next section.
44.4.4.2 Dynamic Pages Generated on the Fly
In dynamic page generation the static page frame is not employed and the page is produced as the result
of a series of routine calls that generate code strings representing successive components (HTML tags) of
the page. Routines can be called conditionally and can have parameters dependent on application data.
The data strings produced by the routines are either directly written to the socket or stored in a buffer that
is finally sent to the socket.
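The tag-generating routines of the FDWS package are not reproduced here; a minimal sketch of the approach, with illustrative names, could look as follows:

```c
/* Sketch of on-the-fly page generation: each routine appends one HTML
 * component to an output buffer, which is finally flushed to the
 * socket. The names are illustrative, not those of the FDWS package. */
#include <stdio.h>
#include <string.h>

typedef struct {
    char data[4096];
    size_t used;
} page_buf;

/* Append a string to the page buffer, silently dropping it on overflow. */
void page_append(page_buf *p, const char *s)
{
    size_t n = strlen(s);
    if (p->used + n < sizeof p->data) {
        memcpy(p->data + p->used, s, n);
        p->used += n;
        p->data[p->used] = '\0';
    }
}

/* Emit one table row for a channel; the attributes depend on
 * application data passed as parameters, as described in the text. */
void emit_channel_row(page_buf *p, int channel, const char *status)
{
    char row[128];
    snprintf(row, sizeof row,
             "<TR><TD>%d</TD><TD>%s</TD></TR>\n", channel, status);
    page_append(p, row);
}
```

Such routines can be called conditionally, for example emitting a row only for active channels, before the accumulated buffer is sent to the socket.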
44.4.5 Implementation of Script Activation Routines
The structure of the VFS imposed by the standard architecture of the embedded server separates the objects
placed at the user's disposal into two distinct collections: passive objects (pages, applets, images)
and active objects (scripts). The standard application interface separates the configuration of passive object
composition and transmission from active object servicing. Active object servicing is
done by a procedure which should be user-provided and inserted into the server's structure via its reference.
As for the page composition routine, this entry point should be unique for the activation of every script solicited
by clients' requests.
44.4.6 Implementation of Application Wrappers
There is no general method proposed for this part of the customized component. Only vague recommendations,
deduced from the mission of the basic application, can be provided. The wrapper modules serve
as an interface adapter for the data transmitted between the server objects and application objects.
Requirements imposed on these modules from the server side follow the method of data insertion
into dynamic pages. Any useful information extracted from the application should be transformed into
scalar data having one of the basic types usable with the page templates, that is, integer numbers, floating
point numbers, and character strings. Complete requirements imposed on the interface software from the
application side are impossible to determine due to the diversity of application types.
Some basic principles can, however, be identified. Data sent by the client are transported by the elements
of the CGI interface included in the body of the POST service. These elements are normally constructed as a
series of (name, value) pairs. These data are automatically recovered from the client request PDU and
stored within a special memory region accessible via the modules from the server engine package. The
modules provide a set of functions that allow the programmer to recover and handle the requested data.
The modules' interface is built on one data type, represented by the code in Figure 44.12, and two
functions which provide the application program with read and write access to the memory region.
As can be seen, data in the region are identified by their alphanumeric identifiers recovered
from the POST service request PDU.
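The access functions of the server engine package are not listed in the chapter; a sketch of how the recovered (name, value) pairs might be stored and queried, reusing the tdb_result type of Figure 44.12 but with otherwise hypothetical names, is shown below:

```c
/* Sketch of CGI data recovery: POST bodies carry a series of
 * name=value pairs; the pairs are stored in a memory region and
 * looked up by their alphanumeric identifiers. The region layout and
 * function names are illustrative; only tdb_result is from the text. */
#include <string.h>
#include <stdlib.h>

typedef struct tdbtag_res {
    int result;                 /* nonzero if the name was found */
    union {
        char *string;
        int integer;
        float real;
    } value;
} tdb_result;

#define MAX_PAIRS 32

static struct { char name[32]; char val[64]; } region[MAX_PAIRS];
static int region_count;

/* Store one pair recovered from the request PDU. */
void region_put(const char *name, const char *val)
{
    if (region_count < MAX_PAIRS) {
        strncpy(region[region_count].name, name, 31);
        strncpy(region[region_count].val, val, 63);
        region_count++;
    }
}

/* Recover an integer value by its alphanumeric identifier. */
tdb_result region_get_int(const char *name)
{
    tdb_result r = {0, {0}};
    int i;
    for (i = 0; i < region_count; i++)
        if (strcmp(region[i].name, name) == 0) {
            r.result = 1;
            r.value.integer = atoi(region[i].val);
        }
    return r;
}
```

A script routine would call such a lookup with the field name used in the form, test the result flag, and act on the recovered value.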
The implementation of the five components of the embedded site described above provides all operations
necessary to put the front-end tier of the application into action. Now the operations should be activated in
the proper sequence: initializations first, followed by activation of the main server loop.
The initialization operation should set four variables exported by the server to values that
reference the passive object repository root, the active object repository root, the page composition routine, and
the script launching routine. An example of such an initialization routine is presented in Figure 44.13.
typedef struct tdbtag_res {
    int result;
    union {
        char* string;
        int integer;
        float real;
    } value;
} tdb_result;
FIGURE 44.12 Data type supporting the interface with client-provided PDUs.
void init_VFS(void) {
    server_root = db_page_root_gen();
    send_page_routine = send_page;
    cgi_bin = db_cgi_bin_gen();
    send_script_routine = send_script;
}
FIGURE 44.13 The interface of the initialization routine.
int embedded_server_launcher(unsigned short service)
{
init_VFS();
init_application_wrapper();
return server_boot(service);
}
FIGURE 44.14 Server launching routine.
This piece of code, a series of four simple assignments, assumes that the application programmer has
provided four routines: db_page_root_gen, which generates the passive object repository; db_cgi_bin_gen,
which generates the active object repository; send_page, which implements the page composition method;
and send_script, which implements the method of script launching. The two functions db_page_root_gen()
and db_cgi_bin_gen(), which generate the VFS repositories, are called and executed by the initialization
routine. The result of their execution is the immediate creation of the VFS tree structures and the assignment
of their references to the pointers server_root and cgi_bin. This is not the case for the page composition
(send_page) and script activation (send_script) routines, for which only references are assigned to the
server API pointers. In the presented solution, it is assumed that these four functions are exported from
the modules that implement them.
Now that the initialization routine is implemented, the code of the server process can be proposed. The
routine in Figure 44.14 is an example of such code. It is a simple piece of code which performs two
initializations, that of the server structures via the call to the init_VFS() routine and that of the application
interface via the init_application_wrapper() routine, before launching the server loop via the call to the
server_boot(service) routine. The only parameter of this routine specifies the number of the TCP/IP port
on which the server expects to receive clients' requests.
44.4.7 Putting Pieces Together
The preceding sections provide all the necessary details concerning the creation of the custom elements of the
content of an embedded server. It is important to show how the steps of this development process are
sequenced. The sequence of steps is presented by the graph shown in Figure 44.15. The process presented
FIGURE 44.15 Development process for an embedded server application. (Look-and-feel elements, the .html and .htm pages produced with a text editor, are processed by embedpage; the .gif, .jpeg, and .class objects by embedbin; and the VFS configuration and initialization description, together with the scripts and application wrapper linking the site to the application, by compilVFS. The resulting .c modules are translated by the C/C++ compiler into .o files and combined by the linker with the HTTPreuse server driver into a loadable object file joined to the application.)
below shows how to obtain the final result, a loadable object code file, from the primitive elements,
which are grouped in three categories:
1. Collection of passive elements (pages, images, applets).
2. VFS creation and management: page composition and script activation.
3. Interface with the organic application, initialization code, and CGI script routines.
These categories of application elements are developed with the means appropriate to the nature
of each element. This means that:
HTML pages are created with an HTML editor.
GIF and JPEG objects are developed with image editing tools and devices.
Java applets and beans are developed with the standard tools included in the JDK.
C/C++ code implementing the application wrapper software and the routines playing the role of
CGI scripts should be developed with the suite of tools (editor/compiler/debugger/loader) used
for the hosting platform.
In order to create the integrated component loadable to the hosting platform, the site components
should first be transformed from their initial storage format into a common format, a
collection of compilable C-coded modules. The preceding section proposed a method of transforming
passive components into C-coded modules by means of specialized processors. There is no problem for
the modules originally coded in C, that is, for the initialization code, script routines, and application wrapper.
They should be designed and coded in accordance with the usual principles of efficient implementation.
The method of development of the VFS component poses the biggest challenge in the development of the
embedded site. VFS design is straightforward and relatively easy. Its manual implementation also seems
to be a simple chain of repetitive actions. It can be directly deduced from the graphical site representation
(like the one in Figure 44.7). The implementation process is so regular that it can be easily automated;
FIGURE 44.16 Automation cell with a HART/FIP gateway and its configuration console.
that is, the repository tree can be transformed into a sequence of procedure calls by a processor, a VFS
compiler. This compiler transforms the textual description of the VFS repository structures into the appropriate
C-coded modules that implement all four functions necessary to initialize the server API. A more
detailed description of the compiler's operation is given in the Appendix.
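The code such a compiler emits is not shown until the Appendix; as a rough sketch of the idea, the repository tree can be built by a generated sequence of procedure calls. The node layout and the helpers vfs_new, vfs_attach, and vfs_find below are hypothetical, not the FDWS API:

```c
/* Sketch of what a VFS compiler might emit: the repository tree is
 * built by a sequence of procedure calls. All names and the node
 * layout are illustrative, not taken from the FDWS sources. */
#include <stdlib.h>
#include <string.h>

typedef struct vfs_node {
    const char *name;
    const char *data;          /* NULL for directories */
    struct vfs_node *child;    /* first entry of a directory */
    struct vfs_node *sibling;  /* next entry in the same directory */
} vfs_node;

static vfs_node *vfs_new(const char *name, const char *data)
{
    vfs_node *n = calloc(1, sizeof *n);
    n->name = name;
    n->data = data;
    return n;
}

static void vfs_attach(vfs_node *dir, vfs_node *entry)
{
    entry->sibling = dir->child;
    dir->child = entry;
}

/* The kind of call sequence a VFS compiler could generate from a
 * textual repository description. */
vfs_node *db_page_root_gen(void)
{
    vfs_node *root = vfs_new("/", NULL);
    vfs_node *images = vfs_new("images", NULL);
    vfs_attach(root, images);
    vfs_attach(root, vfs_new("index.htm", "<html>...</html>"));
    vfs_attach(images, vfs_new("logo.gif", "GIF89a"));
    return root;
}

/* Look an entry up by name within one directory. */
vfs_node *vfs_find(vfs_node *dir, const char *name)
{
    vfs_node *n;
    for (n = dir->child; n; n = n->sibling)
        if (strcmp(n->name, name) == 0)
            return n;
    return NULL;
}
```

The regularity of such call sequences is precisely what makes their automatic generation from a textual description straightforward.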
44.5 Example of Site Implementation in a
HART Protocol Gateway
In order to illustrate a real-life application, a site embedded in an industrial device is presented. The chosen
device is an instrumentation gateway for process control cells. Its role consists in linking the network of
sensors and actuators with the process computers and PLCs. To do so, the gateway collects the information
from the instruments connected to it via the HART instrumentation protocol and transfers it to automation
cells via the WorldFIP protocol. Each gateway can provide connections for up to eight HART channels. In this
example one of two channel compositions is possible: eight input channels, or six input channels and
two output channels.
Operation of the gateway is controlled by a collection of parameters that tune its performance to the
needs of a given installation. Through the set of parameters one can set up the characteristics of the HART and
WorldFIP protocols, modify certain translation parameters, gateway operation modes, etc. Each HART
channel can also be tuned to the type of HART transmitter connected to it.
All these tuning operations are usually done by a special-purpose device, the configuration console (see the
schematic in Figure 44.16). The console communicates with the gateway via a proprietary protocol based on
UDP/IP transport.
The idea of the application described below is to replace the special-purpose tuning console with a
standard web browser and implement all tuning functions on the basis of the three-tier architecture described
in the introductory section of this account. All the functions related to the man-machine interface and to
the tuning operations will be implemented by the front-end tier based on the embedded HTTP server. The
architecture of such an application would then be transformed into the one shown in Figure 44.17.
The server embedded in the gateway, together with a standard web browser, should replace the operation
of the tuning console. For this reason, it should give the user access to all necessary functions implemented
by the special-purpose configuration console. The set of functions is described below.
The console gives access to the parameters of the gateway via an appropriate screen (Table 44.2). The
front-end server placed within the gateway should provide access to the same set of parameters, respecting
their read only and read/write modes.
The console also provides the means of monitoring the status of all HART channels in real time
by displaying the nature of the connected transmitter (HART type/non-HART type), its signaling
FIGURE 44.17 Gateway parameter tuning functions realized by the three-tier architecture.
TABLE 44.2 Access to Gateway Parameters
Type of parameters Parameter name Access
Identification Tag Read/write
Product name Read only
Manufacturer name Read only
Software version Read only
Hardware properties Power supply type Read only
I/O mode Read only
WorldFIP medium type Read only
WorldFIP medium mode Read only
WorldFIP bit rate Read only
WorldFIP protocol Refreshment time Read/write
Promptness Read/write
HART protocol Timeout Read/write
No. of retries Read/write
Processing parameters Measure format Read/write
Operation mode Read/write
Antialiasing filter Read/write
Configuration version Read/write
Configuration revision Read/write
(type, manufacturer name), and its status (active/inactive). The same function is to be placed in the
front-end server of the protocol gateway. The gateway provides the possibility of parameter tuning of
every active HART channel. The appropriate screen gives access to the set of channel parameters shown in
Table 44.3. Access to all these devices should be provided by the front-end server of the gateway.
The operation of the ALSPA P80H console is in principle oriented toward parameter-tuning functions. In
some cases, however, it offers the possibility of direct monitoring of process variables by giving access to
the transmitter primary variable. The same function is required from the server embedded in the protocol
gateway.
44.5.1 Structure of the Site Embedded in the Protocol Gateway
The architecture of the server embedded in the gateway is strongly influenced by the functional requirements
presented above. It is composed of a collection of HTML pages, corresponding to the console
screens, which are organized in five directories. Three of the five directories group the pages according to
a functional criterion: there is one directory (di80_parameters) holding the pages provided for access
TABLE 44.3 Access to Channel Parameters
Type of parameters Parameter name Access
Identification Manufacturer name Read only
Transmitter model Read only
Transmitter tag Read/write
Descriptor Read/write
HART unique identifier Read only
Cell limits Upper cell limit Read only
Lower cell limit Read only
Minimum span Read only
Transmitter configuration Damping factor Read/write
Transfer function Read/write
Primary variable units Read/write
Lower measurement range Read/write
Upper measurement range Read/write
FIGURE 44.18 Passive object repository of the embedded server. (The repository root holds the Home Page and five directories: images with 16 virtual files (.gif and .jpeg), _fpclass with 3 virtual files (.class), transmitters with 36 virtual files (.htm), measures with 8 virtual files (.htm), and di80_parameters with 6 virtual files (.htm).)
to the gateway parameters, one (transmitters) grouping the pages accessing the active channel list together
with the channel-parameter-tuning pages, and one (measures) that groups the pages which monitor channel
measures. The two remaining directories group the pages according to a structural criterion: one (images)
is provided to store all embedded images included within the pages, the other (_fpclass) contains all
embedded Java applets.
This architecture is presented by the graph of the passive object directory shown in Figure 44.18.
The VFS of the server also contains another repository, which contains three scripting routines. These
script files are placed directly under the repository root (see Figure 44.19).
In total, the server contains 70 embedded files collected within 5 directories.
FIGURE 44.19 CGI (script) repository of the embedded server. (Three virtual files (.cgi) placed directly under the repository root.)
FIGURE 44.20 Home Page of the embedded server.
44.5.2 Detailed Implementation of Principal Functions
The sections below present all the important pages that give the user access to the functions implemented by the
front-end server. All the pages were developed using the Microsoft FrontPage HTML editor and incorporate
graphical elements provided by this tool (page background, fonts, banners, buttons, etc.).
44.5.3 Access to Site Home Page
Access to the site is obtained via the default page presented in Figure 44.20. This page has a rather
informative character; it displays a photo of the gateway and the list of the principal functions implemented
by the embedded server. Direct access to the functions can be obtained via the three buttons
placed above the photo.
FIGURE 44.21 Page giving access to the parameters of the gateway in read only mode.
44.5.4 Access to Parameters of the Gateway
The first and the third buttons of the welcome page of the server give the user the possibility of reading
parameters (Parameters button) or modifying parameters (Set Parameters button). Both buttons link the
default page with two pages residing in the directory transmitters.
The page presented in Figure 44.21 gives access to the parameters of the gateway. It is implemented as
a frame of three panes, that is, its implementation requires four embedded files. The upper pane identifies
the screen via the large title banner realized as an animated gif image. The left lower pane contains a menu
composed of five hyperlinks that provide convenient access to the five groups of gateway parameters. The
parameters, distributed among five tables, are displayed in the right lower pane. This pane is too small to
display all five tables at the same time, which explains the need for the menu pane: it avoids using scrollbars
for access to the parameter tables.
Parameter modification is implemented by the page presented in Figure 44.22. This page is separated
from the previously described page for security reasons. It is to be used when one wants to change some
parameter values. The user then enters into a transaction with the server resident components, which should
finish with the modification of the chosen parameter or parameter set.
The set of parameters displayed in this page is narrower than the one displayed in the page previously
described. This is normal, since only modifiable parameters are displayed on the screen.
The page editing environment provides the facility to add to the page some form-element controls
coded automatically in JavaScript. These cover the obligation to fill in certain fields (password, see
Section 44.5.5), keeping an entered value within a given interval (e.g., the Retries field value should be
kept between 1 and 6), and respecting the format of the information input to certain fields (e.g., only
figures are accepted in the timeout, retries, refreshment, etc. fields).
FIGURE 44.22 Gateway parameter setting form.
The contents of the form elements filled in by a user wishing to modify the corresponding gateway
parameters are sent to the server using the POST service of the HTTP protocol when the SUBMIT button of the
page is pressed. The modification of the parameters and the update of the browser's screen with the new
values are implemented by a specialized routine invoked via one of the CGI scripts.
44.5.5 Access to Active Channel List
The middle button in the Home Page of the embedded site gives access to the page that displays the list
of all active channels connected to the gateway at a given time. The status of the channels is described in
tabular form, as shown in Figure 44.23. Each channel corresponds to a row in the table. Each row is
composed of four elements that indicate the high-level description of the connected channel.
The position of the channel descriptor in the table corresponds to the channel number. The third column
of the table indicates the transmitter status. The remaining columns of each line are filled in only if the third
one is set to ACTIVE. In such a case, the first column contains the name of the transmitter manufacturer
(if recognized), the second the device type, and the fourth the unique HART identifier (normalized
by the HART protocol description). This identifier serves as the link to the page describing the HART channel
parameters.
In the case when the status of the channel is recognized as non-HART (analog 4/20 mA current loop)
or NO CURRENT (current loop not connected), the three significant columns of such a row are empty.
44.5.6 Access to Channel Parameters
This page provides the possibility of displaying the parameters of an active HART channel. The page that
interfaces the user with this facility is presented in Figure 44.24. The page is organized in the form of a
frame composed of three panes: a heading pane with the title banner, a menu pane giving access to the groups of
parameters, and a parameter pane accessed either via the menus or via the scroll bar of the browser's window.
FIGURE 44.23 Table displaying channel status.
FIGURE 44.24 Page giving access to HART channel parameters.
The page externally resembles the one that gives access to the gateway parameters in read only mode
(three panes, one of them accessed via menus placed in another). Functionally, there are two fundamental
differences between them.
First, the channel parameter page gives access to all channel parameters, respecting their mode. Read
only parameters are displayed as plain text sections, while read/write parameters correspond to active
form entries.
Second, the page design ensures that only potentially modifiable parameter values can be sent back to
the server when the SUBMIT button is pushed. The process of parameter update is under the control of a
CGI (script) routine.
44.5.7 Monitoring of Principal Channel Measure
All the above-described functions are fully oriented toward parameter tuning. The services offered concern
either the global properties of the HART/FIP converter or act on the characteristics of an individual channel.
The quality of service is comparable to that offered by the original configuration console.
The function of recovery of the primary measure of a HART channel, described in this section, differs from
the others in both the nature and the quality of service offered. Functionally, it is no longer a parameter-tuning
operation. Data handled in this operation do not concern the status of the channel itself but reflect the
evolution of the phenomenon measured by the channel transmitter. For this reason, performing this operation
from time to time, at irregular points in time, and displaying numerical values on the screen does not provide
much valuable information. Unfortunately, this is the only mode in which this function can be exploited
via an ordinary tuning console.
The three-tier architecture enables a totally different implementation. The monitoring function is
implemented by an HTML page accessible from the channel-tuning page via the link primary measure
(see Figure 44.24). This page contains a Java applet that does more than display a numerical value. Its
operation involves periodically fetching channel measures and displaying them in the form of a trend curve
(Figure 44.25). Each curve point corresponds to a complete transaction between the applet and the server.
The transaction is initiated by an HTTP request that activates an embedded script routine (servlet), which
elaborates the primary measure of the channel by activating an appropriate HART command. Measure
values obtained by the execution of this command are transported via an HTTP response (tunneled in the HTTP
PDU). This solution ensures that the communication remains operational even when the
browser machine executing the applet is connected outside of the system's security barrier.
44.5.8 Access Control and Authentication
HTTP-based systems are in principle open and accessible to any client that knows the URL of the server.
This fact makes the system prone to unauthorized accesses and requires the implementation of access control
functions.
In the case of this application, the protection is implemented in two ways:
A natural authentication mechanism exploiting a standard HTTP feature; on the client side this
mechanism is built into any standard web browser.
Supplementary protection by password entry, which is built into forms and handled by a
specialized script.
The first mechanism is based on the access control procedure standard for the HTTP protocol. This
procedure is based on a so-called authentication challenge transaction. According to the standards of the HTTP
1.0 protocol, any Universal Resource Locator (URL) can point to a server resource that is accessible to a
restricted set of users, each identified by a name and a password. When the browser accesses such
a resource for the first time since its activation, the server produces the response initiating the challenge
(declaration of unauthorized access, see [2]). The browser reacts to this response by displaying a dialog
box as in Figure 44.26.
FIGURE 44.25 Applet monitoring primary variable in a HART channel.
FIGURE 44.26 Authentication box for the French version of Internet Explorer.
The user is expected to fill in both text zones of the box and press the OK button. This operation
repeats the previously executed request with a PDU option that presents the user's credentials to the
server engine. The credentials contain the pair of user name and user password, encoded according to the
algorithm corresponding to the authentication mode. In the most popular authentication mode, called
Basic Authentication, the credentials are coded according to so-called base64 encoding. If, on the server
side, the pair user name:password corresponds to the contents of one of the authentication records
attached to the resource, the response PDU will contain the resource contents. In the opposite case,
the authentication process fails, access to the resource is denied, and the authentication transaction is
reiterated.
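Base64 is a standard encoding, not specific to this application; a compact sketch of the credential coding a browser performs (generic code, not taken from the FDWS sources) is shown below:

```c
/* Sketch of Basic Authentication credential coding: the pair
 * "user:password" is base64-encoded and carried in the Authorization
 * header of the repeated request. Standard base64, generic code. */
#include <string.h>

static const char b64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode len bytes of src into dst; dst must hold 4*ceil(len/3)+1. */
void base64_encode(const unsigned char *src, size_t len, char *dst)
{
    size_t i;
    for (i = 0; i + 2 < len; i += 3) {        /* whole 3-byte groups */
        *dst++ = b64[src[i] >> 2];
        *dst++ = b64[((src[i] & 0x03) << 4) | (src[i+1] >> 4)];
        *dst++ = b64[((src[i+1] & 0x0f) << 2) | (src[i+2] >> 6)];
        *dst++ = b64[src[i+2] & 0x3f];
    }
    if (i < len) {                            /* 1 or 2 trailing bytes */
        *dst++ = b64[src[i] >> 2];
        if (i + 1 < len) {
            *dst++ = b64[((src[i] & 0x03) << 4) | (src[i+1] >> 4)];
            *dst++ = b64[(src[i+1] & 0x0f) << 2];
        } else {
            *dst++ = b64[(src[i] & 0x03) << 4];
            *dst++ = '=';
        }
        *dst++ = '=';
    }
    *dst = '\0';
}
```

The simplicity of this reversible coding is precisely why Basic Authentication offers no confidentiality: anyone observing the request can decode the credentials.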
FIGURE 44.27 Message from the script that forces the user to fill in the password box.
FIGURE 44.28 Page signaling bad authentication password.
It is worth noting that the authentication transaction for a given subtree is done only once per client
session. This means that for a protected subtree of a VFS repository, the challenge will take place only during
the first request. Any request that follows will automatically contain the authentication information.
Basic Authentication mode is not a reliable protection against unauthorized accesses, since the coding
scheme of the credentials is simplistic and can be easily overcome. There are more powerful access control
schemes, such as the one called Digest Authentication, in which the decoding of credentials is more
complex and which provides a higher level of security against undesirable intrusions.
In any case, for protected access, an authentication procedure based on HTTP standard features is
not restrictive enough, since once identified, the client station can operate without further authentication
while accessing a given URL. If the station in question passes under the control of an unauthorized user, the
server will still answer the client's requests positively, since the credentials are memorized and kept ready for
each subsequent URL access until the end of the client's operation. To avoid any problem with this mode of
authentication, another model, operating on an authentication-per-request basis, is to be used.
This is implemented by insertion into some pages (forms) a supplementary text zone of password
type, which requires to be lled each time the form is submitted. The password is veried at each activation
of associated script. Form page should be edited in a manner that constrains the submission of the form
on sending the password. This is frequently done by a page embedded script that blocks the submission
process when the password is not provided and prompts the user by a warning message (see Figure 44.27).
Submission of the page with the wrong password provokes the contents of the form to be rejected by the
server which sends back the refusal page, as in Figure 44.28.
44.5.9 Application Wrapper
All the components described above contribute to the implementation of the user interface to the organic application of the protocol gateway. They rely on the data provided through the basic application interface but
FIGURE 44.29 Part of the application wrapper providing access to the gateway parameters. [Figure: a record of fields (tag, descriptor, HART timeout, HART retries, filter constant, operation mode, version, revision) linking the basic application, which accesses it globally via di80_get_parameters and di80_set_parameters, with the server pages and scripts, which access individual items via routines such as get_manufacturer, set_manufacturer, get_io_mode, and set_io_mode.]
they impose some requirements on the formats of these data:
Access to the gateway parameters (to some of them in read-only mode) on an individual basis.
Access to the list of active HART channels, which should be updated before being served via an HTML page.
Access to channel parameters, in reading and in writing, on an individual basis.
It is important that the values read from the application interface for all three types of data listed above have either the form of scalar integer values, scalar floating point values, or zero-terminated character strings.
The original interface to the gateway API does not fulfill these requirements. In its original version it provides the following functions:
Global access to all parameters of the gateway, in reading as well as in writing (access routines return packets of coded values in read mode and accept only records of coded values in write mode).
Global access to channel status data; it returns a packet of coded data in read mode.
Low-level mode of access to HART channels via blocks of octets specifically coded according to HART protocol standards.
There is definitely a need for a supplementary adaptor module that transforms the low-level data produced by the application tier into structured formats adapted to the mode of operation of the server. This module, the application wrapper, is split into two parts:
Part providing access to the parameters, both in reading and in writing.
Part managing access to channels; this part groups the function of displaying the channel status table and the function of accessing channel-specific data via the HART protocol.
The part providing access to the gateway parameters is built around the data structure representing the gateway parameters as a persistent record.
The record is updated either by the server scripts modifying individual fields or by the call to the API routines. The server-side routines manipulate data as individual scalar values and for this reason need access (both in reading and in writing) to the individual items of the record. The application side operates on the basis of global access to the data, that is, the gateway parameters are all set and read at once, by the activation of one of two interface routines (see Figure 44.29).
The second part of the application wrapper provides access to the eight HART channels connected to the protocol gateway. This part is structured around a table of eight records that represent the eight potentially active channels. Each record groups the parameters of an individual HART channel (see Figure 44.30).
FIGURE 44.30 Access to HART channel parameters. [Figure: a table of eight channel records (Channel 1 to Channel 8, each with a HART channel buffer) linking the basic application (send_sensor_command, di80_get_instrumentlist) with the server pages and scripts (get_transmitters, get_sensor_tag, set_sensor_tag_and_desc).]
FIGURE 44.31 Summary of the configuration of the site embedded in the protocol gateway. [Figure: the Home Page linked to the directories transmitters, di80_params, measures, images, and _fpclass.]
In general, the update of all parameters in this table is obtained via service requests sent to HART transmitters via the eight channels of the gateway. From the user's point of view, not all the parameters are accessible by the same means. Parameters that define channel status and transmitter identity are obtained by one global command that updates one part of each record of the whole table simultaneously. Other parameters are grouped into collections that correspond to one aspect of transmitter operation (cell configuration, primary measure characteristics, measure units, etc.). Access to each parameter collection is ruled by one command in writing and one command in reading. Collections are not disjoint, and for this reason a parameter can be accessed by different commands.
44.6 Architecture Summary and Test Case Description for the
Embedded Server
44.6.1 Embedded Site Architecture
The overview of the configuration of the server embedded in the protocol gateway is presented in Figure 44.31.
The diagram presented in this figure shows the functional relationships among the different site components.
The components are grouped in five directories, as shown above.
The entry point of the embedded web of server pages is the HTML page named in the diagram above Home Page. This page contains the hypertext links to the objects placed in the directories transmitters and di80_params, which represent the web domains responsible for browsing, respectively, the HART channels and the gateway parameters. Server objects placed in the directory measures are referenced by the links embedded in the objects of the directory transmitters and are in charge of the graphical representation of channel measures.
Objects placed in the directory images (embedded images) and in the directory _fpclass (embedded applets) have different relationships with respect to the other site objects. They are incorporated into the server pages rather than linked to them and, from the functional point of view, play an auxiliary role in the site operation.
The contents of all five directories are presented below.
44.6.1.1 Directory Transmitters
This directory contains the set of embedded HTML pages that enable browsing of the parameters of active HART transmitters. The natural entry point to this realm of the embedded site is the page transmitter_list.htm, which represents the status of the eight HART channels that can potentially be connected to eight HART transmitters. The interface to each potentially active channel is implemented by a collection of four pages. Potentially, there are eight groups of pages, one per channel, but only pages corresponding to active channels can be displayed.
The group of pages accessing the parameters of a channel is organized according to the following pattern (see Figure 44.32):
Top-level channel front page of frame type incorporating three component HTML pages (HART_sensor0.htm to HART_sensor7.htm, one per channel):
(a) Banner page set.htm (shared by all channels),
(b) Menu page (menu0.htm to menu7.htm, one per channel)
(c) Parameter browsing form (sensor0.htm to sensor7.htm, one per channel)
Password error signaling page (password0.htm to password7.htm, one per channel)
Pages contain links to other directories (see Figure 44.33) and incorporate elements from other directories. There are no direct links from one channel browsing page set to other channels. All links should pass via the transmitter_list.htm page.
44.6.1.2 Directory di80_params
This directory contains the set of embedded HTML pages that enable browsing of the parameters of the gateway. The pages are organized according to the diagram shown in Figure 44.33. The pages provide access to two functions:
Displaying all gateway parameters in read-only mode
Modifying some of the parameters
The first function is accessible via the collection of four pages:
Top-level, frame type page get_di80_params.htm that wraps three other pages located in the frame panes.
Page upper_page.htm containing banners and links to other domains of the site.
Menu page left_page.htm that supports direct selection of parameter groups; this page contains direct links to each of the five tables that group the gateway parameters.
Page tables.htm that displays the actual values of the parameters.
FIGURE 44.32 Overview of pages placed in the directory transmitters. [Figure: transmitter_list.htm, reached from the Home Page, links to one group of pages per channel; the Channel 0 group comprises HART_sensor0.htm framing set.htm, menu0.htm, and sensor0.htm, together with password_0.htm and measure0.htm, and likewise up to the Channel 7 group (HART_sensor7.htm, menu7.htm, sensor7.htm, password_7.htm, measure7.htm); a further link leads to di80_params.]
FIGURE 44.33 Overview of pages placed in the directory di80_params. [Figure: get_di80_params.htm frames upper_page.htm, left_page.htm, and tables.htm, with links to the Home Page, set_di80.htm, transmitter_list.htm, and password.htm.]
FIGURE 44.34 Pages organized in the directory measures. [Figure: measure0.htm through measure7.htm, each linked to its corresponding HART_sensor0.htm through HART_sensor7.htm page and each wrapping the HART Trend.class applet.]
The second function is realized by two pages:
Page set_di80.htm displays all modifiable parameters of the gateway. The page is organized as a parameter browsing form which, when submitted, activates a script routine in charge of updating the parameters.
Page password.htm signals a password error in the submission of page set_di80.htm.
Each of the two functions is reached by a separate entry point. It is possible to leave the functions smoothly via the links to the Home Page of the server and to the transmitter list page.
44.6.1.3 Directory Measures
The directory measures groups the eight pages that correspond to the function of monitoring the values of the primary measure for every HART channel. The pages have no other functions than wrapping the appropriate applets and providing exit links back to the transmitter parameters page. All eight pages are independent; there is no link among them, each is reached via a separate entry point, and each has its own exit link (see Figure 44.34).
44.6.1.4 Directory _fpclass
The directory _fpclass is the server domain whose name and existence are inherited from the server design pattern suggested by the site development tool. It contains three applets: two of them, proposed by the tool, implement active buttons that link pages.
The third applet implements the display of signal trends and is used to monitor the primary measures recovered from active HART channels. The design of this applet is nontrivial since it actively samples channel measures by activating a script routine on the server side. The routine is in charge of getting the channel measure via an appropriate HART command and of transferring it to the applet wrapped in an HTTP response PDU. Thus the data exchange between the applet and the server script passes via an HTTP tunnel and can easily pass the security barriers of the site (firewalls).
FIGURE 44.35 Schematic of the test platform. [Figure: an HTTP client (browser) on the CCD/Clamart Intranet reaches, via an Ethernet/WorldFIP router, the HART/FIP gateway on the WorldFIP fieldbus; the gateway's channels connect a Rosemount 3051C pressure transmitter (Channel 0), a Rosemount 3144 temperature transmitter (Channel 5), a Fisher DVC 5000 valve (Channel 6), and a resistance (Channel 7).]
44.6.1.5 Directory Images
The directory images is a flat collection of 16 images used by the other pages. There are no links between objects in this directory. Some of the images are used by many different pages placed in different server domains. The idea of such an organization of images is inherited from the site development tool.
44.6.2 Test Description
44.6.2.1 Test Platform
The summary of the architecture presented above serves as a reference for the description of the test scenarios described in this section. The system in which the tests were done is represented by the schematic of Figure 44.35. The embedded server is placed within the HART/FIP gateway connected to a segment of 1 Mbit/sec twisted pair, dual-medium WorldFIP fieldbus. Data transfers over the segment are organized by the bus arbitrator operating with a basic cycle of 20 msec.
The HART interface of the gateway is configured in the mode 6 inputs, 2 outputs. The HART channels are connected as follows:
Channel 0: active, connected to a Rosemount 3051C pressure transmitter
Channels 1 to 4: empty
Channel 5: active, connected to a Rosemount 3144 temperature transmitter
Channel 6: active, connected to a Fisher DVC 5000 valve
Channel 7: simulated active by a resistance enabling a closed-current loop
The WorldFIP segment is connected to the CCD Ethernet-based Intranet via the router node implemented on a Compaq Deskpro computer under Windows NT4. Routing of TCP/IP traffic is done by the native TCP/IP protocol stack of Windows NT4, which works with the standard Ethernet PC board on the Intranet side and with the WorldFIP CC121 board controlled by the WorldFIP NDIS driver.
Test scenarios are executed using a standard Internet browser connected to the Intranet. The two most popular Internet browsers were used for the tests: MS Internet Explorer V5 and Netscape Navigator V4.5.
44.6.3 Test Scenarios
This section describes nine test scenarios that comprise a necessary and sufficient set of operations to prove the correctness of the server's operation. The tests activate all phases of the server's life cycle and put under trial all the designed functions embedded within the protocol gateway.
Some functions of the server are tested by almost all test scenarios, except the first one. This concerns the generic functions of the server loop, which are: request parsing, requested object search and retrieval, and response generation.
Another generic test concerns the process of merging static page templates with dynamically retrieved process data. Almost all HTML pages in the server structure are obtained by this operation, except the Home Page. These generic test objectives are not repeated in the descriptions of the test scenarios below.
44.6.3.1 Execution of Server Initialization Phase
The objective of this scenario is to test the smooth initialization of the server's data structures and the creation of the access point to the network. To execute the scenario, launch the server process and observe the server's console. The absence of error messages means that the software executed up to the beginning of the server loop: that is, the server's execution thread was created, the VFS tree is instantiated, the application wrapper is ready to communicate with the application, the server socket is created, and the server thread is waiting for clients' connections.
Possible erroneous reactions are: a system exception on no available heap space, or a server thread error message on the impossibility of creating the server's passive socket. Only these two fatal errors can be put into evidence by this scenario. The absence of any error messages does not prove correct operation. To be sure that all the above operations are correctly executed, it is necessary to pass all eight remaining scenarios.
44.6.3.2 Server Access via Home Page
The objective of this test is to prove definitively the correct execution of some initial operations and to show that the page composer works on plain HTML pages with no data taken from the process interface.
In this scenario the server machine should be called by the test browser and the server should answer by sending the Home Page. Received page elements should be examined visually in order to detect any visible defect (incoherent text, jammed background, distorted images, applets that fail to operate).
Move the mouse over the buttons of the main menu. The image should change, taking the form of a selected button. Click on a button; this should activate a hyperlink to one of the three pages of the site. It is necessary to test all three buttons on the page.
44.6.3.3 Authentication
The objective of this test is to prove the correct activation of the challenge transaction on protected realms.
In this server, access to the realms di80_params and transmitters requires the client to present the authentication credentials (registered user name and correct password). Access to the Home Page of the server is not protected, but any of the three possible links leading from this page to three destinations should trigger the authentication request. To start the test sequence, first activate any of the three links, for example, the one to the transmitter list. This should force the server to return the response that activates the dialog box on the client's screen and forces the user to provide its credentials. Submission of the credentials username = hartp and password = alstom should open access to the page with the list. From now on, any path within the server's structure should be open and no authentication should be requested any longer until the execution of the client browser is stopped.
If the protection on all three paths from the Home Page is to be tested, the browser should be restarted before the test of each path, in order to make it lose the credentials entered during the first authentication. Otherwise, the first authentication will open the access for all subsequent links from the Home Page to the realms of restricted access.
44.6.3.4 Page of di80_params in Read Only Mode
The objective of this test is to verify the part of the application wrapper module that retrieves the parameters.
Activation of a link to this part of the server should cause the reception of a three-pane frame type page with the five parameter pages in the lower right pane and five links in the lower left pane. Examine all page panes visually. The displayed image should be regular, with no apparent defects; all tables in the parameter pane should be filled in with coherent values. All links in the menu pane should move the appropriate table to the top of the parameter pane.
The unique upper pane should contain the page banner and three links: to the Home Page, to the transmitter list page, and to the page that enables parameter modification. All links in this pane should be active and should lead to the expected functions.
This page should be refreshed every 60 sec.
44.6.3.5 Page of Gateway Parameters in Modification Mode
The objective of this test is threefold: primarily it tests the script that analyzes the set of data provided by the HTTP POST service, secondarily it tests the application wrapper function that modifies the parameters of the gateway, and finally it tests the technique of dynamic generation of some sophisticated parts of HTML pages such as pop-up menus, checkboxes, and groups of radio buttons.
The page that corresponds to the function contains a form that is composed of five groups of items, each form item corresponding to a modifiable gateway parameter. In the part of the scenario concerning the script activated by the POST request, the following test cases are incorporated:
Test of the functions that control the coherence of the formats of the parameters entered into the form items.
Test of the efficiency of parameter modifications.
Test of the control of access protection by password.
The external links to other server functions should also be tested.
44.6.3.6 Retrieval of List of Active Channels
The objective of this test is to verify the part of the application wrapper module that is responsible for providing the global status of the eight HART channels connected to the gateway, and to test the technique of dynamic generation of large context-dependent sections of an HTML page.
As a result of the request of the page with the channel list, all three HART transmitters should be identified and described. The channel with the resistance simulating a closed-current loop should be declared as a Non-HART device. All four empty channels should be labeled as inaccessible. Links giving access to individual transmitter pages should be displayed as active. Disconnection of a channel should be seen in the table after the page update.
The test scenario of this page includes the test of the effectiveness of all the links in this page: those to channel descriptions and those to other server functions.
44.6.3.7 Access to Channel Parameters
The primary objective of this test is to verify the part of the application wrapper module that is in charge of controlling the HART channel parameters. It also tests the correct operation of the script that coordinates the process of channel parameter handling, as well as the dynamic generation of pop-up menus, checkboxes, and radio buttons. The nature of the tests done in this scenario resembles those described for the gateway parameter modification, since one of the parts of this page has a form incorporated. Test cases for this page also concern the coherence of data entered into the form and the function of protection by the password.
The test for the page takes into account the verification of the external links to the Home Page, the transmitter list, and the parameter setting function.
44.6.3.8 Trend Applet
The objective of this test is conceptually more sophisticated than the other scenarios. It puts under verification the data exchange between a Java applet and a server script based on the principle of data tunneling via HTTP protocol PDUs.
The test scenario includes activation of this page and verification of the following cases:
Applet initialization and start-up
Trend refreshment
Applet stop phase
Applet restart
Efficiency of the link back to the channel description
This test should be done for every active channel.
44.6.3.9 Call of a Nonexistent Server Object
The objective of this test is to verify the correct reaction of the server to a request concerning a nonexistent object.
To initiate this test the browser should request a nonexistent server object. This can be done by manually entering the object's URL in the browser's address box. The correct reaction of the server should be the transmission of the page signaling the absence of the requested object.
44.7 Summing Up
This account was conceived as a complement to the reference manual describing the FDWS software modules. It assists developers using the FDWS software library in designing clean and efficient implementations of embedded servers. The structure of the document is organized in a manner in which it can be used as a self-standing guide to comprehension of the technique deployed by the FDWS software. It contains the presentation of the principle of operation of embedded servers, sketches the basis of the technology, and leads the designer through a real-life example toward the solution of a concrete design case.
The major mission of the document is to facilitate the use of the FDWS module library, which contains many routines and is not so easily accessible without a guide. The presentation of the technology is voluntarily conceived as being platform independent (no links to a development tool or to a target platform). The reason for this is that the basic idea of the software is its platform independence. With the same facility, embedded servers based on FDWS technology can be incorporated into a PLC, an I/O nest, or an industrial PC.
The developed software library constitutes the first step on the way to a complete and universally applicable technology. A supplementary effort is necessary in order to increase the software's utility. The biggest progress is needed in the domain of a configuration tool (or tool suite) that would significantly increase the user's comfort in the process of implementation of embedded HTTP servers.
References
[1] Jeremy Bentham, TCP/IP Lean: Web Servers for Embedded Systems, CMP Books, May 2002.
[2] T. Berners-Lee, R. Fielding, and H. Frystyk, Hypertext Transfer Protocol -- HTTP/1.0, Network Working Group, RFC 1945.
44.A1 Appendix: Configuration of VFS
44.A1.1 Programming of the VFS Component
One of the most important components of an embedded server structure is the VFS. This component hosts the central data structure which manages the embedded objects that are the targets of remote client requests.
Building such a component consists in the dynamic generation of adequate data structures that are organized in the form of repository trees.
It is to be recalled that the VFS is composed of three basic elements, realized in general by disjoint modules:
1. A tree-like data structure whose role is equivalent to that of the file system management tables. Through this data structure, called the repository skeleton, the user can find, read, and modify the files embedded in the host environment.
2. A collection of routines which process this data structure.
3. A collection of memory regions storing the embedded files.
The generation process of repository skeletons is programmed as a sequence of calls of specialized routines which create and link together the nodes of the repository (repository root, directory nodes, file nodes, and script nodes) and attach the memory regions storing embedded data to the file nodes. It is important to state that the routines that process embedded file nodes in order to satisfy remote client requests should be generated coherently with the structure of the repository. The method of programming the VFS generation is straightforward, but for big and somewhat complicated repositories, manual maintenance of the generation code (especially maintenance of the coherence of its three above-mentioned elements) can become awkward and time consuming.
The regularity of the operations required to obtain the complete VFS component suggests the possibility of automating its production from a higher level specification. The idea consists in specifying the repository structure and the operations necessary to generate the requested server entities within the same description, expressed in a specialized high-level language. The transformation of such a specification can be realized by a tool that compiles the specification file into executable modules of C-language code (Figure 44.A1).
The specification file is expressed in the language whose syntax is described below.
Specification
Page
generation
module
VFS
generation
module
VFS
compiler
FIGURE 44.A1 Compilation of VFS description into executable modules.
FIGURE 44.A2 VFS specification structure. [Figure: a specification comprises a global section of C code and a repository spec containing a file repository and/or a script repository.]
44.A1.1.1 Specification Structure
A VFS specification is composed of two sections (Figure 44.A2):
An optional global declaration section
An obligatory repository specification section
The global declaration section, which can be omitted in some simple specifications, contains data structure definitions and routines programmed directly in C language. These routines can be invoked in some parts of the second section of the specification.
The second section contains the description of one or two repositories (file repository and/or script repository). The specification of at least one repository is obligatory (Figure 44.A2).
A repository specification is composed of three obligatory elements and one optional one:
Repository type: this element allows the compiler to distinguish between a file repository and a script repository.
Repository name: a character string necessary to identify the repository tree.
By-default node description: a file node specification which is necessary for the description of a file repository.
Repository body: the list of nodes composing the repository; this list is composed of a sequence of file, script, and directory node specifications; the list can be empty.
Each element of the list is described according to a specific syntax. All the elements have a name and a qualifier that identifies the node type (file, script, or directory). Script specifications contain the pointer to the script routine. File descriptions may contain pointers to the memory region containing the file data and a file qualifier that permits determining the nature of the file contents (HTML page, embedded image, embedded applet). Embedded directory descriptors contain a directory body of exactly the same nature as the repository body and, optionally, the list of records holding access credential descriptors (username/password pairs).
The repository root, together with the list of nodes from the repository body, spans the first level of the repository tree. File (and script) nodes are the leaves of the tree, while the directories span the sub-trees of the repository.
To each leaf node qualified as an HTML page one can attach the following two optional sections:
A list of variables that convey the data from outside of the HTML page structure and that serve to inject the data into the page structure.
Sections of C code that process data before merging them with the page template.
This brief textual description of the VFS specification language is followed by a more rigorous syntax specification.
44.A1.2 BNF of the Specification Language
The syntax of the VFS specification presented below uses the widely accepted version of the specification meta-language EBNF (Extended Backus-Naur Form). This version uses the following conventions:
Keywords are in bold uppercase and are enclosed in double quotes.
Nonterminal symbols are in lower case.
Single character terminal symbols are enclosed in single quotes.
The symbols ::= (derivation) and [ ] (optional section) are part of the meta-language.
IDENT, STRING, NUMBER, and SPECIAL_STRING are meta-keywords (keywords transporting a value).
specification ::= [ GLOBAL '{' target_code '}' ] file_repository_spec
[ script_repository_spec ]
file_repository_spec ::= <REPOSITORY> MAIN rep_name def_file_spec
cplx_node_body </REPOSITORY>
script_repository_spec ::= <REPOSITORY> CGI rep_name cplx_node_body
</REPOSITORY>
def_file_spec ::= file_spec
rep_name ::= IDENT
cplx_node_body ::= [ access_list ] node_list
node_list ::= node_spec | node_list node_spec
node_spec ::= file_spec | script_spec | directory_spec
directory_spec ::= <DIRECTORY> dir_name cplx_node_body
</DIRECTORY>
dir_name ::= IDENT
file_spec ::= <FILE> file_name [ CONTENTS = region_name file_proc ]
script_spec ::= <SCRIPT> script_name ROUTINE = routine_name
file_proc ::= qualifier [ '(' param_section ')' ] [ '{' target_code '}' ]
qualifier ::= TEXT | SIZE = NUMBER nature
nature ::= GIF | JPEG | JAVA | TEXT | PLUGIN
param_section ::= param_spec | param_section param_spec
param_spec ::= par_qualif par_name ':' type [ '=' init_value ]
par_qualif ::= DATA | FREE
par_name ::= IDENT
init_value ::= NUMBER | STRING
target_code ::= SPECIAL_STRING
type ::= INTEGER | FLOAT | STRING
44.A1.3 Specification Example
The specification text below gives the complete description of the VFS component presented in Figure 44.A3. The figure shows a repository that contains the following nodes:
One by-default page named ROOT.
A directory public that contains five HTML pages: gauge1, gauge2, di80_param_form, dvc5000_1, dvc5000_2.
A directory images that contains six images in gif and jpeg format: alstom, DI80Mimic, ccd, HartMimic1, sensor, and valve.
A directory javadir that contains one file: Trend.
global{
#include "env_var.h"
}
main
<repository> page_root
<default> ROOT CONTENTS = indexnew_str TEXT
<directory> public.one
<file> gauge1.htm CONTENTS = Hello_page_str TEXT(FREE Ititle:STRING ="Furnace Temperature"
FREE legend:STRING ="Furnace Temperature" FREE IButton:STRING =" gauge 1"
FREE Imval:INTEGER =550 FREE Iinit:INTEGER = 80
FREE Iinterval:INTEGER = 5 FREE Iscriptname:STRING ="gen.cgi?genPar=22"
FREE yaxlegend:STRING ="I/sec")
<file> gauge2.htm CONTENTS = Hello_page_str TEXT(FREE Ititle:STRING ="Cooling Fluid Temperature"
FREE legend:STRING ="Cooling Fluid Temperature" FREE IButton:STRING ="gauge 2"
FREE Imval:INTEGER = 150 FREE Iinit:INTEGER = 55 FREE Iinterval:INTEGER = 2
FREE Iscriptname:STRING ="gen.cgi?genPar=69" FREE yaxlegend:STRING ="m")
<file> di80_param_form.htm CONTENTS = di80_param_form_str TEXT(DATA tag_name:STRING ="cooler"
DATA time_out:INTEGER = 300 DATA retries:INTEGER = 3
DATA refreshment:INTEGER = 75 DATA promptness:INTEGER = 100
DATA version:INTEGER = 1 DATA revision:INTEGER = 1
DATA filter:FLOAT = 0.44 FREE Icheckstr:STRING ="checked"
FREE Icheckstr_bis:STRING =" " FREE Iselstr:STRING = "selected"
FREE Iselstr_bis:STRING = ""
{char* interm;
tdb_result Ires;
get_db_data("OperatingMode",&Ires);
if(Ires.result==c_string && Ires.value.string !=NULL)
if(strcmp(Ires.value.string,"OPERATIONAL")==0 && strcmp(Icheckstr,"checked")!=0 ||
   strcmp(Ires.value.string,"INITIALISATION")==0 &&
   strcmp(Icheckstr,"checked")==0){
   interm = Icheckstr; Icheckstr = Icheckstr_bis; Icheckstr_bis = interm; };
get_db_data("MesureFormat",&Ires);
if(Ires.result==c_string && Ires.value.string !=NULL)
if(strcmp(Ires.value.string,"ANALOG")==0 && strcmp(Iselstr,"selected")!=0 ||
strcmp(Ires.value.string,"DIGITAL")==0 && strcmp(Iselstr,"selected")==0){
interm = Iselstr; Iselstr = Iselstr_bis; Iselstr_bis= interm;
}; })
<file> dvc5000_1.htm CONTENTS = dvc5000_str TEXT (FREE Iact:STRING = "dvc5000_1"
FREE Ititle:STRING = "Input Valve" FREE Imamps:FLOAT = 0.0
FREE Itrav:FLOAT = 0.0 DATA DriveSign:FLOAT = 0.0
{Imamps = 4.0+16.0*IDriveSign.value.real/100.0;
Itrav=IDriveSign.value.real;})
<file> dvc5000_2.htm CONTENTS = dvc5000i_str TEXT (FREE Iact:STRING ="dvc5000_2"
FREE Ititle:STRING = "Output Valve - inversed drive"
FREE Imamps:FLOAT = 0.0 FREE Itrav:FLOAT = 0.0 DATA DriveSignl:FLOAT = 0.0
{Imamps = 4.0+16.0*(1-IDriveSignl.value.real/100.0);
Itrav = IDriveSignl.value.real;})
</directory>
<directory> images
<file> alstom.gif CONTENTS =alstom_img SIZE = alstom_img_length GIF
<file> DI80Mimic.gif CONTENTS = DI80Mimic_img SIZE = DI80Mimic_img_length GIF
<file> ccd.gif CONTENTS = ccd_img SIZE = ccd_img_length GIF
<file> HartMimic1.jpeg CONTENTS = HartMimic1_img SIZE = HartMimic1_img_length JPEG
<file> sensor.gif CONTENTS = sensor_img SIZE = sensor_img_length GIF
<file> valve.gif CONTENTS = valve_img SIZE = valve_img_length GIF
</directory>
<directory> javadir
<file> Trend.class CONTENTS = Trend_bcode SIZE = Trend_bcode_length JAVA
</directory>
</repository>
cgi
<repository> script_rep
<script> set.cgi ROUTINE = first_script
<script> xy_coordinates.cgi ROUTINE = coordinates
<script> gen.cgi ROUTINE = generator
</repository>
FIGURE 44.A3 Configuration script of the embedded server contents.
The configuration file also contains the CGI repository, which holds three scripts.
Compiling this specification file produces two files in ANSI C: one generating the VFS skeleton
and the other generating the page-composition routine. Below we present the contents of the file
generating the VFS skeleton (Figure 44.A4).
Embedded Web Servers in Distributed Control Systems 44-43
#include "DI80_VFS_0.h"
#include "data_base_processing.h"
extern tdata_base_struct server_root; extern tdata_base_struct cgi_bin;
extern char* indexnew_str; extern char* starter_str;
extern char* Hello_page_str; extern char* di80_param_form_str;
extern char* dvc5000_str; extern char* dvc5000i_str;
extern int alstom_img_length;
extern int DI80Mimic_img_length;
extern int ccd_img_length;
extern int HartMimic1_img_length;
extern int sensor_img_length;
extern int valve_img_length;
extern int Trend_bcode_length;
extern void first_script(int,...);
extern void coordinates(int,...);
extern void generator(int,...);
static tdata_base_struct db_page_root_gen_0(void)
{tdata_base_struct Irepository;
tdata_base_struct Iptrstack[10];
Irepository = InitRepository(NULL,"ROOT","page_root");
Iptrstack[0]=BuildFileNode("ROOT",indexnew_str,0,0,1);
AppendNode(Irepository,Iptrstack[0]);
Iptrstack[0]=BuildFileNode("starter.htm",starter_str,0,1,1);
AppendNode(Irepository,Iptrstack[0]);
Iptrstack[0]=BuildDirNode("public.one");
Iptrstack[1]=BuildFileNode("gauge1.htm",Hello_page_str,0,2,1);
InsertNode(Iptrstack[0],Iptrstack[1]);
Iptrstack[1]=BuildFileNode("gauge2.htm",Hello_page_str,0,3,1);
InsertNode(Iptrstack[0],Iptrstack[1]);
Iptrstack[1]=BuildFileNode("di80_param_form.htm",di80_param_form_str,0,4,1);
InsertNode(Iptrstack[0],Iptrstack[1]);
Iptrstack[1]=BuildFileNode("dvc5000_1.htm",dvc5000_str,0,5,1);
InsertNode(Iptrstack[0],Iptrstack[1]);
Iptrstack[1]=BuildFileNode("dvc5000_2.htm",dvc5000i_str,0,6,1);
InsertNode(Iptrstack[0],Iptrstack[1]);
AppendNode(Irepository,Iptrstack[0]);
Iptrstack[0]=BuildDirNode("images");
Iptrstack[1]=BuildFileNode("alstom.unused.gif",alstom_img,alstom_img_length,7,2);
InsertNode(Iptrstack[0],Iptrstack[1]);
Iptrstack[1]=BuildFileNode("DI80Mimic.old.gif",DI80Mimic_img,DI80Mimic_img_length,8,2);
InsertNode(Iptrstack[0],Iptrstack[1]);
Iptrstack[1]=BuildFileNode("ccd.unused.gif",ccd_img,ccd_img_length,9,2);
InsertNode(Iptrstack[0],Iptrstack[1]);
Iptrstack[1]=BuildFileNode("HartMimic1.jpeg",HartMimic1_img,HartMimic1_img_length,10,3);
InsertNode(Iptrstack[0],Iptrstack[1]);
Iptrstack[1]=BuildFileNode("sensor.gif",sensor_img,sensor_img_length,11,2);
InsertNode(Iptrstack[0],Iptrstack[1]);
Iptrstack[1]=BuildFileNode("valve.gif",valve_img,valve_img_length,12,2);
InsertNode(Iptrstack[0],Iptrstack[1]);
AppendNode(Irepository,Iptrstack[0]);
Iptrstack[0]=BuildDirNode("javadir");
Iptrstack[1]=BuildFileNode("Trend.class",Trend_bcode,Trend_bcode_length,13,4);
InsertNode(Iptrstack[0],Iptrstack[1]);
AppendNode(Irepository,Iptrstack[0]);
return Irepository;
}
static tdata_base_struct db_script_rep_gen_1(void)
{tdata_base_struct Irepository;
tdata_base_struct Iptrstack[10];
Irepository = InitRepository(NULL,NULL,"script_rep");
Iptrstack[0]=BuildScriptNode("set.cgi",16,first_script);
AppendNode(Irepository,Iptrstack[0]);
Iptrstack[0]=BuildScriptNode("xy_coordinates.cgi",17,coordinates);
AppendNode(Irepository,Iptrstack[0]);
Iptrstack[0]=BuildScriptNode("gen.cgi",18,generator);
AppendNode(Irepository,Iptrstack[0]);
return Irepository;
}
FIGURE 44.A4 Code generated by the configuration tool set from the script in Figure 44.A3.
45
HTTP Digest Authentication for Embedded Web Servers
Mario Crevatin and Thomas P. von Hoff
ABB Switzerland Ltd.
45.1 Introduction ......................................... 45-1
     Motivation • Security Objectives • Outline
45.2 Security Extensions in the TCP/IP Stack ............. 45-3
     Link Layer Security • IPSec • Secure Sockets Layer/Transport
     Layer Security • Application Layer Security
45.3 Basic Access Authentication Scheme .................. 45-4
45.4 DAA Scheme .......................................... 45-5
     Cryptographical Prerequisites • Digest Authentication •
     Digest Authentication with Integrity Protection • Digest
     Authentication with Mutual Authentication • Summary
45.5 Weaknesses and Attacks .............................. 45-9
     Basic Authentication • Replay Attacks • Man-in-the-Middle
     Attack • Dictionary Attack/Brute Force Attack • Buffer
     Overflow • URI Check
45.6 Implementations ..................................... 45-11
     Servers • Browsers • DAA Compatibility
45.7 Conclusions ......................................... 45-12
Appendix: A Brief Review of the HTTP ..................... 45-12
Acknowledgment ........................................... 45-14
References ............................................... 45-14
45.1 Introduction
45.1.1 Motivation
The application area of the Hypertext Transfer Protocol (HTTP) is growing steadily. While it was
originally intended as the protocol to transfer HTML files, it is increasingly used by other applications.
One reason is that the port assigned to HTTP is almost never blocked by a firewall. Thus, running an
application on top of HTTP allows communication through network security elements such as packet
filters. Examples of such applications are web mail and Web-based Distributed Authoring and Versioning
(WebDAV) [1,2]. Since these web services contain no security features in their specification, they depend
45-1
on security provided by HTTP or lower protocol layers. Most implementations of protocols below HTTP
do not provide user authentication, hence this service is offered by extensions to HTTP, namely basic and
Digest Access Authentication (DAA) [3].
In today's industrial communication, the trend is to replace proprietary communication protocols by the
standardized TCP/IP protocol stack [4]. This is also owing to the increased connectivity of automation
networks, which opens new opportunities to improve the efficiency of operation and maintenance
of automation systems. In the course of this development the number of embedded web servers has
increased rapidly. These web servers allow web-based configuration, control, and monitoring of devices
and industrial processes. Owing to the connectivity of the communication networks of the various
hierarchy levels (control network, Local Area Network [LAN], Wide Area Network [WAN]), establishing
access to any device from any place in the plant, or even globally, becomes technically feasible. However,
in addition to many opportunities, this technology also brings many security challenges [13].
Embedded web servers usually run on processors with limited resources, in terms of both memory
and processor power. These restrictions favor the deployment of lightweight security mechanisms. Vendors
offer tailored versions of comprehensive security protocol suites such as Secure Sockets Layer (SSL)
and IP Security Protocol (IPSec). However, even these versions may not be suitable for all types of
processors and applications, owing to their memory and computational requirements. Where
applications are restricted to HTTP, DAA is an alternative solution [5]. This protocol extension
to HTTP is economical in terms of memory and processor power. Although designed for
user authentication in particular, several further services were included in its original definition. In this
chapter, we focus on the mechanisms and services as well as on the potential applications of HTTP digest
authentication.
45.1.2 Security Objectives
We distinguish the following security objectives for communication systems:
Confidentiality: Guarantees that information is shared only among authorized persons or
organizations. Encryption of the transmitted data using cryptography prevents unauthorized
disclosure.
Integrity: A system protects the integrity of data if it makes any modification detectable. This can
be achieved by adding a cryptographic check sum.
Authenticity: Guarantees that a receiver of a message can ascertain its origin and that an intruder
cannot masquerade as an authorized person. Authenticity is a prerequisite for access control.
Access control: Guarantees that only authorized people or devices have access to specific information.
Availability: Guarantees that a resource is always available.
In business and commercial environments, auditability, nonrepudiability, and third-party protection also
belong to the set of security objectives. Note that the relevance of the individual security objectives varies
from case to case and depends much on the specific application. A business web application where monetary
transactions may be involved has different security requirements than an industrial application. While
for the former confidentiality of the data transfer is a major issue, it is less sensitive in the
latter case. In turn, other security objectives such as user authentication and integrity protection are much
more critical in industrial communication. These considerations become an issue in particular when the
embedded web server is not within a well-protected network but is installed at a remote location. Such
situations may occur in distributed applications.
45.1.3 Outline
First, an overview of the services of the security extensions in the TCP/IP protocol suite is given, with
a focus on SSL and IPSec. Starting with a brief review of the HTTP message exchange, the mechanisms
of HTTP basic and digest authentication are detailed and all their additional useful options (integrity
protection and mutual authentication) are discussed. Furthermore, the current implementation status
of some (embedded) web servers (Apache 2.0.42, Allegro RomPager 4.05, GoAhead 2.1.2) and browsers
(Mozilla 1.01, Internet Explorer 6.0.26, Opera 6.05) is investigated. The results of functionality and
interoperability tests are presented.
45.2 Security Extensions in the TCP/IP Stack
Security services are provided at different layers in the TCP/IP communication protocol suite by
appropriate protocol extensions [6]. An overview of those extensions is shown in Figure 45.1. The communication
protocol stack concept makes the security services of a given layer transparent to the upper layers. The
security extensions on the Internet layer and the transport layer, IPSec and SSL, respectively, provide
a large range of security services and have therefore been widely implemented.
45.2.1 Link Layer Security
As extensions to the Point-to-Point Protocol (PPP), the (cryptographically weak) Password Authentication
Protocol (PAP) and the stronger Challenge Handshake Authentication Protocol (CHAP) provide
authentication. To establish secure tunnels with a PPP connection into a LAN or a WAN, the Point-to-Point
Tunnel Protocol (PPTP) or the Layer 2 Tunnel Protocol (L2TP) can be used.
45.2.2 IPSec
This network layer security protocol is particularly useful if several network applications need to be
secured. As protection is applied at the IP layer, IPSec provides a single means of protection for most
data exchanges (UDP and TCP applications). It is transparent to all upper layers. The security services
provided by IPSec are:
Access control (IP filtering)
Data integrity
Encryption (optional)
Data origin authentication (optional)
These services are based on cryptographic mechanisms guaranteeing a high security level when used with
strong algorithms. However, a drawback of IPSec is that a specific configuration is required for each
host-to-host link. While IPSec provides machine-to-machine security, it cannot authenticate the
user. Therefore, IPSec is mainly deployed to establish Virtual Private Networks (VPNs).
An IPSec implementation on a Coldfire MCF5307 65 MHz processor showed a program memory
requirement of 64 KB, without support of the Internet Key Exchange (IKE) Protocol. Experiments
FIGURE 45.1 Network layers and associated security protocols (application layer: basic/digest
authentication, PGP, SSH; transport layer: SSL/TLS; Internet layer: IPSec; link layer: PPTP, L2TP,
PAP, CHAP).
consisting of ping requests between two Coldfire processors were performed. The delay between a ping
request and the reception of its reply was observed to become twice or even three times longer
compared with the unprotected case, when IPSec was activated using the Authentication Header (AH) or the
Encapsulating Security Payload (ESP) configuration, respectively.
45.2.3 Secure Sockets Layer/Transport Layer Security
The SSL is a protocol created by Netscape Communications Corporation. The standardized version is also
known as Transport Layer Security (TLS). SSL is transparent to the end user and to the upper protocol
layers. It protects all applications running on top of TCP, but does not protect UDP applications. The
https prefix of the URI (Uniform Resource Identifier) and the lock icon in the browser GUI indicate
that the SSL protocol is in use. If the server's certificate is not signed by a certificate authority trusted by
the client (browser), the user is prompted to accept or refuse the certificate.
The security services provided by SSL/TLS are:
Session key management and negotiation of cryptographic algorithms
Confidentiality using encryption
Server authentication using certificates
Data integrity protection
Secure Sockets Layer includes optional client authentication, which is rarely performed in practice. Under
the encryption protection provided by SSL, user authentication is often implemented at application
level. In summary, SSL provides a high level of security, but has high memory and computation
requirements, particularly when considering the constraints of embedded web servers.
45.2.4 Application Layer Security
The procedures described in the previous sections operate on the lower layers and focus on the
authentication of machines. On the application layer, individual applications may provide their own security
enhancements. Typical security tools are PGP/GnuPG to secure mail transfer and SSH (secure shell).
For HTTP, there exist the protocol extensions HTTP basic and digest authentication, which authenticate
users to control their access to protected documents. Authentication of the user is in contrast to the machine
authentication provided by the protocols described in the sections above. Since the remainder of this
chapter focuses on the protocol extensions of HTTP, a brief review of HTTP is given in the Appendix.
45.3 Basic Access Authentication Scheme
The HTTP basic authentication scheme [3] is the simplest authentication scheme and provides only weak
protection, since username and password can be discovered by eavesdropping on the message exchange.
The HTTP message exchange for basic authentication is depicted in Figure 45.2. On reception of a
401 unauthorized message, the client (browser) prompts the user for his or her username and password.
These are transmitted in the clear in the authorization request-header field, for each accessed document
within the same protection space (Figure 45.3).
1. The browser issues an HTTP GET command to the server, with the requested URI.
2. The server answers with a 401 unauthorized HTTP error code and requests the browser to send a
valid username and password (credentials)¹ using basic authentication. The realm² (string) is also
included in the challenge sent to the client. These parameters are part of the WWW-authenticate
request-header field.

¹ Credentials: Information that can be used to establish the identity of an entity. Credentials include things such as
private keys, tickets, or simply a username and password pair. This is also known as the shared secret.
² Realm: Name identifying a protection space (zone) on a server. Usually shown to the user at the password prompt.
FIGURE 45.2 HTTP basic authentication negotiation. Message sequence:
Client → Server: GET URI HTTP/1.1
Server → Client: HTTP/1.1 401 unauthorized; WWW-authenticate: Basic realm="Basic Test Zone"
Client → Server: GET URI HTTP/1.1; Authorization: Basic dGVzdDp0ZXN0
Server → Client: HTTP/1.1 200 OK; <data>
FIGURE 45.3 Internet Explorer's basic authentication prompt.
3. The browser prompts the user for username and password. The realm (here "Basic Test Zone") is
usually displayed to the user. The credentials are sent with a new GET request, encoded in Base64
format. Decoding is trivial because Base64 is a simply invertible encoding scheme. The credentials
are sent in the authorization response-header field.
4. After the server has checked and accepted the password, the requested document is sent to the client
with an HTTP 200 response.
5. The client displays the document, and automatically sends the same credentials for any subsequent
request made under the same protection space. Hence the password is sent unencrypted with each
request.
Note that, unless confidentiality is provided by some other security protocol on a lower layer (see
Section 45.2), username and password are transmitted in an unprotected way.
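The weakness is easy to demonstrate: the Authorization header carries only a Base64 encoding of username:password, which anyone on the path can invert. The sketch below (in Python, for illustration) reproduces the token shown in Figure 45.2 for the credentials test:test.

```python
import base64

def basic_auth_header(username, password):
    # The credentials are joined with ':' and Base64-encoded. This is an
    # encoding, not encryption: it is trivially reversible.
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return f"Authorization: Basic {token}"

header = basic_auth_header("test", "test")
print(header)   # Authorization: Basic dGVzdDp0ZXN0

# An eavesdropper recovers the password with a single decode call.
token = header.split()[-1]
print(base64.b64decode(token).decode())   # test:test
```

The decoded output shows why basic authentication provides no confidentiality whatsoever for the shared secret.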
45.4 DAA Scheme
45.4.1 Cryptographical Prerequisites
Unless public keys are used, authentication is based on a shared secret (credentials) between the
authenticating and the authenticated entity. Usually, these credentials consist of the relation between a username
FIGURE 45.4 Internet Explorer's digest authentication prompt.
and its password (Figure 45.4). One possibility to authenticate a peer over the network is the submission
of username and password, as done in the basic authentication scheme (see Section 45.3). However,
since they are transmitted in the clear over the network, an attacker with access to the network traffic can
eavesdrop on the credentials. The challenge/response concept solves this problem by avoiding sending the
password in the clear. Instead, the authenticating entity A sends a challenge x to the entity B to be
authenticated. B calculates the response z_B = f(x, y), where y is the shared secret between A and B. A also
calculates z_A = f(x, y) and checks whether z_A coincides with z_B. If so, the identity of B is proven to A.
To make the challenge and response procedure secure, there are two requirements. First, x needs to
be random, so that z_B is of no value to any attacker intercepting it. Second, f should be a one-way hash
function [7]. The properties of hash functions are:
A finite-length output message (hash) is calculated from an arbitrary-length input message.
It is easy to determine the output message.
Given an output message, it is hard to find a corresponding input message.
It is hard to find another input message with the same output message (hash).
Functions having these properties meet the requirements for cryptographic check sums. It is their quasi-
uniqueness (it is hard to find two input messages producing the same output) that allows an owner B of a
message z to show only the message's hash to another user A, while A can still trust that z is known by B.
The same property is used to protect the integrity of data storage or message transmission. A comparison
of the hash of the received message or the stored data with the original hash will detect any change in the
message or the data. The most frequently used hash functions are MD5 and SHA-1 [7].
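The challenge/response exchange above can be sketched in a few lines. This is an illustration only: the function f, the secret, and the variable names are placeholders, with MD5 standing in for the one-way function.

```python
import hashlib
import secrets

SHARED_SECRET = "s3cret"   # y: known to both A and B, never transmitted

def f(challenge, secret):
    # One-way function f(x, y): here, an MD5 hash over challenge and secret.
    return hashlib.md5(f"{challenge}:{secret}".encode()).hexdigest()

# A (the authenticator) sends a fresh random challenge x ...
x = secrets.token_hex(16)

# ... B proves knowledge of y by returning z_B = f(x, y) ...
z_b = f(x, SHARED_SECRET)

# ... and A recomputes z_A = f(x, y) and compares.
z_a = f(x, SHARED_SECRET)
assert z_a == z_b   # identity of B is proven to A

# An intercepted z_B is worthless to an attacker: the next challenge differs,
# so its response cannot be replayed.
x_next = secrets.token_hex(16)
assert f(x_next, SHARED_SECRET) != z_b
```

The randomness of x is what makes an intercepted response single-use, and the one-way property of f is what keeps y hidden even though z_B travels in the clear.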
45.4.2 Digest Authentication
Although very similar to the basic authentication scheme, DAA [3] is much more secure. Instead of
sending the username and the password in an unprotected way, a unique code is calculated from username,
password, and a unique number received from the server. Figure 45.5 shows the HTTP DAA transactions
between a web server and a browser:
1. The browser requests a document in the usual way, with an HTTP request message.
2. The server sends back a 401 unauthorized challenge response message. The server generates a
nonce (number used once) and sends it to the client. Note that the nonce must be different for
every 401 message.
FIGURE 45.5 HTTP digest authentication negotiation. Message sequence:
Client → Server: GET /protected/test.html HTTP/1.1
Server → Client: HTTP/1.1 401 unauthorized; WWW-authenticate: Digest realm="DigestZone",
nonce="3gw6...", algorithm=MD5, domain="/protected", qop="auth"
Client → Server: GET /protected/test.html HTTP/1.1; Authorization: Digest username="test",
realm="DigestZone", nonce="3gw6...", algorithm=MD5, uri="/protected/test.html",
qop=auth, response="65bia...", nc=0001, cnonce="82c..."
Server → Client: HTTP/1.1 200 OK; Authentication-Info: rspauth="d9260...",
qop=auth, nc=0001, cnonce="82c..."; <data>
3. The browser prompts the user for username and password, and computes a 128-bit response
using the MD5 algorithm [7] as a one-way hash function:
response=MD5[MD5(username:realm:password):nonce:nc:cnonce:qop:MD5(method:uri)]
This response is sent to the server along with the received nonce, the requested uri, its own
generated cnonce (client nonce), and the username. qop stands for Quality of Protection and
indicates whether additional integrity protection is provided. In this example, this is not the case,
hence qop=auth.
4. The server calculates its own response following the same scheme as given in Step 3, using the
information it sent to the client before and its own version of the username and password (or
optionally a hashed form of them). It compares the received version with the computed one and grants
access to the resource (HTTP 200 OK response) if the results match.
If the authorization fails, a new 401 error message is sent. All 401 error messages include an
HTML error page to be displayed by the browser. Browsers usually reprompt the user for a new
username and password three times before giving up and displaying the error page.
5. For any subsequent request, the client usually generates a different cnonce. A counter, nc, is
incremented. This new cnonce and counter, along with the new uri, are used to recompute a new
valid response value. Usually the browser stores the username and the password temporarily in
memory in order to allow a user to reaccess a given protection space without retyping the password.
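The response computation of Step 3 follows directly from the formula. The sketch below illustrates it for qop=auth; the nonce, nc, and cnonce values are placeholders for illustration, not the elided values from Figure 45.5.

```python
import hashlib

def md5_hex(s):
    return hashlib.md5(s.encode()).hexdigest()

def digest_response(username, realm, password, method, uri,
                    nonce, nc, cnonce, qop="auth"):
    # HA1 = MD5(username:realm:password); a server may store this hash
    # instead of the clear-text password.
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    # HA2 = MD5(method:uri)
    ha2 = md5_hex(f"{method}:{uri}")
    # response = MD5(HA1:nonce:nc:cnonce:qop:HA2)
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")

# Example with placeholder nonce values:
resp = digest_response("test", "DigestZone", "test",
                       "GET", "/protected/test.html",
                       nonce="3gw6", nc="0001", cnonce="82c")
print(resp)   # a 32-hex-digit MD5 response
```

Both sides run this same computation; the server grants access only when the two 128-bit results match, so the password itself never crosses the network.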
45.4.3 Digest Authentication with Integrity Protection
The RFC for digest authentication [3] provides the capability to include the hash of the entity (the payload,
usually HTML code) in the computed MD5-hash response:
response=MD5[MD5(username:realm:password):nonce:nc:cnonce:qop:MD5
(method:uri:MD5{entity})]
In this way, any modification of the transmitted information will result in a different MD5-hash response
and is thus easily detectable. While the integrity of the document from the server is ensured in a
response message, the integrity of POST data is protected in a request message. To indicate that the server
supports integrity protection, the argument qop is set to auth-int.
FIGURE 45.6 Mutual authentication mechanism. The client requests a document; the server answers
with a challenge (nonce); the client returns its authorization (response) together with its own challenge
(cnonce); the server answers with its authorization (rspauth).
Note that, for GET requests with arguments, the integrity of the payload (the arguments) is already
protected without the option qop=auth-int, because the URI with its arguments is included in the
MD5-hash. For integrity protection of the response from the server, the rspauth field must be present.
See Section 45.4.4 on mutual authentication for details of rspauth.
45.4.4 Digest Authentication with Mutual Authentication
We have already seen that digest authentication identifies the client. However, DAA also foresees authentication
of the server by the client, providing mutual authentication. The server already knows the client is
trustworthy, because the browser has sent proof that the user knows their shared secret. This occurred
when sending the response to the server (see Step 3 in Figure 45.5). Exactly the same mechanism is
used to authenticate the server. After receiving correct client credentials along with the GET request, the
server sends back proof that it also knows the shared secret. This is done via the rspauth field sent
in the Authentication-Info header of the HTTP 200 OK message, along with the previously
requested document. The challenge initiating the server response is the cnonce from the client.
rspauth=MD5[MD5(username:realm:password):nonce:nc:cnonce:qop:MD5(:uri)]
The browser uses this information to authenticate the server. If integrity protection is activated, the hash
of the entity is included in rspauth:³
rspauth=MD5[MD5(username:realm:password):nonce:nc:cnonce:qop:MD5(:uri:MD5{entity})]
The nonce from the server is used to challenge the client with an unpredictable number. In the same way,
when server authentication is used for mutual authentication with DAA, the cnonce from the client is
used as a challenge that the server cannot predict. Therefore, it is not possible to precompute responses to
those challenges.
This is known as a mutual challenge/response mechanism. Its concept is depicted in Figure 45.6.
45.4.5 Summary
DAA offers a secure authentication scheme with low implementation complexity, well adapted to
embedded systems. In this environment, authentication and integrity protection are important, whereas
confidentiality is often not required. Unfortunately, integrity protection and mutual authentication are
not yet supported by today's typical DAA implementations on clients and servers.
³ The only difference from the client response is the missing field method.
TABLE 45.1 Comparison Between DAA and SSL on Functionality and Footprint Size

                                DAA            SSL
Mandatory features
  Client authentication         x
  Server authentication                        x
  Data integrity                               x
  Data confidentiality                         x
Optional features
  Client authentication                        x
  Server authentication         x
  Data integrity                x
  Data confidentiality
Memory requirements
  RAM                           1648 Byte^a    100 KB-250 KB^b
  ROM                           6312 Byte^a    200 KB^c

^a According to our own measurements on a Coldfire MCF5307.
^b RAM required depends on the number of simultaneous secure connections (1 to 24),
according to Reference 8.
^c According to information from Allegro.
Table 45.1 compares mandatory and optional features of DAA and SSL, as well as their memory
requirements. Note that at the time our tests were conducted, the optional features were supported by some
implementations only.
45.5 Weaknesses and Attacks
45.5.1 Basic Authentication
If an attacker has access to the network, he can eavesdrop on the HTTP transaction very easily and obtain
the username and the password if no encryption is provided, for example, by SSL or IPSec. In such
cases, HTTP basic authentication should be replaced by the digest scheme (Table 45.2).
45.5.2 Replay Attacks
Vulnerabilities appear when data, for example, sensitive commands, are sent to the server. With POST and
some other HTTP methods, such data is transmitted in the entity part of the message. An attacker could
replay credentials from an intercepted POST request with tampered form data (commands), thus taking
control of the remote server (automation system). In contrast, the arguments sent in a GET request are
part of the input to the digest computation. Thus, GET requests are safer and should preferably be used
to send data to the server. The fact that the credentials are stored in the browser's cache when using the
GET method has no effect on security, because of the uniqueness of the nonce.
Proper nonce generation together with a reliable check for uniqueness provides good protection against
replay of previously used valid credentials. Although the definition of a nonce requires its uniqueness,
implementers might be tempted to reuse a nonce; this must be avoided. Within a session, a replay attack
is prevented by incrementing the counter nc, making the previously calculated hash value invalid. Note
that the change of one bit in the argument of a one-way hash function changes on average half of the bits
of the hash value [7]. In addition, integrity protection would prevent tampering with information when
POST is used.
On an embedded device, where usually only one legitimate user at a time is likely to access the system,
locking the protection space (realm) to only one client at a time also contributes to protection.
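A server-side nonce scheme along these lines might look as follows. This is an illustrative sketch, not a production design; the bounded set and the eviction policy are assumptions made for a memory-constrained device.

```python
import hashlib
import secrets
import time

# Nonces already handed out; kept bounded on a memory-constrained device.
_issued = set()

def make_nonce(max_outstanding=64):
    # A nonce must be unpredictable and unique: mix a timestamp with
    # random bytes, then hash so the value stays opaque to the client.
    raw = f"{time.time()}:{secrets.token_hex(8)}"
    nonce = hashlib.md5(raw.encode()).hexdigest()
    if len(_issued) >= max_outstanding:
        _issued.pop()   # evict an arbitrary old nonce to bound memory
    _issued.add(nonce)
    return nonce

def accept_nonce(nonce):
    # Honor each nonce exactly once: a replayed request is rejected
    # because its nonce has already been consumed.
    try:
        _issued.remove(nonce)
        return True
    except KeyError:
        return False

n = make_nonce()
assert accept_nonce(n) is True    # first use accepted
assert accept_nonce(n) is False   # replay rejected
```

Real servers typically refine this with per-nonce request counters (checking that nc strictly increases) rather than strict one-shot nonces, but the uniqueness check is the core of the replay defense.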
TABLE 45.2 Comparison between DAA Server Implementations

Features                       Apache   RomPager   GoAhead
Basic access authentication      yes       yes       yes
DAA                              yes       yes       yes
Mutual authentication            yes       no        no
Integrity protection             no        no        no
Nonce check                      yes       yes       no
URI check                        yes       yes       no
45.5.3 Man-in-the-Middle Attack
An attacker might be able to insert himself into the path between a client and a server. Capturing the packet containing the server's response with the challenge for digest authentication, he can replace the WWW-Authenticate field with a field requesting basic authentication from the client. Username and password can thus be gained, as the client returns them in an unprotected form owing to the faked WWW-Authenticate field.
Disabling basic authentication in the browser and requiring mutual authentication could prevent such an attack.
45.5.4 Dictionary Attack/Brute Force Attack
This attack assumes that the user chooses a simple password. In the client's request message (Step 3), all information needed to calculate the response field, apart from the password, is available. Having such a message in his hands, an attacker might thus compute thousands of responses generated with a list of possible passwords from a dictionary and see whether one coincides with the response sent by the browser. The success probability of such a dictionary attack can be decreased by proper selection of the password, for example, avoiding common words and including lower- and upper-case letters as well as special characters.
For a brute force attack, the list of possible passwords is replaced by all combinations of characters. However, the fact that a hash must be calculated for each password guess makes a brute force attack very expensive.
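The cost structure of such an attack can be illustrated with the RFC 2617 response calculation for qop=auth. The function names and field values below are illustrative, but the hashing chain (HA1, HA2, response) follows the RFC:

```python
import hashlib

def md5_hex(s: str) -> str:
    return hashlib.md5(s.encode()).hexdigest()

def digest_response(username, realm, password, method, uri,
                    nonce, nc, cnonce, qop="auth"):
    """RFC 2617 response for qop=auth:
    response = MD5(HA1:nonce:nc:cnonce:qop:HA2)"""
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")

def guess_password(observed_response, wordlist, **public_fields):
    """Dictionary attack sketch: every field except the password is
    visible in the sniffed request, so the attacker recomputes the
    response for each candidate until one matches."""
    for candidate in wordlist:
        if digest_response(password=candidate, **public_fields) == observed_response:
            return candidate
    return None
```

Each guess costs three MD5 computations, which is exactly what makes exhaustive search over a large character space expensive.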
45.5.5 Buffer Overflow
In a buffer overflow attack, the attacker sends very long commands to the server [9]. In embedded web servers, web pages often serve as an entry to CGI programs. Input data for such a program are transmitted in the HTTP message. If the size of such data is too large, the data can overwrite memory beyond the intended buffer, including the original executable code of the browser/server, and be executed by the processor, unless a careful error check is performed. Mostly, this will crash the server, resulting in a denial of service. However, it can also allow a skilled attacker to gain access to everything the server has access to, for example, confidential information or the control of the devices in the automation system.
Buffer overflows can be avoided when all applications perform careful data range checks. For integrated third-party code it is crucial to update the code to the latest available version and to apply any security patches or service packs as soon as they become available.
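Such a range check is simple to state, even if it is often forgotten. The sketch below, with an invented buffer limit and handler, only illustrates the principle of rejecting oversized entity data before it can reach a fixed-size buffer:

```python
# Illustrative server-side range check before handing HTTP entity data to
# a CGI program. The limit and the error code are example choices, not
# taken from any particular embedded web server.
MAX_ENTITY = 1024   # size of the buffer the CGI program can safely hold

def read_entity(headers: dict, body: bytes):
    """Return (status, entity). Refuse the request outright if either the
    declared Content-Length or the actual body exceeds the buffer size."""
    length = int(headers.get("Content-Length", "0"))
    if length > MAX_ENTITY or len(body) > MAX_ENTITY:
        return (413, b"")            # 413: request entity too large
    return (200, body[:length])
```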
45.5.6 URI Check
Some web servers do not check whether the uri field located in the Authorization header corresponds to the requested URI in the GET request. An attacker may replay a stored valid request from a client for
a given uri, but modify the GET line to obtain some other protected documents he wants or, possibly even more dangerous, the server may accept modified parameters in a GET request. This might allow an attacker to send arbitrary commands to an embedded automation device. See the uri check in Section 45.6.
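A minimal form of the missing check is a literal comparison of the uri field in the Authorization header with the request line. The helper below illustrates the principle and is not taken from any of the servers examined:

```python
import re

def uri_matches(request_line: str, authorization: str) -> bool:
    """Verify that the uri= parameter inside the Authorization header
    matches the resource on the request line, so a replayed header cannot
    be attached to a GET for a different document or with altered
    parameters."""
    # request line looks like: 'GET /status.html?cmd=on HTTP/1.1'
    requested = request_line.split()[1]
    m = re.search(r'uri="([^"]*)"', authorization)
    return bool(m) and m.group(1) == requested
```

Because the uri field is covered by the digest, a mismatch detected here also means the replayed response value is invalid for the URI actually requested.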
45.6 Implementations
Some available embedded web server implementations and browsers have been tested for their support of DAA. Investigations have shown that the implementations do not match the specification in every aspect. This section briefly outlines the results.
45.6.1 Servers
1. Apache 2.0.42. Among the DAA implementations tested, Apache has the best one. While mutual authentication is in place and working, integrity protection is not yet implemented. The nonce lifetime is adjustable and the uri is checked. Apache is also the most robust server in terms of resistance to exploits, since a large user community uses and tests it continuously. Note that DAA is compatible with the Opera browser only if the AuthDigestDomain directive is configured in the file .htaccess. The Internet Explorer (IE) removes the arguments when copying the GET query into the uri field. Therefore, DAA is not compatible between IE and Apache for GET requests with parameters. The nonce is composed of a timestamp (in the clear), followed by the "=" sign, and the SHA1 hash [7] of the previous timestamp, realm, and a server secret (private key):
nonce = timestamp "=" SHA1(timestamp + secret)
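A nonce of this style, a cleartext timestamp bound to a keyed hash so the server can verify age and authenticity without storing state, can be sketched as follows. The field order, secret, and validity window are illustrative assumptions, not Apache's exact implementation:

```python
import hashlib, time

SECRET = b"server-private-key"       # illustrative server secret

def make_nonce(realm, now=None):
    """Build timestamp=SHA1(timestamp + realm + secret); the timestamp
    stays readable so the server can later check the nonce's age."""
    ts = str(int(now if now is not None else time.time()))
    digest = hashlib.sha1(ts.encode() + realm.encode() + SECRET).hexdigest()
    return f"{ts}={digest}"

def nonce_is_valid(nonce, realm, max_age, now):
    """Recompute the keyed hash and check the timestamp window; a forged
    or expired nonce fails either test."""
    ts, _, digest = nonce.partition("=")
    expected = hashlib.sha1(ts.encode() + realm.encode() + SECRET).hexdigest()
    return digest == expected and 0 <= now - float(ts) <= max_age
```

The design choice here is statelessness: the server need not remember issued nonces, because only the holder of the secret can produce a hash that matches the embedded timestamp.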
2. Allegro RomPager 4.05. In the RomPager web server [10], DAA is implemented without mutual authentication and integrity protection. There exists the option StrictDigestAndIp, where the validity of the unique nonce is time limited and never more than one IP address is granted access at a given instant. This feature is appropriate in embedded systems because it prevents replay attacks. RomPager makes a full uri check. In addition, it is able to cope with the less secure uri processing of the IE. Furthermore, RomPager assumes that requests received via HTTP 1.0 originate from a browser not supporting digest. As a consequence, DAA does not work with IE and Opera when connected via a proxy. Therefore, Opera needs to be configured without proxies. On IE the "Use HTTP 1.1 through proxy connections" option can be set. The nonce is generated using the time, the server IP address, the previous nonce (if there was one), and the server name:
nonce = MD5(Time:Server-IP-Address:[previous nonce:]Server-Name)
3. GoAhead 2.1.2. GoAhead is a free and open-source server, developed for embedded devices on a variety of platforms. Neither mutual authentication nor integrity protection is supported. A given nonce never expires and is never checked. Hence, there is no protection from replay attacks. The server removes the parameters of a GET query request in the digest uri field. This prevents Mozilla from being compatible with those types of requests.
nonce = MD5(RANDOMKEY:timestamp:myrealm)
45.6.2 Browsers
All three browsers tested here have their strengths and weaknesses. IE is the only one using a different prompt for basic and digest authentication (see Figure 45.3 and Figure 45.4). Being made aware of this difference, an attentive user can recognize a man-in-the-middle attack (see Section 45.5.3). However, IE removes GET arguments in the uri. Mozilla is an open-source browser, and thus very easy to modify. On the other hand, the current implementation is not very user friendly (slow and continually asking for the username and password). Opera is the strongest one in terms of security and the only one supporting
TABLE 45.3 Compatibility of Client and Server Implementations of DAA

Servers                  Mozilla 1.01/Netscape 7   IE 6.0.26   Opera 6.05
Apache 2.0.42 (win32)                              a           b
RomPager 4.05
GoAhead 2.1.2            a

a: Not working for GET with parameters.
b: Requires valid domain.
mutual authentication. Concerning DAA, a combination of the three products mentioned above would probably meet the features expected from a perfect DAA implementation. These are:
- Option to disable basic authentication.
- The user shall be notified by some visual indication that DAA is used, when he is prompted for username/password, and also during browsing.
- Support of DAA with mutual authentication, with some displayed indication that the server has been authenticated, and the possibility to refuse pages if the server has not been authenticated.
- Support of DAA with integrity protection, with a visual indication that it is used.
- Verification that the URI requested with DAA is in the protection space.
45.6.3 DAA Compatibility
Table 45.3 summarizes the compatibility of different clients versus different servers. Note that the compatibility tests did not include server authentication and data integrity.
45.7 Conclusions
Digest Access Authentication is a lightweight, yet efficient way of providing user authentication. Applications running on top of HTTP can benefit from the services of DAA. Typically these applications are web services like WebDAV and HMI applications in automation systems, migrating from proprietary communication protocols toward TCP/IP technology. Wherever basic authentication is still in use and not protected by a security protocol of a lower layer, it should be replaced by DAA. For embedded web server applications, it is urgent that browser and web server vendors implement mutual authentication and integrity protection, as these services are required to achieve a high security level. Where confidentiality is not required, an implementation of DAA including all features defined in the RFC [3], namely mutual authentication and integrity protection, would provide sufficient, yet lightweight, security for embedded systems.
Appendix: A Brief Review of HTTP
HTTP is widely used to exchange text data across different platforms over a TCP/IP network. The definition of HTTP 1.1 is given in [11]. HTTP is based on standard request/response messages transmitted between a client (browser) and a web server. An example of a typical HTTP handshake is depicted in Figure 45.A1. The procedure is straightforward:
1. The browser sends a GET request to the server, indicating the requested resource.
2. The server responds with a 200 OK message along with the document requested.
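The two steps can be mimicked in a few lines. The sketch below merely composes a request of the form shown in Figure 45.A1 and parses the status line of a reply; the host and path are example values:

```python
def build_get(path: str, host: str) -> bytes:
    """Compose a minimal HTTP/1.1 GET request (step 1 of the handshake)."""
    return (f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n"
            "\r\n").encode("ascii")

def parse_status_line(response: bytes):
    """Extract (version, code, phrase) from the server's status start-line
    (step 2), e.g. b'HTTP/1.1 200 OK\r\n...' -> ('HTTP/1.1', 200, 'OK')."""
    version, code, phrase = response.split(b"\r\n", 1)[0].split(b" ", 2)
    return version.decode(), int(code), phrase.decode()
```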
Client to server (1):
GET /simple.html HTTP/1.1
Host: 192.168.0.3
User-Agent: Mozilla/5.0 (...) Gecko/20020530
Accept: text/html (...)
Accept-Language: en-us, en;q=0.50
Accept-Encoding: gzip, deflate, compress;q=0.9
Accept-Charset: ISO-8859-1, utf-8;q=0.66, *;q=0.66
Keep-Alive: 300
Connection: keep-alive

Server to client (2):
HTTP/1.1 200 OK
Date: Sun, 29 Dec 2002 15:21:13 GMT
Server: Apache/2.0.39 (Win32)
Last-Modified: Sun, 29 Dec 2002 15:05:13 GMT
ETag: "524d-fa-4921304a"
Accept-Ranges: bytes
Content-Length: 250
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=ISO-8859-1

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Test page</title>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body>
This is a simple HTML page
</body>
</html>

FIGURE 45.A1 Example of an HTTP message exchange (each message consists of a header; the server's reply additionally carries the HTML document as its entity).
Using Figure 45.A1, the key aspects of HTTP are briefly explained. An HTTP message consists of a header and, in most cases, an entity. Note that, while all data in the header is transferred as ASCII text, the entity might contain non-ASCII data, for example, .jpg files.
Header
A header is composed of a start-line and header-fields.
Start-line
There are two types of start-lines:
A request start-line is of the form Method URI HTTP-Version, for example,
GET /simple.html HTTP/1.1
A status start-line has the form HTTP-Version Status-Code Phrase, for example,
HTTP/1.1 200 OK
Header-fields
Header-fields give various information, including date, language, and security information. In this document, the security-relevant header fields were discussed in detail. Examples:
Host: 192.168.0.3
Date: Sun, 29 Dec 2002 15:21:29 GMT
WWW-Authenticate: Digest realm="abc",...
Method
The most relevant methods are GET and POST.
GET: The most common method to request a document. The GET request can also include
arguments in the URI. This is widely used to transmit commands from the client to the server.
POST: Method used to send information to the server, usually from a web page form.
URI
Uniform Resource Identifiers [12]. URIs in HTTP can be represented in absolute form or relative to
some known base URI. Example:
http://www.test.ch/simple.html or /simple.html.
Entity
The rest of the message, for example, an HTML document.
Acknowledgment
The authors would like to thank Emanuel Corthay from the Swiss Federal Institute of Technology Lausanne for his valuable contribution in the examination of the individual implementations.
References
[1] J. Slein, F. Vitali, E. Whitehead, and D. Durand, Requirements for a Distributed Authoring and Versioning Protocol for the World Wide Web, RFC 2291, 1998.
[2] E. Whitehead, A. Faizi, S. Carter, and D. Jensen, HTTP Extensions for Distributed Authoring, WebDAV, RFC 2518, 1999.
[3] J. Franks, P. Hallam-Baker, J. Hostetler, S. Lawrence, P. Leach, A. Luotonen, and L. Stewart, HTTP Authentication: Basic and Digest Access Authentication, RFC 2617, June 1999.
[4] M. Naedele, IT Security for Automation Systems: Motivations and Mechanisms, atp, Vol. 45, pp. 84–91, 2003.
[5] T. von Hoff and M. Crevatin, HTTP digest authentication in embedded automation systems. In Proceedings of the IEEE International Conference on Emerging Technologies for Factory Automation (ETFA'03), Vol. 1, pp. 390–397, 2003.
[6] W. Stallings, Network Security Essentials: Applications and Standards, Prentice-Hall, New York, 2000.
[7] B. Schneier, Applied Cryptography: Protocols, Algorithms, and Source Code in C, John Wiley & Sons, New York, 1996.
[8] RomPager Secure Programming Reference, Version 4.20, Allegro Software Development Corporation, Boxborough, MA, 2002.
[9] E. Cole, Hackers Beware, New Riders, 2002.
[10] RomPager Web Server Engine Porting & Configuration, Allegro Software Development Corporation, Boxborough, MA, 2000.
[11] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee, Hypertext Transfer Protocol HTTP/1.1, RFC 2616, June 1999.
[12] T. Berners-Lee, R. Fielding, and L. Masinter, Uniform Resource Identifier (URI), RFC 2396, August 1998.
[13] R.J. Anderson, Security Engineering: A Guide to Building Dependable Distributed Systems, John Wiley & Sons, New York, 2001.
Intelligent Sensors
46 Intelligent Sensors: Analysis and Design
Eric Dekneuvel
46
Intelligent Sensors: Analysis and Design

Eric Dekneuvel
University of Nice at Sophia Antipolis

46.1 Introduction
46.2 Designing an Intelligent Sensor
    Analysis • The External Model • Functional Decomposition of a Service • Sensor Architectural Design
46.3 The CAP Language
    Description • Illustration • Implementation
46.4 Conclusion
Acknowledgment
References
46.1 Introduction
Today, thanks to advances in numerical processing and communications, more and more functionalities are embedded into distributed components charged with providing the right access to these services. Complex systems are then seen as a collection of interacting subsystems embedding control and estimation algorithms. The inherent modularity concept behind this approach is the key answer to the increasing complexity of the systems, and this has led to the definition of new models and languages for the formal specification of the components [1]. In this chapter, we are more particularly interested in intelligent sensors, components associating computing and communication devices with sensing functions [2]. In order to reduce the complexity, the design of an intelligent sensor requires a model of the sensor at a high level of abstraction from the implementation. The disparity of the knowledge encapsulated inside the instrument renders the modeling process very sensitive to the modeling strategy adopted and to the models used. A real-life component like the intelligent instrument usually involves the cooperation of three kinds of programs [3]:
- A level of data management to perform transformational tasks.
- One or more reactive kernels to compute the outputs from the logical inputs, selecting the suitable reaction (computations and output emissions) to incoming inputs.
- Some interfaces with the environment to acquire the inputs and process the outputs. This level includes interrupt management, input reading from sensors, and conversion between logical and physical inputs/outputs. Communication with the other components of the system will also be managed at this level.
Data management covers research fields such as probability theory, possibility theory, measurement theory, and uncertainty management. Unlike a numeric sensor that provides an objective quantitative description of objects, a symbolic sensor provides a subjective qualitative description of objects [4]. This qualitative description, adapted to the sensor measurement, can be used in Knowledge-Based Systems (KBS), checking the validity of a measurement or improving the relevance of a result [5]. The reactive part is probably the most difficult part of the design of the intelligent sensor. Like all reactive systems, the intelligent sensor must continuously react to its environment at a speed determined by this environment. This often involves the ability to exhibit a deterministic behavior, to allow concurrency, and to satisfy strict real-time requirements.
A generic intelligent sensor model has been developed to help during the specification step of the sensor functionalities [6]. The purpose of the intelligent sensor generic model is to provide a high level of abstraction of the implementation of the sensor, focusing on the fundamental characteristics that the sensor must exhibit. For this, the generic model uses the point of view of the user to describe the services and the operating modes in which the services are available [2]. Then, by using a language to compute the formal description, we are in a position to evaluate the component from a static and/or dynamic point of view. Once the component is validated, a prototyping step can be launched in order to obtain an operational system prototype. This prototyping step being usually expensive in time and resources, the final implementation should be made as much as possible using automatic synthesis from the high-level description, to ensure implementations that are correct by construction [7].
In this chapter, after reviewing the main characteristics of the generic intelligent sensor formal model, we present an implementation of the model provided by the CAP language, a language specifically developed for the design of intelligent sensors.
46.2 Designing an Intelligent Sensor
46.2.1 Analysis
As stated earlier, the diversity of the embedded functions, the flexibility, and the reuse argue for a distribution of the functionalities inside a complex system into areas of responsibility [8]. From an external viewpoint, an intelligent sensor will be considered as a modular unit behaving as a server. As such, it will be designed to offer its customers (the operator, other instruments, or other modules) access to the various functionalities encapsulated inside the sensor.
Let us consider a simple example. A navigation system has to be designed using a closed loop on a surface like a wall to control the locomotion. As can be seen in Figure 46.1, the environment of the system to be designed exhibits various entities or actors, such as the axes, the operator, and the obstacles. At every occurrence of a start_moving request, a closed loop is activated until a new request like stop_moving is emitted by the operator. Once the links between the system and the environment are defined, the functional specifications can be established. For this, a dataflow diagram (see Figure 46.2) can be easily defined by identifying the data necessary for the navigation goals: a position measurement value useful for the closed loop to compute the values of the speed that are to be applied on the various axes.
Suppose we now decide to include another activity that will enable the system to follow a predefined trajectory. In this way, the operator (or a high-level decisional system) is provided with the possibility to choose between two methods according to the current context of the execution. This trajectory execution activity can be interested in knowing whether unexpected obstacles are met along this trajectory, in order to stop the system before striking one of the obstacles. If both functionalities, the obstacle detection and the computing of the position, use the same physical resource (a set of ultrasonic sensors, for example), it is better to encapsulate them for homogeneity inside a subsystem in charge of the physical resource.
The intelligent sensor module interacts with the environment through several messages. This is the interface of the module. The structure of a message (its signature) is generally limited at this level to a list of parameters such as the sender identity, the communication medium used, and the contents. We
FIGURE 46.1 Context diagram of the application (actors: Operator, Axes, Obstacles; flows around the navigation system: Start_moving, Stop_moving, Excitation, Speed measurements, Speed values, Easy data).
FIGURE 46.2 A global dataflow diagram (activities: Trajectory execution, Obstacle detection, Position counting, Wall following; flows: Control, Unexpected obstacle, Position, Speed instruction).
usually make a difference between messages for data communication and messages for control. Control messages enable the customers to communicate with the sensor in a bidirectional way, using a client-server protocol (see Figure 46.3). The customer requests the launching of an activity through the request link. The customer receives an identification number for its demand and can be informed about the status of the request (activity launched, terminated, etc.) by means of another message of control (reply).
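This request/reply protocol can be sketched as follows; the class and status names are invented for illustration and are not prescribed by the generic model:

```python
import itertools

class SensorServer:
    """Sketch of the client-server control protocol: a request returns an
    identification number for the demand, and reply messages report the
    status of that activity (launched, terminated, ...)."""

    def __init__(self):
        self._ids = itertools.count(1)   # identification numbers for demands
        self._status = {}

    def request(self, service: str) -> int:
        """Customer asks for an activity; the id of the demand is returned."""
        req_id = next(self._ids)
        self._status[req_id] = "launched"
        return req_id

    def reply(self, req_id: int) -> str:
        """Control message informing the customer about the request status."""
        return self._status.get(req_id, "unknown")

    def terminate(self, req_id: int):
        self._status[req_id] = "terminated"
```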
To be effective, the intelligent sensor interface description must be complemented by the behavioral description of the module. While the structural viewpoint describes the internal organization of a complex system, the behavioral viewpoint will express all the information that characterizes the module to be designed from an external viewpoint [9]. A generic model of an intelligent instrument has been developed for this purpose, using the concept of external services to qualify the set of operations offered to the outer entities. Reference 10 gives the following definition:
Definition 46.1 From an external point of view, a service is the result of the execution of a treatment, or a set of treatments, for which one can provide a functional interpretation.
In other terms, the execution of a service typically results in the production of output values according to input values consumed by the execution of a processing. The services are not limited to measurement aspects. The set of services covers a large spectrum of functionalities that we can expect from intelligent sensors. Intelligent sensors must be configured, calibrated, and enabled, so that they can provide their measurements to the rest of the system. Selecting a particular sensor, a power supply, a reference voltage, and a sampling frequency are common examples of configuration services that can be used to
FIGURE 46.3 The intelligent sensor interface (the intelligent sensor component acts as a server offering position computing and obstacle detection; the wall following and pilot components are customers connected through request/reply links and input/output data).
set the value of these parameters. Processing embedded inside services can be as simple as the acquisition of a value but, usually, it involves more complex treatments such as signal processing (to improve, for example, the resolution of a given value), data processing, validation of the measurement, and so on.
In the generic model of an intelligent sensor, a service will consequently be modeled by two sets of
parameters:
External: That is, how the service communicates with other services. The services are gathered into
User Operating Modes.
Internal: That is, how the external service is decomposed into internal basic processing units.
Let us examine in detail both aspects successively.
46.2.2 The External Model
Figure 46.4 depicts the external model of a service in use. A service is mainly described by the input/output data and is triggered by an external event. The data and events used can be organized into classes, with the description of a set of characteristics such as the format, the accuracy, the refresh period, etc., for each data class. The description of the input and output behaviors exhibits the possible interconnections between the service and those that precede or follow this particular service. This is then a data-driven representation of the service relationships, equivalent to an explicit representation, with the advantage of being more efficient and general (new services can be added without being obliged to physically interconnect the entire system).
Control parameters are received from the parent activity that requests the service. The control parameters affect the modalities of processing and the modalities of the underlying sensor the service might encapsulate. They are usually passed in conjunction with the service activation request. For example, one can easily imagine the obstacle detection service running in a cyclic shooting modality or in a single shooting modality, depending on the nature of the activity that requests the service.
The launching of a service can be conditioned by the verification of its activation conditions. The distinction between a request and a condition is that the request for a service is emitted by the user, while the condition is processed by the system. Those conditions are often related to the access rights or to the security aspects and induce the verification of the origin of the request, the mode of transmission used, etc.
FIGURE 46.4 Graphic model for representing a service (inputs: request, conditions, control parameters, input data, and resources; output: output data).
Resources include both hardware (sensor, CPU, memory, etc.) and software (extended Kalman filter, etc.). Input and output data can also be considered as resources, with the problem of data obsolescence.
Other properties can be added, such as time and complexity measures, that can help to select between different methods.
The set of services can easily be assimilated to the set of instructions we find in a regular computer. To cope with the various states of the sensor that can occur during its life (out of order, in configuration, in manual exploitation, in automatic exploitation, etc.), the different external services are organized into coherent subsets called User Operating Modes (USOMs). In the model, a sensor service can be requested, and thus accepted, only if the current active USOM includes this service. This prevents the request of services when they cannot be available. According to Reference 10:
Definition 46.2 An external mode is a subset of the set of external services included in the intelligent instrument. An external mode includes at least one external service and each service is included in at least one external mode.
The operating modes can easily be described with respect to a labeled transition system, where the label is the external event matching a request for commuting the current mode (see Figure 46.5). Moreover, in each USOM, a notion of context may exist, where the context is the subset of the services that are implicitly requested as long as the system remains in the given USOM. The external services included inside a USOM are supposed to behave independently. We say that they belong to orthogonal regions of a state, sometimes termed constellations. As an example, the external services inside the intelligent sensor of the navigation system could be structured into a wall following or an execution trajectory mode. The position computation and the obstacle detection services would be implicitly executed when entering the corresponding mode. If the sensor reveals a complex state space, it can be decomposed into nonoverlapping substates to reduce the complexity. For example, the wall following and execution trajectory states can belong to a more general measuring state, often called a macro-state or a super-state, belonging to an active macro-state, etc. Some properties that the design must satisfy, and that can be checked against the functional specifications, have been elaborated. They complement the formal model. For example, properties may express some axioms such as:
1. An external mode is a nonempty set of external services.
2. Each external service belongs to at least one external mode.
3. In an intelligent instrument, the set of disconnected vertexes in the state-transition diagram is empty, that is, there is no disconnected external mode.
4. A transition between two modes is unique in the graph.
5. Each external mode must be reachable and each external mode can be left.
6. etc.
FIGURE 46.5 The concept of USOMs (two modes, USOM1 and USOM2, each grouping external services x and y, connected by the transitions T12 and T21).
These properties must be verified to guarantee a safe production of the intelligent instrument. For example, the verification of property 5 can easily be done by checking that the fan-in and fan-out degrees of every vertex of the graph are not equal to zero. An incidence matrix can help in doing this.
The external viewpoint of the intelligent instrument will usually be complemented with a second level of description, to capture the algorithmic flow. This level must exhibit the treatments that concur to the global functionality, usually called internal services.
46.2.3 Functional Decomposition of a Service
Complex operations often need to be decomposed into multiple primitive operations in order to produce the overall behavior. For example, an external measurement service can induce a very complex treatment, probably following a step of initialization and, for self-terminated services, followed by a step of termination. So, Definition 46.1 is usually complemented with the following definition [11]:
Definition 46.3 An external service is the result of the execution of internal services.
From the viewpoint of the designer, an internal service is an elementary operation, possibly extracted from a library of components, for which no further decomposition is needed. Its I/O behavior can be easily described through an algorithm. Depending on the area of application, such a conceptual unit can be known under various appellations, such as a module, a codel [12], etc.
The functional decomposition of a complex external service into internal services clearly has the following advantages:
- Structured programming helps the designer to describe the different steps of the treatment without using internal state variables.
- Transitions from one step to the other explicitly define the possible interruption points of the service. Between these points, the operation is considered to be an atomic transaction. This protects the functionality from coherency problems (loss of data and so on).
- Reuse of common units of programming: they are common to different services or are the result of previous developments. For example, an obstacle detection service and a position computing service can share a lot of common portions of code: the signal emission, the signal acquisition, and so on. These units can be part of a library of Intellectual Property (IP) modules.
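As an illustration of such a decomposition, the obstacle detection service could be written as a sequence of internal services separated by interruption points. All names, readings, and thresholds below are invented for the example, and the scheduling style is an assumption rather than the CAP language described later:

```python
def obstacle_detection_steps():
    """Return the internal services of the example service, in order.
    Each function is atomic between two interruption points."""
    def signal_emission(state):
        state["emitted"] = True                  # drive the ultrasonic transducer
        return state
    def signal_acquisition(state):
        state["echo"] = 42                       # invented echo delay reading
        return state
    def decision(state):
        state["obstacle"] = state["echo"] < 100  # invented detection threshold
        return state
    return [signal_emission, signal_acquisition, decision]

def run_service(steps, stop_requested=lambda: False):
    """Run internal services in sequence; between two steps the service
    may be interrupted (e.g., by a stop_moving request), leaving a
    coherent partial state behind."""
    state = {}
    for step in steps:
        if stop_requested():
            return state                         # interruption point reached
        state = step(state)
    return state
```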
As the reader can see, there is no mention at this point of the nature of the realization. Design units can be implemented on customized or on software processors. The hardware units can also be freely implemented in the discrete domain using FPGA (Field Programmable Gate Array) or DSP (Digital Signal Processor) components, or by using analog components in the continuous domain. In order to reduce the complexity of the design, the definition of the executive architecture (a problem known as the partitioning problem)
2006 by Taylor & Francis Group, LLC
Intelligent Sensors: Analysis and Design 46-7
[Figure: an input data acquisition step, a processing step, and an output data emission step, activated by external and internal events.]
FIGURE 46.6 Example of a functional decomposition.
must be postponed until a detailed design step, taking into account various design constraints such as
economic and real-time constraints. The internal description of a complex service can be expressed using
an activity diagram. Activity diagrams are well suited to showing algorithmic details or the procedural flow
inside the service. They are often compared with flowcharts but are more expressive. Such a diagram
describes the internal operations to be performed on the incoming flow and their temporal dependencies.
Depending on the complexity of the service, the detailed refinement of an activity can be performed
over several successive hierarchical levels. The processing step in Figure 46.6 can, for example, be
refined into a feature extraction step, followed by a data classification step and a final decision step in
sequential order. Elementary operations, those for which no refinement is needed, will be described by
their internal behavior, usually through an algorithm. The activation of the internal services is controlled
by internal events.
Definition 46.4 An internal event in an intelligent instrument is an event that is produced and consumed by
the instrument itself.
The producer gives birth to the event. Consumers can react to this event in order to start their processing.
The activation of an operation will often depend on the completion of the previous operation, but more
complex temporal dependencies will also frequently occur, such as the activation of a signal processing
operation conditioned on the end of an external conversion operation. In this way, an external service
can itself be in the position of a client of another component, dynamically starting an external activity
to request data necessary to the achievement of its mission. While an external event is associated with
a unique external service, an internal event can be associated with several internal services, leading to
an n-producer/m-consumer relationship. In this, activity diagrams differ from conventional flowcharts
in their capacity to represent concurrency. In Figure 46.7, the execution of the internal service V
is followed by the simultaneous execution of the internal services W and Y.
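A minimal C sketch of such a producer/consumer internal event might look like the following; firing the event activates every internal service subscribed to it, as when the end of service V starts services W and Y. The structure and names are illustrative, not part of any actual CAP runtime:

```c
#include <stddef.h>

#define MAX_CONSUMERS 8

typedef void (*internal_service_t)(void);

/* An internal event with its list of consumers. */
typedef struct {
    internal_service_t consumers[MAX_CONSUMERS];
    size_t n_consumers;
} internal_event_t;

int subscribe(internal_event_t *ev, internal_service_t s)
{
    if (ev->n_consumers >= MAX_CONSUMERS)
        return -1;                         /* subscription table full */
    ev->consumers[ev->n_consumers++] = s;
    return 0;
}

/* Producing the event activates all consumers (sequentially here; a real
 * executive could schedule them concurrently). */
void produce(const internal_event_t *ev)
{
    for (size_t i = 0; i < ev->n_consumers; i++)
        ev->consumers[i]();
}

/* Services W and Y simply record that they were activated. */
int w_ran, y_ran;
void service_w(void) { w_ran = 1; }
void service_y(void) { y_ran = 1; }
```

Subscribing both service_w and service_y to the event produced at the end of V yields exactly the 1-producer/2-consumer pattern of Figure 46.7.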
Expressing sequential and parallel compositions of treatments is not always sufficient. The execution
of a service can be affected by the state of the resources. The concept of version has been created with
the aim of providing alternative versions of treatments that enable the service to operate under
nonnominal conditions. This is a means of taking the fault tolerance problem into account. All the versions
of a given service share the same request and produce the same output, but the inputs, procedures,
and resources differ from one version to another. For example, a measurement service uses two
transducers in its nominal mode to compute a data value using a sophisticated data analysis
method. If a defect is detected on one of the transducers, the measurement service can continue to operate
using a subset of the features extracted from the input data; of course, the quality of the result will
decrease. The versions are typically ranked and classified into internal modes, such as the nominal mode
and the degraded mode. The management of the versions of a service can be straightforward: at time t
[Figure: internal services U, V, W, Y, and Z composed along a nominal path and a degraded path.]
FIGURE 46.7 Version and internal mode concept.
when the request for service is emitted, the version to be carried out will be the one with the lowest rank
whose resources are all nonfaulty [13]. As for the USOMs, the description of the internal modes can be
done using a state diagram.
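This lowest-rank selection rule can be sketched in C as follows; the version table and resource flags are hypothetical, with rank 0 standing for the nominal mode and higher ranks for degraded modes:

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_RES 4

/* A version of a service: its rank and the resources it needs. */
typedef struct {
    int rank;                 /* 0 = nominal, higher = more degraded */
    int resources[MAX_RES];   /* ids of the resources this version uses */
    size_t n_resources;
} version_t;

/* faulty[i] != 0 means resource i is currently detected as faulty. */
bool version_runnable(const version_t *v, const int *faulty)
{
    for (size_t i = 0; i < v->n_resources; i++)
        if (faulty[v->resources[i]])
            return false;
    return true;
}

/* Assumes versions[] is sorted by increasing rank; returns the index of
 * the selected version, or -1 if no version can run at all. */
int select_version(const version_t *versions, size_t n, const int *faulty)
{
    for (size_t i = 0; i < n; i++)
        if (version_runnable(&versions[i], faulty))
            return (int)i;
    return -1;
}
```

With a two-transducer measurement service, a fault on the second transducer would make select_version skip the nominal version and fall back to the degraded one.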
Having reviewed the generic formal model of the intelligent sensor, let us now turn to some validation
aspects of the sensor.
46.2.4 Sensor Architectural Design
We have seen the mathematical properties underlying the intelligent sensor's generic model of computation.
These properties can be efficiently used to answer questions about system behavior without carrying
out expensive verification tasks. The formal validation generally uses an automata-theoretic approach,
modeling the formal description by a Finite-State Machine (FSM) and the language of the automaton
[14]. As stated earlier, the final implementation of the intelligent sensor should be produced, as much as
possible, by automatic generation from the generic model, to ensure implementations that are correct
by construction. For example, the protocol defined at the high level of abstraction uses the concept
of message passing, where a message is an abstraction of data and/or control information passed from
one component to another. Various mechanisms (the message signature) can be envisioned at a lower
level of definition, including a function call, an interruption, an event using a Real-Time Operating
System (RTOS), an Ada rendezvous, or a Remote Procedure Call (RPC) in a distributed implementation.
Consequently, a prototype of the sensor is also a useful means to validate the specification in the presence
of the real-time inputs, with physical characteristics similar to those of the final implementation,
which will be produced by the synthesis stage. Rapid prototyping aims at analyzing the performance of
an implementation, validating its capability of satisfying hard real-time constraints, etc. To do so, the key
technologies are the use of software synthesis, hardware synthesis, and the synthesis of interfaces between
software and hardware using programmable components.
The prototype to be generated will be highly dependent on the physical architecture selected and on the
physical communication links. The targeted architecture is strongly dependent on the cost of components
and production. Consequently, as shown in Figure 46.8, there are some nontrivial aspects to be
analyzed in order to be in a position to produce a prototype:
- The definition of the hardware/software architecture (partitioning, mapping).
- The sequencing of the software on each software processor (scheduling).
[Figure: design flow from the formal model of the intelligent sensor, through formal validation, then partitioning, mapping, and scheduling, then hardware/software synthesis, to the prototype of the intelligent sensor.]
FIGURE 46.8 Design flow of an intelligent sensor.
[Figure: an input interface (transducers T1-T3 through amplifiers, a multiplexer, and an ADC), an output interface (a DAC, a multiplexer, and outputs I1-I3), a software processor (microprocessor with EPROM and RAM), and a communication interface, all connected by a system bus.]
FIGURE 46.9 Typical model of an architecture for an intelligent sensor system.
Figure 46.9 shows a typical hardware/software architecture for an intelligent sensor. We can observe the
hybrid character of intelligent sensors, which mix analog and digital components. Each
component description can be refined to exhibit the detailed architecture of a component. In the
figure, we can see an architecture organized around a microprocessor that processes the functionalities
for which it is in charge. This processor can be a DSP, a processor whose CPU is customized for
data-intensive operations such as digital filtering. Bidirectional communication is ensured through various
means: a serial link, a CAN (Controller Area Network) bus, an Ethernet link, etc. [15]. Finally,
memories (ROM, RAM, etc.) store the information located inside the sensor. In
the future, the hardware architecture will tend to combine more and more customized hardware with
embedded software. The definition of a hardware/software architecture involves checking whether the sensor
is schedulable, that is, whether all the performance requirements can be guaranteed. A deadline (a point in
time, or a delta-interval, by which a system action must occur) is an example of a requirement that,
when missed, constitutes an erroneous computation. Consequently, the definition of hardware/software
architectures generally requires more complex modeling, by defining the external timing requirements of
the messages. The Quality-of-Service (QoS) requirement of a message can be expressed using different
means. For example, the response timing can be defined in terms of timeliness requirements (typically,
deadlines) [16].
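As a trivial illustration, a timeliness requirement of this kind can be checked by comparing a response's completion time against its release time plus its relative deadline; this is a hypothetical sketch, not an actual QoS API:

```c
/* A timeliness requirement: the response must complete no later than
 * release_ms + deadline_ms, or the computation is treated as erroneous. */
typedef struct {
    long release_ms;    /* instant at which the request was emitted */
    long deadline_ms;   /* relative deadline of the response */
} timing_req_t;

/* Returns 1 when the deadline is met, 0 otherwise. */
int deadline_met(const timing_req_t *r, long completion_ms)
{
    return completion_ms <= r->release_ms + r->deadline_ms;
}
```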
Assigning an execution order to concurrent modules and finding a sequence of instructions implementing
a functional module are the primary challenges in software organization. These can be nontrivial
issues to deal with, particularly when one must consider the performance requirements as well as the
functional requirements of the system. The software implementation can be facilitated by the use of real-time languages
and their underlying executive kernel. Such languages provide a style of programming that enables the
manipulation of events and/or state changes, with constructs expressing the behavior through parallelism,
synchronization, etc. Selecting a language to express the model is not straightforward, with
several possibilities: developing a dedicated language or using an existing one, using
a graphical or a textual input form, etc. The domain of applicability of a language must also be carefully
studied, with basically two approaches. The synchronous approach states that time is a sequence of instants
between which nothing interesting occurs [17]. In each instant, some events occur in the environment
and a reaction is computed instantly by the modeled design. This means that computation and internal
communication take no time. This hypothesis is very convenient, as it allows modeling the complete system
as a single FSM with a completely predictable behavior. ESTEREL [18,19] and its graphical expression form,
SYNCCHARTS [20,21], are representatives of synchronous imperative programming languages. As in all
languages specialized for control-dominated systems programming, data manipulation cannot be done
very naturally. In the asynchronous approach, as in ELECTRE [22], events are observed and processed
immediately. This approach enhances the expressive power of the language. Moreover, the design can be
more efficiently implemented on heterogeneous hardware/software architectures. On the other hand, timing
constraints are difficult to check.
46.3 The CAP Language
46.3.1 Description
The generic intelligent sensor model gave birth to a new language, the CAP language [23], which belongs
to the category of asynchronous languages. Its ability to provide a rapid prototype of the
intelligent sensor model on common, commercially available microcontrollers has been one of the main
reasons for developing this language. The developers have consequently limited the implementation to
monoprocessors and to sequential operation.
The CAP language is an incomplete language in the sense that it specifies only the interaction between
computational modules (internal services), and not the computation performed by the modules. An
interface with a host language specifies the behavioral contents of such units through C instructions.
Like every conventional language, its grammar can be described using a Backus-Naur Form (BNF), also
known as a metalanguage. Figure 46.10 shows the formal grammar defined by Reference 10. Reference 11
complemented the language with the possibility of declaring internal services. As can be seen, the two parts
of the model are normally expressed in succession. The interface of the instrument follows the metavariable
instrument with:
- A number referring to the instrument as a node in the network.
- Variables that can be exported or updated on the network.
- Communication links imported or exported to implement the command protocol.
The expression of the graph of the external modes is achieved through the set of vertices and the set of
transitions. Each transition is then described by declaring the input vertex, the output vertex, and the
communication link that is the source of the transition event. Variables of the imported or exported lists are defined
according to a C type. Finally, the expression of a list of external modes in the definition of a service
FIGURE 46.10 An overview of the formal grammar. (From J. Tailland, PhD thesis. With permission.)
can be noted. These modes are the only ones in which the service can be launched. The description of
the internal model is close to the external one and can be easily understood. As shown in Figure 46.11,
the principle of operation of the compiler is relatively straightforward: after the lexical and syntactical
analyses, a set of lists reflecting the formal model is produced by the code generator. After being generated,
the code is compiled and linked with the user-defined libraries and the processor-dependent startup code.
The generated data structures are split into two sets of lists: the first set contains the names of the objects and
is subdivided into eight lists: external modes, internal modes, external services, internal services, variables,
export links, import links, and internal events. The second set contains the detailed description of each
transition. The conformity of the description to the formal model will be analyzed using these lists. They
also contribute to providing a level of abstraction between the software synthesizer and the real
machine.
[Figure: the CAP compiler translates the source of the intelligent sensor into intermediate data structures; a C compiler then links them with the input/output libraries and the kernel to produce the executable.]
FIGURE 46.11 The CAP design flow. (From J. Tailland, PhD thesis. With permission.)
Let us illustrate these principles with an intelligent instrument designed to process ultrasonic sounds in
order to produce a distance measurement value.
46.3.2 Illustration
To illustrate the approach, let us take the following example [11]: a measurement system is composed of
one ultrasonic transmitter able to emit a signal toward a target in order to compute the distance between
this target and the sensor with the help of two receivers. Figure 46.12 shows the basic configuration of the
measurement system.
The intelligent instrument delivers the two measurements of distance d_1 and d_2, each of them with a
validation degree d_v1 and d_v2, respectively. As shown in Figure 46.13, the basic principle of the measurement
is the following: the ultrasonic emitter sends a sinusoidal waveform linearly modulated in frequency over the
ΔF = (f_max − f_min) interval, with a rate of variation of the frequency ΔF/T_r. The instantaneous
frequency F_received then undergoes an offset relative to the emitted frequency F_emitted of Δf = |F_emitted − F_received|.
Reference 24 has shown that the distance d can be determined by measuring the offset only inside the
interval [t_0, T_r], where its value is f_a. For that, the measurement is inhibited during the time t_a, where the
offset of the frequency has the value f_b.
The state space of the sensor is decomposed into two USOMs: the configuration mode (which is the
default USOM) and the measurement mode (see Figure 46.14). The transition between the modes is
triggered on the cantcp_in event.
A number of internal services have been specifically developed for this application:
- Initialization and configuration services enable the modification of the parameter values used for
the computation of d: duration of the slope T_r, slope p, time t_dead, and voltage u_min. They take
account of particular conditions of measurement (nature of the obstacle, environment, etc.).
- Internal services such as those depicted in Figure 46.15 contribute to the measurement elaboration;
for example, the internal service for the slope generation has the responsibility of sending the signal in
the direction of the obstacle; suppression of aberrant measurements filters and keeps the impulses close
to the median f_amoy (average of the frequencies measured over T_r), while the others are discarded.
- The Dynamic Packet Transport (DPT) service computes the pseudo-triangular distribution of the
possibilities.
[Figure: the sensor comprises a central transmitter flanked by two receivers; distances d_1, d_0, and d_2 separate them from the target.]
FIGURE 46.12 Configuration of the measurement system. (From J. Tailland, PhD thesis. With permission.)
[Figure: the emitted and received signals as frequency ramps of slope a between f_min and f_max over the period T_r; the frequency offset takes the value f_b before t_0 and f_a inside [t_0, T_r]; the dead time t_dead precedes the useful measurement interval.]
FIGURE 46.13 Principle for the measurement of f_a. (From J. Tailland, PhD thesis. With permission.)
Internal calculations are not very complex. For example, the distance computation service computes the
distance on every f_a input. By doing this, there are N_exp measurements of d carried out during the
interval T_r, using the following formula:

D = [f_a (T_r − t_dead) / (2 ΔF)] V   (46.1)

The internal service named validity estimation computes a value in the [0, 1] interval according to the
following rule:

D_v = N_exp / N_th   (46.2)

where N_th is the theoretical number of periods of f_a that are supposed to be observed by the instrument.
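Assuming Equation (46.1) takes the form D = f_a (T_r − t_dead) V / (2 ΔF), with V the wave speed, and Equation (46.2) the form D_v = N_exp/N_th, the two internal computations reduce to one line each; this is a numerical sketch with illustrative argument values, not production code:

```c
/* Eq. (46.1): distance from the measured frequency offset f_a.
 * t_r is the slope duration, t_dead the dead time, d_f the swept
 * frequency interval, and v the wave speed. */
double distance(double f_a, double t_r, double t_dead, double d_f, double v)
{
    return f_a * (t_r - t_dead) * v / (2.0 * d_f);
}

/* Eq. (46.2): validity degree in [0, 1] as the ratio of the number of
 * measurements actually carried out to the theoretical number. */
double validity(double n_exp, double n_th)
{
    return n_exp / n_th;
}
```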
[Figure: graph of the two USOMs (configuration, measure) with transitions on cantcp_in; the configuration mode offers services such as import_slope_duration, import_umin, import_wave_speed, import_slope, law_selection, and reset, while the measure mode offers distance_measurement and standard deviation. The corresponding CAP declarations are:

Mode configuration, measure.
Transition configuration to measure on cantcp_in.
Transition measure to configuration on cantcp_in.
Default mode configuration.]

FIGURE 46.14 Graph of the USOMs and the corresponding CAP declarations. (From J. Tailland, PhD thesis. With
permission.)
As shown, the declaration of the distance_measurement external service includes the request (on
cantcp_in) and the USOM (in measure) where it is available. The measurement functionality can be
carried out according to four internal modes: nominal, degraded1, degraded2, and critical. For example,
when the validity of the measurement d_1 (see Equation [46.2]) goes below a given threshold in the
nominal mode, the instrument can automatically switch to the degraded1 mode.
Figure 46.16 presents an excerpt of the synthesized code. As stated earlier, these symbolic data structures
will be used by the verification tool to check the properties of the formal model. The data structures (array
of services, etc.) will typically be stored inside a volatile memory (RAM or SRAM) of the hardware
architecture, while the automaton, not reachable by the user, will be set in a long-term memory (EPROM,
FLASH, etc.). Depending on the operating mode (development or exploitation), the user program can be
stored inside the volatile memory or set in long-term memory.
46.3.3 Implementation
The execution of the various services is handled by an automaton that has the responsibility of interpreting the
formal model. As shown in Figure 46.17, the execution machine runs a cyclic program that processes the
inputs, updating a FIFO (First In, First Out) queue storing the pending events. Then, depending on the nature
of the event, a transition is fired or a call to a procedure is made. Permanent functions are then
executed. The loop ends with the emission of the output messages. For each step, the detailed behavior is
the following:
1. Reading of a message on each communication link: if the intelligent sensor is concerned by the
message, it will be processed according to its type:
- In case of a data message, the corresponding variable is updated and the associated event is
triggered.
[Figure: the measurement functionality decomposes into a slope generation service followed, for each of the two receiving channels, by acquisition, suppression of aberrant measurements, translation in accumalized units, distance computation, standard deviation, DPT, and validity estimation services, ending with an average computation; get_* services export the results. The corresponding CAP declarations include:

Service distance_measurement on cantcp_in in measure
{uses slope_generation, acquisition 1, distance_computation 1, ... }
Iservice acquisition 1 on end (slope_generation)
{/* C code of the internal service */}]

FIGURE 46.15 Functional decomposition of the measurement functionality. (From J. Tailland, PhD thesis. With
permission.)
- In case of an event message, the corresponding event is inserted into the event FIFO for later
processing.
- In case of a query message, the current mode is exported on the output links.
- In case of a variable message, the variable is linked to import an external variable.
2. Event queue processing: three situations can arise and are analyzed in the following order:
- The event is a request for a change of the external mode. Provided this change is authorized, the
internal variable is updated and the new mode is exported on the output links.
- The same applies to an event corresponding to a request for a change of the internal mode.
- The event is an execution request for an external service. Provided the service(s) is (are) enabled
in the current mode, these services are executed using the procedural entry point.
- The same applies to an event corresponding to a request for executing an internal service. If
defined, an event signaling the end of execution is inserted into the event FIFO.
3. Execution of the EVER services: these are services that run permanently.
4. Diffusion of the messages on the communication links (exported variables, change of mode, etc.).
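The cyclic behavior of steps 1 to 4 can be sketched in C as follows; the FIFO size, the callback signatures, and all names are illustrative rather than taken from the CAP kernel:

```c
#include <stddef.h>

#define FIFO_SIZE 16

/* A bounded circular FIFO of pending events. */
typedef struct {
    int events[FIFO_SIZE];
    size_t head, tail, count;
} event_fifo_t;

int fifo_push(event_fifo_t *f, int ev)
{
    if (f->count == FIFO_SIZE) return -1;   /* queue full: event refused */
    f->events[f->tail] = ev;
    f->tail = (f->tail + 1) % FIFO_SIZE;
    f->count++;
    return 0;
}

int fifo_pop(event_fifo_t *f, int *ev)
{
    if (f->count == 0) return -1;           /* no pending event */
    *ev = f->events[f->head];
    f->head = (f->head + 1) % FIFO_SIZE;
    f->count--;
    return 0;
}

/* One cycle of the execution machine, mirroring steps 1-4. */
void automaton_cycle(event_fifo_t *pending,
                     void (*read_inputs)(event_fifo_t *),
                     void (*dispatch)(int),
                     void (*run_ever_services)(void),
                     void (*emit_outputs)(void))
{
    int ev;
    read_inputs(pending);                   /* 1. read the communication links */
    while (fifo_pop(pending, &ev) == 0)     /* 2. process the pending events   */
        dispatch(ev);
    run_ever_services();                    /* 3. run the permanent services   */
    emit_outputs();                         /* 4. diffuse the output messages  */
}
```

In this sketch, dispatch() is where mode changes and service activations of step 2 would be decided; the loop simply calls automaton_cycle() forever.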
FIGURE 46.16 Part of the synthesized code. (From J. Tailland, PhD thesis. With permission.)
[Figure: the automaton reads and processes the messages from the input link FIFO, reads and processes the events of the event FIFO, triggers the EVER services, and emits messages on the output link FIFO.]
FIGURE 46.17 Principle of operation of the automaton.
Note the reduced portability of the application, because of the tight link that exists between the application
and the hardware. Open software platforms (EmbeddedJava, JavaCard, etc. [25]) can be considered
attractive solutions to this problem, since they interpret the applications via a virtual machine. However,
the designer has to be aware of the substantial performance penalty that may be paid by adopting this solution.
46.4 Conclusion
In this chapter, a generic model for the design of intelligent instruments has been discussed. In the formal
model of the sensor, the specification of the external services is given according to a user's point of view
of the functionalities available inside the sensor. The external model uses the concept of the USOMs to
prevent the activation of services that are not available during the current external mode. Internal services
define the basic units that will be assembled to describe the complex behavior of an external service,
taking into account various temporal dependencies. Like the external services, the internal services can
be gathered into internal modes. Internal modes define the internal states of a service according to its
ability to operate under various contextual situations. Some of the basic principles underlying
the implementation of the model on hardware architectures have also been exposed. The advantages of
digital instrumentation over conventional instrumentation for the processing are well
established today. We have finally discussed an implementation that can be generated automatically
through the use of a language like CAP. The automatic generation of the implementation is conditioned on
the formal verification of the properties underlying the generic model.
As stated earlier, the intelligent sensors to come will be composed of more complex and heterogeneous
components. This trend will change the industrial landscape, making the trade and assembly of IPs embodied
in layouts, RTL (Register Transfer Level) designs, and software programs indispensable [17]. This aspect,
not specific to the design of intelligent sensors, is a great challenge. For example, hardware/software
cosimulation is often performed with separate simulation models [26]. This makes trade-off evaluation
difficult because the models must be recompiled whenever a change in the architecture mapping
is made. The CoFluent system-level design tool [27] introduces an intermediate level of abstraction, the
functional level, between the specification and the architectural model of the sensor [28]. As shown in
Figure 46.18, this level defines the logical architecture of the system in terms of functional components
[Figure: the MCSE flow proceeds from customer needs through requirement definition (requirements document), system specification (specifications document), functional design (functional model), architecture design (architectural model), and prototyping (prototypes); capitalization of IP feeds back into the flow and leads to the product.]
FIGURE 46.18 The MCSE design methodology for complex sensor hardware architectures.
(simply called functions) and the relations between them (ports, shared variables, or events, depending on the
kind of relationship). As in the Vulcan system [29], the use of a control/data flow graph for the behavioral
model facilitates the partitioning at the operation level. This environment has been used successfully for
the design of an intelligent sensor for pattern recognition [30]. The functional model provides an environment
for behavioral and performance analysis in a technology- and language-independent manner
that allows implementation of the same functionality on diverse physical architectures [31]. Automatic
synthesis can be achieved on hardware (VHDL descriptions) or RTOS primitives (VxWorks). A SystemC
simulation engine and code generator is also available for system-on-chip (SoC) design.
Acknowledgment
All the figures related to the CAP language are reprinted with permission from Dr. L. Foulloy,
University of Savoy, France.
References
[1] N. Medvidovic and R.N. Taylor. A classification and comparison framework for software
architecture description languages. IEEE Transactions on Software Engineering, 26: 70-93, 2000.
[2] M. Staroswiecki and M. Bayart. Models and languages for the interoperability of smart
instruments. Automatica, 32: 859-873, 1996.
[3] N. Halbwachs. Synchronous Programming of Reactive Systems. Kluwer Academic Publishers,
Dordrecht, 1993.
[4] E. Benoit, R. Dapoigny, and L. Foulloy. Fuzzy-based intelligent sensors: modelling, design,
applications. In Proceedings of the 8th IEEE International Conference on Emerging Technologies
(ETFA 2001), Antibes, France, October 2001.
[5] E. Dekneuvel, M. Ghallab, and J.P. Thibault. Hypotheses management in a multi-sensory per-
ception machine. In Proceedings of the 10th European Conference on Artificial Intelligence (ECAI),
Vienna, Austria, August 1992.
[6] J.M. Riviere, M. Bayart, J.M. Thiriet, A. Bouras, and M. Robert. Intelligent instruments: some
modeling approaches. Measurement and Control, 29: 179-186, 1996.
[7] S. Edwards, L. Lavagno, E.A. Lee, and A. Sangiovanni-Vincentelli. Design of embedded systems:
formal models, validation and synthesis. Proceedings of the IEEE, 85: 366-390, 1997.
[8] D. Harel, H. Lachover, A. Naamad, A. Pnueli et al. STATEMATE: a working environment for
the development of complex reactive systems. IEEE Transactions on Software Engineering, 16:
403-414, 1990.
[9] J.P. Calvez. Embedded Real-Time Systems: A Specification and Design Methodology. John Wiley &
Sons, New York, 1993.
[10] A. Bouras and M. Staroswiecki. Building distributed architectures by the interconnection of
intelligent instruments. In Proceedings of IFAC INCOM'98, Nancy, France, June 1998.
[11] J. Tailland, L. Foulloy, and E. Benoit. Automatic generation of intelligent instruments from internal
model. In Proceedings of the International Conference SICICA 2000, Argentina, September 2000.
[12] S. Fleury, M. Herrb, and R. Chatila. Design of a modular architecture for autonomous robot. In
Proceedings of the IEEE International Conference on Robotics and Automation, San Diego, CA, 1994.
[13] M. Staroswiecki, G. Hoblos, and A. Aitouche. Fault tolerance analysis of sensor systems. In
Proceedings of the 38th IEEE Conference on Decision and Control, Phoenix, AZ, 1999.
[14] R.P. Kurshan. Computer-Aided Verification of Coordinating Processes: The Automata-Theoretic
Approach. Princeton University Press, Princeton, NJ, 1994.
[15] J. Warrior. Smart sensor networks of the future. Sensors Magazine, March 1997, pp. 40-45.
[16] B.P. Douglass. Doing Hard Time: Developing Real-Time Systems with UML, Objects, Frameworks,
and Patterns. Addison-Wesley, Reading, MA, 1999.
[17] F. Balarin, M. Chiodo, P. Giusto, H. Hsieh, A. Jurecska, L. Lavagno, C. Passerone, A. Sangiovanni-
Vincentelli, E. Sentovich, K. Suzuki, and B. Tabbara. Hardware-Software Co-Design of Embedded
Systems. Kluwer Academic Publishers, Dordrecht, 1997.
[18] G. Berry and G. Gonthier. The ESTEREL synchronous programming language: design, semantics,
implementation. Science of Computer Programming, 19: 87-152, 1992.
[19] Esterel Studio. http://www.esterel-technologies.com
[20] C. André. Representation and analysis of reactive behaviors: a synchronous approach. In
Proceedings of CESA'96, Lille, France, 1996, pp. 19-29.
[21] C. André, F. Boulanger, and A. Girault. Software implementation of synchronous programs.
In Proceedings of the 2nd International Conference on Application of Concurrency to System Design
(ICACSD 2001), Newcastle upon Tyne, UK, June 25-29, 2001, pp. 133-142.
[22] F. Cassez and O. Roux. Compilation of the ELECTRE reactive language into finite transition
systems. Theoretical Computer Science, 146: 109-143, 1995.
[23] E. Benoit, J. Tailland, L. Foulloy, and G. Mauris. A software tool for designing intelligent sensors.
In Proceedings of the IEEE Instrumentation and Measurement Technology Conference (IMTC/2000),
Baltimore, MD, May 2000.
[24] G. Mauris, E. Benoit, and L. Foulloy. Ultrasonic smart sensors: the importance of the measurement
principle. In Proceedings of the IEEE/SMC International Conference on Systems Engineering in the
Service of Humans, Le Touquet, France, October 1993.
[25] EmbeddedJava and JavaCard. http://java.sun.com
[26] J. Rowson. Hardware/software co-simulation. In Proceedings of the Design Automation Conference,
1994, pp. 439-440.
[27] CoFluent Studio. http://www.cofluentdesign.com
[28] J.P. Calvez. A co-design case study with the MCSE methodology. Design Automation of Embedded
Systems, Special Issue on Embedded Systems Case Studies, 1: 183-211, 1996.
[29] R.K. Gupta, C.N. Coelho, and G. De Micheli. Program implementation schemes for
hardware/software systems. IEEE Computer, 27: 48-55, 1994.
[30] E. Dekneuvel, F. Muller, and T. Pitarque. Ultrasonic smart sensor design for a distributed percep-
tion system. In Proceedings of the 8th IEEE International Conference on Emerging Technologies and
Factory Automation (ETFA), Antibes Juan-les-Pins, France, 15-18 October 2001.
[31] J.P. Calvez and O. Pasquier. Performance assessment of embedded HW/SW systems. In Proceedings
of the International Conference on Computer Design, Austin, TX, October 1995.