Cooperative and Graph Signal Processing: Principles and Applications


1,947 pages
19 hours
Released: Jul 4, 2018


Cooperative and Graph Signal Processing: Principles and Applications presents the fundamentals of signal processing over networks and the latest advances in graph signal processing. A range of key concepts is clearly explained, including learning, adaptation, optimization, control, inference, and machine learning. Building on the principles of these areas, the book then shows how they are relevant to understanding distributed communication, networking, and sensing, as well as social networks. Finally, the book shows how these principles are applied in a range of areas, such as big data, media and video, smart grids, the Internet of Things, wireless health, and neuroscience.

With this book, readers will learn the basics of adaptation and learning in networks; the essentials of detection, estimation, and filtering; Bayesian inference in networks; optimization and control; machine learning; signal processing on graphs; signal processing for distributed communication; social networks from the perspective of information flow; and how to apply signal processing methods in distributed settings.

  • Presents the first book on cooperative signal processing and graph signal processing
  • Thoroughly covers a range of applications and application areas
  • Edited by an editor-in-chief and an associate editor of the IEEE Transactions on Signal and Information Processing over Networks, who recruited top contributors for the book


Cooperative and Graph Signal Processing - Academic Press

Cooperative and Graph Signal Processing

Principles and Applications

First Edition

Petar M. Djurić

Cédric Richard

Table of Contents

Cover image

Title page




Part 1: Basics of Inference Over Networks

Chapter 1: Asynchronous Adaptive Networks


1.1 Introduction

1.2 Single-Agent Adaptation and Learning

1.3 Centralized Adaptation and Learning

1.4 Synchronous Multiagent Adaptation and Learning

1.5 Asynchronous Multiagent Adaptation and Learning

1.6 Asynchronous Network Performance

1.7 Network Stability and Performance

1.8 Concluding Remarks

Chapter 2: Estimation and Detection Over Adaptive Networks


2.1 Introduction

2.2 Inference Over Networks

2.3 Diffusion Implementations

2.4 Distributed Adaptive Estimation (DAE)

2.5 Distributed Adaptive Detection (DAD)

2.6 Universal Scaling Laws: Estimation Versus Detection


Chapter 3: Multitask Learning Over Adaptive Networks With Grouping Strategies


3.1 Introduction

3.2 Network Model and Diffusion LMS

3.3 Group Diffusion LMS

3.4 Grouping Strategies

3.5 Simulations

3.6 Conclusion and Perspectives

Proof of Corollary 3.1

Chapter 4: Bayesian Approach to Collaborative Inference in Networks of Agents



4.1 Introduction

4.2 Bayesian Inference Over Networks

4.3 Example: Diffusion Kalman Filter

4.4 Conclusion

Chapter 5: Multiagent Distributed Optimization


5.1 Introduction

5.2 Distributed Optimization

5.3 Distributed Gradient Descent

5.4 Dual Methods

5.5 Second-Order Methods

5.6 Practical Considerations

5.7 Conclusion

Chapter 6: Distributed Kalman and Particle Filtering



6.1 Distributed Sequential State Estimation

6.2 Distributed Kalman Filtering

6.3 Distributed Particle Filtering

6.4 Conclusions

A.1 Appendix

Chapter 7: Game Theoretic Learning


7.1 Introduction

7.2 Learning in Games Preliminaries

7.3 Network Learning Algorithms for Complete Information Games

7.4 Network Learning Algorithms for Incomplete Information Games

7.5 Summary and Discussion

Part 2: Signal Processing on Graphs

Chapter 8: Graph Signal Processing


8.1 Introduction

8.2 Brief Review of the Literature

8.3 DSP: A Quick Refresher

8.4 Graph Signal Processing

8.5 Conclusion

Chapter 9: Sampling and Recovery of Graph Signals


9.1 Introduction

9.2 Notation and Background

9.3 Sampling and Recovery

9.4 Adaptive Sampling and Recovery

9.5 Conclusions

Chapter 10: Bayesian Active Learning on Graphs



10.1 Bayesian Active Learning on Graphs

10.2 Active Learning in BMRF: Expected Error Minimization

10.3 Algorithms to Approximate EEM

10.4 Experiments


A.1 Proof of Lemma 10.1

A.2 Proof of Lemmas 10.2 and 10.3

Chapter 11: Design of Graph Filters and Filterbanks



11.1 Graph Fourier Transform and Frequencies

11.2 Graph Filters

11.3 Filterbanks and Multiscale Transforms on Graphs

11.4 Conclusion

Chapter 12: Statistical Graph Signal Processing: Stationarity and Spectral Estimation


12.1 Random Graph Processes

12.2 Weakly Stationary Graph Processes

12.3 Power Spectral Density Estimators

12.4 Node Subsampling for PSD Estimation

12.5 Discussion and the Road Ahead

Chapter 13: Inference of Graph Topology



13.1 Introduction

13.2 Graph Inference: A Historical Overview

13.3 Graph Inference From Diffused Signals

13.4 Robust Network Topology Inference

13.5 Nonstationary Diffusion Processes

13.6 Discussion

Chapter 14: Partially Absorbing Random Walks: A Unified Framework for Learning on Graphs



14.1 Introduction

14.2 Partially Absorbing Random Walks

14.3 A Unified View for Learning on Graphs

14.4 Experiments

14.5 Conclusions


Part 3: Distributed Communications, Networking, and Sensing

Chapter 15: Methods for Decentralized Signal Processing With Big Data


15.1 Introduction

15.2 Decentralized Optimization Algorithms

15.3 Application: Matrix Completion

15.4 Numerical Experiments

15.5 Conclusions and Other Extensions


Chapter 16: The Edge Cloud: A Holistic View of Communication, Computation, and Caching



16.1 Introduction

16.2 Holistic View of Communication, Computation, and Caching

16.3 Joint Optimization of Communication and Computation

16.4 Joint Optimization of Caching and Communication

16.5 Graph-Based Resource Allocation

16.6 Network Reliability

16.7 Conclusions

Chapter 17: Applications of Graph Connectivity to Network Security


17.1 Introduction

17.2 Algebraic Graph Theory Overview

17.3 Using Connectivity to Direct an Interference Attack

17.4 Dynamic Game for Improving Connectivity Against Jamming

17.5 Conclusions

Chapter 18: Team Methods for Device Cooperation in Wireless Networks


18.1 Introduction

18.2 Team Decisions Framework

18.3 Team Decision Methods for Decentralized MIMO Precoding

18.4 Conclusion


Chapter 19: Cooperative Data Exchange in Broadcast Networks


19.1 Introduction

19.2 Problem Definition

19.3 Randomized Algorithm for Problem CDE

19.4 Data Exchange With Faulty Clients

19.5 Data Exchange in the Presence of an Eavesdropper

19.6 Conclusion

Chapter 20: Collaborative Spectrum Sensing in the Presence of Byzantine Attacks


20.1 Introduction

20.2 Collaborative Spectrum Sensing

20.3 Spectrum Sensing in a Parallel CRN

20.4 Spectrum Sensing in a Peer-to-Peer CRN

20.5 Conclusion and Open Issues


Part 4: Social Networks

Chapter 21: Dynamics of Information Diffusion and Social Sensing


21.1 Introduction and Motivation

21.2 Information Diffusion in Large Scale Social Networks

21.3 Bayesian Social Learning Models for Online Reputation Systems

21.4 Revealed Preferences: Are Social Sensors Utility Maximizers?

21.5 Social Interaction of Channel Owners and YouTube Consumers

21.6 Closing Remarks

Chapter 22: Active Sensing of Social Networks: Network Identification From Low-Rank Data



22.1 Introduction

22.2 DeGroot Opinion Dynamics

22.3 Network Identification

22.4 Numerical Experiments

22.5 Conclusions

Chapter 23: Dynamic Social Networks: Search and Data Routing


23.1 Introduction

23.2 Targeted Social Search Process

23.3 Inaltekin-Chiang-Poor Small-World Search Models

23.4 Altruistic Data Routing

23.5 Conclusions

Chapter 24: Information Diffusion and Rumor Spreading


24.1 Introduction

24.2 Related Work

24.3 Models of Information Cascades

24.4 Large-Scale Dynamics of Independent Cascades

24.5 Monitoring Information Cascades

24.6 An Algorithm for Reducing Information Cascades

24.7 Case Studies

24.8 Experiments

24.9 Conclusion

Chapter 25: Multilayer Social Networks



25.1 Introduction

25.2 Mathematical Formulation of Multilayer Networks

25.3 Diagnostics for Multilayer Networks: Centrality Analysis

25.4 Clustering and Community Detection in Multilayer Networks

25.5 Estimation of Dynamic Social Interaction Networks

25.6 Applications

25.7 Conclusions

Chapter 26: Multiagent Systems: Learning, Strategic Behavior, Cooperation, and Network Formation


26.1 Learning in Multiagent Systems: Examples and Challenges

26.2 Cooperative Contextual Bandits

26.3 Distributed Learning With Global or Group Feedback

26.4 Learning in Networks With Incomplete Information

26.5 Conclusion

Part 5: Applications

Chapter 27: Genomics and Systems Biology


27.1 Introduction

27.2 Gene Regulation

27.3 Next-Generation Sequencing Techniques

27.4 Gene Regulatory Network Prediction

27.5 Gene Regulatory Network Analysis

27.6 Gene Regulatory Network Modeling

27.7 Conclusions and Future Directions

Chapter 28: Diffusion Augmented Complex Extended Kalman Filtering for Adaptive Frequency Estimation in Distributed Power Networks



28.1 Introduction

28.2 Problem Formulation

28.3 Widely Linear Frequency Estimators and Their Nonlinear State Space Models

28.4 Diffusion Augmented Complex Extended Kalman Filters

28.5 Simulations

28.6 Conclusion

Chapter 29: Beacons and the City: Smart Internet of Things


29.1 Introduction

29.2 BLE Beacon Characteristics and Protocols

29.3 Limitations and Challenges

29.4 BLE Beacons in Smart Cities Applications

29.5 Proximity Estimation Through BLE Beacons

29.6 Experiments With Beacons

29.7 Conclusions

Chapter 30: Big Data



30.1 Learning From Big Data: Opportunities and Challenges

30.2 Matrix Completion and Nuclear Norm

30.3 Decentralized Analytics

30.4 Streaming Analytics

30.5 Concluding Summary

Chapter 31: Graph Signal Processing on Neuronal Networks


31.1 Introduction

31.2 Basic Concepts of Graph Signal Processing in Neuroscience

31.3 Dimensionality Reduction and Supervised Classification Through GSP

31.4 Graph Filtering of Brain Networks

31.5 Graph Learning

31.6 Spectral Graph Wavelets

31.7 An Illustration: ERP Classification on EEG Functional Connectivity Networks

31.8 Conclusions and Future Directions



Academic Press is an imprint of Elsevier

125 London Wall, London EC2Y 5AS, United Kingdom

525 B Street, Suite 1650, San Diego, CA 92101, United States

50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States

The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom

Copyright © 2018 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).


Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data

A catalog record for this book is available from the Library of Congress

British Library Cataloguing-in-Publication Data

A catalogue record for this book is available from the British Library

ISBN 978-0-12-813677-5

For information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals

Publisher: Mara Conner

Acquisition Editor: Tim Pitts

Editorial Project Manager: Peter Jardim

Production Project Manager: Sruthi Satheesh

Designer: Christian Bilbow

Typeset by SPi Global, India


Contributors
Selin Aviyente       Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI, United States

Paolo Banelli       Department of Engineering, University of Perugia, Perugia, Italy

Sergio Barbarossa       Department of Information Engineering, Electronics, and Telecommunications, Sapienza University of Rome, Rome, Italy

Pierre Borgnat       ENS de Lyon, Univ Lyon 1, CNRS, Laboratoire de Physique & IXXI, Univ Lyon, Lyon, France

Elena Ceci       Department of Information Engineering, Electronics, and Telecommunications, Sapienza University of Rome, Rome, Italy

Shih-Fu Chang       Department of Electrical Engineering and Department of Computer Science, Columbia University, New York, NY, United States

Chao Cheng       Dartmouth-Hitchcock Medical Center, Lebanon, NH, United States

Jie Chen       Center of Intelligent Acoustics and Immersive Communications, School of Marine Science and Technology, Northwestern Polytechnical University, Xi’an, China

Sundeep Prabhakar Chepuri       Delft University of Technology, Delft, The Netherlands

Luca Corinzia

CMLA, ENS Cachan, CNRS, University of Paris-Saclay, Cachan, France

ETH Zurich, Zurich, Switzerland

Kamil Dedecius       Department of Adaptive Systems, Institute of Information Theory and Automation, The Czech Academy of Sciences, Prague, Czech Republic

Petar M. Djurić       Department of Electrical and Computer Engineering, Stony Brook University, Stony Brook, NY, United States

Ceyhun Eksin       Department of Industrial and Systems Engineering, Texas A&M University, College Station, TX, United States

Andrey Garnaev       Wireless Information Network Laboratory (WINLAB), Rutgers, The State University of New Jersey, North Brunswick, NJ, United States

David Gesbert       Communication Systems Department, EURECOM, Biot, France

Georgios B. Giannakis       Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN, United States

Paulo Gonçalves       ENS de Lyon, Univ Lyon 1, CNRS, Inria, LIP & IXXI, Univ Lyon, Lyon, France

Alfred Hero       EECS Department, University of Michigan, Ann Arbor, MI, United States

Franz Hlawatsch       Institute of Telecommunications, TU Wien, Vienna, Austria

William Hoiles       Biosymetrics, New York, NY, United States

Hazer Inaltekin       Department of Electrical and Electronic Engineering, The University of Melbourne, Parkville, VIC, Australia

Kwang-Sung Jun       Wisconsin Institute for Discovery, Madison, WI, United States

Bhavya Kailkhura       Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, CA, United States

Argyris Kalogeratos       CMLA, ENS Cachan, CNRS, University of Paris-Saclay, Cachan, France

Sithan Kanna       Department of Electrical and Electronic Engineering, Imperial College London, London, United Kingdom

Soummya Kar       Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, United States

Paul de Kerret       Communication Systems Department, EURECOM, Biot, France

Vikram Krishnamurthy       School of Electrical and Computer Engineering, Cornell Tech, Cornell University, New York, NY, United States

Amir Leshem       Faculty of Engineering, Bar-Ilan University, Ramat Gan, Israel

Geert Leus       Delft University of Technology, Delft, The Netherlands

Zhenguo Li       Huawei Noah’s Ark Lab, Shatin, Hong Kong Special Administrative Region

Sijia Liu       EECS Department, University of Michigan, Ann Arbor, MI, United States

Ying Liu       Wireless Information Network Laboratory (WINLAB), Rutgers, The State University of New Jersey, North Brunswick, NJ, United States

Paolo Di Lorenzo       Department of Information Engineering, Electronics, and Telecommunications, Sapienza University of Rome, Rome, Italy

Danilo P. Mandic       Department of Electrical and Electronic Engineering, Imperial College London, London, United Kingdom

Morteza Mardani       Department of Electrical Engineering, Stanford University, Stanford, CA, United States

Antonio G. Marques       Department of Signal Theory and Communications, King Juan Carlos University, Madrid, Spain

Gonzalo Mateos       Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY, United States

Vincenzo Matta       Department of Information Engineering, Electrical Engineering and Applied Mathematics, DIEM, University of Salerno, Fisciano, Italy

Mattia Merluzzi       Department of Information Engineering, Electronics, and Telecommunications, Sapienza University of Rome, Rome, Italy

José M.F. Moura       Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, United States

Eric Moulines       CMAP, Ecole Polytechnique, Palaiseau, France

Robert Nowak       Department of Electrical and Computer Engineering, University of Wisconsin-Madison, Madison, WI, United States

Brandon Oselio       EECS Department, University of Michigan, Ann Arbor, MI, United States

Konstantinos Plataniotis       Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada

H. Vincent Poor       Department of Electrical Engineering, Princeton University, Princeton, NJ, United States

Michael G. Rabbat       McGill University and Facebook AI Research, Montreal, QC, Canada

Alejandro Ribeiro       Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA, United States

Cédric Richard       Laboratoire Lagrange, Université Côte d’Azur, CNRS, OCA, Nice, France

Stefania Sardellitti       Department of Information Engineering, Electronics, and Telecommunications, Sapienza University of Rome, Rome, Italy

Ali H. Sayed       School of Engineering, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland

Anna Scaglione       School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ, United States

Kevin Scaman

CMLA, ENS Cachan, CNRS, University of Paris-Saclay, Cachan, France

MSR, Inria Joint Center, Palaiseau, France

Mihaela van der Schaar       Electrical Engineering Department, University of California, Los Angeles, Los Angeles, CA, United States

Santiago Segarra

Institute for Data, Systems, and Society

Massachusetts Institute of Technology, Cambridge, MA, United States

Petros Spachos       School of Engineering, University of Guelph, Guelph, ON, Canada

Alex Sprintson       Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, United States

Brian Swenson       Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, United States

Cem Tekin       Electrical and Electronics Engineering Department, Bilkent University, Ankara, Turkey

Shang Kee Ting       DSO National Laboratories, Singapore, Singapore

Wade Trappe       Wireless Information Network Laboratory (WINLAB), Rutgers, The State University of New Jersey, North Brunswick, NJ, United States

Nicolas Tremblay       CNRS, GIPSA-lab, Univ. Grenoble Alpes, Grenoble, France

Pramod K. Varshney       Department of EECS, Syracuse University, Syracuse, NY, United States

Nicolas Vayatis       CMLA, ENS Cachan, CNRS, University of Paris-Saclay, Cachan, France

Aditya Vempaty       IBM Thomas J. Watson Research Center, Yorktown Heights, NY, United States

Marisel Villafañe-Delgado       Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI, United States

Hoi-To Wai       School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ, United States

Daifeng Wang       Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, United States

Xiao-Ming Wu       Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong Special Administrative Region

Yili Xia       School of Information Science and Engineering, Southeast University, Nanjing, PR China

Jie Xu       Electrical and Computer Engineering Department, University of Miami, Coral Gables, FL, United States

Simpson Zhang       Economics Department, University of California, Los Angeles, Los Angeles, CA, United States

Xiaochuan Zhao       University of California at Los Angeles, UCLA, Los Angeles, CA, United States


Preface
Petar M. Djurić and Cédric Richard

Cooperative and graph signal processing are areas that have seen significant growth in recent years, a trend that will without a doubt continue for many years to come. These areas are well within the scope of network science, which deals with complex systems described by interconnected elements and/or agents. The networks can be of many types, including physical, engineered, information, social, biological, and economic networks. Within the framework of cooperative and graph signal processing, one aims to discover principles underlying various tasks, including distributed detection and estimation, adaptation and learning over networks, distributed decision making, optimization and control over networks, and modeling and identification. The range of possible applications is vast and includes robotics, smart grids, communications, economic networks, life sciences, ecology, social networks, wireless health, and transportation networks.

With this book we want to provide in a single volume the basics of cooperative and graph signal processing and to address a number of areas where they are applied. The chapters are grouped into five parts. In Part 1, the fundamentals of inference over networks are addressed; Part 2 is on graph signal processing; Part 3 focuses on communications, networking, and sensing; Part 4 studies social networks; and Part 5 describes applications that range from genomics and systems biology to big data and brain networks. All the chapters are written by experts and leaders in the field.

The contents of Part 1 include the fundamentals of inference, learning, and optimization over networks in synchronous and asynchronous settings (Chapter 1); estimation and detection, where the emphasis is on studying agents that track drifts in models and on examining performance limits under both estimation and detection (Chapter 2); learning in networks where the agents make inferences about parameters of both common and local interest (Chapter 3); collaborative inference of agents within the Bayesian framework (Chapter 4); multiagent distributed optimization, where the aim of all agents is to agree on the minimizer (Chapter 5); sequential filtering using state-space models by Kalman and particle filtering (Chapter 6); and game-theoretic learning, where the objective of the decision-makers in the network depends on both the actions of other agents and the state of the environment (Chapter 7).

Part 2 starts with the basics of graph signal processing (Chapter 8); it then continues with a review of recent advances in sampling and recovery of signals defined over graphs (Chapter 9); active learning in networks, where one has the freedom to choose nodes for querying and wants to intelligently select informative nodes to minimize prediction errors (Chapter 10); design of graph filters and filter banks (Chapter 11); extension of the notion of stationarity to random graph signals (Chapter 12); estimation of a graph topology from graph signals (Chapter 13); and a unifying framework for learning on graphs based on partially absorbing random walks (Chapter 14).

Part 3, with its theme on communications, networking, and sensing, addresses projection-based algorithms and projection-free decentralized optimization algorithms for performing big data analytics (Chapter 15); optimal joint allocation of communication, computation, and caching resources in 5G communication networks (Chapter 16); analysis of a network’s structural vulnerability to attacks (Chapter 17); team methods for cooperation in device-centric setups (Chapter 18); cooperative data exchange so that clients can decode missing messages while minimizing the total number of transmissions (Chapter 19); and Byzantine attacks and defense for collaborative spectrum-sensing in cognitive radio networks (Chapter 20).

Part 4, on social networks, provides material on diffusion models for information in social networks, Bayesian social learning, and revealed preferences (Chapter 21); network identification for social networks (Chapter 22); search and data routing in dynamic social networks (Chapter 23); information diffusion in social networks and control of the diffusion of rumors and fake news (Chapter 24); methods for dealing with multilayer social networks, including multilayer community detection and multilayer interaction graph estimation (Chapter 25); and an investigation of how cooperative and strategic learning lead to different types of networks and how efficient these networks are in helping their agents achieve their goals (Chapter 26).

Part 5 presents various applications of cooperative and graph signal processing, including construction, modeling, and analysis of gene regulatory networks (Chapter 27); cooperative frequency estimation in power networks (Chapter 28); the use of Bluetooth low energy beacons in smart city applications (Chapter 29); scalable decentralized and streaming analytics that facilitate machine learning from big data (Chapter 30); and capturing the dynamics of brain activity by analyzing functional brain networks, and multivariate signals on those networks, with graph signal processing methods (Chapter 31).

This book is a collective effort of the chapter authors. We are thankful for their enthusiastic support in writing their parts and for their timely responses. We are grateful to the reviewers for taking the time to read the chapters and for their valuable feedback. Finally, we thank Elsevier for all its support throughout the duration of this project.

Part 1

Basics of Inference Over Networks

Chapter 1

Asynchronous Adaptive Networks

Ali H. Sayed*; Xiaochuan Zhao†

* School of Engineering, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland

† University of California at Los Angeles, UCLA, Los Angeles, CA, United States


The overview article [1] surveyed advances related to adaptation, learning, and optimization over synchronous networks. Various distributed strategies were discussed that enable a collection of networked agents to interact locally in response to online streaming data and to continually learn and adapt to drifts in the data and models. Under reasonable technical conditions, the adaptive networks were shown to be mean-square stable in the slow adaptation regime, and their mean-square-error performance and convergence rate were characterized in terms of the network topology and data statistical properties [2]. Classical results for single-agent adaptation and learning were recovered as special cases. Following the works [3–5], this chapter complements the exposition from [1] and extends the results to asynchronous networks where agents are subject to various sources of uncertainties that influence their behavior, including randomly changing topologies, random link failures, random data arrival times, and agents turning on and off randomly. In an asynchronous environment, agents may stop updating their solutions or may stop sending or receiving information in a random manner and without coordination with other agents. The presentation will reveal that the mean-square-error performance of asynchronous networks remains largely unaltered compared to synchronous networks. The results justify the remarkable resilience of cooperative networks.


Keywords: Multiagent networks; Distributed optimization; Diffusion strategy; Asynchronous behavior; Adaptive networks

1.1 Introduction

Adaptive networks consist of a collection of agents with learning abilities. The agents interact with each other on a local level and diffuse information across the network to solve inference and optimization tasks in a decentralized manner. Such networks are scalable, robust to node and link failures, and are particularly suitable for learning from big datasets by tapping into the power of collaboration among distributed agents. The networks are also endowed with cognitive abilities due to the sensing abilities of their agents, their interactions with their neighbors, and the embedded feedback mechanisms for acquiring and refining information. Each agent is not only capable of sensing data and experiencing the environment directly, but it also receives information through interactions with its neighbors and processes and analyzes this information to drive its learning process.
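To make the mechanics concrete, the following sketch simulates a small synchronous diffusion network of LMS learners in adapt-then-combine form: each agent takes one gradient step on its own streaming datum, then averages the intermediate estimates of its neighbors. This is an illustrative simulation only; the ring topology, uniform combination weights, step-size, and noise level are assumptions chosen for the example, not values from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

N, M = 10, 4                      # number of agents, parameter dimension
w_true = rng.standard_normal(M)   # common model the agents try to estimate

# Ring topology: each agent combines its own estimate with those of its
# two neighbors, using uniform weights (rows of A sum to 1).
A = np.zeros((N, N))
for k in range(N):
    for n in (k - 1, k, k + 1):
        A[k, n % N] = 1.0 / 3.0

mu = 0.01                # small step-size (slow adaptation regime)
w = np.zeros((N, M))     # one estimate per agent

for i in range(5000):
    # Adapt: each agent runs one LMS step on its own streaming datum.
    psi = np.empty_like(w)
    for k in range(N):
        u = rng.standard_normal(M)                    # regressor
        d = u @ w_true + 0.1 * rng.standard_normal()  # noisy measurement
        psi[k] = w[k] + mu * (d - u @ w[k]) * u
    # Combine: each agent averages its neighbors' intermediate estimates.
    w = A @ psi

msd = np.mean(np.sum((w - w_true) ** 2, axis=1))
print(f"network mean-square deviation: {msd:.2e}")
```

With cooperation, each agent's estimate converges to a small neighborhood of `w_true`; replacing `A` with the identity matrix recovers the noncooperative single-agent LMS baseline for comparison.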

As already indicated in [1, 2], there are many good reasons for the peaked interest in networked solutions, especially in this day and age when the word network has become commonplace whether one is referring to social networks, power networks, transportation networks, biological networks, or other networks. Some of these reasons have to do with the benefits of cooperation over networks in terms of improved performance and improved robustness and resilience to failure. Other reasons deal with privacy and secrecy considerations where agents may not be comfortable sharing their data with remote fusion centers. In other situations, the data may already be available in dispersed locations, as happens with cloud computing. One may also be interested in learning and extracting information through data mining from large datasets. Decentralized learning procedures offer an attractive approach to dealing with such datasets. Decentralized mechanisms can also serve as important enablers for the design of robotic swarms, which can assist in the exploration of disaster areas.

1.1.1 Asynchronous Behavior

The survey article [1] and monograph [2] focused on the case of synchronous networks where data arrive at all agents in a synchronous manner and updates by the agents are also performed in a synchronous manner. The network topology was assumed to remain largely static during the adaptation process. Under these conditions, the limits of performance and stability of these networks were identified in some detail for two main classes of distributed strategies: consensus and diffusion constructions. In this chapter, we extend the overview from [1] to cover asynchronous environments. In such environments, the operation of the network can suffer from the occurrence of various random events, including randomly changing topologies, random link failures, random data arrival times, and agents turning on and off randomly. Agents may also stop updating their solutions or may stop sending or receiving information in a random manner and without coordination with other agents. Results in [3–5] examined the implications of such asynchronous events on network performance in some detail and under a fairly general model for the random events. The purpose of this chapter is to summarize the key conclusions from these works in a manner that complements the presentation from [1] for the benefit of the reader. While the works [3–5] consider a broader formulation involving complex-valued variables, we limit the discussion here to real-valued variables in order not to overload the notation and to convey the key insights more directly. Proofs and derivations are often omitted and can be found in the above references; the emphasis is on presenting the results in a motivated manner and on commenting on the insights they provide into the operation of asynchronous networks.

We indicated in [3–5] that there already exist many useful studies in the literature on the performance of consensus strategies in the presence of asynchronous events (see, e.g., [6–16]). There are also some studies in the context of diffusion strategies [17, 18]. However, with the exception of the latter two works, the earlier references assumed conditions that are not generally favorable for applications involving continuous adaptation and learning. For example, some of the works assumed a decaying step-size, which turns off adaptation after sufficient iterations have passed. Some other works assumed noise-free data, which is a hindrance when learning from data perturbed by interferences and distortions. A third class of works focused on studying pure averaging algorithms, which are not required to respond to continuous data streaming. In the works [3–5], we adopted a more general asynchronous model that removes these limitations by allowing for various sources of random events and, moreover, the events are allowed to occur simultaneously. We also examined learning algorithms that respond to streaming data to enable adaptation. The main conclusion from the analysis in these works, and which will be summarized in future sections, is that asynchronous networks can still behave in a mean-square-error (MSE) stable manner for sufficiently small step-sizes and, interestingly, their steady-state performance level is only slightly affected in comparison to synchronous behavior. The iterates computed by the various agents are still able to converge and hover around an agreement state with a small MSE. These are reassuring results that support the intrinsic robustness and resilience of network-based cooperative solutions.

1.1.2 Organization of the Chapter

Readers will benefit more from this chapter if they first review the earlier article [1]. We continue to follow a similar structure here as well as a similar notation because the material in both this chapter and the earlier reference [1] is meant to be complementary. We organize the presentation into three main components. The first part (Section 1.2) reviews fundamental results on adaptation and learning by single stand-alone agents. The second part (Section 1.3) covers asynchronous centralized solutions. The objective is to explain the gain in performance that results from aggregating the data from the agents and processing it centrally at a fusion center. The centralized performance is used as a frame of reference for assessing various implementations. While centralized solutions can be powerful, they nevertheless suffer from a number of limitations. First, in real-time applications where agents collect data continuously, the repeated exchange of information back and forth between agents and the fusion center can be costly, especially when these exchanges occur over wireless links or require nontrivial routing resources. Second, in some sensitive applications, agents may be reluctant to share their data with remote centers for various reasons including privacy and secrecy considerations. More importantly perhaps, centralized solutions have a critical point of failure: if the central processor fails, then this solution method collapses altogether.

For these reasons, we cover in the remaining sections of the chapter (Sections 1.4 and 1.5) distributed asynchronous strategies of the consensus and diffusion types, and examine their dynamics, stability, and performance metrics. In the distributed mode of operation, agents are connected by a topology and are permitted to share information only with their immediate neighbors. The study of the behavior of such networked agents is more challenging than in the single-agent and centralized modes of operation due to the coupling among interacting agents and the fact that the networks are generally sparsely connected.

1.2 Single-Agent Adaptation and Learning

We begin our treatment by reviewing stochastic gradient algorithms, with emphasis on their application to the problems of adaptation and learning by stand-alone agents.

1.2.1 Risk and Loss Functions

We consider risk functions J(w) that map a real-valued vector argument, w ∈ ℝ^M, into a real number. When the variable w is complex-valued, some important technical differences arise that are beyond the scope of this chapter; they are addressed in [2–5]. Likewise, some adjustments to the arguments are needed when the risk function is nonsmooth (nondifferentiable), as explained in the works [19, 20]. It is sufficient for our purposes in this chapter to limit the presentation to real arguments and smooth risk functions without much loss in generality.

We denote the gradient vectors of J(w) relative to w and w^T by the following row and column vectors, respectively:

$$\nabla_{w} J(w) \;\triangleq\; \left[\;\frac{\partial J(w)}{\partial w_{1}}\quad \frac{\partial J(w)}{\partial w_{2}}\quad \cdots\quad \frac{\partial J(w)}{\partial w_{M}}\;\right]$$

$$\nabla_{w^{T}} J(w) \;\triangleq\; \mathrm{col}\left\{\frac{\partial J(w)}{\partial w_{1}},\; \frac{\partial J(w)}{\partial w_{2}},\; \ldots,\; \frac{\partial J(w)}{\partial w_{M}}\right\} \;=\; \left[\nabla_{w} J(w)\right]^{T}$$

These definitions are in terms of the partial derivatives of J(w) relative to the individual entries of w = col{w₁, w₂, …, w_M}, where the notation col{⋅} refers to a column vector that is formed by stacking its arguments on top of each other. Likewise, the Hessian matrix of J(w) with respect to w is defined as the following M × M symmetric matrix:

$$\nabla_{w}^{2} J(w) \;\triangleq\; \nabla_{w^{T}}\left[\nabla_{w} J(w)\right]$$

which is constructed from two successive gradient operations. It is common in adaptation and learning applications for the risk function J(w) to be constructed as the expectation of some loss function, Q(w; x), where the boldface variable x is used to denote some random data, say,

$$J(w) = \mathbb{E}\, Q(w;\boldsymbol{x}) \tag{1.2}$$

and the expectation is evaluated over the distribution of x.

Example 1.1

(Mean-square-error (MSE) costs)

Let d denote a zero-mean scalar random variable with variance σ_d² = E d², and let u denote a zero-mean 1 × M random vector with covariance matrix R_u = E u^T u > 0. The combined quantities {d, u} represent the random variable x referred to in Eq. (1.2). We formulate the problem of estimating d from u in the linear least-mean-squares-error sense or, equivalently, the problem of seeking the vector wo that minimizes the quadratic cost function:

$$J(w) \;\triangleq\; \mathbb{E}\,|\boldsymbol{d} - \boldsymbol{u}w|^{2} \tag{1.3a}$$

This cost corresponds to the following choice for the loss function:

$$Q(w;\boldsymbol{x}) \;\triangleq\; |\boldsymbol{d} - \boldsymbol{u}w|^{2} \tag{1.3b}$$

Such quadratic costs are widely used in estimation and adaptation problems [21–25]. They are also widely used as quadratic risk functions in machine learning applications [26, 27]. The gradient vector and Hessian matrix of J(w) are easily seen to be:

$$\nabla_{w^{T}} J(w) = 2\left(R_{u}w - r_{du}\right),\qquad \nabla_{w}^{2} J(w) = 2R_{u} \tag{1.3c}$$

where r_du = E d u^T denotes the cross-covariance vector between d and u. ■
Example 1.2

(Logistic or log-loss risks)

Let γ denote a binary random variable that assumes the values ± 1, and let h denote an M × 1 random (feature) vector. The combined quantities {γ, h} represent the random variable x referred to in Eq. (1.2). In the context of machine learning and pattern classification problems [26–28], the variable γ designates the class that the feature vector h belongs to. In these problems, one seeks the vector wo that minimizes the regularized logistic risk function:

$$J(w) \;\triangleq\; \frac{\rho}{2}\,\|w\|^{2} \;+\; \mathbb{E}\,\ln\!\left(1 + e^{-\boldsymbol{\gamma}\boldsymbol{h}^{T}w}\right) \tag{1.4a}$$

where ρ > 0 is a regularization parameter, ln(⋅) is the natural logarithm function, and ∥w∥² = w^T w. The risk (1.4a) corresponds to the following choice for the loss function:

$$Q(w;\boldsymbol{x}) \;\triangleq\; \frac{\rho}{2}\,\|w\|^{2} \;+\; \ln\!\left(1 + e^{-\boldsymbol{\gamma}\boldsymbol{h}^{T}w}\right) \tag{1.4b}$$

Once wo is recovered, its value can be used to classify new feature vectors, say, {hℓ}, into one class or the other. It can be easily verified that:

$$\nabla_{w^{T}} J(w) = \rho\, w \;-\; \mathbb{E}\left(\frac{\boldsymbol{\gamma}\,\boldsymbol{h}}{1 + e^{\boldsymbol{\gamma}\boldsymbol{h}^{T}w}}\right) \tag{1.4c}$$

$$\nabla_{w}^{2} J(w) = \rho\, I_{M} \;+\; \mathbb{E}\left(\boldsymbol{h}\boldsymbol{h}^{T}\,\frac{e^{\boldsymbol{\gamma}\boldsymbol{h}^{T}w}}{\left(1 + e^{\boldsymbol{\gamma}\boldsymbol{h}^{T}w}\right)^{2}}\right) \tag{1.4d}$$

where IM denotes the identity matrix of size M × M. ■

1.2.2 Conditions on Cost Function

Stochastic gradient algorithms are powerful iterative procedures for solving optimization problems of the form

$$\min_{w}\; J(w) \tag{1.5}$$

While the analysis that follows can be pursued under more relaxed conditions (see, e.g., the treatments in [29–32]), it is sufficient for our purposes to require J(w) to be strongly convex and twice-differentiable with respect to w. The cost function J(w) is said to be ν-strongly convex if, and only if, its Hessian matrix is sufficiently bounded away from zero [30, 33–35]:

$$\nabla_{w}^{2} J(w) \;\geq\; \nu\, I_{M} \;>\; 0 \tag{1.6}$$

for all w and for some scalar ν > 0, where the notation A > 0 signifies that matrix A is positive-definite. Strong convexity is a useful condition in the context of adaptation and learning from streaming data because it helps guard against ill-conditioning in the algorithms; it also helps ensure that J(w) has a unique global minimum, say, at location wo; there will be no other minima, maxima, or saddle points. In addition, it is well known that strong convexity helps endow stochastic-gradient algorithms with geometric convergence rates in the order of O(αi), for some 0 ≤ α < 1 and where i is the iteration index [30, 31].

In many problems of interest in adaptation and learning, the cost function J(w) is either already strongly convex or can be made strongly convex by means of regularization. For example, it is common in machine learning problems [26, 27] and in adaptation and estimation problems [23, 25] to incorporate regularization factors into the cost functions; these factors help ensure strong convexity. For instance, the MSE cost (1.3a) is strongly convex whenever Ru > 0. If Ru happens to be singular, then the following regularized cost will be strongly convex:

$$J(w) \;=\; \frac{\rho}{2}\,\|w\|^{2} \;+\; \mathbb{E}\,|\boldsymbol{d} - \boldsymbol{u}w|^{2} \tag{1.7}$$

where ρ > 0 is a regularization parameter similar to Eq. (1.4a).

Besides strong convexity, we shall also assume that the gradient vector of J(w) is δ-Lipschitz, namely, there exists δ > 0 such that

$$\left\|\nabla_{w} J(w_{1}) - \nabla_{w} J(w_{2})\right\| \;\leq\; \delta\,\left\|w_{1} - w_{2}\right\| \tag{1.8}$$

for all w1 and w2. It can be verified that for twice-differentiable costs, conditions (1.6) and (1.8) combined are equivalent to

$$0 \;<\; \nu\, I_{M} \;\leq\; \nabla_{w}^{2} J(w) \;\leq\; \delta\, I_{M} \tag{1.9}$$
For example, it is clear that the Hessian matrices in Eqs. (1.3c) and (1.4d) satisfy this property because

$$2\lambda_{\min}(R_{u})\, I_{M} \;\leq\; 2R_{u} \;\leq\; 2\lambda_{\max}(R_{u})\, I_{M}$$

in the first case and

$$\rho\, I_{M} \;\leq\; \nabla_{w}^{2} J(w) \;\leq\; \left(\rho + \lambda_{\max}(R_{h})\right) I_{M},\qquad R_{h} \triangleq \mathbb{E}\,\boldsymbol{h}\boldsymbol{h}^{T}$$

in the second case, where the notation λmin(R) and λmax(R) refers to the smallest and largest eigenvalues of the symmetric matrix argument, R, respectively. In summary, we will be assuming the following conditions [2, 3, 36, 37].
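The two bounds in Eq. (1.9) can be checked numerically for the MSE case. The sketch below builds an arbitrary positive-definite covariance matrix R_u (an illustrative choice, not data from the chapter), forms the Hessian H = 2R_u, and verifies that the gradient difference H(w1 − w2) obeys both the δ-Lipschitz bound and the ν-strong-convexity (monotonicity) inequality:

```python
import numpy as np

# Numerical sanity check of nu*I <= Hessian <= delta*I for the MSE cost of
# Example 1.1. R_u below is an arbitrary positive-definite choice.
rng = np.random.default_rng(0)
M = 4
A = rng.standard_normal((M, M))
R_u = A @ A.T + 0.5 * np.eye(M)          # positive-definite by construction

H = 2 * R_u                              # Hessian of J(w) = E|d - u w|^2
nu = 2 * np.linalg.eigvalsh(R_u)[0]      # nu    = 2*lambda_min(R_u)
delta = 2 * np.linalg.eigvalsh(R_u)[-1]  # delta = 2*lambda_max(R_u)

# The gradient difference H*(w1 - w2) must be delta-Lipschitz and satisfy
# the strong-convexity (monotonicity) inequality with constant nu.
w1 = rng.standard_normal((M, 1))
w2 = rng.standard_normal((M, 1))
g_diff = H @ (w1 - w2)
lipschitz_ok = np.linalg.norm(g_diff) <= delta * np.linalg.norm(w1 - w2) + 1e-12
convexity_ok = ((w1 - w2).T @ g_diff).item() >= nu * np.sum((w1 - w2) ** 2) - 1e-12
```

Since the Hessian is the constant matrix 2R_u, both inequalities hold with equality in the extreme eigendirections of R_u.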

Assumption 1.1

(Conditions on cost function)

The cost function J(w) is twice-differentiable and satisfies Eq. (1.9) for some positive parameters ν ≤ δ. Condition (1.9) is equivalent to requiring J(w) to be ν-strongly convex and for its gradient vector to be δ-Lipschitz as in Eqs. (1.6) and (1.8), respectively.

1.2.3 Stochastic-Gradient Approximation

The traditional gradient-descent algorithm for solving Eq. (1.5) takes the form:

$$w_{i} = w_{i-1} - \mu\, \nabla_{w^{T}} J(w_{i-1}),\qquad i \geq 0 \tag{1.11}$$

where i ≥ 0 is an iteration index and μ > 0 is a small step-size parameter. Starting from some initial condition, w−1, the iterates {wi} correspond to successive estimates for the minimizer wo. In order to run recursion (1.11), we need to have access to the true gradient vector. This information is generally unavailable in most instances involving learning from data. For example, when cost functions are defined as the expectations of certain loss functions as in Eq. (1.2), the statistical distribution of the data x may not be known beforehand. In that case, the exact form of J(w) will not be known because the expectation of Q(w; x) cannot be computed. It is then customary to replace the true gradient vector by an approximation. Doing so leads to the following stochastic-gradient recursion in lieu of Eq. (1.11):

$$\boldsymbol{w}_{i} = \boldsymbol{w}_{i-1} - \mu\, \widehat{\nabla_{w^{T}} J}(\boldsymbol{w}_{i-1}),\qquad i \geq 0 \tag{1.12}$$
We use the boldface notation, wi, for the iterates in Eq. (1.12) to highlight the fact that these iterates are now randomly perturbed versions of the values {wi} generated by the original recursion (1.11). The random perturbations arise from the use of the approximate gradient vector. The boldface notation is therefore meant to emphasize the random nature of the iterates in Eq. (1.12). We refer to recursion (1.12) as a synchronous implementation because updates occur continuously over the iteration index i. This terminology is meant to distinguish the above recursion from its asynchronous counterpart, which is introduced later in Section 1.2.5.

We illustrate construction (1.12) by considering a scenario from classical adaptive filter theory [21–23], where the gradient vector is approximated directly from data realizations. The construction will reveal why stochastic-gradient implementations of the form (1.12), using approximate rather than exact gradient information, become naturally endowed with the ability to respond to streaming data.

Example 1.3

(LMS adaptation)

Let d(i) denote a streaming sequence of zero-mean scalar random variables with variance σ_d² = E d(i)². Let ui denote a streaming sequence of 1 × M independent zero-mean random vectors with covariance matrix R_u = E u_i^T u_i > 0. Both processes {d(i), ui} are assumed to be jointly wide-sense stationary. The cross-covariance vector between d(i) and ui is denoted by r_du = E d(i) u_i^T. The data {d(i), ui} are assumed to be related via a linear regression model of the form:

$$\boldsymbol{d}(i) = \boldsymbol{u}_{i}\, w^{o} + \boldsymbol{v}(i) \tag{1.13a}$$

for some unknown parameter vector wo, and where v(i) is a zero-mean white-noise process with power σ_v² = E v(i)² and assumed independent of uj for all i, j. Observe that we are using parentheses to represent the time dependency of a scalar variable, such as writing d(i), and subscripts to represent the time dependency of a vector variable, such as writing ui. This convention will be used throughout the chapter. In a manner similar to Example 1.1, we again pose the problem of estimating wo by minimizing the MSE cost

$$J(w) = \mathbb{E}\,|\boldsymbol{d}(i) - \boldsymbol{u}_{i}w|^{2} \tag{1.13b}$$

where now the quantities {d(i), ui} represent the random data xi in the definition of the loss function, Q(w;xi). Using Eq. (1.11), the gradient-descent recursion in this case will take the form:

$$w_{i} = w_{i-1} + 2\mu\left[r_{du} - R_{u}\, w_{i-1}\right],\qquad i \geq 0 \tag{1.13c}$$

The main difficulty in running this recursion is that it requires knowledge of the moments {rdu, Ru}. This information is rarely available beforehand; the adaptive agent senses instead realizations {d(i), ui} whose statistical distributions have moments {rdu, Ru}. The agent can use these realizations to approximate the moments and the true gradient vector. There are many constructions that can be used for this purpose, with different constructions leading to different adaptive algorithms [21–24]. It is sufficient to focus on one of the most popular adaptive algorithms, which results from using the data {d(i), ui} to compute instantaneous approximations for the unavailable moments as follows:

$$\widehat{r}_{du} = \boldsymbol{d}(i)\,\boldsymbol{u}_{i}^{T},\qquad \widehat{R}_{u} = \boldsymbol{u}_{i}^{T}\boldsymbol{u}_{i} \tag{1.13d}$$

By doing so, the true gradient vector is approximated by:

$$\widehat{\nabla_{w^{T}} J}(w) = 2\,\boldsymbol{u}_{i}^{T}\left[\boldsymbol{u}_{i}w - \boldsymbol{d}(i)\right] = \nabla_{w^{T}} Q(w;\boldsymbol{x}_{i}) \tag{1.13e}$$

Observe that this construction amounts to replacing the true gradient vector, ∇_{w^T} J(w), by the gradient vector of the loss function itself (which, equivalently, amounts to dropping the expectation operator). Substituting Eq. (1.13e) into Eq. (1.13c) leads to the well-known (synchronous) least-mean-squares (LMS, for short) algorithm [21–23, 38]:

$$\boldsymbol{w}_{i} = \boldsymbol{w}_{i-1} + 2\mu\,\boldsymbol{u}_{i}^{T}\left[\boldsymbol{d}(i) - \boldsymbol{u}_{i}\boldsymbol{w}_{i-1}\right],\qquad i \geq 0 \tag{1.13f}$$
The LMS algorithm is therefore a stochastic-gradient algorithm. By relying directly on the instantaneous data {d(i), ui}, the algorithm is infused with useful tracking abilities. This is because drifts in the model wo from Eq. (1.13a) will be reflected in the data {d(i), ui}, which are used directly in the update (1.13f). ■
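The LMS update (1.13f) can be sketched in a few lines. In the simulation below, the dimensions, noise power, and step-size are illustrative choices (not values from the chapter); the regressors are white, so R_u = I_M:

```python
import numpy as np

# A minimal simulation sketch of the LMS recursion (1.13f) for the model
# d(i) = u_i wo + v(i). All numerical values are illustrative choices.
rng = np.random.default_rng(1)
M, n_iter, mu, sigma_v = 5, 5000, 0.01, 0.1
wo = rng.standard_normal((M, 1))         # unknown model to be estimated

w = np.zeros((M, 1))                     # initial condition w_{-1}
for i in range(n_iter):
    u = rng.standard_normal((1, M))      # regressor u_i (here R_u = I_M)
    d = (u @ wo).item() + sigma_v * rng.standard_normal()   # measurement d(i)
    e = d - (u @ w).item()               # a priori error d(i) - u_i w_{i-1}
    w = w + 2 * mu * u.T * e             # LMS update (1.13f)

msd = np.sum((wo - w) ** 2)              # squared deviation of the last iterate
```

Because the update uses the streaming data {d(i), u_i} directly, the same loop tracks drifts in wo if the model changes over time.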

If desired, it is also possible to employ iteration-dependent step-size sequences, μ(i), in Eq. (1.12) instead of the constant step-size μ, and to require μ(i) to satisfy

$$\sum_{i=0}^{\infty} \mu(i) = \infty,\qquad \sum_{i=0}^{\infty} \mu^{2}(i) < \infty \tag{1.14}$$
Under some technical conditions, it is well known that such step-size sequences ensure the convergence of wi toward wo [2, 30–32]. However, conditions (1.14) force the step-size sequence to decay to zero, which is problematic for applications requiring continuous adaptation and learning from streaming data. This is because, in such applications, it is not unusual for the location of the minimizer, wo, to drift with time. With μ(i) decaying toward zero, the stochastic-gradient algorithm (1.12) will stop updating and will not be able to track drifts in the solution. For this reason, we shall focus on constant step-sizes from this point onward because we are interested in solutions with tracking abilities.
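The tracking argument above can be illustrated with a small experiment (not from the chapter; all values arbitrary): when wo undergoes a random-walk drift, a constant step-size keeps tracking while a decaying step-size μ(i) ~ 1/i effectively turns adaptation off and accumulates lag:

```python
import numpy as np

# Illustrative sketch: constant vs decaying step-sizes when the minimizer
# wo drifts over time. All numerical values are arbitrary choices.
rng = np.random.default_rng(2)

def run_trial(M=3, n_iter=4000):
    """Return final squared deviations for constant vs decaying step-sizes."""
    wo = np.zeros((M, 1))
    w_const = np.zeros((M, 1))
    w_decay = np.zeros((M, 1))
    for i in range(n_iter):
        wo += 0.01 * rng.standard_normal((M, 1))   # random-walk drift of wo
        u = rng.standard_normal((1, M))
        d = (u @ wo).item() + 0.05 * rng.standard_normal()
        w_const += 0.05 * u.T * (d - (u @ w_const).item())       # constant mu
        w_decay += (1.0 / (i + 2)) * u.T * (d - (u @ w_decay).item())  # decaying mu
    return (np.sum((wo - w_const) ** 2), np.sum((wo - w_decay) ** 2))

results = [run_trial() for _ in range(5)]
const_wins = sum(ec < ed for ec, ed in results)   # trials won by constant mu
```

In virtually every trial the constant step-size ends up much closer to the drifting wo, consistent with the discussion above.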

Now, the use of an approximate gradient vector in Eq. (1.12) introduces perturbations relative to the operation of the original recursion (1.11). We refer to the perturbation as gradient noise and define it as the difference:

$$\boldsymbol{s}_{i}(\boldsymbol{w}_{i-1}) \;\triangleq\; \widehat{\nabla_{w^{T}} J}(\boldsymbol{w}_{i-1}) \;-\; \nabla_{w^{T}} J(\boldsymbol{w}_{i-1}) \tag{1.15}$$
The presence of this perturbation prevents the stochastic iterate, wi, from converging almost surely to the minimizer wo when constant step-sizes are used. Some deterioration in performance will occur and the iterate wi will instead fluctuate close to wo. We will assess the size of these fluctuations by measuring their steady-state mean-square value (also called mean-square-deviation or MSD). It will turn out that the MSD is small and in the order of O(μ); see Eq. (1.28c) further ahead. It will also turn out that stochastic-gradient algorithms converge toward their MSD levels at a geometric rate. In this way, we will be able to conclude that adaptation with small constant step-sizes can still lead to reliable performance in the presence of gradient noise, which is a reassuring result. We will also be able to conclude that adaptation with constant step-sizes is useful even for stationary environments. This is because it is generally sufficient in practice to reach an iterate wi within some fidelity level from wo in a finite number of iterations. As long as the MSD level is satisfactory, a stochastic-gradient algorithm will be able to attain satisfactory fidelity within a reasonable time frame. In comparison, although diminishing step-sizes ensure almost-sure convergence of wi to wo, they nevertheless disable tracking and can only guarantee slower than geometric rates of convergence (see, e.g., [2, 30, 31]). The next example from [36] illustrates the nature of the gradient noise process (1.15) in the context of MSE adaptation.

Example 1.4

(Gradient noise)

It is clear from the expressions in Example 1.3 that the corresponding gradient noise process is

$$\boldsymbol{s}_{i}(\boldsymbol{w}_{i-1}) = 2\left(R_{u} - \boldsymbol{u}_{i}^{T}\boldsymbol{u}_{i}\right)\widetilde{\boldsymbol{w}}_{i-1} \;-\; 2\,\boldsymbol{u}_{i}^{T}\boldsymbol{v}(i) \tag{1.16a}$$

where we introduced the error vector:

$$\widetilde{\boldsymbol{w}}_{i-1} \;\triangleq\; w^{o} - \boldsymbol{w}_{i-1} \tag{1.16b}$$

Let the symbol Fi−1 represent the collection of all possible random events generated by the past iterates {wj} up to time i − 1 (more formally, Fi−1 is the filtration generated by the random process wj for j ≤ i − 1). It follows from the conditions on the random processes {ui, v(i)} in Example 1.3 that

$$\mathbb{E}\left[\boldsymbol{s}_{i}(\boldsymbol{w}_{i-1})\,|\,\boldsymbol{F}_{i-1}\right] = 0$$

$$\mathbb{E}\left[\|\boldsymbol{s}_{i}(\boldsymbol{w}_{i-1})\|^{2}\,|\,\boldsymbol{F}_{i-1}\right] \;\leq\; c\,\|\widetilde{\boldsymbol{w}}_{i-1}\|^{2} \;+\; 4\sigma_{v}^{2}\,\mathrm{Tr}(R_{u})$$

for some constant c ≥ 0. If we take expectations of both sides of these relations, we find that the variance of the gradient noise is bounded by two factors: one factor that scales with the quality of the iterate, E∥w̃i−1∥², and a second factor that persists regardless of the iterate. Therefore, even if the adaptive agent is able to approach wo, the gradient noise does not vanish; its variance remains bounded away from zero by a term on the order of 4σ_v² Tr(Ru). ■
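This behavior is easy to verify numerically. At w = wo the expression (1.16a) reduces to s_i = −2 u_i^T v(i), whose mean is zero while its variance settles at the floor 4σ_v² Tr(R_u). The sketch below samples this noise for white regressors (R_u = I_M; dimensions and noise power are illustrative):

```python
import numpy as np

# Numerical check of the gradient-noise expression (1.16a) evaluated at
# w = wo, where it reduces to s_i = -2 u_i^T v(i). Values illustrative.
rng = np.random.default_rng(3)
M, n_samples, sigma_v2 = 4, 200000, 0.1

U = rng.standard_normal((n_samples, M))          # regressors with R_u = I_M
v = np.sqrt(sigma_v2) * rng.standard_normal(n_samples)
S = -2.0 * U * v[:, None]                        # samples of s_i at w = wo

mean_norm = np.linalg.norm(S.mean(axis=0))       # should be close to zero
var_floor = (S ** 2).sum(axis=1).mean()          # estimate of E||s_i||^2
theory = 4 * sigma_v2 * M                        # 4*sigma_v^2*Tr(R_u)
```

The sample mean is negligible and the sample variance matches the theoretical floor, confirming that the noise is unbiased but persistent.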

1.2.4 Conditions on Gradient Noise Process

In order to examine the convergence and performance properties of the stochastic-gradient recursion (1.12), it is necessary to introduce some assumptions on the stochastic nature of the gradient noise process, si(⋅). The conditions that we introduce in the sequel are similar to conditions used earlier in the optimization literature, e.g., in [30, pp. 95–102] and [39, p. 635]; they are also motivated by the conditions we observed in the MSE case in Example 1.4. Following the developments in [3, 36, 37], we let

$$R_{s,i}(\boldsymbol{w}_{i-1}) \;\triangleq\; \mathbb{E}\left[\boldsymbol{s}_{i}(\boldsymbol{w}_{i-1})\,\boldsymbol{s}_{i}^{T}(\boldsymbol{w}_{i-1})\,|\,\boldsymbol{F}_{i-1}\right] \tag{1.17a}$$

denote the conditional second-order moment of the gradient noise process, which generally depends on i. We assume that, in the limit, the covariance matrix tends to a constant value when evaluated at wo and is denoted by

$$R_{s} \;\triangleq\; \lim_{i\to\infty}\; \mathbb{E}\left[\boldsymbol{s}_{i}(w^{o})\,\boldsymbol{s}_{i}^{T}(w^{o})\,|\,\boldsymbol{F}_{i-1}\right] \tag{1.17b}$$

For example, comparing with expression (1.16a) for MSE costs, we have

$$R_{s} = 4\,\sigma_{v}^{2}\, R_{u} \tag{1.18}$$

Assumption 1.2

(Conditions on gradient noise)

It is assumed that the first- and second-order conditional moments of the gradient noise process satisfy Eq. (1.17b) and

$$\mathbb{E}\left[\boldsymbol{s}_{i}(\boldsymbol{w}_{i-1})\,|\,\boldsymbol{F}_{i-1}\right] = 0 \tag{1.19a}$$

$$\mathbb{E}\left[\|\boldsymbol{s}_{i}(\boldsymbol{w}_{i-1})\|^{2}\,|\,\boldsymbol{F}_{i-1}\right] \;\leq\; \beta^{2}\,\|\widetilde{\boldsymbol{w}}_{i-1}\|^{2} \;+\; \sigma_{s}^{2} \tag{1.19b}$$

almost surely, for some nonnegative scalars β² and σ_s².

Condition (1.19a) ensures that the approximate gradient vector is unbiased. It follows from conditions (1.19a) and (1.19b) that the gradient noise process itself satisfies:

$$\mathbb{E}\,\boldsymbol{s}_{i}(\boldsymbol{w}_{i-1}) = 0 \tag{1.20a}$$

$$\mathbb{E}\,\|\boldsymbol{s}_{i}(\boldsymbol{w}_{i-1})\|^{2} \;\leq\; \beta^{2}\,\mathbb{E}\,\|\widetilde{\boldsymbol{w}}_{i-1}\|^{2} \;+\; \sigma_{s}^{2} \tag{1.20b}$$

It is straightforward to verify that the gradient noise process (1.16a) from the MSE case of Example 1.4 satisfies both conditions in Assumption 1.2.

1.2.5 Random Updates

We examined the performance of synchronous updates of the form (1.12) in some detail in [1, 2]. As indicated earlier, the focus of the current chapter is on extending the treatment from [1] to asynchronous implementations. Accordingly, the first main digression in the exposition relative to [1] occurs at this stage.

Thus, note that the stochastic-gradient recursion (1.12) employs a constant step-size parameter, μ > 0, and implicitly assumes new data to be available at every iteration. Nevertheless, there are situations where data may arrive at the agent at random times, in which case the updates will also be occurring at random times. One way to capture this behavior is to model the step-size parameter as a random process, which we shall denote by the boldface notation μ(i) (because boldface letters in our notation refer to random quantities). Doing so allows us to replace the synchronous implementation (1.12) by the following asynchronous recursion:

$$\boldsymbol{w}_{i} = \boldsymbol{w}_{i-1} - \boldsymbol{\mu}(i)\, \widehat{\nabla_{w^{T}} J}(\boldsymbol{w}_{i-1}),\qquad i \geq 0 \tag{1.21}$$

Observe that we are attaching a time index to the step-size parameter in order to highlight that its value will now be changing randomly from one iteration to another.

For example, one particular instance discussed further ahead in Example 1.5 is when μ(i) is a Bernoulli random variable assuming one of two possible values, say, μ(i) = μ > 0 with probability q and μ(i) = 0 with probability 1 − q, for some 0 < q < 1. In this case, recursion (1.21) will be updating only a q-fraction of the time. We will not limit our presentation to the Bernoulli model but will allow the random step-size to assume broader probability distributions; we denote its first- and second-order moments as follows.

Assumption 1.3

(Conditions on step-size process)

It is assumed that the stochastic process {μ(i), i ≥ 0} consists of a sequence of independent and bounded random variables, μ(i) ∈ [0, μub], where μub > 0 is a constant upper bound. The mean and variance of μ(i) are fixed over i, and they are denoted by:

$$\bar{\mu} \;\triangleq\; \mathbb{E}\,\boldsymbol{\mu}(i) \tag{1.22a}$$

$$\sigma_{\mu}^{2} \;\triangleq\; \mathbb{E}\left(\boldsymbol{\mu}(i) - \bar{\mu}\right)^{2} \tag{1.22b}$$

with μ̄ assumed positive. Moreover, it is assumed that, for any i, the random variable μ(i) is independent of any other random variable in the learning algorithm.

The following variable will play an important role in characterizing the MSE performance of asynchronous updates and, hence, we introduce a unique symbol for it:

$$\mu_{x} \;\triangleq\; \bar{\mu} \;+\; \frac{\sigma_{\mu}^{2}}{\bar{\mu}} \tag{1.23}$$

where the subscript "x" is meant to refer to the "asynchronous" mode of operation. This expression captures the first- and second-order moments of the random variations in the step-size parameter in a single variable, μx. While the constant step-size μ determines the performance of the synchronous implementation (1.12), it turns out that the constant variable μx defined above will play an equivalent role for the asynchronous implementation (1.21). Note further that the synchronous stochastic-gradient iteration (1.12) can be viewed as a special case of recursion (1.21) when the variance of μ(i) is set to zero and its mean is set to μ, in which case μx = μ. Therefore, by using these substitutions, we will be able to deduce performance metrics for Eq. (1.12) from the performance metrics that we shall present for Eq. (1.21). The following two examples illustrate situations involving random updates.

Example 1.5

(Random updates under a Bernoulli model)

Assume that at every iteration i, the agent adopts a random on-off policy to reduce energy consumption. It tosses a coin to decide whether to enter an active learning mode or a sleeping mode. Let 0 < q < 1 denote the probability of entering the active mode. During the active mode, the agent employs a step-size value μ. This model is useful for situations in which data arrives randomly at the agent: at every iteration i, new data is available with probability q. The random step-size process is therefore of the following type:

$$\boldsymbol{\mu}(i) = \begin{cases} \mu, & \text{with probability } q \\ 0, & \text{with probability } 1-q \end{cases} \tag{1.24a}$$

In this case, the mean and variance of μ(i) are given by:

$$\bar{\mu} = q\,\mu,\qquad \sigma_{\mu}^{2} = q(1-q)\,\mu^{2} \tag{1.24b}$$
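A quick empirical check of the Bernoulli model (with illustrative values of q and μ): the sample moments of the on-off step-size sequence match the theoretical mean and variance above, and, assuming the form μx = μ̄ + σ_μ²/μ̄ for the effective step-size variable, the Bernoulli model yields μx = μ exactly:

```python
import numpy as np

# Empirical check of the Bernoulli step-size model: sample moments should
# match mu_bar = q*mu and sigma_mu^2 = q*(1-q)*mu^2. The form of mu_x used
# below is an assumption; q and mu are illustrative values.
rng = np.random.default_rng(4)
q, mu, n = 0.3, 0.01, 500000

mu_i = mu * (rng.random(n) < q)          # random on-off step-size sequence
mu_bar_hat = mu_i.mean()
var_hat = mu_i.var()

mu_bar = q * mu                          # theoretical mean
sigma_mu2 = q * (1 - q) * mu ** 2        # theoretical variance
mu_x = mu_bar + sigma_mu2 / mu_bar       # assumed effective step-size
```

Under this model, μx = qμ + (1 − q)μ = μ, so the on-off agent behaves, to first order, like a synchronous agent running at step-size μ.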
Example 1.6

(Random updates under a Beta model)

Because the random step-size μ(i) is limited to a finite-length interval [0, μub], we may extend the Bernoulli model from the previous example by adopting a more general continuous Beta distribution for μ(i). The Beta distribution is an extension of the Bernoulli distribution. While the Bernoulli distribution assumes two discrete possibilities for the random variable, say, {0, μ}, the Beta distribution allows for any value in the continuum [0, μ].

Thus, let x denote a generic scalar random variable that assumes values in the interval [0, 1] according to a Beta distribution. Then, according to this distribution, the pdf of x, denoted by fx(x;ξ, ζ), is determined by two shape parameters {ξ, ζ} as follows [40, 41]:

$$f_{x}(x;\xi,\zeta) = \frac{\Gamma(\xi+\zeta)}{\Gamma(\xi)\,\Gamma(\zeta)}\; x^{\xi-1}\,(1-x)^{\zeta-1},\qquad 0 \leq x \leq 1 \tag{1.25a}$$

where Γ(⋅) denotes the Gamma function [42, 43]. Fig. 1.1 plots fx(x;ξ, ζ) for two values of ζ. The mean and variance of the Beta distribution (1.25a) are given by:

$$\mathbb{E}\,\boldsymbol{x} = \frac{\xi}{\xi+\zeta},\qquad \sigma_{x}^{2} = \frac{\xi\,\zeta}{(\xi+\zeta)^{2}\,(\xi+\zeta+1)} \tag{1.25b}$$

We note that the classical uniform distribution over the interval [0, 1] is a special case of the Beta distribution for ξ = ζ = 1; see Fig. 1.1. Likewise, the Bernoulli distribution can be viewed as a limiting special case of the Beta distribution.

Fig. 1.1 Beta distribution. The pdf of the Beta distribution, f x ( x ; ξ , ζ ), defined by Eq. ( 1.25a) for different values of the shape parameters ξ and ζ. Figure reproduced with permission from Zhao X, Sayed AH. Asynchronous adaptation and learning over networks—Part I: Modeling and stability analysis. IEEE Trans Signal Process 2015;63(4):811–26.

In the Beta model for asynchronous adaptation, we assume that the ratio μ(i)/μub follows a Beta distribution with parameters {ξ, ζ}. Under this model, the mean and variance of the random step-size become:

$$\bar{\mu} = \frac{\mu_{ub}\,\xi}{\xi+\zeta},\qquad \sigma_{\mu}^{2} = \frac{\mu_{ub}^{2}\,\xi\,\zeta}{(\xi+\zeta)^{2}\,(\xi+\zeta+1)} \tag{1.26}$$
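The moments of the Beta step-size model can be confirmed by direct sampling. In the sketch below, the shape parameters and the bound μub are illustrative choices:

```python
import numpy as np

# Sampling check of the Beta step-size model: with mu(i)/mu_ub ~ Beta(xi, zeta),
# the sample moments of mu(i) should match the mean and variance formulas
# above. Shape parameters and mu_ub are illustrative choices.
rng = np.random.default_rng(5)
xi, zeta, mu_ub, n = 2.0, 5.0, 0.02, 400000

mu_i = mu_ub * rng.beta(xi, zeta, size=n)        # random step-size samples
mu_bar = mu_ub * xi / (xi + zeta)                # theoretical mean
sigma_mu2 = mu_ub ** 2 * xi * zeta / ((xi + zeta) ** 2 * (xi + zeta + 1))

mean_err = abs(mu_i.mean() - mu_bar) / mu_bar    # relative moment errors
var_err = abs(mu_i.var() - sigma_mu2) / sigma_mu2
```

Setting ξ = ζ = 1 recovers step-sizes drawn uniformly from [0, μub], as noted in the text.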
1.2.6 Mean-Square-Error Stability

We now examine the convergence of the asynchronous stochastic-gradient recursion (1.21). In the statement below, the notation a = O(μ) means a ≤ bμ for some constant b that is independent of μ.

Lemma 1.1 (Mean-square-error stability)

Assume the conditions under Assumptions 1.1–1.3 on the cost function, the gradient noise process, and the random step-size process hold. Let μo = 2ν/(δ² + β²). For any μx satisfying

$$\mu_{x} < \mu_{o} \tag{1.27}$$

it holds that the MSE, E∥w̃i∥², converges exponentially (i.e., at a geometric rate) according to the recursion

$$\mathbb{E}\,\|\widetilde{\boldsymbol{w}}_{i}\|^{2} \;\leq\; \alpha\; \mathbb{E}\,\|\widetilde{\boldsymbol{w}}_{i-1}\|^{2} \;+\; \left(\bar{\mu}^{2} + \sigma_{\mu}^{2}\right)\sigma_{s}^{2} \tag{1.28a}$$

where the scalar α satisfies 0 ≤ α < 1 and is given by

$$\alpha = 1 - 2\nu\bar{\mu} + \left(\delta^{2} + \beta^{2}\right)\left(\bar{\mu}^{2} + \sigma_{\mu}^{2}\right) \tag{1.28b}$$

It follows from Eq. (1.28a) that, for sufficiently small step-sizes:

$$\limsup_{i\to\infty}\; \mathbb{E}\,\|\widetilde{\boldsymbol{w}}_{i}\|^{2} = O(\mu_{x}) \tag{1.28c}$$
We subtract wo from both sides of Eq. (1.21) to get

$$\widetilde{\boldsymbol{w}}_{i} = \widetilde{\boldsymbol{w}}_{i-1} + \boldsymbol{\mu}(i)\left[\nabla_{w^{T}} J(\boldsymbol{w}_{i-1}) + \boldsymbol{s}_{i}(\boldsymbol{w}_{i-1})\right] \tag{1.29a}$$

We now appeal to the mean-value theorem [2, 30, 44] to write:

$$\nabla_{w^{T}} J(\boldsymbol{w}_{i-1}) = -\left[\int_{0}^{1} \nabla_{w}^{2} J\!\left(w^{o} - t\,\widetilde{\boldsymbol{w}}_{i-1}\right) dt\right] \widetilde{\boldsymbol{w}}_{i-1} \;\triangleq\; -\boldsymbol{H}_{i-1}\,\widetilde{\boldsymbol{w}}_{i-1} \tag{1.29b}$$

where we are introducing the symmetric and random time-variant matrix Hi−1 to represent the integral expression. Substituting into Eq. (1.29a), we get

$$\widetilde{\boldsymbol{w}}_{i} = \left[I_{M} - \boldsymbol{\mu}(i)\boldsymbol{H}_{i-1}\right] \widetilde{\boldsymbol{w}}_{i-1} + \boldsymbol{\mu}(i)\,\boldsymbol{s}_{i}(\boldsymbol{w}_{i-1}) \tag{1.29c}$$

so that from Assumption 1.2:

$$\mathbb{E}\left[\|\widetilde{\boldsymbol{w}}_{i}\|^{2}\,|\,\boldsymbol{F}_{i-1}\right] \;\leq\; \rho\!\left(I_{M} - 2\bar{\mu}\boldsymbol{H}_{i-1} + \left(\bar{\mu}^{2}+\sigma_{\mu}^{2}\right)\boldsymbol{H}_{i-1}^{2}\right)\|\widetilde{\boldsymbol{w}}_{i-1}\|^{2} + \left(\bar{\mu}^{2}+\sigma_{\mu}^{2}\right)\left(\beta^{2}\|\widetilde{\boldsymbol{w}}_{i-1}\|^{2} + \sigma_{s}^{2}\right) \tag{1.30}$$

It follows from Eq. (1.9) that

$$\rho\!\left(I_{M} - 2\bar{\mu}\boldsymbol{H}_{i-1} + \left(\bar{\mu}^{2}+\sigma_{\mu}^{2}\right)\boldsymbol{H}_{i-1}^{2}\right) \;\leq\; 1 - 2\nu\bar{\mu} + \delta^{2}\left(\bar{\mu}^{2}+\sigma_{\mu}^{2}\right) \tag{1.31a}$$

because ν ≤ δ. In the first line above, the notation ρ(A) denotes the spectral radius of its matrix argument (defined in terms of the largest magnitude eigenvalue of A). From Eq. (1.31a) we obtain

$$\mathbb{E}\left[\|\widetilde{\boldsymbol{w}}_{i}\|^{2}\,|\,\boldsymbol{F}_{i-1}\right] \;\leq\; \alpha\,\|\widetilde{\boldsymbol{w}}_{i-1}\|^{2} + \left(\bar{\mu}^{2}+\sigma_{\mu}^{2}\right)\sigma_{s}^{2} \tag{1.31b}$$

Taking expectations of both sides of Eq. (1.30), we arrive at Eq. (1.28a) from Eqs. (1.20b) and (1.31b) with α given by Eq. (1.28b). The bound in Eq. (1.27) on the moments of the random step-size ensures that 0 ≤ α < 1. To express the results in terms of μx, note from Eq. (1.23) that

$$\bar{\mu}^{2} + \sigma_{\mu}^{2} = \bar{\mu}\,\mu_{x} \tag{1.32}$$

Iterating recursion (1.28a) gives

$$\mathbb{E}\,\|\widetilde{\boldsymbol{w}}_{i}\|^{2} \;\leq\; \alpha^{i+1}\, \mathbb{E}\,\|\widetilde{\boldsymbol{w}}_{-1}\|^{2} \;+\; \frac{\left(\bar{\mu}^{2}+\sigma_{\mu}^{2}\right)\sigma_{s}^{2}}{1-\alpha} \tag{1.33}$$

Because 0 ≤ α < 1, there exists an iteration value Io large enough such that, for all i ≥ Io, the transient term α^{i+1} E∥w̃−1∥² becomes negligible relative to the driving term

$$\frac{\left(\bar{\mu}^{2}+\sigma_{\mu}^{2}\right)\sigma_{s}^{2}}{1-\alpha} = \frac{\mu_{x}\,\sigma_{s}^{2}}{2\nu - \left(\delta^{2}+\beta^{2}\right)\mu_{x}}$$

which is O(μx), for any μx < μo/2. ■
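Lemma 1.1 can be illustrated by running the asynchronous recursion (1.21) for the MSE cost with a Bernoulli on-off step-size: despite the random update times, the iterate still converges and hovers close to wo. All numerical values below are illustrative choices:

```python
import numpy as np

# Simulation sketch of the asynchronous recursion (1.21) for the MSE cost:
# mu(i) follows a Bernoulli on-off model, yet the MSE still converges to a
# small steady-state value, consistent with Lemma 1.1. Values illustrative.
rng = np.random.default_rng(6)
M, n_iter, mu, q = 4, 20000, 0.01, 0.5
wo = rng.standard_normal((M, 1))

w = np.zeros((M, 1))
for i in range(n_iter):
    u = rng.standard_normal((1, M))
    d = (u @ wo).item() + 0.1 * rng.standard_normal()
    mu_i = mu if rng.random() < q else 0.0       # random on-off updates
    w = w + 2 * mu_i * u.T * (d - (u @ w).item())

msd = np.sum((wo - w) ** 2)    # small steady-state deviation from wo
```

Roughly half the iterations perform no update at all, yet the final squared deviation remains small, in line with the O(μx) bound.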

1.2.7 Mean-Square-Error Performance

We conclude from Eq. (1.28c) that the MSE can be made as small as desired by using small step-sizes, μx. In this section we derive a closed-form expression for the asymptotic MSE, which is more frequently called the mean-square-deviation (MSD) and is defined as:

$$\mathrm{MSD} \;\triangleq\; \lim_{i\to\infty}\; \mathbb{E}\,\|\widetilde{\boldsymbol{w}}_{i}\|^{2} \tag{1.34a}$$

Strictly speaking, the limit on the right side of the above expression may not exist. A more accurate definition for the MSD appears in Eq. (4.86) of [2], namely,

$$\mathrm{MSD} \;\triangleq\; \mu_{x}\left(\lim_{\mu_{x}\to 0}\; \limsup_{i\to\infty}\; \frac{1}{\mu_{x}}\, \mathbb{E}\,\|\widetilde{\boldsymbol{w}}_{i}\|^{2}\right) \tag{1.34b}$$

However, it was explained in [2, Sec. 4.5] that derivations that assume the validity of Eq. (1.34a) still lead to the same expression for the MSD to first-order in μx as derivations that rely on the more formal definition (1.34b). We therefore continue with Eq. (1.34a) for simplicity of presentation. We explain below how an expression for the MSD can be obtained by following the energy conservation technique of [2, 23, 24, 45, 46]. For that purpose, we need to introduce two smoothness conditions.

Assumption 1.4

(Smoothness conditions)

In addition to Assumptions 1.1 and 1.2, we assume that the Hessian matrix of the cost function and the noise covariance matrix defined by Eq. (1.17a) are locally Lipschitz continuous in a small neighborhood around w = wo:

$$\left\|\nabla_{w}^{2} J(w^{o} + \delta w) - \nabla_{w}^{2} J(w^{o})\right\| \;\leq\; \tau\,\|\delta w\| \tag{1.35a}$$

$$\left\|R_{s,i}(w^{o} + \delta w) - R_{s,i}(w^{o})\right\| \;\leq\; \tau_{2}\,\|\delta w\|^{\kappa} \tag{1.35b}$$

for small perturbations ∥δw∥ ≤ r and for some τ, τ2 ≥ 0 and exponent 1 ≤ κ ≤ 2. ■

The range of values for κ can be enlarged, e.g., to κ ∈ (0, 4]. The only change in allowing a wider range for κ is in the exponent of the higher-order term, O(μx³/²), that will appear in several performance expressions, as is the case with the expressions in Lemma 1.2 below, without affecting the first-order term that determines the MSD [2, 4, 47]. Therefore, it is sufficient to continue with κ ∈ [1, 2] to illustrate the key concepts; the MSD expressions will still be valid to first order in μx.

Using Eq. (1.9), it can be verified that condition (1.35a) translates into a global Lipschitz property relative to the minimizer wo, i.e., it will also hold that [2, 4]:

$$\left\|\nabla_{w}^{2} J(w) - \nabla_{w}^{2} J(w^{o})\right\| \;\leq\; \tau'\,\|w - w^{o}\| \tag{1.35c}$$

for all w and for some τ′ ≥ 0. For example, both conditions (1.35a) and (1.35b) are readily satisfied by MSE costs of the form encountered in Example 1.1. One of the main conclusions that follow from the smoothness conditions is that the performance of recursion (1.21) can be assessed by examining a simpler long-term model that captures its dynamics after sufficient iterations, i.e., for i ≫ 1. Indeed, let us reconsider recursion (1.29c) and introduce the deviation matrix:

$$\widetilde{\boldsymbol{H}}_{i-1} \;\triangleq\; H - \boldsymbol{H}_{i-1} \tag{1.36a}$$

where the constant (symmetric and positive-definite) matrix H is defined as:

$$H \;\triangleq\; \nabla_{w}^{2} J(w^{o}) \tag{1.36b}$$
Substituting Eq. (1.36a) into Eq. (1.29c) gives

$$\widetilde{\boldsymbol{w}}_{i} = \left[I_{M} - \boldsymbol{\mu}(i)H\right]\widetilde{\boldsymbol{w}}_{i-1} + \boldsymbol{\mu}(i)\,\boldsymbol{s}_{i}(\boldsymbol{w}_{i-1}) + \boldsymbol{\mu}(i)\,\boldsymbol{c}_{i-1} \tag{1.37a}$$

in terms of the perturbation term

$$\boldsymbol{c}_{i-1} \;\triangleq\; \widetilde{\boldsymbol{H}}_{i-1}\,\widetilde{\boldsymbol{w}}_{i-1} \tag{1.37b}$$

Using Eq. (1.35c), which implies ∥H̃i−1∥ ≤ τ′∥w̃i−1∥, we can bound the conditional expectation of the norm of the perturbation term as follows:

$$\mathbb{E}\left[\|\boldsymbol{\mu}(i)\,\boldsymbol{c}_{i-1}\|\,\big|\,\boldsymbol{F}_{i-1}\right] = \bar{\mu}\,\|\boldsymbol{c}_{i-1}\| \;\leq\; \bar{\mu}\,\tau'\,\|\widetilde{\boldsymbol{w}}_{i-1}\|^{2} \tag{1.38a}$$

so that using Eq. (1.28c), we conclude that:

$$\mathbb{E}\,\|\boldsymbol{\mu}(i)\,\boldsymbol{c}_{i-1}\| = O\!\left(\mu_{x}^{2}\right) \tag{1.38b}$$

asymptotically; a similar argument applied to higher-order moments shows that E∥μ(i)ci−1∥^m = O(μx^{2m}), for any constant integer m ≥ 1. Now, calling upon Markov’s inequality [48–50], we conclude from Eq. (1.38b) that for i ≫ 1:

$$\mathrm{Pr}\left[\|\boldsymbol{\mu}(i)\,\boldsymbol{c}_{i-1}\| < r_{c}\right] \;\geq\; 1 - O\!\left(\mu_{x}^{m/2}\right),\qquad r_{c} = O\!\left(\mu_{x}^{3/2}\right) \tag{1.38c}$$

This result shows that the probability of having ∥μ(i) ci−1∥ bounded by rc can be made arbitrarily close to one by selecting a large enough value for m. Once the value for m is fixed, the perturbation term in Eq. (1.37a) can be treated as a higher-order correction with high probability. This analysis, along with recursion (1.37a), motivates us to assess the mean-square performance of the error recursion (1.29c) by considering instead the following long-term model, which holds with high probability after sufficient iterations i ≫ 1:

$$\widetilde{\boldsymbol{w}}_{i} = \left[I_{M} - \boldsymbol{\mu}(i)H\right]\widetilde{\boldsymbol{w}}_{i-1} + \boldsymbol{\mu}(i)\,\boldsymbol{s}_{i}(\boldsymbol{w}_{i-1}),\qquad i \gg 1 \tag{1.39}$$

Working with iteration (1.39) is helpful because its dynamics are driven by the constant matrix H as opposed to the random matrix Hi−1; moreover, the long-term model leads to the same MSD expression, to first order in μx, as the original recursion (1.29c); see [2, 4, 47] for a formal proof of this fact. Therefore, it is sufficient to rely on the long-term model (1.39) to obtain performance expressions that are accurate to first order in μx. Fig. 1.2 provides a block-diagram representation for Eq. (1.39).

Fig. 1.2 Long-term dynamics. A block-diagram representation of the long-term recursion ( 1.39) for single-agent adaptation and learning. Figure reproduced with permission from Sayed AH. Adaptive networks. Proc IEEE 2014;102(4):460–97.

Before explaining how model (1.39) can be used to assess the MSD, we remark that there is a second useful metric for evaluating the performance of stochastic gradient algorithms. This metric relates to the mean excess-cost, which is also called the excess-risk (ER) in the machine learning literature [26, 27] and the excess-mean-square-error (EMSE) in the adaptive filtering literature [21–23]. We denote it by the letters ER and define it as the average fluctuation of the cost function around its minimum value:

$$\mathrm{ER} \;\triangleq\; \lim_{i\to\infty}\; \mathbb{E}\left(J(\boldsymbol{w}_{i-1}) - J(w^{o})\right) \tag{1.40}$$

Using the smoothness condition (1.35a), and the mean-value theorem [30, 44] again, it can be verified that [2, 4, 47]:

$$\mathrm{ER} = \frac{1}{2}\,\lim_{i\to\infty}\; \mathbb{E}\,\|\widetilde{\boldsymbol{w}}_{i-1}\|_{H}^{2} \;+\; O\!\left(\mu_{x}^{3/2}\right)$$

where ∥w̃∥²_H = w̃^T H w̃ denotes a weighted squared norm.
Lemma 1.2 (Mean-square-error performance)

Assume the conditions under Assumptions 1.1–1.4 on the cost function, the gradient noise process, and the random step-size process hold. Assume further that the asynchronous step-size parameter μx is sufficiently small to ensure mean-square stability as required by Eq. (1.27). Then, the MSD and ER metrics for the asynchronous stochastic-gradient algorithm (1.21) are well approximated to first order in μx by the expressions:

$$\mathrm{MSD}^{\mathrm{asyn}} = \frac{\mu_{x}}{2}\,\mathrm{Tr}\!\left(H^{-1}R_{s}\right) \tag{1.41a}$$

$$\mathrm{ER}^{\mathrm{asyn}} = \frac{\mu_{x}}{4}\,\mathrm{Tr}\!\left(R_{s}\right) \tag{1.41b}$$

where Rs and H are defined by Eqs. (1.17b) and (1.36b), and where we are adding the superscript asyn for clarity in order to distinguish these measures from the corresponding measures in the synchronous case (mentioned below in Eqs. (1.43a)–(1.43c)). Moreover, we derived earlier in Eq. (1.28b) the following expression for the convergence rate:

$$\alpha = 1 - 2\nu\bar{\mu} + O\!\left(\bar{\mu}\,\mu_{x}\right) \tag{1.41c}$$

We introduce the eigen-decomposition H = UΛUT, where U is orthonormal and Λ is diagonal with positive entries, and rewrite Eq. (1.39) in terms of transformed quantities:

$$\bar{\boldsymbol{w}}_{i} = \left(I_{M} - \boldsymbol{\mu}(i)\Lambda\right)\bar{\boldsymbol{w}}_{i-1} + \boldsymbol{\mu}(i)\,\bar{\boldsymbol{s}}_{i} \tag{1.42a}$$

where w̄i ≜ U^T w̃i and s̄i ≜ U^T si(wi−1). Let Σ denote an arbitrary M × M diagonal matrix with positive entries that we are free to choose. Then, equating the weighted squared norms of both sides of Eq. (1.42a) and taking expectations gives for i ≫ 1:

$$\mathbb{E}\,\|\bar{\boldsymbol{w}}_{i}\|_{\Sigma}^{2} = \mathbb{E}\,\|\bar{\boldsymbol{w}}_{i-1}\|_{\Sigma'}^{2} \;+\; \left(\bar{\mu}^{2}+\sigma_{\mu}^{2}\right)\mathbb{E}\,\bar{\boldsymbol{s}}_{i}^{T}\,\Sigma\,\bar{\boldsymbol{s}}_{i} \tag{1.42b}$$

in terms of the weighted norm ∥x∥²_Σ ≜ x^TΣx and the modified weighting matrix

$$\Sigma' \;\triangleq\; \Sigma - 2\bar{\mu}\,\Sigma\Lambda + \left(\bar{\mu}^{2}+\sigma_{\mu}^{2}\right)\Lambda\Sigma\Lambda$$

From Eqs. (1.17b), (1.19b), (1.28c), and (1.35b) we obtain:

$$\lim_{i\to\infty}\; \mathbb{E}\,\bar{\boldsymbol{s}}_{i}^{T}\,\Sigma\,\bar{\boldsymbol{s}}_{i} = \mathrm{Tr}\!\left(\Sigma\, U^{T} R_{s}\, U\right) + O\!\left(\mu_{x}^{\kappa/2}\right)$$

Therefore, substituting into Eq. (1.42b) and letting i → ∞, the steady-state value of the weighted MSE can be evaluated for any choice of Σ; suitable choices of Σ then lead to the MSD and ER expressions of the lemma, to first order in μx.


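To close the loop on these performance expressions, the following sketch compares a simulated LMS steady-state MSD against the first-order formula of Lemma 1.2, here taken as MSD ≈ (μx/2)Tr(H⁻¹Rs) with H = 2Ru and Rs = 4σv²Ru for MSE costs (standard LMS quantities, stated as assumptions). The synchronous case is used, so μx = μ; all numerical values are illustrative:

```python
import numpy as np

# Sketch comparing simulated LMS steady-state MSD against the first-order
# expression (mu_x/2)*Tr(H^{-1} R_s), with H = 2*R_u and R_s = 4*sigma_v^2*R_u
# assumed for MSE costs. Synchronous case: mu_x = mu. Values illustrative.
rng = np.random.default_rng(7)
M, mu, sigma_v2, n_iter = 4, 0.005, 0.01, 15000
wo = rng.standard_normal((M, 1))

R_u = np.eye(M)                          # white regressors
H = 2 * R_u
R_s = 4 * sigma_v2 * R_u
mu_x = mu                                # sigma_mu^2 = 0 in the synchronous case
msd_theory = (mu_x / 2) * np.trace(np.linalg.inv(H) @ R_s)

w = np.zeros((M, 1))
msd_acc, count = 0.0, 0
for i in range(n_iter):
    u = rng.standard_normal((1, M))
    d = (u @ wo).item() + np.sqrt(sigma_v2) * rng.standard_normal()
    w = w + 2 * mu * u.T * (d - (u @ w).item())
    if i >= n_iter - 5000:               # time-average after convergence
        msd_acc += np.sum((wo - w) ** 2)
        count += 1
msd_sim = msd_acc / count
```

The time-averaged simulated deviation agrees with the first-order formula to within the simulation's statistical accuracy, which is the kind of match the small-step-size theory predicts.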