
Journal of Network and Computer Applications 35 (2012) 844–856

Contents lists available at SciVerse ScienceDirect

Journal of Network and Computer Applications


journal homepage: www.elsevier.com/locate/jnca

Localized motion-based connectivity restoration algorithms for wireless sensor and actor networks
Muhammad Imran a,*, Mohamed Younis b, Abas Md Said c, Halabi Hasbullah c

a Deanship of e-Transactions & Communication, King Saud University, Riyadh, Saudi Arabia
b Department of Computer Science & Electrical Eng., University of Maryland Baltimore County, USA
c Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, Malaysia

* Corresponding author. E-mail addresses: cimran@ksu.edu.sa, cmimran81@yahoo.com (M. Imran), younis@umbc.edu (M. Younis).

1084-8045/$ - see front matter © 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.jnca.2011.12.002

Article info

Article history:
Received 7 February 2011
Received in revised form 16 October 2011
Accepted 1 December 2011
Available online 14 December 2011

Keywords:
Wireless sensor and actor network
Fault recovery
Connectivity restoration
Node relocation

Abstract

In wireless sensor and actor networks, maintaining inter-actor connectivity is very important in mission-critical applications where actors have to quickly plan an optimal coordinated response to detected events. Failure of one or multiple actors may partition the inter-actor network into disjoint segments and thus hinder network operation. Autonomous detection and rapid recovery procedures are highly desirable in such a case. This paper presents DCR, a novel distributed partitioning detection and connectivity restoration algorithm to tolerate the failure of actors. DCR proactively identifies actors that are critical to network connectivity based on local topological information, and designates appropriate, preferably non-critical, backup nodes. Upon failure detection, the backup actor initiates a recovery process that may involve coordinated relocation of multiple actors. We also present an extended version of DCR, named RAM, to handle one possible case of multi-actor failure. The proposed algorithms strive to avoid procrastination, localize the scope of recovery and minimize the movement overhead. Simulation results validate the performance of the proposed algorithms.
© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

Wireless Sensor and Actor Networks (WSANs) (Akyildiz and Kasimoglu, 2004) are gaining growing interest because of their suitability for mission-critical applications that require autonomous and intelligent interaction with the environment. Examples of these applications include forest fire monitoring, disaster management, search and rescue, security surveillance, battlefield reconnaissance, space exploration, and coast and border protection. WSANs consist of numerous miniaturized stationary sensors and fewer mobile actors. The sensor nodes report an event of interest to one or multiple actors for processing, decision making and appropriate action. The role of an actor is extremely crucial for a timely response to events such as fires, earthquakes and other disasters. It depends on the environment and on the capabilities of the actors, which may vary from one application to another. For example, an actor can extinguish a fire, lift rubble, rescue trapped survivors, deactivate a landmine and carry weapons. A sample WSAN environment is depicted in Fig. 1.

In these critical WSAN applications, actors need to collaborate and coordinate with each other on planning an optimal response and synchronize their operations. For instance, in a forest monitoring application, sensors report the detection of a fire to the actors in the vicinity. Actors such as fire-extinguishing robots and flying aircraft need to be engaged as rapidly as possible in order to control the fire and prevent it from spreading. Therefore, actors should collaboratively identify the most appropriate set of actors to participate in the operation. This requires that actors be able to communicate with each other, so a strongly connected inter-actor topology should be maintained at all times.

The harsh application environment in which WSANs operate makes actors susceptible to physical damage and component malfunction. Failure of one or multiple nodes may partition the inter-actor network into disjoint segments. Consequently, inter-actor interaction may cease and the network becomes incapable of delivering a timely response to a serious event. Therefore, recovery from an actor failure is of utmost importance. Since WSANs operate autonomously in unattended setups, replacing the failed actor may be infeasible or take significant time. Recovery should therefore be a self-healing and agile process that involves reconfiguring the inter-actor topology. The criticality of the applications and the resource-constrained nature of these networks necessitate low restoration time and reduced overhead.

Most of the existing approaches in the literature are purely reactive (Abbasi et al., 2007; Younis et al., 2010; Tamboli and Younis, 2009), with the recovery process initiated once the failure
of "F" is detected. The main idea is to replace the failed node "F" with one of its neighbors, or to move those neighbors inward, to autonomously mend the severed topology in the vicinity of F. Usually, the repositioning of the neighbors of F causes more links to break, and the relocation process repeats in a cascaded manner. Since these reactive schemes require coordination among the healthy nodes, the recovery process often imposes high messaging overhead. In addition, these approaches only deal with single node failure, focus on resource efficiency and do not consider recovery time.

Fig. 1. An example wireless sensor and actor network setup (A: actor node; S: sensor node; links denote actor-actor and sensor-actor communication).

In this paper, we present a novel distributed partitioning Detection and Connectivity Restoration (DCR) algorithm. DCR proactively determines potentially critical actors and assigns backup nodes in order to rapidly repair the topology with little overhead. The design philosophy of DCR is based on "Guardian nomination", inspired by social and legal systems. First, each actor proactively assesses its criticality, i.e., whether it is a cut-vertex in the network topology, in a distributed manner based on local information. Each critical (primary) actor designates an appropriate (preferably non-critical) neighbor as its backup. The backup actor continuously monitors its primary for possible failure. Once the failure is detected, the backup initiates a recovery process by replacing the primary so that connectivity is restored. The algorithm is executed recursively until all actors become strongly connected.

DCR assumes a single critical actor failure at a time and that no other node fails during the recovery process. Concurrent failure of multiple actors is exceptional, but it may be precipitated by harsh environments and disastrous events such as explosions on a battlefield. Recovery from such failures is very challenging and requires careful consideration, especially when the failed actors are neighbors. We extend DCR to address one scenario of multi-node failure, in which no more than two of the failed actors are adjacent. We present a recovery algorithm for multiple node failures (RAM) in order to handle concurrent failures of multiple actors. RAM identifies critical actors and designates distinct backups for them. The designated backups detect the failure of adjacent actors and simultaneously execute the recovery process. As in DCR, the recovery procedure is applied recursively until connectivity is restored. The overall purpose is to avoid procrastination, engage actors locally to monitor each other, and reduce recovery time and overhead. To the best of our knowledge, RAM is the first localized approach that strives to minimize the recovery overhead while recovering from simultaneous failure of multiple actors. Simulation results validate the performance of the proposed approaches in terms of total distance moved, nodes involved in recovery, messaging and coverage reduction. It is worth noting that our algorithm is equally applicable to mobile sensor networks (MSNs) (Dantu et al., 2005) and mobile robot networks (MRNs) (Das et al., 2007).

This paper is organized as follows. Section 2 discusses the system model and problem statement. The related work is discussed in Section 3. The proposed DCR and RAM algorithms are detailed in Section 4. Section 5 presents an analysis of the proposed recovery algorithms. The performance of DCR and RAM is evaluated through simulation and presented in Section 6. Section 7 concludes the paper.

2. System model and problem statement

Our algorithm is applicable to WSANs that involve sensors and actors. Sensors detect and report events of interest to one or multiple actors. Actors receive reports from sensors, process them and collaborate with each other to plan an optimal coordinated response. Sensors are deployed in abundance, while actors are significantly fewer than sensors. Sensors are inexpensive and have scarce resources compared to actors in terms of energy, communication and computation (processing and memory). The communication range (rc) of an actor refers to the maximum Euclidean distance that its radio can reach. An actor may have two radios, one for sensor-actor and one for actor-actor communication. To simplify analysis, nodes are assumed to have the same communication range. Both sensors and actors are deployed randomly in an area of interest.

After deployment, actors are assumed to discover each other and form a connected inter-actor network using existing techniques such as Akkaya and Younis (2006). Inter-actor connectivity is the primary concern of this paper. It is assumed that sensors are stationary and can send their data to actors over multi-hop routes. An actor is assumed to be able to move on demand, and such relocation does not affect sensor-actor connectivity. The action range of an actor refers to the maximum area that an actor can cover (Batalin and Sukhatme, 2005) and is assumed to be equal for all actors. We assume that an actor can determine its location using an onboard GPS receiver, or its position relative to its neighbors using localization techniques such as Bulusu et al. (2000) and Youssef et al. (2005). Each actor maintains a list of direct (1-hop) neighbors and exchanges heartbeat messages with them to update its status. DCR and RAM are suited for applications in which line-of-sight links are available between actors that fall in the communication range of each other. Example applications include reconnaissance missions in deserts and coastal areas and surveillance using small unattended airborne vehicles.

The impact of an actor's failure depends on the position of that actor in the network topology. A node is said to be critical (a cut-vertex in graph theory terminology) if its removal partitions the network into disjoint segments (Milenko Jorgić et al., 2004). The failure of one or multiple critical actors not only affects actor coverage but also significantly impacts inter-actor connectivity. For example, consider the network topology depicted in Fig. 2. Losing a leaf/non-critical node, such as G, does not affect inter-actor connectivity. Meanwhile, the failure of a critical node such as F partitions the network into disjoint blocks. This paper focuses on restoring inter-actor connectivity lost due to the failure of one or multiple adjacent critical actors.

Fig. 2. An example of connected inter-actor network topology.
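When the whole topology is known, the definition of a critical node given above can be checked directly. The following Python sketch is a centralized, brute-force reference check. It is not the localized 1-hop procedure that DCR and RAM use. It assumes a unit-disk link model, in which two actors are linked when their Euclidean distance is at most rc, consistent with the equal communication range assumed above. It then removes each actor in turn and tests whether the remaining actors stay connected. The coordinates (which are not the layout of Fig. 2) and the helper names `build_links`, `is_connected` and `is_critical` are hypothetical.

```python
from collections import deque
from math import dist


def build_links(positions, rc):
    """Actor graph under an assumed unit-disk model: two actors are linked
    when their Euclidean distance is at most the communication range rc."""
    ids = list(positions)
    adj = {u: set() for u in ids}
    for i, u in enumerate(ids):
        for v in ids[i + 1:]:
            if dist(positions[u], positions[v]) <= rc:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def is_connected(adj, removed=frozenset()):
    """Breadth-first search restricted to the actors that are not in `removed`."""
    alive = [u for u in adj if u not in removed]
    if not alive:
        return True
    seen = {alive[0]}
    queue = deque([alive[0]])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in removed and v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(alive)


def is_critical(adj, node):
    """Critical (cut-vertex): removing `node` partitions the remaining actors."""
    return not is_connected(adj, removed={node})


# Hypothetical topology: leaf G hangs off F, and B joins F to C and D.
positions = {"G": (0.0, 0.0), "F": (1.0, 0.0), "B": (2.0, 0.0),
             "C": (2.0, 1.0), "D": (3.0, 0.0)}
adj = build_links(positions, rc=1.0)
critical = sorted(n for n in adj if is_critical(adj, n))
print(critical)  # ['B', 'F']
```

As in the Fig. 2 discussion, the leaf G is non-critical, while F and B each split the network when removed. The brute-force check runs one BFS per actor, costing O(n(n + m)) for n actors and m links. A single depth-first search (Tarjan's articulation-point algorithm) finds all cut-vertices in O(n + m), but both approaches need global topology, which is exactly what a localized scheme avoids.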

Three methodologies can be identified for tolerating critical node failure: (i) proactive, (ii) reactive and (iii) hybrid. Proactive approaches establish and maintain a bi-connected topology in order to provide fault tolerance. This necessitates a large actor count, which leads to higher cost and can become impractical. On the other hand, in reactive approaches the network responds only when a failure occurs. Therefore, reactive approaches might not be suitable for time-critical applications. In hybrid approaches, pre-failure planning is pursued in order to increase the efficiency of the recovery. We argue that a hybrid approach better suits autonomous WSANs that are deployed for time-critical applications because of the reduced recovery time and overhead. DCR uses a localized scheme to identify critical actors and designate backups for them. The backup actor detects the failure of the primary and pursues node relocation to repair the partitioned network topology. DCR considers one failure at a time and assumes that no other node fails during the recovery. We extend DCR to handle one possible scenario of multiple simultaneous actor failures, as discussed later in Section 4.3.

3. Related work

Only a few studies have addressed fault tolerance in the various WSAN contexts. For instance, the fault-tolerant model presented in Ozaki et al. (2006) designates multiple actors to each sensor and multiple sensors to each actor in order to guarantee event notification even in case of failure or inaccessibility. Our fault-tolerant model, however, targets the maintenance of inter-actor connectivity rather than reliable delivery of event notifications. On the other hand, some research has also exploited node mobility as a means for performance optimization, both in sensor networks and in WSANs. For example, the movement of the base-station is employed in Akkaya et al. (2005) to increase sensor lifetime and throughput while minimizing latency. However, exploiting node mobility to mend severed topologies has only recently started to attract attention. The reader is referred to Younis and Akkaya (2008) for a comprehensive survey on node relocation strategies.

The existing work on using node mobility to recover from a failure can be categorized into block and cascaded movement. Block movement often requires high pre-failure connectivity in order for the nodes to coordinate their response. An example of a block-movement-based approach is reported in Basu and Redi (2004), where the initial network is assumed to be 2-connected and the goal is to sustain such 2-connectivity even under link or node failure. The idea of moving robots is similar to ours, but their approach is centralized in nature and does not fit autonomous WSANs. Das et al. studied a similar problem and presented a distributed approach to restore 2-connectivity in Orozco-Barbosa et al. (2007). Unlike Basu and Redi (2004) and Orozco-Barbosa et al. (2007), our algorithm focuses on providing 1-connectivity.

Block movement often becomes infeasible in the absence of a higher level of connectivity. Therefore, a few researchers have pursued cascaded node movement (Guiling et al., 2005) or shifted relocation (Li et al., 2007). The idea is to gradually replace intermediate nodes on the path instead of moving a node over a long distance. Although the idea of cascaded movement is similar to DCR and RAM, the prime objective of Guiling et al. (2005) is to mitigate coverage holes introduced by the failure of sensors. Our objective is to restore inter-actor connectivity.

Strategies adopting cascaded relocations can be further categorized based on the network state information that nodes are assumed to maintain. Some approaches, like DARA (Abbasi et al., 2007) and PADRA (Akkaya et al., 2008), require each actor to maintain its 2-hop neighbors. Others, such as RIM (Younis et al., 2010), C3R (Tamboli and Younis, 2009) and VCR (Imran et al., 2010), avoid the increased overhead of tracking 2-hop neighbors and require each node to maintain only its directly reachable nodes, i.e., its 1-hop neighbors. Like our proposed DCR algorithm, DARA (Abbasi et al., 2007) strives to restore connectivity lost due to the failure of a cut-vertex. However, DARA requires more network state in order to ensure convergence. Meanwhile, in PADRA (Akkaya et al., 2008), Akkaya et al. identify a connected dominating set (CDS) of the whole network in order to detect cut-vertices. Since the CDS-based method is not accurate for critical node detection, they perform a depth-first search (DFS) on each member of the CDS to confirm whether the node is really a cut-vertex. Although they use a distributed algorithm, their solution still requires 2-hop neighbor information, which increases messaging overhead. Another work, proposed in Azadeh (2009), also uses 2-hop information to detect cut-vertices. The proposed DCR algorithm relies only on 1-hop information and reduces the communication overhead.

Although RIM (Younis et al., 2010), C3R (Tamboli and Younis, 2009) and VCR (Imran et al., 2010) use 1-hop neighbor information to restore connectivity, they are purely reactive and do not differentiate between critical and non-critical nodes. In contrast, DCR is a hybrid algorithm that proactively identifies critical nodes and designates appropriate backups for them. ACR (Imran et al., 2011) is a recently proposed hybrid algorithm that maintains 1-hop information and factors in application-level interests while restoring connectivity. However, ACR cannot handle simultaneous failure of multiple actors.

The very first work to handle multiple simultaneous node failures in the context of sensor networks was recently proposed in Lee and Younis (2010). The authors deploy additional relay nodes (RNs) to restore the overall connectivity using the least RN count. Unlike Akkaya et al. (2010), our work relies on reconfiguring the existing topology instead of employing additional nodes. Akkaya et al. extended their work (Akkaya et al., 2008) by introducing a mutual exclusion mechanism called MPADRA (Akkaya et al., 2010) in order to handle multiple simultaneous failures in a localized manner. Our proposed approach differs from MPADRA in multiple aspects. First, MPADRA requires a mutual exclusion mechanism to avoid race conditions. Second, MPADRA reserves the nodes on the path in advance, before the actual relocation. In contrast, RAM designates distinct backups and does not engage relocating nodes beforehand. Third, MPADRA maintains 2-hop network state information and requires primary and secondary failure handlers for each dominator. Our approach, by comparison, only requires 1-hop information, and each critical node has only one backup to handle its failure. A variant of DCR was presented in Imran et al. (2010). This paper improves the backup selection criteria of Imran et al. (2010), provides a detailed analysis and introduces a new mechanism to handle concurrent failure of multiple adjacent nodes.

4. Partition detection and connectivity restoration

As mentioned earlier, hybrid algorithms better suit time-sensitive applications that require rapid recovery. The proposed DCR algorithm is hybrid in the sense that it consists of two parts: proactive and reactive. In the proactive part, critical actors are determined using a localized algorithm. Once the critical nodes (primaries) are determined, each selects and designates an appropriate neighbor (backup) to handle its failure should such a contingency arise. Each backup starts monitoring its primary through HEARTBEAT messages. In the reactive part, a backup initiates a recovery process when the primary fails. The backup replaces the primary and cascaded relocations are performed
until the recovery is complete. The detailed algorithm is described in the balance of this section.

4.1. Identifying critical actors

As described earlier, the failure of a critical actor divides the inter-actor network into disjoint segments. DCR and RAM opt to identify a backup for each of these critical actors. Several algorithms have been proposed in the literature to identify cut-vertices in a graph, i.e., critical nodes in the context of WSANs. These algorithms can be categorized into centralized and distributed ones. Centralized algorithms (Duque-Anton et al., 2000; Goyal and Caffery, 2002) require each node to be aware of the global topology. Because these networks are dynamic, such methods involve huge communication overhead. Frequent changes in the WSAN topology therefore favor distributed and highly localized algorithms. Distributed detection algorithms (Akkaya et al., 2008; Azadeh, 2009) are based on a CDS and require 2-hop neighbor information. Some localized algorithms (Milenko Jorgić et al., 2004) require only the positional information of 1-hop neighbors, at the expense of lower accuracy in identifying cut-vertices. That is, some nodes may be marked as critical even though they are not cut-vertices. However, no critical node will be missed. Given that DCR and RAM assign backups that are preferably non-critical, this category of approaches fits well, and the reduced accuracy is not a major concern, as discussed later in this section. Therefore, DCR and RAM employ a simple localized cut-vertex detection procedure that only requires 1-hop positional information to detect critical nodes. The procedure is based on Milenko Jorgić et al. (2004) and runs on each node in a distributed manner to determine locally whether that node is critical.

Each actor determines locally whether it is critical based on the position information of its neighbors. It calculates the distances between its neighbors from their positions, treating two neighbors as linked when their distance is less than the communication range. If the neighbors remain connected to one another through such links, the actor is considered non-critical, because its neighbors would stay connected without it. On the other hand, if the 1-hop neighbors of an actor can be partitioned into more than one segment, the actor is 1-hop positional critical. For instance, Fig. 3 shows the localized scope of non-critical node A and critical node F. Nodes B, C, D and E are the 1-hop neighbors of node A, as shown in Fig. 3(a). Node A is 1-hop positional non-critical because its neighbors remain connected without A. On the other hand, the neighbors of F can be divided into two subgraphs, i.e., {B, C} and {G, H, I}. Therefore, F is 1-hop positional critical, as illustrated in Fig. 3(b). Furthermore, leaf nodes such as I are detected as non-critical, since their failure does not affect inter-actor connectivity. Again, a 1-hop positional critical node is not always a cut-vertex, whereas every cut-vertex is 1-hop positional critical. DCR and RAM nevertheless pursue such approximate state determination in order to cut messaging overhead. Basing the criticality assessment on the topological and positional information of 1-hop neighbors makes DCR and RAM suited for applications in which line-of-sight links are available between actors that fall in the communication range of each other, as pointed out in Section 2.

Fig. 3. A segment of an inter-actor network showing 1-hop positional: (a) critical and (b) non-critical actors.

4.2. Recovery from single critical actor failure

This subsection details the DCR algorithm, which is designed to recover from the failure of a critical actor. The details of the algorithm are given in the following subsections.

4.2.1. Backup selection and primary monitoring

Once the critical actors (primaries) are identified, the next step is to select and designate appropriate neighbors as backups. The purpose of pre-nominating backup nodes is to react instantaneously to the failure of critical actors and to avoid the network partitioning that such a failure may cause.

Selection of backup: Actors maintain minimal state information (i.e., their 1-hop neighbors) to avoid extra messaging overhead. Since neighbors become disconnected when a critical actor fails, backup actors are determined and notified before a failure of critical nodes takes place. In DCR, a node can serve as backup for multiple actors; as we discuss later, RAM constrains this. The selection of a backup among 1-hop neighbors is based on the following ordered criteria:

(a) Travel feasibility: An actor "A" that cannot relocate to the position of "F" due to physical constraints, e.g., the presence of a creek or canyon, cannot serve as a backup of "F".
(b) Neighbor actor status (NAS): As discussed above, each actor determines whether it is critical or non-critical. A non-critical neighbor actor is preferred as backup. This limits the scope of the recovery, reduces the incurred overhead and minimizes the impact on coverage.
(c) Actor degree (AD): A non-critical neighboring actor (preferably a leaf) is a more suitable candidate for backup, since moving that node will have minimal impact on inter-actor connectivity. If no non-critical node is available in the neighborhood, DCR prefers a strongly connected critical node (with high degree), because such a node is more likely to have non-critical nodes in its neighborhood. This limits the scope of cascaded relocation and thus lowers the recovery overhead.
(d) Inter-actor distance (ID): A close backup actor is preferred in order to reduce the movement overhead and shorten the recovery time. Again, it has to be feasible for the backup to travel to the position of the primary per the first criterion above.

Fig. 4. Critical actors designate their backup using DCR for the network segment shown in Fig. 2: (a) backups start monitoring their primary and (b) B detects failure of primary F.

Each critical actor announces the ID of its designated backup in one of the HEARTBEAT messages that are sent to 1-hop neighbors. A node may be selected as a backup for more than one actor. If a backup actor fails or moves outside the range of its primary, the primary detects this through missing successive heartbeats and selects another backup using the same procedure specified above. Since the set of candidate backups is limited to the 1-hop neighbors, the picked backup may not be globally
optimal. Nonetheless, the local selection enables DCR to be applied in a distributed manner and to scale to large networks. Figure 4(a) shows the setup where critical actors appoint their backups; arrowheads point towards the primary. Note that DCR does not require extra actors to serve as backups; it employs existing actors to take care of each other.

Failure detection: Once an actor receives a BACKUP notification, it starts monitoring the primary through HEARTBEAT messages. The corresponding backup detects the failure of the primary through successive missed HEARTBEATs. Figure 4(b) indicates that the backup node B detects the failure of primary F and triggers the recovery process, as detailed in the following subsection.

4.2.2. Recovery process

The backup initiates the reactive recovery process upon detecting the failure of its primary. The scope of the recovery depends on the NAS. If the backup is a non-critical actor, it simply replaces the primary and the recovery is complete. However, if the backup is also a critical node, cascaded relocation is performed. Basically, when actor Ai repositions in response to the failure of Af, its backup Aj interprets this as if Ai were lost, and Aj thus moves to replace Ai. The recovery process consists of the following steps:

(a) Primary recovery: The backup actor immediately initiates a recovery process once it detects the failure of its primary. The scope of recovery depends on the position of the backup actor, which corresponds to one of the following three scenarios. First, if the backup is a non-critical node, the scope of the recovery will be limited because no further relocations are required. The backup actor moves to the position of the failed primary and exchanges heartbeat messages with its new neighbors. It selects and designates a new backup, since it has become a critical node at the new position. This movement alerts the other primary nodes (if any) at its previous location to choose a new backup for themselves. An illustrative example is provided in Fig. 5, where non-critical backup B simply replaces its primary (i.e., F) and selects a backup for itself.
The second scenario is when the backup is also a critical node. In this case, the backup actor will notify its own backup so that the network stays connected. This scenario may trigger a series of cascaded node repositionings, as explained below.
The third scenario is when the failed (primary) actor and its backup are both critical nodes and simultaneously serve as backup for each other. This scenario is illustrated in Fig. 6. As shown in Fig. 6(a), actor B detects the failure of F because the two mutually serve as each other's backup. Figure 6(b) shows that actor B selects another actor "A" as backup. Then B sends a movement notification message and moves to the position of F, as shown in Fig. 6(c). This movement triggers a series of cascaded relocations, as discussed below and shown in Fig. 6(d), with A replacing B and C replacing A.

Fig. 5. Recovery process when backup actor is non-critical.
Fig. 6. Applying the recovery process when two actors are simultaneously primary-backup of each other.
Fig. 7. Illustrating the recovery process when backup actor is critical: (a) the critical node D detects failure of primary B and (b) D replaces B and K replaces D.

(b) Cascaded relocation: As mentioned earlier, the position of the backup determines the scope of the recovery. In particular, the recovery process of the second scenario is repeated to handle the departure of a backup node. When the critical backup actor B moves to the location of the failed actor, it waits to receive heartbeat messages from its own backup BB. Once node B receives a heartbeat message from BB, it selects and designates a new backup based on the new neighborhood that it has joined. BB may apply the same process in turn, and so on, until a non-critical backup replaces a primary. Figure 7(a) illustrates this scenario, where the backup actor is also critical and the recovery process continues in a cascaded manner: the failed actor B is replaced by another critical actor, its backup D. Figure 7(b) shows that, since moving critical actor D further partitions the network, a cascaded relocation is triggered. The non-critical backup actor K replaces critical primary actor D and connectivity is restored. Upon conclusion of the recovery, the backup designation will be updated to get the network ready for
recovery in case a node fails in the future. The pseudo code of the DCR algorithm is presented in Appendix A.

4.3. Recovery from failure of multiple actors

As mentioned earlier, DCR assumes a single critical actor failure at a time and that no other node fails during the recovery process. In other words, DCR handles sequential actor failures. Although the probability of multiple node failures is small, a hazardous application environment or exhaustion of onboard energy may cause the failure of more than one adjacent actor.

In general, recovery from the simultaneous failure of multiple nodes is very challenging. DCR and other recovery schemes designed for a single node failure are not guaranteed to converge. In DCR, two non-critical backup nodes may move and cause a network partition in other parts of the network. Consider, for example, the topology of Fig. 8. When nodes Pi and Pj fail, moving their backup nodes Bi and Bj will partition the network, although neither of these backups is a critical node.

Fig. 8. Illustrating the challenges in handling multiple simultaneous failures, where moving two non-critical backups partitions the network.

We handle only a special class of multi-node failure, which we believe is more likely to occur. In scenarios in which the failure is caused by external factors such as explosions, multiple

• A critical node can be chosen as a backup only if it has not already been appointed by some other node. Moreover, two adjacent critical actors cannot serve as each other's backup simultaneously. This ensures that some backup node will be available for recovery if adjacent actors fail at the same time.
• If a critical actor "A" picks a non-critical neighbor "B" as a backup, RAM requires "B" to also pick a backup "C" among its neighbors using the same criteria mentioned above. However, the status of node "B" is not changed to critical. This condition enables recovery when the primary and backup both fail at the same time. In addition, it prevents the scenario of Fig. 8.

4.3.2. Failure detection and recovery

As in DCR, actors periodically exchange heartbeat messages with neighbors in order to update their status. The backup actor detects the failure of its primary through missing successive heartbeats. Once the failure is confirmed, the backup node(s) initiate a recovery process that depends on the NCA. Since our backup selection criteria strive to ensure that critical actors have distinct backups, the recovery procedure is executed concurrently on the various backups. If the backups are non-critical nodes, they simply replace the corresponding primaries and the recovery is complete. For example, assume that the failure of adjacent primary actors Pi and Pj is detected by their designated backups Bi and Bj, respectively. Both Bi and Bj will execute recovery in parallel. Figure 9(a) demonstrates that replacing Pi and Pj with the non-critical backups Bi and Bj restores the connectivity lost due to their failure, without cascaded relocations. On the other hand, if the backup is a critical actor, moving that node will further trigger cascaded relocations until a non-critical node is engaged. For
nodes may get damaged. For this scenario, we propose a novel instance, Fig. 9(b) demonstrates the scenario where a critical
Recovery Algorithm for Multiple node failures (RAM). Like DCR, backup Bi sends a movement notification message to its own
RAM is also a hybrid approach but the criteria for backup backup and moves to the place of failed primary Pi. Moving
selection and recovery are different. RAM identifies critical actors critical node Bi initiates a shifted relocation (Li and Stojmenovic,
using the same procedure explained earlier in Section 4.1. Once 2007) where each backup replaces its corresponding primary.
the critical actors are identified, they choose appropriate backup Whereas, a non-critical backup Bj simply moves to the location of
nodes that will handle their failure. The details of the backup the primary Pj and the recovery is complete. Figure 9(c) shows the
selection and recovery procedure are described in the balance of recovery process when both Bi and Bj are critical nodes. They will
this subsection. send a message to their backups and replace Pi and Pj, respec-
tively. The pseudo-code of the RAM algorithm is detailed in
Appendix B.
4.3.1. Backup selection
Failure of adjacent actors: The presented recovery process of
Once the critical actors (primary) are identified, they appoint
RAM will successfully restore the connectivity except for the
appropriate backups to handle their failure. Like DCR, RAM also
following case for which the topology may not get repaired.
maintains minimum network state information, namely for 1-hop
A critical actor Ai may choose an adjacent critical node Aj as
neighbors, in order to cut on the messaging overhead. However,
backup, while Aj designates another node Ak as a backup and Ak
RAM imposes additional constraints while choosing a backup in
happens to be a neighbor of Ai. RAM can partially recover from
order to ensure convergence and avoid the creation of another
network partitioning. In addition to the criteria applied by DCR,
the selection of a backup among 1-hop neighbors considers the
Neighbor Criticality and Availability (NCA). Basically each actor A
maintains a state whether it is already engaged as a backup by
Bi Bj
some node and its position in the network topology, i.e., critical/
Bi Bj
non-critical. This state information is shared with neighbors
through periodic heartbeat messages. The following are applied
when selecting a backup:
Pi Pj Pi Pj
 When a critical actor chooses a backup, it prefers a non-critical
node that is not serving another primary. In other words, a
non-critical node cannot have more than one primary as long
as another free non-critical node is available in the neighbor- Bi Pi Pj Bj
hood. A critical node is restrained from choosing a non-critical
node as backup that is already designated for another actor. Fig. 9. Recovery process when there is no risk in repartitioning the network and
This is to ensure recovery in case two adjacent actors fail when the backups are (a) both non-critical and (b) one critical and one non-critical
simultaneously. (c) both critical.
the failure of such adjacent critical actors, since one of the designated backups has also failed and none of the other nodes is responsible for the recovery. An example is depicted in Fig. 10(a). Actor A2 was a backup of A1, and A13 was a backup of A2. If both A1 and A2 fail, although the backup A13 detects the failure of A2 and executes the recovery procedure described earlier, none of the surviving nodes is responsible for tolerating the failure of A1. Figure 10(b) clearly indicates that RAM cannot restore connectivity, although the failure of A1 was detected by its neighbors. The obvious reason is that, for a particular critical actor, RAM designates a backup that is only responsible for replacing that critical actor when it fails.

To handle this case, we introduce a variant of RAM's recovery procedure that imposes slightly more recovery overhead. The idea is to let the backup know about the grand primary as well (i.e., the primary of its primary). In case of failure of adjacent critical actors, the designated backup coordinates the recovery. For instance, in Fig. 10(a), A2 makes A13 aware that it is a backup for A1. If A1 and A2 fail, A13 will replace A2 and find that A1 is also lost, as shown in Fig. 10(b). A13 then appoints a new backup, i.e., A9, sends a notification message and moves to replace A1. The newly appointed backup follows the primary and cascaded relocations are performed as shown in Fig. 10(c). This special case can be generalized to a ring of critical nodes in which A2 serves as a backup to A1, A3 is a backup of A2, …, An is a backup of An−1, and An is a neighbor of A1. Since nodes A1, A2, …, An−1 are critical, if they fail, An needs to replace An−1. RAM will recursively send the primary-backup relationship of a series of reachable critical nodes on the ring. This can simply be achieved by making a primary C inform its backup node B about whether C also serves as a backup for another primary A. If B has a link to A, B will apply this procedure. Otherwise, if B is a critical node, A will keep on informing its backup about B and C, and so on.

Fig. 10. Special case of failure detection and recovery: (a) A13 detects the failure of A1 and A2; (b) A13 replaces A2 and appoints A9 as backup; and (c) A13 moves to the place of A1, and A9 and A7 follow it.

Figure 11(a) illustrates a slightly different scenario. A3 detects the failure of its non-critical primary A4 and finds that A5 (its grand primary) has also failed. A3 ignores the failure of A4 (since it is non-critical) and moves to the position of A5, as shown in Fig. 11(b).

Fig. 11. Special case of failure detection and recovery: (a) A3 detects the failure of critical actor A5 and non-critical A4 and (b) A3 directly replaces A5 and ignores A4.

5. Algorithms analysis

In this section, we analyze the performance of the DCR and RAM algorithms. We show that both algorithms converge and successfully restore the connectivity of the network lost due to the failure of critical actor(s).

5.1. DCR analysis

This subsection analyzes the performance of the DCR algorithm. We introduce the following theorems:

Theorem 1. DCR converges to form a connected topology, irrespective of the number of network segments created due to the failure of a critical actor.

Proof. Since actors have symmetric communication links and the backup node will reposition at the location of the primary, all links in the vicinity of the primary will be restored to the pre-failure (pre-departure in case of cascaded relocation) level. Moreover, the cascaded relocations will stop when a non-critical node replaces a critical primary. Since each backup will move only once, DCR is guaranteed to terminate. The worst case is when a critical node fails in the center of the network and a non-critical backup is only available at the network periphery, i.e., a leaf node that has a node degree of 1.

Theorem 2. DCR imposes a maximum travel distance overhead of r on each backup actor, where r is the transmission range of an actor.

Proof. As mentioned earlier, backup actors are selected among the neighbors of a critical actor. Since we assume a free space propagation model, the maximum distance between the primary and its backup equals an actor's radio range, i.e., r. Thus, the maximum distance a backup actor is required to travel to substitute for the failed primary is r. Similarly, if the backup is also a critical node, it will be replaced by moving its backup a maximum of r. DCR moves each backup only once; therefore, the maximum movement distance for each of the involved backup nodes is r.

Theorem 3. DCR does not introduce new critical actors as a result of the recovery process.

Proof. We prove this theorem by showing that DCR maintains existing links between nodes during the recovery. As discussed in Section 4.2, the recovery process consists of two steps. First, replacing a failed actor with a backup will re-establish all broken links with its neighbors. Second, if successive cascaded relocations are required, all critical nodes will be replaced by their backups. As shown in Theorem 1 above, DCR terminates when a non-critical node moves. Therefore, DCR guarantees not to introduce new critical actors as a consequence of recovery.

Theorem 4. The time it takes the DCR algorithm to converge while restoring inter-actor connectivity is proportional to N and r, where r is the communication range of actors and N is the number of actors.
Proof. Since DCR proactively (before failure) designates a backup for each critical actor, the maximum time it takes a backup to substitute for a failed actor is proportional to r, as proved in Theorem 2. If moving a critical backup triggers c further relocations, the total recovery time will be proportional to $(c + 1) \times r$ because the relocations are sequential. Thus, in the worst case, the DCR convergence time to restore connectivity is $(c + 1) \times r$, which does not exceed $N \times r$.

Theorem 5. The total message complexity of DCR is $O(N)$, where N is the number of actors in the WSAN.

Proof. Since DCR only maintains 1-hop neighbor information to rejuvenate inter-actor connectivity, this requires 1 message for each actor. Moreover, every critical node participating in the recovery has to send 1 movement notification message to its backup. DCR does not count the message exchange with neighbors at the new location and considers it part of the regular status updates for maintaining the 1-hop neighbor list. Thus, in the worst case, when all the $(c - 1)$ critical nodes move, the total number of messages will be $N + c - 1$. Therefore, DCR incurs a total message complexity of $O(N)$.

5.2. RAM analysis

In this subsection, we analyze the performance of RAM. We introduce the following theorems:

Theorem 6. RAM successfully rejuvenates the connectivity broken due to the failure of adjacent actors.

Proof. In essence, RAM carefully picks the backups so that DCR can be used without repartitioning the network, and thus the convergence of RAM is implicitly guaranteed by Theorem 1 if the backup selection process is shown to reduce the recovery from the failure of m actors to m or fewer independent invocations of DCR. RAM strives to designate non-critical (either intermediate or leaf) nodes as backups. RAM assigns distinct backups for adjacent critical actors and prevents two actors from mutually serving as backups for each other. Therefore, DCR can be applied to each failed actor independently.
In addition, RAM requires a node to inform its backup about its own primary, if any. This allows DCR to select the right position to move to in case a critical actor Ai chooses an adjacent critical node Aj as backup, while Aj designates another node Ak as a backup and Ak happens to be a neighbor of Ai. In this case, Ak will replace Ai when both Ai and Aj fail. Doing so for a sequence of primary-backup critical nodes can be viewed as a multi-application of DCR, enabled by the larger portion of network state that the primaries and backups share.

Theorem 7. The time it takes RAM to converge while re-establishing inter-actor connectivity is proportional to $N^2$ and r, where r is the radio range of actors and N is the number of actors.

Proof. As stated in Theorem 6, RAM appoints distinct backups for adjacent actors and then applies the DCR recovery procedure for each failed node independently. In DCR, the maximum time it takes a backup to replace the failed primary is proportional to r, as proved in Theorem 2. If the number of failed adjacent primary actors is f and moving a critical backup triggers c further relocations for each failed primary, the total recovery time will be proportional to $(f \times c + f) \times r$. The relocations are sequential for each failed node but parallel across all failed primaries. Thus, in the worst case, the RAM convergence time to restore connectivity is $(f \times c + f) \times r$, which does not exceed $N^2 \times r$.

Theorem 8. The recovery process of RAM does not introduce additional cut-vertices in the repaired network topology.

Proof. RAM strives to restore connectivity through multiple and independent invocations of DCR. Furthermore, when a series of critical nodes are engaged in a primary-backup relationship as part of a ring, RAM handles this as an optimized implementation of multiple applications of DCR and thus would not introduce a new cut-vertex in the topology. Therefore, based on Theorem 3, it can be concluded that RAM does not introduce new critical actors during the recovery process.

Theorem 9. The recovery process of RAM incurs a messaging overhead of $O(N^2)$, where N is the number of actors in a WSAN.

Proof. Like DCR, RAM maintains 1-hop neighbor information to restore inter-actor connectivity; this requires 1 message for each actor. In addition, if a node B is picked as a backup by node A, B will need to inform its own backup C about A. In the worst case, when the topology is a ring, this will involve N more messages per node, i.e., a total of $N^2$. Furthermore, every critical actor involved in the recovery has to send 1 movement notification message to its corresponding backup. If the number of adjacent primary actors that fail is f, RAM moves each critical node only once. Thus, in the worst case, when all $(C - f)$ critical nodes move, the total number of messages will be $N^2 + N + C - f$. Therefore, the messaging complexity of RAM is $O(N^2)$.

6. Results and analysis

The performance of DCR and RAM is validated through simulation. This section describes the simulation environment, performance metrics and results.

6.1. Simulation setup and performance metrics

In the simulation experiments, we have created inter-actor topologies that consist of a varying number of nodes (20–100). Nodes are randomly placed in an area of 1000 m × 600 m with no obstacles that hinder a node from moving to a new position. We have varied the transmission range of actors between 50 and 200 m so that the topology becomes strongly connected. The performance is assessed using the following metrics:

• The total distance moved by all nodes involved in the recovery: this gauges the efficiency of DCR and RAM in terms of energy and the overhead involved in the recovery.
• The number of nodes moved during the recovery: this metric reflects the scope of the recovery process.
• The number of messages exchanged among nodes: again, this metric indicates the energy dissipation and recovery overhead.
• The percentage of coverage reduction relative to the pre-failure level: although connectivity is the main objective of DCR and RAM, node coverage is important for many setups. The loss of a node usually has a negative impact on coverage. This metric assesses the effectiveness of the proposed approaches in mitigating the coverage loss.
• Average node degree: this measures the level of inter-actor connectivity and the availability of alternative paths after the recovery is complete.

The following parameters were used to vary the WSAN configuration in the experiments:

• The number of deployed nodes (N) in the network affects the node density and the inter-actor connectivity.
• The node communication range (r) influences the network connectivity and strongly affects the recovery overhead in terms of the traveled distance and the number of involved actors.
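The topology side of this setup can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' simulator: `build_topology`, `is_connected` and `cut_vertices` are hypothetical helper names, links are modeled as unit-disk links (two actors are neighbors iff they are within r of each other), and placements are simply redrawn until the inter-actor network is connected. Critical actors are taken to be the cut-vertices of the graph, found here with the standard centralized Tarjan articulation-point search. DCR and RAM themselves identify critical actors with the localized procedure of Section 4.1, so this global search only serves as a ground truth that a simulator could use, e.g., when picking adjacent cut-vertices to fail.

```python
import math
import random


def is_connected(adj):
    """Depth-first search from an arbitrary actor; True if every actor is reached."""
    if not adj:
        return True
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(adj)


def build_topology(n, r, width=1000.0, height=600.0, seed=None, max_tries=1000):
    """Place n actors uniformly at random and link every pair within range r
    (unit-disk links, an assumption); redraw until the topology is connected."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        pos = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]
        adj = {i: set() for i in range(n)}
        for i in range(n):
            for j in range(i + 1, n):
                if math.dist(pos[i], pos[j]) <= r:
                    adj[i].add(j)
                    adj[j].add(i)
        if is_connected(adj):
            return pos, adj
    raise RuntimeError("no connected placement found; increase r or max_tries")


def cut_vertices(adj):
    """Critical actors = articulation points (Tarjan's algorithm, iterative DFS)."""
    disc, low, cuts, timer = {}, {}, set(), 0
    for root in adj:
        if root in disc:
            continue
        disc[root] = low[root] = timer
        timer += 1
        root_children = 0
        stack = [(root, None, iter(adj[root]))]
        while stack:
            u, parent, neighbours = stack[-1]
            descended = False
            for v in neighbours:
                if v not in disc:            # tree edge: descend into v
                    disc[v] = low[v] = timer
                    timer += 1
                    stack.append((v, u, iter(adj[v])))
                    descended = True
                    break
                if v != parent:              # back edge to an already visited actor
                    low[u] = min(low[u], disc[v])
            if descended:
                continue
            stack.pop()                      # all neighbours of u are processed
            if parent is None:
                continue
            low[parent] = min(low[parent], low[u])
            if parent == root:
                root_children += 1
            elif low[u] >= disc[parent]:     # u's subtree cannot bypass parent
                cuts.add(parent)
        if root_children > 1:                # DFS root is a cut-vertex iff it has 2+ children
            cuts.add(root)
    return cuts
```

For example, `pos, adj = build_topology(40, 150.0, seed=3)` followed by `cut_vertices(adj)` yields one random connected topology and its critical actors. For sparse settings (small N, or r close to 50 m) connected placements become rare, so the retry loop may need many attempts or raise an error.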
We compare the performance of DCR to that of DARA (Younis et al., 2010) and RIM (Akkaya et al., 2005). Like DCR, DARA and RIM are distributed algorithms that exploit node relocation to recover from node failure. However, their procedures are different. When a node F fails, DARA selects the best candidate A among its 1-hop neighbors to replace it. The algorithm is applied recursively to tolerate the connectivity loss due to movement, i.e., A will be replaced by one of its neighbors, and so on. On the other hand, RIM moves all the 1-hop neighbors towards F until they become connected. Like DARA, RIM is applied recursively to re-establish links affected by node movement. Both DARA and RIM are reactive approaches and do not provision for recovery ahead of time.

6.2. Performance evaluation of DCR

The experiments involve randomly generated topologies with varying actor counts and communication ranges. The number of actors has been set to 20, 40, 60, 80 and 100. The communication range of actors is varied among 50, 100, 150 and 200 m. When changing the node count, r is fixed at 100 m, and N is set to 60 while varying the communication range. The results of the individual experiments are averaged over 30 trials. All results are subject to 90% confidence interval analysis and stay within 10% of the sample mean.

Total distance moved: Fig. 12 shows the distance traveled by all nodes until the connectivity is restored. DCR significantly outperforms both DARA and RIM because it strives to move only non-critical nodes in order to avoid cascaded relocations. As both graphs in the figure indicate, the performance advantage of DCR remains almost consistent even with higher node densities and longer transmission ranges. This is because DCR strives to avoid moving critical nodes, which causes further partitioning and requires successive relocations. Furthermore, DCR performs cascaded relocations only when non-critical nodes are not available in the neighborhood of a failed actor. Even then, DCR strives to limit the scope of the relocations by moving critical actors that have a high node degree, since they often have non-critical nodes in their neighborhood. Figure 12(a) indicates that the performance of DCR scales very well and is not affected by the node density, because non-critical nodes are chosen as backups. A similar observation can be made for the communication range (Fig. 12(b)), where the connectivity-restoration overhead is significantly less than that of the baseline approaches.

Fig. 12. Distance traveled by all nodes during the recovery until restoring connectivity, as a function of N in (a) and r in (b).

Number of moved nodes: Fig. 13 shows the number of nodes that were involved in the recovery when DCR and the baseline approaches are applied. The performance graphs confirm the advantage of DCR, which moves fewer actors than RIM and DARA. This is because DCR limits the scope of the recovery and avoids successive cascaded relocations by choosing non-critical nodes as backups. Moreover, DCR moves high-degree critical nodes that often have non-critical nodes in their neighborhood. Furthermore, the performance of DCR remains almost constant while varying the number of nodes and their radio range, which indicates great scalability.

Fig. 13. The number of nodes moved during the recovery, while varying the network size (a) and radio range (b).

Number of exchanged messages: Fig. 14 reports the messaging overhead as a function of the network size and radio range. As the figure indicates, DCR incurs far less messaging overhead than DARA and RIM. This is because DCR limits the message exchange to a pair of primary and backup nodes instead of all 1-hop and 2-hop neighbors, as is the case in RIM and DARA, respectively. Moreover, unlike DARA and RIM, DCR strives to involve non-critical nodes in the recovery, which limits the need for cascaded relocation and thus reduces the number of notification messages. Furthermore, DCR limits the scope of the recovery by involving high-degree nodes that have non-critical nodes in their neighborhood. The average numbers of notification messages sent by DCR in Fig. 14(a) and (b) range over 0.31–0.45 and 0–0.57, respectively. On the other hand, Fig. 14 indicates that the messaging overhead in RIM grows significantly for high actor densities and long communication ranges, because the number of recovery participants increases in both cases.
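As a hedged illustration of how the metrics defined in Section 6.1 could be computed for a single simulation run, the sketch below takes hypothetical dictionaries of actor positions before and after recovery and returns the total distance moved, the number of moved actors and the average node degree, plus a grid-sampled estimate of the coverage reduction. Two modeling choices are our assumptions, not details given in the paper: each actor covers a disk whose radius equals its action range (50 m in the coverage experiments), and coverage is estimated on a 10 m sampling grid over the 1000 m × 600 m area.

```python
import math


def recovery_metrics(before, after, adj_after):
    """Total distance moved, number of moved actors and average node degree.

    before:    {actor_id: (x, y)} positions before the failure (includes failed actors)
    after:     {actor_id: (x, y)} positions of the surviving actors after recovery
    adj_after: {actor_id: set of neighbour ids} in the repaired topology
    """
    moved_dist = {a: math.dist(before[a], after[a]) for a in after}
    total_distance = sum(moved_dist.values())
    n_moved = sum(1 for d in moved_dist.values() if d > 1e-9)
    avg_degree = sum(len(adj_after[a]) for a in after) / len(after)
    return total_distance, n_moved, avg_degree


def coverage(points, action_range=50.0, width=1000.0, height=600.0, step=10.0):
    """Fraction of grid sample points lying within action_range of some actor."""
    nx, ny = int(width / step), int(height / step)
    covered = 0
    for i in range(nx):
        for j in range(ny):
            p = ((i + 0.5) * step, (j + 0.5) * step)   # centre of grid cell (i, j)
            if any(math.dist(p, q) <= action_range for q in points):
                covered += 1
    return covered / (nx * ny)


def coverage_reduction_pct(before, after, **kwargs):
    """Percentage of coverage lost relative to the pre-failure level."""
    c0 = coverage(list(before.values()), **kwargs)
    c1 = coverage(list(after.values()), **kwargs)
    return 100.0 * (c0 - c1) / c0
```

For instance, with four actors spaced 100 m apart on a line (ids 0–3), if actor 1 fails and actors 2 and 3 each shift 100 m toward it, `recovery_metrics` returns a total distance of 200 m, 2 moved actors and an average node degree of 4/3 for the repaired path 0–2–3.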
Fig. 14. Effect of changing N (a) and r (b) on the total number of messages exchanged by all nodes during the recovery.

Percentage of coverage reduction: Fig. 15 shows the impact on coverage, measured as the percentage of coverage reduction relative to the pre-failure level, while changing N and r. The action range is set to 50 m in these experiments. Overall, DCR limits the coverage loss and consistently outperforms the baseline approaches. Although increasing the node density helps, DARA and RIM still do not make up for the coverage loss and do not match DCR's performance. The advantage of DCR in terms of coverage is due to the limited scope of node relocation, which causes a coverage loss at the network periphery. Moreover, DCR engages strongly connected nodes in the recovery, which have more coverage overlap with their neighbors; hence, moving those actors only reduces the overlapping coverage. Figure 15(b) indicates that the performance of DCR in terms of coverage reduction is not much affected by increasing the communication range. On the other hand, the performance of RIM worsens significantly as the communication range grows. With increased r, the network becomes more connected and the number of neighbors of F grows. RIM moves nodes inwards, making the area around F more crowded while leaving uncovered parts at the network periphery, and thus causes a significant loss of coverage.

Fig. 15. Coverage reduction after recovery, as a function of N in (a) and r in (b).

Average node degree: Fig. 16 shows the level of connectivity maintained by all approaches after the recovery is completed. As both figures indicate, DCR consistently maintains the same level of connectivity as the other approaches, despite the fact that DCR does not factor in connectivity as the other baseline approaches do. This is due to moving high-degree non-critical nodes and limiting the scope of the relocation. Figures 12–15 confirm that DCR strikes a balance between the various objectives.

Fig. 16. Level of inter-actor connectivity after recovery, as a function of N in (a) and r in (b).

6.3. Performance evaluation of RAM

We use the same simulation setup to evaluate the performance of RAM. We identify critical actors and choose two adjacent cut-vertices at random to fail simultaneously. For RAM-I, the failed nodes have backups independent of each other, and thus it is like running DCR twice. As we have seen in the previous section, DCR significantly outperforms contemporary schemes found in the literature; therefore, the validation of RAM-I is based on DCR. The RAM-A curve reflects the results when one of
the designated backups also fails along with its adjacent primary, as illustrated above in the special case. For RAM-I and RAM-A, we have performed experiments with 15 different topologies. The goal of comparing the performance of RAM-I and RAM-A is to capture the effect of failure scenarios in which a node C has to deal with the failure of its primary B as well as of node A, for which B serves as a backup.

Total distance moved: Fig. 17 shows the total distance moved by all the nodes involved in the recovery. Both graphs show that under RAM-A nodes move a slightly longer distance than under RAM-I. This is due to engaging additional nodes to recover from the failure of the adjacent node. Moreover, in RAM-I the failed actors have independent pre-designated backups that do not have to travel an additional distance to recover from the failure of the grand primary. Figure 17(a) indicates that the performance of both algorithms improves with increased actor density. Increasing the number of actors boosts the level of connectivity and consequently the number of non-critical nodes. The availability of non-critical nodes reduces the scope of cascaded relocations. On the other hand, Fig. 17(b) shows that the travel distance grows with the transmission range. This is because nodes have to travel longer distances in order to restore connectivity.

Fig. 17. Distance traveled by all nodes during the recovery until restoring connectivity, as a function of N in (a) and r in (b).

Number of moved nodes: Fig. 18 reports the number of nodes that get involved in the recovery. Again, both performance graphs indicate that RAM-I marginally outperforms RAM-A in terms of the scope of recovery. This is because independent execution of RAM offers more non-critical nodes on the recovery paths, which prevents unnecessary cascaded relocations. The performance of both variants of RAM improves with increased actor density and longer transmission range, due to the increased degree of connectivity and the availability of non-critical nodes in the neighborhood, which limits the scope of recovery.

Fig. 18. Number of nodes moved during the recovery until restoring connectivity, as a function of N in (a) and r in (b).

Fig. 19. Number of messages exchanged by all nodes during the recovery until restoring connectivity, as a function of N in (a) and r in (b).

Number of messages exchanged: Fig. 19 shows the messaging overhead as a function of the network size and radio range. As expected, RAM-A incurs slightly more messaging overhead than RAM-I. The obvious reason is that RAM-A sends extra recovery coordination messages. Both figures suggest that the messaging overhead decreases with higher density and longer radio range, since the network connectivity improves in both cases. This
ensures the availability of more non-critical nodes that do not require sending movement notification messages.

7. Conclusion and future work

This paper has presented a novel distributed hybrid movement control algorithm for restoring connectivity lost due to critical actor failure. The proposed DCR algorithm identifies critical actors in advance based on localized information and designates backup actors for them. DCR pursues controlled node relocation in order to reorganize the topology and regain the pre-failure strong connectivity. In order to handle multiple simultaneous failures of critical actors, we have proposed RAM. RAM handles failure scenarios in which two adjacent nodes fail simultaneously. Like DCR, RAM is a distributed hybrid approach that identifies critical actors and assigns backups for them. However, RAM assigns distinct backups for each critical actor. The designated backups detect the failure of their primaries and move to replace them. In addition, RAM extends the primary-backup relationship in some cases in order for the recovery to converge when a primary and its backup fail at the same time and when the relocation of two backups causes the network to partition.

DCR and RAM serve slightly different goals and do not pursue exactly the same procedure for backup designation and failure recovery. The selection of which algorithm to employ depends strongly on the application requirements. For example, in a highly hostile environment such as a combat field, RAM would be better suited, since simultaneous failure of adjacent nodes may take place. The performance of DCR and RAM has been validated analytically and through simulation. The simulation results have confirmed the effectiveness of both approaches in terms of messaging and movement overhead, while minimizing the scope of recovery and the impact on coverage. In the future, we plan to evaluate the performance of the proposed approaches in a prototype network of mobile robots.

Acknowledgment

The author is thankful to Universiti Teknologi PETRONAS for providing partial support. Younis' work is supported by the National Science Foundation, award # CNS 1018171.

Appendix A. Detailed pseudo-code for DCR

Figure A1 shows the high-level pseudo code of DCR, which runs on each actor "A" in a distributed manner. If an actor A is critical, it selects an appropriate backup actor using the AssignBackup() procedure (lines 1–3). While serving as backup to node F, if actor A either detects the failure of F or receives a movement notification message from F, it initiates a recovery process (lines 4–6).

Fig. A1. High-level pseudo code for the DCR algorithm.

Fig. B1. Pseudocode of RAM for backup selection and failure recovery.
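To make the per-actor logic of Fig. A1 concrete, here is a hedged Python sketch that simulates it over a global view for readability (in DCR each actor runs the equivalent steps locally with 1-hop information). It assumes the set of critical actors is already known, and the function names (`assign_backup`, `assign_all_backups`, `dcr_recover`) are hypothetical. The ranking in `assign_backup` is our reading of the AssignBackup() preference: a non-critical neighbor first, then the higher node degree, then the shorter distance; it does not model the option of preferring a leaf. Recovery follows the cascaded relocation described in this paper: the backup takes the failed actor's position and, if it is itself critical, its own backup takes the vacated position, until a non-critical actor moves. A critical mover re-selects its backup when that backup has failed and, as an extra safeguard of this sketch only, when the backup has already moved.

```python
import math
from typing import AbstractSet, Dict, List, Optional, Set, Tuple

Point = Tuple[float, float]
Move = Tuple[int, Point, Point]          # (actor, old position, new position)


def assign_backup(a: int, adj: Dict[int, Set[int]], pos: Dict[int, Point],
                  critical: AbstractSet[int],
                  exclude: AbstractSet[int] = frozenset()) -> Optional[int]:
    """Hypothetical AssignBackup(): prefer a non-critical neighbour, then a
    higher node degree, then a shorter distance to actor a."""
    candidates = [b for b in adj[a] if b not in exclude]
    if not candidates:
        return None
    return min(candidates,
               key=lambda b: (b in critical, -len(adj[b]), math.dist(pos[a], pos[b])))


def assign_all_backups(adj, pos, critical) -> Dict[int, Optional[int]]:
    """Pre-failure phase: every critical actor designates one backup."""
    return {a: assign_backup(a, adj, pos, critical) for a in critical}


def dcr_recover(failed: int, adj, pos, critical, backup) -> Tuple[List[Move], Dict[int, Point]]:
    """Cascaded relocation started by the backup of `failed`.
    Returns the list of moves and the updated positions."""
    new_pos = dict(pos)                  # leave the caller's positions untouched
    if failed not in critical:
        return [], new_pos               # losing a non-critical actor needs no repair
    done = {failed}                      # failed actor plus actors that already moved
    mover, target = backup.get(failed), pos[failed]
    moves: List[Move] = []
    while mover is not None:
        nxt = None
        if mover in critical:
            # BackupStatus() check: re-select if the backup is missing, failed or used
            nxt = backup.get(mover)
            if nxt is None or nxt in done:
                nxt = assign_backup(mover, adj, new_pos, critical, exclude=done)
        old = new_pos[mover]
        moves.append((mover, old, target))   # record the relocation
        new_pos[mover] = target
        done.add(mover)
        mover, target = nxt, old             # the next backup fills the vacated spot
    return moves, new_pos
```

For example, on a five-actor chain 0–1–2–3–4 in which actors 1–3 are critical and actor 3 sits slightly closer to 2 than actor 1 does, the failure of actor 2 moves its critical backup 3 into 2's position and then the leaf 4 into 3's old position, after which the cascade stops.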
A critical actor A finds an appropriate backup among its neighbors. The AssignBackup() procedure preferably designates a non-critical neighbor (either a leaf or the one with the highest degree) as backup. If no non-critical node is available, it chooses the critical actor with the highest degree and the least distance to A (line 7). The recovery procedure is executed on backup actor A if it either detects the failure of primary F or receives a message from F. While executing the recovery procedure, A checks whether it is critical or not (line 8). If it is critical, it checks the status of its backup with BackupStatus() before moving. If the backup of A has failed, it selects another node as backup. It then sends a movement notification message to inform the newly assigned backup or its pre-designated backup (lines 9–13). Now actor A can move to replace F (line 14).

Appendix B. Detailed pseudo-code for RAM

Figure B1 shows the pseudo code of RAM that each actor "B" would execute. The pre-failure steps resemble DCR. During the network bootstrapping phase, each actor (either critical or engaged as backup) will appoint an appropriate backup among its neighbor actors using the AppointDistinctBackup() procedure (lines 1–3). If actor B either detects the failure of primary F or
856 M. Imran et al. / Journal of Network and Computer Applications 35 (2012) 844–856

receives a movement notification message from F, node B triggers a Dantu, K, Rahimi, M, Shah, H, Babel, S, Dhariwal, A, Sukhatme, GS, Robomote:
recovery procedure FailureRecovery() to recover from F (lines 4–6). enabling mobility in sensor networks. In: Proceedings of the 4th international
The AppointDistinctBackup() procedure differs slightly from its DCR counterpart, AssignBackup(): it ensures that the selected backup node does not already serve another primary, and it bases the selection on the criteria given in Section IV(C) (line 7). The FailureRecovery() procedure likewise differs from Recovery() in DCR, since in RAM two adjacent actors are not allowed to choose each other as backup at the same time. If the backup B is a critical actor, it notifies its own backup so that connectivity is maintained (lines 8–10). Because B knows the status of the failed primary F, it checks whether F was critical. If F was critical, B moves to replace it (lines 11–13). Otherwise, F was non-critical and need not be replaced; in other words, B moves directly to the location of the grand primary G, as shown in Fig. 11 and discussed next.

If backup node B also detects the failure of its grand primary G (i.e., the primary of its primary), B executes FailureRecovery() to recover from G, as described earlier with Figs. 10 and 11 (lines 14–16).
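RAM's changes to DCR can be sketched in the same hedged way. `Actor`, `make_net`, `appoint_distinct_backup`, `bootstrap` and `failure_recovery` are hypothetical names. The ranking reuses the DCR preference because the Section IV(C) criteria are assumed rather than reproduced; the no-mutual-backup rule, which the text mentions alongside FailureRecovery(), is enforced here at appointment time (also an assumption); and `failure_recovery` only lists the actions backup B would take (notify its own backup, move to the position of F or G) instead of simulating the whole cascade.

```python
import math
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

Pos = Tuple[int, int]


@dataclass
class Actor:
    name: str
    pos: Pos
    critical: bool                  # assumed known from the criticality check
    neighbors: Set[str] = field(default_factory=set)
    backup: Optional[str] = None    # actor that will replace this one
    primary: Optional[str] = None   # the single actor this one backs up


def make_net(pos: Dict[str, Pos], edges, critical: Set[str]) -> Dict[str, Actor]:
    net = {n: Actor(n, p, n in critical) for n, p in pos.items()}
    for u, v in edges:
        net[u].neighbors.add(v)
        net[v].neighbors.add(u)
    return net


def appoint_distinct_backup(a: Actor, net: Dict[str, Actor]) -> Optional[str]:
    """Hypothetical AppointDistinctBackup(): skip neighbors that already serve a
    primary or whose own backup is `a` (no mutual pairs), then rank as in DCR."""
    free = [n for n in a.neighbors
            if net[n].primary is None and net[n].backup != a.name]
    return min(free, key=lambda n: (net[n].critical, -len(net[n].neighbors),
                                    math.dist(a.pos, net[n].pos), n),
               default=None)


def bootstrap(net: Dict[str, Actor]) -> None:
    """Lines 1-3: critical actors and every actor engaged as a backup appoint one."""
    work = deque(sorted(n for n, a in net.items() if a.critical))
    while work:
        a = net[work.popleft()]
        if a.backup is not None:
            continue
        b = appoint_distinct_backup(a, net)
        if b is None:
            continue
        a.backup, net[b].primary = b, a.name
        if net[b].backup is None:
            work.append(b)          # a newly engaged backup appoints its own backup


def failure_recovery(net: Dict[str, Actor], b: str, failed: str,
                     alive: Set[str]) -> List[Tuple[str, object]]:
    """Actions backup `b` takes once its primary `failed` is lost."""
    me, f = net[b], net[failed]
    actions: List[Tuple[str, object]] = []
    if me.critical and me.backup in alive:       # lines 8-10: notify own backup
        actions.append(("notify", me.backup))
    if f.critical:                               # lines 11-13: replace a critical F
        actions.append(("move", f.pos))
    g = f.primary                                # lines 14-16: grand primary G
    if g is not None and g not in alive and net[g].critical:
        actions.append(("move", net[g].pos))     # only the move step of recovering G
    return actions


if __name__ == "__main__":
    pos = {"L1": (-1, 0), "B": (0, 0), "F": (1, 0), "A": (3, 0), "L2": (4, 0)}
    edges = [("L1", "B"), ("B", "F"), ("F", "A"), ("A", "L2")]
    net = make_net(pos, edges, {"B", "F", "A"})
    bootstrap(net)
    print(net["L2"].backup)                                   # None
    print(failure_recovery(net, "B", "F", set(net) - {"F"}))  # [('notify', 'L1'), ('move', (1, 0))]
```

In the demo chain, L2 ends up without a backup because its only neighbor, A, already relies on L2, which is the mutual pairing RAM forbids. When both F and G are critical and have failed, the sketch simply lists both moves in the order the text gives them; the exact behavior in that case follows Figs. 10 and 11, which are not modeled here.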
Proceedings of the 8th IEEE/IFIP international conference on embedded and
ubiquitous computing (EUC 2010), Hong Kong, China; December 2010.
References Li, N Xu S, Stojmenovic, Ivan. Mesh-based sensor relocation for coverage main-
tenance in mobile sensor networks. In: Proceedings of the 4th international
conference on ubiquitous intelligence and computing (UIC 2007), Hong Kong,
Akyildiz IF, Kasimoglu IH. Wireless sensor and actor networks: research chal- China; July 2007.
lenges. Ad Hoc Networks 2004;2:351–67. Lee S, Younis M. Recovery from multiple simultaneous failures in wireless sensor
Abbasi, AA, Akkaya, K, Younis, M. A distributed connectivity restoration algorithm networks using minimum Steiner tree. The Journal Parallel and Distributed
in wireless sensor and actor networks. In: Proceedings of the 32nd IEEE Computing 2010;70(5):525–36.
conference on local computer networks (LCN 2007), Dublin, Ireland; October MilenkoJorgić, IS, Hauspie, Michaël, Simplot-ryl, David. Localized algorithms for
2007. detection of critical nodes and links for connectivity in ad hoc networks. In:
Akkaya, K, Younis, M. COLA: a coverage and latency aware actor placement for Proceedings of the 3rd annual IFIP mediterranean ad hoc networking work-
wireless sensor and actor networks. In Proceedings of the 64th IEEE vehicular shop, Med-Hoc-Net, Bodrum, Turkey; June 2004.
technology conference (VTC-Fall’ 06), Montreal, Canada; September 2006. Ozaki, K, Watanabe, K, Itaya, S, Hayashibara, N, Enokido, T, and Takizawa, M, A
Akkaya K, Younis M, Bangad M. Sink repositioning for enhanced performance in fault-tolerant model for wireless sensor-actor system. In: Proceedings of the
wireless sensor networks,. Computer Networks 2005;49:512–34. 20th international conference on advanced information networking and
Akkaya, K, Thimmapuram, A, Senel, F, Uludag, S. Distributed recovery of actor applications (AINA 2006), Vienna, Austria; April 2006.
failures in wireless sensor and actor networks. In Proceedings of the IEEE Orozco-Barbosa L, Olivares T, Casado R, Bermúdez A, Das S, Liu H, Kamath A, Nayak
wireless communications and networking conference (WCNC 2008), Las A, Stojmenović I. Localized movement control for fault tolerance of mobile
Vegas, NV; March 2008. robot networks. Wireless Sensor and Actor Networks 2007;248:1–12. Springer
Azadeh, Z. A hybrid approach to actor  actor connectivity restoration in wireless Boston.
sensor and actor networks. In: Proceedings of the 8th IEEE international Tamboli, N and Younis, M. Coverage-aware connectivity restoration in mobile
conference on networks (ICN 2009), Cancun, Mexico; March 2009. sensor networks. In: Proceedings of the IEEE international conference on
Akkaya K, Senel F, Thimmapuram A, Uludag S. Distributed recovery from network communications (ICC 2009), Dresden, Germany; June 2009.
partitioning in movable sensor/actor networks via controlled mobility. IEEE Younis M, Lee S, Abbasi AA. A localized algorithm for restoring inter-node
Transactions on Computers 2010;59(2):258–71. connectivity in networks of moveable sensors. IEEE Transactions on Compu-
Batalin, MA, Sukhatme, GS. The analysis of an efficient algorithm for robot ters 2010;99(12).
coverage and exploration based on sensor network deployment. In: Proceed- Youssef, A, Ashok, A, Younis, M. Accurate anchor-free node localization in wireless
ings of the 2005 IEEE international conference on robotics and automation sensor networks. In: Proceedings of the 24th IEEE international performance,
(ICRA 2005), Barcelona, Spain; April, 2005. computing, and communications conference (IPCCC 2005). Phoenix, AZ; April
Bulusu N, Heidemann J, Estrin D. GPS-less low-cost outdoor localization for very 2005.
small devices. IEEE Personal Communications 2000;7(5):28–34. Younis M, Akkaya K. Strategies and techniques for node placement in wireless
Basu P, Redi J. Movement control algorithms for realization of fault-tolerant ad hoc sensor networks: a survey. The Journal of Ad-Hoc Networks 2008;6(4):
robot networks. IEEE Networks 2004;18(4):36–44. 621–55.
