
A LEARNING-BASED ADAPTIVE ROUTING FOR QOS-AWARE DATA COLLECTION IN FIXED SENSOR NETWORKS WITH MOBILE SINKS

RENZHONG WANG, Dept. of Engineering Management and Systems Engineering
GANESH K. VENAYAGAMOORTHY, Dept. of Electrical and Computer Engineering
SANJAY MADRIA, Dept. of Computer Science
CIHAN H. DAGLI, Dept. of Engineering Management and Systems Engineering

Missouri University of Science and Technology, Rolla, MO

ABSTRACT

Routing data from sensor nodes to designated mobile data sinks is a common and challenging task in a wide spectrum of Wireless Sensor Network (WSN) applications and has thus become an active research area. In this paper, a reinforcement-learning-based adaptive routing scheme implemented through Adaptive Critic Design (ACD) is proposed. In this scheme, sensor nodes discover and improve routes at the time of packet transmission. Each sensor node makes its routing decision dynamically, taking various constraints and environmental conditions into account and performing multi-objective optimization. Extensive simulations using synthetic network topologies and sink traces, guided by Design of Experiments (DoE), are conducted to test the performance of the proposed routing algorithm. The results show that the proposed scheme is highly robust and adaptive to a variety of situations.

I. INTRODUCTION

Wireless Sensor Networks (WSNs) are distinguished from traditional networks by characteristics such as deeply embedded routers, resource-constrained nodes, and unreliable and asymmetric transmission. A major performance constraint in sensor networks with battery-powered nodes is energy, a significant portion of which is spent on communication. Many protocols have been developed with energy awareness as an essential consideration. Such protocols can be categorized as data-centric, hierarchical, and location-based (Akkaya and Younis, 2005). Data-centric routing is query-based and uses attribute-based naming to specify the desired data, thus eliminating the need for node addressing. Hierarchical routing groups nodes into clusters and performs data aggregation and reduction at cluster heads so as to save energy and avoid gateway overload and long-haul communication. Location-based routing exploits position information to relay data only to the desired regions.
WSNs with mobile data sinks introduce another dimension: sink mobility. Some routing protocols for such WSNs are proactive (Luo and Hubaux, 2005), in that sensor nodes push their readings to storage nodes from which mobile sinks collect them, whereas others are reactive (Akkaya and Younis, 2004), in that mobile sinks pull readings from nearby sensor nodes as they traverse the sensor network. Hybrid schemes combining proactive and reactive techniques have also been proposed (Wohlers et al., 2009). Other approaches mine the sinks' mobility patterns or try to find optimal trajectories for the mobile sinks (Wang et al., 2005; Jain et al., 2006). Relatively few protocols are based on network flow or QoS awareness. QoS-based routing must balance multiple QoS metrics and is subject to performance constraints and link dynamics, which makes it very challenging. Sequential Assignment Routing (SAR), proposed in (Sohrabi and Pottie, 2000), is one of the earliest QoS-based routing protocols. SAR calculates a weighted QoS metric as the product of an additive QoS metric and a weight coefficient associated with the priority level of the packet. Other QoS-based frameworks include Integrated Services (IntServ), Differentiated Services (DiffServ), and Multi-Protocol Label Switching (MPLS). From another perspective, ad-hoc routing algorithms can be divided into structure-based and structure-less ones. A structure-based routing strategy builds and maintains a routing structure, such as a spanning tree, a routing table, or one or more paths, whereas a structure-less routing strategy makes the routing decision at every hop. Reinforcement learning is another promising approach capable of producing highly adaptive routing algorithms. The QELAR protocol (Hu and Fei, 2010) is a reinforcement-learning-based adaptive routing mechanism that aims to prolong network lifetime by distributing the residual energy of sensor nodes more evenly.
A self-organized multipath routing mechanism for WSNs using enhanced ant colony optimization based on delay, energy, and velocity was also proposed in (Saleem et al., 2009), where the reinforcement learning feature helps to improve the overall data throughput, especially for real-time traffic.

In this paper, a WSN is considered to consist of n static sensor nodes deployed in an area of interest and m mobile sinks roaming over the area for data collection. To devise a routing scheme for such systems, the following assumptions are made:
(1) Energy conservation is the primary goal, and communication throughput or delay is the secondary goal. Other aspects, such as circuits, architecture, algorithms, protocols, and dynamic power management, are not the concern of this paper.
(2) Sensor nodes are constrained in both power and resources (storage and processing).
(3) Sensor nodes are deployed on a large scale, are highly distributed, and are connected through wireless communication. Each node can act as a router.
(4) Sensors know a few nodes within proximity but may not know their locations.
(5) Mobile nodes with arbitrary mobility patterns and abundant energy, storage, and processing resources communicate wirelessly with sensor nodes within range.
(6) Wireless communication is subject to interference and limited range.

Here, the data refresh rate is the primary constraint. A data packet is considered fresh when it arrives at a mobile sink within a user-specified freshness threshold. Each application defines a lower bound on the delivery ratio of fresh packets, and 99% of all packets must be delivered within the specified freshness thresholds.

II. OVERVIEW OF THE ADAPTIVE CRITIC DESIGN

A WSN with mobile sinks is highly dynamic in nature.
For example, 1) sensor nodes may change their states among transmitting, idle, and failed; 2) transmission links are established dynamically, while link quality fluctuates due to interference, unreliable signal strength, unpredictable traffic, etc.; 3) the number and mobility of mobile sinks vary; and 4) sensor node distribution is heterogeneous in terms of data generation rate and freshness requirements. Such a dynamic nature strongly suggests that an adaptive algorithm is more appropriate as the routing strategy (Zhang et al., 2004; Baruah et al., 2004). Here, ACDs (Si et al., 2004) are used to devise a dynamic routing algorithm capable of online learning. Specifically, direct Neural Dynamic Programming (NDP, also known as ADHDP) (Si et al., 2004) was chosen because it does not need a plant model: no plant parameters are estimated, and instead certain plant information is used directly to find appropriate and convergent control laws and control parameters. Such designs need less computation than some other forms of ACDs. In direct NDP, an action network generates the control law and a critic network approximates the cost-to-go function (the solution of Bellman's equation). The action network and the critic network are trained alternately in an online manner for each input state. A discount rate is used in the critic network's estimation of the cost-to-go function. Therefore, instead of optimizing around a single operating point as an adaptive controller does, direct NDP can achieve optimization over a broad range of operating points and can also adapt to changes in the environment by changing the optimal control law. Fig. 1 illustrates the architecture of direct NDP, where the solid lines denote system information flow and the dashed lines represent error backpropagation paths for reducing the squared Bellman error $\{[r(t) + \alpha J(t)] - J(t-1)\}^2$, where $\alpha$ is the discount rate.

[Figure: block diagram linking the state X(t), the Action Network, its output u(t), the System, the Critic Network, J(t), and J(t-1)]

Fig. 1. Direct Neural Dynamic Programming (adapted from Si et al., 2004)
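To make the critic error concrete, the following is a minimal sketch of one direct NDP critic step. It assumes a linear critic J = w · phi in place of the 8-neuron MLP, where phi is a hypothetical flattened vector of the state and action inputs; the discount rate `alpha` and learning rate `lr` are illustrative values, not settings from this paper. As in direct NDP, the gradient of the squared Bellman error is taken through J(t) only, with J(t-1) stored from the previous step.

```python
from typing import List, Tuple


def critic_error(r_t: float, j_t: float, j_prev: float, alpha: float) -> float:
    """Bellman error e_c = [r(t) + alpha * J(t)] - J(t-1)."""
    return (r_t + alpha * j_t) - j_prev


def linear_critic(w: List[float], phi: List[float]) -> float:
    """Linear stand-in for the critic MLP: J = w . phi (a simplifying assumption)."""
    return sum(wi * pi for wi, pi in zip(w, phi))


def critic_step(w: List[float], phi_t: List[float], r_t: float, j_prev: float,
                alpha: float = 0.9, lr: float = 0.1) -> Tuple[List[float], float]:
    """One online critic update that lowers 0.5 * e_c**2.

    The gradient flows only through J(t) (J(t-1) is a stored value),
    so for a linear critic dE/dw = e_c * alpha * phi_t.
    Returns the updated weights and the error measured before the update.
    """
    j_t = linear_critic(w, phi_t)
    e_c = critic_error(r_t, j_t, j_prev, alpha)
    new_w = [wi - lr * e_c * alpha * pi for wi, pi in zip(w, phi_t)]
    return new_w, e_c
```

Calling `critic_step` repeatedly on the same input shrinks |e_c|. In the full design, the action network is then trained by backpropagating through the critic along the dashed paths of Fig. 1; that step is omitted here.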

III. DESIGN OF ACD FOR THE ROUTING PROBLEM

1. Mathematical Model

To facilitate problem representation and programming, it is assumed that all data transmitted over the WSN conform to predefined metadata. Here, the Message-initiated Constraint-Based Routing (MCBR) metadata scheme proposed in (Zhang and Fromherz, 2004) was adopted, with modifications made wherever necessary to fit the problem in this paper. The basic idea of MCBR is as follows. An MCBR specification for a message m is a tuple $\langle D_m, R_m, O_m \rangle$ consisting of a destination constraint $D_m$, a route constraint $R_m$, and a routing objective $O_m$. The goal of routing is to deliver the message from its source node $v_0$ to one or all of the destination nodes satisfying $D_m$, via a sequence or a tree of intermediate nodes $p: v_0, v_1, \ldots, v_k$, such that $R_m$ is satisfied at each $v_i$ and $O_m(p)$ is minimized over $p$. A local objective function $o$ is defined on a set of attributes, $o: A_1 \times A_2 \times \cdots \times A_n \rightarrow \mathbb{R}^+$, where $A_i$ is the domain of attribute $i$ and $\mathbb{R}^+$ is the set of positive real numbers. The value of $o$ at a node $v$, denoted $o(v)$, is $o(a_1, a_2, \ldots, a_n)$, where $a_i$ is the current value of attribute $i$ at node $v$. A global routing objective can be obtained by aggregating local objectives over the routing path, $O_m(p) = \sum_{v \in p} o(v)$, where only the additive operation is considered.
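A minimal sketch of an MCBR-style specification, assuming the destination and route constraints are predicates over a node's attributes and the global objective is the additive sum of local objectives along a path. The names `MCBRSpec`, `path_objective`, `best_path`, and the `load`/`is_sink` attributes are hypothetical. The brute-force search over a few candidate paths is for illustration only; the paper itself optimizes local objectives rather than the global one.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Attrs = Dict[str, float]  # attribute name -> current value a_i at a node


@dataclass
class MCBRSpec:
    dest_ok: Callable[[Attrs], bool]     # destination constraint D_m
    route_ok: Callable[[Attrs], bool]    # route constraint R_m, checked at every node
    local_obj: Callable[[Attrs], float]  # local objective o(v)


def path_objective(spec: MCBRSpec, path: List[Attrs]) -> Optional[float]:
    """O_m(p) as the additive sum of o(v) over p, or None if p is infeasible."""
    if not path or not spec.dest_ok(path[-1]):
        return None                      # the last node must satisfy D_m
    if not all(spec.route_ok(v) for v in path):
        return None                      # R_m must hold at every node on p
    return sum(spec.local_obj(v) for v in path)


def best_path(spec: MCBRSpec,
              candidates: List[List[Attrs]]) -> Optional[Tuple[float, List[Attrs]]]:
    """Brute-force min_p O_m(p) over a small list of candidate paths."""
    scored = [(path_objective(spec, p), p) for p in candidates]
    feasible = [(c, p) for c, p in scored if c is not None]
    return min(feasible, key=lambda cp: cp[0]) if feasible else None
```

For example, with the congestion cost o(l) = l + 1 used in this paper as the local objective and a load limit as the route constraint, a two-hop path through lightly loaded nodes beats a longer one, and any path through an overloaded node is rejected outright.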

In this paper, the objective function is a weighted sum of four factors (sub-functions) that reflect the QoS requirements: (1) primary communication cost, for which a constant function is defined (i.e., unit transmission cost); (2) cost associated with energy consumption, for which a linear function $ku + c$ is defined, where $u$ is the amount of energy used in the node and $k$ and $c$ are constants; (3) cost associated with the neighbors available for establishing connections, for which an objective function of the form $k/n + c$ is used, where $n$ is the number of neighbors; and (4) a congestion-awareness cost function, $o(l) = l + 1$, where $l$ is the message load attribute (e.g., the number of messages in the node's queue). In addition, a congestion-avoidance constraint $l \le l_m$, where $l_m$ is the load limit, is also used.

2. Adaptive Critic Design Implementation

For the problem setting in this paper, each sensor node can be considered a decision maker that evaluates the states of its environment and chooses an appropriate neighbor as its parent node for forwarding data packets. Hence, each sensor node can be implemented as a small ACD system. These connected ACD systems can collectively achieve approximate global optimization by optimizing local objectives. The global objective is not used directly because finding an optimal path with an additive objective while satisfying an additive constraint is well known to be NP-hard. The detailed design of each ADHDP system is as follows.

States: The input state vector X(t) of a sensor node is the set of inputs from all of its neighbor nodes, each a three-fold value [E, Q, Nb] consisting of E, the energy used; Q, the queue length; and Nb, the number of neighbors.

Actions: The output of the action network is the selection of the parent node. The utility function is

$$u(t) = w_1\bigl(k_1 E(t) + c_1\bigr) + w_2\bigl(k_2 / N_b(t) + c_2\bigr) + w_3\bigl(k_3 Q(t) + c_3\bigr) + w_4 T \qquad (1)$$

where $k_1, c_1, k_2, c_2, k_3, c_3$ are constants, $w_1, \ldots, w_4$ are the weights of the four sub-functions, and T is a binary value (T = 1 if communication is established; otherwise, T = 0).
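Eq. (1) and the load constraint translate directly into a per-neighbor cost. The sketch below is a greedy one-step baseline, not the learned ACD policy: it scores each candidate parent's [E, Q, Nb] with Eq. (1) and keeps only neighbors with Q ≤ l_m. The constants k1 to k3, c1 to c3, and w4 are hypothetical placeholders (the paper does not report them); k3 = c3 = 1 reproduces o(l) = l + 1, and w1 to w3 are taken from Table 1 under the assumption that its energy, neighbor, and queue weights map onto w1, w2, and w3.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class NeighborState:
    E: float   # energy used by the neighbor
    Q: int     # queue length (message load)
    Nb: int    # number of neighbors of that neighbor (>= 1, since it hears us)


# Hypothetical constants; the paper does not report k1..k3, c1..c3, or w4.
K = {"k1": 1.0, "c1": 0.0, "k2": 1.0, "c2": 0.0, "k3": 1.0, "c3": 1.0}
# w1 (energy), w2 (neighbor), w3 (queue) taken from Table 1; w4 is a placeholder.
W = {"w1": 0.381, "w2": 0.0, "w3": 0.619, "w4": 1.0}


def utility(s: NeighborState, T: int, k: Dict[str, float] = K,
            w: Dict[str, float] = W) -> float:
    """Eq. (1): w1(k1*E + c1) + w2(k2/Nb + c2) + w3(k3*Q + c3) + w4*T."""
    return (w["w1"] * (k["k1"] * s.E + k["c1"])
            + w["w2"] * (k["k2"] / s.Nb + k["c2"])
            + w["w3"] * (k["k3"] * s.Q + k["c3"])
            + w["w4"] * T)


def greedy_parent(neighbors: Dict[str, NeighborState],
                  load_limit: int) -> Optional[str]:
    """Lowest-cost neighbor among those satisfying the load constraint Q <= l_m."""
    feasible = {n: s for n, s in neighbors.items() if s.Q <= load_limit}
    if not feasible:
        return None
    return min(feasible, key=lambda n: utility(feasible[n], T=1))
```

The ACD replaces this fixed greedy rule with an action network whose choices are shaped over time by the critic's estimate of the cost-to-go.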
Both the action network and the critic network are feed-forward multi-layer perceptron (MLP) neural networks, each with only 8 neurons. Small networks are preferred because of resource constraints. An online learning strategy is employed, i.e., the weights of both networks are updated only once for every input. In practice, even less frequent updates can be used, such as one update for every ten packets received.
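The following sketch illustrates how small such a network is and how a reduced update rate can be scheduled. It assumes a layout the paper does not spell out: the 8 neurons form one tanh hidden layer over a neighbor's [E, Q, Nb], with one linear output used as a score, and the parent is the lowest-scoring neighbor. `TinyMLP`, `choose_parent`, and `UpdateSchedule` are hypothetical names, and training is omitted.

```python
import math
import random
from typing import Dict, List, Sequence


class TinyMLP:
    """One hidden layer of 8 tanh neurons and a linear output (assumed layout)."""

    def __init__(self, n_in: int = 3, n_hidden: int = 8, seed: int = 0):
        rng = random.Random(seed)
        # Each hidden row holds n_in weights plus a bias; the output holds n_hidden + bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def n_params(self) -> int:
        return sum(len(row) for row in self.w1) + len(self.w2)

    def forward(self, x: Sequence[float]) -> float:
        xb = list(x) + [1.0]                                  # append bias input
        h = [math.tanh(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        return sum(w * v for w, v in zip(self.w2, h + [1.0]))


def choose_parent(net: TinyMLP, neighbor_states: Dict[str, List[float]]) -> str:
    """Score every neighbor's [E, Q, Nb] and return the lowest-scoring one."""
    return min(neighbor_states, key=lambda n: net.forward(neighbor_states[n]))


class UpdateSchedule:
    """Signal a weight update once every `every` packets (every=1 is fully online)."""

    def __init__(self, every: int = 10):
        self.every, self.count = every, 0

    def on_packet(self) -> bool:
        self.count += 1
        return self.count % self.every == 0
```

With three inputs this layout has only 41 weights, which is why networks of this size are plausible on a resource-constrained sensor node. An `UpdateSchedule(every=10)` fires on packets 10, 20, 30, and so on.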

3. Adaptive Routing Algorithm

Given the routing specification of a message, a cost function called the J-value can be defined on each node, indicating the minimum cost-to-go from this node to the destination. This cost is initially unknown, so an initial estimate is made according to the message type. A node also stores its neighbors' J-values, called NJ-values, which are estimated initially from the neighbors' attributes and updated when packets are received from those neighbors. For each packet sent out from a node, the node's current J-value and state vector [E, Q, Nb] for that message type are attached as a header. All nodes operate in promiscuous listening mode: whenever a node hears a packet of type m, whether or not it is the designated receiver, it updates the corresponding NJ-value and re-estimates its own J-value. The learning algorithm begins when a sensor detects a mobile sink coming into range and subsequently initiates a tree construction process, which has three phases:
1) Initialization phase. Nodes receiving a beacon initiate tree construction by sending a tree construction beacon containing the simple information (m, J, X), which announces the message type, the estimated J-value, and the node's states. As a result, an initial spanning tree rooted at the sink is built. This initial spanning tree may not be optimal. Each node other than the sink holds a pointer to its parent node, chosen by its action network.
2) Forwarding phase. Each node passes its received packets to its parent (or to the sink if the sink is within range). Note that a node's parent may change if its action network chooses a different neighbor based on its current judgment.
3) Confirmation phase. If the packet is not received by the parent node within a certain time period, the J-value of that node is increased, and the parent pointer may be set to a new neighbor if necessary. An upper bound is imposed on the number of retransmissions.
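A minimal sketch of the J-value and NJ-value bookkeeping and of the confirmation phase, for a single message type. Several details are assumptions rather than the paper's specification: a node re-estimates its J-value as the best NJ-value plus a unit hop cost, the lowest NJ-value stands in for the action network's parent choice, and on a timeout it is the silent parent's NJ-value that gets penalized. The class `NodeRouting` and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class NodeRouting:
    """Hypothetical per-node routing state for one message type m."""
    j: float                                  # own J-value (estimated cost-to-go)
    hop_cost: float = 1.0                     # assumed cost of one transmission
    nj: Dict[str, float] = field(default_factory=dict)  # neighbors' J-values
    parent: Optional[str] = None
    retries: int = 0

    def overhear(self, sender: str, sender_j: float) -> None:
        """Promiscuous listening: every packet of type m carries the sender's J-value."""
        self.nj[sender] = sender_j
        self.j = min(self.nj.values()) + self.hop_cost   # assumed re-estimate

    def choose_parent(self) -> Optional[str]:
        """Greedy stand-in for the action network: the lowest NJ-value wins."""
        self.parent = min(self.nj, key=self.nj.__getitem__) if self.nj else None
        return self.parent

    def on_ack_timeout(self, penalty: float, max_retries: int) -> bool:
        """Confirmation phase; returns True if the packet should be resent."""
        if self.parent is None:
            return False
        self.nj[self.parent] += penalty           # raise the silent parent's J-value
        self.j = min(self.nj.values()) + self.hop_cost
        self.choose_parent()                      # may switch to another neighbor
        self.retries += 1
        return self.retries <= max_retries        # upper bound on retransmissions

    def on_ack(self) -> None:
        self.retries = 0
```

In this sketch, repeated timeouts push a silent parent's NJ-value up until another neighbor looks cheaper, and the retry counter enforces the retransmission bound.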

IV. EXPERIMENTS AND RESULTS

1. Simulation Environment and Setup

PROWLER (Simon, 2008), a freely available simulator, is used, and its radio propagation model and MAC-layer communication model are adopted. PROWLER is a probabilistic, event-driven sensor network simulation tool written in Matlab. The topology of the fixed sensor network is simulated by a 7-by-7 rectangular grid with an optional random offset. Such a small grid can be viewed as a random sample from a larger sensor network. The maximum radio range of each sensor node follows a normal distribution, $N(r, \sigma)$, with r being a certain multiple of the average distance between sensor nodes (d). The radio-strength threshold for triggering data transmission is configurable. Once the deployment of the sensor nodes and their coverage ranges are set, the possible connections among sensor nodes can be determined, and the sensor nodes use this information to set their neighbors. Once set, the neighborhood information is fixed. For the mobile sinks, completely random trajectories were used.
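A minimal sketch of the topology setup, assuming the grid spacing equals d, the random offset is uniform jitter of at most `offset` per coordinate, and each node's maximum range is drawn once from a normal distribution with mean `r_mult * d` and standard deviation `sigma` (clipped at zero). Node k's neighbors are the nodes within k's own range, so links may be asymmetric. PROWLER's propagation and MAC models are not reproduced, and `grid_topology` and `neighbor_sets` are hypothetical names.

```python
import math
import random
from typing import Dict, List, Set, Tuple


def grid_topology(n: int = 7, spacing: float = 1.0, offset: float = 0.0,
                  seed: int = 0) -> List[Tuple[float, float]]:
    """n x n rectangular grid (RG); offset > 0 adds uniform jitter (RGro)."""
    rng = random.Random(seed)
    return [(i * spacing + rng.uniform(-offset, offset),
             j * spacing + rng.uniform(-offset, offset))
            for i in range(n) for j in range(n)]


def neighbor_sets(pos: List[Tuple[float, float]], r_mult: float, sigma: float,
                  d: float = 1.0, seed: int = 0) -> Dict[int, Set[int]]:
    """Fixed neighbor sets from one draw of each node's maximum radio range."""
    rng = random.Random(seed)
    ranges = [max(0.0, rng.gauss(r_mult * d, sigma)) for _ in pos]
    nbrs: Dict[int, Set[int]] = {k: set() for k in range(len(pos))}
    for k, (xk, yk) in enumerate(pos):
        for m, (xm, ym) in enumerate(pos):
            if k != m and math.hypot(xk - xm, yk - ym) <= ranges[k]:
                nbrs[k].add(m)   # m lies within k's range; links may be asymmetric
    return nbrs
```

With no offset, `sigma=0`, and `r_mult=1`, every node reaches exactly its horizontal and vertical grid neighbors: 2 for a corner node, 4 for an interior node. Increasing `r_mult` or `sigma` adds diagonal and longer links.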

2. Performance Metrics

The following QoS metrics have been used to assess the performance of the proposed algorithm: (1) Delivery ratio: the total number of packets received at mobile sinks divided by the total number of packets sent from all sources. (2) Throughput: the total number of packets received at sinks divided by the elapsed time. (3) Energy efficiency: the ratio of the total number of packets received at mobile sink(s) to the total number of transmissions in the network. (4) Lifetime: the average lifetime left in a sensor node, approximated from the mean and standard deviation of energy consumption and the total initial energy E.
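The first three metrics, plus the fresh-packet delivery ratio implied by the freshness assumption in Section I, can be computed from a simple packet log. This is a sketch assuming a hypothetical log format (`Packet` with send and receive times); the lifetime approximation itself is not reproduced, and `energy_stats` only computes its inputs, the mean and standard deviation of per-node energy consumption.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List, Optional, Tuple


@dataclass
class Packet:
    sent_at: float
    received_at: Optional[float]   # None if the packet never reached a sink


def qos_metrics(packets: List[Packet], transmissions: int, elapsed: float,
                freshness: float) -> Dict[str, float]:
    received = [p for p in packets if p.received_at is not None]
    # A packet is fresh if it reaches a sink within the freshness threshold.
    fresh = [p for p in received if p.received_at - p.sent_at <= freshness]
    return {
        "delivery_ratio": len(received) / len(packets),
        "throughput": len(received) / elapsed,
        "energy_efficiency": len(received) / transmissions,
        "fresh_delivery_ratio": len(fresh) / len(packets),
    }


def energy_stats(consumed: List[float]) -> Tuple[float, float]:
    """Mean and population standard deviation of per-node energy consumption."""
    return mean(consumed), pstdev(consumed)
```

For example, if 3 of 4 packets arrive (2 of them within the threshold) over 10 time units using 12 transmissions, the delivery ratio is 0.75, throughput 0.3, energy efficiency 0.25, and the fresh delivery ratio 0.5.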

3. Design of Experiments

Factors that might affect the performance of the proposed routing algorithm include (but are not limited to): (1) the topology of the sensor network; (2) the radio range of the sensor nodes; (3) the moving speed of the mobile sink(s); (4) the number of mobile sinks; (5) the maximum delay for retransmission; (6) the discount rate of the learning algorithm; and (7) the parent switch threshold. A fractional factorial design with main effects aliased with two-factor interactions (i.e., a resolution III design) was used to test the significance of these factors. Factors (1) to (5) are environmental factors that are usually beyond the control of a WSN system. Factors (6) and (7), as well as the weights in the objective function, are design variables to be optimized. The Response Surface Method (RSM) based on regression analysis is employed here to perform the optimization. The three objectives to be balanced are energy efficiency, throughput, and energy variance. Because the weights in the objective function form a three-fold value (see Eq. (1)), a mixture design using an augmented simplex lattice is used. Factors (6) and (7) are then optimized using a Central Composite Design (CCD). Here it is assumed that there is little interaction between these two factors and the three weights, so the two optimizations can be carried out separately. The results are summarized in Tables 1 and 2. Next, three sets of simulation outputs are provided; the corresponding experimental settings are summarized in Table 3. Fig. 2 presents various performance plots.
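The three kinds of designs named above can be generated in a few lines. This is a sketch under stated assumptions, not a reproduction of the Design-Expert runs: the 2^(7-4) fraction with generators D = AB, E = AC, F = BC, G = ABC is one standard resolution III design for seven factors (the paper does not say which fraction it used); the CCD is the rotatable two-factor version with axial distance √2 and five center points; and the mixture design is a {3,2} simplex lattice augmented with the centroid and three axial check blends. `decode` shows the usual mapping from coded to real units, with -1 and +1 at the factorial low and high settings.

```python
import itertools
import math
from typing import List, Tuple


def frac_factorial_7_4() -> List[Tuple[int, ...]]:
    """2^(7-4) resolution III: full factorial in A, B, C; D=AB, E=AC, F=BC, G=ABC."""
    return [(a, b, c, a * b, a * c, b * c, a * b * c)
            for a, b, c in itertools.product((-1, 1), repeat=3)]


def ccd_2(n_center: int = 5) -> List[Tuple[float, float]]:
    """Rotatable central composite design for two factors, in coded units."""
    alpha = math.sqrt(2.0)                       # (2^k)^(1/4) with k = 2 factors
    factorial = [(x, y) for x in (-1.0, 1.0) for y in (-1.0, 1.0)]
    axial = [(-alpha, 0.0), (alpha, 0.0), (0.0, -alpha), (0.0, alpha)]
    return factorial + axial + [(0.0, 0.0)] * n_center


def augmented_simplex_lattice_3_2() -> List[Tuple[float, float, float]]:
    """{3,2} simplex lattice plus the overall centroid and axial check blends."""
    pts = set()
    for i, j in itertools.product(range(3), repeat=2):
        p = [0.0, 0.0, 0.0]
        p[i] += 0.5
        p[j] += 0.5                              # vertices (i == j) and edge midpoints
        pts.add(tuple(p))
    pts.add((1 / 3, 1 / 3, 1 / 3))               # centroid
    for i in range(3):
        p = [1 / 6] * 3
        p[i] = 2 / 3                             # axial check blend toward vertex i
        pts.add(tuple(p))
    return sorted(pts)


def decode(coded: float, low: float, high: float) -> float:
    """Map a coded level to real units (-1 -> low, +1 -> high)."""
    return (low + high) / 2 + coded * (high - low) / 2
```

In the fractional factorial, every column is balanced and orthogonal to the others, but each main effect shares its column with a two-factor interaction (for example, D with AB), which is exactly the resolution III aliasing described above.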

Table 1. Design-Expert Output of Response Surface Method (RSM) Optimization for Weights in the Objective Function Using Mixture Design

Energy weight | Queue weight | Neighbor weight | Throughput | Energy efficiency | Energy variance | Desirability
0.381 | 0.619 | 0.000 | 30.3091 | 4.67154 | 0.0136694 | 0.805

Table 2. Design-Expert Output of Response Surface Method (RSM) Optimization for the Learning Rate and Parent Switch Threshold Using Central Composite Design

Learning rate | Parent switch threshold | Throughput | Energy efficiency | Energy variance | Desirability
-1.40* | 0.65* | 31.5475 | 4.94037 | 0.0124798 | 1.000

(* Coded values. The corresponding real values are 0.05 for the learning rate and 0.74 for the parent switch threshold.)

Table 3. Experimental Settings for Selected Simulation Output

Setting | Number of mobile sinks | Learning rate [1] | Parent switch threshold [1] | Max delay [2] | Radio range [3] | Topology [4] | Sink moving speed [5]
#1 | 4 | 0.2 | 0.2 | 4000 | 2 | RG | 1
#2 | 4 | 0.5 | 0.2 | 1000 | 1 | RG | 3
#3 | 1 | 0.5 | 0 | 4000 | 3 | RGro | 3

Remarks: [1] Ranges from 0 to 1. [2] In abstract simulation time (sec). [3] In multiples of the average distance between sensor nodes (d). [4] RG stands for Rectangle Grid; RGro stands for Rectangle Grid with random offset. [5] In d/sec.

4. Discussion of Results

The results of the Analysis of Variance (ANOVA) suggest that all factors are significant for the Energy Efficiency metric. Among these factors, the number of mobile sinks is the most significant, followed by the parent switch threshold and the radio range. The maximum delay for retransmission is not very significant for the Delivery Ratio and Throughput. The relatively small learning rate and moderate parent switch threshold suggested by the statistical optimization imply that the learning algorithm should adapt to the dynamics of the system very quickly in terms of assessing system states, but maintain relatively stable routing paths so as to reduce the cost of route discovery and the risks of uncertainty. The relatively large weight assigned to the queue length by the statistical optimization indicates that load balancing is as important as the purely energy-aware measure in achieving the energy-saving goal. An overloaded node has several disadvantages, such as transmission delay, increased retransmission (doubled by sending acknowledgements), and depleting its energy much more quickly than other nodes. The zero weight for the neighborhood suggested by the statistical optimization implies that the connection-aware goal is debatable: a node with more neighbors has more options for dispatching its load, but also has a greater probability of being included in a routing path, which implies more energy consumption. The performance plots suggest that all performance measures except Lifetime stabilized after 10 s of simulation time. This was the result of both the convergence of system states and the end of learning. In addition, since overlapping radio signals of similar strength interfere with each other and thus cannot trigger data transmission, a low delivery ratio is expected.

[Figure: four panels, Delivery ratio, Throughput, Energy efficiency, and Average lifetime left in a sensor, each plotted against simulation time for Settings 1, 2, and 3]

Fig. 2. Performance Plots for 3 Experimental Settings

V. CONCLUSION AND FUTURE WORK

This research proposed a learning-based adaptive routing algorithm for fixed WSNs with mobile sinks. The learning mechanism is achieved through a reinforcement learning strategy using ACD. The highly dynamic nature of the WSN and mobile sink system favors an adaptive routing strategy. The approach proposed here has a number of attractive properties, including: (1) resilience to unpredictable link failures and mobile sink trajectories; (2) automatic adaptation to different routes when network conditions change; and (3) the ability to address and exploit the heterogeneity of sensor distribution. In this work, no assumption is made about the mobility of the mobile sinks. As a result, the model is applicable to a potentially wider set of applications than approaches that assume certain mobility patterns for the mobile sinks. For future research, other types of ACD design, such as SNAC (Single Network Adaptive Critic), can be tried so as to reduce the complexity of the learning system. More factors can be experimented with, and more scenarios, such as sensor node failure, can be studied. In addition, the performance of the proposed algorithm needs to be compared with that of other existing routing schemes.

REFERENCES
Akkaya, K., and Younis, M., 2004, Energy-aware routing to a mobile gateway in wireless sensor networks, Proceedings, Global Telecommunications Conference Workshops (GlobeCom), pp. 16-21.
Akkaya, K., and Younis, M., 2005, A survey on routing protocols for wireless sensor networks, Ad Hoc Networks, Vol. 3, No. 3, pp. 325-349.
Gandham, S., Dawande, M., Prakash, R., and Venkatesan, S., 2003, Energy efficient schemes for wireless sensor networks with multiple mobile base stations, Proceedings, Global Telecommunications Conference (GlobeCom), Vol. 1, pp. 377-381.
Hu, T., and Fei, Y., 2010, QELAR: A Machine-Learning-Based Adaptive Routing Protocol for Energy-Efficient and Lifetime-Extended Underwater Sensor Networks, IEEE Transactions on Mobile Computing, Vol. 9, No. 6, pp. 796-809.
Jain, S., Shah, R., Brunette, W., Borriello, G., and Roy, S., 2006, Exploiting mobility for energy efficient data collection in wireless sensor networks, Mob. Netw. Appl., Vol. 11, No. 3, pp. 327-339.
Luo, J., and Hubaux, J. P., 2005, Joint mobility and routing for lifetime elongation in wireless sensor networks, Proceedings, Int'l Conference on Computer Communications (Infocom), pp. 1735-1746.
Saleem, K., Fisal, N., Hafizah, S., Kamilah, S., and Rashid, R. A., 2009, A Self-Optimized Multipath Routing Protocol for Wireless Sensor Networks, International Journal of Recent Trends in Engineering, Vol. 2, No. 1.
Si, J., Barto, A. G., Powell, W. B., and Wunsch, D., 2004, Handbook of Learning and Approximate Dynamic Programming, Wiley-IEEE Press, John Wiley & Sons, Inc., pp. 125-152.
Simon, G., 2008, Prowler: Probabilistic wireless network simulator, http://www.isis.vanderbilt.edu/projects/nest/prowler/.
Sohrabi, K., and Pottie, J., 2000, Protocols for self-organization of a wireless sensor network, IEEE Personal Communications, Vol. 7, No. 5, pp. 16-27.
Wang, Z., Basagni, S., Melachrinoudis, E., and Petrioli, C., 2005, Exploiting sink mobility for maximizing sensor networks lifetime, Annual Hawaii Int'l Conference on System Sciences.
Wohlers, R., Trigoni, N., Zhang, R., and Ellwood, S., 2009, TwinRoute: Energy-Efficient Data Collection in Fixed Sensor Networks with Mobile Sinks, Proceedings, the 2009 Tenth International Conference on Mobile Data Management: Systems, Services and Middleware, Volume 00, pp. 192-201.
Zhang, Y., and Fromherz, M., 2004, Message-initiated constraint-based routing for wireless ad-hoc sensor networks, Proceedings, IEEE Consumer Communications and Networking Conference.
Zhang, Y., Fromherz, M., and Kuhn, L., 2004, Smart routing with learning-based QoS-aware meta-strategies, Proceedings, Quality of Service in the Emerging Networking, Lecture Notes in Computer Science 3266.
