Input: Time t.
Output: The measurement s_i[t] of a sensor i at time t.
Model: s_i[t] = h(t) = θ.
The model approximates the set of measurements with just one
parameter θ.
Preliminaries Model-driven Replicated models Aggregative approaches Conclusions
Learning with wireless sensor data
Motivation
Machine learning techniques can be used to reduce
communication by approximating sensor data with models.
Effective approach, as sensor data are
temporally and spatially related (correlations),
noisy: exact measurements are rarely needed.
Learning with wireless sensor data
1) Model-driven data acquisition
[diagram: sensor nodes 1-6 routing to a base station]
2) Replicated models
[diagram: models h_1, ..., h_4 maintained at both the sensor nodes and the base station]
3) Aggregative approaches
[diagram: nodes 1-4 aggregating measurements towards the base station]
[Figure 7, panels (a), (b), (c): temperature (°C) maps, scale 15-30]
Figure 7: Kernel regression model obtained from temperature data collected at the Intel Research, Berkeley lab, using 5
kernel regions, with 3 basis functions per region, at different times of the day (the circles represent the actual temperature at
the sensor locations): (a) at night, locations near windows are colder; (b) in the morning, the East side of the lab faces the sun,
significantly increasing the temperature; and (c) in the early evening, the temperature is uniformly warm.
[Figure 8: contour plot with measured sensor temperatures (19.9-29.0 °C) and a color scale from 16 to 28]
Figure 8: A contour plot generated by running kernel-based
quadratic regression on the data collected at 10 AM on October
28th in the Intel Research, Berkeley lab. The labels
represent the actual temperatures measured at the sensor
locations. Note that this rich contour was obtained from a
regression model with only 15 parameters.
the sliding window, T. We measured the root-mean-squared
(RMS) error between the model, f(x, y, t), and the value at every
point in the data set, D(x, y, t).
To measure the ability of regression to predict the value at
locations in the sensor field where there are no readings, we
also experimented with subsampling of the data set D to a
dataset with 1/8th of the original data, and measured
the RMS of regression applied to this dataset versus D.
The results of these error measurements for different basis
sets with varying time windows and subsampling are shown
in Figure 9. We experimented with three different basis
function sets per kernel: either (1) linear-space, quadratic-
time (e.g., f(x, y, t) = c_1 x + c_2 y + c_3 t^2 + c_4 t + c_5), (2)
linear-space, linear-time, or (3) linear-space, constant-time.
We also measured the RMS of simply computing the average
value of the readings in each kernel over the time window
T. Note that regression performs quite well compared to
averaging, and that, as expected, increasing the number of
basis functions increases the quality of the fit. Surprisingly,
regression using the reduced data set (with 1/8 the points)
performs as well as regression with the entire data set; this
is likely due to low variations in temperature within an 8-
reading (16 minute) window.
Since average error over an entire data set does not cap-
ture the worst-case performance of these approaches, we also
plotted the error of these schemes at different times of day,
using a time window size of two hours. The results of this
experiment are shown in Figure 10. Notice that the linear
[Figure 9: RMS (0-1.8) vs. size of time window in minutes (20-400), for: average over regions; constant in time; linear in time; quadratic in time; quadratic in time (using 1/8 of data); quadratic in time (10 kernels)]
Figure 9: The RMS error of regression with varying time
windows and numbers of basis functions per kernel for the
data set collected from the Intel - Berkeley Lab, compared
against simple averaging in each kernel.
[Figure 10: RMS (0-2.5) vs. time of day (6 pm through midnight, over two days), for: quadratic in time; linear in time; constant in time; average over regions]
Figure 10: The error of different regression models for the
lab data set at different times of day, using a time window
size of 2 hours.
and quadratic fits perform much better during times when
the temperature changes dramatically (e.g., when the sun
rises and sets; see Figure 6 for a plot of the temperature
readings from several sensors), but that all schemes perform
well at times of low variance (e.g., at night). This suggests
that an adaptive scheme, where different numbers of basis
functions are used depending on signal variability, may be
beneficial; such an exploration is left for future work.
We also measured the communication costs for our lab
deployment (using 3 coefficients per kernel) and found that,
in TOSSIM, the total number of bytes sent by all sensors
was 5808 bytes versus 875 bytes to extract a single reading
from every sensor. After this 5800 bytes of communication,
Model-driven data acquisition (1)
[diagram: sensor nodes 1-6 and a base station; a table of N epochs of measurements from sensors 1-6]
S_q = {2, 3, 6}, S_p = {1, 4, 5}
Learning process: which sensors S_q ⊂ S can be used to predict the sensors S_p ⊂ S?
Partition S into S_q (queried) and S_p (predicted).
Find prediction models such that S_p are predicted from S_q:
h : R^|S_q| → R^|S_p|
s_q[t] → s_p[t]
Model-driven data acquisition (2)
Tradeoff communication costs/accuracy:
C(S_q): Communication cost of collecting the measurements
from S_q.
R(S_p): Accuracy of the predictions for sensors in S_p.
The goal of the optimization problem is to find the subset S_q that
[Deshpande et al., 2005]
1. minimizes C(S_q),
2. such that R(S_p) > ε, where ε is user-defined.
NP-hard!
Model-driven data acquisition (3)
[diagram: sensor nodes 1-6 and a base station]
Start with S_q = {1, 2, 3, 4, 5, 6}, S_p = {}.
Then greedily remove sensor 1, remove sensor 5, remove sensor 4:
S_q = {2, 3, 4, 5, 6}, S_p = {1}
S_q = {2, 3, 4, 6}, S_p = {1, 5}
S_q = {2, 3, 6}, S_p = {1, 4, 5}
Model-driven data acquisition (4)
All the planning is centralized.
Prediction model:
Any model can be used (depends on BS resources),
Linear models: Efficient and effective computational tricks
([Deshpande et al., 2005]).
μ_p|q = μ_p + Σ_pq Σ_qq^(-1) (s_q[t] − μ_q)
Σ_p|q = Σ_pp − Σ_pq Σ_qq^(-1) Σ_qp
P(s_i[t] ∈ [ŝ_i[t] − ε, ŝ_i[t] + ε]) > δ, for all i ∈ S_p
Communication costs:
Shortest path problem,
Different metrics possible (based on radio link quality,
remaining energy, load balancing, ...).
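The Gaussian conditioning formulas above can be exercised on a toy example. This is a minimal sketch assuming a jointly Gaussian model of the measurements; the mean vector, covariance matrix, and readings below are illustrative numbers, not data from the deployment.

```python
# Predict the sensors in S_p from those in S_q with the standard Gaussian
# conditioning formulas: mu_p|q = mu_p + S_pq S_qq^-1 (s_q - mu_q) and
# Sigma_p|q = S_pp - S_pq S_qq^-1 S_qp.
import numpy as np

mu = np.array([20.0, 21.0, 22.0])           # mean of (s_p, s_q), S_p = {1}, S_q = {2, 3}
cov = np.array([[1.0, 0.8, 0.6],
                [0.8, 1.0, 0.7],
                [0.6, 0.7, 1.0]])
p, q = [0], [1, 2]                          # index sets for predicted / queried sensors

s_q_obs = np.array([21.5, 21.8])            # measurements received from S_q at time t

cov_qq_inv = np.linalg.inv(cov[np.ix_(q, q)])
mu_p_given_q = mu[p] + cov[np.ix_(p, q)] @ cov_qq_inv @ (s_q_obs - mu[q])
cov_p_given_q = cov[np.ix_(p, p)] - cov[np.ix_(p, q)] @ cov_qq_inv @ cov[np.ix_(q, p)]
print(mu_p_given_q, cov_p_given_q)
```

The conditional variance Σ_p|q quantifies how confident the base station can be that the predicted value falls within the ε interval.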
Model-driven data acquisition (5)
Summary:
Advantages:
Sensor nodes in S_p remain in their sleeping mode (lowest
energy level),
Models are computed at the BS: exhaustive view of the
network (spatial and temporal).
Drawbacks:
The BS undertakes all the computations: not scalable,
Failure to get one measurement from S_q may lead to very
high errors (see Le Borgne et al., 2005 and Chu et al., 2006 for possible
improvements).
Replicated models
[diagram: sensor nodes 1-4 and a base station; models h_1, ..., h_4 maintained at both the nodes and the base station]
Replicated models
Overview
Recall: In environmental monitoring, a sensor sends its
measurements periodically.
Measurements s[t] are sent at every time t.
[diagram: wireless node streaming s[t] to the base station]
Replicated models:
Models h are sent instead of the measurements.
[diagram: wireless node sending a model h to the base station, which reconstructs s[t] from h]
Danger: the base station only sees the model's predictions.
Replicated models
Overview
Models computed by the sensor node.
The node can compare the model prediction with the true
measurements:
A new model is sent if |s[t] − ŝ[t]| > ε
ε is user-defined, and application dependent.
A simple learning procedure must be used. The most simple model:
Constant model [Olston et al., 2001]
ŝ_i[t] = s_i[t − 1]
Simply: The next measurement is predicted to be the same as the previous
one
no parameter to compute
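The update rule above can be sketched as a short simulation. This is an illustrative sketch of the replicated constant model, with a made-up measurement stream; it is not the Olston et al. protocol itself.

```python
# The node transmits only when the base station's copy of the last sent value
# drifts more than eps from the true measurement.
def constant_model_updates(measurements, eps):
    updates = []                      # values actually transmitted
    replica = None                    # value the base station currently holds
    for s in measurements:
        if replica is None or abs(s - replica) > eps:
            replica = s               # send an update; both sides now agree
            updates.append(s)
    return updates

stream = [20.0, 20.5, 21.0, 23.5, 23.8, 24.0, 19.0]
sent = constant_model_updates(stream, eps=2.0)
print(len(sent), "updates instead of", len(stream))
```

On this toy stream, 3 updates replace 7 transmissions; on the greenhouse data of the next slide, the same mechanism sends 5 updates instead of 58.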
Replicated models
Constant model
Temperature measurements, Solbosch greenhouse. ε = 2 °C.
[Figure: temperature (°C) over time instants 160-220; sensor node measurements vs. the base station's constant-model reconstruction, accuracy 2 °C]
5 updates instead of 58:
more than 90% of communication savings.
Replicated models
Autoregressive models
More complex models can be used: autoregressive models AR(p)
[Santini et al., Tulone et al., 2006].
s[t] = φ_1 s[t − 1] + ... + φ_p s[t − p]
[Figure: temperature (°C) over time (hours 0-20); AR(2) reconstruction, accuracy 2 °C]
An AR(2) reduces the number
of updates by 6 percent
in comparison to
the constant model.
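A replicated AR(p) model can be sketched the same way. This is an assumption-laden sketch: coefficients are fitted by ordinary least squares on an initial window, both sides are assumed to share the first p values, and the synthetic sinusoidal stream stands in for real temperature data.

```python
# Sketch of a replicated AR(p) model: the base station predicts each new value
# from the p previous ones; the node sends a correction only when the
# prediction error exceeds eps.
import numpy as np

def fit_ar(history, p):
    X = np.array([history[t - p:t][::-1] for t in range(p, len(history))])
    y = np.array(history[p:])
    phi, *_ = np.linalg.lstsq(X, y, rcond=None)
    return phi                         # phi[j-1] multiplies s[t-j]

def ar_updates(stream, phi, eps):
    p, sent = len(phi), 0
    window = list(stream[:p])          # both sides know the first p values
    for s in stream[p:]:
        pred = float(np.dot(phi, window[::-1]))
        if abs(s - pred) > eps:
            sent += 1                  # transmit the true value
            window.append(s)
        else:
            window.append(pred)        # both sides keep using the prediction
        window.pop(0)
    return sent

t = np.arange(100)
stream = list(20 + 5 * np.sin(2 * np.pi * t / 50))
phi = fit_ar(stream[:30], p=2)
print(ar_updates(stream, phi, eps=0.5))
```

Note that the node must run the base station's predictor locally, so that both sides agree on the reconstructed signal between updates.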
Replicated models
Pros and cons
Pros:
Guarantees a user-defined accuracy ε to the observer.
Simple or complex models can be used.
Cons:
In most cases, no a priori information is available on the
measurements. Which model to choose a priori?
Adaptive Model Selection [Le Borgne et al., 2007]
Motivation
Tradeoff: More complex models better predict measurements,
but have a higher number of parameters.
[Figure: communication costs and model error as a function of model complexity]
AR(p): s_i[t] = Σ_{j=1}^p φ_j s_i[t − j]
Adaptive Model Selection
Collection of models
A collection of K models {h_k}, 1 ≤ k ≤ K, of increasing
complexity are run by the node.
[diagram: wireless node running {h_1, h_2, ..., h_K} and sending the selected model (here h_2) to the base station]
W_k: new metric estimating the communication costs.
When an update is needed, the model with the lowest W_k is
sent.
Adaptive Model Selection
Metric to assess communication savings
W_k: the weighted update rate
W_k = C_k · U_k
Update rate U_k: percentage of updates for model k ([Olston,
2001, Jain et al., 2004, Santini et al., Tulone et al., 2006]).
Model cost C_k: takes into account the number of parameters
of the k-th model.
C_k = P / (P − D + 1)
P: Size of the packet.
D: Size of the data load.
P − D is the packet overhead.
[diagram of the packet layout: SYNC BYTE, Packet Type, Address Type, Message ID, Group Length, Data bytes (up to size D), CRC, SYNC BYTE; total size P]
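A toy computation makes the metric concrete. This is a sketch under stated assumptions: the packet size, bytes per parameter, and update rates below are illustrative, and the cost formula C_k = P / (P − D + 1) is reconstructed from the slide (whose minus signs were lost in extraction), not taken verbatim from [Le Borgne et al., 2007].

```python
# Toy computation of the weighted update rate W_k = C_k * U_k: a model with
# more parameters has a larger data load D, hence a higher cost C_k, so it
# must save enough updates to be worth sending.
P = 36                                  # packet size in bytes (assumption)

def weighted_update_rate(n_params, update_rate, bytes_per_param=2):
    D = n_params * bytes_per_param      # data load of the model's parameters
    C = P / (P - D + 1)                 # cost grows with the number of parameters
    return C * update_rate

models = {"CM": (0, 0.40), "AR(2)": (2, 0.30), "AR(5)": (5, 0.28)}
W = {name: weighted_update_rate(n, u) for name, (n, u) in models.items()}
best = min(W, key=W.get)
print(best, W)
```

Here AR(2) wins: its lower update rate outweighs its extra parameters, while AR(5) saves too few updates to justify its cost.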
Adaptive Model Selection
Model selection
When data collection starts, no idea which model is best!
As time passes, estimates of W_k can be obtained by the
nodes.
Let W_k[t] be an estimate of W_k at time t.
As time passes, the confidence in the estimates W_k[t] gets
higher.
Racing [Maron, 1997]: Statistical model selection technique
based on the Hoeffding bound, which allows poorly
performing models to be discarded.
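The racing idea can be sketched as follows. This is a simplified illustration, not Maron's algorithm as used in AMS: the confidence level `delta`, the value range `a`, and the two synthetic update-rate streams are assumptions of this toy example.

```python
# Racing with a Hoeffding bound: each model keeps a running mean of its
# weighted update rate (values assumed to lie in [0, a]); a model is discarded
# once its lower confidence bound exceeds the best model's upper bound.
import math, random

def hoeffding_eps(a, n, delta):
    # With probability 1 - delta, the true mean is within eps of the estimate.
    return a * math.sqrt(math.log(2 / delta) / (2 * n))

def race(samples_by_model, a=1.0, delta=0.05):
    alive = set(samples_by_model)
    means = {k: sum(v) / len(v) for k, v in samples_by_model.items()}
    for k in list(alive):
        n = len(samples_by_model[k])
        eps = hoeffding_eps(a, n, delta)
        best = min(means[j] for j in alive)
        if means[k] - eps > best + eps:   # k is statistically worse than the best
            alive.discard(k)
    return alive

random.seed(0)
samples = {"h1": [random.uniform(0.2, 0.4) for _ in range(200)],
           "h6": [random.uniform(0.7, 0.9) for _ in range(200)]}
print(race(samples))
```

As in the figures that follow, the clearly worse model (h_6 here) is eliminated once the confidence intervals separate, while the competitive ones keep racing.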
Adaptive Model Selection
Racing
[Figure: weighted update rate estimates W_1[t], ..., W_6[t] for the model types h_1, ..., h_6, each with a Hoeffding upper bound; a lower bound is shown for W_1[t]]
At first, all models are in competition.
[Figure: the confidence bounds tighten; the lower bound of W_6[t] rises above the upper bound of W_1[t]]
As time passes, model h_1 statistically outperforms h_6, which is discarded.
[Figure: W_6[t] has been discarded; the remaining bounds keep tightening]
h_3 then statistically outperforms h_5.
[Figure: only W_3[t] remains below the lower bounds of the other models]
h_3 is finally selected as the best one.
Adaptive Model Selection
Experimental evaluation
14 time series, various types of measured physical quantities.
Data set Sensed quantity Sampling period Duration Number of samples
S Heater temperature 3 seconds 6h15 3000
I Light light 5 minutes 8 days 1584
M Hum humidity 10 minutes 30 days 4320
M Temp temperature 10 minutes 30 days 4320
NDBC WD wind direction 1 hour 1 year 7564
NDBC WSPD wind speed 1 hour 1 year 7564
NDBC DPD dominant wave period 1 hour 1 year 7562
NDBC AVP average wave period 1 hour 1 year 8639
NDBC BAR air pressure 1 hour 1 year 8639
NDBC ATMP air temperature 1 hour 1 year 8639
NDBC WTMP water temperature 1 hour 1 year 8734
NDBC DEWP dewpoint temperature 1 hour 1 year 8734
NDBC GST gust speed 1 hour 1 year 8710
NDBC WVHT wave height 1 hour 1 year 8723
Error threshold ε is set to 0.01r, where r is the range of the
measurements.
AMS is run with K = 6 models: the constant model (CM)
and autoregressive models AR(p) with p ranging from 1 to 5.
Adaptive Model Selection
Experimental evaluation
CM AR1 AR2 AR3 AR4 AR5 AMS
S Heater 74 78 68 70 76 81 AR2
I Light 38 42 44 48 51 53 CM
M Hum 53 55 55 60 62 66 CM
M Temp 48 50 50 54 56 60 CM
NDBC DPD 65 89 89 95 102 109 CM
NDBC AWP 72 75 81 88 93 99 CM
NDBC BAR 51 52 44 47 49 50 AR2
NDBC ATMP 39 41 40 43 46 49 CM
NDBC WTMP 27 28 23 25 27 28 AR2
NDBC DEWP 57 54 58 62 67 71 AR1
NDBC WSPD 74 87 92 99 106 113 CM
NDBC WD 85 84 91 98 104 111 AR1
NDBC GST 80 84 90 96 103 110 CM
NDBC WVHT 58 58 63 67 71 76 CM
Bold numbers report significantly better weighted update
rates (Hoeffding bound, δ = 0.05).
For all time series, the AMS selects the best model.
Replicated models
Summary:
Advantages:
Allows sensor nodes to determine autonomously the model
which best fits their measurements,
Distributes the computation among the sensor nodes.
Drawbacks:
In case of sensor failure, the BS assumes that the model is
still correct,
Mainly for temporal modeling. Extensions to spatial modeling
do not work that well in practice [Silberstein et al., 2005; Chu et al., 2006].
Aggregative approaches
Routing tree
The network is made of S sensors, S = {1, 2, ..., S}.
Radio range is limited.
[diagram: base station, routing tree, radio range]
Nodes create child/parent links to form a routing tree.
Data collection in wireless sensor networks
Let s_i[t] be the measurement of node i at time t, 1 ≤ i ≤ S.
At time t, all the measurements are sent to the base station
by means of the routing tree.
[diagram: base station and routing tree]
Data collection in wireless sensor networks
The communication load is much higher at nodes close to the
base station.
The batteries of these nodes expire first.
The root node is the bottleneck:
if it fails, the rest of the network is disconnected!
Data aggregation
Some operations, like the sum or the average of the
measurements, can be easily distributed in a network.
Computation of the average of the measurements:
s̄ = (s_1[t] + s_2[t] + ... + s_S[t]) / S = (Σ_{i=1}^S s_i[t]) / S
[diagram: nodes 1-9 and a base station; each node forwards the partial sum of its subtree (e.g. s_1[t], then Σ_{i=1}^3 s_i[t], then Σ_{i=1}^6 s_i[t], up to Σ_{i=1}^9 s_i[t]); the base station divides by 9 to obtain the average]
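The in-network average can be sketched as a recursion over the routing tree. This is an illustrative sketch: the tree layout and the readings below are assumptions loosely modeled on the 9-node example of the slide, not its exact topology.

```python
# In-network averaging: each node forwards to its parent the partial sum of
# its own subtree, so the base station receives one value per child instead
# of one packet per sensor.
def subtree_sum(node, children, reading):
    total = reading[node]
    for c in children.get(node, []):
        total += subtree_sum(c, children, reading)
    return total

children = {0: [3, 6, 9],                       # 0 is the base station
            3: [1, 2], 6: [4, 5], 9: [7, 8]}
reading = {i: float(i) for i in range(1, 10)}   # s_i[t] = i, for illustration
reading[0] = 0.0

avg = subtree_sum(0, children, reading) / 9
print(avg)
```

In a real deployment the recursion runs bottom-up across the radio links: leaves transmit first, and each parent adds its children's partial sums before transmitting once.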
Network load distribution
Collection of all the measurements s_i[t]:
[Figure: number of packets processed (Tx + Rx) per epoch at each of the sensor nodes 1-9 (square grid topology of side 3, store-and-forward collection); the load peaks at the node next to the base station]
Highest network load: 17.
The highest network load is L^D_max = (S − 1) + S = 2S − 1 = 17.
Network load distribution
With aggregation:
[Figure: number of packets processed (Tx + Rx) per epoch at each of the sensor nodes 1-9 (square grid topology of side 3, with in-network aggregation); every node processes at most 3 packets]
Highest network load: 3.
The highest network load is L^A_max = max_i |C_i| + 1 = 3, where max_i |C_i|
is the maximum number of children in the routing tree.
Data aggregation
Advantages of aggregation:
Reduces the communication.
The network load does not depend on the network size:
scalability.
Changes the network load distribution: the root node is no
longer the bottleneck.
Data aggregation
Operators:
Count, sum, average, min, max (SQL operators, [Madden et
al., 2005]),
Distributed regression [Guestrin, 2004].
Middleware:
Tiny Diffusion [Intanagonwiwat, 2000],
Tiny Aggregation (TAG) [Madden et al., 2005],
Dozer [Burri, 2007], ...
Principal Component Aggregation [Le Borgne et al., 2008]
Principal Component Analysis
PCA: Reduces data dimensionality by finding the subspace of
dimension q that best represents the data.
[diagram: a point (s_1[t], s_2[t], s_3[t]) ∈ R^S, S coordinates, projected to (z_1[t], z_2[t]) ∈ R^q, q < S coordinates]
Here, S = 3, q = 2.
Principal Component Analysis
Properties
PCA: versatile technique for
compression,
noise filtering,
event detection,
event recognition.
It is particularly appropriate when data are correlated (as is
the case with wireless sensor network data).
Principal Component Analysis
Illustrative example
[Figure: reconstructions with no noise and with SNR = 1, using 1 PC, 3 PCs, and 100 PCs]
Three different phenomena, appearing and
disappearing over 20 epochs.
Thirty cycles of 3*20 epochs. First cycle used
for learning, and the remaining used for
testing.
The sensor network is a 10*10 grid of sensor
nodes.
Dramatic dimensionality reduction (from 100 to 3!) and
noise filtering.
Principal Component Analysis
Computation
Initialization stage: on the basis of N observations, minimize
arg min_{W ∈ R^{S×q}} Σ_{t=1}^N ||s[t] − W Wᵀ s[t]||²
The PCA basis W (q ≤ S) is given by the first q eigenvectors of
the covariance matrix of the measurements.
Dimensionality reduction: projection on the PC basis,
z[t] = Wᵀ s[t], with z[t] ∈ R^q and s[t] ∈ R^S.
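The computation above can be reproduced in a few lines of NumPy. This is a minimal sketch on synthetic correlated data (the mixing model and dimensions are assumptions), not the actual sensor measurements of the example.

```python
# Minimal PCA sketch matching the slide: W holds the first q eigenvectors of
# the covariance matrix of the S measurement streams, and z[t] = W.T @ s[t].
import numpy as np

rng = np.random.default_rng(0)
N, S, q = 500, 5, 2
latent = rng.normal(size=(N, q))                          # q underlying phenomena
mixing = rng.normal(size=(q, S))
data = latent @ mixing + 0.01 * rng.normal(size=(N, S))   # correlated sensors

cov = np.cov(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)                    # ascending eigenvalues
W = eigvecs[:, ::-1][:, :q]                               # first q principal directions

z = data @ W                                              # projections z[t] = W.T s[t]
recon = z @ W.T
err = np.linalg.norm(data - recon) / np.linalg.norm(data)
print(round(err, 4))
```

Because the 5 streams are driven by only 2 latent phenomena, 2 components reconstruct the data almost perfectly, which is exactly the correlated-sensors situation PCA exploits.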
Principal Component Aggregation
In-network projections
Coordinates z_k[t] are products w_kᵀ s[t] = Σ_{i=1}^S w_(i,k) s_i[t].
If the elements w_(i,k), 1 ≤ k ≤ q, are made available to each
sensor during the initialization stage, these products can be
computed with an aggregation service.
[diagram: nodes 1-9 aggregating partial sums w_(1,1) s_1[t], Σ_{i=1}^3 w_(i,1) s_i[t], ..., up to Σ_{i=1}^9 w_(i,1) s_i[t] = w_1ᵀ s[t] = z_1[t] at the base station]
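The in-network projection is the same tree recursion as for the average, with each node applying its own weight first. A sketch, assuming a hypothetical 9-node tree and uniform first-PC weights for illustration:

```python
# Principal Component Aggregation: each node scales its reading by its own
# weight w[(i, k)] and adds the partial sums of its children, so that
# z_k[t] = sum_i w[(i, k)] * s_i[t] reaches the base station as one scalar
# per component.
def project(node, children, reading, w, k):
    partial = w.get((node, k), 0.0) * reading.get(node, 0.0)
    for c in children.get(node, []):
        partial += project(c, children, reading, w, k)
    return partial

children = {0: [3, 6, 9], 3: [1, 2], 6: [4, 5], 9: [7, 8]}  # 0 is the base station
reading = {i: float(i) for i in range(1, 10)}
w = {(i, 1): 1.0 / 3.0 for i in range(1, 10)}   # hypothetical first PC weights

z1 = project(0, children, reading, w, 1)
print(z1)
```

Running this once per component k = 1, ..., q yields the full projection z[t] ∈ R^q with at most q packets per link, which is the 3q bound of the next slide.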
Network load distribution
With aggregation of q principal components:
[Figure: number of packets processed (Tx + Rx) per epoch at each of the sensor nodes 1-9 (square grid topology of side 3); with q = 1, each node processes at most 3 packets to deliver z[t] ∈ R^q]
Highest network load: 3q.
The highest network load is L^PCAg_max = q (max_i |C_i| + 1), where max_i |C_i|
is the maximum number of children in the routing tree.
Principal Component Aggregation
Data accuracy
When measurements are correlated, few PCs retain most of the
variance (P(q): proportion of retained variance).
[Figure: P(q) as a function of the number q of PCs, from 1 to S, for white noise, low correlations, and high correlations; the more correlated the data, the faster P(q) approaches 1]
Principal Component Aggregation
Cross-layer optimization
[Figure: highest network load as a function of the maximum number of children max_i |C_i| (5 to 30) in the routing tree, for different action types: D action, 1 A action, 3 A actions, 5 A actions]
Reducing the number of children max_i |C_i| in the routing tree
further reduces the communication costs of the PCAg.
Aggregative approaches
Summary:
Advantages:
Allows data to be transformed within the sensor network.
Applications: Compression, noise filtering, event recognition
or detection.
Scalability: The network load does not depend on network
size.
Drawbacks:
Requires an initialization stage,
Sensitive to sensor failures and outliers.
Other experimental setups
Several deployments at the ULB.
Data sets (and code) available at www.ulb.ac.be/di/labo.
Microclimate monitoring - Solbosch greenhouses:
18 sensors in three greenhouses. Several experiments.
Data collected: Temperature, humidity and light.
Sampling interval: 5 minutes.
Experimental setups monitoring - Unit of Social Ecology:
18 sensors in three experimental labs, running for 5 days.
Data collected: Temperature, humidity and light.
Sampling interval: 5 minutes.
PIMAN project (Region Bruxelles Capitale - 2007/2008):
Goal: localize an operator in an industrial environment.
Techniques: Triangulation, multidimensional scaling, Kalman filters.
Several deployments (up to 48 sensors).
Perspectives
Next challenges for learning and wireless networks:
Wireless sensor and actuator networks:
sense and control.
Ultrawide band:
cognitive radios.
Miniaturization goes on:
scalable data processing techniques.
[timeline: 1950, 1990, 2000, 2010, 2020, ...?]
Thank you for your attention!
Questions?