Input: Time t.
Output: The measurement s_i[t] of a sensor i at time t.
Model: s_i[t] = h(t) = θ.
The model approximates the set of measurements with just one
parameter θ.
Preliminaries Model-driven Replicated models Aggregative approaches Conclusions
Learning with wireless sensor data
Motivation
Machine learning techniques can be used to reduce
communication by approximating sensor data with models.
Effective approach, as sensor data are
temporally and spatially related (correlations),
noisy: exact measurements are rarely needed.
Learning with wireless sensor data
1) Model-driven data acquisition
[diagram: sensor nodes 1-6 routing to a base station]
2) Replicated models
[diagram: models h_1, ..., h_4 maintained at both the sensor nodes and the base station]
3) Aggregative approaches
[diagram: nodes 1-4 aggregating measurements towards the base station]
[Figure 7, panels (a), (b), (c): temperature (°C) maps, scale 15-30]
Figure 7: Kernel regression model obtained from temperature data collected at the Intel Research, Berkeley lab, using 5
kernel regions, with 3 basis functions per region, at different times of the day (the circles represent the actual temperature at
the sensor locations): (a) at night, locations near windows are colder; (b) in the morning, the East side of the lab faces the sun,
significantly increasing the temperature; and (c) in the early evening, the temperature is uniformly warm.
[Figure 8: contour plot with measured sensor temperatures (19.9-29.0 °C) and a color scale from 16 to 28]
Figure 8: A contour plot generated by running kernel-based
quadratic regression on the data collected at 10 AM on October
28th in the Intel Research, Berkeley lab. The labels
represent the actual temperatures measured at the sensor
locations. Note that this rich contour was obtained from a
regression model with only 15 parameters.
the sliding window, T. We measured the root-mean-squared
(RMS) error between the model, f(x, y, t), and the value at every
point in the data set, D(x, y, t).
To measure the ability of regression to predict the value at
locations in the sensor field where there are no readings, we
also experimented with subsampling of the data set D to a
dataset with 1/8th of the original data, and measured
the RMS of regression applied to this dataset versus D.
The results of these error measurements for different basis
sets with varying time windows and subsampling are shown
in Figure 9. We experimented with three different basis
function sets per kernel: either (1) linear-space, quadratic-
time (e.g., f(x, y, t) = c_1 x + c_2 y + c_3 t^2 + c_4 t + c_5), (2)
linear-space, linear-time, or (3) linear-space, constant-time.
We also measured the RMS of simply computing the average
value of the readings in each kernel over the time window
T. Note that regression performs quite well compared to
averaging, and that, as expected, increasing the number of
basis functions increases the quality of the fit. Surprisingly,
regression using the reduced data set (with 1/8 the points)
performs as well as regression with the entire data set; this
is likely due to low variations in temperature within an 8-
reading (16 minute) window.
Since average error over an entire data set does not cap-
ture the worst-case performance of these approaches, we also
plotted the error of these schemes at different times of day,
using a time window size of two hours. The results of this
experiment are shown in Figure 10. Notice that the linear
[Figure 9: RMS (0-1.8) vs. size of time window in minutes (20-400), for: average over regions; constant in time; linear in time; quadratic in time; quadratic in time (using 1/8 of data); quadratic in time (10 kernels)]
Figure 9: The RMS error of regression with varying time
windows and numbers of basis functions per kernel for the
data set collected from the Intel - Berkeley Lab, compared
against simple averaging in each kernel.
[Figure 10: RMS (0-2.5) vs. time of day (6 pm through midnight, over two days), for: quadratic in time; linear in time; constant in time; average over regions]
Figure 10: The error of different regression models for the
lab data set at different times of day, using a time window
size of 2 hours.
and quadratic fits perform much better during times when
the temperature changes dramatically (e.g., when the sun
rises and sets; see Figure 6 for a plot of the temperature
readings from several sensors), but that all schemes perform
well at times of low variance (e.g., at night). This suggests
that an adaptive scheme, where different numbers of basis
functions are used depending on signal variability, may be
beneficial; such an exploration is left for future work.
We also measured the communication costs for our lab
deployment (using 3 coefficients per kernel) and found that,
in TOSSIM, the total number of bytes sent by all sensors
was 5808 bytes versus 875 bytes to extract a single reading
from every sensor. After this 5800 bytes of communication,
Model-driven data acquisition (1)
[diagram: sensor nodes 1-6 and a base station; a table of N epochs of measurements from sensors 1-6]
S_q = {2, 3, 6}, S_p = {1, 4, 5}
Learning process: which sensors S_q ⊂ S can be used to predict the sensors S_p ⊂ S?
Partition S into S_q (queried) and S_p (predicted).
Find prediction models such that S_p are predicted from S_q:
h : R^|S_q| → R^|S_p|
s_q[t] → s_p[t]
Model-driven data acquisition (2)
Tradeoff communication costs/accuracy:
C(S_q): Communication cost of collecting the measurements
from S_q.
R(S_p): Accuracy of the predictions for sensors in S_p.
The goal of the optimization problem is to find the subset S_q that
[Deshpande et al., 2005]
1. minimizes C(S_q),
2. such that R(S_p) > ε, where ε is user-defined.
NP-hard!
Model-driven data acquisition (3)
[diagram: sensor nodes 1-6 and a base station]
Start with S_q = {1, 2, 3, 4, 5, 6}, S_p = {}.
Then greedily remove sensor 1, remove sensor 5, remove sensor 4:
S_q = {2, 3, 4, 5, 6}, S_p = {1}
S_q = {2, 3, 4, 6}, S_p = {1, 5}
S_q = {2, 3, 6}, S_p = {1, 4, 5}
Model-driven data acquisition (4)
All the planning is centralized.
Prediction model:
Any model can be used (depends on BS resources),
Linear models: Efficient and effective computational tricks
([Deshpande et al., 2005]).
μ_p|q = μ_p + Σ_pq Σ_qq^(-1) (s_q[t] − μ_q)
Σ_p|q = Σ_pp − Σ_pq Σ_qq^(-1) Σ_qp
P(s_i[t] ∈ [ŝ_i[t] − ε, ŝ_i[t] + ε]) > δ, for all i ∈ S_p
Communication costs:
Shortest path problem,
Different metrics possible (based on radio link quality,
remaining energy, load balancing, ...).
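The Gaussian conditioning formulas above can be exercised on a toy example. This is a minimal sketch assuming a jointly Gaussian model of the measurements; the mean vector, covariance matrix, and readings below are illustrative numbers, not data from the deployment.

```python
# Predict the sensors in S_p from those in S_q with the standard Gaussian
# conditioning formulas: mu_p|q = mu_p + S_pq S_qq^-1 (s_q - mu_q) and
# Sigma_p|q = S_pp - S_pq S_qq^-1 S_qp.
import numpy as np

mu = np.array([20.0, 21.0, 22.0])           # mean of (s_p, s_q), S_p = {1}, S_q = {2, 3}
cov = np.array([[1.0, 0.8, 0.6],
                [0.8, 1.0, 0.7],
                [0.6, 0.7, 1.0]])
p, q = [0], [1, 2]                          # index sets for predicted / queried sensors

s_q_obs = np.array([21.5, 21.8])            # measurements received from S_q at time t

cov_qq_inv = np.linalg.inv(cov[np.ix_(q, q)])
mu_p_given_q = mu[p] + cov[np.ix_(p, q)] @ cov_qq_inv @ (s_q_obs - mu[q])
cov_p_given_q = cov[np.ix_(p, p)] - cov[np.ix_(p, q)] @ cov_qq_inv @ cov[np.ix_(q, p)]
print(mu_p_given_q, cov_p_given_q)
```

The conditional variance Σ_p|q quantifies how confident the base station can be that the predicted value falls within the ε interval.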
Model-driven data acquisition (5)
Summary:
Advantages:
Sensor nodes in S_p remain in their sleeping mode (lowest
energy level),
Models are computed at the BS: exhaustive view of the
network (spatial and temporal).
Drawbacks:
The BS undertakes all the computations: not scalable,
Failure to get one measurement from S_q may lead to very
high errors (see Le Borgne et al., 2005 and Chu et al., 2006 for possible
improvements).
Replicated models
[diagram: sensor nodes 1-4 and a base station; models h_1, ..., h_4 maintained at both the nodes and the base station]
Replicated models
Overview
Recall: In environmental monitoring, a sensor sends its
measurements periodically.
Measurements s[t] are sent at every time t.
[diagram: wireless node streaming s[t] to the base station]
Replicated models:
Models h are sent instead of the measurements.
[diagram: wireless node sending a model h to the base station, which reconstructs s[t] from h]
Danger: the base station only sees the model's predictions.
Replicated models
Overview
Models computed by the sensor node.
The node can compare the model prediction with the true
measurements:
A new model is sent if |s[t] − ŝ[t]| > ε
ε is user-defined, and application dependent.
A simple learning procedure must be used. The most simple model:
Constant model [Olston et al., 2001]
ŝ_i[t] = s_i[t − 1]
Simply: The next measurement is predicted to be the same as the previous
one
no parameter to compute
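The update rule above can be sketched as a short simulation. This is an illustrative sketch of the replicated constant model, with a made-up measurement stream; it is not the Olston et al. protocol itself.

```python
# The node transmits only when the base station's copy of the last sent value
# drifts more than eps from the true measurement.
def constant_model_updates(measurements, eps):
    updates = []                      # values actually transmitted
    replica = None                    # value the base station currently holds
    for s in measurements:
        if replica is None or abs(s - replica) > eps:
            replica = s               # send an update; both sides now agree
            updates.append(s)
    return updates

stream = [20.0, 20.5, 21.0, 23.5, 23.8, 24.0, 19.0]
sent = constant_model_updates(stream, eps=2.0)
print(len(sent), "updates instead of", len(stream))
```

On this toy stream, 3 updates replace 7 transmissions; on the greenhouse data of the next slide, the same mechanism sends 5 updates instead of 58.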
Replicated models
Constant model
Temperature measurements, Solbosch greenhouse. ε = 2 °C.
[Figure: temperature (°C) over time instants 160-220; sensor node measurements vs. the base station's constant-model reconstruction, accuracy 2 °C]
5 updates instead of 58:
more than 90% of communication savings.
Replicated models
Autoregressive models
More complex models can be used: autoregressive models AR(p)
[Santini et al., Tulone et al., 2006].
s[t] = φ_1 s[t − 1] + ... + φ_p s[t − p]
[Figure: temperature (°C) over time (hours 0-20); AR(2) reconstruction, accuracy 2 °C]
An AR(2) reduces the number
of updates by 6 percent
in comparison to
the constant model.
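A replicated AR(p) model can be sketched the same way. This is an assumption-laden sketch: coefficients are fitted by ordinary least squares on an initial window, both sides are assumed to share the first p values, and the synthetic sinusoidal stream stands in for real temperature data.

```python
# Sketch of a replicated AR(p) model: the base station predicts each new value
# from the p previous ones; the node sends a correction only when the
# prediction error exceeds eps.
import numpy as np

def fit_ar(history, p):
    X = np.array([history[t - p:t][::-1] for t in range(p, len(history))])
    y = np.array(history[p:])
    phi, *_ = np.linalg.lstsq(X, y, rcond=None)
    return phi                         # phi[j-1] multiplies s[t-j]

def ar_updates(stream, phi, eps):
    p, sent = len(phi), 0
    window = list(stream[:p])          # both sides know the first p values
    for s in stream[p:]:
        pred = float(np.dot(phi, window[::-1]))
        if abs(s - pred) > eps:
            sent += 1                  # transmit the true value
            window.append(s)
        else:
            window.append(pred)        # both sides keep using the prediction
        window.pop(0)
    return sent

t = np.arange(100)
stream = list(20 + 5 * np.sin(2 * np.pi * t / 50))
phi = fit_ar(stream[:30], p=2)
print(ar_updates(stream, phi, eps=0.5))
```

Note that the node must run the base station's predictor locally, so that both sides agree on the reconstructed signal between updates.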
Replicated models
Pros and cons
Pros:
Guarantees a user-defined accuracy ε to the observer.
Simple or complex models can be used.
Cons:
In most cases, no a priori information is available on the
measurements. Which model to choose a priori?
Adaptive Model Selection [Le Borgne et al., 2007]
Motivation
Tradeoff: More complex models better predict measurements,
but have a higher number of parameters.
[Figure: communication costs and model error as a function of model complexity]
AR(p): s_i[t] = Σ_{j=1}^p φ_j s_i[t − j]
Adaptive Model Selection
Collection of models
A collection of K models {h_k}, 1 ≤ k ≤ K, of increasing
complexity are run by the node.
[diagram: wireless node running {h_1, h_2, ..., h_K} and sending the selected model (here h_2) to the base station]
W_k: new metric estimating the communication costs.
When an update is needed, the model with the lowest W_k is
sent.
Adaptive Model Selection
Metric to assess communication savings
W_k: the weighted update rate
W_k = C_k · U_k
Update rate U_k: percentage of updates for model k ([Olston,
2001, Jain et al., 2004, Santini et al., Tulone et al., 2006]).
Model cost C_k: takes into account the number of parameters
of the k-th model.
C_k = P / (P − D + 1)
P: Size of the packet.
D: Size of the data load.
P − D is the packet overhead.
[diagram of the packet layout: SYNC BYTE, Packet Type, Address Type, Message ID, Group Length, Data bytes (up to size D), CRC, SYNC BYTE; total size P]
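A toy computation makes the metric concrete. This is a sketch under stated assumptions: the packet size, bytes per parameter, and update rates below are illustrative, and the cost formula C_k = P / (P − D + 1) is reconstructed from the slide (whose minus signs were lost in extraction), not taken verbatim from [Le Borgne et al., 2007].

```python
# Toy computation of the weighted update rate W_k = C_k * U_k: a model with
# more parameters has a larger data load D, hence a higher cost C_k, so it
# must save enough updates to be worth sending.
P = 36                                  # packet size in bytes (assumption)

def weighted_update_rate(n_params, update_rate, bytes_per_param=2):
    D = n_params * bytes_per_param      # data load of the model's parameters
    C = P / (P - D + 1)                 # cost grows with the number of parameters
    return C * update_rate

models = {"CM": (0, 0.40), "AR(2)": (2, 0.30), "AR(5)": (5, 0.28)}
W = {name: weighted_update_rate(n, u) for name, (n, u) in models.items()}
best = min(W, key=W.get)
print(best, W)
```

Here AR(2) wins: its lower update rate outweighs its extra parameters, while AR(5) saves too few updates to justify its cost.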
Adaptive Model Selection
Model selection
When data collection starts, no idea which model is best!
As time passes, estimates of W_k can be obtained by the
nodes.
Let W_k[t] be an estimate of W_k at time t.
As time passes, the confidence in the estimates W_k[t] gets
higher.
Racing [Maron, 1997]: Statistical model selection technique
based on the Hoeffding bound, which allows poorly
performing models to be discarded.
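The racing idea can be sketched as follows. This is a simplified illustration, not Maron's algorithm as used in AMS: the confidence level `delta`, the value range `a`, and the two synthetic update-rate streams are assumptions of this toy example.

```python
# Racing with a Hoeffding bound: each model keeps a running mean of its
# weighted update rate (values assumed to lie in [0, a]); a model is discarded
# once its lower confidence bound exceeds the best model's upper bound.
import math, random

def hoeffding_eps(a, n, delta):
    # With probability 1 - delta, the true mean is within eps of the estimate.
    return a * math.sqrt(math.log(2 / delta) / (2 * n))

def race(samples_by_model, a=1.0, delta=0.05):
    alive = set(samples_by_model)
    means = {k: sum(v) / len(v) for k, v in samples_by_model.items()}
    for k in list(alive):
        n = len(samples_by_model[k])
        eps = hoeffding_eps(a, n, delta)
        best = min(means[j] for j in alive)
        if means[k] - eps > best + eps:   # k is statistically worse than the best
            alive.discard(k)
    return alive

random.seed(0)
samples = {"h1": [random.uniform(0.2, 0.4) for _ in range(200)],
           "h6": [random.uniform(0.7, 0.9) for _ in range(200)]}
print(race(samples))
```

As in the figures that follow, the clearly worse model (h_6 here) is eliminated once the confidence intervals separate, while the competitive ones keep racing.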
Adaptive Model Selection
Racing
[Figure: weighted update rate estimates W_1[t], ..., W_6[t] for the model types h_1, ..., h_6, each with a Hoeffding upper bound; a lower bound is shown for W_1[t]]
At first, all models are in competition.
[Figure: the confidence bounds tighten; the lower bound of W_6[t] rises above the upper bound of W_1[t]]
As time passes, model h_1 statistically outperforms h_6, which is discarded.
[Figure: W_6[t] has been discarded; the remaining bounds keep tightening]
h_3 then statistically outperforms h_5.
[Figure: only W_3[t] remains below the lower bounds of the other models]
h_3 is finally selected as the best one.
Adaptive Model Selection
Experimental evaluation
14 time series, various types of measured physical quantities.
Data set Sensed quantity Sampling period Duration Number of samples
S Heater temperature 3 seconds 6h15 3000
I Light light 5 minutes 8 days 1584
M Hum humidity 10 minutes 30 days 4320
M Temp temperature 10 minutes 30 days 4320
NDBC WD wind direction 1 hour 1 year 7564
NDBC WSPD wind speed 1 hour 1 year 7564
NDBC DPD dominant wave period 1 hour 1 year 7562
NDBC AVP average wave period 1 hour 1 year 8639
NDBC BAR air pressure 1 hour 1 year 8639
NDBC ATMP air temperature 1 hour 1 year 8639
NDBC WTMP water temperature 1 hour 1 year 8734
NDBC DEWP dewpoint temperature 1 hour 1 year 8734
NDBC GST gust speed 1 hour 1 year 8710
NDBC WVHT wave height 1 hour 1 year 8723
Error threshold ε is set to 0.01r, where r is the range of the
measurements.
AMS is run with K = 6 models: the constant model (CM)
and autoregressive models AR(p) with p ranging from 1 to 5.
Adaptive Model Selection
Experimental evaluation
CM AR1 AR2 AR3 AR4 AR5 AMS
S Heater 74 78 68 70 76 81 AR2
I Light 38 42 44 48 51 53 CM
M Hum 53 55 55 60 62 66 CM
M Temp 48 50 50 54 56 60 CM
NDBC DPD 65 89 89 95 102 109 CM
NDBC AWP 72 75 81 88 93 99 CM
NDBC BAR 51 52 44 47 49 50 AR2
NDBC ATMP 39 41 40 43 46 49 CM
NDBC WTMP 27 28 23 25 27 28 AR2
NDBC DEWP 57 54 58 62 67 71 AR1
NDBC WSPD 74 87 92 99 106 113 CM
NDBC WD 85 84 91 98 104 111 AR1
NDBC GST 80 84 90 96 103 110 CM
NDBC WVHT 58 58 63 67 71 76 CM
Bold numbers report significantly better weighted update
rates (Hoeffding bound, δ = 0.05).
For all time series, the AMS selects the best model.
Replicated models
Summary:
Advantages:
Allows sensor nodes to determine autonomously the model
which best fits their measurements,
Distributes the computation among the sensor nodes.
Drawbacks:
In case of sensor failure, the BS assumes that the model is
still correct,
Mainly for temporal modeling. Extensions to spatial modeling
do not work that well in practice [Silberstein et al., 2005; Chu et al., 2006].
Aggregative approaches
Routing tree
The network is made of S sensors, S = {1, 2, ..., S}.
Radio range is limited.
[diagram: base station, routing tree, radio range]
Nodes create child/parent links to form a routing tree.
Data collection in wireless sensor networks
Let s_i[t] be the measurement of node i at time t, 1 ≤ i ≤ S.
At time t, all the measurements are sent to the base station
by means of the routing tree.
[diagram: base station and routing tree]
Data collection in wireless sensor networks
The communication load is much higher at nodes close to the
base station.
The batteries of these nodes expire first.
The root node is the bottleneck:
if it fails, the rest of the network is disconnected!
Data aggregation
Some operations, like the sum or the average of the
measurements, can be easily distributed in a network.
Computation of the average of the measurements:
s̄ = (s_1[t] + s_2[t] + ... + s_S[t]) / S = (Σ_{i=1}^S s_i[t]) / S
[diagram: nodes 1-9 and a base station; each node forwards the partial sum of its subtree (e.g. s_1[t], then Σ_{i=1}^3 s_i[t], then Σ_{i=1}^6 s_i[t], up to Σ_{i=1}^9 s_i[t]); the base station divides by 9 to obtain the average]
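The in-network average can be sketched as a recursion over the routing tree. This is an illustrative sketch: the tree layout and the readings below are assumptions loosely modeled on the 9-node example of the slide, not its exact topology.

```python
# In-network averaging: each node forwards to its parent the partial sum of
# its own subtree, so the base station receives one value per child instead
# of one packet per sensor.
def subtree_sum(node, children, reading):
    total = reading[node]
    for c in children.get(node, []):
        total += subtree_sum(c, children, reading)
    return total

children = {0: [3, 6, 9],                       # 0 is the base station
            3: [1, 2], 6: [4, 5], 9: [7, 8]}
reading = {i: float(i) for i in range(1, 10)}   # s_i[t] = i, for illustration
reading[0] = 0.0

avg = subtree_sum(0, children, reading) / 9
print(avg)
```

In a real deployment the recursion runs bottom-up across the radio links: leaves transmit first, and each parent adds its children's partial sums before transmitting once.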
Network load distribution
Collection of all the measurements s_i[t]:
[Figure: number of packets processed (Tx + Rx) per epoch at each of the sensor nodes 1-9 (square grid topology of side 3, store-and-forward collection); the load peaks at the node next to the base station]
Highest network load: 17.
The highest network load is L^D_max = (S − 1) + S = 2S − 1 = 17.
Network load distribution
With aggregation:
[Figure: number of packets processed (Tx + Rx) per epoch at each of the sensor nodes 1-9 (square grid topology of side 3, with in-network aggregation); every node processes at most 3 packets]
Highest network load: 3.
The highest network load is L^A_max = max_i |C_i| + 1 = 3, where max_i |C_i|
is the maximum number of children in the routing tree.
Data aggregation
Advantages of aggregation:
Reduces the communication.
The network load does not depend on the network size:
scalability.
Changes the network load distribution: the root node is no
longer the bottleneck.
Data aggregation
Operators:
Count, sum, average, min, max (SQL operators, [Madden et
al., 2005]),
Distributed regression [Guestrin, 2004].
Middleware:
Tiny Diffusion [Intanagonwiwat, 2000],
Tiny Aggregation (TAG) [Madden et al., 2005],
Dozer [Burri, 2007], ...
Principal Component Aggregation [Le Borgne et al., 2008]
Principal Component Analysis
PCA: Reduces data dimensionality by finding the subspace of
dimension q that best represents the data.
[diagram: a point (s_1[t], s_2[t], s_3[t]) ∈ R^S, S coordinates, projected to (z_1[t], z_2[t]) ∈ R^q, q < S coordinates]
Here, S = 3, q = 2.
Principal Component Analysis
Properties
PCA: versatile technique for
compression,
noise filtering,
event detection,
event recognition.
It is particularly appropriate when data are correlated (as is
the case with wireless sensor network data).
Principal Component Analysis
Illustrative example
[Figure: reconstructions with no noise and with SNR = 1, using 1 PC, 3 PCs, and 100 PCs]
Three different phenomena, appearing and
disappearing over 20 epochs.
Thirty cycles of 3*20 epochs. First cycle used
for learning, and the remaining used for
testing.
The sensor network is a 10*10 grid of sensor
nodes.
Dramatic dimensionality reduction (from 100 to 3!) and
noise filtering.
Principal Component Analysis
Computation
Initialization stage: on the basis of N observations, minimize
arg min_{W ∈ R^{S×q}} Σ_{t=1}^N ||s[t] − W Wᵀ s[t]||²
The PCA basis W (q ≤ S) is given by the first q eigenvectors of
the covariance matrix of the measurements.
Dimensionality reduction: projection on the PC basis,
z[t] = Wᵀ s[t], with z[t] ∈ R^q and s[t] ∈ R^S.
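The computation above can be reproduced in a few lines of NumPy. This is a minimal sketch on synthetic correlated data (the mixing model and dimensions are assumptions), not the actual sensor measurements of the example.

```python
# Minimal PCA sketch matching the slide: W holds the first q eigenvectors of
# the covariance matrix of the S measurement streams, and z[t] = W.T @ s[t].
import numpy as np

rng = np.random.default_rng(0)
N, S, q = 500, 5, 2
latent = rng.normal(size=(N, q))                          # q underlying phenomena
mixing = rng.normal(size=(q, S))
data = latent @ mixing + 0.01 * rng.normal(size=(N, S))   # correlated sensors

cov = np.cov(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)                    # ascending eigenvalues
W = eigvecs[:, ::-1][:, :q]                               # first q principal directions

z = data @ W                                              # projections z[t] = W.T s[t]
recon = z @ W.T
err = np.linalg.norm(data - recon) / np.linalg.norm(data)
print(round(err, 4))
```

Because the 5 streams are driven by only 2 latent phenomena, 2 components reconstruct the data almost perfectly, which is exactly the correlated-sensors situation PCA exploits.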
Principal Component Aggregation
In-network projections
Coordinates z_k[t] are products w_kᵀ s[t] = Σ_{i=1}^S w_(i,k) s_i[t].
If the elements w_(i,k), 1 ≤ k ≤ q, are made available to each
sensor during the initialization stage, these products can be
computed with an aggregation service.
[diagram: nodes 1-9 aggregating partial sums w_(1,1) s_1[t], Σ_{i=1}^3 w_(i,1) s_i[t], ..., up to Σ_{i=1}^9 w_(i,1) s_i[t] = w_1ᵀ s[t] = z_1[t] at the base station]
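The in-network projection is the same tree recursion as for the average, with each node applying its own weight first. A sketch, assuming a hypothetical 9-node tree and uniform first-PC weights for illustration:

```python
# Principal Component Aggregation: each node scales its reading by its own
# weight w[(i, k)] and adds the partial sums of its children, so that
# z_k[t] = sum_i w[(i, k)] * s_i[t] reaches the base station as one scalar
# per component.
def project(node, children, reading, w, k):
    partial = w.get((node, k), 0.0) * reading.get(node, 0.0)
    for c in children.get(node, []):
        partial += project(c, children, reading, w, k)
    return partial

children = {0: [3, 6, 9], 3: [1, 2], 6: [4, 5], 9: [7, 8]}  # 0 is the base station
reading = {i: float(i) for i in range(1, 10)}
w = {(i, 1): 1.0 / 3.0 for i in range(1, 10)}   # hypothetical first PC weights

z1 = project(0, children, reading, w, 1)
print(z1)
```

Running this once per component k = 1, ..., q yields the full projection z[t] ∈ R^q with at most q packets per link, which is the 3q bound of the next slide.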
Network load distribution
With aggregation of q principal components:
[Figure: number of packets processed (Tx + Rx) per epoch at each of the sensor nodes 1-9 (square grid topology of side 3); with q = 1, each node processes at most 3 packets to deliver z[t] ∈ R^q]
Highest network load: 3q.
The highest network load is L^PCAg_max = q (max_i |C_i| + 1), where max_i |C_i|
is the maximum number of children in the routing tree.
Principal Component Aggregation
Data accuracy
When measurements are correlated, few PCs retain most of the
variance (P(q): proportion of retained variance).
[Figure: P(q) as a function of the number q of PCs, from 1 to S, for white noise, low correlations, and high correlations; the more correlated the data, the faster P(q) approaches 1]
Principal Component Aggregation
Cross-layer optimization
[Figure: highest network load as a function of the maximum number of children max_i |C_i| (5 to 30) in the routing tree, for different action types: D action, 1 A action, 3 A actions, 5 A actions]
Reducing the number of children max_i |C_i| in the routing tree
further reduces the communication costs of the PCAg.
Aggregative approaches
Summary:
Advantages:
Allows data to be transformed within the sensor network.
Applications: Compression, noise filtering, event recognition
or detection.
Scalability: The network load does not depend on network
size.
Drawbacks:
Requires an initialization stage,
Sensitive to sensor failures and outliers.
Other experimental setups
Several deployments at the ULB.
Data sets (and code) available at www.ulb.ac.be/di/labo.
Microclimate monitoring - Solbosch greenhouses:
18 sensors in three greenhouses. Several experiments.
Data collected: Temperature, humidity and light.
Sampling interval: 5 minutes.
Experimental setups monitoring - Unit of Social Ecology:
18 sensors in three experimental labs, running for 5 days.
Data collected: Temperature, humidity and light.
Sampling interval: 5 minutes.
PIMAN project (Region Bruxelles Capitale - 2007/2008):
Goal: localize an operator in an industrial environment.
Techniques: Triangulation, multidimensional scaling, Kalman filters.
Several deployments (up to 48 sensors).
Perspectives
Next challenges for learning and wireless networks:
Wireless sensor and actuator networks:
sense and control.
Ultrawide band:
cognitive radios.
Miniaturization goes on:
scalable data processing techniques.
[timeline: 1950, 1990, 2000, 2010, 2020, ...?]
Thank you for your attention!
Questions?