Context
promoted by the National Competence Center in Research (NCCR), under the authority of the Swiss National Science Foundation (SNSF)
goal: study fundamental & applied questions raised by future mobile communication and information systems
call for proposals for 2nd phase (Nov. 2005 to Oct. 2009)
research cluster: in-network information management, supporting end-to-end information management for sensor and mobile ad-hoc networks
Semantic integration of heterogeneous sensor network resources and backend databases exploiting the temporal and spatial dimension is required to make sensor data available through the Internet and Grid infrastructure
Outline
1. Introduction
Definition, Applications, Differences, Storage
2. Queries
2.1. Querying in Cougar
2.2. Querying in TinyDB
2.3. In-network Aggregation
3. Other Issues
1. Introduction
a relatively new field; it can benefit from current efforts in data streams and P2P networks
1. Introduction
Sensor network:
10s to 100s of autonomous nodes that operate without human interaction (e.g. configuration of network routes, recharging of batteries, tuning of parameters) for weeks or months
sensor node:
battery-powered, wireless computer
physically small (a few cubic centimeters)
extremely low power (a few tens of milliwatts, versus tens of watts for a laptop)
Power = watts (W) = amps (A) * volts (V)
Energy = joules (J) = W * time
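The power and energy definitions above translate directly into a back-of-the-envelope lifetime estimate for a node. The battery capacity, voltage, and power draws below are hypothetical, chosen only to illustrate the arithmetic:

```python
# Back-of-the-envelope energy budget for a hypothetical sensor node.
# Power (W) = current (A) * voltage (V); Energy (J) = power (W) * time (s).

def battery_energy_joules(capacity_mah: float, voltage_v: float) -> float:
    """Convert a battery rating (mAh at a given voltage) to joules."""
    coulombs = (capacity_mah / 1000.0) * 3600.0  # A*s
    return coulombs * voltage_v                  # J = A*s * V

def lifetime_days(energy_j: float, avg_power_w: float) -> float:
    """How long the node runs at a given average power draw."""
    seconds = energy_j / avg_power_w
    return seconds / 86400.0

# Two AA cells: roughly 2500 mAh at 3 V (illustrative figures).
e = battery_energy_joules(2500, 3.0)     # 27000 J
print(lifetime_days(e, 0.030))           # ~10 days at a constant 30 mW
print(lifetime_days(e, 0.001))           # ~312 days at a 1 mW duty-cycled average
```

The two print lines show why duty-cycling matters: dropping the average draw from 30 mW to 1 mW turns days of lifetime into months.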
1. Introduction
takes time-stamped measurements of physical phenomena (e.g. temperature, light, sound, air pressure): sensor data
contains its own characteristics (e.g. id, location, type of sensor): stored data
a sensor network database: the combination of sensor data and stored data from every sensor
Modern sensors:
do not only respond to physical signals to produce data; they embed computing and communication capabilities: they are able to store, process locally, and transfer the data they produce
1. Introduction
Many applications monitor the physical world by querying and analyzing sensor data, e.g.:
supervising items in a factory warehouse (temperature)
organizing vehicle traffic in a large city (vehicle passing)
monitoring earthquakes in shake-test sites
monitoring habitat: petrels (birds) on Great Duck Island
1. Introduction
Differences between sensor network DB and other DB, at the physical level:
node memory is limited by cost and energy considerations, unlike disk storage, which has become incredibly inexpensive
the system is highly volatile (nodes may be depleted, links may go down), yet it should provide the illusion of a stable environment
access to data may be hampered by long delays; the rates at which input data arrive at a DB operator can be highly variable
rates and availability of data have to be continuously monitored (it is not enough to make a query execution plan once)
1. Introduction
relational tables are not static (new data is continuously being sensed); they are regarded as append-only streams, where certain reordering operations are no longer available
the high energy cost of communication encourages in-network processing during query execution
query processing has to be closely coupled and co-optimized with the networking layer
sensor tasking interacts with the sensor database system
classical metrics of DBMS performance have to be adjusted
1. Introduction
Differences between sensor network DB and other DB, at the logical level:
sensor network data consists of measurements from the physical world, which include errors (e.g. noise)
hence range queries (instead of exact queries) and probabilistic or approximate queries are used
1. Introduction
warehousing (centralized) approach: data is extracted from the sensor network in a predefined way and stored in a database located on a unique front-end server (connected to the network via an access point); query processing takes place on this centralized database
well suited for answering predefined queries over historical data
disadvantages:
nodes near the access point become traffic hot spots and central points of failure, and may be depleted of energy prematurely
does not take advantage of in-network aggregation of data to reduce the communication load when only aggregate data needs to be reported
sampling rates have to be set to the highest that might be needed for any potential query, possibly burdening the network with unnecessary traffic
1. Introduction
in-network (distributed) approach: stores the data within the network itself and allows queries to be injected anywhere in the network
efficient:
only relevant data are extracted from the sensor network
data can be aggregated before it is sent out of the network to an external querier
1. Introduction
network usage:
total usage: total number of packets sent in the network
hot-spot usage: max. number of packets processed by any particular node
preprocessing time: time to construct an index
storage space: storage for data and index
query time: time to process a query, assemble an answer, and return it
throughput: average number of queries processed per unit of time
update and maintenance cost: cost of sensor data insertions and deletions, and of repairs when nodes fail
1. Introduction
persistence: stored data must remain available to queries, despite sensor node failures and changes in the network topology
consistency: a query must be routed correctly to a node where the data are stored
controlled access to data: different update operations must not undo one another's work, and queries must always see a valid state of the DB
scalability in network size: as the number of nodes increases, the total storage capacity should increase, and the communication cost should not grow unduly
load balancing: storage should not unduly burden any node, nor should any node become a concentration point of communication
topological generality: the DB architecture should work well on a broad range of network topologies
2. Queries
Queries to a sensor network DB are expressed at a logical, declarative level, using SQL.
Example: a flood warning system. A user from an emergency management agency sends a query to the flood sensor DB:
"for the next 3 hours, retrieve every 10 minutes the maximum rainfall level in each county in Southern California, if it is greater than 3.0 inches"
select max(rainfall_level), county
from Sensors
where state = 'Southern California'
group by county
having max(rainfall_level) > 3.0
duration [now, now + 180 min]
sampling period 10 min
2. Queries
Characteristics of queries:
a query is expressed over one table comprising all sensors in the network; each sensor corresponds to a row in the table
it is assumed that the DB schema is known at a fixed base station; for a P2P system where a query may originate from any node, the DB schema would have to be broadcast to every node
monitoring queries are long-running
duration clause: the period during which data is to be collected
sampling period clause: the frequency at which query results are returned
the desired result is a set of notifications of system activity (periodic or triggered by special situations)
2. Queries
most queries contain some conditions on the sensors involved (usually geographical conditions)
2. Queries
3 types of queries:
A user interacting with a sensor DB will issue a sequence of queries to obtain information, using the outputs of past queries as inputs to further commands
2.1. Querying in Cougar
Cougar maintains an SQL-like query interface for users at a front-end server connected to a sensor network
represents sensor data as time series (each measurement is associated with a timestamp)
creates an abstract data type (ADT) for each type of sensor (e.g. temperature sensors, seismic sensors)
an ADT provides access to encapsulated data through a set of functions
assumes that the nodes are time-synchronized reasonably well, so there is no misalignment when multiple time series are aggregated
a measurement may not be instantaneously available (network delays), so Cougar introduces virtual relations (defined for ADT methods):
relations that are not actually materialized; views that are persistent during their associated time interval
whenever a signal processing function returns a value, a record is inserted into the virtual relation (append-only); records are never updated or deleted
Example: the simplified schema of the sensor DB contains one relation Sensors(loc POINT, floor INT, s SENSORNODE)
loc: location of the sensor
floor: floor where the sensor is located in the data warehouse
s: sensor node
SENSORNODE is an ADT with the methods getTemp() and detectAlarmTemp(threshold), where threshold is the temperature above which abnormal temperatures are returned. Both methods return the temperature as a float.
Query2: every minute, return the temperature measured by all sensors on the third floor
select Sensors.s.getTemp()
from Sensors
where Sensors.floor = 3 and $every(60);
Query3: generate a notification whenever two sensors within 5 yards of each other measure simultaneously an abnormal temperature
select S1.s.detectAlarmTemp(100), S2.s.detectAlarmTemp(100)
from Sensors S1, Sensors S2
where distance(S1.loc, S2.loc) < 5 and S1.s > S2.s and $every();
it is expensive to transmit data from all sensors to the front-end server, where the query processing could be performed, so Cougar considers distributed (in-network) query processing
ex: push the selection (max(rainfall_level) > 3.0 in) out to each sensor, so that only those that satisfy the condition return a virtual record (level measurement + sensor ID + timestamp)
to model uncertainty (due to device noise, environmental perturbations), Cougar uses a Gaussian ADT (GADT)
models uncertainty as a continuous probability distribution function over measurement values
GADT has a set of defined functions: Prob, Diff, Conf
ex: retrieve from Sensors all tuples whose temperature is within 0.5 degrees of 68 degrees, with at least 60% probability
select *
from Sensors
where Sensors.s.getTemp().Prob([67.5, 68.5]) >= 0.6
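As a sketch of what Prob computes, the probability mass a Gaussian assigns to an interval can be evaluated with the error function. The class below is illustrative, not Cougar's implementation; only the idea of Prob over an interval comes from the slides, and the mean and standard deviation are assumed values:

```python
import math

# Illustrative Gaussian ADT: a measurement is modeled as a Gaussian over
# possible true values; prob(lo, hi) is the probability mass in [lo, hi].
class GaussianADT:
    def __init__(self, mean: float, stddev: float):
        self.mean, self.stddev = mean, stddev

    def _cdf(self, x: float) -> float:
        # Standard Gaussian CDF via the error function.
        return 0.5 * (1.0 + math.erf((x - self.mean) / (self.stddev * math.sqrt(2))))

    def prob(self, lo: float, hi: float) -> float:
        """P(lo <= value <= hi) under the Gaussian model."""
        return self._cdf(hi) - self._cdf(lo)

# A temperature read as 68.0 with an assumed noise stddev of 0.4 degrees:
t = GaussianADT(68.0, 0.4)
# The query's predicate: within 0.5 degrees of 68, with >= 60% probability.
print(t.prob(67.5, 68.5) >= 0.6)   # True: about 79% of the mass lies in the interval
```

With a larger stddev (noisier sensor) the same tuple would fail the 60% threshold, which is exactly the filtering behavior the query relies on.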
2.2. Querying in TinyDB
TinyDB is a distributed query processor that runs on the Berkeley Mica mote platform, on top of the TinyOS operating system
successful deployments in the Intel Berkeley Lab and in redwood trees at the UC Botanical Garden
largest deployment: ~80 weather-station nodes collecting dense sensor readings to monitor climatic variations across altitudes, angles, time, forest locations, etc., and to study how dense sensor data affect the predictions of conventional tree-growth models
goal: reduce power consumption (placing new sensors and replacing or recharging sensor batteries is time-consuming and expensive)
idea: smart sensors have control over where, when, and how often data is physically acquired (i.e. sampled) and delivered to query processing operators
TinyDB has the features of a traditional query processor (select, join, project, aggregate), plus special acquisitional query processing (ACQP) features
a query is submitted at a PC (the base station), parsed, and optimized
the query is sent into the sensor network, disseminated, and processed
results flow back up the routing tree that was formed as the query propagated
Snoozing mode: processor and radio are idle, waiting for a timer to expire or an external event to wake the device
Processing mode: entered when the device wakes; query results are generated locally
Processing and Receiving mode: results are collected from neighbors over the radio
Transmitting mode: results for the query are delivered by the local mote
Communication:
Typical communication distances for low power wireless radios: few feet to around 100 feet
short ranges imply multi-hop communication where intermediate nodes relay information for their peers
Routing tree:
allows a base station at the root of the network to disseminate a query and collect query results
formed as follows:
the root sends a request; all child nodes that hear this request process it and forward it on to their children, and so on, until the entire network has heard the request
each node picks a parent node (the one with the most reliable connection to the root, i.e. the highest link quality); this parent is responsible for forwarding the node's (and its children's) query results to the base station
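The flooding-and-parent-selection procedure can be sketched as a breadth-first traversal. The data structures and link-quality values below are illustrative, not TinyDB's actual protocol state:

```python
from collections import deque

# Sketch of routing-tree formation: the root's request floods outward, and
# each node, on first hearing it, picks as parent the tree member it can
# hear with the highest link quality.
def build_routing_tree(root, neighbors, link_quality):
    """neighbors: node -> list of nodes; link_quality: (node, neighbor) -> float."""
    parent = {root: None}
    frontier = deque([root])
    while frontier:
        n = frontier.popleft()
        for child in neighbors[n]:
            if child not in parent:
                # First time this node hears the request: choose the best
                # parent among the tree members within radio range.
                candidates = [m for m in neighbors[child] if m in parent]
                parent[child] = max(candidates, key=lambda m: link_quality[(child, m)])
                frontier.append(child)
    return parent

# Hypothetical 4-node topology: C hears both A and B, but B's link is better.
nbrs = {"R": ["A", "B"], "A": ["R", "C"], "B": ["R", "C"], "C": ["A", "B"]}
lq = {("A", "R"): 0.9, ("B", "R"): 0.8, ("C", "A"): 0.4, ("C", "B"): 0.7}
print(build_routing_tree("R", nbrs, lq))   # {'R': None, 'A': 'R', 'B': 'R', 'C': 'B'}
```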
[Figure: routing-tree formation; the root's request R:{} propagates hop by hop while each of the nodes A to F picks a parent.]
Lina Al-Jadir, April 2005
declarative SQL-like query interface (selection, join, projection, aggregation) + explicit support for sampling and windowing
views the entire sensor network as a single, infinitely-long logical table, with columns for all the attributes defined in the network:
sensor readings (one column per sensor type)
meta-data: node id, location, etc.
internal states: routing-tree parent, timestamp, etc.
Query1: return sensor id, light and temperature readings, once per second for 10 seconds
select nodeid, light, temp
from Sensors
sample interval 1s for 10s
results of a query:
stream to the root, where they may be logged or output to the user
output: a sequence of tuples, each of which includes a timestamp
some blocking operations (e.g. sort) are not allowed over such streams, unless a bounded subset of the stream, or window, is specified
windows are defined as fixed-size materialization points over streams:
create storage point recentLight size 8 as
(select nodeid, light from Sensors sample interval 10s)
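The fixed-size materialization point above behaves like a ring buffer: new samples push the oldest ones out. A minimal sketch, with the size 8 and the (nodeid, light) tuples taken from the statement above and everything else illustrative:

```python
from collections import deque

# A fixed-size materialization point ("storage point") as a ring buffer.
class StoragePoint:
    def __init__(self, size: int):
        self.window = deque(maxlen=size)   # old tuples fall off automatically

    def insert(self, row: tuple) -> None:
        self.window.append(row)

    def rows(self):
        return list(self.window)

recent_light = StoragePoint(size=8)
for i in range(10):                        # 10 samples into an 8-slot window
    recent_light.insert((1, 100 + i))
print(len(recent_light.rows()))            # 8: only the most recent samples survive
print(recent_light.rows()[0])              # (1, 102): the first two samples aged out
```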
Joins are allowed between 2 storage points on the same node, or between a storage point and the Sensors relation.
Query2: return the number of recent light readings (from 0 to 8 in the past) that were brighter than the current reading, every 10 seconds (landmark query)
select count(*)
from Sensors s, recentLight r
where r.nodeid = s.nodeid and r.light > s.light
sample interval 10s
TinyDB supports grouped aggregations and temporal operations
Query3: return the average volume over the last 30 seconds, once every 5 seconds, sampling once per second (sliding-window query)
select winavg(volume, 30s, 5s)
from Sensors
sample interval 1s
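With a 1 s sample interval, the 30 s window and 5 s slide amount to "average the last 30 samples, emit every 5th sample". A sketch of that semantics, illustrative only and not TinyDB's implementation:

```python
from collections import deque

# Sliding-window average: keep up to `window` recent samples, emit an
# average every `slide` samples.
def winavg(samples, window: int, slide: int):
    buf = deque(maxlen=window)
    out = []
    for i, v in enumerate(samples, start=1):
        buf.append(v)
        if i % slide == 0:          # emit on every slide boundary
            out.append(sum(buf) / len(buf))
    return out

# 10 samples; the window (30) is not yet full, so averages cover all samples so far.
print(winavg(range(1, 11), window=30, slide=5))   # [3.0, 5.5]
```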
queries can be stopped via a STOP QUERY id command, limited to run for a period via a FOR clause, or given a stopping condition expressed as an event
Event-based queries:
events are a mechanism for initiating data collection
events are generated explicitly, either by another query or by the operating system
Query4: report the average light and temperature at sensors near a bird nest when a bird has been detected
on event bird-detect(loc):
select avg(light), avg(temp), event.loc
from Sensors s
where dist(s.loc, event.loc) < 10m
sample interval 2s for 30s
events allow the system to be dormant until some external condition occurs, instead of continually polling or blocking on an iterator waiting for data to arrive; this yields a significant reduction in power consumption
Lifetime-based queries:
in lieu of an explicit SAMPLE INTERVAL clause, users may request a specific query lifetime via a LIFETIME <x> clause, where x is a duration in days, weeks, or months
Query5: the network should run for at least 30 days, sampling the light and acceleration sensors at a rate that is as quick as possible
select nodeId, light, accel
from Sensors
lifetime 30 days
TinyDB computes a sampling and transmission rate from the number of joules of energy remaining
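The lifetime computation can be sketched as simple energy arithmetic: subtract the energy the node will spend idling over the requested lifetime, then spread the remaining budget over per-sample costs. The cost figures below are hypothetical, and TinyDB's actual estimation is more involved:

```python
# Simplified sketch of the LIFETIME clause computation: the fastest sample
# rate the remaining energy budget allows over the requested lifetime.
def max_sample_rate(joules_left: float, lifetime_s: float,
                    cost_per_sample_j: float, idle_power_w: float) -> float:
    """Sustainable samples per second over the requested lifetime."""
    idle_energy = idle_power_w * lifetime_s      # spent even if never sampling
    budget = joules_left - idle_energy
    if budget <= 0:
        return 0.0                                # cannot meet the lifetime at all
    samples = budget / cost_per_sample_j          # how many samples we can afford
    return samples / lifetime_s

days_30 = 30 * 86400
# Hypothetical node: 27000 J left, 10 mJ per sample+transmission, 0.1 mW idle draw.
rate = max_sample_rate(27000, days_30, cost_per_sample_j=0.01, idle_power_w=0.0001)
print(round(rate, 3), "samples/s")   # ~1 sample per second fits the 30-day budget
```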
Power-aware optimization:
queries are parsed and optimized at the base station (ordering of sampling, selection, and joins) before being disseminated into the network
each node in TinyDB has metadata describing its local attributes; this metadata is periodically copied to the root for use by the optimizer:
power: cost to sample this attribute (in J)
sample time: time to sample this attribute (in s)
constant?: is this attribute constant-valued (e.g. id)?
rate of change: how fast the attribute changes (units/s)
range: dynamic range of attribute values
a sample from a sensor s must be taken to evaluate any predicate over the attribute Sensors.s; if a predicate discards a tuple of the Sensors table, then subsequent predicates need not examine the tuple, and the expense of sampling any attributes in those predicates can be avoided
example:
select accel, mag
from Sensors
where accel > c1 and mag > c2
sample interval 1s
3 possible query plans:
P1: magnetometer and accelerometer sampled before either selection is applied
P2: mag sampled, selection over it; then accel sampled, selection over it
P3: accel sampled, selection over it; then mag sampled, selection over it
P1 (the traditional DBMS order) is always more expensive than P2 and P3; P3 is better than P2 since Power_mag >> Power_accel (unless the mag predicate is much more selective than the accel predicate)
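The optimizer's reasoning on this example can be checked with expected-cost arithmetic: a sensor is only sampled for tuples that survive the earlier predicates. The per-sample costs and selectivities below are hypothetical:

```python
# Expected per-tuple energy cost of evaluating predicates in a given order:
# later sensors are only sampled if all earlier predicates passed.
def plan_cost(order, costs, selectivities):
    total, pass_prob = 0.0, 1.0
    for attr in order:
        total += pass_prob * costs[attr]   # sample only the surviving fraction
        pass_prob *= selectivities[attr]
    return total

costs = {"accel": 0.02, "mag": 0.2}        # J per sample; mag is 10x costlier
sel = {"accel": 0.1, "mag": 0.1}           # fraction of tuples passing each predicate

p2 = plan_cost(["mag", "accel"], costs, sel)   # mag first
p3 = plan_cost(["accel", "mag"], costs, sel)   # accel first
print(p2, p3, p3 < p2)   # P3 is cheaper: sample the cheap accelerometer first
```

Making the mag predicate far more selective (e.g. selectivity 0.001) flips the comparison, matching the caveat above.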
Power-sensitive dissemination:
after the query has been optimized, it is disseminated into the network:
the query is broadcast from the root
as each sensor hears the query, it must decide whether the query applies locally and/or needs to be broadcast to its children in the routing tree
if a query does not apply at a node, or at any of its children, the entire subtree is excluded from the query; this saves the time of disseminating, executing, and forwarding results, and extends the node's lifetime
common situation: constant-valued attributes (nodeId, or location in a fixed-location network) used in a query predicate
semantic routing tree (SRT): a routing tree that allows each node to efficiently determine whether any of the nodes below it needs to participate in a query over some constant attribute A
conceptually, it is an index over A used to locate nodes that have data relevant to the query
each node stores one unidimensional interval per child: the range of A values beneath that child
when a query q with a predicate over A arrives at node n:
if the query applies locally, n begins executing the query itself
if any child's range of A overlaps the query's range of A in q, n prepares to receive results and forwards the query; otherwise the query is not forwarded
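The per-node forwarding decision reduces to interval-overlap tests against the stored child ranges. A sketch with illustrative values (the attribute here is latitude, as in the example that follows):

```python
# SRT node decision for a query over a constant attribute A: run locally if
# the node's own value is in the query range, and forward only to children
# whose stored interval of A values overlaps the range.
def srt_route(own_value, child_intervals, q_lo, q_hi):
    """Return (run_locally, children_to_forward_to) for query range [q_lo, q_hi]."""
    run_locally = q_lo <= own_value <= q_hi
    forward = [c for c, (lo, hi) in child_intervals.items()
               if lo <= q_hi and q_lo <= hi]      # standard interval-overlap test
    return run_locally, forward

# Node at latitude 37.2 with two subtrees below it (hypothetical ranges):
children = {"left": (10.0, 20.0), "right": (35.0, 50.0)}
print(srt_route(37.2, children, 30.0, 40.0))   # (True, ['right'])
```

The left subtree is pruned entirely, which is the dissemination saving described above.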
example: an SRT over latitude (the x coordinate of location); only 3 nodes participate in the query
even though SRTs are limited to constant attributes, SRT maintenance must still occur: new nodes can appear, existing nodes can fail, link qualities can change
using an SRT is efficient, but it has maintenance and construction costs
construction of an SRT: several policies for parent selection:
each node picks a random parent from the nodes with which it can communicate reliably
each node picks the parent whose attribute value is closest to its own
Processing queries:
once a query has been optimized and disseminated, the query processor executes it
query execution = a sequence of operations at each node:
the node sleeps, wakes, samples the sensor, applies operators to data generated locally and received from children, and delivers the result to its parent
once results have been sampled and operators applied, they are enqueued onto a radio queue for delivery (both tuples from the local node and tuples forwarded from other nodes)
when network contention and data rates are low, the queue can be drained faster than results arrive; but there are situations when the queue will overflow, which requires prioritizing data delivery
naive: FIFO delivery; tuples are dropped if they do not fit in the queue
winavg: the 2 results at the head of the queue are averaged to make room
delta: the largest changes are probably the most interesting
a tuple is assigned an initial score relative to its difference from the most recent value transmitted from this node
at each point in time, the tuple with the highest score is delivered; the tuple with the lowest score is dropped when the queue overflows
rates are also adapted to network contention (to reduce the frequency of network-related losses) and to power consumption (to meet lifetime requirements)
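The delta policy can be sketched as a small priority queue keyed by change magnitude. This is illustrative only, not TinyDB's code:

```python
# Delta prioritization: score each tuple by its difference from the last
# value transmitted; deliver the highest score, drop the lowest on overflow.
class DeltaQueue:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = []          # list of (score, value) pairs
        self.last_sent = None

    def enqueue(self, value: float) -> None:
        # The very first value has nothing to compare against: always deliver it.
        score = abs(value - self.last_sent) if self.last_sent is not None else float("inf")
        self.items.append((score, value))
        if len(self.items) > self.capacity:        # overflow: drop the lowest score
            self.items.remove(min(self.items))

    def deliver(self) -> float:
        best = max(self.items)                     # largest change goes out first
        self.items.remove(best)
        self.last_sent = best[1]
        return best[1]

q = DeltaQueue(capacity=3)
q.enqueue(10.0)
print(q.deliver())        # 10.0 (first value)
for v in (10.1, 10.2, 15.0, 10.3):
    q.enqueue(v)          # on overflow, 10.1 (the smallest change) is dropped
print(q.deliver())        # 15.0: the largest change is delivered first
```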
To summarize, ACQP decides:
when: event clause
where: semantic routing trees
how often: lifetime clause, adapting sampling and transmission rates
query optimization: ordering of sampling operators
quality of data: prioritizing data delivery
2.3. In-network Aggregation
server-based approach: each sensor sends its data directly to the server; a total of 16 message transmissions
in-network approach: each sensor computes a partial state record, consisting of {sum, count}, based on its data and that of its children (if any); a total of only 6 message transmissions
[Figure: aggregation tree rooted at server S; each node forwards a merged partial state, e.g. node c sends f(c, d, f(a, b), e).]
a query is disseminated:
through the routing structure (e.g. the routing tree)
using a broadcast mechanism
using multicast to reach only the nodes that may contribute to the query (e.g. if a having-predicate specifies a geographic region)
data is then collected and aggregated within the network, using the same routing structure
TinyDB supports 5 SQL aggregates: count, min, max, sum, average, and 2 extensions: median, histogram
aggregation is implemented via a merging function f (commutative and associative), an initializer i, and an evaluator e
ex: for average, a partial state record is (S, C), the sum and count of the sensor values:
f((S1, C1), (S2, C2)) = (S1 + S2, C1 + C2)
i(x) = (x, 1)
e((S, C)) = S / C
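The (i, f, e) decomposition for average translates directly into code. The sketch below shows why each node can forward a single merged partial state record instead of all its subtree's readings; the sample readings are hypothetical:

```python
from functools import reduce

# (i, f, e) decomposition for AVERAGE over partial state records (sum, count).
def i(x):                     # initializer: one reading -> partial state
    return (x, 1)

def f(a, b):                  # merging function: commutative and associative
    return (a[0] + b[0], a[1] + b[1])

def e(state):                 # evaluator: partial state -> final answer
    s, c = state
    return s / c

# A node with its own reading 10 merges its children's partial states
# (some children have already merged their own subtrees) into one record:
children = [i(20), i(30), f(i(40), i(50))]
record = reduce(f, children, i(10))
print(record, e(record))      # (150, 5) 30.0
```

Because f is commutative and associative, the result is independent of the tree shape and of the order in which partial states arrive.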
each epoch (sampling period) is divided into time intervals; the number of intervals reflects the depth of the routing tree
aggregation results are reported at the end of each sampling period
when a node broadcasts a query, it specifies the time interval within which it expects to hear the results from its children
during its scheduled interval, each node:
listens for the packets from its children and receives them
computes a new partial state record by combining its own data with the partial state records from its children
sends the result up the tree to its parent
[Figure: per-level scheduling of intervals within an epoch, from the deepest level up to the root.]
losing a parent node may orphan an entire subtree, so each node has to periodically rediscover its parent to make sure it is still connected
TinyDB also considers providing redundancy by duplicating parent nodes for each child, and by caching data over a past window of time at each node
3. Other Issues
Data-Centric Storage:
a tree-based query propagation mechanism (used by TinyDB) is appropriate when queries are issued by a server
data-centric storage (DCS): a method to support queries from any node in the network, by providing a rendezvous mechanism for data and queries
build indices to speed up the execution of queries involving data ranges
build indices for continuously changing sensor data (continuous updates to a static index incur heavy modification & communication costs)
discard old data and maintain some temporal summaries
Data aging
References
Book:
Zhao F., Guibas L., Wireless Sensor Networks: An Information Processing Approach, Elsevier, 2004.
Papers:
Bonnet P., Gehrke J., Seshadri P., Towards Sensor Database Systems, Proc. Int. Conf. on Mobile Data Management (MDM), Hong Kong (China), 2001.
Madden S., Franklin M.J., Hellerstein J.M., Hong W., The Design of an Acquisitional Query Processor For Sensor Networks, Proc. Int. Conf. on Management of Data (SIGMOD), San Diego (USA), 2003.
Hong W., Madden S., Implementation and Research Issues in Query Processing for Wireless Sensor Networks, tutorial slides, Int. Conf. on Data Engineering (ICDE), Boston (USA), 2004.