Você está na página 1de 73

Introduction

R-spatial
Representing spatial data
References

Representing and handling spatial and


spatio-temporal data in R
Roger Bivand1

16 June 2014

Department of Economics, Norwegian School of Economics, Helleveien 30,


N-5045 Bergen, Norway; Roger.Bivand@nhh.no
Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

Overview
1

Introduction
Motivation: longer lives

R-spatial
What does use look like?
R-spatial sp

Representing spatial data


Olinda Hansens Disease data set
Spatial objects
Raster data
Using Spatial family objects
Spatio-temporal data
Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

Motivation: longer lives

Using spatial data


Being explicit about the use of (spatial) data (for example in
R) permits attention to be given to the important steps of
data handling and representation rushing to analysis often
leads to problems
Handling spatial data is never easy, because representational
choices lead to support consequences, which qualify or
undermine inferential outcomes
Being close to the data and close to the code may give
considerable insight and freedom, and permits co-working with
disciplines sharing frames of understanding that are mutually
comprehensible
Just publishing is not going to be enough; we will need to be
able to demonstrate how our conclusions have been reached
by releasing code and data (some data is public already).
Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

Motivation: longer lives

Visualizing longer lives I


A news item a year ago described work on overall premature mortality
(<75) in England, linking to Public Health England (PHE) sites:

Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

Motivation: longer lives

Visualizing longer lives II: forensic reproducible research?

The website does not explain what


statistical methods have been used,
other than saying that directly
standardised rates were used. In the
absence of contact details, I posted a
question under the Connect tab, and
received a reply (antedated) some time
later:

Roger Bivand

Representing spatial data

Visualizing longer lives III


Before contacting Paul Fryers by email
and meeting him at a spatial
epidemiology meeting, it was not easy to
grasp how the rates had been
constructed, and especially how the class
intervals for the maps had been
constructed. The data are provided in
spreadsheet form, and can be merged
with area boundaries (Contains
Ordnance Survey data Crown
copyright and database right 2013):
> library(rgdal)
> sm <- readOGR(".", "Prem_mort_sim")
> sm1 <- sm[!is.na(sm$Value),
+
]
> names(sm1)[16:26]
[1]
[4]
[7]
[10]

"Indictr"
"Area_Nm"
"Uppr_CI"
"Sex"

"Tim_Prd" "Area_Cd"
"Value"
"Lowr_CI"
"Count"
"Denmntr"
"Age"

Visualizing longer lives IV

Premature mortality rate per 100,000, 20092011, England;


box width proportional to population
450

400

250

300

350

200

If we compare the different classes of


local authorities represented in the data
set for premature mortality rates per
100,000, 2009-2011, we see considerable
variability between class, reflecting some
of the structuring of the online maps.
Covariates are also available for
download, but are not used here.

County

London Borough

Unitary Authority

30

20

I(Uppr_CI Lowr_CI)

40

10

Using the data provided, and credit


should be given to PHE for making it
available, we can see that the gap
between the upper and lower (95%)
confidence intervals is large for
observations with small populations and
vice-versa. The Denmntr variable is
three-years worth of the age
standardised European Standard
Population, it turns out, and the rates
(Value) are annual per 100,000:

50

Visualizing longer lives V

0e+00

1e+06

2e+06
Denmntr

3e+06

4e+06

Visualizing longer lives VI

400

300
200

sm1$Value alpha95 * se

200

250

300

350

400

450

sm1$Lowr_CI

Upper CI
250 300 350 400 450

> alpha95 <- qnorm((1 - 0.95)/2,


+
lower.tail = FALSE)
> se <- sqrt((sm1$Value * (1e+05 +
sm1$Value))/sm1$Denmntr)

Lower CI

sm1$Value + alpha95 * se

In conversation, Paul Fryers referred to


work (and spreadsheets with macros) on
the Association of Public Health
Observatories website. These appear to
have connections to the use of funnel
plots to reflect the sample size effect on
the rates from smaller aggregate areas.
Using formulae from Dover and
Schopflocher (2011, p. 3), we can
reconstruct the confidence intervals:

250

300

350
sm1$Uppr_CI

400

450

Visualizing longer lives VII

350

sm1$Value[o]

300

alpha998 <- qnorm((1 - 0.998)/2,


lower.tail = FALSE)
phat <- 268
o <- order(sm1$Denmntr)
se <- sqrt((phat * (1e+05 phat))/sm1$Denmntr)
sm1$zvalue <- (sm1$Value - phat)/se

250

>
+
>
>
>
+
>

1e+06

0e+00

200

Given this insight, we can construct a


funnel plot and fumble towards the class
intervals used for the published maps of
health outcomes:

County
London Borough
Metropolitan District
Unitary Authority

400

450

2e+06
sm1$Denmntr[o]

3e+06

4e+06

Visualizing longer lives VIII


So finally the five class 99.8% and 95% map, and the (politically chosen)
four class 95% map split on zero:
Funnel plot zscores, 4 classes

Funnel plot zscores, 5 classes


40

40

30

30

20

20

10

10

10

10

20

20

Introduction
R-spatial
Representing spatial data
References

What does use look like?


R-spatial sp

R-spatial
1

Introduction
Motivation: longer lives

R-spatial
What does use look like?
R-spatial sp

Representing spatial data


Olinda Hansens Disease data set
Spatial objects
Raster data
Using Spatial family objects
Spatio-temporal data
Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

What does use look like?


R-spatial sp

R-spatial
Before describing how R functionality for using spatial data,
let us look at some examples of autonomous uses
Once software is released, it may start living its own life, as
users express their own needs in a scripting environment
like R, users are offered great freedom
Following these examples, we will move forward by showing
how the R-spatial community and software projects have
grown
This in turn will point to the crucial dependence of R-spatial
on the wider OSGeo communities, their projects, software,
and willingness to help
Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

What does use look like?


R-spatial sp

What does use of R-spatial look like?


The intention of providing classes for spatial data in R is to let users and
developers work without having to re-think representation. Some
examples:
The FAO yearbook and their input to the classInt package for
choropleth map class intervals
plotKML to permit spatial and spatio-temporal results to be
displayed in Google Earth
James Cheshires enthusiastic blog and courses in an active
community largely in England focusses on visualisation, with
extensions to ggplot2
Oscar Perpi
nan Lamigueiros considered work in displaying data
(and a new book) is driven by the needs of work on solar power
(and much more)
Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

What does use look like?


R-spatial sp

The 2013 FAO Statistical Yearbook


The 2013 FAO Statistical Yearbook . . . has been created from beginning
to end with the statistical software R and the typesetting language
LaTeX: from data retrieval, to data processing, indicator construction,
and blueprint-ready pdf file for distribution.

Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

What does use look like?


R-spatial sp

Google Earth and plotKML


Partly to meet the needs of Global Soil
Information Facilities, Tomislav Hengl
has been coordinating activity on
displaying output on Google Earth
(admittedly not open source) in the
plotKML package
(http://gsif.isric.org/doku.php?
id=wiki:tutorial_plotkml). Other
packages use R graphical devices with
contextual backgrounds from GE and
OSM.

Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

What does use look like?


R-spatial sp

James Cheshire: spatial.ly

http://spatial.ly/ is a successful
and upbeat blog, evangelical in tone:
The software has become established as
one of the best around for statistics and
it is becoming increasingly recognised as
a tool for data visualisation and spatial
analysis.
(http://spatial.ly/2013/04/
analysis-visualisation-spatial-data/)

Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

What does use look like?


R-spatial sp

Ease of access and representativeness I


James Cheshire: Thanks to the release of log files containing all hits to
http://cran.rstudio.com/ server it is possible to make a map showing the
parts of the world with the most active R users (specifically those mostly
using the RStudio interface).
(http://spatial.ly/2013/06/r_activity/)

Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

What does use look like?


R-spatial sp

Ease of access and representativeness II

Oscar Perpi
nan: More or less explicitly they use the RStudio logs to give
an answer to the question How many people use this R package? In my
opinion, such question cannot be answered safely with the RStudio
download logs.
The RStudio mirror is a sample that cannot be safely regarded as
representative of the mirrors network. The mirrors of the Comprehensive
R Archive Network, mostly operated by public or nonprofit institutions,
provide faster package downloads for users at their geographical location
(http://procomun.wordpress.com/2013/06/15/rstudiologs/).

Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

What does use look like?


R-spatial sp

Displaying spatial data


Oscar Perpi
n
an: Some time ago . . . I wondered how a multivariate choropleth
map could be produced with R. Here is the code I have arranged to show the
results of the last Spanish general elections in a similar fashion
(http://procomun.wordpress.com/2012/02/18/maps_with_r_1/).

Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

What does use look like?


R-spatial sp

Bergens Tidende interactive graphics


R can also be used to prepare data for Javascript (including d3
http://d3js.org/), including maps. This is a recent example from my local
newspaper concerning immigrant populations in Norwegian municipalities
(http://multimedia.bt.no/fjordkloden):

Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

What does use look like?


R-spatial sp

Bergens Tidende interactive graphics


However, when the reporting units are very detailed, it is possible that the
unreflective or untrained observer may draw unwarranted conclusions (this is
todays map, I produced the boundaries which are Statistics Norway units
clipped against the coastline):

Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

What does use look like?


R-spatial sp

R spatial
Albrecht Gebhardt did a lot of the early porting from S to R; there
were also valuable contributions from spatial statisticians at
Lancaster University, at UCAR, and many others
Kurt Hornik, who runs CRAN, encouraged me to talk about R and
GIS at the March 2001 Distributed Statistical Computing meeting in
Vienna, at which I got to know active developers personally
At a meeting in Santa Barbara in Spring 2002, I met other
researchers who provided critical, occasionally very critical,
comments and encouragement to continue fostering an R spatial
community
By the next DSC meeting in March 2003, I was organising a
thematic session on spatial statistics, and a crucial fringe developers
workshop to discuss how to advance spatial data analysis in R
Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

What does use look like?


R-spatial sp

CRAN Spatial task view


Since 2003, a number of
community-building steps have been
made over and above developing
contributed packages. From the CRAN
side, the Spatial task view is the hub, to
which traffic is channelled to package
pages and to ancilliary websites, as well
as the special interest group mailing list.
There is also a task view for handling
and snalyzing spatio-temporal data

Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

What does use look like?


R-spatial sp

R-sig-geo mailing list

Roger Bivand

monthly number of emails on rsiggeo


400

300

100

200

# of emails

Following the 2003 workshop, we


started a project on Sourceforge to
permit joint development, and a mailing
list served within the family of R lists
from Zurich. Traffic on the list has
grown steadily, with a subscribed
membership in May 2014 of over 3000.
Naturally, many of these lurk without
posting, while others post without
helping, and many fewer help by
answering posted questions. This final
group is however growing, and since the
list archives are also kept on Nabble,
they can be searched for information.

2004

2006

Representing spatial data

2008

2010

2012

2014

Introduction
R-spatial
Representing spatial data
References

What does use look like?


R-spatial sp

Spatial on R Forge
R Forge is used actively by individuals
and groups in developing packages for
spatial data analysis, with 100 projects
registered in August 2012. Some
projects are registered in more than one
topical area, some may never mature,
but some are already in active use; the
raster package is already frequently
discussed on R-sig-geo it was released
to CRAN in late March 2010 after a
gestation of 16 months.

Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

What does use look like?


R-spatial sp

The sp package
In 2003, we agreed that a shared system of
new-style classes to contain spatial data would
permit many-to-one and on-to-many conversion
of representations, avoiding the then prevalent
many-to-many conversion problem. The idea
was to make it easier for GIS people and stats
people to work together by creating objects that
looked familiar to both groups, although the
groups differ a lot in how they see data
objects. Package dependencies have grown,
here the upper diagram (from our book) shows
packages depending on sp in April 2008, the
lower diagram in May 2014 (see also
http://dirk.eddelbuettel.com/blog/2012/
08/16#counting_cran_depends_followup):
Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

Olinda Hansens Disease data set


Spatial objects
Raster data
Using Spatial family objects
Spatio-temporal data

Representing spatial data


1

Introduction
Motivation: longer lives

R-spatial
What does use look like?
R-spatial sp

Representing spatial data


Olinda Hansens Disease data set
Spatial objects
Raster data
Using Spatial family objects
Spatio-temporal data
Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

Olinda Hansens Disease data set


Spatial objects
Raster data
Using Spatial family objects
Spatio-temporal data

Olinda Hansens Disease data set


The data are discussed in Lara, T. et al. (2001) Leprosy
surveillance in Olinda, Brazil, using spatial analysis techniques.
Cadernos de Sa
ude P
ublica, 17(5): 11531162.
They use 243 1990 census districts, and include the
1991-1996 counts of new cases (CASES), 1993 year-end
population figures (POP) and a deprivation index (DEPRIV)
The data have been made available in cooperation with
Marilia Sa Carvalho, and their use here is partly based on
teaching notes that we wrote together some years ago
The data now take the form of a shapefile, but were initially
assembled from a Mapinfo file of census district borders and
text files of variable values
Roger Bivand

Representing spatial data

Olinda Hansens Disease data set


So that weve got something to go on, lets read in the data set, using an
interface function in the rgdal package (packages are kept in libraries and
extend Rs basic functions and classes):
> library(rgdal)

The function as we will see later takes at least two arguments, a data
source name, here the current directory, and a layer name, here the name
of the shapefile without extension. Once we have checked the names in
the imported object, we can use the spplot method in sp to make a
map:
> olinda <- readOGR(".", "setor1")
OGR data source with driver: ESRI Shapefile
Source: ".", layer: "setor1"
with 243 features and 10 fields
Feature type: wkbPolygon with 2 dimensions
> names(olinda)
[1] "AREA"
[7] "SET"

"PERIMETER" "SETOR_"
"CASES"
"POP"

"SETOR_ID"
"DEPRIV"

"VAR5"

"DENS_DEMO"

> spplot(olinda, "DEPRIV", col.regions = grey.colors(20, 0.9, 0.3))

Map of deprivation variable

0.8

0.6

0.4

0.2

0.0

Introduction
R-spatial
Representing spatial data
References

Olinda Hansens Disease data set


Spatial objects
Raster data
Using Spatial family objects
Spatio-temporal data

Data frames

Very often data are stored in a data frame with rows


representing observations, here 1990 census districts in Olinda
In data frames, the columns can be of different classes in a
matrix, all the columns have to have the same class
The columns are accessed using the $ operator; the columns
are list elements of the data frame
We can also use square brackets to access data frame values,
like look-up in a data base

Roger Bivand

Representing spatial data

Olinda data

> str(as(olinda, "data.frame"))

'data.frame': 243 obs. of 10 variables:


$ AREA
: num 79139 151153 265037 137696 121873 ...
$ PERIMETER: num 1270 2189 2818 2098 3353 ...
$ SETOR_
: num 2 3 4 5 6 7 8 9 10 11 ...
$ SETOR_ID : num 1 2 3 4 5 6 9 10 11 12 ...
$ VAR5
: int 242 224 223 229 228 222 218 225 227 221 ...
$ DENS_DEMO: num 4380 10563 6642 13174 13812 ...
$ SET
: num 242 224 223 229 228 222 218 225 227 221 ...
$ CASES
: num 1 6 5 1 1 4 6 2 1 6 ...
$ POP
: num 337 1550 1711 1767 1638 ...
$ DEPRIV
: num 0.412 0.168 0.192 0.472 0.302 0.097 0.141 0.199 0.228 0.223 .

Introduction
R-spatial
Representing spatial data
References

Olinda Hansens Disease data set


Spatial objects
Raster data
Using Spatial family objects
Spatio-temporal data

Object framework
To begin with, all contributed packages for handling spatial
data in R had different representations of the data.
This made it difficult to exchange data both within R between
packages, and between R and external file formats and
applications.
The result has been an attempt to develop shared classes to
represent spatial data in R, allowing some shared methods and
many-to-one, one-to-many conversions.
We chose to use new-style classes to represent spatial data,
and are confident that this choice was justified.

Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

Olinda Hansens Disease data set


Spatial objects
Raster data
Using Spatial family objects
Spatio-temporal data

Spatial objects
The foundation object is the Spatial class, with just two slots
(new-style class objects have pre-defined components called
slots)
The first is a bounding box, and is mostly used for setting up
plots
The second is a CRS class object defining the coordinate
reference system, and may be set to CRS(as.character(NA)),
its default value.
Operations on Spatial* objects should update or copy these
values to the new Spatial* objects being created

Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

Olinda Hansens Disease data set


Spatial objects
Raster data
Using Spatial family objects
Spatio-temporal data

Coordinate reference systems


Coordinate reference systems (CRS) are at the heart of
geodetics and cartography: how to represent a bumpy ellipsoid
on the plane
We can speak of geographical CRS expressed in degrees and
associated with an ellipse, a prime meridian and a datum, and
projected CRS expressed in a measure of length, and a chosen
position on the earth, as well as the underlying ellipse, prime
meridian and datum.
Most countries have multiple CRS, and where they meet there
is usually a big mess this led to the collection by the
European Petroleum Survey Group (EPSG, now Oil & Gas
Producers (OGP) Surveying & Positioning Committee) of a
geodetic parameter dataset
Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

Olinda Hansens Disease data set


Spatial objects
Raster data
Using Spatial family objects
Spatio-temporal data

Coordinate reference systems


The EPSG list among other sources is used in the workhorse
PROJ.4 library, which as implemented by Frank Warmerdam
handles transformation of spatial positions between different
CRS
This library is interfaced with R in the rgdal package, and the
CRS class is defined partly in sp, partly in rgdal
A CRS object is defined as a character NA string or a valid
PROJ.4 CRS definition
The validity of the definition can only be checked if rgdal is
loaded

Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

Olinda Hansens Disease data set


Spatial objects
Raster data
Using Spatial family objects
Spatio-temporal data

Spatial points

The most basic spatial data object is a point, which may have
2 or 3 dimensions
A single coordinate, or a set of such coordinates, may be used
to define a SpatialPoints object; coordinates should be of
mode double and will be promoted if not
The points in a SpatialPoints object may be associated with
a row of attributes to create a SpatialPointsDataFrame object
The coordinates and attributes may, but do not have to be
keyed to each other using ID values

Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

Olinda Hansens Disease data set


Spatial objects
Raster data
Using Spatial family objects
Spatio-temporal data

Spatial points classes and their slots

SpatialPointsDataFrame

Spatial

SpatialPoints

bbox

coords.nrs

proj4string

data
SpatialPoints
data.frame

coords
Spatial

Roger Bivand

Representing spatial data

Spatial*DataFrames

Spatial*DataFrames are constructed


as Janus-like objects, looking to
mapping and GIS people as their kind
of object, as a collection of geometric
freatures, with data associated with each
feature. But to data analysts, the object
is a data frame, because it behaves like
one; a data frame is a rectangular data
structure with rows of observations on
columns of variables, and is very widely
used in analysis and visualisation in R.

Introduction
R-spatial
Representing spatial data
References

Olinda Hansens Disease data set


Spatial objects
Raster data
Using Spatial family objects
Spatio-temporal data

Spatial lines and polygons


A Line object is just a spaghetti collection of 2D coordinates; a
Polygon object is a Line object with equal first and last coordinates
A Lines object is a list of Line objects, such as all the contours at
a single elevation; the same relationship holds between a Polygons
object and a list of Polygon objects, such as islands belonging to
the same county
A comment is added to each Polygons object to indicate which
interior ring belongs to which exterior ring (SFS)
SpatialLines and SpatialPolygons objects are made using lists
of Lines or Polygons objects respectively
SpatialLinesDataFrame and SpatialPolygonsDataFrame
objects are defined using SpatialLines and SpatialPolygons
objects and standard data frames, and the ID fields are here
required to match the data frame row names
Roger Bivand

Representing spatial data

Spatial Polygons classes and slots


SpatialLines

Lines

Line

lines

Lines

coords

Spatial

ID

SpatialPolygons

Polygons

Polygon

polygons

Polygons

labpt

plotOrder

plotOrder

area

Spatial

labpt
ID

hole

area
Spatial
bbox
proj4string

comment

ringDir
coords

Back to Olinda
So we can treat olinda as a data frame, for example adding an expected
cases variable. The object also works within model fitting functions, like
glm.
> olinda$Expected <- olinda$POP * sum(olinda$CASES, na.rm = TRUE)/sum(olinda$POP,
+
na.rm = TRUE)
> str(olinda, max.level = 2)
Formal class 'SpatialPolygonsDataFrame' [package "sp"] with 5 slots
..@ data
:'data.frame': 243 obs. of 11 variables:
..@ polygons
:List of 243
.. .. [list output truncated]
..@ plotOrder : int [1:243] 243 56 39 231 224 60 58 232 145 66 ...
..@ bbox
: num [1:2, 1:2] 288955 9111044 298446 9120528
.. ..- attr(*, "dimnames")=List of 2
..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slots
> str(as(olinda, "data.frame"))
'data.frame': 243 obs. of 11 variables:
$ AREA
: num 79139 151153 265037 137696 121873 ...
$ PERIMETER: num 1270 2189 2818 2098 3353 ...
$ SETOR_
: num 2 3 4 5 6 7 8 9 10 11 ...
$ SETOR_ID : num 1 2 3 4 5 6 9 10 11 12 ...
$ VAR5
: int 242 224 223 229 228 222 218 225 227 221 ...
$ DENS_DEMO: num 4380 10563 6642 13174 13812 ...
$ SET
: num 242 224 223 229 228 222 218 225 227 221 ...
$ CASES
: num 1 6 5 1 1 4 6 2 1 6 ...
$ POP
: num 337 1550 1711 1767 1638 ...
$ DEPRIV
: num 0.412 0.168 0.192 0.472 0.302 0.097 0.141 0.199 0.228 0.223 ...
$ Expected : num 1.4 6.43 7.1 7.33 6.8 ...

Introduction
R-spatial
Representing spatial data
References

Olinda Hansens Disease data set


Spatial objects
Raster data
Using Spatial family objects
Spatio-temporal data

Back to Olinda

The object also works within model fitting functions, like glm; note the
number of rows.
> str(model.frame(CASES ~ DEPRIV + offset(log(POP)), olinda), give.attr = FALSE)
'data.frame': 241 obs. of 3 variables:
$ CASES
: num 1 6 5 1 1 4 6 2 1 6 ...
$ DEPRIV
: num 0.412 0.168 0.192 0.472 0.302 0.097 0.141 0.199 0.228 0.223 ...
$ offset(log(POP)): num 5.82 7.35 7.44 7.48 7.4 ...

Roger Bivand

Representing spatial data

The GridTopology class

The GridTopology class is the key


constituent of raster representations,
giving the coordinates of the south-west
raster cell, the two cell sizes in the
metric of the coordinates, giving the step
to successive centres, and the numbers
of cells in each dimension. Well return
to examples after importing some data:

> library(sp)
> getClass("GridTopology")
Class "GridTopology" [package "sp"]
Slots:
Name: cellcentre.offset
Class:
numeric
Name:
Class:

cellsize
numeric

Name:
Class:

cells.dim
integer

Introduction
R-spatial
Representing spatial data
References

Olinda Hansens Disease data set


Spatial objects
Raster data
Using Spatial family objects
Spatio-temporal data

Spatial grids and pixels


There are two representations for data on regular rectangular
grids (oriented N-S, E-W): SpatialPixels and SpatialGrid
SpatialPixels are like SpatialPoints objects, but the
coordinates have to be regularly spaced; the coordinates are
stored, as are grid indices
SpatialPixelsDataFrame objects only store attribute data

where it is present, but need to store the coordinates and grid


indices of those grid cells
SpatialGridDataFrame objects do not need to store

coordinates, because they fill the entire defined grid, but they
need to store NA values where attribute values are missing
Roger Bivand

Representing spatial data

Spatial grid and pixels classes and their slots

Spatial

SpatialGridDataFrame
SpatialGrid

SpatialGrid

bbox

data

Spatial
grid

proj4string
GridTopology
cellcentre.offset

SpatialPixelsDataFrame

cellsize

SpatialPixels

SpatialPixels

data

grid

cells.dim

grid.index

SpatialPoints

SpatialPoints

coords
Spatial

data.frame

A SpatialGridDataFrame object
The space shuttle flew a radar topography mission in 2000, giving 90m
resolution elevation data for most of the world. The data here have been
warped to a UTM projection, but for the WGS84 datum - well see below
how to project and if need be datum transform Spatial objects:
> DEM <- readGDAL("UTM_dem.tif")
UTM_dem.tif has GDAL driver GTiff
and has 122 rows and 121 columns
> summary(DEM)
Object of class SpatialGridDataFrame
Coordinates:
min
max
x 288307.3 299442.7
y 9109607.8 9120835.2
Is projected: TRUE
proj4string :
[+proj=utm +zone=25 +south +ellps=WGS84 +units=m +no_defs]
Grid attributes:
cellcentre.offset cellsize cells.dim
x
288353.4 92.02797
121
y
9109653.8 92.02797
122
Data attributes:
Min. 1st Qu. Median
Mean 3rd Qu.
Max.
-1.00
3.00
11.00
19.45
31.00
88.00
> is.na(DEM$band1) <- DEM$band1 < 1
> image(DEM, "band1", axes = TRUE, col = terrain.colors(20))

9110000

9112000

9114000

9116000

9118000

9120000

Elevation at Olinda

288000

290000

292000

294000

296000

298000

The original data set CRS


Very often, a good deal of work is needed to find the actual coordinate
reference system of a data set. Typically, researchers use what is to hand,
without thinking that incomplete CRS specifications limit their ability to
integrate other data. Here we were told that the map was from the 1990
census, that it was most likely UTM zone 25 south, but not its datum.
Investigation showed that until very recently, the most used datum in
Brazil has been Corrego Alegre:
> proj4string(olinda)
[1] NA
> EPSG <- make_EPSG()
> EPSG[grep("Corrego Alegre", EPSG$note), 1:2]

154
460
2741
2742
2743
2744
3140
3141
3142
3143
3144

code
4225
5524
5536
5537
5538
5539
22521
22522
22523
22524
22525

#
#
#
#
#

note
# Corrego Alegre 1970-72
# Corrego Alegre 1961
# Corrego Alegre 1961 / UTM zone 21S
# Corrego Alegre 1961 / UTM zone 22S
# Corrego Alegre 1961 / UTM zone 23S
# Corrego Alegre 1961 / UTM zone 24S
Corrego Alegre 1970-72 / UTM zone 21S
Corrego Alegre 1970-72 / UTM zone 22S
Corrego Alegre 1970-72 / UTM zone 23S
Corrego Alegre 1970-72 / UTM zone 24S
Corrego Alegre 1970-72 / UTM zone 25S

The original data set CRS

In order to insert the parameters for the transformation to the WGS84


datum, we have to override the initial imported values:
> set_ReplCRS_warn(FALSE)
[1] FALSE
> proj4string(olinda) <- CRS("+init=epsg:22525")
> proj4string(olinda)

[1] "+init=epsg:22525 +proj=utm +zone=25 +south +ellps=intl +towgs84=-206,172,-6,0,0,0,0 +units=m +no_defs


> proj4string(DEM)
[1] "+proj=utm +zone=25 +south +ellps=WGS84 +units=m +no_defs"
> proj4string(DEM) <- CRS("+init=epsg:31985")
> proj4string(DEM)
[1] "+init=epsg:31985 +proj=utm +zone=25 +south +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"
> set_ReplCRS_warn(TRUE)
[1] TRUE

Getting olinda to WGS84

As we see, although both olinda and DEM are UTM zone 25 south, they
differ in their ellipsoid and datum. Using spTransform methods in rgdal
we can undertake a datum shift for olinda, making it possible to
overplot in register:
> olinda1 <- spTransform(olinda, CRS(proj4string(DEM)))
> image(DEM, "band1", axes = TRUE, col = terrain.colors(20))
> plot(olinda1, add = TRUE, lwd = 0.5)

9110000

9112000

9114000

9116000

9118000

9120000

Getting olinda to WGS84

288000

290000

292000

294000

296000

298000

Introduction
R-spatial
Representing spatial data
References

Olinda Hansens Disease data set


Spatial objects
Raster data
Using Spatial family objects
Spatio-temporal data

Reading rasters
There are very many raster and image formats; some allow
only one band of data, others think data bands are RGB,
while yet others are flexible
There is a simple readAsciiGrid function in maptools that
reads ESRI Arc ASCII grids into SpatialGridDataFrame
objects; it does not handle CRS and has a single band
Much more support is available in rgdal in the readGDAL
function, which like readOGR finds a usable driver if
available and proceeds from there
Using arguments to readGDAL, subregions or bands may be
selected, which helps handle large rasters
Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

Olinda Hansens Disease data set


Spatial objects
Raster data
Using Spatial family objects
Spatio-temporal data

But lets keep things local


Using Landsat 7 panchromatic imagery,
we can get a similar contextual effect,
here with histogram equalised 15m
resolution:
> pan <- readGDAL("L8s.tif")
L8s.tif has GDAL driver GTiff
and has 800 rows and 800 columns
>
>
>
>
+
>
+

bb0 <- set_ReplCRS_warn(FALSE)


proj4string(pan) <- CRS("+init=epsg:31985")
bb0 <- set_ReplCRS_warn(TRUE)
brks <- quantile(pan$band1,
seq(0, 1, 1/255))
pan$lut <- findInterval(pan$band1,
brks, all.inside = TRUE)

> image(pan, "lut", col = grey.colors(20))


> plot(olinda1, add = TRUE, border = "brown",
+
lwd = 0.5)
> title(main = "Landsat panchromatic channel 15m")
Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

Olinda Hansens Disease data set


Spatial objects
Raster data
Using Spatial family objects
Spatio-temporal data

Constructing and using NDVI


Landsat ETM also lets us construct a
30m resolution normalised difference
vegetation index (of more relevance for
other diseases), and plot it using a
ColorBrewer palette:
> letm <- readGDAL("L_ETMps.tif")
L_ETMps.tif has GDAL driver GTiff
and has 400 rows and 400 columns
>
>
>
>
+
>
>
>

bb0 <- set_ReplCRS_warn(FALSE)


proj4string(letm) <- CRS("+init=epsg:31985")
bb0 <- set_ReplCRS_warn(TRUE)
letm$ndvi <- (letm$band4 - letm$band3)/(letm$band4 +
letm$band3)
library(RColorBrewer)
mypal <- brewer.pal(5, "Greens")
greens <- colorRampPalette(mypal)

> image(letm, "ndvi", col = greens(20))


> plot(olinda1, add = TRUE)
Roger Bivand

Representing spatial data

Spatial classes provided by sp

This table summarises the classes provided by sp, and shows how
they build up to the objects of most practical use, the
Spatial*DataFrame family objects:
data type
points
points
pixels
pixels

class
SpatialPoints
SpatialPointsDataFrame
SpatialPixels
SpatialPixelsDataFrame

attributes
none
data.frame
none
data.frame

full grid
full grid
line
lines
lines
lines
polygon
polygons
polygons
polygons

SpatialGrid
SpatialGridDataFrame
Line
Lines
SpatialLines
SpatialLinesDataFrame
Polygon
Polygons
SpatialPolygons
SpatialPolygonsDataFrame

none
data.frame
none
none
none
data.frame
none
none
none
data.frame

extends
Spatial
SpatialPoints
SpatialPoints
SpatialPixels
SpatialPointsDataFrame
SpatialPixels
SpatialGrid
Line list
Spatial, Lines list
SpatialLines
Line
Polygon list
Spatial, Polygons list
SpatialPolygons

Methods provided by sp

This table summarises the methods provided by sp:


method
[
$, $<-, [[, [[<-

spsample
bbox
proj4string
coordinates
coerce
over

what it does
select spatial items (points, lines, polygons, or
rows/cols from a grid) and/or attributes variables
retrieve, set or add attribute table columns
sample points from a set of polygons, on a set of
lines or from a gridded area
get the bounding box
get or set the projection (coordinate reference system)
set or retrieve coordinates
convert from one class to another
combine two different spatial objects

Introduction
R-spatial
Representing spatial data
References

Using

Spatial

Olinda Hansens Disease data set


Spatial objects
Raster data
Using Spatial family objects
Spatio-temporal data

family objects

Very often, the user never has to manipulate Spatial family


objects directly, as we have been doing here, because methods
to create them from external data are also provided
Because the Spatial*DataFrame family objects behave in most
cases like data frames, most of what we do with standard data
frames just works like [ or $ (but no merge, etc., yet)
These objects are very similar to typical representations of the
same kinds of objects in geographical information systems, so
they do not suit spatial data that is not geographical (like
medical imaging) as such
They provide a standard base for analysis packages on the one
hand, and import and export of data on the other, as well as
shared methods, like those for visualisation
Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

Olinda Hansens Disease data set


Spatial objects
Raster data
Using Spatial family objects
Spatio-temporal data

Writing objects
In rgdal, writeGDAL can write for example multi-band
GeoTiffs, but there are fewer write than read drivers; in
general CRS and geogreferencing are supported see
gdalDrivers

The rgdal function writeOGR can be used to write vector files,


including those formats supported by drivers, including now
KML see ogrDrivers
External software (including different versions) tolerate output
objects in varying degrees, quite often needing tricks - see
mailing list archives
In maptools, there are functions for writing sp objects to
shapefiles writeSpatialShape, etc., as Arc ASCII grids
writeAsciiGrid, and for using the R PNG graphics device for
outputting image overlays for Google Earth
Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

Olinda Hansens Disease data set


Spatial objects
Raster data
Using Spatial family objects
Spatio-temporal data

Using the direct KML method

One of the nice things about linking to OSGeo software is that when
someone contributes a driver, it is there for other software too. With the
increasing availability of software like Google Earth (or Google Maps), the
ability to display data or results in a context that is familiar to the user
can be helpful, but we need geographical coordinates:
> olinda_ll <- spTransform(olinda1, CRS("+proj=longlat +datum=WGS84"))
> writeOGR(olinda_ll, dsn = "olinda_ll.kml", layer = "olinda_ll", driver = "KML",
+
overwrite_layer = TRUE)

Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

Olinda Hansens Disease data set


Spatial objects
Raster data
Using Spatial family objects
Spatio-temporal data

Olinda census district boundaries in GE

Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

Extension of

Spatial

Olinda Hansens Disease data set


Spatial objects
Raster data
Using Spatial family objects
Spatio-temporal data

family objects

Many packages as we saw a moment ago depend on sp,


fewer extend the classes used there; among them are:
spacetime provides classes and methods for spatio-temporal
data, including space-time regular lattices, sparse lattices,
irregular data, and trajectories
raster is built around a number of S4 classes of which the
RasterLayer, RasterBrick, and RasterStack classes are the
most important; a notable feature of the raster package is
that it can work with raster datasets that are stored on disk
and are too large to be loaded into memory

Roger Bivand

Representing spatial data

Spatio-temporal data

A set of classes for spatio-temporal data


is implemented in spacetime; all classes
derive from an abstract class ST, a
template for specific classes holding
data. Spatial locations derive from
Spatial, and hence may reflect points,
pixels, lines or polygons. Slot time holds
the (start) times, and should be of class
xts. Slot endTime of class POSIXct
indicates end times, and must be a
vector having length identical to the
number of records in slot time.

Introduction
R-spatial
Representing spatial data
References

Olinda Hansens Disease data set


Spatial objects
Raster data
Using Spatial family objects
Spatio-temporal data

Reshaping data frames


There are lots of data frames and matrices that we could use
to try out reshaping and indexing
The plm package simply adds an argument to functions
otherwise using a data frame, which, if NULL, defaults to
assuming that the first two columns contain the individual and
the time index and that observations are ordered by individual
and by time period; data are stored in the long format
One popular spatial panel data set (Baltagi & Levin 1986,
1992) concerns annual cigarette consumption in 46 US states
19631992, stored in long format
Well use this data set to get us started, looking at the
reshaping of this kind of data, simplified because there are no
missing values (downloaded from Wiley website, with column
names added)
Roger Bivand

Representing spatial data

Olinda Hansens Disease data set


Spatial objects
Raster data
Using Spatial family objects
Spatio-temporal data

Introduction
R-spatial
Representing spatial data
References

Reading in spatio-temporal data


The cigarette data are not spatial as
such, and the states are only given by
numeric ID, so need to be assigned as
factor level labels. The underlying
papers detail in tables which states are
omitted. The studies are aimed at
relating per capita sales to the price per
pack in the state itself and its
neighbours, after taking other variables
into account. We also need to identify
the states by name.

Roger Bivand

>
+
>
+
>
+
>
>

cigar <- read.table("Cigar.txt",


header = TRUE)
sn <- read.table("States.txt",
header = TRUE)
cigar$state <- factor(cigar$state,
levels = sn$state, labels = as.character(sn$name))
o <- order(cigar$year, cigar$state)
cigar1 <- cigar[o, ]

Representing spatial data

Olinda Hansens Disease data set


Spatial objects
Raster data
Using Spatial family objects
Spatio-temporal data

Introduction
R-spatial
Representing spatial data
References

Cigarette sales meet spacetime

Using spacetime and a prepared version


of US state boundaries saved as a
shapefile, as well as a constructed time
object of the years of observation taken
from the input file, we can construct a
regular ST object. This time, there are
several variables observed for each state
and year.

Roger Bivand

>
+
>
>
>
>
>
>
+
+

dts0 <- paste(as.character(unique(cigar1$year) +


1900), "-01-01", sep = "")
dts <- as.POSIXct(dts0, format = "%Y-%m-%d")
library(rgdal)
states <- readOGR(".", "Cigar")
row.names(states) <- as.character(states$rn)
library(spacetime)
cigar_st <- STFDF(as(states,
"SpatialPolygons"), dts,
cigar1, endTime = delta(dts))

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

Olinda Hansens Disease data set


Spatial objects
Raster data
Using Spatial family objects
Spatio-temporal data

Cigarette sales the details by time

The spacetime package makes it


possible to visualize variables through
time and across space; first the temporal
profiles:
> stplot(cigar_st[, , "sales_pc"],
+
mode = "ts", auto.key = list(space = "right",
+
cex = 0.7))

Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

Olinda Hansens Disease data set


Spatial objects
Raster data
Using Spatial family objects
Spatio-temporal data

Cigarette sales the movie


One way to visualise spatio-temporal
data is in movie, or slide show mode. To
do this, we can create an annual series
of thematic maps of sales of cigarette
packs per inhabitant, using the same
class intervals and colour fills (here from
RColorBrewer). The stplot methods
use underlying spplot methods, but if
given an animate= argument, will
continue until stopped with the stop
button or Ctrl-C.

Roger Bivand

> library(RColorBrewer)
> at <- pretty(cigar_st$sales_pc,
+
n = 10)
> cols <- colorRampPalette(brewer.pal(9,
+
"YlOrBr"))(length(at) +
1)
> stplot(cigar_st[, , "sales_pc"],
+
animate = 1, at = at, col.regions = cols,
+
names.attr = format(index(cigar_st@time),
+
format = "%Y"))

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

Olinda Hansens Disease data set


Spatial objects
Raster data
Using Spatial family objects
Spatio-temporal data

UK house price data set

On the basis of Holly et al. (2011), we can access house price data, and
ONS CPI data for a shorter period. The file contains an updated dataset
from Nationwide, but running from 1988Q1 to 2011Q1, because CPI only
go back to 1988Q1. We need to transpose the data and reshape it from
wide to long format:
>
>
>
>
>
>
>
+

UKh <- read.csv("UK_house.csv")


UKdef <- log(UKh[, 5:17] * (100/UKh$CPI))
tUKdef <- as.data.frame(t(as.matrix(UKdef)))
row.names(tUKdef) <- names(UKdef)
names(tUKdef) <- make.names(paste(UKh$YEAR, "Q", UKh$QUARTER, sep = ""))
tUKdef$nms <- names(UKdef)
dt1 <- reshape(tUKdef, varying = list(1:93), direction = "long", v.names = "logDefValue",
ids = tUKdef$nms)

Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

Building an

STFDF

Olinda Hansens Disease data set


Spatial objects
Raster data
Using Spatial family objects
Spatio-temporal data

object

Once the data are organised, we can add the time object, again not
properly formatted yet, and region borders these are UK NUTS1
regions from a GISCO shapefile, with the exception of the NUTS2 split
between inner and outer London; again we build an STFDF object:
> library(zoo)
> dts <- as.yearqtr(paste(UKh$YEAR, " Q", UKh$QUARTER, sep = ""), "%Y Q%q")
> table(delta(dts) - as.POSIXct(dts))
90 91 92
17 29 47
> UKregs <- readOGR(".", "UK_regions", verbose = FALSE)
> row.names(UKregs) <- as.character(UKregs$rn)
> t2 <- STFDF(sp = as(UKregs, "SpatialPolygons"), time = dts, data = dt1,
+
endTime = delta(dts))

Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

Olinda Hansens Disease data set


Spatial objects
Raster data
Using Spatial family objects
Spatio-temporal data

Visualizing UK house prices

Following the same procedure, we can


look at price trends by region, and at an
animation of the series; subsetting is
possible:
> stplot(t2[, , "logDefValue"],
+
mode = "ts", auto.key = list(space = "right",
+
cex = 0.7))
> at <- pretty(t2$logDefValue,
+
n = 10)
> cols <- colorRampPalette(brewer.pal(9,
+
"Reds"))(length(at) - 1)
> stplot(t2[, , "logDefValue"],
+
animate = 2, at = at, col.regions = cols)

Roger Bivand

Representing spatial data

Introduction
R-spatial
Representing spatial data
References

References

Dover, D. C. and Schopflocher, D. P. (2011). Using funnel plots in public health surveillance. Population Health
Metrics, 9(58):111.

Roger Bivand

Representing spatial data

Você também pode gostar