
R workshop: Introduction to geographic mapping and spatial analysis with R

Christopher Moore
http://umn.edu/~moor0554

March 26, 2009

Introduction
Findings from literature review
I have been examining ways that applied educational researchers can make better use of geographic mapping and spatial analysis. Some promising uses of spatial methods include:

- Promote participation of evaluation stakeholders.
- Plan and implement surveys.
- Conduct cluster randomized trials (randomly assign areas to treatment conditions).
- Implement quasi-experimental studies.
- Spatially reference data and join covariates to enhance primary data and minimize respondent burden.
- Employ spatial (and spatio-temporal) statistical analysis.
- Disseminate information in statistical maps to promote comprehension and influence.

Some risks include:


- Maps are inherently inaccurate and prone to mislead.
- Mere visual decoration and distraction.
- Violation of participants' privacy.
- Spatial autocorrelation complicates statistical analyses. "Everything is related to everything else, but near things are more related than distant things." (Tobler's first law of geography, 1970)
  - Larger sample sizes are required for statistical power.
  - Spatially naive models can yield biased estimates when an important spatially lagged term is omitted.

Some ways to mitigate risks:

- Create high-quality maps that avoid misleading readers. Keep them simple, uncluttered, and accurate when printed in greyscale.
- Carefully choose measures to display in statistical maps (e.g., per capita values instead of raw values; see the sketch after this list).
- Assess and account for spatial statistical dependencies.
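To illustrate the per capita point, a toy sketch (my addition; all numbers are invented): raw counts mostly track where people live, so rates are usually the better quantity to class and map.

dropouts   <- c(120, 45, 300)           # hypothetical counts by district
population <- c(40000, 9000, 250000)    # hypothetical district populations
round(1000 * dropouts / population, 1)  # rates per 1,000 residents: 3.0 5.0 1.2

Note that the smallest district by count has the highest rate, which a raw-count map would hide.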

Spatial data types include:

- A point is a single location, such as a global positioning system (GPS) satellite reading or a street address pinpointed (i.e., geocoded) to a unique location.
- A line is a series of straight line segments that connect a set of ordered points.
- A polygon is an area enclosed by a set of lines, possibly containing holes (e.g., a polygon in the shape of a donut); also described as areal.
- A grid is a collection of points or rectangular areas organized in a regular fashion; also described as raster or lattice.
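For concreteness, a small sketch (my addition, using the sp package that the demonstration below relies on) constructing a minimal object of each of the four types; the coordinates are arbitrary.

library(sp)

pts <- SpatialPoints(cbind(x = c(0, 1), y = c(0, 1)))    # two points
ln  <- SpatialLines(list(Lines(list(Line(
           cbind(c(0, 1, 2), c(0, 1, 0)))), ID = "a")))  # one line through three points
pg  <- SpatialPolygons(list(Polygons(list(Polygon(
           cbind(c(0, 1, 1, 0, 0),
                 c(0, 0, 1, 1, 0)))), ID = "b")))        # a unit-square polygon
grd <- GridTopology(c(0.5, 0.5), c(1, 1), c(10, 10))     # a regular 10 x 10 grid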

Resources for spatial analysis


- http://cran.r-project.org/web/views/Spatial.html (CRAN spatial task view)
- https://stat.ethz.ch/pipermail/r-sig-geo/ (R-SIG-Geo list archives)
- http://www.lib.umn.edu/get/springerebooks (Applied Spatial Data Analysis with R by Bivand, Pebesma, and Gomez-Rubio, available in PDF format with a UMN account)
- http://www.asdar-book.org/ (Bivand, Pebesma, and Gomez-Rubio's book website)
- http://www.edwardtufte.com/tufte/ (minimize your ink-to-information ratio)
- http://www.stat.columbia.edu/~gelman/blog/ (Gelman, a statistician who uses R for maps)
- http://geography.uoregon.edu/bartlein/courses/geog417/index_bak.html (many R examples from Bartlein's spatial analysis course)
- http://www.colorado.edu/geography/gcraft/notes/mapproj/mapproj.html (notes on map projections)
- http://www.colorbrewer.org (ColorBrewer color advice for maps)

Demonstration
Getting started
Install libraries required for this demonstration:

maptools, rgdal, maps, RColorBrewer, classInt, geoR, nlme, psych, car, MBA, lattice, spdep

Open a recording window and save the default graphical parameters.

> windows(record = T)
> oldpar <- par(no.readonly = T)

Projections
Maps project a spherical surface onto a plane, which causes distortion; the larger your area of interest, the more distortion. Over large areas, great circle distance (along the curvature of the earth) is more appropriate than Euclidean distance.

Assign boundaries to a map object. The maps library contains a large database of mapping coordinates.

> library(maps)
> world <- map("world", plot = F)
> head(world$names)
[1] "Canada"
[2] "South Africa"
[3] "Denmark"
[4] "Great Lakes:Superior, Huron, Michigan"
[5] "USSR"
[6] "Pakistan"
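To make the distance point concrete, a quick sketch (my addition; spDists() is from the sp package, and the coordinates are approximate values for Minneapolis and Seattle):

library(sp)
mpls <- cbind(-93.27, 44.98)         # (longitude, latitude)
sea  <- cbind(-122.33, 47.61)
spDists(mpls, sea, longlat = TRUE)   # great circle distance, in kilometers
spDists(mpls, sea, longlat = FALSE)  # naive Euclidean distance, in degrees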

> states <- map("state", plot = F)
> head(states$names)
[1] "alabama"     "arizona"     "arkansas"    "california"
[5] "colorado"    "connecticut"

Convert the boundaries to SpatialLines objects and apply the longlat projection.

> library(maptools)
> world <- map2SpatialLines(world, proj4string = CRS("+proj=longlat"))
> states <- map2SpatialLines(states, proj4string = CRS("+proj=longlat"))

Apply an equal area projection.

> library(rgdal)
> world.laea <- spTransform(world, CRS("+proj=laea +lat_0=0 +lon_0=0"))
> states.laea <- spTransform(states, CRS("+proj=laea +lat_0=43.0758 +lon_0=-89.3976"))

Plot the countries and states with the different projections.

> par(mfrow = c(2, 2), pty = "s", cex.axis = 0.5)
> plot(world, axes = T)
> title(main = "Longitude and\nLatitude")
> plot(world.laea, axes = T)
> title(main = "Lambert Azimuthal\nEqual Area")
> plot(states, axes = T)
> title(main = "Longitude and\nLatitude")
> plot(states.laea, axes = T)
> title(main = "Lambert Azimuthal\nEqual Area",
+     sub = "Minneapolis perspective")

[Figure: four panels comparing the longitude/latitude and Lambert azimuthal equal area projections, for the world (top row) and the lower 48 states from a Minneapolis perspective (bottom row)]

Restore the default graphical parameters.

> par(oldpar)

Spatial referencing
Join attributes to areas (i.e., mean SAT scores to state polygons). Assign the filled areas of the contiguous states to an object.

> map.states <- map("state", plot = F, fill = T)

Examine and process the names of the areas. Apply a "keep the first element" function over the list of split state names. Note there is one name per polygon area (i.e., states and islands), not one per state.

> list.names.states <- strsplit(map.states$names, ":")
> tail(list.names.states)
[[1]]
[1] "washington"   "orcas island"

[[2]]
[1] "washington"     "whidbey island"

[[3]]
[1] "washington" "main"

[[4]]
[1] "west virginia"

[[5]]
[1] "wisconsin"

[[6]]
[1] "wyoming"

> map.IDs <- sapply(list.names.states, function(x) x[1])
> tail(map.IDs)
[1] "washington"    "washington"    "washington"
[4] "west virginia" "wisconsin"     "wyoming"

Convert the boundaries to a SpatialPolygons object and apply the longlat projection. The SpatialPolygons object contains only spatial information (i.e., no attributes, no data frame).

> states <- map2SpatialPolygons(map.states, IDs = map.IDs,
+     proj4string = CRS("+proj=longlat"))
> summary(states)
Object of class SpatialPolygons
Coordinates:
         min       max
r1 -124.68134 -67.00742
r2   25.12993  49.38323
Is projected: FALSE
proj4string : [+proj=longlat +ellps=WGS84]
> plot(states)

Note there is one name per state after converting to a SpatialPolygons object.

> sp.IDs <- sapply(slot(states, "polygons"), function(x) slot(x, "ID"))
> tail(sp.IDs)
[1] "vermont"       "virginia"      "washington"
[4] "west virginia" "wisconsin"     "wyoming"

Download SAT scores, sorted by math.

> download.file("http://blog.lib.umn.edu/moor0554/canoemoore/sat.csv",
+     destfile = "sat.txt")
> sat <- read.csv("sat.txt", stringsAsFactors = F, row.names = 1)
> head(sat)
             name.abbrev verbal math takers.pct
north dakota          nd    594  605          5
iowa                iowa    594  598          5
minnesota           minn    586  598          9
wisconsin            wis    584  595          7
south dakota          sd    585  588          4
illinois             ill    569  585         12

Use the SpatialPolygonsDataFrame() function to join the data to the areas/locations. Note that the row names of the data frame MUST match the IDs of the SpatialPolygons object. The data will be sorted by polygon ID; non-matching cases (the Alaska, Hawaii, and USA rows) will be dropped.

> states.sat <- SpatialPolygonsDataFrame(states, sat)
> summary(states.sat)
Object of class SpatialPolygonsDataFrame
Coordinates:
         min       max
r1 -124.68134 -67.00742
r2   25.12993  49.38323
Is projected: FALSE
proj4string : [+proj=longlat +ellps=WGS84]
Data attributes:
 name.abbrev            verbal           math         takers.pct
 Length:49          Min.   :479.0   Min.   :475.0   Min.   : 4.00
 Class :character   1st Qu.:504.0   1st Qu.:503.0   1st Qu.: 9.00
 Mode  :character   Median :527.0   Median :526.0   Median :32.00
                    Mean   :533.8   Mean   :533.7   Mean   :36.43
                    3rd Qu.:563.0   3rd Qu.:558.0   3rd Qu.:65.00
                    Max.   :594.0   Max.   :605.0   Max.   :80.00
> head(states.sat@data)
            name.abbrev verbal math takers.pct
alabama             ala    561  555          9
arizona            ariz    524  525         34
arkansas            ark    563  556          6
california        calif    497  514         49
colorado           colo    536  540         32
connecticut        conn    510  509         80
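If the join drops rows unexpectedly, a quick way to see which names failed to match (my addition, reusing sp.IDs from above):

setdiff(sp.IDs, rownames(sat))  # polygons with no attribute row
setdiff(rownames(sat), sp.IDs)  # attribute rows with no polygon (alaska, hawaii, usa)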

Not a statistical map yet.

> plot(states.sat)

Write a shapefile for use in Quantum GIS, ArcGIS, etc.

> writeSpatialShape(states.sat, "sat")
> shp.sat <- readShapeSpatial("sat.shp", proj4string = CRS("+proj=longlat"))
> proj4string(shp.sat)
[1] " +proj=longlat +ellps=WGS84"

Examine the attributes in the SpatialPolygonsDataFrame.

Descriptive statistics:

> library(psych)
> describe(states.sat@data[, -1], skew = F)[, -c(1, 6, 7, 10, 11)]
            n   mean    sd median min max
verbal     49 533.82 33.21    527 479 594
math       49 533.73 35.26    526 475 605
takers.pct 49  36.43 28.04     32   4  80

Scatterplot matrix:

> library(car)
> scatterplot.matrix(data.frame(states.sat@data[, -c(1, 5)]), smooth = F)

[Figure: scatterplot matrix of verbal, math, and takers.pct, with rug plots along the diagonal]
Create choropleth maps


Nice-looking color palettes, especially for thematic maps:

> library(RColorBrewer)
> display.brewer.all()

[Figure: display.brewer.all() chart of the sequential, qualitative, and diverging ColorBrewer palettes]

Create ordinal categories/classes separated by quantiles.

> library(classInt)
> plotvar <- states.sat$verbal
> nclr <- 5
> plotclr <- brewer.pal(nclr, "Greys")
> plotclr
[1] "#F7F7F7" "#CCCCCC" "#969696" "#636363" "#252525"
> class <- classIntervals(plotvar, nclr, style = "quantile")
> class
style: quantile
one of 101,270 possible partitions of this variable into 5 classes
  under 498.6 498.6 - 515.2 515.2 - 545.8 545.8 - 567.4    over 567.4
           10            10             9            10            10
> colcode <- findColours(class, plotclr, digits = 3)
> colcode
 [1] "#636363" "#969696" "#636363" "#F7F7F7" "#969696"
 [6] "#CCCCCC" "#CCCCCC" "#F7F7F7" "#CCCCCC" "#F7F7F7"
[11] "#969696" "#252525" "#F7F7F7" "#252525" "#252525"
[16] "#636363" "#636363" "#CCCCCC" "#CCCCCC" "#CCCCCC"
[21] "#636363" "#252525" "#636363" "#252525" "#969696"
[26] "#252525" "#CCCCCC" "#969696" "#F7F7F7" "#636363"
[31] "#F7F7F7" "#F7F7F7" "#252525" "#969696" "#636363"
[36] "#969696" "#F7F7F7" "#CCCCCC" "#F7F7F7" "#252525"
[41] "#636363" "#F7F7F7" "#252525" "#CCCCCC" "#CCCCCC"
[46] "#969696" "#969696" "#252525" "#636363"
attr(,"palette")
[1] "#F7F7F7" "#CCCCCC" "#969696" "#636363" "#252525"
attr(,"table")
under 499 499 - 515 515 - 546 546 - 567  over 567
       10        10         9        10        10

A very simple statistical map...

> plot(states.sat, col = colcode)

Better looking and more informative...


> plotclr <- brewer.pal(nclr, "Purples")
> class <- classIntervals(plotvar, nclr, style = "quantile")
> colcode <- findColours(class, plotclr, digits = 3)
> plot(states.sat, col = colcode, border = "grey", axes = T)
> title(main = "SAT math scores in 1999")
> legend("bottomleft", legend = names(attr(colcode, "table")),
+     fill = attr(colcode, "palette"))

[Figure: quantile choropleth in the Purples palette, titled "SAT math scores in 1999", with legend classes under 499, 499-515, 515-546, 546-567, and over 567]

Add labels (use only if necessary, because labels can clutter a map)...

> plot(states.sat, col = colcode, border = "grey", axes = T)
> title(main = "SAT math scores in 1999")
> legend("bottomleft", legend = names(attr(colcode, "table")),
+     fill = attr(colcode, "palette"))
> centroids <- coordinates(states.sat)
> text(centroids, states.sat$name.abbrev)

[Figure: the same choropleth with state abbreviations plotted at the polygon centroids]

Diverging colors can point out extremes, but will mislead readers when printed in black and white.

> plotclr <- brewer.pal(nclr, "RdBu")
> class <- classIntervals(plotvar, nclr, style = "quantile")
> colcode <- findColours(class, plotclr, digits = 3)
> plot(states.sat, col = colcode, border = "grey", axes = T)
> title(main = "SAT math scores in 1999")
> legend("bottomleft", legend = names(attr(colcode, "table")),
+     fill = attr(colcode, "palette"))


[Figure: the same choropleth drawn with the diverging RdBu palette]

Qualitative colors are inappropriate for ordinal categories.

> plotclr <- brewer.pal(nclr, "Set1")
> class <- classIntervals(plotvar, nclr, style = "quantile")
> colcode <- findColours(class, plotclr, digits = 3)
> plot(states.sat, col = colcode, border = "grey", axes = T)
> title(main = "SAT math scores in 1999")
> legend("bottomleft", legend = names(attr(colcode, "table")),
+     fill = attr(colcode, "palette"))

[Figure: the same choropleth drawn with the qualitative Set1 palette]

Spatial dependence
Create a spatial proximity matrix after removing Washington, DC.

> library(spdep)
> states.sat <- states.sat[!(row.names(slot(states.sat, "data")) ==
+     "district of columbia"), ]
> nb.states.sat <- poly2nb(states.sat, row.names = rownames(states.sat@data))
> listw.states.sat <- nb2listw(nb.states.sat)

Examine the correlation between the observations and the first-order lag.

> lag.states.sat <- lag.listw(listw.states.sat, states.sat$verbal)
> cor.test(states.sat$verbal, lag.states.sat)

	Pearson's product-moment correlation

data:  states.sat$verbal and lag.states.sat
t = 9.0889, df = 46, p-value = 7.758e-12
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.6698548 0.8842286
sample estimates:
      cor
0.8014503

Examine Moran's I between the observations and the first-order lag.

> moran.plot(states.sat$verbal, listw.states.sat)
Potentially influential observations of
	 lm(formula = wx ~ x) :

             dfb.1_ dfb.x dffit   cov.r   cook.d hat
indiana       0.53  -0.50  0.66_*  0.79_*  0.19  0.05
iowa         -0.10   0.10  0.12    1.14_*  0.01  0.09
north dakota  0.00   0.00  0.00    1.15_*  0.00  0.09
texas         0.73  -0.69  0.89_*  0.64_*  0.31  0.05
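The correlation above is descriptive; spdep also offers a formal test of Moran's I against spatial randomness that could be run here as a complement (my addition):

moran.test(states.sat$verbal, listw.states.sat)  # Moran's I with an analytic p-value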

[Figure: Moran scatterplot of states.sat$verbal against its spatial lag, with iowa, north dakota, texas, and indiana flagged as influential]

Examine Moran's I between the observations and lags over several orders.

> plot.spcor(sp.correlogram(nb.states.sat, states.sat$verbal,
+     order = 5, method = "I"), xlab = "Spatial lags",
+     main = "Spatial correlogram: Autocorrelation CIs")


[Figure: "Spatial correlogram: Autocorrelation CIs", Moran's I with confidence intervals over five spatial lags]

Plot the neighborhood structure.

> plot(states.sat)
> plot(nb.states.sat, coordinates(states.sat), col = "blue",
+     lwd = 2, add = T)
> title(main = "State borders and contiguous neighborhood structure")


[Figure: "State borders and contiguous neighborhood structure"]

> plot(nb.states.sat, coordinates(states.sat), col = "blue",
+     lwd = 2)
> title(main = "Contiguous neighborhood structure")


[Figure: "Contiguous neighborhood structure", the neighborhood graph drawn alone]

Use the lagged values to plot a smoothed map.

> par(mfrow = c(1, 2), pty = "s", cex = 0.5)
> plotvar <- states.sat$verbal
> plotclr <- brewer.pal(nclr, "Purples")
> class <- classIntervals(plotvar, nclr, style = "quantile")
> colcode <- findColours(class, plotclr, digits = 3)
> plot(states.sat, col = colcode, border = "grey", axes = T)
> title(main = "SAT math scores in 1999")
> legend("bottomleft", legend = names(attr(colcode, "table")),
+     fill = attr(colcode, "palette"))
> plotvar <- lag.states.sat
> plotclr <- brewer.pal(nclr, "Purples")
> class <- classIntervals(plotvar, nclr, style = "quantile")
> colcode <- findColours(class, plotclr, digits = 3)
> plot(states.sat, col = colcode, border = "grey", axes = T)
> title(main = "Smoothed map of SAT math scores in 1999",
+     sub = "Mean of contiguous neighbors")
> legend("bottomleft", legend = names(attr(colcode, "table")),
+     fill = attr(colcode, "palette"))
> par(oldpar)

[Figure: side-by-side quantile choropleths of the raw scores ("SAT math scores in 1999") and the smoothed scores ("Smoothed map of SAT math scores in 1999", subtitle "Mean of contiguous neighbors")]
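With the default row-standardised weights, each lagged value used in the smoothed map is just the mean of that state's contiguous neighbours; a quick check of the first state (my addition):

first.nb <- nb.states.sat[[1]]     # neighbour indices of the first state
mean(states.sat$verbal[first.nb])  # the lag computed by hand
lag.states.sat[1]                  # the lag from lag.listw(); should match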

Spatial modeling
First, a spatially naive model. See "A close look at the spatial structure implied by the CAR and SAR models" by Melanie Wall (2004).

> lm.states.sat <- lm(verbal ~ takers.pct + I(takers.pct^2),
+     data.frame(states.sat@data))
> summary(lm.states.sat)

Call:
lm(formula = verbal ~ takers.pct + I(takers.pct^2),
    data = data.frame(states.sat@data))

Residuals:
     Min       1Q   Median       3Q      Max
-22.0256  -8.0468   0.2721   6.1125  22.1333

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)     590.264640   4.062224 145.306  < 2e-16 ***
takers.pct       -2.881813   0.313574  -9.190 6.84e-12 ***
I(takers.pct^2)   0.023260   0.003933   5.914 4.19e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11.52 on 45 degrees of freedom
Multiple R-squared: 0.8836, Adjusted R-squared: 0.8784
F-statistic: 170.8 on 2 and 45 DF, p-value: < 2.2e-16

Now, a spatial lag model.

> lm.lag.states.sat <- lm(verbal ~ lag.states.sat +
+     takers.pct + I(takers.pct^2), data.frame(states.sat@data))
> summary(lm.lag.states.sat)

Call:
lm(formula = verbal ~ lag.states.sat + takers.pct + I(takers.pct^2),
    data = data.frame(states.sat@data))

Residuals:
    Min      1Q  Median      3Q     Max
-22.380  -5.404   1.013   7.489  23.734

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)     393.176311  54.203853   7.254 4.86e-09 ***
lag.states.sat    0.351215   0.096379   3.644 0.000705 ***
takers.pct       -2.631304   0.286313  -9.190 8.45e-12 ***
I(takers.pct^2)   0.023235   0.003486   6.665 3.54e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 10.21 on 44 degrees of freedom
Multiple R-squared: 0.9106, Adjusted R-squared: 0.9045
F-statistic: 149.4 on 3 and 44 DF, p-value: < 2.2e-16

A simultaneous autoregressive model.

> listw.states.sat <- nb2listw(nb.states.sat)
> lm.sar.states.sat <- spautolm(verbal ~ takers.pct +
+     I(takers.pct^2), data.frame(states.sat@data),
+     listw.states.sat)
> summary(lm.sar.states.sat)

Call: spautolm(formula = verbal ~ takers.pct + I(takers.pct^2),
    data = data.frame(states.sat@data), listw = listw.states.sat)

Residuals:
     Min       1Q   Median       3Q      Max
-17.8535  -7.8846   1.3722   7.4526  20.3795

Coefficients:
                   Estimate Std. Error  z value  Pr(>|z|)
(Intercept)     584.766102   5.163588 113.2480 < 2.2e-16
takers.pct       -2.501822   0.308423  -8.1117 4.441e-16
I(takers.pct^2)   0.019276   0.004135   4.6616 3.137e-06

Lambda: 0.62138 LR test value: 11.4 p-value: 0.00073457

Log likelihood: -178.1777
ML residual variance (sigma squared): 87.458, (sigma: 9.3519)
Number of observations: 48
Number of parameters estimated: 5
AIC: 366.36
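Choosing among these specifications is often guided by Lagrange multiplier diagnostics computed on the spatially naive model; a hedged sketch using spdep's lm.LMtests() (my addition, not part of the original handout):

lm.LMtests(lm.states.sat, listw.states.sat,
           test = c("LMlag", "LMerr"))  # tests for an omitted lag vs. error dependence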

A linear mixed effects model. An empirical variogram of residuals can be used as a diagnostic tool for assessing the adequacy of a selected model for the covariance between nearby observations in time and space (Fitzmaurice, Laird, and Ware, p. 241). The fitted line suggests an exponential correlation structure in this example.

> library(geoR)
> variog.lm.states.sat <- variog(coords = coordinates(states.sat),
+     data = resid(lm.states.sat))
variog: computing omnidirectional variogram
> plot(variog.lm.states.sat)
> variofit.lm.states.sat <- variofit(variog.lm.states.sat)
variofit: weights used: npairs
variofit: minimisation function used: optim
variofit: searching for best initial value ... selected values:
              sigmasq phi     tausq  kappa
initial.value "109.2" "15.79" "54.6" "0.5"
status        "est"   "est"   "est"  "fix"
loss value: 760324.941711764
> lines(variofit.lm.states.sat)


[Figure: empirical semivariogram of the residuals (semivariance vs. distance) with the fitted exponential curve]
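For reference, the exponential model behind that fitted line is gamma(h) = tau^2 + sigma^2 * (1 - exp(-h / phi)); a sketch (my addition) evaluating it at a few distances, plugging in the starting values printed above purely for illustration:

exp.variog <- function(h, sigmasq = 109.2, phi = 15.79, tausq = 54.6)
    tausq + sigmasq * (1 - exp(-h / phi))
exp.variog(c(10, 20, 40))  # semivariance rising toward the sill tausq + sigmasq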

Following Bivand, Pebesma, and Gomez-Rubio, p. 287:

> library(nlme)
> df.centroids <- data.frame(coordinates(states.sat))
> names(df.centroids) <- c("long", "lat")
> corExp.states.sat <- corSpatial(1, form = ~"long" + "lat", type = "e")
> corExp.states.sat <- Initialize(corExp.states.sat, df.centroids)
> states.sat@data <- data.frame(states.sat, key = row(states.sat@data)[, 1])
> lme.states.sat <- lme(verbal ~ takers.pct + I(takers.pct^2),
+     random = ~1 | key, data = data.frame(states.sat),
+     corr = corExp.states.sat)
> summary(lme.states.sat)

Linear mixed-effects model fit by REML
 Data: data.frame(states.sat)
       AIC      BIC    logLik
  390.0037 400.8437 -189.0019

Random effects:
 Formula: ~1 | key
        (Intercept) Residual
StdDev:    10.80869 3.986240

Correlation Structure: Exponential spatial correlation
 Formula: ~"long" + "lat" | key
 Parameter estimate(s):
     range
0.04861126

Fixed effects: verbal ~ takers.pct + I(takers.pct^2)
                    Value Std.Error DF   t-value p-value
(Intercept)     590.2646   4.062224 45 145.30577       0
takers.pct       -2.8818   0.313574 45  -9.19020       0
I(takers.pct^2)   0.0233   0.003933 45   5.91415       0
 Correlation:
                (Intr) tkrs.p
takers.pct      -0.830
I(takers.pct^2)  0.742 -0.981

Standardized Within-Group Residuals:
         Min           Q1          Med           Q3          Max
-0.661548347 -0.241688139  0.008172693  0.183590053  0.664782514

Number of Observations: 48
Number of Groups: 48

Geocoding
Geocode the Big Ten schools. Assign the geocoding service prefix.

> URL <- "http://maps.google.com/maps/geo?q="

Assign the locations to a vector.

> addresses <- c("801 South Wright Street, Champaign IL",
+     "107 S. Indiana Ave., Bloomington IN", "1 W Prentiss St, Iowa City IA",
+     "07 Fletcher St, Ann Arbor MI", "130 E. Elizabeth St, East Lansing MI",
+     "100 Church Street SE, Minneapolis MN", "1820 Chicago Ave, Evanston IL",
+     "275 W Woodruff Ave, Columbus OH", "243 S Allen St, State College PA",
+     "501 Hayes Street, West Lafayette IN", "716 Langdon St, Madison WI")

Replace spaces and commas to create the geocoding URLs.

> addresses <- gsub(" ", "+", addresses)
> addresses <- gsub(",", "%2C", addresses)
> addresses
 [1] "801+South+Wright+Street%2C+Champaign+IL"
 [2] "107+S.+Indiana+Ave.%2C+Bloomington+IN"
 [3] "1+W+Prentiss+St%2C+Iowa+City+IA"
 [4] "07+Fletcher+St%2C+Ann+Arbor+MI"
 [5] "130+E.+Elizabeth+St%2C+East+Lansing+MI"
 [6] "100+Church+Street+SE%2C+Minneapolis+MN"
 [7] "1820+Chicago+Ave%2C+Evanston+IL"
 [8] "275+W+Woodruff+Ave%2C+Columbus+OH"

[9] "243+S+Allen+St%2C+State+College+PA" [10] "501+Hayes+Street%2C+West+Lafayette+IN" [11] "716+Langdon+St%2C+Madison+WI" Create a blank data frame to be lled with coordinates. > coords <- data.frame(t(rep(NA, 3))) > colnames(coords) <- c("accuracy", "lat", "long") Loop through addresses, geocode each one, and ll data frame with coordinates. > tmp <- tempfile() > for (i in 1:length(addresses)) { + download.file(paste(URL, addresses[i], "&output=csv", + sep = ""), destfile = tmp) + coords[i, ] <- read.csv(tmp, header = F)[-1] + } Make sure the geocodes are suciently accurate before nalizing geocoded data frame. Accuracy codes:
http://code.google.com/apis/maps/documentation/geocoding/index.html#GeocodingAccuracy
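When geocoding many addresses against a web service, a more defensive loop that pauses between requests and tolerates individual failures might look like this (my sketch, using the same URL scheme as above):

for (i in seq_along(addresses)) {
    got <- tryCatch(read.csv(paste(URL, addresses[i], "&output=csv", sep = ""),
                             header = FALSE),
                    error = function(e) NULL)
    if (!is.null(got)) coords[i, ] <- got[-1]  # drop the status column, as above
    Sys.sleep(0.5)                             # throttle requests to the service
}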

> cbind(addresses, coords)
                                  addresses accuracy      lat      long
1  801+South+Wright+Street%2C+Champaign+IL         8 40.10796 -88.22893
2     107+S.+Indiana+Ave.%2C+Bloomington+IN        8 39.16648 -86.52689
3           1+W+Prentiss+St%2C+Iowa+City+IA        8 41.65444 -91.53623
4            07+Fletcher+St%2C+Ann+Arbor+MI        6 42.27976 -83.73714
5    130+E.+Elizabeth+St%2C+East+Lansing+MI        8 42.73823 -84.48285
6    100+Church+Street+SE%2C+Minneapolis+MN        8 44.97590 -93.23491
7           1820+Chicago+Ave%2C+Evanston+IL        8 42.05049 -87.67802
8         275+W+Woodruff+Ave%2C+Columbus+OH        8 40.00393 -83.01684
9        243+S+Allen+St%2C+State+College+PA        8 40.79319 -77.85999
10   501+Hayes+Street%2C+West+Lafayette+IN         8 40.43007 -86.91162
11             716+Langdon+St%2C+Madison+WI        8 43.07582 -89.39763

Point-level maps
Join attributes to points (i.e., Big Ten stadium capacity to stadium locations). Create an attributes data frame.

> attributes <- data.frame(stadium.cap = c(62872,
+     52180, 70585, 107501, 75005, 50300, 49256,
+     102329, 107282, 62500, 80321))

> attributes$win.pct <- 100 * c(0.34, 0.25, 0.512,
+     0.727, 0.465, 0.32, 0.422, 0.777, 0.625, 0.473, 0.59)
> attributes$under.grads <- c(30895, 30394, 20907,
+     26083, 36072, 28645, 8284, 39209, 36612, 31290, 28999)
> attributes$pop <- c(75254, 183733, 147038, 341847,
+     528193, 3175041, 74239 + 0.5 * 8711000, 1754337,
+     135758, 182821, 543022)
> rownames(attributes) <- c("University Of Illinois",
+     "Indiana University", "University of Iowa",
+     "University of Michigan", "Michigan State University",
+     "University of Minnesota", "Northwestern University",
+     "Ohio State University", "Penn State University",
+     "Purdue University", "University of Wisconsin")

Convert the data frame to a spatial object with the longlat projection. Note that longitude is x and latitude is y, which required reversing their order in the data frame above. The proj4string output below indicates that no projection has been assigned yet.

> library(maptools)
> coordinates(coords) <- c("long", "lat")
> big.ten <- SpatialPointsDataFrame(coords, attributes)
> big.ten
                                  coordinates stadium.cap win.pct under.grads     pop
University Of Illinois     (-88.2289, 40.108)       62872    34.0       30895   75254
Indiana University        (-86.5269, 39.1665)       52180    25.0       30394  183733
University of Iowa        (-91.5362, 41.6544)       70585    51.2       20907  147038
University of Michigan    (-83.7371, 42.2798)      107501    72.7       26083  341847
Michigan State University (-84.4828, 42.7382)       75005    46.5       36072  528193
University of Minnesota   (-93.2349, 44.9759)       50300    32.0       28645 3175041
Northwestern University    (-87.678, 42.0505)       49256    42.2        8284 4429739
Ohio State University     (-83.0168, 40.0039)      102329    77.7       39209 1754337
Penn State University        (-77.86, 40.7932)      107282    62.5       36612  135758
Purdue University         (-86.9116, 40.4301)       62500    47.3       31290  182821
University of Wisconsin   (-89.3976, 43.0758)       80321    59.0       28999  543022
> class(big.ten)
[1] "SpatialPointsDataFrame"
attr(,"package")
[1] "sp"
> library(rgdal)
> proj4string(big.ten)
[1] NA

Compare the maps before and after applying the longlat projection.

> par(mfrow = c(1, 2), pty = "s", lab = c(5, 3, 7))
> plot(big.ten, axes = T)
> proj4string(big.ten) <- CRS("+proj=longlat")
> plot(big.ten, axes = T)
> par(oldpar)

[Figure: the Big Ten points plotted before (no projection) and after assigning the longlat projection]

Make a presentable map with varying symbol sizes. Assign the contiguous state boundaries to an object, convert the boundaries to a SpatialLines object, and apply the longlat projection.

> library(maps)
> states <- map("state", plot = F)
> states <- map2SpatialLines(states, proj4string = CRS("+proj=longlat"))
> plot(big.ten, axes = T, col = "white",
+     xlim = c(min(big.ten@coords[, 1]) * 1.025,
+         max(big.ten@coords[, 1]) * 0.975))
> plot(states, add = T, col = "grey")
> plot(big.ten, axes = T, add = T)
> title(main = "Big Ten schools", xlab = "Longitude",
+     ylab = "Latitude")

> points(big.ten, cex = big.ten$stadium.cap/10000)
> text(big.ten, labels = rownames(big.ten@data),
+     cex = 0.5, pos = 3)
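Readers cannot decode the circle sizes without a key; one option (my addition; the reference capacities are round numbers chosen for the key, scaled with the same cex rule as the points above):

legend("bottomright", pch = 1, pt.cex = c(50000, 75000, 100000)/10000,
       legend = c("50,000", "75,000", "100,000"),
       title = "Stadium capacity", bty = "n")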

[Figure: "Big Ten schools" map with circles sized by stadium capacity and labeled school names]

Make a symbols map.

> plot(big.ten, axes = T, col = "white",
+     xlim = c(min(big.ten@coords[, 1]) * 1.025,
+         max(big.ten@coords[, 1]) * 0.975))
> plot(states, add = T, col = "grey")
> title(main = "Big Ten schools", xlab = "Longitude",
+     ylab = "Latitude")
> text(big.ten, labels = rownames(big.ten@data),
+     cex = big.ten$stadium.cap/1e+05)


[Figure: "Big Ten schools" map with each school name printed at a size proportional to stadium capacity]

Plot a surface approximation.

> library(MBA)
> obs.surf <- mba.surf(cbind(big.ten@coords[, 1],
+     big.ten@coords[, 2], big.ten$stadium.cap),
+     no.X = 100, no.Y = 100, extend = T)$xyz.est
> jet.colors <- colorRampPalette(c("#00007F", "blue",
+     "#007FFF", "cyan", "#7FFF7F", "yellow",
+     "#FF7F00", "red", "#7F0000"))
> image(obs.surf, xaxs = "r", yaxs = "r",
+     main = "Surface approximation", col = jet.colors(200))
> points(big.ten, pch = 3)
> text(big.ten, labels = rownames(big.ten@data), pos = 1)


[Figure: "Surface approximation" image plot of stadium capacity, with school locations marked and labeled]
Plot the map with the surface approximation.

> plot(big.ten, axes = T, col = "white",
+     xlim = c(min(big.ten@coords[, 1]) * 1.025,
+         max(big.ten@coords[, 1]) * 0.975))
> plot(states, add = T, col = "grey")
> image(obs.surf, col = jet.colors(200), add = T)
> points(big.ten, pch = 3)
> title(main = "Big Ten schools", xlab = "Longitude",
+     ylab = "Latitude")
> text(big.ten, labels = rownames(big.ten@data),
+     cex = 0.5, pos = 3)

[Figure: "Big Ten schools" map with the capacity surface drawn beneath the state boundaries and labels]

3D plots
Cloud plot:

> library(lattice)
> print(cloud(big.ten$stadium.cap ~ big.ten@coords[, 1] *
+     big.ten@coords[, 2], type = c("p", "h"),
+     xlab = "Longitude", ylab = "Latitude", zlab = "Stadium\ncapacity",
+     main = "Cloud plot of Big Ten football stadium capacity"))


[Figure: "Cloud plot of Big Ten football stadium capacity"]

3D perspective plot:

> persp(obs.surf, xlab = "Longitude", ylab = "Latitude",
+     zlab = "Stadium\ncapacity", theta = 315, phi = 30,
+     expand = 0.75, col = "lightgreen", shade = 0.75,
+     border = NA)


[Figure: 3D perspective plot of the stadium capacity surface]
