
R workshop: Introduction to geographic mapping and spatial analysis with R

Christopher Moore
http://umn.edu/~moor0554

March 26, 2009

Introduction
Findings from literature review
I have been examining ways that applied educational researchers can make better use of geographic mapping and spatial analysis. Some promising uses of spatial methods include:

- Promote participation of evaluation stakeholders.
- Plan and implement surveys.
- Conduct cluster randomized trials (randomly assign areas to treatment conditions).
- Implement quasi-experimental studies.
- Spatially reference data and join covariates to enhance primary data and minimize respondent burden.
- Employ spatial (and spatio-temporal) statistical analysis.
- Disseminate information in statistical maps to promote comprehension and influence.

Some risks include:


- Maps are inherently inaccurate and prone to mislead.
- Mere visual decoration and distraction.
- Violation of participants' privacy.
- Spatial autocorrelation complicates statistical analyses. "Everything is related to everything else, but near things are more related than distant things." (Tobler's first law of geography, 1970)
  - Larger sample sizes are required for statistical power.
  - Spatially naive models can yield biased estimates when an important spatially lagged term is omitted.

Some ways to mitigate risks:

- Create high-quality maps that avoid misleading readers. Keep them simple, uncluttered, and accurate when printed in greyscale.
- Carefully choose measures to display in statistical maps (e.g., per capita values instead of raw values; see the sketch after this list).
- Assess and account for spatial statistical dependencies.
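To illustrate the per capita point, a toy sketch (my addition; all numbers are invented): raw counts mostly track where people live, so rates are usually the better quantity to class and map.

dropouts   <- c(120, 45, 300)           # hypothetical counts by district
population <- c(40000, 9000, 250000)    # hypothetical district populations
round(1000 * dropouts / population, 1)  # rates per 1,000 residents: 3.0 5.0 1.2

Note that the smallest district by count has the highest rate, which a raw-count map would hide.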

Spatial data types include:

- A point is a single location, such as a global positioning system (GPS) satellite reading or a street address pinpointed (i.e., geocoded) to a unique location.
- A line is a series of straight line segments that connect a set of ordered points.
- A polygon is an area enclosed by a set of lines, possibly containing holes (e.g., a polygon in the shape of a donut); also described as areal.
- A grid is a collection of points or rectangular areas organized in a regular fashion; also described as raster or lattice.
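For concreteness, a small sketch (my addition, using the sp package that the demonstration below relies on) constructing a minimal object of each of the four types; the coordinates are arbitrary.

library(sp)

pts <- SpatialPoints(cbind(x = c(0, 1), y = c(0, 1)))    # two points
ln  <- SpatialLines(list(Lines(list(Line(
           cbind(c(0, 1, 2), c(0, 1, 0)))), ID = "a")))  # one line through three points
pg  <- SpatialPolygons(list(Polygons(list(Polygon(
           cbind(c(0, 1, 1, 0, 0),
                 c(0, 0, 1, 1, 0)))), ID = "b")))        # a unit-square polygon
grd <- GridTopology(c(0.5, 0.5), c(1, 1), c(10, 10))     # a regular 10 x 10 grid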

Resources for spatial analysis


- http://cran.r-project.org/web/views/Spatial.html (CRAN spatial task view)
- https://stat.ethz.ch/pipermail/r-sig-geo/ (R-SIG-Geo list archives)
- http://www.lib.umn.edu/get/springerebooks (Applied Spatial Data Analysis with R by Bivand, Pebesma, and Gomez-Rubio, available in PDF format with a UMN account)
- http://www.asdar-book.org/ (Bivand, Pebesma, and Gomez-Rubio's book website)
- http://www.edwardtufte.com/tufte/ (minimize your ink-to-information ratio)
- http://www.stat.columbia.edu/~gelman/blog/ (Gelman, a statistician who uses R for maps)
- http://geography.uoregon.edu/bartlein/courses/geog417/index_bak.html (many R examples from Bartlein's spatial analysis course)
- http://www.colorado.edu/geography/gcraft/notes/mapproj/mapproj.html (notes on map projections)
- http://www.colorbrewer.org (ColorBrewer color advice for maps)

Demonstration
Getting started
Install libraries required for this demonstration:

maptools, rgdal, maps, RColorBrewer, classInt, geoR, nlme, psych, car, MBA, lattice, spdep

Open a recording window and save the default graphical parameters.

> windows(record = T)
> oldpar <- par(no.readonly = T)

Projections
Maps project a spherical surface onto a plane, which causes distortion; the larger your area of interest, the more distortion. Over large areas, great circle distance (along the curvature of the earth) is more appropriate than Euclidean distance.

Assign boundaries to a map object. The maps library contains a large database of mapping coordinates.

> library(maps)
> world <- map("world", plot = F)
> head(world$names)
[1] "Canada"
[2] "South Africa"
[3] "Denmark"
[4] "Great Lakes:Superior, Huron, Michigan"
[5] "USSR"
[6] "Pakistan"
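To make the distance point concrete, a quick sketch (my addition; spDists() is from the sp package, and the coordinates are approximate values for Minneapolis and Seattle):

library(sp)
mpls <- cbind(-93.27, 44.98)         # (longitude, latitude)
sea  <- cbind(-122.33, 47.61)
spDists(mpls, sea, longlat = TRUE)   # great circle distance, in kilometers
spDists(mpls, sea, longlat = FALSE)  # naive Euclidean distance, in degrees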

> states <- map("state", plot = F)
> head(states$names)
[1] "alabama"     "arizona"     "arkansas"    "california"
[5] "colorado"    "connecticut"

Convert the boundaries to SpatialLines objects and apply the longlat projection.

> library(maptools)
> world <- map2SpatialLines(world, proj4string = CRS("+proj=longlat"))
> states <- map2SpatialLines(states, proj4string = CRS("+proj=longlat"))

Apply an equal area projection.

> library(rgdal)
> world.laea <- spTransform(world, CRS("+proj=laea +lat_0=0 +lon_0=0"))
> states.laea <- spTransform(states, CRS("+proj=laea +lat_0=43.0758 +lon_0=-89.3976"))

Plot the countries and states with the different projections.

> par(mfrow = c(2, 2), pty = "s", cex.axis = 0.5)
> plot(world, axes = T)
> title(main = "Longitude and\nLatitude")
> plot(world.laea, axes = T)
> title(main = "Lambert Azimuthal\nEqual Area")
> plot(states, axes = T)
> title(main = "Longitude and\nLatitude")
> plot(states.laea, axes = T)
> title(main = "Lambert Azimuthal\nEqual Area",
+     sub = "Minneapolis perspective")

[Figure: four panels comparing the longitude/latitude and Lambert azimuthal equal area projections, for the world (top row) and the lower 48 states from a Minneapolis perspective (bottom row)]

Restore the default graphical parameters.

> par(oldpar)

Spatial referencing
Join attributes to areas (i.e., mean SAT scores to state polygons). Assign the filled areas of the contiguous states to an object.

> map.states <- map("state", plot = F, fill = T)

Examine and process the names of the areas. Apply a "keep the first element" function over the list of split state names. Note there is one name per polygon area (i.e., states and islands), not one per state.

> list.names.states <- strsplit(map.states$names, ":")
> tail(list.names.states)
[[1]]
[1] "washington"   "orcas island"

[[2]]
[1] "washington"     "whidbey island"

[[3]]
[1] "washington" "main"

[[4]]
[1] "west virginia"

[[5]]
[1] "wisconsin"

[[6]]
[1] "wyoming"

> map.IDs <- sapply(list.names.states, function(x) x[1])
> tail(map.IDs)
[1] "washington"    "washington"    "washington"
[4] "west virginia" "wisconsin"     "wyoming"

Convert the boundaries to a SpatialPolygons object and apply the longlat projection. The SpatialPolygons object contains only spatial information (i.e., no attributes, no data frame).

> states <- map2SpatialPolygons(map.states, IDs = map.IDs,
+     proj4string = CRS("+proj=longlat"))
> summary(states)
Object of class SpatialPolygons
Coordinates:
         min       max
r1 -124.68134 -67.00742
r2   25.12993  49.38323
Is projected: FALSE
proj4string : [+proj=longlat +ellps=WGS84]
> plot(states)

Note there is one name per state after converting to a SpatialPolygons object.

> sp.IDs <- sapply(slot(states, "polygons"), function(x) slot(x, "ID"))
> tail(sp.IDs)
[1] "vermont"       "virginia"      "washington"
[4] "west virginia" "wisconsin"     "wyoming"

Download SAT scores, sorted by math.

> download.file("http://blog.lib.umn.edu/moor0554/canoemoore/sat.csv",
+     destfile = "sat.txt")
> sat <- read.csv("sat.txt", stringsAsFactors = F, row.names = 1)
> head(sat)
             name.abbrev verbal math takers.pct
north dakota          nd    594  605          5
iowa                iowa    594  598          5
minnesota           minn    586  598          9
wisconsin            wis    584  595          7
south dakota          sd    585  588          4
illinois             ill    569  585         12

Use the SpatialPolygonsDataFrame() function to join the data to the areas/locations. Note that the row names of the data frame MUST match the IDs of the SpatialPolygons object. The data will be sorted by polygon ID; non-matching cases (the Alaska, Hawaii, and USA rows) will be dropped.

> states.sat <- SpatialPolygonsDataFrame(states, sat)
> summary(states.sat)
Object of class SpatialPolygonsDataFrame
Coordinates:
         min       max
r1 -124.68134 -67.00742
r2   25.12993  49.38323
Is projected: FALSE
proj4string : [+proj=longlat +ellps=WGS84]
Data attributes:
 name.abbrev            verbal           math         takers.pct
 Length:49          Min.   :479.0   Min.   :475.0   Min.   : 4.00
 Class :character   1st Qu.:504.0   1st Qu.:503.0   1st Qu.: 9.00
 Mode  :character   Median :527.0   Median :526.0   Median :32.00
                    Mean   :533.8   Mean   :533.7   Mean   :36.43
                    3rd Qu.:563.0   3rd Qu.:558.0   3rd Qu.:65.00
                    Max.   :594.0   Max.   :605.0   Max.   :80.00
> head(states.sat@data)
            name.abbrev verbal math takers.pct
alabama             ala    561  555          9
arizona            ariz    524  525         34
arkansas            ark    563  556          6
california        calif    497  514         49
colorado           colo    536  540         32
connecticut        conn    510  509         80
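If the join drops rows unexpectedly, a quick way to see which names failed to match (my addition, reusing sp.IDs from above):

setdiff(sp.IDs, rownames(sat))  # polygons with no attribute row
setdiff(rownames(sat), sp.IDs)  # attribute rows with no polygon (alaska, hawaii, usa)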

Not a statistical map yet.

> plot(states.sat)

Write a shapefile for use in Quantum GIS, ArcGIS, etc.

> writeSpatialShape(states.sat, "sat")
> shp.sat <- readShapeSpatial("sat.shp", proj4string = CRS("+proj=longlat"))
> proj4string(shp.sat)
[1] " +proj=longlat +ellps=WGS84"

Examine the attributes in the SpatialPolygonsDataFrame.

Descriptive statistics:

> library(psych)
> describe(states.sat@data[, -1], skew = F)[, -c(1, 6, 7, 10, 11)]
            n   mean    sd median min max
verbal     49 533.82 33.21    527 479 594
math       49 533.73 35.26    526 475 605
takers.pct 49  36.43 28.04     32   4  80

Scatterplot matrix:

> library(car)
> scatterplot.matrix(data.frame(states.sat@data[, -c(1, 5)]), smooth = F)

[Figure: scatterplot matrix of verbal, math, and takers.pct, with rug plots along the diagonal]
Create choropleth maps


Nice-looking color palettes, especially for thematic maps:

> library(RColorBrewer)
> display.brewer.all()

[Figure: display.brewer.all() chart of the sequential, qualitative, and diverging ColorBrewer palettes]

Create ordinal categories/classes separated by quantiles.

> library(classInt)
> plotvar <- states.sat$verbal
> nclr <- 5
> plotclr <- brewer.pal(nclr, "Greys")
> plotclr
[1] "#F7F7F7" "#CCCCCC" "#969696" "#636363" "#252525"
> class <- classIntervals(plotvar, nclr, style = "quantile")
> class
style: quantile
one of 101,270 possible partitions of this variable into 5 classes
  under 498.6 498.6 - 515.2 515.2 - 545.8 545.8 - 567.4    over 567.4
           10            10             9            10            10
> colcode <- findColours(class, plotclr, digits = 3)
> colcode
 [1] "#636363" "#969696" "#636363" "#F7F7F7" "#969696"
 [6] "#CCCCCC" "#CCCCCC" "#F7F7F7" "#CCCCCC" "#F7F7F7"
[11] "#969696" "#252525" "#F7F7F7" "#252525" "#252525"
[16] "#636363" "#636363" "#CCCCCC" "#CCCCCC" "#CCCCCC"
[21] "#636363" "#252525" "#636363" "#252525" "#969696"
[26] "#252525" "#CCCCCC" "#969696" "#F7F7F7" "#636363"
[31] "#F7F7F7" "#F7F7F7" "#252525" "#969696" "#636363"
[36] "#969696" "#F7F7F7" "#CCCCCC" "#F7F7F7" "#252525"
[41] "#636363" "#F7F7F7" "#252525" "#CCCCCC" "#CCCCCC"
[46] "#969696" "#969696" "#252525" "#636363"
attr(,"palette")
[1] "#F7F7F7" "#CCCCCC" "#969696" "#636363" "#252525"
attr(,"table")
under 499 499 - 515 515 - 546 546 - 567  over 567
       10        10         9        10        10

A very simple statistical map...

> plot(states.sat, col = colcode)

Better looking and more informative...


> plotclr <- brewer.pal(nclr, "Purples")
> class <- classIntervals(plotvar, nclr, style = "quantile")
> colcode <- findColours(class, plotclr, digits = 3)
> plot(states.sat, col = colcode, border = "grey", axes = T)
> title(main = "SAT math scores in 1999")
> legend("bottomleft", legend = names(attr(colcode, "table")),
+     fill = attr(colcode, "palette"))

[Figure: quantile choropleth in the Purples palette, titled "SAT math scores in 1999", with legend classes under 499, 499-515, 515-546, 546-567, and over 567]

Add labels (use only if necessary, because labels can clutter a map)...

> plot(states.sat, col = colcode, border = "grey", axes = T)
> title(main = "SAT math scores in 1999")
> legend("bottomleft", legend = names(attr(colcode, "table")),
+     fill = attr(colcode, "palette"))
> centroids <- coordinates(states.sat)
> text(centroids, states.sat$name.abbrev)

[Figure: the same choropleth with state abbreviations plotted at the polygon centroids]

Diverging colors can point out extremes, but will mislead readers when printed in black and white.

> plotclr <- brewer.pal(nclr, "RdBu")
> class <- classIntervals(plotvar, nclr, style = "quantile")
> colcode <- findColours(class, plotclr, digits = 3)
> plot(states.sat, col = colcode, border = "grey", axes = T)
> title(main = "SAT math scores in 1999")
> legend("bottomleft", legend = names(attr(colcode, "table")),
+     fill = attr(colcode, "palette"))


[Figure: the same choropleth drawn with the diverging RdBu palette]

Qualitative colors are inappropriate for ordinal categories.

> plotclr <- brewer.pal(nclr, "Set1")
> class <- classIntervals(plotvar, nclr, style = "quantile")
> colcode <- findColours(class, plotclr, digits = 3)
> plot(states.sat, col = colcode, border = "grey", axes = T)
> title(main = "SAT math scores in 1999")
> legend("bottomleft", legend = names(attr(colcode, "table")),
+     fill = attr(colcode, "palette"))

[Figure: the same choropleth drawn with the qualitative Set1 palette]

Spatial dependence
Create a spatial proximity matrix after removing Washington, DC.

> library(spdep)
> states.sat <- states.sat[!(row.names(slot(states.sat, "data")) ==
+     "district of columbia"), ]
> nb.states.sat <- poly2nb(states.sat, row.names = rownames(states.sat@data))
> listw.states.sat <- nb2listw(nb.states.sat)

Examine the correlation between the observations and the first-order lag.

> lag.states.sat <- lag.listw(listw.states.sat, states.sat$verbal)
> cor.test(states.sat$verbal, lag.states.sat)

	Pearson's product-moment correlation

data:  states.sat$verbal and lag.states.sat
t = 9.0889, df = 46, p-value = 7.758e-12
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.6698548 0.8842286
sample estimates:
      cor
0.8014503

Examine Moran's I between the observations and the first-order lag.

> moran.plot(states.sat$verbal, listw.states.sat)
Potentially influential observations of
	 lm(formula = wx ~ x) :

             dfb.1_ dfb.x dffit   cov.r   cook.d hat
indiana       0.53  -0.50  0.66_*  0.79_*  0.19  0.05
iowa         -0.10   0.10  0.12    1.14_*  0.01  0.09
north dakota  0.00   0.00  0.00    1.15_*  0.00  0.09
texas         0.73  -0.69  0.89_*  0.64_*  0.31  0.05
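The correlation above is descriptive; spdep also offers a formal test of Moran's I against spatial randomness that could be run here as a complement (my addition):

moran.test(states.sat$verbal, listw.states.sat)  # Moran's I with an analytic p-value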

[Figure: Moran scatterplot of states.sat$verbal against its spatial lag, with iowa, north dakota, texas, and indiana flagged as influential]

Examine Moran's I between the observations and lags over several orders.

> plot.spcor(sp.correlogram(nb.states.sat, states.sat$verbal,
+     order = 5, method = "I"), xlab = "Spatial lags",
+     main = "Spatial correlogram: Autocorrelation CIs")


[Figure: "Spatial correlogram: Autocorrelation CIs", Moran's I with confidence intervals over five spatial lags]

Plot the neighborhood structure.

> plot(states.sat)
> plot(nb.states.sat, coordinates(states.sat), col = "blue",
+     lwd = 2, add = T)
> title(main = "State borders and contiguous neighborhood structure")


[Figure: "State borders and contiguous neighborhood structure"]

> plot(nb.states.sat, coordinates(states.sat), col = "blue",
+     lwd = 2)
> title(main = "Contiguous neighborhood structure")


[Figure: "Contiguous neighborhood structure", the neighborhood graph drawn alone]

Use the lagged values to plot a smoothed map.

> par(mfrow = c(1, 2), pty = "s", cex = 0.5)
> plotvar <- states.sat$verbal
> plotclr <- brewer.pal(nclr, "Purples")
> class <- classIntervals(plotvar, nclr, style = "quantile")
> colcode <- findColours(class, plotclr, digits = 3)
> plot(states.sat, col = colcode, border = "grey", axes = T)
> title(main = "SAT math scores in 1999")
> legend("bottomleft", legend = names(attr(colcode, "table")),
+     fill = attr(colcode, "palette"))
> plotvar <- lag.states.sat
> plotclr <- brewer.pal(nclr, "Purples")
> class <- classIntervals(plotvar, nclr, style = "quantile")
> colcode <- findColours(class, plotclr, digits = 3)
> plot(states.sat, col = colcode, border = "grey", axes = T)
> title(main = "Smoothed map of SAT math scores in 1999",
+     sub = "Mean of contiguous neighbors")
> legend("bottomleft", legend = names(attr(colcode, "table")),
+     fill = attr(colcode, "palette"))
> par(oldpar)

[Figure: side-by-side quantile choropleths of the raw scores ("SAT math scores in 1999") and the smoothed scores ("Smoothed map of SAT math scores in 1999", subtitle "Mean of contiguous neighbors")]
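With the default row-standardised weights, each lagged value used in the smoothed map is just the mean of that state's contiguous neighbours; a quick check of the first state (my addition):

first.nb <- nb.states.sat[[1]]     # neighbour indices of the first state
mean(states.sat$verbal[first.nb])  # the lag computed by hand
lag.states.sat[1]                  # the lag from lag.listw(); should match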

Spatial modeling
First, a spatially naive model. See "A close look at the spatial structure implied by the CAR and SAR models" by Melanie Wall (2004).

> lm.states.sat <- lm(verbal ~ takers.pct + I(takers.pct^2),
+     data.frame(states.sat@data))
> summary(lm.states.sat)

Call:
lm(formula = verbal ~ takers.pct + I(takers.pct^2),
    data = data.frame(states.sat@data))

Residuals:
     Min       1Q   Median       3Q      Max
-22.0256  -8.0468   0.2721   6.1125  22.1333

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)     590.264640   4.062224 145.306  < 2e-16 ***
takers.pct       -2.881813   0.313574  -9.190 6.84e-12 ***
I(takers.pct^2)   0.023260   0.003933   5.914 4.19e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11.52 on 45 degrees of freedom
Multiple R-squared: 0.8836, Adjusted R-squared: 0.8784
F-statistic: 170.8 on 2 and 45 DF, p-value: < 2.2e-16

Now, a spatial lag model.

> lm.lag.states.sat <- lm(verbal ~ lag.states.sat +
+     takers.pct + I(takers.pct^2), data.frame(states.sat@data))
> summary(lm.lag.states.sat)

Call:
lm(formula = verbal ~ lag.states.sat + takers.pct + I(takers.pct^2),
    data = data.frame(states.sat@data))

Residuals:
    Min      1Q  Median      3Q     Max
-22.380  -5.404   1.013   7.489  23.734

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)     393.176311  54.203853   7.254 4.86e-09 ***
lag.states.sat    0.351215   0.096379   3.644 0.000705 ***
takers.pct       -2.631304   0.286313  -9.190 8.45e-12 ***
I(takers.pct^2)   0.023235   0.003486   6.665 3.54e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 10.21 on 44 degrees of freedom
Multiple R-squared: 0.9106, Adjusted R-squared: 0.9045
F-statistic: 149.4 on 3 and 44 DF, p-value: < 2.2e-16

A simultaneous autoregressive model.

> listw.states.sat <- nb2listw(nb.states.sat)
> lm.sar.states.sat <- spautolm(verbal ~ takers.pct +
+     I(takers.pct^2), data.frame(states.sat@data),
+     listw.states.sat)
> summary(lm.sar.states.sat)

Call: spautolm(formula = verbal ~ takers.pct + I(takers.pct^2),
    data = data.frame(states.sat@data), listw = listw.states.sat)

Residuals:
     Min       1Q   Median       3Q      Max
-17.8535  -7.8846   1.3722   7.4526  20.3795

Coefficients:
                   Estimate Std. Error  z value  Pr(>|z|)
(Intercept)     584.766102   5.163588 113.2480 < 2.2e-16
takers.pct       -2.501822   0.308423  -8.1117 4.441e-16
I(takers.pct^2)   0.019276   0.004135   4.6616 3.137e-06

Lambda: 0.62138 LR test value: 11.4 p-value: 0.00073457

Log likelihood: -178.1777
ML residual variance (sigma squared): 87.458, (sigma: 9.3519)
Number of observations: 48
Number of parameters estimated: 5
AIC: 366.36
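Choosing among these specifications is often guided by Lagrange multiplier diagnostics computed on the spatially naive model; a hedged sketch using spdep's lm.LMtests() (my addition, not part of the original handout):

lm.LMtests(lm.states.sat, listw.states.sat,
           test = c("LMlag", "LMerr"))  # tests for an omitted lag vs. error dependence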

A linear mixed effects model. An empirical variogram of residuals can be used as a diagnostic tool for assessing the adequacy of a selected model for the covariance between nearby observations in time and space (Fitzmaurice, Laird, and Ware, p. 241). The fitted line suggests an exponential correlation structure in this example.

> library(geoR)
> variog.lm.states.sat <- variog(coords = coordinates(states.sat),
+     data = resid(lm.states.sat))
variog: computing omnidirectional variogram
> plot(variog.lm.states.sat)
> variofit.lm.states.sat <- variofit(variog.lm.states.sat)
variofit: weights used: npairs
variofit: minimisation function used: optim
variofit: searching for best initial value ... selected values:
              sigmasq phi     tausq  kappa
initial.value "109.2" "15.79" "54.6" "0.5"
status        "est"   "est"   "est"  "fix"
loss value: 760324.941711764
> lines(variofit.lm.states.sat)


[Figure: empirical semivariogram of the residuals (semivariance vs. distance) with the fitted exponential curve]
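For reference, the exponential model behind that fitted line is gamma(h) = tau^2 + sigma^2 * (1 - exp(-h / phi)); a sketch (my addition) evaluating it at a few distances, plugging in the starting values printed above purely for illustration:

exp.variog <- function(h, sigmasq = 109.2, phi = 15.79, tausq = 54.6)
    tausq + sigmasq * (1 - exp(-h / phi))
exp.variog(c(10, 20, 40))  # semivariance rising toward the sill tausq + sigmasq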

Following Bivand, Pebesma, and Gomez-Rubio, p. 287:

> library(nlme)
> df.centroids <- data.frame(coordinates(states.sat))
> names(df.centroids) <- c("long", "lat")
> corExp.states.sat <- corSpatial(1, form = ~"long" + "lat", type = "e")
> corExp.states.sat <- Initialize(corExp.states.sat, df.centroids)
> states.sat@data <- data.frame(states.sat, key = row(states.sat@data)[, 1])
> lme.states.sat <- lme(verbal ~ takers.pct + I(takers.pct^2),
+     random = ~1 | key, data = data.frame(states.sat),
+     corr = corExp.states.sat)
> summary(lme.states.sat)

Linear mixed-effects model fit by REML
 Data: data.frame(states.sat)
       AIC      BIC    logLik
  390.0037 400.8437 -189.0019

Random effects:
 Formula: ~1 | key
        (Intercept) Residual
StdDev:    10.80869 3.986240

Correlation Structure: Exponential spatial correlation
 Formula: ~"long" + "lat" | key
 Parameter estimate(s):
     range
0.04861126

Fixed effects: verbal ~ takers.pct + I(takers.pct^2)
                    Value Std.Error DF   t-value p-value
(Intercept)     590.2646   4.062224 45 145.30577       0
takers.pct       -2.8818   0.313574 45  -9.19020       0
I(takers.pct^2)   0.0233   0.003933 45   5.91415       0
 Correlation:
                (Intr) tkrs.p
takers.pct      -0.830
I(takers.pct^2)  0.742 -0.981

Standardized Within-Group Residuals:
         Min           Q1          Med           Q3          Max
-0.661548347 -0.241688139  0.008172693  0.183590053  0.664782514

Number of Observations: 48
Number of Groups: 48

Geocoding
Geocode the Big Ten schools. Assign the geocoding service prefix.

> URL <- "http://maps.google.com/maps/geo?q="

Assign the locations to a vector.

> addresses <- c("801 South Wright Street, Champaign IL",
+     "107 S. Indiana Ave., Bloomington IN", "1 W Prentiss St, Iowa City IA",
+     "07 Fletcher St, Ann Arbor MI", "130 E. Elizabeth St, East Lansing MI",
+     "100 Church Street SE, Minneapolis MN", "1820 Chicago Ave, Evanston IL",
+     "275 W Woodruff Ave, Columbus OH", "243 S Allen St, State College PA",
+     "501 Hayes Street, West Lafayette IN", "716 Langdon St, Madison WI")

Replace spaces and commas to create the geocoding URLs.

> addresses <- gsub(" ", "+", addresses)
> addresses <- gsub(",", "%2C", addresses)
> addresses
 [1] "801+South+Wright+Street%2C+Champaign+IL"
 [2] "107+S.+Indiana+Ave.%2C+Bloomington+IN"
 [3] "1+W+Prentiss+St%2C+Iowa+City+IA"
 [4] "07+Fletcher+St%2C+Ann+Arbor+MI"
 [5] "130+E.+Elizabeth+St%2C+East+Lansing+MI"
 [6] "100+Church+Street+SE%2C+Minneapolis+MN"
 [7] "1820+Chicago+Ave%2C+Evanston+IL"
 [8] "275+W+Woodruff+Ave%2C+Columbus+OH"

[9] "243+S+Allen+St%2C+State+College+PA" [10] "501+Hayes+Street%2C+West+Lafayette+IN" [11] "716+Langdon+St%2C+Madison+WI" Create a blank data frame to be lled with coordinates. > coords <- data.frame(t(rep(NA, 3))) > colnames(coords) <- c("accuracy", "lat", "long") Loop through addresses, geocode each one, and ll data frame with coordinates. > tmp <- tempfile() > for (i in 1:length(addresses)) { + download.file(paste(URL, addresses[i], "&output=csv", + sep = ""), destfile = tmp) + coords[i, ] <- read.csv(tmp, header = F)[-1] + } Make sure the geocodes are suciently accurate before nalizing geocoded data frame. Accuracy codes:
http://code.google.com/apis/maps/documentation/geocoding/index.html#GeocodingAccuracy
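When geocoding many addresses against a web service, a more defensive loop that pauses between requests and tolerates individual failures might look like this (my sketch, using the same URL scheme as above):

for (i in seq_along(addresses)) {
    got <- tryCatch(read.csv(paste(URL, addresses[i], "&output=csv", sep = ""),
                             header = FALSE),
                    error = function(e) NULL)
    if (!is.null(got)) coords[i, ] <- got[-1]  # drop the status column, as above
    Sys.sleep(0.5)                             # throttle requests to the service
}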

> cbind(addresses, coords)
                                  addresses accuracy      lat      long
1  801+South+Wright+Street%2C+Champaign+IL         8 40.10796 -88.22893
2     107+S.+Indiana+Ave.%2C+Bloomington+IN        8 39.16648 -86.52689
3           1+W+Prentiss+St%2C+Iowa+City+IA        8 41.65444 -91.53623
4            07+Fletcher+St%2C+Ann+Arbor+MI        6 42.27976 -83.73714
5    130+E.+Elizabeth+St%2C+East+Lansing+MI        8 42.73823 -84.48285
6    100+Church+Street+SE%2C+Minneapolis+MN        8 44.97590 -93.23491
7           1820+Chicago+Ave%2C+Evanston+IL        8 42.05049 -87.67802
8         275+W+Woodruff+Ave%2C+Columbus+OH        8 40.00393 -83.01684
9        243+S+Allen+St%2C+State+College+PA        8 40.79319 -77.85999
10   501+Hayes+Street%2C+West+Lafayette+IN         8 40.43007 -86.91162
11             716+Langdon+St%2C+Madison+WI        8 43.07582 -89.39763

Point-level maps
Join attributes to points (i.e., Big Ten stadium capacity to stadium locations). Create an attributes data frame.

> attributes <- data.frame(stadium.cap = c(62872,
+     52180, 70585, 107501, 75005, 50300, 49256,
+     102329, 107282, 62500, 80321))

> attributes$win.pct <- 100 * c(0.34, 0.25, 0.512,
+     0.727, 0.465, 0.32, 0.422, 0.777, 0.625, 0.473, 0.59)
> attributes$under.grads <- c(30895, 30394, 20907,
+     26083, 36072, 28645, 8284, 39209, 36612, 31290, 28999)
> attributes$pop <- c(75254, 183733, 147038, 341847,
+     528193, 3175041, 74239 + 0.5 * 8711000, 1754337,
+     135758, 182821, 543022)
> rownames(attributes) <- c("University Of Illinois",
+     "Indiana University", "University of Iowa",
+     "University of Michigan", "Michigan State University",
+     "University of Minnesota", "Northwestern University",
+     "Ohio State University", "Penn State University",
+     "Purdue University", "University of Wisconsin")

Convert the data frame to a spatial object with the longlat projection. Note that longitude is x and latitude is y, which required reversing their order in the data frame above. The proj4string output below indicates that no projection has been assigned yet.

> library(maptools)
> coordinates(coords) <- c("long", "lat")
> big.ten <- SpatialPointsDataFrame(coords, attributes)
> big.ten
                                  coordinates stadium.cap win.pct under.grads     pop
University Of Illinois     (-88.2289, 40.108)       62872    34.0       30895   75254
Indiana University        (-86.5269, 39.1665)       52180    25.0       30394  183733
University of Iowa        (-91.5362, 41.6544)       70585    51.2       20907  147038
University of Michigan    (-83.7371, 42.2798)      107501    72.7       26083  341847
Michigan State University (-84.4828, 42.7382)       75005    46.5       36072  528193
University of Minnesota   (-93.2349, 44.9759)       50300    32.0       28645 3175041
Northwestern University    (-87.678, 42.0505)       49256    42.2        8284 4429739
Ohio State University     (-83.0168, 40.0039)      102329    77.7       39209 1754337
Penn State University        (-77.86, 40.7932)      107282    62.5       36612  135758
Purdue University         (-86.9116, 40.4301)       62500    47.3       31290  182821
University of Wisconsin   (-89.3976, 43.0758)       80321    59.0       28999  543022
> class(big.ten)
[1] "SpatialPointsDataFrame"
attr(,"package")
[1] "sp"
> library(rgdal)
> proj4string(big.ten)
[1] NA

Compare the maps before and after applying the longlat projection.

> par(mfrow = c(1, 2), pty = "s", lab = c(5, 3, 7))
> plot(big.ten, axes = T)
> proj4string(big.ten) <- CRS("+proj=longlat")
> plot(big.ten, axes = T)
> par(oldpar)

[Figure: the Big Ten points plotted before (no projection) and after assigning the longlat projection]

Make a presentable map with varying symbol sizes. Assign the contiguous state boundaries to an object, convert the boundaries to a SpatialLines object, and apply the longlat projection.

> library(maps)
> states <- map("state", plot = F)
> states <- map2SpatialLines(states, proj4string = CRS("+proj=longlat"))
> plot(big.ten, axes = T, col = "white",
+     xlim = c(min(big.ten@coords[, 1]) * 1.025,
+         max(big.ten@coords[, 1]) * 0.975))
> plot(states, add = T, col = "grey")
> plot(big.ten, axes = T, add = T)
> title(main = "Big Ten schools", xlab = "Longitude",
+     ylab = "Latitude")

> points(big.ten, cex = big.ten$stadium.cap/10000)
> text(big.ten, labels = rownames(big.ten@data),
+     cex = 0.5, pos = 3)
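Readers cannot decode the circle sizes without a key; one option (my addition; the reference capacities are round numbers chosen for the key, scaled with the same cex rule as the points above):

legend("bottomright", pch = 1, pt.cex = c(50000, 75000, 100000)/10000,
       legend = c("50,000", "75,000", "100,000"),
       title = "Stadium capacity", bty = "n")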

[Figure: "Big Ten schools" map with circles sized by stadium capacity and labeled school names]

Make a symbols map.

> plot(big.ten, axes = T, col = "white",
+     xlim = c(min(big.ten@coords[, 1]) * 1.025,
+         max(big.ten@coords[, 1]) * 0.975))
> plot(states, add = T, col = "grey")
> title(main = "Big Ten schools", xlab = "Longitude",
+     ylab = "Latitude")
> text(big.ten, labels = rownames(big.ten@data),
+     cex = big.ten$stadium.cap/1e+05)


[Figure: "Big Ten schools" map with each school name printed at a size proportional to stadium capacity]

Plot a surface approximation.

> library(MBA)
> obs.surf <- mba.surf(cbind(big.ten@coords[, 1],
+     big.ten@coords[, 2], big.ten$stadium.cap),
+     no.X = 100, no.Y = 100, extend = T)$xyz.est
> jet.colors <- colorRampPalette(c("#00007F", "blue",
+     "#007FFF", "cyan", "#7FFF7F", "yellow",
+     "#FF7F00", "red", "#7F0000"))
> image(obs.surf, xaxs = "r", yaxs = "r",
+     main = "Surface approximation", col = jet.colors(200))
> points(big.ten, pch = 3)
> text(big.ten, labels = rownames(big.ten@data), pos = 1)


[Figure: "Surface approximation" image plot of stadium capacity, with school locations marked and labeled]
Plot the map with the surface approximation.

> plot(big.ten, axes = T, col = "white",
+     xlim = c(min(big.ten@coords[, 1]) * 1.025,
+         max(big.ten@coords[, 1]) * 0.975))
> plot(states, add = T, col = "grey")
> image(obs.surf, col = jet.colors(200), add = T)
> points(big.ten, pch = 3)
> title(main = "Big Ten schools", xlab = "Longitude",
+     ylab = "Latitude")
> text(big.ten, labels = rownames(big.ten@data),
+     cex = 0.5, pos = 3)

[Figure: "Big Ten schools" map with the capacity surface drawn beneath the state boundaries and labels]

3D plots
Cloud plot:

> library(lattice)
> print(cloud(big.ten$stadium.cap ~ big.ten@coords[, 1] *
+     big.ten@coords[, 2], type = c("p", "h"),
+     xlab = "Longitude", ylab = "Latitude", zlab = "Stadium\ncapacity",
+     main = "Cloud plot of Big Ten football stadium capacity"))


[Figure: "Cloud plot of Big Ten football stadium capacity"]

3D perspective plot:

> persp(obs.surf, xlab = "Longitude", ylab = "Latitude",
+     zlab = "Stadium\ncapacity", theta = 315, phi = 30,
+     expand = 0.75, col = "lightgreen", shade = 0.75,
+     border = NA)


[Figure: 3D perspective plot of the stadium capacity surface]
